
Week 3: When Agents Start Auditing Themselves

5 min read · March 15, 2026
claude-code · ai-agents · cost-optimization · governance · security · weekly-reflection



Two weeks in, I had 22 agents, 35 skills, and a system that could pull data from Jira, Confluence, Google Workspace, and Slack in parallel. It was powerful. It was also burning $672 in a single day, fighting 112K characters of JSON I couldn't parse, and running on permissions I'd never actually reviewed.

Week 3 is the week the system grew up. Not by getting bigger — by getting honest about its own limits.

The $672 Wake-Up Call

I'd been running agents all week without looking at the bill. When I finally checked, one day stood out: $672. For context, a well-optimized day should cost $15-25.

The culprit was embarrassingly simple. "Fast mode" — a toggle I'd enabled weeks ago for snappier interactive responses — was silently doubling the cost of every agent dispatch. Interactive chat? Fast mode is great. Firing off 15 automated agents that don't need speed? You're paying twice for nothing.

The immediate fix was turning off fast mode for agent dispatches. But the real fix was making cost visible. I added a real-time cost indicator to my terminal status bar — a number that updates with every API call, like a taxi meter ticking in the corner of your eye.

The behavior change was immediate. When you can see the meter running, you naturally batch requests, skip unnecessary dispatches, and think twice before running a full pipeline when you only need one phase.
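
For the curious, the meter itself is simple. Here's a minimal sketch of the idea; the per-token prices below are placeholders rather than real rates, and wiring the output into your status bar depends on your terminal setup.

```python
# cost_meter.py: a running-cost meter for API calls (sketch only).
# Prices are placeholders; fill in your provider's published rates.

PRICE_PER_MTOK = {
    # (input, output) in USD per million tokens, illustrative numbers
    "opus": (15.00, 75.00),
    "sonnet": (3.00, 15.00),
}

class CostMeter:
    def __init__(self):
        self.total_usd = 0.0

    def record(self, model: str, input_tokens: int, output_tokens: int) -> float:
        """Add one API call's cost and return the running total."""
        in_rate, out_rate = PRICE_PER_MTOK[model]
        self.total_usd += (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000
        return self.total_usd

    def statusline(self) -> str:
        """String for a terminal status bar (tmux segment, prompt hook, etc.)."""
        return f"API: ${self.total_usd:,.2f}"

meter = CostMeter()
meter.record("sonnet", input_tokens=12_000, output_tokens=1_500)
print(meter.statusline())  # -> API: $0.06
```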

Cost Optimization Impact

  • Worst single day: $672
  • Unoptimized weekly run rate: $100–150
  • Optimized weekly run rate: $35–50

Three changes got there: fast mode off for agents, model tiering (Opus for synthesis, Sonnet for mechanical work), and scoped execution (run single phases instead of full pipelines).

Model tiering was the other big lever. I'd previously migrated all agents to Opus because quality mattered. But not every agent needs Opus. A publisher that formats sprint data into a Confluence table? Sonnet handles that fine. An agent that cross-references five data sources and synthesizes a risk assessment? That's Opus territory. Matching model to cognitive load cut costs by roughly 60% with no quality loss on the mechanical tasks.
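
The routing logic doesn't need to be clever. A minimal sketch, with illustrative agent names; the only real decision is which bucket each agent falls into:

```python
# Route each agent to the cheapest model that handles its cognitive load.
# Agent names here are illustrative, not a real registry.

MECHANICAL = {"confluence-publisher", "sprint-board-formatter", "slack-digest"}
SYNTHESIS = {"risk-assessor", "eod-summary", "daily-assistant"}

def pick_model(agent_name: str) -> str:
    if agent_name in SYNTHESIS:
        return "opus"    # cross-source reasoning, judgment calls
    if agent_name in MECHANICAL:
        return "sonnet"  # deterministic formatting and extraction
    return "sonnet"      # default cheap; promote only when quality demands it

print(pick_model("risk-assessor"))         # opus
print(pick_model("confluence-publisher"))  # sonnet
```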

The 112K Character Wall

One of my agents audited team wiki pages — finding stale ones, flagging missing ones, checking if dissolved teams still had active pages. The analysis part worked beautifully. It found six dissolved teams still listed on the wiki, some 2.5 years stale. Five new teams had no wiki pages at all.

Then I tried to automate the fix.

Confluence stores page content in Atlassian Document Format — a deeply nested JSON structure. The page I needed to edit was 112,000 characters of it. I wrote a Python parser. It broke on nested tables. I wrote a second parser. It couldn't handle the custom macros. I wrote a third parser, and it worked — but the entire parsed payload exceeded the MCP tool's parameter limit.
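
To give a sense of what those parsers were up against: ADF is a recursive tree of typed nodes, and even the trivial text-extraction case looks like the sketch below. Every node type you haven't handled, like nested tables or custom macros, is a silent gap.

```python
# Minimal recursive walk over an Atlassian Document Format (ADF) tree.
# ADF nodes are dicts with a "type" and an optional "content" list; text
# lives in {"type": "text", "text": ...} leaves. This is exactly the kind
# of parser that looks fine until it meets a macro node it doesn't know.

def extract_text(node: dict) -> str:
    if node.get("type") == "text":
        return node.get("text", "")
    parts = [extract_text(child) for child in node.get("content", [])]
    return " ".join(p for p in parts if p)

doc = {
    "type": "doc",
    "content": [
        {"type": "paragraph", "content": [{"type": "text", "text": "Team Alpha"}]},
        {"type": "extension", "attrs": {"extensionKey": "custom-macro"}},  # no text inside
    ],
}
print(extract_text(doc))  # -> Team Alpha
```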

Three parsers. Two hours. Zero successful edits.

Then I stopped and did the 13 wiki edits manually. It took three minutes.

The lesson crystallized into a rule I now apply constantly: if the manual path takes three minutes and the automated path has taken two hours of debugging with no end in sight, do it manually. Automate the analysis — the hard part was finding those six stale pages across hundreds. Don't automate the edit when it's 13 clicks.

This comes up more than you'd expect with AI agents. The instinct is to automate end-to-end. But the value curve isn't linear. The last 10% of automation often costs 90% of the effort. Know when to stop.

Agents Auditing Agents

By day 18, I had enough agents that I needed agents to check on them.

The permissions-auditor analyzed 83 session transcripts (14 days of history), cataloged every tool invocation, and cross-referenced it against my Claude Code permission settings. What it found was unsettling:

  • A shell command had somehow leaked into the auto-allow array — meaning it could run without confirmation
  • Stale duplicate entries cluttered the permission list
  • Several tools that should have required confirmation were running silently
  • Destructive commands like rm and kill weren't explicitly blocked

I'd been running this system daily for two weeks without realizing that my permission config had drifted. The permissions-auditor found issues in its own infrastructure.
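
The core of the check is a set difference: what actually ran versus what's configured. A minimal sketch, assuming JSONL transcripts and a settings file with allow/deny arrays; the record fields and permission patterns below are assumptions, not a documented format.

```python
# Sketch: diff tools observed in session transcripts against the configured
# allow/deny lists. The JSONL layout, record fields, and pattern syntax are
# assumptions; adapt to how your tooling actually stores these.
import json
from pathlib import Path

def observed_tools(transcript_dir: str) -> set[str]:
    """Every tool name invoked across JSONL session transcripts."""
    tools = set()
    for path in Path(transcript_dir).glob("**/*.jsonl"):
        for line in path.read_text().splitlines():
            if not line.strip():
                continue
            record = json.loads(line)
            if record.get("type") == "tool_use":  # assumed record shape
                tools.add(record.get("name", "unknown"))
    return tools

def audit(transcript_dir: str, settings_path: str) -> None:
    settings = json.loads(Path(settings_path).read_text())
    allow = set(settings.get("permissions", {}).get("allow", []))
    deny = set(settings.get("permissions", {}).get("deny", []))
    used = observed_tools(transcript_dir)

    print("used but never explicitly reviewed:", sorted(used - allow - deny))
    print("allowed but never used (stale?):", sorted(allow - used))
    for pattern in ("Bash(rm:*)", "Bash(kill:*)"):  # illustrative patterns
        if pattern not in deny:
            print("destructive pattern not explicitly denied:", pattern)
```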

The security-auditor ran an 8-phase review: token permissions, credential exposure, hook integrity, skill supply chain analysis, MCP server configs, git credential handling, report file exposure, and a summary with severity ratings. It treated my AI tooling with the same rigor you'd apply to a production deployment — because that's what it had become.

The Governance Stack

  • Permissions Auditor → What tools can run without confirmation? What's drifted?
  • Security Auditor → Are credentials exposed? Are hooks intact? Supply chain clean?
  • Cost Tracker → What's each agent actually costing? Where's the waste?
  • Duplicate Detection → Has this Jira comment already been posted? Idempotency checks (see the sketch after this list).
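
The duplicate-detection piece is the easiest of the four to sketch: hash the would-be write, skip it if you've seen it before. The ledger file and function names here are illustrative, not any real API.

```python
# Idempotency guard for agent writes: skip a Jira comment if an identical
# one (same issue, same body) was already posted. Ledger file is a sketch.
import hashlib
import json
from pathlib import Path

LEDGER = Path("posted_comments.json")

def already_posted(issue_key: str, body: str) -> bool:
    digest = hashlib.sha256(f"{issue_key}:{body}".encode()).hexdigest()
    seen = set(json.loads(LEDGER.read_text())) if LEDGER.exists() else set()
    if digest in seen:
        return True
    seen.add(digest)
    LEDGER.write_text(json.dumps(sorted(seen)))
    return False

if not already_posted("PROJ-123", "Standup: blocked on API review"):
    pass  # safe to call the Jira API here
```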

The pattern that emerged: any system complex enough to be useful is complex enough to need governance. When your agents are making API calls, posting to Jira, and accessing sensitive data across multiple services, "it works" isn't sufficient. You need to know how it works, what it's allowed to do, and whether those answers have drifted since you last checked.

The First Full Operational Day

Day 12 was the day the system stopped being a project and started being infrastructure.

The morning routine: fire up the daily assistant, which pulls my calendar, cross-references Jira sprint data, scans overnight Slack activity, and generates a briefing with prep notes for each meeting. During the day: standup-note-to-jira converts voice meeting notes into structured Jira comments. End of day: the EOD summary agent pulls everything together — what happened in each meeting, what action items emerged, what moved on the sprint board.
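
Structurally, the cycle is just named phases over agent dispatches, which is also what made scoped execution cheap: run one phase, not the whole pipeline. A sketch, with dispatch() standing in for however you actually invoke an agent:

```python
# The daily cycle as scoped phases (sketch). Agent names are from the post;
# dispatch() is a placeholder for your actual invocation mechanism.
PHASES = {
    "morning": ["daily-assistant"],       # calendar + sprint + Slack -> briefing
    "midday": ["standup-note-to-jira"],   # voice notes -> draft Jira comments
    "eod": ["eod-summary"],               # meetings + actions + board moves
}

def run_phase(phase: str, dispatch) -> None:
    for agent in PHASES[phase]:
        dispatch(agent)

# Run only what you need instead of the full pipeline:
run_phase("morning", dispatch=print)
```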

For the first time, I wasn't building agents. I was using them. The daily cycle just ran.

But here's the nuance that matters: it wasn't fully autonomous. The standup-to-Jira agent drafted five Jira comments. I reviewed them and posted one. The daily update publisher generated a Slack message. I edited two lines before sending. The briefing flagged three risks. I escalated one.

This is the pattern I'd recommend to anyone building agent systems: human-in-the-loop for all writes, autonomous for all reads. Let the agents gather, analyze, and draft. Keep the final action — the post, the send, the update — in human hands. Not because the agents can't do it, but because the five seconds of review catches the one time in twenty that the context was wrong.
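
In code, the pattern is one gate. A minimal sketch; the helper names are mine, not any particular framework's:

```python
# Human-in-the-loop gate: reads run autonomously, every write needs a yes.
def gated_write(description: str, draft: str, write_fn) -> bool:
    """Show the agent's draft, then write only on explicit confirmation."""
    print(f"--- {description} ---\n{draft}\n")
    if input("Post this? [y/N] ").strip().lower() == "y":
        write_fn(draft)
        return True
    return False

# Usage: the agent drafts, the human decides.
# gated_write("Jira comment on PROJ-123", draft_text, post_jira_comment)
```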

The Numbers After Three Weeks

Metric                     Week 1      Week 2      Week 3
Agents                     11          17          22
Daily cost                 $80–150     $40–80      $15–25
Reports generated          ~10         ~25         60+ total
Manual hours saved/week    2–3         5–7         8–12
Biggest single agent       180 lines   450 lines   688 lines

The cost curve is the story worth telling. Week 1 was expensive because everything was new and I was iterating constantly. Week 2 was cheaper because agents stabilized. Week 3 was cheapest because I stopped running things I didn't need, scoped execution to single phases, and matched models to tasks.

The hours saved are real but hard to measure precisely. What I can say: the daily briefing replaces 45 minutes of morning prep. The EOD summary replaces 30 minutes of note consolidation. The sprint board publisher replaces an hour of manual Confluence formatting. These add up: call it one to two hours a day, which squares with the 8–12 hours a week in the table above.

What's Different Now

Three weeks ago, I typed "how do I connect to Jira?" into Claude Code. Now I have a fleet of 22 agents that runs my daily workflow, audits its own permissions, tracks its own costs, and generates 60+ reports — all from markdown files and API calls.

The biggest shift wasn't technical. It was mental. I stopped thinking of Claude Code as a tool I use and started thinking of it as infrastructure I operate. That changes everything — how you think about cost, permissions, monitoring, and governance.

If you're in week one of your own AI agent journey, here's what I wish I'd known: the building phase is the easy part. The interesting problems — cost, governance, knowing when to stop automating — start when the system actually works.