
Cutting Costs by 40%: CLI Tools and Model Tiering

5 min read · February 8, 2026
optimization · cost · tooling


Day 8 was the payoff. After 7 days of building, breaking, and fixing, it was time to optimize. The goal: reduce costs without reducing quality.

Cost optimization — before and after CLI tools and model tiering

The Cost Problem

My agent ecosystem was running roughly $333/week on a path to $1,400/month. For a personal productivity tool, that's significant. The breakdown revealed two major cost drivers:

  1. All agents running on Opus (~5x the cost of Sonnet)
  2. Failed runs requiring retries (auth expiry, search failures, incomplete results)

The retry problem was especially insidious. My daily briefing agent was dispatched 11 times across 4 days — it should have been dispatched 4 times. Nearly 3x the expected cost, all because of data source reliability issues.
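A quick back-of-envelope check of those dispatch numbers (both figures are from the post; nothing else is assumed):

```python
# The briefing agent should run once per day, but retries inflated that.
expected_runs = 4   # 4 days, one dispatch per day
actual_runs = 11    # observed dispatches over the same 4 days

retry_multiplier = actual_runs / expected_runs
print(f"cost multiplier from retries: {retry_multiplier:.2f}x")  # 2.75x
```

That 2.75x applies to every token the agent burns, which is why reliability fixes show up directly on the bill.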

Fix 1: Direct CLI Tools

My enterprise search platform (Glean MCP) had a fundamental reliability problem: auth tokens expired every hour. This caused:

  • Silent failures where agents proceeded with no data
  • Retry loops that multiplied costs
  • Inconsistent results depending on when the token was last refreshed

The fix was replacing Glean with a direct CLI tool (`desk`) for Google Workspace access. This CLI uses OAuth refresh tokens that never expire — no manual re-authentication needed.

I updated four agents to use the new CLI as their primary data source:

| Agent | Data Source | Result |
| --- | --- | --- |
| Daily briefing | Calendar, sheets, docs | 52% fewer tokens (47K vs 98K) |
| EOD summary | Calendar, meeting notes | 3/3 notes found (vs 0/3 with expired auth) |
| Standup sync | Meeting notes | First-try success (vs retry failures) |
| Program monitor | Weekly huddle notes | Reliable retrieval |

The token reduction alone was significant — the briefing agent was consuming 98K tokens per run with Glean (bloated response metadata) vs 47K with direct CLI access.

Fix 2: Model Tiering

Not every agent needs Opus. The insight from Day 3 was that synthesis tasks need Opus but mechanical tasks don't. Day 8 was when I actually applied this:

Kept on Opus (complex reasoning, cross-source synthesis):

  • Daily briefing — synthesizes calendar + Jira + Slack + Confluence
  • EOD summary — cross-references meetings, action items, and tomorrow's context
  • DCT program monitor — scans 10 channels, cross-references with Jira
  • AI pillar monitor — scans 8 channels, reads 7 Confluence pages

Switched to Sonnet (structured data formatting, template-driven output):

  • Sprint board publisher — pull Jira data, format as table
  • Roadmap publisher — pull initiatives, group by pillar/quarter
  • Daily update publisher — format sprint tickets as Slack text
  • Initiative notes checker — parse fields, classify freshness
  • Standup sync — match notes to tickets, format comments

The rule: if an agent's job is "pull data and format it," Sonnet is fine. If its job is "read from 5 sources and tell me what matters," it needs Opus.
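That rule is simple enough to encode directly. A minimal sketch, assuming agents are routed by name (the agent identifiers and the `pick_model` helper are illustrative, not part of any real API):

```python
# Agents doing cross-source synthesis stay on Opus; everything else
# (pull-and-format work) drops to Sonnet. Names are hypothetical slugs.
OPUS_AGENTS = {
    "daily-briefing",
    "eod-summary",
    "dct-program-monitor",
    "ai-pillar-monitor",
}

def pick_model(agent_name: str) -> str:
    """Route synthesis agents to Opus, mechanical formatters to Sonnet."""
    return "opus" if agent_name in OPUS_AGENTS else "sonnet"

print(pick_model("daily-briefing"))          # opus
print(pick_model("sprint-board-publisher"))  # sonnet
```

Keeping the tier decision in one lookup table makes it cheap to demote an agent later if its output quality holds up on the smaller model.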

Fix 3: Session Consolidation

Day 7 had 21 sessions. Each new session costs ~$3 in cache creation overhead (loading CLAUDE.md, MEMORY.md, and other context into the model's cache). Twenty-one sessions meant ~$63 just in startup costs.

Day 8: 2 sessions. Same work output.

The trick is using longer sessions with Claude Code's `/clear` command (which resets conversation context without closing the session) instead of opening new terminal windows.
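The savings are pure arithmetic on the numbers above (the ~$3/session figure is the post's own estimate):

```python
overhead_per_session = 3.0  # ~$3 cache-creation cost per new session

day7_cost = 21 * overhead_per_session  # 21 sessions
day8_cost = 2 * overhead_per_session   # 2 consolidated sessions

print(f"Day 7 startup overhead: ${day7_cost:.0f}")          # $63
print(f"Day 8 startup overhead: ${day8_cost:.0f}")          # $6
print(f"Daily savings: ${day7_cost - day8_cost:.0f}")       # $57
```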

The Numbers

| Metric | Before (Day 7) | After (Day 8) | Change |
| --- | --- | --- | --- |
| Sessions/day | 21 | 2 | -90% |
| Daily briefing tokens | 98K | 47K | -52% |
| EOD notes retrieval | 0/3 (expired auth) | 3/3 | 100% reliability |
| Projected weekly cost | ~$333 | ~$200 | -40% |

Data Source Reliability Scorecard

After a week of real-world usage, here's how the three data source integrations stacked up:

| Source | Auth | Accuracy | Uptime | Token Efficiency |
| --- | --- | --- | --- | --- |
| Direct CLI (desk) | A+ (never expires) | A+ (no truncation) | 100% | A+ (15-52% fewer tokens) |
| Atlassian MCP | A (stable) | A (direct API) | ~95% | B+ (ADF is verbose) |
| Enterprise Search (Glean) | D (hourly expiry) | B- (truncates/summarizes) | ~65% | C (bloated metadata) |

The enterprise search platform still has one exclusive capability: Slack channel searches. There's no alternative for that. But for everything else — calendars, docs, sheets, drive — the direct CLI tool is strictly superior.

The Final Architecture

```
Day 1: 0 agents, 0 skills, $0 infrastructure
Day 8: 17 agents, 35 skills, 3 data sources, ~$200/week

Routing:  desk > Atlassian MCP > Glean (fallback only)
Models:   Opus (synthesis) + Sonnet (formatting)
Dispatch: Hierarchical (team-lead orchestrator → specialists)
Memory:   Persistent MEMORY.md + agent-changelog.md
Reports:  Local first → Drive sync
```
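The routing line above is a priority chain: try the reliable source first, fall back only when it fails. A minimal sketch of that pattern, where the fetcher callables are hypothetical stand-ins for the desk CLI, Atlassian MCP, and Glean integrations:

```python
# Try each data source in priority order; return the first usable result.
# The source functions here are placeholders, not real integrations.
from typing import Callable, Optional

def fetch_with_fallback(
    query: str,
    sources: list[Callable[[str], Optional[str]]],
) -> Optional[str]:
    """Return the first non-empty result, skipping sources that error out."""
    for fetch in sources:
        try:
            result = fetch(query)
        except RuntimeError:  # e.g. an expired auth token
            continue
        if result:
            return result
    return None
```

The key property is that an unreliable fallback (Glean) can stay in the chain without inflating costs, because it is only invoked after the cheap, reliable sources have failed.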

Key Takeaways

  1. Direct API access beats middleware every time — fewer tokens, no auth expiry, no truncation
  2. Model tiering is the single biggest cost lever — Opus for reasoning, Sonnet for formatting
  3. Session consolidation eliminates cache creation overhead — 2 long sessions beat 21 short ones
  4. Reliability improvements compound — fixing retries alone saves 3x on affected agents
  5. Track data source reliability as a first-class metric — it directly drives agent cost through retry volume
  6. Optimize last, not first — you need to build and break things before you know what to optimize