TPM Agent Ecosystem
35 specialized AI agents and 42 reusable skills running across two pillars (AI + Tech Investment) — daily briefings, sprint health, program monitoring, governance audits. Config-driven multi-tenancy made the scope expansion possible without a rebuild.

A hierarchical multi-agent system built on Claude Code that handles the full spectrum of TPM daily operations across two programs at once — AI initiatives and a Tech Investment portfolio of platform/infrastructure squads. 35 agents, 42 reusable skills, 2 pillars, ~$35–50/week, 12–15 hours saved per week.
It started as a single-tenant toolkit for one program. When my scope doubled in April 2026, the ecosystem absorbed the second pillar in a single weekend refactor — because the agents were portable. That outcome is the whole story.
Architecture
One Orchestrator routes requests to specialized sub-agents based on task type and pillar. Model tiering matches cognitive load to cost — Opus for cross-source synthesis, Sonnet for structured/mechanical transformations. Each agent reads a per-pillar YAML config at startup, so the same code drives both programs.
tpm-team-lead (Orchestrator)
│   --pillar=ai | --pillar=ti
│
├── AI Pillar (~/.tpm/pillars/ai.yml)   18 agents
│   ├── Opus     daily-briefing · eod-summary · program-monitor
│   │            portfolio-review · risk-radar
│   └── Sonnet   sprint-board · roadmap-publisher · standup-notes
│                feedback-triage · launch-readiness · …
│
├── Tech Investment Pillar (~/.tpm/pillars/ti.yml)   17 agents
│   ├── Opus     migration-watch · ktlo-radar · dep-coordinator
│   │            decentralization-lens
│   └── Sonnet   defect-hygiene · milestone-pulse · roadmap-sync
│                squad-page-auditor · …
│
└── Governance: agents auditing agents
    permissions-auditor · security-auditor · trace-collector
The code is the same; the YAML is the only diff. A new pillar today is a new file, not a new fork.
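The routing idea can be sketched in a few lines of Python. This is an illustration only, not the ecosystem's actual dispatcher: the agent names and config paths come from the diagram, while `ROSTER`, `Dispatch`, and `route` are hypothetical names, and the roster lists only the agents shown above (the elided ones are left out).

```python
from dataclasses import dataclass

# Partial, hypothetical roster: only agents named in the diagram appear here.
ROSTER = {
    "ai": {
        "opus": ["daily-briefing", "eod-summary", "program-monitor",
                 "portfolio-review", "risk-radar"],
        "sonnet": ["sprint-board", "roadmap-publisher", "standup-notes",
                   "feedback-triage", "launch-readiness"],
    },
    "ti": {
        "opus": ["migration-watch", "ktlo-radar", "dep-coordinator",
                 "decentralization-lens"],
        "sonnet": ["defect-hygiene", "milestone-pulse", "roadmap-sync",
                   "squad-page-auditor"],
    },
}


@dataclass(frozen=True)
class Dispatch:
    agent: str
    pillar: str
    model_tier: str   # "opus" or "sonnet"
    config_path: str  # per-pillar YAML the agent reads at startup


def route(agent: str, pillar: str) -> Dispatch:
    """Resolve an agent request to a model tier and a pillar config path."""
    if pillar not in ROSTER:
        raise ValueError(f"unknown pillar: {pillar!r}")
    for tier, agents in ROSTER[pillar].items():
        if agent in agents:
            return Dispatch(agent, pillar, tier, f"~/.tpm/pillars/{pillar}.yml")
    # Refuse rather than guess: an agent must be registered for the pillar.
    raise ValueError(f"agent {agent!r} is not registered for pillar {pillar!r}")
```

Under these assumptions, `route("ktlo-radar", "ti")` resolves to the Opus tier with `~/.tpm/pillars/ti.yml`, and asking for the same agent under `ai` raises instead of silently running against the wrong program.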
Config-driven multi-tenancy
The single move that made the scope expansion survivable. Each agent takes a --pillar=<name> flag and reads the matching YAML from ~/.tpm/pillars/:
# pillars/ai.yml
name: AI
slack_channels: ["#ai-eng", "#ai-ops", "#ai-launch"]
jira: { project: AIPLAT, ktlo_label: ktlo-ai }
confluence: { space: AI, roadmap_page: "AI Roadmap" }
No conditionals in agent code, no environment-variable contortions, no implicit defaults. Each agent fails fast on missing config (silent overreach is worse than a hard error). Each has a validate-config mode that pre-flights a tenant before it gets cron'd.
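A minimal sketch of the fail-fast check, assuming the four top-level keys from the example file are the required ones. `REQUIRED_KEYS`, `ConfigError`, `validate_config`, and `load_pillar` are hypothetical names. The functions work on an already-parsed dict, so the YAML parsing step (for example PyYAML's `yaml.safe_load`) is assumed to happen upstream.

```python
# Assumption: these four keys, taken from the ai.yml example, are mandatory.
REQUIRED_KEYS = {
    "name": str,
    "slack_channels": list,
    "jira": dict,
    "confluence": dict,
}


class ConfigError(Exception):
    """Raised when a pillar config is incomplete or has the wrong shape."""


def validate_config(cfg: dict) -> list[str]:
    """Return a list of problems; an empty list means the config passed."""
    problems = []
    for key, expected in REQUIRED_KEYS.items():
        if key not in cfg:
            problems.append(f"missing key: {key}")
        elif not isinstance(cfg[key], expected):
            problems.append(
                f"{key}: expected {expected.__name__}, "
                f"got {type(cfg[key]).__name__}"
            )
    return problems


def load_pillar(cfg: dict) -> dict:
    """Fail fast: raise instead of falling back to implicit defaults."""
    problems = validate_config(cfg)
    if problems:
        raise ConfigError("; ".join(problems))
    return cfg
```

A validate-config mode could call `validate_config` and print the problems without running the agent, while a normal run would go through `load_pillar` and stop at the first bad tenant.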
The refactor itself was about six hours on a Saturday. The cost of not doing it the first time was the whole second weekend.
MCP & CLI integrations
Every agent connects to real data sources via Model Context Protocol or direct CLI tools — not scraping, not screenshots.
| Integration | Data | Method |
|---|---|---|
| Jira | Tickets, sprints, story points, epics | Atlassian MCP |
| Confluence | Wiki pages, agendas, sprint goals | Atlassian MCP |
| Slack | Channel activity, thread context | Slack MCP |
| Google Workspace | Calendar, Docs, Drive (Gemini meeting notes) | Google Workspace MCP |
| GitHub | PR state, deploy status, file diffs | GitHub MCP |
| Notion | Cross-team observability dashboard | Notion API |
| Obsidian | TPM brain (vault, daily notes, initiative pages) | filesystem + Dataview |
Impact (week 8 of operation)
| Metric | Week 1 | Week 8 |
|---|---|---|
| Production agents | 11 | 35 |
| Reusable skills | 15 | 42 |
| Pillars covered | 1 | 2 |
| Weekly cost | $100–150 | $35–50 |
| Reports generated/week | ~10 | 120+ |
| Manual hours saved/week | 2–3 | 12–15 |
Key design patterns
- Config-driven tenancy: Per-pillar YAML, never hardcoded constants. Adding a tenant is a file, not a fork.
- Model tiering: Opus for synthesis, Sonnet for mechanical. ~40% cost cut with no measurable quality drop on structured tasks.
- Scoped execution: Run a single pipeline phase ($1–2) instead of a full agent ($5–7) when you only need part of the output.
- Human-in-the-loop on writes: Autonomous reads, reviewed writes. Agent drafts 5 Jira comments; you post 1.
- Report-first architecture: Every agent saves Markdown locally to ~/Reports/{date}/{agent}.md. Obsidian sees them via a vault symlink; Notion gets a structured run log via shell helpers. Two systems, one bridge.
- Agents auditing agents: permissions-auditor and security-auditor review the system's own config. trace-collector ingests every dispatch (run id, duration, model, cost) into the Notion observability layer.
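The report-first pattern and the trace record can be sketched as follows. This is a hedged illustration: the `{date}/{agent}.md` layout and the four trace fields come from the list above, but the ISO date format, the `TraceRecord` field names, and the helper functions are assumptions, and the Notion upload itself is not shown.

```python
import json
from dataclasses import asdict, dataclass
from datetime import date
from pathlib import Path


@dataclass
class TraceRecord:
    """One dispatch, with the fields the trace collector is said to ingest."""
    run_id: str
    agent: str
    model: str
    duration_s: float
    cost_usd: float


def report_path(agent: str, day: date, root: Path) -> Path:
    # Assumption: {date} is rendered as ISO 8601 (YYYY-MM-DD).
    return root / day.isoformat() / f"{agent}.md"


def save_report(agent: str, day: date, markdown: str,
                root: Path = Path.home() / "Reports") -> Path:
    """Write the Markdown report locally before anything is pushed elsewhere."""
    path = report_path(agent, day, root)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(markdown, encoding="utf-8")
    return path


def trace_line(record: TraceRecord) -> str:
    """Serialize a trace as one JSON line, ready for a downstream uploader."""
    return json.dumps(asdict(record), sort_keys=True)
```

Writing the local file first means a failed upload never loses a report: the Markdown on disk stays the source of record, and the run log is derived from it.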
Evolution
Month 1 — Foundation (single pillar)
First 11 agents. Established the orchestrator pattern, MCP tool wrappers, and a plan-mode → build → test loop. Cut costs by swapping the mechanical agents to Sonnet.
Month 2 — Maturity & governance
Grew to 22 agents. Built the audit layer (permissions, security) and added cost-tier introspection. First fully autonomous days, where the fleet ran the daily cycle end-to-end.
Month 3 — Scope doubled, refactor over rebuild
Inherited the Tech Investment pillar in April. Refactored from hardcoded constants to per-pillar YAML in a weekend. Wrote new vault entries, channel inventory, and risk model for the new program. Three weeks later, the operational tempo on both pillars matched the original tempo on one.
Build log
Daily journal (Week 1)
- Day 1 — Zero to 35 Skills
- Day 2 — Building My First Three AI Agents
- Day 3 — When Sonnet Isn't Enough
- Day 4 — The Orchestrator Pattern
- Day 5 — The Day Everything Broke
- Day 6 — Scaling to Full Program Coverage
- Day 7 — Seeing What Your AI Actually Costs
- Day 8 — Cutting Costs by 40%
Weekly summaries
- Week 1 — Building the Foundation
- Week 2 — TPM AI Operating System
- Week 3 — When Agents Start Auditing Themselves