active

TPM Agent Ecosystem

35 specialized AI agents and 42 reusable skills running across two pillars (AI + Tech Investment) — daily briefings, sprint health, program monitoring, governance audits. Config-driven multi-tenancy made the scope expansion possible without a rebuild.

ai · agents · claude-code · tpm · production · mcp · multi-tenant · config-driven

A hierarchical multi-agent system built on Claude Code that handles the full spectrum of TPM daily operations across two programs at once — AI initiatives and a Tech Investment portfolio of platform/infrastructure squads. 35 agents, 42 reusable skills, 2 pillars, ~$35–50/week, 12–15 hours saved per week.

It started as a single-tenant toolkit for one program. When my scope doubled in April 2026, the ecosystem absorbed the second pillar in a single weekend refactor — because the agents were portable. That outcome is the whole story.

Architecture

One Orchestrator routes requests to specialized sub-agents based on task type and pillar. Model tiering matches cognitive load to cost — Opus for cross-source synthesis, Sonnet for structured/mechanical transformations. Each agent reads a per-pillar YAML config at startup, so the same code drives both programs.

tpm-team-lead (Orchestrator)
│  --pillar=ai | --pillar=ti
│
├── AI Pillar  (~/.tpm/pillars/ai.yml)         18 agents
│   ├── Opus    daily-briefing · eod-summary · program-monitor
│   │           portfolio-review · risk-radar
│   └── Sonnet  sprint-board · roadmap-publisher · standup-notes
│               feedback-triage · launch-readiness · …
│
├── Tech Investment Pillar  (~/.tpm/pillars/ti.yml)  17 agents
│   ├── Opus    migration-watch · ktlo-radar · dep-coordinator
│   │           decentralization-lens
│   └── Sonnet  defect-hygiene · milestone-pulse · roadmap-sync
│               squad-page-auditor · …
│
└── Governance — agents auditing agents
    permissions-auditor · security-auditor · trace-collector

The code is the same; the YAML is the only diff. A new pillar today is a new file, not a new fork.
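
The routing described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the actual orchestrator: `AGENT_TIERS`, `Dispatch`, and `route` are hypothetical names, and only a handful of the agents from the tree are listed.

```python
from dataclasses import dataclass
from pathlib import Path

# Hypothetical registry: (pillar, agent) -> model tier, copied from the tree above.
AGENT_TIERS = {
    ("ai", "daily-briefing"): "opus",
    ("ai", "risk-radar"): "opus",
    ("ai", "sprint-board"): "sonnet",
    ("ti", "migration-watch"): "opus",
    ("ti", "defect-hygiene"): "sonnet",
}


@dataclass(frozen=True)
class Dispatch:
    """Everything a sub-agent run needs: who runs, for which pillar, on which tier."""
    agent: str
    pillar: str
    model: str
    config_path: Path


def route(agent: str, pillar: str) -> Dispatch:
    """Resolve an agent and pillar to a model tier and config file, or fail loudly."""
    try:
        model = AGENT_TIERS[(pillar, agent)]
    except KeyError:
        raise ValueError(f"no agent {agent!r} registered for pillar {pillar!r}") from None
    config_path = Path.home() / ".tpm" / "pillars" / f"{pillar}.yml"
    return Dispatch(agent, pillar, model, config_path)
```

Calling `route("migration-watch", "ti")` would resolve to the Opus tier and the `ti.yml` config, while an agent that is not registered for the requested pillar raises instead of falling back to a default.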

Config-driven multi-tenancy

The single move that made the scope expansion survivable. Each agent takes a --pillar=<name> flag and reads the matching YAML from ~/.tpm/pillars/:

# pillars/ai.yml
name: AI
slack_channels: ["#ai-eng", "#ai-ops", "#ai-launch"]
jira: { project: AIPLAT, ktlo_label: ktlo-ai }
confluence: { space: AI, roadmap_page: "AI Roadmap" }

No conditionals in agent code, no environment-variable contortions, no implicit defaults. Each agent fails fast on missing config (silent overreach is worse than a hard error). Each has a validate-config mode that pre-flights a tenant before it gets cron'd.
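
A fail-fast pre-flight check along these lines might look like the sketch below. It assumes the YAML has already been parsed into a plain dict (for example with PyYAML's `yaml.safe_load`; the parser is an assumption). The required keys are inferred from the `ai.yml` example, and `ConfigError`, `REQUIRED_KEYS`, `lookup`, `validate_pillar_config`, and `require_valid` are hypothetical names.

```python
class ConfigError(Exception):
    """Raised when a pillar config is missing a required setting."""


# Dotted paths inferred from the ai.yml example; the real schema may differ.
REQUIRED_KEYS = (
    "name",
    "slack_channels",
    "jira.project",
    "jira.ktlo_label",
    "confluence.space",
    "confluence.roadmap_page",
)


def lookup(cfg: dict, dotted: str):
    """Walk a dotted path through nested dicts; return None if any segment is missing."""
    node = cfg
    for part in dotted.split("."):
        if not isinstance(node, dict) or part not in node:
            return None
        node = node[part]
    return node


def validate_pillar_config(cfg: dict) -> list[str]:
    """Return the required keys that are missing or empty (empty list means valid)."""
    return [key for key in REQUIRED_KEYS if lookup(cfg, key) in (None, "", [], {})]


def require_valid(cfg: dict) -> dict:
    """Fail fast: nothing is defaulted, so a missing key is a hard error."""
    missing = validate_pillar_config(cfg)
    if missing:
        raise ConfigError("missing config keys: " + ", ".join(missing))
    return cfg
```

In this sketch a `validate-config` mode would print the list from `validate_pillar_config` and exit non-zero if it is non-empty, while a normal run would call `require_valid` before touching any data source.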

The refactor itself was about six hours on a Saturday. The cost of not doing it the first time was the whole second weekend.

MCP & CLI integrations

Every agent connects to real data sources through Model Context Protocol (MCP) servers, direct APIs, or local CLI and filesystem access: no scraping, no screenshots.

| Integration | Data | Method |
| --- | --- | --- |
| Jira | Tickets, sprints, story points, epics | Atlassian MCP |
| Confluence | Wiki pages, agendas, sprint goals | Atlassian MCP |
| Slack | Channel activity, thread context | Slack MCP |
| Google Workspace | Calendar, Docs, Drive (Gemini meeting notes) | Google Workspace MCP |
| GitHub | PR state, deploy status, file diffs | GitHub MCP |
| Notion | Cross-team observability dashboard | Notion API |
| Obsidian | TPM brain (vault, daily notes, initiative pages) | filesystem + Dataview |

Impact (week 8 of operation)

| Metric | Week 1 | Week 8 |
| --- | --- | --- |
| Production agents | 11 | 35 |
| Reusable skills | 15 | 42 |
| Pillars covered | 1 | 2 |
| Weekly cost | $100–150 | $35–50 |
| Reports generated/week | ~10 | 120+ |
| Manual hours saved/week | 2–3 | 12–15 |
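
The derived numbers behind this table are easy to check. The sketch below uses only the figures above and treats "~10" and "120+" as exactly 10 and 120, so the week 8 cost per report is an upper bound; the function and variable names are illustrative.

```python
def pct_reduction(before: float, after: float) -> float:
    """Percentage drop from `before` to `after`."""
    return (before - after) / before * 100


# (low, high) weekly cost in USD and reports per week, from the table.
week1_cost, week8_cost = (100, 150), (35, 50)
week1_reports, week8_reports = 10, 120

# Compare low bound with low bound and high bound with high bound.
cost_cut = tuple(pct_reduction(b, a) for b, a in zip(week1_cost, week8_cost))
per_report_w1 = tuple(c / week1_reports for c in week1_cost)
per_report_w8 = tuple(c / week8_reports for c in week8_cost)

print(f"weekly cost cut: {cost_cut[0]:.0f}% to {cost_cut[1]:.0f}%")
print(f"cost per report, week 1: ${per_report_w1[0]:.2f} to ${per_report_w1[1]:.2f}")
print(f"cost per report, week 8: ${per_report_w8[0]:.2f} to ${per_report_w8[1]:.2f}")
```

On these assumptions the weekly bill falls by roughly 65 to 67 percent, while the cost per report drops from about $10 to $15 down to roughly $0.29 to $0.42.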

Key design patterns

Config-driven tenancy — Per-pillar YAML, never hardcoded constants. Adding a tenant is a file, not a fork.
Model tiering — Opus for synthesis, Sonnet for mechanical. ~40% cost cut with no measurable quality drop on structured tasks.
Scoped execution — Run a single pipeline phase ($1–2) instead of a full agent ($5–7) when you only need part of the output.
Human-in-the-loop on writes — Autonomous reads, reviewed writes. Agent drafts 5 Jira comments; you post 1.
Report-first architecture — Every agent saves Markdown locally to ~/Reports/{date}/{agent}.md. Obsidian sees them via a vault symlink; Notion gets a structured run log via shell helpers. Two systems, one bridge.
Agents auditing agents — permissions-auditor and security-auditor review the system's own config. trace-collector ingests every dispatch (run id, duration, model, cost) into the Notion observability layer.
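
The report-first and trace patterns above could be sketched as follows. This is an illustration, not the production helpers: `report_path`, `save_report`, and `Trace` are hypothetical names, the ISO date format for `{date}` is an assumption, and the `Trace` fields simply mirror the "run id, duration, model, cost" list.

```python
from dataclasses import dataclass, asdict
from datetime import date
from pathlib import Path

DEFAULT_ROOT = Path.home() / "Reports"


def report_path(agent: str, day: date, root: Path = DEFAULT_ROOT) -> Path:
    """Build <root>/<date>/<agent>.md (ISO dates assumed for the folder name)."""
    return root / day.isoformat() / f"{agent}.md"


def save_report(agent: str, markdown: str, day: date, root: Path = DEFAULT_ROOT) -> Path:
    """Write the Markdown report locally first; other systems read from this file."""
    path = report_path(agent, day, root)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(markdown, encoding="utf-8")
    return path


@dataclass(frozen=True)
class Trace:
    """One dispatch record of the kind a trace collector might ingest."""
    run_id: str
    agent: str
    model: str
    duration_s: float
    cost_usd: float
```

Because the local file is written before anything else happens, a failed sync to a downstream system leaves the report intact, and `asdict(trace)` gives a plain dict that a shell helper or API client could forward to the run log.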

Evolution

Month 1 — Foundation (single pillar)

First 11 agents. Established the orchestrator pattern, MCP tool wrappers, and a plan-mode → build → test loop. Cut costs by moving the mechanical agents to Sonnet.

Month 2 — Maturity & governance

Grew to 22. Built the audit layer (permissions, security). Cost-tier introspection added. First fully autonomous days where the fleet ran the daily cycle end-to-end.

Month 3 — Scope doubled, refactor over rebuild

Inherited the Tech Investment pillar in April. Refactored from hardcoded constants to per-pillar YAML in a weekend. Wrote new vault entries, channel inventory, and risk model for the new program. Three weeks later, the operational tempo on both pillars matched the original tempo on one.
