Prompt Engineering for Production Agents

5 min read · February 18, 2026
prompt-engineering · agents · best-practices


Writing a prompt for a chatbot is easy. Writing a system prompt for an agent that runs autonomously every day is a completely different skill.

The Core Principle

Agents need instructions, not suggestions.

This doesn't work:

"You should prefer using desk CLI for Google Workspace data."

This works:

"IMPORTANT: ALWAYS use desk CLI first for Google Workspace data. Do NOT use Glean MCP tools unless desk fails."

Sonnet in particular will interpret soft language as optional. Opus handles nuance better but still benefits from directness.

Anatomy of a Production Agent Prompt

Every agent prompt I write has these sections:

  1. Identity & Purpose — What this agent is, in one sentence
  2. Data Sources — Exactly what to read, in what order, with fallbacks
  3. Processing Steps — Numbered steps with explicit substeps
  4. Output Format — Exact template with examples
  5. Safety Rules — What NOT to do (equally important)
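The five sections above can be sketched as a single prompt template. Everything here is illustrative — the agent's task, the section wording, and the constant names are assumptions, not a required format:

```python
# Hypothetical five-section system prompt for a sprint-report agent.
# The structure mirrors the anatomy above; the content is an example.
SYSTEM_PROMPT = """\
# Identity & Purpose
You are a sprint-report agent that summarizes the active sprint for the team.

# Data Sources
1. IMPORTANT: ALWAYS use desk CLI first for Google Workspace data.
2. Do NOT use Glean MCP tools unless desk fails.

# Processing Steps
1. Pull all tickets from the active sprint.
   a. Group by status: Done, In Code Review, In Progress, Blocked.
   b. For each ticket, record key, summary, assignee, days in status.
2. Flag any ticket blocked >3 days.

# Output Format
## Sprint Report
- Done: <count> tickets
- Blocked: <key> - <summary> (<days> days)

# Safety Rules
- Do NOT modify any ticket.
- Do NOT send messages on anyone's behalf.
"""

# A cheap structural check: every section from the anatomy is present.
REQUIRED_SECTIONS = [
    "Identity & Purpose", "Data Sources", "Processing Steps",
    "Output Format", "Safety Rules",
]
missing = [s for s in REQUIRED_SECTIONS if s not in SYSTEM_PROMPT]
assert not missing
```

Keeping the section names as literal headings makes it trivial to lint every agent prompt in a repo for the same skeleton.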

Anti-Patterns

The Vague Agent

"Analyze the sprint and provide insights"

What "insights"? Insights about velocity? Quality? Team health? The agent will guess, and guess differently each time.

The Over-Specified Agent

"Read exactly 47 Jira tickets, sort by priority descending, then by created date ascending, filter to status In Progress or Code Review..."

This is a SQL query, not an agent prompt. Let the agent reason about what's relevant.

The Sweet Spot

"Pull all tickets from the active sprint. Group by status: Done (merged this sprint), In Code Review, In Progress, Blocked. For each ticket, include key, summary, assignee, and days in current status. Flag any ticket blocked >3 days."

Clear purpose, structured output, room for the agent to handle edge cases.
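Because the sweet-spot prompt specifies the output structure, you can verify a run mechanically. A minimal sketch, assuming the agent returns its report as a status-to-tickets dict (the field names and `flagged` key are assumptions, not part of the prompt above):

```python
# Check that an agent run conforms to the structure the prompt asked for.
REQUIRED_FIELDS = {"key", "summary", "assignee", "days_in_status"}
VALID_STATUSES = {"Done", "In Code Review", "In Progress", "Blocked"}

def validate_report(report: dict) -> list[str]:
    """Return a list of problems; an empty list means the output conforms."""
    problems = []
    for status, tickets in report.items():
        if status not in VALID_STATUSES:
            problems.append(f"unexpected status group: {status}")
        for t in tickets:
            missing = REQUIRED_FIELDS - t.keys()
            if missing:
                problems.append(f"{t.get('key', '?')}: missing {sorted(missing)}")
            # The prompt asks for blocked >3 days to be flagged.
            if status == "Blocked" and t.get("days_in_status", 0) > 3 and not t.get("flagged"):
                problems.append(f"{t['key']}: blocked >3 days but not flagged")
    return problems

sample = {
    "Blocked": [{"key": "ENG-42", "summary": "Flaky auth test",
                 "assignee": "sam", "days_in_status": 5, "flagged": True}],
}
assert validate_report(sample) == []
```

The validator checks only what the prompt pinned down; everything the prompt left to the agent's judgment stays unchecked.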

Testing Agent Prompts

You can't unit test prompts, but you can:

  1. Run the agent 3 times with the same input — check consistency
  2. Deliberately include edge cases in the data — see how it handles them
  3. Compare token usage across runs — high variance signals an unstable prompt
  4. Read the full agent output, not just the summary — catch silent failures
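Checks 1 and 3 are easy to automate. A sketch of a stability harness, where `run_agent` is a stand-in for your actual agent invocation (the stub here just returns a fixed report and token count):

```python
import statistics

def run_agent(prompt: str) -> tuple[str, int]:
    # Stub: returns (output, tokens_used). Replace with a real agent call.
    return ("## Sprint Report\n- Done: 4 tickets", 812)

def stability_check(prompt: str, runs: int = 3, max_rel_spread: float = 0.15):
    """Run the agent repeatedly on identical input; return
    (structurally_consistent, token_usage_stable)."""
    outputs, tokens = [], []
    for _ in range(runs):
        out, used = run_agent(prompt)
        outputs.append(out)
        tokens.append(used)
    # Consistency: every run produced the same headings.
    headings = [tuple(l for l in o.splitlines() if l.startswith("#"))
                for o in outputs]
    consistent = len(set(headings)) == 1
    # Token stability: spread relative to the mean; a wide spread
    # suggests the prompt leaves too much to chance.
    spread = (max(tokens) - min(tokens)) / statistics.mean(tokens)
    return consistent, spread <= max_rel_spread

consistent, stable = stability_check("...")
assert consistent and stable
```

The 15% spread threshold is arbitrary; tune it to your agent. The point is that both signals fall out of the same three runs.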