Orchestration Patterns for AI Agent Workflows
Agent orchestration patterns are the architectural blueprints that define how multiple AI agents coordinate, communicate, and sequence their work within a unified workflow. The choice of orchestration pattern determines system latency, failure behavior, cost profile, and the complexity ceiling your agentic AI architecture can handle. Six core patterns cover the vast majority of enterprise use cases: Supervisor, Sequential Pipeline, Parallel Fan-Out, Router, Hierarchical, and Evaluator-Optimizer loops. Each pattern has distinct strengths, and production systems typically combine two or three patterns within a single workflow.
The orchestration layer is where most enterprise agent projects succeed or fail. Anthropic’s analysis of 200+ enterprise agent deployments found that 57% of project failures originated in orchestration design — agents were individually capable but poorly coordinated. [Source: Anthropic, “Enterprise Agent Deployment Patterns,” 2025] The agents worked; the wiring did not. This article provides the architectural vocabulary and decision frameworks to get the wiring right, drawing from production systems we operate at The Thinking Company and patterns documented across the industry.
Pattern 1: The Supervisor
A single orchestrator agent receives the user request, creates an execution plan, delegates tasks to specialist agents, monitors their progress, and assembles the final output. The supervisor holds the overall context and makes routing decisions.
How It Works
- User request arrives at the Supervisor Agent.
- Supervisor decomposes the request into subtasks with dependencies.
- Supervisor dispatches each subtask to the appropriate specialist agent.
- Specialist agents execute and return results to the supervisor.
- Supervisor evaluates results, requests revisions if needed, and synthesizes the final output.
Architecture Diagram (Textual)
User Request → [Supervisor Agent]
├── → [Research Agent] → findings
├── → [Analysis Agent] → insights
└── → [Writing Agent] → draft
[Supervisor Agent] → synthesized output → User
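The supervisor loop above can be sketched in a few lines of Python. This is a minimal illustration, not a production implementation: the specialist functions are stubs standing in for real LLM-backed agents, and the fixed plan stands in for the supervisor's LLM-driven decomposition step. All names are illustrative.

```python
# Stub specialists -- in production each would be an LLM-backed agent call.
def research_agent(task: str) -> str:
    return f"findings for: {task}"

def analysis_agent(task: str) -> str:
    return f"insights for: {task}"

def writing_agent(task: str) -> str:
    return f"draft for: {task}"

SPECIALISTS = {
    "research": research_agent,
    "analysis": analysis_agent,
    "writing": writing_agent,
}

def supervisor(request: str) -> str:
    # 1. Decompose the request into (specialist, subtask) pairs.
    #    A real supervisor would plan this with a model call; here it is fixed.
    plan = [("research", request), ("analysis", request), ("writing", request)]
    results = []
    for role, subtask in plan:
        output = SPECIALISTS[role](subtask)
        # 2. Evaluate the result; retry once if it is empty/unacceptable.
        if not output:
            output = SPECIALISTS[role](subtask)
        results.append(f"[{role}] {output}")
    # 3. Synthesize specialist outputs into the final answer.
    return "\n".join(results)
```

The retry branch is where a real supervisor earns its overhead: it can reroute a failed subtask to a different specialist instead of simply retrying the same one.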
When to Use
- Complex workflows where task decomposition requires judgment
- Workflows where the execution plan varies based on input characteristics
- Systems where specialist agents need different tools and different model tiers
- Workflows requiring dynamic replanning when intermediate results change the approach
Production Characteristics
| Metric | Typical Range |
|---|---|
| Latency overhead | 15–30% above raw agent execution time |
| Token overhead | 20–40% for supervisor reasoning |
| Failure recovery | Strong — supervisor can retry or reroute failed subtasks |
| Debugging ease | High — supervisor logs show full execution trace |
| Scalability | Moderate — supervisor becomes bottleneck at >10 parallel agents |
Implementation Notes
The supervisor must be the most capable model in the system. We use Claude Opus for supervision and Claude Sonnet or Claude Haiku for specialist agents. The cost differential is offset by the supervisor’s ability to catch and correct specialist errors before they reach the user.
The supervisor’s system prompt must include: the full list of available specialists with their capabilities, explicit criteria for routing decisions, output quality standards for evaluating specialist results, and escalation rules for when specialist outputs are unacceptable.
Anthropic’s multi-agent benchmark showed that supervisor-pattern systems with explicit routing criteria outperform those with implicit criteria by 31% on task completion rate. [Source: Anthropic, 2025] “Implicit criteria” means instructing the supervisor to “use the right agent” — “explicit criteria” means specifying that “research tasks involving market data go to the Market Research Agent; research tasks involving technical specifications go to the Technical Research Agent.”
Anti-Patterns
The passive supervisor: A supervisor that simply forwards the user request to a specialist without decomposition or planning. This adds latency and cost without orchestration value.
The micromanaging supervisor: A supervisor that decomposes tasks into excessively fine-grained steps, invokes specialists for trivial operations, and over-evaluates intermediate results. This multiplies latency and cost without quality benefit.
Pattern 2: Sequential Pipeline
Agents execute in a fixed sequence, with each agent’s output flowing directly to the next agent as input. No central coordinator exists — the pipeline topology is defined at design time.
How It Works
- Input enters Stage 1 agent.
- Stage 1 agent processes and passes output to Stage 2 agent.
- Process continues through all stages.
- Final stage agent produces the workflow output.
Architecture
Input → [Agent 1: Research] → [Agent 2: Analyze] → [Agent 3: Draft] → [Agent 4: Review] → [Agent 5: Format] → Output
When to Use
- Workflows with clear, linear stage boundaries
- Processes where each stage has a single, well-defined input/output contract
- High-volume processing where predictable latency is important
- Workflows where intermediate outputs have independent value (each stage produces an artifact worth inspecting)
Production Characteristics
| Metric | Typical Range |
|---|---|
| Latency | Sum of all stage latencies (no parallelism) |
| Token efficiency | High — each agent receives only its stage context |
| Failure recovery | Moderate — failed stages require restart from that point |
| Debugging ease | Very high — inspect any inter-stage artifact |
| Scalability | High — add stages without affecting existing ones |
Implementation Notes
Pipeline systems require strict inter-stage contracts — structured schemas defining exactly what each stage produces and what the next stage expects. Without these contracts, agents produce outputs that the next stage cannot reliably parse.
We implement pipelines using a shared state object that accumulates outputs from each stage. Each agent reads from and writes to specific sections of this state object. This makes the full pipeline state inspectable at any point and enables restart from any stage without re-executing prior stages.
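A shared state object of the kind described above can be sketched with a dataclass. The stage functions here are stubs for agent calls, and the field names are illustrative; the point is that each stage reads and writes only its own sections, and the `start` parameter enables restart from any stage.

```python
from dataclasses import dataclass

@dataclass
class PipelineState:
    topic: str
    research: str = ""   # written by the research stage
    analysis: str = ""   # written by the analyze stage
    draft: str = ""      # written by the draft stage

def research_stage(state: PipelineState) -> PipelineState:
    state.research = f"notes on {state.topic}"
    return state

def analyze_stage(state: PipelineState) -> PipelineState:
    # Each stage reads only the sections it needs, not the full history.
    state.analysis = f"analysis of: {state.research}"
    return state

def draft_stage(state: PipelineState) -> PipelineState:
    state.draft = f"draft using {state.analysis}"
    return state

STAGES = [research_stage, analyze_stage, draft_stage]

def run_pipeline(state: PipelineState, start: int = 0) -> PipelineState:
    # `start` allows restarting from a failed stage without
    # re-executing the stages before it.
    for stage in STAGES[start:]:
        state = stage(state)
    return state
```

Because the full state is a plain object, it can be serialized between stages, which is what makes the pipeline inspectable and restartable at any point.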
Pipeline latency is the primary weakness. A 5-stage pipeline where each stage takes 8 seconds produces a 40-second end-to-end time. For latency-sensitive workflows, identify stages that can run in parallel and convert those segments to the fan-out pattern (Pattern 3).
McKinsey Digital’s analysis of enterprise AI pipelines found that the average production pipeline has 4–6 stages, with content generation and data processing pipelines having the most stages (5–7) and decision-support pipelines having the fewest (3–4). [Source: McKinsey Digital, “AI Pipeline Architectures,” 2025]
Anti-Patterns
The leaky pipeline: Stages that pass through all accumulated context rather than their specific output. This bloats context windows and degrades downstream agent performance. Each stage should emit a clean, scoped output — not forward everything it received plus everything it produced.
The rigid pipeline: A pipeline with no error handling between stages. When Stage 3 produces malformed output, Stage 4 attempts to process it, produces garbage, and Stage 5 produces plausible-looking garbage. Every stage must validate its inputs.
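The input validation that the rigid-pipeline anti-pattern calls for can be as simple as a contract check at the top of each stage. A minimal sketch, with illustrative field names:

```python
# Fields this (hypothetical) stage requires from its upstream neighbor.
REQUIRED_FIELDS = {"analysis", "sources"}

def validate_stage_input(payload: dict) -> dict:
    """Fail fast on malformed upstream output instead of producing garbage."""
    missing = REQUIRED_FIELDS - payload.keys()
    if missing:
        raise ValueError(f"malformed stage input, missing: {sorted(missing)}")
    return payload
```

Failing fast at the stage boundary localizes the error to the stage that produced it, rather than letting plausible-looking garbage propagate to the end of the pipeline.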
Pattern 3: Parallel Fan-Out / Fan-In
Multiple agents execute simultaneously on different aspects of the same task, and a synthesis agent combines their outputs.
How It Works
- A dispatcher agent (or static router) decomposes the task into independent subtasks.
- Multiple specialist agents execute in parallel, each handling one subtask.
- All agents report results to a synthesis agent.
- The synthesis agent combines, deduplicates, and resolves conflicts across all inputs.
Architecture
Input → [Dispatcher]
├── → [Agent A: Market Research] ──┐
├── → [Agent B: Tech Research] ──┼── → [Synthesis Agent] → Output
├── → [Agent C: Competitor Scan] ──┤
└── → [Agent D: Trend Analysis] ──┘
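The fan-out/fan-in flow maps naturally onto `asyncio`. In this sketch the agents are stub coroutines with illustrative names and delays; the per-agent timeout addresses the unbalanced fan-out problem discussed below, letting a slow agent drop out rather than block the fan-in.

```python
import asyncio

# Stub agents -- real ones would await LLM/tool calls.
async def market_agent(task: str) -> str:
    await asyncio.sleep(0.01)
    return f"market view of {task}"

async def tech_agent(task: str) -> str:
    await asyncio.sleep(0.01)
    return f"tech view of {task}"

async def run_with_timeout(agent, task: str, timeout: float = 1.0):
    # A slow or hung agent times out instead of stalling the whole fan-in.
    try:
        return await asyncio.wait_for(agent(task), timeout)
    except asyncio.TimeoutError:
        return None

async def fan_out(task: str) -> str:
    results = await asyncio.gather(
        run_with_timeout(market_agent, task),
        run_with_timeout(tech_agent, task),
    )
    # Fan-in: the synthesis step skips agents that failed or timed out.
    return " | ".join(r for r in results if r is not None)

print(asyncio.run(fan_out("EV charging")))
```

Total latency is determined by the slowest surviving agent plus synthesis, matching the table above.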
When to Use
- Research tasks requiring multiple independent information sources
- Analysis tasks where different perspectives improve output quality
- High-volume processing where parallelism reduces total time
- Tasks where completeness (covering all angles) matters more than depth on any single angle
Production Characteristics
| Metric | Typical Range |
|---|---|
| Latency | Determined by slowest parallel agent + synthesis time |
| Token usage | High — multiple agents process overlapping context |
| Failure recovery | Strong — one agent’s failure does not block others |
| Output quality | High — multiple perspectives reduce blind spots |
| Cost | 2–5x single-agent cost (parallelism has a price) |
Implementation Notes
The synthesis agent is the critical component. It must reconcile contradictions (Agent A says the market is growing, Agent B says it is shrinking), eliminate redundancy, and produce a coherent unified output. This requires a capable model with explicit synthesis instructions.
We found that providing the synthesis agent with a template — specific sections to fill with specific types of information — produces dramatically better results than open-ended “combine these inputs” instructions. Template-guided synthesis reduced output coherence issues by 62% in our research workflows.
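Template-guided synthesis can be as lightweight as a named-section format string handed to the synthesis agent. The sections and example content below are illustrative, not our actual production template:

```python
# Hypothetical synthesis template: named sections instead of an
# open-ended "combine these inputs" instruction.
SYNTHESIS_TEMPLATE = """\
## Key Findings
{findings}

## Contradictions and Resolutions
{conflicts}

## Recommendation
{recommendation}
"""

report = SYNTHESIS_TEMPLATE.format(
    findings="- Market growing 12% YoY (Agent A)",
    conflicts="- Agent B reported contraction; resolved in favor of A's newer data",
    recommendation="- Enter the market in Q3.",
)
```

In practice the template is embedded in the synthesis agent's prompt, and each section carries instructions about what type of information belongs there and how to resolve conflicts.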
Parallel fan-out is where agentic AI architecture most clearly outperforms sequential approaches. Google’s research team demonstrated that parallel multi-agent research produces 40% more comprehensive outputs than sequential single-agent research given the same total compute budget, because diverse starting points explore more of the information space. [Source: Google Research, “Parallel Agent Architectures,” 2025]
Anti-Patterns
Unbalanced fan-out: One parallel agent takes 30 seconds while others take 5 seconds. The system waits for the slowest agent, wasting the time savings from parallelism. Balance workloads across parallel agents or set timeouts so slow agents do not block the pipeline.
Synthesis without criteria: A synthesis agent told to “combine everything” without quality criteria, priority rankings, or conflict resolution rules produces outputs that are either bloated (everything included) or arbitrary (random selection when contradictions exist).
Pattern 4: The Router
A lightweight routing agent classifies the incoming request and directs it to the appropriate specialist agent or sub-workflow. The router does not perform the task — it only decides who should.
How It Works
- Request arrives at the Router Agent.
- Router classifies the request (by topic, complexity, required capability).
- Router forwards the request to the appropriate specialist.
- Specialist handles the request end-to-end and returns the result.
Architecture
Input → [Router Agent]
├── (type=simple) → [Fast Agent] → Output
├── (type=complex) → [Deep Agent] → Output
├── (type=technical) → [Code Agent] → Output
└── (type=creative) → [Writing Agent] → Output
When to Use
- Systems handling diverse request types that require different capabilities
- Cost optimization — routing simple requests to cheaper models and complex requests to capable models
- Workflows where request classification is straightforward but handling is specialized
Production Characteristics
| Metric | Typical Range |
|---|---|
| Routing latency | 0.5–2 seconds (lightweight classification) |
| Cost savings | 30–60% vs. routing everything to the most capable model |
| Misrouting rate | 3–8% (requires monitoring and correction) |
| System complexity | Low — easiest multi-agent pattern to implement |
Implementation Notes
The router should be the smallest, fastest model that achieves acceptable routing accuracy. We use Claude Haiku for routing with a structured output format that returns the route classification and confidence score. Requests with confidence below 0.8 get routed to the most capable agent as a fallback.
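The confidence-fallback logic above can be sketched as follows. The keyword classifier is a stub standing in for a small-model call returning structured output; route names and keywords are illustrative.

```python
# Illustrative route table; FALLBACK is the most capable agent.
ROUTES = {"technical": "code_agent", "creative": "writing_agent"}
FALLBACK = "deep_agent"

def classify(request: str) -> tuple[str, float]:
    # Stub classifier -- a real router would return a structured
    # (label, confidence) pair from a small, fast model.
    if "bug" in request or "stack trace" in request:
        return "technical", 0.95
    if "blog post" in request:
        return "creative", 0.90
    return "creative", 0.50  # low confidence

def route(request: str, threshold: float = 0.8) -> str:
    label, confidence = classify(request)
    if confidence < threshold:
        # Below-threshold requests go to the capable agent rather
        # than risking a misroute to a narrow specialist.
        return FALLBACK
    return ROUTES[label]
```

The fallback trades some cost for safety: an occasional over-powered response is cheaper than a misrouted one.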
Anthropic reports that enterprises using router patterns reduce their LLM inference costs by an average of 40% compared to using a single high-capability model for all requests, with less than 2% quality degradation when routing accuracy exceeds 95%. [Source: Anthropic, “Model Selection and Routing,” 2025]
The router pattern combines naturally with other patterns. Route simple requests to a single agent and complex requests to a full supervisor or pipeline workflow. This creates a cost-efficient system that scales capability with request complexity.
Anti-Patterns
The overthinking router: A router that uses a powerful model and complex reasoning for what is essentially a classification task. Routing should be fast and cheap — if the router costs as much as the specialist, the pattern provides no benefit.
The binary router: A router with only two routes (simple/complex). Most real workloads have 3–5 meaningfully different request types. Under-routing forces specialists to handle requests outside their expertise.
Pattern 5: Hierarchical Delegation
Agents form a tree structure where parent agents delegate to child agents, which may further delegate to their own children. This pattern enables recursive decomposition of complex problems.
How It Works
- Top-level agent receives the request and decomposes it into major workstreams.
- Workstream agents receive their assignments and further decompose into tasks.
- Task agents execute and report results to their workstream agent.
- Workstream agents synthesize task results and report to the top-level agent.
- Top-level agent produces the final output.
Architecture
[Executive Agent]
├── [Market Workstream Agent]
│ ├── [Market Size Agent]
│ ├── [Competitor Agent]
│ └── [Trend Agent]
├── [Technical Workstream Agent]
│ ├── [Architecture Agent]
│ ├── [Security Agent]
│ └── [Performance Agent]
└── [Financial Workstream Agent]
├── [Cost Agent]
├── [Revenue Agent]
└── [ROI Agent]
When to Use
- Very complex tasks requiring 8+ specialist agents
- Projects with natural organizational decomposition (market, technical, financial)
- Workflows where intermediate synthesis at each level adds value
- Long-running processes (days or weeks) where hierarchical management mirrors human project management
Production Characteristics
| Metric | Typical Range |
|---|---|
| Agent count | 8–20+ |
| Depth | 2–3 levels (deeper creates excessive overhead) |
| Latency | High — multiple levels of delegation and synthesis |
| Quality | High for well-structured problems — each level adds coherence |
| Debugging difficulty | Moderate — requires tracing through multiple delegation levels |
Implementation Notes
Hierarchical systems work best when the hierarchy maps to natural problem decomposition — market/technical/financial, or intake/processing/delivery. Artificial hierarchies (created solely to manage agent count) add overhead without value.
Each level in the hierarchy should add synthesis value. If a mid-level agent merely forwards tasks to leaf agents and concatenates results, it is overhead. Mid-level agents should resolve conflicts between their children, apply domain-specific quality criteria, and produce a coherent synthesis that is more than the sum of its parts.
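Two-level delegation with per-level synthesis can be sketched as below. The tree mirrors the diagram above; agents are stubs, and the "synthesis" here is deliberately trivial — the point is the shape, where each mid-level agent returns something other than a raw concatenation of its children's outputs.

```python
# Illustrative workstream -> task tree, mirroring the diagram above.
TREE = {
    "market": ["market_size", "competitors"],
    "technical": ["architecture", "security"],
}

def task_agent(task: str) -> str:
    # Leaf agent stub -- would be an LLM-backed specialist.
    return f"result:{task}"

def workstream_agent(name: str, tasks: list[str]) -> str:
    results = [task_agent(t) for t in tasks]
    # Mid-level synthesis: resolve conflicts between children and apply
    # domain criteria, not just forward results upward.
    return f"{name} summary ({len(results)} tasks)"

def executive_agent() -> str:
    summaries = [workstream_agent(ws, tasks) for ws, tasks in TREE.items()]
    return " / ".join(summaries)
```

If `workstream_agent` did nothing but concatenate its children's strings, it would be the hollow-middle anti-pattern described below, and the hierarchy should be flattened.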
Sequoia’s analysis of enterprise agent architectures found that hierarchical systems with 2 delegation levels outperform flat systems by 28% on complex tasks (those requiring >5 distinct capabilities). Adding a third level provides diminishing returns — only 7% additional improvement — while increasing latency by 40%. [Source: Sequoia, 2026]
Anti-Patterns
The deep hierarchy: More than 3 delegation levels. Each level adds latency, token overhead, and information loss. If your problem requires 4+ levels of decomposition, reconsider whether it should be one workflow or multiple separate workflows.
The hollow middle: Mid-level agents that add no synthesis value — they just dispatch tasks and collect results. Either give mid-level agents real synthesis responsibilities or flatten the hierarchy.
Pattern 6: Evaluator-Optimizer Loop
An execution agent produces an output, an evaluator agent scores it against quality criteria, and if the score is below threshold, the execution agent receives the feedback and produces a revised version. The loop continues until quality thresholds are met or a maximum iteration count is reached.
How It Works
- Generator Agent produces initial output.
- Evaluator Agent scores the output against defined criteria.
- If score >= threshold, output is accepted and delivered.
- If score < threshold, feedback is sent to Generator Agent.
- Generator Agent produces revised output incorporating feedback.
- Loop repeats (max 3–5 iterations).
Architecture
[Generator Agent] → output → [Evaluator Agent]
        ↑                            │
        └──── feedback (if below threshold) ──┘
                                     │
                      (if above threshold) → Final Output
When to Use
- Tasks where quality is more important than speed
- Creative or analytical work where iterative refinement improves output
- Situations where quality criteria can be explicitly defined and scored
- Workflows where first-pass quality is unpredictable
Production Characteristics
| Metric | Typical Range |
|---|---|
| Average iterations | 1.5–2.5 (most outputs pass within 2 attempts) |
| Quality improvement per iteration | 10–25% on scored rubrics |
| Latency multiplier | 1.5–3x single-pass latency |
| Cost multiplier | 1.5–3x single-pass cost |
Implementation Notes
The evaluator must use explicit, measurable criteria — not subjective judgment. We define 5–8 evaluation dimensions with numerical scores and pass/fail thresholds for each. The feedback sent to the generator must be specific: “Section 3 lacks a data point to support the market size claim; add a statistic with source” — not “improve quality.”
We cap iterations at 3 for most workflows. Anthropic’s research on iterative refinement found that 85% of the quality improvement occurs in the first 2 iterations. The third iteration adds only marginal improvement while adding another full iteration of compute cost. Beyond 3 iterations, generators tend to make lateral changes rather than genuine improvements. [Source: Anthropic, 2025]
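The loop with a hard iteration cap and human escalation can be sketched as follows. The generator and evaluator are stubs whose scores improve mechanically with each revision; in production both would be LLM calls, and the threshold and cap are the tunable parameters.

```python
MAX_ITERATIONS = 3
THRESHOLD = 0.8

def generate(revision: int) -> str:
    # Stub generator: output "improves" with each revision round.
    return "draft" + " +rev" * revision

def evaluate(output: str) -> tuple[float, str]:
    # Stub evaluator: score rises with revision count. Real feedback must
    # be specific and actionable, as described above.
    score = min(0.5 + 0.2 * output.count("+rev"), 1.0)
    feedback = "" if score >= THRESHOLD else "add a sourced statistic to section 3"
    return score, feedback

def refine() -> tuple[str, int, bool]:
    output = ""
    for i in range(MAX_ITERATIONS):
        output = generate(i)
        score, feedback = evaluate(output)
        if score >= THRESHOLD:
            return output, i + 1, True   # accepted within the cap
    return output, MAX_ITERATIONS, False  # cap hit: escalate to human review
```

The `False` return path is the escalation hook — it is what prevents the infinite-loop anti-pattern described below.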
This pattern is central to how we maintain content quality at scale. Every article in our content engine passes through an evaluator-optimizer loop before publication, scored against our 50-point quality rubric. The loop catches 73% of quality issues that would otherwise require human intervention.
Anti-Patterns
The infinite loop: No maximum iteration count. If the generator cannot satisfy the evaluator’s criteria, the loop runs indefinitely, burning tokens without progress. Always set a maximum of 3–5 iterations, with escalation to human review if the threshold is not met.
The vague evaluator: An evaluator that says “not good enough” without specifying what is wrong. This produces random revisions rather than targeted improvements. Evaluation feedback must be actionable and specific.
Combining Patterns: Hybrid Architectures
Production systems rarely use a single pattern in isolation. The most effective architectures combine patterns, using each where it provides the most value.
Common Hybrid Combinations
Router + Pipeline: Route incoming requests by type, then process each type through a type-specific pipeline. This is the most common hybrid in enterprise deployments.
Supervisor + Fan-Out: Supervisor decomposes the task, dispatches independent subtasks in parallel, then synthesizes results. This combines the supervisor’s adaptive planning with fan-out’s latency benefits.
Pipeline + Evaluator-Optimizer: Each pipeline stage includes an evaluator-optimizer loop. This produces the highest-quality outputs but at significant latency and cost.
Hierarchical + Router: Each level in the hierarchy uses a router to dispatch work to the appropriate child agent. This reduces misrouting at each level.
Pattern Selection Decision Framework
| Factor | Recommended Pattern |
|---|---|
| Linear workflow, predictable stages | Sequential Pipeline |
| Diverse request types | Router (as entry point) |
| Complex task, requires planning | Supervisor |
| Independent subtasks, speed matters | Parallel Fan-Out |
| Very complex, many capabilities needed | Hierarchical |
| Quality-critical, iterative improvement | Evaluator-Optimizer |
| Cost-sensitive, variable complexity | Router + tiered specialists |
| Research/analysis requiring breadth | Fan-Out + Synthesis |
Framework Selection: Where Patterns Meet Code
Orchestration patterns need implementation frameworks. The major options as of early 2026:
LangGraph (LangChain): Graph-based orchestration with explicit state management. Best for complex, stateful workflows where you need fine-grained control over agent transitions. Production-grade but has a learning curve. Used by 43% of enterprise agent deployments. [Source: a16z, “State of AI Agents,” 2026]
CrewAI: Role-based multi-agent framework with built-in coordination. Best for team-of-agents patterns where agents have defined roles and collaborate on shared tasks. More opinionated than LangGraph, faster to prototype but less flexible.
Anthropic’s Agent SDK: Lightweight orchestration focused on tool use and agent handoffs. Best for supervisor-pattern systems built on Claude models, with native support for handoffs, guardrails, and tool management.
AutoGen (Microsoft): Conversational multi-agent framework. Best for debate-pattern systems and agent-to-agent dialogue. Strong integration with Azure infrastructure.
Custom orchestration: Direct API calls with application-level coordination code. Best for teams with strong engineering capability who need maximum control and minimal framework overhead. We use custom orchestration for our production content engine because none of the frameworks supported our specific state management and quality gating requirements.
Orchestration at The Thinking Company
Our production orchestration uses a hybrid of Router + Pipeline + Evaluator-Optimizer patterns. Incoming tasks are classified by complexity (fast-track, standard, deep-dive) using a Claude Haiku router. Fast-track tasks go to a single agent. Standard tasks enter a 4-stage pipeline (Research, Analyze, Draft, Review). Deep-dive tasks enter a 6-stage pipeline with an evaluator-optimizer loop on the Draft and Review stages.
This hybrid reduced our average processing cost by 45% compared to our previous approach of routing everything through the full 6-stage pipeline, while maintaining quality scores within 2% of the all-pipeline approach.
For client implementations, the pattern selection depends on the client’s AI maturity level, infrastructure, and use case. We design and deploy custom orchestration architectures through our AI Build Sprint (EUR 50–80K, 4–6 weeks) for well-scoped workflows, and through our AI Product Build (EUR 200–400K+, 3–6 months) for complex, multi-workflow agent platforms.
Frequently Asked Questions
Which orchestration pattern should I start with?
Start with the Sequential Pipeline pattern. It is the simplest to implement, debug, and monitor. Build a 3–4 stage pipeline for your most well-defined workflow, validate that it produces acceptable quality, and then optimize. Add a Router pattern if you handle diverse request types. Add Fan-Out if parallelism would reduce latency for independent subtasks. The Supervisor pattern is appropriate once you have validated that a pipeline approach is insufficient — it is more powerful but harder to debug and more expensive to operate.
How does orchestration pattern choice affect cost?
Pattern choice has a 2–5x impact on LLM inference costs. Router patterns reduce costs by 30–60% by sending simple requests to cheaper models. Pipeline patterns are cost-efficient because each agent receives scoped context. Fan-Out patterns multiply costs proportionally to agent count. Evaluator-Optimizer loops multiply costs by 1.5–3x depending on iteration frequency. Supervisor patterns add 20–40% overhead for supervisor reasoning. The most cost-efficient production systems combine a Router with pattern-specific backends — simple requests get a single agent, complex requests get a full pipeline.
Can I change orchestration patterns after deployment?
Yes, and you should expect to. Agent architectures evolve as you accumulate production data about failure patterns, performance bottlenecks, and user needs. Design for pattern migration from the start: use standardized inter-agent contracts (structured JSON schemas), keep agents loosely coupled, and avoid baking orchestration logic into agent instructions. We have migrated multiple production systems from Pipeline to Supervisor patterns and from flat Fan-Out to Hierarchical patterns, typically requiring 1–2 weeks of engineering effort per migration.
How do orchestration patterns interact with governance frameworks?
Governance requirements constrain pattern selection. Workflows requiring full audit trails favor Pipeline and Supervisor patterns because they produce inspectable intermediate artifacts. Workflows requiring human approval checkpoints need patterns that support synchronous gates — Pipeline stages with human-in-the-loop, or Supervisor patterns where the supervisor requests human approval before proceeding. Parallel Fan-Out patterns are harder to govern because multiple agents operate simultaneously, making real-time oversight impractical. Map governance requirements to pattern capabilities before selecting your architecture.
What latency should I expect from multi-agent orchestration?
For a Sequential Pipeline with 4 stages: 20–40 seconds end-to-end (5–10 seconds per stage). For a Supervisor with 3 parallel specialists: 15–25 seconds (supervisor planning + longest specialist + synthesis). For a Fan-Out with synthesis: 10–20 seconds (longest parallel agent + synthesis). For an Evaluator-Optimizer loop: add 50–150% to the base generation time. These figures assume standard enterprise workflows using Claude Sonnet-class models. Simpler tasks with smaller models can achieve 2–5 second per-agent latency.
How do I monitor orchestration health in production?
Monitor three levels: (1) Agent level — latency, error rate, quality scores per agent; (2) Orchestration level — routing accuracy, handoff success rates, queue depths, coordination overhead; (3) Workflow level — end-to-end completion rate, total latency, total cost, human intervention rate. The orchestration-specific metrics (routing accuracy, handoff success rates) are the ones most teams miss, and they are the most diagnostic for coordination failures. Set up alerts when routing accuracy drops below 95%, when handoff failure rates exceed 5%, or when coordination overhead (time spent on inter-agent communication vs. actual work) exceeds 25%.
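The alert thresholds above can be wired into a simple health check. This sketch assumes metrics are already collected into a dict; the metric names are illustrative.

```python
def orchestration_alerts(metrics: dict) -> list[str]:
    """Return alert messages for the orchestration-level thresholds
    described above (routing accuracy, handoffs, coordination overhead)."""
    alerts = []
    if metrics["routing_accuracy"] < 0.95:
        alerts.append("routing accuracy below 95%")
    if metrics["handoff_failure_rate"] > 0.05:
        alerts.append("handoff failure rate above 5%")
    if metrics["coordination_overhead"] > 0.25:
        alerts.append("coordination overhead above 25%")
    return alerts
```

In production these checks would run on a rolling window of workflow traces and feed an alerting system, but the thresholds themselves are this simple.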
Do orchestration patterns differ for real-time vs. batch workloads?
Significantly. Real-time workloads (chatbots, live customer interactions) require low-latency patterns — Router for classification, single-agent execution for most requests, and Supervisor only for complex queries. Batch workloads (report generation, data processing, content production) can use any pattern because latency tolerance is measured in minutes or hours, not seconds. Batch workloads benefit most from Pipeline and Evaluator-Optimizer patterns because the quality improvements outweigh the latency costs. Design your orchestration for the latency requirements of your specific use case, not for a generic “fast is always better” principle.