The Thinking Company

Orchestration Patterns for AI Agent Workflows

Agent orchestration patterns are the architectural blueprints that define how multiple AI agents coordinate, communicate, and sequence their work within a unified workflow. The choice of orchestration pattern determines system latency, failure behavior, cost profile, and the complexity ceiling your agentic AI architecture can handle. Six core patterns cover the vast majority of enterprise use cases: Supervisor, Sequential Pipeline, Parallel Fan-Out, Router, Hierarchical, and Evaluator-Optimizer loops. Each pattern has distinct strengths, and production systems typically combine two or three patterns within a single workflow.

The orchestration layer is where most enterprise agent projects succeed or fail. Anthropic’s analysis of 200+ enterprise agent deployments found that 57% of project failures originated in orchestration design — agents were individually capable but poorly coordinated. [Source: Anthropic, “Enterprise Agent Deployment Patterns,” 2025] The agents worked; the wiring did not. This article provides the architectural vocabulary and decision frameworks to get the wiring right, drawing from production systems we operate at The Thinking Company and patterns documented across the industry.

Pattern 1: The Supervisor

A single orchestrator agent receives the user request, creates an execution plan, delegates tasks to specialist agents, monitors their progress, and assembles the final output. The supervisor holds the overall context and makes routing decisions.

How It Works

  1. User request arrives at the Supervisor Agent.
  2. Supervisor decomposes the request into subtasks with dependencies.
  3. Supervisor dispatches each subtask to the appropriate specialist agent.
  4. Specialist agents execute and return results to the supervisor.
  5. Supervisor evaluates results, requests revisions if needed, and synthesizes the final output.

Architecture Diagram (Textual)

User Request → [Supervisor Agent]
                    ├── → [Research Agent] → findings
                    ├── → [Analysis Agent] → insights
                    └── → [Writing Agent] → draft
               [Supervisor Agent] → synthesized output → User
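The control flow above can be sketched in a few lines. This is a minimal illustration, not a production implementation: `call_model` stands in for a real LLM API call and simply echoes its inputs so the orchestration logic is runnable, and the static subtask list stands in for the supervisor's model-driven planning step.

```python
# Stub for an LLM call; a real supervisor would invoke an API here.
def call_model(role: str, prompt: str) -> str:
    return f"[{role}] handled: {prompt}"

SPECIALISTS = {
    "research": "Research Agent",
    "analysis": "Analysis Agent",
    "writing": "Writing Agent",
}

def supervise(request: str) -> str:
    # Step 2: decompose the request into subtasks (stubbed as a fixed plan;
    # a real supervisor would generate this plan with the model).
    subtasks = [("research", request), ("analysis", request), ("writing", request)]
    results = []
    for role, task in subtasks:
        # Steps 3-4: dispatch each subtask to its specialist and collect the result.
        results.append(call_model(SPECIALISTS[role], task))
    # Step 5: evaluate and synthesize specialist outputs into the final answer.
    return call_model("Supervisor", " | ".join(results))
```

In a real system each `call_model` invocation would carry its own system prompt, tool access, and model tier, and the supervisor would loop back to step 2 when intermediate results warrant replanning.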

When to Use

  • Complex workflows where task decomposition requires judgment
  • Workflows where the execution plan varies based on input characteristics
  • Systems where specialist agents need different tools and different model tiers
  • Workflows requiring dynamic replanning when intermediate results change the approach

Production Characteristics

  • Latency overhead: 15–30% above raw agent execution time
  • Token overhead: 20–40% for supervisor reasoning
  • Failure recovery: Strong — supervisor can retry or reroute failed subtasks
  • Debugging ease: High — supervisor logs show full execution trace
  • Scalability: Moderate — supervisor becomes a bottleneck at >10 parallel agents

Implementation Notes

The supervisor must be the most capable model in the system. We use Claude Opus for supervision and Claude Sonnet or Claude Haiku for specialist agents. The cost differential is offset by the supervisor’s ability to catch and correct specialist errors before they reach the user.

The supervisor’s system prompt must include: the full list of available specialists with their capabilities, explicit criteria for routing decisions, output quality standards for evaluating specialist results, and escalation rules for when specialist outputs are unacceptable.
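A sketch of what such a system prompt might contain, with all four required elements. The specialist names, routing rules, and thresholds here are illustrative, not a recommended configuration:

```python
# Hypothetical supervisor system prompt covering: specialist roster,
# explicit routing criteria, quality standards, and escalation rules.
SUPERVISOR_PROMPT = """You are the supervisor for a research workflow.

Available specialists:
- Market Research Agent: market data, sizing, growth figures
- Technical Research Agent: specifications, standards, benchmarks
- Writing Agent: drafting and formatting final deliverables

Routing criteria:
- Research tasks involving market data -> Market Research Agent
- Research tasks involving technical specifications -> Technical Research Agent
- Drafting and formatting tasks -> Writing Agent

Quality standards:
- Every factual claim must cite a source.
- Reject drafts that omit any requested section.

Escalation rules:
- If a specialist output fails the standards twice, flag the task for human review.
"""
```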

Anthropic’s multi-agent benchmark showed that supervisor-pattern systems with explicit routing criteria outperform those with implicit criteria by 31% on task completion rate. [Source: Anthropic, 2025] “Implicit criteria” means instructing the supervisor to “use the right agent” — “explicit criteria” means specifying that “research tasks involving market data go to the Market Research Agent; research tasks involving technical specifications go to the Technical Research Agent.”

Anti-Patterns

The passive supervisor: A supervisor that simply forwards the user request to a specialist without decomposition or planning. This adds latency and cost without orchestration value.

The micromanaging supervisor: A supervisor that decomposes tasks into excessively fine-grained steps, invokes specialists for trivial operations, and over-evaluates intermediate results. This multiplies latency and cost without quality benefit.

Pattern 2: Sequential Pipeline

Agents execute in a fixed sequence, with each agent’s output flowing directly to the next agent as input. No central coordinator exists — the pipeline topology is defined at design time.

How It Works

  1. Input enters Stage 1 agent.
  2. Stage 1 agent processes and passes output to Stage 2 agent.
  3. Process continues through all stages.
  4. Final stage agent produces the workflow output.

Architecture

Input → [Agent 1: Research] → [Agent 2: Analyze] → [Agent 3: Draft] → [Agent 4: Review] → [Agent 5: Format] → Output

When to Use

  • Workflows with clear, linear stage boundaries
  • Processes where each stage has a single, well-defined input/output contract
  • High-volume processing where predictable latency is important
  • Workflows where intermediate outputs have independent value (each stage produces an artifact worth inspecting)

Production Characteristics

  • Latency: Sum of all stage latencies (no parallelism)
  • Token efficiency: High — each agent receives only its stage context
  • Failure recovery: Moderate — failed stages require restart from that point
  • Debugging ease: Very high — inspect any inter-stage artifact
  • Scalability: High — add stages without affecting existing ones

Implementation Notes

Pipeline systems require strict inter-stage contracts — structured schemas defining exactly what each stage produces and what the next stage expects. Without these contracts, agents produce outputs that the next stage cannot reliably parse.

We implement pipelines using a shared state object that accumulates outputs from each stage. Each agent reads from and writes to specific sections of this state object. This makes the full pipeline state inspectable at any point and enables restart from any stage without re-executing prior stages.
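One way to sketch such a shared state object and stage restart is below. This is an assumed design for illustration, not our production code: stages write to their own section of the state, validate their inputs (per the rigid-pipeline anti-pattern), and a `start` index allows restart without re-executing prior stages.

```python
from dataclasses import dataclass, field

@dataclass
class PipelineState:
    input_text: str
    outputs: dict = field(default_factory=dict)  # stage name -> stage artifact

def research(state: PipelineState) -> None:
    # Each stage writes only to its own section of the shared state.
    state.outputs["research"] = f"findings for: {state.input_text}"

def analyze(state: PipelineState) -> None:
    # Validate the inter-stage contract before processing.
    if "research" not in state.outputs:
        raise ValueError("analyze requires research output")
    state.outputs["analysis"] = f"insights from {state.outputs['research']}"

STAGES = [research, analyze]

def run(state: PipelineState, start: int = 0) -> PipelineState:
    # `start` enables restart from any stage; prior artifacts stay in place.
    for stage in STAGES[start:]:
        stage(state)
    return state
```

Because every inter-stage artifact lives in `state.outputs`, the full pipeline state is inspectable at any point, which is what makes the very-high debugging ease above possible.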

Pipeline latency is the primary weakness. A 5-stage pipeline where each stage takes 8 seconds produces a 40-second end-to-end time. For latency-sensitive workflows, identify stages that can run in parallel and convert those segments to the fan-out pattern (Pattern 3).

McKinsey Digital’s analysis of enterprise AI pipelines found that the average production pipeline has 4–6 stages, with content generation and data processing pipelines having the most stages (5–7) and decision-support pipelines having the fewest (3–4). [Source: McKinsey Digital, “AI Pipeline Architectures,” 2025]

Anti-Patterns

The leaky pipeline: Stages that pass through all accumulated context rather than their specific output. This bloats context windows and degrades downstream agent performance. Each stage should emit a clean, scoped output — not forward everything it received plus everything it produced.

The rigid pipeline: A pipeline with no error handling between stages. When Stage 3 produces malformed output, Stage 4 attempts to process it, produces garbage, and Stage 5 produces plausible-looking garbage. Every stage must validate its inputs.

Pattern 3: Parallel Fan-Out / Fan-In

Multiple agents execute simultaneously on different aspects of the same task, and a synthesis agent combines their outputs.

How It Works

  1. A dispatcher agent (or static router) decomposes the task into independent subtasks.
  2. Multiple specialist agents execute in parallel, each handling one subtask.
  3. All agents report results to a synthesis agent.
  4. The synthesis agent combines, deduplicates, and resolves conflicts across all inputs.

Architecture

Input → [Dispatcher]
            ├── → [Agent A: Market Research] ──┐
            ├── → [Agent B: Tech Research]  ──┼── → [Synthesis Agent] → Output
            ├── → [Agent C: Competitor Scan] ──┤
            └── → [Agent D: Trend Analysis] ──┘
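The fan-out/fan-in flow above can be sketched with `asyncio`. The agents here are stubs with artificial delays; the per-agent timeout addresses the unbalanced fan-out anti-pattern discussed later, so one slow agent cannot block the whole pipeline.

```python
import asyncio

# Stub specialist agent: a real one would call a model and its tools.
async def agent(name: str, delay: float) -> str:
    await asyncio.sleep(delay)
    return f"{name}: findings"

async def fan_out(task: str, timeout: float = 1.0) -> str:
    coros = [agent(n, d) for n, d in
             [("market", 0.01), ("tech", 0.01), ("trends", 0.01)]]
    # Dispatch all agents in parallel, each capped by a timeout.
    results = await asyncio.gather(
        *(asyncio.wait_for(c, timeout) for c in coros),
        return_exceptions=True,
    )
    # Fan-in: drop failures and timeouts, then hand survivors to synthesis.
    usable = [r for r in results if isinstance(r, str)]
    return f"synthesis of {len(usable)} inputs for '{task}'"
```

With `return_exceptions=True`, a timed-out or failed agent surfaces as an exception object rather than aborting the gather, which is what gives the pattern its strong failure-recovery profile.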

When to Use

  • Research tasks requiring multiple independent information sources
  • Analysis tasks where different perspectives improve output quality
  • High-volume processing where parallelism reduces total time
  • Tasks where completeness (covering all angles) matters more than depth on any single angle

Production Characteristics

  • Latency: Determined by slowest parallel agent + synthesis time
  • Token usage: High — multiple agents process overlapping context
  • Failure recovery: Strong — one agent’s failure does not block others
  • Output quality: High — multiple perspectives reduce blind spots
  • Cost: 2–5x single-agent cost (parallelism has a price)

Implementation Notes

The synthesis agent is the critical component. It must reconcile contradictions (Agent A says market is growing, Agent B says it is shrinking), eliminate redundancy, and produce a coherent unified output. This requires a capable model with explicit synthesis instructions.

We found that providing the synthesis agent with a template — specific sections to fill with specific types of information — produces dramatically better results than open-ended “combine these inputs” instructions. Template-guided synthesis reduced output coherence issues by 62% in our research workflows.
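A minimal sketch of template-guided synthesis, assuming section names and instructions invented for illustration — the point is that the synthesis agent fills named sections with typed content rather than free-form merging:

```python
# Hypothetical synthesis template: section name -> what belongs in it.
TEMPLATE = {
    "market_overview": "summarize sizing and growth figures",
    "contradictions": "list claims where sources disagree, showing both sides",
    "recommendation": "one paragraph, grounded only in the sections above",
}

def build_synthesis_prompt(inputs: list[str]) -> str:
    # Render each template section as an explicit instruction block.
    sections = "\n".join(f"## {name}: {instr}" for name, instr in TEMPLATE.items())
    return (
        "Fill each section below from the inputs. Resolve conflicts explicitly.\n"
        + sections
        + "\n\nInputs:\n"
        + "\n---\n".join(inputs)
    )
```

The `contradictions` section is the key design choice: it forces the synthesis agent to surface disagreements between parallel agents instead of silently picking one side.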

Parallel fan-out is where agentic AI architecture most clearly outperforms sequential approaches. Google’s research team demonstrated that parallel multi-agent research produces 40% more comprehensive outputs than sequential single-agent research given the same total compute budget, because diverse starting points explore more of the information space. [Source: Google Research, “Parallel Agent Architectures,” 2025]

Anti-Patterns

Unbalanced fan-out: One parallel agent takes 30 seconds while others take 5 seconds. The system waits for the slowest agent, wasting the time savings from parallelism. Balance workloads across parallel agents or set timeouts so slow agents do not block the pipeline.

Synthesis without criteria: A synthesis agent told to “combine everything” without quality criteria, priority rankings, or conflict resolution rules produces outputs that are either bloated (everything included) or arbitrary (random selection when contradictions exist).

Pattern 4: The Router

A lightweight routing agent classifies the incoming request and directs it to the appropriate specialist agent or sub-workflow. The router does not perform the task — it only decides who should.

How It Works

  1. Request arrives at the Router Agent.
  2. Router classifies the request (by topic, complexity, required capability).
  3. Router forwards the request to the appropriate specialist.
  4. Specialist handles the request end-to-end and returns the result.

Architecture

Input → [Router Agent]
            ├── (type=simple)    → [Fast Agent]     → Output
            ├── (type=complex)   → [Deep Agent]     → Output
            ├── (type=technical) → [Code Agent]      → Output
            └── (type=creative)  → [Writing Agent]   → Output

When to Use

  • Systems handling diverse request types that require different capabilities
  • Cost optimization — routing simple requests to cheaper models and complex requests to capable models
  • Workflows where request classification is straightforward but handling is specialized

Production Characteristics

  • Routing latency: 0.5–2 seconds (lightweight classification)
  • Cost savings: 30–60% vs. routing everything to the most capable model
  • Misrouting rate: 3–8% (requires monitoring and correction)
  • System complexity: Low — easiest multi-agent pattern to implement

Implementation Notes

The router should be the smallest, fastest model that achieves acceptable routing accuracy. We use Claude Haiku for routing with a structured output format that returns the route classification and confidence score. Requests with confidence below 0.8 get routed to the most capable agent as a fallback.
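The confidence-fallback logic is the part worth pinning down in code. In this sketch the classifier is stubbed with keyword rules in place of a real model call; the labels, rules, and 0.8 threshold mirror the description above but are otherwise illustrative.

```python
FALLBACK = "deep"  # most capable agent handles anything the router is unsure about

def classify(request: str) -> tuple[str, float]:
    # Stub classifier: a real router would return a structured
    # (label, confidence) pair from a small, fast model.
    if "code" in request:
        return "code", 0.95
    if len(request) < 40:
        return "fast", 0.9
    return "deep", 0.6  # uncertain classification -> low confidence

def route(request: str, threshold: float = 0.8) -> str:
    label, confidence = classify(request)
    # Low-confidence classifications fall back to the most capable agent.
    return label if confidence >= threshold else FALLBACK
```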

Anthropic reports that enterprises using router patterns reduce their LLM inference costs by an average of 40% compared to using a single high-capability model for all requests, with less than 2% quality degradation when routing accuracy exceeds 95%. [Source: Anthropic, “Model Selection and Routing,” 2025]

The router pattern combines naturally with other patterns. Route simple requests to a single agent and complex requests to a full supervisor or pipeline workflow. This creates a cost-efficient system that scales capability with request complexity.

Anti-Patterns

The overthinking router: A router that uses a powerful model and complex reasoning for what is essentially a classification task. Routing should be fast and cheap — if the router costs as much as the specialist, the pattern provides no benefit.

The binary router: A router with only two routes (simple/complex). Most real workloads have 3–5 meaningfully different request types. Under-routing forces specialists to handle requests outside their expertise.

Pattern 5: Hierarchical Delegation

Agents form a tree structure where parent agents delegate to child agents, which may further delegate to their own children. This pattern enables recursive decomposition of complex problems.

How It Works

  1. Top-level agent receives the request and decomposes it into major workstreams.
  2. Workstream agents receive their assignments and further decompose into tasks.
  3. Task agents execute and report results to their workstream agent.
  4. Workstream agents synthesize task results and report to the top-level agent.
  5. Top-level agent produces the final output.

Architecture

[Executive Agent]
    ├── [Market Workstream Agent]
    │       ├── [Market Size Agent]
    │       ├── [Competitor Agent]
    │       └── [Trend Agent]
    ├── [Technical Workstream Agent]
    │       ├── [Architecture Agent]
    │       ├── [Security Agent]
    │       └── [Performance Agent]
    └── [Financial Workstream Agent]
            ├── [Cost Agent]
            ├── [Revenue Agent]
            └── [ROI Agent]
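Recursive delegation over a tree like the one above can be sketched as follows. The agents are stubs and the tree is a trimmed version of the diagram; the depth guard reflects the 2–3 level recommendation below.

```python
# Delegation tree: parent agent -> child agents. Leaves execute directly.
TREE = {
    "executive": ["market", "technical"],
    "market": ["market_size", "competitors"],
    "technical": ["architecture", "security"],
}

def run_agent(name: str, task: str, depth: int = 0) -> str:
    if depth >= 3:
        raise RuntimeError("hierarchies deeper than 3 levels add overhead without value")
    children = TREE.get(name, [])
    if not children:
        # Leaf agent: execute the task directly (stubbed).
        return f"{name} result"
    # Parent agent: delegate to children, then synthesize their results.
    parts = [run_agent(c, task, depth + 1) for c in children]
    return f"{name} synthesis of ({'; '.join(parts)})"
```

Note that each parent does real synthesis work on its children's outputs; a parent that merely concatenated `parts` would be the hollow-middle anti-pattern described below.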

When to Use

  • Very complex tasks requiring 8+ specialist agents
  • Projects with natural organizational decomposition (market, technical, financial)
  • Workflows where intermediate synthesis at each level adds value
  • Long-running processes (days or weeks) where hierarchical management mirrors human project management

Production Characteristics

  • Agent count: 8–20+
  • Depth: 2–3 levels (deeper creates excessive overhead)
  • Latency: High — multiple levels of delegation and synthesis
  • Quality: High for well-structured problems — each level adds coherence
  • Debugging difficulty: Moderate — requires tracing through multiple delegation levels

Implementation Notes

Hierarchical systems work best when the hierarchy maps to natural problem decomposition — market/technical/financial, or intake/processing/delivery. Artificial hierarchies (created solely to manage agent count) add overhead without value.

Each level in the hierarchy should add synthesis value. If a mid-level agent merely forwards tasks to leaf agents and concatenates results, it is overhead. Mid-level agents should resolve conflicts between their children, apply domain-specific quality criteria, and produce a coherent synthesis that is more than the sum of its parts.

Sequoia’s analysis of enterprise agent architectures found that hierarchical systems with 2 delegation levels outperform flat systems by 28% on complex tasks (those requiring >5 distinct capabilities). Adding a third level provides diminishing returns — only 7% additional improvement — while increasing latency by 40%. [Source: Sequoia, 2026]

Anti-Patterns

The deep hierarchy: More than 3 delegation levels. Each level adds latency, token overhead, and information loss. If your problem requires 4+ levels of decomposition, reconsider whether it should be one workflow or multiple separate workflows.

The hollow middle: Mid-level agents that add no synthesis value — they just dispatch tasks and collect results. Either give mid-level agents real synthesis responsibilities or flatten the hierarchy.

Pattern 6: Evaluator-Optimizer Loop

An execution agent produces an output, an evaluator agent scores it against quality criteria, and if the score is below threshold, the execution agent receives the feedback and produces a revised version. The loop continues until quality thresholds are met or a maximum iteration count is reached.

How It Works

  1. Generator Agent produces initial output.
  2. Evaluator Agent scores the output against defined criteria.
  3. If score >= threshold, output is accepted and delivered.
  4. If score < threshold, feedback is sent to Generator Agent.
  5. Generator Agent produces revised output incorporating feedback.
  6. Loop repeats (max 3–5 iterations).

Architecture

[Generator Agent] → output → [Evaluator Agent]
       ↑                              │
       └──── feedback (if below threshold) ←──┘

                              (if above threshold) → Final Output

When to Use

  • Tasks where quality is more important than speed
  • Creative or analytical work where iterative refinement improves output
  • Situations where quality criteria can be explicitly defined and scored
  • Workflows where first-pass quality is unpredictable

Production Characteristics

  • Average iterations: 1.5–2.5 (most outputs pass within 2 attempts)
  • Quality improvement per iteration: 10–25% on scored rubrics
  • Latency multiplier: 1.5–3x single-pass latency
  • Cost multiplier: 1.5–3x single-pass cost

Implementation Notes

The evaluator must use explicit, measurable criteria — not subjective judgment. We define 5–8 evaluation dimensions with numerical scores and pass/fail thresholds for each. The feedback sent to the generator must be specific: “Section 3 lacks a data point to support the market size claim; add a statistic with source” — not “improve quality.”

We cap iterations at 3 for most workflows. Anthropic’s research on iterative refinement found that 85% of the quality improvement occurs in the first 2 iterations. The third iteration adds only marginal improvement while costing as much again as the second. Beyond 3 iterations, generators tend to make lateral changes rather than genuine improvements. [Source: Anthropic, 2025]

This pattern is central to how we maintain content quality at scale. Every article in our content engine passes through an evaluator-optimizer loop before publication, scored against our 50-point quality rubric. The loop catches 73% of quality issues that would otherwise require human intervention.

Anti-Patterns

The infinite loop: No maximum iteration count. If the generator cannot satisfy the evaluator’s criteria, the loop runs indefinitely, burning tokens without progress. Always set a maximum of 3–5 iterations, with escalation to human review if the threshold is not met.

The vague evaluator: An evaluator that says “not good enough” without specifying what is wrong. This produces random revisions rather than targeted improvements. Evaluation feedback must be actionable and specific.

Combining Patterns: Hybrid Architectures

Production systems rarely use a single pattern in isolation. The most effective architectures combine patterns, using each where it provides the most value.

Common Hybrid Combinations

Router + Pipeline: Route incoming requests by type, then process each type through a type-specific pipeline. This is the most common hybrid in enterprise deployments.

Supervisor + Fan-Out: Supervisor decomposes the task, dispatches independent subtasks in parallel, then synthesizes results. This combines the supervisor’s adaptive planning with fan-out’s latency benefits.

Pipeline + Evaluator-Optimizer: Each pipeline stage includes an evaluator-optimizer loop. This produces the highest-quality outputs but at significant latency and cost.

Hierarchical + Router: Each level in the hierarchy uses a router to dispatch work to the appropriate child agent. This reduces misrouting at each level.

Pattern Selection Decision Framework

  • Linear workflow, predictable stages → Sequential Pipeline
  • Diverse request types → Router (as entry point)
  • Complex task, requires planning → Supervisor
  • Independent subtasks, speed matters → Parallel Fan-Out
  • Very complex, many capabilities needed → Hierarchical
  • Quality-critical, iterative improvement → Evaluator-Optimizer
  • Cost-sensitive, variable complexity → Router + tiered specialists
  • Research/analysis requiring breadth → Fan-Out + Synthesis

Framework Selection: Where Patterns Meet Code

Orchestration patterns need implementation frameworks. The major options as of early 2026:

LangGraph (LangChain): Graph-based orchestration with explicit state management. Best for complex, stateful workflows where you need fine-grained control over agent transitions. Production-grade but has a learning curve. Used by 43% of enterprise agent deployments. [Source: a16z, “State of AI Agents,” 2026]

CrewAI: Role-based multi-agent framework with built-in coordination. Best for team-of-agents patterns where agents have defined roles and collaborate on shared tasks. More opinionated than LangGraph, faster to prototype but less flexible.

Anthropic’s Agent SDK: Lightweight orchestration focused on tool use and handoffs. Best for supervisor-pattern systems built on Claude models. Native support for handoffs, guardrails, and tool management.

AutoGen (Microsoft): Conversational multi-agent framework. Best for debate-pattern systems and agent-to-agent dialogue. Strong integration with Azure infrastructure.

Custom orchestration: Direct API calls with application-level coordination code. Best for teams with strong engineering capability who need maximum control and minimal framework overhead. We use custom orchestration for our production content engine because none of the frameworks supported our specific state management and quality gating requirements.

Orchestration at The Thinking Company

Our production orchestration uses a hybrid of Router + Pipeline + Evaluator-Optimizer patterns. Incoming tasks are classified by complexity (fast-track, standard, deep-dive) using a Claude Haiku router. Fast-track tasks go to a single agent. Standard tasks enter a 4-stage pipeline (Research, Analyze, Draft, Review). Deep-dive tasks enter a 6-stage pipeline with an evaluator-optimizer loop on the Draft and Review stages.
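The dispatch logic described above can be sketched as a routing table from complexity tier to processing path. Everything here is stubbed for illustration — the stage functions just wrap their input so the tiered structure is visible — and does not reflect our actual production code.

```python
# Stub stage factory: each "stage" wraps its input with the stage name.
def stage(name: str):
    return lambda text: f"{name}({text})"

fast_track = [stage("answer")]  # single agent, no pipeline
standard = [stage(s) for s in ("research", "analyze", "draft", "review")]
deep_dive = [stage(s) for s in
             ("research", "analyze", "draft", "evaluate", "review", "format")]

# Router label (from the complexity classifier) -> processing path.
ROUTES = {"fast-track": fast_track, "standard": standard, "deep-dive": deep_dive}

def dispatch(label: str, request: str) -> str:
    text = request
    for step in ROUTES[label]:
        text = step(text)
    return text
```

The benefit is visible in the structure: a fast-track request touches one agent, while only deep-dive requests pay for the full six-stage path.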

This hybrid reduced our average processing cost by 45% compared to our previous approach of routing everything through the full 6-stage pipeline, while maintaining quality scores within 2% of the all-pipeline approach.

For client implementations, the pattern selection depends on the client’s AI maturity level, infrastructure, and use case. We design and deploy custom orchestration architectures through our AI Build Sprint (EUR 50–80K, 4–6 weeks) for well-scoped workflows, and through our AI Product Build (EUR 200–400K+, 3–6 months) for complex, multi-workflow agent platforms.

Frequently Asked Questions

Which orchestration pattern should I start with?

Start with the Sequential Pipeline pattern. It is the simplest to implement, debug, and monitor. Build a 3–4 stage pipeline for your most well-defined workflow, validate that it produces acceptable quality, and then optimize. Add a Router pattern if you handle diverse request types. Add Fan-Out if parallelism would reduce latency for independent subtasks. The Supervisor pattern is appropriate once you have validated that a pipeline approach is insufficient — it is more powerful but harder to debug and more expensive to operate.

How does orchestration pattern choice affect cost?

Pattern choice has a 2–5x impact on LLM inference costs. Router patterns reduce costs by 30–60% by sending simple requests to cheaper models. Pipeline patterns are cost-efficient because each agent receives scoped context. Fan-Out patterns multiply costs proportionally to agent count. Evaluator-Optimizer loops multiply costs by 1.5–3x depending on iteration frequency. Supervisor patterns add 20–40% overhead for supervisor reasoning. The most cost-efficient production systems combine a Router with Pattern-specific backends — simple requests get a single agent, complex requests get a full pipeline.

Can I change orchestration patterns after deployment?

Yes, and you should expect to. Agent architectures evolve as you accumulate production data about failure patterns, performance bottlenecks, and user needs. Design for pattern migration from the start: use standardized inter-agent contracts (structured JSON schemas), keep agents loosely coupled, and avoid baking orchestration logic into agent instructions. We have migrated multiple production systems from Pipeline to Supervisor patterns and from flat Fan-Out to Hierarchical patterns, typically requiring 1–2 weeks of engineering effort per migration.
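A minimal sketch of what a standardized inter-agent contract might look like — a JSON envelope with a fixed field set, validated at every handoff. The field names are illustrative, not a prescribed schema:

```python
import json

# Hypothetical handoff envelope: every inter-agent message must carry these.
REQUIRED_FIELDS = {"task_id", "producer", "payload", "schema_version"}

def validate_handoff(message: str) -> dict:
    data = json.loads(message)
    missing = REQUIRED_FIELDS - data.keys()
    if missing:
        # Reject malformed handoffs at the boundary, before the next agent runs.
        raise ValueError(f"handoff missing fields: {sorted(missing)}")
    return data
```

Because agents only depend on the envelope, not on each other's prompts, swapping the orchestration pattern around them becomes a wiring change rather than a rewrite.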

How do orchestration patterns interact with governance frameworks?

Governance requirements constrain pattern selection. Workflows requiring full audit trails favor Pipeline and Supervisor patterns because they produce inspectable intermediate artifacts. Workflows requiring human approval checkpoints need patterns that support synchronous gates — Pipeline stages with human-in-the-loop, or Supervisor patterns where the supervisor requests human approval before proceeding. Parallel Fan-Out patterns are harder to govern because multiple agents operate simultaneously, making real-time oversight impractical. Map governance requirements to pattern capabilities before selecting your architecture.

What latency should I expect from multi-agent orchestration?

For a Sequential Pipeline with 4 stages: 20–40 seconds end-to-end (5–10 seconds per stage). For a Supervisor with 3 parallel specialists: 15–25 seconds (supervisor planning + longest specialist + synthesis). For a Fan-Out with synthesis: 10–20 seconds (longest parallel agent + synthesis). For an Evaluator-Optimizer loop: add 50–150% to the base generation time. These figures assume standard enterprise workflows using Claude Sonnet-class models. Simpler tasks with smaller models can achieve 2–5 second per-agent latency.

How do I monitor orchestration health in production?

Monitor three levels: (1) Agent level — latency, error rate, quality scores per agent; (2) Orchestration level — routing accuracy, handoff success rates, queue depths, coordination overhead; (3) Workflow level — end-to-end completion rate, total latency, total cost, human intervention rate. The orchestration-specific metrics (routing accuracy, handoff success rates) are the ones most teams miss, and they are the most diagnostic for coordination failures. Set up alerts when routing accuracy drops below 95%, when handoff failure rates exceed 5%, or when coordination overhead (time spent on inter-agent communication vs. actual work) exceeds 25%.

Do orchestration patterns differ for real-time vs. batch workloads?

Significantly. Real-time workloads (chatbots, live customer interactions) require low-latency patterns — Router for classification, single-agent execution for most requests, and Supervisor only for complex queries. Batch workloads (report generation, data processing, content production) can use any pattern because latency tolerance is measured in minutes or hours, not seconds. Batch workloads benefit most from Pipeline and Evaluator-Optimizer patterns because the quality improvements outweigh the latency costs. Design your orchestration for the latency requirements of your specific use case, not for a generic “fast is always better” principle.