CapabilityAtlas
Architecture & Systems Fundamentals

Orchestration

Coordinating multi-step LLM workflows: sequencing, parallelization, dependency and state management.

Orchestration — Competence

What an interviewer or hiring manager expects you to know.

Core Knowledge

  • What orchestration means for LLMs. Coordinating multi-step LLM workflows: sequencing (step B depends on step A’s output), parallelization (steps B and C can run simultaneously), branching (different steps based on classification results), error recovery (what happens when step 3 of 7 fails), and state management (carrying context across steps). This is the LLM equivalent of workflow orchestration in distributed systems — same patterns (DAGs, state machines, queues), new constraints (non-deterministic outputs, token limits, cost accumulation).
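The DAG framing above can be made concrete with Python's stdlib `graphlib`; the step names and dependencies here are hypothetical, but the sketch shows how a topological sort both orders steps and surfaces which ones can run in parallel.

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: B and C each depend only on A, D combines both.
# The mapping is node -> set of predecessors.
deps = {
    "B": {"A"},        # step B needs step A's output
    "C": {"A"},        # step C also needs only A, so B and C can run concurrently
    "D": {"B", "C"},   # step D combines the outputs of B and C
}

ts = TopologicalSorter(deps)
ts.prepare()
order = []
while ts.is_active():
    ready = list(ts.get_ready())   # steps whose dependencies are all satisfied
    order.append(sorted(ready))    # everything in `ready` could run in parallel
    ts.done(*ready)

print(order)  # [['A'], ['B', 'C'], ['D']]
```

Each inner list is one "wave" of execution: the second wave is exactly the parallelization opportunity the bullet describes.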

  • Orchestration frameworks. LangChain LCEL (LangChain Expression Language — composable chains with pipe operator, supports streaming, batch, async, and fallback; good for linear/branching workflows), LangGraph (graph-based orchestration for stateful, cyclic workflows — the agent framework built on LangChain; supports checkpointing, human-in-the-loop nodes, and branching), LlamaIndex Workflows (event-driven orchestration with @step decorators, async-first, streaming), CrewAI (role-based multi-agent orchestration — define agents with roles/goals/tools, coordinate via tasks), AutoGen (Microsoft’s multi-agent conversation framework — agents communicate via message passing), Temporal.io (durable workflow engine — not LLM-specific but excellent for long-running AI workflows with human checkpoints and retry guarantees), Prefect/Airflow (traditional pipeline orchestrators adapted for LLM workflows — better for batch than real-time).

  • Common orchestration patterns. Sequential chain (A → B → C — simplest, e.g., extract → validate → format). Map-reduce (split input into chunks, process in parallel, combine results — for large document processing). Router (classify input, route to specialized handler — e.g., question type → appropriate expert prompt). Evaluator-optimizer loop (generate → evaluate → regenerate if below threshold — for quality-critical outputs). Plan-then-execute (LLM generates a plan, then executes steps one at a time — the basis of most agent patterns). Human-in-the-loop (automated steps interspersed with human review checkpoints — Skill 17).
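One of the patterns above, the evaluator-optimizer loop, can be sketched without any framework. `generate` and `evaluate` are stand-ins for real LLM calls (the toy implementations below are purely illustrative); the loop structure is what matters.

```python
from typing import Callable

def evaluator_optimizer(
    generate: Callable[[str, str], str],   # (task, feedback) -> draft; stands in for an LLM call
    evaluate: Callable[[str], float],      # draft -> quality score in [0, 1]
    task: str,
    threshold: float = 0.8,
    max_rounds: int = 3,
) -> str:
    """Generate -> evaluate -> regenerate until the score clears the threshold."""
    feedback = ""
    draft = ""
    for _ in range(max_rounds):
        draft = generate(task, feedback)
        score = evaluate(draft)
        if score >= threshold:
            return draft
        feedback = f"previous draft scored {score:.2f}; improve it"
    return draft  # best effort after max_rounds

# Toy stand-ins: the "model" produces a better draft each round.
drafts = iter(["rough draft", "better draft", "polished draft"])
result = evaluator_optimizer(
    generate=lambda task, fb: next(drafts),
    evaluate=lambda d: {"rough draft": 0.4, "better draft": 0.6, "polished draft": 0.9}[d],
    task="summarize the report",
)
print(result)  # polished draft
```

The `max_rounds` cap is essential in practice: without it, a miscalibrated evaluator turns the loop into unbounded token spend.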

  • State management across steps. Each step in an orchestration receives context from previous steps and passes context to the next. Challenges: token budget grows with each step (accumulated context), inconsistent state when parallel steps modify shared state, and lost context when a step fails and the workflow retries. Solutions: explicit state objects (Pydantic models carrying only the fields each step needs), conversation memory (LangChain memory modules, LlamaIndex ChatMemoryBuffer), checkpointing (LangGraph’s built-in checkpoint system, Temporal’s durable state), and context summarization (periodically summarize accumulated context to stay within token limits).
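A minimal explicit-state sketch, using stdlib dataclasses rather than the Pydantic models the text suggests (the shape is the same). Each step reads only the fields it needs and returns a new immutable state, which is what prevents parallel branches from corrupting shared data; the step functions are hypothetical stand-ins.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class WorkflowState:
    query: str
    context: str = ""
    answer: str = ""

def retrieve(state: WorkflowState) -> WorkflowState:
    # Stand-in for a retrieval step; a real one would query a vector store.
    return replace(state, context=f"docs relevant to: {state.query}")

def generate(state: WorkflowState) -> WorkflowState:
    # Stand-in for an LLM call that consumes only query + context.
    return replace(state, answer=f"answer({state.query!r}, using {state.context!r})")

state = WorkflowState(query="refund policy")
for step in (retrieve, generate):
    state = step(state)

print(state.answer)
```

Because each step returns a fresh state rather than mutating one, a failed step can simply be re-run from its input state, which is also what makes checkpointing straightforward.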

  • Error handling in multi-step workflows. When step 3 of 7 fails: retry the step (with the same patterns from Skill 2 — exponential backoff, circuit breaker), skip the step and continue with degraded output (if the step is optional), fall back to a different implementation of the step (cheaper model, simpler prompt), or abort the entire workflow and return a partial result with an explanation. The orchestrator must distinguish retryable errors (API timeout) from non-retryable errors (input is invalid for this step). LangGraph and Temporal both support error handling at the step level.
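The retryable/non-retryable distinction can be sketched as a small step runner. The exception classes, step, and fallback callables are hypothetical; the point is that bad input aborts immediately while transient errors back off exponentially.

```python
import time

class RetryableError(Exception): ...      # e.g. API timeout, rate limit
class NonRetryableError(Exception): ...   # e.g. input invalid for this step

def run_step(step, *, retries=3, base_delay=0.01, fallback=None):
    """Retry a step with exponential backoff; fall back or abort on exhaustion."""
    for attempt in range(retries):
        try:
            return step()
        except NonRetryableError:
            raise                                  # retrying bad input just burns tokens
        except RetryableError:
            time.sleep(base_delay * 2 ** attempt)  # 10ms, 20ms, 40ms, ...
    if fallback is not None:
        return fallback()                          # e.g. cheaper model, simpler prompt
    raise RuntimeError("step failed after retries and no fallback was given")

# Toy demo: the step times out twice, then succeeds on the third attempt.
attempts = []
def flaky():
    attempts.append(1)
    if len(attempts) < 3:
        raise RetryableError("timeout")
    return "ok"

result = run_step(flaky)
print(result)  # ok
```

In a real pipeline each step would declare its own policy (retry count, fallback, or skip), and the orchestrator would apply `run_step` per the policy rather than one global setting.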

Expected Practical Skills

  • Build a multi-step LLM pipeline. Implement a 3-5 step workflow: e.g., classify input → retrieve context → generate response → validate output → format for delivery. Use LCEL or LangGraph. Handle errors at each step. Pass state between steps. Add logging per step for debugging.
  • Implement parallel execution. Run multiple LLM calls simultaneously (e.g., extract information from 5 documents in parallel, then combine). Use async/await with the Anthropic/OpenAI SDK or LangChain’s batch operations. Manage rate limits across parallel calls.
  • Design a router workflow. Build a classifier that routes inputs to specialized handlers. Implement: classification prompt → routing logic → handler-specific prompts → unified output format. Measure: routing accuracy, per-handler quality, overall latency.
  • Add checkpointing to a long-running workflow. Implement state persistence so that if a 10-step workflow fails at step 7, it can resume from step 7 instead of restarting. Use LangGraph checkpointing or Temporal durable execution.
  • Debug a multi-step failure. Given a bad output from a 5-step pipeline, trace through the execution log to identify which step produced the error. Use LangFuse trace visualization or LangSmith’s step-by-step trace view.
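The parallel-execution skill above can be sketched with stdlib `asyncio`: a semaphore caps in-flight calls to respect rate limits, and `gather` runs the fan-out concurrently while preserving input order. `summarize` is a hypothetical stand-in for an SDK call.

```python
import asyncio

async def summarize(doc: str, sem: asyncio.Semaphore) -> str:
    async with sem:                 # cap concurrent calls to respect rate limits
        await asyncio.sleep(0.01)   # stand-in for the actual LLM request
        return f"summary of {doc}"

async def map_reduce(docs: list[str], max_concurrent: int = 2) -> str:
    sem = asyncio.Semaphore(max_concurrent)
    # Map: fan out over all documents; gather preserves input order.
    summaries = await asyncio.gather(*(summarize(d, sem) for d in docs))
    # Reduce: here a simple join; a real pipeline might make a final combine call.
    return " | ".join(summaries)

result = asyncio.run(map_reduce([f"doc{i}" for i in range(5)]))
print(result)
```

The semaphore is the piece people forget: `gather` alone will happily fire all five requests at once and trip provider rate limits.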

Interview-Ready Explanations

  • “Walk me through how you’d orchestrate a complex multi-step LLM workflow.” Start with decomposition: break the task into atomic steps, each with a clear input/output contract. Identify dependencies (which steps depend on which). Determine parallelization opportunities (independent steps run concurrently). Choose orchestration framework: LCEL for linear/branching, LangGraph for stateful/cyclic, Temporal for long-running with human checkpoints. Implement state management (Pydantic models for step inputs/outputs). Add per-step error handling (retry, fallback, skip). Instrument with LangFuse for end-to-end tracing. Test: unit test each step with mocked inputs, integration test the full pipeline with golden datasets.

  • “How do you handle failures in a multi-step LLM pipeline?” Per-step error policy: each step defines what happens on failure (retry 3x → fallback to simpler approach → skip → abort). Distinguish retryable (timeout, rate limit) from non-retryable (invalid input, content policy block). Checkpoint state so failures don’t restart from scratch. Partial results: if step 5 of 7 fails, return steps 1-4 results with a clear indication of what’s missing. Alert: trigger monitoring when failure rate exceeds baseline. Post-mortem: LangFuse trace shows exactly where and why the failure occurred.

  • “When would you use LangGraph vs. LCEL vs. Temporal vs. building custom?” LCEL: linear or simple branching workflows where each step is a single LLM call. Fast to build, limited control. LangGraph: stateful workflows with cycles (agent loops), human-in-the-loop nodes, or complex branching. Best for agent-like behavior. Temporal: long-running workflows (hours/days), workflows requiring durable execution guarantees, workflows with external system integrations and human checkpoints. Custom: when framework abstractions add more complexity than they remove, or when you need fine-grained control over execution (typically at very high scale or with unusual requirements).
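To illustrate the "building custom" option and the checkpointing skill in one sketch: persist the index of the next step and the current state after every completed step, so a crash at step k resumes at step k rather than step 1. The JSON-file store and step functions are hypothetical; LangGraph checkpoints or Temporal give you this durably in production.

```python
import json
import os
import tempfile

def run_with_checkpoints(steps, state, path):
    """Run steps in order, persisting (next_step, state) after each one."""
    start = 0
    if os.path.exists(path):                     # resume from a prior partial run
        with open(path) as f:
            saved = json.load(f)
        start, state = saved["next_step"], saved["state"]
    for i in range(start, len(steps)):
        state = steps[i](state)                  # run one step
        with open(path, "w") as f:               # checkpoint after it completes
            json.dump({"next_step": i + 1, "state": state}, f)
    return state

path = os.path.join(tempfile.mkdtemp(), "ckpt.json")
steps = [
    lambda s: s + ["classified"],
    lambda s: s + ["retrieved"],
    lambda s: s + ["generated"],
]
out = run_with_checkpoints(steps, [], path)
print(out)  # ['classified', 'retrieved', 'generated']
```

Note the constraint this imposes: state must be serializable, which is another argument for the explicit state objects described under state management.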

Related Skills

  • Harness Design — the harness is the foundation orchestration builds on
  • Agent Architecture — agents are orchestration with autonomous decision-making
  • Prompting — each orchestration step needs a well-designed prompt