Failure Mode Reasoning — Competence
What an interviewer or hiring manager expects you to know.
Core Knowledge
- The taxonomy of LLM failures. Hallucination (model generates plausible but factually incorrect information — the most discussed failure mode; sub-types: factual fabrication, citation hallucination as in the 2023 Avianca incident, entity confusion, temporal confusion). Instruction drift (model gradually ignores instructions over long conversations or large contexts — instructions at the beginning lose influence). Context poisoning (irrelevant or contradictory information in the context degrades output quality — a single bad RAG chunk can corrupt the whole response). Prompt injection (malicious input overrides system instructions — Skill 15 covers defense). Cascade failures (an error in step 2 of a 7-step pipeline propagates and amplifies through subsequent steps). Sycophancy (model agrees with the user even when the user is wrong). Refusal over-correction (model refuses reasonable requests due to overly conservative safety training).
- Why LLM failures are structurally different from software bugs. Software bugs are deterministic — the same input produces the same wrong output every time, and you can write a test that catches it. LLM failures are probabilistic — the same input might succeed 90% of the time and fail 10%. This means: you can’t “fix” a failure by fixing a single code path; you must design systems that tolerate probabilistic failures. The failure rate is a distribution, not a binary. Testing requires statistical methods (Skill 9), not just unit tests.
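  Because the failure rate is a distribution, an eval gate should be statistical rather than a single pass/fail run. A minimal sketch (the `run_case` callable and the thresholds are hypothetical): repeat the case many times and require the lower confidence bound on the pass rate — not the raw sample rate — to clear the bar.

  ```python
  import math

  def wilson_lower_bound(successes: int, trials: int, z: float = 1.96) -> float:
      """Lower bound of the Wilson score interval for an observed pass rate."""
      if trials == 0:
          return 0.0
      p = successes / trials
      denom = 1 + z**2 / trials
      centre = p + z**2 / (2 * trials)
      margin = z * math.sqrt(p * (1 - p) / trials + z**2 / (4 * trials**2))
      return (centre - margin) / denom

  def assert_pass_rate(run_case, n_trials: int = 100, min_rate: float = 0.95) -> bool:
      """Run one eval case repeatedly; pass only if we are statistically
      confident the TRUE success rate clears the threshold."""
      successes = sum(1 for _ in range(n_trials) if run_case())
      return wilson_lower_bound(successes, n_trials) >= min_rate
  ```

  Note that 90/100 observed successes gives a Wilson lower bound around 0.83, so a naive "90% looks fine" reading would wrongly pass a 0.85 threshold that this gate correctly rejects.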
- Hallucination types and mitigations. Intrinsic hallucination (contradicts the provided context — mitigate with faithfulness checking via Ragas, groundedness scoring via Azure Content Safety). Extrinsic hallucination (adds information not in the context — can be correct or incorrect; mitigate with source attribution requirements, “cite your sources” instructions). Factual fabrication (invents facts, statistics, citations — mitigate with fact-checking against knowledge bases, structured output with source fields that must be populated). The Patronus AI Lynx model and Vectara’s FCS (Factual Consistency Score) are specialized hallucination detectors.
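  The “structured output with mandatory source fields” mitigation can be enforced with a cheap structural check before the answer ships. A sketch (the JSON shape and chunk-ID convention are assumptions, not any library’s API): reject any answer that cites no sources, or cites sources outside the retrieved context.

  ```python
  import json
  from dataclasses import dataclass

  @dataclass
  class GroundedAnswer:
      answer: str
      sources: list  # chunk IDs drawn from the retrieved context

  def parse_grounded(raw: str, context_ids: set) -> GroundedAnswer:
      """Structural defense against fabrication: every answer must cite
      sources, and every cited source must exist in the retrieved context."""
      data = json.loads(raw)
      sources = data.get("sources", [])
      if not sources:
          raise ValueError("answer cites no sources")
      unknown = [s for s in sources if s not in context_ids]
      if unknown:
          raise ValueError(f"answer cites unknown sources: {unknown}")
      return GroundedAnswer(answer=data["answer"], sources=sources)
  ```

  This does not prove the answer is faithful — it only guarantees the citations are structurally valid; pair it with a semantic faithfulness check.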
- Cascade failure patterns in multi-step systems. A coding agent makes a wrong assumption in step 3 → writes incorrect code in step 5 → the test fails in step 7 → the agent “fixes” it in step 9 by introducing a worse hack → 15 steps later the project is in a bad state. Mitigation: validate outputs at each step (not just the final output), implement rollback capability (revert to last-known-good state), set cost/step budgets to prevent runaway cascades, and design “circuit breakers” that stop execution when quality metrics drop below threshold.
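  Those four mitigations combine naturally into one control loop. A minimal sketch (the `validate` and `step_fn` callables are hypothetical placeholders for your pipeline): per-step validation, last-known-good rollback, a step budget, and a circuit breaker on consecutive failures.

  ```python
  class CascadeGuard:
      """Per-step validation with last-known-good rollback, a step budget,
      and a circuit breaker on consecutive validation failures."""

      def __init__(self, validate, max_steps: int = 20, max_failures: int = 3):
          self.validate = validate          # callable: state -> bool
          self.max_steps = max_steps        # budget against runaway cascades
          self.max_failures = max_failures  # circuit-breaker threshold
          self.failures = 0
          self.checkpoint = None

      def run(self, initial_state, step_fn):
          state = initial_state
          self.checkpoint = state
          for i in range(self.max_steps):
              state = step_fn(state)
              if self.validate(state):
                  self.checkpoint = state   # advance last-known-good
                  self.failures = 0
              else:
                  self.failures += 1
                  state = self.checkpoint   # roll back the bad step
                  if self.failures >= self.max_failures:
                      raise RuntimeError(f"circuit breaker tripped at step {i}")
          return state
  ```

  The key design choice is that a failed step costs one retry from the checkpoint, not a "fix" layered on top of the bad state — which is exactly how the 15-step cascade above compounds.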
- Sycophantic confirmation. The agent confirms incorrect data you provide and builds an entire system around it. Unlike hallucination (where the model invents data), sycophantic confirmation is the model agreeing with YOUR bad data — dirty spreadsheets, outdated documentation, wrong assumptions stated as facts. The model won’t push back. It will validate your incorrect inputs, build on them, and produce confident output that inherits every upstream error. This is especially dangerous in enterprise contexts where agents ingest company data stores that haven’t been audited. Mitigation: validate input data independently before feeding it to agents, design prompts that explicitly instruct the model to flag contradictions and inconsistencies in provided data rather than assuming it’s correct, and implement verification loops that cross-check agent outputs against authoritative sources.
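  The “flag contradictions instead of assuming correctness” mitigation lives mostly in the system prompt. A sketch of that instruction pattern (the wording and message shape are illustrative, not a tested recipe):

  ```python
  ANTI_SYCOPHANCY_INSTRUCTIONS = """\
  Before using any data the user provides:
  1. List internal contradictions or inconsistencies you notice in it.
  2. Flag values that conflict with the retrieved reference documents.
  3. If a flagged item is load-bearing for the answer, ask for confirmation
     instead of proceeding.
  Do not assume user-provided figures are correct merely because they are
  stated confidently."""

  def build_messages(user_data: str, question: str) -> list[dict]:
      """Assemble a chat payload that pushes back on bad input data."""
      return [
          {"role": "system", "content": ANTI_SYCOPHANCY_INSTRUCTIONS},
          {"role": "user", "content": f"Data:\n{user_data}\n\nQuestion: {question}"},
      ]
  ```

  Prompting alone is a soft mitigation; the independent input validation and cross-checking verification loops named above remain the hard backstops.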
- Tool selection error. The agent picks the wrong tool for the task. It might still produce output — but using a tool it should never have invoked in the first place. Common causes: ambiguous or overlapping tool descriptions in the system prompt, too many tools available (18+ degrades selection reliability — scope to 4-5 per agent), tool descriptions that are too long or too similar, and system prompt instructions that create unintended keyword associations with specific tools. The Claude Certified Architect exam tests this failure mode specifically because it’s a primary bottleneck in production agentic systems. Mitigation: write tool descriptions that explicitly state when to use THIS tool versus similar tools (see MCP Design, skill 21), limit tool count per agent, test tool selection with adversarial edge cases, and use `tool_choice` forced selection for mandatory first steps.
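  Forced selection removes the selection decision entirely for steps that must always happen first. A sketch of building such a request (the payload shape follows the Anthropic Messages API's `tool_choice: {"type": "tool", "name": ...}` form; the model id is a placeholder):

  ```python
  def forced_first_tool_request(prompt: str, tools: list, first_tool: str) -> dict:
      """Build Messages-API kwargs that force the model to call `first_tool`
      on its first turn instead of choosing among tools."""
      names = {t["name"] for t in tools}
      if first_tool not in names:
          raise ValueError(f"unknown tool: {first_tool}")
      return {
          "model": "claude-sonnet-4-20250514",  # placeholder model id
          "max_tokens": 1024,
          "tools": tools,
          "tool_choice": {"type": "tool", "name": first_tool},
          "messages": [{"role": "user", "content": prompt}],
      }
  ```

  Subsequent turns would typically drop the forced `tool_choice` so the agent can select freely once the mandatory first step has run.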
- Provider-level failures. API outages (Anthropic, OpenAI, and Google all have multi-hour outages 2-4x per year), rate limiting (429 errors spike during peak usage), model behavior changes (provider updates the model and your outputs change — the “GPT-4 got worse” incidents of 2023-2024), and deprecation (model versions sunset with 6-12 months notice). Mitigation: multi-provider fallback (Skill 14), version pinning, nightly regression testing (Skill 11), and graceful degradation design.
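  Multi-provider fallback with backoff on transient errors can be sketched in a few lines (the `TransientError` wrapper and provider callables are hypothetical; real code would map each SDK's 429/timeout exceptions into it):

  ```python
  import random
  import time

  class TransientError(Exception):
      """Hypothetical wrapper for retryable failures: 429s, timeouts."""

  def call_with_fallback(providers, prompt, max_retries: int = 3):
      """Try providers in order, e.g. [("anthropic", fn), ("openai", fn)].
      Transient errors get jittered exponential backoff; anything else
      skips straight to the next provider."""
      last_error = None
      for name, call in providers:
          for attempt in range(max_retries):
              try:
                  return name, call(prompt)
              except TransientError as e:
                  last_error = e
                  if attempt + 1 < max_retries:
                      time.sleep(min(2 ** attempt + random.random(), 30))
              except Exception as e:   # non-transient: don't burn retries
                  last_error = e
                  break
      raise RuntimeError(f"all providers failed: {last_error}")
  ```

  Version pinning matters here too: the fallback provider should be pinned and regression-tested just like the primary, or the fallback path silently ships different behavior.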
- Defensive architecture patterns. Defense in depth (multiple independent checks catch different failure types — no single layer catches everything), graceful degradation (when the AI fails, fall back to a simpler response rather than nothing), blast radius limitation (contain failures to the affected feature rather than the whole application), and fail-safe defaults (when uncertain, choose the safer action — don’t send the email, don’t execute the trade, don’t publish the content).
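  Three of these patterns compose into one response path. A minimal sketch (the `generate`, `checks`, and `fallback` arguments are placeholders for your own stack): independent checks run in sequence, and any failure — including a provider exception — degrades to the fail-safe default instead of shipping an unguarded answer.

  ```python
  def defended_respond(prompt, generate, checks, fallback):
      """Defense in depth with graceful degradation and fail-safe defaults.

      `checks` is a list of (name, predicate) pairs; each predicate is an
      independent layer (format, quality, safety) over the draft answer."""
      try:
          draft = generate(prompt)
      except Exception:
          return fallback          # provider failure: degrade, don't crash
      for name, check in checks:
          if not check(draft):
              return fallback      # fail-safe default: the safer action wins
      return draft
  ```

  Blast radius limitation is the one pattern this sketch cannot show at function scope — it lives in deployment architecture (per-feature flags, isolated queues), not in the response path.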
Expected Practical Skills
- Conduct a failure mode analysis for an LLM feature. Given a product spec, enumerate: what can go wrong at each step, what’s the probability (rare/occasional/frequent), what’s the severity (cosmetic/functional/dangerous), and what’s the mitigation for each. Produce a failure mode table.
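  A failure mode table is just structured rows plus a priority order. A sketch with illustrative (invented) entries, showing the prioritization rule from the interview answer below — dangerous first, then functional, then cosmetic:

  ```python
  # Each row: (component, failure, probability, severity, mitigation).
  # The entries are illustrative, not from a real analysis.
  FAILURE_MODES = [
      ("output parsing", "malformed JSON", "frequent", "cosmetic",
       "retry with schema reminder"),
      ("retrieval", "irrelevant chunk poisons context", "occasional", "functional",
       "relevance-score filter before prompting"),
      ("generation", "fabricated citation", "occasional", "dangerous",
       "faithfulness check + mandatory source fields"),
  ]

  def by_priority(modes):
      """Order mitigation work: dangerous, then functional, then cosmetic."""
      order = {"dangerous": 0, "functional": 1, "cosmetic": 2}
      return sorted(modes, key=lambda m: order[m[3]])
  ```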
- Design a cascade failure test. Simulate failures at each step of a multi-step pipeline and verify that the system handles each gracefully — retry, fallback, abort with useful error message, or degrade without data corruption.
- Implement hallucination detection. Add faithfulness checking to a RAG system: compare generated output against retrieved context using Ragas faithfulness metric or a custom LLM-as-judge rubric. Alert when faithfulness drops below threshold.
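  A custom LLM-as-judge rubric for faithfulness can be sketched as follows (the rubric wording and score-line convention are assumptions; `judge` is any completion function — a second model call in practice, a stub in tests):

  ```python
  FAITHFULNESS_RUBRIC = """\
  Given CONTEXT and ANSWER, list every claim in ANSWER and label it
  SUPPORTED or UNSUPPORTED by CONTEXT. End with:
  FAITHFULNESS: <supported claims / total claims as a decimal>"""

  def faithfulness_score(context: str, answer: str, judge) -> float:
      """Ask the judge model to grade the answer, then parse its final
      FAITHFULNESS line into a 0.0-1.0 score."""
      verdict = judge(
          f"{FAITHFULNESS_RUBRIC}\n\nCONTEXT:\n{context}\n\nANSWER:\n{answer}"
      )
      for line in reversed(verdict.splitlines()):
          if line.startswith("FAITHFULNESS:"):
              return float(line.split(":", 1)[1])
      raise ValueError("judge returned no FAITHFULNESS line")

  def check_faithfulness(context, answer, judge, threshold: float = 0.8):
      """Return (passes, score); alert/escalate when passes is False."""
      score = faithfulness_score(context, answer, judge)
      return score >= threshold, score
  ```

  The parse-or-raise design matters: a judge that returns no score is itself a failure and should surface as an error, not silently pass the answer through.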
- Build a chaos test for LLM systems. Simulate: API timeouts, rate limits, malformed responses, out-of-distribution inputs, and adversarial prompts. Verify the system handles each without crashing, returning dangerous output, or losing data.
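  The first three fault classes — timeouts, rate limits, malformed responses — can be injected with a seeded wrapper around the real model call. A sketch (the fault types and exception choices are illustrative; the seed makes chaos runs reproducible):

  ```python
  import random

  class ChaosClient:
      """Wraps a model call and injects faults at a configurable rate,
      deterministically under a fixed seed so failures are replayable."""

      def __init__(self, real_call, fault_rate: float = 0.3, seed: int = 0):
          self.real_call = real_call
          self.fault_rate = fault_rate
          self.rng = random.Random(seed)

      def __call__(self, prompt: str):
          if self.rng.random() < self.fault_rate:
              fault = self.rng.choice(["timeout", "rate_limit", "garbage"])
              if fault == "timeout":
                  raise TimeoutError("simulated network timeout")
              if fault == "rate_limit":
                  raise RuntimeError("429: simulated rate limit")
              return "}{ not json at all"   # simulated malformed response
          return self.real_call(prompt)
  ```

  The chaos test then runs the full suite with `ChaosClient` substituted for the real client and asserts the three system-level invariants: no crash, no dangerous output, no data loss. Out-of-distribution and adversarial inputs need a separate corpus — they are input faults, not transport faults.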
- Write a failure postmortem. When an LLM system fails in production: document what happened, the root cause, the blast radius, how it was detected, how it was resolved, and what changes prevent recurrence. The format mirrors software incident postmortems but includes LLM-specific elements (model version, prompt version, input characteristics).
Interview-Ready Explanations
- “Walk me through how you’d analyze failure modes for a new LLM application.” Start with a failure mode enumeration: for each component (prompt, retrieval, generation, output parsing, delivery), list what can fail, how likely it is, and what the impact is. Classify by severity: cosmetic (wrong formatting — fix later), functional (wrong answer — needs guardrail), dangerous (harmful output, data leak — needs hard block). Design mitigations: hallucination → faithfulness checking, cascade failure → step-level validation + rollback, provider outage → multi-provider fallback, prompt injection → guardrails stack. Prioritize: fix dangerous failures first, then functional, then cosmetic.
- “How do you design for reliability when the LLM itself is probabilistically unreliable?” Accept that LLMs will fail some percentage of the time — design the system around that reality. Layers: output validation (catch malformed responses — retry), quality scoring (catch low-quality responses — escalate to a better model or human), guardrails (catch harmful responses — block), monitoring (detect quality degradation in production — alert), and human fallback (when all automated recovery fails — route to a human). The goal isn’t 100% reliability from the LLM — it’s 99.9% reliability from the system.
- “What are the most dangerous failure modes in production LLM systems?” Confidently wrong (hallucination that sounds authoritative — the user trusts it and makes a bad decision; the Avianca lawyers cited fake cases). Slow degradation (quality drops 1% per week as data drifts — undetectable without monitoring, becomes a crisis after months). Adversarial exploitation (prompt injection that extracts PII or bypasses safety — a security incident, not just a quality issue). Cascading rework (an agent makes a wrong decision early, then spends 50 steps trying to fix it — burns budget with no recovery). Silent failure (the system returns a plausible but wrong answer with no error signal — the user never knows it failed).
Related
- Agent Architecture — agents have the most complex failure modes
- Guardrails & Safety — guardrails are the first line of failure defense
- Regression Detection — detecting failure rate changes over time