LLM Cost Estimation — Competence
What an interviewer or hiring manager expects you to know.
Core Knowledge
- Token economics across providers. Know the pricing models for Anthropic Claude (Opus 4 at ≈$15/MTok input, ≈$75/MTok output; Sonnet 4 at ≈$3/$15; Haiku at ≈$0.25/$1.25), OpenAI (GPT-4o at ≈$2.50/$10; o3 at ≈$10/$40), Google Gemini (2.5 Pro at ≈$1.25/$10), and how open-source models (Llama 3, Mistral) shift cost to GPU compute. Understand why output tokens cost 3-5x input tokens (autoregressive generation vs. parallel encoding).
- Cost structure of agentic workflows. A single Claude Code session can cost $5-$50+ depending on task complexity. Multi-agent systems compound: Agent A’s output becomes Agent B’s input context. Context windows grow over a project’s life — early tasks are cheap (small codebase), late tasks expensive (large codebase, extensive history). Know the difference between per-request cost and per-project cost.
- Observability tooling. Langfuse (open-source, traces + cost attribution per run), Helicone (proxy-based, request-level cost + caching), LiteLLM (provider-normalizing proxy with cost tracking), Portkey (AI gateway with routing + cost aggregation). Know that these tools track past spend but none predict future cost — the estimation gap.
- The estimation problem. Pre-project cost estimation doesn’t exist as a product category. Current practice is spreadsheets and gut feel. The core difficulty: non-deterministic execution (same prompt, different token counts), heavy-tailed failure distribution (median task costs X, cascading rework costs 50X), growing context tax, and no historical baselines to calibrate against.
- Model routing as cost lever. Routing 60-70% of tasks to a cheaper model (Sonnet/Haiku) while reserving Opus for complex reasoning can cut costs 40-60% with minimal quality degradation. Tools: Martian, Unify AI, custom routers. The routing function itself is a cost-quality optimization problem.
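The token economics above reduce to simple arithmetic. A minimal per-request calculator, with the approximate prices from the bullet hardcoded (verify against the providers' current pricing pages before relying on them):

```python
# Approximate prices in USD per million tokens, as quoted above.
# These change often — treat as illustrative, not authoritative.
PRICES = {               # model: (input $/MTok, output $/MTok)
    "claude-opus-4":   (15.00, 75.00),
    "claude-sonnet-4": (3.00, 15.00),
    "claude-haiku":    (0.25, 1.25),
    "gpt-4o":          (2.50, 10.00),
    "o3":              (10.00, 40.00),
    "gemini-2.5-pro":  (1.25, 10.00),
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one request: tokens / 1e6 times the per-MTok price."""
    in_price, out_price = PRICES[model]
    return (input_tokens / 1e6) * in_price + (output_tokens / 1e6) * out_price
```

A 10k-input / 2k-output call costs $0.30 on Opus-class pricing but $0.06 on Sonnet-class pricing, which is the 5x spread interviewers expect you to know cold.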
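The per-request vs. per-project distinction from the agentic-workflows bullet can be made concrete with a toy model where every task re-sends a context that grows over the project's life. Sonnet-class default prices and the linear growth rate are illustrative assumptions:

```python
def session_cost(n_tasks: int, base_context_tokens: int, growth_per_task: int,
                 output_tokens_per_task: int,
                 in_price: float = 3.00, out_price: float = 15.00) -> float:
    """Total dollar cost when each task re-sends an ever-growing context.

    Prices default to Sonnet-class $/MTok; context grows linearly per
    task — a deliberately simple stand-in for real codebase/history growth.
    """
    total = 0.0
    for i in range(n_tasks):
        context = base_context_tokens + i * growth_per_task
        total += (context / 1e6) * in_price
        total += (output_tokens_per_task / 1e6) * out_price
    return total
```

With a 20k-token starting context growing 5k tokens per task, task 1 costs $0.09 but the ten-task project costs $1.575 — more than 10x the first task, which is the "growing context tax" in miniature.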
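The routing claim above (60-70% of tasks to a cheaper model cuts costs 40-60%) is a weighted average you should be able to derive on a whiteboard. A sketch, using hypothetical per-task costs:

```python
def blended_cost_per_task(share_cheap: float, cheap_cost: float,
                          expensive_cost: float) -> float:
    """Average per-task cost when a router sends `share_cheap` (0..1)
    of tasks to the cheaper model and the rest to the expensive one."""
    return share_cheap * cheap_cost + (1 - share_cheap) * expensive_cost

def savings_fraction(share_cheap: float, cheap_cost: float,
                     expensive_cost: float) -> float:
    """Fractional savings vs. sending everything to the expensive model."""
    blended = blended_cost_per_task(share_cheap, cheap_cost, expensive_cost)
    return 1 - blended / expensive_cost
```

At a 5x price gap (say $0.30 vs. $0.06 per task), routing 70% of tasks to the cheap model yields a blended $0.132 per task — a 56% saving, squarely inside the 40-60% range quoted above.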
Expected Practical Skills
- Estimate a project before running it. Given a project description (“build OAuth into a Next.js app”), produce a cost range (P10/P50/P90) based on task count, average tokens per task, model selection, and retry probability.
- Set up cost tracking. Instrument an LLM application with Langfuse or Helicone to capture per-request and per-session cost. Attribute costs to features, users, or project phases.
- Build a cost comparison. Given a task, estimate cost across Claude Opus vs. Sonnet vs. GPT-4o vs. open-source. Include not just API price but latency, quality, and retry rates.
- Calculate unit economics. For an AI product: LLM cost per user interaction, per MAU, and resulting gross margin at a given price point.
- Monitor and alert on budget. Set up spend alerts, per-session caps, and budget-aware degradation (switch to cheaper model when budget depletes).
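One way to produce the P10/P50/P90 range asked for in the first skill above is a small Monte Carlo simulation. This is a sketch: the lognormal token distribution, flat retry probability, and single context multiplier are modeling assumptions, not measured values — in practice you would calibrate them against logged runs:

```python
import math
import random

def estimate_project(n_tasks: int, median_in_tokens: int, median_out_tokens: int,
                     in_price: float, out_price: float,
                     retry_prob: float = 0.2, context_multiplier: float = 2.0,
                     runs: int = 5_000, seed: int = 0):
    """Monte Carlo project cost estimate; returns (P10, P50, P90) in dollars.

    Assumptions (all illustrative): token counts per task are lognormal
    around the given medians; retries are geometric with probability
    `retry_prob`; input context is inflated by `context_multiplier`.
    """
    rng = random.Random(seed)
    totals = []
    for _ in range(runs):
        total = 0.0
        for _ in range(n_tasks):
            tok_in = rng.lognormvariate(math.log(median_in_tokens), 0.6)
            tok_out = rng.lognormvariate(math.log(median_out_tokens), 0.6)
            attempts = 1
            while rng.random() < retry_prob:  # geometric retry count
                attempts += 1
            total += attempts * (
                (tok_in * context_multiplier / 1e6) * in_price
                + (tok_out / 1e6) * out_price
            )
        totals.append(total)
    totals.sort()
    pct = lambda p: totals[int(p / 100 * runs)]
    return pct(10), pct(50), pct(90)
```

Presenting the result as a range rather than a point estimate is the key habit: the heavy right tail (retries, context growth) means the P90 can sit far above the median.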
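The unit-economics skill above is a one-line margin formula worth having ready. A minimal sketch, with all inputs hypothetical:

```python
def gross_margin(price_per_user_month: float,
                 interactions_per_user_month: float,
                 llm_cost_per_interaction: float,
                 other_cogs_per_user: float = 0.0) -> float:
    """Gross margin (0..1) at a given price point, treating LLM spend
    as cost of goods sold. All inputs are per-user-per-month."""
    cogs = (interactions_per_user_month * llm_cost_per_interaction
            + other_cogs_per_user)
    return (price_per_user_month - cogs) / price_per_user_month
```

For example, a $20/month product whose users average 300 interactions at $0.02 of LLM cost each carries $6 of COGS, a 70% gross margin — and halving the per-interaction cost via routing lifts it to 85%.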
Interview-Ready Explanations
- “Walk me through how you’d estimate the cost of a complex agentic project before starting it.” Task decomposition (how many LLM invocations) → tokens per invocation (input context + output) → model pricing → multipliers for retries (10-30%), context growth (1.5-3x), and eval costs. Present as a distribution. Cite Infracost as the cloud analog.
- “How do you decide which model to use for a given task?” Cost-quality-latency optimization. Define quality threshold. Compare Opus at $15/$75 per MTok vs. Sonnet at $3/$15 — 5x difference. Measure on a representative sample. If Sonnet achieves >95% of Opus quality, route there. Factor latency and rate limits.
- “What are the failure modes in LLM cost management?” Late-project cost explosion (last 20% costs more than the first 80%). Eval cost blindness. Context reconstruction after human review pauses. Cascading retries in multi-agent pipelines. Budget exhaustion without graceful degradation. Spec ambiguity driving exploration loops.
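The "budget exhaustion without graceful degradation" failure mode above suggests a simple mitigation: a budget-aware model picker that degrades before it halts. A sketch with illustrative model names and thresholds:

```python
def pick_model(spent: float, budget: float, *,
               premium: str = "claude-opus-4",
               fallback: str = "claude-sonnet-4",
               degrade_at: float = 0.8) -> str:
    """Budget-aware degradation: use the premium model while spend is
    comfortably under budget, switch to the cheaper model once spend
    crosses `degrade_at` of budget, and refuse past the hard cap.
    Model names and the 80% threshold are illustrative choices."""
    if spent >= budget:
        raise RuntimeError("budget exhausted; halt or queue for human review")
    if spent >= degrade_at * budget:
        return fallback
    return premium
```

The design point: the degradation step is what separates a graceful cost ceiling from a hard outage — work continues at lower quality instead of stopping mid-pipeline.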
Related
- Model Routing — routing is the primary cost optimization lever
- Eval Frameworks — eval costs are a first-class budget item
- Use Case Qualification — cost modeling feeds ROI calculations