CapabilityAtlas
Integration & Operations Fundamentals

LLM Cost Estimation

Pre-project cost modeling, token economics, cost/quality/latency tradeoffs, budget-aware architecture.

LLM Cost Estimation — Competence

What an interviewer or hiring manager expects you to know.

Core Knowledge

  • Token economics across providers. Know the pricing models for Anthropic Claude (Opus 4 at ≈$15/MTok input, ≈$75/MTok output; Sonnet 4 at ≈$3/$15; Haiku at ≈$0.25/$1.25), OpenAI (GPT-4o at ≈$2.50/$10; o3 at ≈$10/$40), Google Gemini (2.5 Pro at ≈$1.25/$10), and how open-source models (Llama 3, Mistral) shift cost to GPU compute. Understand why output tokens cost 3-5x input tokens: input is encoded in a single parallel prefill pass, while each output token requires its own autoregressive forward pass.

  • Cost structure of agentic workflows. A single Claude Code session can cost $5-$50+ depending on task complexity. Multi-agent systems compound: Agent A’s output becomes Agent B’s input context. Context windows grow over a project’s life — early tasks are cheap (small codebase), late tasks expensive (large codebase, extensive history). Know the difference between per-request cost and per-project cost.

  • Observability tooling. Langfuse (open-source, traces + cost attribution per run), Helicone (proxy-based, request-level cost + caching), LiteLLM (provider-normalizing proxy with cost tracking), Portkey (AI gateway with routing + cost aggregation). Know that these tools track past spend but none predict future cost — the estimation gap.

  • The estimation problem. Pre-project cost estimation doesn’t exist as a product category. Current practice is spreadsheets and gut feel. The core difficulty: non-deterministic execution (same prompt, different token counts), heavy-tailed failure distribution (median task costs X, cascading rework costs 50X), growing context tax, and no historical baselines to calibrate against.

  • Model routing as cost lever. Routing 60-70% of tasks to a cheaper model (Sonnet/Haiku) while reserving Opus for complex reasoning can cut costs 40-60% with minimal quality degradation. Tools: Martian, Unify AI, custom routers. The routing function itself is a cost-quality optimization problem.
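The per-MTok arithmetic behind the figures above is simple enough to sketch. The price table below mirrors the approximate rates quoted in this section; real prices drift, so treat it as illustrative:

```python
# Illustrative per-MTok price table (USD): (input_rate, output_rate).
# Rates change; check the provider's current pricing page before relying on them.
PRICES = {
    "claude-opus-4":   (15.00, 75.00),   # output is 5x the input rate
    "claude-sonnet-4": (3.00, 15.00),
    "claude-haiku":    (0.25, 1.25),
    "gpt-4o":          (2.50, 10.00),
    "o3":              (10.00, 40.00),
    "gemini-2.5-pro":  (1.25, 10.00),
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost of one request: tokens / 1e6 * per-MTok rate, summed over input and output."""
    in_rate, out_rate = PRICES[model]
    return input_tokens / 1e6 * in_rate + output_tokens / 1e6 * out_rate

# A 20k-token-input, 2k-token-output request on Opus:
print(round(request_cost("claude-opus-4", 20_000, 2_000), 3))  # → 0.45
```

Note how the 2k output tokens cost half as much as the 20k input tokens: the 5x output multiplier dominates even at a 10:1 input:output ratio.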

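A routing function like the one described above can be a few lines. The task features and thresholds here are assumptions for illustration, not any vendor's API:

```python
# Hypothetical cost-aware router: cheap models by default, Opus only for
# tasks that need deep reasoning or very large context.

def route(task: dict) -> str:
    """Pick a model from rough task features; feature names and cutoffs are assumed."""
    if task.get("needs_deep_reasoning") or task.get("est_input_tokens", 0) > 100_000:
        return "claude-opus-4"
    if task.get("kind") == "classification" and task.get("est_input_tokens", 0) < 2_000:
        return "claude-haiku"
    return "claude-sonnet-4"  # default: 5x cheaper per token than Opus

print(route({"kind": "refactor", "est_input_tokens": 30_000}))  # → claude-sonnet-4
```

In practice the interesting part is calibrating the cutoffs: measure quality per model on a representative task sample, then set thresholds so the cheap path stays above your quality bar.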
Expected Practical Skills

  • Estimate a project before running it. Given a project description (“build OAuth into a Next.js app”), produce a cost range (P10/P50/P90) based on task count, average tokens per task, model selection, and retry probability.
  • Set up cost tracking. Instrument an LLM application with Langfuse or Helicone to capture per-request and per-session cost. Attribute costs to features, users, or project phases.
  • Build a cost comparison. Given a task, estimate cost across Claude Opus vs. Sonnet vs. GPT-4o vs. open-source. Include not just API price but latency, quality, and retry rates.
  • Calculate unit economics. For an AI product: LLM cost per user interaction, per MAU, and resulting gross margin at a given price point.
  • Monitor and alert on budget. Set up spend alerts, per-session caps, and budget-aware degradation (switch to cheaper model when budget depletes).
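The pre-run estimate in the first bullet can be sketched as a Monte Carlo simulation: sample per-task token counts, apply retry and context-growth multipliers, and report P10/P50/P90. Every distribution and parameter below is an illustrative assumption, not a calibrated model:

```python
import random
import statistics

def simulate_project(n_runs=2_000, n_tasks=40, seed=0):
    """Monte Carlo project-cost estimate. Token counts are lognormal (heavy-tailed),
    retries double a task's cost, and a uniform multiplier stands in for the
    context-growth tax. All parameters are assumptions for illustration."""
    rng = random.Random(seed)
    costs = []
    for _ in range(n_runs):
        total = 0.0
        for _ in range(n_tasks):
            tokens_in = rng.lognormvariate(9.5, 0.6)    # ~13k median input tokens
            tokens_out = rng.lognormvariate(7.5, 0.6)   # ~1.8k median output tokens
            c = tokens_in / 1e6 * 3.0 + tokens_out / 1e6 * 15.0  # Sonnet-class rates
            if rng.random() < 0.2:                      # retry probability in the 10-30% band
                c *= 2
            total += c
        total *= rng.uniform(1.5, 3.0)                  # context-growth multiplier
        costs.append(total)
    deciles = statistics.quantiles(costs, n=10)
    return deciles[0], statistics.median(costs), deciles[8]  # P10, P50, P90

p10, p50, p90 = simulate_project()
print(f"P10 ${p10:.2f}  P50 ${p50:.2f}  P90 ${p90:.2f}")
```

Presenting the range rather than a point estimate is the whole point: the heavy-tailed inputs make the P90 land well above the median, which is exactly the behavior stakeholders need to see before the project starts.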

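The unit-economics calculation reduces to a few lines of arithmetic. The plan price and usage figures below are made up for illustration:

```python
def unit_economics(price_per_user_month, interactions_per_mau,
                   tokens_in_per_call, tokens_out_per_call,
                   in_rate, out_rate):
    """LLM cost per interaction, per MAU, and resulting gross margin.
    Rates are per MTok; all inputs here are illustrative assumptions."""
    cost_per_call = (tokens_in_per_call / 1e6 * in_rate
                     + tokens_out_per_call / 1e6 * out_rate)
    llm_cost_per_mau = cost_per_call * interactions_per_mau
    margin = (price_per_user_month - llm_cost_per_mau) / price_per_user_month
    return cost_per_call, llm_cost_per_mau, margin

# $20/mo plan, 200 interactions per MAU, 4k in + 1k out per call at Sonnet-class rates:
per_call, per_mau, margin = unit_economics(20.0, 200, 4_000, 1_000, 3.0, 15.0)
print(per_call, per_mau, round(margin, 2))  # LLM spend of $5.40/MAU → 73% gross margin
```

Running the same numbers at Opus-class rates ($15/$75) pushes LLM spend per MAU to $27, above the $20 price point, which is why model choice is a margin decision, not just a quality one.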
Interview-Ready Explanations

  • “Walk me through how you’d estimate the cost of a complex agentic project before starting it.” Task decomposition (how many LLM invocations) → tokens per invocation (input context + output) → model pricing → multipliers for retries (10-30%), context growth (1.5-3x), and eval costs. Present as a distribution. Cite Infracost as the cloud analog.

  • “How do you decide which model to use for a given task?” Cost-quality-latency optimization. Define quality threshold. Compare Opus at $15/$75 per MTok vs. Sonnet at $3/$15 — 5x difference. Measure on a representative sample. If Sonnet achieves >95% of Opus quality, route there. Factor latency and rate limits.

  • “What are the failure modes in LLM cost management?” Late-project cost explosion (last 20% costs more than the first 80%). Eval cost blindness. Context reconstruction after human review pauses. Cascading retries in multi-agent pipelines. Budget exhaustion without graceful degradation. Spec ambiguity driving exploration loops.
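Budget-aware degradation, the mitigation for the last failure mode above, can be as simple as a spend-fraction ladder that downgrades the model instead of failing hard. The thresholds and model names below are hypothetical:

```python
# Sketch of graceful degradation: as spend approaches the session cap,
# step down to cheaper models; only at the cap itself do we stop.

def pick_model(spent: float, budget: float) -> str:
    """Choose a model by fraction of budget consumed; cutoffs are assumptions."""
    frac = spent / budget
    if frac < 0.5:
        return "claude-opus-4"      # plenty of headroom
    if frac < 0.85:
        return "claude-sonnet-4"    # conserve: 5x cheaper per token
    if frac < 1.0:
        return "claude-haiku"       # near the cap: cheapest model only
    raise RuntimeError("budget exhausted; pause for human review")

print(pick_model(3.0, 10.0), pick_model(7.0, 10.0), pick_model(9.5, 10.0))
# → claude-opus-4 claude-sonnet-4 claude-haiku
```

The design point is that the failure mode is the hard stop, not the downgrade: a cheaper model finishing the task is almost always better than an agent dying mid-pipeline with no budget left.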