
The Consultant Who Closes AI Deals Others Can't Even Scope

Your buyers are asking about AI. Here's how to qualify, cost, and scope AI engagements — including why deals fail and how to see it coming.

March 27, 2026 | 14 min read

The meeting that changed the job

Somewhere in the last 18 months, your customer meetings shifted. CTOs used to ask about integrations and uptime SLAs. Now they ask: “Can your platform do AI? What would it cost to process 50,000 documents a month? How do we handle PII?”

Most SEs freeze at question two. They say “let me get back to you,” loop in product, wait three weeks, and lose the deal to a competitor whose SE modeled costs on the call.

The SE who can qualify AI use cases, estimate token costs in real time, navigate compliance, and propose an architecture — that person isn’t just closing deals. They’re defining what the product builds next.

This isn’t a career pivot. It’s four capabilities layered on top of what you already do.

Capability 1: Use case qualification — including when to kill the deal

Your buyers have AI use cases they want to explore. Most are bad ideas. The SE who kills bad ideas early saves everyone — including the customer — months of wasted work. The SE who can’t kill bad ideas early delivers failed pilots that burn trust and revenue.

The qualification framework:

Data availability. Does the customer actually have the data? “We have all our customer data” means nothing. Where is it? What format? How clean? How often updated? If the answer involves “we’d need to consolidate three systems first,” the project starts with data engineering, not AI. That triples timeline and cost. Surface this in the first meeting, not in month three.

Error tolerance. What happens when the AI is wrong? A bad product recommendation is invisible. A wrong medical code denies patient care. Error tolerance determines accuracy threshold, which determines model choice, which determines cost. A 95% requirement and a 99.5% requirement are different projects with different budgets — and often different architectural patterns. Ask this question explicitly: “If the AI gives a wrong answer, what’s the worst-case business impact?” The answer tells you whether this is a $50K project or a $500K project.

Volume and unit economics. How many invocations per day, at what input size, with which model? If you can model this on a whiteboard during the meeting, you’re the SE who closes the deal. If you can’t, you’re the SE who forwards the question to product.
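
The on-the-spot math is simple enough to keep in a small function you can adapt live. A sketch, with illustrative per-million-token prices passed in as parameters (plug in whatever the current rate card says):

```python
def monthly_token_cost(
    invocations_per_day: int,
    input_tokens: int,            # average input tokens per invocation
    output_tokens: int,           # average output tokens per invocation
    input_price_per_m: float,     # $ per 1M input tokens (assumed rate)
    output_price_per_m: float,    # $ per 1M output tokens (assumed rate)
    days: int = 30,
) -> float:
    """Napkin estimate of monthly base token spend for one use case."""
    monthly_in = invocations_per_day * input_tokens * days
    monthly_out = invocations_per_day * output_tokens * days
    return (monthly_in / 1e6) * input_price_per_m + (monthly_out / 1e6) * output_price_per_m

# 1,000 calls/day, 3K in / 500 out, at example prices of $3/$15 per 1M tokens
print(round(monthly_token_cost(1_000, 3_000, 500, 3.0, 15.0)))  # → 495
```

Same whiteboard, fewer arithmetic mistakes under pressure.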

Build vs. buy vs. configure. Many use cases are solved by existing products — Glean for enterprise search, Harvey for legal, Cohere for RAG-as-a-service. The SE who knows the landscape saves the customer a six-month build by recommending a two-week integration. Being the person who says “you don’t need to build this” builds more trust than being the person who scopes a six-month project.

Capability 2: Cost estimation — including the variance

Token economics is the new cloud pricing — except it changes quarterly and most estimates are wrong by 2-5x when they hit production. Here’s how to get it right.

The napkin math. A support agent averaging 2,000 input tokens and 500 output tokens per turn, 15 turns per conversation, 1,000 conversations/month on Sonnet: ~30M input tokens/month ($90 at $3/M) + ~7.5M output tokens/month ($112 at $15/M) = roughly $200/month in base token cost.

Now add what your napkin missed:

  • Retries and fallbacks. Under normal conditions, retries add 10-15% to token cost. During API degradation events (which happen 2-4 times per year per provider), retry costs can spike 50-100% for that period. Budget 20% overhead as baseline.
  • RAG retrieval calls. Each query hits an embedding model for search. At scale, embedding costs are 10-30% of generation costs. Most estimates forget this line item.
  • Eval and monitoring. Running your eval suite and production quality sampling costs tokens too. Budget 5-10% of production volume for eval.
  • Long-tail inputs. Your average is 2,000 input tokens. Your 95th percentile might be 15,000. Your 99th percentile might be 50,000. A handful of pathological inputs can dominate your token spend. Cap input lengths or alert on cost-per-request anomalies.

Present as a range, not a point estimate. “Best case: $200/month if input stays within expected bounds. Expected: $300/month including retries and eval. Worst case: $800/month during high-volume periods or API degradation.” Your customer trusts ranges. They don’t trust a single number that will inevitably be wrong.
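
The overhead items above fold into a quick range calculator. A sketch; the overhead percentages are midpoints of the ranges discussed, and the 4x worst-case multiplier is an assumption standing in for degradation plus long-tail inputs:

```python
def cost_range(base_monthly: float) -> dict:
    """Turn a napkin base-cost number into best/expected/worst figures,
    using assumed midpoints of the overhead ranges discussed above:
    retries ~20%, RAG embeddings ~15%, eval sampling ~7.5%."""
    retries, embeddings, evals = 0.20, 0.15, 0.075
    best = base_monthly                    # inputs stay within expected bounds
    expected = base_monthly * (1 + retries + embeddings + evals)
    worst = base_monthly * 4               # assumed 4x: degradation + long tail
    return {"best": round(best), "expected": round(expected), "worst": round(worst)}

print(cost_range(200))  # → {'best': 200, 'expected': 285, 'worst': 800}
```

The point is not the exact multipliers; it is that the spreadsheet produces three numbers instead of one.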

Model routing saves 60-80% on mixed workloads. Classification tasks run on Haiku at $0.25/M input tokens. Complex analysis needs Sonnet. Smart routing — a lightweight model triages requests, escalating only when necessary — cuts costs dramatically. If you can explain this to a CTO on a whiteboard, you’ve differentiated yourself from every other SE in the room.
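
The 60-80% figure falls out of blended-cost arithmetic. A sketch, assuming an 80/20 triage split and illustrative per-million input prices:

```python
def blended_savings(cheap_share: float, cheap_price: float, strong_price: float) -> float:
    """Percent saved on input tokens when `cheap_share` of requests are
    handled by the cheap model and the rest escalate to the strong model."""
    blended = cheap_share * cheap_price + (1 - cheap_share) * strong_price
    return round(100 * (1 - blended / strong_price), 1)

# Assumed split: 80% of traffic triaged to a $0.25/M model,
# 20% escalated to a $3/M model
print(blended_savings(0.80, 0.25, 3.00))  # → 73.3
```

That is the whole whiteboard pitch: the escalation rate is the only number the customer has to believe.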

Why AI deals fail — and how to see it coming

Most AI content for consultants assumes: qualified → scoped → closed. In practice, deals fail for reasons that have nothing to do with the technology.

The pilot that doesn’t convert. The customer runs a 30-day pilot. Results are “promising” but not conclusive. Nobody defined success criteria upfront. The pilot evaluation becomes a committee debate. The deal stalls indefinitely. Prevention: Define pilot success criteria in the SOW. “If accuracy exceeds 90% on the test set AND cost per transaction is under $0.50, the customer commits to the production contract.” Measurable. Binary. Agreed before the pilot starts.

The compliance wall. Legal review adds 30-90 days to every enterprise AI deal. DPAs need negotiation. Security reviews require documentation you haven’t prepared. Internal AI governance committees (which many enterprises created in 2025) want to review the architecture. Prevention: Start compliance work in week one, in parallel with technical scoping. Don’t wait for the POC to succeed before engaging legal. Prepare the DPA comparison matrix, the data flow diagram, and the security questionnaire responses before the customer asks for them.

The stakeholder split. The CTO wants AI. The VP of Operations worries about reliability. The CFO questions ROI. The CISO wants to slow down. These aren’t technical objections — they’re organizational friction. Prevention: Map the buyer org in the first meeting. Identify who benefits, who’s at risk, and who has veto power. Address each stakeholder’s concern in the proposal: the CTO gets the architecture, the VP gets the guardrails and monitoring plan, the CFO gets the cost model with ROI projection, the CISO gets the security assessment.

The data that doesn’t exist. The customer says they have the data. They don’t. Or they have data, but it’s in 15 different systems, inconsistently formatted, with no clear owner. The “AI project” becomes a data integration project that takes 6 months before any model is called. Prevention: Request a data sample in the qualification phase. Not a description of the data. Not a schema. An actual sample. If they can’t produce one in two weeks, the data isn’t ready.

Capability 3: Compliance and security fluency

Every enterprise AI deal stalls at compliance. The SE who navigates these conversations — as a technical advisor, not a lawyer — unblocks deals that die in legal review.

Data residency. Where does data go when sent to an LLM API? AWS Bedrock and Azure OpenAI Service keep data in the customer’s cloud account. Direct API calls to Anthropic or OpenAI send data to the provider’s infrastructure (with DPA protections). A financial services customer with strict data sovereignty needs Bedrock or Azure — not direct API calls. Know which option fits which requirement. Have the comparison ready before the customer’s legal team asks.

PII handling. Know the options: pre-processing PII redaction (Presidio, custom NER), provider-side DPAs with no-training clauses, or self-hosted models (Llama, Mistral) that keep data entirely on-premises. The right answer depends on jurisdiction, data type, and risk appetite. The wrong answer is “we’ll figure it out later.” Figure it out in the proposal.
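
Pre-processing redaction is worth being able to demo in two minutes. A minimal sketch; a real deployment would use a purpose-built tool like Presidio or a trained NER model, and the regex patterns here are illustrative, not production-grade:

```python
import re

# Minimal pre-processing redaction sketch. Patterns are illustrative only;
# regexes alone miss names, addresses, and free-text identifiers.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "SSN":   re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
}

def redact(text: str) -> str:
    """Replace matched PII spans with typed placeholders before the text
    ever leaves the customer's environment."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"<{label}>", text)
    return text

print(redact("Contact jane.doe@example.com or 555-867-5309 re: SSN 123-45-6789"))
# → Contact <EMAIL> or <PHONE> re: SSN <SSN>
```

Typed placeholders (rather than blanks) matter: the model can still reason about "an email address" without ever seeing the real one.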

Prompt injection and output safety. The CISO will ask. Know the mitigation stack: input filtering, output validation, sandboxed tool execution, parameter validation on all tool calls, human-in-the-loop for high-risk actions. You don’t need to implement these — you need to articulate them credibly in a 5-minute conversation.
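
The "parameter validation on all tool calls" layer can be shown in a few lines. A sketch with made-up tool names and rules, purely to make the shape of the control concrete:

```python
# Allowlist sketch: every tool the model may invoke gets a validator.
# Tool names and rules are hypothetical, for illustration only.
ALLOWED_TOOLS = {
    "search_docs": lambda p: isinstance(p.get("query"), str) and len(p["query"]) < 500,
    "get_ticket":  lambda p: isinstance(p.get("ticket_id"), int),
}

def validate_tool_call(name: str, params: dict) -> bool:
    """Reject anything the model proposes that isn't an allowlisted tool
    with well-formed parameters; high-risk actions route to a human instead."""
    validator = ALLOWED_TOOLS.get(name)
    return validator is not None and validator(params)

print(validate_tool_call("search_docs", {"query": "refund policy"}))  # → True
print(validate_tool_call("delete_user", {"id": 42}))                  # → False
```

Five minutes with the CISO, and this is the diagram you draw: model proposes, validator disposes.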

Capability 4: Architecture recommendation — including where it breaks

The customer doesn’t need “AI.” They need a specific architecture within their constraints. Recommend the right one — and be honest about the failure modes.

RAG vs. fine-tuning vs. prompt engineering. RAG when data changes frequently. Fine-tuning when the task requires a specific output style prompting can’t achieve. Prompt engineering alone for classification, extraction, and summarization on well-structured inputs. Most SEs default to RAG for everything. Sometimes a well-crafted prompt with structured output is all the customer needs.

Where RAG fails: Retrieval returns stale documents when the knowledge base isn’t maintained. Chunking strategy doesn’t match the document structure, producing incoherent context. Multiple documents contradict each other and the system picks whichever was retrieved first. Tell the customer: “RAG works well for X, but you need a re-indexing strategy, a conflict resolution approach, and retrieval quality monitoring. Here’s what that looks like.”

Agents vs. workflows. Fixed, predictable steps? Build a workflow. Dynamic planning where the next step depends on results? That’s an agent. Agents are more capable but harder to control, debug, and cost-predict. Don’t recommend an agent when a workflow will do. Where agents fail: Runaway loops that burn through token budgets. Tool selection errors where the agent uses the wrong capability. Cascading failures where an early mistake compounds through every subsequent step. Tell the customer: “An agent architecture gives you flexibility, but you need cost controls, tool auditing, and circuit breakers. Here’s the monitoring plan.”
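
The circuit-breaker idea can be made concrete in a few lines. A sketch with illustrative limits, not a production implementation:

```python
class TokenCircuitBreaker:
    """Guardrail from the discussion above: stop an agent loop before a
    runaway plan burns the token budget. Limits here are illustrative."""
    def __init__(self, max_tokens: int = 200_000, max_steps: int = 25):
        self.max_tokens, self.max_steps = max_tokens, max_steps
        self.tokens_used, self.steps = 0, 0

    def record_step(self, tokens: int) -> None:
        self.tokens_used += tokens
        self.steps += 1
        if self.tokens_used > self.max_tokens or self.steps > self.max_steps:
            raise RuntimeError(
                f"circuit breaker tripped after {self.steps} steps / "
                f"{self.tokens_used} tokens -- escalate to a human"
            )

breaker = TokenCircuitBreaker(max_tokens=10_000, max_steps=5)
for step_tokens in [3_000, 3_000, 3_000, 3_000]:  # a loop that never converges
    try:
        breaker.record_step(step_tokens)
    except RuntimeError as e:
        print(e)  # trips on the 4th step, at 12,000 tokens
```

The same pattern extends to per-request dollar caps and per-tool invocation counts; the point is that the agent cannot spend what the breaker does not allow.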

Self-hosted vs. API. Self-hosted models eliminate per-token costs but add infrastructure complexity. Below ~10M tokens/month, APIs are almost always cheaper. Above ~100M tokens/month, self-hosted makes economic sense — if the customer has GPU infrastructure or will use a managed service.
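
The break-even is straightforward division. A sketch; the instance cost and blended API price below are assumptions for illustration, and ops headcount typically pushes the real break-even higher:

```python
def breakeven_tokens_per_month(gpu_monthly_cost: float, api_price_per_m: float) -> float:
    """Monthly token volume at which a self-hosted GPU bill equals the API
    bill. Ignores ops headcount, which usually raises the real break-even."""
    return gpu_monthly_cost / api_price_per_m * 1e6

# Assumed: a small GPU instance serving an open model at ~$500/month,
# vs a blended API price of ~$5 per 1M tokens.
print(f"{breakeven_tokens_per_month(500, 5.0) / 1e6:.0f}M tokens/month")  # → 100M tokens/month
```

Run it with the customer's actual GPU quote and their actual model mix; the answer moves a lot with both inputs.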

One deal, end to end

The ask: A healthcare company wants to automate prior authorization reviews. 500 reviews per day, each involving a 10-page clinical document and a coverage policy. Currently done by nurses at $45/hour, averaging 20 minutes per review.

Qualification: Data exists (clinical docs in their EHR, policies in a document store). Error tolerance is low — a wrong authorization denial delays patient care. This isn’t a 95%-is-fine use case. It’s a 99%+ requirement on denials, with a human-in-the-loop for all AI-recommended denials.

Cost model: 500 reviews/day × ~8,000 input tokens per review (document + policy) × Sonnet pricing. Base token cost: ~$600/month. Add RAG for policy retrieval: +$100/month. Add eval suite: +$50/month. Add human review on all denials (~30% of volume: 150/day × 5 min of nurse time = 12.5 hours/day × $45/hr ≈ $560/day, or ~$12,400/month over 22 workdays). Total AI-assisted cost: ~$13,200/month. Current cost: 500 × 20 min × $45/hr = $7,500/day ≈ $165,000/month over 22 workdays. Net savings: ~$150,000/month. Even with the human review cost, the ROI closes immediately because the current process is expensive.

Architecture: RAG for policy lookup (policies change quarterly, need re-indexing). Structured output for the authorization decision (approve/deny/review with cited policy sections). Human-in-the-loop for all denials. Monitoring: track denial override rate (nurse disagrees with AI) — if it exceeds 5%, the system needs prompt revision.

What could kill it: Compliance review (HIPAA, 60-90 day timeline — start immediately). Data integration (EHR extraction may require vendor cooperation). Stakeholder alignment (clinical staff may resist AI involvement in care decisions — address with the human-in-the-loop design).

Outcome: The deal scopes at $200K implementation + $24K/year platform fee. The customer sees $1.7M/year in savings. The pilot criteria: denial override rate under 5% on 100 cases.

The 60-day milestone

Days 1-15: Learn token economics cold. Estimate costs — with ranges — for any use case on a whiteboard. Practice with five real customer scenarios. Build a spreadsheet model you can adapt in real time.

Days 16-30: Build compliance fluency. Read the DPAs for Anthropic, OpenAI, and your cloud provider. Prepare a one-page compliance comparison matrix. Prepare the security questionnaire responses before anyone asks.

Days 31-45: Re-qualify three stalled deals using the framework above. For each: identify the actual blocker (data, compliance, stakeholder, cost). Write a one-page technical scope that addresses it directly.

Days 46-60: Scope a complete AI engagement for a real customer: qualified use case, architecture recommendation with failure modes, cost model with ranges, compliance approach, pilot success criteria, and 90-day implementation timeline. Present it to the customer.

At the end of 60 days, you’ve scoped a real AI engagement with cost ranges, compliance work started in parallel, and pilot criteria agreed upfront. That’s not a certification — it’s a deal that closes because you did the work that nobody else in the room could do.