LLM Cost Audit

Fig. 00 — Inference cost audits

Your LLM bill is 40–70% larger than it needs to be.

An independent audit of your OpenAI / Anthropic usage that finds exactly where the money leaks — uncached prompts, over-powered models, missing batch jobs, bloated context — and hands you the dollar math to recover it.

$750 fixed assessment · findings report · confidential under NDA

Current spend
$11,000/mo

Your context stack — re-sent on every call.

Fig. 01 — Where it leaks

Inaction isn't free. It bleeds — on every call.

The audit tears the context stack apart layer by layer. Each one is a leak you pay for on every request — and recover once it's fixed.

  • 01

    Uncached context

    The system prompt, tools and retrieved docs are reprocessed on every call — full price for the same tokens, thousands of times a day.

    $6,400/mo
    −$5,000
  • 02

    Over-powered model

    A frontier model runs classification, routing and extraction that a tier 5× cheaper handles identically.

    $2,400/mo
    −$1,300
  • 03

    No Batch API

    Async work — enrichment, evals, nightly jobs — runs at full real-time price when the Batch API would halve it.

    $1,200/mo
    −$500
  • 04

    Context growth

    Conversation history grows unbounded, so every turn costs more than the last. Capping it now keeps unit cost from creeping back up later.

    $1,000/mo
    capped
$11,000 → $4,200/mo · 62% lower unit cost on the sample
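
The totals above follow directly from the four line items, and the arithmetic can be checked with a short sketch. The dollar figures are taken from the list; the dictionary layout and the `summarize` helper are illustrative assumptions, not part of any audit tooling.

```python
# Line items from the breakdown above: (current $/mo, recovered $/mo).
# "Context growth" is capped rather than reduced, so its recovery is 0 here.
LEAKS = {
    "Uncached context": (6400, 5000),
    "Over-powered model": (2400, 1300),
    "No Batch API": (1200, 500),
    "Context growth": (1000, 0),
}


def summarize(leaks):
    """Return (current spend, optimized spend, percent reduction)."""
    current = sum(cur for cur, _ in leaks.values())
    recovered = sum(rec for _, rec in leaks.values())
    optimized = current - recovered
    reduction_pct = round(100 * recovered / current)
    return current, optimized, reduction_pct


current, optimized, reduction_pct = summarize(LEAKS)
print(f"${current:,}/mo -> ${optimized:,}/mo ({reduction_pct}% lower)")
```

Swapping in the line items from a real findings report gives the same before-and-after view for any account.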

Fig. 02 — The method

How the audit works

One fixed-price pass over your real usage, turned into a report you can act on — or hand to your own engineers.

1

Send a usage export

After a mutual NDA, you send a usage export from your provider dashboard — not your data, not your prompts. The $750 assessment begins.

2

Get a findings report

A written report shows exactly where spend leaks, ranked by recoverable dollars, with current-vs-optimized math on each line. It's yours whether or not we go further.

3

Optional implementation

If you want the changes made, implementation is priced on results — a flat fee or a share of verified per-unit savings, whichever applies.

The practice

An engineering-led practice from a low-level performance background — the same discipline used to squeeze compilers and systems, pointed at token spend. Every audit is hands-on, fixed-price, and verified against your own invoices.

  • Compiler-grade rigor
  • Provider-agnostic
  • NDA-first
  • Fixed price

Fig. 03 — Confidentiality

Discretion isn't a policy here. It's the product.

A mutual NDA is signed before any data is shared. Your identity, your usage, and your numbers stay private. The client list is never disclosed — not on this site, not to other prospects, not in conversation.

“Your competitor might be a client — and you'd never know. That's exactly the protection you get.”

It's also the honest answer to “who are your clients?” The same wall that keeps their names private keeps yours private too. You can read the mutual NDA before you send a single byte.

Fig. 04 — Implementation & guarantee

Priced on results you can verify against your own invoices.

The assessment stands alone. If you want the fixes made, implementation is a flat fee or a share of verified savings — whichever applies. If the agreed savings aren't achieved, the savings-based fee isn't owed.

See full pricing

How “savings” is defined

Savings are measured as unit cost — cost per 1,000 calls (or per conversation, or per user) — on a fixed, agreed sample of your traffic, compared before and after.

We measure unit cost, not your total monthly bill, because your usage grows over time. This keeps the number we bill on honest and verifiable against your own invoices.

Payment can be made when your next bill confirms the savings. You're never asked to take the result on faith.
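
As a minimal sketch of the unit-cost definition above, the comparison can be written as cost per 1,000 calls before and after, on the agreed sample. The function names and every number below are hypothetical, chosen only to show why unit cost and the total bill can tell different stories when traffic grows.

```python
def unit_cost_per_1k(total_cost_usd, calls):
    """Cost per 1,000 calls for a fixed sample of traffic."""
    if calls <= 0:
        raise ValueError("calls must be positive")
    return total_cost_usd / calls * 1000


def reduction_pct(before, after):
    """Percent reduction from `before` to `after`."""
    return (before - after) / before * 100


# Hypothetical sample: traffic grows 50% between the two measurements.
before_cost, before_calls = 1100.0, 200_000
after_cost, after_calls = 627.0, 300_000

unit_before = unit_cost_per_1k(before_cost, before_calls)
unit_after = unit_cost_per_1k(after_cost, after_calls)

# Unit cost isolates the effect of the fixes from the effect of growth.
unit_saving = reduction_pct(unit_before, unit_after)
# The raw bill mixes both, so it understates the saving here.
bill_saving = reduction_pct(before_cost, after_cost)

print(f"unit cost: ${unit_before:.2f} -> ${unit_after:.2f} per 1k calls")
print(f"unit-cost reduction: {unit_saving:.0f}%, bill reduction: {bill_saving:.0f}%")
```

In this made-up case the per-call cost falls by roughly 62% while the invoice total falls by only about 43%, because call volume grew in between. The same helper works per conversation or per user by changing the denominator.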

Founding cohort

Now booking the first assessments.

No fabricated logos, no manufactured case studies — just early, hands-on engagements at a fixed price. Send a usage export and find out where your number actually sits.