Fig. 00 — Inference cost audits
Your LLM bill is 40–70% larger than it needs to be.
An independent audit of your OpenAI / Anthropic usage that finds exactly where the money leaks — uncached prompts, over-powered models, missing batch jobs, bloated context — and hands you the dollar math to recover it.
$750 fixed assessment · findings report · confidential under NDA
Your context stack — re-sent on every call.
Fig. 01 — Where it leaks
Inaction isn't free. It bleeds — on every call.
The audit tears the context stack apart layer by layer. Each one is a leak you pay for on every request — and recover once it's fixed.
- 01
Uncached context
The system prompt, tools and retrieved docs are reprocessed on every call — full price for the same tokens, thousands of times a day.
$6,400/mo −$5,000
- 02
Over-powered model
A frontier model runs classification, routing and extraction that a tier 5× cheaper handles identically.
$2,400/mo −$1,300
- 03
No Batch API
Async work — enrichment, evals, nightly jobs — runs at full real-time price when the Batch API would halve it.
$1,200/mo −$500
- 04
Context growth
Conversation history grows unbounded, so every turn costs more than the last. Capping it now keeps unit cost from creeping back later.
$1,000/mo capped
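The four line items above can be tallied as a quick sanity check. This is a minimal sketch, assuming each figure pairs a current monthly spend with the monthly amount recovered once the leak is fixed, and that the capped layer recovers nothing today. The names `LEAKS` and `summarize` are illustrative and not part of any audit tooling.

```python
# Illustrative figures copied from the list above.
# Assumption: each pair is (current monthly spend, monthly recovery after the fix).
LEAKS = {
    "Uncached context": (6400, 5000),
    "Over-powered model": (2400, 1300),
    "No Batch API": (1200, 500),
    "Context growth": (1000, 0),  # listed as "capped": growth stops, no recovery assumed
}

def summarize(leaks):
    """Return (current total, recovered total, recovered share of current)."""
    current = sum(spend for spend, _ in leaks.values())
    recovered = sum(saved for _, saved in leaks.values())
    return current, recovered, recovered / current

current, recovered, share = summarize(LEAKS)
print(f"current ${current:,}/mo, recovered ${recovered:,}/mo ({share:.0%})")
# prints: current $11,000/mo, recovered $6,800/mo (62%)
```

Under those assumptions the example stack drops from $11,000 to $4,200 per month. Your own numbers will differ, which is what the findings report is for.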
Fig. 02 — The method
How the audit works
One fixed-price pass over your real usage, turned into a report you can act on — or hand to your own engineers.
Send a usage export
After a mutual NDA, you send a usage export from your provider dashboard — not your data, not your prompts. The $750 assessment begins.
Get a findings report
A written report shows exactly where spend leaks, ranked by recoverable dollars, with current-vs-optimized math on each line. It's yours whether or not we go further.
Optional implementation
If you want the changes made, implementation is priced on results — a flat fee or a share of verified per-unit savings, whichever applies.
The practice
An engineering-led practice from a low-level performance background — the same discipline used to squeeze compilers and systems, pointed at token spend. Every audit is hands-on, fixed-price, and verified against your own invoices.
- Compiler-grade rigor
- Provider-agnostic
- NDA-first
- Fixed price
Fig. 03 — Confidentiality
Discretion isn't a policy here. It's the product.
A mutual NDA is signed before any data is shared. Your identity, your usage, and your numbers stay private. The client list is never disclosed — not on this site, not to other prospects, not in conversation.
“Your competitor might be a client — and you'd never know. That's exactly the protection you get.”
It's also the honest answer to “who are your clients?” The same wall that keeps their names private keeps yours private too. You can read the mutual NDA before you send a single byte.
Fig. 04 — Implementation & guarantee
Priced on results you can verify against your own invoices.
The assessment stands alone. If you want the fixes made, implementation is a flat fee or a share of verified savings — whichever applies. If the agreed savings aren't achieved, the savings-based fee isn't owed.
See full pricing
How “savings” is defined
Savings are measured as unit cost — cost per 1,000 calls (or per conversation, or per user) — on a fixed, agreed sample of your traffic, compared before and after.
We measure unit cost, not your total monthly bill, because your usage grows over time. This keeps the number we bill on honest and verifiable against your own invoices.
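The unit-cost comparison described above can be sketched in a few lines. This is a minimal illustration, assuming each measurement is summarized as a total cost and a call count. The function names and the sample numbers are hypothetical, chosen to show why a growing total bill can hide a falling unit cost.

```python
def cost_per_1000_calls(total_cost_usd: float, calls: int) -> float:
    """Unit cost: dollars per 1,000 calls for one measurement period."""
    if calls <= 0:
        raise ValueError("calls must be positive")
    return total_cost_usd * 1000 / calls

def unit_savings(before: float, after: float) -> float:
    """Fractional drop in unit cost between two measurements."""
    return (before - after) / before

# Hypothetical numbers: traffic doubles, so the total bill rises
# from $500 to $600, yet the cost of each call falls.
before = cost_per_1000_calls(total_cost_usd=500.0, calls=100_000)
after = cost_per_1000_calls(total_cost_usd=600.0, calls=200_000)
print(f"${before:.2f} -> ${after:.2f} per 1,000 calls "
      f"({unit_savings(before, after):.0%} lower)")
# prints: $5.00 -> $3.00 per 1,000 calls (40% lower)
```

Billing on the total would read this case as a 20% increase. Billing on unit cost reads it as a 40% reduction, and both inputs can be checked against your invoices and call logs.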
Founding cohort
Now booking the first assessments.
No fabricated logos, no manufactured case studies — just early, hands-on engagements at a fixed price. Send a usage export and find out where your number actually sits.