Writing
The mechanics of an LLM bill
Direct, vendor-neutral notes on where inference spend leaks and how to recover it — the same analysis an assessment runs against your real usage.
- 3 min read
The Anatomy of an LLM Bill: Where 40–70% of Your Inference Spend Actually Leaks
Most AI teams overpay for inference by 40–70% — not because the models are expensive, but because of how requests are constructed. Here's where the money actually goes, and why the bill grows faster than your users.
llm cost optimization, inference cost, openai, anthropic
- 4 min read
A 10-Point Checklist to Audit Your OpenAI / Anthropic Spend
A practical, vendor-neutral checklist for finding waste in your LLM bill — caching, model tiering, batching, context management, and the metrics that actually tell you whether you're efficient.
llm cost audit, cost optimization, openai, anthropic
- 3 min read
Stop Paying Opus Prices for Haiku Work: Model Tiering and the Batch API
Two of the biggest LLM cost levers are choosing the right model tier per task and routing async work through the Batch API. Stacked, they can cut a workload's cost by 90% with no change in output quality.
model selection, batch api, llm cost optimization, anthropic
- 3 min read
Prompt Caching: The Highest-Leverage Cost Lever Most AI Teams Aren't Using
Prompt caching can cut the cost of repeated context by up to 90% on Anthropic and 50% on OpenAI. Here's how it works, the real dollar math, and why most teams leave it switched off.
prompt caching, llm cost optimization, anthropic, openai