Fig. — About

The inference-cost division of House of STK.

House of STK is an independent engineering practice. LLM Cost Audit is its inference-cost division — built on one plain observation: an LLM bill is an engineering problem, not a fixed cost. It runs into real money, and someone has to sit down and fix it.

Why this division exists

The bill is made in the code, not the model rate.

The model price is the part everyone looks at and the part that matters least. The money goes elsewhere: into the prefix you re-send on every call, the frontier model running work a cheaper tier handles identically, the async jobs paying real-time rates, the context that grows unbounded.

That's the same class of inefficiency a systems engineer hunts in a hot loop — only here it's sitting in plain sight on a monthly invoice. We started this division to point that discipline at token spend: find where unit cost leaks, fix it without touching output, and prove the result against your own invoices.

Every engagement is run hands-on. No junior hand-off, no generic script — the analysis is done by the engineer whose name is on it.

How we work

Four principles the practice runs on

Unit cost, not the bill

We measure cost per 1,000 calls (or per conversation, or per user) on a fixed sample, before and after — the only number that separates real efficiency from usage growth, and the only one we bill savings on.
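A minimal sketch of that before-and-after comparison, assuming per-call costs in USD have already been pulled from a usage export. The function names and sample figures are hypothetical illustrations, not the practice's actual tooling.

```python
def unit_cost_per_1000(call_costs_usd):
    """Mean cost per call in the sample, scaled to 1,000 calls."""
    if not call_costs_usd:
        raise ValueError("sample must contain at least one call")
    return sum(call_costs_usd) / len(call_costs_usd) * 1000


def unit_cost_reduction(before_usd, after_usd):
    """Fractional drop in unit cost between two runs of the same fixed sample."""
    before = unit_cost_per_1000(before_usd)
    after = unit_cost_per_1000(after_usd)
    if before == 0:
        raise ValueError("baseline unit cost is zero; reduction is undefined")
    return (before - after) / before


# Hypothetical per-call costs (USD) for the same four requests, before and after.
before_sample = [0.020, 0.030, 0.010, 0.020]
after_sample = [0.010, 0.015, 0.005, 0.010]

print(f"before: ${unit_cost_per_1000(before_sample):.2f} per 1,000 calls")
print(f"after:  ${unit_cost_per_1000(after_sample):.2f} per 1,000 calls")
print(f"unit-cost reduction: {unit_cost_reduction(before_sample, after_sample):.0%}")
```

Because the figure is a per-call average, doubling traffic over the same mix of requests leaves it unchanged, which is why it can tell real efficiency apart from usage growth in a way the total bill cannot.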

No quality trade-offs

Caching, model tiering, and batching change the price, not the output. The responses your users see are identical; you simply stop overpaying for them.

Confidential by default

A mutual NDA is signed before any data is shared. Your identity, usage, and numbers stay private, and the client list is never disclosed — to anyone.

Priced on results

A fixed $750 assessment that stands on its own, and optional implementation, paid for only when verified savings actually land against your invoices.

Who runs the audits

Soumik Ghosh

Engineer & practice lead

I run the audits here — every one, end to end. My background is in systems and performance engineering; this is the same work, pointed at inference cost. LLM bills get expensive in a handful of predictable ways, and most teams are too busy shipping to chase them down. That's the part I handle: find where the spend leaks, fix it without changing what your users see, and show the math against your own invoices.

Connect on LinkedIn

Naming the person in charge cuts both ways: you know exactly who handles your data — and that same person is bound by the mutual NDA that keeps your identity, usage, and numbers private. Discretion isn't a policy here; it's the product. Read the mutual NDA.

Find out where your number actually sits.

A $750 assessment turns your usage export into a line-by-line findings report — run by the engineer above, confidential under NDA.

Book an assessment

Read the writing