When a millisecond is money
Your p99 is the number that loses you the auction.
In real-time bidding, every AI call sits inside a hard deadline measured in single-digit milliseconds. Vidai's overhead stays flat at peak load, while Python proxies spike into seconds exactly when traffic is highest.
Who this is for
Four models, one auction, one clock.
An RTB auction closes in roughly 100ms. Inside that, several AI models each run on their own far tighter budget. Miss any one of them and the bid is late, which means the bid is lost.
Bid shading & pricing
Predicting the optimal clearing price in a first-price auction so the buyer does not overpay. The tightest budget on the path: it runs on essentially every impression.
CTR / CVR prediction
Scoring click-through and conversion likelihood at bid time. The model that decides whether the impression is worth bidding on at all.
Pre-bid fraud & IVT
Catching bots, data-centre traffic and click farms before money is spent. Gradient-boosted trees over session signatures, run as a parallel ensemble.
Traffic shaping (sell side)
On the SSP, filtering bid requests a buyer is unlikely to want before they ever go over the wire, saving bandwidth and compute at both ends.
Each of these models has its own budget inside the wider auction window. That is why a gateway in front of them cannot add a millisecond it cannot account for. Trading-adjacent inference and live product surfaces face the same shape of deadline.
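The "miss any one and the bid is lost" rule can be sketched in a few lines. This is an illustration only: the budget values, model names, and function names below are hypothetical, not Vidai figures or APIs, and the probability helper assumes the models miss their budgets independently of one another.

```python
# Hypothetical per-model budgets in milliseconds (illustrative, not Vidai figures).
BUDGETS_MS = {
    "bid_shading": 5.0,
    "ctr_cvr": 8.0,
    "fraud_ivt": 10.0,
}


def late_models(latencies_ms, budgets_ms=BUDGETS_MS):
    """Return the sorted names of models that overran their own budget.

    A model with no recorded latency is treated as late.
    """
    return sorted(
        name
        for name, budget in budgets_ms.items()
        if latencies_ms.get(name, float("inf")) > budget
    )


def bid_on_time(latencies_ms, budgets_ms=BUDGETS_MS):
    """The bid is on time only if every model met its budget."""
    return not late_models(latencies_ms, budgets_ms)


def all_on_time_probability(per_model_rate, n_models):
    """Chance that n models all meet budget, assuming independent misses."""
    return per_model_rate ** n_models
```

Under the independence assumption, four models that each meet their budget 99% of the time all land in time on about 96% of auctions (0.99 to the fourth power), which is why per-model tail latency compounds across the path.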
The real problem
It's not the median. It's the peak.
Everything is fast at low traffic. Python middleware degrades non-linearly: the p99 spike arrives at exactly the traffic volume you care about, and a garbage-collection pause can stall a request mid-auction. "Usually fine" is not a latency guarantee when a 5ms budget is on the line.
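A small worked example shows why the median hides the problem. The sample latencies below are invented for illustration (they are not benchmark results), and the nearest-rank percentile is one common definition among several.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample with at least p% of
    all samples at or below it. p is a number in (0, 100]."""
    if not samples:
        raise ValueError("percentile() needs at least one sample")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def share_over_budget(samples, budget_ms):
    """Fraction of requests that exceeded the budget."""
    return sum(1 for s in samples if s > budget_ms) / len(samples)


# Hypothetical latencies in ms: 97 fast requests and 3 stalled ones.
samples = [1.0] * 97 + [250.0] * 3

p50 = percentile(samples, 50)           # 1.0
p99 = percentile(samples, 99)           # 250.0
missed = share_over_budget(samples, 5)  # 0.03
```

In this made-up sample the median sits comfortably at 1ms, yet 3% of requests blow a 5ms budget, and only the p99 reveals it.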
Full benchmark & methodology → · Read the Rust-vs-Python deep dive →
The architecture trend
The hot path is moving off the network hop.
At a million-plus queries per second, a microservice call for every model is dead weight: the network hop alone can spend the budget, and a garbage-collection pause blows the auction outright. The industry response is to compile and distil models down so they run without a hop and without a collector. Vidai is built on the same principle, applied to the layer in front of those models.
Hot path, cold path
Not everything belongs on the clock.
The teams that win the latency game are strict about what runs inside the auction and what does not. Heavy work is pushed off the hot path, pre-computed, and cached. Vidai is built for the part that genuinely has to be on the clock.
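The hot-path and cold-path split can be sketched as a precomputed lookup table. This is a minimal illustration under assumed names: `ScoreCache`, `refresh`, `lookup`, and the scoring function are hypothetical and do not describe Vidai's internals.

```python
class ScoreCache:
    """Heavy scoring runs on the cold path; the hot path only reads."""

    def __init__(self, default_score):
        self._scores = {}
        self._default = default_score

    def refresh(self, compute, keys):
        """Cold path: run the expensive computation off the clock,
        then swap in the new table in a single assignment."""
        self._scores = {key: compute(key) for key in keys}

    def lookup(self, key):
        """Hot path: one dictionary read, no model call. Unknown keys
        fall back to the default instead of triggering computation."""
        return self._scores.get(key, self._default)


calls = []


def expensive_score(segment):
    """Stand-in for heavy work; records each call for demonstration."""
    calls.append(segment)
    return len(segment) / 10


cache = ScoreCache(default_score=0.0)
cache.refresh(expensive_score, ["sports", "news"])  # off the clock
on_clock = [cache.lookup(s) for s in ["sports", "news", "unseen"]]
```

The design choice is that a cache miss on the clock returns a default rather than computing, so the cost of a lookup stays the same whether or not the key was precomputed.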
Why it holds the line
Deterministic by construction, not by tuning.
Put it on your critical path.
A 20-minute technical walkthrough: the latency profile under load, the p99 numbers, and how it behaves at your peak.