When a millisecond is money

Your p99 is the number that loses you the auction.

In real-time bidding, every AI call sits inside a hard deadline measured in milliseconds, often single digits. Vidai's overhead stays flat at the peak, while Python proxies spike into seconds exactly when traffic is highest.

[Figure: latency vs. load, from low traffic to 21,000 RPS. Series: Python proxy; Vidai (sub-2ms, flat). Caption: Everyone is fast at low traffic. Only one stays flat at the peak.]

Who this is for

Four models, one auction, one clock.

An RTB auction closes in roughly 100ms. Inside that, several AI models each run on their own far tighter budget. Miss any one of them and the bid is late, which means the bid is lost.

01 · < 5 ms

Bid shading & pricing

Predicting the optimal clearing price in a first-price auction so the buyer does not overpay. The tightest budget on the path: it runs on essentially every impression.

02 · < 10 ms

CTR / CVR prediction

Scoring click-through and conversion likelihood at bid time. The model that decides whether the impression is worth bidding on at all.

03 · < 15 ms

Pre-bid fraud & IVT

Catching bots, data-centre traffic and click-farms before money is spent. Gradient-boosted trees over session signatures, run as a parallel ensemble.

04 · < 10 ms

Traffic shaping (sell side)

On the SSP, filtering bid requests a buyer is unlikely to want before they ever go over the wire, saving bandwidth and compute at both ends.

The budgets above are per-model, inside the wider auction window. They are the reason a gateway in front of these models cannot add a millisecond it cannot account for. Trading-adjacent inference and live product surfaces face the same shape of deadline.
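As a rough illustration of how those per-model budgets behave, the sketch below checks one bid request's measured model latencies against them. The budget figures come from the list above; the function names, dictionary keys and sample latencies are hypothetical, and the sketch assumes each model is judged against its own budget rather than a running total.

```python
# Per-model budgets in milliseconds, taken from the list above.
BUDGETS_MS = {
    "bid_shading": 5.0,
    "ctr_cvr": 10.0,
    "fraud_ivt": 15.0,
    "traffic_shaping": 10.0,
}


def late_models(latencies_ms, budgets_ms=BUDGETS_MS):
    """Return the models whose measured latency reached or passed its budget."""
    return sorted(
        name for name, took in latencies_ms.items()
        if took >= budgets_ms[name]
    )


def bid_is_on_time(latencies_ms, budgets_ms=BUDGETS_MS):
    """A bid is on time only if every model finished inside its budget."""
    return not late_models(latencies_ms, budgets_ms)


# Hypothetical measurements for one buy-side bid request.
sample = {"bid_shading": 3.2, "ctr_cvr": 11.4, "fraud_ivt": 9.8}
print(late_models(sample))     # ['ctr_cvr']
print(bid_is_on_time(sample))  # False
```

One late model is enough to fail the whole request, which is why overhead added in front of any of them counts against the tightest budget.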

The real problem

It's not the median. It's the peak.

Everything is fast at low traffic. Python middleware degrades non-linearly: the p99 spike lands exactly during the volume you actually care about, and a garbage-collection pause can stall a request mid-auction. "Usually fine" is not a latency guarantee when a 5ms budget is on the line.

40ms
p99 under the 21,803 RPS peak, where Python proxies hit seconds
23.5ms
p95 at that same peak: the tail stays bounded
0
GC pauses on the hot path: the spike has no cause
1.95ms
Median: the floor the tail sits close to
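To see why the median alone says little about the peak, here is a minimal sketch that computes nearest-rank percentiles over a set of latency samples. The samples are synthetic and purely illustrative, not taken from any benchmark; the `percentile` helper is a hypothetical name.

```python
def percentile(samples, p):
    """Nearest-rank percentile; p is an integer from 1 to 100."""
    ordered = sorted(samples)
    # Integer ceiling of p/100 * n, which avoids floating-point rounding.
    rank = max(1, (p * len(ordered) + 99) // 100)
    return ordered[rank - 1]


# Synthetic latencies in ms (not benchmark data): most requests are fast,
# but 2% of them stall for over a second.
latencies_ms = [2.0] * 980 + [1500.0] * 20

p50 = percentile(latencies_ms, 50)
p95 = percentile(latencies_ms, 95)
p99 = percentile(latencies_ms, 99)
print(p50, p95, p99)  # 2.0 2.0 1500.0
```

In this synthetic set the median and p95 look healthy while the p99 is already past any auction deadline, which is the pattern the section describes.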

Full benchmark & methodology →  ·  Read the Rust-vs-Python deep dive →

The architecture trend

The hot path is moving off the network hop.

At a million-plus queries per second, a microservice call for every model is dead weight: the network hop alone can spend the budget, and a garbage-collection pause blows the auction outright. The industry response is to compile and distil models down so they run without a hop and without a collector. Vidai is built on the same physics, applied to the layer in front of those models.

In-path, not another hop. Vidai is not a microservice your bidder calls over the wire. It sits in the request path as one compiled Rust engine, so governing the traffic does not cost you a round trip.
It is the path, not the model. Your bid-shading, CTR and fraud models stay yours, wherever they run. Vidai is the governed layer around them: routing, guardrails, cost and fallback in one pass, not seven.
No collector on the hot path. A Rust engine has no garbage collector, so there is no pause to land mid-auction. The variance a GC introduces is simply not in the system.
The honest line. Vidai does not run inside your bidding loop. It removes the middleware hops and the latency variance around your models, so the budget is spent on inference, not plumbing.
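The "one pass" idea in the list above can be sketched as a single in-process function that routes, applies a guardrail, falls back and attributes cost without calling out to separate services. This is an illustration only, not Vidai's API: every name, route, limit and cost figure below is hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Decision:
    model: str
    allowed: bool
    cost_units: int
    notes: list = field(default_factory=list)


# Hypothetical routing table, fallbacks and limits, for illustration only.
ROUTES = {"bid_shading": "shading-v2", "ctr": "ctr-v7"}
FALLBACKS = {"shading-v2": "shading-static"}
COST_UNITS = {"shading-v2": 2, "shading-static": 0, "ctr-v7": 1}
MAX_PAYLOAD_BYTES = 4096


def govern(task, payload, unhealthy=frozenset()):
    """Route, guard, fall back and attribute cost in one in-process pass."""
    notes = []
    model = ROUTES[task]                          # routing
    if len(payload) > MAX_PAYLOAD_BYTES:          # guardrail
        return Decision(model, False, 0, ["payload too large"])
    if model in unhealthy:                        # fallback
        fallback = FALLBACKS.get(model)
        if fallback is None:
            return Decision(model, False, 0, ["no healthy route"])
        model = fallback
        notes.append("fell back")
    return Decision(model, True, COST_UNITS[model], notes)  # cost attribution


print(govern("bid_shading", b"{}", unhealthy={"shading-v2"}).model)
# shading-static
```

Each stage here is a branch inside one call rather than a request to another process, which is the property the list contrasts with a chain of hops.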

Hot path, cold path

Not everything belongs on the clock.

The teams that win the latency game are strict about what runs inside the auction and what does not. Heavy work is pushed off the hot path, pre-computed, and cached. Vidai is built for the part that genuinely has to be on the clock.

Cold path, pre-computed. Heavy NLP work like contextual and brand-safety classification is too slow for the auction. It runs asynchronously and lands in an in-memory store.
Hot path, a fast lookup. At bid time the auction does a sub-millisecond read against that cache. The intelligence was computed earlier; the clock only sees a lookup.
Vidai governs the hot path. For the calls that must happen inside the window, Vidai is the in-path layer that keeps the overhead flat and deterministic, at the peak as at idle.
Fail fast, never hang. If a downstream model is slow, a circuit breaker trips and the path falls back rather than blowing the deadline. A late bid is a lost bid; a fast fallback is not.
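The fail-fast behaviour described above can be sketched as a small circuit breaker. This is a minimal illustration under stated assumptions, not Vidai's implementation: the class, its parameters and thresholds are hypothetical, and it only notices an overrun after the call returns, whereas a real in-path engine would enforce the deadline with a timeout or cancellation.

```python
import time


class CircuitBreaker:
    """Falls back immediately while open; opens after repeated misses."""

    def __init__(self, max_failures=3, cooldown_s=1.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.cooldown_s = cooldown_s
        self.clock = clock          # injectable so behaviour can be tested
        self.failures = 0
        self.opened_at = None

    def is_open(self):
        if self.opened_at is None:
            return False
        if self.clock() - self.opened_at >= self.cooldown_s:
            # Cooldown elapsed: close the breaker and allow a trial call.
            self.opened_at = None
            self.failures = 0
            return False
        return True

    def call(self, model_fn, fallback_fn, budget_ms):
        if self.is_open():
            return fallback_fn()    # skip the slow model entirely
        start = self.clock()
        try:
            result = model_fn()
            ok = True
        except Exception:
            result = None
            ok = False
        elapsed_ms = (self.clock() - start) * 1000.0
        if not ok or elapsed_ms > budget_ms:
            # An error or a blown budget both count as a failure.
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()
            return fallback_fn()
        self.failures = 0
        return result
```

While the breaker is open the model function is never invoked, so the fallback answer costs only the time of the fallback itself.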

Why it holds the line

Deterministic by construction, not by tuning.

Rust hot path, no garbage collector. No collector pause mid-request. The overhead is the same at the peak as it is at idle; that's the whole point.
One pass, not seven hops. Translation, routing, guardrails, cost and fallback happen in a single in-path engine, so there's no chain of libraries each adding its own millisecond.
In your path, on your infrastructure. No extra network hop out to someone else's cloud and back. The latency you measure is the latency you ship.
Cost control that doesn't cost latency. Per-call attribution and spend circuits run inline within the same sub-2ms overhead, so governing the spend never slows the auction.
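A spend circuit of the kind mentioned in the list above can be sketched as an in-memory counter checked inline on each call. The class name, the per-caller cap and the integer cost units are assumptions made for illustration; the source does not describe how Vidai implements this.

```python
from collections import defaultdict


class SpendCircuit:
    """Tracks spend per caller and rejects calls that would pass the cap."""

    def __init__(self, cap_per_caller):
        self.cap = cap_per_caller
        self.spent = defaultdict(int)   # integer cost units per caller

    def allow(self, caller, cost):
        """Attribute the call's cost if it fits under the cap."""
        if self.spent[caller] + cost > self.cap:
            return False                # circuit tripped: spend unchanged
        self.spent[caller] += cost      # per-call attribution
        return True


circuit = SpendCircuit(cap_per_caller=10)
print(circuit.allow("campaign-a", 4))   # True
print(circuit.allow("campaign-a", 6))   # True
print(circuit.allow("campaign-a", 1))   # False
```

The check is a dictionary lookup and an addition, so in this sketch it adds no network hop and no blocking work to the request.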

Put it on your critical path.

A 20-minute technical walkthrough: the latency profile under load, the p99 numbers, and how it behaves at your peak.