Performance
Raw latency doesn't matter. Latency under load does.
Verified 21,803 requests per second at sub-2ms median on a single node, while Python alternatives fail below 1,000.
Why load is now the only number
A benchmark used to be a vanity metric. Agents made it a survival metric.
In the chat era, your AI traffic was paced by people: one prompt, one call, human typing speed as the ceiling. Agentic AI removes that ceiling. One task now fans out into ten or more autonomous calls, and the traffic through your in-path layer is climbing 10× and still rising. The layer in front of every AI request is no longer handling chat-pace load. It is handling agent-pace load, and its speed under that load is now your AI's speed.
What it means for the business
Fast enough that nobody notices it's there.
Every 100ms of added latency measurably reduces engagement. Vidai sits in the path of every AI request and adds deterministic, sub-millisecond overhead. It doesn't become the bottleneck when traffic spikes 10×.
Performance is a property of being a control plane, not a feature you tune. What's a control plane? →
Verified benchmark
Under a 21,803 RPS stress test.
A single legacy 8-core node, an Intel Xeon E3-1240 v3 from 2013, chosen so the figure is a floor, not a best case. Full methodology and reproduction steps in the technical write-up.
Per-core scaling
Why a single node carries so much.
The whole-machine number above is the headline. The number underneath it is the more telling one. Each core in the test added roughly 7,000 requests per second of headroom in a linear scaling curve, up to about 29,000 RPS in total before shared-hardware contention (NIC, cache) prevented further increase. On the same harness, Python-based AI proxies bottlenecked around 75–110 RPS per core: a per-core efficiency gap of roughly 94×.
That gap is the reason a fleet of Python nodes does not bridge the difference: 94 of them, behind a load balancer, replicate the bottleneck 94 times. The fastest path through this is not more nodes, it's a faster runtime.
Infrastructure footprint
What decides your node count: headroom, or a ceiling.
Nobody runs 21,803 requests per second sustained; that is a stress-test peak. The number that matters is what one node clears with room to spare. With that much headroom, real traffic, even agent-pace traffic, fits inside one node, so you add nodes for redundancy, not to keep up. The slower the runtime, the sooner each node redlines, and then node count is set by load, not by choice.
Why each node carries more
The headroom is a property of a Rust hot path.
The per-node margin that lets you scale by choice is not a tuning trick. It falls out of language and architecture choices that hold the same shape under 10× the load.
Want the benchmark methodology.
A 20-minute technical walkthrough, including how to reproduce the numbers.