AI Traffic Calculator
See what agents do to your AI traffic.
Add up the AI work you actually run: interactive, customer-facing and batch. The multiplier is the part that makes the room go quiet; the node count is the part the platform team has to budget for.
The AI work you run
Switch on the workloads that apply. A real enterprise runs more than one, and the total is what hits your gateway.
Your estimated AI traffic
The infrastructure you need
One node carries it; a second is for failover, not for load. A node here is a single legacy 4-core, 8-thread box (Xeon E3-1240 v3, a 2013-era CPU); modern hardware does more. See the benchmark behind these numbers →
Volume only, not cost: per-token prices move every quarter, the call count does not. The unit is the agent task; fan-out per task is the conservative call-count figure for each agentic stage from agentic-traffic research, 2026 (production runs higher). Day-paced workloads assume a 5× peak-to-average factor; batch sets its own peak from its run window. Per-node throughput is the verified single-node benchmark on the performance page; node counts assume each node is provisioned to 60% with safety headroom, not run flat out.
How we calculate this.
No magic, no cost guesswork. Here is every assumption, so you can check it against your own numbers.
The unit is the agent task
One task is one prompt or job a person or system kicks off. A task fans out into many calls: file reads, tool calls, test runs, retries, sub-agents. So calls = tasks × fan-out. The agentic stage you pick sets the fan-out: ~3 calls per task piloting, ~10 scaling, ~25 agent-first. These are conservative ends of measured ranges; production runs higher.
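As a minimal sketch of that multiplication: the fan-out values are the ones quoted above, while the function name and the 2,000 tasks a day are hypothetical, chosen only to show the arithmetic.

```python
# Fan-out per task by agentic stage, as quoted in the text (conservative ends).
FAN_OUT = {"piloting": 3, "scaling": 10, "agent-first": 25}


def calls_per_day(tasks_per_day: float, stage: str) -> float:
    """calls = tasks x fan-out for the chosen agentic stage."""
    return tasks_per_day * FAN_OUT[stage]


# Hypothetical input: 2,000 tasks a day, shown at each stage.
for stage in FAN_OUT:
    print(f"{stage}: {calls_per_day(2_000, stage):,.0f} calls/day")
```

The same task count produces roughly eight times the calls at agent-first as at piloting, which is the multiplier the calculator surfaces.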
Three workloads, because they scale on different things
- Interactive is driven by headcount: people × tasks per person per day. Engineers on agentic tools, analysts with copilots. Paced over a ~10h working day, 260 days a year.
- Customer-facing is driven by your customer base: customers × daily active share × tasks per session. Support chat, in-app assistants, agentic fraud or KYC checks. Paced over a ~16h active window, 365 days.
- Batch & dataset is driven by the size of the corpus: items per day × tasks per item. Classifying tickets, extracting from documents, enriching records. It runs in a concentrated window you set, so its peak is computed in that window, not smeared across a day.
A real enterprise runs more than one. The calculator sums whichever you switch on.
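The three drivers can be sketched as three small functions whose results are summed. Every input below is hypothetical, and the sketch assumes one session per active customer per day, which the text does not state.

```python
def interactive_tasks(people: int, tasks_per_person_per_day: float) -> float:
    """Headcount-driven: people x tasks per person per day."""
    return people * tasks_per_person_per_day


def customer_tasks(customers: int, daily_active_share: float,
                   tasks_per_session: float) -> float:
    """Customer-base-driven; assumes one session per active customer per day."""
    return customers * daily_active_share * tasks_per_session


def batch_tasks(items_per_day: int, tasks_per_item: float) -> float:
    """Corpus-driven: items per day x tasks per item."""
    return items_per_day * tasks_per_item


# Hypothetical inputs. Leave a workload out of the dict to switch it off.
enabled = {
    "interactive": interactive_tasks(500, 20),
    "customer-facing": customer_tasks(200_000, 0.25, 2),
    "batch": batch_tasks(50_000, 1),
}
total_tasks_per_day = sum(enabled.values())
print(f"{total_tasks_per_day:,.0f} tasks/day across {len(enabled)} workloads")
```

Tasks are still not calls at this point; the fan-out for the chosen agentic stage is applied afterwards.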
From tasks to peak load
Average requests per second is the day's calls divided by the workload's window. Real AI traffic is bursty (payday, market open, an incident), so day-paced workloads carry a fixed 5× peak-to-average factor. A batch job's run window already is its peak window, so it carries a smaller ~1.5× factor for uneven pacing inside it.
We sum the peaks across workloads. That is deliberately the conservative case: it sizes infrastructure for every peak landing at once. In practice batch (overnight) and interactive (daytime) peaks often do not coincide, so your real peak is usually lower. Read the number as a safe ceiling, not a prediction.
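A sketch of the peak calculation, using the windows and factors stated above (10h and 16h at 5×, batch at 1.5×). The daily call volumes and the 4-hour batch window are hypothetical, picked so the arithmetic is easy to follow.

```python
SECONDS_PER_HOUR = 3600


def peak_rps(calls_per_day: float, window_hours: float, peak_factor: float) -> float:
    """Average RPS over the workload's window, scaled by its peak-to-average factor."""
    avg_rps = calls_per_day / (window_hours * SECONDS_PER_HOUR)
    return avg_rps * peak_factor


# (name, calls per day, window in hours, peak factor); volumes are hypothetical.
workloads = [
    ("interactive", 360_000, 10, 5.0),        # ~10h working day, 5x
    ("customer-facing", 1_152_000, 16, 5.0),  # ~16h active window, 5x
    ("batch", 720_000, 4, 1.5),               # hypothetical 4h run window, ~1.5x
]

# Conservative case: assume every workload peaks at the same moment.
total_peak = sum(peak_rps(calls, hours, factor) for _, calls, hours, factor in workloads)
print(f"summed peak: {total_peak:.0f} RPS")
```

With these inputs the three peaks are 50, 100 and 75 RPS, so the summed ceiling is 225 RPS even though the batch peak would normally fall outside working hours.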
From peak load to node count
Nodes = peak RPS divided by what one node usefully carries, rounded up. We start from the verified single-node benchmark: 21,803 RPS for Vidai, ~644 RPS for a typical cloud-managed gateway, ~177 RPS for a Python proxy.
A node here is one machine: a legacy 4-core, 8-thread server, an Intel Xeon E3-1240 v3, a CPU from 2013. We benchmark on deliberately modest, dated hardware so the figure is a floor, not a best case. Current-generation silicon carries more per node, which only widens the gap.
But nobody runs a node at 100%. You provision with headroom, for traffic bursts and for a node failing over. So the usable figure per node is the benchmark × 60%, and the node count is sized against that. This is the honest reason a low-throughput gateway needs a large fleet: not that anyone runs failing nodes, but that a low ceiling, once you leave the safety margin every team leaves, forces many of them. The bar on each row shows how much of one node's provisioned capacity your peak uses.
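The sizing rule can be sketched as follows. The benchmark figures and the 60% provisioning share are the ones quoted above; the 2,000 RPS peak and the function names are hypothetical, and the failover spare is left out of the count.

```python
import math

# Single-node benchmark figures quoted in the text, in requests per second.
BENCHMARK_RPS = {"Vidai": 21_803, "cloud-managed gateway": 644, "Python proxy": 177}
PROVISIONED_SHARE = 0.60  # each node is sized to 60% of its benchmark


def usable_rps(benchmark_rps: float) -> float:
    """What one node carries once headroom is left."""
    return benchmark_rps * PROVISIONED_SHARE


def nodes_needed(peak_rps: float, benchmark_rps: float) -> int:
    """Peak RPS divided by usable per-node RPS, rounded up."""
    return math.ceil(peak_rps / usable_rps(benchmark_rps))


peak = 2_000  # hypothetical summed peak, in RPS
for name, bench in BENCHMARK_RPS.items():
    print(f"{name}: {nodes_needed(peak, bench)} node(s)")
# Vidai: 1 node(s)
# cloud-managed gateway: 6 node(s)
# Python proxy: 19 node(s)
```

At this hypothetical peak the usable ceilings are about 13,082, 386 and 106 RPS per node, which is where the difference in fleet size comes from.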
The Python proxy is shown because it is what many teams run today. At ~177 RPS per node before headroom, its node count climbs fast. That is the cost of horizontal scaling on a slow runtime, made visible, not a put-down.
Volume only, never cost. Per-token prices fall 10–20% a quarter, so a pounds figure drifts; the call count does not.
See it on a real deployment.
A 20-minute technical walkthrough on infrastructure that looks like yours.