ARCA.VISION
// RESOURCES · BENCHMARK · METHODOLOGY

The 20% default,
unpacked.

Every ROI calculator hides at least one assumption. Ours hides a multiplier: current utilization × 1.20, capped at 95%. Here is where that multiplier comes from, what the literature actually says, and where the plausibility cap should sit.

// TL;DR · MODEL · NOT MEASUREMENT

A modeled multiplier,
not a published metric.

The 20% default is a deliberate, conservative input. It is not a measurement from any single customer, not an average across customers, and not a guarantee. It is a modeling rate that lets you size the opportunity before scoping an audit, which is where measured numbers come from.

If you want the measured version of this number against your traffic, the Efficiency Auditor engagement is what produces it.

// IN ONE LINE

uplifted_utilization = min(0.95, current_utilization × 1.20)

recovered_hours = fleet × 8760 × (uplifted_utilization - current_utilization)
incremental_revenue = recovered_hours × sellable_rate
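
The one-line formula translates directly into a few lines of Python. This is a minimal sketch, not the calculator's actual source: the function name `model_uplift`, the yearly horizon, and the clamp that stops a fleet already above the cap from producing a negative uplift are our assumptions for illustration. The fleet size and the $2.00 rate in the usage line are made-up inputs.

```python
HOURS_PER_YEAR = 8760  # 365 days x 24 hours, the constant used in the formula


def model_uplift(fleet, current_utilization, sellable_rate,
                 multiplier=1.20, cap=0.95):
    """Return (uplifted_utilization, recovered_hours, incremental_revenue) per year.

    fleet: number of GPUs; current_utilization: fraction between 0 and 1;
    sellable_rate: price per GPU-hour.
    """
    if not 0.0 <= current_utilization <= 1.0:
        raise ValueError("current_utilization must be between 0 and 1")
    uplifted = min(cap, current_utilization * multiplier)
    # Assumption: a fleet already running above the cap models zero uplift,
    # rather than the negative uplift the raw formula would produce.
    uplifted = max(uplifted, current_utilization)
    recovered_hours = fleet * HOURS_PER_YEAR * (uplifted - current_utilization)
    incremental_revenue = recovered_hours * sellable_rate
    return uplifted, recovered_hours, incremental_revenue


# Illustrative inputs only: 1,000 GPUs at 60% utilization, $2.00 per GPU-hour.
uplifted, hours, revenue = model_uplift(1000, 0.60, 2.00)
print(f"{uplifted:.0%} utilization, {hours:,.0f} hours, ${revenue:,.0f}")
```

With those inputs the model lands at 72% utilization, roughly 1.05 million recovered hours, and about $2.1 million a year. Every figure is a modeled estimate, not a measurement.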

// 01 · WHY A MULTIPLIER, NOT A FLAT PERCENTAGE

A flat percentage flatters busy fleets.

The retail GPU rental calculator on this site uses a flat 20% of the hourly bill. That works when you're modeling avoided expense — every wasted dollar looks the same. For a neocloud operator selling hours, it overstates the opportunity.

// FLAT-PERCENTAGE FRAMING

A 10,000-GPU fleet running at 90% utilization has 10% of its hours unsold. A flat 20% reclaim implies you can sell another 20% of total hours, which would put the fleet at 110% utilization. That is physically impossible.

The flat percentage works for renters because a 20% saving is possible on a bill of any size; it breaks for sellers because a fleet has a hard ceiling of 100% of its hours.

// MULTIPLIER FRAMING

A 1.20× multiplier on current utilization scales gracefully across fleet shapes. A fleet at 50% utilization models out to 60%; a fleet at 80% models to 96% — and then the cap pulls it back to a defensible 95%.

The multiplier respects the physical ceiling. Two operators with different starting utilizations get different absolute uplifts, which mirrors reality.
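
The two framings are easy to compare side by side. The sketch below assumes the flat framing means "add 20 points of total hours" (as in the 10,000-GPU example) and that the multiplier framing uses the 1.20× and 95% defaults. The function names are ours, chosen for illustration.

```python
def flat_target(current, flat=0.20):
    # Flat framing: add 20 points of total hours, with no ceiling applied.
    return current + flat


def multiplier_target(current, multiplier=1.20, cap=0.95):
    # Multiplier framing: scale current utilization, then apply the cap.
    return min(cap, current * multiplier)


for current in (0.50, 0.80, 0.90):
    print(f"start {current:.0%}: "
          f"flat -> {flat_target(current):.0%}, "
          f"multiplier -> {multiplier_target(current):.0%}")
```

At a 90% start the flat framing asks for 110% of the fleet, while the multiplier framing stops at the 95% cap. At a 50% start the multiplier framing is also the more conservative of the two (60% against 70%).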

// 02 · WHAT THE LITERATURE REPORTS

Reported utilization, and what it actually measures.

A handful of publicly disclosed data points anchor the modeling. The numbers below are public references, not Arca.Vision measurements; we cite them to ground the calculator, not to claim them.

// INFERENCE FLEETS · 55–75%

Steady-state utilization reported in vLLM and TensorRT-LLM operator post-mortems. Memory-bound workloads cluster at the low end of this range.

// TRAINING FLEETS · 70–85%

Reported MFU (Model FLOPs Utilization) from large-scale training papers. Note: MFU is not the same as billable utilization; the calculator targets the latter.

// NEOCLOUD DISCLOSED · ~75%

Approximate figure inferred from neocloud public filings. Disclosed utilization is rarely a single number, but mid-70s is a defensible anchor.

// References above are publicly disclosed or inferable. Where the literature and disclosure diverge, the calculator anchors on the more conservative number to keep the modeled uplift defensible in a CFO review.

// 03 · WHY THE CAP MATTERS

Above 95%, the model stops being honest.

A 95% ceiling is not a physical limit. It is a plausibility limit. Above 95%, scheduling overhead, customer-mix volatility, and maintenance windows make sustained utilization implausible, and the marginal hour above 95% is rarely profitable to chase.

// WHY NOT 100%

Every fleet has a non-zero fraction of hours lost to driver upgrades, kernel patches, hardware swaps, and queue-warmup transients. A model that targets 100% utilization implicitly promises to eliminate those, which is not something software can do at the host level.

// WHY NOT 80%

A more conservative 80% cap would shrink the modeled uplift for operators already running at 75%+, and erase it entirely at 80% and above. Those operators are exactly the ones with the measurement infrastructure to check the calculator's claims against their own data, so the cap has to leave them room.
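
To see how much the choice of cap matters, here is a small sensitivity sketch. It assumes the default 1.20× multiplier and compares caps of 80%, 95%, and 100%. The helper names `uplift_points` and `cap_binds_above` are hypothetical, not part of the calculator.

```python
def uplift_points(current, cap, multiplier=1.20):
    """Modeled uplift as a fraction of total hours under a given cap (never negative)."""
    return max(0.0, min(cap, current * multiplier) - current)


def cap_binds_above(cap, multiplier=1.20):
    """Starting utilization above which the cap, not the multiplier, sets the target."""
    return cap / multiplier


for current in (0.60, 0.75, 0.85):
    row = ", ".join(
        f"cap {cap:.0%}: +{uplift_points(current, cap) * 100:.1f} pts"
        for cap in (0.80, 0.95, 1.00)
    )
    print(f"start {current:.0%} -> {row}")

print(f"95% cap starts binding at {cap_binds_above(0.95):.1%} utilization")
```

Under these assumptions an operator at 75% models 15 points of uplift with the 95% cap but only 5 with an 80% cap, and an operator at 85% models none at all under the 80% cap. The 95% cap only starts to bind once starting utilization passes roughly 79%.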

// 04 · ASSUMPTIONS WE'RE OPEN ABOUT

Three places the model is wrong on purpose.

A modeling tool that pretends to be a measurement tool is a liability. Here are the simplifications baked into the calculator and the reasons we made them.

NOTE 01

Sellable rate is flat

Real neocloud pricing is tiered (committed-use, spot, on-demand) and contract-dependent. The calculator treats sellable rate as a single number because asking a prospect for their full price ladder before they trust us is a non-starter.

NOTE 02

All recovered hours sell

We assume the marginal recovered hour can be sold at the entered rate. In practice, the marginal hour may sell at a discount or sit idle if customer demand is the binding constraint. The model is supply-side; the demand side is the operator's problem.

NOTE 03

Utilization uplift is linear with effort

We do not model diminishing returns on the multiplier. A fleet already running well-quantized workloads will see less than 20% relative uplift; a fleet running native FP16 vLLM with default KV-cache will see more. The Auditor engagement is what tells you which side of the line you're on.
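
If you want to stress-test Notes 01 and 02 yourself, the sketch below shows one way to do it. None of this is in the calculator: the tier mix, the rates, and the `sell_through` fraction are hypothetical inputs that we introduce purely for illustration.

```python
def blended_rate(tiers):
    """Weighted-average hourly rate from (share_of_hours, rate) pairs.

    Shares are fractions of sold hours and must sum to 1.
    """
    total_share = sum(share for share, _ in tiers)
    if abs(total_share - 1.0) > 1e-9:
        raise ValueError("tier shares must sum to 1")
    return sum(share * rate for share, rate in tiers)


def discounted_revenue(recovered_hours, rate, sell_through=1.0):
    """Revenue if only a fraction (sell_through) of recovered hours actually sells."""
    if not 0.0 <= sell_through <= 1.0:
        raise ValueError("sell_through must be between 0 and 1")
    return recovered_hours * sell_through * rate


# Hypothetical price ladder: committed-use, on-demand, spot.
tiers = [(0.60, 1.80), (0.30, 2.50), (0.10, 1.00)]
rate = blended_rate(tiers)

# Hypothetical: 1,000,000 recovered hours, of which 70% find a buyer.
print(f"blended rate: ${rate:.2f}/hr")
print(f"revenue at 70% sell-through: ${discounted_revenue(1_000_000, rate, 0.70):,.0f}")
```

Setting `sell_through=1.0` and passing a single rate reproduces the calculator's own simplification, so this is a way to see how far a real price ladder and real demand move the headline number.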

// USE THE CALCULATOR

Now that you know what the
defaults mean, plug yours in.