ARCA.VISION
// THE EFFICIENCY AUDITOR · MEMORY-WALL AUDIT

Kernel-grade FinOps.
For AI fleets.

A scoped memory-wall audit, typically 1–3 weeks. We hook sys_exit_ioctl and measure the nanosecond gap between every MEMCPY at the driver level — zero observer-effect latency. The output: a savings PDF showing exactly how much of your fleet is memory-bound, not compute-bound, what to quantize, and what to reclaim.

$# memory-wall audit · contact engineering
  • On-prem · read-only · no agent SDK installed
  • Roofline classification · per-PID, per-GPU, per-shard
  • Sharding-aware · auto-detects Tensor Parallelism (TP) degree
  • vLLM-aware · KV-cache vs weight-load distinction
  • Quantization advisor · FP16 → INT4 / GGUF guidance
  • Output · savings PDF · roofline + waste score + reclaim plan
IDLE// integration · live walk-through
arca@host ~ $