GPU Memory Budget Viz

A workload-agnostic memory-budget visualiser. Each workload contributes one or more named fixed segments (model weights, activations, OS/framework, …) plus an optional per-request segment that scales linearly with batchSize, so the total is the sum of the fixed segments plus perRequestGb × batchSize. The stacked bar is normalised to capacityGb. The accent colour walks observe → filling → limit → overflow as the total approaches and crosses capacity, and an overflow sliver pulses while the total exceeds the card's capacity. The derived "max batch before OOM" readout makes the cliff explicit.

GPU memory budget. custom (80GB). Total 25 GB of 80GB. Max batch 31.
custom (80GB): 25 GB / 80GB
  • weights: 14 GB
  • activations: 1.2 GB
  • OS/framework: 2.0 GB
  • KV cache: 8.0 GB
Max batch = 31 before OOM · 31.5% used

Installation

npx shadcn@latest add https://craftbits.dev/r/gpu-memory-budget-viz.json

Usage

import { GpuMemoryBudgetViz } from "@craft-bits/viz/gpu-memory-budget-viz";
 
<GpuMemoryBudgetViz
  gpuName="A100"
  capacityGb={80}
  segments={[
    { id: "weights", label: "weights", gb: 14 },
    { id: "activations", label: "activations", gb: 1.2 },
    { id: "overhead", label: "OS/framework", gb: 2 },
  ]}
  perRequest={{ id: "kv-cache", label: "KV cache", perRequestGb: 2 }}
  defaultBatchSize={4}
/>

Drive the batch size from outside (controlled mode):

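import { useState } from "react";
import { GpuMemoryBudgetViz } from "@craft-bits/viz/gpu-memory-budget-viz";

export function ControlledBudget() {
  // ControlledBudget is an illustrative wrapper; the parent owns the batch size.
  const [batchSize, setBatchSize] = useState(4);

  return (
    <GpuMemoryBudgetViz
      capacityGb={24}
      batchSize={batchSize}
      onBatchSizeChange={setBatchSize}
      segments={[{ id: "weights", label: "weights", gb: 14 }]}
      perRequest={{ id: "kv", label: "KV cache", perRequestGb: 2 }}
    />
  );
}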

Fixed-cost-only workload (no batch slider rendered). Here the 37 GB of fixed cost exceeds the 24 GB card, so the bar renders in the overflow phase:

<GpuMemoryBudgetViz
  gpuName="4090"
  capacityGb={24}
  segments={[
    { id: "weights", label: "weights", gb: 35 },
    { id: "overhead", label: "OS/framework", gb: 2 },
  ]}
/>

Understanding the component

  1. Single denominator. Every segment width is gb / capacityGb, so the bar reads as "fraction of capacity used" rather than "fraction of the longest bar in the chart", and the OOM line sits at the same place on every render.
  2. Two kinds of segments. segments[] is the list of fixed costs — model weights, activations, OS/framework overhead. perRequest is the batch-scaled cost — KV cache per request, per-sample activation, etc. The component stacks the fixed segments first, then the per-request segment, then any leftover headroom.
  3. Batch slider. When perRequest is provided, a slider renders below the bar. The slider follows Radix's controlled / uncontrolled pattern — pass batchSize + onBatchSizeChange for controlled, omit both for self-managed.
  4. Phase machine. The accent colour derives from the total fraction of capacity: observe (< 50%), filling (≥ 50% and < 90%), limit (≥ 90% and ≤ 100%), overflow (> 100%). The phase drives the top-edge ribbon, the right-hand readout, and the overflow pulse.
  5. Max-batch readout. floor((capacity − fixedCost) / perRequestGb) is the largest integer batch the workload can fit. When the fixed cost alone exceeds capacity the readout reports "Fixed cost exceeds capacity"; when there is no per-request cost it reports "No per-request cost".
  6. Reduced motion. Under prefers-reduced-motion: reduce, every bar transition collapses to duration: 0 and the overflow sliver renders at a constant 60% opacity instead of pulsing.
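
The arithmetic in points 1, 2, 4, and 5 can be sketched as a few pure functions. This is a minimal illustration, not the component's source: the names computeBudget, phaseFor, and maxBatchFor are hypothetical, and the order in which the two edge-case readouts are checked is an assumption.

```typescript
// Hypothetical helper names; only the formulas and thresholds come from the text above.
type Phase = "observe" | "filling" | "limit" | "overflow";
type MaxBatch = number | "fixed-exceeds-capacity" | "no-per-request-cost";

interface FixedSegment { id: string; label: string; gb: number }
interface PerRequestSegment { id: string; label: string; perRequestGb: number }

// Phase thresholds: < 50% observe, 50% to < 90% filling, 90% to 100% limit, > 100% overflow.
function phaseFor(fraction: number): Phase {
  if (fraction > 1) return "overflow";
  if (fraction >= 0.9) return "limit";
  if (fraction >= 0.5) return "filling";
  return "observe";
}

// floor((capacity - fixedCost) / perRequestGb), with the two edge cases from point 5.
// Assumption: the "fixed cost exceeds capacity" case is checked first.
function maxBatchFor(capacityGb: number, fixedGb: number, perRequest?: PerRequestSegment): MaxBatch {
  if (fixedGb > capacityGb) return "fixed-exceeds-capacity";
  if (!perRequest || perRequest.perRequestGb <= 0) return "no-per-request-cost";
  return Math.floor((capacityGb - fixedGb) / perRequest.perRequestGb);
}

function computeBudget(
  capacityGb: number,
  segments: readonly FixedSegment[],
  perRequest: PerRequestSegment | undefined,
  batchSize: number,
) {
  const fixedGb = segments.reduce((sum, s) => sum + s.gb, 0);
  const scaledGb = perRequest ? perRequest.perRequestGb * batchSize : 0;
  const totalGb = fixedGb + scaledGb;

  // Single denominator: every width is a fraction of capacityGb.
  // Fixed segments stack first, then the per-request segment.
  const widths = segments.map((s) => ({ id: s.id, fraction: s.gb / capacityGb }));
  if (perRequest) widths.push({ id: perRequest.id, fraction: scaledGb / capacityGb });

  const fraction = totalGb / capacityGb;
  return {
    totalGb,
    fraction,
    widths,
    phase: phaseFor(fraction),
    maxBatch: maxBatchFor(capacityGb, fixedGb, perRequest),
  };
}

// The first Usage example: 80 GB card, 17.2 GB fixed, 2 GB per request, batch 4.
const example = computeBudget(
  80,
  [
    { id: "weights", label: "weights", gb: 14 },
    { id: "activations", label: "activations", gb: 1.2 },
    { id: "overhead", label: "OS/framework", gb: 2 },
  ],
  { id: "kv-cache", label: "KV cache", perRequestGb: 2 },
  4,
);
console.log(example.totalGb.toFixed(1), example.phase, example.maxBatch); // 25.2 observe 31
```

With the controlled-mode example (24 GB card, 14 GB of weights, 2 GB per request), the same functions give a max batch of 5: batch 5 lands exactly on capacity and stays in the limit phase, while batch 6 tips into overflow.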

Props

| Prop | Type | Default | Description |
| --- | --- | --- | --- |
| capacityGb | number | - | Total GPU capacity in gigabytes. Clamped to > 0. |
| gpuName | string | - | Optional GPU label rendered in the header. |
| segments | readonly GpuMemoryBudgetVizSegment[] | [] | Fixed-cost segments (weights, activations, overhead). |
| perRequest | GpuMemoryBudgetVizPerRequest | - | Per-request segment that scales with batchSize. Omit to hide the slider. |
| batchSize | number | - | Controlled batch size. Pair with onBatchSizeChange. |
| defaultBatchSize | number | 1 | Uncontrolled initial batch size. |
| onBatchSizeChange | (next: number) => void | - | Fires after the slider moves. |
| minBatchSize | number | 1 | Slider lower bound. |
| maxBatchSize | number | derived | Slider upper bound. Defaults to maxBatch + 8 so users can overshoot. |
| transition | Transition | SPRINGS.smooth | Override the bar-segment growth spring. |
| className | string | - | Merged onto the root via cn(). |
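
The derived maxBatchSize default can be sketched as follows. The helper names resolveSliderBounds and clampBatch are hypothetical, and flooring the upper bound at the lower bound is an assumption added here to keep the range valid. Only the "maxBatch + 8" rule and the minBatchSize default of 1 come from the table.

```typescript
interface SliderBounds { min: number; max: number }

// maxBatch is the integer "max batch before OOM" readout.
function resolveSliderBounds(
  maxBatch: number,
  minBatchSize: number = 1,
  maxBatchSize?: number,
): SliderBounds {
  // Documented default: overshoot the OOM cliff by 8 so users can see the overflow phase.
  const derivedMax = maxBatch + 8;
  // Assumption: never let the upper bound fall below the lower bound.
  const max = Math.max(minBatchSize, maxBatchSize ?? derivedMax);
  return { min: minBatchSize, max };
}

// Keep an externally supplied batch size inside the slider range (integers only).
function clampBatch(value: number, bounds: SliderBounds): number {
  return Math.min(bounds.max, Math.max(bounds.min, Math.round(value)));
}
```

For the demo workload (max batch 31), this yields a slider range of 1 to 39, which leaves eight steps past the cliff.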

Accessibility

  • The root is role="figure" with an aria-labelledby summary that names the GPU, the total memory used, the capacity, and the max-batch headroom.
  • The summary is mirrored in a live region (aria-live="polite") so screen-reader users hear the same update as sighted users when the slider moves.
  • The slider is a native <input type="range"> with aria-valuemin / aria-valuemax / aria-valuenow mirroring its current state and a visible label.
  • Colour is never the only signal — the percentage used, the OOM headroom, and the per-segment GB readouts are all text.
  • Focus styling on the slider uses :focus-visible with a 2px ring offset from the surface so it remains visible against both light and dark themes.
  • Motion respects prefers-reduced-motion: reduce — bar transitions snap and the overflow sliver settles to a constant 60% opacity instead of pulsing.
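
As a rough sketch of the text summary used for the figure label and live region, the function below reproduces the wording of the demo summary shown near the top of this page. The names buildSummary and formatGb are hypothetical, and the rounding rule (no decimals at 10 GB and above, one decimal below) is an assumption inferred from the demo readouts, not a documented behaviour.

```typescript
interface SummaryInput {
  gpuName: string;
  capacityGb: number;
  totalGb: number;
  maxBatch: number;
}

// Assumed rounding rule: "14", "25" for values >= 10; "1.2", "2.0", "8.0" below 10.
function formatGb(gb: number): string {
  return gb >= 10 ? gb.toFixed(0) : gb.toFixed(1);
}

// One sentence per fact so a screen reader announces GPU, usage, and headroom in order.
function buildSummary({ gpuName, capacityGb, totalGb, maxBatch }: SummaryInput): string {
  return (
    `GPU memory budget. ${gpuName} (${capacityGb}GB). ` +
    `Total ${formatGb(totalGb)} GB of ${capacityGb}GB. Max batch ${maxBatch}.`
  );
}
```

Because the same string feeds both the label and the live region, the announcement stays in step with the visible readout when the slider moves.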

Credits

  • Extracted from: craftingattention (app/src/lessons/primitives/nn/GPUMemoryBudgetViz.tsx). The source was a tightly bundled LLM-batching lesson: three hardcoded model configs, two hardcoded GPU configs, lesson-specific phase narration, SvgLabel + ChallengeBtn chrome, raw --color-* track tokens, an inline SPRINGS.gentle import, and an imperative per-segment animate() against rect refs that the lesson re-keyed every slider tick. The viz extract drops the lesson chrome in favour of raw <div> + token-styled controls and generalises the workload to arbitrary segments[] + an optional perRequest, so the same primitive teaches LLM batching, image-model VRAM, training vs inference, etc. It also remaps the palette to var(--cb-*) semantic tokens, surfaces batchSize via the controlled/uncontrolled Radix pattern, and swaps the imperative bar animation for declarative motion.span growth driven by SPRINGS.smooth.