GPU Memory Budget Viz

A workload-agnostic memory-budget visualiser. Each workload contributes one or more named fixed segments (model weights, activations, OS/framework, …) plus an optional per-request segment that scales linearly with batchSize. The stacked bar is normalised to capacityGb, the accent colour walks observe → filling → limit → overflow as the total approaches and crosses capacity, and an overflow sliver pulses once the workload exceeds the card. The derived "max batch before OOM" readout makes the cliff explicit.
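
The arithmetic behind the bar can be sketched as follows. This is a minimal illustration rather than the component's actual source: the interface names FixedSegment and PerRequestSegment and the helper names totalGb and usedFraction are hypothetical, with field names inferred from the usage examples on this page.

```typescript
// Hypothetical shapes, inferred from the usage examples (not the library's exported types).
interface FixedSegment {
  id: string;
  label: string;
  gb: number;
}

interface PerRequestSegment {
  id: string;
  label: string;
  perRequestGb: number;
}

// Total memory = sum of the fixed segments + per-request cost × batch size.
function totalGb(
  segments: readonly FixedSegment[],
  perRequest: PerRequestSegment | undefined,
  batchSize: number,
): number {
  const fixed = segments.reduce((sum, s) => sum + s.gb, 0);
  const scaled = perRequest ? perRequest.perRequestGb * batchSize : 0;
  return fixed + scaled;
}

// Fraction of capacity used; a value above 1 means the workload overflows the card.
function usedFraction(total: number, capacityGb: number): number {
  return total / capacityGb;
}

// Values taken from the first usage example: 80 GB card, batch size 4.
const demoTotal = totalGb(
  [
    { id: "weights", label: "weights", gb: 14 },
    { id: "activations", label: "activations", gb: 1.2 },
    { id: "overhead", label: "OS/framework", gb: 2 },
  ],
  { id: "kv-cache", label: "KV cache", perRequestGb: 2 },
  4,
);
const demoFraction = usedFraction(demoTotal, 80);

console.log(`${demoTotal.toFixed(1)} GB used, ${(demoFraction * 100).toFixed(1)}% of capacity`);
```

With those inputs the fixed cost is 17.2 GB and the KV cache adds 4 × 2 GB, which gives the 31.5% figure shown in the demo readout.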

Example render: custom GPU (80 GB), 25 GB / 80 GB used (31.5%), batch_size 4

weights: 14 GB
activations: 1.2 GB
OS/framework: 2.0 GB
KV cache: 8.0 GB

Max batch = 31 before OOM

Installation

npx shadcn@latest add https://craftbits.dev/r/gpu-memory-budget-viz.json

Usage

import { GpuMemoryBudgetViz } from "@craft-bits/viz/gpu-memory-budget-viz";
 
<GpuMemoryBudgetViz
  gpuName="A100"
  capacityGb={80}
  segments={[
    { id: "weights", label: "weights", gb: 14 },
    { id: "activations", label: "activations", gb: 1.2 },
    { id: "overhead", label: "OS/framework", gb: 2 },
  ]}
  perRequest={{ id: "kv-cache", label: "KV cache", perRequestGb: 2 }}
  defaultBatchSize={4}
/>

Drive the batch size from outside (controlled mode):

import { useState } from "react";
 
const [batchSize, setBatchSize] = useState(4);
 
<GpuMemoryBudgetViz
  capacityGb={24}
  batchSize={batchSize}
  onBatchSizeChange={setBatchSize}
  segments={[{ id: "weights", label: "weights", gb: 14 }]}
  perRequest={{ id: "kv", label: "KV cache", perRequestGb: 2 }}
/>;

Fixed-cost-only workload (no batch slider rendered). Here the 37 GB of fixed cost exceeds the 24 GB card, so the bar renders in the overflow phase:

<GpuMemoryBudgetViz
  gpuName="4090"
  capacityGb={24}
  segments={[
    { id: "weights", label: "weights", gb: 35 },
    { id: "overhead", label: "OS/framework", gb: 2 },
  ]}
/>

Understanding the component

Single denominator. Every segment width is gb / capacityGb. The bar therefore reads as "fraction of capacity used", not "fraction of the longest bar in the chart" — the OOM line is the same place on every render.
Two kinds of segments. segments[] is the list of fixed costs — model weights, activations, OS/framework overhead. perRequest is the batch-scaled cost — KV cache per request, per-sample activation, etc. The component stacks the fixed segments first, then the per-request segment, then any leftover headroom.
Batch slider. When perRequest is provided, a slider renders below the bar. The slider follows Radix's controlled / uncontrolled pattern: pass batchSize + onBatchSizeChange for controlled mode, or omit both (optionally setting defaultBatchSize) to let the component manage its own state.
Phase machine. The accent colour derives from the total as a fraction of capacity: observe (< 50%), filling (50% to < 90%), limit (90% to 100%), overflow (> 100%). The phase drives the top-edge ribbon, the right-hand readout, and the overflow pulse.
Max-batch readout. floor((capacity − fixedCost) / perRequestGb) is the largest integer batch the workload can fit. When the fixed cost alone exceeds capacity the readout reports "Fixed cost exceeds capacity"; when there is no per-request cost it reports "No per-request cost".
Reduced motion. Under prefers-reduced-motion: reduce, every bar transition collapses to duration: 0 and the overflow sliver renders at a constant 60% opacity instead of pulsing.
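
The phase machine and the max-batch readout described above can be sketched as two pure functions. This is an illustration under stated assumptions, not the component's source: the names phaseFor and maxBatchReadout are hypothetical, the handling of the exact 50%, 90% and 100% boundaries is one reasonable reading of the thresholds, and the order in which the two special-case messages are checked is assumed.

```typescript
type Phase = "observe" | "filling" | "limit" | "overflow";

// Map the used fraction (total / capacityGb) to a phase.
// Assumption: boundaries are inclusive on the lower edge, and exactly 100% still counts as "limit".
function phaseFor(fraction: number): Phase {
  if (fraction > 1) return "overflow";
  if (fraction >= 0.9) return "limit";
  if (fraction >= 0.5) return "filling";
  return "observe";
}

// Build the readout text from capacity, summed fixed cost, and the optional per-request cost.
// Assumption: the fixed-cost check runs before the per-request check.
function maxBatchReadout(
  capacityGb: number,
  fixedGb: number,
  perRequestGb?: number,
): string {
  if (fixedGb > capacityGb) return "Fixed cost exceeds capacity";
  if (perRequestGb === undefined || perRequestGb <= 0) return "No per-request cost";
  // Largest integer batch that still fits: floor((capacity − fixedCost) / perRequestGb).
  const maxBatch = Math.floor((capacityGb - fixedGb) / perRequestGb);
  return `Max batch = ${maxBatch} before OOM`;
}

console.log(phaseFor(25.2 / 80)); // phase for the 80 GB usage example at batch size 4
console.log(maxBatchReadout(80, 17.2, 2)); // readout for the same example
```

For the 80 GB usage example the fixed cost is 17.2 GB, so the headroom of 62.8 GB divided by 2 GB per request floors to 31, matching the demo readout.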

Props

Prop	Type	Default	Description
`capacityGb`	`number`	—	Total GPU capacity in gigabytes. Clamped to `> 0`.
`gpuName`	`string`	—	Optional GPU label rendered in the header.
`segments`	`readonly GpuMemoryBudgetVizSegment[]`	`[]`	Fixed-cost segments (weights, activations, overhead).
`perRequest`	`GpuMemoryBudgetVizPerRequest`	—	Per-request segment that scales with `batchSize`. Omit to hide the slider.
`batchSize`	`number`	—	Controlled batch size. Pair with `onBatchSizeChange`.
`defaultBatchSize`	`number`	`1`	Uncontrolled initial batch size.
`onBatchSizeChange`	`(next: number) => void`	—	Fires after the slider moves.
`minBatchSize`	`number`	`1`	Slider lower bound.
`maxBatchSize`	`number`	derived	Slider upper bound. Defaults to `maxBatch + 8` so users can overshoot.
`transition`	`Transition`	`SPRINGS.smooth`	Override the bar-segment growth spring.
`className`	`string`	—	Merged onto the root via `cn()`.
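
The slider-bound defaults in the table can be sketched as below. The helper names resolveSliderBounds and clampBatch are hypothetical, and clamping and rounding the incoming value is an assumption about how a slider like this would typically behave rather than documented behaviour.

```typescript
interface SliderBounds {
  min: number;
  max: number;
}

// maxBatch is the derived "max batch before OOM" value.
// When maxBatchSize is omitted, the upper bound defaults to maxBatch + 8 so users can overshoot.
function resolveSliderBounds(
  maxBatch: number,
  minBatchSize: number = 1,
  maxBatchSize?: number,
): SliderBounds {
  return { min: minBatchSize, max: maxBatchSize ?? maxBatch + 8 };
}

// Assumption: batch sizes are whole numbers kept inside the slider bounds.
function clampBatch(value: number, bounds: SliderBounds): number {
  return Math.min(bounds.max, Math.max(bounds.min, Math.round(value)));
}

const bounds = resolveSliderBounds(31); // 80 GB usage example, where max batch is 31
console.log(bounds, clampBatch(100, bounds));
```

Letting the default upper bound sit above maxBatch is what makes the overflow phase reachable from the slider alone.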

Accessibility

The root is role="figure" with an aria-labelledby summary that names the GPU, the total memory used, the capacity, and the max-batch headroom.
The summary is mirrored in a live region (aria-live="polite") so screen-reader users hear the same update as sighted users when the slider moves.
The slider is a native <input type="range"> with aria-valuemin / aria-valuemax / aria-valuenow mirroring its current state and a visible label.
Colour is never the only signal — the percentage used, the OOM headroom, and the per-segment GB readouts are all text.
Focus styling on the slider uses :focus-visible with a 2px ring offset from the surface so it remains visible against both light and dark themes.
Motion respects prefers-reduced-motion: reduce — bar transitions snap and the overflow sliver settles to a constant 60% opacity instead of pulsing.
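
The reduced-motion behaviour can be sketched as a pure function of the user's preference. Everything here is illustrative: the OverflowMotion shape, the overflowMotion helper, and the pulse keyframe values are hypothetical, and only the duration of 0 and the constant 60% opacity come from the description above. In a browser the flag would typically be read with window.matchMedia("(prefers-reduced-motion: reduce)").matches.

```typescript
// Hypothetical config shape; the component's real internals are not shown in this document.
interface OverflowMotion {
  // undefined means "use the default spring"; 0 means "snap".
  barDurationSeconds: number | undefined;
  // A single number is a static opacity; an array is a list of pulse keyframes.
  overflowOpacity: number | readonly number[];
}

// Illustrative keyframes only; the real pulse values are not documented here.
const PULSE_KEYFRAMES: readonly number[] = [0.4, 1, 0.4];

function overflowMotion(prefersReducedMotion: boolean): OverflowMotion {
  if (prefersReducedMotion) {
    // Bar transitions snap and the overflow sliver holds a constant 60% opacity.
    return { barDurationSeconds: 0, overflowOpacity: 0.6 };
  }
  return { barDurationSeconds: undefined, overflowOpacity: PULSE_KEYFRAMES };
}

console.log(overflowMotion(true), overflowMotion(false));
```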

Credits

Extracted from: craftingattention (app/src/lessons/primitives/nn/GPUMemoryBudgetViz.tsx). The source was a tightly bundled LLM-batching lesson: three hardcoded model configs, two hardcoded GPU configs, lesson-specific phase narration, SvgLabel + ChallengeBtn chrome, raw --color-* track tokens, an inline SPRINGS.gentle import, and an imperative per-segment animate() against rect refs that the lesson re-keyed every slider tick.

The viz extract drops the lesson chrome (raw <div> + token-styled controls), generalises the workload to arbitrary segments[] + an optional perRequest (so the same primitive teaches LLM batching, image-model VRAM, training vs inference, etc.), remaps the palette to var(--cb-*) semantic tokens, surfaces batchSize via the controlled/uncontrolled Radix pattern, and swaps the imperative bar animation for declarative motion.span growth driven by SPRINGS.smooth.