GPU Memory Budget Viz
A workload-agnostic memory-budget visualiser. Each workload contributes one or more named fixed segments (model weights, activations, OS/framework, …) plus an optional per-request segment that scales linearly with batchSize. The stacked bar is normalised to capacityGb, the accent colour walks observe → filling → limit → overflow as the total approaches and crosses capacity, and an overflow sliver pulses once the workload exceeds the card. The derived "max batch before OOM" readout makes the cliff explicit.
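The normalisation described above can be sketched as a pure layout function. This is a minimal sketch that assumes simplified segment shapes mirroring the documented props. `layoutBudget`, `BudgetLayout`, and the other names are hypothetical, not the component's exported API. The sketch stacks fixed segments first, then the per-request segment, and reports headroom and overflow. It does not decide how an overflowing bar is clipped: past capacity, the widths simply sum beyond 100%.

```typescript
// Hypothetical shapes mirroring the documented props; the real
// GpuMemoryBudgetVizSegment / GpuMemoryBudgetVizPerRequest types may differ.
interface Segment {
  id: string;
  label: string;
  gb: number;
}

interface PerRequest {
  id: string;
  label: string;
  perRequestGb: number;
}

interface LaidOutSegment extends Segment {
  widthPct: number; // gb / capacityGb, expressed as a percentage of the bar
}

interface BudgetLayout {
  segments: LaidOutSegment[]; // fixed segments first, then the per-request one
  totalGb: number;
  headroomPct: number; // unused capacity; 0 once the total reaches capacity
  overflowGb: number; // amount by which the total exceeds capacity, else 0
}

// Hypothetical helper: every width shares one denominator (capacity), so the
// OOM line is always at 100% regardless of which segment is largest.
function layoutBudget(
  capacityGb: number,
  fixed: readonly Segment[],
  perRequest: PerRequest | undefined,
  batchSize: number,
): BudgetLayout {
  // Assumption: guard against a non-positive capacity (the docs say it is clamped to > 0).
  const cap = Math.max(capacityGb, Number.EPSILON);
  const stacked: Segment[] = fixed.map((s) => ({ id: s.id, label: s.label, gb: s.gb }));
  if (perRequest) {
    // The batch-scaled cost grows linearly with batchSize.
    stacked.push({
      id: perRequest.id,
      label: perRequest.label,
      gb: perRequest.perRequestGb * batchSize,
    });
  }
  const totalGb = stacked.reduce((sum, s) => sum + s.gb, 0);
  return {
    segments: stacked.map((s) => ({ ...s, widthPct: (s.gb / cap) * 100 })),
    totalGb,
    headroomPct: Math.max(0, ((cap - totalGb) / cap) * 100),
    overflowGb: Math.max(0, totalGb - cap),
  };
}

// Usage with the numbers from the A100 example under Usage (batch size 4).
const layout = layoutBudget(
  80,
  [
    { id: "weights", label: "weights", gb: 14 },
    { id: "activations", label: "activations", gb: 1.2 },
    { id: "overhead", label: "OS/framework", gb: 2 },
  ],
  { id: "kv-cache", label: "KV cache", perRequestGb: 2 },
  4,
);
console.log(layout.totalGb.toFixed(1), layout.headroomPct.toFixed(1));
```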
Preview: custom (80GB), 25 GB of 80GB used (31.5%). Segments: weights 14 GB, activations 1.2 GB, OS/framework 2.0 GB, KV cache 8.0 GB. Max batch = 31 before OOM.
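The preview figures are consistent with the max-batch formula documented under Understanding the component. The 2 GB per-request cost is inferred from the 8.0 GB KV-cache segment at batch size 4, which is the same `perRequestGb: 2` used in the A100 example under Usage. The header appears to round the 25.2 GB total to 25 GB.

$$
\begin{aligned}
\text{fixedCost} &= 14 + 1.2 + 2.0 = 17.2\ \text{GB} \\
\text{total} &= 17.2 + 4 \times 2 = 25.2\ \text{GB}, \qquad \tfrac{25.2}{80} = 31.5\% \;\Rightarrow\; \texttt{observe} \\
\text{maxBatch} &= \left\lfloor \frac{80 - 17.2}{2} \right\rfloor = \lfloor 31.4 \rfloor = 31
\end{aligned}
$$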
Installation
```bash
npx shadcn@latest add https://craftbits.dev/r/gpu-memory-budget-viz.json
```

Usage
```tsx
import { GpuMemoryBudgetViz } from "@craft-bits/viz/gpu-memory-budget-viz";

<GpuMemoryBudgetViz
  gpuName="A100"
  capacityGb={80}
  segments={[
    { id: "weights", label: "weights", gb: 14 },
    { id: "activations", label: "activations", gb: 1.2 },
    { id: "overhead", label: "OS/framework", gb: 2 },
  ]}
  perRequest={{ id: "kv-cache", label: "KV cache", perRequestGb: 2 }}
  defaultBatchSize={4}
/>
```

Drive the batch size from outside (controlled mode):
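```tsx
import { useState } from "react";
import { GpuMemoryBudgetViz } from "@craft-bits/viz/gpu-memory-budget-viz";

// Hooks must run inside a component; the wrapper name is illustrative.
function ControlledBudget() {
  const [batchSize, setBatchSize] = useState(4);
  return (
    <GpuMemoryBudgetViz
      capacityGb={24}
      batchSize={batchSize}
      onBatchSizeChange={setBatchSize}
      segments={[{ id: "weights", label: "weights", gb: 14 }]}
      perRequest={{ id: "kv", label: "KV cache", perRequestGb: 2 }}
    />
  );
}
```

Fixed-cost-only workload (no batch slider rendered):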
```tsx
<GpuMemoryBudgetViz
  gpuName="4090"
  capacityGb={24}
  segments={[
    // 35 + 2 = 37 GB of fixed cost exceeds the 24 GB card, so the bar overflows.
    { id: "weights", label: "weights", gb: 35 },
    { id: "overhead", label: "OS/framework", gb: 2 },
  ]}
/>
```

Understanding the component
- Single denominator. Every segment width is `gb / capacityGb`. The bar therefore reads as "fraction of capacity used", not "fraction of the longest bar in the chart", so the OOM line sits in the same place on every render.
- Two kinds of segments. `segments[]` is the list of fixed costs: model weights, activations, OS/framework overhead. `perRequest` is the batch-scaled cost: KV cache per request, per-sample activations, etc. The component stacks the fixed segments first, then the per-request segment, then any remaining headroom.
- Batch slider. When `perRequest` is provided, a slider renders below the bar. The slider follows Radix's controlled/uncontrolled pattern: pass `batchSize` + `onBatchSizeChange` for controlled mode, or omit both and the component manages the value itself.
- Phase machine. The accent colour derives from the total fraction of capacity: `observe` (below 50%), `filling` (50% to under 90%), `limit` (90% to 100%), `overflow` (above 100%). The phase drives the top-edge ribbon, the right-hand readout, and the overflow pulse.
- Max-batch readout. `floor((capacityGb − fixedCost) / perRequestGb)`, where `fixedCost` is the sum of all `segments[]`, is the largest integer batch the workload can fit. When the fixed cost alone exceeds capacity, the readout reports "Fixed cost exceeds capacity"; when there is no per-request cost, it reports "No per-request cost".
- Reduced motion. Under `prefers-reduced-motion: reduce`, every bar transition collapses to `duration: 0` and the overflow sliver renders at a constant 60% opacity instead of pulsing.
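The phase bands and the max-batch readout above can be sketched as two small pure functions. This is a minimal sketch. `phaseFor`, `maxBatchReadout`, and `readoutText` are hypothetical names. The list does not say which case wins when both apply, so checking "fixed cost exceeds capacity" before "no per-request cost" is an assumption, as is treating exactly 100% as `limit` and a zero `perRequestGb` as "no per-request cost".

```typescript
type Phase = "observe" | "filling" | "limit" | "overflow";

// Hypothetical helper implementing the documented bands. Treating exactly
// 100% as "limit" is an assumption read from "overflow (above 100%)".
function phaseFor(fraction: number): Phase {
  if (fraction > 1) return "overflow";
  if (fraction >= 0.9) return "limit";
  if (fraction >= 0.5) return "filling";
  return "observe";
}

type MaxBatchReadout =
  | { kind: "fits"; maxBatch: number }
  | { kind: "fixed-exceeds-capacity" }
  | { kind: "no-per-request" };

// Hypothetical helper for the max-batch readout. The order of the two
// special cases, and treating perRequestGb <= 0 as "none", are assumptions.
function maxBatchReadout(
  capacityGb: number,
  fixedCostGb: number,
  perRequestGb: number | undefined,
): MaxBatchReadout {
  if (fixedCostGb > capacityGb) return { kind: "fixed-exceeds-capacity" };
  if (perRequestGb === undefined || perRequestGb <= 0) return { kind: "no-per-request" };
  // Largest integer batch whose per-request cost still fits in the remaining space.
  return { kind: "fits", maxBatch: Math.floor((capacityGb - fixedCostGb) / perRequestGb) };
}

function readoutText(r: MaxBatchReadout): string {
  if (r.kind === "fixed-exceeds-capacity") return "Fixed cost exceeds capacity";
  if (r.kind === "no-per-request") return "No per-request cost";
  return `Max batch = ${r.maxBatch} before OOM`;
}

// Preview numbers: 17.2 GB fixed, 2 GB per request, batch 4 on an 80 GB card.
const fixedCostGb = 14 + 1.2 + 2;
console.log(phaseFor((fixedCostGb + 4 * 2) / 80)); // → "observe"
console.log(readoutText(maxBatchReadout(80, fixedCostGb, 2))); // → "Max batch = 31 before OOM"
```

A discriminated union keeps the three readout states explicit, so the text layer cannot accidentally print a batch number for a workload that never fits.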
Props
| Prop | Type | Default | Description |
|---|---|---|---|
| `capacityGb` | `number` | none | Total GPU capacity in gigabytes. Clamped to > 0. |
| `gpuName` | `string` | none | Optional GPU label rendered in the header. |
| `segments` | `readonly GpuMemoryBudgetVizSegment[]` | `[]` | Fixed-cost segments (weights, activations, overhead). |
| `perRequest` | `GpuMemoryBudgetVizPerRequest` | none | Per-request segment that scales with `batchSize`. Omit to hide the slider. |
| `batchSize` | `number` | none | Controlled batch size. Pair with `onBatchSizeChange`. |
| `defaultBatchSize` | `number` | `1` | Uncontrolled initial batch size. |
| `onBatchSizeChange` | `(next: number) => void` | none | Fires after the slider moves. |
| `minBatchSize` | `number` | `1` | Slider lower bound. |
| `maxBatchSize` | `number` | derived | Slider upper bound. Defaults to the derived max batch + 8 so users can drag past the OOM point. |
| `transition` | `Transition` | `SPRINGS.smooth` | Override the bar-segment growth spring. |
| `className` | `string` | none | Merged onto the root via `cn()`. |
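The batch-related props in the table above can be resolved with a small helper. This is a minimal sketch, assuming a hypothetical `resolveSlider` function; it is not the component's internals. `internalValue` stands in for the state an uncontrolled instance would keep. Two behaviours are assumptions: falling back to `minBatchSize + 8` when no batch fits, and rounding and clamping the value into the slider range.

```typescript
// Hypothetical subset of the props in the table above.
interface SliderProps {
  batchSize?: number; // controlled value
  defaultBatchSize?: number; // uncontrolled initial value (documented default: 1)
  minBatchSize?: number; // documented default: 1
  maxBatchSize?: number; // documented default: derived max batch + 8
}

interface SliderState {
  min: number;
  max: number;
  value: number;
  controlled: boolean;
}

// Hypothetical helper. `internalValue` stands in for the state an
// uncontrolled instance would keep after the user drags the slider.
function resolveSlider(
  props: SliderProps,
  derivedMaxBatch: number | undefined, // undefined when no batch fits
  internalValue: number | undefined,
): SliderState {
  const min = props.minBatchSize ?? 1;
  // Assumption: when no batch fits, derive the range from `min` instead.
  const max = Math.max(min, props.maxBatchSize ?? (derivedMaxBatch ?? min) + 8);
  // Radix-style pattern: a defined `batchSize` always wins over internal state.
  const controlled = props.batchSize !== undefined;
  const raw = props.batchSize ?? internalValue ?? props.defaultBatchSize ?? 1;
  // Assumption: the value is rounded and clamped into [min, max].
  const value = Math.min(max, Math.max(min, Math.round(raw)));
  return { min, max, value, controlled };
}

// The A100 example: derived max batch 31, so the slider spans 1..39.
console.log(resolveSlider({ defaultBatchSize: 4 }, 31, undefined));
```

In controlled mode the parent feeds each `onBatchSizeChange` value back in as `batchSize`. In uncontrolled mode the component would update its own `internalValue` instead.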
Accessibility
- The root is `role="figure"` with an `aria-labelledby` summary that names the GPU, the total memory used, the capacity, and the max-batch headroom.
- The summary is mirrored in a live region (`aria-live="polite"`), so screen-reader users hear the same update that sighted users see when the slider moves.
- The slider is a native `<input type="range">` with a visible label and with `aria-valuemin` / `aria-valuemax` / `aria-valuenow` mirroring its current state.
- Colour is never the only signal: the percentage used, the OOM headroom, and the per-segment GB readouts are all text.
- Focus styling on the slider uses `:focus-visible` with a 2px ring offset from the surface, so it stays visible against both light and dark themes.
- Motion respects `prefers-reduced-motion: reduce`: bar transitions snap and the overflow sliver settles at a constant 60% opacity instead of pulsing.
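The summary string and the reduced-motion behaviour above can be sketched as follows. This is a minimal sketch. `budgetSummary`, `motionPlan`, and the simplified `BarTransition` type are hypothetical. The summary format copies the preview text, and rounding the total to whole gigabytes is an assumption based on the preview showing 25 GB for a 25.2 GB total. In a browser, the `prefersReducedMotion` flag could come from `window.matchMedia("(prefers-reduced-motion: reduce)").matches` or from Motion's `useReducedMotion()` hook.

```typescript
// Hypothetical helper reproducing the summary format seen in the preview:
// "GPU memory budget. custom (80GB). Total 25 GB of 80GB. Max batch 31."
interface SummaryInput {
  gpuName?: string;
  capacityGb: number;
  totalGb: number;
  maxBatch?: number; // omitted when no batch fits or there is no per-request cost
}

function budgetSummary({ gpuName, capacityGb, totalGb, maxBatch }: SummaryInput): string {
  const parts = ["GPU memory budget"];
  if (gpuName) parts.push(`${gpuName} (${capacityGb}GB)`);
  // Assumption: the total is rounded to whole GB, as the preview shows 25 GB for 25.2 GB.
  parts.push(`Total ${Math.round(totalGb)} GB of ${capacityGb}GB`);
  if (maxBatch !== undefined) parts.push(`Max batch ${maxBatch}`);
  return parts.join(". ") + ".";
}

// Hypothetical, simplified transition shape; the real `transition` prop takes Motion's Transition.
type BarTransition = { type: "spring"; stiffness: number; damping: number } | { duration: number };

interface MotionPlan {
  bar: BarTransition;
  sliverPulses: boolean;
  sliverOpacity?: number; // constant opacity, used only when the sliver does not pulse
}

// Under reduced motion, bars snap (duration 0) and the overflow sliver holds at 60% opacity.
function motionPlan(prefersReducedMotion: boolean, spring: BarTransition): MotionPlan {
  return prefersReducedMotion
    ? { bar: { duration: 0 }, sliverPulses: false, sliverOpacity: 0.6 }
    : { bar: spring, sliverPulses: true };
}

console.log(budgetSummary({ gpuName: "custom", capacityGb: 80, totalGb: 25.2, maxBatch: 31 }));
```

Keeping the summary in one pure function lets the same string feed both the `aria-labelledby` target and the live region, so they cannot drift apart.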
Credits
- Extracted from: `craftingattention` (`app/src/lessons/primitives/nn/GPUMemoryBudgetViz.tsx`). The source was a tightly bundled LLM-batching lesson: three hardcoded model configs, two hardcoded GPU configs, lesson-specific phase narration, `SvgLabel` + `ChallengeBtn` chrome, raw `--color-*` track tokens, an inline `SPRINGS.gentle` import, and an imperative per-segment `animate()` against rect refs that the lesson re-keyed on every slider tick. The viz extract drops the lesson chrome (raw `<div>` + token-styled controls) and generalises the workload to arbitrary `segments[]` plus an optional `perRequest`, so the same primitive teaches LLM batching, image-model VRAM, training vs inference, etc. It also remaps the palette to `var(--cb-*)` semantic tokens, surfaces `batchSize` via the controlled/uncontrolled Radix pattern, and swaps the imperative bar animation for declarative `motion.span` growth driven by `SPRINGS.smooth`.