KV Cache Viz
A per-layer view of the K and V tensors that an autoregressive decoder caches as it emits tokens. Each layer renders a pair of small heatmaps — the K cache on the left, the V cache on the right — with rows indexed by attention head and columns indexed by sequence position. As seqLen grows, additional columns light up to show new (K, V) pairs being appended; previously cached columns stay visible at reduced intensity so the picture reads as a cache (kept) rather than a recompute.
This is the shape-of-the-cache sibling to KVCacheBarViz (total cache footprint as a horizontal bar) and KVCacheSizeEstimator (calculator-style GiB readout). Reach for KVCacheViz when the point is the layout itself — that the cache is N layers × H heads × S positions × D dim, not a single number.
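To make the N × H × S × D layout concrete, the sketch below counts the scalar values such a cache holds. The names `CacheShape` and `kvCacheElements` are hypothetical and not part of the library. The factor of two assumes K and V are stored as separate tensors of identical shape.

```typescript
// Hypothetical helper for illustration; not part of @craft-bits/core.
interface CacheShape {
  numLayers: number; // N
  numHeads: number; // H
  seqLen: number; // S
  headDim: number; // D
}

// Total scalar elements cached: K and V each hold N * H * S * D values.
function kvCacheElements(shape: CacheShape): number {
  const perTensor =
    shape.numLayers * shape.numHeads * shape.seqLen * shape.headDim;
  return 2 * perTensor;
}

// Using the documented prop defaults: 6 layers, 3 heads, 16 positions, dim 8.
const elements = kvCacheElements({
  numLayers: 6,
  numHeads: 3,
  seqLen: 16,
  headDim: 8,
});
console.log(elements); // 4608
```

The single number is what the sibling components report; KVCacheViz instead draws the four factors that produce it.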
Installation
npx shadcn@latest add https://craftbits.dev/r/kv-cache-viz.json
Usage
import { KVCacheViz } from "@craft-bits/core";
<KVCacheViz numLayers={6} defaultSeqLen={16} maxSeqLen={32} />
Drive the sequence length from outside (e.g. wire it to a scrubber):
<KVCacheViz
  numLayers={6}
  seqLen={seqLen}
  onSeqLenChange={setSeqLen}
  maxSeqLen={32}
/>
Highlight a single layer to draw the eye during a walkthrough:
<KVCacheViz numLayers={6} currentLayer={2} seqLen={12} maxSeqLen={32} />
Autoplay the cache fill at one column per 500 ms:
<KVCacheViz playing playSpeed={500} numLayers={6} maxSeqLen={32} />
Understanding the component
- One row per layer. The component renders `numLayers` row blocks, each with a small label (`L1`, `L2`, ...) and two heatmaps side by side: K on the left, V on the right. Heads stack vertically inside each heatmap; sequence positions extend horizontally.
- Cells stand in for `headDim` vectors. Each cell represents a length-`headDim` vector, not a scalar. Rather than rendering every dim as a sub-cell (which floods the layout), the component tints each cell with a stable hue stripe driven by its position. That is enough texture to remind viewers there is a vector under the hood, without breaking the grid.
- Fill grows with `seqLen`. Columns with index `< seqLen` light up to accent at ~85% opacity. Columns at or beyond `seqLen` stay at the muted background. The grid template is fixed by `maxSeqLen` so the layout does not reflow as new tokens append.
- Layer highlight. Pass `currentLayer` to focus on a single layer's grids: the matching row gets an accent border and tinted background, and the other rows dim to 40% opacity. Pass `null` (the default) to show every layer equally.
- Controlled or uncontrolled. `seqLen` follows the Radix pattern: pass `seqLen` plus `onSeqLenChange` for controlled mode, or `defaultSeqLen` for uncontrolled.
- Autoplay with `SPRINGS.smooth`. When `playing` is `true`, `seqLen` advances by one every `playSpeed` ms via `window.setInterval`. Cell-fade transitions ride `SPRINGS.smooth` by default. `prefers-reduced-motion: reduce` snaps every fade to instant and disables autoplay.
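The fill rule described in the list above can be sketched as a pure function. This is an illustrative reconstruction, not the component's source: `CellState` and `columnStates` are hypothetical names, and clamping `seqLen` into the range 0 to `maxSeqLen` is an assumption about how out-of-range values would be handled.

```typescript
// Hypothetical sketch of the fill rule; not the component's actual source.
type CellState = "cached" | "unfilled";

// Returns one state per column of the fixed-width grid (maxSeqLen columns).
function columnStates(seqLen: number, maxSeqLen: number): CellState[] {
  // Assumption: clamp so an out-of-range seqLen never fills more columns
  // than the grid has, and never fewer than zero.
  const filled = Math.max(0, Math.min(seqLen, maxSeqLen));
  return Array.from({ length: maxSeqLen }, (_, col): CellState =>
    col < filled ? "cached" : "unfilled",
  );
}

const states = columnStates(3, 5);
console.log(states.join(",")); // cached,cached,cached,unfilled,unfilled
```

Because the array length depends only on `maxSeqLen`, growing `seqLen` changes cell states without changing the number of columns, which is why the layout does not reflow.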
Props
| Prop | Type | Default | Description |
|---|---|---|---|
| `numLayers` | `number` | `6` | Number of decoder layers to render. |
| `seqLen` | `number` | — | Controlled sequence length. |
| `defaultSeqLen` | `number` | `16` | Uncontrolled initial sequence length. |
| `onSeqLenChange` | `(seqLen: number) => void` | — | Fires whenever the sequence length advances. |
| `maxSeqLen` | `number` | `32` | Maximum sequence length the cache can hold. |
| `currentLayer` | `number \| null` | `null` | Layer index (0-based) to highlight. `null` shows every layer equally. |
| `numHeads` | `number` | `3` | Attention heads visualised per layer. |
| `headDim` | `number` | `8` | Per-head dimension. Drives the cell hue stripe. |
| `playing` | `boolean` | `false` | When `true`, autoplay advances `seqLen` until it hits `maxSeqLen`. |
| `playSpeed` | `number` | `600` | Milliseconds between autoplay steps. Floored at 80 ms. |
| `transition` | `Transition` | `SPRINGS.smooth` | Cell-fade transition. |
| `className` | `string` | — | Merged onto the root via `cn()`. |
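The `playing` and `playSpeed` rows describe a simple stepping rule: advance by one per tick, stop at `maxSeqLen`, and never tick faster than every 80 ms. A minimal timer-free sketch of that rule follows. All function names here are hypothetical, and the real component may structure its interval logic differently.

```typescript
// Hypothetical sketch of the autoplay stepping rule; not the library's code.
const MIN_PLAY_SPEED_MS = 80; // the floor stated in the props table

// Interval actually used between steps, after applying the floor.
function effectiveInterval(playSpeed: number): number {
  return Math.max(MIN_PLAY_SPEED_MS, playSpeed);
}

// One autoplay tick: advance by a single column, never past maxSeqLen.
function nextSeqLen(seqLen: number, maxSeqLen: number): number {
  return Math.min(seqLen + 1, maxSeqLen);
}

// Timer-free stand-in for the interval loop: lists every value that
// onSeqLenChange would receive before autoplay stops.
function simulateAutoplay(start: number, maxSeqLen: number): number[] {
  const emitted: number[] = [];
  let current = start;
  while (current < maxSeqLen) {
    current = nextSeqLen(current, maxSeqLen);
    emitted.push(current);
  }
  return emitted;
}

console.log(effectiveInterval(20)); // 80
console.log(simulateAutoplay(29, 32)); // [ 30, 31, 32 ]
```

Keeping the step and the floor as pure functions makes the behaviour checkable without mounting the component or waiting on real timers.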
Accessibility
- The figure is `role="figure"` with a hidden summary listing layer / head / sequence / dim counts, so screen readers hear the shape whenever props change.
- A polite `aria-live` region announces the current `seqLen` whenever it advances.
- Each layer is a `role="group"` with `aria-label="Layer N"` so screen-reader users can navigate the cache by layer.
- The grids themselves are `aria-hidden` decoration; the textual summary carries the meaning. Colour is never the only signal: filled vs unfilled cells differ in both opacity and the legend's textual labels (cached / unfilled).
- Motion respects `prefers-reduced-motion: reduce`: cell fades collapse to instant swaps and autoplay never starts.
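As a rough illustration of the hidden summary mentioned in the list above, the sketch below builds a sentence from the layer, head, sequence, and dim counts. The function name `describeCache` and the exact wording are assumptions; the component's real summary text is not shown in this document.

```typescript
// Hypothetical sketch; the wording of the real summary is not documented here.
interface SummaryProps {
  numLayers: number;
  numHeads: number;
  seqLen: number;
  maxSeqLen: number;
  headDim: number;
}

// Builds the kind of text a visually hidden element could expose to
// screen readers in place of the aria-hidden grids.
function describeCache(p: SummaryProps): string {
  return (
    `KV cache: ${p.numLayers} layers, ${p.numHeads} heads per layer, ` +
    `${p.seqLen} of ${p.maxSeqLen} positions cached, ` +
    `head dimension ${p.headDim}.`
  );
}

console.log(
  describeCache({ numLayers: 6, numHeads: 3, seqLen: 16, maxSeqLen: 32, headDim: 8 }),
);
// KV cache: 6 layers, 3 heads per layer, 16 of 32 positions cached, head dimension 8.
```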
Credits
- Extracted from: `craftingattention` (`app/src/lessons/primitives/viz/KVCacheViz.tsx`). The source paired a naive-recompute lane with a cached lane to make a counting argument (`N(N+1)/2` vs `N`), wrapped in a three-mode `Widget` (Explore / Predict / Challenge) with bookmarks, history, score dots, and per-token feedback. The library extract is a different primitive: instead of arguing the cost, it shows the shape (the actual `[batch, head, seq, head_dim]` tensors stacked per layer), so it reads as a sibling of `KVCacheBarViz` and `KVCacheSizeEstimator` rather than a re-skin. The naive-vs-cached counting story belongs in lesson code, not in the library primitive.
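For readers unfamiliar with the counting argument mentioned above, here is a small sketch of where `N(N+1)/2` vs `N` comes from. It assumes the count is of per-token K/V projections over N decoding steps, which is one common reading. The function names are hypothetical and this is not code from the original lesson.

```typescript
// Hypothetical sketch of the counting argument; not the lesson's code.

// Without a cache, step t recomputes K/V for all t tokens seen so far,
// so the total over n steps is 1 + 2 + ... + n = n(n+1)/2.
function naiveProjectionCount(n: number): number {
  let total = 0;
  for (let step = 1; step <= n; step++) {
    total += step;
  }
  return total;
}

// With a cache, each step computes K/V only for the newly emitted token.
function cachedProjectionCount(n: number): number {
  return n;
}

console.log(naiveProjectionCount(16)); // 136
console.log(cachedProjectionCount(16)); // 16
```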