KV Cache Viz

A per-layer view of the K and V tensors that an autoregressive decoder caches as it emits tokens. Each layer renders a pair of small heatmaps — the K cache on the left, the V cache on the right — with rows indexed by attention head and columns indexed by sequence position. As seqLen grows, additional columns light up to show new (K, V) pairs being appended; previously cached columns stay visible at reduced intensity so the picture reads as a cache (kept) rather than a recompute.

This is the shape-of-the-cache sibling to KVCacheBarViz (total cache footprint as a horizontal bar) and KVCacheSizeEstimator (calculator-style GiB readout). Reach for KVCacheViz when the point is the layout itself: for each of N layers the cache holds a K tensor and a V tensor of H heads × S positions × D dims, not a single number.
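Those axes multiply out to the single number a footprint view reports: 2 (K and V) × N × H × S × D cached values. A minimal TypeScript sketch, assuming batch size 1 and using hypothetical helper names (perLayerShape, cachedElementCount, cachedBytes) that are not part of @craft-bits/core:

```typescript
// Hypothetical helpers (not exported by @craft-bits/core) for the shape
// KVCacheViz draws. Batch size 1 is assumed, so the batch axis is dropped.
interface KVCacheShape {
  numLayers: number; // decoder layers: one row block each
  numHeads: number; // attention heads: heatmap rows
  seqLen: number; // cached positions: filled columns
  headDim: number; // length of the vector behind each cell
}

// Each layer caches K and V, each shaped [numHeads, seqLen, headDim].
function perLayerShape(s: KVCacheShape): [number, number, number] {
  return [s.numHeads, s.seqLen, s.headDim];
}

// Total cached scalars: 2 (K and V) × layers × heads × positions × dim.
function cachedElementCount(s: KVCacheShape): number {
  return 2 * s.numLayers * s.numHeads * s.seqLen * s.headDim;
}

// Footprint in bytes for a given element width (e.g. 2 for fp16/bf16).
function cachedBytes(s: KVCacheShape, bytesPerElement: number): number {
  return cachedElementCount(s) * bytesPerElement;
}

// 6 layers × 3 heads × 24 positions × head dim 8, every position filled.
const demoShape: KVCacheShape = { numLayers: 6, numHeads: 3, seqLen: 24, headDim: 8 };
console.log(perLayerShape(demoShape).join(" × ")); // 3 × 24 × 8
console.log(cachedElementCount(demoShape)); // 6912
console.log(cachedBytes(demoShape, 2)); // 13824
```

The bytes-per-element argument depends on the cache dtype; 2 assumes fp16 or bf16.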

Interactive demo: a KV cache of 6 layers × 3 heads × head dim 8 over 24 sequence positions, with Customize controls for Shape, Focus (highlighted layer), and Playback (step interval).

Installation

npx shadcn@latest add https://craftbits.dev/r/kv-cache-viz.json

Usage

import { KVCacheViz } from "@craft-bits/core";
 
<KVCacheViz numLayers={6} defaultSeqLen={16} maxSeqLen={32} />

Drive the sequence length from outside (e.g. wire it to a scrubber):

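import { useState } from "react";
import { KVCacheViz } from "@craft-bits/core";

function CacheScrubber() {
  // The parent owns seqLen; a scrubber elsewhere can call setSeqLen too.
  const [seqLen, setSeqLen] = useState(12);
  return (
    <KVCacheViz numLayers={6} seqLen={seqLen} onSeqLenChange={setSeqLen} maxSeqLen={32} />
  );
}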
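Controlled mode means the component renders whatever seqLen the parent passes and only reports requested steps through onSeqLenChange. A framework-free TypeScript sketch of that controlled/uncontrolled split; SeqLenState is a hypothetical class, not the component's source, and onSeqLenChange is assumed to fire in both modes:

```typescript
// Hypothetical, framework-free model of the controlled/uncontrolled split that
// seqLen follows. SeqLenState is illustrative, not the component's source.
interface SeqLenOptions {
  seqLen?: number; // controlled: the parent owns the value
  defaultSeqLen?: number; // uncontrolled: initial value only
  onSeqLenChange?: (seqLen: number) => void;
}

class SeqLenState {
  private opts: SeqLenOptions;
  private internal: number;

  constructor(opts: SeqLenOptions) {
    this.opts = opts;
    // Uncontrolled mode starts from defaultSeqLen (16 per the props table).
    this.internal = opts.defaultSeqLen ?? 16;
  }

  get isControlled(): boolean {
    return this.opts.seqLen !== undefined;
  }

  // Controlled mode always renders the parent's value.
  get value(): number {
    return this.opts.seqLen ?? this.internal;
  }

  // A step request: stored locally only when uncontrolled; the callback is
  // assumed to fire in both modes so the parent can follow or own the value.
  advanceTo(next: number): void {
    if (!this.isControlled) this.internal = next;
    this.opts.onSeqLenChange?.(next);
  }

  // Simulates the parent re-rendering with new props.
  setProps(opts: SeqLenOptions): void {
    this.opts = opts;
  }
}

// Uncontrolled: the component tracks seqLen itself.
const uncontrolled = new SeqLenState({});
uncontrolled.advanceTo(17);
console.log(uncontrolled.value); // 17

// Controlled: nothing changes until the parent passes the new value back.
const requested: number[] = [];
const controlled = new SeqLenState({ seqLen: 12, onSeqLenChange: (n) => requested.push(n) });
controlled.advanceTo(13);
console.log(controlled.value); // 12
controlled.setProps({ seqLen: 13 });
console.log(controlled.value); // 13
```

The design point is that in controlled mode the callback is a request, not a state change; the parent decides whether the next render shows it.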

Highlight a single layer to draw the eye during a walkthrough (currentLayer is 0-based, so 2 highlights the row labelled L3):

<KVCacheViz numLayers={6} currentLayer={2} seqLen={12} maxSeqLen={32} />
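The highlight rule is small enough to state as a pure function: a 0-based currentLayer picks one row, that row keeps full strength, the rest drop to 40% opacity, and null leaves every row alone. A TypeScript sketch with a hypothetical layerAppearance helper; the 1-based labels match the L1, L2, ... shown on screen:

```typescript
// Hypothetical helper mirroring the documented currentLayer behaviour.
// currentLayer is 0-based; on-screen labels are 1-based (L1, L2, ...).
interface LayerAppearance {
  label: string; // on-screen label
  highlighted: boolean; // accent border + tinted background
  opacity: number; // 1 for full strength, 0.4 for dimmed rows
}

function layerAppearance(index: number, currentLayer: number | null): LayerAppearance {
  const label = `L${index + 1}`;
  if (currentLayer === null) {
    // No focus: every layer renders equally.
    return { label, highlighted: false, opacity: 1 };
  }
  const highlighted = index === currentLayer;
  return { label, highlighted, opacity: highlighted ? 1 : 0.4 };
}

// currentLayer={2} on a 6-layer cache focuses the third row.
const rows = Array.from({ length: 6 }, (_, i) => layerAppearance(i, 2));
console.log(rows.filter((r) => r.highlighted).map((r) => r.label).join(", ")); // L3
```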

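Autoplay the cache fill from empty at one column per 500 ms (without defaultSeqLen, uncontrolled playback starts from the default of 16):

<KVCacheViz playing playSpeed={500} defaultSeqLen={0} numLayers={6} maxSeqLen={32} />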
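The autoplay rules (one step per playSpeed ms, an 80 ms floor on playSpeed, a stop at maxSeqLen, no autoplay under prefers-reduced-motion: reduce) reduce to a predictable timeline. A TypeScript sketch in which the hypothetical autoplayTimeline helper computes the steps instead of running the component's window.setInterval loop:

```typescript
// Hypothetical simulation of the documented autoplay rules. The component
// itself steps with window.setInterval; this only computes the timeline.
interface AutoplayStep {
  atMs: number; // time since playback started
  seqLen: number; // sequence length after this step
}

function autoplayTimeline(
  startSeqLen: number,
  maxSeqLen: number,
  playSpeed: number,
  reducedMotion: boolean,
): AutoplayStep[] {
  if (reducedMotion) return []; // reduced motion: autoplay never starts
  const interval = Math.max(80, playSpeed); // playSpeed is floored at 80 ms
  const steps: AutoplayStep[] = [];
  let elapsed = 0;
  // One new column per tick until the cache reaches maxSeqLen.
  for (let len = startSeqLen + 1; len <= maxSeqLen; len++) {
    elapsed += interval;
    steps.push({ atMs: elapsed, seqLen: len });
  }
  return steps;
}

// Empty cache to 32 positions at 500 ms per column: 32 steps, 16 s in total.
const timeline = autoplayTimeline(0, 32, 500, false);
console.log(timeline.length, timeline[timeline.length - 1].atMs); // 32 16000
// A playSpeed below the floor runs at 80 ms per step.
console.log(autoplayTimeline(0, 4, 10, false).map((s) => s.atMs).join(", ")); // 80, 160, 240, 320
```

Starting from the uncontrolled default of 16 instead of 0 halves the run to 16 steps.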

Understanding the component

  1. One row per layer. The component renders numLayers row blocks, each with a small label (L1, L2, ...) and two heatmaps side by side — K on the left, V on the right. Heads stack vertically inside each heatmap; sequence positions extend horizontally.
  2. Cells stand in for headDim vectors. Each cell represents a length-headDim vector, not a scalar. Rather than rendering every dim as a sub-cell (which floods the layout), the component tints each cell with a stable hue stripe driven by its position — enough texture to remind viewers there is a vector under the hood, without breaking the grid.
  3. Fill grows with seqLen. Columns with index < seqLen light up in the accent colour at ~85% opacity; columns at or beyond seqLen stay on the muted background. The grid template is fixed by maxSeqLen, so the layout does not reflow as new tokens are appended.
  4. Layer highlight. Pass currentLayer (a 0-based index) to focus on a single layer's grids: the matching row gets an accent border and tinted background, and the other rows dim to 40% opacity. Pass null (the default) to show every layer equally.
  5. Controlled or uncontrolled. seqLen follows the Radix pattern: pass seqLen plus onSeqLenChange for controlled mode, or defaultSeqLen for uncontrolled.
  6. Autoplay and transitions. When playing is true, seqLen advances by one every playSpeed ms (floored at 80 ms) via window.setInterval until it reaches maxSeqLen. Cell-fade transitions use SPRINGS.smooth by default and can be overridden with transition. prefers-reduced-motion: reduce snaps every fade to instant and disables autoplay.
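The fill rule in item 3 can be sketched as a pure function of seqLen and maxSeqLen. In this TypeScript sketch, columnStates, accentOpacity, and gridTemplateColumns are hypothetical helpers; the clamping of seqLen and the exact CSS grid template are assumptions, not documented behaviour:

```typescript
// Hypothetical helpers for the fill rule. K and V grids share the same fill,
// because every emitted token appends one K column and one V column.
type ColumnState = "cached" | "unfilled";

// Always maxSeqLen entries: the grid never grows, only its fill changes.
// Assumption: seqLen is clamped into [0, maxSeqLen] before drawing.
function columnStates(seqLen: number, maxSeqLen: number): ColumnState[] {
  const filled = Math.min(Math.max(seqLen, 0), maxSeqLen);
  return Array.from({ length: maxSeqLen }, (_, col): ColumnState =>
    col < filled ? "cached" : "unfilled",
  );
}

// ~85% accent opacity for cached columns; unfilled columns keep the muted
// background (modelled here as no accent at all).
function accentOpacity(state: ColumnState): number {
  return state === "cached" ? 0.85 : 0;
}

// Assumption: a CSS grid whose column count depends only on maxSeqLen,
// so appending a token never reflows the layout.
function gridTemplateColumns(maxSeqLen: number): string {
  return `repeat(${maxSeqLen}, minmax(0, 1fr))`;
}

const cols = columnStates(12, 32);
console.log(cols.length); // 32
console.log(cols.filter((c) => c === "cached").length); // 12
console.log(accentOpacity(cols[11]), accentOpacity(cols[12])); // 0.85 0
console.log(gridTemplateColumns(32)); // repeat(32, minmax(0, 1fr))
```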

Props

Prop            Type                      Default         Description
numLayers       number                    6               Number of decoder layers to render.
seqLen          number                    -               Controlled sequence length.
defaultSeqLen   number                    16              Uncontrolled initial sequence length.
onSeqLenChange  (seqLen: number) => void  -               Fires whenever the sequence length advances.
maxSeqLen       number                    32              Maximum sequence length the cache can hold.
currentLayer    number | null             null            Layer index (0-based) to highlight. null shows every layer equally.
numHeads        number                    3               Attention heads visualised per layer.
headDim         number                    8               Per-head dimension. Drives the cell hue stripe.
playing         boolean                   false           When true, autoplay advances seqLen until it hits maxSeqLen.
playSpeed       number                    600             Milliseconds between autoplay steps. Floored at 80 ms.
transition      Transition                SPRINGS.smooth  Cell-fade transition.
className       string                    -               Merged onto the root via cn().

Accessibility

  • The figure is role="figure" with a hidden summary listing layer / head / sequence / dim counts, so screen readers get the cache's shape in words; the summary text updates whenever props change.
  • A polite aria-live region announces the current seqLen whenever it advances.
  • Each layer is a role="group" with aria-label="Layer N" so screen-reader users can navigate the cache by layer.
  • The grids themselves are aria-hidden decoration — the textual summary carries the meaning. Colour is never the only signal: filled vs unfilled cells differ in both opacity and the legend's textual labels (cached / unfilled).
  • Motion respects prefers-reduced-motion: reduce — cell fades collapse to instant swaps and autoplay never starts.
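Both announcements can be derived from props alone. A TypeScript sketch, assuming the wording matches what the rendered demo on this page exposes for its hidden summary and polite live region; figureSummary and liveAnnouncement are hypothetical names, not library exports:

```typescript
// Hypothetical string builders for the two accessible texts. Wording follows
// the rendered demo; the helper names are illustrative only.
interface SummaryInput {
  numLayers: number;
  numHeads: number;
  headDim: number;
  seqLen: number;
  maxSeqLen: number;
}

// Hidden figure summary: layer / head / sequence / dim counts.
function figureSummary(s: SummaryInput): string {
  return (
    `KV cache layout. ${s.numLayers} layers, ${s.numHeads} heads per layer, ` +
    `head dim ${s.headDim}, ${s.seqLen} of ${s.maxSeqLen} sequence positions filled.`
  );
}

// Polite aria-live text, refreshed each time seqLen advances.
function liveAnnouncement(seqLen: number, maxSeqLen: number): string {
  return `${seqLen} of ${maxSeqLen} tokens cached.`;
}

const summaryProps: SummaryInput = { numLayers: 6, numHeads: 3, headDim: 8, seqLen: 0, maxSeqLen: 24 };
console.log(figureSummary(summaryProps));
// KV cache layout. 6 layers, 3 heads per layer, head dim 8, 0 of 24 sequence positions filled.
console.log(liveAnnouncement(1, 24)); // 1 of 24 tokens cached.
```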

Credits

  • Extracted from: craftingattention (app/src/lessons/primitives/viz/KVCacheViz.tsx). The source paired a naive-recompute lane with a cached lane to make a counting argument (N(N+1)/2 vs N), wrapped in a three-mode Widget (Explore / Predict / Challenge) with bookmarks, history, score dots, and per-token feedback. The library extract is a different primitive: instead of arguing the cost, it shows the shape — the actual [batch, head, seq, head_dim] tensors stacked per layer — so it reads as a sibling of KVCacheBarViz and KVCacheSizeEstimator rather than a re-skin. The naive-vs-cached counting story belongs in lesson code, not in the library primitive.
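The counting argument the source lesson made is easy to reproduce: at step t a naive decoder recomputes K and V for all t positions, while a cached decoder computes only the new one. A TypeScript sketch for one layer and one head, with hypothetical helper names:

```typescript
// Per-position K/V computations needed to emit n tokens, for one layer and
// one head (multiply by layers × heads for a full model). Hypothetical helpers.
function naiveRecomputeCount(n: number): number {
  // Step t recomputes K and V for all t positions: 1 + 2 + ... + n.
  let total = 0;
  for (let t = 1; t <= n; t++) total += t;
  return total; // equals n(n + 1) / 2
}

function cachedCount(n: number): number {
  // Step t computes K and V only for the new position and reuses the rest.
  return n;
}

for (const n of [8, 24, 32]) {
  console.log(n, naiveRecomputeCount(n), cachedCount(n));
}
// 8 36 8
// 24 300 24
// 32 528 32
```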