KV Cache Viz

A per-layer view of the K and V tensors that an autoregressive decoder caches as it emits tokens. Each layer renders a pair of small heatmaps — the K cache on the left, the V cache on the right — with rows indexed by attention head and columns indexed by sequence position. As seqLen grows, additional columns light up to show new (K, V) pairs being appended; previously cached columns stay visible at reduced intensity so the picture reads as a cache (kept) rather than a recompute.

This is the shape-of-the-cache sibling to KVCacheBarViz (total cache footprint as a horizontal bar) and KVCacheSizeEstimator (calculator-style GiB readout). Reach for KVCacheViz when the point is the layout itself — that the cache is N layers × H heads × S positions × D dim, not a single number.
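To make the four-axis shape concrete, the sketch below counts the scalar elements a cache of that shape would hold. It is an illustration only: `CacheShape`, `kvCacheElements`, and `kvCacheBytes` are hypothetical names, not part of the library. The factor of 2 assumes one K and one V tensor per layer with a batch size of 1, and the 2-byte element width is an assumed fp16 default.

```typescript
// Hypothetical helpers for illustration; not exported by @craft-bits/core.
interface CacheShape {
  numLayers: number; // N
  numHeads: number;  // H
  seqLen: number;    // S (positions currently cached)
  headDim: number;   // D
}

// Total scalar elements across the K and V caches (batch size 1 assumed).
function kvCacheElements(shape: CacheShape): number {
  const perTensor = shape.numHeads * shape.seqLen * shape.headDim;
  return 2 * shape.numLayers * perTensor; // 2 = one K tensor + one V tensor
}

// Byte footprint for a given element width (2 bytes = fp16, an assumption).
function kvCacheBytes(shape: CacheShape, bytesPerElement = 2): number {
  return kvCacheElements(shape) * bytesPerElement;
}

// Shape of a small demo cache filled to 24 positions.
const demoShape: CacheShape = { numLayers: 6, numHeads: 3, seqLen: 24, headDim: 8 };
console.log(kvCacheElements(demoShape)); // 6912
```

The point of the visual is that this single number hides the four axes that produce it.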

[Interactive demo: KV cache layout, 6 L × 3 H × 0/24 S × 8 D. Each of the six layers shows a K[3×24] and a V[3×24] heatmap. Controls: layers (6), heads (3), head dim (8), max seq (24), layer focus (L3), autoplay, speed (500 ms).]

Installation

npx shadcn@latest add https://craftbits.dev/r/kv-cache-viz.json

Usage

import { KVCacheViz } from "@craft-bits/core";
 
<KVCacheViz numLayers={6} defaultSeqLen={16} maxSeqLen={32} />

Drive the sequence length from outside (e.g. wire it to a scrubber):

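import { useState } from "react";

// ControlledDemo is a hypothetical wrapper; the parent owns seqLen.
function ControlledDemo() {
  const [seqLen, setSeqLen] = useState(0);
  return (
    <KVCacheViz numLayers={6} seqLen={seqLen} onSeqLenChange={setSeqLen} maxSeqLen={32} />
  );
}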

Highlight a single layer to draw the eye during a walkthrough (currentLayer is 0-based, so 2 targets the third layer):

<KVCacheViz numLayers={6} currentLayer={2} seqLen={12} maxSeqLen={32} />

Autoplay the cache fill at one column per 500 ms:

<KVCacheViz playing playSpeed={500} numLayers={6} maxSeqLen={32} />
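The autoplay rules documented on this page (advance one column per tick, stop at `maxSeqLen`, floor the interval at 80 ms) can be sketched as two pure functions plus a timer. This is a minimal sketch under those stated rules, not the component's actual implementation: `clampPlaySpeed`, `nextSeqLen`, and `startAutoplay` are hypothetical names, and the sketch uses the global `setInterval` so it also runs outside a browser.

```typescript
const MIN_PLAY_SPEED_MS = 80; // floor taken from the props table

// Never tick faster than the documented floor.
function clampPlaySpeed(playSpeed: number): number {
  return Math.max(MIN_PLAY_SPEED_MS, playSpeed);
}

// Advance by one column, saturating at maxSeqLen.
function nextSeqLen(seqLen: number, maxSeqLen: number): number {
  return Math.min(seqLen + 1, maxSeqLen);
}

// Starts ticking and returns a stop function (hypothetical API shape).
function startAutoplay(
  initialSeqLen: number,
  maxSeqLen: number,
  playSpeed: number,
  onSeqLenChange: (seqLen: number) => void,
): () => void {
  let seqLen = initialSeqLen;
  const id = setInterval(() => {
    const next = nextSeqLen(seqLen, maxSeqLen);
    if (next === seqLen) {
      clearInterval(id); // cache is full, nothing left to append
      return;
    }
    seqLen = next;
    onSeqLenChange(seqLen);
  }, clampPlaySpeed(playSpeed));
  return () => clearInterval(id);
}
```

Keeping the step and clamp logic pure makes the timing rules easy to check without running a timer.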

Understanding the component

One row per layer. The component renders numLayers row blocks, each with a small label (L1, L2, ...) and two heatmaps side by side — K on the left, V on the right. Heads stack vertically inside each heatmap; sequence positions extend horizontally.
Cells stand in for headDim vectors. Each cell represents a length-headDim vector, not a scalar. Rather than rendering every dim as a sub-cell (which floods the layout), the component tints each cell with a stable hue stripe driven by its position — enough texture to remind viewers there is a vector under the hood, without breaking the grid.
Fill grows with seqLen. Columns with index < seqLen light up in the accent colour at ~85% opacity. Columns at or beyond seqLen stay at the muted background. The grid template is fixed by maxSeqLen so the layout does not reflow as new tokens append.
Layer highlight. Pass currentLayer to focus on a single layer's grids — the matching row gets an accent border and tinted background, the other rows dim to 40% opacity. Pass null (the default) to show every layer equally.
Controlled or uncontrolled. seqLen follows the Radix pattern: pass seqLen plus onSeqLenChange for controlled mode, or defaultSeqLen for uncontrolled.
Autoplay with SPRINGS.smooth. When playing is true, seqLen advances by one every playSpeed ms via window.setInterval until it reaches maxSeqLen. Cell-fade transitions ride SPRINGS.smooth by default. When the user has prefers-reduced-motion: reduce set, every fade snaps to instant and autoplay is disabled.
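The fill and highlight rules above reduce to two small predicates. The sketch below is an illustration, not the component's source: `isCached`, `fillRow`, and `layerOpacity` are hypothetical names, and only the thresholds (column index < `seqLen`, 40% opacity for unfocused layers) come from the text.

```typescript
// A column is cached once its index falls below the current sequence length.
function isCached(column: number, seqLen: number): boolean {
  return column < seqLen;
}

// One head's row: always maxSeqLen cells wide, so the grid never reflows.
function fillRow(seqLen: number, maxSeqLen: number): boolean[] {
  return Array.from({ length: maxSeqLen }, (_, column) => isCached(column, seqLen));
}

// Row opacity under layer focus; null means every layer is shown equally.
function layerOpacity(layer: number, currentLayer: number | null): number {
  if (currentLayer === null) return 1;
  return layer === currentLayer ? 1 : 0.4;
}

// Example: two of four positions cached.
console.log(fillRow(2, 4)); // [ true, true, false, false ]
```

Because the row length depends only on `maxSeqLen`, growing `seqLen` flips cells from unfilled to cached without changing the layout.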

Props

Prop	Type	Default	Description
`numLayers`	`number`	`6`	Number of decoder layers to render.
`seqLen`	`number`	—	Controlled sequence length.
`defaultSeqLen`	`number`	`16`	Uncontrolled initial sequence length.
`onSeqLenChange`	`(seqLen: number) => void`	—	Fires whenever the sequence length advances.
`maxSeqLen`	`number`	`32`	Maximum sequence length the cache can hold.
`currentLayer`	`number | null`	`null`	Layer index (0-based) to highlight. `null` shows every layer equally.
`numHeads`	`number`	`3`	Attention heads visualised per layer.
`headDim`	`number`	`8`	Per-head dimension. Drives the cell hue stripe.
`playing`	`boolean`	`false`	When `true`, autoplay advances `seqLen` until it hits `maxSeqLen`.
`playSpeed`	`number`	`600`	Milliseconds between autoplay steps. Floored at 80 ms.
`transition`	`Transition`	`SPRINGS.smooth`	Cell-fade transition.
`className`	`string`	—	Merged onto the root via `cn()`.
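Read as a TypeScript type, the table corresponds roughly to the interface below. This is a sketch reconstructed from the table, not the library's exported type. Treating every prop as optional is an assumption, the `Transition` alias is a placeholder for whatever type the animation library provides, and `withDefaults` is a hypothetical helper that applies only the defaults listed above.

```typescript
// Placeholder: the real Transition type is assumed to come from the animation library.
type Transition = Record<string, unknown>;

// Sketch of the props, reconstructed from the table (optionality is assumed).
interface KVCacheVizProps {
  numLayers?: number;
  seqLen?: number; // controlled value
  defaultSeqLen?: number; // uncontrolled initial value
  onSeqLenChange?: (seqLen: number) => void;
  maxSeqLen?: number;
  currentLayer?: number | null; // 0-based; null shows every layer equally
  numHeads?: number;
  headDim?: number;
  playing?: boolean;
  playSpeed?: number; // milliseconds, floored at 80
  transition?: Transition;
  className?: string;
}

// Defaults copied from the table; props without a listed default are omitted.
const DEFAULTS = {
  numLayers: 6,
  defaultSeqLen: 16,
  maxSeqLen: 32,
  currentLayer: null as number | null,
  numHeads: 3,
  headDim: 8,
  playing: false,
  playSpeed: 600,
};

// Hypothetical helper: caller-supplied props override the defaults.
function withDefaults(props: KVCacheVizProps) {
  return { ...DEFAULTS, ...props };
}
```

For example, `withDefaults({ numLayers: 4 })` keeps `maxSeqLen` at 32 and `playSpeed` at 600 while overriding only the layer count.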

Accessibility

The root element has role="figure" and carries a visually hidden summary listing the layer, head, sequence, and dim counts, so screen readers hear the updated shape whenever props change.
A polite aria-live region announces the current seqLen whenever it advances.
Each layer is a role="group" with aria-label="Layer N" so screen-reader users can navigate the cache by layer.
The grids themselves are aria-hidden decoration — the textual summary carries the meaning. Colour is never the only signal: filled vs unfilled cells differ in both opacity and the legend's textual labels (cached / unfilled).
Motion respects prefers-reduced-motion: reduce — cell fades collapse to instant swaps and autoplay never starts.

Credits

Extracted from: craftingattention (app/src/lessons/primitives/viz/KVCacheViz.tsx). The source paired a naive-recompute lane with a cached lane to make a counting argument (N(N+1)/2 vs N), wrapped in a three-mode Widget (Explore / Predict / Challenge) with bookmarks, history, score dots, and per-token feedback. The library extract is a different primitive: instead of arguing the cost, it shows the shape — the actual [batch, head, seq, head_dim] tensors stacked per layer — so it reads as a sibling of KVCacheBarViz and KVCacheSizeEstimator rather than a re-skin. The naive-vs-cached counting story belongs in lesson code, not in the library primitive.