Multi-Head Split Viz

A teaching visualisation of the "multi-head" part of multi-head attention. One full-width d_model-dimensional bar at the top is sliced by dashed division lines into numHeads heads of size d_k = d_model / numHeads. Each head card carries a label, its d_k, an accent slice bar, and an optional W_q / W_k / W_v projection row pointing at small per-head Q / K / V blocks.

[Interactive demo: multi-head split, d_model = 64 / h = 8 gives d_k = 8. Controls: dModel (64), numHeads (8), currentHead (none), show projections.]

Installation

npx shadcn@latest add https://craftbits.dev/r/multi-head-split-viz.json

Usage

import { MultiHeadSplitViz } from "@craft-bits/core";
 
<MultiHeadSplitViz dModel={64} numHeads={8} />

Highlight one head — the matching slice glows, the others dim:

<MultiHeadSplitViz dModel={64} numHeads={8} currentHead={2} />

Drop the projection row for a leaner figure:

<MultiHeadSplitViz dModel={64} numHeads={4} showProjections={false} />

Understanding the component

One bar, h slices. The bar at the top represents the full d_model-dimensional input. The dashed division lines split it into numHeads equal sections — each section is one head's slice of size d_k = d_model / numHeads.
Why the split matters. Each head runs its own attention in a d_k-dimensional slice rather than over all d_model dimensions. Lower-dimensional per-head attention is cheaper per head and lets independent heads specialise on different relationships between tokens. Because h × d_k = d_model, the total parameter count of W_q, W_k, W_v, W_o stays the same regardless of h.
Projections per head. When showProjections is on, each head card renders W_q, W_k, W_v labels with downward arrows feeding small Q / K / V blocks underneath.
currentHead highlights one head. Setting currentHead gives that head an accent border and scales its card to 1.05, while the other heads dim to 0.45 opacity.
SPRINGS.smooth for the highlight. Active-head transitions ride a smooth spring from @craft-bits/core/motion. prefers-reduced-motion: reduce collapses every spring to an instant swap.
numHeads must divide dModel. If it does not, numHeads is clamped down to the largest divisor of dModel that fits, so every head always gets a whole-number d_k.
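The arithmetic behind the split can be sketched in a few lines of plain TypeScript. This is not the library's source: resolveNumHeads, headSlices, and attentionParamCount are hypothetical helper names, the clamping rule is read here as "largest divisor of dModel not exceeding the requested count", head indices are assumed zero-based, and biases are ignored in the parameter count.

```typescript
// Hypothetical helpers for illustration; not part of @craft-bits/core.

/** Largest divisor of dModel that is <= the requested head count (at least 1). */
function resolveNumHeads(dModel: number, requested: number): number {
  const upper = Math.max(1, Math.min(Math.floor(requested), dModel));
  for (let h = upper; h >= 1; h--) {
    if (dModel % h === 0) return h;
  }
  return 1;
}

interface HeadSlice {
  head: number;  // zero-based head index (assumed)
  start: number; // first dimension of the slice, inclusive
  end: number;   // one past the last dimension, exclusive
}

/** Split [0, dModel) into equal slices of size d_k = dModel / h. */
function headSlices(dModel: number, requested: number): HeadSlice[] {
  const h = resolveNumHeads(dModel, requested);
  const dK = dModel / h;
  return Array.from({ length: h }, (_, i) => ({
    head: i,
    start: i * dK,
    end: (i + 1) * dK,
  }));
}

/**
 * Weight count for W_q, W_k, W_v, W_o with biases ignored:
 * each head has three d_model x d_k projections, and W_o maps
 * the concatenated h * d_k outputs back to d_model.
 */
function attentionParamCount(dModel: number, numHeads: number): number {
  const dK = dModel / numHeads;
  const qkv = numHeads * 3 * dModel * dK;
  const output = numHeads * dK * dModel;
  return qkv + output; // equals 4 * dModel^2 for any valid numHeads
}
```

Under these assumptions, headSlices(64, 8) yields eight slices of width 8, a request for 6 heads falls back to 4, and attentionParamCount(64, h) gives 16384 for h = 1, 4, or 8.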

Props

Prop	Type	Default	Description
`dModel`	`number`	`64`	Full model dimension before splitting.
`numHeads`	`number`	`8`	Number of heads. Must divide `dModel`; otherwise clamped to the largest divisor that fits.
`currentHead`	`number | null`	—	Controlled active head. Pair with `onCurrentHeadChange`.
`defaultCurrentHead`	`number | null`	`null`	Uncontrolled initial active head.
`onCurrentHeadChange`	`(head) => void`	—	Fires when the active head changes.
`showProjections`	`boolean`	`true`	Render the per-head `W_q / W_k / W_v` labels and `Q / K / V` blocks.
`orientation`	`'horizontal' | 'vertical'`	`'horizontal'`	Layout direction of the head cards.
`transition`	`Transition`	`SPRINGS.smooth`	Spring for the active-head transition.
`className`	`string`	—	Merged onto the root via `cn()`.
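The currentHead / defaultCurrentHead / onCurrentHeadChange trio follows the usual controlled-versus-uncontrolled pattern. The sketch below models that pattern in plain TypeScript without React. It is an assumption about how such props typically interact, not the component's actual implementation: createHeadState, resolveActiveHead, and nextActiveHead are hypothetical names, and "pressing the active head clears it to null" is an assumed toggle rule.

```typescript
// Hypothetical model of the controlled/uncontrolled pattern; not library code.
type Head = number | null;

interface HeadProps {
  currentHead?: Head;                          // controlled value, if passed
  defaultCurrentHead?: Head;                   // uncontrolled initial value
  onCurrentHeadChange?: (head: Head) => void;  // change notification
}

/** Controlled when currentHead was passed at all (even as null). */
function resolveActiveHead(currentHead: Head | undefined, internal: Head): Head {
  return currentHead !== undefined ? currentHead : internal;
}

/** Assumed toggle rule: pressing the active head clears it, any other selects it. */
function nextActiveHead(active: Head, pressed: number): Head {
  return active === pressed ? null : pressed;
}

function createHeadState(props: HeadProps) {
  let internal: Head = props.defaultCurrentHead ?? null;
  const active = (): Head => resolveActiveHead(props.currentHead, internal);
  const press = (head: number): void => {
    const next = nextActiveHead(active(), head);
    // Uncontrolled: the component owns the state and updates it itself.
    if (props.currentHead === undefined) internal = next;
    // Either way, the parent is told about the requested change.
    props.onCurrentHeadChange?.(next);
  };
  return { active, press };
}
```

In the controlled case the active head only changes when the parent passes a new currentHead, which is why the table pairs that prop with onCurrentHeadChange.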

Accessibility

The root is role="figure" with an aria-label describing the split dimensions, for example "Multi-head split: d_model=64, 8 heads of d_k=8".
A visually-hidden aria-live="polite" summary narrates the currently highlighted head and its dimension range.
Each head card is a real <button> with aria-pressed and an explicit aria-label — keyboard users can Tab to a head and Space or Enter to toggle it.
Color is never the only signal — every head card carries a numeric badge, a d_k value, and a focus ring on :focus-visible.
prefers-reduced-motion: reduce collapses every spring to an instant swap.
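The accessible strings described above can be derived from the same three numbers the figure draws. A minimal sketch follows: figureLabel reproduces the aria-label format quoted in this section, while liveSummary's wording, the zero-based currentHead, and both function names are assumptions made for illustration.

```typescript
// Hypothetical string builders; only the figureLabel format comes from the docs.

/** aria-label for the root figure, in the format quoted above. */
function figureLabel(dModel: number, numHeads: number): string {
  const dK = dModel / numHeads;
  return `Multi-head split: d_model=${dModel}, ${numHeads} heads of d_k=${dK}`;
}

/**
 * Text for the polite live region. The wording is assumed; it names the
 * highlighted head (one-based for listeners) and its inclusive dimension range.
 */
function liveSummary(
  dModel: number,
  numHeads: number,
  currentHead: number | null, // zero-based index, or null when nothing is highlighted
): string {
  if (currentHead === null) return "No head highlighted";
  const dK = dModel / numHeads;
  const start = currentHead * dK;
  const end = start + dK - 1;
  return `Head ${currentHead + 1} of ${numHeads} highlighted: dimensions ${start} to ${end}`;
}
```

Keeping these as pure functions of dModel, numHeads, and currentHead makes it easy to check that the spoken summary and the drawn slices never drift apart.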

Credits

Extracted from: craftingattention (app/src/lessons/primitives/viz/MultiHeadSplitViz.tsx). The source was a full lesson widget with Explore / Predict / Challenge mode strips, a four-token Q / K / V pipeline SVG, per-head attention-pattern heatmaps, history-undo, narration, and a question bank around d_k and parameter counts. The library extract is the pure splitting visualisation — one d_model bar, h slices with their d_k, and the per-head W_q / W_k / W_v projection labels.