Multi-Head Split Viz

A teaching visualisation of the "multi-head" part of multi-head attention. One full-width d_model-dimensional bar at the top is sliced by dashed division lines into numHeads heads of size d_k = d_model / numHeads. Each head card carries a label, its d_k, an accent slice bar, and an optional W_q / W_k / W_v projection row pointing at small per-head Q / K / V blocks.

[Interactive demo: multi-head split with d_model = 64 and h = 8 heads, so d_k = 64 / 8 = 8. No head highlighted.]

Installation

npx shadcn@latest add https://craftbits.dev/r/multi-head-split-viz.json

Usage

import { MultiHeadSplitViz } from "@craft-bits/core";
 
<MultiHeadSplitViz dModel={64} numHeads={8} />

Highlight one head — the matching slice glows, the others dim:

<MultiHeadSplitViz dModel={64} numHeads={8} currentHead={2} />

Drop the projection row for a leaner figure:

<MultiHeadSplitViz dModel={64} numHeads={4} showProjections={false} />

Understanding the component

  1. One bar, h slices. The bar at the top represents the full d_model-dimensional input. The dashed division lines split it into numHeads equal sections — each section is one head's slice of size d_k = d_model / numHeads.
  2. Why the split matters. Each head works with only d_k dimensions and runs its own attention on that slice. Attention at d_k dimensions is cheaper per head than at the full d_model, and independent heads can specialise on different relationships between tokens. The total parameter count of W_q, W_k, W_v, W_o stays the same regardless of h, because h heads of width d_k always add back up to d_model.
  3. Projections per head. When showProjections is on, each head card renders W_q, W_k, W_v labels with downward arrows feeding small Q / K / V blocks underneath.
  4. currentHead highlights one head. Setting currentHead gives that head an accent border and scales its card to 1.05, while the other heads dim to 0.45 opacity.
  5. SPRINGS.smooth for the highlight. Active-head transitions ride a smooth spring from @craft-bits/core/motion. prefers-reduced-motion: reduce collapses every spring to an instant swap.
  6. numHeads must divide dModel. Invalid combinations are clamped down to the largest divisor of dModel that fits.
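The arithmetic behind points 1, 2, and 6 can be sketched in a few lines of TypeScript. This is a minimal illustration, not the component's source: the helper names (clampNumHeads, splitIntoHeads, projectionParamCount) are hypothetical, and "the largest divisor that fits" is assumed here to mean the largest divisor of dModel that does not exceed the requested numHeads. Inputs are assumed to be positive integers, and biases are ignored in the parameter count.

```typescript
// Sketch only: these helpers are hypothetical and not exported by @craft-bits/core.

/**
 * Assumed clamping rule: walk down from the requested head count until a
 * divisor of dModel is found. 1 always divides, so the loop terminates.
 */
function clampNumHeads(dModel: number, requested: number): number {
  const upper = Math.max(1, Math.min(Math.floor(requested), dModel));
  for (let h = upper; h >= 1; h--) {
    if (dModel % h === 0) return h;
  }
  return 1; // fallback for non-integer dModel
}

interface HeadSlice {
  head: number;  // 0-based head index (an assumption)
  start: number; // first dimension of the slice, inclusive
  end: number;   // one past the last dimension, exclusive
}

/** Split [0, dModel) into equal slices of width d_k = dModel / numHeads. */
function splitIntoHeads(dModel: number, numHeads: number): HeadSlice[] {
  const h = clampNumHeads(dModel, numHeads);
  const dK = dModel / h;
  return Array.from({ length: h }, (_, head) => ({
    head,
    start: head * dK,
    end: (head + 1) * dK,
  }));
}

/**
 * Weight count of W_q, W_k, W_v (per head: dModel x d_k each) plus W_o
 * ((h * d_k) x dModel). Since h * d_k = dModel, the total is 4 * dModel^2
 * for every valid h.
 */
function projectionParamCount(dModel: number, numHeads: number): number {
  const h = clampNumHeads(dModel, numHeads);
  const dK = dModel / h;
  const qkvPerHead = 3 * dModel * dK;
  const outputProjection = h * dK * dModel;
  return h * qkvPerHead + outputProjection;
}
```

Under these assumptions, a request for 6 heads at dModel = 64 would fall back to 4 heads of d_k = 16, and projectionParamCount gives the same total for 4 heads as for 8.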

Props

Prop                 Type                       Default         Description
dModel               number                     64              Full model dimension before splitting.
numHeads             number                     8               Number of heads. Must divide dModel; otherwise clamped to the largest divisor that fits.
currentHead          number | null              -               Controlled active head. Pair with onCurrentHeadChange.
defaultCurrentHead   number | null              null            Uncontrolled initial active head.
onCurrentHeadChange  (head) => void             -               Fires when the active head changes.
showProjections      boolean                    true            Render the per-head W_q / W_k / W_v labels and Q / K / V blocks.
orientation          'horizontal' | 'vertical'  'horizontal'    Layout direction of the head cards.
transition           Transition                 SPRINGS.smooth  Spring for the active-head transition.
className            string                     -               Merged onto the root via cn().
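As a reading aid, the table above can be written as a TypeScript interface. This is a sketch inferred from the table only, not the library's exported types. Several things are assumptions: the names MultiHeadSplitVizProps, DEFAULT_PROPS, and resolveActiveHead; the choice to make every prop optional; and the number | null argument of the callback. The transition prop is left out because its Transition type comes from the motion package.

```typescript
// Hypothetical typing inferred from the props table; the real exported
// types in @craft-bits/core may differ.
type Orientation = "horizontal" | "vertical";

interface MultiHeadSplitVizProps {
  dModel?: number;
  numHeads?: number;
  currentHead?: number | null;        // controlled value
  defaultCurrentHead?: number | null; // uncontrolled initial value
  onCurrentHeadChange?: (head: number | null) => void; // argument type assumed
  showProjections?: boolean;
  orientation?: Orientation;
  className?: string;
}

type DefaultedKeys =
  | "dModel"
  | "numHeads"
  | "defaultCurrentHead"
  | "showProjections"
  | "orientation";

// Defaults copied from the Default column of the table.
const DEFAULT_PROPS: Required<Pick<MultiHeadSplitVizProps, DefaultedKeys>> = {
  dModel: 64,
  numHeads: 8,
  defaultCurrentHead: null,
  showProjections: true,
  orientation: "horizontal",
};

/**
 * Usual controlled/uncontrolled rule in React components (assumed here):
 * when currentHead is passed, even as null, it wins over internal state.
 */
function resolveActiveHead(
  props: MultiHeadSplitVizProps,
  internalHead: number | null,
): number | null {
  return props.currentHead !== undefined ? props.currentHead : internalHead;
}
```

If the component follows this pattern, passing currentHead={null} would mean "controlled, nothing highlighted", whereas omitting currentHead would let the component track the active head itself, starting from defaultCurrentHead.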

Accessibility

  • The root is role="figure" with an aria-label describing the split dimensions ("Multi-head split: d_model=64, 8 heads of d_k=8").
  • A visually-hidden aria-live="polite" summary narrates the currently highlighted head and its dimension range.
  • Each head card is a real <button> with aria-pressed and an explicit aria-label, so keyboard users can Tab to a head and press Space or Enter to toggle it.
  • Color is never the only signal — every head card carries a numeric badge, a d_k value, and a focus ring on :focus-visible.
  • prefers-reduced-motion: reduce collapses every spring to an instant swap.
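The two text alternatives in the list above could be produced by helpers like the following. This is a sketch: describeSplit mirrors the aria-label format quoted in the first bullet, while describeHighlight is hypothetical. Its wording, the 0-based head index, and the inclusive dimension range are assumptions about what the live region might say, not the component's actual strings.

```typescript
// Hypothetical helpers; only the describeSplit format is taken from the docs.

/** Root aria-label, e.g. "Multi-head split: d_model=64, 8 heads of d_k=8". */
function describeSplit(dModel: number, numHeads: number): string {
  const dK = dModel / numHeads;
  return `Multi-head split: d_model=${dModel}, ${numHeads} heads of d_k=${dK}`;
}

/**
 * Text for the aria-live="polite" region. Assumes numHeads divides dModel,
 * heads are indexed from 0, and the reported range is inclusive.
 */
function describeHighlight(
  dModel: number,
  numHeads: number,
  currentHead: number | null,
): string {
  if (currentHead === null) return "No head highlighted.";
  const dK = dModel / numHeads;
  const first = currentHead * dK;
  const last = first + dK - 1;
  return `Head ${currentHead} highlighted: dimensions ${first} to ${last}.`;
}
```

Keeping both strings derived from the same dModel and numHeads values means the label and the live summary cannot drift apart when the shape changes.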

Credits

  • Extracted from: craftingattention (app/src/lessons/primitives/viz/MultiHeadSplitViz.tsx). The source was a full lesson widget with Explore / Predict / Challenge mode strips, a four-token Q / K / V pipeline SVG, per-head attention-pattern heatmaps, history-undo, narration, and a question bank around d_k and parameter counts. The library extract is the pure splitting visualisation — one d_model bar, h slices with their d_k, and the per-head W_q / W_k / W_v projection labels.