Multi-Head Split Viz
A teaching visualisation of the "multi-head" part of multi-head attention. One full-width d_model-dimensional bar at the top is sliced by dashed division lines into numHeads heads of size d_k = d_model / numHeads. Each head card carries a label, its d_k, an accent slice bar, and an optional W_q / W_k / W_v projection row pointing at small per-head Q / K / V blocks.
Installation
npx shadcn@latest add https://craftbits.dev/r/multi-head-split-viz.json
Usage
import { MultiHeadSplitViz } from "@craft-bits/core";
<MultiHeadSplitViz dModel={64} numHeads={8} />
Highlight one head; the matching slice glows and the others dim:
<MultiHeadSplitViz dModel={64} numHeads={8} currentHead={2} />
Drop the projection row for a leaner figure:
<MultiHeadSplitViz dModel={64} numHeads={4} showProjections={false} />
Understanding the component
- One bar, h slices. The bar at the top represents the full `d_model`-dimensional input. The dashed division lines split it into `numHeads` equal sections; each section is one head's slice of size `d_k = d_model / numHeads`.
- Why the split matters. Each head only sees `d_k` dimensions and runs its own attention on that slice. Lower-dimensional per-head attention is cheaper and lets independent heads specialise on different relationships between tokens. The total parameter count of `W_q`, `W_k`, `W_v`, `W_o` stays the same regardless of `h`.
- Projections per head. When `showProjections` is on, each head card renders `W_q`, `W_k`, `W_v` labels with downward arrows feeding small `Q / K / V` blocks underneath.
- `currentHead` highlights one head. Setting `currentHead` accent-borders that head and scales its card to `1.05`, while the other heads dim to `0.45` opacity.
- `SPRINGS.smooth` for the highlight. Active-head transitions ride a smooth spring from `@craft-bits/core/motion`. `prefers-reduced-motion: reduce` collapses every spring to an instant swap.
- `numHeads` must divide `dModel`. Invalid combinations are clamped down to the largest divisor that fits.
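The slicing arithmetic in the list above can be sketched in plain TypeScript. This is a minimal sketch under stated assumptions, not the component's source: the helper names `clampNumHeads`, `headSlices`, and `projectionParamCount` are hypothetical, head indices are assumed to be zero-based, and "the largest divisor that fits" is read here as the largest divisor of `dModel` that does not exceed the requested `numHeads`. The parameter count ignores bias terms.

```typescript
// Hypothetical helpers for illustration; not part of @craft-bits/core.

/** Largest divisor of dModel that is <= the requested head count (assumed clamping rule). */
function clampNumHeads(dModel: number, requested: number): number {
  const upper = Math.max(1, Math.min(Math.floor(requested), dModel));
  for (let h = upper; h >= 1; h--) {
    if (dModel % h === 0) return h;
  }
  return 1; // fallback, e.g. for a non-integer dModel
}

interface HeadSlice {
  head: number;  // zero-based head index (assumption)
  start: number; // first dimension of the slice, inclusive
  end: number;   // one past the last dimension, exclusive
}

/** Split [0, dModel) into equal slices of size d_k = dModel / h. */
function headSlices(dModel: number, numHeads: number): HeadSlice[] {
  const h = clampNumHeads(dModel, numHeads);
  const dK = dModel / h;
  return Array.from({ length: h }, (_, head) => ({
    head,
    start: head * dK,
    end: (head + 1) * dK,
  }));
}

/** Total weights in W_q, W_k, W_v, W_o combined (biases ignored). */
function projectionParamCount(dModel: number, numHeads: number): number {
  const h = clampNumHeads(dModel, numHeads);
  const dK = dModel / h;
  const perHeadQKV = 3 * dModel * dK;       // each head maps d_model -> d_k for Q, K, and V
  const outputProjection = h * dK * dModel; // concatenated heads mapped back to d_model
  return h * perHeadQKV + outputProjection; // always 4 * dModel^2 when h divides dModel
}
```

With `dModel = 64`, a request for 6 heads would clamp to 4 under this reading, and `projectionParamCount` gives the same total for 4 heads as for 8, which is the "stays the same regardless of `h`" point made above.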
Props
| Prop | Type | Default | Description |
|---|---|---|---|
| `dModel` | `number` | `64` | Full model dimension before splitting. |
| `numHeads` | `number` | `8` | Number of heads. Must divide `dModel`; otherwise clamped to the largest divisor that fits. |
| `currentHead` | `number \| null` | — | Controlled active head. Pair with `onCurrentHeadChange`. |
| `defaultCurrentHead` | `number \| null` | `null` | Uncontrolled initial active head. |
| `onCurrentHeadChange` | `(head) => void` | — | Fires when the active head changes. |
| `showProjections` | `boolean` | `true` | Render the per-head `W_q / W_k / W_v` labels and `Q / K / V` blocks. |
| `orientation` | `'horizontal' \| 'vertical'` | `'horizontal'` | Layout direction of the head cards. |
| `transition` | `Transition` | `SPRINGS.smooth` | Spring for the active-head transition. |
| `className` | `string` | — | Merged onto the root via `cn()`. |
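The `currentHead`, `defaultCurrentHead`, and `onCurrentHeadChange` props read like the usual controlled-or-uncontrolled trio. The sketch below models that pattern in plain TypeScript, without React, to show how the three might interact. It is not the library's code: `createHeadState` and its `toggle` method are hypothetical names, and clearing the highlight when the active head is pressed again is an assumption.

```typescript
// Hypothetical model of the controlled / uncontrolled pattern; not taken from @craft-bits/core.
type Head = number | null;

interface HeadStateOptions {
  currentHead?: Head;                          // when provided, the parent owns the value
  defaultCurrentHead?: Head;                   // initial value for the uncontrolled case
  onCurrentHeadChange?: (head: Head) => void;  // notified on every requested change
}

function createHeadState(options: HeadStateOptions) {
  const controlled = options.currentHead !== undefined;
  let internal: Head = options.defaultCurrentHead ?? null;

  // Controlled: always report the parent's value. Uncontrolled: report internal state.
  const get = (): Head => (controlled ? (options.currentHead as Head) : internal);

  // Pressing the active head clears the highlight; pressing another head selects it.
  const toggle = (head: number): Head => {
    const next: Head = get() === head ? null : head;
    if (!controlled) internal = next; // controlled callers update their own state
    options.onCurrentHeadChange?.(next);
    return next;
  };

  return { get, toggle };
}
```

In the controlled case the value only changes when the parent passes a new `currentHead`, which is why the table suggests pairing it with `onCurrentHeadChange`.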
Accessibility
- The root is `role="figure"` with `aria-label` describing the matrix dimensions ("Multi-head split: d_model=64, 8 heads of d_k=8").
- A visually-hidden `aria-live="polite"` summary narrates the currently highlighted head and its dimension range.
- Each head card is a real `<button>` with `aria-pressed` and an explicit `aria-label`; keyboard users can `Tab` to a head and `Space` or `Enter` to toggle it.
- Color is never the only signal: every head card carries a numeric badge, a `d_k` value, and a focus ring on `:focus-visible`.
- `prefers-reduced-motion: reduce` collapses every spring to an instant swap.
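To illustrate the labelling described above, the strings can be derived from the props alone. In this sketch the root label follows the example quoted in the list; everything else is an assumption: the function names are hypothetical, the head index is treated as zero-based, and the wording of the live summary is invented for illustration rather than copied from the library.

```typescript
// Hypothetical helpers; the live-summary wording is assumed, not the library's actual text.

/** Root aria-label, following the format quoted in the accessibility notes. */
function rootLabel(dModel: number, numHeads: number): string {
  const dK = dModel / numHeads; // assumes numHeads already divides dModel
  return `Multi-head split: d_model=${dModel}, ${numHeads} heads of d_k=${dK}`;
}

/** Text for the visually-hidden live region: highlighted head plus its dimension range. */
function liveSummary(dModel: number, numHeads: number, currentHead: number | null): string {
  if (currentHead === null) return "No head highlighted.";
  const dK = dModel / numHeads;
  const start = currentHead * dK; // first dimension in the slice
  const end = start + dK - 1;     // last dimension in the slice, inclusive
  return `Head ${currentHead} highlighted: dimensions ${start} to ${end} of ${dModel}.`;
}
```

Keeping these as pure functions makes the announced text easy to check in a unit test without rendering the component.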
Credits
- Extracted from: `craftingattention` (`app/src/lessons/primitives/viz/MultiHeadSplitViz.tsx`). The source was a full lesson widget with Explore / Predict / Challenge mode strips, a four-token Q / K / V pipeline SVG, per-head attention-pattern heatmaps, history-undo, narration, and a question bank around `d_k` and parameter counts. The library extract is the pure splitting visualisation: one `d_model` bar, `h` slices with their `d_k`, and the per-head `W_q / W_k / W_v` projection labels.