Masking Viz
A teaching visualisation for the three masks that shape every transformer's attention pattern — causal (decoder-only, no peeking at the future), padding (batched-sequence padding tokens contribute nothing), and custom (arbitrary boolean blocking matrix). The component renders an N x N grid of post-softmax weights. Masked cells render with a cb-fg-subtle surface, a diagonal hatch pattern, and a centred -inf label so the blocked pattern is legible without relying on colour alone.
[Demo: attention masking heatmap, 8 by 8, titled "Attention mask". The causal (lower-triangular) mask blocks 28 of 64 cells (44%); blocked cells show −∞ and the remaining cells show post-softmax weights.]
Installation
npx shadcn@latest add https://craftbits.dev/r/masking-viz.json
Usage
import { MaskingViz } from "@craft-bits/core";
<MaskingViz seqLen={8} defaultMode="causal" />
Switch on padding masking to block the trailing two columns in every row:
<MaskingViz seqLen={8} defaultMode="padding" paddingLength={2} />
Drive the mode from a parent (controlled):
const [mode, setMode] = useState<MaskingVizMode>("causal");
<MaskingViz seqLen={8} mode={mode} onModeChange={setMode} />
Pass a fully custom blocking mask (true means masked):
<MaskingViz
  seqLen={4}
  defaultMode="custom"
  customMask={[
    [false, false, true, true],
    [false, false, false, true],
    [true, false, false, false],
    [true, true, false, false],
  ]}
/>
Anatomy
- Three masks, one grid. The component always renders the same `seqLen x seqLen` matrix. Only the mask changes between modes: `causal` blocks `j > i`, `padding` blocks `j >= seqLen - paddingLength`, and `custom` reads `customMask[i][j]`. Switching modes is a pure visual transition with no layout shift.
- Real softmax weights. Cells are not faked. A deterministic LCG generates raw scores, masked cells get a large negative sentinel, then every row passes through `softmax(...)`. The redistribution is real: when you mask competitors, the surviving cells absorb their share. This is why the heatmap brightens along the lower triangle when you toggle causal on.
- Masked-cell treatment. Blocked cells render with a `cb-fg-subtle` background, a 45-degree hatch overlay, a centred `-inf` glyph, and a faded `opacity: 0.4`. These redundant signals keep the pattern legible at a glance, at any contrast setting, and for colourblind readers.
- Controlled + uncontrolled mode. Pass `mode` plus `onModeChange` to drive the active mode from a parent; omit them to let the component own its state (optionally seeded with `defaultMode`). The Radix-style pattern matches every other interactive component in the library.
- A radiogroup, not a dropdown. The mode toggle is a `role="radiogroup"` of three buttons: every option is visible, keyboard focus moves with arrow keys plus Tab, and the selected option is the only one painted with `cb-accent` (so the active mode reads as the loud item, not the only-visible item).
- `SPRINGS.smooth` on cell transitions. Opacity transitions animate via a smooth spring from `@craft-bits/core/motion`. `prefers-reduced-motion: reduce` collapses every transition to instant.
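The mask predicates and the masked-softmax step described above can be sketched in a few lines. This is a minimal illustration, not the component's source: the names `buildMask`, `makeLcg`, and `maskedSoftmax`, the LCG constants, the seed, the score range, and the `-1e9` sentinel are all assumptions chosen for the example.

```typescript
// Hypothetical names throughout: a sketch of the idea, not the library's code.
type MaskMode = "causal" | "padding" | "custom";
type BoolMatrix = readonly (readonly boolean[])[];

// Build a seqLen x seqLen blocking matrix; true means "masked".
function buildMask(
  mode: MaskMode,
  seqLen: number,
  paddingLength = 2,
  customMask?: BoolMatrix,
): boolean[][] {
  return Array.from({ length: seqLen }, (_, i) =>
    Array.from({ length: seqLen }, (_, j) => {
      if (mode === "causal") return j > i; // query i cannot see future keys
      if (mode === "padding") return j >= seqLen - paddingLength; // trailing pad columns
      return customMask?.[i]?.[j] ?? false; // assumption: missing entries are unmasked
    }),
  );
}

// Deterministic linear congruential generator yielding floats in [0, 1).
// The multiplier and increment are illustrative, not taken from the source.
function makeLcg(seed: number): () => number {
  let state = seed >>> 0;
  return () => {
    state = (Math.imul(state, 1664525) + 1013904223) >>> 0;
    return state / 4294967296;
  };
}

// Row-wise softmax after replacing masked scores with a large negative sentinel.
function maskedSoftmax(scores: number[][], mask: boolean[][]): number[][] {
  const SENTINEL = -1e9;
  return scores.map((row, i) => {
    const masked = row.map((s, j) => (mask[i][j] ? SENTINEL : s));
    const max = Math.max(...masked); // subtract the row max for numerical stability
    const exps = masked.map((s) => Math.exp(s - max));
    const total = exps.reduce((a, b) => a + b, 0);
    return exps.map((e) => e / total);
  });
}

const seqLen = 8;
const rand = makeLcg(42);
const scores = Array.from({ length: seqLen }, () =>
  Array.from({ length: seqLen }, () => rand() * 4 - 2),
);
const causal = buildMask("causal", seqLen);
const weights = maskedSoftmax(scores, causal);
const maskedCount = causal.reduce((n, row) => n + row.filter(Boolean).length, 0);
```

For `seqLen = 8` the causal predicate `j > i` blocks 8 × 7 / 2 = 28 of 64 cells. The first row keeps a single unmasked cell, so its weight is exactly 1, which is the redistribution effect the second bullet describes.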
Props
| Prop | Type | Default | Description |
|---|---|---|---|
| `seqLen` | `number` | `8` | Side length of the square matrix, clamped to [2, 32]. |
| `mode` | `"causal" \| "padding" \| "custom"` | — | Controlled mask mode. Pair with `onModeChange`. |
| `defaultMode` | `"causal" \| "padding" \| "custom"` | `"causal"` | Uncontrolled initial mode. |
| `onModeChange` | `(mode) => void` | — | Fires when the active mode changes (toggle click or external setter). |
| `paddingLength` | `number` | `2` | Trailing columns blocked in padding mode, clamped to [0, seqLen]. |
| `customMask` | `readonly (readonly boolean[])[]` | — | Per-cell blocking matrix used in custom mode. `true` means masked. |
| `transition` | `Transition` | `SPRINGS.smooth` | Spring for cell-opacity transitions. |
| `className` | `string` | — | Merged onto the root via `cn()`. |
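Because `customMask` is a plain boolean matrix, it can be generated rather than written by hand. The sketch below builds a sliding-window (banded) mask; `slidingWindowMask` is a hypothetical helper for illustration and is not part of `@craft-bits/core`.

```typescript
// Hypothetical helper: blocks every key more than `window` positions
// away from the query row. true means masked, matching the customMask prop.
function slidingWindowMask(seqLen: number, window: number): boolean[][] {
  return Array.from({ length: seqLen }, (_, i) =>
    Array.from({ length: seqLen }, (_, j) => Math.abs(i - j) > window),
  );
}

const bandMask = slidingWindowMask(4, 1);
// bandMask is:
// [false, false, true,  true ]
// [false, false, false, true ]
// [true,  false, false, false]
// [true,  true,  false, false]
```

With `seqLen = 4` and a window of 1, the result equals the hand-written matrix in the custom-mask usage example, so it could be passed as `customMask={bandMask}` alongside `defaultMode="custom"`.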
Accessibility
- The root is `role="figure"` with `aria-labelledby` pointing at the "Attention mask" heading and `aria-describedby` pointing at a visually-hidden `aria-live="polite"` summary.
- The summary announces grid shape, active mode, and the count of masked cells whenever the mode flips, so screen readers receive the redistribution signal.
- Every cell carries `data-state="masked"` or `data-state="active"` so styling never depends on colour alone; the hatch overlay and `-inf` glyph are redundant visual signals.
- The mode toggle is a `role="radiogroup"` of `role="radio"` buttons with `aria-checked` matching the active mode and a visible `focus-visible` ring keyed to `cb-accent`.
- Cells use `role="img"` with an `aria-label` like "Row 2 key 5 weight 0.18" or "Row 2 key 5 masked (-inf)" so the matrix is fully narratable.
- `prefers-reduced-motion: reduce` collapses opacity transitions to instant.
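The label and summary strings described above could be produced by small pure functions. This is a sketch under assumptions: `cellAriaLabel` and `summaryText` are hypothetical names, the two-decimal formatting is inferred from the example labels, and the summary wording is illustrative rather than the component's actual text.

```typescript
// Hypothetical formatter for per-cell labels. Row and key indices are passed
// through as given; whether the component counts from 0 or 1 is not stated.
function cellAriaLabel(row: number, key: number, weight: number, masked: boolean): string {
  const position = `Row ${row} key ${key}`;
  return masked
    ? `${position} masked (-inf)`
    : `${position} weight ${weight.toFixed(2)}`;
}

// Hypothetical live-region summary: grid shape, active mode, masked-cell count.
function summaryText(seqLen: number, mode: string, maskedCount: number): string {
  const total = seqLen * seqLen;
  const percent = Math.round((maskedCount / total) * 100);
  return `${seqLen} by ${seqLen} grid, ${mode} mask, ${maskedCount} of ${total} cells masked (${percent}%).`;
}

const activeLabel = cellAriaLabel(2, 5, 0.18, false);
const maskedLabel = cellAriaLabel(2, 5, 0, true);
const liveSummary = summaryText(8, "causal", 28);
```

Keeping these as pure functions makes the announced text easy to unit-test independently of rendering.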
Credits
- Extracted from: `craftingattention` (`app/src/lessons/primitives/viz/MaskingViz.tsx`). Re-architected from the source's stateful `Widget`-hosted Explore / Predict / Challenge lesson into a pure declarative grid component. The source lesson included `useWidgetHistory` undo/redo, four bookmark presets, a `ModeStrip` plus `ChallengeBtn` plus `FeedbackBadge` chrome, a click-to-mark predict round generator, a "make weight exceed target" challenge round generator, two `MaskButton` togglers, the side-by-side `BarChart` softmax-distribution figure, and the eight hard-coded `"The cat sat on the mat [PAD] [PAD]"` tokens. Generalised the fixed eight-token sequence to an arbitrary `seqLen x seqLen` matrix and replaced the boolean `causal` and `padding` flags with a single `mode: "causal" | "padding" | "custom"` enum plus a `customMask` matrix, so the component teaches the more general masking primitive (decoder-only causal masking, encoder padding masking, sparse-attention windows, prefix-LM masks). Stripped the project's `--color-accent-*` / `--color-ink-*` / `--color-surface-*` raw vars and inline transitions; cell transitions now flow through `SPRINGS.smooth`. Added a Radix-style controlled + uncontrolled `mode` API, a `role="radiogroup"` mode toggle, deterministic seeded score generation, and `role="img"` per-cell `aria-label`s for full screen-reader narration.