Masking Viz

A teaching visualisation for the three masks that shape every transformer's attention pattern — causal (decoder-only, no peeking at the future), padding (batched-sequence padding tokens contribute nothing), and custom (arbitrary boolean blocking matrix). The component renders an N x N grid of post-softmax weights. Masked cells render with a cb-fg-subtle surface, a diagonal hatch pattern, and a centred -inf label so the blocked pattern is legible without relying on colour alone.

[Live demo: attention masking heatmap, 8 by 8. The causal (lower-triangular) mask blocks 28 of 64 cells, 44% blocked.]

Installation

npx shadcn@latest add https://craftbits.dev/r/masking-viz.json

Usage

import { MaskingViz } from "@craft-bits/core";
 
<MaskingViz seqLen={8} defaultMode="causal" />

Switch on padding masking — block the trailing two columns for every row:

<MaskingViz seqLen={8} defaultMode="padding" paddingLength={2} />

Drive the mode from a parent (controlled):

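import { useState } from "react";
// Assumes the MaskingVizMode type is exported alongside MaskingViz.
import { MaskingViz, type MaskingVizMode } from "@craft-bits/core";

const [mode, setMode] = useState<MaskingVizMode>("causal");
<MaskingViz seqLen={8} mode={mode} onModeChange={setMode} />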

Pass a fully custom blocking mask (true means masked):

<MaskingViz
  seqLen={4}
  defaultMode="custom"
  customMask={[
    [false, false, true,  true],
    [false, false, false, true],
    [true,  false, false, false],
    [true,  true,  false, false],
  ]}
/>
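
Writing a mask out by hand gets tedious beyond a few tokens. The sketch below generates a sliding-window mask programmatically; slidingWindowMask is a hypothetical helper, not part of the library, and it assumes rows are query positions and columns are key positions.

```typescript
// Hypothetical helper (not a library export): each query row i may attend only
// to keys j within `window` positions of itself. true means masked.
function slidingWindowMask(seqLen: number, window: number): boolean[][] {
  return Array.from({ length: seqLen }, (_, i) =>
    Array.from({ length: seqLen }, (_, j) => Math.abs(i - j) > window),
  );
}

// A window of 1 over four tokens yields the same matrix as the literal above.
const windowMask = slidingWindowMask(4, 1);
```

The result can be passed straight to the customMask prop, for example customMask={slidingWindowMask(4, 1)}.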

Anatomy

  1. Three masks, one grid. The component always renders the same seqLen x seqLen matrix, with rows indexed by query position i and columns by key position j. Only the mask changes between modes: causal blocks j > i, padding blocks j >= seqLen - paddingLength, and custom reads customMask[i][j]. Switching modes is a purely visual transition with no layout shift.
  2. Real softmax weights. Cells are not faked. A deterministic LCG generates raw scores, masked cells get a large negative sentinel, then every row passes through softmax(...). The redistribution is real — when you mask competitors, the surviving cells absorb their share. This is why the heatmap brightens along the lower triangle when you toggle causal on.
  3. Masked-cell treatment. Blocked cells render with a cb-fg-subtle background, a 45-degree hatch overlay, a centred -inf glyph, and a faded opacity of 0.4. The hatch, glyph, and fade are redundant with the background colour, so the pattern reads at a glance, in low-contrast conditions, and for colour-blind viewers.
  4. Controlled + uncontrolled mode. Pass mode plus onModeChange to drive the active mode from a parent; omit them to let the component own its own state (optionally seeded with defaultMode). The Radix-style pattern matches every other interactive component in the library.
  5. A radiogroup, not a dropdown. The mode toggle is a role="radiogroup" of three buttons — every option is visible, keyboard focus moves with arrow keys plus Tab, and the selected option is the only one painted with cb-accent (so the active mode reads as the loud item, not the only-visible item).
  6. SPRINGS.smooth on cell transitions. Opacity transitions animate via a smooth spring from @craft-bits/core/motion. prefers-reduced-motion: reduce collapses every transition to instant.
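
The first two points can be sketched in plain TypeScript. This is a minimal illustration of the idea, not the component's actual source: the function names, the LCG constants (the common Numerical Recipes pair), the score range, and the -1e9 sentinel are all assumptions.

```typescript
type MaskMode = "causal" | "padding" | "custom";

// Build the blocking matrix; true means cell (query i, key j) is masked.
function buildMask(
  mode: MaskMode,
  seqLen: number,
  paddingLength = 2,
  customMask: readonly (readonly boolean[])[] = [],
): boolean[][] {
  return Array.from({ length: seqLen }, (_, i) =>
    Array.from({ length: seqLen }, (_, j) => {
      if (mode === "causal") return j > i;
      if (mode === "padding") return j >= seqLen - paddingLength;
      return customMask[i]?.[j] ?? false; // missing entries treated as unmasked (assumption)
    }),
  );
}

// Deterministic raw scores in [-1, 1) from a linear congruential generator.
function lcgScores(seqLen: number, seed = 1): number[][] {
  let state = seed >>> 0;
  const next = (): number => {
    state = (Math.imul(state, 1664525) + 1013904223) >>> 0;
    return (state / 2 ** 32) * 2 - 1;
  };
  return Array.from({ length: seqLen }, () =>
    Array.from({ length: seqLen }, next),
  );
}

const NEG_SENTINEL = -1e9; // stands in for -inf (assumed value)

// Row-wise softmax after overwriting masked scores with the sentinel.
function maskedSoftmax(scores: number[][], mask: boolean[][]): number[][] {
  return scores.map((row, i) => {
    const masked = row.map((s, j) => (mask[i][j] ? NEG_SENTINEL : s));
    const max = Math.max(...masked); // subtract the row max for numerical stability
    const exps = masked.map((s) => Math.exp(s - max));
    const sum = exps.reduce((a, b) => a + b, 0);
    return exps.map((e) => e / sum);
  });
}

const causalWeights = maskedSoftmax(lcgScores(8, 42), buildMask("causal", 8));
```

Each row still sums to 1, so the probability mass of the masked cells moves to the surviving ones. One caveat of a finite sentinel: a fully masked row (for instance paddingLength equal to seqLen) degenerates to a uniform distribution rather than zeros.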

Props

Prop           Type                             Default         Description
seqLen         number                           8               Side length of the square matrix, clamped to [2, 32].
mode           "causal" | "padding" | "custom"  -               Controlled mask mode. Pair with onModeChange.
defaultMode    "causal" | "padding" | "custom"  "causal"        Uncontrolled initial mode.
onModeChange   (mode) => void                   -               Fires when the active mode changes (toggle click or external setter).
paddingLength  number                           2               Trailing columns blocked in padding mode, clamped to [0, seqLen].
customMask     readonly (readonly boolean[])[]  -               Per-cell blocking matrix used in custom mode. true means masked.
transition     Transition                       SPRINGS.smooth  Spring for cell-opacity transitions.
className      string                           -               Merged onto the root via cn().
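
The clamping described for seqLen and paddingLength could look roughly like this. It is a sketch under assumptions: normaliseProps and clamp are hypothetical names, and rounding to whole numbers is a guess, since the table does not say how fractional values are handled.

```typescript
const clamp = (value: number, min: number, max: number): number =>
  Math.min(max, Math.max(min, value));

// Hypothetical normaliser: seqLen is clamped first because the upper
// bound for paddingLength depends on the clamped seqLen.
function normaliseProps(seqLen = 8, paddingLength = 2) {
  const n = clamp(Math.round(seqLen), 2, 32);
  const pad = clamp(Math.round(paddingLength), 0, n);
  return { seqLen: n, paddingLength: pad };
}
```

For example, normaliseProps(100, 50) yields a 32 x 32 grid with all 32 columns padded, and normaliseProps(1, -3) yields a 2 x 2 grid with no padding.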

Accessibility

  • The root is role="figure" with aria-labelledby pointing at the "Attention mask" heading and aria-describedby at a visually-hidden aria-live="polite" summary.
  • The summary announces grid shape, active mode, and the count of masked cells whenever the mode flips, so screen readers receive the redistribution signal.
  • Every cell carries data-state="masked" or data-state="active" so styling never depends on colour alone — the hatch overlay and -inf glyph are redundant visual signals.
  • The mode toggle is a role="radiogroup" of role="radio" buttons with aria-checked matching the active mode and a visible focus-visible ring keyed to cb-accent.
  • Cells use role="img" with an aria-label like "Row 2 key 5 weight 0.18" or "Row 2 key 5 masked (-inf)" so the matrix is fully narratable.
  • prefers-reduced-motion: reduce collapses opacity transitions to instant.
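
To make the narration concrete, the label and summary strings might be produced as below. The cell label follows the two examples quoted above; the summary wording and both function names are assumptions, and the caller is assumed to pass row and key indices in whatever base the component displays.

```typescript
// Per-cell aria-label, matching the "Row 2 key 5 ..." examples above.
function cellLabel(row: number, key: number, weight: number, masked: boolean): string {
  return masked
    ? `Row ${row} key ${key} masked (-inf)`
    : `Row ${row} key ${key} weight ${weight.toFixed(2)}`;
}

// Text for the aria-live summary: grid shape, active mode, masked-cell count.
// The exact phrasing is hypothetical.
function summaryText(seqLen: number, mode: string, mask: boolean[][]): string {
  const total = seqLen * seqLen;
  const blocked = mask.reduce((n, row) => n + row.filter(Boolean).length, 0);
  const percent = Math.round((blocked / total) * 100);
  return `${seqLen} by ${seqLen} grid, ${mode} mask, ${blocked} of ${total} cells masked (${percent}%)`;
}
```

Because the summary lives in an aria-live="polite" region, updating its text when the mode flips is enough for assistive technology to announce the change.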

Credits

  • Extracted from: craftingattention (app/src/lessons/primitives/viz/MaskingViz.tsx). The source is a stateful, Widget-hosted Explore / Predict / Challenge lesson with useWidgetHistory undo/redo, four bookmark presets, ModeStrip, ChallengeBtn, and FeedbackBadge chrome, a click-to-mark predict round generator, a "make weight exceed target" challenge round generator, two MaskButton togglers, a side-by-side BarChart softmax-distribution figure, and eight hard-coded "The cat sat on the mat [PAD] [PAD]" tokens. It has been re-architected into a pure declarative grid component. The fixed eight-token sequence is generalised to an arbitrary seqLen x seqLen matrix, and the boolean causal and padding flags are replaced with a single mode: "causal" | "padding" | "custom" enum plus a customMask matrix, so the component teaches the more general masking primitive (decoder-only causal masking, encoder padding masking, sparse-attention windows, prefix-LM masks). The project's raw --color-accent-* / --color-ink-* / --color-surface-* vars and inline transitions are stripped; cell transitions now flow through SPRINGS.smooth. Added: a Radix-style controlled + uncontrolled mode API, a role="radiogroup" mode toggle, deterministic seeded score generation, and role="img" per-cell aria-labels for full screen-reader narration.