Flash Attention Viz
A teaching visualisation for the tiling strategy behind Flash Attention. Standard attention materialises the full N×N attention matrix in HBM; Flash Attention slices it into Br × Bc tiles that fit in SRAM and computes a streaming softmax incrementally. The diagram renders side-by-side panels for Q (N×d) and Kᵀ (d×N), with the output matrix O (N×N) gridded into tiles below — a single cursor highlights the active Q block-row, K block-column, and corresponding output tile.
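The streaming softmax mentioned above can be sketched in a few lines. This is an illustrative, single-row version with scalar values rather than d-wide vectors; the function names are hypothetical and are not part of the component, which does no arithmetic of its own. It shows that visiting keys one block at a time, while rescaling a running maximum and running sum, gives the same result as a softmax over the whole row.

```typescript
// Sketch only: streaming (online) softmax-weighted sum for one query row.
// `scores` are that row's logits, `values` the matching scalar values.
// Real Flash Attention does this per Br × Bc tile with d-wide value vectors.
function streamingAttentionRow(
  scores: number[],
  values: number[],
  blockSize: number, // Bc: number of keys visited per tile
): number {
  let runningMax = -Infinity; // largest logit seen so far
  let runningSum = 0; // sum of exp(score - runningMax) seen so far
  let acc = 0; // unnormalised weighted sum of values

  for (let start = 0; start < scores.length; start += blockSize) {
    const end = Math.min(start + blockSize, scores.length);
    let blockMax = -Infinity;
    for (let j = start; j < end; j++) blockMax = Math.max(blockMax, scores[j]);

    const newMax = Math.max(runningMax, blockMax);
    // Rescale everything accumulated under the previous maximum.
    const scale = Math.exp(runningMax - newMax);
    runningSum *= scale;
    acc *= scale;

    for (let j = start; j < end; j++) {
      const p = Math.exp(scores[j] - newMax);
      runningSum += p;
      acc += p * values[j];
    }
    runningMax = newMax;
  }
  return acc / runningSum;
}

// Reference: ordinary softmax over the whole row at once.
function fullAttentionRow(scores: number[], values: number[]): number {
  const max = Math.max(...scores);
  const exps = scores.map((s) => Math.exp(s - max));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.reduce((a, e, j) => a + e * values[j], 0) / sum;
}
```

The block size only changes how many keys are held at once, not the result, which is why the tiles can be small enough to fit in SRAM.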
Installation
npx shadcn@latest add https://craftbits.dev/r/flash-attention-viz.json
Usage
import { FlashAttentionViz } from "@craft-bits/core";
<FlashAttentionViz seqLen={8} blockSize={2} />
Drive the cursor from outside the component:
const [block, setBlock] = useState({ row: 0, col: 0 });
<FlashAttentionViz
  seqLen={8}
  blockSize={2}
  currentBlock={block}
  onCurrentBlockChange={setBlock}
  playing={false}
/>
Slow the sweep down for an explainer:
<FlashAttentionViz seqLen={12} blockSize={3} playSpeed={1200} />
Understanding the component
- Three panels, one cursor. A single block coordinate `{ row, col }` drives every highlight: the Q row-block at the top-left, the Kᵀ column-block at the top-right, and the output tile in O below.
- Row-major sweep. Autoplay walks `(row, col)` row-major across the tile grid and wraps back to the origin once every tile has been visited.
- Why Q is a row-block and Kᵀ is a column-block. Flash Attention loads one block of Q rows and one block of Kᵀ columns into SRAM, multiplies them to form a `Br × Bc` score tile, applies a streaming softmax, and accumulates into the matching output tile. The diagram mirrors that exactly.
- Output ramp encodes progress. Past tiles inherit a fading accent ramp (deeper near the cursor, lighter near the origin); the current tile is solid accent with an inset accent border; future tiles render dim.
- `SPRINGS.smooth` for highlights. Block-membership flips animate via a smooth spring from `@craft-bits/core/motion`. `prefers-reduced-motion: reduce` parks the cursor on the final tile, pauses autoplay, and collapses every transition to an instant swap.
- Pure primitive. The component does no real linear algebra: `Q`, `Kᵀ`, and `O` are cell grids, not numeric matrices. Use `AttentionHeatmap` for an actual attention-weight inspector.
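The row-major sweep with wrap-around can be sketched as a pure function. The names `BlockCoord`, `tilesPerSide`, and `nextBlock` are hypothetical, and rounding the tile count up when `seqLen` is not a multiple of `blockSize` is an assumption, not confirmed behaviour of the component.

```typescript
// Hypothetical shape of the { row, col } block coordinate.
interface BlockCoord {
  row: number;
  col: number;
}

// Tiles along one side of the grid. Rounding up is an assumption for
// sequence lengths that are not a multiple of the block size.
function tilesPerSide(seqLen: number, blockSize: number): number {
  return Math.ceil(seqLen / blockSize);
}

// Advance one tile in row-major order, wrapping to the origin after the last.
function nextBlock(block: BlockCoord, side: number): BlockCoord {
  const index = block.row * side + block.col; // flatten to a linear index
  const next = (index + 1) % (side * side); // wrap past the final tile
  return { row: Math.floor(next / side), col: next % side };
}
```

With `seqLen={8}` and `blockSize={2}` this gives a 4 × 4 grid, so sixteen calls to `nextBlock` starting from `{ row: 0, col: 0 }` return to the origin.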
Props
| Prop | Type | Default | Description |
|---|---|---|---|
| `seqLen` | `number` | `8` | Sequence length N, clamped to 2..64. |
| `blockSize` | `number` | `2` | Tile side length Br = Bc, clamped to 1..seqLen. |
| `currentBlock` | `FlashAttentionBlock` | — | Controlled block. Pair with `onCurrentBlockChange`. |
| `defaultCurrentBlock` | `FlashAttentionBlock` | `{ row: 0, col: 0 }` | Uncontrolled initial block. |
| `onCurrentBlockChange` | `(block) => void` | — | Fires on autoplay tick and manual scrub. |
| `playing` | `boolean` | — | Controlled play state. Pair with `onPlayingChange`. |
| `defaultPlaying` | `boolean` | `true` | Uncontrolled initial play state. |
| `onPlayingChange` | `(playing) => void` | — | Fires when play / pause flips. |
| `playSpeed` | `number` | `500` | Milliseconds between block advances. |
| `modelDim` | `number` | `4` | Hidden d (cols of Q, rows of Kᵀ) used for the side panels, clamped to 2..16. |
| `transition` | `Transition` | `SPRINGS.smooth` | Spring for block highlights. |
| `className` | `string` | — | Merged onto the root via `cn()`. |
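The clamping ranges in the table can be illustrated with a small helper. `resolveShape` and `clamp` are hypothetical names for this sketch; the ranges and defaults come from the table, but how the component applies them internally is not shown in this document.

```typescript
// Constrain a value to the inclusive range [min, max].
const clamp = (value: number, min: number, max: number): number =>
  Math.min(max, Math.max(min, value));

// Hypothetical helper: apply the documented defaults and clamping ranges.
function resolveShape(props: {
  seqLen?: number;
  blockSize?: number;
  modelDim?: number;
}): { seqLen: number; blockSize: number; modelDim: number } {
  const seqLen = clamp(props.seqLen ?? 8, 2, 64);
  // blockSize is clamped against the already-clamped seqLen.
  const blockSize = clamp(props.blockSize ?? 2, 1, seqLen);
  const modelDim = clamp(props.modelDim ?? 4, 2, 16);
  return { seqLen, blockSize, modelDim };
}
```

Under these assumptions, an oversized request such as `seqLen={100}` with `blockSize={200}` would resolve to a single 64 × 64 tile.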
Accessibility
- The root is `role="figure"` with `aria-labelledby` pointing at the "Flash attention tiling" heading and `aria-describedby` at a visually-hidden `aria-live="polite"` summary.
- The summary announces the current tile index and the Q block-row + K block-column whenever the cursor advances.
- The play / pause button uses `aria-pressed`; the label flips between "Play tiling" and "Pause tiling".
- The scrubber is a native `<input type="range">` with an explicit `aria-label`, so keyboard arrows step the cursor and screen readers narrate the value.
- Colour is never the only signal: the tile counter, panel shape labels, and live summary are textual.
- `prefers-reduced-motion: reduce` parks the cursor on the final tile, pauses autoplay, and suppresses block-highlight transitions.
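As a rough illustration of the live summary, the text for a given tile could be built like this. `describeTile` is a hypothetical helper, and the wording is modelled on the summary the demo renders for its first tile; the component's actual string template may differ.

```typescript
// Hypothetical helper: build the screen-reader summary for the active tile.
// `block` is zero-based; the announced indices are one-based.
function describeTile(
  block: { row: number; col: number },
  side: number, // tiles along one side of the grid
  blockSize: number,
): string {
  const index = block.row * side + block.col + 1; // row-major tile number
  const total = side * side;
  return (
    `Tile ${index} of ${total}: ` +
    `Q block-row ${block.row + 1}, K block-column ${block.col + 1}. ` +
    `Block size ${blockSize} × ${blockSize}.`
  );
}
```

Placing this string in the `aria-live="polite"` region means a screen reader announces each advance without interrupting speech already in progress.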
Credits
- Extracted from: `craftingattention` (`app/src/lessons/primitives/viz/FlashAttentionViz.tsx`). Stripped the Widget chrome (history-undo, bookmarks), the explore / predict / challenge mode strips, the standard-vs-flash mode toggle, the memory-bytes formatting and badges, and the lesson-specific copy. Generalised the fixed `N=1024, BLOCK=256, 4×4 tiles` to arbitrary `seqLen` and `blockSize` with controlled / uncontrolled state pairs and a row-major `(row, col)` sweep.