Flash Attention Viz

A teaching visualisation for the tiling strategy behind Flash Attention. Standard attention materialises the full N×N attention matrix in HBM; Flash Attention slices it into Br × Bc tiles that fit in SRAM and computes a streaming softmax incrementally. The diagram renders side-by-side panels for Q (N×d) and Kᵀ (d×N), with the output matrix O (N×N) gridded into tiles below. A single cursor highlights the active Q block-row, K block-column, and corresponding output tile. The O grid is drawn with the N×N shape of the score matrix QKᵀ so that each tile lines up with one Q block-row and one K block-column; in real attention the output is N×d.
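To make "streaming softmax" concrete, here is a minimal sketch in plain TypeScript. It is not part of the component (which does no numeric work) and all function names are hypothetical. It computes a softmax-weighted sum over one row of scores twice: once with the whole row in memory, and once block by block while keeping only a running maximum, a running denominator, and a running accumulator.

```typescript
/** Reference: softmax-weighted sum computed with the full row in memory. */
function fullSoftmaxSum(scores: number[], values: number[]): number {
  const max = Math.max(...scores);
  const exps = scores.map((s) => Math.exp(s - max));
  const denom = exps.reduce((a, b) => a + b, 0);
  return exps.reduce((acc, e, i) => acc + (e / denom) * values[i], 0);
}

/** Same quantity, visiting the row `blockSize` scores at a time. */
function streamingSoftmaxSum(
  scores: number[],
  values: number[],
  blockSize: number,
): number {
  let runningMax = -Infinity; // largest score seen so far
  let denom = 0; // sum of exp(score - runningMax) so far
  let acc = 0; // sum of exp(score - runningMax) * value so far

  for (let start = 0; start < scores.length; start += blockSize) {
    const end = Math.min(start + blockSize, scores.length);

    // Find the maximum of this block and fold it into the running maximum.
    let blockMax = -Infinity;
    for (let i = start; i < end; i++) blockMax = Math.max(blockMax, scores[i]);
    const newMax = Math.max(runningMax, blockMax);

    // Rescale everything accumulated under the old maximum.
    // On the first block this is exp(-Infinity) = 0, which is harmless.
    const scale = Math.exp(runningMax - newMax);
    denom *= scale;
    acc *= scale;

    // Add this block's contribution under the new maximum.
    for (let i = start; i < end; i++) {
      const e = Math.exp(scores[i] - newMax);
      denom += e;
      acc += e * values[i];
    }
    runningMax = newMax;
  }
  return acc / denom;
}
```

Both functions agree up to floating-point rounding for any block size, which is why the tiled computation never needs the full N×N matrix at once.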


Installation

npx shadcn@latest add https://craftbits.dev/r/flash-attention-viz.json

Usage

import { FlashAttentionViz } from "@craft-bits/core";
 
<FlashAttentionViz seqLen={8} blockSize={2} />

Drive the cursor from outside the component:

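import { useState } from "react";
import { FlashAttentionViz } from "@craft-bits/core";
 
export function ControlledExample() {
  // The parent owns the cursor; playing={false} keeps autoplay paused.
  const [block, setBlock] = useState({ row: 0, col: 0 });
 
  return (
    <FlashAttentionViz
      seqLen={8}
      blockSize={2}
      currentBlock={block}
      onCurrentBlockChange={setBlock}
      playing={false}
    />
  );
}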

Slow the sweep down for an explainer:

<FlashAttentionViz seqLen={12} blockSize={3} playSpeed={1200} />

Understanding the component

  1. Three panels, one cursor. A single block coordinate { row, col } drives every highlight: the Q row-block at the top-left, the Kᵀ column-block at the top-right, and the output tile in O below.
  2. Row-major sweep. Autoplay walks (row, col) row-major across the tile grid and wraps back to the origin once every tile has been visited.
  3. Why Q is a row-block and Kᵀ is a column-block. Flash Attention loads one block of Q rows and one block of K columns into SRAM, multiplies them to form a Br × Bc score tile, applies a streaming softmax, and accumulates into the matching output tile. The diagram mirrors that exactly.
  4. Output ramp encodes progress. Past tiles inherit a fading accent ramp (deeper near the cursor, lighter near the origin); the current tile is solid accent with an inset accent border; future tiles render dim.
  5. SPRINGS.smooth for highlights. Block-membership flips animate via a smooth spring from @craft-bits/core/motion. prefers-reduced-motion: reduce parks the cursor on the final tile, pauses autoplay, and collapses every transition to an instant swap.
  6. Pure primitive. The component does no real linear algebra — Q, Kᵀ, and O are cell grids, not numeric matrices. Use AttentionHeatmap for an actual attention-weight inspector.
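The row-major sweep and the past / current / future split described above can be sketched in a few pure functions. This is an illustration, not the component's source: the Block interface only mirrors the { row, col } shape, the helper names are hypothetical, and rounding a partial trailing tile up with Math.ceil is an assumption.

```typescript
// Hypothetical stand-in for the { row, col } block coordinate.
interface Block {
  row: number;
  col: number;
}

type TileState = "past" | "current" | "future";

/** Tiles along one side of the grid. Assumption: a partial trailing tile still counts. */
function tilesPerSide(seqLen: number, blockSize: number): number {
  return Math.ceil(seqLen / blockSize);
}

/** Zero-based position of a block in row-major order. */
function tileIndex(block: Block, tiles: number): number {
  return block.row * tiles + block.col;
}

/** Advance one step row-major, wrapping to the origin after the last tile. */
function nextBlock(block: Block, tiles: number): Block {
  if (block.col + 1 < tiles) return { row: block.row, col: block.col + 1 };
  if (block.row + 1 < tiles) return { row: block.row + 1, col: 0 };
  return { row: 0, col: 0 };
}

/** Classify a tile relative to the cursor, following the sweep order. */
function tileState(tile: Block, cursor: Block, tiles: number): TileState {
  const t = tileIndex(tile, tiles);
  const c = tileIndex(cursor, tiles);
  if (t === c) return "current";
  return t < c ? "past" : "future";
}
```

With seqLen 8 and blockSize 2 this gives a 4 × 4 grid, so sixteen calls to nextBlock starting from { row: 0, col: 0 } visit every tile once and land back on the origin.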

Props

| Prop | Type | Default | Description |
| --- | --- | --- | --- |
| seqLen | number | 8 | Sequence length N, clamped to 2..64. |
| blockSize | number | 2 | Tile side length Br = Bc, clamped to 1..seqLen. |
| currentBlock | FlashAttentionBlock | - | Controlled block. Pair with onCurrentBlockChange. |
| defaultCurrentBlock | FlashAttentionBlock | { row: 0, col: 0 } | Uncontrolled initial block. |
| onCurrentBlockChange | (block) => void | - | Fires on autoplay tick and manual scrub. |
| playing | boolean | - | Controlled play state. Pair with onPlayingChange. |
| defaultPlaying | boolean | true | Uncontrolled initial play state. |
| onPlayingChange | (playing) => void | - | Fires when play / pause flips. |
| playSpeed | number | 500 | Milliseconds between block advances. |
| modelDim | number | 4 | Hidden d (cols of Q, rows of Kᵀ) used for the side panels, clamped to 2..16. |
| transition | Transition | SPRINGS.smooth | Spring for block highlights. |
| className | string | - | Merged onto the root via cn(). |
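The clamping ranges listed for seqLen, blockSize, and modelDim can be pictured with a small helper. This is a sketch under stated assumptions rather than the component's actual code: resolveShape and ShapeProps are hypothetical names, and rounding fractional inputs to the nearest integer is an assumption the documentation does not confirm.

```typescript
// Hypothetical subset of the props that affect grid shape.
interface ShapeProps {
  seqLen?: number;
  blockSize?: number;
  modelDim?: number;
}

function clamp(value: number, min: number, max: number): number {
  return Math.min(max, Math.max(min, value));
}

/** Apply the documented defaults and ranges. Rounding is an assumption. */
function resolveShape({ seqLen = 8, blockSize = 2, modelDim = 4 }: ShapeProps = {}) {
  const n = clamp(Math.round(seqLen), 2, 64); // seqLen: 2..64
  const b = clamp(Math.round(blockSize), 1, n); // blockSize: 1..seqLen (after clamping seqLen)
  const d = clamp(Math.round(modelDim), 2, 16); // modelDim: 2..16
  return { seqLen: n, blockSize: b, modelDim: d };
}
```

Note that blockSize is clamped against the already-clamped seqLen, so an oversized block collapses to a single tile covering the whole grid.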

Accessibility

  • The root is role="figure" with aria-labelledby pointing at the "Flash attention tiling" heading and aria-describedby at a visually-hidden aria-live="polite" summary.
  • The summary announces the current tile index and the Q block-row + K block-column whenever the cursor advances.
  • The play / pause button uses aria-pressed; the label flips between "Play tiling" and "Pause tiling".
  • The scrubber is a native <input type="range"> with an explicit aria-label, so keyboard arrows step the cursor and screen readers narrate the value.
  • Colour is never the only signal — the tile counter, panel shape labels, and live summary are textual.
  • prefers-reduced-motion: reduce parks the cursor on the final tile, pauses autoplay, and suppresses block-highlight transitions.
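As a rough illustration of the live summary, the text for a given cursor position could be built like this. The function name and the exact wording are assumptions; only the announced pieces (tile index, Q block-row, K block-column) come from the description above. Indices are converted from zero-based coordinates to one-based labels for the listener.

```typescript
/**
 * Build a screen-reader summary for the tile at (row, col).
 * `tiles` is the number of tiles along one side of the grid.
 * Hypothetical helper; the wording is an assumption.
 */
function describeTile(row: number, col: number, tiles: number, blockSize: number): string {
  const index = row * tiles + col + 1; // one-based position in row-major order
  const total = tiles * tiles;
  return (
    `Tile ${index} of ${total}: Q block-row ${row + 1}, K block-column ${col + 1}. ` +
    `Block size ${blockSize} × ${blockSize}.`
  );
}
```

Rendering this string inside the aria-live="polite" region means each cursor advance is announced without interrupting speech already in progress.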

Credits

  • Extracted from: craftingattention (app/src/lessons/primitives/viz/FlashAttentionViz.tsx). Stripped the Widget chrome (history-undo, bookmarks), the explore / predict / challenge mode strips, the standard-vs-flash mode toggle, the memory-bytes formatting and badges, and the lesson-specific copy. Generalised the fixed N=1024, BLOCK=256, 4×4 tiles to arbitrary seqLen and blockSize with controlled / uncontrolled state pairs and a row-major (row, col) sweep.