Online Softmax Stepper

Block-stream walkthrough of the online (streaming) softmax — the recurrence Flash Attention uses to compute attention tile by tile without ever materialising the full score matrix. The visualisation tiles the input into K blocks of B scores, then folds them in one at a time. Each fold updates the running max m_i, the running denominator l_i, and the recovered per-score weight distribution. Whenever a fresh maximum arrives, every previously accumulated weight rescales by α = exp(m_old − m_new) so the running distribution stays globally normalised.

The update rule on each new block B_b with max bMax:

m_new = max(m_old, bMax)
α     = exp(m_old − m_new)             // = 1 when bMax ≤ m_old
l_new = α · l_old + Σ exp(x − m_new)   x ∈ B_b
p_j   = exp(x_j − m_new) / l_new       for every j seen so far

Because every exp(...) evaluates a non-positive argument, the streaming pass is numerically stable for arbitrarily large logits — the classic "subtract the max" softmax trick maintained incrementally.
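The recurrence can be checked outside the component. The following is a minimal sketch, not the component's source: it takes α = exp(m_old − m_new) as stated in the introduction, and the names `SoftmaxState`, `foldBlock`, `streamingSoftmax`, and `referenceSoftmax` are hypothetical helpers introduced here for illustration. It folds blocks one at a time and compares the result against a one-pass "subtract the max" softmax.

```typescript
// Hypothetical helper types; not part of the published component API.
interface SoftmaxState {
  m: number; // running max m_i
  l: number; // running denominator l_i, expressed relative to m
}

interface FoldResult extends SoftmaxState {
  alpha: number; // rescale factor applied to the old denominator
}

// Fold one block of scores into the running state.
function foldBlock(state: SoftmaxState, block: readonly number[]): FoldResult {
  if (block.length === 0) throw new RangeError("block must not be empty");
  const bMax = Math.max(...block);
  const mNew = Math.max(state.m, bMax);
  // Non-positive argument, so alpha lies in [0, 1]; it is 1 when bMax <= m_old.
  const alpha = Math.exp(state.m - mNew);
  let blockSum = 0;
  for (const x of block) blockSum += Math.exp(x - mNew);
  return { m: mNew, l: alpha * state.l + blockSum, alpha };
}

// Stream every block, then recover the weights from the final (m, l).
function streamingSoftmax(scores: readonly number[], blockSize: number): number[] {
  if (!Number.isInteger(blockSize) || blockSize <= 0) {
    throw new RangeError("blockSize must be a positive integer");
  }
  let state: SoftmaxState = { m: -Infinity, l: 0 };
  for (let start = 0; start < scores.length; start += blockSize) {
    state = foldBlock(state, scores.slice(start, start + blockSize));
  }
  const { m, l } = state;
  return scores.map((x) => Math.exp(x - m) / l);
}

// One-pass reference: subtract the global max, exponentiate, normalise.
function referenceSoftmax(scores: readonly number[]): number[] {
  const m = Math.max(...scores);
  const exps = scores.map((x) => Math.exp(x - m));
  const total = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / total);
}

// Logits near 1000 would overflow a naive exp(x); both versions stay finite.
const logits = [1, 2, 3, 1000, 999, 998, 0, -1];
const streamed = streamingSoftmax(logits, 2);
const direct = referenceSoftmax(logits);
```

The two results should agree up to floating-point rounding, because rescaling by α only re-expresses the old denominator relative to the new maximum.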

[Interactive demo: online softmax stepper folding 4 blocks of 16 scores into a streaming distribution. It shows a block timeline (Block 0 to Block 3), readouts for m_i and l_i, a weight-distribution panel, and a narration line that starts at 0/4 blocks. Press Process Block to begin.]

Installation

npx shadcn@latest add https://craftbits.dev/r/online-softmax-stepper.json

Usage

import { OnlineSoftmaxStepper } from "@craft-bits/viz/online-softmax-stepper";
 
<OnlineSoftmaxStepper />

Drive playback from outside (parent scrubber, narration sync):

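import { useState } from "react";
import { OnlineSoftmaxStepper } from "@craft-bits/viz/online-softmax-stepper";

// ControlledStepper is an illustrative wrapper name, not part of the package.
export function ControlledStepper() {
  // The parent owns both the active step and the autoplay flag.
  const [step, setStep] = useState(0);
  const [playing, setPlaying] = useState(false);

  return (
    <OnlineSoftmaxStepper
      currentStep={step}
      onCurrentStepChange={setStep}
      playing={playing}
      onPlayingChange={setPlaying}
      playSpeed={1200}
    />
  );
}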

Stream a custom score sequence with smaller blocks:

<OnlineSoftmaxStepper
  scores={myLogits}
  blockSize={8}
  rescaleStaggerMs={40}
/>
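The `myLogits` array in the snippet above is not defined on this page. One way to produce a reproducible one is sketched below, assuming a small seeded generator; `seededRandom` and `makeLogits` are hypothetical helpers, and the component's own default sample may be generated differently.

```typescript
// Small seeded generator returning floats in [0, 1). Hypothetical helper;
// the component's built-in PRNG sample is not documented here.
function seededRandom(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) | 0;
    let t = Math.imul(a ^ (a >>> 15), 1 | a);
    t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t;
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Build `count` logits spread uniformly over [-spread, spread).
function makeLogits(count: number, seed: number, spread = 6): number[] {
  const rand = seededRandom(seed);
  return Array.from({ length: count }, () => (rand() * 2 - 1) * spread);
}

// 32 scores = 4 blocks of 8, so the length is a multiple of blockSize.
const myLogits = makeLogits(32, 7);
```

Because the seed is fixed, the same sequence is produced on every render, which keeps the walk stable while scrubbing.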

Understanding the component

  1. Precomputed snapshots. When scores or blockSize changes, the entire walk is recomputed inside useMemo — each snapshot carries m_i, l_i, the per-score weight vector for every score seen so far, plus a rescaled flag and the rescale factor α. Scrubbing or stepping is O(1).
  2. The block timeline. The top row tiles K = scores.length / blockSize rectangles. Unprocessed blocks sit on the neutral surface; the most recently processed block lights up in the accent colour; older processed blocks fall back to the warning hue. Colour is always paired with text — block index labels live inside every tile.
  3. The readouts. Two horizontal bars track m_i and l_i against their visible ranges. The weight-distribution panel below is one bar per score, normalised against the largest current weight so the relative shape of the distribution always fills the panel.
  4. Rescale animation. When a fresh maximum arrives, every previously accumulated bar visually shrinks. Each bar starts after a per-bar stagger delay (rescaleStaggerMs, clamped to at most 50 ms) so the wave of shrinking reads as a single coordinated correction; bars from the latest block enter without a delay.
  5. α annotation. Whenever the active snapshot's rescaled flag flips on, a chip appears with the literal rescale factor α formatted to three decimals. It animates in and out via AnimatePresence with initial={false} so the first render is silent.
  6. Phase machine. The narration paragraph reads the current phase (observe / first-block / running / rescaled / complete) and frames it in plain English.
  7. Reduced motion. Under prefers-reduced-motion: reduce, every spring collapses to instant and the α chip's enter/exit fade is removed; autoplay still ticks at the configured cadence.
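The snapshot precomputation in step 1 and the bar normalisation in step 3 can be sketched as plain functions. This is an illustration under assumptions, not the component's source: the `Snapshot` field names, `buildSnapshots`, and `barHeights` are hypothetical, and in the component the equivalent work runs inside useMemo keyed on scores and blockSize.

```typescript
// Hypothetical snapshot shape mirroring the fields described in step 1.
interface Snapshot {
  step: number;       // number of blocks folded so far
  m: number;          // running max m_i
  l: number;          // running denominator l_i
  weights: number[];  // one weight per score seen so far
  rescaled: boolean;  // true when this block raised the running max
  alpha: number;      // rescale factor applied to the old denominator
}

// Precompute every step once so stepping or scrubbing is an array lookup.
function buildSnapshots(scores: readonly number[], blockSize: number): Snapshot[] {
  if (!Number.isInteger(blockSize) || blockSize <= 0) {
    throw new RangeError("blockSize must be a positive integer");
  }
  const snapshots: Snapshot[] = [
    { step: 0, m: -Infinity, l: 0, weights: [], rescaled: false, alpha: 1 },
  ];
  let m = -Infinity;
  let l = 0;
  for (let start = 0; start < scores.length; start += blockSize) {
    const end = Math.min(start + blockSize, scores.length);
    const block = scores.slice(start, end);
    const isFirst = start === 0;
    const mNew = Math.max(m, ...block);
    // The first block has nothing to rescale, so report alpha = 1 for it.
    const alpha = isFirst ? 1 : Math.exp(m - mNew);
    const rescaled = !isFirst && mNew > m;
    let blockSum = 0;
    for (const x of block) blockSum += Math.exp(x - mNew);
    const lNew = alpha * l + blockSum;
    snapshots.push({
      step: snapshots.length,
      m: mNew,
      l: lNew,
      weights: scores.slice(0, end).map((x) => Math.exp(x - mNew) / lNew),
      rescaled,
      alpha,
    });
    m = mNew;
    l = lNew;
  }
  return snapshots;
}

// Normalise against the largest current weight so the tallest bar is 1.
function barHeights(weights: readonly number[]): number[] {
  const peak = Math.max(...weights);
  return weights.map((w) => (peak > 0 ? w / peak : 0));
}

const walk = buildSnapshots([0, 0, 1, 1], 2);
const heights = barHeights(walk[2].weights);
```

Storing the full weight vector per snapshot trades memory for constant-time lookup, which suits a walk of a few dozen scores.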

Props

| Prop | Type | Default | Description |
| --- | --- | --- | --- |
| scores | readonly number[] | 64-score PRNG sample | Score sequence to stream. Length should be a positive multiple of blockSize. |
| blockSize | number | 16 | Number of scores grouped into a single Process Block step. |
| currentStep | number | - | Controlled active step (0..blocks). |
| defaultCurrentStep | number | 0 | Uncontrolled initial step. |
| onCurrentStepChange | (step) => void | - | Fires whenever the active step changes. |
| playing | boolean | - | Controlled autoplay state. |
| defaultPlaying | boolean | false | Uncontrolled initial autoplay state. |
| onPlayingChange | (playing) => void | - | Fires when play / pause flips. |
| playSpeed | number | 1500 | Milliseconds between autoplayed Process Block taps. |
| rescaleStaggerMs | number | 30 | Per-bar stagger during rescale, in ms. Clamped to [0, 50]. |
| transition | Transition | SPRINGS.snap | Override the spring used for bar and value transitions. |
| onStep | (snapshot) => void | - | Fires after each Process Block tap with the latest snapshot. |
| onReset | () => void | - | Fires when Reset is clicked. |
| className | string | - | Merged onto the root via cn(). |
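The constraints in the table (scores length a positive multiple of blockSize, rescaleStaggerMs clamped to [0, 50], steps running 0..blocks) can be checked before rendering. The sketch below is a hypothetical pre-flight helper; `resolveStepperLayout` and `StepperLayout` are names introduced here, and whether the component itself throws or silently adjusts on bad input is not stated on this page.

```typescript
// Hypothetical result type; not exported by the component.
interface StepperLayout {
  blocks: number;           // K = scores.length / blockSize
  maxStep: number;          // largest valid currentStep (0..blocks)
  rescaleStaggerMs: number; // stagger after clamping to [0, 50]
}

// Validate the inputs the way the Props table describes them.
// Defaults mirror the documented ones: blockSize 16, rescaleStaggerMs 30.
function resolveStepperLayout(
  scores: readonly number[],
  blockSize = 16,
  rescaleStaggerMs = 30,
): StepperLayout {
  if (!Number.isInteger(blockSize) || blockSize <= 0) {
    throw new RangeError("blockSize must be a positive integer");
  }
  if (scores.length === 0 || scores.length % blockSize !== 0) {
    throw new RangeError("scores.length must be a positive multiple of blockSize");
  }
  const blocks = scores.length / blockSize;
  return {
    blocks,
    maxStep: blocks,
    rescaleStaggerMs: Math.min(50, Math.max(0, rescaleStaggerMs)),
  };
}

// 64 scores with the default block size give the 4-block walk of the demo.
const layout = resolveStepperLayout(new Array<number>(64).fill(0));
```

Throwing early is one design choice; a caller could instead pad or truncate the scores to the nearest multiple of blockSize.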

Accessibility

  • The root carries a hidden title via aria-labelledby, summarising the K-block, B-score scope of the walk.
  • The block timeline is a role="list" of role="listitem" cells whose aria-label describes their processed / latest status — colour is never the only signal.
  • The weight-distribution panel is role="img" with an aria-label reporting how many scores it covers.
  • A visually hidden aria-live="polite" region announces the current block count, running max, running denominator, and the most recent rescale factor.
  • The narration paragraph above the controls is aria-live="polite" and reads as plain prose.
  • The Process Block, Reset, and Auto-play buttons each have visible focus rings; Auto-play also exposes its on/off state via aria-pressed.
  • Motion respects prefers-reduced-motion: reduce — every spring collapses to instant.

Credits

  • Extracted from: craftingattention (app/src/lessons/primitives/viz/OnlineSoftmaxStepper.tsx). The source paired the visualisation with the SoftmaxComparisonInline static lookup, a hard-coded BG = oklch(0.16 0.01 280) background plus PURPLE / ORANGE inline literals, a ChallengeBtn + TogglePill chrome strip from ConstructionPrimitives, and a four-phase narration tightly coupled to the 64-score / 4-block default. The viz extract strips every lesson-only import, remaps every inline colour to the --cb-accent / --cb-warning / --cb-bg-* token vocabulary, swaps the inline SPRINGS.snappy reference for the canonical SPRINGS.snap, and exposes scores / blockSize / currentStep / playing / playSpeed / rescaleStaggerMs / transition / onStep / onReset props so the same recurrence powers any block size from 4 to 64.