Online Softmax Stepper
Block-stream walkthrough of the online (streaming) softmax — the recurrence Flash Attention uses to compute attention tile by tile without ever materialising the full score matrix. The visualisation tiles the input into K blocks of B scores, then folds them in one at a time. Each fold updates the running max m_i, the running denominator l_i, and the recovered per-score weight distribution. Whenever a fresh maximum arrives, every previously accumulated weight rescales by α = exp(m_old − m_new) so the running distribution stays globally normalised.
The update rule on each new block B_b with max bMax:
m_new = max(m_old, bMax)
α = exp(m_old − m_new) // = 1 when bMax ≤ m_old
l_new = α · l_old + Σ_{x ∈ B_b} exp(x − m_new)
p_j = exp(x_j − m_new) / l_new // for every j seen so far
Because every exp(...) evaluates a non-positive argument, the streaming pass is numerically stable for arbitrarily large logits — the classic "subtract the max" softmax trick maintained incrementally.
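The recurrence above can be sketched in a few lines of TypeScript. This is a minimal illustration, not the component's source: the names `RunningState`, `foldBlock`, and `streamingSoftmax` are hypothetical, and the initial state `m = -Infinity`, `l = 0` is an assumption chosen so that the first block needs no special case.

```typescript
// Running state of the streaming softmax (hypothetical shape, for illustration).
interface RunningState {
  m: number; // running max; -Infinity before the first block
  l: number; // running denominator, expressed relative to m
}

// Fold one block of scores into the running state.
// Returns the new state plus the rescale factor alpha applied to the old denominator.
function foldBlock(
  state: RunningState,
  block: readonly number[],
): RunningState & { alpha: number } {
  if (block.length === 0) throw new RangeError("block must not be empty");
  const bMax = Math.max(...block);
  const mNew = Math.max(state.m, bMax);
  // mNew >= state.m, so the argument is never positive.
  // On the first block exp(-Infinity) is 0, which discards the empty denominator.
  const alpha = Math.exp(state.m - mNew);
  let blockSum = 0;
  for (const x of block) blockSum += Math.exp(x - mNew);
  return { m: mNew, l: alpha * state.l + blockSum, alpha };
}

// Stream all blocks, then recover the weights against the final max and denominator.
function streamingSoftmax(scores: readonly number[], blockSize: number): number[] {
  let state: RunningState = { m: -Infinity, l: 0 };
  for (let start = 0; start < scores.length; start += blockSize) {
    state = foldBlock(state, scores.slice(start, start + blockSize));
  }
  return scores.map((x) => Math.exp(x - state.m) / state.l);
}
```

With logits around 1000, a naive `Math.exp(x)` overflows to `Infinity`, while `streamingSoftmax` only ever exponentiates differences from the running max, so the weights stay finite and sum to 1.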
Installation
npx shadcn@latest add https://craftbits.dev/r/online-softmax-stepper.json
Usage
import { OnlineSoftmaxStepper } from "@craft-bits/viz/online-softmax-stepper";
<OnlineSoftmaxStepper />
Drive playback from outside (parent scrubber, narration sync):
import { useState } from "react";
// Inside a React component:
const [step, setStep] = useState(0);
const [playing, setPlaying] = useState(false);
<OnlineSoftmaxStepper
  currentStep={step}
  onCurrentStepChange={setStep}
  playing={playing}
  onPlayingChange={setPlaying}
  playSpeed={1200}
/>
Stream a custom score sequence with smaller blocks:
<OnlineSoftmaxStepper
  scores={myLogits}
  blockSize={8}
  rescaleStaggerMs={40}
/>
Understanding the component
- Precomputed snapshots. When `scores` or `blockSize` changes, the entire walk is recomputed inside `useMemo`. Each snapshot carries `m_i`, `l_i`, the per-score weight vector for every score seen so far, plus a `rescaled` flag and the rescale factor `α`. Scrubbing or stepping is O(1).
- The block timeline. The top row tiles `K = scores.length / blockSize` rectangles. Unprocessed blocks sit on the neutral surface; the most recently processed block lights up in the accent colour; older processed blocks fall back to the warning hue. Colour is always paired with text: block index labels live inside every tile.
- The readouts. Two horizontal bars track `m_i` and `l_i` against their visible ranges. The weight-distribution panel below is one bar per score, normalised against the largest current weight so the relative shape of the distribution always fills the panel.
- Rescale animation. When a fresh maximum arrives, every previously accumulated bar visually shrinks. A per-bar `delay` stagger (set by `rescaleStaggerMs`, capped at 50 ms) is applied so the wave of shrink reads as a single coordinated correction; latest-block bars enter without a delay.
- α annotation. Whenever the active snapshot's `rescaled` flag flips on, a chip appears with the literal rescale factor `α` formatted to three decimals. It animates in and out via `AnimatePresence` with `initial={false}` so the first render is silent.
- Phase machine. The narration paragraph reads the current phase (`observe/first-block/running/rescaled/complete`) and frames it in plain English.
- Reduced motion. Under `prefers-reduced-motion: reduce`, every spring collapses to instant and the α chip's enter/exit fade is removed, while autoplay still ticks at the configured cadence.
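The bar normalisation and the stagger clamp described above can be sketched as follows. This is an illustrative guess rather than the component's code: `barHeights`, `clampStagger`, and `barDelayMs` are hypothetical names, and the linear index-times-stagger schedule in particular is an assumption, since the text only says the delay is per-bar and capped.

```typescript
// Scale weights so the largest current weight maps to a full-height bar (1.0).
function barHeights(weights: readonly number[]): number[] {
  const maxW = weights.reduce((a, b) => Math.max(a, b), 0);
  return maxW > 0 ? weights.map((w) => w / maxW) : weights.map(() => 0);
}

// Clamp the stagger to the documented [0, 50] ms range.
function clampStagger(staggerMs: number): number {
  return Math.min(50, Math.max(0, staggerMs));
}

// Delay before a bar starts its rescale transition, in milliseconds.
// Assumption: delay grows linearly with the bar index; latest-block bars get no delay.
function barDelayMs(barIndex: number, inLatestBlock: boolean, staggerMs: number): number {
  if (inLatestBlock) return 0;
  return barIndex * clampStagger(staggerMs);
}
```

Normalising against the largest weight, rather than plotting raw probabilities, keeps the tallest bar at full height even when 64 scores share the mass and every individual weight is small.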
Props
| Prop | Type | Default | Description |
|---|---|---|---|
| `scores` | `readonly number[]` | 64-score PRNG sample | Score sequence to stream. Length should be a positive multiple of `blockSize`. |
| `blockSize` | `number` | `16` | Number of scores grouped into a single Process Block step. |
| `currentStep` | `number` | — | Controlled active step (0..blocks). |
| `defaultCurrentStep` | `number` | `0` | Uncontrolled initial step. |
| `onCurrentStepChange` | `(step) => void` | — | Fires whenever the active step changes. |
| `playing` | `boolean` | — | Controlled autoplay state. |
| `defaultPlaying` | `boolean` | `false` | Uncontrolled initial autoplay state. |
| `onPlayingChange` | `(playing) => void` | — | Fires when play / pause flips. |
| `playSpeed` | `number` | `1500` | Milliseconds between autoplayed Process Block taps. |
| `rescaleStaggerMs` | `number` | `30` | Per-bar stagger during rescale, in ms. Clamped to [0, 50]. |
| `transition` | `Transition` | `SPRINGS.snap` | Override the spring used for bar and value transitions. |
| `onStep` | `(snapshot) => void` | — | Fires after each Process Block tap with the latest snapshot. |
| `onReset` | `() => void` | — | Fires when Reset is clicked. |
| `className` | `string` | — | Merged onto the root via `cn()`. |
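Since `scores` should have a length that is a positive multiple of `blockSize`, a caller with ragged input may want to tidy it first. The helper below is a hypothetical sketch (`trimToBlockMultiple` is not part of the package) that drops any trailing partial block. How the component itself treats a ragged sequence is not documented here, so trimming is only one possible policy.

```typescript
// Drop trailing scores so the length is a positive multiple of blockSize.
// Hypothetical helper; throws if not even one full block is available.
function trimToBlockMultiple(scores: readonly number[], blockSize: number): number[] {
  if (!Number.isInteger(blockSize) || blockSize <= 0) {
    throw new RangeError("blockSize must be a positive integer");
  }
  const usable = Math.floor(scores.length / blockSize) * blockSize;
  if (usable === 0) {
    throw new RangeError("scores must contain at least one full block");
  }
  return scores.slice(0, usable);
}
```

The result can then be passed as `scores={trimToBlockMultiple(myLogits, 8)}` alongside `blockSize={8}`.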
Accessibility
- The root carries a hidden title via `aria-labelledby`, summarising the K-block, B-score scope of the walk.
- The block timeline is a `role="list"` of `role="listitem"` cells whose `aria-label` describes their processed / latest status, so colour is never the only signal.
- The weight-distribution panel is `role="img"` with an `aria-label` reporting how many scores it covers.
- A visually hidden `aria-live="polite"` region announces the current block count, running max, running denominator, and the most recent rescale factor.
- The narration paragraph above the controls is `aria-live="polite"` and reads as plain prose.
- The Process Block, Reset, and Auto-play buttons each have visible focus rings, and Auto-play also exposes an `aria-pressed` state.
- Motion respects `prefers-reduced-motion: reduce`: every spring collapses to instant.
Credits
- Extracted from: `craftingattention` (`app/src/lessons/primitives/viz/OnlineSoftmaxStepper.tsx`). The source paired the visualisation with the `SoftmaxComparisonInline` static lookup, a hard-coded `BG = oklch(0.16 0.01 280)` background plus `PURPLE` / `ORANGE` inline literals, a `ChallengeBtn` + `TogglePill` chrome strip from `ConstructionPrimitives`, and a four-phase narration tightly coupled to the 64-score / 4-block default. The viz extract strips every lesson-only import, remaps every inline colour to the `--cb-accent` / `--cb-warning` / `--cb-bg-*` token vocabulary, swaps the inline `SPRINGS.snappy` reference for the canonical `SPRINGS.snap`, and exposes `scores` / `blockSize` / `currentStep` / `playing` / `playSpeed` / `rescaleStaggerMs` / `transition` / `onStep` / `onReset` props so the same recurrence powers any block size from 4 to 64.