Online Softmax Stepper

Block-stream walkthrough of the online (streaming) softmax — the recurrence Flash Attention uses to compute attention tile by tile without ever materialising the full score matrix. The visualisation tiles the input into K blocks of B scores, then folds them in one at a time. Each fold updates the running max m_i, the running denominator l_i, and the recovered per-score weight distribution. Whenever a fresh maximum arrives, every previously accumulated weight rescales by α = exp(m_old − m_new) so the running distribution stays globally normalised.

The update rule on each new block B_b with max bMax:

m_new = max(m_old, bMax)
α     = exp(m_old − m_new)             // = 1 when bMax ≤ m_old
l_new = α · l_old + Σ exp(x − m_new)   x ∈ B_b
p_j   = exp(x_j − m_new) / l_new        for every j seen so far

Because every exp(...) evaluates a non-positive argument, the streaming pass is numerically stable for arbitrarily large logits — the classic "subtract the max" softmax trick maintained incrementally.
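The recurrence above can be sketched in a few lines of TypeScript. This is a minimal illustration rather than the component's source: the names `RunningState`, `foldBlock`, and `onlineSoftmax` are hypothetical, and the running max is assumed to start at -Infinity with a zero denominator.

```typescript
// Running state carried between blocks (names are illustrative only).
interface RunningState {
  m: number; // running max m_i
  l: number; // running denominator l_i
}

// Fold one block of scores into the running state.
function foldBlock(
  state: RunningState,
  block: readonly number[],
): RunningState & { alpha: number } {
  const bMax = Math.max(...block);
  const mNew = Math.max(state.m, bMax);
  // alpha is 1 when the max is unchanged and below 1 when a fresh max arrives.
  const alpha = Math.exp(state.m - mNew);
  let blockSum = 0;
  for (const x of block) blockSum += Math.exp(x - mNew); // every argument is <= 0
  return { m: mNew, l: alpha * state.l + blockSum, alpha };
}

// Stream all blocks, then recover the weights from the final m and l.
function onlineSoftmax(scores: readonly number[], blockSize: number): number[] {
  let state: RunningState = { m: -Infinity, l: 0 };
  for (let i = 0; i < scores.length; i += blockSize) {
    state = foldBlock(state, scores.slice(i, i + blockSize));
  }
  return scores.map((x) => Math.exp(x - state.m) / state.l);
}
```

Logits near 1000 would overflow a naive exp(x), but here the exponent never exceeds zero, so the recovered weights should match a direct max-subtracted softmax up to floating-point rounding.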

Installation

npx shadcn@latest add https://craftbits.dev/r/online-softmax-stepper.json

Usage

import { OnlineSoftmaxStepper } from "@craft-bits/viz/online-softmax-stepper";
 
<OnlineSoftmaxStepper />

Drive playback from outside (parent scrubber, narration sync):

const [step, setStep] = useState(0);
const [playing, setPlaying] = useState(false);
 
<OnlineSoftmaxStepper
  currentStep={step}
  onCurrentStepChange={setStep}
  playing={playing}
  onPlayingChange={setPlaying}
  playSpeed={1200}
/>

Stream a custom score sequence with smaller blocks:

<OnlineSoftmaxStepper
  scores={myLogits}
  blockSize={8}
  rescaleStaggerMs={40}
/>
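One way to build a `myLogits` array like the one above is a small seeded generator, so the same walk replays on every render. The sketch below uses the well-known mulberry32 PRNG as an assumption; nothing in this page says which generator the default 64-score sample uses, and `mulberry32` and `makeScores` are hypothetical helper names.

```typescript
// mulberry32: a small deterministic 32-bit PRNG returning floats in [0, 1).
function mulberry32(seed: number): () => number {
  let a = seed | 0;
  return () => {
    a = (a + 0x6d2b79f5) | 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Produce blocks * blockSize scores in [-spread, spread), so the length is
// always a positive multiple of blockSize as the scores prop expects.
function makeScores(
  blocks: number,
  blockSize: number,
  seed: number,
  spread = 4,
): number[] {
  const rand = mulberry32(seed);
  return Array.from({ length: blocks * blockSize }, () => (rand() * 2 - 1) * spread);
}

// Four blocks of eight scores, reproducible for a given seed.
const myLogits = makeScores(4, 8, 42);
```

Because the generator is seeded, the same `seed` always yields the same sequence, which keeps the memoised walk stable across re-renders.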

Understanding the component

Precomputed snapshots. When scores or blockSize changes, the entire walk is recomputed inside useMemo — each snapshot carries m_i, l_i, the per-score weight vector for every score seen so far, plus a rescaled flag and the rescale factor α. Scrubbing or stepping is O(1).
The block timeline. The top row tiles K = scores.length / blockSize rectangles. Unprocessed blocks sit on the neutral surface; the most recently processed block lights up in the accent colour; older processed blocks fall back to the warning hue. Colour is always paired with text — block index labels live inside every tile.
The readouts. Two horizontal bars track m_i and l_i against their visible ranges. The weight-distribution panel below is one bar per score, normalised against the largest current weight so the relative shape of the distribution always fills the panel.
Rescale animation. When a fresh maximum arrives, every previously accumulated bar visually shrinks. A per-bar delay (rescaleStaggerMs, clamped to at most 50 ms) staggers the bars so the shrink reads as a single coordinated wave; latest-block bars enter without a delay.
α annotation. Whenever the active snapshot's rescaled flag flips on, a chip appears with the literal rescale factor α formatted to three decimals. It animates in and out via AnimatePresence with initial={false} so the first render is silent.
Phase machine. The narration paragraph reads the current phase (observe / first-block / running / rescaled / complete) and frames it in plain English.
Reduced motion. Under prefers-reduced-motion: reduce, every spring collapses to instant and the α chip's enter/exit fade is removed, while autoplay continues to tick at the configured cadence.
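The precomputed-snapshot idea described above can be sketched as follows. The `Snapshot` shape and the `buildSnapshots` name are assumptions made for illustration; the real component's field names may differ, and in the component this work would sit inside a `useMemo` keyed on `scores` and `blockSize`.

```typescript
// Hypothetical snapshot shape: one entry per step, index 0 = nothing processed.
interface Snapshot {
  m: number;         // running max m_i
  l: number;         // running denominator l_i
  weights: number[]; // normalised weights for every score seen so far
  rescaled: boolean; // true when this block raised the running max
  alpha: number;     // rescale factor applied to the older weights
}

function buildSnapshots(scores: readonly number[], blockSize: number): Snapshot[] {
  const snapshots: Snapshot[] = [
    { m: -Infinity, l: 0, weights: [], rescaled: false, alpha: 1 },
  ];
  let m = -Infinity;
  let l = 0;
  for (let start = 0; start < scores.length; start += blockSize) {
    const block = scores.slice(start, start + blockSize);
    const first = start === 0;
    const mNew = Math.max(m, ...block);
    // The first block has nothing to rescale, so alpha is reported as 1.
    const alpha = first ? 1 : Math.exp(m - mNew);
    let blockSum = 0;
    for (const x of block) blockSum += Math.exp(x - mNew);
    l = alpha * l + blockSum;
    const seen = scores.slice(0, start + block.length);
    snapshots.push({
      m: mNew,
      l,
      weights: seen.map((x) => Math.exp(x - mNew) / l),
      rescaled: !first && mNew > m,
      alpha,
    });
    m = mNew;
  }
  return snapshots;
}
```

With the whole walk stored up front, moving to step k is just `snapshots[k]`, which is what makes scrubbing constant-time at the cost of storing one weight vector per step.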

Props

Prop	Type	Default	Description
`scores`	`readonly number[]`	64-score PRNG sample	Score sequence to stream. Length should be a positive multiple of `blockSize`.
`blockSize`	`number`	`16`	Number of scores grouped into a single Process Block step.
`currentStep`	`number`	—	Controlled active step (`0..blocks`).
`defaultCurrentStep`	`number`	`0`	Uncontrolled initial step.
`onCurrentStepChange`	`(step) => void`	—	Fires whenever the active step changes.
`playing`	`boolean`	—	Controlled autoplay state.
`defaultPlaying`	`boolean`	`false`	Uncontrolled initial autoplay state.
`onPlayingChange`	`(playing) => void`	—	Fires when play / pause flips.
`playSpeed`	`number`	`1500`	Milliseconds between autoplayed Process Block taps.
`rescaleStaggerMs`	`number`	`30`	Per-bar stagger during rescale, in ms. Clamped to `[0, 50]`.
`transition`	`Transition`	`SPRINGS.snap`	Override the spring used for bar and value transitions.
`onStep`	`(snapshot) => void`	—	Fires after each Process Block tap with the latest snapshot.
`onReset`	`() => void`	—	Fires when Reset is clicked.
`className`	`string`	—	Merged onto the root via `cn()`.
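The `[0, 50]` clamp on `rescaleStaggerMs` can be pictured with a short sketch. Both helpers below are hypothetical, and the delay formula (bar index times the clamped stagger) is an assumption about how a per-bar stagger is commonly applied, not something this page confirms.

```typescript
const MAX_STAGGER_MS = 50;

// Clamp the requested stagger into the documented [0, 50] range.
function clampStagger(ms: number): number {
  return Math.min(MAX_STAGGER_MS, Math.max(0, ms));
}

// Assumed delay rule: older bars wait index * stagger; latest-block bars do not wait.
function barDelayMs(
  barIndex: number,
  isLatestBlock: boolean,
  staggerMs: number,
): number {
  if (isLatestBlock) return 0;
  return barIndex * clampStagger(staggerMs);
}
```

Under this assumed rule, passing `rescaleStaggerMs={80}` would behave the same as passing `50`.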

Accessibility

The root carries a hidden title via aria-labelledby, summarising the K-block, B-score scope of the walk.
The block timeline is a role="list" of role="listitem" cells whose aria-label describes their processed / latest status — colour is never the only signal.
The weight-distribution panel is role="img" with an aria-label reporting how many scores it covers.
A visually hidden aria-live="polite" region announces the current block count, running max, running denominator, and the most recent rescale factor.
The narration paragraph above the controls is aria-live="polite" and reads as plain prose.
The Process Block, Reset, and Auto-play buttons each have a visible focus ring; Auto-play also exposes an aria-pressed state.
Motion respects prefers-reduced-motion: reduce — every spring collapses to instant.

Credits

Extracted from: craftingattention (app/src/lessons/primitives/viz/OnlineSoftmaxStepper.tsx). The source paired the visualisation with the SoftmaxComparisonInline static lookup, a hard-coded BG = oklch(0.16 0.01 280) background plus PURPLE / ORANGE inline literals, a ChallengeBtn + TogglePill chrome strip from ConstructionPrimitives, and a four-phase narration tightly coupled to the 64-score / 4-block default. The viz extract strips every lesson-only import, remaps every inline colour to the --cb-accent / --cb-warning / --cb-bg-* token vocabulary, swaps the inline SPRINGS.snappy reference for the canonical SPRINGS.snap, and exposes scores / blockSize / currentStep / playing / playSpeed / rescaleStaggerMs / transition / onStep / onReset props so the same recurrence powers any block size from 4 to 64.