Attention Scale Viz
Side-by-side bar charts of softmax(QKᵀ / √dₖ) (left) versus softmax(QKᵀ) (right) over the same row of raw dot-product logits. As the dₖ slider grows, the unscaled distribution saturates toward a one-hot, where the softmax Jacobian pᵢ(δᵢⱼ − pⱼ) shrinks toward zero and almost no gradient reaches the logits, while the scaled distribution keeps its shape. The comparison makes the case for the 1/√dₖ factor in scaled dot-product attention visceral.
Where the Attention Heatmap shows a static N×N weight matrix and the Attention Stepper Viz walks an output cursor across precomputed weights, this primitive isolates a single softmax row and lets the user drag the scaling factor in real time.
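The usual argument for the 1/√dₖ factor can be written out in two lines. This is a sketch under an assumption not stated on this page: the components of the query q and key k are independent with zero mean and unit variance. The second line is the standard softmax Jacobian, which shows why a saturated row passes almost no gradient back to the logits.

```latex
q \cdot k = \sum_{i=1}^{d_k} q_i k_i, \qquad
\mathbb{E}[q \cdot k] = 0, \qquad
\operatorname{Var}(q \cdot k) = d_k
\;\Longrightarrow\;
\operatorname{Var}\!\left(\frac{q \cdot k}{\sqrt{d_k}}\right) = 1

\frac{\partial p_i}{\partial z_j} = p_i\,(\delta_{ij} - p_j)
\;\longrightarrow\; 0
\quad \text{as } p \text{ approaches a one-hot vector}
```

Under that assumption the raw logits spread out like √dₖ, so dividing by √dₖ keeps their scale independent of the head dimension.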
Installation
npx shadcn@latest add https://craftbits.dev/r/attention-scale-viz.json
Usage
import { AttentionScaleViz } from "@craft-bits/core";
const logits = [4.0, 3.0, 2.0, 1.0, 0.5];
<AttentionScaleViz
  logits={logits}
  labels={["k₁", "k₂", "k₃", "k₄", "k₅"]}
  defaultDk={64}
/>
Drive dk from a parent so the slider can be controlled by a scrollytelling step or a sibling widget:
import { useState } from "react";
// Inside the parent component:
const [dk, setDk] = useState(64);
<AttentionScaleViz
  logits={logits}
  dk={dk}
  onDkChange={setDk}
/>
Hide the comparison column for a compact display:
<AttentionScaleViz logits={logits} showComparison={false} />
Understanding the component
- Two charts, one row of logits. The same QKᵀ row is fed to both panels. The left panel divides by √dₖ before softmax; the right panel doesn't. Both use the textbook max-subtraction trick so even huge unscaled logits never overflow exp.
- dₖ drives the gap. As the slider walks from 1 toward 1024, the scale factor 1/√dₖ shrinks from 1 to ~0.031. The right chart sees the raw logits and collapses to a one-hot; the left chart sees z / √dₖ and stays well-spread.
- Bar height animates with SPRINGS.smooth. Per-bar motion.rect interpolates y and height on every dₖ change, so the saturation is felt as a continuous motion rather than a snap. Reduced-motion users always get instant transitions.
- Argmax callouts. The peak bar in each chart floats its softmax(...) percentage above its top edge so the "100% on k₁" moment in the unscaled column reads at a glance.
- Uniform 1/N reference. A dashed accent line marks where every bar would land for a perfectly flat distribution: useful for spotting "still uniform" vs "starting to peak" without squinting.
- Controlled + uncontrolled dk. Pair dk with onDkChange for full control; leave them off to let the component own its own state via defaultDk.
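The two panels described above can be approximated with a few lines of plain TypeScript. This is a minimal sketch, not the component's actual source: the helper names softmax and scaledSoftmax are hypothetical, and only the max-subtraction trick and the 1/√dₖ scaling are taken from the description.

```typescript
// Hypothetical helpers; the component's real internals are not shown on this page.

/** Numerically stable softmax: subtract the row max before exponentiating. */
function softmax(logits: readonly number[]): number[] {
  if (logits.length === 0) return [];
  const max = Math.max(...logits);
  // After subtracting the max, the largest exponent is exp(0) = 1, so exp cannot overflow.
  const exps = logits.map((z) => Math.exp(z - max));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum);
}

/** Left panel: multiply the row by 1/sqrt(dk) before softmax. */
function scaledSoftmax(logits: readonly number[], dk: number): number[] {
  const scale = 1 / Math.sqrt(dk);
  return softmax(logits.map((z) => z * scale));
}

const logits = [4.0, 3.0, 2.0, 1.0, 0.5];
const unscaled = softmax(logits);          // right panel
const scaled = scaledSoftmax(logits, 64);  // left panel, scale = 1/8
const uniform = 1 / logits.length;         // the dashed 1/N reference line

console.log(Math.max(...unscaled).toFixed(2)); // 0.63
console.log(Math.max(...scaled).toFixed(2));   // 0.25
console.log(uniform);                          // 0.2
```

With dₖ = 64 the scaled peak sits just above the 1/N line, while the unscaled peak already takes most of the mass.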
Props
| Prop | Type | Default | Description |
|---|---|---|---|
| logits | readonly number[] | — | Raw QKᵀ row for a single query. Required. |
| labels | readonly string[] | numeric indices | Key-token labels under each bar. |
| dk | number | — | Controlled head dimension. Pair with onDkChange. |
| defaultDk | number | 64 | Uncontrolled initial head dimension. |
| onDkChange | (dk: number) => void | — | Fires when the dₖ slider changes. |
| dkRange | readonly [number, number] | [1, 1024] | [min, max] for the dₖ slider. |
| applyScale | boolean | true | When false, the left chart shows plain softmax(z) too. |
| showComparison | boolean | true | When false, only the scaled chart renders. |
| transition | Transition | SPRINGS.smooth | Spring for bar-height transitions. |
| className | string | — | Merged onto the root <div> via cn(). |
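One way the dk, defaultDk, and dkRange props could combine is sketched below. The prop names and defaults come from the table; the resolution order, the clamping to dkRange, and the helper names resolveDk and clampDk are assumptions for illustration, not the component's confirmed behavior.

```typescript
// Hypothetical sketch: prop names and defaults come from the Props table,
// but the resolution order and the clamping are assumptions.

type DkRange = readonly [number, number];

interface DkProps {
  dk?: number;        // controlled value
  defaultDk?: number; // uncontrolled initial value
  dkRange?: DkRange;  // [min, max] for the slider
}

const DEFAULT_DK = 64;
const DEFAULT_DK_RANGE: DkRange = [1, 1024];

/** Keep a value inside [min, max]. */
function clampDk(value: number, range: DkRange): number {
  const [min, max] = range;
  return Math.min(max, Math.max(min, value));
}

/**
 * Pick the effective dk: the controlled prop wins, then the component's own
 * state (if it has any yet), then defaultDk, then the documented default.
 */
function resolveDk(
  props: DkProps,
  internalDk?: number,
): { value: number; controlled: boolean } {
  const controlled = props.dk !== undefined;
  const raw = props.dk ?? internalDk ?? props.defaultDk ?? DEFAULT_DK;
  return { value: clampDk(raw, props.dkRange ?? DEFAULT_DK_RANGE), controlled };
}

console.log(resolveDk({}));                     // { value: 64, controlled: false }
console.log(resolveDk({ dk: 4096 }));           // { value: 1024, controlled: true }
console.log(resolveDk({ defaultDk: 16 }, 256)); // { value: 256, controlled: false }
```

In a real component the internal value would live in React state and onDkChange would fire on every slider change; those parts are left out to keep the sketch runnable on its own.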
Accessibility
- The outer element is role="figure" with aria-labelledby pointing at the heading and an aria-live="polite" summary that announces the current dₖ plus the peak probability and key in each chart.
- Each chart has its own <svg role="img"> with aria-labelledby pointing at a per-chart title ("scaled · z / √dₖ" / "unscaled · softmax(z)") so the two panels are distinguishable in an AT outline.
- The dₖ slider is a native <input type="range"> with aria-valuemin / aria-valuemax / aria-valuenow / aria-valuetext, so keyboard arrows nudge the head dimension and screen readers narrate the value.
- Color is never the only signal: the argmax bar carries its softmax percentage as visible text, and labels under each bar bold when they win.
- prefers-reduced-motion: reduce collapses every transition to duration: 0, so the saturation snaps instead of springs.
- Color contrast is theme-driven via --cb-accent / --cb-fg / --cb-fg-muted tokens, so AA contrast holds in both light and dark mode.
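The aria-live summary mentioned in the first bullet needs the peak key and probability for each chart. The sketch below shows one way to derive them. The helper names findPeak and summarize, the exact wording of the announcement, and the zero-based fallback label are all assumptions; the page only states which facts the summary announces.

```typescript
// Hypothetical helpers: the real component's summary wording is not shown on this page.

interface Peak {
  index: number;   // position of the argmax bar
  percent: number; // its probability, rounded to a whole percentage
}

/** Find the argmax bar of a probability row. An empty row yields index 0 and 0%. */
function findPeak(probs: readonly number[]): Peak {
  if (probs.length === 0) return { index: 0, percent: 0 };
  let index = 0;
  for (let i = 1; i < probs.length; i++) {
    if (probs[i] > probs[index]) index = i;
  }
  return { index, percent: Math.round(probs[index] * 100) };
}

/** Build a one-line text summary suitable for a polite live region. */
function summarize(
  dk: number,
  scaled: readonly number[],
  unscaled: readonly number[],
  labels: readonly string[],
): string {
  const describe = (name: string, probs: readonly number[]): string => {
    const { index, percent } = findPeak(probs);
    // Fallback to the numeric index when no label is given (index base is an assumption).
    const key = labels[index] ?? String(index);
    return `${name} peak ${percent}% on ${key}`;
  };
  return `dₖ = ${dk}: ${describe("scaled", scaled)}; ${describe("unscaled", unscaled)}.`;
}

const text = summarize(
  64,
  [0.25, 0.22, 0.2, 0.17, 0.16],
  [0.63, 0.23, 0.09, 0.03, 0.02],
  ["k₁", "k₂", "k₃", "k₄", "k₅"],
);
console.log(text); // dₖ = 64: scaled peak 25% on k₁; unscaled peak 63% on k₁.
```

The same string could also feed the slider's aria-valuetext, so the spoken value and the live summary stay consistent.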
Credits
- Extracted from: craftingattention (app/src/lessons/primitives/viz/AttentionScaleViz.tsx). The original was an interactive widget about the quadratic memory wall (N² scaling), bundled with a lesson harness. This extract covers a sibling concept, the 1/√dₖ scaling factor, distilled into a focused primitive: two softmax bars, one slider, controlled / uncontrolled wiring.