L2 Weight Decay Viz
A teaching visualisation of L2 regularisation, the penalty behind classic weight decay. The penalty (λ/2)·‖w‖² contributes a gradient of λ·w, so each gradient-descent step adds a −η·λ·w component; on its own, that multiplies every weight by the shrink factor (1 − λ·η). Because the penalty's gradient is proportional to each weight, the same fractional cut means a much larger absolute hit for big weights, while small ones barely notice.
w_t = w_0 · (1 - λ·η)^t
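The formula above follows from a single gradient step on the penalty alone. Writing Ω for the penalty and assuming the common (λ/2)·‖w‖² convention (the one under which the per-step factor is exactly 1 − λ·η; an un-halved λ·‖w‖² would give 1 − 2λ·η), and ignoring the data-loss gradient so only the decay term acts:

```latex
\begin{aligned}
\Omega(w) &= \tfrac{\lambda}{2}\,\lVert w \rVert^2,
  \qquad \nabla_w \Omega = \lambda\, w \\
w_{t+1} &= w_t - \eta\,\lambda\, w_t = (1 - \lambda\eta)\, w_t \\
w_t &= (1 - \lambda\eta)^t\, w_0
\end{aligned}
```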
The component plots the weight vector as a signed histogram (positive bars in `cb-accent`, negative bars in `cb-warning`) with dashed ghost outlines showing the original values. A λ slider and a step scrubber drive a closed-form decay, so the bars at any cursor position are a pure function of the inputs. That makes the render SSR- and hydration-safe and lets the cursor snap to any step instantly.
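A minimal TypeScript sketch of why snap-anywhere works, assuming nothing about the component's internals: `decayClosedForm` and `decayIterative` are hypothetical helpers, not exports of `@craft-bits/core`. The closed form evaluates `w_0 · (1 - λ·η)^t` in one pass for any `t`, and it agrees with stepping gradient descent on the penalty alone `t` times, up to floating-point rounding.

```typescript
// Hypothetical helpers illustrating the closed form; not the component's source.

/** Per-step shrink factor 1 - λ·η (no clamping here). */
function shrinkFactor(lambda: number, eta: number): number {
  return 1 - lambda * eta;
}

/** Closed form: w_t = w_0 · (1 - λ·η)^t, evaluated directly for any step t. */
function decayClosedForm(
  w0: readonly number[],
  lambda: number,
  eta: number,
  t: number,
): number[] {
  const scale = Math.pow(shrinkFactor(lambda, eta), t);
  return w0.map((w) => w * scale);
}

/** Reference loop: t explicit gradient steps on the penalty alone (gradient λ·w). */
function decayIterative(
  w0: readonly number[],
  lambda: number,
  eta: number,
  t: number,
): number[] {
  let w = [...w0];
  for (let step = 0; step < t; step++) {
    // w ← w − η·(λ·w), which is the same as w ← (1 − λ·η)·w
    w = w.map((wi) => wi - eta * (lambda * wi));
  }
  return w;
}

const w0 = [-3.0, 2.5, -1.0, 0.5, 4.0, -2.0, 1.5, -0.3];
const a = decayClosedForm(w0, 0.05, 0.1, 40);
const b = decayIterative(w0, 0.05, 0.1, 40);
console.log(a.every((x, i) => Math.abs(x - b[i]) < 1e-12)); // true
```

Because the closed form needs no history, rendering step 40 costs the same as rendering step 1, and a server render and a client render given the same inputs produce the same bars.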
Installation
```bash
npx shadcn@latest add https://craftbits.dev/r/l2-weight-decay-viz.json
```
Usage
```tsx
import { L2WeightDecayViz } from "@craft-bits/core";

<L2WeightDecayViz
  weights={[-3.0, 2.5, -1.0, 0.5, 4.0, -2.0, 1.5, -0.3]}
  defaultLambda={0.05}
  learningRate={0.1}
/>
```
Drive the cursor externally from a narration step and run an autoplay loop:
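```tsx
import { useState } from "react";
import { L2WeightDecayViz } from "@craft-bits/core";

const initialWeights = [-3.0, 2.5, -1.0, 0.5, 4.0, -2.0, 1.5, -0.3];

// Illustrative wrapper; hooks must be called inside a component.
export function NarratedDecay() {
  const [step, setStep] = useState(0);
  return (
    <L2WeightDecayViz
      weights={initialWeights}
      lambda={0.1}
      learningRate={0.1}
      currentStep={step}
      onCurrentStepChange={setStep}
      playing
      playSpeed={180}
    />
  );
}
```
Hide the readout band and ghost outlines for a stripped-down figure: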
```tsx
<L2WeightDecayViz
  weights={initialWeights}
  defaultLambda={0.02}
  showReadout={false}
  showGhostOriginal={false}
/>
```
Anatomy
- Multiplicative shrinkage, not subtraction. Every step applies `w[i] *= (1 - λ·η)`. The factor is global, so each weight loses the same fraction per step. But a 10% cut (λ·η = 0.1) to `4.0` removes `0.40`, while the same cut to `0.3` removes only `0.03`. Big bars melt; small bars barely shift, which is exactly what L2 regularisation looks like.
- Closed-form, not iterative. The current weight vector is computed directly as `w_0 · (1 - λ·η)^t`. Scrubbing the step slider doesn't replay an integration loop; it jumps to the analytic answer. The same `weights`, `lambda`, `learningRate`, `currentStep` quadruple always produces the same bars, so SSR output and external scrubbers stay perfectly aligned. One consequence: changing λ mid-run redraws the bars as if the new λ had applied from step 0.
- Sign is preserved. As long as `λ·η` stays in `(0, 1)`, the shrink factor stays positive, so multiplying by it never flips a sign, a property the bar colours (accent for `w >= 0`, warning for `w < 0`) make visible. If `λ·η >= 1`, the component clamps the factor at zero rather than letting it go negative: at that point the gradient-descent step is at least as large as the weight itself and the L2 story breaks.
- Asymptotic, not finite-time. Because each step multiplies by the same factor below 1, the weights shrink geometrically, getting ever closer to zero without reaching it (unless the factor is clamped to zero). The `~0` label appears once a bar is within `0.01` of zero, so the chart reads cleanly without ever claiming a weight "reached zero."
- Ghost outlines anchor the eye. Dashed rectangles in `cb-border-strong` mark `w_0[i]` at low opacity, so the learner can compare current and original values without doing arithmetic. Turn them off with `showGhostOriginal={false}` for a cleaner still frame.
- Controlled or uncontrolled state. `lambda` and `currentStep` each have a controlled form (value plus `on*Change`) and an uncontrolled form (`default*`), following the Radix pattern. `playing` and `playSpeed` are plain props so consumers own the transport; there is no built-in play button, which keeps the primitive small.
- Reduced motion. `prefers-reduced-motion: reduce` collapses every bar spring to an instant swap and disables autoplay; the slider still scrubs.
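The sign, clamp, and `~0` rules can be sketched as a handful of pure functions. The names below (`clampedShrink`, `barTone`, `barLabel`, `stepsUntilNearZero`) are hypothetical, and the strict `< 0.01` comparison and two-decimal label format are assumptions; only the rules themselves come from the notes above. The last helper inverts the closed form to count how many steps a bar needs before it reads `~0`.

```typescript
// Hypothetical helpers mirroring the Anatomy rules; not the component's source.
// The "< 0.01" comparison and the toFixed(2) label format are assumptions.

type BarTone = "cb-accent" | "cb-warning";

/** Shrink factor 1 - λ·η, clamped at zero so a too-large step never flips a sign. */
function clampedShrink(lambda: number, eta: number): number {
  return Math.max(0, 1 - lambda * eta);
}

/** Accent for w >= 0, warning for w < 0. */
function barTone(w: number): BarTone {
  return w >= 0 ? "cb-accent" : "cb-warning";
}

/** Show "~0" once a bar is within 0.01 of zero, otherwise the value itself. */
function barLabel(w: number): string {
  return Math.abs(w) < 0.01 ? "~0" : w.toFixed(2);
}

/**
 * Smallest step t with |w0| * s^t < threshold.
 * For 0 < s < 1, ln(s) < 0, so s^t < threshold / |w0| solves to
 * t > ln(threshold / |w0|) / ln(s); the smallest such integer is floor(...) + 1.
 */
function stepsUntilNearZero(w0: number, s: number, threshold = 0.01): number {
  if (Math.abs(w0) < threshold) return 0; // already reads ~0 at step 0
  if (s <= 0) return 1; // clamped factor: one step lands exactly on zero
  if (s >= 1) return Infinity; // no shrinkage, so it never gets there
  return Math.floor(Math.log(threshold / Math.abs(w0)) / Math.log(s)) + 1;
}

console.log(stepsUntilNearZero(4.0, clampedShrink(0.05, 0.1))); // 1196
```

With the first usage example's `λ = 0.05` and `η = 0.1`, the `4.0` bar needs 1,196 steps to read `~0`, far beyond the default `maxStep` of 60, so at that setting the chart shows partial shrinkage rather than collapse.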
Props
| Prop | Type | Default | Description |
|---|---|---|---|
| `weights` | `readonly number[]` | — | Initial weight vector. Non-finite entries are dropped. |
| `lambda` | `number` | — | Controlled λ. Pair with `onLambdaChange`. |
| `defaultLambda` | `number` | `0.01` | Uncontrolled initial λ. |
| `onLambdaChange` | `(lambda) => void` | — | Fires when the λ slider moves. |
| `learningRate` | `number` | `0.1` | Gradient-descent step size η. |
| `currentStep` | `number` | — | Controlled cursor step. Pair with `onCurrentStepChange`. |
| `defaultCurrentStep` | `number` | `0` | Uncontrolled initial cursor step. |
| `onCurrentStepChange` | `(step) => void` | — | Fires on every autoplay tick and scrub. |
| `playing` | `boolean` | `false` | Whether autoplay is running. |
| `playSpeed` | `number` | `220` | Milliseconds per autoplay tick. |
| `maxStep` | `number` | `60` | Maximum cursor step. |
| `showReadout` | `boolean` | `true` | Show the Σw² / shrink / peak readout band. |
| `showGhostOriginal` | `boolean` | `true` | Show dashed outlines of the original weights. |
| `transition` | `Transition` | `SPRINGS.smooth` | Spring used for bar transitions. |
| `className` | `string` | — | Merged onto the root via `cn()`. |
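How the documented defaults combine can be sketched as one pure resolver. `DecayInputs` and `resolveInputs` are hypothetical names; the defaults come from the table above, while rounding and clamping the step into `[0, maxStep]` are assumptions. It also simplifies the Radix pattern: a real uncontrolled component keeps internal state seeded from `default*`, whereas this sketch only reads the seed.

```typescript
// Hypothetical prop resolver; defaults mirror the Props table.
// Rounding and clamping the step into [0, maxStep] are assumptions.

interface DecayInputs {
  weights: readonly number[];
  lambda?: number;
  defaultLambda?: number;
  learningRate?: number;
  currentStep?: number;
  defaultCurrentStep?: number;
  maxStep?: number;
}

interface ResolvedDecay {
  weights: number[];
  lambda: number;
  learningRate: number;
  step: number;
  maxStep: number;
}

function resolveInputs(p: DecayInputs): ResolvedDecay {
  const maxStep = p.maxStep ?? 60;
  // Controlled value wins; otherwise fall back to the uncontrolled seed.
  const lambda = p.lambda ?? p.defaultLambda ?? 0.01;
  const rawStep = p.currentStep ?? p.defaultCurrentStep ?? 0;
  return {
    weights: p.weights.filter(Number.isFinite), // non-finite entries are dropped
    lambda,
    learningRate: p.learningRate ?? 0.1,
    step: Math.min(maxStep, Math.max(0, Math.round(rawStep))),
    maxStep,
  };
}
```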
Accessibility
- The outer element is `role="figure"` with a hidden title and an `aria-live="polite"` summary. Screen readers hear "Step X of N. Lambda …, learning rate …. Shrink factor …. Penalty Σw² …. Peak |w| …" whenever the cursor or λ changes.
- Positive bars are `cb-accent`, negative bars are `cb-warning`, and ghost outlines are dashed `cb-border-strong`: three distinct shape and colour signals.
- Both range inputs carry an explicit `aria-label` and a visible value readout, so arrow keys scrub with screen-reader narration.
- The zero baseline is rendered as a thicker `cb-border-strong` line to distinguish it from grid ticks.
- `prefers-reduced-motion: reduce` collapses every spring to an instant swap and disables autoplay; manual scrubbing still works.
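A sketch of how that live summary could be assembled from the same closed form. `decaySummary` is a hypothetical name and the `toFixed(3)` precision is an assumption; the sentence order follows the announcement quoted above. Recomputing the string from the inputs on every change keeps the narration in step with the bars.

```typescript
// Hypothetical builder for the aria-live summary; not the component's source.
// Number formatting (toFixed(3)) is an assumption.

function decaySummary(
  w0: readonly number[],
  lambda: number,
  eta: number,
  step: number,
  maxStep: number,
): string {
  const shrink = Math.max(0, 1 - lambda * eta); // clamped shrink factor
  const w = w0.map((wi) => wi * Math.pow(shrink, step)); // closed-form weights
  const penalty = w.reduce((sum, wi) => sum + wi * wi, 0); // Σw²
  const peak = w.reduce((m, wi) => Math.max(m, Math.abs(wi)), 0); // peak |w|
  return [
    `Step ${step} of ${maxStep}.`,
    `Lambda ${lambda}, learning rate ${eta}.`,
    `Shrink factor ${shrink.toFixed(3)}.`,
    `Penalty Σw² ${penalty.toFixed(3)}.`,
    `Peak |w| ${peak.toFixed(3)}.`,
  ].join(" ");
}
```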
Credits
- Extracted from: `craftingattention` (`app/src/lessons/primitives/viz/L2WeightDecayViz.tsx`). The source was a `Widget`-chrome lesson primitive with `useWidgetHistory` (undo / redo, bookmarks), a four-bookmark preset row (no-reg / light / strong / 20-steps), a `ModeStrip` toggle between explore and a five-round binary `usePredictRounds` quiz, a heuristic narration block, a `setInterval` auto-runner with stop-when-converged logic, and a custom inline `BAR_SPRING` transition. The library version drops the widget chrome, the history / bookmarks, the predict mode, the narration heuristics, and the inline spring. It exposes the underlying primitive every regularisation lesson needs: a signed histogram of `w_0` with a closed-form `(1 - λ·η)^t` shrink, controlled / uncontrolled `lambda` and `currentStep` (Radix pattern), a consumer-owned `playing` plus `playSpeed` transport, and an `SPRINGS.smooth` bar transition with an honest `prefers-reduced-motion` snap. It sits in ML Viz → Regularization alongside `OverfittingGapViz`, `RunningStatsViz`, and `VarianceCompoundViz`.