Weight Decay Toggle

A pocket-sized AdamW demonstrator. Four parameter bars sit on a baseline. The learner taps Step to advance one training step, then flips Weight decay on or off to see the difference. Under decay, every weight is multiplied by (1 − lr·λ) after the gradient step — a shrinkage that pulls each bar a tiny fraction closer to zero, regardless of which direction the loss gradient is pointing. The narration walks through four phases (observe, growing, toggle, insight) as the learner accumulates steps.

The teaching point sits in the insight phase: weight decay is a regulariser, not an optimiser. It ignores the loss and simply keeps the weights small. AdamW applies the shrinkage directly to the weights, separately from the Adam step, so it bypasses the adaptive per-parameter scaling that distorts L2 regularisation in plain Adam.
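To make the contrast with plain Adam concrete, here is a minimal single-parameter sketch. It is not the component's source: the function names, the Hyper fields, and the choice to shrink after the Adam step (following the description above; some implementations apply the shrink first) are assumptions made for illustration.

```typescript
// Single-parameter sketch contrasting L2-in-the-gradient with decoupled decay.
// All names and hyperparameter fields here are illustrative assumptions.
interface AdamState {
  m: number; // first-moment estimate
  v: number; // second-moment estimate
  t: number; // step counter
}

interface Hyper {
  lr: number;
  lambda: number;
  beta1: number;
  beta2: number;
  eps: number;
}

// Bias-corrected Adam direction for a gradient; mutates the moment estimates.
function adamDirection(state: AdamState, grad: number, h: Hyper): number {
  state.t += 1;
  state.m = h.beta1 * state.m + (1 - h.beta1) * grad;
  state.v = h.beta2 * state.v + (1 - h.beta2) * grad * grad;
  const mHat = state.m / (1 - Math.pow(h.beta1, state.t));
  const vHat = state.v / (1 - Math.pow(h.beta2, state.t));
  return mHat / (Math.sqrt(vHat) + h.eps);
}

// Adam + L2: the penalty lambda * w is folded into the gradient, so it is
// rescaled by the adaptive denominator together with the loss gradient.
function adamL2Step(w: number, grad: number, state: AdamState, h: Hyper): number {
  return w - h.lr * adamDirection(state, grad + h.lambda * w, h);
}

// Decoupled decay: Adam step on the loss gradient only, then the
// multiplicative shrink (1 - lr * lambda) applied directly to the weight.
function adamWStep(w: number, grad: number, state: AdamState, h: Hyper): number {
  const stepped = w - h.lr * adamDirection(state, grad, h);
  return stepped * (1 - h.lr * h.lambda);
}
```

With a zero loss gradient, lr = 0.001 and λ = 0.1, the first adamL2Step moves a weight of 8 and a weight of 2 by almost the same amount (about lr), because the adaptive denominator normalises the penalty away. adamWStep shrinks each weight in proportion to its size, which is the behaviour the bars are meant to show.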

Demo preview: weight decay toggle for AdamW. Bar labels: 2.00, 5.00, 8.00, 3.00.
Step 0. Weights: w₁=2.00, w₂=5.00, w₃=8.00, w₄=3.00. Loss: 4.20. Weight decay off.

Four parameters training. Without weight decay, they grow as large as the loss landscape demands.

Installation

npx shadcn@latest add https://craftbits.dev/r/weight-decay-toggle.json

Usage

import { WeightDecayToggle } from "@craft-bits/viz/weight-decay-toggle";
 
<WeightDecayToggle />

Override the starting weights so a specific bar dominates:

<WeightDecayToggle defaultWeights={[1, 3, 9, 2]} />

Crank λ to make the decay visible after a single step:

<WeightDecayToggle lambda={5} />
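As a rough guide to how large λ needs to be, the per-step shrink can be worked out directly from the documented formula (1 − lr·λ). The helper names below are hypothetical, and the halving estimate ignores the gradient step entirely.

```typescript
// Multiplicative shrink applied on each decayed step: (1 - lr * lambda).
function shrinkFactor(lr: number, lambda: number): number {
  return 1 - lr * lambda;
}

// Decayed steps needed for the shrink alone (gradients ignored) to halve a weight.
function stepsToHalve(lr: number, lambda: number): number {
  return Math.ceil(Math.log(0.5) / Math.log(shrinkFactor(lr, lambda)));
}

const lr = 0.001; // documented default for learningRate

console.log(shrinkFactor(lr, 0.1)); // default lambda: shrink of about 0.01% per step
console.log(shrinkFactor(lr, 5)); // lambda={5}: shrink of about 0.5% per step
console.log(8 * shrinkFactor(lr, 5)); // the 8.00 bar after one decayed step, gradient ignored
console.log(stepsToHalve(lr, 5)); // decayed steps for the shrink alone to halve a bar
```

With the default learning rate, lambda={5} takes the 8.00 bar to roughly 7.96 in one step, and the shrink alone needs on the order of 139 decayed steps to halve a weight.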

Subscribe to step events for an external trace:

<WeightDecayToggle
  onStep={(s) => {
    /* read s.weights, s.loss, s.step, s.decayOn */
  }}
/>
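One way to build that external trace is a small collector whose handler is passed as onStep. This is a sketch under assumptions: the field names come from the comment in the snippet above, but the StepSnapshot types and the createStepTrace helper are hypothetical and not part of the package.

```typescript
// Assumed payload shape. Field names follow the snippet above; types are a guess.
interface StepSnapshot {
  step: number;
  weights: readonly number[];
  loss: number;
  decayOn: boolean;
}

// Hypothetical helper: returns an onStep handler plus the rows it has collected.
function createStepTrace() {
  const rows: StepSnapshot[] = [];

  const onStep = (s: StepSnapshot): void => {
    // Copy the weights so later steps cannot mutate earlier rows.
    rows.push({ ...s, weights: [...s.weights] });
  };

  // One CSV line per step: step, decay flag, loss, then each weight.
  const toCsv = (): string =>
    rows
      .map((r) =>
        [
          r.step,
          r.decayOn ? "on" : "off",
          r.loss.toFixed(2),
          ...r.weights.map((w) => w.toFixed(4)),
        ].join(","),
      )
      .join("\n");

  return { onStep, rows, toCsv };
}
```

Create the trace once outside the render path (for example in a ref or module scope), pass trace.onStep to the component, and call trace.toCsv() when the learner finishes.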

Understanding the component

  1. Four bars. A 480 × 320 SVG plots |w| against bar index. Each bar's height is (|w| / 10) × plot_height, so bars rise straight from the baseline.
  2. Step. The action button runs one gradient-descent step on every parameter — w ← w − lr × ∂L/∂w — using constant simulated gradients so the magnitude of motion is identical on every step.
  3. Decay overlay. When weight decay is on, the step also multiplies each weight by (1 − lr·λ). The reduction is small per step but accumulates — the dashed ghost bar inside each rect shows the post-decay target, and a dashed arrow under the baseline marks the direction of the pull.
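  4. Phase machine. The narration starts in observe while the component is idle, moves to growing after growingAfterSteps un-decayed steps (default 5), switches to toggle the first time the learner activates decay, and reaches insight after insightAfterDecayedSteps decayed steps (default 10), enough for the conceptual punchline to land.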
  5. Arithmetic annotation. Whenever decay is on, the largest-magnitude weight shows its own decay arithmetic so the maths stays visible instead of hiding behind the animation.
  6. Imperative animation. Bars and the loss readout animate via motion's animate() driving raw SVG attributes — no re-render per frame.
  7. Reduced motion. Under prefers-reduced-motion: reduce, every animation snaps to its end state.
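The first four items can be sketched as three pure functions. This is an illustration of the rules as stated in the list, not the component's source: the function names, the counter arguments, and the precedence between phases (decay-related phases win once decay has been activated) are assumptions.

```typescript
type Phase = "observe" | "growing" | "toggle" | "insight";

// Item 1: bar height is (|w| / 10) * plotHeight.
function barHeight(w: number, plotHeight: number): number {
  return (Math.abs(w) / 10) * plotHeight;
}

// Items 2 and 3: gradient step on every weight, then the optional shrink.
function stepWeights(
  weights: readonly number[],
  gradients: readonly number[],
  lr: number,
  lambda: number,
  decayOn: boolean,
): number[] {
  const shrink = decayOn ? 1 - lr * lambda : 1;
  // A missing gradient is treated as 0 (an assumption for mismatched lengths).
  return weights.map((w, i) => (w - lr * (gradients[i] ?? 0)) * shrink);
}

// Item 4: narration phase from step counters. Defaults mirror the documented
// growingAfterSteps and insightAfterDecayedSteps props.
function phaseFor(
  plainSteps: number,
  decayedSteps: number,
  decayEverOn: boolean,
  growingAfterSteps = 5,
  insightAfterDecayedSteps = 10,
): Phase {
  if (decayEverOn) {
    return decayedSteps >= insightAfterDecayedSteps ? "insight" : "toggle";
  }
  return plainSteps >= growingAfterSteps ? "growing" : "observe";
}
```

Keeping these as pure functions separates the arithmetic from the imperative animation in item 6, so the numbers can be checked without rendering anything.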

Props

Prop | Type | Default | Description
defaultWeights | readonly number[] | [2, 5, 8, 3] | Starting magnitudes. Reset returns here.
paramNames | readonly string[] | ["w₁", "w₂", "w₃", "w₄"] | Display labels under each bar.
learningRate | number | 0.001 | The lr term in the update.
lambda | number | 0.1 | Weight-decay coefficient λ. Per-step shrinkage is (1 − lr·λ).
gradients | readonly number[] | [0.3, -0.2, 0.15, -0.35] | Simulated ∂L/∂wᵢ. Kept constant for teaching clarity.
insightAfterDecayedSteps | number | 10 | Steps after which the narration flips to insight.
growingAfterSteps | number | 5 | Un-decayed steps after which the narration flips to growing.
transition | Transition | SPRINGS.snap | Override the per-step bar transition.
onStep | (step) => void | - | Fires after each step.
onDecayToggle | (on) => void | - | Fires when the user toggles weight decay.
onReset | () => void | - | Fires when the user clicks Reset.
className | string | - | Merged onto the root via cn().

Accessibility

  • The plot SVG is role="img" with an aria-label summarising the parameter count, step, decay state, and loss.
  • The decay toggle reports state via aria-pressed, so screen readers announce the change immediately.
  • A live region (aria-live="polite") below the buttons announces the step number, each parameter's magnitude, the loss, and whether decay is on.
  • The narration paragraph is also aria-live="polite" and reads as plain prose; it is the canonical explanation for each phase.
  • Colour is never the only signal — the decay state shows as text inside its button, in the badge, in the narration, and in the live region.
  • Motion respects prefers-reduced-motion: reduce — bars, labels, the loss readout, and reset all snap to their end states.
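The live-region announcement can be produced by a small formatter. The sketch below follows the wording of the sample announcement near the top of this page ("Step 0. Weights: ..."); the function name and signature are hypothetical, and the real component may build the string differently.

```typescript
// Hypothetical formatter for the polite live region: step number, each
// parameter's magnitude, the loss, and the decay state, all as plain text.
function formatAnnouncement(
  step: number,
  names: readonly string[],
  weights: readonly number[],
  loss: number,
  decayOn: boolean,
): string {
  const parts = weights.map((w, i) => {
    // Fall back to a generic label if fewer names than weights are supplied.
    const name = names[i] ?? `w${i + 1}`;
    return `${name}=${Math.abs(w).toFixed(2)}`;
  });
  const decay = decayOn ? "on" : "off";
  return `Step ${step}. Weights: ${parts.join(", ")}. Loss: ${loss.toFixed(2)}. Weight decay ${decay}.`;
}
```

Spelling the decay state out as a word in the string is what lets the announcement carry the state without relying on colour.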

Credits

  • Extracted from: craftingattention (app/src/lessons/primitives/math/WeightDecayToggle.tsx). The source pulled SvgLabel and ChallengeBtn from the lesson chrome, ran on the per-track lesson palette tokens, and inlined ad-hoc spring names into the imperative animations. The viz extract drops the lesson chrome, remaps every colour to var(--cb-*) semantic tokens so consumer themes repaint freely, re-keys the bar transition to the canonical SPRINGS.snap, and exposes the previously hard-coded weights, gradients, learning rate, and λ as props.