Gradient Decay Explorer

A bar chart of gradient magnitude per layer for a deep feed-forward network. Each bar is (initScale · activationFactor)^l — the textbook geometric-series approximation of |∂L/∂h_l| during backprop. Slide between the canonical initialiser scales and watch the bars collapse into the vanishing regime or blow past the top into the exploding one.

Gradient decay: ReLU (E[σ'] ≈ 0.5), initialisation scale 1.00. After 16 layers, gradient magnitude reaches the input as 3.1e-5 — vanishing.
Gradient magnitude per layerinput |∂L| 3.1e-5
1.00
vanishing · per-layer ×0.50
Customize
Network
16
relu
Initialisation
1.00
Display

Installation

npx shadcn@latest add https://craftbits.dev/r/gradient-decay-explorer.json

Usage

import { GradientDecayExplorer } from "@craft-bits/core";
 
<GradientDecayExplorer />

Pick a different activation and depth:

<GradientDecayExplorer numLayers={32} activation="sigmoid" />

Drive initScale from outside (controlled mode):

const [scale, setScale] = useState(1.414);
 
<GradientDecayExplorer
  initScale={scale}
  onInitScaleChange={setScale}
  activation="relu"
/>

Switch to a linear y-axis to dramatise the collapse:

<GradientDecayExplorer showLogScale={false} />

Understanding the component

  1. The formula on the bars. Each bar shows (initScale · activationFactor)^l, where activationFactor is 1, 0.5, or 0.25 for identity / ReLU / sigmoid. Layer 0 is the loss-side (magnitude 1); layer numLayers − 1 is the input layer. The chart draws layer 0 on the right so "the gradient has to travel further to the left" matches the visual.
  2. Log scale is the default for a reason. At depth 16 with s = 0.5 and ReLU, the input-layer magnitude is ~1.5e-5 — invisible on a linear scale. showLogScale={true} (the default) draws 6 decades of headroom so even the worst vanishing cases stay legible.
  3. Regime colors are derived, not state. When the per-layer factor exceeds 1.05 and bars cross y = 1, they paint --cb-error (exploding). When the final magnitude drops below 0.01, the small bars paint --cb-warning (vanishing). Otherwise everything is --cb-accent (stable). The regime label under the slider gives the textual diagnosis.
  4. SPRINGS.smooth on the bar heights. Dragging the slider springs each bar's height and y independently — the animation is a wave that ripples up the chart. prefers-reduced-motion: reduce collapses it to instant.
  5. LabeledSlider for initScale. The slider lives below the chart; its value is wired to the same controlled / uncontrolled pair as the prop, so both the customize-panel above and a parent state-driver work without conflict.
  6. Activation derivative as the second multiplier. Pair activation="relu" with initScale={Math.SQRT2} (the He scale) and the per-layer factor lands at exactly 1.0, giving the flat "stable" bar profile every initialiser paper is trying to reproduce.

Props

PropTypeDefaultDescription
numLayersnumber16Network depth — one bar per layer. Clamped to [2, 64].
initScalenumberControlled per-layer weight scale. Pair with onInitScaleChange.
defaultInitScalenumber1.0Uncontrolled initial weight scale.
onInitScaleChange(initScale) => voidFires when the user drags the slider.
activation'identity' | 'sigmoid' | 'relu''relu'Activation whose derivative scales the per-layer factor.
showLogScalebooleantruePlot the y-axis on a log10 scale.
transitionTransitionSPRINGS.smoothSpring for bar-height transitions.
classNamestringMerged onto the root <div> via cn().

Accessibility

  • The figure is role="figure" with an aria-labelledby that names the chart "Gradient magnitude per layer."
  • An aria-live="polite" summary announces the activation, init scale, final magnitude, and the diagnosed regime (stable / vanishing / exploding) whenever any input changes.
  • The slider is a LabeledSlider, itself a native <input type="range"> — keyboard arrows nudge the scale, Home / End jump to extremes, and screen readers narrate the value.
  • Color is never the only signal — the regime label and the per-layer factor are both textual.
  • prefers-reduced-motion: reduce snaps every bar transition; no per-step easing is shown.

Credits

  • Extracted from: craftingattention (app/src/lessons/primitives/nn/GradientDecayExplorer.tsx). The source was an RNN-time-step gradient curve (σ^t) with a phase-driven narration system, an LSTM comparison overlay, and lesson-specific challenge buttons. Reframed as a depth-wise bar chart of (initScale · activationFactor)^l to teach the same vanishing / exploding intuition for any deep feed-forward network. Stripped the phase / narration system, the LSTM split-view, the SVG-glow annotation pulses, the breathing-pulse observe-state circle, and the imperative path-animation useEffects. Replaced the dual sliders with a single LabeledSlider from @craft-bits/core/buttons/labeled-slider and added an activation prop covering the three textbook cases.