Gradient Decay Explorer
A bar chart of gradient magnitude per layer for a deep feed-forward network. Each bar is (initScale · activationFactor)^l — the textbook geometric-series approximation of |∂L/∂h_l| during backprop. Slide between the canonical initialiser scales and watch the bars collapse into the vanishing regime or blow past the top into the exploding one.
Gradient decay: ReLU (E[σ'] ≈ 0.5), initialisation scale 1.00. After 16 layers, gradient magnitude reaches the input as 3.1e-5 — vanishing.vanishing · per-layer ×0.50
Gradient magnitude per layerinput |∂L| 3.1e-5
1.00
Customize
Network
16
relu
Initialisation
1.00
Display
Installation
npx shadcn@latest add https://craftbits.dev/r/gradient-decay-explorer.jsonUsage
import { GradientDecayExplorer } from "@craft-bits/core";
<GradientDecayExplorer />Pick a different activation and depth:
<GradientDecayExplorer numLayers={32} activation="sigmoid" />Drive initScale from outside (controlled mode):
const [scale, setScale] = useState(1.414);
<GradientDecayExplorer
initScale={scale}
onInitScaleChange={setScale}
activation="relu"
/>Switch to a linear y-axis to dramatise the collapse:
<GradientDecayExplorer showLogScale={false} />Understanding the component
- The formula on the bars. Each bar shows
(initScale · activationFactor)^l, whereactivationFactoris1,0.5, or0.25for identity / ReLU / sigmoid. Layer0is the loss-side (magnitude1); layernumLayers − 1is the input layer. The chart draws layer0on the right so "the gradient has to travel further to the left" matches the visual. - Log scale is the default for a reason. At depth 16 with
s = 0.5and ReLU, the input-layer magnitude is~1.5e-5— invisible on a linear scale.showLogScale={true}(the default) draws 6 decades of headroom so even the worst vanishing cases stay legible. - Regime colors are derived, not state. When the per-layer factor exceeds
1.05and bars crossy = 1, they paint--cb-error(exploding). When the final magnitude drops below0.01, the small bars paint--cb-warning(vanishing). Otherwise everything is--cb-accent(stable). The regime label under the slider gives the textual diagnosis. SPRINGS.smoothon the bar heights. Dragging the slider springs each bar'sheightandyindependently — the animation is a wave that ripples up the chart.prefers-reduced-motion: reducecollapses it to instant.LabeledSliderforinitScale. The slider lives below the chart; itsvalueis wired to the same controlled / uncontrolled pair as the prop, so both the customize-panel above and a parent state-driver work without conflict.- Activation derivative as the second multiplier. Pair
activation="relu"withinitScale={Math.SQRT2}(the He scale) and the per-layer factor lands at exactly1.0, giving the flat "stable" bar profile every initialiser paper is trying to reproduce.
Props
| Prop | Type | Default | Description |
|---|---|---|---|
numLayers | number | 16 | Network depth — one bar per layer. Clamped to [2, 64]. |
initScale | number | — | Controlled per-layer weight scale. Pair with onInitScaleChange. |
defaultInitScale | number | 1.0 | Uncontrolled initial weight scale. |
onInitScaleChange | (initScale) => void | — | Fires when the user drags the slider. |
activation | 'identity' | 'sigmoid' | 'relu' | 'relu' | Activation whose derivative scales the per-layer factor. |
showLogScale | boolean | true | Plot the y-axis on a log10 scale. |
transition | Transition | SPRINGS.smooth | Spring for bar-height transitions. |
className | string | — | Merged onto the root <div> via cn(). |
Accessibility
- The figure is
role="figure"with anaria-labelledbythat names the chart "Gradient magnitude per layer." - An
aria-live="polite"summary announces the activation, init scale, final magnitude, and the diagnosed regime (stable/vanishing/exploding) whenever any input changes. - The slider is a
LabeledSlider, itself a native<input type="range">— keyboard arrows nudge the scale,Home/Endjump to extremes, and screen readers narrate the value. - Color is never the only signal — the regime label and the per-layer factor are both textual.
prefers-reduced-motion: reducesnaps every bar transition; no per-step easing is shown.
Credits
- Extracted from:
craftingattention(app/src/lessons/primitives/nn/GradientDecayExplorer.tsx). The source was an RNN-time-step gradient curve (σ^t) with a phase-driven narration system, an LSTM comparison overlay, and lesson-specific challenge buttons. Reframed as a depth-wise bar chart of(initScale · activationFactor)^lto teach the same vanishing / exploding intuition for any deep feed-forward network. Stripped the phase / narration system, the LSTM split-view, the SVG-glow annotation pulses, the breathing-pulse observe-state circle, and the imperative path-animationuseEffects. Replaced the dual sliders with a singleLabeledSliderfrom@craft-bits/core/buttons/labeled-sliderand added anactivationprop covering the three textbook cases.