Gradient Decay Explorer

A bar chart of gradient magnitude per layer for a deep feed-forward network. Each bar is (initScale · activationFactor)^l — the textbook geometric-series approximation of |∂L/∂h_l| during backprop. Slide between the canonical initialiser scales and watch the bars collapse into the vanishing regime or blow past the top into the exploding one.

Gradient magnitude per layerinput |∂L| 3.1e-5

init scale (ReLU (E[σ'] ≈ 0.5))1.00

vanishing · per-layer ×0.50

Customize

Network

num layers16

activationrelu

Initialisation

init scale1.00

Display

log y-axis

Installation

npx shadcn@latest add https://craftbits.dev/r/gradient-decay-explorer.json

Usage

import { GradientDecayExplorer } from "@craft-bits/core";
 
<GradientDecayExplorer />

Pick a different activation and depth:

<GradientDecayExplorer numLayers={32} activation="sigmoid" />

Drive initScale from outside (controlled mode):

const [scale, setScale] = useState(1.414);
 
<GradientDecayExplorer
  initScale={scale}
  onInitScaleChange={setScale}
  activation="relu"
/>

Switch to a linear y-axis to dramatise the collapse:

<GradientDecayExplorer showLogScale={false} />

Understanding the component

The formula on the bars. Each bar shows (initScale · activationFactor)^l, where activationFactor is 1, 0.5, or 0.25 for identity / ReLU / sigmoid. Layer 0 is the loss-side (magnitude 1); layer numLayers − 1 is the input layer. The chart draws layer 0 on the right so "the gradient has to travel further to the left" matches the visual.
Log scale is the default for a reason. At depth 16 with s = 0.5 and ReLU, the input-layer magnitude is ~1.5e-5 — invisible on a linear scale. showLogScale={true} (the default) draws 6 decades of headroom so even the worst vanishing cases stay legible.
Regime colors are derived, not state. When the per-layer factor exceeds 1.05 and bars cross y = 1, they paint --cb-error (exploding). When the final magnitude drops below 0.01, the small bars paint --cb-warning (vanishing). Otherwise everything is --cb-accent (stable). The regime label under the slider gives the textual diagnosis.
SPRINGS.smooth on the bar heights. Dragging the slider springs each bar's height and y independently — the animation is a wave that ripples up the chart. prefers-reduced-motion: reduce collapses it to instant.
LabeledSlider for initScale. The slider lives below the chart; its value is wired to the same controlled / uncontrolled pair as the prop, so both the customize-panel above and a parent state-driver work without conflict.
Activation derivative as the second multiplier. Pair activation="relu" with initScale={Math.SQRT2} (the He scale) and the per-layer factor lands at exactly 1.0, giving the flat "stable" bar profile every initialiser paper is trying to reproduce.

Props

Prop	Type	Default	Description
`numLayers`	`number`	`16`	Network depth — one bar per layer. Clamped to `[2, 64]`.
`initScale`	`number`	—	Controlled per-layer weight scale. Pair with `onInitScaleChange`.
`defaultInitScale`	`number`	`1.0`	Uncontrolled initial weight scale.
`onInitScaleChange`	`(initScale) => void`	—	Fires when the user drags the slider.
`activation`	`'identity' \| 'sigmoid' \| 'relu'`	`'relu'`	Activation whose derivative scales the per-layer factor.
`showLogScale`	`boolean`	`true`	Plot the y-axis on a log10 scale.
`transition`	`Transition`	`SPRINGS.smooth`	Spring for bar-height transitions.
`className`	`string`	—	Merged onto the root `<div>` via `cn()`.

Accessibility

The figure is role="figure" with an aria-labelledby that names the chart "Gradient magnitude per layer."
An aria-live="polite" summary announces the activation, init scale, final magnitude, and the diagnosed regime (stable / vanishing / exploding) whenever any input changes.
The slider is a LabeledSlider, itself a native <input type="range"> — keyboard arrows nudge the scale, Home / End jump to extremes, and screen readers narrate the value.
Color is never the only signal — the regime label and the per-layer factor are both textual.
prefers-reduced-motion: reduce snaps every bar transition; no per-step easing is shown.

Credits

Extracted from: craftingattention (app/src/lessons/primitives/nn/GradientDecayExplorer.tsx). The source was an RNN-time-step gradient curve (σ^t) with a phase-driven narration system, an LSTM comparison overlay, and lesson-specific challenge buttons. Reframed as a depth-wise bar chart of (initScale · activationFactor)^l to teach the same vanishing / exploding intuition for any deep feed-forward network. Stripped the phase / narration system, the LSTM split-view, the SVG-glow annotation pulses, the breathing-pulse observe-state circle, and the imperative path-animation useEffects. Replaced the dual sliders with a single LabeledSlider from @craft-bits/core/buttons/labeled-slider and added an activation prop covering the three textbook cases.