Gradient Splitter

The single picture every backprop lesson needs at a residual junction: an upstream gradient dout arrives at a fork; two arrows leave to the two consumer branches, each carrying dout × weight; the fork's incoming gradient is the sum dA + dB. Defaults model the canonical residual case (weight = 1 on both branches, labels F(x) and x (skip)) — pass non-unit weights to teach mul-style forks, relu-then-fork, or any custom multi-branch split.

[Interactive demo: gradient splitter showing an upstream gradient flowing back through a fork into two branches. With upstream gradient 1.00 and both branch weights at 1.00, branch F(x) receives 1.00 × 1.00 = 1.00, branch x (skip) receives 1.00 × 1.00 = 1.00, and the fork's incoming chain-rule sum dA + dB is 2.00. Controls: preset (F(x) / x (skip)), upstream gradient, branch weights.]

Installation

npx shadcn@latest add https://craftbits.dev/r/gradient-splitter.json

Usage

import { GradientSplitter } from "@craft-bits/core";
 
<GradientSplitter />

Pass an explicit upstream gradient and weights:

<GradientSplitter
  upstreamGradient={0.4}
  branchA={{ weight: 1, label: "F(x)" }}
  branchB={{ weight: 1, label: "x (skip)" }}
/>

Teach a multiplication fork — the weight on one branch is the other operand:

<GradientSplitter
  upstreamGradient={1}
  branchA={{ weight: 3, label: "a (× b)" }}
  branchB={{ weight: 2, label: "b (× a)" }}
/>
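
The arithmetic behind these examples can be sketched in plain TypeScript. The helper below is hypothetical and not part of the library (`splitGradient` and its result types are names invented for illustration); it only mirrors the numbers the diagram displays: each branch gets `upstreamGradient × weight`, and the fork box shows the sum of the two.

```typescript
// Hypothetical helper, not a library export: reproduces the numbers shown
// in the diagram for a given upstream gradient and two branch weights.
interface BranchGradient {
  label: string;
  weight: number;
  gradient: number; // upstream × weight
}

interface SplitResult {
  branchA: BranchGradient;
  branchB: BranchGradient;
  sum: number; // dA + dB, the value shown in the fork box
}

function splitGradient(
  upstream: number,
  a: { weight: number; label?: string },
  b: { weight: number; label?: string },
): SplitResult {
  // Fallback labels follow the documented defaults.
  const branchA: BranchGradient = {
    label: a.label ?? "F(x)",
    weight: a.weight,
    gradient: upstream * a.weight,
  };
  const branchB: BranchGradient = {
    label: b.label ?? "x (skip)",
    weight: b.weight,
    gradient: upstream * b.weight,
  };
  return { branchA, branchB, sum: branchA.gradient + branchB.gradient };
}

// The multiplication fork from the example above (a = 2, b = 3, dout = 1).
const mul = splitGradient(
  1,
  { weight: 3, label: "a (× b)" },
  { weight: 2, label: "b (× a)" },
);
console.log(mul.branchA.gradient, mul.branchB.gradient, mul.sum); // prints: 3 2 5
```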

Understanding the component

  1. Right → centre → left. The diagram reads strictly right (the loss side, where dout comes from) → centre (the fork value x) → left (the two consumer branches). Gradient direction is right to left, the same way a backward pass actually runs.
  2. One gradient in, two out. A single warm arrow arrives at the fork carrying the upstream gradient dout = ∂L/∂y. Two cool arrows leave it, one per branch. Each branch arrow is labelled dout × weight = product so the chain rule is visible in the line itself, not buried in a caption.
  3. The fork sums. The fork box shows dA + dB, the additive merge the chain rule prescribes when a value x is consumed by two children. This is the exact identity that makes residual connections so kind to gradients: the skip path always contributes 1 · dout, so the gradient does not vanish at that junction merely because the F(x) branch's contribution shrinks toward zero.
  4. Defaults model a residual. Out of the box, both weights are 1 and the labels are F(x) and x (skip): the canonical residual connection of a Pre-/Post-norm Transformer block. Override the weights and labels for any other fork (mul, add, a LayerNorm-then-fork, etc.).
  5. Numbers reinforce the formula. A small two-cell summary under the diagram restates upstream × weight = product per branch so the user can verify the maths without reading the SVG.
  6. Reduced motion is honest. With prefers-reduced-motion: reduce, every arrow draw-in is replaced with an instant render; the figure is still fully expressive — it just doesn't animate.
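
The identity in points 2 and 3 can be written out explicitly. The symbols w_A and w_B below are shorthand introduced here for the two branch weights; they are not names used by the component.

```latex
\begin{aligned}
dA &= d_{\text{out}} \cdot w_A, \qquad dB = d_{\text{out}} \cdot w_B, \\
\frac{\partial L}{\partial x} &= dA + dB = d_{\text{out}}\,(w_A + w_B), \\
w_A = w_B = 1 \;&\Rightarrow\; \frac{\partial L}{\partial x} = 2\, d_{\text{out}}
\quad (\text{the } 2.00 \text{ shown for } d_{\text{out}} = 1.00).
\end{aligned}
```

Under one scalar reading of a residual block y = F(x) + x, which is an assumption rather than something the component enforces, setting w_A = ∂F/∂x and w_B = 1 gives ∂L/∂x = dout · (∂F/∂x + 1), which makes the skip path's constant contribution explicit.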

Props

| Prop | Type | Default | Description |
| --- | --- | --- | --- |
| upstreamGradient | number | 1 | The upstream gradient dout (∂L/∂y, arriving from the loss side) at the fork. |
| branchA | { weight: number; label?: string } | { weight: 1, label: "F(x)" } | Top-branch configuration. Gradient on this branch is upstreamGradient × weight. |
| branchB | { weight: number; label?: string } | { weight: 1, label: "x (skip)" } | Bottom-branch configuration. Gradient on this branch is upstreamGradient × weight. |
| transition | Transition | SPRINGS.smooth | Spring for path draw-ins. |
| className | string | - | Merged onto the root via cn(). |
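
As a rough illustration of how the numeric props and their documented defaults fit together, here is a minimal sketch. The names `SplitterNumericProps`, `SPLITTER_DEFAULTS`, and `resolveSplitterProps` are hypothetical and not library exports; `transition` and `className` are left out because their types are library-specific. Whether the real component merges a partial branch object with the default label is not stated in this page, so the sketch simply replaces the whole branch object.

```typescript
// Hypothetical types restating the props table; not the library's own definitions.
interface BranchConfig {
  weight: number;
  label?: string;
}

interface SplitterNumericProps {
  upstreamGradient?: number;
  branchA?: BranchConfig;
  branchB?: BranchConfig;
}

// Defaults copied from the props table.
const SPLITTER_DEFAULTS: Required<SplitterNumericProps> = {
  upstreamGradient: 1,
  branchA: { weight: 1, label: "F(x)" },
  branchB: { weight: 1, label: "x (skip)" },
};

// Assumption: a provided branch object replaces the default wholesale.
function resolveSplitterProps(
  props: SplitterNumericProps = {},
): Required<SplitterNumericProps> {
  return {
    upstreamGradient: props.upstreamGradient ?? SPLITTER_DEFAULTS.upstreamGradient,
    branchA: props.branchA ?? SPLITTER_DEFAULTS.branchA,
    branchB: props.branchB ?? SPLITTER_DEFAULTS.branchB,
  };
}

const resolved = resolveSplitterProps({ upstreamGradient: 0.4 });
console.log(resolved.branchA.label, resolved.branchB.label); // prints: F(x) x (skip)
```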

Accessibility

  • The figure has role="figure" with an aria-labelledby heading and an aria-live="polite" summary, so screen readers announce the upstream value, each branch's contribution, and the sum at the fork.
  • Colour is never the only signal: warm vs cool stroke is reinforced by textual labels on each arrow and by a separate per-branch summary list below the diagram.
  • All numeric output uses tabular-nums so dynamic values don't visually jitter when they change width.
  • prefers-reduced-motion: reduce replaces the path-draw animation with an instant render. No autoplay, no orphaned requestAnimationFrame loops.
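
The spoken summary described in the first bullet could be assembled along these lines. This is a sketch only: `describeSplit` is a hypothetical function, and the sentence template is modelled on the summary text of the demo on this page rather than taken from the component's source. Values are formatted to two decimals with `toFixed(2)`.

```typescript
// Hypothetical sketch of a screen-reader summary builder; the wording
// follows the demo's summary text, not the component's actual code.
function describeSplit(
  upstream: number,
  a: { weight: number; label: string },
  b: { weight: number; label: string },
): string {
  const f = (n: number): string => n.toFixed(2);
  const dA = upstream * a.weight;
  const dB = upstream * b.weight;
  return [
    `Upstream gradient ${f(upstream)} splits at the fork.`,
    `Branch ${a.label} receives ${f(upstream)} times ${f(a.weight)} equals ${f(dA)}.`,
    `Branch ${b.label} receives ${f(upstream)} times ${f(b.weight)} equals ${f(dB)}.`,
    `The fork's incoming chain-rule sum is ${f(dA + dB)}.`,
  ].join(" ");
}

console.log(
  describeSplit(1, { weight: 1, label: "F(x)" }, { weight: 1, label: "x (skip)" }),
);
```

Spelling out "times" and "equals" instead of symbols keeps the announcement readable when a screen reader does not voice × or =.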

Credits

  • Extracted from: craftingattention (app/src/lessons/primitives/viz/GradientSplitter.tsx). The source was a hard-coded op-dispatcher mapping op ∈ { add, mul, pow, exp, relu, log, matmul, sum } to a self/other-gradient pair via a lookup table — a thin chain-rule reference card. The library version drops the op enum and ships the primitive every fork lesson actually needs: two configurable branches, each with its own weight and label, that visualise the chain-rule split for an arbitrary upstream gradient. The original op variants (add, mul, etc.) become one-liner wrappers any consumer can build on top.
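
The one-liner wrappers mentioned above might look like the following. This is a sketch under assumptions: `addFork` and `mulFork` are hypothetical names, not exports of the library or of the original source, and only the add and mul cases are shown. Each returns an object shaped like the `branchA` / `branchB` props.

```typescript
// Hypothetical wrappers, not library exports: each maps an op to the two
// branch weights given by that op's local partial derivatives.
interface BranchConfig {
  weight: number;
  label?: string;
}

interface ForkBranches {
  branchA: BranchConfig;
  branchB: BranchConfig;
}

// add: y = a + b, so dy/da = dy/db = 1.
function addFork(): ForkBranches {
  return {
    branchA: { weight: 1, label: "a" },
    branchB: { weight: 1, label: "b" },
  };
}

// mul: y = a * b, so dy/da = b and dy/db = a.
function mulFork(a: number, b: number): ForkBranches {
  return {
    branchA: { weight: b, label: "a (× b)" },
    branchB: { weight: a, label: "b (× a)" },
  };
}

// In a React file the result can be spread onto the component, e.g.
//   <GradientSplitter upstreamGradient={1} {...mulFork(2, 3)} />
const fork = mulFork(2, 3);
console.log(fork.branchA.weight, fork.branchB.weight); // prints: 3 2
```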