Gradient Splitter
The single picture every backprop lesson needs at a residual junction: an upstream gradient dout arrives at a fork; two arrows leave it for the two consumer branches, each carrying dout × weight; the fork's incoming gradient is the sum dA + dB. The defaults model the canonical residual case (weight = 1 on both branches, labels F(x) and x (skip)). Pass non-unit weights to teach mul-style forks, relu-then-fork, or any other two-branch split.
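The arithmetic the figure draws fits in a few lines. The sketch below is a hypothetical helper, not part of `@craft-bits/core`; the names `splitGradient`, `dA`, `dB` and `sum` are illustrative assumptions. It multiplies the upstream gradient by each branch weight and accumulates the two products at the fork.

```typescript
// Hypothetical helper (not the library's API): the numbers the diagram shows.
interface Branch {
  weight: number; // multiplier applied to dout on this branch
  label?: string;
}

interface Split {
  dA: number; // gradient on branch A: dout × weightA
  dB: number; // gradient on branch B: dout × weightB
  sum: number; // chain-rule accumulation at the fork: dA + dB
}

function splitGradient(dout: number, branchA: Branch, branchB: Branch): Split {
  const dA = dout * branchA.weight;
  const dB = dout * branchB.weight;
  return { dA, dB, sum: dA + dB };
}

// Residual defaults: both weights are 1, so each branch carries dout unchanged.
const residual = splitGradient(
  1,
  { weight: 1, label: "F(x)" },
  { weight: 1, label: "x (skip)" },
);
console.log(residual); // { dA: 1, dB: 1, sum: 2 }
```

With the usage example's `upstreamGradient={0.4}` and unit weights, each branch carries 0.4 and the fork sums to 0.8.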
Figure: gradient splitter, an upstream gradient flowing back through a fork into two branches. With the defaults, the upstream gradient 1.00 splits at the fork: branch F(x) receives 1.00 × 1.00 = 1.00, branch x (skip) receives 1.00 × 1.00 = 1.00, and the fork's incoming chain-rule sum is 2.00.
Installation
```bash
npx shadcn@latest add https://craftbits.dev/r/gradient-splitter.json
```
Usage
```tsx
import { GradientSplitter } from "@craft-bits/core";

<GradientSplitter />
```
Pass an explicit upstream gradient and weights:
```tsx
<GradientSplitter
  upstreamGradient={0.4}
  branchA={{ weight: 1, label: "F(x)" }}
  branchB={{ weight: 1, label: "x (skip)" }}
/>
```
Teach a multiplication fork, where the weight on each branch is the other operand:
```tsx
// c = a × b with a = 2, b = 3: ∂c/∂a = b = 3 and ∂c/∂b = a = 2.
<GradientSplitter
  upstreamGradient={1}
  branchA={{ weight: 3, label: "a (× b)" }}
  branchB={{ weight: 2, label: "b (× a)" }}
/>
```
Understanding the component
- Right → centre → left. The diagram reads strictly right (downstream, where the loss lives) → centre (the fork value `x`) → left (the two upstream consumer branches). Gradients flow right to left, the same direction a backward pass actually runs.
- One upstream, two downstreams. A single warm arrow arrives at the fork carrying `dout = ∂L/∂y`. Two cool arrows leave it, one per branch. Each branch arrow is labelled `dout × weight = product`, so the chain rule is visible on the line itself, not buried in a caption.
- The fork sums. The fork box shows `dA + dB`, the additive merge the chain rule prescribes when a value `x` is consumed by two children. This is the identity that makes residual connections so kind to gradients: the skip path always contributes an undiminished `1 · dout`, so the gradient through that junction does not vanish even when the `F(x)` branch's contribution does.
- Defaults model a residual. Out of the box, both weights are `1` and the labels are `F(x)` and `x (skip)`: the residual connection used in both Pre-norm and Post-norm Transformer blocks. Override the weights and labels for any other fork (`mul`, `add`, a `LayerNorm`-then-fork, etc.).
- Numbers reinforce the formula. A small two-cell summary under the diagram restates `upstream × weight = product` per branch, so the user can verify the maths without reading the SVG.
- Reduced motion is honest. With `prefers-reduced-motion: reduce`, every arrow draw-in is replaced with an instant render; the figure is still fully expressive, it just doesn't animate.
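To see why the skip path matters, the sketch below pushes a scalar gradient back through a stack of layers twice: once as a plain chain and once with a skip path around every layer. It assumes a hypothetical per-layer local derivative of 0.1 and a depth of 10, chosen only to make the effect visible; `backpropChain` is an illustrative name, not part of the library. At each residual layer it does exactly what the fork box shows: `dA` (through `F`) plus `dB` (through the skip, weight 1).

```typescript
// Hypothetical illustration: scalar backward pass through a depth-L stack.
// Plain layer:    y = f(x)      → dx = f'(x) · dy
// Residual layer: y = x + f(x)  → dx = f'(x) · dy + 1 · dy   (fork sums dA + dB)
function backpropChain(localDerivs: number[], residual: boolean, dout = 1): number {
  let grad = dout;
  // Walk from the last layer back to the input, as a backward pass does.
  for (let k = localDerivs.length - 1; k >= 0; k--) {
    const dA = grad * localDerivs[k]; // through F(x): dout × f'
    const dB = residual ? grad * 1 : 0; // through the skip: dout × 1 (absent in a plain stack)
    grad = dA + dB;
  }
  return grad;
}

// Assumed: ten layers, each with local derivative 0.1.
const derivs = new Array<number>(10).fill(0.1);
console.log(backpropChain(derivs, false)); // ≈ 1e-10 (0.1^10)
console.log(backpropChain(derivs, true)); // ≈ 2.594 (1.1^10)
```

The plain chain shrinks geometrically, while each residual layer multiplies the gradient by `1 + f'`, so with non-negative local derivatives it can never fall below `dout`.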
Props
| Prop | Type | Default | Description |
|---|---|---|---|
| `upstreamGradient` | `number` | `1` | The upstream gradient `dout = ∂L/∂y`, arriving at the fork from the loss side. |
| `branchA` | `{ weight: number; label?: string }` | `{ weight: 1, label: "F(x)" }` | Top-branch configuration. The gradient on this branch is `upstreamGradient × weight`. |
| `branchB` | `{ weight: number; label?: string }` | `{ weight: 1, label: "x (skip)" }` | Bottom-branch configuration. The gradient on this branch is `upstreamGradient × weight`. |
| `transition` | `Transition` | `SPRINGS.smooth` | Spring used for the path draw-ins. |
| `className` | `string` | - | Merged onto the root via `cn()`. |
Accessibility
- The figure has `role="figure"` with an `aria-labelledby` heading and an `aria-live="polite"` summary, so screen readers announce the upstream value, each branch's contribution, and the sum at the fork.
- Colour is never the only signal: the warm vs cool strokes are reinforced by text labels on each arrow and by a separate per-branch summary list below the diagram.
- All numeric output uses `tabular-nums`, so changing values don't jitter horizontally as their digits change.
- `prefers-reduced-motion: reduce` replaces the path-draw animation with an instant render. No autoplay, no orphaned `requestAnimationFrame` loops.
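The announced summary in the demo follows a fixed sentence pattern with two decimal places. The sketch below reproduces that wording; `describeSplit` is a hypothetical name, and the library's actual formatting code may differ. Fixing the number of decimal places complements `tabular-nums`, since a value never gains or loses trailing digits as it changes.

```typescript
// Hypothetical reconstruction of the aria-live summary, modelled on the text
// the demo announces; the library's real wording and formatting may differ.
interface Branch {
  weight: number;
  label: string;
}

// Always two decimals, matching the "1.00" style of the demo.
const fmt = (n: number): string => n.toFixed(2);

function describeSplit(upstream: number, a: Branch, b: Branch): string {
  // One sentence per branch: upstream × weight = product.
  const line = (br: Branch): string =>
    `Branch ${br.label} receives ${fmt(upstream)} times ${fmt(br.weight)} equals ${fmt(upstream * br.weight)}.`;
  const sum = upstream * a.weight + upstream * b.weight;
  return [
    `Upstream gradient ${fmt(upstream)} splits at the fork.`,
    line(a),
    line(b),
    `The fork's incoming chain-rule sum is ${fmt(sum)}.`,
  ].join(" ");
}

console.log(describeSplit(1, { weight: 1, label: "F(x)" }, { weight: 1, label: "x (skip)" }));
// Upstream gradient 1.00 splits at the fork. Branch F(x) receives 1.00 times 1.00 equals 1.00.
// Branch x (skip) receives 1.00 times 1.00 equals 1.00. The fork's incoming chain-rule sum is 2.00.
```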
Credits
- Extracted from: `craftingattention` (`app/src/lessons/primitives/viz/GradientSplitter.tsx`). The source was a hard-coded op dispatcher mapping `op ∈ { add, mul, pow, exp, relu, log, matmul, sum }` to a self/other-gradient pair via a lookup table, effectively a thin chain-rule reference card. The library version drops the op enum and ships the primitive every fork lesson actually needs: two configurable branches, each with its own `weight` and `label`, that visualise the chain-rule split for an arbitrary upstream gradient. The original op variants (`add`, `mul`, etc.) become one-liner wrappers any consumer can build on top.
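The sketch below shows one way to write such wrappers as plain functions that return props. `addFork`, `mulFork`, `reluSkipFork` and the `SplitterProps` shape are hypothetical names modelled on the props table, not taken from the library. Each wrapper encodes the op's local derivative as a branch weight.

```typescript
// Hypothetical wrappers: each maps a forward op to GradientSplitter-style props.
interface BranchConfig {
  weight: number;
  label?: string;
}

interface SplitterProps {
  upstreamGradient: number;
  branchA: BranchConfig;
  branchB: BranchConfig;
}

// y = a + b: both local derivatives are 1 (the residual default).
const addFork = (dout: number): SplitterProps => ({
  upstreamGradient: dout,
  branchA: { weight: 1, label: "a" },
  branchB: { weight: 1, label: "b" },
});

// y = a · b: each input's local derivative is the other operand.
const mulFork = (dout: number, a: number, b: number): SplitterProps => ({
  upstreamGradient: dout,
  branchA: { weight: b, label: "a (× b)" },
  branchB: { weight: a, label: "b (× a)" },
});

// y = relu(x) + x: relu'(x) is 1 for x > 0, else 0 (0 at x = 0 by convention).
const reluSkipFork = (dout: number, x: number): SplitterProps => ({
  upstreamGradient: dout,
  branchA: { weight: x > 0 ? 1 : 0, label: "relu(x)" },
  branchB: { weight: 1, label: "x (skip)" },
});

// Reproduces the multiplication usage example with a = 2, b = 3.
const mul = mulFork(1, 2, 3);
console.log(mul.branchA.weight, mul.branchB.weight); // 3 2
```

In a React file, a wrapper's result could then be spread onto the component, e.g. `<GradientSplitter {...mulFork(1, 2, 3)} />`, assuming no other props are needed.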