Gradient Splitter

The single picture every backprop lesson needs at a residual junction: an upstream gradient dout arrives at a fork; two arrows leave to the two consumer branches, each carrying dout × weight; the fork's incoming gradient is the sum dA + dB. Defaults model the canonical residual case (weight = 1 on both branches, labels F(x) and x (skip)) — pass non-unit weights to teach mul-style forks, relu-then-fork, or any custom multi-branch split.

Default preview output:
F(x): 1.00 × 1.00 = 1.00
x (skip): 1.00 × 1.00 = 1.00

Installation

npx shadcn@latest add https://craftbits.dev/r/gradient-splitter.json

Usage

import { GradientSplitter } from "@craft-bits/core";
 
<GradientSplitter />

Pass an explicit upstream gradient and weights:

<GradientSplitter
  upstreamGradient={0.4}
  branchA={{ weight: 1, label: "F(x)" }}
  branchB={{ weight: 1, label: "x (skip)" }}
/>

Teach a multiplication fork — the weight on one branch is the other operand:

<GradientSplitter
  upstreamGradient={1}
  branchA={{ weight: 3, label: "a (× b)" }}
  branchB={{ weight: 2, label: "b (× a)" }}
/>
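Why the weights are swapped can be checked numerically. The sketch below is not part of the library: `mul`, `numericalPartial`, and the operand values a = 2, b = 3 are assumptions chosen to match the example above. It estimates the local derivatives of y = a × b with a central finite difference.

```typescript
// Hypothetical forward function for the mul fork: y = a * b.
const mul = (a: number, b: number): number => a * b;

// Central finite difference of f with respect to one of its two arguments.
function numericalPartial(
  f: (a: number, b: number) => number,
  a: number,
  b: number,
  wrt: "a" | "b",
  h = 1e-5,
): number {
  return wrt === "a"
    ? (f(a + h, b) - f(a - h, b)) / (2 * h)
    : (f(a, b + h) - f(a, b - h)) / (2 * h);
}

// Assumed operand values, matching the labels in the example above.
const operandA = 2;
const operandB = 3;

// dy/da is approximately b, so branch A's weight is the other operand.
const weightA = numericalPartial(mul, operandA, operandB, "a");
// dy/db is approximately a, so branch B's weight is the other operand.
const weightB = numericalPartial(mul, operandA, operandB, "b");
```

Up to floating-point rounding, `weightA` comes out as 3 and `weightB` as 2, which are the values passed to `branchA.weight` and `branchB.weight`.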

Understanding the component

Right → centre → left. The diagram reads strictly right (the loss side, where the upstream gradient dout originates) → centre (the fork value x) → left (the two consumer branches). Gradient direction is right to left, the same way a backward pass actually runs.
One upstream, two downstreams. A single warm arrow arrives at the fork carrying dout = ∂L/∂y. Two cool arrows leave it, one per branch. Each branch arrow is labelled dout × weight = product so the chain rule is visible in the line itself, not buried in a caption.
The fork sums. The fork box shows dA + dB, the additive merge the chain rule prescribes when a value x is consumed by two children. This is the identity that makes residual connections so kind to gradients: the skip path always contributes 1 · dout, so even when the F(x) branch contributes nothing, the gradient does not vanish at that junction.
Defaults model a residual. Out of the box, both weights are 1 and the labels are F(x) and x (skip), the residual connection found in both Pre-norm and Post-norm Transformer blocks. Override the weights and labels for any other fork (mul, a LayerNorm-then-fork, etc.).
Numbers reinforce the formula. A small two-cell summary under the diagram restates upstream × weight = product per branch so the user can verify the maths without reading the SVG.
Reduced motion is honest. With prefers-reduced-motion: reduce, every arrow draw-in is replaced with an instant render; the figure is still fully expressive — it just doesn't animate.
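The arithmetic described in the points above can be sketched in a few lines of plain TypeScript. This is a minimal illustration, not the component's source: `splitGradient` and `SplitResult` are hypothetical names.

```typescript
// Result of splitting one upstream gradient across two branches.
interface SplitResult {
  dA: number; // gradient carried by branch A: dout × weightA
  dB: number; // gradient carried by branch B: dout × weightB
  sum: number; // value shown in the fork box: dA + dB
}

// Hypothetical helper mirroring the per-branch product and the sum at the fork.
function splitGradient(dout: number, weightA = 1, weightB = 1): SplitResult {
  const dA = dout * weightA;
  const dB = dout * weightB;
  return { dA, dB, sum: dA + dB };
}

// Residual defaults: both weights are 1.
const residual = splitGradient(0.4);

// Mul-style fork: each weight is the other operand.
const mulFork = splitGradient(1, 3, 2);

// A branch whose weight is 0 contributes nothing; the skip path still carries dout.
const deadBranch = splitGradient(1, 0, 1);
```

In the `deadBranch` case the sum equals the upstream gradient, which is the "kind to gradients" property in numbers.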

Props

Prop	Type	Default	Description
`upstreamGradient`	`number`	`1`	The upstream gradient `dout` arriving at the fork.
`branchA`	`{ weight: number; label?: string }`	`{ weight: 1, label: "F(x)" }`	Top-branch configuration. Gradient on this branch is `upstreamGradient × weight`.
`branchB`	`{ weight: number; label?: string }`	`{ weight: 1, label: "x (skip)" }`	Bottom-branch configuration. Gradient is `upstreamGradient × weight`.
`transition`	`Transition`	`SPRINGS.smooth`	Spring for path draw-ins.
`className`	`string`	—	Merged onto the root via `cn()`.
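The defaults in the table can be written out as types plus a small resolver. This is a sketch under assumptions: `SplitterProps`, `BranchConfig`, and `resolveProps` are hypothetical names, `transition` is left out because its `Transition` type comes from the animation layer, and falling back to the default label when a branch omits `label` is a guess at the behaviour, not something the table confirms.

```typescript
// Shape of a branch, copied from the Type column of the props table.
interface BranchConfig {
  weight: number;
  label?: string;
}

// Hypothetical props type; `transition` is intentionally omitted.
interface SplitterProps {
  upstreamGradient?: number;
  branchA?: BranchConfig;
  branchB?: BranchConfig;
  className?: string;
}

// Default values as listed in the Default column.
const DEFAULT_A = { weight: 1, label: "F(x)" };
const DEFAULT_B = { weight: 1, label: "x (skip)" };

// Assumption: a missing label falls back to the default label for that branch.
function resolveBranch(
  branch: BranchConfig | undefined,
  fallback: { weight: number; label: string },
): { weight: number; label: string } {
  return {
    weight: branch?.weight ?? fallback.weight,
    label: branch?.label ?? fallback.label,
  };
}

function resolveProps(props: SplitterProps = {}) {
  return {
    upstreamGradient: props.upstreamGradient ?? 1,
    branchA: resolveBranch(props.branchA, DEFAULT_A),
    branchB: resolveBranch(props.branchB, DEFAULT_B),
    className: props.className,
  };
}

const defaults = resolveProps();
const custom = resolveProps({ upstreamGradient: 0.4, branchA: { weight: 3 } });
```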

Accessibility

The root element has role="figure", is labelled by a heading through aria-labelledby, and contains an aria-live="polite" summary, so screen readers announce the upstream value, each branch's contribution, and the sum at the fork when the values change.
Colour is never the only signal: warm vs cool stroke is reinforced by textual labels on each arrow and by a separate per-branch summary list below the diagram.
All numeric output uses tabular-nums so dynamic values don't visually jitter when they change width.
prefers-reduced-motion: reduce replaces the path-draw animation with an instant render. No autoplay, no orphaned RAF loops.
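One way to build the text for such a live summary is sketched below. The function name `describeSplit`, the wording of the sentence, and the choice to spell out "times" and "equals" instead of symbols are all assumptions for illustration; the component's actual announcement text is not documented here.

```typescript
interface LabelledBranch {
  weight: number;
  label: string;
}

// Fixed two-decimal formatting, matching the numbers shown in the figure.
const fmt = (n: number): string => n.toFixed(2);

// Hypothetical builder for the text placed in the aria-live region.
function describeSplit(
  dout: number,
  a: LabelledBranch,
  b: LabelledBranch,
): string {
  const dA = dout * a.weight;
  const dB = dout * b.weight;
  return [
    `Upstream gradient ${fmt(dout)}.`,
    `${a.label}: ${fmt(dout)} times ${fmt(a.weight)} equals ${fmt(dA)}.`,
    `${b.label}: ${fmt(dout)} times ${fmt(b.weight)} equals ${fmt(dB)}.`,
    `Sum at fork: ${fmt(dA + dB)}.`,
  ].join(" ");
}

const summary = describeSplit(
  0.4,
  { weight: 1, label: "F(x)" },
  { weight: 1, label: "x (skip)" },
);
```

Spelling out the operators avoids depending on how a given screen reader pronounces "×" and "=".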

Credits

Extracted from: craftingattention (app/src/lessons/primitives/viz/GradientSplitter.tsx). The source was a hard-coded op-dispatcher mapping op ∈ { add, mul, pow, exp, relu, log, matmul, sum } to a self/other-gradient pair via a lookup table — a thin chain-rule reference card. The library version drops the op enum and ships the primitive every fork lesson actually needs: two configurable branches, each with its own weight and label, that visualise the chain-rule split for an arbitrary upstream gradient. The original op variants (add, mul, etc.) become one-liner wrappers any consumer can build on top.
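A consumer-side wrapper for the old op variants could look like the sketch below. It covers only the two binary ops whose local derivatives are simplest (add and mul); `branchesFor` and `BinaryOp` are hypothetical names, and the labels are illustrative.

```typescript
type BinaryOp = "add" | "mul";

interface BranchConfig {
  weight: number;
  label?: string;
}

// Maps an op and its operand values to the two branch configurations.
// The weight on each branch is the local derivative of the op's output
// with respect to that operand.
function branchesFor(
  op: BinaryOp,
  a: number,
  b: number,
): { branchA: BranchConfig; branchB: BranchConfig } {
  switch (op) {
    case "add":
      // d(a + b)/da = 1 and d(a + b)/db = 1
      return {
        branchA: { weight: 1, label: "a" },
        branchB: { weight: 1, label: "b" },
      };
    case "mul":
      // d(a * b)/da = b and d(a * b)/db = a
      return {
        branchA: { weight: b, label: "a (× b)" },
        branchB: { weight: a, label: "b (× a)" },
      };
  }
}

const mulBranches = branchesFor("mul", 2, 3);
const addBranches = branchesFor("add", 2, 3);
```

The result can then be spread into the component, for example `<GradientSplitter upstreamGradient={1} {...branchesFor("mul", 2, 3)} />`, which reproduces the multiplication-fork example from the Usage section.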