Ring All-Reduce Viz
A visualisation of the ring all-reduce collective used by NCCL, Horovod, and many data-parallel training frameworks. N GPUs sit on a ring; each GPU's gradient tensor is split into N chunks. The collective runs in 2(N-1) steps, and at each step every GPU sends one chunk clockwise to its neighbour: the first N-1 steps scatter partial sums (reduce-scatter), and the last N-1 steps broadcast the fully reduced chunks (all-gather).
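The schedule described above can be checked with a small simulation. This is a minimal sketch rather than the component's source: it assumes the chunk-index rule `(i - s) mod N` given under "Understanding the component", models each chunk as a single number, and uses hypothetical helper names (`phaseAt`, `chunkSentBy`, `simulateRingAllReduce`).

```typescript
// Hypothetical helper names; none of these are exported by the component.
type Phase = "reduce-scatter" | "all-gather";

// Non-negative modulo, since JavaScript's % keeps the sign of the dividend.
const mod = (a: number, n: number): number => ((a % n) + n) % n;

// Steps 0..N-2 are reduce-scatter, steps N-1..2N-3 are all-gather.
function phaseAt(step: number, n: number): Phase {
  return step < n - 1 ? "reduce-scatter" : "all-gather";
}

// Index of the chunk that GPU `gpu` sends clockwise at `step`.
function chunkSentBy(gpu: number, step: number, n: number): number {
  return mod(gpu - step, n);
}

// buffers[g][c] is GPU g's local value for chunk c (one number per chunk).
function simulateRingAllReduce(buffers: number[][]): number[][] {
  const n = buffers.length;
  const data = buffers.map((row) => [...row]); // leave the input untouched
  for (let step = 0; step < 2 * (n - 1); step++) {
    // Snapshot every outgoing message first: all GPUs send simultaneously.
    const messages = data.map((row, gpu) => {
      const chunk = chunkSentBy(gpu, step, n);
      return { to: mod(gpu + 1, n), chunk, value: row[chunk] };
    });
    const reducing = phaseAt(step, n) === "reduce-scatter";
    for (const { to, chunk, value } of messages) {
      // Reduce-scatter accumulates; all-gather overwrites with the final sum.
      data[to][chunk] = reducing ? data[to][chunk] + value : value;
    }
  }
  return data;
}

// Four GPUs, four chunks each: GPU g starts with 10 * g + c in chunk c.
const start = [0, 1, 2, 3].map((g) => [0, 1, 2, 3].map((c) => 10 * g + c));
const result = simulateRingAllReduce(start);
// Every row of `result` is [60, 64, 68, 72], the per-chunk sum over all GPUs.
```

After the first N-1 steps GPU `i` holds the fully reduced chunk `(i + 1) mod N`, which is exactly the chunk the same rule tells it to send at step N-1, so one formula covers both phases.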
Installation
npx shadcn@latest add https://craftbits.dev/r/ring-all-reduce-viz.json
Usage
import { RingAllReduceViz } from "@craft-bits/core";
<RingAllReduceViz />
Drive playback from outside the component:
const [step, setStep] = useState(0);
<RingAllReduceViz
  currentStep={step}
  onCurrentStepChange={setStep}
  playing={false}
/>
Swap the GPU count to teach scaling:
<RingAllReduceViz numGpus={8} defaultPlaying />
Understanding the component
- Two phases, one routing rule. Steps `0..N-2` are reduce-scatter; steps `N-1..2N-3` are all-gather. The routing rule is the same across both: at step `s`, GPU `i` sends chunk `(i - s) mod N` clockwise to GPU `(i + 1) mod N`. Only the semantic label changes.
- GPUs sit on a polar circle. Each GPU is positioned at angle `2πi/N` (starting at the top so GPU 0 is north). Chunk slices are drawn as donut sectors inside each GPU body; the active chunk for the current step is filled with `--cb-accent`, the rest fade to the muted border tone.
- Clockwise edges with arrowheads. Each pair of adjacent GPUs is connected by a curved `<path>` with an SVG arrow marker, so the direction reads at a glance, even before the step ticks.
- Travelling-chunk dots. A small accent dot sits 55% along each edge to indicate "this is what's in flight right now". They re-enter on each step change so the eye catches the motion without a continuous animation.
- `SPRINGS.smooth` for chunk highlights. Opacity transitions on the chunk slices use the library's default smooth spring; the markers and edges share the same transition so the whole ring settles together.
- Reduced-motion fallback. With `prefers-reduced-motion: reduce`, autoplay is forced off and chunk markers render fully visible on mount.
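The polar layout from the list above can be sketched as follows. The helper names `ringPoint` and `dotPoint`, the radius, and the centre are assumptions for illustration. The dot helper additionally assumes the edge follows the ring's own circle, which may not match the curved `<path>` the component actually draws.

```typescript
interface Point {
  x: number;
  y: number;
}

// Hypothetical helper. `index` may be fractional to land between two GPUs.
function ringPoint(index: number, n: number, radius: number, cx = 0, cy = 0): Point {
  // SVG's y axis points down, so shifting by -π/2 puts index 0 at the top
  // and makes increasing indices run clockwise on screen.
  const angle = (2 * Math.PI * index) / n - Math.PI / 2;
  return { x: cx + radius * Math.cos(angle), y: cy + radius * Math.sin(angle) };
}

// Assumption: the travelling dot sits a fraction `t` of the way from GPU
// `gpu` to its clockwise neighbour, measured along the ring circle.
function dotPoint(gpu: number, n: number, radius: number, t = 0.55): Point {
  return ringPoint(gpu + t, n, radius);
}

// Four GPUs on a circle of radius 100 centred on the origin:
// GPU 0 at the top, GPU 1 to the right, GPU 2 at the bottom, GPU 3 to the left.
const positions = Array.from({ length: 4 }, (_, i) => ringPoint(i, 4, 100));
```

Accepting a fractional index keeps the GPU positions and the in-flight dots on one formula, so changing the GPU count moves both together.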
Props
| Prop | Type | Default | Description |
|---|---|---|---|
| `numGpus` | `number` | `4` | Number of GPUs in the ring (clamped to 2..16). |
| `currentStep` | `number` | - | Controlled step index. Pair with `onCurrentStepChange`. |
| `defaultCurrentStep` | `number` | `0` | Uncontrolled initial step. |
| `onCurrentStepChange` | `(step) => void` | - | Fires on autoplay tick and manual scrub. |
| `playing` | `boolean` | - | Controlled play state. Pair with `onPlayingChange`. |
| `defaultPlaying` | `boolean` | `false` | Uncontrolled initial play state. |
| `onPlayingChange` | `(playing) => void` | - | Fires when play / pause flips. |
| `playSpeed` | `number` | `800` | Milliseconds between step advances. |
| `showPhase` | `boolean` | `true` | Render the "Reduce-Scatter" / "All-Gather" label. |
| `transition` | `Transition` | `SPRINGS.smooth` | Spring for chunk-highlight transitions. |
| `className` | `string` | - | Merged onto the root `<div>` via `cn()`. |
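The `numGpus` clamp and the resulting step count can be illustrated with a short sketch. The helpers `clampNumGpus` and `totalSteps` are hypothetical, and rounding non-integer input is an assumption; the table only documents the 2..16 range.

```typescript
// Hypothetical helpers; the component's internal names are not documented.
const MIN_GPUS = 2;
const MAX_GPUS = 16;

function clampNumGpus(requested: number): number {
  // Rounding is an assumption; only the 2..16 range is documented.
  return Math.min(MAX_GPUS, Math.max(MIN_GPUS, Math.round(requested)));
}

// N-1 reduce-scatter steps followed by N-1 all-gather steps.
function totalSteps(numGpus: number): number {
  return 2 * (clampNumGpus(numGpus) - 1);
}

const defaultRingSteps = totalSteps(4); // 6 steps for the default four-GPU ring
const largestRingSteps = totalSteps(99); // clamped to 16 GPUs, so 30 steps
```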
Accessibility
- The root is `role="figure"` with `aria-labelledby` pointing at the "Ring all-reduce" heading and `aria-describedby` at a visually-hidden `aria-live="polite"` summary.
- The summary announces the current phase and step on every change.
- The inner SVG has `role="img"` and an `aria-label` describing the ring topology.
- The play / pause button uses `aria-pressed`; the label flips between "Play ring all-reduce" and "Pause ring all-reduce".
- The scrubber is a native `<input type="range">` with an `aria-label` so keyboard arrows nudge the cursor and screen readers narrate the value.
- Colour is never the only signal: the phase label, the step counter, and the GPU labels are all textual.
- `prefers-reduced-motion: reduce` disables autoplay and renders chunk markers fully visible on mount.
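The live-region summary mentioned above could be derived from the step index along these lines. This is a sketch only: `liveSummary` is a hypothetical name, and the exact wording the component announces is an assumption modelled on the "Reduce-Scatter" / "All-Gather" labels.

```typescript
// Hypothetical helper: builds the text for the aria-live summary.
// `step` is the zero-based global step index, 0..2N-3.
function liveSummary(step: number, numGpus: number): string {
  const stepsPerPhase = numGpus - 1;
  const inReduceScatter = step < stepsPerPhase;
  const phase = inReduceScatter ? "Reduce-Scatter" : "All-Gather";
  // Convert the global index into a one-based position within the phase.
  const stepInPhase = (inReduceScatter ? step : step - stepsPerPhase) + 1;
  return `${phase}. Step ${stepInPhase} of ${stepsPerPhase}.`;
}

const firstStep = liveSummary(0, 4); // "Reduce-Scatter. Step 1 of 3."
const lastStep = liveSummary(5, 4); // "All-Gather. Step 3 of 3."
```

Announcing the phase and a per-phase counter as text keeps the information available to screen-reader users who cannot see the highlighted chunks.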
Credits
- Extracted from: `craftingattention` (`app/src/lessons/primitives/systems/RingAllReduceViz.tsx`). Stripped the four-GPU hardcoded layout, the per-source colour palette, the chunk-merging Set arithmetic, the stats panel, the transfer log, and the keyboard handler. Generalised to arbitrary `numGpus`, replaced the chunk-merging state machine with a pure step-indexed routing function, and switched from inline `oklch()` palette colours to the shared `--cb-accent` / `--cb-border-strong` tokens.