Ring All-Reduce Viz

A visualisation of the ring all-reduce collective used by NCCL, Horovod, and many data-parallel training frameworks. N GPUs sit on a ring, and each GPU's gradient tensor is split into N chunks. At each of the 2(N-1) steps, every GPU sends one chunk clockwise to its neighbour: the first N-1 steps (reduce-scatter) accumulate partial sums, and the last N-1 steps (all-gather) circulate the fully reduced chunks until every GPU holds the complete result.
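
To make the two phases concrete, here is a minimal simulation sketch on plain number arrays. It is an illustration under stated assumptions, not code from the component: each chunk is a single number, the name ringAllReduce is hypothetical, and the chunk sent at each step follows the component's routing rule, chunk (i - s) mod N.

```typescript
// Hypothetical simulation, not part of @craft-bits/core.
// grads[i][c] is chunk c of GPU i's gradient (one number per chunk for brevity).
function ringAllReduce(grads: number[][]): number[][] {
  const n = grads.length;
  if (n < 2) throw new Error("a ring needs at least 2 GPUs");
  for (const g of grads) {
    if (g.length !== n) throw new Error("each GPU needs exactly N chunks");
  }

  // Work on copies so the caller's arrays are left untouched.
  const bufs = grads.map((g) => [...g]);
  const mod = (a: number) => ((a % n) + n) % n;

  for (let step = 0; step < 2 * (n - 1); step++) {
    const reducing = step < n - 1; // first N-1 steps are reduce-scatter

    // Snapshot every outgoing value first: all sends in a step are simultaneous.
    const sends = bufs.map((buf, i) => {
      const chunk = mod(i - step);
      return { to: mod(i + 1), chunk, value: buf[chunk] };
    });

    for (const { to, chunk, value } of sends) {
      if (reducing) {
        bufs[to][chunk] += value; // reduce-scatter: accumulate the partial sum
      } else {
        bufs[to][chunk] = value; // all-gather: overwrite with the reduced chunk
      }
    }
  }
  return bufs;
}
```

Calling ringAllReduce with four GPUs whose chunks are [1, 2, 3, 4], [10, 20, 30, 40], [100, 200, 300, 400], and [1000, 2000, 3000, 4000] leaves every GPU holding [1111, 2222, 3333, 4444] after six steps.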

Installation

npx shadcn@latest add https://craftbits.dev/r/ring-all-reduce-viz.json

Usage

import { RingAllReduceViz } from "@craft-bits/core";
 
<RingAllReduceViz />

Drive playback from outside the component:

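import { useState } from "react";
import { RingAllReduceViz } from "@craft-bits/core";

// Example wrapper: the parent owns the step index; autoplay stays paused.
function ControlledRing() {
  const [step, setStep] = useState(0);
  return (
    <RingAllReduceViz
      currentStep={step}
      onCurrentStepChange={setStep}
      playing={false}
    />
  );
}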

Swap the GPU count to teach scaling:

<RingAllReduceViz numGpus={8} defaultPlaying />
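
When teaching scaling, it helps to know how many steps a given ring produces. The sketch below derives that from the 2(N-1) step count and the documented 2..16 clamp. The helper names are hypothetical, and rounding non-integer input is an assumption; the component may handle it differently.

```typescript
// Hypothetical helpers, not exported by @craft-bits/core.

// Mirrors the documented clamp on numGpus (2..16).
// Assumption: fractional values are rounded before clamping.
function clampGpus(numGpus: number): number {
  return Math.min(16, Math.max(2, Math.round(numGpus)));
}

// Total steps for a ring of N GPUs: N-1 reduce-scatter plus N-1 all-gather.
function totalSteps(numGpus: number): number {
  return 2 * (clampGpus(numGpus) - 1);
}
```

With these definitions, totalSteps(4) is 6 and totalSteps(8) is 14, so doubling the ring from four to eight GPUs more than doubles the number of steps a learner scrubs through.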

Understanding the component

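  1. Two phases, one routing rule. Steps 0..N-2 are reduce-scatter; steps N-1..2N-3 are all-gather. The routing rule is the same across both: at step s, GPU i sends chunk (i - s) mod N clockwise to GPU (i + 1) mod N. This works because reduce-scatter leaves GPU i holding the fully reduced chunk (i + 1) mod N, which is exactly the chunk the rule selects at step N-1. In the visualisation only the phase label changes; in a real all-reduce the receiver adds the incoming chunk during reduce-scatter and overwrites its copy during all-gather.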
  2. GPUs sit on a polar circle. Each GPU is positioned at angle 2πi/N (starting at the top so GPU 0 is north). Chunk slices are drawn as donut sectors inside each GPU body; the active chunk for the current step is filled with --cb-accent, the rest fade to the muted border tone.
  3. Clockwise edges with arrowheads. Each pair of adjacent GPUs is connected by a curved <path> with an SVG arrow marker, so the direction reads at a glance even before the first step plays.
  4. Travelling-chunk dots. A small accent dot sits 55% along each edge to mark the chunk currently in flight. The dots re-enter on each step change so the eye catches the motion without a continuous animation.
  5. SPRINGS.smooth for chunk highlights. Opacity transitions on the chunk slices use the library's default smooth spring; the markers and edges share the same transition so the whole ring settles together.
  6. Reduced-motion fallback. With prefers-reduced-motion: reduce, autoplay is forced off and chunk markers render fully visible on mount.
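
The first two points above can be sketched as a few pure helpers. This is an illustrative sketch rather than the component's source: the names phaseAt, transfersAt, and gpuPosition, the Transfer shape, and the centre/radius parameters are all assumptions.

```typescript
// Hypothetical helpers illustrating the routing rule and the polar layout.
type Phase = "Reduce-Scatter" | "All-Gather";

interface Transfer {
  from: number;
  to: number;
  chunk: number;
}

// Steps 0..N-2 are reduce-scatter; steps N-1..2N-3 are all-gather.
function phaseAt(step: number, n: number): Phase {
  return step < n - 1 ? "Reduce-Scatter" : "All-Gather";
}

// At step s, GPU i sends chunk (i - s) mod N to GPU (i + 1) mod N.
function transfersAt(step: number, n: number): Transfer[] {
  const mod = (a: number) => ((a % n) + n) % n; // keeps results non-negative
  return Array.from({ length: n }, (_, i) => ({
    from: i,
    to: mod(i + 1),
    chunk: mod(i - step),
  }));
}

// Place GPU i at angle 2*pi*i/N on a circle of radius r centred at (cx, cy).
// Subtracting pi/2 puts GPU 0 at the top; because the SVG y axis points down,
// increasing angles run clockwise on screen.
function gpuPosition(i: number, n: number, cx: number, cy: number, r: number) {
  const theta = (2 * Math.PI * i) / n - Math.PI / 2;
  return { x: cx + r * Math.cos(theta), y: cy + r * Math.sin(theta) };
}
```

Because transfersAt depends only on the step index and N, scrubbing to any step needs no history: the view can be recomputed from (step, numGpus) alone, which is what makes a controlled currentStep prop straightforward to support.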

Props

| Prop | Type | Default | Description |
| --- | --- | --- | --- |
| numGpus | number | 4 | Number of GPUs in the ring (clamped to 2..16). |
| currentStep | number | - | Controlled step index. Pair with onCurrentStepChange. |
| defaultCurrentStep | number | 0 | Uncontrolled initial step. |
| onCurrentStepChange | (step) => void | - | Fires on autoplay tick and manual scrub. |
| playing | boolean | - | Controlled play state. Pair with onPlayingChange. |
| defaultPlaying | boolean | false | Uncontrolled initial play state. |
| onPlayingChange | (playing) => void | - | Fires when play / pause flips. |
| playSpeed | number | 800 | Milliseconds between step advances. |
| showPhase | boolean | true | Render the "Reduce-Scatter" / "All-Gather" label. |
| transition | Transition | SPRINGS.smooth | Spring for chunk-highlight transitions. |
| className | string | - | Merged onto the root <div> via cn(). |

Accessibility

  • The root is role="figure" with aria-labelledby pointing at the "Ring all-reduce" heading and aria-describedby at a visually-hidden aria-live="polite" summary.
  • The summary announces the current phase and step on every change.
  • The inner SVG has role="img" and an aria-label describing the ring topology.
  • The play / pause button uses aria-pressed; the label flips between "Play ring all-reduce" and "Pause ring all-reduce".
  • The scrubber is a native <input type="range"> with an aria-label so keyboard arrows nudge the cursor and screen readers narrate the value.
  • Colour is never the only signal — the phase label, the step counter, and the GPU labels are all textual.
  • prefers-reduced-motion: reduce disables autoplay and renders chunk markers fully visible on mount.
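
One way the reduced-motion rule could be expressed is sketched below. This is an assumption about the approach, not the component's actual code: the helper names are hypothetical, and the only platform API used is the standard matchMedia media-query check, guarded so the function also works where no window exists (for example during server rendering).

```typescript
// Hypothetical helpers; the component's real implementation may differ.
type MatchMedia = (query: string) => { matches: boolean };

// Returns true when the user has asked for reduced motion.
// Falls back to false where matchMedia is unavailable (SSR, Node).
function prefersReducedMotion(): boolean {
  const g = globalThis as unknown as { matchMedia?: MatchMedia };
  if (typeof g.matchMedia !== "function") return false;
  return g.matchMedia("(prefers-reduced-motion: reduce)").matches;
}

// Autoplay only runs when it was requested and motion is not reduced.
function resolveAutoplay(requested: boolean, reducedMotion: boolean): boolean {
  return requested && !reducedMotion;
}
```

Keeping the decision in a pure function such as resolveAutoplay makes the rule easy to unit-test without a browser, while the media query lookup stays isolated in one place.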

Credits

  • Extracted from: craftingattention (app/src/lessons/primitives/systems/RingAllReduceViz.tsx). Stripped the four-GPU hardcoded layout, the per-source colour palette, the chunk-merging Set arithmetic, the stats panel, the transfer log, and the keyboard handler. Generalised to arbitrary numGpus, replaced the chunk-merging state machine with a pure step-indexed routing function, and switched from inline oklch() palette colours to the shared --cb-accent / --cb-border-strong tokens.