Tensor Parallelism Viz
A diagram of one forward pass through a weight matrix that has been sharded across multiple GPUs. The input vector X feeds every shard at once; each GPU multiplies X by its column slice in parallel; an all-reduce sums the partial results; the final output vector Y appears on the right.
The component walks through four phases (idle, compute, allreduce, done), driven either by its own autoplay loop or by the caller through the controlled phase, active GPU, and play state props.
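The arithmetic behind the diagram can be sketched in a few lines. This is an illustrative model only, not the component's source: the helper names (partialOutput, allReduceSum, shardedForward, denseForward) are hypothetical, cols is assumed to be divisible by numGpus, and each GPU is assumed to write its column slice into a zero-padded full-width vector so that a plain element-wise sum can stand in for the all-reduce.

```typescript
// Illustrative model of the math the diagram depicts; not the component's source.
// Assumes cols is divisible by numGpus to keep the slicing simple.

type Matrix = number[][]; // rows x cols

// Compute phase: one GPU multiplies x by its column slice [start, end) of w.
// The slice is written into a zero-padded, full-width vector so that a plain
// element-wise sum can play the role of the all-reduce.
function partialOutput(x: number[], w: Matrix, start: number, end: number): number[] {
  const y: number[] = new Array<number>(w[0].length).fill(0);
  for (let c = start; c < end; c++) {
    for (let r = 0; r < x.length; r++) {
      y[c] += x[r] * w[r][c];
    }
  }
  return y;
}

// All-reduce phase: element-wise sum of every GPU's partial vector.
function allReduceSum(partials: number[][]): number[] {
  return partials[0].map((_, c) => partials.reduce((acc, p) => acc + p[c], 0));
}

// Full sharded pass: every shard computes its partial, then the partials are summed.
function shardedForward(x: number[], w: Matrix, numGpus: number): number[] {
  const width = w[0].length / numGpus; // columns per shard
  const partials: number[][] = [];
  for (let g = 0; g < numGpus; g++) {
    partials.push(partialOutput(x, w, g * width, (g + 1) * width));
  }
  return allReduceSum(partials);
}

// Unsharded reference: y[c] = sum over r of x[r] * w[r][c].
function denseForward(x: number[], w: Matrix): number[] {
  return w[0].map((_, c) => x.reduce((acc, xr, r) => acc + xr * w[r][c], 0));
}
```

With x = [1, 2] and w = [[1, 2, 3, 4], [5, 6, 7, 8]], both shardedForward(x, w, 2) and denseForward(x, w) give [11, 14, 17, 20], which is the equivalence the diagram is meant to convey.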
Installation
npx shadcn@latest add https://craftbits.dev/r/tensor-parallelism-viz.json
Usage
import { TensorParallelismViz } from "@craft-bits/core";
<TensorParallelismViz numGpus={4} />
Autoplay the full forward pass:
<TensorParallelismViz numGpus={4} defaultPlaying />
Drive the phase from outside (e.g. wire it to a scrubber):
import { useState } from "react";
const [phase, setPhase] = useState<TensorParallelismPhase>("idle");
<TensorParallelismViz
  numGpus={4}
  phase={phase}
  onPhaseChange={setPhase}
  activeGpu={1}
/>
Understanding the component
- Input vector X. The left column of cells is the input activation. It sits at low opacity while idle and brightens once the forward pass starts so the eye follows the data into the matrix.
- Sharded weight matrix W. The component splits the matrix into numGpus column shards. Each shard renders in the same accent colour but offsets horizontally by a few pixels so the seam between shards is visible. The data attribute data-shard on each cell lets you target a specific shard from outside CSS.
- Compute phase. While the component is in the compute phase, the cells of the currently active shard glow at high opacity and gain a thicker border. Setting activeGpu to -1 highlights every shard at once, which is useful when narrating "this happens on every GPU in parallel".
- All-reduce. When more than one GPU is in play, a small all-reduce node appears between the matrix and the output during the all-reduce phase. It picks up the accent colour while reducing, then switches to the success colour once the result is in.
- Output vector Y. On done, the output column animates in from the right, scaled up from 0.8 to 1 with a tiny stagger per row.
- Controlled or uncontrolled. Phase, active GPU, and playing all follow the Radix pattern: pass the controlled prop plus its onChange callback for controlled mode, or rely on the default* variants for self-driven autoplay.
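The phase walk and the compute-phase highlight rule described in this list can be modelled as two small pure functions. This is a sketch under stated assumptions rather than the component's implementation: nextPhase and isShardHighlighted are hypothetical names, the union members are taken from the phase names used in this document, and wrapping from done back to idle is a guess based on the phrase "autoplay loop".

```typescript
// Phase names as listed in this document.
type TensorParallelismPhase = "idle" | "compute" | "allreduce" | "done";

// Next phase in the walk. With a single GPU there is nothing to reduce,
// so compute goes straight to done.
// ASSUMPTION: done wraps back to idle; the real autoplay may stop instead.
function nextPhase(phase: TensorParallelismPhase, numGpus: number): TensorParallelismPhase {
  switch (phase) {
    case "idle":
      return "compute";
    case "compute":
      return numGpus > 1 ? "allreduce" : "done";
    case "allreduce":
      return "done";
    case "done":
      return "idle";
  }
}

// Whether a shard's cells get the compute-phase highlight.
// activeGpu === -1 means "every shard at once".
function isShardHighlighted(
  phase: TensorParallelismPhase,
  activeGpu: number,
  shard: number
): boolean {
  if (phase !== "compute") return false;
  return activeGpu === -1 || activeGpu === shard;
}
```

Keeping these rules as pure functions of props makes the controlled mode easy to reason about: a scrubber or lesson script can compute the next phase itself and pass it in through phase.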
Variants
A single GPU collapses the layout — no shard seams, no all-reduce node, and the pass jumps straight from compute to done:
<TensorParallelismViz numGpus={1} defaultPlaying />
An eight-GPU split with a wider matrix:
<TensorParallelismViz numGpus={8} cols={16} defaultPlaying />
Pause on a specific phase for a screenshot, with one GPU highlighted:
<TensorParallelismViz numGpus={4} phase="compute" activeGpu={2} />
Drop the GPU labels when the surrounding caption already names the shards:
<TensorParallelismViz numGpus={4} showGpuLabels={false} />
Props
| Prop | Type | Default | Description |
|---|---|---|---|
| numGpus | number | 2 | Number of column shards / GPUs. |
| rows | number | 8 | Rows in the weight matrix. |
| cols | number | 8 | Columns in the weight matrix. Each shard gets floor(cols / numGpus) columns; any remainder goes to the last shard. |
| phase | TensorParallelismPhase | - | Controlled phase. |
| defaultPhase | TensorParallelismPhase | "idle" | Uncontrolled initial phase. |
| onPhaseChange | (phase) => void | - | Fires when the phase advances. |
| activeGpu | number | - | Controlled active-GPU index during compute. -1 highlights every GPU. |
| defaultActiveGpu | number | -1 | Uncontrolled initial active GPU. |
| onActiveGpuChange | (idx) => void | - | Fires when the active GPU changes. |
| playing | boolean | - | Controlled autoplay state. |
| defaultPlaying | boolean | false | Uncontrolled initial autoplay state. |
| onPlayingChange | (playing) => void | - | Fires when play / pause flips. |
| playSpeed | number | 420 | Milliseconds between phase advances. Floored at 80 ms. |
| showGpuLabels | boolean | true | Render the per-shard GPU N labels under the matrix. |
| transition | Transition | SPRINGS.smooth | Override for cell-fill / label transitions. |
| className | string | - | Merged onto the root via cn(). |
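The cols and playSpeed rows both describe a small normalisation step. One plausible reading is sketched here. The names shardWidths and effectivePlaySpeed are hypothetical, and treating the cols rule as "floor(cols / numGpus) columns per shard, remainder to the last shard" is an interpretation of the table, not confirmed behaviour.

```typescript
// Columns per shard under the assumed rule: every shard gets
// floor(cols / numGpus) columns and the last shard absorbs the remainder.
function shardWidths(cols: number, numGpus: number): number[] {
  const gpus = Math.max(1, Math.floor(numGpus)); // guard against 0 or fractional counts
  const base = Math.floor(cols / gpus);
  const widths: number[] = new Array<number>(gpus).fill(base);
  widths[gpus - 1] += cols - base * gpus; // remainder goes to the last shard
  return widths;
}

// Autoplay interval in milliseconds, using the documented default (420)
// and the documented 80 ms floor.
function effectivePlaySpeed(playSpeed: number = 420): number {
  return Math.max(80, playSpeed);
}
```

Under this reading, shardWidths(16, 8) yields eight shards of 2 columns, shardWidths(10, 4) yields [2, 2, 2, 4], and effectivePlaySpeed(10) is clamped up to 80.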
Accessibility
- The figure is role="figure" with a hidden summary listing GPU count and matrix shape, so screen readers hear the configuration whenever props change.
- A polite aria-live region announces the current phase and active GPU, so non-sighted users follow the same pass as sighted ones.
- The SVG itself is aria-hidden. Colour is never the only signal: phase chips in the footer carry textual labels (idle, compute, allreduce, done) alongside the active dot.
- Motion respects prefers-reduced-motion: reduce. Cell fades collapse to instant swaps and autoplay never starts.
- data-phase on the root and data-shard / data-col-in-shard on each weight cell expose state to CSS without resorting to className toggles.
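The reduced-motion rule can be expressed as a small decision function. This is a hedged sketch, not the component's code: motionPolicy and its return shape are hypothetical, and the matchMedia function is passed in as a parameter (rather than read from window) so the logic also runs outside a browser.

```typescript
// Minimal structural types so the sketch does not depend on DOM typings.
type MediaQueryLike = { matches: boolean };
type MatchMedia = (query: string) => MediaQueryLike;

// Decide whether to animate and whether autoplay may start.
// When the user prefers reduced motion, transitions become instant swaps
// and autoplay is suppressed even if it was requested.
function motionPolicy(
  matchMedia: MatchMedia | undefined,
  requestedPlaying: boolean
): { animate: boolean; autoplay: boolean } {
  const reduce = matchMedia
    ? matchMedia("(prefers-reduced-motion: reduce)").matches
    : false; // no matchMedia available (e.g. server render): assume no preference
  return { animate: !reduce, autoplay: requestedPlaying && !reduce };
}
```

In a browser the first argument would typically be window.matchMedia bound to window. In tests or server rendering, a stub or undefined works.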
Credits
- Extracted from: craftingattention (app/src/lessons/primitives/viz/TensorParallelismViz.tsx). The source wrapped the diagram in a three-mode Widget (Explore / Predict / Challenge) with bookmarks, undo / redo via useWidgetHistory, score dots, narration that timed itself to a fixed 100 / 50 / 25 ms latency table, and a stacked timing-comparison bar. The library extract is the pure diagram primitive (input vector, sharded weight matrix, all-reduce node, output vector), driven entirely by props with controlled / uncontrolled and play / pause APIs. Latency comparisons, narration, prediction prompts, and challenge framing belong in lesson code, not in the library primitive.