HBM Traffic Viz

A per-operation bar chart of memory traffic between HBM and SRAM on a GPU. Each operation row gets two bars at the same scale — reads on the left, writes on the right — so the user reads volume by length. A footer totals reads, writes, and (optionally) the bandwidth-bound compute time t = (read + write) / bandwidth, which is the latency lower bound for a memory-bound kernel. Pairs naturally with FlashAttentionViz to teach why tiling wins: it minimises this exact number.

Preview (rendered demo): HBM traffic at 1555 GB/s, 4 operations. Per-row times: weights load 10.3 ms, KV load 772 µs, output write 25.7 µs, KV write 25.7 µs. Total read 17.2 GB, total write 81.9 MB, bandwidth-bound time 11.1 ms.

Installation

npx shadcn@latest add https://craftbits.dev/r/hbm-traffic-viz.json

Usage

import { HBMTrafficViz } from "@craft-bits/core";
 
<HBMTrafficViz
  operations={[
    { id: "weights",  label: "weights load", readGb: 16,  writeGb: 0 },
    { id: "kv-load",  label: "KV load",      readGb: 1.2, writeGb: 0 },
    { id: "output",   label: "output write", readGb: 0,   writeGb: 0.04 },
    { id: "kv-write", label: "KV write",     readGb: 0,   writeGb: 0.04 },
  ]}
  bandwidthGbPerSec={1555}
  showTime
/>

Hide the time column when teaching pure volume:

<HBMTrafficViz operations={ops} showTime={false} />

Compare two kernels side-by-side by feeding two different operations arrays:

<div className="grid grid-cols-2 gap-4">
  <HBMTrafficViz operations={standardOps} bandwidthGbPerSec={1555} />
  <HBMTrafficViz operations={flashOps}    bandwidthGbPerSec={1555} />
</div>
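
The two arrays in the comparison above have to come from somewhere. As a minimal sketch, assuming a single attention head at fp16 and decimal gigabytes, the hypothetical helper below (`attentionOps` is not part of the library) derives both lists from a sequence length. The fused list is an idealised lower bound: it assumes Q, K, and V each cross HBM once, whereas a real tiled kernel re-reads K and V once per query tile.

```typescript
// Local mirror of the documented operation fields (defined here for illustration).
type HBMTrafficVizOperation = {
  id: string;
  label: string;
  readGb: number;
  writeGb: number;
};

// Hypothetical helper, not part of @craft-bits/core.
// Assumptions: one head, fp16 (2 bytes per element), 1 GB = 1e9 bytes.
function attentionOps(seqLen: number, headDim = 64, bytesPerElem = 2) {
  const toGb = (elems: number) => (elems * bytesPerElem) / 1e9;
  const nd = toGb(seqLen * headDim); // one N x d matrix: Q, K, V, or O
  const nn = toGb(seqLen * seqLen); // one N x N matrix: scores S or probabilities P

  // Standard attention materialises S and P in HBM between kernels.
  const standardOps: HBMTrafficVizOperation[] = [
    { id: "scores", label: "S = QK^T", readGb: 2 * nd, writeGb: nn },
    { id: "softmax", label: "P = softmax(S)", readGb: nn, writeGb: nn },
    { id: "output", label: "O = PV", readGb: nn + nd, writeGb: nd },
  ];

  // Idealised fused kernel: S and P stay in SRAM, so only Q, K, V, and O touch HBM.
  const flashOps: HBMTrafficVizOperation[] = [
    { id: "fused", label: "fused attention", readGb: 3 * nd, writeGb: nd },
  ];

  return { standardOps, flashOps };
}

const { standardOps, flashOps } = attentionOps(4096);
```

Passing `standardOps` and `flashOps` to the two grid cells makes the N x N terms visible as the long bars in the left chart. Note that each chart normalises its bars independently, so the footer totals are the numbers to compare across the two.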

Understanding the component

  1. One bar pair per operation. Each row has two bars at the same scale — reads on the left (anchored to the right edge so the bar grows toward the central gutter) and writes on the right (anchored to the left edge so the bar grows away). The pairing makes asymmetric rows (load-only or store-only) read at a glance.
  2. Shared normaliser. Every bar width is its value divided by the largest single readGb or writeGb across all operations. Because reads and writes share one denominator, a small write never balloons to 100% just because it is the largest write; proportions stay true across rows and across the read / write axis.
  3. Bandwidth-bound time. When showTime is on, each row shows (readGb + writeGb) / bandwidthGbPerSec, a value in seconds that is displayed in ms or µs. The footer sums the rows to give the kernel-level lower bound. The default 1555 GB/s matches an A100 40 GB (the 80 GB SXM variant is rated at 2039 GB/s); pass 3350 for an H100, 5300 for an MI300X, or your own measured peak as needed.
  4. SPRINGS.smooth everywhere. Bar-width changes animate with the canonical smooth spring; prefers-reduced-motion: reduce collapses every spring to an instant swap.
  5. Pure presentation. The component does no GPU spec lookups, no operation inference, no formula display — it plots what you feed it. Build dataset transforms (Llama-3-8B at fp16 → ops list) in your lesson, then hand the result here.
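
The shared normaliser and the bandwidth-bound time from points 2 and 3 can be sketched in a few lines. This is an illustration of the arithmetic described above, not the component's actual source; the function names are hypothetical.

```typescript
// Local mirror of the documented operation fields (defined here for illustration).
type HBMTrafficVizOperation = {
  id: string;
  label: string;
  readGb: number;
  writeGb: number;
};

// Largest single read or write across all rows: the one denominator every bar shares.
function sharedMax(ops: HBMTrafficVizOperation[]): number {
  return Math.max(0, ...ops.flatMap((op) => [op.readGb, op.writeGb]));
}

// Bar widths as fractions in [0, 1]; an all-zero dataset yields zero-width bars.
function barFractions(ops: HBMTrafficVizOperation[]) {
  const max = sharedMax(ops);
  return ops.map((op) => ({
    id: op.id,
    read: max > 0 ? op.readGb / max : 0,
    write: max > 0 ? op.writeGb / max : 0,
  }));
}

// Per-row lower bound in seconds: GB moved divided by GB per second.
function rowSeconds(op: HBMTrafficVizOperation, bandwidthGbPerSec: number): number {
  return (op.readGb + op.writeGb) / bandwidthGbPerSec;
}

// Kernel-level lower bound: the sum of the per-row times.
function totalSeconds(ops: HBMTrafficVizOperation[], bandwidthGbPerSec: number): number {
  return ops.reduce((sum, op) => sum + rowSeconds(op, bandwidthGbPerSec), 0);
}
```

With the four operations from the Usage example and a bandwidth of 1555, `totalSeconds` returns about 0.0111 s, the 11.1 ms shown in the footer, and the 0.04 GB write rows get a width fraction of 0.04 / 16 = 0.0025 rather than filling their column.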

Props

| Prop | Type | Default | Description |
| --- | --- | --- | --- |
| operations | HBMTrafficVizOperation[] | - | Rows to render, in display order. |
| bandwidthGbPerSec | number | 1555 | Peak HBM bandwidth in GB/s. The default matches an A100 40 GB. |
| showTime | boolean | true | Show per-row and total bandwidth-bound times. |
| transition | Transition | SPRINGS.smooth | Spring used for bar-width transitions. |
| className | string | - | Merged onto the root via cn(). |

HBMTrafficVizOperation

| Field | Type | Description |
| --- | --- | --- |
| id | string | Stable React key. |
| label | string | Short label rendered to the left of the bars. |
| readGb | number | Gigabytes read from HBM by this operation. |
| writeGb | number | Gigabytes written to HBM by this operation. |

Accessibility

  • The figure is role="figure" with a hidden summary listing operation count, total reads, total writes, and the bandwidth-bound time — screen readers hear the headline whenever props change.
  • The chart axis is labelled textually ("read" / "write" / "time") above the rows; colour is never the only signal.
  • Bar widths animate via motion/react; prefers-reduced-motion: reduce collapses every transition to an instant swap.
  • All numeric readouts use font-variant-numeric: tabular-nums so values do not reflow when they update.
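
The hidden summary in the first point can be pictured as a plain string built from the props. The sketch below is an assumption about how such a string might be assembled, not the component's code: the function names are hypothetical, and both the three-significant-digit rounding and the 1024 factor for sub-gigabyte volumes are inferred from the preview text on this page (0.08 GB shown as 81.9 MB).

```typescript
// Local mirror of the documented operation fields (defined here for illustration).
type HBMTrafficVizOperation = {
  id: string;
  label: string;
  readGb: number;
  writeGb: number;
};

// Three significant digits; the Number() round trip avoids the exponent
// form that toPrecision(3) produces for values of 1000 and above.
const sig3 = (x: number): string => Number(x.toPrecision(3)).toString();

// Assumed unit rule: GB at or above 1, otherwise MB with a factor of 1024.
const formatVolume = (gb: number): string =>
  gb >= 1 ? `${sig3(gb)} GB` : `${sig3(gb * 1024)} MB`;

// Assumed unit rule: ms at or above one millisecond, otherwise µs.
const formatTime = (seconds: number): string =>
  seconds >= 1e-3 ? `${sig3(seconds * 1e3)} ms` : `${sig3(seconds * 1e6)} µs`;

// Hypothetical builder for the screen-reader summary.
function trafficSummary(ops: HBMTrafficVizOperation[], bandwidthGbPerSec: number): string {
  const read = ops.reduce((sum, op) => sum + op.readGb, 0);
  const write = ops.reduce((sum, op) => sum + op.writeGb, 0);
  const seconds = (read + write) / bandwidthGbPerSec;
  return (
    `HBM traffic. ${ops.length} operations. ` +
    `Total read ${formatVolume(read)}, total write ${formatVolume(write)}, ` +
    `bandwidth-bound time ${formatTime(seconds)} at ${bandwidthGbPerSec} GB per second.`
  );
}
```

Spelling out "GB per second" instead of "GB/s" keeps the sentence readable when a screen reader speaks it aloud.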

Credits

  • Extracted from: craftingattention (app/src/lessons/primitives/nn/HBMTrafficViz.tsx). Stripped the four-phase narration state machine (observe / standard-done / flash-done / insight), the hard-coded Standard-vs-Flash two-column SVG comparison with animated shuttle rects, the SEQ_LENGTHS = [512, 1024, 2048, 4096, 8192] slider, the Play / Replay button driving imperative motion/animate timelines, and the head_dim = 64, fp16 hard-coded formulae. The library extract is the pure plotting primitive — an operations array in, a normalised read / write bar chart with optional bandwidth time out — driven entirely by props.