Checkpoint Timeline Viz

An interactive visualisation of the checkpoint-interval tradeoff in long training runs. Three sliders (interval, size, write speed) feed a derived I/O-overhead readout; clicking Simulate failure runs the timeline forward, pausing at each checkpoint, then crashes at a random step and reveals how much work was lost between the last save and the failure.

Tighter intervals cap the maximum steps_lost at the cost of more I/O overhead. Larger checkpoints lengthen each write and push overhead up faster. Faster write speed (NVMe vs SATA) shortens each pause; the slider lets the visitor feel the difference.
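
The overhead readout can be approximated as write time divided by training time per interval. This is a minimal sketch, not the component's source: the formula is inferred from the demo's default readout (15 GB at 3 GB/s every 500 steps of 0.01 s gives 100.0%), and `CheckpointSettings` and `ioOverheadPercent` are hypothetical names.

```typescript
// Hypothetical helper: the formula is inferred from the demo readout,
// not copied from the component.
interface CheckpointSettings {
  intervalSteps: number;   // steps between checkpoints
  sizeGB: number;          // checkpoint size in GB
  writeSpeedGBps: number;  // sustained write speed in GB/s
  stepTimeSeconds: number; // wall-clock seconds per training step
}

function ioOverheadPercent(s: CheckpointSettings): number {
  const writeSeconds = s.sizeGB / s.writeSpeedGBps;           // one checkpoint write
  const computeSeconds = s.intervalSteps * s.stepTimeSeconds; // training between two writes
  return (writeSeconds / computeSeconds) * 100;
}

const defaultSettings: CheckpointSettings = {
  intervalSteps: 500,
  sizeGB: 15,
  writeSpeedGBps: 3,
  stepTimeSeconds: 0.01,
};

console.log(ioOverheadPercent(defaultSettings).toFixed(1)); // prints "100.0"
```

Under this model, halving the interval doubles the overhead, while doubling the write speed halves it.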

Checkpoints every 500 steps balance I/O cost (100.0% overhead) against recovery risk. Click "Simulate failure" to see where a crash would land.
Checkpoint every: 500 steps
Checkpoint size: 15 GB
Write speed: 3 GB/s

Installation

npx shadcn@latest add https://craftbits.dev/r/checkpoint-timeline-viz.json

Usage

import { CheckpointTimelineViz } from "@craft-bits/viz/checkpoint-timeline-viz";
 
<CheckpointTimelineViz />

Start with a wider interval so that a simulated failure can lose more work:

<CheckpointTimelineViz defaultInterval={1500} />

Subscribe to the failure event for an external chart:

<CheckpointTimelineViz
  onFailure={({ stepsLost, failureStep }) => {
    /* lift stepsLost into a chart */
  }}
/>
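
Before charting, the payload can be converted into a wall-clock figure. This is a minimal sketch under two assumptions: the payload carries at least the two fields destructured above, and lost time is `stepsLost` multiplied by `stepTimeSeconds`. `FailureInfo`, `ChartPoint`, and `toChartPoint` are hypothetical names, not exports of the package.

```typescript
// Assumed payload shape: only the two fields destructured in the usage example.
interface FailureInfo {
  stepsLost: number;
  failureStep: number;
}

interface ChartPoint {
  x: number; // step at which the simulated crash landed
  y: number; // wall-clock seconds of training that must be redone
}

// stepTimeSeconds defaults to 0.01, matching the prop default.
function toChartPoint(info: FailureInfo, stepTimeSeconds: number = 0.01): ChartPoint {
  return { x: info.failureStep, y: info.stepsLost * stepTimeSeconds };
}
```

An `onFailure` handler could then push `toChartPoint(info)` into whatever state backs the external chart.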

Understanding the component

  1. The timeline. A horizontal track spans 0 to totalSteps (default 10,000). Save icons mark every checkpoint, staggered in on entry so the eye reads the cadence at a glance.
  2. Sliders. Three native input[type=range] controls — interval, checkpoint size, and write speed. Each change clears any pending failure and resets the timeline.
  3. Simulate failure. Picks a random step inside [200, totalSteps − 200], then springs the progress bar forward at a fixed visual speed. At each checkpoint the bar pauses for a short window to show the disk write, recolouring to the warning palette.
  4. Crash. The lightning marker drops onto the failure step. A red band fills the gap between the last checkpoint and the crash; a recovery arrow points back to "resume here". The stats grid fades in below.
  5. I/O overhead colour. Green when ≤ 5%, amber when ≤ 30%, red above 30%. These are the kind of thresholds an engineer would use to flag "checkpointing is eating my throughput".
  6. Reduced motion. Under prefers-reduced-motion: reduce, the progress run collapses to a single snap and every entrance, shake, and pulse animation is disabled.
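
Steps 3 to 5 can be sketched as plain functions. This is an illustration under stated assumptions rather than the component's code: checkpoints are assumed to land on exact multiples of the interval starting at step 0, the bounds of the failure range are treated as inclusive, and all names (`simulateFailureStep`, `overheadTone`) are hypothetical. The real component animates the run, where this sketch computes the outcome directly.

```typescript
interface SimulatedFailure {
  failureStep: number;    // step at which the crash lands
  lastCheckpoint: number; // most recent checkpoint at or before the crash
  stepsLost: number;      // work to redo after resuming
}

// rng is injectable so the sketch can be tested deterministically.
function simulateFailureStep(
  intervalSteps: number,
  totalSteps: number = 10000,
  rng: () => number = Math.random,
): SimulatedFailure {
  const margin = 200; // keeps the crash away from both ends of the track
  const lo = margin;
  const hi = totalSteps - margin;
  const failureStep = lo + Math.floor(rng() * (hi - lo + 1)); // integer in [lo, hi]
  // Assumption: checkpoints sit at 0, interval, 2 * interval, ...
  const lastCheckpoint = Math.floor(failureStep / intervalSteps) * intervalSteps;
  return { failureStep, lastCheckpoint, stepsLost: failureStep - lastCheckpoint };
}

type OverheadTone = "green" | "amber" | "red";

// Thresholds taken from step 5 of the list above.
function overheadTone(percent: number): OverheadTone {
  if (percent <= 5) return "green";
  if (percent <= 30) return "amber";
  return "red";
}
```

Under these assumptions `stepsLost` is always below the interval, which is the cap that tighter intervals buy.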

Props

| Prop | Type | Default | Description |
| --- | --- | --- | --- |
| defaultInterval | number | 500 | Initial checkpoint interval in steps. Clamped to [50, 2000]. |
| defaultSizeGB | number | 15 | Initial checkpoint size in GB. Clamped to [1, 100]. |
| defaultWriteSpeed | number | 3 | Initial write speed in GB/s. Clamped to [0.5, 5]. |
| totalSteps | number | 10000 | Total training steps modelled by the timeline. |
| stepTimeSeconds | number | 0.01 | Wall-clock seconds per training step. Drives the "time lost" stat. |
| transition | Transition | SPRINGS.snap | Override marker entrance / progress animation transition. |
| onFailure | (info) => void | - | Fires once after the simulated failure resolves. |
| className | string | - | Merged onto the root via cn(). |
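
The clamping described for the three initial values amounts to a min/max guard. A minimal sketch, assuming out-of-range values are silently pulled back to the nearest bound; `clampTo` and `clampInitialSettings` are hypothetical helpers, with ranges and defaults copied from the props above.

```typescript
function clampTo(value: number, min: number, max: number): number {
  return Math.min(max, Math.max(min, value));
}

interface InitialSettings {
  interval: number;   // steps
  sizeGB: number;     // GB
  writeSpeed: number; // GB/s
}

// Missing values fall back to the documented defaults before clamping.
function clampInitialSettings(p: Partial<InitialSettings>): InitialSettings {
  return {
    interval: clampTo(p.interval ?? 500, 50, 2000),
    sizeGB: clampTo(p.sizeGB ?? 15, 1, 100),
    writeSpeed: clampTo(p.writeSpeed ?? 3, 0.5, 5),
  };
}
```

Under this reading, passing `defaultInterval={5000}` would start the slider at 2000.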

Accessibility

  • The timeline is role="img" with an aria-label summarising its purpose; markers and tick labels are aria-hidden so the live region drives the story for screen readers.
  • A polite live region announces the current narration — overhead percentage, suggested next move, recovery cost after a crash — without spamming on every slider tick.
  • Sliders are native input[type=range] so they inherit the platform-default keyboard model (arrow keys, Home/End) and receive visible focus rings.
  • The Simulate failure button disables while the simulation is running so repeat-clicks can't queue a second run.
  • Colour is never the only signal — the narration prose and the stats grid both encode the recovery cost as text.
  • Motion respects prefers-reduced-motion: reduce. The progress run collapses to a single snap and every entrance animation is disabled.
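
A consumer that wants to mirror the reduced-motion behaviour in surrounding UI can query the same media feature. This is a minimal sketch using the standard `matchMedia` API; `prefersReducedMotion` and `MediaMatcher` are hypothetical names, and how the component itself detects the preference is not stated in this document.

```typescript
// Minimal structural type so the helper can be exercised without a browser.
type MediaMatcher = (query: string) => { matches: boolean };

function prefersReducedMotion(matcher?: MediaMatcher): boolean {
  const query = "(prefers-reduced-motion: reduce)";
  if (matcher) return matcher(query).matches;
  // Cast avoids depending on DOM typings when compiled for a server environment.
  const g = globalThis as any;
  // Without matchMedia (for example during server rendering), assume full motion.
  return typeof g.matchMedia === "function" ? Boolean(g.matchMedia(query).matches) : false;
}
```

Injecting the matcher keeps the helper testable; in the browser it can be called with no arguments.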

Credits

  • Extracted from: craftingattention (app/src/lessons/primitives/systems/CheckpointTimelineViz.tsx). The source was a lesson component that bundled a Widget chrome, a ModeStrip flip between Explore / Predict / Challenge, and the ChallengeBtn / FeedbackBadge / ScoreDots quiz UI. The viz extract keeps only the interactive timeline + simulate-failure path — the predict and challenge multiple-choice rounds were curriculum-specific and live in the lesson source. Per-track palette tokens are remapped to var(--cb-*) semantic tokens so consumer themes repaint freely, and inline SPRINGS.snappy / SPRINGS.gentle are re-keyed to the canonical SPRINGS.snap / SPRINGS.smooth from @craft-bits/core/motion.