Eval Pipeline Viz
An interactive visualisation of the CI/CD eval gate pattern: every prompt or model change triggers automated eval suites, and deployment is blocked when a confidence-interval overlap test shows that a score has regressed. A coloured packet animates through five stages (Commit → Run Evals → Score + CI → Gate → Deploy/Block), and each stage opens a detail panel with its concrete artifact: a git-style diff, live progress bars, scored metrics with CIs, the overlap test itself, then a pass/block summary.
The teaching insight: confidence intervals, not point estimates, decide whether a change is a real regression. If the new and baseline CIs overlap, the gate treats the difference as noise rather than a statistically meaningful change. Only a candidate CI that sits entirely below the baseline CI trips the gate. The Introduce regression toggle pushes the candidate's Helpfulness score down far enough that its CIs separate and the gate blocks deployment.
This is the eval gate pattern. Every prompt change flows through automated evaluations before reaching production. Step through to see how confidence intervals decide whether a change ships or gets blocked.
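To make the point-estimate versus interval distinction concrete, here is a minimal TypeScript sketch. The `MetricScore` shape, the `isRegression` helper, and every number are hypothetical illustrations rather than the component's internals or real eval results; only the rule itself (candidate upper bound below baseline lower bound) comes from the description on this page.

```typescript
// Hypothetical types for illustration; not the component's actual exports.
type CI = [number, number]; // [lower bound, upper bound]

interface MetricScore {
  mean: number; // point estimate
  ci: CI;       // confidence interval around the point estimate
}

// A change counts as a regression only when the candidate interval
// sits entirely below the baseline interval.
function isRegression(baseline: MetricScore, candidate: MetricScore): boolean {
  return candidate.ci[1] < baseline.ci[0];
}

// Illustrative numbers only.
const baseline: MetricScore = { mean: 0.86, ci: [0.82, 0.9] };
const smallDip: MetricScore = { mean: 0.84, ci: [0.8, 0.88] }; // lower mean, intervals still overlap
const realDrop: MetricScore = { mean: 0.74, ci: [0.7, 0.78] }; // interval entirely below baseline

const smallDipBlocks = isRegression(baseline, smallDip); // 0.88 is not below 0.82
const realDropBlocks = isRegression(baseline, realDrop); // 0.78 is below 0.82
```

Both candidates have a lower mean than the baseline, but only the second one would block a deploy under this rule.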
Installation
npx shadcn@latest add https://craftbits.dev/r/eval-pipeline-viz.json
Usage
import { EvalPipelineViz } from "@craft-bits/viz/eval-pipeline-viz";

<EvalPipelineViz />

Boot directly into the regression scenario:

<EvalPipelineViz defaultRegression />

Subscribe to the final verdict:

<EvalPipelineViz
  onComplete={({ pass, hasRegression, outcomes }) => {
    /* lift the gate verdict into your own dashboard */
  }}
/>
Understanding the component
- The pipeline header. Five connected stage nodes with a static connector line behind them; an animated progress line and a glowing packet travel left-to-right as you advance. The active stage scales gently, completed stages show a check mark.
- Detail panels. Each non-idle phase renders a panel with the artifact for that stage. Stage 2 (`Run Evals`) runs three simulated progress bars in parallel (Accuracy with 200 cases, Helpfulness with 100, Safety with 50) and auto-advances to scoring when every bar fills.
- The overlap test. Stage 4 draws two horizontal CI bars per metric (grey baseline on top, violet candidate on bottom) over a shared axis. Where they intersect, a soft yellow rectangle labels the `overlap`; where they don't, a dashed red `gap` line spans the void. The verdict banner underneath summarises the gate outcome.
- The gate decision. A metric is a regression when its new CI sits entirely below the baseline CI (`newCI[1] < baseCI[0]`). Any single regression blocks the deploy.
- Reduced motion. Under `prefers-reduced-motion: reduce`, every panel transition, packet motion, and SVG entrance collapses to a snap, and the eval-progress simulation runs about three times faster.
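The overlap test and the gate decision described in the list above can be sketched in a few lines of TypeScript. This is an illustration under stated assumptions, not the component's source: `compareIntervals`, `runGate`, the `MetricInput` and `Segment` types, the shape of each entry in `outcomes`, and all interval values are hypothetical. Only the `newCI[1] < baseCI[0]` rule, the "any single regression blocks" rule, and the `{ pass, hasRegression, outcomes }` summary keys are taken from this page.

```typescript
type CI = [number, number]; // [lower bound, upper bound]

// Hypothetical input shape for one metric.
interface MetricInput {
  name: string;
  baseCI: CI;
  newCI: CI;
}

// Either the span both intervals share, or the empty span between them.
type Segment =
  | { kind: "overlap"; from: number; to: number }
  | { kind: "gap"; from: number; to: number };

function compareIntervals(baseCI: CI, newCI: CI): Segment {
  const from = Math.max(baseCI[0], newCI[0]); // larger of the two lower bounds
  const to = Math.min(baseCI[1], newCI[1]);   // smaller of the two upper bounds
  return from <= to
    ? { kind: "overlap", from, to }
    : { kind: "gap", from: to, to: from }; // bounds crossed, so swap them
}

function runGate(metrics: MetricInput[]) {
  const outcomes = metrics.map((m) => ({
    name: m.name,
    segment: compareIntervals(m.baseCI, m.newCI),
    // Regression: the new CI sits entirely below the baseline CI.
    regression: m.newCI[1] < m.baseCI[0],
  }));
  const hasRegression = outcomes.some((o) => o.regression);
  // Any single regression blocks the deploy.
  return { pass: !hasRegression, hasRegression, outcomes };
}

// Illustrative intervals only; not values produced by the component.
const verdict = runGate([
  { name: "Accuracy", baseCI: [0.88, 0.94], newCI: [0.87, 0.93] },
  { name: "Helpfulness", baseCI: [0.8, 0.88], newCI: [0.66, 0.76] },
  { name: "Safety", baseCI: [0.93, 0.99], newCI: [0.94, 0.99] },
]);
```

Note that a gap alone is not enough to block: a candidate interval entirely above the baseline also produces a `gap` segment, but it is an improvement, so the regression check looks at direction as well as separation.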
Props
| Prop | Type | Default | Description |
|---|---|---|---|
| `defaultRegression` | `boolean` | `false` | Whether the candidate starts in the regression scenario. |
| `showRegressionToggle` | `boolean` | `true` | Show the Introduce regression button and bind the `R` shortcut. |
| `transition` | `Transition` | `SPRINGS.default` | Override the spring used for the packet, progress line, and panel transitions. |
| `onComplete` | `(summary) => void` | - | Fires when the pipeline reaches `result` with `{ pass, hasRegression, outcomes }`. |
| `className` | `string` | - | Merged onto the root via `cn()`. |
Accessibility
- Root is `role="figure"` with an `aria-label` summarising the visualisation so screen-reader users get the headline.
- A polite live region announces each stage transition by its label.
- Each interactive control (`Start Pipeline`, `Introduce regression`, `Reset`) has a visible focus ring and an `aria-label`, and the regression toggle exposes its state via `aria-pressed`.
- Keyboard shortcuts: `Space` / `ArrowRight` advance through the stages; `R` toggles the regression scenario when the toggle is shown.
- Each CI overlap SVG carries an `aria-label` with its metric, both CIs, and the textual outcome, so colour is never the only signal.
- Motion respects `prefers-reduced-motion: reduce`: every entrance, scale pulse, packet motion, and panel transition collapses to a snap.
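The keyboard shortcuts listed above amount to a small key-to-action mapping. The sketch below is one hypothetical way to express it; `actionForKey` and `PipelineAction` are invented names, and matching `R` case-insensitively is an assumption. The key strings `" "` and `"ArrowRight"` are the standard `KeyboardEvent.key` values for the space bar and the right arrow.

```typescript
// Hypothetical action names; the component's real handler may differ.
type PipelineAction = "advance" | "toggle-regression" | null;

function actionForKey(key: string, showRegressionToggle: boolean): PipelineAction {
  // Space and ArrowRight both step to the next stage.
  if (key === " " || key === "ArrowRight") return "advance";
  // R is only bound while the regression toggle is shown.
  if (showRegressionToggle && key.toLowerCase() === "r") return "toggle-regression";
  return null; // every other key is ignored
}
```

In a real handler this would be called with `event.key` from a `keydown` listener. Keeping the mapping as a pure function makes the `showRegressionToggle` dependency easy to test.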
Credits
- Extracted from: `craftingattention` (`app/src/lessons/primitives/systems/EvalPipelineViz.tsx`). The source was a lesson component that bundled `SvgLabel` chrome, a `ChallengeBtn` predict-the-outcome quiz, and the lesson's `lessonId` narration framing. The viz extract keeps only the interactive 5-stage pipeline, the eval progress simulation, and the CI overlap test; the quiz round and lesson plumbing are curriculum-specific and live in the lesson source. Per-track palette tokens (`--color-ink-*`, `--color-success-*`, `--color-fail-*`, `--color-accent-*`, `--color-surface-raised`) are remapped to `var(--cb-fg-*)` / `var(--cb-success)` / `var(--cb-error)` / `var(--cb-accent)` / `var(--cb-bg-elevated)` so consumer themes repaint freely. Inline `SPRINGS.gentle` / `SPRINGS.snappy` and `STAGGER.normal` are re-keyed to canonical `SPRINGS.default` / `SPRINGS.snap` / `STAGGER` from `@craft-bits/core/motion`.