Bootstrap CI Viz
An interactive visualisation of bootstrap confidence intervals for LLM-eval scores. The viewer presses Resample to step one bootstrap draw at a time (watching the histogram populate one bar at a time) or Run 1,000 to sprint through a bulk run. A sample-size toggle (N=50 vs N=200) demonstrates how more data narrows the CI. Once both sample sizes have been bootstrapped, a comparison panel surfaces whether each CI excludes a configurable baseline.
A single point estimate hides uncertainty; the bootstrap distribution makes the uncertainty visible — and shows when it's small enough to act on.
You evaluated your LLM on 50 examples and got 92% accuracy. Looks good — but is this enough to deploy? A single point estimate hides the uncertainty. Press Resample to see how much the score can shift.
Installation
npx shadcn@latest add https://craftbits.dev/r/bootstrap-ci-viz.json
Usage
import { BootstrapCIViz } from "@craft-bits/viz/bootstrap-ci-viz";
<BootstrapCIViz />
Start with the larger sample size if you want viewers to skip straight to the tightening:
<BootstrapCIViz defaultSampleSize={200} />
Lift the bulk result into your own chart:
<BootstrapCIViz
  onBulkComplete={({ sampleSize, ci }) => {
    /* feed ci into a downstream chart */
  }}
/>
Understanding the component
- The dataset. A grid of dots: green for `pass`, red for `fail`. The point estimate is `passes / total`, surfaced in monospace tabular numerals.
- Single resample. Pressing `Resample` (or Space) picks `N` indices with replacement, highlights the picked dots, badges the multi-picks with a `2`/`3` count, and appends the resampled accuracy to the histogram.
- Bulk run. Pressing `Run 1,000` (or Enter) fires bootstrap batches every `bulkTickMs` until `bulkTarget` resamples have accumulated. The CI band only paints once the histogram has 100+ samples.
- Sample-size toggle. Flipping between `N=50` and `N=200` resets the resample history but keeps any completed bulk result. The component remembers both so the comparison panel can land once both have run.
- Comparison panel. Renders side-by-side CI bars and a verdict against the configurable `baseline`: a CI that overlaps the baseline prints in `--cb-error`, one that excludes it in `--cb-success`.
- Reduced motion. Under `prefers-reduced-motion: reduce`, every entrance animation is disabled and the bulk run collapses to a single synchronous computation.
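The resample, CI, and verdict steps described above can be sketched in a few lines of TypeScript. This is a minimal illustration, not the component's source: the function names (`resampleAccuracy`, `percentileCI`, `excludesBaseline`), the seedable `mulberry32` generator, the percentile method, and the 95% level are all assumptions chosen for the example.

```typescript
// Hypothetical helpers for illustration; none of these names come from the component.

/** Small seedable PRNG (mulberry32) so a run can be reproduced. */
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296; // uniform in [0, 1)
  };
}

/** One bootstrap draw: pick N indices with replacement, return the resampled accuracy. */
function resampleAccuracy(outcomes: boolean[], rand: () => number): number {
  const n = outcomes.length;
  let passes = 0;
  for (let i = 0; i < n; i++) {
    const idx = Math.floor(rand() * n); // the same index may be picked more than once
    if (outcomes[idx]) passes++;
  }
  return passes / n;
}

/** Percentile CI over the accumulated bootstrap accuracies (assumed method). */
function percentileCI(samples: number[], level = 0.95): { lo: number; hi: number } {
  const sorted = [...samples].sort((a, b) => a - b);
  const alpha = (1 - level) / 2;
  const at = (q: number): number => {
    const i = Math.floor(q * sorted.length);
    return sorted[Math.min(sorted.length - 1, Math.max(0, i))];
  };
  return { lo: at(alpha), hi: at(1 - alpha) };
}

/** Verdict: true when the baseline falls outside the interval. */
function excludesBaseline(ci: { lo: number; hi: number }, baseline: number): boolean {
  return baseline < ci.lo || baseline > ci.hi;
}

// 46 passes out of 50 gives the 92% point estimate from the intro.
const outcomes: boolean[] = Array.from({ length: 50 }, (_, i) => i < 46);
const rand = mulberry32(42);
const samples: number[] = Array.from({ length: 1000 }, () => resampleAccuracy(outcomes, rand));
const ci = percentileCI(samples);
console.log(ci, excludesBaseline(ci, 0.89));
```

Swapping the dataset for 200 outcomes at the same pass rate should give a visibly narrower interval, which is the effect the sample-size toggle demonstrates.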
Props
| Prop | Type | Default | Description |
|---|---|---|---|
| `defaultSampleSize` | `50 \| 200` | `50` | Sample size shown on mount. The toggle still lets the viewer flip. |
| `bulkTarget` | `number` | `1000` | Total resamples to accumulate in a bulk run. |
| `bulkBatchSize` | `number` | `50` | Resamples added per animation tick during a bulk run. |
| `bulkTickMs` | `number` | `40` | Tick interval (ms) between bulk batches. Reduced-motion users skip this. |
| `baseline` | `number` | `0.89` | Baseline the comparison panel measures each CI against. |
| `transition` | `Transition` | `SPRINGS.snap` | Override histogram bar / dot / band entrance transition. |
| `onBulkComplete` | `(result) => void` | - | Fires after each bulk run completes. |
| `className` | `string` | - | Merged onto the root via `cn()`. |
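To get a feel for how the three bulk props interact, the arithmetic can be written out. The helper below is hypothetical, not part of the component's API, and it assumes exactly one batch per tick with no extra delay, so treat the duration as a rough estimate.

```typescript
// Hypothetical helper: estimates how many ticks a bulk run takes and how long it animates.
function bulkRunPlan(
  bulkTarget = 1000,
  bulkBatchSize = 50,
  bulkTickMs = 40,
): { ticks: number; approxDurationMs: number } {
  // A final partial batch still needs its own tick, hence the ceiling.
  const ticks = Math.ceil(bulkTarget / bulkBatchSize);
  return { ticks, approxDurationMs: ticks * bulkTickMs };
}

console.log(bulkRunPlan()); // defaults: 1000 / 50 = 20 ticks, 20 * 40 ms = 800 ms
```

Under these assumptions, raising `bulkBatchSize` shortens the run and lowering `bulkTickMs` speeds up each step, while `bulkTarget` sets how smooth the final histogram looks.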
Accessibility
- The root is `role="figure"` with a descriptive `aria-label`; the histogram and dot grid carry their own `role="img"` labels summarising the dataset and CI.
- A polite live region announces the current resample count and CI without spamming on every histogram bar.
- Keyboard model: `Space` resamples once, `Enter` runs the bulk, `R` resets. The component itself is focusable so the shortcuts work without targeting the buttons.
- The sample-size toggle is a `role="radiogroup"` of `role="radio"` buttons with `aria-checked` reflecting the active size; the bulk button disables while running so repeat-clicks can't queue.
- Colour is never the only signal: the narration, the legend, and the comparison verdict all encode pass / fail / CI overlap as words.
- Motion respects `prefers-reduced-motion: reduce`: every entrance animation is disabled and the bulk run collapses to a single synchronous computation.
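The keyboard model above amounts to a small key-to-action mapping. The sketch below shows one way such a mapping could look; `ShortcutAction` and `actionForKey` are hypothetical names, and the component's real handler may differ (for example in how it treats modifier keys or a running bulk run).

```typescript
// Hypothetical mapping from KeyboardEvent.key values to the three documented shortcuts.
type ShortcutAction = "resample" | "bulk" | "reset";

function actionForKey(key: string): ShortcutAction | null {
  switch (key) {
    case " ": // KeyboardEvent.key reports the space bar as a single space character
      return "resample";
    case "Enter":
      return "bulk";
    case "r":
    case "R":
      return "reset";
    default:
      return null; // leave every other key to the browser
  }
}

// Possible wiring on a focusable root (shown as a comment, since it needs a DOM):
//   onKeyDown={(e) => {
//     const action = actionForKey(e.key);
//     if (action) e.preventDefault(); // keeps Space from scrolling the page
//   }}
console.log(actionForKey(" "), actionForKey("Enter"), actionForKey("R"));
```

Keeping the mapping a pure function makes the shortcut behaviour easy to unit-test without rendering the component.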
Credits
- Extracted from: `craftingattention` (`app/src/lessons/primitives/systems/BootstrapCIViz.tsx`). The source was a lesson primitive for an LLM-eval lesson; the extract drops the curriculum chrome (lesson narration banner, `ca-narration` class, non-cb `--color-*` ink/surface tokens) and lifts the controls into a Radix-style controlled API. Inline `SPRINGS.snappy` / `SPRINGS.gentle` are re-keyed to the canonical `SPRINGS.snap` / `SPRINGS.smooth` from `@craft-bits/core/motion`; `STAGGER.tight` is replaced with the canonical scalar `STAGGER`. Histogram bars now animate via a `scaleY` transform (rather than animating `height` / `y`) so the entrance respects the transform-and-opacity-only rule. Per-track palette tokens are remapped to `var(--cb-*)` semantic tokens so consumer themes repaint freely.