Power Analysis Viz
A statistical-power explainer for eval design. Three sliders — sample
size, observed accuracy, and minimum detectable effect (MDE) — drive
three derived readouts: a 95 % confidence-interval bar (with optional
competing-prompt overlay), a power gauge bucketed into fail / warn /
success zones, and a "required n for 80 % power" status pill. Three
modes — scripted explore stages that unlock the sliders at the end,
predict multiple-choice rounds with variant-aware setups (CI, power,
scaling, significance, paired), and deeper challenge traps (multiple
testing, max-of-k bias, McNemar pairing) — walk the visitor from
"how big is my CI?" to "why does my best-of-five score lie?".
Your model scores 88% on 200 eval examples. The true accuracy is somewhere in this shaded range.
Installation
```bash
npx shadcn@latest add https://craftbits.dev/r/power-analysis-viz.json
```
Usage
```tsx
import { PowerAnalysisViz } from "@craft-bits/viz/power-analysis-viz";

<PowerAnalysisViz />
```
Pick the starting mode:
```tsx
<PowerAnalysisViz defaultMode="challenge" />
```
Override the competing-prompt accuracy or the narration generator used once the sliders unlock:
```tsx
<PowerAnalysisViz
  competingAccuracy={0.79}
  getNarration={(n, mde, power) =>
    `n=${n}, power=${(power * 100).toFixed(0)}% for ${(mde * 100).toFixed(0)}pp`
  }
/>
```
React to learner progress:
```tsx
<PowerAnalysisViz
  onModeComplete={(score) =>
    console.log(`${score.mode}: ${score.correct}/${score.total}`)
  }
/>
```
Understanding the component
- Three sliders, three readouts. Sample size, observed accuracy, and MDE drive a 95 % CI bar, a power gauge, and a required-n indicator. Each readout is derived purely from the slider state, with no hidden coupling.
- Wilson-style CI. The half-width is `1.96 × sqrt(p(1-p)/n)`, clamped to `[0, 1]`. (Strictly, this is the normal-approximation, or Wald, form rather than a Wilson score interval.) The competing-prompt overlay applies the same formula to a different `p`, so visitors see directly when intervals overlap.
- Two-proportion power. `normalCdf(z_obs - z_alpha)` with `z_alpha = 1.96` and a pooled standard error. The gauge's fail / warn / success zones split at 50 % and 80 %.
- Required n. A linear search finds the smallest `n` (in steps of 10, up to 50 000) where `computePower(p, mde, n) >= 0.8`. The pill flips green the instant the current `n` covers the requirement.
- Explore mode. Five scripted stages walk through canonical "the eval is decorative" failure modes: overlapping CIs at standard n, ±9 % CIs at n=50, 12 % power for a 5pp drop, and finally 80 % power at n=700. Stage 6 unlocks the sliders; a one-shot pulse hints at the unlock, and the power gauge flashes when the visitor first crosses 80 %.
- Predict mode. Five MCQ rounds, each with a variant-specific setup: `ci`, `power`, `scaling` (n grows on reveal), `significance` (n=100k p-value), `paired` (2×2 discordant-pairs table). The dashboard reveals the answer in context.
- Challenge mode. Four MCQ rounds with per-option explanations: multiple testing (1 − 0.95²⁰ ≈ 0.64), under-powered 3pp detection at n=100, max-of-k selection bias (σ · sqrt(2 ln k)), and why the two-proportion z-test is wrong on paired data.
- Reduced motion. Under `prefers-reduced-motion: reduce`, every spring collapses to instant, the unlock pulse and threshold flash are suppressed, and the option-shake / scale-correct animations don't run.
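
The readout formulas and the challenge-round figures in the list above can be sketched in TypeScript. This is a minimal sketch under stated assumptions, not the component's source. `halfWidth95`, `ci95`, `requiredN`, `familywiseErrorRate`, and `maxOfKBias` are hypothetical names. `computePower` assumes the MDE is a drop from `p`, that each arm has `n` examples, and that the pooled rate is the simple average of the two accuracies. The component's exact conventions are not documented here, so the gauge may show different numbers.

```typescript
// Sketch only: names follow the prose above, not the component's real exports.
const Z_ALPHA = 1.96; // two-sided 95 % critical value

/** Half-width of the 95 % normal-approximation interval. */
function halfWidth95(p: number, n: number): number {
  return Z_ALPHA * Math.sqrt((p * (1 - p)) / n);
}

/** 95 % interval around p, clamped to [0, 1]. */
function ci95(p: number, n: number): [number, number] {
  const h = halfWidth95(p, n);
  return [Math.max(0, p - h), Math.min(1, p + h)];
}

/** Standard normal CDF using the Abramowitz-Stegun 7.1.26 erf approximation. */
function normalCdf(z: number): number {
  const x = Math.abs(z) / Math.SQRT2;
  const t = 1 / (1 + 0.3275911 * x);
  const poly =
    t * (0.254829592 +
      t * (-0.284496736 +
        t * (1.421413741 + t * (-1.453152027 + t * 1.061405429))));
  const erf = 1 - poly * Math.exp(-x * x);
  return z >= 0 ? 0.5 * (1 + erf) : 0.5 * (1 - erf);
}

/** Approximate power of a two-proportion z-test (assumptions in the lead-in). */
function computePower(p: number, mde: number, n: number): number {
  const p2 = Math.min(1, Math.max(0, p - mde)); // assumed: MDE is a drop from p
  const pooled = (p + p2) / 2;
  const se = Math.sqrt((2 * pooled * (1 - pooled)) / n); // assumed: n per arm
  const zObs = se === 0 ? 0 : Math.abs(p - p2) / se;
  return normalCdf(zObs - Z_ALPHA);
}

/** Smallest n (steps of 10, up to 50 000) reaching the target power, else null. */
function requiredN(p: number, mde: number, target = 0.8): number | null {
  for (let n = 10; n <= 50_000; n += 10) {
    if (computePower(p, mde, n) >= target) return n;
  }
  return null;
}

/** Chance of at least one false positive across k independent tests. */
function familywiseErrorRate(alpha: number, k: number): number {
  return 1 - Math.pow(1 - alpha, k);
}

/** The sigma * sqrt(2 ln k) approximation for max-of-k selection bias. */
function maxOfKBias(sigma: number, k: number): number {
  return sigma * Math.sqrt(2 * Math.log(k));
}
```

With `p = 0.88` (the demo's 88 % score), `halfWidth95(0.88, 50)` is about 0.09, which matches the ±9 % explore stage, and `familywiseErrorRate(0.05, 20)` is about 0.64, the figure quoted for the multiple-testing round.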
Props
| Prop | Type | Default | Description |
|---|---|---|---|
| `exploreStages` | `PowerAnalysisVizExploreStage[]` | 6 stages | Scripted explore stages. Final stage is treated as "free" (sliders unlocked). |
| `presets` | `PowerAnalysisVizPreset[]` | quick / standard / rigorous | Sample-size preset chips in the free-exploration controls. |
| `predictRounds` | `PowerAnalysisVizPredictRound[]` | 5 rounds | Multiple-choice prediction rounds. |
| `challengeRounds` | `PowerAnalysisVizChallengeRound[]` | 4 rounds | Multiple-choice challenge rounds. |
| `competingAccuracy` | `number` | `0.83` | Competing-prompt accuracy used in explore stages that show the competing CI. |
| `getNarration` | `(n, mde, power) => string` | built-in copy | Narration generator used once the sliders unlock. |
| `defaultMode` | `"explore" \| "predict" \| "challenge"` | `"explore"` | Mode visible on first render. |
| `transition` | `Transition` | `SPRINGS.snap` | Spring used for value / colour transitions. |
| `onModeChange` | `(mode) => void` | — | Fires when the active mode changes. |
| `onModeComplete` | `(score) => void` | — | Fires when a predict or challenge run completes. |
| `className` | `string` | — | Merged onto the root via `cn()`. |
Accessibility
- Mode tabs are real `<button>` elements with `aria-pressed`, ≥ 32×32 hit area, and visible focus rings.
- Sliders are `<input type="range">` overlays on the styled track, with visible labels and an `aria-label` that mirrors the visible copy. The power gauge is a `role="progressbar"` with `aria-valuenow` / `-min` / `-max`.
- Tone-coded readouts always render their status word as text, so colour is never the only signal.
- A polite live region (`aria-live="polite"`) narrates the active stage, question, or explanation so screen-reader users stay in sync.
- Answer options are buttons with `aria-pressed`, sequential focus order, and tone-coded feedback after the round is checked.
- Motion respects `prefers-reduced-motion: reduce`: every spring collapses to instant, and the unlock pulse and threshold flash are suppressed.
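
The reduced-motion behaviour described above could be wired up roughly as follows. This is a hedged sketch, not the component's code: `SpringTransition`, `InstantTransition`, `prefersReducedMotion`, and `resolveTransition` are hypothetical names, and the spring shape is an assumption because the real `Transition` type is not shown in this document. Only `matchMedia` and the `prefers-reduced-motion: reduce` query are standard browser features.

```typescript
// Hypothetical shapes: the component's real Transition type is not shown here.
type SpringTransition = { type: "spring"; stiffness: number; damping: number };
type InstantTransition = { duration: 0 };

/** True when the user has asked for reduced motion; false outside a browser. */
function prefersReducedMotion(): boolean {
  const g = globalThis as unknown as {
    matchMedia?: (query: string) => { matches: boolean };
  };
  return (
    typeof g.matchMedia === "function" &&
    g.matchMedia("(prefers-reduced-motion: reduce)").matches
  );
}

/** Collapse a spring to an instant transition under reduced motion. */
function resolveTransition(
  spring: SpringTransition,
  reduced: boolean = prefersReducedMotion(),
): SpringTransition | InstantTransition {
  return reduced ? { duration: 0 } : spring;
}

/** Decorative effects (unlock pulse, threshold flash) only run with motion allowed. */
function shouldPlayDecorative(reduced: boolean = prefersReducedMotion()): boolean {
  return !reduced;
}
```

Passing `reduced` explicitly keeps the helpers testable and safe during server rendering, where `matchMedia` is unavailable and the sketch falls back to full motion.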
Credits
- Extracted from: `craftingattention` (`app/src/lessons/primitives/systems/PowerAnalysisViz.tsx`). The source was a lesson component wrapped in the project's `Widget` chrome with hard dependencies on `ModeStrip`, `ChallengeBtn`, `FeedbackBadge`, `ScoreDots`, and `EXPLORE_NARRATIONS` from `ConstructionPrimitives`, plus per-track palette tokens (`--color-fail-400`, `--color-warn-400`, `--color-success-400`, `--color-accent-400`, `--color-ink-100`, …) and `MICRO.tap` / `SPRINGS.snappy` / `TIMING.correct.scaleBounce` / `STAGGER.tight` from `@/lib/motion`. The viz extract replaces every palette reference with `var(--cb-*)` semantic tokens, re-keys motion to canonical `SPRINGS.snap` / `SPRINGS.smooth` / `SPRINGS.bouncy` and scalar `STAGGER` from `@craft-bits/core/motion`, and drops the Widget chrome in favour of a token-styled root the user can compose. It inlines every chrome primitive (`ModeButton`, `PrimaryButton`, `SecondaryButton`, `PresetChip`, `FeedbackBadge`, `ScoreDots`, `OptionList`, `NarrationBlock`, `SliderControl`, `CIBar`, `PowerGauge`, `RequiredNIndicator`, `PredictSetup`, `DoneSummary`) so there is no lesson-layer dependency. It also surfaces the explore stages / presets / predict rounds / challenge rounds / competing accuracy / narration generator as props (with sensible defaults), and adds `onModeChange` / `onModeComplete` callbacks for lesson hosts to react to visitor progress.