Power Analysis Viz

A statistical-power explainer for eval design. Three sliders (sample size, observed accuracy, and minimum detectable effect, or MDE) drive three derived readouts: a 95 % confidence-interval bar with an optional competing-prompt overlay, a power gauge bucketed into fail / warn / success zones, and a "required n for 80 % power" status pill. Three modes walk the visitor from "how big is my CI?" to "why does my best-of-five score lie?": explore (scripted stages that unlock the sliders at the end), predict (multiple-choice rounds with variant-aware setups: CI, power, scaling, significance, paired), and challenge (deeper traps: multiple testing, max-of-k bias, McNemar pairing).

Installation

npx shadcn@latest add https://craftbits.dev/r/power-analysis-viz.json

Usage

import { PowerAnalysisViz } from "@craft-bits/viz/power-analysis-viz";
 
<PowerAnalysisViz />

Pick the starting mode:

<PowerAnalysisViz defaultMode="challenge" />

Override the competing-prompt accuracy or the narration generator used once the sliders unlock:

<PowerAnalysisViz
  competingAccuracy={0.79}
  getNarration={(n, mde, power) =>
    `n=${n}, power=${(power * 100).toFixed(0)}% for ${(mde * 100).toFixed(0)}pp`
  }
/>
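
A narration generator can also branch on the power value instead of only formatting it. The sketch below is illustrative: the name zoneNarration is hypothetical, and the 0.5 / 0.8 cut-offs are chosen to mirror the 50 % and 80 % gauge zones described under "Understanding the component", not taken from anything the component exports.

```typescript
// Hypothetical narration generator with the (n, mde, power) => string shape.
// The 0.5 and 0.8 thresholds are assumptions that mirror the gauge zones.
function zoneNarration(n: number, mde: number, power: number): string {
  const pct = (power * 100).toFixed(0); // power as a whole-number percentage
  const pp = (mde * 100).toFixed(0);    // MDE in percentage points
  if (power >= 0.8) {
    return `n=${n}: ${pct}% power, enough to detect a ${pp}pp change.`;
  }
  if (power >= 0.5) {
    return `n=${n}: ${pct}% power, borderline for a ${pp}pp change.`;
  }
  return `n=${n}: ${pct}% power, a ${pp}pp change will usually be missed.`;
}
```

It would be passed the same way as the inline arrow function above: getNarration={zoneNarration}.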

React to learner progress:

<PowerAnalysisViz
  onModeComplete={(score) =>
    console.log(`${score.mode}: ${score.correct}/${score.total}`)
  }
/>
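
A host page will often want more than a log line, for example a pass / fail decision per run. The following sketch assumes the score object carries exactly the three fields used above (mode, correct, total); the ModeScore type, the summarizeRun helper, and the 0.8 pass ratio are all hypothetical and not part of the component's API.

```typescript
// Assumed shape, inferred from the fields used in the onModeComplete example;
// the component's own exported type may differ.
type ModeScore = { mode: string; correct: number; total: number };

// Hypothetical helper: turn a completed run into a label plus a pass flag.
function summarizeRun(
  score: ModeScore,
  passRatio = 0.8, // illustrative threshold, not defined by the component
): { label: string; passed: boolean } {
  // Guard against an empty run so the ratio is never NaN.
  const ratio = score.total > 0 ? score.correct / score.total : 0;
  return {
    label: `${score.mode}: ${score.correct}/${score.total}`,
    passed: ratio >= passRatio,
  };
}
```

The helper could then be called from the callback, e.g. onModeComplete={(score) => console.log(summarizeRun(score))}.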

Understanding the component

  1. Three sliders, three readouts. Sample size, observed accuracy, and MDE drive a 95 % CI bar, a power gauge, and a required-n indicator. Each readout is derived purely from the slider state, with no hidden coupling.
  2. Wald (normal-approximation) CI. The half-width is 1.96 × sqrt(p(1-p)/n), and the resulting interval is clamped to [0, 1]. The competing-prompt overlay applies the same formula to a different p, so visitors see directly when intervals overlap.
  3. Two-proportion power. Power is normalCdf(z_obs - z_alpha), where z_obs is the effect divided by the pooled standard error and z_alpha = 1.96 (two-sided 5 % level). Gauge zones split at 50 % and 80 %.
  4. Required n. A linear search (in steps of 10, up to 50 000) finds the smallest n where computePower(p, mde, n) >= 0.8. The pill flips green the instant the current n covers the requirement.
  5. Explore mode. Five scripted stages walk through canonical "the eval is decorative" failure modes — overlapping CIs at standard n, ±9 % CIs at n=50, 12 % power for a 5pp drop, and finally 80 % power at n=700. Stage 6 unlocks the sliders; a one-shot pulse hints at the unlock, and the power gauge flashes when the visitor first crosses 80 %.
  6. Predict mode. Five MCQ rounds, each with a variant-specific setup: ci, power, scaling (n grows on reveal), significance (n=100k p-value), paired (2×2 discordant-pairs table). The dashboard reveals the answer in context.
  7. Challenge mode. Four MCQ rounds with per-option explanations — multiple testing (1 − 0.95²⁰ = 0.64), under-powered 3pp detection at n=100, max-of-k selection bias (σ · sqrt(2 ln k)), and why the two-proportion z-test is wrong on paired data.
  8. Reduced motion. Under prefers-reduced-motion: reduce, every spring collapses to instant, the unlock pulse and threshold flash are suppressed, and the option-shake / scale-correct animations don't run.
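
The three derived readouts in items 2 to 4 can be sketched in a few pure functions. This is a minimal reconstruction from the formulas quoted above, not the component's source. Only normalCdf and computePower are names that appear in this document; confidenceInterval and requiredN are hypothetical. The sketch also assumes a two-arm comparison with n examples per arm, a drop from p to p - mde, and an Abramowitz-Stegun approximation for the normal CDF, so its numbers may not match the component's readouts exactly.

```typescript
const Z_ALPHA = 1.96; // two-sided 5 % critical value

// Standard normal CDF via the Abramowitz-Stegun erf approximation (7.1.26).
function normalCdf(z: number): number {
  const x = Math.abs(z) / Math.SQRT2;
  const t = 1 / (1 + 0.3275911 * x);
  const poly =
    t * (0.254829592 + t * (-0.284496736 + t * (1.421413741 + t * (-1.453152027 + t * 1.061405429))));
  const erf = 1 - poly * Math.exp(-x * x);
  return z >= 0 ? 0.5 * (1 + erf) : 0.5 * (1 - erf);
}

// 95 % interval: p ± 1.96 * sqrt(p(1-p)/n), with bounds clamped to [0, 1].
function confidenceInterval(p: number, n: number) {
  const halfWidth = Z_ALPHA * Math.sqrt((p * (1 - p)) / n);
  return {
    lower: Math.max(0, p - halfWidth),
    upper: Math.min(1, p + halfWidth),
    halfWidth,
  };
}

// Power to detect a drop from p to p - mde, assuming n examples per arm
// and a pooled standard error (both assumptions of this sketch).
function computePower(p: number, mde: number, n: number): number {
  const p2 = Math.max(0, p - mde);
  const pooled = (p + p2) / 2;
  const se = Math.sqrt((2 * pooled * (1 - pooled)) / n);
  const zObs = se > 0 ? (p - p2) / se : 0; // degenerate case: no variance
  return normalCdf(zObs - Z_ALPHA);
}

// Smallest n (steps of 10, capped at 50 000) reaching the target power,
// or null when the cap is hit first.
function requiredN(p: number, mde: number, target = 0.8): number | null {
  for (let n = 10; n <= 50000; n += 10) {
    if (computePower(p, mde, n) >= target) return n;
  }
  return null;
}
```

For example, confidenceInterval(0.88, 200).halfWidth comes out at about 0.045, and at n=50 it roughly doubles to about 0.09, because the half-width shrinks only with the square root of n.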

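The arithmetic behind the challenge-mode traps is short enough to write out. The sketch below is an illustration under stated assumptions, not the component's code: all three function names are hypothetical, the tests are treated as independent, σ · sqrt(2 ln k) is used as an upper bound on the expected maximum of k equally good noisy runs, and McNemar's statistic is computed without a continuity correction.

```typescript
// Chance of at least one false positive across k independent tests,
// each run at significance level alpha.
function familyWiseErrorRate(k: number, alpha = 0.05): number {
  return 1 - Math.pow(1 - alpha, k);
}

// Upper bound on the upward bias from reporting the best of k runs whose
// scores differ only by Gaussian noise with standard deviation sigma.
function maxOfKBias(sigma: number, k: number): number {
  return sigma * Math.sqrt(2 * Math.log(k));
}

// McNemar's test on paired results: only the discordant pairs matter.
// b = examples only model A got right, c = examples only model B got right.
function mcnemar(b: number, c: number): { statistic: number; significant: boolean } {
  const discordant = b + c;
  const diff = b - c;
  const statistic = discordant > 0 ? (diff * diff) / discordant : 0;
  // 3.841 is the 5 % critical value of chi-square with 1 degree of freedom.
  return { statistic, significant: statistic > 3.841 };
}
```

With 20 comparisons at the 5 % level, familyWiseErrorRate(20) is about 0.64, the figure quoted for the multiple-testing round. For paired data, mcnemar(15, 5) gives a statistic of 5 and flags a significant difference regardless of how many examples both models got right or both got wrong.
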
Props

| Prop | Type | Default | Description |
| --- | --- | --- | --- |
| exploreStages | PowerAnalysisVizExploreStage[] | 6 stages | Scripted explore stages. Final stage is treated as "free" (sliders unlocked). |
| presets | PowerAnalysisVizPreset[] | quick / standard / rigorous | Sample-size preset chips in the free-exploration controls. |
| predictRounds | PowerAnalysisVizPredictRound[] | 5 rounds | Multiple-choice prediction rounds. |
| challengeRounds | PowerAnalysisVizChallengeRound[] | 4 rounds | Multiple-choice challenge rounds. |
| competingAccuracy | number | 0.83 | Competing-prompt accuracy used in explore stages that show competing CI. |
| getNarration | (n, mde, power) => string | built-in copy | Narration generator used once the sliders unlock. |
| defaultMode | "explore" \| "predict" \| "challenge" | "explore" | Mode visible on first render. |
| transition | Transition | SPRINGS.snap | Spring used for value / colour transitions. |
| onModeChange | (mode) => void | - | Fires when the active mode changes. |
| onModeComplete | (score) => void | - | Fires when a predict or challenge run completes. |
| className | string | - | Merged onto the root via cn(). |

Accessibility

  • Mode tabs are real <button> elements with aria-pressed, ≥ 32×32 hit area, and visible focus rings.
  • Sliders are <input type="range"> overlays on the styled track, with visible labels and an aria-label that mirrors the visible copy. The power gauge is a role="progressbar" with aria-valuenow / -min / -max.
  • Tone-coded readouts always render their status word as text — colour is never the only signal.
  • A polite live region (aria-live="polite") narrates the active stage, question, or explanation so screen-reader users stay in sync.
  • Answer options are buttons with aria-pressed, sequential focus order, and tone-coded feedback after the round is checked.
  • Motion respects prefers-reduced-motion: reduce — every spring collapses to instant, the unlock pulse and threshold flash are suppressed.

Credits

  • Extracted from: craftingattention (app/src/lessons/primitives/systems/PowerAnalysisViz.tsx). The source was a lesson component wrapped in the project's Widget chrome. It had hard dependencies on ModeStrip, ChallengeBtn, FeedbackBadge, ScoreDots, and EXPLORE_NARRATIONS from ConstructionPrimitives, on per-track palette tokens (--color-fail-400, --color-warn-400, --color-success-400, --color-accent-400, --color-ink-100, …), and on MICRO.tap / SPRINGS.snappy / TIMING.correct.scaleBounce / STAGGER.tight from @/lib/motion. The viz extract replaces every palette reference with var(--cb-*) semantic tokens and re-keys motion to canonical SPRINGS.snap / SPRINGS.smooth / SPRINGS.bouncy and scalar STAGGER from @craft-bits/core/motion. It drops the Widget chrome in favour of a token-styled root the user can compose, and inlines every chrome primitive (ModeButton, PrimaryButton, SecondaryButton, PresetChip, FeedbackBadge, ScoreDots, OptionList, NarrationBlock, SliderControl, CIBar, PowerGauge, RequiredNIndicator, PredictSetup, DoneSummary) so there is no lesson-layer dependency. It surfaces the explore stages, presets, predict rounds, challenge rounds, competing accuracy, and narration generator as props (with sensible defaults), and adds onModeChange / onModeComplete callbacks so lesson hosts can react to visitor progress.