Dataset Stratifier Viz

An interactive visualisation of stratified eval-dataset composition. Three difficulty tiers (Easy / Medium / Hard) hold fixed per-tier model accuracies; the viewer rebalances the dataset mix and watches the aggregate accuracy shift — even though the model itself never changes. The default 80/10/10 split mirrors most scraped benchmarks and inflates the aggregate to ~87%; flipping to 33/34/33 collapses it to ~68% and exposes the model's hard-tier weakness.

A single aggregate accuracy hides the failure modes you most need to surface; stratifying makes them visible — and the weighting makes the headline.

[Interactive demo, default state. Per-tier accuracy: Easy 95%, Medium 72%, Hard 38%. Dataset mix: Easy 80% (400 / 500 examples), Medium 10% (50 / 500), Hard 10% (50 / 500). Aggregate: (0.80 × 95) + (0.10 × 72) + (0.10 × 38) = 87.0%.]

Aggregate 87.0% hides a weakness. One tier scores under 50%, but the easy-heavy mix inflates the headline number — classic Simpson's paradox.
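
The arithmetic behind this readout can be reproduced in a few lines. The sketch below is illustrative only: `aggregateAccuracy` and `hidesWeakTier` are hypothetical helper names, not exports of the component, and the 85% / 50% thresholds are the ones the paradox callout is described as using.

```typescript
// Hypothetical helpers for illustration; not part of the component's API.
type Tier = { label: string; accuracy: number }; // accuracy in percent

const tiers: Tier[] = [
  { label: "Easy", accuracy: 95 },
  { label: "Medium", accuracy: 72 },
  { label: "Hard", accuracy: 38 },
];

// Weighted average: each tier's accuracy scaled by its share of the dataset.
// `mix` holds percentages that are expected to sum to 100.
function aggregateAccuracy(tiers: Tier[], mix: number[]): number {
  return tiers.reduce(
    (sum, tier, i) => sum + ((mix[i] ?? 0) / 100) * tier.accuracy,
    0,
  );
}

// Callout condition: headline above 85% while at least one tier is below 50%.
function hidesWeakTier(tiers: Tier[], mix: number[]): boolean {
  return (
    aggregateAccuracy(tiers, mix) > 85 &&
    tiers.some((tier) => tier.accuracy < 50)
  );
}

console.log(aggregateAccuracy(tiers, [80, 10, 10]).toFixed(1)); // "87.0"
console.log(aggregateAccuracy(tiers, [33, 34, 33]).toFixed(1)); // "68.4"
console.log(hidesWeakTier(tiers, [80, 10, 10])); // true
console.log(hidesWeakTier(tiers, [33, 34, 33])); // false
```

The per-tier accuracies are identical in both calls; only the weights change, which is the whole point of the visualisation.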

Installation

npx shadcn@latest add https://craftbits.dev/r/dataset-stratifier-viz.json

Usage

import { DatasetStratifierViz } from "@craft-bits/viz/dataset-stratifier-viz";
 
<DatasetStratifierViz />

Open with a balanced mix so the viewer skips straight to the honest readout:

<DatasetStratifierViz defaultDistribution={[33, 34, 33]} />

Surface the per-tier 95% confidence intervals to teach the sample-size story:

<DatasetStratifierViz showCI defaultTotalSize={150} />

Lift changes into your own readout:

<DatasetStratifierViz
  onChange={({ distribution, aggregate }) => {
    /* feed the snapshot into a downstream chart */
  }}
/>

Understanding the component

  1. Per-tier bars. Each tier renders an accuracy bar in its own accent (--cb-success / --cb-warning / --cb-error). The fill animates via scaleX on a left transform-origin so it respects the transform-and-opacity-only rule. When showCI is on, a translucent CI band overlays the bar at accuracy ± 1.96 · √(p(1−p)/n).
  2. Distribution bar. A three-column grid renders the mix proportionally. The columns animate via layout motion so a slider change slides one column wider while the others narrow.
  3. Aggregate readout. The right-hand panel shows the weighted-average accuracy as a tabular-numerals figure whose colour interpolates between --cb-success (above 85%), --cb-warning (above 65%), and --cb-error (65% or below).
  4. Weighted calculation. When showCalculation is on (default), an equation row spells out the weighted sum so the viewer can connect the picture to the math.
  5. Slider panel. When interactive is on (default), three range inputs and a total-examples number input let the viewer reshape the dataset. The two non-touched sliders redistribute proportionally to their previous shares; the total always sums to 100.
  6. Paradox callout. Whenever the aggregate is above 85% and any tier is below 50%, a warning paragraph announces the Simpson's-paradox shape so the headline number can't lie to a fast reader.
  7. Reduced motion. Under prefers-reduced-motion: reduce, every entrance, bar grow, and layout transition snaps instantly.
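
The slider behaviour in step 5 can be sketched as follows. This is one plausible reading of "redistribute proportionally", not the component's actual source: `redistribute` is a hypothetical name, and both the integer rounding and the even split when the two untouched sliders are at zero are assumptions.

```typescript
type Mix = [number, number, number];

// Hypothetical helper: set one slider, then rescale the other two in
// proportion to their previous shares so the mix still sums to 100.
function redistribute(mix: Mix, touched: 0 | 1 | 2, value: number): Mix {
  const next = Math.min(100, Math.max(0, Math.round(value)));
  const remaining = 100 - next;

  // Indices of the two sliders the viewer did not touch.
  const a = (touched + 1) % 3;
  const b = (touched + 2) % 3;

  const previousOthers = mix[a] + mix[b];
  // Assumption: if both untouched sliders were at 0, split the remainder evenly.
  const shareA = previousOthers > 0 ? mix[a] / previousOthers : 0.5;

  const nextA = Math.round(remaining * shareA);
  const result: Mix = [0, 0, 0];
  result[touched] = next;
  result[a] = nextA;
  result[b] = remaining - nextA; // absorbs rounding so the total stays 100
  return result;
}

console.log(redistribute([80, 10, 10], 0, 34)); // [34, 33, 33]
console.log(redistribute([80, 10, 10], 2, 40)); // [53, 7, 40]
```

Assigning the rounding remainder to the last slider, rather than rounding both independently, is what keeps the sum at exactly 100.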

Props

| Prop | Type | Default | Description |
| --- | --- | --- | --- |
| tiers | [Tier, Tier, Tier] | Easy 95% / Medium 72% / Hard 38% | Tier definitions in render order. Accuracies stay fixed across mix changes. |
| distribution | [number, number, number] | - | Controlled mix percentages (sum 100). Pair with onChange. |
| defaultDistribution | [number, number, number] | [80, 10, 10] | Uncontrolled initial mix. |
| totalSize | number | - | Controlled total dataset size. Clamped to [10, 10000]. |
| defaultTotalSize | number | 500 | Uncontrolled initial total. |
| showCI | boolean | false | Overlay each tier bar with a 95% Wald CI band. |
| showCalculation | boolean | true | Render the weighted-average equation row. |
| interactive | boolean | true | Render the slider panel. Set false for a static figure. |
| transition | Transition | SPRINGS.snap | Override the spring used for bar grows and the aggregate readout. |
| onChange | (next) => void | - | Fires with { distribution, aggregate, totalSize } on every adjustment. |
| className | string | - | Merged onto the root via cn(). |
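
To see why showCI pairs well with a small defaultTotalSize, the 95% Wald interval can be worked out by hand. The sketch below assumes the band is accuracy ± 1.96 · √(p(1−p)/n) with n taken as the tier's share of the total; `waldInterval` and `tierCount` are hypothetical names, and the rounding of n and the clipping of the band to [0, 1] are assumptions.

```typescript
// Hypothetical helper: examples in a tier, given the total and the tier's share.
// The [10, 10000] clamp mirrors the documented totalSize range; rounding is assumed.
function tierCount(totalSize: number, sharePercent: number): number {
  const clamped = Math.min(10000, Math.max(10, totalSize));
  return Math.round((clamped * sharePercent) / 100);
}

// 95% Wald interval for a proportion p (0..1) observed on n examples.
// Assumes n > 0; the bounds are clipped to [0, 1] for display.
function waldInterval(p: number, n: number) {
  const halfWidth = 1.96 * Math.sqrt((p * (1 - p)) / n);
  return {
    low: Math.max(0, p - halfWidth),
    high: Math.min(1, p + halfWidth),
    halfWidth,
  };
}

// Hard tier: 38% accuracy at a 10% share of the dataset.
const at500 = waldInterval(0.38, tierCount(500, 10)); // n = 50
const at150 = waldInterval(0.38, tierCount(150, 10)); // n = 15

console.log((at500.halfWidth * 100).toFixed(1)); // "13.5" percentage points
console.log((at150.halfWidth * 100).toFixed(1)); // "24.6" percentage points
```

With only 15 hard examples the band spans roughly 13% to 63%, so the hard-tier score is barely informative; that is the sample-size story the CI overlay is meant to tell.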

Accessibility

  • The root is role="figure" with a descriptive aria-label; the distribution bar and the per-tier accuracy bars each carry role="img" labels summarising their numbers.
  • A polite live region announces the current mix, total examples, and aggregate accuracy whenever the viewer reshapes the dataset.
  • Slider inputs carry per-tier aria-labels ("Easy tier percentage of dataset") and reach a 36px hit area; the total-examples input is paired with a <label htmlFor> and the same 36px minimum height.
  • All sliders show a :focus-visible ring on --cb-accent; the range track encodes percentage twice (numeric badge + visual fill).
  • The Simpson's-paradox warning is a separate paragraph — colour and the warning accent are never the only signal.
  • Motion respects prefers-reduced-motion: reduce — every entrance, bar grow, and layout transition snaps instantly.
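
The live-region announcement above could be assembled along these lines. This is a sketch under assumptions: `describeMix` is a hypothetical function, and the exact wording is illustrative rather than confirmed as the string the component emits. Spelling out "percent" instead of using the % sign is a deliberate choice so screen readers read the figures consistently.

```typescript
// Hypothetical formatter for a polite live region (aria-live="polite").
// Covers the three facts the announcement is described as carrying:
// current mix, total examples, and aggregate accuracy.
function describeMix(
  labels: string[],
  mix: number[],
  totalSize: number,
  aggregate: number,
): string {
  const parts = labels.map((label, i) => `${mix[i] ?? 0} percent ${label}`);
  return (
    `Dataset mix ${parts.join(", ")} over ${totalSize} total examples. ` +
    `Aggregate accuracy ${aggregate.toFixed(1)} percent.`
  );
}

console.log(describeMix(["Easy", "Medium", "Hard"], [80, 10, 10], 500, 87));
// "Dataset mix 80 percent Easy, 10 percent Medium, 10 percent Hard over 500 total examples. Aggregate accuracy 87.0 percent."
```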

Credits

  • Extracted from: craftingattention (app/src/lessons/primitives/systems/DatasetStratifierViz.tsx). The source was a three-mode lesson primitive (explore / predict / challenge) layered on Widget, ModeStrip, ChallengeBtn, FeedbackBadge, and ScoreDots — none of which belong in the library. The extract drops the quiz scaffolding and lifts the canonical interactive (tier bars + distribution bar + aggregate readout + slider panel) into a Radix-style controlled API. Per-track palette tokens are remapped to var(--cb-*) semantic tokens so consumer themes repaint freely; inline spring values are re-keyed to SPRINGS.snap / SPRINGS.smooth from @craft-bits/core/motion.