Dataset Stratifier Viz

An interactive visualisation of stratified eval-dataset composition. Three difficulty tiers (Easy / Medium / Hard) hold fixed per-tier model accuracies; the viewer rebalances the dataset mix and watches the aggregate accuracy shift — even though the model itself never changes. The default 80/10/10 split mirrors most scraped benchmarks and inflates the aggregate to ~87%; flipping to 33/34/33 collapses it to ~68% and exposes the model's hard-tier weakness.

A single aggregate accuracy hides the failure modes you most need to surface; stratifying makes them visible — and the weighting makes the headline.
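The headline figures quoted above can be reproduced with a few lines of arithmetic. The sketch below is illustrative only: `aggregateAccuracy` is a hypothetical helper, not something the package is known to export, and it assumes accuracies and mix shares are both given in percent.

```typescript
// Hypothetical helper, not part of the component's documented API.
// Accuracies and mix shares are both expressed in percent.
function aggregateAccuracy(accuracies: number[], mix: number[]): number {
  if (accuracies.length !== mix.length) {
    throw new Error("accuracies and mix must have the same length");
  }
  const totalShare = mix.reduce((sum, share) => sum + share, 0);
  // Weighted mean: each tier contributes its accuracy times its share of the dataset.
  const weighted = accuracies.reduce((sum, acc, i) => sum + acc * mix[i], 0);
  return weighted / totalShare;
}

const tierAccuracies = [95, 72, 38]; // Easy / Medium / Hard default tiers

const easyHeavy = aggregateAccuracy(tierAccuracies, [80, 10, 10]);
const balanced = aggregateAccuracy(tierAccuracies, [33, 34, 33]);

console.log(easyHeavy.toFixed(1)); // 87.0
console.log(balanced.toFixed(1)); // 68.4
```

The per-tier accuracies are identical in both calls; only the weights change, which is the whole point of the visualisation.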

Demo (default state)

Tier	Accuracy	Mix	Examples
Easy	95%	80%	400 / 500
Medium	72%	10%	50 / 500
Hard	38%	10%	50 / 500

Aggregate: (0.80 × 95) + (0.10 × 72) + (0.10 × 38) = 87.0%

Aggregate 87.0% hides a weakness. One tier scores under 50%, but the easy-heavy mix inflates the headline number: classic Simpson's paradox.

Controls: easy % (80%), total examples (500), show CI, show calculation, sliders.

Installation

npx shadcn@latest add https://craftbits.dev/r/dataset-stratifier-viz.json

Usage

import { DatasetStratifierViz } from "@craft-bits/viz/dataset-stratifier-viz";
 
<DatasetStratifierViz />

Open with a balanced mix so the viewer skips straight to the honest readout:

<DatasetStratifierViz defaultDistribution={[33, 34, 33]} />

Surface the per-tier 95% confidence intervals to teach the sample-size story:

<DatasetStratifierViz showCI defaultTotalSize={150} />
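To see why a smaller total makes the sample-size story more visible, it helps to compute the band width by hand. The sketch below assumes per-tier counts are the rounded product of total size and mix share, and uses the Wald formula accuracy ± 1.96 · √(p(1−p)/n). Both helpers are hypothetical, and the component's internals may round or clamp differently.

```typescript
// Hypothetical helpers for illustration; the component's internals may differ.

// Number of examples a tier receives for a given total and mix share (in percent).
function tierCount(totalSize: number, sharePercent: number): number {
  return Math.round((totalSize * sharePercent) / 100);
}

// Half-width of the 95% Wald interval for a proportion p estimated from n examples.
function waldHalfWidth(p: number, n: number): number {
  if (n <= 0) return Number.NaN;
  return 1.96 * Math.sqrt((p * (1 - p)) / n);
}

const hardAccuracy = 0.38; // default Hard-tier accuracy
const hardShare = 10; // Hard-tier share in the default 80/10/10 mix

const nSmall = tierCount(150, hardShare); // 15 examples
const nDefault = tierCount(500, hardShare); // 50 examples

const halfSmall = waldHalfWidth(hardAccuracy, nSmall);
const halfDefault = waldHalfWidth(hardAccuracy, nDefault);

console.log(`n=${nSmall}: 38% ± ${(halfSmall * 100).toFixed(1)} points`);
console.log(`n=${nDefault}: 38% ± ${(halfDefault * 100).toFixed(1)} points`);
```

Under these assumptions the Hard tier's band is roughly ±25 points with 15 examples and roughly ±13.5 points with 50, so the 150-example setting makes the uncertainty hard to miss.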

Lift changes into your own readout:

<DatasetStratifierViz
  onChange={({ distribution, aggregate }) => {
    /* feed the snapshot into a downstream chart */
  }}
/>

Understanding the component

Per-tier bars. Each tier renders an accuracy bar in its own accent (--cb-success / --cb-warning / --cb-error). The fill animates via scaleX on a left transform-origin so it respects the transform-and-opacity-only rule. When showCI is on, a translucent CI band overlays the bar at accuracy ± 1.96 · √(p(1−p)/n).
Distribution bar. A three-column grid renders the mix proportionally. The columns animate via layout motion so a slider change slides one column wider while the others narrow.
Aggregate readout. The right-hand panel surfaces the weighted-average accuracy as a tabular-numerals figure that interpolates colour between --cb-success (above 85%), --cb-warning (above 65%), and --cb-error (65% and below).
Weighted calculation. When showCalculation is on (default), an equation row spells out the weighted sum so the viewer can connect the picture to the math.
Slider panel. When interactive is on (default), three range inputs and a total-examples number input let the viewer reshape the dataset. When one slider moves, the two untouched sliders share the remainder in proportion to their previous shares, so the mix always sums to 100.
Paradox callout. Whenever the aggregate is above 85% and any tier is below 50%, a warning paragraph announces the Simpson's-paradox shape so the headline number can't lie to a fast reader.
Reduced motion. Under prefers-reduced-motion: reduce, every entrance, bar grow, and layout transition snaps instantly.
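The slider rule and the paradox callout described above can be sketched as two small pure functions. This is a minimal sketch under stated assumptions: the names `redistribute` and `showsParadoxCallout` are hypothetical, values are rounded to whole percentages, and an even split is assumed when both untouched sliders sit at zero. The component's own rounding and edge-case handling may differ.

```typescript
type Mix = [number, number, number];

// Hypothetical sketch of the slider rule; not the component's actual source.
// Sets one slider and splits the remainder between the other two in
// proportion to their previous shares, keeping the total at exactly 100.
function redistribute(mix: Mix, touched: 0 | 1 | 2, nextValue: number): Mix {
  const value = Math.min(100, Math.max(0, Math.round(nextValue)));
  const remaining = 100 - value;
  const [a, b] = ([0, 1, 2] as const).filter((i) => i !== touched);
  const previous = mix[a] + mix[b];
  // Assumption: split evenly when both untouched sliders were at zero.
  const shareA = previous === 0 ? 0.5 : mix[a] / previous;

  const result: Mix = [0, 0, 0];
  result[touched] = value;
  result[a] = Math.round(remaining * shareA);
  result[b] = remaining - result[a]; // absorbs rounding so the sum stays 100
  return result;
}

// Callout condition as documented: aggregate above 85% while any tier is below 50%.
function showsParadoxCallout(aggregate: number, accuracies: number[]): boolean {
  return aggregate > 85 && accuracies.some((acc) => acc < 50);
}

const afterEasyDrop = redistribute([80, 10, 10], 0, 34); // [34, 33, 33]
const afterHardRise = redistribute([80, 10, 10], 2, 40); // [53, 7, 40]

console.log(afterEasyDrop, afterHardRise);
console.log(showsParadoxCallout(87, [95, 72, 38])); // true
console.log(showsParadoxCallout(68.37, [95, 72, 38])); // false
```

Assigning the last slider by subtraction rather than by rounding is one way to guarantee the sum never drifts to 99 or 101.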

Props

Prop	Type	Default	Description
`tiers`	`[Tier, Tier, Tier]`	Easy 95% / Medium 72% / Hard 38%	Tier definitions in render order. Accuracies stay fixed across mix changes.
`distribution`	`[number, number, number]`	—	Controlled mix percentages (sum 100). Pair with `onChange`.
`defaultDistribution`	`[number, number, number]`	`[80, 10, 10]`	Uncontrolled initial mix.
`totalSize`	`number`	—	Controlled total dataset size. Clamped to `[10, 10000]`.
`defaultTotalSize`	`number`	`500`	Uncontrolled initial total.
`showCI`	`boolean`	`false`	Overlay each tier bar with a 95% Wald CI band.
`showCalculation`	`boolean`	`true`	Render the weighted-average equation row.
`interactive`	`boolean`	`true`	Render the slider panel. Set `false` for a static figure.
`transition`	`Transition`	`SPRINGS.snap`	Override the spring used for bar grows and the aggregate readout.
`onChange`	`(next) => void`	—	Fires with `{ distribution, aggregate, totalSize }` on every adjustment.
`className`	`string`	—	Merged onto the root via `cn()`.

Accessibility

The root is role="figure" with a descriptive aria-label; the distribution bar and the per-tier accuracy bars each carry role="img" with labels summarising their numbers.
A polite live region announces the current mix, total examples, and aggregate accuracy whenever the viewer reshapes the dataset.
Slider inputs carry per-tier aria-labels ("Easy tier percentage of dataset") and reach a 36px hit area; the total-examples input is paired with a <label htmlFor> and the same 36px minimum height.
All sliders show a :focus-visible ring on --cb-accent; the range track encodes percentage twice (numeric badge + visual fill).
The Simpson's-paradox warning is a separate paragraph — colour and the warning accent are never the only signal.
Motion respects prefers-reduced-motion: reduce — every entrance, bar grow, and layout transition snaps instantly.

Credits

Extracted from: craftingattention (app/src/lessons/primitives/systems/DatasetStratifierViz.tsx). The source was a three-mode lesson primitive (explore / predict / challenge) layered on Widget, ModeStrip, ChallengeBtn, FeedbackBadge, and ScoreDots — none of which belong in the library. The extract drops the quiz scaffolding and lifts the canonical interactive (tier bars + distribution bar + aggregate readout + slider panel) into a Radix-style controlled API. Per-track palette tokens are remapped to var(--cb-*) semantic tokens so consumer themes repaint freely; inline spring values are re-keyed to SPRINGS.snap / SPRINGS.smooth from @craft-bits/core/motion.