Generalization Probe

Visualises how a trained model behaves on inputs it was never trained on. The scatter plot shows the training distribution in cb-accent, the out-of-distribution (OOD) probes in cb-warning, and two vertical dashed guides marking the in-distribution boundaryRange. OOD dots fade as confidence rises and gain a contrasting ring when the model is uncertain, so the drop in prediction confidence on OOD inputs becomes literally visible.

[Interactive demo: 12 training samples and 5 OOD probes; readout shows in-distribution 100% · OOD 22%.]

Installation

npx shadcn@latest add https://craftbits.dev/r/generalization-probe.json

Usage

import { GeneralizationProbe } from "@craft-bits/core";
 
<GeneralizationProbe
  trainSamples={[
    { x: 1, y: 4 },
    { x: 2, y: 7 },
    { x: 3, y: 10 },
  ]}
  oodSamples={[
    { x: -5, y: -14, confidence: 0.2 },
    { x: 50, y: 151, confidence: 0.4 },
    { x: 100, y: 301, confidence: 0.15 },
  ]}
  boundaryRange={[1, 3]}
/>

Hide the accuracy readout for a stripped-back teaching figure:

<GeneralizationProbe
  trainSamples={train}
  oodSamples={ood}
  boundaryRange={[0, 10]}
  showConfidence={false}
/>

Let the boundary auto-fit to the training set's x-range:

<GeneralizationProbe trainSamples={train} oodSamples={ood} />

Anatomy

  1. Two dot families on one plot. trainSamples render in cb-accent; oodSamples render in cb-warning. The colour split is the first signal — even before reading the band, the eye separates the populations.
  2. The boundary is the lesson. boundaryRange is drawn as two vertical dashed cb-fg-subtle guides with a faint cb-accent tint between them. Anything outside the band is out-of-distribution by construction.
  3. Confidence shapes the dot. Each OOD sample's confidence, in [0, 1], controls two things: the dot's fill alpha (low confidence renders bold, high confidence renders faint) and an extra contrasting ring around the marker (which appears when confidence is low). A confident-but-wrong prediction therefore looks quiet and faint; an uncertain one looks loud.
  4. Accuracy readout reports the drop. The header shows in-distribution and OOD accuracy side by side. In-distribution accuracy treats the training fit as ground truth, so it reads 1.0. OOD accuracy is the mean of the supplied confidence values across oodSamples. The gap between the two is the generalisation gap.
  5. Auto-scaling domain. The scale fits the union of trainSamples, oodSamples, and the boundary, with a 6% pad so dots never clip the frame. Zero-width / zero-height ranges get a small epsilon so a single-point input still reads.
  6. Reduced motion. prefers-reduced-motion: reduce collapses every dot spring to an instant swap. There's no autoplay — every change comes from a new prop shape, so reduced-motion users see the static scatter immediately.
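Items 3 to 5 describe pure computations, so they can be sketched outside React. The TypeScript below is a minimal sketch and is not the component's source. Two rules come from the text above: the 6% pad and the mean-confidence OOD accuracy. Everything else is an assumption: the alpha floor, the ring threshold, the zero-span epsilon, applying the pad per side, clamping confidence, and treating a missing confidence as 0. The helper names oodStyle, accuracies, padRange, and fitDomain are hypothetical.

```typescript
// Sketch of Anatomy items 3 to 5. Only PAD (6%) and the mean-confidence rule
// come from the docs; every other constant is an illustrative assumption.

interface Point {
  x: number;
  y: number;
}

interface OodPoint extends Point {
  confidence?: number;
}

type Domain = readonly [number, number];

const PAD = 0.06; // 6% of the span, applied to each side (per-side is assumed)
const EPSILON = 0.5; // hypothetical half-width given to zero-span ranges
const MIN_ALPHA = 0.25; // hypothetical floor so confident dots stay visible
const RING_THRESHOLD = 0.5; // hypothetical: confidence below this draws a ring
const DEFAULT_CONFIDENCE = 0; // hypothetical value for a missing confidence

// Confidence is documented as [0, 1]; clamping out-of-range input is assumed.
const clamp01 = (v: number): number => Math.min(1, Math.max(0, v));

// Item 3: low confidence renders bold and ringed; high confidence renders faint.
function oodStyle(confidence: number = DEFAULT_CONFIDENCE): { alpha: number; ring: boolean } {
  const c = clamp01(confidence);
  return { alpha: MIN_ALPHA + (1 - MIN_ALPHA) * (1 - c), ring: c < RING_THRESHOLD };
}

// Item 4: in-distribution accuracy is the training fit taken as ground truth (1.0);
// OOD accuracy is the mean confidence over the probes (null when there are none).
function accuracies(ood: readonly OodPoint[]): { inDist: number; ood: number | null } {
  if (ood.length === 0) return { inDist: 1, ood: null };
  const total = ood.reduce((sum, s) => sum + clamp01(s.confidence ?? DEFAULT_CONFIDENCE), 0);
  return { inDist: 1, ood: total / ood.length };
}

// Widens a zero-span range by EPSILON, then pads PAD of the span on each side.
function padRange(min: number, max: number): Domain {
  const lo = max === min ? min - EPSILON : min;
  const hi = max === min ? max + EPSILON : max;
  const pad = (hi - lo) * PAD;
  return [lo - pad, hi + pad];
}

// Item 5: x covers both sample sets plus the boundary (an x-range);
// y covers both sample sets only.
function fitDomain(
  train: readonly Point[],
  ood: readonly Point[],
  boundary: readonly [number, number],
): { x: Domain; y: Domain } {
  const points = [...train, ...ood];
  const xs = [...points.map((p) => p.x), boundary[0], boundary[1]];
  const ys = points.map((p) => p.y);
  return {
    x: padRange(Math.min(...xs), Math.max(...xs)),
    y: ys.length > 0 ? padRange(Math.min(...ys), Math.max(...ys)) : padRange(0, 0),
  };
}
```

With the Usage example's data, accuracies reports an OOD accuracy of 0.25, which is the mean of 0.2, 0.4, and 0.15. fitDomain widens the x-domain well beyond the [1, 3] band so it covers the probe at x = 100.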

Props

| Prop | Type | Default | Description |
| --- | --- | --- | --- |
| trainSamples | readonly { x: number; y: number }[] | | Training-distribution scatter. Rendered in cb-accent. |
| oodSamples | readonly { x: number; y: number; confidence?: number }[] | | Out-of-distribution probes. Rendered in cb-warning; fill alpha varies with confidence (lower confidence renders bolder). |
| boundaryRange | readonly [number, number] | x-extent of trainSamples | In-distribution x-range. Samples outside it are treated as OOD on the x-axis, and its limits are marked with vertical dashed guides. |
| showConfidence | boolean | true | Show the in-distribution / OOD accuracy readout. |
| transition | Transition | SPRINGS.smooth | Spring used for dot transitions when props change. |
| className | string | | Merged onto the root via cn(). |
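The table maps fairly directly onto a props type. The sketch below makes two assumptions: trainSamples and oodSamples are required, since the table lists no default for them, and Transition is a local stand-in rather than the animation library's real type. resolveDefaults is a hypothetical helper that applies the two defaults computable from props. The transition default, SPRINGS.smooth, is left to the component because its values are not documented here.

```typescript
// Sketch of the documented prop surface. `Transition` is a local stand-in for
// the animation library's transition type (assumption).
type Transition = Record<string, unknown>;

interface TrainSample {
  x: number;
  y: number;
}

interface OodSample extends TrainSample {
  confidence?: number; // documented range: [0, 1]
}

interface GeneralizationProbeProps {
  trainSamples: readonly TrainSample[]; // no default listed, so required here
  oodSamples: readonly OodSample[]; // no default listed, so required here
  boundaryRange?: readonly [number, number]; // defaults to the x-extent of trainSamples
  showConfidence?: boolean; // defaults to true
  transition?: Transition; // defaults to SPRINGS.smooth inside the component
  className?: string; // merged onto the root via cn()
}

// Hypothetical helper applying the two defaults that can be computed from props.
function resolveDefaults(props: GeneralizationProbeProps): {
  boundaryRange: readonly [number, number];
  showConfidence: boolean;
} {
  const xs = props.trainSamples.map((s) => s.x);
  // An empty training set has no x-extent; [0, 0] is an assumed fallback.
  const fitted: readonly [number, number] =
    xs.length > 0 ? [Math.min(...xs), Math.max(...xs)] : [0, 0];
  return {
    boundaryRange: props.boundaryRange ?? fitted,
    showConfidence: props.showConfidence ?? true,
  };
}
```

With the three training points from the Usage example, resolveDefaults auto-fits the band to [1, 3], matching the explicit boundaryRange passed there.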

Accessibility

  • The figure is role="figure" with an aria-label describing the sample counts, the in-distribution band, and both accuracies.
  • An aria-live="polite" summary announces the in-distribution and OOD accuracies whenever they change.
  • The two dot families differ in colour (cb-accent vs cb-warning), and low-confidence OOD dots also carry a shape signal (a contrasting ring), so uncertainty is never signalled by colour alone.
  • All tick labels use font-cb-mono with tabular numerals, so every digit has the same width and labels hold position across renders.
  • prefers-reduced-motion: reduce collapses every spring to an instant swap.
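The label text and the reduced-motion swap can be sketched as a few small helpers. Several pieces are assumptions: the label wording, the helper names describeProbe, prefersReducedMotion, and resolveTransition, and the use of { duration: 0 } as the instant transition, which presumes a Framer Motion style transition object. The "In-distribution 100% · OOD 22%" summary format mirrors the demo readout.

```typescript
// Hypothetical accessibility helpers; wording and names are illustrative only.

const pct = (v: number): string => `${Math.round(v * 100)}%`;

interface ProbeSummaryInput {
  trainCount: number;
  oodCount: number;
  boundary: readonly [number, number];
  inDist: number; // accuracy in [0, 1]
  ood: number; // accuracy in [0, 1]
}

// Builds the role="figure" aria-label and the aria-live="polite" summary text.
function describeProbe(input: ProbeSummaryInput): { ariaLabel: string; liveSummary: string } {
  const liveSummary = `In-distribution ${pct(input.inDist)} · OOD ${pct(input.ood)}.`;
  const [lo, hi] = input.boundary;
  const ariaLabel =
    `Generalization probe: ${input.trainCount} training samples, ` +
    `${input.oodCount} OOD samples, in-distribution band from x = ${lo} to x = ${hi}. ` +
    liveSummary;
  return { ariaLabel, liveSummary };
}

// Stand-in for the animation library's transition type (assumption).
type Transition = Record<string, unknown>;

// Reads the user's motion preference; returns false where matchMedia is absent.
function prefersReducedMotion(): boolean {
  const g = globalThis as unknown as {
    matchMedia?: (query: string) => { matches: boolean };
  };
  return g.matchMedia?.("(prefers-reduced-motion: reduce)").matches ?? false;
}

// Under reduced motion, { duration: 0 } turns every dot animation into an instant swap.
function resolveTransition(reduced: boolean, spring: Transition): Transition {
  return reduced ? { duration: 0 } : spring;
}
```

In the component, the result of prefersReducedMotion() (or an equivalent hook) would feed resolveTransition once per render, so the transition prop is still honoured whenever the user has not asked for reduced motion.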

Credits

  • Extracted from: craftingattention (app/src/lessons/primitives/nn/GeneralizationProbe.tsx). Several pieces of the original were replaced with a generic scatter probe:
    – the lesson-specific two-column "model vs lookup table" framing,
    – the hand-rolled SvgLabel chrome,
    – the phase-machine narration,
    – the raw --color-success-* / --color-fail-* variables,
    – the inline animate() flash loop.
    The probe is driven by trainSamples and oodSamples props. It has a boundaryRange band marked by cb-fg-subtle dashed guides, and it uses per-sample confidence-modulated alpha plus an uncertainty ring. The original's animated row-by-row "predicted" / "???" reveal becomes a parallel in-distribution / OOD accuracy pair, shown both in the header and below the plot.