Generalization Probe

Visualises how a trained model behaves on inputs it was never trained on. The scatter plot shows the training distribution in cb-accent, the out-of-distribution (OOD) probes in cb-warning, and two vertical dashed guides marking the in-distribution boundaryRange. OOD dots grow fainter as confidence rises and gain a contrasting ring when the model is uncertain, so the drop in prediction confidence on OOD inputs is visible in the marks themselves.

[Interactive demo: scatter of 12 training samples and 5 OOD probes; the readout shows in-distribution 100% and OOD 22%. Adjustable controls: boundaryRange high, OOD spread, OOD baseline confidence, OOD noise, show readout.]

Installation

npx shadcn@latest add https://craftbits.dev/r/generalization-probe.json

Usage

import { GeneralizationProbe } from "@craft-bits/core";
 
<GeneralizationProbe
  trainSamples={[
    { x: 1, y: 4 },
    { x: 2, y: 7 },
    { x: 3, y: 10 },
  ]}
  oodSamples={[
    { x: -5, y: -14, confidence: 0.2 },
    { x: 50, y: 151, confidence: 0.4 },
    { x: 100, y: 301, confidence: 0.15 },
  ]}
  boundaryRange={[1, 3]}
/>

Hide the accuracy readout for a stripped-back teaching figure:

<GeneralizationProbe
  trainSamples={train}
  oodSamples={ood}
  boundaryRange={[0, 10]}
  showConfidence={false}
/>

Let the boundary auto-fit to the training set's x-range:

<GeneralizationProbe trainSamples={train} oodSamples={ood} />
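
When boundaryRange is omitted, the props table says the default is the bounding box of the training x-values. The sketch below shows one way that fit could be computed. The helper name fitBoundary and its empty-input behaviour are assumptions for illustration, not the component's internals.

```typescript
type Point = { x: number; y: number };

// Hypothetical helper: derive [low, high] from the training x-values.
function fitBoundary(
  train: readonly Point[],
): readonly [number, number] | undefined {
  // Assumption: with no training samples there is nothing to fit.
  if (train.length === 0) return undefined;

  let low = Infinity;
  let high = -Infinity;
  for (const { x } of train) {
    if (x < low) low = x;
    if (x > high) high = x;
  }
  return [low, high];
}

const train: Point[] = [
  { x: 1, y: 4 },
  { x: 2, y: 7 },
  { x: 3, y: 10 },
];

// [1, 3], the same band passed explicitly in the first usage example.
const fitted = fitBoundary(train);
```

Passing boundaryRange explicitly remains useful when the in-distribution band should be wider than the samples that happen to be plotted.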

Anatomy

Two dot families on one plot. trainSamples render in cb-accent; oodSamples render in cb-warning. The colour split is the first signal — even before reading the band, the eye separates the populations.
The boundary is the lesson. boundaryRange is drawn as two vertical dashed cb-fg-subtle guides with a faint cb-accent tint between them. Anything outside the band is out-of-distribution by construction.
Confidence shapes the dot. Each OOD sample's confidence in [0, 1] controls two things: the dot's fill alpha (low confidence renders bold) and an extra contrasting ring around the marker (low confidence makes the ring appear). A confident-but-wrong prediction looks small and faint; an uncertain one looks loud.
Accuracy readout reports the drop. The header surfaces in-distribution and OOD accuracy side by side. In-distribution treats the training fit as ground truth (1.0); OOD is the mean of the supplied confidence values across oodSamples. The difference between the two is the generalisation gap.
Auto-scaling domain. The scale fits the union of trainSamples, oodSamples, and the boundary, with a 6% pad so dots never clip the frame. Zero-width / zero-height ranges get a small epsilon so a single-point input still reads.
Reduced motion. prefers-reduced-motion: reduce collapses every dot spring to an instant swap. There's no autoplay — every change comes from a new prop shape, so reduced-motion users see the static scatter immediately.
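
Two of the behaviours above, the accuracy readout and the auto-scaling domain, are simple arithmetic and can be sketched as follows. This is a minimal illustration under stated assumptions: the function names, the handling of samples without a confidence, the clamping to [0, 1], the per-side interpretation of the 6% pad, and the EPSILON value are all hypothetical rather than taken from the component's source.

```typescript
type Point = { x: number; y: number };
type OodPoint = Point & { confidence?: number };

// OOD readout: mean of the supplied confidences.
// Assumption: samples without `confidence` are skipped, and values are clamped to [0, 1].
function oodAccuracy(ood: readonly OodPoint[]): number {
  const values = ood
    .map((s) => s.confidence)
    .filter((c): c is number => typeof c === "number")
    .map((c) => Math.min(1, Math.max(0, c)));
  if (values.length === 0) return 0;
  return values.reduce((sum, c) => sum + c, 0) / values.length;
}

const PAD = 0.06; // 6% pad, assumed to apply to each side as a fraction of the span
const EPSILON = 0.5; // hypothetical widening for zero-width ranges

// Padded domain for one axis, given every value that must stay in view.
function paddedDomain(values: readonly number[]): [number, number] {
  if (values.length === 0) return [0, 1]; // assumption: arbitrary fallback
  const min = Math.min(...values);
  const max = Math.max(...values);
  const span = max - min;
  if (span === 0) return [min - EPSILON, max + EPSILON];
  return [min - span * PAD, max + span * PAD];
}

const ood: OodPoint[] = [
  { x: -5, y: -14, confidence: 0.2 },
  { x: 50, y: 151, confidence: 0.4 },
  { x: 100, y: 301, confidence: 0.15 },
];

const oodAcc = oodAccuracy(ood); // mean of 0.2, 0.4, 0.15
const gap = 1.0 - oodAcc; // in-distribution accuracy is fixed at 1.0
const oodPercent = Math.round(oodAcc * 100); // 25

// x-domain over train x-values, OOD x-values, and the boundary [1, 3]:
// the span runs from -5 to 100, so the padded domain is roughly [-11.3, 106.3].
const xDomain = paddedDomain([1, 2, 3, -5, 50, 100, 1, 3]);
```

For the samples in the first usage example this gives an OOD readout of 25% against an in-distribution readout of 100%, a generalisation gap of 75 percentage points.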

Props

Prop	Type	Default	Description
`trainSamples`	`readonly { x: number; y: number }[]`	—	Training-distribution scatter. Rendered in `cb-accent`.
`oodSamples`	`readonly { x: number; y: number; confidence?: number }[]`	—	Out-of-distribution probes. Rendered in `cb-warning`; `confidence` in [0, 1] sets the fill alpha and the uncertainty ring (lower confidence renders bolder).
`boundaryRange`	`readonly [number, number]`	bbox of `trainSamples.x`	In-distribution x-range as `[low, high]`. Values outside it are treated as OOD on the x-axis, and both limits are marked with vertical dashed guides.
`showConfidence`	`boolean`	`true`	Show the in-distribution / OOD accuracy readout.
`transition`	`Transition`	`SPRINGS.smooth`	Spring used for dot transitions when props change.
`className`	`string`	—	Merged onto the root via `cn()`.

Accessibility

The root element has role="figure" and an aria-label that describes the sample counts, the in-distribution band, and both accuracies.
An aria-live="polite" summary announces the in-distribution and OOD accuracies whenever they change.
The two dot families use both colour (cb-accent vs cb-warning) and a shape signal (low-confidence OOD dots get a contrasting ring) — divergence is never signalled by colour alone.
All tick labels use font-cb-mono with tabular numerals, so screen readers narrate them cleanly and they hold position across renders.
prefers-reduced-motion: reduce collapses every spring to an instant swap.
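
As a rough illustration of the label described above, the sketch below assembles a text summary from the sample counts, the band, and the two accuracies. The function name describeProbe, the Summary shape, and the exact wording are hypothetical; the component's real aria-label string is not documented here.

```typescript
type Summary = {
  trainCount: number;
  oodCount: number;
  boundary: readonly [number, number];
  inDistAccuracy: number; // fraction in [0, 1]
  oodAccuracy: number; // fraction in [0, 1]
};

// Format a fraction as a whole-number percentage.
const pct = (value: number): string => `${Math.round(value * 100)}%`;

// Hypothetical wording: counts first, then the band, then both accuracies.
function describeProbe(s: Summary): string {
  return (
    `Generalization probe: ${s.trainCount} training samples, ` +
    `${s.oodCount} OOD samples, ` +
    `in-distribution band from x = ${s.boundary[0]} to x = ${s.boundary[1]}, ` +
    `in-distribution accuracy ${pct(s.inDistAccuracy)}, ` +
    `OOD accuracy ${pct(s.oodAccuracy)}.`
  );
}

// Illustrative values only.
const label = describeProbe({
  trainCount: 12,
  oodCount: 5,
  boundary: [0, 10],
  inDistAccuracy: 1,
  oodAccuracy: 0.22,
});
```

A string built this way could back both the figure's aria-label and the polite live-region summary, so the two never drift apart.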

Credits

Extracted from: craftingattention (app/src/lessons/primitives/nn/GeneralizationProbe.tsx). Replaced the lesson-specific two-column "model vs lookup table" framing, hand-rolled SvgLabel chrome, phase-machine narration, --color-success-* / --color-fail-* raw vars, and inline animate() flash loop with a generic scatter probe driven by trainSamples + oodSamples props, a boundaryRange band rendered with cb-fg-subtle dashed guides, and per-sample confidence-modulated alpha plus an uncertainty ring. The accuracy readout collapses the original animated row-by-row "predicted/'???'" reveal into a parallel in-distribution / OOD pair, surfaced both in the header and below the plot.