Distillation Viz
A side-by-side bar chart of the teacher's and student's output distributions over a fixed class set, paired with a distillation temperature T. Both distributions are re-softened by T and renormalised before drawing — the same soft-target trick that powers knowledge distillation. The KL divergence between the two softened distributions (the distillation loss itself) is rendered live above the chart.
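Written out, and assuming the re-softening is the power-and-renormalise step described under "Understanding the component", the two quantities on screen are roughly the following. The symbols $p$, $\tilde{p}$ and $z$ are notation introduced here for illustration, not names from the component.

```latex
\[
  \tilde{p}_i(T) = \frac{p_i^{1/T}}{\sum_j p_j^{1/T}},
  \qquad
  \mathrm{KL}\bigl(\tilde{p}^{\,\mathrm{teacher}} \,\|\, \tilde{p}^{\,\mathrm{student}}\bigr)
    = \sum_i \tilde{p}_i^{\,\mathrm{teacher}}(T)\,
      \ln \frac{\tilde{p}_i^{\,\mathrm{teacher}}(T)}{\tilde{p}_i^{\,\mathrm{student}}(T)}
\]

% If the probabilities came from logits, p = softmax(z), the partition
% function cancels in the renormalisation, so re-softening the
% probabilities matches the usual temperature-scaled softmax:
\[
  p_i = \frac{e^{z_i}}{\sum_k e^{z_k}}
  \;\Longrightarrow\;
  \tilde{p}_i(T) = \frac{e^{z_i/T}}{\sum_j e^{z_j/T}}
\]
```

The second line is why the component can accept probabilities rather than logits without changing what the temperature does.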
Installation
npx shadcn@latest add https://craftbits.dev/r/distillation-viz.json
Usage
import { DistillationViz } from "@craft-bits/core";
<DistillationViz
teacherProbs={[0.7, 0.2, 0.05, 0.03, 0.02]}
studentProbs={[0.55, 0.18, 0.12, 0.1, 0.05]}
labels={["cat", "dog", "fox", "fish", "bird"]}
defaultTemperature={1}
/>
Drive the temperature from outside (parent scrubber, animation, or another control):
import { useState } from "react";
// Inside your component:
const [t, setT] = useState(2);
<DistillationViz
teacherProbs={teacherProbs}
studentProbs={studentProbs}
temperature={t}
onTemperatureChange={setT}
tempRange={[1, 10]}
/>
Use it as a pure visualisation by hiding the embedded slider and KL readout:
<DistillationViz
teacherProbs={teacherProbs}
studentProbs={studentProbs}
temperature={4}
showTemperatureSlider={false}
showKlDivergence={false}
/>
Understanding the component
- Two mirrored panels. Teacher bars grow leftward from the centre; student bars grow rightward. The class labels run down the central gutter so the eye can pair each teacher bar with its student counterpart in one saccade. Bar widths animate with `SPRINGS.smooth` so dragging the temperature slider feels continuous.
- Temperature softening. Both distributions are raised to the power `1 / T` and renormalised. `T = 1` is the identity. `T > 1` flattens both: the teacher's "dark knowledge" over the non-target classes lights up, which is the regime knowledge distillation is run in. `T < 1` sharpens toward one-hot. The math is performed in log-space with the standard max-subtraction trick so aggressive `T` never overflows `exp`.
- KL readout. `KL(teacher ‖ student)` in nats is computed on the softened distributions, the same quantity Hinton's distillation loss minimises. The readout uses `aria-live="polite"` so screen-reader users hear it update as the slider moves.
- Delta arrows in the gutter. When the student over- or under-shoots the teacher for a given class by more than 1%, a small accent-coloured triangle points toward the side that's higher. Sub-percentage moves stay quiet so the chart doesn't strobe.
- Robust to unnormalised input. The component renormalises `teacherProbs` and `studentProbs` to sum to `1` before drawing, so callers can pass raw scores without worrying about exact totals.
- Controlled + uncontrolled. Pass `temperature` + `onTemperatureChange` to drive `T` from outside; omit both and let the component own its state via `defaultTemperature`. The slider can also be hidden entirely via `showTemperatureSlider={false}` for embed scenarios.
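The softening, KL and delta-arrow steps above can be sketched in a few lines of TypeScript. This is a minimal illustration, not the component's source: the names `softenProbs`, `klDivergence` and `deltaDirection` are hypothetical, the 1% threshold is assumed to be an absolute difference in probability, and comparing the softened values (rather than the raw inputs) is also an assumption.

```typescript
/** Re-soften a probability vector by T: q_i is proportional to p_i^(1/T). */
export function softenProbs(probs: readonly number[], temperature: number): number[] {
  if (!(temperature > 0)) throw new RangeError("temperature must be > 0");
  // Work in log-space: log(p_i) / T. Non-positive entries act as zero mass.
  const scaled = probs.map((p) => (p > 0 ? Math.log(p) / temperature : -Infinity));
  const max = Math.max(...scaled);
  if (max === -Infinity) throw new RangeError("need at least one positive entry");
  // Max-subtraction: the largest exponent becomes 0, so exp() cannot overflow.
  const exps = scaled.map((s) => Math.exp(s - max));
  const total = exps.reduce((a, b) => a + b, 0);
  // Dividing by the total also absorbs any constant scale on the input,
  // so unnormalised scores give the same result as pre-normalised ones.
  return exps.map((e) => e / total);
}

/** KL(p || q) in nats. Terms with p_i = 0 contribute nothing. */
export function klDivergence(p: readonly number[], q: readonly number[]): number {
  if (p.length !== q.length) throw new RangeError("length mismatch");
  let kl = 0;
  p.forEach((pi, i) => {
    const qi = q[i] ?? 0;
    if (pi > 0) kl += pi * Math.log(pi / qi); // Infinity if qi is 0 here
  });
  return kl;
}

/** Which side a delta arrow would point to, or null when within the threshold. */
export function deltaDirection(
  teacher: number,
  student: number,
  threshold = 0.01, // assumed: absolute probability difference
): "teacher" | "student" | null {
  const diff = student - teacher;
  if (Math.abs(diff) <= threshold) return null;
  return diff > 0 ? "student" : "teacher";
}

// Usage with the vectors from the Usage section:
const teacher = softenProbs([0.7, 0.2, 0.05, 0.03, 0.02], 1);
const student = softenProbs([0.55, 0.18, 0.12, 0.1, 0.05], 1);
console.log(klDivergence(teacher, student).toFixed(3)); // "0.092"
```

Because the final division absorbs any constant factor, the same function covers the "robust to unnormalised input" behaviour without a separate normalisation pass.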
Props
| Prop | Type | Default | Description |
|---|---|---|---|
| `teacherProbs` | `readonly number[]` | - | Teacher output distribution. Renormalised before display. |
| `studentProbs` | `readonly number[]` | - | Student output distribution. Same length as teacher. Renormalised before display. |
| `labels` | `readonly string[]` | indices | Class labels in the centre gutter; falls back to the index when missing. |
| `temperature` | `number` | - | Controlled `T > 0`. Pair with `onTemperatureChange`. |
| `defaultTemperature` | `number` | `1` | Uncontrolled initial `T`. |
| `onTemperatureChange` | `(t: number) => void` | - | Fires whenever the slider commits a new `T`. |
| `tempRange` | `readonly [number, number]` | `[0.5, 8]` | Slider extents. Both must be `> 0`, `min < max`. |
| `showTemperatureSlider` | `boolean` | `true` | Render the embedded temperature slider. |
| `showKlDivergence` | `boolean` | `true` | Render the `KL = ... nats` readout. |
| `transition` | `Transition` | `SPRINGS.smooth` | Spring for bar-width transitions. |
| `className` | `string` | - | Merged onto the root `<div>` via `cn()`. |
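The constraints in the table (equal-length vectors, positive and ordered slider extents, index fallback for labels) could be checked before rendering along these lines. This is a sketch under stated assumptions: `DistillationInputs`, `validateInputs` and `labelFor` are hypothetical names, the component's own handling of bad input is not documented here, and the zero-based fallback label is a guess at what "the index" means.

```typescript
// Hypothetical shape covering only the props this check looks at.
export interface DistillationInputs {
  teacherProbs: readonly number[];
  studentProbs: readonly number[];
  labels?: readonly string[];
  tempRange?: readonly [number, number];
}

/** Returns a list of human-readable problems; an empty list means the input looks usable. */
export function validateInputs(input: DistillationInputs): string[] {
  const problems: string[] = [];
  // Default taken from the props table.
  const range: readonly [number, number] = input.tempRange ?? [0.5, 8];
  const [min, max] = range;

  if (input.teacherProbs.length !== input.studentProbs.length) {
    problems.push("teacherProbs and studentProbs must have the same length");
  }
  if (!(min > 0 && max > 0)) {
    problems.push("tempRange extents must both be > 0");
  }
  if (!(min < max)) {
    problems.push("tempRange must satisfy min < max");
  }
  return problems;
}

/** Label for class i, falling back to the (assumed zero-based) index. */
export function labelFor(labels: readonly string[] | undefined, index: number): string {
  return labels?.[index] ?? String(index);
}

// Usage: log problems in development before mounting the component.
const problems = validateInputs({
  teacherProbs: [0.7, 0.2, 0.1],
  studentProbs: [0.6, 0.4],
  tempRange: [1, 10],
});
problems.forEach((message) => console.warn(message));
```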
Accessibility
- The chart is wrapped in `role="figure"` with a dynamic `aria-label` ("Teacher vs student distributions at temperature 1.00. Teacher peaks at cat (70%). KL divergence 0.092 nats.") so screen-reader users get the same headline as sighted users.
- The temperature slider is a native `<input type="range">`, with full keyboard support out of the box (`Arrow` keys step, `Page Up` / `Page Down` step by 10%, `Home` / `End` jump to the extents).
- The KL readout uses `aria-live="polite"` so changes are announced as the user drags.
- Bar widths animate with `SPRINGS.smooth`; reduced-motion users snap to the new values instantly.
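A dynamic label in the format quoted above could be assembled like this. The function name `buildAriaLabel` and its parameters are hypothetical, and the rounding choices (two decimals for `T`, whole percent for the peak, three decimals for KL) are inferred from the quoted example rather than taken from the component.

```typescript
/** Build a one-sentence summary of the chart for assistive technology. */
export function buildAriaLabel(
  teacher: readonly number[],
  labels: readonly string[],
  temperature: number,
  kl: number,
): string {
  // Find the index of the teacher's largest probability.
  let peak = 0;
  teacher.forEach((p, i) => {
    if (p > (teacher[peak] ?? -Infinity)) peak = i;
  });
  const name = labels[peak] ?? String(peak); // fall back to the index
  const percent = Math.round((teacher[peak] ?? 0) * 100);
  return (
    `Teacher vs student distributions at temperature ${temperature.toFixed(2)}. ` +
    `Teacher peaks at ${name} (${percent}%). ` +
    `KL divergence ${kl.toFixed(3)} nats.`
  );
}
```

Keeping the label a single string derived from the same values as the visible readout is one way to ensure the spoken and visual summaries do not drift apart.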
Credits
- Extracted from: `craftingattention` (`app/src/lessons/primitives/viz/DistillationViz.tsx`). Stripped the lesson-specific Explore / Predict / Challenge mode strip, alpha (hard-vs-soft loss) slider, hard-label cross-entropy readout, `Widget` chrome, model-stack illustrations, and history/bookmark plumbing; generalised to a single visualisation primitive that takes two probability vectors as input. Re-cast the math from "logits + softmax" to "probabilities + temperature re-softening" so callers can plug in any distribution. Added the delta arrows in the centre gutter, renormalisation for unnormalised input, and the log-space max-subtraction trick for numerical stability under aggressive `T`. Replaced the inline spring with `SPRINGS.smooth` from `@craft-bits/core/motion`.
- Inspiration: Hinton, Vinyals, Dean, "Distilling the Knowledge in a Neural Network" (2015).