Distillation Viz

A side-by-side bar chart of the teacher's and student's output distributions over a fixed class set, paired with a distillation temperature T. Both distributions are re-softened by T and renormalised before drawing, the same soft-target trick used in knowledge distillation. The KL divergence between the two softened distributions (the soft-target term of the distillation loss, up to a constant that does not depend on the student) is rendered live above the chart.
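
Written out, the softening and the readout look roughly like this. The symbols p (teacher), q (student) and the tilde for "softened at temperature T" are notation introduced here for illustration, not names used by the component:

```latex
\tilde{p}_i = \frac{p_i^{1/T}}{\sum_j p_j^{1/T}}, \qquad
\tilde{q}_i = \frac{q_i^{1/T}}{\sum_j q_j^{1/T}}, \qquad
\mathrm{KL}(\tilde{p} \,\|\, \tilde{q}) = \sum_i \tilde{p}_i \,\ln\frac{\tilde{p}_i}{\tilde{q}_i}
```

The natural logarithm gives the result in nats, matching the unit shown in the readout.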

[Interactive demo: mirrored teacher / student bar chart at T = 1.00, KL(teacher ‖ student) = 0.092 nats. Teacher / student: cat 70.0% / 55.0%, dog 20.0% / 18.0%, fox 5.0% / 12.0%, fish 3.0% / 10.0%, bird 2.0% / 5.0%.]

Installation

npx shadcn@latest add https://craftbits.dev/r/distillation-viz.json

Usage

import { DistillationViz } from "@craft-bits/core";
 
<DistillationViz
  teacherProbs={[0.7, 0.2, 0.05, 0.03, 0.02]}
  studentProbs={[0.55, 0.18, 0.12, 0.1, 0.05]}
  labels={["cat", "dog", "fox", "fish", "bird"]}
  defaultTemperature={1}
/>

Drive the temperature from outside (parent scrubber, animation, or another control):

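import { useState } from "react";
 
// Inside a component body: the parent owns T and passes it down.
const [t, setT] = useState(2);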
 
<DistillationViz
  teacherProbs={teacherProbs}
  studentProbs={studentProbs}
  temperature={t}
  onTemperatureChange={setT}
  tempRange={[1, 10]}
/>

Use it as a pure visualisation by hiding the embedded slider and KL readout:

<DistillationViz
  teacherProbs={teacherProbs}
  studentProbs={studentProbs}
  temperature={4}
  showTemperatureSlider={false}
  showKlDivergence={false}
/>
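
If your model emits logits rather than probabilities, convert them before passing them in. The sketch below is one way to do that; `softmax` is a hypothetical helper written for this example (it is not exported by @craft-bits/core), and the logit values are made up. Because softmax(z / T) is the same distribution as softmax(z) raised to 1 / T and renormalised, the component's temperature control should then behave like the usual logit-space temperature.

```typescript
// Hypothetical helper, not part of @craft-bits/core.
// Numerically stable softmax: subtracting the max leaves the result unchanged
// but keeps every exponent <= 0, so Math.exp cannot overflow.
function softmax(logits: readonly number[]): number[] {
  const max = Math.max(...logits);
  const exps = logits.map((z) => Math.exp(z - max));
  const total = exps.reduce((sum, e) => sum + e, 0);
  return exps.map((e) => e / total);
}

// Made-up logits for five classes.
const teacherProbs = softmax([2.0, 0.75, -0.6, -1.1, -1.5]);
const studentProbs = softmax([1.4, 0.3, -0.1, -0.3, -1.0]);

// Then, in JSX:
// <DistillationViz teacherProbs={teacherProbs} studentProbs={studentProbs} />
```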

Understanding the component

  1. Two mirrored panels. Teacher bars grow leftward from the centre; student bars grow rightward. The class labels run down the central gutter so the eye can pair each teacher bar with its student counterpart in one saccade. Bar widths animate with SPRINGS.smooth so dragging the temperature slider feels continuous.
  2. Temperature softening. Each probability in both distributions is raised to the power 1 / T and the result is renormalised. T = 1 is the identity. T > 1 flattens both distributions, so the teacher's "dark knowledge" over the non-target classes becomes visible; this is the regime knowledge distillation runs in. T < 1 sharpens both toward one-hot. The math is performed in log-space with the standard max-subtraction trick, so an extreme T cannot overflow exp or underflow every term to zero.
  3. KL readout. KL(teacher ‖ student) in nats is computed on the softened distributions — the same quantity Hinton's distillation loss minimises. The readout uses aria-live="polite" so screen-reader users hear it update as the slider moves.
  4. Delta arrows in the gutter. When the student over- or under-shoots the teacher for a given class by more than 1%, a small accent-coloured triangle points toward the side that's higher. Sub-percentage moves stay quiet so the chart doesn't strobe.
  5. Robust to unnormalised input. The component renormalises teacherProbs and studentProbs to sum to 1 before drawing, so callers can pass raw scores without worrying about exact totals.
  6. Controlled + uncontrolled. Pass temperature + onTemperatureChange to drive T from outside; omit both and let the component own its state via defaultTemperature. The slider can also be hidden entirely via showTemperatureSlider={false} for embed scenarios.
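
Items 2 to 4 above can be sketched as plain functions. This is a minimal illustration under stated assumptions, not the component's source: the names `soften`, `klDivergence` and `deltaDirection` are hypothetical, inputs are assumed to be non-negative, and the 1% arrow threshold is read here as an absolute difference of 0.01 in probability.

```typescript
// Hypothetical helpers; names and details are illustrative only.

/** Temperature re-softening: q_i proportional to p_i^(1/T), computed in log-space. */
function soften(scores: readonly number[], temperature: number): number[] {
  if (!(temperature > 0)) throw new Error("temperature must be > 0");
  // log(p^(1/T)) = log(p) / T. Math.log(0) is -Infinity, which exp maps back to 0.
  // A common scale factor on the inputs cancels in the final division,
  // so unnormalised non-negative scores are fine here.
  const scaled = scores.map((p) => Math.log(p) / temperature);
  const max = Math.max(...scaled);
  if (!Number.isFinite(max)) throw new Error("need at least one positive finite score");
  // After subtracting the max, the largest term is exp(0) = 1:
  // no overflow, and the sum below is at least 1.
  const weights = scaled.map((x) => Math.exp(x - max));
  const total = weights.reduce((sum, w) => sum + w, 0);
  return weights.map((w) => w / total);
}

/** KL(p || q) in nats. Terms with p_i = 0 contribute 0 by convention. */
function klDivergence(p: readonly number[], q: readonly number[]): number {
  if (p.length !== q.length) throw new Error("length mismatch");
  let kl = 0;
  for (let i = 0; i < p.length; i++) {
    if (p[i] === 0) continue;
    kl += p[i] * Math.log(p[i] / q[i]);
  }
  return kl;
}

/** Which side the gutter arrow points to, or null when the gap is within the threshold. */
function deltaDirection(
  teacher: number,
  student: number,
  threshold = 0.01, // assumption: "1%" means 0.01 in absolute probability
): "teacher" | "student" | null {
  const diff = student - teacher;
  if (Math.abs(diff) <= threshold) return null;
  return diff > 0 ? "student" : "teacher";
}

// The distributions from the Usage section, at T = 1 (identity).
const teacher = soften([0.7, 0.2, 0.05, 0.03, 0.02], 1);
const student = soften([0.55, 0.18, 0.12, 0.1, 0.05], 1);
console.log(klDivergence(teacher, student).toFixed(3)); // "0.092"
```

Raising T in the two `soften` calls shrinks the printed value toward 0, since both distributions flatten toward uniform.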

Props

| Prop | Type | Default | Description |
| --- | --- | --- | --- |
| teacherProbs | readonly number[] | - | Teacher output distribution. Renormalised before display. |
| studentProbs | readonly number[] | - | Student output distribution. Same length as teacher. Renormalised before display. |
| labels | readonly string[] | indices | Class labels in the centre gutter; falls back to the index when missing. |
| temperature | number | - | Controlled T > 0. Pair with onTemperatureChange. |
| defaultTemperature | number | 1 | Uncontrolled initial T. |
| onTemperatureChange | (t: number) => void | - | Fires whenever the slider commits a new T. |
| tempRange | readonly [number, number] | [0.5, 8] | Slider extents. Both must be > 0, min < max. |
| showTemperatureSlider | boolean | true | Render the embedded temperature slider. |
| showKlDivergence | boolean | true | Render the KL = ... nats readout. |
| transition | Transition | SPRINGS.smooth | Spring for bar-width transitions. |
| className | string | - | Merged onto the root <div> via cn(). |

Accessibility

  • The chart is wrapped in role="figure" with a dynamic aria-label ("Teacher vs student distributions at temperature 1.00. Teacher peaks at cat (70%). KL divergence 0.092 nats.") so screen-reader users get the same headline as sighted users.
  • The temperature slider is a native <input type="range"> — full keyboard support out of the box (Arrow keys step, Page Up / Page Down step by 10%, Home / End jump to the extents).
  • The KL readout uses aria-live="polite" so changes are announced as the user drags.
  • Bar widths animate with SPRINGS.smooth; reduced-motion users snap to the new values instantly.
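
A label in the format quoted above could be assembled along these lines. This is a sketch only: `describeChart` is a hypothetical function, the rounding choices (two decimals for T, whole percent for the peak, three decimals for KL) are inferred from the quoted example, and the KL value is assumed to be computed elsewhere.

```typescript
// Hypothetical label builder; not the component's actual implementation.
function describeChart(
  teacher: readonly number[],
  labels: readonly string[],
  temperature: number,
  kl: number,
): string {
  // Find the index of the teacher's most probable class.
  let peak = 0;
  teacher.forEach((p, i) => {
    if (p > teacher[peak]) peak = i;
  });
  // Fall back to the index when no label is supplied for that class.
  const name = labels[peak] ?? String(peak);
  const percent = Math.round(teacher[peak] * 100);
  return (
    `Teacher vs student distributions at temperature ${temperature.toFixed(2)}. ` +
    `Teacher peaks at ${name} (${percent}%). ` +
    `KL divergence ${kl.toFixed(3)} nats.`
  );
}

const exampleKl = 0.092; // hypothetical value, computed elsewhere
const ariaLabel = describeChart(
  [0.7, 0.2, 0.05, 0.03, 0.02],
  ["cat", "dog", "fox", "fish", "bird"],
  1,
  exampleKl,
);
console.log(ariaLabel);
```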

Credits

  • Extracted from: craftingattention (app/src/lessons/primitives/viz/DistillationViz.tsx). Stripped the lesson-specific Explore / Predict / Challenge mode strip, alpha (hard-vs-soft loss) slider, hard-label cross-entropy readout, Widget chrome, model-stack illustrations, and history/bookmark plumbing; generalised to a single visualisation primitive that takes two probability vectors as input. Re-cast the math from "logits + softmax" to "probabilities + temperature re-softening" so callers can plug in any distribution. Added the delta arrows in the centre gutter, renormalisation for unnormalised input, and the log-space max-subtraction trick for numerical stability under aggressive T. Replaced the inline spring with SPRINGS.smooth from @craft-bits/core/motion.
  • Inspiration: Hinton, Vinyals, Dean — Distilling the Knowledge in a Neural Network (2015).