Tokenizer Viz
A three-mode BPE tokenizer playground. Type into the textarea or pick one of six presets (English, rare long word, CJK, JSON, Python) and watch the text shatter into subword tokens, each chip coloured from the semantic palette. Stat readouts surface token count, character count, chars-per-token compression, and an approximate dollar cost at a configurable rate per million input tokens.
Three modes:
- Explore — free-form input plus six presets that demonstrate compression edge cases.
- Predict — five rounds where you guess the token count for a string before the reveal.
- Challenge — four multiple-choice diagnostics on tokenization trade-offs (CJK vs English, JSON overhead, whitespace cost, common-word compression).
Common English words are typically 1 token each. This 11-character text compresses to just 2 tokens — excellent efficiency.
11 of 500 characters
Hello␣world
2tokens11characters5.5chars/token≈ <$0.0001at $3/M input tokens
Common English words are typically 1 token each. This 11-character text compresses to just 2 tokens — excellent efficiency.
Customize
Tokenizer
$3.00
500 chars
Installation
npx shadcn@latest add https://craftbits.dev/r/tokenizer-viz.jsonUsage
import { TokenizerViz } from "@craft-bits/viz/tokenizer-viz";
<TokenizerViz />Bring your own presets and quiz rounds:
<TokenizerViz
presets={[
{
label: "My example",
text: "tokens are fun",
tokens: ["tokens", " are", " fun"],
narration: "Common short words map cleanly to single tokens.",
},
]}
predictRounds={[
{
text: "Hello world",
tokens: ["Hello", " world"],
options: [1, 2, 5, 11],
correctIdx: 1,
explanation: "Common English words map to single tokens.",
},
]}
/>Set your model's actual rate so the cost readout matches your bill:
<TokenizerViz costPerMillionTokens={0.5} />Subscribe to score events:
<TokenizerViz
onModeComplete={({ mode, correct, total }) => {
/* lift the totals into your own analytics */
}}
/>Understanding the component
- Mode tabs. Three semantic-coloured pill buttons drive
data-modeon the root and crossfade between sub-views. Each switch resets per-mode state so a fresh predict run never inherits a stale answer. - Token chips. The explore-mode chip row is a
popLayoutAnimatePresence: each chip pops in with a tiny blur-out spring tinted by its index into the six-slot semantic colour wheel (cb-accent/cb-warning/cb-success/cb-info/cb-error/cb-fg-muted). Hover scales the chip by 5 %; thetitleattribute exposes the literal token text and length. - Stat row. Token count, character count, chars-per-token, and cost are each wrapped in their own
AnimatePresencemode="wait"so only the value that changed flips — preventing the whole row from re-mounting on every keystroke. Numbers usefont-variant-numeric: tabular-nums. - Heuristic tokenizer. The exported
tokenizeWithPresetshelper short-circuits on exact-preset matches (so the demo stays deterministic) and otherwise runs a teaching-grade BPE approximation: common words map to single tokens, rare words shatter at vowel-consonant boundaries, CJK characters are one-per-token, punctuation is its own token. It is not a real tiktoken implementation — it reproduces the qualitative behaviour learners care about without bundling a vocab. - Predict mode. Each round shows the text in a monospace box plus four numeric options. After checking, the correct option scale-bumps, the wrong one shake-x's, the full token breakdown reveals beneath, and the explanation slides in.
- Reduced motion. Under
prefers-reduced-motion: reduce, every entrance animation, the chip pop-in, the wrong-answer shake, the correct-answer scale-bump, and the mode crossfade collapse to instant transitions.
Props
| Prop | Type | Default | Description |
|---|---|---|---|
presets | readonly TokenizerVizPreset[] | 6 presets | Drop-in strings for the explore-mode chip bar. |
predictRounds | readonly TokenizerVizPredictRound[] | 5 rounds | "How many tokens?" MCQ rounds. |
challengeRounds | readonly TokenizerVizChallengeRound[] | 4 rounds | Multiple-choice diagnostics on tokenization trade-offs. |
defaultMode | "explore" | "predict" | "challenge" | "explore" | Mode visible on first render. |
maxInputLength | number | 500 | Hard cap on the explore-mode textarea. |
costPerMillionTokens | number | 3 | Dollars per million input tokens. |
transition | Transition | SPRINGS.snap | Override the entrance spring for tokens and stat readouts. |
onModeChange | (mode) => void | — | Fires when the active mode changes. |
onInputChange | (text: string) => void | — | Fires on every explore-mode keystroke. |
onModeComplete | (score) => void | — | Fires when a predict / challenge run finishes. |
className | string | — | Merged onto the root via cn(). |
Accessibility
- The three mode buttons are a
role="group"witharia-label="Tokenizer mode"andaria-pressedon the active one. - A polite live region (
aria-live="polite") announces the current narration / round / explanation / final score on every state change. - Mode switches focus the mode-content wrapper so screen readers re-read the new pane.
- The textarea has a visible
:focusring and a hidden character-count hint exposed viaaria-describedby. - The token strip is a
role="region"with anaria-labelnaming the token count; each chip carries atitlewith the literal token text and length. - Each option button in predict / challenge has
aria-pressed, a visible focus ring, and meets the 36-pixel minimum hit area. - Colour is never the only signal — correct / wrong answers also carry an explicit
FeedbackBadge("Correct!" / "Not quite") and a score-dots row withrole="img". - Motion respects
prefers-reduced-motion: reduce— every entrance, the chip pop-in, the wrong-answer shake, the correct-answer scale-bump, and the mode crossfade collapse to instant transitions.
Credits
- Extracted from:
craftingattention(app/src/lessons/primitives/systems/TokenizerViz.tsx). The source was a lesson component wired toWidget/useWidgetHistory/ModeStrip/ChallengeBtn/FeedbackBadge/ScoreDotsprimitives in@/lessons/primitives/chrome/*, used CA palette tokens (--color-accent-400,--color-warn-400,--color-success-400,--color-fail-400,--color-ink-*,--color-surface-*,ca-narration) for the token-chip wheel, and called CA-specific spring aliases (SPRINGS.snappy,SPRINGS.gentle,STAGGER.tight,TIMING.wrong.shakeSpring,TIMING.correct.scaleBounce,MICRO.tap). The craft-bits extract drops the lesson chrome (no widget shell, no undo/redo history — pureuseState), re-keys the eight-slot CA palette wheel to a six-slot semanticcb-*wheel so consumer themes repaint freely, and routes every transition through the canonicalSPRINGS.snap/SPRINGS.smooth/SPRINGS.bouncy/TAP_SCALE/STAGGERfrom@craft-bits/core/motion. The hard-coded preset list, predict rounds, and challenge rounds are now props with the originals exposed asTOKENIZER_VIZ_DEFAULT_*constants. The mode strip, primary / secondary buttons, feedback badge, score dots, numeric option, and text option are inlined as private sub-components consumingcb-*tokens.forwardRef+cn()+...propsspread were added;lessonId/SvgLabel/ChallengeBtnlesson chrome was stripped.