Pretext Lab
15 · Streaming chat
Arc 5 · Rich Inline

Streaming chat

Every LLM chat interface faces the same problem: tokens arrive one at a time, and the assistant's bubble has to grow without jostling everything above it. The naive version reads the bubble's DOM height after each token insertion; the browser obliges, and the scroll pins wobble, the typing indicator jumps, the sibling messages nudge. Pretext lets you know the height before the token is on the page.

Press start stream. On the left, the bubble's height comes from the DOM after each token — watch it jitter. On the right, Pretext sizes the bubble before the token renders. The text arrives inside a box that already knows how tall it is.

[Interactive demo: both panes stream the same reply to the prompt "recite a Dickinson", words arriving at the same cadence (80 ms). Left, "DOM measurement — height after insert": the height readout trails each token and the bubble chases the text. Right, "Pretext pre-sized — height before insert": the bubble is already tall enough when each word lands.]

Mechanism

On the right pane, the token loop looks like this: take the message so far (including the new token), call prepare(message, font), call layout(handle, bubbleMaxWidth, lineHeight), assign the returned height to the bubble before inserting the new token into the DOM. The bubble is already tall enough. When the token appears, nothing above it moves.

The cost profile stays honest. prepare() is the expensive one, and we run it every token — but the text is short and the handle is small, so per-token preparation is well under a frame on any modern machine. In production you'd also cache by a prefix: if the previous call's text is a prefix of this one, you can often skip re-preparing. For the lesson, we re-prepare, and it still stays buttery.
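
A minimal sketch of that prefix cache. Everything here is hypothetical: `prepare` is a stub that only counts calls, standing in for Pretext's real shaping pass, and the cache is the simplest possible single-slot version.

```javascript
// Stub for Pretext's prepare(): counts calls so the cache's effect is visible.
// (Hypothetical helper; the real prepare() does the expensive text shaping.)
let prepareCalls = 0;
const prepare = (text, font) => { prepareCalls += 1; return { text, font }; };

// Single-slot cache keyed on the exact text of the previous call.
// A real incremental cache would go further: reuse the shaped lines shared
// with the previous prefix and re-shape only the tail line the token lands on.
let last = { text: null, handle: null };
function prepareCached(text, font) {
  if (text === last.text) return last.handle; // exact hit: skip shaping
  last = { text, handle: prepare(text, font) };
  return last.handle;
}

const FONT = "400 17px serif";
prepareCached("I'm Nobody!", FONT);     // miss: shapes the text
prepareCached("I'm Nobody!", FONT);     // hit: no new prepare() call
prepareCached("I'm Nobody! Who", FONT); // miss: the text grew
console.log(prepareCalls); // 2
```

The single-slot shape fits streaming well: each token produces a text that either repeats or extends the previous one, so one remembered entry is all the history the loop can use.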

On the left pane, we insert the token first, then read the bubble's offsetHeight. Each read flushes pending layout — and this is the cost Lesson 1 warned about, made visible in an interface everyone recognizes.
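
That read-after-write cost can be simulated outside the browser. This is a sketch, not real DOM code: the stub element's `offsetHeight` getter stands in for the synchronous layout flush a browser performs, and simply counts how many the left pane's pattern would trigger.

```javascript
// Stub element: in a real browser, reading offsetHeight after a write forces
// a synchronous layout pass. Here the getter just counts those reads.
let forcedLayouts = 0;
const bubble = {
  text: '',
  get offsetHeight() {
    forcedLayouts += 1; // one forced reflow per read, in browser terms
    return 24 * Math.ceil(this.text.length / 40); // fake wrapped height
  },
};

const tokens = ['Because', ' I', ' could', ' not', ' stop', ' for', ' Death'];
for (const token of tokens) {
  bubble.text += token;               // 1. write: token enters the "DOM"
  const height = bubble.offsetHeight; // 2. read: flushes layout after insert
  // Only now is the height known, so anything anchored to the bubble has
  // already shifted by the time we can react to `height`.
}
console.log(forcedLayouts); // 7: one flush per token
```

One flush per token is exactly the write-read-write-read interleaving that layout thrashing names; the right pane's loop never reads, so it never flushes.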

Application

Streaming UIs that feel still are a palpable unlock. The effect is difficult to describe but impossible to unsee: users feel the difference even when they can't name it.

"I'm Nobody! Who are you?
Are you — Nobody — too?
Then there's a pair of us!
Don't tell! they'd advertise — you know!"

Emily Dickinson, poem 288 (posthumously published 1891)

Direct Claude

"chat that stays still during streaming" measure-before-insert with prepare() + layout() in the token loop "bubble pre-sized per token" re-prepare + layout each token; set bubble.style.height before rendering "no scroll jitter on token arrival" assign the height returned by layout() ahead of the DOM write "growing message box, nothing above it moves" avoid any offsetHeight read inside the stream handler
Arc 5 closes here; Arc 6 starts with a different kind of non-static layout: the width itself varies per line.
the token loop — pre-size before insert
import { prepare, layout } from '@chenglou/pretext';

const FONT = "400 17px 'EB Garamond', serif";
const LINE_HEIGHT = 17 * 1.5;
// Assumed context: bubble (the sized container), body (the text node inside
// it), bubbleMaxWidth, and stream come from the surrounding page.
const BUBBLE_PADDING_Y = 16; // illustrative: the bubble's top + bottom padding, px

let assembled = '';
for await (const token of stream) {
  assembled += token;

  // 1. Pretext: measure in JS, before touching the DOM.
  const handle = prepare(assembled, FONT);
  const { height } = layout(handle, bubbleMaxWidth, LINE_HEIGHT);

  // 2. Size the bubble — so it's already the right height when the token lands.
  bubble.style.height = (height + BUBBLE_PADDING_Y) + 'px';

  // 3. Now, and only now, write the token into the DOM.
  body.textContent = assembled;
}