Pretext Lab
Arc 3 · Whitespace & Breaks

07 · Word-break keep-all

For line breaking, Chinese, Japanese, and Korean text is treated as a run of word-like units, one per character. The browser's default behavior is to allow a break between any two CJK characters. That is fine for filling lines, but it can shatter compounds that a native reader expects to stay together. A second option on prepare(), wordBreak: 'keep-all', tells Pretext to refuse those mid-compound breaks. Latin words in the same text continue to break normally.

The same passage rendered twice. Narrow the width and watch the two panes diverge.

Mechanism

When Pretext segments the prepared text, it tags each segment with a break kind. Under normal word-break, a CJK run gets an implicit zero-width break opportunity between every pair of graphemes, the same rule the browser applies under word-break: normal, so a wrap can fall anywhere in the run.
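A minimal sketch of that tagging step, built on simplifying assumptions rather than Pretext's actual segmenter: code points stand in for graphemes, only a few core Unicode blocks count as CJK, and the names tagSegments, Segment, and Kind are hypothetical. Latin letters accumulate into one word segment while every CJK character stays on its own, which is what leaves a potential zero-width break between each pair.

```typescript
// Simplified model of segment tagging; not Pretext's internal segmenter.
type Kind = "cjk" | "word" | "space";

interface Segment {
  text: string;
  kind: Kind;
}

// Assumed CJK test covering a few core Unicode blocks only.
function isCJK(ch: string): boolean {
  const cp = ch.codePointAt(0) ?? 0;
  return (
    (cp >= 0x4e00 && cp <= 0x9fff) || // CJK Unified Ideographs
    (cp >= 0x3040 && cp <= 0x30ff) || // Hiragana and Katakana
    (cp >= 0xac00 && cp <= 0xd7af) // Hangul Syllables
  );
}

function tagSegments(text: string): Segment[] {
  const out: Segment[] = [];
  // Code points stand in for graphemes (assumes no combining sequences).
  for (const ch of Array.from(text)) {
    const kind: Kind = /\s/.test(ch) ? "space" : isCJK(ch) ? "cjk" : "word";
    const last = out[out.length - 1];
    if (kind === "word" && last !== undefined && last.kind === "word") {
      // Non-CJK characters accumulate into a single word segment.
      last.text += ch;
    } else {
      // Every CJK character (and every space) is its own segment, so under
      // word-break: normal a zero-width break may fall between any two CJK segments.
      out.push({ text: ch, kind });
    }
  }
  return out;
}

// Mixed input: two CJK characters, a space, then one Latin word.
const segments = tagSegments("道可 web");
console.log(segments.map((s) => `${s.text}:${s.kind}`).join(" | "));
```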

Under wordBreak: 'keep-all', those inter-character break opportunities are suppressed. The engine will then break a CJK run only where it would in any mode: at explicit whitespace (rare in Chinese), at punctuation that permits a break, or at a boundary with a non-CJK script. Latin words embedded in the passage still break on spaces as before; keep-all changes only CJK behavior.
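The difference between the two modes can be modeled as one predicate over adjacent characters. This is a hypothetical sketch, not Pretext's code: breakOpportunities, the CJK ranges, and the set of break-permitting punctuation are assumptions. It returns the indices where a new line may start, and the mode matters only when both neighbors are CJK.

```typescript
// Simplified model of the break rules described above; not Pretext's code.
type WordBreak = "normal" | "keep-all";

// Assumed CJK test covering a few core Unicode blocks only.
function isCJK(ch: string): boolean {
  const cp = ch.codePointAt(0) ?? 0;
  return (
    (cp >= 0x4e00 && cp <= 0x9fff) || // CJK Unified Ideographs
    (cp >= 0x3040 && cp <= 0x30ff) || // Hiragana and Katakana
    (cp >= 0xac00 && cp <= 0xd7af) // Hangul Syllables
  );
}

// Assumed set of full-width punctuation that permits a break after it
// and should never start a line.
const BREAK_AFTER_PUNCT = new Set(["，", "。", "、", "；", "：", "！", "？"]);

// Returns code-point indices i such that a new line may start at chars[i].
function breakOpportunities(text: string, wordBreak: WordBreak): number[] {
  const chars = Array.from(text);
  const breaks: number[] = [];
  for (let i = 1; i < chars.length; i++) {
    const prev = chars[i - 1];
    const next = chars[i];
    // A line never starts with a space or with closing punctuation.
    if (/\s/.test(next) || BREAK_AFTER_PUNCT.has(next)) continue;
    // Explicit whitespace and break-permitting punctuation: honored in both modes.
    let allowed = /\s/.test(prev) || BREAK_AFTER_PUNCT.has(prev);
    if (isCJK(prev) && isCJK(next)) {
      // The inter-character opportunity that keep-all suppresses.
      allowed = allowed || wordBreak === "normal";
    } else if (isCJK(prev) || isCJK(next)) {
      // Boundary between CJK and another script: honored in both modes.
      allowed = true;
    }
    if (allowed) breaks.push(i);
  }
  return breaks;
}

const clause = "道可道，非常道。";
console.log("normal:  ", breakOpportunities(clause, "normal"));
console.log("keep-all:", breakOpportunities(clause, "keep-all"));
```

For 道可道，非常道。 the sketch allows breaks at indices 1, 2, 4, 5 and 6 under normal, but only at 4, right after the full-width comma, under keep-all.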

Both handles are prepared from the same text in the same font, so they share the same measured widths. The divergence between the two panes is therefore a pure arithmetic replay of two different break rules over one set of canvas measurements.
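To make that replay concrete, here is a toy greedy line breaker, not Pretext's layout(): one array of widths is run against two break sets. The uniform 20px advance per full-width character and the hand-derived break indices are assumptions standing in for canvas measurements and the segmenter, and letting an oversized chunk overflow is a simplification of this sketch.

```typescript
// Toy greedy line breaker over pre-measured widths; not Pretext's layout().
// widths[i] is the advance of character i; breaks holds sorted indices where
// a new line may start.
function countLines(widths: number[], breaks: number[], maxWidth: number): number {
  // Cut the text into unbreakable chunks at every break opportunity.
  const cuts = [0, ...breaks, widths.length];
  let lines = 1;
  let lineWidth = 0;
  for (let c = 0; c + 1 < cuts.length; c++) {
    let chunk = 0;
    for (let i = cuts[c]; i < cuts[c + 1]; i++) chunk += widths[i];
    if (lineWidth > 0 && lineWidth + chunk > maxWidth) {
      // The chunk does not fit on the current line: wrap before it.
      lines++;
      lineWidth = 0;
    }
    // A chunk wider than maxWidth simply overflows its line (assumption).
    lineWidth += chunk;
  }
  return lines;
}

// "道可道，非常道。名可名，非常名。": 16 full-width characters, assumed 20px each.
const widths: number[] = new Array<number>(16).fill(20);
// Break indices for that passage, derived by hand from this section's rules.
const normalBreaks = [1, 2, 4, 5, 6, 8, 9, 10, 12, 13, 14];
const keepAllBreaks = [4, 8, 12]; // only after the full-width punctuation
const lineHeight = 36; // 20px font * 1.8, as in the example code

const normalLines = countLines(widths, normalBreaks, 120);
const keepAllLines = countLines(widths, keepAllBreaks, 120);
console.log(`normal: ${normalLines} lines, ${normalLines * lineHeight}px`);
console.log(`keep-all: ${keepAllLines} lines, ${keepAllLines * lineHeight}px`);
```

At a 120px line, normal breaking packs the passage into 3 lines (108px), while keep-all needs 4 (144px): each clause is an unbreakable 80px chunk, and two of them never fit on one line.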

Application

Serving CJK typography that doesn't shatter takes one option, and nothing downstream changes: the pre-measured heights are still exact, so virtualization, masonry, and pre-mount sizing work as before.
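Exact heights are what make windowing possible before anything mounts. A minimal virtual-list sketch, assuming made-up heights in place of the height values layout() would report for each paragraph; prefixOffsets, itemAt, and visibleRange are hypothetical helpers, not part of Pretext.

```typescript
// Minimal virtual-list windowing over pre-computed heights (hypothetical helpers).
// In practice each height would come from layout(handle, width, lh).

// offsets[i] is the top of item i; offsets[n] is the total scroll height.
function prefixOffsets(heights: number[]): number[] {
  const offsets = [0];
  for (const h of heights) offsets.push(offsets[offsets.length - 1] + h);
  return offsets;
}

// Index of the item containing pixel y, by binary search (assumes at least one item).
function itemAt(offsets: number[], y: number): number {
  let lo = 0;
  let hi = offsets.length - 2; // index of the last item
  while (lo < hi) {
    const mid = (lo + hi + 1) >> 1;
    if (offsets[mid] <= y) lo = mid;
    else hi = mid - 1;
  }
  return lo;
}

// First and last items intersecting [scrollTop, scrollTop + viewportHeight).
function visibleRange(offsets: number[], scrollTop: number, viewportHeight: number): [number, number] {
  return [itemAt(offsets, scrollTop), itemAt(offsets, scrollTop + viewportHeight - 1)];
}

// Made-up paragraph heights in px (e.g. 3, 4, 2 and 1 lines at a 36px line height).
const offsets = prefixOffsets([108, 144, 72, 36]);
console.log(visibleRange(offsets, 100, 200));
```

With those heights, a 200px viewport scrolled to 100px covers items 0 through 2, found by binary search over the offsets instead of by measuring the DOM.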

"道可道，非常道。名可名，非常名。"
The way that can be spoken is not the eternal way; the name that can be named is not the eternal name.

Laozi, Dao De Jing (c. 4th c. BCE; trans. public domain)

Direct Claude

"respect the compound" → wordBreak: 'keep-all' on prepare()
"CJK that doesn't shatter" → keep-all handle for every CJK paragraph
"mixed Chinese and English, wrap each correctly" → keep-all; Latin still breaks on spaces
"switch behavior by locale" → pick the option when you call prepare()
Next: breaking words that won't fit any other way, on purpose, with soft hyphens.
CJK-aware wrapping with one option
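import { prepare, layout } from '@chenglou/pretext';

// Sample passage; any CJK string, with or without embedded Latin, works.
const cjkText = '道可道，非常道。名可名，非常名。';
const font = "400 20px 'Songti SC', 'Source Han Serif SC', serif";
const lh = 20 * 1.8; // line height in px: 1.8 × the 20px font size
const width = 260;   // container width in px (example value)

// Normal: break opportunity between any two CJK characters.
const normalHandle = prepare(cjkText, font);

// keep-all: CJK runs stay intact; Latin words still break on spaces.
const keepAllHandle = prepare(cjkText, font, { wordBreak: 'keep-all' });

// Same measurements, different break rules: compare height and lineCount.
const a = layout(normalHandle, width, lh);
const b = layout(keepAllHandle, width, lh);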
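Switching behavior by locale, the last phrase in the Direct Claude list, comes down to choosing that third argument of prepare(). A hypothetical helper follows; optionsForLocale and the set of languages that get keep-all are assumptions, not Pretext defaults, and the returned object only mirrors the { wordBreak: 'keep-all' } shape used in the example.

```typescript
// Hypothetical helper: choose prepare() options from a language tag.
// Which languages get keep-all is an assumption, not a Pretext default.
interface WordBreakOptions {
  wordBreak: "keep-all";
}

const KEEP_ALL_LANGUAGES = new Set(["zh", "ja", "ko"]);

function optionsForLocale(tag: string): WordBreakOptions | undefined {
  // Primary language subtag: "zh-Hant-TW" -> "zh"; POSIX-style "ko_KR" -> "ko".
  const language = tag.trim().toLowerCase().split(/[-_]/)[0];
  // undefined stands for "call prepare() without options", i.e. normal word-break.
  return KEEP_ALL_LANGUAGES.has(language) ? { wordBreak: "keep-all" } : undefined;
}

console.log(optionsForLocale("zh-Hant-TW"), optionsForLocale("en-US"));
```

Each paragraph would then be prepared with prepare(text, font, optionsForLocale(lang)), where lang is whatever tag the app already tracks, such as an element's lang attribute (assuming prepare() treats an undefined options argument the same as an omitted one).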