Emoji + graphemes
A modern family emoji — 👨👩👧👦 — is seven codepoints held together by three zero-width joiners. A flag is two regional-indicator letters. A waving hand with a specific skin tone is the hand plus a Fitzpatrick modifier. Break any of those clusters in the wrong place and the message is no longer the message: half a flag, a stranger's skin tone, a father with no family.
Drag the width slider to narrow the column. The message wraps, but the clusters never do: each compound emoji stays one atomic unit and moves to the next line whole.
Read \u200d, the zero-width joiner, as a "keep together" instruction. The seven codepoints collapse into one grapheme cluster, 👨👩👧👦, and Pretext never breaks inside it.
Mechanism
Inside prepare(), Pretext walks the text with the browser's Intl.Segmenter at grapheme granularity. A grapheme cluster is whatever the Unicode rules treat as one user-perceived character, including ZWJ sequences, regional-indicator pairs (flags), skin-tone modifier sequences, variation-selector pairs, and text-selector clusters. Each cluster is measured once, against the actual canvas font, and cached as a single atomic width.
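To see what grapheme granularity buys you, here is a minimal sketch that counts the same emoji three ways using only the standard Intl.Segmenter. It does not touch Pretext itself; the graphemes() helper is a hypothetical name introduced for illustration, and the result assumes a runtime with reasonably current Unicode data.

```javascript
// The family emoji from the text: four people joined by three ZWJs.
const family = "\u{1F468}\u200D\u{1F469}\u200D\u{1F467}\u200D\u{1F466}";
const flag = "\u{1F1EF}\u{1F1F5}"; // regional indicators J + P
const wave = "\u{1F44B}\u{1F3FD}"; // waving hand + skin-tone modifier

const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });

// Hypothetical helper: split a string into its grapheme clusters.
function graphemes(text) {
  return Array.from(segmenter.segment(text), (s) => s.segment);
}

console.log(family.length);            // UTF-16 code units: 11
console.log([...family].length);       // codepoints: 7
console.log(graphemes(family).length); // grapheme clusters: 1
console.log(graphemes(flag).length);   // 1
console.log(graphemes(wave).length);   // 1
```

Only the last kind of count matches what a reader sees, which is why it is the unit worth measuring and caching.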
The line walker only ever asks the question "does the next unit fit?" about whole clusters. That question cannot be answered mid-cluster, so the wrap cannot happen mid-cluster. A compound emoji too wide for the remaining line length is routed to the next line as one piece; it is never, under any width, split.
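The "does the next unit fit?" loop can be sketched in a few lines. This is not Pretext's implementation: wrapByCluster() and measureCluster() are hypothetical names, the width function is a toy stand-in for real font measurement, and the sketch ignores word-break opportunities so that the one property under discussion stays visible.

```javascript
const graphemeSegmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });

// Toy stand-in for font measurement (an assumption, not a real metric):
// clusters that start outside the Basic Multilingual Plane, where most
// emoji live, count as 2 units wide; everything else counts as 1.
function measureCluster(cluster) {
  return cluster.codePointAt(0) > 0xffff ? 2 : 1;
}

// Greedy wrap that only ever considers whole grapheme clusters.
function wrapByCluster(text, maxWidth, measure = measureCluster) {
  const lines = [];
  let line = "";
  let lineWidth = 0;
  for (const { segment } of graphemeSegmenter.segment(text)) {
    const width = measure(segment);
    // The fit question is asked once per cluster, never inside one.
    if (line !== "" && lineWidth + width > maxWidth) {
      lines.push(line);
      line = "";
      lineWidth = 0;
    }
    // A cluster wider than maxWidth still lands whole on its own line.
    line += segment;
    lineWidth += width;
  }
  if (line !== "") lines.push(line);
  return lines;
}

const familyEmoji = "\u{1F468}\u200D\u{1F469}\u200D\u{1F467}\u200D\u{1F466}";
console.log(wrapByCluster("ab" + familyEmoji + "cd", 3));
```

Because the loop iterates over clusters rather than code units, there is no index at which a mid-cluster break could even be expressed.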
This is the same mechanism that keeps Thai above-line marks attached to their base consonants and Devanagari conjuncts intact — grapheme clusters all the way down. Emoji are just the most visible case.
Application
Anywhere your UI displays user-written text, grapheme-atomic wrapping is the difference between a message that renders and a message that disfigures:
- A chat app whose message bubbles never show half a flag at the right edge.
- A social feed whose post previews truncate on cluster boundaries — so the emoji at the cut is a whole emoji, not a floating joiner.
- A notifications layer that measures the truncated line width correctly — because every emoji contributes its real atomic width, not its codepoint count.
- Search snippets and link previews where a skin-toned 👋🏽 stays attached to its modifier and doesn't reduce to a neutral wave.
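The truncation case in the list above can be illustrated without any layout engine. The sketch below cuts by cluster count rather than by measured pixel width, so it is a simplification of what a real preview would do; truncateByCluster() is a hypothetical helper, not part of Pretext.

```javascript
const clusterSegmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });

// Hypothetical helper: keep at most maxClusters grapheme clusters and
// append an ellipsis only if something was actually cut.
function truncateByCluster(text, maxClusters, ellipsis = "\u2026") {
  let out = "";
  let count = 0;
  for (const { segment } of clusterSegmenter.segment(text)) {
    if (count === maxClusters) return out + ellipsis;
    out += segment;
    count += 1;
  }
  return out;
}

const preview = "hello \u{1F44B}\u{1F3FD} from \u{1F1EF}\u{1F1F5}";

// Cluster-aware cut: the skin-toned wave survives intact.
console.log(truncateByCluster(preview, 7));

// Naive cut by UTF-16 index: slice(0, 8) keeps the hand but drops
// the skin-tone modifier, leaving a neutral wave.
console.log(preview.slice(0, 8) + "\u2026");
```

The naive version is the failure described in the last bullet: the string is still valid, but it no longer says what the author wrote.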
The rule the engine enforces is simple. Never display half a person.
"I celebrate myself, and sing myself, / And what I assume you shall assume, / For every atom belonging to me as good belongs to you."
Walt Whitman, Leaves of Grass (1855)
Direct Claude
prepare()
"flags stay flags at the boundary" → regional-indicator pairs are one grapheme; layout() never breaks inside
"chat bubbles that round-trip emoji" → measure with prepare(); the cached width is the true atomic width
"truncate snippets without disfiguring" → cut at Pretext line boundaries, not at string.length
import { prepare, layout } from '@chenglou/pretext';

// One message with ZWJ families, flags, skin-tone modifiers,
// and a variation-selector heart. Seven-codepoint clusters count as one.
const message =
  "my family \u{1F468}\u200D\u{1F469}\u200D\u{1F467}\u200D\u{1F466} " +
  "is well, Tokyo \u{1F1EF}\u{1F1F5} is mild, " +
  "and I am waving \u{1F44B}\u{1F3FD} toward home \u{1F1FA}\u{1F1F8} \u2764\uFE0F";

// Measure once; the handle is reused for every layout() call below.
const handle = prepare(message, "400 22px 'EB Garamond', serif");

// widthSlider (a range input) and textRegion (the text container) are
// assumed to be DOM elements obtained elsewhere on the page.
// Narrow widths push emoji to the next line and never split one.
widthSlider.addEventListener('input', (e) => {
  const width = parseFloat(e.target.value);
  // Third argument is the line height in pixels: 22px font size * 1.6.
  const { height } = layout(handle, width, 22 * 1.6);
  textRegion.style.width = width + 'px';
  textRegion.style.minHeight = height + 'px';
});