Pretext Lab
12 · Emoji + graphemes
Arc 4 · Multilingual

Emoji + graphemes

A modern family emoji, 👨‍👩‍👧‍👦, is seven codepoints: four people held together by three zero-width joiners. A flag is two regional-indicator symbols. A waving hand with a specific skin tone is the hand plus a Fitzpatrick modifier. Break any of those clusters in the wrong place and the message is no longer the message: half a flag, a stranger's skin tone, a father with no family.

Drag the width narrow. The message wraps, but the clusters never do — each compound emoji stays one atomic unit and is routed to the next line whole.


A small note, sent on a small screen: my family 👨‍👩‍👧‍👦 is well, the weather in Tokyo 🇯🇵 is mild, and I am waving 👋🏽 toward home 🇺🇸 with the usual wish — that nothing inside a cluster ever breaks ❤️ at the boundary of the line.

Anatomy of 👨‍👩‍👧‍👦 · one grapheme, seven codepoints
👨 U+1F468
ZWJ U+200D
👩 U+1F469
ZWJ U+200D
👧 U+1F467
ZWJ U+200D
👦 U+1F466
Unicode segmenters read \u200d as a "keep together" instruction. The seven codepoints collapse into one grapheme cluster, 👨‍👩‍👧‍👦, and Pretext never breaks inside it.
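
The anatomy above can be checked directly with the standard Intl.Segmenter API. This is a minimal sketch, independent of Pretext; the helper name countGraphemes is made up for illustration.

```javascript
// The family emoji written as escapes: four people joined by three ZWJ (U+200D).
const family = "\u{1F468}\u200D\u{1F469}\u200D\u{1F467}\u200D\u{1F466}";

const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });

// Hypothetical helper: count grapheme clusters by iterating the segments.
function countGraphemes(text) {
  let count = 0;
  for (const _segment of segmenter.segment(text)) count += 1;
  return count;
}

const utf16Units = family.length;         // 11: four surrogate pairs plus three ZWJ
const codePoints = [...family].length;    // 7: string iteration walks code points
const graphemes = countGraphemes(family); // 1: the whole family is one cluster

console.log({ utf16Units, codePoints, graphemes });
```

The three numbers answer three different questions. Only the last one matches what a reader sees, which is why string.length is the wrong ruler for layout.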

Mechanism

Inside prepare(), Pretext walks the text with the browser's Intl.Segmenter at grapheme granularity. A grapheme cluster is what the Unicode segmentation rules (UAX #29) call a user-perceived character: the unit a reader sees as one character. That includes ZWJ sequences, regional-indicator pairs (flags), skin-tone modifier sequences, and variation-selector sequences, whether the selector requests emoji presentation (U+FE0F) or text presentation (U+FE0E). Each cluster is measured once, against the actual canvas font, and cached as a single atomic width.
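
The "measure once, cache the width" step can be sketched as follows. This is an illustration of the idea, not Pretext's actual internals: the function name measureClusters, the injected measure callback, and the Map cache are all assumptions made for this example. The callback is injected so the sketch also runs outside a browser.

```javascript
const graphemeSegmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });

// Hypothetical helper. `measure` maps a cluster string to a pixel width.
// In a browser it could be: (s) => ctx.measureText(s).width,
// where ctx is a canvas 2D context whose font matches the rendered text.
function measureClusters(text, measure, cache = new Map()) {
  const units = [];
  for (const { segment } of graphemeSegmenter.segment(text)) {
    let width = cache.get(segment);
    if (width === undefined) {
      width = measure(segment); // measured once per distinct cluster
      cache.set(segment, width);
    }
    units.push({ cluster: segment, width });
  }
  return units;
}

// Stand-in measurer for demonstration only: every cluster is 10px wide.
const demoUnits = measureClusters("Tokyo \u{1F1EF}\u{1F1F5}", () => 10);
console.log(demoUnits.length); // six letters and spaces plus one flag
```

Because the cache is keyed by the whole cluster string, a flag or a ZWJ family is only ever measured as a unit, never as its parts.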

The line walker only ever asks the question "does the next unit fit?" about whole clusters. That question cannot be answered mid-cluster, so the wrap cannot happen mid-cluster. A compound emoji too wide for the remaining line length is routed to the next line as one piece; it is never, under any width, split.
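
A minimal sketch of such a line walker is below. It is deliberately simplified: it breaks at any cluster boundary, whereas a real engine would also prefer word boundaries, and it is not Pretext's implementation. The name wrapClusters and the fixed 10px widths are assumptions made for this example.

```javascript
// Hypothetical greedy walker over pre-measured units: [{ cluster, width }, ...].
function wrapClusters(units, maxWidth) {
  const lines = [];
  let current = "";
  let used = 0;
  for (const { cluster, width } of units) {
    // "Does the next unit fit?" is only ever asked about a whole cluster.
    if (current !== "" && used + width > maxWidth) {
      lines.push(current);
      current = "";
      used = 0;
    }
    // A cluster wider than maxWidth still lands on its own line, unsplit.
    current += cluster;
    used += width;
  }
  if (current !== "") lines.push(current);
  return lines;
}

// Demo: every cluster pretends to be 10px wide, so 35px fits three per line.
const walkerSegmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });
const walkerUnits = Array.from(
  walkerSegmenter.segment("hi \u{1F1EF}\u{1F1F5} ok"),
  (s) => ({ cluster: s.segment, width: 10 })
);
const wrapped = wrapClusters(walkerUnits, 35);
console.log(wrapped); // three lines; the flag starts the second line intact
```

Since the loop variable is always a complete cluster, there is no code path that could place a break inside one.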

This is the same mechanism that keeps Thai above-line marks attached to their base consonants and, in engines whose segmenter implements Unicode 15.1 or later, Devanagari conjuncts intact. It is grapheme clusters all the way down. Emoji are just the most visible case.

Application

Anywhere your UI displays user-written text, grapheme-atomic wrapping is the difference between a message that renders and a message that disfigures.

The rule the engine enforces is simple. Never display half a person.

"I celebrate myself, and sing myself, / And what I assume you shall assume, / For every atom belonging to me as good belongs to you."

Walt Whitman, Leaves of Grass (1855)

Direct Claude

"never split a human": grapheme-atomic wrapping, the default behavior of prepare()
"flags stay flags at the boundary": regional-indicator pairs are one grapheme; layout() never breaks inside
"chat bubbles that round-trip emoji": measure with prepare(); the cached width is the true atomic width
"truncate snippets without disfiguring": cut at Pretext line boundaries, not at string.length
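
To see why string.length is the wrong place to cut, here is a small sketch. Pretext's line-level API is not shown in this section, so the example swaps in plain Intl.Segmenter and cuts at grapheme boundaries instead of Pretext line boundaries. The helper name truncateGraphemes is made up for illustration.

```javascript
const truncSegmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });

// Hypothetical helper: keep at most maxClusters grapheme clusters, then add an ellipsis.
function truncateGraphemes(text, maxClusters, ellipsis = "\u2026") {
  let out = "";
  let count = 0;
  for (const { segment } of truncSegmenter.segment(text)) {
    if (count === maxClusters) return out + ellipsis; // more text remains, so mark the cut
    out += segment;
    count += 1;
  }
  return text; // short enough already; nothing was cut
}

const note = "home \u{1F1FA}\u{1F1F8} soon";

// Cutting by UTF-16 index lands inside the flag: one lone regional indicator survives.
const naive = note.slice(0, 7);

// Cutting by cluster keeps the flag whole.
const safe = truncateGraphemes(note, 6);

console.log(naive.endsWith("\u{1F1FA}")); // true: half a flag
console.log(safe === "home \u{1F1FA}\u{1F1F8}\u2026"); // true
```

Counting clusters bounds the number of visible characters, not the pixel width. For a width-bounded snippet, the measured line boundaries are still the right cut points.
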
Arc 4 closes here. Arc 5 picks up with the same "atomic unit" idea — but now the units are chips, mentions, and pills inside ordinary prose.
the message prepared once, laid out at any width
import { prepare, layout } from '@chenglou/pretext';

// One message with ZWJ families, flags, skin-tone modifiers,
// and a variation-selector heart. Seven-codepoint clusters count as one.
const message =
  "my family \u{1F468}\u200D\u{1F469}\u200D\u{1F467}\u200D\u{1F466} " +
  "is well, Tokyo \u{1F1EF}\u{1F1F5} is mild, " +
  "and I am waving \u{1F44B}\u{1F3FD} toward home \u{1F1FA}\u{1F1F8} \u2764\uFE0F";

const handle = prepare(message, "400 22px 'EB Garamond', serif");

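// The element ids below are placeholders for this sketch; use your page's own.
const widthSlider = document.getElementById('width-slider');
const textRegion = document.getElementById('text-region');

// Narrow widths push a whole emoji to the next line and never split one.
widthSlider.addEventListener('input', (e) => {
  const width = parseFloat(e.target.value);
  // Line height is 1.6x the 22px font size passed to prepare().
  const { height } = layout(handle, width, 22 * 1.6);
  textRegion.style.width = width + 'px';
  textRegion.style.minHeight = height + 'px';
});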