Pretext Lab
08 · Soft hyphens
Arc 3 · Whitespace & Breaks

Soft hyphens

Justified text at narrow widths is a graveyard of rivers: vertical channels of white space that run through consecutive lines because a long word refused to break, leaving the remaining spaces on each line to stretch. The soft hyphen, &shy; (U+00AD), is the typesetter's one-character answer. It marks a place in a word where the engine is allowed to break, and a trailing hyphen appears only if it does.
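To make the "appears only if it does" rule concrete, here is a small TypeScript sketch. `renderWord` is a hypothetical helper, not part of Pretext: it shows how a soft-hyphenated word reads when it stays whole, and how it reads when one of its soft hyphens is used as the break.

```typescript
// Illustrative sketch: a word containing soft hyphens (U+00AD), rendered
// either whole or broken at one chosen soft hyphen.
// `renderWord` is a hypothetical helper, not part of any library.
const SHY = "\u00AD";

function renderWord(word: string, breakAt?: number): string[] {
  const parts = word.split(SHY); // syllables between soft hyphens
  if (breakAt === undefined) {
    // No break: unused soft hyphens leave no visible trace.
    return [parts.join("")];
  }
  if (breakAt < 0 || breakAt >= parts.length - 1) {
    throw new RangeError(`no soft hyphen at index ${breakAt}`);
  }
  // Break at soft hyphen number `breakAt`: a visible '-' ends the first line,
  // and any other soft hyphens in either half stay invisible.
  const head = parts.slice(0, breakAt + 1).join("") + "-";
  const tail = parts.slice(breakAt + 1).join("");
  return [head, tail];
}

const word = `tran${SHY}scen${SHY}den${SHY}tal`;
console.log(renderWord(word).join(" | "));    // transcendental
console.log(renderWord(word, 1).join(" | ")); // transcen- | dental
```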

Below is the same Thoreau passage, justified, at a narrow column. Toggle hyphens off and on.


Mechanism

Pretext treats U+00AD as a soft-hyphen segment during prepare(): neither visible nor a word boundary on its own. When layout() walks the prepared segments and a word would otherwise overflow the line, the engine consults the soft-hyphen positions inside that word, latest first, and checks whether breaking at one of them, plus a hyphen glyph, still fits.

If a position fits, the line ends with a visible hyphen and the next line starts with the remainder of the word. If no soft-hyphen position fits, the break falls where it would have fallen without the annotations. Either way, &shy; characters not chosen as a break leave no visible trace.
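The mechanism can be approximated with a toy greedy line breaker. This is a sketch under simplifying assumptions, not Pretext's implementation: every visible character is one unit wide, soft hyphens are zero units wide, words are separated by single spaces, justification is ignored, and `breakLines` is a hypothetical name. On overflow it tries the word's soft-hyphen positions from latest to earliest and takes the first one where the prefix plus a one-unit hyphen still fits; if none fits, it breaks before the word exactly as it would without annotations.

```typescript
// Toy greedy line breaker that honors soft hyphens (U+00AD).
// Assumptions: every visible character is 1 unit wide, soft hyphens are
// 0 units wide, words are separated by single spaces. Not Pretext's code.
const SHY = "\u00AD";

// Remove soft hyphens that were not used as a break.
const clean = (s: string): string => s.split(SHY).join("");
// Visible width of a fragment under the 1-unit-per-character assumption.
const width = (s: string): number => clean(s).length;

function breakLines(text: string, maxWidth: number): string[] {
  const lines: string[] = [];
  const queue = text.split(" ").filter((w) => w.length > 0);
  let line = "";

  while (queue.length > 0) {
    const word = queue.shift()!;
    const candidate = line === "" ? word : `${line} ${word}`;
    if (width(candidate) <= maxWidth) {
      line = candidate; // the whole word fits on the current line
      continue;
    }

    // The word overflows: collect its soft-hyphen positions...
    const prefix = line === "" ? "" : `${line} `;
    const positions: number[] = [];
    for (let i = 0; i < word.length; i++) {
      if (word[i] === SHY) positions.push(i);
    }
    // ...and take the latest one where prefix + syllables + '-' still fits.
    let split = -1;
    for (const pos of positions.reverse()) {
      if (width(prefix + word.slice(0, pos)) + 1 <= maxWidth) {
        split = pos;
        break;
      }
    }

    if (split >= 0) {
      lines.push(clean(prefix + word.slice(0, split)) + "-");
      queue.unshift(word.slice(split + 1)); // remainder starts the next line
      line = "";
    } else if (line !== "") {
      lines.push(clean(line)); // no soft hyphen fits: break before the word
      queue.unshift(word);
      line = "";
    } else {
      lines.push(clean(word)); // a lone unbreakable word simply overflows
    }
  }
  if (line !== "") lines.push(clean(line));
  return lines;
}

const annotated = `to live delib${SHY}er${SHY}ate${SHY}ly, to front`;
const plain = clean(annotated);

console.log(breakLines(annotated, 14).join(" | ")); // to live delib- | erately, to | front
console.log(breakLines(plain, 14).join(" | "));     // to live | deliberately, | to front
```

At a width of 14 units the annotated text fills its first line exactly (`to live delib-`), while the plain text leaves `to live` at half the width: the kind of short line that justification stretches into a river.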

The only decision you make is where to annotate, ideally at natural syllable boundaries (tran&shy;scen&shy;den&shy;tal); the engine decides whether to use each one, per line and per width.
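Annotating once can be as simple as a lookup table of hand-picked syllable breaks joined with U+00AD. The sketch below assumes such a table: the `syllables` map and the `annotate` helper are hypothetical, and the splits simply mirror the ones used on this page. A hyphenation dictionary would be the scaled-up version of the same idea.

```typescript
// Hypothetical annotation step: insert soft hyphens (U+00AD) at hand-picked
// syllable boundaries, once, before the text is ever laid out.
const SHY = "\u00AD";

// Assumed table of editor-chosen breaks; each entry must spell the word exactly.
const syllables = new Map<string, string[]>([
  ["transcendental", ["tran", "scen", "den", "tal"]],
  ["deliberately", ["delib", "er", "ate", "ly"]],
  ["essential", ["essen", "tial"]],
]);

function annotate(text: string): string {
  // Whole ASCII words only, case-sensitive; unknown words pass through unchanged.
  return text.replace(/[A-Za-z]+/g, (word) => {
    const parts = syllables.get(word);
    return parts ? parts.join(SHY) : word;
  });
}

const source = "to live deliberately, to front only the essential facts of life";
const annotated = annotate(source);

// Soft hyphens are invisible until used, so stripping them restores the source.
console.log(annotated.split(SHY).join("") === source); // true
console.log(annotated.split(SHY).length - 1);          // 4
```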

Application

Print-quality justification on the web becomes ordinary. Because the break decision happens inside layout(), the pre-measured height still matches the rendered height, so rivers go away without any changes to the pipeline downstream.

"I went to the woods because I wished to live deliberately, to front only the essential facts of life, and see if I could not learn what it had to teach, and not, when I came to die, discover that I had not lived."

Henry David Thoreau, Walden (1854)

Direct Claude

"no more rivers in this column" → annotate long words with &shy; at syllable breaks
"tight column without ugliness" → soft hyphens + Pretext fit, justify the result
"break where an editor would" → pick your own &shy; positions, not overflow-wrap
"print-quality justification on the web" → Pretext honors U+00AD during prepare()
Next: tabs — the other character that browsers usually swallow, which Pretext preserves under pre-wrap.
annotate once, engine decides per width
import { prepare, layout } from '@chenglou/pretext';

// Soft hyphens (\u00AD in JS strings, &shy; in HTML) mark *permission* to break.
// Annotate once, at natural syllable boundaries, in the source text.
const text =
  "I went to the woods because I wished to live delib\u00ADer\u00ADate\u00ADly, " +
  "to front only the essen\u00ADtial facts of life...";

const handle = prepare(text, "400 20px 'EB Garamond', serif");

// layout() walks the prepared segments and, per line, decides whether to
// break at any soft hyphen it passed through. A visible '-' appears only
// when a soft hyphen is actually used as the break.
const { height, lineCount } = layout(handle, 300, 20 * 1.55);
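The comment at the top of the example names two spellings of the same character: `\u00AD` in a JavaScript string and `&shy;` in HTML. If annotated copy arrives as HTML source, it is worth decoding the entities first. Assuming prepare() treats its input as plain text (it takes a string, not markup), a literal `&shy;` would otherwise be measured as five visible characters. `decodeSoftHyphens` below is a hypothetical helper sketching that step.

```typescript
// Hypothetical helper: turn HTML soft-hyphen entities into the U+00AD
// character so a plain-text API sees the same string the browser renders.
const SHY = "\u00AD";

function decodeSoftHyphens(html: string): string {
  // &shy; (named), &#173; (decimal) and &#xAD; (hex) all denote U+00AD.
  // Other entities are left untouched.
  return html.replace(/&shy;|&#0*173;|&#[xX]0*[aA][dD];/g, SHY);
}

const fromMarkup = "live delib&shy;er&#173;ate&#xAD;ly";
const text = decodeSoftHyphens(fromMarkup);
console.log(text === `live delib${SHY}er${SHY}ate${SHY}ly`); // true
```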