Skip to Content
ftui-renderGrapheme pool

Grapheme Pool

Most terminal cells hold a single Unicode scalar — an ASCII letter, a CJK ideograph, or an emoji that happens to be one codepoint wide. A 4-byte char fits inline in CellContent with no heap lookup, and 99% of cells in a typical UI are this case. The remaining 1% — ZWJ sequences like 👨‍👩‍👧‍👦, flag emoji, complex combining sequences — need multi-codepoint strings that do not fit in 4 bytes.

GraphemePool interns those strings and hands back a 32-bit GraphemeId that encodes three fields: a slot index, a generation counter, and a display width. The generation counter makes slot reuse safe: if a GraphemeId is held past the lifetime of its pool entry, the lookup returns None rather than the new tenant’s content.

This page documents the bit packing, the interning semantics, and the refcount-plus-generation dance that keeps stale IDs safe.

Motivation

Three design constraints conflict:

  1. Keep Cell at 16 bytes. A cell must not embed a heap pointer or a variable-length payload; the SIMD compare assumes a fixed layout.
  2. Avoid heap allocations on the hot path. ASCII-dominated text (every code editor, every log viewer) would be unusable if each char round-tripped through a pool.
  3. Support full Unicode faithfully. Showing 👨‍👩‍👧‍👦 as four separate cells, each rendering a fragment of the ZWJ sequence, is broken.

The resolution: CellContent stores either an inline char or a GraphemeId, discriminated by bit 31. Simple cells skip the pool entirely; complex cells pay one lookup per emission and share the string across every cell that uses it.

GraphemeId bit layout

crates/ftui-render/src/cell.rs
#[repr(transparent)] pub struct GraphemeId(u32);
bit 31 bit 30 ──── 27 | bit 26 ──── 16 | bit 15 ──── 0 ↓ ↓ ↓ ↓ type width (4) generation (11) slot (16) discriminant 0..=15 cells 0..=2047 0..=65535 (0 = Char, 1 = GraphemeId)
  • Slot (16 bits). Index into Vec<Option<GraphemeSlot>>. 64K distinct grapheme clusters per frame is plenty — even heavily emoji-rich UIs never approach this.
  • Generation (11 bits). Incremented when a slot is freed and reused. 2048 distinct lives per slot detects the ABA problem: holding an ID while the slot turns over, then dereferencing it, fails safely.
  • Width (4 bits). Cached display width (0–15 cells). The width is also embedded in the GraphemeSlot itself for debugging, but the GraphemeId carries it redundantly so the layout engine can ask id.width() without a pool indirection.

Bit 31 discriminates the CellContent union: when it is 0, the remaining 31 bits are a char (cast to u32); when it is 1, the remaining 31 bits are a packed GraphemeId.

How ASCII stays inline

crates/ftui-render/src/cell.rs
pub struct CellContent(u32); impl CellContent { pub const fn from_char(c: char) -> Self { CellContent((c as u32) & 0x7FFF_FFFF) // clear bit 31 } pub fn as_char(self) -> Option<char> { if self.0 >> 31 == 0 { // bit 31 = 0 → inline char::from_u32(self.0) } else { None } } pub fn as_grapheme_id(self) -> Option<GraphemeId> { if self.0 >> 31 == 1 { Some(GraphemeId::from_raw(self.0 & 0x7FFF_FFFF)) } else { None } } }

Cell::from_char('a') stores 0x00000061 — pure char cast, no allocation, no pool touch. Cell::from_char('€') (U+20AC) stores 0x000020AC — still inline. Only when content exceeds what a char can represent (multi-codepoint clusters) does the pool come into play.

Interior code points greater than the Unicode maximum (0x10FFFF = 0x0010FFFF) are not possible via from_char, so there is no ambiguity with bit 31 being accidentally set by a valid Unicode char.

Interning and refcount

crates/ftui-render/src/grapheme_pool.rs
pub struct GraphemePool { slots: Vec<Option<GraphemeSlot>>, // None = free generations: Vec<u16>, // per-slot, incremented on reuse lookup: AHashMap<String, GraphemeId>, // dedup free_list: Vec<u32>, // reuse freed slots LIFO } struct GraphemeSlot { text: String, width: u8, refcount: u32, }

Interning (pool.intern(s, width)):

  1. Look up s in the AHashMap. If present, increment the slot’s refcount and return the existing GraphemeId.
  2. If not present, pop a slot from free_list (or push a new one); populate text, width, refcount = 1; insert in the lookup map; return a GraphemeId packed from (slot, current_generation, width).

Retain (pool.retain(id)) — increment refcount when a GraphemeId is copied into another cell.

Release (pool.release(id)) — decrement refcount; if it reaches zero, set the slot to None, bump generations[slot] += 1 (mod 2048), remove from the lookup map, push slot onto free_list.

Lookup (pool.get(id)):

  1. Extract slot from the id.
  2. If slot >= slots.len(), return None.
  3. If generations[slot] != id.generation(), return None (stale!).
  4. If slots[slot].is_none(), return None.
  5. Otherwise return the &str.

Stale-safe lookup — the ABA guard

t0: id = pool.intern("👨‍👩‍👧‍👦", 2); // slot=3, gen=0 pool.release(id); pool.release(id); // refcount → 0, slot freed // generations[3] += 1 => now 1 t1: id2 = pool.intern("🧑🏽‍💻", 2); // reuses slot 3, gen=1 t2: pool.get(id); // id.generation() == 0, generations[3] == 1 → None pool.get(id2); // matches → Some("🧑🏽‍💻")

Without the generation check, get(id) at t2 would return "🧑🏽‍💻" — the new tenant — silently corrupting whatever used the stale id. Eleven bits of generation give 2048 distinct lives; a slot would have to cycle 2049 times and line up with a stale ID and have refcount mismatch at exactly the wrong moment to alias. In practice this is a correctness guarantee, not just an optimization.

Worked example

examples/pool.rs
use ftui_render::grapheme_pool::GraphemePool; use ftui_render::cell::{Cell, CellContent}; let mut pool = GraphemePool::new(); // Intern a ZWJ family emoji. let family = pool.intern("👨‍👩‍👧‍👦", 2); assert_eq!(family.width(), 2); assert_eq!(pool.get(family), Some("👨‍👩‍👧‍👦")); // Place it in two cells of a buffer — retain the id for the second. let cell_a = Cell::new(CellContent::from_grapheme_id(family)); pool.retain(family); let cell_b = Cell::new(CellContent::from_grapheme_id(family)); // ... cells used in a frame ... // On overwrite, release once per cell. pool.release(family); // refcount 2 → 1 assert!(pool.get(family).is_some()); // still alive pool.release(family); // refcount 1 → 0 assert!(pool.get(family).is_none()); // freed + generation bumped

Wide-character continuation

When a cell has width > 1, subsequent columns are filled with Cell::CONTINUATION (a distinguished content value) so the diff does not treat the “second half” of a CJK character as an independent cell. The continuation has transparent colors and no attrs, so it is cheap to compare. The presenter knows to skip continuation cells during emission.

Don’t stash a GraphemeId in long-lived state. IDs are valid only as long as the pool holds a reference. A cache that saves GraphemeIds across frames must either retain each one (and release on eviction) or re-intern on every use. Storing the raw &str and interning lazily is safer for occasional lookups; the pool’s AHashMap will dedup.

Capacity and telemetry

pool.len() is the active slot count, pool.capacity() is the allocated upper bound. The runtime exposes these via the observability widgets so a developer building emoji-heavy UI can see whether pool churn is a concern. Typical steady-state usage is dozens of slots — the pool is not a hot allocator.

Cross-references

  • Cell & BufferCellContent layout and the bit-31 discriminant.
  • Frame — where the &mut GraphemePool is borrowed from.
  • Presenter — the lookup path during ANSI emission.
  • Grapheme width — the layout engine’s counterpart (admission cache, NFC normalization).
  • Screen modes — continuation cells behave the same inline and alt-screen.
  • One-writer rule — pool access is single-writer by virtue of frame ownership.

Where next