
Cell & Buffer

Cell is the atom of the render grid: 16 bytes, #[repr(C, align(16))], four-per-cache-line. Buffer is a 2D row-major array of cells plus a scissor stack, an opacity stack, and three layers of dirty tracking (per-row bitmap, per-row spans, per-cell bitmap) that together let the diff engine skip unchanged work at three different granularities.

These two types are the entire data model of the render kernel. Everything downstream — BufferDiff, Presenter, Frame — is a function of them. The layout choices (16 bytes, row-major, compile-time size asserts) are load-bearing: they let bits_eq lower to a single 128-bit SIMD compare and let the diff engine iterate 4-cell blocks at a time without branching on width.

This page documents the layout, the invariants, and the three dirty mechanisms with their trade-offs. The math and strategy selection live in diff; the ANSI emission lives in presenter.

Motivation

A cell that is 24 or 32 bytes wastes cache: the diff loop touches every cell of every row it scans, and an extra 8 bytes per cell is 8 × 80 × 24 = 15,360 bytes (15 KiB) of extra L1 pressure per frame on an 80 × 24 grid. 16 bytes is the Pareto point: it holds content, a 32-bit RGBA foreground, a 32-bit RGBA background, and a style-flags/link-ID pair, while fitting exactly in a single 128-bit SIMD lane. The compile-time assert assert!(size_of::<Cell>() == 16) (cell.rs:L338) makes the constraint non-negotiable.

Cell layout

crates/ftui-render/src/cell.rs
#[repr(C, align(16))]
pub struct Cell {
    pub content: CellContent, // 4 bytes: inline char OR GraphemeId
    pub fg: PackedRgba,       // 4 bytes: R8G8B8A8 foreground
    pub bg: PackedRgba,       // 4 bytes: R8G8B8A8 background
    pub attrs: CellAttrs,     // 4 bytes: style flags[16] | link_id[16]
}

const _: () = assert!(core::mem::size_of::<Cell>() == 16);
┌────────────────────── 64-byte cache line ────────────────────────┐
│                                                                  │
┌────────────────┬────────────────┬────────────────┬────────────────┐
│ 0            15│ 16           31│ 32           47│ 48           63│  byte offset
├────────────────┼────────────────┼────────────────┼────────────────┤
│    Cell[0]     │    Cell[1]     │    Cell[2]     │    Cell[3]     │
└────────────────┴────────────────┴────────────────┴────────────────┘
  • CellContent discriminates on the high bit: bit 31 = 0 means Char(char) inline; bit 31 = 1 means GraphemeId into the grapheme pool. ASCII lives inline with no pool lookup — the 99% fast path.
  • PackedRgba is a u32 with SIMD-friendly order. Alpha is present to support opacity blending during composition.
  • CellAttrs bundles 16 style-flag bits (Bold, Italic, Underline, Dim, Reverse, Strikethrough, Blink, …) with a 16-bit link ID (zero = no hyperlink; otherwise indexes LinkRegistry).
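The two packed encodings in the list above can be sketched in a few lines. This is a minimal illustration, not the crate's implementation: `Content`, `Attrs`, `GRAPHEME_BIT`, and all method names are hypothetical stand-ins, and which 16-bit half of the attribute word holds the flags is an assumption.

```rust
/// Hypothetical stand-in for CellContent: a u32 whose bit 31 selects the variant.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Content(u32);

const GRAPHEME_BIT: u32 = 1 << 31;

impl Content {
    fn from_char(c: char) -> Self {
        // Unicode scalar values never exceed 0x10FFFF, so bit 31 is always clear.
        Content(c as u32)
    }

    fn from_grapheme_id(id: u32) -> Self {
        debug_assert!(id < GRAPHEME_BIT, "id must fit in 31 bits");
        Content(id | GRAPHEME_BIT)
    }

    /// Inline character, if this is the no-pool-lookup variant.
    fn as_char(self) -> Option<char> {
        if self.0 & GRAPHEME_BIT == 0 {
            char::from_u32(self.0)
        } else {
            None
        }
    }

    /// Pool index, if bit 31 is set.
    fn grapheme_id(self) -> Option<u32> {
        if self.0 & GRAPHEME_BIT != 0 {
            Some(self.0 & !GRAPHEME_BIT)
        } else {
            None
        }
    }
}

/// Hypothetical stand-in for CellAttrs: 16 flag bits plus a 16-bit link ID.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Attrs(u32);

impl Attrs {
    fn new(flags: u16, link_id: u16) -> Self {
        // Assumption: flags live in the high half, link ID in the low half.
        Attrs(((flags as u32) << 16) | link_id as u32)
    }

    fn flags(self) -> u16 {
        (self.0 >> 16) as u16
    }

    /// Zero means "no hyperlink"; any other value would index a link registry.
    fn link_id(self) -> Option<u16> {
        match self.0 as u16 {
            0 => None,
            id => Some(id),
        }
    }
}
```

Because a `char` can never set bit 31, the inline variant needs no tag byte, which is what keeps the content field at four bytes.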

bits_eq — the hot path

The diff inner loop calls bits_eq on every cell it visits. The implementation uses bitwise & rather than short-circuit &&:

crates/ftui-render/src/cell.rs
#[inline(always)]
pub fn bits_eq(&self, other: &Self) -> bool {
    (self.content.raw() == other.content.raw())
        & (self.fg == other.fg)
        & (self.bg == other.bg)
        & (self.attrs == other.attrs)
}

Four unconditional u32 == u32 compares let LLVM lower the whole function to a single vector compare (pcmpeqd / pcmpeqb, or the VEX-encoded vpcmpeqd) plus a reduction on x86_64, where SSE2 is part of the baseline, or to cmeq on AArch64. Short-circuiting with && would instead invite up to three conditional branches, and the irregular change density of a TUI is exactly the pattern that makes such branches hard to predict.

A unit test (cell_eq_matches_bits_eq, cell.rs:L1794) pins bits_eq equivalence with derive’d PartialEq.
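The equivalence that the unit test pins can be reproduced on a stand-in type. The sketch below assumes a hypothetical `MiniCell` with four plain `u32` fields (the real `Cell` uses wrapper types); it shows the branch-free compare, the derived `PartialEq`, and a single 128-bit compare all agreeing.

```rust
/// Hypothetical 16-byte cell with four plain u32 fields (not the real ftui type).
#[repr(C, align(16))]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct MiniCell {
    content: u32,
    fg: u32,
    bg: u32,
    attrs: u32,
}

// The same style of compile-time size guard, applied to the stand-in type.
const _: () = assert!(core::mem::size_of::<MiniCell>() == 16);

impl MiniCell {
    /// Branch-free equality: `&` on bools evaluates all four compares.
    #[inline(always)]
    fn bits_eq(&self, other: &Self) -> bool {
        (self.content == other.content)
            & (self.fg == other.fg)
            & (self.bg == other.bg)
            & (self.attrs == other.attrs)
    }

    /// The same 128 bits packed into one integer, so equality is one compare.
    fn as_u128(&self) -> u128 {
        (self.content as u128)
            | ((self.fg as u128) << 32)
            | ((self.bg as u128) << 64)
            | ((self.attrs as u128) << 96)
    }
}
```

Whether the optimizer actually emits a single vector compare depends on the target and compiler version; the sketch only demonstrates that the three formulations are logically interchangeable.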

Buffer: row-major 2D grid

crates/ftui-render/src/buffer.rs
pub struct Buffer {
    width: u16,
    height: u16,
    cells: Vec<Cell>,                // len = width * height, row-major
    scissor_stack: Vec<Rect>,        // clipping, monotone intersection
    opacity_stack: Vec<f32>,         // composition alpha

    // Three layers of dirty tracking:
    dirty_rows: Vec<bool>,           // per-row bitmap (len == height)
    dirty_spans: Vec<DirtySpanRow>,  // per-row Vec<(x0, x1)>
    dirty_bits: Vec<u8>,             // per-cell bitmap (tile-skip SAT)
    dirty_cells: usize,
    dirty_all: bool,

    pub degradation: DegradationLevel, // set by runtime before view()
}

Indexing is row-major: cell (x, y) lives at cells[y * width + x]. This matches the ANSI emission order (cursor advances across, then down) so the diff can scan memory in the same order the presenter will emit it — cache-friendly in both directions.
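The index formula is small enough to state as code. This is a sketch with a hypothetical free function, `cell_index`; the real `Buffer` presumably keeps the equivalent logic behind its accessors.

```rust
/// Hypothetical helper: linear index of cell (x, y) in a row-major grid,
/// or None when the coordinate is outside the grid.
fn cell_index(width: u16, height: u16, x: u16, y: u16) -> Option<usize> {
    if x < width && y < height {
        // Widen to usize before multiplying so large grids cannot overflow u16.
        Some(y as usize * width as usize + x as usize)
    } else {
        None
    }
}
```

On an 80 × 24 grid, stepping `x` by one advances one cell (16 bytes) in memory, while stepping `y` by one jumps a whole row of 80 cells, so a left-to-right, top-to-bottom scan is a linear walk.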

Scissor stack — monotone intersection

Widgets nest freely: a panel contains a list contains a row contains a cell. Each nesting pushes a scissor rect; the effective clip at any point is the intersection of every rect on the stack.

push_scissor(R₀)  ── stack: [R₀]
push_scissor(R₁)  ── stack: [R₀, R₀ ∩ R₁]         ⊆ R₀
push_scissor(R₂)  ── stack: [R₀, R₀∩R₁, ⋯∩R₂]     ⊆ R₀ ∩ R₁
pop_scissor()     ── stack: [R₀, R₀ ∩ R₁]

The monotonicity invariant says top-of-stack never grows on push. This lets every Buffer::set(x, y, cell) check bounds once against the top of the stack — no per-cell iteration up the stack is needed — and it enables the set_line / fill_row fast paths to clamp ranges without re-intersecting. Violating monotonicity (pushing a rect that is not a subset of the current top) would break those fast paths silently.
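One way to keep the invariant by construction is to intersect on push, so the stored top is always the effective clip. The sketch below assumes that design; `Rect` here is a local stand-in with the same field names as the page's example, and `ScissorStack` with its methods is hypothetical rather than the crate's API.

```rust
/// Local stand-in for a rectangle type (fields mirror the example on this page).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Rect {
    x: u16,
    y: u16,
    width: u16,
    height: u16,
}

impl Rect {
    /// Intersection of two rects; a disjoint pair yields zero width or height.
    fn intersection(self, other: Rect) -> Rect {
        let x0 = self.x.max(other.x);
        let y0 = self.y.max(other.y);
        let x1 = self.x.saturating_add(self.width)
            .min(other.x.saturating_add(other.width));
        let y1 = self.y.saturating_add(self.height)
            .min(other.y.saturating_add(other.height));
        Rect { x: x0, y: y0, width: x1.saturating_sub(x0), height: y1.saturating_sub(y0) }
    }

    fn contains(self, x: u16, y: u16) -> bool {
        x >= self.x && x - self.x < self.width && y >= self.y && y - self.y < self.height
    }
}

/// Hypothetical scissor stack: every entry is already intersected with its parent.
struct ScissorStack {
    stack: Vec<Rect>,
}

impl ScissorStack {
    fn new(bounds: Rect) -> Self {
        ScissorStack { stack: vec![bounds] }
    }

    /// Push the intersection, so the top can only shrink or stay the same.
    fn push(&mut self, r: Rect) {
        let top = self.top();
        self.stack.push(top.intersection(r));
    }

    /// Pop one level; the base bounds are never removed.
    fn pop(&mut self) {
        if self.stack.len() > 1 {
            self.stack.pop();
        }
    }

    fn top(&self) -> Rect {
        *self.stack.last().expect("base rect is never popped")
    }

    /// A write at (x, y) needs only this single check against the top.
    fn allows(&self, x: u16, y: u16) -> bool {
        self.top().contains(x, y)
    }
}
```

With this shape, pushing a rect larger than the current clip is harmless: the intersection simply reproduces the current top.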

Dirty tracking at three scales

| Layer | Granularity | Structure | Used by |
| --- | --- | --- | --- |
| dirty_rows | row | Vec<bool> | diff outer loop (skip clean rows) |
| dirty_spans | range within row | per-row SmallVec<[DirtySpan; 4]> | diff inner loop (scan only dirty x-ranges) |
| dirty_bits | single cell | Vec<u8> bitmap + SAT | tile skip hints, Bayesian strategy |

dirty_rows. A Vec<bool> of length height; dirty_rows[y] = true iff some cell in row y was mutated since the last clear_dirty(). The diff skips non-dirty rows unconditionally — a row that is clean now matches the previous buffer’s row, so the change set for it is empty.

dirty_spans. When span tracking is enabled, every row carries a sorted, non-overlapping list of half-open ranges [x0, x1) covering the cells that changed. Spans separated by at most merge_gap cells (default 1) are coalesced; once a row accumulates more than max_spans_per_row (default 64) spans, the row falls back to “full dirty” and the inner loop scans [0, width). This bounds the cost of mutation tracking at O(log(spans) + merge_cost) per call and prevents quadratic blowup on pathologically fragmented rows.

dirty_bits. A per-cell bitmap fed into a Summed-Area Table (SAT) — see diff — so the diff can answer “is every cell in this rectangular tile clean?” in O(1) and skip whole tiles.
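The O(1) tile query follows from the standard summed-area-table identity. The sketch below is a generic SAT over a `&[bool]` grid, not the crate's code: `DirtySat` and its methods are hypothetical, and the real buffer stores the bitmap as `Vec<u8>`.

```rust
/// Hypothetical summed-area table over a per-cell dirty bitmap.
/// sums has (width + 1) * (height + 1) entries; row 0 and column 0 are zero.
struct DirtySat {
    width: usize,
    sums: Vec<u32>,
}

impl DirtySat {
    fn build(dirty: &[bool], width: usize, height: usize) -> Self {
        assert_eq!(dirty.len(), width * height);
        let stride = width + 1;
        let mut sums = vec![0u32; stride * (height + 1)];
        for y in 0..height {
            for x in 0..width {
                let d = dirty[y * width + x] as u32;
                // S(x+1, y+1) = d + S(x+1, y) + S(x, y+1) - S(x, y)
                sums[(y + 1) * stride + (x + 1)] = d
                    + sums[y * stride + (x + 1)]
                    + sums[(y + 1) * stride + x]
                    - sums[y * stride + x];
            }
        }
        DirtySat { width, sums }
    }

    /// Number of dirty cells in the half-open tile [x0, x1) × [y0, y1).
    fn count(&self, x0: usize, y0: usize, x1: usize, y1: usize) -> u32 {
        let s = self.width + 1;
        self.sums[y1 * s + x1] + self.sums[y0 * s + x0]
            - self.sums[y0 * s + x1]
            - self.sums[y1 * s + x0]
    }

    /// Four lookups answer "is every cell in this tile clean?".
    fn tile_is_clean(&self, x0: usize, y0: usize, x1: usize, y1: usize) -> bool {
        self.count(x0, y0, x1, y1) == 0
    }
}
```

Building the table is a single O(width × height) pass; every tile query afterwards costs four reads regardless of tile size.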

crates/ftui-render/src/buffer.rs
pub struct DirtySpanConfig {
    pub enabled: bool,            // toggle span tracking
    pub max_spans_per_row: usize, // 64 default; cap before fallback
    pub merge_gap: u16,           // 1 cell default; merge if gap ≤ merge_gap
    pub guard_band: u16,          // expand spans on each side
}
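The coalescing and fallback rules for dirty_spans can be sketched as a single insert routine. This is an illustration under assumptions, not the crate's algorithm: `RowDirty` and `mark_span` are hypothetical names, spans are plain `(u16, u16)` tuples, and `guard_band` is left out.

```rust
/// Hypothetical per-row state: a sorted, non-overlapping list of half-open
/// spans [x0, x1), or the "full dirty" fallback.
#[derive(Debug, PartialEq, Eq)]
enum RowDirty {
    Spans(Vec<(u16, u16)>),
    Full,
}

/// Record that [x0, x1) changed, merging with any span within `merge_gap`.
fn mark_span(row: &mut RowDirty, x0: u16, x1: u16, merge_gap: u16, max_spans: usize) {
    let spans = match row {
        RowDirty::Full => return, // already scanning the whole row
        RowDirty::Spans(s) => s,
    };
    if x0 >= x1 {
        return; // empty range
    }
    // Binary search: skip spans that end more than merge_gap before x0.
    let start = spans.partition_point(|&(_, end)| end.saturating_add(merge_gap) < x0);
    let (mut lo, mut hi) = (x0, x1);
    let mut stop = start;
    // Absorb every following span that starts within merge_gap of the growing range.
    while stop < spans.len() && spans[stop].0 <= hi.saturating_add(merge_gap) {
        lo = lo.min(spans[stop].0);
        hi = hi.max(spans[stop].1);
        stop += 1;
    }
    // Replace the absorbed spans with the merged one.
    drop(spans.splice(start..stop, std::iter::once((lo, hi))));
    let overflow = spans.len() > max_spans;
    if overflow {
        *row = RowDirty::Full; // too fragmented: scan [0, width) instead
    }
}
```

The binary search supplies the logarithmic term and the absorb loop the merge cost, which matches the per-call bound quoted for dirty_spans.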

Buffer invariant (dirty-row soundness)

Formally, for every row y:

$$
\big(\exists\, x \in [0,\ \text{width}) : \text{old}(x, y) \neq \text{new}(x, y)\big) \;\Longrightarrow\; \text{dirty\_rows}[y] = \text{true}
$$

This is enough for the diff engine to safely drop clean rows: a clean row under this invariant must be cell-wise equal to its predecessor, so there are no changes to emit. See crates/ftui-render/src/buffer.rs:L210-L240 for the invariant comment in situ.
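The consequence of the invariant for the diff can be shown on a toy grid. The sketch below uses `u32` values in place of cells and a hypothetical function, `changed_cells`; it skips rows marked clean, which is only correct when the implication above holds.

```rust
/// Hypothetical row-skipping diff over two same-sized row-major grids.
/// Returns the (x, y) coordinates whose values differ.
fn changed_cells(
    old: &[u32],
    new: &[u32],
    width: usize,
    dirty_rows: &[bool],
) -> Vec<(usize, usize)> {
    assert_eq!(old.len(), new.len());
    assert_eq!(old.len(), width * dirty_rows.len());
    let mut out = Vec::new();
    for (y, &dirty) in dirty_rows.iter().enumerate() {
        if !dirty {
            // Sound only under the invariant: a clean row has no differing cell.
            continue;
        }
        let row = y * width..(y + 1) * width;
        for (x, (a, b)) in old[row.clone()].iter().zip(&new[row]).enumerate() {
            if a != b {
                out.push((x, y));
            }
        }
    }
    out
}
```

Because the invariant is a one-way implication, marking a row dirty when nothing actually changed costs a scan but never produces a wrong result; only a missed mark would.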

Minimal buffer example

examples/buffer.rs
use ftui_render::buffer::Buffer;
use ftui_render::cell::Cell;
use ftui_core::geometry::Rect;

let mut buf = Buffer::new(80, 24);

// Draw a line.
for (i, ch) in "Hello".chars().enumerate() {
    buf.set(i as u16, 0, Cell::from_char(ch));
}

// Nested scissor — clip a widget to rows 2..=6, cols 10..=40.
buf.push_scissor(Rect { x: 10, y: 2, width: 31, height: 5 });
{
    // Inner widget draws — writes outside the scissor are clipped.
    buf.set(50, 4, Cell::from_char('!')); // out-of-scissor: discarded
    buf.set(12, 3, Cell::from_char('*')); // in-scissor: written
}
buf.pop_scissor();

// Inspect dirty tracking.
let stats = buf.dirty_span_stats();
eprintln!("rows with spans: {}", stats.rows_with_spans);

Never mutate a Buffer between BufferDiff::compute and Presenter::present. The diff captures a snapshot of ChangeRuns; subsequent writes change the underlying cells but the diff still points at the old ones. The presenter then emits bytes for stale content (or worse, for rows whose widths changed), and the terminal sees a corrupted SGR stream. The correct order is always mutate → compute → present → swap.

Buffer dimensions are immutable

Buffer fixes its (width, height) at construction. Terminal resize is handled by the runtime: it creates a new buffer at the new size, copies what it wants, and diffs against the old one — which will trip the “full redraw” fallback because dimensions changed. See DoubleBuffer and AdaptiveDoubleBuffer in the same module for the swap infrastructure.

Cross-references

  • Frame — how widgets see the buffer.
  • Diff — what consumes dirty tracking.
  • Presenter — what emits ANSI from the change runs.
  • Bayesian diff strategy — how dirty-bitmap density selects between full / dirty-row / redraw.
  • Screen modes — inline vs. alt-screen impact on buffer lifecycle.
  • One-writer rule — why single-writer ownership keeps the invariants holdable.
