Cell & Buffer
Cell is the atom of the render grid: 16 bytes, #[repr(C, align(16))], four-per-cache-line. Buffer is a 2D row-major array
of cells plus a scissor stack, an opacity stack, and three layers of
dirty tracking (per-row bitmap, per-row spans, per-cell bitmap) that
together let the diff engine skip unchanged work at three different
granularities.
These two types are the entire data model of the render kernel.
Everything downstream — BufferDiff, Presenter, Frame — is a
function of them. The layout choices (16 bytes, row-major,
compile-time size asserts) are load-bearing: they let
bits_eq lower to a single 128-bit SIMD compare and let the diff
engine iterate 4-cell blocks at a time without branching on width.
This page documents the layout, the invariants, and the three dirty mechanisms with their trade-offs. The math and strategy selection live in diff; the ANSI emission lives in presenter.
Motivation
A cell that is 24 or 32 bytes wastes cache: the diff loop touches
every cell on every row regardless of dirty tracking, and on an
80 × 24 grid an extra 8 bytes per cell is 8 × 80 × 24 = 15,360 bytes
(15 KB) of extra L1 pressure per frame.
16 bytes is the Pareto point: it holds content, a 32-bit RGBA
foreground, a 32-bit RGBA background, and a bitflags/link ID pair, while
fitting in a single 128-bit SIMD register. The compile-time assert
assert!(size_of::<Cell>() == 16) (cell.rs:L338) makes the
constraint non-negotiable.
Cell layout
#[repr(C, align(16))]
pub struct Cell {
pub content: CellContent, // 4 bytes: inline char OR GraphemeId
pub fg: PackedRgba, // 4 bytes: R8G8B8A8 foreground
pub bg: PackedRgba, // 4 bytes: R8G8B8A8 background
pub attrs: CellAttrs, // 4 bytes: style flags[16] | link_id[16]
}
const _: () = assert!(core::mem::size_of::<Cell>() == 16);

               ┌─────────────────────── 64-byte cache line ────────────────────────┐
               ┌────────────────┬────────────────┬────────────────┬────────────────┐
byte offset    │ 0           15 │ 16          31 │ 32          47 │ 48          63 │
               ├────────────────┼────────────────┼────────────────┼────────────────┤
               │ Cell[0]        │ Cell[1]        │ Cell[2]        │ Cell[3]        │
               └────────────────┴────────────────┴────────────────┴────────────────┘

- CellContent discriminates on the high bit: bit 31 = 0 means Char(char) inline; bit 31 = 1 means GraphemeId into the grapheme pool. ASCII lives inline with no pool lookup, which is the 99% fast path.
- PackedRgba is a u32 with SIMD-friendly order. Alpha is present to support opacity blending during composition.
- CellAttrs bundles 16 style-flag bits (Bold, Italic, Underline, Dim, Reverse, Strikethrough, Blink, …) with a 16-bit link ID (zero = no hyperlink; otherwise indexes LinkRegistry).
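
The bit-level packing described above can be sketched in a few lines. This is an illustration only, not the crate's code: `Content`, `Attrs`, `GRAPHEME_TAG`, and every method name are hypothetical, and placing the style flags in the high half of the attribute word is an assumption (the source only says the word holds 16 flag bits and a 16-bit link ID).

```rust
/// Hypothetical stand-in for `CellContent`: one u32, tagged by bit 31.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
pub struct Content(u32);

/// Bit 31 set means "this is a grapheme-pool id, not an inline char".
const GRAPHEME_TAG: u32 = 1 << 31;

impl Content {
    /// Inline scalar value. A `char` is at most 0x10FFFF, so bit 31 is always 0.
    pub fn from_char(c: char) -> Self {
        Content(c as u32)
    }

    /// Pool reference: the low 31 bits hold the id, bit 31 is set.
    pub fn from_grapheme_id(id: u32) -> Self {
        debug_assert!(id < GRAPHEME_TAG);
        Content(id | GRAPHEME_TAG)
    }

    /// Returns the inline char, or None if this is a pool reference.
    pub fn as_char(self) -> Option<char> {
        if self.0 & GRAPHEME_TAG == 0 {
            char::from_u32(self.0)
        } else {
            None
        }
    }

    /// Returns the pool id, or None if this is an inline char.
    pub fn as_grapheme_id(self) -> Option<u32> {
        if self.0 & GRAPHEME_TAG != 0 {
            Some(self.0 & !GRAPHEME_TAG)
        } else {
            None
        }
    }
}

/// Hypothetical stand-in for `CellAttrs`: 16 flag bits plus a 16-bit link id.
/// Assumption of this sketch: flags occupy the high half, link id the low half.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
pub struct Attrs(u32);

impl Attrs {
    pub fn new(flags: u16, link_id: u16) -> Self {
        Attrs(((flags as u32) << 16) | link_id as u32)
    }

    pub fn flags(self) -> u16 {
        (self.0 >> 16) as u16
    }

    pub fn link_id(self) -> u16 {
        self.0 as u16
    }

    /// Link id zero is reserved for "no hyperlink".
    pub fn has_link(self) -> bool {
        self.link_id() != 0
    }
}
```

Because the tag lives in a bit that no valid `char` can set, the ASCII path needs only a mask test and no pool lookup.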
bits_eq — the hot path
The diff inner loop calls bits_eq on every cell it visits. The
implementation uses bitwise & rather than short-circuit &&:
#[inline(always)]
pub fn bits_eq(&self, other: &Self) -> bool {
(self.content.raw() == other.content.raw())
& (self.fg == other.fg)
& (self.bg == other.bg)
& (self.attrs == other.attrs)
}

Four unconditional u32 == u32 compares let LLVM lower the whole
function to a single vpcmpeqd / pcmpeqb plus reduction on x86_64
with SSE2 (Tier-1), or cmeq on AArch64. Short-circuiting would
force the compiler to emit three branches, exactly the kind of
branch that the irregular change density of a TUI makes hard to predict.
A unit test (cell_eq_matches_bits_eq, cell.rs:L1794) pins
bits_eq equivalence with derive’d PartialEq.
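
A self-contained sketch of the same idea, including the kind of equivalence check the unit test performs, might look like this. `DemoCell` and `bits_eq_matches_partial_eq` are hypothetical names; the fields are plain `u32` values standing in for the real field types.

```rust
/// Hypothetical 16-byte cell with four plain u32 fields, mirroring the layout above.
#[repr(C, align(16))]
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
pub struct DemoCell {
    pub content: u32,
    pub fg: u32,
    pub bg: u32,
    pub attrs: u32,
}

// Fails the build if the struct ever grows past 16 bytes.
const _: () = assert!(core::mem::size_of::<DemoCell>() == 16);

impl DemoCell {
    /// Branch-free equality: `&` on bools always evaluates all four compares,
    /// whereas `&&` would stop at the first mismatch.
    #[inline(always)]
    pub fn bits_eq(&self, other: &Self) -> bool {
        (self.content == other.content)
            & (self.fg == other.fg)
            & (self.bg == other.bg)
            & (self.attrs == other.attrs)
    }
}

/// Checks that `bits_eq` agrees with the derived `PartialEq` when no field
/// differs and when exactly one field differs.
pub fn bits_eq_matches_partial_eq() -> bool {
    let base = DemoCell { content: 'a' as u32, fg: 0xFFFF_FFFF, bg: 0, attrs: 0 };
    let variants = [
        base,
        DemoCell { content: 'b' as u32, ..base },
        DemoCell { fg: 0, ..base },
        DemoCell { bg: 1, ..base },
        DemoCell { attrs: 1 << 16, ..base },
    ];
    variants.iter().all(|v| base.bits_eq(v) == (base == *v))
}
```

Whether the optimizer actually emits a single vector compare depends on the target and compiler version, so the generated assembly is worth inspecting before relying on it.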
Buffer: row-major 2D grid
pub struct Buffer {
width: u16,
height: u16,
cells: Vec<Cell>, // len = width * height, row-major
scissor_stack: Vec<Rect>, // clipping, monotone intersection
opacity_stack: Vec<f32>, // composition alpha
// Three layers of dirty tracking:
dirty_rows: Vec<bool>, // per-row bitmap (len == height)
dirty_spans: Vec<DirtySpanRow>, // per-row Vec<(x0, x1)>
dirty_bits: Vec<u8>, // per-cell bitmap (tile-skip SAT)
dirty_cells: usize,
dirty_all: bool,
pub degradation: DegradationLevel, // set by runtime before view()
}

Indexing is row-major: cell (x, y) lives at cells[y * width + x].
This matches the ANSI emission order (cursor advances across, then
down) so the diff can scan memory in the same order the presenter
will emit it — cache-friendly in both directions.
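
As a small illustration of the indexing rule, the helper below computes the row-major offset with bounds checking. `cell_index` is a hypothetical function, not part of the documented API; the one detail worth noting is that the `u16` coordinates are widened to `usize` before multiplying, since `y * width` can exceed `u16::MAX`.

```rust
/// Hypothetical helper: row-major offset of cell (x, y) in a grid that is
/// `width` cells wide and `height` cells tall.
/// Returns None when the coordinate lies outside the grid.
pub fn cell_index(width: u16, height: u16, x: u16, y: u16) -> Option<usize> {
    if x < width && y < height {
        // Widen first: 300 * 300 already overflows u16.
        Some(y as usize * width as usize + x as usize)
    } else {
        None
    }
}
```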
Scissor stack — monotone intersection
Widgets nest freely: a panel contains a list contains a row contains a cell. Each nesting pushes a scissor rect; the effective clip at any point is the intersection of every rect on the stack.
push_scissor(R₀) ── stack: [R₀]
push_scissor(R₁) ── stack: [R₀, R₀ ∩ R₁] ⊆ R₀
push_scissor(R₂) ── stack: [R₀, R₀∩R₁, ⋯∩R₂] ⊆ R₀ ∩ R₁
pop_scissor() ── stack: [R₀, R₀ ∩ R₁]

The monotonicity invariant says top-of-stack never grows on
push. This lets every Buffer::set(x, y, cell) check bounds once
against the top of the stack — no per-cell iteration up the stack is
needed — and it enables the set_line / fill_row fast paths to
clamp ranges without re-intersecting. Violating monotonicity (pushing
a rect that is not a subset of the current top) would break those
fast paths silently.
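
One way to get monotonicity by construction is to store the running intersection on every push, as the stack trace above suggests. The sketch below assumes that design; `ScissorStack` and its methods are hypothetical, and the local `Rect` only mirrors the field names used elsewhere on this page.

```rust
/// Local stand-in for a rectangle with the same field names as the page's examples.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
pub struct Rect {
    pub x: u16,
    pub y: u16,
    pub width: u16,
    pub height: u16,
}

impl Rect {
    /// Intersection of two rects; an empty overlap yields zero width or height.
    pub fn intersection(self, other: Rect) -> Rect {
        let x0 = self.x.max(other.x);
        let y0 = self.y.max(other.y);
        let x1 = self.x.saturating_add(self.width).min(other.x.saturating_add(other.width));
        let y1 = self.y.saturating_add(self.height).min(other.y.saturating_add(other.height));
        Rect { x: x0, y: y0, width: x1.saturating_sub(x0), height: y1.saturating_sub(y0) }
    }

    pub fn contains(self, x: u16, y: u16) -> bool {
        x >= self.x && x - self.x < self.width && y >= self.y && y - self.y < self.height
    }
}

/// Hypothetical scissor stack: each entry is already intersected with
/// everything below it, so the top is the effective clip.
pub struct ScissorStack {
    stack: Vec<Rect>,
}

impl ScissorStack {
    /// The base entry is the full buffer area and is never popped.
    pub fn new(bounds: Rect) -> Self {
        ScissorStack { stack: vec![bounds] }
    }

    /// Pushes `top ∩ r`, so the top of the stack can only shrink or stay equal.
    pub fn push(&mut self, r: Rect) {
        let clipped = self.top().intersection(r);
        self.stack.push(clipped);
    }

    pub fn pop(&mut self) {
        if self.stack.len() > 1 {
            self.stack.pop();
        }
    }

    pub fn top(&self) -> Rect {
        *self.stack.last().expect("base rect is never popped")
    }

    /// A single bounds check against the top is enough to decide a write.
    pub fn allows(&self, x: u16, y: u16) -> bool {
        self.top().contains(x, y)
    }
}
```

With this shape, pushing a rect larger than the current clip is harmless: the stored entry is still the intersection, so a write check never needs to walk the stack.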
Dirty tracking at three scales
| Layer | Granularity | Structure | Used by |
|---|---|---|---|
| dirty_rows | row | Vec<bool> | diff outer loop (skip clean rows) |
| dirty_spans | range within row | per-row SmallVec<[DirtySpan; 4]> | diff inner loop (scan only dirty x-ranges) |
| dirty_bits | single cell | Vec<u8> bitmap + SAT | tile skip hints, Bayesian strategy |
dirty_rows. A Vec<bool> of length height; dirty_rows[y] = true iff some cell in row y was mutated since the last clear_dirty().
The diff skips non-dirty rows unconditionally — a row that is clean
now matches the previous buffer’s row, so the change set for it is
empty.
dirty_spans. For dense tracking, every row carries a sorted,
non-overlapping list of half-open ranges [x0, x1) covering the
cells that changed. Adjacent spans within merge_gap (default 1
cell) are coalesced; once a row accumulates more than
max_spans_per_row (default 64) spans, the row falls back to “full
dirty” and the inner loop scans [0, width). This bounds the cost
of mutation tracking at O(log(spans) + merge_cost) per call and
prevents quadratic blowup on pathologically fragmented rows.
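
The span bookkeeping for a single row can be sketched as follows. This is a simplified model under stated assumptions, not the crate's implementation: `Span`, `RowSpans`, and `mark` are hypothetical names, a plain `Vec` replaces the small-vector storage, the gap between two spans is taken to be the number of clean cells separating them, and `guard_band` is ignored.

```rust
/// Half-open dirty range [x0, x1) within one row.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
pub struct Span {
    pub x0: u16,
    pub x1: u16,
}

/// Hypothetical per-row span list: sorted by x0, non-overlapping.
#[derive(Debug, Default)]
pub struct RowSpans {
    pub spans: Vec<Span>,
    /// Set once the row has fallen back to "full dirty".
    pub full: bool,
}

impl RowSpans {
    /// Records [x0, x1) as dirty, merging with any existing span whose gap
    /// to the new range is at most `merge_gap` cells.
    pub fn mark(&mut self, x0: u16, x1: u16, merge_gap: u16, max_spans: usize) {
        if self.full || x0 >= x1 {
            return;
        }
        // Binary search for the first span close enough on the left to merge.
        let lo = x0.saturating_sub(merge_gap);
        let start = self.spans.partition_point(|s| s.x1 < lo);

        // Absorb every following span that starts within `merge_gap` of the
        // (growing) merged range.
        let mut merged = Span { x0, x1 };
        let mut end = start;
        while end < self.spans.len()
            && self.spans[end].x0 <= merged.x1.saturating_add(merge_gap)
        {
            merged.x0 = merged.x0.min(self.spans[end].x0);
            merged.x1 = merged.x1.max(self.spans[end].x1);
            end += 1;
        }
        self.spans.drain(start..end);
        self.spans.insert(start, merged);

        // Too fragmented: stop tracking ranges and treat the whole row as dirty.
        if self.spans.len() > max_spans {
            self.spans.clear();
            self.full = true;
        }
    }
}
```

The binary search accounts for the logarithmic term in the cost bound, and the drain-and-insert step is the merge cost; once `full` is set, further calls return immediately.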
dirty_bits. A per-cell bitmap fed into a Summed-Area Table
(SAT) — see diff — so the diff can answer “is every
cell in this rectangular tile clean?” in O(1) and skip whole tiles.
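
To make the O(1) tile query concrete, here is a minimal summed-area table over a dirty grid. It is a sketch under simplifying assumptions: `DirtySat` and its methods are hypothetical names, and the input is one `bool` per cell rather than the packed bitmap the buffer stores.

```rust
/// Summed-area table over a per-cell dirty grid.
pub struct DirtySat {
    width: usize,
    /// (width + 1) * (height + 1) prefix sums; row 0 and column 0 stay zero.
    sums: Vec<u32>,
}

impl DirtySat {
    /// Builds the table in one pass over the row-major `dirty` grid.
    pub fn build(dirty: &[bool], width: usize, height: usize) -> Self {
        assert_eq!(dirty.len(), width * height);
        let stride = width + 1;
        let mut sums = vec![0u32; stride * (height + 1)];
        for y in 0..height {
            let mut row_sum = 0u32;
            for x in 0..width {
                row_sum += dirty[y * width + x] as u32;
                // Sum of the rectangle [0, x] x [0, y] =
                // rectangle above + running sum of the current row.
                sums[(y + 1) * stride + (x + 1)] = sums[y * stride + (x + 1)] + row_sum;
            }
        }
        DirtySat { width, sums }
    }

    /// Number of dirty cells in the half-open tile [x0, x1) x [y0, y1).
    /// Requires x0 <= x1 <= width and y0 <= y1 <= height.
    pub fn count(&self, x0: usize, y0: usize, x1: usize, y1: usize) -> u32 {
        let s = self.width + 1;
        // Inclusion-exclusion on four corners; ordered so no step underflows.
        self.sums[y1 * s + x1] + self.sums[y0 * s + x0]
            - self.sums[y0 * s + x1]
            - self.sums[y1 * s + x0]
    }

    /// True when no cell in the tile is dirty.
    pub fn tile_is_clean(&self, x0: usize, y0: usize, x1: usize, y1: usize) -> bool {
        self.count(x0, y0, x1, y1) == 0
    }
}
```

Building the table costs one pass over the grid, after which each tile query is four loads and three additions regardless of tile size.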
pub struct DirtySpanConfig {
pub enabled: bool, // toggle span tracking
pub max_spans_per_row: usize, // 64 default; cap before fallback
pub merge_gap: u16, // 1 cell default; merge spans whose gap ≤ merge_gap
pub guard_band: u16, // expand spans on each side
}
Buffer invariant (dirty-row soundness)
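Formally, for every row y:

$$
\bigl(\exists\, x \in [0, \mathrm{width}) : \text{cell } (x, y) \text{ was mutated since the last } \texttt{clear\_dirty()}\bigr) \;\Longrightarrow\; \texttt{dirty\_rows}[y] = \mathrm{true}
$$
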
This is enough for the diff engine to safely drop clean rows: a clean
row under this invariant must be cell-wise equal to its predecessor,
so there are no changes to emit. See
crates/ftui-render/src/buffer.rs:L210-L240 for the invariant
comment in situ.
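
The role of the invariant is easiest to see in a toy diff that trusts the row flags. The function below is a hypothetical sketch, not the real diff engine: cells are plain `u32` values, and the output is a list of changed coordinates rather than change runs.

```rust
/// Collects the (x, y) positions where `cur` differs from `prev`,
/// visiting only rows whose dirty flag is set.
pub fn diff_dirty_rows(
    prev: &[u32],
    cur: &[u32],
    width: usize,
    dirty_rows: &[bool],
) -> Vec<(usize, usize)> {
    assert_eq!(prev.len(), cur.len());
    assert_eq!(cur.len(), width * dirty_rows.len());
    let mut changes = Vec::new();
    for (y, &dirty) in dirty_rows.iter().enumerate() {
        if !dirty {
            // Skipping is only correct if every mutation set the row's flag.
            continue;
        }
        let row = y * width..(y + 1) * width;
        for (x, (p, c)) in prev[row.clone()].iter().zip(&cur[row]).enumerate() {
            if p != c {
                changes.push((x, y));
            }
        }
    }
    changes
}
```

If a write ever bypasses the flag, the changed cell sits in a row the loop never visits and the change is silently dropped, which is why the invariant is stated as a soundness condition.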
Minimal buffer example
use ftui_render::buffer::Buffer;
use ftui_render::cell::Cell;
use ftui_core::geometry::Rect;
let mut buf = Buffer::new(80, 24);
// Draw a line.
for (i, ch) in "Hello".chars().enumerate() {
buf.set(i as u16, 0, Cell::from_char(ch));
}
// Nested scissor — clip a widget to rows 2..=6, cols 10..=40.
buf.push_scissor(Rect { x: 10, y: 2, width: 31, height: 5 });
{
// Inner widget draws — writes outside the scissor are clipped.
buf.set(50, 4, Cell::from_char('!')); // out-of-scissor: discarded
buf.set(12, 3, Cell::from_char('*')); // in-scissor: written
}
buf.pop_scissor();
// Inspect dirty tracking.
let stats = buf.dirty_span_stats();
eprintln!("rows with spans: {}", stats.rows_with_spans);

Never mutate a Buffer between BufferDiff::compute and
Presenter::present. The diff captures a snapshot of
ChangeRuns; subsequent writes change the underlying cells but the
diff still points at the old ones. The presenter then emits bytes
for stale content (or worse, for rows whose widths changed), and
the terminal sees a corrupted SGR stream. The correct order is
always mutate → compute → present → swap.
Buffer dimensions are immutable
Buffer fixes its (width, height) at construction. Terminal
resize is handled by the runtime: it creates a new buffer at the new
size, copies what it wants, and diffs against the old one — which
will trip the “full redraw” fallback because dimensions changed. See
DoubleBuffer and AdaptiveDoubleBuffer in the same module for the
swap infrastructure.
Cross-references
- Frame — how widgets see the buffer.
- Diff — what consumes dirty tracking.
- Presenter — what emits ANSI from the change runs.
- Bayesian diff strategy — how dirty-bitmap density selects between full / dirty-row / redraw.
- Screen modes — inline vs. alt-screen impact on buffer lifecycle.
- One-writer rule — why single-writer ownership keeps the invariants holdable.