
Bayesian Capability Detection

What goes wrong with a naive approach

Terminal capability detection is famously awful. $TERM=xterm-256color might mean a real xterm, an iTerm2 lying for compatibility, a tmux passthrough of something else, or an SSH session to a stripped-down container. A DA1 (CSI c) probe replies in a format that depends on the terminal vendor; DECRPM queries are sometimes swallowed and sometimes echoed back as literal text. No single probe is trustworthy.

The usual code is a cascade of brittle checks:

if env("TERM_PROGRAM") == "iTerm.app" {
    ...
} else if env("TERM").contains("kitty") {
    ...
} else if /* DA1 response matches pattern X */ {
    ...
} else {
    /* give up, assume VT100 */
}

A new terminal ships, one check misfires, and truecolor silently downgrades to 256-color for half your users. You cannot test this exhaustively — you cannot even enumerate the terminals. The only way forward is to treat each probe as evidence and combine that evidence in a principled way.

Mental model

A capability is a hypothesis: “this terminal supports synchronized output.” Each probe is a noisy witness. A Bayes factor describes how much that witness should move our belief if we believed it perfectly. Robustness comes from:

  • Summing log-BF in logit space — independent evidence combines linearly; hostile evidence cancels favorable evidence.
  • Setting a posterior threshold — enable the capability only if P(supported | probes) > 0.8. Ambiguity falls through to the safe default.
  • Logging every clue — the capability ledger is the first thing a user grabs when a terminal misbehaves; grepping it beats SSHing into prod.

Capability detection is an inference problem hiding as an if-else. Give each probe a Bayes factor, sum them, threshold — and the code stops depending on the exact set of terminals you tested against.

The math

Let H = “terminal supports capability C”. For probes E_1, …, E_k:

\text{logit}(P(H \mid E_{1:k})) = \text{logit}(P(H)) + \sum_i \log \text{BF}_i

with the sigmoid link:

P(H \mid E_{1:k}) = \sigma\!\left(\text{logit}(P(H)) + \sum_i \log \text{BF}_i\right), \qquad \sigma(x) = \frac{1}{1 + e^{-x}}

Decision rule:

\text{enable } C \iff P(H \mid E_{1:k}) > \tau, \qquad \tau = 0.8 \text{ by default}
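The decision rule can be sketched directly in a few lines. This is a minimal standalone version — `logit`, `sigmoid`, and `posterior` are illustrative names, not the ftui-core API:

```rust
/// Log-odds of a probability.
fn logit(p: f64) -> f64 {
    (p / (1.0 - p)).ln()
}

/// Standard logistic sigmoid, the inverse of `logit`.
fn sigmoid(x: f64) -> f64 {
    1.0 / (1.0 + (-x).exp())
}

/// Posterior P(H | E_1..k): prior logit plus the summed log Bayes
/// factors, pushed back through the sigmoid.
fn posterior(prior: f64, log_bfs: &[f64]) -> f64 {
    sigmoid(logit(prior) + log_bfs.iter().sum::<f64>())
}

fn main() {
    let p = posterior(0.5, &[2.0, -0.3]);
    let enable = p > 0.8; // τ = 0.8
    println!("posterior = {p:.3}, enable = {enable}");
}
```

Because everything happens in logit space, adding a probe is just pushing one more number into the slice — hostile evidence subtracts, favorable evidence adds.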

Probe weights

Weights w_i = log BF_i are calibrated once against a corpus of known-good terminals (iTerm2, Alacritty, kitty, tmux, screen, xterm, Windows Terminal, GNOME Terminal, etc.). Typical scales (for “synchronized output” as an example):

Probe                                  log BF
$TERM_PROGRAM = iTerm.app or kitty     +2.3 (hard support)
DA2 reply matches known-good prefix    +1.5
DECRPM 2026 answers 1 (set)            +1.9
DECRPM 2026 answers 2 (reset)          −1.9
$TERM = dumb or linux                  −2.5
DECRPM echoed literally (broken)       −1.0

Missing probes contribute zero — not a penalty. A capability is enabled when the positive evidence clears threshold, regardless of which terminals we have never seen.
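The “missing probes contribute zero” rule falls out naturally if each probe's outcome is an `Option`. A sketch for the $TERM_PROGRAM probe, using the weights from the table above (the function name and shape are hypothetical):

```rust
/// Log Bayes factor contributed by the $TERM_PROGRAM probe.
fn term_program_log_bf(value: Option<&str>) -> f64 {
    match value {
        Some("iTerm.app") | Some("kitty") => 2.3, // hard support
        Some(_) => 0.0, // unrecognized terminal: no information
        None => 0.0,    // variable unset: missing probe, not a penalty
    }
}

fn main() {
    println!("{}", term_program_log_bf(std::env::var("TERM_PROGRAM").ok().as_deref()));
}
```

An unset variable and an unrecognized value both map to zero, so a terminal the corpus has never seen neither gains nor loses ground from this probe.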

Worked example — truecolor

Terminal responds:

  • $COLORTERM = truecolor — log BF = +2.0
  • DA2 reply unrecognized — log BF = 0 (no information)
  • $TERM_PROGRAM absent — log BF = 0

Assume prior P(H) = 0.5 (logit 0). Posterior logit = 0 + 2.0 = 2.0, so P(H | E) = σ(2.0) ≈ 0.88 > 0.8. Enable truecolor.

If instead we had $COLORTERM=truecolor and $TERM=dumb:

\text{logit posterior} = 0 + 2.0 - 2.5 = -0.5, \qquad P(H) \approx 0.38

Fall back to 256-color. One hostile clue overrode the friendly one, which is the right answer — something is misconfigured.
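Both posteriors in this example can be checked in two lines of plain sigmoid arithmetic, no dependencies:

```rust
fn sigmoid(x: f64) -> f64 {
    1.0 / (1.0 + (-x).exp())
}

fn main() {
    // Friendly clue only: logit 0 + 2.0
    println!("{:.2}", sigmoid(0.0 + 2.0)); // 0.88 → enable truecolor
    // Friendly plus hostile clue: logit 0 + 2.0 - 2.5
    println!("{:.2}", sigmoid(0.0 + 2.0 - 2.5)); // 0.38 → fall back
}
```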

Rust interface

crates/ftui-core/src/caps_probe.rs
use ftui_core::caps_probe::{CapabilityProbe, CapabilityLedger};

let mut ledger = CapabilityLedger::new(0.5 /* prior P(H) */);
for probe in probes {
    ledger.add(probe.name(), probe.log_bf());
}
let posterior = ledger.posterior_probability(); // sigmoid of sum
let supported = posterior > 0.8;

Every probe is a value, not a branch, so the code never grows the cascade. Adding a new terminal-specific probe is a one-line push into the ledger with a calibrated weight.
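For intuition, a ledger with this interface can be very small. The following is a sketch of what such a type might look like internally, not the actual ftui-core implementation:

```rust
/// One recorded piece of evidence.
struct Entry {
    name: String,
    log_bf: f64,
}

/// Accumulates log Bayes factors on top of a prior, in logit space.
struct CapabilityLedger {
    prior_logit: f64,
    entries: Vec<Entry>,
}

impl CapabilityLedger {
    fn new(prior: f64) -> Self {
        Self {
            prior_logit: (prior / (1.0 - prior)).ln(),
            entries: Vec::new(),
        }
    }

    /// Record one probe; missing probes are simply never added.
    fn add(&mut self, name: &str, log_bf: f64) {
        self.entries.push(Entry { name: name.to_string(), log_bf });
    }

    /// Sigmoid of (prior logit + summed log-BF).
    fn posterior_probability(&self) -> f64 {
        let x = self.prior_logit
            + self.entries.iter().map(|e| e.log_bf).sum::<f64>();
        1.0 / (1.0 + (-x).exp())
    }
}

fn main() {
    let mut ledger = CapabilityLedger::new(0.5);
    ledger.add("COLORTERM=truecolor", 2.0);
    ledger.add("TERM=dumb", -2.5);
    println!("{:.2}", ledger.posterior_probability()); // 0.38 → safe default
}
```

Keeping the per-probe names in the entries (rather than just the running sum) is what makes the debug ledger in the next section possible.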

How to debug

On session start the ledger lands as a capability_detection line:

{"schema":"capability_detection","capability":"synchronized_output",
 "prior":0.5,"posterior":0.92,"decision":"enabled",
 "entries":[
   {"name":"COLORTERM=truecolor","log_bf":2.0},
   {"name":"DECRPM-2026=set","log_bf":1.9},
   {"name":"TERM_PROGRAM=absent","log_bf":0.0}
 ]}
FTUI_EVIDENCE_SINK=/tmp/ftui.jsonl cargo run -p ftui-demo-showcase

# Which capabilities flipped per session?
jq -c 'select(.schema=="capability_detection")
       | {cap: .capability, p: .posterior, dec: .decision}' \
   /tmp/ftui.jsonl

Pitfalls

Don’t let weights grow enormous. log BF = ±5 means 148:1 odds per probe — one misclassified probe single-handedly decides the capability. Keep weights in [−3, +3] so the posterior depends on several clues, not one.

A prior of 0.5 is the safe choice. A higher prior biases the detector toward enabling; a lower prior biases toward the safe default. Use 0.5 unless you have a population-level reason (e.g., a capability that every terminal in the last decade supports).
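One way to make the bias concrete is to compute how much net log-BF is needed to clear τ = 0.8 from different priors (a standalone sketch; `evidence_needed` is an illustrative name):

```rust
/// Net log Bayes factor needed to move `prior` past threshold `tau`.
fn evidence_needed(prior: f64, tau: f64) -> f64 {
    let logit = |p: f64| (p / (1.0 - p)).ln();
    logit(tau) - logit(prior)
}

fn main() {
    for prior in [0.2, 0.5, 0.8] {
        // prior 0.2 needs ≈ +2.77, prior 0.5 needs ≈ +1.39, prior 0.8 needs 0
        println!("prior {prior}: need {:+.2} net log-BF", evidence_needed(prior, 0.8));
    }
}
```

A prior of 0.8 would enable the capability with no evidence at all, which is exactly why it should be reserved for capabilities that are near-universal in practice.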
