Bayesian Capability Detection
What goes wrong with a naive approach
Terminal capability detection is famously awful. $TERM=xterm-256color
might mean a real xterm, an iTerm2 lying for compatibility, a tmux
passthrough of something else, or an SSH session to a stripped-down
container. A DA1 (CSI c) probe replies in a format that depends on
the terminal vendor; DECRPM queries are sometimes swallowed and
sometimes echoed back as literal text. No single probe is trustworthy.
The usual code is a cascade of brittle checks:

    if env("TERM_PROGRAM") == "iTerm.app" { ... }
    else if env("TERM").contains("kitty") { ... }
    else if /* DA1 response matches pattern X */ { ... }
    else { /* give up, assume VT100 */ }

A new terminal ships, one check misfires, and truecolor silently downgrades to 256-color for half your users. You cannot test this exhaustively; you cannot even enumerate the terminals. The only way forward is to treat each probe as evidence and combine that evidence in a principled way.
Mental model
A capability is a hypothesis: “this terminal supports synchronized output.” Each probe is a noisy witness. A Bayes factor describes how much that witness should move our belief if we believed it perfectly. Robustness comes from:
- Summing log-BF in logit space: independent evidence combines linearly; hostile evidence cancels favorable evidence.
- Setting a posterior threshold: enable the capability only if the posterior $P(H \mid \text{evidence})$ of the hypothesis exceeds a fixed threshold $\tau$. Ambiguity falls through to the safe default.
- Logging every clue: the capability ledger is the first thing a user grabs when a terminal misbehaves; grepping it beats SSHing into prod.
Capability detection is an inference problem disguised as an if-else chain. Give each probe a Bayes factor, sum the logs, and threshold the result; the code then stops depending on the exact set of terminals you tested against.
The math
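Let $H$ = “terminal supports capability $c$”. For probes $e_1, \dots, e_n$ with Bayes factors $\mathrm{BF}_i = P(e_i \mid H) / P(e_i \mid \neg H)$:

$$\operatorname{logit} P(H \mid e_1, \dots, e_n) = \operatorname{logit} P(H) + \sum_{i=1}^{n} \log \mathrm{BF}_i$$

with the sigmoid link:

$$P(H \mid e_1, \dots, e_n) = \sigma(x) = \frac{1}{1 + e^{-x}}, \qquad x = \operatorname{logit} P(H) + \sum_{i=1}^{n} \log \mathrm{BF}_i$$

Decision rule:

$$\text{enable } c \iff P(H \mid e_1, \dots, e_n) > \tau$$

with $\tau = 0.8$ in the Rust interface below; otherwise fall back to the safe default.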
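The additive form follows from Bayes' rule written in odds form, under the assumption that the probes are conditionally independent given $H$ and given $\neg H$ (here $H$ is the capability hypothesis and $e_1, \dots, e_n$ are the probe outcomes). A short derivation sketch:

```latex
\begin{aligned}
\frac{P(H \mid e_1, \dots, e_n)}{P(\neg H \mid e_1, \dots, e_n)}
  &= \frac{P(H)}{P(\neg H)} \prod_{i=1}^{n} \frac{P(e_i \mid H)}{P(e_i \mid \neg H)} \\[4pt]
\operatorname{logit} P(H \mid e_1, \dots, e_n)
  &= \operatorname{logit} P(H) + \sum_{i=1}^{n} \log \mathrm{BF}_i,
  \qquad \mathrm{BF}_i = \frac{P(e_i \mid H)}{P(e_i \mid \neg H)}
\end{aligned}
```

Taking the logarithm of the first line gives the second. When two probes are strongly correlated (for example, two environment variables set by the same emulator), the independence assumption fails and their evidence is double-counted.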
Probe weights
Weights are calibrated once against a corpus of known-good terminals (iTerm2, Alacritty, kitty, tmux, screen, xterm, Windows Terminal, GNOME Terminal, etc.). Typical scales, using “synchronized output” as the example:
| Probe | $\log \mathrm{BF}$ |
|---|---|
| `$TERM_PROGRAM` = iTerm.app or kitty | +2.3 (hard support) |
| DA2 reply matches known-good prefix | +1.5 |
| DECRPM 2026 answers 1 (set) | +1.9 |
| DECRPM 2026 answers 2 (reset) | −1.9 |
| `$TERM` = dumb or linux | −2.5 |
| DECRPM echoed literally (broken) | −1.0 |
Missing probes contribute zero — not a penalty. A capability is enabled when the positive evidence clears threshold, regardless of which terminals we have never seen.
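One way to make "missing contributes zero" hard to get wrong is to model an unanswered probe as an absent value rather than as a number. The sketch below is illustrative only: `ProbeResult`, `total_log_bf`, and `informative` are hypothetical names, not part of any real crate.

```rust
/// Outcome of one probe. `log_bf` is `None` when the probe gave no usable
/// answer (timeout, unrecognized reply), i.e. it carries no information.
struct ProbeResult {
    name: &'static str,
    log_bf: Option<f64>,
}

/// Sum the log Bayes factors, skipping missing probes so they add nothing
/// instead of acting as a penalty.
fn total_log_bf(results: &[ProbeResult]) -> f64 {
    results.iter().filter_map(|r| r.log_bf).sum()
}

/// Names of the probes that actually contributed evidence.
fn informative(results: &[ProbeResult]) -> Vec<&'static str> {
    results
        .iter()
        .filter(|r| r.log_bf.is_some())
        .map(|r| r.name)
        .collect()
}
```

Keeping `None` distinct from `Some(0.0)` also lets a log distinguish "probe ran and was uninformative" from "probe never answered", should that distinction matter when debugging.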
Worked example — truecolor
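The terminal responds:
- `$COLORTERM` = truecolor: $\log \mathrm{BF} = +2.0$
- DA2 reply unrecognized: $\log \mathrm{BF} = 0$ (no information)
- `$TERM_PROGRAM` absent: $\log \mathrm{BF} = 0$

Assume prior $P(H) = 0.5$ (logit 0). Posterior logit $= 0 + 2.0 + 0 + 0 = 2.0$, so the posterior is $\sigma(2.0) \approx 0.88 > 0.8$. Enable truecolor.
If instead we had `$COLORTERM=truecolor` and `$TERM=dumb`, using the −2.5 weight for `$TERM` = dumb from the table above:

$$\text{posterior logit} = 0 + 2.0 - 2.5 = -0.5, \qquad \sigma(-0.5) \approx 0.38 < 0.8$$

Fall back to 256-color. One hostile clue overrode the friendly one, which is the right answer: something is misconfigured.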
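The same two cases can be checked without computing a sigmoid at all, by moving the threshold into logit space: a 0.8 threshold corresponds to log-odds of ln 4 ≈ 1.386. The sketch below assumes a +2.0 weight for `COLORTERM=truecolor` (the value recorded in the sample ledger under "How to debug") and the −2.5 weight for `TERM=dumb` from the table; `logit` and `enabled` are hypothetical helper names.

```rust
/// Log-odds of a probability `p` in the open interval (0, 1).
fn logit(p: f64) -> f64 {
    (p / (1.0 - p)).ln()
}

/// Decide in logit space: enable when the prior logit plus the summed
/// log Bayes factors exceeds the logit of the threshold.
fn enabled(prior: f64, log_bfs: &[f64], threshold: f64) -> bool {
    logit(prior) + log_bfs.iter().sum::<f64>() > logit(threshold)
}
```

With a 0.5 prior, `enabled(0.5, &[2.0, 0.0, 0.0], 0.8)` holds because 2.0 exceeds 1.386, while `enabled(0.5, &[2.0, -2.5], 0.8)` does not because the sum is −0.5.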
Rust interface
    use ftui_core::caps_probe::{CapabilityProbe, CapabilityLedger};

    // Start from the prior P(H) and fold in one log Bayes factor per probe.
    let mut ledger = CapabilityLedger::new(0.5 /* prior P(H) */);
    for probe in probes {
        ledger.add(probe.name(), probe.log_bf());
    }
    let posterior = ledger.posterior_probability(); // sigmoid of sum
    let supported = posterior > 0.8; // decision threshold

Every probe is a value, not a branch, so the code never grows the cascade. Adding a new terminal-specific probe is a one-line push into the ledger with a calibrated weight.
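To make the "value, not a branch" idea concrete, here is a minimal stand-in for such a ledger. It is a sketch written for this page, not the `ftui_core` implementation: `SketchLedger` is a hypothetical type whose method names merely mirror the calls shown above, and the real crate may store entries or compute the posterior differently.

```rust
/// Hypothetical stand-in for a capability ledger; NOT the real `ftui_core` type.
struct SketchLedger {
    prior: f64,
    entries: Vec<(String, f64)>,
}

impl SketchLedger {
    /// `prior` is P(H), a probability strictly between 0 and 1.
    fn new(prior: f64) -> Self {
        Self { prior, entries: Vec::new() }
    }

    /// Record one clue. Every clue is kept, including zero-weight ones,
    /// so the full ledger can be logged later.
    fn add(&mut self, name: &str, log_bf: f64) {
        self.entries.push((name.to_string(), log_bf));
    }

    /// Sigmoid of (prior logit + sum of log Bayes factors).
    fn posterior_probability(&self) -> f64 {
        let prior_logit = (self.prior / (1.0 - self.prior)).ln();
        let total: f64 = self.entries.iter().map(|(_, w)| *w).sum();
        1.0 / (1.0 + (-(prior_logit + total)).exp())
    }
}
```

Because the entries are plain data, supporting a new terminal means appending a `(name, weight)` pair; no existing branch has to be reordered or re-tested.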
How to debug
On session start the ledger lands as a capability_detection line:

    {"schema":"capability_detection","capability":"synchronized_output",
     "prior":0.5,"posterior":0.98,"decision":"enabled",
     "entries":[
       {"name":"COLORTERM=truecolor","log_bf":2.0},
       {"name":"DECRPM-2026=set",    "log_bf":1.9},
       {"name":"TERM_PROGRAM=absent","log_bf":0.0}
     ]}

To capture the ledger to a file and query it:

    FTUI_EVIDENCE_SINK=/tmp/ftui.jsonl cargo run -p ftui-demo-showcase

    # Which capabilities flipped per session?
    jq -c 'select(.schema=="capability_detection")
           | {cap: .capability, p: .posterior, dec: .decision}' \
       /tmp/ftui.jsonl

Pitfalls
Don’t let weights go enormous. $\log \mathrm{BF} = 5$ means 148:1 odds per probe, so one misclassified probe single-handedly decides the capability. Keep weights in a modest range (the table above stays within $\pm 2.5$) so the posterior depends on several clues, not one.
Prior $P(H) = 0.5$ is the safe choice. A higher prior biases the detector toward enabling; a lower prior biases toward the safe default. Use $0.5$ unless you have a population-level reason (e.g., a capability that every terminal in the last decade supports).
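Both pitfalls are easy to see numerically. The sketch below is illustrative: `odds`, `clamp_log_bf`, and `posterior` are hypothetical helpers, and the 2.5 cap is an assumption chosen to match the largest magnitude in the weight table, not a documented limit.

```rust
/// Odds ratio implied by a single log Bayes factor.
fn odds(log_bf: f64) -> f64 {
    log_bf.exp()
}

/// Illustrative cap on |log BF|; the value is an assumption for this sketch.
const MAX_ABS_LOG_BF: f64 = 2.5;

/// Clamp a calibrated weight so that no single probe can dominate.
fn clamp_log_bf(w: f64) -> f64 {
    w.clamp(-MAX_ABS_LOG_BF, MAX_ABS_LOG_BF)
}

/// Posterior from a prior probability and an already-summed log Bayes factor.
fn posterior(prior: f64, total_log_bf: f64) -> f64 {
    let prior_logit = (prior / (1.0 - prior)).ln();
    1.0 / (1.0 + (-(prior_logit + total_log_bf)).exp())
}
```

`odds(5.0)` is about 148, which is the 148:1 figure above. On the prior side, `posterior(0.9, 0.0)` is 0.9, so a prior set above a 0.8 threshold enables the capability with no evidence at all, whereas the same single +1.9 clue clears 0.8 from a 0.5 prior but not from a 0.2 prior.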
Cross-references
- /core/capabilities: the consumer that turns the posterior into feature flags.
- Command-palette ledger: the same log-BF machinery applied to search.
- /reference/terminal-compatibility: the corpus that calibrates these weights.