
Bayesian Capability Detection

What goes wrong with a naive approach

Terminal capability detection is famously awful. $TERM=xterm-256color might mean a real xterm, an iTerm2 lying for compatibility, a tmux passthrough of something else, or an SSH session to a stripped-down container. A DA1 (CSI c) probe replies in a format that depends on the terminal vendor; DECRPM queries are sometimes swallowed and sometimes echoed back as literal text. No single probe is trustworthy.

The usual code is a cascade of brittle checks:

if env("TERM_PROGRAM") == "iTerm.app" {
    ...
} else if env("TERM").contains("kitty") {
    ...
} else if /* DA1 response matches pattern X */ {
    ...
} else {
    /* give up, assume VT100 */
}

A new terminal ships, one check misfires, and truecolor silently downgrades to 256-color for half your users. You cannot test this exhaustively — you cannot even enumerate the terminals. The only way forward is to treat each probe as evidence and combine that evidence in a principled way.

Mental model

A capability is a hypothesis: “this terminal supports synchronized output.” Each probe is a noisy witness. A Bayes factor describes how much that witness should move our belief if we believed it perfectly. Robustness comes from:

  • Summing log-BF in logit space — independent evidence combines linearly; hostile evidence cancels favorable evidence.
  • Setting a posterior threshold — enable the capability only if P(supported | probes) > 0.8. Ambiguity falls through to the safe default.
  • Logging every clue — the capability ledger is the first thing a user grabs when a terminal misbehaves; grepping it beats SSHing into prod.

Capability detection is an inference problem hiding as an if-else. Give each probe a Bayes factor, sum them, threshold — and the code stops depending on the exact set of terminals you tested against.

The math

Let H = “terminal supports capability C”. For probes E_1, …, E_k:

\text{logit}(P(H \mid E_{1:k})) = \text{logit}(P(H)) + \sum_i \log \text{BF}_i

with the sigmoid link:

P(H \mid E_{1:k}) = \sigma\!\left(\text{logit}(P(H)) + \sum_i \log \text{BF}_i\right), \qquad \sigma(x) = \frac{1}{1 + e^{-x}}

Decision rule:

\text{enable } C \iff P(H \mid E_{1:k}) > \tau, \qquad \tau = 0.8 \text{ by default}
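The decision rule can be sketched directly in a few lines. This is a minimal standalone version — `logit`, `sigmoid`, and `posterior` are illustrative names, not the ftui-core API:

```rust
/// Log-odds of a probability.
fn logit(p: f64) -> f64 {
    (p / (1.0 - p)).ln()
}

/// Standard logistic sigmoid, the inverse of `logit`.
fn sigmoid(x: f64) -> f64 {
    1.0 / (1.0 + (-x).exp())
}

/// Posterior P(H | E_1..k): prior logit plus the summed log Bayes
/// factors, pushed back through the sigmoid.
fn posterior(prior: f64, log_bfs: &[f64]) -> f64 {
    sigmoid(logit(prior) + log_bfs.iter().sum::<f64>())
}

fn main() {
    let p = posterior(0.5, &[2.0, -0.3]);
    let enable = p > 0.8; // τ = 0.8
    println!("posterior = {p:.3}, enable = {enable}");
}
```

Because everything happens in logit space, adding a probe is just pushing one more number into the slice — hostile evidence subtracts, favorable evidence adds.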

Probe weights

Weights w_i = log BF_i are calibrated once against a corpus of known-good terminals (iTerm2, Alacritty, kitty, tmux, screen, xterm, Windows Terminal, GNOME Terminal, etc.). Typical scales (for “synchronized output” as an example):

Probe                                  log BF
$TERM_PROGRAM = iTerm.app or kitty     +2.3 (hard support)
DA2 reply matches known-good prefix    +1.5
DECRPM 2026 answers 1 (set)            +1.9
DECRPM 2026 answers 2 (reset)          −1.9
$TERM = dumb or linux                  −2.5
DECRPM echoed literally (broken)       −1.0

Missing probes contribute zero — not a penalty. A capability is enabled when the positive evidence clears threshold, regardless of which terminals we have never seen.
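The “missing probes contribute zero” rule falls out naturally if each probe's outcome is an `Option`. A sketch for the $TERM_PROGRAM probe, using the weights from the table above (the function name and shape are hypothetical):

```rust
/// Log Bayes factor contributed by the $TERM_PROGRAM probe.
fn term_program_log_bf(value: Option<&str>) -> f64 {
    match value {
        Some("iTerm.app") | Some("kitty") => 2.3, // hard support
        Some(_) => 0.0, // unrecognized terminal: no information
        None => 0.0,    // variable unset: missing probe, not a penalty
    }
}

fn main() {
    println!("{}", term_program_log_bf(std::env::var("TERM_PROGRAM").ok().as_deref()));
}
```

An unset variable and an unrecognized value both map to zero, so a terminal the corpus has never seen neither gains nor loses ground from this probe.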

Worked example — truecolor

Terminal responds:

  • $COLORTERM = truecolor — log BF = +2.0
  • DA2 reply unrecognized — log BF = 0 (no information)
  • $TERM_PROGRAM absent — log BF = 0

Assume prior P(H) = 0.5 (logit 0). Posterior logit = 0 + 2.0 = 2.0, so P(H | E) = σ(2.0) ≈ 0.88 > 0.8. Enable truecolor.

If instead we had $COLORTERM=truecolor and $TERM=dumb:

\text{logit posterior} = 0 + 2.0 - 2.5 = -0.5, \qquad P(H) \approx 0.38

Fall back to 256-color. One hostile clue overrode the friendly one, which is the right answer — something is misconfigured.
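Both posteriors in this example can be checked in two lines of plain sigmoid arithmetic, no dependencies:

```rust
fn sigmoid(x: f64) -> f64 {
    1.0 / (1.0 + (-x).exp())
}

fn main() {
    // Friendly clue only: logit 0 + 2.0
    println!("{:.2}", sigmoid(0.0 + 2.0)); // 0.88 → enable truecolor
    // Friendly plus hostile clue: logit 0 + 2.0 - 2.5
    println!("{:.2}", sigmoid(0.0 + 2.0 - 2.5)); // 0.38 → fall back
}
```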

Rust interface

crates/ftui-core/src/caps_probe.rs
use ftui_core::caps_probe::{CapabilityProbe, CapabilityLedger};

let mut ledger = CapabilityLedger::new(0.5 /* prior P(H) */);
for probe in probes {
    ledger.add(probe.name(), probe.log_bf());
}
let posterior = ledger.posterior_probability(); // sigmoid of sum
let supported = posterior > 0.8;

Every probe is a value, not a branch, so the code never grows the cascade. Adding a new terminal-specific probe is a one-line push into the ledger with a calibrated weight.
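For intuition, a ledger with this interface can be very small. The following is a sketch of what such a type might look like internally, not the actual ftui-core implementation:

```rust
/// One recorded piece of evidence.
struct Entry {
    name: String,
    log_bf: f64,
}

/// Accumulates log Bayes factors on top of a prior, in logit space.
struct CapabilityLedger {
    prior_logit: f64,
    entries: Vec<Entry>,
}

impl CapabilityLedger {
    fn new(prior: f64) -> Self {
        Self {
            prior_logit: (prior / (1.0 - prior)).ln(),
            entries: Vec::new(),
        }
    }

    /// Record one probe; missing probes are simply never added.
    fn add(&mut self, name: &str, log_bf: f64) {
        self.entries.push(Entry { name: name.to_string(), log_bf });
    }

    /// Sigmoid of (prior logit + summed log-BF).
    fn posterior_probability(&self) -> f64 {
        let x = self.prior_logit
            + self.entries.iter().map(|e| e.log_bf).sum::<f64>();
        1.0 / (1.0 + (-x).exp())
    }
}

fn main() {
    let mut ledger = CapabilityLedger::new(0.5);
    ledger.add("COLORTERM=truecolor", 2.0);
    ledger.add("TERM=dumb", -2.5);
    println!("{:.2}", ledger.posterior_probability()); // 0.38 → safe default
}
```

Keeping the per-probe names in the entries (rather than just the running sum) is what makes the debug ledger in the next section possible.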

How to debug

On session start the ledger lands as a capability_detection line:

{"schema":"capability_detection","capability":"synchronized_output",
 "prior":0.5,"posterior":0.92,"decision":"enabled",
 "entries":[
   {"name":"COLORTERM=truecolor","log_bf":2.0},
   {"name":"DECRPM-2026=set","log_bf":1.9},
   {"name":"TERM_PROGRAM=absent","log_bf":0.0}
 ]}
FTUI_EVIDENCE_SINK=/tmp/ftui.jsonl cargo run -p ftui-demo-showcase

# Which capabilities flipped per session?
jq -c 'select(.schema=="capability_detection")
       | {cap: .capability, p: .posterior, dec: .decision}' \
   /tmp/ftui.jsonl

Pitfalls

Don’t let weights grow enormous. log BF = ±5 means 148:1 odds per probe — one misclassified probe single-handedly decides the capability. Keep weights in [−3, +3] so the posterior depends on several clues, not one.

A prior of 0.5 is the safe choice. A higher prior biases the detector toward enabling; a lower prior biases toward the safe default. Use 0.5 unless you have a population-level reason (e.g., a capability that every terminal in the last decade supports).
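One way to make the bias concrete is to compute how much net log-BF is needed to clear τ = 0.8 from different priors (a standalone sketch; `evidence_needed` is an illustrative name):

```rust
/// Net log Bayes factor needed to move `prior` past threshold `tau`.
fn evidence_needed(prior: f64, tau: f64) -> f64 {
    let logit = |p: f64| (p / (1.0 - p)).ln();
    logit(tau) - logit(prior)
}

fn main() {
    for prior in [0.2, 0.5, 0.8] {
        // prior 0.2 needs ≈ +2.77, prior 0.5 needs ≈ +1.39, prior 0.8 needs 0
        println!("prior {prior}: need {:+.2} net log-BF", evidence_needed(prior, 0.8));
    }
}
```

A prior of 0.8 would enable the capability with no evidence at all, which is exactly why it should be reserved for capabilities that are near-universal in practice.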
