
Command-Palette Evidence Ledger

What goes wrong with a naive approach

A classical fuzzy search ranks candidates by an ad-hoc score: a prefix match gets +10, a word-boundary hit +5, each gap −1, tags add a bonus. The constants are picked by the author’s taste. The number “+10” has no units, no noise model, no way to combine it with a second piece of evidence other than hoping the scales line up. When a user complains that “open settings” was ranked below “opens ettings” you stare at the code and cannot even say why.

The palette needs three things the naive score can’t provide:

  1. A common currency for clues that disagree (a great prefix but bad gaps).
  2. An explanation for each ranking decision — which clue mattered, by how much.
  3. Stable ordering when scores tie, so top-k doesn’t flicker as the user types.

The first two come from Bayes factors in log-odds space. The third is handled by the rank-confidence layer on top.

Mental model

Treat ranking as a hypothesis test: is this candidate the user’s intended command, or not?

  • Start with prior odds that depend on the match type alone. An exact match is overwhelmingly likely (~99:1). A fuzzy subsequence is a long shot (~1:3).
  • Each observed clue — a word-boundary hit, a tag match, a small position, a tight gap — is a Bayes factor (likelihood ratio). Each factor multiplies the posterior odds.
  • The final log-posterior is the sort key. The list of (description, log BF) pairs is the ranking explanation.

The palette is a probabilistic classifier in disguise. The user types characters; we update a posterior over “which command did you mean?” and show the argmax-k with their evidence ledgers attached.

The math

Odds form of Bayes’ rule for relevance R given evidence E:

$$
\frac{P(R \mid E)}{P(\neg R \mid E)} = \frac{P(R)}{P(\neg R)} \prod_i \text{BF}_i,
\qquad \text{BF}_i = \frac{P(E_i \mid R)}{P(E_i \mid \neg R)}
$$

Taking logs turns the product into a sum — numerically stable and trivially auditable:

$$
\log \frac{P(R \mid E)}{P(\neg R \mid E)} = \log \frac{P(R)}{P(\neg R)} + \sum_i \log \text{BF}_i
$$
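The log-odds update is a one-line fold. A minimal sketch, assuming illustrative clue values (the function name `log_posterior` here is hypothetical, not the crate's API):

```rust
// Posterior log-odds = prior log-odds + sum of per-clue log Bayes factors.
fn log_posterior(log_prior: f64, log_bfs: &[f64]) -> f64 {
    log_prior + log_bfs.iter().sum::<f64>()
}

fn main() {
    // WordStart prior 4:1 → ln(4) ≈ 1.386, plus three illustrative clues.
    let lp = log_posterior(4.0_f64.ln(), &[0.70, 0.30, 1.00]);
    println!("{lp:.3}"); // ≈ 3.386
}
```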

Prior odds by match type

Match type   Prior odds   P(R)   Intuition
Exact        99:1         0.99   User typed the full command name.
Prefix       9:1          0.90   Stem of the name.
WordStart    4:1          0.80   Lines up with a word boundary.
Substring    2:1          0.67   Contiguous inside the name.
Fuzzy        1:3          0.25   Subsequence, nothing more.
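The two odds columns are the same numbers in different clothes: p = odds / (1 + odds). A quick sketch checking the table:

```rust
// Convert prior odds to a probability: p = odds / (1 + odds).
fn odds_to_p(odds: f64) -> f64 {
    odds / (1.0 + odds)
}

fn main() {
    let table = [
        ("Exact", 99.0),
        ("Prefix", 9.0),
        ("WordStart", 4.0),
        ("Substring", 2.0),
        ("Fuzzy", 1.0 / 3.0), // 1:3 odds
    ];
    for (name, odds) in table {
        println!("{name}: {:.2}", odds_to_p(odds));
    }
}
```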

Evidence factors (illustrative)

Each factor is computed once per candidate per keystroke. Typical clues:

  • Word-boundary hit: log BF ≈ +0.7 per boundary hit.
  • Early position: log BF = −β · start_pos.
  • Tight gap density: log BF = −γ · Σ gapᵢ.
  • Tag match: log BF ≈ +1.0 when the query also matches a tag.
  • Recent use: log BF ≈ +0.5 for commands fired in the last session.

The signs are principled: clues that make relevance more likely have positive log-BF, clues that erode it are negative, and a missing clue contributes zero (not a made-up penalty).
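A sketch of the factor functions above, with assumed tuning constants (γ = 0.2 happens to reproduce the gap_sum = 4 → −0.80 entry in the worked example below; β is purely illustrative, and these helpers are not the crate's real API):

```rust
// Illustrative tuning constants — assumptions, not the crate's values.
const BETA: f64 = 0.05;  // position penalty slope
const GAMMA: f64 = 0.2;  // gap penalty slope

fn boundary_log_bf(hits: usize) -> f64 {
    0.7 * hits as f64 // ≈ +0.7 per word-boundary hit
}

fn position_log_bf(start_pos: usize) -> f64 {
    -BETA * start_pos as f64 // later starts erode relevance
}

fn gap_log_bf(gaps: &[usize]) -> f64 {
    -GAMMA * gaps.iter().sum::<usize>() as f64 // loose matches pay per gap
}

fn main() {
    // A fuzzy match with gaps summing to 4:
    println!("{:.2}", gap_log_bf(&[1, 1, 2])); // -0.80
}
```

Note that a missing clue simply contributes no term to the sum; there is no need for a sentinel "no tag" penalty.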

Worked example — typing pal

Consider three candidates matching pal:

  1. palette.open — WordStart, gap 0, tag palette.
  2. terminal.palette — Substring at position 9, gap 0.
  3. unrelated.place — Fuzzy, gaps totalling 4.

The ledger written to the evidence sink looks like (schema trimmed):

{"schema":"match-evidence","id":"palette.open","match_type":"WordStart","log_prior":1.386,"entries":[{"desc":"boundary hit","log_bf":0.70},{"desc":"position=0","log_bf":0.30},{"desc":"tag 'palette'","log_bf":1.00}],"log_posterior":3.386,"rank":1}
{"schema":"match-evidence","id":"terminal.palette","match_type":"Substring","log_prior":0.693,"entries":[{"desc":"position=9","log_bf":-0.20}],"log_posterior":0.493,"rank":2}
{"schema":"match-evidence","id":"unrelated.place","match_type":"Fuzzy","log_prior":-1.099,"entries":[{"desc":"gap_sum=4","log_bf":-0.80}],"log_posterior":-1.899,"rank":3}

Reading the ledger: the first two clues on palette.open push the posterior past a Substring match with no bonuses, and the tag match turns it into a clean lead. The sort key is the final log posterior, but you can explain the lead to a user with one glance at the entries array.
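The three posteriors in the ledger can be re-derived by hand from the prior-odds table and the listed entries:

```rust
fn main() {
    // palette.open: WordStart prior 4:1 plus three positive clues.
    let palette_open = 4.0_f64.ln() + 0.70 + 0.30 + 1.00;
    // terminal.palette: Substring prior 2:1 minus a late-position penalty.
    let terminal = 2.0_f64.ln() - 0.20;
    // unrelated.place: Fuzzy prior 1:3 minus a gap penalty.
    let unrelated = (1.0_f64 / 3.0).ln() - 0.80;

    assert!(palette_open > terminal && terminal > unrelated);
    println!("{palette_open:.3} {terminal:.3} {unrelated:.3}");
    // 3.386 0.493 -1.899 — matching the ledger's log_posterior fields.
}
```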

Rust interface

crates/ftui-widgets/src/command_palette/scorer.rs
use ftui_widgets::command_palette::{EvidenceEntry, EvidenceLedger, MatchType};

let mut ledger = EvidenceLedger::new(MatchType::WordStart);
ledger.add(EvidenceEntry::new("boundary hit", 0.70));
ledger.add(EvidenceEntry::new("position=0", 0.30));
ledger.add(EvidenceEntry::new("tag 'palette'", 1.00));

let log_posterior = ledger.log_posterior(); // = log_prior + Σ log_bf
let p_relevant = ledger.posterior();        // sigmoid(log_posterior)

Priors come from the match type:

// MatchType::prior_odds(self) -> f64
let prior_odds = MatchType::WordStart.prior_odds(); // 4.0
let log_prior = prior_odds.ln();                    // ≈ 1.386

Every EvidenceEntry carries a human-readable description, so the ledger doubles as the “why is this ranked here?” explanation. The palette widget surfaces this through /widgets/command-palette — a debug overlay renders the top-k entries with their log-BF bars.
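The `posterior()` call collapses the log-odds back to a probability via the sigmoid. A standalone sketch of that conversion (the helper below is illustrative, not the crate's internals):

```rust
// sigmoid maps log-odds back to a probability in (0, 1).
fn sigmoid(log_odds: f64) -> f64 {
    1.0 / (1.0 + (-log_odds).exp())
}

fn main() {
    // palette.open's log posterior from the worked example:
    println!("{:.3}", sigmoid(3.386)); // ≈ 0.967
    // log-odds 0 is exactly even money:
    println!("{:.1}", sigmoid(0.0)); // 0.5
}
```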

How to debug

Enable the evidence sink and filter to match-evidence:

FTUI_EVIDENCE_SINK=/tmp/ftui.jsonl cargo run -p ftui-demo-showcase

# Ledgers for the last search:
jq -c 'select(.schema=="match-evidence")' /tmp/ftui.jsonl | tail -20

Pinpointing a surprise: if a command should be ranked first but isn’t, the ledger tells you whether the prior was wrong (match-type misclassified) or a specific Bayes factor was missing:

# Find candidates where the tag clue was skipped:
jq -c 'select(.schema=="match-evidence" and
  ([.entries[] | .desc] | contains(["tag"]) | not))' /tmp/ftui.jsonl

Pitfalls

Don’t inflate priors to paper over weak evidence. If Fuzzy candidates keep winning, raising Fuzzy’s prior from 1:3 to 1:1 will also let them beat real matches during noisy typing. Instead, add the missing evidence factor (e.g., a penalty for long gap runs) — the ledger stays honest and the posterior calibrates itself.
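The danger is easy to quantify with the numbers already on the table. A Fuzzy candidate carrying a tag match (+1.0) and recent use (+0.5) still loses to a plain Substring match under the honest 1:3 prior, but wins once the prior is inflated to 1:1 (illustrative arithmetic only):

```rust
fn main() {
    let substring = 2.0_f64.ln(); // real Substring match, no bonuses: ≈ 0.693
    let bonuses = 1.0 + 0.5;      // tag match + recent use

    let fuzzy_honest = (1.0_f64 / 3.0).ln() + bonuses; // ≈ 0.401: still loses
    let fuzzy_inflated = 1.0_f64.ln() + bonuses;       // = 1.5: now outranks it

    assert!(fuzzy_honest < substring);
    assert!(fuzzy_inflated > substring);
    println!("{fuzzy_honest:.3} vs {substring:.3} vs {fuzzy_inflated:.3}");
}
```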

Independence is an approximation. The Bayes-factor product assumes clues are conditionally independent given R. They are not: a word-boundary hit at position 0 correlates with a Prefix match type. The palette compensates by keeping factor magnitudes small (mostly |log BF| < 1) so double-counting is bounded.
