
Command-Palette Evidence Ledger

What goes wrong with a naive approach

A classical fuzzy search ranks candidates by an ad-hoc score: a prefix match gets +10, a word-boundary hit +5, each gap −1, tags add a bonus. The constants are picked by the author’s taste. The number “+10” has no units, no noise model, no way to combine it with a second piece of evidence other than hoping the scales line up. When a user complains that “open settings” was ranked below “opens ettings” you stare at the code and cannot even say why.

The palette needs three things the naive score can’t provide:

  1. A common currency for clues that disagree (a great prefix but bad gaps).
  2. An explanation for each ranking decision — which clue mattered, by how much.
  3. Stable ordering when scores tie, so top-k doesn’t flicker as the user types.

The first two come from Bayes factors in log-odds space. The third is handled by the rank-confidence layer on top.

Mental model

Treat ranking as a hypothesis test: is this candidate the user’s intended command, or not?

  • Start with prior odds that depend on the match type alone. An exact match is overwhelmingly likely (~99:1). A fuzzy subsequence is a long shot (~1:3).
  • Each observed clue — a word-boundary hit, a tag match, a small position, a tight gap — is a Bayes factor (likelihood ratio). Each factor multiplies the posterior odds.
  • The final log-posterior is the sort key. The list of (description, log BF) pairs is the ranking explanation.

The palette is a probabilistic classifier in disguise. The user types characters; we update a posterior over “which command did you mean?” and show the argmax-k with their evidence ledgers attached.

The math

Odds form of Bayes’ rule for relevance R given evidence E:

$$
\frac{P(R \mid E)}{P(\neg R \mid E)} = \frac{P(R)}{P(\neg R)} \prod_i \text{BF}_i,
\qquad \text{BF}_i = \frac{P(E_i \mid R)}{P(E_i \mid \neg R)}
$$

Taking logs turns the product into a sum — numerically stable and trivially auditable:

$$
\log \frac{P(R \mid E)}{P(\neg R \mid E)} = \log \frac{P(R)}{P(\neg R)} + \sum_i \log \text{BF}_i
$$
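The log-odds update is a one-line fold. A minimal sketch, assuming illustrative clue values (the function name `log_posterior` here is hypothetical, not the crate's API):

```rust
// Posterior log-odds = prior log-odds + sum of per-clue log Bayes factors.
fn log_posterior(log_prior: f64, log_bfs: &[f64]) -> f64 {
    log_prior + log_bfs.iter().sum::<f64>()
}

fn main() {
    // WordStart prior 4:1 → ln(4) ≈ 1.386, plus three illustrative clues.
    let lp = log_posterior(4.0_f64.ln(), &[0.70, 0.30, 1.00]);
    println!("{lp:.3}"); // ≈ 3.386
}
```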

Prior odds by match type

Match type   Prior odds   P(R)   Intuition
Exact        99:1         0.99   User typed the full command name.
Prefix       9:1          0.90   Stem of the name.
WordStart    4:1          0.80   Lines up with a word boundary.
Substring    2:1          0.67   Contiguous inside the name.
Fuzzy        1:3          0.25   Subsequence, nothing more.
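The two odds columns are the same numbers in different clothes: p = odds / (1 + odds). A quick sketch checking the table:

```rust
// Convert prior odds to a probability: p = odds / (1 + odds).
fn odds_to_p(odds: f64) -> f64 {
    odds / (1.0 + odds)
}

fn main() {
    let table = [
        ("Exact", 99.0),
        ("Prefix", 9.0),
        ("WordStart", 4.0),
        ("Substring", 2.0),
        ("Fuzzy", 1.0 / 3.0), // 1:3 odds
    ];
    for (name, odds) in table {
        println!("{name}: {:.2}", odds_to_p(odds));
    }
}
```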

Evidence factors (illustrative)

Each factor is computed once per candidate per keystroke. Typical clues:

  • Word-boundary hit: log BF ≈ +0.7 per boundary hit.
  • Early position: log BF = −β · start_pos.
  • Tight gap density: log BF = −γ · Σ gapᵢ.
  • Tag match: log BF ≈ +1.0 when the query also matches a tag.
  • Recent use: log BF ≈ +0.5 for commands fired in the last session.

The signs are principled: clues that make relevance more likely have positive log-BF, clues that erode it are negative, and a missing clue contributes zero (not a made-up penalty).
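A sketch of the factor functions above, with assumed tuning constants (γ = 0.2 happens to reproduce the gap_sum = 4 → −0.80 entry in the worked example below; β is purely illustrative, and these helpers are not the crate's real API):

```rust
// Illustrative tuning constants — assumptions, not the crate's values.
const BETA: f64 = 0.05;  // position penalty slope
const GAMMA: f64 = 0.2;  // gap penalty slope

fn boundary_log_bf(hits: usize) -> f64 {
    0.7 * hits as f64 // ≈ +0.7 per word-boundary hit
}

fn position_log_bf(start_pos: usize) -> f64 {
    -BETA * start_pos as f64 // later starts erode relevance
}

fn gap_log_bf(gaps: &[usize]) -> f64 {
    -GAMMA * gaps.iter().sum::<usize>() as f64 // loose matches pay per gap
}

fn main() {
    // A fuzzy match with gaps summing to 4:
    println!("{:.2}", gap_log_bf(&[1, 1, 2])); // -0.80
}
```

Note that a missing clue simply contributes no term to the sum; there is no need for a sentinel "no tag" penalty.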

Worked example — typing pal

Consider three candidates matching pal:

  1. palette.open — WordStart, gap 0, tag palette.
  2. terminal.palette — Substring at position 9, gap 0.
  3. unrelated.place — Fuzzy, gaps totalling 4.

The ledger written to the evidence sink looks like (schema trimmed):

{"schema":"match-evidence","id":"palette.open","match_type":"WordStart","log_prior":1.386,"entries":[{"desc":"boundary hit","log_bf":0.70},{"desc":"position=0","log_bf":0.30},{"desc":"tag 'palette'","log_bf":1.00}],"log_posterior":3.386,"rank":1}
{"schema":"match-evidence","id":"terminal.palette","match_type":"Substring","log_prior":0.693,"entries":[{"desc":"position=9","log_bf":-0.20}],"log_posterior":0.493,"rank":2}
{"schema":"match-evidence","id":"unrelated.place","match_type":"Fuzzy","log_prior":-1.099,"entries":[{"desc":"gap_sum=4","log_bf":-0.80}],"log_posterior":-1.899,"rank":3}

Reading the ledger: the first two clues on palette.open push the posterior past a Substring match with no bonuses, and the tag match turns it into a clean lead. The sort key is the final log posterior, but you can explain the lead to a user with one glance at the entries array.
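The three posteriors in the ledger can be re-derived by hand from the prior-odds table and the listed entries:

```rust
fn main() {
    // palette.open: WordStart prior 4:1 plus three positive clues.
    let palette_open = 4.0_f64.ln() + 0.70 + 0.30 + 1.00;
    // terminal.palette: Substring prior 2:1 minus a late-position penalty.
    let terminal = 2.0_f64.ln() - 0.20;
    // unrelated.place: Fuzzy prior 1:3 minus a gap penalty.
    let unrelated = (1.0_f64 / 3.0).ln() - 0.80;

    assert!(palette_open > terminal && terminal > unrelated);
    println!("{palette_open:.3} {terminal:.3} {unrelated:.3}");
    // 3.386 0.493 -1.899 — matching the ledger's log_posterior fields.
}
```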

Rust interface

crates/ftui-widgets/src/command_palette/scorer.rs
use ftui_widgets::command_palette::{EvidenceEntry, EvidenceLedger, MatchType};

let mut ledger = EvidenceLedger::new(MatchType::WordStart);
ledger.add(EvidenceEntry::new("boundary hit", 0.70));
ledger.add(EvidenceEntry::new("position=0", 0.30));
ledger.add(EvidenceEntry::new("tag 'palette'", 1.00));

let log_posterior = ledger.log_posterior(); // = log_prior + Σ log_bf
let p_relevant = ledger.posterior();        // sigmoid(log_posterior)

Priors come from the match type:

// MatchType::prior_odds(self) -> f64
let prior_odds = MatchType::WordStart.prior_odds(); // 4.0
let log_prior = prior_odds.ln();                    // ≈ 1.386

Every EvidenceEntry carries a human-readable description, so the ledger doubles as the “why is this ranked here?” explanation. The palette widget surfaces this through /widgets/command-palette — a debug overlay renders the top-k entries with their log-BF bars.
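The `posterior()` call collapses the log-odds back to a probability via the sigmoid. A standalone sketch of that conversion (the helper below is illustrative, not the crate's internals):

```rust
// sigmoid maps log-odds back to a probability in (0, 1).
fn sigmoid(log_odds: f64) -> f64 {
    1.0 / (1.0 + (-log_odds).exp())
}

fn main() {
    // palette.open's log posterior from the worked example:
    println!("{:.3}", sigmoid(3.386)); // ≈ 0.967
    // log-odds 0 is exactly even money:
    println!("{:.1}", sigmoid(0.0)); // 0.5
}
```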

How to debug

Enable the evidence sink and filter to match-evidence:

FTUI_EVIDENCE_SINK=/tmp/ftui.jsonl cargo run -p ftui-demo-showcase

# Ledgers for the last search:
jq -c 'select(.schema=="match-evidence")' /tmp/ftui.jsonl | tail -20

Pinpointing a surprise: if a command should be ranked first but isn’t, the ledger tells you whether the prior was wrong (match-type misclassified) or a specific Bayes factor was missing:

# Find candidates where the tag clue was skipped:
jq -c 'select(.schema=="match-evidence" and
  ([.entries[] | .desc] | contains(["tag"]) | not))' /tmp/ftui.jsonl

Pitfalls

Don’t inflate priors to paper over weak evidence. If Fuzzy candidates keep winning, raising Fuzzy’s prior from 1:3 to 1:1 will also let them beat real matches during noisy typing. Instead, add the missing evidence factor (e.g., a penalty for long gap runs) — the ledger stays honest and the posterior calibrates itself.
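The danger is easy to quantify with the numbers already on the table. A Fuzzy candidate carrying a tag match (+1.0) and recent use (+0.5) still loses to a plain Substring match under the honest 1:3 prior, but wins once the prior is inflated to 1:1 (illustrative arithmetic only):

```rust
fn main() {
    let substring = 2.0_f64.ln(); // real Substring match, no bonuses: ≈ 0.693
    let bonuses = 1.0 + 0.5;      // tag match + recent use

    let fuzzy_honest = (1.0_f64 / 3.0).ln() + bonuses; // ≈ 0.401: still loses
    let fuzzy_inflated = 1.0_f64.ln() + bonuses;       // = 1.5: now outranks it

    assert!(fuzzy_honest < substring);
    assert!(fuzzy_inflated > substring);
    println!("{fuzzy_honest:.3} vs {substring:.3} vs {fuzzy_inflated:.3}");
}
```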

Independence is an approximation. The Bayes-factor product assumes clues are conditionally independent given R. They are not: a word-boundary hit at position 0 correlates with a Prefix match type. The palette compensates by keeping factor magnitudes small (mostly |log BF| < 1) so double-counting is bounded.
