Bayesian Hint Ranking
What goes wrong with a naive approach
A help overlay usually sorts keybinding hints by raw usage count. Two problems show up almost immediately:
- Cold-start misery. Fresh installs have zero counts everywhere; the sort collapses to alphabetical. The first hundred keystrokes feel worse than random because the ordering changes on every single input.
- Oscillation. When two hints have comparable utility, a single click flips the order, users’ eyes chase the moving target, and the overlay becomes a stress toy. “Just add a threshold” doesn’t help because the boundary moves too.
The hint ranker needs to trade off three things: how useful the hint looks so far, how much more we expect to learn by showing it, and how much screen real estate it burns. And the ordering has to be stable under noise.
Mental model
Think of each hint as a tiny RL arm:
- Utility is Bernoulli: did the user actually use the hint when shown? Maintain a Beta posterior updated with conjugate counts.
- Exploration matters early. Add a VOI bonus proportional to the posterior standard deviation — this is the Bayesian cousin of UCB.
- Every hint row costs screen cells. Subtract a display cost per hint.
- Sort by net value $V_i$. Swap only when a challenger’s net value exceeds the incumbent’s by a hysteresis margin — that is what kills oscillation.
The ranker is a stable Bayesian bandit. The Beta posterior gives calibrated expected utility, the VOI bonus gives exploration without tuning, and hysteresis makes the ordering a low-pass filter on the underlying scores.
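The scoring rule above can be sketched in a few lines. This is a minimal illustration with a hypothetical `HintArm` type, not the real `ftui_widgets` API; the formulas are the Beta posterior mean and standard deviation from the math section below.

```rust
// Hypothetical sketch of per-hint scoring; `HintArm` is not the real
// ftui_widgets type, just the Beta-posterior bookkeeping from the text.
struct HintArm {
    alpha: f64, // successes + prior_alpha
    beta: f64,  // failures + prior_beta
    cost: f64,  // display cost (screen cells, normalized)
}

impl HintArm {
    // Posterior mean of the Bernoulli utility: alpha / (alpha + beta).
    fn expected_utility(&self) -> f64 {
        self.alpha / (self.alpha + self.beta)
    }
    // Posterior standard deviation of Beta(alpha, beta).
    fn posterior_std(&self) -> f64 {
        let n = self.alpha + self.beta;
        (self.alpha * self.beta / (n * n * (n + 1.0))).sqrt()
    }
    // Net value: mean + exploration bonus - display cost.
    fn net_value(&self, w_voi: f64, lambda: f64) -> f64 {
        self.expected_utility() + w_voi * self.posterior_std() - lambda * self.cost
    }
}

fn main() {
    // Fresh hint under the uniform prior Beta(1,1): mean 0.5, wide posterior.
    let fresh = HintArm { alpha: 1.0, beta: 1.0, cost: 1.0 };
    // Mature hint with the same 0.5 mean but 60 observations: narrow posterior.
    let mature = HintArm { alpha: 31.0, beta: 31.0, cost: 1.0 };
    // Same expected utility, but the VOI bonus favors the unexplored hint.
    assert!(fresh.net_value(1.0, 0.01) > mature.net_value(1.0, 0.01));
    println!("fresh V = {:.3}", fresh.net_value(1.0, 0.01));
}
```

Note how two hints with identical posterior means get different scores purely through the exploration term: that is the cold-start behavior described above.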
The math
Net value

$$V_i \;=\; \hat{u}_i \;+\; w_{\mathrm{voi}}\,\sigma_i \;-\; \lambda\,c_i$$

With $\hat{u}_i = \alpha_i/(\alpha_i+\beta_i)$ the posterior mean utility, $\sigma_i$ the standard deviation of the $\mathrm{Beta}(\alpha_i,\beta_i)$ posterior, and $c_i$ the display cost of hint $i$.

Defaults: $\alpha_0 = \beta_0 = 1$ (uniform prior), $\lambda = 0.01$, $w_{\mathrm{voi}} = 1.0$.
Conjugate updates
When a hint is shown and acted on: $\alpha_i \leftarrow \alpha_i + 1$.
When a hint is shown and ignored: $\beta_i \leftarrow \beta_i + 1$.
No hyperparameters, no learning rate — the update is just counting with smoothing built in.
Hysteresis swap rule
Let $\pi$ be the current displayed order. A swap of adjacent ranks $j$ and $j+1$ (with $1 \le j < n$) is allowed only if:

$$V_{\pi(j+1)} \;>\; V_{\pi(j)} + h\,\bar{\sigma}$$

where $h$ is the hysteresis parameter and $\bar{\sigma}$ is the average posterior standard deviation across displayed hints. Because the ranker tracks $\bar{\sigma}$, the margin scales down as evidence accumulates: cold-start hints need a big gap to shuffle, while mature hints swap as soon as their expected utilities separate cleanly.
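A sketch of the swap gate, assuming the margin is the hysteresis parameter times the average posterior standard deviation as described above:

```rust
// Hysteresis swap gate (sketch): an adjacent challenger displaces the
// incumbent only by clearing it with margin h * sigma_bar.
fn should_swap(v_challenger: f64, v_incumbent: f64, h: f64, sigma_bar: f64) -> bool {
    v_challenger > v_incumbent + h * sigma_bar
}

fn main() {
    let h = 0.05;
    // Cold start: wide posteriors (sigma_bar ~ 0.29) -> a 0.01 gap is noise.
    assert!(!should_swap(0.51, 0.50, h, 0.29));
    // Mature: tight posteriors (sigma_bar ~ 0.03) -> the same gap swaps.
    assert!(should_swap(0.51, 0.50, h, 0.03));
}
```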
Why hysteresis matters for UI stability
Without hysteresis
```
frame 0: [Ctrl-F Ctrl-G Ctrl-H]  V = [0.51, 0.50, 0.43]
frame 1: [Ctrl-G Ctrl-F Ctrl-H]  V = [0.50, 0.51, 0.43]  // flip!
frame 2: [Ctrl-F Ctrl-G Ctrl-H]  V = [0.52, 0.50, 0.43]  // flop!
```

A single additional success for either command flips the order. The user’s eye tracks a moving target. Every overlay render looks like a slot machine.
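Replaying those same score sequences through a margin-gated reorder shows the flip-flop disappear. This is an illustrative one-pass adjacent-swap sketch with a fixed margin, not the real ranker:

```rust
// One adjacent-swap pass gated by a margin (sketch, not the real impl).
fn reorder(order: &mut Vec<usize>, v: &[f64], margin: f64) {
    for i in 0..order.len() - 1 {
        // Swap only when the lower-ranked hint clears the margin.
        if v[order[i + 1]] > v[order[i]] + margin {
            order.swap(i, i + 1);
        }
    }
}

fn main() {
    let mut order = vec![0usize, 1, 2]; // Ctrl-F, Ctrl-G, Ctrl-H
    let frames = [
        [0.51, 0.50, 0.43],
        [0.50, 0.51, 0.43], // would flip without hysteresis
        [0.52, 0.50, 0.43],
    ];
    for v in &frames {
        reorder(&mut order, v, 0.05);
        // Neither excursion clears the 0.05 margin: the order never moves.
        assert_eq!(order, vec![0, 1, 2]);
    }
}
```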
Rust interface
```rust
use ftui_widgets::hint_ranker::{HintRanker, HintStats, RankerConfig};

let cfg = RankerConfig {
    prior_alpha: 1.0,
    prior_beta: 1.0,
    lambda: 0.01,     // display cost weight
    w_voi: 1.0,       // exploration bonus
    hysteresis: 0.05, // minimum swap margin
};
let mut ranker = HintRanker::new(cfg);

// On every help-overlay render:
let visible: &[HintStats] = ranker.rank(&candidates);

// When the user acts on (or ignores) hint `id`:
ranker.record_outcome(id, acted_on);
```

`HintStats::expected_utility` returns $\alpha/(\alpha+\beta)$; `net_value` returns $\hat{u} + w_{\mathrm{voi}}\,\sigma - \lambda c$; the ranker’s `last_decision()` exposes the ordered slice along with per-hint scores for the debug overlay.
How to debug
The ranker emits `hint-ranking-v1` lines to the evidence sink:

```json
{"schema":"hint-ranking-v1","id":"editor.format","label":"Format",
 "expected_utility":0.61,"cost":0.08,"net_value":0.55,
 "voi":0.12,"rank":1}
```

```sh
FTUI_EVIDENCE_SINK=/tmp/ftui.jsonl cargo run -p ftui-demo-showcase

# Trace a single hint's trajectory:
jq -c 'select(.schema=="hint-ranking-v1" and .id=="editor.format")' \
  /tmp/ftui.jsonl
```

If you see `rank` oscillating across frames, the hysteresis margin is too small for the posterior variance; raise `hysteresis` or increase the priors.
Pitfalls
Conjugate priors can drown slow-moving truth. If `prior_alpha` and `prior_beta` are large (say, 50), dozens of real uses barely move the posterior. Keep the priors at $\alpha_0 = \beta_0 = 1$ unless you have strong evidence that the population rate is not 0.5.
VOI bonus is not UCB. The bonus scales with the posterior standard deviation $\sigma_i$, not with a $\sqrt{\ln t / n_i}$ term. That means it shrinks only as the posterior sharpens, which is what we want for a TUI (we never have asymptotic data), but it also means you cannot prove standard UCB regret bounds here. Validate via the evidence log before deploying.
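The difference in shrinkage is easy to see numerically. A sketch comparing the posterior-std bonus against a textbook UCB1-style term $\sqrt{2\ln t / n}$ (the constants are illustrative, not from the ranker):

```rust
// VOI bonus: posterior standard deviation of Beta(alpha, beta).
fn voi_bonus(alpha: f64, beta: f64) -> f64 {
    let n = alpha + beta;
    (alpha * beta / (n * n * (n + 1.0))).sqrt()
}

// Textbook UCB1-style bonus: sqrt(2 ln t / n), for comparison only.
fn ucb_bonus(t: f64, n: f64) -> f64 {
    (2.0 * t.ln() / n).sqrt()
}

fn main() {
    // After ~100 balanced observations the VOI bonus has collapsed...
    assert!(voi_bonus(51.0, 51.0) < 0.06);
    // ...while the UCB term is still large at t = 1000, n = 100.
    assert!(ucb_bonus(1000.0, 100.0) > 0.3);
    println!("voi = {:.3}, ucb = {:.3}", voi_bonus(51.0, 51.0), ucb_bonus(1000.0, 100.0));
}
```

The VOI bonus depends only on each hint's own counts, never on total time, which is why it needs no horizon tuning but inherits no UCB guarantees.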
Cross-references
- VOI sampling — same Beta posterior, different consumer.
- /widgets/composition — how the help overlay plugs into the widget tree.
- /runtime/evidence-sink — where the `hint-ranking-v1` lines land.