
Rollout scorecard

Shadow runs prove behavioural equivalence; benchmarks prove performance parity. The rollout scorecard is the single object that combines the two into a structured RolloutVerdict (Go, NoGo, or Inconclusive) and emits a self-contained evidence bundle operators can attach to a release decision.

File: crates/ftui-harness/src/rollout_scorecard.rs.

Configuration

crates/ftui-harness/src/rollout_scorecard.rs
pub struct RolloutScorecardConfig {
    /// Minimum number of shadow-run scenarios required. Default: 1.
    pub min_shadow_scenarios: usize,
    /// Minimum frame match ratio across all shadow runs (0.0..=1.0). Default: 1.0.
    pub min_match_ratio: f64,
    /// Whether a passing benchmark gate is required for Go. Default: false.
    pub require_benchmark_pass: bool,
}

impl RolloutScorecardConfig {
    pub fn min_shadow_scenarios(self, n: usize) -> Self;
    pub fn min_match_ratio(self, ratio: f64) -> Self; // clamped to 0..=1
    pub fn require_benchmark_pass(self, required: bool) -> Self;
}

Defaults are conservative: one shadow run and 100 % frame match. For a production rollout, require several scenarios:

let cfg = RolloutScorecardConfig::default()
    .min_shadow_scenarios(5)
    .min_match_ratio(1.0)
    .require_benchmark_pass(true);

Verdict

crates/ftui-harness/src/rollout_scorecard.rs
pub enum RolloutVerdict {
    Go,           // All evidence meets thresholds.
    NoGo,         // Determinism or performance regression detected.
    Inconclusive, // Not enough evidence to decide.
}

Go requires all of:

  1. shadow_results.len() >= min_shadow_scenarios
  2. aggregate_match_ratio() >= min_match_ratio
  3. No ShadowVerdict::Diverged present.
  4. If require_benchmark_pass, a GateResult must be attached and gate.passed().

Anything short of that returns Inconclusive (missing evidence) or NoGo (evidence says “don’t”).
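The decision rules above can be sketched as a small standalone function. This is a hypothetical simplification for illustration, not the crate's actual implementation: the `Verdict` and `Evidence` types here are stand-ins, and the exact precedence between NoGo and Inconclusive when multiple conditions fail is an assumption.

```rust
// Sketch of the Go/NoGo/Inconclusive rules (hypothetical types, not the crate's).
#[derive(Debug, PartialEq)]
enum Verdict {
    Go,
    NoGo,
    Inconclusive,
}

struct Evidence {
    scenarios: usize,          // shadow runs collected
    match_ratio: f64,          // aggregate frame match ratio
    any_diverged: bool,        // any ShadowVerdict::Diverged present?
    gate_passed: Option<bool>, // None = no benchmark gate attached
}

fn decide(min_scenarios: usize, min_ratio: f64, require_gate: bool, e: &Evidence) -> Verdict {
    // Evidence that says "don't" wins: divergence, a failing gate, or a ratio
    // below threshold once enough scenarios have been collected.
    if e.any_diverged
        || e.gate_passed == Some(false)
        || (e.scenarios >= min_scenarios && e.match_ratio < min_ratio)
    {
        return Verdict::NoGo;
    }
    // Missing evidence: too few scenarios, or a required gate never attached.
    if e.scenarios < min_scenarios || (require_gate && e.gate_passed.is_none()) {
        return Verdict::Inconclusive;
    }
    Verdict::Go
}
```

Note that a divergence takes the NoGo branch even when fewer than `min_scenarios` runs have been collected: one diverged run is positive evidence of a regression, not merely missing evidence.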

API

crates/ftui-harness/src/rollout_scorecard.rs
pub struct RolloutScorecard { /* ... */ }

impl RolloutScorecard {
    pub fn new(config: RolloutScorecardConfig) -> Self;
    pub fn add_shadow_result(&mut self, result: ShadowRunResult);
    pub fn set_benchmark_gate(&mut self, result: GateResult);
    pub fn shadow_scenario_count(&self) -> usize;
    pub fn shadow_match_count(&self) -> usize;
    pub fn aggregate_match_ratio(&self) -> f64;
    pub fn evaluate(&self) -> RolloutVerdict;
    pub fn summary(&self) -> RolloutSummary;
}
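One plausible reading of aggregate_match_ratio() (an assumption; the crate may define it differently) is that it weights by frames compared per scenario rather than averaging per-scenario ratios, so a long run with one mismatch moves the aggregate less than a short one. A self-contained sketch:

```rust
// Hypothetical frame-weighted aggregation; each tuple is
// (frames_matched, frames_compared) for one shadow run.
fn aggregate_match_ratio(per_scenario: &[(u64, u64)]) -> f64 {
    let (matched, compared) = per_scenario
        .iter()
        .fold((0u64, 0u64), |(m, c), &(fm, fc)| (m + fm, c + fc));
    if compared == 0 {
        0.0 // no frames compared yet: treat as zero, not NaN
    } else {
        matched as f64 / compared as f64
    }
}
```

Under this reading, two 30-frame runs where one run drops three frames give 57/60 = 0.95, whereas a plain average of per-scenario ratios would give the same value only because the runs are equal length.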

Evidence bundle

RolloutEvidenceBundle is the release artefact — JSON that combines the scorecard verdict with runtime-observed telemetry and lane metadata:

crates/ftui-harness/src/rollout_scorecard.rs
pub struct RolloutEvidenceBundle {
    pub scorecard: RolloutSummary,
    pub queue_telemetry: Option<QueueTelemetry>,
    pub requested_lane: String,
    pub resolved_lane: String,
    pub rollout_policy: String,
}

impl RolloutEvidenceBundle {
    pub fn to_json(&self) -> String;
}

A shortened example of what to_json() produces:

rollout_evidence.json
{
  "schema_version": "1.0.0",
  "scorecard": {
    "verdict": "GO",
    "shadow_scenarios": 5,
    "shadow_matches": 5,
    "aggregate_match_ratio": 1.0,
    "total_frames_compared": 4800,
    "benchmark_passed": "pass",
    "config": {
      "min_shadow_scenarios": 5,
      "min_match_ratio": 1.0,
      "benchmark_required": true
    }
  },
  "queue_telemetry": {
    "enqueued": 1234,
    "processed": 1234,
    "dropped": 0,
    "high_water": 12,
    "in_flight": 0
  },
  "runtime": {
    "requested_lane": "structured",
    "resolved_lane": "structured",
    "rollout_policy": "shadow"
  }
}

Worked example

tests/rollout_drill.rs
use ftui_harness::{
    rollout_scorecard::{RolloutScorecard, RolloutScorecardConfig, RolloutVerdict},
    shadow_run::{ShadowRun, ShadowRunConfig},
};

#[test]
fn structured_is_go_for_counter_scenarios() {
    let scenarios = [
        ("increment", 42),
        ("reset_on_zero", 43),
        ("burst_ticks", 44),
        ("resize", 45),
        ("quit_path", 46),
    ];
    let mut scorecard = RolloutScorecard::new(
        RolloutScorecardConfig::default()
            .min_shadow_scenarios(5)
            .min_match_ratio(1.0),
    );
    for (name, seed) in scenarios {
        let cfg = ShadowRunConfig::new("rollout/counter", name, seed).viewport(40, 10);
        let result = ShadowRun::compare(cfg, || CounterModel::new(), |s| {
            s.init();
            for _ in 0..30 {
                s.tick();
                s.capture_frame();
            }
        });
        scorecard.add_shadow_result(result);
    }
    assert_eq!(scorecard.evaluate(), RolloutVerdict::Go);
    let summary = scorecard.summary();
    std::fs::write("rollout_summary.json", summary.to_json()).unwrap();
}

Attach a benchmark gate (see the benchmark_gate module) to require performance parity:

scorecard.set_benchmark_gate(gate_result);

Reading a verdict in CI

# Fail CI unless the scorecard said Go:
jq -r '.scorecard.verdict' rollout_summary.json | grep -qx 'GO'

# Fleet dashboard: count lanes by resolved lane
jq -r '.runtime.resolved_lane' **/rollout_evidence.json | sort | uniq -c

Pitfalls

min_match_ratio < 1.0 lets divergence slip through. The ratio helps for long-running scenarios with known-benign noise (e.g. external timestamps baked into the UI), but it is not a substitute for fixing the source. Prefer a deterministic harness first; lower the ratio only after you know why frames diverge.

Scorecard “Inconclusive” is not “Go”. CI gates must distinguish Go from Inconclusive. The default config accepts a single scenario, which is rarely enough to declare parity — bump min_shadow_scenarios for release builds.
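A CI gate that makes the Go-versus-Inconclusive distinction explicit can be sketched as a default-deny check: only the exact verdict string takes the success branch, so an Inconclusive verdict, a misspelling, or a missing field all fail. The "GO" casing follows the JSON example above; the helper name is hypothetical.

```shell
# Default-deny gate: only an exact "GO" passes. NO_GO, INCONCLUSIVE,
# empty output, and parse errors all take the failure branch.
check_verdict() {
  case "$1" in
    GO) echo "rollout approved"; return 0 ;;
    *)  echo "rollout blocked: verdict=$1"; return 1 ;;
  esac
}

# In CI, feed it the verdict extracted from the summary, e.g.:
#   check_verdict "$(jq -r '.scorecard.verdict' rollout_summary.json)"
check_verdict GO
check_verdict INCONCLUSIVE || true
```

The point of the wildcard arm is that an accidental `grep -q 'GO'` style check would also match "NO_GO" as a substring; matching the whole string (as `grep -qx` does in the CI snippet above) avoids that trap.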

Cross-references