
Shadow run

A shadow run executes the same Model and event sequence through two independent LabSessions — a baseline and a candidate — and compares their frame checksums. If every frame matches, the candidate is provably rendering-equivalent to the baseline. This is the primary mechanism for proving that a migration (e.g. threading → structured cancellation) preserves determinism before we flip the rollout policy to Enabled.

File: crates/ftui-harness/src/shadow_run.rs.

Signature

crates/ftui-harness/src/shadow_run.rs
pub struct ShadowRunConfig {
    pub prefix: String,           // JSONL filename prefix / run IDs
    pub scenario_name: String,
    pub seed: u64,                // shared across both lanes
    pub viewport_width: u16,      // default 80
    pub viewport_height: u16,     // default 24
    pub time_step_ms: u64,        // default 16
    pub baseline_label: String,   // default "baseline"
    pub candidate_label: String,  // default "candidate"
}

pub enum ShadowVerdict { Match, Diverged }

pub struct ShadowRun;

impl ShadowRun {
    pub fn compare<M, MF, SF>(
        config: ShadowRunConfig,
        model_factory: MF,
        scenario_fn: SF,
    ) -> ShadowRunResult
    where
        M: Model,
        MF: Fn() -> M,
        SF: Fn(&mut LabSession<M>);

    pub fn assert_match<M, MF, SF>(/* ...same args... */) -> ShadowRunResult;
}

The two lanes are LabConfig copies derived from the same ShadowRunConfig; each is passed the same seed, the same time step, and a fresh model from model_factory(). scenario_fn drives the session — init, tick, dispatch, capture_frame, etc.
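To make the lane mechanics concrete, here is a minimal, self-contained sketch of the two-lane pattern. `TinyModel`, its toy `checksum`, and `run_two_lanes` are hypothetical stand-ins invented for illustration; the real harness drives a full `LabSession` and hashes rendered frames:

```rust
// Hypothetical stand-in for a Model; the toy checksum is any pure
// function of model state (the harness hashes the rendered frame).
struct TinyModel { seed: u64, frame: u64 }

impl TinyModel {
    fn new(seed: u64) -> Self { TinyModel { seed, frame: 0 } }
    fn tick(&mut self) { self.frame += 1; }
    fn checksum(&self) -> u64 {
        self.seed.wrapping_mul(31).wrapping_add(self.frame)
    }
}

/// Run the same scenario closure over two fresh models built from the
/// same seed, collecting one checksum per captured frame per lane.
fn run_two_lanes<F>(seed: u64, scenario: F) -> (Vec<u64>, Vec<u64>)
where
    F: Fn(&mut TinyModel, &mut Vec<u64>),
{
    fn lane<F: Fn(&mut TinyModel, &mut Vec<u64>)>(seed: u64, scenario: &F) -> Vec<u64> {
        let mut model = TinyModel::new(seed); // fresh model per lane
        let mut frames = Vec::new();
        scenario(&mut model, &mut frames);
        frames
    }
    (lane(seed, &scenario), lane(seed, &scenario))
}

fn main() {
    let (baseline, candidate) = run_two_lanes(42, |m, frames| {
        for _ in 0..5 {
            m.tick();
            frames.push(m.checksum());
        }
    });
    // Identical seed + identical inputs => identical frame checksums.
    assert_eq!(baseline, candidate);
    println!("lanes match: {}", baseline == candidate); // prints "lanes match: true"
}
```

The point of the sketch is the shape: both lanes share every input except the model instance itself, which must be rebuilt fresh per lane.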

Viewport and time control

Determinism depends on every non-model input being identical across the two lanes. ShadowRunConfig pins:

  • viewport size,
  • tick step (time_step_ms),
  • seed.

Defaults are 80×24 at 16 ms/tick. Override with the builder helpers:

let cfg = ShadowRunConfig::new("migration_test", "counter", 42)
    .viewport(120, 40)
    .time_step_ms(8)
    .lane_labels("legacy", "structured");

Checksum comparison

Frames are compared per-index by checksum (not full buffer):

crates/ftui-harness/src/shadow_run.rs
pub struct FrameComparison {
    pub index: usize,
    pub baseline_checksum: u64,
    pub candidate_checksum: u64,
    pub matched: bool,
}

pub struct ShadowRunResult {
    pub verdict: ShadowVerdict,                 // Match or Diverged
    pub scenario_name: String,
    pub seed: u64,
    pub frame_comparisons: Vec<FrameComparison>,
    pub first_divergence: Option<usize>,        // index of first mismatch
    pub frames_compared: usize,
    pub baseline: LabOutput,
    pub candidate: LabOutput,
    pub baseline_label: String,
    pub candidate_label: String,
    pub run_total: u64,                         // process-wide counter
}

impl ShadowRunResult {
    pub fn diverged_count(&self) -> usize;
    pub fn match_ratio(&self) -> f64;  // 0.0..=1.0
}

Checksumming is cheap enough to let you compare thousands of frames per scenario without blowing the test budget. If verdict is Diverged, first_divergence points at the first offending frame and you can re-run with the LabSession in a debugger.
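The per-index comparison itself reduces to a walk over two checksum streams. Here is a minimal sketch of that logic; the names mirror the result fields above, but this is an illustration, not the harness implementation:

```rust
/// Compare two checksum streams per index.
/// Returns (index of first mismatch, ratio of matching frames).
fn compare_checksums(baseline: &[u64], candidate: &[u64]) -> (Option<usize>, f64) {
    let frames_compared = baseline.len().min(candidate.len());
    let mut first_divergence = None;
    let mut matched = 0usize;
    for i in 0..frames_compared {
        if baseline[i] == candidate[i] {
            matched += 1;
        } else if first_divergence.is_none() {
            first_divergence = Some(i); // remember only the first mismatch
        }
    }
    let ratio = if frames_compared == 0 {
        1.0 // no frames to disagree on
    } else {
        matched as f64 / frames_compared as f64
    };
    (first_divergence, ratio)
}

fn main() {
    // The candidate diverges at index 2 and never recovers.
    let a = [10, 20, 30, 40];
    let b = [10, 20, 31, 41];
    let (first, ratio) = compare_checksums(&a, &b);
    assert_eq!(first, Some(2));
    assert_eq!(ratio, 0.5);
}
```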

Emitted evidence

compare writes structured JSONL to stderr via the harness’s TestJsonlLogger:

Event                    Meaning
shadow.start             Seed, scenario, lane labels, viewport.
shadow.lane.done         Per-lane summary (frames captured, elapsed).
shadow.frame.diverged    One record per mismatched frame.
shadow.verdict           Terminal verdict + match ratio.

These flow to the same place as the runtime’s other evidence (see evidence sink) and can be aggregated by the rollout scorecard.
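For a quick aggregation pass over captured stderr, something like the following works. Note the record shape is an assumption here: the event names come from the table above, but the exact JSONL schema of TestJsonlLogger (an `"event"` field, field ordering) is not documented in this section, so treat the string match as a sketch:

```rust
/// Count shadow.frame.diverged records in a captured JSONL stream.
/// ASSUMPTION: each record serializes an `"event"` field verbatim;
/// the real TestJsonlLogger schema may differ.
fn count_diverged(jsonl: &str) -> usize {
    jsonl
        .lines()
        .filter(|line| line.contains(r#""event":"shadow.frame.diverged""#))
        .count()
}

fn main() {
    let captured = r#"{"event":"shadow.start","seed":42}
{"event":"shadow.frame.diverged","index":7}
{"event":"shadow.frame.diverged","index":8}
{"event":"shadow.verdict","verdict":"Diverged"}"#;
    assert_eq!(count_diverged(captured), 2);
}
```

A real aggregator would parse each line with a JSON library rather than substring-match, but the filtering idea is the same.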

Worked example

tests/shadow_counter.rs
use ftui_harness::shadow_run::{ShadowRun, ShadowRunConfig, ShadowVerdict};

#[test]
fn counter_is_deterministic_across_lanes() {
    let cfg = ShadowRunConfig::new("rollout/counter", "increment", 0xC0FFEE)
        .viewport(40, 10)
        .time_step_ms(16);

    let result = ShadowRun::compare(
        cfg,
        || CounterModel::new(),
        |session| {
            session.init();
            for _ in 0..30 {
                session.tick();
                session.capture_frame();
            }
        },
    );

    assert_eq!(result.verdict, ShadowVerdict::Match);
    assert!(result.match_ratio() >= 1.0);
    assert!(result.first_divergence.is_none());
}

For a “proof by assertion” style where the scenario is known-good:

ShadowRun::assert_match(cfg, || CounterModel::new(), |s| { /* ... */ });

assert_match panics with a diagnostic if the verdict is not Match.

Where divergence typically comes from

Cause                                         Fix
Non-deterministic seed (RNG from entropy)     Thread the seed through the model; expose a RngSource.
Wall-clock reads                              Use the session’s deterministic clock, not Instant::now.
HashMap iteration order in view               Collect into a sorted Vec first; use BTreeMap.
Allocator-address-based hashing               Hash by field, not Arc::as_ptr.
File / network I/O                            Move behind a Cmd::task — see commands.
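To illustrate the HashMap row concretely, here is a hedged sketch of the fix (the `render_*` helpers are invented for the example): sort the entries before rendering, or hold the state in a BTreeMap, so the same state always produces the same byte stream and therefore the same checksum.

```rust
use std::collections::{BTreeMap, HashMap};

/// Non-deterministic: HashMap iteration order can differ between the
/// two lanes, so the rendered bytes (and checksum) can differ too.
fn render_unordered(state: &HashMap<String, u32>) -> String {
    state.iter().map(|(k, v)| format!("{k}={v};")).collect()
}

/// Deterministic: sort the entries first (or keep state in a BTreeMap).
fn render_sorted(state: &HashMap<String, u32>) -> String {
    let mut entries: Vec<_> = state.iter().collect();
    entries.sort_by(|a, b| a.0.cmp(b.0));
    entries.into_iter().map(|(k, v)| format!("{k}={v};")).collect()
}

fn main() {
    let mut state = HashMap::new();
    state.insert("b".to_string(), 2);
    state.insert("a".to_string(), 1);

    // Stable across runs and lanes:
    assert_eq!(render_sorted(&state), "a=1;b=2;");

    // Equivalent fix: a BTreeMap iterates in key order by construction.
    let ordered: BTreeMap<_, _> = state.iter().collect();
    let via_btree: String = ordered.iter().map(|(k, v)| format!("{k}={v};")).collect();
    assert_eq!(via_btree, "a=1;b=2;");
}
```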

Pitfalls

A single non-deterministic byte fails every frame from that point. Divergence is sticky: one chrono::Local::now() in your render path will break every subsequent checksum. Start the diff hunt at first_divergence and walk backwards from there.

Don’t share state between the two lanes. model_factory() must return independent models. Closures that capture Arc<Mutex<_>> will happily de-synchronise the two runs or deadlock.
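To see why shared captured state breaks the comparison, here is a contrived sketch (not harness code; `Counter` and `run_lane` are invented, and `Rc<Cell<_>>` plays the role of the `Arc<Mutex<_>>` above): a factory that builds fresh models keeps the lanes identical, while state smuggled in through the closure lets the second lane observe the first lane's mutations.

```rust
use std::cell::Cell;
use std::rc::Rc;

struct Counter { value: u64 }

/// Stand-in for one shadow lane: build a model, run the scenario,
/// and return a value standing in for the lane's final frame checksum.
fn run_lane<F: Fn() -> Counter>(factory: &F, ticks: u64) -> u64 {
    let mut model = factory();
    for _ in 0..ticks {
        model.value += 1;
    }
    model.value
}

fn main() {
    // Good: each lane gets an independent model.
    let fresh = || Counter { value: 0 };
    assert_eq!(run_lane(&fresh, 10), run_lane(&fresh, 10));

    // Bad: both lanes read one shared cell, so lane 2 starts
    // from state that lane 1 left behind.
    let shared = Rc::new(Cell::new(0u64));
    let leaky = {
        let shared = Rc::clone(&shared);
        move || {
            let model = Counter { value: shared.get() };
            shared.set(shared.get() + 10); // leaks lane 1's run into lane 2
            model
        }
    };
    let lane1 = run_lane(&leaky, 10);
    let lane2 = run_lane(&leaky, 10);
    assert_ne!(lane1, lane2); // the lanes diverge
}
```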

Cross-references