
Shadow run

A shadow run executes the same Model and event sequence through two independent LabSessions — a baseline and a candidate — and compares their frame checksums. If every frame matches, the candidate is provably rendering-equivalent to the baseline. This is the primary mechanism for proving that a migration (e.g. threading → structured cancellation) preserves determinism before we flip the rollout policy to Enabled.

File: crates/ftui-harness/src/shadow_run.rs.

Signature

crates/ftui-harness/src/shadow_run.rs
pub struct ShadowRunConfig {
    pub prefix: String,           // JSONL filename prefix / run IDs
    pub scenario_name: String,
    pub seed: u64,                // shared across both lanes
    pub viewport_width: u16,      // default 80
    pub viewport_height: u16,     // default 24
    pub time_step_ms: u64,        // default 16
    pub baseline_label: String,   // default "baseline"
    pub candidate_label: String,  // default "candidate"
}

pub enum ShadowVerdict { Match, Diverged }

pub struct ShadowRun;

impl ShadowRun {
    pub fn compare<M, MF, SF>(
        config: ShadowRunConfig,
        model_factory: MF,
        scenario_fn: SF,
    ) -> ShadowRunResult
    where
        M: Model,
        MF: Fn() -> M,
        SF: Fn(&mut LabSession<M>);

    pub fn assert_match<M, MF, SF>(/* ...same args... */) -> ShadowRunResult;
}

The two lanes are LabConfig copies derived from the same ShadowRunConfig; each is passed the same seed, the same time step, and a fresh model from model_factory(). scenario_fn drives the session — init, tick, dispatch, capture_frame, etc.
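To make the lane mechanics concrete, here is a minimal, self-contained sketch of the two-lane pattern. `TinyModel`, its toy `checksum`, and `run_two_lanes` are hypothetical stand-ins invented for illustration; the real harness drives a full `LabSession` and hashes rendered frames:

```rust
// Hypothetical stand-in for a Model; the toy checksum is any pure
// function of model state (the harness hashes the rendered frame).
struct TinyModel { seed: u64, frame: u64 }

impl TinyModel {
    fn new(seed: u64) -> Self { TinyModel { seed, frame: 0 } }
    fn tick(&mut self) { self.frame += 1; }
    fn checksum(&self) -> u64 {
        self.seed.wrapping_mul(31).wrapping_add(self.frame)
    }
}

/// Run the same scenario closure over two fresh models built from the
/// same seed, collecting one checksum per captured frame per lane.
fn run_two_lanes<F>(seed: u64, scenario: F) -> (Vec<u64>, Vec<u64>)
where
    F: Fn(&mut TinyModel, &mut Vec<u64>),
{
    fn lane<F: Fn(&mut TinyModel, &mut Vec<u64>)>(seed: u64, scenario: &F) -> Vec<u64> {
        let mut model = TinyModel::new(seed); // fresh model per lane
        let mut frames = Vec::new();
        scenario(&mut model, &mut frames);
        frames
    }
    (lane(seed, &scenario), lane(seed, &scenario))
}

fn main() {
    let (baseline, candidate) = run_two_lanes(42, |m, frames| {
        for _ in 0..5 {
            m.tick();
            frames.push(m.checksum());
        }
    });
    // Identical seed + identical inputs => identical frame checksums.
    assert_eq!(baseline, candidate);
    println!("lanes match: {}", baseline == candidate); // prints "lanes match: true"
}
```

The point of the sketch is the shape: both lanes share every input except the model instance itself, which must be rebuilt fresh per lane.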

Viewport and time control

Determinism depends on every non-model input being identical across the two lanes. ShadowRunConfig pins:

  • viewport size,
  • tick step (time_step_ms),
  • seed.

Defaults are 80×24 at 16 ms/tick. Override with the builder helpers:

let cfg = ShadowRunConfig::new("migration_test", "counter", 42)
    .viewport(120, 40)
    .time_step_ms(8)
    .lane_labels("legacy", "structured");

Checksum comparison

Frames are compared per-index by checksum (not full buffer):

crates/ftui-harness/src/shadow_run.rs
pub struct FrameComparison {
    pub index: usize,
    pub baseline_checksum: u64,
    pub candidate_checksum: u64,
    pub matched: bool,
}

pub struct ShadowRunResult {
    pub verdict: ShadowVerdict,                 // Match or Diverged
    pub scenario_name: String,
    pub seed: u64,
    pub frame_comparisons: Vec<FrameComparison>,
    pub first_divergence: Option<usize>,        // index of first mismatch
    pub frames_compared: usize,
    pub baseline: LabOutput,
    pub candidate: LabOutput,
    pub baseline_label: String,
    pub candidate_label: String,
    pub run_total: u64,                         // process-wide counter
}

impl ShadowRunResult {
    pub fn diverged_count(&self) -> usize;
    pub fn match_ratio(&self) -> f64;  // 0.0..=1.0
}

Checksumming is cheap enough to let you compare thousands of frames per scenario without blowing the test budget. If verdict is Diverged, first_divergence points at the first offending frame and you can re-run with the LabSession in a debugger.
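The per-index comparison itself reduces to a walk over two checksum streams. Here is a minimal sketch of that logic; the names mirror the result fields above, but this is an illustration, not the harness implementation:

```rust
/// Compare two checksum streams per index.
/// Returns (index of first mismatch, ratio of matching frames).
fn compare_checksums(baseline: &[u64], candidate: &[u64]) -> (Option<usize>, f64) {
    let frames_compared = baseline.len().min(candidate.len());
    let mut first_divergence = None;
    let mut matched = 0usize;
    for i in 0..frames_compared {
        if baseline[i] == candidate[i] {
            matched += 1;
        } else if first_divergence.is_none() {
            first_divergence = Some(i); // remember only the first mismatch
        }
    }
    let ratio = if frames_compared == 0 {
        1.0 // no frames to disagree on
    } else {
        matched as f64 / frames_compared as f64
    };
    (first_divergence, ratio)
}

fn main() {
    // The candidate diverges at index 2 and never recovers.
    let a = [10, 20, 30, 40];
    let b = [10, 20, 31, 41];
    let (first, ratio) = compare_checksums(&a, &b);
    assert_eq!(first, Some(2));
    assert_eq!(ratio, 0.5);
}
```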

Emitted evidence

compare writes structured JSONL to stderr via the harness’s TestJsonlLogger:

Event                    Meaning
shadow.start             Seed, scenario, lane labels, viewport.
shadow.lane.done         Per-lane summary (frames captured, elapsed).
shadow.frame.diverged    One record per mismatched frame.
shadow.verdict           Terminal verdict + match ratio.

These flow to the same place as the runtime’s other evidence (see evidence sink) and can be aggregated by the rollout scorecard.
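For a quick aggregation pass over captured stderr, something like the following works. Note the record shape is an assumption here: the event names come from the table above, but the exact JSONL schema of TestJsonlLogger (an `"event"` field, field ordering) is not documented in this section, so treat the string match as a sketch:

```rust
/// Count shadow.frame.diverged records in a captured JSONL stream.
/// ASSUMPTION: each record serializes an `"event"` field verbatim;
/// the real TestJsonlLogger schema may differ.
fn count_diverged(jsonl: &str) -> usize {
    jsonl
        .lines()
        .filter(|line| line.contains(r#""event":"shadow.frame.diverged""#))
        .count()
}

fn main() {
    let captured = r#"{"event":"shadow.start","seed":42}
{"event":"shadow.frame.diverged","index":7}
{"event":"shadow.frame.diverged","index":8}
{"event":"shadow.verdict","verdict":"Diverged"}"#;
    assert_eq!(count_diverged(captured), 2);
}
```

A real aggregator would parse each line with a JSON library rather than substring-match, but the filtering idea is the same.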

Worked example

tests/shadow_counter.rs
use ftui_harness::shadow_run::{ShadowRun, ShadowRunConfig, ShadowVerdict};

#[test]
fn counter_is_deterministic_across_lanes() {
    let cfg = ShadowRunConfig::new("rollout/counter", "increment", 0xC0FFEE)
        .viewport(40, 10)
        .time_step_ms(16);

    let result = ShadowRun::compare(
        cfg,
        || CounterModel::new(),
        |session| {
            session.init();
            for _ in 0..30 {
                session.tick();
                session.capture_frame();
            }
        },
    );

    assert_eq!(result.verdict, ShadowVerdict::Match);
    assert!(result.match_ratio() >= 1.0);
    assert!(result.first_divergence.is_none());
}

For a “proof by assertion” style where the scenario is known-good:

ShadowRun::assert_match(cfg, || CounterModel::new(), |s| { /* ... */ });

assert_match panics with a diagnostic if the verdict is not Match.

Where divergence typically comes from

Cause                                         Fix
Non-deterministic seed (RNG from entropy)     Thread the seed through the model; expose a RngSource.
Wall-clock reads                              Use the session’s deterministic clock, not Instant::now.
HashMap iteration order in view               Collect into a sorted Vec first; use BTreeMap.
Allocator-address-based hashing               Hash by field, not Arc::as_ptr.
File / network I/O                            Move behind a Cmd::task — see commands.
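To illustrate the HashMap row concretely, here is a hedged sketch of the fix (the `render_*` helpers are invented for the example): sort the entries before rendering, or hold the state in a BTreeMap, so the same state always produces the same byte stream and therefore the same checksum.

```rust
use std::collections::{BTreeMap, HashMap};

/// Non-deterministic: HashMap iteration order can differ between the
/// two lanes, so the rendered bytes (and checksum) can differ too.
fn render_unordered(state: &HashMap<String, u32>) -> String {
    state.iter().map(|(k, v)| format!("{k}={v};")).collect()
}

/// Deterministic: sort the entries first (or keep state in a BTreeMap).
fn render_sorted(state: &HashMap<String, u32>) -> String {
    let mut entries: Vec<_> = state.iter().collect();
    entries.sort_by(|a, b| a.0.cmp(b.0));
    entries.into_iter().map(|(k, v)| format!("{k}={v};")).collect()
}

fn main() {
    let mut state = HashMap::new();
    state.insert("b".to_string(), 2);
    state.insert("a".to_string(), 1);

    // Stable across runs and lanes:
    assert_eq!(render_sorted(&state), "a=1;b=2;");

    // Equivalent fix: a BTreeMap iterates in key order by construction.
    let ordered: BTreeMap<_, _> = state.iter().collect();
    let via_btree: String = ordered.iter().map(|(k, v)| format!("{k}={v};")).collect();
    assert_eq!(via_btree, "a=1;b=2;");
}
```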

Pitfalls

A single non-deterministic byte fails every frame from that point. Divergence is sticky: one chrono::Local::now() in your render path will break every subsequent checksum. Start the diff hunt at first_divergence and walk backwards from there.

Don’t share state between the two lanes. model_factory() must return independent models. Closures that capture Arc<Mutex<_>> will happily de-synchronise the two runs or deadlock.
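To see why shared captured state breaks the comparison, here is a contrived sketch (not harness code; `Counter` and `run_lane` are invented, and `Rc<Cell<_>>` plays the role of the `Arc<Mutex<_>>` above): a factory that builds fresh models keeps the lanes identical, while state smuggled in through the closure lets the second lane observe the first lane's mutations.

```rust
use std::cell::Cell;
use std::rc::Rc;

struct Counter { value: u64 }

/// Stand-in for one shadow lane: build a model, run the scenario,
/// and return a value standing in for the lane's final frame checksum.
fn run_lane<F: Fn() -> Counter>(factory: &F, ticks: u64) -> u64 {
    let mut model = factory();
    for _ in 0..ticks {
        model.value += 1;
    }
    model.value
}

fn main() {
    // Good: each lane gets an independent model.
    let fresh = || Counter { value: 0 };
    assert_eq!(run_lane(&fresh, 10), run_lane(&fresh, 10));

    // Bad: both lanes read one shared cell, so lane 2 starts
    // from state that lane 1 left behind.
    let shared = Rc::new(Cell::new(0u64));
    let leaky = {
        let shared = Rc::clone(&shared);
        move || {
            let model = Counter { value: shared.get() };
            shared.set(shared.get() + 10); // leaks lane 1's run into lane 2
            model
        }
    };
    let lane1 = run_lane(&leaky, 10);
    let lane2 = run_lane(&leaky, 10);
    assert_ne!(lane1, lane2); // the lanes diverge
}
```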

Cross-references