Shadow run
A shadow run executes the same Model and event sequence through
two independent LabSessions — a baseline and a candidate — and
compares their frame checksums. If every frame matches, the candidate
is rendering-equivalent to the baseline for that scenario and seed
(up to checksum collisions). This is the primary
mechanism for proving that a migration (e.g. threading → structured
cancellation) preserves determinism before we flip the
rollout policy to Enabled.
File: crates/ftui-harness/src/shadow_run.rs.
Signature
pub struct ShadowRunConfig {
    pub prefix: String,          // JSONL filename prefix / run IDs
    pub scenario_name: String,
    pub seed: u64,               // shared across both lanes
    pub viewport_width: u16,     // default 80
    pub viewport_height: u16,    // default 24
    pub time_step_ms: u64,       // default 16
    pub baseline_label: String,  // default "baseline"
    pub candidate_label: String, // default "candidate"
}
pub enum ShadowVerdict { Match, Diverged }
pub struct ShadowRun;
impl ShadowRun {
    pub fn compare<M, MF, SF>(
        config: ShadowRunConfig,
        model_factory: MF,
        scenario_fn: SF,
    ) -> ShadowRunResult
    where
        M: Model,
        MF: Fn() -> M,
        SF: Fn(&mut LabSession<M>);
    pub fn assert_match<M, MF, SF>(/* ...same args... */) -> ShadowRunResult;
}
The two lanes are LabConfig copies derived from the same
ShadowRunConfig; each is passed the same seed, the same
time step, and a fresh model from model_factory(). scenario_fn
drives the session — init, tick, dispatch, capture_frame, etc.
Viewport and time control
Determinism depends on every non-model input being identical across
the two lanes. ShadowRunConfig pins:
- viewport size,
- tick step (time_step_ms),
- seed.
Defaults are 80×24 at 16 ms/tick. Override with the builder helpers:
let cfg = ShadowRunConfig::new("migration_test", "counter", 42)
    .viewport(120, 40)
    .time_step_ms(8)
    .lane_labels("legacy", "structured");
Checksum comparison
Frames are compared per-index by checksum (not full buffer):
pub struct FrameComparison {
    pub index: usize,
    pub baseline_checksum: u64,
    pub candidate_checksum: u64,
    pub matched: bool,
}
pub struct ShadowRunResult {
    pub verdict: ShadowVerdict,          // Match or Diverged
    pub scenario_name: String,
    pub seed: u64,
    pub frame_comparisons: Vec<FrameComparison>,
    pub first_divergence: Option<usize>, // index of first mismatch
    pub frames_compared: usize,
    pub baseline: LabOutput,
    pub candidate: LabOutput,
    pub baseline_label: String,
    pub candidate_label: String,
    pub run_total: u64,                  // process-wide counter
}
impl ShadowRunResult {
    pub fn diverged_count(&self) -> usize;
    pub fn match_ratio(&self) -> f64; // 0.0..=1.0
}
Checksumming is cheap enough to let you compare thousands of frames
per scenario without blowing the test budget. If verdict is
Diverged, first_divergence points at the first offending frame and
you can re-run with the LabSession in a debugger.
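The per-index comparison can be pictured with a small standalone sketch. It does not use the harness: `Frame`, `frame_checksum`, `compare_frames`, and the free function `match_ratio` are hypothetical stand-ins, the checksum is a hand-written FNV-1a rather than whatever the harness actually uses, and the handling of empty or unequal-length frame lists is an assumption.

```rust
/// Hypothetical stand-in for a rendered frame: one String per row.
type Frame = Vec<String>;

#[derive(Debug, PartialEq)]
struct FrameComparison {
    index: usize,
    baseline_checksum: u64,
    candidate_checksum: u64,
    matched: bool,
}

/// FNV-1a over the frame's bytes, with '\n' appended after each row.
/// Assumption: the real harness may use a different checksum.
fn frame_checksum(frame: &[String]) -> u64 {
    const OFFSET: u64 = 0xcbf2_9ce4_8422_2325;
    const PRIME: u64 = 0x0000_0100_0000_01b3;
    let mut hash = OFFSET;
    for row in frame {
        for byte in row.bytes().chain(std::iter::once(b'\n')) {
            hash ^= u64::from(byte);
            hash = hash.wrapping_mul(PRIME);
        }
    }
    hash
}

/// Compares frames index by index and reports the first mismatch, if any.
/// Assumption: extra frames in the longer lane are ignored (zip truncates).
fn compare_frames(
    baseline: &[Frame],
    candidate: &[Frame],
) -> (Vec<FrameComparison>, Option<usize>) {
    let comparisons: Vec<FrameComparison> = baseline
        .iter()
        .zip(candidate)
        .enumerate()
        .map(|(index, (b, c))| {
            let baseline_checksum = frame_checksum(b);
            let candidate_checksum = frame_checksum(c);
            FrameComparison {
                index,
                baseline_checksum,
                candidate_checksum,
                matched: baseline_checksum == candidate_checksum,
            }
        })
        .collect();
    let first_divergence = comparisons.iter().find(|c| !c.matched).map(|c| c.index);
    (comparisons, first_divergence)
}

/// Fraction of matching frames in 0.0..=1.0.
/// Assumption: an empty comparison list counts as a full match.
fn match_ratio(comparisons: &[FrameComparison]) -> f64 {
    if comparisons.is_empty() {
        return 1.0;
    }
    let matched = comparisons.iter().filter(|c| c.matched).count();
    matched as f64 / comparisons.len() as f64
}
```

Storing only two u64 values per frame keeps memory flat however large the viewport is, which is why comparing by checksum scales to long scenarios. The trade-off is that a mismatch tells you which frame diverged but not which cell, so the frame has to be re-rendered to inspect it.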
Emitted evidence
compare writes structured JSONL to stderr via the harness’s
TestJsonlLogger:
| Event | Meaning |
|---|---|
| shadow.start | Seed, scenario, lane labels, viewport. |
| shadow.lane.done | Per-lane summary (frames captured, elapsed). |
| shadow.frame.diverged | One record per mismatched frame. |
| shadow.verdict | Terminal verdict + match ratio. |
These flow to the same place as the runtime’s other evidence (see evidence sink) and can be aggregated by the rollout scorecard.
Worked example
use ftui_harness::shadow_run::{ShadowRun, ShadowRunConfig, ShadowVerdict};
#[test]
fn counter_is_deterministic_across_lanes() {
    let cfg = ShadowRunConfig::new("rollout/counter", "increment", 0xC0FFEE)
        .viewport(40, 10)
        .time_step_ms(16);
    let result = ShadowRun::compare(
        cfg,
        || CounterModel::new(),
        |session| {
            session.init();
            for _ in 0..30 {
                session.tick();
                session.capture_frame();
            }
        },
    );
    assert_eq!(result.verdict, ShadowVerdict::Match);
    assert!(result.match_ratio() >= 1.0);
    assert!(result.first_divergence.is_none());
}
For a “proof by assertion” style where the scenario is known-good:
ShadowRun::assert_match(cfg, || CounterModel::new(), |s| { /* ... */ });
assert_match panics with a diagnostic if the verdict is not Match.
Where divergence typically comes from
| Cause | Fix |
|---|---|
| Non-deterministic seed (RNG from entropy) | Thread the seed through the model; expose a RngSource. |
| Wall-clock reads | Use the session’s deterministic clock, not Instant::now. |
| HashMap iteration order in view | Collect into a sorted Vec first, or use BTreeMap. |
| Allocator-address-based hashing | Hash by field, not Arc::as_ptr. |
| File / network I/O | Move behind a Cmd::task — see commands. |
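As an illustration of the HashMap row, the following sketch shows both fixes side by side. The function names and the "key=value" row format are hypothetical; only the standard-library collections are real.

```rust
use std::collections::{BTreeMap, HashMap};

/// Fix 1: keep the HashMap, but sort the entries by key before rendering.
/// HashMap iteration order is unspecified and can differ between map
/// instances, so rendering straight from `counters.iter()` may diverge.
fn render_sorted(counters: &HashMap<String, u32>) -> Vec<String> {
    let mut entries: Vec<(&String, &u32)> = counters.iter().collect();
    // Keys are unique, so an unstable sort by key is deterministic.
    entries.sort_unstable_by(|a, b| a.0.cmp(b.0));
    entries
        .into_iter()
        .map(|(key, value)| format!("{}={}", key, value))
        .collect()
}

/// Fix 2: store the data in a BTreeMap, which always iterates in key order.
fn render_btree(counters: &BTreeMap<String, u32>) -> Vec<String> {
    counters
        .iter()
        .map(|(key, value)| format!("{}={}", key, value))
        .collect()
}
```

Sorting at render time leaves the model's storage untouched, at the cost of a sort per frame. Switching to BTreeMap removes the per-frame sort but changes lookup cost from expected O(1) to O(log n).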
Pitfalls
A single non-deterministic byte fails every frame from that point.
Divergence is sticky: one chrono::Local::now() in your render
path will break every subsequent checksum. Start the diff hunt at
first_divergence and walk backwards from there.
Don’t share state between the two lanes. model_factory() must
return independent models. Closures that capture Arc<Mutex<_>>
will happily de-synchronise the two runs or deadlock.
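The shared-state pitfall can be reproduced without the harness. In this sketch every name (`OwnedCounter`, `SharedCounter`, `run_lane`, the two factory helpers) is hypothetical, a "frame" is reduced to a single u32, and the lanes run one after the other, which is an assumption about scheduling rather than a statement about how ShadowRun executes them.

```rust
use std::sync::{Arc, Mutex};

/// Hypothetical model whose state is owned by the lane that built it.
struct OwnedCounter {
    count: u32,
}

/// Hypothetical model whose state lives behind a handle shared by both lanes.
struct SharedCounter {
    count: Arc<Mutex<u32>>,
}

fn tick_owned(model: &mut OwnedCounter) -> u32 {
    model.count += 1;
    model.count
}

fn tick_shared(model: &mut SharedCounter) -> u32 {
    let mut guard = model.count.lock().expect("mutex poisoned");
    *guard += 1;
    *guard
}

/// Simulates one lane: build a fresh model, tick it, record one value per tick.
fn run_lane<M>(factory: &impl Fn() -> M, tick: fn(&mut M) -> u32, ticks: usize) -> Vec<u32> {
    let mut model = factory();
    (0..ticks).map(|_| tick(&mut model)).collect()
}

/// Good: every call returns a model with its own state.
fn independent_factory() -> impl Fn() -> OwnedCounter {
    || OwnedCounter { count: 0 }
}

/// Bad: the closure captures an Arc<Mutex<_>>, so both lanes mutate one counter.
fn leaky_factory(shared: Arc<Mutex<u32>>) -> impl Fn() -> SharedCounter {
    move || SharedCounter {
        count: Arc::clone(&shared),
    }
}
```

With `independent_factory`, two three-tick lanes both record 1, 2, 3. With `leaky_factory`, the second lane starts from the first lane's final value and records 4, 5, 6, so every frame diverges even though the model logic is identical.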