Determinism soak
The determinism soak test runs the full happy-path and failure-path flows N times under identical seeds and compares the resulting frame checksums. If any non-volatile frame diverges between iterations, the gate fails — and the divergence is almost always a latent bug in something “deterministic” that turned out not to be.
Source: scripts/doctor_frankentui_determinism_soak.sh.
Why a soak test
Individual tests can pass by luck. A stray HashMap iteration, a
wall-clock read, a thread race that usually goes one way but
occasionally goes the other — none of these are reliably caught by a
single run. A soak test repeats the deterministic flow many times and
insists every iteration produces the same ledger. If the system is
honest about determinism, this is cheap; if it isn’t, the soak test
finds out.
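The comparison at the heart of a soak can be sketched in a few lines. This is a minimal illustration, not the script's actual implementation: the function name `find_divergences` and the in-memory `dict` of checksum lists are assumptions made for the example.

```python
from typing import Dict, List, Tuple


def find_divergences(
    checksums_by_iteration: Dict[int, List[str]],
) -> List[Tuple[int, int, str, str]]:
    """Compare every iteration's frame checksums against iteration 1.

    Returns (iteration, frame_idx, baseline_checksum, observed_checksum)
    for each mismatch. A frame present on only one side is reported with
    the placeholder checksum "<missing>".
    """
    baseline = checksums_by_iteration[1]
    divergences = []
    for iteration in sorted(checksums_by_iteration):
        if iteration == 1:
            continue  # the baseline trivially matches itself
        observed = checksums_by_iteration[iteration]
        for idx in range(max(len(baseline), len(observed))):
            want = baseline[idx] if idx < len(baseline) else "<missing>"
            got = observed[idx] if idx < len(observed) else "<missing>"
            if want != got:
                divergences.append((iteration, idx, want, got))
    return divergences


# Hypothetical checksums: iteration 3 diverges at frame index 1.
runs = {
    1: ["a1", "b2", "c3"],
    2: ["a1", "b2", "c3"],
    3: ["a1", "ff", "c3"],
}
print(find_divergences(runs))  # [(3, 1, 'b2', 'ff')]
```

An empty result is the only passing outcome; anything else points at a specific iteration and frame to investigate.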
Running the soak
# Default: 3 iterations, timestamped run root
./scripts/doctor_frankentui_determinism_soak.sh
# Custom iteration count via env var
DOCTOR_FRANKENTUI_SOAK_RUNS=10 ./scripts/doctor_frankentui_determinism_soak.sh
# Explicit run root + iteration count as positional args
./scripts/doctor_frankentui_determinism_soak.sh /tmp/custom_root 10
The script exits non-zero on:
- A required workflow script missing or not executable (doctor_frankentui_happy_e2e.sh, doctor_frankentui_failure_e2e.sh).
- ITERATIONS not a positive integer.
- The schema file missing (crates/doctor_frankentui/coverage/e2e_jsonl_schema.json).
- Any non-volatile frame divergence across iterations.
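For wrappers that call the soak script, it can help to apply the same "positive integer" rule before launching anything. The sketch below is an assumption about what that rule means (decimal digits, no leading zero, at least 1); the script's own validation is not shown in this page and may differ. `parse_iterations` is a hypothetical helper name.

```python
import re


def parse_iterations(raw: str) -> int:
    """Return raw as a positive integer, or raise ValueError.

    Assumed rule: decimal digits only, no sign, no leading zero.
    """
    if not re.fullmatch(r"[1-9][0-9]*", raw):
        raise ValueError(f"iterations must be a positive integer, got {raw!r}")
    return int(raw)


print(parse_iterations("10"))  # 10
```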
Prerequisites
- bash
- python3: for cross-iteration JSONL parsing.
- jq: for JSON field extraction.
- cargo: the underlying happy/failure scripts build the demo.
Output layout
/tmp/doctor_frankentui/determinism_soak_<TIMESTAMP>/
├── happy_run_1/          (full happy-path artifacts for iteration 1)
│   ├── logs/
│   ├── project/
│   └── meta/             (summary.json, events.jsonl, artifact_manifest.json, …)
├── happy_run_2/
├── happy_run_3/
├── failure_run_1/        (full failure-path artifacts for iteration 1)
├── failure_run_2/
├── failure_run_3/
├── logs/
│   ├── happy_run_1.stdout.log
│   ├── happy_run_1.stderr.log
│   └── …
└── meta/
    ├── run_index.tsv
    ├── determinism_report.json
    └── determinism_report.txt
Each happy_run_<i> and failure_run_<i> is a complete, independent
run of its flow. The full artifact contract
applies inside each.
meta/run_index.tsv
One row per workflow-iteration combination:
workflow iteration run_dir stdout_log stderr_log status duration_ms
happy 1 happy_run_1 logs/happy_run_1.stdout.log logs/happy_run_1.stderr.log ok 34121
happy 2 happy_run_2 logs/happy_run_2.stdout.log logs/happy_run_2.stderr.log ok 34008
happy 3 happy_run_3 logs/happy_run_3.stdout.log logs/happy_run_3.stderr.log ok 34194
failure 1 failure_run_1 logs/failure_run_1.stdout.log logs/failure_run_1.stderr.log ok 27810
…
This gives you a machine-readable index into the per-iteration artifacts.
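As one way to use that index, the sketch below lists every row whose status is not ok, together with its stderr log. It assumes the file is tab-separated with the header row shown above; the status value "failed" in the sample is hypothetical, since only "ok" appears in this page, and `failed_runs` is an illustrative helper name.

```python
import csv
import io


def failed_runs(tsv_text: str):
    """Return (workflow, iteration, stderr_log) for rows whose status is not "ok"."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [
        (row["workflow"], int(row["iteration"]), row["stderr_log"])
        for row in reader
        if row["status"] != "ok"
    ]


# In-memory sample; the second row's "failed" status is made up for the demo.
sample = (
    "workflow\titeration\trun_dir\tstdout_log\tstderr_log\tstatus\tduration_ms\n"
    "happy\t1\thappy_run_1\tlogs/happy_run_1.stdout.log\t"
    "logs/happy_run_1.stderr.log\tok\t34121\n"
    "failure\t1\tfailure_run_1\tlogs/failure_run_1.stdout.log\t"
    "logs/failure_run_1.stderr.log\tfailed\t27810\n"
)
print(failed_runs(sample))
```

Against a real run, read meta/run_index.tsv from the run root and pass its contents in place of `sample`.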
meta/determinism_report.json
{
  "schema_version": "determinism-report-v1",
  "iterations": 3,
  "total_frames": 2148,
  "matches": 2148,
  "divergences": 0,
  "divergence_ratio": 0.0,
  "per_iteration": [
    { "iter": 1, "workflow": "happy", "run_id": "doctor_happy_seed0", "frames": 716 },
    { "iter": 2, "workflow": "happy", "run_id": "doctor_happy_seed0", "frames": 716 },
    { "iter": 3, "workflow": "happy", "run_id": "doctor_happy_seed0", "frames": 716 }
  ],
  "volatile_events": [],
  "diverging_frames": []
}
Key fields:
- total_frames: cumulative frames captured across all iterations.
- matches: frames whose checksum equalled the iteration-1 baseline.
- divergences: frames whose checksum differed.
- divergence_ratio: divergences / total_frames. Gate passes iff this is zero (or every divergence is on the volatile allowlist).
- volatile_events: events explicitly marked non-deterministic by schema (e.g. a wall-clock event that the system has acknowledged will never be soak-stable; rare).
- diverging_frames: detail for each mismatched frame (iteration, frame index, baseline checksum, observed checksum, JSON pointer to the event in that iteration’s events.jsonl).
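The gate rule stated for divergence_ratio can be expressed as a small check over the report. This is a sketch under assumptions: how the real script matches a divergence against the volatile allowlist is not described here, so the caller supplies that decision as a hypothetical `is_volatile` predicate.

```python
def gate_passes(report: dict, is_volatile=lambda frame: False) -> bool:
    """Pass iff there are no divergences, or every diverging frame is volatile.

    `is_volatile` is a caller-supplied predicate (an assumption of this
    sketch) that decides whether a diverging frame is on the allowlist.
    """
    total = report["total_frames"]
    ratio = report["divergences"] / total if total else 0.0
    if ratio == 0.0:
        return True
    frames = report["diverging_frames"]
    # Refuse to pass if the detail list does not account for every divergence.
    if len(frames) != report["divergences"]:
        return False
    return all(is_volatile(frame) for frame in frames)


clean = {"total_frames": 2148, "divergences": 0, "diverging_frames": []}
dirty = {
    "total_frames": 2148,
    "divergences": 1,
    "diverging_frames": [{"iter": 2, "frame_idx": 5}],
}
print(gate_passes(clean))  # True
print(gate_passes(dirty))  # False
```

Defaulting the predicate to "nothing is volatile" keeps the strict behaviour unless a caller opts in explicitly.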
meta/determinism_report.txt
Human-readable summary. Shape:
Determinism Soak Report
-----------------------
Iterations: 3
Total frames: 2148
Matches: 2148
Divergences: 0
Divergence rate: 0.000%
Verdict: PASS
Per-iteration:
happy/1 ok 716 frames run_id=doctor_happy_seed0
happy/2 ok 716 frames run_id=doctor_happy_seed0
happy/3 ok 716 frames run_id=doctor_happy_seed0
…
Debugging a divergence
Open determinism_report.json and locate the first diverging frame
jq '.diverging_frames[0]' meta/determinism_report.json
Note the iter, frame_idx, baseline_checksum, and observed_checksum.
Open both iterations’ events.jsonl around that frame
sed -n '4800,4820p' happy_run_1/meta/events.jsonl
sed -n '4800,4820p' happy_run_2/meta/events.jsonl
diff <(sed -n '4800,4820p' happy_run_1/meta/events.jsonl) \
     <(sed -n '4800,4820p' happy_run_2/meta/events.jsonl)
The first differing line is almost always the nondeterministic input.
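When the line range is not known in advance, the same "first differing line" search can be done over the whole file. The helper below is an illustrative sketch (the name `first_difference` and the event fields in the sample are assumptions); it compares raw lines, so it does not normalise JSON key order.

```python
from itertools import zip_longest


def first_difference(lines_a, lines_b):
    """Return (line_number, a, b) for the first differing line, or None.

    Line numbers are 1-based to match sed and diff. A side that runs out
    of lines early is reported as None.
    """
    for number, (a, b) in enumerate(zip_longest(lines_a, lines_b), start=1):
        if a != b:
            return number, a, b
    return None


# Hypothetical event lines; real events.jsonl fields may differ.
run_1 = ['{"frame_idx": 0, "checksum": "a1"}', '{"frame_idx": 1, "checksum": "b2"}']
run_2 = ['{"frame_idx": 0, "checksum": "a1"}', '{"frame_idx": 1, "checksum": "ff"}']
print(first_difference(run_1, run_2))
```

Open file objects iterate line by line, so the two events.jsonl files can be passed in directly instead of the in-memory lists.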
Classify the source
Common culprits:
| Pattern | Fix |
|---|---|
| HashMap iteration order | Switch to BTreeMap or explicit sort. |
| std::time::SystemTime::now | Route through DeterminismFixture::now_ms. |
| Thread scheduling in Cmd::Task | Ensure the task is ordered relative to update. |
| External process output | Seed the subprocess; capture its output deterministically. |
| Unbounded retry/backoff | Pin the retry count under E2E_DETERMINISTIC. |
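The "explicit sort" remedy for map iteration order applies in any language. As a Python analog (not the project's Rust code, and not its real checksum function), the sketch below canonicalises a mapping by sorting keys before hashing, so two frames with the same contents always hash the same regardless of insertion order. `frame_checksum` and the frame fields are hypothetical.

```python
import hashlib
import json


def frame_checksum(frame: dict) -> str:
    """Hash a frame after serialising it with sorted keys.

    Sorting keys plays the role of BTreeMap or an explicit sort: the
    serialised form no longer depends on the order entries were inserted.
    """
    canonical = json.dumps(frame, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Same contents, different insertion order.
a = {"cursor": 3, "title": "doctor"}
b = {"title": "doctor", "cursor": 3}
print(frame_checksum(a) == frame_checksum(b))  # True
```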
Fix, then re-soak
DOCTOR_FRANKENTUI_SOAK_RUNS=10 ./scripts/doctor_frankentui_determinism_soak.sh
A fix is good when 10 iterations produce zero divergences.
CI wiring
CI runs DOCTOR_FRANKENTUI_SOAK_RUNS=3 on every push and a nightly
DOCTOR_FRANKENTUI_SOAK_RUNS=20 job. The nightly is where rare races
surface; if the nightly goes red, treat it like any other CI failure —
don’t wait for the “lucky green” retry.
Relationship to shadow-runs
A shadow-run compares two lanes of the same iteration. A soak compares N iterations of the same lane. They catch different bugs:
- Shadow-run: catches behaviour-changing migrations (threading vs Asupersync, old diff vs new diff).
- Soak: catches latent non-determinism inside a single lane.
Both are required for a green release. See shadow-run and rollout scorecard.
Pitfalls
Don’t whitelist a divergence to make the gate pass. Every entry
on volatile_events is an admission that part of the system is not
deterministic. Fix the source; add an entry only when determinism is
genuinely impossible (e.g. a wall-clock-stamped audit event the
system does not control).
Iteration count matters. A 3-iteration soak hides races that only manifest 1 in 10 times. Use 10+ when investigating a suspected race, not 3.
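To put rough numbers on this, consider a simplified model: each run independently takes the rare path with probability p, and the soak notices only if the runs do not all agree. Under that assumption (real races are rarely this tidy), the detection chance is 1 - (1 - p)^N - p^N. The function name below is illustrative.

```python
def detection_probability(p: float, iterations: int) -> float:
    """Chance that a soak of `iterations` runs sees at least one disagreement.

    Assumes each run independently takes the rare path with probability p.
    The soak misses the race only if every run takes the common path or
    every run takes the rare path.
    """
    all_common = (1.0 - p) ** iterations
    all_rare = p ** iterations
    return 1.0 - all_common - all_rare


# A race that manifests 1 in 10 times, at the iteration counts used above.
for n in (3, 10, 20):
    print(n, round(detection_probability(0.1, n), 3))
```

For a 1-in-10 race this comes out to roughly 27% at 3 iterations, 65% at 10, and 88% at 20, which is why the low iteration count suits a quick push gate but not an investigation.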
Fresh process each iteration. The script spawns a new
doctor_frankentui_*_e2e.sh subprocess per iteration on purpose: a
leftover static, thread-local, or piece of allocator state could hide
a real bug. Don’t re-plumb the script to reuse state.