Determinism soak

The determinism soak test runs the full happy-path and failure-path flows N times under identical seeds and compares the resulting frame checksums. If any non-volatile frame diverges between iterations, the gate fails — and the divergence is almost always a latent bug in something “deterministic” that turned out not to be.

Source: scripts/doctor_frankentui_determinism_soak.sh.

Why a soak test

Individual tests can pass by luck. A stray HashMap iteration, a wall-clock read, a thread race that usually goes one way but occasionally goes the other — none of these are reliably caught by a single run. A soak test repeats the deterministic flow many times and insists every iteration produces the same ledger. If the system is honest about determinism, this is cheap; if it isn’t, the soak test finds out.

Running the soak


```bash
# Default: 3 iterations, timestamped run root
./scripts/doctor_frankentui_determinism_soak.sh

# Custom iteration count via env var
DOCTOR_FRANKENTUI_SOAK_RUNS=10 ./scripts/doctor_frankentui_determinism_soak.sh

# Explicit run root + iteration count as positional args
./scripts/doctor_frankentui_determinism_soak.sh /tmp/custom_root 10
```

The script exits non-zero when:

A required workflow script (doctor_frankentui_happy_e2e.sh or doctor_frankentui_failure_e2e.sh) is missing or not executable.
The iteration count (ITERATIONS) is not a positive integer.
The schema file (crates/doctor_frankentui/coverage/e2e_jsonl_schema.json) is missing.
Any non-volatile frame diverges across iterations.
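The validation rules above can be sketched as two shell functions. This is a hypothetical reconstruction, not the script's actual code: the names `resolve_iterations` and `check_required`, the precedence (positional count, then `DOCTOR_FRANKENTUI_SOAK_RUNS`, then the default of 3) and the assumption that both workflow scripts live under `scripts/` are all guesses.

```bash
# Hypothetical sketch of the soak script's preflight validation.
# Names, precedence and paths are assumptions, not the real script.

# resolve_iterations [COUNT]: print the iteration count or fail.
# Assumed precedence: COUNT, then DOCTOR_FRANKENTUI_SOAK_RUNS, then 3.
resolve_iterations() {
  local n=${1:-${DOCTOR_FRANKENTUI_SOAK_RUNS:-3}}
  case $n in
    *[!0-9]*) echo "ITERATIONS must be a positive integer, got '$n'" >&2; return 1 ;;
  esac
  if [ "$n" -lt 1 ]; then
    echo "ITERATIONS must be a positive integer, got '$n'" >&2
    return 1
  fi
  echo "$n"
}

# check_required [REPO_ROOT]: fail unless both workflow scripts are
# executable and the schema file exists.
check_required() {
  local root=${1:-.} s
  for s in doctor_frankentui_happy_e2e.sh doctor_frankentui_failure_e2e.sh; do
    if [ ! -x "$root/scripts/$s" ]; then
      echo "missing or not executable: scripts/$s" >&2
      return 1
    fi
  done
  if [ ! -f "$root/crates/doctor_frankentui/coverage/e2e_jsonl_schema.json" ]; then
    echo "missing schema: crates/doctor_frankentui/coverage/e2e_jsonl_schema.json" >&2
    return 1
  fi
}

# Usage (argument positions as documented above):
#   iterations=$(resolve_iterations "${2:-}") || exit 1
#   check_required . || exit 1
```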

Prerequisites

bash
python3 — for cross-iteration JSONL parsing.
jq — for JSON field extraction.
cargo — the underlying happy/failure scripts build the demo.
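A preflight for these tools can use the POSIX `command -v` builtin, so a missing `jq` is reported up front rather than after several minutes of soaking. `require_tools` is a hypothetical helper name, not something the soak script is known to define.

```bash
# require_tools TOOL...: report every missing tool; fail if any is missing.
require_tools() {
  local tool missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing prerequisite: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage: require_tools bash python3 jq cargo || exit 1
```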

Output layout


```text
/tmp/doctor_frankentui/determinism_soak_<TIMESTAMP>/
├── happy_run_1/     (full happy-path artifacts for iteration 1)
│   ├── logs/
│   ├── project/
│   └── meta/        (summary.json, events.jsonl, artifact_manifest.json, …)
├── happy_run_2/
├── happy_run_3/
├── failure_run_1/   (full failure-path artifacts for iteration 1)
├── failure_run_2/
├── failure_run_3/
├── logs/
│   ├── happy_run_1.stdout.log
│   ├── happy_run_1.stderr.log
│   └── …
└── meta/
    ├── run_index.tsv
    ├── determinism_report.json
    └── determinism_report.txt
```

Each happy_run_<i> and failure_run_<i> is a complete, independent run of its flow. The full artifact contract applies inside each.

`meta/run_index.tsv`

One row per workflow-iteration combination:


```text
workflow	iteration	run_dir	stdout_log	stderr_log	status	duration_ms
happy	1	happy_run_1	logs/happy_run_1.stdout.log	logs/happy_run_1.stderr.log	ok	34121
happy	2	happy_run_2	logs/happy_run_2.stdout.log	logs/happy_run_2.stderr.log	ok	34008
happy	3	happy_run_3	logs/happy_run_3.stdout.log	logs/happy_run_3.stderr.log	ok	34194
failure	1	failure_run_1	logs/failure_run_1.stdout.log	logs/failure_run_1.stderr.log	ok	27810
…
```

Together, the rows form a machine-readable index into the per-iteration artifacts.
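Because the index is tab-separated with a header row, `awk` can summarize it without extra tooling. This sketch assumes the column order shown above and treats any status other than `ok` as a failure, since the failure spelling is not documented here; `summarize_run_index` is a hypothetical name. The trailing `sort` matters: `for (w in runs)` iterates in an unspecified, hash-dependent order, the same class of bug this page is about.

```bash
# summarize_run_index FILE: per-workflow run count, failures and mean duration.
# Assumed columns: workflow iteration run_dir stdout_log stderr_log status duration_ms
summarize_run_index() {
  awk -F '\t' '
    NR == 1 { next }                     # skip the header row
    {
      runs[$1]++
      total_ms[$1] += $7
      if ($6 != "ok") failed[$1]++       # any non-"ok" status counts as a failure
    }
    END {
      for (w in runs)                    # unspecified order, hence the sort below
        printf "%s runs=%d failed=%d mean_ms=%d\n", w, runs[w], failed[w] + 0, total_ms[w] / runs[w]
    }
  ' "$1" | sort
}

# Usage: summarize_run_index "$RUN_ROOT/meta/run_index.tsv"
```

A slow or failing iteration stands out here before you open any per-run artifacts.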

`meta/determinism_report.json`


```json
{
  "schema_version": "determinism-report-v1",
  "iterations": 3,
  "total_frames": 2148,
  "matches": 2148,
  "divergences": 0,
  "divergence_ratio": 0.0,
  "per_iteration": [
    { "iter": 1, "workflow": "happy",   "run_id": "doctor_happy_seed0",   "frames": 716 },
    { "iter": 2, "workflow": "happy",   "run_id": "doctor_happy_seed0",   "frames": 716 },
    { "iter": 3, "workflow": "happy",   "run_id": "doctor_happy_seed0",   "frames": 716 }
  ],
  "volatile_events": [],
  "diverging_frames": []
}
```

Key fields:

total_frames — cumulative frames captured across all iterations.
matches — frames whose checksum equalled the iteration-1 baseline.
divergences — frames whose checksum differed.
divergence_ratio — divergences / total_frames. Gate passes iff this is zero (or every divergence is on the volatile allowlist).
volatile_events — events that the schema explicitly marks as non-deterministic, such as a wall-clock event the system acknowledges can never be soak-stable. Entries here should be rare.
diverging_frames — detail for each mismatched frame: iteration, frame index, baseline checksum, observed checksum, JSON pointer to the event in that iteration’s events.jsonl.
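The arithmetic behind matches, divergences and divergence_ratio can be illustrated with a small `awk` sketch. It assumes each iteration's frame checksums have been dumped one per line, in frame order, into a plain file, with iteration 1 passed first as the baseline. The real script parses events.jsonl across iterations with python3 (see Prerequisites), so the file format and the `soak_compare` name are hypothetical. Frames beyond the baseline's length count as divergences, and the volatile allowlist is ignored.

```bash
# soak_compare BASELINE ITER2 [ITER3 ...]
# Each file: one frame checksum per line, in frame order (hypothetical format).
# Prints the totals and returns 1 if any frame diverges from the baseline.
soak_compare() {
  awk '
    FILENAME == ARGV[1] { base[FNR] = $0 }   # iteration 1 is the baseline;
                                             # its frames also count and trivially match
    {
      total++
      # Force string comparison so numeric-looking checksums such as
      # "007" and "7" are not treated as equal numbers.
      if ((FNR in base) && (base[FNR] "") == ($0 "")) matches++
    }
    END {
      div = total - matches
      printf "total_frames=%d matches=%d divergences=%d divergence_ratio=%.6f\n",
             total, matches, div, (total > 0 ? div / total : 0)
      exit (div > 0)
    }
  ' "$@"
}
```

Counting the baseline's own frames in both total_frames and matches mirrors the sample report above, where matches equals total_frames on a clean run.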

`meta/determinism_report.txt`

Human-readable summary. Shape:


```text
Determinism Soak Report
-----------------------
Iterations:      3
Total frames:    2148
Matches:         2148
Divergences:     0
Divergence rate: 0.000%
Verdict:         PASS

Per-iteration:
  happy/1   ok  716 frames  run_id=doctor_happy_seed0
  happy/2   ok  716 frames  run_id=doctor_happy_seed0
  happy/3   ok  716 frames  run_id=doctor_happy_seed0
  …
```

Debugging a divergence

Open `determinism_report.json` and locate the first diverging frame


```bash
jq '.diverging_frames[0]' meta/determinism_report.json
```

Note the iter, frame_idx, baseline_checksum, and observed_checksum.
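When there is more than one divergence, listing all of them at once helps spot a pattern, such as every divergence coming from one iteration or one frame range. A jq sketch, assuming the four names above are the exact JSON keys of each `diverging_frames` entry; `list_divergences` and `count_by_iteration` are hypothetical helper names.

```bash
# list_divergences REPORT: one tab-separated line per diverging frame:
# iter, frame_idx, baseline_checksum, observed_checksum.
list_divergences() {
  jq -r '.diverging_frames[]
         | [.iter, .frame_idx, .baseline_checksum, .observed_checksum]
         | map(tostring) | join("\t")' "$1"
}

# count_by_iteration REPORT: number of divergences per iteration.
count_by_iteration() {
  jq -r '.diverging_frames | group_by(.iter) | .[] | "\(.[0].iter)\t\(length)"' "$1"
}

# Usage: list_divergences meta/determinism_report.json | head
```

`map(tostring) | join("\t")` is used instead of `@tsv` so numbers and strings render the same way across jq versions.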

Open both iterations’ `events.jsonl` around that frame


```bash
sed -n '4800,4820p' happy_run_1/meta/events.jsonl
sed -n '4800,4820p' happy_run_2/meta/events.jsonl
diff <(sed -n '4800,4820p' happy_run_1/meta/events.jsonl) \
     <(sed -n '4800,4820p' happy_run_2/meta/events.jsonl)
```

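The line range above is illustrative: centre the window on the event identified by the diverging frame's JSON pointer. The first differing line is almost always the nondeterministic input.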
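Picking the `sed` window by hand works once you know roughly where to look. To find the first differing line mechanically, a small `awk` helper can compare two `events.jsonl` files line by line. `first_diff_line` is a hypothetical name, and the helper assumes its two arguments are different paths.

```bash
# first_diff_line A B: print the 1-based number of the first line where A and
# B differ (including one file ending early); print nothing if identical.
# A and B must be different paths: lines from ARGV[1] are read as file A.
first_diff_line() {
  awk '
    FILENAME == ARGV[1] { a[FNR] = $0; n = FNR; next }
    {
      m = FNR
      if (!(FNR in a) || (a[FNR] "") != ($0 "")) { print FNR; found = 1; exit }
    }
    END { if (!found && m < n) print m + 1 }   # B is a strict prefix of A
  ' "$1" "$2"
}

# Usage: print 10 lines of context around the first difference.
#   a=happy_run_1/meta/events.jsonl b=happy_run_2/meta/events.jsonl
#   line=$(first_diff_line "$a" "$b")
#   [ -n "$line" ] && sed -n "$((line > 10 ? line - 10 : 1)),$((line + 10))p" "$b"
```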

Classify the source

Common culprits:

| Pattern | Fix |
|---|---|
| `HashMap` iteration order | Switch to `BTreeMap` or sort explicitly. |
| `std::time::SystemTime::now` | Route through `DeterminismFixture::now_ms`. |
| Thread scheduling in `Cmd::Task` | Ensure the task is ordered relative to `update`. |
| External process output | Seed the subprocess and capture its output deterministically. |
| Unbounded retry/backoff | Pin the retry count under `E2E_DETERMINISTIC`. |
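Before bisecting, a crude grep over the Rust sources can shortlist suspects for the first two rows. This is a hypothetical helper: the pattern list, which adds `HashSet` and `std::time::Instant::now` to the table's entries, is an assumption; every hit still needs human review, because a `HashMap` that is never iterated is harmless; and `--include` is a GNU/BSD grep extension rather than POSIX.

```bash
# scan_culprits [DIR]: list Rust source lines that use hash-ordered
# collections or read a clock. A shortlist, not a verdict.
scan_culprits() {
  grep -rnE --include='*.rs' 'Hash(Map|Set)|SystemTime::now|Instant::now' "${1:-.}"
}

# Usage: scan_culprits crates/doctor_frankentui
```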

Fix, then re-soak


```bash
DOCTOR_FRANKENTUI_SOAK_RUNS=10 ./scripts/doctor_frankentui_determinism_soak.sh
```

A fix is good when 10 iterations produce zero divergences.
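The script's exit status already reports divergences; what a JSON check adds is an iteration floor, so a clean 3-iteration run cannot be mistaken for a validated fix. A sketch using jq's `-e` flag, which sets the exit status from the last output; `report_is_clean` is a hypothetical name, and it treats any divergence as failure, ignoring the volatile allowlist.

```bash
# report_is_clean REPORT [MIN_ITERATIONS]: succeed only if the report covers
# at least MIN_ITERATIONS (default 10) iterations and has zero divergences.
report_is_clean() {
  jq -e --argjson min "${2:-10}" \
    '.iterations >= $min and .divergences == 0' "$1" >/dev/null
}

# Usage (assumes the report lands in meta/ under an explicit run root):
#   ./scripts/doctor_frankentui_determinism_soak.sh /tmp/resoak 10
#   report_is_clean /tmp/resoak/meta/determinism_report.json && echo "fix holds"
```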

CI wiring

CI runs DOCTOR_FRANKENTUI_SOAK_RUNS=3 on every push and a nightly DOCTOR_FRANKENTUI_SOAK_RUNS=20 job. The nightly is where rare races surface; if the nightly goes red, treat it like any other CI failure — don’t wait for the “lucky green” retry.

Relationship to shadow-runs

A shadow-run compares two lanes of the same iteration. A soak compares N iterations of the same lane. They catch different bugs:

Shadow-run: catches behaviour-changing migrations (threading vs Asupersync, old diff vs new diff).
Soak: catches latent non-determinism inside a single lane.

Both are required for a green release. See shadow-run and rollout scorecard.

Pitfalls

Don’t allowlist a divergence just to make the gate pass. Every entry in volatile_events is an admission that part of the system is not deterministic. Fix the source; add an entry only when determinism is genuinely impossible (e.g. a wall-clock-stamped audit event the system does not control).

Iteration count matters. A 3-iteration soak easily misses a race that manifests only once in 10 runs. Use 10 or more iterations when investigating a suspected race, not 3.
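The arithmetic behind this advice, under the simplifying assumption that a race fires independently with probability p in each fresh-process iteration: the soak only sees a divergence when some iterations fire and others do not, so it misses the race with probability (1 - p)^N + p^N. For a 1-in-10 race that is about 73% at N = 3, 35% at N = 10 and 12% at N = 20, the nightly count. Real races are often correlated with machine load, so treat these numbers as a rough guide; `miss_probability` is a hypothetical helper.

```bash
# miss_probability P N: chance that an N-iteration soak never observes a
# divergence from a race firing independently with probability P per
# iteration (it fires in none of the iterations, or in all of them).
miss_probability() {
  awk -v p="$1" -v n="$2" 'BEGIN { printf "%.4f\n", (1 - p) ^ n + p ^ n }'
}

# Usage: miss_probability 0.1 3     # 1-in-10 race, default soak
#        miss_probability 0.1 20    # same race, nightly soak
```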

Fresh process each iteration. The script deliberately spawns a new doctor_frankentui_*_e2e.sh subprocess per iteration: leftover static, thread-local, or allocator state carried between iterations could mask a real bug. Don’t re-plumb the script to reuse state.
