TASK-1.4 Pass 3 — Close-out prompt
- Status: Proposed (Copilot session not started)
- Date: 2026-05-02
- Spec: SLICE-1.4: Storm & Soak Profiles
- Replaces: the Pass 3 Copilot-prompt block in TASK-1.4, which assumed a clean Pass-3 start
Why this file exists
The original TASK-1.4 Pass 3 prompt (written 2026-04-30 when SLICE-1.4 opened) assumed Pass 3 would run both captures from a clean state. Reality diverged:
- The ChaosMonkey 30-min capture ran on 2026-05-01 against commit `2108272`. CSV: `docs/captures/slice-1-4-chaos-monkey-2026-05-01.csv` (currently UNTRACKED in git).
- Four FlaUI / DI fixes were needed to make `MultiTagSoakFlaUi` survive ChaosMonkey faults (`bf32566`, `0f1596a`, `5462d42`, `2108272`).
- The 8-hour Soak8h capture has not run yet.
- A post-Pass-2 review surfaced two issues that must be addressed in a small pre-flight commit BEFORE the Soak8h capture starts; otherwise the Soak8h capture would have to be re-run.
This file is the close-out prompt that replaces the original Pass 3 prompt. The original Pass 3 prompt remains in TASK-1.4 as the historical record of how the slice was opened.
What ships
Two commits:
- Commit A: pre-flight fixes (small, single-purpose). Flips `Simulator:FlakySdk:Enabled` from `true` to `false` in `appsettings.json` so existing rows (slice-1-1, slice-1-2, slice-1-3) reproduce against the merged commit (criterion 16). Fixes the `FlakySdkDecorator` timeout branch to fall through to inner when the caller's CT was not cancelled, per spec. Adds a regression test for the fall-through path.
- Commit B: Pass 3 final close (docs + CSVs only). Appends two row blocks to `phase-1-measurements.md`. Writes runbook §4.5 + §4.6. Updates CLAUDE.md and roadmap-progress.md. Declares the Phase 1 exit-gate banner.
The 8-hour Soak8h capture runs between A and B, against commit A's tree.
Copilot agent prompt
Paste the following block verbatim into the Copilot session.
You are finishing Pass 3 of TASK-1.4 in this repository. Passes 1 and 2 are
merged. Pass 3 is partial: the ChaosMonkey 30-min capture is done
(docs/captures/slice-1-4-chaos-monkey-2026-05-01.csv, currently UNTRACKED in
git), but the 8-hour Soak8h capture has not run, neither row block has been
appended to phase-1-measurements.md, and runbook §4.5 + §4.6 have not been
written.
A prior review surfaced two issues that must be addressed in a small
PRE-FLIGHT commit BEFORE the Soak8h capture starts. Doing them after would
mean re-capturing Soak8h.
This prompt covers both commits.
## Authoritative references
Read these before making changes:
- docs/specs/SLICE-1.4-storm-and-soak-profiles.md (criteria 11, 12, 14, 16)
- docs/tasks/TASK-1.4-implement-storm-and-soak-profiles.md (original Pass 3 prompt)
- docs/tasks/TASK-1.4-pass-3-close.md (this file)
- docs/runbook/capturing-measurements.md (existing §4.1–§4.4 + §4.5+ placeholder)
- docs/reviews/phase-1-measurements.md (slice-1-2 + slice-1-3 rows to mirror)
- src/InspectionPrototype.Infrastructure/Simulator/FlakySdkDecorator.cs
- src/InspectionPrototype.App/appsettings.json
- tests/InspectionPrototype.Tests/FlakySdkDecoratorTests.cs
- CLAUDE.md, docs/reviews/roadmap-progress.md
- tools/Capture-Measurements.ps1
- docs/captures/slice-1-4-chaos-monkey-2026-05-01.csv (the already-captured CSV)
The ChaosMonkey row block details from the captured CSV (for use in commit B):
- CSV path: docs/captures/slice-1-4-chaos-monkey-2026-05-01.csv
- Capture commit: 2108272 (per the 2026-05-01 session-log entry)
- Capture span: 1807 s
- Headline: 491 runs.started, 453 runs.completed, 37 runs.faulted, 37 fault-cycles,
10 469 frames.ingested, encoder-rate 199.9 Hz both axes, gc-pause-p95 10.28 ms,
working-set peak 225.3 MB
- Criterion-11 log evidence (verified): 39 fault-injected, 39 fault-cleared,
37 recovery-completed, 120 defect-shower entries — all four fault branches
(a/b/c/d from the original Pass 3 prompt) hit
The ChaosMonkey CSV was captured with `Simulator:FlakySdk:Enabled = true` in
appsettings.json. The pre-flight commit flips that to false so future
re-captures of the prior rows (slice-1-1, slice-1-2, slice-1-3) reproduce
within the existing accuracy bounds — criterion 16 of SLICE-1.4. Document
this in the ChaosMonkey row's Notes section: "captured against commit 2108272
with `Simulator:FlakySdk:Enabled` set to true; the merged appsettings ships
with Enabled=false. To reproduce, flip Enabled to true before re-running."
═══════════════════════════════════════════════════════════════════════════════
COMMIT A — Pre-flight fixes (do this BEFORE running Soak8h)
═══════════════════════════════════════════════════════════════════════════════
## Deliverables
1. src/InspectionPrototype.App/appsettings.json:
In the `Simulator:FlakySdk` block (≈ lines 120-126), change `"Enabled": true`
to `"Enabled": false`. Leave the three Chance fields and TimeoutHangMs
unchanged. Operator flips back to true manually for any future
ChaosMonkey re-capture (this is documented in the new runbook §4.5 in
commit B).
2. src/InspectionPrototype.Infrastructure/Simulator/FlakySdkDecorator.cs:
The current timeout-hang branch unconditionally throws OperationCanceledException
after the hang completes (lines ≈ 70-77). The spec said "fall through to
inner if not cancelled." Fix:
Replace:
    if (roll < opts.TimeoutChance)
    {
        _logger.LogWarning("FlakySdk: simulating timeout hang ({HangMs} ms).", opts.TimeoutHangMs);
        await Task.Delay(opts.TimeoutHangMs); // intentionally ignores caller CT
        cancellationToken.ThrowIfCancellationRequested();
        throw new OperationCanceledException(
            "FlakySdk: timeout hang expired.", cancellationToken);
    }
With:
    if (roll < opts.TimeoutChance)
    {
        _logger.LogWarning("FlakySdk: simulating timeout hang ({HangMs} ms).", opts.TimeoutHangMs);
        await Task.Delay(opts.TimeoutHangMs); // intentionally ignores caller CT
        cancellationToken.ThrowIfCancellationRequested();
        // Hang elapsed without cancellation; fall through to the inner call.
        return await _inner.ConnectAsync(cancellationToken);
    }
This matches the spec's "the decorator awaits a Task.Delay of (caller's CTS
expected timeout × 2). Fall through to inner if not cancelled."
3. tests/InspectionPrototype.Tests/FlakySdkDecoratorTests.cs:
The existing test FlakySdk_TimeoutBranch_ThrowsOperationCanceledException
passes either way because its CTS auto-cancels at 50 ms (< 100 ms hang),
so ThrowIfCancellationRequested() short-circuits before the fall-through
path. Add a SECOND test that verifies the fall-through path explicitly:
    [Fact]
    public async Task FlakySdk_TimeoutBranch_WhenNotCancelled_FallsThroughToInner()
    {
        var opts = new FlakySdkOptions
        {
            Enabled = true,
            TimeoutChance = 1.0,
            TimeoutHangMs = 50 // short hang
        };
        var monitor = new FakeOptionsMonitor<FlakySdkOptions>(opts);
        var inner = new AlwaysConnected();
        var sut = new FlakySdkDecorator(inner, monitor, NullLogger<FlakySdkDecorator>.Instance);

        // No cancellation: the caller's CT is never cancelled.
        var result = await sut.ConnectAsync(CancellationToken.None);

        Assert.True(result);
    }
The existing FlakySdk_TimeoutBranch_ThrowsOperationCanceledException test
stays unchanged — it now exercises the cancellation-during-hang path
explicitly via the auto-cancelling CT.
4. Verification before committing A:
dotnet build --configuration Release
dotnet test --configuration Release
The encoder-cadence test SimulatedEncoderSourceTests.ProduceAsync_At200Hz_*
may be flaky on a busy host (it has been observed at 102 samples vs the
≥160 floor — Windows-timer-resolution variance, same family as SLICE-1.3's
amended criterion 7). If it fails, re-run dotnet test once; if it
consistently fails, file a follow-up but DO NOT block commit A on it.
That test is unrelated to SLICE-1.4 and is already in the codebase.
5. Commit A:
git add src/InspectionPrototype.App/appsettings.json \
src/InspectionPrototype.Infrastructure/Simulator/FlakySdkDecorator.cs \
tests/InspectionPrototype.Tests/FlakySdkDecoratorTests.cs
Commit message:
"fix(sim): default Simulator:FlakySdk:Enabled=false; FlakySdk timeout falls through to inner
Two pre-Soak8h fixes for SLICE-1.4:
1. appsettings.json ships FlakySdk Enabled=false so existing rows
(slice-1-1, slice-1-2, slice-1-3) remain reproducible against the
merged commit (criterion 16). Operator flips to true before the
ChaosMonkey capture (documented in runbook §4.5).
2. FlakySdkDecorator timeout branch falls through to the inner
connection when the caller's CT was not cancelled during the hang —
matches the spec's stated semantic. New regression test added."
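
Before starting the 8-hour capture, it can help to confirm that commit A's tree really ships the flag disabled. The following is a minimal PowerShell sketch, not part of the repository: `Test-FlakySdkDisabled` is a hypothetical helper name, the sample JSON is an illustrative fragment rather than the real file, and the parse assumes appsettings.json carries no comments (older Windows PowerShell versions may reject them).

```powershell
# Hypothetical helper (not in the repo): returns $true only when
# Simulator:FlakySdk:Enabled is explicitly false in the given JSON text.
function Test-FlakySdkDisabled {
    param([Parameter(Mandatory)][string]$Json)

    $config = $Json | ConvertFrom-Json
    # A missing key yields $null, and ($null -eq $false) is $false, so a
    # missing flag is reported as "not confirmed disabled".
    return ($config.Simulator.FlakySdk.Enabled -eq $false)
}

# Illustrative fragment only; the real appsettings.json has more keys.
$sample = '{ "Simulator": { "FlakySdk": { "Enabled": false } } }'
Test-FlakySdkDisabled -Json $sample

# Against the repository file (path taken from deliverable 1):
# Test-FlakySdkDisabled -Json (Get-Content src/InspectionPrototype.App/appsettings.json -Raw)
```

If the helper returns anything other than True against the real file, commit A's appsettings flip has not landed and the Soak8h capture should not start yet.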
═══════════════════════════════════════════════════════════════════════════════
SOAK8h CAPTURE (8 hours real-time, between commits A and B)
═══════════════════════════════════════════════════════════════════════════════
Run on a sleep-disabled, hibernate-disabled, dedicated host session. Do NOT
use the host for any other interactive work during the run.
# Note current values for restoration
$prevStandby = (powercfg /query SCHEME_CURRENT SUB_SLEEP STANDBYIDLE | Out-String)
$prevMonitor = (powercfg /query SCHEME_CURRENT SUB_VIDEO VIDEOIDLE | Out-String)
$prevHibernate = (powercfg /availablesleepstates | Out-String)
# Disable sleep + hibernate
powercfg /change standby-timeout-ac 0
powercfg /change monitor-timeout-ac 0
powercfg /hibernate off
# Run the capture against commit A
$date = Get-Date -Format 'yyyy-MM-dd'
tools/Capture-Measurements.ps1 -Scenario MultiTagSoak `
-DurationSeconds 28800 -Profile Soak8h `
-OutputCsv "docs/captures/slice-1-4-soak-8h-$date.csv" `
-CommitHash $(git rev-parse --short HEAD) `
-SliceTag slice-1-4-soak-8h
Verify after the run completes:
* Capture span ≥ 28 500 s (at most 300 s, roughly 1%, short of the 28 800 s
target). If less, the host slept or paused; discard the CSV and restart.
* working-set growth (MB) ≤ 50 — this is criterion 12, the slice's gate.
* gen-2-gc-count rate (per hour) ≤ 4× the slice-1-2-real-frame-payloads
rate (slice-1-2 row shows gen-2 = 2 713 over 600 s ≈ 16 280/hr; under
Soak8h the rate must be ≤ 65 120/hr).
* runs.faulted near zero (Soak8h has AlarmBurstEveryMs = 0, so faults
come only from ConnectionFailureProbability = 0.05 misconnects, which
are not critical-fault paths).
* No unhandled-exception entries in Logs/inspection-prototype-*.log.
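
The numeric gates above can be worked through as a short PowerShell sketch. It only restates the arithmetic from the checks; `Test-SoakGates` is a hypothetical helper name, and the values passed in the usage line are made-up placeholders, not capture results. The exact baseline rate is 16 278/hr and the exact ceiling 65 112/hr; the figures quoted above (16 280 and 65 120) are the same values rounded.

```powershell
# Inputs taken from the verification checks above.
$minSpanSeconds  = 28500   # capture-span floor
$maxGrowthMb     = 50      # criterion 12 ceiling
$baselineGen2    = 2713    # slice-1-2 gen-2 GC count
$baselineSeconds = 600     # slice-1-2 capture span

# Multiply before dividing so the integer arithmetic stays exact.
$baselinePerHour = $baselineGen2 * 3600 / $baselineSeconds   # 16278
$ceilingPerHour  = 4 * $baselinePerHour                      # 65112

# Hypothetical helper: evaluates the three numeric gates for one capture.
function Test-SoakGates {
    param(
        [Parameter(Mandatory)][double]$SpanSeconds,
        [Parameter(Mandatory)][double]$GrowthMb,
        [Parameter(Mandatory)][double]$Gen2Count
    )

    $gen2PerHour = $Gen2Count * 3600 / $SpanSeconds
    [pscustomobject]@{
        SpanOk      = ($SpanSeconds -ge $minSpanSeconds)
        GrowthOk    = ($GrowthMb -le $maxGrowthMb)
        Gen2Ok      = ($gen2PerHour -le $ceilingPerHour)
        Gen2PerHour = [math]::Round($gen2PerHour, 1)
    }
}

# Usage with placeholder numbers (NOT real capture data):
Test-SoakGates -SpanSeconds 28800 -GrowthMb 12.5 -Gen2Count 100000
```

All three flags must be True before moving on; a False `GrowthOk` is the criterion-12 failure case handled below.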
If criterion 12 (growth ≤ 50 MB) fails:
* STOP. Do not paper over it by adjusting the criterion.
* The slice's design intent is unmet. File a follow-up with the
growth value, the time-series shape (monotonic? sawtooth?), and any
suspect counters (Gen-2 runaway, LOH growth, etc).
* Phase 2 may then open with the leak as its motivating evidence.
After the capture finishes:
# Restore powercfg
powercfg /change standby-timeout-ac <previous-minutes>
powercfg /change monitor-timeout-ac <previous-minutes>
powercfg /hibernate on # if it was on before
═══════════════════════════════════════════════════════════════════════════════
COMMIT B — Append rows + runbook + close Phase 1
═══════════════════════════════════════════════════════════════════════════════
## Deliverables
1. Append two row blocks to docs/reviews/phase-1-measurements.md.
Format mirrors the existing slice-1-3-encoder-rate-motion + slice-1-2
rows (Slice | Metric | Baseline | After | Delta | Source | Date).
Row "slice-1-4-chaos-monkey":
- Place AFTER the slice-1-3-encoder-rate-motion row (it is the most
recent baseline reference).
- Baseline column = slice-1-3-encoder-rate-motion values for the 20
overlapping metrics; "—" for working-set growth (MB) and
fault-cycles (count).
- 22 rows: existing 20 + working-set growth (MB) + fault-cycles (count).
- Use the headline numbers above. Compute deltas (after − baseline for
totals; ratios for rates).
- Write a "### Notes on slice-1-4-chaos-monkey" subsection covering:
(a) Why slice-1-3 is the baseline reference.
(b) Per-fault-branch evidence, citing log-line counts per branch:
connect-failure (count from 'Connection failed (simulated failure)'
+ 'FlakySdk: out-of-band-throw' + 'Connection error:' lines);
fault-during-home (count of 'CRITICAL FAULT: [CHAOS-' lines that
appear within ±2 s of 'Homing started' / 'Homing aborted');
fault-during-run (count of 'CRITICAL FAULT: [CHAOS-' lines within
±2 s of 'Run running' / 'Run loop interrupted');
fault-clear-and-recover (count of 'Fault condition cleared:
[CHAOS-' followed by 'Recovery completed.').
The known totals from the 2026-05-01 session log: 39 injected,
39 cleared, 37 recovered, 120 defect-shower transitions.
(c) The Enabled=true caveat: capture was taken with
`Simulator:FlakySdk:Enabled` flipped to true; the merged
appsettings.json ships Enabled=false (commit A). Reproducing this
row requires re-flipping to true before the capture.
(d) Anything that surprised in the capture — e.g., FlakySdk timeout
branch effect after the spec-fix, ignore-cancellation effect on
AppState, etc.
Row "slice-1-4-soak-8h":
- Place AFTER the slice-1-4-chaos-monkey row.
- Baseline column = slice-1-2-real-frame-payloads values for the 18
overlapping metrics; "—" for the 4 SLICE-1.3+ metrics that
slice-1-2 predates (encoder-rate-x, encoder-rate-y, working-set
growth, fault-cycles).
- 22 rows.
- Write a "### Notes on slice-1-4-soak-8h" subsection covering:
(a) Why slice-1-2 is the baseline (continuous-load FlaUI-captured
row; both slice-1-2 and Soak8h emphasize sustained data-plane
load with low chaos).
(b) Working-set first-second value vs last-second value (read these
from the CSV directly), the growth math ((last − first) / 1 MB),
and whether it satisfied criterion 12 (≤ 50 MB).
(c) Gen-2 GC count rate-per-hour vs slice-1-2's rate-per-hour, with
the 4× ceiling check.
(d) Per-tag samples.ingested distribution — note any tag whose rate
dropped by more than the 1% TelemetryDropoutChance predicts
(use Get-Content + Group-Object on the CSV's tag.name dimension
column, similar to TASK-1.1 Pass 3's per-tag rate distribution).
(e) Anything that surprised — working-set sawtooth vs monotonic,
alloc-rate trend, encoder-rate stability over 8 h, etc.
2. docs/runbook/capturing-measurements.md:
Replace the existing "### 4.5+ — pending Phase 1 scenarios" placeholder
section with TWO new sections (§4.5 and §4.6) and a new "### 4.7+ —
pending Phase 2 scenarios" placeholder.
§4.5 "Chaos-monkey scenario — SLICE-1.4, `ChaosMonkey` profile":
- one-paragraph rationale linking back to SLICE-1.4
- PREREQUISITE: flip `Simulator:FlakySdk:Enabled` from false (default)
to true in src/InspectionPrototype.App/appsettings.json before
building. Restore to false after the capture. Note that this means
the capture is NOT bit-for-bit reproducible against the merged
commit; the row's Notes section documents this.
- 30-minute step list mirroring §4.4 with profile = ChaosMonkey
- sanity checks: runs.started ≥ 5, runs.faulted ≥ 5, fault-cycles
(count) ≥ 5, frames.dropped recorded, the four log-line branch
types (a)/(b)/(c)/(d) all present
- the row block is 22-metric — name working-set growth (MB) and
fault-cycles (count) and where they come from
- PowerShell `Select-String` recipe over the inspection-prototype log
files to count each fault-branch landing — copy-pasteable. Example:
$log = "Logs/inspection-prototype-*.log"   # narrow the wildcard to the capture date's file if older logs exist
Select-String -Path $log -Pattern 'Connection failed \(simulated failure\)|FlakySdk: out-of-band-throw|Connection error:' | Measure-Object | Select-Object Count
# Repeat per branch...
- Implemented by: `MultiTagSoakFlaUi` with `--profile ChaosMonkey`
§4.6 "Soak scenario — SLICE-1.4, `Soak8h` profile":
- one-paragraph rationale: leak-detection bar; 8 hours real-time on
a dedicated session
- "do not run on a host you also intend to use" warning (bold)
- prerequisites: hibernate disabled, screen-saver disabled, no other
interactive use of the host
- 8-hour step list — Capture-Measurements.ps1 with -DurationSeconds
28800 -Profile Soak8h
- sanity checks: working-set growth (MB) ≤ 50, gen-2-gc-count rate
within 4× of slice-1-2's rate, no unhandled-exception entries,
capture span ≥ 28 500 s
- what to do if interrupted: discard the partial CSV and restart —
leak math is meaningful only on uninterrupted real-time
- Implemented by: `MultiTagSoakFlaUi` with `--profile Soak8h`
§4.7+ "pending Phase 2 scenarios":
- one line: "Reserved for Phase 2 slices once they open."
3. CLAUDE.md "Current position" block (5 lines):
- Phase: 1 (Simulator to scale) — **complete** as of <today's date>
- Last completed action: TASK-1.4 Pass 3 closed. ChaosMonkey row block
(491 runs.started, 37 runs.faulted, 37 fault-cycles, criterion-11
verified by log evidence) and Soak8h row block (working-set growth
<X> MB, criterion 12 met) appended to phase-1-measurements.md;
runbook §4.5 + §4.6 added; Phase 1 exit-gate banner declared.
Commits <hash-A>, <hash-B>.
- Next action: open Phase 2 — review Phase-1 measurement evidence
(rows slice-1-1 through slice-1-4-soak-8h) to prioritize SLICE-2.1
/ 2.2 / 2.3 / 2.4 ordering. Phase 2 spec to be written.
- Blocked on: nothing
- Last updated: <today's date>
4. docs/reviews/roadmap-progress.md:
- Update the SLICE-1.4 row in the Phase 1 progress table from
"**In Progress** (2026-04-30 / 2026-05-01)" to
"**Completed** (<today's date>)" with a notes column update
citing both row blocks and the commit hashes.
- Append a session-log entry for today:
### <today's date> — TASK-1.4 Pass 3 closed; SLICE-1.4 + Phase 1 done
- Pre-flight commit <hash-A>: appsettings.json
FlakySdk Enabled=true→false (criterion 16 reproducibility);
FlakySdkDecorator timeout-branch falls through to inner per
spec; new regression test.
- Soak8h capture: docs/captures/slice-1-4-soak-8h-<date>.csv
(<span> s, exit 0). Headline: working-set growth = <X> MB
(criterion 12: ≤ 50 MB), gen-2 GC = <Y>, runs.faulted = <Z>,
no unhandled-exception entries.
- ChaosMonkey row + Soak8h row appended to
phase-1-measurements.md; runbook §4.5 + §4.6 added; §4.5+
placeholder replaced with §4.7+ Phase 2 placeholder.
- <test-count> tests pass (note: SimulatedEncoderSourceTests
cadence test may be flaky on busy hosts — pre-existing
SLICE-1.3 issue, not a SLICE-1.4 regression).
- **Phase 1 exit gate met on <today's date>.** Final commit
<hash-B>.
- Add a banner line under the Phase 1 section heading (just below the
existing "Phase 0 exit gate" banner template):
**Phase 1 exit gate:** met on <today's date>, see rows
`slice-1-4-chaos-monkey` and `slice-1-4-soak-8h` of the
measurements table.
5. Stage and commit. The new files are:
- docs/captures/slice-1-4-chaos-monkey-2026-05-01.csv (currently UNTRACKED)
- docs/captures/slice-1-4-soak-8h-<date>.csv (newly created by the capture)
Edited files:
- docs/reviews/phase-1-measurements.md
- docs/runbook/capturing-measurements.md
- CLAUDE.md
- docs/reviews/roadmap-progress.md
git add docs/captures/slice-1-4-chaos-monkey-2026-05-01.csv \
docs/captures/slice-1-4-soak-8h-*.csv \
docs/reviews/phase-1-measurements.md \
docs/runbook/capturing-measurements.md \
CLAUDE.md \
docs/reviews/roadmap-progress.md
Commit message:
"feat(measurements): close SLICE-1.4 and Phase 1; chaos-monkey + 8h soak rows + runbook §4.5/§4.6 (pass 3/3 of TASK-1.4)
Pass 3 final close. Two row blocks appended to phase-1-measurements.md:
slice-1-4-chaos-monkey (30-min, criterion-11 verified by log evidence)
and slice-1-4-soak-8h (8-h, criterion-12 working-set growth = <X> MB ≤ 50).
Runbook §4.5 (ChaosMonkey) + §4.6 (Soak8h) added; §4.5+ placeholder
replaced with §4.7+ Phase-2 placeholder. CLAUDE.md and roadmap-progress
updated; Phase 1 exit-gate banner declared."
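
The growth math called for in the slice-1-4-soak-8h notes can be sketched in PowerShell as follows. This is an illustration under assumptions: `Get-WorkingSetGrowthMb` is a hypothetical helper name, the working-set samples are assumed to be in bytes (if the CSV already reports MB, subtract directly and skip the division), and the values in the usage line are placeholders rather than capture data.

```powershell
# Hypothetical helper: working-set growth in MB from the first-second and
# last-second samples, assuming both are expressed in bytes.
function Get-WorkingSetGrowthMb {
    param(
        [Parameter(Mandatory)][double]$FirstBytes,
        [Parameter(Mandatory)][double]$LastBytes
    )

    # Subtract first, then divide: (last - first) / 1 MB.
    # 1MB is PowerShell's built-in literal for 1 048 576 bytes.
    [math]::Round(($LastBytes - $FirstBytes) / 1MB, 1)
}

# Usage with placeholder values (NOT real capture data):
$growthMb = Get-WorkingSetGrowthMb -FirstBytes 200MB -LastBytes 230MB
$growthMb            # 30
$growthMb -le 50     # True: within the criterion-12 ceiling
```

Parenthesizing the subtraction matters: dividing only the first sample by 1 MB would leave the result in mixed units.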
═══════════════════════════════════════════════════════════════════════════════
CONSTRAINTS
═══════════════════════════════════════════════════════════════════════════════
- Do NOT skip the 8-hour Soak8h capture.
- Do NOT proceed to commit B if criterion 12 (growth ≤ 50 MB) fails — file
the gap as a follow-up and report back instead.
- Do NOT capture the soak with another high-CPU workload running on the host.
- Do NOT discard the existing ChaosMonkey CSV. It is the criterion-11 evidence;
re-running it would consume another 30 min and may yield slightly different
numbers (Random.Shared is not seeded). Reuse the 2026-05-01 capture.
- Do NOT introduce new code or test changes in commit B. All Pass-3 close
work in commit B is docs + CSVs only.
- Commit A and commit B must be separate. Do NOT squash them — commit A's
appsettings flip is the prerequisite for the Soak8h capture in between.
═══════════════════════════════════════════════════════════════════════════════
REPORT FORMAT WHEN FINISHED
═══════════════════════════════════════════════════════════════════════════════
- both commit hashes (A and B)
- the captured Soak8h row block (the 22-metric markdown table) included verbatim
- working-set growth (MB) value with one-sentence interpretation
("Working-set grew by <X> MB across the <span>s soak, satisfying the
criterion-12 ≤ 50 MB ceiling.")
- final test count and any flake notes
- a one-line declaration: "Phase 1 exit gate met on <today's date>."
Operator notes
- Commit A is reversible; the Soak8h capture is not. Land commit A first, verify the build is green and the FlakySdk regression test passes, then start the 8-hour capture. The capture must be uninterrupted on a sleep-disabled host.
- The ChaosMonkey CSV stays as evidence even though `Enabled` flips to `false` in commit A. The row's Notes section explicitly records the Enabled=true capture-time value and how to reproduce. This mirrors how SLICE-1.1's row 0a/0b notes documented their headless-mode caveat.
- If criterion 12 fails, do not paper over it. The slice's whole purpose is leak detection; a >50 MB growth is a real signal worth feeding into Phase 2's prioritization. File a follow-up rather than relaxing the criterion.
- The encoder-cadence test flake (`SimulatedEncoderSourceTests.ProduceAsync_At200Hz_*`) is a SLICE-1.3 follow-up, not a SLICE-1.4 regression. It is documented in commit A's verification step so Copilot does not block on it.