
TASK-1.4 Pass 3 — Close-out prompt

Why this file exists

The original TASK-1.4 Pass 3 prompt (written 2026-04-30 when SLICE-1.4 opened) assumed Pass 3 would run both captures from a clean state. Reality diverged:

  • The ChaosMonkey 30-min capture ran on 2026-05-01 against commit 2108272. CSV: docs/captures/slice-1-4-chaos-monkey-2026-05-01.csv (currently UNTRACKED in git).
  • Four FlaUI / DI fixes were needed to make MultiTagSoakFlaUi survive ChaosMonkey faults (bf32566, 0f1596a, 5462d42, 2108272).
  • The 8-hour Soak8h capture has not run yet.
  • A post-Pass-2 review surfaced two issues that must be addressed in a small pre-flight commit BEFORE the Soak8h capture starts; otherwise the Soak8h capture would have to be re-run.

This file is the close-out prompt that replaces the original Pass 3 prompt. The original Pass 3 prompt remains in TASK-1.4 as the historical record of how the slice was opened.

What ships

Two commits:

  • Commit A — pre-flight fixes (small, single-purpose). Flips Simulator:FlakySdk:Enabled from true to false in appsettings.json so existing rows (slice-1-1, slice-1-2, slice-1-3) reproduce against the merged commit (criterion 16). Fixes FlakySdkDecorator timeout-branch to fall through to inner when the caller's CT was not cancelled, per spec. Adds a regression test for the fall-through path.
  • Commit B — Pass 3 final close (docs + CSVs only). Appends two row blocks to phase-1-measurements.md. Writes runbook §4.5 + §4.6. Updates CLAUDE.md and roadmap-progress.md. Declares the Phase 1 exit-gate banner.

The 8-hour Soak8h capture runs between A and B, against commit A's tree.

Copilot agent prompt

Paste the following block verbatim into the Copilot session.

You are finishing Pass 3 of TASK-1.4 in this repository. Passes 1 and 2 are
merged. Pass 3 is partial: the ChaosMonkey 30-min capture is done
(docs/captures/slice-1-4-chaos-monkey-2026-05-01.csv, currently UNTRACKED in
git), but the 8-hour Soak8h capture has not run, neither row block has
been appended to phase-1-measurements.md, and runbook §4.5 + §4.6 have not
been written.

A prior review surfaced two issues that must be addressed in a small
PRE-FLIGHT commit BEFORE the Soak8h capture starts. Doing them after would
mean re-capturing Soak8h.

This prompt covers both commits.

## Authoritative references

Read these before making changes:
- docs/specs/SLICE-1.4-storm-and-soak-profiles.md   (criteria 11, 12, 14, 16)
- docs/tasks/TASK-1.4-implement-storm-and-soak-profiles.md   (original Pass 3 prompt)
- docs/tasks/TASK-1.4-pass-3-close.md   (this file)
- docs/runbook/capturing-measurements.md   (existing §4.1–§4.4 + §4.5+ placeholder)
- docs/reviews/phase-1-measurements.md   (slice-1-2 + slice-1-3 rows to mirror)
- src/InspectionPrototype.Infrastructure/Simulator/FlakySdkDecorator.cs
- src/InspectionPrototype.App/appsettings.json
- tests/InspectionPrototype.Tests/FlakySdkDecoratorTests.cs
- CLAUDE.md, docs/reviews/roadmap-progress.md
- tools/Capture-Measurements.ps1
- docs/captures/slice-1-4-chaos-monkey-2026-05-01.csv  (the already-captured CSV)

The ChaosMonkey row block details from the captured CSV (for use in commit B):
- CSV path: docs/captures/slice-1-4-chaos-monkey-2026-05-01.csv
- Capture commit: 2108272 (per the 2026-05-01 session-log entry)
- Capture span: 1807 s
- Headline: 491 runs.started, 453 runs.completed, 37 runs.faulted, 37 fault-cycles,
  10 469 frames.ingested, encoder-rate 199.9 Hz both axes, gc-pause-p95 10.28 ms,
  working-set peak 225.3 MB
- Criterion-11 log evidence (verified): 39 fault-injected, 39 fault-cleared,
  37 recovery-completed, 120 defect-shower entries — all four fault branches
  (a/b/c/d from the original Pass 3 prompt) hit

The ChaosMonkey CSV was captured with `Simulator:FlakySdk:Enabled = true` in
appsettings.json. The pre-flight commit flips that to false so future
re-captures of the prior rows (slice-1-1, slice-1-2, slice-1-3) reproduce
within the existing accuracy bounds — criterion 16 of SLICE-1.4. Document
this in the ChaosMonkey row's Notes section: "captured against commit 2108272
with `Simulator:FlakySdk:Enabled` set to true; the merged appsettings ships
with Enabled=false. To reproduce, flip Enabled to true before re-running."

═══════════════════════════════════════════════════════════════════════════════
COMMIT A — Pre-flight fixes (do this BEFORE running Soak8h)
═══════════════════════════════════════════════════════════════════════════════

## Deliverables

1. src/InspectionPrototype.App/appsettings.json:
   In the `Simulator:FlakySdk` block (≈ lines 120-126), change `"Enabled": true`
   to `"Enabled": false`. Leave the three Chance fields and TimeoutHangMs
   unchanged. Operator flips back to true manually for any future
   ChaosMonkey re-capture (this is documented in the new runbook §4.5 in
   commit B).

2. src/InspectionPrototype.Infrastructure/Simulator/FlakySdkDecorator.cs:
   The current timeout-hang branch unconditionally throws OperationCanceledException
   after the hang completes (lines ≈ 70-77). The spec said "fall through to
   inner if not cancelled." Fix:

   Replace:
       if (roll < opts.TimeoutChance)
       {
           _logger.LogWarning("FlakySdk: simulating timeout hang ({HangMs} ms).", opts.TimeoutHangMs);
           await Task.Delay(opts.TimeoutHangMs);     // intentionally ignores caller CT
           cancellationToken.ThrowIfCancellationRequested();
           throw new OperationCanceledException(
               "FlakySdk: timeout hang expired.", cancellationToken);
       }

   With:
       if (roll < opts.TimeoutChance)
       {
           _logger.LogWarning("FlakySdk: simulating timeout hang ({HangMs} ms).", opts.TimeoutHangMs);
           await Task.Delay(opts.TimeoutHangMs);     // intentionally ignores caller CT
           cancellationToken.ThrowIfCancellationRequested();
           // Hang elapsed without cancellation — fall through to the inner call.
           return await _inner.ConnectAsync(cancellationToken);
       }

   This matches the spec's "the decorator awaits a Task.Delay of (caller's CTS
   expected timeout × 2). Fall through to inner if not cancelled."

3. tests/InspectionPrototype.Tests/FlakySdkDecoratorTests.cs:
   The existing test FlakySdk_TimeoutBranch_ThrowsOperationCanceledException
   passes either way because its CTS auto-cancels at 50 ms (< 100 ms hang),
   so ThrowIfCancellationRequested() short-circuits before the fall-through
   path. Add a SECOND test that verifies the fall-through path explicitly:

       [Fact]
       public async Task FlakySdk_TimeoutBranch_WhenNotCancelled_FallsThroughToInner()
       {
           var opts = new FlakySdkOptions
           {
               Enabled       = true,
               TimeoutChance = 1.0,
               TimeoutHangMs = 50            // short hang
           };
           var monitor = new FakeOptionsMonitor<FlakySdkOptions>(opts);
           var inner   = new AlwaysConnected();
           var sut     = new FlakySdkDecorator(inner, monitor, NullLogger<FlakySdkDecorator>.Instance);

           // No cancellation — caller's CT is never cancelled.
           var result = await sut.ConnectAsync(CancellationToken.None);
           Assert.True(result);
       }

   The existing FlakySdk_TimeoutBranch_ThrowsOperationCanceledException test
   stays unchanged — it now exercises the cancellation-during-hang path
   explicitly via the auto-cancelling CT.

4. Verification before committing A:
   dotnet build --configuration Release
   dotnet test --configuration Release

   The encoder-cadence test SimulatedEncoderSourceTests.ProduceAsync_At200Hz_*
   may be flaky on a busy host (it has been observed at 102 samples vs the
   ≥160 floor — Windows-timer-resolution variance, same family as SLICE-1.3's
   amended criterion 7). If it fails, re-run dotnet test once; if it
   consistently fails, file a follow-up but DO NOT block commit A on it.
   That test is unrelated to SLICE-1.4 and is already in the codebase.

5. Commit A:
   git add src/InspectionPrototype.App/appsettings.json \
           src/InspectionPrototype.Infrastructure/Simulator/FlakySdkDecorator.cs \
           tests/InspectionPrototype.Tests/FlakySdkDecoratorTests.cs

   Commit message:
   "fix(sim): default Simulator:FlakySdk:Enabled=false; FlakySdk timeout falls through to inner

   Two pre-Soak8h fixes for SLICE-1.4:
   1. appsettings.json ships FlakySdk Enabled=false so existing rows
      (slice-1-1, slice-1-2, slice-1-3) remain reproducible against the
      merged commit (criterion 16). Operator flips to true before the
      ChaosMonkey capture (documented in runbook §4.5).
   2. FlakySdkDecorator timeout branch falls through to the inner
      connection when the caller's CT was not cancelled during the hang —
      matches the spec's stated semantic. New regression test added."
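
Once commit A is in place, a quick check that the flag flip actually landed can save an 8-hour re-run. The sketch below is a minimal example, assuming appsettings.json is plain JSON without comments and that the `Simulator:FlakySdk` block nests as `Simulator`, then `FlakySdk`, then `Enabled`. The function name `Test-FlakySdkDisabled` is hypothetical and not part of the repository.

```powershell
# Hypothetical helper: returns $true only when Simulator:FlakySdk:Enabled is false.
function Test-FlakySdkDisabled {
    param(
        [Parameter(Mandatory)] [string] $Json   # raw contents of appsettings.json
    )
    $settings = $Json | ConvertFrom-Json
    $enabled  = $settings.Simulator.FlakySdk.Enabled
    if ($null -eq $enabled) {
        throw 'Simulator:FlakySdk:Enabled was not found in the supplied JSON.'
    }
    return (-not [bool]$enabled)
}

# Inline sample mirroring the assumed shape (values are illustrative only).
$sample = '{ "Simulator": { "FlakySdk": { "Enabled": false, "TimeoutHangMs": 100 } } }'
Test-FlakySdkDisabled -Json $sample
```

Against the real file, the call would look like `Test-FlakySdkDisabled -Json (Get-Content src/InspectionPrototype.App/appsettings.json -Raw)`; a `$false` result means the soak should not start yet.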

═══════════════════════════════════════════════════════════════════════════════
SOAK8h CAPTURE (8 hours real-time, between commits A and B)
═══════════════════════════════════════════════════════════════════════════════

Run on a sleep-disabled, hibernate-disabled, dedicated host session. Do NOT
use the host for any other interactive work during the run.

  # Note current values for restoration
  $prevStandby   = (powercfg /query SCHEME_CURRENT SUB_SLEEP STANDBYIDLE | Out-String)
  $prevMonitor   = (powercfg /query SCHEME_CURRENT SUB_VIDEO VIDEOIDLE  | Out-String)
  $prevHibernate = (powercfg /availablesleepstates | Out-String)
  # Disable sleep + hibernate
  powercfg /change standby-timeout-ac 0
  powercfg /change monitor-timeout-ac 0
  powercfg /hibernate off

  # Run the capture against commit A
  $date = Get-Date -Format 'yyyy-MM-dd'
  tools/Capture-Measurements.ps1 -Scenario MultiTagSoak `
    -DurationSeconds 28800 -Profile Soak8h `
    -OutputCsv "docs/captures/slice-1-4-soak-8h-$date.csv" `
    -CommitHash $(git rev-parse --short HEAD) `
    -SliceTag slice-1-4-soak-8h

Verify after the run completes:
  * Capture span ≥ 28 500 s (at most 300 s, about 1%, short of 8 h). If it
    is shorter, the host slept or paused; discard the CSV and restart.
  * working-set growth (MB) ≤ 50 — this is criterion 12, the slice's gate.
  * gen-2-gc-count rate (per hour) ≤ 4× the slice-1-2-real-frame-payloads
    rate (slice-1-2 row shows gen-2 = 2 713 over 600 s = 16 278/hr; under
    Soak8h the rate must be ≤ 65 112/hr).
  * runs.faulted near zero (Soak8h has AlarmBurstEveryMs = 0, so faults
    come only from ConnectionFailureProbability = 0.05 misconnects, which
    are not critical-fault paths).
  * No unhandled-exception entries in Logs/inspection-prototype-*.log.
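
The numeric checks above can be sketched as a small helper that takes plain numbers, so it makes no assumption about the CSV layout. This is a minimal sketch: the name `Test-SoakGate` and its parameters are hypothetical, and the baseline is computed unrounded from the slice-1-2 figures quoted above (2 713 gen-2 GCs over 600 s).

```powershell
# Hypothetical helper: evaluates the post-run checks from plain numbers.
function Test-SoakGate {
    param(
        [Parameter(Mandatory)] [double] $SpanSeconds,        # capture span read from the CSV
        [Parameter(Mandatory)] [double] $FirstWorkingSetMB,  # working set at the first sample
        [Parameter(Mandatory)] [double] $LastWorkingSetMB,   # working set at the last sample
        [Parameter(Mandatory)] [double] $Gen2Count           # gen-2 GC count over the whole span
    )
    # Baseline rate from the slice-1-2 row: 2 713 gen-2 GCs over 600 s.
    $baselinePerHour = 2713 / 600 * 3600
    $growthMB        = $LastWorkingSetMB - $FirstWorkingSetMB
    $gen2PerHour     = $Gen2Count / $SpanSeconds * 3600

    [pscustomobject]@{
        SpanOk      = $SpanSeconds -ge 28500                    # host did not sleep or pause
        GrowthMB    = $growthMB
        GrowthOk    = $growthMB -le 50                          # criterion 12
        Gen2PerHour = [math]::Round($gen2PerHour, 1)
        Gen2Ok      = $gen2PerHour -le (4 * $baselinePerHour)   # 4x ceiling
    }
}

# Illustrative numbers only, not measured values.
Test-SoakGate -SpanSeconds 28800 -FirstWorkingSetMB 200 -LastWorkingSetMB 230 -Gen2Count 100000
```

The inputs still have to be read from the capture CSV by hand or by a separate script; the helper only makes the pass/fail arithmetic explicit.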

If criterion 12 (growth ≤ 50 MB) fails:
  * STOP. Do not paper over it by adjusting the criterion.
  * The slice's design intent is unmet. File a follow-up with the
    growth value, the time-series shape (monotonic? sawtooth?), and any
    suspect counters (Gen-2 runaway, LOH growth, etc).
  * Phase 2 may then open with the leak as its motivating evidence.

After the capture finishes:
  # Restore powercfg
  powercfg /change standby-timeout-ac <previous-minutes>
  powercfg /change monitor-timeout-ac <previous-minutes>
  powercfg /hibernate on   # if it was on before
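
The `<previous-minutes>` values above have to come from the `powercfg /query` text saved before the run. The sketch below shows one way to recover them, assuming the English-locale output format in which the AC value appears as a hexadecimal number of seconds after `Current AC Power Setting Index:`. The function name `ConvertTo-AcTimeoutMinutes` is hypothetical.

```powershell
# Hypothetical helper: extracts the AC timeout (in minutes) from `powercfg /query` output.
function ConvertTo-AcTimeoutMinutes {
    param(
        [Parameter(Mandatory)] [string] $QueryOutput
    )
    # The idle timeouts are reported in seconds as hex; `powercfg /change` expects minutes.
    if ($QueryOutput -match 'Current AC Power Setting Index:\s*0x([0-9a-fA-F]+)') {
        $seconds = [Convert]::ToInt32($Matches[1], 16)
        return [int]($seconds / 60)
    }
    throw 'AC setting index not found; the output format may differ on this host.'
}

# Sample text in the assumed shape (0x708 = 1800 s).
$sample = "    Current AC Power Setting Index: 0x00000708`n    Current DC Power Setting Index: 0x00000384"
ConvertTo-AcTimeoutMinutes -QueryOutput $sample
```

With the variables captured before the run, restoration could then read `powercfg /change standby-timeout-ac (ConvertTo-AcTimeoutMinutes -QueryOutput $prevStandby)`, and likewise for `$prevMonitor`.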

═══════════════════════════════════════════════════════════════════════════════
COMMIT B — Append rows + runbook + close Phase 1
═══════════════════════════════════════════════════════════════════════════════

## Deliverables

1. Append two row blocks to docs/reviews/phase-1-measurements.md.

   Format mirrors the existing slice-1-3-encoder-rate-motion + slice-1-2
   rows (Slice | Metric | Baseline | After | Delta | Source | Date).

   Row "slice-1-4-chaos-monkey":
   - Place AFTER the slice-1-3-encoder-rate-motion row (it is the most
     recent baseline reference).
   - Baseline column = slice-1-3-encoder-rate-motion values for the 20
     overlapping metrics; "—" for working-set growth (MB) and
     fault-cycles (count).
   - 22 rows: existing 20 + working-set growth (MB) + fault-cycles (count).
   - Use the headline numbers above. Compute deltas (after − baseline for
     totals; ratios for rates).
   - Write a "### Notes on slice-1-4-chaos-monkey" subsection covering:
     (a) Why slice-1-3 is the baseline reference.
     (b) Per-fault-branch evidence, citing log-line counts per branch:
         connect-failure (count from 'Connection failed (simulated failure)'
         + 'FlakySdk: out-of-band-throw' + 'Connection error:' lines);
         fault-during-home (count of 'CRITICAL FAULT: [CHAOS-' lines that
         appear within ±2 s of 'Homing started' / 'Homing aborted');
         fault-during-run (count of 'CRITICAL FAULT: [CHAOS-' lines within
         ±2 s of 'Run running' / 'Run loop interrupted');
         fault-clear-and-recover (count of 'Fault condition cleared:
         [CHAOS-' followed by 'Recovery completed.').
         The known totals from the 2026-05-01 session log: 39 injected,
         39 cleared, 37 recovered, 120 defect-shower transitions.
     (c) The Enabled=true caveat: capture was taken with
         `Simulator:FlakySdk:Enabled` flipped to true; the merged
         appsettings.json ships Enabled=false (commit A). Reproducing this
         row requires re-flipping to true before the capture.
     (d) Anything surprising in the capture, e.g., the FlakySdk timeout
         branch's effect (the capture predates the commit-A spec fix, so
         every completed hang ended in OperationCanceledException), the
         ignore-cancellation effect on AppState, etc.

   Row "slice-1-4-soak-8h":
   - Place AFTER the slice-1-4-chaos-monkey row.
   - Baseline column = slice-1-2-real-frame-payloads values for the 18
     overlapping metrics; "—" for the 4 SLICE-1.3+ metrics that
     slice-1-2 predates (encoder-rate-x, encoder-rate-y, working-set
     growth, fault-cycles).
   - 22 rows.
   - Write a "### Notes on slice-1-4-soak-8h" subsection covering:
     (a) Why slice-1-2 is the baseline (continuous-load FlaUI-captured
         row; both slice-1-2 and Soak8h emphasize sustained data-plane
         load with low chaos).
     (b) Working-set first-second value vs last-second value (read these
         from the CSV directly), the growth math ((last − first) / 1 MB),
         and whether it satisfied criterion 12 (≤ 50 MB).
     (c) Gen-2 GC count rate-per-hour vs slice-1-2's rate-per-hour, with
         the 4× ceiling check.
     (d) Per-tag samples.ingested distribution — note any tag whose rate
         dropped by more than the 1% TelemetryDropoutChance predicts
         (use Get-Content + Group-Object on the CSV's tag.name dimension
         column, similar to TASK-1.1 Pass 3's per-tag rate distribution).
     (e) Anything that surprised — working-set sawtooth vs monotonic,
         alloc-rate trend, encoder-rate stability over 8 h, etc.

2. docs/runbook/capturing-measurements.md:

   Replace the existing "### 4.5+ — pending Phase 1 scenarios" placeholder
   section with TWO new sections (§4.5 and §4.6) and a new "### 4.7+ —
   pending Phase 2 scenarios" placeholder.

   §4.5 "Chaos-monkey scenario — SLICE-1.4, `ChaosMonkey` profile":
   - one-paragraph rationale linking back to SLICE-1.4
   - PREREQUISITE: flip `Simulator:FlakySdk:Enabled` from false (default)
     to true in src/InspectionPrototype.App/appsettings.json before
     building. Restore to false after the capture. Note that this means
     the capture is NOT bit-for-bit reproducible against the merged
     commit; the row's Notes section documents this.
   - 30-minute step list mirroring §4.4 with profile = ChaosMonkey
   - sanity checks: runs.started ≥ 5, runs.faulted ≥ 5, fault-cycles
     (count) ≥ 5, frames.dropped recorded, the four log-line branch
     types (a)/(b)/(c)/(d) all present
   - the row block is 22-metric — name working-set growth (MB) and
     fault-cycles (count) and where they come from
   - PowerShell `Select-String` recipe over the inspection-prototype log
     files to count each fault-branch landing — copy-pasteable. Example:
       $log = 'Logs/inspection-prototype-*.log'
       Select-String -Path $log -Pattern 'Connection failed \(simulated failure\)|FlakySdk: out-of-band-throw|Connection error:' | Measure-Object | Select-Object Count
       # Repeat per branch...
   - Implemented by: `MultiTagSoakFlaUi` with `--profile ChaosMonkey`

   §4.6 "Soak scenario — SLICE-1.4, `Soak8h` profile":
   - one-paragraph rationale: leak-detection bar; 8 hours real-time on
     a dedicated session
   - "do not run on a host you also intend to use" warning (bold)
   - prerequisites: sleep and hibernate disabled, screen-saver disabled,
     no other interactive use of the host
   - 8-hour step list — Capture-Measurements.ps1 with -DurationSeconds
     28800 -Profile Soak8h
   - sanity checks: working-set growth (MB) ≤ 50, gen-2-gc-count rate
     within 4× of slice-1-2's rate, no unhandled-exception entries,
     capture span ≥ 28 500 s
   - what to do if interrupted: discard the partial CSV and restart —
     leak math is meaningful only on uninterrupted real-time
   - Implemented by: `MultiTagSoakFlaUi` with `--profile Soak8h`

   §4.7+ "pending Phase 2 scenarios":
   - one line: "Reserved for Phase 2 slices once they open."

3. CLAUDE.md "Current position" block (5 lines):
   - Phase: 1 (Simulator to scale) — **complete** as of <today's date>
   - Last completed action: TASK-1.4 Pass 3 closed. ChaosMonkey row block
     (491 runs.started, 37 runs.faulted, 37 fault-cycles, criterion-11
     verified by log evidence) and Soak8h row block (working-set growth
     <X> MB, criterion 12 met) appended to phase-1-measurements.md;
     runbook §4.5 + §4.6 added; Phase 1 exit-gate banner declared.
     Commits <hash-A>, <hash-B>.
   - Next action: open Phase 2 — review Phase-1 measurement evidence
     (rows slice-1-1 through slice-1-4-soak-8h) to prioritize SLICE-2.1
     / 2.2 / 2.3 / 2.4 ordering. Phase 2 spec to be written.
   - Blocked on: nothing
   - Last updated: <today's date>

4. docs/reviews/roadmap-progress.md:
   - Update the SLICE-1.4 row in the Phase 1 progress table from
     "**In Progress** (2026-04-30 / 2026-05-01)" to
     "**Completed** (<today's date>)" with a notes column update
     citing both row blocks and the commit hashes.
   - Append a session-log entry for today:
       ### <today's date> — TASK-1.4 Pass 3 closed; SLICE-1.4 + Phase 1 done
       - Pre-flight commit <hash-A>: appsettings.json
         FlakySdk Enabled=true→false (criterion 16 reproducibility);
         FlakySdkDecorator timeout-branch falls through to inner per
         spec; new regression test.
       - Soak8h capture: docs/captures/slice-1-4-soak-8h-<date>.csv
         (<span> s, exit 0). Headline: working-set growth = <X> MB
         (criterion 12: ≤ 50 MB), gen-2 GC = <Y>, runs.faulted = <Z>,
         no unhandled-exception entries.
       - ChaosMonkey row + Soak8h row appended to
         phase-1-measurements.md; runbook §4.5 + §4.6 added; §4.5+
         placeholder replaced with §4.7+ Phase 2 placeholder.
       - <test-count> tests pass (note: SimulatedEncoderSourceTests
         cadence test may be flaky on busy hosts — pre-existing
         SLICE-1.3 issue, not a SLICE-1.4 regression).
       - **Phase 1 exit gate met on <today's date>.** Final commit
         <hash-B>.
   - Add a banner line under the Phase 1 section heading (just below the
     existing "Phase 0 exit gate" banner template):
       **Phase 1 exit gate:** met on <today's date>, see rows
       `slice-1-4-chaos-monkey` and `slice-1-4-soak-8h` of the
       measurements table.

5. Stage and commit. The new files are:
   - docs/captures/slice-1-4-chaos-monkey-2026-05-01.csv (currently UNTRACKED)
   - docs/captures/slice-1-4-soak-8h-<date>.csv (newly created by the capture)

   Edited files:
   - docs/reviews/phase-1-measurements.md
   - docs/runbook/capturing-measurements.md
   - CLAUDE.md
   - docs/reviews/roadmap-progress.md

   git add docs/captures/slice-1-4-chaos-monkey-2026-05-01.csv \
           docs/captures/slice-1-4-soak-8h-*.csv \
           docs/reviews/phase-1-measurements.md \
           docs/runbook/capturing-measurements.md \
           CLAUDE.md \
           docs/reviews/roadmap-progress.md

   Commit message:
   "feat(measurements): close SLICE-1.4 and Phase 1; chaos-monkey + 8h soak rows + runbook §4.5/§4.6 (pass 3/3 of TASK-1.4)

   Pass 3 final close. Two row blocks appended to phase-1-measurements.md:
   slice-1-4-chaos-monkey (30-min, criterion-11 verified by log evidence)
   and slice-1-4-soak-8h (8-h, criterion-12 working-set growth = <X> MB ≤ 50).
   Runbook §4.5 (ChaosMonkey) + §4.6 (Soak8h) added; §4.5+ placeholder
   replaced with §4.7+ Phase-2 placeholder. CLAUDE.md and roadmap-progress
   updated; Phase 1 exit-gate banner declared."
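
For the per-branch log evidence requested in the ChaosMonkey Notes and in runbook §4.5, the counting step can be sketched as below. This is a minimal sketch using only the literal log markers quoted in this prompt; the function name `Get-FaultMarkerCounts` and the sample lines are hypothetical. It does not attempt the ±2 s proximity classification for the fault-during-home and fault-during-run branches, because that depends on the log's timestamp format, which is not stated here.

```powershell
# Hypothetical helper: counts log lines that contain each literal marker.
function Get-FaultMarkerCounts {
    param(
        [string[]] $Lines = @()
    )
    $markers = [ordered]@{
        'connect-failure'    = @('Connection failed (simulated failure)',
                                 'FlakySdk: out-of-band-throw',
                                 'Connection error:')
        'critical-fault'     = @('CRITICAL FAULT: [CHAOS-')
        'fault-cleared'      = @('Fault condition cleared: [CHAOS-')
        'recovery-completed' = @('Recovery completed.')
    }
    $counts = [ordered]@{}
    foreach ($name in $markers.Keys) {
        # -SimpleMatch treats the markers as literal text, so no regex escaping is needed.
        $counts[$name] = @($Lines | Select-String -SimpleMatch -Pattern $markers[$name]).Count
    }
    [pscustomobject]$counts
}

# Made-up sample lines for illustration; real log lines will carry more context.
$sample = @(
    'Connection failed (simulated failure)',
    'Connection error: socket closed',
    'CRITICAL FAULT: [CHAOS-1] injected',
    'Fault condition cleared: [CHAOS-1]',
    'Recovery completed.',
    'Run running'
)
Get-FaultMarkerCounts -Lines $sample
```

Against real logs the input would be something like `Get-Content Logs/inspection-prototype-*.log`. A line that matches several markers of the same group is counted once for that group.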

═══════════════════════════════════════════════════════════════════════════════
CONSTRAINTS
═══════════════════════════════════════════════════════════════════════════════

- Do NOT skip the 8-hour Soak8h capture.
- Do NOT proceed to commit B if criterion 12 (growth ≤ 50 MB) fails — file
  the gap as a follow-up and report back instead.
- Do NOT capture the soak with another high-CPU workload running on the host.
- Do NOT discard the existing ChaosMonkey CSV. It is the criterion-11 evidence;
  re-running it would consume another 30 min and may yield slightly different
  numbers (Random.Shared is not seeded). Reuse the 2026-05-01 capture.
- Do NOT introduce new code or test changes in commit B. All Pass-3 close
  work in commit B is docs + CSVs only.
- Commit A and commit B must be separate. Do NOT squash them — commit A's
  appsettings flip is the prerequisite for the Soak8h capture in between.

═══════════════════════════════════════════════════════════════════════════════
REPORT FORMAT WHEN FINISHED
═══════════════════════════════════════════════════════════════════════════════

- both commit hashes (A and B)
- the captured Soak8h row block (the 22-metric markdown table) included verbatim
- working-set growth (MB) value with one-sentence interpretation
  ("Working-set grew by <X> MB across the <span>s soak, satisfying the
  criterion-12 ≤ 50 MB ceiling.")
- final test count and any flake notes
- a one-line declaration: "Phase 1 exit gate met on <today's date>."

Operator notes

  • Commit A is reversible; the Soak8h capture is not. Land commit A first, verify the build is green and the FlakySdk regression test passes, then start the 8-hour capture. The capture must be uninterrupted on a sleep-disabled host.
  • The ChaosMonkey CSV stays as evidence even though Enabled flips to false in commit A. The row's Notes section explicitly records the Enabled=true capture-time value and how to reproduce. This mirrors how SLICE-1.1's row 0a/0b notes documented their headless-mode caveat.
  • If criterion 12 fails, do not paper over it. The slice's whole purpose is leak detection; a >50 MB growth is a real signal worth feeding into Phase 2's prioritization. File a follow-up rather than amending the criterion downward.
  • The encoder-cadence test flake (SimulatedEncoderSourceTests.ProduceAsync_At200Hz_*) is a SLICE-1.3 follow-up, not a SLICE-1.4 regression. It is documented in commit A's verification step so Copilot does not block on it.
