Capturing Measurements Runbook
This runbook covers how to capture the before/after numbers that live in the phase-1-measurements table. Every Phase 1 slice's exit gate — and the baseline "row 0" that precedes Phase 1 — is captured with this procedure.
Read this cold, end to end, the first time. Subsequent captures should take about 15 minutes (10 minutes of scenario + 5 minutes of CSV extraction and table editing).
1. When to use this runbook
Use this procedure when any of the following is true:
- you need to capture the demo baseline (row 0 of the measurements table)
- you are about to start work on a Phase 1 or Phase 2 slice and need the before numbers
- you have just finished a Phase 1 or Phase 2 slice and need the after numbers for the exit gate
- something about the system's behavior under load surprised you, and you want a reproducible capture you can point to in a discussion
Do not use this procedure for debugging a single operator session — the observability runbook covers live counters and log reading.
2. Prerequisites
Check these once per machine:
- dotnet-counters installed globally: dotnet tool install -g dotnet-counters
- TASK-006 (observability baseline) Pass 3 merged — without the InspectionPrototype meter, there is nothing to collect beyond System.Runtime.
- The docs/captures/ directory exists (create it if not; it is committed).
- docs/reviews/phase-1-measurements.md exists.
Check these at the start of every capture:
- Working tree is clean and on the commit you intend to measure. A capture that cannot be tied to a specific commit hash is not useful evidence.
- No other dotnet-counters session is attached to the same process.
3. The capture procedure
Three terminals. Sequence matters.
Terminal 1 — launch the app
dotnet build --configuration Release
dotnet run --configuration Release --project src/InspectionPrototype.App --no-build
Wait until the main window appears and settles (about 2 seconds after launch).
Terminal 2 — start the CSV collector
dotnet-counters ps
# note the PID row for InspectionPrototype.App
dotnet-counters collect \
--name InspectionPrototype.App \
--counters InspectionPrototype,System.Runtime \
--format csv \
--output docs/captures/<capture-name>.csv \
--refresh-interval 1
<capture-name> convention: <row-tag>-<yyyy-MM-dd>.csv, for example demo-baseline-2026-04-22.csv or slice-1-1-frame-payloads-after-2026-05-03.csv.
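collect runs until you press Ctrl+C. It writes one CSV row per counter per refresh interval (every second here), with the columns Timestamp, Provider, Counter Name, Counter Type, and Mean/Increment.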
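The naming convention above can be sketched as a small helper. This is a minimal Python illustration, not part of the repo tooling; the function name `capture_name` and the no-whitespace rule are assumptions made for the example.

```python
from datetime import date


def capture_name(row_tag: str, day: date) -> str:
    """Build '<row-tag>-<yyyy-MM-dd>.csv' from a row tag and a capture date."""
    # Assumption: row tags are non-empty and contain no whitespace.
    if not row_tag or any(ch.isspace() for ch in row_tag):
        raise ValueError("row tag must be non-empty and contain no whitespace")
    # date.isoformat() yields yyyy-MM-dd, matching the convention.
    return f"{row_tag}-{day.isoformat()}.csv"


print(capture_name("demo-baseline", date(2026, 4, 22)))
```

Generating the name rather than typing it keeps the date format consistent across captures, which makes the files in docs/captures/ sort predictably.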
Terminal 3 — follow the scenario script
Pick the scenario for the capture you are running. The demo-baseline scenario lives in section 4 of this runbook. Every Phase 1 slice adds its own scenario stub to that section once it lands.
Finishing cleanly
When the scenario completes:
- Ctrl+C Terminal 2 (stops collection, flushes the CSV file).
- Close the app cleanly in Terminal 1 (do not use taskkill /F — a forced kill skips OnExit and can leave the single-instance mutex briefly in limbo; the OS cleans it up, but the log trail is cleaner with a graceful exit).
- Commit the CSV file and the measurements-table edit in a single commit. CSV files are evidence; they belong in git.
3a. Disable system sleep before any capture
A captured CSV records timestamps at wall-clock granularity. If the machine sleeps mid-capture, the process pauses with it, but the CSV span still grows to include the wall-time gap — diluting every rate metric by the sleep duration. The row 0a / slice-1-1 notes in phase-1-measurements.md describe the failure mode after a 63-min mid-capture sleep stretched a 30-min scenario into a 96-min CSV.
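The dilution effect is plain arithmetic and can be sketched in a few lines of Python. The 20 Hz input below is an illustrative rate, not a measured value; the 33 active minutes follow from the 96-minute CSV minus the 63-minute sleep described above. The function name is hypothetical.

```python
def diluted_rate(true_rate_hz: float, active_s: float, csv_span_s: float) -> float:
    """Rate obtained when total events are divided by the full CSV span.

    Events only accumulate while the process is awake (active_s), but the
    CSV span also includes any time the machine spent asleep.
    """
    if csv_span_s <= 0 or not 0 <= active_s <= csv_span_s:
        raise ValueError("need 0 <= active_s <= csv_span_s and csv_span_s > 0")
    return true_rate_hz * active_s / csv_span_s


# 96-minute CSV containing a 63-minute sleep: only 33 minutes were active.
print(diluted_rate(20.0, active_s=33 * 60, csv_span_s=96 * 60))
```

Under these assumptions a counter that really ran at 20 Hz would be reported at roughly a third of that, which is why a capture with a sleep gap cannot be corrected after the fact and has to be re-run.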
Before any capture longer than a few minutes:
# Disable AC sleep timeout for the capture session.
powercfg /change standby-timeout-ac 0
# Optionally also disable display sleep so the dotnet-counters terminal stays alive.
powercfg /change monitor-timeout-ac 0
# After the capture, restore your usual settings (or run the same commands
# with your preferred minute values).
Note on the retired automated capture rig. TASK-1.5 / TASK-1.5.1 added a --scenario / Capture-Measurements.ps1 headless-capture path that drove IOperatorCommands directly without rendering the UI. That rig was retired in favor of a UI-Automation-driven approach (planned SLICE-1.6, FlaUI); rows 0a, 0b, and slice-1-1-multi-tag-telemetry were captured under it and remain valid historical evidence. The scenario classes, ScenarioRunner, Capture-Measurements.ps1, and the --scenario CLI flag have all been removed; the manual §3 procedure is the supported capture path until the FlaUI rig lands. The CSV-math helpers in tools/MeasurementExtraction.psm1 (ConvertTo-MeasurementRow, Get-GcPauseP95, Get-LohAllocRateAvg) are kept and are the recommended way to extract a row from a manually-collected CSV.
3b. Automated capture with the FlaUI rig (SLICE-1.6)
An earlier headless rig (SLICE-1.5) drove ICommand instances directly without rendering the UI; that rig was retired 2026-04-27 in favour of this UI-Automation approach.
SLICE-1.6 ships a tools/Capture-Measurements.ps1 orchestrator that drives the real WPF main window via FlaUI/UIA3. This exercises the full XAML-binding path — exactly what a real operator would do — while automating away the stopwatch, button-clicking, and CSV extraction steps. Every registered scenario takes about DurationSeconds + 40 s wall-clock time and exits with code 0 on success or 1 on any scenario or extraction failure.
Invocation:
cd <repo root>
tools\Capture-Measurements.ps1 `
-Scenario <name> `
-DurationSeconds <secs> `
-OutputCsv docs/captures/<name>-<date>.csv `
-CommitHash (git rev-parse --short HEAD) `
[-Profile <profileName>] `
[-SliceTag <tag>] `
[-AppendToTable]
-AppendToTable appends the 18-metric markdown block to docs/reviews/phase-1-measurements.md under the ## Phase 1 rows heading automatically. Without it, the block is printed to stdout and must be pasted manually.
FlaUI prerequisites
- Foreground window required. FlaUI drives the real WPF window. The app must be visible on the primary display — not minimised, not covered by a full-screen overlay. The orchestrator launches the app visibly (-NoNewWindow / no -WindowStyle Hidden); do not move it to the background.
- No screen lock during capture. A locked workstation pauses the WPF dispatcher, which pauses the scenario. The powercfg commands above eliminate the automatic lock; do not lock manually.
- Display scaling = 100% recommended. FlaUI element hit-testing uses logical pixels. Non-100% display scaling can shift coordinates under some controls. If a click misses its target, setting scaling to 100% usually fixes it.
Registered scenarios
| -Scenario value | FlaUI test class | Default -DurationSeconds | Profile flag needed |
|---|---|---|---|
| DemoBaseline | DemoBaselineFlaUi | 600 | none (Normal) |
| MultiTagSoak | MultiTagSoakFlaUi | 1800 | none (MultiTag) |
| MultiTagSoak | MultiTagSoakFlaUi | 600 | -Profile HighFrameRate for SLICE-1.2 |
| MultiTagSoak | MultiTagSoakFlaUi | 600 | -Profile EncoderRate for SLICE-1.3 |
| MultiTagSoak | MultiTagSoakFlaUi | 1800 | -Profile ChaosMonkey for SLICE-1.4 (requires FlakySdk:Enabled=true — see §4.5) |
| MultiTagSoak | MultiTagSoakFlaUi | 28800 | -Profile Soak8h for SLICE-1.4 (dedicated session — see §4.6) |
| MultiTagSoak | MultiTagSoakFlaUi | 1800 | -Profile HighDefect for SLICE-3.1 (see §5.4) |
When to fall back to manual §3
Use the manual §3 procedure instead of the FlaUI rig when:
- doing a quick exploratory capture shorter than 60 s (manual setup is faster than the rig's 40 s overhead for very short windows)
- on a machine that does not have the full dev environment cloned (dotnet test and the AcceptanceTests project are required by the orchestrator)
- troubleshooting a FlaUI element-not-found failure — run the app interactively, use Inspect.exe or Accessibility Insights to verify the AutomationId is present, then re-run the rig
4. Scenarios
Each capture runs against a named scenario so that captures are comparable across commits. A scenario is a fixed sequence of operator actions plus timing that the app is driven through. Scenarios are not optional — an undefined sequence produces unreproducible numbers.
4.1 Demo baseline (row 0)
Used for the reference row that precedes Phase 1. Captures the current simulator at current rates with no architectural changes.
Duration: 10 minutes (stopwatch; the last Start click must land before 10:00)
Preconditions: app just launched; no prior state; simulator profile "Normal"
Note: Start Run is disabled after a run reaches Completed (CommandGuards.CanStart requires Idle | Ready). Home returns the workflow to Ready.
Steps:
1. Click Connect. Wait until status = Connected.
2. In the Recipe panel, click Refresh. Wait until catalog populates.
3. Select the recipe "standard-5pt-wafer-scan".
4. Click Load Recipe. Wait until recipe is loaded.
5. Click Home. Wait until homing completes.
6. Click Start Run. Wait until the run completes.
7. Click Home. Wait until homing completes.
8. Repeat steps 6-7 continuously until the 10:00 mark on the stopwatch.
9. If a run is in progress at 10:00, click Stop; otherwise skip.
10. Click Disconnect.
Do not: change the simulator profile mid-capture; open unrelated UI panes;
minimize the window.
At current simulator rates this produces 20–40 completed runs in the 10-minute window, enough volume to produce meaningful counter totals.
4.2 Multi-tag soak — slice-1.1, MultiTag profile
Used for the slice-1-1-multi-tag-telemetry row of the measurements table. Drives the new tag stream (50 tags, 1–500 Hz) under the MultiTag simulator profile so we can measure per-tag emit rates and the snapshot-pipeline overhead introduced by SLICE-1.1.
Additional prerequisites (beyond §2):
- on a build with the per-tag metrics wired (TASK-1.1 Pass 3 or later — the build must expose samples.ingested and samples.coalesced on the InspectionPrototype meter)
- the seed appsettings.json contains exactly 50 entries under Simulator:Tags, including the reserved names temperature.celsius and pressure.bar
- a MultiTag profile is present in Simulator:Profiles (built-in fallback in InfrastructureServiceCollectionExtensions if absent from appsettings.json)
Duration: 30 minutes (stopwatch; the last Start click must land before 30:00)
Preconditions: app just launched; no prior state; simulator profile MultiTag
Note: Start Run is disabled after a run reaches Completed (CommandGuards.CanStart requires Idle | Ready). Home returns the workflow to Ready.
Steps:
1. In the Simulator Profile selector, switch to "MultiTag" BEFORE Connect.
2. Click Connect. Wait until status = Connected.
3. In the Recipe panel, click Refresh. Wait until catalog populates.
4. Select the recipe "standard-5pt-wafer-scan".
5. Click Load Recipe. Wait until recipe is loaded.
6. Click Home. Wait until homing completes.
7. Click Start Run. Wait until the run completes.
8. Click Home. Wait until homing completes.
9. Repeat steps 7-8 continuously until the 30:00 mark on the stopwatch.
10. If a run is in progress at 30:00, click Stop; otherwise skip.
11. Click Disconnect.
Do not: change the simulator profile mid-capture; edit Simulator:Tags;
open unrelated UI panes; minimize the window.
Capture command (Terminal 2):
dotnet-counters collect \
--name InspectionPrototype.App \
--counters InspectionPrototype,System.Runtime \
--format csv \
--output docs/captures/slice-1-1-multi-tag-<YYYY-MM-DD>.csv \
--refresh-interval 1
Start collection ~30 seconds before clicking Connect (warm-up) and stop it ~60 seconds after Disconnect (cool-down). The 16-metric row in phase-1-measurements.md is computed across the full CSV duration, the same convention as row 0; the per-tag rate check below uses the same window.
Sanity check before extracting numbers:
After Ctrl+C on the collector, eyeball the CSV:
- tags.active should read 50 for the steady-state portion of the run. If it reads 0, the producer started with an empty tag registry — appsettings.json did not load. Re-launch from a working directory that contains appsettings.json and re-capture.
- telemetry.ingested (Count / 1 sec) should read ~20 Hz steady-state (the MultiTag profile publishes snapshots every 50 ms). If it reads ~5, the active profile was Normal, not MultiTag — re-capture after switching the profile selector.
- samples.ingested (Count / 1 sec)[tag.name=…] rows should appear for at least 50 distinct tag names. If none appear, the per-tag metric wiring is not in the running build (Pass 3 not yet merged or built).
If any of these three checks fails, the capture is not measuring what SLICE-1.1 cares about. Throw it out and re-run.
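The three eyeball checks above could also be scripted. The following is a minimal Python sketch, assuming the CSV has the `Counter Name` and `Mean/Increment` columns used by the PowerShell script in this section. The function names, the use of the median as a stand-in for "steady-state", and the ±25% tolerance on the telemetry rate are assumptions made for the example.

```python
import csv
import statistics


def load_rows(csv_path):
    """Read a dotnet-counters CSV into a list of dicts (one per CSV row)."""
    with open(csv_path, newline="") as handle:
        return list(csv.DictReader(handle))


def sanity_check(rows, expected_tags=50, expected_hz=20.0, hz_tolerance=0.25):
    """Return a list of problems; an empty list means all three checks passed."""

    def values(prefix):
        return [float(r["Mean/Increment"]) for r in rows
                if r["Counter Name"].startswith(prefix)]

    problems = []

    # Check 1: tags.active should sit at the configured tag count.
    tags_active = values("tags.active")
    if not tags_active or statistics.median(tags_active) != expected_tags:
        problems.append("tags.active is not steady at the expected tag count")

    # Check 2: telemetry.ingested should be near the snapshot rate.
    telemetry = values("telemetry.ingested")
    if not telemetry or abs(statistics.median(telemetry) - expected_hz) > hz_tolerance * expected_hz:
        problems.append("telemetry.ingested is not near the expected snapshot rate")

    # Check 3: one samples.ingested series per tag name.
    tag_names = set()
    for r in rows:
        name = r["Counter Name"]
        if name.startswith("samples.ingested") and "[tag.name=" in name:
            tag_names.add(name.split("[tag.name=", 1)[1].rstrip("]"))
    if len(tag_names) < expected_tags:
        problems.append(f"only {len(tag_names)} distinct tag names in samples.ingested")

    return problems
```

A typical use would be `sanity_check(load_rows("docs/captures/<capture-name>.csv"))`; any non-empty result means the capture should be discarded and re-run, as described above.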
Per-tag rate-error post-processing (PowerShell):
This is the verifier for SLICE-1.1 acceptance criterion 7 (every tag's observed rate within ±2% of configured IntervalMs). It groups samples.ingested rows by their tag.name dimension, computes the observed Hz against Simulator:Tags[i].IntervalMs, and exits non-zero if any tag is out of bounds or missing.
$csvPath = 'docs/captures/slice-1-1-multi-tag-<YYYY-MM-DD>.csv'
$cfgPath = 'src/InspectionPrototype.App/appsettings.json'
# Load expected per-tag IntervalMs from config (strip JSON comments first).
$cfgRaw = Get-Content $cfgPath -Raw
$cfgRaw = $cfgRaw -replace '/\*[\s\S]*?\*/','' -replace '(?m)//.*$',''
$cfg = $cfgRaw | ConvertFrom-Json
$expected = @{}
foreach ($t in $cfg.Simulator.Tags) { $expected[$t.Name] = [double]$t.IntervalMs }
$csv = Import-Csv $csvPath
$first = [datetime]::Parse($csv[0].Timestamp)
$last = [datetime]::Parse($csv[-1].Timestamp)
$durSec = ($last - $first).TotalSeconds
# Counter Name shape: 'samples.ingested (Count / 1 sec)[tag.name=foo.bar]'
$pattern = '^samples\.ingested.*\[tag\.name=(?<tag>[^\]]+)\]$'
$samples = $csv | Where-Object { $_.'Counter Name' -match $pattern } |
ForEach-Object {
$null = $_.'Counter Name' -match $pattern
[pscustomobject]@{ Tag = $Matches.tag; Inc = [double]$_.'Mean/Increment' }
}
$rows = New-Object System.Collections.Generic.List[object]
$missing = New-Object System.Collections.Generic.List[string]
$worst = 0.0
foreach ($name in $expected.Keys) {
$expHz = 1000.0 / $expected[$name]
$hits = $samples | Where-Object { $_.Tag -eq $name }
if (-not $hits) { $missing.Add($name); continue }
$actHz = (($hits | Measure-Object -Property Inc -Sum).Sum) / $durSec
$err = 100.0 * ($actHz - $expHz) / $expHz
$rows.Add([pscustomobject]@{
Tag = $name
ExpectedHz = [math]::Round($expHz, 2)
ActualHz = [math]::Round($actHz, 2)
ErrorPct = [math]::Round($err, 2)
})
if ([math]::Abs($err) -gt [math]::Abs($worst)) { $worst = $err }
}
$rows | Sort-Object { [math]::Abs($_.ErrorPct) } -Descending |
Format-Table -AutoSize | Out-String | Tee-Object -FilePath `
"docs/captures/slice-1-1-multi-tag-$(Get-Date -Format 'yyyy-MM-dd')-rate-check.txt"
if ($missing.Count -gt 0) {
Write-Error "MISSING samples.ingested for $($missing.Count) tag(s): $($missing -join ', ')"
exit 2
}
if ([math]::Abs($worst) -gt 2.0) {
Write-Error "FAIL: max per-tag rate error $([math]::Round($worst,2))% exceeds ±2%."
exit 1
}
"PASS: max per-tag rate error $([math]::Round($worst,2))%."
Commit the rate-check .txt next to the CSV. The 16-metric row goes through the §5 PowerShell extraction script unchanged — telemetry.ingested rate (Hz) will read at the snapshot rate (~20 Hz under MultiTag), not the per-tag rate; per-tag totals live in samples.ingested only and do not appear in the 16-metric table by design.
4.3 — Real frame payloads (SLICE-1.2, 30 fps × 2 MP, HighFrameRate profile)
Scenario: MultiTagSoak with the HighFrameRate simulator profile, 10 minutes.
Use the §3b FlaUI rig:
.\tools\Capture-Measurements.ps1 `
-Scenario MultiTagSoak `
-Profile HighFrameRate `
-DurationSeconds 600 `
-OutputCsv docs/captures/slice-1-2-high-fps-YYYY-MM-DD.csv `
-SliceTag slice-1-2-real-frame-payloads `
-CommitHash $(git rev-parse --short HEAD) `
-AppendToTable
Profile: HighFrameRate — 2048 × 1024 × 1 byte, 33 ms frame interval (≈ 30 fps), 50 ms telemetry interval.
Expected LOH activity: gen-2 GC count > 0; LOH-alloc-rate-avg ≈ 1 MB/s (averaged over the full 600 s capture); alloc-rate ≈ 300× higher than Normal-profile baseline.
frames.ingested note: SimulatedCamera streams only while a run is actively executing (Connected + Running state). The criterion (≥ 17 500) assumes continuous streaming; with the multi-cycle MultiTagSoak scenario, the observed count will be lower (~8 000–12 000 depending on run count). See the criterion-scope clarification note in docs/reviews/phase-1-measurements.md row slice-1-2-real-frame-payloads.
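The frame-count figures above follow from the profile parameters. Here is a short Python worked example, assuming the 33 ms interval holds exactly; the variable and function names are illustrative only. The 85,000-byte figure is the .NET large-object-heap threshold, which each 2 MiB frame buffer exceeds.

```python
WIDTH, HEIGHT, BYTES_PER_PIXEL = 2048, 1024, 1
FRAME_INTERVAL_MS = 33
CAPTURE_S = 600
CRITERION_FRAMES = 17_500
LOH_THRESHOLD_BYTES = 85_000  # .NET allocates objects at or above this size on the LOH

frame_bytes = WIDTH * HEIGHT * BYTES_PER_PIXEL
max_frames = CAPTURE_S * 1000 // FRAME_INTERVAL_MS  # upper bound with continuous streaming


def streaming_fraction(observed_frames: int) -> float:
    """Share of the capture window during which the camera was streaming."""
    return observed_frames / max_frames


print(frame_bytes, frame_bytes >= LOH_THRESHOLD_BYTES)
print(max_frames, CRITERION_FRAMES <= max_frames)
print(round(streaming_fraction(8_000), 2), round(streaming_fraction(12_000), 2))
```

Read this way, an observed count of 8 000 to 12 000 frames corresponds to the camera streaming for somewhat under half to about two thirds of the window, which is consistent with a scenario that alternates runs with homing.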
4.4 Encoder-rate motion — SLICE-1.3, EncoderRate profile
Scenario: MultiTagSoak with the EncoderRate simulator profile, 10 minutes.
Use the §3b FlaUI rig:
.\tools\Capture-Measurements.ps1 `
-Scenario MultiTagSoak `
-Profile EncoderRate `
-DurationSeconds 600 `
-OutputCsv docs/captures/slice-1-3-encoder-rate-YYYY-MM-DD.csv `
-SliceTag slice-1-3-encoder-rate-motion `
-CommitHash $(git rev-parse --short HEAD) `
-AppendToTable
Profile: EncoderRate — mirrors MultiTag (640 × 480 × 1 frames @ 500 ms, 50 ms telemetry snapshot, 50 tags) but adds EncoderIntervalMs = 1 so SimulatedEncoderSource ticks at 1 ms nominal cadence. The producer acquires winmm!timeBeginPeriod(1) on StartAsync to lift the Windows timer-resolution floor from ~15.6 ms to 1 ms.
Architectural design under test: the encoder stream is drained by EncoderStreamPipelineService and emits per-axis metrics, but does not write to AppState. The MultiTagSoakFlaUi scenario continues unchanged; the encoder stream runs as a background IHostedService for the lifetime of the host. A passing capture is one where runs.faulted = 0, frames.dropped = 0, tags.active = 50 even with the encoder producer ticking at 1 ms.
Sanity checks before extracting numbers:
- tags.active reads 50 (regression check from the TASK-1.1 workdir bug).
- frames.dropped (Count / 1 sec) is absent or zero (the encoder stream must not starve the frame pipeline).
- runs.faulted (Count / 1 sec) is absent or zero (no encoder-pipeline-caused faults).
- The 20-metric row block (printed by ConvertTo-MeasurementRow) includes encoder-rate-x and encoder-rate-y rows — both should be in the hundreds-of-Hz range. Exact target is documented, not gated (see SLICE-1.3 criterion-7 amendment); a typical PeriodicTimer + timeBeginPeriod(1) combination on a default Windows host lands in the 600–800 Hz range.
System-wide effect — timeBeginPeriod(1). The producer raises the Windows multimedia-timer resolution while the app is running. This is a process-issued, system-wide effect: other processes on the host see the same elevated timer resolution until the app exits. On a dedicated capture machine this is invisible; on a shared/dev workstation, latency-sensitive applications (audio, real-time video) may behave differently while the capture is running. Prefer a dedicated session.
When the receiver rate falls below ~500 Hz: the most likely cause is that AcquireOrFallback returned the no-op disposable instead of the real WinMmTimePeriod. Check the app's startup log (%LOCALAPPDATA%\InspectionPrototype\logs\) for a warning containing timeBeginPeriod — if present, the P/Invoke failed (non-Windows host, sandbox restriction, or winmm.dll absent) and the producer ran at the default ~15.6 ms tick (~64 Hz cap). The capture is still architecturally valid evidence of the stream-bypass design but the rate row should be flagged.
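The ~64 Hz cap and the ~500 Hz flagging threshold can be turned into a small triage helper. This is a hedged Python sketch; the function names and the 10% margin around the fallback cap are assumptions made for the example, not values from the codebase.

```python
DEFAULT_TIMER_RESOLUTION_MS = 15.6  # default Windows tick, per the text above
FLAG_BELOW_HZ = 500.0


def tick_cap_hz(timer_resolution_ms: float) -> float:
    """Highest tick rate a timer can deliver at a given resolution."""
    return 1000.0 / timer_resolution_ms


def classify_encoder_rate(observed_hz: float) -> str:
    """Triage an encoder-rate row: 'ok', 'flag', or 'fallback-timer'."""
    fallback_cap = tick_cap_hz(DEFAULT_TIMER_RESOLUTION_MS)
    # Assumption: anything within 10% of the fallback cap means the
    # timeBeginPeriod request did not take effect.
    if observed_hz <= fallback_cap * 1.1:
        return "fallback-timer"
    if observed_hz < FLAG_BELOW_HZ:
        return "flag"
    return "ok"


print(round(tick_cap_hz(DEFAULT_TIMER_RESOLUTION_MS), 1))
print(classify_encoder_rate(64.0), classify_encoder_rate(300.0), classify_encoder_rate(700.0))
```

A "fallback-timer" result points at the startup-log check described above; a "flag" result means the rate is low for some other reason and the row should carry a note.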
Implemented by: MultiTagSoakFlaUi with -Profile EncoderRate (no new IScenario or new FlaUI test class — the existing scenario reads SIMULATOR_PROFILE env var and the orchestrator wires -Profile through).
4.5 Chaos-monkey scenario — SLICE-1.4, ChaosMonkey profile
SLICE-1.4 criterion-11 evidence capture. Drives all four fault branches in WorkflowService (connect-failure, fault-during-home, fault-during-run, fault-clear-and-recover) under aggressive chaos settings so that log inspection can confirm branch coverage.
PREREQUISITE — flip FlakySdk:Enabled before building. The merged appsettings.json ships Simulator:FlakySdk:Enabled = false so that earlier rows reproduce against the current build. Before running this scenario, open src/InspectionPrototype.App/appsettings.json, change "Enabled": false to "Enabled": true in the Simulator:FlakySdk block, then build. Restore to false and rebuild after the capture. The row's Notes section documents this so the capture remains interpretable.
# After flipping Enabled=true and rebuilding:
$date = Get-Date -Format 'yyyy-MM-dd'
.\tools\Capture-Measurements.ps1 `
-Scenario MultiTagSoak `
-Profile ChaosMonkey `
-DurationSeconds 1800 `
-OutputCsv "docs/captures/slice-1-4-chaos-monkey-$date.csv" `
-SliceTag slice-1-4-chaos-monkey `
-CommitHash (git rev-parse --short HEAD) `
-AppendToTable
# Then restore Enabled=false and rebuild.
Sanity checks before recording the row:
- runs.started ≥ 5, runs.faulted ≥ 5, fault-cycles (count) ≥ 5 (the chaos knobs are aggressive enough that these thresholds are met in any 30-min window with AlarmBurstEveryMs = 45 000).
- frames.dropped recorded (absent means zero — valid).
- All four fault-branch log-line types present (use the Select-String recipe below to count each).
PowerShell log inspection — fault-branch coverage:
# Replace the log filename with today's date.
$log = "Logs\inspection-prototype-$(Get-Date -Format 'yyyyMMdd').log"
# Branch (a) — connect-failure paths
Select-String -Path $log -Pattern 'Connection failed \(simulated failure\)|FlakySdk: out-of-band|Connection error:' |
Measure-Object | Select-Object -ExpandProperty Count
# Branch (b)/(c) — critical fault injection total
Select-String -Path $log -Pattern 'CRITICAL FAULT: \[CHAOS-' |
Measure-Object | Select-Object -ExpandProperty Count
# Branch (d) — fault cleared + recovery
Select-String -Path $log -Pattern 'Fault condition cleared: \[CHAOS-' |
Measure-Object | Select-Object -ExpandProperty Count
Select-String -Path $log -Pattern 'Recovery completed\.' |
Measure-Object | Select-Object -ExpandProperty Count
# Defect-shower transitions
Select-String -Path $log -Pattern 'DefectShower' |
Measure-Object | Select-Object -ExpandProperty Count
All four branch types must have a non-zero count.
The row block is 22-metric — the two new metrics beyond the standard 20 are:
- working-set growth (MB) — derived from the CSV's working_set gauge: last_value − first_value divided by 1 MB (see Get-WorkingSetGrowthMb in tools/MeasurementExtraction.psm1).
- fault-cycles (count) — runs.faulted sum (derived from Get-FaultCyclesCount).
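The two derived metrics are simple enough to sketch. The Python below illustrates the arithmetic described above; it is not the implementation of Get-WorkingSetGrowthMb or Get-FaultCyclesCount. It assumes the gauge samples are in bytes and that "1 MB" means 1024 × 1024 bytes, and both assumptions are exposed as parameters.

```python
def working_set_growth_mb(samples_bytes, bytes_per_mb=1024 * 1024):
    """(last - first) of the working-set gauge, expressed in MB."""
    if len(samples_bytes) < 2:
        raise ValueError("need at least two gauge samples")
    return (samples_bytes[-1] - samples_bytes[0]) / bytes_per_mb


def fault_cycles(runs_faulted_increments):
    """Total faulted runs: the sum of the per-interval runs.faulted increments."""
    return int(sum(runs_faulted_increments))


mb = 1024 * 1024
print(working_set_growth_mb([100 * mb, 150 * mb, 130 * mb]))
print(fault_cycles([0, 1, 0, 2]))
```

Note that last − first ignores everything between the endpoints: in the example the gauge peaks at 150 MB but the reported growth reflects only the final 130 MB reading.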
Implemented by: MultiTagSoakFlaUi with -Profile ChaosMonkey (no new FlaUI scenario class — the existing MultiTagSoakFlaUi reads the SIMULATOR_PROFILE env var; the orchestrator wires -Profile through).
4.6 Soak scenario — SLICE-1.4, Soak8h profile
SLICE-1.4 criterion-12 evidence capture. Runs the full application under sustained low-chaos load for 8 real-time hours to detect working-set growth (memory leak bar).
Do not run on a host you also intend to use during the capture. The 8-hour window requires the display to remain on and unlocked continuously. FlaUI needs a visible, unobstructed foreground window. Light background work on a secondary monitor is acceptable; CPU-intensive workloads (builds, video encoding) will skew GC and alloc-rate metrics.
Prerequisites:
- Hibernate disabled: powercfg /hibernate off (requires elevated shell)
- AC standby timeout disabled: powercfg /change standby-timeout-ac 0
- Screen lock disabled for the session
- No other active dotnet-counters session attached to the app
- Simulator:FlakySdk:Enabled = false (the default in appsettings.json)
$date = Get-Date -Format 'yyyy-MM-dd'
.\tools\Capture-Measurements.ps1 `
-Scenario MultiTagSoak `
-Profile Soak8h `
-DurationSeconds 28800 `
-OutputCsv "docs/captures/slice-1-4-soak-8h-$date.csv" `
-SliceTag slice-1-4-soak-8h `
-CommitHash (git rev-parse --short HEAD) `
-AppendToTable
Sanity checks after the run:
- Capture span ≥ 28 500 s (about 1% drift from 8 h). If less, the host slept or paused; discard the CSV and restart.
- working-set growth (MB) ≤ 50 (criterion-12 ceiling). Note: the last − first metric captures the startup ramp (first 30 min of initialisation) as well as any in-flight leak. The 2026-05-03 reference capture showed a 186.5 MB last − first that is entirely startup cost (stable 228–238 MB sawtooth for hours 1–8); a criterion-12 amendment replacing last − first with a startup-excluded steady-state metric is filed as a follow-up. Until amended, treat last − first > 50 MB as a flag requiring time-series inspection before concluding a real leak exists.
- Gen-2 GC count rate ≤ 4× the slice-1-2-real-frame-payloads rate (16 072/hr × 4 = 64 288/hr ceiling).
- No unhandled-exception entries in Logs/inspection-prototype-*.log.
- runs.faulted near zero (Soak8h has AlarmBurstEveryMs = 0; only ConnectionFailureProbability = 0.05 misconnects apply, handled by retry).
If the capture is interrupted: discard the partial CSV and restart. The last − first growth math is only meaningful on an uninterrupted real-time run; a partial CSV will show artificially low or misleading growth.
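The three numeric soak checks (capture span, working-set growth, gen-2 rate) can be expressed as one function. This Python sketch uses the thresholds quoted above; the function name and flag labels are hypothetical, and the log and runs.faulted checks are deliberately left out because they need inspection rather than arithmetic.

```python
MIN_SPAN_S = 28_500
MAX_GROWTH_MB = 50.0
REFERENCE_GEN2_PER_HOUR = 16_072  # slice-1-2-real-frame-payloads rate
GEN2_CEILING_PER_HOUR = 4 * REFERENCE_GEN2_PER_HOUR


def soak_flags(span_s: float, growth_mb: float, gen2_count: int) -> list:
    """Return the names of the numeric soak checks that need attention."""
    flags = []
    if span_s < MIN_SPAN_S:
        flags.append("span")  # host slept or paused: discard and restart
    if growth_mb > MAX_GROWTH_MB:
        flags.append("working-set")  # inspect the time series before calling it a leak
    if gen2_count / (span_s / 3600.0) > GEN2_CEILING_PER_HOUR:
        flags.append("gen2")
    return flags


print(GEN2_CEILING_PER_HOUR)
print(soak_flags(span_s=28_800, growth_mb=186.5, gen2_count=100_000))
```

With the 186.5 MB growth from the 2026-05-03 reference capture (the gen-2 count here is made up), only the working-set flag is raised, which matches the guidance above: a flag prompts time-series inspection and is not by itself proof of a leak.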
Implemented by: MultiTagSoakFlaUi with -Profile Soak8h.
4.7+ — Phase 2 scenarios
Phase 2 was deferred following the SLICE-2.0 measurement (2026-05-07). All three rubric gates came in below their trigger thresholds: store alloc share was 0.5% (gate: ≥10%), lock-wait p95 was 0.4 µs (gate: ≥100 µs), and no data-plane pipeline dominates calls (WorkflowService and TagStreamPipelineService each hold 46.4%). No Phase 2 scenarios are scheduled. If a future slice re-triggers the rubric, §5 scenarios will be added here.
5. Phase 2 capture scenarios
5.1 Store-allocation profiling — SLICE-2.0, MultiTag profile
SLICE-2.0 baseline capture. Measures AppStateStore.Update allocation share, lock-wait distribution, call rate, and caller distribution under the MultiTag profile, which drives the highest sustained store write rate of any baseline profile (~43 calls/sec from 50-tag emitters + position events + run lifecycle).
Scenario: MultiTagSoak with the default MultiTag simulator profile, 30 minutes.
$date = Get-Date -Format 'yyyy-MM-dd'
.\tools\Capture-Measurements.ps1 `
-Scenario MultiTagSoak `
-Profile MultiTag `
-DurationSeconds 1800 `
-OutputCsv "docs/captures/slice-2-0-store-profiling-$date.csv" `
-SliceTag slice-2-0-store-profiling `
-CommitHash (git rev-parse --short HEAD)
Note: -AppendToTable is not used here because the row goes to docs/reviews/phase-2-measurements.md, not phase-1-measurements.md. After the capture completes, run ConvertTo-MeasurementRow manually and paste the output into phase-2-measurements.md.
Sanity checks before recording the row:
- tags.active reads 50 steady-state. If 0, appsettings.json did not load (launch from the app's bin directory).
- telemetry.ingested (Count / 1 sec) reads ~20 Hz steady-state. If ~5 Hz, the profile was Normal not MultiTag.
- store.update.calls (Count / 1 sec)[caller=…] rows appear for at least 2 distinct callers. If absent, the SLICE-2.0 instrumentation is not in the running build.
- store.update.lock_wait.micros histogram rows appear. If absent, same cause.
Extracting the 26-metric row:
Import-Module tools\MeasurementExtraction.psm1 -Force
$csv = Import-Csv "docs\captures\slice-2-0-store-profiling-$date.csv"
ConvertTo-MeasurementRow `
-CsvPath "docs\captures\slice-2-0-store-profiling-$date.csv" `
-SliceTag 'slice-2-0-store-profiling' `
-Scenario 'MultiTagSoak' `
-CommitHash (git rev-parse --short HEAD) `
-Date $date
Full caller distribution (not in the 26-metric table — goes in Notes):
Get-StoreUpdateCallerDistribution -Csv $csv | Format-Table -AutoSize
Reference capture: docs/captures/slice-2-0-store-profiling-2026-05-07.csv (1812 s, 77 125 total store.update calls). Row committed in docs/reviews/phase-2-measurements.md.
5.2 Subscriber-invocation profiling — SLICE-2.4, HighDefect profile
SLICE-2.4 capture. Measures subscriber.invocations rate, subscriber-to-store ratio, and selector distribution under the HighDefect simulator profile. The HighDefect profile is chosen because it produces the highest defect-event density, which drives the most discriminating subscriber call pattern (subscribers for defect-related state fire at full rate, while unrelated subscribers should remain idle).
Exit gate: subscriber-to-store ratio ≤ 4.0 (≥ 80% reduction from the pre-slice baseline of ~20.0, which reflected one monolithic Project(state) call per store update touching ~20 property groups).
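The exit-gate arithmetic can be sketched as follows. This Python example is only an illustration of the ratio and reduction figures quoted above; the function names are hypothetical and the invocation and update counts are made-up inputs, not measured values.

```python
BASELINE_RATIO = 20.0  # pre-slice baseline quoted above
GATE_RATIO = 4.0


def subscriber_to_store_ratio(subscriber_invocations: float, store_updates: float) -> float:
    """Subscriber invocations per store update over the capture window."""
    if store_updates <= 0:
        raise ValueError("store_updates must be positive")
    return subscriber_invocations / store_updates


def reduction_pct(ratio: float, baseline: float = BASELINE_RATIO) -> float:
    """Percentage reduction of the ratio relative to the baseline."""
    return 100.0 * (1.0 - ratio / baseline)


def passes_exit_gate(ratio: float) -> bool:
    return ratio <= GATE_RATIO


ratio = subscriber_to_store_ratio(270_000, 75_000)  # illustrative counts only
print(ratio, round(reduction_pct(ratio), 1), passes_exit_gate(ratio))
```

The gate value of 4.0 and the "≥ 80% reduction" wording are the same condition stated two ways: a ratio of exactly 4.0 is an 80% reduction from a baseline of 20.0.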
Scenario: MultiTagSoak with -Profile HighDefect, 30 minutes.
$date = Get-Date -Format 'yyyy-MM-dd'
.\tools\Capture-Measurements.ps1 `
-Scenario MultiTagSoak `
-Profile HighDefect `
-DurationSeconds 1800 `
-OutputCsv "docs/captures/slice-2-4-per-slice-observables-$date.csv" `
-SliceTag slice-2-4-per-slice-observables `
-CommitHash (git rev-parse --short HEAD)
Note: After the capture completes, run ConvertTo-MeasurementRow manually and paste the output into phase-2-measurements.md.
Sanity checks before recording the row:
- tags.active reads 50 steady-state.
- telemetry.ingested (Count / 1 sec) reads ~20 Hz steady-state.
- subscriber.invocations (Count / 1 sec)[selector=…] rows appear for at least 3 distinct selectors (one per per-slice subscription in MainViewModel). If absent, Pass 1 AppMetrics instrumentation is not in the running build.
- store.update.calls (Count / 1 sec)[caller=…] rows appear (needed for ratio computation).
Extracting the 29-metric row (26 base + 3 subscriber metrics):
Import-Module tools\MeasurementExtraction.psm1 -Force
$date = Get-Date -Format 'yyyy-MM-dd'
$csvPath = "docs\captures\slice-2-4-per-slice-observables-$date.csv"
ConvertTo-MeasurementRow `
-CsvPath $csvPath `
-SliceTag 'slice-2-4-per-slice-observables' `
-Scenario 'MultiTagSoak (HighDefect)' `
-CommitHash (git rev-parse --short HEAD) `
-Date $date
Full selector distribution (not in the 29-metric table — goes in Notes):
$csv = Import-Csv $csvPath
Get-SubscriberSelectorDistribution -Csv $csv | Format-Table -AutoSize
5.3 SQLite persistence profiling — SLICE-3.3, MultiTag profile
SLICE-3.3 capture. Measures the same 26 Phase 2 metrics plus three new persistence metrics: runs and alarms written to SQLite during the capture window, and the p95 history-hydration load time. The DB is pre-populated with 10 000 synthetic rows so that pagination and LoadRecentAsync behaviour is exercised from a non-trivial starting state.
Scenario: MultiTagSoak with the default MultiTag simulator profile, 30 minutes.
Step 1 — Disable sleep and build
powercfg /change standby-timeout-ac 0
powercfg /change monitor-timeout-ac 0
dotnet build --configuration Release
Step 2 — Locate the database file
$dbPath = "$env:LOCALAPPDATA\InspectionPrototype\inspection.db"
If the DB does not exist yet, launch the app once (the MigrationRunner hosted service creates the schema on first start) and then close it before proceeding. Check:
Test-Path $dbPath
Step 3 — Back up and pre-populate
$date = Get-Date -Format 'yyyy-MM-dd'
# Optional: back up any existing DB to avoid contaminating the snapshot delta
if (Test-Path $dbPath) {
Copy-Item $dbPath "$dbPath.pre-$date"
}
# Pre-populate with 10 000 synthetic rows
.\tools\Populate-SyntheticHistory.ps1 -DatabasePath $dbPath -RowCount 10000
# Snapshot before state
Copy-Item $dbPath "$env:TEMP\inspection-before-$date.db"
The populate script is idempotent — if the DB already has ≥10 000 rows it prints a no-op message and exits.
Step 4 — Run the capture
.\tools\Capture-Measurements.ps1 `
-Scenario MultiTagSoak `
-Profile MultiTag `
-DurationSeconds 1800 `
-OutputCsv "docs/captures/slice-3-3-sqlite-persistence-$date.csv" `
-SliceTag slice-3-3-sqlite-persistence `
-CommitHash (git rev-parse --short HEAD)
Note: -AppendToTable is not used here because the row goes to docs/reviews/phase-3-measurements.md, not phase-1-measurements.md.
Step 5 — Snapshot after state
Copy-Item $dbPath "$env:TEMP\inspection-after-$date.db"
Step 6 — Extract the 29-metric row
Import-Module tools\MeasurementExtraction.psm1 -Force
$logDate = Get-Date -Format 'yyyyMMdd'
$logPath = "$env:LOCALAPPDATA\InspectionPrototype\logs\app-$logDate.log"
$dbBefore = "$env:TEMP\inspection-before-$date.db"
$dbAfter = "$env:TEMP\inspection-after-$date.db"
ConvertTo-MeasurementRow `
-CsvPath "docs\captures\slice-3-3-sqlite-persistence-$date.csv" `
-SliceTag 'slice-3-3-sqlite-persistence' `
-Scenario 'MultiTagSoak' `
-CommitHash (git rev-parse --short HEAD) `
-Date $date `
-DatabaseBefore $dbBefore `
-DatabaseAfter $dbAfter `
-LogPath $logPath
The output block ends with three Phase 3 rows:
| runs.persisted (count) | <value> |
| alarms.persisted (count) | <value> |
| recent-history-load p95 (ms) | <value> |
Paste the full output block into docs/reviews/phase-3-measurements.md under ### Row — slice-3-3-sqlite-persistence.
Sanity checks before recording the row:
- runs.persisted > 0. If 0, the SqliteRunHistoryStore.SaveAsync path is not being hit (check that a run actually completed: runs.completed > 0).
- alarms.persisted ≥ 0. Under MultiTag (no ChaosMonkey) there may be zero alarms, which is expected and correct.
- recent-history-load p95 is a non-zero value. If it shows —, the app log at $logPath did not contain a matching hydration line; verify the correct log date is being read.
- runs.persisted + 10000 ≈ the row count in $dbAfter's run_summaries table.
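The last sanity check can be scripted once both numbers are in hand. The following is a minimal sketch, not part of the tooling: Test-RunRowCount is a hypothetical helper, the tolerance of 5 rows is an assumed reading of "≈", and the input values are made up.

```powershell
# Sketch only: the helper name, the tolerance, and the sample values are assumptions.
function Test-RunRowCount {
    param(
        [int]$RunsPersisted,      # runs.persisted from the extracted row
        [int]$RowCountAfter,      # row count of run_summaries in the after-snapshot
        [int]$SeedRows = 10000,   # synthetic rows inserted before the capture
        [int]$Tolerance = 5       # assumed slack for runs in flight at snapshot time
    )
    $expected = $SeedRows + $RunsPersisted
    # True when the observed row count is within the tolerance of the expected count.
    [math]::Abs($RowCountAfter - $expected) -le $Tolerance
}

$consistent   = Test-RunRowCount -RunsPersisted 42 -RowCountAfter 10042
$inconsistent = Test-RunRowCount -RunsPersisted 42 -RowCountAfter 10000
```

With these sample inputs the first call passes and the second fails, which is the pattern to expect when runs.persisted is non-zero but nothing reached the table.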
5.4 Rich defect model capture — SLICE-3.1, HighDefect profile
SLICE-3.1 capture. Measures the standard 29 Phase 2/3 metrics plus three new defect-specific metrics: defect rows written to SQLite during the capture window, classification distribution, and p95 persist latency. The HighDefect simulator profile fires frequent defects to stress the defect pipeline.
Scenario: MultiTagSoak with -Profile HighDefect, 30 minutes. Exit gate: defects.persisted ≥ 5000.
Step 1 — Disable sleep and build
powercfg /change standby-timeout-ac 0
powercfg /change monitor-timeout-ac 0
dotnet build --configuration Release
Step 2 — Locate the database file
$dbPath = "$env:LOCALAPPDATA\InspectionPrototype\inspection.db"
If the DB does not exist yet, launch the app once (the MigrationRunner hosted service creates the schema on first start) and then close it before proceeding:
Test-Path $dbPath
Step 3 — Snapshot before state
No pre-population is required — defects are generated by the simulator profile.
$date = Get-Date -Format 'yyyy-MM-dd'
# Optional: back up any existing DB
if (Test-Path $dbPath) {
Copy-Item $dbPath "$dbPath.pre-$date"
}
# Snapshot before state
Copy-Item $dbPath "$env:TEMP\inspection-before-$date.db"
Step 4 — Run the capture
.\tools\Capture-Measurements.ps1 `
-Scenario MultiTagSoak `
-Profile HighDefect `
-DurationSeconds 1800 `
-OutputCsv "docs/captures/slice-3-1-rich-defect-model-$date.csv" `
-SliceTag slice-3-1-rich-defect-model `
-CommitHash (git rev-parse --short HEAD)
Note: -AppendToTable is not used here because the row goes to docs/reviews/phase-3-measurements.md, not phase-1-measurements.md.
Step 5 — Snapshot after state
Copy-Item $dbPath "$env:TEMP\inspection-after-$date.db"
Step 6 — Extract the 32-metric row
Import-Module tools\MeasurementExtraction.psm1 -Force
$logDate = Get-Date -Format 'yyyyMMdd'
$logPath = "$env:LOCALAPPDATA\InspectionPrototype\logs\app-$logDate.log"
$dbBefore = "$env:TEMP\inspection-before-$date.db"
$dbAfter = "$env:TEMP\inspection-after-$date.db"
ConvertTo-MeasurementRow `
-CsvPath "docs\captures\slice-3-1-rich-defect-model-$date.csv" `
-SliceTag 'slice-3-1-rich-defect-model' `
-Scenario 'MultiTagSoak (HighDefect)' `
-CommitHash (git rev-parse --short HEAD) `
-Date $date `
-DatabaseBefore $dbBefore `
-DatabaseAfter $dbAfter `
-LogPath $logPath
The output block ends with three Phase 3 defect rows:
| defects.persisted (count) | <value> |
| defect-classification distribution | <value> |
| defect-persist p95 (ms) | <value> |
Paste the full output block into docs/reviews/phase-3-measurements.md under ### Row — slice-3-1-rich-defect-model.
Sanity checks before recording the row:
- defects.persisted ≥ 5 000. If below, the HighDefect profile is not configured to fire at the expected rate; check the AppSettings.json simulator profile settings.
- defect-classification distribution shows all five classification names with roughly equal percentages (the HighDefect profile generates a uniform mix).
- defect-persist p95 (ms) will be — unless Debug-level Serilog logging is enabled (it requires LogLevel.Debug for InspectionPrototype.Application.Services.FramePipelineService). That is acceptable.
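The "roughly equal percentages" check can be made mechanical. This is a sketch under stated assumptions: Test-UniformMix is a hypothetical helper, the classification names are placeholders (the real names come from the extracted row), and the 5-percentage-point tolerance is an assumed reading of "roughly".

```powershell
# Sketch only: helper name, class names, and tolerance are assumptions for illustration.
function Test-UniformMix {
    param(
        [hashtable]$Counts,             # classification name -> defect count
        [int]$ExpectedClasses = 5,      # the runbook expects five classifications
        [double]$TolerancePoints = 5    # assumed allowed deviation, in percentage points
    )
    if ($Counts.Count -ne $ExpectedClasses) { return $false }
    $total = ($Counts.Values | Measure-Object -Sum).Sum
    if ($total -le 0) { return $false }
    $ideal = 100.0 / $ExpectedClasses   # 20% each for a uniform five-way mix
    foreach ($n in $Counts.Values) {
        if ([math]::Abs(100.0 * $n / $total - $ideal) -gt $TolerancePoints) { return $false }
    }
    $true
}

$balanced = @{ Scratch = 1010; Particle = 990; Pit = 1000; Stain = 1005; Crack = 995 }
$skewed   = @{ Scratch = 4000; Particle = 250; Pit = 250; Stain = 250; Crack = 250 }
$balancedOk = Test-UniformMix -Counts $balanced
$skewedOk   = Test-UniformMix -Counts $skewed
```

A distribution with a missing classification fails the class-count test before any percentages are compared.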
dotnet-counters collect --format csv writes long format: one row per counter per refresh interval, not one column per counter. The columns are:
Timestamp, Provider, Counter Name, Counter Type, Mean/Increment
- Counter Type is Rate (a per-interval delta, e.g. frames ingested in this second) or Metric (a gauge snapshot, e.g. working-set bytes right now).
- For Rate counters, sum Mean/Increment across rows to get the scenario total. Taking the last row gives you the last second's delta, not the cumulative total.
- For Metric counters, take Max/Min/Avg as the metric demands.
- Counters that never get incremented at all do not appear in the CSV: dotnet-counters only emits rows for counters that produced at least one sample. A frames.dropped counter that stays at zero will be absent from the file, not present-with-zero. Record it as 0 in the table.
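The aggregation rules above can be tried on a tiny in-memory sample before pointing them at a real capture. This is a minimal sketch: the three-second sample data is invented, and Get-CounterValues, Get-RateTotal, and Get-MetricPeak are hypothetical helper names, not part of MeasurementExtraction.psm1.

```powershell
# Invented three-second capture in dotnet-counters long format.
$sample = @'
Timestamp,Provider,Counter Name,Counter Type,Mean/Increment
01/01/2025 10:00:01,InspectionPrototype,frames.ingested (Count / 1 sec),Rate,30
01/01/2025 10:00:01,System.Runtime,dotnet.process.memory.working_set (By),Metric,104857600
01/01/2025 10:00:02,InspectionPrototype,frames.ingested (Count / 1 sec),Rate,28
01/01/2025 10:00:02,System.Runtime,dotnet.process.memory.working_set (By),Metric,209715200
01/01/2025 10:00:03,InspectionPrototype,frames.ingested (Count / 1 sec),Rate,32
01/01/2025 10:00:03,System.Runtime,dotnet.process.memory.working_set (By),Metric,157286400
'@ | ConvertFrom-Csv

# Numeric samples for one counter; yields nothing when the counter is absent.
function Get-CounterValues {
    param([object[]]$Rows, [string]$Name)
    $Rows | Where-Object { $_.'Counter Name' -eq $Name } |
        ForEach-Object { [double]$_.'Mean/Increment' }
}

# Rate counters: sum the per-interval deltas; an absent counter is recorded as 0.
function Get-RateTotal {
    param([object[]]$Rows, [string]$Name)
    $values = @(Get-CounterValues -Rows $Rows -Name $Name)
    if ($values.Count -eq 0) { return 0.0 }
    ($values | Measure-Object -Sum).Sum
}

# Metric (gauge) counters: take the peak snapshot.
function Get-MetricPeak {
    param([object[]]$Rows, [string]$Name)
    $values = @(Get-CounterValues -Rows $Rows -Name $Name)
    if ($values.Count -eq 0) { return 0.0 }
    ($values | Measure-Object -Maximum).Maximum
}

$framesTotal  = Get-RateTotal  -Rows $sample -Name 'frames.ingested (Count / 1 sec)'
$droppedTotal = Get-RateTotal  -Rows $sample -Name 'frames.dropped (Count / 1 sec)'
$peakMb       = (Get-MetricPeak -Rows $sample -Name 'dotnet.process.memory.working_set (By)') / 1MB
```

On this sample the frame total is 90 (not 32, the last row), frames.dropped resolves to 0 because it never appears, and the working-set peak is 200 MB.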
Values to extract for the measurements table
The Counter Name column uses the raw OTel-style names from the runtime's built-in meter, not the legacy System.Runtime.* EventCounter names.
| Metric | Counter Name (Provider) | How to aggregate |
|---|---|---|
| frames.ingested (total) | frames.ingested (Count / 1 sec) (InspectionPrototype) | sum Mean/Increment |
| frames.ingested rate (fps) | same | total ÷ duration seconds |
| frames.dropped (total) | frames.dropped (Count / 1 sec) (InspectionPrototype) | sum, or 0 if absent |
| telemetry.ingested (total) | telemetry.ingested (Count / 1 sec) (InspectionPrototype) | sum |
| telemetry.ingested rate (Hz) | same | total ÷ duration seconds |
| telemetry.coalesced (total) | telemetry.coalesced (Count / 1 sec) (InspectionPrototype) | sum, or 0 if absent |
| runs.started | runs.started (Count / 1 sec) (InspectionPrototype) | sum |
| runs.completed | runs.completed (Count / 1 sec) (InspectionPrototype) | sum |
| runs.faulted | runs.faulted (Count / 1 sec) (InspectionPrototype) | sum, or 0 if absent |
| working-set peak (MB) | dotnet.process.memory.working_set (By) (System.Runtime) | max ÷ 1 048 576 |
| gen-0-gc-count (total) | dotnet.gc.collections ({collection} / 1 sec)[gc.heap.generation=gen0] (System.Runtime) | sum |
| gen-1-gc-count (total) | dotnet.gc.collections ({collection} / 1 sec)[gc.heap.generation=gen1] (System.Runtime) | sum |
| gen-2-gc-count (total) | dotnet.gc.collections ({collection} / 1 sec)[gc.heap.generation=gen2] (System.Runtime) | sum |
| alloc-rate avg (B/s) | dotnet.gc.heap.total_allocated (By / 1 sec) (System.Runtime) | avg |
| cpu-usage avg / peak (%) | dotnet.process.cpu.time (s / 1 sec)[cpu.mode=user] + [cpu.mode=system] (System.Runtime) | see below |
CPU% is not a direct counter. Per timestamp, sum the user and system increments (seconds of CPU time used in that wall-second), divide by dotnet.process.cpu.count ({cpu}) to normalize to a percentage of the machine's total capacity across all logical cores, and multiply by 100. Without the division the figure would be a percentage of a single core and could exceed 100. Take avg across timestamps for the avg row, max for the peak row.
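A worked example of the CPU calculation, with invented numbers: an 8-core machine and two one-second samples. The sample values and the variable names are assumptions chosen so the arithmetic is easy to follow.

```powershell
# Invented inputs: 8 logical cores, CPU seconds consumed per wall-second.
$cpuCount = 8
$samples = @(
    [pscustomobject]@{ Timestamp = '10:00:01'; User = 0.5; System = 0.25 }
    [pscustomobject]@{ Timestamp = '10:00:02'; User = 1.5; System = 0.5  }
)

# Per timestamp: (user + system) / core count * 100.
$cpuPct = $samples | ForEach-Object { 100.0 * ($_.User + $_.System) / $cpuCount }

$cpuAvg  = ($cpuPct | Measure-Object -Average).Average
$cpuPeak = ($cpuPct | Measure-Object -Maximum).Maximum
```

The two samples come out at 9.375% and 25%, so the avg row is 17.1875 and the peak row is 25.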
PowerShell extraction script
Copy this into a .ps1 or run it inline. Replace the capture filename. It prints every value needed for a measurements-table row.
# Load the long-format capture (one row per counter per refresh interval).
$csv = Import-Csv docs/captures/<capture-name>.csv
# Scenario duration is the span between the first and last sample timestamps.
$first = [datetime]::Parse($csv[0].Timestamp)
$last = [datetime]::Parse($csv[-1].Timestamp)
$dur = ($last - $first).TotalSeconds
# Aggregation helpers: filter rows by exact Counter Name, then sum, average, or take the max.
function SumCounter($name) { (($csv | Where-Object { $_.'Counter Name' -eq $name } |
ForEach-Object { [double]$_.'Mean/Increment' }) | Measure-Object -Sum).Sum }
function AvgCounter($name) { (($csv | Where-Object { $_.'Counter Name' -eq $name } |
ForEach-Object { [double]$_.'Mean/Increment' }) | Measure-Object -Average).Average }
function MaxCounter($name) { (($csv | Where-Object { $_.'Counter Name' -eq $name } |
ForEach-Object { [double]$_.'Mean/Increment' }) | Measure-Object -Maximum).Maximum }
# Logical core count, used to normalize CPU time.
$cpuCount = [int](($csv | Where-Object { $_.'Counter Name' -eq 'dotnet.process.cpu.count ({cpu})' } |
Select-Object -First 1).'Mean/Increment')
# Accumulate user + system CPU seconds per timestamp.
$cpuByTs = @{}
$csv | Where-Object { $_.'Counter Name' -eq 'dotnet.process.cpu.time (s / 1 sec)[cpu.mode=user]' } |
ForEach-Object { $cpuByTs[$_.Timestamp] = [double]$_.'Mean/Increment' }
$csv | Where-Object { $_.'Counter Name' -eq 'dotnet.process.cpu.time (s / 1 sec)[cpu.mode=system]' } |
ForEach-Object { $cpuByTs[$_.Timestamp] = ($cpuByTs[$_.Timestamp] + [double]$_.'Mean/Increment') }
$cpuPct = $cpuByTs.Values | ForEach-Object { 100.0 * $_ / $cpuCount }
"duration_s: $dur"
"frames.ingested: $(SumCounter 'frames.ingested (Count / 1 sec)')"
"telemetry.ingested: $(SumCounter 'telemetry.ingested (Count / 1 sec)')"
"runs.started: $(SumCounter 'runs.started (Count / 1 sec)')"
"runs.completed: $(SumCounter 'runs.completed (Count / 1 sec)')"
"working-set peak MB: {0:F1}" -f ((MaxCounter 'dotnet.process.memory.working_set (By)') / 1MB)
"gen-0 GCs: $(SumCounter 'dotnet.gc.collections ({collection} / 1 sec)[gc.heap.generation=gen0]')"
"gen-1 GCs: $(SumCounter 'dotnet.gc.collections ({collection} / 1 sec)[gc.heap.generation=gen1]')"
"gen-2 GCs: $(SumCounter 'dotnet.gc.collections ({collection} / 1 sec)[gc.heap.generation=gen2]')"
"alloc-rate avg B/s: {0:N0}" -f (AvgCounter 'dotnet.gc.heap.total_allocated (By / 1 sec)')
"cpu avg %: {0:F2}" -f (($cpuPct | Measure-Object -Average).Average)
"cpu peak %: {0:F2}" -f (($cpuPct | Measure-Object -Maximum).Maximum)
Remember to add 0 for any InspectionPrototype counter that is absent from the file. Absence means "never incremented"; in the measurements table that's still a data point.
5.4 Cassette cadence capture — SLICE-3.2, 25-wafer Soak8h
SLICE-3.2 capture. Verifies the full 25-wafer cassette loop under the Soak8h simulator profile and confirms that the FK band-aid retirement (stub-row pattern introduced in this slice) produces zero SqliteException (Error 19) entries and zero orphan defect rows.
IMPORTANT: this is NOT a Capture-Measurements.ps1 invocation. The cassette scheduler does not go through MultiTagSoakFlaUi. This scenario is invoked directly via the CassetteSoakFlaUi acceptance test:
$env:APP_PROCESS_ID = $appPid
dotnet test tests/InspectionPrototype.AcceptanceTests --configuration Release --no-build `
--filter "FullyQualifiedName~CassetteSoakFlaUi"
Exit criteria: wafers.completed = 25, cassette.wall-clock ≤ 1 200 s (20 min), runs.faulted = 0, zero SqliteException Error 19 in app log.
Step 1 — Disable sleep and rebuild
powercfg /change standby-timeout-ac 0
powercfg /change monitor-timeout-ac 0
Stop-Process -Name 'InspectionPrototype.App' -Force -EA SilentlyContinue
Stop-Process -Name 'dotnet-counters' -Force -EA SilentlyContinue
dotnet build --configuration Release
Step 2 — Clean the database
Remove any existing WAL/SHM and the DB file so the capture starts from an empty state:
$dbPath = "$env:LOCALAPPDATA\LcnWaferInspection\inspection.db"
Remove-Item $dbPath -EA SilentlyContinue
Remove-Item "$dbPath-wal" -EA SilentlyContinue
Remove-Item "$dbPath-shm" -EA SilentlyContinue
Note: the DB path is LcnWaferInspection (not InspectionPrototype); this is where the SqlitePersistenceOptions default path places the file.
Step 3 — Launch app and take before-snapshot
$date = Get-Date -Format 'yyyy-MM-dd'
$appDir = ".\src\InspectionPrototype.App\bin\Release\net10.0-windows"
Start-Process -FilePath "$appDir\InspectionPrototype.App.exe" -WorkingDirectory $appDir
Start-Sleep -Seconds 5
$appPid = (Get-Process 'InspectionPrototype.App').Id
Write-Host "App PID: $appPid"
# Before-snapshot (app has just initialised, 0 run_summaries rows)
Copy-Item $dbPath "$env:TEMP\inspection.db.before-$date" -Force
Note on WAL mode: SQLite WAL mode means the DB file may be 4096 bytes (header only) immediately after startup even though the schema is present; the schema pages live in the .db-wal file until the first checkpoint. The before-snapshot at this stage records the "0 run_summaries" baseline; Get-SqliteCount handles the missing-table case gracefully by returning 0.
Step 4 — Start dotnet-counters
$finalCsv = "docs\captures\slice-3-2-cassette-cadence-$date.csv"
Start-Process -FilePath 'dotnet-counters' `
-ArgumentList @('collect','--process-id',"$appPid",
'--counters','InspectionPrototype,System.Runtime',
'--format','csv','--output',$finalCsv,'--refresh-interval','1') `
-NoNewWindow
Start-Sleep -Seconds 3
Test-Path $finalCsv # expect True
Important: start dotnet-counters in a dedicated terminal (not in the same window you will use for the FlaUI test command). A Ctrl+C in the test terminal sends SIGINT to all processes sharing that console, which will prematurely stop the counters.
Step 5 — Run the CassetteSoakFlaUi acceptance test
In a separate terminal:
$env:APP_PROCESS_ID = "$appPid"
dotnet test tests\InspectionPrototype.AcceptanceTests `
--configuration Release --no-build `
--filter "FullyQualifiedName~CassetteSoakFlaUi" `
--logger "console;verbosity=detailed"
The test sequence:
- Switches simulator to Soak8h profile.
- Connects, refreshes recipe catalog, loads recipe, homes.
- Clicks Load Cassette → Start Cassette Run.
- Waits for the Unload Cassette button to become enabled (fires when phase = Complete).
- Clicks Unload Cassette → Disconnect.
Expected duration: 5–6 minutes of total test wall-clock for 25 wafers at Soak8h timing. Each wafer takes 2 000 ms load + 1 500 ms align + ~5 500 ms run + 1 500 ms unload ≈ 10 500 ms, so the 25-wafer loop is ≈ 262 s; setup and teardown add ~30 s.
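The timing estimate is plain arithmetic and can be recomputed if the Soak8h profile fields change. A small sketch follows; the variable names are illustrative, and the millisecond values are the ones quoted above.

```powershell
# Per-wafer phases at Soak8h timing, in milliseconds (values from the estimate above).
$loadMs   = 2000
$alignMs  = 1500
$runMs    = 5500   # approximate
$unloadMs = 1500

$waferMs       = $loadMs + $alignMs + $runMs + $unloadMs   # 10500 ms per wafer
$loopSeconds   = 25 * $waferMs / 1000                      # 262.5 s for the 25-wafer loop
$totalSeconds  = $loopSeconds + 30                         # plus ~30 s setup/teardown
$withinGate    = $totalSeconds -le 1200                    # exit criterion: <= 1 200 s
```

At these values the estimate is 292.5 s, well inside the 1 200 s exit criterion, so a run that approaches the limit points at changed timing fields rather than normal variance.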
Step 6 — Take after-snapshot and stop counters
After the test reports Passed:
# Kill the app to force WAL checkpoint before taking the after-snapshot
Stop-Process -Name 'InspectionPrototype.App' -Force -EA SilentlyContinue
Start-Sleep -Seconds 3
Copy-Item $dbPath "$env:TEMP\inspection.db.after-$date" -Force
Write-Host "After-snapshot: $((Get-Item "$env:TEMP\inspection.db.after-$date").Length) bytes"
# Expect ~65 000–80 000 bytes (schema + 25 run_summaries + defects + alarm_history)
# Stop counters (in the dedicated dotnet-counters terminal, press Q or use:)
Stop-Process -Name 'dotnet-counters' -Force -EA SilentlyContinue
Step 7 — Verify DB contents
$dotNetQDir = "$env:TEMP\InspProto_Q2"
# (see existing InspProto_Q2 dotnet-run project, or use the psm1 helpers directly)
Import-Module .\tools\MeasurementExtraction.psm1 -Force
$lotId = 'LOT-<date>-HHMMSS' # read from app log or DB
Get-WafersCompletedCount -DatabasePath "$env:TEMP\inspection.db.after-$date" -LotId $lotId
# expect 25
Get-CassetteWallClockSeconds -DatabasePath "$env:TEMP\inspection.db.after-$date" -LotId $lotId
# expect ≤ 1200
# FK error check
Select-String -Path "$env:LOCALAPPDATA\InspectionPrototype\logs\app-$($date -replace '-','').log" `
-Pattern 'SqliteException|Error 19'
# expect 0 matches
To find the LotId without a query tool:
Select-String -Path "$env:LOCALAPPDATA\InspectionPrototype\logs\app-$($date -replace '-','').log" `
-Pattern 'LotId=LOT-' | Select-Object -Last 1
Step 8 — Extract the 34-metric row
Import-Module .\tools\MeasurementExtraction.psm1 -Force
ConvertTo-MeasurementRow `
-CsvPath "docs\captures\slice-3-2-cassette-cadence-$date.csv" `
-SliceTag 'slice-3-2-cassette-cadence' `
-Scenario 'CassetteCadence' `
-CommitHash (git rev-parse --short HEAD) `
-Date $date `
-DatabaseBefore "$env:TEMP\inspection.db.before-$date" `
-DatabaseAfter "$env:TEMP\inspection.db.after-$date" `
-LotId $lotId
The output block ends with two cassette-specific rows:
| wafers.completed (count) | 25 |
| cassette.wall-clock (s) | <value> |
Paste the full block into docs/reviews/phase-3-measurements.md under ### Row — slice-3-2-cassette-cadence.
Sanity checks before recording the row:
- wafers.completed = 25. If below, check the RunWaferAsync guard in SimulatedCassetteScheduler; CommandGuards.CanStart must allow WorkflowState.Completed (added in SLICE-3.2 Pass 3).
- cassette.wall-clock ≤ 1 200. If above, check the Soak8h profile timing fields (WaferLoadMs, WaferAlignMs, WaferUnloadMs).
- runs.faulted = 0 in the CSV. If non-zero, a fault was injected during the cassette run; check the app log for alarm entries.
- Zero SqliteException (Error 19) in the app log. This is the FK band-aid retirement proof. Non-zero means the stub-row pattern is not working (e.g., stub-row insert is failing silently).
- defects.persisted ≥ 0. Soak8h has low but non-zero DefectProbabilityPerFrame; a few defects per wafer is normal.
- Phase 2 trigger assessment: store.update alloc share < 10% and store.update lock-wait p95 < 100 µs means Phase 2 remains deferred. If either threshold is crossed, open the relevant Phase 2 slice.
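The Phase 2 trigger assessment reduces to two threshold comparisons. A minimal sketch, assuming the two values have already been read from the extracted row; Test-Phase2Deferred is a hypothetical helper and the sample numbers are invented.

```powershell
# Sketch only: helper name and sample values are assumptions.
function Test-Phase2Deferred {
    param(
        [double]$AllocSharePercent,   # store.update alloc share, in percent
        [double]$LockWaitP95Micros    # store.update lock-wait p95, in microseconds
    )
    # Phase 2 stays deferred only while BOTH values are under their thresholds.
    ($AllocSharePercent -lt 10) -and ($LockWaitP95Micros -lt 100)
}

$stillDeferred = Test-Phase2Deferred -AllocSharePercent 4.2 -LockWaitP95Micros 35
$lockTriggered = Test-Phase2Deferred -AllocSharePercent 4.2 -LockWaitP95Micros 180
```

The second call returns False because the lock-wait threshold is crossed even though the allocation share is fine.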
6. Writing to the measurements table
The table lives at docs/reviews/phase-1-measurements.md. Its conventions:
- one row per (slice, metric) pair
- the slice column identifies which change produced the delta; row 0's slice tag is demo-baseline (pre-Phase-1)
- the baseline column is the number from the before capture
- the after column is the number from the after capture
- the delta column is after − baseline (or after ÷ baseline for rates — pick one convention per metric and stick with it)
- the capture method column names the CSV file and the scenario
- the date column is the capture date in ISO format
See the table file itself for the live columns and filled rows.
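The two delta conventions can be illustrated with a short sketch. Get-MeasurementDelta is a hypothetical helper, not part of the tooling, and the baseline and after values are invented.

```powershell
# Sketch only: helper name and sample values are assumptions.
function Get-MeasurementDelta {
    param(
        [double]$Baseline,
        [double]$After,
        [ValidateSet('Difference', 'Ratio')][string]$Convention = 'Difference'
    )
    if ($Convention -eq 'Ratio') {
        if ($Baseline -eq 0) { return $null }   # a ratio against a zero baseline is undefined
        return $After / $Baseline
    }
    $After - $Baseline
}

# Working-set peak in MB, recorded as a difference.
$workingSetDelta = Get-MeasurementDelta -Baseline 412.5 -After 398.0
# Frame rate in fps, recorded as a ratio.
$frameRateRatio  = Get-MeasurementDelta -Baseline 30 -After 45 -Convention Ratio
```

Here the working-set delta is -14.5 and the frame-rate ratio is 1.5. Whichever convention a metric uses, it should be the same in every row for that metric.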
When a metric does not apply
Some metrics only matter for some slices. For example, telemetry.coalesced is near zero at demo rates and only becomes interesting at 200 tags × 10 Hz. Record every metric for every row anyway — a zero is a data point. The shape of the table stays uniform.
7. Troubleshooting
dotnet-counters ps does not show the app
The app must be running as a managed .NET process the tool can attach to. Common causes:
- the app was launched from Visual Studio with "attach to process" disabled
- the app crashed during launch (check %LOCALAPPDATA%\InspectionPrototype\logs\ and %LOCALAPPDATA%\InspectionPrototype\crashes\)
- the PID printed is of the dotnet host rather than the .App child; attach by PID in that case, not by name
The CSV file is empty or has only a header
You stopped collection before the first refresh interval elapsed, or the process exited before collection started. Re-run and confirm the app is responding before Ctrl+C.
Counter values look wrong (all zero, or a single giant spike)
- All zero: the meter name is wrong in the --counters flag. It must be exactly InspectionPrototype (case sensitive).
- Giant spike: you captured across the warm-up period. The first ~3 seconds after launch include JIT and DI startup work; those spikes are not representative. For soak measurements, drop the first 30 seconds when computing averages.
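Dropping the warm-up window can be done by filtering on elapsed time from the first sample. A minimal sketch on invented data: the counter name, the values, and the timestamp format are assumptions, so adjust the ParseExact format string to match what the real CSV contains.

```powershell
# Invented samples; the timestamp format here is an assumption, not the tool's guaranteed output.
$rows = @'
Timestamp,Counter Name,Mean/Increment
2025-01-01 10:00:00,alloc,900
2025-01-01 10:00:20,alloc,500
2025-01-01 10:00:30,alloc,100
2025-01-01 10:00:45,alloc,120
'@ | ConvertFrom-Csv

$format = 'yyyy-MM-dd HH:mm:ss'
$inv    = [cultureinfo]::InvariantCulture
$start  = [datetime]::ParseExact($rows[0].Timestamp, $format, $inv)
$warmUpSeconds = 30

# Keep only rows sampled at or after the warm-up cutoff.
$steady = @($rows | Where-Object {
    ([datetime]::ParseExact($_.Timestamp, $format, $inv) - $start).TotalSeconds -ge $warmUpSeconds
})

$avgAll    = ($rows   | ForEach-Object { [double]$_.'Mean/Increment' } | Measure-Object -Average).Average
$avgSteady = ($steady | ForEach-Object { [double]$_.'Mean/Increment' } | Measure-Object -Average).Average
```

On this sample the average over all rows is 405, while the steady-state average is 110, which shows how much a start-up spike can distort a short capture.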
dotnet-counters collect says "process not found"
The process you named exited. If you attach by --name rather than PID and the app crashes, collection silently stops. Attach by PID (--process-id) for longer captures, and cross-check by opening a second monitor session.
8. Adding a new capture type
Each Phase 1 slice adds its own scenario entry in section 4.2 and, after the slice merges, produces two CSV files in docs/captures/ (before and after) and two rows per metric in the measurements table.
Checklist when adding a new capture type:
- Add the scenario stub to section 4 of this runbook. Scenario must be deterministic — fixed duration, fixed button sequence, fixed simulator profile.
- Ensure the new slice's code exposes any new counters via the existing AppMetrics class. Do not create a second meter.
- Capture the before row using the new scenario before starting the slice's implementation work. If you only capture after, you have no baseline to compare against, and the slice's exit gate cannot be evaluated.
- After the slice merges, capture the after row with the same scenario.
- Commit both CSVs and the table edit in the same commit as the slice work, so the evidence travels with the change.
9. Related material
- Observability runbook: what counters exist, where logs and crash files live, live monitor mode
- Evolution roadmap: why the measurement discipline exists at all
- Phase 1 measurements table: the table being populated