SLICE-1.5: Automated Measurement Capture
Status: Superseded as of 2026-04-27. The shipped rig (IScenario, ScenarioRunner, --scenario CLI flag, Capture-Measurements.ps1) drove view-model ICommand instances directly without rendering the UI. That bypass was a known design tradeoff (see §"Verification Notes" below); the right replacement turned out to be a UI-Automation-driven approach (shipped as SLICE-1.6, FlaUI, 2026-04-27) that exercises the real XAML-binding path. The scenario classes, runner, CLI parsing, and orchestrator script have been removed; rows 0a, 0b, and slice-1-1-multi-tag-telemetry (captured under this rig) remain in the measurements table as historical evidence. The CSV-math helpers in tools/MeasurementExtraction.psm1 and the _disposed Interlocked guards on SimulatedTagSource/SimulatedCamera (real bugs surfaced during this slice's review) were kept. Read the rest of this spec for context only; do not implement against it.
- Status: Superseded by SLICE-1.6 (FlaUI capture, shipped)
- Date: 2026-04-25
- Depends on: Evolution Roadmap, SLICE-006: Observability Baseline, SLICE-1.1: Multi-Tag Telemetry
Goal
Collapse the human-driven measurement-capture procedure into a single command per row. Add an in-app scenario runner that drives the existing view-model commands on a deterministic schedule, plus a thin orchestrator script that wraps dotnet-counters collect around it and emits a ready-to-paste 16-metric markdown block. After this slice, capturing a before/after pair for any Phase 1 slice should take ~10 minutes of wall-clock time and zero human button-clicks.
Why This Slice
The current capturing-measurements runbook prescribes three terminals, a stopwatch, a 10–30 minute scripted button-clicking sequence, and manual transcription of 16 metrics into a markdown table. SLICE-1.1's MultiTag scenario alone is 30 minutes; SLICE-1.2 (frame payloads), SLICE-1.3 (encoder-rate motion), and SLICE-1.4 (storm and soak) each need at least two captures (before and after) at comparable durations. That is several hours of operator time per slice for a procedure whose value is reproducibility, which a human at a stopwatch is bad at.
Three failure modes show up in practice:
- Drift between captures. A human paces inter-action delays differently on Tuesday than on Friday. Run-rate variance shifts every count-style metric without shifting the actual code under test.
- Forgotten steps. "Switch profile before Connect" is the kind of instruction that gets skipped under time pressure, and the §4.2 sanity check exists exactly because that mistake has been made before.
- Transcription errors. Sixteen metrics, two columns each, hand-entered from a PowerShell printout — at SLICE-1.4 closeout there will be hundreds of cells in the table, and one fat-fingered digit is enough to make a delta look like a regression.
This slice exists to make the remaining Phase 1 slices' exit gates cheap enough that no one is ever tempted to skip them, and consistent enough that deltas across slices reflect real code changes rather than operator variance.
Requirements Coverage
- 07. AI Delivery Constraints and Roadmap: each phase ships a measurable before-and-after; this slice protects that discipline by removing the operator-time tax that threatens it
- 04. UI and Technical Requirements: the existing view-model command surface (Connect, LoadRecipe, Home, StartRun, Stop, Disconnect) gains a non-UI driver, exercising the same ICommand instances XAML binds to
In Scope
- a new console entry-point flag on InspectionPrototype.App: --scenario <name> --duration <seconds> [--profile <name>] [--operator-delay <ms>]
  - when present, the app boots normally (DI, Serilog, single-instance guard, AppState, all hosted services) but launches a ScenarioRunner instead of opening the main window, and exits cleanly with code 0 on scenario completion or non-zero on any unhandled exception
  - when absent, the app behaves exactly as today (interactive WPF window)
- a new IScenario abstraction and at least two concrete scenarios in InspectionPrototype.Application.Scenarios:
  - DemoBaselineScenario: re-implements the §4.1 step list (Normal profile, Connect → Refresh → Load standard-5pt-wafer-scan → Home → repeated StartRun until duration)
  - MultiTagSoakScenario: re-implements the §4.2 step list (MultiTag profile selected before Connect, then the same Connect → Refresh → Load → Home → repeated StartRun)
  - each scenario receives MainViewModel (or a narrower IOperatorCommands facade exposing the same ICommand instances) and an IAppStateStore so it can wait on observable state transitions rather than sleeping for fixed delays
- a ScenarioRunner hosted service that:
  - resolves the named scenario by --scenario value (case-insensitive); unknown names exit with a clear error and code 2
  - awaits scenario completion with a CancellationToken cancelled at --duration seconds (the runner stops in-flight runs cleanly via the existing Stop command before exit)
  - logs scenario start, every operator-action invocation, and scenario completion at Information level so the §4.x runbook entries can grep the log to verify the sequence
- a tools/Capture-Measurements.ps1 orchestrator script that:
  - takes -Scenario, -Duration, -OutputCsv, -CommitHash, and optional -AppendToTable parameters
  - launches the app with the scenario flag in the background, attaches dotnet-counters collect --format csv to its PID, waits for the app to exit, then runs the existing extraction logic (lifted out of the runbook into a sourced PowerShell module so it has one source of truth)
  - prints the 16-metric markdown row block to stdout; with -AppendToTable writes it under the appropriate ## Phase 1 rows subsection of docs/reviews/phase-1-measurements.md
  - for MultiTagSoakScenario, also runs the existing per-tag rate-error post-processing and writes the …-rate-check.txt file next to the CSV
- a re-baseline of row 0 under the automated runner:
  - new row "row 0a — demo-baseline (automated)" in docs/reviews/phase-1-measurements.md, captured with DemoBaselineScenario at the same duration as row 0 (10 minutes)
  - row 0 (human) is preserved unchanged as historical evidence; a footnote on the table names row 0a as the reference for every Phase 1 delta from this slice forward
- a runbook update:
  - new section §3a. Automated capture (preferred) in docs/runbook/capturing-measurements.md covering the one-command path
  - existing manual procedure (§3) is preserved as the fallback for releases or for diagnosing capture-pipeline drift
  - §4.1 and §4.2 gain one line each naming the corresponding IScenario class so the manual step list and the automated implementation cannot diverge silently
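The duration-bound behaviour listed for ScenarioRunner above can be sketched roughly as follows. This is an illustration under stated assumptions, not the removed implementation: DurationBoundRun is a hypothetical name, the scenario and the Stop command are passed in as delegates, reaching the duration is treated as normal completion (exit code 0), and 1 is an assumed value for the spec's "non-zero on any unhandled exception".

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class DurationBoundRun
{
    // Runs the scenario until it finishes or the duration elapses.
    // Returns a process exit code: 0 on completion, 1 (assumed) on failure.
    public static async Task<int> RunAsync(
        Func<CancellationToken, Task> scenario,
        Func<Task> stop,
        TimeSpan duration)
    {
        using var cts = new CancellationTokenSource();
        cts.CancelAfter(duration); // the token is cancelled at --duration seconds

        try
        {
            await scenario(cts.Token).ConfigureAwait(false);
            return 0;
        }
        catch (OperationCanceledException) when (cts.IsCancellationRequested)
        {
            // Duration reached: stop any in-flight run before exiting.
            await stop().ConfigureAwait(false);
            return 0;
        }
        catch (Exception ex)
        {
            Console.Error.WriteLine($"Scenario failed: {ex.Message}");
            return 1;
        }
    }
}
```

Passing the scenario and Stop as delegates keeps the timing logic testable without a host or a view model.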
Out of Scope
- replacing dotnet-counters with an in-process OpenTelemetry exporter or a custom CSV writer in the app (a future slice may do this to remove the external tool dependency; not needed for this slice's goal)
- driving scenarios via Windows UI Automation (FlaUI / TestStack.White); see Verification Notes for why the command-driven approach is preferred
- CI integration (running captures on every PR via a Windows runner with display) — the orchestrator script will be CI-friendly, but wiring it into a workflow is out
- new measurement metrics beyond the existing 16-metric set; the metric definitions and aggregation rules in §5 of the runbook are unchanged
- new simulator profiles or scenarios beyond Demo Baseline and Multi-Tag Soak; SLICE-1.2/1.3/1.4 each add their own scenario when they land
- a GUI for picking and running scenarios; the entry point is CLI only
- back-porting captures of past slices under the new tooling; row 0 is re-baselined as 0a and that is the only retroactive capture this slice produces
Runtime Behavior
Entry-point routing
- Program.Main (or App.OnStartup) inspects the command line before showing any window. If --scenario is present, the app builds the host as normal but resolves and runs ScenarioRunner to completion instead of calling Application.Run on the main window. The dispatcher still spins (the view models depend on it); the main window simply is not shown.
- All other startup invariants are preserved: single-instance mutex, Serilog file sink, crash-handler registration, configuration validation. A scenario run that violates any of those (e.g. a second scenario kicked off while one is in flight) fails the same way an interactive launch would.
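The command-line inspection that precedes this routing could look something like the sketch below. It assumes flags arrive as "--name value" pairs. ScenarioOptions and ScenarioArgs are hypothetical names, and reusing exit code 2 for a bad duration is an assumption: the spec fixes code 2 only for unknown scenario names and asks for "non-zero" otherwise.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Hypothetical option bag; the spec names the flags, not this type.
public sealed class ScenarioOptions
{
    public string Scenario { get; set; } = "";
    public int DurationSeconds { get; set; }
    public string? Profile { get; set; }
    public int OperatorDelayMs { get; set; } // spec default: 0
}

public static class ScenarioArgs
{
    // Code 2 comes from the spec (unknown scenario); using it for every
    // invalid argument is an assumption of this sketch.
    public const int ExitInvalidArguments = 2;

    // Returns 0 with options == null when --scenario is absent (interactive
    // launch), 0 with options set for a valid request, and
    // ExitInvalidArguments plus an error message otherwise.
    public static int TryParse(
        string[] args, IEnumerable<string> knownScenarios,
        out ScenarioOptions? options, out string? error)
    {
        options = null;
        error = null;

        // Collect "--flag value" pairs; a flag with no value maps to "".
        var values = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
        for (int i = 0; i < args.Length; i++)
        {
            if (!args[i].StartsWith("--", StringComparison.Ordinal)) continue;
            bool hasValue = i + 1 < args.Length
                && !args[i + 1].StartsWith("--", StringComparison.Ordinal);
            values[args[i]] = hasValue ? args[++i] : "";
        }

        if (!values.TryGetValue("--scenario", out var requested))
            return 0;

        // Case-insensitive lookup against the registered scenario names.
        string? match = null;
        foreach (var known in knownScenarios)
        {
            if (string.Equals(known, requested, StringComparison.OrdinalIgnoreCase))
                match = known;
        }
        if (match == null)
        {
            error = $"Unknown scenario '{requested}'.";
            return ExitInvalidArguments;
        }

        if (!values.TryGetValue("--duration", out var durationText)
            || !int.TryParse(durationText, out var seconds)
            || seconds <= 0)
        {
            error = "--duration must be a positive whole number of seconds.";
            return ExitInvalidArguments;
        }

        int delayMs = 0;
        if (values.TryGetValue("--operator-delay", out var delayText)
            && (!int.TryParse(delayText, out delayMs) || delayMs < 0))
        {
            error = "--operator-delay must be a non-negative number of milliseconds.";
            return ExitInvalidArguments;
        }

        values.TryGetValue("--profile", out var profile);
        options = new ScenarioOptions
        {
            Scenario = match,
            DurationSeconds = seconds,
            Profile = profile,
            OperatorDelayMs = delayMs
        };
        return 0;
    }
}
```

Because the check is a pure function over the argument array, it can run before the host is built, which is what lets invalid input fail early.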
IScenario and IOperatorCommands
- IScenario.RunAsync(IOperatorCommands ops, IAppStateStore state, CancellationToken ct) is the single entry point. Implementations are sequential, awaitable, and cancellable.
- IOperatorCommands exposes the same ICommand instances that XAML binds to in MainViewModel; it is a facade, not a re-implementation. Calling ops.StartRun.Execute(null) runs through the same code path the button click runs through, including any CanExecute gating and any Dispatcher.InvokeAsync posting.
- Scenarios wait on state transitions by subscribing to IAppStateStore.Changes (or the existing observable surface), for example "wait until WorkflowState == Connected", rather than Task.Delay. A scenario that times out waiting for a transition fails with a diagnostic message naming the expected state and the actual state.
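A wait-on-transition helper of the kind described above might be sketched as follows. The spec does not define the members of IAppStateStore or WorkflowState, so everything here is a hypothetical stand-in: a plain Changed event replaces the Changes surface, the enum values other than Connected and Faulted are invented, and ScenarioWaits is an illustrative name. Task.Delay appears only as the timeout, not as the pacing mechanism.

```csharp
#nullable enable
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-ins for types the spec names but does not define.
public enum WorkflowState { Disconnected, Connected, Homed, Running, Faulted }

public interface IAppStateStore
{
    WorkflowState Current { get; }
    event Action<WorkflowState>? Changed; // assumed change notification
}

// Minimal in-memory fake, similar in spirit to the one the tests call for.
public sealed class InMemoryAppStateStore : IAppStateStore
{
    public WorkflowState Current { get; private set; } = WorkflowState.Disconnected;
    public event Action<WorkflowState>? Changed;

    public void Set(WorkflowState next)
    {
        Current = next;
        Changed?.Invoke(next);
    }
}

public static class ScenarioWaits
{
    // Completes when the store reports `expected`. Throws TimeoutException
    // naming the expected and actual state if `timeout` elapses first, and
    // OperationCanceledException if `ct` is cancelled.
    public static async Task WaitForStateAsync(
        IAppStateStore store, WorkflowState expected,
        TimeSpan timeout, CancellationToken ct)
    {
        var reached = new TaskCompletionSource<bool>(
            TaskCreationOptions.RunContinuationsAsynchronously);
        void OnChanged(WorkflowState s)
        {
            if (s == expected) reached.TrySetResult(true);
        }

        // Subscribe before checking Current so a transition that happens
        // between the two steps is not missed.
        store.Changed += OnChanged;
        try
        {
            if (store.Current == expected) return;

            using var timeoutCts = CancellationTokenSource.CreateLinkedTokenSource(ct);
            var timer = Task.Delay(timeout, timeoutCts.Token);
            var winner = await Task.WhenAny(reached.Task, timer).ConfigureAwait(false);
            if (winner == reached.Task)
            {
                timeoutCts.Cancel(); // release the timer
                return;
            }

            ct.ThrowIfCancellationRequested();
            throw new TimeoutException(
                $"Expected state {expected}, but actual state is {store.Current}.");
        }
        finally
        {
            store.Changed -= OnChanged;
        }
    }
}
```

A scenario step would then read as "execute Connect, await WaitForStateAsync(state, WorkflowState.Connected, ...)", so the pacing follows the app rather than a fixed sleep.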
--operator-delay
- An optional knob (default 0) inserts a sleep between scenario steps after each successful state transition. It exists to allow one-off comparison runs against the human-paced baseline; the Captured method column in the measurements table records its value when non-zero.
- The default is 0 because the goal is reproducibility, not human mimicry. See the Verification Notes for the row-0a comparison rationale.
Orchestrator script
- The script never starts the collector before the app exists, and never stops the app before the collector has flushed. Sequence: launch app → poll dotnet-counters ps for the new PID (timeout 30 s) → start dotnet-counters collect against that PID → wait for the app process to exit → wait one extra refresh interval → stop the collector → run extraction.
- The extraction module is the same PowerShell currently embedded in the runbook §5, lifted into tools/MeasurementExtraction.psm1. The runbook continues to inline the script for readability; the inlined copy is generated from the module by a one-line snippet so the two cannot drift.
Acceptance Criteria
This slice is satisfied only if all of the following are true:
- Running InspectionPrototype.App.exe --scenario DemoBaseline --duration 600 --output-csv <path> (or the dotnet-run equivalent) launches the app without showing a window, drives the §4.1 step list to completion, exits with code 0, and produces a CSV at the named path that has the same column shape as a dotnet-counters collect --format csv capture of the manual procedure.
- Running the same command with --scenario MultiTagSoak --duration 1800 --profile MultiTag produces a CSV that passes the existing §4.2 sanity checks (tags.active == 50, telemetry.ingested ≈ 20 Hz, ≥ 50 distinct tag.name dimensions in samples.ingested) and a per-tag rate-check .txt showing max error within ±2% of configured IntervalMs.
- Passing an unknown scenario name (--scenario Nonsense) or a duration ≤ 0 exits with a non-zero code and a clear error message before any host or counter session is started.
- tools/Capture-Measurements.ps1 -Scenario DemoBaseline -Duration 600 -OutputCsv <path> -CommitHash <hash> prints a 16-metric markdown row block to stdout matching the column shape of row 0; the same command with -AppendToTable appends that block under the correct ## Phase 1 rows subsection of docs/reviews/phase-1-measurements.md without disturbing existing rows.
- A new row "row 0a — demo-baseline (automated)" is committed to docs/reviews/phase-1-measurements.md with its own CSV under docs/captures/demo-baseline-automated-<date>.csv, captured at --operator-delay 0 against the same commit row 0 was captured against (or, if that commit's app no longer builds, against the slice-1.1 head; the Date column distinguishes them).
- The samples.ingested total in row 0a is within ±5% of row 0 normalized for runs completed (i.e. compare samples.ingested ÷ runs.completed, not raw totals; runs completed will differ because automation paces tighter); row 0a's runs.completed is allowed to exceed row 0's by any margin and that delta is documented in a footnote on the new row block, not flagged as a regression.
- docs/runbook/capturing-measurements.md gains a §3a. Automated capture section that documents the one-command path, names the location of the orchestrator script, and explicitly preserves §3 as the fallback procedure. §4.1 and §4.2 each name their corresponding IScenario class.
- The full existing test suite still passes, plus new tests covering: ScenarioRunner resolves a registered IScenario by name (case-insensitive) and exits non-zero on an unknown name; a FakeScenario runs to completion under the runner with a cancellation token; DemoBaselineScenario and MultiTagSoakScenario each reach their final step when driven against an in-memory fake of IOperatorCommands and IAppStateStore.
- No production code path that handles real operator clicks is altered; the scenarios call the same ICommand instances XAML binds to. A grep for MainViewModel.StartRun (or the equivalent command property) shows the same callers as before plus the scenarios.
- No new entry is added to the §4.x scenario list in the runbook or to the measurements table for this slice itself; this slice produces tooling, not a Phase 1 performance row. The only table edit is row 0a.
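The normalized ±5% comparison between row 0 and row 0a in the criteria above reduces to a small calculation, sketched here. BaselineComparison and its method names are hypothetical; the real extraction logic lives in PowerShell, so this is only meant to make the arithmetic explicit.

```csharp
using System;

public static class BaselineComparison
{
    // samples.ingested divided by runs.completed for one capture.
    public static double SamplesPerRun(double samplesIngested, double runsCompleted)
    {
        if (runsCompleted <= 0)
            throw new ArgumentOutOfRangeException(
                nameof(runsCompleted), "At least one completed run is required.");
        return samplesIngested / runsCompleted;
    }

    // Signed percentage difference of the candidate's per-run rate
    // relative to the reference capture's per-run rate.
    public static double NormalizedDeltaPercent(
        double referenceSamples, double referenceRuns,
        double candidateSamples, double candidateRuns)
    {
        double reference = SamplesPerRun(referenceSamples, referenceRuns);
        double candidate = SamplesPerRun(candidateSamples, candidateRuns);
        return (candidate - reference) / reference * 100.0;
    }

    // True when the delta is inside the tolerance band (spec: ±5%).
    public static bool WithinTolerance(double deltaPercent, double tolerancePercent = 5.0)
        => Math.Abs(deltaPercent) <= tolerancePercent;
}
```

With made-up numbers for illustration: a reference capture of 12,000 samples over 10 runs gives 1,200 per run, and an automated capture of 17,400 samples over 15 runs gives 1,160 per run. The normalized delta is about -3.3%, inside the band, even though the raw totals differ by 45%.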
Verification Notes
The implementation task for this spec must include verification for:
- the entry-point flag does not interfere with the single-instance mutex (verified by running two scenarios sequentially in the same shell — the second waits for the first to exit, same as two interactive launches)
- the orchestrator script handles the case where the app crashes mid-scenario: the collector is stopped, a partial CSV is preserved with a .partial suffix, and the script exits non-zero rather than emitting a misleading row block from a truncated capture
- per-tag rate error in the MultiTagSoak scenario under automation lands within the same ±2% bound that the manual §4.2 procedure produces (the bound is on the simulator, not the operator; automation should not move it)
- bypassing the UI dispatcher path is not a meaningful realism loss for the metrics this table tracks (frames.ingested, telemetry.ingested, GC counts, working set, CPU%) because all of those are produced by background services not driven by the dispatcher; UI Automation would pay flakiness for no metric-validity gain. This rationale lives in the new §3a alongside a note pointing readers to the manual §3 procedure when they need to verify a UI-binding regression specifically.
- the IScenario implementations exercise the commands, not the underlying methods; a regression test confirms that disabling StartRun.CanExecute (e.g. by holding the workflow in Faulted) causes the scenario to fail with a "command not executable" diagnostic rather than silently calling the underlying method anyway.
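One way to get the "command not executable" behaviour in the last item is to route every scenario step through a small guard that checks CanExecute first, since ICommand.Execute does not do so on its own. The sketch below is illustrative only: OperatorCommandInvoker and StubCommand are hypothetical names, and dispatcher marshalling is left out.

```csharp
#nullable enable
using System;
using System.Windows.Input;

public static class OperatorCommandInvoker
{
    // Executes the command only if it reports itself executable; otherwise
    // fails loudly instead of falling back to the underlying method.
    public static void ExecuteOrThrow(ICommand command, string name, object? parameter = null)
    {
        if (!command.CanExecute(parameter))
            throw new InvalidOperationException($"Command not executable: {name}");
        command.Execute(parameter);
    }
}

// Hypothetical test double standing in for a view-model command.
public sealed class StubCommand : ICommand
{
    private readonly Func<bool> _canExecute;

    public StubCommand(Func<bool> canExecute) => _canExecute = canExecute;

    public int ExecuteCount { get; private set; }

    // No subscribers are tracked in this stub.
    public event EventHandler? CanExecuteChanged { add { } remove { } }

    public bool CanExecute(object? parameter) => _canExecute();

    public void Execute(object? parameter) => ExecuteCount++;
}
```

A regression test along these lines would hold the stub's CanExecute at false (the analogue of a Faulted workflow), call ExecuteOrThrow, and assert both that the exception is raised and that ExecuteCount stays at zero.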