
SLICE-1.5: Automated Measurement Capture

Status: Superseded as of 2026-04-27. The shipped rig (IScenario, ScenarioRunner, --scenario CLI flag, Capture-Measurements.ps1) drove view-model ICommand instances directly without rendering the UI. That bypass was a known design tradeoff (see "Verification Notes" below), and the right replacement turned out to be a UI-Automation-driven approach (shipped as SLICE-1.6, FlaUI, 2026-04-27) that exercises the real XAML-binding path. The scenario classes, runner, CLI parsing, and orchestrator script have been removed; rows 0a, 0b, and slice-1-1-multi-tag-telemetry (captured under this rig) remain in the measurements table as historical evidence. The CSV-math helpers in tools/MeasurementExtraction.psm1 and the _disposed Interlocked guards on SimulatedTagSource / SimulatedCamera (fixes for real bugs surfaced during this slice's review) were kept. Read the rest of this spec for context only; do not implement against it.

Goal

Collapse the human-driven measurement-capture procedure into a single command per row. Add an in-app scenario runner that drives the existing view-model commands on a deterministic schedule, plus a thin orchestrator script that wraps dotnet-counters collect around it and emits a ready-to-paste 16-metric markdown block. After this slice, capturing a before/after pair for any Phase 1 slice should take ~10 minutes of wall-clock time and zero human button-clicks.

Why This Slice

The current capturing-measurements runbook prescribes three terminals, a stopwatch, a 10–30 minute scripted button-clicking sequence, and manual transcription of 16 metrics into a markdown table. SLICE-1.1's MultiTag scenario alone is 30 minutes; SLICE-1.2 (frame payloads), SLICE-1.3 (encoder-rate motion), and SLICE-1.4 (storm and soak) each need at least two captures (before and after) at comparable durations. That is several hours of operator time per slice for a procedure whose value is reproducibility, which a human at a stopwatch is bad at.

Three failure modes show up in practice:

  • Drift between captures. A human paces inter-action delays differently on Tuesday than on Friday. Run-rate variance shifts every count-style metric even though the code under test has not changed.
  • Forgotten steps. "Switch profile before Connect" is the kind of instruction that gets skipped under time pressure, and the §4.2 sanity check exists exactly because that mistake has been made before.
  • Transcription errors. Sixteen metrics, two columns each, hand-entered from a PowerShell printout — at SLICE-1.4 closeout there will be hundreds of cells in the table, and one fat-fingered digit is enough to make a delta look like a regression.

This slice exists to make the remaining Phase 1 slices' exit gates cheap enough that no one is ever tempted to skip them, and consistent enough that deltas across slices reflect real code changes rather than operator variance.

Requirements Coverage

  • 07. AI Delivery Constraints and Roadmap: each phase ships a measurable before-and-after; this slice protects that discipline by removing the operator-time tax that threatens it
  • 04. UI and Technical Requirements: the existing view-model command surface (Connect, LoadRecipe, Home, StartRun, Stop, Disconnect) gains a non-UI driver, exercising the same ICommand instances XAML binds to

In Scope

  • a new console entry-point flag on InspectionPrototype.App:
    • --scenario <name> --duration <seconds> [--profile <name>] [--operator-delay <ms>]
    • when present, the app boots normally (DI, Serilog, single-instance guard, AppState, all hosted services) but launches a ScenarioRunner instead of opening the main window, and exits cleanly with code 0 on scenario completion or non-zero on any unhandled exception
    • when absent, the app behaves exactly as today (interactive WPF window)
  • a new IScenario abstraction and at least two concrete scenarios in InspectionPrototype.Application.Scenarios:
    • DemoBaselineScenario — re-implements the §4.1 step list (Normal profile, Connect → Refresh → Load standard-5pt-wafer-scan → Home → repeated StartRun until duration)
    • MultiTagSoakScenario — re-implements the §4.2 step list (MultiTag profile selected before Connect, then the same Connect → Refresh → Load → Home → repeated StartRun)
    • each scenario receives MainViewModel (or a narrower IOperatorCommands facade exposing the same ICommand instances) and an IAppStateStore so it can wait on observable state transitions rather than sleeping for fixed delays
  • a ScenarioRunner hosted service that:
    • resolves the named scenario by --scenario value (case-insensitive); unknown names exit with a clear error and code 2
    • awaits scenario completion with a CancellationToken cancelled at --duration seconds (the runner stops in-flight runs cleanly via the existing Stop command before exit)
    • logs scenario start, every operator-action invocation, and scenario completion at Information level so the §4.x runbook entries can grep the log to verify the sequence
  • a tools/Capture-Measurements.ps1 orchestrator script that:
    • takes -Scenario, -Duration, -OutputCsv, -CommitHash, and optional -AppendToTable parameters
    • launches the app with the scenario flag in the background, attaches dotnet-counters collect --format csv to its PID, waits for the app to exit, then runs the existing extraction logic (lifted out of the runbook into a sourced PowerShell module so it has one source of truth)
    • prints the 16-metric markdown row block to stdout; with -AppendToTable writes it under the appropriate ## Phase 1 rows subsection of docs/reviews/phase-1-measurements.md
    • for MultiTagSoakScenario, also runs the existing per-tag rate-error post-processing and writes the …-rate-check.txt file next to the CSV
  • a re-baseline of row 0 under the automated runner:
    • new row row 0a — demo-baseline (automated) in docs/reviews/phase-1-measurements.md, captured with DemoBaselineScenario at the same duration as row 0 (10 minutes)
    • row 0 (human) is preserved unchanged as historical evidence; a footnote on the table names row 0a as the reference for every Phase 1 delta from this slice forward
  • a runbook update:
    • new section §3a. Automated capture (preferred) in docs/runbook/capturing-measurements.md covering the one-command path
    • existing manual procedure (§3) is preserved as the fallback for releases or for diagnosing capture-pipeline drift
    • §4.1 and §4.2 gain one line each naming the corresponding IScenario class so the manual step list and the automated implementation cannot diverge silently
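
The name-resolution rule for ScenarioRunner (case-insensitive lookup, exit code 2 on an unknown name) can be sketched in a few lines. The project itself is C#/.NET; the Python below is only an illustrative stand-in, and ScenarioRegistry and resolve_or_exit_code are hypothetical names that do not appear in the codebase.

```python
import sys
from typing import Callable, Dict, Optional, Tuple

EXIT_UNKNOWN_SCENARIO = 2  # exit code the spec assigns to unknown scenario names


class ScenarioRegistry:
    """Hypothetical registry mapping scenario names to factories, ignoring case."""

    def __init__(self) -> None:
        self._factories: Dict[str, Callable[[], object]] = {}

    def register(self, name: str, factory: Callable[[], object]) -> None:
        # Store the case-folded name so lookups are case-insensitive.
        self._factories[name.casefold()] = factory

    def resolve(self, name: str) -> object:
        factory = self._factories.get(name.casefold())
        if factory is None:
            known = ", ".join(sorted(self._factories))
            raise LookupError(f"unknown scenario '{name}'; known: {known}")
        return factory()


def resolve_or_exit_code(
    registry: ScenarioRegistry, name: str
) -> Tuple[Optional[object], int]:
    """Return (scenario, 0) on success, or (None, 2) after printing a clear error."""
    try:
        return registry.resolve(name), 0
    except LookupError as err:
        print(err, file=sys.stderr)
        return None, EXIT_UNKNOWN_SCENARIO
```

Keeping the lookup separate from the exit-code mapping lets a unit test exercise resolution without terminating the test process.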

Out of Scope

  • replacing dotnet-counters with an in-process OpenTelemetry exporter or a custom CSV writer in the app (a future slice may do this to remove the external tool dependency; not needed for this slice's goal)
  • driving scenarios via Windows UI Automation (FlaUI / TestStack.White) — see Verification Notes for why the command-driven approach is preferred
  • CI integration (running captures on every PR via a Windows runner with display): the orchestrator script will be CI-friendly, but wiring it into a workflow is out of scope
  • new measurement metrics beyond the existing 16-metric set; the metric definitions and aggregation rules in §5 of the runbook are unchanged
  • new simulator profiles or scenarios beyond Demo Baseline and Multi-Tag Soak; SLICE-1.2/1.3/1.4 each add their own scenario when they land
  • a GUI for picking and running scenarios; the entry point is CLI only
  • back-porting captures of past slices under the new tooling; row 0 is re-baselined as 0a and that is the only retroactive capture this slice produces

Runtime Behavior

Entry-point routing

  • Program.Main (or App.OnStartup) inspects the command line before showing any window. If --scenario is present, the app builds the host as normal but resolves and runs ScenarioRunner to completion instead of calling Application.Run on the main window. The dispatcher still spins (the view models depend on it) — the main window simply is not shown.
  • All other startup invariants are preserved: single-instance mutex, Serilog file sink, crash-handler registration, configuration validation. A scenario run that violates any of those (e.g. a second scenario kicked off while one is in flight) fails the same way an interactive launch would.
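
The routing decision described above (interactive launch when --scenario is absent, scenario launch otherwise, with argument validation before any host is built) might look roughly like the following. This is a Python sketch of the idea only, not the app's C# entry point; parse_launch_args is a hypothetical helper, and the flag names are the ones listed under In Scope.

```python
import argparse
from typing import List, Optional


def parse_launch_args(argv: List[str]) -> Optional[argparse.Namespace]:
    """Return None for an interactive launch, or the validated scenario options."""
    has_flag = any(a == "--scenario" or a.startswith("--scenario=") for a in argv)
    if not has_flag:
        return None  # no flag: behave exactly like the interactive launch

    parser = argparse.ArgumentParser(prog="InspectionPrototype.App")
    parser.add_argument("--scenario", required=True)
    parser.add_argument("--duration", type=int, required=True, help="seconds")
    parser.add_argument("--profile", default=None)
    parser.add_argument("--operator-delay", type=int, default=0, help="milliseconds")
    args = parser.parse_args(argv)

    # Reject bad durations before any host or counter session would be started.
    # parser.error() exits with status 2; the spec only requires a non-zero code.
    if args.duration <= 0:
        parser.error("--duration must be a positive number of seconds")
    return args
```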

IScenario and IOperatorCommands

  • IScenario.RunAsync(IOperatorCommands ops, IAppStateStore state, CancellationToken ct) is the single entry point. Implementations are sequential, awaitable, and cancellable.
  • IOperatorCommands exposes the same ICommand instances that XAML binds to in MainViewModel — it is a facade, not a re-implementation. Calling ops.StartRun.Execute(null) runs through the same code path the button click runs through, including any CanExecute gating and any Dispatcher.InvokeAsync posting.
  • Scenarios wait on state transitions by subscribing to IAppStateStore.Changes (or the existing observable surface) — for example, "wait until WorkflowState == Connected" — rather than Task.Delay. A scenario that times out waiting for a transition fails with a diagnostic message naming the expected state and the actual state.
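
Waiting on a state transition with a timeout, and failing with a message that names both the expected and the actual state, can be illustrated as follows. This is a minimal asyncio sketch in Python rather than the project's C#; StateStore and wait_until are hypothetical stand-ins and do not reproduce the real IAppStateStore surface.

```python
import asyncio


class StateStore:
    """Hypothetical observable state holder; create it inside a running event loop."""

    def __init__(self, initial: str) -> None:
        self.current = initial
        self._changed = asyncio.Condition()

    async def set(self, value: str) -> None:
        async with self._changed:
            self.current = value
            self._changed.notify_all()  # wake every waiter so it re-checks its predicate

    async def wait_until(self, expected: str, timeout_s: float) -> None:
        async def _wait() -> None:
            async with self._changed:
                await self._changed.wait_for(lambda: self.current == expected)

        try:
            await asyncio.wait_for(_wait(), timeout_s)
        except asyncio.TimeoutError:
            # Diagnostic names both the expected and the actual state.
            raise TimeoutError(
                f"expected state {expected!r}, actual state {self.current!r}"
            ) from None
```

A scenario step would then read as "execute the command, then await store.wait_until(...)", with no fixed sleep in between.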

--operator-delay

  • An optional knob (default 0) inserts a sleep between scenario steps after each successful state transition. It exists to allow one-off comparison runs against the human-paced baseline; the Captured method column in the measurements table records its value when non-zero.
  • The default is 0 because the goal is reproducibility, not human mimicry. See the Verification Notes for the row-0a comparison rationale.
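
To make the effect of --operator-delay concrete, here is a simplified Python sketch of the Demo Baseline step order with the delay applied after each step. It is an assumption-laden illustration, not the C# scenario: run_demo_baseline, invoke, should_stop, and sleep are hypothetical names, the command names are taken from the spec's command list, and the state-transition waits are omitted for brevity.

```python
from typing import Callable, List

# Setup steps from the Demo Baseline step list; StartRun then repeats until stopped.
SETUP_STEPS = ["Connect", "Refresh", "LoadRecipe", "Home"]


def run_demo_baseline(
    invoke: Callable[[str], None],
    should_stop: Callable[[], bool],
    sleep: Callable[[float], None],
    operator_delay_ms: int = 0,
) -> List[str]:
    """Drive the step list in order and return the names of the invoked steps."""
    log: List[str] = []

    def step(name: str) -> None:
        invoke(name)
        log.append(name)
        if operator_delay_ms > 0:  # the default of 0 adds no pacing at all
            sleep(operator_delay_ms / 1000.0)

    for name in SETUP_STEPS:
        step(name)
    while not should_stop():  # stands in for the duration-based cancellation
        step("StartRun")

    invoke("Stop")  # stop any in-flight run cleanly before exit
    log.append("Stop")
    return log
```

Injecting should_stop and sleep keeps the sketch deterministic under test, which is the same property the slice is trying to buy for the captures themselves.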

Orchestrator script

  • The script never starts the collector before the app exists, and never stops the app before the collector has flushed. Sequence: launch app → poll dotnet-counters ps for the new PID (timeout 30 s) → start dotnet-counters collect against that PID → wait for the app process to exit → wait one extra refresh interval → stop the collector → run extraction.
  • The extraction module is the same PowerShell currently embedded in the runbook §5, lifted into tools/MeasurementExtraction.psm1. The runbook continues to inline the script for readability; the inlined copy is generated from the module by a one-line snippet so the two cannot drift.
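
The ordering guarantee in the sequence above can be sketched with injected collaborators. The real orchestrator is a PowerShell script; this Python version only illustrates the ordering, and app, collector, extract, poll_for_pid, and capture are hypothetical names. The 1-second refresh interval and the 0.5-second poll interval are assumed defaults, not values taken from the script.

```python
import time
from typing import Callable, Optional


def poll_for_pid(
    find_pid: Callable[[], Optional[int]],
    timeout_s: float = 30.0,
    interval_s: float = 0.5,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Poll until find_pid() returns a PID, or raise once timeout_s has elapsed."""
    deadline = clock() + timeout_s
    while True:
        pid = find_pid()
        if pid is not None:
            return pid
        if clock() >= deadline:
            raise TimeoutError(f"app PID not visible within {timeout_s} s")
        sleep(interval_s)


def capture(app, collector, extract, refresh_interval_s=1.0, sleep=time.sleep):
    """Launch, attach, wait, flush, stop, extract, in that order."""
    app.launch()
    pid = poll_for_pid(app.find_pid, sleep=sleep)  # collector never starts before the app exists
    collector.start(pid)
    try:
        exit_code = app.wait_for_exit()
        sleep(refresh_interval_s)  # one extra interval so the last sample is flushed
    finally:
        collector.stop()  # always stop the collector, even if waiting failed
    if exit_code != 0:
        # A crashed app must not yield a row block from a truncated capture.
        raise RuntimeError(f"app exited with code {exit_code}")
    return extract()
```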

Acceptance Criteria

This slice is satisfied only if all of the following are true:

  1. Running InspectionPrototype.App.exe --scenario DemoBaseline --duration 600 --output-csv <path> (or the dotnet-run equivalent) launches the app without showing a window, drives the §4.1 step list to completion, exits with code 0, and produces a CSV at the named path that has the same column shape as a dotnet-counters collect --format csv capture of the manual procedure.
  2. Running the same command with --scenario MultiTagSoak --duration 1800 --profile MultiTag produces a CSV that passes the existing §4.2 sanity checks (tags.active == 50, telemetry.ingested ≈ 20 Hz, ≥ 50 distinct tag.name dimensions in samples.ingested) and a per-tag rate-check .txt showing max error within ±2% of configured IntervalMs.
  3. Passing an unknown scenario name (--scenario Nonsense) or a duration ≤ 0 exits with a non-zero code and a clear error message before any host or counter session is started.
  4. tools/Capture-Measurements.ps1 -Scenario DemoBaseline -Duration 600 -OutputCsv <path> -CommitHash <hash> prints a 16-metric markdown row block to stdout matching the column shape of row 0; the same command with -AppendToTable appends that block under the correct ## Phase 1 rows subsection of docs/reviews/phase-1-measurements.md without disturbing existing rows.
  5. A new row row 0a — demo-baseline (automated) is committed to docs/reviews/phase-1-measurements.md with its own CSV under docs/captures/demo-baseline-automated-<date>.csv, captured at --operator-delay 0 against the same commit row 0 was captured against (or, if that commit's app no longer builds, against the slice-1.1 head — the Date column distinguishes them).
  6. The samples.ingested total in row 0a is within ±5% of row 0 normalized for runs completed (i.e. compare samples.ingested ÷ runs.completed, not raw totals — runs completed will differ because automation paces tighter); row 0a's runs.completed is allowed to exceed row 0's by any margin and that delta is documented in a footnote on the new row block, not flagged as a regression.
  7. docs/runbook/capturing-measurements.md gains a §3a. Automated capture section that documents the one-command path, names the location of the orchestrator script, and explicitly preserves §3 as the fallback procedure. §4.1 and §4.2 each name their corresponding IScenario class.
  8. The full existing test suite still passes, plus new tests covering: ScenarioRunner resolves a registered IScenario by name (case-insensitive) and exits non-zero on an unknown name; a FakeScenario runs to completion under the runner with a cancellation token; DemoBaselineScenario and MultiTagSoakScenario each reach their final step when driven against an in-memory fake of IOperatorCommands and IAppStateStore.
  9. No production code path that handles real operator clicks is altered — the scenarios call the same ICommand instances XAML binds to. A grep for MainViewModel.StartRun (or the equivalent command property) shows the same callers as before plus the scenarios.
  10. No new entry is added to the §4.x scenario list in the runbook or to the measurements table for this slice itself — this slice produces tooling, not a Phase 1 performance row. The only table edit is row 0a.
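
Criteria 2 and 6 both reduce to small arithmetic checks, sketched below in Python for illustration (the project's actual extraction logic is PowerShell). The function names are hypothetical, and the numbers in the usage example are made up to show the calculation, not measured values.

```python
from typing import Dict, Tuple


def samples_per_run(samples_ingested: float, runs_completed: int) -> float:
    """Normalize samples.ingested by runs.completed (criterion 6)."""
    if runs_completed <= 0:
        raise ValueError("runs_completed must be positive")
    return samples_ingested / runs_completed


def within_tolerance(actual: float, reference: float, tolerance: float) -> bool:
    """True when actual is within +/- tolerance (a fraction) of reference."""
    return abs(actual - reference) <= tolerance * abs(reference)


def max_rate_error(intervals_ms: Dict[str, Tuple[float, float]]) -> float:
    """Largest relative error across tags; values are (configured, observed) in ms."""
    return max(abs(obs - cfg) / cfg for cfg, obs in intervals_ms.values())


# Illustrative numbers only: the automated row completes more runs than the
# human row, but the per-run sample count is what gets compared.
row_0_per_run = samples_per_run(12_000, 40)   # 300 samples per run
row_0a_per_run = samples_per_run(18_600, 60)  # 310 samples per run
criterion_6_ok = within_tolerance(row_0a_per_run, row_0_per_run, 0.05)

# Per-tag rate check against the 2% bound from criterion 2.
criterion_2_ok = max_rate_error({"tag-01": (50.0, 50.5), "tag-02": (50.0, 49.2)}) <= 0.02
```

Comparing per-run values rather than raw totals is what lets row 0a exceed row 0 in runs.completed without being flagged as a regression.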

Verification Notes

The implementation task for this spec must include verification for:

  • the entry-point flag does not interfere with the single-instance mutex (verified by running two scenarios sequentially in the same shell — the second waits for the first to exit, same as two interactive launches)
  • the orchestrator script handles the case where the app crashes mid-scenario: the collector is stopped, a partial CSV is preserved with a .partial suffix, and the script exits non-zero rather than emitting a misleading row block from a truncated capture
  • per-tag rate error in the MultiTagSoak scenario under automation lands within the same ±2% bound that the manual §4.2 procedure produces (the bound is on the simulator, not the operator — automation should not move it)
  • bypassing the UI dispatcher path is not a meaningful realism loss for the metrics this table tracks (frames.ingested, telemetry.ingested, GC counts, working set, CPU%) because all of those are produced by background services not driven by the dispatcher; UI Automation would pay flakiness for no metric-validity gain. This rationale lives in the new §3a alongside a note pointing readers to the manual §3 procedure when they need to verify a UI-binding regression specifically.
  • the IScenario implementations exercise the commands, not the underlying methods: a regression test confirms that disabling StartRun.CanExecute (e.g. by holding the workflow in Faulted) causes the scenario to fail with a "command not executable" diagnostic rather than silently calling the underlying method anyway.
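
The CanExecute gating that the last verification item asks for can be illustrated with a small Python sketch. The real commands are WPF ICommand instances in C#; Command, CommandNotExecutableError, and invoke below are hypothetical stand-ins that only show the intended failure behavior.

```python
from typing import Callable


class CommandNotExecutableError(RuntimeError):
    """Raised when a scenario tries to run a command whose gate is closed."""


class Command:
    """Hypothetical analogue of a command with an execute action and a gate."""

    def __init__(
        self,
        name: str,
        execute: Callable[[], None],
        can_execute: Callable[[], bool] = lambda: True,
    ) -> None:
        self.name = name
        self.execute = execute
        self.can_execute = can_execute


def invoke(command: Command) -> None:
    """Run the command only if its gate allows it; never bypass the gate."""
    if not command.can_execute():
        raise CommandNotExecutableError(f"command not executable: {command.name}")
    command.execute()
```

A regression test in this style holds the workflow in a faulted state, calls invoke on StartRun, and asserts both that the error is raised and that the underlying action never ran.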
