Phase 1 Capabilities and Limits

Date: 2026-05-07
Phase: 1 (Simulator to scale) — complete
Audience: anyone deciding whether to take this prototype's patterns into a similar later project, anyone evaluating the gap between this simulator and a real wafer inspection machine, anyone explaining "what does this evidence actually let us claim?"

This document is the single place to read what Phase 1 measured, what those measurements support, and — equally important — what they do not. It exists because the evidence is otherwise scattered across the measurements table (results), appsettings.json (settings), the design notes (mechanics), and the retrospective (synthesis). To answer the questions a future engineer or a reviewer actually has, you need all of those at once. This doc inlines them.

The doc is structured around the questions you'd actually ask:

What did Phase 1 measure?
What architectural patterns does that evidence support?
Where does the simulator differ from real machines, and what does that mean for confidence?
What can you claim from this work, and what can't you?
What raises confidence further, and on what timeline?

Companion docs are linked where relevant; this doc is the reading entry point.

1. Executive summary

Phase 1 closed 2026-05-03 with five slices completed and one exit gate met. Working-set steady-state drift over 8 hours of real time = −2.7 MB (well within the 50 MB ceiling). The simulator at the load shapes Phase 1 produced does not destabilize the canonical state store, the workflow state machine, or any of the bounded streaming pipelines.
The architectural patterns are durable and reusable. Canonical state store with explicit owner of mutations, bounded streaming with DropOldest back-pressure, layered architecture (Domain / Application / Infrastructure / Presentation), strict workflow state machine, profile-driven simulator, docs-first discipline. These transfer to similar industrial-control projects with high confidence.
The behavioral fidelity to real hardware is unproven. No real SDK has been swapped in. The simulator is faithful to the shape of real machine load, not to its fidelity. Five specific dimensions are listed in §4 as gaps that real-hardware integration would need to close.
Three measurement-criterion amendments landed during Phase 1. All three were methodology fixes (Windows scheduling reality forced a rewrite of the metric), not architectural failures. The pattern itself is a Phase-1 lesson: pre-specified target numbers should leave room for "the platform's behavior turned out to require a different measurement to express the same architectural intent."
Phase 2 opens with measurement, not refactoring. The roadmap §3 explicitly says "if the app survives this run beautifully, Phase 2 is deferred." It did. SLICE-2.0 measures AppStateStore.Update allocation share and lock-wait p95; the resulting numbers gate whether 2.1 / 2.2 / 2.3 / 2.4 ship as planned, get reordered, or get deferred entirely. Deferring Phase 2 is a first-class outcome.

If you read only one section past this summary, read §4 (where the simulator differs from real machines) — that is the gap between "this prototype demonstrates good patterns" and "this code can drive a real wafer inspection machine."

2. What was measured

Each row below has its scenario, the exact profile values used, the headline numbers, and what each row proves and doesn't prove. CSV evidence at docs/captures/<row>.csv. Detailed per-metric tables and notes are in phase-1-measurements.md; this section distills them into reading-friendly form.

Row 0 — demo baseline (pre-Phase-1)

The starting point. Captured 2026-04-23 against commit 7ecef05 to anchor every subsequent delta.

Scenario: Demo profile, manual operator-driven, ~10 minutes
Profile (Normal): MotionSpeed=20, TelInterval=200ms, FrameInterval=500ms, Frame=640×480×1, EncoderInterval=5ms, DefectProb=0.05, ConnFail=0.20
Headline: 41 runs in 687 s, 492 frames, 3 315 telemetry events, 203 MB working-set peak, 0.99% avg CPU, 0 dropped, 0 coalesced, 0 faulted

This is the "before" reference. Subsequent Phase 1 rows compare against it (or against a more recent baseline) to isolate what each slice changed.

Row `slice-1-1-multi-tag-telemetry` (30 min, MultiTag profile)

Validates the rebuilt 50-tag pipeline at sustained rate.

Scenario: Connect → Home → repeated Run cycles for 30 minutes
Profile (MultiTag): MotionSpeed=20, TelInterval=50ms (20 Hz snapshot), FrameInterval=500ms, Frame=640×480×1, EncoderInterval=5ms, DefectProb=0.05, ConnFail=0.20
Tag registry: 50 tags spanning 8 interval bands (2 ms / 4 ms / 10 ms / 20 ms / 50 ms / 100 ms / 200 ms / 1000 ms) with all 4 noise variants (Sine / Drift / RandomWalk / Step)
Headline: 174 runs, all 50 tags emitting, aggregate telemetry rate 19.7 Hz, per-tag rate distribution recorded across all 50 tags
Per-tag accuracy under MultiTag: ≤5 Hz tags within ±2%; 10 Hz tags hit ~9.2 Hz (−8%); 50 Hz tags hit ~32 Hz (−36%); ≥100 Hz tags cap near 64 Hz (capped by default Windows 15.6 ms timer tick)
Criterion 7 amended: original "±2% across all bands" target was unachievable on Windows; amended to documented-not-gated, with the actual achievable bands recorded above

Architecturally proven: 50 concurrent emitter Tasks + 1 snapshot publisher + bounded channel handle the configured rates without back-pressure escape; per-tag dimensions on samples.ingested work across thread-pool boundaries; MainViewModel correctly projects two reserved tags (temperature.celsius, pressure.bar) from LatestTagValues.

Limits:

Per-tag rate accuracy is bounded by Windows scheduling, not by code quality. Sub-15 ms tag rates require a winmm.timeBeginPeriod(1) boost (which only the encoder source acquires, not the tag source).
Only 2 of the 50 tags are bound to the UI; the other 48 are produced but not displayed. UI consumption is a Phase 3 question.

Row `slice-1-2-real-frame-payloads` (10 min, HighFrameRate profile)

Validates the camera + frame pipeline + WriteableBitmap rendering path under real LOH allocation pressure.

Scenario: Connect → Home → repeated Run cycles for 10 minutes
Profile (HighFrameRate): MotionSpeed=50, TelInterval=100ms, FrameInterval=33ms (~30 fps), Frame=2048×1024×1 (2 MP × 1 byte), EncoderInterval=5ms, DefectProb=0.05, ConnFail=0.05
Headline: 8 154 frames ingested in 608 s, 0 dropped, gen-2 GC count = 2 713 (4.5/s), gc-pause-p95 = 11.76 ms, LOH-alloc-rate avg = 1.04 MB/s, working-set peak = 237.2 MB, alloc-rate avg = 35.1 MB/s

Architecturally proven: the bounded frame channel (capacity=3, DropOldest) absorbs 30 fps × 2 MP under sustained load; the consumer's WriteableBitmap.Lock/WritePixels/Unlock cycle keeps up with production; LOH allocation pressure (each 2 MP frame ≥ 85 KB LOH threshold) is GC-handleable without dropped frames.

Limits:

The captured frame count (8 154) was below the original criterion-6 target of ≥17 500. Diagnosis: SimulatedCamera only streams during active runs (Connected+Running state), not continuously, so the 30 fps rate only applies to the active fraction of wall-clock time. Pipeline behavior is correct; the criterion's continuous-streaming assumption was wrong. Filed as follow-up.
30 fps × 2 MP is well below real wafer inspection rates (50-200 kHz line scans). The pipeline architecture is faithful; the throughput envelope isn't.
LOH allocation is the slice's purpose, not its cost — pooling is deferred until measurement shows it matters. Phase 2 will revisit if the soak data ever surfaces pressure here (the 8-hour Soak8h didn't).

Row `slice-1-3-encoder-rate-motion` (10 min, EncoderRate profile)

Validates the high-rate encoder stream that deliberately bypasses AppState as the load-bearing design preview for Phase 2.3 data-plane lift-out.

Scenario: Connect → Home → repeated Run cycles for 10 minutes
Profile (EncoderRate): MotionSpeed=20, TelInterval=50ms, FrameInterval=500ms, Frame=640×480×1, EncoderInterval=1ms (1 kHz target), DefectProb=0.05, ConnFail=0.05
Encoder config: two axes (X, Y), each with RandomWalkNoise(Baseline=0, StepStdDev=0.0005, ClampMin=−0.01, ClampMax=0.01)
System-timer boost: producer acquires winmm!timeBeginPeriod(1) for its lifetime
Headline: encoder-rate-x = 656.6 Hz, encoder-rate-y = 656.6 Hz on both axes (criterion 7 originally specified 1 kHz ± 2%); runs.faulted=0, frames.dropped=0, tags.active=50, gen-2 GC = 50, gc-pause-p95 = 7.9 ms, working-set peak = 223.3 MB, LOH-alloc-rate = 23 KB/s
Criterion 7 amended: the 1 kHz target is capped on Windows at ~657 Hz by PeriodicTimer + winmm.timeBeginPeriod(1) per-tick scheduling overhead plus per-tick producer work (~0.5 ms total). Same Windows-platform-reality pattern as the multi-tag amendment. Architectural goal — encoder receiver reachable through non-AppState channel at hundreds of Hz without destabilizing the workflow — is satisfied.

Architecturally proven: the encoder pipeline service drains the channel and emits axis-dimension metrics without ever calling AppStateStore.Update (verified by EncoderStreamPipelineServiceTests.UpdateCount_Equals_Zero). The data-plane bypass design works: high-rate counters can be reached without elevating the AppState change rate. This is the architectural rehearsal for Phase 2.3.

Limits:

The 657 Hz ceiling is platform-bounded on Windows. Real encoder boards run at 10-100 kHz on dedicated hardware. The architecture would carry forward; the absolute rate would not. Encoder-cadence remediation (Stopwatch-busy-yield, timeSetEvent, CreateWaitableTimerEx(TIMER_HIGH_RESOLUTION)) is a filed follow-up.
The encoder data is captured to a metric counter and a channel; no UI surfaces the high-rate stream. Phase 2 / 3 work.

Row `slice-1-4-chaos-monkey` (30 min, ChaosMonkey profile)

Validates the workflow state machine's recovery path under aggressive fault injection.

Scenario: Connect → Home → repeated Run cycles for 30 minutes
Profile (ChaosMonkey): MotionSpeed=50, TelInterval=50ms, FrameInterval=100ms (10 fps), Frame=1024×768×1, EncoderInterval=5ms, DefectProb=0.05, ConnFail=0.30, DefectShowerEvery=30000ms / Duration=3000ms, AlarmBurstEvery=45000ms, TelDropout=0.05, NetworkLatencyMean=250ms / Stddev=150ms, TimeCompression=1.0
Simulator:FlakySdk (manually flipped to true for this capture): Enabled=true, TimeoutChance=0.05, IgnoreCancellationChance=0.05, OutOfBandThrowChance=0.05, TimeoutHangMs=30000
Headline: 491 runs.started, 453 runs.completed, 37 runs.faulted, 37 fault-cycles; frames.ingested=10 469, frames.dropped=0; gc-pause-p95=10.28 ms; working-set peak=225.3 MB; LOH-alloc-rate=323 KB/s
Criterion-11 log evidence (verified): 39 critical-fault injections, 39 fault-clears, 37 recovery-completed events, 120 defect-shower transitions over the 30-minute window. All four WorkflowService fault branches (connect-failure / fault-during-home / fault-during-run / clear-and-recover) hit at least once and were verified by log inspection.

Architecturally proven: the workflow state machine survives bursty fault injection; the inject → clear → recover cycle completes 37 times in 30 minutes; the 92.3% completion rate (453/491) is exclusively due to faults that interrupt in-flight runs, not framework defects.

Limits:

ChaosMonkey injects faults as independent uniform-distribution events. Real-world faults often correlate cascadingly (loss of vacuum → multiple alarms in milliseconds). The chaos profile does not model fault dependency graphs; race conditions under correlated faults are unproven.
The capture used FlakySdk:Enabled=true. The merged appsettings.json ships Enabled=false so existing rows reproduce. Reproducing the ChaosMonkey row requires manually flipping Enabled to true — runbook §4.5 documents this. The capture is not bit-for-bit reproducible against the merged commit by design.
Four FlaUI scenario-rig hardening fixes (bf32566, 0f1596a, 5462d42, 2108272) landed during this capture. Each was a retry/wait loop in the test scenario, not an application-layer change. Application behavior was correct throughout; the test rig was the surface that needed hardening.

Row `slice-1-4-soak-8h` (8 hours, Soak8h profile)

Validates that the process does not leak memory across real-time hours of sustained load.

Scenario: Connect → Home → repeated Run cycles for 8 hours of wall-clock time on a sleep- and hibernate-disabled host
Profile (Soak8h): MotionSpeed=30, TelInterval=100ms, FrameInterval=250ms (4 fps), Frame=1024×768×1, EncoderInterval=5ms, DefectProb=0.05, ConnFail=0.05, DefectShowerEvery=600000ms / Duration=5000ms, AlarmBurstEvery=0 (disabled — leak detection, not fault-path coverage), TelDropout=0.01, NetworkLatencyMean=50ms / Stddev=20ms, TimeCompression=1.0
Simulator:FlakySdk: Enabled=false (default)
Headline: 28 809 s capture span (1% drift from 8 h target); 5 109 runs.started, 5 109 runs.completed, 0 faulted, 2 frames.dropped (0.003%); telemetry.coalesced=12 over 28 809 s (1 every 40 minutes — consistent with TelemetryDropoutChance=0.01 deferring snapshots, not structural backpressure); gen-2 GC = 11 185 (1 398/hr); gc-pause-p95=12.42 ms; LOH-alloc-rate=239 KB/s
Working-set time series (sampled at 14 timepoints): t=0: 47.5 MB → t=29 s: 230.9 MB (startup ramp complete) → avg(min 5-30): 235.4 MB → avg(h 4-5): 234.0 MB → avg(h 7-8): 232.7 MB → p99: 238.1 MB → max: 246.0 MB
Working-set steady-state drift = avg(last 60 min) − avg(min 5-60) = 232.7 − 235.4 = −2.7 MB
Criterion 12 amended: original metric working-set growth = last − first read 186.5 MB and would have failed the 50 MB ceiling. Direct CSV inspection showed the entire delta is the process startup ramp (47.5 → 230.9 MB in 29 s — WPF + 50 tag emitters + encoder pipeline + JIT). The 7.5-hour plateau is stable. The amended metric isolates the steady-state behavior the slice was designed to measure. The 50 MB ceiling itself is unchanged.

Architecturally proven: no leak. Across 8 hours of real time and 5 109 run cycles, the working-set steady-state mean decreased by 2.7 MB. Gen-2 GC rate (1 398/hr) is well within the 4× ceiling vs. slice-1-2 (16 280/hr). The state machine, the streaming pipelines, the simulator producers, and the WPF rendering surface all hold their resources cleanly across long-duration operation.

Limits:

8 hours is the leak-detection window; it is not the production deployment window. Real wafer inspection tools run 24/7. The 8-hour evidence is necessary but not sufficient for "this can run unattended for a week."
The Soak8h profile is deliberately gentle (low chaos, no alarm bursts). It does not stress the fault-recovery path the way ChaosMonkey does. Combining ChaosMonkey-style chaos with Soak8h duration is an unmeasured intersection; filed as a future Phase 2 follow-up if needed.

3. Architectural patterns the evidence supports

This is the durable, transferable part — what later projects of similar shape can reuse with confidence.

Canonical state store with explicit owner of mutations (AppStateStore.Update is the single mutation point; Lock serializes; StateChanged event fires outside the lock).

Demonstrated by: 5 109 sequential workflow run cycles + 37 chaos fault cycles produced consistent state across 8 hours; UI projection has never observed corruption; the RecordingAppStateStore test fake has caught every accidental write attempt across all slices that promised "no AppState write" (notably the encoder pipeline).
Limit: under-store-side-load contention is not yet measured. SLICE-2.0 instruments AppStateStore.Update with [CallerMemberName] + Stopwatch + GC.GetAllocatedBytesForCurrentThread(). Until the SLICE-2.0 row lands, "the canonical store scales" is a hypothesis informed by Phase 1's clean exit, not a measurement.

Bounded streaming with DropOldest back-pressure, sized per stream (frame=3, tag=1, encoder=1).

Demonstrated by: 71 530 frames absorbed with 2 drops over 8 hours; tag pipeline produced 286 968 telemetry events with 12 snapshot coalesces; encoder pipeline ran at 657 Hz × 2 axes with runs.faulted=0 and no destabilization of the workflow.
Limit: real wafer inspection rates are 100-1000× higher than what Phase 1 measured. The architecture is the right shape; the absolute throughput envelope is not. Real-rate validation is a Phase 4 concern.

Workflow state machine with Stop / Abort / Fault distinctions (cooperative-stop boundary, immediate-abort token cancellation, fault as a third terminal mode requiring explicit operator recovery).

Demonstrated by: ChaosMonkey hit all four fault branches in 30 min with full clear-and-recover cycles completing in <2 s each; Soak8h's 5 109-run loop never produced an unexpected workflow termination; the AlarmBurster's tolerance of RecoverAsync rejection (by design) is exercised in unit tests.
Limit: chaos faults are independent uniform events. Real fault cascades (correlated, time-varying probability, with hidden dependency graphs) are not modeled.

Layered architecture (Domain / Application / Infrastructure / Presentation) with no cross-layer leaks; Application has zero WPF references; Infrastructure depends on Application abstractions only via interfaces.

Demonstrated by: SLICE-1.6 retired the headless capture rig and built a new FlaUI rig without touching the Application or Domain layers; SLICE-1.4 added 3 services + a decorator + 7 profile fields + 2 profiles without modifying the Domain layer at all; the encoder pipeline was added in SLICE-1.3 with a single new IEncoderStream abstraction that didn't disrupt any existing consumer.
Limit: the real layered-architecture stress test is a real SDK swap (Phase 4.1, "first real SDK swap"). Until then, the simulator implements every machine-boundary interface, which is the layer where most coupling normally hides. The clean swap so far is encouraging but not the final test.

Profile-driven simulator — 8 distinct profiles drive 8 distinct load shapes from one codebase, with no profile-specific conditionals.

Demonstrated by: every Phase 1 profile (Normal, Demo, HighDefect, MultiTag, HighFrameRate, EncoderRate, ChaosMonkey, Soak8h) selectable at runtime through the same UI surface; profile changes apply on next operation, never to in-flight work; SimulatorProfilesValidator rejects out-of-range values at startup so config errors fail fast.
Limit: the profile field count grew from 6 to 17 fields in one phase. Sustained growth would push toward profile categories or per-subsystem profile blocks; the current single-flat-record shape works at 17 fields, less obvious it works at 50.

Encoder data-plane bypass of AppState as a Phase-2.3 rehearsal.

Demonstrated by: the encoder pipeline service emits per-axis metrics without writing to AppState; verified by a load-bearing test that asserts RecordingAppStateStore.UpdateCount == 0 after draining 10 snapshots.
Limit: this is a single subsystem's rehearsal of the lift-out. Whether the same pattern works for the tag stream (which has 50 emitters and is read by MainViewModel) and the frame stream (which is consumed by WriteableBitmap rendering) is conditional on SLICE-2.0's measurement of caller-side allocation share.

Docs-first discipline + measurement-gated phases.

Demonstrated by: every slice has a spec, a task, a measurement row, and a design-notes deep-dive; phase exit gates are met with explicit row-block evidence rather than vibes; three measurement-criterion amendments are documented as methodology fixes (not architectural failures), preserving the spec-time intent while updating the measurement to fit platform reality.
Limit: measurement discipline is only as good as the simulator's faithfulness to real load. Calibrating the simulator's profiles against a real machine's actual load envelope would close that loop; currently the Phase 1 numbers are reproducible against the simulator, not against any specific real machine.

4. Where the simulator differs from real machines

This is the gap between "this prototype demonstrates good patterns" and "this code can drive a real wafer inspection machine." Five categories.

4.1 SDK behavior shapes

What the simulator does:

SimulatedMachineConnection.ConnectAsync is Task.Delay(1500ms) + Gaussian latency + a uniform-random coin flip against ConnectionFailureProbability.
SimulatedMotionController linearly interpolates between commanded positions with Task.Delay(20ms) per tick.
SimulatedCamera.ProduceFramesAsync allocates a byte[] per tick and fills it with a deterministic gradient.
SimulatorFaultInjector is a manual-injection API; AlarmBursterService exercises it with a 5-code round-robin pool.
SimulatedMachineSignals exposes a SafetySignals snapshot updated by manual-override methods on ISimulatorSignalControl.

What real machines do:

Real PLC connections require handshake state machines, vendor-specific protocol error codes (Allen-Bradley CIP, Siemens S7, OPC UA), reconnection sequences with exponential backoff, certificate validation, security context negotiation. A Task.Delay + coin flip is a useful test scaffold; it is not a model of failure modes you'll meet on real hardware.
Real motion controllers have S-curve acceleration profiles, encoder feedback servo loops, soft and hard limit switches, drift compensation, vendor-specific homing routines (touch-off, edge-find, stall-detection). Linear interpolation with Task.Delay produces faithful position-over-time behavior; it does not produce faithful failure behavior (servo following errors, encoder dropouts, axis jam recoveries).
Real cameras (linescan or area-scan) have asynchronous frame-arrival callbacks delivered by the SDK, buffer ownership semantics where the application must release buffers back to a hardware pool, format negotiation (Bayer / mono / RGB / packed / planar), exposure/gain/trigger configuration with hardware-side validation, DMA buffer lifetimes that prevent allocation per frame entirely.
Real machines emit hundreds of distinct fault codes from heterogeneous subsystems: PLC alarms, drive amplifier faults, vacuum gauge thresholds, RGA partial-pressure trips, temperature controller deviations, gas-flow alarms, interlock-state changes. Each subsystem has its own timing and severity model. The 5-code round-robin is a coverage test, not a faithful fault distribution.

Gap: behavioral fidelity, not architectural fidelity. The interfaces (IMachineConnection, IMotionController, ICameraController, IFaultInjector, IMachineSignals) are the right shape — they survive an SDK swap structurally. The implementations behind those interfaces don't surface the failure modes that real hardware does.

What would close it: Phase 4.1 ("first real SDK swap"). The roadmap names this explicitly as the "smallest surface" pick (likely IMachineConnection or ILightController). One real swap forces every assumption about the simulator's failure behavior to confront real-hardware reality.

4.2 Real-world rate envelopes

What the simulator does:

Frame pipeline: 30 fps × 2 MP at peak (HighFrameRate profile = 33 ms × 2048×1024×1).
Tag pipeline: aggregate 20 Hz snapshot rate (50 ms TelemetryIntervalMs); per-tag rates from 1 Hz to ~64 Hz (capped by Windows scheduling).
Encoder pipeline: 657 Hz at receiver (1 ms target capped by PeriodicTimer + winmm).

What real machines do:

Wafer inspection linescan cameras run at 50-200 kHz line rates, often with multi-channel illumination (multiple bayer or mono streams per second) and hardware ROI selection that the application configures but doesn't process pixel-by-pixel.
PLC scan cycles run at 1-10 ms with hard determinism guarantees; tag values arrive on those scan boundaries with sub-millisecond jitter, not the variable jitter the simulator produces.
Encoder feedback runs at 10-100 kHz on dedicated motion-control hardware boards. The C# host typically subscribes to summary statistics (position-error histograms, dropout counts) rather than raw samples.
Real PLCs and SDKs deliver data via DMA, ring buffers, or kernel-mode bridges that bypass the managed-code allocation pattern entirely. The frame-payload-as-byte[] model is faithful for a managed C# preview, not for the actual high-rate processing path.

Gap: two to three orders of magnitude between the simulator's measured throughput and real-machine production rates. This is not a defect — Phase 1's job was to scale the simulator within the WPF + .NET 10 envelope, not to match real-machine rates that would require native code, kernel-mode bridges, or dedicated processing hardware.

What would close it: real hardware integration plus a likely architectural split where the high-rate processing (frame analysis, encoder feedback) runs in native code or on dedicated hardware, with the C# application acting as the orchestrator and operator-facing surface. Phase 4 of the roadmap reserves space for this; the deeper architectural split is beyond the roadmap as currently scoped.

4.3 Real fault correlation and timing

What the simulator does: ChaosMonkey injects fault events as independent uniform-distribution rolls. Each AlarmBurstEveryMs tick fires exactly one fault on one of 5 codes; TelemetryDropoutChance is rolled per emitter cycle independently; NetworkLatencyMean/Stddev is sampled from a Gaussian per connect call.

What real machines do: faults correlate. A loss-of-vacuum event triggers a chamber-pressure spike, which trips wafer-presence sensors, which generate a flurry of related alarms in milliseconds, which cause the workflow's run-loop to interleave several cancellation signals. Telemetry dropouts are bursty (a network glitch drops 10 consecutive packets), not independent. Drift faults appear gradually (a sensor reading slowly drifts toward a threshold) rather than as instant events.

Gap: race conditions under correlated faults are unproven. The chaos profile's 37-fault-cycle proof in slice-1-4-chaos-monkey exercises each fault branch in isolation; it does not exercise simultaneous-fault interleaving.

What would close it: correlated-fault scenario design (a CorrelatedChaos profile that injects clusters with simulated cascade timing) plus integration testing against a real PLC fault simulator if one becomes available. This is a Phase 2 follow-up if measurement surfaces the need; not currently filed as load-bearing work.

4.4 Real UI scale

What the simulator's UI does:

13 AutomationProperties.AutomationId attributes on MainWindow.xaml (8 buttons, 2 combo boxes, 3 readouts).
2 of the 50 multi-tag pipeline tags are bound to readouts (temperature.celsius, pressure.bar).
1 frame preview surface (WriteableBitmap bound to a single Image control).
Diagnostics timeline scrolls a single list view; alarms collection drives a single panel; recipe-catalog combobox lists the loaded recipes.

What real wafer inspection UIs do:

50+ live readouts (vacuum levels per chamber, gas flows, temperatures per zone, drive currents, lamp powers, plasma densities, etc.) — typically with color-coded thresholds, sparkline history, and per-tag detail drill-down.
Defect-overlay wafer maps with zoom/pan, multi-layer visualization (defects + recipe scan path + camera ROI), and per-defect detail.
Recipe editors with parameter trees, validation, version history, and operator role-gated edit permissions.
Calibration screens for each sensor and motion axis.
Alarm history with acknowledge flows, severity sorting, time-range filtering, and audit trail.
Role-based panels (Operator vs. Engineer) with visibility and editability gated by login identity.

Gap: WPF dispatcher pipeline performance is unproven at 5-10× the current UI surface complexity. The Phase 1 captures show the pipeline holds at the simulator's UI scale; whether _dispatcher.InvokeAsync calls scale to dozens of concurrent updates per state-change event is not measured.

What would close it: Phase 3 work — "rich defect model" (3.1), "wafer loop / cassette cadence" (3.2), "identity + audit" (3.4) all materially expand the UI surface. The roadmap reserves Phase 3 for these features. Until Phase 3 builds and measures them, "the WPF rendering surface scales" is a hope informed by the small UI working well, not a demonstrated property.

4.5 `AppState` contention under store-side load

What's currently known: the encoder pipeline (SLICE-1.3) bypasses AppState and proves a non-store data plane works. The tag pipeline (50 emitters → snapshot publisher → channel → consumer → AppState.LatestTagValues) and the frame pipeline (camera → channel → consumer → AppState.LatestFrame) both flow through AppState. Whether either is the dominant AppStateStore.Update caller is unmeasured.

What's unknown:

The per-call allocation cost of AppStateStore.Update. The reducer's s with { ... } pattern produces a new AppState record per call — the record's size and per-field copy cost are not measured.
The lock-wait time on AppStateStore._lock. Phase 1 evidence shows no observable starvation, but no histogram of contention has been captured.
Which callers (FramePipelineService, TagStreamPipelineService, WorkflowService.OnFaultInjected, MainViewModel.OnPositionChanged, etc.) account for which fraction of total store writes.

Gap: the entire premise of Phase 2 (originally planned as 2.1 store-slicing, 2.2 immutable collections, 2.3 data-plane lift-out, 2.4 per-slice observables) hinges on these measurements. Without them, opening any 2.x slice commits to a refactor whose payoff cannot be quantified.

What would close it: SLICE-2.0 (store allocation profiling) is being opened specifically to measure this. The row block produces an explicit Phase 2 prioritization recommendation following a mechanical decision rubric (alloc share / lock-wait p95 / top-caller share thresholds → which 2.x slice opens, or "Phase 2 deferred entirely"). Until SLICE-2.0 lands, Phase 2's planned slices are conditional, not committed. Deferral is a first-class outcome.

5. What you can claim, what you can't

A reviewer asking "what does Phase 1 demonstrate?" should be able to find the defensible answer here without re-reading every measurement.

Claim	Defensible?	Where the evidence is
"The architectural patterns are reusable for similar industrial-control prototypes"	Yes	§3 (every pattern with its supporting capture)
"This codebase is a good reference for WPF + .NET 10 control software structure"	Yes	§3 (layered architecture, canonical store, bounded streaming all demonstrated cleanly)
"Bounded-streaming + state-machine patterns survive the simulator's load envelope"	Yes	§2 (every Phase 1 row, especially soak)
"The simulator does not leak memory across 8 hours of real time"	Yes	§2 (slice-1-4-soak-8h: steady-state drift = −2.7 MB)
"The workflow state machine survives bursty fault injection"	Yes	§2 (slice-1-4-chaos-monkey: 37 fault cycles, all four branches verified)
"The encoder data-plane bypass-`AppState` design works as the Phase-2.3 rehearsal"	Yes	§2 (slice-1-3) + §3 (encoder pattern)
"GC pauses are bounded enough that the frame pipeline doesn't drop frames"	Yes — within the simulator's envelope	§2 (slice-1-2: gc-pause-p95=11.76 ms; frames.dropped=0)
"This code can drive a real wafer inspection machine without major rework"	No	§4.1 (no real SDK has been swapped); §4.2 (rate envelope is 100-1000× off)
"The frame pipeline scales to real-machine line rates (50-200 kHz)"	No	§4.2 (peak measured = 30 fps × 2 MP)
"GC behavior is acceptable at production-scale frame rates"	Unproven	§4.2 (current LOH-alloc-rate = 1 MB/s under 30 fps × 2 MP; production rates not measured)
"The canonical store can absorb production write rates"	Unmeasured — SLICE-2.0 will tell you	§4.5
"The fault recovery handles real-world correlated fault cascades"	Unproven	§4.3 (chaos injects independent uniform events; cascades not modeled)
"The WPF rendering surface scales to a real operator UI"	Unproven	§4.4 (current UI: 13 AutomationIds, 2 tag readouts; production ~5-10× larger)
"The simulator's profiles are calibrated against real wafer inspection machines"	No	§4.2 (profiles match reasonable shapes but no real-machine load envelope is the source of truth)
"This prototype is suitable for a 24/7 unattended deployment"	No — 8-hour evidence is necessary but not sufficient	§2 (slice-1-4-soak-8h is the longest available evidence)

The pattern: take the structural lessons forward with confidence; do not take the absolute throughput numbers as a guarantee for any specific real machine.

6. Roadmap for higher confidence

What raises confidence further, and on what timeline.

Source of higher confidence	Closes which gap (§4 reference)	Timeline / status
SLICE-2.0 — instrument `AppStateStore.Update` (alloc share + lock-wait + caller distribution)	§4.5 (store contention)	Spec + task written; status: Proposed; next slice to land
SLICE-2.1 / 2.2 / 2.3 / 2.4 — store refactors	§4.5 (store contention) — conditional on 2.0	Conditional. Open only if SLICE-2.0 measurement supports them
Phase 3 (3.1, 3.2, 3.3, 3.4) — rich defect model, wafer loop, SQLite persistence, identity/audit	§4.4 (UI scale); §4.3 partially (audit trail)	Roadmap-planned; not yet opened
Phase 4.1 — first real SDK swap	§4.1 (SDK behavior shapes); partial §4.2	Planned; "smallest surface" pick (likely `IMachineConnection`)
Phase 4.2 — historian / MES bridge	§4.2 partially (real telemetry sinks)	Planned; out-of-scope until 4.1 lands
Profile calibration against a real machine's load envelope (no current slice)	§4.2 (rate envelopes)	Not currently in the roadmap; would be a new slice once a real machine is available for measurement
Correlated-fault chaos profile (`CorrelatedChaos` — no current slice)	§4.3 (fault correlation)	Filed as a future Phase 2 follow-up if measurement surfaces the need
24/7 deployment soak (e.g., 7-day continuous run)	§4.5 (sustained stability beyond 8 h)	Not currently planned; would be a new slice if production deployment becomes the target

Each row in this table is a specific, measurable next step — not a wish. None of them is blocked by Phase 1; they are all enabled by Phase 1's exit gate (the foundation is solid enough that the next slices can build on it without revisiting structural decisions).

The honest summary of where confidence stands today: the foundation is solid for the patterns it demonstrates; the simulator's load envelope is faithful to its shape but not to real-machine throughput; the gap to a production deployment is several phases of work, each of which has a specific measurement or integration milestone that will close it.

If you take only one decision from this document: invest in real-hardware integration sooner rather than later. The architectural patterns will hold up; the specific implementations and rate envelopes will not. The earlier real-SDK behavior surfaces, the less you'll have to change in implementations that look right against the simulator and turn out wrong against the machine.

Companion docs

Phase 1 Retrospective — chronological cross-slice synthesis (the story of how Phase 1 unfolded; this doc is the evaluation of what it proved)
Phase 1 Measurements — full per-slice row blocks with detailed metric tables
Roadmap Progress — per-slice status table + chronological session log
Evolution Roadmap — the five-phase plan with exit-gate criteria
Design Notes — per-slice deep-dive into class shapes, lifecycles, and decisions

Domains

Terms

1 Machine Control and Motion Systems

2 Hardware Integration and Device Control

3 Industrial Software Architecture

4 Industrial Communication and Integration

5 Vision, Imaging and Inspection Systems

6 UI HMI Operator Experience

7 Reliability Safety and Production Readiness

Industrial Desktop Systems

Streaming Pipelines Dotnet Real World

Phase 1 Capabilities and Limits

1. Executive summary

2. What was measured

Row 0 — demo baseline (pre-Phase-1)

Row `slice-1-1-multi-tag-telemetry` (30 min, MultiTag profile)

Row `slice-1-2-real-frame-payloads` (10 min, HighFrameRate profile)

Row `slice-1-3-encoder-rate-motion` (10 min, EncoderRate profile)

Row `slice-1-4-chaos-monkey` (30 min, ChaosMonkey profile)

Row `slice-1-4-soak-8h` (8 hours, Soak8h profile)

3. Architectural patterns the evidence supports

4. Where the simulator differs from real machines

4.1 SDK behavior shapes

4.2 Real-world rate envelopes

4.3 Real fault correlation and timing

4.4 Real UI scale

4.5 `AppState` contention under store-side load

5. What you can claim, what you can't

6. Roadmap for higher confidence

Companion docs

Streaming Pipelines Dotnet Real World

Phase 1 Capabilities and Limits ​

1. Executive summary ​

2. What was measured ​

Row 0 — demo baseline (pre-Phase-1) ​

Row slice-1-1-multi-tag-telemetry (30 min, MultiTag profile) ​

Row slice-1-2-real-frame-payloads (10 min, HighFrameRate profile) ​

Row slice-1-3-encoder-rate-motion (10 min, EncoderRate profile) ​

Row slice-1-4-chaos-monkey (30 min, ChaosMonkey profile) ​

Row slice-1-4-soak-8h (8 hours, Soak8h profile) ​

3. Architectural patterns the evidence supports ​

4. Where the simulator differs from real machines ​

4.1 SDK behavior shapes ​

4.2 Real-world rate envelopes ​

4.3 Real fault correlation and timing ​

4.4 Real UI scale ​

4.5 AppState contention under store-side load ​

5. What you can claim, what you can't ​

6. Roadmap for higher confidence ​

Companion docs ​

Phase 1 Capabilities and Limits

1. Executive summary

2. What was measured

Row 0 — demo baseline (pre-Phase-1)

Row `slice-1-1-multi-tag-telemetry` (30 min, MultiTag profile)

Row `slice-1-2-real-frame-payloads` (10 min, HighFrameRate profile)

Row `slice-1-3-encoder-rate-motion` (10 min, EncoderRate profile)

Row `slice-1-4-chaos-monkey` (30 min, ChaosMonkey profile)

Row `slice-1-4-soak-8h` (8 hours, Soak8h profile)

3. Architectural patterns the evidence supports

4. Where the simulator differs from real machines

4.1 SDK behavior shapes

4.2 Real-world rate envelopes

4.3 Real fault correlation and timing

4.4 Real UI scale

4.5 `AppState` contention under store-side load

5. What you can claim, what you can't

6. Roadmap for higher confidence

Companion docs