Evolution Roadmap — From Demo to Real-Machine Level (Simulator-First)

Date: 2026-04-22
Author: Planning outline (Claude / Opus 4.7)
Scope: A phased plan to grow this repository from a training prototype to software capable of driving a real wafer inspection machine, while keeping the simulator as the primary runtime. The simulator is upgraded alongside the app so that "simulator" stops meaning "toy" and starts meaning "faithful machine-in-a-box".
Companion reviews:
- docs/reviews/2026-04-22-production-readiness-review.md — current state, gaps
- docs/reviews/2026-04-22-canonical-appstate-review.md — state pattern analysis

0. Guiding principles

These are non-negotiable for every phase below.

- Simulator-first, measured, then refactor. Do not rearchitect the store on a hunch. Scale the simulator to produce real-machine-level load, measure where it breaks, then refactor. Premature refactoring is the most expensive mistake available here.
- Preserve IAppStateStore semantics while everything behind it changes. Domain and Presentation should compile unchanged across every phase.
- One slice, one ADR, one task, one acceptance test bar. Keep the existing docs-first discipline. Every item below becomes its own SLICE-### + TASK-### + ADR (if it changes an architectural decision).
- No dead abstractions. Don't add interfaces for "future flexibility". Add them when a real second implementation (or a real test double) actually lands.
- Every phase ships a measurable "before" and "after". Candidate metrics: sustained frames/sec, telemetry tags at rate, p95 UI frame time, GC pauses, allocations/sec, and alarm-storm survivability. The numbers go in each slice's acceptance criteria.
- Keep the demo path green. The existing happy-path walkthrough must still work after every slice. If it can't, the slice is too big.

1. Phase map

Phase 0 — Foundations               (2 slices, ~1–2 weeks)
Phase 1 — Simulator to scale        (4 slices, ~3–4 weeks)   ← measure here
Phase 2 — Store under pressure      (4 slices, ~3–4 weeks)
Phase 3 — New functionality         (4 slices, ~4–6 weeks)
Phase 4 — Real-world edges          (3 slices, ~4+ weeks)

Each phase has a hard exit gate that must be met before the next phase opens. This is what keeps the plan from becoming a wish list.

2. Phase 0 — Foundations (prereqs before we touch anything load-bearing)

Goal: give ourselves a safety net so measurements are trustworthy and regressions are visible.

#	Slice	What	Exit criteria
0.1	CI + quality gates	`.github/workflows/ci.yml`: restore/build/test on push and PR. `Directory.Build.props` with `TreatWarningsAsErrors`. `Directory.Packages.props` to centralize versions. `.editorconfig`. Coverage upload.	A PR without green CI cannot be merged. Coverage baseline recorded.
0.2	Observability baseline	Configure Serilog (or OpenTelemetry) with a rolling file sink and a UI sink. Add handlers for `DispatcherUnhandledException`, `AppDomain.UnhandledException`, and `TaskScheduler.UnobservedTaskException`. Add a `System.Diagnostics.Metrics` meter with counters that Phase 1+ will populate. Add a single-instance mutex.	Unhandled exceptions surface in a crash log and in the diagnostics pane. A `dotnet-counters monitor` console session shows live metrics.

Phase 0 exit gate: every subsequent slice can run with logs captured, metrics emitted, and a red CI build on regression. Without this, Phase 1's measurements are not trustworthy.
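
As a rough illustration of slice 0.2, the sketch below declares a `System.Diagnostics.Metrics` meter with two counters and installs the two process-wide exception hooks that do not depend on WPF. The meter name `WaferInspection.App`, the counter names, and the `AppMetrics` / `CrashHooks` classes are hypothetical placeholders, not names from the repository. The WPF-specific `DispatcherUnhandledException` handler is left out so the snippet stays self-contained.

```csharp
#nullable enable
using System;
using System.Diagnostics.Metrics;
using System.Threading.Tasks;

// Hypothetical meter and counter names; the real ones belong in SLICE-005.
public static class AppMetrics
{
    public const string MeterName = "WaferInspection.App";

    private static readonly Meter AppMeter = new(MeterName, "0.1.0");

    // Counters that Phase 1 slices would increment from the simulator pipeline.
    public static readonly Counter<long> FramesEmitted =
        AppMeter.CreateCounter<long>("frames.emitted", unit: "{frame}");

    public static readonly Counter<long> TelemetryCoalesced =
        AppMeter.CreateCounter<long>("telemetry.coalesced", unit: "{sample}");
}

public static class CrashHooks
{
    // 'log' stands in for whatever sink is chosen (Serilog, a crash file, ...).
    public static void Install(Action<string, Exception?> log)
    {
        AppDomain.CurrentDomain.UnhandledException += (_, e) =>
            log("AppDomain", e.ExceptionObject as Exception);

        TaskScheduler.UnobservedTaskException += (_, e) =>
        {
            log("TaskScheduler", e.Exception);
            e.SetObserved(); // mark as handled so the exception is not escalated further
        };
    }
}
```

With a meter like this in place, a session along the lines of `dotnet-counters monitor --name <process-name> --counters WaferInspection.App` should list the counters once they start receiving `Add` calls; the exact invocation depends on the installed tool version.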

3. Phase 1 — Simulator to real-machine scale

Goal: make the simulator produce load that actually stresses the current architecture, so we can see where it breaks. We will not refactor the store yet. We will take notes.

#	Slice	What	Exit criteria
1.1	Multi-tag telemetry	Replace `MachineTelemetry(Temp, Pressure)` with a keyed tag bag: `TagSample(Name, Timestamp, Value, Quality)` and a `TagDefinition` registry. Seed 50 tags via config. Per-tag intervals from 1 Hz to 500 Hz. Synthetic noise models (sine, drift, random-walk, step).	50 tags emitting at configured rates for 30 minutes without exception. Metrics show sustained emit rate and per-tag coalesce count.
1.2	Real frame payloads	`Frame.PreviewPayload` becomes non-null. Simulator generates a real `byte[]` (or `WriteableBitmap`) per frame with configurable resolution (e.g. 2 MP, 8 MP) and channel count. `SimulatorProfile` gains `FrameWidth`, `FrameHeight`, `BytesPerPixel`. Preview actually renders in the UI.	30 fps at 2 MP sustained for 10 minutes. LOH allocation rate measured. GC pause p95 recorded.
1.3	Encoder-rate motion	Separate "UI position" stream (20 Hz, goes to AppState) from "encoder position" stream (1 kHz, goes to a dedicated channel for future plotting/tuning). `SimulatedMotionController` gains a background ticker + a per-axis noise model.	1 kHz encoder stream measured at receiver with expected rate ± 2%. UI position remains ~20 Hz unchanged.
1.4	Storm & soak profiles	`SimulatorProfile` gains: `DefectShowerRate`, `AlarmBurstEvery`, `TelemetryDropoutChance`, `NetworkLatencyMeanMs`, `NetworkLatencyStddevMs`, `TimeCompressionFactor`. New profiles: `ChaosMonkey`, `Soak8h`. SDK-flakiness wrapper that can inject timeouts, cancellation-that-doesn't-cancel, and out-of-band exceptions.	8-hour soak in `Soak8h` profile completes without leaking memory (RSS growth < 50 MB). `ChaosMonkey` triggers at least one code path in every fault branch of `WorkflowService`.
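
One possible shape for the keyed tag bag in slice 1.1 is sketched below. Only the member names of `TagSample` come from the table above; the member types, the `TagQuality` and `NoiseKind` enums, the fields of `TagDefinition`, and the `TagGenerator` class are assumptions made for illustration. The sine model assumes a fixed 1 Hz period and the step model assumes a step at t = 1 s.

```csharp
using System;

public enum TagQuality { Good, Uncertain, Bad }

// Member names follow the roadmap row; the types are assumed.
public readonly record struct TagSample(
    string Name, DateTimeOffset Timestamp, double Value, TagQuality Quality);

public enum NoiseKind { Sine, Drift, RandomWalk, Step }

// Hypothetical registry entry: one per configured tag.
public sealed record TagDefinition(
    string Name, double RateHz, NoiseKind Noise, double Baseline, double Amplitude);

public sealed class TagGenerator
{
    private readonly TagDefinition _def;
    private readonly Random _rng;
    private double _walk; // accumulated random-walk offset

    public TagGenerator(TagDefinition def, int seed)
    {
        _def = def;
        _rng = new Random(seed); // seeded so soak runs are reproducible
    }

    // Emit interval derived from the per-tag rate (1 Hz to 500 Hz in the roadmap).
    public TimeSpan Interval => TimeSpan.FromSeconds(1.0 / _def.RateHz);

    public TagSample Next(DateTimeOffset start, long tick)
    {
        double t = tick / _def.RateHz; // seconds since 'start'
        double value = _def.Noise switch
        {
            NoiseKind.Sine => _def.Baseline + _def.Amplitude * Math.Sin(2 * Math.PI * t),
            NoiseKind.Drift => _def.Baseline + _def.Amplitude * t,
            NoiseKind.RandomWalk =>
                _def.Baseline + (_walk += (_rng.NextDouble() - 0.5) * _def.Amplitude),
            NoiseKind.Step => _def.Baseline + (t >= 1.0 ? _def.Amplitude : 0.0),
            _ => throw new InvalidOperationException("Unknown noise model"),
        };
        return new TagSample(_def.Name, start + TimeSpan.FromSeconds(t), value, TagQuality.Good);
    }
}
```

Computing the timestamp from the tick index rather than from the wall clock keeps generated data deterministic, which may make before/after comparisons between soak runs easier.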

Phase 1 exit gate (the pressure test):

Run the app in Soak8h + 2 MP × 30 fps frames + 50 tags × 100 Hz telemetry + occasional fault storms for one full business day. Collect numbers:

- sustained frames/sec at UI and in pipeline
- UI p95 frame time (WPF CompositionTarget.Rendering or ETW frames)
- GC Gen-0/1/2 counts and pause durations
- allocations/sec at AppStateStore.Update
- lock-wait time on AppStateStore._lock
- memory pressure trace
- dropped-frames, coalesced-telemetry counters
- number of "Dispatcher.Invoke" events per second

Those numbers become the measured justification for Phase 2. If the app survives this run beautifully, Phase 2 is deferred. If not, we know exactly which slice of the store to attack first.
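
To put rough numbers on the pressure test before running it, the arithmetic can be sketched as below. The `LoadBudget` helper is hypothetical, and the 1920 × 1080 resolution at 1 byte per pixel is only one assumed reading of "2 MP". The 85,000-byte large object heap threshold is the documented .NET default.

```csharp
using System;

public static class LoadBudget
{
    // Arrays of 85,000 bytes or more are allocated on the .NET large object heap (LOH).
    public const int LohThresholdBytes = 85_000;

    public static long FrameBytes(int width, int height, int bytesPerPixel) =>
        (long)width * height * bytesPerPixel;

    public static bool FrameLandsOnLoh(int width, int height, int bytesPerPixel) =>
        FrameBytes(width, height, bytesPerPixel) >= LohThresholdBytes;

    public static long FrameBytesPerSecond(int width, int height, int bytesPerPixel, int fps) =>
        FrameBytes(width, height, bytesPerPixel) * fps;

    public static long TagSamplesPerSecond(int tagCount, int rateHz) =>
        (long)tagCount * rateHz;
}
```

Under those assumptions, `LoadBudget.FrameBytes(1920, 1080, 1)` is 2,073,600 bytes, so every frame buffer is a LOH allocation; at 30 fps that is 62,208,000 bytes per second of fresh `byte[]` unless buffers are reused. `LoadBudget.TagSamplesPerSecond(50, 100)` gives 5,000 telemetry samples per second. These are only the input rates; the measured GC and lock-wait numbers remain the actual gate.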

4. Phase 2 — Store under pressure

Only open once Phase 1's soak run exposes real, measured problems. The following slices are ordered by expected impact-per-effort given what the canonical-appstate-review predicted; reorder based on actual measurements.

#	Slice	What	Exit criteria
2.1	Slice `AppState` into sub-records	Extract `ConnectionSlice`, `MotionSlice`, `RunSlice`, `AlarmSlice`, `DiagnosticsSlice`, `RecipeSlice`, `TelemetrySlice`, `FrameSlice`. Reducers mutate one slice at a time. `CommandGuards` take the narrowest slice they need, not `AppState`.	Allocations/sec at `AppStateStore.Update` drop by ≥ 40% under the Phase-1 soak profile. All existing tests green.
2.2	Immutable collections	`ImmutableArray<T>` / `ImmutableList<T>` for `RunHistory`, `ActiveAlarms`, `RecentDiagnostics`. Remove `List.RemoveAt(0)` and `new List<T>(existing) { item }` patterns.	No `List<T>` allocation in hot paths (verify with a diagnostic session). `Alarms` collection in the VM stops rebuilding when unchanged (use `SequenceEqual`/reference check).
2.3	Data-plane lift-out	Introduce `ITelemetryBuffer`, `IFrameBuffer`, `IDiagnosticsJournal`. High-rate data lands in these, not in `AppState`. `AppState` keeps "latest value" fields used by guards only. Panels/charts subscribe directly to buffers via `IObservable<T>` or `ChannelReader<T>`.	UI bindings to the latest telemetry update at full rate, while the store lock is taken only at human-visible events (recipe load, workflow transition, alarm raise).
2.4	Per-slice observables	`IAppStateStore` gains `IObservable<ConnectionSlice> OnConnection()`, `IObservable<RunSlice> OnRun()`, etc., each with `.DistinctUntilChanged()`. `StateChanged` remains for compatibility but stops being the fast path. `MainViewModel` (and a future second panel) subscribe to the slices they care about.	A second test panel subscribes to only `RunSlice` and receives zero updates during a pure telemetry storm. Lock-wait on the store stays under a measured threshold.
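
For slice 2.2, a bounded history append on top of `ImmutableList<T>` might look like the sketch below. The `BoundedHistory` helper and its `capacity` parameter are hypothetical; the roadmap only names the collection types and the patterns to remove.

```csharp
using System.Collections.Immutable;

public static class BoundedHistory
{
    // Returns a new list with 'item' appended and the oldest entries trimmed so that
    // at most 'capacity' remain. The input list is never modified, and ImmutableList<T>
    // shares most of its internal tree between the old and new instances.
    public static ImmutableList<T> Append<T>(ImmutableList<T> history, T item, int capacity)
    {
        ImmutableList<T> next = history.Add(item);
        return next.Count > capacity
            ? next.RemoveRange(0, next.Count - capacity)
            : next;
    }
}
```

Because each update yields a distinct instance and an untouched collection keeps its old reference, a view model could use `ReferenceEquals(previous, current)` as the cheap "unchanged" check mentioned in the exit criteria, falling back to `SequenceEqual` only where content equality matters.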

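Phase 2 exit gate: re-run the Phase 1 soak profile at 2× the rates. Every Phase 1 number must be no worse than its original baseline: cost metrics (frame time, GC pauses, allocations, lock-wait, dropped frames) at or below it, throughput metrics (sustained frames/sec) at or above it. That proves the refactors were actually worth it.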
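
The per-slice idea behind 2.1 and 2.4 can be illustrated without any external package. The sketch below is a dependency-free stand-in for `IObservable<T>` plus `.DistinctUntilChanged()`, not the Rx-based design itself: it projects one slice out of each new state and notifies only when that slice's value changes. The record shapes and the `SliceNotifier<TSlice>` class are assumptions for illustration.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Hypothetical, heavily reduced slice shapes. Records compare by value.
public sealed record RunSlice(string Phase, int WaferIndex);
public sealed record TelemetrySlice(double Temp, double Pressure);
public sealed record AppState(RunSlice Run, TelemetrySlice Telemetry);

public sealed class SliceNotifier<TSlice>
{
    private readonly Func<AppState, TSlice> _select;
    private readonly Action<TSlice> _onChanged;
    private TSlice? _last;
    private bool _hasLast;

    public SliceNotifier(Func<AppState, TSlice> select, Action<TSlice> onChanged)
    {
        _select = select;
        _onChanged = onChanged;
    }

    // Would be wired to the store's existing StateChanged notification.
    public void OnStateChanged(AppState state)
    {
        TSlice current = _select(state);
        if (_hasLast && EqualityComparer<TSlice>.Default.Equals(current, _last))
        {
            return; // this slice did not change, so the subscriber stays quiet
        }

        _last = current;
        _hasLast = true;
        _onChanged(current);
    }
}
```

A panel built on `new SliceNotifier<RunSlice>(s => s.Run, ...)` would be called once for the initial state and then stay silent while only `Telemetry` changes, which is the behaviour the 2.4 exit criterion asks a second test panel to demonstrate.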

5. Phase 3 — New functionality (things a real tool needs)

Now that load is sustainable, add the missing machine-shaped features.

#	Slice	What	Exit criteria
3.1	Rich defect model	Replace `InspectionResult(hasDefect, string)` with `Defect(Id, FrameId, BoundingBox, Classification, Confidence, ImageRef?)`. Persist per-wafer defect list in SQLite (new `IDefectStore`). Wafer map view in UI.	A high-defect run produces 5,000 defects, all persisted and queryable, with UI pagination/virtualization working.
3.2	Wafer loop / cassette cadence	Scheduler that runs N wafers back-to-back with load / align / run / unload phases. `WaferId`, `LotId`, `OperatorId` flow through `RunSummary`.	25-wafer cassette completes under `Soak8h` with correct per-wafer records.
3.3	SQLite persistence + schema versioning	Move `RunSummary`, `Alarm` history, `Defect`, per-run `TagSample` snapshots into SQLite. EF Core or Dapper + migrations. Retire the JSON run-history file (with a one-time import).	Opening the app with 10,000 historical runs loads the last page of history in < 200 ms. Schema version recorded; a forward migration test exists.
3.4	Identity + audit	Operator prompt at start-of-shift. `OperatorId` attached to every `RunSummary`, `Alarm.AcknowledgedBy`, and recovery action. Role gating for diagnostics/fault-injection panel (Operator vs. Engineer).	Audit export produces a CSV/JSON of every state-changing event with who did it when.

Phase 3 exit gate: a full shift simulation (3× 8-hour shifts, one cassette per hour, operator hand-off between shifts) runs unattended with complete audit trail and no manual intervention.
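
The richer defect model from slice 3.1 could be sketched as the record below. The member names come from the roadmap row; the member types, the `BoundingBox` struct, and the in-memory `DefectPaging` helper are assumptions. A real `IDefectStore` would page in SQL rather than over an in-memory sequence.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

public readonly record struct BoundingBox(int X, int Y, int Width, int Height);

// ImageRef is optional, matching the 'ImageRef?' in the roadmap row.
public sealed record Defect(
    Guid Id,
    long FrameId,
    BoundingBox BoundingBox,
    string Classification,
    double Confidence,
    string? ImageRef);

public static class DefectPaging
{
    // Zero-based page index; ordering by FrameId keeps pages stable between calls.
    public static IReadOnlyList<Defect> Page(
        IEnumerable<Defect> defects, int pageIndex, int pageSize) =>
        defects
            .OrderBy(d => d.FrameId)
            .Skip(pageIndex * pageSize)
            .Take(pageSize)
            .ToList();
}
```

With 5,000 defects and a page size of 100, the last page is index 49; a virtualized list in the UI would request pages like this on demand instead of binding all rows at once.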

6. Phase 4 — Real-world edges (when the simulator stops being enough)

This is where the simulator becomes the development runtime but no longer the only runtime.

#	Slice	What	Exit criteria
4.1	First real SDK swap	Pick one subsystem (likely `IMachineConnection` or `ILightController` — smallest surface). Implement a real driver alongside the simulated one. DI selects based on config.	Both `Simulator` and `RealVendor` configurations pass the same integration test pack. Interface churn is captured as an ADR.
4.2	Historian / MES bridge	`IHistorianSink` for telemetry time-series (InfluxDB / PI / Prometheus). `ISecsGemAdapter` (or minimal equivalent) for run-start/stop events to MES. All optional via config.	Telemetry appears in Grafana. Run-start messages appear in a mock MES listener.
4.3	Packaging + signed installer	MSIX or WiX installer. Code-signed. Versioned. Per-environment `appsettings`. Auto-update channel (even if manually triggered at first).	One-click install on a clean Windows machine produces a working app with a signed EXE. Uninstall cleans `%LocalAppData%`.

Phase 4 exit gate: the app runs in either Simulator or RealVendor mode by config flip, and both modes are exercised in CI.
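
The "config flip" in the exit gate can be pictured with a small factory. This is a container-free sketch, not the DI registration the roadmap refers to: the `Describe` member, the two implementation classes, and the `RuntimeMode` enum are hypothetical, and only `IMachineConnection`, `Simulator`, and `RealVendor` are names taken from the text above.

```csharp
using System;

public interface IMachineConnection
{
    string Describe(); // hypothetical member, only here to make the sketch observable
}

public sealed class SimulatedMachineConnection : IMachineConnection
{
    public string Describe() => "simulator";
}

public sealed class RealVendorMachineConnection : IMachineConnection
{
    public string Describe() => "real vendor SDK";
}

public enum RuntimeMode { Simulator, RealVendor }

public static class MachineConnectionFactory
{
    // 'configuredMode' would come from appsettings; unknown values fail fast at startup.
    public static IMachineConnection Create(string configuredMode)
    {
        // Enum.TryParse also accepts numeric strings, hence the extra IsDefined check.
        if (!Enum.TryParse<RuntimeMode>(configuredMode, ignoreCase: true, out var mode)
            || !Enum.IsDefined(mode))
        {
            throw new ArgumentException($"Unknown runtime mode '{configuredMode}'.");
        }

        return mode switch
        {
            RuntimeMode.Simulator => new SimulatedMachineConnection(),
            RuntimeMode.RealVendor => new RealVendorMachineConnection(),
            _ => throw new ArgumentOutOfRangeException(nameof(configuredMode)),
        };
    }
}
```

Running the same integration test pack once per `RuntimeMode` value is one way CI could exercise both modes, with the real-vendor run pointed at whatever hardware or vendor emulator is available.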

7. Cross-cutting concerns (threaded through every phase)

Not standalone slices, but must be enforced in every PR:

- Measurement before and after. Every performance-touching slice includes a before/after number in the task document.
- Simulator parity. Every new production feature gets a simulator story in the same slice. No feature lands with "only works against real hardware".
- ADR hygiene. Any slice that contradicts an existing ADR updates or supersedes it; new architectural choices get a new ADR.
- Test fakes keep up. Every new interface gets a fake in tests/.../Stubs/ in the same slice.
- Docs-first. Requirements, specs, and tasks are updated before code lands. No code-only PRs for load-bearing changes.

8. What this roadmap does not do

Deliberately excluded. These are bigger-than-project or later-than-project decisions.

- Safety-critical logic in C#. Stays out. Safety interlocks belong in a PLC or a dedicated safety controller; this app is a viewer of safety state, not the authority. This is not a slice; it is a long-running architectural constraint.
- Multi-station / multi-machine orchestration. Explicitly out of scope per requirements §3. If it ever becomes in-scope, it is a separate product.
- Full SECS-GEM, full MES integration, factory compliance certification. Phase 4.2 opens a door; walking through it is its own multi-quarter programme.
- ML-based defect classification. Plug-in point at Phase 3.1 (Classification, Confidence), but the model training/serving pipeline is a separate effort.
- Localization, accessibility, design system. Important for a shipped product, but they do not affect architectural correctness; slot them in at Phase 3 or 4 as UI-only slices.

9. How this integrates with the existing docs-first workflow

Each of the rows above becomes concrete artifacts under the current conventions:

- docs/specs/SLICE-005-observability-baseline.md (etc.): one per row.
- docs/tasks/TASK-005-observability-baseline.md: the AI-sized implementation task.
- docs/adrs/ADR-005-*.md: only when a load-bearing decision changes (e.g. ADR-005 would record the move of data-plane data out of AppState).
- docs/scenarios/: add a new scenario per phase exit gate (soak scenario, cassette scenario, multi-mode scenario).
- docs/reviews/: periodic review documents (like this one) that capture the state of the plan vs. reality.

The roadmap does not replace existing slices 001–004; they remain the historical record of how we got here. Phase numbering starts fresh so the evolution is clearly additive.

10. First concrete asks

If this plan is adopted, the smallest productive next steps are:

- Decide the budget: is this a weekend project, a quarter, or a year? The phase list scales; the ordering does not.
- Open SLICE-005 and SLICE-006 (Phase 0 items). They are independent of everything else and unblock the rest.
- Decide the performance targets for Phase 1: what wafer throughput, what frame rate, what telemetry tag count, what soak duration? Those numbers become the acceptance bar for 1.1–1.4.
- Assign a measurement-keeper: one person or one dashboard that owns the "before/after numbers" table. Without this, the exit gates become vibes instead of data.

Everything else flows from those four steps.

— End of roadmap
