Evolution Roadmap — From Demo to Real-Machine Level (Simulator-First)
- Date: 2026-04-22
- Author: Planning outline (Claude / Opus 4.7)
- Scope: A phased plan to grow this repository from a training prototype to software capable of driving a real wafer inspection machine, while keeping the simulator as the primary runtime. The simulator is upgraded alongside the app so that "simulator" stops meaning "toy" and starts meaning "faithful machine-in-a-box".
- Companion reviews:
  - docs/reviews/2026-04-22-production-readiness-review.md: current state, gaps
  - docs/reviews/2026-04-22-canonical-appstate-review.md: state pattern analysis
0. Guiding principles
These are non-negotiable for every phase below.
- Simulator-first, measured, then refactor. Do not rearchitect the store on a hunch. Scale the simulator to produce real-machine-level load, measure where it breaks, then refactor. Premature refactoring is the most expensive mistake available here.
- Preserve IAppStateStore semantics while everything behind it changes. Domain and Presentation should compile unchanged across every phase.
- One slice, one ADR, one task, one acceptance test bar. Keep the existing docs-first discipline. Every item below becomes its own SLICE-### + TASK-### + ADR (if it changes an architectural decision).
- No dead abstractions. Don't add interfaces for "future flexibility". Add them when a real second implementation (or a real test double) actually lands.
- Every phase ships a measurable "before" and "after". Frames/sec sustained, telemetry tags at rate, p95 UI frame time, GC pauses, allocations/sec, alarm-storm survivability. Numbers go in each slice's acceptance criteria.
- Keep the demo path green. The existing happy-path walkthrough must still work after every slice. If it can't, the slice is too big.
1. Phase map
Phase 0 — Foundations (2 slices, ~1–2 weeks)
Phase 1 — Simulator to scale (4 slices, ~3–4 weeks) ← measure here
Phase 2 — Store under pressure (4 slices, ~3–4 weeks)
Phase 3 — New functionality (4 slices, ~4–6 weeks)
Phase 4 — Real-world edges (3 slices, ~4+ weeks)
Each phase has a hard exit gate that must be met before the next phase opens. This is what keeps the plan from becoming a wish list.
2. Phase 0 — Foundations (prereqs before we touch anything load-bearing)
Goal: give ourselves a safety net so measurements are trustworthy and regressions are visible.
| # | Slice | What | Exit criteria |
|---|---|---|---|
| 0.1 | CI + quality gates | .github/workflows/ci.yml: restore/build/test on push and PR. Directory.Build.props with TreatWarningsAsErrors. Directory.Packages.props to centralize versions. .editorconfig. Coverage upload. | PR without green CI cannot be merged. Coverage baseline recorded. |
| 0.2 | Observability baseline | Configure Serilog (or OpenTelemetry) with rolling file sink and UI sink. Add handlers for DispatcherUnhandledException, AppDomain.UnhandledException, and TaskScheduler.UnobservedTaskException. Add a System.Diagnostics.Metrics meter with counters that Phase 1+ will populate. Add a single-instance mutex. | Unhandled exceptions surface in a crash log and in the diagnostics pane. A dotnet-counters monitor session in a console shows live metrics. |
Phase 0 exit gate: every subsequent slice can run with logs captured, metrics emitted, and a red CI build on regression. Without this, Phase 1's measurements are not trustworthy.
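A minimal sketch of what slice 0.2 could wire up, assuming .NET 6 or later. The names AppMetrics, CrashHooks, the meter name WaferInspection.App, and the counter names are hypothetical placeholders, not identifiers from this repository; the log callback stands in for whichever Serilog or OpenTelemetry sink is chosen. The WPF-specific DispatcherUnhandledException hook is left as a comment so the sketch compiles without a WPF reference.

```csharp
using System;
using System.Diagnostics.Metrics;
using System.Threading.Tasks;

// Hypothetical names throughout: AppMetrics, the meter name, and the counter
// names are illustrative placeholders.
public static class AppMetrics
{
    public static readonly Meter Meter = new("WaferInspection.App", "0.1.0");

    // Phase 1+ slices increment these; Phase 0 only has to make them visible.
    public static readonly Counter<long> FramesProcessed =
        Meter.CreateCounter<long>("frames.processed", unit: "{frame}");

    public static readonly Counter<long> TelemetryCoalesced =
        Meter.CreateCounter<long>("telemetry.coalesced", unit: "{sample}");

    public static readonly Counter<long> UnhandledExceptions =
        Meter.CreateCounter<long>("exceptions.unhandled", unit: "{exception}");
}

public static class CrashHooks
{
    // Call once at startup. 'log' is an assumed callback into the logging sink.
    public static void Install(Action<string, Exception?> log)
    {
        AppDomain.CurrentDomain.UnhandledException += (_, e) =>
        {
            AppMetrics.UnhandledExceptions.Add(1);
            log("AppDomain.UnhandledException", e.ExceptionObject as Exception);
        };

        TaskScheduler.UnobservedTaskException += (_, e) =>
        {
            AppMetrics.UnhandledExceptions.Add(1);
            log("TaskScheduler.UnobservedTaskException", e.Exception);
            e.SetObserved(); // mark as observed so it is not escalated further
        };

        // In the WPF host, Application.DispatcherUnhandledException would be
        // subscribed here in the same way.
    }
}
```

With something like this in place, a dotnet-counters monitor session pointed at the process and at the meter name should list the three counters, which is the visible half of the Phase 0 exit gate.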
3. Phase 1 — Simulator to real-machine scale
Goal: make the simulator produce load that actually stresses the current architecture, so we can see where it breaks. We will not refactor the store yet. We will take notes.
| # | Slice | What | Exit criteria |
|---|---|---|---|
| 1.1 | Multi-tag telemetry | Replace MachineTelemetry(Temp, Pressure) with a keyed tag bag: TagSample(Name, Timestamp, Value, Quality) and a TagDefinition registry. Seed 50 tags via config. Per-tag rates from 1 Hz to 500 Hz. Synthetic noise models (sine, drift, random-walk, step). | 50 tags emitting at configured rates for 30 minutes without exception. Metrics show sustained emit rate and per-tag coalesce count. |
| 1.2 | Real frame payloads | Frame.PreviewPayload becomes non-null. Simulator generates a real byte[] (or WriteableBitmap) per frame with configurable resolution (e.g. 2 MP, 8 MP) and channel count. SimulatorProfile gains FrameWidth, FrameHeight, BytesPerPixel. Preview actually renders in the UI. | 30 fps at 2 MP sustained for 10 minutes. LOH allocation rate measured. GC pause p95 recorded. |
| 1.3 | Encoder-rate motion | Separate "UI position" stream (20 Hz, goes to AppState) from "encoder position" stream (1 kHz, goes to a dedicated channel for future plotting/tuning). SimulatedMotionController gains a background ticker + a per-axis noise model. | 1 kHz encoder stream measured at receiver with expected rate ± 2%. UI position remains ~20 Hz unchanged. |
| 1.4 | Storm & soak profiles | SimulatorProfile gains: DefectShowerRate, AlarmBurstEvery, TelemetryDropoutChance, NetworkLatencyMeanMs, NetworkLatencyStddevMs, TimeCompressionFactor. New profiles: ChaosMonkey, Soak8h. SDK-flakiness wrapper that can inject timeouts, cancellation-that-doesn't-cancel, and out-of-band throw. | 8-hour soak in Soak8h profile completes without leaking memory (RSS growth < 50 MB). ChaosMonkey triggers at least one code path in every fault branch of WorkflowService. |
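The tag model from slice 1.1 might look roughly like the sketch below. TagSample and TagDefinition come from the table; their field types, the TagQuality enum, the NoiseModels helpers, and the TagRegistry class are assumptions made for illustration. The models are pure functions of elapsed seconds so a run is reproducible; a random-walk model would need state and a seeded generator and is omitted here.

```csharp
using System;
using System.Collections.Generic;

public enum TagQuality { Good, Uncertain, Bad }

// Field names follow slice 1.1; the field types are assumptions.
public readonly record struct TagSample(
    string Name, DateTimeOffset Timestamp, double Value, TagQuality Quality);

// Model maps elapsed seconds to a value.
public sealed record TagDefinition(string Name, double RateHz, Func<double, double> Model);

public static class NoiseModels
{
    public static Func<double, double> Sine(double amplitude, double periodSeconds, double offset = 0) =>
        t => offset + amplitude * Math.Sin(2 * Math.PI * t / periodSeconds);

    public static Func<double, double> Drift(double start, double unitsPerSecond) =>
        t => start + unitsPerSecond * t;

    public static Func<double, double> Step(double low, double high, double stepAtSeconds) =>
        t => t < stepAtSeconds ? low : high;
}

public sealed class TagRegistry
{
    private readonly Dictionary<string, TagDefinition> _tags = new(StringComparer.Ordinal);

    // Throws ArgumentException on a duplicate tag name.
    public void Register(TagDefinition tag) => _tags.Add(tag.Name, tag);

    // Produces every sample the tag owes in [0, durationSeconds) at its configured rate.
    public IEnumerable<TagSample> Generate(string name, DateTimeOffset start, double durationSeconds)
    {
        var tag = _tags[name];
        var count = (int)Math.Floor(durationSeconds * tag.RateHz);
        for (var i = 0; i < count; i++)
        {
            var t = i / tag.RateHz;
            yield return new TagSample(tag.Name, start.AddSeconds(t), tag.Model(t), TagQuality.Good);
        }
    }
}
```

Generating samples from elapsed time rather than from a wall-clock timer is a deliberate choice in this sketch: it lets the emit-rate exit criterion be checked deterministically in a unit test before the real ticker exists.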
Phase 1 exit gate (the pressure test):
Run the app in Soak8h + 2 MP × 30 fps frames + 50 tags × 100 Hz telemetry + occasional fault storms for one full business day. Collect numbers:
- sustained frames/sec at UI and in pipeline
- UI p95 frame time (WPF CompositionTarget.Rendering or ETW frames)
- GC Gen-0/1/2 counts and pause durations
- allocations/sec at AppStateStore.Update
- lock-wait time on AppStateStore._lock
- memory pressure trace
- dropped-frames, coalesced-telemetry counters
- number of "Dispatcher.Invoke" events per second
Those numbers become the measured justification for Phase 2. If the app survives this run beautifully, Phase 2 is deferred. If not, we know exactly which slice of the store to attack first.
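Two of the numbers above can be captured with very little code. The sketch below is one possible shape, not the project's measurement harness: GcSnapshot and Percentiles are hypothetical names, the GC figures come from the runtime's own GC.CollectionCount and GC.GetTotalAllocatedBytes, and the p95 uses the nearest-rank definition (other percentile definitions interpolate and would give slightly different values).

```csharp
using System;
using System.Linq;

// Hypothetical helper: GC counts and allocated bytes at one point in time.
public readonly record struct GcSnapshot(int Gen0, int Gen1, int Gen2, long AllocatedBytes)
{
    public static GcSnapshot Capture() => new(
        GC.CollectionCount(0),
        GC.CollectionCount(1),
        GC.CollectionCount(2),
        GC.GetTotalAllocatedBytes(precise: false));

    // Difference between two snapshots, e.g. end of soak minus start of soak.
    public GcSnapshot Since(GcSnapshot earlier) => new(
        Gen0 - earlier.Gen0,
        Gen1 - earlier.Gen1,
        Gen2 - earlier.Gen2,
        AllocatedBytes - earlier.AllocatedBytes);
}

public static class Percentiles
{
    // Nearest-rank percentile: the smallest sample such that at least
    // 'percentile' percent of all samples are less than or equal to it.
    public static double NearestRank(double[] samples, int percentile)
    {
        if (samples.Length == 0)
            throw new ArgumentException("No samples.", nameof(samples));
        if (percentile < 1 || percentile > 100)
            throw new ArgumentOutOfRangeException(nameof(percentile));

        var sorted = samples.OrderBy(v => v).ToArray();
        var rank = (int)Math.Ceiling(percentile * (double)sorted.Length / 100.0);
        return sorted[rank - 1];
    }
}
```

Dividing the allocated-bytes delta by the elapsed seconds gives a process-wide allocation rate; attributing it to AppStateStore.Update specifically would still need a profiler or an allocation-tick trace.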
4. Phase 2 — Store under pressure
Only open once Phase 1's soak run exposes real, measured problems. The following slices are ordered by expected impact-per-effort given what the canonical-appstate-review predicted; reorder based on actual measurements.
| # | Slice | What | Exit criteria |
|---|---|---|---|
| 2.1 | Slice AppState into sub-records | Extract ConnectionSlice, MotionSlice, RunSlice, AlarmSlice, DiagnosticsSlice, RecipeSlice, TelemetrySlice, FrameSlice. Reducers mutate one slice at a time. CommandGuards take the narrowest slice they need, not AppState. | Allocations/sec at AppStateStore.Update drop by ≥ 40% under the Phase-1 soak profile. All existing tests green. |
| 2.2 | Immutable collections | ImmutableArray<T> / ImmutableList<T> for RunHistory, ActiveAlarms, RecentDiagnostics. Remove List.RemoveAt(0) and new List<T>(existing) { item } patterns. | No List<T> allocation in hot paths (verify with a diagnostic session). Alarms collection in the VM stops rebuilding when unchanged (use SequenceEqual/reference check). |
| 2.3 | Data-plane lift-out | Introduce ITelemetryBuffer, IFrameBuffer, IDiagnosticsJournal. High-rate data lands in these, not in AppState. AppState keeps "latest value" fields used by guards only. Panels/charts subscribe directly to buffers via IObservable<T> or ChannelReader<T>. | UI bindings to latest telemetry tick at full rate with the store lock taken only at human-visible events (recipe load, workflow transition, alarm raise). |
| 2.4 | Per-slice observables | IAppStateStore gains IObservable<ConnectionSlice> OnConnection(), IObservable<RunSlice> OnRun(), etc., each with .DistinctUntilChanged(). StateChanged remains for compatibility but stops being the fast path. MainViewModel (and a future second panel) subscribe to the slices they care about. | A second test panel subscribes to only RunSlice and receives zero updates during a pure telemetry storm. Lock-wait on the store stays under a measured threshold. |
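Phase 2 exit gate: re-run the Phase 1 soak profile at 2× the rates. Every Phase 1 number must be no worse than the original baseline: cost metrics (allocations/sec, lock-wait time, GC pauses, p95 frame time) at or below it, throughput metrics (sustained frames/sec) at or above it. That proves the refactors were actually worth it.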
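Slices 2.1 and 2.2 combine naturally, and the idea can be sketched with C# records. Everything here is a trimmed-down illustration: the real AppState has more fields, and MotionSlice, AlarmSlice, Reducers, and AppendBounded are shown with assumed shapes. The point is that a reducer replaces one slice and reuses the other slice instances untouched, and that a bounded history is built without List.RemoveAt(0) or copy-and-append.

```csharp
using System;
using System.Collections.Immutable;

// Hypothetical, trimmed-down slices; field choices are assumptions.
public sealed record MotionSlice(double X, double Y);
public sealed record AlarmSlice(ImmutableArray<string> ActiveAlarms);
public sealed record AppState(MotionSlice Motion, AlarmSlice Alarms);

public static class Reducers
{
    // Touches only the motion slice; the alarm slice instance is reused as-is.
    public static AppState MoveTo(AppState state, double x, double y) =>
        state with { Motion = new MotionSlice(x, y) };

    // Touches only the alarm slice; the motion slice instance is reused as-is.
    public static AppState RaiseAlarm(AppState state, string alarm, int capacity) =>
        state with
        {
            Alarms = new AlarmSlice(AppendBounded(state.Alarms.ActiveAlarms, alarm, capacity))
        };

    // Keeps at most 'capacity' items: drops the oldest entries, then appends.
    public static ImmutableArray<T> AppendBounded<T>(ImmutableArray<T> items, T item, int capacity)
    {
        if (capacity <= 0) throw new ArgumentOutOfRangeException(nameof(capacity));

        var trimmed = items.Length >= capacity
            ? items.RemoveRange(0, items.Length - capacity + 1)
            : items;
        return trimmed.Add(item);
    }
}
```

Because untouched slices keep their reference identity, a subscriber can skip work with a plain reference check, which is what the per-slice observables in 2.4 rely on. ImmutableArray copies on every Add, so for long histories an ImmutableList or ImmutableQueue may measure better; that is something to confirm against the soak numbers rather than assume.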
5. Phase 3 — New functionality (things a real tool needs)
Now that load is sustainable, add the missing machine-shaped features.
| # | Slice | What | Exit criteria |
|---|---|---|---|
| 3.1 | Rich defect model | Replace InspectionResult(hasDefect, string) with Defect(Id, FrameId, BoundingBox, Classification, Confidence, ImageRef?). Persist per-wafer defect list in SQLite (new IDefectStore). Wafer map view in UI. | A high-defect run produces 5,000 defects, all persisted and queryable, with UI pagination/virtualization working. |
| 3.2 | Wafer loop / cassette cadence | Scheduler that runs N wafers back-to-back with load / align / run / unload phases. WaferId, LotId, OperatorId flow through RunSummary. | 25-wafer cassette completes under Soak8h with correct per-wafer records. |
| 3.3 | SQLite persistence + schema versioning | Move RunSummary, Alarm history, Defect, per-run TagSample snapshots into SQLite. EF Core or Dapper + migrations. Retire the JSON run-history file (with a one-time import). | Opening the app with 10,000 historical runs loads the last page of history in < 200 ms. Schema version recorded; a forward migration test exists. |
| 3.4 | Identity + audit | Operator prompt at start-of-shift. OperatorId attached to every RunSummary, Alarm.AcknowledgedBy, and recovery action. Role gating for diagnostics/fault-injection panel (Operator vs. Engineer). | Audit export produces a CSV/JSON of every state-changing event with who did it when. |
Phase 3 exit gate: a full shift simulation (3× 8-hour shifts, one cassette per hour, operator hand-off between shifts) runs unattended with complete audit trail and no manual intervention.
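For slice 3.1, the defect model and its store could start from a sketch like this one. The Defect field names follow the table; the field types, the BoundingBox struct, the IDefectStore members, and InMemoryDefectStore are assumptions. The in-memory class is the kind of fake the roadmap expects next to each new interface; a SQLite-backed implementation would sit behind the same interface.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Field types are assumptions; slice 3.1 names the fields but not their types.
public readonly record struct BoundingBox(int X, int Y, int Width, int Height);

public sealed record Defect(
    Guid Id,
    long FrameId,
    BoundingBox BoundingBox,
    string Classification,
    double Confidence,
    string? ImageRef = null);

// Hypothetical interface shape: paged reads so the UI never loads a full wafer at once.
public interface IDefectStore
{
    void Add(string waferId, Defect defect);
    int Count(string waferId);
    IReadOnlyList<Defect> GetPage(string waferId, int pageIndex, int pageSize);
}

// In-memory fake for tests; not thread-safe and not persistent.
public sealed class InMemoryDefectStore : IDefectStore
{
    private readonly Dictionary<string, List<Defect>> _byWafer = new();

    public void Add(string waferId, Defect defect)
    {
        if (!_byWafer.TryGetValue(waferId, out var list))
        {
            list = new List<Defect>();
            _byWafer[waferId] = list;
        }
        list.Add(defect);
    }

    public int Count(string waferId) =>
        _byWafer.TryGetValue(waferId, out var list) ? list.Count : 0;

    public IReadOnlyList<Defect> GetPage(string waferId, int pageIndex, int pageSize)
    {
        if (pageIndex < 0) throw new ArgumentOutOfRangeException(nameof(pageIndex));
        if (pageSize <= 0) throw new ArgumentOutOfRangeException(nameof(pageSize));

        if (!_byWafer.TryGetValue(waferId, out var list))
        {
            return Array.Empty<Defect>();
        }
        return list.Skip(pageIndex * pageSize).Take(pageSize).ToList();
    }
}
```

Keeping paging in the interface from the start means the 5,000-defect exit criterion can be exercised against the fake first and against SQLite later with the same test.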
6. Phase 4 — Real-world edges (when the simulator stops being enough)
This is where the simulator becomes the development runtime but no longer the only runtime.
| # | Slice | What | Exit criteria |
|---|---|---|---|
| 4.1 | First real SDK swap | Pick one subsystem (likely IMachineConnection or ILightController — smallest surface). Implement a real driver alongside the simulated one. DI selects based on config. | Both Simulator and RealVendor configurations pass the same integration test pack. Interface churn is captured as an ADR. |
| 4.2 | Historian / MES bridge | IHistorianSink for telemetry time-series (InfluxDB / PI / Prometheus). ISecsGemAdapter (or minimal equivalent) for run-start/stop events to MES. All optional via config. | Telemetry appears in Grafana. Run-start messages appear in a mock MES listener. |
| 4.3 | Packaging + signed installer | MSIX or WiX installer. Code-signed. Versioned. Per-environment appsettings. Auto-update channel (even if manually triggered at first). | One-click install on a clean Windows machine produces a working app with a signed EXE. Uninstall cleans %LocalAppData%. |
Phase 4 exit gate: the app runs in either Simulator or RealVendor mode by config flip, and both modes are exercised in CI.
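The config flip in slice 4.1 can be illustrated without any DI container. In this sketch ILightController is the interface named in the table, but its members are invented for illustration, and RealVendorLightController is only a stand-in that marks where vendor SDK calls would go. DriverFactory and LightControllerContract are hypothetical names; the contract class shows the idea of one test pack run against both modes.

```csharp
using System;

// ILightController appears in the roadmap; these members are assumptions.
public interface ILightController
{
    string Mode { get; }
    double Intensity { get; }
    void SetIntensity(double percent);
}

public sealed class SimulatedLightController : ILightController
{
    public string Mode => "Simulator";
    public double Intensity { get; private set; }
    public void SetIntensity(double percent) => Intensity = Math.Clamp(percent, 0, 100);
}

// Stand-in only: a real driver would forward to the vendor SDK here.
public sealed class RealVendorLightController : ILightController
{
    public string Mode => "RealVendor";
    public double Intensity { get; private set; }
    public void SetIntensity(double percent) => Intensity = Math.Clamp(percent, 0, 100);
}

public static class DriverFactory
{
    // 'mode' would come from per-environment configuration.
    public static ILightController CreateLightController(string mode) => mode switch
    {
        "Simulator" => new SimulatedLightController(),
        "RealVendor" => new RealVendorLightController(),
        _ => throw new ArgumentException($"Unknown driver mode '{mode}'.", nameof(mode)),
    };
}

public static class LightControllerContract
{
    // The same checks run against every implementation.
    public static void Verify(ILightController controller)
    {
        controller.SetIntensity(50);
        if (controller.Intensity != 50) throw new InvalidOperationException("Intensity not applied.");

        controller.SetIntensity(150);
        if (controller.Intensity != 100) throw new InvalidOperationException("Intensity not clamped.");
    }
}
```

In the application itself the same switch would more likely live in the DI registration, but the shape is the same: one configuration value, two implementations, one shared contract.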
7. Cross-cutting concerns (threaded through every phase)
Not standalone slices, but must be enforced in every PR:
- Measurement before and after. Every performance-touching slice includes a before/after number in the task document.
- Simulator parity. Every new production feature gets a simulator story in the same slice. No feature lands with "only works against real hardware".
- ADR hygiene. Any slice that contradicts an existing ADR updates or supersedes it; new architectural choices get a new ADR.
- Test fakes keep up. Every new interface gets a fake in tests/.../Stubs/ in the same slice.
- Docs-first. Requirements, specs, and tasks are updated before code lands. No code-only PRs for load-bearing changes.
8. What this roadmap does not do
Deliberately excluded. These are bigger-than-project or later-than-project decisions.
- Safety-critical logic in C#. Stays out. Safety interlocks belong in a PLC or a dedicated safety controller; this app is a viewer of safety state, not the authority. This is not a slice; it is a long-running architectural constraint.
- Multi-station / multi-machine orchestration. Explicitly out of scope per requirements §3. If it ever becomes in-scope, it is a separate product.
- Full SECS-GEM, full MES integration, factory compliance certification. Phase 4.2 opens a door; walking through it is its own multi-quarter programme.
- ML-based defect classification. Plug-in point at Phase 3.1 (Classification, Confidence), but the model training/serving pipeline is a separate effort.
- Localization, accessibility, design system. Important for shipped product, but do not affect architectural correctness; slot in at Phase 3 or 4 as UI-only slices.
9. How this integrates with the existing docs-first workflow
Each of the rows above becomes concrete artifacts under the current conventions:
- docs/specs/SLICE-005-observability-baseline.md (etc.): one per row.
- docs/tasks/TASK-005-observability-baseline.md: the AI-sized implementation task.
- docs/adrs/ADR-005-*.md: only when a load-bearing decision changes (e.g. ADR-005 would record the move of data-plane data out of AppState).
- docs/scenarios/: add a new scenario per phase exit gate (soak scenario, cassette scenario, multi-mode scenario).
- docs/reviews/: periodic review documents (like this one) that capture the state of the plan vs. reality.
The roadmap does not replace existing slices 001–004; they remain the historical record of how we got here. Phase numbering starts fresh so the evolution is clearly additive.
10. First concrete asks
If this plan is adopted, the smallest productive next steps are:
- Decide the budget — is this a weekend project, a quarter, or a year? The phase list scales; the ordering does not.
- Open SLICE-005 and SLICE-006 (Phase 0 items). They are independent of everything else and unblock the rest.
- Decide the performance targets for Phase 1: what wafer throughput, what frame rate, what telemetry tag count, what soak duration? Those numbers become the acceptance bar for 1.1–1.4.
- Assign a measurement-keeper — one person or one dashboard that owns the "before/after numbers" table. Without this, the exit gates become vibes instead of data.
Everything else flows from those four decisions.
— End of roadmap