
Evolution Roadmap — From Demo to Real-Machine Level (Simulator-First)

  • Date: 2026-04-22
  • Author: Planning outline (Claude / Opus 4.7)
  • Scope: A phased plan to grow this repository from a training prototype to software capable of driving a real wafer inspection machine, while keeping the simulator as the primary runtime. The simulator is upgraded alongside the app so that "simulator" stops meaning "toy" and starts meaning "faithful machine-in-a-box".
  • Companion reviews:
    • docs/reviews/2026-04-22-production-readiness-review.md — current state, gaps
    • docs/reviews/2026-04-22-canonical-appstate-review.md — state pattern analysis

0. Guiding principles

These are non-negotiable for every phase below.

  1. Simulator-first, measured, then refactor. Do not rearchitect the store on a hunch. Scale the simulator to produce real-machine-level load, measure where it breaks, then refactor. Premature refactoring is the most expensive mistake available here.
  2. Preserve IAppStateStore semantics while everything behind it changes. Domain and Presentation should compile unchanged across every phase.
  3. One slice, one ADR, one task, one acceptance test bar. Keep the existing docs-first discipline. Every item below becomes its own SLICE-### + TASK-### + ADR (if it changes an architectural decision).
  4. No dead abstractions. Don't add interfaces for "future flexibility". Add them when a real second implementation (or a real test double) actually lands.
  5. Every phase ships a measurable "before" and "after". Frames/sec sustained, telemetry tags at rate, p95 UI frame time, GC pauses, allocations/sec, alarm-storm survivability. Numbers go in each slice's acceptance criteria.
  6. Keep the demo path green. The existing happy-path walkthrough must still work after every slice. If it can't, the slice is too big.

1. Phase map

Phase 0 — Foundations              (2 slices, ~1–2 weeks)
Phase 1 — Simulator to scale       (4 slices, ~3–4 weeks)   ← measure here
Phase 2 — Store under pressure      (4 slices, ~3–4 weeks)
Phase 3 — New functionality         (4 slices, ~4–6 weeks)
Phase 4 — Real-world edges          (3 slices, ~4+ weeks)

Each phase has a hard exit gate that must be met before the next phase opens. This is what keeps the plan from becoming a wish list.


2. Phase 0 — Foundations (prereqs before we touch anything load-bearing)

Goal: give ourselves a safety net so measurements are trustworthy and regressions are visible.

| # | Slice | What | Exit criteria |
|---|---|---|---|
| 0.1 | CI + quality gates | .github/workflows/ci.yml: restore/build/test on push and PR. Directory.Build.props with TreatWarningsAsErrors. Directory.Packages.props to centralize versions. .editorconfig. Coverage upload. | PR without green CI cannot be merged. Coverage baseline recorded. |
| 0.2 | Observability baseline | Configure Serilog (or OpenTelemetry) with rolling file sink and UI sink. Add DispatcherUnhandledException, AppDomain.UnhandledException, TaskScheduler.UnobservedTaskException. Add System.Diagnostics.Metrics meter with counters that Phase 1+ will populate. Add single-instance mutex. | Unhandled exceptions surface in a crash log and in the diagnostics pane. A console dotnet-counters monitor session shows live metrics. |

Phase 0 exit gate: every subsequent slice can run with logs captured, metrics emitted, and a red CI build on regression. Without this, Phase 1's measurements are not trustworthy.
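
The meter and exception hooks from slice 0.2 could look roughly like the sketch below. It is a minimal illustration, not the project's implementation: the meter name WaferInspection.App, the instrument names, and the AppMetrics class are all hypothetical, and the WPF-only DispatcherUnhandledException hook is left out so the snippet runs as a plain console program. An in-process MeterListener stands in for an external dotnet-counters session.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics.Metrics;
using System.Threading.Tasks;

// In-process listener: plays the role an external dotnet-counters session would play.
var totals = new Dictionary<string, long>();
using var listener = new MeterListener();
listener.InstrumentPublished = (instrument, l) =>
{
    if (instrument.Meter.Name == AppMetrics.MeterName)
        l.EnableMeasurementEvents(instrument);
};
listener.SetMeasurementEventCallback<long>((instrument, value, tags, state) =>
    totals[instrument.Name] = totals.GetValueOrDefault(instrument.Name) + value);
listener.Start();

// Last-chance hooks: count the failure and write a crash line (stderr stands in for the crash log).
AppDomain.CurrentDomain.UnhandledException += (_, e) =>
{
    AppMetrics.UnhandledExceptions.Add(1);
    Console.Error.WriteLine($"[crash] {e.ExceptionObject}");
};
TaskScheduler.UnobservedTaskException += (_, e) =>
{
    AppMetrics.UnhandledExceptions.Add(1);
    Console.Error.WriteLine($"[crash] {e.Exception}");
    e.SetObserved();
};

// Phase 1+ code would call these from the simulator and the store.
AppMetrics.FramesEmitted.Add(3);
AppMetrics.TelemetryCoalesced.Add(1);

foreach (var (name, total) in totals)
    Console.WriteLine($"{name} = {total}");

public static class AppMetrics
{
    public const string MeterName = "WaferInspection.App"; // hypothetical name
    private static readonly Meter AppMeter = new(MeterName, "0.1.0");

    // Instrument names are assumptions; the roadmap only says "counters that Phase 1+ will populate".
    public static readonly Counter<long> FramesEmitted = AppMeter.CreateCounter<long>("frames.emitted");
    public static readonly Counter<long> TelemetryCoalesced = AppMeter.CreateCounter<long>("telemetry.coalesced");
    public static readonly Counter<long> UnhandledExceptions = AppMeter.CreateCounter<long>("exceptions.unhandled");
}
```

With a meter published this way, the same counters should be visible from outside the process by passing the meter name to the --counters option of dotnet-counters monitor.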


3. Phase 1 — Simulator to real-machine scale

Goal: make the simulator produce load that actually stresses the current architecture, so we can see where it breaks. We will not refactor the store yet. We will take notes.

| # | Slice | What | Exit criteria |
|---|---|---|---|
| 1.1 | Multi-tag telemetry | Replace MachineTelemetry(Temp, Pressure) with a keyed tag bag: TagSample(Name, Timestamp, Value, Quality) and a TagDefinition registry. Seed 50 tags via config. Per-tag rates from 1 Hz to 500 Hz. Synthetic noise models (sine, drift, random-walk, step). | 50 tags emitting at configured rates for 30 minutes without exception. Metrics show sustained emit rate and per-tag coalesce count. |
| 1.2 | Real frame payloads | Frame.PreviewPayload becomes non-null. Simulator generates a real byte[] (or WriteableBitmap) per frame with configurable resolution (e.g. 2 MP, 8 MP) and channel count. SimulatorProfile gains FrameWidth, FrameHeight, BytesPerPixel. Preview actually renders in the UI. | 30 fps at 2 MP sustained for 10 minutes. LOH allocation rate measured. GC pause p95 recorded. |
| 1.3 | Encoder-rate motion | Separate "UI position" stream (20 Hz, goes to AppState) from "encoder position" stream (1 kHz, goes to a dedicated channel for future plotting/tuning). SimulatedMotionController gains a background ticker + a per-axis noise model. | 1 kHz encoder stream measured at receiver with expected rate ± 2%. UI position remains ~20 Hz unchanged. |
| 1.4 | Storm & soak profiles | SimulatorProfile gains: DefectShowerRate, AlarmBurstEvery, TelemetryDropoutChance, NetworkLatencyMeanMs, NetworkLatencyStddevMs, TimeCompressionFactor. New profiles: ChaosMonkey, Soak8h. SDK-flakiness wrapper that can inject timeouts, cancellation-that-doesn't-cancel, and out-of-band throw. | 8-hour soak in Soak8h profile completes without leaking memory (RSS growth < 50 MB). ChaosMonkey triggers at least one code path in every fault branch of WorkflowService. |
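
As an illustration of slice 1.1, the keyed tag bag and two of the listed noise models might be shaped like this. The field types, the TagQuality enum, the INoiseModel interface, and the tag names are assumptions made for the sketch; the roadmap only fixes the names TagSample(Name, Timestamp, Value, Quality) and TagDefinition.

```csharp
using System;

var start = DateTimeOffset.UnixEpoch;
var temp = new TagDefinition("Chamber.Temp", RateHz: 100, Model: new SineNoise(25.0, 2.0, 10.0));
var pressure = new TagDefinition("Chamber.Pressure", RateHz: 1, Model: new RandomWalkNoise(1.0, 0.01, seed: 42));

// Emit the first three samples of each tag at its configured rate.
foreach (var def in new[] { temp, pressure })
    for (long tick = 0; tick < 3; tick++)
    {
        TagSample s = def.SampleAt(start, tick);
        Console.WriteLine($"{s.Name} +{(s.Timestamp - start).TotalMilliseconds} ms {s.Value:F3} {s.Quality}");
    }

public enum TagQuality { Good, Uncertain, Bad } // hypothetical quality codes

public readonly record struct TagSample(string Name, DateTimeOffset Timestamp, double Value, TagQuality Quality);

public interface INoiseModel
{
    double Next(double elapsedSeconds);
}

// Deterministic sine around a mean: value depends only on elapsed time.
public sealed class SineNoise : INoiseModel
{
    private readonly double _mean, _amplitude, _periodSeconds;
    public SineNoise(double mean, double amplitude, double periodSeconds)
        => (_mean, _amplitude, _periodSeconds) = (mean, amplitude, periodSeconds);
    public double Next(double elapsedSeconds)
        => _mean + _amplitude * Math.Sin(2 * Math.PI * elapsedSeconds / _periodSeconds);
}

// Stateful random walk: each call moves the value by at most stepSize in either direction.
public sealed class RandomWalkNoise : INoiseModel
{
    private readonly Random _rng;
    private readonly double _stepSize;
    private double _value;
    public RandomWalkNoise(double start, double stepSize, int seed)
    {
        _value = start;
        _stepSize = stepSize;
        _rng = new Random(seed);
    }
    public double Next(double elapsedSeconds)
    {
        _value += (_rng.NextDouble() * 2 - 1) * _stepSize;
        return _value;
    }
}

// One registry entry: the tick index and rate determine the sample timestamp.
public sealed record TagDefinition(string Name, double RateHz, INoiseModel Model)
{
    public TagSample SampleAt(DateTimeOffset start, long tick)
    {
        double elapsed = tick / RateHz;
        return new TagSample(Name, start + TimeSpan.FromSeconds(elapsed), Model.Next(elapsed), TagQuality.Good);
    }
}
```

Deriving the timestamp from the tick index rather than the wall clock keeps a run reproducible, which should also make a time-compression setting straightforward to layer on later.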

Phase 1 exit gate (the pressure test):

Run the app in Soak8h + 2 MP × 30 fps frames + 50 tags × 100 Hz telemetry + occasional fault storms for one full business day. Collect numbers:

  • sustained frames/sec at UI and in pipeline
  • UI p95 frame time (WPF CompositionTarget.Rendering or ETW frames)
  • GC Gen-0/1/2 counts and pause durations
  • allocations/sec at AppStateStore.Update
  • lock-wait time on AppStateStore._lock
  • memory pressure trace
  • dropped-frames, coalesced-telemetry counters
  • number of "Dispatcher.Invoke" events per second

Those numbers become the measured justification for Phase 2. If the app survives this run beautifully, Phase 2 is deferred. If not, we know exactly which slice of the store to attack first.
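
To give a sense of scale for that run, the arithmetic below works out the raw load implied by the stated rates. It assumes 1 byte per pixel (8-bit mono); BytesPerPixel is configurable in slice 1.2, so the frame figures scale linearly with it. The 85,000-byte figure is the default .NET large object heap threshold.

```csharp
using System;

const long pixelsPerFrame = 2_000_000;   // 2 MP
const int bytesPerPixel = 1;             // assumption: 8-bit mono
const int framesPerSecond = 30;
const int tagCount = 50;
const int tagRateHz = 100;
const int soakSeconds = 8 * 60 * 60;     // 28,800 s
const int lohThresholdBytes = 85_000;    // default .NET large object heap threshold

long frameBytes = pixelsPerFrame * bytesPerPixel;         // 2,000,000
bool frameOnLoh = frameBytes >= lohThresholdBytes;        // true: every frame buffer is an LOH allocation
long frameBytesPerSecond = frameBytes * framesPerSecond;  // 60,000,000 (60 MB/s)
long samplesPerSecond = (long)tagCount * tagRateHz;       // 5,000
long soakFrameBytes = frameBytesPerSecond * soakSeconds;  // 1,728,000,000,000 (about 1.7 TB)
long soakSamples = samplesPerSecond * soakSeconds;        // 144,000,000

Console.WriteLine($"frame size: {frameBytes} B (LOH: {frameOnLoh})");
Console.WriteLine($"frame bandwidth: {frameBytesPerSecond} B/s");
Console.WriteLine($"telemetry: {samplesPerSecond} samples/s");
Console.WriteLine($"8 h soak: {soakFrameBytes} frame bytes, {soakSamples} samples");
```

Under these assumptions a fresh byte[] per frame means 30 large-object allocations per second for the whole soak, which is why slice 1.2 asks for the LOH allocation rate and GC pause p95 to be recorded.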


4. Phase 2 — Store under pressure

Only open once Phase 1's soak run exposes real, measured problems. The following slices are ordered by expected impact-per-effort given what the canonical-appstate-review predicted; reorder based on actual measurements.

| # | Slice | What | Exit criteria |
|---|---|---|---|
| 2.1 | Slice AppState into sub-records | Extract ConnectionSlice, MotionSlice, RunSlice, AlarmSlice, DiagnosticsSlice, RecipeSlice, TelemetrySlice, FrameSlice. Reducers mutate one slice at a time. CommandGuards take the narrowest slice they need, not AppState. | Allocations/sec at AppStateStore.Update drop by ≥ 40% under the Phase-1 soak profile. All existing tests green. |
| 2.2 | Immutable collections | ImmutableArray<T> / ImmutableList<T> for RunHistory, ActiveAlarms, RecentDiagnostics. Remove List.RemoveAt(0) and new List<T>(existing) { item } patterns. | No List<T> allocation in hot paths (verify with a diagnostic session). Alarms collection in the VM stops rebuilding when unchanged (use SequenceEqual/reference check). |
| 2.3 | Data-plane lift-out | Introduce ITelemetryBuffer, IFrameBuffer, IDiagnosticsJournal. High-rate data lands in these, not in AppState. AppState keeps "latest value" fields used by guards only. Panels/charts subscribe directly to buffers via IObservable<T> or ChannelReader<T>. | UI bindings to latest telemetry tick at full rate with the store lock taken only at human-visible events (recipe load, workflow transition, alarm raise). |
| 2.4 | Per-slice observables | IAppStateStore gains IObservable<ConnectionSlice> OnConnection(), IObservable<RunSlice> OnRun(), etc., each with .DistinctUntilChanged(). StateChanged remains for compatibility but stops being the fast path. MainViewModel (and a future second panel) subscribe to the slices they care about. | A second test panel subscribes to only RunSlice and receives zero updates during a pure telemetry storm. Lock-wait on the store stays under a measured threshold. |
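
The idea behind slices 2.1 and 2.4 can be sketched without any reactive library: when each reducer replaces only one sub-record, an untouched slice keeps its object reference, so a cheap reference check is enough to suppress notifications. The SlicedStore class, its RunChanged event, and the record fields below are hypothetical stand-ins; the planned API is IObservable<RunSlice> OnRun() with .DistinctUntilChanged(), which this snippet only imitates.

```csharp
#nullable enable
using System;

var store = new SlicedStore(new AppState(new RunSlice("Idle", 0), new TelemetrySlice(21.5)));
int runNotifications = 0;
store.RunChanged += _ => runNotifications++;

// Pure telemetry storm: only TelemetrySlice is replaced, RunSlice keeps its reference.
for (int i = 0; i < 1000; i++)
    store.Update(s => s with { Telemetry = s.Telemetry with { ChamberTemp = 21.5 + i * 0.001 } });

// One human-visible event: the run phase changes.
store.Update(s => s with { Run = s.Run with { Phase = "Running" } });

Console.WriteLine(runNotifications); // 1

public sealed record RunSlice(string Phase, int WafersDone);
public sealed record TelemetrySlice(double ChamberTemp);
public sealed record AppState(RunSlice Run, TelemetrySlice Telemetry);

public sealed class SlicedStore
{
    private readonly object _lock = new();
    private AppState _state;

    public SlicedStore(AppState initial) => _state = initial;

    public event Action<RunSlice>? RunChanged;

    public AppState Current
    {
        get { lock (_lock) return _state; }
    }

    public void Update(Func<AppState, AppState> reducer)
    {
        RunSlice? changedRun = null;
        lock (_lock)
        {
            AppState next = reducer(_state);
            // Reference check: an unchanged slice is the same object, so no notification.
            if (!ReferenceEquals(next.Run, _state.Run))
                changedRun = next.Run;
            _state = next;
        }
        // Notify outside the lock so subscribers cannot extend lock-wait time.
        if (changedRun is not null)
            RunChanged?.Invoke(changedRun);
    }
}
```

This mirrors the 2.4 exit criterion in miniature: a subscriber to the run slice sees zero updates during the telemetry storm and exactly one when the run phase changes.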

Phase 2 exit gate: re-run the Phase 1 soak profile at 2× the rates. Every Phase 1 number must be at least as good as the original baseline: throughput no lower, and frame time, GC pauses, allocations, and lock-wait no higher. That proves the refactors were actually worth it.
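
For the data-plane lift-out in slice 2.3, one possible backing for a buffer such as ITelemetryBuffer is a bounded channel that drops the oldest item when full, so a fast producer never blocks and never touches the store lock. The sketch below uses a raw Channel<double> with a capacity of 4 purely for illustration; the real buffer interface, element type, and capacity are not specified in the roadmap.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Channels;

// Bounded buffer: when full, the oldest sample is discarded to make room for the newest.
var buffer = Channel.CreateBounded<double>(new BoundedChannelOptions(capacity: 4)
{
    FullMode = BoundedChannelFullMode.DropOldest,
    SingleReader = true,
});

// Producer side (simulator): TryWrite never blocks in DropOldest mode.
for (int i = 1; i <= 10; i++)
    buffer.Writer.TryWrite(i);
buffer.Writer.Complete();

// Consumer side (panel or chart): drain whatever is still buffered.
var kept = new List<double>();
while (buffer.Reader.TryRead(out double sample))
    kept.Add(sample);

Console.WriteLine(string.Join(", ", kept)); // 7, 8, 9, 10
```

Dropping the oldest sample suits "latest value" displays; a journal that must not lose entries would need a different full-mode or an unbounded channel with its own back-pressure policy.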


5. Phase 3 — New functionality (things a real tool needs)

Now that load is sustainable, add the missing machine-shaped features.

| # | Slice | What | Exit criteria |
|---|---|---|---|
| 3.1 | Rich defect model | Replace InspectionResult(hasDefect, string) with Defect(Id, FrameId, BoundingBox, Classification, Confidence, ImageRef?). Persist per-wafer defect list in SQLite (new IDefectStore). Wafer map view in UI. | A high-defect run produces 5,000 defects, all persisted and queryable, with UI pagination/virtualization working. |
| 3.2 | Wafer loop / cassette cadence | Scheduler that runs N wafers back-to-back with load / align / run / unload phases. WaferId, LotId, OperatorId flow through RunSummary. | 25-wafer cassette completes under Soak8h with correct per-wafer records. |
| 3.3 | SQLite persistence + schema versioning | Move RunSummary, Alarm history, Defect, per-run TagSample snapshots into SQLite. EF Core or Dapper + migrations. Retire the JSON run-history file (with a one-time import). | Opening the app with 10,000 historical runs loads the last page of history in < 200 ms. Schema version recorded; a forward migration test exists. |
| 3.4 | Identity + audit | Operator prompt at start-of-shift. OperatorId attached to every RunSummary, Alarm.AcknowledgedBy, and recovery action. Role gating for diagnostics/fault-injection panel (Operator vs. Engineer). | Audit export produces a CSV/JSON of every state-changing event with who did it when. |
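
A possible shape for the slice 3.1 defect record, with in-memory paging standing in for the SQLite-backed IDefectStore, is sketched below. The property types, the PixelRect struct, the classification strings, and the DefectPaging helper are assumptions for illustration; the roadmap fixes only the property names.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

// Generate the 5,000-defect run from the exit criterion (synthetic values).
var defects = Enumerable.Range(0, 5_000)
    .Select(i => new Defect(
        Id: Guid.NewGuid(),
        FrameId: i / 10,
        BoundingBox: new PixelRect(X: i % 100, Y: i % 50, Width: 8, Height: 8),
        Classification: i % 3 == 0 ? "Scratch" : "Particle",
        Confidence: 0.5 + (i % 50) / 100.0,
        ImageRef: null))
    .ToList();

const int pageSize = 200;
int pageCount = (defects.Count + pageSize - 1) / pageSize; // ceiling division
IReadOnlyList<Defect> lastPage = DefectPaging.Page(defects, pageCount - 1, pageSize);

Console.WriteLine($"{defects.Count} defects, {pageCount} pages, last page has {lastPage.Count}");
// 5000 defects, 25 pages, last page has 200

public readonly record struct PixelRect(int X, int Y, int Width, int Height); // hypothetical type

public sealed record Defect(
    Guid Id,
    long FrameId,
    PixelRect BoundingBox,
    string Classification,
    double Confidence,
    string? ImageRef); // nullable, matching "ImageRef?" in the roadmap

public static class DefectPaging
{
    // In-memory stand-in for a paged query against the defect store.
    public static IReadOnlyList<Defect> Page(IReadOnlyList<Defect> all, int pageIndex, int pageSize)
        => all.Skip(pageIndex * pageSize).Take(pageSize).ToList();
}
```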

Phase 3 exit gate: a full shift simulation (3× 8-hour shifts, one cassette per hour, operator hand-off between shifts) runs unattended with complete audit trail and no manual intervention.


6. Phase 4 — Real-world edges (when the simulator stops being enough)

This is where the simulator becomes the development runtime but no longer the only runtime.

| # | Slice | What | Exit criteria |
|---|---|---|---|
| 4.1 | First real SDK swap | Pick one subsystem (likely IMachineConnection or ILightController; smallest surface). Implement a real driver alongside the simulated one. DI selects based on config. | Both Simulator and RealVendor configurations pass the same integration test pack. Interface churn is captured as an ADR. |
| 4.2 | Historian / MES bridge | IHistorianSink for telemetry time-series (InfluxDB / PI / Prometheus). ISecsGemAdapter (or minimal equivalent) for run-start/stop events to MES. All optional via config. | Telemetry appears in Grafana. Run-start messages appear in a mock MES listener. |
| 4.3 | Packaging + signed installer | MSIX or WiX installer. Code-signed. Versioned. Per-environment appsettings. Auto-update channel (even if manually triggered at first). | One-click install on a clean Windows machine produces a working app with a signed EXE. Uninstall cleans %LocalAppData%. |

Phase 4 exit gate: the app runs in either Simulator or RealVendor mode by config flip, and both modes are exercised in CI.
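
The config flip and the shared test pack from slice 4.1 can be illustrated with a plain factory and one contract check run against both drivers. Everything here is a sketch under assumptions: the ILightController members, the two driver classes, DriverFactory, and LightContract are hypothetical, the vendor driver is an in-memory placeholder for real SDK calls, and the actual app would resolve drivers through its DI container rather than a switch.

```csharp
using System;

// The same contract check runs against every configured mode.
foreach (string mode in new[] { "Simulator", "RealVendor" })
{
    ILightController light = DriverFactory.Create(mode);
    LightContract.Verify(light);
    Console.WriteLine($"{mode}: contract passed");
}

public interface ILightController // members are assumed for this sketch
{
    int IntensityPercent { get; }
    void SetIntensity(int percent);
}

public sealed class SimulatedLightController : ILightController
{
    public int IntensityPercent { get; private set; }
    public void SetIntensity(int percent) => IntensityPercent = Math.Clamp(percent, 0, 100);
}

// Placeholder: a real driver would forward these calls to the vendor SDK.
public sealed class VendorLightController : ILightController
{
    public int IntensityPercent { get; private set; }
    public void SetIntensity(int percent) => IntensityPercent = Math.Clamp(percent, 0, 100);
}

public static class DriverFactory
{
    // Stand-in for "DI selects based on config".
    public static ILightController Create(string mode) => mode switch
    {
        "Simulator" => new SimulatedLightController(),
        "RealVendor" => new VendorLightController(),
        _ => throw new ArgumentException($"Unknown driver mode '{mode}'", nameof(mode)),
    };
}

public static class LightContract
{
    // Behaviour every driver must share, whichever implementation is selected.
    public static void Verify(ILightController light)
    {
        light.SetIntensity(40);
        if (light.IntensityPercent != 40)
            throw new InvalidOperationException("intensity should round-trip");
        light.SetIntensity(150);
        if (light.IntensityPercent != 100)
            throw new InvalidOperationException("intensity should clamp to 100");
    }
}
```

Keeping the contract in one place is what lets interface churn show up as a failing check in both modes at once, which is the kind of change the slice asks to capture as an ADR.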


7. Cross-cutting concerns (threaded through every phase)

Not standalone slices, but must be enforced in every PR:

  • Measurement before and after. Every performance-touching slice includes a before/after number in the task document.
  • Simulator parity. Every new production feature gets a simulator story in the same slice. No feature lands with "only works against real hardware".
  • ADR hygiene. Any slice that contradicts an existing ADR updates or supersedes it; new architectural choices get a new ADR.
  • Test fakes keep up. Every new interface gets a fake in tests/.../Stubs/ in the same slice.
  • Docs-first. Requirements, specs, and tasks are updated before code lands. No code-only PRs for load-bearing changes.

8. What this roadmap does not do

Deliberately excluded. These are bigger-than-project or later-than-project decisions.

  • Safety-critical logic in C#. Stays out. Safety interlocks belong in a PLC or a dedicated safety controller; this app is a viewer of safety state, not the authority. This is not a slice; it is a long-running architectural constraint.
  • Multi-station / multi-machine orchestration. Explicitly out of scope per requirements §3. If it ever becomes in-scope, it is a separate product.
  • Full SECS-GEM, full MES integration, factory compliance certification. Phase 4.2 opens a door; walking through it is its own multi-quarter programme.
  • ML-based defect classification. Plug-in point at Phase 3.1 (Classification, Confidence), but the model training/serving pipeline is a separate effort.
  • Localization, accessibility, design system. Important for shipped product, but do not affect architectural correctness; slot in at Phase 3 or 4 as UI-only slices.

9. How this integrates with the existing docs-first workflow

Each of the rows above becomes concrete artifacts under the current conventions:

  • docs/specs/SLICE-005-observability-baseline.md (etc.) — one per row.
  • docs/tasks/TASK-005-observability-baseline.md — the AI-sized implementation task.
  • docs/adrs/ADR-005-*.md — only when a load-bearing decision changes (e.g. ADR-005 would record the move of data-plane data out of AppState).
  • docs/scenarios/ — add a new scenario per phase exit gate (soak scenario, cassette scenario, multi-mode scenario).
  • docs/reviews/ — periodic review documents (like this one) that capture the state of the plan vs. reality.

The roadmap does not replace existing slices 001–004; they remain the historical record of how we got here. Phase numbering starts fresh so the evolution is clearly additive.


10. First concrete asks

If this plan is adopted, the smallest productive next steps are:

  1. Decide the budget — is this a weekend project, a quarter, or a year? The phase list scales; the ordering does not.
  2. Open SLICE-005 and SLICE-006 (Phase 0 items). They are independent of everything else and unblock the rest.
  3. Decide the performance targets for Phase 1 — what wafer throughput, what frame rate, what telemetry tag count, what soak duration? Those numbers become the acceptance bar for 1.1–1.4.
  4. Assign a measurement-keeper — one person or one dashboard that owns the "before/after numbers" table. Without this, the exit gates become vibes instead of data.

Everything else flows from those four decisions.

— End of roadmap
