
Phase 2 + Phase 3 Strategy — Refining the Roadmap After Phase 1

This doc records the strategic decision made on 2026-05-07 about how to sequence Phase 2 (architectural refactors) and Phase 3 (real-world functionality) given Phase 1's clean exit. It deviates from the original roadmap's strict measurement-first stance for one specific reason — Phase 1 didn't surface architectural problems for the simulator-driven measurement to gate against — and proposes a hybrid plan that keeps the cheap measurement leverage while letting product work drive forward.

This doc is not a retroactive amendment to the original roadmap (2026-04-22-evolution-roadmap.md); that file remains the historical record of how the project was planned. This doc supersedes it for sequencing decisions from 2026-05-07 onward.


1. The decision in one paragraph

Land SLICE-2.0 first (cheap; instruments AppStateStore.Update; ~1-2 days). Then open Phase 3 in parallel with conditional Phase 2. Pick the Phase 3 slice order based on product value and the architectural pressure each slice is expected to surface, not on measurement gating: 3.3 (SQLite persistence) → 3.1 (rich defect model) → 3.2 (cassette cadence) → 3.4 (identity + audit). The trigger for Phase 2 refactors shifts from "did Phase 1 measurement surface a problem" (it didn't) to "did Phase 1 measurement OR a Phase 3 row block OR Phase 3 implementation friction surface a problem." Each Phase 3 slice's row block becomes a candidate trigger for the Phase 2 slices it puts pressure on. Phase 4.1 (first real SDK swap) is the next must-do milestone after Phase 3. It arguably should open in parallel with later Phase 3 slices, because the simulator-vs-real-hardware gap is the largest unresolved confidence question.

If you read only one line: measurement-first, but the simulator can't be the only source of measurement evidence anymore — Phase 3 feature work and Phase 4 real-hardware integration are both legitimate sources of pressure that justify Phase 2 work.


2. Why the original Phase 2 plan needs refinement

The original roadmap (2026-04-22-evolution-roadmap.md §3) said: "Those numbers become the measured justification for Phase 2. If the app survives this run beautifully, Phase 2 is deferred. If not, we know exactly which slice of the store to attack first."

The app did survive beautifully. Phase 1 evidence (phase-1-capabilities-and-limits.md §2-3): no leaks across 8 hours of soak, no faults under low-chaos load, recovery survives bursty fault injection, frame pipeline holds 30 fps × 2 MP without drops, encoder pipeline holds 657 Hz without destabilizing the workflow, GC pause p95 stable in the 7.9-12.4 ms band across every load shape.

The roadmap's plan B for that case was "Phase 2 is deferred." But "deferred" alone isn't a working plan: it says what not to do and leaves open what to do instead. The honest reading of Phase 1's evidence is that the simulator, at the load shapes Phase 1 can produce, is unlikely to surface anything new. Continuing to drive measurement against the same simulator has hit a wall.

Three options for what to do instead:

  • Option A — Wait. Stay on the simulator-first discipline; defer everything until a real machine is available for measurement. Cost: weeks of idle time.
  • Option B — Refactor anyway. Open Phase 2's planned slices on architectural-cleanliness grounds. Cost: wide-blast-radius refactors with no evidence they help.
  • Option C — Build Phase 3 features. Add real-world functionality (defect model, persistence, cassette cadence, identity). Cost: deviates from the original roadmap's measurement-first stance. Benefit: Phase 3 features are additive, not transformative — they don't depend on Phase 2's architectural decisions, and their implementation will surface real architectural pressure that synthetic chaos profiles can't.

This doc picks Option C with one constraint: SLICE-2.0 still lands first as the cheap baseline. That preserves measurement discipline (every Phase 3 row block is comparable against a known store-side baseline) without forcing speculative refactors.


3. The slice-by-slice plan

Phase 2 — refactors stay conditional, but trigger expands

| Slice | Original justification | Status under this plan |
| --- | --- | --- |
| 2.0 | Measurement instrumentation | Proceed first. Independent of feature work; ~1-2 days; produces baseline. |
| 2.1 | "Slice AppState into sub-records" (alloc reduction) | Stay conditional. Wide-blast-radius refactor; needs evidence (from 2.0 or any Phase 3 row) showing alloc share ≥ 10% before opening. Cleanliness alone is not sufficient justification under roadmap §0.4 ("No dead abstractions"). |
| 2.2 | "Immutable collections" (avoid List<T> rebuild churn) | Stay conditional. Same threshold as 2.1. |
| 2.3 | "Data-plane lift-out" for tags / frames / diagnostics | Has independent architectural merit. The encoder bypass (SLICE-1.3) proved the pattern; the cleanliness argument here is real because it's the same pattern applied to the next subsystem. Could open before measurement evidence on architectural grounds, but the recommended timing is after Phase 3.1 (the rich defect model expands what flows through AppState) so that the lift-out is shaped against real load. |
| 2.4 | "Per-slice observables" (reduce subscriber-update fanout) | Defer further. Only meaningful once the UI surface grows (Phase 3.1 wafer map). Revisit post-3.1. |
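Below is a minimal sketch of what SLICE-2.0's instrumentation could look like. It is written in Python purely for illustration: the project itself appears to be .NET, and the sketch does not reproduce TASK-2.0. It assumes a single-writer store in which every change goes through one update method. That method takes a caller name and a pure reducer and records the per-call lock-wait time, so a row block can be recomputed later. InstrumentedStore, UpdateSample, the AppState fields, and the WorkflowService caller name are all hypothetical. Allocation share is not captured here. In the real store it would presumably come from the runtime's allocation counters, for example .NET's GC.GetAllocatedBytesForCurrentThread.

```python
import csv
import io
import threading
import time
from dataclasses import dataclass, replace
from typing import Callable, Generic, List, TypeVar

S = TypeVar("S")


@dataclass(frozen=True)
class UpdateSample:
    caller: str
    lock_wait_us: float


class InstrumentedStore(Generic[S]):
    """Single-writer store: every state change goes through update()."""

    def __init__(self, initial: S) -> None:
        self._state = initial
        self._lock = threading.Lock()
        self._samples: List[UpdateSample] = []

    @property
    def state(self) -> S:
        return self._state

    def update(self, caller: str, reducer: Callable[[S], S]) -> S:
        requested = time.perf_counter_ns()
        with self._lock:
            # Time between asking for the lock and holding it.
            waited_us = (time.perf_counter_ns() - requested) / 1_000
            self._state = reducer(self._state)
            self._samples.append(UpdateSample(caller, waited_us))
            return self._state

    def samples_csv(self) -> str:
        """Dump raw samples so summary numbers can be recomputed later."""
        with self._lock:
            samples = list(self._samples)
        out = io.StringIO()
        writer = csv.writer(out, lineterminator="\n")
        writer.writerow(["caller", "lock_wait_us"])
        for s in samples:
            writer.writerow([s.caller, f"{s.lock_wait_us:.3f}"])
        return out.getvalue()


# Illustrative state record; the real AppState fields are not described here.
@dataclass(frozen=True)
class AppState:
    frames_seen: int = 0
    run_status: str = "idle"


store = InstrumentedStore(AppState())
store.update("FramePipelineService", lambda s: replace(s, frames_seen=s.frames_seen + 1))
store.update("WorkflowService", lambda s: replace(s, run_status="running"))
```

Dumping raw samples, rather than only summary numbers, is what lets the percentile and share calculations be rerun later against a committed file.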

The Phase 2 trigger conditions, restated:

  • 2.1 / 2.2 open if: any row block (SLICE-2.0, or any Phase 3 row) shows AppStateStore.Update alloc share ≥ 10% OR lock-wait p95 ≥ 100 µs.
  • 2.3 opens if: Phase 3.1 implementation surfaces friction routing per-frame defect collections through AppState.LatestFrame OR SLICE-2.0 measures TagStreamPipelineService / FramePipelineService as the dominant store-write caller (> 50% share).
  • 2.4 opens if: Phase 3.1's wafer-map view shows redundant subscriber updates (specific subscribers receiving events they don't care about, measurably).
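The trigger conditions are meant to be mechanical, so they can be written down as a small function. The sketch below is hypothetical:

- RowBlock and its field names are illustrative, and the thresholds are copied from the list above.
- The two qualitative triggers (3.1 frame-routing friction and redundant wafer-map updates) are modeled as booleans that a reviewer sets by hand.
- "TagStreamPipelineService / FramePipelineService" is read as "either service alone exceeds 50% of store writes". That is an interpretation; the text does not pin it down.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List

# Thresholds copied from the trigger conditions above.
ALLOC_SHARE_MIN = 0.10        # 2.1 / 2.2: AppStateStore.Update alloc share >= 10%
LOCK_WAIT_P95_MIN_US = 100.0  # 2.1 / 2.2: lock-wait p95 >= 100 µs
DOMINANT_CALLER_SHARE = 0.50  # 2.3: a pipeline service owns > 50% of store writes
PIPELINE_CALLERS = ("TagStreamPipelineService", "FramePipelineService")


@dataclass(frozen=True)
class RowBlock:
    """Hypothetical summary of one row block; field names are illustrative."""
    slice_id: str                # e.g. "SLICE-2.0", "SLICE-3.3"
    update_alloc_share: float    # fraction of allocations attributed to Update
    lock_wait_p95_us: float
    write_share_by_caller: Dict[str, float] = field(default_factory=dict)


def phase2_slices_to_open(rows: Iterable[RowBlock],
                          frame_friction_in_3_1: bool = False,
                          redundant_updates_in_3_1: bool = False) -> List[str]:
    """Return the Phase 2 slice ids whose trigger has fired, sorted."""
    opened = set()
    for row in rows:
        # 2.1 / 2.2: any row block, either metric at or over its threshold.
        if (row.update_alloc_share >= ALLOC_SHARE_MIN
                or row.lock_wait_p95_us >= LOCK_WAIT_P95_MIN_US):
            opened.update({"2.1", "2.2"})
        # 2.3, measured half: only the SLICE-2.0 baseline is consulted.
        if row.slice_id == "SLICE-2.0" and any(
                row.write_share_by_caller.get(c, 0.0) > DOMINANT_CALLER_SHARE
                for c in PIPELINE_CALLERS):
            opened.add("2.3")
    # Qualitative triggers: set by whoever reviews the Phase 3.1 implementation.
    if frame_friction_in_3_1:
        opened.add("2.3")
    if redundant_updates_in_3_1:
        opened.add("2.4")
    return sorted(opened)


# Illustrative inputs only; these are not measured values.
clean = RowBlock("SLICE-2.0", 0.04, 35.0)
print(phase2_slices_to_open([clean]))  # []
```

An empty result is the "Phase 2 deferred" outcome. Because the function is pure, the same committed row blocks always yield the same decision.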

Phase 3 — proceeds in product-value order

| Slice | Title | Order | Why this order |
| --- | --- | --- | --- |
| 3.3 | SQLite persistence + schema versioning | 1st | Most architecturally interesting; surfaces persistence-side load shapes (lock-wait, schema migrations); replaces the JSON-file run-history store that's been "good enough" but isn't real-world. Likely Phase 2 trigger candidate (SQLite writes under sustained run cycles will exercise AppState write rate). |
| 3.1 | Rich defect model | 2nd | Most operator-facing value; gives the UI surface room to grow; per-frame defect collections will pressure the frame pipeline's allocation profile (potential 2.2 trigger) and the run-summary path (potential 2.3 trigger for frame-stream lift-out). |
| 3.2 | Wafer loop / cassette cadence | 3rd | Composes 3.3 + 3.1 + the existing workflow state machine into something that genuinely looks like a wafer tool. Each 25-wafer cassette run is a long-duration scenario that exercises the full Phase 1 + Phase 3 stack at production-shaped cadence. |
| 3.4 | Identity + audit | 4th | Partly orthogonal; can land any time after 3.1 once the UI surface is rich enough that role-gating is meaningful. Lower priority because it's process-shaped (audit trails, role permissions) rather than throughput-shaped. |
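For SLICE-3.3, SQLite's built-in PRAGMA user_version is one simple way to track schema versions. The sketch below uses Python's sqlite3 module only to show the shape of the idea; the plan itself names EF Core or Dapper, whose migration mechanisms differ. The runs table, its columns, the migration list, and the legacy JSON record format are all hypothetical. The connection is opened with isolation_level=None so each migration can be wrapped in an explicit BEGIN/COMMIT. That way the DDL step and the version bump commit or roll back together.

```python
import json
import sqlite3
from contextlib import contextmanager

# Hypothetical ordered migrations: entry i upgrades schema version i to i + 1.
MIGRATIONS = [
    """CREATE TABLE runs (
           id INTEGER PRIMARY KEY,
           started_at TEXT NOT NULL,
           recipe TEXT NOT NULL,
           outcome TEXT NOT NULL
       )""",
    "ALTER TABLE runs ADD COLUMN wafer_count INTEGER NOT NULL DEFAULT 0",
]


@contextmanager
def transaction(conn):
    """Explicit transaction; requires a connection opened with isolation_level=None."""
    conn.execute("BEGIN")
    try:
        yield
    except BaseException:
        conn.execute("ROLLBACK")
        raise
    conn.execute("COMMIT")


def schema_version(conn):
    return conn.execute("PRAGMA user_version").fetchone()[0]


def migrate(conn):
    """Apply pending migrations; each DDL step and its version bump commit together."""
    for target in range(schema_version(conn) + 1, len(MIGRATIONS) + 1):
        with transaction(conn):
            conn.execute(MIGRATIONS[target - 1])
            conn.execute(f"PRAGMA user_version = {target}")  # target is an int we control
    return schema_version(conn)


def import_json_history(conn, json_text):
    """One-off import of legacy run-history records (record format assumed)."""
    records = json.loads(json_text)
    with transaction(conn):
        conn.executemany(
            "INSERT INTO runs (started_at, recipe, outcome, wafer_count) VALUES (?, ?, ?, ?)",
            [(r["started_at"], r["recipe"], r["outcome"], r.get("wafer_count", 0)) for r in records],
        )
    return len(records)


conn = sqlite3.connect(":memory:", isolation_level=None)
migrate(conn)
import_json_history(conn, '[{"started_at": "2026-05-01T08:00:00", "recipe": "demo", "outcome": "pass"}]')
```

Running migrate() again is a no-op once user_version matches the migration count. A malformed legacy record rolls back the whole import instead of leaving a partial history.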

Each Phase 3 slice produces a row block in phase-3-measurements.md (a new file mirroring phase-1-measurements.md), with the same 22-26 metric set extended for new domain-specific counters (e.g., defects.persisted, cassette.wafers.completed).

Phase 4 — real SDK swap, parallel candidate

| Slice | Title | Status under this plan |
| --- | --- | --- |
| 4.1 | First real SDK swap (IMachineConnection) | Open in parallel with 3.x slices, not after them. Reasoning below. |
| 4.2 | Historian / MES bridge | After 4.1. No change from original plan. |
| 4.3 | Packaging + signed installer | After 4.2. No change from original plan. |

Why 4.1 should be considered for opening earlier than the original plan suggested:

  • The simulator-vs-real-hardware gap (phase-1-capabilities-and-limits.md §4.1) is the largest unresolved confidence question. The earlier real-SDK behavior surfaces, the less rework Phase 3 features will need.
  • 4.1 picks the smallest surface (IMachineConnection ≈ 20 lines of interface, 50 lines of SimulatedMachineConnection). It's not a multi-month integration; it's a focused swap.
  • Phase 3 features will need to handle real-SDK error modes anyway (e.g., 3.4's audit trail records connection events; if real SDK errors don't fit the simulator's coin-flip model, 3.4's UI flow may need rework). Better to surface the real failure modes early than to build Phase 3 against a too-clean simulator and refactor later.
  • 4.1 is largely independent of the Phase 3 slices in code terms — it touches Infrastructure.RealVendor.* (a new namespace) and a DI registration switch in App. It doesn't conflict with Phase 3 work.

The right framing: 4.1 is a parallel track, not a phase that comes after Phase 3 finishes.
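Here is a sketch of the "DI registration switch" for 4.1, again in Python for illustration. A structural Protocol stands in for IMachineConnection. A composition-root function picks the simulated or real-vendor implementation from settings, and defaults to the simulator. The method names (connect, read_status), the settings key, and the vendor SDK's open/status calls are hypothetical; this doc does not describe the real interface or the vendor SDK.

```python
from typing import Callable, Optional, Protocol


class MachineConnection(Protocol):
    """Hypothetical Python stand-in for IMachineConnection; method names are assumed."""

    def connect(self) -> None: ...

    def read_status(self) -> str: ...


class SimulatedMachineConnection:
    """Default runtime: no hardware required."""

    def __init__(self) -> None:
        self.connected = False

    def connect(self) -> None:
        self.connected = True

    def read_status(self) -> str:
        return "simulated:ready" if self.connected else "simulated:offline"


class RealVendorMachineConnection:
    """Placeholder adapter for Infrastructure.RealVendor.*; the SDK calls are hypothetical."""

    def __init__(self, sdk_client) -> None:
        self._sdk = sdk_client

    def connect(self) -> None:
        self._sdk.open()

    def read_status(self) -> str:
        return self._sdk.status()


def register_machine_connection(settings: dict,
                                sdk_factory: Optional[Callable[[], object]] = None) -> MachineConnection:
    """Composition-root switch; anything other than "real-vendor" keeps the simulator."""
    if settings.get("machine_connection", "simulated") == "real-vendor":
        if sdk_factory is None:
            raise ValueError("real-vendor connection selected but no SDK factory was provided")
        return RealVendorMachineConnection(sdk_factory())
    return SimulatedMachineConnection()
```

Keeping the choice in a single composition-root function matches the claim that 4.1 touches only a new namespace plus one registration change. Callers in Application and Domain would see only the interface.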


4. What this plan keeps from the original roadmap

  • Measurement discipline — every slice still produces a row block; every row block is reproducible from a committed CSV; criterion-amendment-as-documentation is still the pattern when platform reality forces it.
  • Docs-first execution — every slice still has a spec, a task, a measurement row, a design-notes deep-dive, and (now) a per-slice contribution to whichever measurements file applies (Phase 1, Phase 3, or new ones).
  • Layered architecture and DI invariants — no slice introduces cross-layer leaks; no slice mutates AppState outside AppStateStore.Update; no slice adds infrastructure dependencies to Application or Domain.
  • Roadmap §0 guiding principles 1, 2, 3, 5, 6 — simulator-first remains the default runtime even as 4.1 lands; canonical state semantics preserved; one slice / one task / one acceptance bar; measurable before-and-after for every perf-affecting change; demo path stays green.
  • Roadmap §0 principle 4 ("No dead abstractions"): especially preserved. This is the principle that argues against speculative Phase 2 refactors; this plan honors it by keeping 2.1 / 2.2 / 2.4 conditional.
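As an illustration of "reproducible from a committed CSV", the sketch below recomputes two row-block numbers from raw per-call samples: the nearest-rank lock-wait p95 and the per-caller write share. The column names (caller, lock_wait_us) are assumptions. Nearest-rank is only one of several percentile definitions. Whichever definition the real measurement scripts use should be fixed and documented, so that numbers stay comparable across row blocks.

```python
import csv
import io
from collections import Counter


def p95_nearest_rank(values):
    """95th percentile by the nearest-rank method (1-based rank = ceil(0.95 * n))."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("cannot take a percentile of zero samples")
    rank = (95 * len(ordered) + 99) // 100  # integer ceil(95n / 100), no float rounding
    return ordered[rank - 1]


def row_block_from_csv(csv_text):
    """Recompute summary numbers from raw samples with columns caller, lock_wait_us."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    if not rows:
        raise ValueError("CSV contains no samples")
    counts = Counter(row["caller"] for row in rows)
    return {
        "samples": len(rows),
        "lock_wait_p95_us": p95_nearest_rank(float(row["lock_wait_us"]) for row in rows),
        "write_share_by_caller": {c: n / len(rows) for c, n in counts.items()},
    }


sample_csv = (
    "caller,lock_wait_us\n"
    "FramePipelineService,12.0\n"
    "FramePipelineService,30.5\n"
    "WorkflowService,8.25\n"
)
print(row_block_from_csv(sample_csv))
```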

5. What this plan changes from the original roadmap

  • Phase 2's trigger condition expands. Originally: "Phase 1 soak surfaces a measured problem." Now: "Phase 1 measurement OR Phase 3 row block OR Phase 3 implementation friction." Same principle (refactor when justified by evidence); broader source of evidence.
  • Phase 2 and Phase 3 are no longer strictly sequential. Originally: Phase 2 fully complete before Phase 3 opens. Now: Phase 3 proceeds in parallel with conditional Phase 2 slices, each gated on its own evidence trigger.
  • Phase 4.1 considered for parallel execution with later Phase 3 slices rather than only after Phase 3 fully completes. Original ordering treated Phase 4 as strictly post-Phase 3; this plan recognizes 4.1 has independent value and minimal coupling to 3.x.
  • Phase 3 slice order changed. Original roadmap presented 3.1 / 3.2 / 3.3 / 3.4 in that order; this plan reorders to 3.3 / 3.1 / 3.2 / 3.4 based on architectural-pressure value (3.3 surfaces persistence load earlier, which is the most likely Phase 2 trigger).
  • The "Phase 2 deferred entirely" outcome is acknowledged as a first-class success case. If neither SLICE-2.0 nor any Phase 3 row block surfaces alloc/lock-wait pressure, Phase 2 stays deferred indefinitely — and that's a win, not a failure to ship.

6. Risks of this plan and how it addresses them

Risk 1 — Phase 3 features land on the current AppState shape, then Phase 2.1 retrofits them later. Mitigation: 2.1 stays conditional, so it only opens if evidence justifies it. If it does open, the wide-blast-radius refactor will affect Phase 3 features regardless of whether they landed first or last — but landing them first means we have working features during the refactor, not just specs.

Risk 2 — Phase 3 work surfaces friction that Phase 2 would have prevented. Mitigation: each Phase 3 slice's spec must include a "what does this require from AppState?" section before implementation begins. If the answer is "a fundamentally different shape" (e.g., 3.2's cassette cadence requires per-wafer sub-state that doesn't fit the current flat record), that's the trigger to open 2.1 before 3.2 ships. The trigger flow makes this visible at spec-writing time, not at implementation time.

Risk 3 — SLICE-2.0 measurement is taken too literally. Mitigation: the SLICE-2.0 spec already includes a decision rubric and the explicit option to defer Phase 2 entirely. The same numbers can produce "open 2.1" or "Phase 2 deferred" depending on the threshold; this plan keeps the rubric mechanical so the decision is reproducible from data.

Risk 4 — Phase 4.1 in parallel adds context-switching overhead. Mitigation: 4.1 is intentionally the smallest-surface Phase 4 slice. It's a single-author, single-Copilot-pass slice (or two), not a multi-week integration. The context-switching cost is bounded.

Risk 5 — Deviating from the original roadmap loses the discipline that made Phase 1 successful. Mitigation: this doc is the deviation, written explicitly and dated, with the rationale preserved. Future engineers reading both this doc and the original roadmap can see what changed and why. The discipline isn't "follow the original plan literally"; it's "every change to the plan is itself documented."


7. Concrete next steps

This week:        SLICE-2.0 — instrument AppStateStore.Update
                  ── 1-2 days, single Copilot pass (TASK-2.0 already written)
                  ── produces row block; unblocks Phase 3 measurement comparisons

In parallel:      SLICE-3.3 — write spec + task
                  ── SQLite persistence + schema versioning
                  ── replaces JSON run-history store
                  ── EF Core or Dapper + migrations + import path

Next 2 weeks:     SLICE-3.3 implementation begins
                  SLICE-2.0 row block lands
                  ── if SLICE-2.0 row triggers a 2.x slice, open it before 3.3 ships
                  ── otherwise continue 3.3

Next month:       SLICE-3.3 ships → row block in phase-3-measurements.md
                  SLICE-3.1 spec written (rich defect model)
                  Decision: open 4.1 spec now, or defer to after 3.1?
                  ── recommended: open 4.1 spec now, implement in parallel

2 months out:     SLICE-3.1 ships
                  SLICE-4.1 ships (first real SDK swap)
                  SLICE-3.2 spec written

3 months out:     SLICE-3.2 ships (cassette cadence — composes 3.3 + 3.1 + workflow)
                  SLICE-3.4 spec written (identity + audit)

Beyond:           Reassess. By this point, the completed Phase 3 should have
                  produced enough evidence to know whether any Phase 2 slice
                  still needs to ship. If not, Phase 2 is deferred indefinitely
                  and Phase 4.2 / 4.3 become the next "must-do" milestones.

The timeline is rough — actual pace depends on Copilot bandwidth and capture-session scheduling. The ordering and the trigger conditions are the load-bearing parts.


8. What this doc explicitly does NOT decide

  • Whether Phase 5 / Phase 6 / etc. exist. Out of scope for the original roadmap; out of scope for this revision.
  • Whether SLICE-1.5's retired automated-capture rig should be revisited. It was superseded by SLICE-1.6 (FlaUI); no plan to bring it back.
  • Specific UI design for Phase 3 features. Phase 3.1's wafer map, Phase 3.4's identity prompt, etc. — those are spec-time decisions made when the slices are written, not strategy-doc decisions.
  • Hardware vendor selection for Phase 4.1. Open question; depends on what real machine becomes available for integration. Spec-time decision.
  • Whether this plan should itself be revised. This doc is dated 2026-05-07 and represents the current best read of the situation; if Phase 3 evidence changes the picture materially, write a new dated strategy doc that supersedes this one. The pattern (dated review docs that supersede each other) is the durable thing.

Companion docs

Docs-first project memory for AI-assisted implementation.