SLICE-3.2: Wafer Loop / Cassette Cadence
- Status: Proposed
- Date: 2026-05-08
- Depends on: SLICE-3.3: SQLite Persistence (run-history persistence + migration runner), SLICE-3.1: Rich Defect Model (per-run defect persistence), Requirements, Evolution Roadmap, 2026-05-07 Phase 2+3 Strategy
Goal
Compose SLICE-3.3 (run history) + SLICE-3.1 (defect persistence) + the existing workflow state machine into a wafer-cassette scheduler that runs N wafers back-to-back through a Load → Align → Run → Unload phase sequence. Each wafer gets a unique WaferId; each cassette gets a LotId; both flow through RunSummary. The slice's exit gate is a 25-wafer cassette completing under the Soak8h profile with correct per-wafer records persisted, defects correlated to the right wafer, and no FK / orphan-row issues in the SQLite schema. This slice closes the FK band-aid filed in the SLICE-3.1 follow-ups by introducing a stub-row-at-run-start pattern so the defects.run_id foreign key relationship can be enforced.
This is the third Phase 3 slice opened under the 2026-05-07 strategy and the slice where the prototype starts to genuinely look like a wafer inspection tool. Each cassette is a long-duration scenario (25 wafers × ~30 s/wafer ≈ 12.5 min of recipe time, roughly 14-15 min wall-clock once load / align / unload overhead is included) that exercises the full Phase 1 + Phase 3 stack at production-shaped cadence. The Phase 2 trigger assessment in this slice's measurement row is informed by the cumulative store-write rate of an entire cassette, not just a single run.
Why This Slice
Today the workflow state machine treats each run as an independent operator-driven event: operator clicks Load Recipe, Home, Start Run; one run completes; operator decides what to do next. Real wafer inspection tools batch wafers into cassettes (typically 25-wafer FOUPs or 13-wafer carriers) and the tool processes each wafer through a fixed phase sequence without operator intervention between wafers:
1. Load: wafer transfer from cassette slot to chuck (motion + vacuum sequence)
2. Align: locate the wafer notch / pre-aligner stage
3. Run: execute the recipe (the existing WorkflowService.StartRunAsync path)
4. Unload: return the wafer to its cassette slot
The current WorkflowService covers only step 3 in isolation. Steps 1, 2, 4 don't exist as concepts; the operator manually re-clicks Home / Load Recipe / Start Run for each wafer. That works for a demo; it doesn't model how real tools run.
This slice introduces a cassette scheduler (ICassetteScheduler, implemented by SimulatedCassetteScheduler) that drives the loop: pull a wafer from the cassette, transition through Load/Align/Run/Unload phases, advance to the next wafer, repeat until the cassette is empty or aborted. WaferId and LotId are added to RunSummary so the run history is queryable per-wafer and per-lot. The simulator gets stub Load/Align/Unload phases (each a Task.Delay plus a state transition); real machine integration is deferred to Phase 4.1.
The slice also closes the FK band-aid filed in SLICE-3.1's follow-ups (item 3 of the 2026-05-08 follow-up entry in roadmap-progress.md). Currently SqliteDefectStore.OpenAsync disables FK enforcement because defects are persisted before their parent run_summary row exists. SLICE-3.2 introduces a stub-row pattern: at run-start, a run_summary row is inserted with TerminalStatus = "Pending"; defects can then be persisted with FK enforcement on; at run-end the existing SaveAsync upsert (INSERT OR REPLACE) updates the row to its terminal status. Same architectural fit as the cassette scheduler — both create run_summary rows at known lifecycle points.
Requirements Coverage
- 02. Domain and State Model: WaferId and LotId are first-class domain entities; per-wafer state survives across operations
- 03. Functional Scope: cassette-mode operation is the production workflow; per-wafer records support audit and traceability
- 04. UI and Technical Requirements: the operator UI must indicate the cassette's current wafer and overall progress
- 07. AI Delivery Constraints and Roadmap: each phase ships measurable before-and-after; this is row slice-3-2-cassette-cadence in phase-3-measurements.md
In Scope
New domain shapes
namespace InspectionPrototype.Domain.Contracts;

public enum WaferPhase
{
    Idle,       // no wafer in progress
    Loading,    // wafer being moved to chuck
    Aligning,   // pre-aligner finding the notch
    Running,    // executing the recipe (composes existing WorkflowState.Running)
    Unloading,  // wafer returning to cassette slot
    Complete,   // wafer finished; ready for next
}

public record WaferId(string Value)
{
    public static WaferId Generate(string lotId, int slotIndex) =>
        new($"{lotId}-W{slotIndex:D2}");
}

public record LotId(string Value)
{
    public static LotId GenerateForToday() =>
        new($"LOT-{DateTimeOffset.UtcNow:yyyyMMdd-HHmmss}");
}

public record CassetteSlot(int Index, WaferId? WaferId);

public record CassetteState(
    LotId LotId,
    int TotalSlots,        // 25 for FOUP
    int CurrentSlotIndex,  // 0-based
    WaferPhase CurrentPhase,
    IReadOnlyList<CassetteSlot> Slots);

WaferId is deliberately a dedicated record (a reference type with value equality), not just a string, so the type system prevents mixing wafer identifiers with run identifiers (Guid).
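To make the identifier format concrete, here is a minimal sketch that copies the two records above and prints the wafer IDs a lot would produce. The fixed lot value is a made-up example, and unwrapping `lot.Value` at the call site is an assumption about how callers bridge LotId to the string parameter of WaferId.Generate.

```csharp
using System;

// Hypothetical fixed lot so the output is reproducible; GenerateForToday()
// would embed the current UTC timestamp instead.
var lot = new LotId("LOT-20260508-101500");

// Generate takes the raw string, so the caller unwraps LotId explicitly.
var first = WaferId.Generate(lot.Value, 0);
var last = WaferId.Generate(lot.Value, 24);

Console.WriteLine(first.Value); // LOT-20260508-101500-W00
Console.WriteLine(last.Value);  // LOT-20260508-101500-W24

// Records compare by value, so two IDs built from the same inputs are equal.
Console.WriteLine(first == WaferId.Generate(lot.Value, 0)); // True

// The following would not compile, which is the point of the wrapper type:
// Guid runId = first;

public record WaferId(string Value)
{
    public static WaferId Generate(string lotId, int slotIndex) =>
        new($"{lotId}-W{slotIndex:D2}");
}

public record LotId(string Value);
```

Because CurrentSlotIndex is 0-based, the first wafer's suffix is W00, while a progress label such as "Wafer 7 / 25" would presumably be 1-based; any UI projection needs to add one.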
RunSummary extension
public record RunSummary(
    Guid RunId,
    string RecipeName,
    DateTimeOffset StartedAtUtc,
    DateTimeOffset EndedAtUtc,
    RunTerminalStatus TerminalStatus,
    int DefectCount,
    IReadOnlyList<string> MajorAlarms,
    int CompletedScanPoints,
    int TotalScanPoints,
    string? SimulatorProfileName = null,
    int DefectsMinor = 0,
    int DefectsMajor = 0,
    int DefectsCritical = 0,
    string? WaferId = null,  // ◀ new
    string? LotId = null);   // ◀ new

WaferId and LotId are nullable strings (not the records) so non-cassette-mode runs (operator-driven single runs from the existing UI) can still produce RunSummary records without manufacturing fake wafer identities. The cassette scheduler always populates them; the legacy operator-driven flow leaves them null.
M003 migration
-- M003_cassette_columns.sql
ALTER TABLE run_summaries ADD COLUMN wafer_id TEXT;
ALTER TABLE run_summaries ADD COLUMN lot_id TEXT;
CREATE INDEX idx_run_summaries_lot_id ON run_summaries(lot_id);
CREATE INDEX idx_run_summaries_wafer_id ON run_summaries(wafer_id);
INSERT INTO schema_version (version, applied_at) VALUES (3, datetime('now'));

Existing rows have NULL for both columns, which is correct (they predate cassette mode).
M004 migration — close the FK band-aid
-- M004_enable_defects_fk.sql
-- (Schema already declares REFERENCES; this migration is a no-op for schema
-- but documents that the application-level invariant is now enforced via the
-- stub-row pattern introduced in SLICE-3.2's WorkflowService changes.)
INSERT INTO schema_version (version, applied_at) VALUES (4, datetime('now'));

The actual FK fix is in code, not SQL: SqliteDefectStore.OpenAsync removes the explicit PRAGMA foreign_keys = OFF. The new WorkflowService.StartRunAsync (cassette-aware) inserts a stub run_summary before the run loop starts; defects can then be persisted with FK enforcement on. The existing SaveAsync upsert at run-end updates the row to its terminal status.
A simpler alternative (no migration) is to just drop the PRAGMA foreign_keys = OFF and add the stub-row code change. The M004 migration exists to make the version-history record explicit ("at version 4, the FK band-aid was retired"); reviewers querying schema_version see the audit trail.
ICassetteScheduler + SimulatedCassetteScheduler
namespace InspectionPrototype.Application.Abstractions;

public interface ICassetteScheduler
{
    /// <summary>The current cassette state, or null if no cassette is loaded.</summary>
    CassetteState? Current { get; }

    /// <summary>Loads a fresh cassette with the given LotId and slot count. Idempotent rejection if a cassette is already loaded.</summary>
    Task LoadCassetteAsync(LotId lotId, int slotCount = 25, CancellationToken ct = default);

    /// <summary>Starts the wafer-loop scheduler. Drives Load → Align → Run → Unload for each slot until the cassette is empty or stopped.</summary>
    Task StartCassetteRunAsync(CancellationToken ct = default);

    /// <summary>Cooperative stop: finishes the current wafer's phase, then halts.</summary>
    void RequestCassetteStop();

    /// <summary>Immediate abort: cancels the current phase's CTS.</summary>
    Task AbortCassetteAsync();

    /// <summary>Unloads the cassette (resets state). Only allowed when scheduler is Idle or Complete.</summary>
    Task UnloadCassetteAsync(CancellationToken ct = default);

    /// <summary>Raised when the cassette state transitions (slot index advances, phase changes, etc.). Subscribers project state into the UI.</summary>
    event Action<CassetteState>? CassetteStateChanged;
}

SimulatedCassetteScheduler (Infrastructure.Simulator):
- Constructor takes IWorkflowService, IRunHistoryStore, ISimulatorProfileProvider, IAppStateStore, ILogger.
- LoadCassetteAsync builds the initial CassetteState with TotalSlots = 25, CurrentSlotIndex = 0, and all 25 CassetteSlot.WaferId values populated via WaferId.Generate(lotId.Value, i). Updates AppState.Cassette = newState.
- StartCassetteRunAsync starts the scheduler loop on a background task. The loop visits each slot in order:
  - Transition to Loading; Task.Delay(profile.WaferLoadMs); advance.
  - Transition to Aligning; Task.Delay(profile.WaferAlignMs); advance.
  - Transition to Running; call WorkflowService.StartRunAsync() with the wafer's WaferId + LotId (the workflow's internal run-loop runs as before); wait for terminal state.
  - Transition to Unloading; Task.Delay(profile.WaferUnloadMs); advance.
  - Slot complete; advance CurrentSlotIndex; loop until cassette empty or stop/abort requested.
- Each phase transition fires CassetteStateChanged so the UI can update.
- On stop: finishes current phase, transitions cassette to Complete. On abort: cancels current phase's CTS; cassette state goes to Complete with current slot in Idle (didn't finish).
- Catches and logs each wafer's run faults but does not abort the cassette on a single-wafer fault; advances to the next slot. Operator can manually abort if cascading faults indicate a bigger problem.
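The loop described in the list above can be sketched as follows. This is a simplified illustration, not the scheduler itself: the `runWafer` delegate is a hypothetical stand-in for WorkflowService.StartRunAsync plus the wait for a terminal state, state changes are recorded in a plain string log instead of AppState, and stop / abort handling is left out.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical run delegate: slot 2 faults, every other slot completes.
Func<int, Task> runWafer = slot =>
    slot == 2 ? Task.FromException(new InvalidOperationException("stage fault"))
              : Task.CompletedTask;

var log = new List<string>();
await RunCassetteAsync(4, TimeSpan.FromMilliseconds(5), runWafer, log, CancellationToken.None);
Console.WriteLine(string.Join(Environment.NewLine, log));

static async Task RunCassetteAsync(int slotCount, TimeSpan phaseDelay,
    Func<int, Task> runWafer, List<string> log, CancellationToken ct)
{
    for (var slot = 0; slot < slotCount; slot++)
    {
        log.Add($"slot {slot}: Loading");
        await Task.Delay(phaseDelay, ct);

        log.Add($"slot {slot}: Aligning");
        await Task.Delay(phaseDelay, ct);

        log.Add($"slot {slot}: Running");
        try
        {
            await runWafer(slot);
        }
        catch (Exception ex) when (ex is not OperationCanceledException)
        {
            // A run-level fault is logged; the cassette still unloads and moves on.
            log.Add($"slot {slot}: fault ({ex.Message})");
        }

        log.Add($"slot {slot}: Unloading");
        await Task.Delay(phaseDelay, ct);
    }
    log.Add("cassette: Complete");
}
```

In this sketch the faulted slot still passes through Unloading before the loop advances, which is the behavior the fault-tolerance bullet asks for; cancellation is deliberately excluded from the catch so an abort would not be mistaken for a wafer fault.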
WorkflowService.StartRunAsync extension
public Task StartRunAsync(WaferId? waferId = null, LotId? lotId = null)
{
    // ... existing guard checks ...
    var summary = new RunSummary(
        // ... existing fields ...
        WaferId: waferId?.Value,
        LotId: lotId?.Value);

    // NEW: stub-row insert at run start; closes the SLICE-3.1 FK band-aid.
    // Persists a Pending row so defects can FK-reference run_summaries(run_id) safely.
    var stubSummary = summary with {
        EndedAtUtc = DateTimeOffset.MinValue,
        TerminalStatus = RunTerminalStatus.Pending,
    };
    _ = SafeRunHistoryPersistAsync(() => _runHistoryStore.SaveAsync(stubSummary));

    // ... existing run-loop start ...
}

The RunTerminalStatus enum gains a Pending value:
public enum RunTerminalStatus
{
    Pending,    // ◀ new: stub row at run start; updated to one of the below at run end
    Completed,
    Stopped,
    Aborted,
    Faulted,
}

At run-end, the existing _runHistoryStore.SaveAsync(summary) call updates the row to its terminal status (INSERT OR REPLACE on the run_id primary key handles the upsert).
SqliteDefectStore.OpenAsync cleanup
The explicit PRAGMA foreign_keys = OFF is removed. With the stub-row pattern in place, FK enforcement now works correctly: defects are always inserted after their parent run_summary row exists. The connection string keeps Cache=Shared (matches the other stores; pooling is fine because every connection now expects FK enforcement).
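The ordering that makes FK enforcement safe can be illustrated with a small in-memory model. `InMemoryRunStore` is a hypothetical stand-in for the two SQLite tables, not the real stores: it only rejects a defect whose parent run row is missing, which is the rule the foreign key enforces.

```csharp
using System;
using System.Collections.Generic;

var store = new InMemoryRunStore();
var runId = Guid.NewGuid();

// Without a parent row the defect insert is rejected, like a FK violation.
Console.WriteLine(store.TryAddDefect(runId, "scratch")); // False

store.UpsertRun(runId, "Pending");                        // stub row at run start
Console.WriteLine(store.TryAddDefect(runId, "scratch")); // True

store.UpsertRun(runId, "Completed");                      // same key, terminal status at run end
Console.WriteLine(store.StatusOf(runId));                 // Completed
Console.WriteLine(store.DefectCount(runId));              // 1

// Hypothetical model of run_summaries + defects; assumption, not the real schema.
sealed class InMemoryRunStore
{
    private readonly Dictionary<Guid, string> _runs = new();
    private readonly List<(Guid RunId, string Kind)> _defects = new();

    public void UpsertRun(Guid runId, string status) => _runs[runId] = status;

    public bool TryAddDefect(Guid runId, string kind)
    {
        if (!_runs.ContainsKey(runId)) return false; // parent must exist first
        _defects.Add((runId, kind));
        return true;
    }

    public string StatusOf(Guid runId) => _runs[runId];

    public int DefectCount(Guid runId) =>
        _defects.FindAll(d => d.RunId == runId).Count;
}
```

The in-memory upsert overwrites in place. SQLite's INSERT OR REPLACE instead deletes the conflicting row and inserts a new one, so how existing defect rows are treated at run end may depend on the ON DELETE action declared on defects.run_id, which this document does not state; the no-orphans integration test is the place to confirm it.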
AppState evolution
public record AppState(
    // ... existing fields ...
    CassetteState? Cassette,  // ◀ new
    // ... existing fields ...
);

Cassette is null when no cassette is loaded. MainViewModel projects it to a UI panel showing the current slot index and phase.
Cassette UI
A new CassetteStatusPanel (UserControl) shows:
- LotId
- Progress: "Wafer 7 / 25"
- Current phase: "Running" (with a phase-specific icon or text)
- A 25-cell grid showing each slot's status (empty / pending / done / current)
AutomationIds: CassetteStatusPanel, CassetteCurrentSlotText, CassetteCurrentPhaseText. New buttons in MainWindow.xaml: LoadCassetteButton, StartCassetteRunButton, StopCassetteButton, UnloadCassetteButton.
Profile fields
SimulatorProfile gains three new fields for cassette-mode timing:
| Field | Default | Purpose |
|---|---|---|
| WaferLoadMs | 2 000 | Simulated wafer-load duration |
| WaferAlignMs | 1 500 | Simulated pre-align duration |
| WaferUnloadMs | 1 500 | Simulated wafer-unload duration |
For a 25-wafer cassette: 5 s of overhead per wafer × 25 = 125 s of cassette-overhead. With ~30 s of recipe-run time per wafer, total cassette wall-clock ≈ 14-15 min.
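The arithmetic behind that estimate, written out as a short sketch. The three phase durations are the defaults from the table; the 30 s recipe time is the document's rough per-wafer estimate, not a measured value.

```csharp
using System;
using System.Globalization;

const int waferLoadMs = 2000, waferAlignMs = 1500, waferUnloadMs = 1500;
const int wafers = 25;
const double recipeSecondsPerWafer = 30.0; // "~30 s" estimate, an assumption

double overheadPerWaferS = (waferLoadMs + waferAlignMs + waferUnloadMs) / 1000.0;
double cassetteOverheadS = overheadPerWaferS * wafers;
double totalMinutes = (cassetteOverheadS + recipeSecondsPerWafer * wafers) / 60.0;

Console.WriteLine(overheadPerWaferS.ToString(CultureInfo.InvariantCulture));  // 5
Console.WriteLine(cassetteOverheadS.ToString(CultureInfo.InvariantCulture));  // 125
Console.WriteLine(totalMinutes.ToString("F1", CultureInfo.InvariantCulture)); // 14.6
```

The total is 875 s, so the 20 min exit-gate limit leaves a little over 5 minutes of headroom at default timings.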
Measurement
A new runbook §5.4 entry: "Cassette cadence capture — SLICE-3.2, 25-wafer cassette under Soak8h profile." 25-wafer run takes ~14 min wall-clock; capture covers Connect → Load Cassette → StartCassetteRun → wait for completion → Unload Cassette → Disconnect.
MeasurementExtraction.psm1 gains:
- Get-WafersCompletedCount -DatabasePath $db -LotId $lotId: count from SELECT COUNT(*) FROM run_summaries WHERE lot_id = @lotId AND terminal_status != 'Pending'.
- Get-CassetteWallClockSeconds -DatabasePath $db -LotId $lotId: MAX(ended_at_utc) - MIN(started_at_utc) for the lot.
ConvertTo-MeasurementRow adds two rows when -LotId is provided: wafers.completed (count) and cassette.wall-clock (s).
Row block tagged slice-3-2-cassette-cadence in phase-3-measurements.md. Baseline is slice-3-1-rich-defect-model for the 32 overlapping metrics; the two new cassette-specific metrics have no baseline.
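As a cross-check on what the two helper queries compute, the same logic can be expressed over in-memory rows. The rows below are invented for illustration, and the tuple shape is a hypothetical simplification of run_summaries; the Pending row mimics a stub whose EndedAtUtc is still DateTimeOffset.MinValue.

```csharp
using System;
using System.Linq;

var t0 = new DateTimeOffset(2026, 5, 8, 10, 0, 0, TimeSpan.Zero);

// Hypothetical rows: lot, terminal status, start, end.
var rows = new[]
{
    (Lot: "LOT-A", Status: "Completed", Started: t0,                Ended: t0.AddSeconds(35)),
    (Lot: "LOT-A", Status: "Faulted",   Started: t0.AddSeconds(35), Ended: t0.AddSeconds(70)),
    (Lot: "LOT-A", Status: "Pending",   Started: t0.AddSeconds(70), Ended: DateTimeOffset.MinValue),
    (Lot: "LOT-B", Status: "Completed", Started: t0,                Ended: t0.AddSeconds(999)),
};

var lot = rows.Where(r => r.Lot == "LOT-A").ToArray();

// Mirrors: COUNT(*) WHERE lot_id = @lotId AND terminal_status != 'Pending'
int wafersCompleted = lot.Count(r => r.Status != "Pending");

// Mirrors: MAX(ended_at_utc) - MIN(started_at_utc) for the lot
double wallClockSeconds = (lot.Max(r => r.Ended) - lot.Min(r => r.Started)).TotalSeconds;

Console.WriteLine(wafersCompleted);  // 2
Console.WriteLine(wallClockSeconds); // 70
```

Two properties of the queries show up here: a Faulted wafer counts toward wafers.completed because only Pending is excluded, and a leftover Pending stub does not distort the wall-clock figure as long as at least one row in the lot has a real end time.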
Out of Scope
- Multi-cassette / lot-batching beyond 25 wafers. The scheduler handles one cassette at a time. A lot-batching layer is a Phase 4 concern.
- Real wafer-handler robotics. Load / Align / Unload are Task.Delay simulations; real motion sequences live in the SDK swap (Phase 4.1).
- Pre-align failure modes (notch not found, wafer mis-clamp). Simulated phases always succeed.
- Mid-cassette pause / resume. The scheduler supports stop and abort; pause-and-resume across cassette boundaries is deferred.
- Wafer-level retry policy. A faulted wafer's run advances to the next slot; per-wafer retry is operator-initiated by re-loading the same slot, not by automatic retry.
- OperatorId. This is Phase 3.4's territory: SLICE-3.2 leaves operator identity off RunSummary; SLICE-3.4 adds it.
- Defect correlation across wafers. Defects link to a single run_id; cross-wafer pattern detection is Phase 4 ML work.
- Cassette UI virtualization. 25 slots is small; no virtualization needed. Smaller cassettes (FOSB 13, sample 4) work without changes.
Runtime Behavior
Cassette flow
Operator clicks "Load Cassette" button → ICassetteScheduler.LoadCassetteAsync(LotId.GenerateForToday(), 25)
    AppState.Cassette = new CassetteState(lotId, 25, 0, WaferPhase.Idle, [25 slots])
Operator clicks "Start Cassette Run" button → ICassetteScheduler.StartCassetteRunAsync()
    Background loop starts:
        For each slot in 0..24:
            AppState.Cassette transitions phase: Loading
            await Task.Delay(profile.WaferLoadMs) [2 s default]
            AppState.Cassette transitions phase: Aligning
            await Task.Delay(profile.WaferAlignMs) [1.5 s default]
            AppState.Cassette transitions phase: Running
            await _workflow.StartRunAsync(slot.WaferId, cassette.LotId)
                ↓
                WorkflowService inserts stub run_summary (TerminalStatus = Pending)
                ↓
                Run loop executes; defects persisted with FK enforcement on
                ↓
                WorkflowService updates run_summary (terminal status, defects, etc.)
            await wait-for-terminal-state [~30 s typical]
            AppState.Cassette transitions phase: Unloading
            await Task.Delay(profile.WaferUnloadMs) [1.5 s default]
            AppState.Cassette.CurrentSlotIndex++
        AppState.Cassette transitions phase: Complete

Cassette-vs-direct workflow distinction
The existing direct-workflow path (Connect → Home → Load Recipe → Start Run → ...) continues to work without the cassette scheduler. MainViewModel distinguishes cassette mode from direct mode:
- Cassette mode: AppState.Cassette is non-null. Direct-mode buttons (Start Run, Stop, etc.) are visually de-emphasised; cassette-control buttons are foregrounded.
- Direct mode: AppState.Cassette is null. Cassette panel hides; direct buttons foregrounded.
Switching between modes requires the workflow to be Idle (MainViewModel enables / disables Load Cassette accordingly).
Acceptance Criteria
This slice is satisfied only if all of the following are true:
1. New domain shapes (WaferId, LotId, CassetteSlot, CassetteState, WaferPhase enum) exist in InspectionPrototype.Domain.Contracts. RunTerminalStatus enum gains Pending.
2. RunSummary gains nullable WaferId and LotId string properties (defaults null).
3. M003 migration adds wafer_id and lot_id columns to run_summaries plus the two indexes; existing rows have NULL for both. Idempotent re-run is no-op.
4. M004 migration records the FK-band-aid retirement in schema_version (no schema change). Idempotent.
5. SqliteRunHistoryStore.SaveAsync parameter list includes the two new columns; INSERT OR REPLACE upserts on run_id.
6. SqliteDefectStore.OpenAsync no longer issues PRAGMA foreign_keys = OFF. The connection-string default Foreign Keys=True (or unspecified, which defaults to enforced) is preserved.
7. ICassetteScheduler + SimulatedCassetteScheduler exist per the In Scope shapes. Loop drives Load → Align → Run → Unload for each slot. Single-wafer faults log Warning and advance to next slot; cassette-level abort cancels the current phase's CTS and transitions to Complete.
8. WorkflowService.StartRunAsync(WaferId?, LotId?) accepts optional wafer/lot identifiers; persists a stub run_summary row at run start with TerminalStatus = Pending; updates the row at run end with the terminal values.
9. AppState.Cassette field exists; the cassette scheduler updates it on each phase transition; MainViewModel projects it.
10. SimulatorProfile gains WaferLoadMs, WaferAlignMs, WaferUnloadMs (all default per spec).
11. CassetteStatusPanel UserControl shows LotId, progress (current / total), current phase, 25-slot grid. AutomationIds present (CassetteStatusPanel, CassetteCurrentSlotText, CassetteCurrentPhaseText). New buttons LoadCassetteButton, StartCassetteRunButton, StopCassetteButton, UnloadCassetteButton in MainWindow.xaml with AutomationIds.
12. 25-wafer cassette criterion (slice's exit gate): under Soak8h profile, a single cassette completes 25 wafers in ≤ 20 min wall-clock with 25 distinct RunSummary rows persisted (SELECT COUNT(*) FROM run_summaries WHERE lot_id = @lotId AND terminal_status != 'Pending' returns 25). Each wafer's wafer_id follows the LOT-yyyymmdd-hhmmss-W## pattern. No FK errors in the application log; defects (if any) correlate to the correct run_id via the foreign key.
13. New MeasurementExtraction.psm1 helpers Get-WafersCompletedCount, Get-CassetteWallClockSeconds plus Pester tests.
14. New runbook §5.4 entry describing the cassette capture procedure.
15. New row block slice-3-2-cassette-cadence in phase-3-measurements.md covering the 32 standard metrics + 2 new cassette-specific metrics. Baseline is slice-3-1-rich-defect-model.
16. Phase 2 trigger assessment in row's Notes section: cassette mode runs ~25× the workflow-state-transition rate of single-run mode (each cassette wafer drives Idle → Loading → Aligning → Running → Unloading → ...). Apply the SLICE-2.0 rubric to the captured numbers; document the outcome.
17. Existing tests pass; FK band-aid removal verified by an integration test that runs a cassette and asserts (a) defects exist for each wafer, (b) SELECT COUNT(*) FROM defects d LEFT JOIN run_summaries r ON d.run_id = r.run_id WHERE r.run_id IS NULL returns 0 (no orphans).
18. Pre-existing direct-mode workflow path continues to work without regressions; Capture-Measurements.ps1 -Profile MultiTag still produces a row equivalent to slice-1-1-multi-tag-telemetry.
Verification Notes
- Stub-row insertion is on the same fire-and-forget pattern as defects. The stub _runHistoryStore.SaveAsync(stubSummary) runs without await; if it fails, the run still proceeds in memory but defect persistence may fail FK validation on subsequent inserts. A test verifies the stub-row insert is durable enough that subsequent same-thread defect inserts find the parent row.
- RunTerminalStatus.Pending is never reported as a terminal state to operators. The UI's WorkflowState projection ignores Pending rows; only the database sees them transiently.
- Cassette stop semantics: "Stop" finishes the current wafer's phase but does not advance to next; "Abort" cancels the current phase's CTS immediately. Same pattern as WorkflowService's Stop / Abort distinction (the cassette layer wraps the run-level distinctions consistently).
- Multi-wafer fault tolerance: if Wafer 7 faults but Wafer 8 should proceed, the scheduler logs the Wafer-7 fault, transitions to Unloading, and continues to Wafer 8. A run-level fault is NOT a cassette-level fault. This matches real machine behavior.
- The 20 min wall-clock limit in criterion 12 has slack: a typical cassette takes ~14 min. The 6-minute margin absorbs slow-host noise and accounts for any per-wafer phase variability under chaos.
- Pre-existing reproducibility check (criterion 18) is the SLICE-1.4 / SLICE-2.0 pattern: the slice-1-1-multi-tag-telemetry row reproduces against the merged commit. The cassette changes affect run-startup and run-end paths; if a non-cassette MultiTag capture produces materially different numbers, the cassette-aware code path is bleeding into the direct-mode path somewhere unintended.
- The FK band-aid removal is the load-bearing architectural fix. Verify by:
  - Code: SqliteDefectStore.OpenAsync no longer contains PRAGMA foreign_keys = OFF.
  - Capture: 25-wafer cassette with HighDefect or chaos-mode defect rates produces zero SqliteException Error 19 log entries.
  - Schema: PRAGMA foreign_keys reports 1 (enabled) on a fresh connection from the post-fix code.
- Cassette UI is the second material UI surface expansion (after SLICE-3.1's WaferMapView). Same dispatcher-hop discipline applies: OnStateChanged runs on the producer thread; UI projections marshal via _dispatcher.InvokeAsync.
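The stop / abort distinction from the notes above can be sketched with a cooperative flag and a CancellationToken. This is a simplified illustration under stated assumptions: `RunAsync`, the `onPhase` hook, and the string log are hypothetical, and each phase is reduced to a 1 ms delay.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Stop: requested during slot 0's Aligning phase; the phase finishes, then the loop halts.
var stop = false;
var stopped = await RunAsync(() => stop, CancellationToken.None,
    phase => { if (phase == "0:Aligning") stop = true; });
Console.WriteLine(string.Join(", ", stopped)); // 0:Loading, 0:Aligning, stopped

// Abort: the token is cancelled during slot 0's Running phase; the phase is interrupted.
using var cts = new CancellationTokenSource();
var aborted = await RunAsync(() => false, cts.Token,
    phase => { if (phase == "0:Running") cts.Cancel(); });
Console.WriteLine(string.Join(", ", aborted)); // 0:Loading, 0:Aligning, 0:Running, aborted

static async Task<List<string>> RunAsync(Func<bool> stopRequested,
    CancellationToken abort, Action<string> onPhase)
{
    var log = new List<string>();
    try
    {
        for (var slot = 0; slot < 3; slot++)
        {
            foreach (var phase in new[] { "Loading", "Aligning", "Running", "Unloading" })
            {
                log.Add($"{slot}:{phase}");
                onPhase($"{slot}:{phase}");   // demo hook so the caller can stop or abort
                await Task.Delay(1, abort);   // abort cancels the phase itself
                if (stopRequested())          // stop is only honoured between phases
                {
                    log.Add("stopped");
                    return log;
                }
            }
        }
        log.Add("complete");
    }
    catch (OperationCanceledException)
    {
        log.Add("aborted");
    }
    return log;
}
```

The design choice this highlights is that stop is checked only at phase boundaries while abort flows through the token into the awaited work, so the two requests can never be confused with a wafer-level fault.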