Skip to content

AppState Design Review

TL;DR

The hypothesis that triggered this review — "the AppState design is JS-frontend-shaped and over-abstracted" — is not what the code shows. The current design is minimalist: one immutable record, one mutating method (Update(Func<AppState, AppState>)), one selector-based subscription primitive. There are no actions, no reducers-as-discriminated-unions, no middleware, no thunks, no DevTools. The ceremony budget is well under what Redux/MobX would require.

What the review does find: a small number of concrete correctness, auditability, and operational-fitness gaps that have nothing to do with abstraction and everything to do with the store being slightly under-specified for a fab environment, plus a handful of craftsmanship items — vocabulary leakage, type-level laxity, dead surface — that hold the code back from "world-class for this domain." Most are addressable with focused, small changes rather than a pattern swap.

Scorecard at a glance — 7 pass, 1 pass-by-accident, 6 partial, 2 fail (definitions below the table).

Scorecard

#CriterionVerdictHeadline
1Threading & ownershipPartialStore mutations are correctly serialized; subscriber dispatch is not. State can be observed regressing at a subscriber under concurrent writes.
2Backpressure & bounded memoryPass (by accident)Channels in front of the store handle backpressure; the store itself has no contract and would stall all writers behind one slow subscriber.
3Consistency between related slicesPassReducers compose atomically; with clones ensure all-or-nothing transitions. Subscriber ordering caveat is filed under §1, not here.
4Crash & fault recoveryPartialDurable vs. ephemeral boundary is real (Sqlite stores) but not visible at the AppState type level. Crash file captures only 4 fields + 50 diag lines, not full state.
5AuditabilityPartialSerilog file logs + RecentDiagnostics (bounded 200) + store.update.calls counter exist, but no single "who changed what slice when" record. Reducer effects are not logged.
6aOperator-visible statePass20 distinct state slices subscribed by MainViewModel; operator-facing readout is dense and covers connection, workflow, motion, alarms, defects, history, diagnostics.
6bEngineer deep-dive snapshotFailNo on-demand state dump. Crash file is the closest artifact and only fires on crash. Service engineer cannot pull current state without rebuilding or attaching dotnet-counters.
7Equipment-state reporting (E10)FailNo projection from (ConnectionState, WorkflowState, MotionState, Safety, ActiveAlarms) into SEMI E10 categories. No timestamped equipment-state-transition log.
8Determinism in testsPassReducers are Func<AppState, AppState>; state transitions test as pure functions. AppState is a record with structural equality.
9Marginal cost of new fieldPassAdding a slice touches AppState.cs (ctor + Initial) and the writer site only. with-clones automatic.
10Cognitive loadPassTrace is "click → Command → Service.Update(reducer) → Subscribe fires → ViewModel updates." No framework vocabulary required beyond "selector + comparer".
11Earned abstractionPassThree concrete benefits: single mutation point + caller-dimensioned metrics + change-detected subscriber dispatch. JS ceremony is genuinely absent.
12What the store is NOTPartialCleanly separated from workflow engine and persistence. But RecentDiagnostics is an event log embedded in state, and PipelineCounters / OperationalCounters duplicate counters that also live in AppMetrics.
13Reads as domain prosePartialMost fields use fab vocabulary (Cassette, LoadedRecipe, ActiveAlarms, LatestTagValues). Simulator vocabulary (SelectedSimulatorProfile, SimulatorProfileCatalog) leaks into production AppState; SLICE-2.7 renamed it inside RunSummary but the store fields were not touched.
14Invalid states unrepresentablePartialThe record allows invalid pairs (WorkflowState=Idle + ActiveRun!=null, ConnectionState=Disconnected + LatestFrame!=null, WorkflowState=Faulted + empty ActiveAlarms). Reducer discipline is the only enforcement. Acceptable for the prototype; explicit if a fab integration needs stronger guarantees.
15API surface — small, honest namesPass (with blemishes)Four-member IAppStateStore surface is right-sized. Most names are clear. RegisterTagsActiveProvider names the mechanism not the intent; WithClearedCrashBanner reads awkwardly. Minor.
16No accidental complexity / dead surfacePartialNo TODO/HACK in the Application layer. But: legacy StateChanged event runs in parallel with Subscribe<T> for test-fake back-compat; null! workaround at AppState.cs:36; MaxDiagnosticsEntries magic number in the wrong file.

Verdict legend:

  • Pass — current design meets the criterion on merit. No action needed.
  • Pass (by accident) — the criterion is met because of something outside the store; no contract in the store guarantees it. Document or move into the contract.
  • Partial — meets the spirit, fails on a specific concrete point. Smallest-change fix exists.
  • Fail — the design has no mechanism for this. Needs a new mechanism.

The shape of the current design (one-paragraph recap)

AppState (src/InspectionPrototype.Application/State/AppState.cs:12-72) is a positional record with 24 fields and an Initial factory. IAppStateStore (src/InspectionPrototype.Application/Abstractions/IAppStateStore.cs:11-84) exposes Current, a legacy StateChanged event, Update(Func<AppState, AppState>), and Subscribe<T>(selector, onChange, comparer). AppStateStore (src/InspectionPrototype.Application/Services/AppStateStore.cs:14-149) holds _current behind a System.Threading.Lock, runs the reducer inside the lock, releases the lock, then fires the legacy event and the typed subscribers outside the lock. AppStateExtensions (src/InspectionPrototype.Application/Services/AppStateExtensions.cs) provides WithDiagnosticsEntry, WithRunTerminated, etc. as small composable mutators. MainViewModel consumes 20 state slices and 16 command-guard slices via Subscribe<T> (src/InspectionPrototype.Presentation/ViewModels/MainViewModel.cs:373-822), each callback marshaled to the UI thread via _dispatcher.Invoke.

That's the whole pattern. It is much closer to "central mutable cell with snapshot semantics and a typed observer" than to Redux.

Findings by criterion

1. Threading & ownership — Partial

Pass: state mutation is correctly serialized. The reducer runs inside lock (_lock) (AppStateStore.cs:45-52); concurrent writers cannot tear the snapshot. _current is published under the lock and read under the lock (AppStateStore.cs:26-29).

Fail point — subscriber dispatch is not serialized. Two threads T1 and T2 both calling Update produce two state transitions s0 → s1 (by T1) and s1 → s2 (by T2), but after each thread releases the lock it enters an unsynchronized dispatch phase:

csharp
// AppStateStore.cs:58-65
StateChanged?.Invoke(next);
ISubscription[] snapshot;
lock (_lock) snapshot = _subscribers.ToArray();
foreach (var sub in snapshot)
    sub.OnStateChanged(next);

If T2's foreach interleaves with T1's, a subscriber S receives OnStateChanged(s2) before OnStateChanged(s1) — i.e. the projection it sees regresses. Inside Subscription<T>.OnStateChanged (AppStateStore.cs:129-139), _last is read and written without any lock, so the change-detection check _comparer.Equals(_last, next) races against another thread executing the same path.

Why this hasn't bitten yet: in practice, the main writers (WorkflowService — 32 Update calls — and the three BackgroundService pipelines) tend to write sequentially within their own contexts, and the UI subscribers all marshal to the UI thread via _dispatcher.Invoke, which re-serializes everything on the UI thread before any state-derived work runs. So the race is masked at the UI boundary. But: any non-UI subscriber would observe it, and any future subscriber-driven side effect (e.g. SEMI E10 reporting, audit log emission) would be exposed.

Recommendation: make the dispatch phase part of the locked section, or use a per-store dispatch queue that serializes notifications. The current cost (subscriber callbacks under lock) is bounded by the SLICE-2.4 invocation rate (1.12 callbacks per Update on average), so the lock-hold extension is small.

Bonus observation: System.Threading.Lock is non-reentrant. A reducer that triggers Update (transitively, e.g. by calling _logger.LogWarning which routes through DiagnosticsTimelineSinkDiagnosticsTimelineSink.cs:42) would deadlock. The design relies on the discipline "don't log Warning-or-above from inside a reducer," but nothing enforces it. Worth a comment at the Update method explicitly forbidding it, or a recursion guard on the calling thread.

2. Backpressure & bounded memory — Pass (by accident)

Producers (SimulatedTagSource, SimulatedCamera, SimulatedEncoderSource) sit behind bounded Channel<T> with DropOldest policy (TagStreamPipelineService.cs:14-19, FramePipelineService.cs:14-19). When the consumer (the pipeline service that calls _store.Update) falls behind, the producer drops at the channel boundary — never at the store. So unbounded queueing into the store is structurally prevented.

But: the store itself has no subscriber-latency contract. If a future subscriber were slow (e.g. a synchronous DB write inside onChange), Update would block on it because dispatch is synchronous. Every writer — pipelines, workflow, crash reporter, diagnostics sink — would stall on that one subscriber. There is no fast-fail, no timeout, no "this subscriber is too slow" telemetry.

Recommendation: add subscriber.callback.latency.micros histogram dimensioned by selector. Document the implicit contract in the IAppStateStore.Subscribe<T> XMLdoc: "callbacks must complete in O(microseconds); long work must marshal off the calling thread." (The XMLdoc currently says callbacks fire on the writer's thread but doesn't bound their cost.)

Every reducer produces a single fully-formed AppState. Multi-slice transitions are explicit and atomic: e.g. WorkflowService writes WorkflowState + ActiveRun + LastRunSummary + OperationalCounters together inside one reducer body (32 Update call sites in WorkflowService.cs, each composing the slices it needs). FramePipelineService.ProcessDefectsForFrame (FramePipelineService.cs:130-141) updates ActiveRun.DefectCount, ActiveRun.DefectsCritical/Major/Minor, and RecentDefects in one reducer.

A subscriber observing a single AppState snapshot cannot see a torn read across slices.

The subscriber-ordering caveat (a subscriber receiving callbacks for s2 then s1) is real but is a §1 finding, not a §3 finding: each individual AppState snapshot is internally consistent.

4. Crash & fault recovery — Partial

Pass: the durable vs. ephemeral boundary is real. RunHistory and recent defects are hydrated at startup from SqliteRunHistoryStore and SqliteDefectStore (per SLICE-3.3 / SLICE-3.1 history). The rest of AppState resets to AppState.Initial (AppState.cs:43-65).

Fail point — the boundary is not visible at the type. Reading the AppState(...) ctor signature, you cannot tell that RunHistory is durable but LatestFrame is not. The 24 fields are typographically uniform. A new contributor adding a new field will not know whether to wire a persistence path.

Fail point — crash files capture only 4 fields plus the last 50 diagnostics entries (CrashReporter.cs:113-132):

csharp
sb.AppendLine($"WorkflowState: {snapshot.WorkflowState}");
sb.AppendLine($"ActiveRun Id:  {snapshot.ActiveRun?.RunId.ToString() ?? "(none)"}");
sb.AppendLine($"Recipe Name:   {snapshot.ActiveRun?.RecipeName ?? ...}");

ConnectionState, MotionState, IsMotionHomed, CameraState, Safety, stage position, active alarms, simulator profile, pipeline counters — all absent from the crash file. For post-mortem analysis of "what was the machine doing at the moment of failure," this is thin.

Recommendation A: annotate slices in AppState with a marker (attribute or split type) that says "durable" vs. "ephemeral" vs. "derived." Could be as simple as XMLdoc tags + an XUnit architecture test that asserts every durable slice has a hydration entry point.

Recommendation B: widen CrashReporter.BuildCrashContent to serialize the full snapshot as JSON, or at least the enum/scalar fields. The cost is one JsonSerializer.Serialize(snapshot, opts) call; the value is full forensic state. Defects + frame payloads would need to be excluded by attribute to keep crash files bounded.

5. Auditability — Partial

What exists:

  • Serilog file sink (Information+) — full structured log of every LogX call.
  • RecentDiagnostics (bounded 200 — AppStateExtensions.cs:11) — operator-visible warnings+ rolling window.
  • store.update.calls counter dimensioned by caller = FileName.MethodName (AppMetrics.cs:65) — you can see which method called Update and how often.

What does not exist:

  • A record of what changed. The reducer is opaque; you see "WorkflowService.StartRunAsync called Update at t=...", you do not see "WorkflowState: Idle → Running, ActiveRun: null → {RunId=...}".
  • A durable equipment-event log distinct from RecentDiagnostics. RecentDiagnostics is bounded and in-memory; it's lost on restart and rolls off at 200 entries.
  • A way to answer "why did the system command X at 14:03:22" with a single artifact.

For SEMI E10 reporting (criterion 7), reconstructing equipment-state transitions from Serilog text logs requires regex over message strings — not robust.

Recommendation: introduce a StateTransitionLog slice — append-only, durable, schema'd records (timestamp, actor, slicesChanged, before, after) written from AppStateStore.Update itself (i.e. the store records its own deltas). This is mechanically possible because the store has both _current (before) and next (after) inside the lock. Cost: one structured-log emit per Update, dimensioned. Sqlite-persisted via the existing MigrationRunner. This single mechanism feeds criteria 5, 6b, and 7 simultaneously.

6a. Operator-visible state (touchscreen) — Pass

MainViewModel subscribes to 20 distinct state slices for operator readout (MainViewModel.cs:373-707):

ConnectionState/WorkflowState/MotionState/CameraState/IsMotionHomed, MachineReadyDerivation, StagePosition, LoadedRecipe, RecipeCatalog, SimulatorProfiles, SelectedSimulatorProfileName, ActiveRun+WorkflowState, TagSummary (temperature, pressure), LatestFrame, CurrentFrameBitmap, PipelineCounters, SafetySignals, ActiveAlarms, LastRunSummary, RunHistory, RecentDiagnostics, OperationalCounters, CrashBanner, RecentDefects, Cassette.

Every operator-relevant slice is reachable from the UI. No state value drives an operator decision that the operator cannot see.

6b. Engineer deep-dive snapshot — Fail

There is no facility for a service engineer to pull current state off a deployed machine. Available artifacts are:

  1. %LOCALAPPDATA%\InspectionPrototype\crashes\crash-*.txt — only on crash, and only 4 fields + 50 diag lines as noted in §4.
  2. Serilog file sink — text log, requires manual reconstruction.
  3. dotnet-counters monitor --name InspectionPrototype.App — requires a laptop with dotnet-counters installed and the ability to attach to the running process; useful for live observation, not for "snapshot the current state to a file I can email."

There is no menu command, no debug endpoint, no file-based dump. Engineer cannot ask the running app for its AppState.

Recommendation: add a "Save State Snapshot…" service-engineer command (gated by a hidden key chord or a service-mode flag — not exposed to operators) that serializes _store.Current to JSON and writes it next to the crash directory. Composable with the §4 recommendation; same serialization path.

7. Equipment-state reporting (SEMI E10-aligned) — Fail

AppState has the ingredients for E10 classification — ConnectionState, WorkflowState, MotionState, Safety, ActiveAlarms — but there is no projection that emits (productive / standby / engineering / scheduled-down / unscheduled-down / non-scheduled). There is also no equipment-state-transition log with the timestamp precision E10 requires.

A fab integration that needed E10 reporting today would have to:

  1. Subscribe to multiple slices via Subscribe<T>(s => (s.ConnectionState, s.WorkflowState, s.Safety, s.ActiveAlarms.Count), ...).
  2. Compute the E10 category in the subscriber.
  3. Maintain its own transition log externally.

Step 3 is the real cost: there's nowhere natural to put it. The store doesn't keep transitions; RecentDiagnostics is bounded; Serilog is text. The §5 recommendation (durable StateTransitionLog) would resolve this — the E10 projector becomes a stateless function over slices, and the transition timestamps come from the store's own delta log.

Recommendation: treat E10 as the forcing function for §5's recommendation. Build the durable transition log first; add an EquipmentStateProjector (pure function AppState → E10State) and a service that subscribes and writes E10 transitions to the same log. This avoids inventing a parallel mechanism.

8. Determinism in tests — Pass

Reducers are Func<AppState, AppState>. Tests of "given state S, action A produces state S′" are pure function calls. AppState is a record with structural equality, so assertion is Assert.Equal(expected, actual).

No dispatcher, no scheduler, no UI thread required to test a state transition.

The store itself can be instantiated as new AppStateStore(new AppMetrics()) for tests that need subscription semantics. AppStateStoreSubscribeTests (tests/.../AppStateStoreSubscribeTests.cs) and AppStateStoreInstrumentationTests exist and run as part of the 568-test suite.

9. Marginal cost of a new field — Pass

To add a state slice:

  1. Add it to AppState's positional record signature (AppState.cs:12-37).
  2. Initialize it in AppState.Initial (AppState.cs:43-65).

Existing reducers using state with { ... } automatically preserve the new field — they don't need to be touched.

Optional follow-ups:

  • A new WithXyz extension in AppStateExtensions if the mutation is composable.
  • One or more new Subscribe<T> call sites if anything needs to react.

No dispatcher registration, no action type, no reducer-table entry. The cost is genuinely 1–2 files.

10. Cognitive load — Pass

Trace, top-to-bottom, no framework vocabulary required:

User clicks button → RelayCommand fires → handler calls IWorkflowService.StartAsync(...) → service calls _store.Update(state => state with { WorkflowState = ... }) → store mutates _current under lock → subscribers fire → MainViewModel subscriber marshals to UI thread via _dispatcher.Invoke[ObservableProperty] setter raises INPC → XAML binding updates.

The only non-obvious wrinkle is the Subscribe<T> comparer parameter — collection-typed selectors need ReferenceEqualityComparer.Instance because AppState uses copy-on-write and an unchanged collection retains the same reference. This is documented in IAppStateStore.cs:70-74 and is a real piece of vocabulary, but it's bounded.

11. Earned abstraction — Pass

What the abstraction buys:

  1. Single mutation point. Every writer goes through IAppStateStore.Update; analyzers and tests can assert this. Without the store, you'd have 30+ services each holding their own slice — the canonical AppState ADR-001 motivation.
  2. Caller-dimensioned telemetry. [CallerMemberName] + [CallerFilePath] injected at every Update site give store.update.calls{caller="WorkflowService.StartRunAsync"} automatically (AppStateStore.cs:33-40). The SLICE-2.0 measurement work depended on this.
  3. Change-detected selector subscription. Subscribe<T> runs the selector on every state change and fires onChange only when the projected value differs. Pre-SLICE-2.4 (full-state StateChanged event), this dispatched 20+ callbacks per Update; post-SLICE-2.4 it averages 1.12 per Update. Real work that pays its way.

What the abstraction does not require: actions as discriminated unions, action creators, dispatchers, middleware, thunks, sagas, effect handlers, observable stores, time-travel state, reducer registration tables, schema. None of these are present, and none are needed for the store's responsibilities.

12. What the store is NOT — Partial

Pass: the store is not a workflow engine (WorkflowService is 871 lines and holds the state machine; the store just holds the current WorkflowState enum value). It is not a persistence layer (SqliteRunHistoryStore, SqliteDefectStore, JsonRecipeCatalog are separate; hydration is explicit at startup).

Fail points:

  1. RecentDiagnostics is an event log embedded in state. WithDiagnosticsEntry (AppStateExtensions.cs:17-29) appends to a bounded List<DiagnosticsEntry> capped at 200. This is a stream of events, not a state value. The fact that it lives in AppState makes every state mutation that touches diagnostics produce a full list copy + cap. Plus it conflates "current state" with "recent history." A separate IDiagnosticsLog interface with its own bounded ring buffer (and durable backing per §5) would be cleaner — and AppState would lose a field whose presence muddles the "single source of truth" intent.

  2. Counters are duplicated. AppState.PipelineCounters and AppState.OperationalCounters (AppState.cs:30-31) hold counts that also live in AppMetrics (AppMetrics.cs:42-48: FramesIngested, FramesDropped, etc.). The duplication exists because UI binding is easier off AppState than off Meter instruments. Acceptable as a pragmatic call, but it means every increment is two writes — one through _store.Update and one through _metrics.X.Add(1). Worth a comment in AppState explaining the duplication and pointing at AppMetrics as the authoritative live counter; otherwise a contributor will inevitably ask "which one do I read?"

  3. CrashBanner is a UI-state value in AppState. Borderline — it represents persistent UI state ("user has not yet dismissed the crash banner") and surviving across view recreations is the whole point. Arguably fine. Filed as a minor finding.

13. Reads as domain prose — Partial

Most of AppState's 24 fields carry domain weight. Cassette, LoadedRecipe, RecipeCatalog, ActiveRun, RunHistory, ActiveAlarms, LatestTagValues, LatestFrame, RecentDefects, StageX/Y, IsMotionHomed, Safety — a fab engineer reads these as fab concepts, not as state-management generics.

Failure points:

  1. Simulator vocabulary in production state. SelectedSimulatorProfile and SimulatorProfileCatalog (AppState.cs:32-33) carry the prototype's simulator-shaped reality into what should be domain state. SLICE-2.7 already recognized this concern and renamed RunSummary.SimulatorProfileName → OperatingProfileName (per roadmap-progress.md, 2026-05-11). But the AppState-level fields were not renamed in the same pass. The vocabulary leak persists at the store surface.

  2. Scaffolding-not-domain names. PipelineCounters and OperationalCounters (AppState.cs:30-31) name what they are (counter aggregates) rather than what they describe (frame/run throughput; equipment-run history). Acceptable for instrumentation; worth a comment so they don't become a template for new fields.

  3. AppState itself is a generic name. ADR-001 enshrined it and changing the type name would ripple through hundreds of references. Not a finding I'd recommend acting on, but worth naming: a green-field design might choose MachineState or InspectionState for the domain hook.

Recommendation: rename SelectedSimulatorProfile/SimulatorProfileCatalog to ActiveOperatingProfile/OperatingProfileCatalog (matching the SLICE-2.7 rename of RunSummary.OperatingProfileName). Mechanical edit; would have to be a passover slice with the rename + tests + migration of any persisted column names.

14. Invalid states unrepresentable — Partial

The record definition (AppState.cs:12-37) allows several physically-impossible combinations to be constructed by with { }:

  • WorkflowState = Idle together with ActiveRun != null. A machine cannot be "idle" and "have a run in progress." Type allows it.
  • ConnectionState = Disconnected together with LatestFrame != null or LatestTagValues populated. No camera, no frames; no tag stream, no tag values. Type allows it.
  • WorkflowState = Faulted with ActiveAlarms empty. A fault without an alarm is contradictory in the model. Type allows it.
  • IsMotionHomed = true together with MotionState = NotReady. Contradictory. Type allows it.

The enforcement lives in the reducers and in WorkflowService (871 lines, the state machine). This is acceptable as long as nothing constructs AppState instances outside the store. In practice, nothing does. But the type surface is informationally misleading — a reader of AppState.cs cannot tell which combinations are legal.

A stricter design would use a discriminated union over WorkflowState, with the run/recipe/cassette context bundled into the active variant:

csharp
abstract record WorkflowPhase;
record Idle() : WorkflowPhase;
record Running(ActiveRunState Run, Recipe Recipe) : WorkflowPhase;
record Faulted(Alarm RootCause, IReadOnlyList<Alarm> AdditionalAlarms) : WorkflowPhase;
// ...

This is a real refactor — every reducer site, every selector, every command-guard subscriber would need to change. It would also pay for itself only if the prototype is moving toward fab integration where invariants must be enforceable across teams. For the current scope, I do not recommend it. Filing the option here so it's a known choice, not an oversight.

Recommendation: at minimum, add [Pure] / XMLdoc-level invariants on AppState that name the legal combinations, and an architecture test that asserts a sample of forbidden constructions throws. This makes the discipline explicit without a rewrite.

15. API surface — small, intentional, honestly named — Pass (with blemishes)

The IAppStateStore surface is four members: Current, StateChanged, Update, Subscribe<T>. Right-sized.

Method names mostly meet the bar:

  • Update(reducer) — names the action.
  • Subscribe<T>(selector, onChange, comparer) — standard observer vocabulary, well-typed.
  • Current — clear.
  • WithDiagnosticsEntry, WithRunTerminated, WithAlarmRaised, WithCatalogResults — record-With idiom; reads naturally.

Blemishes:

  1. AppMetrics.RegisterTagsActiveProvider (AppMetrics.cs:122-123). The name describes the mechanism (registering a callback) rather than the intent (supplying the polling source for the TagsActive gauge). Why a provider pattern? Because the gauge is created in AppMetrics's constructor but the producer that knows the count (SimulatedTagSource) isn't constructed yet. A cleaner alternative: pass the provider to the constructor; or expose TagsActive as IObserver<long>-style instead of an observable gauge. Tiny finding; named for honesty about API quality.

  2. CrashStateExtensions.WithClearedCrashBanner (CrashStateExtensions.cs:26-27). WithCrashBannerCleared would read more naturally and match the With{Noun}{Adjective} pattern. Trivial.

  3. [CallerMemberName]/[CallerFilePath] on Update and Subscribe<T> are public surface that call sites do not pass. The XMLdoc tells the reader "do not pass explicitly" — but they're still public parameters. This is the canonical .NET idiom for caller-attribute capture and acceptable; alternatives (separate UpdateWithCaller overloads) would be worse.

No finding requires a recommendation beyond items 1–2 above as small renames.

16. No accidental complexity, no dead surface — Partial

What's clean:

  • Zero TODO / HACK / FIXME in the Application layer (verified by repository-wide search).
  • No commented-out code paths in the store / extensions / pipelines.
  • Helper methods in AppStateExtensions are each used by exactly one writer; nothing speculative.

What isn't:

  1. Dual notification channels. IAppStateStore.StateChanged (legacy full-state event) runs in parallel with Subscribe<T> (per-slice). Both fire on every Update. XMLdoc at IAppStateStore.cs:18-24 says "retained for back-compat with test fakes; new consumers should use Subscribe<T>." Production callers were migrated in SLICE-2.4. Two notification paths exist solely because test fakes haven't been touched yet.

  2. null! workaround at AppState.cs:36. IReadOnlyList<Defect> RecentDefects = null!. This silences the nullable-reference-types warning for a default value the record's positional ordering requires. The actual null value never leaks (the Initial factory and every reducer set it to []). But the type signature lies about nullability. The fix is to drop the default value (move RecentDefects earlier in the positional list, or convert it to a non-defaulted required field). Both touch every AppState construction site once.

  3. MaxDiagnosticsEntries = 200 magic number in AppStateExtensions.cs:11. This is a domain bound that callers might reasonably want to read (e.g. for UI virtualization or test fixtures). It lives buried in an internal static class while RecentDefectsWindow = 100 lives publicly on AppState. Inconsistent + arguably mis-located.

  4. Two write paths for diagnostics. WithDiagnosticsEntry (AppStateExtensions.cs:17-29) and DiagnosticsTimelineSink.Emit (DiagnosticsTimelineSink.cs:42) both produce diagnostics entries — the latter via the Serilog pipeline, the former by direct call. Not a duplication exactly; each has its own caller pattern (Serilog Warning+ vs. direct call from reducers). But the path-from-source-to-entry is non-obvious. A reader sees WithDiagnosticsEntry in two unrelated places and has to reason about which Serilog levels also get there. Worth a flow diagram or a "two paths in" comment on RecentDiagnostics.

Recommendations:

  • R-16a: retire StateChanged (already in §11 follow-ups list as F-1). Net change: remove ~5 lines + migrate test fakes (a few file edits).
  • R-16b: drop the null! default — promote RecentDefects in positional order or make it required. Single-pass edit.
  • R-16c: move MaxDiagnosticsEntries to AppState as public const int RecentDiagnosticsWindow = 200, matching RecentDefectsWindow. One-line move.
  • R-16d: comment the two-paths-in nature of RecentDiagnostics and link to DiagnosticsTimelineSink.

Cross-cutting findings

These are observations that span multiple criteria and are worth surfacing once:

F-1. The legacy StateChanged event is still on IAppStateStore. XMLdoc (IAppStateStore.cs:18-24) says "retained for back-compat with test fakes; new consumers should use Subscribe<T> instead." The SLICE-2.4 work migrated production callers; if test fakes can be migrated too, removing it shrinks the public surface, removes a hidden full-state notification per Update, and forces all subscribers through the change-detected path. Filed as a follow-up to SLICE-2.4.

F-2. AppStateStore.Current takes the lock to read. _current is a reference; reference assignment is atomic on .NET; the publish path inside Update happens before subsequent reads regardless of locking in the read path. The Volatile.Read(ref _current) pattern (or just _current after marking it volatile) would be sufficient and would remove a small but real source of Current-getter contention. Not a correctness issue — performance/contention only.

F-3. The MaxDiagnosticsEntries = 200 constant lives in AppStateExtensions.cs while RecentDefectsWindow = 100 lives on AppState. Inconsistent. Either both should live on AppState (as public bounds that callers can reference) or both should live in the extension that mutates them. Tiny but worth fixing while the area is being touched.

F-4. Subscribers with s => true (MainViewModel.cs:772, 779) fire on every state change. Two of the 16 command-guard subscriptions use this pattern because the commands have no specific guard. They each cost one callback per Update, partially undermining the SLICE-2.4 subscriber-reduction story for those two commands. Acceptable, but a comment at the call sites pointing this out would prevent the pattern from being copy-pasted elsewhere.

Recommendations, ranked

In order of value-per-effort:

  1. Serialize subscriber dispatch. Extend the locked region in AppStateStore.Update to include the foreach (var sub in snapshot) loop, or move dispatch into a serial queue. Fixes criterion 1 outright; sub-millisecond cost given current subscriber rates. Single file edit. (Maps to: §1.)

  2. Document the subscriber-latency contract + add subscriber.callback.latency.micros histogram. Comment the implicit "callbacks must be cheap" rule on IAppStateStore.Subscribe<T>. Add a histogram so a future slow subscriber is visible in dotnet-counters rather than discovered via field complaints. (Maps to: §2.)

  3. Widen the crash file to a full JSON snapshot. Replace the hand-written 4-field block in CrashReporter.BuildCrashContent with JsonSerializer.Serialize(snapshot, indented). Add [JsonIgnore] to byte-array slices (LatestFrame.Pixels) so files stay small. (Maps to: §4 and feeds §6b.)

  4. Add a service-mode "Save State Snapshot…" command. Same serialization path as #3, on-demand from the UI rather than crash-triggered. Gated by a service-mode flag so operators don't see it. (Maps to: §6b.)

  5. Introduce a durable StateTransitionLog. Append-only log written from AppStateStore.Update itself capturing (timestamp, caller, sliceDelta). Sqlite-persisted via the existing MigrationRunner. This single mechanism is the load-bearing piece for criteria 5, 6b, and 7 — without it, those criteria are unmeetable; with it, all three become reachable. (Maps to: §5, §6b, §7.)

  6. Add EquipmentStateProjector (pure function over slices). Once #5 exists, the E10 projection is a small additional file: AppState → E10State pure function + a thin service that subscribes, projects, and writes transitions to the log from #5. (Maps to: §7.)

  7. Lift RecentDiagnostics out of AppState into IDiagnosticsLog. Once a durable log exists, the in-memory diagnostics ring buffer is a UI affordance, not state. AppState shrinks by one field, WithDiagnosticsEntry goes away, and the "events vs. state" line gets sharper. (Maps to: §12 finding 1.)

  8. Annotate durable vs. ephemeral slices on AppState. XMLdoc-only at minimum; an architecture test that asserts the annotations match hydration code is the stronger form. Low cost, high contributor-clarity payoff. (Maps to: §4.)

  9. Retire the legacy StateChanged event once test fakes are migrated. Pure cleanup; shrinks the public surface. (Maps to: F-1.)

  10. Cleanup pass: _current read without lock; MaxDiagnosticsEntries location; comment on s => true guard subscriptions; null! removal; WithClearedCrashBannerWithCrashBannerCleared. Tiny edits, none correctness-affecting. (Maps to: F-2, F-3, F-4, R-16b, R-16c, §15 blemish 2.)

  11. Rename SelectedSimulatorProfile/SimulatorProfileCatalogActiveOperatingProfile/OperatingProfileCatalog. Closes the SLICE-2.7 rename that stopped at RunSummary. Vocabulary cleanup; mechanical edit with one migration touch for any persisted column. (Maps to: §13 finding 1.)

  12. Add invariants documentation + an architecture test for forbidden AppState constructions. Names the legal combinations of (WorkflowState, ActiveRun, ConnectionState, ...) explicitly; asserts that forbidden constructions don't pass an architecture test. Cheaper than a discriminated-union refactor; preserves §10 cognitive load. (Maps to: §14.)

  13. Comment the two-paths-in nature of RecentDiagnostics. One-comment edit on AppState.RecentDiagnostics pointing at both WithDiagnosticsEntry and DiagnosticsTimelineSink. (Maps to: R-16d.)

  14. Rename RegisterTagsActiveProvider or restructure AppMetrics construction to remove the post-hoc-registration pattern. Either pass the provider to the AppMetrics constructor (cleanest) or rename to express intent (UseTagsActiveSource). (Maps to: §15 blemish 1.)

Total roadmap impact: items 1, 2, 3, 8, 9, 10, 11, 12, 13, 14 are mechanical and could land in a single small slice ("SLICE-2.9: AppState design + craftsmanship follow-ups"). Items 4, 5, 6, 7 are larger — they enable real new capability (E10, service-engineer snapshots) and likely warrant their own slice in Phase 3 or 4.

What we explicitly chose NOT to recommend

Listed so future drift toward these is informed, not accidental:

  • Adopting actions / reducer tables / middleware / a state-change pipeline. None of these address the findings; all of them would raise the cognitive-load cost (§10) and the marginal-cost-per-field (§9) without improving correctness. The current Func<AppState, AppState> form is correct and minimal.
  • Splitting AppState into multiple sub-stores. The 2026-05-09 doc considered this as SLICE-2.1; SLICE-2.0 measurements deferred it (alloc share 0.5%, gate ≥10%). The findings in this review do not re-trigger that slice. Atomic multi-slice transitions (§3) are easier with a single store; splitting would push consistency into the caller.
  • Replacing the lock with a Channel<T> of reducers. The lock + sync dispatch is correct and well-instrumented. A channel introduces a queue depth, lag, and a separate failure mode without addressing any finding.
  • Adding a state-diff visualization or DevTools-style panel. Tempting under §6b but disproportionate. A JSON file an engineer can cat or load into any viewer covers the use case at a fraction of the build cost.
  • Persisting the full AppState for crash recovery. The current pattern (durable slices hydrated from purpose-built stores; ephemeral slices reset to Initial) is correct for an industrial app where hardware state must be re-observed on restart, not replayed from disk. Persisting MotionState, CameraState, etc. across crashes would lie about hardware reality.

Conclusion

The user's hypothesis going in — "the design is JS-frontend-shaped and over-abstracted" — does not survive contact with the code. What is shaped like a Redux store is the vocabulary of the 2026-05-09 Architectural Foundations doc, not the implementation; the implementation is markedly leaner than what that doc endorses.

The real gaps are in four domains, none of which is "JS-frontend pattern":

  1. Subscriber dispatch concurrency — a real bug class, addressable in a single file edit (§1, recommendation 1).
  2. Operational forensics for a fab deployment — no on-demand state dump, no equipment-state transition log, no E10 projection (§5, §6b, §7, recommendations 3–6). These are not over-abstraction; they are under-abstraction. The right response is to add a single durable transition log and one pure projection function — both new mechanisms, not refactors.
  3. Craftsmanship and domain fit — simulator vocabulary leaks into production state (§13), the record allows physically-impossible state combinations (§14), the legacy StateChanged event runs in parallel with Subscribe<T> (§16), and a null! workaround papers over a default-value nullability issue (§16). None of these are large; together they are what separates "competent" from "world-class for this domain."
  4. A small list of housekeeping items — F-1 through F-4 plus the diagnostics-in-state finding.

None of this argues for replacing the store, splitting it, or wrapping it in more layers. The pattern is correct; the gaps are at its edges and in its naming. A focused follow-up slice (recommendations 1–3, 8–14) would lift the design to the bar the user articulated — "world-class framework that fits perfectly with the domain problem" — without changing its shape.

Docs-first project memory for AI-assisted implementation.