AppState Design Review
- Date: 2026-05-12
- Status: Findings draft. Scores the current
AppStatestore against the criteria locked in2026-05-12-appstate-design-criteria.md. No code has been changed; recommendations are filed for separate slices. - Audience: anyone deciding whether to refactor
AppState, anyone reviewing the 2026-05-09 Architectural Foundations doc's §1 (State management), anyone evaluating which Phase 2 slices remain on the table. - Companion docs: Criteria, Architectural Foundations §1, Canonical AppState Review (2026-04-22)
TL;DR
The hypothesis that triggered this review — "the AppState design is JS-frontend-shaped and over-abstracted" — is not what the code shows. The current design is minimalist: one immutable record, one mutating method (Update(Func<AppState, AppState>)), one selector-based subscription primitive. There are no actions, no reducers-as-discriminated-unions, no middleware, no thunks, no DevTools. The ceremony budget is well under what Redux/MobX would require.
What the review does find: a small number of concrete correctness, auditability, and operational-fitness gaps that have nothing to do with abstraction and everything to do with the store being slightly under-specified for a fab environment, plus a handful of craftsmanship items — vocabulary leakage, type-level laxity, dead surface — that hold the code back from "world-class for this domain." Most are addressable with focused, small changes rather than a pattern swap.
Scorecard at a glance — 7 pass, 1 pass-by-accident, 6 partial, 2 fail (definitions below the table).
Scorecard
| # | Criterion | Verdict | Headline |
|---|---|---|---|
| 1 | Threading & ownership | Partial | Store mutations are correctly serialized; subscriber dispatch is not. State can be observed regressing at a subscriber under concurrent writes. |
| 2 | Backpressure & bounded memory | Pass (by accident) | Channels in front of the store handle backpressure; the store itself has no contract and would stall all writers behind one slow subscriber. |
| 3 | Consistency between related slices | Pass | Reducers compose atomically; with clones ensure all-or-nothing transitions. Subscriber ordering caveat is filed under §1, not here. |
| 4 | Crash & fault recovery | Partial | Durable vs. ephemeral boundary is real (Sqlite stores) but not visible at the AppState type level. Crash file captures only 4 fields + 50 diag lines, not full state. |
| 5 | Auditability | Partial | Serilog file logs + RecentDiagnostics (bounded 200) + store.update.calls counter exist, but no single "who changed what slice when" record. Reducer effects are not logged. |
| 6a | Operator-visible state | Pass | 20 distinct state slices subscribed by MainViewModel; operator-facing readout is dense and covers connection, workflow, motion, alarms, defects, history, diagnostics. |
| 6b | Engineer deep-dive snapshot | Fail | No on-demand state dump. Crash file is the closest artifact and only fires on crash. Service engineer cannot pull current state without rebuilding or attaching dotnet-counters. |
| 7 | Equipment-state reporting (E10) | Fail | No projection from (ConnectionState, WorkflowState, MotionState, Safety, ActiveAlarms) into SEMI E10 categories. No timestamped equipment-state-transition log. |
| 8 | Determinism in tests | Pass | Reducers are Func<AppState, AppState>; state transitions test as pure functions. AppState is a record with structural equality. |
| 9 | Marginal cost of new field | Pass | Adding a slice touches AppState.cs (ctor + Initial) and the writer site only. with-clones automatic. |
| 10 | Cognitive load | Pass | Trace is "click → Command → Service.Update(reducer) → Subscribe fires → ViewModel updates." No framework vocabulary required beyond "selector + comparer". |
| 11 | Earned abstraction | Pass | Three concrete benefits: single mutation point + caller-dimensioned metrics + change-detected subscriber dispatch. JS ceremony is genuinely absent. |
| 12 | What the store is NOT | Partial | Cleanly separated from workflow engine and persistence. But RecentDiagnostics is an event log embedded in state, and PipelineCounters / OperationalCounters duplicate counters that also live in AppMetrics. |
| 13 | Reads as domain prose | Partial | Most fields use fab vocabulary (Cassette, LoadedRecipe, ActiveAlarms, LatestTagValues). Simulator vocabulary (SelectedSimulatorProfile, SimulatorProfileCatalog) leaks into production AppState; SLICE-2.7 renamed it inside RunSummary but the store fields were not touched. |
| 14 | Invalid states unrepresentable | Partial | The record allows invalid pairs (WorkflowState=Idle + ActiveRun!=null, ConnectionState=Disconnected + LatestFrame!=null, WorkflowState=Faulted + empty ActiveAlarms). Reducer discipline is the only enforcement. Acceptable for the prototype; explicit if a fab integration needs stronger guarantees. |
| 15 | API surface — small, honest names | Pass (with blemishes) | Four-member IAppStateStore surface is right-sized. Most names are clear. RegisterTagsActiveProvider names the mechanism not the intent; WithClearedCrashBanner reads awkwardly. Minor. |
| 16 | No accidental complexity / dead surface | Partial | No TODO/HACK in the Application layer. But: legacy StateChanged event runs in parallel with Subscribe<T> for test-fake back-compat; null! workaround at AppState.cs:36; MaxDiagnosticsEntries magic number in the wrong file. |
Verdict legend:
- Pass — current design meets the criterion on merit. No action needed.
- Pass (by accident) — the criterion is met because of something outside the store; no contract in the store guarantees it. Document or move into the contract.
- Partial — meets the spirit, fails on a specific concrete point. Smallest-change fix exists.
- Fail — the design has no mechanism for this. Needs a new mechanism.
The shape of the current design (one-paragraph recap)
AppState (src/InspectionPrototype.Application/State/AppState.cs:12-72) is a positional record with 24 fields and an Initial factory. IAppStateStore (src/InspectionPrototype.Application/Abstractions/IAppStateStore.cs:11-84) exposes Current, a legacy StateChanged event, Update(Func<AppState, AppState>), and Subscribe<T>(selector, onChange, comparer). AppStateStore (src/InspectionPrototype.Application/Services/AppStateStore.cs:14-149) holds _current behind a System.Threading.Lock, runs the reducer inside the lock, releases the lock, then fires the legacy event and the typed subscribers outside the lock. AppStateExtensions (src/InspectionPrototype.Application/Services/AppStateExtensions.cs) provides WithDiagnosticsEntry, WithRunTerminated, etc. as small composable mutators. MainViewModel consumes 20 state slices and 16 command-guard slices via Subscribe<T> (src/InspectionPrototype.Presentation/ViewModels/MainViewModel.cs:373-822), each callback marshaled to the UI thread via _dispatcher.Invoke.
That's the whole pattern. It is much closer to "central mutable cell with snapshot semantics and a typed observer" than to Redux.
Findings by criterion
1. Threading & ownership — Partial
Pass: state mutation is correctly serialized. The reducer runs inside lock (_lock) (AppStateStore.cs:45-52); concurrent writers cannot tear the snapshot. _current is published under the lock and read under the lock (AppStateStore.cs:26-29).
Fail point — subscriber dispatch is not serialized. Two threads T1 and T2 both calling Update produce two state transitions s0 → s1 (by T1) and s1 → s2 (by T2), but after each thread releases the lock it enters an unsynchronized dispatch phase:
// AppStateStore.cs:58-65
StateChanged?.Invoke(next);
ISubscription[] snapshot;
lock (_lock) snapshot = _subscribers.ToArray();
foreach (var sub in snapshot)
sub.OnStateChanged(next);If T2's foreach interleaves with T1's, a subscriber S receives OnStateChanged(s2) before OnStateChanged(s1) — i.e. the projection it sees regresses. Inside Subscription<T>.OnStateChanged (AppStateStore.cs:129-139), _last is read and written without any lock, so the change-detection check _comparer.Equals(_last, next) races against another thread executing the same path.
Why this hasn't bitten yet: in practice, the main writers (WorkflowService — 32 Update calls — and the three BackgroundService pipelines) tend to write sequentially within their own contexts, and the UI subscribers all marshal to the UI thread via _dispatcher.Invoke, which re-serializes everything on the UI thread before any state-derived work runs. So the race is masked at the UI boundary. But: any non-UI subscriber would observe it, and any future subscriber-driven side effect (e.g. SEMI E10 reporting, audit log emission) would be exposed.
Recommendation: make the dispatch phase part of the locked section, or use a per-store dispatch queue that serializes notifications. The current cost (subscriber callbacks under lock) is bounded by the SLICE-2.4 invocation rate (1.12 callbacks per Update on average), so the lock-hold extension is small.
Bonus observation: System.Threading.Lock is non-reentrant. A reducer that triggers Update (transitively, e.g. by calling _logger.LogWarning which routes through DiagnosticsTimelineSink — DiagnosticsTimelineSink.cs:42) would deadlock. The design relies on the discipline "don't log Warning-or-above from inside a reducer," but nothing enforces it. Worth a comment at the Update method explicitly forbidding it, or a recursion guard on the calling thread.
2. Backpressure & bounded memory — Pass (by accident)
Producers (SimulatedTagSource, SimulatedCamera, SimulatedEncoderSource) sit behind bounded Channel<T> with DropOldest policy (TagStreamPipelineService.cs:14-19, FramePipelineService.cs:14-19). When the consumer (the pipeline service that calls _store.Update) falls behind, the producer drops at the channel boundary — never at the store. So unbounded queueing into the store is structurally prevented.
But: the store itself has no subscriber-latency contract. If a future subscriber were slow (e.g. a synchronous DB write inside onChange), Update would block on it because dispatch is synchronous. Every writer — pipelines, workflow, crash reporter, diagnostics sink — would stall on that one subscriber. There is no fast-fail, no timeout, no "this subscriber is too slow" telemetry.
Recommendation: add subscriber.callback.latency.micros histogram dimensioned by selector. Document the implicit contract in the IAppStateStore.Subscribe<T> XMLdoc: "callbacks must complete in O(microseconds); long work must marshal off the calling thread." (The XMLdoc currently says callbacks fire on the writer's thread but doesn't bound their cost.)
3. Consistency between related slices — Pass
Every reducer produces a single fully-formed AppState. Multi-slice transitions are explicit and atomic: e.g. WorkflowService writes WorkflowState + ActiveRun + LastRunSummary + OperationalCounters together inside one reducer body (32 Update call sites in WorkflowService.cs, each composing the slices it needs). FramePipelineService.ProcessDefectsForFrame (FramePipelineService.cs:130-141) updates ActiveRun.DefectCount, ActiveRun.DefectsCritical/Major/Minor, and RecentDefects in one reducer.
A subscriber observing a single AppState snapshot cannot see a torn read across slices.
The subscriber-ordering caveat (a subscriber receiving callbacks for s2 then s1) is real but is a §1 finding, not a §3 finding: each individual AppState snapshot is internally consistent.
4. Crash & fault recovery — Partial
Pass: the durable vs. ephemeral boundary is real. RunHistory and recent defects are hydrated at startup from SqliteRunHistoryStore and SqliteDefectStore (per SLICE-3.3 / SLICE-3.1 history). The rest of AppState resets to AppState.Initial (AppState.cs:43-65).
Fail point — the boundary is not visible at the type. Reading the AppState(...) ctor signature, you cannot tell that RunHistory is durable but LatestFrame is not. The 24 fields are typographically uniform. A new contributor adding a new field will not know whether to wire a persistence path.
Fail point — crash files capture only 4 fields plus the last 50 diagnostics entries (CrashReporter.cs:113-132):
sb.AppendLine($"WorkflowState: {snapshot.WorkflowState}");
sb.AppendLine($"ActiveRun Id: {snapshot.ActiveRun?.RunId.ToString() ?? "(none)"}");
sb.AppendLine($"Recipe Name: {snapshot.ActiveRun?.RecipeName ?? ...}");ConnectionState, MotionState, IsMotionHomed, CameraState, Safety, stage position, active alarms, simulator profile, pipeline counters — all absent from the crash file. For post-mortem analysis of "what was the machine doing at the moment of failure," this is thin.
Recommendation A: annotate slices in AppState with a marker (attribute or split type) that says "durable" vs. "ephemeral" vs. "derived." Could be as simple as XMLdoc tags + an XUnit architecture test that asserts every durable slice has a hydration entry point.
Recommendation B: widen CrashReporter.BuildCrashContent to serialize the full snapshot as JSON, or at least the enum/scalar fields. The cost is one JsonSerializer.Serialize(snapshot, opts) call; the value is full forensic state. Defects + frame payloads would need to be excluded by attribute to keep crash files bounded.
5. Auditability — Partial
What exists:
Serilogfile sink (Information+) — full structured log of everyLogXcall.RecentDiagnostics(bounded 200 —AppStateExtensions.cs:11) — operator-visible warnings+ rolling window.store.update.callscounter dimensioned bycaller = FileName.MethodName(AppMetrics.cs:65) — you can see which method calledUpdateand how often.
What does not exist:
- A record of what changed. The reducer is opaque; you see "WorkflowService.StartRunAsync called Update at t=...", you do not see "
WorkflowState: Idle → Running, ActiveRun: null → {RunId=...}". - A durable equipment-event log distinct from
RecentDiagnostics.RecentDiagnosticsis bounded and in-memory; it's lost on restart and rolls off at 200 entries. - A way to answer "why did the system command X at 14:03:22" with a single artifact.
For SEMI E10 reporting (criterion 7), reconstructing equipment-state transitions from Serilog text logs requires regex over message strings — not robust.
Recommendation: introduce a StateTransitionLog slice — append-only, durable, schema'd records (timestamp, actor, slicesChanged, before, after) written from AppStateStore.Update itself (i.e. the store records its own deltas). This is mechanically possible because the store has both _current (before) and next (after) inside the lock. Cost: one structured-log emit per Update, dimensioned. Sqlite-persisted via the existing MigrationRunner. This single mechanism feeds criteria 5, 6b, and 7 simultaneously.
6a. Operator-visible state (touchscreen) — Pass
MainViewModel subscribes to 20 distinct state slices for operator readout (MainViewModel.cs:373-707):
ConnectionState/WorkflowState/MotionState/CameraState/IsMotionHomed, MachineReadyDerivation, StagePosition, LoadedRecipe, RecipeCatalog, SimulatorProfiles, SelectedSimulatorProfileName, ActiveRun+WorkflowState, TagSummary (temperature, pressure), LatestFrame, CurrentFrameBitmap, PipelineCounters, SafetySignals, ActiveAlarms, LastRunSummary, RunHistory, RecentDiagnostics, OperationalCounters, CrashBanner, RecentDefects, Cassette.
Every operator-relevant slice is reachable from the UI. No state value drives an operator decision that the operator cannot see.
6b. Engineer deep-dive snapshot — Fail
There is no facility for a service engineer to pull current state off a deployed machine. Available artifacts are:
%LOCALAPPDATA%\InspectionPrototype\crashes\crash-*.txt— only on crash, and only 4 fields + 50 diag lines as noted in §4.Serilogfile sink — text log, requires manual reconstruction.dotnet-counters monitor --name InspectionPrototype.App— requires a laptop withdotnet-countersinstalled and the ability to attach to the running process; useful for live observation, not for "snapshot the current state to a file I can email."
There is no menu command, no debug endpoint, no file-based dump. Engineer cannot ask the running app for its AppState.
Recommendation: add a "Save State Snapshot…" service-engineer command (gated by a hidden key chord or a service-mode flag — not exposed to operators) that serializes _store.Current to JSON and writes it next to the crash directory. Composable with the §4 recommendation; same serialization path.
7. Equipment-state reporting (SEMI E10-aligned) — Fail
AppState has the ingredients for E10 classification — ConnectionState, WorkflowState, MotionState, Safety, ActiveAlarms — but there is no projection that emits (productive / standby / engineering / scheduled-down / unscheduled-down / non-scheduled). There is also no equipment-state-transition log with the timestamp precision E10 requires.
A fab integration that needed E10 reporting today would have to:
- Subscribe to multiple slices via
Subscribe<T>(s => (s.ConnectionState, s.WorkflowState, s.Safety, s.ActiveAlarms.Count), ...). - Compute the E10 category in the subscriber.
- Maintain its own transition log externally.
Step 3 is the real cost: there's nowhere natural to put it. The store doesn't keep transitions; RecentDiagnostics is bounded; Serilog is text. The §5 recommendation (durable StateTransitionLog) would resolve this — the E10 projector becomes a stateless function over slices, and the transition timestamps come from the store's own delta log.
Recommendation: treat E10 as the forcing function for §5's recommendation. Build the durable transition log first; add an EquipmentStateProjector (pure function AppState → E10State) and a service that subscribes and writes E10 transitions to the same log. This avoids inventing a parallel mechanism.
8. Determinism in tests — Pass
Reducers are Func<AppState, AppState>. Tests of "given state S, action A produces state S′" are pure function calls. AppState is a record with structural equality, so assertion is Assert.Equal(expected, actual).
No dispatcher, no scheduler, no UI thread required to test a state transition.
The store itself can be instantiated as new AppStateStore(new AppMetrics()) for tests that need subscription semantics. AppStateStoreSubscribeTests (tests/.../AppStateStoreSubscribeTests.cs) and AppStateStoreInstrumentationTests exist and run as part of the 568-test suite.
9. Marginal cost of a new field — Pass
To add a state slice:
- Add it to
AppState's positional record signature (AppState.cs:12-37). - Initialize it in
AppState.Initial(AppState.cs:43-65).
Existing reducers using state with { ... } automatically preserve the new field — they don't need to be touched.
Optional follow-ups:
- A new
WithXyzextension inAppStateExtensionsif the mutation is composable. - One or more new
Subscribe<T>call sites if anything needs to react.
No dispatcher registration, no action type, no reducer-table entry. The cost is genuinely 1–2 files.
10. Cognitive load — Pass
Trace, top-to-bottom, no framework vocabulary required:
User clicks button →
RelayCommandfires → handler callsIWorkflowService.StartAsync(...)→ service calls_store.Update(state => state with { WorkflowState = ... })→ store mutates_currentunder lock → subscribers fire →MainViewModelsubscriber marshals to UI thread via_dispatcher.Invoke→[ObservableProperty]setter raises INPC → XAML binding updates.
The only non-obvious wrinkle is the Subscribe<T> comparer parameter — collection-typed selectors need ReferenceEqualityComparer.Instance because AppState uses copy-on-write and an unchanged collection retains the same reference. This is documented in IAppStateStore.cs:70-74 and is a real piece of vocabulary, but it's bounded.
11. Earned abstraction — Pass
What the abstraction buys:
- Single mutation point. Every writer goes through
IAppStateStore.Update; analyzers and tests can assert this. Without the store, you'd have 30+ services each holding their own slice — the canonical AppState ADR-001 motivation. - Caller-dimensioned telemetry.
[CallerMemberName]+[CallerFilePath]injected at everyUpdatesite givestore.update.calls{caller="WorkflowService.StartRunAsync"}automatically (AppStateStore.cs:33-40). The SLICE-2.0 measurement work depended on this. - Change-detected selector subscription.
Subscribe<T>runs the selector on every state change and firesonChangeonly when the projected value differs. Pre-SLICE-2.4 (full-stateStateChangedevent), this dispatched 20+ callbacks perUpdate; post-SLICE-2.4 it averages 1.12 perUpdate. Real work that pays its way.
What the abstraction does not require: actions as discriminated unions, action creators, dispatchers, middleware, thunks, sagas, effect handlers, observable stores, time-travel state, reducer registration tables, schema. None of these are present, and none are needed for the store's responsibilities.
12. What the store is NOT — Partial
Pass: the store is not a workflow engine (WorkflowService is 871 lines and holds the state machine; the store just holds the current WorkflowState enum value). It is not a persistence layer (SqliteRunHistoryStore, SqliteDefectStore, JsonRecipeCatalog are separate; hydration is explicit at startup).
Fail points:
RecentDiagnosticsis an event log embedded in state.WithDiagnosticsEntry(AppStateExtensions.cs:17-29) appends to a boundedList<DiagnosticsEntry>capped at 200. This is a stream of events, not a state value. The fact that it lives inAppStatemakes every state mutation that touches diagnostics produce a full list copy + cap. Plus it conflates "current state" with "recent history." A separateIDiagnosticsLoginterface with its own bounded ring buffer (and durable backing per §5) would be cleaner — andAppStatewould lose a field whose presence muddles the "single source of truth" intent.Counters are duplicated.
AppState.PipelineCountersandAppState.OperationalCounters(AppState.cs:30-31) hold counts that also live inAppMetrics(AppMetrics.cs:42-48:FramesIngested,FramesDropped, etc.). The duplication exists because UI binding is easier offAppStatethan offMeterinstruments. Acceptable as a pragmatic call, but it means every increment is two writes — one through_store.Updateand one through_metrics.X.Add(1). Worth a comment inAppStateexplaining the duplication and pointing atAppMetricsas the authoritative live counter; otherwise a contributor will inevitably ask "which one do I read?"CrashBanneris a UI-state value inAppState. Borderline — it represents persistent UI state ("user has not yet dismissed the crash banner") and surviving across view recreations is the whole point. Arguably fine. Filed as a minor finding.
13. Reads as domain prose — Partial
Most of AppState's 24 fields carry domain weight. Cassette, LoadedRecipe, RecipeCatalog, ActiveRun, RunHistory, ActiveAlarms, LatestTagValues, LatestFrame, RecentDefects, StageX/Y, IsMotionHomed, Safety — a fab engineer reads these as fab concepts, not as state-management generics.
Failure points:
Simulator vocabulary in production state.
SelectedSimulatorProfileandSimulatorProfileCatalog(AppState.cs:32-33) carry the prototype's simulator-shaped reality into what should be domain state. SLICE-2.7 already recognized this concern and renamedRunSummary.SimulatorProfileName → OperatingProfileName(perroadmap-progress.md, 2026-05-11). But theAppState-level fields were not renamed in the same pass. The vocabulary leak persists at the store surface.Scaffolding-not-domain names.
PipelineCountersandOperationalCounters(AppState.cs:30-31) name what they are (counter aggregates) rather than what they describe (frame/run throughput; equipment-run history). Acceptable for instrumentation; worth a comment so they don't become a template for new fields.AppStateitself is a generic name. ADR-001 enshrined it and changing the type name would ripple through hundreds of references. Not a finding I'd recommend acting on, but worth naming: a green-field design might chooseMachineStateorInspectionStatefor the domain hook.
Recommendation: rename SelectedSimulatorProfile/SimulatorProfileCatalog to ActiveOperatingProfile/OperatingProfileCatalog (matching the SLICE-2.7 rename of RunSummary.OperatingProfileName). Mechanical edit; would have to be a passover slice with the rename + tests + migration of any persisted column names.
14. Invalid states unrepresentable — Partial
The record definition (AppState.cs:12-37) allows several physically-impossible combinations to be constructed by with { }:
WorkflowState = Idletogether withActiveRun != null. A machine cannot be "idle" and "have a run in progress." Type allows it.ConnectionState = Disconnectedtogether withLatestFrame != nullorLatestTagValuespopulated. No camera, no frames; no tag stream, no tag values. Type allows it.WorkflowState = FaultedwithActiveAlarmsempty. A fault without an alarm is contradictory in the model. Type allows it.IsMotionHomed = truetogether withMotionState = NotReady. Contradictory. Type allows it.
The enforcement lives in the reducers and in WorkflowService (871 lines, the state machine). This is acceptable as long as nothing constructs AppState instances outside the store. In practice, nothing does. But the type surface is informationally misleading — a reader of AppState.cs cannot tell which combinations are legal.
A stricter design would use a discriminated union over WorkflowState, with the run/recipe/cassette context bundled into the active variant:
abstract record WorkflowPhase;
record Idle() : WorkflowPhase;
record Running(ActiveRunState Run, Recipe Recipe) : WorkflowPhase;
record Faulted(Alarm RootCause, IReadOnlyList<Alarm> AdditionalAlarms) : WorkflowPhase;
// ...This is a real refactor — every reducer site, every selector, every command-guard subscriber would need to change. It would also pay for itself only if the prototype is moving toward fab integration where invariants must be enforceable across teams. For the current scope, I do not recommend it. Filing the option here so it's a known choice, not an oversight.
Recommendation: at minimum, add [Pure] / XMLdoc-level invariants on AppState that name the legal combinations, and an architecture test that asserts a sample of forbidden constructions throws. This makes the discipline explicit without a rewrite.
15. API surface — small, intentional, honestly named — Pass (with blemishes)
The IAppStateStore surface is four members: Current, StateChanged, Update, Subscribe<T>. Right-sized.
Method names mostly meet the bar:
Update(reducer)— names the action.Subscribe<T>(selector, onChange, comparer)— standard observer vocabulary, well-typed.Current— clear.WithDiagnosticsEntry,WithRunTerminated,WithAlarmRaised,WithCatalogResults— record-Withidiom; reads naturally.
Blemishes:
AppMetrics.RegisterTagsActiveProvider(AppMetrics.cs:122-123). The name describes the mechanism (registering a callback) rather than the intent (supplying the polling source for theTagsActivegauge). Why a provider pattern? Because the gauge is created inAppMetrics's constructor but the producer that knows the count (SimulatedTagSource) isn't constructed yet. A cleaner alternative: pass the provider to the constructor; or exposeTagsActiveasIObserver<long>-style instead of an observable gauge. Tiny finding; named for honesty about API quality.CrashStateExtensions.WithClearedCrashBanner(CrashStateExtensions.cs:26-27).WithCrashBannerClearedwould read more naturally and match theWith{Noun}{Adjective}pattern. Trivial.[CallerMemberName]/[CallerFilePath]onUpdateandSubscribe<T>are public surface that call sites do not pass. The XMLdoc tells the reader "do not pass explicitly" — but they're still public parameters. This is the canonical .NET idiom for caller-attribute capture and acceptable; alternatives (separateUpdateWithCalleroverloads) would be worse.
No finding requires a recommendation beyond items 1–2 above as small renames.
16. No accidental complexity, no dead surface — Partial
What's clean:
- Zero
TODO/HACK/FIXMEin the Application layer (verified by repository-wide search). - No commented-out code paths in the store / extensions / pipelines.
- Helper methods in
AppStateExtensionsare each used by exactly one writer; nothing speculative.
What isn't:
Dual notification channels.
IAppStateStore.StateChanged(legacy full-state event) runs in parallel withSubscribe<T>(per-slice). Both fire on everyUpdate. XMLdoc atIAppStateStore.cs:18-24says "retained for back-compat with test fakes; new consumers should use Subscribe<T>." Production callers were migrated in SLICE-2.4. Two notification paths exist solely because test fakes haven't been touched yet.null!workaround atAppState.cs:36.IReadOnlyList<Defect> RecentDefects = null!. This silences the nullable-reference-types warning for a default value the record's positional ordering requires. The actualnullvalue never leaks (theInitialfactory and every reducer set it to[]). But the type signature lies about nullability. The fix is to drop the default value (moveRecentDefectsearlier in the positional list, or convert it to a non-defaulted required field). Both touch everyAppStateconstruction site once.MaxDiagnosticsEntries = 200magic number inAppStateExtensions.cs:11. This is a domain bound that callers might reasonably want to read (e.g. for UI virtualization or test fixtures). It lives buried in aninternal static classwhileRecentDefectsWindow = 100lives publicly onAppState. Inconsistent + arguably mis-located.Two write paths for diagnostics.
WithDiagnosticsEntry(AppStateExtensions.cs:17-29) andDiagnosticsTimelineSink.Emit(DiagnosticsTimelineSink.cs:42) both produce diagnostics entries — the latter via the Serilog pipeline, the former by direct call. Not a duplication exactly; each has its own caller pattern (Serilog Warning+ vs. direct call from reducers). But the path-from-source-to-entry is non-obvious. A reader seesWithDiagnosticsEntryin two unrelated places and has to reason about which Serilog levels also get there. Worth a flow diagram or a "two paths in" comment onRecentDiagnostics.
Recommendations:
- R-16a: retire
StateChanged(already in §11 follow-ups list as F-1). Net change: remove ~5 lines + migrate test fakes (a few file edits). - R-16b: drop the
null!default — promoteRecentDefectsin positional order or make it required. Single-pass edit. - R-16c: move
MaxDiagnosticsEntriestoAppStateaspublic const int RecentDiagnosticsWindow = 200, matchingRecentDefectsWindow. One-line move. - R-16d: comment the two-paths-in nature of
RecentDiagnosticsand link toDiagnosticsTimelineSink.
Cross-cutting findings
These are observations that span multiple criteria and are worth surfacing once:
F-1. The legacy StateChanged event is still on IAppStateStore. XMLdoc (IAppStateStore.cs:18-24) says "retained for back-compat with test fakes; new consumers should use Subscribe<T> instead." The SLICE-2.4 work migrated production callers; if test fakes can be migrated too, removing it shrinks the public surface, removes a hidden full-state notification per Update, and forces all subscribers through the change-detected path. Filed as a follow-up to SLICE-2.4.
F-2. AppStateStore.Current takes the lock to read. _current is a reference; reference assignment is atomic on .NET; the publish path inside Update happens before subsequent reads regardless of locking in the read path. The Volatile.Read(ref _current) pattern (or just _current after marking it volatile) would be sufficient and would remove a small but real source of Current-getter contention. Not a correctness issue — performance/contention only.
F-3. The MaxDiagnosticsEntries = 200 constant lives in AppStateExtensions.cs while RecentDefectsWindow = 100 lives on AppState. Inconsistent. Either both should live on AppState (as public bounds that callers can reference) or both should live in the extension that mutates them. Tiny but worth fixing while the area is being touched.
F-4. Subscribers with s => true (MainViewModel.cs:772, 779) fire on every state change. Two of the 16 command-guard subscriptions use this pattern because the commands have no specific guard. They each cost one callback per Update, partially undermining the SLICE-2.4 subscriber-reduction story for those two commands. Acceptable, but a comment at the call sites pointing this out would prevent the pattern from being copy-pasted elsewhere.
Recommendations, ranked
In order of value-per-effort:
Serialize subscriber dispatch. Extend the locked region in
AppStateStore.Updateto include theforeach (var sub in snapshot)loop, or move dispatch into a serial queue. Fixes criterion 1 outright; sub-millisecond cost given current subscriber rates. Single file edit. (Maps to: §1.)Document the subscriber-latency contract + add
subscriber.callback.latency.microshistogram. Comment the implicit "callbacks must be cheap" rule onIAppStateStore.Subscribe<T>. Add a histogram so a future slow subscriber is visible indotnet-countersrather than discovered via field complaints. (Maps to: §2.)Widen the crash file to a full JSON snapshot. Replace the hand-written 4-field block in
CrashReporter.BuildCrashContentwithJsonSerializer.Serialize(snapshot, indented). Add[JsonIgnore]to byte-array slices (LatestFrame.Pixels) so files stay small. (Maps to: §4 and feeds §6b.)Add a service-mode "Save State Snapshot…" command. Same serialization path as #3, on-demand from the UI rather than crash-triggered. Gated by a service-mode flag so operators don't see it. (Maps to: §6b.)
Introduce a durable
StateTransitionLog. Append-only log written fromAppStateStore.Updateitself capturing(timestamp, caller, sliceDelta). Sqlite-persisted via the existingMigrationRunner. This single mechanism is the load-bearing piece for criteria 5, 6b, and 7 — without it, those criteria are unmeetable; with it, all three become reachable. (Maps to: §5, §6b, §7.)Add
EquipmentStateProjector(pure function over slices). Once #5 exists, the E10 projection is a small additional file:AppState → E10Statepure function + a thin service that subscribes, projects, and writes transitions to the log from #5. (Maps to: §7.)Lift
RecentDiagnosticsout ofAppStateintoIDiagnosticsLog. Once a durable log exists, the in-memory diagnostics ring buffer is a UI affordance, not state.AppStateshrinks by one field,WithDiagnosticsEntrygoes away, and the "events vs. state" line gets sharper. (Maps to: §12 finding 1.)Annotate durable vs. ephemeral slices on
AppState. XMLdoc-only at minimum; an architecture test that asserts the annotations match hydration code is the stronger form. Low cost, high contributor-clarity payoff. (Maps to: §4.)Retire the legacy
StateChangedevent once test fakes are migrated. Pure cleanup; shrinks the public surface. (Maps to: F-1.)Cleanup pass:
_currentread without lock;MaxDiagnosticsEntrieslocation; comment ons => trueguard subscriptions;null!removal;WithClearedCrashBanner→WithCrashBannerCleared. Tiny edits, none correctness-affecting. (Maps to: F-2, F-3, F-4, R-16b, R-16c, §15 blemish 2.)Rename
SelectedSimulatorProfile/SimulatorProfileCatalog→ActiveOperatingProfile/OperatingProfileCatalog. Closes the SLICE-2.7 rename that stopped atRunSummary. Vocabulary cleanup; mechanical edit with one migration touch for any persisted column. (Maps to: §13 finding 1.)Add invariants documentation + an architecture test for forbidden
AppStateconstructions. Names the legal combinations of(WorkflowState, ActiveRun, ConnectionState, ...)explicitly; asserts that forbidden constructions don't pass an architecture test. Cheaper than a discriminated-union refactor; preserves §10 cognitive load. (Maps to: §14.)Comment the two-paths-in nature of
RecentDiagnostics. One-comment edit onAppState.RecentDiagnosticspointing at bothWithDiagnosticsEntryandDiagnosticsTimelineSink. (Maps to: R-16d.)Rename
RegisterTagsActiveProvideror restructureAppMetricsconstruction to remove the post-hoc-registration pattern. Either pass the provider to theAppMetricsconstructor (cleanest) or rename to express intent (UseTagsActiveSource). (Maps to: §15 blemish 1.)
Total roadmap impact: items 1, 2, 3, 8, 9, 10, 11, 12, 13, 14 are mechanical and could land in a single small slice ("SLICE-2.9: AppState design + craftsmanship follow-ups"). Items 4, 5, 6, 7 are larger — they enable real new capability (E10, service-engineer snapshots) and likely warrant their own slice in Phase 3 or 4.
What we explicitly chose NOT to recommend
Listed so future drift toward these is informed, not accidental:
- Adopting actions / reducer tables / middleware / a state-change pipeline. None of these address the findings; all of them would raise the cognitive-load cost (§10) and the marginal-cost-per-field (§9) without improving correctness. The current
Func<AppState, AppState>form is correct and minimal. - Splitting
AppStateinto multiple sub-stores. The 2026-05-09 doc considered this as SLICE-2.1; SLICE-2.0 measurements deferred it (alloc share 0.5%, gate ≥10%). The findings in this review do not re-trigger that slice. Atomic multi-slice transitions (§3) are easier with a single store; splitting would push consistency into the caller. - Replacing the lock with a
Channel<T>of reducers. The lock + sync dispatch is correct and well-instrumented. A channel introduces a queue depth, lag, and a separate failure mode without addressing any finding. - Adding a state-diff visualization or DevTools-style panel. Tempting under §6b but disproportionate. A JSON file an engineer can
cator load into any viewer covers the use case at a fraction of the build cost. - Persisting the full
AppStatefor crash recovery. The current pattern (durable slices hydrated from purpose-built stores; ephemeral slices reset toInitial) is correct for an industrial app where hardware state must be re-observed on restart, not replayed from disk. PersistingMotionState,CameraState, etc. across crashes would lie about hardware reality.
Conclusion
The user's hypothesis going in — "the design is JS-frontend-shaped and over-abstracted" — does not survive contact with the code. What is shaped like a Redux store is the vocabulary of the 2026-05-09 Architectural Foundations doc, not the implementation; the implementation is markedly leaner than what that doc endorses.
The real gaps are in four domains, none of which is "JS-frontend pattern":
- Subscriber dispatch concurrency — a real bug class, addressable in a single file edit (§1, recommendation 1).
- Operational forensics for a fab deployment — no on-demand state dump, no equipment-state transition log, no E10 projection (§5, §6b, §7, recommendations 3–6). These are not over-abstraction; they are under-abstraction. The right response is to add a single durable transition log and one pure projection function — both new mechanisms, not refactors.
- Craftsmanship and domain fit — simulator vocabulary leaks into production state (§13), the record allows physically-impossible state combinations (§14), the legacy
StateChangedevent runs in parallel withSubscribe<T>(§16), and anull!workaround papers over a default-value nullability issue (§16). None of these are large; together they are what separates "competent" from "world-class for this domain." - A small list of housekeeping items — F-1 through F-4 plus the diagnostics-in-state finding.
None of this argues for replacing the store, splitting it, or wrapping it in more layers. The pattern is correct; the gaps are at its edges and in its naming. A focused follow-up slice (recommendations 1–3, 8–14) would lift the design to the bar the user articulated — "world-class framework that fits perfectly with the domain problem" — without changing its shape.