SLICE-004: Operational Maturity

Status: Completed
Date: 2026-04-16
Depends on: Requirements; ADR-001: Use Central App State Store; ADR-002: File-Backed Run History Store Before Database Persistence; ADR-004: Use One Operational Maturity Slice Before Specialized Modules

Goal

Consolidate the remaining core realism work into one bounded slice that makes the prototype easier to inspect, recover, configure, and demonstrate, without reopening the architecture across several tiny follow-up slices.

Why This Slice

After persistent run history and JSON recipe management, the next highest-value improvement is operational maturity:

better diagnostics and runtime visibility
more explicit alarm acknowledgment and recovery semantics
configurable simulator behavior for different teaching and demo scenarios
richer inspection results and live run metrics

Treating these as one umbrella slice keeps the shared context in one place and lets external AI tools work through a medium-sized task pack instead of one very large prompt or several overly fragmented ones.

Requirements Coverage

This slice extends or activates these requirement areas:

03. Functional Scope: inspection results, metrics, fault support, and diagnostics behavior
04. UI and Technical Requirements: diagnostics surface, fault controls, measurable pipeline behavior, and testability
05. Failure Modes and Workflow Requirements: explicit fault handling, blocked commands, and recovery semantics
07. AI Delivery Constraints and Roadmap: grouped medium-sized tasks for AI-assisted implementation

In Scope

extend canonical app state with structured diagnostics timeline and operational counters
provide a richer operational workspace in the UI for alarms, diagnostics, fault injection, live metrics, and selected simulator profile
harden alarm acknowledgment, fault clearance, and recovery or reset semantics
introduce named simulator profiles loaded from configuration
enrich live inspection results and persisted run summaries with more useful metrics
keep new behavior testable without launching the full UI where practical

Out of Scope

historical charts or long-term analytics dashboards
a new multi-page or multi-window application architecture
advanced image synthesis or realistic computer vision output
hot reload or editing UI for simulator profiles
explicit adoption of a third-party state machine library
performance instrumentation as a required outcome of this slice

Runtime Behavior

Operational Workspace

The app should expose one richer diagnostics-oriented workspace or pane rather than scattering these features across several unrelated screens.

That workspace should make it possible to see:

active alarms
recent diagnostics timeline entries
injected fault controls
selected simulator profile
live run metrics and relevant counters

The goal is not a full production HMI, but a believable operator and developer surface for understanding what the system is doing.

Diagnostics Timeline

The system must maintain a structured diagnostics timeline in canonical app state.

At minimum, timeline entries should capture:

timestamp
severity or importance
source or subsystem
short message
optional run correlation data where useful

The timeline must be bounded so it cannot grow without limit during long sessions.

The timeline should record major operational events such as:

connect and disconnect
recipe load or refresh events
homing
run start, stop, abort, complete, and fault
alarm acknowledgment
recovery or reset
simulator profile changes
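
As an illustration of the bounded-timeline requirement, the following is a minimal sketch rather than the project's implementation. The names DiagnosticsTimeline, TimelineEntry, Severity, and the default capacity of 500 are assumptions made for the example; Python is used only for illustration. The idea is a fixed-capacity buffer that evicts the oldest entry once full and counts evictions so that the loss is itself observable.

```python
from collections import deque
from dataclasses import dataclass
from datetime import datetime, timezone
from enum import Enum
from typing import List, Optional


class Severity(Enum):
    INFO = "info"
    WARNING = "warning"
    ERROR = "error"


@dataclass(frozen=True)
class TimelineEntry:
    timestamp: datetime
    severity: Severity
    source: str                    # subsystem that produced the entry
    message: str
    run_id: Optional[str] = None   # optional run correlation data


class DiagnosticsTimeline:
    """Bounded timeline: once full, the oldest entry is evicted."""

    def __init__(self, capacity: int = 500) -> None:  # 500 is an assumed default
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self._entries = deque(maxlen=capacity)
        self.evicted = 0  # number of entries pushed out by newer ones

    @property
    def capacity(self) -> int:
        return self._entries.maxlen

    def record(self, severity: Severity, source: str, message: str,
               run_id: Optional[str] = None) -> TimelineEntry:
        if len(self._entries) == self._entries.maxlen:
            self.evicted += 1  # deque(maxlen=...) drops the oldest on append
        entry = TimelineEntry(datetime.now(timezone.utc), severity,
                              source, message, run_id)
        self._entries.append(entry)
        return entry

    def recent(self, count: int) -> List[TimelineEntry]:
        """Return up to `count` newest entries, oldest first."""
        if count <= 0:
            return []
        return list(self._entries)[-count:]
```

A documented capacity plus an eviction counter lets a test assert the bound directly without launching the UI.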

Alarm Lifecycle and Recovery

Alarm handling must become more explicit than it was in the earlier slices.

For this slice:

acknowledgment is separate from condition clearance
acknowledgment marks that the operator has seen the alarm but does not by itself re-enable blocked commands
critical faults still transition active work to Faulted
after a critical fault, Start, Home, and motion commands remain blocked until the fault condition is cleared and an explicit recovery or reset action occurs
recovery must create diagnostics entries and preserve the history of the faulted run
after successful recovery, the machine may return to Idle or Ready depending on current prerequisites

The slice does not need a large enterprise alarm model, but it must make the distinction between acknowledged (seen), cleared, and recovered explicit.
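
The rules above can be sketched as a small guard model. This is a hedged illustration, not the project's state machine: Machine, Alarm, the homed and recipe_loaded prerequisite flags, and the command names are hypothetical, and the sketch assumes recovery does not require prior acknowledgment because the rules above require only clearance plus an explicit recovery action.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class MachineState(Enum):
    IDLE = "Idle"
    READY = "Ready"
    RUNNING = "Running"
    FAULTED = "Faulted"


@dataclass
class Alarm:
    code: str
    acknowledged: bool = False       # operator has seen the alarm
    condition_cleared: bool = False  # underlying fault is gone


@dataclass
class Machine:
    state: MachineState = MachineState.READY
    alarm: Optional[Alarm] = None
    homed: bool = True               # hypothetical prerequisite
    recipe_loaded: bool = True       # hypothetical prerequisite
    diagnostics: List[str] = field(default_factory=list)

    def raise_critical_fault(self, code: str) -> None:
        self.alarm = Alarm(code)
        self.state = MachineState.FAULTED
        self.diagnostics.append(f"fault raised: {code}")

    def acknowledge(self) -> None:
        # Acknowledgment only marks the alarm as seen; nothing is unblocked.
        if self.alarm is not None:
            self.alarm.acknowledged = True
            self.diagnostics.append(f"alarm acknowledged: {self.alarm.code}")

    def clear_condition(self) -> None:
        if self.alarm is not None:
            self.alarm.condition_cleared = True
            self.diagnostics.append(f"condition cleared: {self.alarm.code}")

    def can_execute(self, command: str) -> bool:
        # Start, Home, and motion commands stay blocked while Faulted.
        if command in ("Start", "Home", "Move"):
            return self.state is not MachineState.FAULTED
        return True

    def recover(self) -> bool:
        """Explicit recovery; refused until the fault condition is cleared."""
        if self.alarm is None or not self.alarm.condition_cleared:
            self.diagnostics.append("recovery rejected: condition not cleared")
            return False
        self.diagnostics.append(f"recovered from: {self.alarm.code}")
        self.alarm = None
        ready = self.homed and self.recipe_loaded
        self.state = MachineState.READY if ready else MachineState.IDLE
        return True
```

Keeping acknowledged and condition_cleared as separate flags makes the acknowledgment-versus-recovery guard straightforward to test in isolation.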

Simulator Profiles

The simulator should support named profiles loaded from configuration rather than relying only on hard-coded runtime constants.

For the first profile version:

available profiles are loaded at startup
one profile is selected as the active profile
profile selection is visible to the operator
profile changes apply only to future operations and must not silently mutate an active run
profile changes create diagnostics entries

Profiles may shape behavior such as:

motion timing
telemetry cadence
preview frame cadence
defect density or result distribution
fault sensitivity or other safe scenario parameters
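
One way to picture profile loading and the "future operations only" rule is the sketch below. The JSON shape, the field names (move_ms_per_mm, telemetry_interval_ms, defect_density), and the ProfileSelector and Run types are assumptions for illustration; the actual configuration format is not defined in this spec.

```python
import json
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical configuration content; the real file format may differ.
EXAMPLE_CONFIG = """
{
  "profiles": [
    {"name": "default", "move_ms_per_mm": 2.0,
     "telemetry_interval_ms": 100, "defect_density": 0.05},
    {"name": "slow-demo", "move_ms_per_mm": 8.0,
     "telemetry_interval_ms": 250, "defect_density": 0.20}
  ]
}
"""


@dataclass(frozen=True)  # frozen: a profile cannot be mutated under a run
class SimulatorProfile:
    name: str
    move_ms_per_mm: float
    telemetry_interval_ms: int
    defect_density: float


@dataclass(frozen=True)
class Run:
    profile: SimulatorProfile  # captured once, at run start


def load_profiles(config_text: str) -> Dict[str, SimulatorProfile]:
    """Parse all named profiles once, for example at startup."""
    raw = json.loads(config_text)
    profiles = {p["name"]: SimulatorProfile(**p) for p in raw["profiles"]}
    if not profiles:
        raise ValueError("configuration defines no simulator profiles")
    return profiles


class ProfileSelector:
    def __init__(self, profiles: Dict[str, SimulatorProfile],
                 active_name: str) -> None:
        if active_name not in profiles:
            raise KeyError(active_name)
        self._profiles = profiles
        self.active = profiles[active_name]
        self.diagnostics: List[str] = []

    def select(self, name: str) -> None:
        if name not in self._profiles:
            raise KeyError(name)
        previous = self.active.name
        self.active = self._profiles[name]
        self.diagnostics.append(f"profile changed: {previous} -> {name}")

    def start_run(self) -> Run:
        # The run keeps the profile that was active when it started, so a
        # later select() call affects only runs started afterwards.
        return Run(profile=self.active)
```

Because the run holds an immutable snapshot, switching profiles mid-run changes only what the next run sees.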

Inspection Results and Run Metrics

Inspection results should become more informative than a minimal defect count.

For this slice, the system should expose richer but still simple results such as:

scan points completed versus total
elapsed run duration
total detected defects
defect counts grouped by simple severity or category where practical
selected simulator profile name
completion reason

These metrics should be visible during active work where appropriate and should also flow into persisted run summaries and history projection.
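
A possible shape for these metrics is sketched below, assuming a single RunMetrics object that is updated during the run and then projected into a plain dictionary for the persisted summary. The type, field names, and severity labels are hypothetical and are not taken from the existing run history schema.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, Iterable, Optional


@dataclass
class RunMetrics:
    profile_name: str
    total_points: int
    points_completed: int = 0
    elapsed_seconds: float = 0.0
    defects_by_severity: Counter = field(default_factory=Counter)
    completion_reason: Optional[str] = None

    @property
    def total_defects(self) -> int:
        return sum(self.defects_by_severity.values())

    def record_point(self, defect_severities: Iterable[str],
                     elapsed_seconds: float) -> None:
        """Update live metrics after one scan point finishes."""
        self.points_completed += 1
        self.elapsed_seconds = elapsed_seconds
        self.defects_by_severity.update(defect_severities)

    def to_summary(self, completion_reason: str) -> Dict[str, object]:
        """Project the metrics into a JSON-friendly summary for persistence."""
        self.completion_reason = completion_reason
        return {
            "profile_name": self.profile_name,
            "points_completed": self.points_completed,
            "total_points": self.total_points,
            "elapsed_seconds": self.elapsed_seconds,
            "total_defects": self.total_defects,
            "defects_by_severity": dict(self.defects_by_severity),
            "completion_reason": completion_reason,
        }
```

Using the same object for the live view and the persisted summary keeps the two from drifting apart, which is the property the history verification needs to check.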

Observability and Counters

Where queues, channels, or coalescing behavior already exist, the system should expose enough counters or diagnostics to understand when data is processed, dropped, or coalesced.

This slice does not require a full telemetry platform, but it should make important backpressure behavior visible through state, diagnostics, logs, or a small diagnostics surface.
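
To make the counter idea concrete, here is a minimal sketch of two common backpressure shapes: a bounded queue that drops the oldest item when full, and a latest-value slot that coalesces unread frames. Whether the prototype uses either shape is an assumption; PipelineCounters, BoundedTelemetryQueue, and LatestFrameSlot are hypothetical names.

```python
from collections import deque
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class PipelineCounters:
    enqueued: int = 0
    processed: int = 0
    dropped: int = 0     # items discarded because a queue was full
    coalesced: int = 0   # unread items overwritten by a newer one


class BoundedTelemetryQueue:
    """Drop-oldest bounded queue that counts every drop."""

    def __init__(self, capacity: int, counters: PipelineCounters) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self._items: deque = deque()
        self._capacity = capacity
        self._counters = counters

    def offer(self, item: Any) -> None:
        if len(self._items) >= self._capacity:
            self._items.popleft()          # make room by dropping the oldest
            self._counters.dropped += 1
        self._items.append(item)
        self._counters.enqueued += 1

    def take(self) -> Optional[Any]:
        if not self._items:
            return None
        self._counters.processed += 1
        return self._items.popleft()


class LatestFrameSlot:
    """Holds only the newest preview frame; overwrites count as coalesced."""

    def __init__(self, counters: PipelineCounters) -> None:
        self._frame: Optional[Any] = None
        self._has_frame = False
        self._counters = counters

    def publish(self, frame: Any) -> None:
        if self._has_frame:
            self._counters.coalesced += 1  # previous frame was never read
        self._frame = frame
        self._has_frame = True

    def take(self) -> Optional[Any]:
        if not self._has_frame:
            return None
        frame, self._frame, self._has_frame = self._frame, None, False
        self._counters.processed += 1
        return frame
```

Exposing the shared counters object through app state or the diagnostics surface would be enough to show when data was dropped or coalesced.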

Acceptance Criteria

This slice is satisfied only if all of the following are true:

The system records structured diagnostics timeline entries for major operational events including connection, recipe load, homing, run state changes, faults, acknowledgment, recovery, and profile changes.
Diagnostics timeline state is exposed through canonical app state with a documented bounded capacity.
The UI provides an operational workspace or pane showing active alarms, recent diagnostics, fault injection controls, selected simulator profile, and live metrics.
Injecting a critical fault during active work raises an alarm, transitions the workflow to Faulted, preserves the run summary, and blocks invalid commands until the condition is cleared and an explicit recovery or reset occurs.
Alarm acknowledgment is tracked separately from clearance and recovery, and acknowledgment alone does not re-enable blocked commands.
The operator can view and switch between named simulator profiles loaded from configuration, and profile changes apply only to future operations.
Active runs expose richer metrics and results, and persisted run summaries include the richer fields introduced by this slice.
Core timeline, recovery, simulator profile, and result-metric behavior are covered by automated tests.

Verification Notes

The implementation task for this spec must include verification for:

bounded diagnostics timeline behavior
acknowledgment versus recovery guard behavior
faulted-run preservation and post-recovery state transition behavior
simulator profile loading and selection rules
richer run metrics flowing into persisted history
visibility of queue, drop, or coalescing counters where such behavior exists
