
SLICE-004: Operational Maturity

Goal

Consolidate the remaining core realism work into one bounded slice that makes the prototype easier to inspect, recover, configure, and demonstrate, without reopening the architecture across several small follow-up slices.

Why This Slice

After persistent run history and JSON recipe management, the next highest-value improvement is operational maturity:

  • better diagnostics and runtime visibility
  • more explicit alarm acknowledgment and recovery semantics
  • configurable simulator behavior for different teaching and demo scenarios
  • richer inspection results and live run metrics

Treating these as one umbrella slice keeps the shared context in one place and lets external AI tools work through a medium-sized task pack instead of one very large prompt or several overly fragmented ones.

Requirements Coverage

This slice extends or activates these requirement areas:

In Scope

  • extend canonical app state with a structured diagnostics timeline and operational counters
  • provide a richer operational workspace in the UI for alarms, diagnostics, fault injection, live metrics, and selected simulator profile
  • harden alarm acknowledgment, fault clearance, and recovery or reset semantics
  • introduce named simulator profiles loaded from configuration
  • enrich live inspection results and persisted run summaries with more useful metrics
  • keep new behavior testable without launching the full UI where practical

Out of Scope

  • historical charts or long-term analytics dashboards
  • a new multi-page or multi-window application architecture
  • advanced image synthesis or realistic computer vision output
  • hot reload or editing UI for simulator profiles
  • explicit adoption of a third-party state machine library
  • performance instrumentation as a required outcome of this slice

Runtime Behavior

Operational Workspace

The app should expose one richer diagnostics-oriented workspace or pane rather than scattering these features across several unrelated screens.

That workspace should make it possible to see:

  • active alarms
  • recent diagnostics timeline entries
  • injected fault controls
  • selected simulator profile
  • live run metrics and relevant counters

The goal is not a full production HMI, but a believable operator and developer surface for understanding what the system is doing.
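One way to keep this pane testable without launching the UI is to derive everything it shows from a snapshot of canonical app state through a pure function. The sketch below is a minimal illustration in Python under stated assumptions: the state dictionary's shape, the `WorkspaceView` and `build_workspace_view` names, and the rule that fault injection is enabled only while the simulator is connected are all hypothetical, not taken from this spec.

```python
from dataclasses import dataclass
from typing import Any, Dict, Tuple


@dataclass(frozen=True)
class WorkspaceView:
    """Hypothetical view model: exactly what the operational pane renders."""
    active_alarms: Tuple[str, ...]
    recent_diagnostics: Tuple[str, ...]   # newest first
    fault_injection_enabled: bool
    profile_name: str
    progress_text: str


def build_workspace_view(state: Dict[str, Any], recent_limit: int = 5) -> WorkspaceView:
    """Pure projection from an (assumed) canonical state snapshot to the pane's view model."""
    # Alarms whose condition has not cleared yet; acknowledged ones are marked, not hidden.
    alarms = tuple(
        a["code"] + (" (ack)" if a["acknowledged"] else "")
        for a in state["alarms"]
        if not a["cleared"]
    )
    # Take the last N timeline entries and show the newest first.
    recent = tuple(e["message"] for e in reversed(state["timeline"][-recent_limit:]))
    run = state.get("run")
    progress = f"{run['completed']}/{run['total']} points" if run else "no active run"
    return WorkspaceView(
        active_alarms=alarms,
        recent_diagnostics=recent,
        fault_injection_enabled=state["connected"],  # assumption: faults need a live simulator
        profile_name=state["profile"],
        progress_text=progress,
    )
```

Because the function has no UI dependencies, pane content can be asserted in ordinary unit tests.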

Diagnostics Timeline

The system must maintain a structured diagnostics timeline in canonical app state.

At minimum, timeline entries should capture:

  • timestamp
  • severity or importance
  • source or subsystem
  • short message
  • optional run correlation data where useful

The timeline must be bounded so it cannot grow without limit during long sessions.

The timeline should record major operational events such as:

  • connect and disconnect
  • recipe load or refresh events
  • homing
  • run start, stop, abort, complete, and fault
  • alarm acknowledgment
  • recovery or reset
  • simulator profile changes
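A bounded timeline with the fields listed above can be sketched with a fixed-capacity deque that evicts the oldest entry once full. This is a minimal Python illustration, not the project's implementation: the class names, the three severity levels, and the default capacity of 500 are assumptions, and the eviction counter is an optional extra that makes the bound itself observable.

```python
from collections import deque
from dataclasses import dataclass
from datetime import datetime, timezone
from enum import Enum
from typing import Deque, List, Optional


class Severity(Enum):
    INFO = "info"
    WARNING = "warning"
    CRITICAL = "critical"


@dataclass(frozen=True)
class TimelineEntry:
    timestamp: datetime
    severity: Severity
    source: str                   # subsystem, e.g. "motion", "recipes", "simulator"
    message: str
    run_id: Optional[str] = None  # optional run correlation


class DiagnosticsTimeline:
    """Bounded diagnostics timeline: once full, the oldest entry is evicted."""

    DEFAULT_CAPACITY = 500  # hypothetical value; document whatever the real bound is

    def __init__(self, capacity: int = DEFAULT_CAPACITY) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._entries: Deque[TimelineEntry] = deque(maxlen=capacity)
        self.evicted = 0  # how many entries the bound has discarded so far

    def record(self, severity: Severity, source: str, message: str,
               run_id: Optional[str] = None) -> TimelineEntry:
        if len(self._entries) == self.capacity:
            self.evicted += 1  # deque(maxlen=...) silently drops the oldest item
        entry = TimelineEntry(datetime.now(timezone.utc), severity, source, message, run_id)
        self._entries.append(entry)
        return entry

    def recent(self, limit: int = 20) -> List[TimelineEntry]:
        """Newest entries first, for the diagnostics pane."""
        return list(reversed(self._entries))[:limit]

    def __len__(self) -> int:
        return len(self._entries)
```

With `capacity=3`, recording five events leaves three entries in the timeline and an evicted count of 2.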

Alarm Lifecycle and Recovery

Alarm handling must become more explicit than in the earlier slices.

For this slice:

  • acknowledgment is separate from condition clearance
  • acknowledgment marks that the operator has seen the alarm but does not by itself re-enable blocked commands
  • critical faults still transition active work to Faulted
  • after a critical fault, Start, Home, and motion commands remain blocked until the fault condition is cleared and an explicit recovery or reset action occurs
  • recovery must create diagnostics entries and preserve the history of the faulted run
  • after successful recovery, the machine may return to Idle or Ready depending on current prerequisites

The slice does not need a large enterprise alarm model, but it must make the distinction between acknowledged (seen), cleared, and recovered states explicit.
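The rules above separate three facts: the operator has seen the alarm, the condition is gone, and an explicit recovery has run. The following Python sketch models them as separate flags plus a recovery guard. It is illustrative only: the `Machine` class, the command names in `BLOCKED_WHEN_FAULTED`, the `homed` flag standing in for "current prerequisites", and the choice to fault on any critical alarm (not only during a run) are assumptions.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Dict, List, Optional


class WorkflowState(Enum):
    IDLE = auto()      # prerequisites (e.g. homing) not yet satisfied
    READY = auto()
    RUNNING = auto()
    FAULTED = auto()


@dataclass
class Alarm:
    code: str
    critical: bool
    acknowledged: bool = False  # operator has seen it
    cleared: bool = False       # underlying condition is gone


class Machine:
    # Hypothetical command names for Start, Home, and motion commands.
    BLOCKED_WHEN_FAULTED = {"start", "home", "jog"}

    def __init__(self) -> None:
        self.state = WorkflowState.READY
        self.homed = True                        # stand-in for "current prerequisites"
        self.alarms: List[Alarm] = []
        self.run_id: Optional[str] = None
        self.history: List[Dict[str, str]] = []  # persisted run summaries
        self.events: List[str] = []              # stand-in for the diagnostics timeline

    def can_execute(self, command: str) -> bool:
        return not (self.state is WorkflowState.FAULTED
                    and command in self.BLOCKED_WHEN_FAULTED)

    def start_run(self, run_id: str) -> bool:
        if not self.can_execute("start") or self.state is not WorkflowState.READY:
            return False
        self.state, self.run_id = WorkflowState.RUNNING, run_id
        self.events.append(f"run started: {run_id}")
        return True

    def raise_alarm(self, code: str, critical: bool) -> None:
        self.alarms.append(Alarm(code, critical))
        self.events.append(f"alarm raised: {code}")
        if critical:
            if self.state is WorkflowState.RUNNING and self.run_id is not None:
                # Preserve the faulted run instead of discarding it.
                self.history.append({"run_id": self.run_id, "outcome": "faulted"})
                self.run_id = None
            self.state = WorkflowState.FAULTED

    def acknowledge(self, code: str) -> None:
        for alarm in self.alarms:
            if alarm.code == code:
                alarm.acknowledged = True
        self.events.append(f"alarm acknowledged: {code}")
        # Deliberately no state change: acknowledgment never unblocks commands.

    def clear_condition(self, code: str) -> None:
        for alarm in self.alarms:
            if alarm.code == code:
                alarm.cleared = True
        self.events.append(f"condition cleared: {code}")

    def recover(self) -> bool:
        """Explicit recovery/reset; succeeds only once every critical condition is cleared."""
        if self.state is not WorkflowState.FAULTED:
            return False
        if any(a.critical and not a.cleared for a in self.alarms):
            self.events.append("recovery refused: critical condition still active")
            return False
        self.alarms = [a for a in self.alarms if not a.cleared]
        self.state = WorkflowState.READY if self.homed else WorkflowState.IDLE
        self.events.append(f"recovered to {self.state.name}")
        return True
```

Note that clearing the condition alone also leaves the machine in Faulted; only `recover()` changes state, which keeps the three steps distinguishable in both behavior and diagnostics.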

Simulator Profiles

The simulator should support named profiles loaded from configuration rather than relying only on hard-coded runtime constants.

For this first version of profile support:

  • available profiles are loaded at startup
  • one profile is selected as the active profile
  • profile selection is visible to the operator
  • profile changes apply only to future operations and must not silently mutate an active run
  • profile changes create diagnostics entries

Profiles may shape behavior such as:

  • motion timing
  • telemetry cadence
  • preview frame cadence
  • defect density or result distribution
  • fault sensitivity or other safe scenario parameters
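A small way to satisfy "loaded at startup" and "applies only to future operations" is to validate profiles once from JSON and have each run capture an immutable copy of the active profile when it starts. This Python sketch assumes a JSON layout with `default` and `profiles` keys and the field names shown, none of which come from the spec; the `on_change` callback is where a diagnostics entry would be written.

```python
import json
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass(frozen=True)
class SimulatorProfile:
    name: str
    motion_speed_mm_s: float
    telemetry_hz: float
    preview_fps: float
    defect_probability: float  # chance that a scan point yields a defect
    fault_probability: float   # chance of a simulated fault per operation


def load_profiles(text: str) -> Tuple[Dict[str, SimulatorProfile], str]:
    """Parse and validate profiles once at startup; returns (profiles, default name)."""
    raw = json.loads(text)
    profiles: Dict[str, SimulatorProfile] = {}
    for entry in raw["profiles"]:
        profile = SimulatorProfile(**entry)  # unknown or missing keys raise TypeError
        if profile.name in profiles:
            raise ValueError(f"duplicate profile: {profile.name}")
        for p in (profile.defect_probability, profile.fault_probability):
            if not 0.0 <= p <= 1.0:
                raise ValueError(f"{profile.name}: probabilities must be in [0, 1]")
        profiles[profile.name] = profile
    default = raw["default"]
    if default not in profiles:
        raise ValueError(f"unknown default profile: {default}")
    return profiles, default


class ProfileRegistry:
    def __init__(self, profiles: Dict[str, SimulatorProfile], active: str,
                 on_change: Optional[Callable[[str, str], None]] = None) -> None:
        self._profiles = profiles
        self._active = active
        self._on_change = on_change  # e.g. writes a diagnostics timeline entry

    @property
    def active(self) -> SimulatorProfile:
        return self._profiles[self._active]

    def names(self) -> List[str]:
        return sorted(self._profiles)

    def select(self, name: str) -> None:
        if name not in self._profiles:
            raise KeyError(name)
        previous, self._active = self._active, name
        if self._on_change is not None:
            self._on_change(previous, name)


@dataclass(frozen=True)
class Run:
    run_id: str
    profile: SimulatorProfile  # captured at start; later selections cannot alter it


def start_run(registry: ProfileRegistry, run_id: str) -> Run:
    return Run(run_id, registry.active)


# Hypothetical configuration file contents.
EXAMPLE_CONFIG = """
{
  "default": "classroom",
  "profiles": [
    {"name": "classroom", "motion_speed_mm_s": 20, "telemetry_hz": 5,
     "preview_fps": 2, "defect_probability": 0.05, "fault_probability": 0.0},
    {"name": "stress", "motion_speed_mm_s": 80, "telemetry_hz": 50,
     "preview_fps": 15, "defect_probability": 0.3, "fault_probability": 0.02}
  ]
}
"""
```

Selecting `stress` after a `classroom` run has started leaves that run's captured profile unchanged; only runs started afterwards see the new profile.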

Inspection Results and Run Metrics

Inspection results should become more informative than a minimal defect count.

For this slice, the system should expose richer but still simple results such as:

  • scan points completed versus total
  • elapsed run duration
  • total detected defects
  • defect counts grouped by simple severity or category where practical
  • selected simulator profile name
  • completion reason

These metrics should be visible during active work where appropriate and should also flow into persisted run summaries and the history projection.
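The metrics above can be accumulated per scan point and then frozen into a JSON-friendly summary when the run ends. The Python sketch below assumes severity labels are plain strings and passes time in explicitly so the logic is testable without a clock; the field names, the `RunMetrics` and `project_history_row` names, and the fallback defaults for summaries persisted before this slice are illustrative assumptions, not the project's schema.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Any, Dict, Iterable, Optional


@dataclass
class RunMetrics:
    run_id: str
    profile_name: str
    total_points: int
    started_at: float                        # monotonic seconds, injected for testability
    completed_points: int = 0
    defects_by_severity: Counter = field(default_factory=Counter)
    finished_at: Optional[float] = None
    completion_reason: Optional[str] = None  # e.g. "completed", "aborted", "faulted"

    def record_point(self, defect_severities: Iterable[str]) -> None:
        """Call once per finished scan point with the severities of its defects."""
        self.completed_points += 1
        self.defects_by_severity.update(defect_severities)

    @property
    def total_defects(self) -> int:
        return sum(self.defects_by_severity.values())

    def elapsed(self, now: float) -> float:
        end = self.finished_at if self.finished_at is not None else now
        return end - self.started_at

    def finish(self, reason: str, now: float) -> None:
        self.finished_at, self.completion_reason = now, reason

    def to_summary(self) -> Dict[str, Any]:
        """JSON-friendly record for the persisted run history."""
        if self.finished_at is None:
            raise RuntimeError("run has not finished")
        return {
            "run_id": self.run_id,
            "profile": self.profile_name,
            "points_completed": self.completed_points,
            "points_total": self.total_points,
            "duration_s": round(self.elapsed(self.finished_at), 3),
            "total_defects": self.total_defects,
            "defects_by_severity": dict(self.defects_by_severity),
            "completion_reason": self.completion_reason,
        }


def project_history_row(summary: Dict[str, Any]) -> Dict[str, Any]:
    """History projection that tolerates summaries persisted before the richer fields existed."""
    return {
        "run_id": summary["run_id"],
        "profile": summary.get("profile", "unknown"),
        "progress": f"{summary.get('points_completed', '?')}/{summary.get('points_total', '?')}",
        "defects": summary.get("total_defects"),
        "reason": summary.get("completion_reason", "unknown"),
    }
```

The same `RunMetrics` object can back the live view during a run (via `elapsed(now)` and `completed_points`) and the persisted record afterwards, so the two cannot drift apart.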

Observability and Counters

Where queues, channels, or coalescing behavior already exist, the system should expose enough counters or diagnostics to understand when data is processed, dropped, or coalesced.

This slice does not require a full telemetry platform, but it should make important backpressure behavior visible through state, diagnostics, logs, or a small diagnostics surface.
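As an illustration of what "visible backpressure" could mean, the Python sketch below attaches counters to two common patterns: a bounded drop-oldest queue (for example telemetry) and a latest-value slot that coalesces frames (for example preview). Which pattern the project actually uses is not stated in this spec; the class names and counter fields are assumptions. In both cases `enqueued` equals `processed + dropped + coalesced` plus whatever is still pending, which gives tests a simple invariant to check.

```python
from collections import deque
from dataclasses import asdict, dataclass
from typing import Deque, Dict, Generic, Optional, TypeVar

T = TypeVar("T")


@dataclass
class FlowCounters:
    enqueued: int = 0
    processed: int = 0
    dropped: int = 0
    coalesced: int = 0

    def snapshot(self) -> Dict[str, int]:
        return asdict(self)  # plain dict, suitable for canonical app state


class DropOldestQueue(Generic[T]):
    """Bounded queue for telemetry: when full, the oldest item is dropped and counted."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._items: Deque[T] = deque()
        self.counters = FlowCounters()

    def put(self, item: T) -> None:
        self.counters.enqueued += 1
        if len(self._items) >= self.capacity:
            self._items.popleft()
            self.counters.dropped += 1
        self._items.append(item)

    def get(self) -> Optional[T]:
        if not self._items:
            return None
        self.counters.processed += 1
        return self._items.popleft()


class LatestValueSlot(Generic[T]):
    """Coalescing slot for preview frames: only the newest unconsumed value survives."""

    def __init__(self) -> None:
        self._value: Optional[T] = None
        self._has_value = False
        self.counters = FlowCounters()

    def put(self, value: T) -> None:
        self.counters.enqueued += 1
        if self._has_value:
            self.counters.coalesced += 1  # previous value replaced before anyone read it
        self._value, self._has_value = value, True

    def take(self) -> Optional[T]:
        if not self._has_value:
            return None
        self.counters.processed += 1
        value, self._value, self._has_value = self._value, None, False
        return value
```

Publishing `counters.snapshot()` into app state periodically would let the operational workspace show drop and coalesce rates without a separate telemetry platform.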

Acceptance Criteria

This slice is satisfied only if all of the following are true:

  1. The system records structured diagnostics timeline entries for major operational events including connection, recipe load, homing, run state changes, faults, acknowledgment, recovery, and profile changes.
  2. Diagnostics timeline state is exposed through canonical app state with a documented bounded capacity.
  3. The UI provides an operational workspace or pane showing active alarms, recent diagnostics, fault injection controls, selected simulator profile, and live metrics.
  4. Injecting a critical fault during active work raises an alarm, transitions the workflow to Faulted, preserves the run summary, and blocks invalid commands until the condition is cleared and an explicit recovery or reset occurs.
  5. Alarm acknowledgment is tracked separately from clearance and recovery, and acknowledgment alone does not re-enable blocked commands.
  6. The operator can view and switch between named simulator profiles loaded from configuration, and profile changes apply only to future operations.
  7. Active runs expose richer metrics and results, and persisted run summaries include the richer fields introduced by this slice.
  8. Core timeline, recovery, simulator profile, and result-metric behavior are covered by automated tests.

Verification Notes

The implementation task for this spec must include verification for:

  • bounded diagnostics timeline behavior
  • acknowledgment versus recovery guard behavior
  • faulted-run preservation and post-recovery state transition behavior
  • simulator profile loading and selection rules
  • richer run metrics flowing into persisted history
  • visibility of queue, drop, or coalescing counters where such behavior exists

Docs-first project memory for AI-assisted implementation.