05. Failure Modes and Workflow Requirements

This page covers original Sections 11-12.

11. Expected Failure Modes

The simulator and app should eventually model realistic failures such as:

  • connect timeout or connection failure
  • move requested before homing
  • move timeout
  • emergency stop during run
  • door open preventing run
  • connection loss during operation
  • camera stream interruption
  • processing slower than incoming data
  • operator stop during active workflow
  • operator abort during active workflow

The application must not treat these as rare edge cases. They are part of the normal design.
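One way to keep these failures in the normal design is to have the simulator return them as ordinary results instead of raising them as surprises. The sketch below is illustrative only: FailureMode, MoveResult, SimulatedMachine, and the injected set are hypothetical names, not part of the requirements.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional


class FailureMode(Enum):
    # One member per failure listed in Section 11 (names are illustrative).
    CONNECT_FAILURE = auto()
    MOVE_BEFORE_HOMING = auto()
    MOVE_TIMEOUT = auto()
    EMERGENCY_STOP = auto()
    DOOR_OPEN = auto()
    CONNECTION_LOST = auto()
    CAMERA_STREAM_INTERRUPTED = auto()
    PROCESSING_BACKLOG = auto()
    OPERATOR_STOP = auto()
    OPERATOR_ABORT = auto()


@dataclass(frozen=True)
class MoveResult:
    ok: bool
    failure: Optional[FailureMode] = None


class SimulatedMachine:
    """Hypothetical simulator that can be told which failures to produce."""

    def __init__(self) -> None:
        self.homed = False
        self.injected: set = set()  # failure modes a test has switched on

    def move_to(self, x: float, y: float) -> MoveResult:
        # Failures come back as values, so callers must handle them explicitly.
        if not self.homed:
            return MoveResult(False, FailureMode.MOVE_BEFORE_HOMING)
        if FailureMode.MOVE_TIMEOUT in self.injected:
            return MoveResult(False, FailureMode.MOVE_TIMEOUT)
        return MoveResult(True)
```

A test can then set machine.injected.add(FailureMode.MOVE_TIMEOUT) and check that the application reacts as the workflow rules require.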

12. Workflow Requirements

The workflow must be explicit and understandable.

12.1 Preconditions for Start

For the first slice, an inspection run must only start when:

  • machine is connected
  • recipe is loaded
  • safety conditions are satisfied
  • machine is already homed
  • no active critical fault is present
  • workflow state is Idle or Ready

Automatic homing as part of Start is deferred to a later slice.
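The Start preconditions above can be sketched as a single check that reports every unmet condition, which also helps keep the workflow understandable to the operator. MachineSnapshot, its field names, and start_blockers are assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class WorkflowState(Enum):
    IDLE = "Idle"
    READY = "Ready"
    PREPARING = "Preparing"
    RUNNING = "Running"


@dataclass(frozen=True)
class MachineSnapshot:
    # Hypothetical read-only view of the inputs the Start check needs.
    connected: bool
    recipe_loaded: bool
    safety_ok: bool
    homed: bool
    critical_fault_active: bool
    workflow_state: WorkflowState


def start_blockers(snapshot: MachineSnapshot) -> list:
    """Return every unmet Start precondition; an empty list means Start is allowed."""
    blockers = []
    if not snapshot.connected:
        blockers.append("machine is not connected")
    if not snapshot.recipe_loaded:
        blockers.append("no recipe is loaded")
    if not snapshot.safety_ok:
        blockers.append("safety conditions are not satisfied")
    if not snapshot.homed:
        # First slice: Start never homes automatically, it only refuses.
        blockers.append("machine is not homed")
    if snapshot.critical_fault_active:
        blockers.append("a critical fault is active")
    if snapshot.workflow_state not in (WorkflowState.IDLE, WorkflowState.READY):
        blockers.append(f"workflow state is {snapshot.workflow_state.value}")
    return blockers
```

Returning the full list rather than failing on the first unmet condition lets the UI show the operator everything that must be fixed before Start.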

12.2 Run Execution

A run should:

  • enter an initializing phase
  • prepare machine state
  • move through scan points
  • acquire frames
  • generate inspection results
  • update progress and metrics
  • complete, stop, abort, or fault cleanly

For the first slice, the nominal happy-path transition is:

Idle or Ready -> Preparing -> Running -> Completed

If interrupted, the workflow must instead end in exactly one of:

  • Stopped
  • Aborted
  • Faulted
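The transitions described above can be sketched as an explicit table, so that any move not listed is rejected. This is a minimal sketch: it omits intermediate states such as Homing and Stopping, and it assumes terminal states have no outgoing transitions, which the requirements do not state.

```python
from enum import Enum


class State(Enum):
    IDLE = "Idle"
    READY = "Ready"
    PREPARING = "Preparing"
    RUNNING = "Running"
    COMPLETED = "Completed"
    STOPPED = "Stopped"
    ABORTED = "Aborted"
    FAULTED = "Faulted"


# An interrupted run must end in exactly one of these.
INTERRUPTED = {State.STOPPED, State.ABORTED, State.FAULTED}

# Allowed transitions for the first slice (assumed table, not normative).
ALLOWED = {
    State.IDLE: {State.PREPARING},
    State.READY: {State.PREPARING},
    State.PREPARING: {State.RUNNING} | INTERRUPTED,
    State.RUNNING: {State.COMPLETED} | INTERRUPTED,
}


def transition(current: State, target: State) -> State:
    """Return the new state, or raise if the move is not in the table."""
    if target not in ALLOWED.get(current, set()):
        raise ValueError(f"illegal transition {current.value} -> {target.value}")
    return target
```

Because each run holds a single state value, it can end in only one terminal state, which matches the "exactly one of" rule.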

12.3 Stop Semantics

Stop must represent controlled and graceful termination of the active run.

For the first slice:

  • Stop is valid only while the workflow is Preparing or Running
  • once a stop request is accepted, no new scan point may begin
  • the system may finish the step already in progress up to its next safe boundary, such as an in-flight move or an already accepted acquisition step
  • the terminal workflow state after a successful stop is Stopped
  • the run summary must record that the run ended by operator stop rather than normal completion
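The stop rules above can be sketched as a loop that checks for a stop request only at scan-point boundaries. RunSummary, run_scan, and the end_reason strings are hypothetical, and using threading.Event as the stop flag is an assumption about the implementation.

```python
import threading
from dataclasses import dataclass, field


@dataclass
class RunSummary:
    end_reason: str = "unknown"
    points_done: list = field(default_factory=list)


def run_scan(points, stop_requested: threading.Event, scan_point) -> RunSummary:
    """Scan each point in order, honoring Stop only between points."""
    summary = RunSummary()
    for point in points:
        # Boundary check: a point that already began is allowed to finish,
        # but no new point starts once the stop request is set.
        if stop_requested.is_set():
            summary.end_reason = "operator_stop"
            return summary
        scan_point(point)
        summary.points_done.append(point)
    summary.end_reason = "completed"
    return summary
```

In this sketch a stop that arrives during the final scan point is reported as a normal completion, because no work remains to be skipped. The requirements do not specify that case, so treat it as an open design decision.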

12.4 Abort Semantics

Abort must interrupt active work as quickly and safely as possible.

For the first slice:

  • Abort is valid while the workflow is Preparing, Homing, Running, or Stopping
  • cancellation tokens for in-flight work must be signaled immediately
  • queued frame or processing work for the interrupted run may be discarded rather than drained to completion
  • the terminal workflow state after operator abort is Aborted
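The abort rules above might look like the following sketch, which signals cancellation first and then discards queued work without processing it. Here threading.Event stands in for a cancellation token and queue.Queue for the frame queue. Both are assumptions, as is the abort_run name.

```python
import queue
import threading

# States in which Abort is accepted, per the list above.
ABORTABLE_STATES = {"Preparing", "Homing", "Running", "Stopping"}


def abort_run(state: str, cancel: threading.Event, pending_frames: queue.Queue):
    """Abort the active run; return the terminal state and the discarded frame count."""
    if state not in ABORTABLE_STATES:
        raise ValueError(f"Abort is not valid in state {state}")
    # Signal first so in-flight work that polls the token stops promptly.
    cancel.set()
    discarded = 0
    while True:
        try:
            pending_frames.get_nowait()  # drop the frame, do not process it
        except queue.Empty:
            break
        discarded += 1
    return "Aborted", discarded
```

Returning the discard count gives the run summary something concrete to record about what was dropped.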

12.5 Fault Semantics

Critical faults must force the system into a safe and explicit state and prevent invalid operations until addressed.

For the first slice:

  • a critical fault during active work must transition the workflow to Faulted
  • active critical faults must block Start, Home, and motion commands
  • acknowledgement records operator awareness only; it does not clear the fault or unblock commands
  • recovery requires the fault condition to be cleared and an explicit operator reset or recovery action
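The difference between acknowledging a fault and recovering from it can be sketched as below. CriticalFault, its fields, command_allowed, and the command names are hypothetical. "Move" stands in for any motion command.

```python
from dataclasses import dataclass


@dataclass
class CriticalFault:
    code: str
    condition_present: bool = True  # the underlying cause still exists
    acknowledged: bool = False      # the operator has seen the fault
    reset_done: bool = False        # an explicit operator reset succeeded

    @property
    def active(self) -> bool:
        # Only an explicit reset deactivates the fault.
        return not self.reset_done

    def acknowledge(self) -> None:
        # Records awareness only; the fault stays active.
        self.acknowledged = True

    def reset(self) -> None:
        # A reset is refused while the cause is still present.
        if self.condition_present:
            raise RuntimeError(f"fault {self.code} condition is still present")
        self.reset_done = True


BLOCKED_BY_FAULT = {"Start", "Home", "Move"}


def command_allowed(command: str, faults: list) -> bool:
    """Blocked commands are refused while any critical fault is active."""
    return not (command in BLOCKED_BY_FAULT and any(f.active for f in faults))
```

Keeping acknowledged and reset_done as separate flags makes it hard to unblock the machine by acknowledging alone.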

12.6 Fault Recovery

Fault recovery must be explicit and observable.

For the first slice:

  • recovery must create diagnostics or log entries
  • recovery must not silently erase the history of the faulted run
  • after successful recovery, the machine may return to Idle or Ready depending on current prerequisites
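A minimal sketch of observable recovery follows. It appends a diagnostics entry, writes a log line, and leaves the run history untouched. Workflow, RunRecord, the logger name, and the rule "Ready when prerequisites are met, otherwise Idle" are assumptions made for illustration.

```python
import logging
from dataclasses import dataclass, field
from typing import Optional

log = logging.getLogger("workflow.recovery")  # hypothetical logger name


@dataclass(frozen=True)
class RunRecord:
    run_id: str
    final_state: str
    fault_code: Optional[str] = None


@dataclass
class Workflow:
    state: str = "Faulted"
    history: list = field(default_factory=list)      # past runs, never edited here
    diagnostics: list = field(default_factory=list)  # recovery audit trail


def recover(workflow: Workflow, prerequisites_met: bool) -> str:
    """Leave Faulted explicitly, recording the recovery without touching history."""
    if workflow.state != "Faulted":
        raise ValueError(f"cannot recover from state {workflow.state}")
    target = "Ready" if prerequisites_met else "Idle"
    entry = f"recovery: Faulted -> {target}"
    workflow.diagnostics.append(entry)
    log.info(entry)
    workflow.state = target
    return target
```

The faulted run stays in history with its fault code, so an operator or engineer can still see what happened after the machine is back in service.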
