LCN Wafer Inspection

This is a principal-level explanation of Workflow & Process Coordination. Domain 1 explicitly includes Machine Workflow & Sequencing, with emphasis on step-by-step sequencing, synchronization between subsystems, deterministic workflow execution, operational control semantics, and fault handling. The roadmap also ties this to long-running workflows, stateful components, error propagation, concurrency, and recovery.

PART 1 — WHAT A WORKFLOW IS IN MACHINE SOFTWARE

In industrial machine software, a workflow is the explicit model of a real machine process.

It is not just “some code that runs in order.” It is a representation of a physical operation such as:

inspection cycle
pick-and-place cycle
wafer alignment procedure
calibration routine
unload / load sequence
recovery procedure

A workflow answers questions like:

What step are we in right now?
What must complete before the next step can begin?
What conditions must be true to continue?
What happens if we pause, stop, timeout, or fail?
What has already been done, and what remains?

That last question is critical. In business software, if a method fails, you often retry or roll back a transaction. In machine software, the machine may already have moved, clamped a part, energized a vacuum, captured an image, or opened a valve. The physical world does not roll back automatically.

Workflow vs orchestration vs state machine

These three are related, but they are not the same.

Workflow: The process definition itself. It describes the machine-level process sequence: load wafer, align, autofocus, scan, review, unload.

Orchestration: The coordination logic that drives subsystems during that workflow. It decides when to command motion, when to wait for vision readiness, when to validate interlocks, when to branch, when to raise alarms.

State machine: The execution-control model. It governs the allowed states (such as Idle, Starting, Running, Paused, Stopping, Faulted, Recovering) and which transitions between them are legal.

A useful mental model is:

workflow = what process the machine is performing
orchestration = how the system coordinates components to perform it
state machine = how execution is controlled safely and predictably

A machine can have one workflow model, an orchestration layer that executes it, and a state model that constrains what execution states are valid. That separation is usually healthier than collapsing everything into one giant state enum.

PART 2 — STRUCTURING WORKFLOW STEPS

A workflow is built from explicit steps.

Typical machine workflow steps include:

move to position
home axis
wait for sensor
acquire image
validate result
actuate clamp / vacuum / IO
compute next target
confirm subsystem ready
branch based on outcome
finalize / cleanup

These steps are not all equal. Some are:

action steps: command something
wait steps: wait for completion or condition
decision steps: choose next branch
validation steps: verify safety / readiness / quality
recovery steps: clear partial state or bring machine to a safe point

Step dependencies

In real systems, a step depends on more than “previous step finished.”

A step may require:

motion complete
position within tolerance
no active interlock
device initialized
sensor stable for N ms
image acquisition buffer ready
recipe parameter validated
operator acknowledgment received

So good workflow design treats dependencies explicitly, not implicitly.

Sequencing rules

A robust workflow usually follows this pattern:

validate prerequisites
issue command
observe progress
detect completion or timeout
verify postconditions
transition to next step

That sounds simple, but many bad systems skip prerequisite validation, timeout detection, or postcondition verification.
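The sequencing pattern above can be sketched as a small step runner. This is a minimal illustration, not a definitive implementation: `Step`, `run_step`, `FakeClock`, and the result strings are all hypothetical names, and completion is detected by polling only to keep the example short.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Step:
    name: str
    prerequisites_ok: Callable[[], bool]   # validate prerequisites
    issue_command: Callable[[], None]      # issue command
    is_complete: Callable[[], bool]        # observe progress, detect completion
    postconditions_ok: Callable[[], bool]  # verify postconditions
    timeout_s: float


def run_step(step: Step, clock: Callable[[], float],
             sleep: Callable[[float], None], poll_s: float = 0.01) -> str:
    """Run one step; return "ok" or the reason the workflow must not advance."""
    if not step.prerequisites_ok():
        return "prerequisite_failed"       # the command is never issued
    step.issue_command()
    deadline = clock() + step.timeout_s
    while not step.is_complete():
        if clock() >= deadline:
            return "timeout"
        sleep(poll_s)
    if not step.postconditions_ok():
        return "postcondition_failed"
    return "ok"                            # only now may the workflow transition


class FakeClock:
    """Deterministic clock for the demo (assumption): sleeping advances time."""

    def __init__(self) -> None:
        self.now = 0.0

    def __call__(self) -> float:
        return self.now

    def sleep(self, seconds: float) -> None:
        self.now += seconds


clock = FakeClock()
polls = {"count": 0}


def motion_done() -> bool:
    polls["count"] += 1
    return polls["count"] >= 3             # reports done on the third poll


move = Step("move_to_start", lambda: True, lambda: None,
            motion_done, lambda: True, timeout_s=1.0)
stuck = Step("wait_sensor", lambda: True, lambda: None,
             lambda: False, lambda: True, timeout_s=0.05)

move_result = run_step(move, clock, clock.sleep)
stuck_result = run_step(stuck, clock, clock.sleep)
```

Injecting the clock and sleep functions keeps the timeout path testable without real waiting; a production executor would more likely react to completion events than poll.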

Conditional branching

Machine workflows often branch on:

recipe options
product type
inspection outcome
sensor results
subsystem capability
fault condition
operator choice during recovery

ASCII workflow diagram

+------------------+
| Start Cycle      |
+------------------+
         |
         v
+------------------+
| Validate Ready   |
| - recipe loaded  |
| - no alarms      |
| - interlocks ok  |
+------------------+
         |
         v
+------------------+
| Move to Start    |
+------------------+
         |
         v
+------------------+
| Wait Motion Done |
+------------------+
         |
         v
+------------------+
| Acquire Data     |
+------------------+
         |
         v
+------------------+
| Validate Result  |
+------------------+
      /       \
     /pass     \fail
    v           v
+------------------+    +----------------------+
| Next Process Step|    | Recovery / Retry     |
+------------------+    +----------------------+
         |                         |
         v                         v
+------------------+    +----------------------+
| Complete Cycle   |    | Operator Decision    |
+------------------+    +----------------------+

What this diagram means

This is not just business flow. Each box usually maps to:

a command to one or more subsystems
a wait for asynchronous completion
timeout and fault logic
state tracking
interruption handling points

That is why machine workflows need explicit modeling.

PART 3 — LONG-RUNNING WORKFLOWS

Machine workflows are often long-running.

They may last:

a few seconds for a simple transfer
minutes for calibration
tens of minutes for a batch operation
hours for full inspection lots or maintenance procedures

That changes the design completely.

A long-running workflow is not just a method call that takes longer. It has to survive:

asynchronous device completions
delays and timeouts
operator intervention
pause / stop / abort requests
device reconnects
transient bad measurements
partial success
power cycles in some architectures
stale or reordered events
subsystem availability changes

Why it is different from a simple function call

A normal function call assumes:

one call stack
immediate control
one thread of execution
predictable return path

A machine workflow usually involves:

multiple asynchronous subsystems
external events arriving later
long waits
state that must outlive one method frame
interruption requests from outside
progress tracking visible to operators and logs

So a workflow engine in a machine is usually closer to a persistent execution model than to a normal procedural method.

What must be tracked

For long-running execution, you typically track:

workflow instance id
current step
step status
workflow status
start time / duration
active command ids or correlation ids
last known subsystem statuses
retry count
pause / stop / abort requested flags
partial completion markers
fault context
operator action requirements

If you do not track these explicitly, debugging becomes miserable.
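One way to make the tracked items explicit is a single record per workflow instance. The sketch below is an assumption about shape only: `WorkflowRecord`, its field names, and the status strings are hypothetical, and persistence is left out.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class WorkflowRecord:
    instance_id: str
    workflow_status: str = "Idle"
    current_step: Optional[str] = None
    step_status: str = "NotStarted"
    step_started_at: Optional[float] = None
    active_command_ids: Set[str] = field(default_factory=set)
    subsystem_status: Dict[str, str] = field(default_factory=dict)
    retry_count: int = 0
    pause_requested: bool = False
    stop_requested: bool = False
    abort_requested: bool = False
    completed_markers: List[str] = field(default_factory=list)
    fault_context: Optional[dict] = None
    operator_action_required: Optional[str] = None

    def begin_step(self, step: str, command_id: str, now: float) -> None:
        """Record that a step started and which command it is waiting on."""
        self.current_step = step
        self.step_status = "Running"
        self.step_started_at = now
        self.active_command_ids.add(command_id)

    def complete_step(self, command_id: str) -> None:
        """Record completion evidence: clear the command, add a marker."""
        self.active_command_ids.discard(command_id)
        self.step_status = "Completed"
        self.completed_markers.append(self.current_step)


record = WorkflowRecord(instance_id="lot-42/wafer-03")
record.begin_step("move_to_scan_position", command_id="cmd-1", now=0.0)
record.complete_step("cmd-1")
```

Keeping everything in one record means a log line or operator screen can show the whole execution context from a single object.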

PART 4 — COORDINATING SUBSYSTEMS WITHIN WORKFLOW

A workflow coordinates subsystems such as:

motion
vision
sensors
IO
vacuum / pneumatics
robot handlers
measurement devices

The workflow itself should not become a dumping ground for device-specific details. It should coordinate at the right level.

For example, a wafer inspection step may conceptually say:

move stage to scan position
wait until stage settled
trigger camera acquisition
validate frame received
evaluate focus metric
decide continue or refocus

The workflow is about process intent and cross-subsystem coordination, not low-level driver mechanics.

ASCII sequence diagram

Workflow        MotionCtrl        SensorSvc        VisionSvc        IO/Actuator
   |                |                |                |                |
   |--StartStep---->|                |                |                |
   |                |--MoveTo(X,Y)-->|                |                |
   |                |<--Moving-------|                |                |
   |                |<--InPosition---|                |                |
   |<--MotionDone---|                |                |                |
   |--CheckReady-------------------->|                |                |
   |<-------------SensorOK-----------|                |                |
   |--TriggerAcquire--------------------------------->|                |
   |<-------------------------------FrameReady--------|                |
   |--SetOutput------------------------------------------------------->|
   |<----------------------------------------------------OutputDone----|
   |--AdvanceToNextStep-->|

What this diagram means

Notice the workflow does not assume a command is complete because the method returned.

Instead it follows a realistic pattern:

command subsystem
wait for actual completion signal
validate readiness
trigger next subsystem
continue only when postconditions are true

That is the essence of process coordination.

Common dependency patterns

Real workflows usually depend on one or more of these:

completion dependency: do not continue until prior action completed
condition dependency: do not continue until condition becomes true
stability dependency: do not continue until value is stable for some interval
mutual exclusion dependency: do not start while another subsystem owns the resource
safe-state dependency: do not start until machine is in a known safe condition

Strong engineers make these dependencies visible in the design.
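The stability dependency is the one most often left implicit, so here is a minimal sketch of it. `StabilityGate`, the tolerance, and the hold time are hypothetical; timestamps are passed in as integer milliseconds so the logic stays deterministic.

```python
class StabilityGate:
    """True only after the value has stayed within tolerance for hold_ms."""

    def __init__(self, target: float, tolerance: float, hold_ms: int) -> None:
        self.target = target
        self.tolerance = tolerance
        self.hold_ms = hold_ms
        self._stable_since_ms = None  # None means "currently out of tolerance"

    def update(self, value: float, timestamp_ms: int) -> bool:
        """Feed one sample; return whether the dependency is now satisfied."""
        if abs(value - self.target) <= self.tolerance:
            if self._stable_since_ms is None:
                self._stable_since_ms = timestamp_ms
        else:
            self._stable_since_ms = None  # any excursion restarts the window
        return (self._stable_since_ms is not None
                and timestamp_ms - self._stable_since_ms >= self.hold_ms)


gate = StabilityGate(target=10.0, tolerance=0.05, hold_ms=50)
samples = [(0, 10.40), (10, 10.02), (30, 10.01), (40, 10.30), (50, 10.00), (100, 10.01)]
results = [gate.update(value, t) for t, value in samples]
```

In this run the gate only opens on the last sample: the excursion at 40 ms resets the window, so "in position" at 10 ms never counts as "settled".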

PART 5 — HANDLING INTERRUPTIONS

Industrial workflows must handle interruption as a first-class concern.

Typical interruption types:

pause
resume
stop
abort

These are not synonyms.

Pause

Pause usually means:

finish to a safe pause boundary if possible
hold resources in a consistent state
remember current progress
allow later continuation

Example: finish current image acquisition, then stop advancing.

Resume

Resume means:

confirm prerequisites still hold
restore execution context
continue from a valid re-entry point
revalidate any stale assumptions

Resume is often harder than pause.

Stop

Stop usually means:

request orderly termination
finish current safe unit of work
perform cleanup
bring machine to a controlled state

Abort

Abort means:

terminate as fast as safely possible
may cut short normal sequencing
may leave work incomplete
often transitions to faulted / recovery-needed state

What happens at different timing points

At a step boundary

This is the easiest case.

You can often:

record step complete
check interruption request
transition to Paused or Stopped
avoid starting the next step

Mid-step

This is much harder.

Example:

stage is moving
vacuum is engaging
image acquisition is underway
robot arm is between positions

You cannot always just “stop now.” You need step-specific interruption semantics.

A good question for every step is:

What does pause/stop/abort mean while this step is active?
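A simple way to force that question to be answered is a per-step-type policy table that fails loudly when an entry is missing. The sketch below is illustrative only: the step types, the `OnInterrupt` values, and every policy choice in the table are assumptions, and real choices depend on the hardware.

```python
from enum import Enum


class OnInterrupt(Enum):
    FINISH_STEP = "finish_step"      # let the active step reach its natural end
    CANCEL_ACTION = "cancel_action"  # actively cancel the underlying command
    IMMEDIATE = "immediate"          # nothing physical in flight; react now


# Hypothetical policy table: one row per step type, one entry per request.
POLICY = {
    "move": {
        "pause": OnInterrupt.FINISH_STEP,
        "stop": OnInterrupt.CANCEL_ACTION,
        "abort": OnInterrupt.CANCEL_ACTION,
    },
    "acquire": {
        "pause": OnInterrupt.FINISH_STEP,
        "stop": OnInterrupt.FINISH_STEP,
        "abort": OnInterrupt.CANCEL_ACTION,
    },
    "decision": {
        "pause": OnInterrupt.IMMEDIATE,
        "stop": OnInterrupt.IMMEDIATE,
        "abort": OnInterrupt.IMMEDIATE,
    },
}


def resolve(step_type: str, request: str) -> OnInterrupt:
    """Look up the interruption semantics; an undefined case is an error."""
    try:
        return POLICY[step_type][request]
    except KeyError:
        raise LookupError(f"no interruption policy for {step_type}/{request}")


acquire_on_pause = resolve("acquire", "pause")
```

Raising on a missing entry is the design point: a new step type cannot be added without someone deciding what pause, stop, and abort mean for it.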

During waiting

This is where many workflows get stuck.

The workflow may be waiting for:

motion done
sensor ready
acquisition complete
PLC acknowledgment
timeout window

If pause/stop arrives during waiting, the engine must decide:

keep waiting until safe completion?
cancel the underlying action?
transition out of the wait state?
ignore completion events that arrive after cancellation?

That last one is a major source of bugs.
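One common guard against late completions is to tag every wait with a correlation id and drop any event whose id is no longer active. A minimal sketch, assuming one outstanding wait at a time; `WaitTracker` and its method names are hypothetical.

```python
import itertools


class WaitTracker:
    """Tracks the single command the workflow is currently waiting on."""

    def __init__(self) -> None:
        self._ids = itertools.count(1)
        self._active_id = None

    def start_wait(self) -> int:
        """Begin waiting; the returned id must accompany the command."""
        self._active_id = next(self._ids)
        return self._active_id

    def cancel(self) -> None:
        """Pause/stop arrived: stop waiting, but the device may still answer."""
        self._active_id = None

    def on_completion(self, command_id: int) -> bool:
        """Return True if the event was consumed, False if it was stale."""
        if command_id != self._active_id:
            return False  # late or duplicate event from a cancelled wait
        self._active_id = None
        return True


tracker = WaitTracker()
first = tracker.start_wait()
tracker.cancel()                                  # stop requested mid-wait
late_event_consumed = tracker.on_completion(first)
second = tracker.start_wait()                     # a later step starts waiting
old_event_consumed = tracker.on_completion(first)
new_event_consumed = tracker.on_completion(second)
```

Without the id check, the completion of the cancelled command would be mistaken for the completion of the later step.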

ASCII workflow-state view

          +---------+
          |  Idle   |
          +---------+
               |
               v
          +---------+
          | Starting|
          +---------+
               |
               v
          +---------+
          | Running |
          +---------+
           /   |   \
    pause / stop| abort
         v      v     v
   +---------+ +---------+ +---------+
   | Pausing | |Stopping | |Aborting |
   +---------+ +---------+ +---------+
        |          |           |
        v          v           v
   +---------+ +---------+ +---------+
   | Paused  | | Stopped | | Faulted |
   +---------+ +---------+ +---------+
        |
      resume
        |
        v
   +---------+
   | Running |
   +---------+

Why interruption handling is complex

Because the workflow is coordinating real operations that may already be in progress, and each subsystem may have different cancellation behavior.

Motion may decelerate. Camera capture may already be triggered. PLC may already have latched a command. A valve may already be open. A part may already be clamped.

So interruption is not just a control flag. It is a coordination problem across real subsystems.
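The state view above can be written as an explicit transition table so that illegal transitions are rejected rather than silently accepted. In this sketch the event names `started` and `quiesced` are assumptions (the diagram does not name them); the states follow the diagram.

```python
# (current state, event) -> next state
TRANSITIONS = {
    ("Idle", "start"): "Starting",
    ("Starting", "started"): "Running",
    ("Running", "pause"): "Pausing",
    ("Running", "stop"): "Stopping",
    ("Running", "abort"): "Aborting",
    ("Pausing", "quiesced"): "Paused",    # subsystems confirmed quiet
    ("Stopping", "quiesced"): "Stopped",
    ("Aborting", "quiesced"): "Faulted",
    ("Paused", "resume"): "Running",
}


class WorkflowStateMachine:
    def __init__(self) -> None:
        self.state = "Idle"

    def fire(self, event: str) -> str:
        """Apply an event; raise if it is not legal in the current state."""
        key = (self.state, event)
        if key not in TRANSITIONS:
            raise ValueError(f"event {event!r} not allowed in state {self.state!r}")
        self.state = TRANSITIONS[key]
        return self.state


machine = WorkflowStateMachine()
machine.fire("start")
machine.fire("started")
machine.fire("pause")  # state is now "Pausing", not yet "Paused"
```

Keeping `Pausing` separate from `Paused` means the UI cannot report "paused" until the underlying step has actually quiesced.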

PART 6 — PARTIAL COMPLETION & RECOVERY

This is one of the most important ideas in machine workflows.

When failure happens, you need to know:

what has already completed
what is in progress
what definitely did not happen
what physical state the machine is now in
what the safe next action is

That is why strong workflow systems track completion markers, not just current step.

Typical recovery choices

When a workflow fails mid-process, the system may:

retry the current step
repeat the whole sub-sequence
perform compensating cleanup
continue from the next safe checkpoint
move to a recovery workflow
require operator intervention

Retry step

Good when:

action is idempotent or safely repeatable
failure is transient
physical state remains valid

Bad when:

repeating may duplicate actuation
side effects already happened
the environment changed

Rollback

In machine software, rollback is limited.

You can sometimes:

move back to safe position
release clamp
clear output
discard partial data
mark part for reject

But you often cannot “undo” the physical world in the same clean way as database rollback.

Continue safely

Possible if:

completed steps are trusted
next step does not require redoing previous action
machine state is still consistent

Require operator action

Often necessary when:

material position is uncertain
a gripper may still hold a part
a sensor disagrees with expected state
a human must inspect or reset hardware

Checkpoint mentality

Good workflow design often uses checkpoints like:

recipe validated
hardware initialized
part clamped
stage homed
scan region 1 complete
lot step N complete

These checkpoints make recovery tractable.
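As a rough sketch of how checkpoints support recovery, the function below picks the re-entry point after a failure. The checkpoint names and their order are hypothetical, and the rule "trust only an unbroken prefix of checkpoints" is an assumption chosen for the example, not a universal policy.

```python
from typing import Optional, Set

# Hypothetical ordering for one inspection cycle.
CHECKPOINT_ORDER = [
    "recipe_validated",
    "hardware_initialized",
    "stage_homed",
    "part_clamped",
    "scan_region_1_complete",
]


def next_checkpoint(reached: Set[str]) -> Optional[str]:
    """Return the first checkpoint not yet reached, or None if all are done.

    A later marker is ignored if an earlier one is missing, because the
    later state cannot be trusted without its prerequisites.
    """
    for checkpoint in CHECKPOINT_ORDER:
        if checkpoint not in reached:
            return checkpoint
    return None


resume_from = next_checkpoint({"recipe_validated", "hardware_initialized"})
resume_with_gap = next_checkpoint({"recipe_validated", "stage_homed"})
```

Here `resume_with_gap` points back at hardware initialization even though a homing marker exists, which is the conservative choice when physical state is uncertain.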

PART 7 — REAL-WORLD FAILURE SCENARIOS

1. Workflow stuck waiting for event

What it looks like: Machine shows “Running,” but progress never advances. No obvious fault. Operator says it hangs randomly.

Why it happens

expected completion event never arrived
event arrived before wait subscription was active
timeout missing or too large
event correlation id mismatch
subsystem completed physically, but status propagation failed

How engineers debug it

inspect workflow timeline and current wait condition
verify whether command was actually issued
check raw device communication logs
confirm event/callback path fired
compare command id vs completion id
look for race between command issue and event subscription

This is a classic asynchronous coordination bug.
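The "event arrived before wait subscription was active" cause can be reproduced in a few lines. This sketch assumes a bus that does not buffer events for absent subscribers and a device that completes immediately; `EventBus`, `FastDevice`, and both helper functions are hypothetical.

```python
class EventBus:
    """Delivers a completion only to a waiter that is already registered."""

    def __init__(self) -> None:
        self._waiters = {}

    def subscribe(self, command_id: str) -> list:
        inbox = []
        self._waiters[command_id] = inbox
        return inbox

    def publish(self, command_id: str, payload: str) -> None:
        inbox = self._waiters.pop(command_id, None)
        if inbox is not None:
            inbox.append(payload)
        # No waiter registered: the event is silently lost.


class FastDevice:
    """Completes inside execute(), the worst case for the race."""

    def __init__(self, bus: EventBus) -> None:
        self.bus = bus

    def execute(self, command_id: str) -> None:
        self.bus.publish(command_id, "done")


def command_then_subscribe(bus: EventBus, device: FastDevice, command_id: str) -> list:
    device.execute(command_id)        # event fires here...
    return bus.subscribe(command_id)  # ...before anyone is listening


def subscribe_then_command(bus: EventBus, device: FastDevice, command_id: str) -> list:
    inbox = bus.subscribe(command_id)  # register the wait first
    device.execute(command_id)
    return inbox


bus = EventBus()
device = FastDevice(bus)
racy_inbox = command_then_subscribe(bus, device, "cmd-1")
safe_inbox = subscribe_then_command(bus, device, "cmd-2")
```

The racy ordering leaves an empty inbox that would wait forever without a timeout; subscribing before issuing the command removes the window entirely.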

2. Step completes but next step starts too early

What it looks like: System starts acquisition before motion has truly settled, or starts clamp before positioning fully finished.

Why it happens

using “command accepted” as “command completed”
completion signal means “in position” but not “stable”
stale cached state read as current
no postcondition validation
subsystem reports ready earlier than physically safe

How engineers debug it

compare timestamps of command, completion, and next-step start
inspect whether completion semantics are misunderstood
add separate “settled” or “postcondition verified” state
instrument actual device values around transition time

Strong engineers learn to distrust naive “done” signals.

3. Interruption leaves system inconsistent

What it looks like: Pause requested during cycle. UI says paused, but one actuator is still active, or workflow resumes from the wrong place.

Why it happens

interruption only updated workflow flag, not subsystem behavior
no defined mid-step interruption policy
transition to Paused happened before underlying step quiesced
cleanup action not modeled

How engineers debug it

reconstruct exact step at interruption time
inspect whether pause was handled at boundary or mid-step
verify command cancellation / safe-stop path
check whether completion event from old step was consumed after pause

This is why interruption semantics must be defined per step type.

4. Retry causes duplicate actions

What it looks like: Part gets clamped twice, image stored twice, output signal sent twice, item counted twice.

Why it happens

step retried without idempotency design
workflow did not know action had already succeeded
acknowledgement was delayed, causing false timeout
completion state not persisted before retry

How engineers debug it

inspect retry reason and timing
determine whether original action actually completed
check if step had idempotency token or duplicate suppression
separate “command issued,” “command acknowledged,” and “effect confirmed”

Retry is dangerous when physical side effects exist.
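Duplicate suppression with an idempotency token can be sketched as follows. This assumes the actuator service (not the workflow) remembers tokens it has already acted on; `ClampActuator`, `clamp_with_retry`, and the token format are hypothetical.

```python
class ClampActuator:
    """Hypothetical actuator service that suppresses duplicates by token."""

    def __init__(self) -> None:
        self.actuations = 0   # how many times the hardware was really driven
        self._results = {}    # token -> result of the first execution

    def clamp(self, token: str) -> str:
        if token in self._results:
            return self._results[token]  # replay the result, do not actuate again
        self.actuations += 1
        self._results[token] = "clamped"
        return "clamped"


def clamp_with_retry(actuator: ClampActuator, token: str, ack_lost: bool) -> str:
    """Retry with the SAME token when the acknowledgement appears lost."""
    result = actuator.clamp(token)
    if ack_lost:
        # False timeout: the effect already happened, only the ack was delayed.
        result = actuator.clamp(token)
    return result


actuator = ClampActuator()
outcome = clamp_with_retry(actuator, "wafer-17/clamp", ack_lost=True)
actuations_after_retry = actuator.actuations
```

The retry reuses the token of the original attempt, so the second call confirms the effect instead of repeating it. A new token is only generated for a genuinely new action.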

5. Condition check incorrect due to stale data

What it looks like: Workflow proceeds because sensor says safe, but that value was from an earlier cycle.

Why it happens

polling cache not refreshed
async status propagation lag
data timestamp ignored
condition check subscribed to wrong source
race between state update and decision

How engineers debug it

inspect timestamp and source of condition data
distinguish latest observed value from last published value
validate freshness window
trace the path from device read to workflow decision

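In machine software, stale data is often more dangerous than missing data: a missing value visibly blocks the workflow, while a stale value looks valid and lets the workflow proceed on a false assumption.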
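A freshness window can be enforced at the point of decision rather than trusted to the cache. A minimal sketch, assuming each reading carries its own timestamp and source; `Reading`, `check_safe`, and the reason strings are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Reading:
    value: bool        # True means the sensor reports "safe"
    timestamp_ms: int  # when the device was actually read
    source: str        # where the value came from, for diagnostics


def check_safe(reading: Optional[Reading], now_ms: int,
               max_age_ms: int) -> Tuple[bool, str]:
    """Return (may_proceed, reason); only fresh, positive data may proceed."""
    if reading is None:
        return (False, "no_data")
    age_ms = now_ms - reading.timestamp_ms
    if age_ms < 0:
        return (False, "timestamp_in_future")  # clock or source mismatch
    if age_ms > max_age_ms:
        return (False, "stale")
    if not reading.value:
        return (False, "not_safe")
    return (True, "ok")


previous_cycle = Reading(value=True, timestamp_ms=1_000, source="door_interlock")
current = Reading(value=True, timestamp_ms=9_950, source="door_interlock")

stale_decision = check_safe(previous_cycle, now_ms=10_000, max_age_ms=200)
fresh_decision = check_safe(current, now_ms=10_000, max_age_ms=200)
```

Returning a reason alongside the boolean gives the log a meaningful explanation when the workflow refuses to advance.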

PART 8 — SOFTWARE DESIGN IMPLICATIONS

Workflow logic should be explicitly modeled, not hidden across random service methods.

The scope of this topic already points in this direction: machine workflow/sequencing, deterministic execution, operational control semantics, interlocks/fault handling, and state-driven design all belong here.

Why workflow must be explicit

Because you need to reason about:

current step
allowed next step
wait conditions
interruption points
recovery points
fault ownership
step completion evidence

If these are scattered through event handlers, timers, callbacks, and service classes, you no longer have a workflow. You have an accident waiting to happen.

Good vs bad approaches

Bad: implicit workflows in code

UI button click
  -> serviceA.DoThing()
      -> if ok call serviceB.Start()
          -> callback somewhere sets flag
              -> timer somewhere checks flag
                  -> maybe call serviceC()

Why this fails:

execution flow is invisible
step boundaries are unclear
state lives in booleans everywhere
pause/stop/recovery become chaotic
race conditions become normal
debugging requires reading half the codebase

Good: structured workflow model

Workflow Definition
   -> Step definitions
   -> Transition rules
   -> Preconditions / postconditions
   -> Interruption policy
   -> Retry / recovery policy

Workflow Executor
   -> Runs current step
   -> Tracks state and progress
   -> Waits for completion
   -> Applies transition rules
   -> Handles pause/stop/abort/fault

Subsystem Services
   -> Motion
   -> Vision
   -> IO
   -> Sensors

ASCII component view

+--------------------------+
| Workflow Definition      |
| - steps                  |
| - transitions            |
| - conditions             |
| - recovery rules         |
+------------+-------------+
             |
             v
+--------------------------+
| Workflow Executor        |
| - current step           |
| - progress tracking      |
| - interruption handling  |
| - timeout handling       |
| - retry / recovery       |
+------+-------+-----------+
       |       | 
       v       v
+----------+  +----------+  +----------+  +----------+
| Motion   |  | Vision   |  | Sensors  |  | IO/PLC   |
| Service  |  | Service  |  | Service  |  | Service  |
+----------+  +----------+  +----------+  +----------+
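The split between definition and executor can be sketched in a few lines. This is a toy model under stated assumptions: `StepDef`, `execute`, the step names, and the `focus_ok` context key are hypothetical, step actions run synchronously, and interruption, timeouts, and retries are omitted.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class StepDef:
    action: Callable[[dict], str]          # runs the step, returns an outcome label
    transitions: Dict[str, Optional[str]]  # outcome -> next step (None ends the workflow)


def always(outcome: str) -> Callable[[dict], str]:
    """Placeholder action standing in for a real subsystem call."""
    return lambda context: outcome


# Workflow definition: data only, no device logic.
INSPECTION_CYCLE: Dict[str, StepDef] = {
    "validate_ready": StepDef(always("ok"), {"ok": "move_to_start"}),
    "move_to_start": StepDef(always("ok"), {"ok": "acquire_data"}),
    "acquire_data": StepDef(always("ok"), {"ok": "validate_result"}),
    "validate_result": StepDef(
        lambda context: "pass" if context["focus_ok"] else "fail",
        {"pass": "complete_cycle", "fail": "recovery"},
    ),
    "complete_cycle": StepDef(always("ok"), {"ok": None}),
    "recovery": StepDef(always("ok"), {"ok": "operator_decision"}),
    "operator_decision": StepDef(always("ok"), {"ok": None}),
}


def execute(definition: Dict[str, StepDef], start: str, context: dict,
            max_steps: int = 100) -> List[Tuple[str, str]]:
    """Workflow executor: runs steps, applies transition rules, records a trace."""
    trace: List[Tuple[str, str]] = []
    current: Optional[str] = start
    while current is not None:
        if len(trace) >= max_steps:
            raise RuntimeError("step limit exceeded; possible transition loop")
        step = definition[current]
        outcome = step.action(context)
        trace.append((current, outcome))
        if outcome not in step.transitions:
            raise RuntimeError(f"no transition for {current!r} on {outcome!r}")
        current = step.transitions[outcome]
    return trace


good_trace = execute(INSPECTION_CYCLE, "validate_ready", {"focus_ok": True})
bad_trace = execute(INSPECTION_CYCLE, "validate_ready", {"focus_ok": False})
```

Because the definition is plain data, the allowed paths can be reviewed, tested, and logged without reading any device code.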

Design principles that matter

Clear step definitions: Each step should have a single clear purpose.

Explicit transitions: Do not infer next steps from scattered flags.

State tracking: Track workflow state, step state, and completion markers explicitly.

Separation from device logic: The workflow says what process is happening. Device services say how to talk to hardware.

Time-aware logic: Conditions should know about timeout, freshness, stability, and cancellation.

Observability built in: Every step transition should be logged with timestamps, identifiers, and reason.

Recovery modeled, not improvised: Recovery code written only during incidents is almost always bad.

PART 9 — INTERVIEW / REAL-WORLD TALKING POINTS

How to explain workflows clearly

You can say:

In industrial machine software, a workflow is the explicit model of a real machine process. It is not just sequential code. It defines ordered steps, dependencies, conditions, interruption behavior, and recovery paths across multiple subsystems.

That is a strong answer.

Difference between workflow and orchestration

A clean explanation:

Workflow is the process definition: the sequence and rules of the machine operation. Orchestration is the runtime coordination logic that drives motion, sensors, vision, and IO to execute that process. The workflow describes what should happen; orchestration ensures the subsystems do it in the right order and under the right conditions.

Common mistakes engineers make

treating physical actions like synchronous method calls
assuming command accepted means operation complete
hiding workflow in event handlers and flags
not defining pause/stop/abort semantics per step
retrying non-idempotent actions blindly
not tracking partial completion
using stale status for decisions
mixing device logic and workflow logic
having no timeout or no meaningful timeout reason
making recovery manual because software does not know what happened

What strong engineers understand about long-running processes

Strong engineers understand that:

the machine process is stateful and long-lived
every step has physical meaning
step completion must be proven, not assumed
interruption is part of the design, not an edge case
recovery requires explicit knowledge of partial completion
correctness matters more than elegant-looking abstraction
diagnosability is part of architecture

Interview-ready closing statement

The key architectural move is to model workflow explicitly. Once the workflow, step boundaries, completion conditions, and interruption/recovery semantics are clear, the system becomes understandable, testable, and safer. When those things are implicit, long-running machine behavior becomes fragile very quickly.

Final mental model

Think of workflow coordination in industrial software like this:

workflow gives the machine a structured process
executor/orchestrator drives that process across subsystems
state tracking tells you where you are
interruption handling tells you how to stop or pause safely
recovery logic tells you what to do when reality diverges from expectation

The core mindset is not “run steps in order.” It is:

maintain correct, observable, recoverable progress through a physical process over time.

That is the real meaning of workflow & process coordination in machine software.
