Inspection Workflow Integration

Inspection workflow integration is the point where vision stops being “an image-processing module” and becomes part of the machine’s actual behavior.

In your roadmap, vision includes acquisition pipelines, triggered capture, inspection workflow orchestration, alignment, result handling, image storage, real-time presentation, and integration with machine motion. This topic sits exactly at that intersection: vision, motion, sequencing, state machines, recipes, faults, and deterministic machine behavior. Domain 1 also emphasizes that machine software is state-driven, timing-sensitive, and must coordinate physical actions safely.


PART 1 — WHY INSPECTION IS NOT AN ISOLATED MODULE

A vision algorithm can be technically correct and still be useless in production if it is not connected to the right machine context.

In offline testing, inspection often looks simple:

text
image -> algorithm -> result

But in a real machine, the real question is not only:

“Does this image contain a defect?”

The real question is:

“Does this specific image, captured at this specific position, for this specific wafer/part, under this specific recipe, at this specific workflow step, produce a result that the machine can safely act on?”

That is a much harder software problem.

A real inspection step depends on several things being correct at the same time:

text
Correct product / wafer / part
Correct lot / job context
Correct recipe version
Correct inspection site
Correct machine position
Correct camera configuration
Correct illumination state
Correct image frame
Correct alignment transform
Correct timing
Correct result ownership

If any of those are wrong, the algorithm may still return a “valid” result, but the machine decision becomes wrong.

For example, in a wafer inspection machine, the workflow may move the stage to die location A, trigger image capture, run alignment, inspect the die, and then decide whether to continue scanning or flag the wafer. If the image actually belongs to die location A-1, the inspection logic may still produce measurements, but those measurements are spatially meaningless.

Another example: a robot places a part into a fixture, then a camera verifies its position. If the workflow does not confirm that the part is clamped, motion has settled, illumination is correct, and the captured frame belongs to the current placement cycle, then a “position OK” result may be based on stale or unstable data.

A third example: alignment finds an offset, and that offset changes the next motion command. In this case, inspection is not just reporting information. It directly influences machine behavior. If that offset is stale, computed from the wrong image, or applied to the wrong coordinate frame, the next motion command can be physically wrong.

So from an architecture perspective, inspection is not a passive module. It participates in the machine sequence.

A better mental model is:

text
Machine Workflow
    -> prepares physical state
    -> prepares inspection context
    -> acquires correlated image
    -> runs alignment / inspection
    -> receives structured result
    -> decides machine action
    -> records traceable outcome

The workflow owns the meaning of the inspection result. The algorithm owns the computation. Those two responsibilities should not be mixed.


PART 2 — INSPECTION STEP LIFECYCLE

A production inspection step should be explicit. It should not be hidden behind a single method like:

text
Inspect()

That kind of abstraction is too vague for real machines: it hides which sub-step failed, so the workflow cannot choose the right response.

A better lifecycle is:

text
1. Prepare context
2. Configure imaging / inspection parameters
3. Move or wait for correct condition
4. Acquire image
5. Validate image quality
6. Align / register if needed
7. Run inspection
8. Validate result
9. Dispatch decision
10. Store / report result

Here is a simple flow diagram:

text
+----------------------+
|  Prepare Context     |
|  wafer, part, site   |
|  recipe, step id     |
+----------+-----------+
           |
           v
+----------------------+
| Configure Parameters |
| camera, lighting,    |
| exposure, algorithm  |
+----------+-----------+
           |
           v
+----------------------+
| Move / Wait          |
| position, settle,    |
| trigger condition    |
+----------+-----------+
           |
           v
+----------------------+
| Acquire Image        |
| frame id, timestamp, |
| position snapshot    |
+----------+-----------+
           |
           v
+----------------------+
| Validate Image       |
| exposure, focus,     |
| completeness         |
+----------+-----------+
           |
           v
+----------------------+
| Align / Register     |
| fiducial, offset,    |
| transform            |
+----------+-----------+
           |
           v
+----------------------+
| Run Inspection       |
| measure, detect,     |
| classify             |
+----------+-----------+
           |
           v
+----------------------+
| Validate Result      |
| confidence, limits,  |
| completeness         |
+----------+-----------+
           |
           v
+----------------------+
| Workflow Decision    |
| pass, fail, retry,   |
| alarm, review        |
+----------+-----------+
           |
           v
+----------------------+
| Store / Report       |
| traceability, result |
| event, summary       |
+----------------------+

Each step exists because each step can fail differently.

“Acquire image failed” is not the same as “image quality bad.”

“Alignment failed” is not the same as “product failed inspection.”

“Algorithm crashed” is not the same as “defect found.”

This distinction matters because the workflow response is different.

For example:

text
Bad image quality       -> reacquire
Alignment failure       -> retry alignment, ask operator, or alarm
Defect detected         -> mark product fail
Algorithm exception     -> machine fault / software alarm
Storage timeout         -> continue, buffer, or stop depending on policy

Skipping context validation causes subtle bugs because the result may look structurally valid but be semantically wrong.

A dangerous example:

text
InspectionResult {
    Status = Pass
    Score = 0.97
}

This looks valid, but it is incomplete.

A production-grade result needs context:

text
InspectionResult {
    StepId
    ProductId / WaferId / PartId
    RecipeId / RecipeVersion
    SiteId / DieIndex / PositionId
    FrameId
    AcquisitionTimestamp
    MotionPositionSnapshot
    AlignmentTransformId
    AlgorithmVersion
    ResultStatus
    Confidence
    FailureReason
}

Without this context, you cannot prove what was inspected.
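
As a rough sketch, a result that carries its own context lets the workflow check it before acting on it. The field subset and the helper `context_mismatches` below are assumptions chosen for illustration, not a fixed schema.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class InspectionResult:
    # A subset of the context fields listed above (illustrative only).
    step_id: str
    product_id: str
    recipe_id: str
    recipe_version: int
    site_id: str
    frame_id: str
    status: str
    confidence: float
    failure_reason: Optional[str] = None


def context_mismatches(result: InspectionResult, expected: dict) -> List[str]:
    """Return the context fields whose values differ from what the workflow expects."""
    return [name for name, value in expected.items() if getattr(result, name) != value]


result = InspectionResult("S123", "W-07", "R-42", 3, "die-15", "F987", "Pass", 0.97)
expected = {"step_id": "S123", "product_id": "W-07", "recipe_version": 3, "site_id": "die-16"}
print(context_mismatches(result, expected))  # the site differs, so the Pass must not be applied
```

A `Pass` with a score of 0.97 is only meaningful once this list comes back empty.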


PART 3 — COORDINATION WITH MOTION AND ACQUISITION

Inspection usually depends on physical position and capture timing.

The workflow must coordinate:

text
move to inspection position
wait for motion complete
wait for settle condition
set lighting / exposure
trigger camera
receive frame
verify frame correlation
run alignment / inspection

The dangerous shortcut is:

text
image = camera.GetLatestImage()
result = vision.Inspect(image)

“Latest image” is dangerous because latest does not mean correct.

The latest image could be:

text
from previous part
from previous wafer site
from manual camera preview
from retry attempt
from a trigger that fired late
from another workflow branch
from a buffer not yet cleared

In machine software, image ownership matters.

A safer model is:

text
Request frame for StepId = S123
Camera returns FrameId = F987
Frame metadata includes StepId = S123
Workflow verifies FrameId belongs to current step
Only then inspection runs

Sequence diagram:

text
Participant: Workflow
Participant: MotionController
Participant: CameraAcquisition
Participant: VisionService
Participant: ResultHandler

Workflow          MotionController       CameraAcquisition       VisionService       ResultHandler
   |                    |                       |                     |                    |
   |-- MoveTo(site) --->|                       |                     |                    |
   |<-- motion done ----|                       |                     |                    |
   |                    |                       |                     |                    |
   |-- WaitSettle() --->|                       |                     |                    |
   |<-- settled --------|                       |                     |                    |
   |                    |                       |                     |                    |
   |-- ArmCapture(stepId, expectedPos) -------->|                     |                    |
   |-- TriggerCapture() ----------------------->|                     |                    |
   |<-- Frame(F987, stepId) --------------------|                     |                    |
   |                    |                       |                     |                    |
   |-- Inspect(stepId, frameId, context) --------------------------->|                    |
   |<-- structured result --------------------------------------------|                    |
   |                    |                       |                     |                    |
   |-- HandleResult(result) ---------------------------------------------------------->|
   |                    |                       |                     |                    |

The important point is not the exact API shape. The important point is correlation.

The workflow should know:

text
Which step requested this image?
Which physical position was expected?
Which actual position was captured?
Which recipe was active?
Which product was present?
Which trigger produced this frame?
Which result came from this frame?

A strong architecture treats image frames as evidence, not just data.
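
One way to express frame ownership is a check the workflow runs before any inspection starts. The sketch below is a minimal illustration; `CaptureRequest`, `Frame`, `FrameOwnershipError`, `verify_frame`, and the 5 µm tolerance are all hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class CaptureRequest:
    step_id: str
    trigger_id: int
    expected_position_um: Tuple[float, float]


@dataclass(frozen=True)
class Frame:
    frame_id: str
    step_id: str
    trigger_id: int
    actual_position_um: Tuple[float, float]


class FrameOwnershipError(Exception):
    """Raised when a frame cannot be tied to the step that requested it."""


def verify_frame(request: CaptureRequest, frame: Frame, tolerance_um: float = 5.0) -> Frame:
    """Return the frame only if step, trigger, and position all match the request."""
    if frame.step_id != request.step_id:
        raise FrameOwnershipError(f"frame {frame.frame_id} belongs to step {frame.step_id}")
    if frame.trigger_id != request.trigger_id:
        raise FrameOwnershipError(f"frame {frame.frame_id} came from a different trigger")
    dx = frame.actual_position_um[0] - request.expected_position_um[0]
    dy = frame.actual_position_um[1] - request.expected_position_um[1]
    if abs(dx) > tolerance_um or abs(dy) > tolerance_um:
        raise FrameOwnershipError(f"frame {frame.frame_id} captured off position")
    return frame
```

Raising instead of returning a boolean makes it hard for a caller to ignore a mismatch and inspect the frame anyway.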


PART 4 — RESULT HANDLING AND MACHINE DECISION

The vision service should not secretly decide the machine’s next action.

It can return structured information such as:

text
InspectionResultStatus:
    Passed
    Failed
    ImageQualityRejected
    AlignmentFailed
    AlgorithmError
    Timeout
    Inconclusive

But the workflow decides what to do.

Possible workflow outcomes include:

text
Continue machine sequence
Reject product
Retry acquisition
Retry alignment
Request operator review
Raise alarm
Adjust next motion position
Stop machine
Pause lot
Mark wafer / part for review
Continue but flag degraded condition

The reason this belongs in workflow/application logic is that the correct action depends on machine context.

For example, an alignment failure during setup may mean:

text
Ask operator to teach fiducial again

The same alignment failure during production may mean:

text
Retry once, then stop lot

The same failure during engineering mode may mean:

text
Log warning and allow manual override

The algorithm cannot know all of that safely.

Bad design:

text
VisionAlgorithm detects fail
VisionAlgorithm tells motion to reject part
VisionAlgorithm writes alarm
VisionAlgorithm updates UI
VisionAlgorithm stores result

This creates hidden control flow and makes the machine hard to reason about.

Better design:

text
VisionAlgorithm returns structured result
Workflow evaluates result against current mode / recipe / policy
Workflow dispatches machine command
Workflow raises alarm if needed
Workflow publishes UI event
Workflow records traceable outcome

The workflow owns the machine decision.

The algorithm owns the inspection computation.

Storage owns persistence.

UI owns presentation.

Alarm service owns alarm lifecycle.

That separation is not academic. It prevents production bugs.
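
The mode-dependent handling of an alignment failure described above could look roughly like this in workflow code. It is a sketch under assumptions: `Mode`, `Status`, `decide_action`, and the action strings are hypothetical names, and `attempt` is taken to be the 1-based number of the attempt that just finished.

```python
from enum import Enum, auto


class Mode(Enum):
    SETUP = auto()
    PRODUCTION = auto()
    ENGINEERING = auto()


class Status(Enum):
    PASSED = auto()
    FAILED = auto()
    ALIGNMENT_FAILED = auto()


def decide_action(status: Status, mode: Mode, attempt: int, max_retries: int = 1) -> str:
    """Workflow-side decision: the same vision status maps to different actions per mode."""
    if status is Status.PASSED:
        return "continue"
    if status is Status.FAILED:
        return "reject_product"
    # Remaining case: alignment failed, and the right answer depends on machine mode.
    if mode is Mode.SETUP:
        return "ask_operator_to_reteach_fiducial"
    if mode is Mode.ENGINEERING:
        return "log_warning_allow_manual_override"
    # Production: retry up to max_retries times, then stop the lot.
    return "retry_alignment" if attempt <= max_retries else "stop_lot"
```

The vision code never sees `Mode`; it only reports `ALIGNMENT_FAILED`, and the workflow supplies the rest of the context.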


PART 5 — INSPECTION STATE, RETRY, AND RECOVERY

Inspection can fail for many reasons:

text
Acquisition failure
Poor image quality
Alignment failure
Algorithm error
Timeout
Device disconnected
Motion not settled
Wrong trigger
Lighting not ready
Storage blocked
Operator interruption

The workflow must decide whether to:

text
retry
skip
stop
ask operator
mark product for review
raise alarm
continue in degraded mode

Retry policy is especially tricky in machines because a retry is rarely just a software operation.

In business software, retrying an HTTP call often means sending the same request again.

In machine software, retrying inspection may require physical actions:

text
turn light off/on
clear camera buffer
move stage back
wait for vibration to settle
re-trigger camera
recompute alignment
discard stale frame
preserve original attempt history

A retry that ignores physical state can make the situation worse.

State/flow diagram:

text
+------------------+
| ReadyToInspect   |
+--------+---------+
         |
         v
+------------------+
| PreparingContext |
+--------+---------+
         |
         v
+------------------+
| MovingToPosition |
+--------+---------+
         |
         v
+------------------+
| WaitingForSettle |
+--------+---------+
         |
         v
+------------------+
| AcquiringImage   |
+---+----------+---+
    |          |
    | success  | acquisition failure
    v          v
+------------------+      +------------------+
| ValidatingImage  |      | RetryDecision    |
+---+----------+---+      +----+--------+----+
    |          |               |        |
    | ok       | bad quality   | retry  | stop/alarm
    v          v               v        v
+------------------+      +------------------+
| Aligning         |      | ReprepareCapture |---> back to AcquiringImage
+---+----------+---+      +------------------+
    |          |
    | ok       | fail
    v          v
+------------------+      +------------------+
| Inspecting       |      | RecoveryDecision |
+---+----------+---+      +----+--------+----+
    |          |               |        |
    | ok       | error/timeout | retry  | operator/alarm
    v          v               v        v
+------------------+      +------------------+
| EvaluatingResult |      | Faulted/Review   |
+--------+---------+      +------------------+
         |
         v
+------------------+
| Completed        |
+------------------+

A mature retry policy considers:

text
Is the product still in the same physical position?
Has the machine moved since the failed attempt?
Is the camera buffer clean?
Is the alignment result still valid?
Is the same recipe still active?
Is this the first retry or fifth retry?
Will retry risk damaging product or machine?
Does the operator need to approve?
Should the failed attempt be stored?

Retry must be state-aware.

A common bug is retrying the algorithm without reacquiring the image. Sometimes that is valid. Sometimes it is not. The workflow must make that distinction explicitly.
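
A state-aware retry check might be sketched as a function that returns the reasons a retry is not currently allowed. The `RetryState` fields and the `max_attempts` default below are assumptions that mirror a few of the questions in the list above.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class RetryState:
    attempt_number: int          # attempts already made for this step
    position_unchanged: bool     # product and stage still where the failed attempt left them
    camera_buffer_clean: bool
    recipe_unchanged: bool
    physically_safe: bool        # retry will not risk product or machine


def retry_blockers(state: RetryState, max_attempts: int = 3) -> List[str]:
    """Return the reasons a retry must not proceed; an empty list means retry is allowed."""
    blockers = []
    if state.attempt_number >= max_attempts:
        blockers.append("attempt limit reached")
    if not state.position_unchanged:
        blockers.append("machine moved since failed attempt")
    if not state.camera_buffer_clean:
        blockers.append("camera buffer may hold stale frames")
    if not state.recipe_unchanged:
        blockers.append("recipe changed")
    if not state.physically_safe:
        blockers.append("retry is not physically safe")
    return blockers
```

Returning reasons rather than a bare boolean gives the workflow something concrete to log or show to the operator.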


PART 6 — REAL-WORLD FAILURE SCENARIOS

1. Vision algorithm works offline but fails in live workflow

In production, engineers may say:

text
The algorithm works perfectly on saved images, but the machine still fails.

What it looks like:

text
Offline replay passes.
Live machine produces random fails.
Operators complain that results are unstable.
Vision engineer says the algorithm is fine.
Controls engineer says the motion is fine.
Software team is stuck in the middle.

Why it happens:

text
image captured before settle
wrong lighting state
wrong recipe loaded
motion position not correlated
camera buffer contains old frame
trigger timing is unstable
inspection starts before acquisition is complete

How experienced engineers diagnose it:

text
Compare live frame metadata with workflow step id.
Log requested position vs actual position.
Check trigger timestamp vs motion complete timestamp.
Replay the exact captured frame.
Verify recipe and camera configuration snapshot.
Check whether failed live images differ from offline test images.

The lesson: offline correctness does not prove workflow correctness.
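
Two of the diagnostic checks above, trigger timing versus motion completion and requested versus actual position, can be sketched as a single function over logged values. The function name, parameter names, units, and default thresholds are hypothetical.

```python
from typing import List, Sequence


def diagnose_capture(
    requested_pos_mm: Sequence[float],
    actual_pos_mm: Sequence[float],
    motion_done_s: float,
    trigger_s: float,
    min_settle_s: float = 0.05,
    tolerance_mm: float = 0.005,
) -> List[str]:
    """Return findings that explain why a live frame may differ from offline test images."""
    findings = []
    if trigger_s < motion_done_s:
        findings.append("trigger fired before motion complete")
    elif trigger_s - motion_done_s < min_settle_s:
        findings.append("trigger fired before settle time elapsed")
    # Largest per-axis deviation between commanded and captured position.
    error = max(abs(r - a) for r, a in zip(requested_pos_mm, actual_pos_mm))
    if error > tolerance_mm:
        findings.append(f"position error {error:.4f} mm exceeds tolerance")
    return findings
```

Running this over the logs of failed live steps tends to separate "bad image" complaints into timing problems and positioning problems.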


2. Image belongs to previous part or previous position

What it looks like:

text
Part A is rejected for a defect that belongs to Part B.
Wafer map shows defects shifted by one site.
Inspection appears consistently one step behind.

Why it happens:

text
camera buffer not cleared
latest image used implicitly
frame event processed late
workflow advanced before frame arrived
missing step/frame correlation

How to handle it:

text
Assign StepId before acquisition.
Attach StepId to capture request.
Require FrameId correlation.
Reject frames with mismatched context.
Flush or segment buffers at step boundaries.
Log frame source and acquisition trigger id.

The key architectural rule:

text
Never inspect an unowned frame.


3. Inspection result arrives after workflow already moved on

What it looks like:

text
Machine moves to next site.
Then previous result arrives.
System applies result to current site by mistake.
UI shows confusing status.
Reject action happens late.

Why it happens:

text
asynchronous processing without ownership
no cancellation or result validity check
workflow state changed while inspection was running
slow algorithm path under load
queue backlog

How to handle it:

text
Each result carries StepId and FrameId.
Workflow verifies current state before applying result.
Late results are recorded but not applied.
Cancellation token is passed to inspection.
Workflow uses timeout and state transition rules.
Processing queues expose backlog metrics.

A mature system distinguishes:

text
Result computed
Result accepted by workflow
Result applied to machine decision

Those are not the same thing.
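
A minimal sketch of that distinction: the workflow tracks which step is current, records every computed result, and applies only results that still belong to the current step. `StepTracker` and its method names are hypothetical.

```python
class StepTracker:
    """Applies results only for the current step; late results are recorded, not applied."""

    def __init__(self):
        self.current_step_id = None
        self.applied = []  # results accepted and used for a machine decision
        self.late = []     # results computed for a step the workflow already left

    def begin_step(self, step_id):
        self.current_step_id = step_id

    def on_result(self, step_id, frame_id, status):
        """Return True if the result was applied, False if it arrived too late."""
        record = (step_id, frame_id, status)
        if step_id != self.current_step_id:
            self.late.append(record)
            return False
        self.applied.append(record)
        return True


tracker = StepTracker()
tracker.begin_step("S1")
tracker.begin_step("S2")                              # workflow moved on before S1 finished
print(tracker.on_result("S1", "F1", "Failed"))        # False: kept for traceability only
print(tracker.on_result("S2", "F2", "Passed"))        # True
```

Keeping the late result in its own list preserves traceability without letting it change the state of the wrong site.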


4. Retry uses stale image or stale alignment result

What it looks like:

text
Retry produces the same wrong result instantly.
Machine says retry succeeded, but nothing physical changed.
Alignment offset from previous attempt is reused incorrectly.

Why it happens:

text
retry path calls algorithm again but does not reacquire
alignment cache not invalidated
frame id not changed
context object reused carelessly
state cleanup missing between attempts

How to handle it:

text
Define retry type:
    algorithm retry
    image reacquisition retry
    alignment retry
    full physical retry

Invalidate cached alignment when image changes.
Require new FrameId for reacquisition retry.
Store AttemptNumber.
Preserve attempt history.
Make reuse rules explicit.

Retry should not mean “run the same code again.” It should mean “execute a defined recovery path.”
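
The reuse rules can be made explicit in code. In this sketch, which assumes hypothetical names `RetryKind`, `Attempt`, and `next_attempt`, a reacquisition retry must supply a new frame id, and any retry that changes the image or redoes alignment drops the cached alignment.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional


class RetryKind(Enum):
    ALGORITHM = auto()       # same frame, same alignment
    REACQUIRE = auto()       # new frame required
    ALIGNMENT = auto()       # same frame, alignment recomputed
    FULL_PHYSICAL = auto()   # re-position, new frame, new alignment


@dataclass(frozen=True)
class Attempt:
    number: int
    frame_id: str
    alignment_id: Optional[str]


def next_attempt(prev: Attempt, kind: RetryKind, new_frame_id: Optional[str] = None) -> Attempt:
    """Build the next attempt record, enforcing the reuse rules for each retry kind."""
    if kind in (RetryKind.REACQUIRE, RetryKind.FULL_PHYSICAL):
        if new_frame_id is None or new_frame_id == prev.frame_id:
            raise ValueError("reacquisition retry requires a new FrameId")
        # A new image invalidates any alignment computed from the old one.
        return Attempt(prev.number + 1, new_frame_id, None)
    if kind is RetryKind.ALIGNMENT:
        return Attempt(prev.number + 1, prev.frame_id, None)
    return Attempt(prev.number + 1, prev.frame_id, prev.alignment_id)
```

Because each `Attempt` is immutable, earlier attempts stay available as history instead of being overwritten by the retry.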


5. Algorithm failure is incorrectly treated as product failure

What it looks like:

text
Good products are rejected.
Yield drops suddenly.
Operators see many fail results but no real defects.

Why it happens:

text
algorithm exception mapped to Fail
timeout mapped to DefectFound
alignment failure mapped to ProductFail
image quality failure mapped to ProductFail

This is a serious semantic bug.

A product fail means:

text
The product was inspected and did not meet criteria.

An inspection failure means:

text
The system could not produce a trustworthy inspection result.

Those are different.

How to handle it:

text
Use separate result categories:
    ProductPass
    ProductFail
    InspectionInvalid
    ImageInvalid
    AlignmentInvalid
    SystemFault
    Timeout
    OperatorReviewRequired

Only mark product fail when inspection was valid.

This distinction protects yield, traceability, and customer trust.
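
A sketch of the categorization, assuming hypothetical inputs that the acquisition, alignment, and algorithm stages would report: system-level problems are checked first, so a product verdict is produced only when every earlier stage was valid.

```python
from enum import Enum, auto
from typing import Optional


class Category(Enum):
    PRODUCT_PASS = auto()
    PRODUCT_FAIL = auto()
    INSPECTION_INVALID = auto()
    IMAGE_INVALID = auto()
    ALIGNMENT_INVALID = auto()
    SYSTEM_FAULT = auto()
    TIMEOUT = auto()


def categorize(
    *,
    timed_out: bool,
    algorithm_error: bool,
    image_ok: bool,
    alignment_ok: bool,
    meets_criteria: Optional[bool],
) -> Category:
    """Map raw stage outcomes to a category; product verdicts come last on purpose."""
    if timed_out:
        return Category.TIMEOUT
    if algorithm_error:
        return Category.SYSTEM_FAULT
    if not image_ok:
        return Category.IMAGE_INVALID
    if not alignment_ok:
        return Category.ALIGNMENT_INVALID
    if meets_criteria is None:
        # The algorithm ran but produced no verdict.
        return Category.INSPECTION_INVALID
    return Category.PRODUCT_PASS if meets_criteria else Category.PRODUCT_FAIL
```

With this ordering, an exception or timeout can never fall through to `PRODUCT_FAIL`.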


6. Poor image quality triggers a false defect instead of the reacquire path

What it looks like:

text
Sudden false defects during lighting drift.
Defect rate changes with machine vibration or focus drift.
Same part passes after manual recapture.

Why it happens:

text
image quality validation missing
algorithm forced to inspect bad image
low confidence result treated as defect
exposure/focus/illumination not checked before inspection

How to handle it:

text
Validate image quality before defect decision.
Return ImageQualityRejected separately.
Allow reacquisition policy.
Log image quality metrics.
Keep bad image for diagnosis if needed.
Do not hide image-quality problems as product defects.

Bad image quality should usually enter a recovery path before becoming a product decision.
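
As a rough illustration of an image-quality gate, the sketch below computes mean brightness and a crude sharpness measure on a grayscale image given as rows of 0 to 255 values. The metrics, thresholds, and function name are assumptions; a real system would use metrics tuned to its optics, and the input is assumed to have at least one row with two or more pixels.

```python
from typing import List, Sequence


def image_quality_issues(
    pixels: Sequence[Sequence[int]],
    min_mean: float = 40.0,
    max_mean: float = 215.0,
    min_sharpness: float = 5.0,
) -> List[str]:
    """Return quality problems found; an empty list means the image may be inspected."""
    flat = [v for row in pixels for v in row]
    mean = sum(flat) / len(flat)
    # Mean absolute horizontal gradient: a crude stand-in for a focus/contrast metric.
    diffs = [abs(row[i + 1] - row[i]) for row in pixels for i in range(len(row) - 1)]
    sharpness = sum(diffs) / len(diffs)

    issues = []
    if mean < min_mean:
        issues.append("underexposed")
    elif mean > max_mean:
        issues.append("overexposed")
    if sharpness < min_sharpness:
        issues.append("low_contrast_or_defocused")
    return issues
```

A non-empty list would map to `ImageQualityRejected` and a reacquisition policy, not to a defect decision.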


7. Storage delay blocks workflow

What it looks like:

text
Machine throughput drops.
Inspection step waits on database/file system.
UI freezes or result queue grows.
Camera keeps producing images faster than storage can persist.

Why it happens:

text
workflow synchronously writes large images
storage is on slow network path
database transaction includes heavy image data
no buffering/backpressure policy
storage failure treated inconsistently

How to handle it:

text
Separate decision-critical result from heavy artifact storage.
Persist minimal result synchronously if required.
Offload large image storage to controlled background pipeline.
Use bounded queues.
Expose storage backlog.
Define policy for storage failure:
    stop machine
    continue with warning
    degrade image retention
    pause after threshold

Storage should not accidentally become the hidden cycle-time bottleneck.
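
A bounded queue between the workflow and heavy image storage is one way to keep persistence off the cycle-time path. The sketch below uses Python's standard `queue.Queue`; the `ArtifactStore` class and its "degrade image retention" policy (drop and count when full) are hypothetical choices among the options listed above.

```python
import queue


class ArtifactStore:
    """Non-blocking hand-off of large images to a background writer (writer not shown)."""

    def __init__(self, max_pending: int = 2):
        self._pending = queue.Queue(maxsize=max_pending)
        self.dropped = 0  # exposed so the workflow can alarm after a threshold

    def submit(self, frame_id: str, image_bytes: bytes) -> bool:
        """Queue an image without blocking; return False if it had to be dropped."""
        try:
            self._pending.put_nowait((frame_id, image_bytes))
            return True
        except queue.Full:
            self.dropped += 1
            return False

    @property
    def backlog(self) -> int:
        return self._pending.qsize()

    def drain_one(self):
        """What a background writer would call to fetch the next image to persist."""
        return self._pending.get_nowait()
```

The decision-critical result would still be persisted through its own, smaller path; only the heavy artifact goes through this queue.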


PART 7 — SOFTWARE DESIGN IMPLICATIONS

Inspection workflow integration must be explicit.

The architecture should make these things first-class:

text
inspection context object
frame/result correlation
inspection step lifecycle
separation between algorithm output and workflow decision
retry/recovery policy
observability around each step
deterministic result ownership

Component diagram:

text
+--------------------------------------------------+
|                 Machine Workflow                 |
|  sequence, state, mode, recipe, recovery policy  |
+-------------------------+------------------------+
                          |
                          v
+--------------------------------------------------+
|             Inspection Step Context              |
|  step id, product id, recipe, site, position,     |
|  expected trigger, attempt number, correlation id |
+-------------------------+------------------------+
                          |
                          v
+-------------------+   +-------------------+   +-------------------+
| Acquisition       |   | Alignment         |   | Inspection        |
| Service           |   | Service           |   | Service           |
| frame id, trigger |   | transform, score  |   | measurements,     |
| timestamp         |   | fiducial result   |   | defects, status   |
+---------+---------+   +---------+---------+   +---------+---------+
          |                       |                       |
          +-----------------------+-----------------------+
                                  |
                                  v
+--------------------------------------------------+
|          Structured Inspection Result            |
|  context, frame id, result status, confidence,    |
|  failure reason, timing, algorithm version        |
+-------------------------+------------------------+
                          |
                          v
+-------------------+   +-------------------+   +-------------------+
| Workflow Decision |   | UI Notification   |   | Storage / Report  |
| continue, retry,  |   | status, alarm,    |   | traceability,     |
| reject, stop      |   | operator review   |   | audit, evidence   |
+-------------------+   +-------------------+   +-------------------+

The bad approach:

text
Workflow calls camera.GetLatestImage()
Vision algorithm reads global recipe
Vision algorithm controls reject station
Result has no frame id
Retry reuses stale state
Storage happens inside algorithm
UI listens to random events

This creates a fragile system where nobody owns the truth.

The good approach:

text
Workflow owns inspection step.
Workflow creates InspectionContext.
Acquisition returns correlated Frame.
Alignment returns correlated Transform.
Inspection returns StructuredInspectionResult.
Workflow decides machine action.
UI and storage receive events/results after workflow ownership is clear.

A good inspection context might conceptually contain:

text
InspectionContext
    RunId
    LotId
    WaferId / PartId
    RecipeId
    RecipeVersion
    StepId
    SiteId / PositionId
    ExpectedMachinePosition
    ActualPositionSnapshot
    CameraId
    AcquisitionPlanId
    AttemptNumber
    CorrelationId
    TimeoutPolicy
    RetryPolicy

A strong result might contain:

text
StructuredInspectionResult
    Context
    FrameId
    AlignmentResultId
    Status
    Measurements
    Defects
    Confidence
    FailureCategory
    FailureReason
    StartedAt
    CompletedAt
    Duration
    AlgorithmVersion
    IsDecisionEligible

The field IsDecisionEligible is important conceptually.

It means:

text
Can this result safely influence product or machine decision?

For example:

text
ProductFail            -> decision eligible
ProductPass            -> decision eligible
ImageQualityRejected   -> not product decision eligible
AlgorithmError         -> not product decision eligible
AlignmentFailed        -> usually not product decision eligible
Timeout                -> depends on policy

This prevents invalid inspection attempts from becoming false product decisions.
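
The eligibility table above could be computed by a small function. This is a sketch under assumptions: the `Status` enum and the `timeout_is_decision` flag are hypothetical, and `ALIGNMENT_FAILED` is treated as never eligible here even though the table says "usually".

```python
from enum import Enum, auto


class Status(Enum):
    PRODUCT_PASS = auto()
    PRODUCT_FAIL = auto()
    IMAGE_QUALITY_REJECTED = auto()
    ALGORITHM_ERROR = auto()
    ALIGNMENT_FAILED = auto()
    TIMEOUT = auto()


def is_decision_eligible(status: Status, timeout_is_decision: bool = False) -> bool:
    """True only when the result may influence a product or machine decision."""
    if status in (Status.PRODUCT_PASS, Status.PRODUCT_FAIL):
        return True
    if status is Status.TIMEOUT:
        # Policy-dependent: some lines may treat a timeout as a decision, others never do.
        return timeout_is_decision
    return False
```

Computing the flag in one place keeps every consumer of the result from re-deriving it differently.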


PART 8 — INTERVIEW / REAL-WORLD TALKING POINTS

A clear interview explanation could sound like this:

In a machine vision system, inspection is not just an algorithm running on an image. It is a workflow step that depends on machine state, recipe, position, acquisition timing, frame ownership, alignment, and result correlation. The workflow should own the inspection lifecycle and machine decision. The vision service should return structured, correlated results, not directly control machine behavior.

Another strong version:

The hardest bugs happen when vision works offline but fails inside the live machine sequence. That usually means the issue is not the algorithm itself, but context: wrong frame, wrong position, stale alignment, late result, bad trigger timing, or missing retry/recovery semantics.

Common mistakes software engineers make when entering vision systems:

text
They treat image inspection like a stateless function.
They use “latest image” instead of correlated frame ownership.
They let algorithm code decide machine actions.
They treat algorithm failure as product failure.
They design retry without considering physical machine state.
They forget that motion, lighting, trigger, image, and result must belong to the same workflow step.
They under-log timing and correlation data.
They block machine sequence on heavy storage operations.

What strong engineers understand:

text
Inspection is part of the machine sequence.
Context is as important as pixels.
Frame/result correlation is mandatory.
Workflow owns decision-making.
Algorithms return evidence, not machine commands.
Retry is a physical recovery policy, not just code repetition.
Invalid inspection is not the same as failed product.
Late results must not mutate the wrong workflow state.
Every result must be traceable to product, recipe, frame, position, and step.

The core architectural principle is:

text
Vision computes.
Workflow decides.
Machine state constrains.
Context proves correctness.
Traceability preserves trust.

That is the mindset shift from normal software to production machine software.
