Inspection Workflow Integration
Inspection workflow integration is the point where vision stops being “an image-processing module” and becomes part of the machine’s actual behavior.
In your roadmap, vision includes acquisition pipelines, triggered capture, inspection workflow orchestration, alignment, result handling, image storage, real-time presentation, and integration with machine motion. This topic sits exactly at that intersection: vision, motion, sequencing, state machines, recipes, faults, and deterministic machine behavior. Domain 1 also emphasizes that machine software is state-driven, timing-sensitive, and must coordinate physical actions safely.
PART 1 — WHY INSPECTION IS NOT AN ISOLATED MODULE
A vision algorithm can be technically correct and still be useless in production if it is not connected to the right machine context.
In offline testing, inspection often looks simple:
image -> algorithm -> result
But in a real machine, the real question is not only:
“Does this image contain a defect?”
The real question is:
“Does this specific image, captured at this specific position, for this specific wafer/part, under this specific recipe, at this specific workflow step, produce a result that the machine can safely act on?”
That is a much harder software problem.
A real inspection step depends on several things being correct at the same time:
Correct product / wafer / part
Correct lot / job context
Correct recipe version
Correct inspection site
Correct machine position
Correct camera configuration
Correct illumination state
Correct image frame
Correct alignment transform
Correct timing
Correct result ownership
If any of those are wrong, the algorithm may still return a “valid” result, but the machine decision becomes wrong.
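Several items on that checklist can be turned into an explicit gate that runs before any result is trusted. The sketch below is a minimal Python illustration under stated assumptions, not a real machine API: StepContext, find_context_mismatches, and the 0.005 mm position tolerance are hypothetical names and values. It compares what the workflow expected with what was recorded at capture time and reports every mismatch, not just the first.

```python
from dataclasses import dataclass

# Hypothetical context record; field names are illustrative, not a real machine API.
@dataclass(frozen=True)
class StepContext:
    product_id: str
    lot_id: str
    recipe_version: str
    site_id: str
    position_mm: tuple[float, float]
    camera_config_id: str
    illumination_state: str
    step_id: str

# Assumed tolerance for "correct machine position"; a real value would come from the recipe.
POSITION_TOLERANCE_MM = 0.005

EXACT_MATCH_FIELDS = (
    "product_id", "lot_id", "recipe_version", "site_id",
    "camera_config_id", "illumination_state", "step_id",
)

def find_context_mismatches(expected: StepContext, observed: StepContext) -> list[str]:
    """Report every context field that disagrees, not just the first one."""
    problems = []
    for name in EXACT_MATCH_FIELDS:
        want, got = getattr(expected, name), getattr(observed, name)
        if want != got:
            problems.append(f"{name}: expected {want!r}, got {got!r}")
    dx = abs(expected.position_mm[0] - observed.position_mm[0])
    dy = abs(expected.position_mm[1] - observed.position_mm[1])
    if max(dx, dy) > POSITION_TOLERANCE_MM:
        problems.append(f"position off by ({dx:.4f}, {dy:.4f}) mm")
    return problems
```

An empty list means the captured context matches what the step expected. A non-empty list means the algorithm output, however valid it looks, should not drive a machine decision.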
For example, in a wafer inspection machine, the workflow may move the stage to die location A, trigger image capture, run alignment, inspect the die, and then decide whether to continue scanning or flag the wafer. If the image actually belongs to die location A-1, the inspection logic may still produce measurements, but those measurements are spatially meaningless.
Another example: a robot places a part into a fixture, then a camera verifies its position. If the workflow does not confirm that the part is clamped, motion has settled, illumination is correct, and the captured frame belongs to the current placement cycle, then a “position OK” result may be based on stale or unstable data.
A third example: alignment finds an offset, and that offset changes the next motion command. In this case, inspection is not just reporting information. It directly influences machine behavior. If that offset is stale, computed from the wrong image, or applied to the wrong coordinate frame, the next motion command can be physically wrong.
So from an architecture perspective, inspection is not a passive module. It participates in the machine sequence.
A better mental model is:
Machine Workflow
-> prepares physical state
-> prepares inspection context
-> acquires correlated image
-> runs alignment / inspection
-> receives structured result
-> decides machine action
-> records traceable outcome
The workflow owns the meaning of the inspection result. The algorithm owns the computation. Those two responsibilities should not be mixed.
PART 2 — INSPECTION STEP LIFECYCLE
A production inspection step should be explicit. It should not be hidden behind a single method like:
Inspect()
That kind of abstraction is too vague for real machines.
A better lifecycle is:
1. Prepare context
2. Configure imaging / inspection parameters
3. Move or wait for correct condition
4. Acquire image
5. Validate image quality
6. Align / register if needed
7. Run inspection
8. Validate result
9. Dispatch decision
10. Store / report result
Here is a simple flow diagram:
+----------------------+
| Prepare Context |
| wafer, part, site |
| recipe, step id |
+----------+-----------+
|
v
+----------------------+
| Configure Parameters |
| camera, lighting, |
| exposure, algorithm |
+----------+-----------+
|
v
+----------------------+
| Move / Wait |
| position, settle, |
| trigger condition |
+----------+-----------+
|
v
+----------------------+
| Acquire Image |
| frame id, timestamp, |
| position snapshot |
+----------+-----------+
|
v
+----------------------+
| Validate Image |
| exposure, focus, |
| completeness |
+----------+-----------+
|
v
+----------------------+
| Align / Register |
| fiducial, offset, |
| transform |
+----------+-----------+
|
v
+----------------------+
| Run Inspection |
| measure, detect, |
| classify |
+----------+-----------+
|
v
+----------------------+
| Validate Result |
| confidence, limits, |
| completeness |
+----------+-----------+
|
v
+----------------------+
| Workflow Decision |
| pass, fail, retry, |
| alarm, review |
+----------+-----------+
|
v
+----------------------+
| Store / Report |
| traceability, result |
| event, summary |
+----------------------+
Each step exists because each step can fail differently.
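Making the lifecycle explicit in code mostly means that every step reports its own failure instead of collapsing into one opaque Inspect() call. A minimal Python sketch, assuming hypothetical step names and a hypothetical StepError exception for expected, classified failures; any other exception is treated as a software fault at that step.

```python
from typing import Callable

class StepError(Exception):
    """An expected, classified failure raised by a lifecycle step (e.g. image quality too low)."""

# Each step is a (name, action) pair; action(ctx) reads and updates the shared step context.
Step = tuple[str, Callable[[dict], None]]

def run_lifecycle(steps: list[Step], ctx: dict) -> dict:
    """Run the steps in order and report exactly where and how the lifecycle stopped."""
    completed: list[str] = []
    for name, action in steps:
        try:
            action(ctx)
        except StepError as exc:
            # Expected failure: the workflow can pick a step-specific response.
            return {"outcome": "step_failed", "step": name,
                    "reason": str(exc), "completed": completed}
        except Exception as exc:
            # Unexpected exception: a software fault, never a product verdict.
            return {"outcome": "software_fault", "step": name,
                    "reason": repr(exc), "completed": completed}
        completed.append(name)
    return {"outcome": "completed", "step": None, "reason": None, "completed": completed}
```

In this sketch "defect found" is not a step failure at all: it is a valid outcome the inspect step writes into ctx, which keeps it distinct from "algorithm crashed" (software_fault) and "image quality bad" (step_failed at a validation step).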
“Acquire image failed” is not the same as “image quality bad.”
“Alignment failed” is not the same as “product failed inspection.”
“Algorithm crashed” is not the same as “defect found.”
This distinction matters because the workflow response is different.
For example:
Bad image quality -> reacquire
Alignment failure -> retry alignment, ask operator, or alarm
Defect detected -> mark product fail
Algorithm exception -> machine fault / software alarm
Storage timeout -> continue, buffer, or stop depending on policy
Context needs the same rigor. Skipping context validation causes subtle bugs because the result may look structurally valid but semantically wrong.
A dangerous example:
InspectionResult {
Status = Pass
Score = 0.97
}
This looks valid, but it is incomplete.
A production-grade result needs context:
InspectionResult {
StepId
ProductId / WaferId / PartId
RecipeId / RecipeVersion
SiteId / DieIndex / PositionId
FrameId
AcquisitionTimestamp
MotionPositionSnapshot
AlignmentTransformId
AlgorithmVersion
ResultStatus
Confidence
FailureReason
}
Without this context, you cannot prove what was inspected.
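One way to stop results from being structurally valid but semantically wrong is to make the context mandatory in the result type itself, so a bare "Status = Pass, Score = 0.97" cannot even be constructed. A Python sketch with a frozen dataclass; the field names mirror the production-grade pseudocode above, and the concrete types are assumptions.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

@dataclass(frozen=True)
class InspectionResult:
    # Context: which step, product, recipe, site, and frame this result belongs to.
    step_id: str
    product_id: str
    recipe_id: str
    recipe_version: str
    site_id: str
    frame_id: str
    acquisition_timestamp: datetime
    motion_position_snapshot: tuple[float, float]
    alignment_transform_id: Optional[str]
    algorithm_version: str
    # Outcome of the computation.
    result_status: str
    confidence: float
    failure_reason: Optional[str] = None
```

Because none of the context fields has a default, constructing a result with only a status and a score raises TypeError, so the incomplete form cannot slip through silently.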
PART 3 — COORDINATION WITH MOTION AND ACQUISITION
Inspection usually depends on physical position and capture timing.
The workflow must coordinate:
move to inspection position
wait for motion complete
wait for settle condition
set lighting / exposure
trigger camera
receive frame
verify frame correlation
run alignment / inspection
The dangerous shortcut is:
image = camera.GetLatestImage()
result = vision.Inspect(image)
“Latest image” is dangerous because latest does not mean correct.
The latest image could be:
from previous part
from previous wafer site
from manual camera preview
from retry attempt
from a trigger that fired late
from another workflow branch
from a buffer not yet cleared
In machine software, image ownership matters.
A safer model is:
Request frame for StepId = S123
Camera returns FrameId = F987
Frame metadata includes StepId = S123
Workflow verifies FrameId belongs to current step
Only then does inspection run
Sequence diagram:
Participant: Workflow
Participant: MotionController
Participant: CameraAcquisition
Participant: VisionService
Participant: ResultHandler
Workflow          -> MotionController  : MoveTo(site)
MotionController  -> MotionController  : motion starts
MotionController  -> Workflow          : motion done
Workflow          -> MotionController  : WaitSettle()
MotionController  -> Workflow          : settle confirmed
Workflow          -> CameraAcquisition : ArmCapture(stepId, expectedPos)
Workflow          -> CameraAcquisition : TriggerCapture()
CameraAcquisition -> CameraAcquisition : frame captured
CameraAcquisition -> Workflow          : FrameId F987
Workflow          -> VisionService     : Inspect(stepId, frameId, context)
VisionService     -> VisionService     : run inspection
VisionService     -> Workflow          : result
Workflow          -> ResultHandler     : HandleResult(result)
The important point is not the exact API shape. The important point is correlation.
The workflow should know:
Which step requested this image?
Which physical position was expected?
Which actual position was captured?
Which recipe was active?
Which product was present?
Which trigger produced this frame?
Which result came from this frame?
A strong architecture treats image frames as evidence, not just data.
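A minimal Python sketch of "request a frame for a step, then accept only that step's frame", assuming a hypothetical FrameBuffer that stands in for whatever the real acquisition layer provides. It deliberately has no "get latest" method: the workflow arms a capture for a step, and a frame is handed out only if it was produced by a trigger armed for that step.

```python
import itertools
from dataclasses import dataclass

@dataclass(frozen=True)
class Frame:
    frame_id: str
    step_id: str          # copied from the armed capture request, not guessed later
    trigger_id: int
    pixels: bytes = b""   # image payload omitted in this sketch

class UnownedFrameError(Exception):
    pass

class FrameBuffer:
    """Hypothetical acquisition buffer keyed by the step that armed each capture."""
    def __init__(self):
        self._armed: dict[int, str] = {}     # trigger_id -> step_id
        self._frames: dict[str, Frame] = {}  # step_id -> owned frame
        self._trigger_ids = itertools.count(1)

    def arm_capture(self, step_id: str) -> int:
        trigger_id = next(self._trigger_ids)
        self._armed[trigger_id] = step_id
        return trigger_id

    def on_frame(self, frame_id: str, trigger_id: int, pixels: bytes = b"") -> None:
        # Called by the camera driver; frames from unarmed triggers (e.g. preview) are dropped.
        step_id = self._armed.pop(trigger_id, None)
        if step_id is not None:
            self._frames[step_id] = Frame(frame_id, step_id, trigger_id, pixels)

    def take_frame(self, step_id: str) -> Frame:
        frame = self._frames.pop(step_id, None)
        if frame is None:
            raise UnownedFrameError(f"no frame owned by step {step_id}")
        return frame
```

A preview frame or any frame from an unarmed trigger is never stored, and a frame that arrives late for a previous step stays attached to that step, so it cannot be inspected as the current one.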
PART 4 — RESULT HANDLING AND MACHINE DECISION
The vision service should not secretly decide the machine’s next action.
It can return structured information such as:
InspectionResultStatus:
Passed
Failed
ImageQualityRejected
AlignmentFailed
AlgorithmError
Timeout
Inconclusive
But the workflow decides what to do.
Possible workflow outcomes include:
Continue machine sequence
Reject product
Retry acquisition
Retry alignment
Request operator review
Raise alarm
Adjust next motion position
Stop machine
Pause lot
Mark wafer / part for review
Continue but flag degraded condition
The reason this belongs in workflow/application logic is that the correct action depends on machine context.
For example, an alignment failure during setup may mean:
Ask operator to teach fiducial again
The same alignment failure during production may mean:
Retry once, then stop lot
The same failure during engineering mode may mean:
Log warning and allow manual override
The algorithm cannot know all of that safely.
Bad design:
VisionAlgorithm detects fail
VisionAlgorithm tells motion to reject part
VisionAlgorithm writes alarm
VisionAlgorithm updates UI
VisionAlgorithm stores result
This creates hidden control flow and makes the machine hard to reason about.
Better design:
VisionAlgorithm returns structured result
Workflow evaluates result against current mode / recipe / policy
Workflow dispatches machine command
Workflow raises alarm if needed
Workflow publishes UI event
Workflow records traceable outcome
The workflow owns the machine decision.
The algorithm owns the inspection computation.
Storage owns persistence.
UI owns presentation.
Alarm service owns alarm lifecycle.
That separation is not academic. It prevents production bugs.
PART 5 — INSPECTION STATE, RETRY, AND RECOVERY
Inspection can fail for many reasons:
Acquisition failure
Poor image quality
Alignment failure
Algorithm error
Timeout
Device disconnected
Motion not settled
Wrong trigger
Lighting not ready
Storage blocked
Operator interruption
The workflow must decide whether to:
retry
skip
stop
ask operator
mark product for review
raise alarm
continue in degraded mode
Retry policy is especially tricky in machines because retry is not just software retry.
In business software, retrying an HTTP call often means sending the same request again.
In machine software, retrying inspection may require physical actions:
turn light off/on
clear camera buffer
move stage back
wait for vibration to settle
re-trigger camera
recompute alignment
discard stale frame
preserve original attempt history
A retry that ignores physical state can make the situation worse.
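A reacquisition retry can be modeled as a named, ordered recovery path rather than a loop around the algorithm call. In this Python sketch, the machine object and its methods (discard_stale_frame, clear_camera_buffer, lights_off, and so on) are hypothetical stand-ins for real device drivers, and the order of actions is an assumption; the point is the explicit sequence and the preserved attempt history.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass(frozen=True)
class Attempt:
    number: int
    frame_id: Optional[str]
    outcome: str

@dataclass
class RetryRecord:
    step_id: str
    attempts: list[Attempt] = field(default_factory=list)

# Assumed order of physical preparation before re-triggering the camera.
PREPARE_ACTIONS = (
    "discard_stale_frame",
    "clear_camera_buffer",
    "lights_off",
    "lights_on",
    "move_stage_back",
    "wait_for_settle",
)

def execute_reacquisition(machine, record: RetryRecord, failed: Attempt) -> str:
    """Preserve the failed attempt, restore physical state in order, then re-trigger."""
    record.attempts.append(failed)            # history is appended, never overwritten
    for action in PREPARE_ACTIONS:
        getattr(machine, action)()            # each call stands for a physical step
    new_frame_id = machine.retrigger_camera()
    if new_frame_id == failed.frame_id:
        raise RuntimeError("retry returned the stale frame; nothing physical changed")
    return new_frame_id
```

Alignment is deliberately not reused here: after a new frame arrives, the normal align step runs again on that frame.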
State/flow diagram:
+------------------+
| ReadyToInspect |
+--------+---------+
|
v
+------------------+
| PreparingContext |
+--------+---------+
|
v
+------------------+
| MovingToPosition |
+--------+---------+
|
v
+------------------+
| WaitingForSettle |
+--------+---------+
|
v
+------------------+
| AcquiringImage |
+---+----------+---+
| |
| success | acquisition failure
v v
+------------------+ +------------------+
| ValidatingImage | | RetryDecision |
+---+----------+---+ +----+--------+----+
| | | |
| ok | bad quality | retry | stop/alarm
v v v v
+------------------+ +------------------+
| Aligning |<-----| ReprepareCapture |
+---+----------+---+ +------------------+
| |
| ok | fail
v v
+------------------+ +------------------+
| Inspecting | | RecoveryDecision |
+---+----------+---+ +----+--------+----+
| | | |
| ok | error/timeout | retry | operator/alarm
v v v v
+------------------+ +------------------+
| EvaluatingResult | | Faulted/Review |
+--------+---------+ +------------------+
|
v
+------------------+
| Completed |
+------------------+
A mature retry policy considers:
Is the product still in the same physical position?
Has the machine moved since the failed attempt?
Is the camera buffer clean?
Is the alignment result still valid?
Is the same recipe still active?
Is this the first retry or fifth retry?
Will retry risk damaging product or machine?
Does the operator need to approve?
Should the failed attempt be stored?
Retry must be state-aware.
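Most of those questions can be evaluated together in one guard that runs before any retry. A Python sketch; the MachineSnapshot fields, the decision strings, and the limit of three attempts are assumptions, and a real policy would come from the recipe and the machine mode. Storing the failed attempt is assumed to happen unconditionally elsewhere.

```python
from dataclasses import dataclass

# Hypothetical snapshot of the machine facts a retry decision depends on.
@dataclass(frozen=True)
class MachineSnapshot:
    product_present: bool
    position_id: str
    recipe_version: str
    camera_buffer_empty: bool
    alignment_valid: bool
    retry_risks_damage: bool
    operator_approved: bool

MAX_ATTEMPTS = 3  # assumed limit; a real value would be a recipe or mode setting

def retry_decision(at_failure: MachineSnapshot, now: MachineSnapshot,
                   attempt: int, needs_operator: bool) -> str:
    """Decide the kind of retry (if any) from physical state, not from the error alone."""
    if now.retry_risks_damage or not now.product_present:
        return "stop"
    if attempt >= MAX_ATTEMPTS:
        return "raise_alarm"
    if now.recipe_version != at_failure.recipe_version:
        return "stop"                          # context changed; attempts are not comparable
    if now.position_id != at_failure.position_id:
        return "full_physical_retry"           # machine moved; redo motion and settle first
    if needs_operator and not now.operator_approved:
        return "ask_operator"
    if not now.camera_buffer_empty:
        return "clear_buffer_then_reacquire"
    if not now.alignment_valid:
        return "realign_then_inspect"
    return "reacquire"
```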
A common bug is retrying the algorithm without reacquiring the image. Sometimes that is valid. Sometimes it is not. The workflow must make that distinction explicitly.
PART 6 — REAL-WORLD FAILURE SCENARIOS
1. Vision algorithm works offline but fails in live workflow
In production, engineers may say:
The algorithm works perfectly on saved images, but the machine still fails.
What it looks like:
Offline replay passes.
Live machine produces random fails.
Operators complain that results are unstable.
Vision engineer says the algorithm is fine.
Controls engineer says the motion is fine.
Software team is stuck in the middle.
Why it happens:
image captured before settle
wrong lighting state
wrong recipe loaded
motion position not correlated
camera buffer contains old frame
trigger timing is unstable
inspection starts before acquisition is complete
How experienced engineers diagnose it:
Compare live frame metadata with workflow step id.
Log requested position vs actual position.
Check trigger timestamp vs motion complete timestamp.
Replay the exact captured frame.
Verify recipe and camera configuration snapshot.
Check whether failed live images differ from offline test images.
The lesson: offline correctness does not prove workflow correctness.
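Once frames carry metadata, most of these checks become mechanical. A Python sketch of a per-frame diagnostic, assuming hypothetical metadata fields (settle_confirmed_at, trigger_at, exposure_end_at, inspection_started_at, positions in millimetres), an assumed 0.005 mm position tolerance, and that all timestamps come from the same clock.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FrameRecord:
    frame_step_id: str
    workflow_step_id: str
    requested_xy_mm: tuple[float, float]
    actual_xy_mm: tuple[float, float]
    settle_confirmed_at: float      # seconds; all timestamps assumed to share one clock
    trigger_at: float
    exposure_end_at: float
    inspection_started_at: float
    recipe_version_at_capture: str
    recipe_version_active: str

def diagnose(rec: FrameRecord, tol_mm: float = 0.005) -> list[str]:
    """Return human-readable findings for one live frame; empty means no obvious cause."""
    findings = []
    if rec.frame_step_id != rec.workflow_step_id:
        findings.append("frame belongs to a different step")
    err = max(abs(a - b) for a, b in zip(rec.requested_xy_mm, rec.actual_xy_mm))
    if err > tol_mm:
        findings.append(f"position error {err:.4f} mm exceeds tolerance")
    if rec.trigger_at < rec.settle_confirmed_at:
        findings.append("image captured before settle")
    if rec.inspection_started_at < rec.exposure_end_at:
        findings.append("inspection started before acquisition completed")
    if rec.recipe_version_at_capture != rec.recipe_version_active:
        findings.append("recipe changed between capture and inspection")
    return findings
```

Running this over the live frames that failed, and comparing with the frames that passed, usually points at the context problem faster than re-tuning the algorithm.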
2. Image belongs to previous part or previous position
What it looks like:
Part A is rejected for a defect that belongs to Part B.
Wafer map shows defects shifted by one site.
Inspection appears consistently one step behind.
Why it happens:
camera buffer not cleared
latest image used implicitly
frame event processed late
workflow advanced before frame arrived
missing step/frame correlation
How to handle it:
Assign StepId before acquisition.
Attach StepId to capture request.
Require FrameId correlation.
Reject frames with mismatched context.
Flush or segment buffers at step boundaries.
Log frame source and acquisition trigger id.
The key architectural rule:
Never inspect an unowned frame.
3. Inspection result arrives after workflow already moved on
What it looks like:
Machine moves to next site.
Then previous result arrives.
System applies result to current site by mistake.
UI shows confusing status.
Reject action happens late.
Why it happens:
asynchronous processing without ownership
no cancellation or result validity check
workflow state changed while inspection was running
slow algorithm path under load
queue backlog
How to handle it:
Each result carries StepId and FrameId.
Workflow verifies current state before applying result.
Late results are recorded but not applied.
Cancellation token is passed to inspection.
Workflow uses timeout and state transition rules.
Processing queues expose backlog metrics.
A mature system distinguishes:
Result computed
Result accepted by workflow
Result applied to machine decision
Those are not the same thing.
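The three stages can be kept separate in code: a result is recorded when computed, accepted only if it still matches the workflow's current step and frame, and applied only after acceptance. A Python sketch with hypothetical names (Workflow, on_result); it is single-threaded, so a real implementation would also need synchronization around the current-step state.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Result:
    step_id: str
    frame_id: str
    status: str

@dataclass
class Workflow:
    current_step_id: str
    current_frame_id: str
    computed: list[Result] = field(default_factory=list)   # every result, for traceability
    accepted: list[Result] = field(default_factory=list)
    applied: list[str] = field(default_factory=list)       # machine decisions actually taken

    def on_result(self, result: Result) -> bool:
        self.computed.append(result)                        # 1. computed: always recorded
        if (result.step_id, result.frame_id) != (self.current_step_id, self.current_frame_id):
            return False                                    # late or foreign: recorded, not applied
        self.accepted.append(result)                        # 2. accepted by the workflow
        self.applied.append(f"{result.step_id}:{result.status}")  # 3. applied to a decision
        return True
```

A late "Failed" result for the previous site ends up in the computed history for diagnosis, but it can no longer reject whatever is at the current site.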
4. Retry uses stale image or stale alignment result
What it looks like:
Retry produces the same wrong result instantly.
Machine says retry succeeded, but nothing physical changed.
Alignment offset from previous attempt is reused incorrectly.
Why it happens:
retry path calls algorithm again but does not reacquire
alignment cache not invalidated
frame id not changed
context object reused carelessly
state cleanup missing between attempts
How to handle it:
Define retry type:
algorithm retry
image reacquisition retry
alignment retry
full physical retry
Invalidate cached alignment when image changes.
Require new FrameId for reacquisition retry.
Store AttemptNumber.
Preserve attempt history.
Make reuse rules explicit.
Retry should not mean “run the same code again.” It should mean “execute a defined recovery path.”
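Retry types and their reuse rules can be written down and checked before an attempt runs. A Python sketch: the four retry types come from the list above, while AttemptInputs, check_attempt, and the motion_cycle_id counter are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from enum import Enum

class RetryType(Enum):
    ALGORITHM = "algorithm retry"
    REACQUISITION = "image reacquisition retry"
    ALIGNMENT = "alignment retry"
    FULL_PHYSICAL = "full physical retry"

@dataclass(frozen=True)
class AttemptInputs:
    attempt_number: int
    frame_id: str
    alignment_frame_id: str   # frame the alignment result was computed from
    motion_cycle_id: int      # assumed counter, incremented whenever the stage re-approaches the site

def check_attempt(kind: RetryType, prev: AttemptInputs, new: AttemptInputs) -> list[str]:
    """Return violated reuse rules for the new attempt; empty means it may run."""
    errors = []
    if new.attempt_number != prev.attempt_number + 1:
        errors.append("attempt number must increase by one")
    if new.alignment_frame_id != new.frame_id:
        errors.append("alignment was computed from a different frame; invalidate it")
    if kind is RetryType.ALGORITHM and new.frame_id != prev.frame_id:
        errors.append("algorithm retry must reuse the same frame")
    if kind in (RetryType.REACQUISITION, RetryType.FULL_PHYSICAL) and new.frame_id == prev.frame_id:
        errors.append("reacquisition requires a new FrameId")
    if kind is RetryType.FULL_PHYSICAL and new.motion_cycle_id == prev.motion_cycle_id:
        errors.append("full physical retry requires a new motion cycle")
    return errors
```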
5. Algorithm fail is treated as product fail incorrectly
What it looks like:
Good products are rejected.
Yield drops suddenly.
Operators see many fail results but no real defects.
Why it happens:
algorithm exception mapped to Fail
timeout mapped to DefectFound
alignment failure mapped to ProductFail
image quality failure mapped to InspectionFail
This is a serious semantic bug.
A product fail means:
The product was inspected and did not meet criteria.
An inspection failure means:
The system could not produce a trustworthy inspection result.
Those are different.
How to handle it:
Use separate result categories:
ProductPass
ProductFail
InspectionInvalid
ImageInvalid
AlignmentInvalid
SystemFault
Timeout
OperatorReviewRequired
Only mark product fail when inspection was valid.
This distinction protects yield, traceability, and customer trust.
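The essential rule is that only a completed, valid inspection may produce ProductPass or ProductFail; everything else maps to a non-product category. A Python sketch of a wrapper around the algorithm call; the exception classes and the run_inspection_safely name are hypothetical, and the category names follow the list above.

```python
from enum import Enum
from typing import Callable, Optional

class Category(Enum):
    PRODUCT_PASS = "ProductPass"
    PRODUCT_FAIL = "ProductFail"
    INSPECTION_INVALID = "InspectionInvalid"
    IMAGE_INVALID = "ImageInvalid"
    ALIGNMENT_INVALID = "AlignmentInvalid"
    SYSTEM_FAULT = "SystemFault"
    TIMEOUT = "Timeout"
    OPERATOR_REVIEW_REQUIRED = "OperatorReviewRequired"

# Hypothetical exception types an inspection pipeline might raise.
class ImageQualityError(Exception):
    pass

class AlignmentError(Exception):
    pass

def run_inspection_safely(inspect: Callable[[], Optional[bool]]) -> Category:
    """inspect() returns True (meets criteria), False (does not), or None (inconclusive)."""
    try:
        meets_criteria = inspect()
    except ImageQualityError:
        return Category.IMAGE_INVALID
    except AlignmentError:
        return Category.ALIGNMENT_INVALID
    except TimeoutError:
        return Category.TIMEOUT
    except Exception:
        return Category.SYSTEM_FAULT   # an algorithm crash is a system problem, not a defect
    if meets_criteria is None:
        return Category.INSPECTION_INVALID
    # Only a completed, valid inspection may produce a product verdict.
    return Category.PRODUCT_PASS if meets_criteria else Category.PRODUCT_FAIL
```

OperatorReviewRequired is listed for completeness; in this sketch it would be assigned later by workflow policy, not by the wrapper.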
6. Poor image quality triggers false defect instead of reacquire path
What it looks like:
Sudden false defects during lighting drift.
Defect rate changes with machine vibration or focus drift.
Same part passes after manual recapture.
Why it happens:
image quality validation missing
algorithm forced to inspect bad image
low confidence result treated as defect
exposure/focus/illumination not checked before inspection
How to handle it:
Validate image quality before defect decision.
Return ImageQualityRejected separately.
Allow reacquisition policy.
Log image quality metrics.
Keep bad image for diagnosis if needed.
Do not hide image-quality problems as product defects.
Bad image quality should usually enter a recovery path before becoming a product decision.
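A quality gate runs before the defect decision and returns its own status. The Python sketch below uses two deliberately simple metrics on a grayscale image given as a list of rows: mean intensity as an exposure check, and the variance of a 4-neighbour discrete Laplacian as a focus score, a common sharpness heuristic. The thresholds and the status strings are placeholders; real limits would be recipe parameters.

```python
from statistics import mean, pvariance

def laplacian_variance(img: list[list[float]]) -> float:
    """Variance of the 4-neighbour Laplacian over interior pixels (higher means sharper)."""
    values = [
        img[y - 1][x] + img[y + 1][x] + img[y][x - 1] + img[y][x + 1] - 4 * img[y][x]
        for y in range(1, len(img) - 1)
        for x in range(1, len(img[0]) - 1)
    ]
    return pvariance(values)

# Placeholder limits for an 8-bit image; assumed values, not recommendations.
EXPOSURE_RANGE = (40.0, 215.0)
MIN_FOCUS = 50.0

def quality_gate(img: list[list[float]]) -> tuple[str, dict]:
    """Return a quality status plus the metrics, so they can be logged either way."""
    metrics = {
        "mean_intensity": mean(v for row in img for v in row),
        "focus_score": laplacian_variance(img),
    }
    lo, hi = EXPOSURE_RANGE
    if not lo <= metrics["mean_intensity"] <= hi:
        return "ImageQualityRejected", metrics   # exposure problem, not a defect
    if metrics["focus_score"] < MIN_FOCUS:
        return "ImageQualityRejected", metrics   # blur problem, not a defect
    return "ImageQualityOk", metrics
```

Returning the metrics alongside the status makes it straightforward to log them and to keep the rejected image for diagnosis.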
7. Storage delay blocks workflow
What it looks like:
Machine throughput drops.
Inspection step waits on database/file system.
UI freezes or result queue grows.
Camera keeps producing images faster than storage can persist.
Why it happens:
workflow synchronously writes large images
storage is on slow network path
database transaction includes heavy image data
no buffering/backpressure policy
storage failure treated inconsistently
How to handle it:
Separate decision-critical result from heavy artifact storage.
Persist minimal result synchronously if required.
Offload large image storage to controlled background pipeline.
Use bounded queues.
Expose storage backlog.
Define policy for storage failure:
stop machine
continue with warning
degrade image retention
pause after threshold
Storage should not accidentally become the hidden cycle-time bottleneck.
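The split between decision-critical results and heavy artifacts can be sketched with the Python standard library: the small result record is written synchronously, while image payloads go through a bounded queue.Queue drained by a background thread. When the queue is full, an explicit policy string is returned instead of the workflow silently blocking. ArtifactPipeline, record_step, the policy names, and the list used as a result store are assumptions for illustration.

```python
import queue
import threading

class ArtifactPipeline:
    """Bounded background writer for large images; submit() never blocks the inspection step."""
    def __init__(self, write_image, maxsize: int = 8, on_full: str = "continue_with_warning"):
        self._queue: queue.Queue = queue.Queue(maxsize=maxsize)
        self._write_image = write_image
        self.on_full = on_full      # e.g. "stop_machine", "degrade_retention" (assumed names)
        self.dropped = 0
        self._worker = threading.Thread(target=self._drain, daemon=True)
        self._worker.start()

    def _drain(self) -> None:
        while True:
            item = self._queue.get()
            if item is None:                 # sentinel: shut down
                self._queue.task_done()
                return
            self._write_image(*item)
            self._queue.task_done()

    def submit(self, frame_id: str, pixels: bytes) -> str:
        try:
            self._queue.put_nowait((frame_id, pixels))
            return "queued"
        except queue.Full:
            self.dropped += 1
            return self.on_full              # the workflow applies the storage-failure policy

    def backlog(self) -> int:
        return self._queue.qsize()           # exposed as a metric

    def close(self) -> None:
        self._queue.put(None)
        self._worker.join()

def record_step(result_store: list, pipeline: ArtifactPipeline, result: dict, pixels: bytes) -> str:
    result_store.append(result)                          # minimal decision-critical record: synchronous
    return pipeline.submit(result["frame_id"], pixels)   # heavy artifact: background, bounded
```

With this split, a slow network share only slows the background writer; a full queue shows up as an explicit policy outcome and a backlog number rather than as lost cycle time.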
PART 7 — SOFTWARE DESIGN IMPLICATIONS
Inspection workflow integration must be explicit.
The architecture should make these things first-class:
inspection context object
frame/result correlation
inspection step lifecycle
separation between algorithm output and workflow decision
retry/recovery policy
observability around each step
deterministic result ownership
Component diagram:
+--------------------------------------------------+
| Machine Workflow |
| sequence, state, mode, recipe, recovery policy |
+-------------------------+------------------------+
|
v
+--------------------------------------------------+
| Inspection Step Context |
| step id, product id, recipe, site, position, |
| expected trigger, attempt number, correlation id |
+-------------------------+------------------------+
|
v
+-------------------+ +-------------------+ +-------------------+
| Acquisition | | Alignment | | Inspection |
| Service | | Service | | Service |
| frame id, trigger | | transform, score | | measurements, |
| timestamp | | fiducial result | | defects, status |
+---------+---------+ +---------+---------+ +---------+---------+
| | |
+-----------------------+-----------------------+
|
v
+--------------------------------------------------+
| Structured Inspection Result |
| context, frame id, result status, confidence, |
| failure reason, timing, algorithm version |
+-------------------------+------------------------+
|
v
+-------------------+ +-------------------+ +-------------------+
| Workflow Decision | | UI Notification | | Storage / Report |
| continue, retry, | | status, alarm, | | traceability, |
| reject, stop | | operator review | | audit, evidence |
+-------------------+ +-------------------+ +-------------------+
The bad approach:
Workflow calls camera.GetLatestImage()
Vision algorithm reads global recipe
Vision algorithm controls reject station
Result has no frame id
Retry reuses stale state
Storage happens inside algorithm
UI listens to random events
This creates a fragile system where nobody owns the truth.
The good approach:
Workflow owns inspection step.
Workflow creates InspectionContext.
Acquisition returns correlated Frame.
Alignment returns correlated Transform.
Inspection returns StructuredInspectionResult.
Workflow decides machine action.
UI and storage receive events/results after workflow ownership is clear.
A good inspection context might conceptually contain:
InspectionContext
RunId
LotId
WaferId / PartId
RecipeId
RecipeVersion
StepId
SiteId / PositionId
ExpectedMachinePosition
ActualPositionSnapshot
CameraId
AcquisitionPlanId
AttemptNumber
CorrelationId
TimeoutPolicy
RetryPolicy
A strong result might contain:
StructuredInspectionResult
Context
FrameId
AlignmentResultId
Status
Measurements
Defects
Confidence
FailureCategory
FailureReason
StartedAt
CompletedAt
Duration
AlgorithmVersion
IsDecisionEligible
The field IsDecisionEligible is important conceptually.
It means:
Can this result safely influence a product or machine decision?
For example:
ProductFail -> decision eligible
ProductPass -> decision eligible
ImageQualityRejected -> not product decision eligible
AlgorithmError -> not product decision eligible
AlignmentFailed -> usually not product decision eligible
Timeout -> depends on policy
This prevents invalid inspection attempts from becoming false product decisions.
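The eligibility rule can be a small, explicit function instead of a convention. A Python sketch following the mapping above; EligibilityPolicy is a hypothetical name, its two flags stand in for the "usually" and "depends on policy" cases, and both defaults (not eligible) are assumptions.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class EligibilityPolicy:
    # Both defaults are assumptions; the text leaves these cases to policy.
    timeout_is_eligible: bool = False
    alignment_failure_is_eligible: bool = False

ALWAYS_ELIGIBLE = {"ProductPass", "ProductFail"}
NEVER_ELIGIBLE = {"ImageQualityRejected", "AlgorithmError"}

def is_decision_eligible(status: str, policy: EligibilityPolicy = EligibilityPolicy()) -> bool:
    """Can a result with this status safely influence a product or machine decision?"""
    if status in ALWAYS_ELIGIBLE:
        return True
    if status in NEVER_ELIGIBLE:
        return False
    if status == "AlignmentFailed":
        return policy.alignment_failure_is_eligible
    if status == "Timeout":
        return policy.timeout_is_eligible
    return False   # unknown statuses are conservatively not eligible
```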
PART 8 — INTERVIEW / REAL-WORLD TALKING POINTS
A clear interview explanation could sound like this:
In a machine vision system, inspection is not just an algorithm running on an image. It is a workflow step that depends on machine state, recipe, position, acquisition timing, frame ownership, alignment, and result correlation. The workflow should own the inspection lifecycle and machine decision. The vision service should return structured, correlated results, not directly control machine behavior.
Another strong version:
The hardest bugs happen when vision works offline but fails inside the live machine sequence. That usually means the issue is not the algorithm itself, but context: wrong frame, wrong position, stale alignment, late result, bad trigger timing, or missing retry/recovery semantics.
Common mistakes software engineers make when entering vision systems:
They treat image inspection like a stateless function.
They use “latest image” instead of correlated frame ownership.
They let algorithm code decide machine actions.
They treat algorithm failure as product failure.
They design retry without considering physical machine state.
They forget that motion, lighting, trigger, image, and result must belong to the same workflow step.
They under-log timing and correlation data.
They block the machine sequence on heavy storage operations.
What strong engineers understand:
Inspection is part of the machine sequence.
Context is as important as pixels.
Frame/result correlation is mandatory.
Workflow owns decision-making.
Algorithms return evidence, not machine commands.
Retry is a physical recovery policy, not just code repetition.
Invalid inspection is not the same as failed product.
Late results must not mutate the wrong workflow state.
Every result must be traceable to product, recipe, frame, position, and step.
The core architectural principle is:
Vision computes.
Workflow decides.
Machine state constrains.
Context proves correctness.
Traceability preserves trust.
That is the mindset shift from normal software to production machine software.