Usability Under Real Conditions & Failure Modes
Industrial HMI Design for Real Factory Conditions
This topic belongs directly to the UI / HMI / Operator Experience domain. Here an industrial UI must expose machine state, alarms, controls, recipes, device health, images, and workflow progress clearly, and must keep safety-relevant controls unambiguous, under real operating pressure. The roadmap explicitly calls out “usability under stress and factory conditions” as a key HMI topic.
Part 1 — Why Industrial HMI Usability Is Different
Industrial HMI usability is not the same as normal desktop or web application usability.
In a business app, poor usability may cause user frustration, slower work, or data-entry mistakes. In an industrial machine, poor usability can cause:
- wrong recovery actions
- machine downtime
- damaged parts
- scrapped product
- unsafe motion attempts
- loss of operator trust
- longer troubleshooting during production pressure
The operator is often not sitting calmly with time to explore the UI. They may be standing beside a noisy machine, wearing gloves, dealing with alarms, under pressure from production targets, and being interrupted by supervisors, technicians, or other machines.
A technically correct UI can still be operationally dangerous.
For example:
Software reality:
Machine is paused because vacuum is not ready.
UI reality:
Screen only says "Sequence stopped."
Operator interpretation:
"Maybe I should press Reset or Start again."
Actual result:
Operator retries the wrong action instead of checking vacuum.
The software may be “working,” but the UI failed to communicate the real situation.
In industrial systems, usability is part of reliability.
A good HMI does not only show data. It helps the operator answer three questions quickly:
1. What is the machine doing now?
2. What is blocking it?
3. What should I do next?
If the screen cannot answer these questions during abnormal situations, the UI is not production-grade.
Part 2 — Operator Cognition Under Stress
Under stress, operators do not carefully analyze every detail on the screen.
They rely on:
- visible cues
- familiar screen patterns
- clear status indicators
- obvious next actions
- short messages
- consistent terminology
- muscle memory from previous shifts
This is important because abnormal situations are exactly when the UI must be clearest.
During faults, the operator may not read a long paragraph such as:
The machine has entered a recoverable suspended state because the downstream
inspection sequence did not receive a completed motion confirmation from the
stage controller within the expected timeout window.
A better production message is:
Stage movement did not complete.
Required action:
1. Check stage area is clear.
2. Press Recover Stage.
3. If fault repeats, call service.
The goal is not to hide complexity forever. The goal is to separate immediate operator action from deeper engineering diagnostics.
Operator Decision Flow
+-------------------+
| Machine Problem   |
| e.g. stage fault  |
+---------+---------+
          |
          v
+-------------------+
| UI Signal         |
| status, alarm,    |
| message, color    |
+---------+---------+
          |
          v
+---------------------------+
| Operator Interpretation   |
| "What is happening?"      |
| "Is it safe?"             |
| "What should I do?"       |
+---------+-----------------+
          |
          v
+-------------------+
| Operator Action   |
| reset, recover,   |
| stop, call service|
+---------+---------+
          |
          v
+-------------------+
| Machine Outcome   |
| safe recovery or  |
| wrong action      |
+-------------------+
The weak point is often not the machine logic. The weak point is the interpretation step.
If the UI signal is unclear, the operator forms the wrong mental model.
Part 3 — Information Hierarchy & Clarity
A production HMI should not treat all information equally.
The most important information should be immediately visible.
A strong hierarchy is:
1. Machine state
2. Active faults / blocking conditions
3. Required operator action
4. Production context
5. Detailed diagnostics
This means the main screen should not be dominated by low-level numbers while the machine is blocked by a critical condition.
Bad screen:
- 30 sensor values
- 12 motor positions
- 8 temperature readings
- small red text: "Vacuum not ready"
Good screen:
Machine State: PAUSED - WAITING FOR VACUUM
Blocking Condition:
Vacuum not ready at wafer chuck.
Required Action:
Check vacuum supply, then press Retry Vacuum.
Details:
Sensor VCH-01 = OFF
Expected = ON
Last transition = 14:32:08
Screen Hierarchy Diagram
+--------------------------------------------------+
| MACHINE STATE                                    |
| PAUSED - WAITING FOR VACUUM                      |
+--------------------------------------------------+
+--------------------------------------------------+
| ACTIVE BLOCKING CONDITION                        |
| Vacuum not ready at wafer chuck                  |
+--------------------------------------------------+
+--------------------------------------------------+
| REQUIRED OPERATOR ACTION                         |
| 1. Check wafer position                          |
| 2. Check vacuum supply                           |
| 3. Press Retry Vacuum                            |
+--------------------------------------------------+
+--------------------------------------------------+
| PRODUCTION CONTEXT                               |
| Lot: L23091 | Wafer: W07 | Recipe: Inspect_A     |
+--------------------------------------------------+
+--------------------------------------------------+
| DETAILS / DIAGNOSTICS                            |
| Sensor: VCH-01 OFF | Timeout: 5s | Step: Clamp   |
+--------------------------------------------------+
The key principle:
Operator screens should prioritize action.
Engineer screens can prioritize analysis.
Mixing both creates clutter and confusion.
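The five-level hierarchy can be enforced in code rather than left to each screen author. The following is a minimal sketch that assumes a hypothetical `ScreenSection` enum and `build_screen` helper, neither taken from a specific UI framework. Sections always render in the fixed priority order above, whatever order the data arrives in. On the operator screen, the diagnostics section collapses to a one-line pointer, so raw values cannot crowd out the blocking condition.

```python
from enum import IntEnum


class ScreenSection(IntEnum):
    """Hypothetical section IDs; a lower value is shown higher on the screen."""
    MACHINE_STATE = 1
    BLOCKING_CONDITION = 2
    REQUIRED_ACTION = 3
    PRODUCTION_CONTEXT = 4
    DIAGNOSTICS = 5


def build_screen(sections: dict[ScreenSection, list[str]],
                 role: str = "operator") -> list[str]:
    """Return screen lines in fixed priority order, whatever order data arrived in."""
    lines: list[str] = []
    for section in sorted(sections):
        body = sections[section]
        if not body:
            continue  # e.g. no blocking condition right now
        if section is ScreenSection.DIAGNOSTICS and role == "operator":
            # Operators see that details exist, not the raw values themselves.
            lines.append(f"DETAILS: {len(body)} values (engineer view)")
            continue
        lines.append(section.name.replace("_", " "))
        lines.extend(f"  {text}" for text in body)
    return lines


# Diagnostics arrive first, but are rendered last and collapsed for operators.
screen = build_screen({
    ScreenSection.DIAGNOSTICS: ["Sensor VCH-01 = OFF", "Expected = ON"],
    ScreenSection.MACHINE_STATE: ["PAUSED - WAITING FOR VACUUM"],
    ScreenSection.BLOCKING_CONDITION: ["Vacuum not ready at wafer chuck."],
})
print("\n".join(screen))
```

The ordering lives in one enum, not in each screen's layout. A new screen therefore cannot accidentally put sensor tables above the blocking condition.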
Part 4 — Error Prevention Design
The best industrial HMI prevents bad actions before they happen.
This is better than allowing a bad action and then showing an error afterward.
Practical strategies
1. Disable unsafe actions
If the machine is in Auto mode, do not allow manual axis jog unless the workflow permits it.
Bad:
[Jog X+] button is always enabled.
Clicking it shows: "Cannot jog in Auto mode."
Better:
[Jog X+] is disabled.
Reason shown: "Manual jog unavailable while machine is in Auto mode."
2. Explain why a command is unavailable
A disabled button without explanation causes frustration.
Bad:
[Start] disabled
Better:
[Start] disabled
Cannot start because:
- Door is open
- Recipe is not loaded
- Wafer is not clamped
3. Require confirmation for dangerous actions
Not every button needs confirmation. But destructive or high-impact actions should.
Examples:
Abort Run
Clear Current Lot
Unload Wafer
Reset Calibration
Overwrite Recipe
Move Axis to Home
The confirmation should include context:
Abort current inspection run?
Lot: L23091
Wafer: W07
Current step: Defect Review
This will stop inspection and mark the wafer as incomplete.
4. Avoid ambiguous labels
Bad labels:
Reset
Clear
Run
Apply
OK
Continue
Better labels:
Reset Alarm
Clear Current Recipe
Start Inspection
Apply Recipe Changes
Continue Recovery
5. Prevent mode confusion
Many industrial mistakes happen because the operator does not realize the machine is in:
- Auto mode
- Manual mode
- Maintenance mode
- Dry run mode
- Simulation mode
- Service override mode
Mode should be highly visible and should affect available actions.
+--------------------------------------+
| MODE: MANUAL                         |
| Auto inspection commands unavailable |
+--------------------------------------+
6. Show current context before action
Before allowing a command, show what it applies to.
Bad:
[Delete]
Better:
Delete Recipe: Inspect_300mm_ProductA_v12?
7. Make destructive actions intentional
Dangerous buttons should not be visually or spatially close to common safe buttons.
Bad:
[Retry] [Abort] [Next]
Better:
[Retry Step] [Abort Run...]
The layout should make accidental destructive clicks unlikely.
Part 5 — Feedback, Trust, and Operator Confidence
Operators need clear feedback after every important action.
A command should not disappear into a black box.
For each command, the UI should show whether it is:
Requested
Accepted
Executing
Completed
Rejected
Failed
Timed out
A common mistake is showing success too early.
Bad:
Operator clicks "Clamp Wafer"
UI immediately shows "Wafer clamped"
But physically, the clamp may still be moving, vacuum may still be building, and the sensor may not yet confirm success.
Better:
Operator clicks "Clamp Wafer"
UI shows:
- Command sent
- Clamp command accepted
- Waiting for vacuum confirmation
- Wafer clamped successfully
This distinction is critical:
Command sent != Command accepted != Physical action completed
Feedback Loop Diagram
+-------------------+
| Machine State     |
| sensors, workflow |
+---------+---------+
          |
          v
+-------------------+
| UI Presentation   |
| current state,    |
| available actions |
+---------+---------+
          |
          v
+-------------------+
| Operator Action   |
| clicks command    |
+---------+---------+
          |
          v
+-------------------+
| Validated Command |
| state/mode/checks |
+---------+---------+
          |
          v
+-------------------+
| Machine Execution |
| motion/device     |
+---------+---------+
          |
          v
+-------------------+
| Machine Feedback  |
| accepted, done,   |
| failed, timeout   |
+---------+---------+
          |
          v
+-------------------+
| UI Update         |
| clear result      |
+-------------------+
If feedback is delayed or missing, operators may:
- click repeatedly
- assume the UI froze
- retry unsafe actions
- call service unnecessarily
- stop trusting the HMI
Trust is built when the UI consistently tells the truth about what is known, what is unknown, and what is still in progress.
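The clamp example can be sketched as a small status tracker. The UI text is derived only from facts that have actually been reported, so "Wafer clamped successfully" cannot appear before the vacuum confirmation arrives. The class name, the event names (`sent`, `accepted`, `rejected`, `vacuum_ok`), and the status strings are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ClampFeedback:
    """What is actually known about one (hypothetical) Clamp Wafer command."""
    sent: bool = False
    accepted: bool = False
    vacuum_confirmed: bool = False
    rejected_reason: Optional[str] = None

    def on_event(self, event: str, detail: str = "") -> None:
        # Each event records one fact; no event implies a later one.
        if event == "sent":
            self.sent = True
        elif event == "accepted":
            self.accepted = True
        elif event == "rejected":
            self.rejected_reason = detail or "no reason given"
        elif event == "vacuum_ok":
            self.vacuum_confirmed = True
        else:
            raise ValueError(f"unknown event: {event}")

    def status_text(self) -> str:
        if self.rejected_reason is not None:
            return f"Clamp command rejected: {self.rejected_reason}"
        if self.accepted and self.vacuum_confirmed:
            # Only the sensor confirmation justifies the success message.
            return "Wafer clamped successfully"
        if self.accepted:
            return "Clamp command accepted - waiting for vacuum confirmation"
        if self.sent:
            return "Command sent - waiting for controller"
        return "Clamp not requested"


fb = ClampFeedback()
for event in ("sent", "accepted", "vacuum_ok"):
    fb.on_event(event)
    print(fb.status_text())
```

While the status shows a waiting state, the Clamp Wafer button can stay disabled. This also addresses the repeated-click problem listed above.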
Part 6 — Factory Conditions That Affect UI Design
Industrial HMIs are used in physical environments.
That changes design decisions.
Touchscreens with gloves
Buttons must be large enough. Small precise controls are risky.
Bad:
Tiny checkbox beside "Unload wafer"
Better:
Large button with clear label:
[Unload Wafer...]
Low or high lighting
Screens may be viewed under bright factory lights or dim equipment areas.
This affects:
- contrast
- font size
- status visibility
- color dependency
Color alone is not enough.
Bad:
Green = ready
Red = blocked
Better:
READY
BLOCKED - DOOR OPEN
Use color plus text plus icon/shape.
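One way to enforce "color plus text plus icon/shape" is to make the status style a single record in which all three channels are required. A screen then cannot render a state by color alone. The following sketch uses hypothetical state names, colors, and shape choices.

```python
from dataclasses import dataclass
from enum import Enum


@dataclass(frozen=True)
class StatusStyle:
    color: str  # banner color; never the only signal
    label: str  # text that must be readable without color
    shape: str  # icon shape, distinguishable in grayscale or glare


class MachineStatus(Enum):
    # Hypothetical mapping; every field is mandatory by construction.
    READY = StatusStyle("green", "READY", "circle")
    BLOCKED = StatusStyle("red", "BLOCKED", "octagon")
    WARNING = StatusStyle("amber", "WARNING", "triangle")


def banner(status: MachineStatus, reason: str = "") -> str:
    """Render a status banner as text; the color is applied by the view layer."""
    style = status.value
    text = f"{style.label} - {reason.upper()}" if reason else style.label
    return f"[{style.shape}] {text}"


print(banner(MachineStatus.BLOCKED, "door open"))
print(banner(MachineStatus.READY))
```

A unit test that checks that no two states share a shape or a label keeps the mapping distinguishable for color-blind operators and under poor lighting.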
Operator standing posture
Operators may not be sitting close to the screen. The most important status should be readable from a distance.
This affects:
- large machine state header
- clear status banners
- fewer dense tables on main operator screens
Noisy environment
Audio alarms may be missed or ignored. Visual state must still be clear.
Multilingual teams
Use simple, consistent wording.
Avoid clever phrasing, long sentences, and inconsistent names.
Bad:
Chuck vacuum missing
Wafer suction inactive
Clamp pressure not satisfied
If these mean the same thing, choose one term and use it everywhere.
Limited training
The UI should guide normal and abnormal operation without requiring deep machine knowledge.
The operator does not need to know the internal class name, SDK error code, or sequence engine step name.
They need to know:
What happened?
Is production stopped?
What should I do?
When should I call service?
Shift changes
The next operator must understand machine status quickly.
Useful UI features:
- current state
- active blocking condition
- last major operator action
- current lot/wafer/run
- recovery step in progress
- timestamp of last state change
Remote/service support
When a service engineer asks, “What do you see on the screen?”, the screen must provide clear names, IDs, timestamps, and context.
Part 7 — Real-World UI Failure Modes
1. UI shows stale machine state as current
What it looks like
The screen says:
Machine Ready
But the backend has lost communication with the device, or the state is several seconds old.
Why it happens
The UI caches the last known state but does not show freshness.
Impact
The operator believes the machine is ready and tries to start or recover.
Fix
Show freshness explicitly:
Machine State: Ready
Last updated: 0.4s ago
If stale:
Machine State: Unknown
No update from controller for 8.2s
Operator commands disabled
2. Operator cannot tell what action is required
What it looks like
The UI says:
Error 2034
Sequence interrupted
Why it happens
The system exposes internal fault information but not operator guidance.
Impact
Operator guesses, retries, resets, or calls service for a simple recoverable issue.
Fix
Separate technical cause from required action:
Problem:
Wafer clamp did not confirm.
Required action:
1. Check wafer is seated.
2. Press Retry Clamp.
3. If this repeats, call service.
Details:
Sensor CLAMP_OK did not turn ON within 5 seconds.
3. Alarm/recovery text is too vague
What it looks like
Motion failed.
Why it happens
Developers reuse generic exception messages.
Impact
The operator does not know whether the issue is obstruction, limit, timeout, homing, or communication.
Fix
Use specific, action-oriented messages:
X-axis failed to reach target position.
Possible causes:
- Obstruction
- Axis not homed
- Servo fault
Required action:
Check stage area, then press Recover Axis.
4. Button label is ambiguous
What it looks like
[Reset]
Does it reset the alarm? Sequence? Axis? Recipe? Whole machine?
Why it happens
Developers use generic verbs.
Impact
Operators hesitate or press the wrong command.
Fix
Use object-specific labels:
[Reset Alarm]
[Reset Sequence]
[Home Axis]
[Reload Recipe]
5. Screen navigation hides critical context
What it looks like
The operator opens a service screen and loses visibility of the active fault or machine state.
Why it happens
Screens are designed as independent pages instead of workflow-aware views.
Impact
Operator performs action without seeing the current blocking condition.
Fix
Keep critical context persistent:
Top banner always visible:
Machine State | Mode | Active Fault | Current Lot/Wafer
6. UI allows action in wrong mode
What it looks like
Manual jog is available while the machine is still in Auto recovery.
Why it happens
UI command availability is handled locally per button instead of centrally from machine state/mode.
Impact
Operator action conflicts with workflow execution.
Fix
Use mode-aware command gating:
Command availability = function of:
- machine state
- current mode
- interlocks
- workflow step
- user role
- device readiness
7. Success message shown before physical completion
What it looks like
Unload completed
But the robot is still moving.
Why it happens
The UI treats command acceptance as completion.
Impact
Operator opens door or performs next action too early.
Fix
Represent command lifecycle clearly:
Unload requested
Unload accepted
Robot moving
Wafer placed in cassette
Unload completed
8. Too many warnings cause alarm fatigue
What it looks like
The screen constantly shows low-value warnings.
Operators start ignoring all warnings.
Why it happens
Everything is treated as equally important.
Impact
Critical warnings are missed.
Fix
Prioritize and classify messages:
Blocking fault:
Requires action now
Warning:
Machine can continue but should be watched
Information:
Useful history, not urgent
Do not flood the operator with engineering noise.
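The three classes can be sketched as a severity ordering that decides what reaches the operator banner. The `Severity`, `Message`, and `banner_messages` names are hypothetical. The rule in this sketch works as follows:

- Blocking faults always surface.
- Warnings surface only when nothing is blocking.
- Information never occupies the banner.

```python
from dataclasses import dataclass
from enum import IntEnum


class Severity(IntEnum):
    INFORMATION = 0  # useful history, not urgent
    WARNING = 1      # machine can continue but should be watched
    BLOCKING = 2     # requires action now


@dataclass(frozen=True)
class Message:
    severity: Severity
    text: str
    timestamp: float  # seconds, any monotonic source


def banner_messages(messages: list[Message], limit: int = 3) -> list[Message]:
    """Pick what the operator banner shows; everything else stays in history."""
    blocking = [m for m in messages if m.severity is Severity.BLOCKING]
    if blocking:
        candidates = blocking
    else:
        candidates = [m for m in messages if m.severity is Severity.WARNING]
    # Oldest first, on the assumption that the earliest fault is the most
    # likely root cause; later faults are often consequences of it.
    return sorted(candidates, key=lambda m: m.timestamp)[:limit]


msgs = [
    Message(Severity.INFORMATION, "Lot started", 1.0),
    Message(Severity.WARNING, "Filter change due", 2.0),
    Message(Severity.BLOCKING, "Vacuum not ready at wafer chuck", 3.0),
]
print([m.text for m in banner_messages(msgs)])
```

The `limit` keeps the banner short even during a fault cascade. The full list still belongs in an alarm history view.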
9. Inconsistent terminology across screens
What it looks like
One screen says:
Recipe
Another says:
Program
Another says:
Job file
Why it happens
Different developers name things independently.
Impact
Operators are unsure whether these are the same concept.
Fix
Use centralized terminology.
For example:
Recipe = inspection parameters
Job = production execution instance
Lot = manufacturing batch
Wafer = physical unit being inspected
Use these terms consistently everywhere.
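Centralized terminology can be backed by a check that runs over UI strings before they ship. The following is a minimal sketch. `GLOSSARY` holds the approved terms listed above. The hypothetical `BANNED_SYNONYMS` table assumes that, as in the Recipe/Program/Job file example, "Program" and "Job file" referred to the recipe. A real project would keep both tables next to its string resources.

```python
import re

# Approved terms and their single meaning (from the glossary above).
GLOSSARY = {
    "Recipe": "inspection parameters",
    "Job": "production execution instance",
    "Lot": "manufacturing batch",
    "Wafer": "physical unit being inspected",
}

# Hypothetical synonyms that must not appear in operator-facing text.
BANNED_SYNONYMS = {
    "program": "Recipe",
    "job file": "Recipe",
    "batch": "Lot",
}


def terminology_problems(ui_text: str) -> list[str]:
    """Return one message per banned synonym found in a UI string."""
    problems = []
    for synonym, approved in BANNED_SYNONYMS.items():
        # Whole-word, case-insensitive match so "Program" and "program" both hit.
        if re.search(rf"\b{re.escape(synonym)}\b", ui_text, re.IGNORECASE):
            problems.append(
                f'use "{approved}" ({GLOSSARY[approved]}) instead of "{synonym}"')
    return problems


for label in ["Load Recipe", "Select program to run", "Open Job file"]:
    print(label, "->", terminology_problems(label))
```

Running a check like this in CI catches drift when different developers name the same concept independently.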
10. Service/debug information shown to production operator
What it looks like
Operator screen shows:
SequenceNodeException: AxisMoveStep.WaitForDone timeout at node 17
SDK_ERR_0x800704C7
Why it happens
Engineering diagnostics are exposed directly.
Impact
Operator is confused and may lose trust.
Fix
Layer the information:
Operator message:
Stage movement did not complete.
Action:
Press Recover Stage or call service.
Engineer details:
SequenceNodeException...
AxisMoveStep...
SDK error...
Operator UI and service UI should not be the same screen with different users looking at it.
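The layering can be sketched as one fault record with two views. The operator view exposes only the problem, the required action, and a short fault ID. The engineer view adds the raw evidence. The `FaultReport` type, the `GUIDANCE` catalog, and the code `STAGE_MOVE_TIMEOUT` are hypothetical. Two points matter here. Exception text never reaches the operator view. A fault without curated guidance falls back to a safe generic instruction, not to a raw code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OperatorGuidance:
    problem: str
    actions: tuple[str, ...]


# Hypothetical catalog from internal fault codes to operator guidance.
GUIDANCE = {
    "STAGE_MOVE_TIMEOUT": OperatorGuidance(
        "Stage movement did not complete.",
        ("Check stage area is clear.", "Press Recover Stage.",
         "If fault repeats, call service."),
    ),
}

# Used for any code without curated guidance; never show the raw code instead.
FALLBACK = OperatorGuidance(
    "Machine stopped because of an internal fault.",
    ("Call service and give them the fault ID below.",),
)


@dataclass(frozen=True)
class FaultReport:
    fault_id: str         # short ID the operator can read out to service
    code: str             # internal fault code
    engineer_detail: str  # exception text, SDK error, sequence step names

    def operator_view(self) -> list[str]:
        g = GUIDANCE.get(self.code, FALLBACK)
        lines = [f"Problem: {g.problem}", "Required action:"]
        lines += [f"{i}. {a}" for i, a in enumerate(g.actions, start=1)]
        lines.append(f"Fault ID: {self.fault_id}")
        return lines

    def engineer_view(self) -> list[str]:
        # Engineers get the same operator text plus the evidence behind it.
        return self.operator_view() + [f"Code: {self.code}",
                                       f"Detail: {self.engineer_detail}"]


report = FaultReport("F-0117", "STAGE_MOVE_TIMEOUT",
                     "SequenceNodeException: AxisMoveStep.WaitForDone timeout at node 17")
print("\n".join(report.operator_view()))
```

The fault ID is the bridge between the two views. It gives the operator something precise to read out when service asks "What do you see on the screen?".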
Part 8 — Software Design Implications
Usability is not only a screen-layout concern. It affects architecture.
A good industrial HMI needs proper models behind it.
Important architectural concepts
1. State freshness indicators
The UI should know whether displayed state is current.
Example:
Fresh:
Last update < 1 second
Degraded:
Last update 1-5 seconds
Stale:
No update > 5 seconds
When state is stale, commands should usually be disabled or restricted.
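The three freshness bands translate directly into a classifier. The following sketch uses the example thresholds above (1 s and 5 s), which are illustrative values, not a standard. It expects timestamps from `time.monotonic()`, so a wall-clock adjustment cannot make stale data look fresh. The `Freshness` enum and the function names are hypothetical.

```python
import time
from enum import Enum
from typing import Optional


class Freshness(Enum):
    FRESH = "fresh"        # last update < 1 s ago
    DEGRADED = "degraded"  # last update 1-5 s ago
    STALE = "stale"        # no update for > 5 s


def classify(last_update: float, now: Optional[float] = None,
             fresh_s: float = 1.0, stale_s: float = 5.0) -> Freshness:
    """Classify the age of displayed state; times are time.monotonic() values."""
    if now is None:
        now = time.monotonic()
    age = now - last_update
    if age < fresh_s:
        return Freshness.FRESH
    if age <= stale_s:
        return Freshness.DEGRADED
    return Freshness.STALE


def state_banner(state: str, last_update: float, now: float) -> str:
    """Never show a stale state as if it were current."""
    age = now - last_update
    if classify(last_update, now) is Freshness.STALE:
        return (f"Machine State: Unknown | No update from controller for "
                f"{age:.1f}s | Operator commands disabled")
    return f"Machine State: {state} | Last updated: {age:.1f}s ago"


print(state_banner("Ready", last_update=100.0, now=100.4))
print(state_banner("Ready", last_update=100.0, now=108.2))
```

The same `Freshness` value can feed command gating. A `STALE` result is then one of the reasons a command is unavailable.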
2. Clear command result model
Do not model commands as simple button clicks.
Use a lifecycle model:
Requested
Validated
Rejected
Accepted
Executing
Completed
Failed
TimedOut
Cancelled
This allows the UI to tell the operator what is really happening.
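The following sketch models that lifecycle as an explicit state machine. It assumes the states listed above and a hypothetical transition table. Which transitions are legal is a design decision for each system. Rejecting illegal jumps, such as `REQUESTED` straight to `COMPLETED`, stops the UI from reporting success before the machine has accepted and executed the command.

```python
from enum import Enum, auto


class CommandState(Enum):
    REQUESTED = auto()
    VALIDATED = auto()
    REJECTED = auto()
    ACCEPTED = auto()
    EXECUTING = auto()
    COMPLETED = auto()
    FAILED = auto()
    TIMED_OUT = auto()
    CANCELLED = auto()


S = CommandState
# Hypothetical transition table; terminal states have no outgoing edges.
TRANSITIONS = {
    S.REQUESTED: {S.VALIDATED, S.REJECTED, S.CANCELLED},
    S.VALIDATED: {S.ACCEPTED, S.REJECTED, S.TIMED_OUT, S.CANCELLED},
    S.ACCEPTED: {S.EXECUTING, S.FAILED, S.TIMED_OUT, S.CANCELLED},
    S.EXECUTING: {S.COMPLETED, S.FAILED, S.TIMED_OUT, S.CANCELLED},
}
TERMINAL = {S.REJECTED, S.COMPLETED, S.FAILED, S.TIMED_OUT, S.CANCELLED}


class TrackedCommand:
    """One command instance whose state the UI displays."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.state = S.REQUESTED
        self.history = [S.REQUESTED]

    def advance(self, new_state: CommandState) -> None:
        allowed = TRANSITIONS.get(self.state, set())
        if new_state not in allowed:
            raise ValueError(f"{self.name}: illegal transition "
                             f"{self.state.name} -> {new_state.name}")
        self.state = new_state
        self.history.append(new_state)

    @property
    def finished(self) -> bool:
        return self.state in TERMINAL


cmd = TrackedCommand("Unload Wafer")
for step in (S.VALIDATED, S.ACCEPTED, S.EXECUTING, S.COMPLETED):
    cmd.advance(step)
print(cmd.name, [s.name for s in cmd.history])
```

Keeping `history` on the command lets the UI show the lifecycle steps, for example Unload requested, accepted, executing, completed, instead of only the latest state.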
3. Mode-aware UI behavior
The UI should not independently decide what is allowed.
Command availability should come from the machine/workflow context.
Bad:
Button decides:
if IsEnabledCheckbox == true
Good:
Presentation model decides:
CanStartInspection
CanRetryStep
CanAbortRun
CanJogAxis
CanEditRecipe
Based on machine state, mode, workflow, safety conditions, and role.
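The following is a minimal sketch of such a presentation model, assuming a hypothetical `MachineContext` snapshot. Each `Can...` decision is computed in one place from state, mode, interlocks, freshness, and role. Each decision also returns the reasons it is false, which is what the disabled-button explanation in Part 4 needs. Only two of the flags are shown. The role names are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MachineContext:
    state: str           # e.g. "Idle", "Running", "Paused"
    mode: str            # e.g. "Auto", "Manual", "Maintenance"
    door_closed: bool
    recipe_loaded: bool
    wafer_clamped: bool
    state_fresh: bool    # from the freshness indicator
    role: str = "operator"


@dataclass(frozen=True)
class Availability:
    allowed: bool
    reasons: tuple[str, ...] = ()


def _gate(reasons: list[str]) -> Availability:
    return Availability(allowed=not reasons, reasons=tuple(reasons))


def can_start_inspection(ctx: MachineContext) -> Availability:
    reasons = []
    if not ctx.state_fresh:
        reasons.append("Machine state is not up to date")
    if ctx.mode != "Auto":
        reasons.append(f"Machine is in {ctx.mode} mode")
    if ctx.state != "Idle":
        reasons.append(f"Machine is {ctx.state}")
    if not ctx.door_closed:
        reasons.append("Door is open")
    if not ctx.recipe_loaded:
        reasons.append("Recipe is not loaded")
    if not ctx.wafer_clamped:
        reasons.append("Wafer is not clamped")
    return _gate(reasons)


def can_jog_axis(ctx: MachineContext) -> Availability:
    reasons = []
    if not ctx.state_fresh:
        reasons.append("Machine state is not up to date")
    if ctx.mode != "Manual":
        reasons.append(f"Manual jog unavailable while machine is in {ctx.mode} mode")
    if ctx.role not in ("technician", "engineer"):  # hypothetical roles
        reasons.append("Requires technician role")
    return _gate(reasons)


ctx = MachineContext(state="Idle", mode="Auto", door_closed=False,
                     recipe_loaded=False, wafer_clamped=False, state_fresh=True)
start = can_start_inspection(ctx)
print("Start enabled:", start.allowed)
for reason in start.reasons:
    print(" -", reason)
```

Buttons only bind to `allowed` and `reasons`. A new interlock is then added in one function, not in every screen that shows the button.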
4. Centralized terminology
Terms should not be hardcoded randomly across screens.
A strong system has a shared vocabulary for:
- machine states
- modes
- commands
- alarms
- recovery steps
- recipe concepts
- production context
This prevents confusion and improves training.
5. Consistent screen patterns
Operators build habits.
If every recovery screen has a different layout, the operator must think harder during stress.
Good pattern:
Problem
Cause
Required Action
Available Commands
Details
Use this consistently.
6. Operator/action context
Every important action should carry context.
Example audit/action context:
Action: Retry Clamp
Operator: user123
Mode: Auto Recovery
Lot: L23091
Wafer: W07
Recipe: Inspect_A
Machine State: Paused
Blocking Condition: Clamp timeout
Timestamp: 14:32:08
Even without going deep into audit internals, the usability implication is clear: context helps support, diagnosis, and accountability.
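The action context above maps naturally onto an immutable record captured when the command is issued. The record then reflects what the operator saw, not what the machine looks like later. The following is a sketch: the `ActionContext` type, the `capture_action` helper, and the snapshot keys are hypothetical, and persistence is left out.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class ActionContext:
    action: str
    operator: str
    mode: str
    lot: str
    wafer: str
    recipe: str
    machine_state: str
    blocking_condition: Optional[str]
    timestamp: str  # ISO 8601, UTC


def capture_action(action: str, operator: str, snapshot: dict,
                   now: Optional[datetime] = None) -> ActionContext:
    """Freeze the context shown on screen when the operator pressed the button."""
    now = now or datetime.now(timezone.utc)
    return ActionContext(
        action=action,
        operator=operator,
        mode=snapshot["mode"],
        lot=snapshot["lot"],
        wafer=snapshot["wafer"],
        recipe=snapshot["recipe"],
        machine_state=snapshot["machine_state"],
        blocking_condition=snapshot.get("blocking_condition"),
        timestamp=now.isoformat(timespec="seconds"),
    )


snap = {"mode": "Auto Recovery", "lot": "L23091", "wafer": "W07",
        "recipe": "Inspect_A", "machine_state": "Paused",
        "blocking_condition": "Clamp timeout"}
record = capture_action("Retry Clamp", "user123", snap)
print(record)
```

Because the record is frozen, a later state change cannot silently rewrite what the operator was looking at when they acted.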
7. Workflow-guided recovery
A recovery UI should not just show an error. It should guide the operator through a safe path.
Bad:
Fault occurred.
[Reset]
Good:
Wafer clamp failed.
Step 1: Check wafer position.
Step 2: Retry clamp.
Step 3: If retry fails, unload wafer and call service.
Available:
[Retry Clamp]
[Abort Run]
[Call Service Instructions]
8. Separate operator-facing and engineer-facing diagnostics
Operators need action. Engineers need evidence.
Do not force one screen to serve both badly.
Operator view:
What happened?
What should I do?
Engineer view:
Which subsystem?
Which signal?
Which timestamp?
Which exception?
Which device response?
Component Diagram
+--------------------------------------+
| Machine State + Workflow Context     |
| state, mode, step, faults, freshness |
+-------------------+------------------+
                    |
                    v
+--------------------------------------+
| UI Context / Presentation Model      |
| visible state, available actions,    |
| messages, guidance, command status   |
+-------------------+------------------+
                    |
                    v
+--------------------------------------+
| Operator Screen                      |
| clear status, next action, feedback  |
+-------------------+------------------+
                    |
                    v
+--------------------------------------+
| Operator Action                      |
| start, retry, abort, recover         |
+-------------------+------------------+
                    |
                    v
+--------------------------------------+
| Validated Command Path               |
| state/mode/interlock validation      |
+-------------------+------------------+
                    |
                    v
+--------------------------------------+
| Machine Feedback                     |
| accepted, executing, complete, fail  |
+--------------------------------------+
The screen should be the visible result of machine context, not an independent command panel.
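The "Validated Command Path" stage can be sketched as a final check at send time. The button's enabled state is treated only as a hint. The command path re-evaluates the same rule against the latest context before anything reaches the controller, because the screen may have been showing an older context when the operator clicked. The `CommandPath` class and its three callables are hypothetical stand-ins for the presentation model and the device interface.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class CommandResult:
    sent: bool
    message: str


class CommandPath:
    """Re-validates every command against the newest context before sending."""

    def __init__(self, get_context: Callable[[], dict],
                 rules: dict[str, Callable[[dict], list[str]]],
                 send: Callable[[str], None]) -> None:
        self._get_context = get_context  # returns the latest machine context
        self._rules = rules              # command -> reasons it is blocked
        self._send = send                # hands the command to the controller

    def execute(self, command: str) -> CommandResult:
        rule = self._rules.get(command)
        if rule is None:
            return CommandResult(False, f"{command}: unknown command")
        reasons = rule(self._get_context())
        if reasons:
            # Report why instead of silently dropping the click.
            return CommandResult(False, f"{command} rejected: " + "; ".join(reasons))
        self._send(command)
        # Sent is not accepted: the lifecycle continues from controller feedback.
        return CommandResult(True, f"{command} sent, waiting for acceptance")


context = {"door_closed": True}
outbox: list[str] = []
path = CommandPath(
    get_context=lambda: context,
    rules={"Start Inspection":
           lambda c: [] if c["door_closed"] else ["Door is open"]},
    send=outbox.append,
)
print(path.execute("Start Inspection").message)
context["door_closed"] = False  # door opened after the screen last refreshed
print(path.execute("Start Inspection").message)
```

In practice the same rule functions used to compute the `Can...` flags can be passed in as `rules`. The screen and the command path then cannot disagree about why a command is blocked.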
Part 9 — Interview / Real-World Talking Points
A strong way to explain this in an interview:
Industrial HMI usability is not just about making screens look clean. It is about helping operators make correct decisions under pressure. The UI must clearly show machine state, blocking conditions, required actions, and command feedback. A machine can have correct backend logic but still fail operationally if the HMI misleads the operator, hides stale state, uses ambiguous labels, or shows success before physical completion.
Common mistakes software engineers make when entering industrial UI:
1. Treating HMI like a normal CRUD desktop app.
2. Showing too much raw diagnostic data to operators.
3. Not distinguishing command accepted from action completed.
4. Forgetting that operators work under stress and interruptions.
5. Using generic error messages from exceptions.
6. Allowing screens to hide critical machine context.
7. Making command enablement local to buttons instead of state-driven.
8. Ignoring stale data and communication loss.
9. Using inconsistent terminology.
10. Designing for developers instead of operators.
What strong engineers understand:
Good industrial HMI design is about operational clarity.
The operator should always know:
- what the machine is doing
- whether the displayed state is fresh
- what is blocking progress
- what action is safe
- what action is unavailable and why
- whether a command was requested, accepted, executing, or completed
The best HMIs reduce guessing.
They do not assume the operator has perfect attention, perfect training, perfect lighting, perfect time, or perfect context.
They are designed for the real factory:
stress
noise
fatigue
interruptions
gloves
shift handovers
production pressure
abnormal recovery
A production-grade HMI is successful when, during a fault, the operator does not need to guess what the machine means.