
Usability Under Real Conditions & Failure Modes

Industrial HMI Design for Real Factory Conditions

This topic belongs to the UI / HMI / Operator Experience domain, where an industrial UI must clearly expose machine state, alarms, recipes, device health, images, workflow progress, and safe controls under real operating pressure. The roadmap explicitly calls out “usability under stress and factory conditions” as a key HMI topic.


Part 1 — Why Industrial HMI Usability Is Different

Industrial HMI usability is not the same as normal desktop or web application usability.

In a business app, poor usability may cause user frustration, slower work, or data-entry mistakes. In an industrial machine, poor usability can cause:

  • wrong recovery actions
  • machine downtime
  • damaged parts
  • scrapped product
  • unsafe motion attempts
  • loss of operator trust
  • longer troubleshooting during production pressure

The operator is often not sitting calmly with time to explore the UI. They may be standing beside a noisy machine, wearing gloves, dealing with alarms, under pressure from production targets, and being interrupted by supervisors, technicians, or other machines.

A technically correct UI can still be operationally dangerous.

For example:

text
Software reality:
    Machine is paused because vacuum is not ready.

UI reality:
    Screen only says "Sequence stopped."

Operator interpretation:
    "Maybe I should press Reset or Start again."

Actual result:
    Operator retries the wrong action instead of checking vacuum.

The software may be “working,” but the UI failed to communicate the real situation.

In industrial systems, usability is part of reliability.

A good HMI does not only show data. It helps the operator answer three questions quickly:

text
1. What is the machine doing now?
2. What is blocking it?
3. What should I do next?

If the screen cannot answer these questions during abnormal situations, the UI is not production-grade.


Part 2 — Operator Cognition Under Stress

Under stress, operators do not carefully analyze every detail on the screen.

They rely on:

  • visible cues
  • familiar screen patterns
  • clear status indicators
  • obvious next actions
  • short messages
  • consistent terminology
  • muscle memory from previous shifts

This is important because abnormal situations are exactly when the UI must be clearest.

During faults, the operator may not read a long paragraph such as:

text
The machine has entered a recoverable suspended state because the downstream
inspection sequence did not receive a completed motion confirmation from the
stage controller within the expected timeout window.

A better production message is:

text
Stage movement did not complete.

Required action:
1. Check stage area is clear.
2. Press Recover Stage.
3. If fault repeats, call service.

The goal is not to hide complexity forever. The goal is to separate immediate operator action from deeper engineering diagnostics.

Operator Decision Flow

text
+-------------------+
| Machine Problem   |
| e.g. stage fault  |
+---------+---------+
          |
          v
+-------------------+
| UI Signal         |
| status, alarm,    |
| message, color    |
+---------+---------+
          |
          v
+---------------------------+
| Operator Interpretation   |
| "What is happening?"      |
| "Is it safe?"             |
| "What should I do?"       |
+---------+-----------------+
          |
          v
+-------------------+
| Operator Action   |
| reset, recover,   |
| stop, call service|
+---------+---------+
          |
          v
+-------------------+
| Machine Outcome   |
| safe recovery or  |
| wrong action      |
+-------------------+

The weak point is often not the machine logic. The weak point is the interpretation step.

If the UI signal is unclear, the operator forms the wrong mental model.


Part 3 — Information Hierarchy & Clarity

A production HMI should not treat all information equally.

The most important information should be immediately visible.

A strong hierarchy is:

text
1. Machine state
2. Active faults / blocking conditions
3. Required operator action
4. Production context
5. Detailed diagnostics

This means the main screen should not be dominated by low-level numbers while the machine is blocked by a critical condition.

Bad screen:

text
- 30 sensor values
- 12 motor positions
- 8 temperature readings
- small red text: "Vacuum not ready"

Good screen:

text
Machine State: PAUSED - WAITING FOR VACUUM

Blocking Condition:
Vacuum not ready at wafer chuck.

Required Action:
Check vacuum supply, then press Retry Vacuum.

Details:
Sensor VCH-01 = OFF
Expected = ON
Last transition = 14:32:08

Screen Hierarchy Diagram

text
+--------------------------------------------------+
| MACHINE STATE                                    |
| PAUSED - WAITING FOR VACUUM                      |
+--------------------------------------------------+

+--------------------------------------------------+
| ACTIVE BLOCKING CONDITION                        |
| Vacuum not ready at wafer chuck                  |
+--------------------------------------------------+

+--------------------------------------------------+
| REQUIRED OPERATOR ACTION                         |
| 1. Check wafer position                          |
| 2. Check vacuum supply                           |
| 3. Press Retry Vacuum                            |
+--------------------------------------------------+

+--------------------------------------------------+
| PRODUCTION CONTEXT                               |
| Lot: L23091 | Wafer: W07 | Recipe: Inspect_A     |
+--------------------------------------------------+

+--------------------------------------------------+
| DETAILS / DIAGNOSTICS                            |
| Sensor: VCH-01 OFF | Timeout: 5s | Step: Clamp   |
+--------------------------------------------------+

The key principle:

text
Operator screens should prioritize action.
Engineer screens can prioritize analysis.

Mixing both creates clutter and confusion.
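
The hierarchy above can be enforced in code rather than left to each screen author. The following is a minimal sketch, assuming a hypothetical `Section` type and `layout_operator_screen` helper (neither comes from a real HMI framework): each piece of content declares its kind, and the layout function always sorts by the priority order listed earlier.

```python
from dataclasses import dataclass

# Priority order taken from the hierarchy above; lower number = shown first.
SECTION_PRIORITY = {
    "machine_state": 1,
    "blocking_condition": 2,
    "required_action": 3,
    "production_context": 4,
    "diagnostics": 5,
}


@dataclass(frozen=True)
class Section:
    kind: str  # one of the SECTION_PRIORITY keys
    text: str


def layout_operator_screen(sections: list[Section]) -> list[str]:
    """Return section texts in display order, highest priority first."""
    ordered = sorted(sections, key=lambda s: SECTION_PRIORITY[s.kind])
    return [s.text for s in ordered]


# Sections arrive in arbitrary order; the layout puts state first, details last.
screen = layout_operator_screen([
    Section("diagnostics", "Sensor VCH-01 = OFF"),
    Section("required_action", "Check vacuum supply, then press Retry Vacuum."),
    Section("machine_state", "PAUSED - WAITING FOR VACUUM"),
    Section("blocking_condition", "Vacuum not ready at wafer chuck."),
])
```

With this shape, a low-level sensor value cannot displace the machine state header, whatever order the backend delivers the data in.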


Part 4 — Error Prevention Design

The best industrial HMI prevents bad actions before they happen.

This is better than allowing a bad action and then showing an error afterward.

Practical strategies

1. Disable unsafe actions

If the machine is in Auto mode, do not allow manual axis jog unless the workflow permits it.

Bad:

text
[Jog X+] button is always enabled.
Clicking it shows: "Cannot jog in Auto mode."

Better:

text
[Jog X+] is disabled.
Reason shown: "Manual jog unavailable while machine is in Auto mode."

2. Explain why a command is unavailable

A disabled button without explanation causes frustration.

Bad:

text
[Start] disabled

Better:

text
[Start] disabled

Cannot start because:
- Door is open
- Recipe is not loaded
- Wafer is not clamped

3. Require confirmation for dangerous actions

Not every button needs confirmation. But destructive or high-impact actions should.

Examples:

text
Abort Run
Clear Current Lot
Unload Wafer
Reset Calibration
Overwrite Recipe
Move Axis to Home

The confirmation should include context:

text
Abort current inspection run?

Lot: L23091
Wafer: W07
Current step: Defect Review

This will stop inspection and mark the wafer as incomplete.

4. Avoid ambiguous labels

Bad labels:

text
Reset
Clear
Run
Apply
OK
Continue

Better labels:

text
Reset Alarm
Clear Current Recipe
Start Inspection
Apply Recipe Changes
Continue Recovery

5. Prevent mode confusion

Many industrial mistakes happen because the operator does not realize the machine is in:

  • Auto mode
  • Manual mode
  • Maintenance mode
  • Dry run mode
  • Simulation mode
  • Service override mode

Mode should be highly visible and should affect available actions.

text
+--------------------------------------+
| MODE: MANUAL                         |
| Auto inspection commands unavailable |
+--------------------------------------+

6. Show current context before action

Before allowing a command, show what it applies to.

Bad:

text
[Delete]

Better:

text
Delete Recipe: Inspect_300mm_ProductA_v12?

7. Make destructive actions intentional

Dangerous buttons should not be visually or spatially close to common safe buttons.

Bad:

text
[Retry] [Abort] [Next]

Better:

text
[Retry Step]                     [Abort Run...]

The layout should make accidental destructive clicks unlikely.


Part 5 — Feedback, Trust, and Operator Confidence

Operators need clear feedback after every important action.

A command should not disappear into a black box.

For each command, the UI should show whether it is:

text
Requested
Accepted
Executing
Completed
Rejected
Failed
Timed out

A common mistake is showing success too early.

Bad:

text
Operator clicks "Clamp Wafer"
UI immediately shows "Wafer clamped"

But physically, the clamp may still be moving, vacuum may still be building, and the sensor may not yet confirm success.

Better:

text
Operator clicks "Clamp Wafer"

UI shows:
- Command sent
- Clamp command accepted
- Waiting for vacuum confirmation
- Wafer clamped successfully

This distinction is critical:

text
Command sent != Command accepted != Physical action completed

Feedback Loop Diagram

text
+-------------------+
| Machine State     |
| sensors, workflow |
+---------+---------+
          |
          v
+-------------------+
| UI Presentation   |
| current state,    |
| available actions |
+---------+---------+
          |
          v
+-------------------+
| Operator Action   |
| clicks command    |
+---------+---------+
          |
          v
+-------------------+
| Validated Command |
| state/mode/checks |
+---------+---------+
          |
          v
+-------------------+
| Machine Execution |
| motion/device     |
+---------+---------+
          |
          v
+-------------------+
| Machine Feedback  |
| accepted, done,   |
| failed, timeout   |
+---------+---------+
          |
          v
+-------------------+
| UI Update         |
| clear result      |
+-------------------+

If feedback is delayed or missing, operators may:

  • click repeatedly
  • assume the UI froze
  • retry unsafe actions
  • call service unnecessarily
  • stop trusting the HMI

Trust is built when the UI consistently tells the truth about what is known, what is unknown, and what is still in progress.
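
Repeated clicking is easier to handle when the UI tracks whether a command is still in flight. A minimal sketch, assuming a hypothetical `CommandButton` wrapper and a `send` callable supplied by the application: a second click while waiting does not resend the command, and the operator is told why.

```python
class CommandButton:
    """Tracks one command so repeated clicks do not send duplicates."""

    def __init__(self, name, send):
        self.name = name
        self._send = send  # hypothetical callable that forwards the command
        self.in_flight = False

    def click(self) -> str:
        if self.in_flight:
            # Tell the operator the truth instead of silently re-sending.
            return f"{self.name}: already in progress, waiting for machine feedback"
        self.in_flight = True
        self._send(self.name)
        return f"{self.name}: requested"

    def on_machine_result(self, result: str) -> str:
        """Call when the machine reports a final result (completed, failed, timed out)."""
        self.in_flight = False
        return f"{self.name}: {result}"


sent = []
button = CommandButton("Clamp Wafer", sent.append)
first = button.click()
second = button.click()  # impatient second click while the clamp is still moving
done = button.on_machine_result("completed")
```

In a real system the timeout case matters most: the in-flight flag should be cleared by an explicit timed-out result, not left set forever.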


Part 6 — Factory Conditions That Affect UI Design

Industrial HMIs are used in physical environments.

That changes design decisions.

Touchscreens with gloves

Buttons must be large enough. Small precise controls are risky.

Bad:

text
Tiny checkbox beside "Unload wafer"

Better:

text
Large button with clear label:
[Unload Wafer...]

Low or high lighting

Screens may be viewed under bright factory lights or dim equipment areas.

This affects:

  • contrast
  • font size
  • status visibility
  • color dependency

Color alone is not enough.

Bad:

text
Green = ready
Red = blocked

Better:

text
READY
BLOCKED - DOOR OPEN

Use color plus text plus icon/shape.

Operator standing posture

Operators may not be sitting close to the screen. The most important status should be readable from a distance.

This affects:

  • large machine state header
  • clear status banners
  • fewer dense tables on main operator screens

Noisy environment

Audio alarms may be missed or ignored. Visual state must still be clear.

Multilingual teams

Use simple, consistent wording.

Avoid clever phrasing, long sentences, and inconsistent names.

Bad:

text
Chuck vacuum missing
Wafer suction inactive
Clamp pressure not satisfied

If these mean the same thing, choose one term and use it everywhere.

Limited training

The UI should guide normal and abnormal operation without requiring deep machine knowledge.

The operator does not need to know the internal class name, SDK error code, or sequence engine step name.

They need to know:

text
What happened?
Is production stopped?
What should I do?
When should I call service?

Shift changes

The next operator must understand machine status quickly.

Useful UI features:

  • current state
  • active blocking condition
  • last major operator action
  • current lot/wafer/run
  • recovery step in progress
  • timestamp of last state change

Remote/service support

When a service engineer asks, “What do you see on the screen?”, the screen must provide clear names, IDs, timestamps, and context.


Part 7 — Real-World UI Failure Modes

1. UI shows stale machine state as current

What it looks like

The screen says:

text
Machine Ready

But the backend has lost communication with the device, or the state is several seconds old.

Why it happens

The UI caches the last known state but does not show freshness.

Impact

The operator believes the machine is ready and tries to start or recover.

Fix

Show freshness explicitly:

text
Machine State: Ready
Last updated: 0.4s ago

If stale:

text
Machine State: Unknown
No update from controller for 8.2s
Operator commands disabled

2. Operator cannot tell what action is required

What it looks like

The UI says:

text
Error 2034
Sequence interrupted

Why it happens

The system exposes internal fault information but not operator guidance.

Impact

Operator guesses, retries, resets, or calls service for a simple recoverable issue.

Fix

Separate technical cause from required action:

text
Problem:
Wafer clamp did not confirm.

Required action:
1. Check wafer is seated.
2. Press Retry Clamp.
3. If this repeats, call service.

Details:
Sensor CLAMP_OK did not turn ON within 5 seconds.
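
The separation above can be backed by a fault catalog that maps internal codes to operator guidance. The sketch below is illustrative only: the catalog structure, the pairing of code 2034 with the clamp message, and the fallback wording are all assumptions, not part of any real controller API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OperatorMessage:
    problem: str
    actions: tuple[str, ...]
    details: str  # engineering detail, shown last or on a service view


# Hypothetical catalog keyed by internal fault code.
FAULT_CATALOG = {
    2034: OperatorMessage(
        problem="Wafer clamp did not confirm.",
        actions=(
            "Check wafer is seated.",
            "Press Retry Clamp.",
            "If this repeats, call service.",
        ),
        details="Sensor CLAMP_OK did not turn ON within 5 seconds.",
    ),
}


def operator_message(code: int) -> OperatorMessage:
    """Look up guidance; unknown codes get a safe fallback, never a bare number."""
    fallback = OperatorMessage(
        problem="Machine stopped on an unrecognized fault.",
        actions=("Call service.",),
        details=f"Fault code {code} has no catalog entry.",
    )
    return FAULT_CATALOG.get(code, fallback)
```

The fallback is the important design choice: a fault without a catalog entry still produces a problem statement and an action instead of leaking a raw code to the operator.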

3. Alarm/recovery text is too vague

What it looks like

text
Motion failed.

Why it happens

Developers reuse generic exception messages.

Impact

The operator does not know whether the issue is obstruction, limit, timeout, homing, or communication.

Fix

Use specific, action-oriented messages:

text
X-axis failed to reach target position.

Possible causes:
- Obstruction
- Axis not homed
- Servo fault

Required action:
Check stage area, then press Recover Axis.

4. Button label is ambiguous

What it looks like

text
[Reset]

Does it reset the alarm? Sequence? Axis? Recipe? Whole machine?

Why it happens

Developers use generic verbs.

Impact

Operators hesitate or press the wrong command.

Fix

Use object-specific labels:

text
[Reset Alarm]
[Reset Sequence]
[Home Axis]
[Reload Recipe]

5. Screen navigation hides critical context

What it looks like

The operator opens a service screen and loses visibility of the active fault or machine state.

Why it happens

Screens are designed as independent pages instead of workflow-aware views.

Impact

Operator performs action without seeing the current blocking condition.

Fix

Keep critical context persistent:

text
Top banner always visible:
Machine State | Mode | Active Fault | Current Lot/Wafer

6. UI allows action in wrong mode

What it looks like

Manual jog is available while the machine is still in Auto recovery.

Why it happens

UI command availability is handled locally per button instead of centrally from machine state/mode.

Impact

Operator action conflicts with workflow execution.

Fix

Use mode-aware command gating:

text
Command availability = function of:
- machine state
- current mode
- interlocks
- workflow step
- user role
- device readiness
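
As a rough illustration of central gating, the inputs listed above can be collected into one context object and checked against a rule table. Everything here is a sketch under assumptions: the `MachineContext` fields, the rule table contents, and the mode, state, and role names are hypothetical examples.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MachineContext:
    state: str           # e.g. "Idle", "Running", "Paused"
    mode: str            # e.g. "Auto", "Manual"
    interlocks_ok: bool
    workflow_step: str   # e.g. "Recovery", "None"
    role: str            # e.g. "Operator", "Service"
    devices_ready: bool


@dataclass(frozen=True)
class CommandRule:
    modes: frozenset
    states: frozenset
    roles: frozenset
    blocked_steps: frozenset = frozenset()


# Hypothetical rule table: one place that decides availability for every button.
RULES = {
    "JogAxis": CommandRule(
        modes=frozenset({"Manual"}),
        states=frozenset({"Idle", "Paused"}),
        roles=frozenset({"Operator", "Service"}),
        blocked_steps=frozenset({"Recovery"}),
    ),
    "StartInspection": CommandRule(
        modes=frozenset({"Auto"}),
        states=frozenset({"Idle"}),
        roles=frozenset({"Operator"}),
    ),
}


def is_available(command: str, ctx: MachineContext) -> bool:
    rule = RULES.get(command)
    if rule is None:
        return False  # commands without a rule are never offered
    return (
        ctx.interlocks_ok
        and ctx.devices_ready
        and ctx.mode in rule.modes
        and ctx.state in rule.states
        and ctx.role in rule.roles
        and ctx.workflow_step not in rule.blocked_steps
    )


auto_recovery = MachineContext("Paused", "Auto", True, "Recovery", "Operator", True)
manual_idle = MachineContext("Idle", "Manual", True, "None", "Operator", True)
```

UI gating of this kind improves usability only. The controller must still validate every command itself, because the UI's view of the machine can be out of date.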

7. Success message shown before physical completion

What it looks like

text
Unload completed

But the robot is still moving.

Why it happens

The UI treats command acceptance as completion.

Impact

Operator opens door or performs next action too early.

Fix

Represent command lifecycle clearly:

text
Unload requested
Unload accepted
Robot moving
Wafer placed in cassette
Unload completed

8. Too many warnings cause alarm fatigue

What it looks like

The screen constantly shows low-value warnings.

Operators start ignoring all warnings.

Why it happens

Everything is treated as equally important.

Impact

Critical warnings are missed.

Fix

Prioritize and classify messages:

text
Blocking fault:
    Requires action now

Warning:
    Machine can continue but should be watched

Information:
    Useful history, not urgent

Do not flood the operator with engineering noise.
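
The three classes above translate naturally into a severity ordering. The following sketch assumes a hypothetical `operator_banner` helper and made-up message texts: blocking faults come first, warnings next, and informational messages stay in history instead of competing for banner space.

```python
from enum import IntEnum


class Severity(IntEnum):
    BLOCKING_FAULT = 0  # requires action now
    WARNING = 1         # machine can continue but should be watched
    INFORMATION = 2     # useful history, not urgent


def operator_banner(messages: list[tuple[Severity, str]], max_items: int = 3) -> list[str]:
    """Return banner texts, most severe first; Information is left to the history view."""
    visible = [m for m in messages if m[0] != Severity.INFORMATION]
    visible.sort(key=lambda m: m[0])  # stable sort keeps arrival order within a severity
    return [text for _, text in visible[:max_items]]


banner = operator_banner([
    (Severity.INFORMATION, "Recipe loaded"),
    (Severity.WARNING, "Filter life low"),
    (Severity.BLOCKING_FAULT, "Door open"),
])
```

The `max_items` cap is an assumption about screen space. The essential part is that severity, not arrival time, decides what the operator sees first.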


9. Inconsistent terminology across screens

What it looks like

One screen says:

text
Recipe

Another says:

text
Program

Another says:

text
Job file

Why it happens

Different developers name things independently.

Impact

Operators are unsure whether these are the same concept.

Fix

Use centralized terminology.

For example:

text
Recipe = inspection parameters
Job = production execution instance
Lot = manufacturing batch
Wafer = physical unit being inspected

Use these terms consistently everywhere.


10. Service/debug information shown to production operator

What it looks like

Operator screen shows:

text
SequenceNodeException: AxisMoveStep.WaitForDone timeout at node 17
SDK_ERR_0x800704C7

Why it happens

Engineering diagnostics are exposed directly.

Impact

Operator is confused and may lose trust.

Fix

Layer the information:

text
Operator message:
Stage movement did not complete.

Action:
Press Recover Stage or call service.

Engineer details:
SequenceNodeException...
AxisMoveStep...
SDK error...

Operator UI and service UI should not be the same screen with different users looking at it.


Part 8 — Software Design Implications

Usability is not only a screen-layout concern. It affects architecture.

A good industrial HMI needs proper models behind it.

Important architectural concepts

1. State freshness indicators

The UI should know whether displayed state is current.

Example:

text
Fresh:
Last update less than 1 second ago

Degraded:
Last update 1-5 seconds ago

Stale:
No update for more than 5 seconds

When state is stale, commands should usually be disabled or restricted.
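
A freshness check along these lines might look like the sketch below. The thresholds are the example values from this section; the function names, the display strings, and the decision to keep commands enabled in the degraded band are assumptions that a real system would have to settle for itself.

```python
def classify_freshness(age_seconds: float) -> str:
    """Classify the age of the last state update using the example thresholds."""
    if age_seconds < 1.0:
        return "Fresh"
    if age_seconds <= 5.0:
        return "Degraded"
    return "Stale"


def displayed_state(last_state: str, age_seconds: float) -> tuple[str, bool]:
    """Return (text to show, commands_enabled) for the state header."""
    if classify_freshness(age_seconds) == "Stale":
        # Never present an old value as current; say that the state is unknown.
        return (f"Machine State: Unknown (no update for {age_seconds:.1f}s)", False)
    return (f"Machine State: {last_state} (updated {age_seconds:.1f}s ago)", True)
```

For example, `displayed_state("Ready", 8.2)` reports the state as unknown and disables commands, while `displayed_state("Ready", 0.4)` shows `Ready` together with its age.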


2. Clear command result model

Do not model commands as simple button clicks.

Use a lifecycle model:

text
Requested
Validated
Rejected
Accepted
Executing
Completed
Failed
TimedOut
Cancelled

This allows the UI to tell the operator what is really happening.
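
One possible way to model this lifecycle is a small state machine that rejects illegal jumps, such as going from Accepted straight to Completed. The status names come from the list above; the transition table itself is an assumption and would need to match the actual controller protocol.

```python
from enum import Enum


class CommandStatus(Enum):
    REQUESTED = "Requested"
    VALIDATED = "Validated"
    REJECTED = "Rejected"
    ACCEPTED = "Accepted"
    EXECUTING = "Executing"
    COMPLETED = "Completed"
    FAILED = "Failed"
    TIMED_OUT = "TimedOut"
    CANCELLED = "Cancelled"


S = CommandStatus

# Assumed transition table; statuses without an entry are terminal.
ALLOWED = {
    S.REQUESTED: {S.VALIDATED, S.REJECTED},
    S.VALIDATED: {S.ACCEPTED, S.REJECTED, S.TIMED_OUT},
    S.ACCEPTED: {S.EXECUTING, S.CANCELLED, S.TIMED_OUT},
    S.EXECUTING: {S.COMPLETED, S.FAILED, S.TIMED_OUT, S.CANCELLED},
}


def advance(current: CommandStatus, new: CommandStatus) -> CommandStatus:
    """Return the new status, or raise if the transition skips a required step."""
    if new not in ALLOWED.get(current, set()):
        raise ValueError(f"Illegal transition {current.value} -> {new.value}")
    return new


status = S.REQUESTED
for nxt in (S.VALIDATED, S.ACCEPTED, S.EXECUTING, S.COMPLETED):
    status = advance(status, nxt)
```

Because Completed is reachable only from Executing, the UI cannot show success while the machine has merely accepted the command.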


3. Mode-aware UI behavior

The UI should not independently decide what is allowed.

Command availability should come from the machine/workflow context.

Bad:

text
Button decides:
if IsEnabledCheckbox == true

Good:

text
Presentation model decides:
CanStartInspection
CanRetryStep
CanAbortRun
CanJogAxis
CanEditRecipe

Based on machine state, mode, workflow, safety conditions, and role.


4. Centralized terminology

Terms should not be hardcoded randomly across screens.

A strong system has a shared vocabulary for:

  • machine states
  • modes
  • commands
  • alarms
  • recovery steps
  • recipe concepts
  • production context

This prevents confusion and improves training.
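
A shared vocabulary is easier to keep when it can be checked automatically. The sketch below assumes UI strings are available as a key-to-text dictionary and uses a hypothetical glossary with a single entry; a build step or test could run it over every operator-facing string.

```python
import re

# Canonical term -> synonyms that must not appear in operator-facing text.
# Hypothetical example entry; a real glossary would cover states, modes, alarms, etc.
GLOSSARY = {
    "Recipe": ["Program", "Job file"],
}


def find_terminology_violations(ui_strings: dict[str, str]) -> list[str]:
    """Report every UI string that uses a banned synonym instead of the canonical term."""
    violations = []
    for key, text in ui_strings.items():
        for canonical, banned in GLOSSARY.items():
            for term in banned:
                # Whole-word match so "Program" does not flag "Programming".
                if re.search(rf"\b{re.escape(term)}\b", text, flags=re.IGNORECASE):
                    violations.append(f"{key}: use '{canonical}' instead of '{term}'")
    return violations


violations = find_terminology_violations({
    "main.load_button": "Load Program",
    "editor.title": "Edit Recipe",
})
```

Here only `main.load_button` is flagged, because it says Program where the glossary requires Recipe.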


5. Consistent screen patterns

Operators build habits.

If every recovery screen has a different layout, the operator must think harder during stress.

Good pattern:

text
Problem
Cause
Required Action
Available Commands
Details

Use this consistently.


6. Operator/action context

Every important action should carry context.

Example audit/action context:

text
Action: Retry Clamp
Operator: user123
Mode: Auto Recovery
Lot: L23091
Wafer: W07
Recipe: Inspect_A
Machine State: Paused
Blocking Condition: Clamp timeout
Timestamp: 14:32:08

Although this topic does not go deep into audit internals, the usability implication is clear: context helps support, diagnosis, and accountability.
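
The action context above can be captured as an immutable record at the moment the operator presses the button. This is a minimal sketch: the `ActionContext` type and log format are assumptions, and the calendar date in the example is invented because the original shows only a time of day.

```python
from dataclasses import dataclass, asdict
from datetime import datetime


@dataclass(frozen=True)
class ActionContext:
    action: str
    operator: str
    mode: str
    lot: str
    wafer: str
    recipe: str
    machine_state: str
    blocking_condition: str
    timestamp: datetime


def to_log_line(ctx: ActionContext) -> str:
    """Flatten the context into one line that support staff can search for."""
    fields = asdict(ctx)
    fields["timestamp"] = ctx.timestamp.isoformat()
    return " | ".join(f"{key}={value}" for key, value in fields.items())


record = ActionContext(
    action="Retry Clamp",
    operator="user123",
    mode="Auto Recovery",
    lot="L23091",
    wafer="W07",
    recipe="Inspect_A",
    machine_state="Paused",
    blocking_condition="Clamp timeout",
    timestamp=datetime(2024, 1, 15, 14, 32, 8),  # hypothetical date
)
line = to_log_line(record)
```

Freezing the record keeps the captured context fixed even if the machine state changes a moment after the click.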


7. Workflow-guided recovery

A recovery UI should not just show an error. It should guide the operator through a safe path.

Bad:

text
Fault occurred.
[Reset]

Good:

text
Wafer clamp failed.

Step 1: Check wafer position.
Step 2: Retry clamp.
Step 3: If retry fails, unload wafer and call service.

Available:
[Retry Clamp]
[Abort Run]
[Call Service Instructions]
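
Guided recovery can be represented as an ordered list of steps, each with its own instruction and its own set of available commands. The sketch below is one possible shape, not a prescribed design: the `RecoveryGuide` class, the per-step command sets, and the `Confirm Wafer Position` command are hypothetical additions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecoveryStep:
    instruction: str
    commands: tuple[str, ...]  # commands offered while this step is active


# Steps follow the clamp example above; the command sets per step are assumed.
CLAMP_RECOVERY = (
    RecoveryStep("Check wafer position.", ("Confirm Wafer Position", "Abort Run")),
    RecoveryStep("Retry clamp.", ("Retry Clamp", "Abort Run")),
    RecoveryStep(
        "If retry fails, unload wafer and call service.",
        ("Abort Run", "Call Service Instructions"),
    ),
)


class RecoveryGuide:
    def __init__(self, steps: tuple[RecoveryStep, ...]):
        self._steps = steps
        self._index = 0

    @property
    def current(self) -> RecoveryStep:
        return self._steps[self._index]

    def advance(self) -> RecoveryStep:
        """Move to the next step; stay on the last step once it is reached."""
        if self._index < len(self._steps) - 1:
            self._index += 1
        return self.current


guide = RecoveryGuide(CLAMP_RECOVERY)
first_commands = guide.current.commands
second = guide.advance()
```

Offering `Retry Clamp` only on the second step means the operator cannot retry before confirming the wafer position, which is the point of guiding rather than just reporting.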

8. Separate operator-facing and engineer-facing diagnostics

Operators need action. Engineers need evidence.

Do not force one screen to serve both badly.

text
Operator view:
    What happened?
    What should I do?

Engineer view:
    Which subsystem?
    Which signal?
    Which timestamp?
    Which exception?
    Which device response?

Component Diagram

text
+--------------------------------------+
| Machine State + Workflow Context     |
| state, mode, step, faults, freshness |
+-------------------+------------------+
                    |
                    v
+--------------------------------------+
| UI Context / Presentation Model      |
| visible state, available actions,    |
| messages, guidance, command status   |
+-------------------+------------------+
                    |
                    v
+--------------------------------------+
| Operator Screen                      |
| clear status, next action, feedback  |
+-------------------+------------------+
                    |
                    v
+--------------------------------------+
| Operator Action                      |
| start, retry, abort, recover         |
+-------------------+------------------+
                    |
                    v
+--------------------------------------+
| Validated Command Path               |
| state/mode/interlock validation      |
+-------------------+------------------+
                    |
                    v
+--------------------------------------+
| Machine Feedback                     |
| accepted, executing, complete, fail  |
+--------------------------------------+

The screen should be the visible result of machine context, not an independent command panel.


Part 9 — Interview / Real-World Talking Points

A strong way to explain this in an interview:

Industrial HMI usability is not just about making screens look clean. It is about helping operators make correct decisions under pressure. The UI must clearly show machine state, blocking conditions, required actions, and command feedback. A machine can have correct backend logic but still fail operationally if the HMI misleads the operator, hides stale state, uses ambiguous labels, or shows success before physical completion.

Common mistakes software engineers make when entering industrial UI:

text
1. Treating HMI like a normal CRUD desktop app.
2. Showing too much raw diagnostic data to operators.
3. Not distinguishing command accepted from action completed.
4. Forgetting that operators work under stress and interruptions.
5. Using generic error messages from exceptions.
6. Allowing screens to hide critical machine context.
7. Making command enablement local to buttons instead of state-driven.
8. Ignoring stale data and communication loss.
9. Using inconsistent terminology.
10. Designing for developers instead of operators.

What strong engineers understand:

text
Good industrial HMI design is about operational clarity.

The operator should always know:
- what the machine is doing
- whether the displayed state is fresh
- what is blocking progress
- what action is safe
- what action is unavailable and why
- whether a command was requested, accepted, executing, or completed

The best HMIs reduce guessing.

They do not assume the operator has perfect attention, perfect training, perfect lighting, perfect time, or perfect context.

They are designed for the real factory:

text
stress
noise
fatigue
interruptions
gloves
shift handovers
production pressure
abnormal recovery

A production-grade HMI is successful when, during a fault, the operator does not need to guess what the machine means.
