
Communication Security & Safety Boundaries

This topic sits between industrial communication, safety/interlocks, PLC/controller interaction, and cybersecurity for industrial systems. Your roadmap already highlights that machine software must respect interlocks, permissives, safe motion gating, safety-related UI restrictions, traceability, controlled access, audit logging, and least privilege.


Part 1 — Why Communication Boundaries Are Safety-Critical

In normal enterprise software, a communication boundary usually protects data.

In industrial machine software, a communication boundary also protects:

  • people
  • equipment
  • product quality
  • production state
  • machine integrity
  • traceability

An industrial machine may receive commands from many places:

```text
Operator HMI
PLC / safety controller
SCADA
MES
service tool
remote support client
third-party integration
```

The key point is:

Just because a system can communicate with the machine does not mean it should be allowed to control the machine.

Example:

```text
MES may start a production job.
MES should not directly jog an axis.

SCADA may acknowledge or display alarms.
SCADA should not bypass interlocks.

Remote service may run diagnostics.
Remote service should not move hardware unless the machine is in a controlled service mode.
```

This is why communication security in machine systems is not only about “is the request authenticated?” It is also about:

```text
Is this caller allowed to request this action?
Is this action valid for this machine mode?
Is this action safe right now?
Can we trace who requested it?
Can the system fail closed if uncertain?
```

Part 2 — Trust Levels & Control Authority

Different systems have different authority.

A local machine controller is usually more trusted than a factory-level MES. A PLC or safety controller may own safety-critical permissives. A remote service tool may be powerful, but only under restricted conditions.

A practical trust model may look like this:

```text
+-------------------------------------------------------------+
|                    MACHINE TRUSTED ZONE                     |
|                                                             |
|  +-------------------+      +---------------------------+   |
|  | Operator HMI      |      | Machine Controller        |   |
|  | Local control     |----->| Workflow / State Machine  |   |
|  +-------------------+      +-------------+-------------+   |
|                                           |                 |
|                                           v                 |
|                             +---------------------------+   |
|                             | Device / Motion Layer     |   |
|                             +-------------+-------------+   |
|                                           |                 |
|                                           v                 |
|                             +---------------------------+   |
|                             | PLC / Safety Controller   |   |
|                             | Interlocks / Permissives  |   |
|                             +---------------------------+   |
|                                                             |
+------------------------------^------------------------------+
                               |
+------------------------------+------------------------------+
|              CONTROLLED COMMUNICATION BOUNDARY              |
+------------------------------+------------------------------+
                               |
         +---------------------+---------------------+
         |                     |                     |
         v                     v                     v
+-----------------+   +-----------------+   +-----------------+
| MES             |   | SCADA           |   | Remote Service  |
| production      |   | supervision     |   | restricted      |
+-----------------+   +-----------------+   +-----------------+

+-------------------------------------------------------------+
|                LESS TRUSTED / EXTERNAL ZONE                 |
|      third-party tools, factory clients, integrations      |
+-------------------------------------------------------------+
```

The design principle is:

Control authority must be explicit, not accidental.

Bad design allows any connected client to call internal APIs.

Good design says:

```text
MES can:
- download job
- start approved recipe
- receive results
- query machine state

MES cannot:
- jog axis
- disable interlock
- force sensor state
- directly reset safety fault
```
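
The MES rules above amount to a default-deny allow-list per source system. A minimal Python sketch, assuming hypothetical source names and command strings (`Source`, `CAPABILITIES`, and `is_capable` are illustrative, not an existing API):

```python
from enum import Enum, auto


class Source(Enum):
    """Hypothetical command sources; names are illustrative only."""
    HMI = auto()
    MES = auto()
    SCADA = auto()
    REMOTE_SERVICE = auto()


# Explicit grants per source. Anything not listed is denied by default,
# so a new or forgotten command never becomes reachable by accident.
CAPABILITIES: dict[Source, frozenset[str]] = {
    Source.MES: frozenset({"DownloadJob", "StartApprovedRecipe", "GetResults", "GetMachineState"}),
    Source.SCADA: frozenset({"GetMachineState", "AcknowledgeAlarm"}),
    Source.REMOTE_SERVICE: frozenset({"GetMachineState", "RunDiagnostic"}),
    Source.HMI: frozenset({"StartJob", "StopJob", "ResetFault", "JogAxis"}),
}


def is_capable(source: Source, command: str) -> bool:
    """True only if the command was explicitly granted to this source."""
    return command in CAPABILITIES.get(source, frozenset())


print(is_capable(Source.MES, "StartApprovedRecipe"))  # True
print(is_capable(Source.MES, "JogAxis"))              # False
print(is_capable(Source.MES, "DisableInterlock"))     # False
```

A capability grant only answers whether a source may ask for a command. Whether the machine should execute it right now is a separate question.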

Part 3 — Command Authorization vs Command Safety

These two are different.

Authorization asks:

```text
Is this caller allowed to request this command?
```

Safety gating asks:

```text
Is this command safe and valid right now?
```

Both are required.

Example 1:

```text
Caller: authenticated service engineer
Command: move X axis
Authorization: allowed
Machine state: guard door open
Result: reject
Reason: unsafe right now
```

Example 2:

```text
Caller: authenticated MES
Command: start production job
Authorization: allowed
Machine state: auto mode, idle, recipe validated
Result: accept
```

Example 3:

```text
Caller: authenticated MES
Command: jog Z axis
Authorization: rejected
Reason: MES has no manual motion authority
```
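
The three examples can be reproduced by keeping authorization and safety gating as two separate functions that must both pass. This is a minimal sketch; `MachineState`, `GRANTS`, and the command names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MachineState:
    """Hypothetical snapshot of the facts the safety gate needs."""
    guard_door_closed: bool
    mode: str  # e.g. "auto", "manual", "service"
    idle: bool
    recipe_validated: bool


# Authorization data: which caller may request which command (illustrative).
GRANTS: dict[str, set[str]] = {
    "service_engineer": {"MoveAxis"},
    "mes": {"StartJob"},
}


def authorize(caller: str, command: str) -> tuple[bool, str]:
    """Is this caller allowed to request this command?"""
    if command in GRANTS.get(caller, set()):
        return True, "authorized"
    return False, f"{caller} has no authority for {command}"


def safety_gate(command: str, state: MachineState) -> tuple[bool, str]:
    """Is this command safe and valid right now? Takes no caller on purpose."""
    if not state.guard_door_closed:
        return False, "unsafe right now: guard door open"
    if command == "StartJob" and not (state.mode == "auto" and state.idle and state.recipe_validated):
        return False, "machine not ready for production start"
    return True, "safe"


def decide(caller: str, command: str, state: MachineState) -> tuple[bool, str]:
    """Both checks must pass; authorization alone is never enough."""
    allowed, reason = authorize(caller, command)
    if not allowed:
        return False, reason
    return safety_gate(command, state)


door_open = MachineState(guard_door_closed=False, mode="service", idle=True, recipe_validated=False)
ready = MachineState(guard_door_closed=True, mode="auto", idle=True, recipe_validated=True)

print(decide("service_engineer", "MoveAxis", door_open))  # (False, 'unsafe right now: guard door open')
print(decide("mes", "StartJob", ready))                   # (True, 'safe')
print(decide("mes", "JogAxis", ready))                    # (False, 'mes has no authority for JogAxis')
```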

A common mistake from enterprise software engineers is to think:

“The user is authenticated and has the role, so the command is valid.”

In machine software, that is not enough.

A command can be authorized but still unsafe.


Part 4 — Safe Command Gating Across Communication Boundaries

External systems should never directly call motion APIs, device APIs, or low-level controller methods.

Bad:

```text
MES ---> MotionController.MoveAxis()
```

Good:

```text
MES
 |
 v
API / Protocol Boundary
 |
 v
Authentication / Identity
 |
 v
Authorization / Capability Check
 |
 v
Command Contract Validation
 |
 v
Machine Mode / State Validation
 |
 v
Interlock / Permissive Check
 |
 v
Audit Log
 |
 v
Machine Controller
 |
 v
Device / Motion / PLC Layer
```

ASCII command flow:

```text
+------------------+
| External System  |
| MES / SCADA /    |
| Service Tool     |
+--------+---------+
         |
         v
+---------------------------+
| API / Protocol Boundary   |
| - parse message           |
| - validate schema         |
| - normalize command       |
+------------+--------------+
             |
             v
+---------------------------+
| Authorization Layer       |
| - identity                |
| - role / capability       |
| - source system policy    |
+------------+--------------+
             |
             v
+---------------------------+
| Safety Gate               |
| - machine mode            |
| - current state           |
| - interlocks              |
| - permissives             |
| - active alarms           |
+------------+--------------+
             |
             v
+---------------------------+
| Command Orchestrator      |
| - sequence ownership      |
| - state transition        |
| - cancellation handling   |
+------------+--------------+
             |
             v
+---------------------------+
| Machine Controller        |
| - motion                  |
| - devices                 |
| - PLC handshake           |
+---------------------------+
```
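
One way to make the layered flow concrete is a single gateway that runs ordered checks and stops at the first rejection. The sketch below is a simplified, hypothetical Python `CommandGateway`; the stage names follow the flow above, and the lambdas are placeholders for real authentication, capability, contract, and mode/interlock logic.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Command:
    source: str               # e.g. "MES", "SCADA", "SERVICE"
    identity: Optional[str]   # None = caller could not be authenticated
    name: str
    params: dict = field(default_factory=dict)


# A check returns None to pass, or a human-readable rejection reason.
Check = Callable[[Command], Optional[str]]


class CommandGateway:
    """Hypothetical gateway: every external command runs the same ordered checks."""

    def __init__(self, checks: list[tuple[str, Check]], execute: Callable[[Command], None]) -> None:
        self._checks = checks
        self._execute = execute
        self.audit: list[dict] = []

    def submit(self, cmd: Command) -> bool:
        for stage, check in self._checks:
            reason = check(cmd)
            if reason is not None:
                # Rejected attempts are audited too, tagged with the refusing stage.
                self._record(cmd, "rejected", stage, reason)
                return False
        self._record(cmd, "accepted", None, None)
        self._execute(cmd)  # only now does the command reach the machine controller
        return True

    def _record(self, cmd: Command, result: str, stage: Optional[str], reason: Optional[str]) -> None:
        self.audit.append({"source": cmd.source, "identity": cmd.identity, "command": cmd.name,
                           "result": result, "stage": stage, "reason": reason})


machine_ready = True  # stand-in for the real mode/state/interlock evaluation
executed: list[str] = []
gateway = CommandGateway(
    checks=[
        ("authentication", lambda c: None if c.identity else "unauthenticated"),
        ("authorization", lambda c: None if (c.source, c.name) == ("MES", "StartJob") else "no capability"),
        ("contract", lambda c: None if "job_id" in c.params else "missing job_id"),
        ("safety", lambda c: None if machine_ready else "machine not ready"),
    ],
    execute=lambda c: executed.append(c.name),
)

accepted = gateway.submit(Command("MES", "mes-line3", "StartJob", {"job_id": "J42"}))
rejected = gateway.submit(Command("MES", "mes-line3", "JogAxis"))
print(accepted, rejected, executed)  # True False ['StartJob']
print(gateway.audit[-1]["stage"])    # authorization
```

Because HMI, MES, SCADA, and service paths would all call `submit`, no path can skip a stage, and rejected attempts leave the same audit trail as accepted ones.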

The safety gate must be independent of caller identity.

Even a highly privileged caller should not bypass:

```text
guard door state
E-stop state
axis homing status
soft limits
machine mode
active faults
recipe validity
ownership of current operation
```
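
A sketch of a safety gate whose signature has no caller, role, or identity argument, so privilege cannot influence the result. The `SafetySnapshot` fields are hypothetical names for the conditions listed above; in practice they would be populated from the PLC, motion layer, and workflow state. A software gate like this sits in front of the PLC / safety controller and does not replace it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SafetySnapshot:
    """Hypothetical snapshot; fields mirror the checklist above."""
    guard_door_closed: bool
    estop_clear: bool
    axes_homed: bool
    within_soft_limits: bool       # for the requested target position
    mode_permits_command: bool
    active_faults: tuple[str, ...]
    recipe_valid: bool
    requester_owns_operation: bool


def safety_violations(s: SafetySnapshot) -> list[str]:
    """Return every unmet condition.

    There is deliberately no role or identity parameter, so no caller,
    however privileged, can take a shortcut through these checks.
    """
    checks = [
        (s.guard_door_closed, "guard door open"),
        (s.estop_clear, "E-stop active"),
        (s.axes_homed, "axes not homed"),
        (s.within_soft_limits, "target outside soft limits"),
        (s.mode_permits_command, "command not valid in current machine mode"),
        (s.recipe_valid, "recipe not valid"),
        (s.requester_owns_operation, "requester does not own the current operation"),
    ]
    violations = [reason for ok, reason in checks if not ok]
    if s.active_faults:
        violations.append("active faults: " + ", ".join(s.active_faults))
    return violations


snap = SafetySnapshot(
    guard_door_closed=False, estop_clear=True, axes_homed=True, within_soft_limits=True,
    mode_permits_command=True, active_faults=("AXIS_Z_FOLLOWING_ERROR",),
    recipe_valid=True, requester_owns_operation=True,
)
print(safety_violations(snap))
# ['guard door open', 'active faults: AXIS_Z_FOLLOWING_ERROR']
```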

This aligns with the roadmap principle that motion and machine actions must be validated, state-driven, safe, and deterministic.


Part 5 — Read Access vs Control Access

Many systems need read access.

Examples:

text
machine status
current job
alarms
event history
inspection results
throughput metrics
device health
production counters

Fewer systems should have control access.

Examples:

```text
start job
stop job
pause / resume
reset fault
change recipe
enter maintenance mode
move axis
trigger calibration
override configuration
```

These should be separated at the contract level, not only by UI hiding.

Bad:

```text
IMachineApi
  GetStatus()
  GetAlarms()
  StartJob()
  MoveAxis()
  DisableInterlock()
```

Better:

```text
IMachineReadApi
  GetStatus()
  GetAlarms()
  GetProductionData()
  GetResults()

IMachineControlApi
  StartJob()
  StopJob()
  ResetFault()

IServiceControlApi
  JogAxis()
  RunDiagnostic()
  Calibrate()

ISafetyRestrictedApi
  Not exposed to external clients
```

The important idea:

Read access should not accidentally imply control authority.

A historian, MES, dashboard, or analytics client should not receive write-capable credentials just because it needs production data.
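
Contract-level separation can be sketched as separate operation tables, one per contract. This minimal Python example assumes hypothetical operation names; in a deployed system each contract would typically also be its own endpoint with its own credentials, but the principle is the same: the read contract contains no write operations to misuse.

```python
from typing import Any, Callable


class Machine:
    """Stand-in for the machine application (hypothetical)."""

    def __init__(self) -> None:
        self.status = "Idle"
        self.alarms: list[str] = []

    def start_job(self, job_id: str) -> str:
        # In a real system this would go through the command gateway and safety gate.
        self.status = f"Running {job_id}"
        return self.status


def build_contracts(m: Machine) -> dict[str, dict[str, Callable[..., Any]]]:
    # Each contract is its own operation table. The read table has no entry for
    # any state-changing operation, so a read client has nothing to call.
    read_ops = {
        "GetStatus": lambda: m.status,
        "GetAlarms": lambda: list(m.alarms),
    }
    control_ops = {
        "StartJob": lambda job_id: m.start_job(job_id),
    }
    return {"read": read_ops, "control": control_ops}


def dispatch(contracts: dict[str, dict[str, Callable[..., Any]]], contract: str, op: str, *args: Any) -> Any:
    table = contracts.get(contract, {})
    if op not in table:
        raise PermissionError(f"{op} is not part of the {contract} contract")
    return table[op](*args)


machine = Machine()
contracts = build_contracts(machine)
print(dispatch(contracts, "read", "GetStatus"))  # Idle
try:
    dispatch(contracts, "read", "StartJob", "J1")  # e.g. a dashboard bound to the read contract
except PermissionError as err:
    print(err)  # StartJob is not part of the read contract
```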


Part 6 — Real-World Failure Scenarios

1. External system sends command in the wrong machine state

Production symptom:

```text
MES sends StartJob while the machine is still recovering from a fault.
Machine accepts the command.
Workflow enters inconsistent state.
Operator sees confusing alarms.
```

Why it happens:

```text
The API checked authentication but did not check machine state.
```

Prevention:

```text
Use explicit state validation:
Idle + AutoMode + RecipeLoaded + NoCriticalAlarm + SafetyReady
```
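
The prevention rule can be written as an explicit, named precondition check that reports every unmet condition. A minimal sketch; `StartJobPreconditions` and its fields are hypothetical mappings of the five conditions above:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StartJobPreconditions:
    idle: bool
    auto_mode: bool
    recipe_loaded: bool
    critical_alarm_active: bool
    safety_ready: bool


def unmet_start_conditions(p: StartJobPreconditions) -> list[str]:
    """Every condition must hold; report all failures, not just the first."""
    required = {
        "Idle": p.idle,
        "AutoMode": p.auto_mode,
        "RecipeLoaded": p.recipe_loaded,
        "NoCriticalAlarm": not p.critical_alarm_active,
        "SafetyReady": p.safety_ready,
    }
    return [name for name, ok in required.items() if not ok]


# Still recovering from a fault: not idle yet, critical alarm still latched.
recovering = StartJobPreconditions(idle=False, auto_mode=True, recipe_loaded=True,
                                   critical_alarm_active=True, safety_ready=True)
print(unmet_start_conditions(recovering))  # ['Idle', 'NoCriticalAlarm']
```

Reporting all unmet conditions gives the MES and the operator a clear rejection reason instead of a confusing alarm later.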

2. Authenticated integration bypasses local safety logic

Production symptom:

```text
A service tool calls a low-level motion method directly.
Axis moves even though the operator screen would have disabled the button.
```

Why it happens:

text
Safety logic existed only in the UI.
Backend APIs were not gated.

Prevention:

```text
Never put safety gating only in the UI.
Centralize command validation below all clients.
```

3. Service tool leaves machine in unsafe mode

Production symptom:

```text
Engineer enables maintenance mode.
Runs diagnostics.
Walks away.
Next operator starts operation while service flags are still active.
```

Why it happens:

```text
Service mode has no timeout, ownership, or exit validation.
```

Prevention:

```text
Require explicit service session ownership.
Add timeout.
Show visible machine state.
Block production start until service mode is cleared.
Audit who entered and exited service mode.
```
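
A minimal sketch of service-session ownership and timeout, assuming a hypothetical `ServiceSession` class and an injectable clock so the timeout can be tested. Exit validation of the physical service flags is not shown.

```python
import time
from typing import Callable, Optional


class ServiceSession:
    """Sketch of an owned, time-limited service mode (hypothetical API)."""

    def __init__(self, timeout_s: float, clock: Callable[[], float] = time.monotonic) -> None:
        self._timeout_s = timeout_s
        self._clock = clock
        self.owner: Optional[str] = None
        self._started_at = 0.0
        self.audit: list[tuple[str, str]] = []

    def _expired(self) -> bool:
        return self.owner is not None and self._clock() - self._started_at > self._timeout_s

    def enter(self, engineer: str) -> bool:
        if self.owner is not None:
            return False  # already owned (or expired but not yet cleared)
        self.owner, self._started_at = engineer, self._clock()
        self.audit.append(("enter", engineer))
        return True

    def service_command_allowed(self, engineer: str) -> bool:
        # Only the owner, and only before the timeout, may run service actions.
        return self.owner == engineer and not self._expired()

    def exit(self, who: str) -> bool:
        # The owner may exit at any time; anyone else only after the session expired.
        if self.owner is None or (who != self.owner and not self._expired()):
            return False
        self.audit.append(("exit", who))
        self.owner = None
        return True

    def production_start_allowed(self) -> bool:
        # Expiry does not clear service mode; it must be exited explicitly.
        return self.owner is None


now = [0.0]
session = ServiceSession(timeout_s=600, clock=lambda: now[0])
session.enter("eng.alice")
now[0] = 900.0  # Alice walked away 15 minutes ago
print(session.service_command_allowed("eng.alice"))  # False: session expired
print(session.production_start_allowed())            # False: still in service mode
print(session.exit("supervisor.bob"))                # True: expired session cleared, and audited
print(session.production_start_allowed())            # True
```

The key design choice is that expiry revokes service authority but does not silently clear service mode; production start stays blocked until someone explicitly exits, and that exit is audited.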

4. SCADA/MES command conflicts with local operator action

Production symptom:

```text
Operator presses Stop locally.
MES immediately sends Start again.
Machine oscillates between stopping and starting.
```

Why it happens:

```text
No control ownership model.
The system treats all commands equally.
```

Prevention:

```text
Introduce command ownership:
- local operator has priority during manual intervention
- MES controls only in auto production mode
- stop/abort commands take priority over start commands
```
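
The ownership rules can be sketched as a small arbitration object. It assumes the command has already passed authorization and the safety gate; `ControlOwnership`, the source names `LOCAL` and `MES`, and the command strings are hypothetical.

```python
from dataclasses import dataclass

STOP_COMMANDS = {"Stop", "Abort"}


@dataclass
class ControlOwnership:
    """Hypothetical arbitration between local operator and MES commands."""
    mode: str = "auto"        # "auto" (production) or "manual"
    local_hold: bool = False  # True while the local operator is intervening

    def request(self, source: str, command: str) -> tuple[bool, str]:
        # Stop/abort wins over everything: it is never blocked by ownership.
        if command in STOP_COMMANDS:
            if source == "LOCAL":
                self.local_hold = True  # operator now owns control until released
            return True, "stop accepted"
        if source == "LOCAL":
            return True, "accepted"
        if source == "MES":
            if self.mode != "auto":
                return False, "MES controls only in auto production mode"
            if self.local_hold:
                return False, "local operator holds control"
            return True, "accepted"
        return False, "unknown source"  # fail closed for anything unrecognized

    def release_local_hold(self) -> None:
        """Called from the HMI when the operator hands control back."""
        self.local_hold = False


own = ControlOwnership()
print(own.request("LOCAL", "Stop"))  # (True, 'stop accepted')
print(own.request("MES", "Start"))   # (False, 'local operator holds control')
own.release_local_hold()
print(own.request("MES", "Start"))   # (True, 'accepted')
```

The local Stop both takes effect and hands ownership to the operator, so the MES cannot immediately restart the machine and the stop/start oscillation cannot occur.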

5. Remote command cannot be traced

Production symptom:

```text
Machine changed recipe at 02:13.
No one knows whether it was operator, MES, service engineer, or script.
```

Why it happens:

```text
Logs record “RecipeChanged” but not caller identity, source, command ID, or reason.
```

Prevention:

```text
Audit every external command:
- timestamp
- source system
- authenticated identity
- command type
- parameters
- machine state before/after
- accepted/rejected reason
```
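
A minimal sketch of an audit record with the fields above plus a command ID, written as one JSON line per attempt. `AuditRecord` and `audit` are hypothetical names; a real system would write to durable, append-only storage rather than a Python list.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Any, Optional


@dataclass(frozen=True)
class AuditRecord:
    timestamp: str
    command_id: str
    source_system: str
    identity: str
    command_type: str
    parameters: dict[str, Any]
    state_before: str
    state_after: Optional[str]  # None if the command was rejected and nothing changed
    accepted: bool
    reason: str


def audit(log: list[str], **fields: Any) -> AuditRecord:
    """Append one JSON line per command attempt, accepted or rejected."""
    record = AuditRecord(timestamp=datetime.now(timezone.utc).isoformat(), **fields)
    log.append(json.dumps(asdict(record), sort_keys=True))
    return record


log: list[str] = []
audit(
    log,
    command_id="cmd-1042",
    source_system="MES",
    identity="mes-line3",
    command_type="ChangeRecipe",
    parameters={"recipe": "R-17"},
    state_before="Running",
    state_after=None,
    accepted=False,
    reason="recipe change not allowed while running",
)
print(log[0])
```

With a record like this, the 02:13 recipe change above could be traced to a source, an identity, and a reason, including when the attempt was rejected.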

6. Read-only integration accidentally gains write capability

Production symptom:

```text
A dashboard client intended for monitoring can reset alarms or change parameters.
```

Why it happens:

```text
Same API key or role is reused for read and control operations.
```

Prevention:

```text
Separate read contracts, write contracts, credentials, roles, and network/API exposure.
```
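
Separate credentials with explicit scopes can be sketched as follows. The key strings, scope names, and the `permitted` helper are hypothetical placeholders:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Credential:
    client: str
    scopes: frozenset[str]


# Hypothetical credential store: one key per integration, narrowest scope needed.
# (A real store would keep hashed secrets, not plaintext keys.)
CREDENTIALS = {
    "key-dashboard": Credential("dashboard", frozenset({"read"})),
    "key-mes": Credential("mes", frozenset({"read", "control"})),
}

# Every exposed operation declares the scope it requires.
OPERATION_SCOPE = {
    "GetStatus": "read",
    "GetAlarms": "read",
    "ResetAlarm": "control",
    "SetParameter": "control",
}


def permitted(api_key: str, operation: str) -> bool:
    cred: Optional[Credential] = CREDENTIALS.get(api_key)
    required = OPERATION_SCOPE.get(operation)
    # Unknown key, unknown operation, or missing scope: all fail closed.
    return cred is not None and required is not None and required in cred.scopes


print(permitted("key-dashboard", "GetAlarms"))   # True
print(permitted("key-dashboard", "ResetAlarm"))  # False
print(permitted("key-mes", "ResetAlarm"))        # True
```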

Part 7 — Software Design Implications

Communication boundaries must be first-class architecture concepts.

They are not just controllers, endpoints, sockets, or protocol handlers.

They define:

```text
Who can talk to the machine?
What can they ask for?
When is it valid?
Who owns control?
What must be logged?
What happens if validation is uncertain?
```

Good component design:

```text
+-------------------------------------------------------------+
|                     Machine Application                     |
|                                                             |
|  +-------------------+       +----------------------------+ |
|  | Operator HMI      |       | External API Boundary      | |
|  | Local commands    |       | MES / SCADA / Service      | |
|  +---------+---------+       +-------------+--------------+ |
|            |                               |                |
|            +---------------+---------------+                |
|                            v                                |
|                 +---------------------+                     |
|                 | Command Gateway     |                     |
|                 | - command contract  |                     |
|                 | - source policy     |                     |
|                 | - audit correlation |                     |
|                 +----------+----------+                     |
|                            |                                |
|                            v                                |
|                 +---------------------+                     |
|                 | Authorization       |                     |
|                 | Capability Policy   |                     |
|                 +----------+----------+                     |
|                            |                                |
|                            v                                |
|                 +---------------------+                     |
|                 | Safety Gate         |                     |
|                 | Mode / State /      |                     |
|                 | Interlock Checks    |                     |
|                 +----------+----------+                     |
|                            |                                |
|                            v                                |
|                 +---------------------+                     |
|                 | Workflow / State    |                     |
|                 | Machine Controller  |                     |
|                 +----------+----------+                     |
|                            |                                |
|                            v                                |
|                 +---------------------+                     |
|                 | Device / PLC Layer  |                     |
|                 +---------------------+                     |
|                                                             |
+-------------------------------------------------------------+
```

Bad approach:

```text
External systems directly invoke internal services.
UI has safety checks, but backend does not.
Motion APIs are reachable from integration code.
Roles are broad: Admin can do everything.
Logs only show success/failure, not command source.
```

Good approach:

```text
All external commands pass through a command gateway.
Each command has an explicit contract.
Each caller has explicit capabilities.
Safety validation is independent of caller identity.
Machine state controls command acceptance.
All command attempts are audited.
Unknown state means reject, not allow.
```

The strongest rule:

Fail closed.

If the system cannot determine whether the command is safe, valid, authorized, or traceable, it should reject the command.
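
Fail closed applies to individual signals too. A minimal sketch, assuming a hypothetical tri-state interlock reading where `None` means the value could not be read; the 0.5 s staleness limit is an arbitrary placeholder, not a recommended value:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class InterlockReading:
    value: Optional[bool]  # True = permissive OK, False = not OK, None = unknown / read failed
    read_at: float         # monotonic time of the read, in seconds


def permissive_ok(reading: Optional[InterlockReading], now: float, max_age_s: float = 0.5) -> bool:
    """Fail closed: only an explicit, fresh True counts as permission."""
    if reading is None or reading.value is None:
        return False  # never read, or communication error
    if now - reading.read_at > max_age_s:
        return False  # stale data is treated as unknown
    return reading.value is True


now = 100.0
print(permissive_ok(InterlockReading(True, read_at=99.9), now))  # True
print(permissive_ok(InterlockReading(True, read_at=98.0), now))  # False: stale
print(permissive_ok(InterlockReading(None, read_at=99.9), now))  # False: unknown
print(permissive_ok(None, now))                                  # False: never read
```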


Part 8 — Interview / Real-World Talking Points

A strong explanation:

In industrial machine software, communication security is not just about protecting APIs. It is about protecting control authority. A machine may receive commands from HMI, PLC, SCADA, MES, service tools, or remote clients, but each source must have explicit authority. Authentication tells us who is calling. Authorization tells us whether that caller may request the command. Safety gating tells us whether the machine should execute it right now. Those checks must be centralized below all communication paths, not scattered in the UI or individual protocol handlers.

Another strong version:

I would never allow external systems to call device or motion APIs directly. I would route all external commands through a controlled boundary where we validate identity, command type, source capability, machine mode, current state, interlocks, permissives, and audit context. MES may be allowed to start an approved job, but it should not jog an axis. SCADA may acknowledge alarms, but it should not bypass safety logic. Remote service may perform diagnostics, but only in a restricted service mode with traceability.

Common mistakes:

```text
Treating authentication as enough.
Putting safety checks only in the UI.
Letting MES or SCADA call internal machine services directly.
Using one “admin” role for all machine actions.
Not separating read access from control access.
Not modeling command ownership.
Not auditing rejected commands.
Allowing external commands during ambiguous machine state.
```

What strong engineers understand:

```text
Authority is explicit.
Safety is state-dependent.
Control access is narrower than read access.
External systems should request intent, not manipulate devices.
Machine state decides whether a command is executable.
Safety checks must exist below every communication path.
Traceability is part of safety and diagnosability.
```

Final mental model:

In enterprise software, a bad command may corrupt data. In machine software, a bad command may move hardware, damage equipment, scrap material, or put people at risk. Therefore, communication boundaries are safety boundaries.
