Communication Security & Safety Boundaries
This topic sits between industrial communication, safety/interlocks, PLC/controller interaction, and cybersecurity for industrial systems. Your roadmap already highlights that machine software must respect interlocks, permissives, safe motion gating, safety-related UI restrictions, traceability, controlled access, audit logging, and least privilege.
Part 1 — Why Communication Boundaries Are Safety-Critical
In normal enterprise software, a communication boundary usually protects data.
In industrial machine software, a communication boundary also protects:
- people
- equipment
- product quality
- production state
- machine integrity
- traceability
An industrial machine may receive commands from many places:
Operator HMI
PLC / safety controller
SCADA
MES
service tool
remote support client
third-party integration
The key point is:
Just because a system can communicate with the machine does not mean it should be allowed to control the machine.
Example:
MES may start a production job.
MES should not directly jog an axis.
SCADA may acknowledge or display alarms.
SCADA should not bypass interlocks.
Remote service may run diagnostics.
Remote service should not move hardware unless the machine is in a controlled service mode.
This is why communication security in machine systems is not only about “is the request authenticated?” It is also about:
Is this caller allowed to request this action?
Is this action valid for this machine mode?
Is this action safe right now?
Can we trace who requested it?
Can the system fail closed if uncertain?
Part 2 — Trust Levels & Control Authority
Different systems have different authority.
A local machine controller is usually more trusted than a factory-level MES. A PLC or safety controller may own safety-critical permissives. A remote service tool may be powerful, but only under restricted conditions.
A practical trust model may look like this:
+-------------------------------------------------------------+
|                    MACHINE TRUSTED ZONE                     |
|                                                             |
|  +-------------------+       +---------------------------+  |
|  | Operator HMI      |       | Machine Controller        |  |
|  | Local control     |-----> | Workflow / State Machine  |  |
|  +-------------------+       +-------------+-------------+  |
|                                            |                |
|                                            v                |
|                              +---------------------------+  |
|                              | Device / Motion Layer     |  |
|                              +-------------+-------------+  |
|                                            |                |
|                                            v                |
|                              +---------------------------+  |
|                              | PLC / Safety Controller   |  |
|                              | Interlocks / Permissives  |  |
|                              +---------------------------+  |
|                                                             |
+-----------------------------^-------------------------------+
                              |
                              |
+-----------------------------+-------------------------------+
|              CONTROLLED COMMUNICATION BOUNDARY              |
+-----------------------------+-------------------------------+
                              |
        +---------------------+----------------------+
        |                     |                      |
        v                     v                      v
+---------------+     +---------------+      +----------------+
|      MES      |     |     SCADA     |      | Remote Service |
|  production   |     |  supervision  |      |   restricted   |
+---------------+     +---------------+      +----------------+
+-------------------------------------------------------------+
|                LESS TRUSTED / EXTERNAL ZONE                 |
|      third-party tools, factory clients, integrations       |
+-------------------------------------------------------------+
The design principle is:
Control authority must be explicit, not accidental.
Bad design allows any connected client to call internal APIs.
Good design says:
MES can:
- download job
- start approved recipe
- receive results
- query machine state
MES cannot:
- jog axis
- disable interlock
- force sensor state
- directly reset safety fault
Part 3 — Command Authorization vs Command Safety
These two are different.
Authorization asks:
Is this caller allowed to request this command?
Safety gating asks:
Is this command safe and valid right now?
Both are required.
Example 1:
Caller: authenticated service engineer
Command: move X axis
Authorization: allowed
Machine state: guard door open
Result: reject
Reason: unsafe right now
Example 2:
Caller: authenticated MES
Command: start production job
Authorization: allowed
Machine state: auto mode, idle, recipe validated
Result: accept
Example 3:
Caller: authenticated MES
Command: jog Z axis
Authorization: rejected
Reason: MES has no manual motion authority
A common mistake from enterprise software engineers is to think:
“The user is authenticated and has the role, so the command is valid.”
In machine software, that is not enough.
A command can be authorized but still unsafe.
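The three examples above can be sketched as two separate checks that are evaluated in sequence. This is a minimal illustration, not a reference implementation: the capability table, the `MachineState` fields, and the rule that a closed guard door is sufficient for `move_axis` are all assumptions made for the example.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    ACCEPT = "accept"
    REJECT_UNAUTHORIZED = "reject: caller has no authority for this command"
    REJECT_UNSAFE = "reject: unsafe right now"


# Hypothetical capability table: source system -> commands it may request.
CAPABILITIES = {
    "mes": {"start_job"},
    "service_engineer": {"move_axis", "run_diagnostic"},
}


@dataclass(frozen=True)
class MachineState:
    guard_door_closed: bool
    mode: str  # illustrative values: "auto", "service"
    idle: bool
    recipe_validated: bool


def is_authorized(caller: str, command: str) -> bool:
    """Authorization: may this caller request this command at all?"""
    return command in CAPABILITIES.get(caller, set())


def is_safe_now(command: str, state: MachineState) -> bool:
    """Safety gating: the caller is deliberately not an input here."""
    if command == "move_axis":
        return state.guard_door_closed
    if command == "start_job":
        return (
            state.guard_door_closed
            and state.mode == "auto"
            and state.idle
            and state.recipe_validated
        )
    return False  # unknown command: reject rather than guess


def evaluate(caller: str, command: str, state: MachineState) -> Decision:
    if not is_authorized(caller, command):
        return Decision.REJECT_UNAUTHORIZED
    if not is_safe_now(command, state):
        return Decision.REJECT_UNSAFE
    return Decision.ACCEPT


door_open = MachineState(guard_door_closed=False, mode="service",
                         idle=True, recipe_validated=False)
ready = MachineState(guard_door_closed=True, mode="auto",
                     idle=True, recipe_validated=True)

print(evaluate("service_engineer", "move_axis", door_open).value)  # Example 1
print(evaluate("mes", "start_job", ready).value)                   # Example 2
print(evaluate("mes", "jog_axis", ready).value)                    # Example 3
```

Keeping `is_safe_now` free of any caller argument makes it structurally impossible for a privileged role to change the safety outcome.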
Part 4 — Safe Command Gating Across Communication Boundaries
External systems should never directly call motion APIs, device APIs, or low-level controller methods.
Bad:
MES ---> MotionController.MoveAxis()
Good:
MES
|
v
API / Protocol Boundary
|
v
Authentication / Identity
|
v
Authorization / Capability Check
|
v
Command Contract Validation
|
v
Machine Mode / State Validation
|
v
Interlock / Permissive Check
|
v
Audit Log
|
v
Machine Controller
|
v
Device / Motion / PLC Layer
ASCII command flow:
    +------------------+
    | External System  |
    | MES / SCADA /    |
    | Service Tool     |
    +--------+---------+
             |
             v
+---------------------------+
| API / Protocol Boundary   |
| - parse message           |
| - validate schema         |
| - normalize command       |
+------------+--------------+
             |
             v
+---------------------------+
| Authorization Layer       |
| - identity                |
| - role / capability       |
| - source system policy    |
+------------+--------------+
             |
             v
+---------------------------+
| Safety Gate               |
| - machine mode            |
| - current state           |
| - interlocks              |
| - permissives             |
| - active alarms           |
+------------+--------------+
             |
             v
+---------------------------+
| Command Orchestrator      |
| - sequence ownership      |
| - state transition        |
| - cancellation handling   |
+------------+--------------+
             |
             v
+---------------------------+
| Machine Controller        |
| - motion                  |
| - devices                 |
| - PLC handshake           |
+---------------------------+
The safety gate must be independent of caller identity.
Even a highly privileged caller should not bypass:
guard door state
E-stop state
axis homing status
soft limits
machine mode
active faults
recipe validity
ownership of current operation
This aligns with the roadmap principle that motion and machine actions must be validated, state-driven, safe, and deterministic.
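A caller-independent safety gate might be sketched as follows. The `SafetySnapshot` fields and the millimetre soft-limit parameters are hypothetical names chosen for illustration; a real machine would read these values from its PLC or safety controller. A value of `None` stands for "could not be read" and is treated the same as a failed check.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SafetySnapshot:
    # None means the value could not be determined; it never counts as safe.
    guard_door_closed: Optional[bool]
    estop_released: Optional[bool]
    axes_homed: Optional[bool]
    no_active_faults: Optional[bool]


def safety_gate(snapshot: SafetySnapshot) -> list[str]:
    """Return the names of failed checks. An empty list means the gate passes.

    There is no caller or role parameter, so privilege cannot bypass the gate.
    """
    return [name for name, value in vars(snapshot).items() if value is not True]


def gate_axis_move(snapshot: SafetySnapshot, target_mm: float,
                   soft_min_mm: float, soft_max_mm: float) -> list[str]:
    """Safety gate for a move request, including a soft-limit check."""
    failed = safety_gate(snapshot)
    if not (soft_min_mm <= target_mm <= soft_max_mm):
        failed.append("soft_limits")
    return failed


all_ok = SafetySnapshot(True, True, True, True)
door_open = SafetySnapshot(False, True, True, True)
homing_unknown = SafetySnapshot(True, True, None, True)

print(gate_axis_move(all_ok, 50.0, 0.0, 100.0))
print(gate_axis_move(door_open, 150.0, 0.0, 100.0))
print(safety_gate(homing_unknown))
```

Returning the list of failed checks, rather than a bare boolean, gives the audit log and the operator a concrete rejection reason.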
Part 5 — Read Access vs Control Access
Many systems need read access.
Examples:
machine status
current job
alarms
event history
inspection results
throughput metrics
device health
production counters
Fewer systems should have control access.
Examples:
start job
stop job
pause / resume
reset fault
change recipe
enter maintenance mode
move axis
trigger calibration
override configuration
These should be separated at the contract level, not only by UI hiding.
Bad:
IMachineApi
    GetStatus()
    GetAlarms()
    StartJob()
    MoveAxis()
    DisableInterlock()
Better:
IMachineReadApi
    GetStatus()
    GetAlarms()
    GetProductionData()
    GetResults()
IMachineControlApi
    StartJob()
    StopJob()
    ResetFault()
IServiceControlApi
    JogAxis()
    RunDiagnostic()
    Calibrate()
ISafetyRestrictedApi
    Not exposed to external clients
The important idea:
Read access should not accidentally imply control authority.
A historian, MES, dashboard, or analytics client should not receive write-capable credentials just because it needs production data.
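The contract split above can be sketched in Python with `typing.Protocol`. The class and method names below are illustrative assumptions that mirror the interface names in the listing, not an existing API. The `ReadOnlyView` wrapper shows one way to hand a monitoring client an object that has no control methods at all.

```python
from typing import Protocol


class MachineReadApi(Protocol):
    def get_status(self) -> str: ...
    def get_alarms(self) -> list[str]: ...


class MachineControlApi(Protocol):
    def start_job(self, job_id: str) -> None: ...
    def stop_job(self) -> None: ...


class Machine:
    """Hypothetical machine facade that satisfies both contracts internally."""

    def __init__(self) -> None:
        self._status = "idle"
        self._alarms: list[str] = []

    def get_status(self) -> str:
        return self._status

    def get_alarms(self) -> list[str]:
        return list(self._alarms)

    def start_job(self, job_id: str) -> None:
        self._status = f"running:{job_id}"

    def stop_job(self) -> None:
        self._status = "idle"


class ReadOnlyView:
    """Exposes only the read contract; there are no control methods to call."""

    def __init__(self, machine: MachineReadApi) -> None:
        self._machine = machine

    def get_status(self) -> str:
        return self._machine.get_status()

    def get_alarms(self) -> list[str]:
        return self._machine.get_alarms()


machine = Machine()
dashboard = ReadOnlyView(machine)   # what a historian or dashboard receives
machine.start_job("J-1")            # control stays with an authorized path
print(dashboard.get_status())
print(hasattr(dashboard, "start_job"))
```

An in-process wrapper like this is not a security boundary by itself, since Python does not enforce private attributes. In a deployed system the same split would also need separate credentials, roles, and network or API exposure, as described in the text.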
Part 6 — Real-World Failure Scenarios
1. External system sends command in the wrong machine state
Production symptom:
MES sends StartJob while the machine is still recovering from a fault.
Machine accepts the command.
Workflow enters inconsistent state.
Operator sees confusing alarms.
Why it happens:
The API checked authentication but did not check machine state.
Prevention:
Use explicit state validation:
Idle + AutoMode + RecipeLoaded + NoCriticalAlarm + SafetyReady
2. Authenticated integration bypasses local safety logic
Production symptom:
A service tool calls a low-level motion method directly.
Axis moves even though the operator screen would have disabled the button.
Why it happens:
Safety logic existed only in the UI.
Backend APIs were not gated.
Prevention:
Never put safety gating only in the UI.
Centralize command validation below all clients.
3. Service tool leaves machine in unsafe mode
Production symptom:
Engineer enables maintenance mode.
Runs diagnostics.
Walks away.
Next operator starts operation while service flags are still active.
Why it happens:
Service mode has no timeout, ownership, or exit validation.
Prevention:
Require explicit service session ownership.
Add timeout.
Show visible machine state.
Block production start until service mode is cleared.
Audit who entered and exited service mode.
4. SCADA/MES command conflicts with local operator action
Production symptom:
Operator presses Stop locally.
MES immediately sends Start again.
Machine oscillates between stopping and starting.
Why it happens:
No control ownership model.
The system treats all commands equally.
Prevention:
Introduce command ownership:
- local operator has priority during manual intervention
- MES controls only in auto production mode
- stop/abort commands may have higher priority than start
5. Remote command cannot be traced
Production symptom:
Machine changed recipe at 02:13.
No one knows whether it was operator, MES, service engineer, or script.
Why it happens:
Logs record “RecipeChanged” but not caller identity, source, command ID, or reason.
Prevention:
Audit every external command:
- timestamp
- source system
- authenticated identity
- command type
- parameters
- machine state before/after
- accepted/rejected reason
6. Read-only integration accidentally gains write capability
Production symptom:
A dashboard client intended for monitoring can reset alarms or change parameters.
Why it happens:
Same API key or role is reused for read and control operations.
Prevention:
Separate read contracts, write contracts, credentials, roles, and network/API exposure.
Part 7 — Software Design Implications
Communication boundaries must be first-class architecture concepts.
They are not just controllers, endpoints, sockets, or protocol handlers.
They define:
Who can talk to the machine?
What can they ask for?
When is it valid?
Who owns control?
What must be logged?
What happens if validation is uncertain?
Good component design:
+-------------------------------------------------------------+
|                     Machine Application                     |
|                                                             |
|  +-------------------+       +----------------------------+ |
|  | Operator HMI      |       | External API Boundary      | |
|  | Local commands    |       | MES / SCADA / Service      | |
|  +---------+---------+       +-------------+--------------+ |
|            |                               |                |
|            +---------------+---------------+                |
|                            v                                |
|                 +---------------------+                     |
|                 | Command Gateway     |                     |
|                 | - command contract  |                     |
|                 | - source policy     |                     |
|                 | - audit correlation |                     |
|                 +----------+----------+                     |
|                            |                                |
|                            v                                |
|                 +---------------------+                     |
|                 | Authorization       |                     |
|                 | Capability Policy   |                     |
|                 +----------+----------+                     |
|                            |                                |
|                            v                                |
|                 +---------------------+                     |
|                 | Safety Gate         |                     |
|                 | Mode / State /      |                     |
|                 | Interlock Checks    |                     |
|                 +----------+----------+                     |
|                            |                                |
|                            v                                |
|                 +---------------------+                     |
|                 | Workflow / State    |                     |
|                 | Machine Controller  |                     |
|                 +----------+----------+                     |
|                            |                                |
|                            v                                |
|                 +---------------------+                     |
|                 | Device / PLC Layer  |                     |
|                 +---------------------+                     |
|                                                             |
+-------------------------------------------------------------+
Bad approach:
External systems directly invoke internal services.
UI has safety checks, but backend does not.
Motion APIs are reachable from integration code.
Roles are broad: Admin can do everything.
Logs only show success/failure, not command source.
Good approach:
All external commands pass through a command gateway.
Each command has an explicit contract.
Each caller has explicit capabilities.
Safety validation is independent of caller identity.
Machine state controls command acceptance.
All command attempts are audited.
Unknown state means reject, not allow.
The strongest rule:
Fail closed.
If the system cannot determine whether the command is safe, valid, authorized, or traceable, it should reject the command.
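One way to make "fail closed" concrete is a gateway in which every check returns `True`, `False`, or `None` (cannot determine), and only an explicit `True` from every check lets a command through. The sketch below is hypothetical: `CommandGateway`, `AuditRecord`, and the check names are invented for illustration, and the audit record carries only a subset of the fields a real system would log.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable, Optional

# A check receives (source, command) and returns True, False, or None.
Check = Callable[[str, str], Optional[bool]]


@dataclass(frozen=True)
class AuditRecord:
    timestamp: str
    source: str
    identity: str
    command: str
    accepted: bool
    reason: str


@dataclass
class CommandGateway:
    checks: dict[str, Check]
    audit_log: list[AuditRecord] = field(default_factory=list)

    def submit(self, source: str, identity: str, command: str) -> bool:
        accepted, reason = True, "all checks passed"
        for name, check in self.checks.items():
            try:
                result = check(source, command)
            except Exception:
                result = None  # a crashing check must never allow a command
            if result is not True:
                outcome = "failed" if result is False else "undetermined"
                accepted, reason = False, f"{name}: {outcome}"
                break
        # Accepted and rejected attempts are both recorded.
        self.audit_log.append(AuditRecord(
            datetime.now(timezone.utc).isoformat(),
            source, identity, command, accepted, reason))
        return accepted


def mes_may_start(source: str, command: str) -> bool:
    return source == "mes" and command == "start_job"


def state_unreadable(source: str, command: str) -> Optional[bool]:
    return None  # simulates a lost connection to the safety controller


gateway = CommandGateway(checks={"authorization": mes_may_start,
                                 "safety": state_unreadable})
print(gateway.submit("mes", "line-3-mes", "start_job"))
print(gateway.audit_log[-1].reason)
```

Because the comparison is `result is not True`, any unexpected return value is rejected as well, which keeps the default outcome on the safe side.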
Part 8 — Interview / Real-World Talking Points
A strong explanation:
In industrial machine software, communication security is not just about protecting APIs. It is about protecting control authority. A machine may receive commands from HMI, PLC, SCADA, MES, service tools, or remote clients, but each source must have explicit authority. Authentication tells us who is calling. Authorization tells us whether that caller may request the command. Safety gating tells us whether the machine should execute it right now. Those checks must be centralized below all communication paths, not scattered in the UI or individual protocol handlers.
Another strong version:
I would never allow external systems to call device or motion APIs directly. I would route all external commands through a controlled boundary where we validate identity, command type, source capability, machine mode, current state, interlocks, permissives, and audit context. MES may be allowed to start an approved job, but it should not jog an axis. SCADA may acknowledge alarms, but it should not bypass safety logic. Remote service may perform diagnostics, but only in a restricted service mode with traceability.
Common mistakes:
Treating authentication as enough.
Putting safety checks only in the UI.
Letting MES or SCADA call internal machine services directly.
Using one “admin” role for all machine actions.
Not separating read access from control access.
Not modeling command ownership.
Not auditing rejected commands.
Allowing external commands during ambiguous machine state.
What strong engineers understand:
Authority is explicit.
Safety is state-dependent.
Control access is narrower than read access.
External systems should request intent, not manipulate devices.
Machine state decides whether a command is executable.
Safety checks must exist below every communication path.
Traceability is part of safety and diagnosability.
Final mental model:
In enterprise software, a bad command may corrupt data. In machine software, a bad command may move hardware, damage equipment, scrap material, or put people at risk. Therefore, communication boundaries are safety boundaries.