Below is a principal-level explanation grounded in your roadmap’s communication topic and the surrounding architecture concerns: communication boundaries, request/response vs publish/subscribe, timing sensitivity, streaming behavior, and the fact that industrial systems are split across PC software, devices, controllers, and factory systems.
PART 1 — WHY COMMUNICATION MODELS MATTER
Industrial systems are not one program talking to one database. They are usually a set of cooperating participants:
- machine software on the industrial PC
- lower-level devices and controllers
- external plant systems such as PLC, SCADA, MES, historian, or host systems
The key architectural point is that these participants do not all behave the same way. Some interactions are command-oriented. Some are event-oriented. Some are continuous data flows. If you use the wrong communication model, the system may still work in a demo, but it becomes unstable in production.
A useful mental model is this:
- control usually wants explicit intent and explicit completion
- notification usually wants loose coupling and fan-out
- data movement usually wants throughput and buffering
That is why communication model choice is architectural, not just technical.
Why the wrong choice hurts
If you use the wrong model, you usually get one of these failure patterns:
- tight coupling: one component cannot move unless another responds immediately
- latency amplification: a slow device call blocks unrelated work
- missed behavior: something important happened, but nobody was listening in the right way
- overload: high-rate data is treated like ordinary messages and swamps the app
- unstable orchestration: workflow logic depends on timing assumptions that are not actually guaranteed
In machine software, this matters more because software is driving physical reality. Your Domain 1 material already frames the mindset correctly: machine behavior is long-running, asynchronous, deterministic, and safety-sensitive rather than simple call-and-return business logic.
Three simple examples
1. Sending a command to a device
“Move stage to X=125.0 Y=80.0”
This is not just “send data.” It is a control interaction with ownership, acknowledgement, execution time, and result interpretation.
2. Receiving an event from a sensor
“Door opened” or “vacuum reached” or “axis motion completed”
This is not a query result. It is something the system must react to when it happens.
3. Streaming measurement data
“Camera produces frames” or “DAQ card produces samples”
This is not a sequence of unrelated events. It is a continuous flow with timing, buffering, and downstream consumption concerns.
PART 2 — REQUEST / RESPONSE MODEL
This is the most familiar model for a software engineer.
One component sends a request. Another component processes it and returns a response.
That sounds simple, but in industrial systems there are two distinct meanings hidden inside it:
- request accepted: “I received your command”
- operation completed: “The physical action finished”
Those are not the same.
Core characteristics
- direct interaction between caller and callee
- clear ownership of who initiated the action
- easy to trace in code
- natural for commands and queries
- can be synchronous or asynchronous
- couples caller to callee availability and timing
Where it fits well
Use request/response when one component needs an explicit answer from another component.
Typical examples:
- command a device to start homing
- ask a controller for current temperature
- send recipe activation request to a subsystem
- ask MES for lot metadata
- request current alarm list from machine service
UML-style interaction diagram
+----------------+        request         +------------------+
| Workflow Engine| ---------------------> | Device Adapter   |
+----------------+                        +------------------+
        |                                          |
        |                                          | execute against device
        |                                          v
        |                                 +------------------+
        |                                 | Device/Controller|
        |                                 +------------------+
        |                                          |
        |            response / accepted           |
        | <----------------------------------------|
        |
        | later, operation may still be running
This diagram highlights the first important real-world truth: the response may only mean accepted, not physically done.
A more realistic pattern is this:
+----------------+       MoveTo(P1)       +------------------+
| Workflow Engine| ---------------------> | Motion Service   |
+----------------+                        +------------------+
        |                                          |
        |<--------- CommandAccepted ---------------|
        |
        | waits for completion elsewhere
        v
next workflow state depends on later signal
Strengths
- simple mental model
- clear control flow
- good for actions with explicit ownership
- easy for validation and authorization boundaries
- works well for operator commands and subsystem APIs
Weaknesses
- easy to over-block the system
- caller often assumes too much about timing
- poor fit for fan-out notifications
- poor fit for continuous high-rate data
- tends to create hidden serial dependencies if overused
Principal-level caution
In machine systems, request/response often becomes dangerous when developers write code as if physical execution behaves like an in-memory method call.
Bad mental model:
MoveAxis();
TurnOnLight();
CaptureImage();
Real machine behavior is usually closer to:
Request move
Wait until safe/executing/complete
Verify final condition
Then continue
So request/response is often the initiation model, not the full coordination model.
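A minimal Python sketch of that request, wait, verify sequence, using asyncio. The names `FakeMotionService`, `request_move`, `wait_completed`, and `move_and_verify` are hypothetical and invented for illustration; they are not taken from any real motion library, and the 10 ms sleep only stands in for physical travel time.

```python
import asyncio


class FakeMotionService:
    """Hypothetical stand-in for a motion subsystem (not a real device API)."""

    def __init__(self) -> None:
        self.position = 0.0
        self._done = asyncio.Event()
        self._task = None

    async def request_move(self, target: float) -> str:
        # Returns as soon as the command is accepted, not when motion ends.
        self._done.clear()
        self._task = asyncio.create_task(self._execute(target))
        return "accepted"

    async def _execute(self, target: float) -> None:
        await asyncio.sleep(0.01)  # simulated physical travel time
        self.position = target
        self._done.set()

    async def wait_completed(self, timeout: float) -> None:
        # Raises asyncio.TimeoutError if completion never arrives in time.
        await asyncio.wait_for(self._done.wait(), timeout)


async def move_and_verify(svc: FakeMotionService, target: float) -> float:
    ack = await svc.request_move(target)        # 1. request move
    if ack != "accepted":
        raise RuntimeError("move rejected")
    await svc.wait_completed(timeout=1.0)       # 2. wait until complete
    if abs(svc.position - target) > 1e-6:       # 3. verify final condition
        raise RuntimeError("axis not at target")
    return svc.position                         # 4. then continue
```

The point of the sketch is only that "accepted" and "completed" are two separate steps in the caller's code; a real system would also need to handle faults and cancellation.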
PART 3 — PUBLISH / SUBSCRIBE (EVENT-DRIVEN)
In this model, a publisher emits an event, and one or more subscribers react.
The publisher does not need to know exactly who is listening. That is the main architectural value.
Core characteristics
- asynchronous
- decouples producer from consumers
- supports multiple listeners
- good for state change and notification
- less explicit end-to-end flow than request/response
Where it fits well
Use pub/sub when something has happened and multiple parts of the system may care.
Typical examples:
- device state changed
- recipe activated
- inspection started
- frame captured
- alarm raised/cleared
- lot completed
- safety state changed
- door opened
- axis completed move
UML-style interaction diagram
                            publishes event
                        +----------------------+
                        |    Device Adapter    |
                        +----------------------+
                                   |
                                   | AxisMotionCompleted
                                   v
        +-----------------+-----------------+------------------+
        |                 |                 |                  |
        v                 v                 v                  v
+----------------+ +----------------+ +----------------+ +----------------+
| Workflow Engine| | UI/HMI Status  | | Alarm/History  | | Data Recorder  |
+----------------+ +----------------+ +----------------+ +----------------+
This is the right shape when one occurrence matters to several consumers for different reasons.
Strengths
- loose coupling
- supports many listeners without changing publisher
- good for reactive architecture
- natural for state changes and notifications
- helps separate workflow, UI, logging, alarms, and analytics
Weaknesses
- end-to-end behavior is harder to reason about
- hidden dependencies can appear over time
- delivery timing may be nondeterministic at app level
- debugging becomes harder if event contracts are weak
- events can become noisy or semantically vague
Good event-driven use cases
A motion subsystem publishes:
MotionStarted
MotionCompleted
MotionFaulted
The workflow engine uses them for sequencing. The UI uses them for status. The historian uses them for traceability. The alarm system uses them for fault visibility.
That is a strong use of pub/sub because the same occurrence has multiple legitimate consumers.
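The fan-out described here can be sketched with a very small in-process event bus. This is an illustrative assumption, not a recommendation of a specific library: `EventBus`, `subscribe`, `publish`, and `demo` are hypothetical names, and delivery is synchronous for simplicity.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

Handler = Callable[[dict], None]


class EventBus:
    """Minimal synchronous publish/subscribe sketch (hypothetical)."""

    def __init__(self) -> None:
        self._handlers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, event_name: str, handler: Handler) -> None:
        self._handlers[event_name].append(handler)

    def publish(self, event_name: str, payload: dict) -> int:
        # The publisher does not know who listens; it only learns how many
        # handlers were invoked.
        handlers = list(self._handlers.get(event_name, []))
        for handler in handlers:
            handler(payload)
        return len(handlers)


def demo() -> dict:
    bus = EventBus()
    seen = {"workflow": [], "ui": [], "history": []}
    # Three independent consumers of the same occurrence.
    bus.subscribe("MotionCompleted", lambda e: seen["workflow"].append(e["axis"]))
    bus.subscribe("MotionCompleted", lambda e: seen["ui"].append(f"{e['axis']} idle"))
    bus.subscribe("MotionCompleted", lambda e: seen["history"].append(dict(e)))
    bus.publish("MotionCompleted", {"axis": "X", "position": 125.0})
    return seen
```

Adding a fourth consumer means one more `subscribe` call and no change to the motion subsystem, which is the decoupling the text describes.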
Bad event-driven use case
A workflow step publishes “PleaseStartInspection” and hopes someone somewhere performs the actual critical action.
That is often weak architecture for control. Critical control intent usually needs a clear owner, not anonymous hope.
Principal-level rule
Use events to say what happened, not to vaguely delegate responsibility for who should do critical work.
Request/response is usually better for “do this.” Pub/sub is usually better for “this happened.”
PART 4 — STREAMING MODEL
Streaming is for continuous data flow over time.
This is not just “many messages.” Streaming has different design pressure:
- data arrives continuously
- consumers may process slower than producers
- timing and ordering matter
- buffering matters
- backpressure matters
- dropping strategy may matter
Core characteristics
- continuous producer-to-consumer flow
- high throughput
- often time-sensitive
- often multi-stage processing pipeline
- requires lifecycle control, buffering, and rate management
Where it fits well
Use streaming for continuous or bursty data production.
Typical examples:
- image acquisition frames
- high-rate sensor samples
- encoder position feed
- inspection results pipeline
- waveform collection
- live telemetry trend feed
UML-style flow diagram
+---------------+     frames      +---------------+    results     +---------------+
| Camera / DAQ  | --------------> | Processing    | -------------> | UI / Storage  |
| Producer      |                 | Pipeline      |                | / Analytics   |
+---------------+                 +---------------+                +---------------+
        |                                  |
        |                                  |
        +-------- continuous flow ---------+
A more realistic version includes buffers:
+-----------+   +---------+   +---------+   +-----------+
| Producer  |-->| Buffer  |-->| Worker  |-->| Consumer  |
+-----------+   +---------+   +---------+   +-----------+
                     ^
                     |
              absorbs bursts
Strengths
- appropriate for high-volume or continuous data
- separates production rate from consumption rate
- supports pipeline architectures well
- natural for image and measurement systems
- enables staged processing and parallelism
Weaknesses
- significantly more complex than simple messaging
- easy to overrun memory or queues
- consumers may lag
- ordering and timing assumptions become tricky
- failure handling is subtle even inside a single process (a stalled stage, a half-filled buffer), before any delivery-reliability concerns arise
Principal-level distinction
A stream is not just a chatty event bus.
If a camera produces 120 frames per second, treating each frame as a generic business-style event often creates architectural pain:
- UI subscribes directly and freezes
- logging layer accidentally touches every frame
- workflow layer gets flooded with data it does not need
- backlog grows silently
Streaming requires explicit pipeline design, not casual event subscription.
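One small building block of such a pipeline is a bounded buffer with an explicit dropping policy. The sketch below assumes a drop-oldest policy, which is only one possible choice (blocking the producer or dropping the newest item are others); `DropOldestBuffer` is a hypothetical name used for illustration.

```python
from collections import deque
from threading import Lock
from typing import Deque, Generic, Optional, TypeVar

T = TypeVar("T")


class DropOldestBuffer(Generic[T]):
    """Bounded buffer that evicts the oldest item when full (assumed policy)."""

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self._items: Deque[T] = deque(maxlen=capacity)
        self._lock = Lock()
        self.dropped = 0  # makes backlog visible instead of silent

    def put(self, item: T) -> None:
        with self._lock:
            if len(self._items) == self._items.maxlen:
                self.dropped += 1  # deque(maxlen=...) evicts the oldest on append
            self._items.append(item)

    def get(self) -> Optional[T]:
        with self._lock:
            return self._items.popleft() if self._items else None
```

For example, pushing five frame ids into a buffer of capacity 3 leaves ids 2, 3, 4 in the buffer and a `dropped` count of 2, so memory stays bounded and the loss is countable.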
PART 5 — COMPARING MODELS
Here is the practical comparison.
+-------------------+-------------------+-------------------+-------------------+
| Dimension         | Request/Response  | Publish/Subscribe | Streaming         |
+-------------------+-------------------+-------------------+-------------------+
| Primary purpose   | Control / query   | Notification      | Continuous data   |
| Initiator         | Caller            | Publisher         | Producer          |
| Receivers         | Usually one       | One or many       | One or pipeline   |
| Coupling          | Higher            | Lower             | Medium            |
| Control flow      | Explicit          | Indirect          | Flow-oriented     |
| Timing model      | Immediate/awaited | Asynchronous      | Ongoing           |
| Data volume       | Low to moderate   | Low to moderate   | Moderate to high  |
| Best for          | Commands, queries | State/events      | Frames, samples   |
| Common risk       | Blocking          | Hidden dependency | Overload/backlog  |
+-------------------+-------------------+-------------------+-------------------+
Another way to compare them:
CONTROL INTENT
Request/Response >>> strongest
REACTION / FAN-OUT
Publish/Subscribe >>> strongest
CONTINUOUS DATA MOVEMENT
Streaming >>> strongest
Architectural trade-off summary
Request/response
You gain clarity. You pay with tighter coupling.
Publish/subscribe
You gain flexibility and fan-out. You pay with harder reasoning and debugging.
Streaming
You gain throughput and pipeline structure. You pay with lifecycle, buffering, and load-management complexity.
PART 6 — MIXING MODELS IN REAL SYSTEMS
Real industrial systems almost never use one model only.
They mix models because the machine itself has different kinds of interactions.
Typical mixed model architecture
+--------------------+
| Operator UI / HMI  |
+--------------------+
          |
          | request/response
          v
+--------------------+
| Workflow / App     |
| Orchestrator       |
+--------------------+
     |         |
     |         +-----------------------------+
     |                                       |
     | request/response                      | publish events
     v                                       v
+--------------------+              +--------------------+
| Device Services    | -----------> | Event Bus / Event  |
| (motion, camera)   |              | Distribution       |
+--------------------+              +--------------------+
     |                                       |
     |                                       +--> UI status
     |                                       +--> Alarm/history
     |                                       +--> Workflow reactions
     |
     | streaming
     v
+--------------------+
| Data Pipeline      |
| frames/samples     |
+--------------------+
Real example
In a wafer inspection machine:
operator presses Start Inspection -> request/response from UI to workflow
workflow tells motion subsystem move to alignment point -> request/response
motion subsystem emits MotionCompleted -> publish/subscribe
camera starts producing images -> streaming
processing pipeline produces defect results -> stream or batched event output
machine reports lot completion to MES -> request/response or transactional external interface
That is normal. It would actually be suspicious if the whole machine used one interaction style everywhere.
Why mixing is necessary
Because system interactions are fundamentally different:
- commands need ownership
- events need decoupling
- data needs flow control
Trying to flatten all three into one model usually creates awkward code and operational problems.
Challenges when combining them
This is where architecture quality shows up.
Common challenges:
- workflow starts with a request but completes on an event
- an event starts a stream that later produces summarized events
- UI developers accidentally consume raw streams instead of derived state
- control logic becomes split across requests, events, and stream callbacks in confusing ways
The solution is not “pick one.” The solution is to define clear boundaries for each model.
PART 7 — REAL-WORLD FAILURE SCENARIOS
1. Using request/response where event model was needed
What it looks like
The workflow repeatedly polls a device for state:
Are you done?
Are you done?
Are you done?
Why it happens
Developers are more comfortable with explicit calls than reactive designs.
Why it fails
- unnecessary latency
- extra load
- race windows between polls
- awkward workflow logic
- delayed reaction to important changes
How engineers fix it
Use request/response to issue the command, then use event-driven completion or state-change notification for the ongoing coordination.
2. Missed event leads to stuck workflow
What it looks like
A workflow waits forever for “vacuum reached” or “move completed,” but the event was emitted before the workflow subscribed, or it was not handled reliably inside the process.
Why it happens
The design assumes event timing is magically aligned with subscription timing.
Why it fails
The software state says “waiting,” while the machine may already have moved on physically.
How engineers fix it
They design state and event coordination intentionally:
- subscribe before action when needed
- reconcile event-driven reactions with current state snapshots
- do not model critical progress purely as blind transient notifications
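The first two points can be sketched together: subscribe first, then check the current state snapshot, and only then wait for the event. `VacuumChamber` and `wait_for_state` are hypothetical names for this illustration, and the threading-based design is an assumption rather than a prescribed implementation.

```python
import threading


class VacuumChamber:
    """Hypothetical device facade exposing a state snapshot and change events."""

    def __init__(self) -> None:
        self._state = "venting"
        self._listeners = []
        self._lock = threading.Lock()

    def current_state(self) -> str:
        with self._lock:
            return self._state

    def subscribe(self, callback) -> None:
        with self._lock:
            self._listeners.append(callback)

    def unsubscribe(self, callback) -> None:
        with self._lock:
            self._listeners.remove(callback)

    def set_state(self, state: str) -> None:
        with self._lock:
            self._state = state
            listeners = list(self._listeners)
        for callback in listeners:  # notify outside the lock
            callback(state)


def wait_for_state(device: VacuumChamber, wanted: str, timeout: float) -> bool:
    reached = threading.Event()

    def on_change(state: str) -> None:
        if state == wanted:
            reached.set()

    device.subscribe(on_change)                # 1. subscribe before checking
    try:
        if device.current_state() == wanted:   # 2. reconcile with the snapshot
            return True
        return reached.wait(timeout)           # 3. otherwise wait for the event
    finally:
        device.unsubscribe(on_change)
```

Because the subscription is in place before the snapshot is read, a state change that lands between the two steps is still caught, and a change that happened earlier is caught by the snapshot. The timeout turns "waits forever" into a detectable failure.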
3. Streaming data overwhelms the system
What it looks like
Frame queues grow, memory rises, UI lags, results arrive late, and the operator sees stale information.
Why it happens
The architecture treats the stream like ordinary app messaging. No explicit buffering or rate-control thinking was done.
Why it fails
Producer rate exceeds consumer rate.
How engineers fix it
They redesign the pipeline:
- isolate acquisition from processing
- introduce bounded queues or staged buffers
- reduce unnecessary subscribers
- publish derived summaries to UI instead of raw high-rate data
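The last point, publishing derived summaries instead of raw data, might look like the following sketch. `TrendPoint` and `summarize` are hypothetical names, and the choice of mean and peak per window is an assumption; a real UI feed would pick whatever statistics the operator actually needs.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List


@dataclass(frozen=True)
class TrendPoint:
    first_index: int  # index of the first raw sample in this window
    count: int        # number of raw samples summarized
    mean: float
    peak: float


def summarize(samples: Iterable[float], window: int) -> Iterator[TrendPoint]:
    """Collapse each run of `window` raw samples into one UI-sized point."""
    if window <= 0:
        raise ValueError("window must be positive")
    chunk: List[float] = []
    start = 0
    for index, value in enumerate(samples):
        if not chunk:
            start = index
        chunk.append(value)
        if len(chunk) == window:
            yield TrendPoint(start, len(chunk), sum(chunk) / len(chunk), max(chunk))
            chunk = []
    if chunk:  # flush a partial window at the end of the stream
        yield TrendPoint(start, len(chunk), sum(chunk) / len(chunk), max(chunk))
```

With `window=2`, the five samples `[1.0, 3.0, 2.0, 6.0, 5.0]` become three points, so the UI handles a fraction of the raw rate while still seeing the peaks.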
4. Pub/sub causes hidden dependencies
What it looks like
A subsystem emits a vague event like InspectionStarted, and over time five unrelated components begin depending on that event’s exact timing and content.
Why it happens
Events are easy to add, so teams use them as shortcuts.
Why it fails
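The publisher becomes the root of a hidden dependency graph that nobody fully owns, so a change to the event's timing or content can break consumers the publishing team does not know about.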
How engineers fix it
They tighten event contracts:
- publish meaningful domain events
- document event semantics clearly
- avoid using events as hidden command channels
- keep critical control actions owned by explicit services
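One way to make an event contract explicit is to give the event a typed, immutable shape with its semantics written down next to it. The sketch below is illustrative only: the fields `lot_id`, `recipe_id`, and `occurred_at`, and the stated emission rules, are assumptions made for the example and are not taken from any real system.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class InspectionStarted:
    """Fact: the inspection sequence for one lot has begun.

    Semantics assumed for this sketch: emitted once per lot, after the
    recipe is active and before the first frame is acquired. It is a
    notification, not a request for any subscriber to perform work.
    """

    lot_id: str
    recipe_id: str
    occurred_at: datetime

    def __post_init__(self) -> None:
        if not self.lot_id or not self.recipe_id:
            raise ValueError("lot_id and recipe_id are required")
        if self.occurred_at.tzinfo is None:
            raise ValueError("occurred_at must be timezone-aware")

    @classmethod
    def now(cls, lot_id: str, recipe_id: str) -> "InspectionStarted":
        # Convenience constructor that stamps the event in UTC.
        return cls(lot_id, recipe_id, datetime.now(timezone.utc))
```

A frozen dataclass means subscribers cannot mutate a shared event, and the docstring gives consumers one place to learn what the event does and does not promise.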
5. Blocking calls create latency chains
What it looks like
A UI action calls workflow, workflow calls controller, controller calls device, device is slow, and the whole path stalls. Operator interaction feels frozen. Alarms display late.
Why it happens
The entire architecture assumes the caller should wait synchronously for downstream work.
Why it fails
Long physical operations propagate waiting back into unrelated layers.
How engineers fix it
They separate:
- command acceptance
- operation progress
- completion notification
In other words, they stop confusing “request sent” with “work finished.”
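That separation can be made concrete by handing the caller a small handle whose state moves through acceptance, progress, and completion. This is a hedged sketch: `CommandState`, `CommandHandle`, and the allowed transitions are hypothetical choices for illustration, not a standard lifecycle.

```python
from enum import Enum


class CommandState(Enum):
    ACCEPTED = "accepted"
    RUNNING = "running"
    COMPLETED = "completed"
    FAULTED = "faulted"


# Allowed transitions (an assumption made for this sketch).
_ALLOWED = {
    CommandState.ACCEPTED: {CommandState.RUNNING, CommandState.FAULTED},
    CommandState.RUNNING: {CommandState.COMPLETED, CommandState.FAULTED},
    CommandState.COMPLETED: set(),
    CommandState.FAULTED: set(),
}


class CommandHandle:
    """Returned immediately on acceptance; updated as the operation proceeds."""

    def __init__(self, command_id: str) -> None:
        self.command_id = command_id
        self.state = CommandState.ACCEPTED
        self.progress = 0.0
        self.history = [CommandState.ACCEPTED]

    def _move_to(self, new_state: CommandState) -> None:
        if new_state not in _ALLOWED[self.state]:
            raise ValueError(f"{self.state.name} -> {new_state.name} not allowed")
        self.state = new_state
        self.history.append(new_state)

    def report_progress(self, fraction: float) -> None:
        if self.state is CommandState.ACCEPTED:
            self._move_to(CommandState.RUNNING)
        if self.state is not CommandState.RUNNING:
            raise ValueError("progress reported on a finished command")
        self.progress = min(max(fraction, 0.0), 1.0)  # clamp to [0, 1]

    def complete(self) -> None:
        self._move_to(CommandState.COMPLETED)  # only valid from RUNNING
        self.progress = 1.0

    def fault(self) -> None:
        self._move_to(CommandState.FAULTED)

    @property
    def is_finished(self) -> bool:
        return self.state in (CommandState.COMPLETED, CommandState.FAULTED)
```

The UI can return to the operator as soon as it holds a handle in the ACCEPTED state, and render progress or completion later from the same object, so no layer has to block for the duration of the physical operation.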
PART 8 — SOFTWARE DESIGN IMPLICATIONS
Communication model is not just an integration detail. It shapes the whole software structure.
1. Choose model by interaction semantics
Ask this first:
- Is this an explicit command?
- Is this a state/event notification?
- Is this a continuous data flow?
That question should come before technology choice.
2. Make boundaries explicit
A strong architecture has explicit communication boundaries such as:
- workflow issues commands
- device services publish state changes
- acquisition publishes streams into processing pipelines
- UI consumes derived state, not raw hardware chatter
That prevents accidental coupling.
3. Keep models consistent inside a boundary
A common design smell is using multiple styles inconsistently for the same concept.
Bad:
- sometimes motion completion is returned directly
- sometimes it is an event
- sometimes UI checks a flag
- sometimes workflow polls
Good:
- command submission has one clear path
- completion has one clear path
- live data has one clear path
Consistency reduces mental load and bugs.
4. Separate control plane from data plane
This is a very strong industrial-software concept.
- control plane: commands, acknowledgements, lifecycle, state changes
- data plane: frames, measurements, high-rate telemetry
If you mix them casually, control becomes unstable under data load.
5. Design for physical reality, not software elegance alone
Machine communication is about real devices with timing, state, and side effects.
Good architecture asks:
- what does the sender really need back?
- who owns progression of this action?
- what happens if data arrives faster than we process it?
- what state can be reconstructed if timing is imperfect?
That mindset is more important than any specific protocol.
Good vs bad approaches
Bad
“One event bus for everything” or “Everything is just API calls” or “Streams are just lots of events”
These are category mistakes.
Good
Use:
- request/response for explicit control and query
- pub/sub for state changes and notifications
- streaming for continuous data movement
Then define how they connect.
PART 9 — INTERVIEW / REAL-WORLD TALKING POINTS
How to explain communication models clearly
A strong explanation sounds like this:
In industrial systems, communication style must match the interaction semantics. Commands usually need explicit ownership, so request/response fits well. State changes often need loose coupling and multiple listeners, so pub/sub is better. High-rate sensor or image data behaves differently again, so it needs a streaming model with buffering and pipeline thinking.
That answer shows architectural maturity because it is based on behavior, not buzzwords.
When to use each model
Use request/response when:
- a component needs an explicit answer
- an action has a clear owner
- you want traceable command flow
- you are issuing control intent or querying state
Use publish/subscribe when:
- something happened and many consumers may care
- you want decoupling
- the producer should not know all listeners
- you are distributing state changes or notifications
Use streaming when:
- data arrives continuously
- throughput matters
- consumers may lag behind producers
- buffering and staged processing are required
Common mistakes engineers make
- treating physical operations like ordinary function calls
- using polling everywhere because it feels simpler
- using events for critical commands with no clear owner
- sending high-rate data through general-purpose event mechanisms
- letting UI subscribe too close to raw device traffic
- mixing interaction styles inconsistently for the same subsystem
- failing to distinguish command acceptance from actual completion
What strong engineers understand
Strong engineers understand that communication models are about system behavior.
They know:
- coupling is created by interaction style, not just dependencies in code
- latency behavior is shaped by communication choice
- workflows usually cross multiple models, not one
- the hardest bugs often come from mismatched assumptions between model and reality
- industrial software must align communication semantics with timing, safety, and operational behavior
Final summary
In industrial machine software, communication models are one of the core architectural decisions.
- Request/response is best for explicit control and query.
- Publish/subscribe is best for notification and decoupled reaction.
- Streaming is best for continuous data flow.
The mistake is not choosing one over the others. The mistake is using one blindly everywhere.
Strong machine software uses each model intentionally:
- control actions are explicit
- events communicate what happened
- streams move high-rate data through controlled pipelines
That is how you keep machine software understandable, scalable, and stable under real production behavior rather than just under ideal demo conditions. This fits the roadmap’s emphasis on boundaries, timing-sensitive behavior, command/event design, streaming pipelines, and the communication split between PC software, devices, controllers, and external factory systems.
If you want, I can do the next pass as a machine-specific example architecture for a wafer inspection system showing exactly which subsystem uses which communication model.