Data Modeling & Semantics Across Systems
In industrial machine software, the hardest data problems are often not caused by JSON, XML, OPC UA, SQL tables, or protocol encoding.
The hardest problems happen when two systems exchange a value successfully, but interpret it differently.
That is why data semantics matter more than data format.
This topic is closely tied to the factory integration, traceability, MES/SCADA awareness, production reporting, result persistence, and time-series data concerns described in the roadmap’s data/manufacturing systems area.
PART 1 — WHY SEMANTICS MATTER MORE THAN DATA FORMAT
A data format answers:
“Can I read the value?”
Semantics answers:
“Do I understand what the value means?”
Those are not the same.
A machine may send:
Position = 100
Technically, this is valid data.
But what does 100 mean?
It could mean:
100 mm
100 µm
100 encoder counts
100 motor steps
100 percent
100 raw ADC units
100 after calibration offset
100 before calibration offset
The data is syntactically correct, but the machine behavior may be wrong.
In industrial systems, this is dangerous because software does not only display data. It may use data to:
- move hardware
- accept or reject material
- report production quality
- decide whether a machine is ready
- trigger alarms
- send results to MES
- mark a lot as completed
So a semantic mistake can become a production mistake.
Example 1 — Same value, different unit
Machine Software:
XPosition = 100 mm
MES:
XPosition = 100 µm
The value is the same.
The meaning is not.
A report may show that a wafer was inspected at a position 1000 times smaller than the real position.
Example 2 — Same status code, different meaning
PLC:
Status = 1 means Busy
Machine Software:
Status = 1 means Ready
This is worse than a communication failure.
If communication fails, you usually know something is wrong.
If communication succeeds but meaning is wrong, the system may confidently make the wrong decision.
Example 3 — Same timestamp, different interpretation
Machine:
Timestamp = acquisition time
MES:
Timestamp = result upload time
SCADA:
Timestamp = tag polling time
All three timestamps may be valid.
But they describe different moments.
For production traceability, this matters a lot. If an image defect was captured at 10:00:01 but reported at 10:00:07, the timestamp meaning affects correlation with motion, alarms, recipe version, lot state, and sensor values.
PART 2 — DATA MODELS IN DIFFERENT SYSTEMS
Different systems model the same machine at different abstraction levels.
A PLC often thinks in:
registers
bits
coils
inputs
outputs
numeric words
Machine software thinks in:
objects
states
commands
workflows
recipes
results
alarms
SCADA thinks in:
tags
signals
trends
alarms
operator-visible status
MES thinks in:
jobs
lots
batches
wafers
recipes
equipment state
production results
quality records
These models are not wrong. They serve different purposes.
The problem is that integration requires translating between them.
ASCII model comparison
+------------------+    +-----------------------+
| PLC              |    | Machine Software      |
+------------------+    +-----------------------+
| M100.0           |    | MachineState          |
| D120             |    | AxisPosition          |
| D121             |    | InspectionWorkflow    |
| X10              |    | Recipe                |
| Y20              |    | Alarm                 |
+------------------+    +-----------------------+
         |                          |
         |                          |
         v                          v
+------------------+    +-----------------------+
| SCADA            |    | MES                   |
+------------------+    +-----------------------+
| Tags             |    | Lot                   |
| Signals          |    | Job                   |
| Trends           |    | Batch                 |
| Alarm Summary    |    | Wafer Result          |
| Equipment View   |    | Production Report     |
+------------------+    +-----------------------+
The important point:
These systems may all describe the same physical machine, but they do not describe it using the same mental model.
A PLC may expose:
D120 = 12345
Machine software may interpret it as:
Stage X position = 12.345 mm
SCADA may display it as:
StageXPosition = 12.345 mm
MES may store it as:
InspectionPositionX = 12345 µm
Each layer adds meaning.
That meaning must be explicit.
PART 3 — MAPPING BETWEEN SYSTEMS
Industrial integration is mostly translation.
You translate low-level signals into machine concepts, then machine concepts into factory concepts.
Example mappings
PLC register -> Machine state object
Machine result -> MES inspection report
Sensor value -> SCADA tag
Recipe parameter -> Device configuration
Alarm bit -> Operator-readable alarm
A good mapping does not only copy data.
It defines:
source field
target field
unit
scale
allowed values
state meaning
timestamp meaning
validity condition
ownership
update frequency
ASCII mapping diagram
+-----------------------+
| PLC Raw Data          |
+-----------------------+
| D100 = 1              |
| D120 = 12345          |
| D130 = 250            |
+-----------+-----------+
            |
            | semantic mapping
            v
+-----------------------+
| Machine Domain Model  |
+-----------------------+
| State = Busy          |
| XPosition = 12.345 mm |
| Temperature = 25.0 C  |
+-----------+-----------+
            |
            | reporting mapping
            v
+----------------------------+
| MES Production Model       |
+----------------------------+
| EquipmentState = Running   |
| InspectionX_um = 12345     |
| ChamberTemp_C = 25.0       |
+----------------------------+
The most important box here is not PLC or MES.
It is the semantic mapping layer.
Without that layer, meaning gets scattered across code.
That leads to code like:
if (plc.ReadInt("D100") == 1)
{
    // ready? busy? running? depends who remembers
}
That is fragile.
A better design is:
var machineState = plcStateMapper.Map(rawPlcSnapshot);
if (machineState.ExecutionState == MachineExecutionState.Busy)
{
    // clear semantic meaning
}
The difference is not just code style.
The second design makes meaning visible, testable, reviewable, and documentable.
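A minimal sketch of what such a mapper could look like, assuming .NET 6 or later. The code table (0 = Idle, 1 = Busy, 2 = Error) is hypothetical and only for illustration; the real table must come from the PLC interface documentation. `RawPlcSnapshot`, `MachineState`, and `PlcStateMapper` are illustrative names, not an existing API.

```csharp
using System;

// The raw code is translated once, in one place.
var mapper = new PlcStateMapper();
var machineState = mapper.Map(new RawPlcSnapshot(D100: 1));
Console.WriteLine(machineState.ExecutionState); // Busy

// An unknown code is surfaced instead of being guessed.
Console.WriteLine(mapper.Map(new RawPlcSnapshot(D100: 42)).ExecutionState); // Unknown

public enum MachineExecutionState { Unknown, Idle, Busy, Error }

// Raw register values exactly as read from the PLC (no meaning attached yet).
public sealed record RawPlcSnapshot(int D100);

// Machine-level view of the same data.
public sealed record MachineState(MachineExecutionState ExecutionState);

public sealed class PlcStateMapper
{
    // Hypothetical code table; the authoritative one belongs in the PLC interface spec.
    public MachineState Map(RawPlcSnapshot raw) =>
        new(raw.D100 switch
        {
            0 => MachineExecutionState.Idle,
            1 => MachineExecutionState.Busy,
            2 => MachineExecutionState.Error,
            _ => MachineExecutionState.Unknown,
        });
}
```

Because the translation lives in one class, a unit test can pin down every code, and a reviewer can compare the switch against the PLC documentation line by line.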
PART 4 — COMMON SEMANTIC CHALLENGES
1. Unit mismatch
This is one of the most common industrial data problems.
Examples:
mm vs µm
degrees vs radians
Celsius vs Fahrenheit
raw counts vs engineering units
motor steps vs physical distance
percentage vs normalized 0.0-1.0 value
A field named Position is not enough.
A strong field name is closer to:
StageXPositionMm
StageXPositionUm
RawEncoderCount
CalibratedStageXPositionMm
In industrial software, units should be treated as part of the contract, not as comments.
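One way to make the unit part of the contract is to carry it in the type rather than in the field name. The sketch below assumes .NET 6 or later; `Length` is a hypothetical value type written for this example, not a library class.

```csharp
using System;

var machinePosition = Length.FromMillimeters(100);
var mesPosition = Length.FromMicrometers(100);

// Same number, different meaning: the type makes the difference visible.
Console.WriteLine(machinePosition.Micrometers);    // 100000
Console.WriteLine(mesPosition.Micrometers);        // 100
Console.WriteLine(machinePosition == mesPosition); // False

// A length that always knows its unit. Stored internally in micrometers.
public readonly record struct Length
{
    private readonly double _micrometers;

    private Length(double micrometers) => _micrometers = micrometers;

    // The only ways to create a Length force the caller to name the unit.
    public static Length FromMillimeters(double mm) => new(mm * 1000.0);
    public static Length FromMicrometers(double um) => new(um);

    public double Millimeters => _micrometers / 1000.0;
    public double Micrometers => _micrometers;
}
```

A bare `double` can no longer be passed where a `Length` is expected, so the "100 mm vs 100 µm" confusion has to be resolved at the boundary where the value enters the system.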
2. Scaling and conversion
PLC and device values are often scaled.
Example:
PLC D130 = 253
Meaning = 25.3 C
Scale = value / 10
Another example:
Raw analog input = 32768
Meaning = 5.0 V
Then converted to pressure = 2.5 bar
The risk is that one system applies the scale and another system applies it again.
Then 25.3 C becomes 2.53 C.
Or nobody applies the scale, and 253 is treated as 253 C.
Both are technically valid numbers.
Both are semantically wrong.
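A hedged sketch of one way to guard against double scaling: give the raw register value and the engineering value different types, so the scaling function can only be applied to raw data. It assumes .NET 6 or later and the value / 10 scale from the D130 example; all type names are hypothetical.

```csharp
using System;

var raw = new RawTemperatureTenths(253); // value exactly as read from D130
var temperature = TemperatureScaling.ToCelsius(raw);
Console.WriteLine(temperature == new Celsius(25.3)); // True

// TemperatureScaling.ToCelsius(temperature) would not compile:
// an already-scaled value cannot be scaled a second time.

// Raw register content in tenths of a degree Celsius (assumed scale: value / 10).
public readonly record struct RawTemperatureTenths(int Counts);

// Engineering value in degrees Celsius.
public readonly record struct Celsius(double Value);

public static class TemperatureScaling
{
    // The single place where the scale factor is applied.
    public static Celsius ToCelsius(RawTemperatureTenths raw) => new(raw.Counts / 10.0);
}
```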
3. Naming inconsistencies
Different teams may use different names for the same concept:
LotId
BatchId
CarrierId
WorkOrderId
JobId
RunId
Sometimes these are truly different.
Sometimes they are accidentally used interchangeably.
That is dangerous.
For example:
LotId = production lot
CarrierId = physical container
WaferId = individual wafer
RunId = machine execution session
JobId = MES instruction
If engineers casually map JobId to LotId, reporting may look correct in simple tests but fail in real production when one job contains multiple lots or one lot spans multiple runs.
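Casual mix-ups of this kind can be made harder by giving each identifier its own type. The following is a small sketch, assuming .NET 6 or later; the identifier values and `ReportBuilder` are made up for illustration.

```csharp
using System;

var lot = new LotId("LOT-0042");
var run = new RunId("RUN-0007");

Console.WriteLine(ReportBuilder.Build(lot, run)); // lot=LOT-0042 run=RUN-0007

// ReportBuilder.Build(run, lot) would not compile:
// a RunId cannot silently stand in for a LotId.

// Each identifier is a distinct type, even though all of them wrap a string.
public readonly record struct LotId(string Value);
public readonly record struct RunId(string Value);
public readonly record struct JobId(string Value);

public static class ReportBuilder
{
    public static string Build(LotId lot, RunId run) =>
        $"lot={lot.Value} run={run.Value}";
}
```

Any place where a `JobId` really must become a `LotId` then needs an explicit, reviewable conversion instead of an accidental assignment.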
4. State meaning differences
State names are especially dangerous.
Common words like these are not universal:
Ready
Idle
Running
Busy
Completed
Stopped
Aborted
Error
Paused
Held
For example:
PLC Ready:
motion controller is ready to accept command
Machine Ready:
all devices initialized, recipe loaded, no blocking alarms
MES Ready:
equipment is available for dispatching a job
All three can be called Ready.
They are not the same state.
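A possible way to keep the three meanings apart in code is to model them as separate, explicitly derived properties. This is only a sketch, assuming .NET 6 or later. The derivation rules are assumptions made for the example (in particular that machine readiness requires controller readiness), and `ReservedForMaintenance` is a hypothetical flag.

```csharp
using System;

var readiness = new ReadinessSnapshot(
    ControllerReady: true,
    DevicesInitialized: true,
    RecipeLoaded: false,
    HasBlockingAlarm: false,
    ReservedForMaintenance: false);

Console.WriteLine(readiness.ControllerReady);         // True
Console.WriteLine(readiness.MachineOperationalReady); // False (no recipe loaded)
Console.WriteLine(readiness.ProductionAvailable);     // False

public sealed record ReadinessSnapshot(
    bool ControllerReady,
    bool DevicesInitialized,
    bool RecipeLoaded,
    bool HasBlockingAlarm,
    bool ReservedForMaintenance)
{
    // Machine-level readiness: devices initialized, recipe loaded, no blocking alarms.
    // Assumption for this sketch: the controller must also be ready.
    public bool MachineOperationalReady =>
        ControllerReady && DevicesInitialized && RecipeLoaded && !HasBlockingAlarm;

    // Factory-level availability: what MES may treat as dispatchable.
    public bool ProductionAvailable =>
        MachineOperationalReady && !ReservedForMaintenance;
}
```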
5. Timing differences
Data meaning depends on when it was sampled.
Example:
StagePosition = 100 mm
Question:
Was that position before motion?
During motion?
At image trigger time?
At inspection result generation time?
At MES upload time?
In inspection systems, this matters a lot.
A defect result without the correct acquisition context can become meaningless.
6. Missing context
A value often needs context to be meaningful.
Example:
OffsetX = 12
Missing questions:
12 what?
mm or µm?
relative to what coordinate system?
before or after calibration?
for which recipe?
for which product?
for which camera?
for which wafer orientation?
Industrial data is rarely self-explanatory.
The context is part of the data.
PART 5 — CONTEXT & DATA VALIDITY
The same value can mean different things depending on machine context.
This is a major difference from typical business software.
In business software, Price = 100 usually means the same thing as long as currency is known.
In machine software, Position = 100 may mean different things depending on machine state.
Example — Calibration vs production
SensorValue = 0.82
During calibration, this may mean:
raw measurement used to compute offset
During production, it may mean:
validated process measurement
During maintenance, it may mean:
manual diagnostic reading
Same value.
Different meaning.
Example — Position before and after homing
Before homing:
XPosition = 100
may mean:
controller-reported relative position since power-on
After homing:
XPosition = 100
may mean:
known physical position in machine coordinate system
This is why machine software should not expose position as only a number.
A better model is:
+-----------------------------+
| AxisPosition                |
+-----------------------------+
| Value                       |
| Unit                        |
| CoordinateSystem            |
| IsHomed                     |
| IsCalibrated                |
| Timestamp                   |
| Validity                    |
+-----------------------------+
The position is only trustworthy if the context says it is trustworthy.
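The model above could be sketched as a C# record, assuming .NET 6 or later. Here the Validity field is expressed as a derived `IsTrustworthy` property, and the rule "homed and calibrated" is an assumption chosen for this example; a real machine may need more conditions.

```csharp
using System;

var beforeHoming = new AxisPosition(
    100, LengthUnit.Millimeter, "MachineCoordinates",
    IsHomed: false, IsCalibrated: true, Timestamp: DateTimeOffset.UtcNow);

// Same numeric value, but now the axis has been homed.
var afterHoming = beforeHoming with { IsHomed = true };

Console.WriteLine(beforeHoming.IsTrustworthy); // False
Console.WriteLine(afterHoming.IsTrustworthy);  // True

public enum LengthUnit { Millimeter, Micrometer }

public sealed record AxisPosition(
    double Value,
    LengthUnit Unit,
    string CoordinateSystem,
    bool IsHomed,
    bool IsCalibrated,
    DateTimeOffset Timestamp)
{
    // Assumed rule for this sketch: usable only when homed and calibrated.
    public bool IsTrustworthy => IsHomed && IsCalibrated;
}
```

Consumers that move hardware or report results can then check `IsTrustworthy` before using `Value`, instead of assuming that any number they receive is a valid machine coordinate.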
Data validity model
+-----------------------------+
| Measurement                 |
+-----------------------------+
| Value                       |
| Unit                        |
| Timestamp                   |
| Source                      |
| MachineState                |
| WorkflowStep                |
| RecipeVersion               |
| CalibrationVersion          |
| ValidityStatus              |
+-----------------------------+
This may look verbose, but in real machines this metadata often saves days of debugging.
A measurement without context is just a number.
A measurement with context becomes evidence.
PART 6 — DATA CONSISTENCY ACROSS SYSTEMS
Different systems may have different views of reality.
That is normal.
The problem is pretending they always agree.
Example
PLC:
Machine is Busy
Machine Software:
Workflow is Completing
SCADA:
Machine is Running
MES:
Equipment is Available
All of these may be based on real data.
But they may be updated at different times.
Why inconsistency happens
Data can be:
delayed
cached
sampled periodically
partially updated
reported after workflow completion
written in different transaction boundaries
read from different sources
Example:
Machine finishes inspection at 10:00:00
Result persisted locally at 10:00:01
SCADA tag updates at 10:00:02
MES report sent at 10:00:05
MES acknowledges at 10:00:07
During those seven seconds, different systems may show different truth.
That does not automatically mean the system is broken.
It means the architecture needs a consistency strategy.
Consistency strategy questions
Strong engineers ask:
Who owns the source of truth?
Which data is command-critical?
Which data is display-only?
Which data is historical evidence?
Which data can be eventually consistent?
Which data must be synchronized before proceeding?
What timestamp represents the event time?
What timestamp represents the report time?
Without these rules, teams argue during incidents because each system has a different “truth”.
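The event-time versus report-time distinction can be made explicit by storing each moment under its own name. The sketch below replays the 10:00:00 to 10:00:07 example, assuming .NET 6 or later; the calendar date is arbitrary and `InspectionResultTimes` is a hypothetical type.

```csharp
using System;

// Arbitrary date; only the times of day come from the example above.
var finished = new DateTimeOffset(2024, 1, 1, 10, 0, 0, TimeSpan.Zero);

var times = new InspectionResultTimes(
    EventTime: finished,
    PersistedTime: finished.AddSeconds(1),
    ReportedTime: finished.AddSeconds(5),
    AcknowledgedTime: finished.AddSeconds(7));

Console.WriteLine(times.ReportingDelay.TotalSeconds); // 5
Console.WriteLine(times.EndToEndDelay.TotalSeconds);  // 7

public sealed record InspectionResultTimes(
    DateTimeOffset EventTime,        // when the inspection actually finished
    DateTimeOffset PersistedTime,    // when the result was stored locally
    DateTimeOffset ReportedTime,     // when the report was sent to MES
    DateTimeOffset AcknowledgedTime) // when MES confirmed receipt
{
    public TimeSpan ReportingDelay => ReportedTime - EventTime;
    public TimeSpan EndToEndDelay => AcknowledgedTime - EventTime;
}
```

With named timestamps, correlation with motion, alarms, or sensor data can always use `EventTime`, while the delays become measurable quantities rather than a source of argument.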
PART 7 — REAL-WORLD FAILURE SCENARIOS
Scenario 1 — Wrong unit conversion causes incorrect operation
What it looks like:
Machine moves to the wrong position.
Inspection region is shifted.
Alignment appears unstable.
Recipe works on one machine but fails on another.
Why it happens:
MES sends position in µm.
Machine software expects mm.
Or PLC exposes encoder counts.
Application treats it as physical distance.
How engineers debug it:
Compare raw source value.
Check unit at every boundary.
Trace conversion code.
Compare expected physical movement with reported movement.
Verify recipe unit definitions.
Check calibration/version metadata.
The key debugging question is:
Where did this number first change meaning?
Scenario 2 — MES reports incorrect production data due to mapping error
What it looks like:
MES shows wrong wafer count.
Lot appears completed too early.
Rejected units are reported as passed.
Production dashboard disagrees with machine history.
Why it happens:
Machine RunId mapped to MES LotId.
Inspection attempt count mapped to production quantity.
Rework flow not represented correctly.
Partial run treated as completed run.
How engineers debug it:
Compare machine event log with MES transaction log.
Identify source event that generated the report.
Check mapping rules for job, lot, wafer, carrier, and run.
Replay a known run through the reporting mapper.
Validate edge cases: abort, retry, rework, partial completion.
This is not a database problem.
It is a semantic mapping problem.
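Replaying a known run through the reporting mapper can be sketched as follows, assuming .NET 6 or later. The event model, the wafer IDs, and the counting rule (last outcome per wafer wins, and a wafer whose last outcome is Aborted is not counted) are assumptions for illustration, not a rule taken from any MES.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// A recorded run: W01 passed, W02 failed, W03 was aborted and then retried successfully.
var events = new List<WaferEvent>
{
    new("W01", WaferOutcome.Passed),
    new("W02", WaferOutcome.Failed),
    new("W03", WaferOutcome.Aborted),
    new("W03", WaferOutcome.Passed), // retry of the same wafer
};

var summary = ReportingMapper.Summarize(events);
Console.WriteLine($"{summary.Inspected} {summary.Passed} {summary.Failed}"); // 3 2 1

public enum WaferOutcome { Passed, Failed, Aborted }
public sealed record WaferEvent(string WaferId, WaferOutcome Outcome);
public sealed record LotSummary(int Inspected, int Passed, int Failed);

public static class ReportingMapper
{
    // Counts wafers, not inspection attempts.
    public static LotSummary Summarize(IEnumerable<WaferEvent> events)
    {
        var finalOutcomes = events
            .GroupBy(e => e.WaferId)
            .Select(g => g.Last().Outcome)         // last attempt per wafer
            .Where(o => o != WaferOutcome.Aborted) // unfinished wafers are not reported
            .ToList();

        return new LotSummary(
            Inspected: finalOutcomes.Count,
            Passed: finalOutcomes.Count(o => o == WaferOutcome.Passed),
            Failed: finalOutcomes.Count(o => o == WaferOutcome.Failed));
    }
}
```

Four events produce a quantity of three, which is exactly the difference between "inspection attempt count" and "production quantity" described above.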
Scenario 3 — PLC and machine disagree on state meaning
What it looks like:
UI says Ready.
PLC refuses command.
Machine starts command too early.
Operator sees confusing state transitions.
MES dispatches work while machine is not actually available.
Why it happens:
PLC Ready means controller ready.
Machine Ready means workflow ready.
MES Ready means production available.
How engineers debug it:
Draw state ownership map.
List each state source and meaning.
Check transition timing.
Review handshake signals.
Compare logs across PLC, machine software, and MES.
Create explicit state translation table.
A good fix is not “rename a variable”.
A good fix is defining separate states:
ControllerReady
MachineOperationalReady
ProductionAvailable
Scenario 4 — Stale data used for decision
What it looks like:
Machine uses old sensor value.
Decision logic passes even though condition changed.
SCADA shows normal value while machine already alarmed.
MES receives result based on outdated context.
Why it happens:
Polling interval too slow.
Cached value has no timestamp.
Consumer assumes latest value.
Data snapshot is not atomic.
How engineers debug it:
Inspect timestamps.
Check polling/update frequency.
Log data age at decision point.
Compare event time vs read time.
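Add freshness validation.
A strong design treats data freshness explicitly:
if (sensorReading.Age > allowedAge)
{
    // The reading is too old to trust, so refuse to decide on it.
    throw new InvalidOperationException("Sensor reading is stale; decision rejected.");
}
Scenario 5 — Scaling factor mismatch leads to subtle defects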
What it looks like:
Temperature is slightly wrong.
Pressure control drifts.
Measurement threshold is consistently off.
Inspection quality changes gradually.
Only some products fail.
Why it happens:
One side scales the raw value by / 10.
Another side scales it by / 100.
Calibration table changed but mapper was not updated.
Vendor firmware changed raw value meaning.
How engineers debug it:
Compare raw values to independent measurement.
Check firmware release notes.
Verify scaling formula.
Review calibration data version.
Run controlled test points.
This kind of bug is hard because the system is not obviously broken.
It is slightly wrong.
In manufacturing, “slightly wrong” can still mean scrap, yield loss, or customer complaints.
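Controlled test points can be turned into a small automated check. The sketch below assumes .NET 6 or later; the reference values, the 0.2 °C tolerance, and the value / 10 scale are hypothetical numbers chosen for illustration.

```csharp
using System;

// Hypothetical reference points: raw register value and independently measured temperature.
var testPoints = new (int Raw, double ReferenceCelsius)[]
{
    (200, 20.0),
    (253, 25.3),
    (400, 40.0),
};

foreach (var (raw, reference) in testPoints)
{
    var ok = ScaleCheck.IsWithinTolerance(
        ScaleCheck.ToCelsius(raw), reference, toleranceCelsius: 0.2);
    Console.WriteLine($"raw={raw} ok={ok}");
}

public static class ScaleCheck
{
    // Scale under test (assumed: value / 10, as in the earlier D130 example).
    public static double ToCelsius(int raw) => raw / 10.0;

    public static bool IsWithinTolerance(double actual, double expected, double toleranceCelsius) =>
        Math.Abs(actual - expected) <= toleranceCelsius;
}
```

If firmware or a calibration table silently changes the raw value meaning, a check like this fails at every test point, which is far easier to notice than a gradual shift in inspection quality.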
Scenario 6 — Same field interpreted differently by different systems
What it looks like:
Dashboard and report disagree.
Operators trust one screen.
Process engineers trust another.
Service engineers cannot reproduce issue.
Why it happens:
Field name is generic.
Documentation is missing.
One system uses raw value.
Another uses filtered value.
Another uses calibrated value.
Example:
Thickness = 725
Possible meanings:
raw sensor reading
calibrated thickness
average thickness
last sampled thickness
target thickness
measured thickness after compensation
How engineers debug it:
Find producer of the field.
Find all consumers.
Identify transformation steps.
Add semantic suffixes or separate fields.
Update contract documentation.
Add validation tests around mappings.
PART 8 — SOFTWARE DESIGN IMPLICATIONS
The main design lesson is simple:
Data meaning must be explicit in the architecture.
Do not let meaning live only in someone’s head, a spreadsheet, or a comment beside a PLC address.
Bad approach
Application code reads raw values everywhere.
PLC D100 means different things in different places.
State codes are interpreted inline.
Unit conversion is duplicated.
MES mapping is built directly in workflow code.
SCADA tags are named casually.
Example:
var value = plc.ReadInt("D120");
mes.Send("Position", value);
This looks simple.
It is dangerous because nobody knows whether value is raw count, mm, µm, calibrated position, or current command target.
Good approach
Raw data is isolated.
Mapping is explicit.
Units are part of the model.
State translation is centralized.
MES reporting has contract tests.
SCADA tags are documented.
Data freshness is validated.
Example:
var rawSnapshot = plc.ReadSnapshot();
var machineSnapshot = plcMapper.ToMachineSnapshot(rawSnapshot);
machineSnapshot.ValidateSemantics();
var mesReport = mesMapper.ToInspectionReport(machineSnapshot, currentRunContext);
await mesClient.ReportInspectionResultAsync(mesReport);
This design separates:
raw communication
semantic interpretation
domain model
external reporting
ASCII component diagram
      +------------------+
      | PLC / Device     |
      +------------------+
      | registers        |
      | bits             |
      | raw values       |
      +--------+---------+
               |
               v
 +---------------------------+
 | Raw Data Adapter          |
 +---------------------------+
 | reads protocol/device data|
 | no business meaning       |
 +-------------+-------------+
               |
               v
 +---------------------------+
 | Semantic Mapping Layer    |
 +---------------------------+
 | units                     |
 | scaling                   |
 | state translation         |
 | validity rules            |
 | timestamp meaning         |
 +-------------+-------------+
               |
               v
 +---------------------------+
 | Machine Domain Model      |
 +---------------------------+
 | machine state             |
 | axis position             |
 | recipe context            |
 | inspection result         |
 | alarms                    |
 +-------------+-------------+
               |
       +-------+--------+
       |                |
       v                v
+-------------+ +----------------+
| SCADA Tags  | | MES Reports    |
+-------------+ +----------------+
| display     | | production     |
| trends      | | traceability   |
| alarms      | | quality data   |
+-------------+ +----------------+
The design rule:
External systems should not consume raw machine data unless the contract explicitly says it is raw.
Most factory integrations should consume semantically meaningful data.
What should be in a data contract?
A serious industrial data contract should define more than field names.
For each field, define:
Name
Description
Source system
Owner
Unit
Scale
Allowed range
Allowed values
Timestamp meaning
Update frequency
Validity condition
Null/missing behavior
Calibration dependency
Recipe dependency
State dependency
Version introduced
Backward compatibility rule
Example:
Field:
StageXPositionUm
Meaning:
Calibrated X-axis stage position at image acquisition time
Unit:
micrometers
Timestamp:
hardware trigger timestamp
Valid when:
axis is homed
calibration is active
workflow is in inspection step
Invalid when:
machine is not homed
axis is moving without trigger synchronization
calibration version is unknown
That is a real contract.
Not just:
StageXPosition: number
PART 9 — INTERVIEW / REAL-WORLD TALKING POINTS
A strong interview explanation could be:
In industrial systems, data modeling is not just about structure. It is about preserving meaning across boundaries. A PLC may expose registers and bits, machine software may model states and workflows, SCADA may model tags, and MES may model jobs, lots, and production records. The integration risk is that values can be technically valid but semantically wrong. For example, a position value may be interpreted as millimeters in one system and micrometers in another, or a status code may mean controller-ready in the PLC but production-ready in MES. A strong design uses explicit mapping layers, unit definitions, state translation tables, timestamp semantics, data freshness checks, and contract tests so that every system interprets the data consistently.
Why format is not enough
Format tells you:
This is an integer.
This is a string.
This is a timestamp.
This field is required.
Semantics tells you:
This integer is scaled by 1000.
This status is from the PLC state machine.
This timestamp means acquisition time.
This value is valid only after homing.
This result belongs to a specific recipe version.
Format prevents parsing errors.
Semantics prevents wrong decisions.
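The difference can be shown in a few lines, assuming .NET 6 or later. The payload "253" passes the format check either way; only the semantic check notices when the scale has been forgotten. The 0 to 100 °C range and the `ChamberTemperature` class are hypothetical; a real range would come from the data contract.

```csharp
using System;
using System.Globalization;

// Format check: the payload parses as an integer.
var parsed = int.TryParse("253", NumberStyles.Integer, CultureInfo.InvariantCulture, out var rawValue);
Console.WriteLine(parsed); // True

// Semantic check: is the interpreted value plausible for this field?
Console.WriteLine(ChamberTemperature.IsPlausible(rawValue));        // False (scale forgotten)
Console.WriteLine(ChamberTemperature.IsPlausible(rawValue / 10.0)); // True  (scale applied)

public static class ChamberTemperature
{
    // Hypothetical allowed range for this sketch.
    public const double MinCelsius = 0.0;
    public const double MaxCelsius = 100.0;

    public static bool IsPlausible(double celsius) =>
        celsius >= MinCelsius && celsius <= MaxCelsius;
}
```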
Common mistakes engineers make
New engineers often:
treat PLC addresses as business meaning
trust field names without checking units
map status codes directly across systems
ignore timestamp meaning
forget data freshness
mix raw and calibrated values
assume MES, SCADA, PLC, and machine states mean the same thing
hide conversions inside random services
skip contract tests for mappings
These mistakes are common because the data usually “works” in demos.
The problems appear during real production, with real recipes, real timing, partial failures, retries, stale values, and different machine variants.
What strong engineers understand
Strong industrial software engineers understand that:
Data has meaning, not just shape.
Units are part of the contract.
State names are not universal.
Timestamps need precise meaning.
Raw values and engineering values must be separated.
Context determines validity.
Mapping logic deserves tests.
Different systems may have different truths temporarily.
Semantic documentation is production protection.
The key mindset is:
Never ask only “Did we receive the data?” Ask “Did we receive the right meaning at the right time, in the right context?”
That is the heart of data modeling and semantics across industrial systems.