
Data Modeling & Semantics Across Systems

In industrial machine software, the hardest data problems are often not caused by JSON, XML, OPC UA, SQL tables, or protocol encoding.

The hardest problems happen when two systems exchange a value successfully, but interpret it differently.

That is why data semantics matter more than data format.

This topic is closely tied to factory integration, traceability, MES/SCADA awareness, production reporting, result persistence, and time-series data, which are covered in the data/manufacturing systems area of the roadmap.


PART 1 — WHY SEMANTICS MATTER MORE THAN DATA FORMAT

A data format answers:

“Can I read the value?”

Semantics answers:

“Do I understand what the value means?”

Those are not the same.

A machine may send:

```text
Position = 100
```

Technically, this is valid data.

But what does 100 mean?

It could mean:

```text
100 mm
100 µm
100 encoder counts
100 motor steps
100 percent
100 raw ADC units
100 after calibration offset
100 before calibration offset
```

The data is syntactically correct, but the machine behavior may be wrong.

In industrial systems, this is dangerous because software does not only display data. It may use data to:

  • move hardware
  • accept or reject material
  • report production quality
  • decide whether a machine is ready
  • trigger alarms
  • send results to MES
  • mark a lot as completed

So a semantic mistake can become a production mistake.


Example 1 — Same value, different unit

```text
Machine Software:
  XPosition = 100 mm

MES:
  XPosition = 100 µm
```

The value is the same.

The meaning is not.

A report may show that a wafer was inspected at a position 1000 times smaller than the real position.
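A minimal C# sketch of making the unit part of the value instead of a comment. The hypothetical `Length` type below can only be created and read through named units, so a bare `100` cannot cross a boundary on its own. It is an illustration, not a library type.

```csharp
using System;

// Hypothetical value type: a length stored internally in micrometers.
// Callers must name the unit when creating it and when reading it back.
public readonly record struct Length
{
    private readonly double _micrometers;

    private Length(double micrometers) => _micrometers = micrometers;

    public static Length FromMillimeters(double mm) => new(mm * 1000.0);
    public static Length FromMicrometers(double um) => new(um);

    public double Millimeters => _micrometers / 1000.0;
    public double Micrometers => _micrometers;

    public override string ToString() => $"{Micrometers} µm";
}
```

With this shape, the mix-up in Example 1 shows up in code review. The MES adapter would have to call `FromMicrometers` explicitly on a millimeter value instead of silently passing a number along.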


Example 2 — Same status code, different meaning

```text
PLC:
  Status = 1 means Busy

Machine Software:
  Status = 1 means Ready
```

This is worse than a communication failure.

If communication fails, you usually know something is wrong.

If communication succeeds but meaning is wrong, the system may confidently make the wrong decision.


Example 3 — Same timestamp, different interpretation

```text
Machine:
  Timestamp = acquisition time

MES:
  Timestamp = result upload time

SCADA:
  Timestamp = tag polling time
```

All three timestamps may be valid.

But they describe different moments.

For production traceability, this matters a lot. If an image defect was captured at 10:00:01 but reported at 10:00:07, the timestamp meaning affects correlation with motion, alarms, recipe version, lot state, and sensor values.
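One way to keep these moments apart is to give each timestamp a name that states what it describes. The following is a minimal sketch that assumes a C# result record; `InspectionResultTimes` and its members are hypothetical.

```csharp
using System;

// Hypothetical result record: each timestamp is named after the moment it
// describes, instead of one ambiguous "Timestamp" field.
public sealed record InspectionResultTimes(
    DateTimeOffset AcquiredAt,   // hardware/camera trigger time (event time)
    DateTimeOffset PersistedAt,  // written to the local result store
    DateTimeOffset ReportedAt)   // sent to MES (report time)
{
    // Delay between the physical event and the report that describes it.
    public TimeSpan ReportingLatency => ReportedAt - AcquiredAt;

    // Correlation with motion, alarms, and sensor data uses event time.
    public DateTimeOffset CorrelationTime => AcquiredAt;
}
```

For the defect above, `ReportingLatency` is six seconds. Correlation uses the 10:00:01 acquisition time, not the 10:00:07 report time.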


PART 2 — DATA MODELS IN DIFFERENT SYSTEMS

Different systems model the same machine at different abstraction levels.

A PLC often thinks in:

```text
registers
bits
coils
inputs
outputs
numeric words
```

Machine software thinks in:

```text
objects
states
commands
workflows
recipes
results
alarms
```

SCADA thinks in:

```text
tags
signals
trends
alarms
operator-visible status
```

MES thinks in:

```text
jobs
lots
batches
wafers
recipes
equipment state
production results
quality records
```

These models are not wrong. They serve different purposes.

The problem is that integration requires translating between them.


ASCII model comparison

```text
+------------------+        +-----------------------+
|       PLC        |        |   Machine Software    |
+------------------+        +-----------------------+
| M100.0           |        | MachineState          |
| D120             |        | AxisPosition          |
| D121             |        | InspectionWorkflow    |
| X10              |        | Recipe                |
| Y20              |        | Alarm                 |
+------------------+        +-----------------------+
        |                              |
        |                              |
        v                              v
+------------------+        +-----------------------+
|      SCADA       |        |          MES          |
+------------------+        +-----------------------+
| Tags             |        | Lot                   |
| Signals          |        | Job                   |
| Trends           |        | Batch                 |
| Alarm Summary    |        | Wafer Result          |
| Equipment View   |        | Production Report     |
+------------------+        +-----------------------+
```

The important point:

These systems may all describe the same physical machine, but they do not describe it using the same mental model.

A PLC may expose:

```text
D120 = 12345
```

Machine software may interpret it as:

```text
Stage X position = 12.345 mm
```

SCADA may display it as:

```text
StageXPosition = 12.345 mm
```

MES may store it as:

```text
InspectionPositionX = 12345 µm
```

Each layer adds meaning.

That meaning must be explicit.


PART 3 — MAPPING BETWEEN SYSTEMS

Industrial integration is mostly translation.

You translate low-level signals into machine concepts, then machine concepts into factory concepts.


Example mappings

```text
PLC register      -> Machine state object
Machine result    -> MES inspection report
Sensor value      -> SCADA tag
Recipe parameter  -> Device configuration
Alarm bit         -> Operator-readable alarm
```

A good mapping does not only copy data.

It defines:

```text
source field
target field
unit
scale
allowed values
state meaning
timestamp meaning
validity condition
ownership
update frequency
```
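Part of that list can live in code next to the conversion, where it can be reviewed and tested. The sketch below assumes a simple linear scale. `FieldMapping`, the D120 range, and the owner value are hypothetical. A fuller contract would also carry allowed values, state meaning, timestamp meaning, and validity conditions.

```csharp
using System;

// Hypothetical mapping definition covering part of the list above:
// source, target, unit, scale, allowed range, ownership, update frequency.
public sealed record FieldMapping(
    string SourceField,
    string TargetField,
    string Unit,
    double Scale,          // target value = raw value * Scale
    double MinAllowed,     // allowed range of the target value
    double MaxAllowed,
    string Owner,
    TimeSpan UpdatePeriod)
{
    // Converts a raw value and rejects results outside the allowed range
    // instead of passing a suspicious number downstream.
    public double Apply(int rawValue)
    {
        double value = rawValue * Scale;
        if (value < MinAllowed || value > MaxAllowed)
        {
            throw new ArgumentOutOfRangeException(
                nameof(rawValue),
                $"{SourceField}={rawValue} maps to {value} {Unit}, " +
                $"outside [{MinAllowed}, {MaxAllowed}] for {TargetField}");
        }
        return value;
    }
}
```

For example, `new FieldMapping("D120", "StageXPositionMm", "mm", 0.001, 0.0, 300.0, "Motion team", TimeSpan.FromMilliseconds(100)).Apply(12345)` returns 12.345, within floating-point rounding. A raw value that maps outside 0 to 300 mm is rejected rather than reported.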

ASCII mapping diagram

```text
+-----------------------+
| PLC Raw Data          |
+-----------------------+
| D100 = 1              |
| D120 = 12345          |
| D130 = 250            |
+-----------+-----------+
            |
            | semantic mapping
            v
+-----------------------+
| Machine Domain Model  |
+-----------------------+
| State = Busy          |
| XPosition = 12.345 mm |
| Temperature = 25.0 C  |
+-----------+-----------+
            |
            | reporting mapping
            v
+----------------------------+
| MES Production Model       |
+----------------------------+
| EquipmentState = Running   |
| InspectionX_um = 12345     |
| ChamberTemp_C = 25.0       |
+----------------------------+
```

The most important box here is not PLC or MES.

It is the semantic mapping layer.

Without that layer, meaning gets scattered across code.

That leads to code like:

```csharp
if (plc.ReadInt("D100") == 1)
{
    // ready? busy? running? depends on who remembers
}
```

That is fragile.

A better design is:

```csharp
var machineState = plcStateMapper.Map(rawPlcSnapshot);

if (machineState.ExecutionState == MachineExecutionState.Busy)
{
    // clear semantic meaning
}
```

The difference is not just code style.

The second design makes meaning visible, testable, reviewable, and documentable.
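Here is a sketch of what could sit behind a call like `plcStateMapper.Map(...)`: one table, in one place, that states what each raw D100 code means. The mapping `1 = Busy` comes from the mapping diagram. The other code values are hypothetical, and the mapper is simplified to a single register.

```csharp
using System;
using System.Collections.Generic;

public enum MachineExecutionState { Idle, Busy, Faulted }

// Hypothetical mapper: the only place in the code base that knows what the
// raw D100 status codes mean. Codes 0 and 9 are illustrative only.
public static class PlcStateMapper
{
    private static readonly IReadOnlyDictionary<int, MachineExecutionState> D100Codes =
        new Dictionary<int, MachineExecutionState>
        {
            [0] = MachineExecutionState.Idle,
            [1] = MachineExecutionState.Busy,
            [9] = MachineExecutionState.Faulted,
        };

    public static MachineExecutionState MapD100(int rawCode)
    {
        // Unknown codes are an error, not a silent default: a new firmware
        // code has to be added to the table deliberately.
        if (!D100Codes.TryGetValue(rawCode, out var state))
        {
            throw new InvalidOperationException($"Unmapped PLC status code D100={rawCode}");
        }
        return state;
    }
}
```

Because the table is data, a unit test can enumerate it and a reviewer can compare it line by line with the PLC documentation.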


PART 4 — COMMON SEMANTIC CHALLENGES

1. Unit mismatch

This is one of the most common industrial data problems.

Examples:

```text
mm vs µm
degrees vs radians
Celsius vs Fahrenheit
raw counts vs engineering units
motor steps vs physical distance
percentage vs normalized 0.0-1.0 value
```

A field named Position is not enough.

A strong field name is closer to:

```text
StageXPositionMm
StageXPositionUm
RawEncoderCount
CalibratedStageXPositionMm
```

In industrial software, units should be treated as part of the contract, not as comments.


2. Scaling and conversion

PLC and device values are often scaled.

Example:

```text
PLC D130 = 253
Meaning = 25.3 C
Scale = value / 10
```

Another example:

```text
Raw analog input = 32768
Meaning = 5.0 V
Then converted to pressure = 2.5 bar
```

The risk is that one system applies the scale and another system applies it again.

Then 25.3 C becomes 2.53 C.

Or nobody applies the scale, and 253 is treated as 253 C.

Both are technically valid numbers.

Both are semantically wrong.
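If raw values and engineering values are different types, applying the scale twice becomes a compile error rather than a field bug. This is a minimal sketch using the D130 example (`value / 10`). The type names are hypothetical.

```csharp
using System;

// Hypothetical wrapper types: a raw register value and an engineering value
// are different types, so the scale can only be applied once.
public readonly record struct RawD130(int Value);

public readonly record struct Celsius(double Value);

public static class TemperatureMapper
{
    // Contract from the example above: temperature in °C = D130 / 10.
    public static Celsius ToCelsius(RawD130 raw) => new(raw.Value / 10.0);

    // There is deliberately no overload taking Celsius, so
    // ToCelsius(ToCelsius(raw)) does not compile.
}
```

`TemperatureMapper.ToCelsius(new RawD130(253))` yields 25.3 °C. Neither failure mode from above is expressible: a `Celsius` cannot be scaled again, and a `RawD130` cannot be passed where a `Celsius` is expected.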


3. Naming inconsistencies

Different teams may use different names for the same concept:

```text
LotId
BatchId
CarrierId
WorkOrderId
JobId
RunId
```

Sometimes these are truly different.

Sometimes they are accidentally used interchangeably.

That is dangerous.

For example:

```text
LotId      = production lot
CarrierId  = physical container
WaferId    = individual wafer
RunId      = machine execution session
JobId      = MES instruction
```

If engineers casually map JobId to LotId, reporting may look correct in simple tests but fail in real production when one job contains multiple lots or one lot spans multiple runs.
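Giving each identifier its own type turns a casual JobId-to-LotId mapping into a compile error. Below is a minimal sketch with hypothetical C# record structs, plus a summary that counts wafers per lot even when a lot spans several runs.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical strongly typed identifiers: each concept from the list above
// is its own type, so a JobId cannot be passed where a LotId is expected.
public readonly record struct LotId(string Value);
public readonly record struct CarrierId(string Value);
public readonly record struct WaferId(string Value);
public readonly record struct RunId(string Value);
public readonly record struct JobId(string Value);

public sealed record WaferResult(
    WaferId Wafer, LotId Lot, CarrierId Carrier, RunId Run, JobId Job, bool Passed);

public static class LotSummary
{
    // Counts distinct wafers per lot, so a wafer seen in several runs of
    // the same lot is still counted once.
    public static Dictionary<LotId, int> WafersPerLot(IEnumerable<WaferResult> results) =>
        results
            .GroupBy(r => r.Lot)
            .ToDictionary(g => g.Key, g => g.Select(r => r.Wafer).Distinct().Count());
}
```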


4. State meaning differences

State names are especially dangerous.

Common words like these are not universal:

```text
Ready
Idle
Running
Busy
Completed
Stopped
Aborted
Error
Paused
Held
```

For example:

```text
PLC Ready:
  motion controller is ready to accept command

Machine Ready:
  all devices initialized, recipe loaded, no blocking alarms

MES Ready:
  equipment is available for dispatching a job
```

All three can be called Ready.

They are not the same state.


5. Timing differences

Data meaning depends on when it was sampled.

Example:

```text
StagePosition = 100 mm
```

Question:

```text
Was that position before motion?
During motion?
At image trigger time?
At inspection result generation time?
At MES upload time?
```

In inspection systems, this matters a lot.

A defect result without the correct acquisition context can become meaningless.


6. Missing context

A value often needs context to be meaningful.

Example:

```text
OffsetX = 12
```

Missing questions:

```text
12 what?
mm or µm?
relative to what coordinate system?
before or after calibration?
for which recipe?
for which product?
for which camera?
for which wafer orientation?
```

Industrial data is rarely self-explanatory.

The context is part of the data.


PART 5 — CONTEXT & DATA VALIDITY

The same value can mean different things depending on machine context.

This is a major difference from typical business software.

In business software, Price = 100 usually means the same thing everywhere, as long as the currency is known.

In machine software, Position = 100 may mean different things depending on machine state.


Example — Calibration vs production

```text
SensorValue = 0.82
```

During calibration, this may mean:

```text
raw measurement used to compute offset
```

During production, it may mean:

```text
validated process measurement
```

During maintenance, it may mean:

```text
manual diagnostic reading
```

Same value.

Different meaning.


Example — Position before and after homing

Before homing:

```text
XPosition = 100
```

may mean:

```text
controller-reported relative position since power-on
```

After homing:

```text
XPosition = 100
```

may mean:

```text
known physical position in machine coordinate system
```

This is why machine software should not expose position as only a number.

A better model is:

```text
+-----------------------------+
| AxisPosition                |
+-----------------------------+
| Value                       |
| Unit                        |
| CoordinateSystem            |
| IsHomed                     |
| IsCalibrated                |
| Timestamp                   |
| Validity                    |
+-----------------------------+
```

The position is only trustworthy if the context says it is trustworthy.
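The AxisPosition box translates almost directly into a record. This is a minimal sketch. The enum values and the exact trust rule are assumptions; a real machine would define them per axis and per coordinate system.

```csharp
using System;

public enum CoordinateSystem { ControllerRelative, MachineCoordinates }
public enum Validity { Valid, Stale, Invalid }

// Hypothetical position model following the AxisPosition box above:
// the number is only one of several fields that decide whether it can be used.
public sealed record AxisPosition(
    double Value,
    string Unit,
    CoordinateSystem CoordinateSystem,
    bool IsHomed,
    bool IsCalibrated,
    DateTimeOffset Timestamp,
    Validity Validity)
{
    // Assumed rule: trustworthy only after homing and calibration, in machine
    // coordinates, and with no validity problem flagged by the mapper.
    public bool IsTrustworthy =>
        IsHomed
        && IsCalibrated
        && CoordinateSystem == CoordinateSystem.MachineCoordinates
        && Validity == Validity.Valid;
}
```

A value of 100 read before homing keeps `IsHomed = false` and `CoordinateSystem.ControllerRelative`, so consumers can refuse it without guessing where it came from.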


Data validity model

```text
+-----------------------------+
| Measurement                 |
+-----------------------------+
| Value                       |
| Unit                        |
| Timestamp                   |
| Source                      |
| MachineState                |
| WorkflowStep                |
| RecipeVersion               |
| CalibrationVersion          |
| ValidityStatus              |
+-----------------------------+
```

This may look verbose, but in real machines this metadata often saves days of debugging.

A measurement without context is just a number.

A measurement with context becomes evidence.
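Here is a sketch of the Measurement box as a C# record, with one assumed rule for when a reading counts as production evidence. It reuses the calibration, production, and maintenance modes from the SensorValue example. The names and the rule are hypothetical.

```csharp
using System;

public enum MachineMode { Calibration, Production, Maintenance }
public enum ValidityStatus { Valid, Suspect, Invalid }

// Hypothetical measurement record mirroring the Measurement box above.
public sealed record Measurement(
    double Value,
    string Unit,
    DateTimeOffset Timestamp,
    string Source,
    MachineMode MachineState,
    string WorkflowStep,
    string RecipeVersion,
    string CalibrationVersion,
    ValidityStatus ValidityStatus)
{
    // Assumed rule: only a valid reading taken in production mode, with known
    // recipe and calibration versions, counts as process evidence.
    public bool IsProductionEvidence =>
        MachineState == MachineMode.Production
        && ValidityStatus == ValidityStatus.Valid
        && !string.IsNullOrEmpty(RecipeVersion)
        && !string.IsNullOrEmpty(CalibrationVersion);
}
```

The same 0.82 reading taken during calibration keeps its value but fails `IsProductionEvidence`. That is the distinction the SensorValue example describes.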


PART 6 — DATA CONSISTENCY ACROSS SYSTEMS

Different systems may have different views of reality.

That is normal.

The problem is pretending they always agree.


Example

```text
PLC:
  Machine is Busy

Machine Software:
  Workflow is Completing

SCADA:
  Machine is Running

MES:
  Equipment is Available
```

All of these may be based on real data.

But they may be updated at different times.


Why inconsistency happens

Data can be:

```text
delayed
cached
sampled periodically
partially updated
reported after workflow completion
written in different transaction boundaries
read from different sources
```

Example:

```text
Machine finishes inspection at 10:00:00
Result persisted locally at 10:00:01
SCADA tag updates at 10:00:02
MES report sent at 10:00:05
MES acknowledges at 10:00:07
```

During those seven seconds, different systems may show different versions of the truth.

That does not automatically mean the system is broken.

It means the architecture needs a consistency strategy.


Consistency strategy questions

Strong engineers ask:

```text
Who owns the source of truth?
Which data is command-critical?
Which data is display-only?
Which data is historical evidence?
Which data can be eventually consistent?
Which data must be synchronized before proceeding?
What timestamp represents the event time?
What timestamp represents the report time?
```

Without these rules, teams argue during incidents because each system has a different “truth”.


PART 7 — REAL-WORLD FAILURE SCENARIOS

Scenario 1 — Wrong unit conversion causes incorrect operation

What it looks like:

```text
Machine moves to the wrong position.
Inspection region is shifted.
Alignment appears unstable.
Recipe works on one machine but fails on another.
```

Why it happens:

```text
MES sends position in µm.
Machine software expects mm.
Or PLC exposes encoder counts.
Application treats it as physical distance.
```

How engineers debug it:

```text
Compare raw source value.
Check unit at every boundary.
Trace conversion code.
Compare expected physical movement with reported movement.
Verify recipe unit definitions.
Check calibration/version metadata.
```

The key debugging question is:

Where did this number first change meaning?


Scenario 2 — MES reports incorrect production data due to mapping error

What it looks like:

```text
MES shows wrong wafer count.
Lot appears completed too early.
Rejected units are reported as passed.
Production dashboard disagrees with machine history.
```

Why it happens:

```text
Machine RunId mapped to MES LotId.
Inspection attempt count mapped to production quantity.
Rework flow not represented correctly.
Partial run treated as completed run.
```

How engineers debug it:

```text
Compare machine event log with MES transaction log.
Identify source event that generated the report.
Check mapping rules for job, lot, wafer, carrier, and run.
Replay a known run through the reporting mapper.
Validate edge cases: abort, retry, rework, partial completion.
```

This is not a database problem.

It is a semantic mapping problem.
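Replaying a known run through the reporting mapper is easiest when the mapper is a pure function. This is a minimal sketch. It assumes that MES production quantity counts wafers rather than inspection attempts, and that a wafer's final attempt decides pass or reject. Both rules and all names are hypothetical and would have to come from the real MES contract.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical machine event: one inspection attempt of one wafer.
public sealed record InspectionAttempt(string WaferId, int AttemptNumber, bool Passed);

// Hypothetical MES-side quantity summary for one lot.
public sealed record LotQuantityReport(int WaferCount, int PassedCount, int RejectedCount);

public static class LotReportMapper
{
    // Assumed rules: quantity counts wafers, not attempts, and when a wafer
    // was retried only its final attempt decides pass or reject.
    public static LotQuantityReport Map(IEnumerable<InspectionAttempt> attempts)
    {
        var finalAttempts = attempts
            .GroupBy(a => a.WaferId)
            .Select(g => g.OrderBy(a => a.AttemptNumber).Last())
            .ToList();

        int passed = finalAttempts.Count(a => a.Passed);
        return new LotQuantityReport(finalAttempts.Count, passed, finalAttempts.Count - passed);
    }
}
```

Consider this replay: W1 fails, then passes on retry; W2 passes; W3 fails. The mapper gives 3 wafers, 2 passed, and 1 rejected. A mapper that counted attempts would report 4, which is the "attempt count mapped to production quantity" failure described above.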


Scenario 3 — PLC and machine disagree on state meaning

What it looks like:

```text
UI says Ready.
PLC refuses command.
Machine starts command too early.
Operator sees confusing state transitions.
MES dispatches work while machine is not actually available.
```

Why it happens:

```text
PLC Ready means controller ready.
Machine Ready means workflow ready.
MES Ready means production available.
```

How engineers debug it:

```text
Draw state ownership map.
List each state source and meaning.
Check transition timing.
Review handshake signals.
Compare logs across PLC, machine software, and MES.
Create explicit state translation table.
```

A good fix is not “rename a variable”.

A good fix is defining separate states:

```text
ControllerReady
MachineOperationalReady
ProductionAvailable
```
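Here is a sketch of those separate states. Availability is derived from the lower levels rather than stored independently. `IsHeld` is a hypothetical hold flag added for illustration.

```csharp
// Hypothetical readiness model: three separately named facts, each with one owner.
public sealed record ReadinessSnapshot(
    bool ControllerReady,          // PLC: motion controller accepts commands
    bool MachineOperationalReady,  // machine software: devices initialized, recipe loaded, no blocking alarms
    bool IsHeld)                   // hypothetical operator/production hold flag
{
    // Derived rather than stored, so the availability reported to MES can
    // never claim more than the lower levels support.
    public bool ProductionAvailable => ControllerReady && MachineOperationalReady && !IsHeld;
}
```

In the PLC-refuses-command case above, `ControllerReady` is false while `MachineOperationalReady` is true. `ProductionAvailable` is then false, so MES is not told the equipment can take work.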

Scenario 4 — Stale data used for decision

What it looks like:

```text
Machine uses old sensor value.
Decision logic passes even though condition changed.
SCADA shows normal value while machine already alarmed.
MES receives result based on outdated context.
```

Why it happens:

```text
Polling interval too slow.
Cached value has no timestamp.
Consumer assumes latest value.
Data snapshot is not atomic.
```

How engineers debug it:

```text
Inspect timestamps.
Check polling/update frequency.
Log data age at decision point.
Compare event time vs read time.
Add freshness validation.
```

A strong design treats data freshness explicitly:

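```csharp
// Refuse to act on data older than the decision allows.
// (sensorReading, allowedAge, and Decision are illustrative names.)
if (sensorReading.Age > allowedAge)
{
    return Decision.Reject("sensor reading is stale");
}
```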

Scenario 5 — Scaling factor mismatch leads to subtle defects

What it looks like:

```text
Temperature is slightly wrong.
Pressure control drifts.
Measurement threshold is consistently off.
Inspection quality changes gradually.
Only some products fail.
```

Why it happens:

```text
One side uses scale / 10.
Another side uses scale / 100.
Calibration table changed but mapper was not updated.
Vendor firmware changed raw value meaning.
```

How engineers debug it:

```text
Compare raw values to independent measurement.
Check firmware release notes.
Verify scaling formula.
Review calibration data version.
Run controlled test points.
```

This kind of bug is hard because the system is not obviously broken.

It is slightly wrong.

In manufacturing, “slightly wrong” can still mean scrap, yield loss, or customer complaints.
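The "run controlled test points" step can be automated by comparing the mapper's output with independently measured reference values. This is a minimal sketch. `TestPoint`, `ScalingCheck`, the reference values, and the tolerance are hypothetical.

```csharp
using System;
using System.Collections.Generic;

// A controlled test point: a raw value and the reference value measured
// independently, for example with a calibrated instrument.
public sealed record TestPoint(int Raw, double Reference);

public static class ScalingCheck
{
    // Returns every test point whose mapped value deviates from the
    // independent reference by more than the tolerance.
    public static List<TestPoint> FindDeviations(
        Func<int, double> mapper, IEnumerable<TestPoint> points, double tolerance)
    {
        var deviations = new List<TestPoint>();
        foreach (var point in points)
        {
            if (Math.Abs(mapper(point.Raw) - point.Reference) > tolerance)
            {
                deviations.Add(point);
            }
        }
        return deviations;
    }
}
```

Take points (253, 25.3 °C) and (500, 50.0 °C) with a tolerance of 0.05. A `raw / 10.0` mapper produces no deviations, while a `raw / 100.0` mapper flags both points. Running the same check after a firmware or calibration change catches the "slightly wrong" case before production does.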


Scenario 6 — Same field interpreted differently by different systems

What it looks like:

```text
Dashboard and report disagree.
Operators trust one screen.
Process engineers trust another.
Service engineers cannot reproduce issue.
```

Why it happens:

```text
Field name is generic.
Documentation is missing.
One system uses raw value.
Another uses filtered value.
Another uses calibrated value.
```

Example:

```text
Thickness = 725
```

Possible meanings:

```text
raw sensor reading
calibrated thickness
average thickness
last sampled thickness
target thickness
measured thickness after compensation
```

How engineers debug it:

```text
Find producer of the field.
Find all consumers.
Identify transformation steps.
Add semantic suffixes or separate fields.
Update contract documentation.
Add validation tests around mappings.
```

PART 8 — SOFTWARE DESIGN IMPLICATIONS

The main design lesson is simple:

Data meaning must be explicit in the architecture.

Do not let meaning live only in someone’s head, a spreadsheet, or a comment beside a PLC address.


Bad approach

```text
Application code reads raw values everywhere.

PLC D100 means different things in different places.
State codes are interpreted inline.
Unit conversion is duplicated.
MES mapping is built directly in workflow code.
SCADA tags are named casually.
```

Example:

```csharp
var value = plc.ReadInt("D120");
mes.Send("Position", value);
```

This looks simple.

It is dangerous because nobody knows whether value is a raw count, millimeters, micrometers, a calibrated position, or the currently commanded target.


Good approach

```text
Raw data is isolated.
Mapping is explicit.
Units are part of the model.
State translation is centralized.
MES reporting has contract tests.
SCADA tags are documented.
Data freshness is validated.
```

Example:

```csharp
var rawSnapshot = plc.ReadSnapshot();

var machineSnapshot = plcMapper.ToMachineSnapshot(rawSnapshot);

machineSnapshot.ValidateSemantics();

var mesReport = mesMapper.ToInspectionReport(machineSnapshot, currentRunContext);

await mesClient.ReportInspectionResultAsync(mesReport);
```

This design separates:

```text
raw communication
semantic interpretation
domain model
external reporting
```
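The same separation can be shown as a small, self-contained C# sketch that follows the values from the mapping diagram (D100 = 1, D120 = 12345, D130 = 250). All type names are hypothetical. The scales assume D120 is in µm and D130 in tenths of a degree, as in the earlier examples. The MES string for Idle is invented for illustration.

```csharp
using System;

// Layer 1: raw adapter output. No meaning beyond "what the registers held".
public sealed record RawPlcSnapshot(int D100, int D120, int D130, DateTimeOffset ReadAt);

// Layer 2: machine domain model, with units in the names.
public enum ExecutionState { Idle, Busy }
public sealed record MachineSnapshot(
    ExecutionState State, double StageXPositionMm, double TemperatureC, DateTimeOffset ReadAt);

// Layer 3: external MES report, in the MES vocabulary.
public sealed record MesInspectionReport(
    string EquipmentState, int InspectionX_um, double ChamberTemp_C, string RunId);

public static class Mappers
{
    // Semantic mapping: raw registers to machine concepts.
    public static MachineSnapshot ToMachine(RawPlcSnapshot raw) => new(
        raw.D100 switch
        {
            0 => ExecutionState.Idle,
            1 => ExecutionState.Busy,
            _ => throw new InvalidOperationException($"Unmapped D100={raw.D100}"),
        },
        raw.D120 / 1000.0,   // µm -> mm
        raw.D130 / 10.0,     // tenths of °C -> °C
        raw.ReadAt);

    // Reporting mapping: machine concepts to MES concepts.
    public static MesInspectionReport ToMes(MachineSnapshot m, string runId) => new(
        m.State == ExecutionState.Busy ? "Running" : "Idle",
        (int)Math.Round(m.StageXPositionMm * 1000.0),   // mm -> µm
        m.TemperatureC,
        runId);
}
```

Mapping that snapshot gives a Busy machine at 12.345 mm and 25.0 °C. It is reported to MES as `Running`, 12345 µm, and 25.0 °C. Each arrow in the diagram is one function that can be tested on its own.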

ASCII component diagram

```text
+------------------+
| PLC / Device     |
+------------------+
| registers        |
| bits             |
| raw values       |
+--------+---------+
         |
         v
+---------------------------+
| Raw Data Adapter          |
+---------------------------+
| reads protocol/device data|
| no business meaning       |
+-------------+-------------+
              |
              v
+---------------------------+
| Semantic Mapping Layer    |
+---------------------------+
| units                     |
| scaling                   |
| state translation         |
| validity rules            |
| timestamp meaning         |
+-------------+-------------+
              |
              v
+---------------------------+
| Machine Domain Model      |
+---------------------------+
| machine state             |
| axis position             |
| recipe context            |
| inspection result         |
| alarms                    |
+-------------+-------------+
              |
      +-------+--------+
      |                |
      v                v
+-------------+  +----------------+
| SCADA Tags  |  | MES Reports    |
+-------------+  +----------------+
| display     |  | production     |
| trends      |  | traceability   |
| alarms      |  | quality data   |
+-------------+  +----------------+
```

The design rule:

External systems should not consume raw machine data unless the contract explicitly says it is raw.

Most factory integrations should consume semantically meaningful data.


What should be in a data contract?

A serious industrial data contract should define more than field names.

For each field, define:

```text
Name
Description
Source system
Owner
Unit
Scale
Allowed range
Allowed values
Timestamp meaning
Update frequency
Validity condition
Null/missing behavior
Calibration dependency
Recipe dependency
State dependency
Version introduced
Backward compatibility rule
```

Example:

```text
Field:
  StageXPositionUm

Meaning:
  Calibrated X-axis stage position at image acquisition time

Unit:
  micrometers

Timestamp:
  hardware trigger timestamp

Valid when:
  axis is homed
  calibration is active
  workflow is in inspection step

Invalid when:
  machine is not homed
  axis is moving without trigger synchronization
  calibration version is unknown
```

That is a real contract.

Not just:

```text
StageXPosition: number
```
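Parts of such a contract can also be executable. The following is a minimal sketch of the StageXPositionUm validity rules as a C# predicate. `AcquisitionContext` and the `WorkflowStep` values are hypothetical. A moving axis is treated as valid only when acquisition is trigger-synchronized, which is one reading of the "Invalid when" rule.

```csharp
// Hypothetical workflow steps; only Inspection matters for this contract.
public enum WorkflowStep { Loading, Alignment, Inspection, Unloading }

// Context needed to judge whether a StageXPositionUm value is valid.
public sealed record AcquisitionContext(
    bool AxisHomed,
    bool AxisMoving,
    bool TriggerSynchronized,
    bool CalibrationActive,
    string CalibrationVersion,   // empty when unknown
    WorkflowStep Step);

public static class StageXPositionUmContract
{
    public const string Meaning = "Calibrated X-axis stage position at image acquisition time";
    public const string Unit = "µm";
    public const string TimestampMeaning = "hardware trigger timestamp";

    // Encodes the "Valid when" and "Invalid when" rules of the contract.
    public static bool IsValid(AcquisitionContext c) =>
        c.AxisHomed
        && c.CalibrationActive
        && !string.IsNullOrEmpty(c.CalibrationVersion)
        && c.Step == WorkflowStep.Inspection
        && (!c.AxisMoving || c.TriggerSynchronized);
}
```

Contract tests can then check the producer and every consumer against the same predicate instead of against a prose description.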

PART 9 — INTERVIEW / REAL-WORLD TALKING POINTS

A strong interview explanation could be:

In industrial systems, data modeling is not just about structure. It is about preserving meaning across boundaries. A PLC may expose registers and bits, machine software may model states and workflows, SCADA may model tags, and MES may model jobs, lots, and production records. The integration risk is that values can be technically valid but semantically wrong. For example, a position value may be interpreted as millimeters in one system and micrometers in another, or a status code may mean controller-ready in the PLC but production-ready in MES. A strong design uses explicit mapping layers, unit definitions, state translation tables, timestamp semantics, data freshness checks, and contract tests so that every system interprets the data consistently.


Why format is not enough

Format tells you:

```text
This is an integer.
This is a string.
This is a timestamp.
This field is required.
```

Semantics tells you:

```text
This integer is scaled by 1000.
This status is from the PLC state machine.
This timestamp means acquisition time.
This value is valid only after homing.
This result belongs to a specific recipe version.
```

Format prevents parsing errors.

Semantics prevents wrong decisions.


Common mistakes engineers make

New engineers often:

```text
treat PLC addresses as business meaning
trust field names without checking units
map status codes directly across systems
ignore timestamp meaning
forget data freshness
mix raw and calibrated values
assume MES, SCADA, PLC, and machine states mean the same thing
hide conversions inside random services
skip contract tests for mappings
```

These mistakes are common because the data usually “works” in demos.

The problems appear during real production, with real recipes, real timing, partial failures, retries, stale values, and different machine variants.


What strong engineers understand

Strong industrial software engineers understand that:

```text
Data has meaning, not just shape.
Units are part of the contract.
State names are not universal.
Timestamps need precise meaning.
Raw values and engineering values must be separated.
Context determines validity.
Mapping logic deserves tests.
Different systems may have different truths temporarily.
Semantic documentation is production protection.
```

The key mindset is:

Never ask only “Did we receive the data?” Ask “Did we receive the right meaning at the right time, in the right context?”

That is the heart of data modeling and semantics across industrial systems.
