LCN Wafer Inspection

Message Framing & Parsing

This topic sits inside the communication area of your roadmap, specifically “Protocol framing and parsing” in the industrial communication domain.

PART 1 — WHY FRAMING & PARSING MATTER

In industrial systems, the transport layer gives your software bytes, not meaningful business objects.

A serial port does not tell you, “here is one complete status message.” A TCP socket does not tell you, “here is exactly one command response.”

It just gives you whatever bytes happened to arrive at that moment.

That means software has to solve a foundational problem:

Where does one message begin and end? That is framing.
Once I have one whole message, how do I interpret it safely? That is parsing.

If you get either part wrong, the system may appear to work most of the time and then fail in ugly, intermittent ways.

Why this is harder than many engineers expect

A transport read can return:

half of a message
one whole message
one and a half messages
three messages combined together
a message plus corrupted junk
nothing at all, even though more data will come later

That is normal behavior, not an edge case.

Example: serial stream

Imagine a device sends two framed messages one after another:

<STX>CMD,123<ETX><STX>STS,OK<ETX>

Your read operation might return:

Read #1: <STX>CMD,
Read #2: 123<ETX><STX>ST
Read #3: S,OK<ETX>

The transport did nothing wrong. Your software must reconstruct the intended messages.

Example: TCP stream

Suppose the device logically sends:

LEN=05 HELLO
LEN=05 WORLD

Your socket might return:

Read #1: LEN=05 HE
Read #2: LLOLEN=05 WORLD

Again, normal. TCP is a byte stream, not a message queue.

Why incorrect parsing is dangerous

Bad framing/parsing causes problems that look like device bugs, random timeouts, or “weird field issues”:

command interpreted incorrectly
status field shifted into wrong column
checksum validated against wrong bytes
parser waits forever because it lost synchronization
one bad byte corrupts every subsequent message
silent data corruption because parser accepted invalid structure

In industrial software, those are not small bugs. They can become:

wrong setpoints
false alarms
missed alarms
invalid motion parameters
incorrect measurement data
hard-to-reproduce production failures

The core lesson is simple:

Communication code is not about reading data. It is about reconstructing truth from imperfect byte streams.

PART 2 — WHAT IS “FRAMING”

Framing means defining how the receiver knows where a message starts and ends.

Without framing, a stream of bytes is just an undifferentiated river.

Common framing techniques

1. Delimiter-based framing

A special byte or sequence marks the end of each message, and sometimes the start as well.

Examples:

newline-terminated text
STX/ETX wrappers
CRLF terminators

<STX>TEMP,42<ETX><STX>STATUS,OK<ETX>

Good for:

simple protocols
human-readable protocols
instrument-style command sets

Risks:

delimiter may appear inside payload unless escaped
lost delimiter can misalign the stream
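
A minimal Python sketch of delimiter-based extraction, assuming STX and ETX are the ASCII control bytes 0x02 and 0x03 and that the payload never contains them (no escaping is handled). The function name extract_delimited is hypothetical. It replays the three serial reads from Part 1:

```python
STX = 0x02  # ASCII start-of-text (assumed start marker)
ETX = 0x03  # ASCII end-of-text (assumed end marker)


def extract_delimited(buffer: bytearray) -> list[bytes]:
    """Remove every complete STX...ETX frame from buffer and return the payloads.

    Bytes of an unfinished frame stay in the buffer for the next call.
    """
    frames = []
    while True:
        start = buffer.find(STX)
        if start < 0:
            buffer.clear()          # no frame start anywhere: everything is junk
            return frames
        del buffer[:start]          # drop junk in front of STX
        end = buffer.find(ETX, 1)
        if end < 0:
            return frames           # frame not finished yet, wait for more bytes
        frames.append(bytes(buffer[1:end]))
        del buffer[:end + 1]


buf = bytearray()
received = []
for chunk in (b"\x02CMD,", b"123\x03\x02ST", b"S,OK\x03"):
    buf += chunk
    received += extract_delimited(buf)
# received now holds b"CMD,123" and b"STS,OK"
```

Note that this sketch does nothing about a lost ETX: the frame would simply swallow the next STX, which is exactly the misalignment risk listed above.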

2. Fixed-length framing

Every message is exactly N bytes.

[10 bytes][10 bytes][10 bytes]

Good for:

highly regular device packets
simple low-level controllers

Risks:

inflexible
one-byte shift destroys alignment
versioning becomes harder
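
Fixed-length extraction is the simplest case to sketch. The 10-byte size below is taken from the illustration above; the function name extract_fixed is hypothetical:

```python
FRAME_SIZE = 10  # assumed packet size, matching the 10-byte illustration


def extract_fixed(buffer: bytearray, size: int = FRAME_SIZE) -> list[bytes]:
    """Cut as many whole size-byte frames as the buffer holds; keep the rest."""
    frames = []
    while len(buffer) >= size:
        frames.append(bytes(buffer[:size]))
        del buffer[:size]
    return frames


buf = bytearray(range(25))      # 25 bytes: two whole frames plus 5 left over
frames = extract_fixed(buf)
```

There is nothing in this code that could notice a one-byte shift. With pure fixed-length framing, alignment has to be protected by something else, such as a marker byte or a checksum inside each packet.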

3. Length-prefixed framing

Header says how long the payload is.

[Header][Length=12][12 bytes payload]

Good for:

binary protocols
variable-size messages
efficient stream handling

Risks:

corrupted length field can break everything
parser must defend against absurd lengths
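
A minimal sketch of length-prefixed extraction with a defensive upper bound. It assumes a single length byte directly followed by the payload and a protocol maximum of 32 payload bytes; both are assumptions for illustration, as are the names FrameError and extract_length_prefixed:

```python
from typing import Optional

MAX_PAYLOAD = 32  # assumed protocol limit


class FrameError(Exception):
    """Raised when the declared length cannot be valid."""


def extract_length_prefixed(buffer: bytearray) -> Optional[bytes]:
    """Return one payload, or None if the frame is not complete yet."""
    if not buffer:
        return None
    length = buffer[0]
    if length > MAX_PAYLOAD:
        # Reject early instead of waiting for bytes that will never make sense.
        raise FrameError(f"declared length {length} exceeds {MAX_PAYLOAD}")
    if len(buffer) < 1 + length:
        return None                      # payload still arriving
    payload = bytes(buffer[1:1 + length])
    del buffer[:1 + length]
    return payload


buf = bytearray(b"\x05HELLO\x05WOR")
first = extract_length_prefixed(buf)     # complete frame
second = extract_length_prefixed(buf)    # None: only 3 of 5 payload bytes so far
```

What the caller does after a FrameError (drop one byte, drop the buffer, search for a header) depends on the protocol; the point here is only that the length is checked before it is trusted.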

4. Header-based framing

A known header pattern starts the frame, often followed by type, length, and checksum.

[AA][55][Type][Length][Payload][CRC]

Good for:

binary industrial protocols
protocols needing resynchronization
protocols with multiple message types

Risks:

header pattern may appear accidentally in corrupted data
parser must know how to recover from false positives

ASCII data stream diagram

Raw byte stream
+---------------------------------------------------------------+
| 7E 01 04 A1 B2 C3 D4 7E 02 02 11 22 7E 03 03 99 88 77 7E ... |
+---------------------------------------------------------------+

Framed as delimiter-based messages
+-------------------+ +-------------+ +----------------+
| 7E 01 04 A1 B2... | | 7E 02 02... | | 7E 03 03 ...  |
+-------------------+ +-------------+ +----------------+
   Msg 1                 Msg 2           Msg 3

What this diagram means

The stream arrives as one continuous sequence. The parser’s first job is to identify the frame boundaries so the rest of the software can work on complete messages.

PART 3 — BUFFERING & STREAM PROCESSING

Because reads are arbitrary, incoming data must be buffered.

You do not parse directly from each read as if it were a complete message. You append bytes to a buffer, then repeatedly try to extract complete frames from that buffer.

That is the standard mental model.

Core loop

receive bytes
append to buffer
inspect buffer
if a full frame exists, extract it
leave incomplete remainder in buffer
wait for more bytes
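
The loop above can be sketched as a small class that owns the buffer and delegates "is there a full frame?" to a pluggable function. StreamFramer and extract_line are hypothetical names, and the newline-terminated extractor is only a stand-in for a real protocol:

```python
from typing import Callable, Optional

Extractor = Callable[[bytearray], Optional[bytes]]


class StreamFramer:
    """Owns the receive buffer and runs the extract loop after every read."""

    def __init__(self, extract_one: Extractor) -> None:
        self._buffer = bytearray()
        self._extract_one = extract_one

    def feed(self, chunk: bytes) -> list[bytes]:
        self._buffer += chunk                          # append to buffer
        frames = []
        while True:
            frame = self._extract_one(self._buffer)    # inspect buffer
            if frame is None:                          # nothing complete: wait
                return frames
            frames.append(frame)                       # full frame extracted


def extract_line(buffer: bytearray) -> Optional[bytes]:
    """Stand-in extractor for newline-terminated messages."""
    end = buffer.find(b"\n")
    if end < 0:
        return None
    line = bytes(buffer[:end])
    del buffer[:end + 1]
    return line


framer = StreamFramer(extract_line)
first = framer.feed(b"TEMP,42\nSTAT")    # one full line plus a partial one
second = framer.feed(b"US,OK\n")         # completes the partial line
```

The incomplete remainder never leaves the framer, so the transport callback can stay a one-liner that passes each chunk to feed.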

ASCII buffer diagram

After Read #1
Buffer:
+----------------------+
| AA 55 03 10 20       |
+----------------------+
         ^
         incomplete frame, wait for more

After Read #2
Buffer:
+--------------------------------------+
| AA 55 03 10 20 30 AA 55 02 99        |
+--------------------------------------+

Extract frame 1:
+------------------+
| AA 55 03 10 20 30|
+------------------+

Remaining buffer:
+------------------+
| AA 55 02 99      |
+------------------+
         ^
         incomplete frame, wait for more

What this diagram means

The parser does not “forget” partial data between reads. It preserves incomplete bytes, extracts only what is complete, and leaves leftovers for the next read.

Partial message handling

Suppose framing is length-prefixed:

[Type][Length][Payload]

If buffer contains:

01 05 A0 B1

That says the payload should contain 5 bytes, but only 2 have arrived so far. The correct behavior is not failure. The correct behavior is:

keep buffer as-is
wait for more bytes
continue once the payload is complete
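
A short sketch of that behavior for the [Type][Length][Payload] layout, assuming one byte each for type and length. The function name try_extract and the three bytes that arrive later are made up for the example:

```python
from typing import Optional, Tuple


def try_extract(buffer: bytearray) -> Optional[Tuple[int, bytes]]:
    """Return (type, payload) once the whole frame has arrived, else None."""
    if len(buffer) < 2:                 # type and length bytes not both here yet
        return None
    msg_type, length = buffer[0], buffer[1]
    if len(buffer) < 2 + length:        # payload still arriving: leave buffer alone
        return None
    payload = bytes(buffer[2:2 + length])
    del buffer[:2 + length]
    return msg_type, payload


buf = bytearray.fromhex("01 05 A0 B1")
early = try_extract(buf)                # None, and buf is unchanged
buf += bytes.fromhex("C2 D3 E4")        # the remaining three payload bytes arrive
done = try_extract(buf)                 # now a complete frame
```

Returning None is not an error path. It is the normal answer to "not yet".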

Leftover handling

If one read contains:

one complete message
plus the beginning of the next one

you must extract only the complete message and keep the remainder.

This is where many naive implementations fail. They parse one message correctly but then either discard trailing bytes or accidentally mix them into the next read.

Design implication

Your parser should think in terms of:

stream buffer
cursor / consumption position
frame extraction loop

Not in terms of:

“one read equals one message”

That assumption is one of the most common bugs in device integrations.

PART 4 — PARSING STRUCTURED MESSAGES

Once framing gives you a complete message, parsing begins.

Framing answers: “Do I have one whole message?”

Parsing answers: “What does this message contain, and is it valid?”

Typical parsing steps

identify message type
extract fields
validate structure
convert raw values into usable data
produce a message object or parse result

Example structure

[Header][Type][Length][Payload][Checksum]

Payload might itself contain fields:

[DeviceId][Status][Value]

ASCII message structure diagram

+--------+------+--------+-------------------+----------+
| Header | Type | Length | Payload           | Checksum |
+--------+------+--------+-------------------+----------+
|  AA55  |  02  |   03   | 10 01 7F          |   9C     |
+--------+------+--------+-------------------+----------+

Payload decoded as:
+----------+--------+-------+
| DeviceId | Status | Value |
+----------+--------+-------+
|   0x10   |  0x01  | 0x7F  |
+----------+--------+-------+

What this diagram means

The parser first validates the outer frame structure. Then it interprets the payload according to the message type.
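
A sketch of that two-step decode for the example frame in the diagram. It assumes that type 0x02 is the three-field status message and it treats the checksum byte as opaque, because the text does not say how 9C is computed. StatusMessage and parse_frame are hypothetical names:

```python
from dataclasses import dataclass

HEADER = bytes.fromhex("AA55")
STATUS_TYPE = 0x02   # assumed: type 0x02 carries DeviceId, Status, Value


@dataclass(frozen=True)
class StatusMessage:
    device_id: int
    status: int
    value: int


def parse_frame(frame: bytes) -> StatusMessage:
    """Decode one complete frame; checksum verification is left out here."""
    # Outer structure first: header, then size consistency.
    if len(frame) < 5 or frame[:2] != HEADER:
        raise ValueError("bad header or frame too short")
    msg_type, length = frame[2], frame[3]
    if len(frame) != 4 + length + 1:     # 4 header bytes + payload + 1 checksum byte
        raise ValueError("length field does not match frame size")
    # Then the payload, interpreted according to the message type.
    if msg_type != STATUS_TYPE or length != 3:
        raise ValueError(f"unsupported message type 0x{msg_type:02X}")
    payload = frame[4:4 + length]
    return StatusMessage(*payload)       # bytes unpack into three ints


msg = parse_frame(bytes.fromhex("AA55 02 03 10017F 9C"))
```

Business code receives msg.device_id, msg.status, and msg.value and never sees a byte offset.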

Important distinction

A frame can be structurally complete but semantically invalid.

For example:

length matches actual bytes
checksum passes
but message type is unknown
or payload field values are out of allowed range

A robust parser treats those as different categories:

framing valid
syntax valid
semantic meaning maybe invalid

That separation matters for debugging.

Practical parser outputs

Good parsers usually return something like:

success with parsed message
incomplete frame, need more data
invalid frame, discard or resync
unsupported but structurally valid message
internal parser error

That is much better than throwing random exceptions at every bad input.
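
One way to make those outcomes explicit is an enum plus a small result type. This is a sketch, not a prescribed design: ParseStatus, ParseResult, classify, the set of known types, and the [Type][Length][Payload] layout are all assumptions for illustration:

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional


class ParseStatus(Enum):
    OK = auto()
    INCOMPLETE = auto()
    INVALID = auto()
    UNSUPPORTED = auto()
    INTERNAL_ERROR = auto()


@dataclass(frozen=True)
class ParseResult:
    status: ParseStatus
    message: Optional[bytes] = None    # decoded payload when status is OK
    reason: str = ""                   # human-readable detail for logs
    consumed: int = 0                  # bytes the caller should drop from the buffer


KNOWN_TYPES = {0x01, 0x02}             # assumed set of supported message types


def classify(buffer: bytes) -> ParseResult:
    """Toy classifier for an assumed [Type][Length][Payload] frame."""
    if len(buffer) < 2 or len(buffer) < 2 + buffer[1]:
        return ParseResult(ParseStatus.INCOMPLETE, reason="need more data")
    size = 2 + buffer[1]
    if buffer[0] not in KNOWN_TYPES:
        # Structurally fine, just not a type we know: skip it cleanly.
        return ParseResult(ParseStatus.UNSUPPORTED, reason="unknown type", consumed=size)
    return ParseResult(ParseStatus.OK, message=bytes(buffer[2:size]), consumed=size)
```

The caller can then branch on status instead of catching exceptions, and every non-OK result carries a reason that can go straight into a log line.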

PART 5 — HANDLING PARTIAL & CORRUPTED DATA

In real systems, data is often imperfect.

Not constantly broken, but imperfect often enough that your parser must expect it.

Three common bad states

1. Incomplete data

You do not yet have a full frame.

Example:

AA 55 04 10 20

Length says 4 payload bytes, but only 2 are present.

Correct response:

keep buffer
wait for more bytes

2. Misaligned data

Your buffer starts in the middle of a frame or contains junk before a valid frame.

Example:

99 88 FF AA 55 02 11 22 7C

Correct response:

scan for the next plausible frame start
discard bytes before it
continue carefully

This is called resynchronization.
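
A minimal resynchronization sketch for the AA 55 header used in these examples. The function name resync is hypothetical. It only finds a plausible start; the frame that follows still has to pass length and checksum validation:

```python
HEADER = bytes.fromhex("AA55")


def resync(buffer: bytearray) -> int:
    """Drop bytes in front of the next header candidate; return how many were dropped."""
    start = buffer.find(HEADER)
    if start < 0:
        # No header found. Keep a trailing 0xAA: it may be the first half of a
        # header whose 0x55 has not arrived yet.
        keep = 1 if buffer.endswith(HEADER[:1]) else 0
        dropped = len(buffer) - keep
    else:
        dropped = start
    del buffer[:dropped]
    return dropped


buf = bytearray.fromhex("99 88 FF AA 55 02 11 22 7C")
dropped = resync(buf)        # the three junk bytes in front of AA 55 are removed
```

Returning the dropped count makes it easy to log every resync event along with how much data was thrown away.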

3. Corrupted data

Frame exists, but structure or integrity check fails.

Examples:

invalid checksum
impossible length
illegal type
truncated payload

Correct response may be:

reject frame
log raw bytes
discard frame
attempt resync to next valid header

Checksum / CRC validation

Checksums are not the full answer, but they are a major defense.

They help detect:

flipped bits
truncated messages
accidental concatenation errors
some kinds of misalignment

They do not guarantee correctness of meaning. They only help validate byte integrity.
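
A small sketch of integrity checking. The text does not say which checksum the example frames use, so this assumes a plain XOR over all preceding bytes purely for illustration; real protocols specify their own algorithm, often a CRC. The names xor_checksum and is_intact are hypothetical:

```python
from functools import reduce


def xor_checksum(data: bytes) -> int:
    """XOR of all bytes; a stand-in for whatever the real protocol specifies."""
    return reduce(lambda acc, b: acc ^ b, data, 0)


def is_intact(frame: bytes) -> bool:
    """True when the last byte equals the checksum of everything before it."""
    return len(frame) >= 2 and xor_checksum(frame[:-1]) == frame[-1]


body = bytes.fromhex("AA 55 02 11 22")
good = body + bytes([xor_checksum(body)])

flipped = bytearray(good)
flipped[3] ^= 0x04                       # a single flipped bit is detected

swapped = bytes.fromhex("AA 55 02 22 11") + good[-1:]   # two payload bytes swapped
```

Here is_intact(good) is true and is_intact(bytes(flipped)) is false, but is_intact(swapped) is also true, because XOR does not depend on byte order. That is a concrete case of a check passing on bytes that are not what the sender meant, and one reason stronger checks such as CRCs are common.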

Why robustness is critical

In industrial software, parsers sit at the trust boundary between your software and the outside world.

That outside world may include:

noisy serial lines
buggy vendor firmware
device restarts mid-stream
mixed protocol versions
partial initialization states
unexpected diagnostic messages

A parser that assumes perfect input is not production-grade.

A production parser must be able to say:

this is incomplete
this is invalid
this looks like a new frame start
this cannot be trusted
this should be dropped without poisoning the rest of the stream

PART 6 — REAL-WORLD FAILURE SCENARIOS

This is where the topic becomes very real.

1. Message split across multiple reads

What it looks like

Read #1: AA 55 05 10 20
Read #2: 30 40 50 9C

Why it happens

Transport read boundaries are unrelated to message boundaries.

How engineers debug it

They inspect raw receive logs and notice the parser assumed each read was a whole packet.

Typical symptoms:

“works sometimes on local machine”
“fails randomly in production”
“device responses occasionally time out”

The actual cause is often: parser discarded partial data instead of buffering it.

2. Multiple messages in one buffer

What it looks like

Read #1: AA 55 02 11 22 7C AA 55 01 33 44

Why it happens

The OS or driver delivered more than one logical message in a single read.

How engineers debug it

They see the first message processed successfully and the second one mysteriously missing.

Typical root cause:

parser extracted one frame and ignored remaining bytes
or code returned too early after one successful parse

Strong parsers loop until no more complete frames remain.

3. Lost delimiter causing misalignment

What it looks like

Delimiter-based protocol:

<STX>ABC<ETX><STX>DEF<ETX>

But one ETX is lost:

<STX>ABC<STX>DEF<ETX>

Now the parser may think:

ABC<STX>DEF

is one giant invalid frame.

Why it happens

Noise, device bug, or incorrect escaping.

How engineers debug it

They look at raw dumps and compare against expected framing markers.

Good engineers ask:

can my parser recover after delimiter loss?
or will one missing delimiter corrupt the whole session?

This is why resynchronization strategy matters.

4. Corrupted data causing incorrect parsing

What it looks like

Length field says 120 bytes when the protocol maximum is 32.

Why it happens

Corruption, stale buffer content, endian mismatch, or parser bug.

How engineers debug it

They add validation logs:

raw bytes
computed length
expected max length
message type
checksum result

Without those, debugging becomes guesswork.

A defensive parser rejects absurd values early.

5. Parser stuck waiting for data that never arrives

What it looks like

Buffer contains header and declared payload length, but device stopped sending.

AA 55 08 10 20 30

Parser keeps waiting forever for remaining bytes.

Why it happens

Device restart, cable issue, transport timeout, sender bug.

How engineers debug it

They correlate:

receive timestamps
buffer state
parser state
timeout events

A robust design must separate:

incomplete but still plausible
incomplete for too long, now stale and invalid

Otherwise the parser can stall the protocol session indefinitely, even though no thread is actually deadlocked.
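
One way to draw that line is an idle timeout: if the buffer holds a partial frame and no new bytes have arrived for some time, drop it. The sketch below assumes a 0.5 second limit and takes the clock as a parameter so it can be tested without sleeping. StaleFrameGuard and its method names are hypothetical:

```python
import time


class StaleFrameGuard:
    """Drops a partial frame when no bytes have arrived for too long."""

    def __init__(self, max_idle_s: float = 0.5, clock=time.monotonic) -> None:
        self._max_idle_s = max_idle_s    # assumed timeout; tune per device
        self._clock = clock
        self._last_rx = clock()

    def on_bytes(self) -> None:
        """Call whenever the transport delivers data."""
        self._last_rx = self._clock()

    def drop_if_stale(self, buffer: bytearray) -> bool:
        """Call on a periodic tick; returns True if a stale partial frame was dropped."""
        if buffer and self._clock() - self._last_rx > self._max_idle_s:
            buffer.clear()
            return True
        return False


fake_now = [0.0]                                   # controllable clock for the demo
guard = StaleFrameGuard(clock=lambda: fake_now[0])
buf = bytearray.fromhex("AA 55 08 10 20 30")       # 3 of 8 payload bytes, then silence
guard.on_bytes()

fake_now[0] = 0.2
still_waiting = guard.drop_if_stale(buf)           # within the limit: keep waiting

fake_now[0] = 1.0
dropped = guard.drop_if_stale(buf)                 # idle too long: partial frame dropped
```

Whether a stale frame should also trigger a retry or an alarm is a protocol-level decision that belongs above the parser.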

6. Incorrect assumption about message boundaries

What it looks like

Developer tests against a simulator that always sends one message per read. Real device later batches messages or splits them.

Why it happens

Test environment was too clean.

How engineers debug it

They compare simulator behavior vs field behavior and discover the parser was accidentally built around a transport artifact.

This is a classic commissioning bug.

Strong engineers test with:

partial frames
combined frames
junk prefixes
corrupted bytes
delayed completion

Not just happy-path packets.
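
A cheap way to test split and combined frames is to replay one recorded stream with every possible cut point and check that the extracted frames never change. The sketch below uses a newline-terminated extractor as the code under test; that protocol and all function names are assumptions for illustration:

```python
def extract_lines(buffer: bytearray) -> list[bytes]:
    """Frame extractor under test: newline-terminated messages (assumed protocol)."""
    frames = []
    while (end := buffer.find(b"\n")) >= 0:
        frames.append(bytes(buffer[:end]))
        del buffer[:end + 1]
    return frames


def frames_for_chunks(chunks: list[bytes]) -> list[bytes]:
    """Feed the chunks through a fresh buffer, as separate transport reads."""
    buffer, frames = bytearray(), []
    for chunk in chunks:
        buffer += chunk
        frames += extract_lines(buffer)
    return frames


def check_all_splits(stream: bytes) -> None:
    """The frames must not depend on where the transport cut the stream."""
    expected = frames_for_chunks([stream])
    for cut in range(len(stream) + 1):
        assert frames_for_chunks([stream[:cut], stream[cut:]]) == expected, cut
    # Worst case: every byte arrives in its own read.
    assert frames_for_chunks([bytes([b]) for b in stream]) == expected


check_all_splits(b"CMD,123\nSTS,OK\nPARTIAL")
```

A parser built on "one read equals one message" fails this check immediately, long before it reaches a real device.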

PART 7 — DESIGNING ROBUST PARSERS

A good industrial parser is not clever. It is deterministic, boring, defensive, and inspectable.

That is what you want.

Principles

Deterministic parsing logic

Same input buffer should always produce the same decision:

incomplete
valid frame extracted
invalid frame rejected
resync applied

No hidden heuristics unless the protocol truly requires them.

Clear parser state machine

Even if not formally modeled, the logic should behave like a state machine:

searching for start
reading header
reading declared length
waiting for remaining bytes
validating checksum
emitting frame
resynchronizing after failure

That mental model keeps the code understandable.
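
For illustration, here is the same idea as an explicit byte-at-a-time state machine. The frame layout (AA 55, one length byte, payload, one XOR checksum over length and payload) and the 32-byte limit are assumptions, not something the examples above define:

```python
from enum import Enum, auto
from typing import Optional

MAX_PAYLOAD = 32  # assumed protocol limit


class State(Enum):
    SEARCH_START = auto()
    EXPECT_SECOND_HEADER_BYTE = auto()
    READ_LENGTH = auto()
    READ_PAYLOAD = auto()
    READ_CHECKSUM = auto()


class FrameStateMachine:
    """Consumes one byte at a time for an assumed AA 55 | len | payload | xor frame."""

    def __init__(self) -> None:
        self.state = State.SEARCH_START
        self.length = 0
        self.payload = bytearray()

    def push(self, byte: int) -> Optional[bytes]:
        """Return the payload when a valid frame completes, otherwise None."""
        if self.state is State.SEARCH_START:
            if byte == 0xAA:
                self.state = State.EXPECT_SECOND_HEADER_BYTE
        elif self.state is State.EXPECT_SECOND_HEADER_BYTE:
            if byte == 0x55:
                self.state = State.READ_LENGTH
            elif byte != 0xAA:               # AA AA 55 still counts as a header
                self.state = State.SEARCH_START
        elif self.state is State.READ_LENGTH:
            if byte > MAX_PAYLOAD:
                self.state = State.SEARCH_START      # absurd length: resync
            else:
                self.length, self.payload = byte, bytearray()
                self.state = State.READ_PAYLOAD if byte else State.READ_CHECKSUM
        elif self.state is State.READ_PAYLOAD:
            self.payload.append(byte)
            if len(self.payload) == self.length:
                self.state = State.READ_CHECKSUM
        else:                                # READ_CHECKSUM
            self.state = State.SEARCH_START
            expected = self.length
            for b in self.payload:
                expected ^= b
            if byte == expected:
                return bytes(self.payload)
        return None


machine = FrameStateMachine()
stream = bytes.fromhex("99 88 AA 55 02 11 22 31")    # junk, then one valid frame
frames = [f for f in map(machine.push, stream) if f is not None]
```

One limitation of this sketch: when a checksum fails, the bytes of the rejected frame are gone, so a real header hidden inside them is missed. A buffer-based parser can rescan from the byte after the false header instead.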

Defensive validation

Never trust incoming bytes blindly.

Validate things like:

minimum frame size
maximum frame size
header signature
supported message type
length consistency
checksum / CRC
field ranges where applicable

Rejecting bad data early is cheaper than letting corrupted state travel upward.

Log raw data when needed

Not always at full production verbosity, but you need a way to capture:

received raw bytes
timestamps
parser decisions
reason for rejection
resync events

Many field issues are only solvable if raw communication history can be inspected.
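
A minimal sketch of rejection logging with Python's standard logging module. The logger name, the 64-byte truncation limit, and the helper names are assumptions; timestamps come from the logging formatter rather than from the parser:

```python
import logging

log = logging.getLogger("comm.parser")   # hypothetical logger name


def hex_dump(data: bytes, limit: int = 64) -> str:
    """Space-separated hex, truncated so one bad burst cannot flood the log."""
    shown = data[:limit].hex(" ").upper()
    if len(data) <= limit:
        return shown
    return f"{shown} ... (+{len(data) - limit} bytes)"


def log_rejection(raw: bytes, reason: str) -> None:
    """Record why a frame was dropped, together with the bytes that caused it."""
    log.warning("frame rejected: reason=%s len=%d raw=[%s]",
                reason, len(raw), hex_dump(raw))


if __name__ == "__main__":
    # %(asctime)s adds the receive-side timestamp to every record.
    logging.basicConfig(format="%(asctime)s %(name)s %(levelname)s %(message)s")
    log_rejection(bytes.fromhex("AA 55 78 10 20"), "declared length exceeds maximum")
```

Keeping the hex formatting in one helper means every parser decision is logged in the same shape, which makes field logs searchable.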

ASCII parsing flow diagram

+------------------+
| Incoming bytes   |
+------------------+
          |
          v
+------------------+
| Append to buffer |
+------------------+
          |
          v
+-------------------------------+
| Is there enough for frame     |
| header/minimum structure?     |
+-------------------------------+
      | Yes                     | No
      v                         v
+----------------------------+  +------------------+
| Detect frame boundary      |  | Wait for more    |
| / locate valid start       |  +------------------+
+----------------------------+
          |
          v
+----------------------------+
| Is complete frame present? |
+----------------------------+
      | Yes                     | No
      v                         v
+----------------------------+  +------------------+
| Validate length/checksum   |  | Keep remainder   |
| / structure                |  | in buffer        |
+----------------------------+  +------------------+
      | Valid                  | Invalid
      v                        v
+------------------------+   +----------------------+
| Parse fields           |   | Discard / resync     |
| Emit message object    |   | Log reason           |
+------------------------+   +----------------------+

What this diagram means

The parser is a controlled pipeline:

buffer
detect
validate
extract
parse
recover

Not a loose pile of Substring, Split, or if statements.

PART 8 — SOFTWARE DESIGN IMPLICATIONS

This topic has big architectural consequences.

Parsing should be isolated from business logic

Business logic should not care about byte offsets, delimiters, checksum math, or resync rules.

That belongs in a dedicated communication/parsing layer.

Good layering

+-----------------------------+
| Application / Workflow      |
| "Start cycle"               |
| "Update device state"       |
+-----------------------------+
              ^
              |
+-----------------------------+
| Protocol Message Layer      |
| typed messages / DTOs       |
| command and response models |
+-----------------------------+
              ^
              |
+-----------------------------+
| Framing & Parsing Layer     |
| buffer, extract, validate   |
| resync, decode              |
+-----------------------------+
              ^
              |
+-----------------------------+
| Transport Layer             |
| serial / TCP / vendor API   |
| raw bytes in/out            |
+-----------------------------+

What this diagram means

Each layer has a clean job:

transport moves bytes
framing/parsing reconstructs messages
message layer represents protocol structures cleanly
application uses meaningful commands and events

That separation makes systems easier to test and debug.

Bad approach

Ad hoc parsing scattered everywhere:

socket callback slices bytes
service class splits strings
workflow interprets magic indexes
UI directly depends on protocol field positions

That creates:

duplicated assumptions
inconsistent validation
impossible debugging
fragile changes when protocol evolves

Good approach

Structured parsing pipeline:

one input buffer owner
one framing algorithm per protocol
one parser for message definitions
typed parse results
explicit invalid/incomplete states
test cases for edge conditions

Testability matters

A good parser can be tested entirely offline with byte arrays.

You should be able to unit test scenarios like:

frame split across three reads
two frames in one read
invalid checksum
junk before valid frame
declared length too large
unknown message type

That is one of the strongest design advantages of isolating the parsing layer.

PART 9 — INTERVIEW / REAL-WORLD TALKING POINTS

How to explain framing vs parsing clearly

A strong simple explanation:

Framing is how you identify where a message starts and ends in a raw byte stream. Parsing is how you decode that complete message into structured fields and validate that it is actually meaningful.

That is crisp and correct.

Why stream-based communication is tricky

Because reads do not align with messages. You can receive:

partial messages
multiple messages together
corrupted bytes
misaligned data after an error

So robust software must buffer, extract, validate, and recover.

Common mistakes engineers make

assuming one read equals one message
discarding incomplete data
parsing without clear framing
trusting length fields blindly
not handling junk or resynchronization
mixing byte parsing with business logic
lacking raw communication logs
testing only happy-path packets

What strong engineers understand

Strong engineers understand that parsing is not a small helper function. It is a reliability boundary.

They know that:

stream handling is stateful
imperfect input is normal
recovery after corruption matters as much as normal parsing
deterministic parser behavior is critical
observability is essential for field debugging
protocol code should be isolated and heavily tested

A strong interview answer

You could say something like this:

In industrial systems, transport layers usually deliver arbitrary chunks of bytes, not complete messages. So the software needs a framing layer to reconstruct message boundaries, then a parsing layer to decode and validate structured content. The hard part is not the happy path. The hard part is handling split frames, combined frames, corrupted data, and loss of synchronization without poisoning the whole communication session. Good designs isolate parsing from business logic, use deterministic state-based parsing, validate aggressively, and provide raw-byte diagnostics for field troubleshooting.

That sounds like someone who has moved past textbook protocol knowledge into production engineering thinking.

Final mental model

Think of the whole topic like this:

Raw bytes arrive
      ->
bytes are buffered
      ->
frame boundaries are detected
      ->
complete frame is validated
      ->
fields are parsed into typed data
      ->
invalid input is rejected or resynchronized
      ->
business logic sees only meaningful messages

That is the real job of message framing and parsing.
