Lesson 3: Separating Pipeline Stages
Time: 15-25 minutes
Source section: Producer-consumer model; ingestion, processing, persistence, and UI stages.
Speaking Goal
Describe a channel-based pipeline architecture clearly enough for an architecture review.
Core Idea
A production pipeline gives each stage one job. Ingestion accepts machine data, processing enriches it, persistence writes durable records, and UI publishing sends controlled, batched updates to the view. This separation reduces timing coupling between stages and keeps WPF-specific logic out of core processing.
Reusable English Sentence Structure: Stage, Responsibility, Boundary, Benefit
Use these sentence frames when explaining architecture:
I would model this as [number/type] stages rather than one large handler.
The [stage name] stage is responsible for [owned work].
It should not know about [out-of-scope concern], because that would couple it to [cost/risk].
The boundary is [channel/interface/contract], so each stage receives only [what it needs].
The benefit is that [slow/failing stage] becomes [controlled effect] instead of [production failure].
Example:
I would model this as separate ingestion, processing, persistence, and UI publishing stages.
The ingestion stage is responsible for accepting machine data and handing it off quickly.
It should not know about database writes or WPF updates, because that would couple callbacks to downstream latency.
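The boundary is the channel reader or writer, so each stage receives only the side it needs.
The benefit is that a slow database becomes a growing queue or a backpressure signal instead of a stalled machine callback.
Model Answer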
I would model the flow as a set of stages. The ingestion stage receives raw machine events and writes them into a bounded raw channel. Its responsibility is to accept and hand off data quickly; it should not do database writes, UI updates, or heavy classification.
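As a minimal sketch of that ingestion hand-off, the Go program below uses a buffered channel as a stand-in for a bounded .NET channel. Go is used only for illustration, since the lesson itself targets C#. `RawEvent`, `Ingestor`, and `OnMachineEvent` are hypothetical names. The policy shown is an assumption: when the buffer is full, the event is rejected and counted so the callback never blocks. The alternative is to wait for space, which turns a full queue into backpressure on the producer.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// RawEvent is a hypothetical message type produced by the ingestion stage.
type RawEvent struct {
	MachineID string
	Payload   []byte
}

// Ingestor holds only the sending side of the bounded raw channel.
type Ingestor struct {
	out      chan<- RawEvent
	rejected atomic.Int64 // events refused because the buffer was full (Go 1.19+)
}

// OnMachineEvent is a hypothetical callback entry point. It never blocks:
// if the bounded channel is full, the event is counted and rejected.
func (ing *Ingestor) OnMachineEvent(ev RawEvent) bool {
	select {
	case ing.out <- ev:
		return true // handed off; downstream stages do the heavy work
	default:
		ing.rejected.Add(1)
		return false
	}
}

func main() {
	raw := make(chan RawEvent, 2) // bounded raw channel with capacity 2
	ing := &Ingestor{out: raw}

	// No consumer is running, so the third event finds the buffer full.
	for i := 0; i < 3; i++ {
		ok := ing.OnMachineEvent(RawEvent{MachineID: "m1"})
		fmt.Println("accepted:", ok)
	}
	fmt.Println("queued:", len(raw), "rejected:", ing.rejected.Load())
}
```

Running it prints `accepted: true` twice, then `accepted: false`, then `queued: 2 rejected: 1`. In .NET, `ChannelWriter<T>.TryWrite` on a bounded channel in the default `Wait` full mode plays a similar role: it returns `false` instead of blocking.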
The processing stage reads raw events, validates or enriches them, and writes processed defects into the next channel. Persistence can then batch and save those processed records. Separately, a UI publishing stage can create smaller visual batches for the WPF view model.
The important boundary is that each stage receives only the channel side it needs. Producers get a ChannelWriter<T>; consumers get a ChannelReader<T>. That keeps ownership clear and prevents the pipeline from turning into a shared mutable coordination object.
The benefit is operational stability. A slow UI does not directly slow ingestion, and a disk spike becomes a controlled queue or backpressure event instead of an unpredictable delay inside the machine callback.
Challenge Questions with Sample Answers
Question:
Why split processing and persistence? Could the processor just save the result?
Sample answer:
It could in a small demo, but then CPU-bound transformation and I/O latency are coupled: every slow write stalls processing. Persistence often benefits from batching and has different failure behavior. By separating them, we can tune each stage's throughput, retry or fail the persistence path intentionally, and keep the transformation logic focused.
Question:
Does this design make debugging harder?
Sample answer:
It can, if we do not add observability. I would track queue depth per channel, processed counts per stage, dropped UI batches, batch flush timing, and stage-level errors. With those signals, debugging is usually easier, because each stage has a clear responsibility and a problem shows up in that stage's metrics.
Question:
What is the risk of too many stages?
Sample answer:
Too many stages can add latency and operational complexity, because every extra stage adds another queue hop and more to monitor. I would split stages only where there is a real difference in responsibility, speed, failure behavior, or scaling need. The goal is clear flow control, not a decorative pipeline.
Sample Conversation
Reviewer:
Why does the ingestor only write to a channel? It looks too thin.
You:
That thinness is intentional. The machine callback is a sensitive boundary, so I want it to stay predictable. The ingestor translates the SDK event into our own message type and hands it off. The heavier work belongs in stages we control and can test independently.
Practice Drill
Draw the pipeline on paper, then explain each stage using the Stage, Responsibility, Boundary, Benefit frame.
Self-check:
- Did I avoid listing classes without explaining why they exist?
- Did I name what each stage must not do?
- Did I mention operational benefits?
- Did I keep the explanation under three minutes?