PART 1 — BIG PICTURE
In industrial desktop systems, state is not a side detail. It is the system.
When you build a normal CRUD business app, state is often just “what values are currently in memory” or “what row is in the database.” But in a machine-control desktop app, state decides what the system is allowed to do next, what the UI should enable, which commands are safe, and how the app should react when something goes wrong.
A wafer inspection machine is a good example. At any moment, the machine may be:
- Idle
- Initializing
- Homing
- Ready
- Running
- Paused
- Stopping
- Error
- EmergencyStop
Those are not just labels for display. They control behavior.
If the machine is Idle, Start may be allowed. If it is Running, Start must be blocked. If it is Stopping, maybe neither Start nor Stop should be allowed. If it is Error, Reset may be allowed, but only after the hardware has acknowledged the fault.
That is why state management becomes critical. It is not about “organizing code nicely.” It is about preventing bad commands, protecting hardware, keeping UI honest, and making the system understandable.
Why simple flags fail at scale
At the beginning, many systems start like this:
bool isRunning;
bool isPaused;
bool isStopping;
bool hasError;
bool isHomed;
bool isReady;
This feels easy. It also becomes a mess very quickly.
Because now you must answer questions like:
- Can isRunning and hasError both be true?
- Can isPaused be true while isStopping is true?
- Does isReady = true imply isHomed = true?
- What does it mean if isRunning = false, isPaused = false, and isStopping = false, but the machine is still moving?
Booleans do not describe a system. They describe fragments of a system. Once many flags start interacting, you no longer have a clear model. You have a pile of partial truths.
That is where real bugs come from:
- UI says “Ready” but machine is not actually ready
- Start button is enabled during shutdown
- Stop command gets sent twice
- Error recovery skips a required step
- Logs show contradictory conditions
Why workflows must be modeled explicitly
Industrial workflows have meaning and order.
A real inspection workflow is not just “run some code.” It usually looks more like this:
- Recipe selected
- Validation started
- Hardware prepared
- Wafer loaded
- Alignment running
- Inspection running
- Results streaming
- Inspection completed
- Report finalized
- Return to ready state
That flow has rules. Some steps can repeat. Some cannot. Some can fail. Some can be canceled. Some need operator confirmation.
If you do not model that flow explicitly, the logic spreads everywhere:
- in ViewModels
- in machine service classes
- in button handlers
- in timers
- in hardware callback code
Then nobody knows where the workflow truly lives.
A state-driven design gives you one place to answer the most important question in the system:
“Given where we are now, what can happen next?”
That is the heart of production-grade machine software.
PART 2 — HOW IT ACTUALLY WORKS
A state machine is just a very disciplined way to model behavior.
It says:
- the system is currently in one state
- something happens, usually an event or command
- based on the current state, the system either transitions to another valid state or rejects the action
That is all. But that simple idea is extremely powerful.
Core pieces
State
A named condition of the system.
Examples:
- Idle
- Ready
- Running
- Stopping
- Error
Event / Trigger
Something that happens and may cause a transition.
Examples:
- user clicks Start
- machine reports MotionComplete
- inspection engine reports Completed
- hardware reports FaultDetected
- operator clicks Reset
Transition
A rule that says:
when in state X, event Y is allowed, and the system moves to state Z
Example:
- Idle + InitializeSucceeded -> Ready
- Ready + StartRequested -> Running
- Running + StopRequested -> Stopping
- Stopping + StopConfirmed -> Ready
- AnyState + FaultDetected -> Error
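The transition rules above can also be written as data instead of branching code. This is a minimal sketch, assuming a dictionary keyed by (state, trigger); the `State`, `Trigger` and `Transitions` names are hypothetical, and the "AnyState" fault rule is applied before the table lookup.

```csharp
using System.Collections.Generic;

public enum State { Idle, Ready, Running, Stopping, Error }
public enum Trigger { InitializeSucceeded, StartRequested, StopRequested, StopConfirmed, FaultDetected }

public static class Transitions
{
    // Each entry reads: "when in state X, trigger Y moves the system to state Z".
    private static readonly Dictionary<(State, Trigger), State> Table = new()
    {
        [(State.Idle, Trigger.InitializeSucceeded)] = State.Ready,
        [(State.Ready, Trigger.StartRequested)] = State.Running,
        [(State.Running, Trigger.StopRequested)] = State.Stopping,
        [(State.Stopping, Trigger.StopConfirmed)] = State.Ready,
    };

    // Returns true and the next state if the trigger is legal in the current state.
    // On rejection, 'next' is just default(State) and must not be used.
    public static bool TryNext(State current, Trigger trigger, out State next)
    {
        // "AnyState + FaultDetected -> Error" wins over the per-state rules.
        if (trigger == Trigger.FaultDetected)
        {
            next = State.Error;
            return true;
        }
        return Table.TryGetValue((current, trigger), out next);
    }
}
```

A rejected lookup is the natural place to log why, and the table itself doubles as readable documentation of the legal moves.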
Enforcing valid transitions
This is the most important part.
A state machine is useful because it does not just store state. It protects state.
Without a state machine, someone can accidentally do this:
if (!isRunning)
{
StartMachine();
}
Looks innocent. But what if the machine is actually:
- in Stopping
- in Error
- not homed
- disconnected
- waiting for door interlock
!isRunning is not enough.
A state machine forces the question to be explicit:
- Are we in Ready?
- Is StartRequested valid from Ready?
- If yes, move to Starting or Running
- If no, reject it and log why
That gives you much stronger guarantees.
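The same explicit question can be asked in code, including the extra preconditions listed above (homed, connected, door interlock). This is a minimal sketch: `MachineSnapshot` and `StartGuard` are hypothetical names, and a real system would fill the snapshot from hardware status rather than from constructor arguments.

```csharp
#nullable enable

public enum MachineState { Disconnected, Idle, Ready, Running, Stopping, Error }

// Hypothetical snapshot: the current state plus interlock and readiness facts.
public sealed record MachineSnapshot(MachineState State, bool IsHomed, bool IsConnected, bool DoorClosed);

public static class StartGuard
{
    // Returns null when Start is allowed, otherwise a reason that can be logged.
    public static string? WhyStartIsRejected(MachineSnapshot s)
    {
        if (s.State != MachineState.Ready) return $"Start is invalid in state {s.State}.";
        if (!s.IsConnected) return "Controller is disconnected.";
        if (!s.IsHomed) return "Axes are not homed.";
        if (!s.DoorClosed) return "Door interlock is open.";
        return null;
    }
}
```

Unlike `!isRunning`, a machine in Stopping or Error, or one with an open door, is rejected with a concrete reason.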
A useful mental model
Think of the state machine as a gatekeeper.
The gatekeeper answers:
- What state are we in?
- What events are allowed here?
- What state change happens next?
- What side effects must run during the transition?
That is much better than letting every part of the app make its own guess.
PART 3 — REAL PROBLEMS IN THIS SYSTEM
Take the running example: a WPF desktop app controlling a wafer inspection machine. This is exactly the kind of system where state bugs become expensive.
Invalid actions: Start while Running
This sounds trivial, but it is a classic real-world bug.
User clicks Start. The machine starts. UI is slightly delayed. User clicks Start again. Now one of these happens:
- duplicate command sent to controller
- second workflow instance starts
- internal buffers are reset mid-run
- command is ignored silently, causing operator confusion
- app enters inconsistent state
This is not really a “button handling” problem. It is a state problem.
The correct question is not “should I disable the button?” The correct question is “is StartRequested valid in the current system state?”
If the state machine says no, the command must be rejected even if the button somehow remained enabled.
That is an important senior-engineer principle:
UI disabling is convenience. State validation is protection.
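One way to make a double click harmless at the state boundary, and not only in the UI, is an atomic compare-and-swap from Ready to Starting. This is a minimal sketch, assuming the state is stored as an `int` so `Interlocked.CompareExchange` can be used; `StartGate` and the `sendStartCommand` callback are hypothetical names.

```csharp
using System;
using System.Threading;

public enum GateState { Ready = 0, Starting = 1, Running = 2 }

public sealed class StartGate
{
    private int _state = (int)GateState.Ready;

    public GateState State => (GateState)Volatile.Read(ref _state);

    // Only the caller that wins the Ready -> Starting swap sends the command.
    // A second click, a keyboard shortcut or an automation call is rejected.
    // (Moving Starting -> Running on hardware acknowledgement is omitted here.)
    public bool TryStart(Action sendStartCommand)
    {
        int observed = Interlocked.CompareExchange(
            ref _state, (int)GateState.Starting, (int)GateState.Ready);
        if (observed != (int)GateState.Ready)
            return false;

        sendStartCommand();
        return true;
    }
}
```

The check and the state change happen as one atomic step, so two callers can never both observe Ready.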
Inconsistent UI vs machine state
This is another common production issue.
Suppose:
- machine controller reports Running
- app processing thread is busy
- UI still shows Ready
- Start button remains enabled for 500 ms
That small delay is enough to cause operator mistakes.
Why does this happen?
Because in many systems, the machine state and UI state are maintained separately:
- hardware service has its own flags
- workflow service has another status field
- ViewModel has button booleans
- status bar text is set manually somewhere else
Now you have four versions of the truth.
A better design has one canonical state model, and the UI derives from it.
For example:
- machine state source changes to Running
- state coordinator processes that change
- ViewModel receives updated state snapshot
- Start button becomes disabled because current state is Running
UI does not invent its own meaning. It reflects the authoritative state.
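The derivation can be a pure function from the authoritative state to everything the screen shows. This is a minimal sketch: `UiSnapshot` and `UiProjection` are hypothetical names, and the enable rules simply mirror the examples in this article.

```csharp
public enum MachineState { Disconnected, Initializing, Idle, Ready, Running, Paused, Stopping, Error, EmergencyStop }

// Everything the screen needs, computed from one canonical state.
public sealed record UiSnapshot(bool CanStart, bool CanStop, bool CanReset, bool ShowAlarm, string StatusText);

public static class UiProjection
{
    public static UiSnapshot From(MachineState state) => new(
        CanStart: state == MachineState.Ready,
        CanStop: state is MachineState.Running or MachineState.Paused,
        CanReset: state == MachineState.Error,
        ShowAlarm: state is MachineState.Error or MachineState.EmergencyStop,
        StatusText: state switch
        {
            MachineState.Stopping => "Stopping...",
            MachineState.EmergencyStop => "EMERGENCY STOP",
            _ => state.ToString()
        });
}
```

Because the function has no side effects, it is easy to unit-test, and the ViewModel only has to call it when the state changes.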
Race conditions between events and user actions
Industrial systems are full of concurrent events:
- user clicks Stop
- machine emits FrameCaptured
- network heartbeat drops
- inspection algorithm finishes batch
- PLC reports safety door opened
These can arrive close together, sometimes from different threads.
Now imagine:
- User clicks Stop
- Stop request is being processed
- Before the transition completes, the machine sends Completed
- UI thread receives both updates in slightly different order
- Workflow ends in an impossible state, such as Completed followed by Stopping, or Ready with the inspection pipeline still active
This is how state corruption happens.
The problem is not just “threading.” The problem is that state transition handling is not serialized and not explicit.
In production systems, state changes usually need one of these patterns:
- a single-threaded event loop for state transitions
- a lock around transition processing
- a command queue / channel
- immutable state snapshots updated in one coordinator
The goal is simple:
state transitions should happen in a controlled order, not randomly from whichever thread arrives first
Hard-to-debug state bugs
State bugs are among the hardest bugs in machine systems because they are usually:
- timing-dependent
- rare
- operator-dependent
- hardware-dependent
- not reproducible on developer laptops
Symptoms look vague:
- “sometimes Stop doesn’t work”
- “UI stuck in Running after completion”
- “machine says Ready but Start is disabled”
- “after an error and reset, next run fails”
If the system has no explicit transition model, debugging becomes painful. You only see final symptoms, not the sequence of state changes that caused them.
A good state-driven system makes debugging much easier because you can log:
- previous state
- trigger/event
- next state
- correlation id / workflow id
- thread id
- reason transition was accepted or rejected
That turns “random weird bug” into a traceable transition history.
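A transition record with those fields can be a small immutable type with one formatting method. This is a minimal sketch: `TransitionRecord` is a hypothetical name, states and triggers are plain strings to keep it self-contained, and in production you would more likely pass these fields to a structured logger than format text yourself.

```csharp
#nullable enable
using System;

public sealed record TransitionRecord(
    string WorkflowId,     // correlation id
    string Previous,
    string Trigger,
    string? Next,          // null when the transition was rejected
    string? RejectReason,
    int ThreadId)
{
    public bool Accepted => Next is not null;

    // One key=value line per transition attempt, accepted or rejected.
    public string ToLogLine() => Accepted
        ? $"WorkflowId={WorkflowId} Previous={Previous} Trigger={Trigger} Next={Next} Thread={ThreadId}"
        : $"WorkflowId={WorkflowId} Previous={Previous} Trigger={Trigger} Rejected Reason=\"{RejectReason}\" Thread={ThreadId}";

    public static TransitionRecord Accept(string workflowId, string previous, string trigger, string next) =>
        new(workflowId, previous, trigger, next, null, Environment.CurrentManagedThreadId);

    public static TransitionRecord Reject(string workflowId, string previous, string trigger, string reason) =>
        new(workflowId, previous, trigger, null, reason, Environment.CurrentManagedThreadId);
}
```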
PART 4 — HOW WE USE IT IN .NET (PRACTICAL)
There are many ways to implement state management in .NET. The right one depends on system complexity.
For industrial desktop systems, I usually think in layers:
- simple enum-based state for small, bounded flows
- richer transition logic in a dedicated coordinator/service
- event serialization if multiple threads can update state
- ViewModel derives UI state from the authoritative machine/workflow state
1. Start with explicit state types
For many systems, enum is a good start.
public enum MachineState
{
Disconnected,
Initializing,
Idle,
Ready,
Running,
Pausing,
Paused,
Stopping,
Error,
EmergencyStop
}
public enum MachineTrigger
{
ConnectSucceeded,
InitializeSucceeded,
StartRequested,
PauseRequested,
ResumeRequested,
StopRequested,
Completed,
FaultDetected,
ResetRequested,
EmergencyStopPressed
}
This is already better than six booleans.
2. Centralize transition validation
Do not let transitions happen all over the app.
Create one service responsible for state changes.
public sealed class MachineStateCoordinator
{
    private readonly object _sync = new();

    public MachineState CurrentState { get; private set; } = MachineState.Disconnected;

    // Raised while the lock is held, so subscribers see transitions in order.
    // Handlers must be quick and must not block on the UI thread
    // (use Dispatcher.InvokeAsync, not Dispatcher.Invoke), or they can deadlock.
    public event EventHandler<StateChangedEventArgs>? StateChanged;

    public bool TryFire(MachineTrigger trigger, out string? reason)
    {
        lock (_sync)
        {
            var previous = CurrentState;
            if (!TryGetNextState(previous, trigger, out var nextState, out reason))
                return false;

            // Accepted but no change (e.g. a second fault while already in Error).
            if (previous == nextState)
                return true;

            CurrentState = nextState;
            StateChanged?.Invoke(this, new StateChangedEventArgs(previous, nextState, trigger));
            return true;
        }
    }

    private static bool TryGetNextState(
        MachineState current,
        MachineTrigger trigger,
        out MachineState next,
        out string? reason)
    {
        reason = null;

        MachineState? candidate = (current, trigger) switch
        {
            // Global rules first: E-stop and faults apply regardless of the happy path.
            (_, MachineTrigger.EmergencyStopPressed) => MachineState.EmergencyStop,
            (MachineState.EmergencyStop, _) => null, // leaving E-stop needs a dedicated recovery procedure
            (not MachineState.Disconnected, MachineTrigger.FaultDetected) => MachineState.Error,

            (MachineState.Disconnected, MachineTrigger.ConnectSucceeded) => MachineState.Initializing,
            (MachineState.Initializing, MachineTrigger.InitializeSucceeded) => MachineState.Ready,
            (MachineState.Ready, MachineTrigger.StartRequested) => MachineState.Running,
            (MachineState.Running, MachineTrigger.PauseRequested) => MachineState.Paused,
            (MachineState.Paused, MachineTrigger.ResumeRequested) => MachineState.Running,
            (MachineState.Running or MachineState.Paused, MachineTrigger.StopRequested) => MachineState.Stopping,
            (MachineState.Running or MachineState.Stopping, MachineTrigger.Completed) => MachineState.Ready,

            // Reset re-runs initialization, so the machine must initialize again before it is Ready.
            (MachineState.Error, MachineTrigger.ResetRequested) => MachineState.Initializing,
            _ => null
        };

        next = candidate ?? current;
        if (candidate is null)
            reason = $"Trigger '{trigger}' is invalid while machine is in state '{current}'.";
        return candidate is not null;
    }
}

public sealed record StateChangedEventArgs(
    MachineState PreviousState,
    MachineState NewState,
    MachineTrigger Trigger);
This gives you:
- single source of truth
- valid transition enforcement
- thread-safe updates
- a clean event you can log or bind to
3. Separate transition logic from side effects
One important design mistake is mixing “change state” and “perform hardware action” too tightly.
Bad pattern:
if (state == MachineState.Ready)
{
state = MachineState.Running;
_controller.StartInspection();
}
This is dangerous: if _controller.StartInspection() fails, the system says Running, but the machine never started.
A better pattern is to be deliberate about transition phases. For example:
- Ready -> Starting
- send hardware command
- when hardware acknowledges start, Starting -> Running
- if hardware start fails, Starting -> Error or Ready
That is much more honest.
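The Ready -> Starting -> Running/Error sequence can be sketched as an async method that only claims Running after the hardware command succeeds. This is a minimal sketch under stated assumptions: `StartSequence` and the `sendStartAsync` delegate are hypothetical stand-ins for the coordinator and controller, and a failed start goes to Error here (whether Ready is the better target is a per-machine decision).

```csharp
using System;
using System.Threading.Tasks;

public enum Phase { Ready, Starting, Running, Error }

public sealed class StartSequence
{
    private readonly object _sync = new();
    public Phase Current { get; private set; } = Phase.Ready;

    // Moves from 'from' to 'to' only if we are still in 'from'.
    private bool TryMove(Phase from, Phase to)
    {
        lock (_sync)
        {
            if (Current != from) return false;
            Current = to;
            return true;
        }
    }

    // sendStartAsync is a hypothetical hardware call that completes when the
    // controller acknowledges the start, or throws if the start fails.
    public async Task<bool> StartAsync(Func<Task> sendStartAsync)
    {
        if (!TryMove(Phase.Ready, Phase.Starting))
            return false; // not in Ready: reject instead of sending a command

        try
        {
            await sendStartAsync();
        }
        catch (Exception)
        {
            TryMove(Phase.Starting, Phase.Error); // the machine never started
            return false;
        }

        // Only an acknowledged start is reported as Running.
        return TryMove(Phase.Starting, Phase.Running);
    }
}
```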
4. Handle machine events and user actions the same way
Both operator commands and hardware callbacks should flow into the same coordinator.
For example:
public sealed class MachineApplicationService
{
private readonly MachineStateCoordinator _stateCoordinator;
private readonly IMachineController _controller;
private readonly ILogger<MachineApplicationService> _logger;
public MachineApplicationService(
MachineStateCoordinator stateCoordinator,
IMachineController controller,
ILogger<MachineApplicationService> logger)
{
_stateCoordinator = stateCoordinator;
_controller = controller;
_logger = logger;
}
public async Task StartAsync(CancellationToken cancellationToken)
{
if (!_stateCoordinator.TryFire(MachineTrigger.StartRequested, out var reason))
{
_logger.LogWarning("Start rejected: {Reason}", reason);
return;
}
try
{
await _controller.StartInspectionAsync(cancellationToken);
}
catch (Exception ex)
{
_logger.LogError(ex, "Failed to start inspection.");
_stateCoordinator.TryFire(MachineTrigger.FaultDetected, out _);
}
}
public async Task StopAsync(CancellationToken cancellationToken)
{
if (!_stateCoordinator.TryFire(MachineTrigger.StopRequested, out var reason))
{
_logger.LogWarning("Stop rejected: {Reason}", reason);
return;
}
try
{
await _controller.StopInspectionAsync(cancellationToken);
}
catch (Exception ex)
{
_logger.LogError(ex, "Failed to stop inspection.");
_stateCoordinator.TryFire(MachineTrigger.FaultDetected, out _);
}
}
public void OnMachineCompleted()
{
_stateCoordinator.TryFire(MachineTrigger.Completed, out _);
}
public void OnMachineFault(string error)
{
_logger.LogError("Machine fault: {Error}", error);
_stateCoordinator.TryFire(MachineTrigger.FaultDetected, out _);
}
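}
That is important: everything enters through the same state boundary. Note that this simplified service moves to Running as soon as StartRequested is accepted; a production version would add the Starting phase from step 3, so that Running is only reported after the hardware acknowledges the start.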
5. Integrate with ViewModel cleanly
In WPF, the ViewModel should not invent machine truth.
It should observe state changes and expose UI-friendly properties derived from state.
using System.ComponentModel;
using System.Threading;
using System.Threading.Tasks;
using System.Windows.Threading;
using CommunityToolkit.Mvvm.Input; // assumes the CommunityToolkit.Mvvm package for AsyncRelayCommand

public sealed class MachineViewModel : INotifyPropertyChanged
{
    private readonly MachineApplicationService _appService;
    private readonly MachineStateCoordinator _stateCoordinator;
    private readonly Dispatcher _dispatcher;

    public event PropertyChangedEventHandler? PropertyChanged;

    public MachineViewModel(
        MachineApplicationService appService,
        MachineStateCoordinator stateCoordinator)
    {
        _appService = appService;
        _stateCoordinator = stateCoordinator;
        // Assumes the ViewModel is created on the UI thread.
        _dispatcher = Dispatcher.CurrentDispatcher;
        _stateCoordinator.StateChanged += OnStateChanged;
        StartCommand = new AsyncRelayCommand(StartAsync, CanStart);
        StopCommand = new AsyncRelayCommand(StopAsync, CanStop);
    }

    public IAsyncRelayCommand StartCommand { get; }
    public IAsyncRelayCommand StopCommand { get; }

    public string MachineStatus => _stateCoordinator.CurrentState.ToString();
    public bool CanStartButtonBeEnabled => _stateCoordinator.CurrentState == MachineState.Ready;
    public bool CanStopButtonBeEnabled =>
        _stateCoordinator.CurrentState is MachineState.Running or MachineState.Paused;

    private bool CanStart() => CanStartButtonBeEnabled;
    private bool CanStop() => CanStopButtonBeEnabled;

    private Task StartAsync() => _appService.StartAsync(CancellationToken.None);
    private Task StopAsync() => _appService.StopAsync(CancellationToken.None);

    private void OnStateChanged(object? sender, StateChangedEventArgs e)
    {
        // StateChanged can be raised on a hardware or thread-pool thread.
        // InvokeAsync queues the refresh on the UI thread without blocking the
        // coordinator; a synchronous Invoke here could deadlock on its lock.
        _dispatcher.InvokeAsync(RefreshFromState);
    }

    private void RefreshFromState()
    {
        OnPropertyChanged(nameof(MachineStatus));
        OnPropertyChanged(nameof(CanStartButtonBeEnabled));
        OnPropertyChanged(nameof(CanStopButtonBeEnabled));
        StartCommand.NotifyCanExecuteChanged();
        StopCommand.NotifyCanExecuteChanged();
    }

    private void OnPropertyChanged(string propertyName)
        => PropertyChanged?.Invoke(this, new PropertyChangedEventArgs(propertyName));
}
The key idea is:
- authoritative state lives in coordinator
- ViewModel reflects it
- commands use it
- UI consistency becomes much easier
6. For more complex workflows, model richer state
Sometimes enum is not enough.
For example, inspection workflow state may need data:
- current wafer id
- current recipe version
- step index
- last defect count
- cancellation requested
- failure reason
In that case, use an immutable state object instead of just an enum.
public sealed record InspectionState(
InspectionPhase Phase,
string? RecipeName,
string? WaferId,
int CurrentStep,
string? ErrorMessage);
public enum InspectionPhase
{
NotReady,
Preparing,
Aligning,
Running,
Finalizing,
Completed,
Failed,
Cancelled
}
Then transitions become state-to-state updates:
public static class InspectionTransitions
{
public static InspectionState StartPreparation(InspectionState current, string recipe, string waferId)
{
if (current.Phase is not InspectionPhase.NotReady and not InspectionPhase.Completed)
throw new InvalidOperationException($"Cannot prepare from state {current.Phase}");
return current with
{
Phase = InspectionPhase.Preparing,
RecipeName = recipe,
WaferId = waferId,
CurrentStep = 0,
ErrorMessage = null
};
}
public static InspectionState MarkFailure(InspectionState current, string error)
=> current with
{
Phase = InspectionPhase.Failed,
ErrorMessage = error
};
}
This becomes very powerful for complex workflows because now state is both:
- phase
- related context
7. Consider serializing transitions through a queue
If machine events come from many threads, a very practical approach is to process triggers through a single channel or mailbox.
That avoids race conditions better than random locking scattered everywhere.
High-level idea:
- machine callback thread posts trigger
- UI command posts trigger
- one coordinator loop reads triggers sequentially
- transition processing happens in one place, in one order
That pattern is excellent for industrial systems.
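The mailbox pattern maps naturally onto `System.Threading.Channels`. This is a minimal sketch, assuming .NET Core 3.0 or later (where `Channel.CreateUnbounded` and `ChannelReader.ReadAllAsync` are built in); `TriggerMailbox` is a hypothetical name, the enums are trimmed down, and the switch stands in for rules such as `TryGetNextState`.

```csharp
using System.Collections.Generic;
using System.Threading;
using System.Threading.Channels;
using System.Threading.Tasks;

public enum MachineState { Ready, Running, Stopping, Error }
public enum MachineTrigger { StartRequested, StopRequested, Completed, FaultDetected }

public sealed class TriggerMailbox
{
    private readonly Channel<MachineTrigger> _inbox =
        Channel.CreateUnbounded<MachineTrigger>(new UnboundedChannelOptions { SingleReader = true });

    public MachineState Current { get; private set; } = MachineState.Ready;
    public List<string> Log { get; } = new();

    // Any thread (UI command, hardware callback, timer) may post; only the loop touches state.
    public bool Post(MachineTrigger trigger) => _inbox.Writer.TryWrite(trigger);

    public void Complete() => _inbox.Writer.Complete();

    // The single consumer: triggers are applied one at a time, in arrival order.
    public async Task RunAsync(CancellationToken ct = default)
    {
        await foreach (var trigger in _inbox.Reader.ReadAllAsync(ct))
        {
            var previous = Current;
            MachineState? next = (previous, trigger) switch
            {
                (_, MachineTrigger.FaultDetected) => MachineState.Error,
                (MachineState.Ready, MachineTrigger.StartRequested) => MachineState.Running,
                (MachineState.Running, MachineTrigger.StopRequested) => MachineState.Stopping,
                (MachineState.Running or MachineState.Stopping, MachineTrigger.Completed) => MachineState.Ready,
                _ => null
            };

            if (next is null)
            {
                Log.Add($"{previous} + {trigger} rejected");
                continue;
            }
            Current = next.Value;
            Log.Add($"{previous} + {trigger} -> {Current}");
        }
    }
}
```

A late duplicate Completed from the controller is simply rejected and logged, instead of racing with the Stop path.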
PART 5 — COMMON MISTAKES (VERY REALISTIC)
1. Too many boolean flags
This is probably the most common mistake.
Example:
bool isConnected;
bool isReady;
bool isRunning;
bool isStopping;
bool hasAlarm;
bool isResetting;
Production consequence:
- contradictory combinations become possible
- logic becomes impossible to reason about
- every new feature adds another flag
- debugging becomes “what combination was true at that moment?”
This is how teams accidentally build a hidden state machine without realizing it.
2. Scattered state checks
One class checks isRunning. Another checks !hasAlarm. Another checks currentMode == Auto. Another checks statusText != "Stopped".
Now nobody actually knows the real rule for Start.
Production consequence:
- different parts of the app disagree
- UI allows actions the backend rejects
- bug fixes in one place do not fix the system
A senior engineer will centralize transition rules.
3. Implicit transitions
This is when state changes “just happen” as side effects.
Example:
- some method sends Start command
- another callback later sets isRunning = true
- a timer sets isReady = false
- ViewModel text updates separately
No explicit transition model. Just drifting variables.
Production consequence:
- nobody can reconstruct how the system moved from A to B
- logging is weak
- failures create ghost states that are hard to recover from
4. No single source of truth
Machine service stores one state. Workflow engine stores another. UI stores button booleans. Database stores status text.
Production consequence:
- support team sees impossible screenshots
- restart fixes the UI but not machine logic
- system becomes fragile under latency or thread timing differences
The best design usually has:
- one authoritative runtime state model
- derived UI state
- derived status text
- explicit synchronization with hardware state
5. Treating UI state as business truth
Disabling a button is not enough.
A user may trigger a command through:
- keyboard shortcut
- automation
- stale screen
- repeated click
- external integration
Production consequence:
- invalid commands still slip through
- safety bugs happen behind the UI layer
Always validate at the domain/application state boundary.
6. Skipping transitional states
Teams often model only stable states:
- Idle
- Running
- Error
But real systems also need intermediate states:
- Starting
- Stopping
- Resetting
- Recovering
Production consequence:
- operations feel flaky
- duplicate commands get through during in-between moments
- UI has no way to communicate “busy but not finished”
Transitional states are not extra complexity for fun. They represent real system time.
PART 6 — PERFORMANCE & TRADE-OFFS
A lot of engineers worry that state machines sound “heavy.”
In practice, state machine overhead is usually tiny compared with:
- hardware I/O
- image processing
- network calls
- UI rendering
- logging
- database writes
A transition check is often just:
- read current state
- evaluate rule
- write next state
- publish event
That is extremely cheap.
The real trade-off is not performance
It is complexity versus clarity.
A fully formal state machine can feel like overkill for tiny features. But for anything involving machine control, long-running workflows, cancellation, or fault recovery, explicit state usually pays for itself quickly.
When simple is enough
Use a simple enum and switch-based transition logic when:
- state space is small
- rules are clear
- team wants minimal abstraction
- workflow is stable
This is often a very good choice.
When you need something richer
Use richer objects or even a dedicated library when:
- many transitions exist
- entry/exit actions matter
- nested workflows exist
- state needs attached data
- auditability matters
- visualization/documentation matters
The point is not to sound fancy. The point is to choose a model that the team can maintain.
Over-engineering risk
It is possible to go too far:
- generic state engine nobody understands
- state metadata everywhere
- reflection-heavy configuration
- too many abstractions for a small workflow
That can make the system harder, not easier.
For many production .NET systems, the sweet spot is:
- explicit states
- explicit triggers
- centralized transition service
- strong logging
- state-derived UI behavior
That gets most of the benefit without building a framework.
PART 7 — SENIOR ENGINEER THINKING
Experienced engineers treat state as a design problem, not just a coding detail.
1. Make the allowed behavior visible
A senior engineer wants to answer these questions quickly:
- What states exist?
- What transitions are legal?
- What events cause transitions?
- What happens on failure?
- How do we recover?
If those answers are buried across ten files, the design is weak.
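One cheap way to make the allowed behavior visible is to keep the transitions as data and generate a diagram from them. This is a minimal sketch that emits Graphviz DOT text; the `TransitionDoc` class and the sample transitions are illustrative, and rendering the output (for example with the Graphviz `dot` tool) is outside the sketch.

```csharp
using System.Collections.Generic;
using System.Text;

public static class TransitionDoc
{
    // (From, Trigger, To) triples: ideally the same data that drives the state machine.
    public static readonly IReadOnlyList<(string From, string Trigger, string To)> Transitions = new[]
    {
        ("Ready", "StartRequested", "Running"),
        ("Running", "StopRequested", "Stopping"),
        ("Running", "Completed", "Ready"),
        ("Stopping", "Completed", "Ready"),
        ("Error", "ResetRequested", "Idle"),
    };

    // Produces a DOT digraph with one labeled edge per legal transition.
    public static string ToDot(IEnumerable<(string From, string Trigger, string To)> transitions)
    {
        var sb = new StringBuilder("digraph MachineState {\n");
        foreach (var (from, trigger, to) in transitions)
            sb.Append($"  \"{from}\" -> \"{to}\" [label=\"{trigger}\"];\n");
        sb.Append("}\n");
        return sb.ToString();
    }
}
```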
2. Keep transitions explicit
Good systems make transitions obvious.
Instead of random property changes, prefer code that clearly says:
- current state
- trigger
- next state
- reason
That makes code review easier and prevents accidental behavior.
3. Separate “requested”, “in progress”, and “confirmed”
This is a very mature pattern in machine systems.
Do not jump straight from Ready to Running just because the user clicked Start.
Often the better model is:
Ready -> StartRequested -> Starting -> Running
Or at least:
- command requested
- hardware acknowledged
- state finalized
Why? Because real systems have delays, failures, retries, and asynchronous acknowledgements.
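The "requested, acknowledged, finalized" split usually means waiting for an asynchronous acknowledgement with a deadline. This is a minimal sketch, assuming .NET 6 or later for `Task.WaitAsync(TimeSpan)`; `PendingCommand` is a hypothetical name, and the caller decides which state a timeout or refusal leads to.

```csharp
using System;
using System.Threading.Tasks;

public sealed class PendingCommand
{
    // RunContinuationsAsynchronously keeps the hardware callback thread from
    // running the awaiting state logic inline.
    private readonly TaskCompletionSource<bool> _ack =
        new(TaskCreationOptions.RunContinuationsAsynchronously);

    public string CommandId { get; }
    public PendingCommand(string commandId) => CommandId = commandId;

    // Called from the hardware callback when the controller confirms (true) or refuses (false).
    public void Acknowledge(bool accepted) => _ack.TrySetResult(accepted);

    // Requested -> in progress: wait for the acknowledgement, but never forever.
    public async Task<bool> WaitForAckAsync(TimeSpan timeout)
    {
        try
        {
            return await _ack.Task.WaitAsync(timeout);
        }
        catch (TimeoutException)
        {
            return false; // no confirmation: caller moves to Error (or retries)
        }
    }
}
```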
4. Design for fault paths first
Many junior implementations model the happy path beautifully and the error path poorly.
Senior engineers think early about:
- what if start fails halfway?
- what if stop is requested during aligning?
- what if machine disconnects during inspection?
- what if error occurs while UI still processing old events?
In industrial systems, the fault path is not edge behavior. It is core behavior.
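Designing the fault path first can be made mechanical: handle the global triggers before the per-state rules, then check every state against them. This is a minimal sketch; the trimmed enums and `FaultFirstRules` are hypothetical, and treating EmergencyStop as a state that only a dedicated recovery may leave is an assumption about this machine.

```csharp
using System;

public enum MachineState { Disconnected, Initializing, Ready, Running, Stopping, Error, EmergencyStop }
public enum MachineTrigger { StartRequested, StopRequested, Completed, FaultDetected, EmergencyStopPressed }

public static class FaultFirstRules
{
    // Returns the next state, or null when the trigger is rejected.
    public static MachineState? Next(MachineState s, MachineTrigger t) => (s, t) switch
    {
        // Fault paths first: they apply regardless of the happy-path state.
        (_, MachineTrigger.EmergencyStopPressed) => MachineState.EmergencyStop,
        (MachineState.EmergencyStop, _) => null,
        (_, MachineTrigger.FaultDetected) => MachineState.Error,

        // Happy path afterwards.
        (MachineState.Ready, MachineTrigger.StartRequested) => MachineState.Running,
        (MachineState.Running, MachineTrigger.StopRequested) => MachineState.Stopping,
        (MachineState.Running or MachineState.Stopping, MachineTrigger.Completed) => MachineState.Ready,
        _ => null
    };

    // Fails loudly if any state (other than E-stop) does not react to a fault with Error.
    public static void AssertEveryStateHandlesFaults()
    {
        foreach (MachineState s in Enum.GetValues(typeof(MachineState)))
        {
            if (s != MachineState.EmergencyStop && Next(s, MachineTrigger.FaultDetected) != MachineState.Error)
                throw new InvalidOperationException($"{s} does not handle FaultDetected");
        }
    }
}
```

Running the check from a unit test guards against a later edit that adds a state-specific rule above the fault arms and silently swallows faults.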
5. Serialize state transitions when concurrency is real
When hardware callbacks, timers, streaming pipelines, and UI actions all interact, experienced engineers do not trust ad hoc updates.
They prefer:
- one coordinator
- one queue
- one controlled transition path
That is how you reduce race conditions.
6. Log transitions, not just errors
A strong production system logs not only “something failed” but also:
- state before
- trigger/event
- state after
- command id / workflow id
- elapsed time
- rejection reason
That is gold during incident analysis.
Example log:
WorkflowId=WF-2048 Previous=Running Trigger=StopRequested Next=Stopping
WorkflowId=WF-2048 Previous=Stopping Trigger=Completed Next=Ready
WorkflowId=WF-2048 Previous=Ready Trigger=StartRequested Rejected Reason="Recipe not loaded"
Now debugging becomes forensic instead of guesswork.
7. Derive UI behavior from authoritative state
A senior engineer does not manually sprinkle button enable/disable logic everywhere.
They define authoritative state once, then derive:
- button enabled state
- status text
- screen mode
- progress display
- alarm visibility
This drastically reduces inconsistencies.
8. Think in recovery terms
A production-grade industrial system must answer:
- how does the operator recover from Error?
- what must be reset?
- what state is safe after reconnect?
- what happens after app restart?
- should incomplete workflow resume, fail, or reset?
State design is not just about forward movement. It is about safe recovery.
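The restart question can be answered in one reviewed place instead of ad hoc in startup code. This is a minimal sketch: `RecoveryAction`, `RestartRecovery` and the phase-to-action mapping are hypothetical examples of a policy, not a recommendation for any particular machine; the phase names reuse the `InspectionPhase` example.

```csharp
using System;

public enum InspectionPhase { NotReady, Preparing, Aligning, Running, Finalizing, Completed, Failed, Cancelled }
public enum RecoveryAction { None, ResetToReady, MarkFailedAndReset, RetryFinalization }

public static class RestartRecovery
{
    // Decides what to do with a workflow phase that was persisted before the app stopped.
    public static RecoveryAction OnRestart(InspectionPhase persisted) => persisted switch
    {
        // Finished or never started: nothing to recover.
        InspectionPhase.NotReady or InspectionPhase.Completed or InspectionPhase.Cancelled => RecoveryAction.None,
        // Hardware was mid-motion: do not resume blindly; fail the run and require a reset.
        InspectionPhase.Preparing or InspectionPhase.Aligning or InspectionPhase.Running => RecoveryAction.MarkFailedAndReset,
        // Data was captured; only the report was pending.
        InspectionPhase.Finalizing => RecoveryAction.RetryFinalization,
        InspectionPhase.Failed => RecoveryAction.ResetToReady,
        _ => throw new ArgumentOutOfRangeException(nameof(persisted), persisted, "Unknown phase")
    };
}
```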
Final practical takeaway
For a WPF industrial machine-control system, the healthy mental model is this:
- the machine has explicit states
- the workflow has explicit states
- UI behavior is derived from those states
- all commands and machine events go through a controlled transition boundary
- invalid transitions are rejected, logged, and visible
- transitional states are modeled honestly
- concurrency is handled so transitions happen in a predictable order
That is how experienced engineers stop a complex system from turning into a pile of flags, random callbacks, and impossible bugs.