Advanced concurrency primitives in .NET, in the real world

When people first learn async/await, it is tempting to think concurrency is mostly solved.

It is not.

async/await helps you express asynchronous work. It does not automatically make shared state safe. It does not prevent two threads from updating the same object at the same time. It does not stop a UI freeze caused by a bad wait. It does not make machine state transitions correct. It does not prevent race conditions in counters, flags, queues, caches, or background pipelines.

That is where low-level concurrency primitives still matter.

In a production WPF machine-control system, these primitives sit in the places where correctness really matters:

  • only one machine command should execute at a time
  • start and stop must not overlap
  • streaming threads must not corrupt shared buffers
  • UI must see consistent state
  • counters and flags must not lie
  • background workers must coordinate without freezing the app

And unfortunately, misuse of these primitives causes some of the nastiest production bugs you can get:

  • hangs that only happen once every few weeks
  • UI freezes that look random
  • data corruption with no exception
  • “machine busy” forever because a flag was never cleared
  • deadlocks that appear only under pressure
  • race conditions that disappear the moment you attach a debugger

This is why senior engineers care about them.


Part 1 — Big picture

Why low-level primitives still matter even with Task and async/await

Task and async/await are about how work is scheduled and resumed.

Concurrency primitives are about how shared state is protected and coordinated.

Those are different problems.

For example:

  • A camera event arrives on a background thread.
  • A result-processing pipeline is running asynchronously.
  • The UI thread is rendering live defect markers.
  • An operator clicks Stop while Start is still in progress.
  • A health-monitor loop is polling machine status every 200 ms.

You can write all of that with Task, await, and background services. But once multiple execution paths touch the same state, you still need rules.

Typical shared state in a wafer inspection app:

  • machine state: Idle, Starting, Running, Stopping, Faulted
  • current recipe
  • live defect counters
  • image/result queues
  • “is stop requested” flags
  • connection/session state with the vendor SDK
  • in-memory caches of wafer/run metadata

Without synchronization, that state becomes unreliable fast.

Why misuse causes the worst production bugs

Most normal bugs fail loudly. A null reference throws. A bad file path fails immediately.

Concurrency bugs often fail silently and intermittently.

That makes them much more expensive.

A bad lock can:

  • freeze the UI
  • block shutdown
  • deadlock machine commands
  • create lock convoys or priority inversions where one slow path stalls the whole app

A bad atomic update can:

  • show incorrect counts
  • skip a stop request
  • allow double-start
  • lose results under load

A bad concurrent collection design can:

  • look thread-safe while the overall workflow is still broken
  • hide ownership problems
  • create memory growth because no one owns lifecycle clearly

Coordination vs synchronization

This distinction matters a lot.

Synchronization means protecting shared access so multiple threads do not step on each other.

Examples:

  • locking machine state during state transition
  • atomically incrementing a shared defect counter
  • protecting a shared dictionary from simultaneous updates

Coordination means controlling who is allowed to do something, when, and in what order.

Examples:

  • only one Start/Stop command at a time
  • producer waits when a queue is full
  • shutdown waits for workers to finish
  • background pipeline stages signal each other

Many engineers use low-level synchronization when the real problem is coordination.

That is where designs go wrong.

For example, using lots of locks around a workflow usually means you are trying to synchronize your way out of a coordination problem. Very often the better answer is a command queue, actor-like ownership model, or pipeline boundary.

Correctness vs performance

You always want correctness first.

A wrong program that is fast is still wrong.

In production systems, especially machine systems, the order is usually:

  1. correctness
  2. diagnosability
  3. simplicity
  4. performance

Then optimize only where measurement says it matters.

A lot of concurrency mistakes happen because developers optimize too early:

  • replacing a simple lock with Interlocked everywhere
  • using ConcurrentDictionary for everything
  • sprinkling volatile on flags
  • avoiding locks at all costs because “locks are slow”

This often makes code faster to fail and harder to reason about.

A small uncontended lock is usually cheap. A deadlock is infinitely expensive.


Part 2 — Each primitive: deep and practical


1) lock / Monitor

What problem it solves

lock protects a critical section so only one thread at a time can execute it for a given lock object.

In real systems, this is the basic tool for protecting shared mutable state.

Typical examples:

  • machine state transitions
  • protecting a vendor SDK session object that is not thread-safe
  • updating several related fields together so they stay consistent
  • swapping a shared buffer safely

On a plain object, lock is syntactic sugar for Monitor.Enter / Monitor.Exit wrapped in try/finally, so the lock is released even if the body throws.

How it actually behaves

When a thread enters a lock, it acquires exclusive ownership of that monitor. Other threads trying to enter the same lock have to wait.

Important real-world points:

  • lock is thread-affine and reentrant: the thread that entered must be the one that leaves, and it can re-enter the same lock without blocking itself
  • it is blocking, not async-friendly
  • if the lock is free, acquisition is fast
  • if many threads contend, waiting threads block and throughput drops
  • if code inside the lock does slow work, you create a bottleneck

Monitor also gives extra features like TryEnter, Wait, Pulse, PulseAll, but most production code should stay simple unless you really need condition-style coordination.
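
One case where TryEnter earns its place is a caller that must never hang, such as a status read triggered from the UI. The sketch below gives up after a timeout instead of waiting forever. StatusReader and its members are made-up names for illustration, not part of any SDK.

```csharp
using System;
using System.Threading;

public sealed class StatusReader
{
    private readonly object _stateLock = new object();
    private string _status = "Idle";

    public void SetStatus(string status)
    {
        lock (_stateLock) { _status = status; }
    }

    // Returns false instead of blocking forever if the lock stays busy.
    public bool TryReadStatus(TimeSpan timeout, out string status)
    {
        bool taken = false;
        try
        {
            Monitor.TryEnter(_stateLock, timeout, ref taken);
            if (!taken)
            {
                status = string.Empty;
                return false;
            }

            status = _status;
            return true;
        }
        finally
        {
            // Only exit if we actually acquired the lock.
            if (taken) Monitor.Exit(_stateLock);
        }
    }
}
```

A timeout that fires is also a useful diagnostic: it tells you some other path is holding the lock too long.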

When to use it

Use lock when:

  • you need to protect a small synchronous critical section
  • multiple related fields must be updated together
  • you need strong, simple reasoning about state
  • the protected work is short and non-async

Example: machine state transition.

csharp
private readonly object _stateLock = new();
private MachineState _state = MachineState.Idle;

public bool TryStart()
{
    lock (_stateLock)
    {
        if (_state != MachineState.Idle)
            return false;

        _state = MachineState.Starting;
        return true;
    }
}

This is good because the check and state transition are one atomic unit from the program’s point of view.

Another example: protect a non-thread-safe SDK object.

csharp
private readonly object _sdkLock = new();

public void SendCommand(string command)
{
    lock (_sdkLock)
    {
        _vendorSdk.Send(command);
    }
}

If the SDK is not thread-safe, this is often the simplest safe option.

When NOT to use it

Do not use lock when:

  • the code inside needs await
  • the operation may take a long time
  • you are protecting workflow sequencing instead of just shared state
  • you need concurrency throttling rather than mutual exclusion
  • you are locking across external calls that may block unpredictably

Bad example:

csharp
lock (_stateLock)
{
    await _machine.InitializeAsync(); // compile error CS1996: cannot await in the body of a lock statement
}

Even if you try to work around it with .Wait() or .Result, you are creating serious deadlock risk.

Also avoid locking around long vendor calls if they can hang or take seconds. That turns one lock into a system-wide choke point.

Common mistakes

1. Locking on this, strings, or public objects

csharp
lock (this) { ... }   // bad
lock ("machine") { ... } // very bad

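Always lock on a private dedicated object. Anything reachable from outside the class can be locked by other code too, and string literals are interned, so two unrelated components that lock on “machine” end up sharing one lock.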

2. Doing too much inside the lock

Bad:

csharp
lock (_stateLock)
{
    _logger.LogInformation("Starting machine...");
    _vendorSdk.Start();
    SaveAuditRecord();
    RefreshUi();
}

This is dangerous because:

  • logging may block
  • SDK may block
  • DB or file IO may block
  • UI work does not belong there

Keep locked sections tight.
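
One way to tighten the bad example, as a sketch: decide under the lock, do the slow work with no lock held, then take the lock briefly again to publish the outcome. MachineStarter is a hypothetical class, and the startHardware delegate stands in for the vendor SDK call.

```csharp
using System;

public enum MachineState { Idle, Starting, Running, Faulted }

public sealed class MachineStarter
{
    private readonly object _stateLock = new object();
    private MachineState _state = MachineState.Idle;

    public MachineState State
    {
        get { lock (_stateLock) { return _state; } }
    }

    // startHardware stands in for the slow vendor SDK call (hypothetical).
    public bool Start(Action startHardware)
    {
        // 1. Short critical section: only the check and the transition.
        lock (_stateLock)
        {
            if (_state != MachineState.Idle) return false;
            _state = MachineState.Starting;
        }

        // 2. Slow work runs with no lock held.
        bool ok = false;
        try
        {
            startHardware();
            ok = true;
        }
        finally
        {
            // 3. Short critical section again to publish the outcome.
            lock (_stateLock)
            {
                _state = ok ? MachineState.Running : MachineState.Faulted;
            }
        }

        return true;
    }
}
```

The Starting state is what keeps a second Start out while the hardware call is in flight. Logging, audit records, and UI refresh would all happen after the method returns, outside any lock.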

3. Nested locks

csharp
lock (_machineLock)
{
    lock (_recipeLock)
    {
        ...
    }
}

This is how deadlocks start.

4. Assuming lock makes everything correct

lock makes access serialized for that lock. It does not magically fix bad state design, bad ownership, or bad workflow.

Performance implications

  • Uncontended locks are often fine.
  • Contended locks hurt throughput and increase latency.
  • Long critical sections are much worse than the lock itself.
  • One heavily shared lock can serialize a whole subsystem.

In desktop machine systems, performance problems from locks usually come from:

  • one global lock around too much logic
  • high-frequency streaming paths fighting over the same lock
  • UI thread blocked waiting for a busy lock
  • background workers blocked on slow code inside a lock

A short lock around state transition is usually cheap. A lock around image processing is usually wrong.


2) SemaphoreSlim

What problem it solves

SemaphoreSlim controls how many callers may enter at once.

When initialized with 1, it acts like an async-friendly mutual exclusion mechanism. When initialized with N, it acts like a concurrency limiter.

This makes it useful for coordination, especially in async code.

How it actually behaves

WaitAsync() lets callers wait asynchronously without blocking a thread. Wait() blocks. Release() hands the permit back so the next waiter can proceed.

With count = 1:

  • only one caller at a time proceeds
  • others wait in line

With count > 1:

  • up to N callers proceed concurrently

It is not the same as lock:

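  • it has no thread affinity: any thread may call Release(), which is what allows it to be held across an await
  • it is not reentrant: a caller that waits a second time on a count of 1 while still holding the permit blocks itself forever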
  • it is better suited for async workflows
  • misuse is easier because forgetting Release() breaks the world

When to use it

Async mutual exclusion

Example: only one Start or Stop operation should run at a time, and those operations are async.

csharp
private readonly SemaphoreSlim _commandGate = new(1, 1);

public async Task StartAsync(CancellationToken ct)
{
    await _commandGate.WaitAsync(ct);
    try
    {
        if (_state != MachineState.Idle)
            return;

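        _state = MachineState.Starting;
        try
        {
            await _machine.InitializeAsync(ct);
            _state = MachineState.Running;
        }
        catch
        {
            // Do not leave the machine stuck in Starting if initialization
            // fails or is cancelled. Faulted is one reasonable choice here.
            _state = MachineState.Faulted;
            throw;
        }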
    }
    finally
    {
        _commandGate.Release();
    }
}

This is a common real-world pattern in WPF + hardware systems.

Throttling

Example: defect thumbnails are processed in parallel, but you only want 4 at a time to avoid saturating CPU or memory.

csharp
private readonly SemaphoreSlim _thumbnailLimiter = new(4, 4);

public async Task ProcessThumbnailAsync(ImageData image, CancellationToken ct)
{
    await _thumbnailLimiter.WaitAsync(ct);
    try
    {
        await _thumbnailService.GenerateAsync(image, ct);
    }
    finally
    {
        _thumbnailLimiter.Release();
    }
}

When NOT to use it

Do not use SemaphoreSlim when:

  • you only need a tiny synchronous critical section; lock is simpler
  • you are using it as a band-aid around poor ownership design
  • you need thread-safe access to a collection but not async coordination
  • you need event signaling or queue semantics; other tools are better

Also do not use SemaphoreSlim(1,1) everywhere as a universal replacement for lock. That usually makes code harder to reason about.

Common mistakes

1. Forgetting Release()

That causes permanent hangs.

2. Releasing too many times

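If the semaphore was created with a maxCount, as in new SemaphoreSlim(1, 1), the extra Release() throws SemaphoreFullException. Without a maxCount it silently raises the count and allows more concurrent access than intended.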
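
Whether an extra Release() is loud or silent depends on how the semaphore was constructed. A small sketch that shows both behaviors side by side; OverReleaseDemo is a made-up name.

```csharp
using System;
using System.Threading;

public static class OverReleaseDemo
{
    // With maxCount = 1 the extra Release() fails loudly.
    public static bool ThrowsWithMaxCount()
    {
        using var gate = new SemaphoreSlim(1, 1);
        try
        {
            gate.Release(); // nobody holds the permit, so this would push the count to 2
            return false;
        }
        catch (SemaphoreFullException)
        {
            return true;
        }
    }

    // Without maxCount the extra Release() quietly widens the gate.
    public static int CountWithoutMaxCount()
    {
        using var gate = new SemaphoreSlim(1);
        gate.Release();
        return gate.CurrentCount;
    }
}
```

Passing the maxCount argument costs nothing and turns a silent correctness bug into an exception you can find.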

3. Mixing blocking waits and async waits carelessly

csharp
_semaphore.Wait();
await SomethingAsync();

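This may work, but it is often a smell. Wait() blocks the calling thread until the permit is free (on the UI thread that is a freeze), and the permit is then held across an await. In async code, use await WaitAsync() instead.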

4. Using it to protect shared state that should not be shared in the first place

This is common in service code that really wants a single owner loop or message queue.

Performance implications

  • Good for async coordination.
  • WaitAsync() does not hold a thread while waiting, unlike Wait() or lock.
  • Still has contention cost.
  • Not free under high pressure.
  • Overuse can create serialized pipelines that look async but behave single-threaded.

A common failure mode is thinking “it is async so it must scale,” while a semaphore count of 1 turns the whole path into a queue.

That may be correct, but you should know you are doing it.


3) Interlocked

What problem it solves

Interlocked performs atomic operations on simple values.

That means the read-modify-write happens as one indivisible step.

It is ideal for:

  • counters
  • flags represented as ints
  • one-time initialization guards
  • exchanging references atomically

How it actually behaves

Interlocked uses CPU-level atomic instructions where possible. It is much lighter than taking a lock for very small operations.

Common methods:

  • Increment
  • Decrement
  • Add
  • Exchange
  • CompareExchange
  • Read, for 64-bit values (a plain 64-bit read is not guaranteed to be atomic in a 32-bit process)

Real-world meaning:

  • no other thread can observe a half-updated value for that operation
  • very fast for simple state changes
  • only protects that specific variable operation, not larger invariants

This last point is where people go wrong.

When to use it

Shared counters

csharp
private int _defectCount;

public void OnDefectDetected()
{
    Interlocked.Increment(ref _defectCount);
}

This is a perfect use case.

One-time transition guard

Example: ensure Stop is requested once.

csharp
private int _stopRequested; // 0 = no, 1 = yes

public bool TryRequestStop()
{
    return Interlocked.Exchange(ref _stopRequested, 1) == 0;
}

First caller gets true, later callers get false.

Compare-and-swap for state transition

csharp
private int _isRunning; // 0 = stopped, 1 = running

public bool TryMarkRunning()
{
    return Interlocked.CompareExchange(ref _isRunning, 1, 0) == 0;
}

This means “set to 1 only if current value is 0.”

When NOT to use it

Do not use Interlocked when:

  • multiple fields must stay consistent together
  • state transitions involve business rules, not just bit flips
  • the logic spans more than one atomic operation
  • readability becomes terrible

Bad example:

csharp
if (Interlocked.CompareExchange(ref _isStarting, 1, 0) == 0)
{
    if (_hasRecipeLoaded == 1 && _connectionReady == 1)
    {
        ...
    }
}

This is already drifting into tricky state logic. A proper state owner or lock may be clearer and safer.

Common mistakes

1. Thinking atomic variable update means atomic business logic

This is one of the most common senior-level discussion points.

Example:

csharp
if (_defectCount < _maxDefects)
{
    Interlocked.Increment(ref _defectCount);
}

This is not safe. Two threads can both pass the check before incrementing.

If you need “check then update” as one invariant, Interlocked.Increment alone is not enough.
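
If the invariant really is just one variable, a compare-and-swap loop can make the check and the update a single step. This is a sketch, and BoundedCounter is a hypothetical class; for anything involving more than one field, a small lock is usually the clearer choice.

```csharp
using System.Threading;

public sealed class BoundedCounter
{
    private readonly int _max;
    private int _count;

    public BoundedCounter(int max) => _max = max;

    public int Count => Volatile.Read(ref _count);

    // Increments only while the count is below the limit.
    public bool TryIncrement()
    {
        while (true)
        {
            int seen = Volatile.Read(ref _count);
            if (seen >= _max) return false;

            // Publish seen + 1 only if nobody changed the value since we read it.
            if (Interlocked.CompareExchange(ref _count, seen + 1, seen) == seen)
                return true;

            // Another thread won the race; loop and re-check against the fresh value.
        }
    }
}
```

The loop retries only when another thread changed the value in between, so the limit is never exceeded no matter how many threads call TryIncrement at once.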

2. Building complex lock-free logic without being sure

Lock-free code is hard. Many systems would be better with a small lock.

3. Using it for readability-hostile micro-optimizations

A simple lock is often easier to maintain.

Performance implications

  • Very fast for simple atomic operations
  • Great under moderate contention for counters/flags
  • Still not free under extreme contention
  • Can become a hotspot when many cores hammer the same variable

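This is where cache-line contention (cache-line bouncing) appears: many threads repeatedly updating the same memory location force the cache line to move between cores on every write. False sharing is the related case where threads update different variables that happen to sit on the same cache line.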

So Interlocked is fast, but not magic.


4) volatile

What problem it solves

volatile is about visibility and ordering, not atomic multi-step correctness.

It tells the runtime and CPU not to treat accesses as freely optimizable in ways that would hide updates between threads.

In plain English: when one thread writes the field, other threads that read it afterwards see the new value instead of a stale copy.

How it actually behaves

A volatile field:

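  • gives reads acquire semantics and writes release semantics, which rules out some compiler/CPU reordering around that access
  • stops the JIT from keeping the value in a register, so a polling loop re-reads the field every time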
  • does not make compound operations atomic

That last point is crucial.

This is safe:

csharp
private volatile bool _shutdownRequested;

One thread sets it to true, another thread reads it in a loop.

This is not made safe by volatile:

csharp
if (!_shutdownRequested)
{
    _shutdownRequested = true;
}

That is still a check-then-act race if multiple threads do it.

When to use it

Use volatile rarely, and only when the problem is simple visibility of a flag or reference.

Example: cooperative loop shutdown.

csharp
private volatile bool _shouldStop;

public void RequestStop() => _shouldStop = true;

public async Task PollLoopAsync()
{
    while (!_shouldStop)
    {
        await PollMachineAsync();
        await Task.Delay(100);
    }
}

Even here, many teams prefer CancellationToken instead because it is more expressive and fits async APIs better.
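
For comparison, here is roughly what the same loop looks like with a CancellationToken. It is a sketch: MachinePoller is a hypothetical class and the pollOnce delegate stands in for the real status query.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public sealed class MachinePoller
{
    public int PollCount { get; private set; }

    // pollOnce stands in for the real status query (hypothetical).
    public async Task PollLoopAsync(Func<Task> pollOnce, TimeSpan interval, CancellationToken ct)
    {
        try
        {
            while (!ct.IsCancellationRequested)
            {
                await pollOnce();
                PollCount++;

                // Unlike a flag check, the delay itself wakes up early on cancellation.
                await Task.Delay(interval, ct);
            }
        }
        catch (OperationCanceledException)
        {
            // Cancellation is the normal way out of this loop.
        }
    }
}
```

The practical gain is that a stop request does not have to wait out the remaining delay, and the same token can be passed down into any async call that accepts one.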

When NOT to use it

Do not use volatile when:

  • you need atomic increments
  • you need check-then-update correctness
  • multiple fields must stay consistent
  • you are unsure whether visibility is the real problem
  • CancellationToken, Interlocked, or a lock would be clearer

Common mistakes

1. Using volatile as a general “thread-safe” keyword

It is not.

2. Using it on counters

csharp
private volatile int _count;
_count++;

Still not thread-safe.

3. Hiding design problems with flags

If ten threads are watching five volatile flags, your design likely needs stronger ownership or a message-based approach.

Performance implications

volatile is not about speed. It is about memory visibility semantics.

It can be cheaper than a lock for very simple read/write flags, but correctness is narrow. Use it only when that narrow correctness is exactly what you need.


5) Concurrent collections

Examples:

  • ConcurrentDictionary<TKey,TValue>
  • ConcurrentQueue<T>
  • ConcurrentBag<T>
  • ConcurrentStack<T>
  • BlockingCollection<T> historically, though newer designs often prefer Channels

What problem they solve

They allow multiple threads to access a shared collection safely for supported operations.

This is useful when:

  • many threads produce results into a queue
  • a cache is shared across worker paths
  • multiple consumers read/write shared collection state

How they actually behave

They do not mean “all operations involving this collection are automatically correct.”

They mean individual collection operations are thread-safe according to that type’s contract.

For example, ConcurrentDictionary makes add/get/update operations thread-safe. But if your workflow logic spans multiple operations, you can still have races.

Example:

csharp
if (!_runs.ContainsKey(runId))
{
    _runs[runId] = CreateRunState();
}

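This is not the right pattern. Two threads can both see the key as missing, both create a run state, and the second write silently replaces the first. Use GetOrAdd.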

csharp
var run = _runs.GetOrAdd(runId, _ => CreateRunState());

That is better.
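
One caveat with the factory overload: ConcurrentDictionary runs the factory delegate outside its internal locks, so under a race it can run more than once, and only one result is stored. If creating the value is expensive or has side effects, wrapping it in Lazy<T> is a common workaround. A sketch, with RunRegistry as a hypothetical wrapper:

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

public sealed class RunRegistry<TRun>
{
    private readonly ConcurrentDictionary<string, Lazy<TRun>> _runs =
        new ConcurrentDictionary<string, Lazy<TRun>>();

    public TRun GetOrCreate(string runId, Func<TRun> create)
    {
        // Racing callers may each build a Lazy wrapper, but only one wrapper is stored,
        // and ExecutionAndPublication runs its factory at most once.
        var lazy = _runs.GetOrAdd(
            runId,
            _ => new Lazy<TRun>(create, LazyThreadSafetyMode.ExecutionAndPublication));

        return lazy.Value;
    }
}
```

Building a spare Lazy wrapper is cheap; the expensive create call happens only when Value is first read on the wrapper that won.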

When to use them

ConcurrentQueue<T>

For multi-producer/single-consumer or multi-producer/multi-consumer scenarios where a simple shared queue makes sense.

Example: defect events being buffered before a processing stage.

csharp
private readonly ConcurrentQueue<DefectEvent> _queue = new();

public void EnqueueDefect(DefectEvent evt) => _queue.Enqueue(evt);

public bool TryDequeue(out DefectEvent evt) => _queue.TryDequeue(out evt);

ConcurrentDictionary<TKey,TValue>

Useful for shared caches, run/session registries, and state by key.

Example: active wafer runs by run ID.

csharp
private readonly ConcurrentDictionary<string, WaferRunContext> _runs = new();

When NOT to use them

Do not use concurrent collections when:

  • a single owner thread would be simpler
  • you need blocking/backpressure semantics; use channels or explicit coordination
  • you think collection thread safety means business logic safety
  • you are sharing too much state and avoiding design cleanup

A classic mistake is turning the whole app into a giant ConcurrentDictionary of mutable objects. The dictionary becomes thread-safe, but the objects inside may still be unsafe.

Common mistakes

1. False sense of safety

csharp
var context = _runs[runId];
context.DefectCount++;

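The dictionary access is thread-safe. The mutation is not, unless the object synchronizes internally: DefectCount++ on a plain property is a read-modify-write, so concurrent callers can lose updates.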

2. Multi-step races

csharp
if (_runs.TryGetValue(runId, out var context))
{
    if (!context.IsCompleted)
    {
        context.MarkCompleted();
    }
}

Collection safety does not make the object state transition atomic.
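
One way out is to move the check and the transition into the object itself, so callers cannot separate them. A sketch, using a simplified stand-in for the run context class; TryMarkCompleted is a hypothetical method name.

```csharp
using System.Threading;

public sealed class RunContext
{
    private int _completed; // 0 = in progress, 1 = completed

    public bool IsCompleted => Volatile.Read(ref _completed) == 1;

    // Returns true only for the single caller that performed the transition.
    public bool TryMarkCompleted()
    {
        return Interlocked.CompareExchange(ref _completed, 1, 0) == 0;
    }
}
```

The caller then becomes a TryGetValue followed by TryMarkCompleted, and any completion side effects run only in the branch where TryMarkCompleted returned true.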

3. Using a queue where you actually need backpressure and ownership

A raw concurrent queue can grow forever. In real-time systems, that can become a memory incident.
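
A bounded channel is one way to get that backpressure. The sketch below assumes System.Threading.Channels is available (built into modern .NET, a NuGet package on .NET Framework); DefectBuffer is a hypothetical wrapper name.

```csharp
using System.Threading;
using System.Threading.Channels;
using System.Threading.Tasks;

public sealed class DefectBuffer<T>
{
    private readonly Channel<T> _channel;

    public DefectBuffer(int capacity)
    {
        _channel = Channel.CreateBounded<T>(new BoundedChannelOptions(capacity)
        {
            // When the buffer is full, async writers wait instead of growing memory.
            FullMode = BoundedChannelFullMode.Wait
        });
    }

    // Completes once there is room for the item.
    public ValueTask WriteAsync(T item, CancellationToken ct = default)
        => _channel.Writer.WriteAsync(item, ct);

    // Non-blocking variant: returns false when the buffer is full.
    public bool TryWrite(T item) => _channel.Writer.TryWrite(item);

    public ValueTask<T> ReadAsync(CancellationToken ct = default)
        => _channel.Reader.ReadAsync(ct);
}
```

A hardware callback that must not wait can use TryWrite and count the rejected items, which at least makes the overload visible instead of letting memory absorb it.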

Performance implications

  • Often very good for supported access patterns
  • Better than hand-rolled locking around many collection operations
  • Still have contention and internal synchronization costs
  • Can scale well, but not infinitely
  • Wrong access patterns can destroy expected benefits

In streaming systems, the bigger performance issue is often not the concurrent collection itself, but:

  • unbounded growth
  • poor lifecycle ownership
  • contention on items stored inside
  • downstream consumers being slower than producers

Part 3 — Deadlocks

What a deadlock really is

A deadlock is not just “two locks stuck.”

A deadlock is a situation where execution paths are waiting on each other in a cycle, and nobody can make progress.

In production, it feels like:

  • the UI freezes
  • Start never completes
  • Stop hangs forever
  • shutdown gets stuck
  • no exception appears
  • CPU might even be low because everything is just waiting

That is why deadlocks are so painful. The system is alive, but progress has stopped.

How deadlocks happen in production

The real pattern is almost always:

  1. Thread A holds resource X and waits for resource Y
  2. Thread B holds resource Y and waits for resource X

Or a variation involving the UI thread, synchronization context, or blocking waits.


Pattern 1 — Nested locks

csharp
lock (_machineLock)
{
    lock (_recipeLock)
    {
        ApplyRecipe();
    }
}

Elsewhere:

csharp
lock (_recipeLock)
{
    lock (_machineLock)
    {
        StopMachine();
    }
}

Now one thread can hold _machineLock and wait for _recipeLock, while another holds _recipeLock and waits for _machineLock.

Deadlock.

Fix

  • avoid nested locks if possible
  • enforce one global lock ordering rule
  • reduce shared state so both locks are not needed
  • move to single-owner command processing for machine state
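
The lock ordering rule, sketched: every path that needs both locks takes them in the same documented order. RecipeController and its fields are hypothetical, and the bodies are placeholders.

```csharp
public sealed class RecipeController
{
    // Documented order for this class: _machineLock first, then _recipeLock.
    private readonly object _machineLock = new object();
    private readonly object _recipeLock = new object();

    private string _recipe = "none";
    private bool _running;

    public void ApplyRecipe(string recipe)
    {
        lock (_machineLock)
        {
            lock (_recipeLock)
            {
                if (!_running) _recipe = recipe;
            }
        }
    }

    public string StopMachine()
    {
        // Same order as ApplyRecipe, so no cycle of waiting threads can form.
        lock (_machineLock)
        {
            lock (_recipeLock)
            {
                _running = false;
                return _recipe;
            }
        }
    }
}
```

If every path ends up taking both locks together like this, that is usually a hint that one lock, or one owner, would do.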

Pattern 2 — UI thread + background thread

This is very common in WPF.

Example:

  • background thread holds a lock and tries to update UI synchronously via Dispatcher.Invoke
  • UI thread, meanwhile, tries to enter the same lock

Background thread:

csharp
lock (_stateLock)
{
    Application.Current.Dispatcher.Invoke(() =>
    {
        StatusText = "Running";
    });
}

UI thread:

csharp
lock (_stateLock)
{
    RenderMachineState();
}

Now:

  • background thread waits for UI thread to run the dispatcher action
  • UI thread waits for _stateLock
  • deadlock

Fix

Do not call Dispatcher.Invoke while holding locks.

Capture data inside the lock, then update UI outside.

csharp
string status;
lock (_stateLock)
{
    status = _state.ToString();
}

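// new Action(...) keeps this compiling before C# 10, where a bare lambda
// does not convert to the Delegate parameter of BeginInvoke.
Application.Current.Dispatcher.BeginInvoke(new Action(() =>
{
    StatusText = status;
}));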

Also prefer BeginInvoke/async posting over synchronous Invoke unless you truly need blocking semantics.


Pattern 3 — Sync-over-async (.Result, .Wait())

This is infamous in WPF.

Example on UI thread:

csharp
public void StartButton_Click(object sender, RoutedEventArgs e)
{
    var result = StartInspectionAsync().Result;
}

Why this deadlocks:

  • UI thread blocks waiting for async result
  • StartInspectionAsync awaits something
  • continuation tries to resume on captured UI context
  • UI thread is blocked
  • continuation cannot run
  • deadlock

This is one of the most common real WPF deadlocks.

Fix

Make the whole path async.

csharp
public async void StartButton_Click(object sender, RoutedEventArgs e)
{
    var result = await StartInspectionAsync();
}

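And be disciplined about not blocking async flows. async void is acceptable here only because this is an event handler. Wrap the body in try/catch, because an exception that escapes an async void method is raised on the dispatcher instead of being returned to a caller.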


Why WPF apps are especially vulnerable

Because WPF has a single UI thread with thread affinity.

That creates several risks:

  • developers accidentally block the UI thread
  • async continuations often resume onto the UI context
  • synchronous dispatcher calls can introduce circular waits
  • property changes, command handling, and rendering all depend on a responsive dispatcher

Machine apps make this worse because they also have:

  • hardware callbacks
  • polling loops
  • streaming pipelines
  • vendor SDKs with weird threading rules
  • operators clicking buttons during long-running workflows

So WPF deadlocks are not just a threading problem. They are a coordination + UI model problem.


Part 4 — Real problems in a wafer inspection system

Let’s use this system:

A WPF desktop app controlling a wafer inspection machine

It has:

  • Start / Stop / Pause commands
  • live camera/image streams
  • defect result processing
  • UI dashboards
  • recipe loading
  • machine callbacks from SDK
  • long-running wafer runs

Now let’s look at real failure patterns.


1) Locking around machine state → deadlock risk

A common design is:

csharp
private readonly object _machineLock = new();
private MachineState _state;

Then many methods do:

csharp
lock (_machineLock)
{
    // check state
    // call SDK
    // update state
    // notify UI
}

This feels safe at first.

But over time:

  • Start locks machine state and calls SDK
  • SDK raises callback on another thread
  • callback wants same lock
  • UI thread also reads state under same lock
  • Stop waits behind Start
  • callback tries to marshal back to UI
  • eventually a deadlock or freeze appears

The problem is not just the lock. It is that one lock became responsible for:

  • state
  • SDK call serialization
  • workflow sequencing
  • UI visibility

Too many responsibilities.

Better approach

Split responsibilities:

  • one owner for machine command sequencing
  • short lock only for tiny state transitions if needed
  • never hold lock across SDK calls if avoidable
  • post UI updates after leaving the critical section
  • consider single-threaded command processor for machine operations

2) Mixing UI thread + background thread → freeze

Example:

  • result pipeline processes defects in background
  • it updates ObservableCollection directly
  • WPF throws cross-thread errors, so developer adds dispatcher calls everywhere
  • one path uses synchronous Invoke
  • now under load the UI becomes a bottleneck and may freeze

Typical bad pattern:

csharp
Parallel.ForEach(defects, defect =>
{
    Application.Current.Dispatcher.Invoke(() =>
    {
        Defects.Add(defect);
    });
});

This is terrible under load:

  • many threads synchronously queue to UI
  • background threads block
  • UI thread becomes overloaded
  • responsiveness collapses

Better approach

Batch updates. Keep heavy processing off UI thread. Use producer-consumer handoff and periodic UI refresh.

For example:

  • background threads push to a queue
  • UI timer or dispatcher batch drains every 100 ms
  • UI updates happen in chunks

This is often much better than trying to make every event instantly visible.
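
The handoff part of that design can be sketched without any WPF types. DefectBatcher is a hypothetical helper: background threads call Post, and the UI thread calls Drain from a timer tick (a DispatcherTimer with a 100 ms interval would be one option) and adds the returned items to the bound collection.

```csharp
using System.Collections.Concurrent;
using System.Collections.Generic;

public sealed class DefectBatcher<T>
{
    private readonly ConcurrentQueue<T> _pending = new ConcurrentQueue<T>();

    // Called from any background thread; never touches the UI.
    public void Post(T item) => _pending.Enqueue(item);

    // Called from the UI thread on a timer tick.
    // maxItems caps how much work a single tick does.
    public List<T> Drain(int maxItems)
    {
        var batch = new List<T>();
        while (batch.Count < maxItems && _pending.TryDequeue(out var item))
        {
            batch.Add(item);
        }

        return batch;
    }
}
```

The queue here is unbounded, so this sketch assumes the UI keeps up on average. If producers can outrun it for long, the queue needs a bound or a drop policy.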


3) Incorrect use of Interlocked → logical bugs

Suppose you want to prevent Start from running twice.

csharp
private int _isStarting;

public async Task StartAsync()
{
    if (Interlocked.Exchange(ref _isStarting, 1) == 1)
        return;

    try
    {
        await _machine.StartAsync();
    }
    finally
    {
        Interlocked.Exchange(ref _isStarting, 0);
    }
}

Looks okay.

But what if:

  • machine is already running
  • Stop is requested during start
  • recipe load failed and state changed elsewhere
  • there is also _isRunning, _isStopping, _hasFault

Now you have a bunch of atomic flags, but no reliable overall state model.

That creates logical races even though each variable update is atomic.

Lesson

Atomic primitive correctness is not the same as state machine correctness.

For machine systems, a proper state model is often better than several independent atomics.
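
A minimal sketch of such a state model: one enum, one lock, and every change going through a single transition method. The transition table below is an assumption made for illustration; a real machine defines its own.

```csharp
public enum MachineState { Idle, Starting, Running, Stopping, Faulted }

public sealed class MachineStateModel
{
    private readonly object _lock = new object();
    private MachineState _state = MachineState.Idle;

    public MachineState Current
    {
        get { lock (_lock) { return _state; } }
    }

    // Moves to 'to' only if the machine is currently in 'from' and the edge is legal.
    public bool TryTransition(MachineState from, MachineState to)
    {
        lock (_lock)
        {
            if (_state != from || !IsAllowed(from, to)) return false;
            _state = to;
            return true;
        }
    }

    // Assumed transition table, for illustration only.
    private static bool IsAllowed(MachineState from, MachineState to)
    {
        switch (from)
        {
            case MachineState.Idle:
                return to == MachineState.Starting;
            case MachineState.Starting:
                return to == MachineState.Running || to == MachineState.Stopping || to == MachineState.Faulted;
            case MachineState.Running:
                return to == MachineState.Stopping || to == MachineState.Faulted;
            case MachineState.Stopping:
                return to == MachineState.Idle || to == MachineState.Faulted;
            case MachineState.Faulted:
                return to == MachineState.Idle;
            default:
                return false;
        }
    }
}
```

With this in place, double-start and stop-during-start stop being separate flags to reconcile. They are just transitions that either succeed or return false.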


4) Overusing ConcurrentDictionary → false sense of safety

Suppose active wafer runs are stored in:

csharp
private readonly ConcurrentDictionary<string, RunContext> _runs = new();

Good so far.

But then RunContext contains mutable lists, counters, flags, timestamps, and defect maps updated by many threads.

Now the dictionary is safe. The actual contents are not.

This is a classic production trap.

Better approach

Either:

  • make RunContext internally synchronized very carefully, or
  • treat each RunContext as owned by one processing agent/thread, or
  • make updates flow through messages/commands

Concurrent collections are best when paired with a good ownership model.


5) volatile flags → when they break

Suppose you write:

csharp
private volatile bool _stopRequested;

The poll loop checks it:

csharp
while (!_stopRequested)
{
    Poll();
}

This can be fine for a simple stop signal.

But then someone adds:

  • _isStopping
  • _hasFlushedBuffers
  • _cameraDrainComplete
  • _motorParked

Now shutdown depends on several volatile flags observed across threads.

You no longer have a simple visibility problem. You have a distributed protocol with no strong synchronization.

That is where volatile-based designs break down.

Better approach

Use:

  • CancellationToken for cooperative cancellation
  • explicit task completion for shutdown phases
  • one coordinator owning shutdown sequence
  • stronger synchronization or message passing where correctness matters
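
A rough sketch of the coordinator idea: one object owns the cancellation token and runs the shutdown phases in a fixed order, awaiting each one. ShutdownCoordinator is a hypothetical class, and it assumes phases are registered during startup, before any shutdown can begin.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public sealed class ShutdownCoordinator
{
    private readonly CancellationTokenSource _cts = new CancellationTokenSource();
    private readonly List<Func<Task>> _phases = new List<Func<Task>>();

    // Workers observe this token instead of polling shared flags.
    public CancellationToken Token => _cts.Token;

    // Phases run in registration order, e.g. flush buffers, drain camera, park motor.
    // Not thread-safe: register phases during startup only.
    public void AddPhase(Func<Task> phase) => _phases.Add(phase);

    public async Task ShutdownAsync()
    {
        // 1. Ask every worker to stop.
        _cts.Cancel();

        // 2. Each phase must finish before the next one starts.
        foreach (var phase in _phases)
        {
            await phase();
        }
    }
}
```

Each former flag becomes a task that either completed or did not, and the order is written down in one place instead of being implied by which thread reads which flag first.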

Part 5 — Choosing the right tool

Here is the practical mental model.

Use lock when:

  • shared mutable state must be protected
  • operation is synchronous and short
  • several related values must remain consistent together
  • simplicity and readability matter most

Typical use:

  • protect machine state transition data
  • protect non-thread-safe SDK session access
  • swap shared references safely with a small critical section

Use SemaphoreSlim when:

  • code is async
  • you need mutual exclusion across awaited work
  • you want to throttle concurrency
  • the problem is coordination, not just tiny state protection

Typical use:

  • only one Start/Stop command at a time
  • allow only 4 image decoders concurrently
  • serialize async access to a hardware command pipeline

Use Interlocked when:

  • the problem is one variable
  • counter, flag, one-time guard, atomic reference swap
  • you need very cheap atomic operation
  • the business invariant is small and explicit

Typical use:

  • increment processed-frame count
  • mark stop requested once
  • exchange current snapshot reference atomically

Use volatile when:

  • you truly only need visibility of a simple flag/reference
  • the semantics are extremely narrow
  • stronger tools would be overkill

Typical use:

  • a simple loop-exit flag in low-level code

But in most application code, CancellationToken or stronger coordination is usually clearer.

Use concurrent collections when:

  • the collection itself is genuinely shared
  • supported operations match your access pattern
  • you understand collection safety is not workflow safety
  • you need multi-threaded producer/consumer or concurrent keyed access

Typical use:

  • run registry
  • result buffering queue
  • shared cache

Prefer redesign instead of synchronization when:

  • many locks appear across the same subsystem
  • state logic spans many fields and threads
  • deadlock risk grows
  • code becomes impossible to reason about
  • “thread-safe” objects are still producing logical corruption

This is where senior engineers step back and redesign.

When to use message passing instead

Message passing is often better when:

  • one subsystem should own its own state
  • commands naturally form a sequence
  • the problem is coordination more than shared data
  • you want easier reasoning and lower deadlock risk

For example, a machine controller can be designed as:

  • one command queue
  • one owner loop
  • all machine state changes happen there
  • other components send commands/messages instead of touching shared state directly

That often reduces locking dramatically.

For industrial desktop systems, this is one of the highest-value design moves.
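A minimal sketch of that owner-loop shape, assuming System.Threading.Channels is available (it ships with modern .NET and as a NuGet package for .NET Framework). MachineActor, its string commands, and its two states are hypothetical simplifications.

```csharp
using System.Collections.Generic;
using System.Threading.Channels;
using System.Threading.Tasks;

// Hypothetical example: one queue, one owner loop, no locks.
public sealed class MachineActor
{
    private readonly Channel<string> _commands = Channel.CreateUnbounded<string>(
        new UnboundedChannelOptions { SingleReader = true });
    private readonly List<string> _history = new List<string>(); // owner loop only
    private string _state = "Idle";                              // owner loop only
    private readonly Task _loop;

    public MachineActor() => _loop = Task.Run(() => RunLoopAsync());

    // Any thread may post. Nothing outside the loop touches _state.
    public bool Post(string command) => _commands.Writer.TryWrite(command);

    // Stops accepting commands, waits for the loop to drain, returns the transitions.
    public async Task<IReadOnlyList<string>> CompleteAsync()
    {
        _commands.Writer.Complete();
        await _loop.ConfigureAwait(false);
        return _history;
    }

    private async Task RunLoopAsync()
    {
        var reader = _commands.Reader;
        while (await reader.WaitToReadAsync().ConfigureAwait(false))
        {
            while (reader.TryRead(out string command))
            {
                if (command == "Start" && _state == "Idle") _state = "Running";
                else if (command == "Stop" && _state == "Running") _state = "Idle";
                else continue; // ignore commands that are invalid in the current state
                _history.Add(_state);
            }
        }
    }
}
```

Because commands are applied one at a time by a single loop, overlapping Start and Stop calls become an ordering question rather than a locking question.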


Part 6 — Performance and trade-offs

Lock contention

Contention is when many threads try to acquire the same lock.

Symptoms:

  • throughput drops
  • latency spikes
  • the UI stalls while waiting on a lock held by a background thread
  • thread pool pressure may grow if blocking spreads

The real enemy is usually not the lock itself. It is:

  • too much code inside
  • too many callers
  • wrong granularity
  • wrong ownership model

Atomic operations vs locks

Interlocked is usually cheaper than a lock for one variable.

But the moment you need:

  • check then act that does not fit a single compare-and-swap
  • multiple related values
  • larger invariants

a lock may be the correct tool.

Do not replace clear locking with confusing lock-free code for tiny gains.
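One narrow case where lock-free code stays clear is a check-then-act on a single variable, which one compare-and-swap can express. The sketch below assumes a hypothetical RunFlag with two integer states.

```csharp
using System.Threading;

// Hypothetical example: a two-state flag changed only by compare-and-swap.
public sealed class RunFlag
{
    private const int Idle = 0;
    private const int Running = 1;
    private int _state = Idle;

    // CompareExchange writes Running only if the current value is Idle and
    // returns the value it saw, so exactly one concurrent caller succeeds.
    public bool TryStart() =>
        Interlocked.CompareExchange(ref _state, Running, Idle) == Idle;

    public bool TryStop() =>
        Interlocked.CompareExchange(ref _state, Idle, Running) == Running;
}
```

As soon as the transition also has to validate a recipe, a connection, or any second field, the invariant spans more than one variable and a lock or a single owner is the clearer tool.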

Scalability under load

Some things scale poorly:

  • one global lock for all state
  • one shared counter hit by every thread
  • synchronous dispatcher updates per event
  • unbounded concurrent queue with slow consumer
  • many workers mutating one shared object graph

In machine systems, scalability is often less about raw thread count and more about:

  • keeping UI responsive
  • keeping command sequencing correct
  • keeping streaming pipelines bounded
  • avoiding queue explosion and memory pressure
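One way to keep a streaming buffer bounded is a bounded channel, again assuming System.Threading.Channels is available. BoundedBufferDemo is a hypothetical helper, and DropOldest is only one possible policy. BoundedChannelFullMode.Wait would apply backpressure to the producer instead of discarding data.

```csharp
using System.Collections.Generic;
using System.Threading.Channels;

// Hypothetical example: a fixed-capacity buffer that keeps the newest items.
public static class BoundedBufferDemo
{
    // Writes 0..itemCount-1 with no consumer running, then drains what survived.
    public static List<int> WriteThenDrain(int capacity, int itemCount)
    {
        var channel = Channel.CreateBounded<int>(new BoundedChannelOptions(capacity)
        {
            // When full, discard the oldest queued item to make room.
            FullMode = BoundedChannelFullMode.DropOldest
        });

        for (int i = 0; i < itemCount; i++)
            channel.Writer.TryWrite(i);
        channel.Writer.Complete();

        var kept = new List<int>();
        while (channel.Reader.TryRead(out int item))
            kept.Add(item);
        return kept;
    }
}
```

Whichever policy is chosen, memory use is capped by the capacity rather than by how far the consumer falls behind.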

False sharing, high level

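False sharing happens when different threads update different variables that sit on the same CPU cache line. Each write invalidates the other cores' cached copy of that line, so the threads slow each other down even though they share no data logically.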

You usually do not optimize for this first in application code. But it can matter in hot paths like high-frequency counters.

The important takeaway is broader:

  • shared writes are expensive
  • even “simple atomics” can become bottlenecks at scale
  • reduce shared hot-state when possible
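Reducing shared hot state can be as simple as counting locally and merging once. The sketch below compares the two shapes using the Parallel.For overload with thread-local state. DefectCounter and the frames array are hypothetical.

```csharp
using System.Threading;
using System.Threading.Tasks;

// Hypothetical example: the same count computed with and without a hot shared counter.
public static class DefectCounter
{
    // Every positive frame causes one atomic write to the same shared variable.
    public static long CountShared(int[] frames)
    {
        long total = 0;
        Parallel.For(0, frames.Length, i =>
        {
            if (frames[i] > 0) Interlocked.Increment(ref total);
        });
        return total;
    }

    // Each worker accumulates into its own local value and touches the
    // shared total once, when that worker's partition finishes.
    public static long CountLocal(int[] frames)
    {
        long total = 0;
        Parallel.For(0, frames.Length,
            () => 0L,
            (i, loopState, local) => frames[i] > 0 ? local + 1 : local,
            local => Interlocked.Add(ref total, local));
        return total;
    }
}
```

Both versions return the same result. Whether the local version is measurably faster depends on the workload, so it is worth profiling before assuming a gain.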

When optimization matters

It matters when:

  • you measured contention
  • streaming rate is high enough to stress hot paths
  • the UI is lagging
  • CPU is high due to synchronization
  • memory grows from queue backlog

It does not matter when:

  • the path is infrequent
  • correctness is the primary concern
  • the simple solution is already fast enough

In interviews, a strong senior answer is:

I optimize concurrency only after I have a correct design and evidence of contention. Most production issues come from wrong ownership and blocking behavior, not from the raw cost of a small lock.

That is the right mindset.


Part 7 — Senior engineer thinking

How experienced engineers avoid concurrency bugs by design

They do not start with primitives. They start with ownership.

They ask:

  • who owns this state?
  • who is allowed to mutate it?
  • can I avoid sharing it at all?
  • can I serialize commands through one boundary?
  • can I batch UI updates instead of pushing each event?
  • can I pass immutable snapshots instead of mutable objects?

This is the big shift from mid-level to senior thinking.

Mid-level approach:

  • state is shared, then patched with locks

Senior approach:

  • state ownership is designed so less sharing exists in the first place

Reducing shared state instead of “fixing” it

This is the highest leverage concurrency strategy.

Examples:

  • machine controller owns machine state exclusively
  • pipeline stage owns its own buffer
  • UI consumes immutable view models or snapshots
  • background workers publish results rather than mutating UI-bound objects directly

Once you reduce shared mutable state, primitive usage drops naturally.
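A sketch of the publish-results idea: the background side builds an immutable snapshot and swaps one reference, and the UI side reads whichever snapshot is current. RunSnapshot and SnapshotPublisher are hypothetical names.

```csharp
using System.Threading;

// Hypothetical immutable snapshot: all values are fixed at construction.
public sealed class RunSnapshot
{
    public RunSnapshot(int runId, int defectCount)
    {
        RunId = runId;
        DefectCount = defectCount;
    }

    public int RunId { get; }
    public int DefectCount { get; }
}

// Hypothetical publisher: the only shared mutable state is one reference.
public sealed class SnapshotPublisher
{
    private RunSnapshot _current = new RunSnapshot(0, 0);

    // Readers get a complete snapshot, never a half-updated one.
    public RunSnapshot Current => Volatile.Read(ref _current);

    // Writers replace the whole snapshot atomically instead of mutating fields.
    public void Publish(RunSnapshot next) => Interlocked.Exchange(ref _current, next);
}
```

A reader that holds a snapshot can use it for as long as it likes without a lock, because nothing can change it afterwards.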

Thinking in ownership and boundaries

Good boundaries in a WPF machine app often look like this:

  • UI thread owns visual state and bindings
  • machine controller owns command sequencing and machine lifecycle
  • stream processor owns result ingestion
  • storage writer owns persistence queue
  • communication happens via messages, queues, channels, or immutable snapshots

That is much easier to reason about than “everything can touch everything if it takes the right lock.”

How to reason about correctness under concurrency

Ask very concrete questions:

  • Can two Starts overlap?
  • Can Stop arrive while Start is incomplete?
  • Can result processing continue after run completion?
  • Can UI observe half-updated state?
  • Can shutdown finish while workers still hold resources?
  • Can a callback arrive after disposal?
  • Can a queue grow faster than it drains?
  • What invariants must always hold?

Then identify where those invariants are enforced:

  • lock?
  • semaphore?
  • atomic operation?
  • owner loop?
  • state machine?
  • queue boundary?

If you cannot answer clearly, the design is not yet safe.

How to debug deadlocks and race conditions in production

This is hard, but there is a pattern.

For deadlocks / hangs

Look for:

  • thread dumps
  • blocked UI thread stack
  • threads waiting on monitor/semaphore
  • dispatcher stuck in Invoke
  • .Result / .Wait() on UI or hot paths
  • lock order cycles

Useful signs:

  • near-zero CPU usage while the app is frozen
  • command never returns
  • shutdown hangs
  • all logs stop around a state transition

Add structured logging around:

  • lock acquisition attempts in critical areas
  • state transitions
  • Start/Stop/Dispose lifecycle
  • command IDs / run IDs / wafer IDs
  • wait durations

Do not log every lock acquisition in hot paths, but at critical coordination points this logging can help a lot.
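A sketch of logging wait durations at one coordination point, assuming a hypothetical TimedGate helper and a plain Action<string> as the log sink. A real system would use its own structured logger and field names.

```csharp
using System;
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper: acquires a semaphore with a timeout and logs the wait.
public static class TimedGate
{
    // Returns true if the gate was entered. The caller must then call
    // gate.Release() in a finally block. Returns false on timeout.
    public static async Task<bool> TryEnterAsync(
        SemaphoreSlim gate, TimeSpan timeout, string commandId, Action<string> log)
    {
        var stopwatch = Stopwatch.StartNew();
        bool entered = await gate.WaitAsync(timeout).ConfigureAwait(false);
        stopwatch.Stop();

        // One line per attempt, carrying the correlation ID and the wait time.
        log($"command={commandId} entered={entered} waitedMs={stopwatch.ElapsedMilliseconds}");
        return entered;
    }
}
```

The timeout turns a silent hang into a logged failure that carries a command ID, which is usually easier to diagnose from the field.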

For race conditions

Look for:

  • impossible state combinations
  • intermittent wrong counts
  • duplicate command execution
  • “already disposed” or “not initialized” timing errors
  • behavior that disappears under a debugger

Helpful techniques:

  • add correlation IDs
  • log every state transition with old/new state
  • make transitions explicit and centralized
  • use stress tests and fault injection
  • increase concurrency in test harnesses
  • capture dumps when hangs occur

Senior engineers know that diagnosing concurrency bugs often requires better observability, not just better code reading.
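Increasing concurrency in a test harness can be sketched as below. StressHarness is a hypothetical helper that releases all threads at the same moment with a Barrier, which makes narrow race windows more likely to be hit.

```csharp
using System;
using System.Threading;

// Hypothetical test helper: runs the same body on many threads at once.
public static class StressHarness
{
    public static void RunConcurrently(int threads, int iterations, Action<int> body)
    {
        using (var barrier = new Barrier(threads))
        {
            var workers = new Thread[threads];
            for (int t = 0; t < threads; t++)
            {
                int workerId = t;   // capture a copy, not the loop variable
                workers[t] = new Thread(() =>
                {
                    barrier.SignalAndWait();   // all threads start together
                    for (int i = 0; i < iterations; i++)
                        body(workerId);
                });
                workers[t].Start();
            }

            foreach (var worker in workers)
                worker.Join();
        }
    }
}
```

A typical use is to call the code under test from the body and assert an invariant afterwards, for example that a counter equals threads times iterations. An exception thrown by the body on a worker thread is not caught here and would terminate the process, so a real harness would collect failures instead.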


Practical summary

If I reduce everything to a compact decision model:

  • Use lock for short synchronous protection of shared mutable state.
  • Use SemaphoreSlim for async mutual exclusion or throttling.
  • Use Interlocked for tiny atomic operations on one variable.
  • Use volatile only for narrow visibility scenarios, and rarely.
  • Use concurrent collections when the collection truly needs shared access, but never confuse collection safety with workflow safety.
  • Prefer redesign and ownership boundaries over increasingly clever synchronization.
  • In WPF, be paranoid about:
    • blocking the UI thread
    • Dispatcher.Invoke while holding locks
    • .Result / .Wait()
    • background threads touching UI-bound state directly

The most reliable concurrency design is usually not the one with the smartest primitive.

It is the one with the least shared mutable state.


Interview Q&A

1) When would you use lock instead of SemaphoreSlim?

Use lock for short, synchronous critical sections protecting shared state. It is simple, fast enough in most uncontended cases, and easy to reason about.

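Use SemaphoreSlim when the protected operation is async or when I need throttling. For example, serializing StartAsync and StopAsync calls belongs to SemaphoreSlim, because those operations may await. A lock cannot span async work: the C# compiler rejects await inside a lock block, and the underlying monitor must be released by the same thread that acquired it.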


2) Why is Interlocked not a replacement for lock?

Because Interlocked only makes a single variable operation atomic. It does not protect larger invariants involving multiple fields or multi-step business logic.

For example, incrementing a counter is fine with Interlocked. But validating machine state, recipe readiness, and connection status before transitioning to Running is not a one-variable atomic problem. That needs stronger coordination or ownership.


3) What is the biggest misconception about ConcurrentDictionary?

That it makes the whole design thread-safe.

It only makes dictionary operations thread-safe. The objects stored inside can still be unsafe, and multi-step workflow logic around the dictionary can still race. I treat concurrent collections as safe containers, not as proof that the broader workflow is correct.


4) When is volatile appropriate?

Rarely. Mostly for very simple visibility scenarios, like a low-level shutdown flag read by one loop and written by another thread.

In application code, I usually prefer CancellationToken, Interlocked, or explicit coordination because they communicate intent better and are easier to reason about. volatile is narrow and often misunderstood.


5) How do deadlocks usually happen in WPF apps?

The most common patterns are:

  • nested locks with inconsistent order
  • background thread holding a lock and calling Dispatcher.Invoke
  • sync-over-async on the UI thread using .Result or .Wait()

WPF is especially vulnerable because all UI work runs on a single dispatcher thread and UI objects have thread affinity. If that thread blocks, continuations posted back to it and queued UI work can no longer make progress.


6) How would you protect machine Start/Stop commands in a desktop control app?

I would usually not let multiple callers directly manipulate machine state with scattered locks. I prefer a clear coordination boundary.

A common solution is:

  • one machine controller owns state transitions
  • Start/Stop are serialized through a SemaphoreSlim or command queue
  • state transitions are explicit and logged
  • SDK calls are not made while holding unrelated locks
  • UI is updated asynchronously after state changes

That gives much stronger reasoning than many small locks across services and view models.


7) How do senior engineers avoid concurrency bugs?

By reducing shared mutable state and designing ownership boundaries.

Instead of asking “which primitive should I use,” they ask:

  • who owns this state?
  • who can mutate it?
  • can this become message passing instead?
  • can I hand off immutable snapshots?
  • can I centralize state transitions?

The best concurrency bug is the one the design makes impossible.


8) What is lock contention, and when does it matter?

Lock contention happens when many threads compete for the same lock, causing waiting and serialization.

It matters when that lock is in a hot path, like real-time result ingestion or frequently updated shared state. But I would not optimize it blindly. First I would check whether the design is over-sharing state, whether the critical section is too large, and whether a queue or ownership model would be better.


9) How would you debug a suspected deadlock in production?

I would first identify whether it is a true deadlock or just long blocking.

Then I would inspect:

  • UI thread stack
  • blocked worker threads
  • monitor/semaphore waits
  • dispatcher usage
  • .Result / .Wait() patterns
  • lock order across involved code paths

I also want logs around state transitions and command lifecycles, with correlation IDs. In machine systems, knowing that “Stop requested for Run 123 entered waiting state after Start callback” is far more useful than generic error logs.


10) When would you redesign instead of adding more synchronization?

When synchronization starts spreading everywhere:

  • multiple locks across one subsystem
  • hard-to-explain deadlock risk
  • many atomics representing one logical state machine
  • concurrent collection plus mutable shared objects
  • UI and background code tightly interwoven

At that point, more primitives usually make the system harder to reason about. I would redesign around ownership, message passing, command serialization, or explicit state machines.

