
Background processing, worker loops, and hosted services in .NET desktop systems

This is one of those topics that sounds simple until you build a real machine-control system.

At first, “background processing” feels like a small implementation detail. You think: “I just need a few tasks running in the background.” But in a real industrial desktop application, background processing becomes part of the system’s backbone.

A WPF app controlling a wafer inspection machine is not just showing screens. It is constantly doing work behind the scenes:

  • polling machine state
  • receiving events from hardware or vendor SDKs
  • processing inspection results
  • saving data and images
  • checking health
  • reconnecting after faults
  • cleaning up resources
  • notifying the UI without freezing it

That means your desktop app is not just a UI app. It is a long-running process with a UI attached to it.

And that changes how you should design it.


PART 1 — BIG PICTURE

Why background processing is unavoidable in real desktop systems

In a real industrial system, work keeps happening whether the user is actively clicking buttons or not.

The machine does not stop producing state changes because the UI is idle. Cameras keep capturing. Sensors keep changing. Inspection results keep arriving. Connections can drop at any time. Files still need to be saved. Health checks still need to run.

So background processing is not optional. It is the mechanism that lets the system stay alive and responsive while continuously doing work.

A few concrete examples:

Machine status polling: Some machines or vendor SDKs do not provide reliable push events for all state changes. So you end up polling every 100 ms, 500 ms, or 1 second to ask: “Are you connected?” “Are you idle, running, or faulted?” “Did the current job finish?” “Did the alarm code change?”

Event ingestion: If the machine or SDK pushes events, those events still need to be consumed somewhere. You need a loop that reads them, validates them, timestamps them, and forwards them to the rest of the system.

Result processing: Inspection results often arrive faster than the UI or persistence layer should handle directly. So you buffer and process them in the background: enrich metadata, create thumbnails, run defect classification, aggregate counters, and then publish summaries to the UI.

Health monitoring: You may need a loop that checks whether the machine is alive, whether disk space is low, whether a camera stream stalled, whether a PLC heartbeat stopped, or whether a save queue is falling behind.

Auto-save / persistence: In industrial systems, users hate losing data. So the app often has some background persistence pipeline saving recipes, logs, run history, thumbnails, raw images, or partially completed inspection sessions.

These are not “nice-to-have background features.” They are core system behavior.


Why not everything can run on the UI thread

Because the UI thread is a terrible place for long-running work.

In WPF, the UI thread owns rendering, input handling, data binding updates, command execution, and most UI object access. If you block that thread, the app appears frozen. The window stops repainting. Buttons stop responding. Users start force-closing the application. Operators lose trust immediately.

A common junior mistake is thinking, “This operation is small, I’ll just do it directly in the command handler.” But “small” adds up fast:

  • a synchronous SDK call that occasionally takes 300 ms
  • image decoding
  • file I/O
  • database writes
  • waiting for a machine reply
  • retry loops
  • reconnection attempts

Do enough of that on the UI thread and your “desktop application” becomes a laggy control panel.

In machine-control systems, that is worse than ugly. It is dangerous. An operator may press Stop and see no visible response for two seconds because the UI thread is blocked doing result processing. That is unacceptable.

So the rule becomes:

UI thread = UI work
Background workers = continuous, blocking, or heavy work


Why industrial systems often need multiple always-on loops

In real systems, one background loop is rarely enough.

The reason is simple: different jobs have different timing, failure behavior, and load patterns.

A single “mega-loop” becomes a mess because it mixes concerns like:

  • fast machine polling
  • slow persistence
  • bursty result processing
  • reconnection logic
  • health checks
  • cleanup jobs

These have different requirements.

For example:

Machine monitor loop: Runs frequently and must be predictable. It should be lightweight and resilient.

Result processing loop: May handle bursts of thousands of items. Needs buffering and backpressure.

Reconnect loop: Should sleep most of the time, wake up only when disconnected, retry with policy, and stop once reconnected.

Save pipeline: May involve disk I/O, database I/O, batching, retries, and flush-on-shutdown behavior.

Health monitor: Runs periodically, produces alerts, and should not interfere with machine-control timing.

Trying to jam all of these into one loop leads to poor timing, unclear ownership, and fragile code.

Experienced engineers separate loops by responsibility and design explicit coordination between them.


PART 2 — HOW IT ACTUALLY WORKS

Background tasks vs dedicated threads

This is where people often get confused.

Not every background operation needs its own dedicated OS thread.

In modern .NET, a lot of background work is best represented as a Task running an async loop:

  • wait for message
  • poll status
  • delay
  • read from channel
  • write to queue
  • process result
  • repeat until canceled

That often uses ThreadPool threads under the hood, which is usually fine.

A dedicated thread is more appropriate when:

  • the vendor SDK requires thread affinity
  • you have blocking native calls that can hold a thread for long periods
  • timing must be isolated from ThreadPool starvation
  • you run a special event pump or COM-dependent worker
  • you need a single-threaded actor-like hardware access loop

A Task-based async loop is more appropriate when:

  • work is I/O-heavy
  • you spend time awaiting delays or I/O
  • you want simpler cancellation and composition
  • you want to integrate cleanly with host lifetime and service abstractions

The mistake is not choosing one or the other. The mistake is using them carelessly.

A dedicated thread for every small background function wastes resources and increases complexity.

A ThreadPool-based task for badly blocking hardware calls can starve the pool and hurt the entire app.

So the real question is not “Task or thread?” The real question is:

What kind of work is this loop doing, and what execution model matches it?
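To make the dedicated-thread option concrete, here is a minimal sketch of a single-threaded, actor-like hardware access loop. It assumes a thread-affine vendor SDK whose calls block. DedicatedThreadExecutor and InvokeAsync are hypothetical names, not a library API.

Every call is queued to one named background thread, and the caller gets an awaitable Task back. That way, blocking native calls never occupy ThreadPool threads, and every SDK call runs on the same thread.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper: funnels every call to a thread-affine, blocking SDK
// onto ONE dedicated thread instead of the ThreadPool.
public sealed class DedicatedThreadExecutor : IDisposable
{
    private readonly BlockingCollection<Action> _queue = new();
    private readonly Thread _thread;

    public DedicatedThreadExecutor(string name)
    {
        _thread = new Thread(Pump) { IsBackground = true, Name = name };
        _thread.Start();
    }

    public int ManagedThreadId => _thread.ManagedThreadId;

    // Queue a (possibly blocking) call; the caller awaits the result.
    public Task<T> InvokeAsync<T>(Func<T> call)
    {
        // RunContinuationsAsynchronously keeps the caller's continuation
        // from running on (and stalling) the dedicated thread.
        var tcs = new TaskCompletionSource<T>(TaskCreationOptions.RunContinuationsAsynchronously);
        _queue.Add(() =>
        {
            try { tcs.SetResult(call()); }
            catch (Exception ex) { tcs.SetException(ex); }
        });
        return tcs.Task;
    }

    private void Pump()
    {
        // Blocks this dedicated thread (not a pool thread) until work arrives;
        // the loop ends after CompleteAdding() once the queue is drained.
        foreach (var work in _queue.GetConsumingEnumerable())
        {
            work();
        }
    }

    public void Dispose()
    {
        _queue.CompleteAdding(); // stop accepting work; already-queued calls still run
        _thread.Join();
        _queue.Dispose();
    }
}
```

If the SDK is COM-based and needs a single-threaded apartment, you would also call SetApartmentState(ApartmentState.STA) on the thread before starting it (Windows only).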


Long-running loops

A worker loop usually looks like this conceptually:

  1. Start
  2. Check cancellation
  3. Try to do one unit of work
  4. Handle failures
  5. Wait or continue
  6. Repeat

That sounds trivial. In production, it is not.

Because every loop needs design answers for questions like:

  • What should happen if one iteration fails?
  • Should the loop retry immediately or back off?
  • How do we avoid spinning at 100% CPU?
  • How do we report health?
  • How do we shut down cleanly?
  • What state is safe to share?
  • Does this loop own its dependency exclusively?
  • What happens if the machine disconnects mid-iteration?

A good loop is not just a while(true). It is a controlled service with clear lifecycle and failure behavior.
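Several of these questions can be answered once, in a reusable place. The sketch below is one possible shape, not a standard API. WorkerLoop.RunAsync is a hypothetical helper, and its parameters are illustrative. It:

  • runs one unit of work per iteration
  • exits quietly on cancellation
  • backs off exponentially (capped at maxBackoff) after failures instead of spinning
  • throws once consecutive failures reach a threshold, so a supervisor can take over

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper, not a framework API.
public static class WorkerLoop
{
    public static async Task RunAsync(
        Func<CancellationToken, Task> iteration, // one unit of work
        TimeSpan interval,                       // wait after a successful iteration
        TimeSpan maxBackoff,                     // cap for the failure backoff
        int maxConsecutiveFailures,              // escalate at this many failures in a row
        Action<Exception> onFailure,             // e.g. log + update health state
        CancellationToken token)
    {
        var consecutiveFailures = 0;
        while (!token.IsCancellationRequested)
        {
            TimeSpan wait;
            try
            {
                await iteration(token);
                consecutiveFailures = 0;
                wait = interval;
            }
            catch (OperationCanceledException) when (token.IsCancellationRequested)
            {
                return; // shutdown is not a failure
            }
            catch (Exception ex)
            {
                consecutiveFailures++;
                onFailure(ex);
                if (consecutiveFailures >= maxConsecutiveFailures)
                {
                    // Past the safe threshold: stop and let the supervisor decide.
                    throw new InvalidOperationException(
                        $"Worker failed {consecutiveFailures} times in a row.", ex);
                }

                // interval * 2^(failures - 1), capped at maxBackoff.
                var backoffTicks = interval.Ticks * Math.Pow(2, consecutiveFailures - 1);
                wait = TimeSpan.FromTicks((long)Math.Min(backoffTicks, maxBackoff.Ticks));
            }

            try
            {
                await Task.Delay(wait, token);
            }
            catch (OperationCanceledException)
            {
                return;
            }
        }
    }
}
```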


Coordination with cancellation and shutdown

A serious desktop app must shut down cleanly.

That means when the user closes the app, or the system is stopping, you need to:

  • signal all loops to stop
  • prevent new work from being accepted
  • let in-flight work finish if appropriate
  • flush pending saves if required
  • release hardware connections
  • stop timers and subscriptions
  • wait for workers to exit within a reasonable timeout

That is where CancellationToken becomes essential.

Each worker loop should cooperatively observe cancellation. Not once every ten minutes. Frequently enough that shutdown feels responsive.

A typical pattern is:

  • create a root CancellationTokenSource
  • link child tokens if needed
  • pass token into every background loop
  • use await Task.Delay(..., token) instead of Thread.Sleep
  • use APIs that accept cancellation where possible
  • stop accepting new queued work during shutdown
  • await worker completion

In desktop systems, shutdown logic is usually more complex than in web apps because the process is often holding onto physical resources: machine connections, file handles, image buffers, SDK handles, unmanaged memory.

Clean shutdown is not just about elegance. It is about preventing corruption and avoiding bad machine state.
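A minimal sketch of that shutdown sequence, assuming each worker is a plain Func<CancellationToken, Task>. ShutdownCoordinator is a hypothetical name. It:

  • owns the root CancellationTokenSource
  • refuses new workers once stopping has begun
  • waits for all workers with a timeout, using Task.WaitAsync (.NET 6 or later)

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical owner of the root CancellationTokenSource for all loops.
// Start() is expected to be called from one thread during startup.
public sealed class ShutdownCoordinator : IDisposable
{
    private readonly CancellationTokenSource _root = new();
    private readonly List<Task> _workers = new();
    private volatile bool _stopping;

    // Producers can check this to stop accepting new queued work.
    public bool IsStopping => _stopping;

    public void Start(Func<CancellationToken, Task> worker)
    {
        if (_stopping) throw new InvalidOperationException("Shutdown already in progress.");
        _workers.Add(Task.Run(() => worker(_root.Token)));
    }

    // Signals every loop, then waits for all of them up to the timeout.
    // Returns false if some worker ignored cancellation and is still running.
    public async Task<bool> StopAsync(TimeSpan timeout)
    {
        _stopping = true;
        _root.Cancel();

        var all = Task.WhenAll(_workers);
        try
        {
            await all.WaitAsync(timeout);
        }
        catch
        {
            // Cancellation, worker faults, or the timeout itself land here;
            // worker faults should already be logged by the worker or its supervisor.
        }
        return all.IsCompleted;
    }

    public void Dispose() => _root.Dispose();
}
```

A false return is the signal to log which subsystem is stuck and decide whether the process can still exit safely.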


How desktop apps differ from web-hosted worker services

This is important.

A lot of .NET guidance about background services comes from ASP.NET Core and server apps. The concepts help, but desktop systems are different.

In a web server:

  • requests are short-lived
  • background services often support the server, not the UI
  • restarts may be acceptable
  • statelessness is often easier
  • scaling out is sometimes possible

In a WPF industrial desktop app:

  • the process may run for days or weeks
  • there is one UI thread that must stay responsive
  • the app may directly own hardware connections
  • some state is in-memory and operationally critical
  • restarting may interrupt production
  • the operator sees every failure immediately

So desktop worker design has to care more about:

  • UI responsiveness
  • long-term stability
  • resource leakage over time
  • thread affinity with SDKs
  • correctness of shutdown
  • operator-visible degraded modes
  • graceful recovery instead of “just restart the pod”

You can still use Host, IHostedService, logging, DI, options, channels, and modern patterns in WPF. In fact, that is often a good idea. But you must adapt them to desktop realities.


PART 3 — REAL PROBLEMS IN THIS SYSTEM

Now let’s ground this in the wafer inspection machine example.

Machine monitor loop

This loop is often the heartbeat of the app.

Its job may include:

  • check connection status
  • query machine mode
  • read alarm codes
  • detect state transitions
  • update internal machine state
  • publish status changes to UI and other services

The biggest trap is making this loop “too smart.”

If the monitor loop directly updates UI, triggers reconnects, starts workflows, clears alarms, writes logs, and changes machine state all in one place, it becomes a fragile god loop.

A better design is:

  • the monitor loop reads machine state
  • it produces normalized events or state updates
  • other components decide what to do with them

That separation matters because polling code is timing-sensitive and should stay small.

Another real issue: polling too aggressively. If you poll every 20 ms because it “feels real-time,” you may overload the machine interface, create unnecessary CPU use, and drown the rest of the app in redundant state updates.

Good engineers ask: “What is the real required detection latency?”

Maybe 200 ms is enough for connection state. Maybe 50 ms is needed for emergency stop visibility. Different signals may need different strategies.
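One way to keep the monitor loop small is to have it produce normalized snapshots only, and pass each one through a change detector. Identical polls then do not flood the rest of the app.

MachineSnapshot, MachineStateChanged, and StateChangeDetector are hypothetical types for illustration. Records are used because they compare by value.

```csharp
#nullable enable
using System;

// Hypothetical normalized status produced by the monitor loop.
public sealed record MachineSnapshot(bool Connected, string Mode, int AlarmCode);

// Previous is null for the very first observation.
public sealed record MachineStateChanged(MachineSnapshot? Previous, MachineSnapshot Current, DateTimeOffset At);

// Turns raw polls into transition events: five identical polls yield one event.
// Intended to be used by the single monitor loop only (not thread-safe).
public sealed class StateChangeDetector
{
    private MachineSnapshot? _last;

    // Returns an event only when the snapshot differs from the previous poll.
    public MachineStateChanged? Observe(MachineSnapshot current, DateTimeOffset at)
    {
        if (current == _last) return null; // record equality compares all fields
        var change = new MachineStateChanged(_last, current, at);
        _last = current;
        return change;
    }
}
```

Other components (reconnect, UI, alarms) subscribe to the transition events and decide what to do. The polling code stays timing-focused.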


Result processing loop

Inspection systems often generate data in bursts.

A single wafer run might generate:

  • result records
  • defect metadata
  • thumbnails
  • overlays
  • summary counts
  • image file references

If you process results directly on the machine callback thread, you create coupling between machine ingestion and downstream work. That is dangerous.

Why?

Because if saving images becomes slow, result ingestion slows too. Then machine-side buffers can overflow, SDK callbacks can block, and the entire system becomes unstable.

So experienced engineers decouple:

  • machine callback or acquisition layer enqueues work
  • result processing loop consumes work
  • downstream save pipeline persists data
  • UI receives lightweight summaries, not raw heavy payloads

That creates isolation.

It also gives you a place to handle backpressure. If the queue starts growing, you know the system is falling behind.

Without that visibility, you are blind.
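Here is a minimal sketch of that decoupling with System.Threading.Channels, assuming a hypothetical InspectionItem payload. The SDK callback only calls TryWrite, which never blocks. The bounded capacity makes falling behind visible as a queue depth and a rejection counter.

Rejecting is just one possible policy here. A real inspection system might prefer to raise an alarm or pause acquisition rather than lose results.

```csharp
using System;
using System.Threading;
using System.Threading.Channels;
using System.Threading.Tasks;

// Hypothetical payload type for illustration.
public sealed record InspectionItem(string WaferId, int DefectCount);

public sealed class ResultIngestion
{
    private readonly Channel<InspectionItem> _channel;
    private long _rejected;

    public ResultIngestion(int capacity)
    {
        _channel = Channel.CreateBounded<InspectionItem>(new BoundedChannelOptions(capacity)
        {
            SingleReader = true,
            FullMode = BoundedChannelFullMode.Wait // with TryWrite: returns false when full
        });
    }

    public int QueueDepth => _channel.Reader.Count;        // backpressure metric
    public long Rejected => Interlocked.Read(ref _rejected);

    // Called on the SDK callback thread: never blocks, never awaits.
    public void OnSdkResult(InspectionItem item)
    {
        if (!_channel.Writer.TryWrite(item))
        {
            Interlocked.Increment(ref _rejected); // surface as alarm / health metric
        }
    }

    public void CompleteIngestion() => _channel.Writer.TryComplete();

    // The result processing loop: heavy work happens here, off the callback thread.
    public async Task RunProcessorAsync(Func<InspectionItem, Task> process, CancellationToken token)
    {
        await foreach (var item in _channel.Reader.ReadAllAsync(token))
        {
            await process(item);
        }
    }
}
```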


Reconnect loop after machine disconnect

Reconnect logic is one of the most underestimated pieces in industrial apps.

People often write reconnect logic like this:

  • detect disconnect
  • start retrying every second
  • on success, restore state
  • continue

In reality, reconnect behavior is messy:

  • disconnect detection may be flaky
  • SDK may hang during reconnect
  • network may half-fail
  • hardware may come back in a partial state
  • previous subscriptions may need re-registration
  • stale handles may need disposal
  • UI may need to move into degraded mode
  • workflows may need to be canceled or paused

So the reconnect loop should usually be a separate responsibility, not buried inside the monitor loop.

A mature reconnect loop often has:

  • state gating so only one reconnect flow runs
  • exponential or bounded backoff
  • timeout around connect attempts
  • cleanup of prior connection artifacts
  • post-reconnect reinitialization sequence
  • clear health/status reporting
  • ability to stop immediately during app shutdown

The biggest mistake is letting multiple reconnect attempts run concurrently. That produces chaos: double subscriptions, duplicate sessions, resource leaks, and strange machine behavior.
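Much of that checklist maps onto a small amount of code. In this sketch, ReconnectCoordinator and its two delegates are hypothetical, and connectAsync stands in for whatever the vendor SDK provides. The initial 100 ms backoff is arbitrary.

  • A SemaphoreSlim checked with a zero timeout gates the flow, so a second caller returns immediately instead of starting a parallel reconnect.
  • Each attempt gets a linked token with CancelAfter, so it has its own timeout but still stops on shutdown.
  • Task.WaitAsync stops waiting on an attempt even if the SDK call ignores its token.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical coordinator; connectAsync/cleanupAsync wrap the real SDK calls.
public sealed class ReconnectCoordinator
{
    private readonly SemaphoreSlim _gate = new(1, 1);
    private readonly Func<CancellationToken, Task> _connectAsync;
    private readonly Func<Task> _cleanupAsync;
    private readonly TimeSpan _attemptTimeout;
    private readonly TimeSpan _maxBackoff;

    public ReconnectCoordinator(
        Func<CancellationToken, Task> connectAsync,
        Func<Task> cleanupAsync,
        TimeSpan attemptTimeout,
        TimeSpan maxBackoff)
    {
        _connectAsync = connectAsync;
        _cleanupAsync = cleanupAsync;
        _attemptTimeout = attemptTimeout;
        _maxBackoff = maxBackoff;
    }

    // Returns false at once if a reconnect flow is already running (state gating).
    // On success, the caller runs the post-reconnect reinitialization sequence.
    public async Task<bool> TryReconnectAsync(CancellationToken shutdownToken)
    {
        if (!_gate.Wait(0)) return false;
        try
        {
            var backoff = TimeSpan.FromMilliseconds(100); // arbitrary starting value
            while (true)
            {
                shutdownToken.ThrowIfCancellationRequested();
                await _cleanupAsync(); // dispose stale handles and subscriptions first

                using var attempt = CancellationTokenSource.CreateLinkedTokenSource(shutdownToken);
                attempt.CancelAfter(_attemptTimeout);
                try
                {
                    // WaitAsync stops waiting even if the SDK ignores the token.
                    await _connectAsync(attempt.Token).WaitAsync(attempt.Token);
                    return true;
                }
                catch (Exception) when (!shutdownToken.IsCancellationRequested)
                {
                    // Failed or timed out: report via health status, then back off.
                }

                await Task.Delay(backoff, shutdownToken); // throws on shutdown
                backoff = TimeSpan.FromTicks(Math.Min(backoff.Ticks * 2, _maxBackoff.Ticks));
            }
        }
        finally
        {
            _gate.Release();
        }
    }
}
```

A timed-out call may still be running inside the SDK after WaitAsync gives up on it. That is one reason cleanupAsync runs before every attempt.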


Background save pipeline

Saving in the background sounds straightforward until production traffic arrives.

A save pipeline may need to handle:

  • raw images
  • thumbnails
  • result rows
  • logs
  • recipe snapshots
  • temporary recovery checkpoints

And each has different performance characteristics.

Disk I/O may be fast most of the time, then suddenly slow due to antivirus scanning, network storage issues, or large bursts. Database writes may have occasional latency spikes. Image encoding may be CPU-heavy.

So a robust save pipeline usually needs:

  • buffering
  • bounded capacity
  • retry policy for transient failures
  • dead-letter or failure tracking for unrecoverable items
  • flush behavior on shutdown
  • metrics: queue depth, save rate, error count, oldest pending age

One common production failure is silent accumulation. The save pipeline is slower than ingestion, the queue grows slowly for hours, memory rises, then the app degrades badly near the end of the shift.

The system does not fail suddenly. It fails by drift.

That is why observability matters.
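A compact sketch of several of those properties together. SavePipeline<T> and its writeBatchAsync delegate (standing in for the real disk or database write) are hypothetical. It has:

  • a bounded queue that makes producers wait when full
  • batched writes
  • a small retry budget per batch
  • a dead-letter list for items that still fail
  • flush-on-shutdown: after Complete(), RunAsync drains what is left and returns

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Channels;
using System.Threading.Tasks;

public sealed class SavePipeline<T>
{
    private readonly Channel<T> _queue;
    private readonly Func<IReadOnlyList<T>, Task> _writeBatchAsync; // assumed storage call
    private readonly int _batchSize;
    private readonly int _maxAttempts;
    private long _saved;

    public SavePipeline(int capacity, int batchSize, int maxAttempts, Func<IReadOnlyList<T>, Task> writeBatchAsync)
    {
        _queue = Channel.CreateBounded<T>(new BoundedChannelOptions(capacity) { SingleReader = true });
        _batchSize = batchSize;
        _maxAttempts = maxAttempts;
        _writeBatchAsync = writeBatchAsync;
    }

    public long Saved => Interlocked.Read(ref _saved);
    public int QueueDepth => _queue.Reader.Count;

    // Items that failed every attempt: tracked, not silently lost.
    // Only the reader loop writes to it; inspect it after RunAsync completes.
    public List<T> DeadLetters { get; } = new();

    // Producers wait here when the queue is full (backpressure instead of growth).
    public ValueTask EnqueueAsync(T item, CancellationToken token) => _queue.Writer.WriteAsync(item, token);

    // Shutdown signal: stop accepting work. RunAsync flushes the rest, then returns.
    public void Complete() => _queue.Writer.TryComplete();

    public async Task RunAsync()
    {
        var batch = new List<T>(_batchSize);
        while (await _queue.Reader.WaitToReadAsync())
        {
            while (batch.Count < _batchSize && _queue.Reader.TryRead(out var item))
            {
                batch.Add(item);
            }
            if (batch.Count > 0)
            {
                await WriteWithRetryAsync(batch.ToArray());
                batch.Clear();
            }
        }
    }

    private async Task WriteWithRetryAsync(T[] batch)
    {
        for (var attempt = 1; ; attempt++)
        {
            try
            {
                await _writeBatchAsync(batch);
                Interlocked.Add(ref _saved, batch.Length);
                return;
            }
            catch (Exception) when (attempt < _maxAttempts)
            {
                await Task.Delay(TimeSpan.FromMilliseconds(50 * attempt)); // simple linear backoff
            }
            catch (Exception)
            {
                DeadLetters.AddRange(batch);
                return;
            }
        }
    }
}
```

RunAsync deliberately takes no cancellation token, because completing the writer is its stop signal. To bound how long shutdown may spend flushing, wrap the call site, for example with `pipeline.RunAsync().WaitAsync(timeout)`.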


Keeping these loops alive without destabilizing the app

This is the heart of the topic.

The goal is not merely “start some background tasks.” The goal is to keep them alive, healthy, and predictable for long periods.

That means:

  • each loop has a clear owner
  • each loop has explicit startup and shutdown
  • each loop handles expected failures internally
  • unexpected failures are surfaced, logged, and supervised
  • shared state is minimized
  • communication happens through queues, events, or state snapshots
  • UI updates are marshaled safely
  • liveness is observable

You do not want invisible worker death.

A worker that crashes silently is worse than one that crashes loudly, because the system may look “mostly okay” while critical background behavior has stopped.

For example:

  • health loop dead → no one notices disk is full
  • save loop dead → results stop persisting
  • reconnect loop dead → machine never recovers
  • monitor loop dead → UI shows stale status forever

Reliable desktop apps treat worker liveness as a first-class concern.
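One simple way to make worker death visible is a heartbeat registry. Each loop records a timestamp after every successful iteration, and a watchdog (for example, the health loop) asks which workers have gone quiet for too long.

WorkerHeartbeats is a hypothetical class. The current time is passed in explicitly so the logic is easy to test.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;

public sealed class WorkerHeartbeats
{
    private readonly ConcurrentDictionary<string, long> _lastSuccessUtcTicks = new();

    // Register at startup so a worker that dies before its first success is still caught.
    public void Register(string worker, DateTimeOffset now) => _lastSuccessUtcTicks.TryAdd(worker, now.UtcTicks);

    // Call after every successful iteration of the loop.
    public void Beat(string worker, DateTimeOffset now) => _lastSuccessUtcTicks[worker] = now.UtcTicks;

    // Workers whose last success is older than the allowed silence.
    public IReadOnlyList<string> FindStale(DateTimeOffset now, TimeSpan maxSilence)
    {
        var stale = new List<string>();
        foreach (var (name, ticks) in _lastSuccessUtcTicks)
        {
            if (now.UtcTicks - ticks > maxSilence.Ticks) stale.Add(name);
        }
        return stale;
    }
}
```

The allowed silence should differ per worker. A 200 ms monitor loop going quiet for 5 seconds is alarming, while an hourly cleanup job is not.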


PART 4 — HOW WE USE IT IN .NET (PRACTICAL)

Structuring worker loops with Task and CancellationToken

Here is a realistic basic pattern for a polling worker:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Extensions.Logging;

// IMachineClient and MachineStatus are application-specific types (not shown).
public sealed class MachineMonitorWorker
{
    private readonly IMachineClient _machineClient;
    private readonly ILogger<MachineMonitorWorker> _logger;
    private readonly TimeSpan _pollInterval = TimeSpan.FromMilliseconds(200);

    public MachineMonitorWorker(
        IMachineClient machineClient,
        ILogger<MachineMonitorWorker> logger)
    {
        _machineClient = machineClient;
        _logger = logger;
    }

    public async Task RunAsync(CancellationToken cancellationToken)
    {
        _logger.LogInformation("Machine monitor worker started.");

        while (!cancellationToken.IsCancellationRequested)
        {
            try
            {
                var status = await _machineClient.ReadStatusAsync(cancellationToken);

                // Publish to application state / event bus / channel.
                HandleStatus(status);

                await Task.Delay(_pollInterval, cancellationToken);
            }
            catch (OperationCanceledException) when (cancellationToken.IsCancellationRequested)
            {
                break;
            }
            catch (Exception ex)
            {
                _logger.LogError(ex, "Machine monitor loop failed.");

                // Back off so we don't spin aggressively on repeated failure.
                try
                {
                    await Task.Delay(TimeSpan.FromSeconds(1), cancellationToken);
                }
                catch (OperationCanceledException)
                {
                    break;
                }
            }
        }

        _logger.LogInformation("Machine monitor worker stopped.");
    }

    private void HandleStatus(MachineStatus status)
    {
        // Keep this lightweight.
        // Update internal state or publish an event.
    }
}
```

This is intentionally boring. That is good. Production loops should be boring and predictable.

Notice what this code does right:

  • no while (true)
  • cancellation is cooperative
  • delay is cancelable
  • exception handling is inside the loop
  • repeated failure backs off
  • status handling is separated from polling

Supervising background services

If you have multiple loops, do not just start them all with Task.Run and hope for the best.

Create something that owns them.

For example:

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Extensions.Logging;

public sealed class BackgroundRuntime : IAsyncDisposable
{
    // Example value; choose a bound that fits your shutdown requirements.
    private static readonly TimeSpan StopTimeout = TimeSpan.FromSeconds(10);

    private readonly ILogger<BackgroundRuntime> _logger;
    private readonly MachineMonitorWorker _machineMonitorWorker;
    private readonly ResultProcessorWorker _resultProcessorWorker;
    private readonly SavePipelineWorker _savePipelineWorker;

    private readonly CancellationTokenSource _cts = new();
    private readonly List<Task> _runningTasks = new();

    public BackgroundRuntime(
        ILogger<BackgroundRuntime> logger,
        MachineMonitorWorker machineMonitorWorker,
        ResultProcessorWorker resultProcessorWorker,
        SavePipelineWorker savePipelineWorker)
    {
        _logger = logger;
        _machineMonitorWorker = machineMonitorWorker;
        _resultProcessorWorker = resultProcessorWorker;
        _savePipelineWorker = savePipelineWorker;
    }

    public void Start()
    {
        _runningTasks.Add(RunSupervisedAsync("MachineMonitor", _machineMonitorWorker.RunAsync));
        _runningTasks.Add(RunSupervisedAsync("ResultProcessor", _resultProcessorWorker.RunAsync));
        _runningTasks.Add(RunSupervisedAsync("SavePipeline", _savePipelineWorker.RunAsync));
    }

    private Task RunSupervisedAsync(
        string workerName,
        Func<CancellationToken, Task> worker)
    {
        return Task.Run(async () =>
        {
            try
            {
                await worker(_cts.Token);
            }
            catch (OperationCanceledException) when (_cts.IsCancellationRequested)
            {
                _logger.LogInformation("{WorkerName} canceled.", workerName);
            }
            catch (Exception ex)
            {
                _logger.LogCritical(ex, "{WorkerName} crashed unexpectedly.", workerName);

                // In a real system:
                // - raise alarm
                // - move app to degraded mode
                // - notify operator
                // - possibly trigger controlled shutdown
            }
        });
    }

    public async ValueTask DisposeAsync()
    {
        _logger.LogInformation("Stopping background runtime...");
        _cts.Cancel();

        try
        {
            // Wait for workers, but not forever: a worker that ignores
            // cancellation must not hang application exit.
            await Task.WhenAll(_runningTasks).WaitAsync(StopTimeout);
        }
        catch (TimeoutException)
        {
            _logger.LogError("Background workers did not stop within {Timeout}.", StopTimeout);
            return; // leave _cts alive: a stuck worker may still be observing its token
        }
        catch (Exception ex)
        {
            _logger.LogError(ex, "Error while stopping background runtime.");
        }

        _cts.Dispose();
    }
}
```

This gives you a lifecycle boundary.

In a more mature system, supervision can do more:

  • restart certain workers
  • mark unhealthy state
  • trip circuit breakers
  • notify operator
  • stop dependent workers
  • escalate if critical loops die

Not every worker should auto-restart. That decision depends on the loop.

For example:

  • save pipeline failure might be degraded but survivable
  • machine control loop failure may require immediate safe-stop mode
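That per-worker decision can be made explicit as data rather than being buried in catch blocks. In this minimal sketch, SupervisionPolicy and Supervisor are hypothetical. Restartable workers are restarted after a delay, up to a limit. A non-restartable worker, or one that exceeds its limit, is escalated through a callback where you would enter degraded mode or a safe stop.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical per-worker policy.
public sealed record SupervisionPolicy(bool Restart, int MaxRestarts, TimeSpan RestartDelay);

public static class Supervisor
{
    public static async Task RunAsync(
        string name,
        Func<CancellationToken, Task> worker,
        SupervisionPolicy policy,
        Action<string, Exception> escalate, // e.g. degraded mode, operator alarm, safe stop
        CancellationToken token)
    {
        for (var restarts = 0; ; restarts++)
        {
            try
            {
                await worker(token);
                return; // normal exit (for an always-on loop, arguably worth a warning)
            }
            catch (OperationCanceledException) when (token.IsCancellationRequested)
            {
                return; // shutdown
            }
            catch (Exception ex) when (!policy.Restart || restarts >= policy.MaxRestarts)
            {
                escalate(name, ex);
                return;
            }
            catch (Exception)
            {
                // Restartable and under the limit: log the restart here, then retry.
            }

            try
            {
                await Task.Delay(policy.RestartDelay, token);
            }
            catch (OperationCanceledException)
            {
                return;
            }
        }
    }
}
```

For example, a save pipeline might get `new SupervisionPolicy(true, 3, TimeSpan.FromSeconds(5))`, while a machine-control loop might get `new SupervisionPolicy(false, 0, TimeSpan.Zero)` so that its first crash escalates immediately.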

Handling exceptions in long-running loops

This is one of the biggest production concerns.

If an exception escapes a background loop, one of two bad things usually happens:

  • the worker dies silently
  • the app crashes unexpectedly

Neither is great unless you deliberately chose it.

The rule is:

Expected operational failures should usually be handled inside the loop.
Unexpected invariant-breaking failures should be surfaced clearly.

Examples of expected failures:

  • timeout from machine read
  • temporary disconnect
  • file lock conflict
  • transient database/network failure

Examples of more serious failures:

  • corrupted internal state
  • impossible state transition
  • programming bug
  • repeated failure beyond safe threshold
  • duplicated control ownership

In practice, that means loop code often needs layered handling:

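```csharp
while (!token.IsCancellationRequested)
{
    try
    {
        await ProcessOneIterationAsync(token);
    }
    catch (OperationCanceledException) when (token.IsCancellationRequested)
    {
        break; // normal shutdown, not a failure
    }
    catch (TransientMachineException ex) // application-defined exception type
    {
        _logger.LogWarning(ex, "Transient machine read failure.");
        await Task.Delay(500, token); // throws on shutdown; the supervisor treats that as a stop
    }
    catch (Exception ex)
    {
        _logger.LogError(ex, "Unexpected fatal error in worker.");
        throw; // let supervisor decide what to do
    }
}
```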

This is much better than catching Exception everywhere and pretending all failures are recoverable.


Communicating results safely back to UI

Background workers should not directly manipulate WPF controls.

That creates thread-affinity violations and architectural coupling.

Instead, background workers should publish results to application state, event streams, or message channels. Then the UI layer can observe and marshal updates properly.

For example, using IProgress<T> is fine for simple cases:

```csharp
using System;
using System.Threading;
using System.Threading.Channels;
using System.Threading.Tasks;

// InspectionResult and ResultSummary are application types (not shown).
public sealed class ResultProcessorWorker
{
    private readonly ChannelReader<InspectionResult> _reader;
    private readonly IProgress<ResultSummary> _progress;

    public ResultProcessorWorker(
        ChannelReader<InspectionResult> reader,
        IProgress<ResultSummary> progress)
    {
        _reader = reader;
        _progress = progress;
    }

    public async Task RunAsync(CancellationToken cancellationToken)
    {
        // Ends when the writer completes; throws OperationCanceledException on cancellation.
        await foreach (var result in _reader.ReadAllAsync(cancellationToken))
        {
            var summary = Summarize(result);
            _progress.Report(summary);
        }
    }

    private static ResultSummary Summarize(InspectionResult result)
    {
        return new ResultSummary(result.WaferId, result.DefectCount);
    }
}
```

Then from the UI side:

```csharp
using System;
using System.ComponentModel;

public sealed class MainViewModel : INotifyPropertyChanged
{
    private int _latestDefectCount;

    public int LatestDefectCount
    {
        get => _latestDefectCount;
        private set
        {
            if (_latestDefectCount != value)
            {
                _latestDefectCount = value;
                PropertyChanged?.Invoke(this, new PropertyChangedEventArgs(nameof(LatestDefectCount)));
            }
        }
    }

    // Call this on the UI thread: Progress<T> captures SynchronizationContext.Current
    // when constructed and posts every callback to it.
    public IProgress<ResultSummary> CreateUiProgress()
    {
        return new Progress<ResultSummary>(summary =>
        {
            LatestDefectCount = summary.DefectCount;
        });
    }

    public event PropertyChangedEventHandler? PropertyChanged;
}
```

Progress<T> captures the current SynchronizationContext when it is constructed. So if CreateUiProgress is called on the UI thread, every callback is posted back to the UI thread and can safely update ViewModel properties.

For more complex systems, I usually prefer one of these patterns:

  • application event aggregator
  • channel-based message pump
  • central state store
  • dispatcher-marshaled domain notifications

The principle is the same: workers publish data, and the UI consumes it safely.


Using hosted services in a WPF desktop app

A useful modern approach is to use the .NET Generic Host even in desktop applications.

That gives you:

  • DI
  • logging
  • configuration
  • hosted service lifecycle
  • a consistent composition model

A WPF app can bootstrap a host and register background services:

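```csharp
using System;
using System.Threading.Tasks;
using System.Windows;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Hosting;

public partial class App : Application
{
    public static IHost Host { get; private set; } = default!;

    // Remove StartupUri from App.xaml; the main window is created here instead.
    protected override async void OnStartup(StartupEventArgs e)
    {
        Host = Microsoft.Extensions.Hosting.Host
            .CreateDefaultBuilder()
            .ConfigureServices((context, services) =>
            {
                services.AddSingleton<IMachineClient, MachineClient>();
                services.AddSingleton<MachineMonitorWorker>();
                // ResultProcessorWorker also needs a ChannelReader<InspectionResult>
                // and an IProgress<ResultSummary> registered (not shown).
                services.AddSingleton<ResultProcessorWorker>();
                services.AddSingleton<SavePipelineWorker>();

                services.AddHostedService<MachineMonitorHostedService>();
                services.AddHostedService<ResultProcessorHostedService>();
                services.AddHostedService<SavePipelineHostedService>();

                services.AddSingleton<MainWindow>();
            })
            .Build();

        // Start the host from a ThreadPool thread so hosted services do not capture
        // the UI SynchronizationContext; the await then resumes on the UI thread.
        await Task.Run(() => Host.StartAsync());

        var mainWindow = Host.Services.GetRequiredService<MainWindow>();
        mainWindow.Show();

        base.OnStartup(e);
    }

    protected override void OnExit(ExitEventArgs e)
    {
        // Deliberately not async void: WPF keeps shutting down once OnExit returns,
        // so code after an await here might never run. Blocking is acceptable because
        // the host was started off the UI thread, so stopping the workers does not
        // need the dispatcher (as long as worker code does not call into it synchronously).
        Host.StopAsync(TimeSpan.FromSeconds(10)).GetAwaiter().GetResult();
        Host.Dispose();
        base.OnExit(e);
    }
}
```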

Example hosted service:

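```csharp
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Extensions.Hosting;

public sealed class MachineMonitorHostedService : BackgroundService
{
    private readonly MachineMonitorWorker _worker;

    public MachineMonitorHostedService(MachineMonitorWorker worker)
    {
        _worker = worker;
    }

    protected override Task ExecuteAsync(CancellationToken stoppingToken)
    {
        // In many .NET versions, BackgroundService calls ExecuteAsync directly from
        // StartAsync, so its synchronous part runs on the thread that started the host.
        // Task.Run keeps the loop off that thread and out of its SynchronizationContext.
        return Task.Run(() => _worker.RunAsync(stoppingToken), stoppingToken);
    }
}
```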

This is a clean way to make desktop background processing feel structured instead of ad hoc.

But remember: BackgroundService is just a wrapper around the real loop. It does not magically solve design problems. You still need good worker logic.


PART 5 — COMMON MISTAKES (VERY REALISTIC)

Fire-and-forget tasks with lost exceptions

This is classic:

```csharp
Task.Run(() => DoWorkAsync());
```

No awaiting. No supervision. No lifecycle ownership.

The code “works” until it doesn’t. Then the task fails, exceptions are lost or delayed, and critical processing silently stops.

Production consequence: You think result saving is running, but it died three hours ago. Operators continue working. Later you discover partial data loss.
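The real fix is ownership: something awaits the task and reacts to its failure, as BackgroundRuntime does. Where a fire-and-forget call is genuinely acceptable, a small extension can at least guarantee that a fault gets reported.

Forget is a hypothetical helper name. It relies only on Task.ContinueWith with TaskContinuationOptions.OnlyOnFaulted.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class TaskObservation
{
    // Attaches a continuation that runs only if the task faults, so the
    // exception is always handed to onFault instead of disappearing.
    public static void Forget(this Task task, Action<Exception> onFault)
    {
        task.ContinueWith(
            t => onFault(t.Exception!.GetBaseException()),
            CancellationToken.None,
            TaskContinuationOptions.OnlyOnFaulted | TaskContinuationOptions.ExecuteSynchronously,
            TaskScheduler.Default);
    }
}
```

Usage looks like `SaveAsync(item).Forget(ex => logger.LogError(ex, "Save failed"))`. This makes failures visible, but it still gives you no lifecycle ownership or shutdown coordination.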


Infinite loops without cancellation

Another classic:

```csharp
while (true)
{
    PollMachine();
    Thread.Sleep(1000);
}
```

No cancellation, no structured shutdown, no async wait, no proper exception strategy.

Production consequence: The app hangs during exit, background work continues after the UI closes, machine resources are not released cleanly, and shutdown becomes unreliable.


Blocking sleeps instead of async waits

Using Thread.Sleep inside general background loops is often the wrong choice.

It blocks a thread for the full duration. In a Task-based worker model, that means wasting ThreadPool capacity or requiring unnecessary dedicated threads.

Use:

```csharp
await Task.Delay(interval, token);
```

unless you intentionally own a dedicated blocking thread.

Production consequence: Higher thread usage, reduced scalability inside the process, slower responsiveness to cancellation, and more fragile behavior under load.


Background loops directly touching UI

For example:

  • changing WPF controls directly
  • mutating ObservableCollection from a worker thread
  • setting ViewModel properties from arbitrary threads
  • showing dialogs from worker services

Production consequence: Cross-thread exceptions, random UI corruption, intermittent crashes, and architecture that becomes impossible to reason about.


Multiple loops fighting over shared state

This is extremely common in machine-control systems.

One loop updates machine state. Another loop performs reconnect. Another loop handles user commands. Another loop processes results. All of them read and write the same mutable objects.

This creates race conditions like:

  • reconnect clears state while monitor reads it
  • stop command arrives during reconnect
  • result loop uses stale recipe version
  • two loops attempt connection recovery simultaneously

Production consequence: Intermittent “impossible” bugs. Duplicate actions. Stale UI. Broken invariants. Hard-to-reproduce production failures.

This is why good engineers reduce shared mutable state and introduce clearer ownership boundaries.
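One way to reduce shared mutable state is to publish immutable snapshots. Readers always see a consistent state object. Writers apply a pure transition function with a compare-and-swap, which is retried if another loop updated the state in between.

MachineState and MachineStateStore are hypothetical. A single owning loop that receives commands through a channel (an actor) is an equally valid alternative.

```csharp
using System;
using System.Threading;

// Hypothetical immutable state: loops never mutate a shared object in place.
public sealed record MachineState(bool Connected, string Mode, int RecipeVersion);

public sealed class MachineStateStore
{
    private MachineState _current;

    public MachineStateStore(MachineState initial) => _current = initial;

    // A consistent snapshot; it never changes after you read it.
    public MachineState Current => Volatile.Read(ref _current);

    // transition must be pure: it may run more than once under contention.
    public MachineState Update(Func<MachineState, MachineState> transition)
    {
        while (true)
        {
            var before = Volatile.Read(ref _current);
            var after = transition(before);
            if (ReferenceEquals(Interlocked.CompareExchange(ref _current, after, before), before))
                return after;
            // Another loop won the race; recompute from the newer state.
        }
    }
}
```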


PART 6 — PERFORMANCE & TRADE-OFFS

Polling interval trade-offs

Polling faster is not always better.

A very short interval gives lower latency, but increases:

  • CPU usage
  • SDK/machine load
  • duplicate data volume
  • downstream UI churn
  • logging volume
  • contention with other loops

A very long interval reduces load, but increases detection delay.

So polling must be chosen based on operational need.

For example:

  • emergency stop visibility may need very low latency
  • machine summary status may be fine at 500 ms
  • disk health may be fine at 5 seconds
  • archive cleanup may be fine at 1 minute

Do not use one universal interval for everything.
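A minimal sketch of "one loop per signal, each with its own interval" using PeriodicTimer (.NET 6 or later). Ticks that fire while an iteration is still running are coalesced rather than queued, so a slow poll does not trigger a burst of catch-up polls.

PeriodicPoller is a hypothetical name. Error handling is left out here; it would follow the same pattern as MachineMonitorWorker.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class PeriodicPoller
{
    public static async Task RunAsync(TimeSpan interval, Func<CancellationToken, Task> poll, CancellationToken token)
    {
        using var timer = new PeriodicTimer(interval);
        try
        {
            // Returns false if the timer is disposed; throws when token is canceled.
            while (await timer.WaitForNextTickAsync(token))
            {
                await poll(token);
            }
        }
        catch (OperationCanceledException) when (token.IsCancellationRequested)
        {
            // Normal shutdown.
        }
    }
}
```

For example, you might start one loop for emergency-stop visibility with a short interval, one for summary status at 500 ms, and one for disk health at 5 s, all sharing the same shutdown token.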


Dedicated thread vs ThreadPool usage

Use ThreadPool-backed async tasks for most I/O-driven background work.

Use dedicated threads carefully when:

  • work is truly blocking
  • SDK behavior is ugly
  • thread affinity matters
  • isolation is more important than efficiency

Too many dedicated threads create overhead and make debugging harder.

Too much blocking work on the ThreadPool can starve unrelated operations.

The right answer is situational.

In industrial systems, hardware SDK integration is often the main reason you end up needing some dedicated-thread design.


Throughput vs responsiveness

This shows up everywhere.

If result processing is optimized for maximum throughput, you may batch aggressively and update UI less frequently.

If UI responsiveness is optimized too aggressively, you may publish too many tiny updates and overwhelm the binding/rendering pipeline.

Similarly:

  • large save batches improve throughput
  • smaller batches reduce data-at-risk during crash
  • fast polling improves responsiveness
  • slower polling reduces system overhead

There is no universal optimum. You choose based on what matters most for that subsystem.
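One common middle ground for UI updates is "latest value wins". A worker may report thousands of updates, but the UI receives only the most recent one, at most once per period. Throughput stays high without overwhelming binding and rendering.

CoalescingPublisher is a hypothetical class. The publish callback is where you would marshal to the UI, for example through a Progress<T> created on the UI thread.

```csharp
#nullable enable
using System;
using System.Threading;
using System.Threading.Tasks;

public sealed class CoalescingPublisher<T> where T : class
{
    private T? _pending;

    // Cheap and non-blocking for workers: overwrite whatever is not yet published.
    public void Report(T value) => Volatile.Write(ref _pending, value);

    // Runs as its own loop and publishes the latest value once per period, if any.
    public async Task RunAsync(TimeSpan period, Action<T> publish, CancellationToken token)
    {
        using var timer = new PeriodicTimer(period);
        try
        {
            while (await timer.WaitForNextTickAsync(token))
            {
                var latest = Interlocked.Exchange(ref _pending, null);
                if (latest is not null) publish(latest);
            }
        }
        catch (OperationCanceledException) when (token.IsCancellationRequested)
        {
            // Normal shutdown.
        }
    }
}
```

This only fits state-like data such as counters and status. Events that must each be seen, like alarms, need a queue instead.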


Too many background loops

Breaking responsibilities apart is good. Over-fragmenting is bad.

If every tiny concern becomes its own independent loop, you create:

  • too many moving parts
  • lifecycle complexity
  • harder shutdown coordination
  • more logging noise
  • more shared state edges
  • more scheduling overhead

So don’t design loops as a fashion statement.

A loop should exist because it has a real timing or lifecycle reason.

Not because “services should be small.”


PART 7 — SENIOR ENGINEER THINKING

How experienced engineers structure background processing

Experienced engineers usually think in terms of runtime topology, not just code files.

They ask:

  • What are the continuously running components in this app?
  • Which ones are critical for safety or correctness?
  • Which ones can degrade without stopping production?
  • Who owns each resource?
  • Where are the queues and backpressure boundaries?
  • How is liveness observed?
  • What is the shutdown order?

That leads to designs like:

  • machine communication worker
  • result ingestion buffer
  • result processor
  • save pipeline
  • reconnect coordinator
  • health monitor
  • UI state publisher

Each has an explicit responsibility and clear communication paths.
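
The "queues and backpressure boundaries" question can be made concrete with bounded channels between stages. This is a minimal sketch, assuming .NET 6 or later. `InspectionPipeline`, `RawResult`, `ProcessedResult`, the capacity of 100, and the 0.5 pass threshold are all hypothetical. Each stage completes the channel it writes, so when the caller completes the input, shutdown flows downstream stage by stage.

```csharp
using System;
using System.Threading;
using System.Threading.Channels;
using System.Threading.Tasks;

// Hypothetical message types for the sketch.
public sealed record RawResult(int Id, double Score);
public sealed record ProcessedResult(int Id, bool Pass);

public static class InspectionPipeline
{
    // Wires ingestion -> processor -> save stage. The caller owns (and completes) Input;
    // each stage owns completing the channel it writes to.
    public static (ChannelWriter<RawResult> Input, Task Completion) Start(
        Func<ProcessedResult, Task> saveAsync, CancellationToken ct)
    {
        // Bounded + Wait: when a stage falls behind, upstream writers wait instead of
        // letting memory grow without limit (the backpressure boundary).
        var options = new BoundedChannelOptions(100) { FullMode = BoundedChannelFullMode.Wait };
        var raw = Channel.CreateBounded<RawResult>(options);
        var processed = Channel.CreateBounded<ProcessedResult>(options);

        var processor = Task.Run(async () =>
        {
            Exception? error = null;
            try
            {
                await foreach (var r in raw.Reader.ReadAllAsync(ct))
                    await processed.Writer.WriteAsync(new ProcessedResult(r.Id, r.Score >= 0.5), ct);
            }
            catch (Exception ex) { error = ex; throw; }
            finally { processed.Writer.TryComplete(error); } // propagate completion or failure
        });

        var saver = Task.Run(async () =>
        {
            await foreach (var p in processed.Reader.ReadAllAsync(ct))
                await saveAsync(p);
        });

        return (raw.Writer, Task.WhenAll(processor, saver));
    }
}
```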


How to supervise and observe worker loops

A mature system should be able to answer questions like:

  • Is this worker currently running?
  • When did it last successfully process work?
  • What is its failure count?
  • How far behind is it?
  • Is it degraded, retrying, or healthy?
  • What queue depth is it handling?
  • What is the oldest pending work item?

This is the difference between “we started some tasks” and “we operate a system.”

In practice, that means exposing:

  • structured logs
  • counters
  • queue depth metrics
  • heartbeats / last-success timestamps
  • health state visible to UI or diagnostics
  • alerts when critical loops stop or fall behind

When the app runs for days, observability is not optional.
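
A small per-worker status object is often enough to answer most of those questions. This is a sketch with hypothetical names (`WorkerHealth`, `WorkerState`). The worker writes to it, and the UI or a diagnostics view reads it from other threads, so fields are accessed through `Interlocked` and `Volatile`. The staleness check catches a worker that has stopped making progress without faulting.

```csharp
using System;
using System.Threading;

// Hypothetical states; map them to whatever your diagnostics UI shows.
public enum WorkerState { Starting, Healthy, Retrying, Degraded, Stopped }

// Hypothetical status record: written by one worker, read by UI/diagnostics threads.
public sealed class WorkerHealth
{
    private long _lastSuccessTicks; // UTC ticks; 0 means "never succeeded"
    private int _failureCount;
    private int _queueDepth;
    private int _state;             // stored as int so it can be read/written with Volatile

    public WorkerHealth(string name) => Name = name;

    public string Name { get; }
    public int FailureCount => Volatile.Read(ref _failureCount);
    public int QueueDepth => Volatile.Read(ref _queueDepth);
    public WorkerState State => (WorkerState)Volatile.Read(ref _state);

    public DateTime? LastSuccessUtc
    {
        get
        {
            long ticks = Interlocked.Read(ref _lastSuccessTicks);
            return ticks == 0 ? (DateTime?)null : new DateTime(ticks, DateTimeKind.Utc);
        }
    }

    // Called by the worker after each unit of work completes (the heartbeat).
    public void ReportSuccess(int queueDepth)
    {
        Interlocked.Exchange(ref _lastSuccessTicks, DateTime.UtcNow.Ticks);
        Volatile.Write(ref _queueDepth, queueDepth);
        Volatile.Write(ref _state, (int)WorkerState.Healthy);
    }

    public void ReportFailure()
    {
        Interlocked.Increment(ref _failureCount);
        Volatile.Write(ref _state, (int)WorkerState.Retrying);
    }

    public void SetState(WorkerState state) => Volatile.Write(ref _state, (int)state);

    // True if the worker never succeeded or has been silent longer than maxSilence:
    // this flags a stalled loop even when its Task has not faulted.
    public bool IsStale(TimeSpan maxSilence, DateTime nowUtc)
    {
        var last = LastSuccessUtc;
        return last is null || nowUtc - last.Value > maxSilence;
    }
}
```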


How to keep long-running tasks reliable over days/weeks

Long-running reliability is usually not destroyed by one giant bug.

It is destroyed by accumulation:

  • minor memory leaks
  • unbounded queues
  • dangling subscriptions
  • retry storms
  • silent worker death
  • repeated exception spam
  • resource handles not being released
  • UI over-updating
  • stale state after reconnect

So experienced engineers optimize for stability, not cleverness.

That means:

  • bounded queues where appropriate
  • explicit ownership of unmanaged resources
  • careful retry/backoff design
  • minimal shared mutable state
  • restartable worker design where possible
  • periodic health visibility
  • graceful degradation modes
  • shutdown paths that are tested, not guessed
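
Restartable worker design and careful backoff can be combined in one small supervision loop. This is a minimal sketch, assuming the worker body is an async method that either runs until cancellation or throws. `Supervisor.RunSupervisedAsync` and its parameters are hypothetical. The rule that a run longer than `maxDelay` resets the backoff is also an assumption you would tune.

```csharp
using System;
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical supervision helper.
public static class Supervisor
{
    // Runs workerBody; if it throws, reports the failure and restarts it after a delay
    // that doubles per consecutive failure, capped at maxDelay. Returns on cancellation
    // or when the body completes on its own.
    public static async Task RunSupervisedAsync(
        string name,
        Func<CancellationToken, Task> workerBody,
        TimeSpan initialDelay,
        TimeSpan maxDelay,
        Action<string, Exception, TimeSpan> onFailure, // hypothetical hook: log, update health
        CancellationToken ct)
    {
        var delay = initialDelay;
        while (true)
        {
            var runTime = Stopwatch.StartNew();
            try
            {
                await workerBody(ct);
                return; // body finished by itself: nothing to restart
            }
            catch (OperationCanceledException) when (ct.IsCancellationRequested)
            {
                return; // normal shutdown, not a failure
            }
            catch (Exception ex)
            {
                // Assumption: a run that lasted longer than maxDelay was "stable",
                // so this failure starts the backoff sequence from the beginning.
                if (runTime.Elapsed > maxDelay) delay = initialDelay;
                onFailure(name, ex, delay); // never swallow the failure silently
            }

            try { await Task.Delay(delay, ct); }
            catch (OperationCanceledException) { return; }

            var doubled = TimeSpan.FromTicks(delay.Ticks * 2);
            delay = doubled < maxDelay ? doubled : maxDelay;
        }
    }
}
```

`onFailure` is where the failure count and last error would be recorded, so a restart loop stays visible instead of turning into silent worker death or a retry storm.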

They also test ugly scenarios deliberately:

  • disconnect during inspection
  • slow disk during heavy save load
  • machine reconnect after partial failure
  • app shutdown with pending save queue
  • worker exception in the middle of a run
  • UI close while reconnect is in progress

That is real engineering.


How to shut down cleanly without losing work

Clean shutdown in a real wafer inspection system often means:

  1. stop accepting new commands
  2. signal cancellation
  3. stop machine-related loops safely
  4. flush pending result/save pipelines if possible
  5. persist recovery metadata if needed
  6. release machine/SDK resources
  7. wait for workers to finish, with a timeout
  8. log what was completed vs abandoned

Not every workload should fully drain. Sometimes an immediate stop is safer. Sometimes a save flush is mandatory. Sometimes reconnect should not be attempted during shutdown at all.

The point is: shutdown is a designed workflow, not an afterthought.

A senior engineer thinks about shutdown from day one, because long-running apps reveal shutdown bugs at the worst possible moment.
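
Part of that workflow can be sketched for a single save queue: stop intake, flush with a deadline, then stop and account for what was left. This assumes .NET 6 or later. `SavePipeline`, `TryEnqueue`, `ShutdownAsync`, and the capacity of 1000 are hypothetical. Steps such as persisting recovery metadata or releasing SDK handles are left out.

```csharp
using System;
using System.Threading;
using System.Threading.Channels;
using System.Threading.Tasks;

// Hypothetical save queue used to illustrate an ordered, time-bounded shutdown.
public sealed class SavePipeline
{
    private readonly Channel<string> _queue = Channel.CreateBounded<string>(1000);
    private readonly CancellationTokenSource _abort = new();
    private readonly Func<string, CancellationToken, Task> _save;
    private readonly Task _worker;
    private int _saved;
    private int _inFlight; // 1 while an item is being saved

    public SavePipeline(Func<string, CancellationToken, Task> save)
    {
        _save = save;
        _worker = Task.Run(() => DrainAsync());
    }

    // Returns false once shutdown has started (new work is refused) or the queue is full.
    public bool TryEnqueue(string item) => _queue.Writer.TryWrite(item);

    private async Task DrainAsync()
    {
        try
        {
            await foreach (var item in _queue.Reader.ReadAllAsync(_abort.Token))
            {
                Volatile.Write(ref _inFlight, 1);
                await _save(item, _abort.Token);
                Interlocked.Increment(ref _saved);
                Volatile.Write(ref _inFlight, 0);
            }
        }
        catch (OperationCanceledException) when (_abort.IsCancellationRequested)
        {
            // Aborted after the flush deadline; leftovers are counted in ShutdownAsync.
        }
    }

    // Stop intake, try to flush within flushTimeout, then abort and report saved vs abandoned.
    public async Task<(int Saved, int Abandoned)> ShutdownAsync(TimeSpan flushTimeout)
    {
        _queue.Writer.TryComplete();                   // stop accepting new items
        var first = await Task.WhenAny(_worker, Task.Delay(flushTimeout));
        if (first != _worker)
            _abort.Cancel();                           // deadline hit: stop instead of draining
        await _worker;                                 // rethrows if a save failed with a real error

        int abandoned = Volatile.Read(ref _inFlight);  // item interrupted mid-save, if any
        while (_queue.Reader.TryRead(out _)) abandoned++;
        return (Volatile.Read(ref _saved), abandoned);
    }
}
```

Whether a given queue may be abandoned or must be flushed is the per-subsystem decision described above. The returned counts are what you would log as completed vs abandoned.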


Final practical mental model

In a production WPF industrial app, background processing is not “some helper tasks.”

It is more like an internal plant of always-running services inside the desktop process.

The UI is only one part of the system. Behind it, there are worker loops continuously:

  • reading
  • buffering
  • transforming
  • retrying
  • reconnecting
  • saving
  • reporting
  • shutting down

The quality of those loops determines whether the app feels stable, trustworthy, and production-grade.

The real goal is not just concurrency. The real goal is controlled concurrency:

  • clear ownership
  • safe cancellation
  • supervised execution
  • bounded communication
  • reliable recovery
  • clean shutdown
  • UI isolation

That is how experienced .NET engineers build desktop systems that can stay alive for days or weeks without becoming fragile.

