Below is the mental model I would want a senior engineer to have before walking into a leadership interview.
This is not “how to use Task.Run.” This is “what the runtime is really doing underneath, why it behaves that way, and where production systems get hurt.”
Part 1 — Core concepts recap
Process vs thread
A process is an operating-system container for execution. It has its own virtual address space, loaded modules, handles, security context, and one or more threads. When your .NET app starts, the OS creates a process, loads the runtime, and then execution begins on an initial thread.
A thread is an execution path inside that process. Threads in the same process share the same heap and most process resources, but each thread has its own call stack, register state, and scheduling state. That shared heap is exactly why concurrency is dangerous: multiple threads can observe and modify the same objects at the same time. That is where visibility bugs, races, and contention come from.
User thread vs OS thread
In .NET, when people say “thread,” they usually mean a managed System.Threading.Thread, but that is fundamentally a wrapper over an OS thread. The CLR tracks metadata for it, can attach managed state, and coordinates GC safepoints, exceptions, and runtime services around it, but it still runs on an operating system thread underneath.
So unlike runtimes that heavily use green threads or fibers as the main abstraction, classic .NET threading is largely built on top of kernel-scheduled threads. async changes how work is represented and resumed, but it does not magically create a new kind of CPU thread.
Concurrency vs parallelism
Concurrency means multiple operations are in progress during the same period of time. Parallelism means multiple operations are literally executing at the same instant on different cores.
That distinction matters a lot:
- An ASP.NET server handling 10,000 socket requests asynchronously is highly concurrent, but not necessarily running 10,000 threads.
- A Parallel.For over CPU-heavy image processing is parallelism: you are trying to use multiple cores at once.
- A WPF app with a UI thread, a hardware callback thread, and background processing is concurrent even if only one worker is actively running at some instant.
A lot of interview confusion starts when people mix these up. Concurrency is mostly about coordination. Parallelism is mostly about throughput from hardware.
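To make the distinction concrete, here is a minimal sketch. The workload sizes and variable names are illustrative assumptions, not taken from any real system. The first half keeps many operations in flight without dedicating a thread to each; the second half spreads CPU-bound work across cores using PLINQ.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

// Concurrency: 1,000 timers are in flight at once, but no thread is parked per wait.
Task[] waits = Enumerable.Range(0, 1000).Select(_ => Task.Delay(100)).ToArray();
await Task.WhenAll(waits);

// Parallelism: CPU-bound work is partitioned across the available cores.
long sumOfSquares = ParallelEnumerable.Range(0, 100_000).Sum(n => (long)n * n);

Console.WriteLine($"waits completed: {waits.Count(t => t.IsCompletedSuccessfully)}");
Console.WriteLine($"sum of squares: {sumOfSquares}");
```

The first half is about coordination (nothing is computing while the delays run); the second is about hardware throughput.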
Part 2 — Thread in .NET
How Thread maps to an OS thread
A managed Thread corresponds closely to an actual OS thread. Creating one means the runtime asks the OS to create a native thread, allocates and initializes thread state, prepares a stack, and wires the thread into CLR bookkeeping so the runtime can suspend it for GC, track exceptions, and manage thread-local storage.
This is a heavyweight object compared to a task or a queued work item.
Thread lifecycle
A practical lifecycle looks like this:
- Created: you constructed new Thread(...)
- Started: the OS thread is created and scheduled
- Runnable / running: it may run, block, yield, or be preempted
- Blocked / waiting: sleeping, waiting on I/O, lock, monitor, event, etc.
- Stopped: the delegate returns, or an unhandled exception escapes it (which by default terminates the whole process, not just the thread)
The OS scheduler decides when it actually runs. The CLR does not directly “own” CPU scheduling; it cooperates with the OS.
Cost of creating threads
Creating a thread is expensive for several reasons:
- you pay for an OS thread object
- you reserve stack space
- you initialize scheduler/runtime bookkeeping
- the OS has to manage another schedulable entity
- the GC has one more thread to coordinate with
The exact cost varies by OS and environment, but the important point is architectural: a thread is not just “a little bit of work.” It is a scarce scheduling resource. That is why the runtime strongly prefers ThreadPool + tasks for most short-lived work.
Context switching
A context switch happens when the OS stops running one thread and runs another. That means saving register state, restoring another thread’s state, touching scheduling structures, and often disturbing CPU cache locality.
This is where over-threading kills performance. If you spawn too many threads:
- CPUs spend more time switching than doing useful work
- cache warmth disappears
- lock contention rises
- tail latency gets worse
So the big senior-engineer idea is: more threads do not mean more throughput. Past a point, they mean less.
Part 3 — ThreadPool internals
What the ThreadPool is really for
The .NET ThreadPool exists to amortize thread cost. Instead of creating a new OS thread for every short unit of work, the runtime maintains a pool of worker threads and reuses them. The default TaskScheduler is built on top of this pool, and it uses work-stealing plus thread injection/retirement heuristics for throughput and load balancing. (Microsoft Learn)
In practice, the ThreadPool is the engine behind:
- Task.Run
- many async continuations
- timers
- a lot of internal framework work dispatch
Global queue vs local queue
Conceptually, modern .NET ThreadPool scheduling uses:
- a global queue for broadly visible work
- local queues associated with worker threads for efficient push/pop behavior
Why both?
The global queue is simple and fairer across the whole pool. Local queues improve throughput because a worker can often enqueue and dequeue its own work with less contention. The default scheduler uses work-stealing so idle threads can take work from others when needed. (Microsoft Learn)
A useful mental model is:
- external producers often feed shared/global structures
- worker-created child work often lands locally
- workers prefer their own local work first
- idle workers steal from others
That design tries to balance locality and fairness.
Work stealing
Work stealing is the runtime’s answer to uneven load. Suppose thread A’s local queue is full and thread B has nothing to do. Instead of waking a brand-new thread immediately, B tries to steal from A.
Why this is powerful:
- it improves CPU utilization
- it reduces contention on one giant global queue
- it preserves cache locality for the owning thread as much as possible
- it handles irregular parallel workloads better than strict central scheduling
But it is not free. Stealing requires coordination, and in some bad patterns, stolen work can become “lower priority in practice” than you expect, especially when new global work keeps arriving. Recent .NET performance notes explicitly discuss cases where local-queue work can wait longer if the system keeps finding fresh global work first. (Microsoft for Developers)
Hill-climbing algorithm, high level
The ThreadPool must decide: “Do I need more worker threads right now?”
It cannot just add threads whenever there is a queue, because that causes oversubscription and context switching. It also cannot be too conservative, because then queued work sits idle.
So it uses heuristics, often described at a high level as a hill-climbing strategy: adjust worker count, observe throughput, keep moving in the direction that improves throughput, and back off when more threads stop helping. Microsoft’s docs summarize this as thread injection and retirement aimed at maximum throughput. (Microsoft Learn)
This is the right mental model: the pool is constantly searching for a good operating point, not a mathematically perfect one.
How tasks are scheduled
A Task is not itself a thread. It is a unit of work plus state plus continuation machinery. When a task is scheduled with the default scheduler, it typically becomes a ThreadPool work item. The scheduler decides where to queue it; the ThreadPool decides when a worker executes it.
So the pipeline is roughly:
- create task
- schedule task through a TaskScheduler
- default scheduler queues to ThreadPool
- ThreadPool worker dequeues and executes
- continuations get scheduled too
That separation is important: Task is the abstraction; ThreadPool is often the execution substrate.
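A quick way to see the abstraction/substrate split is to ask each kind of work which thread it landed on. This is a small sketch with hypothetical variable names; it relies only on Thread.CurrentThread.IsThreadPoolThread.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// A task scheduled on the default scheduler runs on a ThreadPool worker.
bool taskRanOnPool = await Task.Run(() => Thread.CurrentThread.IsThreadPoolThread);

// A manually created Thread is a dedicated OS thread outside the pool.
bool threadRanOnPool = true;
var dedicated = new Thread(() => threadRanOnPool = Thread.CurrentThread.IsThreadPoolThread);
dedicated.Start();
dedicated.Join(); // Join also guarantees the write above is visible here

Console.WriteLine($"Task.Run on pool thread:   {taskRanOnPool}");
Console.WriteLine($"new Thread on pool thread: {threadRanOnPool}");
```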
Part 4 — Task scheduling
TaskScheduler
TaskScheduler is the policy layer for where and how tasks run. The docs put it simply: a task scheduler ensures that task work eventually gets executed. The default scheduler is integrated with the ThreadPool and supports work-stealing; custom schedulers can impose other policies. (Microsoft Learn)
So TaskScheduler is not “a thread.” It is a dispatch strategy.
Common policies a custom scheduler might enforce:
- single-thread affinity
- limited concurrency
- priority handling
- isolation from the main ThreadPool
- UI thread dispatch through synchronization context
Default scheduler vs custom
The default scheduler is the one you get most of the time. It rides on the ThreadPool and is what powers Task.Run.
A custom scheduler exists when you need a different execution policy. For example:
- a scheduler that runs everything on one dedicated thread
- a scheduler that caps concurrency at 2
- a scheduler bound to a UI context
Most systems should stick to the default scheduler. Custom schedulers are powerful, but they are easy to misuse because you are now taking responsibility for fairness, deadlock risk, and throughput.
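Before writing a scheduler from scratch, it is worth knowing that the framework ships one non-default policy out of the box: ConcurrentExclusiveSchedulerPair. The sketch below (variable names are illustrative) uses its ExclusiveScheduler, which runs at most one task at a time, so a deliberately unsynchronized counter still ends up correct.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

// Built-in pair of schedulers; the exclusive one runs a single task at a time.
var pair = new ConcurrentExclusiveSchedulerPair();
var factory = new TaskFactory(pair.ExclusiveScheduler);

int counter = 0; // deliberately updated without a lock or Interlocked
Task[] work = Enumerable.Range(0, 1000)
    .Select(_ => factory.StartNew(() => { counter++; }))
    .ToArray();
await Task.WhenAll(work);

Console.WriteLine(counter);
```

The policy, not a lock, is what serializes access here. That is the sense in which a scheduler is a dispatch strategy rather than a thread.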
How Task.Run works internally
High level:
- Task.Run(action) creates a task
- it targets TaskScheduler.Default
- the task is queued to the ThreadPool
- a worker thread eventually executes the delegate
The key point is that Task.Run is a very opinionated API. It means: “queue this to the default ThreadPool-backed scheduler.” It is not “make this asynchronous” in the I/O sense. It is “run this work on pool threads.” That is great for CPU-bound work, but it is the wrong cure for naturally asynchronous I/O.
This distinction is a major interview trap.
Part 5 — Memory model and visibility
This is the part many experienced engineers still treat too casually.
Why visibility problems happen
On a single thread, code looks sequential. On multiple threads, that illusion breaks.
A thread may write a value, but another thread may not see it immediately or in the order you expect, because of:
- compiler optimizations
- JIT optimizations
- CPU store buffers
- cache hierarchy
- hardware reordering rules
The C# docs explicitly warn that compiler, runtime, and hardware may rearrange reads and writes for performance. (Microsoft Learn)
So the problem is not only “two threads wrote at once.” Sometimes the bug is simply: one thread wrote, another thread observed stale state.
CPU cache effects
Each core has caches, and hardware aggressively avoids synchronizing every memory access globally because that would be far too slow. So one core can temporarily see an older value than another. Modern systems eventually converge, but “eventually” is not a synchronization guarantee.
That is why polling a plain field from one thread while another updates it is broken unless you use the right synchronization semantics.
Reordering
Reordering is the other half of the story.
For example, this looks harmless:
_data = 42;
_initialized = true;
Another thread might observe _initialized == true before it can reliably observe _data == 42, unless the program establishes the right ordering guarantees. Microsoft’s memory-model material explicitly describes these reorder risks and why synchronization is required. (Microsoft Learn)
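One way to establish that ordering for a flag-plus-payload handoff is a release write paired with an acquire read. The following is a minimal sketch using Volatile.Write and Volatile.Read; the local variable names are assumptions made for the example, and in real code a lock or a higher-level primitive is often the simpler choice.

```csharp
using System;
using System.Threading;

int data = 0;
bool initialized = false;
int observed = -1;

var reader = new Thread(() =>
{
    // Acquire read: once the flag is seen as true, the earlier write to data is visible too.
    while (!Volatile.Read(ref initialized)) Thread.Yield();
    observed = data;
});
reader.Start();

data = 42;
// Release write: the store to data cannot be reordered after this store.
Volatile.Write(ref initialized, true);

reader.Join();
Console.WriteLine(observed);
```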
volatile
volatile says: accesses to this field must obey stronger visibility and ordering semantics than ordinary reads/writes. In modern .NET guidance, volatile is presented as a special-purpose tool, not a general solution; safer primitives like lock, Interlocked, and higher-level synchronization are preferred in most scenarios. (Microsoft Learn)
Important nuance:
- volatile helps with visibility and some ordering
- it does not make compound operations atomic
- it does not replace a lock for invariants
So this is still broken:
volatile int _count;
_count++;
Why? Because increment is read-modify-write, not one indivisible operation.
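The difference is easy to demonstrate. This sketch (names and iteration count are arbitrary choices) increments two counters from many threads: one with a plain ++, one with Interlocked.Increment. The unsafe total is not deterministic, so the code only reports it.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

const int Iterations = 1_000_000;
int unsafeCount = 0;
int safeCount = 0;

Parallel.For(0, Iterations, _ =>
{
    unsafeCount++;                        // load, add, store: updates can be lost
    Interlocked.Increment(ref safeCount); // one atomic read-modify-write
});

Console.WriteLine($"unsafe: {unsafeCount} (may be less than {Iterations})");
Console.WriteLine($"safe:   {safeCount}");
```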
Memory barriers, conceptually
A memory barrier is a rule that prevents certain reads/writes from moving across a point. You do not usually think in raw barrier instructions in application code, but primitives like lock, Monitor, Interlocked, Volatile.Read, and Volatile.Write rely on those semantics.
The Volatile docs describe this in acquire/release terms:
- volatile write prevents earlier memory ops from moving after it
- volatile read prevents later memory ops from moving before it (Microsoft Learn)
That is the conceptual model you need for interviews.
Why race conditions happen
A race condition happens when correctness depends on timing/interleaving between threads, and the program does not enforce the required ordering or atomicity.
Classic example:
if (!_initialized)
{
    Initialize();
    _initialized = true;
}
Two threads can both enter. Or one can see _initialized before all the initialization data is safely published. These are different races:
- atomicity race
- publication/visibility race
Senior engineers distinguish them.
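For one-time initialization specifically, Lazy<T> with LazyThreadSafetyMode.ExecutionAndPublication addresses both races at once: the factory runs a single time, and the finished value is published safely. A small sketch, with a hypothetical factory and variable names:

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

int factoryCalls = 0;

// The factory below is a stand-in for an expensive Initialize() step.
var config = new Lazy<string>(() =>
{
    Interlocked.Increment(ref factoryCalls);
    return "loaded";
}, LazyThreadSafetyMode.ExecutionAndPublication);

// Sixteen tasks race to read the value.
string[] seen = await Task.WhenAll(
    Enumerable.Range(0, 16).Select(_ => Task.Run(() => config.Value)));

Console.WriteLine($"factory calls: {factoryCalls}, distinct values: {seen.Distinct().Count()}");
```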
Part 6 — Synchronization primitives
lock / Monitor
In C#, lock(obj) is syntax sugar over Monitor.Enter(obj) / Monitor.Exit(obj) in a try/finally.
What it gives you:
- mutual exclusion
- memory ordering guarantees around the critical section
- reentrancy for the owning thread
Internally, the runtime tries to make uncontended locking cheap and only pays more when contention appears. That is why lock is usually the default choice for protecting in-memory shared state.
Use it when:
- the protected data is in-process
- the critical section is short
- you need simple, strong correctness
Do not use it when:
- you need cross-process coordination
- you need to await inside the critical section
- you need very specialized reader-heavy behavior
Mutex
A Mutex is heavier than Monitor. It is an OS-backed synchronization primitive and can be named for cross-process use.
That makes it useful for:
- single-instance app coordination
- inter-process exclusion
But for normal intra-process data protection, it is typically much slower than lock because you are paying kernel-object costs.
So the rule of thumb is simple: if you only need in-process mutual exclusion, prefer lock.
Semaphore / SemaphoreSlim
A semaphore is not “one owner at a time.” It is “up to N concurrent entrants.”
Use it when you want to limit concurrency, for example:
- allow only 8 concurrent HTTP calls
- allow only 2 consumers into a section
- throttle access to a resource pool
SemaphoreSlim is the lighter in-process version, optimized for managed use and commonly used in async-aware throttling patterns. A full Semaphore is more OS-oriented and heavier.
Practical distinction:
- SemaphoreSlim for most .NET application concurrency throttling
- Semaphore when you need named/cross-process semantics or compatibility with wait handles
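A typical async throttling shape looks roughly like this. It is a sketch under stated assumptions: the limit of 3, the 20 work items, and DoWorkAsync are all hypothetical, and Task.Delay stands in for a real HTTP call. The extra bookkeeping only exists to show that the limit holds.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

const int Limit = 3;
var gate = new SemaphoreSlim(Limit);
int inFlight = 0, maxInFlight = 0;

async Task DoWorkAsync()
{
    await gate.WaitAsync();              // asynchronous wait: no thread is blocked
    try
    {
        int now = Interlocked.Increment(ref inFlight);
        int seenMax;
        do { seenMax = Volatile.Read(ref maxInFlight); }   // record the peak concurrency
        while (now > seenMax &&
               Interlocked.CompareExchange(ref maxInFlight, now, seenMax) != seenMax);

        await Task.Delay(20);            // stand-in for the throttled operation
        Interlocked.Decrement(ref inFlight);
    }
    finally
    {
        gate.Release();                  // always release, even on failure
    }
}

await Task.WhenAll(Enumerable.Range(0, 20).Select(_ => DoWorkAsync()));
Console.WriteLine($"max concurrent: {maxInFlight} (limit {Limit})");
```

Unlike lock, this pattern tolerates an await between acquire and release, which is why SemaphoreSlim is the usual answer to "I need a lock around async code."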
ReaderWriterLock
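A reader-writer lock (ReaderWriterLockSlim in modern .NET; the older ReaderWriterLock is rarely the right choice for new code) allows: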
- many concurrent readers
- one exclusive writer
This sounds perfect for read-heavy workloads, but it is not automatically better than lock.
Why not?
- it is more complex
- upgrade/downgrade behavior can get tricky
- writer starvation or fairness trade-offs can appear
- overhead can exceed benefits unless reads are frequent and long enough
So use it only when profiling suggests real benefit. Otherwise, a plain lock is often faster and simpler.
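For reference, basic ReaderWriterLockSlim usage looks like the sketch below. The cache, keys, and helper names are made up for illustration, and nothing here implies it beats a plain lock for this tiny workload.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

var cacheLock = new ReaderWriterLockSlim();
var cache = new Dictionary<string, int>();

int? TryRead(string key)
{
    cacheLock.EnterReadLock();           // many readers may hold this at once
    try { return cache.TryGetValue(key, out int v) ? v : (int?)null; }
    finally { cacheLock.ExitReadLock(); }
}

void Write(string key, int value)
{
    cacheLock.EnterWriteLock();          // exclusive: waits for active readers to leave
    try { cache[key] = value; }
    finally { cacheLock.ExitWriteLock(); }
}

// Mostly reads, with an occasional write.
Parallel.For(0, 100, i =>
{
    if (i % 10 == 0) Write("k" + i, i); else TryRead("k0");
});

Console.WriteLine($"entries: {cache.Count}, k50 = {TryRead("k50")}");
```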
Cost differences
Very roughly, from lighter to heavier in common uncontended in-process scenarios:
- Interlocked
- lock / Monitor
- SemaphoreSlim
- ReaderWriterLockSlim (depends on pattern, can be great or bad)
- Mutex / kernel primitives
The important interview answer is not the exact ranking. It is this:
Choose the weakest primitive that still gives the guarantees you need. Anything stronger than necessary usually costs throughput and simplicity.
Part 7 — Concurrent collections
How ConcurrentDictionary works conceptually
ConcurrentDictionary<TKey, TValue> is designed so many threads can operate on the dictionary safely without one giant global lock for every operation.
Conceptually, it uses techniques such as:
- atomic reads where possible
- fine-grained synchronization for mutations
- partitioned/bucketed coordination rather than a single coarse lock
- retry loops for races
The exact internals have evolved over .NET versions, but the important design idea is consistent: reduce contention by avoiding one lock for the entire structure.
That means two threads updating unrelated buckets often do not block each other the way they would with lock(myDictionary) around the whole thing.
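In application code, the practical consequence is that per-key read-modify-write goes through the dictionary's own atomic helpers rather than an outer lock. A minimal sketch, with made-up keys and counts:

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

var hits = new ConcurrentDictionary<string, int>();
string[] keys = { "a", "b", "c", "d" };

Parallel.For(0, 40_000, i =>
{
    // The update delegate may run more than once under contention,
    // so it should stay free of side effects.
    hits.AddOrUpdate(keys[i % keys.Length], 1, (_, current) => current + 1);
});

foreach (string key in keys) Console.WriteLine($"{key}: {hits[key]}");
```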
Lock-free vs fine-grained locking
These terms get mixed together a lot.
Lock-free means progress is coordinated through atomic operations like CAS (CompareExchange) rather than conventional blocking locks.
Fine-grained locking means locks still exist, but at smaller scope.
Many “concurrent” collections are not purely lock-free. They often use a hybrid approach:
- lock-free or near-lock-free fast paths for reads
- fine-grained locks for structural mutations
- careful memory publication rules
That is usually the sweet spot in real runtimes because fully lock-free algorithms are hard to get correct, hard to maintain, and can still perform badly under contention if retries explode.
Part 8 — Contention and scalability
What contention is
Contention means multiple threads are trying to use the same constrained resource at the same time.
That resource could be:
- a lock
- a queue
- a cache line
- a CPU core
- a database connection slot
- the ThreadPool itself
When threads contend, they spend time waiting, retrying, spinning, or context switching instead of doing useful work.
How it limits scalability
Contention is the reason “8 cores” does not mean “8x faster.”
If all threads repeatedly converge on one hot lock, you have effectively serialized the program. More cores just create a bigger traffic jam.
Typical symptoms:
- throughput plateaus early
- CPU rises but useful work does not
- latency spikes under load
- lock wait time dominates profiles
- parallel loops stop scaling after a small number of threads
How to detect it
In production, you detect contention through a mix of symptoms and tooling:
- thread dumps showing many workers blocked on the same monitor
- profiling showing lock contention or excessive waiting time
- ETW / EventPipe / PerfView / dotnet-trace style diagnostics
- ThreadPool starvation symptoms
- flat throughput despite increasing worker counts
- high context-switch rate with poor progress
The senior habit is: never assume “parallel” means scalable. Measure where the serialization point is.
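Alongside external tools, the runtime exposes a few in-process counters that are cheap to sample and log. The sketch below assumes .NET Core 3.0 or later, where these properties exist; how often to sample and where to ship the numbers is left open.

```csharp
using System;
using System.Threading;

// Snapshot of runtime counters; trends over time matter more than single values.
int poolThreads = ThreadPool.ThreadCount;
long queuedItems = ThreadPool.PendingWorkItemCount;
long completedItems = ThreadPool.CompletedWorkItemCount;
long lockContentions = Monitor.LockContentionCount;

Console.WriteLine($"pool threads:         {poolThreads}");
Console.WriteLine($"queued work items:    {queuedItems}");
Console.WriteLine($"completed work items: {completedItems}");
Console.WriteLine($"lock contentions:     {lockContentions}");
```

A queue length that keeps growing while the thread count creeps upward is a common starvation signature; a fast-rising contention count points at a hot lock.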
Part 9 — Common low-level pitfalls
Race conditions
A race condition is any bug where correctness depends on timing and unsynchronized access.
Common forms:
- lost updates
- stale reads
- partially published object state
- double initialization
- check-then-act bugs
The really dangerous thing is that races often disappear under the debugger. The debugger changes timing, and timing is the bug.
Deadlocks
A deadlock happens when threads form a cycle of waiting.
Classic example:
- thread A holds lock 1, waits for lock 2
- thread B holds lock 2, waits for lock 1
This is why lock ordering matters. If all code acquires locks in one global order, deadlock risk drops sharply.
This is one of the most important senior-level disciplines in concurrent design: define acquisition order up front.
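Here is what a global acquisition order can look like in code. This is a sketch with hypothetical accounts stored in arrays: every transfer takes the lower-indexed lock first, regardless of direction, so the A-waits-for-B / B-waits-for-A cycle cannot form.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

int[] balances = { 1000, 1000, 1000 };
object[] locks = { new object(), new object(), new object() };

void Transfer(int source, int target, int amount)
{
    // Global order: always acquire the lock with the lower index first.
    int first = Math.Min(source, target);
    int second = Math.Max(source, target);
    lock (locks[first])
    {
        lock (locks[second])
        {
            balances[source] -= amount;
            balances[target] += amount;
        }
    }
}

// Transfers run in opposing directions on purpose.
Parallel.For(0, 10_000, i =>
{
    int source = i % 3;
    int target = (i + 1 + i % 2) % 3; // always differs from source
    Transfer(source, target, 1);
});

Console.WriteLine(balances.Sum());
```

If Transfer locked source first and target second instead, two opposing transfers could each hold one lock and wait forever for the other.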
Livelocks
A livelock is when threads are not blocked, but still make no progress because they keep reacting to each other.
Example shape:
- each thread notices contention
- both politely back off
- both retry together
- both collide again forever
The system looks active but produces nothing useful.
Thread starvation
Thread starvation means work is ready but cannot get a thread soon enough.
In .NET, a famous case is ThreadPool starvation:
- pool threads block on synchronous waits
- queued work that would unblock them cannot run
- the pool injects more threads, but too slowly or under heavy pressure
- latency explodes
This is why blocking on async work in server code is so toxic. You are taking a shared worker pool and turning it into a parking lot.
Part 10 — Senior engineer mental model
How to reason about multithreaded execution
The right mental shift is this:
Your code does not execute in source-code order globally. It executes as many small interleaved steps across cores, caches, queues, and schedulers.
So when reviewing concurrent code, ask:
- What state is shared?
- Who can read it?
- Who can write it?
- What makes each read/write visible?
- What operations must be atomic?
- What ordering must be guaranteed?
- What happens if two threads arrive at the worst possible moment?
That is the real checklist.
How to visualize interleaving
Take any critical section and decompose it into tiny steps.
Instead of this:
_count++;
Visualize:
- load _count
- add 1
- store result
Now imagine two threads interleaving those three steps. The lost update becomes obvious.
Do the same with initialization, queue operations, and lock acquisition.
That is how senior engineers debug concurrency in their heads.
How to debug concurrency bugs
A practical approach:
- first, suspect shared mutable state
- reduce the system to one hot path
- log thread IDs, timestamps, and state transitions
- capture dumps when the system is hung or slow
- inspect waits, ownership, and queue growth
- reproduce under load, not just in unit tests
- add assertions around invariants
- prefer proving correctness through structure, not hoping tests catch timing bugs
The deepest lesson is this:
Concurrency bugs are usually design bugs before they are coding bugs.
If your design requires many threads touching the same mutable state with subtle rules, you are already in the danger zone. The best fix is often architectural:
- isolate ownership
- reduce sharing
- use message passing
- make state immutable where possible
- keep synchronization boundaries explicit
That is the mental model interviewers want to hear.
A compact interview summary
If I had to summarize the whole topic in a few lines:
- Thread is a heavyweight managed wrapper over an OS thread.
- Task is a unit-of-work abstraction, usually scheduled onto the ThreadPool.
- The ThreadPool uses queues, work stealing, and adaptive thread management to balance throughput and latency. (Microsoft Learn)
- Multithreaded correctness fails not only because of simultaneous writes, but because of visibility, reordering, and non-atomic compound operations. (Microsoft Learn)
- volatile is a specialized visibility tool, not a general synchronization solution. (Microsoft Learn)
- lock is usually the default primitive for protecting in-process shared state; heavier primitives should be chosen only when their semantics are actually needed.
- Scalability is mostly limited by contention, not by lack of threads.
- Senior engineers reason in terms of ownership, visibility, atomicity, ordering, and interleaving.
I can turn this into a second pass focused only on interview-style questions and model answers for each section.
Good — this is exactly the level where many “senior” engineers still have gaps.
Interlocked looks simple, but it sits right at the boundary between CPU instructions, memory model, and lock-free design.
Let’s go deep, but keep it practical.
PART 1 — WHAT Interlocked REALLY IS
At a high level:
Interlocked provides atomic operations implemented using CPU instructions, without using locks.
When you call:
Interlocked.Increment(ref _count);
You are NOT doing:
- load → increment → store (3 steps)
You are doing:
- one indivisible hardware operation
That means:
- no other thread can observe intermediate state
- no other thread can interleave inside that operation
Why this matters
This fixes a fundamental problem:
_count++; // NOT SAFE
Break it down:
- load value
- add 1
- store value
Two threads can interleave → lost update
But:
Interlocked.Increment(ref _count); // SAFE
→ done as a single atomic instruction (or a tight CAS loop internally)
PART 2 — HOW IT WORKS UNDER THE HOOD
CPU-level primitives
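Modern CPUs provide atomic instructions. On x86/x64, for example:
- LOCK XADD (atomic fetch-and-add)
- LOCK CMPXCHG (Compare-And-Swap / CAS)
The JIT maps Interlocked operations to these, or to the equivalent atomic instructions on other architectures such as Arm64.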
Example concept:
CAS(address, expected, newValue)
Meaning:
“Only write newValue if current value == expected, otherwise fail”
This is the foundation of lock-free algorithms.
Full memory fences (VERY IMPORTANT)
Every Interlocked operation also acts as a:
full memory barrier
That means:
- no reads/writes before can move after
- no reads/writes after can move before
So Interlocked gives you BOTH:
- atomicity
- visibility + ordering
This is stronger than volatile.
Quick comparison
| Feature | volatile | Interlocked | lock |
|---|---|---|---|
| atomic read-modify-write | ❌ | ✅ | ✅ |
| visibility | ✅ | ✅ | ✅ |
| ordering | partial (acquire/release) | full | full |
| blocking | ❌ | ❌ | ✅ |
PART 3 — CORE OPERATIONS
1. Increment / Decrement
Interlocked.Increment(ref _count);
Interlocked.Decrement(ref _count);
Used for:
- counters
- reference tracking
- simple metrics
2. Add
Interlocked.Add(ref _value, 5);
Atomic addition.
3. Exchange (set value atomically)
Interlocked.Exchange(ref _state, newState);
Use when:
- you want to replace value safely
- and get old value if needed
4. CompareExchange (MOST IMPORTANT)
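Interlocked.CompareExchange(ref _value, newValue, expectedValue);
Meaning:
If _value == expectedValue, set it to newValue. Either way, the call returns the value that was in _value before the operation, which is how the caller detects whether the swap happened.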
This is the foundation of:
- lock-free programming
- state transitions
- one-time initialization
- retry loops
PART 4 — THE MOST IMPORTANT PATTERN (CAS LOOP)
This is the core mental model.
Example: atomic update
int oldValue, newValue;
do
{
    oldValue = _value;
    newValue = oldValue + 1;
} while (Interlocked.CompareExchange(ref _value, newValue, oldValue) != oldValue);
What is happening?
- read current value
- compute new value
- try to swap
- if someone changed it → retry
This is:
optimistic concurrency at CPU level
Why this is powerful
- no blocking
- no kernel calls
- scales better under low contention
But:
- can spin under high contention
- harder to reason about
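The same loop shape generalizes to any single-variable update that Interlocked does not offer directly. As an illustration, here is an atomic "raise to maximum" helper; AtomicMax is a hypothetical name, not a framework API.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper: atomically raise target to candidate if candidate is larger.
static void AtomicMax(ref int target, int candidate)
{
    int observed;
    do
    {
        observed = Volatile.Read(ref target);
        if (candidate <= observed) return;   // already large enough, no write attempted
    }
    while (Interlocked.CompareExchange(ref target, candidate, observed) != observed);
}

int highest = int.MinValue;
Parallel.For(0, 100_000, i => AtomicMax(ref highest, i));

Console.WriteLine(highest);
```

Note the early return: a good CAS loop avoids writing at all when the new value would not change anything, which reduces cache line traffic under contention.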
PART 5 — REAL USE CASES
1. State transitions (VERY COMMON)
if (Interlocked.CompareExchange(ref _started, 1, 0) == 0)
{
    Start();
}
Meaning:
Only one thread can transition from 0 → 1
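This can replace the lock-based version below (shown with _started as a bool). One behavioral difference: with the lock, latecomers wait until Start() has finished; with CompareExchange, they skip past while Start() may still be running.
lock(...)
{
    if (!_started)
    {
        _started = true;
        Start();
    }
}
2. Double-check initialization (safe version)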
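if (_instance == null)
{
    var newInstance = Create();
    if (Interlocked.CompareExchange(ref _instance, newInstance, null) != null)
    {
        // Another thread won the race: discard newInstance and use _instance.
        // Note that Create() can run more than once under this pattern.
    }
}
3. Lightweight counters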
Interlocked.Increment(ref _requests);
Used everywhere in:
- metrics
- logging
- performance tracking
4. Lock-free structures (advanced)
Used inside:
- ConcurrentQueue
- ConcurrentDictionary
- ThreadPool queues
They rely heavily on CAS loops.
PART 6 — WHEN TO USE Interlocked vs lock
Use Interlocked when:
- single variable
- simple operation
- no complex invariants
- performance matters
Example:
- counters
- flags
- state machine transitions
Use lock when:
- multiple variables must stay consistent
- complex logic
- readability matters
- correctness > micro-performance
Key rule
If you need more than 1–2 Interlocked operations to maintain correctness → use a lock
PART 7 — PERFORMANCE CHARACTERISTICS
Why it's fast
- no blocking
- no context switching
- no kernel involvement
- uses CPU instructions
But not always faster
Under high contention:
- CAS loops retry
- CPU spins
- cache line bouncing occurs
This can be WORSE than a lock.
Example problem: cache line contention
Multiple cores updating same variable:
Interlocked.Increment(ref _count);
This causes:
- cache invalidation
- memory bus traffic
- performance degradation
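This is called:
cache line contention (sometimes "true sharing" or cache line ping-pong). False sharing is the related problem where different variables that happen to sit on the same cache line slow each other down.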
PART 8 — COMMON PITFALLS
1. Thinking it replaces all locks
Wrong.
This is broken whenever _a and _b must stay consistent with each other:
Interlocked.Increment(ref _a);
Interlocked.Increment(ref _b);
Because:
- operation across multiple variables is NOT atomic
2. Mixing normal reads with Interlocked writes
if (_flag == 1) // unsafe read
If _flag is modified via Interlocked, you should use:
Volatile.Read(ref _flag)
or also use Interlocked consistently.
3. Complex CAS logic bugs
CAS loops can:
- spin forever (if bug)
- be hard to reason about
- hide subtle race conditions
4. ABA problem (advanced)
CAS checks value equality:
A → B → A
CAS thinks nothing changed, but it did.
This is rare in simple code but critical in lock-free structures.
PART 9 — HOW SENIOR ENGINEERS THINK ABOUT IT
Mental model
Interlocked = "small, atomic, lock-free building block"
Use it for:
- simple state transitions
- counters
- coordination flags
Decision framework
Ask:
- Is this a single variable?
- Is operation simple?
- Do I need strict atomicity?
If yes → Interlocked
Else → lock
Real-world rule
In large systems, as a rough rule of thumb rather than a measurement:
- 90% → lock
- 9% → Interlocked
- 1% → true lock-free algorithms
FINAL INTERVIEW SUMMARY
If interviewer asks:
“What is Interlocked?”
Strong answer:
Interlocked provides atomic operations implemented using CPU instructions like compare-and-swap. It guarantees both atomicity and full memory ordering, so it solves not just race conditions but also visibility issues. It’s typically used for simple shared state like counters or flags, but it doesn’t scale well to complex invariants, where a lock is usually the safer choice.
If you want, next step we can do:
👉 Deep dive into lock-free vs wait-free vs obstruction-free (very strong senior topic) 👉 Or how ThreadPool + Interlocked are used together internally (queues, scheduling)