High-performance and low-allocation techniques in modern C#/.NET systems
This topic matters a lot more in industrial desktop software than many web developers first realize.
In a typical web API, a request comes in, some objects get allocated, the request finishes, and the process moves on. Even if the allocation pattern is not great, the impact may be tolerable because each request is short-lived, work is naturally partitioned, and latency spikes are often averaged out across many requests.
A WPF desktop app controlling a wafer inspection machine is very different. It stays alive all day. It streams data continuously. It receives hardware callbacks, moves images through pipelines, updates the UI, stores results, and may run for hours without restart. In that kind of system, poor allocation behavior is not a small inefficiency. It becomes a stability problem.
The important mindset is this: performance is not only about CPU. In real .NET systems, memory allocation rate often drives performance problems indirectly through garbage collection, cache pressure, pauses, fragmentation, and long-term memory growth. That is why senior engineers care so much about allocation behavior in hot paths.
1. Big picture
Why memory allocation is a major performance factor in .NET
In .NET, allocation is usually cheap at the point of allocation. Creating a new object often looks fast. That is why developers get lulled into thinking allocations do not matter.
The real cost is not the single allocation. The real cost is the system-level effect of allocating continuously under load.
Every allocation adds pressure to the garbage collector. The GC then needs to trace object graphs, identify dead objects, move surviving objects in compacting generations, and sometimes pause managed threads. So the real question is not, “Is new expensive?” The real question is, “What does this allocation pattern do to the process over time?”
In long-running machine software, that distinction is huge. Ten million tiny allocations over an hour may hurt far more than one expensive CPU operation.
Why GC behavior matters more in long-running desktop systems than short-lived web requests
A long-running desktop system accumulates history.
It builds object graphs over time. Some objects die quickly. Some survive longer than intended. Some get promoted to older generations. Some event subscriptions accidentally keep things alive. Some caches grow “temporarily” and never shrink. Some UI view models stay referenced because a screen was closed incorrectly.
That means GC behavior becomes part of the runtime character of the app. You do not just care whether the system is fast right now. You care whether it still behaves predictably after six hours, after three production shifts, or after a week in a lab.
A web request that allocates too much might cause a slower response. A long-running WPF machine application that allocates too much may gradually become unstable, more jittery, less responsive, and harder to diagnose.
Why real-time systems are sensitive to GC pauses and allocation spikes
Machine-integrated systems care about timing consistency, not just average speed.
If a background analysis pipeline allocates heavily for a few seconds, the GC may run more aggressively. Then the UI thread may pause at the wrong time. A live trend graph may stutter. A command acknowledgement may be delayed. A device status panel may stop refreshing smoothly. An operator may interpret that as machine trouble.
In image-heavy inspection systems, the problem is worse because image data is large, data rates are high, and bursts happen. One badly designed stage in the pipeline can turn a smooth system into a jittery one.
The important word here is jitter. In industrial systems, jitter is often more damaging than slightly slower steady-state performance.
2. How allocation impacts performance
Allocation rate vs total memory
A lot of developers look only at total memory usage.
That is not enough.
A process using 1.5 GB steadily may actually be healthier than a process using 500 MB but allocating and discarding objects at an extreme rate. Why? Because GC pressure is driven largely by allocation churn, not just by the current size of the heap.
You need to distinguish:
- Total memory footprint: how much memory the process currently holds
- Allocation rate: how much new managed memory is being created over time
High allocation rate means the GC has to work harder, even if the process does not look “huge” in Task Manager.
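To make the difference between footprint and churn visible, you can sample both around a piece of work. The sketch below is a minimal illustration, not a replacement for a profiler. `AllocationSample` and `AllocationSampler` are hypothetical names; the `GC` calls they use (`GetTotalAllocatedBytes`, `GetTotalMemory`, `CollectionCount`) are available in .NET Core 3.0 and later.

```csharp
using System;
using System.Diagnostics;

public readonly record struct AllocationSample(
    double AllocatedBytesPerSecond, // churn: bytes newly allocated per second
    long HeapBytes,                 // footprint: managed heap size after the work
    int Gen0Collections,
    int Gen2Collections);

public static class AllocationSampler
{
    // Runs `work` and reports how much managed memory it allocated per second,
    // the heap size afterwards, and how many collections happened meanwhile.
    public static AllocationSample Measure(Action work)
    {
        long allocatedBefore = GC.GetTotalAllocatedBytes(precise: true);
        int gen0Before = GC.CollectionCount(0);
        int gen2Before = GC.CollectionCount(2);
        var stopwatch = Stopwatch.StartNew();

        work();

        stopwatch.Stop();
        long allocated = GC.GetTotalAllocatedBytes(precise: true) - allocatedBefore;
        double seconds = Math.Max(stopwatch.Elapsed.TotalSeconds, 1e-9);

        return new AllocationSample(
            allocated / seconds,
            GC.GetTotalMemory(forceFullCollection: false),
            GC.CollectionCount(0) - gen0Before,
            GC.CollectionCount(2) - gen2Before);
    }
}
```

A stage can show a modest `HeapBytes` and still report a very high `AllocatedBytesPerSecond`. That second number is the one that predicts GC pressure.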
Short-lived vs long-lived objects
Short-lived objects are not automatically bad. .NET is actually optimized for many short-lived allocations. Generational GC is built around the assumption that many objects die young.
The problem starts when short-lived allocations happen at very high frequency in hot paths. Then Gen 0 collections happen constantly. That can be okay up to a point, but eventually it starts stealing time from useful work.
Long-lived objects are dangerous in a different way. If objects survive collections, they get promoted to older generations. Gen 2 collections are more expensive. If your system keeps accidentally promoting data that should have died quickly, you pay a larger price later.
So the production problem is not just “too many allocations.” It is often “the wrong lifetime profile.”
How frequent allocations increase GC pressure
Imagine a defect detection stage that creates:
- one defect object per finding
- several strings for logging and formatting
- temporary lists for filtering
- lambda closures in helper methods
- LINQ iterators in tight loops
Maybe none of those looks terrible by itself. But if this happens thousands of times per second, the total allocation rate becomes enormous.
Then the GC starts running frequently. CPU time shifts away from actual inspection work into memory cleanup. Throughput drops. Latency becomes uneven. The UI may start to skip frames or lag when an operator interacts with the system.
That is the real production effect.
How GC pauses affect UI responsiveness and real-time behavior
WPF already has a single-threaded UI model. The UI thread must stay responsive for rendering, input, and dispatching work. If managed pauses happen at bad times, even short pauses become visible.
In a machine control system, this shows up as:
- delayed UI updates
- frozen trend graphs
- operator clicks feeling ignored
- alarm screens appearing late
- jitter in dashboards
- delayed binding refreshes
Even if the machine control loop is not directly on the UI thread, a sluggish UI still damages operator trust. In industrial software, perceived responsiveness is part of system quality.
3. Real problems in a wafer inspection WPF system
Let’s use this concrete scenario:
A WPF desktop app controls a wafer inspection machine. Cameras produce image frames. An image-processing pipeline finds defects. Results stream to a UI. Operators see defect lists, thumbnails, counters, and status panels. Sessions may run for hours.
Frequent allocation of defect objects
A naïve design often creates many small reference objects:
public sealed class Defect
{
    public int X { get; init; }
    public int Y { get; init; }
    public double Size { get; init; }
    public string Type { get; init; } = "";
    public DateTime Timestamp { get; init; }
}
If every stage creates new Defect objects, wraps them in other objects, transforms them with LINQ, and pushes them into multiple queues, the system may create millions of objects in a long session.
This does not fail immediately. It slowly creates GC churn and memory growth.
Handling image buffers
Image buffers are where teams often get hurt badly.
A single 8-bit grayscale image of 4096 x 4096 pixels is already 16 MiB. A color image or multiple intermediate processing buffers can become huge very quickly. If each stage allocates a fresh byte[], ushort[], or float[], the system will hammer the Large Object Heap.
That creates serious long-run problems: fragmentation, slower collections, and memory behavior that becomes worse the longer the app runs.
UI binding causing hidden allocations
WPF can hide allocation problems behind convenience.
Common examples:
- rebuilding ObservableCollection<T> repeatedly
- creating new view models every refresh
- using converters on thousands of items
- using string formatting in bindings
- pushing individual UI updates for each defect
- replacing large item sources instead of batching
The code may look clean, but the allocation and layout cost can be huge.
Memory growth over long inspection sessions
Long sessions expose retention bugs.
Maybe the current run should only keep summary data, but historical thumbnails remain referenced by old view models. Maybe event subscriptions from closed windows were never removed. Maybe a global cache keeps strong references forever. Maybe completed tasks still hold state objects through continuations.
This kind of problem usually looks like “memory slowly increases over time.” In production, that is one of the most dangerous symptoms because it often does not appear in short test runs.
Performance degradation after hours of runtime
This is the classic industrial desktop pattern:
- the app starts fast
- the first hour looks fine
- after a few hours, the UI becomes less smooth
- CPU rises during heavy inspection periods
- opening result screens gets slower
- memory climbs and does not fully recover
- occasional pauses become noticeable
That is not just “the app is old.” It is usually a combination of allocation churn, retention, LOH pressure, and UI overproduction.
4. Reducing allocations in hot paths
Identifying hot paths
Do not optimize everything.
A hot path is code that runs very frequently or processes large volumes of data. In this kind of system, examples include:
- per-frame image processing
- per-defect transformation
- parsing incoming data packets
- queueing and dispatch loops
- UI update loops for streaming data
That is where allocation reduction matters. Not in rarely used admin screens.
Avoid unnecessary object creation
Bad:
public DefectViewModel Map(Defect defect)
{
    return new DefectViewModel
    {
        X = defect.X,
        Y = defect.Y,
        Size = defect.Size,
        DisplayText = $"({defect.X}, {defect.Y}) Size={defect.Size:F2}"
    };
}
If this runs for every live update, you are creating view models and strings constantly.
Better approach: separate streaming data from UI projection. Keep the hot path using compact data structures, and only project to UI objects when actually needed.
public readonly record struct DefectData(int X, int Y, float Size, DefectKind Kind);
Then UI projection can be done in batches, or only for visible rows.
Avoid LINQ in tight loops
LINQ is great for readability in non-critical paths. In hot loops, it can introduce iterator allocations, delegates, hidden captures, and extra passes over data.
Before:
var largeDefects = defects
    .Where(d => d.Size > threshold)
    .Select(d => new DefectSummary(d.X, d.Y, d.Size))
    .ToList();
This is often fine in business code. In a high-frequency processing path, it can be too allocation-heavy.
After:
var results = new List<DefectSummary>(defects.Count);
for (int i = 0; i < defects.Count; i++)
{
    // A List<T> indexer returns a value, not a ref, so "ref readonly" is not possible here.
    // Taking a ref to the element needs an array or a span.
    var d = defects[i];
    if (d.Size > threshold)
    {
        results.Add(new DefectSummary(d.X, d.Y, d.Size));
    }
}
This version is more verbose, but in a hot path it gives better control over allocations and execution.
Important nuance: do not ban LINQ globally. Ban it selectively in measured hot paths.
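If measurement shows that the result list itself is a meaningful allocation, one option is to let the caller own a list and reuse it between calls. This is a sketch under assumptions: `DefectData`, `DefectSummary`, and `DefectFilter` are illustrative types, and the caller must be done with the previous results before calling again. Taking a `ReadOnlySpan<T>` as input is also where `ref readonly` iteration becomes possible.

```csharp
using System;
using System.Collections.Generic;

public readonly record struct DefectData(int X, int Y, float Size);
public readonly record struct DefectSummary(int X, int Y, float Size);

public static class DefectFilter
{
    // Writes matching defects into a caller-owned list instead of allocating a new one.
    // Once the list has grown to its working size, repeated calls stop allocating.
    public static void CollectLarge(
        ReadOnlySpan<DefectData> defects,
        float threshold,
        List<DefectSummary> results)
    {
        results.Clear(); // resets Count but keeps the backing array

        foreach (ref readonly DefectData d in defects) // no per-element copy
        {
            if (d.Size > threshold)
            {
                results.Add(new DefectSummary(d.X, d.Y, d.Size));
            }
        }
    }
}
```

The trade-off is ownership: the list now has a lifetime the caller must manage, which is exactly the kind of complexity that should be justified by a profile.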
Avoid boxing
Boxing turns a value type into an object on the heap. This is easy to miss and surprisingly common.
Examples:
object obj = 42;      // boxing
IComparable c = 42;   // boxing
logger.LogInformation("{Value}", someStruct); // may box depending on API usage
In tight paths, boxing can create invisible allocation churn.
A common production issue is using non-generic interfaces or APIs with value types. For example, iterating with older abstractions or storing structs as object in shared pipelines.
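A small sketch of the usual fix: replace the non-generic interface with a generic method constrained to the generic interface. `BoxingDemo` and its method names are hypothetical; `IComparable` and `IComparable<T>` are the standard BCL interfaces.

```csharp
using System;

public static class BoxingDemo
{
    // Boxes: a value-type argument is converted to a heap object
    // so it can travel through the non-generic interface and object parameter.
    public static bool IsGreaterBoxed(IComparable left, object right)
        => left.CompareTo(right) > 0;

    // No boxing: T stays a value type, and the call binds to
    // IComparable<T>.CompareTo(T) on the struct itself.
    public static bool IsGreater<T>(T left, T right)
        where T : struct, IComparable<T>
        => left.CompareTo(right) > 0;
}
```

Calling `BoxingDemo.IsGreaterBoxed(5, 3)` boxes both integers, while `BoxingDemo.IsGreater(5, 3)` passes them as plain values. Both calls give the same answer; only the allocation behavior differs.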
Reduce temporary objects
Bad:
public string BuildAlarmMessage(int x, int y, double size)
{
    return "Defect at X=" + x + ", Y=" + y + ", Size=" + size;
}
This can create multiple intermediate strings.
Better in high-frequency cases:
public string BuildAlarmMessage(int x, int y, double size)
{
    // Format into a stack buffer, then allocate one string of exactly the right length.
    // (string.Create(64, ...) would always return 64 chars, padded with '\0'.)
    // 96 chars is enough for the worst case: two ints and one round-trippable double.
    Span<char> buffer = stackalloc char[96];
    var written = 0;

    "Defect at X=".AsSpan().CopyTo(buffer[written..]);
    written += "Defect at X=".Length;
    x.TryFormat(buffer[written..], out var w1);
    written += w1;
    ", Y=".AsSpan().CopyTo(buffer[written..]);
    written += ", Y=".Length;
    y.TryFormat(buffer[written..], out var w2);
    written += w2;
    ", Size=".AsSpan().CopyTo(buffer[written..]);
    written += ", Size=".Length;
    size.TryFormat(buffer[written..], out var w3);
    written += w3;

    return new string(buffer[..written]);
}
Would I write this everywhere? No. Only if profiling proves string construction is a real hot spot.
That is the senior mindset: optimize surgically.
5. ArrayPool and object reuse
What ArrayPool<T> solves
Repeatedly allocating arrays is expensive, especially medium and large arrays used in data pipelines.
ArrayPool<T> lets you rent buffers and return them for reuse instead of constantly allocating new ones.
This is extremely useful for:
- image scanline buffers
- temporary processing buffers
- packet assembly
- serialization/deserialization
- intermediate transform stages
Example: image buffer reuse
Without pooling:
public byte[] ProcessFrame(byte[] source)
{
    var temp = new byte[source.Length];
    // process...
    return temp;
}
This creates a new array on every frame.
With pooling:
private readonly ArrayPool<byte> _pool = ArrayPool<byte>.Shared;

public void ProcessFrame(ReadOnlySpan<byte> source, IFrameSink sink)
{
    byte[] rented = _pool.Rent(source.Length);
    try
    {
        var target = rented.AsSpan(0, source.Length);
        source.CopyTo(target);
        // process target...
        sink.Write(target);
    }
    finally
    {
        _pool.Return(rented);
    }
}
Now the system reuses memory instead of constantly allocating.
Example: processing pipeline
A pipeline stage may need a temporary working buffer for filtering or thresholding. Renting one buffer per operation can dramatically reduce allocation churn compared with creating new arrays repeatedly.
Pitfalls
Pooling improves performance, but it introduces responsibility.
1. Returning buffers incorrectly
If you forget to return a rented buffer, pooling loses value and memory pressure returns.
2. Returning buffers too early
If another component still uses the buffer after you returned it, you have a correctness bug. This is a classic danger when buffers are passed downstream asynchronously.
3. Data leakage
A rented array may contain old data. If sensitive or correctness-critical, you may need to clear it before reuse.
_pool.Return(rented, clearArray: true);
That has a cost, so use it intentionally.
4. Keeping oversized arrays around
Pools may return arrays larger than requested. Always work on the intended slice, not the full length.
var buffer = rented.AsSpan(0, requestedLength);
Object reuse beyond arrays
Sometimes teams try to pool normal objects too aggressively. That can work, but it is riskier than array pooling because reused objects have more state.
If you pool objects, you need:
- clear ownership rules
- reset logic
- thread-safety guarantees
- no hidden references escaping
In practice, array pooling is usually the first, safest, highest-value reuse technique.
6. Span<T> and Memory<T> in practical use
What problem they solve
Span<T> and Memory<T> help you work with slices of data efficiently without copying.
That matters when your system processes chunks of buffers repeatedly. Instead of creating subarrays or duplicating data, you can create lightweight views over existing memory.
This is powerful in:
- packet parsing
- binary protocol handling
- image row or tile processing
- framing and chunking
- string/byte parsing
Practical example: parsing a binary device packet
Bad:
byte[] header = data.Skip(0).Take(8).ToArray();
byte[] payload = data.Skip(8).Take(length).ToArray();
This allocates multiple arrays.
Better:
ReadOnlySpan<byte> span = data;
ReadOnlySpan<byte> header = span.Slice(0, 8);
ReadOnlySpan<byte> payload = span.Slice(8, length);
No copies. No extra arrays.
Practical example: handling an image segment
Suppose you process a rectangular subregion from a frame. A naïve design may allocate a new array for the segment. Sometimes that is necessary, but often you can process by slice or by row-window over the original memory.
That reduces copying and allocation significantly.
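As a sketch of the row-window idea, assuming an 8-bit grayscale frame stored row by row: `RegionScanner`, `CountAbove`, and the parameter names are hypothetical, and `stride` is taken to be the number of bytes per image row.

```csharp
using System;

public static class RegionScanner
{
    // Counts pixels above `threshold` inside a rectangular region of the frame
    // without copying the region into a new array.
    public static int CountAbove(
        ReadOnlySpan<byte> frame, int stride,
        int left, int top, int width, int height,
        byte threshold)
    {
        int count = 0;
        for (int row = 0; row < height; row++)
        {
            // A view over one row of the region: no allocation, no copy.
            ReadOnlySpan<byte> rowWindow = frame.Slice((top + row) * stride + left, width);
            foreach (byte pixel in rowWindow)
            {
                if (pixel > threshold) count++;
            }
        }
        return count;
    }
}
```

`Slice` also bounds-checks each window, so a region that runs past the end of the frame throws instead of silently reading unrelated memory.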
When to use Span<T>
Use it when:
- data is short-lived
- the work is synchronous
- you want efficient slicing/parsing
- you want to avoid copies in hot code
When to use Memory<T>
Use Memory<T> when the buffer needs to cross async boundaries or survive beyond stack-only scope.
Span<T> is a ref struct: it lives only on the stack, cannot be stored in a field of a class or ordinary struct, and cannot be kept alive across an await. Memory<T> gives similar slice semantics without those restrictions.
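A minimal sketch of the usual division of labor: Memory<T> carries the buffer through the async part, and `.Span` is used only inside synchronous sections. `FrameReader` and `ReadAndSumAsync` are hypothetical names; `Stream.ReadAsync(Memory<byte>, CancellationToken)` is the standard overload.

```csharp
using System;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

public static class FrameReader
{
    // Memory<byte> can be an async method parameter and can live across awaits.
    public static async Task<int> ReadAndSumAsync(
        Stream source, Memory<byte> buffer, CancellationToken ct = default)
    {
        int total = 0;
        while (total < buffer.Length)
        {
            int read = await source.ReadAsync(buffer.Slice(total), ct);
            if (read == 0) break; // end of stream
            total += read;
        }

        // Synchronous section: switch to a span for the actual processing.
        return Sum(buffer.Span.Slice(0, total));
    }

    private static int Sum(ReadOnlySpan<byte> data)
    {
        int sum = 0;
        foreach (byte b in data) sum += b;
        return sum;
    }
}
```

The caller still owns the buffer. If it came from a pool, it must not be returned until the task has completed.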
Do not use it everywhere
This is important.
Some teams discover Span<T> and start rewriting everything around it. That is usually a mistake. It can make code harder to understand, harder to debug, and more brittle, especially if the performance benefit is unmeasured.
Use it where data slicing/copy avoidance is clearly important.
7. Value types vs reference types
When struct is beneficial
Small, simple, immutable data often works well as a struct.
Examples:
- coordinates
- measurement samples
- points
- rectangles
- small packet headers
- defect positions
These can avoid a separate heap allocation per item when used as locals or stored inline in arrays of structs.
Example:
public readonly record struct DefectPoint(int X, int Y);
public readonly record struct MeasurementSample(long Timestamp, float Value);
This can be better than allocating many tiny reference objects.
Why it helps
A reference type means separate heap object allocation and pointer chasing. A value type can be stored inline, including inside arrays. That often improves memory locality and reduces GC pressure.
Trade-offs
Structs are not free.
If a struct is too large, copying it around becomes expensive. If it is mutable, bugs become confusing. If it is boxed accidentally, you lose the benefit.
A useful rule of thumb: structs are good for small, simple, value-like data. They are not good for large, stateful domain objects.
Bad candidate:
public struct InspectionSessionState
{
    public string RecipeName;
    public List<DefectData> Defects;
    public byte[] Thumbnail;
    public Dictionary<string, object> Metadata;
}
This is not value-like. It should be a class.
Good candidate:
public readonly struct StagePosition
{
    public double X { get; }
    public double Y { get; }
    public double Z { get; }

    public StagePosition(double x, double y, double z)
        => (X, Y, Z) = (x, y, z);
}
8. Large Object Heap in practice
What triggers LOH allocations
In .NET, objects at or above a size threshold go to the Large Object Heap. The default threshold is 85,000 bytes.
That means large arrays, large strings, and large image-related buffers often end up there immediately.
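You can observe this directly, because the runtime reports LOH objects as belonging to the highest generation. The sketch below is only a quick check, and `LohDemo` is a hypothetical name. A 4096 x 4096 8-bit frame is 16 MiB, far above the threshold.

```csharp
using System;

public static class LohDemo
{
    // Allocates a byte array and returns the generation the GC reports for it
    // immediately after allocation.
    public static int GenerationOfNewArray(int byteCount)
    {
        var buffer = new byte[byteCount];
        return GC.GetGeneration(buffer);
    }
}
```

`LohDemo.GenerationOfNewArray(4096 * 4096)` returns `GC.MaxGeneration`, because the array is placed on the LOH straight away. A small array such as `GenerationOfNewArray(1_000)` typically reports generation 0.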
Why large images go to LOH
Image processing systems naturally deal with large contiguous buffers.
Examples:
- raw frame buffers
- grayscale planes
- RGB images
- intermediate convolution buffers
- thumbnail batches
- stitched image regions
A single frame buffer can easily exceed the LOH threshold many times over.
Why LOH hurts long-running apps
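The LOH is expensive because large objects are costly to allocate and reclaim: it is collected only as part of Gen 2 collections and is not compacted by default, so repeated allocate-and-discard patterns can lead to fragmentation problems.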
A long-running inspection app that keeps allocating large temporary image buffers can develop:
- rising memory usage
- expensive collections
- slower allocation behavior
- reduced predictability under heavy load
Even if average throughput looks acceptable, the runtime becomes less stable.
Real image-processing example
Bad design:
- acquire image frame into fresh byte[]
- clone into processing buffer A
- clone into threshold buffer B
- clone into display buffer C
- create cropped copies for thumbnails
That pipeline may allocate several LOH objects per frame.
Better design:
- reuse buffers through pools
- process in-place where safe
- use slices or views instead of copies
- separate display conversion from analysis buffers
- keep only necessary retained images
The biggest LOH win often comes from architecture, not syntax.
9. UI performance and memory
Large collections bound to UI
Binding large live collections directly to WPF is dangerous.
If you push every defect immediately to an ObservableCollection<T> bound to a visible grid, the system pays for:
- collection notifications
- UI container generation
- layout
- rendering
- possible string formatting and converters
- view model allocation
With thousands of items, this becomes expensive very quickly.
Virtualization is critical
UI virtualization means only visible items are actually realized as UI elements.
This is one of the highest-value techniques in WPF for large result sets.
Without virtualization, a defect list with 50,000 items may create a huge number of visual objects. That destroys memory and responsiveness.
With virtualization, the UI creates only enough visuals for what the user is currently viewing.
This is essential for:
- defect grids
- result tables
- thumbnail browsers
- log viewers
Important production lesson: virtualization can be accidentally disabled by control templates, nested scroll viewers, grouping, or certain panel choices. Teams often think they are virtualizing when they are not.
Batch UI updates
Do not push one UI update per event if the event rate is high.
Instead of:
foreach (var defect in incomingDefects)
{
    Defects.Add(new DefectViewModel(defect));
}
Use a batching model. Accumulate updates in the background, then flush them periodically on the UI thread.
var batch = GetNextDefectBatch();
await _dispatcher.InvokeAsync(() =>
{
    foreach (var defect in batch)
    {
        _visibleDefects.Add(new DefectViewModel(defect));
    }
});
Better still, batch notifications or use controls/data layers designed for bulk updates.
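The accumulate-then-flush part can be sketched without any WPF dependency. `UpdateBatcher<T>` is a hypothetical helper; the assumption is that background threads call `Enqueue` and that something on the UI thread, for example a DispatcherTimer tick, calls `DrainTo` at a fixed interval.

```csharp
using System.Collections.Concurrent;
using System.Collections.Generic;

public sealed class UpdateBatcher<T>
{
    private readonly ConcurrentQueue<T> _pending = new();

    // Called from background threads at any rate.
    public void Enqueue(T item) => _pending.Enqueue(item);

    // Called periodically on the UI thread. Moves at most `maxItems` into `target`
    // so that a burst of results cannot stall a single UI frame.
    public int DrainTo(ICollection<T> target, int maxItems)
    {
        int moved = 0;
        while (moved < maxItems && _pending.TryDequeue(out var item))
        {
            target.Add(item);
            moved++;
        }
        return moved;
    }
}
```

The cap per tick is the important design choice: the UI cost per frame becomes bounded, and a backlog shows up as queue depth you can measure instead of as a frozen screen.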
Avoid excessive UI object creation
A common mistake is creating a full view model for every backend entity, even when most are not visible.
In real production systems, it is often better to keep the backend store compact and project only visible or selected items into richer UI objects.
The UI should not be the primary storage model for the inspection session.
10. Common mistakes
Ignoring allocation cost completely
This is common in teams coming from low-volume enterprise CRUD systems.
They assume .NET is “fast enough” and do not think about allocation patterns at all. In streaming and imaging systems, that mindset breaks down badly.
Consequence: the app passes functional testing but degrades under sustained load.
Premature micro-optimization
The opposite mistake is also common.
Someone starts hand-optimizing string formatting, replacing every loop with low-level constructs, and introducing pooled objects everywhere before measuring anything.
Consequence: the code gets harder to maintain, bugs increase, and the real bottleneck is still somewhere else.
Using Span<T> everywhere unnecessarily
This often becomes a performance fashion trend.
If a piece of code runs once per minute, rewriting it around spans is usually wasted complexity. Sometimes the cleanest code is the right choice.
Memory leaks via event handlers
Classic WPF and desktop problem.
A short-lived object subscribes to a long-lived publisher and never unsubscribes. That one mistake can keep entire graphs alive: view models, images, buffers, windows, and closures.
Consequence: memory keeps growing even though screens were closed.
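A minimal sketch of the pattern and its fix, using hypothetical types: `MachineStatusService` stands in for a long-lived publisher, and `StatusPanelViewModel` for a screen-scoped subscriber that unsubscribes in `Dispose`.

```csharp
#nullable enable
using System;

public sealed class MachineStatusService // long-lived publisher
{
    public event EventHandler? StatusChanged;

    public void RaiseStatusChanged() => StatusChanged?.Invoke(this, EventArgs.Empty);

    // Exposed only so the example can show who is still subscribed.
    public int SubscriberCount => StatusChanged?.GetInvocationList().Length ?? 0;
}

public sealed class StatusPanelViewModel : IDisposable // short-lived subscriber
{
    private readonly MachineStatusService _service;

    public int UpdateCount { get; private set; }

    public StatusPanelViewModel(MachineStatusService service)
    {
        _service = service;
        _service.StatusChanged += OnStatusChanged; // the publisher now references this object
    }

    private void OnStatusChanged(object? sender, EventArgs e) => UpdateCount++;

    // Call when the screen closes. Without this, the publisher keeps the
    // view model, and everything it references, reachable.
    public void Dispose() => _service.StatusChanged -= OnStatusChanged;
}
```

The hard part in practice is not the code but the discipline: something has to call `Dispose` reliably when the screen goes away.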
Keeping references alive accidentally
Examples:
- global caches
- static events
- long-lived tasks holding captured state
- diagnostic history lists that never rotate
- queues that are never drained properly
- background services retaining old results
This is one of the hardest production problems because the GC is technically working correctly. The objects are still reachable.
Over-caching everything
Caching is not free. Every cache is a retention policy.
Teams often cache images, metadata, thumbnails, and parsed results “for performance,” then slowly turn the process into a memory sink.
Consequence: improved short-term speed, worse long-term stability.
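One simple way to make the retention policy explicit is to give every history or cache a hard size limit. The sketch below uses a hypothetical `BoundedHistory<T>` that evicts the oldest entry; a real cache may need a smarter eviction rule, but the principle is the same.

```csharp
using System;
using System.Collections.Generic;

public sealed class BoundedHistory<T>
{
    private readonly Queue<T> _items;
    private readonly int _capacity;

    public BoundedHistory(int capacity)
    {
        if (capacity <= 0) throw new ArgumentOutOfRangeException(nameof(capacity));
        _capacity = capacity;
        _items = new Queue<T>(capacity);
    }

    public int Count => _items.Count;

    // Adds an item and evicts the oldest one once the limit is reached,
    // so retained memory has a hard upper bound.
    public void Add(T item)
    {
        if (_items.Count == _capacity) _items.Dequeue();
        _items.Enqueue(item);
    }

    // Returns a copy in oldest-to-newest order.
    public IReadOnlyCollection<T> Snapshot() => _items.ToArray();
}
```

This class is not thread-safe; it assumes a single owner, which is usually the right default for session history.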
11. Performance measurement
This is one of the biggest differences between mid-level and senior engineers.
Senior engineers do not guess performance problems. They measure them.
How to identify real bottlenecks
Start with symptoms:
- UI freezes
- throughput drops
- memory growth
- periodic pauses
- CPU spikes
- lag after hours of runtime
Then measure in the actual workload shape:
- live streaming rate
- realistic image sizes
- realistic session duration
- realistic defect volume
- realistic UI screens open
A benchmark on a tiny synthetic sample is not enough.
Allocation profiling vs CPU profiling
You need both.
CPU profiling tells you where execution time goes.
Allocation profiling tells you where memory churn is created.
Many performance problems in .NET are mixed problems: a method may not be the top CPU consumer, but it may allocate so heavily that it causes GC overhead elsewhere.
That is why allocation profiling is so important in managed systems.
What senior engineers actually measure
They usually care about things like:
- allocation rate per second
- GC frequency by generation
- pause patterns during load
- LOH allocation patterns
- retained memory growth over time
- queue depth and backlog
- UI thread responsiveness
- frame/update smoothness
- per-stage latency in processing pipelines
They also compare “fresh start” vs “after hours of runtime,” because long-run stability matters.
Practical workflow
A realistic approach is:
- Reproduce the issue under representative load.
- Measure CPU, allocation rate, and retained memory.
- Find the highest-impact hot paths.
- Fix one thing at a time.
- Re-measure.
- Keep the simplest fix that delivers meaningful improvement.
That process is much more valuable than heroic low-level cleverness.
12. Trade-offs
Readability vs performance
Readable code is the default.
Optimized code earns its complexity only where measurement proves it matters.
A plain foreach and a simple object model may be best almost everywhere. A manual loop, pooled buffer, and span-based parser may be best in the hot path. Good engineering is knowing where each belongs.
Allocation reduction vs code complexity
Reducing allocations often means more control over lifetimes, ownership, and reuse. That can make code more fragile.
For example, pooled buffers improve performance, but they also create correctness risks. That is a real trade-off, not a free win.
Reuse vs safety
Fresh allocation is simple and safe. Reuse is fast but requires discipline.
If the team cannot reliably manage ownership and lifetime, aggressive reuse can introduce subtle bugs worse than the original performance problem.
Optimization vs maintainability
The most dangerous optimized code is the kind nobody understands six months later.
Performance work must leave the system not only faster, but still supportable by the team.
That is especially important in industrial software, where long lifetime and operational stability matter more than clever implementation.
13. Senior engineer mental model
Experienced engineers think about performance in layers.
Layer 1: architecture
First ask whether the design itself is causing unnecessary work.
Are we copying images too many times? Are we pushing every event into the UI? Are we storing data in a UI-shaped model? Are we keeping too much history alive? Are we using synchronous handoffs that create bursts and stalls?
Architecture usually dominates small code tweaks.
Layer 2: data movement
Then ask how data flows.
How many times is the same data allocated, copied, transformed, serialized, or projected? Can we process by slice instead of copy? Can we batch? Can we reuse buffers? Can we reduce object graph size?
Layer 3: hot-path code
Only after that do they optimize local code paths.
This is where they look at:
- LINQ in tight loops
- boxing
- temporary strings
- small object churn
- unnecessary wrappers
- struct vs class choices
- pooling opportunities
Layer 4: long-run stability
Senior engineers also think in hours, not milliseconds.
Will this approach still behave well after a full production shift? Will memory remain stable? Will the UI remain smooth? Will retained objects grow? Will LOH usage stay under control?
That long-run view is extremely important in real machine systems.
Optimize only where it matters
The best engineers do not try to make the whole system low-level.
They keep most of the code clean and understandable, then make targeted improvements in places proven to matter. That is how you avoid both under-optimization and over-optimization.
Keep the system stable over long runtime
In industrial desktop software, stable runtime behavior is often more valuable than maximum benchmark speed.
A pipeline that is slightly slower but steady for 12 hours is usually better than a pipeline that benchmarks faster but causes memory spikes, UI pauses, and unpredictable degradation.
That is the mature trade-off.
A practical summary for interview use
If you need to explain this in a leadership interview, the strongest framing is:
High-performance .NET is not about fighting the runtime. It is about understanding where allocation patterns create system-level instability. In long-running WPF and hardware-integrated applications, excessive allocation causes GC pressure, jitter, UI pauses, LOH problems, and long-term degradation. The right approach is to measure real hot paths, reduce unnecessary object creation, avoid wasteful copying, use pooling selectively, virtualize the UI, and optimize with discipline rather than cargo-cult tricks.
And the most senior-sounding insight is this:
In production systems, the real goal is not “fast code.” It is predictable, stable behavior under sustained load.
ArrayPool<T>, Span<T>, and Memory<T> are closely related, but they solve different problems.
A lot of .NET engineers hear about them as if they are one “performance package.” In real systems, they are not the same thing.
A useful way to think about them is:
- ArrayPool<T> is about reusing buffers
- Span<T> is about working with memory efficiently
- Memory<T> is about holding onto memory safely across async or object boundaries
That distinction matters a lot in production code.
1. The big picture
In high-throughput systems, performance problems often come from two things:
- allocating too many buffers
- copying data too many times
These tools address those two problems from different angles.
Imagine a wafer inspection app receiving raw image lines from a camera.
A naïve pipeline often does this:
- allocate a new byte[] for incoming data
- copy into another array for parsing
- copy into another array for processing
- copy into another array for display
- allocate temporary subarrays for segments
That is not just wasteful. It creates GC pressure, LOH pressure, latency spikes, and long-run instability.
A better pipeline tries to answer three questions:
- Can I reuse the buffer instead of allocating a new one?
- Can I view part of existing memory instead of copying it?
- Can I pass memory through async code without violating lifetime rules?
That is where ArrayPool<T>, Span<T>, and Memory<T> come in.
2. ArrayPool<T> — what it really is
ArrayPool<T> is a shared buffer rental system.
Instead of doing this every time:
var buffer = new byte[65536];
you do this:
var buffer = ArrayPool<byte>.Shared.Rent(65536);
and when done:
ArrayPool<byte>.Shared.Return(buffer);
So instead of constantly creating and destroying arrays, you borrow one, use it, then give it back.
That reduces allocation churn dramatically in hot paths.
Why this matters so much
Arrays are everywhere in real systems:
- image buffers
- network packets
- file reads
- binary parsing
- compression/decompression
- serialization
- intermediate transform buffers
If these arrays are allocated repeatedly in high-frequency code, you can create a huge amount of GC pressure.
In a streaming or imaging system, this may happen thousands of times per second.
The point of pooling is not that new byte[] is always slow. The point is that repeated allocation over time causes system-wide cost.
3. What ArrayPool<T> does not do
This is important.
ArrayPool<T> does not give you an array of exactly the requested size.
If you ask for 10,000 bytes, you may get a larger array.
Example:
byte[] rented = ArrayPool<byte>.Shared.Rent(10000);
Console.WriteLine(rented.Length); // maybe 16384, maybe more
So you must treat the usable portion separately from the physical array length.
Correct:
int requested = 10000;
byte[] rented = ArrayPool<byte>.Shared.Rent(requested);
Span<byte> usable = rented.AsSpan(0, requested);
Do not accidentally process the entire backing array unless that is intentional.
4. Practical ArrayPool<T> example
Naïve packet parser
public Packet ParsePacket(Stream stream, int length)
{
    byte[] buffer = new byte[length];
    stream.ReadExactly(buffer, 0, length);
    return Parse(buffer);
}
This allocates a fresh buffer every time.
If packets come continuously, that becomes expensive.
Better with pooling
private static readonly ArrayPool<byte> Pool = ArrayPool<byte>.Shared;
public Packet ParsePacket(Stream stream, int length)
{
    byte[] rented = Pool.Rent(length);
    try
    {
        stream.ReadExactly(rented, 0, length);
        // Parse must copy anything the Packet keeps; the span is invalid once the array is returned
        return Parse(rented.AsSpan(0, length));
    }
    finally
    {
        Pool.Return(rented);
    }
}
Now the parser avoids repeated allocations.
That is already a big win.
5. The most important ArrayPool<T> rule: ownership
Pooling introduces an ownership model.
Who owns the rented buffer? Who is allowed to write to it? When is it safe to return it? Can anyone still read it after return?
This is where many bugs come from.
Bad example:
public ReadOnlyMemory<byte> ReadMessage(Stream stream, int length)
{
    byte[] rented = ArrayPool<byte>.Shared.Rent(length);
    stream.ReadExactly(rented, 0, length);
    ArrayPool<byte>.Shared.Return(rented);
    return rented.AsMemory(0, length); // BUG
}
This returns memory pointing to an array that has already gone back to the pool. Another part of the app may rent and overwrite it.
That is a correctness bug, not just a performance issue.
The lifetime of pooled memory must be crystal clear.
6. When ArrayPool<T> is a great fit
It is a very good fit when all of these are true:
- the code runs frequently
- arrays are medium or large
- the data is short-lived
- ownership is clear
- the buffer can be returned soon after use
Examples:
- per-frame temporary image buffers
- parsing device messages
- encoding/decoding work buffers
- staging buffers in a pipeline
- temporary aggregation buffers
When it is a bad fit
It is a poor fit when:
- the data must live a long time
- ownership is fuzzy
- multiple async consumers might outlive the caller
- the team cannot reliably enforce return discipline
- the logic becomes much harder to reason about
In those cases, normal allocation may be safer.
7. Common ArrayPool<T> mistakes
Returning too early
public async Task SendAsync(NetworkStream stream, byte[] source)
{
    byte[] rented = ArrayPool<byte>.Shared.Rent(source.Length);
    source.CopyTo(rented, 0);
    var memory = rented.AsMemory(0, source.Length);
    ArrayPool<byte>.Shared.Return(rented);
    await stream.WriteAsync(memory); // BUG
}
The async write may still be using the memory after the return.
Correct:
public async Task SendAsync(NetworkStream stream, byte[] source)
{
    byte[] rented = ArrayPool<byte>.Shared.Rent(source.Length);
    try
    {
        source.CopyTo(rented, 0);
        await stream.WriteAsync(rented.AsMemory(0, source.Length));
    }
    finally
    {
        ArrayPool<byte>.Shared.Return(rented);
    }
}
Forgetting to return
That reduces the benefit of pooling and can quietly hurt memory behavior over time.
Assuming contents are zeroed
Pooled arrays may contain old data.
byte[] rented = ArrayPool<byte>.Shared.Rent(1024);
// contents are undefined from your point of view
If you rely on clean contents, clear the relevant slice yourself.
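Here is a minimal sketch of the two usual ways to deal with stale contents. The class and method names (PooledClearDemo, RentCleared, ReturnCleared) are made up for illustration; only ArrayPool<T>.Shared, Rent, Return, and Span<T>.Clear are real APIs.

```csharp
using System;
using System.Buffers;

public static class PooledClearDemo
{
    // Hypothetical helper: rents a buffer and zeroes only the part the caller will use.
    public static byte[] RentCleared(int length)
    {
        byte[] rented = ArrayPool<byte>.Shared.Rent(length);
        rented.AsSpan(0, length).Clear(); // bytes beyond 'length' may still hold stale data
        return rented;
    }

    // Alternative: ask the pool to wipe the whole array on the way back,
    // which is worth considering when the buffer held sensitive data.
    public static void ReturnCleared(byte[] rented)
    {
        ArrayPool<byte>.Shared.Return(rented, clearArray: true);
    }
}
```

Clearing only the slice you use keeps the cost proportional to the data, not to whatever array size the pool happened to hand out.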
Returning corrupted shared state
If two code paths accidentally share the same rented array (for example, because it was returned twice or returned while still in use), one can modify data the other still depends on.
That kind of bug is painful.
8. Span<T> — what it really solves
Span<T> is not about pooling.
Span<T> is about representing a contiguous region of memory without allocating.
It is like a lightweight window over memory.
It can point to:
- an array
- part of an array
- stack memory
- unmanaged memory
- other memory-backed sources
The key value is this: you can work with slices of data without creating new arrays.
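A small sketch of that idea, assuming nothing beyond the base class library: the same Sum method works over a whole array, a slice of it, and a stack buffer, and none of the calls allocates on the heap. SpanSourcesDemo, Sum, and Run are illustrative names.

```csharp
using System;

public static class SpanSourcesDemo
{
    // Sums any contiguous run of bytes, wherever it lives.
    public static int Sum(ReadOnlySpan<byte> data)
    {
        int total = 0;
        foreach (byte b in data) total += b;
        return total;
    }

    public static int Run()
    {
        byte[] array = { 1, 2, 3, 4, 5, 6 };

        // A view over elements 3, 4, 5 of the array. No new array is created.
        ReadOnlySpan<byte> middle = array.AsSpan(2, 3);

        // A small scratch buffer on the stack; it is gone when the method returns.
        Span<byte> scratch = stackalloc byte[4];
        scratch.Fill(10);

        // Arrays and Span<byte> convert implicitly to ReadOnlySpan<byte>.
        return Sum(array) + Sum(middle) + Sum(scratch);
    }
}
```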
Simple example
Without Span<T>:
byte[] header = buffer.Skip(0).Take(16).ToArray();
byte[] payload = buffer.Skip(16).Take(payloadLength).ToArray();
This allocates two new arrays.
With Span<T>:
ReadOnlySpan<byte> data = buffer;
ReadOnlySpan<byte> header = data.Slice(0, 16);
ReadOnlySpan<byte> payload = data.Slice(16, payloadLength);
No copies. No allocations.
That is the core win.
9. Why Span<T> is powerful in real systems
A lot of production code spends time cutting buffers into pieces:
- packet headers
- protocol frames
- rows in image memory
- regions of interest
- string parsing
- file chunks
Without spans, developers often create temporary arrays or substrings. Those copies add up fast.
With spans, you can parse and process directly from the original memory.
That reduces both allocation and data movement.
And in many systems, reducing copying matters almost as much as reducing allocation.
10. Practical Span<T> example — binary parsing
Suppose a device sends this message format:
- bytes 0-1: message type
- bytes 2-5: payload length
- bytes 6 onward: payload
Naïve version:
public Message Parse(byte[] buffer)
{
    byte[] typeBytes = buffer[0..2];
    byte[] lengthBytes = buffer[2..6];
    byte[] payload = buffer[6..];
    short type = BitConverter.ToInt16(typeBytes, 0);
    int length = BitConverter.ToInt32(lengthBytes, 0);
    return new Message(type, payload.Take(length).ToArray());
}
This creates several unnecessary arrays.
Better:
public Message Parse(ReadOnlySpan<byte> buffer)
{
    // BitConverter reads in the machine's native byte order
    short type = BitConverter.ToInt16(buffer.Slice(0, 2));
    int length = BitConverter.ToInt32(buffer.Slice(2, 4));
    ReadOnlySpan<byte> payloadSpan = buffer.Slice(6, length);
    byte[] payload = payloadSpan.ToArray(); // only if ownership requires a copy
    return new Message(type, payload);
}
Now you only copy if you truly need an owned payload array.
Sometimes you can avoid even that final copy depending on the design.
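One way that can look, as a sketch only: the message keeps a ReadOnlyMemory<byte> slice of the receive buffer instead of an owned array. MessageView and ZeroCopyParser are hypothetical names, and the little-endian field layout is an assumption about the device, not something the format above specifies.

```csharp
using System;
using System.Buffers.Binary;

// Hypothetical message type that references its payload instead of owning a copy.
public readonly record struct MessageView(short Type, ReadOnlyMemory<byte> Payload);

public static class ZeroCopyParser
{
    public static MessageView Parse(ReadOnlyMemory<byte> buffer)
    {
        ReadOnlySpan<byte> span = buffer.Span;

        // Assumption: both header fields are little-endian.
        short type = BinaryPrimitives.ReadInt16LittleEndian(span.Slice(0, 2));
        int length = BinaryPrimitives.ReadInt32LittleEndian(span.Slice(2, 4));

        // The payload is a view into 'buffer'; nothing is copied.
        return new MessageView(type, buffer.Slice(6, length));
    }
}
```

The trade-off is the one this whole discussion keeps coming back to: the view is only valid while the underlying buffer is left alone, so this design fits when the message is consumed before the buffer is reused.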
11. Practical Span<T> example — image row processing
Imagine an 8-bit grayscale image stored in a single flat array.
You want to process one row at a time.
Without span:
for (int y = 0; y < height; y++)
{
    byte[] row = new byte[width];
    Array.Copy(buffer, y * width, row, 0, width);
    ProcessRow(row);
}
This allocates a new array for every row.
With span:
ReadOnlySpan<byte> image = buffer;
for (int y = 0; y < height; y++)
{
    ReadOnlySpan<byte> row = image.Slice(y * width, width);
    ProcessRow(row);
}
Now each row is just a view into existing memory.
That is a very real and very important production improvement.
12. Why Span<T> has restrictions
Span<T> is intentionally limited because it is designed for safety and performance.
It is a ref struct, which means:
- it cannot be boxed
- it cannot be stored in normal heap objects
- it cannot be used as a field in a class
- it cannot cross await
- it cannot be captured by lambdas
At first this feels annoying. But the reason is good: Span<T> may refer to stack memory or short-lived memory, so the runtime prevents unsafe lifetime mistakes.
So Span<T> is great for local, synchronous, tight processing.
It is not designed for “store this and use it later.”
13. Memory<T> — why it exists
Memory<T> exists because sometimes you need span-like semantics, but the data must survive longer or cross async boundaries.
You can think of Memory<T> as the heap-safe, storable counterpart.
It still represents a region of memory, but unlike Span<T>, it can be:
- stored in fields
- passed through async methods
- kept as part of an object
- used in APIs that complete later
Example
This is illegal with Span<T>:
public async Task<int> ReadAndProcessAsync(Stream stream, Span<byte> buffer) // compile error: async methods cannot take Span<T> parameters
{
    int read = await stream.ReadAsync(buffer); // Stream.ReadAsync has no Span<T> overload either
    return read;
}
But this is fine with Memory<T>:
public async Task<int> ReadAndProcessAsync(Stream stream, Memory<byte> buffer)
{
    int read = await stream.ReadAsync(buffer);
    return read;
}
Then inside synchronous processing code, you can get a span:
Span<byte> writable = buffer.Span;
So Memory<T> is often the bridge between async/object-oriented code and fast span-based local processing.
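That bridge can be sketched like this: the async method holds Memory<T> across the await, then hands a span to a synchronous helper. AsyncSpanBridge, CountAbove, and ReadAndCountAsync are illustrative names, and the threshold count is just a stand-in for real processing.

```csharp
using System;
using System.IO;
using System.Threading.Tasks;

public static class AsyncSpanBridge
{
    // Synchronous helper: free to use spans because nothing is awaited here.
    private static int CountAbove(ReadOnlySpan<byte> data, byte threshold)
    {
        int count = 0;
        foreach (byte b in data)
        {
            if (b > threshold) count++;
        }
        return count;
    }

    // Async method: Memory<byte> survives the await; the span exists only for the helper call.
    public static async Task<int> ReadAndCountAsync(Stream stream, Memory<byte> buffer, byte threshold)
    {
        int read = await stream.ReadAsync(buffer);
        return CountAbove(buffer.Span.Slice(0, read), threshold);
    }
}
```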
14. The relationship between Span<T> and Memory<T>
This is the clean mental model:
- use Span<T> when processing memory right here, right now, synchronously
- use Memory<T> when memory must be stored, passed around, or awaited
- use ReadOnlySpan<T> and ReadOnlyMemory<T> when callers should not modify the data
That is usually enough for real-world design decisions.
15. Practical Memory<T> example — async pipeline stage
Suppose a camera pipeline produces buffers and passes them to an async saver.
Bad version with array copying:
public async Task SaveFrameAsync(byte[] frame)
{
    byte[] copy = new byte[frame.Length];
    Array.Copy(frame, copy, frame.Length);
    await _storage.WriteAsync(copy, 0, copy.Length);
}
This creates an extra copy every time.
Better:
public async Task SaveFrameAsync(ReadOnlyMemory<byte> frame)
{
    await _storage.WriteAsync(frame);
}
Now the API can accept memory directly.
But this raises the real question: who owns the underlying buffer, and how long is it valid?
That is where architecture matters more than syntax.
If the caller is using pooled memory, it must not return that memory to the pool until the async save completes.
16. Span<T> and ArrayPool<T> together
These are often used together.
Pattern:
- rent a buffer from ArrayPool<T>
- expose only the relevant slice as Span<T> or Memory<T>
- process efficiently without copy
- return to pool when lifetime ends
Example:
private static readonly ArrayPool<byte> Pool = ArrayPool<byte>.Shared;
public void ProcessFrame(ReadOnlySpan<byte> source)
{
    byte[] rented = Pool.Rent(source.Length);
    try
    {
        Span<byte> working = rented.AsSpan(0, source.Length);
        source.CopyTo(working);
        ApplyThreshold(working);
        Analyze(working);
    }
    finally
    {
        Pool.Return(rented);
    }
}
Here:
- pooling avoids repeated allocation
- span avoids extra slicing/copy overhead
- the lifetime is clearly contained
This is a good production pattern.
17. Memory<T> and ArrayPool<T> together
This is more delicate.
Example:
public async Task SendFrameAsync(ReadOnlyMemory<byte> frame)
{
    await _network.WriteAsync(frame);
}
If the underlying memory comes from a rented pooled array, the caller must retain ownership until the send completes.
That often means the buffer lifetime must be tied to the async operation.
A common real-world pattern is to wrap pooled memory in an owner object so buffer return is explicit and delayed until disposal.
For example, conceptually:
public sealed class PooledBuffer : IDisposable
{
    private byte[]? _array;
    public Memory<byte> Memory { get; }
    public PooledBuffer(int length)
    {
        _array = ArrayPool<byte>.Shared.Rent(length);
        Memory = _array.AsMemory(0, length);
    }
    public void Dispose()
    {
        if (_array is not null)
        {
            ArrayPool<byte>.Shared.Return(_array);
            _array = null;
        }
    }
}
Then usage:
using var buffer = new PooledBuffer(length);
await stream.ReadExactlyAsync(buffer.Memory); // plain ReadAsync may return before the buffer is full
await ProcessAsync(buffer.Memory);
This makes ownership much clearer.
That kind of pattern becomes valuable in serious pipelines.
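The base class library also ships an owner abstraction along these lines: MemoryPool<T>.Shared.Rent returns an IMemoryOwner<T> whose Dispose gives the buffer back. Here is a hedged sketch of using it; MemoryOwnerDemo and CopyFrameAsync are illustrative names, and copying between two streams is just a stand-in for a real pipeline stage.

```csharp
using System;
using System.Buffers;
using System.IO;
using System.Threading.Tasks;

public static class MemoryOwnerDemo
{
    public static async Task<int> CopyFrameAsync(Stream source, Stream destination, int length)
    {
        // Disposing the owner returns the buffer to the pool, after both awaits have completed.
        using IMemoryOwner<byte> owner = MemoryPool<byte>.Shared.Rent(length);

        // As with ArrayPool<T>, the rented memory may be larger than requested, so slice it.
        Memory<byte> frame = owner.Memory.Slice(0, length);

        await source.ReadExactlyAsync(frame);
        await destination.WriteAsync(frame);
        return frame.Length;
    }
}
```

Whether you use a custom owner class or IMemoryOwner<T>, the point is the same: one object is responsible for the return, and its lifetime spans the whole async operation.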
18. ReadOnlySpan<T> and ReadOnlyMemory<T>
In many APIs, read-only variants are even more important.
They communicate that the function will inspect data but not mutate it.
That improves safety and API clarity.
Examples:
public int FindMarker(ReadOnlySpan<byte> data)
public ValueTask SaveAsync(ReadOnlyMemory<byte> frame)
This is a great design habit for performance-sensitive APIs.
It also reduces accidental copying because callers can pass arrays, slices, or other memory-backed data directly.
19. Common design patterns
Pattern 1: parse synchronously with span
public Header ParseHeader(ReadOnlySpan<byte> data)
Good for local parsing.
Pattern 2: accept memory for async I/O
public Task WriteAsync(ReadOnlyMemory<byte> data)
Good for async boundaries.
Pattern 3: use pooling behind the implementation
public Task<Result> ProcessAsync(ReadOnlyMemory<byte> input)
Inside, the implementation may rent working buffers.
This is often better than exposing pooling to every caller.
Pattern 4: keep pooled lifetimes tightly scoped
The shorter and clearer the rental lifetime, the safer the code.
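Patterns 3 and 4 combined might look like the sketch below. The caller only sees ReadOnlyMemory<byte>; the rented scratch buffer is created, used, and returned inside one method. FrameStats and CountBrightPixelsAsync are hypothetical names, and Task.Yield stands in for whatever real async work the stage does.

```csharp
using System;
using System.Buffers;
using System.Threading.Tasks;

public static class FrameStats
{
    // The pooled scratch buffer never escapes this method.
    public static async Task<int> CountBrightPixelsAsync(ReadOnlyMemory<byte> input, byte threshold)
    {
        byte[] scratch = ArrayPool<byte>.Shared.Rent(input.Length);
        try
        {
            // Work on a private copy so the caller's data is never modified.
            input.Span.CopyTo(scratch);

            await Task.Yield(); // placeholder for real async work

            int count = 0;
            for (int i = 0; i < input.Length; i++) // only the first input.Length bytes are valid
            {
                if (scratch[i] > threshold) count++;
            }
            return count;
        }
        finally
        {
            ArrayPool<byte>.Shared.Return(scratch);
        }
    }
}
```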
20. When not to use them
This is just as important.
Do not use ArrayPool<T> if:
- arrays are tiny and infrequent
- the code is not hot
- ownership becomes confusing
- safety risk is too high for the gain
Do not use Span<T> if:
- the code is not performance-sensitive
- it makes the API harder to understand
- you need to store the data or cross async boundaries
Do not use Memory<T> if:
- a simple array is perfectly fine
- the abstraction adds no measurable value
- lifetime/ownership is already obvious without it
These are powerful tools, not default style rules.
21. Real wafer inspection examples
Example A: image row analysis
Best fit:
- pooled backing buffer for frame acquisition
- Span<byte> for row-by-row analysis
- no row copies
That is high value.
Example B: async save to disk
Best fit:
- ReadOnlyMemory<byte> for the async write API
- careful ownership until the save completes
That is where Memory<T> shines.
Example C: cropping many tiny regions
If you are extracting thousands of small ROIs from a frame, avoid allocating a new array for each ROI unless absolutely necessary. Prefer working with coordinates and spans over the original buffer where possible.
That can remove huge allocation volume.
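A sketch of the coordinates-plus-spans approach, assuming an 8-bit grayscale frame stored row by row with no padding. Roi, RoiDemo, and MeanIntensity are illustrative names, and mean intensity is just an example measurement.

```csharp
using System;

// Hypothetical ROI descriptor: coordinates only, no pixel data.
public readonly record struct Roi(int X, int Y, int Width, int Height);

public static class RoiDemo
{
    // Mean intensity of one ROI, read directly from the original frame buffer.
    public static double MeanIntensity(ReadOnlySpan<byte> frame, int frameWidth, Roi roi)
    {
        long sum = 0;
        for (int row = 0; row < roi.Height; row++)
        {
            // One ROI line is a slice of the frame, not a new array.
            int start = (roi.Y + row) * frameWidth + roi.X;
            ReadOnlySpan<byte> line = frame.Slice(start, roi.Width);
            foreach (byte pixel in line) sum += pixel;
        }
        return (double)sum / (roi.Width * roi.Height);
    }
}
```

With thousands of ROIs per frame, the per-ROI cost becomes a small struct of coordinates instead of a heap array.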
Example D: packet parser for PLC/device protocol
Use ReadOnlySpan<byte> to parse headers, lengths, command codes, checksums, and payload sections directly from the receive buffer.
That is usually much cleaner and faster than splitting into many small arrays.
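As a sketch, here is a parser for a made-up frame layout: a 2-byte command code, a 2-byte payload length (both little-endian), the payload, then a 1-byte checksum equal to the sum of all preceding bytes modulo 256. The layout and the names DeviceFrameParser and TryParse are assumptions for illustration; a real PLC protocol will differ.

```csharp
using System;
using System.Buffers.Binary;

public static class DeviceFrameParser
{
    private const int HeaderSize = 4;

    public static bool TryParse(ReadOnlySpan<byte> frame, out ushort command, out ReadOnlySpan<byte> payload)
    {
        command = 0;
        payload = default;
        if (frame.Length < HeaderSize + 1) return false;

        ushort parsedCommand = BinaryPrimitives.ReadUInt16LittleEndian(frame);
        int length = BinaryPrimitives.ReadUInt16LittleEndian(frame.Slice(2));
        if (frame.Length < HeaderSize + length + 1) return false;

        // Checksum over header and payload, wrapping at 256.
        byte sum = 0;
        foreach (byte b in frame.Slice(0, HeaderSize + length))
        {
            sum = unchecked((byte)(sum + b));
        }
        if (sum != frame[HeaderSize + length]) return false;

        command = parsedCommand;
        payload = frame.Slice(HeaderSize, length); // a view into the receive buffer
        return true;
    }
}
```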
22. Trade-offs in real systems
These tools improve performance by making memory and ownership more explicit.
That is both their strength and their cost.
They often produce:
- less allocation
- less copying
- better throughput
- smoother long-run behavior
But they can also produce:
- more complex lifetime rules
- harder debugging when ownership is unclear
- subtle bugs if pooled buffers escape too far
- more cognitive load for the team
That is why senior engineers use them deliberately, not ideologically.
23. The senior engineer mental model
A strong mental model is:
ArrayPool<T>
“I need temporary buffers often, and allocating them repeatedly is expensive.”
Span<T>
“I need to process part of existing memory efficiently without copying.”
Memory<T>
“I need span-like memory handling, but the data must survive across async/object boundaries.”
And one more critical rule:
Never separate performance technique from lifetime reasoning.
Most bugs with these APIs are not syntax bugs. They are lifetime bugs.
The code compiles. The benchmarks look good. Then hours later in production, a buffer gets reused too early, data becomes corrupted, or memory is retained too long.
That is why mature teams treat these APIs as memory management tools, not just performance tricks.
24. Practical guidance
If I were designing a real high-throughput .NET pipeline, I would usually do this:
- start with normal arrays and clean code
- measure allocation hot spots
- add Span<T> first in parsing/slicing code where copies are obvious
- add ArrayPool<T> where buffer churn is significant
- use Memory<T> at async boundaries
- keep pooled ownership tight and explicit
- avoid exposing pooled lifetimes all over the codebase unless necessary
That sequence tends to give the best balance of performance, correctness, and maintainability.
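For the "measure" step, a quick first check can be as simple as the sketch below, which uses GC.GetAllocatedBytesForCurrentThread to see how much a piece of code allocates. AllocationProbe and Measure are hypothetical names, and this is a rough probe for synchronous code on one thread, not a replacement for a profiler.

```csharp
using System;

public static class AllocationProbe
{
    // Returns the number of bytes allocated on the current thread while 'action' runs.
    public static long Measure(Action action)
    {
        long before = GC.GetAllocatedBytesForCurrentThread();
        action();
        return GC.GetAllocatedBytesForCurrentThread() - before;
    }
}
```

Wrapping a copy-based code path and its span-based replacement in Measure gives a before/after number to justify the change.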
25. One sentence summary
ArrayPool<T> helps you avoid repeated buffer allocation, Span<T> helps you work on existing memory without copying, and Memory<T> helps you carry that memory safely through async and longer-lived code.
If you want, next I can go even deeper with one of these three: ArrayPool<T> internals, Span<T> internals and compiler/runtime restrictions, or production design patterns for buffer ownership in streaming/image pipelines.