Span<T>, ReadOnlySpan<T>, Memory<T>, ReadOnlyMemory<T>, and stackalloc in modern .NET
This topic matters because a lot of real performance problems are not about CPU math. They are about data movement.
In production systems, especially real-time and long-running ones, performance often gets worse because we keep:
- allocating short-lived arrays and strings
- copying buffers again and again
- creating temporary objects while parsing or transforming data
- forcing the GC to work harder than necessary
That is exactly the problem these features were introduced to solve.
They are not “cool syntax.” They are tools for writing code that works directly over existing memory, with less copying and fewer allocations.
Part 1 — The big picture
Why modern .NET introduced Span<T> and Memory<T>
Before these types became common, a lot of data processing code followed patterns like this:
- receive a byte[]
- extract a subrange into another byte[]
- convert some bytes into a string
- split or copy again
- pass another new array to another layer
That style is easy to write, but in high-throughput systems it creates a huge amount of temporary garbage.
In a business app, this may not matter much.
In systems like:
- image processing
- machine communication
- real-time telemetry
- binary protocol parsing
- defect/result pipelines
it matters a lot.
Because these systems often do the same operation:
- thousands of times per second
- on large buffers
- for many hours
- under latency pressure
- while the UI must stay responsive
So the runtime and language evolved toward a more explicit model:
“Can we process existing memory directly, instead of copying it into new objects?”
That is the world of Span<T> and Memory<T>.
Why copying data is expensive
Copying has two costs.
The first cost is obvious: CPU time. If you copy 8 KB once, it may be trivial. If you copy 8 KB millions of times, it becomes real work.
The second cost is often worse: allocation pressure.
Every time you create a new array, substring, or intermediate object, you increase GC workload. In long-running desktop systems, that shows up as:
- occasional pauses
- unstable latency
- increased memory footprint
- more frequent Gen 0 / Gen 1 collections
- eventual promotion of objects that should have died early
- harder-to-predict throughput under load
In a wafer inspection system, that can mean:
- delayed UI updates
- jitter in machine status handling
- slower result processing after hours of runtime
- more memory fragmentation around large result sets and images
So the goal is not “avoid every allocation.” The goal is:
avoid unnecessary allocations in the parts of the system that run often enough for it to matter.
Why zero-allocation processing improves both performance and stability
A lot of engineers think this topic is just about speed. It is also about stability.
In real systems, low-allocation processing helps because it makes runtime behavior more predictable.
If a message parser:
- reuses memory
- slices instead of copying
- avoids temporary strings
- uses stack-local scratch space for tiny buffers
then the system tends to behave more consistently under sustained load.
That matters a lot in:
- camera pipelines
- binary device communication
- continuous inspection loops
- high-frequency measurement processing
- always-on WPF desktop applications
The practical benefit is often not “2x faster.” Very often the real benefit is:
- fewer spikes
- lower GC noise
- smoother throughput
- less memory churn
- more consistent latency
That is a very production-oriented win.
Part 2 — What Span<T> really is
Conceptually: a lightweight view over contiguous memory
The best mental model for Span<T> is this:
Span<T> is not the data itself. It is a window onto existing contiguous memory.
That memory might come from:
- an array
- a stack allocation
- native memory
- another span
- a pooled buffer
A Span<byte> does not mean “create bytes.” It means “here is a safe view over these bytes.”
That is why it is so useful. You can pass around a view of the data without duplicating the data.
Slicing without copying
This is one of the most important ideas.
Suppose you receive a machine message in a byte[]. The header is in the first 16 bytes, and the payload starts after that.
Old style thinking:
- create a new byte[] for the header
- create another new byte[] for the payload
Span-based thinking:
- create a slice for the header
- create a slice for the payload
Both slices refer to the same underlying memory. No copy is needed.
That matters a lot in hot paths because many protocols are basically:
- read buffer
- carve it into sections
- interpret sections
- move on
Span<T> is ideal for that.
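As a minimal sketch of the header/payload split described above: the 16-byte header size comes from the example, while the FrameSlicer class and its method names are hypothetical and exist only for illustration.

```csharp
using System;

public static class FrameSlicer
{
    // Assumed layout for illustration: a 16-byte header followed by the payload.
    public const int HeaderSize = 16;

    // Returns the header as a read-only view over the original array (no copy).
    public static ReadOnlySpan<byte> Header(byte[] message) =>
        message.AsSpan(0, HeaderSize);

    // Returns the payload as a writable view over the same array (no copy).
    public static Span<byte> Payload(byte[] message) =>
        message.AsSpan(HeaderSize);
}
```

Because both views point into the same array, writing through Payload(message) changes the original message. That shared storage is the whole point, and also the thing to keep in mind when deciding who is allowed to mutate a buffer.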
Why Span<T> is stack-only
Span<T> is intentionally restricted because it can point to memory with very short lifetime.
For example, it may refer to:
- stack memory from stackalloc
- temporary local data
- memory that must not outlive the current scope
If the runtime allowed spans to be stored anywhere freely, it would become easy to create dangerous cases where a reference outlives the memory it points to.
So the language protects you by making Span<T> a stack-only type.
That means, in practical terms:
- you cannot store it in normal class fields
- you cannot let it escape into arbitrary heap-based state
- you cannot freely carry it across async boundaries
- it is meant for synchronous, scoped, immediate use
This is not a random limitation. It is the reason Span<T> can be both fast and safe.
Why it is a ref struct
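Span<T> is a ref struct because it holds a managed reference into the underlying memory plus a length, and that kind of reference is only safe while it lives on the stack. Declaring the type as a ref struct lets the compiler enforce this: a span cannot be boxed, stored in a field of a class or ordinary struct, captured by a lambda, or kept alive across an await or yield.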
The important mental model is not compiler trivia. The important point is:
.NET wants to let you work directly over memory, but only in places where the lifetime can be proven safe.
That is why Span<T> feels powerful but constrained.
Those constraints are not annoying extras. They are the design.
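To make the constraints concrete, here is a small sketch. SpanLifetimeDemo is a hypothetical name; the commented-out lines are examples of code the compiler rejects, shown as comments so the block still compiles.

```csharp
using System;

public static class SpanLifetimeDemo
{
    // Fine: the span is created, used, and dropped inside one synchronous call.
    public static int SumFirst(int[] values, int count)
    {
        ReadOnlySpan<int> window = values.AsSpan(0, count);
        int sum = 0;
        foreach (int value in window)
            sum += value;
        return sum;
    }

    // Each of the following is rejected at compile time:
    //
    //   class Holder { private Span<int> _window; }   // span as a class field
    //   object boxed = values.AsSpan();                // boxing a ref struct
    //   Func<int> f = () => window.Length;             // capturing a span in a lambda
}
```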
Part 3 — ReadOnlySpan<T> in practice
When to use ReadOnlySpan<T>
Use ReadOnlySpan<T> when you want:
- zero-copy access
- slicing
- high-performance reading
- protection against accidental mutation
It is the “read-only view” version of Span<T>.
In many production systems, this is actually the more common choice, because many low-level operations only need to inspect data, not modify it.
Why it is especially useful for strings, arrays, and buffers
This type is extremely useful because many read-heavy operations are really just:
- scanning
- parsing
- validating
- matching
- tokenizing
- extracting fields
Examples:
- parsing a command payload from byte[]
- reading metadata from an image header
- scanning telemetry frames for markers
- checking prefixes/suffixes without creating substrings
- parsing delimited text without Split()
A big advantage is that ReadOnlySpan<char> lets you process parts of a string without creating new strings.
That is a major improvement over old habits like:
- Substring
- Split
- repeated small string allocations during parsing
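A short sketch of reading part of a string without Substring. The "LOT-<number>" format and the LotIdReader name are assumptions made up for this example.

```csharp
using System;
using System.Globalization;

public static class LotIdReader
{
    // Hypothetical input format: "LOT-<digits>", for example "LOT-12345".
    public static bool TryReadLotNumber(string text, out int lotNumber)
    {
        lotNumber = 0;

        ReadOnlySpan<char> span = text.AsSpan();
        ReadOnlySpan<char> prefix = "LOT-"; // implicit string-to-span conversion, no copy

        // Prefix check on the original characters; no substring is created.
        if (!span.StartsWith(prefix, StringComparison.Ordinal))
            return false;

        // Parse the digits directly from a slice of the original string.
        return int.TryParse(
            span.Slice(prefix.Length),
            NumberStyles.None,
            CultureInfo.InvariantCulture,
            out lotNumber);
    }
}
```

The only string involved is the input itself; the prefix check and the numeric parse both run over views of it.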
Practical examples
Parsing command payloads
A machine sends a payload like:
CMD=START;MODE=AUTO;LOT=12345
Naive code may:
- convert bytes to string
- split by ;
- split again by =
- allocate lots of temporary strings
A better approach is to:
- decode only when needed
- parse spans directly
- slice token by token
This reduces garbage and gives more control over parsing cost.
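One possible shape for the span-based approach, assuming the payload has already been decoded to text. CommandParser and TryGetValue are hypothetical names, and the sketch does no escaping or whitespace handling.

```csharp
using System;

public static class CommandParser
{
    // Looks up `key` in a "K=V;K=V" payload and returns the value as a slice
    // of the original payload. No strings or arrays are allocated.
    public static bool TryGetValue(
        ReadOnlySpan<char> payload,
        ReadOnlySpan<char> key,
        out ReadOnlySpan<char> value)
    {
        while (!payload.IsEmpty)
        {
            // Carve off the next "K=V" pair and advance past the ';'.
            int separator = payload.IndexOf(';');
            ReadOnlySpan<char> pair = separator >= 0 ? payload.Slice(0, separator) : payload;
            payload = separator >= 0 ? payload.Slice(separator + 1) : ReadOnlySpan<char>.Empty;

            int equals = pair.IndexOf('=');
            if (equals > 0 && pair.Slice(0, equals).SequenceEqual(key))
            {
                value = pair.Slice(equals + 1);
                return true;
            }
        }

        value = default;
        return false;
    }
}
```

A caller can pass string literals directly, for example TryGetValue("CMD=START;MODE=AUTO;LOT=12345", "MODE", out var mode), and only call mode.ToString() if a real string is needed later.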
Reading image metadata
Suppose image metadata is stored in a binary header:
- width
- height
- pixel format
- timestamp
- exposure
You do not need to copy parts of the header into new arrays. You can take slices of a ReadOnlySpan<byte> and decode fields directly.
Scanning buffers
In telemetry or protocol parsing, sometimes you just need to:
- find a marker byte
- match a signature
- skip a known prefix
- read a fixed-width field
That is exactly the kind of operation spans were made for.
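The scanning operations above map onto a handful of span methods. In this sketch the two-byte signature 0xAA 0x55 and the FrameScanner type are invented for illustration, not taken from any real protocol.

```csharp
using System;

public static class FrameScanner
{
    // Hypothetical 2-byte sync signature marking the start of a frame.
    private static ReadOnlySpan<byte> Signature => new byte[] { 0xAA, 0x55 };

    // Returns the offset of the first signature in the buffer, or -1 if absent.
    public static int FindFrameStart(ReadOnlySpan<byte> buffer) =>
        buffer.IndexOf(Signature);

    // Skips the signature and returns the fixed-width field that follows it,
    // as a slice of the original buffer.
    public static bool TryReadField(
        ReadOnlySpan<byte> buffer,
        int fieldLength,
        out ReadOnlySpan<byte> field)
    {
        field = default;

        int start = FindFrameStart(buffer);
        if (start < 0)
            return false;

        ReadOnlySpan<byte> rest = buffer.Slice(start + Signature.Length);
        if (fieldLength < 0 || rest.Length < fieldLength)
            return false;

        field = rest.Slice(0, fieldLength);
        return true;
    }
}
```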
Part 4 — What Memory<T> / ReadOnlyMemory<T> solve
This is where many engineers get confused.
Why Memory<T> exists in addition to Span<T>
Span<T> is great, but it is intentionally scoped and synchronous.
That means it does not work well when the memory must survive beyond the current synchronous call chain.
Real systems often need to:
- queue buffers
- store them in objects
- pass them across async methods
- hand them to background workers
- keep references for later stages in a pipeline
That is the problem Memory<T> solves.
The mental model is:
- Span<T> = fast, synchronous view for immediate work
- Memory<T> = storable, heap-friendly, async-friendly memory handle
ReadOnlyMemory<T> is the read-only version.
Difference between Span<T> and Memory<T>
A simple way to think about it:
| Type | Best for |
|---|---|
| Span<T> | synchronous hot-path processing |
| ReadOnlySpan<T> | synchronous read-only parsing and inspection |
| Memory<T> | passing/storing memory across async or heap-based boundaries |
| ReadOnlyMemory<T> | async-friendly read-only buffer ownership/pass-through |
Memory<T> can later give you a Span<T> when you are in a safe synchronous scope.
That pattern is very common.
When you need one vs the other
Use Span<T> / ReadOnlySpan<T> when:
- the method is synchronous
- the work is immediate
- you want minimal overhead
- you are parsing, scanning, formatting, or transforming in place
Use Memory<T> / ReadOnlyMemory<T> when:
- the buffer must be stored
- it crosses await
- it is queued for later processing
- it moves between pipeline stages
- ownership/lifetime must extend beyond one synchronous scope
Examples
Pipelines crossing async boundaries
A camera acquisition service receives image bytes and pushes them into an async processing pipeline.
The acquisition layer may produce or own:
- IMemoryOwner<byte>
- Memory<byte>
Later, inside a synchronous decode method, you turn that into:
- Span<byte>
- ReadOnlySpan<byte>
That separation is healthy.
Handing buffers between services
Suppose one component reads data from a socket and another parses it later on a background channel.
You cannot safely carry Span<byte> through that whole path. But you can pass Memory<byte> or a pooled owner object.
Background processing stages
If one stage reads a packet and another stage processes it later, Memory<T> is usually the right abstraction at the boundary.
Then, inside the actual parser, you work with ReadOnlySpan<T>.
That is a very common real-world design.
Part 5 — stackalloc in practice
What stackalloc does
stackalloc allocates a block of memory on the current thread’s stack instead of on the managed heap.
That means:
- allocation is extremely cheap
- cleanup is automatic when the method returns
- no GC tracking is needed for that buffer itself
This makes it useful for small, short-lived temporary buffers.
Relationship with Span<T>
Modern .NET made stackalloc far more practical because it works naturally with spans.
Instead of dealing with unsafe pointer-heavy code, you can do:
- allocate a small stack buffer
- wrap it in a Span<T>
- use normal span operations on it
That is why stackalloc became much more approachable in production code.
When it helps
It helps when you need:
- a tiny scratch buffer
- only for the current method
- in a hot path
- where allocating a heap array would happen frequently
Examples:
- temporary formatting buffer
- parsing a small token
- assembling a short protocol frame
- small conversion workspace
- transient char buffer for number formatting or normalization
Why it should be used carefully
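The stack is limited: a thread's stack is commonly around 1 MB by default on Windows, and every frame on the call stack shares it.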
Heap allocations are GC-managed and can scale much larger. Stack allocations are fast, but they consume a small bounded resource.
So stackalloc is a tool for:
- small
- predictable
- local
- short-lived buffers
Not for:
- large buffers
- variable huge payloads
- data you need after the method returns
- anything you want to store or queue
A common practical pattern is:
- use stackalloc for small sizes
- fall back to heap or pool for larger sizes
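That fallback pattern can be sketched as follows. HexFormatter and the 256-character threshold are assumptions chosen for the example; the right threshold depends on the call depth and workload.

```csharp
using System;
using System.Buffers;

public static class HexFormatter
{
    // Assumed threshold: buffers up to 256 chars (512 bytes) live on the stack.
    private const int StackLimit = 256;

    public static string ToHex(ReadOnlySpan<byte> data)
    {
        int needed = data.Length * 2;

        // Small inputs use the stack; larger ones rent from the shared pool.
        char[]? rented = null;
        Span<char> buffer = needed <= StackLimit
            ? stackalloc char[StackLimit]
            : (rented = ArrayPool<char>.Shared.Rent(needed));

        try
        {
            for (int i = 0; i < data.Length; i++)
            {
                buffer[2 * i] = HexDigit(data[i] >> 4);
                buffer[2 * i + 1] = HexDigit(data[i] & 0xF);
            }

            // A rented array may be longer than requested, so slice to the used part.
            return new string(buffer.Slice(0, needed));
        }
        finally
        {
            if (rented is not null)
                ArrayPool<char>.Shared.Return(rented);
        }
    }

    private static char HexDigit(int nibble) =>
        (char)(nibble < 10 ? '0' + nibble : 'A' + nibble - 10);
}
```

The stack size is a compile-time constant here, so the stack cost is fixed no matter what length the caller passes in.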
Benefits and risks
Benefits:
- avoids heap allocation
- very cheap for tiny buffers
- excellent for hot-path scratch space
- naturally works with Span<T>
Risks:
- stack overflow if misused
- harder readability if overused
- dangerous if size is not controlled
- not appropriate for buffers with longer lifetime
Part 6 — Real problems in a wafer inspection WPF desktop app
This is where these tools become meaningful.
Where they actually help
Parsing high-frequency binary machine messages
Machines often emit compact binary frames:
- command acknowledgements
- sensor values
- motion status
- fault data
- timestamps
- counters
This code is usually:
- low-level
- repetitive
- latency-sensitive
- allocation-sensitive
Using ReadOnlySpan<byte> here is a strong fit.
You can:
- slice headers and payloads
- decode primitives without copies
- avoid per-message temporary arrays
Processing image metadata without copying
Large image buffers are expensive to move. Often you do not need to copy the whole image, or even the header.
You may only need:
- dimensions
- pixel format
- row stride
- channel layout
- capture timestamp
- exposure/gain metadata
Span-based readers are a good fit for this.
Slicing large image or telemetry buffers
A single acquisition buffer may contain:
- header
- image body
- trailing metadata
- checksum
- appended inspection info
Creating separate arrays for each part is wasteful. Slicing is cleaner and cheaper.
Reducing temporary allocations in defect/result pipelines
Suppose a defect pipeline takes raw measurements and transforms them into normalized records. If that hot path:
- builds temporary arrays
- converts many small fragments into strings
- repeatedly copies data into intermediate buffers
you get unnecessary churn.
This is a classic place for:
- span-based parsing
- span-based formatting
- pooled buffers
- careful boundary design
Writing hot-path low-level infrastructure code
Examples:
- protocol parsers
- frame decoders
- checksum calculators
- custom binary serializers
- measurement transformation loops
- small image utility routines
These are good candidates.
Where they usually do not belong
This is just as important.
These types usually do not belong in:
- ViewModels
- general business logic
- workflow orchestration
- command handlers
- application services
- most UI-facing code
- stateful domain logic that is not performance-critical
Why?
Because in those places:
- readability matters more
- allocation cost is usually negligible
- data flow is more important than raw buffer efficiency
- async/stateful design is often dominant
- the code is maintained by a broader team
If you push span-heavy style too far upward, you make the codebase harder to understand for very little gain.
A senior engineer isolates this style to the places where it pays.
Part 7 — Zero-allocation data processing patterns
Slicing instead of copying
This is the foundation.
Instead of:
- allocate a subarray
- copy data into it
- process the copy
you:
- create a slice
- process the slice directly
This is often the single biggest conceptual shift.
Working over spans in hot loops
In hot loops, tiny inefficiencies compound.
If you are processing:
- measurement batches
- telemetry frames
- image line metadata
- protocol records
then using spans helps you stay close to the data without repeated object creation.
Parsing directly from spans
This is a very modern and useful design style.
Instead of:
- convert bytes to string
- create substrings
- parse each substring
you:
- read directly from ReadOnlySpan<byte> or ReadOnlySpan<char>
- parse fields in place
- only materialize strings when you truly need them
This is much more efficient for protocol and format parsing.
Formatting into spans/buffers
Output can also be low-allocation.
Instead of:
- concatenate strings repeatedly
- build many intermediate strings
you can:
- format into a span
- write into a reusable or stack-allocated buffer
- materialize a final string only once, if needed
This matters in:
- logging infrastructure
- protocol generation
- identifier formatting
- repeated numeric formatting in hot paths
Avoiding intermediate strings, arrays, and lists
A lot of hidden cost comes from intermediate representations.
Examples:
- Split() produces arrays
- Substring() produces new strings
- LINQ chains can create iterators and extra work
- converting bytes to strings too early creates churn
In performance-critical code, the better question is:
can I stay on the original buffer a bit longer?
That is usually where spans pay off.
Examples
Binary protocol parsing
Take a ReadOnlySpan<byte>:
- read fixed header
- slice payload
- validate checksum
- decode fields directly
No subarray copies needed.
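For the checksum step, a minimal sketch might look like this. The frame layout (body followed by a single XOR checksum byte) and the FrameValidator name are assumptions for illustration; a real protocol will define its own layout and checksum algorithm.

```csharp
using System;

public static class FrameValidator
{
    // Assumed frame layout for illustration: [body bytes ...][1-byte XOR checksum].
    public static bool TryGetValidatedBody(
        ReadOnlySpan<byte> frame,
        out ReadOnlySpan<byte> body)
    {
        body = default;

        if (frame.Length < 2)
            return false;

        ReadOnlySpan<byte> candidate = frame.Slice(0, frame.Length - 1);
        byte expected = frame[frame.Length - 1];

        // XOR all body bytes, reading them in place.
        byte actual = 0;
        foreach (byte b in candidate)
            actual ^= b;

        if (actual != expected)
            return false;

        body = candidate; // still a view over the caller's buffer
        return true;
    }
}
```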
Image header parsing
Take the first N bytes of an image buffer:
- decode width
- decode height
- decode pixel format
- decode stride
Again, direct reads from slices.
Tokenizing input without allocation
Given text input, instead of Split(','), scan the ReadOnlySpan<char>, find separators, and process slices.
That is very useful in log parsing, command parsing, and compact text protocols.
Transforming batches of measurement data
If a batch arrives as a contiguous buffer, you can iterate spans over fixed-size records instead of allocating record fragments repeatedly.
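As an illustration of walking fixed-size records, assume each record is 8 bytes: a 4-byte little-endian sensor id followed by a 4-byte little-endian raw value. That layout and the MeasurementBatch name are hypothetical.

```csharp
using System;
using System.Buffers.Binary;

public static class MeasurementBatch
{
    // Assumed record layout: sensor id (4 bytes) | raw value (4 bytes), little-endian.
    private const int RecordSize = 8;

    // Sums the raw values of all complete records in the batch.
    public static long SumRawValues(ReadOnlySpan<byte> batch)
    {
        long sum = 0;

        while (batch.Length >= RecordSize)
        {
            ReadOnlySpan<byte> record = batch.Slice(0, RecordSize);
            sum += BinaryPrimitives.ReadInt32LittleEndian(record.Slice(4, 4));

            // Advance the view; the underlying buffer is never copied.
            batch = batch.Slice(RecordSize);
        }

        return sum;
    }
}
```

Trailing bytes that do not form a complete record are ignored in this sketch; production code would probably want to report them.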
Part 8 — How these features behave with async and pipelines
Why Span<T> cannot cross async/await
Span<T> is scoped to safe synchronous use.
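An await can suspend the method and resume it later, possibly on a different thread. Locals that live across the await are hoisted into a compiler-generated state machine that may be stored on the heap, which is exactly where a Span<T> is not allowed to live.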
So Span<T> is deliberately not allowed to flow across async boundaries.
This is one of the most important design rules to internalize.
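In practice the rule leads to a simple split: the async method carries ReadOnlyMemory<byte>, and a synchronous helper does the span work. The sketch below uses hypothetical names, with Task.Yield standing in for real asynchronous work.

```csharp
using System;
using System.Threading.Tasks;

public static class AsyncFrameProcessor
{
    // The async method only ever holds Memory; it never keeps a span across an await.
    public static async Task<int> ProcessAsync(ReadOnlyMemory<byte> frame)
    {
        await Task.Yield(); // stand-in for real async work such as I/O or queueing

        // The span exists only for the duration of this synchronous call.
        return SumBytes(frame.Span);
    }

    private static int SumBytes(ReadOnlySpan<byte> data)
    {
        int sum = 0;
        foreach (byte b in data)
            sum += b;
        return sum;
    }
}
```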
Why Memory<T> is often used for async-friendly pipelines
When data must:
- survive across awaits
- be queued
- be stored in objects
- move through channels
- be processed later
you need a representation that can live on the heap safely.
That is what Memory<T> is for.
A common pattern is:
- transport/pipeline boundary uses Memory<T> or ReadOnlyMemory<T>
- synchronous processing method converts to span
- parsing/transformation happens over span
- result moves onward in a higher-level model
Designing APIs correctly
This is a very mature design pattern:
- sync hot-path API → ReadOnlySpan<byte> or Span<byte>
- async/boundary API → ReadOnlyMemory<byte> or Memory<byte>
That gives you:
- high performance where it matters
- correct lifetime behavior across async stages
- cleaner separation of concerns
Example pattern
- acquisition stage reads bytes and exposes ReadOnlyMemory<byte>
- pipeline queues that memory to another stage
- parser method receives ReadOnlySpan<byte>
- parser works synchronously over that span
- output is mapped to a higher-level record
That combination is very common in well-designed systems.
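The stages above could be wired together roughly like this, using System.Threading.Channels as the queue. PacketPipeline, the unbounded channel, and the "first four bytes are a little-endian value" packet format are all assumptions for the sake of a small runnable sketch.

```csharp
using System;
using System.Buffers.Binary;
using System.Threading.Channels;
using System.Threading.Tasks;

public static class PacketPipeline
{
    public static async Task<long> RunAsync(byte[][] packets)
    {
        var channel = Channel.CreateUnbounded<ReadOnlyMemory<byte>>();

        // Acquisition stage: queues each packet as ReadOnlyMemory<byte>.
        Task producer = Task.Run(async () =>
        {
            try
            {
                foreach (byte[] packet in packets)
                    await channel.Writer.WriteAsync(packet);
            }
            finally
            {
                channel.Writer.Complete();
            }
        });

        // Processing stage: Memory crosses the await, the span stays inside ParseValue.
        long total = 0;
        await foreach (ReadOnlyMemory<byte> memory in channel.Reader.ReadAllAsync())
            total += ParseValue(memory.Span);

        await producer;
        return total;
    }

    // Synchronous parser over a span; hypothetical format: first 4 bytes are a value.
    private static int ParseValue(ReadOnlySpan<byte> packet) =>
        packet.Length >= 4 ? BinaryPrimitives.ReadInt32LittleEndian(packet) : 0;
}
```

Note that queued memory still refers to the producer's arrays, so the producer must not reuse or return a buffer until the consumer is done with it. With pooled buffers, that ownership hand-off needs to be explicit.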
Part 9 — Practical .NET usage
Here are realistic examples.
1. Method taking ReadOnlySpan<byte>
```csharp
using System;
using System.Buffers.Binary;

public readonly record struct MachineMessageHeader(
    ushort MessageType,
    ushort Version,
    int PayloadLength,
    uint SequenceNumber);

public static class MachineProtocolParser
{
    // Header layout (little-endian): type (2) | version (2) | payload length (4) | sequence (4).
    private const int HeaderSize = 12;

    public static bool TryParseHeader(
        ReadOnlySpan<byte> buffer,
        out MachineMessageHeader header)
    {
        header = default;

        if (buffer.Length < HeaderSize)
            return false;

        // Each Slice is a view over the caller's buffer; nothing is copied.
        ushort messageType = BinaryPrimitives.ReadUInt16LittleEndian(buffer.Slice(0, 2));
        ushort version = BinaryPrimitives.ReadUInt16LittleEndian(buffer.Slice(2, 2));
        int payloadLength = BinaryPrimitives.ReadInt32LittleEndian(buffer.Slice(4, 4));
        uint sequence = BinaryPrimitives.ReadUInt32LittleEndian(buffer.Slice(8, 4));

        header = new MachineMessageHeader(messageType, version, payloadLength, sequence);
        return true;
    }
}
```

Why this is good:
- no subarray allocation
- direct reads from the original buffer
- easy to compose into a larger parser
2. Parsing from ReadOnlySpan<byte> without copying
```csharp
using System;

public readonly record struct InspectionPacket(
    MachineMessageHeader Header,
    ReadOnlyMemory<byte> Payload);

public static class InspectionPacketReader
{
    // Must match the header size used by MachineProtocolParser.
    private const int HeaderSize = 12;

    public static bool TryParsePacket(
        ReadOnlyMemory<byte> packetMemory,
        out InspectionPacket packet)
    {
        packet = default;

        ReadOnlySpan<byte> packetSpan = packetMemory.Span;

        if (!MachineProtocolParser.TryParseHeader(packetSpan, out var header))
            return false;

        // PayloadLength comes from the wire, so reject negative values and
        // compare by subtraction to avoid integer overflow on the sum.
        if (header.PayloadLength < 0 ||
            packetSpan.Length - HeaderSize < header.PayloadLength)
            return false;

        // Slicing Memory keeps the payload as a view over the original buffer.
        ReadOnlyMemory<byte> payload = packetMemory.Slice(HeaderSize, header.PayloadLength);
        packet = new InspectionPacket(header, payload);
        return true;
    }
}
```

This is a nice pattern:
- boundary type is ReadOnlyMemory<byte>
- parsing uses Span
- payload remains slice-based, not copied
3. Using stackalloc for a small temporary formatting buffer
```csharp
using System;

public static class DefectCodeFormatter
{
    public static string FormatDefectCode(int line, int column, int defectId)
    {
        // Three ints (up to 11 chars each) plus 5 literal chars fit easily in 64.
        // No Clear() is needed: only the written prefix is ever read back.
        Span<char> buffer = stackalloc char[64];
        int written = 0;

        "L".AsSpan().CopyTo(buffer.Slice(written));
        written += 1;

        if (!line.TryFormat(buffer.Slice(written), out int lineWritten))
            throw new InvalidOperationException("Failed to format line.");
        written += lineWritten;

        "-C".AsSpan().CopyTo(buffer.Slice(written));
        written += 2;

        if (!column.TryFormat(buffer.Slice(written), out int columnWritten))
            throw new InvalidOperationException("Failed to format column.");
        written += columnWritten;

        "-D".AsSpan().CopyTo(buffer.Slice(written));
        written += 2;

        if (!defectId.TryFormat(buffer.Slice(written), out int defectWritten))
            throw new InvalidOperationException("Failed to format defect ID.");
        written += defectWritten;

        // The final string is the only heap allocation in this method.
        return new string(buffer.Slice(0, written));
    }
}
```

This avoids several intermediate strings during formatting.
Would I use this everywhere? No. Would I use it in a hot path that formats huge volumes of identifiers? Possibly yes.
4. Converting Memory<T> to Span<T> safely in a synchronous scope
```csharp
using System;

public static class TelemetryNormalizer
{
    public static void NormalizeInPlace(Memory<float> samples, float offset, float scale)
    {
        // Take the span once, inside the synchronous scope where it is used.
        Span<float> span = samples.Span;

        for (int i = 0; i < span.Length; i++)
        {
            span[i] = (span[i] + offset) * scale;
        }
    }
}
```

This is fine because the span is only used inside the synchronous method body.
5. Slicing buffers
```csharp
using System;
using System.Buffers.Binary;

public static class ImageHeaderReader
{
    public static bool TryReadDimensions(ReadOnlySpan<byte> buffer, out int width, out int height)
    {
        width = 0;
        height = 0;

        const int HeaderLength = 16;
        if (buffer.Length < HeaderLength)
            return false;

        ReadOnlySpan<byte> metadata = buffer.Slice(0, HeaderLength);
        ReadOnlySpan<byte> widthBytes = metadata.Slice(4, 4);
        ReadOnlySpan<byte> heightBytes = metadata.Slice(8, 4);

        // Assumes the header stores both fields as little-endian, like the
        // machine header in example 1. BinaryPrimitives makes the byte order
        // explicit, whereas BitConverter follows the host CPU.
        width = BinaryPrimitives.ReadInt32LittleEndian(widthBytes);
        height = BinaryPrimitives.ReadInt32LittleEndian(heightBytes);
        return true;
    }
}
```

Even this simple pattern is useful: slice the data you need, do not create a new buffer for it.
Part 10 — Common mistakes
1. Using Span<T> everywhere just because it is fast
This is a classic overreaction.
Engineers learn that spans reduce allocations, then start pushing them into every method. The result is often:
- harder APIs
- more confusing code
- lifetime issues
- little measurable benefit
Most code does not need this style.
Use it where data movement and allocation are actually part of the problem.
2. Trying to store Span<T> in class fields
This happens because people think of span as “just another buffer type.”
It is not.
It is intentionally scoped. If you try to turn it into long-lived object state, you are fighting the design.
If you need storage, use:
- Memory<T>
- ReadOnlyMemory<T>
- arrays
- pooled owners
not Span<T>.
3. Using stackalloc for large buffers
This is dangerous.
A small stackalloc is a nice optimization. A large one is a reliability risk.
In production, this can turn into intermittent failures or stack overflows, especially if:
- the method is recursive
- multiple stack-heavy calls are nested
- buffer sizes vary more than expected
Use it only for small, bounded scratch space.
4. Introducing hard-to-read code for tiny gains
Very common.
For example, replacing readable parsing code with intricate span logic in a path that runs a few times per user action is usually not worth it.
The result:
- harder maintenance
- fewer engineers confident to change the code
- more bug risk
- no meaningful product impact
Performance code should pay rent.
5. Misunderstanding Memory<T> vs Span<T>
A lot of confusion comes from treating them as interchangeable.
They are related, but they play different roles:
- Span<T> for immediate synchronous access
- Memory<T> for storage and async-friendly boundaries
If you use the wrong one, the design becomes awkward.
6. Forcing low-allocation patterns into application-layer code
This is a maturity issue.
ViewModels, orchestration services, domain workflows, and UI commands usually benefit more from:
- clarity
- composability
- testability
- explicit business intent
Low-level buffer tricks there usually reduce quality.
7. Optimizing before profiling or benchmarking
This is maybe the most important mistake.
You may spend hours rewriting code with spans and stackalloc, then discover:
- the bottleneck was database I/O
- the UI thread was blocked by rendering
- the real problem was image decode, not header parsing
- allocations were not on the hot path at all
These features are powerful, but they still need measurement discipline.
Part 11 — Performance and memory trade-offs
Reduced allocations vs increased complexity
This is the central trade-off.
You often get:
- fewer temporary objects
- less copying
- lower GC pressure
But you also get:
- stricter lifetime rules
- less familiar APIs
- more low-level code
- more mental overhead
Good engineers do not ignore either side.
Stack usage limits
stackalloc is fast because stack usage is simple and local. But the stack is much more limited than the heap.
So stack-based optimization is good when the amount is:
- small
- fixed or tightly bounded
- short-lived
It is a bad choice for “maybe this payload is 32 bytes, maybe 200 KB.”
Zero-copy vs readability
Zero-copy is not automatically better.
Sometimes copying a tiny amount of data into a clearer, safer structure is absolutely the right engineering decision.
For example:
- small config parsing done rarely
- UI-driven actions
- business-layer orchestration
- code touched by many engineers
The goal is not purity. The goal is proportional optimization.
API flexibility vs low-level efficiency
A low-level API taking ReadOnlySpan<byte> can be excellent internally. But if you expose that style too broadly in public or general-purpose layers, you may reduce usability.
There is often a good balance:
- keep hot-path helpers efficient
- keep higher-level application APIs ergonomic
When the win is real vs negligible
The win is real when:
- the code is on a hot path
- data is large or frequent
- allocations are measurable
- GC noise is visible
- the code processes streams, buffers, or binary data
- copying is happening repeatedly
The win is negligible when:
- the code runs rarely
- data is tiny
- overall latency is dominated elsewhere
- readability suffers more than performance improves
That is the engineering judgment part.
Part 12 — API design with Span<T> / Memory<T>
Designing low-level APIs
These types are great in low-level APIs like:
- binary parsers
- protocol readers
- image header utilities
- checksum/hash routines
- encoding/decoding helpers
- numeric formatting/parsing helpers
Typical pattern:
- input: ReadOnlySpan<T>
- output target: Span<T>
- async/storage boundary: Memory<T> or owner abstractions
That is usually a strong design.
When public APIs should expose these types
Expose them when the API is clearly:
- low-level
- performance-sensitive
- buffer-oriented
- used by engineers comfortable with this style
Examples:
- internal SDKs
- infrastructure libraries
- parsing libraries
- image-processing utilities
- transport/protocol components
When to keep them internal
Keep them internal when:
- the benefit is local
- the external API should remain simpler
- most consumers do not need to think about memory views
- the abstraction would leak low-level concerns upward
This is very often the right call in application codebases.
Example judgments
Parsing helpers
Great fit for ReadOnlySpan<byte> or ReadOnlySpan<char>.
Protocol readers
Excellent fit. This is one of the best use cases.
Image-processing utilities
Very good fit, especially for headers, rows, channels, and pixel buffers.
Internal infrastructure libraries
Often a good fit if the library is performance-critical.
But for top-level application services, a more domain-oriented API is often better.
Part 13 — Connection to other advanced features
ref struct
Span<T> is one of the most important real-world examples of a ref struct.
This is how C# enforces:
- stack-only usage
- tighter lifetime control
- safe direct memory access patterns
So understanding spans helps you understand why ref struct exists.
ref, in, out
These features all live in the same broader world:
- controlling copying
- improving performance
- handling data more directly
Examples:
- in helps avoid copying large structs
- ref lets you operate on existing storage
- spans are essentially a safer view-based model over memory
They are different tools, but they belong to the same mental model:
data movement matters
ArrayPool<T>
Spans often become even more useful when combined with pooled memory.
A common pattern is:
- rent a large array from ArrayPool<T>
- expose slices as spans
- process without further allocation
- return the array to the pool
This is common in:
- protocol readers
- serialization
- high-throughput buffering
- image pipelines
- reusable transport layers
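The rent, slice, process, return cycle can be sketched as follows. PooledReader and its byte-counting task are hypothetical; the shape of the try/finally is the part that matters.

```csharp
using System;
using System.Buffers;
using System.IO;

public static class PooledReader
{
    // Reads the stream in chunks through one rented buffer and counts non-zero bytes.
    public static int CountNonZero(Stream source, int chunkSize = 4096)
    {
        byte[] rented = ArrayPool<byte>.Shared.Rent(chunkSize);
        try
        {
            int count = 0;
            int read;
            while ((read = source.Read(rented, 0, chunkSize)) > 0)
            {
                // The rented array may be larger than requested and may hold
                // stale data, so always slice to the bytes actually read.
                ReadOnlySpan<byte> chunk = rented.AsSpan(0, read);
                foreach (byte b in chunk)
                {
                    if (b != 0)
                        count++;
                }
            }
            return count;
        }
        finally
        {
            // Return the buffer even if processing throws.
            ArrayPool<byte>.Shared.Return(rented);
        }
    }
}
```

After Return, the array must not be touched again; any span still pointing at it would be reading memory that another caller may now own.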
Low-allocation async pipelines
This is where Memory<T> becomes important.
A modern high-performance async pipeline often uses:
- Memory<T> or ReadOnlyMemory<T> across async stages
- Span<T> inside the synchronous inner loop
- pooled buffers for reuse
- owner abstractions for explicit lifetime management
That combination is very practical.
Benchmarking and profiling
These features are exactly the kind of thing you should validate with measurement.
Good questions:
- did allocations actually drop?
- did throughput improve?
- did p99 latency improve?
- did GC collections drop?
- did code complexity increase too much?
Without profiling and benchmarking, it is easy to overuse these tools.
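For a quick first check of the "did allocations actually drop?" question, a rough probe like the one below can help. AllocationProbe is a hypothetical helper built on GC.GetAllocatedBytesForCurrentThread; it is a sanity check, not a substitute for a proper benchmark harness or profiler.

```csharp
using System;

public static class AllocationProbe
{
    // Returns the managed bytes allocated on the current thread by one run of `action`.
    public static long MeasureAllocatedBytes(Action action)
    {
        // Warm-up run so one-time initialization is not attributed to the measurement.
        action();

        long before = GC.GetAllocatedBytesForCurrentThread();
        action();
        return GC.GetAllocatedBytesForCurrentThread() - before;
    }
}
```

A typical use is to measure the old Split-based parser and the span-based parser over the same input and compare the two numbers before deciding whether the rewrite is worth keeping.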
Part 14 — Senior engineer mental model
How experienced engineers think about data movement
Strong engineers look at a pipeline and ask:
- where is data being copied?
- where are temporary allocations created?
- how often does this path run?
- how large is the data?
- is GC pressure showing up in traces or profiles?
- do we need a new object here, or just a view?
That mindset is far more important than memorizing APIs.
How to decide when these features are justified
They are justified when:
- the code is performance-critical
- memory churn is measurable
- the operation is buffer-oriented
- parsing/formatting/transformation happens at high frequency
- zero-copy design meaningfully improves throughput or stability
They are not justified just because the code “looks more advanced.”
How to isolate performance-critical code
This is one of the most important design habits.
Keep low-level performance code:
- small
- focused
- well-tested
- close to infrastructure boundaries
- hidden behind clean abstractions
For example:
- parser utility uses spans internally
- application service receives a normal parsed model
- ViewModel never sees the low-level buffer logic
That is the right separation.
How to keep it maintainable and testable
A good pattern is:
- isolate span-heavy code in dedicated components
- keep methods short and purpose-specific
- validate boundaries carefully
- test with representative data
- benchmark before and after
- document why the low-level optimization exists
If a method is span-heavy and hard to read, it should earn that complexity.
How to avoid turning the whole codebase into performance code
This is the maturity point.
The best systems do not try to make every layer low-allocation.
They use the right style in the right layer:
| Layer | Appropriate style |
|---|---|
| low-level transport/parsing/image utilities | span/memory/pool/stackalloc can be appropriate |
| domain logic and workflows | clarity and correctness dominate |
| UI/ViewModels/application services | readability, state management, and testability dominate |
That balance is what separates strong engineering judgment from cargo-cult optimization.
Final practical summary
Span<T> and friends exist because copying and allocating too much is expensive in real systems.
The simple mental model is:
- Span<T> = fast, stack-scoped view for synchronous work
- ReadOnlySpan<T> = read-only version for parsing/scanning/inspection
- Memory<T> = heap-storable, async-friendly memory abstraction
- ReadOnlyMemory<T> = read-only async/storage-friendly version
- stackalloc = tiny temporary stack buffer for carefully bounded hot-path work
They shine in:
- binary protocol parsing
- image metadata/header processing
- telemetry/result pipelines
- low-level infrastructure code
- high-frequency buffer transformations
They usually do not belong in:
- ViewModels
- workflow orchestration
- most business logic
- general application-layer code
The real senior-level lesson is this:
Do not optimize syntax. Optimize data movement.
When data is large, frequent, or hot-path, these features can make a real difference. When the path is not performance-critical, they mostly add complexity.
Use them deliberately, measure the result, and keep the complexity contained.