SLICE-2.0: Store Allocation & Contention Profiling
- Status: Completed (2026-05-07)
- Date: 2026-05-03
- Depends on: Requirements, Evolution Roadmap, SLICE-006: Observability Baseline, SLICE-1.4: Storm & Soak Profiles
Goal
Instrument AppStateStore.Update to measure (a) per-call allocation, (b) lock-wait time, (c) call rate, and (d) caller distribution. Run a 30-minute capture under the MultiTag profile and produce a slice-2-0-store-profiling row that quantifies what fraction of the application's total allocation rate flows through AppStateStore.Update and how contended the store lock actually is. The resulting numbers determine whether the originally planned Phase 2 slices (2.1, 2.2, 2.3, 2.4) are load-bearing as specified, need reordering, or can be deferred. This slice ships measurement only — no architectural refactoring.
Why This Slice
The Phase 1 exit-gate evidence (rows slice-1-1 through slice-1-4-soak-8h) shows the application surviving load beautifully: frames.dropped near zero, runs.faulted bounded, working-set steady-state flat across 8 hours, GC pause p95 stable at 7.9–12.4 ms. The roadmap §3 explicitly says: "If the app survives this run beautifully, Phase 2 is deferred. If not, we know exactly which slice of the store to attack first."
The app did survive beautifully — but AppStateStore.Update allocation share and lock-wait time are unmeasured. Phase 2's planned slices were specified with concrete exit gates that depend on these unmeasured baselines:
- SLICE-2.1 (slice
AppStateinto sub-records) — exit gate: "Allocations/sec atAppStateStore.Updatedrop by ≥ 40% under the Phase-1 soak profile." Requires a baseline. - SLICE-2.2 (immutable collections) — purpose: avoid
List<T>rebuilds on hot paths. Without seeing a measurable allocation share, we can't tell if the rebuilds are even visible in the trace. - SLICE-2.3 (data-plane lift-out for telemetry / frames / diagnostics) — already done for the encoder stream (SLICE-1.3 bypasses
AppState). For tags/frames/diagnostics: justified only if those callers dominate the AppState write rate. - SLICE-2.4 (per-slice observables) — exit gate: "Lock-wait on the store stays under a measured threshold." Requires the threshold to be measured first.
Opening any of those slices without a baseline means committing to a refactor whose payoff cannot be quantified. SLICE-2.0 produces the baseline. Its row block is the measured justification for Phase 2 that the roadmap has been pointing at since 2026-04-22.
Requirements Coverage
- 04. UI and Technical Requirements: the canonical store must remain reactive under load; instrumentation makes "reactive" measurable.
- 07. AI Delivery Constraints and Roadmap: each phase ships a measurable before-and-after; SLICE-2.0 is Phase 2's "before" baseline that the planned 2.1/2.2/2.3/2.4 slices delta against.
In Scope
Instrumentation in AppStateStore.Update
The current AppStateStore.Update(Func<AppState, AppState> reducer) becomes:
public void Update(
Func<AppState, AppState> reducer,
[CallerMemberName] string callerMember = "",
[CallerFilePath] string callerFile = "")
{
var caller = string.IsNullOrEmpty(callerFile)
? callerMember
: $"{Path.GetFileNameWithoutExtension(callerFile)}.{callerMember}";
var sw = Stopwatch.StartNew();
long allocBefore = GC.GetAllocatedBytesForCurrentThread();
AppState next;
lock (_lock)
{
// Lock-wait time = total elapsed up to this point (acquired the lock)
var lockWaitMicros = sw.Elapsed.TotalMicroseconds;
_metrics.StoreUpdateLockWaitMicros.Record(lockWaitMicros);
next = reducer(_current);
_current = next;
}
long allocBytes = GC.GetAllocatedBytesForCurrentThread() - allocBefore;
_metrics.StoreUpdateAllocBytes.Add(allocBytes);
_metrics.StoreUpdateCalls.Add(1, new KeyValuePair<string, object?>("caller", caller));
StateChanged?.Invoke(next);
}The two [Caller*] parameters are compile-resolved by the C# compiler at every callsite — no callsite changes required. The resulting caller string is a <TypeFileName>.<MethodName> pair, e.g. FramePipelineService.ExecuteAsync, WorkflowService.OnFaultInjected. Multiple Update calls inside the same method aggregate under one key; that granularity is sufficient for SLICE-2.0's prioritization purpose.
The IAppStateStore interface gains the same two optional parameters so the abstraction matches the implementation; existing test doubles compile unchanged because the parameters are optional and the values default at the callsite.
New AppMetrics counters
Three new counters on the existing InspectionPrototype meter:
store.update.calls(counter, dimensioncaller) — total count, broken down by caller identity.store.update.alloc.bytes(counter, no dimension) — total allocation bytes summed across allUpdatecalls.store.update.lock_wait.micros(histogram) — distribution of lock-wait time in microseconds. Histogram boundaries[1, 10, 100, 1_000, 10_000, 100_000](1 µs → 100 ms).
StateChanged event firing semantics, return value, and the Current getter remain unchanged. The lock is held for the same duration as before plus the cost of one _metrics.StoreUpdateLockWaitMicros.Record(...) call (~50 ns); the alloc-delta and call-count emissions happen outside the lock.
Measurement-extraction helpers
tools/MeasurementExtraction.psm1 gains four helpers:
Get-StoreUpdateRate -Csv $rows -DurationSeconds $dur— returnsstore.update.callstotal / duration, in calls/sec.Get-StoreUpdateAllocShare -Csv $rows -DurationSeconds $dur— returns the ratiostore.update.alloc.bytes total / dotnet.gc.heap.total_allocated totalas a percentage, rounded to 1 decimal.Get-StoreUpdateLockWaitP95 -Csv $rows— returns the p95 ofstore.update.lock_wait.microsas a number in microseconds.Get-StoreUpdateCallerDistribution -Csv $rows— returns the top 5 callers by call count as an[ordered]array[{caller=…; calls=…; share_pct=…}].
ConvertTo-MeasurementRow adds four rows to the markdown block (becomes 26-metric for captures that include the new counters; preserves the SLICE-1.2/1.3/1.4 "—" sentinel pattern when the input CSV pre-dates SLICE-2.0):
| store.update rate (calls/s) | <Get-StoreUpdateRate> |
| store.update alloc share (%) | <Get-StoreUpdateAllocShare> |
| store.update lock-wait p95 (µs) | <Get-StoreUpdateLockWaitP95> |
| store.update top-caller | <top entry from Get-StoreUpdateCallerDistribution> |The full caller-distribution table (top 5) lives in the row's Notes section, not in the metric table, because it is multi-row and exceeds the table format.
Measurement scenario
A new §5.1 in docs/runbook/capturing-measurements.md (the new §5.x section bracket signals Phase 2 captures, distinct from §4.x Phase-1):
- §5.1 Store-allocation profiling — SLICE-2.0,
MultiTagprofile. 30-minute capture using the existingMultiTagSoakFlaUiscenario with-Profile MultiTag. TheMultiTagprofile is chosen because it exercises the highest sustainedUpdaterate of any baseline profile (50-tag emitters at the 50 ms snapshot interval drive ~20 Hz tag-snapshot writes plus per-frame writes plus per-position-change writes).
No new FlaUI scenario class. Reuses the SLICE-1.6 rig.
Measurement row
A new file docs/reviews/phase-2-measurements.md is created (mirrors phase-1-measurements.md structure) with one row block, slice-2-0-store-profiling. Baseline column: slice-1-1-multi-tag-telemetry for the 22 metrics that overlap (tag/frame/encoder rates, GC, working-set, etc.); "—" for the four new SLICE-2.0 metrics since the baseline pre-dates them.
Decision rubric (informative, not gating)
The row block's Notes section includes an explicit Phase 2 prioritization recommendation based on the captured numbers:
store.update alloc share < 10%→ SLICE-2.1 (slice AppState into sub-records) and SLICE-2.2 (immutable collections) are deferred. Their stated exit gates require ≥ 40% allocation reduction atAppStateStore.Update; if the absolute number is small, even a 100% reduction is below noise. Document the deferral with the measured share.store.update alloc share ≥ 10%→ SLICE-2.1 opens. Write its spec with the measured share as the baseline; the ≥ 40% reduction target becomes a concrete number (e.g., "from 22% share to ≤ 13% share").store.update lock-wait p95 < 100 µs→ SLICE-2.4 (per-slice observables) is deferred. Lock contention is not the bottleneck.store.update lock-wait p95 ≥ 100 µs→ SLICE-2.4 opens, with the measured p95 as the baseline.- Top caller is
TagStreamPipelineService.*with > 50% of total calls → SLICE-2.3 (data-plane lift-out for tags) becomes the priority. Mirrors the SLICE-1.3 encoder bypass: tag stream becomes a channel-drain pipeline that doesn't write throughAppState. The other 2.3 sub-targets (frames, diagnostics) are evaluated separately. - Top caller is
FramePipelineService.*with > 50% → SLICE-2.3 (frames bypass) is the priority instead. - Top caller is workflow-state-machine code (
WorkflowService.*,OnFaultInjected, etc.) with > 50% → no Phase 2 slice fits cleanly. The store is doing exactly the job it was designed for. Recommend Phase 2 deferral entirely.
The row's Notes section spells out which case the captured data lands in and explicitly proposes (or defers) the next slice. Phase 2 spec authoring follows from this recommendation, not from the original roadmap text.
Out of Scope
- Refactoring
AppStateStore(slicing, immutable collections, observables, lift-out) — that's Phase 2.1/2.2/2.3/2.4, opened conditionally based on this slice's evidence. - Instrumenting subscribers of
StateChanged— that's a Phase 2.4 measurement, not Phase 2 baseline. - Per-
AppState-field allocation breakdown (e.g., "how much of the alloc comes fromLatestTagValuesrebuild vsRunHistory.Prepend") — out of scope for this baseline; would require deeper instrumentation. If 2.0 evidence motivates 2.1, the 2.1 task can add per-field accounting at that point. - Stack-walk-based caller resolution (slower, more accurate) —
[CallerMemberName]+[CallerFilePath]is sufficient for the prioritization decision. - Removing or renaming the
StateChangedevent — semantics unchanged. - Adding new profiles — uses existing
MultiTag. - A new
IScenarioclass —MultiTagSoakFlaUiwith-Profile MultiTagis the capture path. - Writing the actual Phase 2 specs (2.1/2.2/2.3/2.4) — out of scope; SLICE-2.0's row block names which Phase 2 slices to write next, but writing them is itself a separate slice-opening session.
- Changing
IAppStateStore.Update's required arguments — the two new parameters are optional with[Caller*]defaults; no existing callsite changes.
Runtime Behavior
Performance impact of instrumentation
Stopwatch.StartNew(): ~20 ns per call.GC.GetAllocatedBytesForCurrentThread(): ~30 ns per call (×2 = before + after).Counter.Add(1, KVP): ~50 ns per call.Histogram.Record(double): ~100 ns per call.- Total per-
Updateoverhead: ~250 ns.
AppStateStore.Update is called at most ~50 Hz under MultiTag (extrapolating from row 0a's 4.95 Hz × the 10× higher rate MultiTag drives via the 50 ms snapshot interval). 50 calls/sec × 250 ns/call = 12.5 µs/sec of CPU overhead, < 0.001% of a 1 GHz core. Below noise. The instrumentation is always-on; no opt-in flag.
Caller-string semantics
[CallerMemberName] returns the method name where the call site lives. [CallerFilePath] returns the absolute file path of that source file. We strip the path to the filename (no extension) and concatenate. Examples:
FramePipelineService.ExecuteAsync(frame-pipeline drain loop)FramePipelineService.ProcessDefectsForFrame(defect detection)TagStreamPipelineService.ExecuteAsync(tag snapshot drain)WorkflowService.OnFaultInjected(critical fault path)WorkflowService.OnPositionChanged(motion-update event)MainViewModel.OnRecipeListUpdated(UI-driven update)
Some methods make multiple Update calls (e.g., WorkflowService.RunLoopAsync has several writes per scan-point cycle). Those aggregate under one key; sub-key granularity is out of scope.
Histogram boundaries
store.update.lock_wait.micros uses fixed boundaries [1, 10, 100, 1_000, 10_000, 100_000] (1 µs, 10 µs, 100 µs, 1 ms, 10 ms, 100 ms). Most calls under steady load should land in the 1-10 µs bucket (uncontended lock acquisition). Contended cases (frame-pipeline write fighting with tag-stream write fighting with workflow write) would land in 10-100 µs. Anything in 1 ms+ buckets indicates real contention and would directly motivate SLICE-2.4.
Behavior when no other consumers register
store.update.calls, .alloc.bytes, and .lock_wait.micros follow the same lifecycle as the existing InspectionPrototype meter counters. They appear in dotnet-counters monitor output and dotnet-counters collect --format csv. Unit tests use a MeterListener to verify increments (same pattern as AppMetricsTagDimensionTests and AppMetricsEncoderDimensionTests).
Test-double compatibility
Existing IAppStateStore test fakes (e.g., RecordingAppStateStore from SLICE-1.3 tests) implement Update(Func<AppState, AppState>) directly. The new optional parameters mean void Update(Func<AppState, AppState> reducer, string callerMember = "", string callerFile = "") — fakes that implement the interface need to add the parameters (compile-only). Existing test setups do not need to pass the parameters explicitly because the C# compiler injects the [CallerMemberName] / [CallerFilePath] defaults.
Acceptance Criteria
This slice is satisfied only if all of the following are true:
AppStateStore.Updateis instrumented per "Instrumentation inAppStateStore.Update":[CallerMemberName]+[CallerFilePath]parameters added;Stopwatchmeasures lock-wait;GC.GetAllocatedBytesForCurrentThread()measures alloc delta; three newAppMetricscounters are emitted per call. Thelockis held for the same logical duration as before (measure lock-wait before entering critical section; emit alloc/calls after exiting); reducer execution and_current = nextassignment remain inside the lock.IAppStateStoreinterface gains the two optional[Caller*]parameters with empty-string defaults. All existingUpdate(...)callsites in the codebase compile unchanged. Existing test fakes (RecordingAppStateStoreand any others undertests/.../Stubs/) implement the new signature with the same defaults.AppMetricsexposesStoreUpdateCalls(Counter<long>),StoreUpdateAllocBytes(Counter<long>), andStoreUpdateLockWaitMicros(Histogram<double>). XML doc explains thecallerdimension onStoreUpdateCalls. The existing 11 counters (frames, telemetry, runs, samples, encoder, tags-active gauge) are unchanged.MeasurementExtraction.psm1exportsGet-StoreUpdateRate,Get-StoreUpdateAllocShare,Get-StoreUpdateLockWaitP95,Get-StoreUpdateCallerDistribution. Pester tests cover each helper with synthetic CSV input, including the "absent counter" sentinel case (returns$nullor"—"per the SLICE-1.2/1.3/1.4 convention).ConvertTo-MeasurementRowemits the four new metric rows when the input CSV contains the new counters, and"—"sentinels otherwise. Existing rows inphase-1-measurements.mdremain reproducible: re-runningConvertTo-MeasurementRowagainstslice-1-1-multi-tag-telemetry.csvproduces a row whose first 22 metrics match the existing markdown bit-for-bit, with four new "—" rows appended.- A new file
docs/reviews/phase-2-measurements.mdexists with the same conventions asphase-1-measurements.md(Conventions / Fixed metric set / row blocks / Notes) and one row block taggedslice-2-0-store-profiling. - A 30-minute
MultiTagcapture underMultiTagSoakFlaUi(commit at-or-after this slice's merge) produces the row block. The CSV is committed atdocs/captures/slice-2-0-store-profiling-<date>.csv. - The row's Notes section includes an explicit Phase 2 prioritization recommendation following the rubric in "Decision rubric": one of {2.1 opens / 2.2 opens / 2.3 (tags) opens / 2.3 (frames) opens / 2.4 opens / Phase 2 deferred entirely}. The recommendation cites the measured numbers that drove the conclusion.
docs/runbook/capturing-measurements.mdgains §5.1 (Phase-2 captures) describing the procedure. The §4.7+ Phase-2 placeholder added in SLICE-1.4 close-out is replaced.- The full existing test suite still passes, plus new tests covering:
AppStateStoreInstrumentationTestsverifies that callingUpdateincrementsStoreUpdateCalls{caller=…},StoreUpdateAllocBytes(≥ 0), and records toStoreUpdateLockWaitMicros; verifies the lock-wait timing path (no measurable alloc inside the lock-waitStopwatch.Elapsedread); verifiescallerdimension correctly identifies the calling type+method via[CallerMemberName]/[CallerFilePath]. Pester tests for the four new helpers.dotnet testruntime stays under 90 seconds. - No regression to prior measurement rows:
Capture-Measurements.ps1 -Profile MultiTagproduces a row equivalent toslice-1-1-multi-tag-telemetrywithin the existing per-tag accuracy bounds (criterion 16 of SLICE-1.4 carried forward). - Phase 2 prioritization decision (criterion 8) drives the next slice-opening session: if the recommendation says "SLICE-2.1 opens", the next session writes
SLICE-2.1spec andTASK-2.1task; if it says "Phase 2 deferred", CLAUDE.md current-position records that and the next session opens Phase 3 evaluation. SLICE-2.0's job ends at producing the recommendation.
Verification Notes
The implementation task for this spec must include verification for:
- Instrumentation overhead is below the per-Update budget. The slice's "Performance impact of instrumentation" estimate is ~250 ns per call. Verify by capturing a 60-second
MultiTagsmoke and asserting thatcpu-usage avg (%)row delta vs the pre-SLICE-2.0slice-1-1-multi-tag-telemetryrow is within ±2 percentage points. If the delta is larger, the instrumentation is too heavy; investigate before the 30-min capture. - The
callerdimension correctly identifies the caller. Unit test callsUpdatefrom inside a method namedTestCaller_RegistersAsCallerNameand asserts the emitted dimension contains both the method name and the test file's name (sans extension). - Allocation measurement is deterministic. A unit test invokes
Updatewith a reducer that does a known allocation (e.g.,new List<int>(1024)of known size), reads the emittedStoreUpdateAllocBytesincrement, and asserts it is within ±1 KB of the expected size. Unit tests must callGC.Collect()+GC.WaitForPendingFinalizers()before the assertion to drain background allocations. - Lock-wait timing distribution. Under no contention (single thread calling
Update1 000 times serially), assert p95 < 10 µs. Under contention (10 threads each callingUpdate100 times), assert p95 > 0 µs and < 1 ms (the lock should not be the dominant cost; if p95 hits 10 ms+, something is wrong). - The 30-minute capture decision rubric. The Notes section's prioritization recommendation must cite both the measured numbers and the matching rubric line. A reviewer reading only the Notes should be able to reproduce the decision without re-running the capture.
MultiTagprofile reproducibility. The first 22 metrics of theslice-2-0-store-profilingrow must match theslice-1-1-multi-tag-telemetryrow within the per-tag accuracy bounds amended in SLICE-1.1's criterion 7. This is the criterion-11 reproducibility check carried forward; if the new instrumentation perturbs the data plane enough to break reproducibility, the slice is too heavy.- The deferred-Phase-2 case is not a regression. If the captured data leads the rubric to "Phase 2 deferred entirely", that is a successful outcome — not a failure to motivate refactoring. The roadmap's principle ("simulator-first, measured, then refactor — premature refactoring is the most expensive mistake available here") makes deferral a first-class outcome of the measurement.