perf(client) + fix(h1): decode/body hot-path optimizations and a single-connection HTTP/1.1 pipelining deadlock #45

Merged
st0o0 merged 37 commits into release-next from feat/dispatcher-analysis on Jun 22, 2026

st0o0 (Member) commented on Jun 20, 2026

Overview

Five client hot-path performance optimizations on the HTTP/1.1 and HTTP/2 decode/body
paths, plus a fix for an intermittent HTTP/1.1 single-connection pipelining deadlock
that surfaced while stress-testing them. All changes are behavior-preserving except for one
documented internal decoder contract change.

Fix - HTTP/1.1 single-connection pipelining deadlock

Under heavy single-connection pipelining the server streams responses back to back, so a
response's status line or header block is frequently split across two TCP reads. The
H1.1 client kept no cross-read remainder (only the streaming back-pressure path retained
anything), so the unconsumed prefix of a split header was discarded and the next read's
continuation parsed as garbage → HttpProtocolException: Malformed header field. That
faulted one request and stranded its in-flight pipelined siblings, deadlocking the
connection (SingleConnectionConcurrencyRegressionSpec failed ~50–80% of runs).

Fix: retain the unconsumed prefix in Http11ClientStateMachine (_partialResponse)
and prepend it to the next inbound buffer - mirroring what the H2 FrameDecoder already
does - cleared on disconnect, decode failure, and cleanup. Added a deterministic repro
(Http11ClientFragmentedResponseSpec) covering status-line, header-line, and
second-pipelined-response splits. The stress guard now drains 256 × 40 cleanly.
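
The carry-over idea can be sketched in a few lines. This is a language-neutral illustration in Python, not the actual Http11ClientStateMachine code: the class `HeadDecoder`, its `feed`/`reset` methods, and the assumption of body-less responses are all hypothetical and exist only to show why an unconsumed prefix must survive to the next read.

```python
from typing import List


class HeadDecoder:
    """Toy decoder for back-to-back, body-less HTTP/1.1 response heads (hypothetical)."""

    def __init__(self) -> None:
        # Unconsumed prefix carried across reads (plays the role of _partialResponse).
        self._partial = b""

    def feed(self, chunk: bytes) -> List[bytes]:
        # Prepend whatever the previous read left over before parsing.
        data = self._partial + chunk
        heads = []
        while True:
            end = data.find(b"\r\n\r\n")
            if end < 0:
                break  # head is incomplete; wait for the next read
            heads.append(data[:end])
            data = data[end + 4:]
        # May be empty, or a split status line / header line.
        self._partial = data
        return heads

    def reset(self) -> None:
        # Clear the remainder, e.g. on disconnect or decode failure.
        self._partial = b""
```

Dropping the `self._partial = data` assignment reproduces the failure mode described above: the second read would start in the middle of a status line or header and could not be parsed.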

Performance optimizations

  • H2 FrameDecoder.Decode returns its reused _frames list directly instead of
    ToArray() per inbound read — removes one array allocation per H2 network read on both
    client and server. Callers that hold a result across a later Decode now snapshot
    explicitly; CLAUDE.md updated to document the contract.
  • HPACK encoder/decoder reuse the raw UTF-8 byte lengths they already computed instead
    of recomputing GetByteCount when adding to the dynamic table (single 4-arg Add).
  • H1.x BufferSearch.FindCrlf uses a single vectorized two-byte IndexOf("\r\n"u8)
    instead of find-CR-then-check-LF with scan restarts — runs once per response line.
  • QueuedBodyStream.CopyToAsync override writes pooled body chunks straight to the
    destination, removing a body-sized copy and the 81,920-byte framework rent per buffered
    download (return-to-pool deferred until after the write completes).
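
As a rough illustration of the FindCrlf change, the two search strategies can be compared side by side. This is a Python sketch, not the BufferSearch code: both function names are hypothetical, and `bytes.find` stands in for the span-based IndexOf. It shows only that the two approaches return the same index, not the vectorization gain.

```python
def find_crlf_two_step(buf: bytes, start: int = 0) -> int:
    """Find CR, check the next byte for LF, and restart the scan on a lone CR."""
    pos = start
    while True:
        cr = buf.find(b"\r", pos)
        if cr < 0 or cr + 1 >= len(buf):
            return -1  # no CR left, or CR is the last byte
        if buf[cr + 1] == 0x0A:
            return cr
        pos = cr + 1  # lone CR: restart after it


def find_crlf_single(buf: bytes, start: int = 0) -> int:
    """One search for the two-byte sequence, with no restarts."""
    return buf.find(b"\r\n", start)
```

The single search has no restart loop, so a header value containing stray CR bytes costs no extra passes over the buffer.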

Testing

All test projects green, no flakes (counts are passed/failed):
unit 5754/0 · End2End 101/0 · Client 474/0 · Server 89/0 · AcceptanceTests 712/0 ·
API.Tests 1/0 · AspNetCore.Tests 36/0. Each change is TDD'd with a deterministic spec;
Roslyn diagnostics clean on every changed file.

Notes

  • Submodule lib/servus.akka is unchanged (gitlink not bumped).
  • The only behavioral contract change is FrameDecoder.Decode returning a reused buffer;
    all in-tree callers consume it synchronously and tests that held results were updated to
    snapshot.
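
For reviewers unfamiliar with this kind of contract, the aliasing hazard can be sketched as follows. This is a Python illustration with a hypothetical `ReusingDecoder` and a made-up one-byte length-prefixed frame format; it is not the H2 FrameDecoder, only a minimal model of a decoder that returns the same list on every call.

```python
from typing import List


class ReusingDecoder:
    """Toy decoder that returns its internal list instead of a fresh copy (hypothetical)."""

    def __init__(self) -> None:
        self._frames: List[bytes] = []

    def decode(self, data: bytes) -> List[bytes]:
        # Assumes complete frames: one length byte followed by that many payload bytes.
        self._frames.clear()
        i = 0
        while i < len(data):
            n = data[i]
            self._frames.append(data[i + 1:i + 1 + n])
            i += 1 + n
        return self._frames  # same list object on every call


decoder = ReusingDecoder()
first = decoder.decode(b"\x01a\x02bc")
snapshot = list(first)  # explicit copy, needed to keep results past the next decode
second = decoder.decode(b"\x01z")
# 'first' and 'second' are the same object, so 'first' now holds the new frames;
# only 'snapshot' still holds the frames from the first call.
```

Callers that consume the result before the next decode call need no copy, which is where the per-read allocation saving comes from.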

st0o0 changed the base branch from main to release-next on June 20, 2026, 19:37
st0o0 force-pushed the feat/dispatcher-analysis branch from 7448e0c to 2458cab on June 22, 2026, 10:20
st0o0 force-pushed the feat/dispatcher-analysis branch from 1d1928c to 4db1c82 on June 22, 2026, 13:22
st0o0 merged commit 63f42a6 into release-next on Jun 22, 2026
6 checks passed
st0o0 deleted the feat/dispatcher-analysis branch on June 22, 2026, 15:12