
feat(cluster): add TLS 1.3 to replica plane with PSK channel binding #3461

Open: hubcio wants to merge 1 commit into master from tcp-tls-in-cluster



hubcio (Contributor) commented Jun 12, 2026

The replica port carried plaintext: the PSK handshake authenticates
cluster membership but gives no confidentiality, and encryption was
delegated to out-of-band tunnels (WireGuard/VPC). Replica traffic
now supports in-process TLS behind an opt-in [cluster.tls] table.
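A minimal sketch of how the opt-in table might be modelled and validated, assuming plain Rust structs rather than the crate's actual config types. `ClusterConfig`, `ClusterTls`, `CertMode`, `ConfigError` and all field names are hypothetical; the only rule encoded is the one this PR states, that either cert mode requires cluster.auth.

```rust
use std::path::PathBuf;

/// Hypothetical mirror of the `[cluster.tls]` cert modes (names are assumptions).
#[derive(Debug, Clone)]
pub enum CertMode {
    /// Default mode: `ca_file` is the trust anchor the dialer verifies the acceptor against.
    CaFiles { cert_file: PathBuf, key_file: PathBuf, ca_file: PathBuf },
    /// Auto-generated certificate; the dialer accepts any server certificate.
    SelfSigned,
}

#[derive(Debug, Clone)]
pub struct ClusterTls {
    pub mode: CertMode,
}

#[derive(Debug, Clone)]
pub struct ClusterConfig {
    /// `None` models an absent `[cluster.tls]` table: plaintext, as before.
    pub tls: Option<ClusterTls>,
    /// Whether `cluster.auth` (the PSK handshake) is configured.
    pub auth_enabled: bool,
}

#[derive(Debug, PartialEq, Eq)]
pub enum ConfigError {
    TlsWithoutAuth,
}

impl ClusterConfig {
    pub fn validate(&self) -> Result<(), ConfigError> {
        // Neither mode carries client certificates, so TLS cannot authenticate
        // the dialing peer; the PSK handshake has to stay enabled.
        if self.tls.is_some() && !self.auth_enabled {
            return Err(ConfigError::TlsWithoutAuth);
        }
        Ok(())
    }
}
```

Keeping the table an `Option` makes the default (no table) behave exactly like the previous plaintext setup.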

TLS session state cannot cross shard boundaries, so shard 0 no
longer runs the PSK handshake before delegating. Instead it ships
the raw fd to the owning shard immediately after accept or dial,
without inspecting it; the owning shard wraps the fd in TLS and runs
the PSK handshake inside the TLS stream, with both handshakes
sharing one handshake_grace budget. Because handshakes are no longer
serialized on shard 0, unauthenticated in-flight connections need a
global cap, built from three pieces: a slot table on shard 0, an
outcome ack sent back by the owning shard, and a deadline that
reclaims a slot when its ack is lost. The same ack clears a
pending-dial set, which stops the periodic reconnect sweep from
dialing a peer a second time while its handshake is still in flight.
When the owning shard is shard 0 itself, the ack releases the slot
inline on the local bus instead of sending a frame to itself.
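The shard-0 bookkeeping can be sketched as below, assuming a single owner (shard 0) and caller-supplied timestamps so the deadline logic is testable. `InflightSlots` and its methods are hypothetical names, not the message_bus API, and replica ids are modelled as `u8` purely for illustration.

```rust
use std::collections::{HashMap, HashSet};
use std::time::{Duration, Instant};

/// Hypothetical shard-0 table of unauthenticated in-flight connections.
pub struct InflightSlots {
    cap: usize,
    grace: Duration,
    next_id: u64,
    /// slot id -> (peer replica id for outbound dials, reclaim deadline)
    slots: HashMap<u64, (Option<u8>, Instant)>,
    /// peers that already have a dial whose handshake is unacknowledged
    pending_dials: HashSet<u8>,
}

impl InflightSlots {
    pub fn new(cap: usize, grace: Duration) -> Self {
        Self { cap, grace, next_id: 0, slots: HashMap::new(), pending_dials: HashSet::new() }
    }

    /// Reserve a slot before shipping a raw fd to its owning shard.
    /// `dial_peer` is `Some(replica)` for outbound dials and `None` for accepts.
    /// Returns `None` if the global cap is reached or the peer already has a dial in flight.
    pub fn try_acquire(&mut self, now: Instant, dial_peer: Option<u8>) -> Option<u64> {
        self.reclaim_expired(now);
        if self.slots.len() >= self.cap {
            return None;
        }
        if let Some(peer) = dial_peer {
            // `insert` returns false if this peer is already pending: no double dial.
            if !self.pending_dials.insert(peer) {
                return None;
            }
        }
        let id = self.next_id;
        self.next_id += 1;
        self.slots.insert(id, (dial_peer, now + self.grace));
        Some(id)
    }

    /// Outcome ack from the owning shard. The slot is removed whatever the outcome;
    /// for dials, the peer also leaves the pending-dial set.
    pub fn ack(&mut self, id: u64) {
        if let Some((Some(peer), _)) = self.slots.remove(&id) {
            self.pending_dials.remove(&peer);
        }
    }

    /// Reclaim slots (and pending dials) whose ack never arrived before the deadline.
    pub fn reclaim_expired(&mut self, now: Instant) {
        let pending = &mut self.pending_dials;
        self.slots.retain(|_, entry| {
            let (peer, deadline) = *entry;
            if now < deadline {
                return true;
            }
            if let Some(p) = peer {
                pending.remove(&p);
            }
            false
        });
    }

    pub fn in_flight(&self) -> usize {
        self.slots.len()
    }

    pub fn is_dial_pending(&self, peer: u8) -> bool {
        self.pending_dials.contains(&peer)
    }
}
```

Only a lost ack has to wait for the deadline; success and failure acks both free the slot at once. When the owning shard is shard 0, the inline release would amount to calling `ack` directly instead of routing a frame.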

TLS 1.3 only, ALPN "iggy-replica". Two cert modes mirror the
legacy TcpTlsConfig shape: CA files (the default; the new ca_file
is the trust anchor for the dialer, a TLS-client role the server
plane never has) and self_signed (auto-generated cert, accept-any
verifier). Both modes require cluster.auth: neither carries client
certificates, so TLS authenticates at most the acceptor (and not
even that in self_signed mode, given the accept-any verifier), and
the PSK handshake stays the sole peer authenticator. The PSK MAC
now also covers the TLS exporter value plus a mode byte: a relaying
MITM that terminates both TLS legs ends up with a different exporter
on each leg, so the two MACs cannot match and the handshake fails.
The exporter is reachable only by running the handshake with
futures-rustls directly and then converting the result into the
compio-tls stream via its public From impls; a tripwire test pins
that route. The transcript change is wire-incompatible with the
previous MAC, which is acceptable while server-ng is unreleased.
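The channel binding amounts to adding the exporter and a mode byte to the bytes the PSK MAC covers. The sketch below assumes a simple transcript layout (domain label, mode byte, length-prefixed exporter, two nonces); the label, field order, nonce sizes and mode values are all hypothetical. The exporter would come from the TLS 1.3 exporter interface (RFC 8446, Section 7.5); the sketch just takes it as bytes. `toy_mac` uses std's SipHash-based `DefaultHasher` only so the example runs without dependencies: it is NOT a cryptographic MAC, and a real implementation would use something like HMAC-SHA256 with a constant-time comparison.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;

/// Hypothetical domain-separation label for the handshake transcript.
pub const DOMAIN: &[u8] = b"iggy-replica-psk";

/// Hypothetical mode byte recording whether the handshake ran over TLS.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum ChannelMode {
    Plain = 0,
    Tls = 1,
}

/// Bytes covered by the PSK MAC (layout is an assumption for illustration).
pub fn transcript(
    mode: ChannelMode,
    exporter: &[u8], // empty in Plain mode; TLS exporter output in Tls mode
    client_nonce: &[u8; 16],
    server_nonce: &[u8; 16],
) -> Vec<u8> {
    let exporter_len = u16::try_from(exporter.len()).expect("exporter output is short");
    let mut t = Vec::with_capacity(DOMAIN.len() + 3 + exporter.len() + 32);
    t.extend_from_slice(DOMAIN);
    t.push(mode as u8);
    // Length prefix keeps the variable-length exporter from bleeding into the nonces.
    t.extend_from_slice(&exporter_len.to_be_bytes());
    t.extend_from_slice(exporter);
    t.extend_from_slice(client_nonce);
    t.extend_from_slice(server_nonce);
    t
}

/// PLACEHOLDER ONLY: SipHash via DefaultHasher is not a MAC and must never be
/// used for authentication. It stands in for e.g. HMAC-SHA256(psk, transcript).
pub fn toy_mac(psk: &[u8], transcript: &[u8]) -> u64 {
    let mut h = DefaultHasher::new();
    h.write_usize(psk.len());
    h.write(psk);
    h.write(transcript);
    h.finish()
}
```

In the relay scenario, the dialer and the acceptor each build the transcript from their own leg's exporter, so even a MITM that forwards every PSK message verbatim cannot make the two MACs agree.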

Also fixes framing::write_message never flushing: rustls holds
ciphertext in its internal buffer until flushed, so handshake
frames written over TLS never reached the wire. On plain TCP the
added flush is a no-op.
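The flushing bug has the same shape as forgetting to flush a `std::io::BufWriter`: writes land in an internal buffer and nothing reaches the underlying sink until `flush`. The sketch below uses synchronous `std::io` as a stand-in for the async compio/rustls stream and assumes a hypothetical 4-byte big-endian length prefix; it is not the crate's actual frame format.

```rust
use std::io::{self, BufWriter, Write};

/// Sketch of a length-prefixed frame writer (hypothetical 4-byte big-endian prefix).
/// The trailing `flush` is the point: buffered writers such as `BufWriter`, or a
/// TLS session holding ciphertext, emit nothing until they are flushed.
pub fn write_message<W: Write>(w: &mut W, payload: &[u8]) -> io::Result<()> {
    let len = u32::try_from(payload.len())
        .map_err(|_| io::Error::new(io::ErrorKind::InvalidInput, "frame too large"))?;
    w.write_all(&len.to_be_bytes())?;
    w.write_all(payload)?;
    // Without this, a small frame can sit in the writer's buffer indefinitely.
    w.flush()
}

/// Convenience constructor used in the usage check below.
pub fn buffered_sink() -> BufWriter<Vec<u8>> {
    BufWriter::new(Vec::new())
}
```

A small unflushed write to a `BufWriter<Vec<u8>>` leaves the inner `Vec` empty, which mirrors handshake frames stuck inside the TLS session.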

github-actions added the S-waiting-on-review label (PR is waiting on a reviewer) on Jun 12, 2026
codecov commented Jun 12, 2026
Codecov Report

❌ Patch coverage is 70.05814% with 309 lines in your changes missing coverage. Please review.
✅ Project coverage is 60.39%. Comparing base (843457d) to head (52e0463).
⚠️ Report is 1 commit behind head on master.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| core/server-ng/src/bootstrap.rs | 0.00% | 140 Missing ⚠️ |
| core/shard/src/router.rs | 0.00% | 51 Missing ⚠️ |
| core/message_bus/src/replica/handshake.rs | 79.25% | 45 Missing and 5 partials ⚠️ |
| core/message_bus/src/transports/tls/mod.rs | 50.00% | 27 Missing ⚠️ |
| core/message_bus/src/installer/mod.rs | 13.04% | 20 Missing ⚠️ |
| core/message_bus/src/installer/replica.rs | 90.85% | 8 Missing and 7 partials ⚠️ |
| core/shard/src/coordinator.rs | 94.00% | 3 Missing ⚠️ |
| core/message_bus/src/replica/io.rs | 60.00% | 2 Missing ⚠️ |
| core/message_bus/src/framing.rs | 50.00% | 0 Missing and 1 partial ⚠️ |
Additional details and impacted files
```diff
@@              Coverage Diff              @@
##             master    #3461       +/-   ##
=============================================
- Coverage     74.56%   60.39%   -14.18%     
  Complexity      937      937               
=============================================
  Files          1249     1248        -1     
  Lines        123564   112691    -10873     
  Branches      99839    88997    -10842     
=============================================
- Hits          92133    68057    -24076     
- Misses        28448    41540    +13092     
- Partials       2983     3094      +111     
```
| Components | Coverage Δ |
|---|---|
| Rust Core | `57.73% <70.05%> (-17.97%)` ⬇️ |
| Java SDK | `58.57% <ø> (ø)` |
| C# SDK | `69.30% <ø> (-0.62%)` ⬇️ |
| Python SDK | `81.06% <ø> (ø)` |
| PHP SDK | `83.57% <ø> (ø)` |
| Node SDK | `91.34% <ø> (-0.02%)` ⬇️ |
| Go SDK | `40.25% <ø> (ø)` |
| Files with missing lines | Coverage Δ |
|---|---|
| core/configs/src/server_config/cluster.rs | `100.00% <100.00%> (ø)` |
| core/configs/src/server_config/defaults.rs | `77.70% <100.00%> (+0.05%)` ⬆️ |
| core/configs/src/server_config/validators.rs | `78.39% <100.00%> (+2.84%)` ⬆️ |
| core/message_bus/src/connector.rs | `97.05% <100.00%> (+7.72%)` ⬆️ |
| core/message_bus/src/installer/tcp.rs | `93.95% <ø> (ø)` |
| core/message_bus/src/lib.rs | `91.73% <100.00%> (+2.54%)` ⬆️ |
| core/message_bus/src/replica/auth.rs | `94.56% <100.00%> (+1.36%)` ⬆️ |
| core/message_bus/src/replica/listener.rs | `78.26% <100.00%> (-2.31%)` ⬇️ |
| core/message_bus/src/transports/tcp_tls.rs | `95.46% <100.00%> (+0.56%)` ⬆️ |
| core/shard/src/lib.rs | `78.39% <ø> (ø)` |

... and 10 more

... and 226 files with indirect coverage changes

