Replace LiveKit with WebSocket Opus audio relay #326
Merged
Remove the external LiveKit WebRTC SFU dependency and replace it with a simple WebSocket-based Opus audio relay embedded in sprout-relay. All codec work happens in Rust; the WebView is a dumb mic capture + UI layer.

Relay (new audio/ module):
- AudioRoomManager with DashMap<Uuid, Room> for room lifecycle
- WS endpoint at /huddle/:channel_id/audio with NIP-42 keypair auth
- Binary fan-out: client sends raw Opus, relay prepends 1-byte peer_index
- Dual per-peer channels (audio_tx + ctrl_tx) so control messages (joined/left) are never starved by audio backpressure
- Dual WS send channels (data + ctrl) with priority drain, matching the existing connection.rs pattern
- Heartbeat via ping/pong (30s interval, 3 missed -> disconnect)
- Peer index recycling via IndexPool freelist (no 255-wrap bug)
- Relay emits kind:48101/48102 lifecycle events (single source of truth)

Desktop Rust backend:
- connect_audio_relay() performs a synchronous WS handshake (connect + NIP-42 auth + wait for joined) before returning; auth failures propagate to the caller, no silent degradation
- Opus encode (48kHz mono, 32kbps, DTX) from PCM via push_audio_pcm
- Per-peer Opus decode with persistent rodio Player for playback
- Client-side active speaker tracking: peer_index->pubkey map seeded from the initial joined message, maintained via control messages, emitted as huddle-active-speakers Tauri event every 500ms
- Surviving task explicitly aborted after select to prevent leaks

Desktop frontend:
- Delete livekit.ts and livekit-client npm dependency
- HuddleContext: getUserMedia directly, no LiveKit connection object
- HuddleIndicator: extract participant from p-tag (relay-signed events)
- HuddleBar: remove livekit_room, isReconnecting
- worklet.js: 4800->960 sample buffer (100ms->20ms) for Opus frame alignment
- Mic stream cleanup on failed start/join via try/catch

Deleted:
- sprout-huddle crate (token.rs, webhook.rs, session.rs, error.rs, lib.rs)
- api/huddles.rs (LiveKit token endpoint)
- api/webhooks.rs (LiveKit webhook handler)
- livekit.ts (frontend LiveKit connection)
- livekit-client npm dependency

34 files changed, ~1450 insertions, ~1735 deletions
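The "Peer index recycling via IndexPool freelist" item can be illustrated with a small sketch. It assumes peer indices are the 1-byte `peer_index` values used in the fan-out framing and that released indices go onto a LIFO freelist; the type and method names echo the commit message, but the code is hypothetical and not the relay's actual implementation. A counter that only ever increments runs out after enough joins even when most of those peers have already left; a freelist fails only when every index is held at the same time.

```rust
/// Hypothetical freelist-based allocator for 1-byte peer indices.
/// Illustrative only; not the actual sprout-relay `IndexPool`.
#[derive(Debug, Default)]
pub struct IndexPool {
    /// Indices released by departed peers, reused before fresh ones.
    free: Vec<u8>,
    /// Next never-used index; u16 so that 256 can mean "no fresh indices left".
    next: u16,
}

impl IndexPool {
    pub fn new() -> Self {
        Self::default()
    }

    /// Hands out a recycled index if one exists, otherwise a fresh one.
    /// Returns `None` only when all 256 indices are held concurrently.
    pub fn alloc(&mut self) -> Option<u8> {
        if let Some(idx) = self.free.pop() {
            return Some(idx);
        }
        if self.next > u16::from(u8::MAX) {
            return None;
        }
        let idx = self.next as u8; // safe: checked <= 255 above
        self.next += 1;
        Some(idx)
    }

    /// Returns an index to the pool when its peer leaves.
    pub fn release(&mut self, idx: u8) {
        self.free.push(idx);
    }
}
```

For example, 300 join/leave cycles keep handing back index 0, and a pool with all 256 indices held returns `None` until one is released.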
Fixes all issues found across 3 independent reviews (codex o3, gpt-5.4, claude-4-opus) of the LiveKit→WebSocket Opus relay replacement.

Critical:
- C1: start_huddle/join_huddle now roll back state on post_connect_setup failure (archive ephemeral channel for creators, reset to Idle for joiners)
- C2: Audio relay pipeline emits 'huddle-audio-disconnected' Tauri event on unexpected WS drop; frontend listens and auto-leaves. Event is suppressed on intentional teardown via CancellationToken guard. teardown_huddle cancels the token BEFORE dropping pcm_tx to prevent a race where the send task exits via sender-drop before is_cancelled() is true.

Important:
- I3: Control channel capacity 4→32 slots with warn! on drop
- I4: Client WS handshake now has 5s timeout for challenge and joined
- I5: Strengthened TODO(security) comment on ensure_membership
- I6: Opus encoder errors logged instead of silently swallowed

Minor:
- M7: 8KB text frame size limit for auth/control messages
- M8: Removed dead 'stream' parameter from cleanupFailedStart
- M9: Bytes::copy_from_slice→Bytes::from (zero-copy)
- M10: Tag::parse uses match/warn!/return (no .expect() in prod)
- M11: MAX_PEERS_PER_ROOM (25) defense-in-depth cap
- M14: Heartbeat missed-pong counter readability (fetch_add+1)
- M15: 48kHz documented in worklet.js, enforced in getUserMedia

Reviewed across 4 codex CLI passes (7→4→6→9/10).
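Item C2's ordering argument (cancel the token, then drop `pcm_tx`) is easiest to see in a reduced model. The sketch below uses only `std` and is an assumption-laden stand-in for the async pipeline: an `Arc<AtomicBool>` plays the role of the `CancellationToken`, an `mpsc` channel plays the PCM sender, and the thread's `bool` result plays "emit `huddle-audio-disconnected`". `start_pipeline` and `teardown` are hypothetical names.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::mpsc::{self, Sender};
use std::sync::Arc;
use std::thread::{self, JoinHandle};

/// Stand-in for a CancellationToken (assumption: the real pipeline is async).
type Cancel = Arc<AtomicBool>;

/// Hypothetical: spawn the send task and return the PCM sender feeding it.
/// The task's return value means "would emit huddle-audio-disconnected".
fn start_pipeline(cancel: Cancel) -> (Sender<Vec<i16>>, JoinHandle<bool>) {
    let (pcm_tx, pcm_rx) = mpsc::channel::<Vec<i16>>();
    let handle = thread::spawn(move || {
        // recv() returns Err once every Sender has been dropped.
        while let Ok(_frame) = pcm_rx.recv() {
            // Opus encode + WS send would happen here.
        }
        // Channel closed: treat it as unexpected unless teardown cancelled us.
        !cancel.load(Ordering::SeqCst)
    });
    (pcm_tx, handle)
}

/// Intentional teardown: cancel FIRST, then drop the sender, so the task
/// cannot observe the closed channel while the flag is still false.
fn teardown(cancel: &Cancel, pcm_tx: Sender<Vec<i16>>) {
    cancel.store(true, Ordering::SeqCst);
    drop(pcm_tx);
}
```

If `teardown` dropped the sender first, the task could see the closed channel and read the flag before it was set, producing exactly the spurious disconnect event C2 describes.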
When the last audio WS peer disconnects, the relay now automatically archives the ephemeral channel and emits kind:48103 (huddle ended). Since bots never connect to the audio WS, every peer is a human; an empty room means zero humans remain.

Concurrency design (10 codex review passes, 6→4→5→3→3→4→3→4→7→9/10):

Room-level AdmissionGuard mutex synchronizes peer admission with the end-of-huddle transition. add_peer holds the lock across ended-check + index-alloc + peer-insert. remove_peer_and_check_ended holds the same lock across index-release + is_empty + ended=true. These are mutually exclusive: no concurrent add_peer can succeed after the room is marked ended, and no concurrent remover can double-trigger auto-end (the !g.ended guard ensures only the first empty+!ended transition wins).

Three layers of defense against stale joins:
1. ensure_membership checks archived_at before is_member
2. Post-get_or_create DB check (fail-closed) catches cross-boundary race
3. AdmissionGuard mutex catches same-room race

Archive failure rolls back the ended flag via clear_ended; the huddle stays alive and the 1-hour TTL reaper handles cleanup later.
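A minimal sketch of the AdmissionGuard idea, assuming a room's admission state (ended flag plus peer set) lives behind one `std::sync::Mutex`. Index allocation and release, which the relay also performs inside the same lock, are omitted for brevity. `Room`, `AdmissionState`, `AdmitError` and the `u64` connection ids are illustrative; `add_peer`, `remove_peer_and_check_ended`, `clear_ended` and `MAX_PEERS_PER_ROOM` take their names from the description above, but their bodies here are hypothetical.

```rust
use std::collections::HashSet;
use std::sync::{Mutex, MutexGuard};

/// Defense-in-depth cap (value taken from the PR description).
const MAX_PEERS_PER_ROOM: usize = 25;

#[derive(Debug, PartialEq)]
pub enum AdmitError {
    /// The room already ended; the join is stale.
    Ended,
    /// MAX_PEERS_PER_ROOM reached.
    Full,
}

/// Everything admission and auto-end must agree on, behind ONE mutex.
#[derive(Default)]
struct AdmissionState {
    ended: bool,
    peers: HashSet<u64>, // hypothetical connection ids
}

#[derive(Default)]
pub struct Room {
    guard: Mutex<AdmissionState>,
}

impl Room {
    fn lock(&self) -> MutexGuard<AdmissionState> {
        // Recover the state from a poisoned lock instead of panicking.
        self.guard.lock().unwrap_or_else(|e| e.into_inner())
    }

    /// Ended-check, cap check and insert happen under a single acquisition,
    /// so no join can slip in after the room has been marked ended.
    pub fn add_peer(&self, conn_id: u64) -> Result<(), AdmitError> {
        let mut g = self.lock();
        if g.ended {
            return Err(AdmitError::Ended);
        }
        if g.peers.len() >= MAX_PEERS_PER_ROOM {
            return Err(AdmitError::Full);
        }
        g.peers.insert(conn_id);
        Ok(())
    }

    /// Removes a peer and, in the same critical section, decides whether the
    /// huddle just ended. Returns true for exactly one caller per ending:
    /// the one whose removal empties a room that was not already ended.
    pub fn remove_peer_and_check_ended(&self, conn_id: u64) -> bool {
        let mut g = self.lock();
        g.peers.remove(&conn_id);
        if g.peers.is_empty() && !g.ended {
            g.ended = true;
            true
        } else {
            false
        }
    }

    /// Rollback when archiving fails: the room accepts joins again.
    pub fn clear_ended(&self) {
        let mut g = self.lock();
        g.ended = false;
    }
}
```

Because both methods take the same lock, a join racing with the last leave either lands first (so the room is not empty when the leaver checks) or sees `ended` and is rejected; there is no interleaving in which it is admitted to a room already marked ended.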
tlongwell-block added a commit that referenced this pull request on May 24, 2026
The docs had drifted from the code in both directions, overstating removed features (LiveKit) and understating shipped ones (git hosting, mobile, huddles). This corrects the docs to match reality and removes hardcoded counts that go stale.

- Huddles: rewrite around the in-relay WebSocket Opus path (LiveKit was removed in #326). Drop the phantom `sprout-huddle` crate from every crate map (README, AGENTS, CONTRIBUTING, ARCHITECTURE) and remove the dead LIVEKIT_* vars from .env.example.
- Status tables: git hosting (smart HTTP + NIP-34) and the Flutter mobile client are real and shipping/in-progress; promote them from "designed/planned." Huddles → built (recording/tracks still planned).
- Remove hardcoded counts ("44 tools", "44-command CLI", "17 crates", "~72K LOC", per-crate LOC markers, the LOC Summary appendix). They drift; point at the source of truth instead.
- Fix repository URL sprout-rs/sprout → block/sprout (incl. relay NIP-11 software field).
- AGENTS crate map: complete it (was missing 8 crates), group it, add web/ and mobile/.
- README: reframe sprout-mcp as "being phased out in favor of the CLI" (matches the deprecation direction) rather than "legacy/optional."
- CONTRIBUTING: `cargo test -p sprout-test-client` → add `--ignored` (the E2E tests are #[ignore]d; without it the command runs nothing).
- kind.rs: drop the dead RESEARCH/ doc pointer; this module is the source.
- ARCHITECTURE: correct stale "Known Limitations" (huddle is wired; the send_dm/set_channel_topic actions fail at runtime, not silently).

Docs-only plus comment/string fixes; no logic changes. Verified `cargo check` passes on the four crates with edited source.

Signed-off-by: tlongwell-block <109685178+tlongwell-block@users.noreply.github.com>
Summary
Replace the external LiveKit WebRTC SFU with a WebSocket-based Opus audio relay embedded in sprout-relay. All codec work happens in Rust; the WebView becomes a dumb mic capture + UI layer. No new services, no new ports, no new network dependencies.

34 files changed, +1464 / −1742 (net −278 lines)
Architecture
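The binary fan-out described in the first commit message (the client sends a raw Opus packet; the relay prepends a 1-byte `peer_index`) can be sketched as a pair of framing helpers. The function names are hypothetical, and the sketch assumes the index byte is the only header on forwarded frames.

```rust
/// Relay side (hypothetical helper): prefix the sender's 1-byte peer_index
/// to the raw Opus packet before fanning it out to the other peers.
pub fn frame_for_fanout(peer_index: u8, opus: &[u8]) -> Vec<u8> {
    let mut frame = Vec::with_capacity(1 + opus.len());
    frame.push(peer_index);
    frame.extend_from_slice(opus);
    frame
}

/// Client side (hypothetical helper): split a received binary frame into the
/// sender's peer_index and the Opus payload for that peer's decoder.
/// An empty frame carries no index and is rejected.
pub fn parse_fanout(frame: &[u8]) -> Option<(u8, &[u8])> {
    let (&peer_index, opus) = frame.split_first()?;
    Some((peer_index, opus))
}
```

Because the prefix is a single byte, at most 256 distinct indices can be live at once, which is why the relay recycles them.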
What changed
Relay: new `audio/` module
- `room.rs`: `Room`, `AudioPeer`, `AdmissionGuard` (mutex-synchronized peer admission + ended flag + index recycling), dual-channel fan-out (`audio_tx` + `ctrl_tx`), `MAX_PEERS_PER_ROOM` (25) defense-in-depth cap, `remove_peer_and_check_ended` for atomic last-peer detection
- `handler.rs`: WS handler with NIP-42 keypair auth, dual ctrl/data send channels with priority drain, heartbeat, lifecycle event emission (48101/48102), 8 KB text frame size limit, relay-side auto-end (archive + kind:48103 when last human disconnects)
- `mod.rs`: re-exports

Relay: wiring & deletions
- `GET /huddle/{channel_id}/audio` (WebSocket upgrade)
- `AppState`: added `audio_rooms: Arc<AudioRoomManager>`, removed `huddle_service` and `livekit_url`
- Deleted the `sprout-huddle` crate, `api/huddles.rs`, `api/webhooks.rs`, and LiveKit env var handling

Desktop Rust backend
- `relay_api.rs`: `connect_audio_relay()` performs a synchronous WS handshake (connect + NIP-42 auth + wait for `joined`) with a 5s timeout before returning; auth failures propagate, with no silent degradation. A background pipeline does Opus encode/decode + rodio playback and emits the `huddle-audio-disconnected` Tauri event on an unexpected WS drop (suppressed on intentional teardown via a `CancellationToken` guard).
- `state.rs`: removed `livekit_*` fields, added `audio_ws_cancel` + `audio_relay_pcm_tx`
- `mod.rs`: simplified commands; no token fetch, no 48101/48102 emission (the relay owns these). Audio relay failure is fatal (not degraded mode). Both `start_huddle` and `join_huddle` roll back state on `post_connect_setup` failure.
- `events.rs`: removed the `livekit_room` param from `build_huddle_started`, deleted participant join/leave builders
- `pipeline.rs`: audio relay connection failure propagates to the caller

Desktop frontend
- Deleted `livekit.ts` and the `livekit-client` npm dependency
- `HuddleContext.tsx`: `getUserMedia` directly (48 kHz enforced), no LiveKit connection object, mic stream cleanup on failure, stable `disconnectMedia` callback (reads the track from a ref, not state) to prevent a React effect cascade from self-ending huddles during startup. Listens for the `huddle-audio-disconnected` event to auto-leave on an unexpected relay WS drop.
- `HuddleIndicator.tsx`: extract participant pubkey from the `p`-tag (relay-signed 48101/48102 events)
- `HuddleBar.tsx`: removed `livekit_room`, `isReconnecting`
- `worklet.js`: buffer 4800→960 samples (100ms→20ms) for Opus frame alignment, 48 kHz assumption documented

Key design decisions
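- `IndexPool` freelist prevents the 255-join exhaustion bug
- Client-side active speaker tracking: `peer_index`→pubkey map seeded from the initial `joined` message, maintained via control messages, emitted as a Tauri event every 500ms
- `connect_audio_relay()` completes WS connect + auth before returning; failures are fatal
- `disconnectMedia` uses a ref for the audio track instead of state, preventing the unmount-cleanup effect from re-firing mid-startup
- Unexpected-disconnect detection: the pipeline emits `huddle-audio-disconnected` on an unexpected WS drop, suppressed on intentional teardown via a `CancellationToken` guard (`teardown_huddle` cancels the token before dropping the PCM sender to prevent a race).
- Relay-side auto-end: when the last peer disconnects, the relay archives the ephemeral channel and emits kind:48103. Admission and the end transition are serialized by `AdmissionGuard`, a shared mutex that makes `add_peer` and `remove_peer_and_check_ended` mutually exclusive (10 codex review passes to get the synchronization right).

Testing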
Bugs found and fixed during development
- `text.contains("leave")` hack → proper JSON parse
- `event.pubkey` used for relay-signed 48101/48102 → `p`-tag extraction
- Control messages could be starved by audio backpressure → separate per-peer `audio_tx`/`ctrl_tx` channels
- `connect_audio_relay` returned before the handshake → synchronous connect+auth
- Route path parameter syntax fixed (`:channel_id` → `{channel_id}`)
- `post_connect_setup` failure swallowed as non-fatal → audio relay failure is now fatal
- `disconnectMedia` dep on `localAudioTrack` state caused unmount-cleanup to re-fire mid-startup → stable ref-based callback with `[]` deps

Post-crossfire hardening (3 reviewers × 4 fix-review passes)
- `start_huddle`/`join_huddle` committed state before `post_connect_setup` with no rollback → added rollback (archive ephemeral channel for creators, reset to Idle for joiners)
- Unexpected WS drop was not surfaced → pipeline emits the `huddle-audio-disconnected` Tauri event; frontend listens and auto-leaves
- Opus encoder errors silently swallowed (`unwrap_or(0)`) → logged via `eprintln!`
- `unwrap_or_default()` → hard error
- `Tag::parse` used `.expect()` in the production path → match/`warn!`/return
- `MAX_PEERS_PER_ROOM` defense-in-depth cap (25) added
- Heartbeat missed-pong counter rewritten with the clearer `fetch_add + 1` pattern
- `getUserMedia` didn't enforce 48 kHz → `sampleRate: 48000` added
- Dead `stream` parameter in `cleanupFailedStart` → removed
- `Bytes::copy_from_slice` in `recv_loop` → `Bytes::from` (zero-copy)
- `TODO(security)` comment on `ensure_membership` strengthened (unverified `parent_channel_id`)
- Disconnect event suppressed on intentional teardown via `is_cancelled()` guard
- `leaveHuddleRef2` referenced before `leaveHuddle` was defined (TDZ) → reuse the existing ref
- `teardown_huddle` dropped the PCM sender before cancelling the token (race) → cancel first, then drop

Auto-end feature (10 codex review passes)
- `ensure_membership` checked `is_member` before `archived_at` → reordered so archived channels are rejected first
- Post-`get_or_create` DB check was missing → added a fail-closed `archived_at` re-check to close the cross-boundary race
- `remove_peer` and `mark_ended` were separate lock acquisitions → combined into atomic `remove_peer_and_check_ended` under one `AdmissionGuard` mutex
- `add_peer` ended check was outside the lock → moved inside, held across ended-check + alloc + insert
- Concurrent removers could double-trigger auto-end → `!g.ended` guard ensures only the first empty+!ended transition wins
- Archive failure left the room marked ended → `clear_ended()` rollback; the TTL reaper handles cleanup

What is NOT built (per plan §7)
Video, server-side recording/mixing, TURN/NAT traversal, jitter buffer, per-participant volume, WS reconnect/resume, FEC — all deferred per spec.