fix(presence): clear on disconnect, fix heartbeat/TTL, drop broken REST path #877
Merged
… path

Three distinct bugs caused users to appear offline while still active:

1. No `clear_presence` on WS disconnect. Crashed clients stayed "online" in Redis until the 90s TTL expired, then jumped straight to offline. Fix: clear presence in `handle_connection` cleanup when the connection was authenticated.
2. Heartbeat/TTL mismatch. The Redis TTL comment says "3x the 30s heartbeat" but the desktop heartbeat was 60s, leaving only 30s of slack. One missed heartbeat = user goes offline. Fix: halve the heartbeat from 60s to 30s to match the TTL's design assumption.
3. Silent broken REST fallback. `useSetPresenceMutation` caught WS failures and fell back to the Tauri `set_presence` command, which used `POST /events`. The relay rejects kind:20001 over HTTP, so the fallback always silently failed. Fix: remove the fallback; the 30s heartbeat loop provides natural retry behavior.

Also removes `reconcile_mcp_commands_in_file`, the migration that cleared stale `"sprout-mcp-server"` values from managed-agents.json. It has been running on every launch since before #850 tightened it; all active users are already clean and the binary no longer ships.

Bonus: fix WS subscription handler to extract the real user pubkey from the `p` tag on relay-synthesized presence events (previously used `event.pubkey`, which is the relay's key for synthesized events).
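The slack arithmetic behind bug 2 can be checked with a few lines. This is only a sketch: the 90s TTL and the 60s/30s intervals come from the commit message, while `slackMs` and `toleratedMisses` are hypothetical helper names that do not exist in the codebase.

```typescript
// Values from the commit message: 90s Redis TTL, heartbeat 60s before / 30s after.
const PRESENCE_TTL_MS = 90_000;

// Slack: time left on the key when the next scheduled heartbeat is due.
function slackMs(ttlMs: number, heartbeatMs: number): number {
  return ttlMs - heartbeatMs;
}

// Largest number of consecutive missed heartbeats after which the next
// successful beat still lands strictly before the key expires.
// Assumes heartbeatMs < ttlMs.
function toleratedMisses(ttlMs: number, heartbeatMs: number): number {
  // After k misses the next beat arrives at (k + 1) * heartbeatMs.
  // Require (k + 1) * heartbeatMs < ttlMs and take the largest integer k.
  return Math.max(0, Math.ceil(ttlMs / heartbeatMs) - 2);
}

console.log(slackMs(PRESENCE_TTL_MS, 60_000), toleratedMisses(PRESENCE_TTL_MS, 60_000)); // 30000 0
console.log(slackMs(PRESENCE_TTL_MS, 30_000), toleratedMisses(PRESENCE_TTL_MS, 30_000)); // 60000 1
```

With a 60s interval a single missed beat lets the key expire before the next refresh; at 30s one miss is absorbed.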
Disconnect cleanup now checks whether the user has other active connections before clearing Redis presence. Without this guard, CLI `set-presence online` (short-lived WS) would immediately delete its own presence on disconnect, and multi-device users would lose presence when closing one device.

Also fixes the E2E mock bridge: kind:20001 events from `relayClient.sendPresence()` now get a dedicated handler that updates the mock presence map and fans out to global subscribers, matching real relay behavior.

Removes the dead `set_presence` Tauri command mock (`handleSetPresence`, `RawSetPresenceResponse`, dispatcher case).
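The guard described above can be modeled in a few lines. This is an illustrative in-memory sketch, not the relay's Rust code: `PresenceModel` and its methods are hypothetical names, and the real relay keeps presence in Redis rather than in a `Map`.

```typescript
// Hypothetical in-memory model of the disconnect guard.
class PresenceModel {
  private connections = new Map<string, Set<string>>(); // pubkey -> connection ids
  private presence = new Map<string, string>(); // pubkey -> status

  connect(pubkey: string, connId: string): void {
    const ids = this.connections.get(pubkey) ?? new Set<string>();
    ids.add(connId);
    this.connections.set(pubkey, ids);
  }

  setPresence(pubkey: string, status: string): void {
    this.presence.set(pubkey, status);
  }

  // Clear presence only when the connection was authenticated (pubkey known)
  // and no other connection remains for the same pubkey.
  disconnect(pubkey: string | null, connId: string): void {
    if (pubkey === null) return; // never authenticated: nothing to clear
    const ids = this.connections.get(pubkey);
    if (ids === undefined) return;
    ids.delete(connId);
    if (ids.size > 0) return; // another device or session is still connected
    this.connections.delete(pubkey);
    this.presence.delete(pubkey);
  }

  status(pubkey: string): string | undefined {
    return this.presence.get(pubkey);
  }
}

const model = new PresenceModel();
model.connect("alice", "desktop");
model.connect("alice", "cli");
model.setPresence("alice", "online");
model.disconnect("alice", "cli"); // short-lived CLI socket closes
console.log(model.status("alice")); // online
```

Closing the short-lived connection leaves the status in place because the desktop connection is still registered; only the last disconnect for a pubkey clears it.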
wesbillman approved these changes Jun 5, 2026
michaelneale added a commit that referenced this pull request Jun 6, 2026
* origin/main: chore(release): release version 0.3.12 (#886) Show hover cards for inline message emoji (#885) Fix monotonic read-state merges (#884) Refine sidebar behavior and borders (#869) fix(presence): clear on disconnect, fix heartbeat/TTL, drop broken REST path (#877) fix(cli): publish ephemeral events over WebSocket via sprout-ws-client (#876) docs(sprout-acp): add communication discipline rules to base prompt + deprecate --mention flag (#883) Polish thread summaries and reactions (#881) feat(cli): add emoji export and import subcommands (#882) Polish message row hover states (#880) Improve emoji naming and custom emoji UX (#878) docs: add ecosystem section to CONTRIBUTING.md, fix stale release info (#873) fix(relay): wire custom filter fields through HTTP bridge (#864) chore: deprecate sprout-mcp — fill CLI gaps, remove crate and all references (#850) Fix custom emoji status in profile popover (#874) fix(agent): gate handoff on provider token usage, not byte estimate (#821) docs: add VISION_MESH.md — the compute-commons vision (#867) fix(desktop): simplify profile popover header (#853) fix(desktop): remove thread comment hover outline (#861) feat(desktop): always show channel section search/add buttons (#856) # Conflicts: # crates/sprout-cli/src/client.rs # desktop/src/app/AppShell.tsx # justfile
tellaho pushed a commit that referenced this pull request Jun 8, 2026
…ST path (#877) Signed-off-by: Taylor Ho <taylorkmho@gmail.com>
wpfleger96 pushed a commit that referenced this pull request Jun 8, 2026
Re-enable the generic `reconcile_provider_mcp_commands` migration that was accidentally removed from the startup sequence in PR #877. This reconciles `mcp_command` values in managed-agents.json against the discovery table on every launch, fixing stale "sprout-mcp-server" references and any future drift without a dedicated one-off function. Extended to cover both `app_data_dir()` and `canonical_dev_data_dir` so worktree instances are also healed.

Additionally, harden the spawn site in runtime.rs: if `mcp_command` references a binary that cannot be resolved, log a warning and continue spawning without MCP rather than hard-failing. This prevents this entire class of breakage permanently, regardless of whether reconciliation ran.

Fixes the user-reported issue where agents created before v0.3.12 fail to spawn because "sprout-mcp-server" no longer exists.
wesbillman added a commit that referenced this pull request Jun 8, 2026
Re-enable the generic `reconcile_provider_mcp_commands` migration that was accidentally removed from the startup sequence in PR #877. This reconciles `mcp_command` values in managed-agents.json against the discovery table on every launch, fixing stale "sprout-mcp-server" references and any future drift without a dedicated one-off function. Extended to cover both `app_data_dir()` and `canonical_dev_data_dir` so worktree instances are also healed.

Additionally, harden the spawn site in runtime.rs: if `mcp_command` references a binary that cannot be resolved, log a warning and continue spawning without MCP rather than hard-failing. This prevents this entire class of breakage permanently, regardless of whether reconciliation ran.

Fixes the user-reported issue where agents created before v0.3.12 fail to spawn because "sprout-mcp-server" no longer exists.

Co-authored-by: Brain <21994759fc7a6fa6b965551d35cfd7897d262f2495467f2d78694ddcfa6a5c7e@sprout-oss.stage.blox.sqprod.co>
Co-authored-by: Duncan <dcfd242e557282d7a1e2cf2e6877522682f1e5c6156dc92ca7d90eaedd3b0f95@sprout-oss.stage.blox.sqprod.co>
Signed-off-by: Wes <wesbillman@users.noreply.github.com>
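The spawn-site hardening described in this commit follows a "warn and degrade" pattern, sketched below. Everything here is an assumption for illustration: the real change lives in Rust in runtime.rs, and `AgentConfig`, `SpawnPlan`, `planSpawn`, the `resolve` callback, and the `example-mcp` binary are hypothetical names.

```typescript
interface AgentConfig {
  name: string;
  mcpCommand: string | null;
}

interface SpawnPlan {
  agent: string;
  mcpCommand: string | null; // null means "spawn without MCP"
  warnings: string[];
}

// `resolve` stands in for whatever binary lookup the runtime performs;
// it returns a path, or undefined when the binary cannot be found.
function planSpawn(
  agent: AgentConfig,
  resolve: (cmd: string) => string | undefined,
): SpawnPlan {
  const warnings: string[] = [];
  let mcpCommand: string | null = null;
  if (agent.mcpCommand !== null) {
    const resolved = resolve(agent.mcpCommand);
    if (resolved === undefined) {
      // Degrade instead of failing the whole spawn.
      warnings.push(`mcp_command "${agent.mcpCommand}" could not be resolved; spawning without MCP`);
    } else {
      mcpCommand = resolved;
    }
  }
  return { agent: agent.name, mcpCommand, warnings };
}

// Hypothetical lookup table: only "example-mcp" is installed.
const installed = new Map<string, string>([["example-mcp", "/usr/local/bin/example-mcp"]]);
const plan = planSpawn(
  { name: "old-agent", mcpCommand: "sprout-mcp-server" },
  (cmd) => installed.get(cmd),
);
console.log(plan.mcpCommand, plan.warnings.length); // null 1
```

An agent whose config still names the removed binary gets a plan with no MCP command and one warning, rather than an error.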
Three bugs caused users to appear offline or away while they were clearly active:

Bug 1: No presence clear on disconnect. When a WebSocket connection drops (crash, network loss, app quit without sending "offline"), the relay never removed the user's Redis presence key. The user stayed "online" until the 90-second TTL expired, then vanished to "offline" with no "away" transition. Fix: `handle_connection()` cleanup now calls `clear_presence()` when the connection was authenticated and no other connections remain for that pubkey, preventing CLI one-shot publishes and multi-device sessions from clearing each other's presence.

Bug 2: Heartbeat/TTL mismatch. The Redis TTL comment says "3x the 30s heartbeat interval", but the desktop heartbeat was 60 seconds, leaving only 30 seconds of slack. One missed heartbeat (brief WS reconnect, backgrounded tab, network hiccup) and the key expires before the next beat fires. Fix: halve `PRESENCE_HEARTBEAT_INTERVAL_MS` from `60_000` to `30_000` to match the TTL's design assumption.

Bug 3: Silent broken REST fallback. `useSetPresenceMutation` caught WS failures and fell back to the Tauri `set_presence` command, which submitted kind:20001 over `POST /events`. The relay rejects that. The error was swallowed by `.catch(() => {})`, so WS failures silently lost presence updates. Fix: remove the fallback entirely; the 30s heartbeat loop provides natural retry.

Also removes the `reconcile_mcp_commands_in_file` startup migration that cleared stale `"sprout-mcp-server"` values from `managed-agents.json`. It has been running on every launch since before #850; all active users are clean and the binary no longer ships.

Bonus: `handlePresenceEvent` in the WS subscription now extracts the real user pubkey from the `p` tag on relay-synthesized presence events (previously read `event.pubkey`, which is the relay's key for synthesized events).

- `crates/sprout-relay/src/connection.rs`: clear presence on disconnect, guarded by remaining-connections check via `connection_ids_for_pubkey`
- `desktop/src/features/presence/hooks.ts`: halve heartbeat, remove REST fallback, fix p-tag handling
- `desktop/src-tauri/src/commands/profile.rs` + `events.rs` + `models.rs`: remove dead `set_presence` command + `build_presence` builder + `SetPresenceResponse` type
- `desktop/src-tauri/src/migration.rs` + `lib.rs`: remove `reconcile_mcp_commands_in_file` + startup call + 9 tests
- `desktop/src/shared/api/tauri.ts` + `types.ts`: remove dead `setPresence()` function + `SetPresenceResult` type
- `desktop/src/testing/e2eBridge.ts`: add kind:20001 mock WS handler, remove dead `set_presence` Tauri mock
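The p-tag handling mentioned in the bonus fix might look roughly like the sketch below. The event shape is a minimal Nostr-style assumption, `presenceSubject` is a hypothetical helper name, and falling back to `event.pubkey` when no `p` tag is present is an assumption of this sketch rather than something the description confirms.

```typescript
// Minimal event shape; only the fields this sketch needs.
interface PresenceEvent {
  pubkey: string;
  kind: number;
  tags: string[][];
}

// Return the pubkey the presence update is about.
// Relay-synthesized events are signed by the relay, so the user's key is
// carried in the first non-empty `p` tag instead of `event.pubkey`.
function presenceSubject(event: PresenceEvent): string {
  for (const tag of event.tags) {
    const [name, value] = tag;
    if (name === "p" && value !== undefined && value.length > 0) {
      return value;
    }
  }
  // Assumed fallback: an event without a `p` tag is attributed to its signer.
  return event.pubkey;
}

const synthesized: PresenceEvent = {
  pubkey: "relay-key",
  kind: 20001,
  tags: [["p", "user-key"]],
};
console.log(presenceSubject(synthesized)); // user-key
```

Reading `event.pubkey` directly on such an event would attribute the status to the relay's key, which is the behavior the fix removes.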