refactor: split network maintenance loop into smaller functions #430
xdustinface merged 1 commit into v0.42-dev from
Conversation
📝 Walkthrough: The peer maintenance loop in the network manager is refactored by extracting the per-tick maintenance logic and the DNS fallback discovery into separate private methods (`maintenance_tick` and `dns_fallback_tick`).
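To make the resulting shape concrete, here is a minimal, self-contained sketch of a loop that delegates to two extracted methods. Only the names `start_maintenance_loop`, `maintenance_tick`, `dns_fallback_tick`, `exclusive_mode`, `shutdown_token`, and the two interval constants come from this PR's discussion; the timings, stub bodies, and struct shape are illustrative assumptions, not the actual dash-spv code.

```rust
use std::time::Duration;
use tokio_util::sync::CancellationToken;

// Constant names mirror the ones discussed in the review; values are illustrative.
const MAINTENANCE_INTERVAL: Duration = Duration::from_secs(10);
const DNS_DISCOVERY_DELAY: Duration = Duration::from_secs(10);

struct PeerNetworkManager {
    shutdown_token: CancellationToken,
    exclusive_mode: bool,
}

impl PeerNetworkManager {
    // Extracted per-tick maintenance (stub body).
    async fn maintenance_tick(&self) {
        println!("maintenance pass");
    }

    // Extracted DNS fallback discovery (stub body).
    async fn dns_fallback_tick(&self) {
        println!("dns fallback discovery");
    }

    // The loop now only owns timers and shutdown handling and delegates the
    // real work to the two methods above.
    async fn start_maintenance_loop(&self) {
        let mut maintenance = tokio::time::interval(MAINTENANCE_INTERVAL);
        let mut dns = tokio::time::interval(DNS_DISCOVERY_DELAY);
        while !self.shutdown_token.is_cancelled() {
            tokio::select! {
                _ = maintenance.tick() => self.maintenance_tick().await,
                _ = dns.tick(), if !self.exclusive_mode => self.dns_fallback_tick().await,
                _ = self.shutdown_token.cancelled() => break,
            }
        }
    }
}

#[tokio::main]
async fn main() {
    let manager = PeerNetworkManager {
        shutdown_token: CancellationToken::new(),
        exclusive_mode: false,
    };
    // Cancel shortly after start so the sketch terminates.
    let token = manager.shutdown_token.clone();
    tokio::spawn(async move {
        tokio::time::sleep(Duration::from_millis(50)).await;
        token.cancel();
    });
    manager.start_maintenance_loop().await;
}
```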
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks: ✅ 3 passed | ❌ 1 failed
❌ Failed checks (1 warning)
✅ Passed checks (3 passed)
Force-pushed from 9e299da to f20068c
This PR has merge conflicts with the base branch. Please rebase or merge the base branch into your branch to resolve them.
Force-pushed from 17eaa24 to 1d9fe8e
Caution: Some comments are outside the diff and can't be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
dash-spv/src/network/manager.rs (1)
861-885: ⚠️ Potential issue | 🔴 Critical

Replace `time::sleep` with `time::interval` to prevent maintenance task starvation.

`time::sleep(MAINTENANCE_INTERVAL)` creates a fresh future on each `tokio::select!` loop iteration. Since `time::interval().tick()` fires immediately on the first poll, the DNS branch completes first and the maintenance sleep is dropped before it is awaited. With `MAINTENANCE_INTERVAL` and `DNS_DISCOVERY_DELAY` both at 10 seconds, `maintenance_tick` will never execute in non-exclusive mode.

Use `time::interval` for both timers to maintain independent periodic cadences.

Proposed fix:

```diff
 async fn start_maintenance_loop(&self) {
     let this = self.clone();
     let mut tasks = self.tasks.lock().await;
     tasks.spawn(async move {
+        let mut maintenance_interval = time::interval(MAINTENANCE_INTERVAL);
         let mut dns_interval = time::interval(DNS_DISCOVERY_DELAY);
         while !this.shutdown_token.is_cancelled() {
             tokio::select! {
-                _ = time::sleep(MAINTENANCE_INTERVAL) => {
+                _ = maintenance_interval.tick() => {
                     log::debug!("Maintenance interval elapsed");
                     this.maintenance_tick().await;
                 }
                 _ = dns_interval.tick(), if !this.exclusive_mode => {
                     this.dns_fallback_tick().await;
                 }
                 _ = this.shutdown_token.cancelled() => {
                     log::info!("Maintenance loop shutting down");
                     break;
                 }
             }
         }
     });
 }
```

Note: Both `time::interval` instances tick immediately on first poll. Use `interval.tick().await` before the loop or `time::interval_at()` to skip the initial tick if needed.
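As a side note on that last point, here is a minimal, self-contained sketch (not project code) of the two ways to skip the immediate first tick. The constant name mirrors the one in the diff above and its value is illustrative; the example assumes the `tokio` crate with its time utilities enabled.

```rust
use std::time::Duration;
use tokio::time::{self, Instant};

// Mirrors the constant from the diff above; the value is illustrative.
const MAINTENANCE_INTERVAL: Duration = Duration::from_secs(10);

#[tokio::main]
async fn main() {
    // Option 1: interval_at() schedules the first tick one full period from now.
    let start = Instant::now() + MAINTENANCE_INTERVAL;
    let mut maintenance = time::interval_at(start, MAINTENANCE_INTERVAL);

    // Option 2 (commented out): consume the immediate first tick of a plain
    // interval once, before entering the main loop.
    // let mut maintenance = time::interval(MAINTENANCE_INTERVAL);
    // maintenance.tick().await;

    for _ in 0..2 {
        maintenance.tick().await;
        println!("maintenance tick");
    }
}
```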
1d9fe8e to
0a40638
Compare
The upstream refactor (#430) extracted maintenance loop logic into maintenance_tick() and dns_fallback_tick() methods but dropped the shutdown_token checks that this branch had added inside the loops. Restore them so shutdown is not delayed by in-progress peer connections or pings. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
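For orientation, a hedged sketch of what such restored checks could look like. Only `shutdown_token.is_cancelled()`, `maintenance_tick`, `dns_fallback_tick`, and `connect_to_peer` come from the surrounding discussion; the struct shape, the use of `tokio_util::sync::CancellationToken`, the `ping_connected_peers` helper, and the peer list argument are invented for illustration.

```rust
use tokio_util::sync::CancellationToken;

// Illustrative types; not the dash-spv definitions.
#[derive(Clone)]
struct PeerAddr(String);

struct Manager {
    shutdown_token: CancellationToken,
}

impl Manager {
    async fn connect_to_peer(&self, peer: PeerAddr) {
        println!("connecting to {}", peer.0);
    }

    async fn ping_connected_peers(&self) {
        println!("pinging peers");
    }

    // DNS discovery loop: bail out before starting another connection once
    // shutdown has begun, instead of working through the whole peer list first.
    async fn dns_fallback_tick(&self, discovered: Vec<PeerAddr>) {
        for peer in discovered {
            if self.shutdown_token.is_cancelled() {
                return;
            }
            self.connect_to_peer(peer).await;
        }
    }

    // Maintenance pass: early exit before the ping/save section so the loop
    // winds down promptly during shutdown.
    async fn maintenance_tick(&self) {
        if self.shutdown_token.is_cancelled() {
            return;
        }
        self.ping_connected_peers().await;
    }
}

#[tokio::main]
async fn main() {
    let m = Manager { shutdown_token: CancellationToken::new() };
    m.maintenance_tick().await;
    m.dns_fallback_tick(vec![PeerAddr("198.51.100.1:9999".into())]).await;
}
```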
* fix(spv): exit manager task loop on network errors and signal shutdown before disconnect
  The tick branch in SyncManager::run() now breaks on SyncError::Network instead of logging indefinitely, preventing ~26s of channel-closed spam during shutdown. SyncCoordinator gains a signal_shutdown() method that cancels the shared token without awaiting joins, and DashSpvClient::stop() calls it before network.disconnect() so managers observe cancellation promptly.
  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
  Signed-off-by: Lukasz Klimek <842586+lklimek@users.noreply.github.com>
* fix(dash-spv): shutdown token not checked when waiting for peer connection
* fix(spv): address review findings for network error handling
  - Emit SyncEvent::ManagerError before breaking on network tick errors so the coordinator has visibility into why a manager stopped
  - Clear pool connecting state when shutdown cancels a peer connection
  - Import SyncError and use short form instead of fully-qualified path
  - Document SyncError::Network variant's tick() exit semantics
  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* refactor(spv): replace signal_shutdown with full shutdown in stop()
  Await sync_coordinator.shutdown() in stop() to drain manager tasks before tearing down network and storage. Remove the now-unused signal_shutdown() method.
  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix(spv): resolve deadlock in PeerNetworkManager shutdown (see the sketch after this list)
  PeerNetworkManager::shutdown() held self.tasks lock while draining the JoinSet. The maintenance loop task, being drained, called connect_to_peer() which also needed self.tasks lock — classic deadlock. Fix with two changes:
  - shutdown(): use std::mem::take to extract JoinSet from mutex before draining, so the lock is released immediately
  - connect_to_peer(): wrap lock acquisition with tokio::select! against shutdown token so it returns immediately during shutdown
  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* doc: document logging
* feat(spv): surface FatalNetwork errors to API consumers
  Add SyncEvent::ManagerExited variant and set SyncState::Error on fatal network exits so consumers can detect when a manager stops permanently. Extract handle_fatal_exit helper to deduplicate the 4 FatalNetwork paths.
  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix(spv): self-recover on FatalNetwork instead of exiting manager loop
  Managers now reset to WaitingForConnections and continue their loop on FatalNetwork errors, letting PeersUpdated events re-trigger sync. This eliminates the need for consumers to detect ManagerExited and restart the entire client. Managers only exit when the shutdown token fires.
  - Rename handle_fatal_exit → handle_fatal_network_error
  - Change all 4 FatalNetwork paths from break to continue
  - Remove SyncEvent::ManagerExited variant (no longer needed)
  - Update FatalNetwork doc to reflect reset behavior
  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix(spv): remove take() in PeerNetworkManager::shutdown to close race
  The JoinSet was moved out of the mutex via std::mem::take before draining. This created a race where in-flight tasks that had not yet observed the cancellation token could spawn new tasks into the original (now empty) JoinSet, leaving them orphaned. Since connect_to_peer() already uses select! with the cancellation token when acquiring the tasks lock, no deadlock can occur. Drain the JoinSet while holding the lock instead.
  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix(spv): add missing shutdown checks in maintenance loop
  Add shutdown_token check before connect_to_peer in the DNS discovery peer loop, and add an early-exit check before the ping/save section so the maintenance loop exits promptly during shutdown.
  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix(spv): address PR #440 audit findings
  Remove FatalNetwork variant (never produced in production) and match on Network instead. Add 2-second cooldown after network-error recovery to prevent log/event flooding from the 100ms tick loop. Replace the error-prone handle_fatal_network_error (which required callers to call set_state first) with a recover_from_network_error default trait method that enforces correct ordering. Migrate all log:: calls to tracing:: in manager.rs to match project convention.
  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fmt
* refactor(spv): extract network error cooldown into named constant
  Replace inline `Duration::from_secs(2)` with `NETWORK_ERROR_COOLDOWN` constant for clarity.
  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fmt
* fix(spv): restore shutdown checks dropped during upstream refactor
  The upstream refactor (#430) extracted maintenance loop logic into maintenance_tick() and dns_fallback_tick() methods but dropped the shutdown_token checks that this branch had added inside the loops. Restore them so shutdown is not delayed by in-progress peer connections or pings.
  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fmt
* revert: CLAUDE.md logging info
* revert logging changes
* Update dash-spv/src/network/manager.rs
* chore: fmt
* fix: don't set SyncState::WaitingForConnections on network error
* refactor: remove cooldown
* refactor: strip non-shutdown changes from PR 440
  Revert network error recovery, doc comment changes, and sync_manager additions so this branch contains only shutdown/deadlock fixes. Non-shutdown improvements are moved to fix/spv-network-error-recovery.
  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------
Signed-off-by: Lukasz Klimek <842586+lklimek@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
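The "resolve deadlock in PeerNetworkManager shutdown" entry above, combined with the later take() revert, describes a lock-versus-cancellation pattern. Here is a minimal, self-contained sketch of that pattern, assuming tokio and tokio-util; the struct shape, field names, and task bodies are illustrative rather than the dash-spv code.

```rust
use std::sync::Arc;
use tokio::sync::Mutex;
use tokio::task::JoinSet;
use tokio_util::sync::CancellationToken;

struct PeerNetworkManager {
    tasks: Arc<Mutex<JoinSet<()>>>,
    shutdown_token: CancellationToken,
}

impl PeerNetworkManager {
    // Needs the tasks lock to spawn follow-up work, but must not block forever
    // if shutdown() is holding the lock: the cancelled() branch lets it bail out.
    async fn connect_to_peer(&self) {
        let mut tasks = tokio::select! {
            guard = self.tasks.lock() => guard,
            _ = self.shutdown_token.cancelled() => return,
        };
        tasks.spawn(async { /* per-peer work would go here */ });
    }

    // Cancel first, then drain while holding the lock. Tasks that race in via
    // connect_to_peer() either land in this JoinSet (and get drained) or give
    // up through the cancelled() branch above, so no deadlock and no orphans.
    async fn shutdown(&self) {
        self.shutdown_token.cancel();
        let mut tasks = self.tasks.lock().await;
        while tasks.join_next().await.is_some() {}
    }
}

#[tokio::main]
async fn main() {
    let manager = PeerNetworkManager {
        tasks: Arc::new(Mutex::new(JoinSet::new())),
        shutdown_token: CancellationToken::new(),
    };
    manager.connect_to_peer().await;
    manager.shutdown().await;
    println!("shut down cleanly");
}
```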
Make it more readable and maintainable.
Based on:
start_maintenance_loop task #426