refactor: split network maintenance loop into smaller functions #430

Merged

xdustinface merged 1 commit into v0.42-dev from refactor/split-loop on Feb 17, 2026

Conversation


@xdustinface xdustinface commented Feb 11, 2026

Make it more readable and maintainable.

Based on:

Summary by CodeRabbit

  • Refactor
    • Reorganized peer maintenance processes by consolidating network management tasks into focused operations for improved code clarity.
    • Enhanced DNS-based peer discovery fallback logic with better separation of concerns and control flow.


coderabbitai bot commented Feb 11, 2026

Warning

Rate limit exceeded

@xdustinface has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 7 minutes and 41 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

📝 Walkthrough


The peer maintenance loop in the network manager is refactored by extracting per-tick maintenance logic and DNS fallback discovery into separate private methods (maintenance_tick and dns_fallback_tick), improving code organization while preserving existing behavior and public APIs.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Peer Maintenance Refactoring: dash-spv/src/network/manager.rs | Extracted maintenance loop logic into dedicated maintenance_tick (handles cleanup, peer counting, reconnection, pings, and data persistence) and dns_fallback_tick (handles DNS-based peer discovery). The main loop now delegates to these methods on their respective intervals. Existing behavior is preserved for exclusive mode, reputation updates, and peer persistence. (See the sketch below.) |
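
For orientation, here is a minimal, self-contained sketch of the loop shape this refactor produces. The type and field names (Manager, shutdown_token, exclusive_mode) and the constants are illustrative stand-ins, not the actual dash-spv code.

```rust
use std::time::Duration;
use tokio::time;
use tokio_util::sync::CancellationToken;

const MAINTENANCE_INTERVAL: Duration = Duration::from_secs(10);
const DNS_DISCOVERY_DELAY: Duration = Duration::from_secs(10);

struct Manager {
    shutdown_token: CancellationToken,
    exclusive_mode: bool,
}

impl Manager {
    // Per-tick maintenance: cleanup, peer counting, reconnection, pings, persistence.
    async fn maintenance_tick(&self) { /* ... */ }

    // DNS-based peer discovery fallback; skipped entirely in exclusive mode.
    async fn dns_fallback_tick(&self) { /* ... */ }

    // The main loop only selects over timers and the shutdown token and
    // delegates the actual work to the two focused methods above.
    async fn run_maintenance_loop(&self) {
        while !self.shutdown_token.is_cancelled() {
            tokio::select! {
                _ = time::sleep(MAINTENANCE_INTERVAL) => self.maintenance_tick().await,
                _ = time::sleep(DNS_DISCOVERY_DELAY), if !self.exclusive_mode => self.dns_fallback_tick().await,
                _ = self.shutdown_token.cancelled() => break,
            }
        }
    }
}
```

The review comment further down proposes replacing the sleep-based timers in this loop with tokio intervals.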

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 Hoppy refactoring, so neat and so clean,
Ticks split asunder, a system serene,
Peers maintained well in methods so fine,
Logic extracted—each task gets its line!
DNS discovery hops on its own beat,
Making our code more organized—what a treat! 🌟

🚥 Pre-merge checks | ✅ 3 passed | ❌ 1 failed

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 50.00%, which is below the required threshold of 80.00%. | Write docstrings for the functions that are missing them to satisfy the coverage threshold. |

✅ Passed checks (3 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped: CodeRabbit's high-level summary is enabled. |
| Title Check | ✅ Passed | The title accurately reflects the main change: refactoring the network maintenance loop by extracting logic into smaller functions (maintenance_tick and dns_fallback_tick). |
| Merge Conflict Detection | ✅ Passed | No merge conflicts detected when merging into v0.42-dev. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


@xdustinface xdustinface force-pushed the refactor/self-in-run-loop branch from 9e299da to f20068c on February 12, 2026 03:44
@github-actions

This PR has merge conflicts with the base branch. Please rebase or merge the base branch into your branch to resolve them.

@github-actions github-actions bot added the merge-conflict (The PR conflicts with the target branch.) label on Feb 12, 2026
Base automatically changed from refactor/self-in-run-loop to v0.42-dev on February 12, 2026 19:56
@xdustinface xdustinface marked this pull request as ready for review on February 13, 2026 00:31
@github-actions github-actions bot removed the merge-conflict (The PR conflicts with the target branch.) label on Feb 13, 2026

@coderabbitai coderabbitai bot left a comment


Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
dash-spv/src/network/manager.rs (1)

861-885: ⚠️ Potential issue | 🔴 Critical

Replace time::sleep with time::interval to prevent maintenance task starvation.

time::sleep(MAINTENANCE_INTERVAL) creates a fresh future on each tokio::select! loop iteration. Since time::interval().tick() fires immediately on the first poll, the DNS branch wins the race and the maintenance sleep future is dropped before it fires. With MAINTENANCE_INTERVAL and DNS_DISCOVERY_DELAY both at 10 seconds, maintenance_tick will never execute in non-exclusive mode.

Use time::interval for both timers to maintain independent periodic cadences:

Proposed fix

```diff
     async fn start_maintenance_loop(&self) {
         let this = self.clone();
         let mut tasks = self.tasks.lock().await;
         tasks.spawn(async move {
+            let mut maintenance_interval = time::interval(MAINTENANCE_INTERVAL);
             let mut dns_interval = time::interval(DNS_DISCOVERY_DELAY);

             while !this.shutdown_token.is_cancelled() {
                 tokio::select! {
-                    _ = time::sleep(MAINTENANCE_INTERVAL) => {
+                    _ = maintenance_interval.tick() => {
                         log::debug!("Maintenance interval elapsed");
                         this.maintenance_tick().await;
                     }
                     _ = dns_interval.tick(), if !this.exclusive_mode => {
                         this.dns_fallback_tick().await;
                     }
                     _ = this.shutdown_token.cancelled() => {
                         log::info!("Maintenance loop shutting down");
                         break;
                     }
                 }
             }
         });
     }
```

Note: Both time::interval instances tick immediately on first poll. Use interval.tick().await before the loop or time::interval_at() to skip the initial tick if needed.
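
As a small illustration of that note (names and constants here are illustrative, not the dash-spv code), the immediate first tick can be skipped like this:

```rust
use tokio::time::{self, Duration, Instant};

const MAINTENANCE_INTERVAL: Duration = Duration::from_secs(10);

async fn periodic_maintenance() {
    // Start one full period in the future so the first tick is not immediate.
    let mut interval =
        time::interval_at(Instant::now() + MAINTENANCE_INTERVAL, MAINTENANCE_INTERVAL);

    // Alternative: create the interval with time::interval(MAINTENANCE_INTERVAL)
    // and call interval.tick().await once before the loop to discard the
    // immediate first tick.

    loop {
        interval.tick().await;
        // ... run the periodic maintenance work here ...
    }
}
```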

ZocoLini previously approved these changes on Feb 16, 2026
@xdustinface xdustinface merged commit 7e0f44d into v0.42-dev Feb 17, 2026
53 checks passed
@xdustinface xdustinface deleted the refactor/split-loop branch February 17, 2026 00:48
lklimek added a commit that referenced this pull request Feb 17, 2026
The upstream refactor (#430) extracted maintenance loop logic into
maintenance_tick() and dns_fallback_tick() methods but dropped the
shutdown_token checks that this branch had added inside the loops.
Restore them so shutdown is not delayed by in-progress peer connections
or pings.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
xdustinface pushed a commit that referenced this pull request Feb 18, 2026
* fix(spv): exit manager task loop on network errors and signal shutdown before disconnect

The tick branch in SyncManager::run() now breaks on SyncError::Network
instead of logging indefinitely, preventing ~26s of channel-closed spam
during shutdown. SyncCoordinator gains a signal_shutdown() method that
cancels the shared token without awaiting joins, and DashSpvClient::stop()
calls it before network.disconnect() so managers observe cancellation
promptly.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Lukasz Klimek <842586+lklimek@users.noreply.github.com>
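
A minimal sketch of the signal_shutdown idea described above, using hypothetical field names and types (the real SyncCoordinator internals are not shown in this PR, and a later commit in this series replaces signal_shutdown with a full shutdown in stop()):

```rust
use tokio::{sync::Mutex, task::JoinSet};
use tokio_util::sync::CancellationToken;

struct SyncCoordinator {
    shutdown_token: CancellationToken,
    manager_tasks: Mutex<JoinSet<()>>,
}

impl SyncCoordinator {
    // Cancel the shared token without waiting for the manager tasks to finish,
    // so callers can tear down the network while managers wind down.
    fn signal_shutdown(&self) {
        self.shutdown_token.cancel();
    }

    // Full shutdown: signal first, then drain the manager tasks.
    async fn shutdown(&self) {
        self.signal_shutdown();
        let mut tasks = self.manager_tasks.lock().await;
        while tasks.join_next().await.is_some() {}
    }
}
```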

* fix(dash-spv): shutdown token not checked when waiting for peer connection

* fix(spv): address review findings for network error handling

- Emit SyncEvent::ManagerError before breaking on network tick errors
  so the coordinator has visibility into why a manager stopped
- Clear pool connecting state when shutdown cancels a peer connection
- Import SyncError and use short form instead of fully-qualified path
- Document SyncError::Network variant's tick() exit semantics

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* refactor(spv): replace signal_shutdown with full shutdown in stop()

Await sync_coordinator.shutdown() in stop() to drain manager tasks
before tearing down network and storage. Remove the now-unused
signal_shutdown() method.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(spv): resolve deadlock in PeerNetworkManager shutdown

PeerNetworkManager::shutdown() held self.tasks lock while draining the
JoinSet. The maintenance loop task, being drained, called connect_to_peer()
which also needed self.tasks lock — classic deadlock.

Fix with two changes:
- shutdown(): use std::mem::take to extract JoinSet from mutex before
  draining, so the lock is released immediately
- connect_to_peer(): wrap lock acquisition with tokio::select! against
  shutdown token so it returns immediately during shutdown

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
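
A sketch of the second change, guarding the lock acquisition with the shutdown token. The field types and the argument-free signature are assumptions for illustration, and the std::mem::take part of this commit is revisited by a later commit below:

```rust
use tokio::{sync::Mutex, task::JoinSet};
use tokio_util::sync::CancellationToken;

struct PeerNetworkManager {
    tasks: Mutex<JoinSet<()>>,
    shutdown_token: CancellationToken,
}

impl PeerNetworkManager {
    async fn connect_to_peer(&self) {
        // Race the lock acquisition against cancellation so this call can
        // never block a shutdown that is draining the task set.
        let mut tasks = tokio::select! {
            guard = self.tasks.lock() => guard,
            _ = self.shutdown_token.cancelled() => return,
        };
        tasks.spawn(async { /* ... drive the peer connection ... */ });
    }
}
```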

* doc: document logging

* feat(spv): surface FatalNetwork errors to API consumers

Add SyncEvent::ManagerExited variant and set SyncState::Error on fatal
network exits so consumers can detect when a manager stops permanently.
Extract handle_fatal_exit helper to deduplicate the 4 FatalNetwork paths.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(spv): self-recover on FatalNetwork instead of exiting manager loop

Managers now reset to WaitingForConnections and continue their loop on
FatalNetwork errors, letting PeersUpdated events re-trigger sync. This
eliminates the need for consumers to detect ManagerExited and restart
the entire client. Managers only exit when the shutdown token fires.

- Rename handle_fatal_exit → handle_fatal_network_error
- Change all 4 FatalNetwork paths from break to continue
- Remove SyncEvent::ManagerExited variant (no longer needed)
- Update FatalNetwork doc to reflect reset behavior

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(spv): remove take() in PeerNetworkManager::shutdown to close race

The JoinSet was moved out of the mutex via std::mem::take before
draining. This created a race where in-flight tasks that had not yet
observed the cancellation token could spawn new tasks into the original
(now empty) JoinSet, leaving them orphaned.

Since connect_to_peer() already uses select! with the cancellation
token when acquiring the tasks lock, no deadlock can occur. Drain the
JoinSet while holding the lock instead.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
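
A sketch of the resulting shutdown shape, draining the JoinSet while the lock is held (same hypothetical names and field types as the sketch above):

```rust
use tokio::{sync::Mutex, task::JoinSet};
use tokio_util::sync::CancellationToken;

struct PeerNetworkManager {
    tasks: Mutex<JoinSet<()>>,
    shutdown_token: CancellationToken,
}

impl PeerNetworkManager {
    async fn shutdown(&self) {
        // Cancel first: any task blocked on the guarded lock acquisition in
        // connect_to_peer() bails out instead of deadlocking on `tasks`.
        self.shutdown_token.cancel();
        // Drain while holding the lock so no in-flight task can spawn into
        // an already-detached (taken) JoinSet.
        let mut tasks = self.tasks.lock().await;
        while tasks.join_next().await.is_some() {}
    }
}
```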

* fix(spv): add missing shutdown checks in maintenance loop

Add shutdown_token check before connect_to_peer in the DNS discovery
peer loop, and add an early-exit check before the ping/save section
so the maintenance loop exits promptly during shutdown.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
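
A sketch of the early-exit check in the discovery loop, with hypothetical names and signatures (the real discovery and connection code is not shown here):

```rust
use std::net::SocketAddr;
use tokio_util::sync::CancellationToken;

struct PeerNetworkManager {
    shutdown_token: CancellationToken,
}

impl PeerNetworkManager {
    async fn connect_to_peer(&self, _addr: SocketAddr) { /* ... */ }

    async fn connect_discovered_peers(&self, discovered: Vec<SocketAddr>) {
        for addr in discovered {
            // Stop starting new connections as soon as shutdown is requested.
            if self.shutdown_token.is_cancelled() {
                return;
            }
            self.connect_to_peer(addr).await;
        }
    }
}
```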

* fix(spv): address PR #440 audit findings

Remove FatalNetwork variant (never produced in production) and match on
Network instead. Add 2-second cooldown after network-error recovery to
prevent log/event flooding from the 100ms tick loop. Replace the
error-prone handle_fatal_network_error (which required callers to call
set_state first) with a recover_from_network_error default trait method
that enforces correct ordering. Migrate all log:: calls to tracing:: in
manager.rs to match project convention.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
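
A sketch of the default-trait-method idea, with hypothetical types (later commits in this series revert the cooldown and the state reset, so this only illustrates the ordering-enforcement pattern described above; requires Rust 1.75+ for async fn in traits):

```rust
use std::time::Duration;
use tokio::time;

const NETWORK_ERROR_COOLDOWN: Duration = Duration::from_secs(2);

enum SyncState {
    WaitingForConnections,
    // ...
}

trait SyncManager {
    fn set_state(&mut self, state: SyncState);

    // Default implementation so every manager recovers in the same order:
    // reset state first, then back off before the next tick.
    async fn recover_from_network_error(&mut self) {
        self.set_state(SyncState::WaitingForConnections);
        time::sleep(NETWORK_ERROR_COOLDOWN).await;
    }
}
```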

* fmt

* refactor(spv): extract network error cooldown into named constant

Replace inline `Duration::from_secs(2)` with `NETWORK_ERROR_COOLDOWN`
constant for clarity.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fmt

* fix(spv): restore shutdown checks dropped during upstream refactor

The upstream refactor (#430) extracted maintenance loop logic into
maintenance_tick() and dns_fallback_tick() methods but dropped the
shutdown_token checks that this branch had added inside the loops.
Restore them so shutdown is not delayed by in-progress peer connections
or pings.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fmt

* revert: CLAUDE.md logging info

* revert logging changes

* Update dash-spv/src/network/manager.rs

* chore: fmt

* fix: don't set SyncState::WaitingForConnections) on network error

* refactor: remove cooldown

* refactor: strip non-shutdown changes from PR 440

Revert network error recovery, doc comment changes, and sync_manager
additions so this branch contains only shutdown/deadlock fixes.
Non-shutdown improvements are moved to fix/spv-network-error-recovery.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Signed-off-by: Lukasz Klimek <842586+lklimek@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>