Skip to content

fix(registry): support canister migration from/to deleted subnet in registry#10470

Open
mraszyk wants to merge 7 commits into
masterfrom
mraszyk/migrate-canisters-on-deleted-subnets-in-registry
Open

fix(registry): support canister migration from/to deleted subnet in registry#10470
mraszyk wants to merge 7 commits into
masterfrom
mraszyk/migrate-canisters-on-deleted-subnets-in-registry

Conversation

@mraszyk

@mraszyk mraszyk commented Jun 15, 2026

Copy link
Copy Markdown
Contributor

This PR supports canister migration from/to deleted subnet in the registry by:

  • adding a test that canister migration from a deleted (source) subnet succeeds (the implementation already supports this case so we just add a test to prevent a regression);
  • skipping updating the deleted target subnet in the routing table (the canister migration is still successful since the outcome should be as if the migration completed first and the subnet was only deleted later) and adding a test covering this case.

The motivation for this change is to ensure that the state machine in the migration canister does not get stuck if a subnet is deleted while a canister migration is ongoing. In such a case, the canister migration is supposed to complete with the same outcome as if the canister migration completed before subnet deletion (which is infeasible to always ensure in practice though, e.g., if the deleted subnet was stuck).

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR updates the registry canister’s canister-migration logic to tolerate migrations involving subnets that have been deleted from the registry/routing table, and adds regression tests to lock in the behavior.

Changes:

  • Adjust routing-table mutation so migrating to a target subnet that is absent from the routing table does not leave any ranges assigned to that (deleted) target subnet.
  • Relax payload validation to allow migrations to target subnets that no longer exist in the registry.
  • Add/extend unit tests covering migration when the source subnet is “deleted” (unrouted gap) and when the target subnet does not exist.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

File Description
rs/registry/canister/src/mutations/routing_table.rs Skips persisting assignments to a target subnet that has no existing ranges in the routing table.
rs/registry/canister/src/mutations/do_migrate_canisters.rs Removes target-subnet existence validation and adds tests for deleted-source / deleted-target migration scenarios.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread rs/registry/canister/src/mutations/routing_table.rs Outdated
Comment thread rs/registry/canister/src/mutations/do_migrate_canisters.rs Outdated
Comment thread rs/registry/canister/src/mutations/do_migrate_canisters.rs
mraszyk and others added 4 commits June 15, 2026 12:18
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
@mraszyk mraszyk marked this pull request as ready for review June 15, 2026 11:17
@mraszyk mraszyk requested a review from a team as a code owner June 15, 2026 11:17

@github-actions github-actions Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This pull request changes code owned by the Governance team. Therefore, make sure that
you have considered the following (for Governance-owned code):

  1. Update unreleased_changelog.md (if there are behavior changes, even if they are
    non-breaking).

  2. Are there BREAKING changes?

  3. Is a data migration needed?

  4. Security review?

How to Satisfy This Automatic Review

  1. Go to the bottom of the pull request page.

  2. Look for where it says this bot is requesting changes.

  3. Click the three dots to the right.

  4. Select "Dismiss review".

  5. In the text entry box, respond to each of the numbered items in the previous
    section, declare one of the following:

  • Done.

  • $REASON_WHY_NO_NEED. E.g. for unreleased_changelog.md, "No
    canister behavior changes.", or for item 2, "Existing APIs
    behave as before.".

Brief Guide to "Externally Visible" Changes

"Externally visible behavior change" is very often due to some NEW canister API.

Changes to EXISTING APIs are more likely to be "breaking".

If these changes are breaking, make sure that clients know how to migrate, how to
maintain their continuity of operations.

If your changes are behind a feature flag, then, do NOT add entrie(s) to
unreleased_changelog.md in this PR! But rather, add entrie(s) later, in the PR
that enables these changes in production.

Reference(s)

For a more comprehensive checklist, see here.

GOVERNANCE_CHECKLIST_REMINDER_DEDUP

@mraszyk

mraszyk commented Jun 15, 2026

Copy link
Copy Markdown
Contributor Author

I believe this PR is not worth an entry in the changelog...

@zeropath-ai

zeropath-ai Bot commented Jun 15, 2026

Copy link
Copy Markdown

No security or compliance issues detected. Reviewed everything up to 183afe3.

Security Overview
Detected Code Changes
Change Type Relevant files
Refactor ► do_migrate_canisters.rs
    Remove target subnet existence validation in do_migrate_canisters
    Add test for successful migration when source subnet is deleted
    Add test for successful migration when target subnet does not exist
► routing_table.rs
    Remove assigned canister ranges if target subnet is not in routing table

.collect::<Result<Vec<_>, _>>()?;

// Intentionally do not validate that the target subnet exists: the migration orchestrator
// may still attempt to migrate canisters to/from a subnet that has been deleted.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

migrating to a deleted subnet is confusing (even migrating from a deleted canister is confusing, but less). Can you elaborate (in this comment)? Like, what is the eventual result?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

// Verify that migrate_canisters succeeds when the target subnet does not exist
// (has no key in subnet_list and no ranges in routing_table).
//
// We simulate this by using a subnet ID that is not in the routing table to begin with.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This describes the whole test, right? Such comments should appear BEFORE the test, just like in all other cases: when a (non-inline) comment talks about a thing, the comment appears before that thing (in this case, the test), not within it.

Looks like in the case of tests (e.g. test_basic_migrate_canisters), such comments are not triple slash, even though tests are technically functions.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

}

#[test]
fn test_migrate_canisters_succeeds_if_target_subnet_does_not_exist() {

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(We discussed in this in Slack, but just so that this lives in the record of GitHub...)

It seems like such requests should be blocked, not result in canister deletion, because that would be a surprising data loss, the worst kind of surprise.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Subnet deletion is (currently) a manual process and we don't expect to actually delete a subnet while a canister migration is ongoing (instead we expect to disable the canister migration API first and only delete a subnet once there's no ongoing canister migration). However, e.g., if a subnet is permanently stuck, it might be impossible to successfully complete an ongoing canister migration and in this case, we'd like to still be able to delete such a subnet. This leaves us with two options:

  • adjust canister migration so that it can gracefully abort canister migration to a subnet that is stuck/deleted (that would very likely result in substantial engineering overhead and risk);
  • adjust canister migration so that performing canister migration from/to a stuck/deleted subnet (transparently) behaves as if canister migration completed before the subnet got stuck/deleted.

The second option requires less engineering overhead which seems like a good trade-off given that it is rather unlikely that we'll need this in practice.

@daniel-wong-dfinity-org-twin

daniel-wong-dfinity-org-twin commented Jun 18, 2026

Copy link
Copy Markdown

I would say this is a fix, not chore. Why: IIUC, people are already allowed to migrate when deletion is happening, but if they do that, the system ends up in an inconsistent state. Whereas, this change makes it so that such migrations do NOT result in inconsistent state.

Again, chore is for if you don't really care if this gets into production in a timely manner. But it seems to me, that you do care. If you care, don't use chore, because we more or less ignore chore PRs when considering whether an upgrade should be done.

Also, scope to registry.

@mraszyk mraszyk changed the title chore: support canister migration from/to deleted subnet in registry fix(registry): support canister migration from/to deleted subnet in registry Jun 18, 2026
@github-actions github-actions Bot added fix and removed chore labels Jun 18, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants