feat(orchestrator): run ic-gateway as a side-car to the replica in cloud engines #10499
Draft
pierugo-dfinity wants to merge 22 commits into
Cloud engines should be self-contained, but today their nodes rely on external routing to reach the replica. This PR runs `ic-gateway` as a side-car alongside the replica on each node, so the gateway terminates external traffic locally and proxies it to the replica.

Launching `ic-gateway` is still gated behind a flag and only happens on cloud engines, so there's no behavioral change for now.

Refactoring the orchestrator's process manager

The orchestrator's process management was written around a single process (the replica), with a parallel, partially duplicated path for `ic-boundary`. Adding a third managed process the same way would have meant copying that logic again.

Instead, process management is now generic. The upgrade loop no longer knows what it's running: it just starts and stops a manager that owns the set of node processes. Adding a future process is a matter of describing how it's built and registering it, with no changes to the upgrade loop.

Scope: this refactor is confined to the upgrade loop, i.e. processes whose lifetime is tied to a subnet. Unassigned nodes and API BNs are unaffected.
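
As a rough illustration of the "upgrade loop no longer knows what it's running" idea, here is a minimal sketch. It is not the PR's code: `NodeContext`, `ManagedProcess`, `upgrade_loop_iteration`, and the `running` helper are hypothetical stand-ins, and processes are modelled as in-memory flags rather than real child processes. Only the names `MultipleProcessesManager`, `start_all`, and `stop_all` come from the description.

```rust
/// Hypothetical description of the node; both fields are assumptions.
#[derive(Clone, Copy, Debug)]
pub struct NodeContext {
    /// Whether the node belongs to a cloud engine.
    pub is_cloud_engine: bool,
    /// Feature flag gating the `ic-gateway` side-car.
    pub ic_gateway_enabled: bool,
}

/// Stand-in for a managed OS process; a real one would spawn a child.
pub struct ManagedProcess {
    name: &'static str,
    running: bool,
}

impl ManagedProcess {
    fn new(name: &'static str) -> Self {
        Self { name, running: false }
    }
    fn start(&mut self) {
        self.running = true;
    }
    fn stop(&mut self) {
        self.running = false;
    }
}

/// Owns the set of node processes and decides which ones to run.
pub struct MultipleProcessesManager {
    ctx: NodeContext,
    replica: ManagedProcess,
    ic_gateway: ManagedProcess,
}

impl MultipleProcessesManager {
    pub fn new(ctx: NodeContext) -> Self {
        Self {
            ctx,
            replica: ManagedProcess::new("replica"),
            ic_gateway: ManagedProcess::new("ic-gateway"),
        }
    }

    /// Starts the replica, and the gateway only on flagged cloud engines.
    pub fn start_all(&mut self) {
        self.replica.start();
        if self.ctx.is_cloud_engine && self.ctx.ic_gateway_enabled {
            self.ic_gateway.start();
        }
    }

    /// Stops every managed process (stopping an idle one is a no-op).
    pub fn stop_all(&mut self) {
        self.ic_gateway.stop();
        self.replica.stop();
    }

    /// Names of the processes currently marked as running.
    pub fn running(&self) -> Vec<&'static str> {
        [&self.replica, &self.ic_gateway]
            .iter()
            .filter(|p| p.running)
            .map(|p| p.name)
            .collect()
    }
}

/// The loop body only sees the manager, never an individual process.
pub fn upgrade_loop_iteration(manager: &mut MultipleProcessesManager, must_stop: bool) {
    if must_stop {
        manager.stop_all();
    } else {
        manager.start_all();
    }
}
```

In this sketch, adding another process changes only `MultipleProcessesManager`; `upgrade_loop_iteration` stays untouched, which is the property the refactor is after.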
Testing

A new system test, `cloud_engine_ic_gateway_test` (still `manual` for now because the feature flag is disabled), hits `api/v2/status` on port 80 of each node, confirming that `ic-gateway` is up and correctly proxying to the replica.

Implementation details for reviewers
- The upgrade loop now uses `MultipleProcessesManager` and calls `start_all`/`stop_all`. It's process-agnostic, except for `stop_replica`, which is (arguably) still needed during recoveries.
- The process definitions (including `ic-boundary`) moved into `processes.rs`. Each `Process` declares how it's built via `build(&Self::Config, Self::Args)`, separating static config from dynamic arguments that can change across the orchestrator's lifetime, likely derived from the registry.
- `ProcessManager<Process>` centralizes the logging, metrics, and `OrchestratorResult` handling that was previously duplicated.
- `MultipleProcessesManager` holds several of these and decides what to start/stop (e.g. `ic-gateway` only on cloud engines).
- `ic-boundary` is an exception, as it has some additional logic when the node's domain changes. That logic is kept by introducing `IcBoundaryManager`, a wrapper over `ProcessManager<IcBoundaryProcess>` exposing `ensure_ic_boundary_running_and_restarted_on_domain_change`.

To add a new process: define `NewProcessConfig` and `NewProcess` implementing `Process`, add a `ProcessManager<NewProcess>` field to `MultipleProcessesManager`, and wire it into the start/stop methods.
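
The steps for adding a process might look roughly like the sketch below. It assumes a shape for the `Process` trait that the description only hints at: the `build(&Self::Config, Self::Args)` signature is taken from the text, but the `Self` return type, the `NAME` constant, `command_line`, the `String` error type (standing in for `OrchestratorResult`), the in-memory `log`, and every field of `NewProcessConfig` and `NewProcess` are hypothetical. Nothing is actually spawned here.

```rust
/// Assumed shape of the trait: static config plus dynamic arguments.
pub trait Process: Sized {
    /// Hypothetical name used in log messages.
    const NAME: &'static str;
    /// Static configuration, fixed for the orchestrator's lifetime.
    type Config;
    /// Dynamic arguments, e.g. values derived from the registry.
    type Args;

    fn build(config: &Self::Config, args: Self::Args) -> Self;

    /// Hypothetical helper: the argv the process would be launched with.
    fn command_line(&self) -> Vec<String>;
}

/// Centralizes start/stop bookkeeping and logging for one process type.
pub struct ProcessManager<P: Process> {
    config: P::Config,
    current: Option<P>,
    /// In-memory stand-in for the real logger and metrics.
    pub log: Vec<String>,
}

impl<P: Process> ProcessManager<P> {
    pub fn new(config: P::Config) -> Self {
        Self { config, current: None, log: Vec::new() }
    }

    /// Builds the process from config + args and records it as running.
    pub fn start(&mut self, args: P::Args) -> Result<(), String> {
        if self.current.is_some() {
            return Err(format!("{} is already running", P::NAME));
        }
        let process = P::build(&self.config, args);
        let argv = process.command_line().join(" ");
        self.log.push(format!("starting {}: {}", P::NAME, argv));
        self.current = Some(process);
        Ok(())
    }

    /// Forgets the running process; errors if none was running.
    pub fn stop(&mut self) -> Result<(), String> {
        match self.current.take() {
            Some(_) => {
                self.log.push(format!("stopping {}", P::NAME));
                Ok(())
            }
            None => Err(format!("{} is not running", P::NAME)),
        }
    }

    pub fn is_running(&self) -> bool {
        self.current.is_some()
    }
}

/// Hypothetical static config for the new process.
pub struct NewProcessConfig {
    pub binary_path: String,
}

/// Hypothetical new process; the port plays the role of a dynamic argument.
pub struct NewProcess {
    binary_path: String,
    listen_port: u16,
}

impl Process for NewProcess {
    const NAME: &'static str = "new-process";
    type Config = NewProcessConfig;
    type Args = u16;

    fn build(config: &Self::Config, args: Self::Args) -> Self {
        Self { binary_path: config.binary_path.clone(), listen_port: args }
    }

    fn command_line(&self) -> Vec<String> {
        vec![
            self.binary_path.clone(),
            "--port".to_string(),
            self.listen_port.to_string(),
        ]
    }
}
```

Under these assumptions, the remaining step is to add a `ProcessManager<NewProcess>` field to `MultipleProcessesManager` and call its `start`/`stop` from the start/stop methods.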