feat(orchestrator): run ic-gateway as a side-car to the replica in cloud engines #10499

Draft
pierugo-dfinity wants to merge 22 commits into master from pierugo/orchestrator/cloud-engine-ic-gateway

@pierugo-dfinity pierugo-dfinity commented Jun 17, 2026

Contributor

Cloud engines should be self-contained, but today their nodes rely on external routing to reach the replica. This PR runs ic-gateway as a side-car alongside the replica on each node, so the gateway terminates external traffic locally and proxies it to the replica.
Launching ic-gateway is still gated behind a flag and only happens on cloud engines, so there's no behavioral change for now.

Refactoring the orchestrator's process manager

The orchestrator's process management was written around a single process (the replica), with a parallel, partially-duplicated path for ic-boundary. Adding a third managed process the same way would have meant copying that logic again.
Instead, process management is now generic. The upgrade loop no longer knows what it's running — it just starts and stops a manager that owns the set of node processes. Adding a future process is a matter of describing how it's built and registering it; no changes to the upgrade loop.

Scope: this refactor is confined to the upgrade loop, i.e. to processes whose lifetime is tied to a subnet. Unassigned nodes and API boundary nodes (API BNs) are unaffected.

Testing

A new system test, cloud_engine_ic_gateway_test (manual for now, because the feature flag is disabled), hits api/v2/status on port 80 of each node, confirming that ic-gateway is up and correctly proxying to the replica.
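The check performed by the test can be approximated with a plain HTTP probe. The following is a minimal sketch using only the standard library, not the code of cloud_engine_ic_gateway_test itself; the function names, the 5-second timeout, and the assumption that a 200 response means "up and proxying" are all illustrative.

```rust
use std::io::{self, Read, Write};
use std::net::TcpStream;
use std::time::Duration;

/// Builds a minimal HTTP/1.1 request for the status endpoint.
/// (Hypothetical helper; the real test likely uses an HTTP client.)
pub fn status_request(host: &str) -> String {
    format!("GET /api/v2/status HTTP/1.1\r\nHost: {host}\r\nConnection: close\r\n\r\n")
}

/// Extracts the status code from the first line of a raw HTTP response.
/// Works on bytes because the response body is not guaranteed to be UTF-8.
pub fn parse_status_code(response: &[u8]) -> Option<u16> {
    let line_end = response.windows(2).position(|w| w == b"\r\n")?;
    let line = std::str::from_utf8(&response[..line_end]).ok()?;
    let mut parts = line.split(' ');
    if !parts.next()?.starts_with("HTTP/") {
        return None;
    }
    parts.next()?.parse().ok()
}

/// Connects to port 80 of `host` and reports whether the status endpoint
/// answered with 200 (assumed here to be the success criterion).
pub fn node_status_ok(host: &str) -> io::Result<bool> {
    let mut stream = TcpStream::connect((host, 80))?;
    stream.set_read_timeout(Some(Duration::from_secs(5)))?;
    stream.write_all(status_request(host).as_bytes())?;
    let mut response = Vec::new();
    stream.read_to_end(&mut response)?;
    Ok(parse_status_code(&response) == Some(200))
}
```

Calling node_status_ok once per node address would mirror the per-node check described above; the request building and response parsing are kept separate so they can be exercised without a network.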

Implementation details for reviewers
  • The upgrade loop now holds a MultipleProcessesManager and calls start_all/stop_all. The loop is process-agnostic, except for stop_replica, which is (arguably) still needed during recoveries.
  • All per-process logic (including ic-boundary) moved into processes.rs. Each Process declares how it's built via build(&Self::Config, Self::Args), separating static config from dynamic arguments that can change across the orchestrator's lifetime, likely derived from the registry.
  • ProcessManager<Process> centralizes the logging, metrics, and OrchestratorResult handling that was previously duplicated. MultipleProcessesManager holds several of these and decides what to start/stop (e.g. ic-gateway only on cloud engines).
  • ic-boundary is an exception, because it needs additional logic when the node's domain changes. That logic is kept in IcBoundaryManager, a wrapper over ProcessManager<IcBoundaryProcess> that exposes ensure_ic_boundary_running_and_restarted_on_domain_change.
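
For readers who have not opened processes.rs yet, the split between static config and dynamic arguments, and the ic-boundary wrapper, might look roughly like the sketch below. Only the names Process, ProcessManager, IcBoundaryProcess, IcBoundaryManager, the build(&Self::Config, Self::Args) parameter types, and the wrapper's method name come from this description. Everything else is an assumption: the Vec<String> return type, the trait bounds, the method signatures, and the placeholder command-line flag. Spawning, logging, and metrics are elided.

```rust
/// Hypothetical shape of the `Process` abstraction.
pub trait Process {
    /// Static configuration, fixed when the orchestrator starts.
    type Config;
    /// Dynamic arguments that can change over the orchestrator's lifetime.
    type Args: Clone + PartialEq;
    /// Returns the command line (argv) to launch. Return type is assumed.
    fn build(config: &Self::Config, args: Self::Args) -> Vec<String>;
}

/// Tracks one process. It only records the argv it would have launched.
pub struct ProcessManager<P: Process> {
    config: P::Config,
    running: Option<(P::Args, Vec<String>)>,
}

impl<P: Process> ProcessManager<P> {
    pub fn new(config: P::Config) -> Self {
        Self { config, running: None }
    }

    /// Starts the process unless it is already running; true if started.
    pub fn start(&mut self, args: P::Args) -> bool {
        if self.running.is_some() {
            return false;
        }
        let argv = P::build(&self.config, args.clone());
        self.running = Some((args, argv));
        true
    }

    /// Stops the process; true if it was running.
    pub fn stop(&mut self) -> bool {
        self.running.take().is_some()
    }

    pub fn current_args(&self) -> Option<&P::Args> {
        self.running.as_ref().map(|(args, _)| args)
    }

    pub fn argv(&self) -> Option<&[String]> {
        self.running.as_ref().map(|(_, argv)| argv.as_slice())
    }
}

pub struct IcBoundaryConfig {
    pub binary: String,
}

pub struct IcBoundaryProcess;

impl Process for IcBoundaryProcess {
    type Config = IcBoundaryConfig;
    type Args = String; // the node's domain

    fn build(config: &Self::Config, domain: Self::Args) -> Vec<String> {
        // "--example-domain" is a placeholder, not ic-boundary's real CLI.
        vec![config.binary.clone(), "--example-domain".to_string(), domain]
    }
}

/// Wrapper adding the domain-change behavior on top of the generic manager.
pub struct IcBoundaryManager {
    inner: ProcessManager<IcBoundaryProcess>,
}

impl IcBoundaryManager {
    pub fn new(config: IcBoundaryConfig) -> Self {
        Self { inner: ProcessManager::new(config) }
    }

    /// Starts ic-boundary if needed and restarts it when the domain differs
    /// from the one it was launched with. Returns true if it (re)started.
    pub fn ensure_ic_boundary_running_and_restarted_on_domain_change(
        &mut self,
        domain: String,
    ) -> bool {
        if self.inner.current_args() == Some(&domain) {
            return false;
        }
        self.inner.stop();
        self.inner.start(domain)
    }

    pub fn argv(&self) -> Option<&[String]> {
        self.inner.argv()
    }
}
```

Keeping the dynamic arguments next to the running process is what lets the wrapper decide on a restart by comparing the new domain with the one in use, without the generic manager knowing anything about domains.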

To add a new process: define NewProcessConfig and NewProcess implementing Process, add a ProcessManager<NewProcess> field to MultipleProcessesManager, and wire it into the start/stop methods.
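
The three steps above can be sketched as follows. This is an illustration under assumptions, not the PR's code: the trait is restated in a reduced form so the block compiles on its own, NewProcess takes a made-up port argument with a placeholder flag, and the start_new_process boolean stands in for whatever condition decides if a process runs (for ic-gateway, the feature flag and the cloud-engine check).

```rust
// Reduced, hypothetical restatement of the trait and manager.
pub trait Process {
    type Config;
    type Args;
    fn build(config: &Self::Config, args: Self::Args) -> Vec<String>;
}

pub struct ProcessManager<P: Process> {
    config: P::Config,
    running: Option<Vec<String>>, // argv of the "running" process; spawning elided
}

impl<P: Process> ProcessManager<P> {
    pub fn new(config: P::Config) -> Self {
        Self { config, running: None }
    }
    pub fn start(&mut self, args: P::Args) {
        if self.running.is_none() {
            self.running = Some(P::build(&self.config, args));
        }
    }
    pub fn stop(&mut self) {
        self.running = None;
    }
    pub fn is_running(&self) -> bool {
        self.running.is_some()
    }
}

pub struct ReplicaConfig {
    pub binary: String,
}
pub struct ReplicaProcess;
impl Process for ReplicaProcess {
    type Config = ReplicaConfig;
    type Args = ();
    fn build(config: &Self::Config, _args: Self::Args) -> Vec<String> {
        vec![config.binary.clone()]
    }
}

// Step 1: define NewProcessConfig and NewProcess implementing Process.
pub struct NewProcessConfig {
    pub binary: String,
}
pub struct NewProcess;
impl Process for NewProcess {
    type Config = NewProcessConfig;
    type Args = u16; // hypothetical dynamic argument: a listen port
    fn build(config: &Self::Config, port: Self::Args) -> Vec<String> {
        // "--example-port" is a placeholder flag for illustration only.
        vec![config.binary.clone(), format!("--example-port={port}")]
    }
}

pub struct MultipleProcessesManager {
    pub replica: ProcessManager<ReplicaProcess>,
    // Step 2: add a ProcessManager<NewProcess> field.
    pub new_process: ProcessManager<NewProcess>,
    // Stand-in for the real gating condition.
    pub start_new_process: bool,
}

impl MultipleProcessesManager {
    pub fn new(start_new_process: bool) -> Self {
        Self {
            replica: ProcessManager::new(ReplicaConfig { binary: "replica".to_string() }),
            new_process: ProcessManager::new(NewProcessConfig {
                binary: "new-process".to_string(),
            }),
            start_new_process,
        }
    }

    // Step 3: wire the new process into the start/stop methods.
    pub fn start_all(&mut self, port: u16) {
        self.replica.start(());
        if self.start_new_process {
            self.new_process.start(port);
        }
    }

    pub fn stop_all(&mut self) {
        self.new_process.stop();
        self.replica.stop();
    }
}
```

With this shape the upgrade loop only ever calls start_all and stop_all, so a gated process stays off without the loop having to know that it exists.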

@pierugo-dfinity pierugo-dfinity added the CI_ALL_BAZEL_TARGETS Runs all bazel targets label Jun 17, 2026
@github-actions github-actions Bot added the feat label Jun 17, 2026