desktop(mesh-llm): let a serving node route a different model #833

Merged
tlongwell-block merged 2 commits into main from eva/mesh-serve-route-other-model on Jun 3, 2026
@tlongwell-block

Collaborator

Why

A user asked: why can't I use a different model than the one I'm hosting? The answer turned out to be a Sprout-side policy, not a mesh constraint.

ensure_client_node_for_model rejected any running Serve runtime with "this desktop is currently sharing compute; stop sharing before using relay mesh as a client". That made "host model A" and "use model B" mutually exclusive.

But mesh-llm's API proxy already resolves a request to a local, remote, or split target at request time (route_missing_local_model -> hosts_for_model). A serving node's 9337 ingress can route a model it doesn't host locally to a peer. The serve-vs-client split was a blunt UI/runtime policy layered on top of a router that never needed it.
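
To make the request-time resolution concrete, here is a minimal sketch of the idea, not mesh-llm's actual code. `RouteTarget`, `IngressRouter`, `sample_router`, and the model/peer names are all hypothetical, and split targets are left out for brevity. The point is only that the lookup runs per request, so a node hosting model A can still hand model B to a peer.

```rust
use std::collections::HashMap;

/// Hypothetical routing outcome; not mesh-llm's real type.
#[derive(Debug, PartialEq)]
enum RouteTarget {
    /// This node serves the model itself.
    Local,
    /// A peer that advertises the model will serve it.
    Remote(String),
    /// No known host; the request path reports the error.
    Unroutable,
}

/// Hypothetical stand-in for the ingress router's view of the mesh.
struct IngressRouter {
    local_models: Vec<String>,
    /// Model name -> peers currently advertising it.
    peer_hosts: HashMap<String, Vec<String>>,
}

impl IngressRouter {
    /// Resolve one request. Runs per request, so the answer can change
    /// as peers join or advertise new models.
    fn resolve(&self, model: &str) -> RouteTarget {
        if self.local_models.iter().any(|m| m == model) {
            return RouteTarget::Local;
        }
        match self.peer_hosts.get(model).and_then(|peers| peers.first()) {
            Some(peer) => RouteTarget::Remote(peer.clone()),
            None => RouteTarget::Unroutable,
        }
    }
}

/// A node hosting "model-a" with one peer advertising "model-b".
fn sample_router() -> IngressRouter {
    let mut peer_hosts = HashMap::new();
    peer_hosts.insert("model-b".to_string(), vec!["peer-1".to_string()]);
    IngressRouter {
        local_models: vec!["model-a".to_string()],
        peer_hosts,
    }
}
```

With `sample_router()`, resolving "model-a" stays local while "model-b" goes to the peer, which is the "host A, use B" case the old guard blocked.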

What

desktop/src-tauri/src/commands/mesh_llm.rs — treat a running runtime (in any mode) as the local 9337 OpenAI ingress and hand it back. If the caller selected a target, dial it first so the node joins the chosen peer's mesh.

if let Some(runtime) = runtime.as_ref() {
    if let Some(endpoint_addr) = /* selected target, trimmed */ {
        runtime.dial_endpoint_addr(endpoint_addr).await?;  // join the chosen peer
    }
    return runtime.status().await;  // agent uses 9337; router decides per request
}
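
The same control flow can be exercised with a synchronous mock. This is a sketch under stated assumptions: `MockRuntime`, `Mode`, `Status`, and `reuse_running_runtime` are hypothetical names, the real runtime is async, and treating a whitespace-only target as "no target" is my assumption about what "trimmed" implies.

```rust
/// Hypothetical runtime mode, mirroring the serve/client split.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq)]
enum Mode {
    Serve,
    Client,
}

/// Hypothetical status payload handed back to the caller.
#[derive(Debug, Clone, PartialEq)]
struct Status {
    mode: Mode,
    api_base_url: String,
}

/// Synchronous mock that records every dial it is asked to make.
struct MockRuntime {
    mode: Mode,
    api_base_url: String,
    dialed: Vec<String>,
}

impl MockRuntime {
    fn dial_endpoint_addr(&mut self, addr: &str) -> Result<(), String> {
        self.dialed.push(addr.to_string());
        Ok(())
    }

    fn status(&self) -> Status {
        Status {
            mode: self.mode,
            api_base_url: self.api_base_url.clone(),
        }
    }
}

/// Reuse a running runtime in any mode. `Ok(None)` means nothing is
/// running and the caller would start a node itself.
fn reuse_running_runtime(
    runtime: Option<&mut MockRuntime>,
    selected_target: Option<&str>,
) -> Result<Option<Status>, String> {
    let Some(runtime) = runtime else {
        return Ok(None);
    };
    // Assumption: a target that is empty after trimming counts as absent.
    if let Some(addr) = selected_target.map(str::trim).filter(|a| !a.is_empty()) {
        runtime.dial_endpoint_addr(addr)?; // join the chosen peer first
    }
    Ok(Some(runtime.status())) // mode and ingress URL are unchanged
}
```

A Serve-mode mock given a selected target should come back with `mode: Serve`, the same `api_base_url`, and exactly one recorded dial.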

Two deliberate calls:

  • No /v1/models preflight probe. It races model gossip and would wrongly reject freshly-discovered remote/split models. Routability is left to the request path, which already returns a real error. (/api/model-targets exists in mesh-llm but isn't surfaced in Sprout's SDK and still isn't a per-request guarantee.)
  • Preserve the dial side effect. A first cut that just deleted the guard dropped dial_endpoint_addr — a serve runtime not yet on the target's mesh would fail its first inference while the frontend had already published call-me-now to the peer. Dialing the selected target restores the join.
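
The preflight race can be illustrated with a toy model of gossip. `GossipView` and its methods are hypothetical and stand in for whatever state the mesh keeps; the sketch only shows that a probe taken before an announcement arrives rejects a model the request path would route moments later.

```rust
use std::collections::HashSet;

/// Hypothetical view of which models the mesh currently advertises.
#[derive(Default)]
struct GossipView {
    advertised: HashSet<String>,
}

impl GossipView {
    /// A gossip message arrives announcing that some peer hosts `model`.
    fn on_gossip(&mut self, model: &str) {
        self.advertised.insert(model.to_string());
    }

    /// What a preflight probe would see: a snapshot at probe time.
    fn preflight_lists(&self, model: &str) -> bool {
        self.advertised.contains(model)
    }

    /// Request-path routing: evaluated when the request is actually sent,
    /// and it returns a real error if nothing can serve the model.
    fn route_request(&self, model: &str) -> Result<(), String> {
        if self.advertised.contains(model) {
            Ok(())
        } else {
            Err(format!("no host for model {model}"))
        }
    }
}
```

If the probe runs before `on_gossip("model-b")`, it reports the model as missing, yet a request sent after the announcement routes fine. Deferring the check to the request path avoids that false rejection while still surfacing an error for models nobody hosts.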

justfile — pass --features mesh-llm directly to tauri dev/tauri build. TAURI_CLI_EXTRA_CARGO_ARGS is not a recognized Tauri v2 env var, so the feature was silently dropped and desktop dev/staging/build ran without mesh-llm. Fixed all three recipes (dev, staging, build).
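
Why the drop was silent: code behind an optional Cargo feature simply compiles to the "off" branch when the feature flag never reaches cargo, with no error. A minimal sketch, assuming the feature is named `mesh-llm` as above; `mesh_llm_compiled_in`, `available_backends`, and the "relay" label are hypothetical.

```rust
/// True only when the crate was compiled with `--features mesh-llm`.
fn mesh_llm_compiled_in() -> bool {
    cfg!(feature = "mesh-llm")
}

/// Hypothetical helper: which backends this build can offer.
fn available_backends() -> Vec<&'static str> {
    let mut backends = vec!["relay"];
    if mesh_llm_compiled_in() {
        // Only reachable when the feature flag actually reached cargo.
        backends.push("mesh-llm");
    }
    backends
}
```

A build where the flag is lost still compiles and runs; it just never lists the mesh backend. Surfacing something like `mesh_llm_compiled_in()` at startup is one way to notice a dropped feature early.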

Tests

  • New #[ignore] hardware regression in commands/mesh_llm.rs: a real Serve runtime + a different-model preflight returns Ok with mode: Serve and the same api_base_url — exactly the case the old guard forbade. PASS.
  • Full sprout-desktop --features mesh-llm suite: 436 passed, 0 failed (the hardware test is correctly ignored by default).
  • mesh_signaling: 14/14. compile + clippy clean (lib + tests).

Known gaps

  1. The Some(endpoint) dial path isn't asserted to fire in a unit test: dialing a bogus endpoint is flaky, and a live second peer hit gap #2 (below). The branch is small and source-obvious; covered by the live create-agent flow.
  2. The live "serve A, route B to a real peer" end-to-end isn't re-proven here. A two-process harness hit a mesh-llm downloader/readiness bug (peer's model download stalled at 0%, then a 5s startup-shutdown timeout + Metal teardown assert) — HF itself is reachable, so it's orthogonal to this change. The routing premise is proven by source + the existing mesh_serve_client_smoke. Worth a separate look (possibly relevant to split-model startup).

Reviewed twice by Max (mesh internals) before opening.

@tlongwell-block tlongwell-block requested a review from a team as a code owner June 3, 2026 17:47
ensure_client_node_for_model rejected any running Serve runtime with
"stop sharing before using relay mesh as a client". That made hosting
one model and using another mutually exclusive — but mesh-llm's router
already resolves a request to a local, remote, or split target at request
time (route_missing_local_model -> hosts_for_model). The mode split was a
Sprout policy, not a mesh constraint.

Treat a running runtime (any mode) as the local 9337 OpenAI ingress and
hand it back. If the caller selected a target, still dial it first so the
node joins the chosen peer's mesh — skipping that join was a latent bug:
a serve runtime not yet on the target's mesh would fail its first
inference while the frontend had already signalled the peer to expect us.
No /v1/models probe: it races model gossip and would wrongly reject
freshly-discovered remote/split models.

justfile: pass --features mesh-llm directly to tauri dev/build. The
TAURI_CLI_EXTRA_CARGO_ARGS env var is not a recognized Tauri v2 setting,
so the feature was silently dropped and desktop dev/staging/build ran
WITHOUT mesh-llm.

Adds an #[ignore] hardware regression: a real Serve runtime + a
different-model preflight returns Ok with the same 9337 ingress.

Signed-off-by: npub1qyvc0c5kl4gqv2fd97fsk46tu378sqgy35vc83rvgfwne90sel7s0ed67d <011987e296fd5006292d2f930b574be47c7801048d1983c46c425d3c95f0cffd@sprout-oss.stage.blox.sqprod.co>
@tlongwell-block tlongwell-block force-pushed the eva/mesh-serve-route-other-model branch from be28ff3 to 8b62277 Compare June 3, 2026 17:47
Signed-off-by: npub1mprnacetjua2xx3p5eddmhxyk6wv929ymm5py8kd2xfxurxahspqqlgyta <d8473ee32b973aa31a21a65adddcc4b69cc2a8a4dee8121ecd51926e0cddbc02@sprout-oss.stage.blox.sqprod.co>
@tlongwell-block tlongwell-block merged commit 9f0c22a into main Jun 3, 2026
16 checks passed
@tlongwell-block tlongwell-block deleted the eva/mesh-serve-route-other-model branch June 3, 2026 18:30
michaelneale added a commit that referenced this pull request Jun 4, 2026
* origin/main: (36 commits)
  fix: use immutable commit-SHA URLs in screenshot PR comments (#842)
  feat(mobile+desktop): two-tier Slack-style app icon badge (#802)
  chore: simplify file-size check to a flat 1000-line limit (#839)
  fix(desktop): robust emoji picker — unify picker + fix custom emoji in editing, status, reactions (#837)
  feat(desktop): reusable screenshot workflow for agents (#826)
  desktop(mesh-llm): let a serving node route a different model (#833)
  chore(release): release version 0.3.9 (#832)
  fix: native arbitrary-file download + image context-menu flash (#830)
  fix(desktop): custom emoji reaction rendering + picker autofocus (#831)
  Mesh-LLM v1: relay-gated direct-iroh inference between users (WAN) (#822)
  chore(release): release version 0.3.8 (#829)
  chore(release): release version 0.3.7 (#825)
  feat: code block rendering, syntax highlighting, and compose fixes (#803)
  feat: custom emoji — user-owned NIP-30 sets with a client-side union (#816)
  Install sprout-cli skill at repo root + fix desktop clippy (#818)
  fix(desktop): use public re-export path for ensure_client_node_for_model (#824)
  refactor(desktop): feature-gate mesh-llm-sdk behind optional Cargo feature (#823)
  fix(desktop): align workflow read/save commands to the frontend contract (#820)
  fix(desktop): disable mesh-llm auto-build to prevent git config corruption (#819)
  fix(desktop): clear clippy lints in agents/mesh_llm commands (#817)
  ...

# Conflicts:
#	Cargo.lock
#	desktop/scripts/check-file-sizes.mjs
#	desktop/src-tauri/Cargo.toml
#	desktop/src/app/AppShell.tsx
#	desktop/src/app/AppTopChrome.tsx
#	desktop/src/features/messages/hooks.ts
#	desktop/src/features/workspaces/useWorkspaceInit.ts
#	desktop/src/shared/api/tauri.ts