Switch remote backend from polling to streaming SSE #112

Open. JadenFiotto-Kaufman wants to merge 7 commits into main from feat/sse-remote-backend


JadenFiotto-Kaufman (Member) commented Apr 20, 2026

Summary

Replace the old two-step polling flow (HTTP submit → browser polls NDIF → HTTP fetch-results) with a single SSE endpoint per interpretability tool. The FastAPI route opens one stream to the client, forwards each NDIF status update as it arrives, downloads the final result inside the backend, formats it, and emits a single terminal `data` event. The browser no longer talks to NDIF directly.

Depends on ndif-team/nnsight#648 (splits `submit_request` / `handle_response` and adds `async_submit_request`) and AdamBelfki3/nnsightful#2 (tools return the backend when provided).

How it works

`workbench/_api/streaming_backend.py::StreamingRemoteBackend` subclasses nnsight's `RemoteBackend`:

  • `__call__(tracer)` (sync, fired from `trace` / `session` `__exit__`) captures the tracer and serializes the request, with no I/O.
  • `__aiter__()` (async) opens a `socketio.AsyncSimpleClient`, stamps its session id into the headers, async-POSTs via the parent's new `async_submit_request`, and yields `ResponseModel`s as they come in. On `COMPLETED` it downloads the result via the parent's `async_get_result`, replaces `response.data` with the decoded dict, yields once more, and stops. On `ERROR` it yields and raises `RemoteException`.
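
The split between a synchronous capture step and an asynchronous iteration step can be illustrated without nnsight. The sketch below is not the actual `StreamingRemoteBackend`: `DeferredBackend`, `Response`, `Status`, `RemoteError`, and the canned update list are hypothetical stand-ins, and the socket reads and result download are replaced by `asyncio.sleep(0)` placeholders.

```python
import asyncio
from dataclasses import dataclass
from enum import Enum
from typing import Any, AsyncIterator, Optional


class Status(str, Enum):
    RECEIVED = "RECEIVED"
    RUNNING = "RUNNING"
    COMPLETED = "COMPLETED"
    ERROR = "ERROR"


@dataclass
class Response:  # hypothetical stand-in for nnsight's ResponseModel
    status: Status
    data: Any = None


class RemoteError(Exception):  # hypothetical stand-in for RemoteException
    pass


class DeferredBackend:
    """Records the request in __call__; performs (fake) I/O only in __aiter__."""

    def __init__(self, updates: list[Response]):
        self._updates = updates  # canned status sequence instead of a socket
        self.request: Optional[bytes] = None

    def __call__(self, tracer: Any) -> None:
        # Sync hook: capture and serialize only, nothing is sent yet.
        self.request = repr(tracer).encode()

    async def _fetch_result(self) -> dict:
        await asyncio.sleep(0)  # placeholder for the result download
        return {"logits": [0.1, 0.9]}

    async def __aiter__(self) -> AsyncIterator[Response]:
        if self.request is None:
            raise RuntimeError("backend was never primed by a trace")
        for response in self._updates:
            await asyncio.sleep(0)  # placeholder for waiting on the socket
            if response.status is Status.COMPLETED:
                # Swap in the downloaded result, yield once more, stop.
                response.data = await self._fetch_result()
                yield response
                return
            yield response
            if response.status is Status.ERROR:
                raise RemoteError(str(response.data))


async def collect(backend: DeferredBackend) -> list[Response]:
    return [response async for response in backend]


def run_demo() -> list[Response]:
    backend = DeferredBackend(
        [Response(Status.RECEIVED), Response(Status.RUNNING), Response(Status.COMPLETED)]
    )
    backend("tracer")  # primes the backend, as the trace exit would
    return asyncio.run(collect(backend))
```

Because `__call__` does no I/O, the route can prime the backend inside an ordinary sync `trace` block and leave all network work to the event loop that serves the SSE response.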

`workbench/_api/sse.py` holds the shared SSE helpers (`sse_event`, `stream_backend`, `stream_value`, `stream_error`). Each route then reduces to:

```python
backend = state.make_streaming_backend(model=model)
tool._run(model, ..., remote=True, backend=backend)  # primes backend

def process(raw):
    raw["tokenizer"] = ...  # local context
    return tool._format(raw, ...)  # -> ToolData / BaseModel

return StreamingResponse(stream_backend(backend, process), media_type=MEDIA_TYPE)
```

For non-`Tool` routes (predictions, generation, lens line/grid) the same pattern applies: each route factors its inline `model.trace` into a `_trace_*` and `_format_*` pair.

Endpoints

Collapsed pairs into single `/run*` SSE routes:

| Before | After |
| --- | --- |
| `POST /logit_lens/start` + `POST /logit_lens/results/{job_id}` | `POST /logit_lens/run` |
| `POST /activation_patching/start` + `.../results/{job_id}` | `POST /activation_patching/run` |
| `POST /models/start-prediction` + `.../results-prediction/{job_id}` | `POST /models/run-prediction` |
| `POST /models/start-generate` + `.../results-generate/{job_id}` | `POST /models/run-generate` |
| `POST /lens/start-line` + `.../results-line/{job_id}` | `POST /lens/run-line` |
| `POST /lens/start-grid` + `.../results-grid/{job_id}` | `POST /lens/run-grid` |

Each stream emits N×`event: status` frames (with the `ResponseModel` minus its `data` field) followed by one terminal frame — either `event: data` with the JSON-encoded payload or `event: error` with an `{error}` object.
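
The frame contract above can be sketched as a small async generator. This is an illustration under assumptions, not the code in `sse.py`: the `sse_event` signature shown here is guessed, `stream_frames` is a hypothetical name, and responses are modelled as plain dicts rather than `ResponseModel` objects.

```python
import asyncio
import json
from typing import Any, AsyncIterator, Callable


def sse_event(event: str, payload: Any) -> str:
    """Format one SSE frame: an event line, a data line, then a blank line."""
    return f"event: {event}\ndata: {json.dumps(payload)}\n\n"


async def stream_frames(
    responses: AsyncIterator[dict], process: Callable[[dict], Any]
) -> AsyncIterator[str]:
    """Yield status frames, then exactly one terminal data or error frame."""
    try:
        async for response in responses:
            if response["status"] == "COMPLETED":
                yield sse_event("data", process(response["data"]))
                return
            # Status frames carry the response without its data field.
            status = {k: v for k, v in response.items() if k != "data"}
            yield sse_event("status", status)
        yield sse_event("error", {"error": "stream ended before completion"})
    except Exception as exc:  # any failure becomes the terminal error frame
        yield sse_event("error", {"error": str(exc)})


async def fake_updates() -> AsyncIterator[dict]:
    # Hypothetical status sequence standing in for the real backend.
    yield {"id": "job-1", "status": "QUEUED", "data": None}
    yield {"id": "job-1", "status": "RUNNING", "data": None}
    yield {"id": "job-1", "status": "COMPLETED", "data": {"x": 1}}


async def failing_updates() -> AsyncIterator[dict]:
    yield {"id": "job-2", "status": "QUEUED", "data": None}
    raise RuntimeError("boom")


def demo_frames(source: Callable[[], AsyncIterator[dict]]) -> list[str]:
    async def run() -> list[str]:
        return [frame async for frame in stream_frames(source(), lambda raw: raw)]

    return asyncio.run(run())
```

Guaranteeing a terminal frame on every path lets the client treat a stream that closes without `data` or `error` as a transport failure.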

Frontend

  • New `workbench/_web/src/lib/runAndStream.ts` — a `fetch` + `ReadableStream` SSE parser. Dispatches `status` frames to `useWorkspace.setJobStatus`, resolves the promise on the `data` frame, throws on `error` or a premature end-of-stream.
  • `useLens2`, `useActivationPatching`, `usePrediction`, `useGenerate`, `useLensLine`, `useLensGrid` all now call `runAndStream` against the new endpoints.
  • `config.ts` collapses the old `start*` + `results*` entries into single `run*` entries and drops `ndifStatusUrl` — the browser no longer polls NDIF.
  • `lib/startAndPoll.ts` is deleted.
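
The client-side behaviour described for `runAndStream.ts` is written in TypeScript in the PR; the sketch below transliterates the same idea into Python to stay consistent with the backend examples. It is a simplified illustration, not the actual parser: `run_and_stream` and `StreamError` are hypothetical names, chunks are plain strings, and only the `event:` and `data:` fields are handled.

```python
import json
from typing import Any, Callable, Iterable


class StreamError(Exception):
    pass


def run_and_stream(chunks: Iterable[str], on_status: Callable[[dict], None]) -> Any:
    """Parse SSE text chunks; return the data payload or raise StreamError."""
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        # A frame is complete once its terminating blank line has arrived.
        while "\n\n" in buffer:
            frame, buffer = buffer.split("\n\n", 1)
            event, data_lines = "message", []
            for line in frame.split("\n"):
                if line.startswith("event:"):
                    event = line[len("event:"):].strip()
                elif line.startswith("data:"):
                    data_lines.append(line[len("data:"):].strip())
            if not data_lines:
                continue  # nothing to decode in this frame
            payload = json.loads("\n".join(data_lines))
            if event == "status":
                on_status(payload)  # e.g. forward to a job-status store
            elif event == "data":
                return payload
            elif event == "error":
                raise StreamError(payload.get("error", "unknown error"))
    raise StreamError("stream ended before a data frame arrived")


seen: list[dict] = []
result = run_and_stream(
    # Chunk boundaries deliberately fall in the middle of frames.
    ['event: status\ndata: {"status"', ': "QUEUED"}\n\nevent: data\nda', 'ta: {"x": 1}\n\n'],
    seen.append,
)
```

Buffering until a blank line arrives matters because network chunk boundaries need not line up with frame boundaries.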

Commit layout

  1. Add StreamingRemoteBackend + SSE helpers — `streaming_backend.py`, `sse.py`, `state.make_streaming_backend`.
  2. Migrate logit_lens + activation_patching routes to SSE — tool-based routes.
  3. Migrate models + lens routes to SSE — non-tool routes.
  4. Switch frontend from polling to SSE — `runAndStream` + rewire + delete `startAndPoll`.
  5. StreamingRemoteBackend: use parent async_submit_request, a cleanup once ndif-team/nnsight#648 (Split submit_request from handle_response; add async submit + get_response) is in.

Live verification

Smoke-tested end-to-end against api.ndif.us with `openai-community/gpt2`: `POST /logit_lens/run` streams RECEIVED → QUEUED → DISPATCHED → RUNNING status frames, then one `data` frame carrying the full `LogitLensData`. No errors.

Test plan

  • Local mode (`REMOTE=false`): logit-lens, activation-patching, prediction, generate, lens line, lens grid each return their expected payload via a single `data` SSE event.
  • Remote mode (`REMOTE=true`): logit-lens on gpt2 streams status frames and delivers a final data frame matching the pre-change output.
  • Other five remote flows verified end-to-end.
  • Remote mode: simulated NDIF error propagates as an `event: error` frame and surfaces on the client.
  • Browser network tab shows one long-lived request per run (no polling loop, no direct calls to `api.ndif.us`).
  • Chart thumbnails still upload on success (Lens2 path doesn't require the capture; V1 lens path does).

Add StreamingRemoteBackend + SSE helpers

StreamingRemoteBackend is a subclass of nnsight's RemoteBackend that
defers submission and status-waiting so the caller can forward each
status update to the browser over Server-Sent Events.

- __call__ (sync, fired from trace/session __exit__) captures the
  tracer and serializes the request; no I/O.
- __aiter__ opens an async WebSocket, stamps the session id into the
  headers, async-POSTs the submit via httpx.AsyncClient, and yields
  ResponseModels from the socket as they arrive.
- On COMPLETED: download the result (replacing response.data with the
  decoded dict), yield once more, stop.
- On ERROR: yield, then raise RemoteException.

workbench/_api/sse.py factors the shared frame formatter, the async
generator that drives the backend, and small helpers for single-value
local-mode streams and error-only streams.

state.py exposes the new backend via make_streaming_backend(model);
make_backend is kept so any remaining legacy callers still compile.

Migrate logit_lens + activation_patching routes to SSE

Replace the old /start + /results/{job_id} pair for each tool with a
single POST /run endpoint that forwards each NDIF status event to the
client as it arrives and emits one final data event carrying the
formatted ToolData. The browser no longer polls NDIF for status.

The endpoint flow:

    backend = state.make_streaming_backend(model=model)
    tool._run(model, ..., remote=True, backend=backend)
    async for response in backend:
        if response.status == COMPLETED:
            raw = response.data        # dict of save-keyed tensors
            raw.update({tokenizer, input_tokens, model_name, ...})
            yield data event (tool._format(raw, ...))
        else:
            yield status event

Migrate models + lens routes to SSE

Convert the remaining polling endpoints (/models/start-prediction,
/models/start-generate, /lens/start-line, /lens/start-grid) and
their companion /results-*/{job_id} routes into single /run-* SSE
endpoints using the same pattern as the tool routes.

These routes don't use the nnsightful Tool class, so each one splits
its existing function into a _trace_* (runs model.trace / model.generate
and saves outputs) and a _format_* (builds the response from the raw
dict or the live tensors returned locally). In remote mode, the
routes iterate the streaming backend; in local mode they call the
trace directly and emit a single data event.

Existing telemetry milestones are preserved at STARTED / 403 ERROR;
the READY / COMPLETE milestones previously logged against the job id
are dropped because the id isn't assigned until iteration begins.

Switch frontend from polling to SSE

Add lib/runAndStream.ts, a fetch + ReadableStream SSE client that
parses status/data/error frames, forwards job status to
useWorkspace.setJobStatus, resolves on the data frame, and throws on
the error frame or a stream that ends without one.

Repoint every API hook at the new /run-* endpoints and remove the
polling helper:

- lensApi.ts                useLens2              → /logit_lens/run
- activationPatchingApi.ts  useActivationPatching → /activation_patching/run
- modelsApi.ts              usePrediction         → /models/run-prediction
                            useGenerate           → /models/run-generate
- chartApi.ts               useLensLine           → /lens/run-line
                            useLensGrid           → /lens/run-grid

config.ts: collapse the start* + results* endpoint pairs into single
run* entries and drop ndifStatusUrl; the browser no longer talks to
NDIF directly. lib/startAndPoll.ts is deleted.

vercel bot (Contributor) commented Apr 20, 2026

The latest updates on your projects.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| workbench | Ready | Preview, Comment | Apr 20, 2026 9:07pm |

StreamingRemoteBackend: use parent async_submit_request

The duplicated _async_submit in StreamingRemoteBackend existed only
because the parent RemoteBackend didn't expose an async submit. With
ndif-team/nnsight#648 merged, we can delete it and call
super().async_submit_request() directly. The job_id bookkeeping
(self.job_id = response.id) is then handled by the parent rather than
open-coded here.

Depends on ndif-team/nnsight#648.