Switch remote backend from polling to streaming SSE#112
Open
JadenFiotto-Kaufman wants to merge 7 commits into
Open
Switch remote backend from polling to streaming SSE#112JadenFiotto-Kaufman wants to merge 7 commits into
JadenFiotto-Kaufman wants to merge 7 commits into
Conversation
StreamingRemoteBackend is a subclass of nnsight's RemoteBackend that defers submission and status-waiting so the caller can forward each status update to the browser over Server-Sent Events. - __call__ (sync, fired from trace/session __exit__) captures the tracer and serializes the request; no I/O. - __aiter__ opens an async WebSocket, stamps the session id into the headers, async-POSTs the submit via httpx.AsyncClient, and yields ResponseModels from the socket as they arrive. - On COMPLETED: download the result (replacing response.data with the decoded dict), yield once more, stop. - On ERROR: yield, then raise RemoteException. workbench/_api/sse.py factors the shared frame formatter, the async generator that drives the backend, and small helpers for single-value local-mode streams and error-only streams. state.py exposes the new backend via make_streaming_backend(model); make_backend is kept so any remaining legacy callers still compile.
Replace the old /start + /results/{job_id} pair for each tool with a
single POST /run endpoint that streams status events from NDIF
directly to the client and emits one final data event carrying the
formatted ToolData. The browser no longer polls NDIF for status.
The endpoint flow:
backend = state.make_streaming_backend(model=model)
tool._run(model, ..., remote=True, backend=backend)
async for response in backend:
if response.status == COMPLETED:
raw = response.data # dict of save-keyed tensors
raw.update({tokenizer, input_tokens, model_name, ...})
yield data event (tool._format(raw, ...))
else:
yield status event
Convert the remaining polling endpoints — /models/start-prediction,
/models/start-generate, /lens/start-line, /lens/start-grid — and
their companion /results/{job_id} routes into single /run-* SSE
endpoints using the same pattern as the tool routes.
These routes don't use the nnsightful Tool class, so each one splits
its existing function into a _trace_* (runs model.trace / model.generate
and saves outputs) and a _format_* (builds the response from the raw
dict or the live tensors returned locally). In remote mode, the
routes iterate the streaming backend; in local mode they call the
trace directly and emit a single data event.
Existing telemetry milestones are preserved at STARTED / 403 ERROR;
the READY / COMPLETE milestones previously logged against the job id
are dropped because the id isn't assigned until iteration begins.
Add lib/runAndStream.ts — a fetch + ReadableStream SSE client that
parses status/data/error frames, forwards job status to
useWorkspace.setJobStatus, resolves on the data frame, and throws on
the error frame or a stream that ends without one.
Repoint every API hook at the new /run-* endpoints and remove the
polling helper:
- lensApi.ts useLens2 → /logit_lens/run
- activationPatchingApi.ts useActivationPatching → /activation_patching/run
- modelsApi.ts usePrediction → /models/run-prediction
useGenerate → /models/run-generate
- chartApi.ts useLensLine → /lens/run-line
useLensGrid → /lens/run-grid
config.ts: collapse the start* + results* endpoint pairs into single
run* entries and drop ndifStatusUrl; the browser no longer talks to
NDIF directly. lib/startAndPoll.ts is deleted.
Contributor
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
Merged
5 tasks
The duplicated _async_submit in StreamingRemoteBackend existed only because the parent RemoteBackend didn't expose an async submit. With ndif-team/nnsight#648 merged, we can delete it and call super().async_submit_request() directly — which also means the job_id bookkeeping (self.job_id = response.id) is handled by the parent, not open-coded here. Depends on ndif-team/nnsight#648.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Replace the old two-step polling flow (HTTP submit → browser polls NDIF → HTTP fetch-results) with a single SSE endpoint per interpretability tool. The FastAPI route opens one stream to the client, forwards each NDIF status update as it arrives, downloads the final result inside the backend, formats it, and emits a single terminal `data` event. The browser no longer talks to NDIF directly.
Depends on ndif-team/nnsight#648 (splits `submit_request` / `handle_response` and adds `async_submit_request`) and AdamBelfki3/nnsightful#2 (tools return the backend when provided).
How it works
`workbench/_api/streaming_backend.py::StreamingRemoteBackend` subclasses nnsight's `RemoteBackend`:
`workbench/_api/sse.py` holds the shared SSE helpers (`sse_event`, `stream_backend`, `stream_value`, `stream_error`). Each route then reduces to:
```python
backend = state.make_streaming_backend(model=model)
tool._run(model, ..., remote=True, backend=backend) # primes backend
def process(raw):
raw["tokenizer"] = ... # local context
return tool._format(raw, ...) # -> ToolData / BaseModel
return StreamingResponse(stream_backend(backend, process), media_type=MEDIA_TYPE)
```
For non-`Tool` routes (predictions, generation, lens line/grid) the same pattern applies — each route factors its inline `model.trace` into a `trace` and `format` pair.
Endpoints
Collapsed pairs into single `/run*` SSE routes:
Each stream emits N×`event: status` frames (with the `ResponseModel` minus its `data` field) followed by one terminal frame — either `event: data` with the JSON-encoded payload or `event: error` with an `{error}` object.
Frontend
Commit layout
Live verification
Smoke-tested end-to-end against api.ndif.us with `openai-community/gpt2`: `POST /logit_lens/run` streams RECEIVED → QUEUED → DISPATCHED → RUNNING status frames, then one `data` frame carrying the full `LogitLensData`. No errors.
Test plan