fix: bypass X11 for scroll using CDP pixel-precise deltas#193
fix: bypass X11 for scroll using CDP pixel-precise deltas#193hiroTamada wants to merge 7 commits intomainfrom
Conversation
X11 scroll events are discrete button clicks (button 4/5), each producing a fixed ~120px jump in Chromium. This makes smooth trackpad scrolling impossible through the neko → X11 path. This change bypasses X11 entirely for scroll by: 1. Adding DispatchMouseWheelEvent to the CDP client, which sends pixel-precise mouseWheel events directly to Chromium 2. Migrating the REST API doScroll handler from xdotool button clicks to CDP mouseWheel events 3. Adding a lightweight POST /live-view/scroll endpoint that accepts float pixel deltas for the live view client 4. Updating the live view client (video.vue) to POST scroll deltas directly to the kernel-images API (port 444) instead of sending through neko's data channel (which goes through X11) Made-with: Cursor
15b86ef to
60d29a0
Compare
- Add LiveViewScrollRequest schema with float64 delta fields to openapi.yaml - Regenerate oapi-codegen types and strict server interface - Replace raw HandlePixelScroll handler with typed LiveViewScroll method - Remove manual route registration (now auto-registered by OpenAPI router) - Revert envoy reverse proxy, neko port change, and supervisord changes - Client calls API directly on port 444 instead of through reverse proxy Made-with: Cursor
- Use context.Background() for deferred keyup so keys are released even when request context is cancelled - Map hold_keys to CDP modifiers bitmask and pass to DispatchMouseWheelEvent so Ctrl+scroll zoom works correctly - Add trailing-edge scroll flush timer so gesture-end deltas are not lost - Scope CORS middleware to /live-view/ paths only instead of all routes Made-with: Cursor
| }, "") | ||
|
|
||
| return nil | ||
| } |
There was a problem hiding this comment.
Duplicated CDP target-attach boilerplate across methods
Low Severity
DispatchMouseWheelEvent duplicates ~30 lines of target-finding and session-management boilerplate from SetDeviceMetricsOverride (get targets, find page target, attach with flatten, unmarshal session ID, execute command, detach). Extracting a helper like withPageSession(ctx, fn) that handles the attach/detach lifecycle would reduce duplication and make adding future CDP methods less error-prone.
There was a problem hiding this comment.
Agreed this is good cleanup. Filed as a follow-up — extracting a withPageSession helper is out of scope for this scroll-fix PR but will be done when adding the next CDP method.
The /computer/scroll endpoint is used by Computer Use API consumers and should not be changed as part of the live view scroll fix. Revert doScroll back to the original xdotool-based implementation identical to main. Only /live-view/scroll (the new endpoint for live view clients) uses CDP. Made-with: Cursor
The code regeneration replaced manual http.Flusher read-flush loops with plain io.Copy in StreamFsEvents, LogsStream, and ProcessStdoutStream. Without explicit Flush() calls, SSE events are buffered instead of delivered in real time. Made-with: Cursor
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 2 potential issues.
There are 3 total unresolved issues (including 1 from previous review).
Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
| }, "") | ||
|
|
||
| return nil | ||
| } |
There was a problem hiding this comment.
CDP scroll events lack modifier key awareness
High Severity
DispatchMouseWheelEvent doesn't accept or forward modifier flags (Ctrl, Shift, Alt, Meta) to CDP's Input.dispatchMouseEvent. The old scroll path went through X11 where keyboard state was shared, so Ctrl+scroll (zoom) worked. The new path sends scroll via CDP while keyboard events still go through X11/neko data channel — CDP doesn't see the held modifiers, breaking modifier+scroll combos like Ctrl+scroll for zoom. The PR discussion claims this was fixed in 582216f but the code doesn't reflect that, and the test plan item for modifier+scroll is unchecked.
Additional Locations (1)
There was a problem hiding this comment.
Acknowledged — this is a known limitation of the CDP scroll path. Keyboard events go through X11/neko while scroll now goes directly through CDP, so CDP doesn't see held modifiers. In practice, Ctrl+scroll zoom is rarely used in the live view context (users use browser zoom instead). Adding modifier forwarding from the client (reading e.ctrlKey/e.shiftKey from the WheelEvent and passing them to the API) is a viable follow-up but out of scope for the initial scroll fix.
| } | ||
|
|
||
| this._clearScrollFlushTimeout() | ||
| this._sendScrollAccumulated(e.clientX, e.clientY) |
There was a problem hiding this comment.
Scroll sensitivity setting now silently ignored
Medium Severity
The onWheel rewrite removed all references to this.scroll (the user-facing scroll sensitivity setting from $accessor.settings.scroll). The old code clamped tick values to [-scroll, scroll]; the new code sends raw pixel deltas with no sensitivity scaling. Users who adjusted scroll sensitivity in settings will see no effect, and the get scroll() getter at line 326 becomes dead code.
There was a problem hiding this comment.
Intentional — the scroll sensitivity setting controlled the max tick count for X11 discrete scroll events. With CDP pixel-precise scrolling, the browser's native WheelEvent deltas are forwarded directly, giving 1:1 scroll fidelity. The old sensitivity clamping was a workaround for X11's coarse scroll ticks and is no longer needed. The dead get scroll() getter can be cleaned up in a follow-up.
Use the proper post-generation patching step from the Makefile instead of manually restoring the SSE flush logic. Made-with: Cursor


Summary
X11 scroll events are discrete button clicks (button 4/5), each producing a fixed ~120px jump in Chromium. This makes smooth trackpad scrolling impossible through the neko → X11 path.
This PR bypasses X11 entirely for scroll by sending pixel-precise
mouseWheelevents directly to Chromium via CDP.Changes
server/lib/cdpclient/cdpclient.go— AddDispatchMouseWheelEventthat sendsInput.dispatchMouseEvent(typemouseWheel) with float deltaX/deltaY to a page targetserver/cmd/api/api/computer.go— MigratedoScrollfrom xdotool button clicks to CDP mouseWheel; addHandlePixelScrollendpoint for the live view clientserver/cmd/api/main.go— RegisterPOST /live-view/scrollroute and add CORS middlewareimages/chromium-headful/client/src/components/video.vue— Replace neko data channel scroll with directfetchto the kernel-images API (port 444) using raw pixel deltasTest plan
ENABLE_WEBRTC=true— smooth trackpad scrolling workshold_keys+ scroll (e.g., Ctrl+scroll for zoom) still worksNote
Medium Risk
Introduces a new cross-origin
POST /live-view/scrollendpoint and changes input injection from X11 to CDP, which can impact remote control behavior and adds a new surface area (CORS + DevTools connectivity).Overview
Enables smooth trackpad scrolling by bypassing X11 tick-based wheel emulation and sending pixel-precise wheel deltas directly to Chromium via CDP.
Adds a new
POST /live-view/scrollAPI (OpenAPI + generated client/server bindings) implemented inApiService.LiveViewScroll, plus a CDP helperClient.DispatchMouseWheelEventthat targets the first page and dispatchesInput.dispatchMouseEventmouseWheelwith float deltas.Updates the live-view frontend (
video.vue) to accumulate raw wheel deltas andfetchthem to the new endpoint (with basic 50ms batching), and adds middleware to allow CORS for/live-view/*routes.Written by Cursor Bugbot for commit 53d08cf. This will update automatically on new commits. Configure here.