
fix: bypass X11 for scroll using CDP pixel-precise deltas#193

Open
hiroTamada wants to merge 7 commits into main from hiro/cdp-scroll-bypass-x11

hiroTamada (Contributor) commented Mar 27, 2026

Summary

X11 scroll events are discrete button clicks (button 4/5), each producing a fixed ~120px jump in Chromium. This makes smooth trackpad scrolling impossible through the neko → X11 path.

This PR bypasses X11 entirely for scroll by sending pixel-precise mouseWheel events directly to Chromium via CDP.

Changes

  • server/lib/cdpclient/cdpclient.go — Add DispatchMouseWheelEvent that sends Input.dispatchMouseEvent (type mouseWheel) with float deltaX/deltaY to a page target
  • server/cmd/api/api/computer.go — Migrate doScroll from xdotool button clicks to CDP mouseWheel; add HandlePixelScroll endpoint for the live view client
  • server/cmd/api/main.go — Register POST /live-view/scroll route and add CORS middleware
  • images/chromium-headful/client/src/components/video.vue — Replace neko data channel scroll with direct fetch to the kernel-images API (port 444) using raw pixel deltas

Test plan

  • Tested locally in Docker with ENABLE_WEBRTC=true — smooth trackpad scrolling works
  • Deployed as unikernel to KraftCloud dev metro — confirmed smooth scroll
  • Verify scroll works in headless mode (direct API calls)
  • Verify hold_keys + scroll (e.g., Ctrl+scroll for zoom) still works

Note

Medium Risk
Introduces a new cross-origin POST /live-view/scroll endpoint and changes input injection from X11 to CDP, which can impact remote control behavior and adds a new surface area (CORS + DevTools connectivity).

Overview
Enables smooth trackpad scrolling by bypassing X11 tick-based wheel emulation and sending pixel-precise wheel deltas directly to Chromium via CDP.

Adds a new POST /live-view/scroll API (OpenAPI + generated client/server bindings) implemented in ApiService.LiveViewScroll, plus a CDP helper Client.DispatchMouseWheelEvent that targets the first page and dispatches Input.dispatchMouseEvent mouseWheel with float deltas.

Updates the live-view frontend (video.vue) to accumulate raw wheel deltas and POST them to the new endpoint with fetch (with basic 50ms batching), and adds middleware to allow CORS for /live-view/* routes.

Written by Cursor Bugbot for commit 53d08cf.

X11 scroll events are discrete button clicks (button 4/5), each producing
a fixed ~120px jump in Chromium. This makes smooth trackpad scrolling
impossible through the neko → X11 path.

This change bypasses X11 entirely for scroll by:

1. Adding DispatchMouseWheelEvent to the CDP client, which sends
   pixel-precise mouseWheel events directly to Chromium
2. Migrating the REST API doScroll handler from xdotool button clicks
   to CDP mouseWheel events
3. Adding a lightweight POST /live-view/scroll endpoint that accepts
   float pixel deltas for the live view client
4. Updating the live view client (video.vue) to POST scroll deltas
   directly to the kernel-images API (port 444) instead of sending
   through neko's data channel (which goes through X11)

Made-with: Cursor
hiroTamada force-pushed the hiro/cdp-scroll-bypass-x11 branch from 15b86ef to 60d29a0 on March 27, 2026 20:42
- Add LiveViewScrollRequest schema with float64 delta fields to openapi.yaml
- Regenerate oapi-codegen types and strict server interface
- Replace raw HandlePixelScroll handler with typed LiveViewScroll method
- Remove manual route registration (now auto-registered by OpenAPI router)
- Revert envoy reverse proxy, neko port change, and supervisord changes
- Client calls API directly on port 444 instead of through reverse proxy

Made-with: Cursor
- Use context.Background() for deferred keyup so keys are released even
  when request context is cancelled
- Map hold_keys to CDP modifiers bitmask and pass to
  DispatchMouseWheelEvent so Ctrl+scroll zoom works correctly
- Add trailing-edge scroll flush timer so gesture-end deltas are not lost
- Scope CORS middleware to /live-view/ paths only instead of all routes

Made-with: Cursor
}, "")

return nil
}

Duplicated CDP target-attach boilerplate across methods

Low Severity

DispatchMouseWheelEvent duplicates ~30 lines of target-finding and session-management boilerplate from SetDeviceMetricsOverride (get targets, find page target, attach with flatten, unmarshal session ID, execute command, detach). Extracting a helper like withPageSession(ctx, fn) that handles the attach/detach lifecycle would reduce duplication and make adding future CDP methods less error-prone.

Contributor Author

Agreed, this is good cleanup. Filed as a follow-up: extracting a withPageSession helper is out of scope for this scroll-fix PR but will be done when adding the next CDP method.

hiroTamada requested review from Sayan- and rgarcia on March 30, 2026 17:02
The /computer/scroll endpoint is used by Computer Use API consumers and
should not be changed as part of the live view scroll fix. Revert doScroll
back to the original xdotool-based implementation identical to main.

Only /live-view/scroll (the new endpoint for live view clients) uses CDP.

Made-with: Cursor
The code regeneration replaced manual http.Flusher read-flush loops with
plain io.Copy in StreamFsEvents, LogsStream, and ProcessStdoutStream.
Without explicit Flush() calls, SSE events are buffered instead of
delivered in real time.

Made-with: Cursor
cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 2 potential issues.

There are 3 total unresolved issues (including 1 from previous review).

}, "")

return nil
}

CDP scroll events lack modifier key awareness

High Severity

DispatchMouseWheelEvent doesn't accept or forward modifier flags (Ctrl, Shift, Alt, Meta) to CDP's Input.dispatchMouseEvent. The old scroll path went through X11 where keyboard state was shared, so Ctrl+scroll (zoom) worked. The new path sends scroll via CDP while keyboard events still go through X11/neko data channel — CDP doesn't see the held modifiers, breaking modifier+scroll combos like Ctrl+scroll for zoom. The PR discussion claims this was fixed in 582216f but the code doesn't reflect that, and the test plan item for modifier+scroll is unchecked.

Contributor Author

Acknowledged: this is a known limitation of the CDP scroll path. Keyboard events go through X11/neko while scroll now goes directly through CDP, so CDP doesn't see held modifiers. In practice, Ctrl+scroll zoom is rarely used in the live view context (users use browser zoom instead). Adding modifier forwarding from the client (reading e.ctrlKey/e.shiftKey from the WheelEvent and passing them to the API) is a viable follow-up but out of scope for the initial scroll fix.
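If that follow-up is done, the client could send the WheelEvent's altKey, ctrlKey, metaKey and shiftKey flags with each scroll request, and the server would fold them into the modifiers bitmask of Input.dispatchMouseEvent (Alt=1, Ctrl=2, Meta/Command=4, Shift=8 per the CDP Input domain). The wheelModifiers request shape and the cdpModifiers name below are hypothetical, and whether Chromium performs Ctrl+wheel zoom for a CDP-dispatched event was not verified in this PR.

```go
package main

import "fmt"

// CDP Input.dispatchMouseEvent modifier bits, as defined by the protocol.
const (
	modAlt   = 1
	modCtrl  = 2
	modMeta  = 4 // Command on macOS
	modShift = 8
)

// wheelModifiers is a hypothetical shape for the flags the live view client
// would read from a WheelEvent and send along with the deltas.
type wheelModifiers struct {
	Alt, Ctrl, Meta, Shift bool
}

// cdpModifiers converts the flags to the bitmask CDP expects.
func cdpModifiers(m wheelModifiers) int {
	mask := 0
	if m.Alt {
		mask |= modAlt
	}
	if m.Ctrl {
		mask |= modCtrl
	}
	if m.Meta {
		mask |= modMeta
	}
	if m.Shift {
		mask |= modShift
	}
	return mask
}

func main() {
	fmt.Println(cdpModifiers(wheelModifiers{Ctrl: true}))              // Ctrl+scroll
	fmt.Println(cdpModifiers(wheelModifiers{Ctrl: true, Shift: true})) // Ctrl+Shift+scroll
}
```

Taking the flags from the WheelEvent itself sidesteps the X11/CDP split entirely, because the state travels in the same request as the deltas.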

}

this._clearScrollFlushTimeout()
this._sendScrollAccumulated(e.clientX, e.clientY)

Scroll sensitivity setting now silently ignored

Medium Severity

The onWheel rewrite removed all references to this.scroll (the user-facing scroll sensitivity setting from $accessor.settings.scroll). The old code clamped tick values to [-scroll, scroll]; the new code sends raw pixel deltas with no sensitivity scaling. Users who adjusted scroll sensitivity in settings will see no effect, and the get scroll() getter at line 326 becomes dead code.

Contributor Author

Intentional: the scroll sensitivity setting controlled the max tick count for X11 discrete scroll events. With CDP pixel-precise scrolling, the browser's native WheelEvent deltas are forwarded directly, giving 1:1 scroll fidelity. The old sensitivity clamping was a workaround for X11's coarse scroll ticks and is no longer needed. The dead get scroll() getter can be cleaned up in a follow-up.

Use the proper post-generation patching step from the Makefile instead
of manually restoring the SSE flush logic.

Made-with: Cursor