Feature Request: browser automation plugin (kimi-browser) via MCP #945

@tomsen-ai

Motivation

Kimi Code currently has excellent tools for reading and modifying local code, plus WebSearch and FetchURL for text-based web access. However, neither the built-in tools nor any official plugin can drive a web browser: navigate to a page, take a screenshot, click an element, fill a form, or verify a local web app in a headless browser.

Other coding agents are moving in this direction:

  • Codex ships a Computer Use plugin that can operate desktop apps and browsers (screen recording + accessibility permissions).
  • Claude exposes a native computer_use tool that combines screenshots with coordinate-based actions.
  • The community has produced MCP servers such as microsoft/playwright-mcp that expose browser control to agents.

For many dev tasks, a lightweight headless-browser capability is enough and avoids the heavy permissions of full desktop control:

  • Verify a checkout page after local changes.
  • Screenshot a UI component rendered in Storybook.
  • Fill out a form to reproduce a bug.
  • Scrape dynamic content that FetchURL cannot retrieve.

Proposal

Add an official Kimi Code plugin, tentatively named kimi-browser, that exposes browser automation tools through an MCP server backed by Playwright (or Puppeteer).

Plugin location

plugins/official/kimi-browser/
├── kimi.plugin.json
├── SKILL.md
└── bin/
    └── kimi-browser.mjs   # MCP server entry

Exposed MCP tools (initial set)

| Tool | Purpose |
| --- | --- |
| mcp__kimi-browser__navigate | Open a URL in a headless browser context. |
| mcp__kimi-browser__screenshot | Capture the current viewport or a specific element. |
| mcp__kimi-browser__click | Click an element by selector or coordinates. |
| mcp__kimi-browser__type | Type text into an input field. |
| mcp__kimi-browser__scroll | Scroll the page or an element. |
| mcp__kimi-browser__evaluate | Run a JS snippet in the page context and return the result. |
| mcp__kimi-browser__close | Close the browser context and release resources. |
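
To make the naming scheme and the "selector or coordinates" rule for click concrete, here is a minimal sketch in plain JavaScript. Everything in it is an assumption for discussion: the helper names (`qualifiedToolName`, `validateClickArgs`) and the argument names (`selector`, `x`, `y`) are hypothetical and not part of any existing Kimi Code or MCP API.

```javascript
// Hypothetical sketch: names and argument shapes are assumptions, not an existing API.
const SERVER_NAME = "kimi-browser";

// The initial tool set proposed in the table above.
const TOOL_NAMES = ["navigate", "screenshot", "click", "type", "scroll", "evaluate", "close"];

// Build the fully qualified tool name, e.g. "mcp__kimi-browser__navigate".
function qualifiedToolName(tool) {
  return `mcp__${SERVER_NAME}__${tool}`;
}

// click takes either a selector or a coordinate pair; reject both-or-neither
// so the target of a click is never ambiguous.
function validateClickArgs(args) {
  const hasSelector = typeof args.selector === "string" && args.selector.length > 0;
  const hasPoint = Number.isFinite(args.x) && Number.isFinite(args.y);
  if (hasSelector === hasPoint) {
    return { ok: false, error: "provide either a selector or x/y coordinates" };
  }
  const target = hasSelector ? { selector: args.selector } : { x: args.x, y: args.y };
  return { ok: true, target };
}
```

Under these assumptions, `validateClickArgs({ selector: "#buy" })` is accepted, while `validateClickArgs({})` and a call that passes both a selector and coordinates are rejected before any browser action runs.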

How it fits Kimi Code

  • No core changes: the plugin only declares an MCP server in its manifest, matching the existing kimi-datasource pattern.
  • Reuses existing permission model: users approve mcp__kimi-browser__* calls just like any other MCP tool.
  • Optional install: users who do not need browser automation can simply not install it.
  • Cross-platform: headless Playwright works on macOS, Linux, and Windows without requiring screen-recording or accessibility permissions.
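
As an illustration of the approval point above, the sketch below shows how a wildcard such as mcp__kimi-browser__* could be matched against tool names. It assumes that `*` means "any run of characters"; how Kimi Code actually evaluates approval patterns is not confirmed here, and `toolPatternToRegExp` and `isToolCovered` are hypothetical helpers.

```javascript
// Hypothetical sketch: assumes "*" in an approval pattern matches any characters.
function toolPatternToRegExp(pattern) {
  // Escape regex metacharacters except "*", then expand "*" to ".*".
  const escaped = pattern.replace(/[.+?^${}()|[\]\\]/g, "\\$&");
  return new RegExp(`^${escaped.replace(/\*/g, ".*")}$`);
}

// True when the tool name falls under the given approval pattern.
function isToolCovered(pattern, toolName) {
  return toolPatternToRegExp(pattern).test(toolName);
}
```

With this reading, `isToolCovered("mcp__kimi-browser__*", "mcp__kimi-browser__click")` is true, while a tool from another server such as `mcp__kimi-datasource__query` (a made-up tool name) is not covered.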

Security considerations

  • Browser automation can interact with signed-in sessions and external sites, so all tool calls should require explicit approval by default.
  • The plugin should default to headless mode and isolate each session (clean context, no persistent cookies unless configured).
  • Network access should respect the user environment; the plugin should not bypass proxy or firewall settings.
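
The defaults above could be resolved along the following lines. This is a sketch only: `resolveSessionOptions` and the `persistCookies` option are hypothetical, and reading the proxy from the conventional `HTTPS_PROXY` / `HTTP_PROXY` environment variables is an assumption about what "respect the user environment" would mean in practice. The `{ server }` shape mirrors Playwright's `proxy` launch option, but the code does not call Playwright.

```javascript
// Hypothetical sketch: option names and environment handling are assumptions.
function resolveSessionOptions(userOptions = {}, env = process.env) {
  // Pick up the user's proxy from the conventional environment variables.
  const proxyUrl =
    env.HTTPS_PROXY || env.https_proxy || env.HTTP_PROXY || env.http_proxy || null;

  return {
    // Headless unless the user explicitly opts out.
    headless: userOptions.headless ?? true,
    // Fresh context per session; cookies persist only when configured.
    persistCookies: userOptions.persistCookies ?? false,
    // Pass the proxy through rather than bypassing it.
    proxy: proxyUrl ? { server: proxyUrl } : undefined,
  };
}
```

Calling `resolveSessionOptions()` with no arguments yields a headless, non-persistent session, so the safe behavior is what users get without any configuration.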

Scope questions for maintainers

Before I start implementing a PR, I would like to confirm a few things:

  1. Is an official kimi-browser plugin aligned with Kimi Code’s roadmap, or is browser/computer-use automation being handled differently?
  2. Should the plugin ship its own browser binary via playwright install, or should it expect the user to have Chromium/Chrome already installed?
  3. Are there naming conventions or manifest requirements for official plugins beyond what kimi-datasource demonstrates?
  4. Would the team prefer a minimal initial PR (e.g., navigate + screenshot only) or a more complete tool set from the start?

I am happy to iterate on the design and provide a proof-of-concept once the direction is confirmed.
