feat: MCP async task support for long-running ingest operations #485

@RobertLD

Summary

The MCP November 2025 spec update added first-class async task support. libscope's MCP server currently runs all operations synchronously: ingest, re-index, and connector sync each block the client for their full duration. With large document sets or slow embedding models, these calls can time out or degrade the MCP client UX. Adopting async tasks makes these operations non-blocking and observable.

Why

  • Indexing a large document or running a connector sync can take 10–60+ seconds with a local embedding model. Blocking the MCP transport for that long makes for a poor client experience.
  • The MCP spec now provides tasks/create, tasks/get, and tasks/cancel primitives for exactly this use case.
  • Async ingest would also enable progress reporting (chunks processed, embeddings queued) which is currently invisible to MCP clients.
  • This is spec compliance work, not a new concept — the design is prescribed by the protocol.

Affected MCP tools

These tools should become async-capable:

| Tool | Current | Target |
| --- | --- | --- |
| index_document | sync | async (returns task ID immediately) |
| reindex_library | sync | async |
| sync_connector | sync | async |
| install_pack | sync | async |

Tools that are inherently fast (search, get_document, list_documents, rate_chunk) remain synchronous.

Proposed design

Task lifecycle

client calls index_document(url, content)
  → MCP server returns { taskId: "task_abc123", status: "queued" }
  → client polls tasks/get(taskId) or subscribes to task events
  → server streams progress: { status: "running", progress: { current: 12, total: 45 } }
  → server completes: { status: "done", result: { documentId: "doc_xyz", chunkCount: 45 } }
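
The lifecycle above implies a small state machine. The sketch below is one possible reading of it, not part of the proposal: the status names come from the Task interface in this issue, while the transition table and the helper names `canTransition` and `isTerminal` are assumptions made for illustration.

```typescript
// Status values copied from the Task interface proposed in this issue.
type TaskStatus = "queued" | "running" | "done" | "failed" | "cancelled";

// Assumed transition table: which statuses may follow each status.
// "done", "failed", and "cancelled" are treated as terminal here.
const NEXT: Record<TaskStatus, readonly TaskStatus[]> = {
  queued: ["running", "cancelled"],
  running: ["done", "failed", "cancelled"],
  done: [],
  failed: [],
  cancelled: [],
};

// True when a task in `from` may move directly to `to`.
function canTransition(from: TaskStatus, to: TaskStatus): boolean {
  return NEXT[from].includes(to);
}

// True when no further transitions are allowed from `status`.
function isTerminal(status: TaskStatus): boolean {
  return NEXT[status].length === 0;
}
```

A client polling tasks/get could stop once `isTerminal(status)` returns true, and the server could use `canTransition` to reject updates to tasks that have already finished.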

Implementation approach

  1. Add a lightweight in-memory task queue (src/mcp/tasks.ts) tracking status, progress, and result per task ID.
  2. Wrap long-running tool handlers to enqueue work and return a task reference immediately.
  3. Register tasks/get and tasks/cancel MCP handlers.
  4. Emit progress updates via MCP notifications during indexing (hook into src/core/indexing.ts progress callbacks).
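
As a rough illustration of steps 1, 2, and 4, the sketch below wraps a long-running handler so the caller receives a task reference before any work starts. Everything named here (`startTask`, `TaskRecord`, `Report`, `Notify`, the counter-based ID) is hypothetical; in particular, `notify` stands in for whatever MCP notification mechanism the server ends up using, and cancellation only stops results from being recorded rather than interrupting the work.

```typescript
type TaskStatus = "queued" | "running" | "done" | "failed" | "cancelled";

interface TaskRecord {
  id: string;
  tool: string;
  status: TaskStatus;
  progress?: { current: number; total: number };
  result?: unknown;
  error?: string;
}

// Callback a long-running handler uses to report progress.
type Report = (current: number, total: number) => void;

// Hypothetical hook: a real server would send an MCP notification here.
type Notify = (task: TaskRecord) => void;

let nextId = 0; // placeholder ID scheme, for illustration only

function startTask(
  tasks: Map<string, TaskRecord>,
  tool: string,
  work: (report: Report) => Promise<unknown>,
  notify: Notify = () => {},
): { taskId: string; status: TaskStatus } {
  const task: TaskRecord = { id: `task_${++nextId}`, tool, status: "queued" };
  tasks.set(task.id, task);

  const report: Report = (current, total) => {
    if (task.status !== "running") return; // ignore reports after cancel or finish
    task.progress = { current, total };
    notify(task);
  };

  // Defer the work to a microtask so the caller gets the reference first.
  Promise.resolve()
    .then(() => {
      if (task.status !== "queued") return; // cancelled before it started
      task.status = "running";
      return work(report).then((result) => {
        if (task.status === "running") {
          task.status = "done";
          task.result = result;
        }
      });
    })
    .catch((err: unknown) => {
      if (task.status === "running") {
        task.status = "failed";
        task.error = err instanceof Error ? err.message : String(err);
      }
    })
    .then(() => notify(task)); // final notification with the settled status

  return { taskId: task.id, status: task.status };
}
```

A tool handler would call `startTask(tasks, "index_document", (report) => indexWork(report))`, where `indexWork` is a placeholder for the existing indexing routine, and return the resulting reference to the client.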

Task store interface

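interface Task {
  id: string; // unique task identifier returned to the client
  tool: string; // name of the MCP tool that created the task
  status: "queued" | "running" | "done" | "failed" | "cancelled";
  progress?: { current: number; total: number; message?: string }; // set while work is in flight
  result?: unknown; // tool output once the task is done
  error?: string; // failure description when the task has failed
  createdAt: string; // timestamp of task creation
  updatedAt: string; // timestamp of the most recent change
}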

Tasks are kept in memory only, so they do not survive a server restart. A task is evicted 1 hour after it completes (TTL).
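
A minimal sketch of a store matching this interface follows, assuming that "completion" covers all three finished statuses (done, failed, cancelled) and that timestamps are ISO 8601 strings. Neither is stated in the proposal. The `TaskStore` class, its method names, and the injectable clock are hypothetical and exist only to make the TTL behaviour concrete and testable.

```typescript
interface Task {
  id: string;
  tool: string;
  status: "queued" | "running" | "done" | "failed" | "cancelled";
  progress?: { current: number; total: number; message?: string };
  result?: unknown;
  error?: string;
  createdAt: string;
  updatedAt: string;
}

const TTL_MS = 60 * 60 * 1000; // 1 hour after completion

// Assumption: done, failed, and cancelled all count as finished.
function isFinished(task: Task): boolean {
  return task.status === "done" || task.status === "failed" || task.status === "cancelled";
}

class TaskStore {
  private tasks = new Map<string, Task>();
  private counter = 0; // placeholder ID scheme
  private now: () => number;

  // `now` is injectable so tests can move the clock; defaults to Date.now.
  constructor(now: () => number = Date.now) {
    this.now = now;
  }

  private stamp(): string {
    return new Date(this.now()).toISOString();
  }

  create(tool: string): Task {
    const ts = this.stamp();
    const task: Task = { id: `task_${++this.counter}`, tool, status: "queued", createdAt: ts, updatedAt: ts };
    this.tasks.set(task.id, task);
    return task;
  }

  get(id: string): Task | undefined {
    return this.tasks.get(id);
  }

  // Applies a partial update and refreshes updatedAt.
  // Returns false for unknown tasks and for tasks that already finished.
  update(id: string, patch: Partial<Pick<Task, "status" | "progress" | "result" | "error">>): boolean {
    const task = this.tasks.get(id);
    if (!task || isFinished(task)) return false;
    Object.assign(task, patch, { updatedAt: this.stamp() });
    return true;
  }

  // Removes finished tasks whose last update is at least TTL_MS old.
  // Returns the number of tasks removed.
  sweep(): number {
    const cutoff = this.now() - TTL_MS;
    let removed = 0;
    this.tasks.forEach((task, id) => {
      if (isFinished(task) && Date.parse(task.updatedAt) <= cutoff) {
        this.tasks.delete(id);
        removed++;
      }
    });
    return removed;
  }
}
```

In a running server, `sweep()` could be called from a periodic timer; queued and running tasks are never evicted in this sketch, however old they are.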

Backward compatibility

Async behavior should be opt-in at the tool call level via a parameter:

{ "tool": "index_document", "arguments": { "url": "...", "async": true } }

Default remains synchronous (async: false) so existing integrations aren't broken.
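
One way to read the opt-in flag is sketched below. The helper name `splitAsyncFlag` is hypothetical, and so is the choice to treat only the boolean `true` as opting in; a real implementation might instead reject non-boolean values during schema validation.

```typescript
type ToolArgs = Record<string, unknown>;

// Separates the opt-in flag from the arguments the tool itself should see.
// Only the boolean `true` enables async mode. A missing flag, `false`, or any
// other value (such as the string "true") keeps the synchronous path.
function splitAsyncFlag(args: ToolArgs): { runAsync: boolean; toolArgs: ToolArgs } {
  const { async: flag, ...toolArgs } = args;
  return { runAsync: flag === true, toolArgs };
}
```

Stripping the flag before dispatch means the existing synchronous handlers never see an argument they do not know about.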

Acceptance criteria

  • src/mcp/tasks.ts — in-memory task store with TTL cleanup
  • tasks/get MCP handler: returns task status and progress
  • tasks/cancel MCP handler: cancels queued/running tasks
  • index_document supports async: true parameter — returns task reference
  • reindex_library supports async: true
  • sync_connector supports async: true
  • install_pack supports async: true
  • Progress events emitted during indexing (chunks embedded, total expected)
  • Existing synchronous behavior unchanged when async is not set or false
  • Unit tests for task store (create, progress update, complete, cancel, TTL cleanup)
  • Integration test: async index_document returns task ID, polling resolves to completion
