feat(cli): Cursor scanner with multi-format token extraction #1

Merged
adamhenson merged 11 commits into main from feat/local-sync on Jun 15, 2026

Conversation

@adamhenson

Contributor

Summary

  • Implements the Cursor scanner reading agent sessions from Cursor's global SQLite DB (state.vscdb)
  • Handles all three token storage formats Cursor has used across versions:
    • agentKv:bubbleCheckpoint:* (mid-era, protobuf): batched in 2 queries, decodes context tokens + model name
    • bubbleId:* JSON tokenCount (older): per-session BETWEEN range scans (B-tree indexed; LIKE was ~1s/session × 211 sessions = timeout)
    • composerData.contextTokensUsed (newest): final context-window size as fallback
  • Covers ~139 of 211 agent sessions locally (72 have no local token data)
  • All Cursor tokens marked isApproximate: true (subscription-based, no exact API billing data)
  • Adds isApproximate field to TokensSchema and threads it through api.ts
  • Mocks CursorScanner in sync tests to prevent hitting real Cursor data during CI
  • Bumps tsup target to node22 for node:sqlite support
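The range scan described for the bubbleId:* format can be sketched as follows. This is a minimal illustration rather than the PR's code: `prefixRange` and `inRange` are hypothetical helpers, and the table and column names in the SQL string (`cursorDiskKV`, `key`, `value`) are assumptions about the database layout. The idea is that every key starting with a given prefix sorts between the prefix itself and the prefix with its last character incremented, so the query can walk the key index instead of pattern-matching every row. The sketch uses a half-open range (`>=` and `<`) rather than BETWEEN so the upper bound itself is excluded.

```typescript
/** Inclusive lower bound and exclusive upper bound for keys sharing a prefix. */
interface KeyRange {
  lower: string;
  upper: string;
}

/**
 * Builds the key range that covers every key starting with `prefix`.
 * Assumes ASCII keys, so JavaScript and SQLite (BINARY collation) order them the same way.
 */
function prefixRange(prefix: string): KeyRange {
  if (prefix.length === 0) {
    throw new Error('prefix must not be empty');
  }
  const lastCode = prefix.charCodeAt(prefix.length - 1);
  // Incrementing the final character yields the smallest string that sorts after all prefixed keys.
  const upper = prefix.slice(0, -1) + String.fromCharCode(lastCode + 1);
  return { lower: prefix, upper };
}

/** Mirrors the SQL predicate in plain TypeScript so the bounds can be checked without a database. */
function inRange(key: string, range: KeyRange): boolean {
  return key >= range.lower && key < range.upper;
}

// Hypothetical query text; bind range.lower and range.upper as the two parameters.
const RANGE_SQL = 'SELECT key, value FROM cursorDiskKV WHERE key >= ? AND key < ?';
```

For a prefix such as `bubbleId:abc:` the bounds become `bubbleId:abc:` and `bubbleId:abc;`, because `;` is the character that follows `:` in ASCII.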

Test plan

  • pnpm typecheck passes
  • pnpm test passes (29 tests)
  • node dist/index.js sync --dry-run --engine cursor returns sessions without timeout
  • launchctl restart picks up new scanner and syncs Cursor sessions to dashboard

Service was only writing AGENTMETER_API_KEY to environment variables,
so a custom API URL (e.g. localhost) was lost after reboot.
…ript source

When installed via pnpm cli (dev mode), process.argv[1] is a .ts file.
Using bare node to run it causes exit code 1. Detect the .ts case and
resolve the tsx binary so the service can actually start.

launchd runs with a minimal PATH that omits /usr/local/bin, so the tsx
shell wrapper could not exec node. Derive the node binary directory from
process.execPath and prepend it to PATH in EnvironmentVariables.

Previously only status changes triggered re-submission. A session whose
status stays 'success' throughout would keep stale token counts forever.
Storing endTime in sync-state and re-syncing when it changes ensures the
final token count is captured as the session grows and when it ends.

Claude Code writes an ai-title entry to the JSONL containing the
AI-generated tab title (e.g. 'Set up AgentMeter CLI re...'). Prefer
that over the first-user-message heuristic, which was often noisy.
…inutes

Sessions whose last JSONL entry is within 30 minutes are likely still open.
Using 'running' lets the dashboard show in-progress vs completed correctly,
and triggers a re-sync on the next poll cycle to capture updated tokens.
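The status inference and re-sync triggers described in these commit messages can be sketched roughly as below. This is an illustration under stated assumptions, not the PR's implementation: `inferStatus` and `needsResync` are hypothetical names, `endTime` is assumed to be stored as an ISO string, and only the `status`, `endTime`, and `title` fields named in the commit messages are modeled.

```typescript
type SessionStatus = 'running' | 'success';

/** Minimal stand-in for the persisted sync state; the real SyncedSession may carry more fields. */
interface SyncedSession {
  status: SessionStatus;
  endTime: string; // assumed to be an ISO 8601 timestamp
  title: string;
}

/** Sessions whose last JSONL entry is within this window are treated as still open. */
const RUNNING_WINDOW_MS = 30 * 60 * 1000;

/** Returns 'running' when the last entry is at most 30 minutes old, otherwise 'success'. */
function inferStatus(lastEntryAt: Date, now: Date): SessionStatus {
  const ageMs = now.getTime() - lastEntryAt.getTime();
  return ageMs <= RUNNING_WINDOW_MS ? 'running' : 'success';
}

/** A session is re-submitted when it is new or when status, endTime, or title has changed. */
function needsResync(previous: SyncedSession | undefined, current: SyncedSession): boolean {
  if (previous === undefined) {
    return true;
  }
  return (
    previous.status !== current.status ||
    previous.endTime !== current.endTime ||
    previous.title !== current.title
  );
}
```

Comparing `endTime` as well as `status` is what lets a session that stays 'success' throughout still pick up its final token count.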
…faces

Adds JSDoc blocks to every exported and internal function, type alias,
and interface across packages/cli/src. Inline field docs added to all
interface and object-type declarations per the updated typescript.md rule.

Adds 'title' to SyncedSession so the persisted sync state tracks what
title was last submitted. classifySessions now treats a title change as
a re-sync trigger alongside status and endTime changes. This fixes the
drift where a session synced before Claude Code writes its ai-title
entry would keep the user-message-derived title in the dashboard forever.

Claude Code rewrites the ai-title entry multiple times as the conversation
progresses, with the final value being most accurate. Previously extractTitle
returned the first match, causing stale early titles to win over the refined
later ones.

Reads Cursor agent sessions from the global state DB and extracts token
counts across three storage formats that Cursor has used across versions:

- agentKv:bubbleCheckpoint:* (protobuf, mid-era): batch query all
  checkpoints, decode protobuf blobs to get context tokens and model name
- bubbleId:* JSON tokenCount (older era): per-session BETWEEN range scan
  using B-tree index (LIKE queries hit ~148K rows each and take ~1s/session)
- composerData.contextTokensUsed (newest era): final context-window size
  used as fallback when neither earlier format has data

All Cursor token counts are marked isApproximate: true since Cursor is
subscription-based and doesn't expose exact API billing data locally.
Adds isApproximate field to TokensSchema and threads it through api.ts.
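The protobuf decoding step for the bubbleCheckpoint format can be illustrated with a small wire-format reader. This is a sketch only: the checkpoint schema is not shown in this PR, so `MODEL_NAME_FIELD` and `CONTEXT_TOKENS_FIELD` are hypothetical field numbers, `decodeCheckpointSketch` is a made-up name, and model names are assumed to be ASCII. It handles just the varint and length-delimited wire types and returns nulls for anything it cannot parse.

```typescript
interface DecodedCheckpoint {
  modelName: string | null;
  contextTokens: number | null;
}

// Hypothetical field numbers; replace with the real ones from the checkpoint schema.
const MODEL_NAME_FIELD = 1;
const CONTEXT_TOKENS_FIELD = 2;

const emptyCheckpoint = (): DecodedCheckpoint => ({ modelName: null, contextTokens: null });

/** Reads a base-128 varint at `pos`; capped at 7 bytes so the value stays a safe integer. */
function readVarint(buf: Uint8Array, pos: number): { value: number; next: number } | null {
  let value = 0;
  let scale = 1;
  for (let i = pos; i < buf.length && i < pos + 7; i++) {
    const byte = buf[i] ?? 0;
    value += (byte & 0x7f) * scale;
    if ((byte & 0x80) === 0) {
      return { value, next: i + 1 };
    }
    scale *= 128;
  }
  return null; // ran out of bytes or exceeded the cap
}

/** Decodes a hex-encoded protobuf message, returning nulls on empty or malformed input. */
function decodeCheckpointSketch(hex: string): DecodedCheckpoint {
  if (hex.length === 0 || hex.length % 2 !== 0 || !/^[0-9a-fA-F]+$/.test(hex)) {
    return emptyCheckpoint();
  }
  const buf = new Uint8Array(hex.length / 2);
  for (let i = 0; i < buf.length; i++) {
    buf[i] = parseInt(hex.slice(i * 2, i * 2 + 2), 16);
  }
  const result = emptyCheckpoint();
  let pos = 0;
  while (pos < buf.length) {
    const tag = readVarint(buf, pos);
    if (tag === null) return emptyCheckpoint();
    const field = Math.floor(tag.value / 8); // upper bits of the tag: field number
    const wireType = tag.value % 8; // lower three bits of the tag: wire type
    pos = tag.next;
    if (wireType === 0) {
      const v = readVarint(buf, pos);
      if (v === null) return emptyCheckpoint();
      if (field === CONTEXT_TOKENS_FIELD) result.contextTokens = v.value;
      pos = v.next;
    } else if (wireType === 2) {
      const len = readVarint(buf, pos);
      if (len === null || len.next + len.value > buf.length) return emptyCheckpoint();
      const bytes = buf.subarray(len.next, len.next + len.value);
      if (field === MODEL_NAME_FIELD) {
        result.modelName = Array.from(bytes, (b) => String.fromCharCode(b)).join('');
      }
      pos = len.next + len.value;
    } else {
      return emptyCheckpoint(); // fixed-width and group wire types are not handled here
    }
  }
  return result;
}
```

Returning nulls instead of throwing keeps one malformed blob from aborting a batch of otherwise readable checkpoints.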
@adamhenson adamhenson merged commit 199c68b into main Jun 15, 2026
9 of 10 checks passed
@adamhenson adamhenson deleted the feat/local-sync branch June 15, 2026 22:42

@github-actions github-actions Bot left a comment

Contributor

Review: feat(cli): Cursor scanner with multi-format token extraction

The core scanner logic is well-structured: Zod schemas for all external data shapes, defensive parsing throughout, try/catch on every DB call, proper finally on globalDb.close(), noUncheckedIndexedAccess handled correctly across all array/map accesses. The protobuf decoder handles malformed input gracefully and the batch query approach to avoid LIKE per-session scans is the right call.

Two issues worth addressing:

Important — no unit tests for CursorScanner: 560+ lines of new code with zero test coverage. __tests__/scanners/ only has claude.test.ts. The pure functions (decodeBubbleCheckpoint, normalizeCursorModel) and _scanWithDb are the right starting points — see the inline comment for a sketch.

Notable — systemd ExecStart unescaped: programArgs.join(' ') breaks for paths with spaces; the plist version correctly isolates args in <string> tags. Low-probability but worth fixing for consistency.

No any types, no unhandled exceptions, no security issues found.

Generated by Agent: Code Review for issue #1 · ● 250.2K

* Token counts are summed from per-turn protobuf bubble checkpoints (approximate).
* Sessions with no bubble checkpoint data are skipped.
*/
async scan(): Promise<LocalSession[]> {

Contributor

Missing unit tests for CursorScanner

This PR adds 560+ lines of new scanner logic — protobuf decoding, three distinct token storage formats, model name normalization — but there's no cursor.test.ts (the __tests__/scanners/ directory only has claude.test.ts). The sync tests mock out CursorScanner entirely, so nothing exercises the real paths.

decodeBubbleCheckpoint and normalizeCursorModel are pure functions that can be tested directly with hex fixtures:

// __tests__/scanners/cursor.test.ts
import { describe, it, expect } from 'vitest';
// Import path is an assumption; point it at the module that exports these functions.
import { decodeBubbleCheckpoint, normalizeCursorModel } from '../../src/scanners/cursor';

describe('normalizeCursorModel', () => {
  it('converts Cursor claude naming to AgentMeter format', () => {
    expect(normalizeCursorModel('claude-4.5-sonnet')).toBe('claude-sonnet-4-5');
    expect(normalizeCursorModel('claude-3.7-sonnet-thinking')).toBe('claude-sonnet-3-7');
    expect(normalizeCursorModel('gpt-4o-high')).toBe('gpt-4o');
    expect(normalizeCursorModel('unknown-model')).toBe('unknown-model');
  });
});

describe('decodeBubbleCheckpoint', () => {
  it('returns nulls for an empty/malformed hex string', () => {
    expect(decodeBubbleCheckpoint('')).toEqual({ modelName: null, contextTokens: null });
    expect(decodeBubbleCheckpoint('not-hex')).toEqual({ modelName: null, contextTokens: null });
  });
});

_scanWithDb can be tested by mocking node:sqlite (via vi.mock('node:sqlite', ...)) and passing a DatabaseSync-shaped stub. The project pattern for this kind of thing is in claude.test.ts.

 [Service]
 Type=simple
-ExecStart=${nodePath} ${binaryPath} watch
+ExecStart=${programArgs.join(' ')}

Contributor

ExecStart breaks for paths containing spaces

programArgs.join(' ') produces a malformed systemd unit if any path component has a space. The macOS plist version handles this correctly — each arg is wrapped in its own <string> element. The Linux path doesn't get that protection.

which tsx can return paths like /home/user name/.nvm/versions/node/v22/bin/tsx on some systems, and the tsx shell wrapper itself can live in a directory with spaces.

Fix: quote args that contain spaces:

const escapedArgs = programArgs
  .map((a) => (a.includes(' ') ? `"${a}"` : a))
  .join(' ');
// ExecStart=${escapedArgs}

Same consideration applies to the Environment= lines if config.apiUrl were ever set to a URL with a space (unlikely but the plist escapes these with escapeXml, so it's worth being consistent).
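A slightly more defensive variant of the quoting fix is sketched below. It is an illustration, not the PR's code: `quoteSystemdArg` and `buildExecStart` are hypothetical helpers, and the escaping rules reflect my reading of systemd's command-line syntax (double-quoted arguments, backslash escapes, `%%` for a literal percent sign), which should be checked against systemd.service(5) before relying on it. Literal `$` characters are deliberately not handled here.

```typescript
/**
 * Quotes one argument for a systemd ExecStart= line.
 * Escapes backslashes and double quotes, doubles '%' so it is not read as a specifier,
 * and wraps the result in double quotes when the argument contains whitespace or quoting characters.
 */
function quoteSystemdArg(arg: string): string {
  const escaped = arg
    .replace(/\\/g, '\\\\')
    .replace(/"/g, '\\"')
    .replace(/%/g, '%%');
  const needsQuotes = arg === '' || /[\s"'\\;]/.test(arg);
  return needsQuotes ? `"${escaped}"` : escaped;
}

/** Joins already-resolved program arguments into a single ExecStart= line. */
function buildExecStart(programArgs: readonly string[]): string {
  return `ExecStart=${programArgs.map(quoteSystemdArg).join(' ')}`;
}
```

With this helper, `buildExecStart(['/home/user name/bin/tsx', 'watch'])` yields `ExecStart="/home/user name/bin/tsx" watch`, while paths without special characters pass through unchanged.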
