feat(cli): Cursor scanner with multi-format token extraction #1
Service was only writing AGENTMETER_API_KEY to environment variables, so a custom API URL (e.g. localhost) was lost after reboot.
…ript source When installed via pnpm cli (dev mode), process.argv[1] is a .ts file. Using bare node to run it causes exit code 1. Detect the .ts case and resolve the tsx binary so the service can actually start.
launchd runs with a minimal PATH that omits /usr/local/bin, so the tsx shell wrapper could not exec node. Derive the node binary directory from process.execPath and prepend it to PATH in EnvironmentVariables.
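The two service fixes above (picking tsx for a .ts entry point, and prepending the running node's directory to PATH) can be sketched as one pure function. This is a hedged illustration, not the PR's code: `buildServiceLaunch`, `ServiceLaunch` and the option names are hypothetical, the `watch` subcommand comes from the old ExecStart line, and resolving the tsx path (for example via `which tsx`) is left to the caller.

```typescript
import { posix } from 'node:path';

// Hypothetical shape of what a service installer needs: argv plus extra env vars.
interface ServiceLaunch {
  programArgs: string[];
  env: Record<string, string>;
}

interface LaunchOptions {
  execPath: string; // process.execPath of the installing process
  scriptPath: string; // process.argv[1]
  tsxPath: string | null; // resolved tsx binary, or null if none was found
  basePath: string; // the minimal PATH the service manager provides
}

function buildServiceLaunch(opts: LaunchOptions): ServiceLaunch {
  let runner: string;
  if (opts.scriptPath.endsWith('.ts')) {
    // Dev mode: bare node cannot run a .ts entry point, so go through tsx.
    if (opts.tsxPath === null) {
      throw new Error('entry point is TypeScript but no tsx binary was found');
    }
    runner = opts.tsxPath;
  } else {
    runner = opts.execPath;
  }

  // tsx is a shell wrapper that execs `node`; launchd's minimal PATH may not
  // include node's directory, so prepend the directory of the running node.
  const nodeBinDir = posix.dirname(opts.execPath);
  const PATH = [nodeBinDir, opts.basePath]
    .filter((p) => p.length > 0)
    .join(posix.delimiter);

  return { programArgs: [runner, opts.scriptPath, 'watch'], env: { PATH } };
}
```

`posix` is used deliberately: both launchd and systemd only exist on POSIX systems. On macOS the `env` map would be written into the plist's EnvironmentVariables dictionary.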
Previously only status changes triggered re-submission. A session whose status stays 'success' throughout would keep stale token counts forever. Storing endTime in sync-state and re-syncing when it changes ensures the final token count is captured as the session grows and when it ends.
Claude Code writes an ai-title entry to the JSONL containing the AI-generated tab title (e.g. 'Set up AgentMeter CLI re...'). Prefer that over the first-user-message heuristic, which was often noisy.
…inutes Sessions whose last JSONL entry is within 30 minutes are likely still open. Using 'running' lets the dashboard show in-progress vs completed correctly, and triggers a re-sync on the next poll cycle to capture updated tokens.
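A minimal sketch of the 30-minute heuristic, assuming the scanner already has the timestamp of the session's last JSONL entry. `inferSessionStatus` and `RUNNING_WINDOW_MS` are hypothetical names; 'success' is the completed status named in the endTime commit, and any error statuses the real scanner may assign are ignored here.

```typescript
type SessionStatus = 'running' | 'success';

// Sessions touched within this window are treated as still open.
const RUNNING_WINDOW_MS = 30 * 60 * 1000;

function inferSessionStatus(lastEntryAt: Date, now: Date = new Date()): SessionStatus {
  const idleMs = now.getTime() - lastEntryAt.getTime();
  // A negative idle time (clock skew) also counts as running.
  return idleMs < RUNNING_WINDOW_MS ? 'running' : 'success';
}
```

Once the window elapses the status flips from 'running' to 'success', so a status-change re-sync trigger picks the session up one final time with its settled token counts.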
…faces Adds JSDoc blocks to every exported and internal function, type alias, and interface across packages/cli/src. Inline field docs added to all interface and object-type declarations per the updated typescript.md rule.
Adds 'title' to SyncedSession so the persisted sync state tracks what title was last submitted. classifySessions now treats a title change as a re-sync trigger alongside status and endTime changes. This fixes the drift where a session synced before Claude Code writes its ai-title entry would keep the user-message-derived title in the dashboard forever.
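The re-sync rule from this commit and the endTime commit can be sketched as a pure comparison against the persisted state. The compared fields mirror the commit messages (status, endTime, title), but the exact `SyncedSession` shape, the hypothetical `SessionSnapshot` type and `classifySessions`'s return type are assumptions.

```typescript
// What the sync state remembers about the last submission (assumed shape).
interface SyncedSession {
  status: string;
  endTime: string | null;
  title?: string | null; // optional: state written before this change lacks it
}

// The freshly scanned view of a session (assumed shape).
interface SessionSnapshot {
  id: string;
  status: string;
  endTime: string | null;
  title: string | null;
}

function needsSync(current: SessionSnapshot, previous: SyncedSession | undefined): boolean {
  if (previous === undefined) return true; // never submitted before
  return (
    current.status !== previous.status ||
    current.endTime !== previous.endTime || // session grew or ended: token counts moved
    current.title !== (previous.title ?? null) // e.g. the ai-title arrived or was refined
  );
}

function classifySessions(
  sessions: SessionSnapshot[],
  state: Record<string, SyncedSession>,
): { toSync: SessionSnapshot[]; unchanged: SessionSnapshot[] } {
  const toSync: SessionSnapshot[] = [];
  const unchanged: SessionSnapshot[] = [];
  for (const s of sessions) {
    (needsSync(s, state[s.id]) ? toSync : unchanged).push(s);
  }
  return { toSync, unchanged };
}
```

Treating a missing stored title as `null` means state files written before the title field existed trigger exactly one re-sync for sessions that now have a title, and none for sessions that still don't.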
Claude Code rewrites the ai-title entry multiple times as the conversation progresses, with the final value being most accurate. Previously extractTitle returned the first match, causing stale early titles to win over the refined later ones.
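Taken together, the two title commits amount to: keep the last ai-title entry in the file, and fall back to the first user message only when there is none. A sketch under loudly stated assumptions: the JSONL entry shapes used here (`{ type: 'ai-title', title }` and `{ type: 'user', message: { content } }`) are guesses for illustration, not Claude Code's documented format, and the hand-written type checks stand in for whatever validation the real scanner uses.

```typescript
function asRecord(value: unknown): Record<string, unknown> | null {
  return typeof value === 'object' && value !== null ? (value as Record<string, unknown>) : null;
}

function extractTitle(jsonl: string): string | null {
  let aiTitle: string | null = null;
  let firstUserMessage: string | null = null;

  for (const line of jsonl.split('\n')) {
    if (line.trim() === '') continue;
    let parsed: unknown;
    try {
      parsed = JSON.parse(line);
    } catch {
      continue; // skip a partially written or corrupt line
    }
    const entry = asRecord(parsed);
    if (entry === null) continue;

    if (entry.type === 'ai-title' && typeof entry.title === 'string') {
      // The title is rewritten as the chat evolves: overwrite so the last one wins.
      aiTitle = entry.title;
    } else if (firstUserMessage === null && entry.type === 'user') {
      const content = asRecord(entry.message)?.content;
      if (typeof content === 'string' && content.trim() !== '') {
        firstUserMessage = content.trim();
      }
    }
  }
  return aiTitle ?? firstUserMessage;
}
```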
Reads Cursor agent sessions from the global state DB and extracts token counts across three storage formats that Cursor has used across versions:
- agentKv:bubbleCheckpoint:* (protobuf, mid-era): batch-query all checkpoints and decode the protobuf blobs to get context tokens and model name
- bubbleId:* JSON tokenCount (older era): per-session BETWEEN range scan using the B-tree index (LIKE queries hit ~148K rows each and take ~1s/session)
- composerData.contextTokensUsed (newest era): final context-window size, used as a fallback when neither earlier format has data
All Cursor token counts are marked isApproximate: true, since Cursor is subscription-based and doesn't expose exact API billing data locally. Adds isApproximate field to TokensSchema and threads it through api.ts.
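One way to express the per-session range scan: compute the half-open key range [prefix, successor) that covers exactly the keys starting with a prefix, which SQLite can answer from the key column's B-tree index. SQLite only rewrites `LIKE 'prefix%'` into an index range under specific collation conditions, which a default BINARY-collated column does not meet while `case_sensitive_like` is off, so the LIKE form falls back to scanning. The `cursorDiskKV` table name, the `bubbleId:<composerId>:` key layout and the helper names are assumptions for illustration; the commit uses `BETWEEN`, which is the same idea with an inclusive upper bound.

```typescript
// Assumed table/column names for Cursor's global state DB.
const BUBBLE_RANGE_SQL = 'SELECT key, value FROM cursorDiskKV WHERE key >= ? AND key < ?';

// Half-open range [lo, hi) containing exactly the strings that start with `prefix`.
// Restricted to an ASCII final character so "next character" is unambiguous
// under SQLite's byte-wise BINARY collation.
function prefixRange(prefix: string): { lo: string; hi: string } {
  const last = prefix.charCodeAt(prefix.length - 1);
  if (prefix.length === 0 || last >= 0x7f) {
    throw new Error('prefix must be non-empty and end in an ASCII character');
  }
  return { lo: prefix, hi: prefix.slice(0, -1) + String.fromCharCode(last + 1) };
}

// Bind parameters for one composer's bubbles, e.g. for node:sqlite:
// db.prepare(BUBBLE_RANGE_SQL).all(lo, hi)
function bubbleRangeParams(composerId: string): [string, string] {
  const { lo, hi } = prefixRange(`bubbleId:${composerId}:`);
  return [lo, hi];
}
```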
Review: feat(cli): Cursor scanner with multi-format token extraction
The core scanner logic is well-structured: Zod schemas for all external data shapes, defensive parsing throughout, try/catch on every DB call, proper finally on globalDb.close(), noUncheckedIndexedAccess handled correctly across all array/map accesses. The protobuf decoder handles malformed input gracefully and the batch query approach to avoid LIKE per-session scans is the right call.
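For readers unfamiliar with the format: a protobuf blob is a flat sequence of (tag, value) pairs, where the tag varint packs the field number and wire type. The sketch below walks one message level and returns null on truncated or invalid input instead of throwing, which is the graceful-failure behaviour the review praises. It is a hypothetical illustration, not the PR's `decodeBubbleCheckpoint`; which field numbers carry the model name and context tokens in Cursor's checkpoints is not stated here.

```typescript
type FieldValue = number | Uint8Array;

// Base-128 varint. Uses arithmetic instead of bit ops so values above 2^31
// survive (exact up to 2^53, ample for token counts).
function readVarint(buf: Uint8Array, pos: number): { value: number; next: number } | null {
  let value = 0;
  let scale = 1;
  for (let i = 0; i < 10; i++) {
    const byte: number | undefined = buf[pos + i];
    if (byte === undefined) return null; // truncated varint
    value += (byte & 0x7f) * scale;
    if ((byte & 0x80) === 0) return { value, next: pos + i + 1 };
    scale *= 128;
  }
  return null; // longer than the 10-byte maximum
}

// Collects every top-level field; repeated fields keep all their values in order.
function decodeFields(buf: Uint8Array): Map<number, FieldValue[]> | null {
  const fields = new Map<number, FieldValue[]>();
  let pos = 0;
  while (pos < buf.length) {
    const tag = readVarint(buf, pos);
    if (tag === null) return null;
    const fieldNumber = Math.floor(tag.value / 8);
    const wireType = tag.value % 8;
    if (fieldNumber === 0) return null; // field 0 is never valid
    pos = tag.next;

    let value: FieldValue;
    if (wireType === 0) {
      const v = readVarint(buf, pos);
      if (v === null) return null;
      value = v.value;
      pos = v.next;
    } else if (wireType === 2) {
      const len = readVarint(buf, pos);
      if (len === null || len.next + len.value > buf.length) return null;
      value = buf.subarray(len.next, len.next + len.value);
      pos = len.next + len.value;
    } else if (wireType === 1 || wireType === 5) {
      const size = wireType === 1 ? 8 : 4; // fixed64 / fixed32, kept as raw bytes
      if (pos + size > buf.length) return null;
      value = buf.subarray(pos, pos + size);
      pos += size;
    } else {
      return null; // groups (3, 4) and unknown wire types: treat as malformed
    }
    const list = fields.get(fieldNumber) ?? [];
    list.push(value);
    fields.set(fieldNumber, list);
  }
  return fields;
}
```

A hex string read from the DB can be passed in as `Buffer.from(hex, 'hex')`, since a Buffer is a Uint8Array.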
Two issues worth addressing:
Important — no unit tests for CursorScanner: 560+ lines of new code with zero test coverage. __tests__/scanners/ only has claude.test.ts. The pure functions (decodeBubbleCheckpoint, normalizeCursorModel) and _scanWithDb are the right starting points — see the inline comment for a sketch.
Notable — systemd ExecStart unescaped: programArgs.join(' ') breaks for paths with spaces; the plist version correctly isolates args in <string> tags. Low-probability but worth fixing for consistency.
No any types, no unhandled exceptions, no security issues found.
Generated by Agent: Code Review for issue #1
```typescript
   * Token counts are summed from per-turn protobuf bubble checkpoints (approximate).
   * Sessions with no bubble checkpoint data are skipped.
   */
  async scan(): Promise<LocalSession[]> {
```
Missing unit tests for CursorScanner
This PR adds 560+ lines of new scanner logic — protobuf decoding, three distinct token storage formats, model name normalization — but there's no cursor.test.ts (the __tests__/scanners/ directory only has claude.test.ts). The sync tests mock out CursorScanner entirely, so nothing exercises the real paths.
decodeBubbleCheckpoint and normalizeCursorModel are pure functions that can be tested directly with hex fixtures:
```typescript
// __tests__/scanners/cursor.test.ts
import { describe, it, expect } from 'vitest';
// Assumed module path: adjust to wherever the Cursor scanner lives.
// Both helpers must be exported from that module for this to compile.
import { decodeBubbleCheckpoint, normalizeCursorModel } from '../../src/scanners/cursor';

describe('normalizeCursorModel', () => {
  it('converts Cursor claude naming to AgentMeter format', () => {
    expect(normalizeCursorModel('claude-4.5-sonnet')).toBe('claude-sonnet-4-5');
    expect(normalizeCursorModel('claude-3.7-sonnet-thinking')).toBe('claude-sonnet-3-7');
    expect(normalizeCursorModel('gpt-4o-high')).toBe('gpt-4o');
    expect(normalizeCursorModel('unknown-model')).toBe('unknown-model');
  });
});

describe('decodeBubbleCheckpoint', () => {
  it('returns nulls for an empty/malformed hex string', () => {
    expect(decodeBubbleCheckpoint('')).toEqual({ modelName: null, contextTokens: null });
    expect(decodeBubbleCheckpoint('not-hex')).toEqual({ modelName: null, contextTokens: null });
  });
});
```

_scanWithDb can be tested by mocking node:sqlite (via vi.mock('node:sqlite', ...)) and passing a DatabaseSync-shaped stub. The project pattern for this kind of thing is in claude.test.ts.
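The expectations in that test are enough to pin down one possible `normalizeCursorModel`. This is a hypothetical implementation written to satisfy those four assertions, not the PR's code; the set of suffixes it strips (`-thinking`, `-high`, `-medium`, `-low`, `-max`) is an assumption.

```typescript
// Assumed Cursor-only variant suffixes that should not change the billed model.
const VARIANT_SUFFIXES = /(?:-(?:thinking|high|medium|low|max))+$/;

// Cursor writes Claude models as claude-<major>.<minor>-<family>;
// AgentMeter uses claude-<family>-<major>-<minor>.
const CURSOR_CLAUDE = /^claude-(\d+)(?:\.(\d+))?-([a-z]+)$/;

function normalizeCursorModel(raw: string): string {
  const base = raw.replace(VARIANT_SUFFIXES, '');
  const m = CURSOR_CLAUDE.exec(base);
  if (m === null) return base; // non-Claude or unrecognised names pass through
  const [, major, minor, family] = m;
  return minor === undefined
    ? `claude-${family}-${major}`
    : `claude-${family}-${major}-${minor}`;
}
```

In the real module the function would be exported so the test file can import it.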
```diff
 [Service]
 Type=simple
-ExecStart=${nodePath} ${binaryPath} watch
+ExecStart=${programArgs.join(' ')}
```
ExecStart breaks for paths containing spaces
programArgs.join(' ') produces a malformed systemd unit if any path component has a space. The macOS plist version handles this correctly — each arg is wrapped in its own <string> element. The Linux path doesn't get that protection.
On some systems `which tsx` returns a path like /home/user name/.nvm/versions/node/v22/bin/tsx, and the tsx shell wrapper itself can live in a directory with spaces.
Fix: quote args that contain spaces:
```typescript
// Quote any argument containing a space so systemd keeps it as a single word.
const escapedArgs = programArgs
  .map((a) => (a.includes(' ') ? `"${a}"` : a))
  .join(' ');
// ExecStart=${escapedArgs}
```

The same consideration applies to the Environment= lines if config.apiUrl were ever set to a URL with a space (unlikely, but the plist escapes these with escapeXml, so it's worth being consistent).
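The space-only check covers the reported case, but systemd.service(5) gives a few more characters special meaning on an ExecStart= line: `%` introduces unit specifiers (a literal percent is written `%%`), `$` introduces environment-variable substitution (a literal dollar is written `$$`), and inside double quotes a backslash starts an escape. A slightly more defensive sketch follows; `quoteSystemdArg` and `buildExecStart` are hypothetical helper names, and the rules are worth re-checking against the systemd version being targeted.

```typescript
// Escape one argument for an ExecStart= line.
function quoteSystemdArg(arg: string): string {
  const escaped = arg
    .replace(/%/g, '%%') // '%' starts a unit specifier such as %h
    .replace(/\$/g, () => '$$'); // '$' starts variable substitution; a replacer fn avoids '$' patterns
  const needsQuotes = escaped === '' || /[\s"'\\]/.test(escaped);
  if (!needsQuotes) return escaped;
  // Inside double quotes, escape backslashes and quotes C-style.
  return `"${escaped.replace(/[\\"]/g, (c) => `\\${c}`)}"`;
}

function buildExecStart(programArgs: readonly string[]): string {
  return `ExecStart=${programArgs.map(quoteSystemdArg).join(' ')}`;
}
```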
Summary
- Reads Cursor agent sessions from the global state DB (`state.vscdb`):
  - `agentKv:bubbleCheckpoint:*` (mid-era, protobuf): batched in 2 queries, decodes context tokens + model name
  - `bubbleId:*` JSON `tokenCount` (older): per-session `BETWEEN` range scans (B-tree indexed; `LIKE` was ~1s/session × 211 sessions = timeout)
  - `composerData.contextTokensUsed` (newest): final context-window size as fallback
- All Cursor token counts marked `isApproximate: true` (subscription-based, no exact API billing data)
- Adds `isApproximate` field to `TokensSchema` and threads it through `api.ts`
- Mocks `CursorScanner` in sync tests to prevent hitting real Cursor data during CI
- … `node22` for `node:sqlite` support

Test plan
- `pnpm typecheck` passes
- `pnpm test` passes (29 tests)
- `node dist/index.js sync --dry-run --engine cursor` returns sessions without timeout
- `launchctl` restart picks up new scanner and syncs Cursor sessions to dashboard