feat: libscope/lite — embeddable semantic search library (#451) by RobertLD · Pull Request #452 · RobertLD/libscope

RobertLD · 2026-03-19T17:01:54Z

Summary

- Introduces `LibScopeLite`, exported from `libscope/lite`, for embedding semantic search directly into external apps (e.g. Bitbucket MCP) without spawning a subprocess or HTTP server
- Adds a tree-sitter code-aware chunker (`src/lite/chunker-treesitter.ts`) supporting TypeScript, JavaScript, Python, and Go; tree-sitter is an optional peer dependency
- Adds an input normalization layer (`src/lite/normalize.ts`) that dispatches HTML/PDF/DOCX/plaintext to the existing parsers
- 49 new tests (18 unit/chunker, 21 unit/lite, 10 integration); 1492 tests passing in total

API

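```ts
import { LibScopeLite } from 'libscope/lite'

// `provider` (embedding provider) and `repoFiles` (documents to index) are supplied by the caller
const lite = new LibScopeLite({ dbPath: ':memory:', provider })
await lite.indexBatch(repoFiles, { concurrency: 4 })
const context = await lite.getContext('How does auth work?')
```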

Test plan

- `npm run format:check`: clean
- `npm run lint`: 40 errors (baseline, no new errors)
- `npm run typecheck`: no new errors
- `npm test`: 1492/1492 passing
- `npm run build`: `dist/lite/index.js` present
- SonarCloud quality gate (CI)

Root cause note

MockEmbeddingProvider.hashToVector had a zero-hash collapse bug for long strings (200+ chars) that produced NaN vectors. sqlite-vec returns null distance for NaN vectors, causing DatabaseError in vectorSearch. Fixed with a non-zero seed and zero-magnitude guard.
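A minimal sketch of the fix described in this note and in the follow-up commit (non-zero seed 5381, djb2-style mixing, zero-magnitude guard). The real `hashToVector` signature and vector dimension are not shown in the PR, so the standalone function, its `dims` parameter, and the xorshift step used to spread the hash across dimensions are assumptions, not the actual `MockEmbeddingProvider` code.

```ts
// Hypothetical stand-in for MockEmbeddingProvider.hashToVector
function hashToVector(text: string, dims = 8): number[] {
  // djb2: start from a non-zero seed instead of 0
  let hash = 5381;
  for (let i = 0; i < text.length; i++) {
    // hash * 33 + charCode, reduced to an unsigned 32-bit integer
    hash = ((hash << 5) + hash + text.charCodeAt(i)) >>> 0;
  }

  // Spread the 32-bit hash across `dims` components with xorshift32 steps
  const vec: number[] = [];
  let state = hash;
  for (let d = 0; d < dims; d++) {
    state ^= state << 13;
    state ^= state >>> 17;
    state ^= state << 5;
    state >>>= 0;
    vec.push((state / 0xffffffff) * 2 - 1); // map to [-1, 1]
  }

  const magnitude = Math.sqrt(vec.reduce((sum, v) => sum + v * v, 0));
  // Zero-magnitude guard: normalising by 0 would produce NaN components
  if (magnitude === 0) {
    return vec.map((_, i) => (i === 0 ? 1 : 0));
  }
  return vec.map((v) => v / magnitude);
}
```

Whatever the exact mixing function, the guard is the part that keeps NaN out of sqlite-vec: every returned vector has a finite, non-zero norm.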

Closes #451

🤖 Generated with Claude Code

Introduces `LibScopeLite`, a lightweight embeddable class exported from `libscope/lite` for use in external applications (e.g. Bitbucket MCP) without spawning a subprocess or HTTP server.

New files:
- src/lite/index.ts: public entrypoint, exports LibScopeLite + types
- src/lite/core.ts: LibScopeLite class implementation
- src/lite/types.ts: LiteDoc, SearchResult, ContextOptions, etc.
- src/lite/normalize.ts: input normalization (HTML/PDF/DOCX/plaintext → markdown)
- src/lite/chunker-treesitter.ts: tree-sitter code chunker (TS/JS/Python/Go)
- tests/unit/lite.test.ts: 21 unit tests for LibScopeLite
- tests/unit/code-chunker.test.ts: 18 unit tests for tree-sitter chunker
- tests/integration/lite-embed.test.ts: 10 integration tests (index→search→getContext→rate)

API: index(), indexRaw(), indexBatch(), search(), getContext(), ask(), askStream(), rate(), close().

Tree-sitter is an optional peer dependency with graceful error messaging if not installed.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

Four issues found during pre-push validation:

1. `MockEmbeddingProvider.hashToVector`: the hash could collapse to 0 for long content (e.g. 200+ chars), producing a NaN vector. sqlite-vec returns a null distance for NaN vectors, causing a `DatabaseError` in `vectorSearch`. Fixed: non-zero seed (5381) + djb2-style mixing + zero-magnitude guard.
2. `LibScopeLite.core.ts`: replaced the inline sqlite-vec require with `createDatabase()` from `db/connection.ts`, reusing the battle-tested extension-loading path and avoiding duplicated setup logic.
3. `LibScopeLite.core.ts`: added optional `db` injection to `LiteOptions` so tests and callers can supply a pre-configured `Database` instance.
4. Test lint fixes: floating promise in lite-embed, unused import and recursive `ReturnType` in code-chunker, unbound-method and async-without-await in lite.test.ts.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

vercel · 2026-03-19T17:02:03Z

The latest updates on your projects.

1 Skipped Deployment

Project	Deployment	Actions	Updated (UTC)
libscope	Ignored	Preview	Mar 19, 2026 5:29pm

…lete

Resolves the `@typescript-eslint/unbound-method` lint error: accessing a method as an unbound property before passing it to `vi.mocked()` is flagged. Using `vi.mocked(obj).method` keeps the reference bound to its object.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

- S7735 (core.ts:32): flip negated condition; use `=== undefined` rather than `!==`
- S7721 (lite.test.ts:222): move the `fakeStream` generator to module scope
- S2933 (chunker-treesitter.ts:95): mark `grammarCache` as `readonly`
- S3776 (chunker-treesitter.ts:164): reduce `extractChunks` complexity 25→7 by extracting a `flushDeclaration()` helper
- S3776 (chunker-treesitter.ts:232): reduce `splitLargeNode` complexity 22→4 by extracting an `accumulateNamedChildren()` helper

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

vi.mocked(mockLlm).complete passes the method reference unbound. Switch to vi.mocked(mockLlm.complete) — accessing the property on the object before passing to vi.mocked avoids the ESLint rule. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

Assigning vi.fn() to a variable before placing it in the LlmProvider object means we never reference completeSpy as mockLlm.complete — the ESLint rule only fires when a method is accessed from an object. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
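The `@typescript-eslint/unbound-method` rule these commits work around exists because a class method read off its object loses its `this` binding when it is later called on its own. A minimal illustration; the `Counter` class is hypothetical and unrelated to libscope:

```ts
class Counter {
  private count = 0;

  increment(): number {
    this.count += 1;
    return this.count;
  }
}

const counter = new Counter();

// Bound call: `this` is `counter`
counter.increment();

// Unbound reference: class bodies are strict, so `this` is undefined when called alone
const detached = counter.increment;
let unboundError: unknown;
try {
  detached();
} catch (err) {
  unboundError = err; // TypeError from reading `count` on undefined
}

// Binding explicitly, or wrapping in an arrow function, keeps `this` intact
const bound = counter.increment.bind(counter);
const viaArrow = (): number => counter.increment();
```

A `vi.fn()` spy does not depend on `this`, which is why holding it in a plain variable (as in the last commit) is safe even though the lint rule cannot see that.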

…xing

New VitePress pages:
- docs/guide/lite.md: full user guide covering constructor options, indexing, search, RAG (getContext/ask/askStream), code indexing, rate(), lifecycle, and the integration pattern for external MCP servers
- docs/guide/code-indexing.md: tree-sitter chunking guide covering installation, supported languages, node types, preamble accumulation, large-node splitting, caching, error handling, and a full directory-indexing example
- docs/reference/lite-api.md: complete TypeScript API reference for LibScopeLite and TreeSitterChunker with all types, options, and examples

Updated existing docs:
- docs/.vitepress/config.ts: add LibScope Lite and Code Indexing (Integrations sidebar) and LibScope Lite API (Reference sidebar)
- docs/guide/architecture.md: add lite/ to the system layers diagram, module map, and a LibScope Lite layer section
- docs/guide/programmatic-usage.md: tip callout pointing to libscope/lite
- CLAUDE.md: add src/lite/ to project structure
- README.md: add LibScopeLite and TreeSitterChunker to the SDK section with examples

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

sonarqubecloud · 2026-03-19T17:30:36Z

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
1.7% Duplication on New Code


RobertLD

Reviewed the `libscope/lite` implementation. Four critical bugs found: three logic bugs in production code and one off-by-one in line-number output.

RobertLD · 2026-03-19T17:48:40Z

src/lite/core.ts

```diff
+          if (!doc) break;
+          idx++;
+          activeCount++;
+          void this.index([doc]).finally(() => {
```


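Bug: `indexBatch` never reports indexing errors to its caller.

`void this.index([doc]).finally(...)` drops any rejection from `this.index()` as far as `indexBatch` is concerned. The `.finally()` callback receives no argument, so it cannot tell whether the preceding promise succeeded or failed, and the promise that `.finally()` returns (which re-rejects with the same error) is discarded by `void`. The wrapping `Promise<void>` has no reject path, so if any document fails to embed or index, `indexBatch` still resolves successfully. The failure only shows up as an unhandled promise rejection (which Node.js treats as fatal by default since v15), never as an error the caller can catch.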

The fix is to pass a reject callback to the outer Promise constructor and call it inside .finally() (or use .then(runNext, (err) => reject(err))). At minimum the error should be surfaced:

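```ts
await new Promise<void>((resolve, reject) => {
  // An empty batch never reaches resolve() below, so settle immediately
  if (docs.length === 0) {
    resolve();
    return;
  }
  let failed = false;
  const runNext = (): void => {
    // Stop scheduling new documents once any document has failed
    while (!failed && activeCount < concurrency && idx < docs.length) {
      const doc = docs[idx];
      if (!doc) break;
      idx++;
      activeCount++;
      void this.index([doc]).then(
        () => {
          activeCount--;
          if (idx >= docs.length && activeCount === 0) resolve();
          else runNext();
        },
        // Surface the first failure to the caller of indexBatch
        (err: unknown) => {
          failed = true;
          reject(err);
        },
      );
    }
  };
  runNext();
});
```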
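For reference, the same bounded-concurrency pattern can be pulled out into a standalone helper and exercised in isolation. `runWithConcurrency` is a hypothetical name, not part of libscope; this sketch rejects on the first failure and stops scheduling new work, while tasks already in flight are left to finish.

```ts
async function runWithConcurrency<T>(
  items: readonly T[],
  concurrency: number,
  worker: (item: T) => Promise<void>,
): Promise<void> {
  if (items.length === 0) return;
  const limit = Math.max(1, Math.floor(concurrency));
  let idx = 0;
  let active = 0;
  let failed = false;

  await new Promise<void>((resolve, reject) => {
    const runNext = (): void => {
      while (!failed && active < limit && idx < items.length) {
        const item = items[idx++] as T;
        active++;
        // Promise.resolve().then(...) also turns a synchronous throw in worker() into a rejection
        void Promise.resolve()
          .then(() => worker(item))
          .then(
            () => {
              active--;
              if (idx >= items.length && active === 0) resolve();
              else runNext();
            },
            (err: unknown) => {
              failed = true;
              reject(err); // later rejections are ignored: a promise settles once
            },
          );
      }
    };
    runNext();
  });
}
```

Every worker promise gets a rejection handler, so no failure can escape as an unhandled rejection, and the caller's `await` is where errors surface.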

RobertLD · 2026-03-19T17:48:40Z

src/lite/core.ts

```diff
+        });
+        if (i === 0) firstId = result.id;
+      }
+      return firstId;
```


Bug: `indexRaw` can return `""` (empty string) as a document ID, and it only ever returns the ID of the first chunk.

`firstId` is initialised to `""` and only assigned when `i === 0`. If `indexDocument` throws on any chunk, the exception propagates out of the loop, so indexing failures themselves surface correctly. But if the loop body never runs (for example because `normalized.chunks` is empty), `""` is returned as though it were a valid ID. More importantly, the API contract says `indexRaw` returns the ID of the newly created document, yet each chunk is indexed as its own document with its own ID. Only the first ID is returned; the others are unreachable by the caller. This is a data-loss bug for multi-chunk files: the caller cannot rate, retrieve, or delete the documents for chunks 2–N.

Consider either (a) returning all IDs as string[], or (b) indexing all chunks as a single document with pre-split content.
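A minimal sketch of option (a), assuming each chunk is still indexed as its own document. `indexDocument`, `RawChunk`, and the `{ id }` result shape are hypothetical stand-ins for the real internals, which the diff does not show in full; the point is only that every created ID is collected and that an empty chunk list fails loudly instead of yielding `""`.

```ts
interface RawChunk {
  content: string;
}

interface IndexedDocument {
  id: string;
}

// Hypothetical signature standing in for libscope's internal indexDocument()
type IndexDocumentFn = (chunk: RawChunk, index: number) => Promise<IndexedDocument>;

async function indexRawChunks(
  chunks: readonly RawChunk[],
  indexDocument: IndexDocumentFn,
): Promise<string[]> {
  if (chunks.length === 0) {
    // Fail loudly rather than returning a placeholder ID
    throw new Error("indexRaw: input produced no chunks to index");
  }
  const ids: string[] = [];
  for (const [i, chunk] of chunks.entries()) {
    const result = await indexDocument(chunk, i);
    ids.push(result.id); // keep every chunk's ID so callers can rate/retrieve/delete it
  }
  return ids;
}
```

Option (b) would keep the `Promise<string>` return type but requires the storage layer to accept one document with pre-split content, which this sketch does not assume.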

RobertLD · 2026-03-19T17:48:40Z

src/lite/chunker-treesitter.ts

```diff
+ * producing semantically meaningful chunks suitable for embedding.
+ */
+export class TreeSitterChunker {
+  private parserCache: TSParser | undefined;
```


Race condition: shared mutable parser state across concurrent chunk() calls.

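`parserCache` holds a single `TSParser` instance, and `parser.setLanguage(grammar)` on line 128 mutates that parser's active grammar. If `chunk()` is called concurrently for two different languages (e.g. `"typescript"` and `"python"`), the sequence can be:

1. Coroutine A: `getParser()` → returns the cached parser
2. Coroutine B: `getParser()` → returns the same cached parser
3. Coroutine A: `parser.setLanguage(typescriptGrammar)`
4. Coroutine B: `parser.setLanguage(pythonGrammar)` ← overwrites A's language
5. Coroutine A: `parser.parse(tsSource)` ← parsed with the Python grammar → wrong AST

This can happen when `indexBatch` runs with `concurrency > 1` and the batch contains mixed-language files that go through `indexRaw` → `normalizeRawInput` → `chunker.chunk()`. Because JavaScript only interleaves async calls at `await` points, step 4 can land between steps 3 and 5 only if `chunk()` awaits something after `setLanguage()` and before `parse()`. Even if those two calls currently run back to back, the shared mutable parser breaks as soon as an `await` is added between them.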

The parser must either be cloned/re-created per chunk() call, or language be serialised (one active language at a time), or one parser instance be kept per language.
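A minimal sketch of the third option, one parser per language, so `setLanguage()` is called once when a parser is created and never again. `ParserLike`, `PerLanguageParserCache`, and the factory/loader callbacks are hypothetical stand-ins for the tree-sitter parser and grammar objects; only `setLanguage()` and `parse()` are assumed.

```ts
// Minimal stand-in for a tree-sitter parser: only setLanguage() and parse() are assumed
interface ParserLike<Tree> {
  setLanguage(grammar: unknown): void;
  parse(source: string): Tree;
}

// Keeps one parser per language, so each parser's grammar is set exactly once and
// concurrent chunk() calls for different languages never share a parser
class PerLanguageParserCache<Tree> {
  private readonly parsers = new Map<string, ParserLike<Tree>>();
  private readonly createParser: () => ParserLike<Tree>;
  private readonly loadGrammar: (language: string) => unknown;

  constructor(
    createParser: () => ParserLike<Tree>,
    loadGrammar: (language: string) => unknown,
  ) {
    this.createParser = createParser;
    this.loadGrammar = loadGrammar;
  }

  get(language: string): ParserLike<Tree> {
    let parser = this.parsers.get(language);
    if (parser === undefined) {
      parser = this.createParser();
      parser.setLanguage(this.loadGrammar(language));
      this.parsers.set(language, parser);
    }
    return parser;
  }
}
```

If grammars are loaded asynchronously, the same idea applies with the map holding a `Promise` per language, so two concurrent first calls for the same language do not create two parsers.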

RobertLD · 2026-03-19T17:48:40Z

src/lite/chunker-treesitter.ts

```diff
+      chunks.push({
+        content: preamble,
+        startLine: preambleStartLine ?? startLine,
+        endLine: child.startPosition.row,
```


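Possible off-by-one: the preamble `endLine` uses a raw 0-based row, while every other `endLine` in the same chunk array is converted to 1-based.

```ts
endLine: child.startPosition.row, // line 223: missing +1
```

Every other place in this file converts `row` to a 1-based line number with `row + 1` (lines 184, 194, 214, 244, 254, 265, 269, 280); only the preamble chunk emitted inside the large-node path uses a raw 0-based `row`. Note, however, that `child.startPosition.row` is the 0-based row of the node *after* the preamble, which is numerically equal to the 1-based number of the line just before it. If the preamble ends just before a class starting at row 10 (1-based line 11), the preamble's last line is 1-based line 10, so the current `endLine: 10` is already correct for an inclusive range, and adding `+1` would make the preamble overlap the class's first line. The inconsistency is still worth fixing, but more robustly by deriving `endLine` from the preamble's own last node (`endPosition.row + 1`) than by adding `+1` here.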
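A minimal sketch of routing every chunk, preamble included, through one row-to-line conversion so 0-based and 1-based values cannot mix. It assumes `endLine` is meant to be 1-based and inclusive, and that the preamble's range is taken from its first and last accumulated nodes rather than from the node that follows it. `NodeLike` mirrors only the `startPosition`/`endPosition` fields of a tree-sitter node, and `lineRange` is a hypothetical helper.

```ts
// Minimal stand-in for a tree-sitter node's position fields (rows are 0-based)
interface Point {
  row: number;
  column: number;
}

interface NodeLike {
  startPosition: Point;
  endPosition: Point;
}

interface LineRange {
  startLine: number; // 1-based, inclusive
  endLine: number; // 1-based, inclusive
}

// Convert the span from `first` to `last` into a 1-based, inclusive line range
function lineRange(first: NodeLike, last: NodeLike = first): LineRange {
  return {
    startLine: first.startPosition.row + 1,
    endLine: last.endPosition.row + 1,
  };
}
```

Taking the preamble's end from its own last node also keeps the range tight when blank lines separate the preamble from the following declaration.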

RobertLD and others added 2 commits March 19, 2026 16:38

RobertLD and others added 4 commits March 19, 2026 17:04

RobertLD requested a review from Copilot March 19, 2026 17:22

RobertLD commented Mar 19, 2026


RobertLD merged commit 022b958 into main Mar 19, 2026
10 checks passed

RobertLD deleted the feat/libscope-lite-451 branch March 19, 2026 18:03

github-actions bot mentioned this pull request Mar 19, 2026

chore(main): release 1.6.0 #421

Merged

RobertLD restored the feat/libscope-lite-451 branch March 19, 2026 19:08

RobertLD review requested due to automatic review settings March 23, 2026 22:45

RobertLD merged 7 commits into main from feat/libscope-lite-451
