# feat: libscope/lite — embeddable semantic search library #451
## Description
### Problem Statement
libscope is currently only usable as a CLI tool or MCP server. There is no way to embed it as a library into another application. Concrete use case: a Bitbucket MCP server wants semantic search over repository files, Jira tickets, and Confluence pages — it should be able to call libscope.index(docs) and libscope.search(query) directly without spawning a subprocess or running an HTTP server.
Additionally, there is no support for indexing source code files. The current chunker is markdown-heading-aware, which works well for documentation but produces poor-quality chunks for code. A code-aware chunker splitting at function/class boundaries is needed.
### Goals
- New `LibScopeLite` class exported from `libscope/lite`
- Caller-controlled persistence: `:memory:` default, file path for cross-session reuse
- All existing embedding providers supported: local (xenova), openai, ollama
- `index(docs)` — accepts pre-parsed content OR raw input types (HTML, PDF, DOCX, etc.)
- `search(query)` — hybrid vector+FTS search, same engine as full libscope
- `getContext(question)` — RAG context retrieval without synthesis (for embedding in external LLMs)
- `ask(question)` — RAG with passthrough mode (returns context for an external LLM to synthesize)
- `askStream(question)` — streaming RAG responses
- `indexBatch(docs, { concurrency })` — batch indexing with concurrency control
- `rate(docId, score)` — record feedback signal
- Code indexing: tree-sitter AST chunking for TypeScript, JavaScript, Python, Go
- Input normalization: parser layer converts HTML/PDF/DOCX/plaintext → markdown before indexing
- Complete user docs and developer docs updates
### Non-Goals
- No CLI commands for lite mode (use full libscope for that)
- No MCP server entrypoint (future enhancement)
- No bundled connectors (Confluence, Notion, etc.) — caller supplies documents
- No topics, packs, webhooks, or registry in lite mode
## Public API Design
```typescript
// libscope/lite
import { LibScopeLite } from 'libscope/lite';
// --- Construction ---
const scope = new LibScopeLite({
provider: 'local' | 'openai' | 'ollama', // embedding provider
apiKey?: string, // for openai
ollamaUrl?: string, // for ollama
ollamaModel?: string,
dbPath?: string, // default: ':memory:'
llm?: { // for ask() — optional
provider: 'openai' | 'ollama' | 'anthropic' | 'passthrough',
apiKey?: string,
model?: string,
},
});
// Must call init() before use (async provider/DB setup)
await scope.init();
// --- Indexing ---
await scope.index({
title: string,
content: string, // pre-normalized text/markdown
url?: string,
sourceType?: string, // defaults to 'manual'
language?: string, // hint for code chunking
});
// Or provide raw content + type hint
await scope.indexRaw({
title: string,
content: Buffer | string, // raw HTML, PDF, DOCX, source code, etc.
contentType: 'html' | 'pdf' | 'docx' | 'markdown' | 'plaintext' | 'code',
language?: string, // for contentType: 'code'
url?: string,
sourceType?: string,
});
// Batch indexing with concurrency control
await scope.indexBatch(files.map(f => ({
title: f.path,
content: f.content,
contentType: 'code',
language: 'typescript',
})), { concurrency: 4 });
// --- Search ---
const results = await scope.search('how does auth work?', {
limit?: number, // default 10
sourceType?: string,
minRating?: number,
});
// --- RAG: Context Retrieval ---
const { context, sources } = await scope.getContext('what does auth depend on?', {
topK?: number, // chunks to retrieve, default 5
});
// Use context in your own LLM prompt
// --- RAG: Full Answer ---
const answer = await scope.ask('What does the deploy process look like?', {
topK?: number,
});
// returns { answer: string, sources: RagSource[], model: string }
// --- RAG: Streaming ---
for await (const event of scope.askStream('How does...?')) {
if (event.token) console.log(event.token);
if (event.done) console.log('Sources:', event.sources);
}
// --- Feedback ---
scope.rate(docId: string, score: 1 | -1): void;
// --- Lifecycle ---
scope.close(): void;
```
## Implementation Plan
### Phase 1: Core lite entrypoint (new files)
New file: `src/lite/index.ts` — the public export
New file: `src/lite/core.ts` — LibScopeLite class
New file: `src/lite/types.ts` — exported types
The entire implementation delegates to existing core functions from `src/core/`. LibScopeLite is a thin, dependency-injection-friendly wrapper around indexing, search, and RAG functions.
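The wrapper pattern can be sketched as follows. This is illustrative only: the real class would call `indexDocument()` and `searchDocuments()` from `src/core/` directly, but here they are injected as plain function parameters to keep the sketch self-contained.

```typescript
// Sketch of the thin, dependency-injection-friendly wrapper (illustrative;
// the real implementation delegates to existing functions in src/core/).
type IndexFn = (doc: { title: string; content: string }) => Promise<void>;
type SearchFn = (query: string, opts: { limit: number }) => Promise<string[]>;

class LibScopeLiteSketch {
  private initialized = false;

  constructor(private indexFn: IndexFn, private searchFn: SearchFn) {}

  async init(): Promise<void> {
    // Real implementation: open the SQLite DB and warm the embedding provider.
    this.initialized = true;
  }

  async index(doc: { title: string; content: string }): Promise<void> {
    if (!this.initialized) throw new Error('call init() before use');
    return this.indexFn(doc);
  }

  async search(query: string, opts: { limit?: number } = {}): Promise<string[]> {
    if (!this.initialized) throw new Error('call init() before use');
    return this.searchFn(query, { limit: opts.limit ?? 10 });
  }
}
```

The init-guard mirrors the documented contract that `init()` must be called before use, since provider and DB setup are async and cannot happen in a constructor.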
### Phase 2: Input normalization layer
New file: `src/lite/normalize.ts` — `normalizeInput(input: LiteRawIndexInput): string`
Dispatches to existing parsers by contentType:
- `'html'` → HtmlParser.parse(Buffer)
- `'pdf'` → PdfParser.parse(Buffer)
- `'docx'` → WordParser.parse(Buffer)
- `'markdown'` | `'plaintext'` → UTF-8 decode, pass through
- `'code'` → new CodeParser
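The dispatch above can be sketched like this. The parser calls are stand-ins for the existing `HtmlParser`/`PdfParser`/`WordParser` named in the plan, passed in as a plain map so the sketch is self-contained; only the markdown/plaintext passthrough branch is a committed behavior.

```typescript
// Sketch of the contentType dispatch in normalize.ts. The `parsers` map is a
// stand-in for the real parser classes (HtmlParser, PdfParser, WordParser, CodeParser).
type LiteContentType = 'html' | 'pdf' | 'docx' | 'markdown' | 'plaintext' | 'code';

interface LiteRawIndexInput {
  content: Buffer | string;
  contentType: LiteContentType;
  language?: string;
}

function normalizeInput(
  input: LiteRawIndexInput,
  parsers: Record<'html' | 'pdf' | 'docx' | 'code', (buf: Buffer) => string>,
): string {
  const asBuffer =
    typeof input.content === 'string' ? Buffer.from(input.content, 'utf-8') : input.content;
  switch (input.contentType) {
    case 'html':
    case 'pdf':
    case 'docx':
      return parsers[input.contentType](asBuffer); // delegate to existing parsers
    case 'markdown':
    case 'plaintext':
      return asBuffer.toString('utf-8'); // UTF-8 decode, pass through unchanged
    case 'code':
      return parsers.code(asBuffer); // CodeParser → annotated markdown
  }
}
```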
### Phase 3: Code-aware chunking (tree-sitter)
New file: `src/core/parsers/code.ts` — CodeParser implementing DocumentParser
Converts source code to an annotated markdown format that the existing `chunkContent()` function can split at heading boundaries. Tree-sitter is an optional peer dependency with graceful error handling.
New file: `src/core/chunk-code.ts` — `chunkCode(source: string, language: string): string`
Takes raw source code, returns annotated markdown with one heading per symbol.
Supported languages: TypeScript, JavaScript, Python, Go initially; Rust, Java, C++ in a future iteration.
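To illustrate the annotated-markdown contract, here is a deliberately simplified regex-based chunker. The real implementation would walk a tree-sitter AST rather than match lines, and the exact heading format is an assumption, but the output shape — one markdown heading per top-level symbol, original code preserved below it — is the point.

```typescript
// Simplified illustration of the chunkCode() contract: emit one markdown heading
// per top-level function/class so the existing chunkContent() can split at headings.
// The real implementation uses tree-sitter; this regex version is illustrative only.
function chunkCodeSketch(source: string, language: string): string {
  const symbolPattern = /^(?:export\s+)?(?:async\s+)?(?:function|class)\s+(\w+)/;
  const out: string[] = [];
  for (const line of source.split('\n')) {
    const m = line.match(symbolPattern);
    if (m) out.push(`## ${m[1]} (${language})`); // hypothetical heading format
    out.push(line); // keep the original source beneath its heading
  }
  return out.join('\n');
}
```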
### Phase 4: package.json exports update
Add to `package.json` exports field:
```json
"./lite": {
"import": "./dist/lite/index.js",
"require": "./dist/lite/index.js"
}
```
### Phase 5: Tests
- `tests/unit/lite.test.ts` — LibScopeLite init, index, search, rate
- `tests/unit/code-chunker.test.ts` — chunkCode() for TypeScript, Python, fallback
- `tests/integration/lite-embed.test.ts` — full index → search → ask workflow
### Phase 6: Documentation updates
- `docs/lite.md` — User guide: installation, quick start, API reference, code indexing guide
- `docs/code-indexing.md` — How code indexing works, supported languages, extending
- `src/lite/README.md` — Module overview, design decisions
- `CLAUDE.md` — Update project structure to include lite module
## Critical Files
| File | Action |
|---|---|
| `src/lite/index.ts` | NEW — public export |
| `src/lite/core.ts` | NEW — LibScopeLite class |
| `src/lite/types.ts` | NEW — exported types |
| `src/lite/normalize.ts` | NEW — input normalization dispatch |
| `src/core/parsers/code.ts` | NEW — CodeParser (tree-sitter → annotated markdown) |
| `src/core/chunk-code.ts` | NEW — chunkCode() helper |
| `src/core/indexing.ts` | REUSE — indexDocument() called directly |
| `src/core/search.ts` | REUSE — searchDocuments() called directly |
| `src/core/rag.ts` | REUSE — askQuestion(), askQuestionStream() |
| `package.json` | MODIFY — add `./lite` export |
| `docs/lite.md` | NEW — user docs |
| `tests/unit/lite.test.ts` | NEW |
| `tests/integration/lite-embed.test.ts` | NEW |
## Primary Use Case: Bitbucket MCP PR Enhancement
The first concrete embedding target is a Bitbucket MCP server that enhances PR review quality while reducing token usage:
- On repo connect: `indexBatch(repoFiles, { concurrency: 4 })` with persistent DB
- On PR review: `getContext("what does the changed auth module depend on?")` → returns top-k chunks
- MCP injects those chunks into its own LLM prompt — no second LLM call to libscope
- Instead of sending 50 files of context, the MCP sends 5 highly-relevant chunks
## Resolved Design Decisions
- Tree-sitter: optional peer dep — clear error message if not installed
- Separate getContext() method — for embedding into external LLMs (primary use case)
- askStream() included in v1 — streaming is useful for agent-to-agent communication
- indexBatch() included in v1 — required for the Bitbucket repo indexing use case
- Persistent SQLite is the practical default in embedding scenarios — although `dbPath` defaults to `:memory:`, callers should pass a project-local file path to get cross-session reuse
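The concurrency control behind `indexBatch(docs, { concurrency })` can be implemented with a simple worker-pool pattern. This sketch is illustrative, not the committed implementation: a fixed number of workers each pull the next unclaimed item until the list is drained.

```typescript
// Illustrative worker pool for indexBatch({ concurrency }): at most `concurrency`
// index operations are in flight at any moment.
async function indexBatchSketch<T>(
  items: T[],
  indexOne: (item: T) => Promise<void>,
  opts: { concurrency: number },
): Promise<void> {
  let next = 0;
  async function worker(): Promise<void> {
    while (next < items.length) {
      const i = next++; // claim the next item (safe: JS event loop is single-threaded)
      await indexOne(items[i]);
    }
  }
  const workers = Array.from(
    { length: Math.min(opts.concurrency, items.length) },
    () => worker(),
  );
  await Promise.all(workers);
}
```

Because items are claimed one at a time, a slow document (say, a large PDF) only stalls one worker instead of an entire fixed-size batch.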
## Verification
- `npm run typecheck` — no new errors
- `npm run lint` — no new errors
- `npm test` — all new tests pass including lite integration test
- Manual smoke test with local provider
- Code indexing smoke test with TypeScript file
- SonarCloud: check duplication density on new files