Feature: `probe map` -- Repository Structure Overview Command

## Problem Statement

When an AI agent (or human) encounters an unfamiliar codebase, the first question is always: "What's in this repo?" Currently, there's no way to get a structural overview from probe without searching for something specific. The agent has to either:
- Guess search terms blindly
- Use the NPM `listFiles` tool (flat, one-level directory listing, no code structure)
- Run `find` or `ls -R` externally (no semantic information)

Other tools solve this: ABCoder's `get_repo_structure` / `get_package_structure`, Stakgraph's `repo_map` MCP tool, and Octocode's `view` command. Probe needs its own version that fits its zero-setup, instant, AST-aware philosophy.

## Proposed Solution

A new `probe map` CLI command that returns a hierarchical view of a codebase with top-level symbol signatures, using the existing tree-sitter infrastructure. No indexing, no setup -- same instant behavior as `probe search`.

## Existing Building Blocks

The codebase already has most of the pieces:

1. **`extract_all_symbols_from_file()`** in `src/extract/processor.rs:884` -- **DEAD CODE** that already:
   - Parses a file with tree-sitter
   - Iterates root-level children
   - Filters by `is_acceptable_parent()`
   - Calls `get_symbol_signature()` for each symbol
   - Returns `Vec<SearchResult>` with `symbol_signature` populated
   - Just needs to be exposed and extended

2. **`get_symbol_signature()`** implemented for 8 languages in `src/language/`:
   - Rust (`rust.rs:142`): functions, structs, impls, traits, enums, consts, statics, types, macros
   - TypeScript (`typescript.rs:106`)
   - JavaScript (`javascript.rs:97`)
   - Python (`python.rs:64`)
   - Go (`go.rs:151`)
   - YAML (`yaml.rs:191`)
   - Markdown (`markdown.rs:178`)
   - HTML (`html.rs:329`)

3. **`file_list_cache`** in `src/search/file_list_cache.rs` -- cached `.gitignore`-aware file listing with language filtering

4. **Token counting** via tiktoken (`src/search/search_tokens.rs:333`) -- can enforce `--max-tokens` on map output

5. **`ParentContext`** model in `src/models.rs:25` -- already represents scope hierarchy

## CLI Interface

```bash
# Basic: map the current directory
probe map .

# Map a specific subdirectory
probe map ./src/search

# Filter by language
probe map ./src --language rust

# Control depth: how many directory levels deep
probe map ./src --depth 2

# Control detail level
probe map ./src --detail signatures    # default: symbol signatures
probe map ./src --detail files         # files only, no symbols
probe map ./src --detail full          # signatures + first doc comment line

# Token budget (critical for AI agents)
probe map ./src --max-tokens 4000

# Output formats (reuse existing infrastructure)
probe map ./src --format outline       # default
probe map ./src --format json
probe map ./src --format xml

# Ignore patterns (reuse existing --ignore flag)
probe map ./src --ignore "test*" --ignore "*.generated.*"

# Test files are excluded by default; the existing --allow-tests flag opts them in
probe map ./src               # excludes test files by default
probe map ./src --allow-tests # includes test files
```

## Expected Output Formats

### Outline Format (default) -- `--detail signatures`

```
src/
  search/
    search_runner.rs
      pub fn perform_probe(options: &SearchOptions) -> Result<Vec<SearchResult>>
      pub fn search_with_structured_patterns(...) -> Result<HashMap<PathBuf, ...>>
      fn process_file_with_results(...) -> Result<Vec<SearchResult>>
    result_ranking.rs
      pub fn rank_search_results(results: &mut Vec<SearchResult>, ...) -> Result<()>
    search_output.rs
      pub fn format_and_print_search_results(results: &LimitedSearchResults, ...) -> Result<String>
      pub fn collect_parent_context_for_line(...) -> Vec<ParentContext>
      fn format_and_print_outline_results(...) -> Result<String>
    elastic_query.rs
      pub enum Expr
      pub fn parse_query(query: &str) -> Result<Expr>
    block_merging.rs
      pub fn merge_ranked_blocks(results: Vec<SearchResult>, ...) -> Vec<SearchResult>
    cache.rs
      pub struct SearchCache
      pub fn new(session_id: &str) -> Self
    filters.rs
      pub struct SearchFilters
    tokenization.rs
      pub fn tokenize(text: &str) -> Vec<String>
  language/
    language_trait.rs
      pub trait LanguageImpl
    factory.rs
      pub fn get_language_for_file(path: &Path) -> Option<Box<dyn LanguageImpl>>
    rust.rs
      pub struct RustLanguage
    parser_pool.rs
      pub struct ParserPool
    tree_cache.rs
      pub struct TreeCache
  models.rs
    pub struct SearchResult
    pub struct ParentContext
    pub struct CodeBlock
    pub struct LimitedSearchResults
    pub struct SearchLimits
  ranking.rs
    pub fn rank_documents(query: &Expr, ...) -> Vec<(usize, f64)>
    pub struct QueryTokenMap
  extract/
    mod.rs
      pub fn handle_extract(options: ExtractOptions) -> Result<()>
    processor.rs
      pub fn process_file_for_extraction(...) -> Result<Vec<SearchResult>>
    symbol_finder.rs
      pub fn find_all_symbols_in_file(...) -> Result<Vec<SymbolMatch>>
  cli.rs
    pub enum Commands
    pub struct SearchArgs
    pub struct ExtractArgs
```

### Outline Format -- `--detail files`

```
src/
  search/
    search_runner.rs (2,145 lines)
    result_ranking.rs (487 lines)
    search_output.rs (2,680 lines)
    elastic_query.rs (356 lines)
    block_merging.rs (290 lines)
    cache.rs (185 lines)
    filters.rs (142 lines)
    tokenization.rs (210 lines)
  language/
    language_trait.rs (45 lines)
    factory.rs (120 lines)
    rust.rs (280 lines)
    parser_pool.rs (95 lines)
    tree_cache.rs (110 lines)
  models.rs (105 lines)
  ranking.rs (520 lines)
  extract/
    mod.rs (780 lines)
    processor.rs (930 lines)
    symbol_finder.rs (480 lines)
  cli.rs (460 lines)
```

### Outline Format -- `--detail full`

```
src/
  search/
    search_runner.rs
      /// Main entry point for probe search. Orchestrates the full pipeline.
      pub fn perform_probe(options: &SearchOptions) -> Result<Vec<SearchResult>>
      /// Search using structured patterns with SIMD acceleration.
      pub fn search_with_structured_patterns(...) -> Result<HashMap<PathBuf, ...>>
```

### JSON Format

```json
{
  "root": "./src",
  "total_files": 42,
  "total_symbols": 187,
  "total_tokens": 3850,
  "tree": [
    {
      "path": "src/search",
      "type": "directory",
      "children": [
        {
          "path": "src/search/search_runner.rs",
          "type": "file",
          "lines": 2145,
          "language": "rust",
          "symbols": [
            {
              "name": "perform_probe",
              "signature": "pub fn perform_probe(options: &SearchOptions) -> Result<Vec<SearchResult>>",
              "node_type": "function_item",
              "line": 225,
              "end_line": 450,
              "visibility": "public",
              "doc": "Main entry point for probe search. Orchestrates the full pipeline."
            }
          ]
        }
      ]
    }
  ]
}
```

## Implementation Plan

### Phase 1: Core `probe map` Command (Rust)

**Step 1: New model types in `src/models.rs`**

```rust
pub struct MapEntry {
    pub path: String,
    pub entry_type: MapEntryType,
    pub language: Option<String>,
    pub line_count: Option<usize>,
    pub symbols: Vec<SymbolInfo>,
    pub children: Vec<MapEntry>,  // for directories
}

pub enum MapEntryType {
    Directory,
    File,
}

pub struct SymbolInfo {
    pub name: String,
    pub signature: String,
    pub node_type: String,
    pub start_line: usize,
    pub end_line: usize,
    pub visibility: Option<String>,  // pub, pub(crate), private, etc.
    pub doc_comment: Option<String>, // first line only
}

pub struct MapOptions {
    pub paths: Vec<String>,
    pub depth: Option<usize>,
    pub detail: MapDetail,       // files, signatures, full
    pub language: Option<String>,
    pub max_tokens: Option<usize>,
    pub format: String,          // outline, json, xml
    pub ignore_patterns: Vec<String>,
    pub allow_tests: bool,
}

pub enum MapDetail {
    Files,       // just file names + line counts
    Signatures,  // + symbol signatures (default)
    Full,        // + doc comments
}
```
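
One possible way to fill `visibility` and `doc_comment` for Rust files is sketched below. Both helpers (`visibility_of`, `first_doc_line`) are hypothetical names, not existing probe functions, and they work on plain text for brevity. An implementation built on the tree-sitter nodes would probably read these from the AST instead.

```rust
/// Hypothetical helper: classify Rust visibility from the start of a signature.
/// Returns "public", a restricted form such as "pub(crate)", or "private".
fn visibility_of(signature: &str) -> String {
    let sig = signature.trim_start();
    if let Some(rest) = sig.strip_prefix("pub(") {
        // Restricted visibility (pub(crate), pub(super), ...): keep the qualifier.
        match rest.find(')') {
            Some(end) => format!("pub({})", &rest[..end]),
            None => "public".to_string(),
        }
    } else if sig == "pub" || sig.starts_with("pub ") {
        "public".to_string()
    } else {
        "private".to_string()
    }
}

/// Hypothetical helper: return the first `///` line of the doc block that sits
/// directly above `symbol_line` (0-based index into `lines`).
fn first_doc_line(lines: &[&str], symbol_line: usize) -> Option<String> {
    let mut first = None;
    // Walk upward from the line just above the symbol.
    for line in lines[..symbol_line.min(lines.len())].iter().rev() {
        let trimmed = line.trim_start();
        if let Some(text) = trimmed.strip_prefix("///") {
            // Keep overwriting so the topmost doc line wins.
            first = Some(text.trim().to_string());
        } else if trimmed.starts_with("#[") {
            // Attributes may sit between the doc block and the item.
            continue;
        } else {
            break;
        }
    }
    first
}
```

This only covers `///` comments in Rust. Other languages would need their own rules.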

**Step 2: New module `src/map/`**

Create `src/map/mod.rs`:
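- `pub fn handle_map(options: MapOptions) -> Result<MapResult>` -- main entry point (`MapResult` is a new type, not listed in Step 1, that would presumably carry the tree plus the Step 3 metadata)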
- Reuse `file_list_cache` for `.gitignore`-aware traversal
- For each file: call a revived `extract_all_symbols_from_file()` (currently dead code at `processor.rs:884`)
- Build directory tree from flat file list
- Apply `--max-tokens` budget using existing tiktoken infrastructure

Create `src/map/output.rs`:
- `format_map_outline()` -- indented text output
- `format_map_json()` -- structured JSON
- `format_map_xml()` -- XML output
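
The "build directory tree from flat file list" and outline steps could look roughly like this. It is a minimal sketch under assumptions: `Node`, `build_tree`, and `format_outline` are illustrative names, and the input is a plain list of `(path, signatures)` pairs rather than the real `MapEntry` / `SymbolInfo` types.

```rust
use std::collections::BTreeMap;

/// Hypothetical tree node: directories hold children, files hold signatures.
#[derive(Debug, Default)]
struct Node {
    children: BTreeMap<String, Node>,
    symbols: Vec<String>,
    is_file: bool,
}

/// Insert each `(path, signatures)` pair into a nested tree keyed by path component.
fn build_tree(files: &[(&str, Vec<&str>)]) -> Node {
    let mut root = Node::default();
    for (path, symbols) in files {
        let mut node = &mut root;
        // Skip empty components and a leading "./".
        for part in path.split('/').filter(|p| !p.is_empty() && *p != ".") {
            node = node.children.entry(part.to_string()).or_default();
        }
        node.is_file = true;
        node.symbols = symbols.iter().map(|s| s.to_string()).collect();
    }
    root
}

/// Render the tree as two-space-indented text; directories get a trailing slash.
fn format_outline(node: &Node, depth: usize, out: &mut String) {
    for (name, child) in &node.children {
        let indent = "  ".repeat(depth);
        let suffix = if child.is_file { "" } else { "/" };
        out.push_str(&format!("{indent}{name}{suffix}\n"));
        for sig in &child.symbols {
            out.push_str(&format!("{indent}  {sig}\n"));
        }
        format_outline(child, depth + 1, out);
    }
}
```

`BTreeMap` gives a stable alphabetical order, which keeps output deterministic across runs. Whether the real command should sort this way or list directories before files is an open design choice.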

**Step 3: Token-Budget-Aware Truncation**

Critical for AI agents. When `--max-tokens` is set:

1. Start with directory structure (cheapest)
2. Add symbols file by file, applying these rules:
   - Smaller files first (more likely to be focused modules)
   - Files closer to root first
   - If the budget is tight, include public symbols only
3. When the budget runs out, stop adding symbols: remaining files appear by filename only, or are collapsed into a `... (N more files)` line
4. Return metadata: `{ total_files, shown_files, total_symbols, shown_symbols, tokens_used }`

This ensures the agent always gets SOMETHING useful within its token budget, never an error or empty result.
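
A rough sketch of the budgeting pass, under stated assumptions: `FileEntry` and `apply_budget` are hypothetical, and `estimate_tokens` is a crude stand-in (about one token per four bytes) for the existing tiktoken-based counter. The sort key uses file size first and depth as a tiebreak. The relative weight of the two is not fixed by this proposal.

```rust
/// Hypothetical per-file input to the budgeting pass.
struct FileEntry {
    path: String,
    line_count: usize,
    signatures: Vec<String>,
}

/// Stand-in for the real tokenizer: roughly one token per four bytes.
fn estimate_tokens(text: &str) -> usize {
    (text.len() + 3) / 4
}

/// Returns `(path, shown signatures)` per file in priority order, plus tokens used.
fn apply_budget(
    mut files: Vec<FileEntry>,
    max_tokens: usize,
) -> (Vec<(String, Vec<String>)>, usize) {
    // Priority: smaller files first, then files closer to the root.
    files.sort_by_key(|f| (f.line_count, f.path.matches('/').count()));

    // Charge every file name up front so the file tree is always present.
    let mut used: usize = files.iter().map(|f| estimate_tokens(&f.path)).sum();
    let mut exhausted = false;
    let mut shown = Vec::new();

    for file in files {
        let mut kept = Vec::new();
        for sig in file.signatures {
            if exhausted {
                break;
            }
            let cost = estimate_tokens(&sig);
            if used + cost > max_tokens {
                // Once one signature does not fit, stop adding symbols entirely.
                exhausted = true;
                break;
            }
            used += cost;
            kept.push(sig);
        }
        // Files reached after exhaustion keep their name but show no symbols.
        shown.push((file.path, kept));
    }
    (shown, used)
}
```

The sketch does not handle the case where the file names alone exceed the budget. A real implementation would need a fallback there, for example collapsing directories.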

**Step 4: CLI registration in `src/cli.rs`**

Add `Map` variant to the `Commands` enum:

```rust
/// Generate a structural overview of a codebase
Map {
    /// Paths to map
    #[arg(default_value = ".")]
    paths: Vec<String>,

    /// Maximum directory depth
    #[arg(long, short = 'd')]
    depth: Option<usize>,

    /// Detail level: files, signatures, full
    #[arg(long, default_value = "signatures")]
    detail: String,

    /// Filter by programming language
    #[arg(long, short = 'l')]
    language: Option<String>,

    /// Maximum output tokens
    #[arg(long)]
    max_tokens: Option<usize>,

    /// Output format
    #[arg(long, short = 'o', default_value = "outline")]
    format: String,

    /// Custom ignore patterns
    #[arg(long, short = 'i')]
    ignore: Vec<String>,

    /// Include test files
    #[arg(long)]
    allow_tests: bool,
}
```
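
The CLI takes `detail` as a `String`, while `MapOptions` expects a `MapDetail`. One way to bridge the two is a `FromStr` impl, sketched here with only the standard library. The case-insensitive matching and the error text are assumptions, not part of the proposal.

```rust
use std::str::FromStr;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum MapDetail {
    Files,      // just file names + line counts
    Signatures, // + symbol signatures (default)
    Full,       // + doc comments
}

impl FromStr for MapDetail {
    type Err = String;

    /// Parse the `--detail` value, ignoring ASCII case.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s.to_ascii_lowercase().as_str() {
            "files" => Ok(MapDetail::Files),
            "signatures" => Ok(MapDetail::Signatures),
            "full" => Ok(MapDetail::Full),
            other => Err(format!(
                "unknown detail level '{other}' (expected files, signatures, or full)"
            )),
        }
    }
}
```

With this in place, `detail.parse::<MapDetail>()` converts the CLI string when building `MapOptions`. Deriving clap's `ValueEnum` on the enum would be an alternative that rejects bad values at argument-parsing time, if the project's clap version supports it.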

### Phase 2: MCP Integration

Add a `map_code` tool to the MCP server at `npm/src/mcp/index.ts`:

```typescript
{
  name: "map_code",
  description: "Get a structural overview of a codebase with file tree and symbol signatures. Use this FIRST when exploring an unfamiliar codebase before searching.",
  inputSchema: {
    type: "object",
    properties: {
      path: { type: "string", description: "Directory to map" },
      depth: { type: "number", description: "Max directory depth (default: unlimited)" },
      detail: { type: "string", enum: ["files", "signatures", "full"], default: "signatures" },
      language: { type: "string", description: "Filter by language" },
      maxTokens: { type: "number", description: "Token budget for output", default: 4000 },
    },
    required: ["path"]
  }
}
```

### Phase 3: Agent Integration

Update the ProbeAgent system prompt to use `map_code` as the first step when exploring a new codebase:
```
When exploring an unfamiliar codebase:
1. Use map_code to understand the overall structure
2. Use search_code to find specific code
3. Use extract_code to read specific files/symbols
```

## Performance Considerations

- **Lazy symbol extraction**: Only parse files with tree-sitter when `--detail signatures` or `--detail full` is requested. For `--detail files`, just count lines.
- **Parallel processing**: Use rayon for file parsing (same as search pipeline).
- **Cache reuse**: The parser pool (`ParserPool`) and tree cache (`TreeCache`) are already designed for reuse across files.
- **Early termination**: Stop processing files once `--max-tokens` budget is exhausted.
- **File list cache**: Reuse `file_list_cache` to avoid re-walking the directory on repeated calls.
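
For the `--detail files` path, line counting needs no parser at all. A minimal sketch, assuming a hypothetical `count_lines` helper that takes any buffered reader:

```rust
use std::io::{self, BufRead};

/// Count lines from a buffered reader without parsing the content.
/// A final line without a trailing newline still counts as a line.
fn count_lines<R: BufRead>(mut reader: R) -> io::Result<usize> {
    let mut count = 0;
    let mut buf = Vec::new();
    // read_until works on raw bytes, so non-UTF-8 files do not cause errors.
    while reader.read_until(b'\n', &mut buf)? > 0 {
        count += 1;
        buf.clear();
    }
    Ok(count)
}
```

In practice the reader would be a `BufReader` around the opened file. The tree-sitter path is then only entered for `signatures` and `full`.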

## Testing

### Unit Tests (`src/map/mod.rs`)

```rust
#[cfg(test)]
mod tests {
    #[test]
    fn test_map_single_file() { /* map a single .rs file, verify symbols extracted */ }

    #[test]
    fn test_map_directory_tree() { /* map a directory, verify tree structure */ }

    #[test]
    fn test_map_depth_limit() { /* --depth 1 only shows one level */ }

    #[test]
    fn test_map_language_filter() { /* --language rust only shows .rs files */ }

    #[test]
    fn test_map_token_budget() { /* --max-tokens 500 truncates gracefully */ }

    #[test]
    fn test_map_detail_files() { /* --detail files shows no symbols */ }

    #[test]
    fn test_map_detail_signatures() { /* --detail signatures shows signatures */ }

    #[test]
    fn test_map_excludes_tests() { /* test files excluded by default */ }

    #[test]
    fn test_map_gitignore_respected() { /* .gitignore patterns honored */ }
}
```

### CLI Tests (`tests/cli_tests.rs`)

```rust
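use std::process::Command;

#[test]
fn test_map_command_basic() {
    // Assumes a `probe` binary is on PATH when the tests run.
    let output = Command::new("probe")
        .args(["map", "./src", "--format", "json"])
        .output()
        .expect("failed to run probe");
    assert!(output.status.success());
    let map: serde_json::Value =
        serde_json::from_slice(&output.stdout).expect("stdout is not valid JSON");
    assert!(map["total_files"].as_u64().unwrap() > 0);
    assert!(!map["tree"].as_array().unwrap().is_empty());
}

#[test]
fn test_map_command_max_tokens() {
    let output = Command::new("probe")
        .args(["map", "./src", "--max-tokens", "500", "--format", "json"])
        .output()
        .expect("failed to run probe");
    assert!(output.status.success());
    let map: serde_json::Value =
        serde_json::from_slice(&output.stdout).expect("stdout is not valid JSON");
    // The reported token count must stay within the requested budget.
    assert!(map["total_tokens"].as_u64().unwrap() <= 500);
}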
```

## Success Criteria

1. `probe map ./src` returns useful output in <1 second for a medium codebase (~500 files)
2. `probe map . --max-tokens 4000` always fits within token budget
3. Output is immediately useful for an LLM to understand repo structure
4. No indexing or setup required
5. Respects `.gitignore` and `--ignore` patterns
6. Token budget truncation is graceful (never empty output, always shows at least file tree)

## Competitive Context

This feature was identified by comparing probe with:
- **ABCoder** (CloudWeGo/ByteDance) -- `get_repo_structure` / `get_package_structure` MCP tools with hierarchical drill-down
- **Stakgraph** (Stakwork) -- `repo_map` MCP tool returning graph overview
- **Octocode** (Muvon) -- `view` command showing file signatures via glob patterns
- **grepai** -- no equivalent (relies on semantic search for discovery)

Probe's advantage: **zero setup, instant results** -- unlike ABCoder (requires batch parse) or Octocode (requires indexing). Same philosophy as `probe search`.

Feature: `probe map` -- Repository Structure Overview Command #501
