SECURITY: Path Traversal in LLM Cache File Operations

## Vulnerability Summary

**Severity**: HIGH (CVSS 7.5)  
**CWE**: CWE-22 (Path Traversal)  
**Location**: `src/agentready/services/llm_cache.py:37,68`  
**Impact**: Arbitrary file read/write outside cache directory

## Description

The `LLMCache` class constructs file paths from partially user-controlled cache keys without validation, allowing path traversal attacks.

```python
# VULNERABLE CODE (llm_cache.py:37)
def get(self, cache_key: str) -> DiscoveredSkill | None:
    cache_file = self.cache_dir / f"{cache_key}.json"  # No validation!
    
    if not cache_file.exists():
        return None
    
    with open(cache_file, "r", encoding="utf-8") as f:
        data = json.load(f)
```

## Attack Vector

Cache keys are generated from:
```python
# llm_cache.py:96
key_data = f"{attribute_id}_{score}_{evidence_hash}"
return hashlib.sha256(key_data.encode()).hexdigest()[:16]
```

Keys produced by `generate_key()` are the first 16 characters of a SHA-256 hex digest, so they contain only `0-9` and `a-f`. A crafted `attribute_id` is hashed before it reaches the filesystem, and no input or collision can make the digest contain `../`.

**More critically**: the read and write methods (`llm_cache.py:37,68`) accept any string as `cache_key` and never check that it came from `generate_key()`. If a malicious assessor, a modified repository, or any future caller passes a key derived from untrusted input, traversal sequences in that key are joined directly onto `cache_dir`.
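A quick check of the key derivation helps scope the attack surface. The sketch below re-creates the snippet above as a standalone function (assumed equivalent to the real `generate_key()`, which may differ in detail) and confirms that a hostile `attribute_id` still yields a key made only of hex characters. Traversal sequences therefore have to arrive through a `cache_key` passed directly to the cache methods.

```python
import hashlib


def generate_key(attribute_id: str, score: float, evidence_hash: str) -> str:
    """Standalone copy of the key derivation quoted above (assumed equivalent)."""
    key_data = f"{attribute_id}_{score}_{evidence_hash}"
    return hashlib.sha256(key_data.encode()).hexdigest()[:16]


hostile_key = generate_key("../../../etc/passwd", 100.0, "a" * 16)

# hexdigest() only emits 0-9 and a-f, so "/" and "." cannot appear in the key
is_hex_only = all(ch in "0123456789abcdef" for ch in hostile_key)
print(len(hostile_key), is_hex_only)
```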

## Proof of Concept

```python
from pathlib import Path

# A key that bypasses generate_key() and is passed to the cache directly
cache_key = "../../../etc/passwd"  # attacker-supplied value

# Same path construction as LLMCache.get()
cache_dir = Path(".agentready/llm-cache")
cache_file = cache_dir / f"{cache_key}.json"

# pathlib keeps the ".." segments, so the path escapes the cache directory
print(cache_file)  # .agentready/llm-cache/../../../etc/passwd.json
# Resolves to <parent of working directory>/etc/passwd.json
```

## Security Impact

- **Information disclosure**: Read and parse `.json` files outside the cache directory
- **Arbitrary file write**: Write attacker-influenced JSON to `.json` paths outside the cache directory
- **Cache poisoning**: Inject malicious skills into cache
- **Denial of service**: Fill disk with cache files in arbitrary locations

## Remediation

### Immediate Fix (P0)

Add path validation to prevent traversal:

```python
# SECURITY: Path traversal prevention in cache operations
# Why: User-influenced cache keys could contain ../ sequences
# Prevents: Path Traversal (CWE-22)
# Alternative considered: Filesystem sandboxing rejected due to portability

def get(self, cache_key: str) -> DiscoveredSkill | None:
    """Get cached skill if exists and not expired."""
    # Validate cache key format (ASCII alphanumeric only; str.isalnum() alone
    # also accepts non-ASCII letters and digits)
    if not (cache_key.isascii() and cache_key.isalnum()):
        logger.warning(f"Invalid cache key format: {cache_key}")
        return None
    
    cache_file = self.cache_dir / f"{cache_key}.json"
    
    # SECURITY: Ensure resolved path is within cache directory
    try:
        cache_file = cache_file.resolve()
        # relative_to() compares whole path components, so a sibling such as
        # "llm-cache-evil" is not mistaken for a child of "llm-cache"
        cache_file.relative_to(self.cache_dir.resolve())
    except ValueError:
        logger.error(f"Path traversal attempt blocked: {cache_key!r}")
        return None
    except (OSError, RuntimeError) as e:
        logger.error(f"Path resolution error: {e}")
        return None
    
    if not cache_file.exists():
        logger.debug(f"Cache miss: {cache_key}")
        return None
    
    # ... rest of method
```
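A containment check is easy to get subtly wrong: comparing path strings by prefix accepts sibling directories whose names merely begin with the cache directory's name. The sketch below contrasts a prefix comparison with a component-wise comparison. The helper names and the `/srv/app/...` paths are hypothetical, and `PurePosixPath` is used so the example touches no real files; in real code both paths should be resolved first.

```python
from pathlib import PurePosixPath


def is_within_prefix(candidate: PurePosixPath, base: PurePosixPath) -> bool:
    """String-prefix containment check (flawed)."""
    return str(candidate).startswith(str(base))


def is_within_components(candidate: PurePosixPath, base: PurePosixPath) -> bool:
    """Component-wise containment check."""
    try:
        candidate.relative_to(base)
        return True
    except ValueError:
        return False


base = PurePosixPath("/srv/app/.agentready/llm-cache")
child = base / "0123456789abcdef.json"
sibling = PurePosixPath("/srv/app/.agentready/llm-cache-evil/x.json")

print(is_within_prefix(sibling, base))      # True: sibling wrongly accepted
print(is_within_components(sibling, base))  # False
print(is_within_components(child, base))    # True
```

On Python 3.9 and later, `Path.is_relative_to()` expresses the same component-wise test without the `try`/`except`.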

### Additional Protections

1. **Strict key format validation**:
   ```python
   import re
   CACHE_KEY_PATTERN = re.compile(r'[a-f0-9]{16}')
   # fullmatch() is required: with match(), '$' still accepts a trailing newline
   if not CACHE_KEY_PATTERN.fullmatch(cache_key):
       raise ValueError(f"Invalid cache key: {cache_key!r}")
   ```

2. **Filesystem permissions**:
   ```python
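   # Set restrictive permissions on cache directory; chmod() also covers a
   # directory that already exists, where mkdir() leaves the mode unchanged
   cache_dir.mkdir(parents=True, exist_ok=True, mode=0o700)
   cache_dir.chmod(0o700)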
   ```

3. **File size limits**:
   ```python
   # Prevent DoS via large cache files
   if cache_file.stat().st_size > 1_000_000:  # 1MB max
       logger.warning(f"Cache file too large: {cache_file}")
       return None
   ```
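The protections above can be combined into one read path. The following is a minimal sketch, not the project's implementation: `load_cache_entry` and `MAX_CACHE_BYTES` are hypothetical names, and the function returns the raw parsed JSON rather than a `DiscoveredSkill`. It runs against a temporary directory so it leaves no files behind.

```python
from __future__ import annotations

import json
import re
import tempfile
from pathlib import Path

CACHE_KEY_PATTERN = re.compile(r"[a-f0-9]{16}")
MAX_CACHE_BYTES = 1_000_000  # 1 MB, matching the limit above


def load_cache_entry(cache_dir: Path, cache_key: str) -> dict | None:
    """Hypothetical helper: return parsed JSON for a valid key, else None."""
    # fullmatch() rejects anything that is not exactly 16 hex characters
    if not CACHE_KEY_PATTERN.fullmatch(cache_key):
        return None
    cache_file = cache_dir / f"{cache_key}.json"
    if not cache_file.is_file():
        return None
    # Skip oversized files before reading them into memory
    if cache_file.stat().st_size > MAX_CACHE_BYTES:
        return None
    with open(cache_file, "r", encoding="utf-8") as f:
        return json.load(f)


with tempfile.TemporaryDirectory() as tmp:
    cache_dir = Path(tmp) / "llm-cache"
    cache_dir.mkdir(mode=0o700)
    entry = cache_dir / "0123456789abcdef.json"
    entry.write_text('{"skill": "demo"}', encoding="utf-8")

    valid = load_cache_entry(cache_dir, "0123456789abcdef")
    traversal = load_cache_entry(cache_dir, "../../../etc/passwd")
    missing = load_cache_entry(cache_dir, "ffffffffffffffff")

print(valid, traversal, missing)
```

Because the key pattern admits only hex characters, a traversal string is rejected before any path is built; the resolved-path check from the immediate fix remains useful as a second layer.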

## References

- [OWASP Path Traversal](https://owasp.org/www-community/attacks/Path_Traversal)
- [CWE-22: Improper Limitation of a Pathname](https://cwe.mitre.org/data/definitions/22.html)
- [Python pathlib: `Path.resolve()`](https://docs.python.org/3/library/pathlib.html#pathlib.Path.resolve)

## Related Vulnerabilities

Same pattern exists in:
- `FileCreationFix.apply()` - file_path validation needed
- `FileModificationFix.apply()` - file_path validation needed
- `CodeSampler._format_code_samples()` - path validation needed
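
For these call sites, where the untrusted value is a relative file path rather than a cache key, one option is a shared containment helper. The sketch below is an assumption about how such a helper could look: `resolve_within` is a hypothetical name, and it does not reflect the actual signatures of the classes listed above.

```python
import tempfile
from pathlib import Path


def resolve_within(base_dir: Path, untrusted: str) -> Path:
    """Hypothetical helper: resolve `untrusted` under `base_dir` or raise ValueError."""
    base = base_dir.resolve()
    # Joining an absolute path discards base; the check below rejects that too
    candidate = (base / untrusted).resolve()
    # relative_to() raises ValueError when candidate is not inside base
    candidate.relative_to(base)
    return candidate


with tempfile.TemporaryDirectory() as tmp:
    repo_root = Path(tmp)
    inside = resolve_within(repo_root, "docs/README.md")
    expected = repo_root.resolve() / "docs" / "README.md"

    rejected = []
    for bad in ("../outside.txt", "/etc/passwd", "docs/../../outside.txt"):
        try:
            resolve_within(repo_root, bad)
        except ValueError:
            rejected.append(bad)

print(inside == expected, rejected)
```

Resolving before comparing means symlinks and `..` segments are collapsed first, so the check applies to the location that would actually be read or written.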
