[FEAT]: Enforce Structured LLM Outputs via Pydantic & JSON Schema

## 📝 Description
Currently, the extraction pipeline in `src/llm.py` makes one API call per field and then cleans each response with fragile manual string parsing (e.g., `value.strip().replace('"', "")` inside `add_response_to_json`).

This approach has two major drawbacks:
1. **Performance bottleneck:** A form with 20 fields requires 20 separate inferences, sending the same transcript context to the local LLM repeatedly.
2. **Fragility:** If the LLM hallucinates formatting, wraps its answer in markdown, or provides unhandled string variations, the string parsing fails, resulting in corrupted PDF data.
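To illustrate the fragility point, here is a minimal sketch. `legacy_clean` is a hypothetical stand-in that applies only the `strip`/`replace` expression quoted above; it is not the actual body of `add_response_to_json`, and the sample values are made up.

```python
def legacy_clean(value: str) -> str:
    """Hypothetical stand-in for the cleanup expression quoted in the description."""
    return value.strip().replace('"', "")

# Build a markdown code fence programmatically to keep this example readable.
fence = "`" * 3

plain = '"Pier 4"'
wrapped = f'{fence}json\n"Pier 4"\n{fence}'

clean_plain = legacy_clean(plain)      # quotes removed, value is usable
clean_wrapped = legacy_clean(wrapped)  # quotes removed, but the fence survives

print(repr(clean_plain))
print(repr(clean_wrapped))
```

In the second case the markdown wrapper passes straight through the cleanup, so it would end up in the PDF field unless some other step catches it.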

## 💡 Rationale
To make FireForm production-ready and reliable, the system shouldn't guess the shape of the LLM's response. By integrating `pydantic`, we can dynamically generate a JSON schema based on the PDF's target fields and pass it directly to Ollama's native structured output capabilities. This forces the LLM to return a strictly typed, fully mapped JSON object in a single pass.

## 🛠️ Proposed Solution
- Add `pydantic>=2.0.0` to the project dependencies.
- Overhaul `src/llm.py` to build a Pydantic model at runtime with `create_model`, using the keys from `self._target_fields` as the field names.
- Convert the Pydantic model to a JSON Schema and pass it via the `format` parameter in the Ollama API payload.
- Remove the legacy `add_response_to_json` string-cleaning logic entirely, since the schema-constrained response can be validated and loaded directly into the JSON dictionary expected by `filler.py`.
- Consolidate the `main_loop` into a single, comprehensive API call rather than iterating over fields.
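The steps above could look roughly like the sketch below. It makes several assumptions that are not in the issue: every target field is treated as a required string, the field names are valid Python identifiers, the payload follows the shape of Ollama's `/api/generate` request body, and the function names, model name, and prompt wording are placeholders. The sketch only builds the payload and does not send it.

```python
from pydantic import create_model


def build_extraction_model(target_fields):
    """Create a Pydantic model with one required string field per target field.

    Assumption: all fields are plain strings; real forms may need other types.
    """
    field_definitions = {name: (str, ...) for name in target_fields}
    return create_model("ExtractionResult", **field_definitions)


def build_ollama_payload(model_name, transcript, target_fields):
    """Assemble a request body that carries the JSON Schema in `format`."""
    schema = build_extraction_model(target_fields).model_json_schema()
    return {
        "model": model_name,  # placeholder model name
        "prompt": f"Extract the form fields from this transcript:\n{transcript}",
        "format": schema,     # JSON Schema generated from the Pydantic model
        "stream": False,
    }


payload = build_ollama_payload(
    "llama3.1",
    "Caller reports a spill at Pier 4 on 2024-03-01.",
    ["incident_date", "location"],
)
print(payload["format"]["required"])
```

One design question this leaves open is how to represent fields that the transcript does not mention; making fields optional changes the generated schema and may affect how the model fills them.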

## ✅ Acceptance Criteria
- [ ] `pydantic` is integrated for schema generation.
- [ ] `main_loop` in `src/llm.py` executes a single API request for all fields simultaneously.
- [ ] The LLM response validates against the generated Pydantic model and is parsed directly into a Python dictionary, with no manual string stripping.
- [ ] The feature maintains compatibility with existing checkpointing/state logic.
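As a rough sketch of the third criterion, the response body could be validated and converted in one step. The model here is hard-coded for illustration (in the proposal it would be generated from `self._target_fields`), `parse_llm_response` is a hypothetical helper name, and returning `None` on failure is just one possible error-handling choice.

```python
from pydantic import ValidationError, create_model

# Illustrative model; the field names are made up for this example.
ExtractionResult = create_model(
    "ExtractionResult",
    incident_date=(str, ...),
    location=(str, ...),
)


def parse_llm_response(raw_json):
    """Validate a raw JSON string and return a plain dict, or None if invalid."""
    try:
        return ExtractionResult.model_validate_json(raw_json).model_dump()
    except ValidationError:
        # Covers both malformed JSON and missing or wrongly typed fields.
        return None


good = parse_llm_response('{"incident_date": "2024-03-01", "location": "Pier 4"}')
bad = parse_llm_response('{"incident_date": "2024-03-01"}')
print(good, bad)
```

Keeping a validation step even with schema-constrained output gives the checkpointing logic a clear signal when a response should be retried instead of written to the PDF.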

## 📌 Additional Context
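This architectural shift removes malformed-output failures at the parsing layer and should substantially reduce extraction latency by replacing per-field inferences with a single batched call. Schema enforcement constrains the shape and types of the response, not the correctness of the extracted values.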
