[FEAT]: Enforce Structured LLM Outputs via Pydantic & JSON Schema #148

@Cubix33

📝 Description

Currently, the extraction pipeline in src/llm.py relies on sequential API calls (one per field) and fragile manual string parsing (e.g., value.strip().replace('"', "") inside add_response_to_json) to clean up the LLM's output.

This approach has two major drawbacks:

  1. Performance bottleneck: A form with 20 fields requires 20 separate inferences, sending the same transcript context to the local LLM repeatedly.
  2. Fragility: If the LLM hallucinates formatting, wraps its answer in markdown, or provides unhandled string variations, the string parsing fails, resulting in corrupted PDF data.
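To make the fragility concrete, here is a minimal sketch of the cleanup expression quoted above applied to three plausible LLM replies. The helper name `legacy_clean` and the sample replies are hypothetical; only the `strip().replace('"', "")` expression comes from the current code.

```python
def legacy_clean(value: str) -> str:
    # Same cleanup as the expression quoted from add_response_to_json:
    # trim surrounding whitespace, then drop every double quote.
    return value.strip().replace('"', "")


fence = "`" * 3  # a markdown code fence, built this way to keep the example readable

clean_reply = ' "John Smith" '
fenced_reply = fence + '\n"John Smith"\n' + fence
prefixed_reply = 'The name is "John Smith".'

for reply in (clean_reply, fenced_reply, prefixed_reply):
    # Only the first reply comes out as the bare value; the other two
    # keep the fence or the surrounding sentence.
    print(repr(legacy_clean(reply)))
```

Only the first reply is reduced to the bare value. The other two pass through with their wrapper text intact, and that text is what would end up in the PDF field.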

💡 Rationale

To make FireForm production-ready and reliable, the system shouldn't guess the shape of the LLM's response. By integrating pydantic, we can dynamically generate a JSON schema based on the PDF's target fields and pass it directly to Ollama's native structured output capabilities. This forces the LLM to return a strictly typed, fully mapped JSON object in a single pass.

🛠️ Proposed Solution

  • Add pydantic>=2.0.0 to the project dependencies.
  • Overhaul src/llm.py to build a Pydantic model dynamically with create_model, using the keys from self._target_fields as field names.
  • Convert the Pydantic model to a JSON Schema and pass it via the format parameter in the Ollama API payload.
  • Remove the legacy add_response_to_json string-cleaning logic entirely, as the API response will be constrained to the schema and should already match the JSON dictionary expected by filler.py.
  • Consolidate the main_loop into a single, comprehensive API call rather than iterating over fields.
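The steps above could look roughly like the following sketch. It is not the final implementation: the field names, the model name `llama3`, the prompt wording, and the helper names `build_extraction_model` and `build_payload` are all assumptions for illustration. It uses Pydantic v2's `create_model` and `model_json_schema`, and builds a request body for Ollama's generate endpoint without sending it.

```python
import json
from typing import Optional

from pydantic import create_model

# Hypothetical stand-in for the keys of self._target_fields.
target_fields = ["incident_date", "officer_name", "location"]


def build_extraction_model(field_names):
    # Each field is required but nullable: the key must always be present,
    # and the LLM may return null when the transcript does not mention it.
    definitions = {name: (Optional[str], ...) for name in field_names}
    return create_model("ExtractionResult", **definitions)


def build_payload(model_cls, transcript, model_name="llama3"):
    # Request body for a single call covering every field. The JSON Schema
    # goes in "format"; the model name and prompt text are assumptions.
    return {
        "model": model_name,
        "prompt": "Extract the form fields from this transcript:\n" + transcript,
        "format": model_cls.model_json_schema(),
        "stream": False,
    }


ExtractionResult = build_extraction_model(target_fields)
payload = build_payload(ExtractionResult, "Officer Jane Doe responded on 3 May.")
print(json.dumps(payload["format"], indent=2))
```

Typing every field as an optional string is the simplest choice when the PDF only supplies field names. If field types are known, the tuples passed to `create_model` can carry them instead.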

✅ Acceptance Criteria

  • pydantic is integrated for schema generation.
  • main_loop in src/llm.py executes a single API request for all fields simultaneously.
  • The LLM response validates against the generated schema and is parsed directly into a Python dictionary, with no manual string stripping.
  • The feature maintains compatibility with existing checkpointing/state logic.
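One way to check the parsing criterion is to validate the raw response text against the model instead of cleaning it by hand. The sketch below assumes the JSON text has already been pulled out of the Ollama response body. `FormFields`, its two fields, and `parse_llm_response` are hypothetical names.

```python
from typing import Optional

from pydantic import ValidationError, create_model

# Hypothetical two-field model; in practice it would be generated from
# the keys of self._target_fields.
FormFields = create_model(
    "FormFields",
    officer_name=(Optional[str], ...),
    location=(Optional[str], ...),
)


def parse_llm_response(raw: str):
    # Validate the JSON text against the model and return a plain dict.
    # Returns None when the text is not valid JSON or a required key is
    # missing, so the caller can retry or fall back.
    try:
        return FormFields.model_validate_json(raw).model_dump()
    except ValidationError:
        return None


print(parse_llm_response('{"officer_name": "Jane Doe", "location": null}'))
print(parse_llm_response("not json at all"))
```

Returning None on failure is only one option. Raising the error, or recording it in the existing checkpoint state, may suit the pipeline better.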

📌 Additional Context

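This architectural shift removes malformed-output failures at the parsing stage and should substantially reduce extraction latency by replacing one inference per field with a single batched request. Note that schema enforcement constrains the shape of the response, not the accuracy of the extracted values.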
