Feat/pydantic validation#61

Open
pigeio wants to merge 3 commits into fireform-core:main from pigeio:feat/pydantic-validation

pigeio commented Feb 24, 2026

Linked Issues

Resolves #59 (Feature: Pydantic Validation & Single-Shot Extraction)
Resolves #29 (Bug: Mutable default argument in textToJSON causes data leakage)
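
For context, the bug in #29 is the usual Python pitfall where a default argument is evaluated once and then shared by every call. The sketch below is illustrative only: LeakyExtractor and SafeExtractor are hypothetical stand-ins, not the actual textToJSON class.

```python
class LeakyExtractor:
    # The {} default is created once, when the function is defined,
    # so every instance built without an argument shares the same dict.
    def __init__(self, json={}):
        self.json = json


class SafeExtractor:
    # None is used as a sentinel; each instance gets a fresh dict.
    def __init__(self, json_data=None):
        self.json_data = {} if json_data is None else json_data


a, b = LeakyExtractor(), LeakyExtractor()
a.json["Incident Type"] = "Structural Fire"
leaked = "Incident Type" in b.json  # data written via a shows up in b

c, d = SafeExtractor(), SafeExtractor()
c.json_data["Incident Type"] = "Structural Fire"
isolated = "Incident Type" not in d.json_data  # d is unaffected
```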

Description

This PR significantly refactors the textToJSON class in src/backend.py to improve the performance, reliability, and type-safety of the AI data extraction pipeline.
Previously, the pipeline iterated through target fields one by one, making multiple HTTP requests to Ollama and relying on manual string manipulation. This PR replaces that with a single-shot extraction approach using Pydantic V2.

Key Changes

  • Single-Shot Extraction: Replaced the main_loop with extract_data_with_pydantic(). The AI is now prompted once with a dynamically generated JSON schema, reducing processing time by roughly 75-80%.
  • Strict Schema Validation: Utilized pydantic.create_model to enforce strict types and ensure the LLM returns exact keys.
  • Safe Field Mapping: Added logic to map complex PDF field names (e.g., names containing spaces) to safe identifiers during Pydantic validation, then map them back to the original names for the Fill class.
  • Graceful Degradation: Missing fields now safely default to "-1" instead of causing KeyError crashes.
  • Bug Fix: Fixed the mutable default argument in __init__ (json={} -> json_data=None), preventing data leakage between class instances.
  • Dependencies: Removed the broken commonforms dependency and added pydantic>=2.0 to requirements.txt.
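
As a rough sketch of how the schema generation, safe field mapping, and "-1" defaults could fit together, assuming Pydantic V2: the helper names build_extraction_model and validate_response, the f0_ style prefix, and the extra="forbid" setting are assumptions for illustration and are not taken from this PR's diff.

```python
import re

from pydantic import ConfigDict, create_model


def build_extraction_model(field_names):
    """Return (model, safe_to_original) for a list of PDF field names."""
    safe_to_original = {}
    definitions = {}
    for index, name in enumerate(field_names):
        # Replace every character that is not a letter, digit, or underscore.
        safe = re.sub(r"\W", "_", name).strip("_").lower() or "field"
        # The index prefix keeps names unique and valid as identifiers.
        safe = f"f{index}_{safe}"
        safe_to_original[safe] = name
        # (type, default): a field the LLM omits falls back to "-1".
        definitions[safe] = (str, "-1")
    model = create_model(
        "ExtractionResult",
        __config__=ConfigDict(extra="forbid"),  # reject unexpected keys
        **definitions,
    )
    return model, safe_to_original


def validate_response(raw_json, model, safe_to_original):
    """Validate the LLM's JSON reply and restore the original field names."""
    validated = model.model_validate_json(raw_json)
    return {safe_to_original[k]: v for k, v in validated.model_dump().items()}


model, mapping = build_extraction_model(["Incident Address", "Casualties"])
schema = model.model_json_schema()  # could be embedded in the single prompt
result = validate_response('{"f0_incident_address": "123 Maple Avenue"}', model, mapping)
print(result)
```

Here the reply omits the casualties field, so the validated result carries "-1" for "Casualties" while "Incident Address" is mapped back to its original name.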

Testing Performed

✅ Tested locally against a running Ollama instance (Mistral model).

✅ Verified that a transcript can successfully populate multiple complex fields (e.g., Address, Incident Type, Time) in a single request.

✅ Verified that missing data points gracefully output "-1".

✅ Verified that plural string outputs (e.g., "val1; val2") are still correctly parsed into lists for backward compatibility.
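
The plural-value handling checked above might look roughly like the following. This is a minimal sketch: split_plural_value is a hypothetical helper, and leaving single values as plain strings is an assumption about the existing behavior.

```python
def split_plural_value(value, separator=";"):
    """Return a list when the value contains the separator, else the string unchanged."""
    if separator not in value:
        return value
    # Drop surrounding whitespace and any empty pieces, e.g. from a trailing ";".
    return [part.strip() for part in value.split(separator) if part.strip()]


plural = split_plural_value("val1; val2")
single = split_plural_value("-1")
```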

Test Output

[LOG] Generating Pydantic Schema for 4 fields...
[LOG] Sending single-shot extraction request to Ollama...
[LOG] Resulting JSON created and validated via Pydantic:
{
  "Incident Address": "123 Maple Avenue",
  "Incident Type": "Structural Fire",
  "Time of Incident": "14:30",
  "Casualties": "-1"
}
