Feat/pydantic validation #61
Open
pigeio wants to merge 3 commits into
Linked Issues
Resolves #59 (Feature: Pydantic Validation & Single-Shot Extraction)
Resolves #29 (Bug: Mutable default argument in textToJSON causes data leakage)
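For reviewers unfamiliar with the bug class behind #29, here is a minimal sketch of how a mutable default argument leaks data between calls. The class names and constructor signatures are hypothetical and are not the actual textToJSON code; they only illustrate the pattern and the usual `None`-sentinel fix.

```python
class LeakyExtractor:
    """Hypothetical stand-in: the default dict is created once, at definition time."""

    def __init__(self, transcript, results={}):  # shared by every call that omits `results`
        self.transcript = transcript
        self.results = results


class SafeExtractor:
    """Hypothetical stand-in: a fresh dict is created for each instance."""

    def __init__(self, transcript, results=None):
        self.transcript = transcript
        self.results = {} if results is None else results


# The first instance writes into the shared default dict...
first = LeakyExtractor("first transcript")
first.results["Incident Type"] = "Structural Fire"

# ...and the second instance sees data it never extracted.
second = LeakyExtractor("second transcript")
leaked = second.results

# With the None sentinel, each instance starts empty.
safe_first = SafeExtractor("first transcript")
safe_first.results["Incident Type"] = "Structural Fire"
safe_second = SafeExtractor("second transcript")
```

Here `leaked` holds the first instance's entry, while `safe_second.results` stays empty.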
Description
This PR significantly refactors the textToJSON class in src/backend.py to improve the performance, reliability, and type-safety of the AI data extraction pipeline.
Previously, the pipeline iterated through target fields one by one, making multiple HTTP requests to Ollama and relying on manual string manipulation. This PR replaces that with a single-shot extraction approach using Pydantic V2.
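As a rough illustration of the single-shot approach, the sketch below builds a Pydantic V2 model from a list of field names and validates one JSON response against it. The helper name `build_extraction_model`, the internal `field_N` attribute names, and the hard-coded response string are assumptions made for this example; the actual request to Ollama and the real code in src/backend.py are not shown here and may differ.

```python
from pydantic import Field, create_model

MISSING = "-1"  # assumed placeholder for values the model could not find


def build_extraction_model(field_names):
    """Create a Pydantic model with one optional string field per target field.

    Human-readable names such as "Incident Address" are attached as aliases,
    because they are not valid Python attribute names.
    """
    definitions = {
        f"field_{index}": (str, Field(default=MISSING, alias=name))
        for index, name in enumerate(field_names)
    }
    return create_model("ExtractionResult", **definitions)


fields = ["Incident Address", "Incident Type", "Time of Incident", "Casualties"]
ExtractionResult = build_extraction_model(fields)

# Stand-in for the body of a single LLM response (no HTTP call is made here).
raw_response = (
    '{"Incident Address": "123 Maple Avenue", '
    '"Incident Type": "Structural Fire", '
    '"Time of Incident": "14:30"}'
)

# One validation pass covers every field; absent keys fall back to MISSING.
result = ExtractionResult.model_validate_json(raw_response).model_dump(by_alias=True)
print(result)
```

Validating the whole response once means a malformed or wrongly typed value raises a single `ValidationError`, rather than failing silently during per-field string handling.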
Key Changes
Testing Performed
✅ Tested locally against a running Ollama instance (Mistral model).
✅ Verified that a transcript can successfully populate multiple complex fields (e.g., Address, Incident Type, Time) in a single request.
✅ Verified that missing data points gracefully output -1.
✅ Verified that plural string outputs (e.g., "val1; val2") are still correctly parsed into lists for backward compatibility.
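The last two checks can be pictured with a small post-processing helper. This is a sketch under assumptions: the function name `normalize_value` is hypothetical, and the choice to represent a missing value as the string "-1" is inferred from the test output rather than taken from the implementation.

```python
MISSING = "-1"  # assumed representation of a missing data point


def normalize_value(value):
    """Map empty values to MISSING and split "a; b" strings into lists."""
    if value is None:
        return MISSING
    text = str(value).strip()
    if not text:
        return MISSING
    if ";" in text:
        # Plural answers arrive as one semicolon-separated string.
        parts = [part.strip() for part in text.split(";") if part.strip()]
        return parts if parts else MISSING
    return text


examples = {
    "plural": normalize_value("val1; val2"),
    "single": normalize_value("14:30"),
    "missing": normalize_value(None),
    "blank": normalize_value("   "),
}
```

Single values pass through unchanged, so callers that only expect plain strings for non-plural fields keep working.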
Test Output
[LOG] Generating Pydantic Schema for 4 fields...
[LOG] Sending single-shot extraction request to Ollama...
[LOG] Resulting JSON created and validated via Pydantic:
{
  "Incident Address": "123 Maple Avenue",
  "Incident Type": "Structural Fire",
  "Time of Incident": "14:30",
  "Casualties": "-1"
}