[FEAT]: Implement strict JSON schema validation and single-shot extraction using Pydantic

---
name: 🚀 Feature Request
about: Suggest an idea or a new capability for FireForm.
title: "[FEAT]: <Short Description>"
labels: enhancement
assignees: ''

---

## 📝 Description
Currently, the textToJSON class in src/backend.py iterates through every target field individually, making separate API calls to the LLM for each field. This approach is slow (latency increases linearly with the number of fields) and fragile, as it relies on manual string manipulation (e.g., .replace('"', '')) to parse the AI's response.
If the LLM outputs conversational text or malformed JSON, the application creates invalid data structures that cause KeyError or crashes during the PDF filling stage.
I propose refactoring the backend to use Pydantic V2 for single-shot data extraction and strict schema validation.

## 💡 Rationale

1. Reliability: Emergency response software requires deterministic outputs. Pydantic guarantees that the data passed to the PDF filler matches the expected schema types (str, int, etc.), preventing runtime crashes.

2. Performance: Instead of N API calls for N fields, we can extract all data in 1 single API call. This significantly reduces the time from "voice input" to "filled PDF."

3. Data Integrity: We need to handle cases where the LLM omits fields or hallucinates keys. Pydantic handles missing fields via Optional types and default values (e.g., "-1").

4. Fixes Existing Bugs: This refactor also resolves [BUG]: Mutable default argument in textToJSON #29.

## 🛠️ Proposed Solution
A brief sketch of how we might implement this. 

[ ] Updates src/backend.py to use pydantic.create_model to dynamically generate a schema based on the input field list.
[ ] Replaces the iterative main_loop with a single request to Ollama using format="json".
[ ] Implements model_validate_json to parse and validate the AI response strictly.
[ ] Updates requirements.txt to include pydantic>=2.0.

## ✅ Acceptance Criteria
How will we know this is finished?

[ ] The system extracts all requested fields (Address, Time, Incident Type, etc.) in a single LLM pass.
[ ] The application handles missing information by defaulting to "-1" (or a configured default) instead of crashing.
[ ] The output is successfully mapped to the existing Fill class for PDF generation.
[ ] requirements.txt is updated to include Pydantic.

## 📌 Additional Context
I have tested this locally with ollama (Mistral model).

Previous implementation: ~15-20 seconds for a 5-field form (multiple round trips).
New implementation: ~3-5 seconds (single round trip).

I have already implemented this locally and verified the tests pass.


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[FEAT]: Implement strict JSON schema validation and single-shot extraction using Pydantic #59

📝 Description

💡 Rationale

🛠️ Proposed Solution

✅ Acceptance Criteria

📌 Additional Context

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

[FEAT]: Implement strict JSON schema validation and single-shot extraction using Pydantic #59

Description

📝 Description

💡 Rationale

🛠️ Proposed Solution

✅ Acceptance Criteria

📌 Additional Context

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions