[Enhancement] Add JSON Schema Validation for LLM-Extracted Output to Ensure Data Integrity #40

Currently, the `textToJSON` class in `backend.py` sends prompts to the Ollama LLM and stores whatever raw string the model returns directly in the JSON dictionary, with no validation at all. The AI-extracted values are used to fill PDF forms without any schema definition, type checking, format enforcement, or constraint verification.

This is a critical gap because LLMs are inherently non-deterministic and can return:

- Hallucinated or nonsensical values (e.g., a phone number of "yes")
- Incorrectly formatted data (e.g., a date given as "January 2" instead of "01/02/2025")
- Extra conversational text wrapping the answer (e.g., "The name is John Doe." instead of "John Doe")
- Empty or partial responses

## Proposed Solution
- Define a JSON Schema (or Pydantic model) describing the expected type and constraints of each field:

```json
{
  "type": "object",
  "properties": {
    "employee_name": {"type": "string", "minLength": 1},
    "phone_number": {"type": "string", "pattern": "^[0-9\\-\\+\\(\\) ]+$"},
    "date": {"type": "string", "format": "date"},
    "email": {"type": "string", "format": "email"}
  }
}
```
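A minimal sketch of checking one extracted value against the schema above, written in Python to match `backend.py`. It hand-rolls only the keywords the schema uses (`minLength`, `pattern`, and the `date`/`email` formats); `FIELD_RULES`, `validate_field`, and `_check_format` are hypothetical names, and the email check is deliberately loose. A real implementation would more likely use the `jsonschema` package (which does not enforce `format` unless a format checker is supplied) or a Pydantic model. Note that JSON Schema's `date` format is an RFC 3339 full-date (`YYYY-MM-DD`), so a value such as `01/02/2025` would be rejected; if the PDF forms need MM/DD/YYYY, the date rule would have to be a `pattern` instead.

```python
import re
from datetime import date

# Hypothetical per-field rules mirroring the schema above. This is a hand-rolled
# subset (minLength, pattern, format: date/email), NOT a full JSON Schema engine.
FIELD_RULES = {
    "employee_name": {"minLength": 1},
    "phone_number": {"pattern": r"^[0-9\-\+\(\) ]+$"},
    "date": {"format": "date"},
    "email": {"format": "email"},
}

# Deliberately loose email check: no whitespace, one "@", and a dot in the domain.
_EMAIL_RE = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")


def _check_format(value: str, fmt: str) -> bool:
    """Check the two formats the schema uses; unknown formats are not enforced."""
    if fmt == "date":
        # RFC 3339 full-date (YYYY-MM-DD); fromisoformat also rejects e.g. 2025-02-30.
        if not re.fullmatch(r"[0-9]{4}-[0-9]{2}-[0-9]{2}", value):
            return False
        try:
            date.fromisoformat(value)
        except ValueError:
            return False
        return True
    if fmt == "email":
        return _EMAIL_RE.fullmatch(value) is not None
    return True


def validate_field(name: str, value: object) -> list[str]:
    """Return a list of error messages; an empty list means the value passed."""
    rules = FIELD_RULES.get(name)
    if rules is None:
        return [f"{name}: no rule defined for this field"]
    if not isinstance(value, str):
        return [f"{name}: expected a string, got {type(value).__name__}"]
    errors: list[str] = []
    min_len = rules.get("minLength")
    if min_len is not None and len(value) < min_len:
        errors.append(f"{name}: must be at least {min_len} character(s)")
    pattern = rules.get("pattern")
    # The patterns are fully anchored, so fullmatch is used; it also rejects a
    # trailing newline, which Python's "$" alone would allow.
    if pattern is not None and re.fullmatch(pattern, value) is None:
        errors.append(f"{name}: does not match {pattern}")
    fmt = rules.get("format")
    if fmt is not None and not _check_format(value, fmt):
        errors.append(f"{name}: not a valid {fmt}")
    return errors
```

Each returned string describes one failed constraint, so the caller can log the list or feed it back into a re-prompt. A value like `"The name is John Doe."` still passes `employee_name`, since the schema only constrains length; catching wrapper text would need a stricter rule.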

- After each LLM response in `main_loop()`, validate the extracted value against the schema.
- Flag any value that fails validation with a warning and, optionally, prompt the user for a correction.
- Add a retry mechanism for failed extractions that re-prompts the LLM with more specific instructions.
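The validate, flag, and retry steps could be combined into one per-field helper called from `main_loop()`. This is a minimal Python sketch under assumptions: `extract_with_retry` is a hypothetical name, `ask_llm` stands in for whatever function `textToJSON` uses to send a prompt to Ollama and get a string back, and `validate` is any function of the form `(field, value) -> list[str]` that returns an empty list on success. Failures are flagged through the standard `logging` module, each retry feeds the specific validation errors back into the prompt, and a final `None` tells the caller to ask the user for the value.

```python
import logging
from typing import Callable, Optional

logger = logging.getLogger(__name__)


def extract_with_retry(
    field: str,
    ask_llm: Callable[[str], str],
    validate: Callable[[str, str], list[str]],
    max_retries: int = 2,
) -> tuple[Optional[str], list[str]]:
    """Ask the LLM for one field, re-prompting with the validation errors on failure.

    Returns (value, errors). value is None when every attempt failed, so the
    caller can flag the field and ask the user to fill it in manually.
    """
    # Hypothetical prompt wording; the real prompt in textToJSON may differ.
    prompt = f"Extract the value for '{field}'. Reply with the value only."
    errors: list[str] = []
    for attempt in range(max_retries + 1):
        raw = ask_llm(prompt).strip()  # drop surrounding whitespace/newlines
        errors = validate(field, raw)
        if not errors:
            return raw, []
        # Flag the failure with a warning before retrying.
        logger.warning("Attempt %d for %r failed validation: %s", attempt + 1, field, errors)
        # Re-prompt with more specific instructions built from the errors.
        prompt = (
            f"Your previous answer for '{field}' was {raw!r}, which is invalid: "
            f"{'; '.join(errors)}. Reply with only a corrected value, no extra text."
        )
    return None, errors
```

Returning `(None, errors)` instead of raising keeps the decision about prompting the user in `main_loop()`, which could show the collected errors next to an `input()` prompt.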

## ✅ Acceptance Criteria
How will we know this is finished?
- [x] Feature works in Docker container.
- [x] Documentation updated in `docs/`.
- [x] JSON output validates against the schema.

## 📌 Additional Context
Add any other screenshots, links to fire department forms, or research here.

