[BUG]: Mistral AI Outputs are Written Directly to Legal PDFs Without Confidence Checks #222

---
name: 🐛 Bug Report
about: Create a report to help us improve FireForm.
title: "[BUG]: Missing Confidence Validation for LLM-Generated Content in PDF Filler"
labels: bug, enhancement, high-priority
assignees: ''

---

## ⚡️ Describe the Bug
When incident reports are processed, LLM outputs are written directly into the generated PDF form widgets (`src/filler.py`) without any intermediate validation or confidence-threshold check.

Because FireForm's target users include Cal Fire, EMS, and Sheriff's departments, accuracy in these forms is a strict legal requirement. LLMs are known to hallucinate, for example by fabricating names, badge numbers, or incident codes. Writing a hallucinated value onto an official government or medical document without human oversight creates a significant liability risk, which makes this a critical architectural bug to resolve before production adoption.

## 👣 Steps to Reproduce
1. Submit a form processing request via the API with a standard PDF template.
2. Provide an `input_text` that is ambiguous or intentionally omits a required field (e.g., leaving out the badge number).
3. The LLM (`mistral`) will attempt to guess or hallucinate the missing value.
4. Observe `src/filler.py` lines 41-44, where the answer is assigned directly: `annot.V = f"{answers_list[i]}"`.
5. Note that the system inserts whatever string Mistral generated into the official document without raising a warning or flagging a low-confidence guess.
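
The write in step 4 can be illustrated without a PDF. The sketch below is a minimal stand-in, not the actual `src/filler.py` code: `SimpleNamespace` objects play the role of form annotations, and `fill_unchecked` is a hypothetical name for a loop built around the quoted assignment. It shows that the f-string assignment accepts any value, including a guess or a missing one, and gives the caller no signal.

```python
from types import SimpleNamespace


def fill_unchecked(annotations, answers_list):
    """Mimic the unguarded assignment quoted in step 4 (stand-in, not the real filler)."""
    for i, annot in enumerate(annotations):
        # Same pattern as the quoted line: whatever the LLM returned is stringified.
        annot.V = f"{answers_list[i]}"
    return annotations


# Hypothetical widgets and answers; "B-4471" stands for a guessed badge number.
widgets = [
    SimpleNamespace(T="officer_name", V=""),
    SimpleNamespace(T="badge_number", V=""),
]
filled = fill_unchecked(widgets, ["John Doe", "B-4471"])
values = [w.V for w in filled]

# A missing answer is written as the literal text "None".
missing = fill_unchecked([SimpleNamespace(T="badge_number", V="")], [None])[0].V
```

Nothing in this path distinguishes the guessed badge number from the name that was actually present in the input, which is the gap this issue describes.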

## 📉 Expected Behavior
The system should enforce a validation boundary before modifying the PDF. The underlying LLM extraction (in `src/llm.py`) should return a structured response with a **confidence score**. If a field's confidence falls below a set threshold (e.g., 85%), the system should flag it as `needs_review`. The API should return this state so the frontend can highlight the specific field for human-in-the-loop review by the first responder before the final PDF is generated.
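
The threshold rule could look roughly like the following. This is a sketch under assumptions: `ExtractedField`, `REVIEW_THRESHOLD`, and `fields_needing_review` are hypothetical names that do not exist in FireForm today, and the 0.85 value is only the example threshold mentioned above.

```python
from dataclasses import dataclass

REVIEW_THRESHOLD = 0.85  # example value from this issue; an assumption, not a tuned setting


@dataclass(frozen=True)
class ExtractedField:
    """One extracted form field with a model-reported confidence in [0.0, 1.0]."""

    name: str
    value: str
    confidence: float

    @property
    def needs_review(self) -> bool:
        # Strictly below the threshold means a human must confirm the value.
        return self.confidence < REVIEW_THRESHOLD


def fields_needing_review(fields):
    """Return the names of fields that should be highlighted for review."""
    return [f.name for f in fields if f.needs_review]


fields = [
    ExtractedField("officer_name", "John Doe", 0.95),
    ExtractedField("badge_number", "B-4471", 0.40),
]
flagged = fields_needing_review(fields)
```

Here only `badge_number` is flagged, so the frontend would highlight that single field while leaving `officer_name` untouched.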

## 🖥️ Environment Information
- **OS:** N/A (Docker container based)
- **Docker/Compose Version:** N/A
- **Ollama Model used:** mistral (default)

## 📸 Screenshots/Logs
*N/A - Architectural flaw in `src/filler.py` and `src/llm.py` logic rather than a runtime crash.*

## 🕵️ Possible Fix
1. **Prompt Engineering Update:** Update `src/llm.py` to prompt the model to return a JSON object containing `value` and `confidence` (e.g. `{"value": "John Doe", "confidence": 0.95}`) instead of a raw string.
2. **Review State Implementation:** Introduce a `needs_review` flag for fields with `confidence < 0.85`.
3. **API Handoff:** Ensure the FastAPI response includes the `needs_review` data rather than immediately generating the final PDF, allowing the frontend client to enforce a human review step.
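
The three steps above can be sketched together as follows. All names here (`parse_field`, `build_review_payload`, the `status` values) are hypothetical and only illustrate one possible shape; the sketch uses plain dictionaries so it stays independent of how the FastAPI routes are actually defined. Malformed or out-of-range model output is treated as confidence 0.0, so it is always routed to review rather than written.

```python
import json

REVIEW_THRESHOLD = 0.85  # example threshold from step 2; an assumption


def parse_field(raw: str) -> dict:
    """Parse one model reply expected to look like {"value": ..., "confidence": ...}."""
    try:
        data = json.loads(raw)
        value = data["value"]
        confidence = float(data["confidence"])
    except (json.JSONDecodeError, KeyError, TypeError, ValueError):
        # Not JSON, wrong shape, or non-numeric score: force a human review.
        return {"value": None, "confidence": 0.0, "needs_review": True}
    if not 0.0 <= confidence <= 1.0:
        confidence = 0.0  # out-of-range (or NaN) scores are not trusted
    return {
        "value": value,
        "confidence": confidence,
        "needs_review": confidence < REVIEW_THRESHOLD,
    }


def build_review_payload(raw_by_field: dict) -> dict:
    """Build a response body that reports review state instead of a finished PDF."""
    fields = {name: parse_field(raw) for name, raw in raw_by_field.items()}
    pending = [name for name, field in fields.items() if field["needs_review"]]
    return {
        "status": "needs_review" if pending else "ready",
        "fields": fields,
        "needs_review": pending,
    }


payload = build_review_payload({
    "officer_name": '{"value": "John Doe", "confidence": 0.95}',
    "badge_number": "B-4471",  # raw string instead of JSON, so it gets flagged
})
```

With a payload like this, PDF generation would run only once `status` is `ready`, or after the client resubmits the values a first responder has confirmed.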

