name: 🐛 Bug Report
about: Create a report to help us improve FireForm.
title: "[BUG]: Missing Confidence Validation for LLM-Generated Content in PDF Filler"
labels: bug, enhancement, high-priority
assignees: ''
⚡️ Describe the Bug
When processing incident reports, the LLM outputs are currently written directly into the generated PDF form widgets (src/filler.py) without any intermediate validation or confidence threshold check.
Because FireForm's target users include Cal Fire, EMS, and Sheriff's departments, accuracy in these forms is a strict legal requirement. LLMs are known to hallucinate (fabricating names, badge numbers, or incident codes). Writing a hallucinated value directly onto an official government or medical document without human oversight creates a significant liability risk, which makes this a critical architectural bug to fix before production adoption.
👣 Steps to Reproduce
- Submit a form processing request via the API with a standard PDF template.
- Provide an input_text that is ambiguous or intentionally omits a required field (e.g., leaving out the badge number).
- The LLM (mistral) will attempt to guess or hallucinate the missing value.
- Observe src/filler.py lines 41-44: annot.V = f"{answers_list[i]}".
- See that the system forcefully inserts whatever hallucinated string Mistral generated into the official document without throwing a warning or flagging a low-confidence guess.
📉 Expected Behavior
The system should enforce a validation boundary before modifying the PDF. The underlying LLM extraction (in src/llm.py) should return a structured response with a confidence score. If a field's confidence falls below a set threshold (e.g., 0.85), the system should flag it as needs_review. The API should return this state so the frontend can highlight the specific field for a "human-in-the-loop" review by the First Responder before the final PDF is generated.
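A minimal sketch of the threshold check described above, not FireForm's actual code. The ExtractedField class, the split_by_confidence helper, and the REVIEW_THRESHOLD constant are hypothetical names introduced here for illustration; only the 0.85 example threshold and the needs_review idea come from this report.

```python
from dataclasses import dataclass

# Hypothetical default, taken from the example threshold in this report.
REVIEW_THRESHOLD = 0.85


@dataclass
class ExtractedField:
    """One extracted form field (hypothetical structure, not from src/llm.py)."""
    name: str
    value: str
    confidence: float


def split_by_confidence(fields, threshold=REVIEW_THRESHOLD):
    """Partition fields into (accepted, needs_review) by confidence score."""
    accepted, needs_review = [], []
    for field in fields:
        if field.confidence >= threshold:
            accepted.append(field)
        else:
            needs_review.append(field)
    return accepted, needs_review


fields = [
    ExtractedField("officer_name", "John Doe", 0.95),
    ExtractedField("badge_number", "4471", 0.40),
]
accepted, needs_review = split_by_confidence(fields)
```

Under this sketch, only the accepted fields would be eligible for writing into the PDF; anything in needs_review would wait for a human decision.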
🖥️ Environment Information
- OS: N/A (Docker container based)
- Docker/Compose Version: N/A
- Ollama Model used: mistral (default)
📸 Screenshots/Logs
N/A - Architectural flaw in src/filler.py and src/llm.py logic rather than a runtime crash.
🕵️ Possible Fix
- Prompt Engineering Update: Update src/llm.py to prompt the model to return a JSON object containing value and confidence (e.g. {"value": "John Doe", "confidence": 0.95}) instead of a raw string.
- Review State Implementation: Introduce a needs_review flag for fields with confidence < 0.85.
- API Handoff: Ensure the FastAPI response includes the needs_review data rather than immediately generating the final PDF, allowing the frontend client to enforce a human review step.
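The parsing and handoff steps could look roughly like the sketch below. It assumes the model returns one JSON string per field; parse_field_response, build_review_payload, and the ready_to_fill key are hypothetical names, not existing FireForm or FastAPI APIs. Replies that are not valid JSON, or that carry an out-of-range confidence, are treated as confidence 0.0 so they always go to review.

```python
import json


def parse_field_response(raw: str) -> dict:
    """Parse one model reply into {"value", "confidence"}.

    Anything that is not a JSON object with both keys is kept as the raw
    text with confidence 0.0, so it can never bypass review.
    """
    try:
        data = json.loads(raw)
        value = str(data["value"])
        confidence = float(data["confidence"])
    except (json.JSONDecodeError, KeyError, TypeError, ValueError):
        return {"value": raw.strip(), "confidence": 0.0}
    # Out-of-range or NaN scores are treated as untrusted.
    if not (0.0 <= confidence <= 1.0):
        confidence = 0.0
    return {"value": value, "confidence": confidence}


def build_review_payload(raw_by_field: dict, threshold: float = 0.85) -> dict:
    """Build a response body that reports per-field review state."""
    fields = {}
    for name, raw in raw_by_field.items():
        parsed = parse_field_response(raw)
        parsed["needs_review"] = parsed["confidence"] < threshold
        fields[name] = parsed
    # Hypothetical gate: the PDF is only filled when nothing needs review.
    ready = not any(f["needs_review"] for f in fields.values())
    return {"fields": fields, "ready_to_fill": ready}


payload = build_review_payload({
    "officer_name": '{"value": "John Doe", "confidence": 0.95}',
    "badge_number": "probably 4471",
})
```

A FastAPI endpoint could return a dictionary like payload directly, and the filler would run only once the client resubmits the reviewed values.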