[BUG]: Mistral AI Outputs are Written Directly to Legal PDFs Without Confidence Checks #222

@RITVIKKAMASETTY

name: 🐛 Bug Report
about: Create a report to help us improve FireForm.
title: "[BUG]: Missing Confidence Validation for LLM-Generated Content in PDF Filler"
labels: bug, enhancement, high-priority
assignees: ''


⚡️ Describe the Bug

When processing incident reports, the LLM outputs are currently written directly into the generated PDF form widgets (src/filler.py) without any intermediate validation or confidence threshold check.

Because FireForm's target users include Cal Fire, EMS, and Sheriff's departments, accuracy in these forms is a strict legal requirement. LLMs are known to hallucinate, fabricating names, badge numbers, or incident codes. Writing a hallucinated value onto an official government or medical document without human oversight creates a significant liability risk, which makes this a critical architectural bug to resolve before production adoption.

👣 Steps to Reproduce

  1. Submit a form processing request via the API with a standard PDF template.
  2. Provide an input_text that is ambiguous or intentionally omits a required field (e.g., leaving out the badge number).
  3. The LLM (mistral) will attempt to guess or hallucinate the missing value.
  4. Observe src/filler.py lines 41-44, where the raw answer is assigned to the form widget: annot.V = f"{answers_list[i]}".
  5. See that the system inserts whatever string Mistral generated, hallucinated or not, into the official document without raising a warning or flagging a low-confidence guess.

📉 Expected Behavior

The system should enforce a validation boundary before modifying the PDF. The underlying LLM extraction (in src/llm.py) should return a structured response with a confidence score for each field. If a field's confidence falls below a set threshold (e.g., 85%), the system should flag that field as needs_review. The API should return this state so the frontend can highlight the specific field for a "human-in-the-loop" review by the first responder before the final PDF is generated.
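
A minimal sketch of that validation boundary, to make the proposal concrete. The names `ExtractedField`, `split_for_review`, and `REVIEW_THRESHOLD` are hypothetical and do not exist in FireForm today; the 0.85 value is only the example threshold from this issue. It also assumes the confidence is self-reported by the model, which may not be well calibrated.

```python
from dataclasses import dataclass

# Example threshold taken from this issue; an assumption, not a tuned value.
REVIEW_THRESHOLD = 0.85


@dataclass
class ExtractedField:
    """One extracted form field (hypothetical shape, not FireForm's current API)."""
    name: str
    value: str
    confidence: float  # self-reported by the model, expected in [0.0, 1.0]

    @property
    def needs_review(self) -> bool:
        # Strictly below the threshold means a human must confirm the value.
        return self.confidence < REVIEW_THRESHOLD


def split_for_review(fields):
    """Separate fields that may be written to the PDF from fields needing review."""
    accepted = [f for f in fields if not f.needs_review]
    flagged = [f for f in fields if f.needs_review]
    return accepted, flagged


if __name__ == "__main__":
    fields = [
        ExtractedField("officer_name", "John Doe", 0.95),
        ExtractedField("badge_number", "4471", 0.40),
    ]
    accepted, flagged = split_for_review(fields)
    print("write:", [f.name for f in accepted])
    print("review:", [f.name for f in flagged])
```

Under this sketch, only the `accepted` list would ever reach the code that assigns widget values; the `flagged` list would go back to the client.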

🖥️ Environment Information

  • OS: N/A (Docker container based)
  • Docker/Compose Version: N/A
  • Ollama Model used: mistral (default)

📸 Screenshots/Logs

N/A - Architectural flaw in src/filler.py and src/llm.py logic rather than a runtime crash.

🕵️ Possible Fix

  1. Prompt Engineering Update: Update src/llm.py to prompt the model to return a JSON object containing value and confidence (e.g. {"value": "John Doe", "confidence": 0.95}) instead of a raw string.
  2. Review State Implementation: Introduce a needs_review flag for fields with confidence < 0.85.
  3. API Handoff: Ensure the FastAPI response includes the needs_review data rather than immediately generating the final PDF, allowing the frontend client to enforce a human review step.
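
The three steps above could fit together roughly as follows. This is a sketch under assumptions, not FireForm's actual code: `parse_field` and `build_response` are hypothetical helpers, the response keys (`status`, `needs_review`, `fields`) are illustrative, and it assumes one JSON reply per field. A reply that is not valid JSON, or that lacks a usable confidence, is treated as zero confidence so it is always sent to review rather than written.

```python
import json

# Example threshold taken from this issue; an assumption, not a tuned value.
REVIEW_THRESHOLD = 0.85


def parse_field(raw: str) -> dict:
    """Parse one model reply of the form {"value": ..., "confidence": ...}.

    Anything malformed is treated as zero confidence and flagged for review.
    """
    try:
        data = json.loads(raw)
        value = data["value"]
        confidence = float(data["confidence"])
    except (json.JSONDecodeError, KeyError, TypeError, ValueError):
        return {"value": None, "confidence": 0.0, "needs_review": True}
    # Out-of-range or NaN scores are not trusted.
    if not 0.0 <= confidence <= 1.0:
        confidence = 0.0
    return {
        "value": value,
        "confidence": confidence,
        "needs_review": confidence < REVIEW_THRESHOLD,
    }


def build_response(raw_by_field: dict) -> dict:
    """Build a payload the API could return instead of generating the PDF at once."""
    fields = {name: parse_field(raw) for name, raw in raw_by_field.items()}
    pending = [name for name, f in fields.items() if f["needs_review"]]
    return {
        "status": "needs_review" if pending else "ready",
        "needs_review": pending,
        "fields": fields,
    }


if __name__ == "__main__":
    replies = {
        "officer_name": '{"value": "John Doe", "confidence": 0.95}',
        "badge_number": "I think it is 4471",  # not JSON, so it is flagged
    }
    print(json.dumps(build_response(replies), indent=2))
```

Returning a plain dict keeps the sketch free of framework dependencies; in a FastAPI route the same structure could be returned directly or wrapped in a response model.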
