[FEAT]: Schema Validation + Error Recovery #114

@Acuspeedster

name: 🚀 Feature Request
about: Suggest an idea or a new capability for FireForm.
title: "[FEAT]: Add Schema Validation and Error Recovery for LLM Output"
labels: enhancement
assignees: ''

📝 Description

Currently, LLM extraction output is passed directly into the PDF filler without validation.
There is no verification that extracted fields:

  • Match expected data types
  • Exist in the schema
  • Were actually found in the transcript

If the LLM hallucinates, returns partial data, or outputs mismatched types, the PDF may be silently corrupted or left incomplete.


💡 Rationale

FireForm is designed for real-world emergency response environments where accuracy and reliability are critical.

Without schema validation:

  • Incorrect values silently propagate
  • Missing fields are not surfaced to operators
  • Type mismatches go unnoticed
  • There is no structured error recovery

A validation layer improves reliability, transparency, and operator trust.


🛠️ Proposed Solution

Introduce a SchemaValidator class:

  • Validate extracted data against template schema

  • Attempt type coercion where possible (e.g. "2" → int)

  • Classify field confidence:

    • HIGH
    • LOW
    • MISSING
  • Produce a structured ValidationReport

  • Surface warnings without crashing the pipeline

  • Return validated clean data for PDF filling

  • Logic change in src/

  • New validation module (src/validator.py)

  • Integrate into file_manipulator.py

  • Unit tests for validation edge cases
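
To make the proposal concrete, here is a rough sketch of what the validator could look like. It is not an existing FireForm API: the schema shape (field name mapped to an expected Python type), the Confidence enum, the field names on ValidationReport other than validated_data, and the choice to leave LOW fields out of validated_data are all assumptions made for illustration. It also does not cover checking whether a value was actually found in the transcript.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Dict, List


class Confidence(Enum):
    HIGH = "HIGH"
    LOW = "LOW"
    MISSING = "MISSING"


@dataclass
class ValidationReport:
    # validated_data is named in the acceptance criteria; the other
    # two attributes are hypothetical.
    validated_data: Dict[str, Any] = field(default_factory=dict)
    confidence: Dict[str, Confidence] = field(default_factory=dict)
    warnings: List[str] = field(default_factory=list)


def _is_missing(value: Any) -> bool:
    # null, "" and "-1" are treated as MISSING.
    if value is None:
        return True
    return isinstance(value, str) and value in ("", "-1")


class SchemaValidator:
    def __init__(self, schema: Dict[str, type]) -> None:
        # Assumed schema shape: {"field_name": expected_type}.
        self.schema = schema

    def validate(self, extracted: Dict[str, Any]) -> ValidationReport:
        report = ValidationReport()

        # Fields the LLM returned that the schema does not know about.
        for name in extracted:
            if name not in self.schema:
                report.warnings.append(f"{name}: not in schema, ignored")

        for name, expected in self.schema.items():
            raw = extracted.get(name)
            if _is_missing(raw):
                report.confidence[name] = Confidence.MISSING
                report.warnings.append(f"{name}: missing")
                continue
            try:
                # Attempt coercion before marking the field LOW.
                value = raw if isinstance(raw, expected) else expected(raw)
            except (TypeError, ValueError):
                report.confidence[name] = Confidence.LOW
                report.warnings.append(
                    f"{name}: cannot coerce {raw!r} to {expected.__name__}"
                )
                continue
            report.validated_data[name] = value
            report.confidence[name] = Confidence.HIGH

        return report


validator = SchemaValidator({"age": int, "name": str, "count": int})
report = validator.validate({"age": "2", "name": "-1", "count": "many"})
print(report.validated_data)  # {'age': 2}
```

Calling the type as a constructor is a deliberately naive coercion strategy. It works for int, float, and str, but bool("false") is True, so boolean fields would need an explicit parser.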


✅ Acceptance Criteria

  • All extracted fields validated against schema
  • null, "", and "-1" treated as MISSING
  • Type coercion attempted before marking LOW
  • ValidationReport exposes validated_data
  • Missing fields surfaced as warnings
  • Pipeline continues gracefully
  • Full unit test coverage
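
The "surfaced as warnings" and "pipeline continues gracefully" criteria could be satisfied along these lines. This is only a sketch: fill_with_warnings, the logger name, and the fill_pdf callable are hypothetical stand-ins, not the actual interface of file_manipulator.py.

```python
import logging
from typing import Any, Callable, Dict, List

logger = logging.getLogger("fireform.validation")  # hypothetical logger name


def fill_with_warnings(
    extracted: Dict[str, Any],
    expected_fields: List[str],
    fill_pdf: Callable[[Dict[str, Any]], None],
) -> List[str]:
    """Log missing fields, then fill the PDF with whatever is usable."""
    clean: Dict[str, Any] = {}
    missing: List[str] = []
    for name in expected_fields:
        value = extracted.get(name)
        # null, "" and "-1" are treated as MISSING.
        if value is None or (isinstance(value, str) and value in ("", "-1")):
            missing.append(name)
            logger.warning("Field %r missing from LLM output", name)
        else:
            clean[name] = value
    # Missing fields are reported, not raised: filling still proceeds.
    fill_pdf(clean)
    return missing
```

Returning the list of missing field names lets the caller show them to the operator instead of relying on log output alone.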

📌 Additional Context

This improves robustness of the core AI → JSON → PDF pipeline and aligns with FireForm’s goal of production-grade reliability.
