
[FEAT]: Department Profile System for Pre-Mapped PDF Templates #206

@utkarshqz


📝 Description

FireForm currently extracts PDF field names as machine-generated identifiers (e.g. textbox_0_0, textbox_0_1). When these are sent to Mistral for extraction, the model has no semantic context and either returns null for all fields or hallucinates a single value repeated across unrelated fields (see related Bug #173).

A Department Profile system would ship pre-built mappings between human-readable field labels and the internal PDF field identifiers for common agency forms used by fire departments, police, and EMS.

💡 Rationale

FireForm's mission is to serve real first responders out of the box. Currently:

  • A firefighter uploads a Cal Fire incident form
  • Mistral receives {"textbox_0_0": "", "textbox_0_1": ""}
  • It has no idea what these fields mean → returns null or wrong values
  • The filled PDF is blank or incorrect

With department profiles:

  • The profile provides {"Officer Name": "textbox_0_0", "Incident Location": "textbox_0_1"}
  • Mistral receives human-readable labels → extracts correctly
  • The filled PDF is accurate

This addresses the root cause of Issue #173 without changing the model itself. The only pipeline change is that the prompt uses profile labels instead of raw PDF field IDs.
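The sketch below illustrates the payload difference. It turns a profile's "fields" mapping into the label-keyed template the model would see. build_label_template and build_prompt are hypothetical helper names, and the prompt wording is a placeholder, not FireForm's actual prompt.

```python
import json


# Hypothetical helper: the model sees human labels, never PDF field IDs.
def build_label_template(field_map: dict[str, str]) -> dict[str, str]:
    """Return {label: ""} for every label in a profile's "fields" mapping."""
    return {label: "" for label in field_map}


# Hypothetical helper: prompt wording is illustrative, not FireForm's prompt.
def build_prompt(field_map: dict[str, str], transcript: str) -> str:
    """Assemble a plain-text extraction prompt from profile labels."""
    template = json.dumps(build_label_template(field_map), indent=2)
    return (
        "Extract the following fields from the transcript. "
        "Return JSON with exactly these keys; use null if a value is not stated.\n\n"
        f"Fields:\n{template}\n\nTranscript:\n{transcript}"
    )


fields = {"Officer Name": "textbox_0_0", "Incident Location": "textbox_0_1"}
print(build_label_template(fields))
# → {'Officer Name': '', 'Incident Location': ''}
```

The textbox_* IDs never appear in the prompt. They are only needed afterwards, when the label-keyed answer is written back into the PDF.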

🛠️ Proposed Solution

  • Create src/profiles/ directory with JSON profile files
  • Each profile maps human-readable field labels → internal PDF field IDs
  • Add profile selector to the frontend UI (dropdown by department type)
  • Pass field label mapping to LLM prompt during extraction
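The following is a minimal sketch of the profile loader. It assumes one *.json file per department in src/profiles/, with the file stem used as the profile ID. load_profile and list_profiles are hypothetical names. list_profiles returns the entries the frontend dropdown could be populated from.

```python
import json
from pathlib import Path

# Assumption: profiles are stored as src/profiles/<id>.json (path relative to CWD).
PROFILE_DIR = Path("src/profiles")


def load_profile(name: str, profile_dir: Path = PROFILE_DIR) -> dict:
    """Load one profile by ID, e.g. load_profile("fire_department")."""
    path = profile_dir / f"{name}.json"
    with path.open(encoding="utf-8") as fh:
        return json.load(fh)


def list_profiles(profile_dir: Path = PROFILE_DIR) -> list[dict[str, str]]:
    """Return id/department/description entries for a profile dropdown."""
    options = []
    for path in sorted(profile_dir.glob("*.json")):
        with path.open(encoding="utf-8") as fh:
            profile = json.load(fh)
        options.append({
            "id": path.stem,  # file name without .json, e.g. "ems_medical"
            "department": profile["department"],
            "description": profile.get("description", ""),
        })
    return options
```

With this layout, adding a new department means dropping a JSON file into the directory. No code changes are needed.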

Profile schema:

{
  "department": "Fire Department",
  "description": "Standard Cal Fire incident report",
  "fields": {
    "Officer Name": "textbox_0_0",
    "Badge Number": "textbox_0_1",
    "Incident Location": "textbox_0_2",
    "Incident Date": "textbox_0_3",
    "Number of Victims": "textbox_0_4"
  },
  "example_transcript": "Officer Smith, badge 4421, responding to structure fire at 742 Evergreen Terrace on March 8th. Two victims on scene."
}
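One way to check a profile file against this schema is the hand-rolled sketch below. It uses only the standard library and is an assumption rather than a settled design; a formal JSON Schema document could serve the same purpose. validate_profile is a hypothetical name. It also rejects two labels pointing at the same PDF field, since that would make the fill ambiguous.

```python
# Top-level keys every profile must have, with their expected JSON types.
REQUIRED_KEYS = {"department": str, "description": str, "fields": dict}


def validate_profile(profile: dict) -> list[str]:
    """Return a list of problems; an empty list means the profile looks valid."""
    errors = []
    for key, expected in REQUIRED_KEYS.items():
        if key not in profile:
            errors.append(f"missing key: {key}")
        elif not isinstance(profile[key], expected):
            errors.append(f"{key} should be {expected.__name__}")

    fields = profile.get("fields")
    if isinstance(fields, dict):
        if not fields:
            errors.append("fields must not be empty")
        for label, field_id in fields.items():
            if not isinstance(field_id, str) or not field_id:
                errors.append(f"field {label!r} has no PDF field ID")
        # Two labels mapped to one PDF field would make the fill ambiguous.
        ids = [v for v in fields.values() if isinstance(v, str)]
        if len(ids) != len(set(ids)):
            errors.append("duplicate PDF field IDs in fields")

    # example_transcript is optional, but must be a string when present.
    if "example_transcript" in profile and not isinstance(profile["example_transcript"], str):
        errors.append("example_transcript should be str")
    return errors
```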

Profiles to implement:

  • fire_department.json: Cal Fire incident report
  • police_report.json: standard police incident form
  • ems_medical.json: EMS patient care report

Supporting changes:

  • Logic change in src/llm.py to use profile labels in the prompt
  • Frontend dropdown to select a department profile
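On the src/llm.py side, the model's label-keyed answer has to be translated back to PDF field IDs before filling. The sketch below makes two assumptions: that the model returns a JSON object keyed by profile label, and that the PDF filler accepts a dict keyed by field ID. labels_to_pdf_fields is a hypothetical name.

```python
def labels_to_pdf_fields(extracted: dict, field_map: dict[str, str]) -> dict[str, str]:
    """Translate {label: value} from the model into {pdf_field_id: value}.

    Labels the model omitted or returned as null become "" so the PDF field
    is left blank instead of receiving the string "None". Keys the model
    invented (not present in the profile) are ignored.
    """
    filled = {}
    for label, field_id in field_map.items():
        value = extracted.get(label)
        filled[field_id] = "" if value is None else str(value)
    return filled


field_map = {"Officer Name": "textbox_0_0", "Badge Number": "textbox_0_1", "Number of Victims": "textbox_0_4"}
llm_output = {"Officer Name": "Smith", "Badge Number": "4421", "Number of Victims": 2, "Extra": "x"}
print(labels_to_pdf_fields(llm_output, field_map))
# → {'textbox_0_0': 'Smith', 'textbox_0_1': '4421', 'textbox_0_4': '2'}
```

The loop is driven by the profile rather than by the model output. As a result, every mapped PDF field gets exactly one value, and stray keys from the model cannot reach the PDF.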

✅ Acceptance Criteria

  • At least 3 department profiles ship with the repo
  • Profile labels are injected into the Mistral prompt
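  • Extraction accuracy improves for pre-mapped forms: fields whose values are stated in the transcript are no longer returned as null, and no single value is repeated across unrelated fields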
  • Feature works in Docker container
  • Documentation updated in docs/
  • JSON output validates against the schema
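The accuracy criterion could be checked automatically, for example by running each profile's example_transcript through extraction and flagging the two symptoms described in #173. The helper below is a hypothetical heuristic, not an existing FireForm test. Note that a form whose fields legitimately share one value would also be flagged.

```python
def extraction_failures(extracted: dict) -> list[str]:
    """Flag the two #173 symptoms in one extraction result (heuristic)."""
    problems = []
    values = list(extracted.values())
    non_empty = [v for v in values if v not in (None, "")]
    if values and not non_empty:
        problems.append("all fields null or empty")
    # One value copied into every filled field suggests the repeated-value bug.
    if len(non_empty) > 1 and len(set(map(str, non_empty))) == 1:
        problems.append("same value repeated across all filled fields")
    return problems
```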

📌 Additional Context

Related bugs this directly addresses: #173 (PDF filler hallucinates repeating values)

Related features this complements: #111 (Field Mapping Wizard — for custom PDFs not covered by profiles)

This is especially important for FireForm's stated mission as a UN Digital Public Good — the system should work correctly for real first responders without requiring technical setup.

Labels: to-think (More time to think about, advantages and disadvantages of each)
