[FEAT]: Universal Support for Static (Non-Fillable) Scanned PDF Operations #432



## 📝 Description
Currently, FireForm's architecture relies heavily on AcroForm extraction and filling (via `pdfrw`). In practice, however, **the majority of emergency response departments still rely on static, flat, or scanned PDFs** that contain no digital form fields at all.

A prime example is the **CAL FIRE ICS-214 (Activity Log)** form. In many deployment environments, these forms are printed, scanned, and distributed as flat images inside a PDF wrapper. Because such files contain no AcroForm field definitions (no `/AcroForm` dictionary with `/Fields` entries to read or write), our current pipeline cannot fill them. This excludes a large share of potential station administrators who only have scanned legacy documents.

## 💡 Rationale
To achieve true "Reporting Ubiquity", FireForm must be agnostic to the PDF's internal format. If a department uploads a flat image scan of an ICS-214, the platform should still be able to overlay the values the LLM extracts from unstructured input (via our Data Lake pipeline) precisely onto the blank lines of the image.

## 🛠️ Proposed Solution
I propose building a deterministic static-PDF handler that severs reliance on embedded digital fields:

1. **OCR Bounding-Box Detection (Tesseract/OpenCV):**
   - Instead of parsing embedded text offsets (which are fragile or missing entirely on scanned documents, where the page is a single image), we rasterize each page and analyze it visually: Tesseract locates the printed field labels, and OpenCV detects the ruled lines and empty regions that belong to them.
   - We record the bounding box `(X, Y, W, H)` of each empty region, in image pixels, for every field zone found.

2. **Semantic Text Overlay (PyMuPDF / fitz):**
   - Once the zones are mapped, the LLM assigns each extracted value to its zone.
   - We use `PyMuPDF` to programmatically "stamp" the extracted text strings at those coordinates, converted from image pixels to PDF points.
   - PyMuPDF's `Page.insert_textbox` wraps text at word boundaries inside the target rectangle. It does not scale fonts on its own: it returns a negative value when the text does not fit, so the handler must reduce the font size and retry. Together this keeps text from bleeding into neighboring rows.
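The hand-off between the two steps involves a unit conversion, which can be sketched as follows. This is a minimal illustration, not FireForm code: it assumes the page was rasterized at a known DPI, that detected boxes are `(X, Y, W, H)` in pixels with a top-left origin, and that the page is unrotated. `PixelBox` and `pixel_box_to_pdf_rect` are hypothetical names. PDF coordinates are measured in points (72 per inch), and PyMuPDF also places the origin at the top-left of the page, so under these assumptions only a scale factor is needed.

```python
from dataclasses import dataclass
from typing import Tuple

POINTS_PER_INCH = 72.0  # PDF user-space units per inch


@dataclass(frozen=True)
class PixelBox:
    """Hypothetical container for one detected blank region, in image pixels."""
    x: int
    y: int
    w: int
    h: int


def pixel_box_to_pdf_rect(box: PixelBox, dpi: float) -> Tuple[float, float, float, float]:
    """Convert a pixel box (top-left origin) to (x0, y0, x1, y1) in PDF points."""
    if dpi <= 0:
        raise ValueError("dpi must be positive")
    x0 = box.x * POINTS_PER_INCH / dpi
    y0 = box.y * POINTS_PER_INCH / dpi
    x1 = (box.x + box.w) * POINTS_PER_INCH / dpi
    y1 = (box.y + box.h) * POINTS_PER_INCH / dpi
    return (x0, y0, x1, y1)


# Assumed example: a 900x60 px blank line found at (300, 600) on a 300-DPI render.
rect = pixel_box_to_pdf_rect(PixelBox(x=300, y=600, w=900, h=60), dpi=300)
print(rect)  # (72.0, 144.0, 288.0, 158.4)
```

The resulting tuple could then be passed to `fitz.Rect(*rect)` before stamping. Rotated or cropped pages would need an additional transform that this sketch does not cover.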

## ✅ Acceptance Criteria
- [ ] Pipeline detects whether a document is flat (0 fillable fields) or an AcroForm.
- [ ] Pipeline can parse the visual structure of a static CAL FIRE ICS-214 form.
- [ ] Backend overlays/stamps text strings into the detected blank regions using `fitz`.
- [ ] Text wraps within its calculated bounding box and never overflows it.
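For the wrapping criterion, one possible approach is a shrink-to-fit loop. The sketch below assumes the stamping call follows the contract of PyMuPDF's `Page.insert_textbox`: a non-negative return value when the text fits, and a negative one when it does not (in which case nothing is drawn). The function name `fit_font_size`, the size bounds, and the step are hypothetical choices. The stamping call is passed in as a callable so the loop can be exercised without a real PDF.

```python
from typing import Callable, Optional


def fit_font_size(
    try_stamp: Callable[[float], float],
    max_size: float = 11.0,
    min_size: float = 6.0,
    step: float = 0.5,
) -> Optional[float]:
    """Return the largest font size for which try_stamp reports a fit.

    try_stamp(size) must return >= 0 when the text fits at that size and
    a negative number when it does not. Returns None if no size fits.
    """
    if step <= 0 or max_size < min_size:
        raise ValueError("need step > 0 and max_size >= min_size")
    attempts = int((max_size - min_size) / step) + 1
    for i in range(attempts):
        size = max_size - i * step
        if try_stamp(size) >= 0:
            return size
    return None


# Stand-in for a real stamping call: pretend the text fits at 9 pt or below.
def fake_stamp(size: float) -> float:
    return 9.0 - size


print(fit_font_size(fake_stamp))  # 9.0
```

With PyMuPDF, `try_stamp` could be something like `lambda size: page.insert_textbox(rect, text, fontsize=size)`. A `None` result means the value does not fit even at the minimum size; the handler might flag such fields for manual review instead of truncating them.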


