feat: Static PDF filling for ICS-214 (CAL FIRE) forms (Scan Once, Fill Forever) #437
Open
utkarshqz wants to merge 16 commits into
… Lake UI, semantic mapper fix
…stic field detection
…LM context capping
Author
@marcvergees @vharkins1 @jansans04 @juanalvv please check this.
926cc34 to
343dbe4
Description
FireForm can now populate static, non-fillable scanned PDFs — including the CAL FIRE ICS-214 Activity Log — from a single voice or text transcript.
Before this PR, the pipeline was blocked for any PDF without embedded AcroForm fields. That is the reality for most rural emergency response stations: the CAL FIRE ICS-214 Activity Log, one of the most critical wildland firefighting forms, exists largely as a static flat scan in department archives. Uploading it to FireForm previously resulted in 0 fields detected and a completely empty output.
This PR introduces a three-path deterministic scan cascade that makes FireForm format-agnostic:
Path 1: AcroForm (`pdfrw`). Existing digital fillable PDFs continue to work unchanged, with zero regression.
Path 2: OCR Bounding-Box Scan (Tesseract + OpenCV). If no AcroForm fields are found, `pytesseract` visually maps the bounding geometry `(X, Y, W, H)` of blank lines and input zones on the static page image. `PyMuPDF (fitz)` then stamps the extracted text directly at those coordinates, auto-wrapping text strictly within each bounding box to prevent overflow or line bleeding.
Path 3: Gemma3 Vision Fallback (`gemma3:4b` via Ollama). If Tesseract confidence falls below threshold, the page image is passed directly to `gemma3:4b` running locally via Ollama. The model infers the logical field zones from the visual layout natively, completely offline, with no cloud dependency.
Fixes: #432
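The ordering of the three paths can be sketched as a small dispatcher. This is an illustration only, not the code in `scan_static_template()`: the names `FieldBox`, `ScanResult`, and `scan_cascade`, the injected scanner callables, the 0-100 confidence scale, and the default threshold of 60 are all assumptions. It also assumes that an OCR pass that finds no fields falls through to the vision path.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class FieldBox:
    """One detected input zone (hypothetical structure)."""
    name: str
    x: float
    y: float
    w: float
    h: float


@dataclass
class ScanResult:
    path: str                # "acroform", "ocr", or "vision"
    fields: List[FieldBox]


# A scanner takes a PDF path and returns (fields, confidence on a 0-100 scale).
Scanner = Callable[[str], Tuple[List[FieldBox], float]]


def scan_cascade(
    pdf_path: str,
    acroform: Scanner,
    ocr: Scanner,
    vision: Scanner,
    min_ocr_confidence: float = 60.0,   # assumed threshold, not from the PR
) -> ScanResult:
    # Path 1: embedded AcroForm fields win whenever they exist.
    fields, _ = acroform(pdf_path)
    if fields:
        return ScanResult("acroform", fields)

    # Path 2: accept OCR boxes only if they were found with enough confidence.
    fields, confidence = ocr(pdf_path)
    if fields and confidence >= min_ocr_confidence:
        return ScanResult("ocr", fields)

    # Path 3: fall back to the local vision model.
    fields, _ = vision(pdf_path)
    return ScanResult("vision", fields)
```

Passing the scanners in as callables keeps the ordering logic testable without Tesseract, PyMuPDF, or Ollama installed.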
The "Scan Once, Fill Forever" Architecture
The core insight driving this pipeline is that bounding-box detection is a one-time setup cost per template. After the first scan, the detected coordinate schema is persisted permanently into the Master Incident Data Lake (see PR #385). The Dynamic AI Semantic Mapper (see PR #386) then bridges the two, mapping the Data Lake's canonical incident JSON to the static PDF's exact pixel coordinate map, regardless of how the PDF's blank spaces are labelled or structured.
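The "scan once" idea can be illustrated with a small cache keyed by a hash of the template bytes. This is a sketch under stated assumptions: it stores the schema as JSON files on disk as a stand-in for the Data Lake, and `template_key`, `load_or_scan`, and the schema shape are hypothetical names, not the PR's API.

```python
import hashlib
import json
from pathlib import Path
from typing import Callable, Dict, List

# Hypothetical schema shape: one dict per field, e.g.
# {"name": "incident_name", "x": 72, "y": 96, "w": 240, "h": 18}
Schema = List[Dict[str, object]]


def template_key(pdf_bytes: bytes) -> str:
    """Identify a template by the SHA-256 of its raw bytes."""
    return hashlib.sha256(pdf_bytes).hexdigest()


def load_or_scan(
    pdf_bytes: bytes,
    store_dir: Path,
    scan: Callable[[bytes], Schema],
) -> Schema:
    """Return the stored coordinate schema, scanning only on first sight."""
    store_dir.mkdir(parents=True, exist_ok=True)
    cache_file = store_dir / f"{template_key(pdf_bytes)}.json"

    if cache_file.exists():
        # Seen before: reuse the persisted schema, no OCR or model call.
        return json.loads(cache_file.read_text(encoding="utf-8"))

    # First encounter: run the (expensive) scan and persist the result.
    schema = scan(pdf_bytes)
    cache_file.write_text(json.dumps(schema, indent=2), encoding="utf-8")
    return schema
```

Hashing the file contents rather than the filename means a renamed copy of the same template still hits the cache, while an edited template triggers a fresh scan.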
This means:
Real-World Target: CAL FIRE ICS-214
The ICS-214 (Activity Log) is a mandatory document for all CAL FIRE incident operations. It tracks resources, tasks, and activity across the operational period. Because it is distributed and archived as a flat non-fillable PDF at most stations, FireForm could not support it until now. This PR directly enables a CAL FIRE responder to speak naturally into the app and receive a completed, stamped ICS-214 output.
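Producing a stamped output depends on keeping each value inside its detected box. The sketch below shows one way to think about that fitting step in plain Python: wrap the text, and shrink the font until the wrapped lines fit the box height. It is not the PR's implementation (PyMuPDF's `Page.insert_textbox` performs real wrapping against actual font metrics); the function name `fit_text`, the average glyph width ratio, and the line height ratio are assumptions chosen for illustration.

```python
import textwrap
from typing import List, Tuple

AVG_CHAR_WIDTH_RATIO = 0.5   # assumed average glyph width as a fraction of font size
LINE_HEIGHT_RATIO = 1.2      # assumed line spacing as a multiple of font size


def fit_text(
    text: str,
    box_w: float,
    box_h: float,
    max_size: float = 11.0,
    min_size: float = 6.0,
    step: float = 0.5,
) -> Tuple[float, List[str]]:
    """Return (font_size, lines) such that the wrapped text fits the box.

    Widths are estimated, not measured, so this is only an approximation.
    """
    size = max_size
    while size >= min_size:
        # Estimate how many characters fit on one line at this font size.
        chars_per_line = max(1, int(box_w // (size * AVG_CHAR_WIDTH_RATIO)))
        lines = textwrap.wrap(text, width=chars_per_line)

        # Accept the first (largest) size whose total height fits the box.
        if len(lines) * size * LINE_HEIGHT_RATIO <= box_h:
            return size, lines
        size -= step

    raise ValueError("text does not fit in the box even at the minimum font size")
```

For example, `fit_text("Engine 12 on scene", box_w=200, box_h=18)` keeps the default size and a single line, while a long activity note in a narrow box comes back at a smaller size across several lines, or raises if it cannot fit at all.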
Type of change
How Has This Been Tested?
Tested locally on Windows 11 with `gemma3:4b` running via Ollama (`OLLAMA_HOST=http://localhost:11434`, CPU-only, no GPU required).
Test results:
`pdfrw` path ✅
`fitz` ✅
Using Gemma 3:

Using Gemma 3 + Tesseract OCR + PyMuPDF (and also using the Data Lake, #385):

Test Configuration:
`gemma3:4b` via Ollama (CPU-only)
Files Changed
`api/routes/templates.py`: `scan_static_template()` three-path cascade with Tesseract + Gemma3
`src/llm.py`
`requirements.txt`: `pytesseract`, `Pillow`; removed legacy `pdfplumber`
`frontend/index.html`
Checklist