feat: Static PDF filling for ICS-214 (CAL FIRE) forms (Scan Once, Fill Forever) by utkarshqz · Pull Request #437 · fireform-core/FireForm

utkarshqz · 2026-04-15T18:02:23Z

Description

FireForm can now populate static, non-fillable scanned PDFs — including the CAL FIRE ICS-214 Activity Log — from a single voice or text transcript.

Before this PR, the pipeline could not process any PDF without embedded AcroForm fields. That describes the situation at most rural emergency response stations: the CAL FIRE ICS-214 Activity Log, one of the most critical wildland firefighting forms, exists largely as a static flat scan in department archives. Uploading it to FireForm previously produced 0 detected fields and an empty output.

This PR introduces a three-path deterministic scan cascade that makes FireForm format-agnostic:

Path 1 — AcroForm (pdfrw): Existing digital fillable PDFs continue to work unchanged, zero regression.

Path 2 — OCR Bounding-Box Scan (Tesseract + OpenCV): If no AcroForm fields are found, pytesseract visually maps the bounding geometry (X, Y, W, H) of blank lines and input zones on the static page image. PyMuPDF (fitz) then stamps the extracted text directly at those coordinates, auto-wrapping text strictly within each bounding box to prevent overflow or line bleeding.

Path 3 — Gemma3 Vision Fallback (gemma3:4b via Ollama): If Tesseract confidence falls below threshold, the page image is passed directly to gemma3:4b running locally via Ollama. The model infers the logical field zones from the visual layout natively — completely offline, no cloud dependency.
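The ordering of the three paths above can be sketched as a small dispatcher. This is a minimal illustration, not the PR's `scan_static_template()`: the names `FieldBox`, `ScanResult`, and `scan_cascade`, the threshold value of 60, and the choice to treat an empty OCR result as a reason to fall through to vision are all assumptions. The three detectors are passed in as callables so the control flow can be read without any OCR or model dependency.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class FieldBox:
    """One detected input zone (hypothetical shape, not the PR's schema)."""
    label: str
    x: float
    y: float
    w: float
    h: float


@dataclass
class ScanResult:
    path: str  # "acroform", "ocr", or "vision"
    fields: List[FieldBox]


# Assumed value for illustration; the PR does not state the real threshold.
OCR_CONFIDENCE_THRESHOLD = 60.0


def scan_cascade(
    read_acroform: Callable[[], List[FieldBox]],
    run_ocr: Callable[[], Tuple[List[FieldBox], float]],
    run_vision: Callable[[], List[FieldBox]],
    threshold: float = OCR_CONFIDENCE_THRESHOLD,
) -> ScanResult:
    # Path 1: fillable PDFs keep using their embedded AcroForm fields.
    fields = read_acroform()
    if fields:
        return ScanResult("acroform", fields)

    # Path 2: OCR bounding boxes, accepted only at sufficient confidence.
    boxes, confidence = run_ocr()
    if boxes and confidence >= threshold:
        return ScanResult("ocr", boxes)

    # Path 3: vision model fallback.
    return ScanResult("vision", run_vision())
```

Because each path is tried in a fixed order with a fixed threshold, the same template always takes the same path, which is what makes the cascade deterministic.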

Fixes: #432

The "Scan Once, Fill Forever" Architecture

The core insight driving this pipeline is that bounding-box detection is a one-time setup cost per template. After the first scan, the detected coordinate schema is persisted in the Master Incident Data Lake (see PR #385). The Dynamic AI Semantic Mapper (see PR #386) then maps the Data Lake's canonical incident JSON to the static PDF's pixel coordinate map, regardless of how the PDF's blank spaces are labelled or structured.

This means:

Scan once → coordinates stored in Data Lake → never scanned again
Every subsequent report fill for that template is instant
Works across every incident, every responder, every shift — with zero re-scan cost
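The scan-once behaviour listed above amounts to a cache keyed by template. The sketch below uses an in-memory dictionary as a stand-in for the Data Lake and keys each template by the SHA-256 of its bytes; both choices, along with the name `TemplateCoordinateCache`, are assumptions made for illustration and may differ from how the PR persists coordinates.

```python
import hashlib
from typing import Callable, Dict, List

Coordinates = List[dict]


class TemplateCoordinateCache:
    """In-memory stand-in for the Data Lake's coordinate store (hypothetical)."""

    def __init__(self) -> None:
        self._store: Dict[str, Coordinates] = {}
        self.scans_run = 0  # counts how often the expensive scan actually ran

    def get_or_scan(
        self, pdf_bytes: bytes, scan: Callable[[bytes], Coordinates]
    ) -> Coordinates:
        # Identical template bytes always hash to the same key.
        key = hashlib.sha256(pdf_bytes).hexdigest()
        if key not in self._store:
            self._store[key] = scan(pdf_bytes)
            self.scans_run += 1
        return self._store[key]
```

Hashing the file contents rather than the filename means a renamed copy of the same template still hits the cache, while any change to the scan itself triggers a fresh detection pass.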

Real-World Target: CAL FIRE ICS-214

The ICS-214 (Activity Log) is a mandatory document for all CAL FIRE incident operations. It tracks resources, tasks, and activity across the operational period. Because it is distributed and archived as a flat non-fillable PDF at most stations, FireForm could not support it until now. This PR directly enables a CAL FIRE responder to speak naturally into the app and receive a completed, stamped ICS-214 output.

Type of change

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
This change requires a documentation update

How Has This Been Tested?

Tested locally on Windows 11 with gemma3:4b running via Ollama (OLLAMA_HOST=http://localhost:11434, CPU-only, no GPU required).
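For readers reproducing this setup, the following sketch shows one way a page image could be packaged for a local Ollama instance through its `/api/generate` endpoint, which accepts base64-encoded images in an `images` list. The prompt text and the function name `build_vision_request` are hypothetical, and the PR's actual request may be built differently. The sketch only constructs the request; nothing is sent until it is passed to `urllib.request.urlopen`.

```python
import base64
import json
import os
import urllib.request

# Hypothetical prompt; the PR's real prompt is not shown in the description.
PROMPT = (
    "List the blank input fields on this form as JSON objects "
    "with keys label, x, y, w, h."
)


def build_vision_request(
    image_bytes: bytes, model: str = "gemma3:4b"
) -> urllib.request.Request:
    # Assumes OLLAMA_HOST includes the scheme, as in the test setup above.
    host = os.environ.get("OLLAMA_HOST", "http://localhost:11434").rstrip("/")
    payload = {
        "model": model,
        "prompt": PROMPT,
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,   # ask for a single JSON reply instead of a stream
        "format": "json",  # ask the model to emit valid JSON
    }
    return urllib.request.Request(
        f"{host}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

With `stream` set to false, the reply is a single JSON object whose `response` field holds the model's text, which would then need to be parsed and validated before being trusted as field coordinates.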

Test results:

AcroForm PDFs continue to fill correctly via existing pdfrw path ✅
Static flat PDFs: coordinates detected via Tesseract OCR, stamped via PyMuPDF fitz ✅
Gemma3 vision fallback activates correctly when Tesseract confidence is below threshold ✅
Data Lake coordinate persistence: re-uploads of the same template skip OCR scan entirely ✅
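One detail behind the OCR-then-stamp result above is a unit conversion: OCR boxes are measured in pixels of the rendered page image, while PDF coordinates are in points (72 per inch). The sketch below shows that scaling plus a rough line-wrapping estimate. The names `Box`, `pixels_to_points`, and `wrap_to_box`, the average glyph width of half the font size, and the 1.2 line-height factor are assumptions for illustration; it also assumes both coordinate systems share a top-left origin, so only scaling is needed.

```python
import textwrap
from typing import List, NamedTuple

POINTS_PER_INCH = 72.0


class Box(NamedTuple):
    x: float
    y: float
    w: float
    h: float


def pixels_to_points(box: Box, dpi: float) -> Box:
    """Scale a pixel-space box from an image rendered at `dpi` into PDF points."""
    return Box(*(value * POINTS_PER_INCH / dpi for value in box))


def wrap_to_box(
    text: str,
    box_pt: Box,
    fontsize: float = 10.0,
    avg_char_width: float = 0.5,  # rough heuristic, as a fraction of fontsize
    line_height: float = 1.2,     # rough heuristic, as a multiple of fontsize
) -> List[str]:
    """Wrap text to the box width and keep only the lines that fit its height."""
    chars_per_line = max(1, int(box_pt.w / (fontsize * avg_char_width)))
    max_lines = max(1, int(box_pt.h / (fontsize * line_height)))
    return textwrap.wrap(text, width=chars_per_line)[:max_lines]
```

This version silently drops lines that do not fit; a production filler would more likely shrink the font or flag the overflow, and would measure real glyph widths rather than estimating them.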

[Screenshot: using Gemma 3]

[Screenshot: using Gemma 3 + Tesseract OCR + PyMuPDF, with the Data Lake from #385]

Test Configuration:

OS: Windows 11
Model: gemma3:4b via Ollama (CPU-only)
PDF target: CAL FIRE ICS-214 (static flat scan)

Files Changed

| File | Change |
| --- | --- |
| `api/routes/templates.py` | Added `scan_static_template()` three-path cascade with Tesseract + Gemma3 |
| `src/llm.py` | Context stability limiter for large multi-page form extraction |
| `requirements.txt` | Added `pytesseract`, `Pillow`; removed legacy `pdfplumber` |
| `frontend/index.html` | Updated inference state labels to "FireForm AI" |

Checklist

My code follows the style guidelines of this project
I have performed a self-review of my own code
I have commented my code, particularly in hard-to-understand areas
My changes generate no new warnings
New and existing unit tests pass locally with my changes
Any dependent changes have been merged and published in downstream modules


utkarshqz · 2026-04-15T18:03:12Z

@marcvergees @vharkins1 @jansans04 @juanalvv please check this.

utkarshqz added 16 commits March 17, 2026 22:56

e6689fc feat: voice transcription via faster-whisper + all accumulated fixes
c4fb150 feat: voice transcription, PWA mobile, frontend improvements, 70 tests
721539c chore: remove mobile/
0f9bfab fix: robust radio button kid extraction and checkbox AP stream preservation
f3fd0fd feat: implement Master Incident Data Lake — Record Once, Report Everywhere
4e3c6c5 feat: implement Master Incident Data Lake — Record Once, Report Everywhere
883b641 feat: add Dynamic AI Semantic Mapper for universal schema-less PDF generation
5107250 fix: Docker production setup - system deps, PYTHONPATH, ports, model pull
971d0a0 fix: Docker production setup - closes 8 community-reported issues
13f4fc2 fix: Docker production setup — resolve 8 community-reported issues
78ede7d chore: setup vision branch base and resolve requirement conflict
47e901c chore: setup vision branch base and resolve requirement conflict
69ea0ae feat: implement gemma vision pipeline for form population and scene descriptions
8d9c5d0 feat: vision pipeline — Gemma 3 scan, static PDF overlay filler, Data Lake UI, semantic mapper fix
28328a5 feat(static-pdf): switch to PyMuPDF + reportlab overlay for deterministic field detection
343dbe4 feat: implement static PDF cascade with Tesseract OCR, PyMuPDF, and LLM context capping

utkarshqz force-pushed the feat/vision-model-all branch from 926cc34 to 343dbe4 Compare April 19, 2026 07:44

utkarshqz wants to merge 16 commits into fireform-core:main from utkarshqz:feat/vision-model-all

utkarshqz commented Apr 15, 2026 · edited by juanalvv
