name: 🚀 Feature Request
about: Suggest an idea or a new capability for FireForm.
title: "[FEAT]: Non-blocking async LLM extraction with streaming, retry, and job orchestration"
labels: enhancement
assignees: ''
📝 Description
The current implementation of `POST /forms/fill` performs LLM extraction synchronously using `requests.post()` within the FastAPI request lifecycle.
Execution chain:
`forms.py → controller.fill_form() → llm.get_data() → requests.post()`
Because FastAPI runs on an asyncio event loop (via Uvicorn), this synchronous HTTP call blocks the event loop thread for the entire duration of Ollama inference.
Operational impact:
- Each field extraction may take 2–5 seconds on CPU inference.
- With sequential `main_loop()` (N fields), total blocking time scales linearly with N.
- A 10-field form can hold the event loop for 20–50 seconds.
- Even `main_loop_batch()` still blocks for the full duration of a single inference call.
This results in:
- Event loop starvation
- Inability to serve concurrent requests
- 30+ second client response latency
- No retry mechanism for missed fields
- No per-field confidence visibility
- No observable progress during extraction
- No fault tolerance for partial success scenarios
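The starvation effect can be reproduced without FastAPI or Ollama. The sketch below is illustrative only: `time.sleep()` stands in for `requests.post()`, `asyncio.sleep()` stands in for an awaited async HTTP call, and all function names are hypothetical. A background heartbeat task measures how long the event loop goes without running anything else.

```python
import asyncio
import time


async def heartbeat(ticks: list[float], interval: float = 0.01) -> None:
    """Record a timestamp every `interval` seconds while the loop is free."""
    while True:
        ticks.append(time.perf_counter())
        await asyncio.sleep(interval)


async def blocking_handler(delay: float) -> None:
    # Stand-in for requests.post(): a synchronous call that never yields.
    time.sleep(delay)


async def non_blocking_handler(delay: float) -> None:
    # Stand-in for an awaited async HTTP call: yields control while waiting.
    await asyncio.sleep(delay)


async def max_gap(handler, delay: float = 0.2) -> float:
    """Return the longest pause between heartbeats while `handler` runs."""
    ticks: list[float] = []
    beat = asyncio.create_task(heartbeat(ticks))
    await asyncio.sleep(0.03)  # let the heartbeat start ticking
    await handler(delay)
    await asyncio.sleep(0.03)  # let it tick again afterwards
    beat.cancel()
    try:
        await beat
    except asyncio.CancelledError:
        pass
    return max(later - earlier for earlier, later in zip(ticks, ticks[1:]))


blocked_gap = asyncio.run(max_gap(blocking_handler))
free_gap = asyncio.run(max_gap(non_blocking_handler))
print(f"longest stall with blocking call:     {blocked_gap:.3f}s")
print(f"longest stall with non-blocking call: {free_gap:.3f}s")
```

With the blocking handler, the heartbeat stalls for roughly the full call duration. With the awaited variant, it keeps ticking at its normal interval.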
💡 Rationale
PR #151 (currently open) proposes schema enforcement using Ollama’s `format` parameter and dynamically generated Pydantic models.
While that meaningfully improves output structural reliability, it does not address:
- Transport-layer blocking
- Concurrency limitations
- Retry logic for null/missed fields
- Event loop starvation
- Client-side observability
- Job orchestration
In high-impact public-sector deployments, a system must not only return structured output but must also:
- Remain responsive under load
- Provide progressive feedback
- Handle partial failures gracefully
- Support operational monitoring
This issue proposes a transport and orchestration layer redesign to meet those requirements.
🛠️ Proposed Solution
1️⃣ Asynchronous Concurrent Extraction
- Replace `requests` with `httpx.AsyncClient`
- Introduce `async_extract_all_streaming()` in `src/llm.py`
- Launch per-field extraction tasks via `asyncio.create_task()`
- Collect results using `asyncio.as_completed()`
This ensures:
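- Wall-clock time is bounded by the slowest field rather than the sum of all fields (assuming the inference backend can process requests concurrently)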
- Partial results become available immediately
- Event loop remains free to serve other requests
If 9 fields resolve in 3 seconds and 1 resolves in 8 seconds:
- Client receives 9 results at second 3
- Final result at second 8
Prior implementations return nothing until second 8.
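The completion-order behaviour described above can be sketched as follows. This is a minimal illustration, not the proposed `async_extract_all_streaming()` itself: `extract_field()` and `extract_all_streaming()` are hypothetical names, and `asyncio.sleep()` stands in for the awaited `httpx.AsyncClient` call to Ollama.

```python
import asyncio


async def extract_field(field: str, latency: float) -> tuple[str, str]:
    """Hypothetical per-field extraction; the sleep simulates an awaited LLM call."""
    await asyncio.sleep(latency)
    return field, f"<{field} value>"


async def extract_all_streaming(latencies: dict[str, float]):
    """Yield (field, value) pairs in completion order, not submission order."""
    tasks = [
        asyncio.create_task(extract_field(name, delay))
        for name, delay in latencies.items()
    ]
    for finished in asyncio.as_completed(tasks):
        yield await finished


async def collect(latencies: dict[str, float]) -> list[str]:
    # Consume the stream and keep only the field names, in arrival order.
    return [field async for field, _ in extract_all_streaming(latencies)]


order = asyncio.run(collect({"address": 0.15, "name": 0.01, "date": 0.05}))
print(order)
```

Fast fields are yielded as soon as they finish, so a caller can forward each one to the client without waiting for the slowest task.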
2️⃣ Two-Pass Auto-Retry Mechanism
After Pass 1 completes:
- Any field returning `None` enters Pass 2.
- `_build_targeted_prompt()` constructs a focused single-field prompt.
- Explicit instruction: return `-1` if not found.
- Retry tasks launched concurrently.
Confidence scoring:
| Confidence | Meaning |
|---|---|
| high | Extracted in Pass 1 |
| medium | Recovered in Pass 2 |
| low | Missing after both passes |
This provides deterministic field-level reliability reporting.
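One way the two passes and the confidence labels could fit together is sketched below. The `extract` callable, `build_targeted_prompt()`, and `fake_extract()` are assumptions made for illustration. In the real pipeline, the extractor would call the LLM, and the prompt wording would come from `_build_targeted_prompt()`.

```python
import asyncio
from typing import Awaitable, Callable, Optional

# Hypothetical extractor signature: (field, prompt) -> extracted value or None.
Extractor = Callable[[str, str], Awaitable[Optional[str]]]


def build_targeted_prompt(field: str, text: str) -> str:
    """Focused single-field prompt with an explicit 'not found' sentinel."""
    return (
        f"Extract only the value of '{field}' from the text below. "
        f"Return -1 if it is not present.\n\n{text}"
    )


async def two_pass_extract(fields: list[str], text: str, extract: Extractor):
    results: dict[str, Optional[str]] = {}
    confidence: dict[str, str] = {}

    # Pass 1: all fields concurrently with the general input text.
    first = await asyncio.gather(*(extract(f, text) for f in fields))
    for name, value in zip(fields, first):
        results[name] = value
        if value is not None:
            confidence[name] = "high"

    # Pass 2: retry only the missed fields, again concurrently.
    missed = [f for f in fields if results[f] is None]
    second = await asyncio.gather(
        *(extract(f, build_targeted_prompt(f, text)) for f in missed)
    )
    for name, value in zip(missed, second):
        found = value is not None and value.strip() != "-1"
        results[name] = value if found else None
        confidence[name] = "medium" if found else "low"
    return results, confidence


async def fake_extract(field: str, prompt: str) -> Optional[str]:
    """Toy extractor: 'date' is only found with a targeted prompt, 'badge' never."""
    targeted = prompt.startswith("Extract only")
    if field == "name":
        return "Ada"
    if field == "date":
        return "2024-01-05" if targeted else None
    return "-1" if targeted else None


results, confidence = asyncio.run(
    two_pass_extract(["name", "date", "badge"], "raw incident text", fake_extract)
)
print(results, confidence)
```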
3️⃣ Non-Blocking PDF Generation
- Introduce `fill_form_with_data()` in `filler.py`
- Offload pdfrw operations to a `ThreadPoolExecutor` via `loop.run_in_executor()`
- Prevent CPU-bound PDF writes from blocking the event loop

Correctness improvement:
- `None` values written as empty strings instead of literal `"None"`
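The offloading pattern and the `None` handling can be sketched as follows. `write_pdf_blocking()` is a hypothetical stand-in for the pdfrw work, since the actual `filler.py` code is not shown here, and `normalize()` is an assumed helper name.

```python
import asyncio
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Optional


def normalize(data: dict[str, Optional[str]]) -> dict[str, str]:
    """Map None to an empty string so the PDF never shows the literal 'None'."""
    return {key: "" if value is None else str(value) for key, value in data.items()}


def write_pdf_blocking(data: dict[str, Optional[str]]) -> tuple[dict[str, str], str]:
    """Placeholder for the synchronous pdfrw read/fill/write sequence.

    Returns the values that would be written and the name of the worker thread.
    """
    return normalize(data), threading.current_thread().name


async def fill_form_with_data(data: dict[str, Optional[str]], executor: ThreadPoolExecutor):
    # Run the blocking function in a worker thread; the event loop stays free.
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(executor, write_pdf_blocking, data)


async def main():
    with ThreadPoolExecutor(max_workers=2, thread_name_prefix="pdf") as pool:
        return await fill_form_with_data({"name": "Ada", "badge": None}, pool)


written, worker = asyncio.run(main())
print(written, worker)
```

A thread pool keeps the event loop responsive, but it does not by itself make CPU-bound work run faster.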
4️⃣ New Client-Observable API Surfaces
`POST /forms/fill/stream`
- Returns `text/event-stream`
- Emits one Server-Sent Event per field as soon as it resolves
- Event payload includes:
  - field
  - value
  - confidence
  - phase
- Final `complete` event includes:
  - submission_id
  - output_pdf_path
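For reference, a Server-Sent Event on the wire is a block of `name: value` lines terminated by a blank line. The helper below is a minimal sketch of how the per-field and `complete` events might be serialized. The `field` event name, the `pass1` phase value, and the sample payload values are assumptions, not part of the proposal.

```python
import json


def sse_event(event: str, payload: dict) -> str:
    """Serialize one Server-Sent Event: event name, JSON data line, blank line."""
    return f"event: {event}\ndata: {json.dumps(payload)}\n\n"


field_event = sse_event(
    "field",  # hypothetical event name for per-field updates
    {"field": "name", "value": "Ada", "confidence": "high", "phase": "pass1"},
)
complete_event = sse_event(
    "complete",
    {"submission_id": 1, "output_pdf_path": "out/form.pdf"},  # sample values
)
print(field_event, complete_event, sep="")
```

An async generator that yields such strings can be handed to a streaming response with the `text/event-stream` media type.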
`POST /forms/fill/async`
- Returns 202 with `job_id`
- Full extraction pipeline runs as a FastAPI `BackgroundTask`
`GET /forms/jobs/{job_id}`

Returns:
- status (pending, running, complete, failed)
- partial_results
- field_confidence
- output_pdf_path
- error_message
5️⃣ Database Orchestration Layer
Introduce a `FillJob` SQLModel table:
- UUID primary key
- template_id
- input_text
- status
- output_pdf_path
- partial_results (JSON)
- field_confidence (JSON)
- error_message
- created_at

Repository functions:
- `create_job`
- `get_job`
- `update_job` (`**kwargs` for partial updates)
This enables incremental persistence of extraction progress.
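The repository contract can be illustrated without a database. The sketch below swaps the SQLModel table for a plain dataclass and an in-memory dict so it runs standalone. Field types, defaults, and function signatures are assumptions, and a real implementation would persist through a database session instead.

```python
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class FillJob:
    """In-memory stand-in for the proposed FillJob table."""
    template_id: int
    input_text: str
    id: str = field(default_factory=lambda: str(uuid.uuid4()))
    status: str = "pending"
    output_pdf_path: Optional[str] = None
    partial_results: dict = field(default_factory=dict)
    field_confidence: dict = field(default_factory=dict)
    error_message: Optional[str] = None
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


_JOBS: dict[str, FillJob] = {}  # replaces the database for this sketch


def create_job(template_id: int, input_text: str) -> FillJob:
    job = FillJob(template_id=template_id, input_text=input_text)
    _JOBS[job.id] = job
    return job


def get_job(job_id: str) -> Optional[FillJob]:
    return _JOBS.get(job_id)


def update_job(job_id: str, **changes) -> FillJob:
    """Apply a partial update; unknown column names are rejected."""
    job = _JOBS[job_id]
    for name, value in changes.items():
        if not hasattr(job, name):
            raise AttributeError(f"FillJob has no field {name!r}")
        setattr(job, name, value)
    return job


job = create_job(template_id=7, input_text="raw incident text")
update_job(job.id, status="running", partial_results={"name": "Ada"})
update_job(job.id, field_confidence={"name": "high"})
print(get_job(job.id))
```

Each call to `update_job` touches only the named columns, which is what lets the background task record progress field by field.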
✅ Acceptance Criteria
- Existing `/forms/fill` endpoint preserved
📌 Additional Context
This redesign is transport- and orchestration-focused.
It is fully compatible with schema enforcement approaches such as PR #151 and does not conflict with model-level validation strategies.