[FEAT]: Non-Blocking Async Extraction, Retry, and Observability for /forms/fill #152

---
name: 🚀 Feature Request
about: Suggest an idea or a new capability for FireForm.
title: "[FEAT]: Non-blocking async LLM extraction with streaming, retry, and job orchestration"
labels: enhancement
assignees: ''
---

## 📝 Description

The current implementation of `POST /forms/fill` performs LLM extraction synchronously using `requests.post()` within the FastAPI request lifecycle.

Execution chain:

`forms.py` → `controller.fill_form()` → `llm.get_data()` → `requests.post()`

Because FastAPI runs `async def` handlers directly on the asyncio event loop (via Uvicorn), this synchronous HTTP call blocks the event loop thread for the entire duration of Ollama inference.

Operational impact:

- Each field extraction may take 2–5 seconds on CPU inference.
- With sequential `main_loop()` (N fields), total blocking time scales linearly.
- A 10-field form can hold the event loop for 20–50 seconds.
- Even `main_loop_batch()` still blocks for the full duration of a single inference call.

This results in:

- Event loop starvation
- Inability to serve concurrent requests
- 30+ second client response latency
- No retry mechanism for missed fields
- No per-field confidence visibility
- No observable progress during extraction
- No fault tolerance for partial success scenarios
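The starvation effect can be reproduced without Ollama. The following is a minimal sketch, not FireForm code: `time.sleep()` stands in for `requests.post()`, `asyncio.sleep()` stands in for an awaited async HTTP call, and the `heartbeat` coroutine is a hypothetical stand-in for other requests waiting on the same loop.

```python
import asyncio
import contextlib
import time


async def heartbeat(counter: dict, interval: float = 0.01) -> None:
    """Stand-in for other requests: ticks whenever the event loop is free."""
    while True:
        counter["ticks"] += 1
        await asyncio.sleep(interval)


async def blocking_handler(duration: float) -> None:
    # time.sleep() stands in for requests.post(): it never yields to the loop.
    time.sleep(duration)


async def cooperative_handler(duration: float) -> None:
    # asyncio.sleep() stands in for an awaited async HTTP call.
    await asyncio.sleep(duration)


async def ticks_during(handler, duration: float = 0.2) -> int:
    """Count heartbeat ticks that happen while `handler` is running."""
    counter = {"ticks": 0}
    beat = asyncio.create_task(heartbeat(counter))
    await asyncio.sleep(0)  # give the heartbeat a chance to start
    start = counter["ticks"]
    await handler(duration)
    elapsed_ticks = counter["ticks"] - start
    beat.cancel()
    with contextlib.suppress(asyncio.CancelledError):
        await beat
    return elapsed_ticks


if __name__ == "__main__":
    print("ticks while blocked:    ", asyncio.run(ticks_during(blocking_handler)))
    print("ticks while cooperative:", asyncio.run(ticks_during(cooperative_handler)))
```

While the blocking handler runs, the heartbeat cannot tick at all; with the cooperative handler it keeps ticking at roughly its normal rate.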

---

## 💡 Rationale

PR #151 (currently open) proposes schema enforcement using Ollama’s `format` parameter and dynamically generated Pydantic models.

While that meaningfully improves the structural reliability of the output, it does not address:

- Transport-layer blocking
- Concurrency limitations
- Retry logic for null/missed fields
- Event loop starvation
- Client-side observability
- Job orchestration

In high-impact public-sector deployments, a system must not only return structured output but must also:

- Remain responsive under load
- Provide progressive feedback
- Handle partial failures gracefully
- Support operational monitoring

This issue proposes a transport and orchestration layer redesign to meet those requirements.

---

## 🛠️ Proposed Solution

### 1️⃣ Asynchronous Concurrent Extraction

- Replace `requests` with `httpx.AsyncClient`
- Introduce `async_extract_all_streaming()` in `src/llm.py`
- Launch per-field extraction tasks via `asyncio.create_task()`
- Collect results using `asyncio.as_completed()`

This ensures:

- Wall-clock time approaches that of the slowest field, assuming the Ollama backend can process requests concurrently
- Partial results become available immediately
- Event loop remains free to serve other requests

For example, if 9 fields resolve in 3 seconds and 1 resolves in 8 seconds:
- The client receives 9 results at second 3
- The final result arrives at second 8

Without streaming, the client receives nothing until the last field resolves (second 8 at best).
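A minimal sketch of the `create_task()` / `as_completed()` pattern described above. The signature of `async_extract_all_streaming()` and the `extract_one` callable are assumptions made for illustration; in the real pipeline `extract_one` would presumably wrap an `httpx.AsyncClient` call to Ollama, whereas `fake_extract` here only sleeps.

```python
import asyncio
from typing import AsyncIterator, Awaitable, Callable, List, Optional, Sequence, Tuple

# Hypothetical per-field extractor: (field, text) -> extracted value or None.
ExtractOne = Callable[[str, str], Awaitable[Optional[str]]]


async def async_extract_all_streaming(
    fields: Sequence[str], text: str, extract_one: ExtractOne
) -> AsyncIterator[Tuple[str, Optional[str]]]:
    """Yield (field, value) pairs in completion order, not submission order."""

    async def tagged(field: str) -> Tuple[str, Optional[str]]:
        # Carry the field name with the result, because as_completed()
        # does not say which task produced which value.
        return field, await extract_one(field, text)

    tasks = [asyncio.create_task(tagged(f)) for f in fields]
    try:
        for next_done in asyncio.as_completed(tasks):
            yield await next_done
    finally:
        for task in tasks:
            task.cancel()  # no-op for finished tasks; cleans up on early exit


async def fake_extract(field: str, text: str) -> Optional[str]:
    # Simulated inference latency per field (illustrative values only).
    delays = {"name": 0.03, "date": 0.01, "address": 0.06}
    await asyncio.sleep(delays[field])
    return f"<{field}>"


async def completion_order() -> List[str]:
    stream = async_extract_all_streaming(["name", "date", "address"], "report text", fake_extract)
    return [field async for field, _ in stream]


if __name__ == "__main__":
    print(asyncio.run(completion_order()))
```

The fastest field is yielded first regardless of its position in `fields`, which is what allows a streaming endpoint to forward each result as soon as it resolves.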

---

### 2️⃣ Two-Pass Auto-Retry Mechanism

After Pass 1 completes:

- Any field returning `None` enters Pass 2.
- `_build_targeted_prompt()` constructs a focused single-field prompt.
- Explicit instruction: return `-1` if not found.
- Retry tasks launched concurrently.

Confidence scoring:

| Confidence | Meaning |
|------------|----------|
| high       | Extracted in Pass 1 |
| medium     | Recovered in Pass 2 |
| low        | Missing after both passes |

This provides deterministic field-level reliability reporting.
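The two passes and the confidence mapping can be sketched as follows. This is an illustration under assumptions: the `ask` callable, the prompt wording, and the `extract_two_pass()` name are hypothetical; only `_build_targeted_prompt()`, the `-1` sentinel, and the three confidence levels come from the proposal.

```python
import asyncio
from typing import Awaitable, Callable, Dict, Optional, Sequence

# Hypothetical LLM call: takes a prompt, returns the raw model reply (or None).
AskLLM = Callable[[str], Awaitable[Optional[str]]]


def _build_targeted_prompt(field: str, text: str) -> str:
    # Assumed wording; the proposal only fixes the "-1 if not found" rule.
    return f"Extract only '{field}' from the text. Reply -1 if not found.\n\n{text}"


def _normalise(reply: Optional[str]) -> Optional[str]:
    """Map empty replies and the -1 sentinel to None."""
    if reply is None:
        return None
    reply = reply.strip()
    return None if reply in ("", "-1") else reply


async def extract_two_pass(fields: Sequence[str], text: str, ask: AskLLM) -> Dict[str, dict]:
    async def run_pass(names, make_prompt):
        # All prompts of one pass are sent concurrently.
        replies = await asyncio.gather(*(ask(make_prompt(n)) for n in names))
        return {n: _normalise(r) for n, r in zip(names, replies)}

    first = await run_pass(fields, lambda f: f"Extract '{f}':\n\n{text}")
    missed = [f for f in fields if first[f] is None]
    second = await run_pass(missed, lambda f: _build_targeted_prompt(f, text)) if missed else {}

    report = {}
    for f in fields:
        if first[f] is not None:
            report[f] = {"value": first[f], "confidence": "high"}
        elif second.get(f) is not None:
            report[f] = {"value": second[f], "confidence": "medium"}
        else:
            report[f] = {"value": None, "confidence": "low"}
    return report


async def fake_llm(prompt: str) -> Optional[str]:
    # Test double: 'name' succeeds at once, 'date' only when targeted, 'badge' never.
    targeted = "Reply -1" in prompt
    if "'name'" in prompt:
        return "Ada"
    if "'date'" in prompt:
        return "2024-01-05" if targeted else None
    return "-1" if targeted else None


if __name__ == "__main__":
    print(asyncio.run(extract_two_pass(["name", "date", "badge"], "Ada reported on 2024-01-05.", fake_llm)))
```

The confidence label depends only on which pass produced the value, so it is reproducible for a given set of model replies.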

---

### 3️⃣ Non-Blocking PDF Generation

- Introduce `fill_form_with_data()` in `filler.py`
- Offload pdfrw operations to a `ThreadPoolExecutor` via `loop.run_in_executor()`
- Prevent CPU-bound PDF writes from blocking the event loop

Correctness improvement:
- `None` values written as empty strings instead of literal `"None"`

---

### 4️⃣ New Client-Observable API Surfaces

#### `POST /forms/fill/stream`
- Returns `text/event-stream`
- Emits one Server-Sent Event per field as soon as it resolves
- Event payload includes:
  - field
  - value
  - confidence
  - phase
- Final `complete` event includes:
  - submission_id
  - output_pdf_path
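For reference, a sketch of how these events could be framed on the wire. The event name `field`, the `phase` value, and the helper names are assumptions; the frame layout (`event:` line, `data:` line, blank line) follows the Server-Sent Events format. In FastAPI, a generator like this could be passed to `StreamingResponse` with `media_type="text/event-stream"`.

```python
import json
from typing import Any, Dict, Iterable, Iterator


def sse_event(event: str, payload: Dict[str, Any]) -> str:
    """Serialise one SSE frame: event name, single-line JSON data, blank line."""
    return f"event: {event}\ndata: {json.dumps(payload)}\n\n"


def fill_stream(
    results: Iterable[Dict[str, Any]], submission_id: int, output_pdf_path: str
) -> Iterator[str]:
    # One frame per resolved field, in the order the results arrive.
    for result in results:
        yield sse_event("field", result)
    # Terminal frame so the client knows the PDF is ready.
    yield sse_event(
        "complete",
        {"submission_id": submission_id, "output_pdf_path": output_pdf_path},
    )


if __name__ == "__main__":
    sample = [{"field": "name", "value": "Ada", "confidence": "high", "phase": "pass1"}]
    for frame in fill_stream(sample, 7, "out.pdf"):
        print(frame, end="")
```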

#### `POST /forms/fill/async`
- Returns 202 with `job_id`
- Full extraction pipeline runs as a FastAPI background task (`BackgroundTasks`)

#### `GET /forms/jobs/{job_id}`
Returns:
- status (pending, running, complete, failed)
- partial_results
- field_confidence
- output_pdf_path
- error_message
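The status lifecycle behind these two endpoints can be sketched as below. This is an illustration only: the in-memory `JOBS` dict stands in for persistent storage, and `submit_job()`, `run_job()`, and the `pipeline` callable are hypothetical names. The submit step corresponds to what the 202 response would return; `run_job()` is what the background task would execute.

```python
import asyncio
import uuid
from typing import Any, Awaitable, Callable, Dict

JOBS: Dict[str, Dict[str, Any]] = {}  # in-memory stand-in for persistent storage

# Hypothetical pipeline: receives the job record, returns the output PDF path.
Pipeline = Callable[[Dict[str, Any]], Awaitable[str]]


def submit_job() -> str:
    """Register a job and return its id immediately (the 202 response body)."""
    job_id = str(uuid.uuid4())
    JOBS[job_id] = {
        "status": "pending",
        "partial_results": {},
        "field_confidence": {},
        "output_pdf_path": None,
        "error_message": None,
    }
    return job_id


async def run_job(job_id: str, pipeline: Pipeline) -> None:
    """Background work: pending -> running -> complete, or failed on any error."""
    job = JOBS[job_id]
    job["status"] = "running"
    try:
        job["output_pdf_path"] = await pipeline(job)
        job["status"] = "complete"
    except Exception as exc:  # surface the failure to pollers instead of losing it
        job["status"] = "failed"
        job["error_message"] = str(exc)


async def ok_pipeline(job: Dict[str, Any]) -> str:
    job["partial_results"]["name"] = "Ada"  # progress visible while running
    job["field_confidence"]["name"] = "high"
    return "out.pdf"


async def failing_pipeline(job: Dict[str, Any]) -> str:
    raise RuntimeError("model backend unreachable")


if __name__ == "__main__":
    job_id = submit_job()
    asyncio.run(run_job(job_id, ok_pipeline))
    print(JOBS[job_id])
```

A poller reading the record between updates sees partial results before the job completes, and a failure leaves the partial results in place alongside `error_message`.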

---

### 5️⃣ Database Orchestration Layer

Introduce `FillJob` SQLModel table:

- UUID primary key
- template_id
- input_text
- status
- output_pdf_path
- partial_results (JSON)
- field_confidence (JSON)
- error_message
- created_at

Repository functions:
- `create_job`
- `get_job`
- `update_job` (`**kwargs` for partial updates)

This enables incremental persistence of extraction progress.
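A sketch of the repository contract, using a plain dataclass and a dict so that it runs without a database. This is a stand-in, not the proposed SQLModel table: in the real layer `FillJob` would be declared as a `SQLModel` class with `table=True` and the functions would go through a session. The field types (for example `template_id` as `int`) and the rule that `id` and `created_at` are immutable are assumptions.

```python
import uuid
from dataclasses import dataclass, field, fields
from datetime import datetime, timezone
from typing import Any, Dict, Optional


@dataclass
class FillJob:
    template_id: int
    input_text: str
    id: str = field(default_factory=lambda: str(uuid.uuid4()))
    status: str = "pending"
    output_pdf_path: Optional[str] = None
    partial_results: Dict[str, Any] = field(default_factory=dict)
    field_confidence: Dict[str, str] = field(default_factory=dict)
    error_message: Optional[str] = None
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


_STORE: Dict[str, FillJob] = {}  # stand-in for the database table
_UPDATABLE = {f.name for f in fields(FillJob)} - {"id", "created_at"}


def create_job(template_id: int, input_text: str) -> FillJob:
    job = FillJob(template_id=template_id, input_text=input_text)
    _STORE[job.id] = job
    return job


def get_job(job_id: str) -> Optional[FillJob]:
    return _STORE.get(job_id)


def update_job(job_id: str, **changes: Any) -> FillJob:
    """Apply a partial update; reject unknown or immutable columns."""
    unknown = set(changes) - _UPDATABLE
    if unknown:
        raise ValueError(f"cannot update fields: {sorted(unknown)}")
    job = _STORE[job_id]
    for name, value in changes.items():
        setattr(job, name, value)
    return job


if __name__ == "__main__":
    job = create_job(template_id=1, input_text="Ada reported on 2024-01-05.")
    update_job(job.id, status="running", partial_results={"name": "Ada"})
    print(get_job(job.id))
```

Because `update_job()` accepts only the columns being changed, the extraction loop can persist each resolved field without rewriting the whole record.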

---

## ✅ Acceptance Criteria

- [x] Event loop no longer blocked by extraction
- [x] Concurrent per-field extraction implemented
- [x] Two-pass retry operational
- [x] Field-level confidence scoring implemented
- [x] SSE streaming endpoint functional
- [x] Async job queue endpoint functional
- [x] FillJob model implemented with incremental updates
- [x] Original `/forms/fill` endpoint preserved
- [x] Comprehensive test coverage

---

## 📌 Additional Context

This redesign is transport- and orchestration-focused.

It is fully compatible with schema enforcement approaches such as PR #151 and does not conflict with model-level validation strategies.
