[FEAT]: Non-Blocking Async Extraction, Retry, and Observability for /forms/fill #152

@Acuspeedster

name: 🚀 Feature Request
about: Suggest an idea or a new capability for FireForm.
title: "[FEAT]: Non-blocking async LLM extraction with streaming, retry, and job orchestration"
labels: enhancement
assignees: ''

📝 Description

The current implementation of POST /forms/fill performs LLM extraction synchronously using requests.post() within the FastAPI request lifecycle.

Execution chain:

forms.py → controller.fill_form() → llm.get_data() → requests.post()

Because FastAPI runs on an asyncio event loop (via Uvicorn), a synchronous HTTP call made inside an async def handler blocks the event loop thread for the entire duration of Ollama inference.

Operational impact:

  • Each field extraction may take 2–5 seconds on CPU inference.
  • With the sequential main_loop(), total blocking time scales linearly with the number of fields N.
  • A 10-field form can hold the event loop for 20–50 seconds.
  • Even main_loop_batch() still blocks for the full duration of a single inference call.

This results in:

  • Event loop starvation
  • Inability to serve concurrent requests
  • 30+ second client response latency
  • No retry mechanism for missed fields
  • No per-field confidence visibility
  • No observable progress during extraction
  • No fault tolerance for partial success scenarios
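
The starvation described above can be reproduced in isolation. The following is a minimal sketch, not FireForm code: `blocking_inference` is a hypothetical stand-in for the `requests.post()` call to Ollama, and `heartbeat` stands in for any other request the server should keep serving. `asyncio.to_thread` is used here only to show the contrast; the fix proposed in this issue is an async HTTP client instead.

```python
import asyncio
import time


def blocking_inference(seconds: float) -> str:
    # Hypothetical stand-in for requests.post() to Ollama: blocks the calling thread.
    time.sleep(seconds)
    return "done"


async def heartbeat(log: list[str], ticks: int, interval: float) -> None:
    # Represents other work the event loop should keep servicing.
    for i in range(ticks):
        log.append(f"tick {i}")
        await asyncio.sleep(interval)


async def handler_blocking(log: list[str]) -> None:
    blocking_inference(0.2)  # the event loop is frozen for the whole call
    log.append("extraction finished")


async def handler_offloaded(log: list[str]) -> None:
    await asyncio.to_thread(blocking_inference, 0.2)  # the loop stays free
    log.append("extraction finished")


async def run(handler) -> list[str]:
    log: list[str] = []
    hb = asyncio.create_task(heartbeat(log, ticks=3, interval=0.01))
    await asyncio.sleep(0)  # let the heartbeat record its first tick
    await handler(log)
    await hb
    return log
```

With `handler_blocking`, the heartbeat records one tick and then stalls until the extraction returns; with `handler_offloaded`, all three ticks land before the extraction finishes.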

💡 Rationale

PR #151 (currently open) proposes schema enforcement using Ollama’s format parameter and dynamically generated Pydantic models.

While that meaningfully improves output structural reliability, it does not address:

  • Transport-layer blocking
  • Concurrency limitations
  • Retry logic for null/missed fields
  • Event loop starvation
  • Client-side observability
  • Job orchestration

In high-impact public-sector deployments, a system must not only return structured output but must also:

  • Remain responsive under load
  • Provide progressive feedback
  • Handle partial failures gracefully
  • Support operational monitoring

This issue proposes a transport and orchestration layer redesign to meet those requirements.


🛠️ Proposed Solution

1️⃣ Asynchronous Concurrent Extraction

  • Replace requests with httpx.AsyncClient
  • Introduce async_extract_all_streaming() in src/llm.py
  • Launch per-field extraction tasks via asyncio.create_task()
  • Collect results using asyncio.as_completed()

This ensures:

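  • Wall-clock time is bounded by the slowest field rather than the sum of all fields (provided the inference backend processes requests in parallel)
  • Partial results become available as soon as each field resolves
  • Event loop remains free to serve other requests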

If 9 fields resolve in 3 seconds and 1 resolves in 8 seconds:

  • Client receives 9 results at second 3
  • Final result at second 8

An implementation that waits for every field before responding returns nothing until second 8 at the earliest.
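
A minimal sketch of the create_task / as_completed pattern described in this section. The name `extract_all_streaming`, its signature, and the `fake_extract` helper are assumptions made for illustration; in the proposal the per-field coroutine would issue the Ollama request through `httpx.AsyncClient`, which is replaced here by `asyncio.sleep` so the example runs without a server.

```python
import asyncio
from typing import AsyncIterator, Awaitable, Callable, Optional

# Hypothetical signature: (field_name, input_text) -> extracted value or None.
ExtractFn = Callable[[str, str], Awaitable[Optional[str]]]


async def extract_all_streaming(
    fields: list[str], text: str, extract_one: ExtractFn
) -> AsyncIterator[tuple[str, Optional[str]]]:
    async def tagged(field: str) -> tuple[str, Optional[str]]:
        # Pair each result with its field name, since completion order
        # differs from submission order.
        return field, await extract_one(field, text)

    tasks = [asyncio.create_task(tagged(f)) for f in fields]
    try:
        for next_done in asyncio.as_completed(tasks):
            yield await next_done  # hand each field over as soon as it resolves
    finally:
        for task in tasks:
            task.cancel()  # no-op for finished tasks; cleans up on early exit


async def fake_extract(field: str, text: str) -> Optional[str]:
    # Simulated inference latency per field (assumed values).
    delays = {"name": 0.02, "address": 0.08, "date": 0.14}
    await asyncio.sleep(delays[field])
    return f"<{field}>"


async def demo() -> list[str]:
    order = []
    async for field, _value in extract_all_streaming(
        ["date", "name", "address"], "incident report text", fake_extract
    ):
        order.append(field)
    return order
```

Running `asyncio.run(demo())` yields the fields in completion order (`name`, `address`, `date`), not in the order they were submitted, and the total time is close to the slowest delay rather than the sum.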


2️⃣ Two-Pass Auto-Retry Mechanism

After Pass 1 completes:

  • Any field returning None enters Pass 2.
  • _build_targeted_prompt() constructs a focused single-field prompt.
  • Explicit instruction: return -1 if not found.
  • Retry tasks launched concurrently.

Confidence scoring:

| Confidence | Meaning |
|------------|---------|
| high | Extracted in Pass 1 |
| medium | Recovered in Pass 2 |
| low | Missing after both passes |

This provides deterministic field-level reliability reporting.
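
The two-pass flow and confidence labels can be sketched as follows. This is an illustration under assumptions: `two_pass_extract`, `first_pass`, and `retry_pass` are hypothetical names, the retry callable is assumed to wrap the targeted prompt built by `_build_targeted_prompt()`, and the "-1" sentinel is assumed to come back as a string.

```python
import asyncio
from typing import Awaitable, Callable, Optional

# Hypothetical signature: field_name -> extracted value or None.
FieldFn = Callable[[str], Awaitable[Optional[str]]]

NOT_FOUND = "-1"  # sentinel the targeted prompt asks the model to return


async def two_pass_extract(
    fields: list[str], first_pass: FieldFn, retry_pass: FieldFn
) -> tuple[dict[str, Optional[str]], dict[str, str]]:
    # Pass 1: all fields concurrently.
    values = await asyncio.gather(*(first_pass(f) for f in fields))
    results: dict[str, Optional[str]] = dict(zip(fields, values))
    confidence = {f: "high" for f, v in results.items() if v is not None}

    # Pass 2: only the fields that came back as None, again concurrently.
    missing = [f for f, v in results.items() if v is None]
    retried = await asyncio.gather(*(retry_pass(f) for f in missing))
    for field, value in zip(missing, retried):
        if value is None or value.strip() == NOT_FOUND:
            confidence[field] = "low"      # missing after both passes
        else:
            results[field] = value
            confidence[field] = "medium"   # recovered in Pass 2
    return results, confidence


async def demo() -> tuple[dict[str, Optional[str]], dict[str, str]]:
    pass1 = {"name": "Ada", "date": None, "phone": None}
    pass2 = {"date": "2024-01-05", "phone": "-1"}

    async def first(field: str) -> Optional[str]:
        return pass1[field]

    async def retry(field: str) -> Optional[str]:
        return pass2[field]

    return await two_pass_extract(list(pass1), first, retry)
```

In the demo, `name` is labelled high, `date` is recovered on retry and labelled medium, and `phone` stays None with low confidence.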


3️⃣ Non-Blocking PDF Generation

  • Introduce fill_form_with_data() in filler.py
  • Offload pdfrw operations to a ThreadPoolExecutor via loop.run_in_executor()
  • Prevent CPU-bound PDF writes from blocking the event loop

Correctness improvement:

  • None values written as empty strings instead of literal "None"
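
A sketch of the executor offload and the None handling described above. The signature of `fill_form_with_data()` is not specified in this issue, so the parameters shown are assumptions, and `write_pdf_blocking` is a hypothetical placeholder for the pdfrw read/fill/write step (it only sleeps and returns the output path).

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Optional

# Dedicated pool so PDF work does not compete with the default executor.
_pdf_executor = ThreadPoolExecutor(max_workers=2)


def normalize_values(data: dict[str, Optional[str]]) -> dict[str, str]:
    # Write missing fields as empty strings, never the literal "None".
    return {key: "" if value is None else str(value) for key, value in data.items()}


def write_pdf_blocking(template_path: str, output_path: str, values: dict[str, str]) -> str:
    # Placeholder: the real version would perform the pdfrw operations here.
    time.sleep(0.01)
    return output_path


async def fill_form_with_data(
    template_path: str, output_path: str, data: dict[str, Optional[str]]
) -> str:
    loop = asyncio.get_running_loop()
    values = normalize_values(data)
    # The blocking write runs on a worker thread; the event loop keeps serving.
    return await loop.run_in_executor(
        _pdf_executor, write_pdf_blocking, template_path, output_path, values
    )
```

Normalizing before handing off to the worker thread keeps the correctness fix independent of whichever PDF library performs the write.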

4️⃣ New Client-Observable API Surfaces

POST /forms/fill/stream

  • Returns text/event-stream
  • Emits one Server-Sent Event per field as soon as it resolves
  • Event payload includes:
    • field
    • value
    • confidence
    • phase
  • Final complete event includes:
    • submission_id
    • output_pdf_path
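
For reference, a Server-Sent Events frame is plain text: an optional `event:` line, one or more `data:` lines, and a blank line as terminator. The helper below is a sketch of how the payloads listed above could be serialized; the event name `field`, the helper name `sse_event`, and the example values (including the type of `phase`) are assumptions.

```python
import json


def sse_event(event: str, payload: dict) -> str:
    # json.dumps emits a single line by default, so one data: line is enough.
    data = json.dumps(payload, ensure_ascii=False)
    return f"event: {event}\ndata: {data}\n\n"


# One frame per resolved field (hypothetical values).
field_frame = sse_event(
    "field",
    {"field": "name", "value": "Ada", "confidence": "high", "phase": 1},
)

# Final frame once the PDF has been written (hypothetical values).
complete_frame = sse_event(
    "complete",
    {"submission_id": 42, "output_pdf_path": "out/filled.pdf"},
)
```

A streaming endpoint would yield these strings from an async generator and serve them with the `text/event-stream` media type.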

POST /forms/fill/async

  • Returns 202 with job_id
  • Full extraction pipeline runs as a FastAPI background task (BackgroundTasks)

GET /forms/jobs/{job_id}

Returns:

  • status (pending, running, complete, failed)
  • partial_results
  • field_confidence
  • output_pdf_path
  • error_message
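
The job lifecycle behind these two endpoints can be sketched without a web framework. Everything here is illustrative: the in-memory `JOBS` dict stands in for the database, and `submit_job`, `run_job`, and the fake pipeline functions are hypothetical names. The point is the status transitions and that partial results survive a failure.

```python
import asyncio
import uuid
from typing import AsyncIterator, Awaitable, Callable, Optional

JOBS: dict[str, dict] = {}  # in-memory stand-in for persistent storage

FieldResult = tuple[str, Optional[str], str]  # (field, value, confidence)


def submit_job() -> str:
    # What POST /forms/fill/async would do before returning 202 with job_id.
    job_id = str(uuid.uuid4())
    JOBS[job_id] = {
        "status": "pending",
        "partial_results": {},
        "field_confidence": {},
        "output_pdf_path": None,
        "error_message": None,
    }
    return job_id


async def run_job(
    job_id: str,
    extract: Callable[[], AsyncIterator[FieldResult]],
    make_pdf: Callable[[dict], Awaitable[str]],
) -> None:
    # What the background task would do; GET /forms/jobs/{job_id} reads JOBS.
    job = JOBS[job_id]
    job["status"] = "running"
    try:
        async for field, value, confidence in extract():
            job["partial_results"][field] = value
            job["field_confidence"][field] = confidence
        job["output_pdf_path"] = await make_pdf(job["partial_results"])
        job["status"] = "complete"
    except Exception as exc:
        job["status"] = "failed"
        job["error_message"] = str(exc)


async def fake_extract() -> AsyncIterator[FieldResult]:
    yield "name", "Ada", "high"
    yield "date", None, "low"


async def failing_extract() -> AsyncIterator[FieldResult]:
    yield "name", "Ada", "high"
    raise RuntimeError("Ollama unreachable")


async def fake_pdf(results: dict) -> str:
    return "out/filled.pdf"
```

A client polling a failed job still sees the fields that resolved before the error, alongside `error_message`.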

5️⃣ Database Orchestration Layer

Introduce FillJob SQLModel table:

  • UUID primary key
  • template_id
  • input_text
  • status
  • output_pdf_path
  • partial_results (JSON)
  • field_confidence (JSON)
  • error_message
  • created_at

Repository functions:

  • create_job
  • get_job
  • update_job (**kwargs for partial updates)

This enables incremental persistence of extraction progress.
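
A sketch of the repository contract using a plain dataclass and a dict in place of the SQLModel table and database session, so it runs with the standard library alone. The field names follow the list above; the field types (for example `template_id` as an int) and the rejection of unknown keys in `update_job` are assumptions.

```python
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Optional


@dataclass
class FillJob:
    # Stand-in for the proposed SQLModel table; types are assumed.
    template_id: int
    input_text: str
    id: str = field(default_factory=lambda: str(uuid.uuid4()))
    status: str = "pending"
    output_pdf_path: Optional[str] = None
    partial_results: dict[str, Any] = field(default_factory=dict)
    field_confidence: dict[str, str] = field(default_factory=dict)
    error_message: Optional[str] = None
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


_STORE: dict[str, FillJob] = {}  # replaces the database in this sketch


def create_job(template_id: int, input_text: str) -> FillJob:
    job = FillJob(template_id=template_id, input_text=input_text)
    _STORE[job.id] = job
    return job


def get_job(job_id: str) -> Optional[FillJob]:
    return _STORE.get(job_id)


def update_job(job_id: str, **kwargs: Any) -> FillJob:
    # Partial update: only the supplied columns change.
    job = _STORE[job_id]
    for name, value in kwargs.items():
        if not hasattr(job, name):
            raise AttributeError(f"FillJob has no field {name!r}")
        setattr(job, name, value)
    return job
```

Usage: `update_job(job.id, status="running", partial_results={"name": "Ada"})` changes those two columns and leaves the rest of the row untouched, which is what allows progress to be persisted field by field.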


✅ Acceptance Criteria

  • Event loop no longer blocked by extraction
  • Concurrent per-field extraction implemented
  • Two-pass retry operational
  • Field-level confidence scoring implemented
  • SSE streaming endpoint functional
  • Async job queue endpoint functional
  • FillJob model implemented with incremental updates
  • Original /forms/fill endpoint preserved
  • Comprehensive test coverage

📌 Additional Context

This redesign is transport- and orchestration-focused.

It is fully compatible with schema enforcement approaches such as PR #151 and does not conflict with model-level validation strategies.
