feat: Refactor LLM extraction pipeline to use a single structured JSON generation request by kushu30 · Pull Request #250 · fireform-core/FireForm

kushu30 · 2026-03-14T18:33:01Z

Summary

This PR refactors the LLM extraction pipeline to generate structured JSON using a single LLM request instead of issuing one request per field.

Previously, the system iterated through each template field and queried the LLM separately to extract values from the transcript. This resulted in multiple inference calls per form submission, increasing latency and introducing potential inconsistencies across responses.

The new implementation sends a single structured prompt containing all required fields and expects a JSON response from the model. This reduces the number of LLM calls and simplifies the extraction pipeline.

Motivation

The previous extraction approach performed sequential LLM requests for each field in the template:

reporting_officer
incident_location
amount_of_victims
victim_name_s
assisting_officer

This meant the number of LLM calls scaled linearly with the number of fields in the template: the five fields above required five separate requests per form submission.

Using a single structured extraction request removes the per-field round trips, simplifies the pipeline, and gives the model the full list of fields when it generates the structured response.
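To make the call-count difference concrete, here is a minimal sketch contrasting the two shapes. The function names, prompt wording, and the `CountingStub` stand-in for the model are all hypothetical and are not taken from src/llm.py; the stub only counts how often it is invoked.

```python
import json
from typing import Callable

FIELDS = [
    "reporting_officer",
    "incident_location",
    "amount_of_victims",
    "victim_name_s",
    "assisting_officer",
]


def extract_per_field(transcript: str, fields: list[str], ask: Callable[[str], str]) -> dict:
    """Old shape: one model call per template field."""
    return {name: ask(f"Extract '{name}' from: {transcript}") for name in fields}


def extract_single(transcript: str, fields: list[str], ask: Callable[[str], str]) -> dict:
    """New shape: one model call that returns a JSON object covering every field."""
    prompt = f"Return a JSON object with keys {fields} extracted from: {transcript}"
    return json.loads(ask(prompt))


class CountingStub:
    """Hypothetical stand-in for the LLM client; it only counts calls."""

    def __init__(self) -> None:
        self.calls = 0

    def __call__(self, prompt: str) -> str:
        self.calls += 1
        if prompt.startswith("Return a JSON object"):
            # Pretend the model found nothing and answered with nulls.
            return json.dumps({name: None for name in FIELDS})
        return "null"


old_stub, new_stub = CountingStub(), CountingStub()
extract_per_field("example transcript", FIELDS, old_stub)
extract_single("example transcript", FIELDS, new_stub)
print(old_stub.calls, new_stub.calls)  # prints "5 1"
```

With a real model behind `ask`, each call is a full inference round trip, so the per-field version pays that cost once per field.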

Key Improvements

Replaced multiple sequential LLM requests with a single structured extraction call
Simplified extraction logic within the LLM class
Improved extraction consistency by providing the model with the full schema
Reduced inference overhead for form processing
Added structured prompt generation for JSON output

Implementation

The extraction workflow was refactored inside the LLM class.

Before:

Transcript
   ↓
Loop through fields
   ↓
LLM request per field
   ↓
Build JSON response
   ↓
Fill PDF

After:

Transcript
   ↓
Single LLM request
   ↓
Structured JSON response
   ↓
Validate and parse JSON
   ↓
Fill PDF

The structured prompt contains all required fields and instructs the model to return valid JSON.
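A structured prompt of this kind could be assembled roughly as follows. This is a sketch under assumptions: `build_extraction_prompt` and the instruction wording are illustrative, not the prompt used in this PR.

```python
import json


def build_extraction_prompt(transcript: str, fields: list[str]) -> str:
    """Build one prompt that names every field and asks for JSON only."""
    # Show the model the exact keys it should return, with null placeholders.
    skeleton = {name: None for name in fields}
    return (
        "Extract the following fields from the transcript.\n"
        "Respond with a single valid JSON object and nothing else.\n"
        "Use null for any field that is not mentioned.\n\n"
        f"Expected keys:\n{json.dumps(skeleton, indent=2)}\n\n"
        f"Transcript:\n{transcript}\n"
    )
```

Listing the keys as a JSON skeleton gives the model the output shape and the field names in one place.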

Example extraction result:

{
  "reporting_officer": "Officer Voldemort",
  "incident_location": "456 Oak Street",
  "amount_of_victims": "2",
  "victim_name_s": ["Mark Smith", "Jane Doe"],
  "assisting_officer": null
}
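The "Validate and parse JSON" step could look something like the sketch below. It assumes the model may wrap its answer in extra text or a code fence, and it treats missing keys as null. `parse_extraction` is a hypothetical helper name, not a function from src/llm.py.

```python
import json


def parse_extraction(raw: str, fields: list[str]) -> dict:
    """Parse the model response and normalise it to the expected fields."""
    text = raw.strip()
    # Tolerate surrounding prose or code fences by slicing out the outermost braces.
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in model response")
    # json.JSONDecodeError subclasses ValueError, so callers can catch one type.
    data = json.loads(text[start:end + 1])
    # Keep only the expected keys; anything the model omitted becomes None.
    return {name: data.get(name) for name in fields}
```

Dropping unexpected keys and filling missing ones keeps the PDF-filling step working against a fixed set of fields even when the model's output drifts.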

Files Changed

src/llm.py

Testing

The updated pipeline was tested locally using the existing API endpoints.

Created a template using:

POST /templates/create

Generated a filled form using:

POST /forms/fill

Verified that:

the LLM extraction runs successfully using a single request
structured JSON is generated correctly
the filled PDF form is produced as expected

Unit tests were executed with:

PYTHONPATH=. pytest

Result:

2 passed

Impact

This change reduces the number of LLM inference calls required for form extraction and simplifies the overall extraction workflow without altering the existing API interface.

The new approach improves performance and provides a cleaner foundation for future improvements to the extraction pipeline.

utkarshqz · 2026-03-17T04:02:43Z

Hi @kushu30, great to see interest in optimizing the extraction pipeline! I actually implemented this single-request batching architecture on March 10 in PR #210 (later refined and consolidated into PR #241). My implementation currently supports dynamic field merging across multiple templates and is already integrated into the end-to-end frontend. Happy to hear your thoughts on how we can align these efforts so we don't have overlapping logic in the core!

Commit 1c1f28b: feat: Refactor extraction pipeline to structured JSON generation using a single LLM request

kushu30 wants to merge 1 commit into fireform-core:main from kushu30:feature-structured-extraction
