⚡️ Describe the Bug
The PDF extraction and filling process produces inaccurate results: the same value (e.g., "John Doe") is repeated across multiple unrelated form fields. Additionally, the core extraction loop crashes mid-execution with an `AttributeError` when it tries to iterate over the target fields.
👣 Steps to Reproduce
- In `main.py`, allow `reader.get_fields()` to overwrite the target fields with raw PDF widget names (e.g., `textbox_0_0`).
- Run the extraction process via `controller.fill_form()`.
- In `llm.py`, the `main_loop()` method attempts to iterate using `for field in self._target_fields.keys():`.
- If the crash is bypassed, observe the LLM guessing the same value for every field because it lacks semantic context for prompts like "textbox_0_0".
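
The crash in the third step can be reproduced in isolation. This is a minimal sketch, assuming `self._target_fields` holds a plain list of labels at the point of failure (the report does not show its actual type); `target_fields`, `iterate_with_keys`, and `iterate_directly` are hypothetical names used only for illustration.

```python
# Hypothetical stand-in for the value stored in self._target_fields.
# Assumption: it is a list of labels, not a dict.
target_fields = ["Employee Name", "Job Title"]


def iterate_with_keys(fields):
    # Mirrors the loop shape quoted in the report: calls .keys() on the container.
    return [field for field in fields.keys()]


def iterate_directly(fields):
    # Iterates the container itself; works for lists and for dicts (yields keys).
    return [field for field in fields]


error_message = None
try:
    iterate_with_keys(target_fields)
except AttributeError as exc:
    # A list has no .keys() method, so the loop fails before the first field.
    error_message = str(exc)

labels = iterate_directly(target_fields)
print(error_message)
print(labels)
```

Iterating the container directly also works when the value is a dict, since iterating a dict yields its keys.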
📉 Expected Behavior
- Context Preservation: The system should pass human-readable labels (e.g., "Employee Name") to the LLM so it can accurately extract distinct, contextually correct values.
- Output: The final JSON and PDF should contain unique data mapped appropriately, with properly stripped whitespace for plural values.
🖥️ Environment Information
- OS: WSL / Ubuntu
- Docker/Compose Version: N/A (Running locally)
- Ollama Model used: mistral
📸 Screenshots/Logs
[LOG] Resulting JSON created from the input text:
{
"textbox_0_0": "John Doe",
"textbox_0_1": "John Doe",
"textbox_0_2": "managing director",
"textbox_0_3": "managing director",
"textbox_0_4": "John Doe",
"textbox_0_5": "John Doe",
"textbox_0_6": "managing director"
}
🕵️ Possible Fix
- In `main.py`: stop overwriting `descriptive_fields` with `reader.get_fields()`. Pass the human-readable list to the controller.
- In `llm.py` (`main_loop`): remove the `.keys()` call. The loop should be `for field in self._target_fields:`.
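
A rough sketch of how the corrected loop might look once human-readable labels are passed through. The class name `FieldExtractor`, the `ask` callback, and `fake_ask` are hypothetical stand-ins, not the project's real API; the actual LLM call is replaced by a lookup so the example runs on its own.

```python
from typing import Callable, Dict, Iterable


class FieldExtractor:
    """Hypothetical stand-in for the class in llm.py; only the loop shape matters."""

    def __init__(self, target_fields: Iterable[str], ask: Callable[[str], str]):
        self._target_fields = target_fields
        self._ask = ask  # assumed callback that queries the model for one label

    def main_loop(self) -> Dict[str, str]:
        results = {}
        # No .keys() call: iterate the labels directly.
        for field in self._target_fields:
            results[field] = self._ask(field).strip()
        return results


def fake_ask(label: str) -> str:
    # Placeholder for the model call; returns a canned answer per label.
    answers = {"Employee Name": " John Doe ", "Job Title": "managing director"}
    return answers[label]


# Human-readable labels, as the fix proposes, instead of widget names like textbox_0_0.
descriptive_fields = ["Employee Name", "Job Title"]
result = FieldExtractor(descriptive_fields, fake_ask).main_loop()
print(result)
```

With descriptive labels as prompts, each field can receive a distinct value, which is the behavior described under Expected Behavior.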