
#142: Checkboxes and radio buttons handling #143

Closed
Cubix33 wants to merge 2 commits into fireform-core:main from Cubix33:checkboxex
Cubix33 commented Mar 2, 2026

Closes #142

🚀 Description

This PR adds support for PDF checkbox and radio button widgets (the /Btn field type).

Currently, the filling logic in filler.py assumes every form field is a text field (/Tx), so a checkbox cannot actually be "checked". This update lets the system distinguish text fields from buttons and renders checkboxes correctly by setting both the value (/V) and the appearance state (/AS).

🛠️ Changes Made

src/llm.py

  • Updated add_response_to_json to normalize LLM outputs into standard PDF checkbox states (Yes or Off).
  • Improved handling of boolean intent (mapping "True", "Checked", or "1" to "Yes").
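A minimal sketch of the normalization described above. The function name normalize_checkbox_value and the exact set of accepted spellings are assumptions for illustration, not the PR's actual add_response_to_json code; "yes" is included so that already-normalized values pass through unchanged.

```python
# Hypothetical helper: illustrates the "True"/"Checked"/"1" -> "Yes" mapping.
# It is not the code from the PR.
TRUTHY_ANSWERS = {"true", "checked", "1", "yes"}  # assumed accepted spellings


def normalize_checkbox_value(raw) -> str:
    """Map an LLM answer to a PDF checkbox state name: "Yes" or "Off"."""
    # str() also covers Python booleans and ints: str(True) == "True", str(1) == "1".
    text = str(raw).strip().lower()  # tolerate " True ", "CHECKED", ...
    return "Yes" if text in TRUTHY_ANSWERS else "Off"


print(normalize_checkbox_value("Checked"))  # Yes
print(normalize_checkbox_value("no"))  # Off
```

Anything unrecognized falls back to "Off", so an ambiguous LLM answer leaves the box unchecked rather than checking it by mistake.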

src/filler.py

  • Imported PdfName from pdfrw to support PDF-native name objects.
  • Added a check of each annotation's /FT (field type).
  • For /Btn widgets, the system now sets the /V and /AS keys using PdfName so that the checkmark is visually rendered in PDF viewers.
  • Kept the standard text-filling logic as the fallback for /Tx fields.
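A dependency-free sketch of the /FT dispatch described above. Annotations are modeled as plain dicts keyed the way pdfrw's PdfDict stores them ("/FT", "/V", "/AS", "/AP"); with pdfrw you would write the state as PdfName(value) instead of building the "/Yes" string by hand. The helper names field_type and fill_annotation are hypothetical. Two details go beyond the PR description: /FT is looked up through /Parent, because the PDF specification makes it inheritable (widget kids often omit it), and /AP is left in place on buttons, in line with the review comment about appearance streams.

```python
# Hypothetical helpers; annotations are plain dicts standing in for pdfrw PdfDicts.


def field_type(annot):
    """Return the field type, walking up /Parent since /FT is inheritable."""
    node = annot
    while node is not None:
        ft = node.get("/FT")
        if ft is not None:
            return ft
        node = node.get("/Parent")
    return None


def fill_annotation(annot, value):
    """Fill one widget: a state name for /Btn, plain text for /Tx."""
    ft = field_type(annot)
    if ft == "/Btn":
        state = "/" + value  # "Yes" -> "/Yes"; with pdfrw: PdfName(value)
        annot["/V"] = state  # the field's value
        annot["/AS"] = state  # selects which /AP /N appearance is drawn
        # Do NOT remove /AP here: it holds the checkmark graphics.
    elif ft == "/Tx":
        annot["/V"] = value  # with pdfrw, wrap the text in a PdfString
        annot.pop("/AP", None)  # drop the stale appearance so old text is not drawn
```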

🧪 How to Test

  1. Place a PDF containing interactive checkboxes (such as a medical or government form) at src/inputs/file.pdf.
  2. Run the extraction: make exec.
  3. Open the output PDF. Verify that checkboxes are visually "checked" with an X or checkmark rather than just having text printed near them.
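Step 3 can also be backed by a quick programmatic check. The sketch below flags checkbox widgets whose value is on but that will not be drawn as checked: either /AS does not match /V, or the matching appearance is missing from /AP /N (the failure mode described in the review comment on this PR). It uses plain dicts in place of pdfrw objects; with pdfrw, a page's widgets would come from PdfReader(path).pages[i].Annots. The function name invisible_checkboxes is hypothetical, and for simplicity it only inspects widgets that carry /FT directly.

```python
# Hypothetical checker; widgets are plain dicts standing in for pdfrw PdfDicts.


def invisible_checkboxes(widgets):
    """Return /T names of checked /Btn widgets that will not render as checked."""
    problems = []
    for w in widgets:
        if w.get("/FT") != "/Btn":
            continue  # only buttons have on/off appearance states
        value = w.get("/V")
        if value in (None, "/Off"):
            continue  # unchecked: nothing needs to be drawn
        normal = (w.get("/AP") or {}).get("/N") or {}
        if w.get("/AS") != value or value not in normal:
            problems.append(w.get("/T"))
    return problems
```

An empty result suggests the checkmarks have something to draw; it does not replace opening the file, since viewers differ in how they handle appearances.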

✅ Checklist

  • Checkboxes and Radio buttons are successfully toggled.
  • Logic correctly identifies /Btn vs /Tx field types.
  • No regressions in standard text box filling.
  • Code verified within the Docker container.

utkarshqz commented

Hi @Cubix33! Excellent work on this. Catching the /Btn vs /Tx sub-typing is a crucial first step for supporting interactive forms!

I was working on a similar structural refactor for checkboxes and ran into a few significant edge cases with real-world government PDFs that you may want to consider for this implementation:

  1. Appearance Streams (/AP): Be very careful with annot.AP = None. While clearing the appearance stream is required for /Tx (text fields) to force the PDF viewer to redraw the text string, doing this indiscriminately on /Btn fields will permanently delete the vector graphics for the checkmark/radio dot. The data will save as "Yes", but it will be visually invisible to the user.
  2. Adobe LiveCycle Radio Buttons: Many legacy government PDFs do not actually provide an /Opt array for their Radio Button parent groups. If the loop explicitly relies on /Opt to find the correct kid index, it will silently fail. I highly recommend adding a fallback that scans each kid.AP.N dictionary to dynamically reverse-engineer the graphic's "On" key (e.g., /Choice1) if the /Opt array resolves to None.
  3. Key-Based vs Positional Filling: Structurally, to support true batch extraction, we likely need to refactor the root fill_form to map values by explicit field keys (annot.T.strip()) rather than by positional array indices, so that the order of the LLM's extracted values cannot break the PDF mapping.
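A dependency-free sketch of points 2 and 3, modeling PDF dictionaries as plain dicts keyed like pdfrw's PdfDict. The names on_state, fill_radio_group and pair_by_key are hypothetical. It assumes that a radio group's /Opt entries line up with its /Kids by index, and that each kid's /AP /N dictionary has exactly one state name other than /Off, which is that kid's "on" key. With pdfrw, /T is a PdfString and would need decoding before it is used as a lookup key.

```python
# Hypothetical helpers; PDF dictionaries are plain dicts standing in for PdfDicts.


def on_state(widget):
    """Return the widget's "on" appearance name: the /AP /N key other than /Off."""
    normal = (widget.get("/AP") or {}).get("/N") or {}
    return next((name for name in normal if name != "/Off"), None)


def fill_radio_group(group, wanted):
    """Select the kid whose option or on-state matches `wanted`; False if none does."""
    kids = group.get("/Kids") or []
    opts = group.get("/Opt")
    chosen = None
    if opts is not None and wanted in opts:
        index = opts.index(wanted)
        if index < len(kids):
            chosen = on_state(kids[index])  # /Opt index -> kid -> its on-state key
    if chosen is None:
        # Fallback for groups without a usable /Opt (e.g. LiveCycle forms):
        # scan each kid's /AP /N keys for the requested state name.
        target = "/" + wanted
        if any(on_state(kid) == target for kid in kids):
            chosen = target
    if chosen is None:
        return False
    group["/V"] = chosen  # the group's value is the selected on-state name
    for kid in kids:
        # Only the matching kid shows its dot; every other kid is turned off.
        kid["/AS"] = chosen if on_state(kid) == chosen else "/Off"
    return True


def pair_by_key(annotations, values):
    """Pair each annotation with the LLM value stored under its /T name."""
    pairs = []
    for annot in annotations:
        name = annot.get("/T")
        if name is not None and name.strip() in values:
            pairs.append((annot, values[name.strip()]))
    return pairs
```

Because pair_by_key looks fields up by name, reordering the LLM's output changes nothing, and a field the LLM omits is skipped instead of shifting every later value by one position.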

I recently pushed a large refactor that natively handles all three of these legacy PDF issues (especially the missing /Opt scanning) in PR #246, if you want to take a look and merge our approaches!

Let me know what you think. Great work again on getting this started!

Cubix33 closed this by deleting the head repository on Apr 13, 2026
Cubix33 (author) commented Apr 14, 2026

Closing for now to reduce load on the maintainers. Will reopen after further discussion or during GSoC.



Development: successfully merging this pull request may close the issue "[FEAT]: Support for PDF Checkbox and Radio Button Widgets (/Btn)".

2 participants