Fix #41: Add explicit binary detection for PDF files #48

Dreamstick9 · 2026-01-16T17:38:54Z

Some PDF files were being incorrectly identified as text because they start with a text header (%PDF-).
This change adds a check in src/typecode/contenttype.py: if a file is initially detected as text, the code now looks for a %PDF- signature and marks the file as binary if one is found.
I also added a regression test in tests/test_testcontenttype.py to cover this case.
Fixes #41
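
For reviewers, here is a minimal sketch of the idea described above. It is not the actual code in src/typecode/contenttype.py: the names PDF_SIGNATURE, looks_like_pdf, and is_binary are hypothetical, and it assumes the signature is checked at the very start of the bytes already read from the file.

```python
# Hypothetical illustration only; names and structure are assumptions,
# not the real typecode implementation.

PDF_SIGNATURE = b"%PDF-"


def looks_like_pdf(header: bytes) -> bool:
    """Return True if the leading bytes carry the PDF signature."""
    return header.startswith(PDF_SIGNATURE)


def is_binary(header: bytes, detected_as_text: bool) -> bool:
    """Combine an initial text/binary guess with the PDF override.

    `detected_as_text` stands in for whatever the earlier detection
    step concluded about the file.
    """
    # A PDF begins with a printable header, so a text heuristic can be
    # fooled; the signature check overrides that guess.
    if detected_as_text and looks_like_pdf(header):
        return True
    return not detected_as_text
```

A regression test along the lines of the one mentioned above could feed a header such as b"%PDF-1.7\n" with detected_as_text=True and assert that the result is binary, while a plain text header stays non-binary.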

Signed-off-by: Kushagar Garg <[email protected]>

Fix aboutcode-org#41: Fix PDF binary detection and apply formatting

1cea9bb

Signed-off-by: Kushagar Garg <[email protected]>

Dreamstick9 force-pushed the main branch from 878d260 to 1cea9bb on January 16, 2026 17:46
