@Dreamstick9

Some PDF files were incorrectly identified as text because they start with a text header (%PDF-).
This change adds a check in src/typecode/contenttype.py:
if a file is initially detected as text, the code now looks for a %PDF- signature and marks the file as binary if one is found.
I also added a regression test in tests/test_testcontenttype.py to cover this case.
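
For reviewers, here is a minimal sketch of the idea, not the actual code in src/typecode/contenttype.py. The names PDF_SIGNATURE, looks_like_pdf and resolve_is_binary are hypothetical and exist only for illustration. The sketch assumes the signature sits at the very start of the file and that an earlier step has already produced a text/binary guess.

```python
# Hypothetical illustration only; the real logic lives in src/typecode/contenttype.py.
PDF_SIGNATURE = b"%PDF-"


def looks_like_pdf(head: bytes) -> bool:
    """Return True if the leading bytes start with the PDF header signature."""
    return head.startswith(PDF_SIGNATURE)


def resolve_is_binary(head: bytes, detected_as_text: bool) -> bool:
    """Combine an initial text/binary guess with the PDF signature check.

    A file first classified as text is reclassified as binary when its
    leading bytes carry the %PDF- signature; otherwise the initial guess stands.
    """
    if detected_as_text and looks_like_pdf(head):
        return True
    return not detected_as_text


if __name__ == "__main__":
    # A PDF whose header made it look like text is now reported as binary.
    print(resolve_is_binary(b"%PDF-1.4\nsome content", detected_as_text=True))
    # Ordinary text keeps its text classification.
    print(resolve_is_binary(b"plain text file\n", detected_as_text=True))
```

The override only applies when the initial detection says text, so files already classified as binary take the same path as before.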
Fixes #41

Development

Successfully merging this pull request may close these issues.

PDF file detected as non-binary
