With a routine version bump of requirements, I noticed chardet had been switched out for charset_normalizer (which I had never heard of before) in #5797, apparently due to LGPL license concerns.
I agree with @sigmavirus24's comment #5797 (comment) that it's strange for something as central in the Python ecosystem as requests (45k stars, 8k forks, many contributors at the time of writing) to switch to such a relatively unknown and unproven library (132 stars, 5 forks, 2 contributors) as a hard dependency.
The release notes say you could use pip install "requests[use_chardet_on_py3]" to use chardet instead of charset_normalizer, but with that extra set both libraries get installed.
I would imagine many users don't actually need the charset detection features in Requests; could we open a discussion on making both chardet and charset_normalizer optional, à la requests[chardet] or requests[charset_normalizer]?
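For illustration, the optional extras could be declared roughly like this in setup.py. This is only a hypothetical sketch: the extra names are the ones suggested here, EXTRAS_REQUIRE is a made-up variable name, and version pins are deliberately left out.

```python
# Hypothetical sketch, not the current requests packaging.
# Would be passed as setup(..., extras_require=EXTRAS_REQUIRE), with neither
# detection library listed in install_requires.
EXTRAS_REQUIRE = {
    "chardet": ["chardet"],
    "charset_normalizer": ["charset_normalizer"],
}
```

With something like this, a plain pip install requests would pull in neither library, and users who want detection would pick one explicitly.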
AFAICS, the only place where chardet is actually used in requests is Response.apparent_encoding, which is used by Response.text when there is no determined encoding.
Maybe apparent_encoding could:
- as a built-in first attempt, try decoding the content as UTF-8 (which would likely succeed in many cases)
- if neither chardet nor charset_normalizer is installed, warn the user ("No encoding detection library is installed. Falling back to XXXX. Please see YYYY for instructions" or somesuch) and return e.g. ascii
- otherwise, use whichever of the two detection libraries is installed, as per usual
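A rough sketch of that fallback order, written as a standalone function rather than as the actual Response.apparent_encoding implementation. The names guess_encoding and _load_detector and the fallback parameter are hypothetical; the only library calls assumed are the detect() functions that chardet and charset_normalizer both expose.

```python
import importlib
import warnings


def _load_detector():
    """Return the first importable detection module, or None (hypothetical helper)."""
    for name in ("chardet", "charset_normalizer"):
        try:
            return importlib.import_module(name)
        except ImportError:
            continue
    return None


def guess_encoding(content: bytes, fallback: str = "ascii") -> str:
    """Hypothetical stand-in for the proposed apparent_encoding behaviour."""
    # Built-in first attempt: if the bytes decode cleanly as UTF-8, use that.
    try:
        content.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        pass

    detector = _load_detector()
    if detector is None:
        # Neither library is installed: warn and fall back.
        warnings.warn(
            "No encoding detection library is installed. "
            f"Falling back to {fallback}."
        )
        return fallback

    # Both libraries provide detect(), returning a dict with an "encoding" key
    # that may be None when detection fails.
    return detector.detect(content)["encoding"] or fallback
```

For example, guess_encoding("héllo".encode("utf-8")) returns "utf-8" without touching either library, so detection only matters for content that is not valid UTF-8.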