Fix encoding issue in file writing to handle Non-ASCII characters #312

Closed
splendid-pilot wants to merge 0 commits into pyutils:main from splendid-pilot:main

@splendid-pilot
Contributor

Description:
This PR fixes a UnicodeEncodeError in line_profiler when profiling functions with non-ASCII characters (e.g., CJK characters) in function names or docstrings. The error occurs on Windows due to the default cp1252 encoding not supporting such characters.

Solution: The fix explicitly sets encoding='utf-8' in the write_text function calls in explicit_profiler.py, ensuring proper handling of non-ASCII characters during profiling output.
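A minimal sketch of the pattern described above, not the actual code in explicit_profiler.py: the helper name `write_report`, the file name, and the sample text are hypothetical. It passes `encoding="utf-8"` to `pathlib.Path.write_text` so the result does not depend on the platform's default encoding.

```python
import tempfile
from pathlib import Path


def write_report(path: Path, text: str) -> None:
    """Write profiler-style output (hypothetical helper, for illustration only)."""
    # Without encoding=..., write_text falls back to the locale encoding,
    # which is typically cp1252 on Windows and cannot encode CJK text.
    path.write_text(text, encoding="utf-8")


# Sample output containing CJK characters in a function name and docstring.
report = 'Function: 计算 at line 1\n    """文档字符串"""\n'

with tempfile.TemporaryDirectory() as tmp:
    out_path = Path(tmp) / "profile_output.txt"
    write_report(out_path, report)
    # Read back with the same explicit encoding to confirm a lossless round trip.
    roundtrip = out_path.read_text(encoding="utf-8")

print(roundtrip == report)
```

Reading the file back with the same explicit encoding matters as much as writing it that way; otherwise the reader can hit the mirror-image decoding problem.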

Why:
- Ensures line_profiler works correctly with CJK and other non-ASCII characters, especially on Windows systems.
- Prevents crashes when profiling functions with non-Latin characters in their names or docstrings.

@Erotemic
Member

Ah Windows encoding issues. Such a pain.

LGTM. I'll merge once dashboards pass.

@Erotemic
Member

I need to merge the 1.4.2 branch first. After that, we will rebase this on top of main and add a CHANGELOG note. Then this will be merged into main and be part of the next patch release.

By default I'll wait for other PRs before I make a release, but if this is causing day-to-day issues, let me know and I can expedite the process.

@Erotemic closed this Jan 18, 2025
@Erotemic
Member

did not mean to close this, reopened here: #313

@splendid-pilot
Contributor Author

No problem at all, and thank you for addressing the issue. I’ve already applied the change in my local environment, so there’s no need to expedite the process. Thanks again for your time and for maintaining this great project!

vnmabus added a commit to vnmabus/line_profiler that referenced this pull request May 18, 2025
When using the `%lprun` magic on Windows and storing the profile results in a text file, the following exception was raised if the profiled functions contained non-ASCII characters:

```python
UnicodeEncodeError                        Traceback (most recent call last)
Cell In[5], line 1

File ...\.conda\Lib\site-packages\IPython\core\interactiveshell.py:2482, in InteractiveShell.run_line_magic(self, magic_name, line, _stack_depth)
   2480     kwargs['local_ns'] = self.get_local_scope(stack_depth)
   2481 with self.builtin_trap:
-> 2482     result = fn(*args, **kwargs)
   2484 # The code below prevents the output from being displayed
   2485 # when using magics with decorator @output_can_be_silenced
   2486 # when the last Python token in the expression is a ';'.
   2487 if getattr(fn, magic.MAGIC_OUTPUT_CAN_BE_SILENCED, False):

File ...\.conda\Lib\site-packages\line_profiler\ipython_extension.py:161, in LineProfilerMagics.lprun(self, parameter_s)
    159 if text_file:
    160     pfile = open(text_file, "w")
--> 161     pfile.write(output)
    162     pfile.close()
    163     print(f"\n*** Profile printout saved to text file {text_file!r}. {message}")

File ...\.conda\Lib\encodings\cp1252.py:19, in IncrementalEncoder.encode(self, input, final)
     18 def encode(self, input, final=False):
---> 19     return codecs.charmap_encode(input,self.errors,encoding_table)[0]

UnicodeEncodeError: 'charmap' codec can't encode character '\u0394' in position 36719: character maps to <undefined>
```

This happens because the default encoding on Windows (cp1252 in this traceback) cannot represent every Unicode character, such as the `'\u0394'` (Δ) reported above. Thus, the solution (as in pyutils#312) is to specify the UTF-8 encoding in the `open` call.
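The failure and the fix can be reproduced on any platform with a small sketch. This is not the actual patch to ipython_extension.py: the sample `output` string is made up, and cp1252 is passed explicitly only to mimic the Windows default seen in the traceback.

```python
import os
import tempfile

# Made-up profiler output containing the character from the traceback (U+0394).
output = "Total time: 0.1 s\nFunction: \u0394_step at line 3\n"

# Create a scratch file to play the role of `text_file`.
fd, text_file = tempfile.mkstemp(suffix=".txt")
os.close(fd)
try:
    # Mimic the Windows default: cp1252 has no mapping for U+0394.
    try:
        with open(text_file, "w", encoding="cp1252") as pfile:
            pfile.write(output)
        cp1252_failed = False
    except UnicodeEncodeError:
        cp1252_failed = True

    # The fix: name the encoding explicitly instead of relying on the default.
    with open(text_file, "w", encoding="utf-8") as pfile:
        pfile.write(output)

    with open(text_file, encoding="utf-8") as pfile:
        saved = pfile.read()
finally:
    os.remove(text_file)

print(cp1252_failed, saved == output)
```

Using `with` also closes the file when `write` raises, which the original `open` / `write` / `close` sequence did not guarantee.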

2 participants