Significantly decrease profiling overhead & update build process (4.0.0) by Theelx · Pull Request #178 · pyutils/line_profiler

Theelx · 2022-11-02T02:18:07Z

This is a continuation of #165, which was accidentally closed.

This PR is a large amalgamation of work I've been doing over the past few months to make line_profiler better. The main improvements are:

The python trace callback that is called with every line of profiled code executed has been rewritten to make more extensive use of C++, and to reduce python interaction. Cython can generate annotated HTML files which indicate more python interaction by making the line more yellow. An explanation for this can be found here. Generally, the more yellow the line is, the Python interaction there will be, so the slower the line/function will be. I have attached screenshots of the HTML files plus their raw originals for easy viewing.
The callback is now approximately 2-3x faster. I have noticed in my other projects that lines that used to say they took 0.7µs (microseconds) to execute now say they execute in 0.2-0.3µs. This is because line_profiler isn't able to completely get rid of its own overhead when tracking lines. The overhead has been reduced from approximately 0.4µs to <0.1µs on my laptop, running Ubuntu 20.04.5 on Linux 5.17.9, with an Intel i7-11800H CPU, with turbo-boost disabled, and when plugged into its charger. This translates into some of my example programs going from a 3.5x slowdown with profiling enabled, to a 1.8x slowdown.
The build system has been switched to pure Cython. Scikit-build's Cython build system is woefully outdated, and it doesn't support nearly as many options as Cython's native build system. In addition, the Cmake process is unintuitive for many Python contributors, and the C-extension compilation process for local development is undocumented and complex. With the new system, one can just run pip install . when inside the cloned directory, and the build and installation is automatically done for them. An extra anecdotal benefit is that it seems to install a bit faster.
The -i option has been added. This was introduced in an Add option to automatically output a file every n seconds #46, but rejected due to not being an asyncio-based approach. At the moment, profiling asyncio code continues to work in 4.0.0 according to tests of other async code of mine, but I'd appreciate it if you could test with any async code you deem relevant. In addition, basic benchmarks at the bottom of the post indicate no noticeable slowdown when using the -i 1 option, and it doesn't mess with the GIL. If this needs to be split into a separate PR, I can do that.
This release would be 4.0.0 because in order to implement the C++ optimizations, I had to change how the code_map and last_time attributes on the LineProfiler object work. The attributes are still accessible from pure-Python code, but they contain different objects, so any code relying on specific behavior from those attributes may break. However, I don't think those attributes were ever meant to be relied upon by external code anyway.

The Cython code is a bit more complex now, but it should be managable. You'll notice I had to do some manipulation of the code objects of functions, because in order to avoid python interaction I couldn't store the function code objects in a Python dictionary, so I had to hash the function objects. This originally caused problems when there were two different functions with the exact same code, however that was fixed by making line_profiler add no-op instructions to any duplicate functions, so that they could be profiled as separate functions. This still

import asyncio
import math

@profile
async def testing(x):
    for i in range(x):
        y = math.pow(100, 150) + math.pow(100, 150)

asyncio.run(testing(7500000))

Above code without -i 1, 7.5m repetitions:
5.28s
5.30s
5.34s
5.31s

Above code with -i 1, 7.5m repetitions:
5.27s
5.32s
5.28s
5.30s

There is essentially no difference in performance with -i, as long as it's not done like every .001 seconds. Luckily, it only allows intervals in multiples of 1 second, and 0 seconds is the same as disabled, so there isn't really much chance of it breaking. The timings are approximately the same for both cases when removing asyncio.

Tweaks

…ix race condition on threaded code

… with highest # hits. This is the same behaviour used by `print_stats`

fixed regression introduced by 699f0d4

Fix race conditions in threaded code

codecov · 2022-11-02T13:43:57Z

Codecov Report

Merging #178 (09fa869) into main (71db7da) will not change coverage.
The diff coverage is 100.00%.

Additional details and impacted files

@@           Coverage Diff           @@
##             main     #178   +/-   ##
=======================================
  Coverage   50.59%   50.59%           
=======================================
  Files           4        4           
  Lines         253      253           
  Branches       34       37    +3     
=======================================
  Hits          128      128           
  Misses        112      112           
  Partials       13       13

Impacted Files	Coverage Δ
line_profiler/line_profiler.py	`52.24% <100.00%> (ø)`

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 71db7da...09fa869. Read the comment docs.

Erotemic · 2022-11-02T14:14:11Z

-        code = <object>py_frame.f_code
-        if code in self.code_map:
+        block_hash = hash(get_frame_code(py_frame))
+        code_hash = compute_line_hash(block_hash, py_frame.f_lineno)


Is there a memory leak here? The get_frame_code function calls PyFrame_GetCode, which returns a PyCodeObject*, and that has its reference count decreased in the same function by Py_DECREF. However, the PyObject* returned by PyCode_GetCode is returned from that function, and (correct me if I'm wrong) I believe it returns a "strong reference" meaning that we need to manually call DECREF, but it never has it's reference count decreased later on. Is that an oversight, or am I misunderstanding PyCode_GetCode?

Your understanding is absolutely correct. However, the key piece of this, which I should add a comment mentioning, is that Cython takes care of the DECREF for us. If you compile with the annotated HTML, it shows a bit of yellow at the hash, due to the DECREF. When I get to my computer, I'll add a comment in explaining that.

Theelx · 2022-11-07T16:02:43Z

@Erotemic Are there any other roadblocks to getting this merged?

Erotemic · 2022-11-07T19:28:46Z

I want to look into the discussion that happened on your fork just to make sure I understand everything. This is just waiting on me having time to do that. But I'll try to make some later tonight.

Erotemic · 2022-11-11T16:50:23Z

-        for timing in timings.values():
-            self.assertEqual(timing.nhits, 1)
-
-    def test_gen_decorator(self):


I thought I had a patch that fixed and re-enabled all of these tests. Maybe it got lost?

That's possible. I don't recall the patch though, so I have no clue where to find it.

Erotemic · 2022-11-12T01:40:24Z

This has been waiting long enough. I'm satisfied that there are no security vulnerabilities and that the code is stable. There may be niche regressions, but those users can pin to old versions or help this community driven project out. Thank you @Theelx for your patience, dedication, and impactful contribution.

Theelx and others added 30 commits September 27, 2022 22:00

Add threading

f3254b0

fix erroneous variable name

39eb7d6

add -i option

5a38626

Update build process & massively decrease profiling overhead

04df6fc

Remove testing file

f7d5ec1

Expose python interface

d138c40

Improve python interface & update gitignore

00aabd5

Remove unused scikit stuff and update changelog/version

c4bb3fc

Update README.rst

71851bf

Remove unnecssary build requirements

d9ff50a

Fix formatting

f91c1e0

Fix bytecode hashing failing to display identical functions

d701729

Misc code cleanup

62f7657

More misc code cleanups

001cedf

Remove useless comments

5e65833

Move around external declaration for better 3.11 compatability

ba57c29

Add python 3.11 support

6f7d098

Drop unused build-time dependencies

a9448ec

Update kernprof version in sync

873535f

Fix tests

50df07a

Update changelog

5a0e3cf

Fix accidental cpdef

e47c57d

Fix cython submodule and relative imports

8218f55

Fix relative imports in order to remove redundant file

b25c875

Extract xor into a specialized function

d418abe

Rebase on main branch

380f049

Fix merge conflicts

796a26d

Fix a whoopsie

2f4888e

Use ns resolution for times on non-windows platforms

7cc13f4

Remove duplicate function in setup

5c51546

Erotemic and others added 12 commits October 2, 2022 17:26

Refactor test to use pytest intead of unittest

db76206

Make code_map and last_time backwards compatible

ed3c667

refactor code_replace

699f0d4

Merge pull request #1 from Erotemic/tweaks

ba4ad39

Tweaks

Fix _code_replace function

f6464b0

Move enable_count and _c_last_time to thread-local variables to f…

976da1b

…ix race condition on threaded code

Read new time before other operations

ba4f1ad

In cases where there are multiple entries per line, only keep the one…

a5d0e54

… with highest # hits. This is the same behaviour used by `print_stats`

fixed regression introduced by 699f0d4

dfc2229

Merge pull request #3 from fazalmajid/code_replace_fix

6cdceae

fixed regression introduced by 699f0d4

Merge branch 'main' into threading-4

0f8f188

Merge pull request #2 from noperator-zz/threading-4

abcf021

Fix race conditions in threaded code

Enable 311 wheels

1ad9b23

Erotemic reviewed Nov 2, 2022

View reviewed changes

Clarify DECREF

a1e56e7

Erotemic reviewed Nov 11, 2022

View reviewed changes

Theelx added 2 commits November 11, 2022 12:31

Undo README changes

cbc8bc6

Re-add CMakeLists for backup build method

09fa869

Erotemic merged commit fe0bb5a into pyutils:main Nov 12, 2022

TTsangSC mentioned this pull request Jun 8, 2025

FIX: aggregating profiling data from duplicate code objects #349

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Significantly decrease profiling overhead & update build process (4.0.0)#178

Significantly decrease profiling overhead & update build process (4.0.0)#178
Erotemic merged 46 commits into
pyutils:mainfrom
Theelx:main

Theelx commented Nov 2, 2022 •

edited

Loading

Uh oh!

codecov Bot commented Nov 2, 2022 •

edited

Loading

Uh oh!

Erotemic Nov 2, 2022

Uh oh!

Theelx Nov 2, 2022

Uh oh!

Theelx commented Nov 7, 2022

Uh oh!

Erotemic commented Nov 7, 2022

Uh oh!

Uh oh!

Uh oh!

Erotemic Nov 11, 2022

Uh oh!

Theelx Nov 11, 2022

Uh oh!

Erotemic commented Nov 12, 2022

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

Theelx commented Nov 2, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

codecov Bot commented Nov 2, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

Erotemic Nov 2, 2022

Choose a reason for hiding this comment

Uh oh!

Theelx Nov 2, 2022

Choose a reason for hiding this comment

Uh oh!

Theelx commented Nov 7, 2022

Uh oh!

Erotemic commented Nov 7, 2022

Uh oh!

Uh oh!

Uh oh!

Erotemic Nov 11, 2022

Choose a reason for hiding this comment

Uh oh!

Theelx Nov 11, 2022

Choose a reason for hiding this comment

Uh oh!

Erotemic commented Nov 12, 2022

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Theelx commented Nov 2, 2022 •

edited

Loading

codecov Bot commented Nov 2, 2022 •

edited

Loading