Significantly decrease profiling overhead & update build process (4.0.0)#178
Conversation
…ix race condition on threaded code
… with highest # hits. This is the same behaviour used by `print_stats`
fixed regression introduced by 699f0d4
Fix race conditions in threaded code
Codecov Report
Additional details and impacted files

@@           Coverage Diff           @@
##             main     #178   +/-   ##
=======================================
  Coverage   50.59%   50.59%
=======================================
  Files           4        4
  Lines         253      253
  Branches       34       37       +3
=======================================
  Hits          128      128
  Misses        112      112
  Partials       13       13
Continue to review full report at Codecov.
    code = <object>py_frame.f_code
    if code in self.code_map:
        block_hash = hash(get_frame_code(py_frame))
        code_hash = compute_line_hash(block_hash, py_frame.f_lineno)
Is there a memory leak here? The get_frame_code function calls PyFrame_GetCode, which returns a PyCodeObject*, and that has its reference count decreased in the same function by Py_DECREF. However, the PyObject* returned by PyCode_GetCode is returned from that function, and (correct me if I'm wrong) I believe it returns a "strong reference", meaning we need to manually call DECREF, but its reference count is never decreased later on. Is that an oversight, or am I misunderstanding PyCode_GetCode?
Your understanding is absolutely correct. However, the key piece of this, which I should add a comment mentioning, is that Cython takes care of the DECREF for us. If you compile with the annotated HTML, it shows a bit of yellow at the hash, due to the DECREF. When I get to my computer, I'll add a comment explaining that.
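For anyone reading along who mostly writes Python: the bookkeeping being discussed above is automatic at the Python level, which is essentially what Cython's generated code restores for the C-level calls. A small sketch (using only `sys.getrefcount` and `sys._getframe`, not line_profiler's internals) showing that each extra reference to a frame's code object is counted and released without any manual DECREF:

```python
import sys

def sample():
    frame = sys._getframe()
    # In pure Python, CPython manages the INCREF/DECREF for us each time we
    # touch frame.f_code -- the same bookkeeping Cython emits around the
    # C-level PyFrame_GetCode/PyCode_GetCode calls discussed above.
    code = frame.f_code
    before = sys.getrefcount(code)
    alias = frame.f_code                 # another reference to the same object
    after = sys.getrefcount(alias)       # one higher than before
    del alias                            # reference released automatically
    final = sys.getrefcount(code)        # back to the original count
    return before, after, final, code is frame.f_code

before, after, final, same_object = sample()
```

This is only an analogy for the refcounting semantics, not a model of the Cython code itself.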
@Erotemic Are there any other roadblocks to getting this merged?
I want to look into the discussion that happened on your fork just to make sure I understand everything. This is just waiting on me having time to do that. But I'll try to make some later tonight.
        for timing in timings.values():
            self.assertEqual(timing.nhits, 1)

    def test_gen_decorator(self):
I thought I had a patch that fixed and re-enabled all of these tests. Maybe it got lost?
That's possible. I don't recall the patch though, so I have no clue where to find it.
This has been waiting long enough. I'm satisfied that there are no security vulnerabilities and that the code is stable. There may be niche regressions, but those users can pin to old versions or help this community-driven project out. Thank you @Theelx for your patience, dedication, and impactful contribution.
This is a continuation of #165, which was accidentally closed.
This PR is a large amalgamation of work I've been doing over the past few months to make line_profiler better. The main improvements are:
- A `-i` option has been added. This was first proposed in #46 (Add option to automatically output a file every n seconds), but was rejected at the time because it was not an asyncio-based approach. At the moment, profiling asyncio code continues to work in 4.0.0 according to tests on other async code of mine, but I'd appreciate it if you could test with any async code you deem relevant. In addition, the basic benchmarks at the bottom of this post indicate no noticeable slowdown when using `-i 1`, and it doesn't interfere with the GIL. If this needs to be split into a separate PR, I can do that.

This release would be 4.0.0 because, in order to implement the C++ optimizations, I had to change how the `code_map` and `last_time` attributes on the `LineProfiler` object work. The attributes are still accessible from pure-Python code, but they contain different objects, so any code relying on specific behavior from those attributes may break. However, I don't think those attributes were ever meant to be relied upon by external code anyway.
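To make the periodic-output idea concrete, here is a minimal pure-asyncio sketch of the concept behind `-i`: a background task that periodically flushes stats while the profiled coroutine runs, without blocking the event loop. All names here (`dump_stats_periodically`, `snapshots`, `INTERVAL`) are illustrative stand-ins, not line_profiler's actual implementation or API:

```python
import asyncio

INTERVAL = 0.05   # seconds; the real option only accepts whole-second multiples
snapshots = []    # stand-in for .lprof files written to disk

async def dump_stats_periodically(stop: asyncio.Event):
    # Flush "stats" every INTERVAL seconds until told to stop.
    while not stop.is_set():
        snapshots.append("stats dumped")
        try:
            await asyncio.wait_for(stop.wait(), timeout=INTERVAL)
        except asyncio.TimeoutError:
            pass  # interval elapsed; dump again

async def workload():
    # Stand-in for the code being profiled.
    await asyncio.sleep(0.25)

async def main():
    stop = asyncio.Event()
    dumper = asyncio.create_task(dump_stats_periodically(stop))
    await workload()
    stop.set()
    await dumper

asyncio.run(main())
```

Because the dumper cooperates with the event loop rather than running in a thread, it doesn't contend for the GIL in the way the earlier proposal was criticized for.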
The Cython code is a bit more complex now, but it should be manageable. You'll notice I had to do some manipulation of the code objects of functions: in order to avoid Python interaction, I couldn't store the function code objects in a Python dictionary, so I had to hash them instead. This originally caused problems when there were two different functions with exactly the same code; that was fixed by making line_profiler add no-op instructions to any duplicate functions, so that they could be profiled as separate functions. This still
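The duplicate-code problem described above is easy to demonstrate from plain Python, independent of line_profiler's internals: two distinct functions with identical bodies compile to identical bytecode, so any profiler keyed purely on a hash of the bytecode would merge their stats unless the duplicates are disambiguated (here, by injecting no-ops):

```python
# Two distinct functions with byte-for-byte identical bodies.
def f(x):
    return x + 1

def g(x):
    return x + 1

# They are different objects with different code objects...
assert f is not g
assert f.__code__ is not g.__code__

# ...but their compiled bytecode is identical, so a hash of the raw
# bytecode alone cannot tell them apart.
assert f.__code__.co_code == g.__code__.co_code
assert hash(f.__code__.co_code) == hash(g.__code__.co_code)
```

Appending a no-op instruction to one duplicate changes its bytecode bytes (and thus its hash) without changing its behavior, which is the disambiguation strategy the PR uses.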
Above code without `-i 1`, 7.5m repetitions:
5.28s
5.30s
5.34s
5.31s
Above code with `-i 1`, 7.5m repetitions:
5.27s
5.32s
5.28s
5.30s
There is essentially no difference in performance with `-i`, as long as the interval isn't extremely short (e.g. every 0.001 seconds). Luckily, the option only allows intervals in whole-second multiples, and 0 seconds is the same as disabled, so there isn't much chance of it breaking. The timings are approximately the same for both cases when asyncio is removed.
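The benchmark script itself isn't shown in this thread, so for anyone wanting to reproduce the comparison, a hypothetical harness of the kind that produces timings like the above (the workload is a placeholder, not the actual benchmarked code):

```python
import time

def workload(reps):
    # Placeholder loop standing in for the actual benchmarked code.
    total = 0
    for i in range(reps):
        total += i
    return total

def bench(runs=4, reps=100_000):
    # Collect one wall-clock timing per run, like the four numbers above.
    times = []
    for _ in range(runs):
        start = time.perf_counter()
        workload(reps)
        times.append(time.perf_counter() - start)
    return times

timings = bench()
```

Running the same harness under `kernprof` with and without the interval option would give the two columns being compared.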