CASSANDRA-20854: Support low-overhead async profiling #4487
base: trunk
Force-pushed from 52edd69 to 52c1ed0.
Force-pushed from 98683de to 073febe.
src/java/org/apache/cassandra/service/AsyncProfilerService.java
    try
    {
        createLogDir();
        return Files.readAllBytes(new File(logDir, resultFile).toPath());
We need to validate the file size here against a configured limit. If the file is too large, we can cause an OOM or GC pressure on the server side, or an OOM on the client side (nodetool). I suppose the default value should be around 25 MiB.
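A minimal sketch of the size check suggested above, assuming a hypothetical helper class `BoundedProfileReader` and a hypothetical `maxBytes` setting (neither is part of this PR); the 25 MiB default is just the figure floated in the comment:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of the PR: refuses to load result files above a limit.
final class BoundedProfileReader
{
    // Assumed default, taken from the 25 MiB figure suggested in the review comment.
    static final long DEFAULT_MAX_BYTES = 25L * 1024 * 1024;

    static byte[] readBounded(Path file, long maxBytes)
    {
        try
        {
            // Check the on-disk size first so an oversized file is never pulled into the heap.
            long size = Files.size(file);
            if (size > maxBytes)
                throw new IllegalArgumentException(String.format("File %s is %d bytes, which exceeds the limit of %d bytes",
                                                                 file.getFileName(), size, maxBytes));
            return Files.readAllBytes(file);
        }
        catch (IOException e)
        {
            throw new UncheckedIOException(e);
        }
    }
}
```

Note that the check and the read are separate steps, so a file that is still being written could grow in between; this sketch does not guard against that.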
@netudima shouldn't we also limit the duration of profiling? I think something "sensible" should be used there as well, but at the same time nothing too short. Like 12h? That has to be enough for everybody, I guess.
Also, we cannot limit the size of a file before it is created, can we? Can this be limited in async-profiler itself? Otherwise we can basically only react to it. So what if async-profiler creates a file bigger than your limit? How do we actually get that information, given that this is all asynchronous and we would be waiting for the duration to expire (if we do not stop it ourselves)?
What do you want to do with such a big file? Remove it?
We were also thinking about compressing it before sending.
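The compression idea mentioned above could be sketched with the JDK's built-in GZIP streams. The class name `ProfileCompression` is hypothetical, and whether the PR ends up compressing at all, or with this codec, is not settled in this thread:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper, not part of the PR: gzip a result before sending it, gunzip on the client.
final class ProfileCompression
{
    static byte[] gzip(byte[] raw)
    {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        // Closing the GZIP stream writes the trailer, so read the buffer only after the try block.
        try (GZIPOutputStream out = new GZIPOutputStream(buffer))
        {
            out.write(raw);
        }
        catch (IOException e)
        {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    static byte[] gunzip(byte[] compressed)
    {
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(compressed)))
        {
            return in.readAllBytes(); // requires Java 9 or newer
        }
        catch (IOException e)
        {
            throw new UncheckedIOException(e);
        }
    }
}
```

Compression reduces what goes over the wire but both sides still hold the full payload in memory here, so it complements a size limit rather than replacing it.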
should not we also limit duration of profiling?
Yes, I think it makes sense to set an upper limit. Maybe even 12h is too big; I think 1h is more realistic (I do not remember ever actually using more than 15 minutes). If we are talking about hours, it is more of a continuous profiling use case and should be implemented in a different way.
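As an illustration of an upper bound on duration, here is a sketch that parses a human-readable value such as 5m and rejects anything above a cap. The class name, the accepted units (s, m, h), and the 1h cap are assumptions for the example; the PR's actual parsing and limit may differ:

```java
import java.time.Duration;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the PR: parse "5m"-style durations and enforce a maximum.
final class ProfilingDurationLimit
{
    // Assumed cap, taken from the 1h figure suggested in the review comment.
    static final Duration MAX_DURATION = Duration.ofHours(1);

    // Up to nine digits followed by one unit letter; the digit cap avoids arithmetic overflow.
    private static final Pattern FORMAT = Pattern.compile("(\\d{1,9})([smh])");

    static Duration parseAndValidate(String input, Duration max)
    {
        Matcher m = FORMAT.matcher(input.trim());
        if (!m.matches())
            throw new IllegalArgumentException("Invalid duration '" + input + "', expected e.g. 30s, 5m or 1h");

        long amount = Long.parseLong(m.group(1));
        Duration duration;
        switch (m.group(2))
        {
            case "s": duration = Duration.ofSeconds(amount); break;
            case "m": duration = Duration.ofMinutes(amount); break;
            default:  duration = Duration.ofHours(amount);   break;
        }

        if (duration.isZero() || duration.compareTo(max) > 0)
            throw new IllegalArgumentException("Duration " + input + " must be positive and at most " + max);
        return duration;
    }
}
```

For example, `parseAndValidate("5m", ProfilingDurationLimit.MAX_DURATION)` returns a five-minute `Duration`, while `"2h"` is rejected under the one-hour cap.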
Also we can not limit size of a file before it is created
I have not seen such logic in async-profiler, only in FlightRecorder. I am not sure we can handle this use case easily enough to implement it.
I guess the time limit would only apply to the safe mode, right?
That of course mitigates the problem, but does not quite make it bulletproof. I guess such a guardrail could depend on a server-side monitor process that automatically stops the profiling if it is going to cause GC problems? That worries me, as I think we may be getting into the area of overcomplicating things. Using nodetool against a live database is complex, and you can do worse things with it than profiling for an hour. Do those commands have guardrails?
src/java/org/apache/cassandra/service/AsyncProfilerService.java
    {
        try
        {
            createLogDir();
Shouldn't this method (as well as fetch) be read-only and not create a directory, especially if the profiler is not enabled?
I think there's no harm there?
We still want to return the list of files even if it is not enabled. There is really no harm in creating that directory.
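For comparison, the read-only variant raised earlier in this thread could look roughly like the sketch below, which returns an empty list when the directory is missing instead of creating it. `ProfileListing` and `listResults` are hypothetical names, not the PR's code:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not part of the PR: list result files without side effects.
final class ProfileListing
{
    static List<String> listResults(Path logDir)
    {
        // A missing directory simply means there are no results yet; nothing is created.
        if (!Files.isDirectory(logDir))
            return Collections.emptyList();

        try (Stream<Path> entries = Files.list(logDir))
        {
            return entries.filter(Files::isRegularFile)
                          .map(p -> p.getFileName().toString())
                          .sorted()
                          .collect(Collectors.toList());
        }
        catch (IOException e)
        {
            throw new UncheckedIOException(e);
        }
    }
}
```

Either approach returns the same listing once the directory exists; the difference is only whether a list or fetch call can leave a new directory behind on a node where profiling was never enabled.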
src/java/org/apache/cassandra/service/AsyncProfilerService.java
            }
        });
    }
    else
Why do we need different logic for text files compared to binary ones?
Because some results of async-profiler are not text: they are not HTML but binary, like JFR and similar.
So when I fetch a result, I need to know what I am going to save it as. If I do new String(byte[]) and write that to a file, it will be corrupted, because these bytes are not strings (they are only in the HTML case).
I could probably do Files.write(new File(localFile).toPath(), content, CREATE, TRUNCATE_EXISTING, WRITE); for both cases, but I am not sure how it will behave for strings. It will probably be fine.
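To illustrate the point above: writing the raw bytes works the same for text and binary results, whereas a detour through `String` can corrupt binary data. This is a sketch with hypothetical names (`FetchedResultWriter`, `save`, `survivesStringRoundTrip`), and the UTF-8 round trip is an assumption about how the string conversion would be done:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

import static java.nio.file.StandardOpenOption.CREATE;
import static java.nio.file.StandardOpenOption.TRUNCATE_EXISTING;
import static java.nio.file.StandardOpenOption.WRITE;

// Hypothetical helper, not part of the PR: one write path for both text and binary results.
final class FetchedResultWriter
{
    static void save(Path localFile, byte[] content)
    {
        try
        {
            // Bytes are written verbatim, so HTML and JFR content are treated identically.
            Files.write(localFile, content, CREATE, TRUNCATE_EXISTING, WRITE);
        }
        catch (IOException e)
        {
            throw new UncheckedIOException(e);
        }
    }

    // Shows why new String(bytes) is unsafe for binary data: invalid sequences get replaced on decode.
    static boolean survivesStringRoundTrip(byte[] content)
    {
        byte[] roundTripped = new String(content, StandardCharsets.UTF_8).getBytes(StandardCharsets.UTF_8);
        return Arrays.equals(content, roundTripped);
    }
}
```

Valid UTF-8 text survives the round trip, but arbitrary binary content generally does not, which is why a single byte-oriented write is the simpler option for both cases.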
Force-pushed from 62ba006 to f53c997.
Add enable/disable AsyncProfiler JVM flags
Refactor
Add output format option & initial tests
Refactor & add tests
Add tests for system properties & refactor
Checkstyle fixes & switching airlift with picocli (CASSANDRA-17445)
Add more tests
Fix Picocli 'Profile' command integration
Address feedback
Changes.txt
Remove not needed property
Add missing licenses
Apply feedback
Fix help tests
refactoring
fix commands
hardened validation and simplification of commands
fix log dir
fixed tests
more fixes
add purge command
more hardening
implement list and fetch
Remove unused import
more fixes
added documentation
fixed nodetool help output tests
added status command
introduced binary download of files in fetch command if necessary
hardened code
ability to specify duration in human format (e.g. 5m)
improved error parsing
ability to execute purge, list and fetch even with disabled profiler
async-profiler is disabled by default
added nodetool tests
add startup check checking kernel parameters
Updates documentation with different profiler info
Applies some feedback
Merge AsyncProfiler and AsyncProfilerService
Improvements
Improve refactor
Fixed folder for profiles
Fix help tests
Use config log folder
Applies feedback
more fixes
Force-pushed from f53c997 to 1b6e538.
This is a follow-up PR on top of #4255.