[#722] fix segfault and hung threads on KeyboardInterrupt during parallel get #728
base: main
Conversation
After a bit of manual testing, will attempt to make a proper test for SIGINT and SIGTERM to ensure things are left in an ok state.
Force-pushed from e9d96e2 to 7e9a09f.
A GUI, for example, that maintains background asynchronous parallel transfers using the PRC could trap and guard against Ctrl-C along the lines of the sketch below.
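The following is an illustration only, not the snippet originally attached to this comment (which was omitted here). It assumes the `abort_parallel_transfers()` helper discussed in this PR, and the import path `irods.parallel` is an assumption.

```python
# Hypothetical sketch: guard parallel transfers against Ctrl-C / SIGTERM.
# The import path of abort_parallel_transfers() is assumed, not confirmed.
import os
import signal

from irods.session import iRODSSession
from irods.parallel import abort_parallel_transfers  # assumed import path

def _interrupt_handler(signum, frame):
    # Bring down any in-flight parallel PUT/GET threads in an orderly way
    # before letting the normal KeyboardInterrupt control flow take over.
    abort_parallel_transfers()
    raise KeyboardInterrupt

signal.signal(signal.SIGINT, _interrupt_handler)
signal.signal(signal.SIGTERM, _interrupt_handler)

env_file = os.path.expanduser("~/.irods/irods_environment.json")
with iRODSSession(irods_env_file=env_file) as session:
    try:
        session.data_objects.get("/tempZone/home/rods/big_file", "/tmp/big_file")
    except KeyboardInterrupt:
        print("Transfer interrupted; parallel transfer threads were shut down.")
```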
Force-pushed from abafff5 to fb36836.
alanking left a comment:
Seems reasonable. Just a couple of things in the test.
Looks like we have a conflict. Seems this PR is close to completion?
Just checking to see if this PR is still being considered for 3.3.0.
Will check whether it's still current and whether the segfault is still a concern. If so, then I think we can consider it for this release.
Force-pushed from fb36836 to 481952c.
What's the status of this PR?
I believe it's almost ready. I want to look over it once more.
alanking left a comment:
Awaiting signal that this is ready
I've added a test (first draft, will run soon) that interrupts a put. We weren't testing that previously.
Now open to comment ... once more. These changes are final in my mind, as far as the significant changes to library functionality are concerned.
I can squash the last 6 commits or so, if that helps reviewers.
Yes please.
Also, take a peek at the 2 failing checks to see if there's anything actionable.
Force-pushed from 4b0458e to 14037f9.
Sorry about the delay. (I see some reviews have already been made.) I have just submitted a squash, but there are no actual changes of any kind since the last note I posted above.
Going to have to make an issue for this one and handle it later, that is in
Awaiting resolution of open review comments.
Will take care of those later today. For now, pushing the work done so far, though the new tests do not pass. Multithreaded programming is evil.
… transfers
We now provide utility for bringing down parallel PUTs and GETs in an orderly way.
The segmentation faults could not be duplicated, although they are more likely in general when
aborting a main process that has spawned daemon threads. Note that since 3.9 (the version we
now support at a minimum), Python no longer uses daemon threads in support of concurrent.futures.
use subtest.
try to preserve latest synchronous parallel put/get for orderly shutdown in signal handler
can now abort parallel transfers with SIGINT/^C or SIGTERM
some debug still remains.
[_722] update readme for signals and parallel put/get
prevent auto_close
satisfy static typing.
revise README
forward ref needed for mypy?
patch test
more informative error message when retcodes do not match
delete unnecessary "import irods"
Update README.md
Co-authored-by: Alan King <alanking@renci.org>
add a finite timeout
review comments
comments regarding futures returning None
test condition wait of ten minutes is the default, no need to specify in call
catch was a no-op
remove TODO's
[_722] test a data put is sanely interruptable
[squashed multiple commits] tighten up all the quit logic:
finish put test
debug(parallel)
debug(put-test)
behaves better if we add mgr to list sooner?
experimental changes ACTIVE_PATH
paths active
make return values consistent from io_multipart_*()
print debug on abort
almost there?
move statement where transfer_managers is updated
rework abort_transfer fn slightly
handle logic for prematurely shutdown executor
[another_squash] tidy, fix, add put test
add tools.py with shared functions.
make doc string more thorough, for abort_parallel_transfers().
codacy, review
ws
update README on abort_parallel_transfers
resolve and display causes of error as is best possible
currently we just get out cleanly, and make sure the causation is preserved.
Later (v4.0) we may introduce a TransferInterrupted or similar which can be
more useful than RuntimeError for indicating what happened. (in place of
the RuntimeError("xxx failed.") in irods/manager/data_object_manager.)
whitespace
gettest
cond in handler
alter get test for multiple abort M.O. (exit from hdlr vs catch exc)
ensure multiple calls to quit() are not inefficient (but, in any case they are safe.)
noprint
by default abort_parallel_transfers should not increase reference counts.
extra param dicttype
remove README whitespace
comment,filter fnsf
generalize return code for tests
dictionary_type recognized as a more general parameter, "transform"
review comment (needed explanation for print)
disambiguate f -> future with previous mention.
Force-pushed from e3b5599 to 2700f86.
… attached to RuntimeError
Co-authored-by: Kory Draughn <korydraughn@ymail.com>
This appears ready for review. Should be the final round...
Okay. Please take a look at Codacy to see if there's anything worth addressing.
korydraughn left a comment:
We're very close.
| iRODS server versions 4.2.9+ and file sizes larger than a default
| threshold value of 32 Megabytes.
| Because multithread processes under Unix-type operating systems sometimes
Should multithread have an ed appended to it?
Suggested change:
- Because multithread processes under Unix-type operating systems sometimes
+ Because multithreaded processes under Unix-type operating systems sometimes
| signal(SIGTERM, handler)
| try:
| # a multi-1247 put or get can leave non-daemon threads running if not treated with care.
Suggested change:
- # a multi-1247 put or get can leave non-daemon threads running if not treated with care.
+ # A multi-1247 put or get can leave non-daemon threads running if not treated with care.
| non-daemon threads not finishing, which could risk preventing a prompt and
| orderly exit from the main program.
| When a signal or exception handler calls abort_parallel_transfers(), all
Suggested change:
- When a signal or exception handler calls abort_parallel_transfers(), all
+ When a signal or exception handler calls `abort_parallel_transfers()`, all
| signal handlers. However, if desired, `abort_parallel_transfers` may bei
| terated subsequently with (dry_run=True,...) to track the progress of the
Suggested change:
- signal handlers. However, if desired, `abort_parallel_transfers` may bei
- terated subsequently with (dry_run=True,...) to track the progress of the
+ signal handlers. However, if desired, `abort_parallel_transfers()` may be
+ iterated subsequently with `(dry_run=True, ...)` to track the progress of the
Please reread your original words to make sure I interpreted them correctly.
| signal handlers. However, if desired, `abort_parallel_transfers` may bei
| terated subsequently with (dry_run=True,...) to track the progress of the
| shutdown. The default object returned (a dictionary whose keys are weak
| references to the thread managers) will have a boolean value of False once
Do we highlight instances of False in other sections of the README?
Suggested change:
- references to the thread managers) will have a boolean value of False once
+ references to the thread managers) will have a boolean value of `False` once
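For context, a minimal sketch of how the `dry_run` polling described in the quoted README text might be used in practice; the import path and the exact call signature of `abort_parallel_transfers()` are assumptions, not confirmed by this diff.

```python
# Hypothetical usage sketch based only on the README wording quoted above.
import time

from irods.parallel import abort_parallel_transfers  # assumed import path

def abort_and_wait(timeout=60.0, poll_interval=0.1):
    """Request an orderly shutdown of parallel transfers, then poll until done."""
    abort_parallel_transfers()  # initiate shutdown of in-flight transfers
    deadline = time.monotonic() + timeout
    # Per the README text, the returned dictionary (keyed by weak references to
    # the thread managers) evaluates as False once the shutdown has completed.
    while abort_parallel_transfers(dry_run=True) and time.monotonic() < deadline:
        time.sleep(poll_interval)
```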
| # Wait for download process to reach the point of spawning data transfer threads. In Python 3.9+ versions
| # of the concurrent.futures module, these are nondaemon threads and will block the exit of the main thread
| # unless measures are taken (#722).
What is #722 referring to? Is it a TODO item or something else?
| from the test function.
| """
| start_time = time.clock_gettime_ns(time.CLOCK_BOOTTIME)
| while not (truth_value := function()):
Consider renaming function to callback so it's slightly more clear that it is something that is passed in.
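For reference, a minimal sketch of what such a polling helper might look like with the suggested `callback` rename applied; the function name `wait_until` and the extra parameters are assumptions, not taken from the PR.

```python
import time

def wait_until(callback, timeout_seconds=600, poll_interval=0.1):
    """Poll `callback` until it returns a truthy value or the timeout elapses.

    Returns the truthy value, or None if the timeout is reached first.
    (CLOCK_BOOTTIME is Linux-specific, as in the quoted test code.)
    """
    start_time = time.clock_gettime_ns(time.CLOCK_BOOTTIME)
    timeout_ns = int(timeout_seconds * 1e9)
    while not (truth_value := callback()):
        if time.clock_gettime_ns(time.CLOCK_BOOTTIME) - start_time > timeout_ns:
            return None
        time.sleep(poll_interval)
    return truth_value
```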
Wherein we close down threads in an orderly way, so that nothing is left to be disposed of in the wrong order for the ever-persnickety SSL shutdown logic.
Experiments show that SIGTERM actually does induce the Python interpreter to shut down non-daemonic threads, so installing a signal handler for that may not be necessary in the end.
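A small experiment sketch along those lines (not code from this PR) showing that the default SIGTERM disposition ends the process even while a non-daemon thread is still running:

```python
# Experiment sketch: a non-daemon thread is running, yet SIGTERM with the
# default handler terminates the whole process immediately rather than
# waiting for the thread to finish.
import os
import signal
import threading
import time

def worker():
    time.sleep(60)  # stand-in for a long-running transfer thread

t = threading.Thread(target=worker)  # non-daemon by default
t.start()

os.kill(os.getpid(), signal.SIGTERM)  # process exits here; the thread is not joined
print("not reached")
```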