
Conversation

@d-w-moore
Collaborator

@d-w-moore d-w-moore commented May 19, 2025

Wherein we close down threads in an orderly way, so that nothing is left to be disposed of in the wrong order for the ever-persnickety SSL shutdown logic.

Experiments show that SIGTERM actually does induce the Python interpreter to shut down non-daemonic threads, so installing a signal handler for that may not be necessary in the end.

@d-w-moore d-w-moore changed the title [_722] fix segfault and hung threads on SIGINT during parallel get [#722] fix segfault and hung threads on KeyboardInterrupt during parallel get May 19, 2025
@d-w-moore d-w-moore self-assigned this May 19, 2025
@d-w-moore d-w-moore marked this pull request as draft May 19, 2025 17:09
@d-w-moore
Collaborator Author

After a bit of manual testing, will attempt to make a proper test for SIGINT and SIGTERM to ensure things are left in an ok state.

@d-w-moore
Collaborator Author

d-w-moore commented Jun 5, 2025

A GUI, for example, that maintains background asynchronous parallel transfers using PRC could trap and guard against Ctrl-C as follows:

from signal import signal, SIGINT
from irods.parallel import abort_asynchronous_transfers

signal(SIGINT, lambda *_: exit(0 if abort_asynchronous_transfers() else 1))

Update: abort_asynchronous_transfers has since been renamed. It is now abort_parallel_transfers and may now be used to abort the current (just-interrupted) synchronous transfer as well as all pending background ones. See the README updates in this pull request.
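
For anyone following along, a minimal sketch of the same idea using the renamed function. The handler shape and exit code are illustrative assumptions, not the library's documented API:

from signal import signal, SIGINT, SIGTERM
import sys

from irods.parallel import abort_parallel_transfers

def _on_interrupt(signum, frame):
    # Ask the library to shut down the interrupted synchronous transfer
    # and any pending background ones in an orderly fashion, then leave.
    abort_parallel_transfers()
    sys.exit(1)

signal(SIGINT, _on_interrupt)
signal(SIGTERM, _on_interrupt)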

@d-w-moore d-w-moore force-pushed the segfault_parallel_io_722.m branch from abafff5 to fb36836 on June 6, 2025 13:48
Contributor

@alanking alanking left a comment


Seems reasonable. Just a couple of things in the test.

@korydraughn
Contributor

Looks like we have a conflict.

Seems this PR is close to completion?

@alanking
Contributor

alanking commented Dec 5, 2025

Just checking to see if this PR is still being considered for 3.3.0.

@d-w-moore
Collaborator Author

Just checking to see if this PR is still being considered for 3.3.0.

Will check its currency to see whether the segfault is still a concern. If so, then I think we can consider it for this release.

@d-w-moore d-w-moore force-pushed the segfault_parallel_io_722.m branch from fb36836 to 481952c on December 12, 2025 15:25
@korydraughn
Contributor

What's the status of this PR?

@d-w-moore
Collaborator Author

What's the status of this PR?

I believe it's almost ready. I want to look over it once more.

Contributor

@alanking alanking left a comment


Awaiting signal that this is ready

@d-w-moore
Collaborator Author

Awaiting signal that this is ready

I've added a test (first draft, will run soon) that interrupts a put. We weren't testing that previously.

@d-w-moore
Collaborator Author

Now open to comment ... once more. These changes are final in my mind, as far as the significant changes to library functionality go.

@d-w-moore
Collaborator Author

I can squash the last 6 commits or so, if that helps reviewers.

@korydraughn
Contributor

Yes please.

@alanking
Contributor

Also, take a peek at the 2 failing checks to see if there's anything actionable.

@d-w-moore d-w-moore force-pushed the segfault_parallel_io_722.m branch from 4b0458e to 14037f9 on January 15, 2026 21:23
@d-w-moore
Collaborator Author

d-w-moore commented Jan 15, 2026

Sorry about the delay. (I see some reviews have already been made.) I have just submitted a squash - but no actual changes of any kind since the last note I posted above.

@d-w-moore
Collaborator Author

d-w-moore commented Jan 15, 2026

Still a TODO: Introduce a DataTransferInterruptedException and raise the RuntimeError from it, in the case that we need to signal a put or get didn't complete. We could also, possibly, deprecate catching the RuntimeError in case, at the time of the 4.0.0 release, we want to make it a bare DataTransferInterruptedException. (Name up for suggestions, but that's what I've settled on.)

Going to have to make an issue for this one and handle it later, that is, in 4.0.0, I think. Things are looking pretty complex so far, and I don't want to complicate this release with unforeseen and as-yet too-difficult-to-test problems....
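
For concreteness, one possible reading of that TODO, sketched under assumptions: the exception class does not exist yet, and its name and placement are explicitly still open.

class DataTransferInterruptedException(RuntimeError):
    # Subclassing RuntimeError keeps existing `except RuntimeError:` call sites working;
    # a later major release could drop the RuntimeError base to make it a "bare" exception.
    pass

def _signal_incomplete_transfer(cause=None):
    # `raise ... from ...` preserves the underlying cause for callers that want to inspect it.
    raise DataTransferInterruptedException("data object transfer did not complete.") from cause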

@alanking
Contributor

Awaiting resolution of open review comments.

@d-w-moore
Collaborator Author

Awaiting resolution of open review comments.

Will take care of those later today. For now, putting in the work done so far, though the new tests do not pass.

Multithread programming is evil.

… transfers

We now provide utility for bringing down parallel PUTs and GETs in an orderly way.

The segmentation faults could not be duplicated, although they are more likely in general when
aborting a main process that has spawned daemon threads.  Note that since 3.9 (the version we
now support at a minimum), Python no longer uses daemon threads in support of concurrent.futures.

use subtest.

try to preserve latest synchronous parallel put/get for orderly shutdown in signal handler

can now abort parallel transfers with SIGINT/^C or SIGTERM

some debug still remains.

[_722] update readme for signals and parallel put/get

prevent auto_close

satisfy static typing.

revise README

forward ref needed for mypy?

patch test

more informative error message when retcodes do not match

delete unnecessary "import irods"

Update README.md

Co-authored-by: Alan King <alanking@renci.org>

add a finite timeout

review comments

comments regarding futures returning None

test condition wait of ten minutes is the default, no need to specify in call

catch was a no-op

remove TODO's

[_722] test a data put is sanely interruptable

[squashed multiple commits] tighten up all the quit logic:

finish put test

debug(parallel)

debug(put-test)

behaves better if we add mgr to list sooner?

experimental changes ACTIVE_PATH

paths active

make return values consistent from io_multipart_*()

print debug on abort

almost there?

move statement where transfer_managers is updated

rework abort_transfer fn slightly

handle logic for prematurely shutdown executor

[another_squash] tidy, fix, add put test

add tools.py with shared functions.

make doc string more thorough, for abort_parallel_transfers().

codacy, review

ws

update README on abort_parallel_transfers

resolve and display causes of error as is best possible

currently we just get out cleanly, and make sure the causation is preserved.

Later (v4.0) we may introduce a TransferInterrupted or similar which can be
more useful than RuntimeError for indicating what happened.  (in place of
the RuntimeError("xxx failed.") in irods/manager/data_object_manager.)

whitespace

gettest

cond in handler

alter get test for multiple abort M.O. (exit from hdlr vs catch exc)

ensure multiple calls to quit() are not inefficient (but, in any case they are safe.)

noprint

by default abort_parallel_transfers should not increase reference counts.

extra param dicttype

remove README whitespace

comment,filter fnsf

generalize return code for tests

dictionary_type recognized as a more general parameter, "transform"

review comment (needed explanation for print)

disambiguate f -> future with previous mention.
@d-w-moore d-w-moore force-pushed the segfault_parallel_io_722.m branch from e3b5599 to 2700f86 on January 21, 2026 10:29
Co-authored-by: Kory Draughn <korydraughn@ymail.com>
@d-w-moore
Collaborator Author

This appears ready for review. Should be the final round....

@korydraughn
Contributor

Okay. Please take a look at codacy to see if there's anything worth addressing.

Contributor

@korydraughn korydraughn left a comment


We're very close.

iRODS server versions 4.2.9+ and file sizes larger than a default
threshold value of 32 Megabytes.

Because multithread processes under Unix-type operating systems sometimes
Contributor


Should multithread have an ed appended to it?

Suggested change
Because multithread processes under Unix-type operating systems sometimes
Because multithreaded processes under Unix-type operating systems sometimes
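
For context, a short sketch of a transfer that would exercise this code path. Connection details and paths are placeholders; the 4.2.9+ and 32 MB figures come from the README excerpt quoted above.

import os
from irods.session import iRODSSession

with iRODSSession(host="localhost", port=1247, user="rods",
                  password="rods", zone="tempZone") as session:
    local_path = "/tmp/bigfile.dat"
    with open(local_path, "wb") as f:
        f.write(os.urandom(40 * 1024 * 1024))     # 40 MB, above the default 32 MB threshold
    # Against an iRODS 4.2.9+ server, a data object this large is transferred in parallel.
    session.data_objects.put(local_path, "/tempZone/home/rods/bigfile.dat")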

signal(SIGTERM, handler)

try:
# a multi-1247 put or get can leave non-daemon threads running if not treated with care.
Contributor


Suggested change
# a multi-1247 put or get can leave non-daemon threads running if not treated with care.
# A multi-1247 put or get can leave non-daemon threads running if not treated with care.

non-daemon threads not finishing, which could risk preventing a prompt and
orderly exit from the main program.

When a signal or exception handler calls abort_parallel_transfers(), all
Contributor


Suggested change
When a signal or exception handler calls abort_parallel_transfers(), all
When a signal or exception handler calls `abort_parallel_transfers()`, all

Comment on lines +371 to +372
signal handlers. However, if desired, `abort_parallel_transfers` may bei
terated subsequently with (dry_run=True,...) to track the progress of the
Contributor


Suggested change
signal handlers. However, if desired, `abort_parallel_transfers` may bei
terated subsequently with (dry_run=True,...) to track the progress of the
signal handlers. However, if desired, `abort_parallel_transfers()` may be
iterated subsequently with `(dry_run=True, ...)` to track the progress of the

Please reread your original words to make sure I interpreted them correctly.

signal handlers. However, if desired, `abort_parallel_transfers` may bei
terated subsequently with (dry_run=True,...) to track the progress of the
shutdown. The default object returned (a dictionary whose keys are weak
references to the thread managers) will have a boolean value of False once
Contributor


Do we highlight instances of False in other sections of the README?

Suggested change
references to the thread managers) will have a boolean value of False once
references to the thread managers) will have a boolean value of `False` once
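
To make the dry_run behavior concrete, a hedged polling sketch based solely on the README wording quoted above (the signature and return semantics are those described there, not independently verified):

import time
from irods.parallel import abort_parallel_transfers

def wait_for_shutdown(timeout=60.0, poll_interval=0.25):
    abort_parallel_transfers()                     # request shutdown of all tracked transfers
    deadline = time.monotonic() + timeout
    # dry_run=True only reports the remaining thread managers; the returned
    # mapping evaluates as False once everything has shut down.
    while abort_parallel_transfers(dry_run=True):
        if time.monotonic() > deadline:
            return False
        time.sleep(poll_interval)
    return True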


# Wait for download process to reach the point of spawning data transfer threads. In Python 3.9+ versions
# of the concurrent.futures module, these are nondaemon threads and will block the exit of the main thread
# unless measures are taken (#722).
Contributor


What is #722 referring to? Is it a TODO item or something else?

from the test function.
"""
start_time = time.clock_gettime_ns(time.CLOCK_BOOTTIME)
while not (truth_value := function()):
Contributor


Consider renaming function to callback so it's slightly more clear that it is something that is passed in.
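
For illustration, a self-contained version of such a helper with the suggested rename applied (a reconstruction, not the PR's actual test code):

import time

def wait_until(callback, timeout_seconds=600, poll_interval=0.1):
    # Poll `callback` until it returns a truthy value or the timeout elapses.
    # CLOCK_BOOTTIME matches the clock used in the excerpt above and is Linux-specific.
    start_time = time.clock_gettime_ns(time.CLOCK_BOOTTIME)
    timeout_ns = int(timeout_seconds * 1e9)
    while not (truth_value := callback()):
        if time.clock_gettime_ns(time.CLOCK_BOOTTIME) - start_time > timeout_ns:
            break
        time.sleep(poll_interval)
    return truth_value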
