Skip to content

Conversation

@CodeZHXS
Copy link
Contributor

  • Bugfix: Use rt_tgsigqueueinfo to send the signal to correct worker thread

What problem does this PR solve?

Issue Number: fix #3038

Problem Summary:

What is changed and the side effects?

Changed:

Side effects:

  • Performance effects:

  • Breaking backward compatibility:


Check List:

@CodeZHXS CodeZHXS force-pushed the fix_signal_trace_syscall branch 2 times, most recently from 4abae57 to 8a48105 Compare July 24, 2025 13:20
@wwbmmm wwbmmm requested review from chenBright and Copilot July 25, 2025 11:34

This comment was marked as outdated.

@chenBright
Copy link
Contributor

chenBright commented Jul 26, 2025

https://man7.org/linux/man-pages/man2/sigqueue.2.html

On Linux, sigqueue() is implemented using the rt_sigqueueinfo(2) system call.

@chenBright
Copy link
Contributor

Perhaps we can use pthread_sigqueue which has better compatibility.

https://man7.org/linux/man-pages/man3/pthread_sigqueue.3.html

The pthread_sigqueue() function performs a similar task to
sigqueue(3), but, rather than sending a signal to a process, it
sends a signal to a thread in the same process as the calling thread.

@CodeZHXS
Copy link
Contributor Author

CodeZHXS commented Jul 26, 2025

Perhaps we can use pthread_sigqueue which has better compatibility.

https://man7.org/linux/man-pages/man3/pthread_sigqueue.3.html

The pthread_sigqueue() function performs a similar task to
sigqueue(3), but, rather than sending a signal to a process, it
sends a signal to a thread in the same process as the calling thread.

It's okay. So I need to change the type of worker_tid from pid_t to pthread_t, right?

@chenBright
Copy link
Contributor

So I need to change the type of worker_tid from pid_t to pthread_t, right?

Yes.

@CodeZHXS CodeZHXS force-pushed the fix_signal_trace_syscall branch 3 times, most recently from 4ad9917 to eecd82a Compare July 26, 2025 08:11
@CodeZHXS
Copy link
Contributor Author

@chenBright Please review again.

@CodeZHXS CodeZHXS requested a review from Copilot July 26, 2025 08:16
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR fixes a signal handling bug in the Signal Trace mode where SIGURG was being sent to the wrong thread. The core issue was using process IDs (pid_t) instead of pthread IDs (pthread_t) for thread identification.

Key changes:

  • Changed thread identification from pid_t to pthread_t throughout the codebase
  • Replaced sigqueue() with pthread_sigqueue() for proper thread-specific signal delivery
  • Updated initialization values from -1 to 0 for pthread compatibility

Reviewed Changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
src/bthread/task_tracer.h Updated function signatures and member variables to use pthread_t
src/bthread/task_tracer.cpp Implemented thread safety check and switched to pthread_sigqueue()
src/bthread/task_meta.h Changed worker thread ID type from pid_t to pthread_t
src/bthread/task_group.h Updated TaskGroup thread ID member and accessor method
src/bthread/task_group.cpp Updated local variable type for consistency
src/bthread/task_control.cpp Changed thread ID assignment from syscall(SYS_gettid) to pthread_self()

// Therefore, use async-signal-safe absl::DefaultStackUnwinder instead of libunwind.

if (worker_tid == pthread_self()) {
return Result::MakeErrorResult("Forbid to trace self=%d", worker_tid);
Copy link

Copilot AI Jul 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The format specifier %d is incorrect for pthread_t. pthread_t is an opaque type that may not be an integer on all platforms. Consider using %p to format it as a pointer or convert it appropriately for display.

Suggested change
return Result::MakeErrorResult("Forbid to trace self=%d", worker_tid);
return Result::MakeErrorResult("Forbid to trace self=%p", (void*)worker_tid);

Copilot uses AI. Check for mistakes.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On other platforms, pthread_t is not an integer, but a pointer.

// Remove reference for SignalHandler.
signal_sync->RemoveRefManually();
return Result::MakeErrorResult("Fail to sigqueue: %s, tid: %d", berror(), tid);
return Result::MakeErrorResult("Fail to pthread_sigqueue: %s, tid: %d", berror(), worker_tid);
Copy link

Copilot AI Jul 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The format specifier %d is incorrect for pthread_t. pthread_t is an opaque type that may not be an integer on all platforms. Consider using %p to format it as a pointer or convert it appropriately for display.

Suggested change
return Result::MakeErrorResult("Fail to pthread_sigqueue: %s, tid: %d", berror(), worker_tid);
return Result::MakeErrorResult("Fail to pthread_sigqueue: %s, tid: %p", berror(), (void*)worker_tid);

Copilot uses AI. Check for mistakes.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On other platforms, pthread_t is not an integer, but a pointer.

@CodeZHXS CodeZHXS force-pushed the fix_signal_trace_syscall branch from eecd82a to ef2d16e Compare July 26, 2025 08:23
// Therefore, use async-signal-safe absl::DefaultStackUnwinder instead of libunwind.

if (worker_tid == pthread_self()) {
return Result::MakeErrorResult("Forbid to trace self=%d", worker_tid);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On other platforms, pthread_t is not an integer, but a pointer.

// Remove reference for SignalHandler.
signal_sync->RemoveRefManually();
return Result::MakeErrorResult("Fail to sigqueue: %s, tid: %d", berror(), tid);
return Result::MakeErrorResult("Fail to pthread_sigqueue: %s, tid: %d", berror(), worker_tid);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On other platforms, pthread_t is not an integer, but a pointer.

@CodeZHXS CodeZHXS force-pushed the fix_signal_trace_syscall branch 2 times, most recently from e002e3a to 825625b Compare July 28, 2025 14:58
* Bugfix: Use `pthread_sigqueue` to send the signal to correct worker thread'

* Change the type of `TaskGroup::_tid` to `pthread_t`
@CodeZHXS CodeZHXS force-pushed the fix_signal_trace_syscall branch from 825625b to 5fe1a06 Compare July 28, 2025 15:01
@CodeZHXS
Copy link
Contributor Author

@chenBright ping

Copy link
Contributor

@chenBright chenBright left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@chenBright chenBright merged commit b6ae7bf into apache:master Aug 26, 2025
15 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Signal Trace mode may send SIGURG to wrong thread

2 participants