io_uring example hangs on first call to sync_wait #1949

@catfinated

Description

When compiled with either gcc 15.2.1 or clang 22.1.1 on kernel 6.19.7-1-cachyos, the example.io_uring executable hangs.

Steps to reproduce:

  1. cmake -S . -B build -DSTDEXEC_ENABLE_IO_URING=ON -GNinja
  2. cmake --build build
  3. build/examples/example.io_uring

TLDR

The initialization of __m in __umin is incorrect here: https://github.com/NVIDIA/stdexec/blob/main/include/stdexec/__detail/__utility.hpp#L129

Since std::size_t is unsigned and __m starts at 0, the check __i < __m on line 132 can never be true, so __m is never updated and the function always returns 0.
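To make the failure mode concrete, here is a minimal standalone sketch of the two variants. The names umin_buggy and umin_fixed are hypothetical stand-ins for __umin, and the loop body (assigning the smaller value to the running minimum) is an assumption based on the function's apparent intent, since the diff below only shows the code up to the comparison.

```cpp
#include <cstddef>
#include <initializer_list>
#include <limits>

// Hypothetical stand-in for the current __umin: the running minimum starts at 0.
constexpr std::size_t umin_buggy(std::initializer_list<std::size_t> il) noexcept
{
  std::size_t m = 0;
  for (std::size_t i : il)
  {
    if (i < m)  // never true: no unsigned value is smaller than 0
    {
      m = i;
    }
  }
  return m;
}

// Hypothetical stand-in for the proposed fix: start from the largest value.
constexpr std::size_t umin_fixed(std::initializer_list<std::size_t> il) noexcept
{
  std::size_t m = std::numeric_limits<std::size_t>::max();
  for (std::size_t i : il)
  {
    if (i < m)
    {
      m = i;
    }
  }
  return m;
}

// Compile-time checks: the buggy variant ignores its input entirely.
static_assert(umin_buggy({1024, 7}) == 0);
static_assert(umin_fixed({1024, 7}) == 7);
```

One consequence of the fixed variant in this sketch is that an empty list yields the maximum std::size_t value rather than 0; whether that matters depends on how callers use the function.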

This function appears to be used in only one spot (detailed below), which likely explains why this issue has gone unnoticed. The following diff fixes the issue:

❯ git diff include/stdexec/__detail/__utility.hpp
diff --git a/include/stdexec/__detail/__utility.hpp b/include/stdexec/__detail/__utility.hpp
index f1937319..66d58ff0 100644
--- a/include/stdexec/__detail/__utility.hpp
+++ b/include/stdexec/__detail/__utility.hpp
@@ -22,6 +22,7 @@
 #include <cstdarg>
 #include <cstdio>
 #include <initializer_list>
+#include <limits>
 #include <memory>   // IWYU pragma: keep for std::start_lifetime_as
 #include <new>      // IWYU pragma: keep for std::launder
 #include <utility>  // IWYU pragma: keep for std::unreachable
@@ -126,7 +127,7 @@ namespace STDEXEC
 
   inline constexpr auto __umin(std::initializer_list<std::size_t> __il) noexcept -> std::size_t
   {
-    std::size_t __m = 0;
+    std::size_t __m = std::numeric_limits<std::size_t>::max();
     for (std::size_t __i: __il)
     {
       if (__i < __m)

Details

I understand io_uring support is experimental, but I decided to debug a bit. The hang happens on the first call to stdexec::sync_wait here: https://github.com/NVIDIA/stdexec/blob/main/examples/io_uring.cpp#L40. A simple strace shows the process hanging on a call to futex. It also shows that both contexts have their rings set up with io_uring_setup, but io_uring_enter is never called, indicating that no operations are being executed by either context.

❯ strace build/examples/example.io_uring 2>&1 |grep io_uring
execve("build/examples/example.io_uring", ["build/examples/example.io_uring"], 0x7ffe251ad100 /* 76 vars */) = 0
io_uring_setup(1024, {flags=0, sq_thread_cpu=0, sq_thread_idle=0, sq_entries=1024, cq_entries=2048, features=IORING_FEAT_SINGLE_MMAP|IORING_FEAT_NODROP|IORING_FEAT_SUBMIT_STABLE|IORING_FEAT_RW_CUR_POS|IORING_FEAT_CUR_PERSONALITY|IORING_FEAT_FAST_POLL|IORING_FEAT_POLL_32BITS|IORING_FEAT_SQPOLL_NONFIXED|IORING_FEAT_EXT_ARG|IORING_FEAT_NATIVE_WORKERS|IORING_FEAT_RSRC_TAGS|IORING_FEAT_CQE_SKIP|IORING_FEAT_LINKED_FILE|IORING_FEAT_REG_REG_RING|IORING_FEAT_RECVSEND_BUNDLE|IORING_FEAT_MIN_TIMEOUT|IORING_FEAT_RW_ATTR|IORING_FEAT_NO_IOWAIT, sq_off={head=0, tail=4, ring_mask=16, ring_entries=24, flags=36, dropped=32, array=32832, user_addr=0}, cq_off={head=8, tail=12, ring_mask=20, ring_entries=28, overflow=44, cqes=64, flags=40, user_addr=0}}) = 3
io_uring_setup(1024, {flags=0, sq_thread_cpu=0, sq_thread_idle=0, sq_entries=1024, cq_entries=2048, features=IORING_FEAT_SINGLE_MMAP|IORING_FEAT_NODROP|IORING_FEAT_SUBMIT_STABLE|IORING_FEAT_RW_CUR_POS|IORING_FEAT_CUR_PERSONALITY|IORING_FEAT_FAST_POLL|IORING_FEAT_POLL_32BITS|IORING_FEAT_SQPOLL_NONFIXED|IORING_FEAT_EXT_ARG|IORING_FEAT_NATIVE_WORKERS|IORING_FEAT_RSRC_TAGS|IORING_FEAT_CQE_SKIP|IORING_FEAT_LINKED_FILE|IORING_FEAT_REG_REG_RING|IORING_FEAT_RECVSEND_BUNDLE|IORING_FEAT_MIN_TIMEOUT|IORING_FEAT_RW_ATTR|IORING_FEAT_NO_IOWAIT, sq_off={head=0, tail=4, ring_mask=16, ring_entries=24, flags=36, dropped=32, array=32832, user_addr=0}, cq_off={head=8, tail=12, ring_mask=20, ring_entries=28, overflow=44, cqes=64, flags=40, user_addr=0}}) = 5

I added some local debug logging to run_until_stopped() and saw it was breaking out of the main loop here: https://github.com/NVIDIA/stdexec/blob/main/include/exec/linux/io_uring_context.hpp#L576 before any operations were submitted to the kernel. This led me to run_some(), where __n_total_submitted is updated from the result of __submission_queue::submit. There, __max_submissions gets capped to 0 through the use of __umin here: https://github.com/NVIDIA/stdexec/blob/main/include/exec/linux/io_uring_context.hpp#L243. So even though the initial wakeup operation, which keeps the main loop waiting for work, is started here: https://github.com/NVIDIA/stdexec/blob/main/include/exec/linux/io_uring_context.hpp#L562, it is never submitted to the kernel. The run loop then breaks out before any of the timer-based operations created by exec::schedule_after inside main() can be submitted either.
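The chain of events can be illustrated with a deliberately simplified model. Everything here is hypothetical: submit_model and loop_exits_early are invented names, and the exit condition is an assumption that the loop stops once nothing has been submitted, not the actual condition in io_uring_context.hpp.

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical model of the capping step in __submission_queue::submit:
// at most `cap` of the pending operations are handed to the kernel.
constexpr std::size_t submit_model(std::size_t pending, std::size_t cap) noexcept
{
  return std::min(pending, cap);
}

// Hypothetical model of the run loop's exit check: with nothing submitted,
// there is nothing to wait for, so the loop stops.
constexpr bool loop_exits_early(std::size_t total_submitted) noexcept
{
  return total_submitted == 0;
}

// With the cap stuck at 0, the pending wakeup operation is never submitted
// and the loop exits immediately.
static_assert(loop_exits_early(submit_model(1, 0)));

// With a sane cap, the wakeup operation is submitted and the loop keeps going.
static_assert(!loop_exits_early(submit_model(1, 1024)));
```

In this model, a cap of 0 is enough on its own to starve the loop, which matches the strace output above where io_uring_enter never appears.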

Note that the unit tests also appear to have a similar hang which is addressed by the above fix. There is also a namespace ambiguity issue in those tests that needs to be addressed on newer compilers:

❯ git diff test/exec/test_io_uring_context.cpp
diff --git a/test/exec/test_io_uring_context.cpp b/test/exec/test_io_uring_context.cpp
index 27e75267..dc931ea5 100644
--- a/test/exec/test_io_uring_context.cpp
+++ b/test/exec/test_io_uring_context.cpp
@@ -26,6 +26,7 @@
 #  include "exec/scope.hpp"
 #  include "exec/single_thread_context.hpp"
 #  include "exec/when_any.hpp"
+#  include "exec/start_detached.hpp"
 
 #  include "catch2/catch.hpp"
 
@@ -119,7 +120,7 @@ namespace
     io_uring_context   context;
     io_uring_scheduler scheduler = context.get_scheduler();
     bool               is_called = false;
-    start_detached(schedule(scheduler)
+    exec::start_detached(schedule(scheduler)
                    | then(
                      [&]
                      {
@@ -206,7 +207,7 @@ namespace
     io_uring_scheduler scheduler = context.get_scheduler();
     {
       bool is_called = false;
-      start_detached(schedule(scheduler)
+      exec::start_detached(schedule(scheduler)
                      | then(
                        [&]
                        {
