fix: bound timeouts when submit_timeout is set #665

Merged: DMcGee-SAS merged 2 commits into sassoftware:main from dmsenter89:fix-submit-delay on Apr 24, 2026
@dmsenter89
Contributor

What Problem does this PR address?

submit_timeout from #664 works correctly when SAS crashes outright (segfault): the
compute service detects the crash and remains responsive, so the cancel request
completes quickly and SASsubmitTimeout is raised as expected.

When SAS enters an infinite loop instead of crashing, the compute service stays up
but becomes CPU-starved. Two timeout=None sockets cause indefinite hangs:

  1. Status poll: conn.getresponse() for the ?wait=N long-poll blocks until the
    server responds, which may take far longer than N seconds. The deadline check
    never gets a chance to run.
  2. Cancel request: even after the deadline fires, conn.connect() for the
    best-effort PUT cancel blocks indefinitely, so SASsubmitTimeout is never raised.
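
The hang described above can be reproduced outside of saspy. The following is a minimal sketch, not the saspy code: it assumes a local listener that never answers (standing in for a CPU-starved compute service), and the request path `/state?wait=1` and the helper names are hypothetical. It shows that a bounded socket timeout turns an indefinite `getresponse()` block into a catchable exception.

```python
import http.client
import socket


def make_silent_listener():
    """Open a local listening socket that never accepts or replies.

    The kernel completes the TCP handshake for a listening socket, so a
    client can connect and send its request, but no response ever arrives.
    """
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    srv.listen(1)
    return srv


def poll_once(port, timeout):
    """Issue one request and report whether the read timed out."""
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=timeout)
    try:
        conn.request("GET", "/state?wait=1")  # hypothetical long-poll path
        conn.getresponse()  # with timeout=None this would block forever
        return "responded"
    except socket.timeout:  # alias of TimeoutError on Python 3.10+
        return "timed out"
    finally:
        conn.close()


if __name__ == "__main__":
    listener = make_silent_listener()
    try:
        print(poll_once(listener.getsockname()[1], timeout=0.5))
    finally:
        listener.close()
```

With `timeout=None` the same call would never return, which is the behavior this PR removes when submit_timeout is set.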

How this PR Addresses the Problem

Two targeted additions to the submit() polling loop in sasiohttp.py:

  1. Before the loop, when submit_timeout is set, conn.timeout is set to
    delay + 10 seconds. A TimeoutError from a slow server is caught, the
    connection is recycled, and the loop continues, so the deadline check at the
    top can fire.
  2. In the cancel block, conn.timeout is temporarily set to 5 seconds before
    conn.connect() and restored in a finally block, so the cancel attempt always
    gives up promptly.

Behavior is unchanged when submit_timeout is not passed.
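
The shape of the two additions can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in sasiohttp.py: `wait_for_job`, `SubmitTimeout`, and the `poll` and `cancel` callables are hypothetical stand-ins, and `conn` is any object with a `timeout` attribute plus `connect()` and `close()`, such as an `http.client.HTTPConnection`.

```python
import time


class SubmitTimeout(Exception):
    """Hypothetical stand-in for saspy's SASsubmitTimeout."""


def wait_for_job(conn, poll, cancel, submit_timeout, delay=1):
    """Poll until poll(conn) returns True, bounded by submit_timeout seconds.

    poll   -- callable performing one long-poll; may raise TimeoutError
    cancel -- callable performing the best-effort cancel request
    """
    deadline = time.monotonic() + submit_timeout
    conn.timeout = delay + 10  # addition 1: bound every long-poll read

    while True:
        # Deadline check at the top of the loop.
        if time.monotonic() >= deadline:
            saved = conn.timeout
            conn.timeout = 5  # addition 2: the cancel must give up promptly
            try:
                conn.connect()
                cancel(conn)
            except OSError:
                pass  # best effort; TimeoutError is a subclass of OSError
            finally:
                conn.timeout = saved
                conn.close()
            raise SubmitTimeout(f"no result after {submit_timeout} seconds")

        try:
            if poll(conn):
                return
        except TimeoutError:
            # Slow server: drop the connection and loop back to the
            # deadline check. http.client reconnects on the next request.
            conn.close()
```

The key design point is that neither blocking call can outlive its own socket timeout, so control always returns to the deadline check.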

Relation to #664

submit_timeout, SASsubmitTimeout, and the deadline logic from #664 are untouched.
This PR makes them work for the full range of failure modes, not just server crashes.

Michael Senter and others added 2 commits April 23, 2026 18:15
When a SAS job causes an infinite loop, the compute service can become
CPU-starved and unable to respond to HTTP requests promptly.  Two places
in the submit() polling loop used timeout=None, allowing indefinite hangs:

1. Status poll (conn.getresponse() for the ?wait=N long-poll): if the
   server is too slow to respond, the deadline check never fires.  Fixed
   by setting conn.timeout = delay + 10 before the polling loop when
   submit_timeout is active.  A TimeoutError from a slow server is caught
   and the loop continues, where the deadline check at the top will fire.

2. Cancel-job PUT (conn.connect() after the deadline fires): if the server
   is still CPU-starved, the cancel request blocks indefinitely and
   SASsubmitTimeout is never raised.  Fixed by temporarily setting
   conn.timeout = 5 for the cancel attempt, restoring it in a finally block.

Together these ensure SASsubmitTimeout is raised within approximately
submit_timeout + delay + 10 + 5 seconds worst-case, regardless of
whether the server crashes (segfault) or enters an infinite loop.
Behavior is unchanged when submit_timeout is not passed.
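
The worst-case figure follows from a poll that starts just before the deadline and runs to its full socket timeout, followed by a cancel attempt that also runs to its timeout. A small sketch of that arithmetic, where the function name and the sample values are illustrative assumptions:

```python
POLL_SLACK = 10      # seconds added to delay for the poll socket timeout
CANCEL_TIMEOUT = 5   # seconds allowed for the best-effort cancel


def worst_case_seconds(submit_timeout, delay):
    """Approximate upper bound on the time until the timeout is raised."""
    return submit_timeout + (delay + POLL_SLACK) + CANCEL_TIMEOUT


print(worst_case_seconds(60, 2))  # 77
```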

Signed-off-by: Michael Senter, PhD <dmsenter89@users.noreply.github.com>
Contributor

DMcGee-SAS left a comment

Looks good!

DMcGee-SAS marked this pull request as ready for review on April 24, 2026 at 12:53
DMcGee-SAS merged commit 66597e3 into sassoftware:main on Apr 24, 2026
3 checks passed