-
Notifications
You must be signed in to change notification settings - Fork 3.2k
absolute timeout fix #42652
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
absolute timeout fix #42652
Conversation
Co-authored-by: Copilot <[email protected]>
….com/Azure/azure-sdk-for-python into users/dibahl/client_reset_connection
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR fixes a critical bug in the Cosmos DB SDK's retry utility where absolute timeout checks were not consistently enforced across all exception types. The timeout validation was previously only checked for CosmosHttpResponseError and at the end of the exception handling block, allowing timeouts to be exceeded when other retry policies raised exceptions that exited the function early.
- Moves timeout validation to the beginning of exception handling for all exception types
- Refactors exception handling from separate
exceptblocks to a single unified block withisinstancechecks - Updates test files to properly test timeout behavior across different error scenarios
Reviewed Changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
sdk/cosmos/azure-cosmos/azure/cosmos/_retry_utility.py |
Moves timeout check to beginning of unified exception block for consistent enforcement |
sdk/cosmos/azure-cosmos/azure/cosmos/aio/_retry_utility_async.py |
Same timeout fix applied to async version of retry utility |
sdk/cosmos/azure-cosmos/tests/test_crud.py |
Updates timeout tests with better error scenario coverage and improved transport mocking |
sdk/cosmos/azure-cosmos/tests/test_crud_async.py |
Async version of updated timeout tests with transport changes from requests to aiohttp |
|
/azp run python - cosmos - tests |
|
Azure Pipelines successfully started running 1 pipeline(s). |
sdk/cosmos/azure-cosmos/azure/cosmos/_execution_context/aio/base_execution_context.py
Outdated
Show resolved
Hide resolved
sdk/cosmos/azure-cosmos/azure/cosmos/_execution_context/aio/base_execution_context.py
Outdated
Show resolved
Hide resolved
sdk/cosmos/azure-cosmos/azure/cosmos/aio/_cosmos_client_connection_async.py
Outdated
Show resolved
Hide resolved
simorenoh
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall LGTM just a couple questions. Also, I didn't see the TimeoutScope being used anywhere other than for read_items, is that intentional? and does the Page scope get used anywhere, maybe I missed it?
no Page is the default and since read_items is for logical operation I set it to Operation scope. In the query iterator we check for Operation scope and if its not we reset the time window. Page scope is just for clarity but we might use it in the future. |
|
/azp run python - cosmos - tests |
|
Azure Pipelines successfully started running 1 pipeline(s). |
|
/azp run python - cosmos - tests |
|
Azure Pipelines successfully started running 1 pipeline(s). |
|
/azp run python - cosmos - tests |
|
Azure Pipelines successfully started running 1 pipeline(s). |
simorenoh
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM - thanks Dikshi!
|
/azp run python - cosmos - tests |
|
Azure Pipelines successfully started running 1 pipeline(s). |
The retry utility class which keeps track of absolute timeout has a bug where the timeout check was not enforced for all exception types and for CosmosHttpResponseError it was done at the end of the block which was causing issues as some retry policies were raising exceptions that exit the function before reaching this timeout check. This meant timeout could be exceeded without ever being checked.
This PR intends to fix this by doing the timeout check at the beginning for all exceptions. It helps to ensure timeout is always checked regardless of which retry policy is used or what decision it makes, and this behavior is consistent across all exceptions.
This change makes the timeout a hard limit that takes precedence over retry policies, which is the expected behavior for a client timeout setting.