@dibahlfi
Member

@dibahlfi dibahlfi commented Aug 22, 2025

The retry utility class that tracks the absolute timeout has a bug: the timeout check was not enforced for all exception types. For CosmosHttpResponseError, the check ran at the end of the exception block, so retry policies that raised an exception exited the function before the check was reached. As a result, the timeout could be exceeded without ever being checked.
This PR fixes that by running the timeout check at the beginning of exception handling, for all exceptions. The timeout is now always checked, regardless of which retry policy is used or what decision it makes, and the behavior is consistent across all exception types.
This change makes the timeout a hard limit that takes precedence over retry policies, which is the expected behavior for a client timeout setting.
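
To illustrate the ordering problem described above, here is a minimal sketch. It is not the SDK's code: `run_check_last`, `run_check_first`, `strict_policy`, `TimeoutExceeded`, and `PolicyGaveUp` are hypothetical names, and the `clock` parameter is an assumption added so the behavior can be exercised without real waiting.

```python
import time


class TimeoutExceeded(Exception):
    """Hypothetical error for an exhausted absolute timeout."""


class PolicyGaveUp(Exception):
    """Hypothetical error a retry policy raises when it stops retrying."""


def strict_policy(error):
    # Stands in for a retry policy that raises instead of returning a decision.
    raise PolicyGaveUp("policy refused to retry") from error


def run_check_last(func, timeout, policy, clock=time.monotonic):
    """Old shape: the timeout check sits after the policy call."""
    deadline = clock() + timeout
    while True:
        try:
            return func()
        except Exception as error:
            policy(error)  # may raise, which skips the check below
            if clock() >= deadline:
                raise TimeoutExceeded("absolute timeout exceeded") from error


def run_check_first(func, timeout, policy, clock=time.monotonic):
    """Fixed shape: the timeout check runs before anything else."""
    deadline = clock() + timeout
    while True:
        try:
            return func()
        except Exception as error:
            # Checked for every exception type, before the policy is consulted.
            if clock() >= deadline:
                raise TimeoutExceeded("absolute timeout exceeded") from error
            policy(error)
```

With a policy that raises, `run_check_last` never reaches its timeout check, while `run_check_first` surfaces the timeout as soon as the deadline has passed.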

Copilot AI review requested due to automatic review settings August 22, 2025 17:56
@dibahlfi dibahlfi requested a review from a team as a code owner August 22, 2025 17:56
Contributor

Copilot AI left a comment

Pull Request Overview

This PR fixes a critical bug in the Cosmos DB SDK's retry utility where absolute timeout checks were not consistently enforced across all exception types. Previously, the timeout was validated only for CosmosHttpResponseError, and only at the end of the exception handling block, so it could be exceeded when retry policies raised exceptions that exited the function early.

  • Moves timeout validation to the beginning of exception handling for all exception types
  • Refactors exception handling from separate except blocks to a single unified block with isinstance checks
  • Updates test files to properly test timeout behavior across different error scenarios
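
The first two points can be sketched as a single `except` block that checks the deadline first and then branches with `isinstance`. This is only an illustration under stated assumptions: `HttpResponseError`, `ServiceRequestError`, `TimeoutExceeded`, `RETRYABLE_STATUS`, and `execute` are hypothetical stand-ins, not the names or logic in `_retry_utility.py`.

```python
import time


class HttpResponseError(Exception):
    """Hypothetical stand-in for an HTTP-level failure with a status code."""

    def __init__(self, status_code):
        super().__init__(f"HTTP {status_code}")
        self.status_code = status_code


class ServiceRequestError(Exception):
    """Hypothetical stand-in for a connection-level failure."""


class TimeoutExceeded(Exception):
    """Hypothetical error for an exhausted absolute timeout."""


RETRYABLE_STATUS = {429, 503}  # assumed set, for illustration only


def execute(func, timeout, max_attempts=3, clock=time.monotonic):
    """Run func with retries, treating the timeout as a hard limit."""
    deadline = clock() + timeout
    attempts = 0
    while True:
        try:
            return func()
        except (HttpResponseError, ServiceRequestError) as error:
            # Shared first step: the deadline applies to every exception type.
            if clock() >= deadline:
                raise TimeoutExceeded("absolute timeout exceeded") from error
            attempts += 1
            # Type-specific handling follows via isinstance checks.
            if isinstance(error, HttpResponseError):
                retry = error.status_code in RETRYABLE_STATUS
            else:
                retry = True  # connection failures are retried in this sketch
            if not retry or attempts >= max_attempts:
                raise
```

Keeping one block means the timeout check is written once and cannot be skipped by a branch added later for a new exception type.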

Reviewed Changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| sdk/cosmos/azure-cosmos/azure/cosmos/_retry_utility.py | Moves timeout check to beginning of unified exception block for consistent enforcement |
| sdk/cosmos/azure-cosmos/azure/cosmos/aio/_retry_utility_async.py | Same timeout fix applied to async version of retry utility |
| sdk/cosmos/azure-cosmos/tests/test_crud.py | Updates timeout tests with better error scenario coverage and improved transport mocking |
| sdk/cosmos/azure-cosmos/tests/test_crud_async.py | Async version of updated timeout tests with transport changes from requests to aiohttp |

@dibahlfi
Member Author

/azp run python - cosmos - tests

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

Member

@simorenoh simorenoh left a comment

Overall LGTM just a couple questions. Also, I didn't see the TimeoutScope being used anywhere other than for read_items, is that intentional? and does the Page scope get used anywhere, maybe I missed it?

@dibahlfi
Member Author

Overall LGTM just a couple questions. Also, I didn't see the TimeoutScope being used anywhere other than for read_items, is that intentional? and does the Page scope get used anywhere, maybe I missed it?

No, Page is the default, and since read_items is a logical operation I set it to Operation scope. In the query iterator we check for Operation scope, and if it's not set we reset the time window. Page scope is just for clarity, but we might use it in the future.
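
A rough sketch of the scope behavior described in this reply, assuming a simple page iterator. Only the names `TimeoutScope`, Page, and Operation come from the discussion; the enum values, `iterate_pages`, `fetch_page`, `TimeoutExceeded`, and the `clock` parameter are hypothetical and do not reflect the SDK's actual implementation.

```python
import enum
import time


class TimeoutScope(enum.Enum):
    PAGE = "page"            # default: each page fetch gets a fresh window
    OPERATION = "operation"  # one window shared by the whole logical operation


class TimeoutExceeded(Exception):
    """Hypothetical error for an exhausted timeout window."""


def iterate_pages(fetch_page, page_count, timeout, scope=TimeoutScope.PAGE,
                  clock=time.monotonic):
    """Yield pages, enforcing the timeout per page or per operation."""
    start = clock()
    for index in range(page_count):
        if scope is not TimeoutScope.OPERATION:
            start = clock()  # reset the time window for every page
        page = fetch_page(index)
        if clock() - start > timeout:
            raise TimeoutExceeded(f"timeout exceeded on page {index}")
        yield page
```

Under Operation scope the elapsed time accumulates across pages, so a multi-page call such as read_items is bounded as a whole rather than page by page.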

Member

@simorenoh simorenoh left a comment

LGTM - thanks Dikshi!

@dibahlfi dibahlfi merged commit 9f7a2fb into main Nov 25, 2025
37 checks passed
@dibahlfi dibahlfi deleted the users/dibahl/absolute_timeout_fix branch November 25, 2025 02:24
msyyc pushed a commit that referenced this pull request Nov 25, 2025
simorenoh pushed a commit that referenced this pull request Dec 2, 2025