@ezelkow1
Member

This adds some fixes around fail_action 5. First, it prevents a possible looping scenario. Previously, a request that had failed to get the write lock and then exhausted its retries waiting for a read lock would loop back around and check for the write lock again. Now it just goes to origin. While this allows an extra request through, it prevents some odd scenarios, including one where the request writes back to the cache again anyway, which would already be useless. Also, if the original rww request is very slow, this allows waiters to bypass it.
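The control-flow change described above can be sketched roughly as follows. This is a simplified, hypothetical model and not the code in HttpTransact.cc: the names `NextAction`, `LookupState`, `on_read_miss_old`, and `on_read_miss_new` are invented for illustration, and the real state machine tracks far more state.

```cpp
#include <cassert>

// Hypothetical, simplified model of the fail_action 5 decision on a cache
// read miss. None of these names come from Traffic Server.
enum class NextAction { TryWriteLock, RetryRead, GoToOrigin };

struct LookupState {
  bool write_lock_failed; // an earlier write-lock attempt lost to another request
  int  read_retries;      // read retries used so far
  int  max_read_retries;  // retry budget (assumed to come from configuration)
};

// Old behavior as described in this PR: once read retries run out,
// loop back and try for the write lock again.
inline NextAction on_read_miss_old(const LookupState &s)
{
  if (!s.write_lock_failed) {
    return NextAction::TryWriteLock;
  }
  if (s.read_retries < s.max_read_retries) {
    return NextAction::RetryRead;
  }
  return NextAction::TryWriteLock; // possible loop: write fail -> read retry -> write ...
}

// New behavior: once read retries run out, bypass the cache and go to origin.
inline NextAction on_read_miss_new(const LookupState &s)
{
  if (!s.write_lock_failed) {
    return NextAction::TryWriteLock;
  }
  if (s.read_retries < s.max_read_retries) {
    return NextAction::RetryRead;
  }
  return NextAction::GoToOrigin; // costs one extra origin request, but cannot loop
}
```

In this model the two functions differ only in the final return, which is the point where waiters stuck behind a slow writer are now released to origin.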

Second, there could be an issue around redirects. Normally a redirect is fine since it has a different cache key, but if use_orig is enabled we could end up in the same looping contention in that scenario as well.
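A minimal sketch of why the redirect case matters, under stated assumptions: the `Request` struct and both helper functions below are hypothetical and do not mirror Traffic Server's types. The sketch assumes the defensive check amounts to "a request already in the read-retry state does not prepare another cache write", which is an interpretation of the description rather than a copy of the actual change.

```cpp
#include <cassert>
#include <string>

// Hypothetical model; names are illustrative only, not Traffic Server APIs.
struct Request {
  std::string original_url;  // URL the client first asked for
  std::string redirect_url;  // redirect target being followed (empty if none)
  bool        in_read_retry; // already fell back after a failed write lock
};

// Which cache key a followed redirect is looked up under. With the
// "use original key" option on, the redirect shares the original key and
// therefore contends for the same cache object as the first lookup.
inline std::string cache_key_for(const Request &r, bool use_orig_key)
{
  if (r.redirect_url.empty() || use_orig_key) {
    return r.original_url;
  }
  return r.redirect_url;
}

// Assumed shape of the defensive check: skip the write-lock attempt when the
// request is already in read-retry, so a same-key redirect cannot re-enter
// the write-fail / read-retry cycle.
inline bool should_prepare_cache_write(const Request &r)
{
  return !r.in_read_retry;
}
```

With a distinct key the redirect targets a different cache object, so it is only the shared-key configuration that needs the guard.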

This also adds some autests, though we should keep an eye on them since they could be timing dependent. If we start getting failures, we can turn them off.

@ezelkow1 ezelkow1 self-assigned this Jan 31, 2026
@ezelkow1 ezelkow1 added the Enhancement and dependencies labels and removed the dependencies label Jan 31, 2026
@ezelkow1
Member Author

ezelkow1 commented Feb 2, 2026

[approve ci autest]

@ezelkow1 ezelkow1 merged commit 4a7c5f7 into apache:master Feb 2, 2026
15 checks passed
@ezelkow1 ezelkow1 deleted the fa5 branch February 2, 2026 18:13
@bryancall bryancall requested a review from Copilot February 3, 2026 23:14
Contributor

Copilot AI left a comment

Pull request overview

This pull request adds fixes for cache fail_action 5 (READ_RETRY mode) to prevent infinite looping scenarios and handle redirect edge cases correctly.

Changes:

  • Prevents looping when read retries are exhausted by bypassing cache instead of attempting another write lock
  • Adds defensive check for redirect scenarios with redirect_use_orig_cache_key enabled
  • Adds comprehensive autotests to verify READ_RETRY mode stability and request collapsing

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| src/proxy/http/HttpTransact.cc | Adds two checks to prevent looping in READ_RETRY state: one in HandleCacheOpenReadMiss and a defensive check in set_cache_prepare_write_action_for_new_request for redirect scenarios |
| tests/gold_tests/cache/replay/cache-read-retry.replay.yaml | New test file that validates READ_RETRY mode with concurrent requests, slow origin responses, and verifies request collapsing and system stability |
| tests/gold_tests/cache/cache-read-retry-mode.test.py | Test runner for the READ_RETRY mode test |

