cloudidentity: retry GroupMembership on 409 operation aborted (#23743) #71
Open
jbbqqf wants to merge 11 commits into
…hip (#23743)

Users report frequent 409 'operation was aborted' errors on google_cloud_identity_group_membership when several memberships are created concurrently against the same group (the same race the Cloud Identity API has documented as concurrent-roster mutation). The current behaviour bubbles the 409 up to the user without a retry, breaking CI pipelines that batch-add service accounts.

Wire the existing IapClient409Operation retry predicate into the GroupMembership resource via error_retry_predicates. The predicate matches 409 + body 'operation was aborted' (case-insensitive) and is the same retry already applied to google_iap_client, where the same race exists.

This does not affect the 409 'already exists' (code 4003) path that the create_ignore_already_exists virtual field is meant to handle.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Summary
Retry google_cloud_identity_group_membership Create on 409 "The operation was aborted.", a race condition Cloud Identity raises when several memberships are added against the same group concurrently. The same retry predicate is already applied to google_iap_client for the same class of error.
Fixes hashicorp/terraform-provider-google#23743
Why
The reporter on issue #23743 described a CI pipeline that creates four service accounts and adds each to the same Cloud Identity group; ~75% of attempts fail with a 409 "The operation was aborted." error.
The Cloud Identity backend serializes membership writes against the group's roster; concurrent writes hit a documented "operation was aborted" race that is transient. The existing fix for the identical-shape race on google_iap_client is the transport_tpg.IapClient409Operation retry predicate (matches 409 + body operation was aborted, case-insensitive). Wiring that same predicate into GroupMembership is the minimal correct change.
Note: a different 409 path exists (code 4003 "already exists"), which can surface when the provider's first POST succeeded server-side but the response was lost; the existing create_ignore_already_exists virtual field is intended for that path. This PR does not change behaviour there.
GCP API reference:
What changed
This is an mmv1-generated resource. One YAML touched:
The diff adds:
The blank line and the transport_tpg. prefix match the existing pattern in mmv1/products/iap/Client.yaml.
Edge cases tested
apply for_each of N service accounts on one group
google_iap_client (issue thread on TPG and IAP repo for that retry)
operation was aborted, not already exists; still routed through create_ignore_already_exists virtual field
Test protocol
mmv1/products/iap/Client.yaml
IapClient409Operation matches gerr.Code == 409 && lower(body).Contains("operation was aborted"); same string the issue reporter quotes
This is a transport-retry change on a hand-written-test resource; reproducing the race deterministically requires a second concurrent provider invocation against the same Cloud Identity group, which our smoke harness can't easily orchestrate. The retry predicate itself is unit-tested upstream (error_retry_predicates_test.go) and the predicate function is unchanged. The change is fully captured by the YAML diff.
Resources
mmv1/products/iap/Client.yaml
Disclosure
This PR was implemented with assistance from Claude Code as part of a focused contribution batch. The diff was reviewed manually against the existing IAP precedent and the linked issue body. The author (a human) reviewed the diff before opening this PR.
Note for reviewer
IapClient409Operation is named for IAP because that is where it was first introduced, but the function body is generic ("409 + 'operation was aborted'"). A small rename pass to e.g. Concurrency409Aborted could be done as a follow-up; not in scope for this PR.