fix(opencode): extract statusCode from error variants for 5xx retry#19204

Open
okuyam2y wants to merge 1 commit into anomalyco:dev from okuyam2y:fix/5xx-retry-error-variants

@okuyam2y okuyam2y commented Mar 26, 2026

Issue for this PR

Closes #19203

Type of change

  • Bug fix
  • New feature
  • Refactor / code improvement
  • Documentation

What does this PR do?

Some OpenAI-compatible providers throw plain errors or objects instead of APICallError, so 5xx responses fall through to UnknownError and never retry. This extracts the status code from common error shapes so 5xx responses stay retryable.
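The fall-through described above can be illustrated with a small sketch. It is not the code in this PR: `FakeAPICallError` is a hypothetical stand-in for ai-sdk's `APICallError`, and `classifyBefore`, `classifyAfter`, and the `Classified` type are made-up names, used only to show why a plain `Error` carrying a `status` field is never retried when classification relies on `instanceof` alone.

```typescript
// Hypothetical stand-in for ai-sdk's APICallError so the sketch has no dependencies.
class FakeAPICallError extends Error {
  statusCode: number
  constructor(message: string, statusCode: number) {
    super(message)
    this.statusCode = statusCode
  }
}

// Illustrative result type; the real error types in opencode are not shown in this thread.
type Classified = { kind: "api" | "unknown"; retryable: boolean }

// Before: only the SDK error class is recognized, everything else is "unknown".
function classifyBefore(err: unknown): Classified {
  if (err instanceof FakeAPICallError) {
    return { kind: "api", retryable: err.statusCode >= 500 }
  }
  return { kind: "unknown", retryable: false }
}

// After: also look for a numeric status on whatever was thrown.
function classifyAfter(err: unknown): Classified {
  if (err instanceof FakeAPICallError) {
    return { kind: "api", retryable: err.statusCode >= 500 }
  }
  const status =
    typeof err === "object" && err !== null
      ? (err as { status?: unknown }).status
      : undefined
  if (typeof status === "number" && status >= 500) {
    return { kind: "api", retryable: true }
  }
  return { kind: "unknown", retryable: false }
}
```

With `Object.assign(new Error("upstream down"), { status: 503 })` as input, `classifyBefore` reports the error as not retryable while `classifyAfter` reports it as retryable; a 404 stays non-retryable in both.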

How did you verify your code works?

Added unit coverage for the status extraction paths and negative cases. bun test test/session/message-v2.test.ts and bun run typecheck pass.

Screenshots / recordings

N/A — no UI change.

Checklist

  • I have tested my changes locally
  • I have not included unrelated changes in this PR

@github-actions github-actions bot added the needs:compliance label (This means the issue will auto-close after 2 hours.) Mar 26, 2026
@github-actions
Contributor

The following comment was made by an LLM, it may be inaccurate:

Potential Related PRs

Found 2 related PRs that address similar retry logic issues:

  1. PR #18443 - fix(retry): retry transient 429 responses even when provider marks non-retryable

  2. PR #16228 - fix: retry on AI_TypeValidationError from provider 5xx responses

These PRs are not exact duplicates but address the same general area (session/retry error handling for provider responses). You may want to review them for potential conflicts or ensure consistency in the retry logic approach.

@github-actions github-actions bot removed the needs:compliance label (This means the issue will auto-close after 2 hours.) Mar 26, 2026
@github-actions
Contributor

Thanks for updating your PR! It now meets our contributing guidelines. 👍

@okuyam2y okuyam2y force-pushed the fix/5xx-retry-error-variants branch from 4be2e20 to dbb61ba Compare April 6, 2026 14:20
@okuyam2y
Author

okuyam2y commented Apr 6, 2026

Rebased onto latest dev and squashed into a single commit. All 28 message-v2 tests pass, typecheck clean.

Changes from previous version:

  • Adapted to upstream's errorMessage() helper (added in recent commits)
  • Squashed 3 commits into 1 for cleaner history

Would appreciate a review when you get a chance. Happy to address any feedback.

@okuyam2y okuyam2y force-pushed the fix/5xx-retry-error-variants branch from dbb61ba to 17b44a0 Compare April 12, 2026 11:34
@okuyam2y okuyam2y changed the title from "fix(session): retry on 5xx errors from non-standard provider error types" to "fix(opencode): extract statusCode from error variants for 5xx retry" Apr 12, 2026
@okuyam2y okuyam2y marked this pull request as draft April 12, 2026 11:34
@okuyam2y okuyam2y marked this pull request as ready for review April 12, 2026 11:40
@tim-mohrbach-ikigai

Nice PR! This covers the most common error shapes well.

One edge case I have hit with OpenRouter: their error JSON uses code instead of status/statusCode:

{"code":502,"message":"Network connection lost.","metadata":{"error_type":"provider_unavailable"}}

The current JSON fallback checks obj?.status ?? obj?.statusCode ?? obj?.response?.status ?? obj?.response?.statusCode, which misses obj?.code. Adding it to the extraction would cover OpenRouter's format too:

const status =
  obj?.status ?? obj?.statusCode ?? obj?.code ??
  obj?.response?.status ?? obj?.response?.statusCode
if (typeof status === "number" && status >= 500) {

The metadata.error_type field (e.g. "provider_unavailable") could also be useful for classification, but just checking code for numeric 5xx values would already fix the OpenRouter case.
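To make the suggestion concrete, here is a minimal sketch of the JSON fallback with `code` added to the chain. The function name `statusFromJsonMessage` is hypothetical and the PR's actual helper may be structured differently; only the order of the fields follows the snippet above.

```typescript
// Read a numeric status from an error message that is itself a JSON document.
// Returns undefined when the message is not JSON or carries no numeric status.
function statusFromJsonMessage(message: string): number | undefined {
  let parsed: unknown
  try {
    parsed = JSON.parse(message)
  } catch {
    return undefined // not JSON, nothing to extract
  }
  if (typeof parsed !== "object" || parsed === null) return undefined

  const obj = parsed as Record<string, any>
  // Same chain as suggested above: the first non-nullish field wins.
  const status =
    obj.status ?? obj.statusCode ?? obj.code ??
    obj.response?.status ?? obj.response?.statusCode

  // Non-numeric values such as "ECONNRESET" are ignored.
  return typeof status === "number" ? status : undefined
}

// OpenRouter-style payload quoted in this comment.
const openRouterBody =
  '{"code":502,"message":"Network connection lost.","metadata":{"error_type":"provider_unavailable"}}'
const openRouterStatus = statusFromJsonMessage(openRouterBody)
const retryable = openRouterStatus !== undefined && openRouterStatus >= 500
```

One property of the `??` chain worth keeping in mind: it stops at the first non-nullish value, so a payload with a string `status` and a numeric `code` would yield no status at all in this sketch.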

I logged this separately in #22448 — happy to close it as a duplicate if this PR ends up covering the code field.

Some OpenAI-compatible providers throw native Error subclasses or plain
objects instead of ai-sdk's APICallError. These fell through to
NamedError.Unknown and were not retried, even on 5xx status codes.

Extract statusCode from common error shapes (Error.status,
Error.statusCode, Error.response.status, JSON-encoded message) and
treat 5xx as retryable in both `instanceof Error` and plain object
fallback branches.

Closes anomalyco#19203
@okuyam2y okuyam2y force-pushed the fix/5xx-retry-error-variants branch from 17b44a0 to 4b4e18f Compare April 15, 2026 02:39
@okuyam2y
Author

Good catch, thanks! I've added code to the extraction chain in all three branches (Error, JSON-encoded, plain object) and the ErrorWithStatus interface. Three new tests cover the OpenRouter format.
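For readers without the diff open, the three branches might be organized roughly as below. This is a sketch under assumptions: the `ErrorWithStatus` fields are guessed from the shapes named in this thread rather than copied from the commit, and `pickStatus`, `extractStatus`, and `isRetryable5xx` are hypothetical helper names.

```typescript
// Guess at the interface mentioned above; the real definition may differ.
interface ErrorWithStatus {
  status?: number | string
  statusCode?: number | string
  code?: number | string
  response?: { status?: number; statusCode?: number }
}

// Shared chain: the first non-nullish field wins, and only numbers count.
function pickStatus(src: unknown): number | undefined {
  if (typeof src !== "object" || src === null) return undefined
  const e = src as ErrorWithStatus
  const status =
    e.status ?? e.statusCode ?? e.code ??
    e.response?.status ?? e.response?.statusCode
  return typeof status === "number" ? status : undefined
}

function extractStatus(err: unknown): number | undefined {
  if (err instanceof Error) {
    // Branch 1: fields attached directly to the Error instance.
    const direct = pickStatus(err)
    if (direct !== undefined) return direct
    // Branch 2: the message is a JSON-encoded error body.
    try {
      return pickStatus(JSON.parse(err.message))
    } catch {
      return undefined // message was not JSON
    }
  }
  // Branch 3: a plain object was thrown.
  return pickStatus(err)
}

// Only 5xx counts as retryable here; 4xx and unknown shapes do not.
function isRetryable5xx(err: unknown): boolean {
  const status = extractStatus(err)
  return status !== undefined && status >= 500
}
```

Routing all three branches through one helper keeps the field order identical everywhere, which is one way to ensure a newly supported field like `code` is not missed in a single branch.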

Updated in 4b4e18f. Also rebased onto latest dev.

@tim-mohrbach-ikigai feel free to close #22448 if this covers your case — or keep it open if you want metadata.error_type classification as a separate enhancement.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

fix: 5xx errors from non-standard providers are not retried

2 participants