Add Responses stream lifecycle diagnostics#19755

Closed
etraut-openai wants to merge 8 commits into main from etraut/stream-lifecycle-diagnostics

Conversation

@etraut-openai
Collaborator

@etraut-openai etraut-openai commented Apr 27, 2026

Why

Refs #19745.

Responses stream failures are currently hard to diagnose because the client often reports only the terminal transport error. That makes it difficult to tell whether a stream was silent from the start, closed after response.created, stalled before durable output, stalled after text began, or completed with inconsistent response IDs.

This adds diagnostics for those lifecycle boundaries without changing retry budgets, idle timeouts, fallback policy, model behavior, or app-server protocol shape.
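As an illustration of the lifecycle boundaries listed above, here is a minimal sketch of how a finished stream could be bucketed into those cases. It is not the code in this PR: `StreamOutcome`, `Terminal`, `StreamSummary`, and `classify` are hypothetical names, and the sketch assumes that the first text delta marks the point where output begins, which the description does not spell out.

```rust
/// Hypothetical buckets mirroring the failure modes named in the description.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum StreamOutcome {
    SilentFromStart,
    ClosedAfterCreated,
    StalledBeforeOutput,
    StalledAfterText,
    CompletedWithMismatchedIds,
    CompletedCleanly,
    /// Any combination the description does not name explicitly.
    Other,
}

/// Hypothetical terminal states for one stream attempt.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Terminal {
    Completed,
    Closed,
    IdleTimeout,
}

/// Hypothetical summary of what one stream attempt delivered.
#[derive(Debug, Clone)]
pub struct StreamSummary {
    pub terminal: Terminal,
    pub event_count: u64,
    pub created_id: Option<String>,
    pub completed_id: Option<String>,
    /// Assumption: set once the first text delta has been observed.
    pub text_started: bool,
}

/// Maps a summary onto one of the lifecycle buckets.
pub fn classify(s: &StreamSummary) -> StreamOutcome {
    // No events at all: the stream was silent from the start.
    if s.event_count == 0 {
        return StreamOutcome::SilentFromStart;
    }
    match s.terminal {
        Terminal::Completed => match (&s.created_id, &s.completed_id) {
            (Some(a), Some(b)) if a != b => StreamOutcome::CompletedWithMismatchedIds,
            _ => StreamOutcome::CompletedCleanly,
        },
        Terminal::Closed if s.created_id.is_some() && !s.text_started => {
            StreamOutcome::ClosedAfterCreated
        }
        Terminal::IdleTimeout if s.text_started => StreamOutcome::StalledAfterText,
        Terminal::IdleTimeout => StreamOutcome::StalledBeforeOutput,
        Terminal::Closed => StreamOutcome::Other,
    }
}
```

With only the terminal transport error, all of the non-completed buckets look identical to the caller; the extra fields are what make them distinguishable.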

What changed

  • Added a private ResponseStreamLifecycleRecorder in codex-api that tracks stream attempt, transport, terminal state, response IDs, first and last event timing, output milestones, observed event kinds, and event count.
  • Wired lifecycle capture into both the Responses HTTP SSE and Responses WebSocket streams. Failed streams, and completed streams with mismatched response IDs, now emit a warning log and have lifecycle details appended to ApiError::Stream.
  • Updated internal ResponseEvent::Created parsing to carry the created response ID so lifecycle capture uses parsed event data instead of reparsing raw JSON.
  • Threaded stream attempt numbers from the core retry loop into ModelClientSession, with the existing WebSocket-to-HTTP fallback reset preserving fresh attempt numbering for the fallback transport.
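To give a rough picture of the kind of state the recorder tracks, here is a simplified sketch. It is an assumption-laden illustration, not the actual ResponseStreamLifecycleRecorder: the `LifecycleRecorder` type, its fields, the `record` signature, and the `details` format are all hypothetical, and output milestones and terminal state are omitted for brevity.

```rust
use std::collections::BTreeSet;
use std::time::{Duration, Instant};

/// Hypothetical transport tag for one stream attempt.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Transport {
    HttpSse,
    WebSocket,
}

/// Simplified, hypothetical stand-in for the private recorder in codex-api.
pub struct LifecycleRecorder {
    attempt: u32,
    transport: Transport,
    created_id: Option<String>,
    completed_id: Option<String>,
    first_event_at: Option<Instant>,
    last_event_at: Option<Instant>,
    event_kinds: BTreeSet<String>,
    event_count: u64,
}

impl LifecycleRecorder {
    pub fn new(attempt: u32, transport: Transport) -> Self {
        Self {
            attempt,
            transport,
            created_id: None,
            completed_id: None,
            first_event_at: None,
            last_event_at: None,
            event_kinds: BTreeSet::new(),
            event_count: 0,
        }
    }

    /// Records one already-parsed event; `now` is injected so tests are deterministic.
    pub fn record(&mut self, kind: &str, response_id: Option<&str>, now: Instant) {
        if self.first_event_at.is_none() {
            self.first_event_at = Some(now);
        }
        self.last_event_at = Some(now);
        self.event_count += 1;
        self.event_kinds.insert(kind.to_string());
        match kind {
            "response.created" => self.created_id = response_id.map(String::from),
            "response.completed" => self.completed_id = response_id.map(String::from),
            _ => {}
        }
    }

    pub fn event_count(&self) -> u64 {
        self.event_count
    }

    /// Time between the first and last observed events (zero if none were seen).
    pub fn span(&self) -> Duration {
        match (self.first_event_at, self.last_event_at) {
            (Some(first), Some(last)) => last.duration_since(first),
            _ => Duration::ZERO,
        }
    }

    /// True only when both IDs were seen and they differ.
    pub fn has_mismatched_ids(&self) -> bool {
        matches!((&self.created_id, &self.completed_id), (Some(a), Some(b)) if a != b)
    }

    /// One-line summary suitable for a warning log or an error-message suffix.
    pub fn details(&self) -> String {
        format!(
            "attempt={} transport={:?} events={} kinds={:?} created_id={:?} completed_id={:?} span_ms={}",
            self.attempt,
            self.transport,
            self.event_count,
            self.event_kinds,
            self.created_id,
            self.completed_id,
            self.span().as_millis()
        )
    }
}
```

In this sketch the recorder takes the parsed event kind and response ID rather than raw JSON, in the spirit of the ResponseEvent::Created change above, and the attempt number is passed in by the caller so a fallback transport can start a fresh recorder with its own numbering.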

Verification

  • cargo check -p codex-api -p codex-core
  • cargo test -p codex-api
  • cargo test -p codex-core responses_websocket_v2_surfaces_terminal_error_without_close_handshake
  • cargo test -p codex-otel

@etraut-openai etraut-openai requested a review from a team as a code owner April 27, 2026 04:00
Bojun-Vvibe added a commit to Bojun-Vvibe/oss-contributions that referenced this pull request Apr 27, 2026
- openai/codex#19773 permissions: require profiles in TUI thread state (merge-after-nits)
- openai/codex#19755 Add Responses stream lifecycle diagnostics (merge-after-nits)
- openai/codex#19753 Terminate stdio MCP servers on shutdown (merge-as-is)
- BerriAI/litellm#26581 fix(semantic-filter): run for anthropic_messages call type (merge-as-is)