Skip to content

Python: drop hosted MCP calls when reasoning is stripped#6210

Open
he-yufeng wants to merge 1 commit into
microsoft:mainfrom
he-yufeng:fix/openai-drop-mcp-call-with-reasoning
Open

Python: drop hosted MCP calls when reasoning is stripped#6210
he-yufeng wants to merge 1 commit into
microsoft:mainfrom
he-yufeng:fix/openai-drop-mcp-call-with-reasoning

Conversation

@he-yufeng
Copy link
Copy Markdown
Contributor

Motivation and Context

Fixes #6074.

When OpenAI Responses history is replayed without service-side storage, the client strips text_reasoning because replaying standalone reasoning items is rejected by the API. Hosted MCP calls from the same model output were still kept, which can leave the request with a bare mcp_call after its paired reasoning item has been removed.

Description

This change treats that stateless replay path as an all-or-nothing pair for hosted MCP items. If a message contains reasoning that will be stripped, hosted MCP call/result contents in the same message are skipped too. Any later unmatched MCP result marker is then dropped by the existing coalescing logic.

The existing behavior for hosted MCP calls without reasoning is unchanged, and storage-backed continuation still strips server-issued MCP items as before.

A regression test covers a reasoning plus hosted MCP call/result history and verifies that the prepared input does not contain a bare reasoning, mcp_call, or function_call_output item.

Contribution Checklist

  • The code builds clean without any errors or warnings
  • The PR follows the Contribution Guidelines
  • All unit tests pass, and I have added new tests where possible
  • Is this a breaking change? If yes, add "[BREAKING]" prefix to the title of the PR.

Copilot AI review requested due to automatic review settings May 30, 2026 15:23
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Adds handling for a replay scenario where reasoning content is stripped (when not using service-side storage) by also dropping the paired hosted-MCP tool call/result to avoid emitting invalid item sequences to the OpenAI API.

Changes:

  • Add a unit test that asserts hosted-MCP call/result are dropped when paired reasoning is stripped (no service-side storage).
  • Update _prepare_message_for_openai to detect “reasoning will be dropped” and skip serializing mcp_call accordingly.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File Description
python/packages/openai/tests/openai/test_openai_chat_client.py Adds regression test for dropping hosted-MCP call/result when reasoning is stripped.
python/packages/openai/agent_framework_openai/_chat_client.py Drops mcp_call when reasoning is stripped without storage to prevent invalid payloads.

Comment on lines +1495 to +1497
drops_reasoning_without_storage = not request_uses_service_side_storage and any(
content.type == "text_reasoning" for content in message.contents
)
Comment on lines +1552 to +1555
#
# Without storage, a reasoning + hosted-MCP pair cannot be replayed
# partially: reasoning is stripped above, and a bare mcp_call is rejected.
if request_uses_service_side_storage or drops_reasoning_without_storage:
@moonbox3
Copy link
Copy Markdown
Contributor

moonbox3 commented Jun 2, 2026

Python Test Coverage

Python Test Coverage Report •
FileStmtsMissCoverMissing
packages/openai/agent_framework_openai
   _chat_client.py108514886%276, 289, 631–635, 643–646, 652–656, 706–713, 715–717, 724–726, 772, 780, 803, 921, 1020, 1079, 1081, 1083, 1085, 1151, 1165, 1245, 1255, 1260, 1303, 1414–1415, 1430, 1645, 1650, 1654–1656, 1660–1661, 1744, 1754, 1781, 1787, 1797, 1803, 1808, 1814, 1819–1820, 1839, 1842–1845, 1859, 1861, 1869–1870, 1882, 1924, 2014, 2036–2037, 2052–2053, 2071–2072, 2115, 2281, 2319–2320, 2338, 2418–2426, 2456, 2566, 2601, 2616, 2636–2646, 2659, 2670–2674, 2688, 2702–2713, 2722, 2754–2757, 2767–2768, 2779–2781, 2795–2797, 2807–2808, 2814, 2829
TOTAL37394435088% 

Python Unit Test Overview

Tests Skipped Failures Errors Time
7452 34 💤 0 ❌ 0 🔥 1m 51s ⏱️

Copy link
Copy Markdown
Contributor

@moonbox3 moonbox3 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please check the failing tests as well.

#
# Without storage, a reasoning + hosted-MCP pair cannot be replayed
# partially: reasoning is stripped above, and a bare mcp_call is rejected.
if request_uses_service_side_storage or drops_reasoning_without_storage:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the same drop apply to function_call? Under no-storage this branch keeps the call but reasoning is stripped above, the exact orphan we're fixing for mcp_call. The docstring on test_prepare_message_for_openai_includes_reasoning_with_function_call says that combo 400s: "function_call was provided without its required reasoning item". So either function_call needs the same drops_reasoning_without_storage guard and this PR fixes only half the problem, or function_call and mcp_call genuinely differ at the API and it's worth a note on why only mcp_call is all-or-nothing. Reasoning-model + intermittent, so the new test wouldn't surface it either way.

# marker, since the server-stored items would otherwise duplicate the inline ones. Without
# storage, standalone reasoning items are invalid per the API ("reasoning was provided
# without its required following item"), so the reasoning branch always drops.
drops_reasoning_without_storage = not request_uses_service_side_storage and any(
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Flag is message-scoped, so it only fires when reasoning and the mcp_call share one Message. For responses parsed by this client that always holds (streaming updates coalesce into one assistant Message, non-streaming builds one), so live traffic is fine. But if a replayed or checkpointed history ever splits a turn so reasoning sits in a separate message, this is False for the mcp_call's message and the bare call survives, the exact state the API rejects. Worth a no-storage test with reasoning and the call in separate messages to pin the contract, or a note that the split can't happen?

@he-yufeng he-yufeng force-pushed the fix/openai-drop-mcp-call-with-reasoning branch from b20225e to 67f2890 Compare June 2, 2026 18:51
@he-yufeng
Copy link
Copy Markdown
Contributor Author

Rebased this branch onto latest upstream/main and re-ran the checks for the OpenAI changes here.

Local validation:

  • uv run --dev python -m ruff check packages/openai/agent_framework_openai/_chat_client.py packages/openai/tests/openai/test_openai_chat_client.py
  • uv run --dev python -m py_compile packages/openai/agent_framework_openai/_chat_client.py packages/openai/tests/openai/test_openai_chat_client.py
  • uv run --dev python -m mypy --config-file pyproject.toml packages/openai/agent_framework_openai/_chat_client.py
  • uv run --dev python -m pytest packages\openai\tests\openai\test_openai_chat_client.py -q -m "not integration" --basetemp .tmp\pytest-6210-unit

I also checked the failed GitHub Actions job. The failing cases are the Docker shell-tool tests, and the failure is from pulling alpine:3:

Unable to find image 'alpine:3' locally
docker: Error response from daemon: Get "https://registry-1.docker.io/v2/": context deadline exceeded

Those tests do not touch the OpenAI message serialization path changed in this PR. I did not change Docker-related code.

One local note: running the full test_openai_chat_client.py file includes live integration tests; those are blocked in my local environment by missing model config / an invalid local OpenAI credential, so I used the non-integration subset above for this PR's regression coverage.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Python: [Bug]: ] Orphaned mcp_call items cause HTTP 400 in multi-turn reasoning model flows

3 participants