fix(agent): branch OpenAI reasoning models such as gpt-5.2 to effort=minimal + remove CoT leakage #63
Merged
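… + remove CoT leakage

PR #54 targeted Kimi/DeepSeek, so it applied _REASONING_OFF_BODY (enabled:false, max_tokens:1) unconditionally whenever the provider was OpenRouter. According to the OpenRouter documentation, however, the GPT-5 series and o-series must be controlled through reasoning.effort, so sending that body to an OpenAI model gave unpredictable behavior (empty responses / CoT leakage).

- Add _is_openai_native_model() (gpt-*, o1/o3/o4, openai/*)
- OpenAI-native → reasoning.effort="minimal" in extra_body (OpenRouter) / reasoning_effort="minimal" (direct connection). Kimi/DeepSeek keep the existing _REASONING_OFF_BODY
- Stop returning reasoning (CoT) when content is empty → normalize to '[답변 불가]' (same policy as doc-summary PR #134; prevents broken judge JSON parsing and CoT leakage)
- Default max_tokens 1024 → 2048 (headroom for minimal reasoning plus the answer)

With this change, measuring doc-graph with gpt-5.2 + Claude judge produces no empty responses or CoT leakage and follows the same answer-handling policy as doc-summary → consistent cross-system comparison.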
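The branching described in the commit message can be sketched roughly as follows. This is a minimal illustration, not the code in agent/llm_client.py: the regular expression, the helper name reasoning_kwargs, the nesting of _REASONING_OFF_BODY under a "reasoning" key, and the empty result for non-OpenRouter, non-OpenAI models are all assumptions made for the example.

```python
import re

# Assumed shape of the Kimi/DeepSeek body; the PR only lists its fields
# (enabled:false, max_tokens:1, exclude:true), not the exact nesting.
_REASONING_OFF_BODY = {
    "reasoning": {"enabled": False, "max_tokens": 1, "exclude": True}
}

# Hypothetical pattern for gpt-*, o1/o3/o4 (optionally with a suffix), openai/*.
_OPENAI_NATIVE_RE = re.compile(r"^(openai/|gpt-|o[134](-|$))", re.IGNORECASE)


def is_openai_native_model(model: str) -> bool:
    """Return True for model names that look like OpenAI reasoning models."""
    return bool(_OPENAI_NATIVE_RE.match(model.strip()))


def reasoning_kwargs(model: str, is_openrouter: bool) -> dict:
    """Pick the request kwargs that keep reasoning to a minimum."""
    if is_openai_native_model(model):
        if is_openrouter:
            # OpenRouter: effort goes inside extra_body.reasoning
            return {"extra_body": {"reasoning": {"effort": "minimal"}}}
        # Direct connection: top-level reasoning_effort parameter
        return {"reasoning_effort": "minimal"}
    if is_openrouter:
        # Kimi/DeepSeek keep the pre-existing body
        return {"extra_body": dict(_REASONING_OFF_BODY)}
    # Assumption: nothing extra is sent in the remaining case
    return {}
```

The returned dict is meant to be splatted into the chat completion call, so the call site does not need to know which model family it is talking to.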
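Background

This fix is needed before measuring doc-graph with openai/gpt-5.2 + Claude judge. PR #54 targeted Kimi/DeepSeek, so whenever is_openrouter was true it applied _REASONING_OFF_BODY (enabled:false, max_tokens:1, exclude:true) unconditionally.

According to the OpenRouter reasoning documentation, however, the GPT-5 series and o-series must be controlled through reasoning.effort (xhigh/high/medium/low/minimal/none). Applying the Kimi body (enabled:false + max_tokens:1) to an OpenAI model gives unpredictable behavior, so runs with gpt-5.2 risked empty responses or CoT leakage. This is the same kind of bug that was caught in doc-summary (PR #134).

Changes

agent/llm_client.py:
- Add _is_openai_native_model(), which detects gpt-*, o1/o3/o4, and openai/*
- Reasoning control by model family (consistent with llm.py):
  - OpenAI-native via OpenRouter: extra_body={"reasoning":{"effort":"minimal"}}
  - OpenAI-native, direct connection: reasoning_effort="minimal"
  - Kimi/DeepSeek: keep _REASONING_OFF_BODY
- Remove the fallback that returned reasoning/reasoning_content when content was empty → normalize to "[답변 불가]" (prevents broken judge JSON parsing and English CoT leakage; same policy as doc-summary PR #134)
- Default max_tokens 1024 → 2048 (headroom for minimal reasoning plus the answer)

Compatibility

- Kimi/DeepSeek: existing behavior unchanged (_REASONING_OFF_BODY kept)
- openai/gpt-5.2 measurement via OpenRouter

Verification

- py_compile OK
- Confirmed that [답변 불가] is returned (code path)

Impact

Once this fix and PR #62 (judge prompt synchronization) are merged, measuring doc-graph with gpt-5.2 + Claude judge follows the same answer and judging policy as doc-summary, so the cross-system comparison is consistent.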
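For reference, the answer-normalization policy shared with doc-summary could look roughly like the sketch below. The function name normalize_answer, the dict-style message, and the whitespace stripping are assumptions for illustration; only the sentinel string and the decision to ignore reasoning/reasoning_content come from this PR.

```python
from typing import Any, Mapping

# Sentinel from the PR; the Korean text means "unable to answer".
UNANSWERABLE = "[답변 불가]"


def normalize_answer(message: Mapping[str, Any]) -> str:
    """Return the model's answer, or the sentinel when content is empty.

    The reasoning / reasoning_content fields are deliberately never used
    as a fallback, so chain-of-thought text cannot reach the judge.
    """
    content = message.get("content")
    if isinstance(content, str) and content.strip():
        return content.strip()
    return UNANSWERABLE
```

Returning a fixed sentinel keeps the judge input well-formed even when the model emits only reasoning tokens, which is the case the old fallback mishandled.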
Refs PR #54, doc-summary PR #134