Skip to content

Commit 6936ff7

Browse files
CopilotnoahfalkAaronRobinsonMSFT
authored andcommitted
Fix debugger hangs due to runtime deadlock by using patch DJI (dotnet#123651)
# Description This PR implements a targeted fix for the debugger deadlock issue where the debugger hangs while waiting for a deadlocked debuggee process. The deadlock occurs when `DebuggerController::BindPatch` calls `GetJitInfo` while holding the DebuggerController lock. `GetJitInfo` can trigger HashMap lookups that perform GC mode transitions, leading to lock ordering issues. The fix modifies `BindPatch` to check if the patch already has a `DebuggerJitInfo` (DJI) stored on it via `patch->HasDJI()`. When the DJI is available, it uses `patch->GetDJI()` instead of calling the problematic `GetJitInfo` method. This approach is cleaner than adding a new parameter since the patch already stores the DJI. ## Changes Made - Updated `BindPatch` implementation to check `patch->HasDJI()` before calling `GetJitInfo` - When DJI exists on the patch, use `patch->GetDJI()` to avoid the deadlock-prone `GetJitInfo` call - Maintains backward compatibility - patches without DJI continue to call `GetJitInfo` as before The implementation uses a ternary operator for concise, idiomatic C++ code: ```cpp DebuggerJitInfo *info = patch->HasDJI() ? patch->GetDJI() : g_pDebugger->GetJitInfo(pMD, (const BYTE *)startAddr); ``` # Customer Impact Without this fix, customers can experience debugger hangs when debugging applications with ReadyToRun methods under specific timing conditions. This occurs when: 1. A breakpoint is set on a ReadyToRun method that hasn't executed yet 2. One thread is stepping through code while another thread first executes the ReadyToRun method with the breakpoint The hang requires the debugger and IDE to be force-closed, resulting in loss of debugging session state and developer productivity. # Regression This is not a regression from the most recent release. The root cause has existed for a while, but timing changes in the code may have impacted the likelihood of hitting the race condition. # Testing - Full CoreCLR build completed successfully with no errors or warnings - Code review passed with no issues identified - No security vulnerabilities introduced The changes are minimal (1 file changed, 3 insertions, 1 deletion) and surgical to reduce regression risk for potential backporting to prior .NET releases. # Risk **Low Risk**. The changes are minimal and highly targeted: - No API changes - the `BindPatch` signature remains unchanged - The logic flow remains unchanged - we're just avoiding an unnecessary lookup when the DJI is already available on the patch - The patch already stores the DJI via `AddPatchForMethodDef`, so we're using existing data - Full build validation confirms no compilation issues - The fix addresses the specific deadlock scenario without affecting other debugger functionality - Maintains backward compatibility - patches without DJI still call `GetJitInfo` as before <!-- START COPILOT ORIGINAL PROMPT --> <details> <summary>Original prompt</summary> ---- *This section details on the original issue you should resolve* <issue_title>Debugger hangs due to runtime deadlock. HashMap takes ThreadStore out of order</issue_title> <issue_description>### Description The debugger is blocked in mscordbi code at: ```      [Inline Frame] mscordbi.dll!SafeWaitForSingleObject(CordbProcess *) Line 257  C++ >     mscordbi.dll!CordbProcess::StopInternal(unsigned long dwTimeout, VMPTR_Base<AppDomain,void> pAppDomainToken) Line 3611  C++      [Inline Frame] mscordbi.dll!StopContinueHolder::Init(CordbProcess *) Line 208 C++      mscordbi.dll!CordbModule::GetFunctionFromToken(unsigned int token, ICorDebugFunction * * ppFunction) Line 1387    C++ ``` mscordbi is waiting on the debuggee to respond which it won't do because it is deadlocked. Thread 0x9314 in the debuggee (trying to acquire DebuggerController lock): ```      [Inline Frame] coreclr.dll!CrstBase::AcquireLock(CrstBase *) Line 174   C++      [Inline Frame] coreclr.dll!FunctionBase<CrstBase *,&CrstBase::AcquireLock,&CrstBase::ReleaseLock>::DoAcquire() Line 694 C++      [Inline Frame] coreclr.dll!BaseHolder<CrstBase *,FunctionBase<CrstBase *,&CrstBase::AcquireLock,&CrstBase::ReleaseLock>,0,&CompareDefault<CrstBase *>>::Acquire() Line 266    C++      [Inline Frame] coreclr.dll!BaseHolder<CrstBase *,FunctionBase<CrstBase *,&CrstBase::AcquireLock,&CrstBase::ReleaseLock>,0,&CompareDefault<CrstBase *>>::{ctor}(CrstBase *) Line 233 C++      [Inline Frame] coreclr.dll!Holder<CrstBase *,&CrstBase::AcquireLock,&CrstBase::ReleaseLock,0,&CompareDefault<CrstBase *>,1>::{ctor}(CrstBase *) Line 729    C++ >     coreclr.dll!DebuggerController::DispatchPatchOrSingleStep(Thread * thread, _CONTEXT * context, const unsigned char * address, SCAN_TRIGGER which, DebuggerSteppingInfo * pDebuggerSteppingInfo) Line 3063   C++      coreclr.dll!DebuggerController::DispatchNativeException(_EXCEPTION_RECORD * pException, _CONTEXT * pContext, unsigned long dwCode, Thread * pCurThread, DebuggerSteppingInfo * pDebuggerSteppingInfo) Line 4618   C++      coreclr.dll!Debugger::FirstChanceNativeException(_EXCEPTION_RECORD * exception, _CONTEXT * context, unsigned long code, Thread * thread, int fIsVEH) Line 5470    C++      coreclr.dll!IsDebuggerFault(_EXCEPTION_RECORD * pExceptionRecord, _CONTEXT * pContext, unsigned long exceptionCode, Thread * pThread) Line 5811 C++ And the DebuggerController lock is held by thread 0x1E38 which is here: ```      [Inline Frame] coreclr.dll!CLREventBase::Wait(unsigned long) Line 412   C++      coreclr.dll!Thread::WaitSuspendEventsHelper() Line 4458     C++      coreclr.dll!Thread::RareDisablePreemptiveGC() Line 2185     C++      [Inline Frame] coreclr.dll!Thread::DisablePreemptiveGC() Line 1288      C++      [Inline Frame] coreclr.dll!GCHolderBase::EnterInternalCoop_HackNoThread(bool) Line 4619   C++      [Inline Frame] coreclr.dll!GCCoopHackNoThread::{ctor}(bool) Line 4834   C++      [Inline Frame] coreclr.dll!HashMap::LookupValue(unsigned __int64) Line 552    C++      [Inline Frame] coreclr.dll!PtrHashMap::LookupValue(unsigned __int64 key, void *) Line 607 C++      [Inline Frame] coreclr.dll!ReadyToRunInfo::GetMethodDescForEntryPointInNativeImage(unsigned __int64) Line 376     C++      [Inline Frame] coreclr.dll!ReadyToRunInfo::GetMethodDescForEntryPoint(unsigned __int64 entryPoint) Line 104 C++      coreclr.dll!ReadyToRunJitManager::JitCodeToMethodInfo(RangeSection * pRangeSection, unsigned __int64 currentPC, MethodDesc * * ppMethodDesc, EECodeInfo * pCodeInfo) Line 6451      C++      coreclr.dll!EECodeInfo::Init(unsigned __int64 codeAddress, ExecutionManager::ScanFlag scanFlag) Line 15025  C++      [Inline Frame] coreclr.dll!EECodeInfo::{ctor}(unsigned __int64) Line 2877     C++      [Inline Frame] coreclr.dll!ExecutionManager::GetCodeStartAddress(unsigned __int64) Line 4980    C++      coreclr.dll!EEDbgInterfaceImpl::GetNativeCodeStartAddress(unsigned __int64 address) Line 416    C++      coreclr.dll!Debugger::GetJitInfoWorker(MethodDesc * fd, const unsigned char * pbAddr, DebuggerMethodInfo * * pMethInfo) Line 2711   C++      [Inline Frame] coreclr.dll!Debugger::GetJitInfo(MethodDesc *) Line 2650 C++      coreclr.dll!DebuggerController::BindPatch(DebuggerControllerPatch * patch, MethodDesc * pMD, const unsigned char *) Line 1432 C++ >     coreclr.dll!DebuggerController::AddBindAndActivatePatchForMethodDesc(MethodDesc * fd, DebuggerJitInfo * dji, unsigned __int64 nativeOffset, DebuggerPatchKind kind, FramePointer fp, AppDomain * pAppDomain) Line 2242  C++      coreclr.dll!DebuggerController::AddBindAndActivateILReplicaPatch(DebuggerControllerPatch * primary, DebuggerJitInfo * dji) Line 2006      C++      coreclr.dll!Debugger::MapPatchToDJI(DebuggerControllerPatch * dcp, DebuggerJitInfo * djiTo) Line 4963 C++      coreclr.dll!Debugger::MapAndBindFunctionPatches(DebuggerJitInfo * djiNew, MethodDesc * fd, const unsigned char *) Line 4879   C++      coreclr.dll!Debugger::JIT... </details> <!-- START COPILOT CODING AGENT SUFFIX --> - Fixes dotnet#123650 <!-- START COPILOT CODING AGENT TIPS --> --- ✨ Let Copilot coding agent [set things up for you](https://github.com/dotnet/runtime/issues/new?title=✨+Set+up+Copilot+instructions&body=Configure%20instructions%20for%20this%20repository%20as%20documented%20in%20%5BBest%20practices%20for%20Copilot%20coding%20agent%20in%20your%20repository%5D%28https://gh.io/copilot-coding-agent-tips%29%2E%0A%0A%3COnboard%20this%20repo%3E&assignees=copilot) — coding agent works faster and does higher quality work when set up for your repo. --------- Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> Co-authored-by: noahfalk <6243776+noahfalk@users.noreply.github.com> Co-authored-by: AaronRobinsonMSFT <30635565+AaronRobinsonMSFT@users.noreply.github.com>
1 parent 46daa57 commit 6936ff7

1 file changed

Lines changed: 3 additions & 1 deletion

File tree

src/coreclr/debug/ee/controller.cpp

Lines changed: 3 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1415,7 +1415,9 @@ bool DebuggerController::BindPatch(DebuggerControllerPatch *patch,
14151415
_ASSERTE(!g_pEEInterface->IsStub((const BYTE *)startAddr));
14161416

14171417
// If we've jitted, map to a native offset.
1418-
DebuggerJitInfo *info = g_pDebugger->GetJitInfo(pMD, (const BYTE *)startAddr);
1418+
// If the patch already has a DJI, use it to avoid calling GetJitInfo which can trigger
1419+
// a deadlock-prone code path through HashMap lookups that do GC mode transitions.
1420+
DebuggerJitInfo *info = patch->HasDJI() ? patch->GetDJI() : g_pDebugger->GetJitInfo(pMD, (const BYTE *)startAddr);
14191421

14201422
#ifdef LOGGING
14211423
if (info == NULL)

0 commit comments

Comments
 (0)