Description
A Windows service running in a production environment crashes periodically (once or twice a week) with access violations (AVs), as seen below:
Faulting application name: XXXXX.exe, version: 0.0.0.0, time stamp: 0x5f6b3998
Faulting module name: coreclr.dll, version: 4.700.20.47201, time stamp: 0x5f6a7a28
Exception code: 0xc0000005
Fault offset: 0x0000000000232e2b
WinDbg: the point of failure in FW 3.1 is this line:
PTR_VOID pRawSourceWatsonBucketArray = dac_cast<PTR_VOID>(refSourceWatsonBucketArray->GetDataPtr());
Configuration
- .NET Core 3.1 (coreclr 4.700.20.47201 and 4.700.20.36602)
- Windows 2016 DataCenter Server x64
- 64 vCPUs, 400 GB of RAM
Regression?
At first sight, this started immediately after the upgrade to v3.1.7 (dotnet-sdk-3.1.401-win-x64) (coreclr.dll 4.700.20.36602) on August 30 (fault offset 0x0000000000232e1b).
Update 1
We found 5 more crashes in our logs going back to the beginning of 2020 (although they were not as frequent as they are now):
- 3 crashes on coreclr 4.700.20.6602
- 2 crashes on coreclr 4.700.20.20201
Update 2
Jan 23 2019: a similar exception handling failure:
Faulting application name: XXXXX.exe, version: 0.0.0.0, time stamp: 0x5c007e95
Faulting module name: coreclr.dll, version: 4.6.27129.4, time stamp: 0x5c00327e
Exception code: 0xc0000005
Fault offset: 0x00000000001a3b8d
WinDbg > [e:\a_work\104\s\src\vm\exceptionhandling.cpp @ 1029] (00000001800379e0) coreclr!ProcessCLRException+0x16c1ad | (0000000180037e00) coreclr!ExceptionTracker::ProcessOSExceptionNotification
Other information
The crash dump attached in Visual Studio:
A normal .NET exception ends up in the crash at excep.cpp#L10309.
Outer exception:
- {"Exception has been thrown by the target of an invocation."} System.Reflection.TargetInvocationException
- SerializationWatsonBuckets: null
InnerException
- {"InitializingException: Initializing"} System.Exception {InitializingException}
- SerializationWatsonBuckets {byte[5616]}
Note that the application is in startup mode and there are quite a few of these Initializing exceptions flying around, which increases the chance of the bug presenting itself (if there is one). Indeed, the crash is more likely to happen during takeoff than during the cruising phase of the process lifetime.
Looking around a bit, I noticed that the method that hits the null reference is not supposed to run unless AreWatsonBucketsPresent returns true. Could it be a race condition?
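To illustrate the kind of race I have in mind, here is a minimal sketch. It is not the coreclr code: `BucketArray`, `ExceptionModel`, `AreBucketsPresent`, `CopySourceRacy`, `CopySourceSnapshot` and `RunSketch` are hypothetical names, and the real runtime works with managed object references rather than raw pointers. The sketch only assumes that the bucket field can be cleared by another thread between the presence check and the dereference.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-in for the byte[] that holds the Watson bucket data.
struct BucketArray {
    unsigned char data[16];
    const unsigned char* GetDataPtr() const { return data; }
};

// Hypothetical stand-in for an exception object; NOT the CLR's real type.
struct ExceptionModel {
    std::atomic<BucketArray*> buckets{nullptr};

    // Models a presence check similar in spirit to AreWatsonBucketsPresent.
    bool AreBucketsPresent() const { return buckets.load() != nullptr; }
};

// Racy pattern: the field is read twice. If another thread clears it between
// the check and the second read, arr is nullptr and the dereference faults.
const unsigned char* CopySourceRacy(const ExceptionModel& ex) {
    if (!ex.AreBucketsPresent()) return nullptr;
    BucketArray* arr = ex.buckets.load();  // second, independent read
    return arr->GetDataPtr();              // access violation if arr is nullptr
}

// Safer pattern: read the field once, then check and use that same value.
const unsigned char* CopySourceSnapshot(const ExceptionModel& ex) {
    BucketArray* arr = ex.buckets.load();
    return arr ? arr->GetDataPtr() : nullptr;
}

// Single-threaded walk-through; it shows the two patterns agree when nothing
// changes the field concurrently, which is why the race is rarely observed.
void RunSketch() {
    ExceptionModel ex;
    assert(CopySourceSnapshot(ex) == nullptr);

    BucketArray arr{};
    ex.buckets.store(&arr);
    assert(CopySourceRacy(ex) == arr.data);
    assert(CopySourceSnapshot(ex) == arr.data);
}
```

If something like this is happening, it would fit the observation that the crash shows up mostly during startup, when many exceptions are thrown concurrently, but that is only a guess on my side.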
Please advise whether upgrading to FW 5.0 or moving to another OS could help with the issue in the short term.
I will provide more details as necessary.
Thank you.
FYI @karelz
