Fix watson buckets setting race condition #46636
The code that updates watson buckets in an exception from its inner exception has a race that can crash the runtime when many threads rethrow the same exception using ExceptionDispatchInfo.Throw. In several places, we check whether the inner exception contains watson buckets and, if so, copy them to the outer exception. However, ExceptionDispatchInfo.Throw on another thread can reset the buckets to the captured value, which might be null, so the copy then crashes with an access violation. This change fixes the issue by reading the buckets reference once and then performing the checks and the copying on that extracted value. It also adds a regression test that asserts quickly in debug / checked builds.
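The check-then-copy race and the read-once fix described above can be sketched in isolation. This is a minimal illustration, not the runtime code: `Buckets`, `FakeException`, `CopyBucketsRacy`, `CopyBucketsSnapshot`, and `DemoSnapshot` are hypothetical names, and a `std::atomic` pointer stands in for the object reference field. Object lifetime, which the GC handles in the runtime, is left to the caller here.

```cpp
#include <atomic>
#include <cassert>
#include <vector>

// Hypothetical stand-in for a watson buckets blob; not a runtime type.
struct Buckets {
    std::vector<unsigned char> bytes;
};

// Hypothetical stand-in for an exception object whose buckets field
// can be reset (possibly to null) by another thread at any time.
struct FakeException {
    std::atomic<Buckets*> buckets{nullptr};
};

// Racy shape: the field is read once for the check and again for the copy.
// If another thread stores nullptr between the two loads, the second load
// returns null and the dereference crashes.
inline bool CopyBucketsRacy(FakeException& inner, std::vector<unsigned char>& out) {
    if (inner.buckets.load() != nullptr) {
        out = inner.buckets.load()->bytes;  // second read may observe nullptr
        return true;
    }
    return false;
}

// Read-once shape: take a single snapshot and use it for the check and the copy.
inline bool CopyBucketsSnapshot(FakeException& inner, std::vector<unsigned char>& out) {
    Buckets* snapshot = inner.buckets.load();
    if (snapshot == nullptr) {
        return false;
    }
    out = snapshot->bytes;
    return true;
}

// Single-threaded walkthrough of the snapshot version.
inline void DemoSnapshot() {
    Buckets captured;
    captured.bytes = {1, 2, 3};
    FakeException inner;
    std::vector<unsigned char> outer;
    assert(!CopyBucketsSnapshot(inner, outer));  // no buckets yet
    inner.buckets.store(&captured);
    assert(CopyBucketsSnapshot(inner, outer));   // copied from the snapshot
    assert(outer.size() == 3);
}
```

The snapshot version may still copy a value that another thread replaces a moment later, but it never dereferences a reference it has not checked.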
@@ -0,0 +1,124 @@
using System;
Oops, all of my recent regression tests are missing the license header. I'll fix them.
src/coreclr/vm/object.h
{
    LIMITED_METHOD_CONTRACT;
    return _watsonBuckets;
    PTR_U1Array buckets = (PTR_U1Array)OBJECTREFToObject(_watsonBuckets);
The problem is that the underlying REF constructor that is called to create the return value validates _watsonBuckets this way:
    do {if ((objref) != NULL) (objref).Validate();} while (0)
During the race, the condition is true, but another thread sets the buckets to NULL right after the check, so (objref).Validate() crashes. This change prevents that by extracting the object first and then constructing the returned value from it.
It may be better to fix the validation to avoid this race condition. I suspect a bunch of places throughout the code have the same problem.
Sounds reasonable. I'll give it a try.
src/coreclr/vm/excep.cpp
if (gotWatsonBucketDetails)
{
    // Set the flag that we got bucketing details for the exception
* Make the VALIDATEOBJECTREF race-resilient
* Add license header to the test
* Fix indentation
src/coreclr/vm/vars.hpp
// the while (0) syntax below is to force a trailing semicolon on users of the macro
#define VALIDATEOBJECT(obj) do {if ((obj) != NULL) (obj)->Validate();} while (0)
#define VALIDATEOBJECTREF(objref) do { VALIDATEOBJECT(OBJECTREFToObject(objref)); } while (0)
How is this fixing the race condition? I would expect to see a temporary local variable or something like that as part of the fix.
Sigh, you are right. I somehow forgot that this is a macro and not a function call, so OBJECTREFToObject(objref) gets substituted into the other macro as is and the reference is still read twice.
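The point about macro substitution can be checked by counting how often the argument expression is evaluated. The sketch below is only an illustration with hypothetical names (`Obj`, `ReadField`, `TO_OBJECT`, `VALIDATE_TWICE`, `VALIDATE_WRAPPED`, `VALIDATE_ONCE`); it mirrors the shape of the macros in the diff, not their actual definitions, and the temporary-local variant is one possible form of the fix the reviewer asked about, not necessarily what was merged.

```cpp
#include <cassert>

// Hypothetical object with a no-op Validate(); not the runtime's Object.
struct Obj {
    void Validate() const {}
};

static Obj g_obj;
static int g_reads = 0;

// Each call models one read of a shared field that another thread could change.
inline Obj* ReadField() {
    ++g_reads;
    return &g_obj;
}

// Identity helper standing in for a conversion such as OBJECTREFToObject.
#define TO_OBJECT(ref) (ref)

// Same shape as the macro in the diff: the argument text appears twice.
#define VALIDATE_TWICE(obj) \
    do { if ((obj) != nullptr) (obj)->Validate(); } while (0)

// Wrapping in a second macro only substitutes text, so it still reads twice.
#define VALIDATE_WRAPPED(ref) \
    do { VALIDATE_TWICE(TO_OBJECT(ref)); } while (0)

// A temporary local evaluates the argument exactly once.
#define VALIDATE_ONCE(ref)                          \
    do {                                            \
        Obj* tmp_ = TO_OBJECT(ref);                 \
        if (tmp_ != nullptr) tmp_->Validate();      \
    } while (0)

inline int CountReadsTwice() {
    g_reads = 0;
    VALIDATE_TWICE(ReadField());
    return g_reads;
}

inline int CountReadsWrapped() {
    g_reads = 0;
    VALIDATE_WRAPPED(ReadField());
    return g_reads;
}

inline int CountReadsOnce() {
    g_reads = 0;
    VALIDATE_ONCE(ReadField());
    return g_reads;
}
```

With a non-null field, the first two helpers report two reads and the last reports one, which is why only the temporary-local form closes the window between the null check and the call.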
Close #45929