[mono][interp] Fix concurrency issues with publishing transformed imethod #87555
BrzVlad merged 3 commits into dotnet:main
This change additionally transforms
Before this change, `mono_memory_write_barrier` and `mono_memory_read_barrier` always emitted a full memory barrier, regardless of architecture. We already have the beginnings of a more precise implementation via the `*_FENCE` defines, so this change implements `mono_memory_write_barrier` and `mono_memory_read_barrier` on top of those defines instead. The only consequence is that, on x86 and amd64, `mono_memory_write_barrier` and `mono_memory_read_barrier` become compiler barriers instead of a full `mfence`.
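As a rough sketch of the idea (not mono's actual code), a per-architecture mapping could look like the C11 snippet below. The macro and function names (`SKETCH_STORE_STORE_FENCE`, `sketch_memory_write_barrier`, `sketch_barrier_demo`, etc.) are hypothetical stand-ins for the runtime's `*_FENCE` defines and `mono_memory_*_barrier` functions, and C11 `<stdatomic.h>` fences are assumed in place of whatever the runtime emits itself. On x86/amd64 only a compiler barrier is needed for these two barriers, because the hardware already keeps stores ordered with other stores and loads ordered with other loads; on weakly ordered targets a full fence is used.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical names standing in for the runtime's *_FENCE defines. */
#if defined(__i386__) || defined(__x86_64__) || defined(_M_IX86) || defined(_M_X64)
/* x86/amd64 (TSO): the CPU keeps stores ordered with stores and loads ordered
 * with loads, so only the compiler has to be prevented from reordering. */
#define SKETCH_STORE_STORE_FENCE() atomic_signal_fence (memory_order_seq_cst)
#define SKETCH_LOAD_LOAD_FENCE()   atomic_signal_fence (memory_order_seq_cst)
#else
/* Weakly ordered targets (e.g. arm/arm64): emit a real full hardware fence. */
#define SKETCH_STORE_STORE_FENCE() atomic_thread_fence (memory_order_seq_cst)
#define SKETCH_LOAD_LOAD_FENCE()   atomic_thread_fence (memory_order_seq_cst)
#endif

/* Hypothetical counterparts of mono_memory_write_barrier / mono_memory_read_barrier. */
static inline void sketch_memory_write_barrier (void) { SKETCH_STORE_STORE_FENCE (); }
static inline void sketch_memory_read_barrier (void)  { SKETCH_LOAD_LOAD_FENCE (); }

/* Single-threaded usage demo: the barriers only constrain ordering, so the
 * value written before the flag is the value read after observing the flag. */
static int sketch_barrier_demo (int value)
{
    int payload;
    atomic_int ready = 0;

    payload = value;                  /* write the data first             */
    sketch_memory_write_barrier ();   /* data store before the flag store */
    atomic_store_explicit (&ready, 1, memory_order_relaxed);

    assert (atomic_load_explicit (&ready, memory_order_relaxed) == 1);
    sketch_memory_read_barrier ();    /* flag load before the data load   */
    return payload;
}
```

The demo cannot show reordering by itself (it runs on one thread); it only illustrates where each barrier sits relative to the data and the flag.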
When publishing a transformed `InterpMethod*`, we first set all relevant fields (like `code`, `alloca_size`, etc.), then execute a write barrier, and finally set the `transformed` flag. On architectures with a relaxed memory model, the consumer also needs a read barrier, since there is no data dependency between `transformed` and the other fields of `InterpMethod`. On arm, this change emits a full barrier (a load-acquire would suffice, but the runtime doesn't yet support emitting one). Still, this doesn't seem to introduce a heavy perf penalty (at least on my arm64 M1), but we can revisit if necessary. On x86/amd64 this is only a compiler barrier, so it should have no impact. WASM is single-threaded for now.
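The publish/consume protocol described above can be sketched with C11 fences. `SketchInterpMethod`, `sketch_publish` and `sketch_try_use` are hypothetical names, and only the fields mentioned in the text are modeled; this is an illustration of the pattern, not the interpreter's real code. The release fence plays the role of the producer's write barrier and the acquire fence plays the role of the consumer's read barrier that this change adds.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for InterpMethod: only the fields named in the text. */
typedef struct {
    const unsigned short *code;   /* transformed code                 */
    int alloca_size;
    atomic_int transformed;       /* publication flag, written last   */
} SketchInterpMethod;

/* Producer: fill every field, then publish by setting the flag. */
static void sketch_publish (SketchInterpMethod *imethod, const unsigned short *code, int alloca_size)
{
    imethod->code = code;
    imethod->alloca_size = alloca_size;
    /* Write barrier: the field stores above become visible before the flag store. */
    atomic_thread_fence (memory_order_release);
    atomic_store_explicit (&imethod->transformed, 1, memory_order_relaxed);
}

/* Consumer: returns alloca_size if the method is published, else -1. */
static int sketch_try_use (SketchInterpMethod *imethod)
{
    if (!atomic_load_explicit (&imethod->transformed, memory_order_relaxed))
        return -1;   /* not transformed yet; a real caller would transform it */
    /* Read barrier: the field loads below have no data dependency on the flag
     * load, so a weakly ordered CPU could otherwise satisfy them early and
     * see stale values. */
    atomic_thread_fence (memory_order_acquire);
    assert (imethod->code != NULL);
    return imethod->alloca_size;
}
```

Replacing the relaxed flag load plus the acquire fence with `atomic_load_explicit (&imethod->transformed, memory_order_acquire)` would correspond to the cheaper load-acquire variant mentioned above.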
One thing to consider is the store buffer on x86/amd64, which delays stores from becoming visible to other cores (they are still visible to the current core while in the store buffer). I'm not sure what semantics we want for `mono_memory_write_barrier`, but if we want it to drain the store buffer, then we would need at least an actual hardware write barrier on x86/amd64. If we only care about ordering stores relative to other stores (not relative to loads), then a compiler barrier is enough.
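The store-buffer concern corresponds to the classic store-buffering (Dekker-style) litmus test, sketched below with C11 atomics. All names are hypothetical, and C11 fences stand in for the runtime's barriers. Each side stores its own flag and then loads the other side's flag. Run concurrently with only a compiler barrier (`full == 0`), both sides returning 0 is a permitted outcome on x86/amd64, because each load can complete while the other core's store is still buffered; with a sequentially consistent thread fence (`full == 1`), at least one side must observe 1. Store-to-store ordering, which is what the publish pattern relies on, needs no such fence on x86/amd64.

```c
#include <assert.h>
#include <stdatomic.h>

/* Store-buffering litmus test; names are hypothetical, for illustration only. */
static atomic_int sketch_flag_a, sketch_flag_b;

static void sketch_sb_reset (void)
{
    atomic_store (&sketch_flag_a, 0);
    atomic_store (&sketch_flag_b, 0);
}

/* Orders this side's store before its subsequent load. atomic_signal_fence is a
 * compiler barrier only: the CPU may still let the load complete while the store
 * sits in the store buffer. A seq_cst atomic_thread_fence (typically mfence or a
 * locked instruction on x86) forbids the "both read 0" outcome. */
static void sketch_store_load_fence (int full)
{
    if (full)
        atomic_thread_fence (memory_order_seq_cst);
    else
        atomic_signal_fence (memory_order_seq_cst);
}

/* Intended to run on "thread A": returns the value it observed in flag_b. */
static int sketch_sb_side_a (int full)
{
    atomic_store_explicit (&sketch_flag_a, 1, memory_order_relaxed);
    sketch_store_load_fence (full);
    return atomic_load_explicit (&sketch_flag_b, memory_order_relaxed);
}

/* Intended to run on "thread B": returns the value it observed in flag_a. */
static int sketch_sb_side_b (int full)
{
    atomic_store_explicit (&sketch_flag_b, 1, memory_order_relaxed);
    sketch_store_load_fence (full);
    return atomic_load_explicit (&sketch_flag_a, memory_order_relaxed);
}
```

Calling the two sides one after the other on a single thread, as below, only checks the sequential result; observing the reordering requires running them on two threads many times.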
@lateralusX In all the cases that I've seen where we use
Fixes #87271