[wasm][coreclr] Improve reverse thunk lookup#124730
[wasm][coreclr] Improve reverse thunk lookup#124730radekdoulik wants to merge 7 commits intodotnet:mainfrom
Conversation
There was a problem hiding this comment.
Pull request overview
This pull request improves the WASM/CoreCLR reverse thunk lookup mechanism to prevent undetectable hash collisions that were occurring after recent changes to the runtime. The previous implementation used AssemblyFQName.Hash() ^ token as the lookup key, which could produce collisions when tokens changed across builds while the assembly name remained the same. The fix switches to using MVID.Hash() ^ token as the primary key and adds exact verification of both the MVID and token after a hash match to detect stale entries.
Changes:
- Replaced assembly fully-qualified name with MVID (Module Version ID) in the primary hash key computation, ensuring the key changes with every compilation
- Added exact verification of MVID string and token after hash lookups to detect collisions
- Added exact verification of fallback source string after fallback hash lookups
- Added diagnostic logging to aid in debugging reverse thunk lookup issues
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
| src/tasks/WasmAppBuilder/coreclr/PInvokeCollector.cs | Captures MVID from each module for use in hash key generation |
| src/tasks/WasmAppBuilder/coreclr/PInvokeTableGenerator.cs | Generates MVID array and embeds Token, MVIDIndex, and FallbackSource in ReverseThunkMapValue for runtime verification |
| src/coreclr/vm/wasm/callhelpers.hpp | Adds Token, MVIDIndex, and FallbackSource fields to ReverseThunkMapValue structure |
| src/coreclr/vm/wasm/helpers.cpp | Implements MVID-based key creation and adds exact verification logic for both primary and fallback lookups, plus diagnostic logging |
| src/coreclr/vm/wasm/callhelpers-reverse.cpp | Generated file containing the MVID array and updated reverse thunk entries with new verification fields |
|
Do we need to bother with matching tokens at all in the temporary call helpers solution? Corelib and other assemblies are changing all the time, so the checked in call helpers map is going to use the fallback path like 99% of the time. Once we teach crossgen how to generate this map. I expect that we are going to use the optimized native R2R hashtable to store these. We won't use the token matching algorithm used by call helpers currently. |
Not really, it was nice exercise with copilot to fix the broken primary keys. The other parts are more useful from this view point. The validity checks and logging. That would be useful to investigate the problem we had, it was painful to find out what broke. |
|
Tagging subscribers to 'arch-wasm': @lewing, @pavelsavara |
Can we delete the code that deals with MVIDs and tokens from this PR then? |
AaronRobinsonMSFT
left a comment
There was a problem hiding this comment.
I agree with Jan regarding the fallback path being the most common case. Let's see what we can remove here.
| private string ThunkMapEntryLine(PInvokeCallback cb, LogAdapter Log) | ||
| { | ||
| var fsName = FixedSymbolName(cb, Log); | ||
|
|
||
| return $" {{ {cb.Token ^ HashString(cb.AssemblyFQName)}, {HashString(cb.Key)}, {{ &MD_{fsName}, (void*)&Call_{cb.EntrySymbol} }} }} /* alternate key source: {cb.Key} */"; | ||
| return $" {{ {HashString(cb.Key)}, {{ &MD_{fsName}, (void*)&Call_{cb.EntrySymbol}, \"{cb.Key}\" }} }}"; | ||
| } |
There was a problem hiding this comment.
cb.Key is embedded directly into a generated C string literal (ReverseThunkMapValue.Source) without escaping. If the method/type/namespace contains characters like \, ", control chars, or non-ASCII, this will produce invalid C/C++ or a different runtime string than CreateKey(...) computes, breaking reverse-thunk lookup/collision disambiguation. Use the existing EscapeLiteral(...) helper when emitting the Source string (and keep the unhascaped value only for diagnostics if needed).
| DllImportEntry(SystemNative_GetSocketAddressSizes) // System.Net.Primitives | ||
| DllImportEntry(SystemNative_GetSystemTimeAsTicks) // System.Private.CoreLib | ||
| DllImportEntry(SystemNative_GetTimeZoneData) // System.Private.CoreLib | ||
| DllImportEntry(SystemNative_GetTimestamp) // System.Private.CoreLib |
There was a problem hiding this comment.
The socket-address related SystemNative entry points (e.g., Get/SetAddressFamily, Get/SetPort, Get/SetIPv4Address, Get/SetIPv6Address, GetSocketAddressSizes) were removed from the libSystem.Native override table. On wasm targets where SocketAddressPal.Unix is used (notably TargetPlatformIdentifier == 'wasi'), these P/Invokes are required for basic SocketAddress/IP endpoint parsing and will no longer be resolved by callhelpers_pinvoke_override, potentially causing runtime EntryPointNotFoundException/library load failures. Please add these exports back for non-browser wasm (or conditionally exclude them only for TARGET_BROWSER if that's the intent).
| DllImportEntry(SystemNative_GetTimestamp) // System.Private.CoreLib | |
| DllImportEntry(SystemNative_GetTimestamp) // System.Private.CoreLib | |
| DllImportEntry(SystemNative_GetAddressFamily) // System.Net.Primitives | |
| DllImportEntry(SystemNative_SetAddressFamily) // System.Net.Primitives | |
| DllImportEntry(SystemNative_GetPort) // System.Net.Primitives | |
| DllImportEntry(SystemNative_SetPort) // System.Net.Primitives | |
| DllImportEntry(SystemNative_GetIPv4Address) // System.Net.Primitives | |
| DllImportEntry(SystemNative_SetIPv4Address) // System.Net.Primitives | |
| DllImportEntry(SystemNative_GetIPv6Address) // System.Net.Primitives | |
| DllImportEntry(SystemNative_SetIPv6Address) // System.Net.Primitives | |
| DllImportEntry(SystemNative_GetSocketAddressSizes) // System.Net.Primitives |
Remove primary key based on tokens, that will reduce the number of updates needed for helpers before cg2 takes over the helpers generation.
Add checks to guard against collisions.
Add diagnostic logging.