
[mono] Incompatible loader behavior w.r.t. Resolving event #54814

@uweigand

Description

Using a Mono-runtime-based "dotnet" host to build the runtime libs.tests target currently fails with an error while building the System.Reflection.Metadata.ApplyUpdate test cases.

The actual command that fails is the following:

    uweigand@m8345019:~/runtime$ dotnet ~/.nuget/packages/microsoft.dotnet.hotreload.utils.generator.buildtool/1.0.1-alpha.0.21314.1/tools/net6.0/Microsoft.DotNet.HotReload.Utils.Generator.BuildTool.dll -msbuild:src/libraries/System.Runtime.Loader/tests/ApplyUpdate/System.Reflection.Metadata.ApplyUpdate.Test.MethodBody1/System.Reflection.Metadata.ApplyUpdate.Test.MethodBody1.csproj -script:src/libraries/System.Runtime.Loader/tests/ApplyUpdate/System.Reflection.Metadata.ApplyUpdate.Test.MethodBody1/deltascript.json
    msbuild failed opening project src/libraries/System.Runtime.Loader/tests/ApplyUpdate/System.Reflection.Metadata.ApplyUpdate.Test.MethodBody1/System.Reflection.Metadata.ApplyUpdate.Test.MethodBody1.csproj
    MSBuildWorkspace Failure: Msbuild failed when processing the file '/home/uweigand/runtime/src/libraries/System.Runtime.Loader/tests/ApplyUpdate/System.Reflection.Metadata.ApplyUpdate.Test.MethodBody1/System.Reflection.Metadata.ApplyUpdate.Test.MethodBody1.csproj' with message: The SDK 'Microsoft.DotNet.Arcade.Sdk' specified could not be found.  /home/uweigand/runtime/Directory.Build.props
    failed: failed workspace

Configuration

I'm building natively on s390x-ibm-linux, using a .NET 6 Preview 5 host toolchain rebuilt for s390x from the official sources, plus some extra patches required to support s390x (mostly backports from mainline and pending PRs). Note that this toolchain is based on the Mono runtime, as CoreCLR is not supported on s390x.

Regression?

It is a regression in the sense that the libs.tests target used to build successfully. However, that is simply because the ApplyUpdate tests exposing the problem were added recently; the underlying problem in the Mono loader was already present before.

Other information

The failing Arcade SDK reference is supposed to be resolved via NuGetSdkResolver, which is loaded from ${DOTNET_ROOT}/sdk/6.0.100-preview.5.21302.13/Microsoft.Build.NuGetSdkResolver.dll.

This in turn has a dependency on Newtonsoft.Json, which it uses to parse the global.json file (which contains the list of SDK dependencies). However, loading Newtonsoft.Json fails with:

System.IO.FileNotFoundException: Could not load file or assembly 'Newtonsoft.Json, Version=9.0.0.0, Culture=neutral, PublicKeyToken=30ad4fe6b2a6aeed' or one of its dependencies.

and as a consequence the NuGetSdkResolver lookup fails as well.

During debugging I made the following observations:

A) The command fails with the Mono runtime and the current msbuild
B) The command works with the Mono runtime and older msbuild versions
C) The command works with the CoreCLR runtime and the current msbuild

which I'm looking into in more detail in the following sections.

Why does A happen?

The Mono loader uses the following algorithm to try and load the Newtonsoft.Json assembly (from mono/metadata/assembly.c:netcore_load_reference):

         * 1. Check if it's already loaded by the ALC.
         *
         * 2. If it's a non-default ALC, call the Load() method.
         *
         * 3. If the ALC is not the default and this is not a satellite request,
         *    check if it's already loaded by the default ALC.
         *
         * 4. If we have a bundle registered and this is not a satellite request,
         *    search the images for a matching name.
         *
         * 5. If we have a satellite bundle registered and this is a satellite request,
         *    find the parent ALC and search the images for a matching name and culture.
         *
         * 6. If the ALC is the default or this is not a satellite request,
         *    check the TPA list, APP_PATHS, and ApplicationBase.
         *
         * 7. If this is a satellite request, call the ALC ResolveSatelliteAssembly method.
         *
         * 8. Call the ALC Resolving event.
         *
         * 9. Call the ALC AssemblyResolve event (except for corlib satellite assemblies).
         *
         * 10. Return NULL.
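To make the lookup order concrete, here is a rough Python model of the steps above. It is a simplification, not Mono's actual code: it covers only non-satellite requests with no bundles registered (so steps 4, 5 and 7 are dropped), collapses step 6 into a single `probe` callback, and represents assemblies as path strings. The names `Alc`, `mono_load_reference`, `locator_handler` and the path constant are hypothetical, chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# A handler takes a simple assembly name and returns a path, or None.
Handler = Callable[[str], Optional[str]]


@dataclass
class Alc:
    """Hypothetical stand-in for an AssemblyLoadContext."""
    name: str
    is_default: bool = False
    loaded: Dict[str, str] = field(default_factory=dict)   # simple name -> path
    load: Optional[Handler] = None                          # Load() override
    resolving: List[Handler] = field(default_factory=list)  # Resolving event
    assembly_resolve: List[Handler] = field(default_factory=list)  # AssemblyResolve event


def mono_load_reference(alc: Alc, name: str, default: Alc, probe: Handler) -> Optional[str]:
    """Simplified model of the netcore_load_reference order (non-satellite only)."""
    # 1. Already loaded by this ALC?
    if name in alc.loaded:
        return alc.loaded[name]
    if not alc.is_default:
        # 2. Non-default ALC: call its Load() method.
        if alc.load is not None and (path := alc.load(name)):
            return path
        # 3. Already loaded by the default ALC?
        if name in default.loaded:
            return default.loaded[name]
    # 6. TPA list, APP_PATHS and ApplicationBase, collapsed into one probe.
    if path := probe(name):
        return path
    # 8. Resolving event, raised on the requesting ALC only.
    for handler in alc.resolving:
        if path := handler(name):
            return path
    # 9. AssemblyResolve event.
    for handler in alc.assembly_resolve:
        if path := handler(name):
            return path
    # 10. Not found.
    return None


NEWTONSOFT = "${DOTNET_ROOT}/sdk/6.0.100-preview.5.21302.13/Newtonsoft.Json.dll"


def locator_handler(name: str) -> Optional[str]:
    # Stand-in for the Resolving handler that MSBuildLocator installs on the default ALC.
    return NEWTONSOFT if name == "Newtonsoft.Json" else None


default_alc = Alc("Default", is_default=True, resolving=[locator_handler])
msbuild_alc = Alc("MSBuildLoadContext", load=lambda name: None)
no_probe = lambda name: None

# Request through the non-default ALC: the default ALC's handler is never consulted.
print(mono_load_reference(msbuild_alc, "Newtonsoft.Json", default_alc, no_probe))  # None
# Request through the default ALC: its own Resolving handler finds the file.
print(mono_load_reference(default_alc, "Newtonsoft.Json", default_alc, no_probe))
```

The first call corresponds to scenario A (the request arrives through MSBuildLoadContext, which has no handlers of its own, so it falls through to step 10); the second corresponds to scenario B, where the request arrives through the default ALC.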

Step 1 fails since the assembly is not loaded yet. Note that Microsoft.Build.NuGetSdkResolver.dll was loaded by Microsoft.Build.Shared.CoreClrAssemblyLoader into a non-default ALC, which is then also used for its dependencies, including Newtonsoft.Json.

Step 2 also fails. While the non-default ALC does have a Load method (Microsoft.Build.Shared.MSBuildLoadContext:Load), none of the three probes in that method manages to load the assembly:

            if (WellKnownAssemblyNames.Contains(assemblyName.Name!))

Newtonsoft.Json is not on the well-known list.

            foreach (var cultureSubfolder in string.IsNullOrEmpty(assemblyName.CultureName)

This does find ${DOTNET_ROOT}/sdk/6.0.100-preview.5.21302.13/Newtonsoft.Json.dll, but the file is rejected because of a version mismatch (version 9 requested vs. version 12 on disk).

            if (FileSystems.Default.FileExists(assemblyNameInExecutableDirectory))

This looks for ${DOTNET_ROOT}/sdk/6.0.100-preview.5.21302.13/Newtonsoft.Json (without the .dll suffix), which doesn't exist.
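The three probes can be modeled roughly as follows. This is a hedged Python sketch of the behavior described above, not MSBuild's actual C# code: the exact-version comparison, the culture-subfolder order, and the assumption that well-known names are handed off to the default ALC are inferred rather than taken from the source. The `files` dictionary is a hypothetical stand-in for the file system (path to assembly version), and the well-known set is an illustrative subset.

```python
import posixpath
from typing import Dict, Optional, Set, Tuple

Version = Tuple[int, int, int, int]


def msbuild_context_load(name: str, version: Version, culture: str,
                         well_known: Set[str], search_dir: str, tools_dir: str,
                         files: Dict[str, Version]) -> Optional[str]:
    """Hypothetical model of the three probes in MSBuildLoadContext:Load."""
    # Probe 1: well-known assemblies are (assumed to be) handed off to the default ALC.
    if name in well_known:
        return "default-alc:" + name
    # Probe 2: the search directory (culture subfolder first for satellites);
    # a candidate is accepted only if its version matches exactly.
    for sub in ([culture, ""] if culture else [""]):
        candidate = posixpath.join(search_dir, sub, name + ".dll")
        if files.get(candidate) == version:
            return candidate
    # Probe 3: the simple name in the MSBuild tools directory, WITHOUT a
    # ".dll" suffix, so a normal assembly file never matches here.
    candidate = posixpath.join(tools_dir, name)
    if candidate in files:
        return "default-alc:" + candidate
    return None


sdk = "${DOTNET_ROOT}/sdk/6.0.100-preview.5.21302.13"
files = {sdk + "/Newtonsoft.Json.dll": (12, 0, 0, 0)}          # illustrative version
well_known = {"Microsoft.Build", "Microsoft.Build.Framework"}  # illustrative subset

# The request from the issue: version 9 is asked for, version 12 is on disk.
print(msbuild_context_load("Newtonsoft.Json", (9, 0, 0, 0), "", well_known, sdk, sdk, files))  # None
```

With these inputs every probe misses: the name is not well known, the only candidate file has the wrong version, and the suffix-less path does not exist.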

Step 3 also fails, since the assembly isn't in the default ALC either.

Steps 4 and 5 do not apply.

Step 6 fails, since:

  • Newtonsoft.Json is not on the Trusted Platform Assembly list.
  • APP_PATHS is not set.
  • ApplicationBase is ~/.nuget/packages/microsoft.dotnet.hotreload.utils.generator.buildtool/1.0.1-alpha.0.21314.1/tools/net6.0/ and there is no copy of Newtonsoft.Json in that directory.
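Step 6 for a non-satellite request can be pictured as the small probe below. It is a simplified Python sketch, not Mono's implementation: it assumes TPA entries are matched by file name and that APP_PATHS and ApplicationBase are searched for `<name>.dll`, and it uses a hypothetical `existing` set in place of the file system. `probe_step6` and the TPA path are illustrative names only.

```python
import posixpath
from typing import Iterable, Optional, Set


def probe_step6(name: str, tpa: Iterable[str], app_paths: Iterable[str],
                app_base: str, existing: Set[str]) -> Optional[str]:
    """Hypothetical model of step 6: TPA list, then APP_PATHS, then ApplicationBase."""
    file_name = name + ".dll"
    # TPA list: full paths to assemblies, matched here by file name.
    for path in tpa:
        if posixpath.basename(path) == file_name:
            return path
    # APP_PATHS: extra directories, searched in order (empty when unset).
    for directory in app_paths:
        candidate = posixpath.join(directory, file_name)
        if candidate in existing:
            return candidate
    # ApplicationBase: the directory containing the application being run.
    candidate = posixpath.join(app_base, file_name)
    return candidate if candidate in existing else None


tpa = ["${DOTNET_ROOT}/shared/Microsoft.NETCore.App/<version>/System.Runtime.dll"]  # illustrative
app_base = ("~/.nuget/packages/microsoft.dotnet.hotreload.utils.generator.buildtool/"
            "1.0.1-alpha.0.21314.1/tools/net6.0")
existing = {app_base + "/Microsoft.DotNet.HotReload.Utils.Generator.BuildTool.dll"}

# Newtonsoft.Json is in none of the three locations, so step 6 fails.
print(probe_step6("Newtonsoft.Json", tpa, [], app_base, existing))  # None
```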

Step 7 does not apply. Neither do steps 8 or 9, since the non-default ALC does not have any handlers for either the Resolving or the AssemblyResolve event.

Why does B happen?

Before some recent changes in Microsoft.Build.Shared.CoreClrAssemblyLoader (dotnet/msbuild@bfdb787), Microsoft.Build.NuGetSdkResolver.dll was loaded into the default ALC, which was then also used to search for Newtonsoft.Json. For the most part, the search proceeds the same as above. However, in Step 8, there is a difference: the default ALC does have a Resolving event handler provided by MSBuildLocator, which is installed at startup by Microsoft.DotNet.HotReload.Utils.Generator.

This Resolving event handler finds and successfully loads Newtonsoft.Json.

(This handler is not reached in scenario A above, because it is registered only on the default ALC, not on the MSBuildLoadContext ALC, and the Mono loader invokes the event only on the latter.)

Why does C happen?

The CoreCLR loader uses a somewhat different algorithm, documented as follows (in CLRPrivBinderAssemblyLoadContext::BindAssemblyByName):

    // 1) Lookup the assembly within the LoadContext itself. If assembly is found, use it.
    // 2) Invoke the LoadContext's Load method implementation. If assembly is found, use it.
    // 3) Lookup the assembly within TPABinder (except for satellite requests). If assembly is found, use it.
    // 4) Invoke the LoadContext's ResolveSatelliteAssembly method (for satellite requests). If assembly is found, use it.
    // 5) Invoke the LoadContext's Resolving event. If assembly is found, use it.
    // 6) Raise exception.

Steps 1 and 2 correspond to steps 1 and 2 in the Mono algorithm (and just like with Mono do not find Newtonsoft.Json).

Step 3 is interesting in that it effectively performs steps 1-5 recursively, just on the TPABinder context instead of the application LoadContext. Step 1 of that recursive invocation seems to mostly correspond to steps 3-6 of the Mono algorithm (and likewise does not find Newtonsoft.Json).

Steps 2 and 3 of the recursive invocation are skipped, and step 4 doesn't apply here. However, step 5 is interesting: this step is executed, and it calls the Resolving event handler in the TPABinder context, which uses the default ALC. This means the handler registered by MSBuildLocator does get control, and successfully loads Newtonsoft.Json.
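The CoreCLR order, including the recursive lookup in the TPA binder, can be modeled in the same style. This is a hedged Python sketch of the walkthrough above rather than CoreCLR's C++ code: it handles only non-satellite requests, treats the TPA binder's own lookup as a single `probe` callback, and uses hypothetical names (`Context`, `tpa_bind`, `coreclr_bind`, `locator_handler`).

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

Handler = Callable[[str], Optional[str]]


@dataclass
class Context:
    """Hypothetical stand-in for an AssemblyLoadContext."""
    name: str
    loaded: Dict[str, str] = field(default_factory=dict)
    load: Optional[Handler] = None
    resolving: List[Handler] = field(default_factory=list)


def raise_resolving(ctx: Context, name: str) -> Optional[str]:
    # Invoke the context's Resolving handlers until one returns a path.
    for handler in ctx.resolving:
        if path := handler(name):
            return path
    return None


def tpa_bind(default: Context, name: str, probe: Handler) -> Optional[str]:
    # Recursive invocation on the TPA binder (default ALC): step 1 is the
    # context/TPA lookup, steps 2-4 are skipped or do not apply, and
    # step 5 raises the DEFAULT context's Resolving event.
    if name in default.loaded:
        return default.loaded[name]
    if path := probe(name):
        return path
    return raise_resolving(default, name)


def coreclr_bind(ctx: Context, name: str, default: Context, probe: Handler) -> str:
    # 1) Lookup within the LoadContext itself.
    if name in ctx.loaded:
        return ctx.loaded[name]
    # 2) Invoke the LoadContext's Load method.
    if ctx.load is not None and (path := ctx.load(name)):
        return path
    # 3) Lookup within the TPA binder (non-satellite requests).
    if path := tpa_bind(default, name, probe):
        return path
    # 4) ResolveSatelliteAssembly: omitted, satellite requests are not modeled.
    # 5) Invoke the LoadContext's own Resolving event.
    if path := raise_resolving(ctx, name):
        return path
    # 6) Raise exception.
    raise FileNotFoundError(f"Could not load file or assembly '{name}'")


NEWTONSOFT = "${DOTNET_ROOT}/sdk/6.0.100-preview.5.21302.13/Newtonsoft.Json.dll"


def locator_handler(name: str) -> Optional[str]:
    # Stand-in for the MSBuildLocator handler registered on the default ALC.
    return NEWTONSOFT if name == "Newtonsoft.Json" else None


default_ctx = Context("Default", resolving=[locator_handler])
msbuild_ctx = Context("MSBuildLoadContext", load=lambda name: None)

# Scenario C: the default ALC's handler runs during step 3, so the load succeeds.
print(coreclr_bind(msbuild_ctx, "Newtonsoft.Json", default_ctx, lambda name: None))
```

The key difference from the Mono model is that `tpa_bind` ends by raising the default context's Resolving event, so a handler registered only there still gets a chance to answer a request that originated in a non-default ALC.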

Conclusion

So in a nutshell, the difference between Mono and CoreCLR seems to be that Mono invokes the Resolving event only in the application ALC, while CoreCLR invokes it twice, first in the default ALC and then in the application ALC.

Changing the Mono loader to likewise try invoking the Resolving event in the default ALC first (when loading a non-satellite assembly into a non-default ALC) fixes the original problem. I'll post a PR proposing this change shortly.
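A rough model of the proposed change, in the same simplified Python style: for a non-satellite request into a non-default ALC, the default ALC's Resolving event is raised before the requesting ALC's own event. This sketches the idea only; it is not the actual patch, and the names (`Alc`, `mono_load_reference_patched`, `locator_handler`) are hypothetical. Satellite handling, bundles, and the AssemblyResolve event (step 9) are omitted.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

Handler = Callable[[str], Optional[str]]


@dataclass
class Alc:
    """Hypothetical stand-in for an AssemblyLoadContext."""
    name: str
    is_default: bool = False
    loaded: Dict[str, str] = field(default_factory=dict)
    load: Optional[Handler] = None
    resolving: List[Handler] = field(default_factory=list)


def raise_resolving(alc: Alc, name: str) -> Optional[str]:
    # Invoke the ALC's Resolving handlers until one returns a path.
    for handler in alc.resolving:
        if path := handler(name):
            return path
    return None


def mono_load_reference_patched(alc: Alc, name: str, default: Alc,
                                probe: Handler) -> Optional[str]:
    """Non-satellite requests only; steps 4, 5, 7 and 9 omitted."""
    if name in alc.loaded:                                        # step 1
        return alc.loaded[name]
    if not alc.is_default:
        if alc.load is not None and (path := alc.load(name)):     # step 2
            return path
        if name in default.loaded:                                # step 3
            return default.loaded[name]
    if path := probe(name):                                       # step 6
        return path
    # Proposed addition: raise the default ALC's Resolving event first,
    # mirroring what CoreCLR does through its TPA binder.
    if not alc.is_default and (path := raise_resolving(default, name)):
        return path
    return raise_resolving(alc, name)                             # step 8 (None = step 10)


NEWTONSOFT = "${DOTNET_ROOT}/sdk/6.0.100-preview.5.21302.13/Newtonsoft.Json.dll"


def locator_handler(name: str) -> Optional[str]:
    # Stand-in for the MSBuildLocator handler on the default ALC.
    return NEWTONSOFT if name == "Newtonsoft.Json" else None


default_alc = Alc("Default", is_default=True, resolving=[locator_handler])
msbuild_alc = Alc("MSBuildLoadContext", load=lambda name: None)

# With the change, scenario A now resolves Newtonsoft.Json like CoreCLR does.
print(mono_load_reference_patched(msbuild_alc, "Newtonsoft.Json", default_alc, lambda name: None))
```

Raising the default ALC's event first (rather than after the requesting ALC's own event) matches the order observed for CoreCLR in the conclusion above.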
