Skip to content

New code to perform managed <-> java lookups (typemap)#3992

Merged
jonpryor merged 1 commit intodotnet:masterfrom
grendello:new-typemap
Feb 10, 2020
Merged

New code to perform managed <-> java lookups (typemap)#3992
jonpryor merged 1 commit intodotnet:masterfrom
grendello:new-typemap

Conversation

@grendello
Copy link
Copy Markdown
Contributor

@grendello grendello commented Dec 4, 2019

Xamarin.Assembly needs to "translate" managed types to Java types and
vice versa in order to provide a bridge between the two world. So far it
has been done using a straightforward (and fast) method of performing
the lookups - all the type pairs were stored in two tables of the same
size, with all type names padded to the width of the longest name so
that the bsearch C function can be used to quickly perform a binary
search over the data set. This approach works very well at the expense
of data size (shorter strings are 0-padded to the maximum width) and a
slightly degraded performace because of the requirement to perform
string comparisons. Furthermore, the lookup required that reflection is
used to obtain full managed type name (when translating from managed to
Java) or to get a Type instance from type name (when translating from
Java to managed).

For Release builds all the above data is placed in the
libxamarin-app.so library, for Debug builds it is also placed in two
files - one for each direction of lookup, described above.

This commit is a slight improvement over the above scheme. It eliminates
reflection from the process by using managed type tokens (which are
integers) and using UUID/Guid of the module in which the type is found.
This allows us to perform the binary search over the set of 20 bytes (16
bytes for the UUID and 4 bytes for the token ID) for managed to Java
lookups and a single string comparison + binary search over a set of
integers for the Java to managed lookup.

Java type names must still be used because Java doesn't provide any
equivalent to the .NET's type token and module UUID. Those names are
still 0-padded to the width of the longest name but there are no longer
duplicated. Managed type names are eliminated completely.

If Xamarin.Android Instant Run is not used (which is the case for OSS
code) for Debug builds, the operation is performed in the same way for
both Release and Debug builds. If, however, Instant Run is in effect,
the type maps are stored in several files with the .typemap extension -
one per module. The files contain both the Java to managed maps as
well as managed to Java maps (which use indexes into the Java to managed
maps). All of those files are loaded during Debug app startup and used
to construct a dataset which is the searched during all the lookups.

Typemap index file format, all data is little-endian:

Header format

[Magic string] # XATI
[Format version] # 32-bit unsigned integer, 4 bytes
[Entry count] # 32-bit unsigned integer, 4 bytes
[Module file name width] # 32-bit unsigned integer, 4 bytes
[Index entries] # Format described below, Entry count entries

Index entry format:

[Module UUID][File name]<NUL>

Where:

[Module UUID] is 16 bytes long
[File name] is right-padded with <NUL> characters to the [Module file name width] boundary.

Typemap file format, all data is little-endian:

Header format

[Magic string] # XATM
[Format version] # 32-bit integer, 4 bytes
[Module UUID] # 16 bytes
[Entry count] # unsigned 32-bit integer, 4 bytes
[Duplicate count] # unsigned 32-bit integer, 4 bytes (might be 0)
[Java type name width] # unsigned 32-bit integer, 4 bytes
[Assembly name size] # unsigned 32-bit integer, 4 bytes
[Assembly name] # Non-null terminated assembly name
[Java-to-managed map] # Format described below, [Entry count] entries
[Managed-to-java map] # Format described below, [Entry count] entries
[Managed-to-java duplicates map] # Map of unique managed IDs which point to the same Java type name (might be empty)

Java-to-managed map format:

[Java type name]<NUL>[Managed type token ID]

Each name is padded with <NUL> to the width specified in the [Java type name width] field above.
Names are written without the size prefix, instead they are always terminated with a nul character
to make it easier and faster to handle by the native runtime.

Each token ID is an unsigned 32-bit integer, 4 bytes

Managed-to-java map format:

[Managed type token ID][Java type name table index]

Both fields are unsigned 32-bit integers, to a total of 8 bytes per entry. Index points into the
[Java-to-managed map] table above.

Managed-to-java duplicates map format:

Format is identical to [Managed-to-java] above.

Size changes (XF integration test, libxamarin-app.so, Release build):

  • armeabi-v7a
    • before: 376616
      • after: 97860
  • arm64-v8a
    • before: 377408
      • after: 104192
  • x86
    • before: 376424
    • after: 97604

Performance changes (XF integration test, Release build):

    Device name: **Pixel 3 XL**

Device architecture: arm64-v8a
Number of test runs: 10

Native to managed Runtime init Displayed Notes
master 141.102 160.606 839.80 preload enabled; 32-bit build
this commit 134.539 154.701 836.10
master 141.743 158.325 837.20 preload disabled; 32-bit build
this commit 134.064 149.137 831.90
master 134.526 152.640 805.10 preload enabled; 64-bit build
this commit 126.376 143.226 788.60
master 134.049 149.543 779.40 preload disabled; 64-bit build
this commit 124.847 139.227 776.10

Build performance (Release build):

Before

389 ms  GenerateJavaStubs                          1 calls

After

247 ms  GenerateJavaStubs                          1 calls

New code generates only native assembly or only the binary typemap
files, unlike the old code which generated both. Initially the new
generator code was moved to a separate task, but Jonathan Peppers
determined that it was suboptimal and re-integrated the code back with
GenerateJavaStubs

@grendello grendello added the do-not-merge PR should not be merged. label Dec 4, 2019
@grendello grendello force-pushed the new-typemap branch 6 times, most recently from 8bf4da4 to cc5e0e5 Compare December 9, 2019 10:08
@grendello grendello force-pushed the new-typemap branch 4 times, most recently from 6f93cf4 to 690efa6 Compare January 13, 2020 19:46
@grendello grendello force-pushed the new-typemap branch 3 times, most recently from 5cdfa14 to 6cc5192 Compare January 14, 2020 21:09
@grendello grendello changed the title [WIP] New typemap New code to perform managed <-> java lookups (typemap) Jan 14, 2020
@grendello grendello added full-mono-integration-build For PRs; run a full build (~6-10h for mono bumps), not the faster PR subset (~2h for mono bumps) and removed do-not-merge PR should not be merged. labels Jan 14, 2020
@grendello grendello self-assigned this Jan 14, 2020
Copy link
Copy Markdown
Contributor

@dellis1972 dellis1972 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks ok. One formatting issue, and we need the performance data. Other than that 👍

@grendello
Copy link
Copy Markdown
Contributor Author

/azp run

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 1 pipeline(s).

@grendello
Copy link
Copy Markdown
Contributor Author

/azp run

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 1 pipeline(s).

@grendello
Copy link
Copy Markdown
Contributor Author

Please don't merge yet, I need to test fastdev mode tomorrow after the changes

@jonathanpeppers
Copy link
Copy Markdown
Member

Ok, great so much better!

Build_CSharp_Change:
Exceeded expected time of 4100ms, actual 4160ms
Build_AndroidManifest_Change:
Exceeded expected time of 4250ms, actual 4430ms
Build_JLO_Change:
Exceeded expected time of 8150ms, actual 8630ms

You can bump three of these numbers:
https://github.com/xamarin/xamarin-android/blob/476921a7e526ffd522245de2bcfef93f1a91fd0a/tests/msbuild-times-reference/MSBuildDeviceIntegration.csv#L6-L10

Maybe:

-Build_CSharp_Change,4100
+Build_CSharp_Change,4200
-Build_AndroidManifest_Change,4250
+Build_AndroidManifest_Change,4500
-Build_JLO_Change,8150
+Build_JLO_Change,8700

@grendello
Copy link
Copy Markdown
Contributor Author

OK, everything checks out locally. I applied the changes requested in #3992 (comment), so if this build is green-ish, I think the PR's ready for merging

{
const char *e = reinterpret_cast<const char*> (bsearch (name, map, header.entry_count, header.entry_length, TypeMappingInfo_compare_key ));
if (e == nullptr)
// This comes from the app, so let's be civil
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if it comes from the app, shouldn't we just SIGSEGV?

...and why am I having a strong sense of deja vu writing that comment?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

...perhaps the issue is I'm misinterpreting "the app", and this could be "not us user code"?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The "user code" ("something outside XA runtime") may, somehow, pass a null string to e.g. typemap_java_to_managed and I see no reason to crash in this case. If "they" want to crash, that's OK, but since it's an outside factor from our POV, we should recover gracefully.

}

constexpr size_t size = sizeof(Entry);
while (nmemb > 0) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nmemb is a size_t, meaning it's unsigned, so the only way this loop will terminate is when it reaches 0, and only if it reaches zero.

I cannot easily determine if e.g. nmemb -= nmemb / 2 + 1 could ever result in underflow -- never mind the rest of this loop --resulting in an infinite loop.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will underflow only if nmemb == 0, but that's the loop termination condition so it won't happen.

} else {
base = ret + 1;
}
nmemb -= nmemb / 2 + 1;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there some gcc/clang builtin to crash the process if this results in underflow?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It cannot underflow. The lowest value nmemb can reach is 1, and in this case nmemb / 2 + 1 will yield a value of 1 which when subtracted from nmemb will give us 0, which will terminate the loop

@grendello
Copy link
Copy Markdown
Contributor Author

/azp run

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 1 pipeline(s).

Xamarin.Assembly needs to "translate" managed types to Java types and
vice versa in order to provide a bridge between the two world. So far it
has been done using a straightforward (and fast) method of performing
the lookups - all the type pairs were stored in two tables of the same
size, with all type names padded to the width of the longest name so
that the `bsearch` C function can be used to quickly perform a binary
search over the data set. This approach works very well at the expense
of data size (shorter strings are 0-padded to the maximum width) and a
slightly degraded performace because of the requirement to perform
string comparisons. Furthermore, the lookup required that reflection is
used to obtain full managed type name (when translating from managed to
Java) or to get a `Type` instance from type name (when translating from
Java to managed).

For Release builds all the above data is placed in the
`libxamarin-app.so` library, for Debug builds it is also placed in two
files - one for each direction of lookup, described above.

This commit is a slight improvement over the above scheme. It eliminates
reflection from the process by using managed type tokens (which are
integers) and using UUID/Guid of the module in which the type is found.
This allows us to perform the binary search over the set of 20 bytes (16
bytes for the UUID and 4 bytes for the token ID) for managed to Java
lookups and a single string comparison + binary search over a set of
integers for the Java to managed lookup.

Java type names must still be used because Java doesn't provide any
equivalent to the .NET's type token and module UUID. Those names are
still 0-padded to the width of the longest name but there are no longer
duplicated. Managed type names are eliminated completely.

If Xamarin.Android Instant Run is not used (which is the case for OSS
code) for Debug builds, the operation is performed in the same way for
both Release and Debug builds. If, however, Instant Run is in effect,
the type maps are stored in several files with the .typemap extension -
one per **module**. The files contain both the Java to managed maps as
well as managed to Java maps (which use indexes into the Java to managed
maps). All of those files are loaded during Debug app startup and used
to construct a dataset which is the searched during all the lookups.

Typemap index file format, all data is little-endian:
----

 **Header format**

 `[Magic string]`             # XATI
 `[Format version]`           # 32-bit unsigned integer, 4 bytes
 `[Entry count]`              # 32-bit unsigned integer, 4 bytes
 `[Module file name width]`   # 32-bit unsigned integer, 4 bytes
 `[Index entries]`            # Format described below, `Entry count` entries

 **Index entry format:**

 `[Module UUID][File name]<NUL>`

  *Where:*

 `[Module UUID]` is 16 bytes long
 `[File name]` is right-padded with `<NUL>` characters to the `[Module file name width]` boundary.

Typemap file format, all data is little-endian:
----

 **Header format**

 `[Magic string]`                    # XATM
 `[Format version]`                  # 32-bit integer, 4 bytes
 `[Module UUID]`                     # 16 bytes
 `[Entry count]`                     # unsigned 32-bit integer, 4 bytes
 `[Duplicate count]`                 # unsigned 32-bit integer, 4 bytes (might be 0)
 `[Java type name width]`            # unsigned 32-bit integer, 4 bytes
 `[Assembly name size]`              # unsigned 32-bit integer, 4 bytes
 `[Assembly name]`                   # Non-null terminated assembly name
 `[Java-to-managed map]`             # Format described below, `[Entry count]` entries
 `[Managed-to-java map]`             # Format described below, `[Entry count]` entries
 `[Managed-to-java duplicates map]`  # Map of unique managed IDs which point to the same Java type name (might be empty)

 **Java-to-managed map format:**

 `[Java type name]<NUL>[Managed type token ID]`

 Each name is padded with `<NUL>` to the width specified in the `[Java type name width]` field above.
 Names are written without the size prefix, instead they are always terminated with a nul character
 to make it easier and faster to handle by the native runtime.

 Each token ID is an unsigned 32-bit integer, 4 bytes

 **Managed-to-java map format:**

 `[Managed type token ID][Java type name table index]`

 Both fields are unsigned 32-bit integers, to a total of 8 bytes per entry. Index points into the
 `[Java-to-managed map]` table above.

 **Managed-to-java duplicates map format:**

 Format is identical to `[Managed-to-java]` above.

Size changes (XF integration test, `libxamarin-app.so`, Release build):
----

  - armeabi-v7a
	  - before: 376616
		-  after: 97860
  - arm64-v8a
	  - before: 377408
		-  after: 104192
  - x86
	  - before: 376424
    -  after: 97604

Performance changes (XF integration test, Release build):
----

        Device name: **Pixel 3 XL**
Device architecture: **arm64-v8a**
Number of test runs: **10**

|                 | **Native to managed**  | **Runtime init** | **Displayed** | **Notes**                      |
|-----------------|------------------------|------------------|---------------|--------------------------------|
| **master**      | 141.102                | 160.606          | 839.80        |  preload enabled; 32-bit build |
| **this commit** | 134.539                | 154.701          | 836.10        |  |
| **master**      | 141.743                | 158.325          | 837.20        | preload disabled; 32-bit build |
| **this commit** | 134.064                | 149.137          | 831.90        |  |
| **master**      | 134.526                | 152.640          | 805.10        |  preload enabled; 64-bit build |
| **this commit** | 126.376                | 143.226          | 788.60        |  |
| **master**      | 134.049                | 149.543          | 779.40        | preload disabled; 64-bit build |
| **this commit** | 124.847                | 139.227          | 776.10        |  |

Build performance (**Release** build):
----

**Before**

    389 ms  GenerateJavaStubs                          1 calls

**After**

    247 ms  GenerateJavaStubs                          1 calls

New code generates only native assembly or only the binary typemap
files, unlike the old code which generated both. Initially the new
generator code was moved to a separate task, but Jonathan Peppers
determined that it was suboptimal and re-integrated the code back with
`GenerateJavaStubs`
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

full-mono-integration-build For PRs; run a full build (~6-10h for mono bumps), not the faster PR subset (~2h for mono bumps) use-rebase-and-merge Normally we squash-and-merge PRs. Use this label so we instead rebase & merge.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants