-
Notifications
You must be signed in to change notification settings - Fork 766
Use xxhash for composite checker keys instead of strings #2288
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR replaces string-based map keys with xxhash-based (128-bit) keys to reduce memory allocations. The change affects various type caches throughout the checker, converting from human-readable string keys to opaque hash values. The PR achieves a 35% reduction in memory allocations with minimal performance impact.
Key changes:
- Replaced
KeyBuilder(string builder) withkeyBuilder(hash builder) using xxh3 - Updated all internal type cache maps to use
xxh3.Uint128as keys instead of strings - Modified
getRelationKeyto return both a hash and a boolean indicating whether type parameters are constrained
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| internal/checker/types.go | Updated map type signatures to use xxh3.Uint128 keys for type instantiations, conditional types, and index symbol caches |
| internal/checker/relater.go | Changed relation cache to use hash keys and updated getRelationKey to return constrained flag separately |
| internal/checker/flow.go | Replaced flow reference key from string to hash, added refKeyValid boolean flag to track validity instead of checking for "?" sentinel value |
| internal/checker/checker.go | Rewrote key builder implementation to hash data instead of building strings; updated all type cache maps and special signature keys to use hashes |
|
In general, in places where we now write a fixed number of bytes (e.g. four bytes for a type ID), we don't need separators between multiple consecutive IDs. |
| var ( | ||
| SignatureKeyErased = CacheHashKey(xxh3.HashString128("-")) | ||
| SignatureKeyCanonical = CacheHashKey(xxh3.HashString128("*")) | ||
| SignatureKeyBase = CacheHashKey(xxh3.HashString128("#")) | ||
| SignatureKeyInner = CacheHashKey(xxh3.HashString128("<")) | ||
| SignatureKeyOuter = CacheHashKey(xxh3.HashString128(">")) | ||
| SignatureKeyImplementation = CacheHashKey(xxh3.HashString128("+")) | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These would be better as functions returning the structs, to actually be constants. It's unfortunate to not have const structs.
| b.WriteByte(base64chars[value&0x3F]) | ||
| value >>= 6 | ||
| } | ||
| type CacheHashKey xxh3.Uint128 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bikeshed, but could also be CompositeMapKey
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.
| // The result is an empty string if the reference isn't a dotted name. | ||
| func (c *Checker) getFlowReferenceKey(f *FlowState) string { | ||
| var b KeyBuilder | ||
| // The second return value indicates whether the reference is a valid dotted name. |
Copilot
AI
Dec 16, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment states "The second return value indicates whether the reference is a valid dotted name" but the function only returns a single value of type CacheHashKey. The comment should be updated to clarify that an invalid dotted name is indicated by a zero CacheHashKey value, not a second return value.
Often when I see people's profiles, the top allocator is
WriteByte, all from the checker building runtime map keys.We stringify data to use as map keys for union types, intersection types, etc. In JS, this is cheap thanks to the "rope" optimization where the strings are never actually produced unless required (which does not include
Maplookups!).But in Go, this doesn't work so well because we do actually have to back strings with real fully-written memory, and it's not easy to do a lookup otherwise. There's no magic string type that can expand itself when read. A Go stdlib map that allows custom hashing/equality is seemingly not coming soon.
An alternative is to just hash the data and use that hash as a key. A while ago I had tried use
sha256to do this (like gopls does), but did not see much success. And, introducingcryptointo the binary is a can of worms I don't want to open. So, I gave up on that idea.We have since then added xxhash to our dependencies for incremental mode, the LSP, etc.
A 128-bit xxhash could be argued to have enough collision resistance; UUIDs are of course also 128-bit. What are the chances of a UUID-level collision happening within a single compile? 😅
This PR tries that out, swapping our key builder out for one that hashes the data as it comes in instead.
The effect seems pretty good. VS Code before:
And after:
The time in general seems unchanged, but with 35% fewer allocations.
Downside is of course that the map keys become unreadable and irreversible. Right now, they're only "unreadable" thanks to us making numbers into shorter strings. We could have a mode which preserves the data behind a build tag. Not sure about that.
Thankfully, these keys are never actually exposed through any API, so should be opaque enough.