Implement computed goto dispatch for the CoreCLR interpreter #129216
Open
BrzVlad wants to merge 3 commits into
Member
Author
/azp run runtime-interpreter
Azure Pipelines successfully started running 1 pipeline(s).
Member
Author
/azp run runtime-libraries-interpreter
Azure Pipelines successfully started running 1 pipeline(s).
Member
Author
[Figure: assembly before and assembly after]
janvorli
approved these changes
Jun 17, 2026
1f2bd0a to e39e619
Member
Author
/azp run runtime-interpreter
No pipelines are associated with this pull request.
Member
Author
/azp run runtime-interpreter
Azure Pipelines successfully started running 1 pipeline(s).
Member
Author
/azp run runtime-libraries-interpreter
Azure Pipelines successfully started running 1 pipeline(s).
This was referenced Jun 17, 2026
Open
Comment on lines +854 to +857

    #if USE_COMPUTED_GOTO
    #define INTOP_CASE(x) LABEL_ ## x:
    #define INTOP_DISPATCH(op) opcode = (uint32_t)(op); goto *s_dispatchTable[opcode]
    #define INTOP_NEXT INTOP_DISPATCH(ip[0])
Comment on lines +1417 to +1421

    static const void* const s_dispatchTable[] = {
    #define OPDEF(a,b,c,d,e,f) &&LABEL_ ## a,
    #include "intops.def"
    #undef OPDEF
    };
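The table above is generated by expanding one opcode list into label addresses. A minimal, self-contained sketch of that pattern follows. It assumes GCC or Clang (labels-as-values is a compiler extension), and every name in it (`DEMO_OPS`, `AS_ENUM`, `AS_LABEL`, `demo_run`) is hypothetical; an inline macro list stands in for the external `intops.def` file.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical opcode list standing in for an external .def file. */
#define DEMO_OPS(OP) \
    OP(DEMO_NOP)     \
    OP(DEMO_INC)     \
    OP(DEMO_RET)

#define AS_ENUM(name)  name,
#define AS_LABEL(name) &&LABEL_##name,

/* Expansion 1: opcode numbering. */
enum demo_op { DEMO_OPS(AS_ENUM) DEMO_OP_COUNT };

/* Executes a byte-coded program and returns how many DEMO_INC ops ran. */
int demo_run(const unsigned char *code)
{
    /* Expansion 2: one label address per opcode, in the same order. */
    static const void *const table[] = { DEMO_OPS(AS_LABEL) };

    /* Holds by construction here; it would fire if the enum and the table
       were ever generated from different lists. */
    _Static_assert(sizeof(table) / sizeof(table[0]) == DEMO_OP_COUNT,
                   "dispatch table and opcode enum are out of sync");

    int acc = 0;
    const unsigned char *ip = code;

/* Load the opcode at ip and jump to its handler. */
#define NEXT() do { assert(*ip < DEMO_OP_COUNT); goto *table[*ip]; } while (0)

    NEXT();
LABEL_DEMO_NOP:
    ip++;
    NEXT();
LABEL_DEMO_INC:
    acc++;
    ip++;
    NEXT();
LABEL_DEMO_RET:
    return acc;
#undef NEXT
}
```

Because each table entry takes the address of a label, an opcode that appears in the list without a matching handler label is a compile error rather than a silent fall-through to a default case. Conditionally compiled opcodes therefore need the same preprocessor condition on the list entry and on the handler.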
Comment on lines +79 to +83

    #ifdef PERFTRACING_DISABLE_THREADS
    OPDEF(INTOP_PROF_SAMPLEPOINT, "prof.samplepoint", 1, 0, 0, InterpOpNoArgs)
    #endif

    #if defined(TARGET_BROWSER) && defined(PERFTRACING_DISABLE_THREADS)
Instead of having each opcode jump back to the start of the loop, where we switch on the opcode, we turn every switch case into a label and let the compiler statically populate s_dispatchTable, which maps each opcode to its label address. Opcode dispatch then becomes a simple load plus an indirect branch to the label address taken from this table. This makes the interpreter 25% faster.
It is unclear whether this has any impact on wasm. Likely not, because wasm has control flow limitations that make arbitrary branches impossible.
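The dispatch scheme described above can be sketched with a toy stack machine. This is an illustration only, assuming GCC or Clang for the `&&label` and `goto *` extensions; the opcodes, the `run` function, and the `NEXT` macro are hypothetical and are not the interpreter's real definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical opcodes for illustration; not the real CoreCLR intops. */
enum { OP_PUSH, OP_ADD, OP_MUL, OP_HALT, OP_COUNT };

/* Runs `code` and returns the value on top of the stack at OP_HALT. */
int32_t run(const int32_t *code)
{
    /* One label address per opcode, in enum order. */
    static const void *const dispatch[OP_COUNT] = {
        &&L_PUSH, &&L_ADD, &&L_MUL, &&L_HALT
    };
    int32_t stack[16];
    int sp = 0;
    const int32_t *ip = code;

/* Dispatch is a table load plus an indirect branch; there is no shared
   loop head and no switch. The assert is only a debug-time bounds check. */
#define NEXT() do { assert(*ip >= 0 && *ip < OP_COUNT); goto *dispatch[*ip]; } while (0)

    NEXT();
L_PUSH:
    stack[sp++] = ip[1];          /* operand follows the opcode */
    ip += 2;
    NEXT();
L_ADD:
    sp--;
    stack[sp - 1] += stack[sp];
    ip += 1;
    NEXT();
L_MUL:
    sp--;
    stack[sp - 1] *= stack[sp];
    ip += 1;
    NEXT();
L_HALT:
    return stack[sp - 1];
#undef NEXT
}
```

Each handler ends with its own indirect branch, so a program such as `{OP_PUSH, 2, OP_PUSH, 3, OP_ADD, OP_PUSH, 4, OP_MUL, OP_HALT}` is executed without ever returning to a central switch.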
As an additional optimization, we avoid saving `pFrame->ip` for each opcode. Removing this store can improve execution speed by around 3%. Instead, we explicitly save `pFrame->ip` in opcodes that can trigger a GC or throw an exception. An InterpThrow helper is added to ensure `pFrame->ip` is set before all throws.
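A minimal sketch of the idea, under stated assumptions: the instruction pointer lives in a local variable and is written back to the frame only by a throw helper. `DemoFrame`, `demo_throw`, and `demo_exec` are hypothetical stand-ins, not the actual `pFrame` layout or the InterpThrow signature, and a fault flag replaces real exception machinery. GCC or Clang is assumed for computed goto.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical frame; only `ip` mirrors the field named in the text. */
typedef struct {
    const int32_t *ip;   /* last published instruction pointer */
    int faulted;         /* set when an opcode "throws" */
} DemoFrame;

enum { D_PUSH, D_DIV, D_HALT, D_COUNT };

/* Hypothetical throw helper: publishes ip before signalling the fault, so
   the handler can tell which instruction failed. */
static void demo_throw(DemoFrame *frame, const int32_t *ip)
{
    frame->ip = ip;
    frame->faulted = 1;
}

int32_t demo_exec(DemoFrame *frame, const int32_t *code)
{
    static const void *const table[D_COUNT] = { &&L_PUSH, &&L_DIV, &&L_HALT };
    int32_t stack[16];
    int sp = 0;
    const int32_t *ip = code;    /* kept in a local, not stored per opcode */

#define NEXT() do { assert(*ip >= 0 && *ip < D_COUNT); goto *table[*ip]; } while (0)

    NEXT();
L_PUSH:                          /* cannot fault: no store to frame->ip */
    stack[sp++] = ip[1];
    ip += 2;
    NEXT();
L_DIV:                           /* can fault: ip is published via the helper */
    if (stack[sp - 1] == 0) {
        demo_throw(frame, ip);
        return 0;
    }
    sp--;
    stack[sp - 1] /= stack[sp];
    ip += 1;
    NEXT();
L_HALT:
    return stack[sp - 1];
#undef NEXT
}
```

Routing every fault through one helper is what makes it safe to drop the per-opcode store: an opcode cannot raise an error without the helper first recording where it happened. The sketch covers only the throw case; opcodes that can trigger a GC would need the same explicit save before the call that may collect.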