[RESP] Optimize command parsing with SIMD fast path and O(1) hash table lookup #1658
Open
Add RespCommandHashLookup: a cache-friendly O(1) hash table for RESP command name resolution. Uses hardware CRC32 (SSE4.2/ARM) for hashing, 32-byte cache-line-aligned entries, and linear probing within L1 cache.
Key changes:
- New RespCommandHashLookup.cs: static hash table (512 entries, 16KB) mapping uppercase command name bytes to RespCommand enum values
- Per-parent subcommand hash tables for CLUSTER, CONFIG, CLIENT, ACL, COMMAND, SCRIPT, LATENCY, SLOWLOG, MODULE, PUBSUB, MEMORY, BITOP
- ArrayParseCommand now uses hash lookup for primary commands instead of the ~950-line FastParseArrayCommand nested switch/if-else chains
- BITOP pseudo-subcommands (AND/OR/XOR/NOT/DIFF) handled inline via dedicated ParseBitopSubcommand method with hash-based subcommand lookup
- Subcommand dispatch (CLUSTER, CONFIG, etc.) falls through to existing SlowParseCommand for full backward compatibility
- FastParseCommand hot path (GET, SET, PING, DEL) is completely untouched
Performance: O(1) hash lookup (~10-12 cycles) replaces O(n) sequential comparisons (~30-300 cycles) for the long tail of ~170+ commands.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
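The lookup scheme described in this commit (hash the uppercase command name, index a 512-slot table, probe linearly on collision) can be modelled in a few lines. The following is a minimal Python sketch, not the C# implementation: `CommandTable`, `add`, and `lookup` are hypothetical names, the command identifiers are plain strings rather than `RespCommand` enum values, and `zlib.crc32` is only a software stand-in (the SSE4.2 CRC32 instruction computes CRC-32C, a different polynomial).

```python
import zlib

TABLE_SIZE = 512          # power of two, matching the 512 entries in the description
MASK = TABLE_SIZE - 1     # lets us wrap an index with a cheap bitwise AND


class CommandTable:
    """Open-addressing hash table with linear probing (illustrative model)."""

    def __init__(self):
        # Each slot is either None (empty) or a (name_bytes, command_id) pair.
        self.slots = [None] * TABLE_SIZE

    @staticmethod
    def _hash(name: bytes) -> int:
        # Assumption: zlib.crc32 stands in for the hardware CRC32 instruction.
        return zlib.crc32(name) & MASK

    def add(self, name: bytes, command_id: str) -> None:
        i = self._hash(name)
        for _ in range(TABLE_SIZE):
            if self.slots[i] is None or self.slots[i][0] == name:
                self.slots[i] = (name, command_id)
                return
            i = (i + 1) & MASK  # collision: try the next slot
        raise RuntimeError("table full")

    def lookup(self, name: bytes):
        i = self._hash(name)
        for _ in range(TABLE_SIZE):
            slot = self.slots[i]
            if slot is None:
                return None     # hit an empty slot: the name was never inserted
            if slot[0] == name:
                return slot[1]
            i = (i + 1) & MASK
        return None


table = CommandTable()
for name in (b"ZADD", b"HSET", b"LPUSH", b"ZRANGEBYSCORE"):
    table.add(name, name.decode())
```

The model expects the caller to have uppercased the name already, so `table.lookup(b"zadd")` misses while `table.lookup(b"ZADD")` hits. Keeping the table sparsely filled is what keeps probe sequences short.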
Add three optimization tiers to RESP command parsing:
Tier 1 - SIMD Vector128 FastParseCommand:
- 30 static Vector128<byte> patterns matching full RESP encoding (*N\r\n$L\r\nCMD\r\n)
- Single 16-byte load + masked comparison validates header + command in one op
- Covers top commands: GET, SET, DEL, TTL, PING, INCR, DECR, EXISTS, etc.
- Falls through to existing scalar ulong switch for variable-arg commands
Tier 2 - CRC32 hash table (RespCommandHashLookup):
- 512-entry cache-line-aligned table (16KB, L1-resident) with hardware CRC32 hash
- O(1) lookup for ~200 primary commands + 12 subcommand tables
- Replaces ~950-line FastParseArrayCommand nested switch/if-else chains
- BITOP pseudo-subcommands handled via dedicated ParseBitopSubcommand
Tier 3 - SlowParseCommand (existing):
- Subcommand dispatch for admin commands (CLUSTER, CONFIG, ACL, etc.)
Additional optimizations:
- HashLookupCommand uses GetCommand instead of GetUpperCaseCommand
(MakeUpperCase already uppercased the buffer, avoiding redundant work)
- TryParseCustomCommand moved after hash lookup (built-in commands
are far more common than custom extensions)
- FastParseCommand hot path preserved as scalar fallback for edge cases
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
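The Tier 1 idea of validating the RESP header and the command name with one 16-byte masked comparison can be illustrated without SIMD by treating 16 bytes as a single integer. This is a hedged Python model only: `resp_pattern` and `matches` are hypothetical helpers, and declining buffers shorter than 16 bytes is an assumption of this sketch, not a statement about how the C# code handles short input.

```python
def resp_pattern(arg_count: int, command: bytes):
    """Build a (pattern, mask) pair for the header *N\\r\\n$L\\r\\nCMD\\r\\n."""
    header = b"*%d\r\n$%d\r\n%s\r\n" % (arg_count, len(command), command)
    if len(header) > 16:
        raise ValueError("header does not fit in one 16-byte comparison")
    # Pad to 16 bytes; the mask is 0xFF over the header and 0x00 over the padding.
    pattern = header.ljust(16, b"\x00")
    mask = (b"\xff" * len(header)).ljust(16, b"\x00")
    return int.from_bytes(pattern, "little"), int.from_bytes(mask, "little")


def matches(buffer: bytes, pattern: int, mask: int) -> bool:
    """One 16-byte load, one AND, one compare (modelled with Python ints)."""
    if len(buffer) < 16:
        return False  # assumption of this model: short buffers take another path
    word = int.from_bytes(buffer[:16], "little")
    return (word & mask) == pattern


GET_PATTERN, GET_MASK = resp_pattern(2, b"GET")
get_request = b"*2\r\n$3\r\nGET\r\n$3\r\nfoo\r\n"
set_request = b"*3\r\n$3\r\nSET\r\n$1\r\nk\r\n$1\r\nv\r\n"
```

Here `matches(get_request, GET_PATTERN, GET_MASK)` is true and the same call on `set_request` is false. The mask is what lets the trailing bytes of the load (the start of the first argument) be ignored.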
…add PING to parsing benchmark
Benchmarks ParseRespCommandBuffer directly to measure pure parsing throughput. Commands categorized by their position in the OLD parser:
- Tier 1a SIMD: PING, GET, SET, INCR, EXISTS
- Tier 1b Scalar: SETEX, EXPIRE
- FastParseArrayCommand top: HSET, LPUSH, ZADD
- FastParseArrayCommand deep: ZRANGEBYSCORE, ZREMRANGEBYSCORE, HINCRBYFLOAT
- SlowParseCommand: SUBSCRIBE, GEORADIUS, SETIFMATCH
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2-entry MRU cache sits after SIMD patterns but before scalar switch in FastParseCommand. Caches the last 2 matched command patterns as Vector128 + mask, enabling 3-op cache hits for repeated Tier 1b/2 commands (HSET, LPUSH, ZADD etc.) that would otherwise fall through to the scalar switch or hash table.
Cache is populated on successful ArrayParseCommand resolution and excludes: synthetic ParseRespCommandBuffer calls (ACL checks), subcommand results, and custom commands.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
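A two-slot most-recently-used cache of (pattern, mask) pairs, as described in this commit, might look like the Python model below. `MruCache`, `remember`, and `lookup` are hypothetical names, and the replacement policy (new entry takes slot 0, the previous slot 0 moves to slot 1) is an assumption of the sketch, since the commit message does not spell it out.

```python
class MruCache:
    """Two-slot cache of (pattern, mask, command) entries, most recent first."""

    def __init__(self):
        self.entries = [None, None]

    def remember(self, header: bytes, command: str) -> None:
        # header is the full "*N\r\n$L\r\nCMD\r\n" prefix of a parsed request.
        if len(header) > 16:
            return  # too long for a single 16-byte comparison: not cacheable here
        pattern = int.from_bytes(header.ljust(16, b"\x00"), "little")
        mask = int.from_bytes((b"\xff" * len(header)).ljust(16, b"\x00"), "little")
        entry = (pattern, mask, command)
        if entry == self.entries[0]:
            return  # already the most recent entry
        # Assumed policy: newest entry in slot 0, previous slot 0 demoted to slot 1.
        self.entries = [entry, self.entries[0]]

    def lookup(self, buffer: bytes):
        if len(buffer) < 16:
            return None
        word = int.from_bytes(buffer[:16], "little")
        for entry in self.entries:
            if entry is not None:
                pattern, mask, command = entry
                if (word & mask) == pattern:  # load, AND, compare
                    return command
        return None


cache = MruCache()
hset_request = b"*4\r\n$4\r\nHSET\r\n$1\r\nh\r\n$1\r\nf\r\n$1\r\nv\r\n"
cache.remember(b"*4\r\n$4\r\nHSET\r\n", "HSET")
first_hit = cache.lookup(hset_request)

cache.remember(b"*3\r\n$5\r\nLPUSH\r\n", "LPUSH")
cache.remember(b"*4\r\n$4\r\nZADD\r\n", "ZADD")
after_eviction = cache.lookup(hset_request)
```

In this model `first_hit` is `"HSET"`, and after two other commands are remembered the HSET entry has been pushed out, so `after_eviction` is `None`. Because the cached pattern includes the `*N` argument count, a command sent with a different number of arguments would miss the cache in this sketch.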
…to badrishc/fast-parses
Eliminate SlowParseCommand for all subcommand routing. HandleSubcommandLookup now uses per-parent hash tables for CLUSTER, CLIENT, ACL, CONFIG, COMMAND, SCRIPT, LATENCY, SLOWLOG, MODULE, PUBSUB, MEMORY subcommands.
Key fixes:
- Fix CLUSTER SET-CONFIG-EPOCH hash entry (was SETCONFIGEPOCH, missing hyphens)
- Handle edge cases: COMMAND with 0 args, case-insensitive GETKEYS/USAGE
- Error message formatting: GenericErrUnknownSubCommand for CLUSTER/LATENCY, GenericErrUnknownSubCommandNoHelp for others
- Remove writeErrorOnFailure guard from MRU cache (unnecessary)
- Use consumedBytes (readHead - cmdStartOffset) for cache entry sizing
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…le parsing
Replace references to FastParseArrayCommand/SlowParseCommand with hash table instructions. New commands now just need one Add() call in PopulatePrimaryTable(). Document subcommand table wiring and warn about wire-protocol spelling (hyphens etc.).
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…fixes
- Add Debug.Assert for command name length/positivity in hash table ops
- Add startup ValidateSubTable: verifies every subcommand entry round-trips correctly through the hash table (catches typos like SET-CONFIG-EPOCH)
- Clean up InsertIntoTable: remove redundant double-assignment of NameWord1/2, add explicit zero-init and clear comments on word layout contract
- Fix comment in HashLookupCommand: document that MakeUpperCase only uppercases the first token, subcommands need GetUpperCaseCommand
- Add comment documenting MRU cache zero-initialization safety
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
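A startup round-trip check of the kind this commit describes could be modelled as follows: every declared (wire name, command) pair is looked up in the populated table and must come back as the same command. This is a hedged Python sketch under stated assumptions. `build_sub_table` and `validate_sub_table` are hypothetical helpers, the command identifiers are made-up strings, a `dict` stands in for the hash table, and the premise that the declared names and the populated table can drift apart is an assumption used only to show a failure.

```python
def build_sub_table(entries):
    """Populate a per-parent subcommand table (a dict stands in for the hash table)."""
    return {name: command for name, command in entries}


def validate_sub_table(parent, declared, table):
    """Return a list of problems; an empty list means every entry round-trips."""
    problems = []
    for name, command in declared:
        if name != name.upper():
            problems.append(f"{parent} {name.decode()}: name is not uppercase")
        found = table.get(name)
        if found != command:
            problems.append(
                f"{parent} {name.decode()}: expected {command}, found {found}"
            )
    return problems


# Wire-protocol spellings, hyphens included (identifiers on the right are hypothetical).
DECLARED = [
    (b"SET-CONFIG-EPOCH", "CLUSTER_SETCONFIGEPOCH"),
    (b"NODES", "CLUSTER_NODES"),
]

good_table = build_sub_table(DECLARED)
# A table populated with the misspelled name, mirroring the bug fixed earlier in this PR.
bad_table = build_sub_table([
    (b"SETCONFIGEPOCH", "CLUSTER_SETCONFIGEPOCH"),
    (b"NODES", "CLUSTER_NODES"),
])

good_problems = validate_sub_table("CLUSTER", DECLARED, good_table)
bad_problems = validate_sub_table("CLUSTER", DECLARED, bad_table)
```

With these inputs `good_problems` is empty and `bad_problems` holds a single entry naming `SET-CONFIG-EPOCH`. Running such a check once at startup turns a silent lookup miss into an immediate, visible failure.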
Pull request overview
This PR upgrades Garnet’s RESP command parsing pipeline by introducing SIMD-accelerated matching for the hottest commands, plus a cache-friendly hash table (with subcommand tables) to replace the previous deep switch/linear-scan parsing logic. It also adds benchmark coverage and updates contributor documentation to reflect the new recommended parsing extension points.
Changes:
- Added `RespCommandHashLookup` (primary + subcommand hash tables) and integrated it into `ArrayParseCommand`.
- Reworked `FastParseCommand` to add SIMD Vector128 pattern matching and a per-session 2-slot MRU cache.
- Added a dedicated BenchmarkDotNet benchmark for parser-only throughput and updated docs/guides for adding commands.
Reviewed changes
Copilot reviewed 5 out of 6 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| website/docs/dev/fast-parsing-plan.md | Adds a detailed parsing optimization/design document. |
| libs/server/Resp/Parser/RespCommandHashLookup.cs | New static hash-table-based command/subcommand lookup implementation. |
| libs/server/Resp/Parser/RespCommand.cs | Integrates SIMD fast path + MRU cache + hash lookup parsing; removes legacy slow parsing paths. |
| benchmark/BDN.benchmark/Operations/CommandParsingBenchmark.cs | Adds parsing-only microbenchmarks across tiers. |
| .github/skills/add-garnet-command/SKILL.md | Updates contributor guidance to use the new hash lookup path. |
| .github/copilot-instructions.md | Updates “add parsing logic” instructions to reference the new hash lookup table. |
Description of Change
Replaces the two legacy command parsing methods (`FastParseArrayCommand`, ~950 lines of nested switch/if chains, and `SlowParseCommand`, ~815 lines of sequential `SequenceEqual` comparisons) with a tiered architecture that includes a hash table (`RespCommandHashLookup`): O(1) lookup for all 400 built-in commands, using a 512-entry table (16KB, L1-resident) with linear probing and per-parent subcommand tables.

Also unifies BITOP subcommand handling, adds debug assertions, hardens the hash table against edge cases (empty names, oversized names), and adds `ValidatePrimaryTable()` for startup integrity checks.

Key files:
- `RespCommand.cs`: Refactored `FastParseCommand` (SIMD + scalar + MRU), removed `FastParseArrayCommand` and `SlowParseCommand`
- `RespCommandHashLookup.cs`: New. Hash table engine (lookup, insert, validate, subcommand dispatch)
- `RespCommandHashLookupData.cs`: New. Command registration (`PopulatePrimaryTable`, subcommand arrays)
- `RespCommandSimdPatterns.cs`: New. SIMD Vector128 patterns and `RespPattern()` helper
- `CommandParsingBenchmark.cs`: New. BDN benchmark covering all parser tiers

CommandParsingBenchmark Results (Params=None, batch of 100):