[RESP] Optimize command parsing with SIMD fast path and O(1) hash table lookup #1658

Open
badrishc wants to merge 24 commits into dev from badrishc/fast-parses

badrishc (Collaborator) commented Apr 1, 2026

Description of Change

Replaces the two legacy command parsing methods (FastParseArrayCommand ~950 lines of nested switch/if chains, SlowParseCommand ~815 lines of sequential SequenceEqual comparisons) with a tiered architecture:

  1. SIMD Vector128 fast path: matches ~18 hot commands (GET, SET, DEL, PING, INCR, etc.) by comparing the full RESP encoding in a single 16-byte vector comparison. Cost: 3 ops per candidate (1 load, 1 AND, 1 EqualsAll).
  2. MRU cache: a 2-slot per-session cache that catches repeated commands (HSET, LPUSH, ZADD) at the same 3-op cost as SIMD.
  3. Scalar ulong fast path: handles hot commands too long for SIMD (PUBLISH, SETRANGE, GETRANGE) and variable-arg commands (EXPIRE, SETEXNX, GETEX, PEXPIRE).
  4. CRC32 hash table (RespCommandHashLookup): O(1) lookup for all ~400 built-in commands. 512-entry table (16KB, L1-resident) with linear probing and per-parent subcommand tables.
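
The tier 1 comparison can be modelled outside .NET. The following is a minimal Python sketch, not Garnet's code: `resp_pattern`, `make_vector_and_mask`, and `matches` are hypothetical names, and plain integer arithmetic stands in for the Vector128 load, AND, and EqualsAll. It assumes the pattern is the RESP header plus command name, zero-padded to 16 bytes, with a mask that covers only the pattern bytes.

```python
def resp_pattern(argc: int, name: str) -> bytes:
    # RESP array header + bulk-string command name, e.g. *2\r\n$3\r\nGET\r\n
    encoded = f"*{argc}\r\n${len(name)}\r\n{name}\r\n".encode("ascii")
    if len(encoded) > 16:
        raise ValueError("pattern does not fit in one 16-byte vector")
    return encoded

def make_vector_and_mask(pattern: bytes):
    # Zero-pad the pattern to 16 bytes; the mask is 0xFF over pattern bytes only.
    padded = pattern.ljust(16, b"\x00")
    mask = (b"\xff" * len(pattern)).ljust(16, b"\x00")
    return int.from_bytes(padded, "little"), int.from_bytes(mask, "little")

def matches(buffer: bytes, vector: int, mask: int) -> bool:
    if len(buffer) < 16:
        return False  # too short for a vector load; a scalar path would take over
    loaded = int.from_bytes(buffer[:16], "little")  # 1 load
    return (loaded & mask) == vector                # 1 AND + 1 equality test

GET_VEC, GET_MASK = make_vector_and_mask(resp_pattern(2, "GET"))
request = b"*2\r\n$3\r\nGET\r\n$5\r\nmykey\r\n"
print(matches(request, GET_VEC, GET_MASK))
```

Under this encoding a six-character name with single-digit lengths fills the vector exactly (4 + 4 + 8 = 16 bytes), which is consistent with longer names such as PUBLISH being routed to the scalar path.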

Also unifies BITOP subcommand handling, adds debug assertions, hardens the hash table against edge cases (empty names, oversized names), and adds ValidatePrimaryTable() for startup integrity checks.
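
To illustrate the lookup tier and the hardening mentioned above, here is a small Python sketch of a 512-slot open-addressing table with linear probing. It is an assumption-laden model rather than the PR's implementation: `zlib.crc32` stands in for the hardware CRC32 instruction, `MAX_NAME_LEN` is an invented cap, and the command ids are arbitrary.

```python
import zlib

TABLE_SIZE = 512             # power of two, as in the description
MAX_NAME_LEN = 24            # assumed cap, for this sketch only
table = [None] * TABLE_SIZE  # each slot: (name_bytes, command_id) or None

def slot_for(name: bytes) -> int:
    # zlib.crc32 is a stand-in for the hardware CRC32 instruction
    return zlib.crc32(name) & (TABLE_SIZE - 1)

def insert(name: bytes, command_id: int) -> None:
    if not 0 < len(name) <= MAX_NAME_LEN:
        raise ValueError("empty or oversized command name")
    i = slot_for(name)
    for _ in range(TABLE_SIZE):
        if table[i] is None or table[i][0] == name:
            table[i] = (name, command_id)
            return
        i = (i + 1) & (TABLE_SIZE - 1)  # linear probing: try the next slot
    raise RuntimeError("table full")

def lookup(name: bytes):
    if not 0 < len(name) <= MAX_NAME_LEN:
        return None                     # never probe for empty or oversized input
    i = slot_for(name)
    for _ in range(TABLE_SIZE):
        entry = table[i]
        if entry is None:
            return None                 # an empty slot ends the probe sequence
        if entry[0] == name:
            return entry[1]
        i = (i + 1) & (TABLE_SIZE - 1)
    return None

for cid, cmd in enumerate([b"HSET", b"LPUSH", b"ZADD", b"GEORADIUS"], start=1):
    insert(cmd, cid)
print(lookup(b"ZADD"), lookup(b"NOPE"))
```

With roughly 400 names in 512 slots the table stays below full, so probe sequences terminate at an empty slot on a miss.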

Key files:

  • RespCommand.cs — Refactored FastParseCommand (SIMD + scalar + MRU), removed FastParseArrayCommand and SlowParseCommand
  • RespCommandHashLookup.cs (new): Hash table engine (lookup, insert, validate, subcommand dispatch)
  • RespCommandHashLookupData.cs (new): Command registration (PopulatePrimaryTable, subcommand arrays)
  • RespCommandSimdPatterns.cs (new): SIMD Vector128 patterns and RespPattern() helper
  • CommandParsingBenchmark.cs (new): BDN benchmark covering all parser tiers

CommandParsingBenchmark Results (Params=None, batch of 100):

| Command | Tier | dev (μs) | PR (μs) | Delta |
|---|---|---|---|---|
| ParsePING | SIMD | 1.576 | 1.427 | −9.5% |
| ParseGET | SIMD | 2.145 | 1.847 | −13.9% |
| ParseSET | SIMD | 2.823 | 2.453 | −13.1% |
| ParseINCR | SIMD | 2.152 | 2.034 | −5.5% |
| ParseEXISTS | SIMD | 8.891 | 2.319 | −73.9% |
| ParseSETEX | SIMD | 3.376 | 3.469 | +2.8% |
| ParsePUBLISH | Scalar | | 3.469 | new |
| ParseEXPIRE | Scalar | 3.076 | 3.505 | +13.9% |
| ParseHSET | Hash+MRU | 10.035 | 3.867 | −61.5% |
| ParseLPUSH | Hash+MRU | 8.981 | 3.176 | −64.6% |
| ParseZADD | Hash+MRU | 9.599 | 3.843 | −60.0% |
| ParseZRANGEBYSCORE | Hash | 12.455 | 8.539 | −31.4% |
| ParseZREMRANGEBYSCORE | Hash | 13.983 | 9.289 | −33.6% |
| ParseHINCRBYFLOAT | Hash | 11.104 | 9.311 | −16.1% |
| ParseSUBSCRIBE | Hash | 9.649 | 7.196 | −25.4% |
| ParseGEORADIUS | Hash | 20.397 | 10.478 | −48.6% |
| ParseSETIFMATCH | Hash | 28.701 | 8.230 | −71.3% |

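
The Delta column is consistent with the relative change of the PR time against the dev baseline. As a quick check, this short Python sketch (the function name `delta_percent` is ours, not from the benchmark) recomputes two rows:

```python
def delta_percent(dev_us: float, pr_us: float) -> float:
    """Relative change of the PR time against the dev baseline, in percent."""
    return (pr_us - dev_us) / dev_us * 100.0

print(round(delta_percent(8.891, 2.319), 1))  # ParseEXISTS row
print(round(delta_percent(3.076, 3.505), 1))  # ParseEXPIRE row
```

A negative value means the PR is faster than dev; a positive value, as for ParseEXPIRE, means it is slower.
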
ParseCommand [AggressiveInlining]
│
├── FastParseCommand [AggressiveInlining]
│   │
│   │   // On SIMD hardware (Vector128.IsHardwareAccelerated && remainingBytes >= 16):
│   ├── SIMD pattern matching
│   │   Matches ~18 hot commands by comparing the full RESP encoding
│   │   (*N\r\n$L\r\nCMD\r\n) as a single 16-byte vector.
│   │   Cost: 1 load + 1 AND + 1 EqualsAll per candidate.
│   │   Hit → return immediately (readHead advanced past command header + name).
│   │
│   ├── MRU cache (2-slot, Vector128-based)
│   │   Catches repeated commands not in the SIMD table (e.g., HSET, LPUSH, ZADD).
│   │   Populated by ParseCommand after ArrayParseCommand resolves a command.
│   │   Slot 1 promoted to slot 0 on hit (LRU).
│   │   Hit → return immediately.
│   │
│   │   // Always (SIMD or not), if buffer starts with *N\r\n$L\r\n:
│   ├── Scalar path (ulong comparisons)
│   │   Three sections:
│   │   (1) Same fixed-arg hot commands as SIMD — fallback when remainingBytes < 16
│   │   (2) Hot commands too long for SIMD (name > 6 chars: PUBLISH, SETRANGE, etc.)
│   │   (3) Hot variable-arg commands (SETEXNX, GETEX, EXPIRE, PEXPIRE)
│   │   Hit → return immediately.
│   │
│   │   // If buffer does NOT start with *N\r\n$L\r\n:
│   └── Inline command check (FastParseInlineCommand)
│       Matches PING\r\n and QUIT\r\n (no array framing).
│       Hit → return immediately.
│
│   // FastParseCommand returned NONE — command not matched by any fast path
│
├── ArrayParseCommand [NoInlining]
│   │
│   ├── MakeUpperCase + retry FastParseCommand
│   │   If the command was lowercase, uppercases in-place and retries
│   │   FastParseCommand. Catches lowercase get, set, ping, etc.
│   │   Hit → return immediately.
│   │
│   ├── Parse RESP array header (*N\r\n)
│   │   Reads the array length. If buffer doesn't start with '*',
│   │   treats as malformed inline command and returns INVALID.
│   │
│   └── HashLookupCommand [NoInlining]
│       │
│       ├── Extract and uppercase command name ($len\r\nNAME\r\n)
│       │
│       ├── Hash table lookup (RespCommandHashLookup.Lookup)
│       │   CRC32 hash, 512-entry table (~16KB, L1-resident), linear probing.
│       │   Covers all ~400 built-in commands. O(1).
│       │
│       ├── If hash miss → TryParseCustomCommand
│       │   Runtime-registered commands (CustomTxn, CustomProcedure, etc.).
│       │   If miss → return INVALID.
│       │
│       └── If has subcommands → HandleSubcommandLookup [NoInlining]
│           Extract and uppercase subcommand name, look up in per-parent
│           hash table (CLUSTER, CLIENT, ACL, CONFIG, COMMAND, BITOP, etc.).
│           If miss → return INVALID with command-specific error message.
│
├── Update MRU cache (on SIMD hardware, if ArrayParseCommand resolved a
│   non-custom command — captures the 16-byte RESP pattern so
│   FastParseCommand's MRU check matches it on subsequent calls)
│
├── Parse arguments (parseState.Read for each remaining token)
│
└── Return command + argument count
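
The MRU step in the flow above can be sketched as follows. This is a hedged Python model, not the PR's code: the class name `MruCache2`, the tuple layout, and the insertion policy (a new entry goes to slot 0 and the previous slot 0 moves to slot 1) are assumptions. Only the two-slot size, the promotion of slot 1 on a hit, and the use of consumed bytes to size the entry come from the description.

```python
class MruCache2:
    """Two-slot most-recently-used cache keyed by a masked 16-byte prefix."""

    def __init__(self):
        # Each slot is (vector, mask, command, consumed_bytes); None means empty.
        self.slots = [None, None]

    def lookup(self, buffer: bytes):
        if len(buffer) < 16:
            return None
        loaded = int.from_bytes(buffer[:16], "little")
        for i, slot in enumerate(self.slots):
            if slot is not None and (loaded & slot[1]) == slot[0]:
                if i == 1:  # promote the hit to the front
                    self.slots[0], self.slots[1] = self.slots[1], self.slots[0]
                return slot[2], slot[3]
        return None

    def update(self, buffer: bytes, consumed: int, command: str) -> None:
        """Record the header bytes of a command that the slow path resolved."""
        if len(buffer) < 16 or not 0 < consumed <= 16:
            return  # header does not fit in one vector, so it is not cached
        mask = int.from_bytes((b"\xff" * consumed).ljust(16, b"\x00"), "little")
        vector = int.from_bytes(buffer[:16], "little") & mask
        self.slots = [(vector, mask, command, consumed), self.slots[0]]

cache = MruCache2()
hset = b"*4\r\n$4\r\nHSET\r\n$1\r\nh\r\n$1\r\nf\r\n$1\r\nv\r\n"
print(cache.lookup(hset))        # miss on the first request
cache.update(hset, 14, "HSET")   # the header *4\r\n$4\r\nHSET\r\n is 14 bytes
print(cache.lookup(hset))        # hit on the repeat
```

Because an empty slot is never compared, a freshly created cache cannot produce a false hit, which mirrors the zero-initialization safety noted in a later commit.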

badrishc and others added 14 commits March 25, 2026 17:33
Add RespCommandHashLookup: a cache-friendly O(1) hash table for RESP
command name resolution. Uses hardware CRC32 (SSE4.2/ARM) for hashing,
32-byte cache-line-aligned entries, and linear probing within L1 cache.

Key changes:
- New RespCommandHashLookup.cs: static hash table (512 entries, 16KB)
  mapping uppercase command name bytes to RespCommand enum values
- Per-parent subcommand hash tables for CLUSTER, CONFIG, CLIENT, ACL,
  COMMAND, SCRIPT, LATENCY, SLOWLOG, MODULE, PUBSUB, MEMORY, BITOP
- ArrayParseCommand now uses hash lookup for primary commands instead
  of the ~950-line FastParseArrayCommand nested switch/if-else chains
- BITOP pseudo-subcommands (AND/OR/XOR/NOT/DIFF) handled inline via
  dedicated ParseBitopSubcommand method with hash-based subcommand lookup
- Subcommand dispatch (CLUSTER, CONFIG, etc.) falls through to existing
  SlowParseCommand for full backward compatibility
- FastParseCommand hot path (GET, SET, PING, DEL) is completely untouched

Performance: O(1) hash lookup (~10-12 cycles) replaces O(n) sequential
comparisons (~30-300 cycles) for the long tail of ~170+ commands.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Add three optimization tiers to RESP command parsing:

Tier 1 - SIMD Vector128 FastParseCommand:
  - 30 static Vector128<byte> patterns matching full RESP encoding (*N\r\n$L\r\nCMD\r\n)
  - Single 16-byte load + masked comparison validates header + command in one op
  - Covers top commands: GET, SET, DEL, TTL, PING, INCR, DECR, EXISTS, etc.
  - Falls through to existing scalar ulong switch for variable-arg commands

Tier 2 - CRC32 hash table (RespCommandHashLookup):
  - 512-entry cache-line-aligned table (16KB, L1-resident) with hardware CRC32 hash
  - O(1) lookup for ~200 primary commands + 12 subcommand tables
  - Replaces ~950-line FastParseArrayCommand nested switch/if-else chains
  - BITOP pseudo-subcommands handled via dedicated ParseBitopSubcommand

Tier 3 - SlowParseCommand (existing):
  - Subcommand dispatch for admin commands (CLUSTER, CONFIG, ACL, etc.)

Additional optimizations:
  - HashLookupCommand uses GetCommand instead of GetUpperCaseCommand
    (MakeUpperCase already uppercased the buffer, avoiding redundant work)
  - TryParseCustomCommand moved after hash lookup (built-in commands
    are far more common than custom extensions)
  - FastParseCommand hot path preserved as scalar fallback for edge cases

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
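
The uppercase-then-retry step that this commit relies on can be pictured with a small Python sketch. Everything here is illustrative: `FAST_PATTERNS`, `fast_parse`, and `parse` are invented stand-ins, and the sketch assumes single-digit array and name lengths so that the command name always starts at offset 8.

```python
FAST_PATTERNS = {
    b"*2\r\n$3\r\nGET\r\n": "GET",
    b"*1\r\n$4\r\nPING\r\n": "PING",
}

def make_upper_case(buf: bytearray, start: int, length: int) -> bool:
    """Uppercase ASCII a-z in place; return True if any byte changed."""
    changed = False
    for i in range(start, start + length):
        if 0x61 <= buf[i] <= 0x7A:   # 'a'..'z'
            buf[i] -= 0x20           # clear the ASCII case bit
            changed = True
    return changed

def fast_parse(buf):
    # Stand-in for the fast path: exact, case-sensitive prefix match.
    for pattern, command in FAST_PATTERNS.items():
        if buf.startswith(pattern):
            return command
    return None

def parse(buf: bytearray):
    command = fast_parse(buf)
    if command is not None:
        return command
    # Sketch-only assumption: header is "*N\r\n$L\r\n" with single digits,
    # so L is the byte at offset 5 and the name starts at offset 8.
    if len(buf) >= 8 and buf[0:1] == b"*" and buf[4:5] == b"$":
        name_len = buf[5] - ord("0")
        if 0 < name_len <= 9 and len(buf) >= 8 + name_len:
            if make_upper_case(buf, 8, name_len):
                return fast_parse(buf)  # retry once on the uppercased buffer
    return None  # a real parser would continue to the hash lookup here

request = bytearray(b"*2\r\n$3\r\nget\r\n$1\r\nk\r\n")
print(parse(request), bytes(request[8:11]))
```

Only the first token is uppercased, so keys and values keep their original case; since the buffer is already uppercased by the time the hash lookup runs, a second case conversion there would be redundant.
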
Benchmarks ParseRespCommandBuffer directly to measure pure parsing
throughput. Commands categorized by their position in the OLD parser:
- Tier 1a SIMD: PING, GET, SET, INCR, EXISTS
- Tier 1b Scalar: SETEX, EXPIRE
- FastParseArrayCommand top: HSET, LPUSH, ZADD
- FastParseArrayCommand deep: ZRANGEBYSCORE, ZREMRANGEBYSCORE, HINCRBYFLOAT
- SlowParseCommand: SUBSCRIBE, GEORADIUS, SETIFMATCH

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
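
For readers unfamiliar with the input such a benchmark feeds to the parser, this Python sketch builds a batch of RESP-encoded requests. The helper names `encode_resp` and `make_batch` and the ZADD arguments are made up for illustration; only the batch size of 100 comes from the results table heading.

```python
def encode_resp(*args: str) -> bytes:
    """Encode one command as a RESP array of bulk strings."""
    parts = [f"*{len(args)}\r\n".encode("ascii")]
    for arg in args:
        data = arg.encode("utf-8")
        parts.append(b"$%d\r\n%s\r\n" % (len(data), data))
    return b"".join(parts)

def make_batch(count: int, *args: str) -> bytes:
    """Repeat the same encoded command back to back, as a parser would see it."""
    return encode_resp(*args) * count

single = encode_resp("ZADD", "myset", "1", "member")
batch = make_batch(100, "ZADD", "myset", "1", "member")
print(single)
print(len(batch) // len(single))
```
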
2-entry MRU cache sits after SIMD patterns but before scalar switch
in FastParseCommand. Caches the last 2 matched command patterns as
Vector128 + mask, enabling 3-op cache hits for repeated Tier 1b/2
commands (HSET, LPUSH, ZADD etc.) that would otherwise fall through
to the scalar switch or hash table.

Cache is populated on successful ArrayParseCommand resolution and
excludes: synthetic ParseRespCommandBuffer calls (ACL checks),
subcommand results, and custom commands.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Eliminate SlowParseCommand for all subcommand routing. HandleSubcommandLookup
now uses per-parent hash tables for CLUSTER, CLIENT, ACL, CONFIG, COMMAND,
SCRIPT, LATENCY, SLOWLOG, MODULE, PUBSUB, MEMORY subcommands.

Key fixes:
- Fix CLUSTER SET-CONFIG-EPOCH hash entry (was SETCONFIGEPOCH, missing hyphens)
- Handle edge cases: COMMAND with 0 args, case-insensitive GETKEYS/USAGE
- Error message formatting: GenericErrUnknownSubCommand for CLUSTER/LATENCY,
  GenericErrUnknownSubCommandNoHelp for others
- Remove writeErrorOnFailure guard from MRU cache (unnecessary)
- Use consumedBytes (readHead - cmdStartOffset) for cache entry sizing

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
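
A rough Python picture of per-parent subcommand dispatch follows. The result names such as `CLUSTER_SETCONFIGEPOCH` are placeholders rather than Garnet's enum values, the tables are tiny excerpts, and the error templates are returned by name only because their text is not quoted in this PR.

```python
# One small table per parent command; keys use the wire-protocol spelling.
SUBCOMMANDS = {
    "CLUSTER": {"SET-CONFIG-EPOCH": "CLUSTER_SETCONFIGEPOCH", "NODES": "CLUSTER_NODES"},
    "CONFIG": {"GET": "CONFIG_GET", "SET": "CONFIG_SET"},
}
# Parents that use the error variant with a help hint, per the commit message.
HELP_PARENTS = {"CLUSTER", "LATENCY"}

def handle_subcommand(parent: str, sub: str):
    table = SUBCOMMANDS[parent]
    command = table.get(sub.upper())  # subcommand names are matched uppercased
    if command is not None:
        return command, None
    template = ("GenericErrUnknownSubCommand" if parent in HELP_PARENTS
                else "GenericErrUnknownSubCommandNoHelp")
    return "INVALID", template

print(handle_subcommand("CLUSTER", "set-config-epoch"))
print(handle_subcommand("CLUSTER", "SETCONFIGEPOCH"))
print(handle_subcommand("CONFIG", "BOGUS"))
```

The second call shows why the hyphens matter: a key spelled without them never matches what a client actually sends.
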
…le parsing

Replace references to FastParseArrayCommand/SlowParseCommand with
hash table instructions. New commands now just need one Add() call
in PopulatePrimaryTable(). Document subcommand table wiring and
warn about wire-protocol spelling (hyphens etc.).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…fixes

- Add Debug.Assert for command name length/positivity in hash table ops
- Add startup ValidateSubTable: verifies every subcommand entry round-trips
  correctly through the hash table (catches typos like SET-CONFIG-EPOCH)
- Clean up InsertIntoTable: remove redundant double-assignment of NameWord1/2,
  add explicit zero-init and clear comments on word layout contract
- Fix comment in HashLookupCommand: document that MakeUpperCase only
  uppercases the first token, subcommands need GetUpperCaseCommand
- Add comment documenting MRU cache zero-initialization safety

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
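
A startup round-trip check of this kind can be sketched in a few lines of Python. The table builder below is deliberately naive and entirely hypothetical (`KEY_BYTES`, `build_table`, and `validate` are our names): it truncates long keys, which is one kind of silent corruption that looking every registered name back up would expose.

```python
KEY_BYTES = 16  # hypothetical fixed key width, for this sketch only

def build_table(entries):
    # Naive insert: names longer than KEY_BYTES are silently truncated.
    return {name[:KEY_BYTES]: command for name, command in entries}

def validate(entries, table):
    """Return every registered name that does not resolve to its own command."""
    return [name for name, command in entries if table.get(name) != command]

entries = [
    (b"NODES", 1),
    (b"SET-CONFIG-EPOCH", 2),        # exactly 16 bytes, survives
    (b"COUNT-FAILURE-REPORTS", 3),   # 21 bytes, truncated by build_table
]
table = build_table(entries)
print(validate(entries, table))
```

A startup check would treat a non-empty result as fatal, so a broken registration is caught before the server accepts traffic.
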
Copilot AI review requested due to automatic review settings April 1, 2026 01:09

Copilot AI left a comment

Pull request overview

This PR upgrades Garnet’s RESP command parsing pipeline by introducing SIMD-accelerated matching for the hottest commands, plus a cache-friendly hash table (with subcommand tables) to replace the previous deep switch/linear-scan parsing logic. It also adds benchmark coverage and updates contributor documentation to reflect the new recommended parsing extension points.

Changes:

  • Added RespCommandHashLookup (primary + subcommand hash tables) and integrated it into ArrayParseCommand.
  • Reworked FastParseCommand to add SIMD Vector128 pattern matching and a per-session 2-slot MRU cache.
  • Added a dedicated BenchmarkDotNet benchmark for parser-only throughput and updated docs/guides for adding commands.

Reviewed changes

Copilot reviewed 5 out of 6 changed files in this pull request and generated 7 comments.

| File | Description |
|---|---|
| website/docs/dev/fast-parsing-plan.md | Adds a detailed parsing optimization/design document. |
| libs/server/Resp/Parser/RespCommandHashLookup.cs | New static hash-table-based command/subcommand lookup implementation. |
| libs/server/Resp/Parser/RespCommand.cs | Integrates SIMD fast path + MRU cache + hash lookup parsing; removes legacy slow parsing paths. |
| benchmark/BDN.benchmark/Operations/CommandParsingBenchmark.cs | Adds parsing-only microbenchmarks across tiers. |
| .github/skills/add-garnet-command/SKILL.md | Updates contributor guidance to use the new hash lookup path. |
| .github/copilot-instructions.md | Updates “add parsing logic” instructions to reference the new hash lookup table. |

badrishc changed the title from "Improve parser" to "[RESP] Optimize command parsing with SIMD fast path and O(1) hash table lookup" Apr 1, 2026