perf: fast-path small candid numbers #709

Merged
lwshang merged 6 commits into master from perf-1-fast-path-small-num on Mar 15, 2026
sasa-tomic (Contributor) commented Mar 13, 2026

Overview
Performance improvement

Requirements
Preserve existing wire compatibility and keep record decoding behavior unchanged.

Solution
Encode and decode Nat and Int values through native LEB128 when they fit in machine integers so common small numbers avoid bigint work.
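
As a rough illustration of the encode side, the sketch below writes unsigned and signed LEB128 directly from machine integers. The function names are hypothetical and are not taken from this PR's diff. It assumes the caller has already established that the value fits in a `u64` or `i64` (for example through a `to_u64`-style conversion) and falls back to the bigint encoder otherwise.

```rust
/// Unsigned LEB128 for a value that already fits in a u64.
/// Hypothetical helper for illustration; not the candid crate's API.
fn leb128_encode_u64(mut value: u64, out: &mut Vec<u8>) {
    loop {
        let byte = (value & 0x7f) as u8; // low seven bits
        value >>= 7;
        if value == 0 {
            out.push(byte); // final byte: continuation bit clear
            return;
        }
        out.push(byte | 0x80); // more bytes follow
    }
}

/// Signed LEB128 for a value that already fits in an i64.
/// Hypothetical helper for illustration; not the candid crate's API.
fn sleb128_encode_i64(mut value: i64, out: &mut Vec<u8>) {
    loop {
        let byte = (value & 0x7f) as u8;
        value >>= 7; // arithmetic shift keeps the sign
        let sign_bit_set = byte & 0x40 != 0;
        // Stop once the remaining bits are pure sign extension of this byte.
        let done = (value == 0 && !sign_bit_set) || (value == -1 && sign_bit_set);
        if done {
            out.push(byte);
            return;
        }
        out.push(byte | 0x80);
    }
}
```

Both loops touch only registers and the output buffer, so no bigint division or allocation is involved for values in range.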

Considerations
I expect a performance improvement with full forward and backward compatibility.

Ref. #710

github-actions (bot) commented Mar 13, 2026

| Name | Max Mem (Kb) | Encode | Decode |
|------|--------------|--------|--------|
| blob | 4_224 | 4_207_487 ($\textcolor{green}{-0.01\%}$) | 2_122_432 ($\textcolor{green}{-0.00\%}$) |
| btreemap | 73_856 ($\textcolor{green}{-2.12\%}$) | 531_975_943 ($\textcolor{green}{-88.77\%}$) | 13_040_266_452 ($\textcolor{green}{-14.16\%}$) |
| nns | 192 | 2_021_253 ($\textcolor{green}{-0.27\%}$) | 5_714_511 ($\textcolor{green}{-0.55\%}$) |
| nns_list_proposal | 1_216 | 7_015_064 ($\textcolor{green}{-0.47\%}$) | 67_819_354 ($\textcolor{green}{-0.07\%}$) |
| option_list | 128 | 716_025 ($\textcolor{green}{-91.14\%}$) | 23_547_997 ($\textcolor{green}{-11.30\%}$) |
| text | 6_336 | 4_204_384 ($\textcolor{green}{-0.01\%}$) | 7_877_759 ($\textcolor{green}{-0.00\%}$) |
| variant_list | 128 | 711_051 ($\textcolor{green}{-91.26\%}$) | 22_235_339 ($\textcolor{green}{-11.67\%}$) |
| vec_int16 | 16_704 | 123_694_298 ($\textcolor{green}{-0.00\%}$) | 1_015_045_667 ($\textcolor{green}{-0.00\%}$) |

  • Parser cost: 17_069_949
  • Extra args: 3_416_403 ($\textcolor{red}{0.00\%}$)
Raw report:
---------------------------------------------------

Benchmark: blob
  total:
    instructions: 6.33 M (-0.00%) (change within noise threshold)
    heap_increase: 66 pages (no change)
    stable_memory_increase: 0 pages (no change)

  1. Encoding (scope):
    calls: 1 (no change)
    instructions: 4.21 M (-0.01%) (change within noise threshold)
    heap_increase: 66 pages (no change)
    stable_memory_increase: 0 pages (no change)

  2. Decoding (scope):
    calls: 1 (no change)
    instructions: 2.12 M (-0.00%) (change within noise threshold)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Benchmark: btreemap
  total:
    instructions: 13.57 B (improved by 31.89%)
    heap_increase: 1154 pages (improved by 2.12%)
    stable_memory_increase: 0 pages (no change)

  1. Encoding (scope):
    calls: 1 (no change)
    instructions: 531.98 M (improved by 88.77%)
    heap_increase: 159 pages (no change)
    stable_memory_increase: 0 pages (no change)

  2. Decoding (scope):
    calls: 1 (no change)
    instructions: 13.04 B (improved by 14.16%)
    heap_increase: 995 pages (improved by 2.45%)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Benchmark: extra_args
  total:
    instructions: 3.42 M (0.00%) (change within noise threshold)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Benchmark: nns
  total:
    instructions: 25.65 M (-0.14%) (change within noise threshold)
    heap_increase: 3 pages (no change)
    stable_memory_increase: 0 pages (no change)

  0. Parsing (scope):
    calls: 1 (no change)
    instructions: 17.07 M (no change)
    heap_increase: 3 pages (no change)
    stable_memory_increase: 0 pages (no change)

  1. Encoding (scope):
    calls: 1 (no change)
    instructions: 2.02 M (-0.27%) (change within noise threshold)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 0 pages (no change)

  2. Decoding (scope):
    calls: 1 (no change)
    instructions: 5.71 M (-0.55%) (change within noise threshold)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Benchmark: nns_list_proposal
  total:
    instructions: 74.84 M (-0.11%) (change within noise threshold)
    heap_increase: 19 pages (no change)
    stable_memory_increase: 0 pages (no change)

  1. Encoding (scope):
    calls: 1 (no change)
    instructions: 7.02 M (-0.47%) (change within noise threshold)
    heap_increase: 5 pages (no change)
    stable_memory_increase: 0 pages (no change)

  2. Decoding (scope):
    calls: 1 (no change)
    instructions: 67.82 M (-0.07%) (change within noise threshold)
    heap_increase: 14 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Benchmark: option_list
  total:
    instructions: 24.27 M (improved by 29.93%)
    heap_increase: 2 pages (no change)
    stable_memory_increase: 0 pages (no change)

  1. Encoding (scope):
    calls: 1 (no change)
    instructions: 716.02 K (improved by 91.14%)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 0 pages (no change)

  2. Decoding (scope):
    calls: 1 (no change)
    instructions: 23.55 M (improved by 11.30%)
    heap_increase: 2 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Benchmark: text
  total:
    instructions: 12.08 M (-0.00%) (change within noise threshold)
    heap_increase: 99 pages (no change)
    stable_memory_increase: 0 pages (no change)

  1. Encoding (scope):
    calls: 1 (no change)
    instructions: 4.20 M (-0.01%) (change within noise threshold)
    heap_increase: 66 pages (no change)
    stable_memory_increase: 0 pages (no change)

  2. Decoding (scope):
    calls: 1 (no change)
    instructions: 7.88 M (-0.00%) (change within noise threshold)
    heap_increase: 33 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Benchmark: variant_list
  total:
    instructions: 22.95 M (improved by 31.10%)
    heap_increase: 2 pages (no change)
    stable_memory_increase: 0 pages (no change)

  1. Encoding (scope):
    calls: 1 (no change)
    instructions: 711.05 K (improved by 91.26%)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 0 pages (no change)

  2. Decoding (scope):
    calls: 1 (no change)
    instructions: 22.24 M (improved by 11.67%)
    heap_increase: 2 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Benchmark: vec_int16
  total:
    instructions: 1.14 B (-0.00%) (change within noise threshold)
    heap_increase: 261 pages (no change)
    stable_memory_increase: 0 pages (no change)

  1. Encoding (scope):
    calls: 1 (no change)
    instructions: 123.69 M (-0.00%) (change within noise threshold)
    heap_increase: 261 pages (no change)
    stable_memory_increase: 0 pages (no change)

  2. Decoding (scope):
    calls: 1 (no change)
    instructions: 1.02 B (-0.00%) (change within noise threshold)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Summary:
  instructions:
    status:   Improvements detected 🟢
    counts:   [total 9 | regressed 0 | improved 3 | new 0 | unchanged 6]
    change:   [max +102 | p75 -288 | median -37.23K | p25 -10.36M | min -6.35B]
    change %: [max 0.00% | p75 -0.00% | median -0.11% | p25 -29.93% | min -31.89%]

  heap_increase:
    status:   Improvements detected 🟢
    counts:   [total 9 | regressed 0 | improved 1 | new 0 | unchanged 8]
    change:   [max 0 | p75 0 | median 0 | p25 0 | min -25]
    change %: [max 0.00% | p75 0.00% | median 0.00% | p25 0.00% | min -2.12%]

  stable_memory_increase:
    status:   No significant changes 👍
    counts:   [total 9 | regressed 0 | improved 0 | new 0 | unchanged 9]
    change:   [max 0 | p75 0 | median 0 | p25 0 | min 0]
    change %: [max 0.00% | p75 0.00% | median 0.00% | p25 0.00% | min 0.00%]

---------------------------------------------------

Only significant changes:
| status | name                      | calls |     ins |  ins Δ% |    HI |  HI Δ% | SMI |  SMI Δ% |
|--------|---------------------------|-------|---------|---------|-------|--------|-----|---------|
|   -    | option_list::2. Decoding  |     1 |  23.55M | -11.30% |     2 |  0.00% |   0 |   0.00% |
|   -    | variant_list::2. Decoding |     1 |  22.24M | -11.67% |     2 |  0.00% |   0 |   0.00% |
|   -    | btreemap::2. Decoding     |     1 |  13.04B | -14.16% |   995 | -2.45% |   0 |   0.00% |
|   -    | option_list               |       |  24.27M | -29.93% |     2 |  0.00% |   0 |   0.00% |
|   -    | variant_list              |       |  22.95M | -31.10% |     2 |  0.00% |   0 |   0.00% |
|   -    | btreemap                  |       |  13.57B | -31.89% | 1.15K | -2.12% |   0 |   0.00% |
|   -    | btreemap::1. Encoding     |     1 | 531.98M | -88.77% |   159 |  0.00% |   0 |   0.00% |
|   -    | option_list::1. Encoding  |     1 | 716.02K | -91.14% |     0 |  0.00% |   0 |   0.00% |
|   -    | variant_list::1. Encoding |     1 | 711.05K | -91.26% |     0 |  0.00% |   0 |   0.00% |

ins = instructions, HI = heap_increase, SMI = stable_memory_increase, Δ% = percent change

---------------------------------------------------
Successfully persisted results to canbench_results.yml

sasa-tomic marked this pull request as ready for review March 13, 2026 09:36
sasa-tomic requested a review from a team as a code owner March 13, 2026 09:36
lwshang and others added 4 commits March 15, 2026 12:47
Mirror the u64-accumulation fast path already added to Nat::decode.
Accumulate bits into a u64 with the same overflow guard, then
sign-extend at the last byte to produce an i64.  Values that fit in
i64 skip BigInt allocation entirely; values in [2^63, 2^64-1] (valid
Int but not i64) still use BigInt correctly.  The BigInt fallback for
truly large numbers mirrors the structure of Nat::decode's fallback.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
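
The commit message above can be sketched roughly as follows. This is a minimal reconstruction under stated assumptions, not the code from the commit: the name `decode_int_fast` and the `Option<i64>` return type are hypothetical, and `None` simply means "hand the input to the BigInt decoder", which is assumed to also report truncated input.

```rust
/// Fast path for signed LEB128. Returns `Some(v)` when the encoded Int fits
/// in an i64 and `None` when the caller should fall back to the BigInt
/// decoder (value out of i64 range, or truncated input).
/// Hypothetical name and signature, for illustration only.
fn decode_int_fast(bytes: &[u8]) -> Option<i64> {
    let mut acc: u64 = 0;
    let mut shift: u32 = 0;
    for &byte in bytes {
        let payload = u64::from(byte & 0x7f);
        let is_last = byte & 0x80 == 0;
        if shift < 57 {
            // Bytes 1 through 9: all seven payload bits fit in the u64.
            acc |= payload << shift;
            shift += 7;
            if is_last {
                // Sign-extend from the payload's top bit (0x40).
                if byte & 0x40 != 0 {
                    acc |= !0u64 << shift; // shift is at most 63 here
                }
                return Some(acc as i64);
            }
        } else {
            // Tenth byte (shift == 63): only bit 63 is left. An i64 value
            // ends here with pure sign extension, 0x00 or 0x7f. Anything
            // else, such as 2^63, is a valid Int that needs BigInt.
            return match byte {
                0x00 => Some(acc as i64),
                0x7f => Some((acc | (1u64 << 63)) as i64),
                _ => None,
            };
        }
    }
    None // ran out of bytes before the terminating byte
}
```

In this sketch the accumulator stays a `u64` until the final byte, so the only signed operation is the closing `as i64` cast.
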
Cover the exact values where the u64 fast path hands off to BigInt:
- Nat: u64::MAX (last fast-path value) and u64::MAX+1 (first BigInt),
  plus a sweep of values near the boundary.
- Int: i64::MAX and i64::MAX+1 (positive overflow branch), i64::MIN
  (sign-extension fast path) and i64::MIN-1 (BigInt negative branch).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
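
To make the Nat boundary concrete, here is a small sketch of an unsigned fast path together with the two encodings that sit on either side of the hand-off. The decoder name and its `Option<u64>` result are assumptions for illustration and are not the tests or code added in the commit. `u64::MAX` encodes as nine `0xff` bytes followed by `0x01`, and `u64::MAX + 1` as nine `0x80` bytes followed by `0x02`.

```rust
/// Unsigned LEB128 fast path. Returns `Some(v)` when the value fits in a
/// u64 and `None` when the caller should fall back to the BigInt decoder.
/// Hypothetical name and signature, for illustration only.
fn decode_nat_fast(bytes: &[u8]) -> Option<u64> {
    let mut acc: u64 = 0;
    let mut shift: u32 = 0;
    for &byte in bytes {
        let payload = u64::from(byte & 0x7f);
        // Bail out once the payload would spill past bit 63. The `shift >= 64`
        // test comes first so the shifts below never see an out-of-range amount.
        if shift >= 64 || (payload << shift) >> shift != payload {
            return None;
        }
        acc |= payload << shift;
        if byte & 0x80 == 0 {
            return Some(acc);
        }
        shift += 7;
    }
    None // truncated input: leave error reporting to the fallback
}

/// Encoding of u64::MAX: the last value the fast path accepts.
fn encoded_u64_max() -> Vec<u8> {
    let mut bytes = vec![0xffu8; 9];
    bytes.push(0x01);
    bytes
}

/// Encoding of u64::MAX + 1 (2^64): the first value that needs BigInt.
fn encoded_u64_max_plus_one() -> Vec<u8> {
    let mut bytes = vec![0x80u8; 9];
    bytes.push(0x02);
    bytes
}
```
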
Add inline comments explaining why the `shift < 64` branch must be
tested before evaluating `1u64 << (64 - shift)`: the expression would
panic in debug mode if shift were >= 64, so the else-false arm acts as
both the panic guard and the BigInt-fallback trigger.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The identical 7-line fits_u64 guard existed verbatim in both Nat::decode
and Int::decode. Extract it into a shared inline helper function.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
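
The guard described in the last two commit messages might look something like the helper below. This is a hedged reconstruction and not the actual seven-line guard: the name `fits_u64` comes from the commit message, but the signature and body are assumptions. The point it illustrates is the branch order: `64 - shift` underflows for `shift > 64`, and a shift amount of 64 or more overflows, and both panic in debug builds, so the range test has to run first.

```rust
/// Returns true when a 7-bit `payload` placed at bit offset `shift` still
/// fits in a u64 accumulator. Signature and body are assumptions.
#[inline]
fn fits_u64(shift: u32, payload: u8) -> bool {
    if shift < 64 {
        // Safe only inside this branch: free_bits is in 1..=64.
        let free_bits = 64 - shift;
        // Seven or more free bits always hold the payload. Otherwise compare
        // against 2^free_bits, where free_bits is now in 1..=6.
        free_bits >= 7 || u64::from(payload) < (1u64 << free_bits)
    } else {
        // Doubles as the panic guard and the BigInt-fallback trigger.
        false
    }
}
```

A decoder for either type could call this once per byte, for example `if !fits_u64(shift, byte & 0x7f) { /* switch to BigInt */ }`, which keeps the boundary logic in one place.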
lwshang merged commit 7319172 into master Mar 15, 2026
11 checks passed
lwshang deleted the perf-1-fast-path-small-num branch March 15, 2026 17:00