perf: batch cost tracking for primitive vector decode by sasa-tomic · Pull Request #716 · dfinity/candid

sasa-tomic · 2026-03-13T14:05:04Z

Summary

Pre-compute the total decoding cost for primitive vectors at the vector level instead of calling add_cost per element. Also set primitive_vec_fast_path once before visit_seq instead of saving and restoring it on every element.

Added primitive_byte_cost helper
Pre-computed len * (3 + byte_cost) in deserialize_seq for primitive vectors
Removed per-element add_cost from primitive_impl! and deserialize_bool fast paths
Moved add_cost(3) into non-primitive branch in next_element_seed
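
The change listed above can be illustrated with a minimal sketch. This is a simplified model, not the candid crate's code: the `Budget` and `Prim` types, the `charge_per_element` and `charge_batched` functions, and the signatures of `add_cost` and `primitive_byte_cost` are assumptions made for illustration, and the byte costs are assumed to equal the wire size of each fixed-width primitive.

```rust
// Hypothetical, simplified model of the cost accounting described in this PR.
// None of these types or signatures are taken from the candid crate.

/// A subset of fixed-width primitive element types (illustrative only).
#[derive(Clone, Copy)]
enum Prim {
    Bool,
    Int8,
    Int16,
    Int32,
    Int64,
}

/// Assumed per-element byte cost: the wire size of the primitive in bytes.
fn primitive_byte_cost(ty: Prim) -> u64 {
    match ty {
        Prim::Bool | Prim::Int8 => 1,
        Prim::Int16 => 2,
        Prim::Int32 => 4,
        Prim::Int64 => 8,
    }
}

/// Stand-in for the decoder's remaining cost budget.
struct Budget {
    remaining: u64,
}

impl Budget {
    /// Charge `cost` units; on failure the budget is left unchanged.
    fn add_cost(&mut self, cost: u64) -> Result<(), String> {
        self.remaining = self
            .remaining
            .checked_sub(cost)
            .ok_or_else(|| String::from("decoding cost exceeds the limit"))?;
        Ok(())
    }
}

/// Old shape: two add_cost calls for every element of the vector.
fn charge_per_element(budget: &mut Budget, len: u64, ty: Prim) -> Result<(), String> {
    for _ in 0..len {
        budget.add_cost(3)?; // fixed per-element overhead
        budget.add_cost(primitive_byte_cost(ty))?; // element payload
    }
    Ok(())
}

/// New shape: one add_cost call for the whole vector, len * (3 + byte_cost).
fn charge_batched(budget: &mut Budget, len: u64, ty: Prim) -> Result<(), String> {
    let total = len
        .checked_mul(3 + primitive_byte_cost(ty))
        .ok_or_else(|| String::from("cost computation overflowed"))?;
    budget.add_cost(total)
}
```

In this model both functions deduct the same amount whenever the budget is sufficient. The difference is that the batched version does a single checked subtraction, and an over-budget vector is rejected before any element is decoded rather than partway through.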

Benchmark (canbench, wasm32)

Metric	Before	After	Change
vec_int16 Decoding	694.2M inst	409.0M inst	-41.1%
vec_int16 Total	817.9M inst	532.7M inst	-34.9%

No regressions on any of the 9 benchmarks.
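
As a check on the table above, both percentages follow from the Before and After columns as relative reductions in instruction count (this is only arithmetic on the reported figures, not an additional measurement):

```latex
\frac{694.2 - 409.0}{694.2} = \frac{285.2}{694.2} \approx 41.1\%
\qquad
\frac{817.9 - 532.7}{817.9} = \frac{285.2}{817.9} \approx 34.9\%
```

The absolute saving is the same 285.2M instructions in both rows, which is what one would expect if the whole improvement comes from the decoding scope.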

Compatibility

Wire format and cost accounting semantics are unchanged. The total cost charged is identical; it is computed once upfront instead of incrementally.

Relates to #710

github-actions · 2026-03-13T14:10:04Z

Name	Max Mem (Kb)	Encode	Decode
blob	4_224	4_207_487	2_122_465
btreemap	73_856	531_975_943	13_058_092_273
nns	192	2_021_253	5_670_657 ($\textcolor{green}{-0.04\%}$)
nns_list_proposal	1_216	7_018_096 ($\textcolor{red}{0.04\%}$)	64_298_369 ($\textcolor{green}{-0.10\%}$)
option_list	64	715_981	21_800_274
text	6_336	4_204_384	7_877_792
variant_list	64	710_969	20_592_254 ($\textcolor{green}{-0.02\%}$)
vec_int16	12_480	8_404_689	408_970_132 ($\textcolor{green}{-35.64\%}$)

Parser cost: 16_179_361 ($\textcolor{red}{0.03\%}$)
Extra args: 2_838_484 ($\textcolor{green}{-1.17\%}$)

Raw report

---------------------------------------------------

Benchmark: blob
  total:
    instructions: 6.33 M (no change)
    heap_increase: 66 pages (no change)
    stable_memory_increase: 0 pages (no change)

  1. Encoding (scope):
    calls: 1 (no change)
    instructions: 4.21 M (no change)
    heap_increase: 66 pages (no change)
    stable_memory_increase: 0 pages (no change)

  2. Decoding (scope):
    calls: 1 (no change)
    instructions: 2.12 M (no change)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Benchmark: btreemap
  total:
    instructions: 13.59 B (no change)
    heap_increase: 1154 pages (no change)
    stable_memory_increase: 0 pages (no change)

  1. Encoding (scope):
    calls: 1 (no change)
    instructions: 531.98 M (no change)
    heap_increase: 159 pages (no change)
    stable_memory_increase: 0 pages (no change)

  2. Decoding (scope):
    calls: 1 (no change)
    instructions: 13.06 B (no change)
    heap_increase: 995 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Benchmark: extra_args
  total:
    instructions: 2.84 M (-1.17%) (change within noise threshold)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Benchmark: nns
  total:
    instructions: 24.71 M (0.01%) (change within noise threshold)
    heap_increase: 3 pages (no change)
    stable_memory_increase: 0 pages (no change)

  0. Parsing (scope):
    calls: 1 (no change)
    instructions: 16.18 M (0.03%) (change within noise threshold)
    heap_increase: 3 pages (no change)
    stable_memory_increase: 0 pages (no change)

  1. Encoding (scope):
    calls: 1 (no change)
    instructions: 2.02 M (no change)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 0 pages (no change)

  2. Decoding (scope):
    calls: 1 (no change)
    instructions: 5.67 M (-0.04%) (change within noise threshold)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Benchmark: nns_list_proposal
  total:
    instructions: 71.32 M (-0.08%) (change within noise threshold)
    heap_increase: 19 pages (no change)
    stable_memory_increase: 0 pages (no change)

  1. Encoding (scope):
    calls: 1 (no change)
    instructions: 7.02 M (0.04%) (change within noise threshold)
    heap_increase: 5 pages (no change)
    stable_memory_increase: 0 pages (no change)

  2. Decoding (scope):
    calls: 1 (no change)
    instructions: 64.30 M (-0.10%) (change within noise threshold)
    heap_increase: 14 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Benchmark: option_list
  total:
    instructions: 22.52 M (no change)
    heap_increase: 1 pages (no change)
    stable_memory_increase: 0 pages (no change)

  1. Encoding (scope):
    calls: 1 (no change)
    instructions: 715.98 K (no change)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 0 pages (no change)

  2. Decoding (scope):
    calls: 1 (no change)
    instructions: 21.80 M (no change)
    heap_increase: 1 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Benchmark: text
  total:
    instructions: 12.08 M (no change)
    heap_increase: 99 pages (no change)
    stable_memory_increase: 0 pages (no change)

  1. Encoding (scope):
    calls: 1 (no change)
    instructions: 4.20 M (no change)
    heap_increase: 66 pages (no change)
    stable_memory_increase: 0 pages (no change)

  2. Decoding (scope):
    calls: 1 (no change)
    instructions: 7.88 M (no change)
    heap_increase: 33 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Benchmark: variant_list
  total:
    instructions: 21.31 M (-0.02%) (change within noise threshold)
    heap_increase: 1 pages (no change)
    stable_memory_increase: 0 pages (no change)

  1. Encoding (scope):
    calls: 1 (no change)
    instructions: 710.97 K (no change)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 0 pages (no change)

  2. Decoding (scope):
    calls: 1 (no change)
    instructions: 20.59 M (-0.02%) (change within noise threshold)
    heap_increase: 1 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Benchmark: vec_int16
  total:
    instructions: 417.38 M (improved by 35.18%)
    heap_increase: 195 pages (no change)
    stable_memory_increase: 0 pages (no change)

  1. Encoding (scope):
    calls: 1 (no change)
    instructions: 8.40 M (no change)
    heap_increase: 130 pages (no change)
    stable_memory_increase: 0 pages (no change)

  2. Decoding (scope):
    calls: 1 (no change)
    instructions: 408.97 M (improved by 35.64%)
    heap_increase: 65 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Summary:
  instructions:
    status:   Improvements detected 🟢
    counts:   [total 9 | regressed 0 | improved 1 | new 0 | unchanged 8]
    change:   [max +2.97K | p75 0 | median 0 | p25 -33.63K | min -226.49M]
    change %: [max +0.01% | p75 0.00% | median 0.00% | p25 -0.08% | min -35.18%]

  heap_increase:
    status:   No significant changes 👍
    counts:   [total 9 | regressed 0 | improved 0 | new 0 | unchanged 9]
    change:   [max 0 | p75 0 | median 0 | p25 0 | min 0]
    change %: [max 0.00% | p75 0.00% | median 0.00% | p25 0.00% | min 0.00%]

  stable_memory_increase:
    status:   No significant changes 👍
    counts:   [total 9 | regressed 0 | improved 0 | new 0 | unchanged 9]
    change:   [max 0 | p75 0 | median 0 | p25 0 | min 0]
    change %: [max 0.00% | p75 0.00% | median 0.00% | p25 0.00% | min 0.00%]

---------------------------------------------------

Only significant changes:
| status | name                   | calls |     ins |  ins Δ% |  HI |  HI Δ% | SMI |  SMI Δ% |
|--------|------------------------|-------|---------|---------|-----|--------|-----|---------|
|   -    | vec_int16              |       | 417.38M | -35.18% | 195 |  0.00% |   0 |   0.00% |
|   -    | vec_int16::2. Decoding |     1 | 408.97M | -35.64% |  65 |  0.00% |   0 |   0.00% |

ins = instructions, HI = heap_increase, SMI = stable_memory_increase, Δ% = percent change

---------------------------------------------------
Successfully persisted results to canbench_results.yml

Pre-compute the total decoding cost for primitive vectors at the vector level instead of calling add_cost per element. Also sets primitive_vec_fast_path once before visit_seq rather than saving and restoring it on every element.

Benchmark: vec_int16 decoding 694M → 409M instructions (41% faster).

Wire format and cost accounting semantics are unchanged.

Made-with: Cursor

sasa-tomic requested a review from a team as a code owner March 13, 2026 14:05

lwshang changed the base branch from sat-perf-improvements to master March 15, 2026 17:49

lwshang force-pushed the sat-perf-batch-decode branch from d55bd46 to 40251ab Compare March 15, 2026 18:01

lwshang approved these changes Mar 15, 2026

View reviewed changes

lwshang merged commit beb56d5 into master Mar 15, 2026
11 checks passed

lwshang deleted the sat-perf-batch-decode branch March 15, 2026 18:03
