perf: batch cost tracking for primitive vector decode #716

Merged: lwshang merged 1 commit into master from sat-perf-batch-decode on Mar 15, 2026

@sasa-tomic commented on Mar 13, 2026

Contributor

Summary

Pre-compute the total decoding cost for primitive vectors at the vector level instead of calling add_cost per element. Also sets primitive_vec_fast_path once before visit_seq rather than saving and restoring it on every element.

  • Added primitive_byte_cost helper
  • Pre-computed len * (3 + byte_cost) in deserialize_seq for primitive vectors
  • Removed per-element add_cost from primitive_impl! and deserialize_bool fast paths
  • Moved add_cost(3) into non-primitive branch in next_element_seed
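
The bullets above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `CostTracker`, `CostExceeded`, `Prim`, `charge_per_element`, and `charge_batched` are hypothetical names, and the byte costs returned by this `primitive_byte_cost` are assumed to equal each type's width in bytes. Only the `3 + byte_cost` per-element formula and the `add_cost` / `primitive_byte_cost` names come from the description.

```rust
/// Hypothetical stand-in for the decoder's cost counter.
pub struct CostTracker {
    remaining: u64,
}

/// Hypothetical error returned when the cost budget is exhausted.
#[derive(Debug, PartialEq)]
pub struct CostExceeded;

impl CostTracker {
    pub fn new(limit: u64) -> Self {
        Self { remaining: limit }
    }

    /// Deducts `cost` from the budget; leaves the budget untouched on failure.
    pub fn add_cost(&mut self, cost: u64) -> Result<(), CostExceeded> {
        self.remaining = self.remaining.checked_sub(cost).ok_or(CostExceeded)?;
        Ok(())
    }

    pub fn remaining(&self) -> u64 {
        self.remaining
    }
}

/// A few primitive element types (illustrative subset).
#[derive(Clone, Copy)]
pub enum Prim {
    Bool,
    Int8,
    Int16,
    Int32,
    Int64,
}

/// Assumption: the per-element byte cost equals the type's width in bytes.
pub fn primitive_byte_cost(p: Prim) -> u64 {
    match p {
        Prim::Bool | Prim::Int8 => 1,
        Prim::Int16 => 2,
        Prim::Int32 => 4,
        Prim::Int64 => 8,
    }
}

/// Old shape: two `add_cost` calls for every element.
pub fn charge_per_element(t: &mut CostTracker, p: Prim, len: u64) -> Result<(), CostExceeded> {
    for _ in 0..len {
        t.add_cost(3)?;
        t.add_cost(primitive_byte_cost(p))?;
    }
    Ok(())
}

/// New shape: one `add_cost` call for the whole vector.
pub fn charge_batched(t: &mut CostTracker, p: Prim, len: u64) -> Result<(), CostExceeded> {
    let per_element = 3 + primitive_byte_cost(p);
    // Guard against overflow for very large declared lengths.
    let total = len.checked_mul(per_element).ok_or(CostExceeded)?;
    t.add_cost(total)
}
```

In this sketch, calling `charge_per_element` and `charge_batched` with the same `Prim` and `len` on two trackers that start from the same budget leaves both with the same `remaining()` value; the batched version just gets there with a single call.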

Benchmark (canbench, wasm32)

| Metric | Before | After | Change |
|--------|--------|-------|--------|
| vec_int16 Decoding | 694.2M inst | 409.0M inst | -41.1% |
| vec_int16 Total | 817.9M inst | 532.7M inst | -34.9% |

No regressions on any of the 9 benchmarks.
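
As a quick check, the percentages for vec_int16 are relative reductions in instruction count, computed from the before and after figures reported here:

$$
\frac{694.2 - 409.0}{694.2} = \frac{285.2}{694.2} \approx 41.1\%, \qquad \frac{817.9 - 532.7}{817.9} = \frac{285.2}{817.9} \approx 34.9\%
$$

Both rows drop by the same 285.2M instructions, which is consistent with the whole saving coming from the decoding scope.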

Compatibility

Wire format and cost accounting semantics are unchanged. The total cost charged is identical — it's just computed once upfront instead of incrementally.
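
The equivalence can be written as a one-line identity. Assume each element of a primitive vector is charged a fixed 3 units plus a constant per-type byte cost $b$, and let $n$ be the vector length (both symbols are notation introduced here, not names from the code):

$$
\sum_{i=1}^{n} (3 + b) = n\,(3 + b)
$$

For example, if $b = 2$ for a 16-bit integer (an assumed value), a vector of 1000 elements is charged 5000 units either way.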

Relates to #710

@sasa-tomic requested a review from a team as a code owner on March 13, 2026, 14:05
github-actions Bot commented Mar 13, 2026

| Name | Max Mem (Kb) | Encode | Decode |
|------|--------------|--------|--------|
| blob | 4_224 | 4_207_487 | 2_122_465 |
| btreemap | 73_856 | 531_975_943 | 13_058_092_273 |
| nns | 192 | 2_021_253 | 5_670_657 ($\textcolor{green}{-0.04\%}$) |
| nns_list_proposal | 1_216 | 7_018_096 ($\textcolor{red}{0.04\%}$) | 64_298_369 ($\textcolor{green}{-0.10\%}$) |
| option_list | 64 | 715_981 | 21_800_274 |
| text | 6_336 | 4_204_384 | 7_877_792 |
| variant_list | 64 | 710_969 | 20_592_254 ($\textcolor{green}{-0.02\%}$) |
| vec_int16 | 12_480 | 8_404_689 | 408_970_132 ($\textcolor{green}{-35.64\%}$) |

  • Parser cost: 16_179_361 ($\textcolor{red}{0.03\%}$)
  • Extra args: 2_838_484 ($\textcolor{green}{-1.17\%}$)
Click to see raw report
---------------------------------------------------

Benchmark: blob
  total:
    instructions: 6.33 M (no change)
    heap_increase: 66 pages (no change)
    stable_memory_increase: 0 pages (no change)

  1. Encoding (scope):
    calls: 1 (no change)
    instructions: 4.21 M (no change)
    heap_increase: 66 pages (no change)
    stable_memory_increase: 0 pages (no change)

  2. Decoding (scope):
    calls: 1 (no change)
    instructions: 2.12 M (no change)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Benchmark: btreemap
  total:
    instructions: 13.59 B (no change)
    heap_increase: 1154 pages (no change)
    stable_memory_increase: 0 pages (no change)

  1. Encoding (scope):
    calls: 1 (no change)
    instructions: 531.98 M (no change)
    heap_increase: 159 pages (no change)
    stable_memory_increase: 0 pages (no change)

  2. Decoding (scope):
    calls: 1 (no change)
    instructions: 13.06 B (no change)
    heap_increase: 995 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Benchmark: extra_args
  total:
    instructions: 2.84 M (-1.17%) (change within noise threshold)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Benchmark: nns
  total:
    instructions: 24.71 M (0.01%) (change within noise threshold)
    heap_increase: 3 pages (no change)
    stable_memory_increase: 0 pages (no change)

  0. Parsing (scope):
    calls: 1 (no change)
    instructions: 16.18 M (0.03%) (change within noise threshold)
    heap_increase: 3 pages (no change)
    stable_memory_increase: 0 pages (no change)

  1. Encoding (scope):
    calls: 1 (no change)
    instructions: 2.02 M (no change)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 0 pages (no change)

  2. Decoding (scope):
    calls: 1 (no change)
    instructions: 5.67 M (-0.04%) (change within noise threshold)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Benchmark: nns_list_proposal
  total:
    instructions: 71.32 M (-0.08%) (change within noise threshold)
    heap_increase: 19 pages (no change)
    stable_memory_increase: 0 pages (no change)

  1. Encoding (scope):
    calls: 1 (no change)
    instructions: 7.02 M (0.04%) (change within noise threshold)
    heap_increase: 5 pages (no change)
    stable_memory_increase: 0 pages (no change)

  2. Decoding (scope):
    calls: 1 (no change)
    instructions: 64.30 M (-0.10%) (change within noise threshold)
    heap_increase: 14 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Benchmark: option_list
  total:
    instructions: 22.52 M (no change)
    heap_increase: 1 pages (no change)
    stable_memory_increase: 0 pages (no change)

  1. Encoding (scope):
    calls: 1 (no change)
    instructions: 715.98 K (no change)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 0 pages (no change)

  2. Decoding (scope):
    calls: 1 (no change)
    instructions: 21.80 M (no change)
    heap_increase: 1 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Benchmark: text
  total:
    instructions: 12.08 M (no change)
    heap_increase: 99 pages (no change)
    stable_memory_increase: 0 pages (no change)

  1. Encoding (scope):
    calls: 1 (no change)
    instructions: 4.20 M (no change)
    heap_increase: 66 pages (no change)
    stable_memory_increase: 0 pages (no change)

  2. Decoding (scope):
    calls: 1 (no change)
    instructions: 7.88 M (no change)
    heap_increase: 33 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Benchmark: variant_list
  total:
    instructions: 21.31 M (-0.02%) (change within noise threshold)
    heap_increase: 1 pages (no change)
    stable_memory_increase: 0 pages (no change)

  1. Encoding (scope):
    calls: 1 (no change)
    instructions: 710.97 K (no change)
    heap_increase: 0 pages (no change)
    stable_memory_increase: 0 pages (no change)

  2. Decoding (scope):
    calls: 1 (no change)
    instructions: 20.59 M (-0.02%) (change within noise threshold)
    heap_increase: 1 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Benchmark: vec_int16
  total:
    instructions: 417.38 M (improved by 35.18%)
    heap_increase: 195 pages (no change)
    stable_memory_increase: 0 pages (no change)

  1. Encoding (scope):
    calls: 1 (no change)
    instructions: 8.40 M (no change)
    heap_increase: 130 pages (no change)
    stable_memory_increase: 0 pages (no change)

  2. Decoding (scope):
    calls: 1 (no change)
    instructions: 408.97 M (improved by 35.64%)
    heap_increase: 65 pages (no change)
    stable_memory_increase: 0 pages (no change)

---------------------------------------------------

Summary:
  instructions:
    status:   Improvements detected 🟢
    counts:   [total 9 | regressed 0 | improved 1 | new 0 | unchanged 8]
    change:   [max +2.97K | p75 0 | median 0 | p25 -33.63K | min -226.49M]
    change %: [max +0.01% | p75 0.00% | median 0.00% | p25 -0.08% | min -35.18%]

  heap_increase:
    status:   No significant changes 👍
    counts:   [total 9 | regressed 0 | improved 0 | new 0 | unchanged 9]
    change:   [max 0 | p75 0 | median 0 | p25 0 | min 0]
    change %: [max 0.00% | p75 0.00% | median 0.00% | p25 0.00% | min 0.00%]

  stable_memory_increase:
    status:   No significant changes 👍
    counts:   [total 9 | regressed 0 | improved 0 | new 0 | unchanged 9]
    change:   [max 0 | p75 0 | median 0 | p25 0 | min 0]
    change %: [max 0.00% | p75 0.00% | median 0.00% | p25 0.00% | min 0.00%]

---------------------------------------------------

Only significant changes:
| status | name                   | calls |     ins |  ins Δ% |  HI |  HI Δ% | SMI |  SMI Δ% |
|--------|------------------------|-------|---------|---------|-----|--------|-----|---------|
|   -    | vec_int16              |       | 417.38M | -35.18% | 195 |  0.00% |   0 |   0.00% |
|   -    | vec_int16::2. Decoding |     1 | 408.97M | -35.64% |  65 |  0.00% |   0 |   0.00% |

ins = instructions, HI = heap_increase, SMI = stable_memory_increase, Δ% = percent change

---------------------------------------------------
Successfully persisted results to canbench_results.yml

@lwshang changed the base branch from sat-perf-improvements to master on March 15, 2026, 17:49
Pre-compute the total decoding cost for primitive vectors at the
vector level instead of calling add_cost per element. Also sets
primitive_vec_fast_path once before visit_seq rather than saving
and restoring it on every element.

Benchmark: vec_int16 decoding 694M → 409M instructions (41% faster).

Wire format and cost accounting semantics are unchanged.

Made-with: Cursor
@lwshang force-pushed the sat-perf-batch-decode branch from d55bd46 to 40251ab on March 15, 2026, 18:01
@lwshang merged commit beb56d5 into master on Mar 15, 2026 (11 checks passed)
@lwshang deleted the sat-perf-batch-decode branch on March 15, 2026, 18:03