@meiravgri
Collaborator

@meiravgri meiravgri commented Nov 6, 2025

This PR introduces a benchmarking suite for the SVS algorithm, including basic operations on loaded indices built on top of the existing training phase infrastructure.

New Benchmarks Added

  • BM_AddLabelOneByOne - Measures time to add individual vectors to a loaded SVS index one-by-one

  • BM_TriggerUpdateTiered - Measures time to move vectors from the frontend (flat buffer) to the backend (SVS index) in a tiered index

  • BM_RunGC - Tests graph repairing after deletions

| Test Name | Number of Vectors | Thread Count |
|---|---|---|
| BM_AddLabelOneByOne | 1,024 | 1 |
| BM_TriggerUpdateTiered | 1,024, 5,000, 10,240 (update_threshold) | 2, 4, 8 |
| BM_RunGC | 50, 100, 250, 500 (num_deletions) | 1 |
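
To make the size of the run matrix concrete, the table can be enumerated as a flat list of cases. This is only a sketch using the standard library, not the registration code from the PR: `BenchCase`, `buildCases`, and `caseLabel` are hypothetical names, and it assumes that BM_TriggerUpdateTiered runs the full cross product of update_threshold values and thread counts, which the table does not state explicitly.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// One benchmark configuration (hypothetical type, for illustration only).
struct BenchCase {
    std::string name;
    std::size_t sizeArg;  // number of vectors, update_threshold, or num_deletions
    unsigned threads;
};

// Enumerates the cases listed in the table.
// Assumption: BM_TriggerUpdateTiered uses every threshold with every thread count.
inline std::vector<BenchCase> buildCases() {
    const std::vector<std::size_t> updateThresholds = {1024, 5000, 10240};
    const std::vector<unsigned> threadCounts = {2, 4, 8};
    const std::vector<std::size_t> numDeletions = {50, 100, 250, 500};

    std::vector<BenchCase> cases;
    cases.push_back({"BM_AddLabelOneByOne", 1024, 1});
    for (std::size_t threshold : updateThresholds)
        for (unsigned threads : threadCounts)
            cases.push_back({"BM_TriggerUpdateTiered", threshold, threads});
    for (std::size_t deletions : numDeletions)
        cases.push_back({"BM_RunGC", deletions, 1});
    return cases;
}

// Builds a readable label of the form name/size/threads:N.
inline std::string caseLabel(const BenchCase& c) {
    assert(c.threads > 0);
    return c.name + "/" + std::to_string(c.sizeArg) + "/threads:" + std::to_string(c.threads);
}
```

Under that assumption the suite has 1 + 3×3 + 4 = 14 configurations, which is useful to keep in mind when estimating CI time.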

introduce BM_VecSimSVSTrain class with 2 methods: Train and TrainAsync

add GoogleTest to benchmarks so we can use ASSERT_* API

tieredIndexMock: possible to initialize with a specific thread count

add train benchmark to CI benchmark dispatcher
rename svs_training_fp32 -> svs_indices_training_fp32

add to bm_files.sh
move svs params init to CreateTieredSVSIndex

only 5 iterations
remove 100K
move UNIT_AND_ITERATIONS and QUANT_BITS_ARGS to bm_vecsim_basics_Svs
… to bm_training_initialize.h

define DATA_TYPE_INDEX_T in bm_svs_training_fp*.cpp

remove th 10k and 50k for arm
fix include header for fp16
@codecov

codecov bot commented Nov 6, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 96.63%. Comparing base (1a15b59) to head (a4c95f5).
⚠️ Report is 1 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #829   +/-   ##
=======================================
  Coverage   96.63%   96.63%           
=======================================
  Files         126      126           
  Lines        7379     7379           
=======================================
  Hits         7131     7131           
  Misses        248      248           

we hard code the name of the data file based on quantBits
benchmark deletion according to when we exceed 0.5 index size
this benchmark takes 20K ms (20s) for 500 vectors!!!
that's a lot
after revert - benchmark only gc to determine
Base automatically changed from meiravg_svs_training_bm to main November 10, 2025 13:59
runGC instead of delete label, so as not to depend on consolidation_threshold, which can't be controlled and runs for a very, very long time!
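
The reasoning behind benchmarking runGC directly can be illustrated with a toy model. This is a sketch under stated assumptions, not the SVS implementation: `ToyIndex` and its methods are hypothetical, and the 0.5 threshold is taken from the commit note above. Deletions only mark entries, and cleanup fires implicitly once the marked fraction exceeds the threshold, so a delete-based benchmark measures cleanup only when the threshold happens to be crossed, while an explicit GC call measures it on every run.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy index (hypothetical): deleted entries are marked first and reclaimed later.
class ToyIndex {
    enum class State : unsigned char { Live, Marked, Reclaimed };

public:
    ToyIndex(std::size_t numVectors, double consolidationThreshold)
        : state_(numVectors, State::Live), threshold_(consolidationThreshold) {}

    // Marks a label as deleted. Cleanup runs implicitly only when the marked
    // fraction exceeds the threshold. Returns how many entries this call reclaimed.
    std::size_t deleteLabel(std::size_t label) {
        assert(label < state_.size());
        if (state_[label] == State::Live) {
            state_[label] = State::Marked;
            ++pending_;
        }
        if (static_cast<double>(pending_) > threshold_ * static_cast<double>(state_.size()))
            return runGC();
        return 0;
    }

    // Explicit cleanup: reclaims every marked entry regardless of the threshold.
    std::size_t runGC() {
        std::size_t reclaimed = 0;
        for (State& s : state_) {
            if (s == State::Marked) {
                s = State::Reclaimed;
                ++reclaimed;
            }
        }
        pending_ = 0;
        return reclaimed;
    }

    std::size_t pendingDeletions() const { return pending_; }

private:
    std::vector<State> state_;
    double threshold_;
    std::size_t pending_ = 0;
};
```

In this model, deleting 500 of 1,024 entries never crosses the 0.5 threshold, so the delete path alone would not exercise cleanup at all; calling `runGC()` afterwards reclaims all 500 in one measurable step.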
fix mac
@meiravgri meiravgri changed the title from "introduce bm_svs" to "[MOD-9685] Introduce SVS Basic Benchmarks" Nov 10, 2025
@meiravgri meiravgri enabled auto-merge November 11, 2025 13:32
@meiravgri meiravgri added this pull request to the merge queue Nov 12, 2025
Merged via the queue into main with commit 11cdc8b Nov 12, 2025
37 of 38 checks passed
@meiravgri meiravgri deleted the meiravg-svs_basic_bm branch November 12, 2025 13:55
@github-actions

Backport failed for 8.2, because it was unable to cherry-pick the commit(s).

Please cherry-pick the changes locally and resolve any conflicts.

git fetch origin 8.2
git worktree add -d .worktree/backport-829-to-8.2 origin/8.2
cd .worktree/backport-829-to-8.2
git switch --create backport-829-to-8.2
git cherry-pick -x 11cdc8b7a8435dfb2e62f37293f6dabd55b7465b

meiravgri added a commit that referenced this pull request Nov 13, 2025
* initial imp of training bm

introduce BM_VecSimSVSTrain class with 2 methods: Train and TrainAsync

add GoogleTest to benchmarks so we can use ASSERT_* API

tieredIndexMock: possible to initialize with a specific thread count

add train benchmark to CI benchmark dispatcher

* make bm_files general for other algs

rename svs_training_fp32 -> svs_indices_training_fp32

add to bm_files.sh

* replace std::format (only supported from gcc13) with ostringstream

* format

* revert assert

* initialize quantBits
move svs params init to CreateTieredSVSIndex

only 5 iterations

* move iteration logic to runTrainBMIteration

add compressed index bm

* assert depending on HAVE_SVS_LVQ

* separate non compression and compression bm

* TO REVERT !!! test abort

* fix if else

* revert timeoutguard changes

* don't pause after training to see how it affects performance

remove some prints

* fix #ifdef HAVE_SVS_LVQ to #if HAVE_SVS_LVQ

* use pause timers, it's faster

* do 3 iter instead of 5 and test if results are stable

* use 5 again

* fix download all all script

* fp16 bm

remove 100K

* remove 100K from fp32

* increase timeout

* try bigger machine

* try a bigger machine

* try 2 iter

move UNIT_AND_ITERATIONS and QUANT_BITS_ARGS to bm_vecsim_basics_Svs

* unify bm_training_initialize_fp32.h and bm_training_initialize_fp16.h to bm_training_initialize.h

define DATA_TYPE_INDEX_T in bm_svs_training_fp*.cpp

remove th 10k and 50k for arm

* revert timeout to 10

fix include header for fp16

* move CreateTieredSVSParams and verifyNumThreads to svs params

* revert increase machine size

* change assert to log

* fix

* fix2

* format

* introduce bm_svs

* add tiered
add NewIndex from existing svs to tiered factory
imp AddLabel
add AddLabelBatches (not implemented)

take bm_utils from meiravg_svs_training_bm

* introduce setUpdateTriggerThreshold in BUILD_TESTS

move initialize index to a function

introduce bm functions:
addlabel: insert one by one
AddLabelBatches: add in batches with one thread
AddLabelAsync: add in batches with multiple threads

* fix comment
add to yml

* remove lock

* format

* fix num threads in addlabelinplace
fix assertupdateTriggerThreshold in AddLabelAsync

* use train svs instead

* format

* small fixes

* rename BM_VecSimSVSTrain->BM_VecSimSVS

bm_vecsim_svs_train.h->bm_vecsim_svs

* remove unrelated

* align with new name

* revert unnecessary changes in bm_vecsim_index

add LVQ BM if HAVE_SVS_LVQ

* fix include

* fix quantbits

* extract general

* fix missing main on LVQ cpp

* replace vectors file

* run only BENCHMARK_MAIN

* try dummy for mac

* fix DATA_TYPE_INDEX_T definition LVQ

* quantBits is now static and needs to be initialized by the CPP file
we hard code the name of the data file based on quantBits

* TO REVERT:
benchmark deletion according to when we exceed 0.5 index size
this benchmark takes 20K ms (20s) for 500 vectors!!!
that's a lot
after revert - benchmark only gc to determine

* REVERT svs.h change consolidation_threshold

runGC instead of delete label, so as not to depend on consolidation_threshold, which can't be controlled and runs for a very, very long time!

* revert unrelated changes

* fix LVQ8 cpp for non LVQ

* format

* cleanups

fix mac

* remove new line in cmake

* Update tests/benchmark/bm_vecsim_svs.h

Co-authored-by: BenGoldberger <[email protected]>

---------

Co-authored-by: BenGoldberger <[email protected]>
(cherry picked from commit 11cdc8b)
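
The std::format replacement listed in the commit message can be sketched as follows. This is an illustrative example, not code from the PR: `makeDataFileName` and the file-name pattern are hypothetical, chosen only to echo the note about deriving the data file name from quantBits. `std::ostringstream` is available on every standard library, so it avoids the compiler-version dependency the commit mentions.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>

// Hypothetical helper: builds a data-file name from index parameters using
// std::ostringstream instead of std::format.
inline std::string makeDataFileName(const std::string& prefix, std::size_t dim, int quantBits) {
    assert(quantBits >= 0);
    std::ostringstream oss;
    // With std::format this would read roughly:
    //   std::format("{}-dim{}-quant{}.bin", prefix, dim, quantBits)
    oss << prefix << "-dim" << dim << "-quant" << quantBits << ".bin";
    return oss.str();
}
```

For example, `makeDataFileName("svs", 512, 8)` yields `svs-dim512-quant8.bin`.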
github-merge-queue bot pushed a commit that referenced this pull request Nov 13, 2025
* [MOD-9685] Introduce SVS Basic Benchmarks (#829)

* fix factory

---------

Co-authored-by: BenGoldberger <[email protected]>
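
The `#ifdef HAVE_SVS_LVQ` to `#if HAVE_SVS_LVQ` fix in the commit history reflects a general preprocessor pitfall, sketched below. The macro `DEMO_HAVE_FEATURE` is hypothetical, and the assumption that the real flag may be defined to 0 when the feature is absent is not confirmed by this conversation; the example only shows why the two directives can disagree.

```cpp
#include <cassert>

// Hypothetical feature macro, defined to 0 the way a build system might
// express "feature disabled".
#define DEMO_HAVE_FEATURE 0

// #ifdef asks only whether the macro is defined, so a value of 0 still counts.
inline bool enabledAccordingToIfdef() {
#ifdef DEMO_HAVE_FEATURE
    return true;
#else
    return false;
#endif
}

// #if evaluates the macro's value, so 0 means the guarded code is skipped.
inline bool enabledAccordingToIf() {
#if DEMO_HAVE_FEATURE
    return true;
#else
    return false;
#endif
}

// Small self-check showing that the two directives disagree for a macro set to 0.
inline void demoPreprocessorDifference() {
    assert(enabledAccordingToIfdef());
    assert(!enabledAccordingToIf());
}
```

When a flag is always defined and carries a 0/1 value, `#if` is the directive that matches the intent.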