Native serialization to a stream for FlatIndex by razdoburdin · Pull Request #280 · intel/ScalableVectorSearch

razdoburdin · 2026-03-02T10:00:24Z

Reopening of #275 for developer branch

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

mergify · 2026-03-02T10:00:58Z

⚠️ The sha of the head commit of this PR conflicts with #275. Mergify cannot evaluate rules on this PR. Once #275 is merged or closed, Mergify will resume processing this PR. ⚠️

rfsaliev · 2026-03-04T10:01:08Z

+
+template <typename T = void> class StreamWriter : public Writer<T, StreamWriter<T>> {
+  public:
+    StreamWriter(std::ostream& os)


It seems that the Header structure written by FileWriter at the beginning of a file contains important information, including:

- the magic number and UUID, for versioning;
- the stored data size.

Why does StreamWriter not populate the same header?

General question:
How are we going to handle cases where several objects have to be stored in, or loaded from, a single stream?
E.g. in the case of the Vamana index, we have to store/load the configuration, the graph and the data (where the data may contain 2 simple datasets in the LVQ/LeanVec cases).

razdoburdin

For FileWriter the ostream is seekable (we know that it is an fstream), so we can insert a placeholder, write the data, calculate the size of the data written, and replace the placeholder with the actual header. But for StreamWriter the ostream may be non-seekable, so we can't do the same trick with a placeholder Header.
I see two options here:

1. Create a temporary seekable stringstream and use it as a buffer. But this creates a 2x memory overhead in serialization.
2. Extract all required information from the metadata. In this case we don't need the Header.

I have used the first approach (with stringstream) for toml::table serialization, since the metadata are small and the overhead looks like an acceptable trade-off in this case.
But for the main data I am trying to implement the second option (without overhead). I haven't started work on Vamana yet, so I am not sure whether the metadata contains all the required information in that case.
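The two ways of emitting a size header discussed above can be sketched as follows. This is a minimal illustration, not the SVS implementation: `SketchHeader`, `write_with_placeholder` and `write_via_buffer` are hypothetical names, and the header here carries only a payload size rather than the real magic number and UUID.

```cpp
#include <cstdint>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical header used only for this sketch; not the SVS Header layout.
struct SketchHeader {
    std::uint64_t payload_bytes = 0;
};

inline void write_header(std::ostream& os, const SketchHeader& header) {
    os.write(reinterpret_cast<const char*>(&header), sizeof(header));
}

// Seekable stream: reserve space for the header, write the payload,
// then seek back and overwrite the placeholder with the real size.
inline void write_with_placeholder(std::ostream& os, const std::string& payload) {
    const auto header_pos = os.tellp();
    SketchHeader header{};
    write_header(os, header); // placeholder
    const auto begin = os.tellp();
    os.write(payload.data(), static_cast<std::streamsize>(payload.size()));
    const auto end = os.tellp();
    header.payload_bytes = static_cast<std::uint64_t>(end - begin);
    os.seekp(header_pos);
    write_header(os, header); // patch in the real header
    os.seekp(end);            // restore the write position
}

// Non-seekable stream (option 1): serialize into a temporary buffer first,
// so the size is known before anything reaches the destination.
// The payload exists twice in memory while this runs.
inline void write_via_buffer(std::ostream& os, const std::string& payload) {
    std::ostringstream buffer;
    buffer.write(payload.data(), static_cast<std::streamsize>(payload.size()));
    const std::string bytes = buffer.str();
    SketchHeader header{};
    header.payload_bytes = static_cast<std::uint64_t>(bytes.size());
    write_header(os, header);
    os.write(bytes.data(), static_cast<std::streamsize>(bytes.size()));
}
```

Both functions produce the same bytes when the destination happens to be seekable; only the second one also works on a pipe or socket-backed stream, at the cost of the extra copy.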

rfsaliev

So, I would add a test for flat+LVQ/LeanVec to validate that multi-dataset cases are handled properly.

rfsaliev

LGTM, except for the concerns regarding multiple data/datasets in one stream, which are to be verified in the next steps, during implementation of Vamana index support.

This PR adds native stream serialization to all SVS index types, as an alternative to the existing (legacy) directory-based serialization. It avoids filesystem round-trips of the data. Native serialization does not require the stream to be seekable, so no additional restrictions are introduced. See the following PRs for details: #280, #281, #285, #286, #289, #292, #294, #296, #299

Main changes are:

1. A CRTP base `Archiver` extracts the binary I/O primitives (`write_size`, `read_size`, `write_name`, `read_name`, `read_from_istream`) from `DirectoryArchiver`. `DirectoryArchiver` and the new `StreamArchiver` class inherit from `Archiver`. `StreamArchiver` has its own magic number ("SVS_STRM") to distinguish native streams from directory archives.
2. The monolithic `Writer` is split via CRTP into two derived classes: `FileWriter` owns a `std::ofstream`, writes a header, and flushes in its destructor; `StreamWriter` wraps an external `std::ostream&` and does no header or lifecycle management. This allows `io::save(data, os)` to write vector data directly to any stream.
3. The `save(stream)` in the orchestrator `Impl` classes no longer does temp-dir->pack. Instead it calls `impl().save(stream)` directly.
4. The dispatch between the new (native) and old (legacy) deserialization happens in the orchestrators. `Deserializer::build(is)` reads the magic number and exposes `is_native()` to choose the path.

---------

Co-authored-by: Dmitry Razdoburdin <drazdobu@intel.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Rafik Saliev <rafik.f.saliev@intel.com>
Co-authored-by: ethanglaser <42726565+ethanglaser@users.noreply.github.com>
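The magic-number dispatch in item 4 can be illustrated with a small sketch. It assumes the magic is stored as the eight raw ASCII bytes of "SVS_STRM" at the very start of the stream; the actual on-disk encoding is not stated here, and `starts_with_native_magic` is a hypothetical helper, not the `Deserializer` API.

```cpp
#include <array>
#include <cstddef>
#include <cstring>
#include <istream>
#include <sstream>
#include <string>

// Assumption for this sketch: the magic is 8 raw ASCII bytes.
constexpr std::size_t kMagicSize = 8;
constexpr char kStreamMagic[kMagicSize + 1] = "SVS_STRM";

// Reads the first 8 bytes and reports whether they match the native magic.
// The bytes are consumed: on a non-seekable stream they cannot be put back,
// so a caller taking the legacy path has to account for them.
inline bool starts_with_native_magic(std::istream& is) {
    std::array<char, kMagicSize> prefix{};
    is.read(prefix.data(), static_cast<std::streamsize>(prefix.size()));
    if (is.gcount() != static_cast<std::streamsize>(kMagicSize)) {
        return false; // stream shorter than the magic
    }
    return std::memcmp(prefix.data(), kStreamMagic, kMagicSize) == 0;
}
```

Because the check only reads forward, it does not add a seekability requirement, which matches the constraint stated in the description.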

Dmitry Razdoburdin and others added 9 commits February 16, 2026 03:54

initial

f211865

fixes

77ce251

Update include/svs/core/data/simple.h

1055a82

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

adress copilot comments

a186515

Merge branch 'main' into serialization_flat_index

4e882cc

avoid stringstream.tellp(); improve comments

a0988f0

fix macos

f593ba6

fix backwardcompatibility; add tests

394fb3e

typos and minor fixes

0b81537

razdoburdin requested review from ahuber21, ibhati, mihaic and yuejiaointel as code owners March 2, 2026 10:00

razdoburdin mentioned this pull request Mar 2, 2026

Native serialization to a stream for FlatIndex #275

Closed

razdoburdin requested a review from rfsaliev March 2, 2026 17:06

rfsaliev reviewed Mar 4, 2026

rfsaliev approved these changes Mar 4, 2026

razdoburdin merged commit c6c42c4 into intel:dev/razdoburdin_streaming Mar 4, 2026
37 checks passed

razdoburdin mentioned this pull request Mar 24, 2026

Native serialization for Indexes. #300

Merged

razdoburdin merged 9 commits into intel:dev/razdoburdin_streaming from razdoburdin:serialization_flat_index
