refactor: Switch Arrow/Orjson serializers from Blake3 to xxHash3 #1

@27Bslash6

Summary

The Arrow and Orjson serializers attach a checksum to each payload for integrity checking. This issue tracks aligning that checksum with the Rust ByteStorage layer, which migrated to xxHash3 on 2025-12-05.

Current State (2025-12-11)

Phase 1 Complete: Switched to xxHash3-64 via Python xxhash package

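| Serializer         | Checksum   | Size    | Implementation         |
|--------------------|------------|---------|------------------------|
| StandardSerializer | xxHash3-64 | 8 bytes | Rust ByteStorage (FFI) |
| ArrowSerializer    | xxHash3-64 | 8 bytes | Python xxhash package  |
| OrjsonSerializer   | xxHash3-64 | 8 bytes | Python xxhash package  |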
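
To make the 8-byte checksum concrete, here is a minimal sketch of an integrity check built on it. The layout (checksum prefixed to the payload) and the names `seal`, `unseal`, and `digest8` are assumptions for illustration only. The issue does not describe the actual envelope format used by the serializers. The `hashlib.blake2b` branch is just a stand-in so the sketch runs where the xxhash package is not installed.

```python
CHECKSUM_SIZE = 8  # xxHash3-64 digest length in bytes

try:
    import xxhash

    def digest8(data: bytes) -> bytes:
        """8-byte xxHash3-64 digest via the Python xxhash package."""
        return xxhash.xxh3_64_digest(data)

except ImportError:
    import hashlib

    def digest8(data: bytes) -> bytes:
        """Stand-in 8-byte digest (NOT xxHash3) when xxhash is missing."""
        return hashlib.blake2b(data, digest_size=CHECKSUM_SIZE).digest()


def seal(payload: bytes) -> bytes:
    """Prefix the payload with its checksum (hypothetical layout)."""
    return digest8(payload) + payload


def unseal(blob: bytes) -> bytes:
    """Verify the checksum prefix and return the payload."""
    if len(blob) < CHECKSUM_SIZE:
        raise ValueError("blob is shorter than the checksum")
    stored, payload = blob[:CHECKSUM_SIZE], blob[CHECKSUM_SIZE:]
    if digest8(payload) != stored:
        raise ValueError("checksum mismatch: payload is corrupted")
    return payload
```

A sealed blob is always 8 bytes longer than its payload, and any change to the payload bytes should make `unseal` raise `ValueError`.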

Files updated:

  • src/cachekit/serializers/arrow_serializer.py
  • src/cachekit/serializers/orjson_serializer.py
  • Tests: test_xxhash_integrity.py (14 new tests), updated existing tests

Future Work: FFI Implementation

🔮 Phase 2 (Optional): Use Rust FFI for checksums instead of Python package

Blocked by: cachekit-io/cachekit-core#13 (checksum-only API)

# Current (Python xxhash)
import xxhash
checksum = xxhash.xxh3_64_digest(data)

# Future (Rust FFI) - requires cachekit-core#13
from cachekit._rust_serializer import compute_checksum
checksum = compute_checksum(data)
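
One possible way to stage the switch is to prefer the FFI function when it is importable and fall back to the Python package otherwise. The sketch below is hypothetical: `load_checksum` and `CANDIDATES` are illustrative names, and it assumes `compute_checksum` would return the same 8 bytes as `xxhash.xxh3_64_digest`, which this issue does not confirm.

```python
import importlib
from typing import Callable, Iterable, Tuple

ChecksumFn = Callable[[bytes], bytes]

# Preference order: Rust FFI first, Python xxhash second.
# compute_checksum is not available until cachekit-core#13 lands.
CANDIDATES: Tuple[Tuple[str, str], ...] = (
    ("cachekit._rust_serializer", "compute_checksum"),
    ("xxhash", "xxh3_64_digest"),
)


def load_checksum(candidates: Iterable[Tuple[str, str]] = CANDIDATES) -> ChecksumFn:
    """Return the first checksum function that can be imported."""
    for module_name, attr in candidates:
        try:
            module = importlib.import_module(module_name)
            return getattr(module, attr)
        except (ImportError, AttributeError):
            # Module missing, or present without the expected function.
            continue
    raise ImportError("no checksum implementation available")
```

A serializer could call `load_checksum()` once at import time and keep the returned function, so the per-payload path pays no lookup cost.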

Benefits of FFI approach:

  • Single implementation (no Python xxhash dependency)
  • Consistent with StandardSerializer path
  • Potentially faster for large payloads (avoid Python GIL)

Trade-offs:

  • FFI overhead may negate speed gains for small payloads
  • More complex build (Rust required)
  • Current Python solution works fine
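
The payload-size trade-off above is something a small benchmark could settle. The following is a rough sketch using only the standard library. The function names and the default payload sizes are assumptions, not part of cachekit.

```python
import os
import timeit
from typing import Callable, Dict, Iterable

ChecksumFn = Callable[[bytes], bytes]


def per_call_seconds(fn: ChecksumFn, payload: bytes, number: int = 1000) -> float:
    """Average wall-clock seconds for one call of fn(payload)."""
    return timeit.timeit(lambda: fn(payload), number=number) / number


def compare(
    impls: Dict[str, ChecksumFn],
    sizes: Iterable[int] = (64, 4096, 1_048_576),
    number: int = 1000,
) -> Dict[int, Dict[str, float]]:
    """Time every implementation on a random payload of each size."""
    results: Dict[int, Dict[str, float]] = {}
    for size in sizes:
        payload = os.urandom(size)
        results[size] = {
            name: per_call_seconds(fn, payload, number)
            for name, fn in impls.items()
        }
    return results
```

Once cachekit-core#13 is available, passing `{"python-xxhash": xxhash.xxh3_64_digest, "rust-ffi": compute_checksum}` to `compare` would show per-call cost at small and large sizes side by side.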

Decision Log

  • 2025-12-11: Implemented Phase 1 (Python xxhash). Phase 2 is deferred until cachekit-core#13 lands and benchmarks show whether the FFI call overhead is worth it.

Related

  • Upstream: cachekit-core#13 (checksum-only API in Rust)
  • Context: xxHash3 migration in ByteStorage (2025-12-05)

Labels: enhancement (New feature or request)