fix: Correct REST JSON serialization of UUID/HUGEINT/BLOB/BIT (#89) #92

Merged: jrosskopf merged 1 commit into main from fix/gh-89-scalar-type-coverage on Jun 26, 2026

@jrosskopf (Contributor)

Summary

Follow-up to #90 / #91. A full audit of every DuckDB type against the REST JSON serializer turned up more mis-routed scalar types, including one that caused a crash:

| Type | Before | After |
|---|---|---|
| UUID | segfault (read as a string pointer; physically a 128-bit int) | canonical 8-4-4-4-12 hex string |
| HUGEINT / UHUGEINT | read as 64-bit → truncated + wrong multi-row stride (`MAX::HUGEINT` → -1) | exact decimal string (lossless) |
| BLOB | raw bytes → invalid UTF-8 / invalid JSON | DuckDB blob string (`\xNN` for non-printable bytes) |
| BIT | raw bit storage → garbage | `"101010"`-style 0/1 string |
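For context on the UUID row: a UUID column holds a 128-bit integer rather than a string, so formatting it means printing the two 64-bit halves as hex. The sketch below is a minimal illustration, not the serializer's actual code. The Hugeint struct and UuidToString name are hypothetical, and it assumes DuckDB's storage convention of flipping the top bit of the upper half (so signed 128-bit ordering matches UUID ordering), which is undone before printing.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical mirror of a 128-bit value split into two 64-bit halves
// (assumed layout: unsigned lower half, signed upper half).
struct Hugeint {
    uint64_t lower;
    int64_t upper;
};

// Formats a UUID stored as a 128-bit integer into the canonical
// 8-4-4-4-12 lowercase hex form. Assumes the top bit of the upper half
// was flipped at storage time; flipping it again restores the raw bits.
std::string UuidToString(const Hugeint& v) {
    const uint64_t hi = static_cast<uint64_t>(v.upper) ^ (uint64_t{1} << 63);
    const uint64_t lo = v.lower;
    char buf[37];  // 32 hex digits + 4 dashes + terminating NUL
    std::snprintf(buf, sizeof(buf), "%08llx-%04llx-%04llx-%04llx-%012llx",
                  static_cast<unsigned long long>(hi >> 32),
                  static_cast<unsigned long long>((hi >> 16) & 0xFFFF),
                  static_cast<unsigned long long>(hi & 0xFFFF),
                  static_cast<unsigned long long>(lo >> 48),
                  static_cast<unsigned long long>(lo & 0xFFFFFFFFFFFFULL));
    return std::string(buf);
}
```

Under these assumptions, a stored pair with lower = 0xa716446655440000 and upper = 0x550e8400e29b41d4 (top bit flipped) formats as 550e8400-e29b-41d4-a716-446655440000.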

128-bit integers are emitted as strings (per maintainer decision) so that values beyond 2⁶³ survive intact; JSON numbers, which most parsers read as IEEE-754 doubles, lose integer precision above 2⁵³.
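A minimal sketch of the lossless path, assuming the value arrives as two 64-bit halves (function names are hypothetical): repeated long division by 10 over four 32-bit limbs yields the exact decimal digits without relying on the non-standard __int128 type. By contrast, converting 9007199254740993 (2⁵³ + 1) to a double already yields 9007199254740992, which is the precision loss the string encoding avoids.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>

// Converts an unsigned 128-bit value (hi:lo) to its exact decimal string by
// repeated division by 10 over four 32-bit limbs (portable: no __int128).
std::string U128ToDecimal(uint64_t hi, uint64_t lo) {
    uint32_t limbs[4] = {
        static_cast<uint32_t>(hi >> 32), static_cast<uint32_t>(hi),
        static_cast<uint32_t>(lo >> 32), static_cast<uint32_t>(lo)};
    std::string digits;
    bool nonzero = true;
    while (nonzero) {
        uint64_t rem = 0;
        nonzero = false;
        for (auto& limb : limbs) {  // most significant limb first
            const uint64_t cur = (rem << 32) | limb;  // fits: rem < 10
            limb = static_cast<uint32_t>(cur / 10);
            rem = cur % 10;
            if (limb != 0) nonzero = true;
        }
        digits.push_back(static_cast<char>('0' + rem));  // least significant digit
    }
    std::reverse(digits.begin(), digits.end());
    return digits;
}

// Signed variant, assuming a {lower, upper} split with a two's-complement upper half.
std::string I128ToDecimal(uint64_t lower, int64_t upper) {
    uint64_t hi = static_cast<uint64_t>(upper);
    uint64_t lo = lower;
    if (upper >= 0) return U128ToDecimal(hi, lo);
    lo = ~lo + 1;                  // negate the low half
    hi = ~hi + (lo == 0 ? 1 : 0);  // propagate the carry into the high half
    return "-" + U128ToDecimal(hi, lo);
}
```

For example, I128ToDecimal(UINT64_MAX, INT64_MAX), the largest HUGEINT, yields 170141183460469231731687303715884105727, a value no JSON number decoded as a double can represent exactly.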

Verified correct (no change needed)

All standard scalars, DATE/TIME/TIMESTAMP(+variants), DECIMAL, ENUM, INTERVAL, JSON, and the nested types LIST/STRUCT/ARRAY/UNION/MAP incl. deeply nested combinations.

Known limitation

VARINT/BIGNUM, GEOMETRY, and VARIANT still serialize as null because their internal/extension encodings aren't safely convertible at the vector level. This is documented in code; to_json(col) remains a workaround.

Test plan

  • New [query_executor][scalar_types] regression tests: UUID (single + multi-row, no crash), HUGEINT/UHUGEINT exact decimals, BLOB, BIT, and NULLs.
  • Full C++ unit suite: 644/644 passing.
  • Reviewed with codex (LGTM: UUID formatting matches DuckDB, no value leaks, validity handled).
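The multi-row cases in the test plan matter because a 64-bit read of a 128-bit column goes wrong in two ways: row 0 yields only the low half, and every later row lands on the wrong bytes. The sketch below reproduces that failure mode in isolation; it is not the serializer's code. The Hugeint struct, ReadAsInt64, and ReadAsHugeint are hypothetical, and the layout (lower half first, then signed upper half, 16 bytes per row) is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical 128-bit row layout: lower half first, then signed upper half.
struct Hugeint {
    uint64_t lower;
    int64_t upper;
};
static_assert(sizeof(Hugeint) == 16, "expected a 16-byte row");

// The buggy read: treats the column buffer as packed int64 values, so row i
// is taken from byte offset i * 8 instead of i * 16.
int64_t ReadAsInt64(const std::vector<Hugeint>& column, std::size_t row) {
    int64_t v;
    std::memcpy(&v,
                reinterpret_cast<const char*>(column.data()) + row * sizeof(int64_t),
                sizeof(v));
    return v;
}

// The corrected read: uses the real 16-byte stride and keeps both halves.
Hugeint ReadAsHugeint(const std::vector<Hugeint>& column, std::size_t row) {
    return column[row];
}
```

With MAX::HUGEINT in row 0, ReadAsInt64 returns -1 for row 0 (the all-ones low half) and INT64_MAX for row 1 (row 0's upper half), while ReadAsHugeint returns the intended values.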

Relates to #89

Type-coverage audit follow-up. Several scalar types were mis-routed in
the REST serializer:

- UUID -> read via the VARCHAR path (as a string pointer) but is
  physically a 128-bit int, causing a SEGFAULT on any UUID column.
  Now formatted as the canonical 8-4-4-4-12 hex string.
- HUGEINT/UHUGEINT -> read as 64-bit ints, truncating values and
  mis-striding multi-row chunks (e.g. MAX::HUGEINT -> -1). Now emitted
  as exact decimal strings (lossless; JSON numbers lose precision
  above 2^53).
- BLOB -> emitted raw bytes (invalid UTF-8 / invalid JSON). Now uses
  DuckDB's blob string form (printable as-is, others as \xNN).
- BIT -> emitted raw bit storage as garbage. Now emitted as its 0/1
  string.
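The BLOB and BIT conversions can be sketched as below. Both functions are hypothetical illustrations. The set of bytes BlobToString passes through unescaped (printable ASCII except backslash and quotes) and the BIT layout BitToString decodes (a leading byte counting padding bits, then data bytes read most-significant bit first) are assumptions modeled on DuckDB's behavior, not details confirmed by this PR.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Renders raw BLOB bytes as text: assumed-printable ASCII bytes pass through,
// everything else becomes \xNN with uppercase hex digits.
std::string BlobToString(const std::string& bytes) {
    static const char kHex[] = "0123456789ABCDEF";
    std::string out;
    for (unsigned char c : bytes) {
        const bool regular =
            c >= 0x20 && c <= 0x7E && c != '\\' && c != '\'' && c != '"';
        if (regular) {
            out.push_back(static_cast<char>(c));
        } else {
            out += "\\x";
            out.push_back(kHex[c >> 4]);
            out.push_back(kHex[c & 0x0F]);
        }
    }
    return out;
}

// Decodes an assumed BIT layout into a 0/1 string: raw[0] is the number of
// padding bits at the front of the first data byte; data starts at raw[1].
std::string BitToString(const std::vector<uint8_t>& raw) {
    std::string out;
    if (raw.empty()) return out;
    const std::size_t padding = raw[0];
    const std::size_t total_bits = (raw.size() - 1) * 8;
    for (std::size_t i = padding; i < total_bits; ++i) {
        const uint8_t byte = raw[1 + i / 8];
        out.push_back(((byte >> (7 - i % 8)) & 1) ? '1' : '0');
    }
    return out;
}
```

The blob string still has to be JSON-escaped when it is written out, so the text \xAA appears as "\\xAA" inside the JSON document.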

All honor row validity (NULL -> null). Adds [scalar_types] regression
tests. VARINT/BIGNUM, GEOMETRY and VARIANT remain serialized as null
(internal/extension encodings not safely convertible at the vector
level); this is documented as a known limitation.
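Row validity handling can be sketched as follows, assuming a DuckDB-style validity bitmap (one bit per row packed into 64-bit words, set bit = valid, empty mask = all rows valid). RowIsValid and ColumnToJsonArray are hypothetical names. The key property is that the type-specific formatter is never invoked for a NULL row, so whatever bytes sit in an invalid slot are never decoded.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <vector>

// Returns whether a row holds a value under the assumed bitmap convention.
bool RowIsValid(const std::vector<uint64_t>& mask, std::size_t row) {
    if (mask.empty()) return true;  // assumed: no mask means every row is valid
    return (mask[row / 64] >> (row % 64)) & 1ULL;
}

// Emits one column as a JSON array, writing null for invalid rows and calling
// format_row only for valid ones. format_row must return a complete JSON value.
std::string ColumnToJsonArray(std::size_t count, const std::vector<uint64_t>& mask,
                              const std::function<std::string(std::size_t)>& format_row) {
    std::string out = "[";
    for (std::size_t row = 0; row < count; ++row) {
        if (row > 0) out += ",";
        out += RowIsValid(mask, row) ? format_row(row) : "null";
    }
    out += "]";
    return out;
}
```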

Found via type audit + codex review.
jrosskopf merged commit 187f4bb into main on Jun 26, 2026
21 checks passed
jrosskopf deleted the fix/gh-89-scalar-type-coverage branch on June 26, 2026 22:01