@f-dy commented Jun 10, 2025

Fixes #31

As explained in #31, the method used by spz for quaternion quantization can cause very large errors. That issue gives an example with an 8° rotation error; while such an error may be barely visible on small Gaussians, it clearly stands out as artifacts when the model has large, elongated Gaussians.

This PR fixes it while keeping backward compatibility with the SPZ format, by adopting a slightly better quantization strategy: it gradually explores "shells" of increasing radius in the 8-bit-quantized xyz quaternion space, centered on the default value (which is simply the result of rounding xyz).
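To make the idea concrete, here is a minimal Python sketch of the shell search (the PR itself is C++; the 8-bit mapping, the helper names, and the per-radius threshold schedule below are illustrative assumptions, not the PR's actual values):

import itertools
import numpy as np

# Sketch of the shell search, assuming the SPZ convention of storing the
# x, y, z components of a normalized quaternion with w >= 0, each mapped
# from [-1, 1] to an 8-bit code. Helper names are hypothetical.

def dequantize(code):
    """Map three 8-bit codes back to a unit quaternion (x, y, z, w), w >= 0."""
    xyz = np.asarray(code, dtype=float) / 127.5 - 1.0
    w = np.sqrt(max(0.0, 1.0 - xyz @ xyz))
    return np.append(xyz, w)

def geodesic_deg(qa, qb):
    """Rotation angle in degrees between two quaternions."""
    d = abs(np.dot(qa, qb)) / (np.linalg.norm(qa) * np.linalg.norm(qb))
    return np.degrees(2.0 * np.arccos(min(1.0, d)))

def shell(center, r):
    """Integer xyz codes at Chebyshev distance exactly r from center."""
    for off in itertools.product(range(-r, r + 1), repeat=3):
        if max(abs(o) for o in off) == r:
            code = tuple(int(c) + o for c, o in zip(center, off))
            if all(0 <= v <= 255 for v in code):
                yield code

# Hypothetical per-radius "good enough" thresholds in degrees; the PR's
# actual values were set experimentally and live in its source.
THRESHOLDS = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5]

def quantize_rotation(q, max_radius=8):
    """q is a unit quaternion (x, y, z, w) with w >= 0."""
    center = np.clip(np.round((np.asarray(q[:3]) + 1.0) * 127.5), 0, 255)
    best_code, best_err = None, np.inf
    for r in range(max_radius + 1):
        for code in shell(center, r):
            err = geodesic_deg(np.asarray(q, dtype=float), dequantize(code))
            if err < best_err:
                best_code, best_err = code, err
        if best_err < THRESHOLDS[r]:  # stop early once this shell is good enough
            break
    return best_code, best_err

The per-radius stopping thresholds are what produce the "Points tested up to radius r" buckets in the statistics below: a point stops exploring at radius r once its best error drops below that radius's threshold.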

The code outputs statistics about rotation quantization. For example, in this case:

[SPZ] Loading 116380 points
[SPZ] Rotation quantization statistics:
[SPZ] Average geodesic distance in degrees with fast quantization: 1.24818
[SPZ] Average geodesic distance in degrees with optimized quantization: 0.773356
[SPZ] Points tested up to radius 0: 29.93% (34829)
[SPZ] Points tested up to radius 1: 51.13% (59502)
[SPZ] Points tested up to radius 2: 9.30% (10826)
[SPZ] Points tested up to radius 3: 3.61% (4205)
[SPZ] Points tested up to radius 4: 2.56% (2980)
[SPZ] Points tested up to radius 5: 1.47% (1711)
[SPZ] Points tested up to radius 6: 0.91% (1063)
[SPZ] Points tested up to radius 7: 0.40% (461)
[SPZ] Points tested up to radius 8: 0.69% (803)
[SPZ] Best radius found:
[SPZ] Best radius 0: 63.76% (74203)
[SPZ] Best radius 1: 32.56% (37899)
[SPZ] Best radius 2: 2.82% (3280)
[SPZ] Best radius 3: 0.59% (683)
[SPZ] Best radius 4: 0.17% (199)
[SPZ] Best radius 5: 0.06% (75)
[SPZ] Best radius 6: 0.03% (30)
[SPZ] Best radius 7: 0.01% (10)
[SPZ] Best radius 8: 0.00% (1)
  • 29.93% of the points had a reconstruction error of less than 0.5° with the current (rounding) strategy.

  • 51.13% of the points had a reconstruction error of more than 0.5° with the current strategy, but less than 1.0° when also exploring the first shell (3×3×3 neighborhood).

  • etc. (the thresholds were set experimentally; they are in the code).

  • For 63.76% of the points, the best reconstruction error was obtained with the current strategy.

  • For 32.56% of the points, the best reconstruction error was obtained from a value in the first shell.

  • etc.

In terms of execution speed, the resulting code is ~10% slower than the original version.

It has been found experimentally that going beyond r=8 never brought a better solution (in the example above, only 1 point out of 116380 has its best solution at r=8).

One could argue that the reduction of the average error is small, but most of the error comes from a few large outliers that really stand out visually when they correspond to large Gaussians.

Here are the results on the lego ply produced by the original code:

[SPZ] Loading 313547 points
[SPZ] Rotation quantization statistics:
[SPZ] Average geodesic distance in degrees with fast quantization: 0.466883
[SPZ] Average geodesic distance in degrees with optimized quantization: 0.457116
[SPZ] Points tested up to radius 0: 61.91% (194110)
[SPZ] Points tested up to radius 1: 37.70% (118214) (0.5° < error < 1°)
[SPZ] Points tested up to radius 2: 0.25% (797)  (error > 1°)
[SPZ] Points tested up to radius 3: 0.06% (203)
[SPZ] Points tested up to radius 4: 0.04% (111)
[SPZ] Points tested up to radius 5: 0.02% (50)
[SPZ] Points tested up to radius 6: 0.01% (28)
[SPZ] Points tested up to radius 7: 0.00% (11)
[SPZ] Points tested up to radius 8: 0.01% (23)
[SPZ] Best radius found:
[SPZ] Best radius 0: 95.52% (299514)
[SPZ] Best radius 1: 4.43% (13901)
[SPZ] Best radius 2: 0.04% (111)
[SPZ] Best radius 3: 0.01% (17)
[SPZ] Best radius 4: 0.00% (3)
[SPZ] Best radius 5: 0.00% (1)
[SPZ] Best radius 6: 0.00% (0)
[SPZ] Best radius 7: 0.00% (0)
[SPZ] Best radius 8: 0.00% (0)

As you can see, it's highly scene-dependent (the lego scene is perfectly axis-aligned, and my scene above is slightly misaligned, which is the worst case for the original quaternion quantization method), but the good news is that:

  • it never hurts: worst case, you spend ~10% more time compressing
  • most visual artifacts disappear with this method (I couldn't find any remaining artifacts in my examples, though some probably remain)

This PR also includes a small standalone compression/decompression utility (inspired by another fork); it can also run the original compression method, as well as set the antialiasing/mip-splatting flag in the SPZ.

It also improves PLY parsing by skipping comment lines (such as those produced by nerfstudio's exporter).

@ProjitB (Collaborator) commented Jun 18, 2025

This looks great, thank you for contributing this! Please give me until early next week to look through this a bit more carefully :)

@weegeekps (Contributor) commented Jun 30, 2025

My colleague (@keyboardspecialist) and I took a closer look at this and ran some of our own 3DGS PLY files through this to test things out, given our interest in improved quaternion precision in SPZ. We were interested in two angles:

  1. Understanding how our data works with this method.
  2. Understanding the performance implications of this method.

Precision testing

To test the impact on precision, @keyboardspecialist put together a histogram measurement of the angular error across several of our datasets, with and without the fix applied. The results across the datasets were effectively identical, so we are only sharing one set of data here. The results I'm sharing are taken from running the source data for our substation example through the SPZ library with and without @f-dy's fixes.

Overall, the results look good to us. The error is substantially reduced, with the maximum angular error dropping from 11 degrees to 7 degrees. More importantly, the error distribution is skewed much further towards 0, with the 0.5-degree to 1.0-degree bucket seeing a rise of about 22%. Subsequent buckets drop off much more sharply compared to the uncorrected version. Charts provided below.

[Charts: angular-error histograms — Uncorrected (left), Corrected (right)]

Performance & file size testing

I performed performance & file size testing as I had concerns about how this would impact the creation and read processes. Read in particular is extremely important since that is a runtime factor for us.

My test cases used mechanisms within the SPZ library to eliminate outside variables. To test creation, I loaded my PLY file and created an SPZ. To test read, I loaded an SPZ and created a PLY. I ran each for 10 iterations, since creation takes more than a second even on main. Since I could easily grab file size too, I decided to log that as well. I tried several datasets, but as above, the results were effectively the same, so I am focusing on our substation dataset here.

NOTE: Something on the original test machine I used caused me to get weird results in my testing. A different machine gave much better results, showing only about a 25% increase in creation time: 2.23 s to 2.98 s with the fix.

Results were mixed. Creation of the SPZ file is significantly slower, seeing about a 4x reduction in performance. On the other hand, I saw no significant difference when reading SPZ files, which is the part I was concerned about. At least for our needs, we can probably live with the reduction in creation performance. Impressively, the file sizes saw no significant change.

Results with main

REPEAT NOTE: As above, the original test machine gave anomalous results; a different machine showed only about a 25% increase (2.23 s to 2.98 s with the fix). The results below are the updated results.

File size: 11.9 MB

-------------------------------------------------------------------------------
Benchmark                                     Time             CPU   Iterations
-------------------------------------------------------------------------------
BM_LoadPLYAndCreateSPZ/iterations:10       2.23 s          2.23 s            10
BM_LoadSPZAndCreatePLY/iterations:10        283 ms          283 ms           10

Results with this PR

File size: 11.9 MB

-------------------------------------------------------------------------------
Benchmark                                     Time             CPU   Iterations
-------------------------------------------------------------------------------
BM_LoadPLYAndCreateSPZ/iterations:10       2.98 s          2.98 s            10
BM_LoadSPZAndCreatePLY/iterations:10        292 ms          291 ms           10

Summary

We're really impressed with this method and feel that it does a great job of improving the quantization of the quaternions. Trading some creation performance for significantly improved quaternion precision, identical file sizes, and equivalent read performance is good enough for our needs.

@f-dy (Author) commented Jun 30, 2025

Thank you for testing! I'm a bit surprised by the 4x time factor for the compression. Is it with compile-time optimizations turned on? Does it include the zip compression? My timings were for the whole quantization-and-compression process, and I observed a much smaller overhead (~10%), but that may be scene-dependent.

@weegeekps (Contributor) commented

@f-dy, so am I, and it didn't sit well with me all night. I spent significant time tweaking things to eliminate variables, including disabling CPU scaling, but kept getting the same result.

This morning, I decided to take my test harness and run it on a different machine, and got much better results. On this other machine, I'm seeing about a 25% increase: 2.23 s to 2.98 s with the fix.

The only theory I have right now is that something on my work machine is interfering somehow. I've updated my original post.

@f-dy (Author) commented Jul 1, 2025

Thanks for that extended testing! 25% seems very reasonable.

One parameter to tune for speed is the maximum radius. I set it to 8 because I found one case (out of millions) where this yielded a better result, but you can see from the two examples above (look at "Best radius found") that max radius = 3 is probably fine. If targeting real-time compression, I would even limit it to radius = 2 and hardcode the point offsets of the radius-1 and radius-2 shells.
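For illustration, those offset tables could be precomputed once, as in this hypothetical sketch (in the Chebyshev metric, the radius-1 shell has 3³ − 1 = 26 entries and the radius-2 shell 5³ − 3³ = 98):

import itertools

# Hypothetical one-time precomputation of the shell offset tables, so a
# real-time encoder can walk a flat list instead of filtering the full
# (2r+1)^3 cube for every point.
SHELL_OFFSETS = {
    r: [off for off in itertools.product(range(-r, r + 1), repeat=3)
        if max(abs(o) for o in off) == r]
    for r in (1, 2)
}
assert len(SHELL_OFFSETS[1]) == 26 and len(SHELL_OFFSETS[2]) == 98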

I noticed that, even if a higher radius yields a better result, it is usually only marginally better: it almost never brings an error of 5° down to 0.5°, but will maybe get it to something like 4.5°. I haven't figured out what metric could demonstrate that. Do you have an idea, @weegeekps or @keyboardspecialist?

@jeanphilippepons (Contributor) commented Jul 3, 2025

Hi. Thanks for addressing this topic.

At Esri, we are interested in guaranteeing survey-grade accuracy for Gaussian splats when using 3D Tiles with a glTF extension based on SPZ. In this context, a maximum error of 12 degrees or 7 degrees does not work for us.

We are proposing to introduce a new version of the SPZ format using the classical "smallest three" quantization of quaternions, in 4 bytes (9 bits is enough for each of the 3 smallest components, plus 2 bits to record which component is largest). The PR is here, and your feedback is very welcome: #41

The current PR may still be useful to improve the accuracy of SPZ version 2.
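For reference, here is a minimal Python sketch of such a smallest-three packing (an illustration only; the bit layout, scaling, and function names are assumptions, not the actual #41 implementation). The largest-magnitude component is dropped and made positive (q and −q encode the same rotation), and the remaining three components, each in [−1/√2, 1/√2], get 9 bits apiece, plus a 2-bit index:

import numpy as np

SCALE = np.sqrt(2.0)  # maps [-1/sqrt(2), 1/sqrt(2)] onto [-1, 1]

def pack_smallest_three(q):
    """Pack a quaternion (any fixed component order) into 29 of 32 bits."""
    q = np.asarray(q, dtype=float)
    q = q / np.linalg.norm(q)
    i = int(np.argmax(np.abs(q)))   # index of the largest-magnitude component
    if q[i] < 0:                    # flip sign so the dropped component is
        q = -q                      # positive and can be reconstructed
    rest = np.delete(q, i) * SCALE  # three smallest components, now in [-1, 1]
    codes = np.clip(np.round((rest + 1.0) * 255.5), 0, 511).astype(int)
    return (i << 27) | (codes[0] << 18) | (codes[1] << 9) | codes[2]

def unpack_smallest_three(packed):
    i = (packed >> 27) & 0x3
    codes = [(packed >> s) & 0x1FF for s in (18, 9, 0)]
    rest = (np.asarray(codes, dtype=float) / 255.5 - 1.0) / SCALE
    largest = np.sqrt(max(0.0, 1.0 - np.dot(rest, rest)))
    return np.insert(rest, i, largest)  # reinsert the dropped component

Round-tripping random unit quaternions through pack/unpack and measuring the geodesic error would give the accuracy statistics of such a layout.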

@f-dy (Author) commented Jul 4, 2025

Hi @jeanphilippepons! Hope you're doing well. Smallest-three certainly makes sense to mitigate that issue, and it's easy to implement, but even with that representation, rounding may not be the best quantization (to be tested).
Did you consider other options? (see Marc-B-Reynolds/Marc-B-Reynolds.github.io#26)

@jeanphilippepons (Contributor) commented

Are you the Frédéric Devernay I knew at INRIA 20 years ago?
We have browsed the literature, but we haven't tested any option other than smallest three.

@javagl commented Jul 21, 2025

Does this become obsolete with #41?

@f-dy (Author) commented Jul 21, 2025

> Are you the Frédéric Devernay I knew at INRIA 20 years ago?

Yes

> We have browsed the literature, but we haven't tested any option other than smallest three.

https://marc-b-reynolds.github.io/quaternions/2017/05/02/QuatQuantPart1.html

@MarvinSt commented Nov 8, 2025

Is it not better to encode the orientation as spherical coordinates plus a roll angle? This gives an upper bound on the rotation error.
I'm not an expert on this by any means, but I'm curious how the current quantization stacks up against this alternative.

Here’s the Monte Carlo histogram for the (φ, θ, ψ) = (11, 10, 11) bit allocation, for a total budget of 32 bits.

About 500 000 random orientations were quantized and compared:
• Median error: ≈ 0.06°
• 90 % quantile: ≈ 0.12°
• 99 % quantile: ≈ 0.19°
• Worst observed: ≈ 0.28°

[Histogram: Monte Carlo orientation quantization errors]
import numpy as np
import matplotlib.pyplot as plt

def quat_from_axis_angle(axis, angle):
    axis = axis / np.linalg.norm(axis, axis=-1, keepdims=True)
    half = angle / 2
    s = np.sin(half)
    return np.stack([np.cos(half), *(axis.T * s)], axis=-1)

def quat_mult(q1, q2):
    w1, x1, y1, z1 = q1.T
    w2, x2, y2, z2 = q2.T
    return np.stack([
        w1*w2 - x1*x2 - y1*y2 - z1*z2,
        w1*x2 + x1*w2 + y1*z2 - z1*y2,
        w1*y2 - x1*z2 + y1*w2 + z1*x2,
        w1*z2 + x1*y2 - y1*x2 + z1*w2
    ], axis=-1)

def quat_conj(q):
    q = np.array(q)
    q[...,1:] *= -1
    return q

def quat_angle_error(q1, q2):
    dq = quat_mult(q1, quat_conj(q2))
    dq /= np.linalg.norm(dq, axis=-1, keepdims=True)
    ang = 2 * np.arccos(np.clip(np.abs(dq[...,0]), -1, 1))
    return np.degrees(ang)

# Bit allocation
b_phi, b_theta, b_psi = 11, 10, 11

# Quantization steps
step_phi = 2*np.pi / (2**b_phi)
step_theta = np.pi / (2**b_theta)
step_psi = 2*np.pi / (2**b_psi)

# Monte Carlo sampling
N = 500_000
phi = np.random.rand(N) * 2*np.pi
theta = np.arccos(2*np.random.rand(N) - 1)
psi = np.random.rand(N) * 2*np.pi

# Quantize angles
phi_q = (np.floor(phi / step_phi + 0.5) * step_phi) % (2*np.pi)
theta_q = np.clip(np.floor(theta / step_theta + 0.5) * step_theta, 0, np.pi)
psi_q = (np.floor(psi / step_psi + 0.5) * step_psi) % (2*np.pi)

# Convert to quaternions
axes = np.stack([np.sin(theta)*np.cos(phi),
                 np.sin(theta)*np.sin(phi),
                 np.cos(theta)], axis=-1)
axes_q = np.stack([np.sin(theta_q)*np.cos(phi_q),
                   np.sin(theta_q)*np.sin(phi_q),
                   np.cos(theta_q)], axis=-1)

# Build quaternions
q_true = quat_from_axis_angle(axes, psi)
q_quant = quat_from_axis_angle(axes_q, psi_q)

# Compute angular errors
errors_deg = quat_angle_error(q_true, q_quant)

# Statistics
percentiles = np.percentile(errors_deg, [50, 90, 99, 99.9])
worst = np.max(errors_deg)

# Plot histogram
plt.figure(figsize=(6,4))
plt.hist(errors_deg, bins=300, range=(0,0.5), density=True)
plt.xlabel("Rotation error (degrees)")
plt.ylabel("Probability density")
plt.title(
    f"Monte Carlo orientation quantization errors (11,10,11 bits)\n"
    f"median={percentiles[0]:.3f}°, 90%={percentiles[1]:.3f}°, "
    f"99%={percentiles[2]:.3f}°, max≈{worst:.3f}°"
)
plt.grid(True)
plt.show()

@jeanphilippepons (Contributor) commented

Hi @MarvinSt, thanks for looking into this. This PR is not applicable anymore, as it was superseded by #41, which yields accuracy statistics similar to your proposal, and which is included in SPZ format version 3. What are your thoughts on this solution?

@javagl commented Nov 12, 2025

This PR could/should be closed. Alternative suggestions for encoding could/should be discussed in dedicated issues.
