@f-dy commented Jun 10, 2025

Fixes #31

As explained in #31, the method used by spz for quaternion quantization can cause very large errors. That issue gives an example with an 8° rotation error; while such an error may be barely visible on small Gaussians, it clearly stands out as artifacts when the model has large, elongated Gaussians.

This PR fixes it while keeping backward compatibility with the SPZ format, by adopting a slightly better quantization strategy: it gradually explores "shells" of increasing radius in the 8-bit-quantized xyz quaternion space, centered on the default value (which is simply the result of rounding xyz).
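To make the idea concrete, here is a minimal Python sketch of the shell search (the PR itself is C++; the 8-bit mapping, the helper names, and the per-radius threshold schedule below are illustrative assumptions, not the PR's actual values):

import itertools
import numpy as np

# Sketch of the shell search, assuming the SPZ convention of storing the
# x, y, z components of a normalized quaternion with w >= 0, each mapped
# from [-1, 1] to an 8-bit code. Helper names are hypothetical.

def dequantize(code):
    """Map three 8-bit codes back to a unit quaternion (x, y, z, w), w >= 0."""
    xyz = np.asarray(code, dtype=float) / 127.5 - 1.0
    w = np.sqrt(max(0.0, 1.0 - xyz @ xyz))
    return np.append(xyz, w)

def geodesic_deg(qa, qb):
    """Rotation angle in degrees between two quaternions."""
    d = abs(np.dot(qa, qb)) / (np.linalg.norm(qa) * np.linalg.norm(qb))
    return np.degrees(2.0 * np.arccos(min(1.0, d)))

def shell(center, r):
    """Integer xyz codes at Chebyshev distance exactly r from center."""
    for off in itertools.product(range(-r, r + 1), repeat=3):
        if max(abs(o) for o in off) == r:
            code = tuple(int(c) + o for c, o in zip(center, off))
            if all(0 <= v <= 255 for v in code):
                yield code

# Hypothetical per-radius "good enough" thresholds in degrees; the PR's
# actual values were set experimentally and live in its source.
THRESHOLDS = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5]

def quantize_rotation(q, max_radius=8):
    """q is a unit quaternion (x, y, z, w) with w >= 0."""
    center = np.clip(np.round((np.asarray(q[:3]) + 1.0) * 127.5), 0, 255)
    best_code, best_err = None, np.inf
    for r in range(max_radius + 1):
        for code in shell(center, r):
            err = geodesic_deg(np.asarray(q, dtype=float), dequantize(code))
            if err < best_err:
                best_code, best_err = code, err
        if best_err < THRESHOLDS[r]:  # stop early once this shell is good enough
            break
    return best_code, best_err

The per-radius stopping thresholds are what produce the "Points tested up to radius r" buckets in the statistics below: a point stops exploring at radius r once its best error drops below that radius's threshold.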

The code outputs statistics about rotation quantization. For example, in this case:

[SPZ] Loading 116380 points
[SPZ] Rotation quantization statistics:
[SPZ] Average geodesic distance in degrees with fast quantization: 1.24818
[SPZ] Average geodesic distance in degrees with optimized quantization: 0.773356
[SPZ] Points tested up to radius 0: 29.93% (34829)
[SPZ] Points tested up to radius 1: 51.13% (59502)
[SPZ] Points tested up to radius 2: 9.30% (10826)
[SPZ] Points tested up to radius 3: 3.61% (4205)
[SPZ] Points tested up to radius 4: 2.56% (2980)
[SPZ] Points tested up to radius 5: 1.47% (1711)
[SPZ] Points tested up to radius 6: 0.91% (1063)
[SPZ] Points tested up to radius 7: 0.40% (461)
[SPZ] Points tested up to radius 8: 0.69% (803)
[SPZ] Best radius found:
[SPZ] Best radius 0: 63.76% (74203)
[SPZ] Best radius 1: 32.56% (37899)
[SPZ] Best radius 2: 2.82% (3280)
[SPZ] Best radius 3: 0.59% (683)
[SPZ] Best radius 4: 0.17% (199)
[SPZ] Best radius 5: 0.06% (75)
[SPZ] Best radius 6: 0.03% (30)
[SPZ] Best radius 7: 0.01% (10)
[SPZ] Best radius 8: 0.00% (1)
  • 29.93% of the points had a reconstruction error of less than 0.5° with the current (rounding) strategy.

  • 51.13% of the points had a reconstruction error of more than 0.5° with the current strategy, but less than 1.0° when also exploring the first shell (3×3×3 neighborhood).

  • etc. (the thresholds were set experimentally; they are in the code).

  • For 63.76% of the points, the best reconstruction error was obtained with the current strategy.

  • For 32.56% of the points, the best reconstruction error was obtained from a value in the first shell.

  • etc.

In terms of execution speed, the resulting code is ~10% slower than the original version.

It has been found experimentally that going beyond r=8 never brought a better solution (in the example above, only 1 point out of 116380 has its best solution at r=8).

One could argue that the reduction of the average error is small, but most of the error comes from a few large outliers that really stand out visually when they correspond to large Gaussians.

Here are the results on the lego ply produced by the original code:

[SPZ] Loading 313547 points
[SPZ] Rotation quantization statistics:
[SPZ] Average geodesic distance in degrees with fast quantization: 0.466883
[SPZ] Average geodesic distance in degrees with optimized quantization: 0.457116
[SPZ] Points tested up to radius 0: 61.91% (194110)
[SPZ] Points tested up to radius 1: 37.70% (118214) (0.5° < error < 1°)
[SPZ] Points tested up to radius 2: 0.25% (797)  (error > 1°)
[SPZ] Points tested up to radius 3: 0.06% (203)
[SPZ] Points tested up to radius 4: 0.04% (111)
[SPZ] Points tested up to radius 5: 0.02% (50)
[SPZ] Points tested up to radius 6: 0.01% (28)
[SPZ] Points tested up to radius 7: 0.00% (11)
[SPZ] Points tested up to radius 8: 0.01% (23)
[SPZ] Best radius found:
[SPZ] Best radius 0: 95.52% (299514)
[SPZ] Best radius 1: 4.43% (13901)
[SPZ] Best radius 2: 0.04% (111)
[SPZ] Best radius 3: 0.01% (17)
[SPZ] Best radius 4: 0.00% (3)
[SPZ] Best radius 5: 0.00% (1)
[SPZ] Best radius 6: 0.00% (0)
[SPZ] Best radius 7: 0.00% (0)
[SPZ] Best radius 8: 0.00% (0)

As you can see, it's highly scene-dependent (the lego scene is perfectly axis-aligned, and my scene above is slightly misaligned, which is the worst case for the original quaternion quantization method), but the good news is that:

  • it never hurts: worst case, you spend ~10% more time compressing
  • most visual artifacts disappear with this method (I couldn't find any remaining artifacts in my examples, though some probably remain)

This PR also includes a small standalone compression/decompression utility (inspired by another fork); it can also run the original compression method, as well as set the antialiasing/mip-splatting flag in the SPZ.

It also improves PLY parsing by skipping comment lines (such as those produced by nerfstudio's exporter).

@ProjitB (Collaborator) commented Jun 18, 2025

This looks great, thank you for contributing this! Please give me until early next week to look through this a bit more carefully :)

@weegeekps (Contributor) commented Jun 30, 2025

My colleague (@keyboardspecialist) and I took a closer look at this and ran some of our own 3DGS PLY files through this to test things out, given our interest in improved quaternion precision in SPZ. We were interested in two angles:

  1. Understanding how our data works with this method.
  2. Understanding the performance implications of this method.

Precision testing

To test the impact on precision, @keyboardspecialist put together a histogram measurement of the angular error across several of our datasets, with and without the fix applied. The results across the datasets were effectively identical, so we are only sharing one set of data here. The results I'm sharing are taken from running the source data for our substation example through the SPZ library with and without @f-dy's fixes.

Overall, the results look good to us. The error is substantially reduced, with the maximum angular error dropping from 11 degrees to 7 degrees. More importantly, the error distribution is skewed much further towards 0, with the 0.5-degree to 1.0-degree bucket seeing a rise of about 22%. Subsequent buckets drop off much more sharply compared to the uncorrected version. Charts provided below.

[Charts: angular-error histograms — Uncorrected (left), Corrected (right)]

Performance & file size testing

I performed performance & file size testing as I had concerns about how this would impact the creation and read processes. Read in particular is extremely important since that is a runtime factor for us.

My test cases used mechanisms within the SPZ library to eliminate outside variables. To test creation, I loaded my PLY file and created an SPZ. To test read, I loaded an SPZ and created a PLY. I ran each for 10 iterations, since creation takes more than a second even on main. Since I could easily grab file size too, I decided to log that as well. I tried several datasets, but as above, the results were effectively the same, so I am focusing on our substation dataset here.

NOTE: Something on the original test machine I used caused me to get weird results in my testing. A different machine gave much better results, showing only about a 25% increase in creation time: 2.23 s to 2.98 s with the fix.

Results were mixed. Creation of the SPZ file is significantly slower, seeing about a 4x reduction in performance. On the other hand, I saw no significant difference when reading SPZ files, which is the part I was concerned about. At least for our needs, we can probably live with the reduction in creation performance. Impressively, the file sizes saw no significant change.

Results with main

REPEAT NOTE: As above, the original test machine gave anomalous results; a different machine showed only about a 25% increase (2.23 s to 2.98 s with the fix). The results below are the updated results.

File size: 11.9 MB

-------------------------------------------------------------------------------
Benchmark                                     Time             CPU   Iterations
-------------------------------------------------------------------------------
BM_LoadPLYAndCreateSPZ/iterations:10       2.23 s          2.23 s            10
BM_LoadSPZAndCreatePLY/iterations:10        283 ms          283 ms           10

Results with this PR

File size: 11.9 MB

-------------------------------------------------------------------------------
Benchmark                                     Time             CPU   Iterations
-------------------------------------------------------------------------------
BM_LoadPLYAndCreateSPZ/iterations:10       2.98 s          2.98 s            10
BM_LoadSPZAndCreatePLY/iterations:10        292 ms          291 ms           10

Summary

We're really impressed with this method and feel that it does a great job of improving the quantization of the quaternions. Trading some creation performance for significantly improved quaternion precision, identical file sizes, and equivalent read performance is good enough for our needs.

@f-dy (Author) commented Jun 30, 2025

Thank you for testing! I'm a bit surprised by the 4x time factor for the compression. Is it with compile-time optimizations turned on? Does it include the zip compression? My timings were for the whole quantization-and-compression process, and I observed a much smaller overhead (~10%), but that may be scene-dependent.

@weegeekps (Contributor) commented

@f-dy, so am I, and it didn't sit well with me all night. I spent significant time tweaking things to eliminate variables, including disabling CPU scaling, but kept getting the same result.

This morning, I decided to take my test harness and run it on a different machine, and got much better results. On this other machine, I'm seeing about a 25% increase: 2.23 s to 2.98 s with the fix.

The only theory I have right now is that something on my work machine is interfering somehow. I've updated my original post.

@f-dy (Author) commented Jul 1, 2025

Thanks for that extended testing! 25% seems very reasonable.

One parameter to tune for speed is the maximum radius. I set it to 8 because I found one case (out of millions) where this yielded a better result, but you can see from the two examples above (look at "Best radius found") that max radius = 3 is probably fine. If targeting real-time compression, I would even limit it to radius = 2 and hardcode the point offsets of the radius-1 and radius-2 shells.
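For illustration, those offset tables could be precomputed once, as in this hypothetical sketch (in the Chebyshev metric, the radius-1 shell has 3³ − 1 = 26 entries and the radius-2 shell 5³ − 3³ = 98):

import itertools

# Hypothetical one-time precomputation of the shell offset tables, so a
# real-time encoder can walk a flat list instead of filtering the full
# (2r+1)^3 cube for every point.
SHELL_OFFSETS = {
    r: [off for off in itertools.product(range(-r, r + 1), repeat=3)
        if max(abs(o) for o in off) == r]
    for r in (1, 2)
}
assert len(SHELL_OFFSETS[1]) == 26 and len(SHELL_OFFSETS[2]) == 98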

I noticed that, even if a higher radius yields a better result, it is usually only marginally better: it almost never brings an error of 5° down to 0.5°, but will maybe get it to something like 4.5°. I haven't figured out what metric could demonstrate that. Do you have an idea, @weegeekps or @keyboardspecialist?

@jeanphilippepons (Contributor) commented Jul 3, 2025

Hi. Thanks for addressing this topic.

At Esri, we are interested in guaranteeing survey-grade accuracy for Gaussian splats when using 3D Tiles with a glTF extension based on SPZ. In this context, a maximum error of 12 degrees or 7 degrees does not work for us.

We are proposing to introduce a new version of the SPZ format using the classical "smallest three" quantization of quaternions, in 4 bytes (9 bits is enough for each of the 3 smallest components, plus 2 bits to record which component is largest). The PR is here, and your feedback is very welcome: #41

The current PR may still be useful to improve the accuracy of SPZ version 2.
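For reference, here is a minimal Python sketch of such a smallest-three packing (an illustration only; the bit layout, scaling, and function names are assumptions, not the actual #41 implementation). The largest-magnitude component is dropped and made positive (q and −q encode the same rotation), and the remaining three components, each in [−1/√2, 1/√2], get 9 bits apiece, plus a 2-bit index:

import numpy as np

SCALE = np.sqrt(2.0)  # maps [-1/sqrt(2), 1/sqrt(2)] onto [-1, 1]

def pack_smallest_three(q):
    """Pack a quaternion (any fixed component order) into 29 of 32 bits."""
    q = np.asarray(q, dtype=float)
    q = q / np.linalg.norm(q)
    i = int(np.argmax(np.abs(q)))   # index of the largest-magnitude component
    if q[i] < 0:                    # flip sign so the dropped component is
        q = -q                      # positive and can be reconstructed
    rest = np.delete(q, i) * SCALE  # three smallest components, now in [-1, 1]
    codes = np.clip(np.round((rest + 1.0) * 255.5), 0, 511).astype(int)
    return (i << 27) | (codes[0] << 18) | (codes[1] << 9) | codes[2]

def unpack_smallest_three(packed):
    i = (packed >> 27) & 0x3
    codes = [(packed >> s) & 0x1FF for s in (18, 9, 0)]
    rest = (np.asarray(codes, dtype=float) / 255.5 - 1.0) / SCALE
    largest = np.sqrt(max(0.0, 1.0 - np.dot(rest, rest)))
    return np.insert(rest, i, largest)  # reinsert the dropped component

Round-tripping random unit quaternions through pack/unpack and measuring the geodesic error would give the accuracy statistics of such a layout.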

@f-dy (Author) commented Jul 4, 2025

Hi @jeanphilippepons! Hope you're doing well. Smallest-three certainly makes sense to mitigate that issue, and it's easy to implement, but even with that representation, rounding may not be the best quantization (to be tested).
Did you consider other options? (see Marc-B-Reynolds/Marc-B-Reynolds.github.io#26)

@jeanphilippepons (Contributor) commented

Are you the Frédéric Devernay I knew at INRIA 20 years ago?
We have browsed the literature, but we haven't tested any option other than smallest three.

@javagl commented Jul 21, 2025

Does this become obsolete with #41?

@f-dy (Author) commented Jul 21, 2025

> Are you the Frédéric Devernay I knew at INRIA 20 years ago?

Yes

> We have browsed the literature, but we haven't tested any option other than smallest three.

https://marc-b-reynolds.github.io/quaternions/2017/05/02/QuatQuantPart1.html

@MarvinSt commented Nov 8, 2025

Is it not better to encode the orientation as spherical coordinates plus a roll angle? This gives an upper bound on the rotation error.
I'm not an expert on this by any means, but I'm curious how the current quantization stacks up against this alternative.

Here’s the Monte Carlo histogram for the (φ, θ, ψ) = (11, 10, 11) bit allocation, for a total budget of 32 bits.

About 500 000 random orientations were quantized and compared:
• Median error: ≈ 0.06°
• 90 % quantile: ≈ 0.12°
• 99 % quantile: ≈ 0.19°
• Worst observed: ≈ 0.28°

[Histogram: Monte Carlo orientation quantization errors]
import numpy as np
import matplotlib.pyplot as plt

def quat_from_axis_angle(axis, angle):
    axis = axis / np.linalg.norm(axis, axis=-1, keepdims=True)
    half = angle / 2
    s = np.sin(half)
    return np.stack([np.cos(half), *(axis.T * s)], axis=-1)

def quat_mult(q1, q2):
    w1, x1, y1, z1 = q1.T
    w2, x2, y2, z2 = q2.T
    return np.stack([
        w1*w2 - x1*x2 - y1*y2 - z1*z2,
        w1*x2 + x1*w2 + y1*z2 - z1*y2,
        w1*y2 - x1*z2 + y1*w2 + z1*x2,
        w1*z2 + x1*y2 - y1*x2 + z1*w2
    ], axis=-1)

def quat_conj(q):
    q = np.array(q)
    q[...,1:] *= -1
    return q

def quat_angle_error(q1, q2):
    dq = quat_mult(q1, quat_conj(q2))
    dq /= np.linalg.norm(dq, axis=-1, keepdims=True)
    ang = 2 * np.arccos(np.clip(np.abs(dq[...,0]), -1, 1))
    return np.degrees(ang)

# Bit allocation
b_phi, b_theta, b_psi = 11, 10, 11

# Quantization steps
step_phi = 2*np.pi / (2**b_phi)
step_theta = np.pi / (2**b_theta)
step_psi = 2*np.pi / (2**b_psi)

# Monte Carlo sampling
N = 500_000
phi = np.random.rand(N) * 2*np.pi
theta = np.arccos(2*np.random.rand(N) - 1)
psi = np.random.rand(N) * 2*np.pi

# Quantize angles
phi_q = (np.floor(phi / step_phi + 0.5) * step_phi) % (2*np.pi)
theta_q = np.clip(np.floor(theta / step_theta + 0.5) * step_theta, 0, np.pi)
psi_q = (np.floor(psi / step_psi + 0.5) * step_psi) % (2*np.pi)

# Convert to quaternions
axes = np.stack([np.sin(theta)*np.cos(phi),
                 np.sin(theta)*np.sin(phi),
                 np.cos(theta)], axis=-1)
axes_q = np.stack([np.sin(theta_q)*np.cos(phi_q),
                   np.sin(theta_q)*np.sin(phi_q),
                   np.cos(theta_q)], axis=-1)

# Build quaternions
q_true = quat_from_axis_angle(axes, psi)
q_quant = quat_from_axis_angle(axes_q, psi_q)

# Compute angular errors
errors_deg = quat_angle_error(q_true, q_quant)

# Statistics
percentiles = np.percentile(errors_deg, [50, 90, 99, 99.9])
worst = np.max(errors_deg)

# Plot histogram
plt.figure(figsize=(6,4))
plt.hist(errors_deg, bins=300, range=(0,0.5), density=True)
plt.xlabel("Rotation error (degrees)")
plt.ylabel("Probability density")
plt.title(
    f"Monte Carlo orientation quantization errors (11,10,11 bits)\n"
    f"median={percentiles[0]:.3f}°, 90%={percentiles[1]:.3f}°, "
    f"99%={percentiles[2]:.3f}°, max≈{worst:.3f}°"
)
plt.grid(True)
plt.show()

@jeanphilippepons (Contributor) commented

Hi @MarvinSt, thanks for looking into this. This PR is not applicable anymore, as it was superseded by #41, which yields accuracy statistics similar to your proposal, and which is included in SPZ format version 3. What are your thoughts on this solution?

@javagl commented Nov 12, 2025

This PR could/should be closed. Alternative suggestions for encoding could/should be discussed in dedicated issues.
