bench: aggregations over decimals #6669
The Avro spec provides conflicting/unclear guidance on canonicalizing schemas that contain decimal data (i.e. decimal values require the presence of precision and scale fields to express equality, but neither field is documented in the canonicalization process). To solve this problem for kgen, we are no longer canonicalizing schemas.
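To illustrate the ambiguity described above, here is a minimal sketch of only the attribute-stripping step of Avro's Parsing Canonical Form, which keeps just `type`, `name`, `fields`, `symbols`, `items`, `values`, and `size`. It is not a full canonicalizer (no fullname resolution, field ordering, or primitive collapsing), and the record and field names are hypothetical, not taken from kgen.

```python
import json

# Attributes the Avro spec's Parsing Canonical Form keeps in its STRIP step.
KEPT = {"type", "name", "fields", "symbols", "items", "values", "size"}


def strip(schema):
    """Simplified STRIP step only: drop every attribute not in KEPT."""
    if isinstance(schema, dict):
        return {k: strip(v) for k, v in schema.items() if k in KEPT}
    if isinstance(schema, list):
        return [strip(s) for s in schema]
    return schema


def decimal_record(precision, scale):
    # Hypothetical record schema; the names "row" and "amount" are illustrative.
    return {
        "type": "record",
        "name": "row",
        "fields": [
            {
                "name": "amount",
                "type": {
                    "type": "bytes",
                    "logicalType": "decimal",
                    "precision": precision,
                    "scale": scale,
                },
            }
        ],
    }


a = decimal_record(10, 2)
b = decimal_record(38, 10)

# The schemas differ, but stripping removes logicalType, precision, and scale,
# so both reduce to the same structure.
same_after_strip = (
    json.dumps(strip(a), sort_keys=True) == json.dumps(strip(b), sort_keys=True)
)
print(a == b, same_after_strip)
```

Under this sketch, two decimal schemas that decode to different values become indistinguishable once stripped, which is the kind of collision that skipping canonicalization avoids.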
This is a scaffold for Sean to create a new benchmark type for testing the performance of aggregations.
cirego left a comment
Looks great to me!
We need to figure out why the smoke test failed in CI.
@@ -0,0 +1 @@
data
What's creating the data file and/or directory?
FROM KAFKA BROKER 'kafka:9092'
TOPIC 'aggregationtest'
FORMAT AVRO USING CONFLUENT SCHEMA REGISTRY 'http://schema-registry:8081'
ENVELOPE UPSERT
You might consider using INSERT instead of UPSERT, as the upsert operator is quite a bit more expensive. Using insert should make the performance delta between implementations more apparent.
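As a rough intuition for the cost difference mentioned above, the following is a conceptual sketch only, not Materialize's implementation: an append-only path can emit each record as a single insertion, while an upsert path has to remember the last value per key so it can retract it. The function names and the `(value, diff)` update shape are assumptions made for illustration.

```python
def append_only(records):
    """Each record becomes one insertion; no per-key state is kept."""
    return [(value, +1) for _key, value in records]


def upsert(records):
    """Each record replaces the previous value for its key, so the
    operator must hold the last value per key in order to retract it."""
    state = {}
    updates = []
    for key, value in records:
        if key in state:
            # Retract the value this key previously held.
            updates.append((state[key], -1))
        if value is None:
            # A None value is treated as a delete for the key.
            state.pop(key, None)
        else:
            state[key] = value
            updates.append((value, +1))
    return updates, len(state)


records = [("a", 1), ("b", 2), ("a", 3)]
print(append_only(records))
print(upsert(records))
```

The upsert path does extra work per record and keeps state proportional to the number of live keys, which would add a fixed overhead to both implementations being compared.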
It looks like this failed with the following error message:
I'm able to reproduce this locally:
I also see
I think the "OuterRecord" error may be a red herring, when we really should have exited with an error when kgen failed. Edit: forget what I said about the red herring. I was contemplating that perhaps the schema didn't exist in the schema registry yet, but I can see that it was created:
Once #6675 lands, we should pull those improvements into this benchmark too!
@cirego tysm for checking into this; the root of the problem was just an inverted inequality that caused everything else to topple down around it. re:
A very simple test of an aggregation over a single decimal column; this will help us understand how much the rust-dec-backed decimal implementation degrades performance and increases memory usage.
This supersedes #6658 and #6643.