bench: aggregations over decimals by sploiselle · Pull Request #6669 · MaterializeInc/materialize

sploiselle · 2021-05-06T16:24:47Z

A very simple test of an aggregation over a single decimal column; this will help us understand how much the rust-dec-backed decimal implementation degrades performance and increases memory usage.

This supersedes #6658 and #6643

The Avro spec provides conflicting/unclear guidance on canonicalizing schemas that contain decimal data (i.e. decimal values require the presence of precision and scale fields to express equality, but neither field is documented in the canonicalization process). To solve this problem for kgen, we are no longer canonicalizing schemas.

This is a scaffold for Sean to create a new benchmark type for testing the performance of aggregations.

cirego

Looks great to me!

We need to figure out why the smoke test failed in CI.

cirego · 2021-05-06T23:03:07Z

@@ -0,0 +1 @@
+data


What's creating the data file and/or directory?

cirego · 2021-05-06T23:04:54Z

+FROM KAFKA BROKER 'kafka:9092'
+TOPIC 'aggregationtest'
+FORMAT AVRO USING CONFLUENT SCHEMA REGISTRY 'http://schema-registry:8081'
+ENVELOPE UPSERT


You might consider using INSERT instead of UPSERT, as the upsert operator is quite a bit more expensive. Using insert should make the performance delta between implementations more apparent.

cirego · 2021-05-06T23:07:47Z

It looks like this failed with the following error message:

CREATE SOURCE
psql:/data/aggregation_views.sql:21: ERROR:  column name "OuterRecord" is ambiguous
psql:/data/aggregation_views.sql:23: ERROR:  unknown catalog item 'aggregationtest'
$ docker wait 15c81efce39dd2dcf9575085d9e51cb288a90ebdba2f625bad69ffba79c26b47

cirego · 2021-05-06T23:24:26Z

I'm able to reproduce this locally:

[chris-work:~/materialize/materialize:(dec-agg-test[|])] docker logs aggregations_create-views_run_f4dc208bfa72 
CREATE SOURCE
psql:/data/aggregation_views.sql:21: ERROR:  column name "OuterRecord" is ambiguous
psql:/data/aggregation_views.sql:23: ERROR:  unknown catalog item 'aggregationtest'

cirego · 2021-05-06T23:29:58Z

I also see kgen exiting with the following errors:

Spawning 4 generator processes, writing 25000 messages each
thread 'main' panicked at 'max value of 1000000000 exceeds value expressable with precision 15', src/kafka-util/src/bin/kgen.rs:294:13
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
thread 'main' panicked at 'max value of 1000000000 exceeds value expressable with precision 15', src/kafka-util/src/bin/kgen.rs:294:13
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
thread 'main' panicked at 'max value of 1000000000 exceeds value expressable with precision 15', src/kafka-util/src/bin/kgen.rs:294:13
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
thread 'main' panicked at 'max value of 1000000000 exceeds value expressable with precision 15', src/kafka-util/src/bin/kgen.rs:294:13
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
0/4 processes finished: pid=7 returncode=101
1/4 processes finished: pid=8 returncode=101
2/4 processes finished: pid=9 returncode=101
3/4 processes finished: pid=10 returncode=101

cirego · 2021-05-06T23:44:17Z

I think the "OuterRecord" error may be a red-herring, when we really should have exited with error when kgen failed.

Edit: forget what I said about the red herring. I was contemplating that perhaps the schema didn't exist in the schema registry yet, but I can see that it was created:

[chris-work:~/materialize/materialize/test/bench/aggregations:(dec-agg-test[|])] curl -s http://localhost:8081/subjects/aggregationtest-value/versions/latest | jq .
{
  "subject": "aggregationtest-value",
  "version": 1,
  "id": 13,
  "schema": "{\"type\":\"record\",\"name\":\"aggregationtest\",\"namespace\":\"com.acme.avro\",\"fields\":[{\"name\":\"Key1Unused\",\"type\":\"long\"},{\"name\":\"OuterRecord\",\"type\":{\"type\":\"record\",\"name\":\"OuterRecord\",\"fields\":[{\"name\":\"DecValue\",\"type\":{\"type\":\"bytes\",\"logicalType\":\"decimal\",\"precision\":15,\"scale\":2}}]}}]}"
}

Once #6675 lands, we should pull those improvements into this benchmark too!

sploiselle · 2021-05-07T16:31:51Z

@cirego tysm for checking into this; the root of the problem was just an inverted inequality that caused everything else to topped down around it. re: UPSERT vs. append-only envelope, I want the ability to control the total size of the view to examine memory consumption of the refactor. I guess it's possible to also throttle this with the number of messages we send, so idk––going to go with UPSERT because it also feels like a slightly more realistic.

Sean Loiselle and others added 3 commits May 5, 2021 15:10

kafka-util: support generating decimal data

8b3ef78

Create a new aggregations benchmark

a659426

This is a scaffold for Sean to create a new benchmark type for testing the performance of aggregations.

sploiselle requested a review from cirego May 6, 2021 16:24

sploiselle mentioned this pull request May 6, 2021

avro+kafka: decimal data generation #6658

Closed

cirego approved these changes May 6, 2021

View reviewed changes

Sean Loiselle and others added 2 commits May 7, 2021 12:00

bench: decimal aggregation benchmark

f34ef88

bench: improve error handling

b343ff1

cirego mentioned this pull request May 7, 2021

Add mzbench-friendly avro-insert benchmark #6676

Merged

sploiselle force-pushed the dec-agg-test branch from 6303fc5 to b343ff1 Compare May 7, 2021 16:31

sploiselle merged commit d1e9ec3 into MaterializeInc:main May 7, 2021

sploiselle deleted the dec-agg-test branch May 7, 2021 17:08

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

bench: aggregations over decimals#6669

bench: aggregations over decimals#6669
sploiselle merged 5 commits into
MaterializeInc:mainfrom
sploiselle:dec-agg-test

sploiselle commented May 6, 2021 •

edited

Loading

Uh oh!

cirego left a comment

Uh oh!

cirego May 6, 2021

Uh oh!

cirego May 6, 2021

Uh oh!

cirego commented May 6, 2021

Uh oh!

cirego commented May 6, 2021

Uh oh!

cirego commented May 6, 2021

Uh oh!

cirego commented May 6, 2021 •

edited

Loading

Uh oh!

sploiselle commented May 7, 2021

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

sploiselle commented May 6, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

cirego left a comment

Choose a reason for hiding this comment

Uh oh!

cirego May 6, 2021

Choose a reason for hiding this comment

Uh oh!

cirego May 6, 2021

Choose a reason for hiding this comment

Uh oh!

cirego commented May 6, 2021

Uh oh!

cirego commented May 6, 2021

Uh oh!

cirego commented May 6, 2021

Uh oh!

cirego commented May 6, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

sploiselle commented May 7, 2021

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

sploiselle commented May 6, 2021 •

edited

Loading

cirego commented May 6, 2021 •

edited

Loading