Skip to content

ELE-740 - paginate upload of source freshness data#850

Merged
elongl merged 2 commits intoelementary-data:masterfrom
fredmny:fix/source_freshness_upload
May 10, 2023
Merged

ELE-740 - paginate upload of source freshness data#850
elongl merged 2 commits intoelementary-data:masterfrom
fredmny:fix/source_freshness_upload

Conversation

@fredmny
Copy link
Copy Markdown
Contributor

@fredmny fredmny commented May 7, 2023

This solves issue ELE-740

I am not really sure about what the best way to determine the chunk_size for the upload of the source data is. What I did:

  • Looked up what the MAX_ARG for Linux systems is and according to some stackoverflow threads it's generally something between 100k and 200k characters - it's bigger for MacOS (this also finally explains why I wasn't getting the error locally, but it kept appearing in the GH action), but since we are running it in a Github instance and to go on the safe side, it obviously makes sense to go with the lesser value
  • Looking at the sources.json file from our project we have:
    • ~ 320k chars
    • ~ 470 sources
  • Using this as a reference the chunk size for each upload segment could be around 140 sources. To go on the super safe side, i determined it to be 100 sources.
    I normally don't like the idea of hard coding it. We also could:
  • Take the max argument size of the specific system (e.g.:getconf ARG_MAX on unix based systems)
  • calculate the chunk sizes based on that
    but IMO this would let the code hard to read and over engineered.
    That said, if you have other suggestions, I'd be happy to hear them.

Then, I am simply running a for loop to call the command uploading the sources multiple times. Is there a better way to do this?

Testing
I didn't find any testing guidelines for the cli (justo for the dbt-package). What I did to test it, was to install the altered package and run the command on our dbt project. It's working as expected. Let me know how I can better test it.

Comments
From what I see the Elementary code isn't over commented, so I also didn't comment the steps in the code.

Please let me know what I could do better or am doing wrong.

Copy link
Copy Markdown
Contributor

@elongl elongl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for opening this PR.
I checked and it does indeed resolve the issue!
Added one very small suggestion but it's not a must, let me know if you want to add it, else I'll merge.

quiet=True,
)
chunk_size = 100
for chunk in range(0, len(results), chunk_size):
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Think it might be cool if we'll add a progress bar like we do here. What do you think?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey, makes sense! I will do that.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey, @elongl
I added the progress bar. I tested and it's working as expected, but let me know if there is something I should change.

I based the title on the completion message: Uploaded source freshness results successfully

Copy link
Copy Markdown
Contributor

@elongl elongl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for opening this PR.
I checked and it does indeed resolve the issue!
Added one very small suggestion but it's not a must, let me know if you want to add it, else I'll merge.

@fredmny fredmny requested a review from elongl May 9, 2023 21:26
@elongl elongl merged commit 0aaa6f3 into elementary-data:master May 10, 2023
@elongl
Copy link
Copy Markdown
Contributor

elongl commented May 10, 2023

Thanks a lot for contributing!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants