Skip to content

upload nightly tps metrics to s3#571

Merged
THardy98 merged 4 commits intomainfrom
upload-nightly-tps-metrics-s3
Dec 10, 2025
Merged

upload nightly tps metrics to s3#571
THardy98 merged 4 commits intomainfrom
upload-nightly-tps-metrics-s3

Conversation

@THardy98
Copy link
Contributor

@THardy98 THardy98 commented Dec 5, 2025

What was changed

Upload nightly prometheus worker metrics to S3 for querying in Omni

Why?

observability

@THardy98 THardy98 requested a review from a team as a code owner December 5, 2025 00:52
--option min-throughput-per-hour=1000 \
2>&1 | tee $WORKER_LOG_DIR/scenario.log

- name: Configure AWS credentials
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are there no secrets required here? So anyone can upload to this bucket?

Copy link
Contributor Author

@THardy98 THardy98 Dec 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We assume an ARN that gives us perms to write to this specific bucket, from specific github repos (the sdk repos)

Copy link
Contributor

@cretz cretz Dec 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But doesn't that let anyone do the same since the ARN is not a secret and is visible by everyone? Or is there some way that the AWS endpoint knows it's from this GH workflow and not from someone else's S3 client? Otherwise, it there some secret/auth that we can set to prove it came from us in these GH workflows?

Copy link
Contributor Author

@THardy98 THardy98 Dec 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From what I was told, the ARN only permits uploads from our SDK repos. (based on its trust policy), but I can move it behind a secret regardless

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd be interested to confirm how that works. So the ARN restricts to uploading from GH worker IPs only? Yes, would definitely prefer using secrets when uploading.

Copy link
Contributor Author

@THardy98 THardy98 Dec 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

moved to a secret

I believe it uses GH OIDC in-tandem with a trust policy to scope to specific repos with specific permissible actions (basically saying "only repo X have permission to do action Y"). Given that this workflow is only trigger-able via cron and manual dispatch (only available to maintainers), it can't be abused.

This blog post covers the setup
https://aws.amazon.com/blogs/security/use-iam-roles-to-connect-github-actions-to-actions-in-aws/

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, it does an OIDC handshake and AWS validates the JWT against GH actions signed token. If setup on the AWS side to validate the GH key, then yes, no concerns about a user being able to use this. Approving (though up to you whether to move back our from secrets now that I understand what aws-actions/configure-aws-credentials is doing).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This logic here is getting more and more involved. I think it'd be wise to consider either a Go project to encapsulate this logic or moving this workflow file to a centralized place (e.g. omes itself?) that is only invoked from here, similar to what we do with features. Arguably you don't even need to invoke from the SDK repos because there's not really any value in doing it in the SDK repos.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I agree.
I have a draft PR in omes that covers the non-lang specific part of the workflow (prior to the S3 upload), which basically was the omes run itself, but I liked being able to enable flags independently in each SDK. Arguably you could allow these flags to be inputs though.

I'll move this to a shared workflow in the future.

@THardy98 THardy98 requested a review from cretz December 9, 2025 02:32
@THardy98 THardy98 force-pushed the upload-nightly-tps-metrics-s3 branch from 684de08 to 08bcd2c Compare December 9, 2025 19:14
@THardy98 THardy98 merged commit 2149661 into main Dec 10, 2025
11 checks passed
@THardy98 THardy98 deleted the upload-nightly-tps-metrics-s3 branch December 10, 2025 19:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants