Fix multiple issues in size-labeler.sh#1034
Conversation
The `IGNORED` list contain path globs, but they were getting tested using bash regex syntax, and this would not produce the right results.
There was a problem hiding this comment.
Code Review
This pull request updates the size-labeler.sh script to use Bash glob patterns for file exclusion and increases the GitHub API per_page parameter to 100. While the increase in page size is helpful, feedback was provided noting that the script still fails to handle pagination for pull requests exceeding 100 files, which could result in inaccurate size labels.
This enhances the code to support getting more than 100 files from GitHub. (Written with the help of Gemini CLI.)
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request updates the compute_changes function in dev_tools/ci/size-labeler.sh to implement pagination when querying the GitHub API for pull request files, ensuring accurate size labeling for large changesets. It also improves shell script robustness through better variable quoting and refined pattern matching. Feedback was provided to optimize performance by reducing the number of jq process executions within the file processing loop, suggesting a single-pass approach to extract filenames and change counts.
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
pavoljuhas
left a comment
There was a problem hiding this comment.
If we are getting into these weeds, we can get the same using gh api --paginate instead.
I would also remove the IGNORED array and logic as those patterns do not match anything in the qsim repo.
| "*_pb2.py" | ||
| "*_pb2.pyi" | ||
| "*_pb2_grpc.py" | ||
| ".*.lock" | ||
| "*.bundle.js" |
There was a problem hiding this comment.
None of these was ever present in the repo. I suggest to remove the IGNORED array and the related skip-file logic.
Alternatively, we can replace it with and extended regular expression for jq see below why.
| local response | ||
| response="$(api_call "pulls/${pr}/files?per_page=100&page=${page}")" | ||
|
|
||
| if [[ "$(jq_stdin '. | length' <<<"${response}")" -eq 0 ]]; then | ||
| break | ||
| fi |
There was a problem hiding this comment.
I suggest to replace all the pagination logic with gh api --paginate and let it do the next-page calls for us. We can also let the gh sum up change counts in a jq expression, for example,
gh api --paginate /repos/quantumlib/qsim/pulls/965/files \
--jq="[0, .[].changes] | add"(the leading 0 in the array is for getting an empty array to sum to 0 rather than null)
Extending the expression with some filter for an IGNORED regex pattern would be straightforward, but we should leave it for later when (if ever) it becomes necessary.
Changes:
The
IGNOREDarray insize-labeler.shcontains glob patterns (e.g.,*_pb2.py,.*.lock). However, the code used the Bash regex match operator=~, which fails to match these patterns correctly. The following demonstrates the problem:The fix (to use
==) is the same fix applied in the Cirq repo in PR #7986 there for the same problem.Previously, only a max of 100 files were fetched from GitHub, which might have been sufficient for all cases. This PR adds support for using a paging parameter to get more than 100 files back from the GitHub API. The solution was implemented with the help of Gemini CLI.
The code analyzing the changes was inefficient. This incorporates improvements suggested by Gemini CLI.
The label
Size:XShad inconsistent capitalization compared to the other labels. Fixed.