
Fix multiple issues in size-labeler.sh #1034

Open

mhucka wants to merge 8 commits into quantumlib:main from mhucka:fix-size-labeler



@mhucka
Collaborator

@mhucka mhucka commented Apr 3, 2026

Changes:

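  • The `IGNORED` array in `size-labeler.sh` contains glob patterns (e.g., `*_pb2.py`, `.*.lock`). However, the code used the Bash regex match operator `=~`, which treats them as regular expressions: a leading `*` is invalid in a regex, and `.` matches any character, so the patterns either raise an error or match the wrong files. The following demonstrates the problem: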

     # bash -c 'p="*_pb2.py"; [[ a_pb2.py =~ $p ]]'
     bash: line 1: [[: invalid regular expression `*_pb2.py': 
     Invalid preceding regular expression

    The fix (using `==`, which does glob matching inside `[[ ]]`) is the same one applied to the same problem in the Cirq repo in PR #7986.

  • Previously, at most 100 files were fetched from GitHub, which might not have been sufficient in all cases. This PR adds support for the GitHub API's `page` parameter so that more than 100 files can be retrieved. The solution was implemented with the help of Gemini CLI.

  • The code analyzing the changes was inefficient: it ran jq repeatedly inside the per-file loop. This PR incorporates improvements suggested by Gemini CLI.

  • The label Size:XS had inconsistent capitalization compared to the other labels. Fixed.
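The difference between the two operators can be sketched as follows. This is a minimal illustration, not the script's actual code: the helper name `is_ignored` is hypothetical, and the array holds a subset of the patterns from the script's `IGNORED` list.

```bash
#!/usr/bin/env bash
# Glob patterns, as in the script's IGNORED array (subset shown).
IGNORED=("*_pb2.py" "*_pb2.pyi" ".*.lock" "*.bundle.js")

# Hypothetical helper: succeeds (returns 0) if the given path matches any
# pattern in IGNORED, using glob matching rather than regex matching.
is_ignored() {
  local file="$1" pattern
  for pattern in "${IGNORED[@]}"; do
    # Inside [[ ]], == does pattern (glob) matching when the right-hand
    # side is unquoted; quoting it would force a literal string comparison.
    # Unlike pathname expansion, * here also matches "/".
    if [[ "${file}" == ${pattern} ]]; then
      return 0
    fi
  done
  return 1
}
```

For example, `is_ignored "lib/qsim_pb2.py"` succeeds, while `is_ignored "Cargo.lock"` fails because `.*.lock` requires a leading dot. With `=~`, the same `.*.lock` would be read as a regex and would match `Cargo.lock`.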

The `IGNORED` list contains path globs, but they were being tested
with Bash regex matching, which does not produce the right results.
@github-actions github-actions bot added the size: XS <10 lines changed label Apr 3, 2026
@mhucka mhucka marked this pull request as ready for review April 3, 2026 01:42
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request updates the size-labeler.sh script to use Bash glob patterns for file exclusion and increases the GitHub API per_page parameter to 100. While the increase in page size is helpful, feedback was provided noting that the script still fails to handle pagination for pull requests exceeding 100 files, which could result in inaccurate size labels.

This enhances the code to support getting more than 100 files from
GitHub.

(Written with the help of Gemini CLI.)
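The paging approach can be sketched as a loop over successive `page` values that stops at the first empty page. This is a minimal offline sketch under stated assumptions: `fetch_page` is a hypothetical stub standing in for the real API request plus jq, returning canned `filename<TAB>changes` rows so the loop logic can run without network access.

```bash
#!/usr/bin/env bash
# Hypothetical stub standing in for the real request, e.g.
#   api_call "pulls/${pr}/files?per_page=100&page=${page}"
# followed by jq. It returns "filename<TAB>changes" rows for the requested
# page and nothing once the pages are exhausted.
fetch_page() {
  case "$1" in
    1) printf 'a.cc\t120\nb.h\t30\n' ;;
    2) printf 'c.py\t5\n' ;;
    *) ;;  # past the last page: empty response
  esac
}

# Requests page 1, 2, 3, ... and sums the changes column, stopping at the
# first empty page.
total_changes() {
  local page=1 total=0 response changes
  while true; do
    response="$(fetch_page "${page}")"
    if [[ -z "${response}" ]]; then
      break
    fi
    while IFS=$'\t' read -r _ changes; do
      total=$(( total + changes ))
    done <<<"${response}"
    page=$(( page + 1 ))
  done
  echo "${total}"
}
```

With the stub above, `total_changes` prints `155`. Stopping at the first empty page costs one extra request; stopping as soon as a page returns fewer than 100 entries would avoid it, at the price of tying the loop to the `per_page` value.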
@github-actions github-actions bot added size: M 50< lines changed <250 and removed size: XS <10 lines changed labels Apr 3, 2026
@mhucka
Collaborator Author

mhucka commented Apr 3, 2026

/gemini review

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request updates the compute_changes function in dev_tools/ci/size-labeler.sh to implement pagination when querying the GitHub API for pull request files, ensuring accurate size labeling for large changesets. It also improves shell script robustness through better variable quoting and refined pattern matching. Feedback was provided to optimize performance by reducing the number of jq process executions within the file processing loop, suggesting a single-pass approach to extract filenames and change counts.
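The single-pass suggestion can be sketched like this: jq runs once per page and emits `filename<TAB>changes` rows, which a Bash loop then reads, instead of starting a jq process for each field of each file. The function name `count_page_changes` and the shortened `IGNORED` list are illustrative assumptions; the input is assumed to be one page of the JSON returned by GitHub's "list pull request files" endpoint, whose elements carry `filename` and `changes` fields.

```bash
#!/usr/bin/env bash
# Illustrative ignore list; entries are glob patterns.
IGNORED=("*_pb2.py" "*.bundle.js")

# Reads one page of pull-request file JSON on stdin and prints the total
# "changes" count for files that do not match any IGNORED pattern.
count_page_changes() {
  local json total=0 filename changes pattern skip
  json="$(cat)"
  # A single jq process turns the whole page into tab-separated rows.
  while IFS=$'\t' read -r filename changes; do
    skip=false
    for pattern in "${IGNORED[@]}"; do
      if [[ "${filename}" == ${pattern} ]]; then  # glob match (unquoted RHS)
        skip=true
        break
      fi
    done
    if [[ "${skip}" == false ]]; then
      total=$(( total + changes ))
    fi
  done < <(jq -r '.[] | [.filename, (.changes | tostring)] | @tsv' <<<"${json}")
  echo "${total}"
}
```

The number of jq processes now grows with the number of pages rather than with the number of files.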

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
@github-actions github-actions bot added size: S 10< lines changed <50 and removed size: M 50< lines changed <250 labels Apr 3, 2026
@mhucka mhucka changed the title Fix matching operator for ignored files in size-labeler.sh Fix matching error, fix label name, and improve efficiency in size-labeler.sh Apr 5, 2026
@mhucka mhucka changed the title Fix matching error, fix label name, and improve efficiency in size-labeler.sh Fix multiple issues in size-labeler.sh Apr 5, 2026
@mhucka mhucka requested a review from pavoljuhas April 7, 2026 05:41
Collaborator

@pavoljuhas pavoljuhas left a comment


If we are getting into these weeds, we can get the same result using `gh api --paginate` instead.

I would also remove the IGNORED array and logic as those patterns do not match anything in the qsim repo.

Comment on lines 43 to 47
    "*_pb2.py"
    "*_pb2.pyi"
    "*_pb2_grpc.py"
    ".*.lock"
    "*.bundle.js"
Collaborator


None of these was ever present in the repo. I suggest removing the `IGNORED` array and the related skip-file logic.

Alternatively, we can replace it with an extended regular expression for jq; see below for why.

Comment on lines +116 to +121
    local response
    response="$(api_call "pulls/${pr}/files?per_page=100&page=${page}")"

    if [[ "$(jq_stdin '. | length' <<<"${response}")" -eq 0 ]]; then
      break
    fi
Copy link
Copy Markdown
Collaborator


I suggest replacing all the pagination logic with `gh api --paginate` and letting it make the next-page calls for us. We can also let `gh` sum up the change counts in a jq expression, for example:

    gh api --paginate /repos/quantumlib/qsim/pulls/965/files \
       --jq="[0, .[].changes] | add"

(the leading 0 in the array makes an empty list sum to 0 rather than null)

Extending the expression with some filter for an IGNORED regex pattern would be straightforward, but we should leave it for later when (if ever) it becomes necessary.
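One caveat when following this suggestion: as far as we know, `gh api --paginate` applies a `--jq` expression to each page separately, so `[0, .[].changes] | add` would likely print one subtotal per page once a pull request spans more than one page. A sketch that avoids this emits one `changes` value per file and sums outside jq; it also shows the optional extended-regex filter mentioned above. The function name `sum_changes` and the use of `awk` for the final sum are illustrative choices, not part of the script.

```bash
#!/usr/bin/env bash
# Reads a stream of JSON pages (each an array of pull-request file objects)
# on stdin and prints the total "changes", skipping files whose names match
# the optional extended regex given as $1 (empty means no filtering).
sum_changes() {
  local ignore_re="${1:-}"
  # jq handles every page in the stream and prints one number per kept file;
  # awk adds them up ("+ 0" makes an empty stream print 0, not "").
  jq --arg re "${ignore_re}" \
    '.[] | select($re == "" or (.filename | test($re) | not)) | .changes' \
    | awk '{ total += $1 } END { print total + 0 }'
}
```

In the script, the input could come from something like `gh api --paginate "repos/quantumlib/qsim/pulls/${pr}/files?per_page=100" | sum_changes`; because the filter starts with `.[]`, it should work whether gh emits one array per page or a single merged array, though that pairing is an assumption worth verifying.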


Labels

size: S 10< lines changed <50


2 participants