Skip to content

ci: Optimize docker layer for caching and add remote cache#1437

Open
thomasdhc wants to merge 10 commits intomainfrom
donghyukc/remote_cache
Open

ci: Optimize docker layer for caching and add remote cache#1437
thomasdhc wants to merge 10 commits intomainfrom
donghyukc/remote_cache

Conversation

@thomasdhc
Copy link
Contributor

Description

Improve docker layer caching and add container build stage that push and pulls to remote cache

Usage

# Add snippet demonstrating usage

Checklist

  • I am familiar with the Contributing Guide.
  • New or Existing tests cover these changes.
  • The documentation is up to date with these changes.

Signed-off-by: Dong Hyuk Chang <donghyukc@nvidia.com>
Signed-off-by: Dong Hyuk Chang <donghyukc@nvidia.com>
Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

3 files reviewed, 5 comments

Edit Code Review Agent Settings | Greptile

Comment on lines +73 to +75
client-id: ${{ inputs.AZURE_CLIENT_ID }}
tenant-id: ${{ inputs.AZURE_TENANT_ID }}
subscription-id: ${{ inputs.AZURE_SUBSCRIPTION_ID }}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Input reference using uppercase instead of lowercase will fail

Suggested change
client-id: ${{ inputs.AZURE_CLIENT_ID }}
tenant-id: ${{ inputs.AZURE_TENANT_ID }}
subscription-id: ${{ inputs.AZURE_SUBSCRIPTION_ID }}
client-id: ${{ inputs.azure-client-id }}
tenant-id: ${{ inputs.azure-tenant-id }}
subscription-id: ${{ inputs.azure-subscription-id }}

${{ env.container-registry }}/${{ env.REPO_LOWER }}:${{ fromJSON(steps.get-pr-info.outputs.pr-info || '{}').number || 0 }}
${{ env.container-registry }}/${{ env.REPO_LOWER }}:${{ github.sha }}
secrets: |
GH_TOKEN=${{ secrets.PAT }}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should reference input parameter not secret in composite action - change secrets.PAT to inputs.PAT

@greptile-apps
Copy link
Contributor

greptile-apps bot commented Jan 28, 2026

Additional Comments (2)

.github/actions/test-template/action.yml
Still building container locally instead of using inputs.container-image passed from workflow - defeats the purpose of the remote cache optimization

    - name: Pull container
      shell: bash
      run: |
        docker pull ${{ inputs.container-image }}
        docker tag ${{ inputs.container-image }} ray-curator

.github/actions/test-template/action.yml
Checkout path NeMo-Curator doesn't match repo name Curator

Signed-off-by: Dong Hyuk Chang <donghyukc@nvidia.com>
Signed-off-by: Dong Hyuk Chang <donghyukc@nvidia.com>
Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2 files reviewed, 3 comments

Edit Code Review Agent Settings | Greptile

Comment on lines +48 to +50
- name: Checkout
uses: actions/checkout@v4
with:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Empty with block will cause the checkout action to fail or behave unexpectedly

Suggested change
- name: Checkout
uses: actions/checkout@v4
with:
- name: Checkout
uses: actions/checkout@v4

@greptile-apps
Copy link
Contributor

greptile-apps bot commented Jan 28, 2026

Additional Comments (2)

.github/actions/test-template/action.yml
The container-image input parameter is defined but not used - this step still builds the container locally instead of pulling the pre-built image from the registry, negating the caching benefits

    - name: Pull container
      shell: bash
      env:
        GH_TOKEN: ${{ inputs.PAT }}
      run: |
        docker pull ${{ inputs.container-image }}
        docker tag ${{ inputs.container-image }} ray-curator

.github/actions/test-template/action.yml
Missing sudo - apt-get commands will fail without elevated privileges in GitHub Actions runners

      run: |
        sudo apt-get update
        sudo apt-get install -y uuid-runtime

Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

3 files reviewed, 6 comments

Edit Code Review Agent Settings | Greptile

@greptile-apps
Copy link
Contributor

greptile-apps bot commented Jan 28, 2026

Additional Comments (2)

.github/actions/test-template/action.yml
Container is being built locally instead of using the pre-built image from cicd-container-build job. The container-image input is defined but never used, wasting the remote cache benefits.

    - name: Pull container image
      shell: bash
      env:
        GH_TOKEN: ${{ inputs.PAT }}
      run: |
        docker pull ${{ inputs.container-image }}
        docker tag ${{ inputs.container-image }} ray-curator

.github/workflows/cicd-main.yml
Tags use PR number 0 when not in a PR context (e.g., main branch pushes), which will cause tag collisions and cache conflicts

          ${{ env.container-registry }}/${{ env.REPO_LOWER }}:${{ github.event.pull_request.number || github.run_id }}
          ${{ env.container-registry }}/${{ env.REPO_LOWER }}:${{ github.sha }}

Signed-off-by: Dong Hyuk Chang <donghyukc@nvidia.com>
Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

4 files reviewed, 5 comments

Edit Code Review Agent Settings | Greptile

}
}
}' | jq -r '.data.repository.pullRequests.nodes[].number' | while read -r number; do
echo "type=registry,ref=${{ env.container-registry }}/${{ env.REPO_LOWER }}:$number-buildcache,mode=max"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

${{ env.REPO_LOWER }} is empty here - it's set in step "Normalize repo name to lowercase" which runs after this step. Use ${{ inputs.repo-name }} with bash lowercasing in the same script instead.

Suggested change
echo "type=registry,ref=${{ env.container-registry }}/${{ env.REPO_LOWER }}:$number-buildcache,mode=max"
echo "type=registry,ref=${{ env.container-registry }}/$(echo ${{ inputs.repo-name }} | tr '[:upper:]' '[:lower:]'):$number-buildcache,mode=max"

Comment on lines +84 to +85
apt-get update
apt-get install -y gh
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

apt-get requires sudo in GitHub Actions runners

Suggested change
apt-get update
apt-get install -y gh
sudo apt-get update
sudo apt-get install -y gh

Signed-off-by: Dong Hyuk Chang <donghyukc@nvidia.com>
Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2 files reviewed, 2 comments

Edit Code Review Agent Settings | Greptile

Comment on lines +84 to +85
apt-get update
apt-get install -y gh
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing sudo - inconsistent with Azure CLI installation at line 66 which uses sudo bash. Will fail if runner doesn't have root privileges.

Suggested change
apt-get update
apt-get install -y gh
sudo apt-get update
sudo apt-get install -y gh

Comment on lines 96 to 97
apt-get update
apt-get install -y uuid-runtime
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing sudo - inconsistent with Azure CLI installation at line 76 which uses sudo bash. Will fail if runner doesn't have root privileges.

Suggested change
apt-get update
apt-get install -y uuid-runtime
sudo apt-get update
sudo apt-get install -y uuid-runtime

Signed-off-by: Dong Hyuk Chang <donghyukc@nvidia.com>
Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

3 files reviewed, no comments

Edit Code Review Agent Settings | Greptile

Signed-off-by: Dong Hyuk Chang <donghyukc@nvidia.com>
Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2 files reviewed, 6 comments

Edit Code Review Agent Settings | Greptile

Comment on lines +84 to +85
apt-get update
apt-get install -y gh
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

apt-get requires sudo - inconsistent with Azure CLI installation which uses sudo bash

Suggested change
apt-get update
apt-get install -y gh
sudo apt-get update
sudo apt-get install -y gh

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Comment on lines 96 to 97
apt-get update
apt-get install -y uuid-runtime
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

apt-get requires sudo - inconsistent with Azure CLI installation which uses sudo bash

Suggested change
apt-get update
apt-get install -y uuid-runtime
sudo apt-get update
sudo apt-get install -y uuid-runtime

}
}
}' | jq -r '.data.repository.pullRequests.nodes[].number' | while read -r number; do
echo "type=registry,ref=${{ env.container-registry }}/${{ env.REPO_LOWER }}:$number-buildcache,mode=max"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

${{ env.REPO_LOWER }} empty here - the step "Normalize repo name to lowercase" runs after this step (line 87). The variable is set at line 92 but not available until subsequent steps. Will produce malformed cache references like nemoci.azurecr.io/:123-buildcache

Comment on lines +130 to +132
type=registry,ref=${{ env.container-registry }}/${{ env.REPO_LOWER }}:${{ fromJSON(steps.get-pr-info.outputs.pr-info || '{}').number || 0 }}-buildcache,mode=max
type=registry,ref=${{ env.container-registry }}/${{ env.REPO_LOWER }}:main-buildcache,mode=max
${{ steps.cache_from.outputs.LAST_PRS }}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

${{ env.REPO_LOWER }} empty here - it's set in the step at line 87-92 which runs before this, but env variables aren't available in subsequent steps within the same job in composite actions. Will produce malformed image references

- name: Checkout repository
uses: actions/checkout@v2
with:
path: NeMo-Curator
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Checkout path NeMo-Curator doesn't match repository name Curator

RUN --mount=type=cache,target=/var/cache/apt,sharing=locked \
--mount=type=cache,target=/var/lib/apt,sharing=locked \
--mount=type=cache,target=/root/.cache/uv \
uv sync --link-mode copy --locked --extra all --all-groups --no-cache && \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using --no-cache flag negates the --mount=type=cache,target=/root/.cache/uv above it, preventing uv from utilizing its cache during sync

Signed-off-by: Dong Hyuk Chang <donghyukc@nvidia.com>
Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

3 files reviewed, 7 comments

Edit Code Review Agent Settings | Greptile

Comment on lines +84 to +85
apt-get update
apt-get install -y gh
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

apt-get requires sudo in GitHub Actions runners

Suggested change
apt-get update
apt-get install -y gh
sudo apt-get update
sudo apt-get install -y gh

Comment on lines 96 to 97
apt-get update
apt-get install -y uuid-runtime
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

apt-get requires sudo in GitHub Actions runners

Suggested change
apt-get update
apt-get install -y uuid-runtime
sudo apt-get update
sudo apt-get install -y uuid-runtime

Comment on lines +112 to +113
}' | jq -r '.data.repository.pullRequests.nodes[].number' | while read -r number; do
echo "type=registry,ref=${{ env.container-registry }}/${{ env.REPO_LOWER }}:$number-buildcache,mode=max"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

${{ env.REPO_LOWER }} is empty here - it's set in the next step at line 92, but environment variables aren't available until steps after they're set. This produces malformed cache references like nemoci.azurecr.io/:123-buildcache

Use inline bash lowercasing instead:

echo "type=registry,ref=${{ env.container-registry }}/${REPO,,}:$number-buildcache,mode=max"

Comment on lines +130 to +138
type=registry,ref=${{ env.container-registry }}/${{ env.REPO_LOWER }}:${{ fromJSON(steps.get-pr-info.outputs.pr-info || '{}').number || 0 }}-buildcache,mode=max
type=registry,ref=${{ env.container-registry }}/${{ env.REPO_LOWER }}:main-buildcache,mode=max
${{ steps.cache_from.outputs.LAST_PRS }}
cache-to: |
type=registry,ref=${{ env.container-registry }}/${{ env.REPO_LOWER }}:${{ fromJSON(steps.get-pr-info.outputs.pr-info || '{}').number || 0 }}-buildcache,mode=max
no-cache: false
tags: |
${{ env.container-registry }}/${{ env.REPO_LOWER }}:${{ fromJSON(steps.get-pr-info.outputs.pr-info || '{}').number || 0 }}
${{ env.container-registry }}/${{ env.REPO_LOWER }}:${{ github.sha }}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

${{ env.REPO_LOWER }} is empty at line 130-131 - it's set at line 92 but composite actions don't make env variables available to subsequent steps. Cache and tag references will be malformed like nemoci.azurecr.io/:abc123

Set REPO_LOWER as output from the normalize step and reference ${{ steps.normalize.outputs.repo_lower }} instead

- name: Checkout repository
uses: actions/checkout@v2
with:
path: NeMo-Curator
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Checkout path NeMo-Curator doesn't match repository name curator

Suggested change
path: NeMo-Curator
path: curator

@@ -116,7 +119,7 @@ runs:
--volume $(pwd)/NeMo-Curator:/workspace \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Volume mount path references NeMo-Curator but should be curator to match the repository name

Suggested change
--volume $(pwd)/NeMo-Curator:/workspace \
--volume $(pwd)/curator:/workspace \

# Install Curator
RUN --mount=type=cache,target=/var/cache/apt,sharing=locked \
--mount=type=cache,target=/var/lib/apt,sharing=locked \
uv sync --link-mode copy --locked --extra all --all-groups --no-cache && \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

--no-cache flag negates the --mount=type=cache,target=/var/cache/apt above it, preventing the cache mount from functioning

Suggested change
uv sync --link-mode copy --locked --extra all --all-groups --no-cache && \
uv sync --link-mode copy --locked --extra all --all-groups && \

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants