OCPNODE-3720: Test for AutoNodeSizing#30789
OCPNODE-3720: Test for AutoNodeSizing#30789openshift-merge-bot[bot] merged 1 commit intoopenshift:mainfrom
Conversation
|
Pipeline controller notification For optional jobs, comment This repository is configured in: automatic mode |
|
Skipping CI for Draft Pull Request. |
|
/test all |
|
/pj-rehearse list |
|
/pj-rehearse periodic-ci-openshift-release-main-nightly-4.22-e2e-aws-disruptive-longrunning |
|
/testwith openshift/release/main/e2e-aws-disruptive-longrunning |
|
/payload-job periodic-ci-openshift-release-main-nightly-4.22-e2e-aws-disruptive-longrunning |
|
@ngopalak-redhat: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/e1272140-0b60-11f1-81d0-c8556be4b24f-0 |
7f0a623 to
be9a128
Compare
|
/payload-job periodic-ci-openshift-release-main-nightly-4.22-e2e-aws-disruptive-longrunning |
|
@ngopalak-redhat: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/7af17aa0-0b93-11f1-85a4-9ddcbb120a7f-0 |
|
/test all |
be9a128 to
75a7a2e
Compare
|
/payload-job periodic-ci-openshift-release-main-nightly-4.22-e2e-aws-disruptive-longrunning |
|
@ngopalak-redhat: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/fb0b6030-0b97-11f1-9ca3-c7755568cd2c-0 |
75a7a2e to
1645179
Compare
|
/payload-job periodic-ci-openshift-release-main-nightly-4.22-e2e-aws-disruptive-longrunning |
|
@ngopalak-redhat: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/4758ed80-0bc1-11f1-89b3-25dbfa29a094-0 |
1645179 to
92b001a
Compare
|
/payload-job periodic-ci-openshift-release-main-nightly-4.22-e2e-aws-disruptive-longrunning |
|
@ngopalak-redhat: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/80fa8820-0bd8-11f1-8d89-3eccb3c98203-0 |
92b001a to
281f2f4
Compare
|
/payload-job periodic-ci-openshift-release-main-nightly-4.22-e2e-aws-disruptive-longrunning |
|
@ngopalak-redhat: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/cb08da00-0be4-11f1-81df-2e95b97beff5-0 |
|
@ngopalak-redhat: This pull request references OCPNODE-3720 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target only the "4.22.0" version, but multiple target versions were set. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
/payload-job periodic-ci-openshift-release-main-nightly-4.22-e2e-aws-disruptive-longrunning |
|
@ngopalak-redhat: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/9e20d030-283e-11f1-872f-8e7fcfdab5c7-0 |
f672159 to
f736466
Compare
|
/payload-job periodic-ci-openshift-release-main-nightly-4.22-e2e-aws-disruptive-longrunning |
|
@ngopalak-redhat: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/cd801b70-2847-11f1-953f-1120707ac005-0 |
|
Scheduling required tests: |
f736466 to
9cb3ad9
Compare
|
/payload-job periodic-ci-openshift-release-main-nightly-4.22-e2e-aws-disruptive-longrunning |
|
@ngopalak-redhat: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/e5e6a4c0-2862-11f1-9c39-20349b7a8b2d-0 |
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
test/extended/node/node_swap_cnv.go (1)
897-917:⚠️ Potential issue | 🟠 MajorTC12 can pass without ever reaching the “OS swap disabled” scenario.
Errors from
swapon -s/swapoff -aare ignored, and Lines 914-916 explicitly continue when swap is still active. That turns this spec into a false positive whenever the node cannot be driven to a swap-free baseline.Suggested fix
- initialSwapOutput, _ := ExecOnNodeWithChroot(oc, cnvWorkerNode, "swapon", "-s") + initialSwapOutput, initialSwapErr := ExecOnNodeWithChroot(oc, cnvWorkerNode, "swapon", "-s") + o.Expect(initialSwapErr).NotTo(o.HaveOccurred(), "must be able to inspect initial swap state") @@ - _, _ = ExecOnNodeWithNsenter(oc, cnvWorkerNode, "swapoff", "-a") + _, swapoffErr := ExecOnNodeWithNsenter(oc, cnvWorkerNode, "swapoff", "-a") + o.Expect(swapoffErr).NotTo(o.HaveOccurred(), "TC12 requires OS swap to be disabled") framework.Logf("OS-level swap disabled") } @@ - swapOutput, _ := ExecOnNodeWithChroot(oc, cnvWorkerNode, "swapon", "-s") + swapOutput, swapCheckErr := ExecOnNodeWithChroot(oc, cnvWorkerNode, "swapon", "-s") + o.Expect(swapCheckErr).NotTo(o.HaveOccurred(), "must be able to verify swap is disabled") framework.Logf("swapon -s output:\n%s", swapOutput) hasOSSwap := strings.TrimSpace(swapOutput) != "" && swapOutput != "Filename\t\t\t\tType\t\tSize\t\tUsed\t\tPriority" if hasOSSwap { - framework.Logf("Warning: Could not disable OS swap, but continuing with test") + e2eskipper.Skipf("TC12 requires a node without active OS swap") } else { framework.Logf("Confirmed: No OS-level swap on node %s", cnvWorkerNode) }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/extended/node/node_swap_cnv.go` around lines 897 - 917, The test currently ignores errors from ExecOnNodeWithChroot/ExecOnNodeWithNsenter and treats inability to disable swap as a warning (variables initialHasSwap and hasOSSwap), which lets TC12 falsely pass; update the logic in the block around initialSwapOutput/initialHasSwap and the swapoff attempt so that you capture and check the error return values from ExecOnNodeWithNsenter("swapoff", "-a") and ExecOnNodeWithChroot("swapon", "-s"), and if swap was present (initialHasSwap) but swapoff returned an error or subsequent swapon still indicates hasOSSwap, fail the test (or return an error) instead of logging a warning; ensure the failure path references the same node variable cnvWorkerNode and includes the command error for diagnostics.
♻️ Duplicate comments (2)
test/extended/node/node_swap_cnv.go (1)
1101-1105:⚠️ Potential issue | 🟠 MajorCheck swap reset failures before each TC13 iteration.
These two commands establish the baseline for every size subtest. If either one fails, the next
dd/mkswap/swaponsequence runs on leftover state and the result is unreliable.Suggested fix
- ExecOnNodeWithNsenter(oc, cnvWorkerNode, "swapoff", "-a") - ExecOnNodeWithChroot(oc, cnvWorkerNode, "rm", "-f", swapFilePath) + if _, err := ExecOnNodeWithNsenter(oc, cnvWorkerNode, "swapoff", "-a"); err != nil { + framework.Logf("Warning: failed to disable existing swap: %v", err) + result.success = false + results = append(results, result) + continue + } + if _, err := ExecOnNodeWithChroot(oc, cnvWorkerNode, "rm", "-f", swapFilePath); err != nil { + framework.Logf("Warning: failed to remove previous swap file: %v", err) + result.success = false + results = append(results, result) + continue + }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/extended/node/node_swap_cnv.go` around lines 1101 - 1105, Ensure the baseline swap cleanup commands check for failures before proceeding: after calling ExecOnNodeWithNsenter(cnvWorkerNode, "swapoff", "-a") and ExecOnNodeWithChroot(cnvWorkerNode, "rm", "-f", swapFilePath) (the calls shown in the block using ExecOnNodeWithNsenter and ExecOnNodeWithChroot), verify their return/error results and abort or fail the test (e.g., via framework.Failf or framework.ExpectNoError) if they fail so the subsequent dd/mkswap/swapon sequence does not run on leftover state; add clear log messages including the node and operation when detecting an error.test/extended/node/node_utils.go (1)
394-401:⚠️ Potential issue | 🟠 MajorFail
installCNVOperatorwhen the worker MCP never reconverges.
test/extended/node/node_swap_cnv.go, Lines 61-69 only abort setup wheninstallCNVOperatorreturns an error. Warning-only handling here lets the CNV swap suite continue against a non-converged worker pool, which is a flaky/invalid starting state.Suggested fix
mcClient, err := machineconfigclient.NewForConfig(oc.AdminConfig()) if err != nil { framework.Logf("Warning: failed to create MC client for MCP check: %v", err) } else { - err = waitForMCP(ctx, mcClient, "worker", 30*time.Minute) - if err != nil { - framework.Logf("Warning: MCP rollout check failed: %v", err) - } + if err = waitForMCP(ctx, mcClient, "worker", 30*time.Minute); err != nil { + return fmt.Errorf("MCP rollout check failed: %w", err) + } }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/extended/node/node_utils.go` around lines 394 - 401, The current logic logs warnings when creating the MachineConfig client or when waitForMCP("worker", ...) fails, allowing installCNVOperator to succeed even if the worker MCP never reconverges; change this so errors are propagated instead of only logged: in the function that creates mcClient and calls waitForMCP (references: mcClient and waitForMCP), return the error (or wrap and return it) when NewForConfig or waitForMCP fail so installCNVOperator (the caller) will abort setup on MCP non-convergence rather than continuing with warnings.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Outside diff comments:
In `@test/extended/node/node_swap_cnv.go`:
- Around line 897-917: The test currently ignores errors from
ExecOnNodeWithChroot/ExecOnNodeWithNsenter and treats inability to disable swap
as a warning (variables initialHasSwap and hasOSSwap), which lets TC12 falsely
pass; update the logic in the block around initialSwapOutput/initialHasSwap and
the swapoff attempt so that you capture and check the error return values from
ExecOnNodeWithNsenter("swapoff", "-a") and ExecOnNodeWithChroot("swapon", "-s"),
and if swap was present (initialHasSwap) but swapoff returned an error or
subsequent swapon still indicates hasOSSwap, fail the test (or return an error)
instead of logging a warning; ensure the failure path references the same node
variable cnvWorkerNode and includes the command error for diagnostics.
---
Duplicate comments:
In `@test/extended/node/node_swap_cnv.go`:
- Around line 1101-1105: Ensure the baseline swap cleanup commands check for
failures before proceeding: after calling ExecOnNodeWithNsenter(cnvWorkerNode,
"swapoff", "-a") and ExecOnNodeWithChroot(cnvWorkerNode, "rm", "-f",
swapFilePath) (the calls shown in the block using ExecOnNodeWithNsenter and
ExecOnNodeWithChroot), verify their return/error results and abort or fail the
test (e.g., via framework.Failf or framework.ExpectNoError) if they fail so the
subsequent dd/mkswap/swapon sequence does not run on leftover state; add clear
log messages including the node and operation when detecting an error.
In `@test/extended/node/node_utils.go`:
- Around line 394-401: The current logic logs warnings when creating the
MachineConfig client or when waitForMCP("worker", ...) fails, allowing
installCNVOperator to succeed even if the worker MCP never reconverges; change
this so errors are propagated instead of only logged: in the function that
creates mcClient and calls waitForMCP (references: mcClient and waitForMCP),
return the error (or wrap and return it) when NewForConfig or waitForMCP fail so
installCNVOperator (the caller) will abort setup on MCP non-convergence rather
than continuing with warnings.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: f19de946-a931-414e-9073-53628a3894c4
📒 Files selected for processing (4)
test/extended/node/node_e2e/node.gotest/extended/node/node_sizing.gotest/extended/node/node_swap_cnv.gotest/extended/node/node_utils.go
|
Scheduling required tests: |
|
@ngopalak-redhat: This pull request references OCPNODE-3720 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target only the "4.22.0" version, but multiple target versions were set. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
1 similar comment
|
@ngopalak-redhat: This pull request references OCPNODE-3720 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target only the "4.22.0" version, but multiple target versions were set. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
/retest-required |
|
@ngopalak-redhat: This PR has been marked as verified by DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
/assign @BhargaviGudi |
|
cc: @cpmeadors |
|
/lgtm |
|
Risk analysis has seen new tests most likely introduced by this PR. New Test Risks for sha: 9cb3ad9
New tests seen in this PR at sha: 9cb3ad9
|
|
/approve |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: BhargaviGudi, cpmeadors, ngopalak-redhat The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
@ngopalak-redhat: all tests passed! Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
…Node OCPNODE-3720: Test for AutoNodeSizing
Summary
Re-adds the auto node sizing test that was reverted in #30580, with fixes for CI failures and code refactoring.
This test validates that:
Changes
minimize disruption
KubeletConfig changes
Bug Fixes
required labels:
pattern
Refactoring
Testing
Passes in e2e-aws-disruptive-longrunning suite
MonitorTest PodSecurityViolation resolved
https://pr-payload-tests.ci.openshift.org/runs/ci/e5e6a4c0-2862-11f1-9c39-20349b7a8b2d-0
Related
OCPNODE-3751: Add automated tests for kubelet LimitedSwap drop-in configuration for CNV #30795 (comment)
OCPBUGS-66964: Revert "OCPNODE-3720: Add auto-sizing-reversed test to origin" #30580
Fixes OCPNODE-3720