OCPNODE-3751: Add automated tests for kubelet LimitedSwap drop-in configuration for CNV#30795

Merged
openshift-merge-bot[bot] merged 1 commit into openshift:main from
BhargaviGudi:bgudi-node-cnv-swap
Mar 24, 2026

Conversation

@BhargaviGudi
Contributor

This PR adds automated tests for the LimitedSwap feature using the /etc/openshift/kubelet.conf.d/ drop-in directory on CNV-enabled clusters.

Jira: https://issues.redhat.com/browse/OCPNODE-3751

Test Cases:
TC1: Verify silent creation and ownership of drop-in directory on CNV nodes
TC2: Verify kubelet starts normally with empty directory
TC3: Apply LimitedSwap configuration from drop-in file
TC4: Revert to NoSwap when drop-in file is removed
TC5: Verify control plane kubelets ignore drop-in config
TC6: Verify drop-in directory is auto-recreated after deletion
TC7: Validate security and permissions of drop-in directory
TC8: Verify cluster stability with LimitedSwap enabled
TC9: Verify non-CNV workers have no swap configuration
TC10: Apply correct precedence with multiple drop-in files
TC11: Maintain consistent configuration with checksum verification across CNV nodes
TC12: Handle LimitedSwap config gracefully when OS swap is disabled
TC13: Work correctly with various swap sizes
TC14: Expose swap metrics correctly via Prometheus

Changes
test/extended/node/node_swap_cnv.go - New test file with CNV swap tests
test/extended/node/node_utils.go - Helper functions for node operations and CNV lifecycle
test/extended/testdata/node/cnv-swap/ - Test data YAML files and README
pkg/testsuites/standard_suites.go - Added openshift/nodes/cnv suite definition

Running Tests
Run individual test
./openshift-tests run-test "[Jira:Node/Kubelet][sig-node][Feature:NodeSwap][Serial][Disruptive][Suite:openshift/nodes/cnv] Kubelet LimitedSwap Drop-in Configuration for CNV TC1: should verify silent creation and ownership of drop-in directory on CNV nodes"
Run entire suite
./openshift-tests run openshift/nodes/cnv

All test cases passed with 4.22.0-0.nightly-2026-01-19-085729.
I ran each test case individually and captured the outputs here.
Ran the test suite and kept the output here.

CI job:
https://github.com/openshift/release/pull/73670 WIP

@BhargaviGudi BhargaviGudi changed the title OCPNODE-3751: Add automated tests for kubelet LimitedSwap drop-in configuration for CNV Add automated tests for kubelet LimitedSwap drop-in configuration for CNV Feb 18, 2026
@BhargaviGudi
Contributor Author

/ok-to-test

@BhargaviGudi
Contributor Author

/test all

@BhargaviGudi BhargaviGudi changed the title Add automated tests for kubelet LimitedSwap drop-in configuration for CNV OCPNODE-3751: Add automated tests for kubelet LimitedSwap drop-in configuration for CNV Feb 18, 2026
@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Feb 18, 2026
@openshift-ci-robot

openshift-ci-robot commented Feb 18, 2026

@BhargaviGudi: This pull request references OCPNODE-3751 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set.

Details

In response to this:

This PR adds automated tests for the LimitedSwap feature using the /etc/openshift/kubelet.conf.d/ drop-in directory on CNV-enabled clusters. […]

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@smg247
Member

smg247 commented Feb 18, 2026

/approve

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 18, 2026
@cpmeadors
Contributor

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Feb 18, 2026
@openshift-ci-robot

Scheduling required tests:
/test e2e-aws-csi
/test e2e-aws-ovn-fips
/test e2e-aws-ovn-microshift
/test e2e-aws-ovn-microshift-serial
/test e2e-aws-ovn-serial-1of2
/test e2e-aws-ovn-serial-2of2
/test e2e-gcp-csi
/test e2e-gcp-ovn
/test e2e-gcp-ovn-upgrade
/test e2e-metal-ipi-ovn-ipv6
/test e2e-vsphere-ovn
/test e2e-vsphere-ovn-upi

@BhargaviGudi
Contributor Author

/verified by @BhargaviGudi

@openshift-ci-robot openshift-ci-robot added the verified Signifies that the PR passed pre-merge verification criteria label Feb 19, 2026
@openshift-ci-robot

@BhargaviGudi: This PR has been marked as verified by @BhargaviGudi.

Details

In response to this:

/verified by @BhargaviGudi


@BhargaviGudi
Contributor Author

/test e2e-aws-csi
/test e2e-aws-ovn-fips
/test e2e-aws-ovn-microshift
/test e2e-aws-ovn-serial-1of2

@BhargaviGudi
Contributor Author

/label acknowledge-critical-fixes-only

@openshift-ci openshift-ci bot added the acknowledge-critical-fixes-only Indicates if the issuer of the label is OK with the policy. label Feb 19, 2026
@openshift-ci-robot openshift-ci-robot removed the verified Signifies that the PR passed pre-merge verification criteria label Feb 19, 2026
@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 19, 2026
@openshift-ci openshift-ci bot removed the lgtm Indicates that a PR is ready to be merged. label Feb 19, 2026
@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 19, 2026
@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 19, 2026
@BhargaviGudi
Contributor Author

/payload-job periodic-ci-openshift-release-master-ci-4.22-e2e-aws-upgrade-ovn-single-node

@openshift-ci
Contributor

openshift-ci bot commented Mar 17, 2026

@BhargaviGudi: trigger 0 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (4)
test/extended/node/node_swap_cnv.go (3)

41-42: Misplaced defer g.GinkgoRecover() in g.Describe block.

defer g.GinkgoRecover() at line 42 has no effect here because it's in the g.Describe function body, not in a goroutine or a function that will panic. GinkgoRecover is intended for use in goroutines spawned within tests to recover panics. This line can be removed as it serves no purpose in this context.

Proposed fix
 var _ = g.Describe("[Jira:Node/Kubelet][sig-node][Feature:NodeSwap][Serial][Disruptive][Suite:openshift/disruptive-longrunning] Kubelet LimitedSwap Drop-in Configuration for CNV", g.Ordered, func() {
-	defer g.GinkgoRecover()
-
 	var oc = exutil.NewCLI("cnv-swap")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/extended/node/node_swap_cnv.go` around lines 41 - 42, Remove the no-op
defer g.GinkgoRecover() call from inside the g.Describe block in
test/extended/node/node_swap_cnv.go; locate the Describe declaration (the var _
= g.Describe(...) anonymous block) and delete the line `defer
g.GinkgoRecover()`—if GinkgoRecover is needed, only add it inside goroutines
spawned within tests (e.g., inside any go-routine bodies used in
It/BeforeEach/AfterEach), not at the top-level Describe.
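The pattern this comment describes can be sketched in isolation: recovery only helps when deferred inside the goroutine that may panic. Plain `recover` stands in for `g.GinkgoRecover` here, and `runSafely` is a hypothetical helper, not code from the PR.

```go
package main

import "fmt"

// runSafely runs work in a goroutine and recovers any panic there.
// A defer at the caller's top level (analogous to a defer in the
// Describe body) would never see the goroutine's panic; the recover
// must be deferred inside the goroutine itself.
func runSafely(work func()) (panicked bool) {
	done := make(chan bool)
	go func() {
		defer func() {
			if r := recover(); r != nil {
				panicked = true
			}
			done <- true
		}()
		work()
	}()
	<-done
	return panicked
}

func main() {
	fmt.Println(runSafely(func() { panic("boom") })) // recovered inside the goroutine
	fmt.Println(runSafely(func() {}))                // nothing to recover
}
```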

931-942: Incomplete swap restoration in TC12 cleanup.

The cleanup logs "may need manual re-enable" for swap (line 937) but doesn't actually restore it. While this may be acceptable since the test is disruptive, it leaves the node in a potentially different state. Consider storing the original swap configuration and restoring it.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/extended/node/node_swap_cnv.go` around lines 931 - 942, The deferred
cleanup currently only logs that swap "may need manual re-enable" but does not
restore the original swap state; update the test to capture the node's initial
swap configuration (e.g., whether swap was enabled and any related fstab or
swapfile path) before disabling it and then restore that exact state in the
defer block: when initialHasSwap is true re-create or restore the original fstab
entry or swapfile and run swapon on the node (use the same oc/node exec helpers
you already use around removeDropInFile, restartKubeletOnNode,
waitForNodeToBeReady), ensuring these steps occur before restarting kubelet so
the node returns to its prior state; reference initialHasSwap, cnvWorkerNode,
cnvDropInFilePath, removeDropInFile, restartKubeletOnNode, and
waitForNodeToBeReady when implementing the capture-and-restore logic.

1342-1352: Bearer token included in curl command string may appear in logs.

The bearer token is embedded in the curl command at line 1342, and the result is logged at line 1349. While this is a test environment with short-lived tokens, consider using environment variables or a different approach to avoid token exposure in logs.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/extended/node/node_swap_cnv.go` around lines 1342 - 1352, The curl
command currently embeds the bearer token in curlCmd (token variable) which
risks exposing it; change the oc.AsAdmin().WithoutNamespace().Run("exec")
invocation to avoid putting token directly into the command string by passing
the token via stdin and sourcing it inside the remote shell (e.g., pipe the
token into oc exec and use sh -c 'read TOKEN; curl -sk -H "Authorization: Bearer
$TOKEN" "https://.../api/v1/query?query=..."'), stop logging the raw command
(curlCmd) and ensure any logs (e.g., framework.Logf around swapTotalResult)
never print the token variable or the composed command; update references in
this block (curlCmd, token, swapTotalResult, the
oc.AsAdmin().WithoutNamespace().Run("exec") call) accordingly.
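A minimal sketch of the stdin-based approach suggested above, using only the standard library; `buildSwapQueryCmd`, the route, and the query are illustrative stand-ins for the PR's variables. The token never enters the command string, so the string is safe to log; the caller would pipe the token to `oc exec -i` separately.

```go
package main

import "fmt"

// buildSwapQueryCmd builds the remote shell command the review suggests:
// the shell reads the bearer token from stdin, so the composed string
// contains no secret material and can be logged freely.
func buildSwapQueryCmd(route, encodedQuery string) string {
	return fmt.Sprintf(
		`read TOKEN; curl -sk -H "Authorization: Bearer $TOKEN" "https://%s/api/v1/query?query=%s"`,
		route, encodedQuery)
}

func main() {
	cmd := buildSwapQueryCmd("prometheus-k8s.example", "node_memory_SwapTotal_bytes")
	fmt.Println(cmd) // safe to log: no token appears in the command
}
```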
test/extended/node/node_utils.go (1)

139-155: rand.Intn used without seeding produces deterministic results.

rand.Intn at lines 145 and 154 uses the default global random source, which in Go versions prior to 1.20 starts with seed 1, producing deterministic sequences. While Go 1.20+ auto-seeds, for consistent behavior consider using rand.New(rand.NewSource(time.Now().UnixNano())) or document the Go version requirement.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/extended/node/node_utils.go` around lines 139 - 155, The function
getCNVWorkerNodeName uses rand.Intn (calls to rand.Intn in this function) which
can produce deterministic results if the global source isn't seeded; seed the
PRNG at function or package init by creating a
rand.New(rand.NewSource(time.Now().UnixNano())) and use that instance for the
two rand.Intn calls (or seed the global source in init) so node selection is
non-deterministic; update imports (time and math/rand) and replace the rand.Intn
usages in getCNVWorkerNodeName accordingly.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@test/extended/node/node_swap_cnv.go`:
- Around line 150-154: The current check uses
o.ContainSubstring("error.*kubelet.conf.d") which is a literal substring match
and will never treat ".*" as a regex; update the assertion to use a regex
matcher or separate substring checks: replace the ContainSubstring call with
o.MatchRegexp("error.*kubelet.conf.d") (or split into two literal checks for
"error" and "kubelet.conf.d") so the lowerOutput, o.Expect and the failing
assertion correctly detect patterns (use the existing lowerOutput variable and
the o.Expect(...) wrapper when changing ContainSubstring to MatchRegexp).
- Around line 341-380: The loop that creates drop-in files on control plane
nodes can exit early on assertions and skip cleanup; after calling
createDropInFile (and before any assertion that can fail) arrange guaranteed
cleanup by deferring removal: capture cpNodeName in a new local variable (e.g.,
curNode := cpNodeName) and immediately defer removeDropInFile(oc, curNode,
cnvDropInFilePath) and a DebugNodeWithChroot(oc, curNode, "rmdir", cnvDropInDir)
pair so the drop-in file and directory are removed even if
getKubeletConfigFromNode, restartKubeletOnNode, or the o.Expect assertions fail;
ensure the deferred calls use the local copy to avoid closure-of-loop-variable
bugs.
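The first inline comment above can be illustrated with standard-library equivalents (`ContainSubstring` behaves like `strings.Contains`, `MatchRegexp` like `regexp` matching); the log line here is invented for the example.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// containsPattern mimics Gomega's ContainSubstring: a literal match,
// so ".*" is two literal characters, never a wildcard.
func containsPattern(output, pattern string) bool {
	return strings.Contains(output, pattern)
}

// matchPattern mimics Gomega's MatchRegexp: the pattern is compiled as
// a regular expression, so ".*" matches any run of characters.
func matchPattern(output, pattern string) bool {
	return regexp.MustCompile(pattern).MatchString(output)
}

func main() {
	out := "error: failed to read kubelet.conf.d/10-swap.conf"
	fmt.Println(containsPattern(out, "error.*kubelet.conf.d")) // literal match never fires
	fmt.Println(matchPattern(out, "error.*kubelet.conf.d"))    // regex wildcard matches
}
```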
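The per-iteration-copy fix from the second inline comment can also be sketched in isolation; `cleanupOrder` and the node names are hypothetical, and the appends stand in for `removeDropInFile` calls.

```go
package main

import "fmt"

// cleanupOrder registers a deferred cleanup per loop iteration. Before
// Go 1.22, deferring a closure over the loop variable directly would
// make every defer see the final node; copying into curNode first is
// safe on all Go versions. Defers run LIFO at function exit.
func cleanupOrder(nodes []string) []string {
	var cleaned []string
	func() {
		for _, node := range nodes {
			curNode := node // per-iteration copy avoids the closure bug
			defer func() { cleaned = append(cleaned, curNode) }()
		}
	}()
	return cleaned
}

func main() {
	fmt.Println(cleanupOrder([]string{"cp-0", "cp-1", "cp-2"})) // LIFO: cp-2, cp-1, cp-0
}
```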


ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 17d696af-1352-4db5-9690-c00b6ba41039

📥 Commits

Reviewing files that changed from the base of the PR and between 2fa0e14 and e178648.

📒 Files selected for processing (11)
  • pkg/testsuites/standard_suites.go
  • test/extended/node/node_swap_cnv.go
  • test/extended/node/node_utils.go
  • test/extended/testdata/bindata.go
  • test/extended/testdata/node/cnv-swap/README.md
  • test/extended/testdata/node/cnv-swap/cnv-hyperconverged.yaml
  • test/extended/testdata/node/cnv-swap/cnv-namespace.yaml
  • test/extended/testdata/node/cnv-swap/cnv-operatorgroup.yaml
  • test/extended/testdata/node/cnv-swap/cnv-subscription.yaml
  • test/extended/testdata/node/cnv-swap/kubelet-limitedswap-dropin.yaml
  • test/extended/testdata/node/cnv-swap/kubelet-noswap-dropin.yaml

@openshift-ci-robot

Scheduling required tests:
/test e2e-aws-csi
/test e2e-aws-ovn-fips
/test e2e-aws-ovn-microshift
/test e2e-aws-ovn-microshift-serial
/test e2e-aws-ovn-serial-1of2
/test e2e-aws-ovn-serial-2of2
/test e2e-gcp-csi
/test e2e-gcp-ovn
/test e2e-gcp-ovn-upgrade
/test e2e-metal-ipi-ovn-ipv6
/test e2e-vsphere-ovn
/test e2e-vsphere-ovn-upi

@BhargaviGudi
Contributor Author

/payload-job periodic-ci-openshift-release-main-nightly-4.22-e2e-aws-disruptive-longrunning

@openshift-ci
Contributor

openshift-ci bot commented Mar 18, 2026

@BhargaviGudi: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-release-main-nightly-4.22-e2e-aws-disruptive-longrunning

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/fd1673c0-2293-11f1-86fc-d9415f12224b-0

@BhargaviGudi
Contributor Author

/test e2e-metal-ipi-ovn-ipv6

@BhargaviGudi
Contributor Author

All tests passed.

@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 18, 2026
…figuration for CNV

Updated go.mod by running go mod tidy and make verify-deps

verify-deps in the PR was failing due to this issue

Resolved conflicts

Removed extra spaces

resolved gofmt issue

Resolved rebase issue

Resolved rebase issue;Added skip for hypershift and SNO clusters as needed

resolved regex issue
@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 18, 2026

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 5

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/testsuites/standard_suites.go`:
- Around line 481-494: The suite entry with Name "openshift/nodes/cnv" has a
Qualifiers slice containing `name.contains("[Suite:openshift/nodes/cnv")` that
no longer matches the new CNV specs tagged
`[Suite:openshift/disruptive-longrunning]`; update the Qualifiers for that suite
(the Qualifiers slice for the "openshift/nodes/cnv" entry) to include the
correct selector `name.contains("[Suite:openshift/disruptive-longrunning]")` (or
replace the existing selector) so the new specs (e.g.,
test/extended/node/node_swap_cnv.go) are picked up by this suite.

In `@test/extended/node/node_swap_cnv.go`:
- Around line 901-907: The test disables host swap with swapoff -a but never
restores pre-existing swap; update the Setup/teardown around the blocks that
check initialHasSwap to first capture the current active swap entries (e.g., run
"swapon --show=NAME --noheadings" via DebugNodeWithNsenter using oc and
cnvWorkerNode) before calling swapoff, and add a defer/cleanup that re-enables
only those captured swap devices (using "swapon <device>" or the original swapon
entries) when initialHasSwap is true; apply the same change to the other
occurrences referenced (around the blocks identified by initialHasSwap at the
other ranges) so tests leave the node in its original state.
- Around line 1321-1434: The test currently treats Prometheus route/token/query
and kubelet metrics failures as warnings which lets TC14 pass without proving
metric exposure; change the logic so that if prometheusRoute is found then token
retrieval and both Prometheus queries (swapTotalResult, swapFreeResult) must
succeed and be non-empty (use o.Expect or framework.Failf on errors/empty
results) and if prometheusRoute is not found decide to fail the test or
explicitly Skip with a clear message; similarly, require kubeletMetrics to be
retrievable and contain swap-related lines (fail if the raw
/api/v1/nodes/.../proxy/metrics call errors or no swap metrics found) by
updating the checks around prometheusRoute, token, swapTotalResult,
swapFreeResult, kubeletMetrics and the final assertions that now assert metric
query success in addition to /proc/meminfo.
- Around line 75-79: The global setup currently calls
ensureDropInDirectoryExists (which does a mkdir -p across all workers) in
node_swap_cnv.go; remove that call from the global setup so test-created state
is not pre-populated and regressions in CNV/MCO directory creation won't be
masked. If some individual tests actually require the directory to exist for
their own preconditions, move the ensureDropInDirectoryExists() invocation into
those specific test setup blocks (not the global setup), or replace it with
explicit per-test setup that validates CNV/MCO behavior instead of pre-creating
the directory.

In `@test/extended/node/node_utils.go`:
- Around line 139-155: The helper getCNVWorkerNodeName currently falls back to
any worker when no node has the "kubevirt.io/schedulable=true" label; change it
to fail fast instead: remove the fallback call to getNodesByLabel(...,
"node-role.kubernetes.io/worker") and if getNodesByLabel(ctx, oc,
"kubevirt.io/schedulable=true") returns no nodes or an error, surface that
failure (e.g., change getCNVWorkerNodeName signature to return (string, error)
or call the test framework's fail helper like framework.Failf/GinkgoFail) so
callers cannot silently proceed; update all callers of getCNVWorkerNodeName to
handle the new error/failed-case accordingly.
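The fail-fast shape this comment asks for reduces to the following sketch; the signature, the slice argument, and the node name are illustrative, not the PR's actual helper.

```go
package main

import (
	"errors"
	"fmt"
)

// getCNVWorkerNodeName returns an error instead of silently falling back
// to arbitrary workers when no node carries the CNV schedulable label.
func getCNVWorkerNodeName(labeledNodes []string) (string, error) {
	if len(labeledNodes) == 0 {
		return "", errors.New("no nodes with label kubevirt.io/schedulable=true; refusing to fall back to arbitrary workers")
	}
	return labeledNodes[0], nil
}

func main() {
	if _, err := getCNVWorkerNodeName(nil); err != nil {
		fmt.Println("error:", err)
	}
	name, _ := getCNVWorkerNodeName([]string{"cnv-worker-0"})
	fmt.Println("selected:", name)
}
```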

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: f18679f5-6cda-4352-b1cc-607f9509d340

📥 Commits

Reviewing files that changed from the base of the PR and between e178648 and 8a3bfd6.

📒 Files selected for processing (11)
  • pkg/testsuites/standard_suites.go
  • test/extended/node/node_swap_cnv.go
  • test/extended/node/node_utils.go
  • test/extended/testdata/bindata.go
  • test/extended/testdata/node/cnv-swap/README.md
  • test/extended/testdata/node/cnv-swap/cnv-hyperconverged.yaml
  • test/extended/testdata/node/cnv-swap/cnv-namespace.yaml
  • test/extended/testdata/node/cnv-swap/cnv-operatorgroup.yaml
  • test/extended/testdata/node/cnv-swap/cnv-subscription.yaml
  • test/extended/testdata/node/cnv-swap/kubelet-limitedswap-dropin.yaml
  • test/extended/testdata/node/cnv-swap/kubelet-noswap-dropin.yaml
🚧 Files skipped from review as they are similar to previous changes (4)
  • test/extended/testdata/node/cnv-swap/README.md
  • test/extended/testdata/node/cnv-swap/kubelet-noswap-dropin.yaml
  • test/extended/testdata/node/cnv-swap/kubelet-limitedswap-dropin.yaml
  • test/extended/testdata/node/cnv-swap/cnv-hyperconverged.yaml

Comment on lines +75 to +79
// Ensure drop-in directory exists on all worker nodes
err = ensureDropInDirectoryExists(ctx, oc, cnvDropInDir)
if err != nil {
framework.Logf("Warning: failed to ensure drop-in directory exists: %v", err)
}

⚠️ Potential issue | 🟠 Major

Don't create the drop-in directory in global setup.

ensureDropInDirectoryExists() is a mkdir -p across all workers (test/extended/node/node_utils.go:644-657). That makes TC1/TC2/TC6 validate test-created state instead of CNV/MCO behavior, so regressions in directory creation or auto-recreation will be masked.


Comment on lines +901 to +907
// If swap is already enabled, disable it for this test
if initialHasSwap {
g.By("Disabling existing OS-level swap for test")
framework.Logf("Running: swapoff -a")
_, _ = DebugNodeWithNsenter(oc, cnvWorkerNode, "swapoff", "-a")
framework.Logf("OS-level swap disabled")
}

⚠️ Potential issue | 🟠 Major

Restore any pre-existing swap instead of leaving the node altered.

These specs disable all swap with swapoff -a, but cleanup only removes the temporary file and even notes that manual re-enable may be needed. On nodes that started with swap enabled, the suite permanently changes host state and can skew the rest of the ordered run. Capture the active swap entries before disabling them and re-enable them in the defer.

Also applies to: 931-941, 1068-1080, 1101-1104

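The capture step suggested above reduces to parsing `swapon --show=NAME --noheadings` output into a device list for the deferred re-enable; `parseSwapDevices` and the device paths are illustrative, and the test would run the actual commands through its node-exec helpers.

```go
package main

import (
	"fmt"
	"strings"
)

// parseSwapDevices turns `swapon --show=NAME --noheadings` output into
// the list of swap devices to re-enable in the deferred cleanup.
func parseSwapDevices(out string) []string {
	var devices []string
	for _, line := range strings.Split(out, "\n") {
		if dev := strings.TrimSpace(line); dev != "" {
			devices = append(devices, dev)
		}
	}
	return devices
}

func main() {
	out := "/dev/dm-1\n/var/tmp/swapfile\n"
	for _, dev := range parseSwapDevices(out) {
		// In the cleanup defer this would become `swapon <dev>` on the node.
		fmt.Printf("swapon %s\n", dev)
	}
}
```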

Comment on lines +1321 to +1434
prometheusRoute, err := oc.AsAdmin().WithoutNamespace().Run("get").Args(
"route", "prometheus-k8s", "-n", "openshift-monitoring",
"-o", "jsonpath={.spec.host}").Output()
if err != nil || prometheusRoute == "" {
framework.Logf("Warning: Could not get Prometheus route: %v", err)
framework.Logf("Skipping Prometheus metrics validation")
} else {
framework.Logf("Prometheus route: %s", prometheusRoute)

// Get bearer token for Prometheus access
token, err := oc.AsAdmin().WithoutNamespace().Run("whoami").Args("-t").Output()
if err != nil {
framework.Logf("Warning: Could not get token: %v", err)
} else {
framework.Logf("Got authentication token for Prometheus access")

g.By("Querying node_memory_SwapTotal_bytes metric")
// Query for swap total metric - URL encode the query
swapTotalQuery := fmt.Sprintf("node_memory_SwapTotal_bytes{instance=~\"%s.*\"}", cnvWorkerNode)
framework.Logf("Prometheus query: %s", swapTotalQuery)
encodedSwapTotalQuery := url.QueryEscape(swapTotalQuery)
curlCmd := fmt.Sprintf("curl -sk -H 'Authorization: Bearer %s' 'https://%s/api/v1/query?query=%s'",
				strings.TrimSpace(token), prometheusRoute, encodedSwapTotalQuery)
			swapTotalResult, err := oc.AsAdmin().WithoutNamespace().Run("exec").Args(
				"-n", "openshift-monitoring",
				"prometheus-k8s-0", "-c", "prometheus",
				"--", "sh", "-c", curlCmd).Output()
			if err == nil {
				framework.Logf("Prometheus SwapTotal query result: %s", swapTotalResult)
			} else {
				framework.Logf("Warning: Prometheus query failed: %v", err)
			}

			g.By("Querying node_memory_SwapFree_bytes metric")
			swapFreeQuery := fmt.Sprintf("node_memory_SwapFree_bytes{instance=~\"%s.*\"}", cnvWorkerNode)
			framework.Logf("Prometheus query: %s", swapFreeQuery)
			encodedSwapFreeQuery := url.QueryEscape(swapFreeQuery)
			curlCmd = fmt.Sprintf("curl -sk -H 'Authorization: Bearer %s' 'https://%s/api/v1/query?query=%s'",
				strings.TrimSpace(token), prometheusRoute, encodedSwapFreeQuery)
			swapFreeResult, err := oc.AsAdmin().WithoutNamespace().Run("exec").Args(
				"-n", "openshift-monitoring",
				"prometheus-k8s-0", "-c", "prometheus",
				"--", "sh", "-c", curlCmd).Output()
			if err == nil {
				framework.Logf("Prometheus SwapFree query result: %s", swapFreeResult)
			} else {
				framework.Logf("Warning: Prometheus query failed: %v", err)
			}
		}
	}

	g.By("Querying kubelet metrics endpoint for swap data")
	framework.Logf("Running: oc get --raw \"/api/v1/nodes/%s/proxy/metrics\" | grep -i swap", cnvWorkerNode)
	kubeletMetrics, err := oc.AsAdmin().WithoutNamespace().Run("get").Args(
		"--raw", fmt.Sprintf("/api/v1/nodes/%s/proxy/metrics", cnvWorkerNode)).Output()
	if err == nil {
		// Filter for swap-related metrics
		swapMetrics := []string{}
		for _, line := range strings.Split(kubeletMetrics, "\n") {
			lowerLine := strings.ToLower(line)
			if strings.Contains(lowerLine, "swap") && !strings.HasPrefix(line, "#") {
				swapMetrics = append(swapMetrics, line)
			}
		}
		if len(swapMetrics) > 0 {
			framework.Logf("Kubelet swap-related metrics found:")
			for _, metric := range swapMetrics {
				framework.Logf(" %s", metric)
			}
		} else {
			framework.Logf("No swap-specific metrics found in kubelet metrics (may be exposed via node-exporter)")
		}
	} else {
		framework.Logf("Warning: Could not query kubelet metrics: %v", err)
	}

	g.By("Validating metrics are present and accurate")
	// Verify /proc/meminfo shows swap info (SwapTotal and SwapFree fields should exist)
	o.Expect(meminfoOutput).To(o.ContainSubstring("SwapTotal"))
	o.Expect(meminfoOutput).To(o.ContainSubstring("SwapFree"))
	framework.Logf("Validation passed: /proc/meminfo contains SwapTotal and SwapFree fields")

	// If we created swap, verify non-zero values
	if swapCreated || hasOSSwap {
		o.Expect(swapTotalBytes).To(o.BeNumerically(">", 0), "SwapTotal should be > 0 when swap is enabled")
		framework.Logf("Validation passed: SwapTotal=%d bytes (swap is enabled)", swapTotalBytes)
		// Expected swap size ~512MB = 536870912 bytes (allow some variance)
		if swapCreated {
			expectedMinBytes := int64(swapSizeMB*1024*1024) - 10*1024*1024 // Allow 10MB variance
			o.Expect(swapTotalBytes).To(o.BeNumerically(">=", expectedMinBytes),
				fmt.Sprintf("SwapTotal should be approximately %dMB", swapSizeMB))
		}
	}

	g.By("Verifying node remains Ready with metrics collection active")
	node, err := oc.AdminKubeClient().CoreV1().Nodes().Get(ctx, cnvWorkerNode, metav1.GetOptions{})
	o.Expect(err).NotTo(o.HaveOccurred())
	o.Expect(isNodeInReadyState(node)).To(o.BeTrue(), "Node should remain Ready")
	framework.Logf("Node %s is Ready", cnvWorkerNode)
	// Default to "disabled" so the summary stays accurate when no swap is active
	osSwapStatus := "disabled"
	if swapCreated {
		osSwapStatus = fmt.Sprintf("enabled (created %dMB by test)", swapSizeMB)
	} else if hasOSSwap {
		osSwapStatus = "enabled (pre-existing)"
	}
	framework.Logf("=== TC14 PASSED ===")
	framework.Logf("Swap metrics and observability verification:")
	framework.Logf("- Node: %s", cnvWorkerNode)
	framework.Logf("- OS swap: %s", osSwapStatus)
	framework.Logf("- Kubelet swapBehavior: %s", config.MemorySwap.SwapBehavior)
	framework.Logf("- /proc/meminfo: SwapTotal=%d KB, SwapFree=%d KB", swapTotalKB, swapFreeKB)
	framework.Logf("- Prometheus metrics: queried successfully with non-zero values")
	framework.Logf("- Kubelet metrics endpoint: queried")

⚠️ Potential issue | 🟠 Major

TC14 passes without actually proving metric exposure.

Every Prometheus/kubelet metrics failure here is reduced to a warning, and the hard assertions at the end only check /proc/meminfo. This spec can therefore succeed even when route lookup, auth, pod exec, or metric queries all fail.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/extended/node/node_swap_cnv.go` around lines 1321 - 1434, The test
currently treats Prometheus route/token/query and kubelet metrics failures as
warnings which lets TC14 pass without proving metric exposure; change the logic
so that if prometheusRoute is found then token retrieval and both Prometheus
queries (swapTotalResult, swapFreeResult) must succeed and be non-empty (use
o.Expect or framework.Failf on errors/empty results) and if prometheusRoute is
not found decide to fail the test or explicitly Skip with a clear message;
similarly, require kubeletMetrics to be retrievable and contain swap-related
lines (fail if the raw /api/v1/nodes/.../proxy/metrics call errors or no swap
metrics found) by updating the checks around prometheusRoute, token,
swapTotalResult, swapFreeResult, kubeletMetrics and the final assertions that
now assert metric query success in addition to /proc/meminfo.
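Concretely, the "non-empty result" assertion can decode the query body instead of substring-matching it. A minimal standalone sketch, assuming the standard Prometheus /api/v1/query response shape (countPromResults is a hypothetical helper, not part of this PR):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// promQueryResponse mirrors the relevant parts of a Prometheus
// /api/v1/query instant-query response.
type promQueryResponse struct {
	Status string `json:"status"`
	Data   struct {
		Result []json.RawMessage `json:"result"`
	} `json:"data"`
}

// countPromResults returns the number of series in the result vector,
// or an error when the query did not report success.
func countPromResults(body []byte) (int, error) {
	var resp promQueryResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		return 0, fmt.Errorf("decoding Prometheus response: %w", err)
	}
	if resp.Status != "success" {
		return 0, fmt.Errorf("query status %q, want success", resp.Status)
	}
	return len(resp.Data.Result), nil
}

func main() {
	body := []byte(`{"status":"success","data":{"resultType":"vector","result":[{"metric":{"instance":"node1:9100"},"value":[1700000000,"536870912"]}]}}`)
	n, err := countPromResults(body)
	fmt.Println(n, err) // 1 <nil>
}
```

In TC14, asserting that such a helper returns a count > 0 for swapTotalResult and swapFreeResult would make the spec fail on empty vectors instead of logging a warning.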

Comment on lines +139 to +155
// getCNVWorkerNodeName returns the name of a worker node with CNV label (kubevirt.io/schedulable=true)
func getCNVWorkerNodeName(ctx context.Context, oc *exutil.CLI) string {
	// First try to get nodes with CNV schedulable label
	nodes, err := getNodesByLabel(ctx, oc, "kubevirt.io/schedulable=true")
	if err == nil && len(nodes) > 0 {
		// Randomly select a node from the available CNV nodes
		return nodes[rand.Intn(len(nodes))].Name
	}

	// Fallback to any worker node
	nodes, err = getNodesByLabel(ctx, oc, "node-role.kubernetes.io/worker")
	if err != nil || len(nodes) == 0 {
		return ""
	}
	// Randomly select a node from available worker nodes
	return nodes[rand.Intn(len(nodes))].Name
}

⚠️ Potential issue | 🟠 Major

Fail fast when no CNV-schedulable worker exists.

This helper is used by CNV-only specs, but if kubevirt.io/schedulable=true is absent it silently falls back to any worker. That masks missing CNV coverage and can produce false passes/failures on nodes that were never configured for CNV.

Possible minimal fix
 func getCNVWorkerNodeName(ctx context.Context, oc *exutil.CLI) string {
 	// First try to get nodes with CNV schedulable label
 	nodes, err := getNodesByLabel(ctx, oc, "kubevirt.io/schedulable=true")
 	if err == nil && len(nodes) > 0 {
 		// Randomly select a node from the available CNV nodes
 		return nodes[rand.Intn(len(nodes))].Name
 	}
-
-	// Fallback to any worker node
-	nodes, err = getNodesByLabel(ctx, oc, "node-role.kubernetes.io/worker")
-	if err != nil || len(nodes) == 0 {
-		return ""
-	}
-	// Randomly select a node from available worker nodes
-	return nodes[rand.Intn(len(nodes))].Name
+	return ""
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/extended/node/node_utils.go` around lines 139 - 155, The helper
getCNVWorkerNodeName currently falls back to any worker when no node has the
"kubevirt.io/schedulable=true" label; change it to fail fast instead: remove the
fallback call to getNodesByLabel(..., "node-role.kubernetes.io/worker") and if
getNodesByLabel(ctx, oc, "kubevirt.io/schedulable=true") returns no nodes or an
error, surface that failure (e.g., change getCNVWorkerNodeName signature to
return (string, error) or call the test framework's fail helper like
framework.Failf/GinkgoFail) so callers cannot silently proceed; update all
callers of getCNVWorkerNodeName to handle the new error/failed-case accordingly.
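The suggested (string, error) signature can be sketched independently of exutil; pickCNVNode and the plain string slice below are hypothetical stand-ins for getCNVWorkerNodeName and the getNodesByLabel result:

```go
package main

import (
	"fmt"
	"math/rand"
)

// pickCNVNode is a fail-fast stand-in for getCNVWorkerNodeName: instead of
// silently falling back to any worker, it returns an error when no node
// carries the kubevirt.io/schedulable=true label.
func pickCNVNode(cnvNodes []string) (string, error) {
	if len(cnvNodes) == 0 {
		return "", fmt.Errorf("no nodes labeled kubevirt.io/schedulable=true; CNV specs cannot run")
	}
	// Randomly select one of the CNV-schedulable nodes.
	return cnvNodes[rand.Intn(len(cnvNodes))], nil
}

func main() {
	name, err := pickCNVNode([]string{"worker-0", "worker-1"})
	fmt.Println(name != "", err) // true <nil>

	_, err = pickCNVNode(nil)
	fmt.Println(err != nil) // true
}
```

Callers would then o.Expect the error instead of proceeding with an empty node name.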


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

♻️ Duplicate comments (5)
test/extended/node/node_swap_cnv.go (5)

901-907: ⚠️ Potential issue | 🟠 Major

Swap baseline is not restored after swapoff -a.

When nodes start with swap enabled, these tests disable host swap and do not re-enable original entries, leaving persistent state drift.

Proposed fix pattern
+		// capture active swap devices before any swapoff
+		preSwapOutput, err := DebugNodeWithNsenter(oc, cnvWorkerNode, "sh", "-c", "swapon --show=NAME --noheadings")
+		o.Expect(err).NotTo(o.HaveOccurred())
+		preSwapDevices := []string{}
+		for _, line := range strings.Split(strings.TrimSpace(preSwapOutput), "\n") {
+			if strings.TrimSpace(line) != "" {
+				preSwapDevices = append(preSwapDevices, strings.TrimSpace(line))
+			}
+		}
@@
-			// Re-enable swap if it was initially present
-			if initialHasSwap {
-				framework.Logf("Note: OS swap was initially enabled, may need manual re-enable")
-			}
+			for _, dev := range preSwapDevices {
+				_, _ = DebugNodeWithNsenter(oc, cnvWorkerNode, "swapon", dev)
+			}

As per coding guidelines, "Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity."

Also applies to: 931-938, 1068-1080, 1101-1104

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/extended/node/node_swap_cnv.go` around lines 901 - 907, The test
disables host swap with DebugNodeWithNsenter("swapoff", "-a") when
initialHasSwap is true but never restores it; update the test to record
initialHasSwap usage and, in the test teardown/cleanup (or immediately after
test steps), call DebugNodeWithNsenter(oc, cnvWorkerNode, "swapon", "-a") when
initialHasSwap is true (and log/handle errors via framework.Logf or process
logger) so the node's original swap state is restored; reference the existing
DebugNodeWithNsenter call, the initialHasSwap variable, and the g.By("Disabling
existing OS-level swap for test") block to locate where to add the corresponding
re-enable logic.
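The capture-then-restore idea reduces to a small shell pattern. A dry-run sketch (device names are simulated; on a real node the list would come from swapon --show=NAME --noheadings and the swapoff/swapon commands would actually execute rather than echo):

```shell
#!/bin/sh
# Simulated output of: swapon --show=NAME --noheadings
pre_swap_devices="/dev/dm-1
/swapfile"

# Disable all swap for the disruptive test (dry-run: echoed, not executed).
echo "swapoff -a"

# ... disruptive test steps run here with swap disabled ...

# Restore exactly the devices that were active before the test.
for dev in $pre_swap_devices; do
  echo "swapon $dev"
done
```

Restoring the recorded device list, rather than swapon -a, avoids enabling fstab entries that were deliberately inactive before the test.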

341-380: ⚠️ Potential issue | 🟠 Major

TC5 control-plane cleanup is not guaranteed on early failure.

If an expectation fails before Lines 375-379, modified drop-ins/directories can remain on control-plane nodes and pollute subsequent tests.

Proposed fix
 		for i, cpNode := range controlPlaneNodes {
 			cpNodeName := cpNode.Name
@@
 			err = createDropInFile(oc, cpNodeName, cnvDropInFilePath, loadConfigFromFile(cnvLimitedSwapConfigPath))
 			o.Expect(err).NotTo(o.HaveOccurred())
 			framework.Logf("Created drop-in file: %s on %s", cnvDropInFilePath, cpNodeName)
+
+			curNode := cpNodeName
+			defer func() {
+				removeDropInFile(oc, curNode, cnvDropInFilePath)
+				_, _ = DebugNodeWithChroot(oc, curNode, "rmdir", cnvDropInDir)
+			}()
@@
-			g.By(fmt.Sprintf("Cleaning up %s", cpNodeName))
-			removeDropInFile(oc, cpNodeName, cnvDropInFilePath)
-			// Also remove the drop-in directory we created on control plane
-			_, _ = DebugNodeWithChroot(oc, cpNodeName, "rmdir", cnvDropInDir)
-			framework.Logf("Removed drop-in directory from control plane node %s", cpNodeName)
 		}

As per coding guidelines, "Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/extended/node/node_swap_cnv.go` around lines 341 - 380, The test can
leave created drop-ins/directories on control-plane nodes if an expectation
fails before the explicit cleanup; after creating the drop-in directory and file
(calls to DebugNodeWithChroot(..., "mkdir", "-p", cnvDropInDir) and
createDropInFile(..., cnvDropInFilePath)), ensure cleanup always runs by
introducing a per-iteration deferred cleanup (use an inner anonymous func so
defer runs each loop) that calls removeDropInFile(oc, cpNodeName,
cnvDropInFilePath) and DebugNodeWithChroot(oc, cpNodeName, "rmdir",
cnvDropInDir); place the defer immediately after successful creation to
guarantee removal even on early failures.

1324-1368: ⚠️ Potential issue | 🟠 Major

TC14 can pass without validating Prometheus/kubelet swap metrics.

Route lookup, token retrieval, Prometheus query failures, and kubelet metrics errors are warning-only, yet final output reports success.

Proposed fix
 		if err != nil || prometheusRoute == "" {
-			framework.Logf("Warning: Could not get Prometheus route: %v", err)
-			framework.Logf("Skipping Prometheus metrics validation")
+			g.Skip(fmt.Sprintf("Prometheus route unavailable for TC14 validation: %v", err))
 		} else {
@@
-			if err != nil {
-				framework.Logf("Warning: Could not get token: %v", err)
-			} else {
+			o.Expect(err).NotTo(o.HaveOccurred(), "Prometheus token retrieval must succeed")
+			{
@@
-				if err == nil {
-					framework.Logf("Prometheus SwapTotal query result: %s", swapTotalResult)
-				} else {
-					framework.Logf("Warning: Prometheus query failed: %v", err)
-				}
+				o.Expect(err).NotTo(o.HaveOccurred(), "Prometheus SwapTotal query must succeed")
+				o.Expect(swapTotalResult).To(o.ContainSubstring(`"status":"success"`))
+				o.Expect(swapTotalResult).NotTo(o.ContainSubstring(`"result":[]`))
@@
-				if err == nil {
-					framework.Logf("Prometheus SwapFree query result: %s", swapFreeResult)
-				} else {
-					framework.Logf("Warning: Prometheus query failed: %v", err)
-				}
+				o.Expect(err).NotTo(o.HaveOccurred(), "Prometheus SwapFree query must succeed")
+				o.Expect(swapFreeResult).To(o.ContainSubstring(`"status":"success"`))
+				o.Expect(swapFreeResult).NotTo(o.ContainSubstring(`"result":[]`))
 			}
 		}
@@
-		if err == nil {
+		o.Expect(err).NotTo(o.HaveOccurred(), "kubelet metrics endpoint must be reachable")
+		if err == nil {
@@
-			if len(swapMetrics) > 0 {
+			o.Expect(len(swapMetrics)).To(o.BeNumerically(">", 0), "Expected swap-related metrics from kubelet endpoint")
+			if len(swapMetrics) > 0 {

As per coding guidelines, "Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity."

Also applies to: 1374-1395, 1427-1434

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/extended/node/node_swap_cnv.go` around lines 1324 - 1368, The test
currently treats failures to find the Prometheus route, get the bearer token,
and execute Prometheus queries (variables/prometheusRoute, token from
oc.AsAdmin().WithoutNamespace().Run("whoami"), and the exec curl commands
producing swapTotalResult/swapFreeResult) as warnings, allowing TC14 to pass
incorrectly; change these warning-only branches to fail the test instead (use
framework.Failf or framework.ExpectNoError as appropriate) so that if
prometheusRoute is empty or err != nil, token retrieval returns an error, or
either Prometheus query exec returns an error, the test aborts with a clear
failure message; apply the same change pattern to the other similar blocks
referenced (lines handling swapTotalResult/swapFreeResult and the other
occurrences noted).

41-41: ⚠️ Potential issue | 🟠 Major

CNV suite selector won’t pick this spec due to tag mismatch.

This spec is tagged only with [Suite:openshift/disruptive-longrunning], so openshift/nodes/cnv selection won’t include it.

Proposed fix
-var _ = g.Describe("[Jira:Node/Kubelet][sig-node][Feature:NodeSwap][Serial][Disruptive][Suite:openshift/disruptive-longrunning] Kubelet LimitedSwap Drop-in Configuration for CNV", g.Ordered, func() {
+var _ = g.Describe("[Jira:Node/Kubelet][sig-node][Feature:NodeSwap][Serial][Disruptive][Suite:openshift/nodes/cnv][Suite:openshift/disruptive-longrunning] Kubelet LimitedSwap Drop-in Configuration for CNV", g.Ordered, func() {

As per coding guidelines, "Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/extended/node/node_swap_cnv.go` at line 41, The test's Describe
registration string (the var _ = g.Describe(...) line) is missing the CNV suite
tag so the openshift/nodes/cnv selector won't pick it; update the Describe tags
to include the CNV suite (e.g., add "[Suite:openshift/nodes/cnv]" alongside the
existing "[Suite:openshift/disruptive-longrunning]") in the var _ =
g.Describe(...) declaration so the test is included by the CNV suite selector.
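For context, suite selection is name-based: a spec is picked up only when its full Ginkgo name contains the suite tag. A simplified illustration (the real matcher in openshift-tests is more elaborate; inSuite below is a hypothetical reduction):

```go
package main

import (
	"fmt"
	"strings"
)

// inSuite reports whether a spec's full name carries the given suite tag.
func inSuite(testName, suiteTag string) bool {
	return strings.Contains(testName, suiteTag)
}

func main() {
	name := "[Jira:Node/Kubelet][sig-node][Feature:NodeSwap][Serial][Disruptive]" +
		"[Suite:openshift/nodes/cnv][Suite:openshift/disruptive-longrunning]" +
		" Kubelet LimitedSwap Drop-in Configuration for CNV"
	fmt.Println(inSuite(name, "[Suite:openshift/nodes/cnv]")) // true
}
```

This is why a spec tagged only [Suite:openshift/disruptive-longrunning] is invisible to the openshift/nodes/cnv selector.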

75-79: ⚠️ Potential issue | 🟠 Major

Global pre-creation of the drop-in directory invalidates creation/recreation checks.

Pre-populating /etc/openshift/kubelet.conf.d in global setup can make TC1/TC2/TC6 pass even when CNV/MCO behavior regresses.

Proposed fix
-		// Ensure drop-in directory exists on all worker nodes
-		err = ensureDropInDirectoryExists(ctx, oc, cnvDropInDir)
-		if err != nil {
-			framework.Logf("Warning: failed to ensure drop-in directory exists: %v", err)
-		}
+		// Do not pre-create drop-in directory globally.
+		// Keep directory setup local to testcases that explicitly need it.

As per coding guidelines, "Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/extended/node/node_swap_cnv.go` around lines 75 - 79, The global
pre-creation of the drop-in directory (cnvDropInDir) via
ensureDropInDirectoryExists in the global setup is invalidating per-test
creation/recreation checks; remove or guard that call so the directory is not
created globally. Update the global setup in node_swap_cnv.go to stop calling
ensureDropInDirectoryExists (or change that helper to only check existence
without creating), and ensure individual tests (the TC1/TC2/TC6 test cases)
explicitly create or verify cnvDropInDir as part of their own setup/assertions
so they exercise CNV/MCO behavior correctly. Ensure references to
ensureDropInDirectoryExists and cnvDropInDir are updated accordingly.
🧹 Nitpick comments (4)
test/extended/node/node_utils.go (4)

563-641: Uninstall always returns nil, obscuring cleanup failures.

The function logs warnings but always returns nil. Callers cannot distinguish between successful cleanup and partial failure. If namespace deletion times out, subsequent test runs may fail unexpectedly.

Consider tracking and returning errors for critical failures:

Suggested improvement
 func uninstallCNVOperator(ctx context.Context, oc *exutil.CLI) error {
 	framework.Logf("Uninstalling CNV operator...")
+	var errs []error

 	dynamicClient := oc.AdminDynamicClient()

 	// ... deletion steps ...

 	// Wait for namespace to be deleted
 	framework.Logf("Waiting for namespace to be deleted...")
-	_ = wait.PollUntilContextTimeout(ctx, 10*time.Second, 10*time.Minute, true, func(ctx context.Context) (bool, error) {
+	if err := wait.PollUntilContextTimeout(ctx, 10*time.Second, 10*time.Minute, true, func(ctx context.Context) (bool, error) {
 		_, err := oc.AdminKubeClient().CoreV1().Namespaces().Get(ctx, cnvNamespace, metav1.GetOptions{})
 		if apierrors.IsNotFound(err) {
 			return true, nil
 		}
 		framework.Logf("Waiting for namespace %s to be deleted...", cnvNamespace)
 		return false, nil
-	})
+	}); err != nil {
+		errs = append(errs, fmt.Errorf("namespace deletion timed out: %w", err))
+	}

 	// ... remaining steps ...

 	framework.Logf("CNV operator uninstalled successfully")
+	if len(errs) > 0 {
+		return fmt.Errorf("uninstall completed with errors: %v", errs)
+	}
 	return nil
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/extended/node/node_utils.go` around lines 563 - 641,
uninstallCNVOperator currently swallows all errors and always returns nil;
change it to collect and return failures from critical steps (deleting
HyperConverged CR, Subscription, CSVs, OperatorGroup, namespace deletion/wait,
unlabelWorkerNodesForCNV, and waitForMCPRolloutComplete) so callers can detect
partial failures. Modify uninstallCNVOperator to accumulate errors (e.g.,
[]error or hashicorp/go-multierror) when dynamicClient.Resource(...).Delete
calls or wait.PollUntilContextTimeout loops return non-notFound errors or
timeouts, and return a consolidated error (or the multierror) at the end instead
of nil; ensure you still log warnings with framework.Logf but propagate the
underlying error from failures in HyperConverged deletion, namespace
deletion/wait, and waitForMCPRolloutComplete so tests can react appropriately.

228-236: Cleanup errors are silently ignored.

Both removeDropInFile and restartKubeletOnNode errors are discarded. If cleanup fails, subsequent tests may run against a misconfigured node. Consider logging warnings at minimum.

Suggested improvement
 func cleanupDropInAndRestartKubelet(ctx context.Context, oc *exutil.CLI, nodeName, filePath string) {
 	framework.Logf("Removing drop-in file: %s", filePath)
-	removeDropInFile(oc, nodeName, filePath)
+	if err := removeDropInFile(oc, nodeName, filePath); err != nil {
+		framework.Logf("Warning: failed to remove drop-in file: %v", err)
+	}
 	framework.Logf("Restarting kubelet on node: %s", nodeName)
-	restartKubeletOnNode(oc, nodeName)
+	if err := restartKubeletOnNode(oc, nodeName); err != nil {
+		framework.Logf("Warning: failed to restart kubelet: %v", err)
+	}
 	framework.Logf("Waiting for node to be ready...")
 	waitForNodeToBeReady(ctx, oc, nodeName)
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/extended/node/node_utils.go` around lines 228 - 236, The cleanup
function currently discards errors from removeDropInFile and
restartKubeletOnNode; update error handling so failures are logged and
propagated: change removeDropInFile and restartKubeletOnNode to return error (if
they don't already), call them from cleanupDropInAndRestartKubelet and capture
their returned errors, emit framework.Logf or framework.Errorf warnings that
include the nodeName, filePath and the error value, and if either fails consider
returning the error from cleanupDropInAndRestartKubelet (or at minimum skip
waitForNodeToBeReady when restartKubeletOnNode failed) so callers can react.
Ensure you reference the functions cleanupDropInAndRestartKubelet,
removeDropInFile, restartKubeletOnNode and waitForNodeToBeReady when making
changes.

177-184: echo-based file writing is fragile for arbitrary content.

The single-quote escaping does prevent $VAR and backtick expansion, but it is easy to get wrong, and echo itself may interpret backslash escapes (shell-dependent) or mistake a leading dash for a flag. For kubelet YAML configs this is unlikely to be a problem, but consider using a cat <<'EOF' heredoc or base64 encoding for robustness.

More robust alternative using heredoc
 func createDropInFile(oc *exutil.CLI, nodeName, filePath, content string) error {
-	// Escape content for shell
-	escapedContent := strings.ReplaceAll(content, "'", "'\\''")
-	cmd := fmt.Sprintf("echo '%s' > %s && chmod 644 %s", escapedContent, filePath, filePath)
-	_, err := DebugNodeWithChroot(oc, nodeName, "sh", "-c", cmd)
+	// Use heredoc with quoted delimiter to prevent variable expansion
+	cmd := fmt.Sprintf("cat <<'EOF' > %s && chmod 644 %s\n%s\nEOF", filePath, filePath, content)
+	_, err := DebugNodeWithChroot(oc, nodeName, "sh", "-c", cmd)
 	return err
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/extended/node/node_utils.go` around lines 177 - 184, The
createDropInFile function's current echo-based escaping is insufficient for
content containing $, backticks, or backslashes; change it to write the content
via a single-quoted heredoc (e.g., use cat <<'EOF' > filePath ... EOF) or by
base64-encoding content and decoding on the node, so the shell won't interpret
special characters. Update the command built for DebugNodeWithChroot to use the
heredoc or base64 decode approach, ensure file permissions are set afterwards
(chmod 644), and keep references to createDropInFile and DebugNodeWithChroot so
reviewers can find and verify the change.

643-658: ensureDropInDirectoryExists returns nil even when directory creation fails.

The function name suggests it guarantees the directory exists, but it silently continues on failure. Callers relying on this guarantee may encounter unexpected errors later.

Suggested fix
 func ensureDropInDirectoryExists(ctx context.Context, oc *exutil.CLI, dirPath string) error {
 	nodes, err := getNodesByLabel(ctx, oc, "node-role.kubernetes.io/worker")
 	if err != nil {
 		return fmt.Errorf("failed to get worker nodes: %w", err)
 	}

+	var lastErr error
 	for _, node := range nodes {
 		_, err := DebugNodeWithChroot(oc, node.Name, "mkdir", "-p", dirPath)
 		if err != nil {
 			framework.Logf("Warning: failed to create directory on node %s: %v", node.Name, err)
+			lastErr = err
 		}
 	}

-	return nil
+	return lastErr
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/extended/node/node_utils.go` around lines 643 - 658, The
ensureDropInDirectoryExists function currently logs mkdir failures and returns
nil, violating its "ensure" contract; update it to collect and return an error
if any DebugNodeWithChroot call fails (e.g., aggregate errors or return the
first error) so callers know the directory creation did not complete on all
nodes; modify ensureDropInDirectoryExists to use the existing getNodesByLabel
and DebugNodeWithChroot calls but when err != nil record that error (including
node.Name and the original err) and after the loop return a combined error (or
the first non-nil) instead of nil.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@test/extended/node/node_swap_cnv.go`:
- Around line 1110-1115: The error handlers that currently log a warning and
"continue" when dd/mkswap/swapon fail (inside the swap-size setup blocks in
node_swap_cnv.go) allow TC13 to pass despite setup failure; change these
handlers so the test run fails immediately: when err != nil after the swap
commands, set result.success = false, append result to results, and then stop
the test flow (e.g., return the error or break out to the caller so the testcase
is marked failed) instead of continuing; apply the same fix to the three
identical blocks that handle dd/mkswap/swapon failures (the places updating the
result variable and appending to results).
- Around line 629-635: The check that inspects node labels (using
oc.AsAdmin().WithoutNamespace().Run("get").Args("node", nonCNVWorkerNode, "-o",
"jsonpath={.metadata.labels}").Output()) only logs a warning when the
kubevirt.io/schedulable label is present; change this to fail the test instead
so the non-CNV precondition is enforced—replace the warning branch (where
strings.Contains(output, "kubevirt.io/schedulable")) with a call that aborts the
test (e.g., framework.Failf or equivalent test-fail helper) with a clear message
referencing nonCNVWorkerNode and the label output; keep the else branch’s
confirmed log as-is.
- Around line 485-491: The test collects fileOwnership via DebugNodeWithChroot
into the local variable fileOwnership but never asserts it; add an expectation
asserting the ownership equals the intended value (e.g. "root:root" or the
appropriate owner for this test) by using the test framework's matcher
(reference the fileOwnership variable and the DebugNodeWithChroot call), so
replace the silent read with an assertion like
o.Expect(fileOwnership).To(Equal("<expected-owner>")) to fail the test on
ownership regressions.

---

Duplicate comments:
In `@test/extended/node/node_swap_cnv.go`:
- Around line 901-907: The test disables host swap with
DebugNodeWithNsenter("swapoff", "-a") when initialHasSwap is true but never
restores it; update the test to record initialHasSwap usage and, in the test
teardown/cleanup (or immediately after test steps), call
DebugNodeWithNsenter(oc, cnvWorkerNode, "swapon", "-a") when initialHasSwap is
true (and log/handle errors via framework.Logf or process logger) so the node's
original swap state is restored; reference the existing DebugNodeWithNsenter
call, the initialHasSwap variable, and the g.By("Disabling existing OS-level
swap for test") block to locate where to add the corresponding re-enable logic.
- Around line 341-380: The test can leave created drop-ins/directories on
control-plane nodes if an expectation fails before the explicit cleanup; after
creating the drop-in directory and file (calls to DebugNodeWithChroot(...,
"mkdir", "-p", cnvDropInDir) and createDropInFile(..., cnvDropInFilePath)),
ensure cleanup always runs by introducing a per-iteration deferred cleanup (use
an inner anonymous func so defer runs each loop) that calls removeDropInFile(oc,
cpNodeName, cnvDropInFilePath) and DebugNodeWithChroot(oc, cpNodeName, "rmdir",
cnvDropInDir); place the defer immediately after successful creation to
guarantee removal even on early failures.
- Around line 1324-1368: The test currently treats failures to find the
Prometheus route, get the bearer token, and execute Prometheus queries
(variables/prometheusRoute, token from
oc.AsAdmin().WithoutNamespace().Run("whoami"), and the exec curl commands
producing swapTotalResult/swapFreeResult) as warnings, allowing TC14 to pass
incorrectly; change these warning-only branches to fail the test instead (use
framework.Failf or framework.ExpectNoError as appropriate) so that if
prometheusRoute is empty or err != nil, token retrieval returns an error, or
either Prometheus query exec returns an error, the test aborts with a clear
failure message; apply the same change pattern to the other similar blocks
referenced (lines handling swapTotalResult/swapFreeResult and the other
occurrences noted).
- Line 41: The test's Describe registration string (the var _ = g.Describe(...)
line) is missing the CNV suite tag so the openshift/nodes/cnv selector won't
pick it; update the Describe tags to include the CNV suite (e.g., add
"[Suite:openshift/nodes/cnv]" alongside the existing
"[Suite:openshift/disruptive-longrunning]") in the var _ = g.Describe(...)
declaration so the test is included by the CNV suite selector.
- Around line 75-79: The global pre-creation of the drop-in directory
(cnvDropInDir) via ensureDropInDirectoryExists in the global setup is
invalidating per-test creation/recreation checks; remove or guard that call so
the directory is not created globally. Update the global setup in
node_swap_cnv.go to stop calling ensureDropInDirectoryExists (or change that
helper to only check existence without creating), and ensure individual tests
(the TC1/TC2/TC6 test cases) explicitly create or verify cnvDropInDir as part of
their own setup/assertions so they exercise CNV/MCO behavior correctly. Ensure
references to ensureDropInDirectoryExists and cnvDropInDir are updated
accordingly.

---

Nitpick comments:
In `@test/extended/node/node_utils.go`:
- Around line 563-641: uninstallCNVOperator currently swallows all errors and
always returns nil; change it to collect and return failures from critical steps
(deleting HyperConverged CR, Subscription, CSVs, OperatorGroup, namespace
deletion/wait, unlabelWorkerNodesForCNV, and waitForMCPRolloutComplete) so
callers can detect partial failures. Modify uninstallCNVOperator to accumulate
errors (e.g., []error or hashicorp/go-multierror) when
dynamicClient.Resource(...).Delete calls or wait.PollUntilContextTimeout loops
return non-notFound errors or timeouts, and return a consolidated error (or the
multierror) at the end instead of nil; ensure you still log warnings with
framework.Logf but propagate the underlying error from failures in
HyperConverged deletion, namespace deletion/wait, and waitForMCPRolloutComplete
so tests can react appropriately.
- Around line 228-236: The cleanup function currently discards errors from
removeDropInFile and restartKubeletOnNode; update error handling so failures are
logged and propagated: change removeDropInFile and restartKubeletOnNode to
return error (if they don't already), call them from
cleanupDropInAndRestartKubelet and capture their returned errors, emit
framework.Logf or framework.Errorf warnings that include the nodeName, filePath
and the error value, and if either fails consider returning the error from
cleanupDropInAndRestartKubelet (or at minimum skip waitForNodeToBeReady when
restartKubeletOnNode failed) so callers can react. Ensure you reference the
functions cleanupDropInAndRestartKubelet, removeDropInFile, restartKubeletOnNode
and waitForNodeToBeReady when making changes.
- Around lines 177-184: The createDropInFile function's current echo-based
escaping is insufficient for content containing $, backticks, or backslashes;
change it to write the content via a single-quoted heredoc (e.g., use cat
<<'EOF' > filePath ... EOF) or by base64-encoding content and decoding on the
node, so the shell won't interpret special characters. Update the command built
for DebugNodeWithChroot to use the heredoc or base64 decode approach, ensure
file permissions are set afterwards (chmod 644), and keep references to
createDropInFile and DebugNodeWithChroot so reviewers can find and verify the
change.
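A sketch of the base64 variant: the content is encoded client-side and decoded on the node, so `$`, backticks, and backslashes never reach shell interpretation (the command string here is an assumption about what would be handed to `DebugNodeWithChroot`):

```go
package main

import (
	"encoding/base64"
	"fmt"
)

// buildWriteCommand builds the shell snippet to run on the node.
// Base64-encoding the config avoids shell interpretation of special
// characters in the kubelet drop-in content, and chmod 644 enforces the
// expected permissions afterwards.
func buildWriteCommand(filePath, content string) string {
	encoded := base64.StdEncoding.EncodeToString([]byte(content))
	return fmt.Sprintf("echo %s | base64 -d > %s && chmod 644 %s",
		encoded, filePath, filePath)
}

func main() {
	cmd := buildWriteCommand("/etc/openshift/kubelet.conf.d/99-swap.conf",
		"memorySwap:\n  swapBehavior: LimitedSwap\n")
	fmt.Println(cmd)
}
```

The base64 alphabet contains no shell metacharacters, so the encoded payload is safe to embed unquoted; a single-quoted heredoc (`cat <<'EOF'`) would work equally well but is harder to pass through exec-style argument lists.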
- Around lines 643-658: The ensureDropInDirectoryExists function currently logs
mkdir failures and returns nil, violating its "ensure" contract; update it to
collect and return an error if any DebugNodeWithChroot call fails (e.g.,
aggregate errors or return the first error) so callers know the directory
creation did not complete on all nodes; modify ensureDropInDirectoryExists to
use the existing getNodesByLabel and DebugNodeWithChroot calls but when err !=
nil record that error (including node.Name and the original err) and after the
loop return a combined error (or the first non-nil) instead of nil.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: ceac0a87-6391-4da1-9ff1-38ae17cad17d

📥 Commits

Reviewing files that changed from the base of the PR and between 8a3bfd6 and d37aed4.

📒 Files selected for processing (11)
  • pkg/testsuites/standard_suites.go
  • test/extended/node/node_swap_cnv.go
  • test/extended/node/node_utils.go
  • test/extended/testdata/bindata.go
  • test/extended/testdata/node/cnv-swap/README.md
  • test/extended/testdata/node/cnv-swap/cnv-hyperconverged.yaml
  • test/extended/testdata/node/cnv-swap/cnv-namespace.yaml
  • test/extended/testdata/node/cnv-swap/cnv-operatorgroup.yaml
  • test/extended/testdata/node/cnv-swap/cnv-subscription.yaml
  • test/extended/testdata/node/cnv-swap/kubelet-limitedswap-dropin.yaml
  • test/extended/testdata/node/cnv-swap/kubelet-noswap-dropin.yaml
🚧 Files skipped from review as they are similar to previous changes (3)
  • test/extended/testdata/node/cnv-swap/cnv-namespace.yaml
  • test/extended/testdata/node/cnv-swap/cnv-hyperconverged.yaml
  • test/extended/testdata/node/cnv-swap/kubelet-limitedswap-dropin.yaml

Comment on lines +485 to +491
g.By("Verifying config file ownership")
framework.Logf("Running: stat -c %%U:%%G %s", testFile)
output, err = DebugNodeWithChroot(oc, cnvWorkerNode, "stat", "-c", "%U:%G", testFile)
o.Expect(err).NotTo(o.HaveOccurred())
fileOwnership := strings.TrimSpace(output)
framework.Logf("File ownership: %s", fileOwnership)


⚠️ Potential issue | 🟠 Major

TC7 reads file ownership but never asserts it.

This allows ownership regressions to pass silently.

Proposed fix
 		fileOwnership := strings.TrimSpace(output)
 		framework.Logf("File ownership: %s", fileOwnership)
+		o.Expect(fileOwnership).To(o.Equal("root:root"),
+			"Drop-in file ownership should be root:root")

As per coding guidelines, "Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/extended/node/node_swap_cnv.go` around lines 485 - 491, The test
collects fileOwnership via DebugNodeWithChroot into the local variable
fileOwnership but never asserts it; add an expectation asserting the ownership
equals the intended value (e.g. "root:root" or the appropriate owner for this
test) by using the test framework's matcher (reference the fileOwnership
variable and the DebugNodeWithChroot call), so replace the silent read with an
assertion like o.Expect(fileOwnership).To(Equal("<expected-owner>")) to fail the
test on ownership regressions.

Comment on lines +629 to +635
framework.Logf("Running: oc get node %s --show-labels | grep kubevirt", nonCNVWorkerNode)
output, _ := oc.AsAdmin().WithoutNamespace().Run("get").Args("node", nonCNVWorkerNode, "-o", "jsonpath={.metadata.labels}").Output()
if strings.Contains(output, "kubevirt.io/schedulable") {
framework.Logf("Warning: Node still has CNV label in labels: %s", output)
} else {
framework.Logf("Confirmed: Node %s no longer has kubevirt.io/schedulable label", nonCNVWorkerNode)
}

⚠️ Potential issue | 🟠 Major

TC9 does not enforce the non-CNV precondition.

If the label is still present, the test only logs a warning and continues against a CNV node.

Proposed fix
-		output, _ := oc.AsAdmin().WithoutNamespace().Run("get").Args("node", nonCNVWorkerNode, "-o", "jsonpath={.metadata.labels}").Output()
-		if strings.Contains(output, "kubevirt.io/schedulable") {
-			framework.Logf("Warning: Node still has CNV label in labels: %s", output)
-		} else {
-			framework.Logf("Confirmed: Node %s no longer has kubevirt.io/schedulable label", nonCNVWorkerNode)
-		}
+		output, err := oc.AsAdmin().WithoutNamespace().Run("get").Args("node", nonCNVWorkerNode, "-o", "jsonpath={.metadata.labels}").Output()
+		o.Expect(err).NotTo(o.HaveOccurred())
+		o.Expect(output).NotTo(o.ContainSubstring("kubevirt.io/schedulable"),
+			"Node must not have CNV schedulable label for this testcase")

As per coding guidelines, "Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/extended/node/node_swap_cnv.go` around lines 629 - 635, The check that
inspects node labels (using
oc.AsAdmin().WithoutNamespace().Run("get").Args("node", nonCNVWorkerNode, "-o",
"jsonpath={.metadata.labels}").Output()) only logs a warning when the
kubevirt.io/schedulable label is present; change this to fail the test instead
so the non-CNV precondition is enforced—replace the warning branch (where
strings.Contains(output, "kubevirt.io/schedulable")) with a call that aborts the
test (e.g., framework.Failf or equivalent test-fail helper) with a clear message
referencing nonCNVWorkerNode and the label output; keep the else branch’s
confirmed log as-is.

Comment on lines +1110 to +1115
if err != nil {
framework.Logf("Warning: Failed to create swap file: %v", err)
result.success = false
results = append(results, result)
continue
}

⚠️ Potential issue | 🟠 Major

TC13 can pass even when swap-size setup fails.

dd/mkswap/swapon failures are downgraded to warnings followed by a continue, so a scenario may go untested without failing the test case.

Proposed fix
+		var setupFailures []string
@@
 			if err != nil {
-				framework.Logf("Warning: Failed to create swap file: %v", err)
+				setupFailures = append(setupFailures, fmt.Sprintf("%s: dd failed: %v", swapSize.name, err))
 				result.success = false
 				results = append(results, result)
 				continue
 			}
@@
 			if err != nil {
-				framework.Logf("Warning: Failed to mkswap: %v", err)
+				setupFailures = append(setupFailures, fmt.Sprintf("%s: mkswap failed: %v", swapSize.name, err))
 				result.success = false
 				results = append(results, result)
 				continue
 			}
@@
 			if err != nil {
-				framework.Logf("Warning: Failed to enable swap: %v", err)
+				setupFailures = append(setupFailures, fmt.Sprintf("%s: swapon failed: %v", swapSize.name, err))
 				result.success = false
 				results = append(results, result)
 				continue
 			}
@@
+		o.Expect(setupFailures).To(o.BeEmpty(), "All swap-size scenarios must be exercised successfully")

As per coding guidelines, "Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity."

Also applies to: 1122-1127, 1131-1136

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/extended/node/node_swap_cnv.go` around lines 1110 - 1115, The error
handlers that currently log a warning and "continue" when dd/mkswap/swapon fail
(inside the swap-size setup blocks in node_swap_cnv.go) allow TC13 to pass
despite setup failure; change these handlers so the test run fails immediately:
when err != nil after the swap commands, set result.success = false, append
result to results, and then stop the test flow (e.g., return the error or break
out to the caller so the testcase is marked failed) instead of continuing; apply
the same fix to the three identical blocks that handle dd/mkswap/swapon failures
(the places updating the result variable and appending to results).

@openshift-ci-robot

Scheduling required tests:
/test e2e-aws-csi
/test e2e-aws-ovn-fips
/test e2e-aws-ovn-microshift
/test e2e-aws-ovn-microshift-serial
/test e2e-aws-ovn-serial-1of2
/test e2e-aws-ovn-serial-2of2
/test e2e-gcp-csi
/test e2e-gcp-ovn
/test e2e-gcp-ovn-upgrade
/test e2e-metal-ipi-ovn-ipv6
/test e2e-vsphere-ovn
/test e2e-vsphere-ovn-upi

@BhargaviGudi
Contributor Author

/test e2e-gcp-csi
/test e2e-aws-ovn-fips

@cpmeadors
Contributor

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Mar 20, 2026
@openshift-ci
Contributor

openshift-ci bot commented Mar 20, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: BhargaviGudi, cpmeadors, smg247

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@BhargaviGudi
Contributor Author

/test e2e-metal-ipi-ovn-ipv6

@BhargaviGudi
Contributor Author

/verified by CI-Job

@openshift-ci-robot openshift-ci-robot added the verified Signifies that the PR passed pre-merge verification criteria label Mar 23, 2026
@openshift-ci-robot

@BhargaviGudi: This PR has been marked as verified by CI-Job.

Details

In response to this:

/verified by CI-Job

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@BhargaviGudi
Contributor Author

/unhold

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Mar 24, 2026
@openshift-ci
Contributor

openshift-ci bot commented Mar 24, 2026

@BhargaviGudi: all tests passed!

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@openshift-merge-bot openshift-merge-bot bot merged commit 02f8514 into openshift:main Mar 24, 2026
21 checks passed

Labels

acknowledge-critical-fixes-only Indicates if the issuer of the label is OK with the policy. approved Indicates a PR has been approved by an approver from all required OWNERS files. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lgtm Indicates that a PR is ready to be merged. verified Signifies that the PR passed pre-merge verification criteria
