OCPNODE-4381: Migrate OCP-38271 from openshift-tests-private#30960
OCPNODE-4381: Migrate OCP-38271 from openshift-tests-private#30960BhargaviGudi wants to merge 1 commit intoopenshift:mainfrom
Conversation
|
Pipeline controller notification For optional jobs, comment This repository is configured in: automatic mode |
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
WalkthroughReplaced immediate MicroShift detection with a retrying check that fails after retries; added a new Ginkgo suite and an OCP-38271 test that creates a Pod with an init container, removes the init container via CRI on the node, and verifies the init container does not restart. Changes
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
@BhargaviGudi: This pull request references OCPNODE-4381 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
cba718d to
b05cf8a
Compare
|
/test verify |
f9f4ebe to
44944db
Compare
|
@BhargaviGudi: This pull request references OCPNODE-4381 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
/verified by @BhargaviGudi |
|
@BhargaviGudi: This PR has been marked as verified by DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
@BhargaviGudi: This pull request references OCPNODE-4381 which is a valid jira issue. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
44944db to
3c8a2c2
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (2)
test/extended/node/node_e2e/node.go (2)
151-151: Use absolute path/bin/shinstead of relativebin/sh.The command uses a relative path
bin/shwhich only works because the container's default working directory is/. Using the absolute path/bin/shis the standard convention and avoids potential issues ifWorkingDiris ever specified.♻️ Suggested fix
- Command: []string{"bin/sh", "-ec", "echo running >> /mnt/data/test"}, + Command: []string{"/bin/sh", "-ec", "echo running >> /mnt/data/test"},- Command: []string{"bin/sh", "-c", "sleep 3600"}, + Command: []string{"/bin/sh", "-c", "sleep 3600"},Also applies to: 164-164
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/extended/node/node_e2e/node.go` at line 151, Replace the relative shell path in the container Command arrays with the absolute path; specifically update the Command entry in the container spec(s) in node.go (the Command: []string{"bin/sh", "-ec", ...} occurrences) to use "/bin/sh" instead of "bin/sh" (apply the same change to the other occurrence mentioned).
116-116: Remove unnecessarydefer g.GinkgoRecover().
GinkgoRecover()is designed for recovering panics in goroutines spawned within tests. At the Describe level, this defer executes when the Describe function returns (before any tests run), serving no purpose. Ginkgo already has built-in panic recovery for test blocks.♻️ Suggested fix
var _ = g.Describe("[sig-node] [Jira:Node/Kubelet] NODE initContainer policy,volume,readines,quota", func() { - defer g.GinkgoRecover() - var (🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/extended/node/node_e2e/node.go` at line 116, Remove the unnecessary defer g.GinkgoRecover() call from the Describe-level scope: the defer is only useful for recovering panics in goroutines spawned inside test blocks, and at the Describe level it runs too early and is redundant since Ginkgo already handles test panic recovery. Locate the defer g.GinkgoRecover() invocation (inside the Describe/registration setup in node.go) and delete that single line so no external panic recovery is deferred at describe initialization.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@test/extended/node/node_e2e/node.go`:
- Around line 241-258: The test currently returns success as soon as
RestartCount==0, which allows a race; change the wait.Poll predicate so it does
NOT return success when RestartCount==0 but instead keeps polling for the full
duration: inside the func passed to wait.Poll iterate
pod.Status.InitContainerStatuses for the "inittest" entry and if RestartCount>0
immediately return true, fmt.Errorf("init container restarted"); otherwise
return false, nil so polling continues; after wait.Poll, assert that the error
is a timeout (wait.ErrWaitTimeout) and fail if err is nil or a different error
(i.e. a restart was detected) — update the expectation that currently uses
o.Expect(err).NotTo(o.HaveOccurred()) accordingly.
---
Nitpick comments:
In `@test/extended/node/node_e2e/node.go`:
- Line 151: Replace the relative shell path in the container Command arrays with
the absolute path; specifically update the Command entry in the container
spec(s) in node.go (the Command: []string{"bin/sh", "-ec", ...} occurrences) to
use "/bin/sh" instead of "bin/sh" (apply the same change to the other occurrence
mentioned).
- Line 116: Remove the unnecessary defer g.GinkgoRecover() call from the
Describe-level scope: the defer is only useful for recovering panics in
goroutines spawned inside test blocks, and at the Describe level it runs too
early and is redundant since Ginkgo already handles test panic recovery. Locate
the defer g.GinkgoRecover() invocation (inside the Describe/registration setup
in node.go) and delete that single line so no external panic recovery is
deferred at describe initialization.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 9d5d9835-9727-4042-b00a-649da2b71dd8
📒 Files selected for processing (1)
test/extended/node/node_e2e/node.go
|
Scheduling required tests: |
|
Risk analysis has seen new tests most likely introduced by this PR. New Test Risks for sha: 3c8a2c2
New tests seen in this PR at sha: 3c8a2c2
|
|
/jira refresh |
|
@cpmeadors: This pull request references OCPNODE-4381 which is a valid jira issue. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
/retest-required |
3c8a2c2 to
8817055
Compare
|
Scheduling required tests: |
|
@BhargaviGudi: This pull request references OCPNODE-4381 which is a valid jira issue. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
Ran testcase manually. |
|
@BhargaviGudi: This PR has been marked as verified by DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
8817055 to
0fcaf4b
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (2)
test/extended/node/node_e2e/node.go (2)
170-171:defer g.GinkgoRecover()atDescribelevel is unnecessary.
GinkgoRecoveris designed to catch panics in goroutines spawned within tests. At theDescribeblock level, Ginkgo already provides built-in panic recovery. This defer statement has no practical effect here and can be removed.♻️ Suggested removal
var _ = g.Describe("[sig-node] [Jira:Node/Kubelet] NODE initContainer policy,volume,readiness,quota", func() { - defer g.GinkgoRecover() - var ( oc = exutil.NewCLI("node-initcontainer") )🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/extended/node/node_e2e/node.go` around lines 170 - 171, Remove the unnecessary panic-recovery defer at the Describe block: delete the defer g.GinkgoRecover() call that immediately follows the g.Describe("[sig-node] ...", func() { declaration; keep the Describe block and its body intact (the symbol to change is the g.Describe(...) anonymous func containing defer g.GinkgoRecover()) so Ginkgo’s built-in panic recovery handles panics instead of the redundant defer.
31-53: Outer retry may be unnecessary givenIsMicroShiftClusteralready retries internally.Per the implementation in
test/extended/util/framework.go,IsMicroShiftClusteralready contains an internalwait.PollUntilContextTimeoutthat retries for up to 5 minutes and always returns(bool, nil)— it does not propagate errors. The outer 30-second retry loop here will typically complete on the first iteration.This defensive approach doesn't cause functional issues, but you may consider simplifying if you're confident in the internal retry behavior.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/extended/node/node_e2e/node.go` around lines 31 - 53, The outer wait.Poll retry around exutil.IsMicroShiftCluster in the g.BeforeEach is redundant because IsMicroShiftCluster already retries internally; remove the wait.Poll block, call exutil.IsMicroShiftCluster(oc.AdminKubeClient()) directly to set isMicroShift, drop pollErr and the extra err handling/logging tied to that poll, and then keep the existing check that calls g.Skip("Skipping test on MicroShift cluster...") when isMicroShift is true (adjust handling of the returned error from IsMicroShiftCluster according to its signature in test/extended/util/framework.go).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@test/extended/node/node_e2e/node.go`:
- Line 222: Replace the relative shell path used in the container Command fields
from "bin/sh" to the absolute "/bin/sh" to ensure reliable execution; update the
Command entries that currently read Command: []string{"bin/sh", "-ec", "echo
running >> /mnt/data/test"} (and the similar occurrence later in the file) so
they use "/bin/sh" instead.
---
Nitpick comments:
In `@test/extended/node/node_e2e/node.go`:
- Around line 170-171: Remove the unnecessary panic-recovery defer at the
Describe block: delete the defer g.GinkgoRecover() call that immediately follows
the g.Describe("[sig-node] ...", func() { declaration; keep the Describe block
and its body intact (the symbol to change is the g.Describe(...) anonymous func
containing defer g.GinkgoRecover()) so Ginkgo’s built-in panic recovery handles
panics instead of the redundant defer.
- Around line 31-53: The outer wait.Poll retry around exutil.IsMicroShiftCluster
in the g.BeforeEach is redundant because IsMicroShiftCluster already retries
internally; remove the wait.Poll block, call
exutil.IsMicroShiftCluster(oc.AdminKubeClient()) directly to set isMicroShift,
drop pollErr and the extra err handling/logging tied to that poll, and then keep
the existing check that calls g.Skip("Skipping test on MicroShift cluster...")
when isMicroShift is true (adjust handling of the returned error from
IsMicroShiftCluster according to its signature in
test/extended/util/framework.go).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 870064cf-6048-4013-8e0e-0ff27e627c94
📒 Files selected for processing (1)
test/extended/node/node_e2e/node.go
|
/lgtm |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: BhargaviGudi, cpmeadors The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
0fcaf4b to
7c17571
Compare
|
New changes are detected. LGTM label has been removed. |
|
Scheduling required tests: |
|
Risk analysis has seen new tests most likely introduced by this PR. New Test Risks for sha: 7c17571
New tests seen in this PR at sha: 7c17571
|
|
/test e2e-aws-ovn-fips |
7c17571 to
ece1818
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
test/extended/node/node_e2e/node.go (1)
306-308: Avoidbash -cforcrictl rmLine 307 interpolates
actualContainerIDinto a shell command even thoughnodeutils.ExecOnNodeWithChrootalready accepts argv. Passing the arguments directly is safer and avoids quoting issues in this privileged path.Suggested change
- deleteCmd := fmt.Sprintf("crictl rm %s", actualContainerID) - output, err := nodeutils.ExecOnNodeWithChroot(oc, nodeName, "/bin/bash", "-c", deleteCmd) + output, err := nodeutils.ExecOnNodeWithChroot(oc, nodeName, "crictl", "rm", actualContainerID)
sprintfthen drops out as well.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/extended/node/node_e2e/node.go` around lines 306 - 308, The test currently builds a shell string via fmt.Sprintf (deleteCmd) and invokes nodeutils.ExecOnNodeWithChroot("/bin/bash", "-c", deleteCmd), which risks quoting issues and is unnecessary because ExecOnNodeWithChroot accepts argv; change the call to pass crictl and the container ID as separate arguments (e.g., nodeutils.ExecOnNodeWithChroot(oc, nodeName, "crictl", "rm", actualContainerID)), remove the fmt.Sprintf/deleteCmd variable and the "/bin/bash", "-c" wrapper, and handle the returned output and err the same way.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@test/extended/node/node_e2e/node.go`:
- Around line 35-48: The outer retry using wait.Poll around
exutil.IsMicroShiftCluster is ineffective because IsMicroShiftCluster already
polls internally and suppresses timeouts; remove the outer wait.Poll and call
exutil.IsMicroShiftCluster(oc.AdminKubeClient()) directly, then check its
returned err and isMicroShift to decide failure (use pollErr variable logic but
replace it with the direct call result), or alternatively modify
exutil.IsMicroShiftCluster to return an error on timeout so the outer wait.Poll
can meaningfully retry; apply the same change to the duplicate occurrence that
references the same functions/variables (exutil.IsMicroShiftCluster, wait.Poll,
isMicroShift, pollErr).
---
Nitpick comments:
In `@test/extended/node/node_e2e/node.go`:
- Around line 306-308: The test currently builds a shell string via fmt.Sprintf
(deleteCmd) and invokes nodeutils.ExecOnNodeWithChroot("/bin/bash", "-c",
deleteCmd), which risks quoting issues and is unnecessary because
ExecOnNodeWithChroot accepts argv; change the call to pass crictl and the
container ID as separate arguments (e.g., nodeutils.ExecOnNodeWithChroot(oc,
nodeName, "crictl", "rm", actualContainerID)), remove the fmt.Sprintf/deleteCmd
variable and the "/bin/bash", "-c" wrapper, and handle the returned output and
err the same way.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 7dbd6b05-b92d-4a2c-9598-dfb720e118b7
📒 Files selected for processing (1)
test/extended/node/node_e2e/node.go
| // Retry check for robustness - OpenShift should eventually respond | ||
| pollErr := wait.Poll(2*time.Second, 30*time.Second, func() (bool, error) { | ||
| isMicroShift, err = exutil.IsMicroShiftCluster(oc.AdminKubeClient()) | ||
| if err != nil { | ||
| e2e.Logf("Failed to check if cluster is MicroShift: %v, retrying...", err) | ||
| return false, nil | ||
| } | ||
| return true, nil | ||
| }) | ||
|
|
||
| if pollErr != nil { | ||
| e2e.Logf("Setup failed: unable to determine if cluster is MicroShift after retries: %v", err) | ||
| g.Fail("Setup failed: unable to determine cluster type - this is an infrastructure/connectivity issue, not a test failure") | ||
| } |
There was a problem hiding this comment.
This MicroShift retry wrapper never actually retries or fails setup
Line 37 / Line 184 call exutil.IsMicroShiftCluster, but that helper already does its own 5-minute poll and always returns err == nil even on timeout (test/extended/util/framework.go:2327-2358). That means this outer 30-second wait.Poll can still block for ~5 minutes per call, then return success with isMicroShift == false, so Line 45 / Line 192's setup-failure path is effectively dead.
Either call the helper once, or change the helper to surface API failures and own the retry there; in its current form this adds long setup stalls without giving the clearer setup-vs-test failure behavior the PR is trying to introduce.
Also applies to: 182-195
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@test/extended/node/node_e2e/node.go` around lines 35 - 48, The outer retry
using wait.Poll around exutil.IsMicroShiftCluster is ineffective because
IsMicroShiftCluster already polls internally and suppresses timeouts; remove the
outer wait.Poll and call exutil.IsMicroShiftCluster(oc.AdminKubeClient())
directly, then check its returned err and isMicroShift to decide failure (use
pollErr variable logic but replace it with the direct call result), or
alternatively modify exutil.IsMicroShiftCluster to return an error on timeout so
the outer wait.Poll can meaningfully retry; apply the same change to the
duplicate occurrence that references the same functions/variables
(exutil.IsMicroShiftCluster, wait.Poll, isMicroShift, pollErr).
|
Scheduling required tests: |
Add test to verify init containers do not restart when removed from node. - Add pod-initContainer.yaml template - Add helper functions in node_utils.go - Add OCP-38271 test in node_e2e/node.go Author: minmli@redhat.com (original) Migrated-by: bgudi@redhat.com Move OCP-38271 test to separate Describe block Refactor OCP-38271 to use standard origin patterns instead of compat_otp Fix race condition and Add retry logic and explicit failure for MicroShift cluster check Resolved typo issue Resolved gofmt issue Missing leading slash in shell command path Use direct crictl args instead of bash -c
ece1818 to
f68b696
Compare
|
Scheduling required tests: |
|
@BhargaviGudi: The following tests failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
|
Risk analysis has seen new tests most likely introduced by this PR. New Test Risks for sha: f68b696
New tests seen in this PR at sha: f68b696
|
Summary Adds test to verify init containers do not restart when the exited init container is removed from the node.
Changes
test/extended/testdata/node/node_e2e/pod-initContainer.yaml- Pod template with init containertest/extended/node/node_utils.go:PodInitConDescriptionstruct with methods:Create(),Delete(),ContainerExit(),DeleteInitContainer(),InitContainerNotRestart()PodStatus()function to check pod readinesstest/extended/node/node_e2e/node.go:[OTP] Init containers should not restart when the exited init container is removed from node [OCP-38271]Original author: minmli@redhat.com
Migrated by: bgudi@redhat.com
tested manually on 4.22.0-0.nightly-2026-04-06-051707