
HYPERFLEET-625: fix integration test job hanging on test failure #74635

Merged
openshift-merge-bot[bot] merged 1 commit into openshift:main from rafabene:HYPERFLEET-625 on Feb 10, 2026

@rafabene (Contributor) commented Feb 9, 2026

Summary

  • Replace the linear podman cleanup with a `trap ... EXIT` handler in the hyperfleet-api integration test job
  • Ensure the podman system service background process is always killed when the script exits, even if `make test-integration` hangs because of a test panic

Problem

When a Go integration test panics, the test process can hang during teardown (for example, while waiting for testcontainer cleanup or HTTP server shutdown). `make test-integration` then never exits, so the script never reaches the `kill $PODMAN_PID` cleanup line. The podman system service keeps running indefinitely, which leaves the Prow pod in the "pending" state and means the failure is never reported back to GitHub.

Fix

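Add a bash `trap ... EXIT` handler immediately after starting the podman service. Bash runs the EXIT trap whenever the shell exits: on a normal exit, when `set -e` aborts the script after a failing command, and when the shell is terminated by a catchable signal such as SIGTERM. This makes the podman cleanup independent of whether execution reaches the end of the script. The trap cannot run on SIGKILL, and it fires only once the shell itself exits.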
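The exit paths that matter here can be checked in isolation. This is an illustrative sketch, not code from the PR: two throwaway child shells each install an EXIT trap (with `echo cleanup` standing in for the real kill/wait handler), one finishing normally and one aborting under `set -e` after a failing command.

```bash
#!/usr/bin/env bash
# Illustrative sketch: an EXIT trap runs both when the shell finishes normally
# and when `set -e` aborts it after a failing command.
# `echo cleanup` is a stand-in for the real kill/wait handler.

# Case 1: the script runs to completion.
normal_status=0
normal_out=$(bash -c 'trap "echo cleanup" EXIT; echo work') || normal_status=$?

# Case 2: `false` fails and `set -e` (-e) ends the script before the last echo.
failed_status=0
failed_out=$(bash -ec 'trap "echo cleanup" EXIT; false; echo unreachable') || failed_status=$?

echo "normal run output:";  echo "$normal_out";  echo "exit status: $normal_status"
echo "failing run output:"; echo "$failed_out"; echo "exit status: $failed_status"
```

Both runs print `cleanup`, and the failing run still exits with status 1: bash keeps the pre-trap exit status unless the trap itself calls `exit`.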

Before

```bash
podman system service --time=0 &
PODMAN_PID=$!
sleep 2
# ...
make test-integration

# Cleanup only reached if make exits:
TEST_EXIT_CODE=$?
kill $PODMAN_PID 2>/dev/null || true
wait $PODMAN_PID 2>/dev/null || true
exit $TEST_EXIT_CODE
```

After

```bash
podman system service --time=0 &
PODMAN_PID=$!
trap 'kill $PODMAN_PID 2>/dev/null || true; wait $PODMAN_PID 2>/dev/null || true' EXIT
sleep 2
# ...
make test-integration
# trap EXIT fires automatically, podman service is always cleaned up
```
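The After pattern can be exercised without podman or Go. In this hypothetical sketch, `sleep 300` stands in for `podman system service --time=0`, and a `run_tests` function that returns 3 stands in for a failing `make test-integration`. It checks the two properties the After block depends on: the background process is gone once the script exits, and the script's exit status is still the test command's status even though the explicit `TEST_EXIT_CODE`/`exit` lines were dropped.

```bash
#!/usr/bin/env bash
# Sketch of the "After" pattern with stand-ins (assumptions, not the real job):
#   sleep 300  -> podman system service --time=0
#   run_tests  -> make test-integration (hypothetical; fails with status 3)

job=$(mktemp)
cat > "$job" <<'EOF'
sleep 300 >/dev/null 2>&1 &    # background "service", output detached from our pipe
PODMAN_PID=$!
trap 'kill "$PODMAN_PID" 2>/dev/null || true; wait "$PODMAN_PID" 2>/dev/null || true' EXIT
echo "$PODMAN_PID"             # tell the caller which PID to check afterwards
run_tests() { return 3; }      # hypothetical failing test run
run_tests                      # last command: its status becomes the script's status
EOF

job_status=0
service_pid=$(bash "$job") || job_status=$?
rm -f "$job"

# kill -0 only probes whether the PID still exists; it sends no signal.
if kill -0 "$service_pid" 2>/dev/null; then
  service_state="still running"
else
  service_state="stopped"
fi

echo "job exit status: $job_status"
echo "background service: $service_state"
```

The stand-in service's output is redirected to /dev/null so that the caller's command substitution does not wait on a pipe still held open by the background process.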

Test plan

  • CI job completes normally when all tests pass
  • CI job reports failure (not hang) when a test panics
  • Podman service is properly cleaned up in both cases
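The test-plan items can also be checked locally before relying on CI. This harness is hypothetical: it reuses the same stand-ins (`sleep 300` for the podman service, the `true`/`false` builtins for a passing/failing test run), wraps each run in GNU coreutils `timeout` so that a hang surfaces as exit status 124 instead of blocking, and then confirms the background process is gone. The `check_job` name and the 10-second limit are illustrative choices.

```bash
#!/usr/bin/env bash
# Hypothetical local harness for the test plan. Stand-ins:
#   sleep 300 -> podman system service --time=0
#   "$1"      -> make test-integration (`true` = tests pass, `false` = tests fail)

job=$(mktemp)
cat > "$job" <<'EOF'
sleep 300 >/dev/null 2>&1 &
PODMAN_PID=$!
trap 'kill "$PODMAN_PID" 2>/dev/null || true; wait "$PODMAN_PID" 2>/dev/null || true' EXIT
echo "$PODMAN_PID"
"$1"                           # stand-in for make test-integration
EOF

check_job() {
  local test_cmd=$1 status=0 pid
  # Bound the run: if the job hangs, timeout kills it and returns 124.
  pid=$(timeout 10 bash "$job" "$test_cmd") || status=$?
  if [ "$status" -eq 124 ]; then
    RESULT="hung"
  elif [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null; then
    kill "$pid"                # tidy up the leaked stand-in service
    RESULT="leaked"
  else
    RESULT="exit $status, cleaned up"
  fi
  echo "tests=$test_cmd -> $RESULT"
}

check_job true;  pass_result=$RESULT
check_job false; fail_result=$RESULT
rm -f "$job"
```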

Replace linear podman cleanup with trap EXIT to ensure the podman
system service is always stopped when the script exits. Without
this, if make test-integration hangs (e.g. due to a test panic
leaving the Go process in teardown), the kill $PODMAN_PID line
is never reached and the podman service keeps the pod alive
indefinitely.
@openshift-ci-robot added the jira/valid-reference label Feb 9, 2026 (Indicates that this PR references a valid Jira ticket of any type.)
@openshift-ci-robot (Contributor) commented Feb 9, 2026

@rafabene: This pull request references HYPERFLEET-625 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the bug to target the "4.22.0" version, but no target version was set.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci-robot (Contributor)

[REHEARSALNOTIFIER]
@rafabene: the pj-rehearse plugin accommodates running rehearsal tests for the changes in this PR. Expand 'Interacting with pj-rehearse' for usage details. The following rehearsable tests have been affected by this change:

| Test name | Repo | Type | Reason |
| --- | --- | --- | --- |
| pull-ci-openshift-hyperfleet-hyperfleet-api-main-presubmits-integration | openshift-hyperfleet/hyperfleet-api | presubmit | Ci-operator config changed |

Prior to this PR being merged, you will need to either run and acknowledge or opt to skip these rehearsals.

Interacting with pj-rehearse

Comment: /pj-rehearse to run up to 5 rehearsals
Comment: /pj-rehearse skip to opt-out of rehearsals
Comment: /pj-rehearse {test-name}, with each test separated by a space, to run one or more specific rehearsals
Comment: /pj-rehearse more to run up to 10 rehearsals
Comment: /pj-rehearse max to run up to 25 rehearsals
Comment: /pj-rehearse auto-ack to run up to 5 rehearsals, and add the rehearsals-ack label on success
Comment: /pj-rehearse list to get an up-to-date list of affected jobs
Comment: /pj-rehearse abort to abort all active rehearsals
Comment: /pj-rehearse network-access-allowed to allow rehearsals of tests that have the restrict_network_access field set to false. This must be executed by an openshift org member who is not the PR author.

Once you are satisfied with the results of the rehearsals, comment: /pj-rehearse ack to unblock merge. When the rehearsals-ack label is present on your PR, merge will no longer be blocked by rehearsals.
If you would like the rehearsals-ack label removed, comment: /pj-rehearse reject to re-block merging.

@rh-amarin

/lgtm

@openshift-ci bot (Contributor) commented Feb 9, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: rafabene, rh-amarin

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci bot added the approved label Feb 9, 2026 (Indicates a PR has been approved by an approver from all required OWNERS files.)
@openshift-ci bot added the lgtm label Feb 9, 2026 (Indicates that a PR is ready to be merged.)
@rh-amarin

/retest

@rafabene (Contributor, Author) commented Feb 9, 2026

/pj-rehearse

@openshift-ci-robot (Contributor)

@rafabene: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

@rafabene (Contributor, Author) commented Feb 9, 2026

/retest

@rafabene (Contributor, Author) commented Feb 9, 2026

/retest

@rafabene (Contributor, Author) commented Feb 9, 2026

/test release-controller-config

@rafabene (Contributor, Author) commented Feb 9, 2026

/override release-controller-config

@openshift-ci bot (Contributor) commented Feb 9, 2026

@rafabene: rafabene unauthorized: /override is restricted to Repo administrators, approvers in top level OWNERS file, and the following github teams: openshift: openshift-release-oversight, openshift-staff-engineers, openshift-sustaining-engineers.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@rafabene (Contributor, Author) commented Feb 9, 2026

Waiting for #74642 to be merged to fix the ci/prow/release-controller-config job.

@yingzhanredhat (Contributor)

/test release-controller-config

@rafabene (Contributor, Author)

/pj-rehearse ack

@openshift-ci-robot (Contributor)

@rafabene: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

@openshift-ci-robot added the rehearsals-ack label Feb 10, 2026 (Signifies that rehearsal jobs have been acknowledged.)
@openshift-ci bot (Contributor) commented Feb 10, 2026

@rafabene: all tests passed!


@openshift-merge-bot bot merged commit 24c9e93 into openshift:main Feb 10, 2026
14 checks passed
