OCPBUGS-57348: add MCO operator manifest for boot image management #9783

Merged
openshift-merge-bot[bot] merged 3 commits into openshift:main from
patrickdillon:OCPBUGS-57348-bootimg-mgmt
Jun 19, 2025

@patrickdillon
Contributor

@patrickdillon patrickdillon commented Jun 12, 2025

Adds manifest generation for MCO configuration. Currently the manifest is only generated when compute node custom boot images are specified, in order to disable MCO management of those boot images.

For example, aws install config with compute ami set:

compute:
- architecture: amd64
  hyperthreading: Enabled
  name: worker
  platform:
    aws:
      amiID: ami-08f1807771f4e468b
  replicas: 3

Produces the following manifest:

cat test/manifests/cluster-mco-02-config.yml
apiVersion: operator.openshift.io/v1
kind: MachineConfiguration
metadata:
  name: cluster
spec:
  logLevel: Normal
  operatorLogLevel: Normal
  managedBootImages:
    machineManagers:
      - resource: machinesets
        apiGroup: machine.openshift.io
        selection:
          mode: None
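
As a rough illustration of the rule described above (emit the manifest only when a compute pool pins its own boot image), here is a minimal Go sketch. `MachinePool`, `AMIID`, and `hasCustomBootImage` are hypothetical names chosen for this example and are not the installer's actual types or functions.

```go
package main

import "fmt"

// MachinePool is a hypothetical, trimmed-down stand-in for an install-config
// compute pool; the real installer types carry many more fields.
type MachinePool struct {
	Name  string
	AMIID string // custom AWS boot image; empty means no custom image was set
}

// hasCustomBootImage reports whether any compute pool pins its own AMI.
func hasCustomBootImage(pools []MachinePool) bool {
	for _, p := range pools {
		if p.AMIID != "" {
			return true
		}
	}
	return false
}

func main() {
	pools := []MachinePool{{Name: "worker", AMIID: "ami-08f1807771f4e468b"}}
	if hasCustomBootImage(pools) {
		fmt.Println("generate MachineConfiguration manifest with selection mode None")
	} else {
		fmt.Println("skip the manifest")
	}
}
```

With the `worker` pool from the example install config, the sketch takes the first branch; with no `amiID` set on any pool, no manifest would be generated.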

@openshift-ci-robot openshift-ci-robot added labels Jun 12, 2025:
  • jira/severity-moderate: Referenced Jira bug's severity is moderate for the branch this PR is targeting.
  • jira/valid-reference: Indicates that this PR references a valid Jira ticket of any type.
  • jira/valid-bug: Indicates that a referenced Jira bug is valid for the branch this PR is targeting.
@openshift-ci-robot
Contributor

@patrickdillon: This pull request references Jira Issue OCPBUGS-57348, which is valid. The bug has been moved to the POST state.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.20.0) matches configured target version for branch (4.20.0)
  • bug is in the state ASSIGNED, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact:
/cc @gpei

The bug has been updated to refer to the pull request using the external bug tracker.

Details

In response to this:

Adds manifest generation for MCO configuration. Currently the manifest is only generated when custom boot images are specified, in order to disable MCO management of those boot images.

For example, aws install config with control plane ami set:

controlPlane:
 architecture: amd64
 hyperthreading: Enabled
 name: master
 platform:
   aws:
     amiID: ami-xxxxxxx
 replicas: 3

Produces the following manifest

cat test/manifests/cluster-mco-02-config.yml
apiVersion: operator.openshift.io/v1
kind: MachineConfiguration
metadata:
 creationTimestamp: null
 name: cluster
 namespace: openshift-machine-config-operator
spec:
 forceRedeploymentReason: ""
 managedBootImages:
   machineManagers:
   - apiGroup: machine.openshift.io
     resource: machinesets
     selection:
       mode: Partial
       partial:
         machineResourceSelector:
           matchLabels:
             machine.openshift.io/cluster-api-machine-role: worker
 managementState: ""
 nodeDisruptionPolicy:
   files: null
   sshkey:
     actions: null
   units: null
 observedConfig: null
 unsupportedConfigOverrides: null
status:
 managedBootImagesStatus:
   machineManagers: null
 nodeDisruptionPolicyStatus:
   clusterPolicies:
     sshkey:
       actions: null

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci openshift-ci bot requested review from andfasano, gpei and rwsu June 12, 2025 15:51
@patrickdillon patrickdillon force-pushed the OCPBUGS-57348-bootimg-mgmt branch from 97f0735 to 7f5f88f Compare June 12, 2025 15:54
@patrickdillon patrickdillon force-pushed the OCPBUGS-57348-bootimg-mgmt branch from 7f5f88f to e04874e Compare June 12, 2025 20:10
@patrickdillon patrickdillon force-pushed the OCPBUGS-57348-bootimg-mgmt branch from e04874e to 2ccb6ec Compare June 15, 2025 20:33
@openshift-ci-robot
Contributor

@patrickdillon: This pull request references Jira Issue OCPBUGS-57348, which is valid.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.20.0) matches configured target version for branch (4.20.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact:
/cc @gpei

@patrickdillon
Contributor Author

OK, I have reworked the PR because cluster-bootstrap could not apply the manifest created by the original implementation. Testing confirmed what @djoshy suspected: when Go serializes the MCO configuration struct, it emits many empty values for unset fields, and API server validation fails on them.
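
The failure mode can be reproduced in isolation with a small sketch. The types below are hypothetical stand-ins, not the real operator API structs; they only show that `encoding/json` always writes a non-pointer struct field, while a nil pointer tagged `omitempty` is dropped.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Policy and the two Spec types are hypothetical, loosely shaped after the
// manifest quoted earlier in the thread.
type Policy struct {
	Files []string `json:"files"`
	Units []string `json:"units"`
}

// Spec embeds Policy by value, so its zero value is always serialized.
type Spec struct {
	LogLevel             string `json:"logLevel,omitempty"`
	NodeDisruptionPolicy Policy `json:"nodeDisruptionPolicy"`
}

// SpecWithPointer uses a pointer, so an unset policy is omitted entirely.
type SpecWithPointer struct {
	LogLevel             string  `json:"logLevel,omitempty"`
	NodeDisruptionPolicy *Policy `json:"nodeDisruptionPolicy,omitempty"`
}

// marshal serializes v to a JSON string and panics on error.
func marshal(v any) string {
	b, err := json.Marshal(v)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	// prints {"logLevel":"Normal","nodeDisruptionPolicy":{"files":null,"units":null}}
	fmt.Println(marshal(Spec{LogLevel: "Normal"}))
	// prints {"logLevel":"Normal"}
	fmt.Println(marshal(SpecWithPointer{LogLevel: "Normal"}))
}
```

The first output has the same shape as the `files: null` and `units: null` entries in the originally generated manifest.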

To create the minimally specified manifest (shown in the updated description above), I switched the implementation to a Go template file. I have added this as a fixup commit for now and will squash it once we align on this approach.
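
A template-based approach could look roughly like the following sketch. The template text mirrors the manifest in the PR description, but the `Mode` parameter, `templateData`, and `render` are assumptions made for illustration, not the contents of the installer's actual template file.

```go
package main

import (
	"bytes"
	"fmt"
	"text/template"
)

// manifestTemplate is an illustrative template; only the fields that should
// appear in the manifest are written, so no zero values leak into the output.
const manifestTemplate = `apiVersion: operator.openshift.io/v1
kind: MachineConfiguration
metadata:
  name: cluster
spec:
  logLevel: Normal
  operatorLogLevel: Normal
  managedBootImages:
    machineManagers:
      - resource: machinesets
        apiGroup: machine.openshift.io
        selection:
          mode: {{.Mode}}
`

// templateData is a hypothetical parameter struct for the template.
type templateData struct {
	Mode string
}

// render parses the template and executes it with the given data.
func render(data templateData) (string, error) {
	tmpl, err := template.New("mco-config").Parse(manifestTemplate)
	if err != nil {
		return "", err
	}
	var buf bytes.Buffer
	if err := tmpl.Execute(&buf, data); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	out, err := render(templateData{Mode: "None"})
	if err != nil {
		panic(err)
	}
	fmt.Print(out)
}
```

The trade-off is that a template gives up the type checking that comes with serializing structs, in exchange for full control over which fields are written.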

@patrickdillon
Contributor Author

I have the lint fixes committed locally but will wait to push them until I get more feedback. This is a substantial change from the first iteration, so I wanted to get it up for review. I also still need to rework the MCO tests, which I think I can mostly reuse.

Local testing shows the configuration I would expect in the cluster:

% oc describe machineconfiguration cluster 
Name:         cluster
Namespace:    
Labels:       <none>
Annotations:  <none>
API Version:  operator.openshift.io/v1
Kind:         MachineConfiguration
Metadata:
  Creation Timestamp:  2025-06-15T23:39:28Z
  Generation:          1
  Resource Version:    2513
  UID:                 40bfcf3b-fd93-4066-9006-f9e37cc0808e
Spec:
  Log Level:  Normal
  Managed Boot Images:
    Machine Managers:
      API Group:  machine.openshift.io
      Resource:   machinesets
      Selection:
        Mode:          None
  Operator Log Level:  Normal
Events:                <none>

@djoshy
Contributor

djoshy commented Jun 16, 2025

This approach looks good to me from the MCO POV. Sorry about the operator API weirdness; it has caused us some issues too (we have to do server-side apply in some cases 😓).

Adds manifest generation for MCO configuration.
Currently the manifest is only generated when
custom boot images are specified, in order
to disable MCO management of those boot images.

The manifest generation uses a golang template
because testing revealed that API server validation
rejects the manifests generated by serializing the
golang structs, the approach that would be more
consistent with how we generate manifests for other
openshift operators. Golang populates the zero value
for any non-pointer struct, so those structs appear
in the serialized manifest and trigger validation:
the API server expects certain required fields
inside them. Using a template bypasses this problem.

Fixes OCPBUGS-57348
@patrickdillon patrickdillon force-pushed the OCPBUGS-57348-bootimg-mgmt branch from 2ccb6ec to 8931141 Compare June 16, 2025 18:58
)

const (
mcoConfigTemplateFileName = "cluster-mco-02-config.yaml.template"
Member

Where does the eventual generated manifest's filename sort vs. the installer-generated MachineSet manifests? Do we want a 90_ prefix or something to ensure we sort the MachineConfiguration first, so the MCO knows to leave MachineSets alone before the MachineSets are created in the cluster?

Contributor Author

Good question. The resulting filename is indeed cluster-mco-02-config.yaml and therefore sorts after the 99_ prefixed machinesets. I am running a local install to gather the bootkube logs and see if I can get a sense of when it is created.

My only hesitation is that none of the other cluster operator manifests have a numerical prefix... but I don't find that convincing.
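
To make the ordering concern concrete, this sketch sorts two file names lexically, assuming manifests are processed in byte-wise filename order. `99_example-machineset.yaml` is a made-up name standing in for the 99_ prefixed MachineSet manifests.

```go
package main

import (
	"fmt"
	"sort"
)

// sortedManifests returns a lexically sorted copy of the given file names.
func sortedManifests(names []string) []string {
	out := append([]string(nil), names...)
	sort.Strings(out)
	return out
}

func main() {
	names := []string{
		"cluster-mco-02-config.yaml",
		"99_example-machineset.yaml", // hypothetical MachineSet manifest name
	}
	// Digits compare lower than lowercase letters in ASCII, so the
	// 99_ prefixed file is listed first.
	for _, n := range sortedManifests(names) {
		fmt.Println(n)
	}
}
```

Under that assumption, giving the MachineConfiguration manifest a numeric prefix that compares lower than `99_` would place it ahead of the MachineSets.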

Contributor Author

Renamed in dd2b388

@djoshy
Contributor

djoshy commented Jun 16, 2025

/lgtm

The MCO manifests and tests seem sane to me. I'm not as familiar with the manifest generation flow, so I will defer that to the installer folks. Thanks for putting this together!

@openshift-ci openshift-ci bot added the lgtm label ("Indicates that a PR is ready to be merged.") Jun 16, 2025
@openshift-ci openshift-ci bot removed the lgtm label ("Indicates that a PR is ready to be merged.") Jun 16, 2025
@patrickdillon
Contributor Author

/retest-required

@openshift-ci
Contributor

openshift-ci bot commented Jun 17, 2025

@patrickdillon: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/okd-scos-e2e-aws-ovn | dd2b388 | link | false | /test okd-scos-e2e-aws-ovn |

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@patrickdillon
Contributor Author

#9793 has merged for unit tests

/retest-required

@patrickdillon
Contributor Author

/label tide/merge-method-squash
for fixups

@openshift-ci openshift-ci bot added the tide/merge-method-squash label ("Denotes a PR that should be squashed by tide when it merges.") Jun 18, 2025
@patrickdillon
Contributor Author

@djoshy would you mind bumping lgtm?

@patrickdillon
Contributor Author

/cherry-pick release-4.19

@openshift-cherrypick-robot

@patrickdillon: once the present PR merges, I will cherry-pick it on top of release-4.19 in a new PR and assign it to you.

@djoshy
Contributor

djoshy commented Jun 18, 2025

/lgtm

@openshift-ci openshift-ci bot added the lgtm label ("Indicates that a PR is ready to be merged.") Jun 18, 2025
@patrickdillon
Contributor Author

/approve

@openshift-ci
Contributor

openshift-ci bot commented Jun 19, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: patrickdillon

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved label ("Indicates a PR has been approved by an approver from all required OWNERS files.") Jun 19, 2025
@gpei
Contributor

gpei commented Jun 19, 2025

/label qe-approved

@openshift-ci openshift-ci bot added the qe-approved label ("Signifies that QE has signed off on this PR") Jun 19, 2025
@openshift-merge-bot openshift-merge-bot bot merged commit b241c4e into openshift:main Jun 19, 2025
17 of 18 checks passed
@openshift-ci-robot
Contributor

@patrickdillon: Jira Issue OCPBUGS-57348: All pull requests linked via external trackers have merged:

Jira Issue OCPBUGS-57348 has been moved to the MODIFIED state.

@openshift-cherrypick-robot

@patrickdillon: new pull request created: #9797

@openshift-bot
Contributor

[ART PR BUILD NOTIFIER]

Distgit: ose-installer
This PR has been included in build ose-installer-container-v4.20.0-202506191613.p0.gb241c4e.assembly.stream.el9.
All builds following this will include this PR.

@openshift-bot
Contributor

[ART PR BUILD NOTIFIER]

Distgit: ose-baremetal-installer
This PR has been included in build ose-baremetal-installer-container-v4.20.0-202506191613.p0.gb241c4e.assembly.stream.el9.
All builds following this will include this PR.

@openshift-bot
Contributor

[ART PR BUILD NOTIFIER]

Distgit: ose-installer-artifacts
This PR has been included in build ose-installer-artifacts-container-v4.20.0-202506191613.p0.gb241c4e.assembly.stream.el9.
All builds following this will include this PR.

tthvo added a commit to tthvo/installer that referenced this pull request Jun 19, 2025
If custom boot images are specified, the installer generates an MCO
manifest to disable MCO boot image management [0].

On AWS, previously edge compute machine pool was not considered for
custom AMI. This commit ensures custom AMI is considered in all compute
machine pools.

References:
[0] openshift#9783
openshift-cherrypick-robot pushed a commit to openshift-cherrypick-robot/installer that referenced this pull request Jun 24, 2025
If custom boot images are specified, the installer generates an MCO
manifest to disable MCO boot image management [0].

On AWS, previously edge compute machine pool was not considered for
custom AMI. This commit ensures custom AMI is considered in all compute
machine pools.

References:

[0] openshift#9783
Labels

  • approved: Indicates a PR has been approved by an approver from all required OWNERS files.
  • jira/severity-moderate: Referenced Jira bug's severity is moderate for the branch this PR is targeting.
  • jira/valid-bug: Indicates that a referenced Jira bug is valid for the branch this PR is targeting.
  • jira/valid-reference: Indicates that this PR references a valid Jira ticket of any type.
  • lgtm: Indicates that a PR is ready to be merged.
  • qe-approved: Signifies that QE has signed off on this PR
  • tide/merge-method-squash: Denotes a PR that should be squashed by tide when it merges.

7 participants