cpu_seq: fix set_power_state idempotency for substates#2272
Merged
Conversation
PR #2064 was _supposed_ to change the Gimlet and Cosmo CPU sequencer API so that requesting an idempotent power state transition (i.e. A0 -> A0) returns `Ok(Transition::Unchanged)` rather than `Err(SeqError::IllegalTransition)`. Unfortunately, it doesn't always do that in some cases. This is because the match arm in which we detect no-op power state transitions tests that the current and requested `PowerState`s are _equal_. But, unfortunately, the `PowerState` enum has more variants than just "A2" and "A0" --- it also represents _substates_ of A0 and A2, such as `A0PlusHP` and `A2PlusFans`. In particular, when a compute sled is fully up and running, it's actually *not* in `PowerState::A0`, it's in `PowerState::A0PlusHP`, because the NIC hotplug controller is enabled. The `PowerState::A0` and `PowerState::A0PlusHP` enum variants are not equal, so when a sled that's actually in `A0PlusHP` is told to go to `A0`, it doesn't match and falls through to the error case incorrectly. I'm pretty sure the reason we didn't realize this earlier is because, while I did test both A2->A2 and A0->A0 transitions when I was testing PR #2064, I would send the A0->A0 request more or less as soon as the system reached A0. I didn't wait for the host OS to come up before testing it, so the system was still in `PowerState::A0` and not `PowerState::A0PlusHP`. Whoopsie. This commit makes this operation idempotent in those cases by treating `A0PlusHP->A0` and `A2PlusFans->A2` transitions as idempotent successes rather than `IllegalTransition`s. I did *not* change the behavior in A0 substates that indicate a CPU reset condition (`A0Reset` and `A0Thermtrip`), as in those cases, we require an explicit transition back to A2 before the system will return to A0. Fixes #2271
700131b to
e825d1d
Compare
labbott
approved these changes
Oct 15, 2025
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
PR #2064 was supposed to change the Gimlet and Cosmo CPU sequencer API so that requesting an idempotent power state transition (i.e. A0 -> A0) returns
Ok(Transition::Unchanged)rather thanErr(SeqError::IllegalTransition). Unfortunately, it doesn't always do that in some cases.This is because the match arm in which we detect no-op power state transitions tests that the current and requested
PowerStates are equal. But, unfortunately, thePowerStateenum has more variants than just "A2" and "A0" --- it also represents substates of A0 and A2, such asA0PlusHPandA2PlusFans.In particular, when a compute sled is fully up and running, it's actually not in
PowerState::A0, it's inPowerState::A0PlusHP, because the NIC hotplug controller is enabled. ThePowerState::A0andPowerState::A0PlusHPenum variants are not equal, so when a sled that's actually inA0PlusHPis told to go toA0, it doesn't match and falls through to the error case incorrectly.I'm pretty sure the reason we didn't realize this earlier is because, while I did test both A2->A2 and A0->A0 transitions when I was testing PR #2064, I would send the A0->A0 request more or less as soon as the system reached A0. I didn't wait for the host OS to come up before testing it, so the system was still in
PowerState::A0and notPowerState::A0PlusHP. Whoopsie.This commit makes this operation idempotent in those cases by treating
A0PlusHP->A0andA2PlusFans->A2transitions as idempotent successes rather thanIllegalTransitions. I did not change the behavior in A0 substates that indicate a CPU reset condition (A0ResetandA0Thermtrip), as in those cases, we require an explicit transition back to A2 before the system will return to A0.Fixes #2271