
cpu_seq: fix set_power_state idempotency for substates #2272

Merged: hawkw merged 2 commits into master from eliza/A0PlusHP-is-A0 on Oct 16, 2025

@hawkw hawkw commented Oct 15, 2025

PR #2064 was _supposed_ to change the Gimlet and Cosmo CPU sequencer API
so that requesting an idempotent power state transition (e.g. A0 -> A0)
returns `Ok(Transition::Unchanged)` rather than
`Err(SeqError::IllegalTransition)`. Unfortunately, it doesn't do that in
all cases.

This is because the match arm in which we detect no-op power state
transitions tests that the current and requested `PowerState`s are
_equal_. However, the `PowerState` enum has more variants than just
"A2" and "A0": it also represents _substates_ of A0 and A2, such as
`A0PlusHP` and `A2PlusFans`.

In particular, when a compute sled is fully up and running, it's
actually *not* in `PowerState::A0`; it's in `PowerState::A0PlusHP`,
because the NIC hotplug controller is enabled. The `PowerState::A0` and
`PowerState::A0PlusHP` enum variants are not equal, so when a sled
that's actually in `A0PlusHP` is told to go to `A0`, the no-op arm
doesn't match and the request incorrectly falls through to the error
case.
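
To make the failure mode concrete, here is a minimal sketch of a strict-equality no-op check. It is not the actual sequencer code: the type definitions are simplified stand-ins, `Transition::Changed` and the function name `set_power_state_strict` are hypothetical, and the set of "real" transitions modeled here is an assumption made only so the example compiles on its own.

```rust
// Simplified, hypothetical stand-ins for the sequencer types. The variant
// names come from the PR description; everything else is assumed.
#[derive(Copy, Clone, Debug, PartialEq, Eq)]
pub enum PowerState {
    A2,
    A2PlusFans,
    A0,
    A0PlusHP,
    A0Thermtrip,
    A0Reset,
}

#[derive(Copy, Clone, Debug, PartialEq, Eq)]
pub enum Transition {
    Changed, // hypothetical name for "a real transition happened"
    Unchanged,
}

#[derive(Copy, Clone, Debug, PartialEq, Eq)]
pub enum SeqError {
    IllegalTransition,
}

/// Sketch of the pre-fix behavior: a no-op is detected by strict equality.
pub fn set_power_state_strict(
    current: PowerState,
    requested: PowerState,
) -> Result<Transition, SeqError> {
    use PowerState::*;
    match (current, requested) {
        // Real transitions (assumed set; hardware sequencing elided).
        (A2, A0) => Ok(Transition::Changed),
        (A0 | A0PlusHP | A0Thermtrip | A0Reset, A2) => Ok(Transition::Changed),
        // No-op only when the two variants are exactly equal.
        (cur, req) if cur == req => Ok(Transition::Unchanged),
        // A0PlusHP -> A0 lands here, because A0PlusHP != A0.
        _ => Err(SeqError::IllegalTransition),
    }
}
```

With this sketch, `set_power_state_strict(PowerState::A0, PowerState::A0)` is treated as a no-op, but the same request made from `PowerState::A0PlusHP` is rejected as an illegal transition.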

I'm pretty sure the reason we didn't realize this earlier is that,
while I did test both A2->A2 and A0->A0 transitions when I was testing
PR #2064, I would send the A0->A0 request more or less as soon as the
system reached A0. I didn't wait for the host OS to come up before
testing it, so the system was still in `PowerState::A0` and not
`PowerState::A0PlusHP`. Whoopsie.

This commit makes this operation idempotent in those cases by treating
`A0PlusHP->A0` and `A2PlusFans->A2` transitions as idempotent successes
rather than `IllegalTransition`s. I did *not* change the behavior in A0
substates that indicate a CPU reset condition (`A0Reset` and
`A0Thermtrip`), as in those cases, we require an explicit transition
back to A2 before the system will return to A0.
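
The shape of the change can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in this PR: the types are simplified stand-ins, `Transition::Changed` and the function name `set_power_state_sketch` are hypothetical, and which real transitions are accepted is assumed. Only the no-op arm and the `A0Reset`/`A0Thermtrip` behavior are taken from the description above.

```rust
// Simplified, hypothetical stand-ins for the sequencer types. The variant
// names come from the PR description; everything else is assumed.
#[derive(Copy, Clone, Debug, PartialEq, Eq)]
pub enum PowerState {
    A2,
    A2PlusFans,
    A0,
    A0PlusHP,
    A0Thermtrip,
    A0Reset,
}

#[derive(Copy, Clone, Debug, PartialEq, Eq)]
pub enum Transition {
    Changed, // hypothetical name for "a real transition happened"
    Unchanged,
}

#[derive(Copy, Clone, Debug, PartialEq, Eq)]
pub enum SeqError {
    IllegalTransition,
}

/// Sketch of the post-fix behavior: benign substates count as "already there".
pub fn set_power_state_sketch(
    current: PowerState,
    requested: PowerState,
) -> Result<Transition, SeqError> {
    use PowerState::*;
    match (current, requested) {
        // Real transitions (assumed set; hardware sequencing elided).
        (A2 | A2PlusFans, A0) => Ok(Transition::Changed),
        (A0 | A0PlusHP | A0Thermtrip | A0Reset, A2) => Ok(Transition::Changed),
        // Idempotent success: already in the requested state, or in a
        // substate of it that does not indicate a CPU reset condition.
        (A0 | A0PlusHP, A0) | (A2 | A2PlusFans, A2) => Ok(Transition::Unchanged),
        // A0Reset -> A0 and A0Thermtrip -> A0 stay illegal: the caller
        // must go back to A2 explicitly first.
        _ => Err(SeqError::IllegalTransition),
    }
}
```

Listing the substates explicitly, rather than adding a blanket "same family" check, keeps the reset and thermtrip cases out of the no-op arm by construction.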

Fixes #2271
@hawkw hawkw force-pushed the eliza/A0PlusHP-is-A0 branch from 700131b to e825d1d Compare October 15, 2025 18:21
@hawkw hawkw enabled auto-merge (squash) October 16, 2025 00:00
@hawkw hawkw merged commit a9bb8ff into master Oct 16, 2025
159 checks passed
@hawkw hawkw deleted the eliza/A0PlusHP-is-A0 branch October 16, 2025 00:10
Development

Successfully merging this pull request may close these issues.

CPU sequencer A0->A0 power state transitions are not always idempotent
