fix: harden supervisor recovery and stuck scan by liobrasil · Pull Request #207 · onflow/FlowYieldVaults

liobrasil · 2026-03-11T04:56:47Z

Summary

- add the final supervisor regression coverage for duplicate recoveries and mixed recurring/non-recurring scan sets
- fix duplicate recovery churn by treating recently executed recurring transactions as active for a short optimistic-execution grace period
- restrict stuck-scan ordering to recurring participants and lazily prune stale non-recurring entries during candidate walks
- align scheduler docs/comments with the recurring-only scan behavior
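
The recurring-only scan with lazy pruning can be pictured with a small model. The sketch below is Python rather than Cadence, and every name in it (`walk_candidates`, `scan_order`, `is_recurring`, `limit`, `max_prunes`) is hypothetical. It only illustrates the idea of walking an ordered candidate list, collecting recurring vaults, and dropping stale non-recurring entries along the way with a cap on pruning work per walk. It is not the contract's implementation, and stopping once the prune budget is spent is an assumption made for this sketch.

```python
from typing import Callable, List, Tuple


def walk_candidates(
    scan_order: List[int],
    is_recurring: Callable[[int], bool],
    limit: int,
    max_prunes: int,
) -> Tuple[List[int], List[int]]:
    """Return (candidates, remaining_order).

    Walks scan_order from the front and collects up to `limit` recurring
    vault IDs. Non-recurring entries met along the way are pruned lazily,
    but at most `max_prunes` of them per walk so the work stays bounded.
    """
    candidates: List[int] = []
    kept: List[int] = []
    prunes = 0
    index = 0
    while index < len(scan_order) and len(candidates) < limit:
        vault_id = scan_order[index]
        if is_recurring(vault_id):
            candidates.append(vault_id)
            kept.append(vault_id)
        elif prunes < max_prunes:
            prunes += 1  # stale entry: drop it from the ordering
        else:
            break  # prune budget spent; leave this entry for a later walk
        index += 1
    # Entries that were not visited stay in the ordering untouched.
    return candidates, kept + scan_order[index:]
```

With `scan_order = [1, 2, 3, 4, 5]` where only 1, 3 and 5 are recurring, a walk with `limit=2` and `max_prunes=5` returns candidates `[1, 3]` and leaves `[1, 3, 4, 5]` as the ordering: entry 2 is pruned on the way, and entry 4 waits for a later walk.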

Scope Note

- the core supervisor fix is in FlowYieldVaultsAutoBalancers, FlowYieldVaultsSchedulerRegistry, and the supervisor regression tests
- the remaining docs/comment updates are alignment-only and do not change supervisor behavior

Verification

flow test cadence/tests/scheduler_mixed_population_regression_test.cdc
flow test cadence/tests/scheduled_supervisor_test.cdc

nvdtf · 2026-03-13T22:06:48Z

cadence/contracts/FlowYieldVaultsAutoBalancers.cdc

+                        return true
+                    }
+
+                    if status == FlowTransactionScheduler.Status.Executed


This is the same issue we faced in onflow/FlowYieldVaultsEVM#70
The problem with the fix is that if the transaction panics, lastRebalanceTimestamp is not updated. This makes the rebalancer permanently stuck, because the status is Executed but lastRebalanceTimestamp was never updated.
You might want to consider a grace-period-based fix.
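
To make the suggestion concrete, here is a minimal sketch of a grace-period check, written in Python as a model rather than in Cadence. The names `Status`, `is_active`, `scheduled_at`, `last_rebalance_at`, and `grace_period` are hypothetical and the values used are arbitrary; none of them come from the contracts. The point it illustrates: an `Executed` transaction whose handler has not recorded a rebalance counts as active only until the grace period elapses, so a panicked handler becomes recoverable instead of blocking the vault forever.

```python
from enum import Enum


class Status(Enum):
    SCHEDULED = "scheduled"
    EXECUTED = "executed"
    CANCELED = "canceled"


def is_active(
    status: Status,
    scheduled_at: float,
    last_rebalance_at: float,
    now: float,
    grace_period: float,
) -> bool:
    """Decide whether the supervisor should leave this transaction alone."""
    if status is Status.SCHEDULED:
        return True
    if status is Status.EXECUTED:
        if last_rebalance_at >= scheduled_at:
            # The handler ran and recorded the rebalance: nothing in flight.
            return False
        # Marked Executed but no rebalance recorded: either still in flight
        # or the handler panicked. Treat it as active only within the grace
        # period, after which the supervisor may recover the vault.
        return now <= scheduled_at + grace_period
    return False
```

For a transaction scheduled at `100.0` with a `30.0` second grace period and no recorded rebalance, the check returns `True` at `now = 110.0` and `False` at `now = 131.0`, which is when recovery would be allowed.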

holyfuchs · 2026-03-18T13:58:59Z

cadence/contracts/FlowYieldVaultsAutoBalancers.cdc

+    /// A transaction is considered active when it is:
+    /// - still `Scheduled`, or
+    /// - already marked `Executed` by FlowTransactionScheduler, but the AutoBalancer has not
+    ///   yet advanced its last rebalance timestamp past that transaction's scheduled time.
+    ///
+    /// The second case matters because FlowTransactionScheduler flips status to `Executed`
+    /// before the handler actually runs. Without treating that in-flight window as active,
+    /// the Supervisor can falsely classify healthy vaults as stuck and recover them twice.
+    ///


Could you explain this?
I am not sure I understand.
The status can be Executed even though it is not executed?

Yes, Executed can be set before the handler runs: FlowTransactionScheduler marks a transaction as Executed optimistically, before the handler logic has finished running. The contract says this directly here:

https://github.com/onflow/flow-core-contracts/blob/27e0eb625ebe056c78cf42d6feaa6ce00a8e06c9/contracts/FlowTransactionScheduler.cdc#L1169-L1186
https://github.com/onflow/flow-core-contracts/blob/27e0eb625ebe056c78cf42d6feaa6ce00a8e06c9/contracts/FlowTransactionScheduler.cdc#L250-L264

holyfuchs · 2026-03-18T14:11:00Z

cadence/contracts/FlowYieldVaultsAutoBalancers.cdc

+            yieldVaultID: uniqueID.id,
+            handlerCap: handlerCap,
+            scheduleCap: scheduleCap,
+            participatesInStuckScan: recurringConfig != nil


The contract describes itself as:
"A registry of all yield vault IDs that participate in scheduled rebalancing"

Would we ever have an instance where this is not the case?
It seems to me the better approach is to ensure that SchedulerRegistry contains only vaults that are being scheduled, and not to add the ones that aren't.

Yes, we could do that. It would make the semantics cleaner: SchedulerRegistry would contain only vaults that are currently recurring/scheduled.

I kept this PR narrower because that would be a broader refactor. Today registration follows the vault lifecycle, not the recurring-config lifecycle, so changing the global registry to recurring-only would mean adding/removing entries whenever recurring config is enabled/disabled, and updating the related admin/recovery flows to match.

So I agree your approach is valid, but I’d treat it as a separate design change. In this PR I only made the stuck-scan ordering recurring-only. I’ll update the comments to make that distinction explicit.

Check this commit: 9284c7e
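
The distinction drawn in this reply, global registration following the vault lifecycle while only recurring vaults enter the stuck-scan ordering, can be sketched as a small model. This is Python, not Cadence, and the class and its members (`SchedulerRegistry`, `registered`, `stuck_scan_order`, `register`, `unregister`) are illustrative names chosen for this sketch. Only the `participatesInStuckScan` flag, mirrored here as `participates_in_stuck_scan`, comes from the diff.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SchedulerRegistry:
    # vault ID -> whether the vault participates in the stuck scan
    registered: Dict[int, bool] = field(default_factory=dict)
    # ordering walked by the supervisor; recurring vaults only
    stuck_scan_order: List[int] = field(default_factory=list)

    def register(self, vault_id: int, participates_in_stuck_scan: bool) -> None:
        """Register every vault, but order only recurring ones for the scan."""
        self.registered[vault_id] = participates_in_stuck_scan
        if participates_in_stuck_scan:
            if vault_id not in self.stuck_scan_order:
                self.stuck_scan_order.append(vault_id)
        elif vault_id in self.stuck_scan_order:
            # Re-registered as non-recurring: drop it from the scan ordering.
            self.stuck_scan_order.remove(vault_id)

    def unregister(self, vault_id: int) -> None:
        """Remove the vault from both the registry and the scan ordering."""
        self.registered.pop(vault_id, None)
        if vault_id in self.stuck_scan_order:
            self.stuck_scan_order.remove(vault_id)
```

Registering vault 1 with the flag set and vault 2 without it leaves both in `registered` but only vault 1 in `stuck_scan_order`. Making the registry itself recurring-only, as the review suggests, would instead require calling `register` and `unregister` whenever the recurring config changes.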

liobrasil added 3 commits March 10, 2026 22:19

test: tighten supervisor recovery regressions

8d609a4

fix: avoid duplicate supervisor recoveries

8447480

fix: restrict supervisor stuck scan to recurring vaults

66ccb02

liobrasil requested a review from a team as a code owner March 11, 2026 04:56

liobrasil requested a review from holyfuchs March 11, 2026 05:16

liobrasil mentioned this pull request Mar 12, 2026

Fix supervisor: report vault execution so stuck-scan order isn't fixed #187

Open

liobrasil requested review from Kay-Zee and jordanschalm March 12, 2026 17:26

nvdtf requested changes Mar 13, 2026

Merge origin/holyfuchs/supervisor-fix into lionel/fix-supervisor-scan…

88ae670

…-recovery

holyfuchs reviewed Mar 18, 2026

docs: clarify scheduler registry semantics

9284c7e

liobrasil force-pushed the lionel/fix-supervisor-scan-recovery branch from 67c0e5e to eab7ad0 March 19, 2026 16:28

fix: bound optimistic execution recovery window

999cd1d

liobrasil force-pushed the lionel/fix-supervisor-scan-recovery branch from eab7ad0 to 999cd1d March 19, 2026 18:22

liobrasil added 6 commits March 19, 2026 14:26

fix: bound supervisor stuck-scan pruning work

5959f87

test: fix mixed-population supervisor regression

d773f55

fix: restore manual deferred redeem claim retry

82cdec2

docs: align scheduler docs with scan semantics

f4c8e16

Merge branch 'holyfuchs/supervisor-fix' into lionel/fix-supervisor-sc…

6122a85

…an-recovery

test: align mixed-population regression comments

6ef8cc8

liobrasil requested review from a team and nvdtf March 19, 2026 21:03

liobrasil added 2 commits March 19, 2026 17:23

fix: clarify supervisor stuck-scan mutation semantics

66d5eb8

Tighten scheduler recovery grace and docs

8ffc8c7

liobrasil wants to merge 14 commits into holyfuchs/supervisor-fix from lionel/fix-supervisor-scan-recovery

3 participants