Commit 7d8f254

Bump core to 0.0.31 and bundles to 0.0.44 (#623)
* Better committed resource logging (#620)
* Enable kvm_failover_evacuation weigher (#618)
* Fix filter capabilities mutating flavor extra specs
* Failover controller to support multicluster watches for reservations
2 parents fec0d73 + c3b2db6 commit 7d8f254

37 files changed, +1136 -187 lines changed

api/v1alpha1/reservation_types.go

Lines changed: 7 additions & 0 deletions
@@ -36,6 +36,13 @@ const (
 	ReservationTypeLabelFailover = "failover"
 )

+// Annotation keys for Reservation metadata.
+const (
+	// AnnotationCreatorRequestID tracks the request ID that created this reservation.
+	// Used for end-to-end traceability across API calls, controller reconciles, and scheduler invocations.
+	AnnotationCreatorRequestID = "reservations.cortex.cloud/creator-request-id"
+)
+
 // CommittedResourceAllocation represents a workload's assignment to a committed resource reservation slot.
 // The workload could be a VM (Nova/IronCore), Pod (Kubernetes), or other resource.
 type CommittedResourceAllocation struct {
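As a rough illustration of how the new annotation key might be used, the sketch below stamps a request ID onto an annotation map. The helper `withCreatorRequestID` and the sample values are hypothetical and not part of this commit; only the constant value is taken from the hunk above.

```go
package main

import "fmt"

// AnnotationCreatorRequestID mirrors the constant added in this commit.
const AnnotationCreatorRequestID = "reservations.cortex.cloud/creator-request-id"

// withCreatorRequestID returns a copy of the given annotations with the
// creator request ID set. Hypothetical helper for illustration only.
func withCreatorRequestID(annotations map[string]string, requestID string) map[string]string {
	out := make(map[string]string, len(annotations)+1)
	for k, v := range annotations {
		out[k] = v
	}
	out[AnnotationCreatorRequestID] = requestID
	return out
}

func main() {
	ann := withCreatorRequestID(map[string]string{"example": "value"}, "req-123")
	fmt.Println(ann[AnnotationCreatorRequestID])
}
```

Copying the map instead of mutating it keeps the caller's object untouched until the update is actually written.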

cmd/main.go

Lines changed: 1 addition & 1 deletion
@@ -340,7 +340,7 @@ func main() {
 	// Initialize commitments API for LIQUID interface (with Nova client for usage reporting)
 	commitmentsConfig := conf.GetConfigOrDie[commitments.Config]()
 	commitmentsAPI := commitments.NewAPIWithConfig(multiclusterClient, commitmentsConfig, novaClient)
-	commitmentsAPI.Init(mux, metrics.Registry)
+	commitmentsAPI.Init(mux, metrics.Registry, ctrl.Log.WithName("commitments-api"))

 	deschedulingsController := &nova.DetectorPipelineController{
 		Monitor: detectorPipelineMonitor,
Lines changed: 104 additions & 0 deletions
@@ -0,0 +1,104 @@
# Committed Resource Reservation System

The committed resource reservation system manages capacity commitments, i.e. strict reservation guarantees usable by projects.
When customers pre-commit to resource usage, Cortex reserves capacity on hypervisors to guarantee availability.
The system integrates with Limes (via the LIQUID protocol) to receive commitments, expose usage and capacity data, and provide acceptance/rejection feedback.

## File Structure

```text
internal/scheduling/reservations/commitments/
├── config.go                    # Configuration (intervals, API flags, secrets)
├── controller.go                # Reconciliation of reservations
├── syncer.go                    # Periodic sync task with Limes, ensures local state matches Limes' commitments
├── reservation_manager.go       # Reservation CRUD operations
├── api.go                       # HTTP API initialization
├── api_change_commitments.go    # Handles commitment changes from Limes and updates local reservations accordingly
├── api_report_usage.go          # Reports VM usage per project, accounting to commitments or PAYG
├── api_report_capacity.go       # Reports capacity per AZ
├── api_info.go                  # Readiness endpoint with versioning (of underlying flavor group configuration)
├── capacity.go                  # Capacity calculation from Hypervisor CRDs
├── usage.go                     # VM-to-commitment assignment logic
├── flavor_group_eligibility.go  # Validates VMs belong to correct flavor groups
└── state.go                     # Commitment state helper functions
```

## Operations

### Configuration

| Helm Value | Description |
|------------|-------------|
| `committedResourceEnableChangeCommitmentsAPI` | Enable/disable the change-commitments endpoint |
| `committedResourceEnableReportUsageAPI` | Enable/disable the usage reporting endpoint |
| `committedResourceEnableReportCapacityAPI` | Enable/disable the capacity reporting endpoint |
| `committedResourceRequeueIntervalActive` | How often to revalidate active reservations |
| `committedResourceRequeueIntervalRetry` | Retry interval when knowledge is not ready |
| `committedResourceChangeAPIWatchReservationsTimeout` | Timeout waiting for reservations to become ready while processing commitment changes via API |
| `committedResourcePipelineDefault` | Default scheduling pipeline |
| `committedResourceFlavorGroupPipelines` | Map of flavor group to pipeline name |
| `committedResourceSyncInterval` | How often the syncer reconciles Limes commitments to Reservation CRDs |

Each API endpoint can be disabled independently. To disable the periodic sync task, remove `commitments-sync-task` from the list of enabled tasks in the `cortex-nova` Helm chart.

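To make the relationship between `committedResourcePipelineDefault` and `committedResourceFlavorGroupPipelines` concrete, here is a minimal sketch of a lookup with a default. The `pipelineConfig` type, its field names, and the fallback behaviour are assumptions for illustration; the real `commitments.Config` may be shaped differently.

```go
package main

import "fmt"

// pipelineConfig is a hypothetical stand-in for the two pipeline-related
// Helm values; it is not the real Config struct.
type pipelineConfig struct {
	PipelineDefault      string            // committedResourcePipelineDefault
	FlavorGroupPipelines map[string]string // committedResourceFlavorGroupPipelines
}

// pipelineFor returns the pipeline mapped to the flavor group and falls back
// to the default when no mapping exists (assumed behaviour).
func (c pipelineConfig) pipelineFor(flavorGroup string) string {
	if name, ok := c.FlavorGroupPipelines[flavorGroup]; ok && name != "" {
		return name
	}
	return c.PipelineDefault
}

func main() {
	cfg := pipelineConfig{
		PipelineDefault:      "pipeline-default",
		FlavorGroupPipelines: map[string]string{"group-a": "pipeline-a"},
	}
	fmt.Println(cfg.pipelineFor("group-a")) // mapped flavor group
	fmt.Println(cfg.pipelineFor("group-b")) // unmapped, uses the default
}
```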
### Observability

Alerts and metrics are defined in `helm/bundles/cortex-nova/alerts/nova.alerts.yaml`. Key metric prefixes:
- `cortex_committed_resource_change_api_*` - Change API metrics
- `cortex_committed_resource_usage_api_*` - Usage API metrics
- `cortex_committed_resource_capacity_api_*` - Capacity API metrics

## Architecture Overview

```mermaid
flowchart LR
    subgraph State
        Res[(Reservation CRDs)]
    end

    ChangeAPI[Change API]
    UsageAPI[Usage API]
    Syncer[Syncer Task]
    Controller[Controller]
    Scheduler[Scheduler API]

    ChangeAPI -->|CRUD| Res
    Syncer -->|CRUD| Res
    UsageAPI -->|read| Res
    Res -->|watch| Controller
    Controller -->|update spec/status| Res
    Controller -->|placement request| Scheduler
```

Reservations are managed through the Change API, Syncer Task, and Controller reconciliation. The Usage API provides read-only access to report usage data back to Limes.

### Change-Commitments API
76+
77+
The change-commitments API receives batched commitment changes from Limes. A request can contain multiple commitment changes across different projects and flavor groups. The semantic is **all-or-nothing**: if any commitment in the batch cannot be fulfilled (e.g., insufficient capacity), the entire request is rejected and rolled back.
78+
79+
Cortex performs CRUD operations on local Reservation CRDs to match the new desired state:
80+
- Creates new reservations for increased commitment amounts
81+
- Deletes existing reservations
82+
- Cortex preserves existing reservations that already have VMs allocated when possible
83+
### Syncer Task

The syncer task runs periodically and fetches all commitments from Limes. It syncs the local Reservation CRD state to match Limes' view of commitments.

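One common way to bring local state in line with a remote source of truth is a desired-versus-existing diff. The sketch below assumes reservations can be identified by name; the `plan` type, the `diff` function, and the naming scheme are illustrative and not taken from `syncer.go`.

```go
package main

import (
	"fmt"
	"sort"
)

// plan lists the reservation names to create and delete so that the local
// state matches the desired state.
type plan struct {
	Create []string
	Delete []string
}

// diff compares the desired reservation names (derived from the commitments
// fetched from Limes) with the names that currently exist locally.
func diff(desired, existing []string) plan {
	want := make(map[string]bool, len(desired))
	for _, name := range desired {
		want[name] = true
	}
	have := make(map[string]bool, len(existing))
	for _, name := range existing {
		have[name] = true
	}
	var p plan
	for name := range want {
		if !have[name] {
			p.Create = append(p.Create, name)
		}
	}
	for name := range have {
		if !want[name] {
			p.Delete = append(p.Delete, name)
		}
	}
	// Sort so the plan does not depend on map iteration order.
	sort.Strings(p.Create)
	sort.Strings(p.Delete)
	return p
}

func main() {
	p := diff([]string{"c1-0", "c1-1", "c2-0"}, []string{"c1-0", "c3-0"})
	fmt.Println("create:", p.Create, "delete:", p.Delete)
}
```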
### Controller (Reconciliation)

The controller watches Reservation CRDs and performs reconciliation:

1. **For new reservations** (no target host assigned):
   - Sends a placement request to the Cortex scheduler to find a suitable host
   - Assigns the target host and marks the reservation as Ready

2. **For existing reservations** (already have a target host):
   - Validates that allocated VMs are still on the expected host
   - Updates allocations if VMs have migrated or been deleted
   - Requeues for periodic revalidation

### Usage API

For a given project, this API reports the total committed resources and usage per flavor group. For each VM, it reports whether the VM is accounted to a specific commitment or to pay-as-you-go (PAYG). This assignment is deterministic and may differ from the Cortex-internal assignment used for scheduling.
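To show what a deterministic assignment can look like, the sketch below sorts VMs and commitments by ID before filling commitments in order. The ordering rule, the `commitment` type, and the slot-based capacity are assumptions made for illustration; the actual rules in `usage.go` are not described here and may differ.

```go
package main

import (
	"fmt"
	"sort"
)

// commitment is a hypothetical, simplified commitment with a fixed number of
// VM slots.
type commitment struct {
	ID    string
	Slots int
}

const payg = "PAYG"

// assign maps each VM ID to a commitment ID or to PAYG. Sorting both inputs
// first makes the result independent of the input order, i.e. deterministic.
func assign(vmIDs []string, commitments []commitment) map[string]string {
	vms := append([]string(nil), vmIDs...)
	sort.Strings(vms)
	cs := append([]commitment(nil), commitments...)
	sort.Slice(cs, func(i, j int) bool { return cs[i].ID < cs[j].ID })

	out := make(map[string]string, len(vms))
	next := 0
	for _, vm := range vms {
		// Skip commitments whose slots are used up.
		for next < len(cs) && cs[next].Slots <= 0 {
			next++
		}
		if next == len(cs) {
			out[vm] = payg
			continue
		}
		cs[next].Slots--
		out[vm] = cs[next].ID
	}
	return out
}

func main() {
	got := assign(
		[]string{"vm-3", "vm-1", "vm-2"},
		[]commitment{{ID: "c-b", Slots: 1}, {ID: "c-a", Slots: 1}},
	)
	fmt.Println(got["vm-1"], got["vm-2"], got["vm-3"])
}
```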

go.mod

Lines changed: 1 addition & 1 deletion
@@ -3,7 +3,7 @@ module github.com/cobaltcore-dev/cortex
 go 1.26

 require (
-	github.com/cobaltcore-dev/openstack-hypervisor-operator v1.0.1
+	github.com/cobaltcore-dev/openstack-hypervisor-operator v1.0.2-0.20260324155836-56b40c7ff846
 	github.com/go-gorp/gorp v2.2.0+incompatible
 	github.com/gophercloud/gophercloud/v2 v2.11.1
 	github.com/ironcore-dev/ironcore v0.2.4

go.sum

Lines changed: 2 additions & 0 deletions
@@ -22,6 +22,8 @@ github.com/cespare/xxhash/v2 v2.3.0 h1:UL815xU9SqsFlibzuggzjXhog7bL6oX9BbNZnL2UF
 github.com/cespare/xxhash/v2 v2.3.0/go.mod h1:VGX0DQ3Q6kWi7AoAeZDth3/j3BFtOZR5XLFGgcrjCOs=
 github.com/cobaltcore-dev/openstack-hypervisor-operator v1.0.1 h1:wXolWfljyQQZbxNQ2pZVIw8wFz9BKiDIvLrECsqGDT8=
 github.com/cobaltcore-dev/openstack-hypervisor-operator v1.0.1/go.mod h1:b0KmJdxvRI8UXlGe8cRm5BD8Tm2WhF7zSKMSIRGyVL4=
+github.com/cobaltcore-dev/openstack-hypervisor-operator v1.0.2-0.20260324155836-56b40c7ff846 h1:Hg5+F1lOUpU9dZ8gVxeohodtYC4Z1fV/iqwYoF/RuNc=
+github.com/cobaltcore-dev/openstack-hypervisor-operator v1.0.2-0.20260324155836-56b40c7ff846/go.mod h1:j1SaxUTo0irugdC7aHuYDKEomIPZwCHoz+4kE8EBBGM=
 github.com/containerd/continuity v0.4.5 h1:ZRoN1sXq9u7V6QoHMcVWGhOwDFqZ4B9i5H6un1Wh0x4=
 github.com/containerd/continuity v0.4.5/go.mod h1:/lNJvtJKUQStBzpVQ1+rasXO1LAWtUQssk28EZvJ3nE=
 github.com/containerd/errdefs v1.0.0 h1:tg5yIfIlQIrxYtu9ajqY42W3lpS19XqdxRQeEwYG8PI=

helm/bundles/cortex-cinder/Chart.yaml

Lines changed: 3 additions & 3 deletions
@@ -5,7 +5,7 @@ apiVersion: v2
 name: cortex-cinder
 description: A Helm chart deploying Cortex for Cinder.
 type: application
-version: 0.0.43
+version: 0.0.44
 appVersion: 0.1.0
 dependencies:
   # from: file://../../library/cortex-postgres
@@ -16,12 +16,12 @@ dependencies:
   # from: file://../../library/cortex
   - name: cortex
     repository: oci://ghcr.io/cobaltcore-dev/cortex/charts
-    version: 0.0.30
+    version: 0.0.31
     alias: cortex-knowledge-controllers
   # from: file://../../library/cortex
   - name: cortex
     repository: oci://ghcr.io/cobaltcore-dev/cortex/charts
-    version: 0.0.30
+    version: 0.0.31
     alias: cortex-scheduling-controllers

 # Owner info adds a configmap to the kubernetes cluster with information on

helm/bundles/cortex-crds/Chart.yaml

Lines changed: 2 additions & 2 deletions
@@ -5,13 +5,13 @@ apiVersion: v2
 name: cortex-crds
 description: A Helm chart deploying Cortex CRDs.
 type: application
-version: 0.0.43
+version: 0.0.44
 appVersion: 0.1.0
 dependencies:
   # from: file://../../library/cortex
   - name: cortex
     repository: oci://ghcr.io/cobaltcore-dev/cortex/charts
-    version: 0.0.30
+    version: 0.0.31

 # Owner info adds a configmap to the kubernetes cluster with information on
 # the service owner. This makes it easier to find out who to contact in case

helm/bundles/cortex-ironcore/Chart.yaml

Lines changed: 2 additions & 2 deletions
@@ -5,13 +5,13 @@ apiVersion: v2
 name: cortex-ironcore
 description: A Helm chart deploying Cortex for IronCore.
 type: application
-version: 0.0.43
+version: 0.0.44
 appVersion: 0.1.0
 dependencies:
   # from: file://../../library/cortex
   - name: cortex
     repository: oci://ghcr.io/cobaltcore-dev/cortex/charts
-    version: 0.0.30
+    version: 0.0.31

 # Owner info adds a configmap to the kubernetes cluster with information on
 # the service owner. This makes it easier to find out who to contact in case

helm/bundles/cortex-manila/Chart.yaml

Lines changed: 3 additions & 3 deletions
@@ -5,7 +5,7 @@ apiVersion: v2
 name: cortex-manila
 description: A Helm chart deploying Cortex for Manila.
 type: application
-version: 0.0.43
+version: 0.0.44
 appVersion: 0.1.0
 dependencies:
   # from: file://../../library/cortex-postgres
@@ -16,12 +16,12 @@ dependencies:
   # from: file://../../library/cortex
   - name: cortex
     repository: oci://ghcr.io/cobaltcore-dev/cortex/charts
-    version: 0.0.30
+    version: 0.0.31
     alias: cortex-knowledge-controllers
   # from: file://../../library/cortex
   - name: cortex
     repository: oci://ghcr.io/cobaltcore-dev/cortex/charts
-    version: 0.0.30
+    version: 0.0.31
     alias: cortex-scheduling-controllers

 # Owner info adds a configmap to the kubernetes cluster with information on

helm/bundles/cortex-nova/Chart.yaml

Lines changed: 3 additions & 3 deletions
@@ -5,7 +5,7 @@ apiVersion: v2
 name: cortex-nova
 description: A Helm chart deploying Cortex for Nova.
 type: application
-version: 0.0.43
+version: 0.0.44
 appVersion: 0.1.0
 dependencies:
   # from: file://../../library/cortex-postgres
@@ -16,12 +16,12 @@ dependencies:
   # from: file://../../library/cortex
   - name: cortex
     repository: oci://ghcr.io/cobaltcore-dev/cortex/charts
-    version: 0.0.30
+    version: 0.0.31
     alias: cortex-knowledge-controllers
   # from: file://../../library/cortex
   - name: cortex
     repository: oci://ghcr.io/cobaltcore-dev/cortex/charts
-    version: 0.0.30
+    version: 0.0.31
     alias: cortex-scheduling-controllers

 # Owner info adds a configmap to the kubernetes cluster with information on
