Bump CortexNovaSchedulingDown alert to critical #589
No actionable comments were generated in the recent review. 🎉
📝 Walkthrough
The single alert configuration file updates the CortexNovaSchedulingDown alert: its severity is escalated from warning to critical, and its description is revised to note that the outage is non-critical for VMware VMs but blocks KVM VMs, recommending immediate investigation and resolution.
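For orientation, the rule after this change might look roughly like the sketch below. Only the alert name, the `critical` severity, and the summary string come from this PR; the group name, the `expr`, the `for` duration, and the exact description wording are assumptions made for illustration and are not taken from `nova.alerts.yaml`.

```yaml
groups:
  - name: cortex-nova-alerts            # hypothetical group name
    rules:
      - alert: CortexNovaSchedulingDown
        # Hypothetical expression: the real query is not shown in this PR.
        expr: up{job="cortex-nova-scheduling"} == 0
        for: 5m                         # assumed duration
        labels:
          severity: critical            # escalated from warning by this PR
        annotations:
          summary: "Cortex Scheduling for Nova is down"
          # Paraphrase of the walkthrough, not the merged wording.
          description: >
            The Cortex scheduling service is down. This is non-critical for
            VMware VMs, but it blocks scheduling of KVM VMs. Investigate and
            resolve immediately.
```

The point of the sketch is that the `severity` label and the `description` annotation should agree on urgency, since on-call routing typically keys off the label while responders read the annotation.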
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10 minutes
🚥 Pre-merge checks | ✅ Passed checks (3 passed)
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
helm/bundles/cortex-nova/alerts/nova.alerts.yaml (1)
13-21: ⚠️ Potential issue | 🟠 Major
Align annotation urgency with `critical` severity.
`severity: critical` conflicts with the current description text (“no immediate problem”), which can slow or misroute on-call response.
Proposed wording update
```diff
 annotations:
   summary: "Cortex Scheduling for Nova is down"
   description: >
-    The Cortex scheduling service is down. Scheduling requests from Nova will
-    not be served. This is no immediate problem, since Nova will continue
-    placing new VMs. However, the placement will be less desirable.
+    The Cortex scheduling service is down. Scheduling requests from Nova are
+    impacted and placement quality is degraded. Treat this as critical and
+    follow the playbook immediately.
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@helm/bundles/cortex-nova/alerts/nova.alerts.yaml` around lines 13 - 21, The annotation description under the alert with summary "Cortex Scheduling for Nova is down" contradicts the declared severity: change the description (annotations.description) to reflect an urgent/critical impact and recommended immediate on-call action consistent with severity: remove language like "no immediate problem", state that scheduling requests will not be served and this will significantly impact placement and resource stability, and add a suggested immediate action (e.g., escalate to on-call, follow playbook at docs/support/playbook/cortex/down) so the annotations align with severity: critical.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 961240c8-3c43-42f6-bb23-e7a090553654
📒 Files selected for processing (1)
helm/bundles/cortex-nova/alerts/nova.alerts.yaml
Test Coverage Report
Test Coverage 📊: 67.9%
This alert has been around for some time, and so far it has never reported false positives or flapped, so we can consider it stable. Since Cortex is moving onto the critical path for Nova KVM, it's crucial that we escalate this alert. There is also already an actionable playbook for it.