[architect] refactor(store): decompose monolithic Store interface into 7 domain sub-interfaces#16453
[architect] refactor(store): decompose monolithic Store interface into 7 domain sub-interfaces#16453clubanderson wants to merge 7420 commits into
Conversation
Signed-off-by: Copilot <223556219+Copilot@users.noreply.github.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Copilot <223556219+Copilot@users.noreply.github.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- ActiveAlerts: Guard stats.firing check and use optional chaining for stats.critical/warning/acknowledged - AlertRules: Guard rules.filter() and rule.channels.map() with array safety pattern (data || []) Fixes #15867 Fixes #15868 Signed-off-by: Copilot <223556219+Copilot@users.noreply.github.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…es (#15881) Fixes #15865 Fixes #15866 Adds array safety guards to prevent crashes when API returns undefined/null data on: - Pods page: podIssues.length, clusters.filter() - Nodes page: clusters.filter() - Helm Releases page: clusters.filter() - Services page: clusters.filter(), services.filter() - Deployments page: deployments.filter(), deploymentIssues.filter(), podIssues.filter() All array operations now use (data || []) pattern to safely handle undefined/null values. Signed-off-by: Copilot <223556219+Copilot@users.noreply.github.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Copilot <223556219+Copilot@users.noreply.github.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Copilot <223556219+Copilot@users.noreply.github.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Guards gpuNodes array with (gpuNodes || []) before calling .forEach() to prevent TypeError when GPU data is undefined. This allows the ClusterCosts card to gracefully handle GPU API failures and continue showing CPU/memory cost data. Signed-off-by: Copilot <223556219+Copilot@users.noreply.github.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Restores zoom UI (10 levels: 15%-150%) with localStorage persistence and a popout-to-new-window button. Uses CSS `transform: scale()` instead of `font-size: %` to bypass browser minimum font-size clamping that previously made the smallest zoom levels (15/20/25/35%) visually indistinguishable. Popout URL switched from hardcoded `http://localhost:30500/...` to the relative proxy path so it works on the hosted site. Adds a visual regression test that captures the circuit at 100%, 25%, and 15% to catch the clamping bug if it ever returns. Signed-off-by: Kevin Roche <kproche@us.ibm.com>
pok-prod was failing with "Progress deadline exceeded" after 10 minutes even though the Helm deploy step waited longer. Make the Deployment progress deadline configurable and raise the chart default to 900 seconds so slower clusters do not fail before Helm's timeout window. Signed-off-by: Copilot <223556219+Copilot@users.noreply.github.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Revert the detection-phase fallback model to gpt-4.1 for stuck-detection, implement-fix, and auto-triage so threat detection does not fail with an unsupported model error.\n\nCo-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Signed-off-by: Copilot <223556219+Copilot@users.noreply.github.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
#15891) Add prompt-specific memory formatting for cluster AI prompts and cover the regression with tests. Signed-off-by: Copilot <223556219+Copilot@users.noreply.github.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Cover getStaleCacheMetaKeys (stale/fresh threshold, malformed JSON, missing\ntimestamp fields, boundary conditions, priority order of timestamp fields,\ninvalid values, localStorage throws) and useStaleCacheCleanup hook (runs\nonce on mount, calls safeRemoveItem for each stale key).\n\nCo-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Signed-off-by: Copilot <223556219+Copilot@users.noreply.github.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add structured event emission (staleKeysFound, staleKeysRemoved, oldestStaleAgeMs, cleanupDurationMs, timestamp) after each cleanup run. No key names are included in emitted events per sec-check recommendation. - New staleCacheEvents.ts module following backendHealthEvents pattern - getStaleCacheMetaKeysWithAge helper tracks per-key age - No event emitted when zero stale keys found (no noise) - Event emitted even on partial cleanup (some removals fail) - Comprehensive tests for all observability scenarios Signed-off-by: Copilot <223556219+Copilot@users.noreply.github.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…) (#15894) * 🐛 Address Copilot feedback on Quantum Circuit zoom (follow-up to #15884) PR #15884 was merged before a fix-up commit could land. This restores the 6 Copilot-flagged improvements as a follow-up. - Guard localStorage with try/catch + isBrowser check (privacy modes, unavailable storage, future SSR safety) via readPersistedZoom / writePersistedZoom helpers - Validate + snap persisted zoom values to the nearest entry in CIRCUIT_ZOOM_LEVELS_PCT, rejecting non-finite or non-positive values from corrupted storage - Add type="button" to all <button> elements so they never default to submit if rendered inside a <form> - Reserve scrollable layout space matching the scaled circuit by measuring the unscaled <pre> with useLayoutEffect and sizing the wrapper to naturalSize * zoomFactor; <pre> is positioned absolutely with transform: scale(). Scrollbars now match visible content at every zoom level. (Avoids reverting to font-size: % which would re-introduce browser minimum-font-size clamping.) - Visual test: clear localStorage in addInitScript so the 100% baseline is deterministic regardless of prior test state - Visual test: fix stale "eight" comment that should have read "ten" Signed-off-by: Kevin Roche <kproche@us.ibm.com> * 🐛 Address Copilot review on PR #15894 - Export CIRCUIT_ZOOM_STORAGE_KEY and import it in the visual test so the storage key isn't duplicated as a string literal (test would silently clear the wrong key if the constant ever changed). - Hide the circuit wrapper with visibility:hidden until useLayoutEffect records the natural size, eliminating the one-frame flash where the unscaled <pre> rendered before the wrapper had dimensions. The third comment (p-4 padding scaling) is intentional — offsetWidth already includes the unscaled padding, so scaledWidth = (content + p-4) × zoom matches what's actually painted. No double-counting; reply on PR. Signed-off-by: Kevin Roche <kproche@us.ibm.com> * 🐛 Move CIRCUIT_ZOOM_STORAGE_KEY to a leaf module so the visual test runs Importing from QuantumCircuitViewer.tsx pulled the app-level logger into Playwright's Node runtime and threw `Cannot read properties of undefined (reading 'DEV')` from `import.meta.env.DEV` before any test could execute. Splitting the constant into QuantumCircuitViewer.constants.ts gives the e2e test a dependency-free import target, and the component re-imports the same constant so the source of truth is preserved. CI generates the snapshot baselines itself in the "Generate baseline snapshots" step when none exist, so no PNGs need to be committed. Signed-off-by: Kevin Roche <kproche@us.ibm.com> --------- Signed-off-by: Kevin Roche <kproche@us.ibm.com>
#15901) * 🐛 Guard deployment replica values against undefined in diagnose prompt When deployment data hasn't fully loaded, readyReplicas and replicas can be undefined, causing "undefined/undefined ready" in AI diagnose/repair prompts. Use nullish coalescing to default to 0. Fixes #15899 Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Signed-off-by: Copilot <223556219+Copilot@users.noreply.github.com> * 🌱 Add Vitest unit tests for diagnose prompt replica counts Add comprehensive tests verifying deployment replica values are never 'undefined' or 'null' in AI diagnose/repair prompts. Covers: - Normal replica counts - Both replicas undefined (bug case from #15899) - Partially undefined replicas - Null values - Zero replicas (healthy zero state) - Large replica counts - Empty deployment issues array - Repair prompt with undefined replicas Also applies the nullish coalescing fix (?? 0) from PR #15901 to guard against runtime undefined values. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Signed-off-by: Copilot <223556219+Copilot@users.noreply.github.com> --------- Signed-off-by: Copilot <223556219+Copilot@users.noreply.github.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Replace incorrect page redirect with an inline workload import modal so the Workloads page opens a creation flow instead of bouncing back to Deploy. Imported workloads are added to the local workloads list so the new flow has visible in-page feedback. Signed-off-by: kubestellar-hive[bot] <kubestellar-hive[bot]@users.noreply.github.com> Co-authored-by: kubestellar-hive[bot] <kubestellar-hive[bot]@users.noreply.github.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…mode (#15909) Only show demo namespaces when in demo mode. When connected to a live cluster, fetch namespaces exclusively from the cluster API without mixing in demo data. Signed-off-by: Copilot <223556219+Copilot@users.noreply.github.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Implement retry logic with exponential backoff and jitter when /api/workloads/resolve-deps returns HTTP 429. Adds max retry limit, Retry-After header support, AbortController cleanup, and user-facing error messaging after retries are exhausted. Signed-off-by: Copilot <223556219+Copilot@users.noreply.github.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* 🌱 Add Vitest unit tests for diagnose prompt builder Assert deployment replica counts are never undefined/null in generated prompts. Covers normal, edge, and error cases. Signed-off-by: kubestellar-hive[bot] <kubestellar-hive[bot]@users.noreply.github.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Signed-off-by: Copilot <223556219+Copilot@users.noreply.github.com> * 🌱 Retrigger CI (infra timeout on Docker cache upload) Signed-off-by: Copilot <223556219+Copilot@users.noreply.github.com> --------- Signed-off-by: Copilot <223556219+Copilot@users.noreply.github.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* 🐛 Sanitize user input in AI mission prompt builder Strip HTML tags and limit length of user-provided search input before interpolating into AI mission prompts and dialog titles. Prevents XSS and prompt injection via crafted search queries. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Signed-off-by: Copilot <223556219+Copilot@users.noreply.github.com> * 🌱 Retrigger CI (infra timeout on Docker cache upload) Signed-off-by: Copilot <223556219+Copilot@users.noreply.github.com> --------- Signed-off-by: Copilot <223556219+Copilot@users.noreply.github.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Update Workloads component tests and cached data hook tests to align with the namespace demo-data hiding changes from PR #15909. Fixes mock return values and assertions to match the new hook behavior. Signed-off-by: Copilot <223556219+Copilot@users.noreply.github.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* 🐛 Fix deploy page critical issues badge inconsistency (#15906) The header badge on the Deployments page used the global DashboardHealthIndicator, which derives its count from usePodIssues via useDashboardHealth. The stats cards and Deployment Issues card both derive their counts from useCachedDeploymentIssues filtered by the global cluster selection. These are two completely different data sources, causing the badge to display stale or unrelated issue counts (e.g. '2 critical issues' while stats showed Failed: 0, Healthy: 1). Fix: pass a deployment-specific afterTitle badge to DashboardPage that reads from the same issueCount (filteredDeploymentIssues.length) already used by the stats blocks. This ensures the badge, stats cards, and Deployment Issues card all reflect the same underlying data. Signed-off-by: Copilot <223556219+Copilot@users.noreply.github.com> * 🌱 Add test coverage for deploy badge fix Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Signed-off-by: Copilot <223556219+Copilot@users.noreply.github.com> --------- Signed-off-by: Copilot <223556219+Copilot@users.noreply.github.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Apply sanitizeForPrompt() centrally inside buildEnhancedPrompt() on params.initialPrompt and cluster names before they are interpolated into prompt strings. This ensures ALL mission entry points (SearchDropdown, InlineAIAssist, ConfirmMissionPromptDialog, drill-down views) are protected against prompt injection regardless of whether the caller remembered to sanitize. Also sanitizes the content string used for resolution matching to prevent injection via title/description fields. Signed-off-by: Copilot <223556219+Copilot@users.noreply.github.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Copilot <223556219+Copilot@users.noreply.github.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Passed: 30/32 Failed: 2 [skip ci]
Fixes #15927 When resolution matching succeeds, the code was replacing the sanitized enhancedPrompt with raw params.initialPrompt, bypassing sanitizeForPrompt. Use enhancedPrompt instead. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* 🐛 Fix sanitizeForPrompt bypass in buildEnhancedPrompt When resolution matching succeeds, the code was replacing the sanitized enhancedPrompt with raw params.initialPrompt, bypassing sanitization. Use the already-sanitized enhancedPrompt variable instead. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Signed-off-by: Copilot <223556219+Copilot@users.noreply.github.com> * 🌱 Add test for sanitizeForPrompt bypass in resolution matching (#15927) Verifies that angle brackets in initialPrompt are stripped even when the resolution-matching branch triggers, preventing prompt injection. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Signed-off-by: Copilot <223556219+Copilot@users.noreply.github.com> * 🌱 Add regression test for sanitizeForPrompt bypass (#15927) Verifies that buildEnhancedPrompt preserves sanitization when resolution matching triggers, preventing the raw initialPrompt from bypassing the sanitizeForPrompt call. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Signed-off-by: Copilot <223556219+Copilot@users.noreply.github.com> --------- Signed-off-by: Copilot <223556219+Copilot@users.noreply.github.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
✅ Deploy Preview for kubestellarconsole canceled.
|
|
👋 Hey @clubanderson — thanks for opening this PR!
This is an automated message. |
|
🐝 Hi @clubanderson! I'm Trusted users — org members and contributors with write access — can mention Automation may take a moment to start, and follow-up happens through workflow activity rather than chat replies. |
There was a problem hiding this comment.
Pull request overview
Refactors the backend persistence contract by decomposing the previously monolithic pkg/store.Store interface into seven domain-focused sub-interfaces, with Store now embedding those sub-interfaces while retaining cross-cutting lifecycle/transaction methods.
Changes:
- Introduces
UserStore,DashboardStore,GPUStore,AuthStore,RewardsStore,ContentStore, andInfraStoreinterfaces grouping related persistence methods. - Rewrites
Storeto embed the seven sub-interfaces, keepingWithTransactionandCloseat the top level. - Adds compile-time assertions ensuring
*SQLiteStoresatisfiesStoreand each sub-interface.
| // WithTransaction runs fn inside a database transaction. Rollback is | ||
| // automatic on fn error; callers must not call Rollback themselves. | ||
| WithTransaction(ctx context.Context, fn func(tx *sql.Tx) error) error |
Status CheckThis PR has been idle for 49+ hours with no activity. Current state: Findings
Recommended Next StepsA human maintainer should:
|
|
PR needs rebase. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
Status CheckPR: #16453 — Findings
Recommended Next Steps
|
Requires Human InterventionPR: #16453 — Findings
Recommended Next Steps
|
Stuck Detection Run — 2026-06-05T12:04:58ZSummary
Previously Escalated PRs (already labeled
|
Stuck Detection Run — 2026-06-05T16:23:03ZSummary
Previously Escalated PRs (already labeled
|
| PR | Title | Last Updated | Blockers |
|---|---|---|---|
| #16453 | refactor(store): decompose monolithic Store interface | 2026-06-05T12:07Z | needs-rebase, DCO missing, hold |
| #16452 | fix: use %q to safely quote cluster context in AI provider | 2026-06-05T05:42Z | needs-rebase, hold |
| #16451 | planning: CNCF adopter conversion strategy | 2026-06-04T20:24Z | hold |
| #16438 | fix: make check-vendor-safety.mjs chunk-name-agnostic | 2026-06-04T23:02Z | needs-rebase, hold |
All four PRs remain idle with no human action since creation (June 2). Each has been escalated with ai-needs-human and detailed status comments — no further automated action is available until a human maintainer reviews.
Result
✅ No new stuck items detected. No new labels or status comments added this run.
Generated by Stuck Detection Workflow · sonnet46 1.3M · ◷
Stuck Detection Run — 2026-06-05T20:12:32ZSummary
Active Items (not stuck yet)
Previously Escalated PRs (already labeled
|
| PR | Title | Last Updated | Blockers |
|---|---|---|---|
| #16453 | refactor(store): decompose monolithic Store interface | 2026-06-05T16:26Z | needs-rebase, DCO missing, hold |
| #16452 | fix: use %q to safely quote cluster context | 2026-06-05T05:42Z | needs-rebase, hold |
| #16451 | planning: CNCF adopter conversion strategy | 2026-06-04T20:24Z | hold |
| #16438 | fix: make check-vendor-safety.mjs chunk-name-agnostic | 2026-06-04T23:02Z | needs-rebase, hold |
All four PRs are blocked on human review. No automated recovery is available.
Result
✅ No new stuck items detected. No new labels or status comments added this run.
Generated by Stuck Detection Workflow · sonnet46 1.1M · ◷
Stuck Detection Run — 2026-06-05T22:01:55ZSummary
Open AI-Generated PRs (already escalated)All open AI-generated PRs already have
Notes
|
Status CheckThis PR appears to be stuck and requires human intervention. Item: PR #16453 — Findings
Why this mattersThis is a large architectural refactor decomposing the 102-method monolithic Recommended Next Steps
|
Status CheckThis PR has been open for 4 days with no human review or merge activity. Current state: Findings
Recommended Next StepsA human maintainer should:
|
Stuck Detection Run — 2026-06-06T06:14:59ZSummary
Open AI-Generated PRs (already fully escalated)All open AI-generated PRs already have
Notes
ResultNo new stuck items detected. No new labels or status comments added this run.
|
Stuck Detection Run — 2026-06-06T09:47 UTCResults
Pre-existing Escalations (already labeled
|
| PR | Title | Also Labeled |
|---|---|---|
| #16453 | [architect] refactor(store): decompose monolithic Store interface | needs-rebase, hold |
| #16452 | [sec-check] fix: use %q to safely quote cluster context | needs-rebase, hold |
| #16451 | [strategist] planning: CNCF adopter conversion strategy | hold |
| #16438 | [ci-maintainer] fix: make check-vendor-safety.mjs chunk-name-agnostic | needs-rebase, hold |
Next Steps
No new action required this cycle. The 4 held PRs above need a human maintainer to review and decide whether to rebase, approve, or close them.
Generated by Stuck Detection Workflow · sonnet46 774.6K · ◷
Stuck Detection Run — 2026-06-06T11:35 UTCResults
Pre-existing Escalations (already labeled
|
| PR | Title | Also Labeled |
|---|---|---|
| #16453 | [architect] refactor(store): decompose monolithic Store interface | needs-rebase, hold |
| #16452 | [sec-check] fix: use %q to safely quote cluster context | needs-rebase, hold |
| #16451 | [strategist] planning: CNCF adopter conversion strategy | hold |
| #16438 | [ci-maintainer] fix: make check-vendor-safety.mjs chunk-name-agnostic | needs-rebase, hold |
Result
✅ No new stuck items detected. No new labels or status comments added this run.
Generated by Stuck Detection Workflow · sonnet46 1.1M · ◷
Stuck Detection Run — 2026-06-06T21:56 UTCResults
Open AI-Generated PRs — Still StuckAll 5 open AI-generated PRs have
Notes
|
|
Closing: branch is catastrophically stale (1.2M+ line diff). The Store decomposition design is sound — re-create on a fresh branch from main. |
Summary
Decomposes the monolithic
pkg/store.Storeinterface (102 methods across 20 unrelated domains) into 7 domain-aligned sub-interfaces, then makesStoreembed them all.Closes/addresses #16425.
Problem
store.Storeis a fat interface with 102 methods covering: users, dashboards, cards, pending swaps, events, feature requests, PR feedback, notifications, GPU reservations, GPU utilization snapshots, token revocation, rewards, token usage, OAuth credentials, OAuth state, audit logs, cluster groups, KB gaps, and cluster events.Any handler or service that needs one method must mock/stub all 102. This is the primary testability bottleneck in the codebase — every handler test that uses a fake store must either implement 102 stubs or accept a concrete
*SQLiteStorein tests.Change
pkg/store/store.go— structural only, no behaviour changes:UserStoreDashboardStoreGPUStoreAuthStoreRewardsStoreContentStoreInfraStoreStoreis rewritten to embed all 7 sub-interfaces +WithTransaction+Close.WithTransactionandCloseremain at theStorelevel because they are cross-cutting infrastructure concerns, not domain capabilities.Compatibility guarantee
sqlite_*.gofiles are touched.store.Storecompiles unchanged becauseStorestill exposes the identical 102-method set.SQLiteStoresatisfies all interfaces — enforced by compile-time assertions added tostore.go:Migration path (next steps, not in this PR)
Once merged, new handlers can be narrowed:
This PR establishes the interfaces; narrowing individual callers is incremental follow-up work.
Filed by architect agent (ACMM L5 — hold-gated mode)