Skip to content

Operator cert manager#139

Merged
mangelajo merged 28 commits intomainfrom
operator-cert-manager
Jan 28, 2026
Merged

Operator cert manager#139
mangelajo merged 28 commits intomainfrom
operator-cert-manager

Conversation

@mangelajo
Copy link
Copy Markdown
Member

@mangelajo mangelajo commented Jan 23, 2026

Summary by CodeRabbit

  • New Features

    • Automatic TLS via cert-manager with configurable CA/cert durations, self-signed or external issuers, per-component certificates, new certManager configuration, and egressSelectorType for JWT issuer discovery.
    • Status now exposes Kubernetes-style Conditions for richer readiness and component-level status.
  • Chores

    • Operator manifests, RBAC, and bundle updated to grant cert-manager permissions; deploy scripts can toggle/install cert-manager.
  • Tests

    • e2e/unit tests extended to install/validate cert-manager and verify issuer/certificate and TLS mounting workflows.

✏️ Tip: You can customize this high-level summary in your review settings.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Jan 23, 2026

📝 Walkthrough

Walkthrough

Replaces top-level useCertManager with structured CertManagerConfig; adds cert-manager issuer and certificate reconciliation (including self-signed CA bootstrap); migrates status to Kubernetes-style Conditions; updates CRDs/CSV/RBAC/scheme registration, controller Deployments, tests, scripts, and dependencies for cert-manager integration.

Changes

Cohort / File(s) Summary
API Types & DeepCopy
controller/deploy/operator/api/v1alpha1/jumpstarter_types.go, controller/deploy/operator/api/v1alpha1/zz_generated.deepcopy.go
Removed UseCertManager bool; added CertManager CertManagerConfig, ServerCertConfig, SelfSignedConfig, IssuerReference, new ConditionType constants, Conditions []metav1.Condition; generated deep-copy methods for new/changed types.
Controller Reconciliation & Status
controller/deploy/operator/internal/controller/jumpstarter/...
certificates.go, jumpstarter_controller.go, status.go, jumpstarter_controller_test.go
Added reconcileCertificates and cert-manager resource reconciliation (issuer selection, self-signed CA bootstrap, controller/router Certificates), conditional TLS secret mounting in Deployments, moved status updates to condition-based reconciliation; updated tests to use CertManagerConfig.
RBAC, Scheme Registration & CSV
controller/deploy/operator/config/rbac/role.yaml, controller/deploy/operator/cmd/main.go, controller/deploy/operator/bundle/manifests/jumpstarter-operator.clusterserviceversion.yaml
Registered cert-manager v1 scheme; added cert-manager RBAC rules and clusterPermissions entries.
CRD / Bundle Manifests
controller/deploy/operator/config/crd/bases/...operator.jumpstarter.dev_jumpstarters.yaml, controller/deploy/operator/bundle/manifests/operator.jumpstarter.dev_jumpstarters.yaml
Introduced spec.*.certManager blocks (enabled, server.issuerRef/selfSigned/durations), removed top-level useCertManager, expanded status schema to conditions, added egressSelectorType for JWT issuers, updated TLS secret documentation to reflect cert-manager management.
Tests & Test Setup
controller/deploy/operator/test/e2e/*, controller/deploy/operator/internal/controller/jumpstarter/suite_test.go, controller/deploy/operator/test/utils/utils.go
Registered cert-manager scheme in unit/e2e suites; added e2e cert-manager scenarios and helpers (issuer/certificate readiness, TLS checks); added IsCertManagerReady util; bumped cert-manager test version.
Scripts & Utilities
controller/hack/deploy_with_operator.sh, controller/hack/utils/*, .gitignore
Added USE_CERTMANAGER and cert-manager install helpers (is_cert_manager_installed, install_cert_manager), CERTMANAGER_VERSION bump, conditional cert-manager install and generated certManager CR config; added .beads/* ignore.
Modules / Dependency Updates
controller/deploy/operator/go.mod, controller/go.mod
Added/updated cert-manager and related dependencies (cert-manager v1.x, controller-runtime, k8s libs, and numerous indirect updates) to support cert-manager integration.

Sequence Diagram(s)

sequenceDiagram
    participant Reconciler as JumpstarterReconciler
    participant CM as cert-manager
    participant Kube as Kubernetes API
    rect rgba(200,200,255,0.5)
    Reconciler->>Reconciler: reconcileCertificates()
    end
    alt certManager.enabled
        Reconciler->>CM: ensure Issuer (external or bootstrap self-signed)
        alt external issuer
            CM->>Kube: GET Issuer/ClusterIssuer
            Kube-->>CM: Issuer object
        else self-signed workflow
            CM->>Kube: CREATE SelfSigned Issuer
            Kube-->>CM: Issuer created
            CM->>Kube: CREATE CA Certificate (bootstrap)
            Kube-->>CM: CA secret ready
            CM->>Kube: CREATE CA Issuer referencing CA secret
            Kube-->>CM: CA Issuer ready
        end
        Reconciler->>CM: CREATE/UPDATE Controller Certificate
        CM->>Kube: Certificate resource persisted
        loop per router replica
            Reconciler->>CM: CREATE/UPDATE Router Certificate (router-N)
            CM->>Kube: Certificate resource persisted
        end
        Reconciler->>Kube: Verify TLS secrets exist / mount in Deployments
        Kube-->>Reconciler: Secrets present / Deployments updated
    end
    Reconciler->>Kube: update Status Conditions (CertManagerAvailable, IssuerReady, CertificatesReady, DeploymentsReady, Ready)
    Kube-->>Reconciler: Status updated
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

🐰 I hopped in quick to stitch TLS bright,

Bootstrapped a tiny CA by night,
Issuers hummed and certs took flight,
Secrets tucked in pods just right,
Hop—now conditions glow with light!

🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 inconclusive)
Check name Status Explanation Resolution
Title check ❓ Inconclusive The title 'Operator cert manager' is vague and generic, using non-descriptive terms that fail to convey the specific nature of the changes. Consider a more specific title such as 'Add cert-manager integration to Jumpstarter operator' or 'Implement certificate management in operator' to better summarize the main changes.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Docstring Coverage ✅ Passed Docstring coverage is 96.15% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
controller/deploy/operator/api/v1alpha1/jumpstarter_types.go (1)

279-282: Update doc comments that still reference useCertManager.
These should now point to spec.certManager.enabled to avoid confusion.

Also applies to: 390-393

🤖 Fix all issues with AI agents
In `@controller/deploy/operator/go.mod`:
- Line 6: Update the cert-manager dependency in go.mod from
"github.com/cert-manager/cert-manager v1.17.2" to a supported release
(preferably "v1.19.2" or at minimum a v1.18.x line) and then run module tooling
to reconcile transitive changes (e.g., run go get
github.com/cert-manager/cert-manager@v1.19.2 and go mod tidy) so the dependency
graph and go.sum are updated; ensure the single dependency line
"github.com/cert-manager/cert-manager ..." is changed accordingly.
🧹 Nitpick comments (5)
controller/deploy/operator/api/v1alpha1/jumpstarter_types.go (1)

560-572: Enforce mutual exclusivity of selfSigned and issuerRef.
The struct comment states only one should be set, but the CRD doesn’t currently enforce it.

♻️ Add CRD validation to enforce exclusivity
 // ServerCertConfig defines how server certificates are issued.
 // Only one of SelfSigned or IssuerRef should be specified.
 // If neither is specified and cert-manager is enabled, SelfSigned with defaults is used.
+// +kubebuilder:validation:XValidation:rule="!(has(self.selfSigned) && has(self.issuerRef))",message="Only one of selfSigned or issuerRef may be set"
 type ServerCertConfig struct {
controller/deploy/operator/internal/controller/jumpstarter/certificates.go (3)

213-214: Unused certDuration variable with misleading comment.

The certDuration variable is computed but immediately discarded with _ = certDuration. The comment "Will be used in certificate creation" is incorrect since the controller and router certificate functions independently re-fetch duration settings from the same config. This variable and comment can be removed.

🧹 Proposed fix
-	// Store cert duration for server certs
-	_ = certDuration // Will be used in certificate creation
-
 	return cmmeta.ObjectReference{

227-239: Duplicated duration settings extraction logic.

The duration settings extraction logic is duplicated between reconcileControllerCertificate (lines 227-239) and reconcileRouterCertificate (lines 290-302). Consider extracting a helper function.

♻️ Suggested helper extraction
// getCertDurationSettings returns certificate duration and renewBefore settings
func getCertDurationSettings(js *operatorv1alpha1.Jumpstarter) (certDuration, renewBefore time.Duration) {
	certDuration = defaultCertDuration
	renewBefore = defaultRenewBefore

	if js.Spec.CertManager.Server != nil && js.Spec.CertManager.Server.SelfSigned != nil {
		cfg := js.Spec.CertManager.Server.SelfSigned
		if cfg.CertDuration != nil {
			certDuration = cfg.CertDuration.Duration
		}
		if cfg.RenewBefore != nil {
			renewBefore = cfg.RenewBefore.Duration
		}
	}
	return
}

Also applies to: 290-302


468-480: IPv6 address extraction includes brackets.

The extractHostname function returns IPv6 addresses with brackets (e.g., [::1]). While brackets are invalid in DNS names (and thus won't be used as certificate SANs), this behavior is technically correct for extracting the host portion from an address. Consider adding a comment clarifying this edge case.

controller/deploy/operator/internal/controller/jumpstarter/status.go (1)

132-148: Consider adding RBAC-specific error handling for improved diagnostics.

The function assumes CRDs exist when encountering non-NoMatch errors (including RBAC forbidden), which is a valid "fail open" approach to avoid false negatives. However, you could distinguish RBAC forbidden errors using errors.IsForbidden() to provide clearer logging about permission issues versus actual CRD absence. This would help operators differentiate between access problems and missing components during troubleshooting.

@mangelajo mangelajo force-pushed the operator-cert-manager branch from ef9bb32 to 857d6a9 Compare January 23, 2026 19:23
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
controller/deploy/operator/test/e2e/e2e_test.go (1)

365-409: Base Jumpstarter YAML still uses removed useCertManager.
That field is pruned under the new schema; update the test CR to use spec.certManager.enabled: false so the test exercises the current API.

💡 Update test CR to the new certManager field
-  useCertManager: false
+  certManager:
+    enabled: false
🤖 Fix all issues with AI agents
In
`@controller/deploy/operator/config/crd/bases/operator.jumpstarter.dev_jumpstarters.yaml`:
- Around line 432-505: Documentation still references the removed
spec.useCertManager; update any descriptions that mention it to reference the
new spec.certManager block (e.g., use spec.certManager.enabled or the
certManager.server/selfSigned settings). Specifically modify the descriptions
for controller.grpc.tls and routers.grpc.tls to point to spec.certManager (and
certManager.server.issuerRef / certManager.selfSigned) and remove or replace any
guidance that instructs users to set spec.useCertManager. Also scan for other
occurrences of spec.useCertManager and update them to the new certManager
schema.

In
`@controller/deploy/operator/internal/controller/jumpstarter/jumpstarter_controller.go`:
- Around line 641-663: The controller currently only mounts TLS secrets when
CertManager.Enabled is true, ignoring manual TLS when users set
Spec.Controller.GRPC.TLS.CertSecret or Spec.Routers.GRPC.TLS.CertSecret; update
the TLS mounting logic in jumpstarter_controller.go to also handle the
disabled-cert-manager case by checking for a non-empty CertSecret and mounting
that secret (set EXTERNAL_CERT_PEM/EXTERNAL_KEY_PEM env vars, add a tls-certs
VolumeMount and a Volume with SecretName set to the user-provided cert secret
and DefaultMode int32(420)); do the same for the routers block (the other region
around lines 827-849) and reuse the existing helper names (e.g.,
GetControllerCertSecretName / GetRouterCertSecretName or directly use the
CertSecret from Spec) so manual TLS secrets are respected when
CertManager.Enabled is false.

In `@controller/deploy/operator/test/e2e/e2e_suite_test.go`:
- Around line 80-86: The current check uses IsCertManagerCRDsInstalled(), which
only verifies CRDs and can skip runtime readiness; implement a new
IsCertManagerReady() that verifies the cert-manager-webhook Deployment in
namespace "cert-manager" is Available (same readiness logic used in
InstallCertManager()), then update the e2e setup: if !IsCertManagerReady() call
InstallCertManager() (or wait for readiness) else proceed, removing reliance on
IsCertManagerCRDsInstalled() alone; reference IsCertManagerCRDsInstalled,
InstallCertManager, and the cert-manager-webhook Deployment when making the
change.

In `@controller/deploy/operator/test/e2e/e2e_test.go`:
- Around line 1188-1193: The test currently falls back to a hardcoded relative
path when GRPCURL is not set (see the grpcurlPath variable in e2e_test.go);
update the test invocation so GRPCURL is exported from the Makefile instead of
relying on the fallback: modify the test-e2e target to set
GRPCURL=$(LOCALBIN)/grpcurl (and pass KIND_CLUSTER as before) so the environment
provides a concrete grpcurl path at runtime, then remove reliance on the
fallback by ensuring grpcurlPath reads from os.Getenv("GRPCURL") only.
♻️ Duplicate comments (1)
controller/deploy/operator/go.mod (1)

6-6: Version mismatch: go.mod uses v1.17.2 but test utils install v1.19.2.

There's an inconsistency between the cert-manager Go module dependency (v1.17.2) and the cert-manager bundle version installed during tests (v1.19.2 in test/utils/utils.go). This could cause API compatibility issues between compile-time types and runtime CRDs.

Update the go.mod dependency to match the test installation version:

-	github.com/cert-manager/cert-manager v1.17.2
+	github.com/cert-manager/cert-manager v1.19.2

Then run go mod tidy to reconcile transitive dependencies.

🧹 Nitpick comments (4)
controller/deploy/operator/internal/controller/jumpstarter/certificates.go (3)

213-214: Unused variable: certDuration is computed but never used.

The comment states "Will be used in certificate creation" but this value is discarded. The certificate functions (reconcileControllerCertificate, reconcileRouterCertificate) independently re-fetch duration settings from the spec. This appears to be leftover code.

🧹 Suggested cleanup
-	// Store cert duration for server certs
-	_ = certDuration // Will be used in certificate creation
-
 	return cmmeta.ObjectReference{

227-239: Consider extracting duration config retrieval to reduce duplication.

The duration configuration fetching logic (lines 107-123, 227-239, 290-302) is duplicated across three functions. Consider extracting a helper:

♻️ Suggested helper
func (r *JumpstarterReconciler) getDurationConfig(js *operatorv1alpha1.Jumpstarter) (caDuration, certDuration, renewBefore time.Duration) {
    caDuration = defaultCADuration
    certDuration = defaultCertDuration
    renewBefore = defaultRenewBefore
    
    if js.Spec.CertManager.Server != nil && js.Spec.CertManager.Server.SelfSigned != nil {
        cfg := js.Spec.CertManager.Server.SelfSigned
        if cfg.CADuration != nil {
            caDuration = cfg.CADuration.Duration
        }
        if cfg.CertDuration != nil {
            certDuration = cfg.CertDuration.Duration
        }
        if cfg.RenewBefore != nil {
            renewBefore = cfg.RenewBefore.Duration
        }
    }
    return
}

491-499: Consider using slices.Contains from standard library (Go 1.21+).

This custom helper can be replaced with slices.Contains(slice, str) from the slices package, which is available since Go 1.21 (the project uses Go 1.24).

controller/deploy/operator/api/v1alpha1/jumpstarter_types.go (1)

560-572: Consider adding validation for mutual exclusivity of SelfSigned and IssuerRef.

The comment states "Only one of SelfSigned or IssuerRef should be specified" but this isn't enforced by kubebuilder validation. The operator code handles this gracefully (IssuerRef takes priority per reconcileIssuer), but explicit CRD validation would provide better UX.

This could be enforced using a webhook or CEL validation in a future iteration.

@mangelajo mangelajo force-pushed the operator-cert-manager branch from 857d6a9 to b16484e Compare January 24, 2026 19:58
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Fix all issues with AI agents
In `@controller/deploy/operator/internal/controller/jumpstarter/certificates.go`:
- Around line 468-489: The extractHostname function incorrectly chops
unbracketed IPv6 addresses at the last colon; replace the manual parsing with
net.SplitHostPort to safely separate host and port and fall back to the original
address when no port is present. Update extractHostname to first try
net.SplitHostPort(address) and, if it returns an error indicating no port,
return the full address; also keep existing bracket-handling for bracketed IPv6
but prefer SplitHostPort for addresses that may include ports so hostname
extraction works for IPv4, IPv6 (bracketed), and host:port forms.

In `@controller/deploy/operator/internal/controller/jumpstarter/status.go`:
- Around line 294-315: The current loop treating any error from r.Get(ctx,
NamespacedName{...}, deployment) as “not ready” hides real failures; update the
error handling in the router deployment check (inside the loop using
deploymentName, r.Get, allReady, notReadyRouters) to distinguish
apierrors.IsNotFound(err) — in that case mark the router not ready and continue
— but for any other error return/propagate a concrete error (e.g., wrap with
context mentioning deploymentName) so non-NotFound failures surface to the
caller for diagnostics.
- Around line 239-255: The current handling of r.Get(ctx,
types.NamespacedName{Name: secretName, Namespace: js.Namespace}, secret) treats
all errors as “pending”; change it to distinguish not-found from other API
errors by using apierrors.IsNotFound(err) (or equivalent): if IsNotFound then
mark allReady=false and append i to notReadyRouters, but for any other err
return or propagate a concrete error (or log with context including secretName
and js.Namespace) so RBAC/transient API errors are surfaced instead of silently
treated as not-ready; keep the subsequent tls.crt/tls.key checks on secret.Data
unchanged.
♻️ Duplicate comments (5)
controller/deploy/operator/test/e2e/e2e_suite_test.go (1)

80-86: CRD-only check can skip cert-manager installation when controllers are not running.

The IsCertManagerCRDsInstalled() check only verifies CRD presence. CRDs can persist after partial uninstalls while the cert-manager webhook/controllers are down, causing subsequent Issuer/Certificate operations to fail.

As suggested in the prior review, consider implementing an IsCertManagerReady() function that verifies the cert-manager-webhook deployment is Available, matching the readiness criteria already used in InstallCertManager().

controller/deploy/operator/test/e2e/e2e_test.go (1)

1189-1193: Consider passing GRPCURL via environment from the Makefile.

The relative path fallback works but couples the test to the directory structure. Exporting GRPCURL explicitly from the Makefile's test-e2e target would make the dependency explicit.

controller/deploy/operator/config/crd/bases/operator.jumpstarter.dev_jumpstarters.yaml (1)

1144-1158: TLS description still references removed spec.useCertManager field.

Lines 1148-1149 mention "useCertManager" and "spec.useCertManager" but this field was replaced by spec.certManager.enabled. The same issue exists in routers.grpc.tls (lines 1693-1694) and controller.restApi.tls (lines 1440-1441).

These descriptions are generated from the Go type comments in jumpstarter_types.go (GRPCConfig.TLS field around lines 278-282).

controller/deploy/operator/go.mod (1)

6-6: cert-manager v1.17.2 is no longer a supported release line.

As noted in a previous review, cert-manager v1.17.x is no longer supported. The project currently supports only v1.19 and v1.18 release lines. Consider upgrading to v1.19.2 (latest stable) for continued security updates.

controller/deploy/operator/internal/controller/jumpstarter/jumpstarter_controller.go (1)

641-663: Manual TLS secrets still ignored when cert-manager is disabled.

The TLS mount logic is still gated solely on CertManager.Enabled, so Spec.Controller.GRPC.TLS.CertSecret / Spec.Routers.GRPC.TLS.CertSecret remain unused.

Also applies to: 827-849

🧹 Nitpick comments (4)
controller/hack/utils (1)

177-196: CRD-only check may skip installation when cert-manager controllers are not running.

is_cert_manager_installed() only verifies that the certificates.cert-manager.io CRD exists. CRDs can persist after a partial uninstall or failed deployment while the cert-manager webhook and controllers are not running. This could cause downstream scripts to skip installation and then fail when attempting to create Issuer/Certificate resources.

Consider adding a webhook deployment readiness check (similar to what install_cert_manager() already waits for) as part of the installation check:

🔧 Suggested improvement
 # Check if cert-manager CRDs are installed in the cluster
 is_cert_manager_installed() {
-  kubectl get crd certificates.cert-manager.io > /dev/null 2>&1
+  # Check both CRDs and webhook deployment readiness
+  kubectl get crd certificates.cert-manager.io > /dev/null 2>&1 && \
+    kubectl get deployment cert-manager-webhook -n cert-manager > /dev/null 2>&1 && \
+    kubectl wait --namespace cert-manager \
+      --for=condition=available deployment/cert-manager-webhook \
+      --timeout=5s > /dev/null 2>&1
 }
controller/deploy/operator/test/e2e/e2e_test.go (1)

1291-1299: Consider using Fail() for clearer failure semantics.

The pattern g.Expect(false).To(BeTrue(), msg) works but is unconventional. Using Ginkgo's Fail() or the idiomatic g.Expect(true).To(BeFalse(), msg) more clearly expresses failure intent.

♻️ Optional refactor
 		// Check for Ready condition
 		for _, cond := range issuer.Status.Conditions {
 			if cond.Type == certmanagerv1.IssuerConditionReady {
 				g.Expect(cond.Status).To(Equal(cmmeta.ConditionTrue),
 					fmt.Sprintf("Issuer %s is not ready: %s", name, cond.Message))
 				return
 			}
 		}
-		g.Expect(false).To(BeTrue(), fmt.Sprintf("Issuer %s has no Ready condition", name))
+		Fail(fmt.Sprintf("Issuer %s has no Ready condition", name))

This pattern also appears in waitForClusterIssuerReady (line 1324) and waitForCertificateReady (line 1350).

controller/deploy/operator/api/v1alpha1/jumpstarter_types.go (1)

560-572: Consider adding validation for mutual exclusivity between SelfSigned and IssuerRef.

The comment states "Only one of SelfSigned or IssuerRef should be specified" but there's no kubebuilder validation marker to enforce this. Users could configure both, leading to ambiguous behavior.

Consider adding a CEL validation rule or documenting the precedence clearly.

💡 Example CEL validation
// +kubebuilder:validation:XValidation:rule="!(has(self.selfSigned) && has(self.issuerRef))",message="only one of selfSigned or issuerRef may be specified"
type ServerCertConfig struct {
controller/deploy/operator/internal/controller/jumpstarter/status.go (1)

43-88: Set cert-manager conditions when integration is disabled.

When CertManager.Enabled flips from true → false, existing cert-manager conditions are left untouched, which can leave stale readiness info in status. Consider explicitly setting these conditions to False (or a “Disabled” reason) when integration is off.

♻️ Proposed adjustment
-	if js.Spec.CertManager.Enabled {
+	if js.Spec.CertManager.Enabled {
 		// cert-manager checks...
-	}
+	} else {
+		setCondition(js, operatorv1alpha1.ConditionTypeCertManagerAvailable, false, "Disabled", "cert-manager integration disabled")
+		setCondition(js, operatorv1alpha1.ConditionTypeIssuerReady, false, "Disabled", "cert-manager integration disabled")
+		setCondition(js, operatorv1alpha1.ConditionTypeControllerCertificateReady, false, "Disabled", "cert-manager integration disabled")
+		setCondition(js, operatorv1alpha1.ConditionTypeRouterCertificatesReady, false, "Disabled", "cert-manager integration disabled")
+	}

Copy link
Copy Markdown
Member Author

@mangelajo mangelajo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lots of inline comments, handling.

Comment on lines +264 to +267
PrivateKey: &certmanagerv1.CertificatePrivateKey{
Algorithm: certmanagerv1.ECDSAKeyAlgorithm,
Size: 256,
},
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We probably want to make this configurable in the future, but not yet, this patch is already big enough.

@mangelajo mangelajo force-pushed the operator-cert-manager branch from da14773 to dd573f6 Compare January 26, 2026 16:13
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Fix all issues with AI agents
In
`@controller/deploy/operator/bundle/manifests/operator.jumpstarter.dev_jumpstarters.yaml`:
- Around line 432-505: The schema allows ambiguous cert-manager configs because
certManager.enabled can be true while server has neither or both issuerRef and
selfSigned; update the CRD/OpenAPI schema for certManager.server to enforce
exactly one mode by adding a oneOf constraint that requires either the presence
of issuerRef (object with required name) or the presence of selfSigned (object)
and disallows both; ensure this validation is applied under the
certManager.server schema so manifests with certManager.enabled=true must select
exactly one of issuerRef or selfSigned (and will be rejected at validation
time).

In
`@controller/deploy/operator/internal/controller/jumpstarter/jumpstarter_controller.go`:
- Around line 595-670: When cert-manager is enabled the code unconditionally
uses GetControllerCertSecretName(jumpstarter) and ignores
jumpstarter.Spec.Controller.GRPC.TLS.CertSecret causing secret name mismatches;
change the logic in the TLS secret selection blocks to prefer
jumpstarter.Spec.Controller.GRPC.TLS.CertSecret when non-empty even if
jumpstarter.Spec.CertManager.Enabled is true (fall back to
GetControllerCertSecretName(...) only if CertSecret is empty), apply the same
pattern for the router TLS selection, and update
reconcileControllerCertificate() and reconcileRouterCertificate() to accept/use
the resolved secret name (rather than always calling Get*CertSecretName) so the
controller and router certificate reconciliation use the exact secret chosen for
mounting.
♻️ Duplicate comments (2)
controller/deploy/operator/config/crd/bases/operator.jumpstarter.dev_jumpstarters.yaml (1)

1167-1179: Replace useCertManager references with spec.certManager.enabled in TLS docs.
These sections still point to the removed field and can mislead users.

✏️ Example edit (apply similarly in all TLS sections)
-                          If useCertManager is enabled, this secret will be automatically managed and
-                          configured by cert-manager.
+                          If spec.certManager.enabled is true, this secret will be automatically managed and
+                          configured by cert-manager.
#!/bin/bash
# Find remaining useCertManager references in the CRD
rg -n "useCertManager" controller/deploy/operator/config/crd/bases/operator.jumpstarter.dev_jumpstarters.yaml

Also applies to: 1463-1465, 1714-1725

controller/deploy/operator/api/v1alpha1/jumpstarter_types.go (1)

389-390: Update TLSConfig docs to reference spec.certManager.enabled.
The comment still mentions useCertManager, which was replaced by the certManager block.

✏️ Suggested edit
-	// If useCertManager is enabled, this secret will be automatically managed and
-	// configured by cert-manager.
+	// If spec.certManager.enabled is true, this secret will be automatically managed and
+	// configured by cert-manager.
#!/bin/bash
# Find remaining useCertManager references in this file
rg -n "useCertManager" controller/deploy/operator/api/v1alpha1/jumpstarter_types.go
🧹 Nitpick comments (1)
controller/go.mod (1)

19-28: Update google.golang.org/genproto/googleapis/api to a more recent version.

The version v0.0.0-20250204164813-702378808489 (Feb 4, 2025) is significantly outdated; as of January 26, 2026, the latest is v0.0.0-20260122232226-8e98ce8d340d (Jan 22, 2026). Consider updating to a recent 2026 build to align with current API definitions and fixes.

No security advisories exist for google.golang.org/genproto itself or sigs.k8s.io/yaml v1.4.0. The protobuf dependency (v1.36.5) includes fixes for known DoS and parsing vulnerabilities (GO-2024-2611, GO-2023-1631), so the current stack is secure.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
controller/deploy/operator/config/crd/bases/operator.jumpstarter.dev_jumpstarters.yaml (3)

1167-1182: TLS description still references removed useCertManager field.

Lines 1171-1172 and 1178-1179 reference useCertManager and spec.useCertManager, but this field was replaced with spec.certManager.enabled. This creates documentation inconsistency.

📝 Suggested fix
                       tls:
                         description: |-
                           TLS configuration for secure gRPC communication.
                           Requires a Kubernetes secret containing the TLS certificate and private key.
-                          If useCertManager is enabled, this secret will be automatically created.
-                          See also: spec.useCertManager for automatic certificate management.
+                          If spec.certManager.enabled is true, this secret will be automatically created.
+                          See also: spec.certManager for automatic certificate management.
                         properties:
                           certSecret:
                             description: |-
                               Name of the Kubernetes secret containing the TLS certificate and private key.
                               The secret must contain 'tls.crt' and 'tls.key' keys.
-                              If useCertManager is enabled, this secret will be automatically managed and
-                              configured by cert-manager.
+                              If spec.certManager.enabled is true, this secret will be automatically managed
+                              by cert-manager.

1455-1468: Same stale useCertManager reference in REST API TLS config.

Lines 1464-1465 have the same outdated reference. Apply the same fix as suggested for controller.grpc.tls.


1714-1729: Same stale useCertManager references in router TLS config.

Lines 1718-1719 and 1725-1726 have outdated references. Update to reference spec.certManager.enabled and spec.certManager for consistency with the new schema.

♻️ Duplicate comments (1)
controller/deploy/operator/bundle/manifests/operator.jumpstarter.dev_jumpstarters.yaml (1)

443-517: Schema still allows ambiguous cert-manager configuration.

The certManager.server schema permits both issuerRef and selfSigned to be specified simultaneously (or neither), which can lead to ambiguous configurations that only fail at reconcile time. A oneOf constraint would enforce exactly one mode at validation time.

This was flagged in a previous review - consider addressing when convenient.

🧹 Nitpick comments (4)
controller/deploy/operator/test/e2e/e2e_suite_test.go (1)

90-92: Consider adding cleanup comments or logic.

The AfterSuite is empty, which is acceptable if the e2e tests run on ephemeral Kind clusters that are destroyed afterward. If the tests run on persistent clusters, consider whether cert-manager should be uninstalled or if the empty block is intentional and deserves a clarifying comment.

controller/deploy/operator/api/v1alpha1/jumpstarter_types.go (1)

567-570: Minor: Fix comment formatting.

Line 569 is missing a space after the comment delimiter.

📝 Suggested fix
 	// Reference an existing cert-manager Issuer or ClusterIssuer.
 	// Use this to integrate with existing PKI infrastructure (ACME, Vault, etc.).
-	//This overrides SelfSigned.Enabled = true which is the default setting
+	// This overrides SelfSigned.Enabled = true which is the default setting
 	IssuerRef *IssuerReference `json:"issuerRef,omitempty"`
controller/deploy/operator/internal/controller/jumpstarter/status.go (1)

132-148: Consider the implications of assuming CRDs exist on RBAC errors.

Returning true on non-IsNoMatchError errors (e.g., RBAC issues) allows the reconciliation to proceed but may cause confusing downstream failures if cert-manager isn't actually installed. The V(1) logging helps with debugging, but operators may not notice the issue until certificate creation fails.

This is a design trade-off - the current approach prioritizes not blocking on transient errors, which is reasonable for initial deployment scenarios.

controller/deploy/operator/config/crd/bases/operator.jumpstarter.dev_jumpstarters.yaml (1)

362-372: Consider adding enum validation for egressSelectorType.

The description states valid choices are "controlplane" and "cluster", but the schema only defines type: string without an enum constraint. Adding validation would prevent invalid values at admission time.

🔧 Suggested schema change
                             egressSelectorType:
                               description: |-
                                 egressSelectorType is an indicator of which egress selection should be used for sending all traffic related
                                 to this issuer (discovery, JWKS, distributed claims, etc).  If unspecified, no custom dialer is used.
                                 When specified, the valid choices are "controlplane" and "cluster".  These correspond to the associated
                                 values in the --egress-selector-config-file.

                                 - controlplane: for traffic intended to go to the control plane.

                                 - cluster: for traffic intended to go to the system being managed by Kubernetes.
                               type: string
+                              enum:
+                              - controlplane
+                              - cluster

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@controller/deploy/operator/internal/controller/jumpstarter/certificates.go`:
- Around line 214-215: Remove the dead certDuration variable and its discarded
assignment: delete the certDuration declaration and the "_ = certDuration" line,
since reconcileControllerCertificate and reconcileRouterCertificate compute
their own durations; also remove the other unused assignments mentioned in the
review (the earlier unused assignment and the unused block referenced in the
comment) so no leftover/unused duration or assignment statements remain in this
file.
🧹 Nitpick comments (1)
controller/deploy/operator/internal/controller/jumpstarter/certificates.go (1)

228-240: Consider extracting duration configuration into a helper.

The duration extraction logic is duplicated in reconcileSelfSignedIssuer, reconcileControllerCertificate, and reconcileRouterCertificate. A helper function would reduce duplication and ensure consistent behavior.

♻️ Proposed refactor
// getCertDurations returns cert duration and renewBefore from config with defaults.
func getCertDurations(js *operatorv1alpha1.Jumpstarter) (certDuration, renewBefore time.Duration) {
	certDuration = defaultCertDuration
	renewBefore = defaultRenewBefore

	if js.Spec.CertManager.Server != nil && js.Spec.CertManager.Server.SelfSigned != nil {
		cfg := js.Spec.CertManager.Server.SelfSigned
		if cfg.CertDuration != nil {
			certDuration = cfg.CertDuration.Duration
		}
		if cfg.RenewBefore != nil {
			renewBefore = cfg.RenewBefore.Duration
		}
	}
	return
}

// clampRenewBefore ensures renewBefore is less than duration.
func clampRenewBefore(duration, renewBefore time.Duration) time.Duration {
	if renewBefore >= duration {
		return duration / 2
	}
	return renewBefore
}

Also applies to: 291-303

@mangelajo
Copy link
Copy Markdown
Member Author

@bennyz @evakhoni this one is long, I recommend looking at the e2e tests for understanding it a bit better: https://github.com/jumpstarter-dev/jumpstarter/pull/139/changes#diff-0ddae967b2d602add17ae3915e5fdf7d51bcb787ef477bc209594517997e1f71R771

This only creates the operator behavior, I need to add support on the jmp admin tool to setup the CA bundle on clients when those are created, etc. And then E2E testing on our overall e2e tests.

@mangelajo
Copy link
Copy Markdown
Member Author

Also the tests for this are ran on Kind based CI / e2e-test-operator (pull_request)Started 7 minutes ago — This check has started...

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
controller/deploy/operator/api/v1alpha1/jumpstarter_types.go (1)

385-392: Update TLSConfig comment to the new certManager field.
Line 389-390 still mentions useCertManager, which was removed. This can mislead users reading the API docs.

🧹 Suggested doc fix
-	// If useCertManager is enabled, this secret will be automatically managed and
-	// configured by cert-manager.
+	// If spec.certManager.enabled is true, this secret will be automatically managed
+	// and configured by cert-manager.
🤖 Fix all issues with AI agents
In `@controller/deploy/operator/internal/controller/jumpstarter/certificates.go`:
- Around line 235-303: reconcileServerCertificate currently puts all endpoint
host strings (from extractHostname) into Certificate.Spec.DNSNames, but
cert-manager v1 requires IP addresses to be in Certificate.Spec.IPAddresses;
update reconcileServerCertificate to iterate the input dnsNames, use net.ParseIP
on each entry and append parsed IPs to an ipAddrs slice and non-IPs to a dns
slice, then set cert.Spec.DNSNames = dns and cert.Spec.IPAddresses = ipAddrs;
ensure you import net and reference the CertificateSpec.IPAddresses field when
building the cert object.
- Around line 322-353: The collectControllerDNSNames function currently only
adds gRPC endpoints to the SAN list; update it to also iterate
js.Spec.Controller.RestAPI.Endpoints, extract the hostname using
extractHostname, and append each non-empty, non-duplicate host to dnsNames
(using contains to check duplicates) so REST API hostnames are included in the
controller certificate SANs; ensure the same hostname de-duplication logic used
for GRPC endpoints is applied to RestAPI endpoints and keep the existing default
names logic.

Comment on lines +322 to +353
// collectControllerDNSNames collects all DNS names for the controller certificate.
func (r *JumpstarterReconciler) collectControllerDNSNames(js *operatorv1alpha1.Jumpstarter) []string {
dnsNames := make([]string, 0)

// Add default controller service name
dnsNames = append(dnsNames,
fmt.Sprintf("%s-controller", js.Name),
fmt.Sprintf("%s-controller.%s", js.Name, js.Namespace),
fmt.Sprintf("%s-controller.%s.svc", js.Name, js.Namespace),
fmt.Sprintf("%s-controller.%s.svc.cluster.local", js.Name, js.Namespace),
)

// Add DNS names from configured endpoints
for _, endpoint := range js.Spec.Controller.GRPC.Endpoints {
if endpoint.Address != "" {
host := extractHostname(endpoint.Address)
if host != "" && !contains(dnsNames, host) {
dnsNames = append(dnsNames, host)
}
}
}

// Add default domain-based name
if js.Spec.BaseDomain != "" {
defaultName := fmt.Sprintf("grpc.%s", js.Spec.BaseDomain)
if !contains(dnsNames, defaultName) {
dnsNames = append(dnsNames, defaultName)
}
}

return dnsNames
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
rg -n "GetControllerCertSecretName|controllerCertSuffix" -g '*.go'
echo "---"
rg -n "RestAPI|restApi" controller/deploy/operator/internal/controller/jumpstarter -g '*.go'

Repository: jumpstarter-dev/jumpstarter

Length of output: 1017


🏁 Script executed:

#!/bin/bash
# Find the Jumpstarter spec definition
fd -e go -e yaml | xargs rg -l "Jumpstarter.*Spec" | head -5
echo "---"
# Look for the operatorv1alpha1 types
fd "types.go|jumpstarter.*types" -e go

Repository: jumpstarter-dev/jumpstarter

Length of output: 960


🏁 Script executed:

#!/bin/bash
# Read the Jumpstarter types file
cat -n controller/deploy/operator/api/v1alpha1/jumpstarter_types.go | head -200

Repository: jumpstarter-dev/jumpstarter

Length of output: 8745


🏁 Script executed:

#!/bin/bash
# Continue reading to find ControllerConfig definition
sed -n '200,400p' controller/deploy/operator/api/v1alpha1/jumpstarter_types.go | cat -n

Repository: jumpstarter-dev/jumpstarter

Length of output: 11026


🏁 Script executed:

#!/bin/bash
# Continue reading to see RestAPIConfig full definition
sed -n '400,450p' controller/deploy/operator/api/v1alpha1/jumpstarter_types.go | cat -n

Repository: jumpstarter-dev/jumpstarter

Length of output: 2997


🏁 Script executed:

#!/bin/bash
# Look at the reconcile logic in certificates.go to understand cert generation
rg -n "func.*reconcile.*[Cc]ert|controllerCert|GetControllerCertSecretName" controller/deploy/operator/internal/controller/jumpstarter/certificates.go -A 5

Repository: jumpstarter-dev/jumpstarter

Length of output: 2868


🏁 Script executed:

#!/bin/bash
# Search for REST API certificate handling
rg -n "RestAPI|restApi" controller/deploy/operator/internal/controller/jumpstarter/certificates.go
echo "---"
# Check if there's separate REST API cert reconciliation
rg -n "reconcileRestAPI|RestAPI.*[Cc]ert" controller/deploy/operator/internal/controller/jumpstarter/ -g '*.go'

Repository: jumpstarter-dev/jumpstarter

Length of output: 74


🏁 Script executed:

#!/bin/bash
# Check the deployment generation to see how TLS is handled
rg -n "controller-tls|tls\.crt|tls\.key|TLS|RestAPI" controller/deploy/operator/internal/controller/jumpstarter/deployments.go -A 2 -B 2 | head -100

Repository: jumpstarter-dev/jumpstarter

Length of output: 184


🏁 Script executed:

#!/bin/bash
# Find deployment-related files
fd -e go "deploy" controller/deploy/operator/internal/controller/jumpstarter/
echo "---"
# List all go files in jumpstarter directory
fd -e go controller/deploy/operator/internal/controller/jumpstarter/ | head -20

Repository: jumpstarter-dev/jumpstarter

Length of output: 633


🏁 Script executed:

#!/bin/bash
# List all go files in jumpstarter directory
fd -e go . controller/deploy/operator/internal/controller/jumpstarter/ | sort

Repository: jumpstarter-dev/jumpstarter

Length of output: 1345


🏁 Script executed:

#!/bin/bash
# Check jumpstarter_controller.go around line 644
sed -n '630,660p' controller/deploy/operator/internal/controller/jumpstarter/jumpstarter_controller.go | cat -n
echo "---"
# Check status.go around line 210
sed -n '200,220p' controller/deploy/operator/internal/controller/jumpstarter/status.go | cat -n

Repository: jumpstarter-dev/jumpstarter

Length of output: 2282


🏁 Script executed:

#!/bin/bash
# Read the collectControllerDNSNames function fully
sed -n '322,353p' controller/deploy/operator/internal/controller/jumpstarter/certificates.go | cat -n

Repository: jumpstarter-dev/jumpstarter

Length of output: 1339


Add REST API endpoints to controller certificate SANs.

The collectControllerDNSNames function only includes gRPC endpoints. When cert-manager is enabled, the same controller TLS certificate is mounted for both REST API and gRPC services, but the certificate lacks SANs for REST API hostnames configured in js.Spec.Controller.RestAPI.Endpoints, causing TLS validation failures for REST clients.

✅ Suggested fix
 	// Add DNS names from configured endpoints
 	for _, endpoint := range js.Spec.Controller.GRPC.Endpoints {
 		if endpoint.Address != "" {
 			host := extractHostname(endpoint.Address)
 			if host != "" && !contains(dnsNames, host) {
 				dnsNames = append(dnsNames, host)
 			}
 		}
 	}
+
+	// Add DNS names from configured REST API endpoints
+	for _, endpoint := range js.Spec.Controller.RestAPI.Endpoints {
+		if endpoint.Address != "" {
+			host := extractHostname(endpoint.Address)
+			if host != "" && !contains(dnsNames, host) {
+				dnsNames = append(dnsNames, host)
+			}
+		}
+	}
🤖 Prompt for AI Agents
In `@controller/deploy/operator/internal/controller/jumpstarter/certificates.go`
around lines 322 - 353, The collectControllerDNSNames function currently only
adds gRPC endpoints to the SAN list; update it to also iterate
js.Spec.Controller.RestAPI.Endpoints, extract the hostname using
extractHostname, and append each non-empty, non-duplicate host to dnsNames
(using contains to check duplicates) so REST API hostnames are included in the
controller certificate SANs; ensure the same hostname de-duplication logic used
for GRPC endpoints is applied to RestAPI endpoints and keep the existing default
names logic.

mangelajo and others added 9 commits January 28, 2026 11:43
Add new types for cert-manager integration:
- CertManagerConfig: Main configuration struct with enabled flag
- ServerCertConfig: Server certificate configuration (selfSigned or issuerRef)
- SelfSignedConfig: Self-signed CA configuration with duration settings
- IssuerReference: Reference to existing cert-manager Issuer/ClusterIssuer

Replace the simple UseCertManager bool with the structured CertManager field
to support both self-signed CA and external issuer modes.

Resolves: controller-6ih

Co-Authored-By: Claude <noreply@anthropic.com>
Add Conditions slice to JumpstarterStatus following Kubernetes conventions.
All status information is now expressed through conditions instead of
separate boolean fields.

Condition types added:
- CertManagerAvailable: cert-manager CRDs installed
- IssuerReady: Issuer ready to issue certificates
- ControllerCertificateReady: Controller TLS cert ready
- RouterCertificatesReady: All router TLS certs ready
- ControllerDeploymentReady: Controller deployment available
- RouterDeploymentsReady: Router deployments available
- Ready: Aggregate condition for overall system readiness

Also adds condition type constants for consistent referencing.

Resolves: controller-qky

Co-Authored-By: Claude <noreply@anthropic.com>
Add certificates.go with cert-manager resource reconciliation:
- reconcileCertificates: Main entry point for certificate management
- reconcileSelfSignedIssuer: Creates self-signed CA infrastructure
- reconcileControllerCertificate: Issues TLS cert for controller
- reconcileRouterCertificate: Issues TLS certs for each router replica

The self-signed flow creates:
1. SelfSigned Issuer (bootstrap)
2. CA Certificate signed by self-signed issuer
3. CA Issuer using the CA certificate secret
4. Server certificates signed by CA Issuer

Also supports external issuer mode via issuerRef configuration.

Changes:
- Add cert-manager v1.17.2 dependency
- Register cert-manager scheme in main.go
- Call reconcileCertificates from main reconcile loop
- Add helper functions for DNS name collection

Resolves: controller-jrs

Co-Authored-By: Claude <noreply@anthropic.com>
Modify createControllerDeployment and createRouterDeployment to:
- Mount TLS certificate secret when cert-manager is enabled
- Add EXTERNAL_CERT_PEM and EXTERNAL_KEY_PEM environment variables
- Secret names are derived from Jumpstarter CR name

The controller reads TLS certs from /tls/tls.crt and /tls/tls.key
when cert-manager integration is enabled.

Resolves: controller-7hb

Co-Authored-By: Claude <noreply@anthropic.com>
Add status.go with comprehensive status checking:
- checkCertManagerCRDs: Verify cert-manager is installed
- checkIssuerReady: Check issuer (CA or external) is ready
- checkControllerCertificateReady: Verify controller TLS secret exists
- checkRouterCertificatesReady: Verify all router TLS secrets exist
- checkControllerDeploymentReady: Check controller deployment available
- checkRouterDeploymentsReady: Check all router deployments available

All status is reported through conditions:
- CertManagerAvailable
- IssuerReady
- ControllerCertificateReady
- RouterCertificatesReady
- ControllerDeploymentReady
- RouterDeploymentsReady
- Ready (aggregate)

Resolves: controller-30w

Co-Authored-By: Claude <noreply@anthropic.com>
Add kubebuilder RBAC markers for:
- cert-manager.io/issuers: full CRUD for managing self-signed issuers
- cert-manager.io/clusterissuers: read-only for external issuer references
- cert-manager.io/certificates: full CRUD for TLS certificate management
- Status read access for checking issuer/certificate readiness

These markers will generate the necessary ClusterRole rules
when running make manifests.

Resolves: controller-a7q

Co-Authored-By: Claude <noreply@anthropic.com>
Add cert-manager installation support to deployment scripts:

controller/hack/utils:
- Add CERTMANAGER_VERSION env var (default: v1.19.2)
- Add is_cert_manager_installed() function
- Add install_cert_manager() function with webhook wait

controller/hack/deploy_with_operator.sh:
- Add USE_CERTMANAGER env var (default: false)
- Install cert-manager when USE_CERTMANAGER=true
- Configure Jumpstarter CR with certManager config

Usage:
  ./hack/deploy_with_operator.sh                      # Without cert-manager
  USE_CERTMANAGER=true ./hack/deploy_with_operator.sh # With cert-manager

Resolves: controller-rub

Co-Authored-By: Claude <noreply@anthropic.com>
Update test utilities and e2e setup for cert-manager:

test/utils/utils.go:
- Update certmanagerVersion to v1.19.2

test/e2e/e2e_suite_test.go:
- Add InstallCertManager to BeforeSuite
- Check if cert-manager is already installed before installing
- Import test utils package

This ensures cert-manager is available for e2e tests that
verify certificate management functionality.

Resolves: controller-258

Co-Authored-By: Claude <noreply@anthropic.com>
Run make generate and make manifests to update:

api/v1alpha1/zz_generated.deepcopy.go:
- Add DeepCopy functions for new cert-manager types
- CertManagerConfig, ServerCertConfig, SelfSignedConfig, IssuerReference

config/crd/bases/operator.jumpstarter.dev_jumpstarters.yaml:
- Add certManager spec field with nested configuration
- Add conditions to status with proper list type hints

config/rbac/role.yaml:
- Add cert-manager.io/issuers permissions
- Add cert-manager.io/clusterissuers permissions
- Add cert-manager.io/certificates permissions

Resolves: controller-utt

Co-Authored-By: Claude <noreply@anthropic.com>
mangelajo and others added 15 commits January 28, 2026 11:46
Add comprehensive e2e tests for cert-manager integration:

e2e_suite_test.go:
- Add cert-manager scheme import and registration

e2e_test.go:
- Add cert-manager and meta imports
- Add helper functions:
  - verifyCondition: Check condition status
  - waitForCondition: Wait for condition with timeout
  - getJumpstarterConditions: Get all conditions
  - createSelfSignedClusterIssuer: Create test ClusterIssuer
  - waitForIssuerReady: Wait for Issuer Ready condition
  - waitForClusterIssuerReady: Wait for ClusterIssuer Ready
  - waitForCertificateReady: Wait for Certificate Ready
  - verifyTLSSecret: Verify TLS secret has required keys
  - verifyDeploymentHasTLSMount: Verify TLS volume mount/env
  - dumpCertManagerResourcesOnFailure: Debug logging

Test Contexts added:
1. cert-manager self-signed mode:
   - Deploys Jumpstarter with selfSigned enabled
   - Verifies self-signed issuer, CA cert, CA issuer
   - Verifies controller and router certificates
   - Verifies TLS secrets exist with tls.crt/tls.key
   - Verifies deployment TLS mounts and env vars
   - Verifies all 7 status conditions are True

2. cert-manager external issuer mode:
   - Creates test ClusterIssuer
   - Deploys Jumpstarter with issuerRef
   - Verifies no self-signed issuer created
   - Verifies certificates reference external issuer
   - Verifies TLS secrets are populated

Co-Authored-By: Claude <noreply@anthropic.com>
- Add ContinueOnFailure to top-level Describe so that cert-manager
  tests run even if earlier operator tests fail
- Fix grpcurl path from ../../../../bin/grpcurl to ../../bin/grpcurl
  (tests run from controller/deploy/operator/ directory)

Co-Authored-By: Claude <noreply@anthropic.com>
- Set DefaultMode explicitly on TLS secret volumes to avoid reconciliation
  loop. Kubernetes defaults secret volumes to mode 420 (0644), but when
  the operator creates deployments without this field, the comparison
  sees a difference and triggers continuous updates.

- Increase deployment readiness timeouts from 3 to 5 minutes in e2e tests
  to accommodate slower environments (image pulls, pod scheduling).

Co-Authored-By: Claude <noreply@anthropic.com>
- Use r.Get/r.List instead of r.Client.Get/r.Client.List (staticcheck QF1008)
- Add defaultControllerImage constant to avoid string duplication (goconst)

Remaining linter issues are pre-existing:
- errcheck for podLogs.Close() in test code
- deprecated corev1.Endpoints usage
- dot imports in test utils
- unused log parameter in compare.go

Co-Authored-By: Claude <noreply@anthropic.com>
Co-Authored-By: Claude <noreply@anthropic.com>
Update the TLS mounting logic in jumpstarter_controller.go to handle
manual TLS secrets when CertManager.Enabled is false. The controller
now checks for non-empty CertSecret values in both Controller and
Router configurations and mounts those secrets appropriately.

Changes:
- Controller deployment now mounts Spec.Controller.GRPC.TLS.CertSecret
  when CertManager is disabled and the field is set
- Router deployments now mount Spec.Routers.GRPC.TLS.CertSecret
  when CertManager is disabled and the field is set
- Both use the same env vars (EXTERNAL_CERT_PEM/EXTERNAL_KEY_PEM),
  volume mount configuration (tls-certs), and DefaultMode (420)
- Maintains backward compatibility with existing cert-manager workflow

This allows users to provide their own TLS certificates without
requiring cert-manager to be enabled.
Replace manual string parsing in extractHostname with net.SplitHostPort to
properly handle IPv6 addresses. The previous implementation incorrectly split
unbracketed IPv6 addresses at the last colon, treating part of the address as
a port. Now correctly handles IPv4, IPv6 (bracketed and unbracketed), and
host:port forms.

Co-Authored-By: Claude <noreply@anthropic.com>
- Remove unused certDuration variable and dead code assignments in
  reconcileSelfSignedIssuer that were never used since
  reconcileControllerCertificate and reconcileRouterCertificate
  compute their own durations
- Extract duration extraction logic into getServerCertDurationSettings
  helper to eliminate duplication across three functions
- Refactor reconcileControllerCertificate and reconcileRouterCertificate
  to use new reconcileServerCertificate helper, reducing ~100 lines of
  duplicated certificate creation logic
- Add logging when renewBefore duration is capped to half of certificate
  duration for CA, controller, and router certificates
- Fix comment formatting in ServerCertConfig

Co-authored-by: Cursor AI <cursor@cursor.sh>
Update reconcileServerCertificate to properly separate IP addresses from DNS
names for cert-manager v1 compatibility. IP addresses must be placed in
Certificate.Spec.IPAddresses instead of Certificate.Spec.DNSNames.

Co-authored-by: Cursor <cursoragent@cursor.com>
@mangelajo mangelajo force-pushed the operator-cert-manager branch from ab6104e to b6c5b60 Compare January 28, 2026 11:11
@mangelajo mangelajo requested a review from bkhizgiy January 28, 2026 11:13
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
controller/deploy/operator/internal/controller/jumpstarter/suite_test.go (1)

71-75: Cert-manager reconciliation code is currently untested due to missing CRDs in envtest.

The test explicitly disables cert-manager (line 59 of jumpstarter_controller_test.go: CertManager.Enabled: false with comment "cert-manager CRDs not available in envtest"), which means the certificate reconciliation logic in certificates.go and issuer status checks in status.go are never exercised. Either add cert-manager CRD manifests to the envtest environment (via CRDDirectoryPaths or the envtest helper), or create a dedicated test suite for cert-manager functionality with cert-manager CRDs installed. The current workaround masks missing test coverage for production code.

controller/deploy/operator/test/e2e/e2e_test.go (1)

373-412: Outdated API field in test YAML.

The YAML uses useCertManager: false (line 380), but the CRD has been updated to use a nested certManager block with an enabled field. This test may fail or behave unexpectedly with the new CRD schema.

🐛 Suggested fix
 spec:
   baseDomain: %s
-  useCertManager: false
+  certManager:
+    enabled: false
   authentication:
🤖 Fix all issues with AI agents
In `@controller/deploy/operator/api/v1alpha1/jumpstarter_types.go`:
- Around line 384-385: The comment mentioning the removed field useCertManager
is outdated; update the doc comment (near the Jumpstarter type/field where
useCertManager is referenced) to reference the new field path
spec.certManager.enabled (or certManager.enabled) instead, e.g., replace
"useCertManager" with "spec.certManager.enabled" so CRD docs point to the
certManager.enabled flag.

In `@controller/deploy/operator/internal/controller/jumpstarter/certificates.go`:
- Around line 109-123: The reconciler currently always falls back to creating a
self‑signed CA via reconcileSelfSignedIssuer when no external IssuerRef is
provided; change this to honor the selfSigned.enabled flag by checking
js.Spec.CertManager.SelfSigned.Enabled (or equivalent field) before calling
reconcileSelfSignedIssuer. If SelfSigned.Enabled is true, call and return
r.reconcileSelfSignedIssuer(ctx, js); if false, return an empty
cmmeta.ObjectReference and nil (or an explicit no-issuer error if the callsite
expects one) so no self-signed issuer is created. Ensure you reference
js.Spec.CertManager.Server.IssuerRef and reconcileSelfSignedIssuer in your
change.

In `@controller/hack/utils`:
- Around line 186-193: Update the readiness waits to ensure all cert-manager
components are available by adding kubectl wait calls for
deployment/cert-manager and deployment/cainjector in addition to
deployment/cert-manager-webhook; modify the block that currently calls "kubectl
wait --namespace cert-manager --for=condition=available
deployment/cert-manager-webhook --timeout=300s" to also wait for
"deployment/cert-manager" and "deployment/cainjector" (reuse the same namespace
and timeout), and apply the same change in the other two occurrences referenced
in the review (the versions in controller/test/utils/utils.go and
controller/deploy/operator/test/utils/utils.go).
🧹 Nitpick comments (2)
controller/hack/utils (1)

16-16: Align CERTMANAGER_VERSION with go.mod dependency or document the mismatch.

Line 16 defaults to v1.19.2 while controller/deploy/operator/go.mod pins cert-manager v1.18.4. This mismatch between the deployment script and the actual API dependency version can lead to CRD/API drift. Either update the script default to match the go.mod version or explicitly document why a different version is intentionally used.

controller/deploy/operator/internal/controller/jumpstarter/status.go (1)

132-148: Consider the implications of returning true on RBAC errors.

When checkCertManagerCRDs encounters non-NoMatchError errors (e.g., RBAC issues), it returns true assuming CRDs exist. While the comment explains this prevents blocking on transient errors, it could mask persistent RBAC misconfigurations. The V(1) logging may not surface these issues visibly enough.

This design choice is reasonable for resilience, but consider whether the condition message should indicate uncertainty when errors occur.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@controller/deploy/operator/test/e2e/e2e_test.go`:
- Around line 1505-1510: The test currently only fails when a VolumeMount named
"tls-certs" has MountPath "/tls"; change the assertion to fail whenever any
mount with Name == "tls-certs" exists regardless of path. Locate the loop
iterating over container.VolumeMounts and replace the MountPath check with a
direct presence check for mount.Name == "tls-certs" and call g.Expect to fail
(e.g., g.Expect(false).To(BeTrue()) or
g.Expect(mount.Name).NotTo(Equal("tls-certs")) inverted) with the existing
fmt.Sprintf message referencing the deployment name.
🧹 Nitpick comments (2)
controller/hack/deploy_with_operator.sh (1)

117-156: Good approach using temp file for debugging.

The temp file approach is helpful for troubleshooting failed deployments. Consider adding optional cleanup at the end of successful runs to avoid accumulating temp files, though this is minor for a hack script.

♻️ Optional: Add cleanup on success
 echo -e "${GREEN}Applying Jumpstarter CR...${NC}"
 kubectl apply -f "${TMPFILE}"
+# Clean up temp file on success (file path was already printed for debugging)
+rm -f "${TMPFILE}"

Alternatively, use a trap to clean up on exit while preserving the file on failure:

cleanup() {
  if [ $? -eq 0 ]; then
    rm -f "${TMPFILE}"
  fi
}
trap cleanup EXIT
controller/deploy/operator/test/e2e/e2e_test.go (1)

1358-1365: Replace g.Expect(false).To(BeTrue()) with g.Fail().

Using g.Expect(false).To(BeTrue(), msg) is an anti-pattern that can be confusing. Use g.Fail(msg) directly for clearer intent.

♻️ Suggested fix
 		// Check for Ready condition
 		for _, cond := range issuer.Status.Conditions {
 			if cond.Type == certmanagerv1.IssuerConditionReady {
 				g.Expect(cond.Status).To(Equal(cmmeta.ConditionTrue),
 					fmt.Sprintf("Issuer %s is not ready: %s", name, cond.Message))
 				return
 			}
 		}
-		g.Expect(false).To(BeTrue(), fmt.Sprintf("Issuer %s has no Ready condition", name))
+		g.Fail(fmt.Sprintf("Issuer %s has no Ready condition", name))

The same pattern appears in waitForClusterIssuerReady (line 1390) and waitForCertificateReady (line 1416).

@mangelajo mangelajo force-pushed the operator-cert-manager branch from 809722c to 59bbc49 Compare January 28, 2026 15:19
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
controller/deploy/operator/test/e2e/e2e_test.go (1)

373-381: Update the test CRD field to certManager.enabled.

useCertManager was removed from the API; keeping it in the e2e YAML is stale and gets pruned. Prefer either omitting it or setting:

certManager:
  enabled: false
🤖 Fix all issues with AI agents
In
`@controller/deploy/operator/bundle/manifests/operator.jumpstarter.dev_jumpstarters.yaml`:
- Around line 776-788: Update the TLS documentation text that still references
the removed useCertManager flag: replace occurrences in the certSecret property
description (and the other repeated blocks) so they mention
spec.certManager.enabled instead of useCertManager; specifically edit the
description strings under the TLS configuration and certSecret properties (e.g.,
the multiline descriptions referencing automatic secret management) to say "If
spec.certManager.enabled is true, this secret will be automatically managed and
configured by cert-manager." and apply the same change to the other duplicated
descriptions.

In `@controller/deploy/operator/internal/controller/jumpstarter/certificates.go`:
- Around line 475-488: The extractHostname function returns the raw address when
net.SplitHostPort fails, which leaves surrounding brackets for bracketed IPv6
like "[2001:db8::1]" and prevents net.ParseIP from recognizing it; update
extractHostname to, after SplitHostPort returns an error, detect and trim a
leading '[' and trailing ']' from address (only if both are present) and return
the unbracketed value, otherwise return the original address—this change should
be made inside the extractHostname function.

In `@controller/deploy/operator/internal/controller/jumpstarter/status.go`:
- Around line 43-88: When js.Spec.CertManager.Enabled is false you must
explicitly update the cert-manager related conditions so they don't remain
stale: call setCondition for operatorv1alpha1.ConditionTypeCertManagerAvailable,
ConditionTypeIssuerReady, ConditionTypeControllerCertificateReady and
ConditionTypeRouterCertificatesReady with status = false, reason =
"CertManagerDisabled" and an appropriate message (e.g. "cert-manager is
disabled") instead of skipping those checks; update any aggregated
allReady/messages logic accordingly so disabled cert-manager produces the
expected False conditions and a clear status message.

In `@controller/deploy/operator/test/e2e/e2e_test.go`:
- Around line 736-739: The authentication block is currently nested under
controller in the cert-manager YAML templates; move the entire authentication:
internal: prefix/enabled block up to spec.authentication so the CRD receives
spec.authentication at top level. Update both occurrences of the cert-manager
YAML template (the existing authentication block under controller and the second
occurrence referenced) to use spec.authentication instead of
controller.authentication, preserving the internal: prefix and enabled values.
🧹 Nitpick comments (1)
controller/deploy/operator/api/v1alpha1/jumpstarter_types.go (1)

553-566: Consider documenting precedence more explicitly or adding validation.

The comment states "Only one of SelfSigned or IssuerRef should be specified" but there's no schema validation to enforce this. The IssuerRef description mentions it overrides SelfSigned.Enabled, which is good, but users could still be confused if they specify both.

For v1alpha1 this is acceptable, but consider adding a CEL validation rule in a future iteration to surface this constraint at admission time.

Comment on lines 776 to 788
description: |-
TLS configuration for secure gRPC communication.
Requires a Kubernetes secret containing the TLS certificate and private key.
If useCertManager is enabled, this secret will be automatically created.
See also: spec.useCertManager for automatic certificate management.
If spec.certManager.enabled is true, this secret will be automatically managed and
configured by cert-manager.
properties:
certSecret:
description: |-
Name of the Kubernetes secret containing the TLS certificate and private key.
The secret must contain 'tls.crt' and 'tls.key' keys.
If useCertManager is enabled, this secret will be automatically created.
If useCertManager is enabled, this secret will be automatically managed and
configured by cert-manager.
pattern: ^[a-z0-9]([a-z0-9\-\.]*[a-z0-9])?$
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Docs still reference useCertManager.

These descriptions should mention spec.certManager.enabled now that useCertManager is removed, to avoid confusing users.

Also applies to: 1068-1074, 1329-1335

🤖 Prompt for AI Agents
In
`@controller/deploy/operator/bundle/manifests/operator.jumpstarter.dev_jumpstarters.yaml`
around lines 776 - 788, Update the TLS documentation text that still references
the removed useCertManager flag: replace occurrences in the certSecret property
description (and the other repeated blocks) so they mention
spec.certManager.enabled instead of useCertManager; specifically edit the
description strings under the TLS configuration and certSecret properties (e.g.,
the multiline descriptions referencing automatic secret management) to say "If
spec.certManager.enabled is true, this secret will be automatically managed and
configured by cert-manager." and apply the same change to the other duplicated
descriptions.

Comment on lines +475 to +488
// extractHostname extracts the hostname from an address (removes port if present).
// It properly handles IPv4, IPv6 (bracketed), and host:port forms.
func extractHostname(address string) string {
// Try to split host and port using net.SplitHostPort
// This correctly handles [IPv6]:port, hostname:port, and IPv4:port
host, _, err := net.SplitHostPort(address)
if err == nil {
// Successfully split host and port
return host
}

// If SplitHostPort failed, there's no port in the address
// Return the full address (handles plain IPv6, IPv4, or hostname)
return address
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

# First, check if the file exists and read the extractHostname function
fd "certificates.go" --type f

Repository: jumpstarter-dev/jumpstarter

Length of output: 145


🏁 Script executed:

# Let's also search for how extractHostname is used in the codebase
rg "extractHostname" --type go -B 2 -A 2

Repository: jumpstarter-dev/jumpstarter

Length of output: 4112


🏁 Script executed:

# Let's verify the behavior of net.SplitHostPort with bracketed IPv6
python3 << 'EOF'
import subprocess
import json

# Create a test Go program to check net.SplitHostPort behavior
test_code = '''package main

import (
	"fmt"
	"net"
)

func main() {
	testCases := []string{
		"[2001:db8::1]",
		"[2001:db8::1]:8080",
		"2001:db8::1",
		"example.com",
		"example.com:8080",
		"192.168.1.1",
		"192.168.1.1:8080",
	}

	for _, addr := range testCases {
		host, port, err := net.SplitHostPort(addr)
		errStr := "nil"
		if err != nil {
			errStr = err.Error()
		}
		fmt.Printf("Input: %-20s | Host: %-20s | Port: %-6s | Error: %s\n", addr, host, port, errStr)
	}

	// Also test net.ParseIP with bracketed and unbracketed IPv6
	fmt.Println("\nnet.ParseIP results:")
	ipTests := []string{
		"[2001:db8::1]",
		"2001:db8::1",
	}
	for _, ip := range ipTests {
		parsed := net.ParseIP(ip)
		fmt.Printf("ParseIP(%-20s) = %v\n", ip, parsed)
	}
}
'''

with open('/tmp/test.go', 'w') as f:
	f.write(test_code)

# Compile and run it
result = subprocess.run(['go', 'run', '/tmp/test.go'], capture_output=True, text=True, timeout=10)
print(result.stdout)
if result.stderr:
	print("STDERR:", result.stderr)
EOF

Repository: jumpstarter-dev/jumpstarter

Length of output: 816


🏁 Script executed:

# Read the extractHostname function in certificates.go
sed -n '470,500p' controller/deploy/operator/internal/controller/jumpstarter/certificates.go

Repository: jumpstarter-dev/jumpstarter

Length of output: 1074


🏁 Script executed:

# Read the extractHostname function in ingress.go
rg "func extractHostname" -A 15 controller/deploy/operator/internal/controller/jumpstarter/endpoints/ingress.go

Repository: jumpstarter-dev/jumpstarter

Length of output: 596


🏁 Script executed:

# Check how the hostname is used after extractHostname - look for net.ParseIP or SAN related code
rg "dnsNames|parsedIP|net\.ParseIP" controller/deploy/operator/internal/controller/jumpstarter/certificates.go -B 2 -A 2

Repository: jumpstarter-dev/jumpstarter

Length of output: 2927


🏁 Script executed:

# Let's fix the Go test to verify net.SplitHostPort behavior
python3 << 'EOF'
import subprocess

test_code = """package main

import (
	"fmt"
	"net"
)

func main() {
	testCases := []string{
		"[2001:db8::1]",
		"[2001:db8::1]:8080",
		"2001:db8::1",
		"example.com",
		"example.com:8080",
		"192.168.1.1",
		"192.168.1.1:8080",
	}

	for _, addr := range testCases {
		host, port, err := net.SplitHostPort(addr)
		errStr := "nil"
		if err != nil {
			errStr = err.Error()
		}
		portStr := port
		if port == "" {
			portStr = "(empty)"
		}
		fmt.Printf("Input: %-20s | Host: %-20s | Port: %-10s | Error: %s\\n", addr, host, portStr, errStr)
	}

	fmt.Println("")
	fmt.Println("net.ParseIP results:")
	ipTests := []string{
		"[2001:db8::1]",
		"2001:db8::1",
	}
	for _, ip := range ipTests {
		parsed := net.ParseIP(ip)
		fmt.Printf("ParseIP(%-20s) = %v\\n", ip, parsed)
	}
}
"""

with open('/tmp/test.go', 'w') as f:
	f.write(test_code)

result = subprocess.run(['go', 'run', '/tmp/test.go'], capture_output=True, text=True, timeout=10)
print(result.stdout)
if result.stderr:
	print("STDERR:", result.stderr)
EOF

Repository: jumpstarter-dev/jumpstarter

Length of output: 967


Handle bracketed IPv6 without port before returning host.

net.SplitHostPort fails on [2001:db8::1] (missing port error), so the brackets are retained and net.ParseIP later won't classify it as an IP SAN. Trim brackets when no port is present so IPv6 SANs are correct.

🐛 Proposed fix
 import (
 	"context"
 	"fmt"
 	"net"
+	"strings"
 	"time"
@@
 	host, _, err := net.SplitHostPort(address)
 	if err == nil {
 		// Successfully split host and port
 		return host
 	}
 
 	// If SplitHostPort failed, there's no port in the address
 	// Return the full address (handles plain IPv6, IPv4, or hostname)
-	return address
+	if strings.HasPrefix(address, "[") && strings.HasSuffix(address, "]") {
+		return strings.TrimSuffix(strings.TrimPrefix(address, "["), "]")
+	}
+	return address
 }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
// extractHostname extracts the hostname from an address (removes port if present).
// It properly handles IPv4, IPv6 (bracketed), and host:port forms.
func extractHostname(address string) string {
// Try to split host and port using net.SplitHostPort
// This correctly handles [IPv6]:port, hostname:port, and IPv4:port
host, _, err := net.SplitHostPort(address)
if err == nil {
// Successfully split host and port
return host
}
// If SplitHostPort failed, there's no port in the address
// Return the full address (handles plain IPv6, IPv4, or hostname)
return address
// extractHostname extracts the hostname from an address (removes port if present).
// It properly handles IPv4, IPv6 (bracketed), and host:port forms.
func extractHostname(address string) string {
// Try to split host and port using net.SplitHostPort
// This correctly handles [IPv6]:port, hostname:port, and IPv4:port
host, _, err := net.SplitHostPort(address)
if err == nil {
// Successfully split host and port
return host
}
// If SplitHostPort failed, there's no port in the address
// Return the full address (handles plain IPv6, IPv4, or hostname)
if strings.HasPrefix(address, "[") && strings.HasSuffix(address, "]") {
return strings.TrimSuffix(strings.TrimPrefix(address, "["), "]")
}
return address
}
🤖 Prompt for AI Agents
In `@controller/deploy/operator/internal/controller/jumpstarter/certificates.go`
around lines 475 - 488, The extractHostname function returns the raw address
when net.SplitHostPort fails, which leaves surrounding brackets for bracketed
IPv6 like "[2001:db8::1]" and prevents net.ParseIP from recognizing it; update
extractHostname to, after SplitHostPort returns an error, detect and trim a
leading '[' and trailing ']' from address (only if both are present) and return
the unbracketed value, otherwise return the original address—this change should
be made inside the extractHostname function.

Comment on lines +43 to +88
// Check cert-manager related conditions if enabled
if js.Spec.CertManager.Enabled {
// Check if cert-manager CRDs are available
certManagerAvailable := r.checkCertManagerCRDs(ctx)
setCondition(js, operatorv1alpha1.ConditionTypeCertManagerAvailable,
certManagerAvailable,
conditionReason(certManagerAvailable, "CRDsInstalled", "CRDsNotFound"),
conditionMessage(certManagerAvailable, "cert-manager CRDs are available", "cert-manager CRDs not found in cluster"))
if !certManagerAvailable {
allReady = false
messages = append(messages, "cert-manager CRDs not available")
}

// Check issuer readiness
issuerReady, issuerMsg := r.checkIssuerReady(ctx, js)
setCondition(js, operatorv1alpha1.ConditionTypeIssuerReady,
issuerReady,
conditionReason(issuerReady, "IssuerReady", "IssuerNotReady"),
issuerMsg)
if !issuerReady {
allReady = false
messages = append(messages, issuerMsg)
}

// Check controller certificate readiness
controllerCertReady, certMsg := r.checkControllerCertificateReady(ctx, js)
setCondition(js, operatorv1alpha1.ConditionTypeControllerCertificateReady,
controllerCertReady,
conditionReason(controllerCertReady, "CertificateReady", "CertificatePending"),
certMsg)
if !controllerCertReady {
allReady = false
messages = append(messages, certMsg)
}

// Check router certificates readiness
routerCertsReady, routerMsg := r.checkRouterCertificatesReady(ctx, js)
setCondition(js, operatorv1alpha1.ConditionTypeRouterCertificatesReady,
routerCertsReady,
conditionReason(routerCertsReady, "AllCertificatesReady", "CertificatesPending"),
routerMsg)
if !routerCertsReady {
allReady = false
messages = append(messages, routerMsg)
}
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

# First, let's explore the repository structure
find . -type f -name "*.go" | grep -E "(status|condition)" | head -20

Repository: jumpstarter-dev/jumpstarter

Length of output: 141


🏁 Script executed:

# Read the full status.go file to understand the complete reconciliation logic
wc -l ./controller/deploy/operator/internal/controller/jumpstarter/status.go

Repository: jumpstarter-dev/jumpstarter

Length of output: 145


🏁 Script executed:

# Read the file with line numbers to see the full context
cat -n ./controller/deploy/operator/internal/controller/jumpstarter/status.go

Repository: jumpstarter-dev/jumpstarter

Length of output: 16647


🏁 Script executed:

# Check if there are tests for this status logic
find . -type f -name "*status*test*.go" -o -name "*test*status*.go" | head -10

Repository: jumpstarter-dev/jumpstarter

Length of output: 53


🏁 Script executed:

# Search for how conditions are typically reset/cleared elsewhere in the codebase
rg "ConditionType" --type go -B 2 -A 2 | head -50

Repository: jumpstarter-dev/jumpstarter

Length of output: 4481


🏁 Script executed:

# Check if there's an API definition to see what these condition types are
find . -type f -name "*.go" -path "*/api/*" | xargs grep -l "ConditionType" | head -5

Repository: jumpstarter-dev/jumpstarter

Length of output: 307


🏁 Script executed:

# Look at the API definition to understand the condition types better
find . -type f -name "*.go" -path "*api*" | xargs grep -l "CertManager\|Issuer" | head -5

Repository: jumpstarter-dev/jumpstarter

Length of output: 200


🏁 Script executed:

# Read the jumpstarter_types.go to see the condition type definitions
cat -n ./controller/deploy/operator/api/v1alpha1/jumpstarter_types.go | head -150

Repository: jumpstarter-dev/jumpstarter

Length of output: 5744


🏁 Script executed:

# Search for the condition type constants
rg "ConditionType" ./controller/deploy/operator/api/v1alpha1/jumpstarter_types.go -A 2 -B 2

Repository: jumpstarter-dev/jumpstarter

Length of output: 1195


Handle cert-manager conditions when cert-manager is disabled.

When cert-manager is disabled, the cert-manager-related conditions (CertManagerAvailable, IssuerReady, ControllerCertificateReady, RouterCertificatesReady) are not updated. If cert-manager was previously enabled, these conditions retain stale values, creating misleading status. Consider explicitly setting these conditions to False with reason "CertManagerDisabled" when js.Spec.CertManager.Enabled is false, or removing them from the status.

🤖 Prompt for AI Agents
In `@controller/deploy/operator/internal/controller/jumpstarter/status.go` around
lines 43 - 88, When js.Spec.CertManager.Enabled is false you must explicitly
update the cert-manager related conditions so they don't remain stale: call
setCondition for operatorv1alpha1.ConditionTypeCertManagerAvailable,
ConditionTypeIssuerReady, ConditionTypeControllerCertificateReady and
ConditionTypeRouterCertificatesReady with status = false, reason =
"CertManagerDisabled" and an appropriate message (e.g. "cert-manager is
disabled") instead of skipping those checks; update any aggregated
allReady/messages logic accordingly so disabled cert-manager produces the
expected False conditions and a clear status message.

Comment on lines +736 to +739
authentication:
internal:
prefix: "internal:"
enabled: true
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

authentication is nested under controller in cert-manager YAML.

The CRD expects spec.authentication at the top level. As written, this block will be pruned and won’t exercise the intended config. Please move it to spec.authentication in both cert-manager YAML templates.

Also applies to: 1059-1063

🤖 Prompt for AI Agents
In `@controller/deploy/operator/test/e2e/e2e_test.go` around lines 736 - 739, The
authentication block is currently nested under controller in the cert-manager
YAML templates; move the entire authentication: internal: prefix/enabled block
up to spec.authentication so the CRD receives spec.authentication at top level.
Update both occurrences of the cert-manager YAML template (the existing
authentication block under controller and the second occurrence referenced) to
use spec.authentication instead of controller.authentication, preserving the
internal: prefix and enabled values.

@mangelajo mangelajo merged commit a4febd6 into main Jan 28, 2026
19 checks passed
@mangelajo mangelajo deleted the operator-cert-manager branch January 28, 2026 16:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants