Upgrading

Standard upgrade

To upgrade Cosmonic Control to a new chart version, use helm upgrade with the same values file used during installation:

# Check the currently installed release and chart version
helm list -n cosmonic-system

# Upgrade to a new chart version
helm upgrade cosmonic-control oci://ghcr.io/cosmonic/cosmonic-control \
  --version 0.5.1 \
  --namespace cosmonic-system \
  -f cosmonic-control-values.yaml

Upgrade the HostGroup release separately:

helm upgrade hostgroup oci://ghcr.io/cosmonic/cosmonic-control-hostgroup \
  --version 0.5.1 \
  --namespace cosmonic-system

Wait for the rollout to complete before considering the upgrade done:

kubectl rollout status deploy -l app.kubernetes.io/instance=cosmonic-control -n cosmonic-system
kubectl rollout status deploy -l app.kubernetes.io/instance=hostgroup -n cosmonic-system

Chart version vs. appVersion

Chart versions do not track appVersion one-to-one. A single chart version (e.g. 0.4.1) may ship updated appVersion values as the underlying Cosmonic Control images are patched. Check which appVersion a given chart version ships before upgrading:

helm show chart oci://ghcr.io/cosmonic/cosmonic-control --version 0.5.1
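
To compare that value against a running install, it can help to pull out just the appVersion field. The following is a minimal sketch, assuming the helm show chart output is Chart.yaml-style YAML with a top-level appVersion key; the function name app_version_from_chart is hypothetical and not part of Helm.

```shell
# Print the appVersion field from Chart.yaml-style YAML read on stdin.
# app_version_from_chart is a hypothetical helper name, not a Helm command.
app_version_from_chart() {
  awk -F': *' '$1 == "appVersion" { gsub(/"/, "", $2); print $2; exit }'
}

# Usage (needs registry access):
#   helm show chart oci://ghcr.io/cosmonic/cosmonic-control --version 0.5.1 | app_version_from_chart
```

The exact match on the key name keeps the apiVersion and version lines from being picked up by mistake.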

Rolling back

Rolling back the Helm release

To roll back to the previous release after a failed upgrade:

helm rollback cosmonic-control -n cosmonic-system
helm rollback hostgroup -n cosmonic-system

helm rollback restores the Helm release to its previous revision. Check the rollback status:

kubectl rollout status deploy -l app.kubernetes.io/instance=cosmonic-control -n cosmonic-system
kubectl rollout status deploy -l app.kubernetes.io/instance=hostgroup -n cosmonic-system
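
If the previous revision is not the one you want, helm rollback also accepts an explicit revision number, which helm history lists. A minimal sketch, assuming the cosmonic-system namespace used above; rollback_to is a hypothetical wrapper name.

```shell
# Show a release's revision history, then roll it back to a chosen revision.
# rollback_to is a hypothetical wrapper; release names and namespace follow
# the commands used elsewhere on this page.
rollback_to() {
  release="$1"
  revision="$2"
  helm history "$release" -n cosmonic-system || return 1
  helm rollback "$release" "$revision" -n cosmonic-system
}

# Usage:
#   rollback_to cosmonic-control 3
```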

CRD rollback caveat

Warning: Helm does not downgrade CRDs on rollback. This is by design, to prevent accidental data loss. If the failed upgrade included CRD changes, rolling back the Helm release will leave the newer CRD versions in place.

If you need to roll CRDs back to a previous version, apply the old CRD manifests manually:

# Pull CRDs from the target (older) chart version
helm show crds oci://ghcr.io/cosmonic/cosmonic-control --version <previous-version> | kubectl apply -f -

Verify the CRDs were restored before proceeding:

kubectl get crd | grep -E 'cosmonic|wasmcloud'

Upgrading from v0.3.x

Version 0.4.0 is a significant release. Review the changes below and update your values file before running helm upgrade.

Apply the updated CRDs

Helm does not upgrade existing CRDs automatically. Export the CRDs from the v0.4.0 chart and apply them manually:

helm show crds oci://ghcr.io/cosmonic/cosmonic-control --version 0.4.0 | kubectl apply -f -

The v0.4.0 CRDs add new optional fields (imagePullPolicy on hostgroup workloads, readOnly on volume mounts) and remain backward compatible with existing HTTPTrigger, WorkloadDeployment, and Host manifests.
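
Before applying, you may want to preview what would change on the cluster. The sketch below pipes the chart's CRDs into kubectl diff, which compares them against the live objects. The wrapper name crd_diff is hypothetical.

```shell
# Preview CRD changes for a given chart version without applying them.
# crd_diff is a hypothetical wrapper name.
crd_diff() {
  version="$1"
  helm show crds oci://ghcr.io/cosmonic/cosmonic-control --version "$version" \
    | kubectl diff -f -
}

# Usage:
#   crd_diff 0.4.0
```

kubectl diff exits with 0 when nothing differs, 1 when differences are found, and a value greater than 1 on error, so a non-zero exit is not necessarily a failure.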

Traefik is now the default ingress

The chart now deploys Traefik as the edge proxy by default and creates a Traefik IngressClass. The ingress Kubernetes Service defaults to ClusterIP and sits behind Traefik. See Ingress and Workloads for the full architecture.

If your v0.3.x install exposed Envoy directly via envoy.service.type: LoadBalancer or NodePort, preserve that pattern on v0.4.0 by setting ingress.enabled: false:

# cosmonic-control-values.yaml
ingress:
  enabled: false
envoy:
  service:
    type: LoadBalancer   # or NodePort, with httpNodePort set

Otherwise, switch to Traefik and migrate external routing to standard Kubernetes Ingress resources that target the ingress Service on port 80. The Traefik section of Ingress and Workloads has a worked example.

Console and Cloud components removed

The Helm chart no longer ships Console, Console UI, or the console: values block. When you run helm upgrade, Helm removes the existing Console Deployment, Service, and ConfigMap from the cluster. Perses is now the primary observability surface and continues to ship with the chart.

Remove any console: or console_ui: blocks from your values file before upgrading.
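
A quick way to confirm the values file is clean is to search it for those top-level keys. This is a minimal sketch, assuming the blocks appear as unindented console: or console_ui: keys; has_console_blocks is a hypothetical helper name.

```shell
# Print (with line numbers) any top-level console: or console_ui: key in a
# values file. Returns 0 if at least one is found, non-zero if the file is clean.
# has_console_blocks is a hypothetical helper name.
has_console_blocks() {
  grep -nE '^(console|console_ui):' "$1"
}

# Usage:
#   has_console_blocks cosmonic-control-values.yaml && echo "remove these before upgrading"
```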

wasmCloud 2.0.3 and OpenTelemetry field changes

Cosmonic Control v0.4.0 upgrades the control-host image to wasmCloud 2.0.3, built on wash v2 internals. OpenTelemetry field names emitted by the host changed as part of that reconciliation. If you have Perses dashboards, alerting rules, or external OTel consumers that query on specific field names, verify them against the new host logs after the upgrade.

Run the upgrade

With the values file updated and CRDs applied, upgrade both releases:

# 1. Apply the updated CRDs
helm show crds oci://ghcr.io/cosmonic/cosmonic-control --version 0.4.0 | kubectl apply -f -

# 2. Upgrade the control plane
helm upgrade cosmonic-control oci://ghcr.io/cosmonic/cosmonic-control \
  --version 0.4.0 \
  --namespace cosmonic-system \
  -f cosmonic-control-values.yaml

# 3. Upgrade the HostGroup
helm upgrade hostgroup oci://ghcr.io/cosmonic/cosmonic-control-hostgroup \
  --version 0.4.0 \
  --namespace cosmonic-system

Wait for the rollout to complete:

kubectl rollout status deploy -l app.kubernetes.io/instance=cosmonic-control -n cosmonic-system
kubectl rollout status deploy -l app.kubernetes.io/instance=hostgroup -n cosmonic-system

Upgrading from v0.4.0 to v0.4.1

Chart 0.4.1 ships appVersion 0.4.2. The chart was bumped once; the host image (ghcr.io/cosmonic/control-host) was patched twice, ending at 0.4.2. Upgrading installs the latest host image automatically.

Apply the updated CRDs

Helm does not upgrade existing CRDs automatically. Export the CRDs from the v0.4.1 chart and apply them manually:

helm show crds oci://ghcr.io/cosmonic/cosmonic-control --version 0.4.1 | kubectl apply -f -

The v0.4.1 CRDs add the following optional fields. Existing v0.4.0 manifests remain valid without changes.

  • HTTPTrigger.spec.timeout (string, Go duration) — per-route upstream request timeout applied to the xDS route generated for the trigger. Bounds how long Envoy waits for a response from the backing Wasm host before returning upstream request timeout. Accepts values like 300s or 5m. When unset, Envoy's default of 15s applies.
  • HTTPTrigger.spec.kubernetes.service.name and HTTPTrigger.spec.template.spec.kubernetes.service.name (string) — references an existing Kubernetes Service that the operator maintains an EndpointSlice for, pointing to the host pods running the workload. When set, the operator also registers DNS aliases (service-name, service-name.namespace.svc.cluster.local) with the host so cluster-internal callers can reach the workload via Service DNS without going through an external gateway.
  • WorkloadDeployment.spec.template.spec.kubernetes.service.name (string) — same mechanism as above for non-HTTP workloads.
  • HostInterface.name (string, lowercase alphanumeric and hyphens) on both CRDs — uniquely identifies an interface instance when multiple hostInterfaces entries share the same namespace and package. Components use this name as the identifier parameter in resource-opening functions (e.g. store::open(name)). Required when multiple entries of the same namespace:package exist.
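
As an illustration of the HostInterface.name constraint, the sketch below checks a candidate name locally before it goes into a manifest. It assumes the simplest reading of "lowercase alphanumeric and hyphens" (one or more characters from a-z, 0-9, and -); the CRD's actual validation pattern may be stricter, and is_valid_interface_name is a hypothetical helper name.

```shell
# Return 0 if the name contains only lowercase letters, digits, and hyphens.
# Assumption: the CRD pattern may be stricter (for example about leading or
# trailing hyphens); this only checks the character set.
is_valid_interface_name() {
  printf '%s' "$1" | LC_ALL=C grep -Eq '^[a-z0-9-]+$'
}

# Usage:
#   is_valid_interface_name "session-cache" && echo ok
```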

Default ingress provider switched to Traefik

ingress.provider now defaults to "traefik". v0.4.0 defaulted to "istio". If your values file does not set ingress.provider explicitly and your environment expects Istio routing, set it before upgrading:

# cosmonic-control-values.yaml
ingress:
  provider: istio

If you migrated to Traefik during the v0.3.x → v0.4.0 upgrade or you run a fresh v0.4.0 install with Traefik, no action is required.
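
To see whether a values file (or the values of an installed release) sets ingress.provider explicitly, a small filter like the one below can help. It is a sketch that assumes block-style YAML with two-space indentation and no trailing comments on the provider line; ingress_provider is a hypothetical helper name.

```shell
# Print the value of ingress.provider from values YAML on stdin, or nothing
# if the key is not set. Assumes block-style YAML with two-space indentation.
# ingress_provider is a hypothetical helper name.
ingress_provider() {
  awk '
    /^[^ #]/ { in_ingress = ($0 ~ /^ingress:/) }
    in_ingress && /^  provider:/ {
      sub(/^  provider: */, "")
      gsub(/"/, "")
      print
      exit
    }
  '
}

# Usage:
#   ingress_provider < cosmonic-control-values.yaml
#   helm get values cosmonic-control -n cosmonic-system | ingress_provider
```

Empty output means the key is unset and the chart default applies.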

New request-timeout values

Two new chart values bound how long the host and the ingress will wait for a reply on long-running requests. Both default to 300s, which covers GPU compute jobs and other multi-minute request/reply patterns out of the box.

  • nexus.requestTimeoutSeconds on cosmonic-control-hostgroup (default 300) — bounds any single wasmcloud:messaging/consumer.request call (and other NATS request/reply traffic). Set to null to fall back to the async-nats client default (10s).
  • ingress.istio.workloadsRequestTimeoutSeconds on cosmonic-control (default 300) — per-route upstream request timeout for the workloads VirtualService when running with the Istio ingress provider. Set to null to fall back to Envoy's default (15s). For per-trigger overrides under either ingress provider, use the new HTTPTrigger.spec.timeout field.

If you previously worked around long-running requests by raising async-nats client timeouts at the application level or by editing VirtualService routes after install, you can remove those overrides.
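
If the 300s default is not enough for a workload, the hostgroup value can be overridden at upgrade time. The sketch below uses Helm's --set flag with the nexus.requestTimeoutSeconds key named above. The wrapper name and the 600 in the usage line are illustrative only.

```shell
# Upgrade the hostgroup release with a custom NATS request timeout (seconds).
# upgrade_hostgroup_with_timeout is a hypothetical wrapper name.
upgrade_hostgroup_with_timeout() {
  timeout_seconds="$1"
  helm upgrade hostgroup oci://ghcr.io/cosmonic/cosmonic-control-hostgroup \
    --version 0.4.1 \
    --namespace cosmonic-system \
    --set nexus.requestTimeoutSeconds="$timeout_seconds"
}

# Usage:
#   upgrade_hostgroup_with_timeout 600
```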

Run the upgrade

# 1. Apply the updated CRDs
helm show crds oci://ghcr.io/cosmonic/cosmonic-control --version 0.4.1 | kubectl apply -f -

# 2. Upgrade the control plane
helm upgrade cosmonic-control oci://ghcr.io/cosmonic/cosmonic-control \
  --version 0.4.1 \
  --namespace cosmonic-system \
  -f cosmonic-control-values.yaml

# 3. Upgrade the HostGroup
helm upgrade hostgroup oci://ghcr.io/cosmonic/cosmonic-control-hostgroup \
  --version 0.4.1 \
  --namespace cosmonic-system

Wait for the rollout to complete:

kubectl rollout status deploy -l app.kubernetes.io/instance=cosmonic-control -n cosmonic-system
kubectl rollout status deploy -l app.kubernetes.io/instance=hostgroup -n cosmonic-system

Upgrading from v0.4.x to v0.5.0

Chart 0.5.0 ships appVersion 0.5.0 and bumps the control-host image to wasmCloud v2.2.1.

Apply the updated CRDs

helm show crds oci://ghcr.io/cosmonic/cosmonic-control --version 0.5.0 | kubectl apply -f -

CA bundle support (opt-in)

Both charts gain a top-level caBundle value group with mutually exclusive sources (contents, existingConfigMap, existingSecret) and per-component opt-in flags. Defaults are off, so existing installs render an identical manifest. See the Trusting a private CA tip on the Get Started page for the install-time pattern.

The freeform volumes / volumeMounts values on the hostgroup chart still work; the new mechanism is additive.

Observability stack image bumps

The observability sidecars are pinned to new tags in values.yaml. If you mirror images to a private registry, mirror the new tags before upgrading — see Air-Gapped Installation.

Component               v0.4.x tag          v0.5.x tag
prometheus              v3.3.1              v3.11.3
loki                    3.5                 3.7.2
tempo                   2.8.2 (hardcoded)   2.10.5
otel-collector-contrib  0.120.0             0.152.0
envoy                   v1.35.2             v1.38.0
jaeger                  2.11.0              2.18.0
kiwigrid-k8s-sidecar    1.30.10             2.7.3

The tempo Deployment previously bypassed values.yaml and pulled a hardcoded 2.8.2. From v0.5.0 onward, tempo.image.tag drives the deployed tag like every other observability component.
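
When mirroring to a private registry, one way to get a list of image references is to render the chart locally and collect the image: fields. This is a minimal sketch: it only sees images that appear as image: keys in the rendered manifests, so anything passed through arguments, environment variables, or custom resources would be missed. list_images is a hypothetical helper name.

```shell
# Print the unique container image references found in Kubernetes YAML on stdin.
# Handles both "image: ref" and "- image: ref" lines; strips double quotes.
# list_images is a hypothetical helper name.
list_images() {
  awk '
    { i = ($1 == "-") ? 2 : 1 }
    $i == "image:" && NF > i { v = $(i + 1); gsub(/"/, "", v); print v }
  ' | sort -u
}

# Usage:
#   helm template cosmonic-control oci://ghcr.io/cosmonic/cosmonic-control \
#     --version 0.5.0 -f cosmonic-control-values.yaml | list_images
```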

Run the upgrade

# 1. Apply the updated CRDs
helm show crds oci://ghcr.io/cosmonic/cosmonic-control --version 0.5.0 | kubectl apply -f -

# 2. Upgrade the control plane
helm upgrade cosmonic-control oci://ghcr.io/cosmonic/cosmonic-control \
  --version 0.5.0 \
  --namespace cosmonic-system \
  -f cosmonic-control-values.yaml

# 3. Upgrade the HostGroup
helm upgrade hostgroup oci://ghcr.io/cosmonic/cosmonic-control-hostgroup \
  --version 0.5.0 \
  --namespace cosmonic-system

Upgrading from v0.5.0 to v0.5.1

Chart 0.5.1 ships appVersion 0.5.1.

Apply the updated CRDs

helm show crds oci://ghcr.io/cosmonic/cosmonic-control --version 0.5.1 | kubectl apply -f -

Metrics endpoint is now configurable; cluster-scoped metrics RBAC is off by default

The cosmonic-control chart gains two new values on a new metrics block:

  • metrics.secure (default false) — when true, the operator serves metrics over HTTPS with delegated authn/authz, and the chart installs the cluster-scoped metrics-reader / metrics-auth ClusterRoles and binding required for that mode. When false (the default), metrics are served as plain HTTP with no auth and no metrics ClusterRole is created.
  • metrics.port (default 8081) — the operator's metrics bind port. Drives the container port, Service port, and the in-cluster OpenTelemetry collector's scrape target in lockstep.

Previous chart versions installed those ClusterRoles unconditionally, but the operator was never started with secure metrics, so they were unused. helm upgrade removes them. The metrics endpoint behavior is functionally identical to before, minus the unused RBAC.
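
To confirm after the upgrade that the unused roles are gone, you can filter the cluster's ClusterRole names. This sketch matches on the metrics-reader and metrics-auth name fragments mentioned above. The full role names may carry a release prefix, and other operators on the cluster may install similarly named roles, so treat any matches as candidates to inspect. leftover_metrics_roles is a hypothetical helper name.

```shell
# Filter ClusterRole names on stdin down to those that look like the
# metrics-reader / metrics-auth roles. Prints nothing if none remain.
# leftover_metrics_roles is a hypothetical helper name.
leftover_metrics_roles() {
  grep -E 'metrics-(reader|auth)' || true
}

# Usage:
#   kubectl get clusterrole -o name | leftover_metrics_roles
```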

Host self-identification on telemetry

The hostgroup now exports k8s.pod.name, k8s.pod.uid, k8s.node.name, k8s.namespace.name, and cosmonic.io/hostgroup as OpenTelemetry resource attributes via the Kubernetes downward API. The collector drops the k8sattributes processor and its ClusterRole — the host self-identifies, so the collector no longer needs cluster-scope pod read.

Loki indexes k8s_pod_name, k8s_namespace_name, and cosmonic_io_hostgroup as structured-metadata fields on log records. Use these in LogQL queries to scope logs to a specific pod, namespace, or HostGroup.
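
In LogQL, structured-metadata fields are filtered with a label filter expression placed after the stream selector. The sketch below composes such a query for one pod. The stream selector depends on how your Loki labels are set up, so the one in the usage line is a placeholder, and logql_for_pod is a hypothetical helper name.

```shell
# Build a LogQL query that narrows a stream selector to a single pod using
# the k8s_pod_name structured-metadata field.
# logql_for_pod is a hypothetical helper; the selector is supplied by the caller.
logql_for_pod() {
  selector="$1"
  pod="$2"
  printf '%s | k8s_pod_name="%s"\n' "$selector" "$pod"
}

# Usage (the selector label is a placeholder, not a label this chart guarantees):
#   logql_for_pod '{service_name="example"}' 'hostgroup-abc123'
```

The same pattern applies to k8s_namespace_name and cosmonic_io_hostgroup.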

Rebuilt observability dashboards

The HTTPTrigger Detail dashboard is rebuilt and renamed to Workload Activity (namespace/workload/host variables, per-host RPS and error rate, separate HTTP/Blobstore/Keyvalue/Messaging/Logs rows, Tempo Explorer drill-in links). A new Host Activity dashboard surfaces per-host span rates and a host-scoped logs panel keyed off k8s_pod_name.

If you forked the previous dashboard, port your customizations against the new dashboard definitions. See Observability.

Run the upgrade

# 1. Apply the updated CRDs
helm show crds oci://ghcr.io/cosmonic/cosmonic-control --version 0.5.1 | kubectl apply -f -

# 2. Upgrade the control plane
helm upgrade cosmonic-control oci://ghcr.io/cosmonic/cosmonic-control \
  --version 0.5.1 \
  --namespace cosmonic-system \
  -f cosmonic-control-values.yaml

# 3. Upgrade the HostGroup
helm upgrade hostgroup oci://ghcr.io/cosmonic/cosmonic-control-hostgroup \
  --version 0.5.1 \
  --namespace cosmonic-system

Wait for the rollout to complete:

kubectl rollout status deploy -l app.kubernetes.io/instance=cosmonic-control -n cosmonic-system
kubectl rollout status deploy -l app.kubernetes.io/instance=hostgroup -n cosmonic-system