Observability

Cosmonic Control ships a fully integrated observability stack. No external dependencies are required.

Architecture

[Figure: Observability architecture diagram]

All components export telemetry via OTLP to the opentelemetry-collector, which fans out to Tempo (traces), Prometheus (metrics), and Loki (logs).

Workload telemetry

HostGroups emit OpenTelemetry data for Wasm workloads when the --wasi-otel flag is set on the control-host container. The flag is enabled by default via the opentelemetry.workload: true value on the cosmonic-control-hostgroup chart.

With the flag enabled, workloads on the HostGroup can import the wasi:otel WIT package to emit traces, metrics, and logs:

import wasi:otel/types@0.2.0-rc.1;
import wasi:otel/tracing@0.2.0-rc.1;
import wasi:otel/logs@0.2.0-rc.1;
import wasi:otel/metrics@0.2.0-rc.1;

Emitted data flows to the endpoint in opentelemetry.endpoint (default http://opentelemetry-collector:4317), which is the same OTel collector the control plane uses. Traces, metrics, and logs land in Tempo, Prometheus, and Loki and appear in the Perses dashboards alongside control-plane telemetry.

To opt out per HostGroup, set opentelemetry.workload: false:

# hostgroup-values.yaml
opentelemetry:
  workload: false

To send workload telemetry to a different collector, override opentelemetry.endpoint:

# hostgroup-values.yaml
opentelemetry:
  endpoint: https://my-collector.corp.com:4317
  insecure: false
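
Before pointing a HostGroup at a different collector, it can help to confirm that the endpoint string parses to the host and port you expect and that the port accepts connections. The following is a minimal sketch using only the Python standard library; the helper names are hypothetical and not part of Cosmonic Control. It checks plain TCP reachability from the machine where it runs, so it says nothing about TLS configuration or about connectivity from inside the cluster.

```python
import socket
from urllib.parse import urlsplit


def endpoint_host_port(endpoint: str, default_port: int = 4317) -> tuple[str, int]:
    """Split an endpoint URL into (host, port).

    Falls back to 4317, the OTLP gRPC port used by the default endpoint,
    when the URL does not name a port.
    """
    parts = urlsplit(endpoint)
    if not parts.hostname:
        raise ValueError(f"no host in endpoint: {endpoint!r}")
    return parts.hostname, parts.port or default_port


def is_reachable(endpoint: str, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to the endpoint can be opened."""
    host, port = endpoint_host_port(endpoint)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, `is_reachable("https://my-collector.corp.com:4317")` returns False when the host cannot be resolved or the port is closed.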

See the wasi:otel package spec and the otel-http example for reference usage.

Accessing the Perses dashboard

Perses is deployed as a ClusterIP service and is not exposed externally by default. Use kubectl port-forward to access it locally:

kubectl port-forward svc/perses 8080:8080 -n cosmonic-system

Open http://localhost:8080 in your browser.

To expose Perses externally (for example, behind an ingress controller), change the service type in your values file:

perses:
  service:
    type: LoadBalancer   # or NodePort, or configure your own ingress

Built-in dashboards

Cosmonic Control provisions the following Perses dashboards automatically:

Workload Activity

Namespace, workload, and host variables drive the entire dashboard. It shows per-host RPS, an error-rate stat with an idle empty state, a sorted per-host table, and separate collapsible rows for HTTP, Blobstore, Keyvalue, Messaging, and Logs. Each TraceTable links into the Tempo Explorer with the dashboard variables pre-populated.

Host Activity

Shows per-host rollups of every workload running on the selected host (count, RPS, error rate, sorted table), host-process span rates (connect_nats, workload lifecycle, component prep, plugin bind/unbind), and a host-scoped logs panel filtered by k8s_pod_name.

Host Infrastructure

  • Host Reconciliation Activity
  • Host Controller Errors
  • Workqueue Depth by Controller

Workloads

  • Workload Reconciliation Rate
  • Workload Errors by Type
  • Active Workers by Controller

Operator Resource Usage

  • Memory Usage
  • CPU Usage
  • Goroutines

Host identity on telemetry

Every span, log, and metric emitted by a HostGroup pod carries the following OpenTelemetry resource attributes, set on the host via the Kubernetes downward API:

  • k8s.pod.name
  • k8s.pod.uid
  • k8s.node.name
  • k8s.namespace.name
  • cosmonic.io/hostgroup

Use these to scope queries to a specific pod, node, or HostGroup without joining against external cluster state. The Host Activity dashboard uses k8s_pod_name (the Loki structured-metadata field copied from k8s.pod.name) as its host selector.
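
As an illustration of scoping by these attributes, the sketch below builds a LogQL query and a TraceQL query for a single pod. It rests on assumptions that are not confirmed by this page: that k8s_namespace_name is an indexed Loki label usable in the stream selector, and that Tempo stores the pod name as the resource attribute k8s.pod.name. The function names and the sample namespace and pod names are hypothetical.

```python
import json


def logql_for_pod(namespace: str, pod: str) -> str:
    """Build a LogQL query that filters on the k8s_pod_name structured metadata.

    ASSUMPTION: k8s_namespace_name is an indexed label in this Loki setup;
    LogQL needs at least one label matcher in the stream selector.
    """
    # json.dumps produces a double-quoted, escaped string literal.
    selector = "{k8s_namespace_name=%s}" % json.dumps(namespace)
    return "%s | k8s_pod_name=%s" % (selector, json.dumps(pod))


def traceql_for_pod(pod: str) -> str:
    """Build a TraceQL query matching spans whose resource names this pod."""
    return "{ resource.k8s.pod.name = %s }" % json.dumps(pod)


if __name__ == "__main__":
    # Hypothetical namespace and pod names, for illustration only.
    print(logql_for_pod("my-namespace", "my-hostgroup-pod"))
    print(traceql_for_pod("my-hostgroup-pod"))
```

The same pattern applies to k8s.node.name or k8s.namespace.name; check the label and attribute names against your own Loki and Tempo configuration before relying on them.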

Accessing backends directly

Each backend is available as a ClusterIP service in the cosmonic-system namespace for direct access or integration with external tooling (e.g. an existing Grafana instance):

Service                   Port                         Protocol
prometheus                9090                         HTTP
loki                      3100                         HTTP (Loki API)
tempo                     3200                         HTTP / 4317 gRPC (OTLP)
opentelemetry-collector   4317 (gRPC) / 4318 (HTTP)    OTLP
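
For external tooling, the Prometheus and Loki services can be queried over their standard HTTP APIs. The sketch below only builds the request URLs; it assumes the services have been made reachable on localhost, for example with kubectl port-forward against svc/prometheus and svc/loki in cosmonic-system. The function names are hypothetical, and `up` is used as a sample PromQL expression rather than a metric specific to Cosmonic Control.

```python
from urllib.parse import urlencode

# ASSUMPTION: local ports forwarded to the in-cluster services, e.g.
#   kubectl port-forward svc/prometheus 9090:9090 -n cosmonic-system
#   kubectl port-forward svc/loki 3100:3100 -n cosmonic-system
PROMETHEUS = "http://localhost:9090"
LOKI = "http://localhost:3100"


def prometheus_query_url(promql: str, base: str = PROMETHEUS) -> str:
    """URL for a Prometheus instant query (GET /api/v1/query)."""
    return f"{base}/api/v1/query?{urlencode({'query': promql})}"


def loki_query_range_url(logql: str, limit: int = 100, base: str = LOKI) -> str:
    """URL for a Loki range query (GET /loki/api/v1/query_range).

    Without explicit start/end parameters, Loki applies its default
    time window.
    """
    params = {"query": logql, "limit": limit}
    return f"{base}/loki/api/v1/query_range?{urlencode(params)}"


if __name__ == "__main__":
    print(prometheus_query_url("up"))
```

Either URL can be fetched with curl or `urllib.request.urlopen`; both APIs return JSON.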

To disable the built-in Perses dashboard (for example, when integrating with an existing Grafana deployment):

perses:
  uiEnabled: false

Custom dashboards

Perses supports a Dashboard-as-Code approach via provisioning. Add custom dashboards with the perses.provisioning.extraProvisioningFiles Helm value. See the Perses documentation for the dashboard file format.

Further reading