Cert-manager Grafana dashboard and alert rules setup

Introduction
cert-manager is one of those Kubernetes components that quietly does important work in the background. When certificates renew successfully, nobody thinks about it. When they fail, the first sign is often an expired TLS certificate in front of a user-facing service.
In this guide, we will install cert-manager and trust-manager with Prometheus metrics enabled, import a Grafana dashboard through a dashboard ConfigMap, and create Prometheus alert rules for certificates that are not ready, already expired, or expiring soon.
Prerequisites
- Kubernetes cluster
- `kube-prometheus-stack` already installed
- Helm installed locally
- Grafana sidecar configured to load dashboard ConfigMaps with the `grafana_dashboard: '1'` label
- `kubectl` access to apply and verify the manifests
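Before starting, it can help to confirm that the command-line prerequisites are in place. The following is only a minimal sketch, assuming a bash-compatible shell; the helper names `require_cmd` and `check_prereqs` are hypothetical and not part of Helm or kubectl.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds if the named command is on PATH.
require_cmd() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "ok: $1"
  else
    echo "missing: $1" >&2
    return 1
  fi
}

# Check the CLI tools this guide relies on and report every missing one.
check_prereqs() {
  local cmd missing=0
  for cmd in helm kubectl; do
    require_cmd "$cmd" || missing=1
  done
  return "$missing"
}
```

Running `check_prereqs` prints one line per tool and returns a non-zero status if either is missing. It does not check the cluster-side prerequisites; for those, `kubectl get crd servicemonitors.monitoring.coreos.com` is one way to confirm that the Prometheus Operator CRDs are installed.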
Step-by-step
- Add the Jetstack Helm repository:

  ```bash
  helm repo add jetstack https://charts.jetstack.io
  helm repo update
  ```

- Create a `cert-manager-values.yaml` file with the ServiceMonitor enabled. The `release: kube-prom-stack` label lets `kube-prometheus-stack` discover the ServiceMonitor; with the chart's default selector it has to match the name of your `kube-prometheus-stack` Helm release (`kube-prom-stack` here):

  ```yaml
  crds:
    enabled: true
  prometheus:
    servicemonitor:
      enabled: true
      prometheusInstance: kube-prom-stack
      labels:
        release: kube-prom-stack
  extraArgs:
    # Use only these public resolvers for DNS-01 self-checks
    - --dns01-recursive-nameservers=8.8.8.8:53,8.8.4.4:53
    - --dns01-recursive-nameservers-only
  ```

- Install cert-manager:

  ```bash
  helm upgrade --install cert-manager jetstack/cert-manager \
    --namespace cert-manager \
    --create-namespace \
    --version '~1.19.0' \
    --values cert-manager-values.yaml
  ```

- Create a `trust-manager-values.yaml` file:

  ```yaml
  crds:
    enabled: true
  app:
    metrics:
      service:
        servicemonitor:
          enabled: true
          labels:
            release: kube-prom-stack
  ```

- Install trust-manager:

  ```bash
  helm upgrade --install trust-manager jetstack/trust-manager \
    --namespace cert-manager \
    --version '~0.19.0' \
    --values trust-manager-values.yaml
  ```

- Verify that cert-manager is running and that the ServiceMonitors were created:

  ```bash
  kubectl get pods -n cert-manager
  kubectl get servicemonitor -n cert-manager
  ```

- Create a Grafana dashboard ConfigMap in the `observability` namespace and save it as `cert-manager-dashboard.yaml`. The `grafana_dashboard: '1'` label lets the Grafana sidecar discover and import it automatically:

  ```yaml
  apiVersion: v1
  kind: ConfigMap
  metadata:
    name: cert-manager-dashboard
    namespace: observability
    labels:
      grafana_dashboard: '1'
      app.kubernetes.io/instance: prometheus-community
  data:
    cert-manager.json: |-
      {
        "annotations": {
          "list": [
            {
              "builtIn": 1,
              "datasource": { "type": "grafana", "uid": "-- Grafana --" },
              "enable": true,
              "hide": true,
              "iconColor": "rgba(0, 211, 255, 1)",
              "name": "Annotations & Alerts",
              "target": { "limit": 100, "matchAny": false, "tags": [], "type": "dashboard" },
              "type": "dashboard"
            }
          ]
        },
        "description": "The dashboard gives an overview of the SSL certs managed by cert-manager in Kubernetes",
        "editable": true,
        "fiscalYearStartMonth": 0,
        "gnetId": 20842,
        "graphTooltip": 0,
        "links": [],
        "liveNow": false,
        "panels": [
          {
            "datasource": { "type": "prometheus", "uid": "${datasource}" },
            "description": "The number of available certificates",
            "fieldConfig": {
              "defaults": {
                "color": { "mode": "thresholds" },
                "mappings": [],
                "noValue": "0",
                "thresholds": { "mode": "absolute", "steps": [ { "color": "green", "value": null } ] }
              },
              "overrides": []
            },
            "gridPos": { "h": 8, "w": 8, "x": 0, "y": 0 },
            "id": 1,
            "options": {
              "colorMode": "value",
              "graphMode": "none",
              "justifyMode": "auto",
              "orientation": "auto",
              "reduceOptions": { "calcs": [ "lastNotNull" ], "fields": "", "values": false },
              "textMode": "value",
              "wideLayout": true
            },
            "pluginVersion": "11.2.0",
            "targets": [
              {
                "datasource": { "type": "prometheus", "uid": "${datasource}" },
                "editorMode": "code",
                "exemplar": false,
                "expr": "count(certmanager_certificate_ready_status{condition=\"True\", cluster=~\"$cluster\", exported_namespace=~\"$namespace\"})",
                "instant": true,
                "legendFormat": "__auto",
                "range": false,
                "refId": "A"
              }
            ],
            "title": "Valid Certificates",
            "type": "stat"
          },
          {
            "datasource": { "type": "prometheus", "uid": "${datasource}" },
            "description": "The number of certificates that will expire within the next 14 days",
            "fieldConfig": {
              "defaults": {
                "color": { "mode": "thresholds" },
                "mappings": [],
                "noValue": "0",
                "thresholds": {
                  "mode": "absolute",
                  "steps": [ { "color": "green", "value": null }, { "color": "#EAB839", "value": 1 } ]
                }
              },
              "overrides": []
            },
            "gridPos": { "h": 8, "w": 8, "x": 8, "y": 0 },
            "id": 3,
            "options": {
              "colorMode": "value",
              "graphMode": "none",
              "justifyMode": "auto",
              "orientation": "auto",
              "reduceOptions": { "calcs": [ "lastNotNull" ], "fields": "", "values": false },
              "textMode": "auto",
              "wideLayout": true
            },
            "pluginVersion": "11.2.0",
            "targets": [
              {
                "datasource": { "type": "prometheus", "uid": "${datasource}" },
                "editorMode": "code",
                "exemplar": false,
                "expr": "count(certmanager_certificate_expiration_timestamp_seconds{cluster=~\"$cluster\", exported_namespace=~\"$namespace\"} < (time()+(14*24*3600)))",
                "instant": true,
                "legendFormat": "{{exported_namespace}}/{{name}}",
                "range": false,
                "refId": "A"
              }
            ],
            "title": "Expiring Certificates",
            "type": "stat"
          },
          {
            "datasource": { "type": "prometheus", "uid": "${datasource}" },
            "description": "Total number of HTTP requests, based on the selected time range",
            "fieldConfig": {
              "defaults": {
                "color": { "mode": "thresholds" },
                "decimals": 0,
                "mappings": [],
                "noValue": "0",
                "thresholds": { "mode": "absolute", "steps": [ { "color": "text", "value": null } ] }
              },
              "overrides": []
            },
            "gridPos": { "h": 8, "w": 8, "x": 16, "y": 0 },
            "id": 2,
            "options": {
              "colorMode": "value",
              "graphMode": "none",
              "justifyMode": "auto",
              "orientation": "auto",
              "reduceOptions": { "calcs": [ "lastNotNull" ], "fields": "", "values": false },
              "textMode": "auto",
              "wideLayout": true
            },
            "pluginVersion": "11.2.0",
            "targets": [
              {
                "datasource": { "type": "prometheus", "uid": "${datasource}" },
                "editorMode": "code",
                "exemplar": false,
                "expr": "sum(increase(certmanager_http_acme_client_request_count{cluster=~\"$cluster\"}[$__range]))",
                "instant": true,
                "legendFormat": "__auto",
                "range": false,
                "refId": "A"
              }
            ],
            "title": "Total ACME Requests",
            "type": "stat"
          },
          {
            "datasource": { "type": "prometheus", "uid": "${datasource}" },
            "description": "Time before the certificates expire. Only shows certificates expiring within 45 days",
            "fieldConfig": {
              "defaults": {
                "color": { "mode": "thresholds" },
                "mappings": [],
                "min": 0,
                "thresholds": {
                  "mode": "absolute",
                  "steps": [
                    { "color": "red", "value": null },
                    { "color": "orange", "value": 14 },
                    { "color": "green", "value": 30 },
                    { "color": "dark-green", "value": 60 }
                  ]
                },
                "unit": "s"
              },
              "overrides": []
            },
            "gridPos": { "h": 10, "w": 12, "x": 0, "y": 8 },
            "id": 5,
            "options": {
              "displayMode": "gradient",
              "maxVizHeight": 300,
              "minVizHeight": 16,
              "minVizWidth": 8,
              "namePlacement": "left",
              "orientation": "horizontal",
              "reduceOptions": { "calcs": [ "lastNotNull" ], "fields": "", "values": false },
              "showUnfilled": true,
              "sizing": "auto",
              "valueMode": "color"
            },
            "pluginVersion": "11.2.0",
            "targets": [
              {
                "datasource": { "type": "prometheus", "uid": "${datasource}" },
                "editorMode": "code",
                "exemplar": false,
                "expr": "sort(certmanager_certificate_expiration_timestamp_seconds{exported_namespace=~\"$namespace\"} - time()) < 45 * (24*3600)",
                "format": "time_series",
                "instant": true,
                "legendFormat": "{{cluster}} - {{exported_namespace}} - {{name}}",
                "range": false,
                "refId": "A"
              }
            ],
            "title": "Time to Expiration (<45 days)",
            "type": "bargauge"
          },
          {
            "datasource": { "type": "prometheus", "uid": "${datasource}" },
            "description": "Time before the certificates are automatically renewed",
            "fieldConfig": {
              "defaults": {
                "color": { "mode": "thresholds" },
                "mappings": [],
                "min": 0,
                "thresholds": { "mode": "absolute", "steps": [ { "color": "green", "value": null } ] },
                "unit": "s"
              },
              "overrides": []
            },
            "gridPos": { "h": 10, "w": 12, "x": 12, "y": 8 },
            "id": 6,
            "options": {
              "displayMode": "gradient",
              "maxVizHeight": 300,
              "minVizHeight": 16,
              "minVizWidth": 8,
              "namePlacement": "left",
              "orientation": "horizontal",
              "reduceOptions": { "calcs": [ "lastNotNull" ], "fields": "", "values": false },
              "showUnfilled": true,
              "sizing": "auto",
              "valueMode": "color"
            },
            "pluginVersion": "11.2.0",
            "targets": [
              {
                "datasource": { "type": "prometheus", "uid": "${datasource}" },
                "editorMode": "code",
                "exemplar": false,
                "expr": "sort(certmanager_certificate_renewal_timestamp_seconds{exported_namespace=~\"$namespace\"} - time()) < 45 * (24*3600)",
                "format": "time_series",
                "instant": true,
                "legendFormat": "{{cluster}} - {{exported_namespace}} - {{name}}",
                "range": false,
                "refId": "A"
              }
            ],
            "title": "Time to Automatic Renewal (<45 days)",
            "type": "bargauge"
          },
          {
            "datasource": { "type": "prometheus", "uid": "${datasource}" },
            "description": "Time before the certificates expire",
            "fieldConfig": {
              "defaults": {
                "color": { "mode": "palette-classic" },
                "mappings": [],
                "thresholds": { "mode": "absolute", "steps": [ { "color": "green", "value": null } ] },
                "unit": "d"
              },
              "overrides": []
            },
            "gridPos": { "h": 9, "w": 24, "x": 0, "y": 18 },
            "id": 4,
            "options": {
              "legend": { "displayMode": "list", "placement": "right", "showLegend": true },
              "tooltip": { "mode": "single", "sort": "none" }
            },
            "targets": [
              {
                "datasource": { "type": "prometheus", "uid": "${datasource}" },
                "editorMode": "code",
                "exemplar": true,
                "expr": "(certmanager_certificate_expiration_timestamp_seconds{exported_namespace=~\"$namespace\"} - time())/(24*3600)",
                "instant": false,
                "legendFormat": "{{cluster}} - {{exported_namespace}} - {{name}}",
                "range": true,
                "refId": "A"
              }
            ],
            "title": "Time to Expiration",
            "type": "timeseries"
          },
          {
            "datasource": { "type": "prometheus", "uid": "${datasource}" },
            "description": "Displays the timestamp of a renewal, based on the expiration time changing",
            "fieldConfig": {
              "defaults": {
                "color": { "mode": "palette-classic" },
                "decimals": 1,
                "mappings": [],
                "min": 0,
                "thresholds": { "mode": "absolute", "steps": [ { "color": "green", "value": null } ] },
                "unit": "short"
              },
              "overrides": []
            },
            "gridPos": { "h": 9, "w": 24, "x": 0, "y": 27 },
            "id": 8,
            "options": {
              "legend": { "displayMode": "list", "placement": "right", "showLegend": true },
              "tooltip": { "mode": "single", "sort": "none" }
            },
            "pluginVersion": "8.3.2",
            "targets": [
              {
                "datasource": { "type": "prometheus", "uid": "${datasource}" },
                "editorMode": "code",
                "exemplar": true,
                "expr": "changes(certmanager_certificate_renewal_timestamp_seconds{cluster=~\"$cluster\", exported_namespace=~\"$namespace\"}[15m]) > 0",
                "format": "time_series",
                "instant": false,
                "legendFormat": "{{cluster}} - {{exported_namespace}} - {{name}}",
                "range": false,
                "refId": "A"
              }
            ],
            "title": "Certificate Renewal Events",
            "type": "timeseries"
          },
          {
            "datasource": { "type": "prometheus", "uid": "${datasource}" },
            "description": "Total number of HTTP requests, based on the selected range",
            "fieldConfig": {
              "defaults": {
                "color": { "mode": "palette-classic" },
                "mappings": [],
                "noValue": "0",
                "thresholds": { "mode": "absolute", "steps": [ { "color": "text", "value": null } ] }
              },
              "overrides": []
            },
            "gridPos": { "h": 10, "w": 24, "x": 0, "y": 36 },
            "id": 10,
            "options": {
              "barRadius": 0,
              "barWidth": 0.97,
              "fullHighlight": false,
              "groupWidth": 0.7,
              "legend": { "displayMode": "list", "placement": "bottom", "showLegend": true },
              "orientation": "auto",
              "showValue": "auto",
              "stacking": "none",
              "tooltip": { "mode": "single", "sort": "none" },
              "xTickLabelRotation": -30,
              "xTickLabelSpacing": 0
            },
            "pluginVersion": "8.3.2",
            "targets": [
              {
                "datasource": { "type": "prometheus", "uid": "${datasource}" },
                "editorMode": "code",
                "exemplar": false,
                "expr": "sort_desc(sum by (cluster)(increase(certmanager_http_acme_client_request_count{cluster=~\"$cluster\"}[$__range]))) > 0",
                "format": "table",
                "instant": true,
                "legendFormat": "{{cluster}}",
                "range": false,
                "refId": "A"
              }
            ],
            "title": "ACME Requests by Cluster",
            "type": "barchart"
          }
        ],
        "schemaVersion": 39,
        "tags": [ "k8s", "cert-manager" ],
        "templating": {
          "list": [
            {
              "current": { "selected": false, "text": "default", "value": "default" },
              "hide": 0,
              "includeAll": false,
              "multi": false,
              "name": "datasource",
              "options": [],
              "query": "prometheus",
              "queryValue": "",
              "refresh": 1,
              "regex": "",
              "skipUrlSync": false,
              "type": "datasource"
            },
            {
              "current": { "selected": false, "text": "All", "value": "$__all" },
              "datasource": { "type": "prometheus", "uid": "${datasource}" },
              "definition": "label_values(certmanager_clock_time_seconds, cluster)",
              "hide": 0,
              "includeAll": true,
              "multi": false,
              "name": "cluster",
              "options": [],
              "query": {
                "query": "label_values(certmanager_clock_time_seconds, cluster)",
                "refId": "StandardVariableQuery"
              },
              "refresh": 1,
              "regex": "",
              "skipUrlSync": false,
              "sort": 0,
              "type": "query"
            },
            {
              "allValue": ".*",
              "current": { "selected": false, "text": "All", "value": "$__all" },
              "datasource": { "type": "prometheus", "uid": "${datasource}" },
              "definition": "label_values(certmanager_certificate_ready_status{cluster=~\"$cluster\"}, exported_namespace)",
              "hide": 0,
              "includeAll": true,
              "multi": false,
              "name": "namespace",
              "options": [],
              "query": {
                "query": "label_values(certmanager_certificate_ready_status{cluster=~\"$cluster\"}, exported_namespace)",
                "refId": "StandardVariableQuery"
              },
              "refresh": 1,
              "regex": "",
              "skipUrlSync": false,
              "sort": 0,
              "type": "query"
            }
          ]
        },
        "time": { "from": "now-24h", "to": "now" },
        "timepicker": {},
        "timezone": "browser",
        "title": "Cert-manager-Kubernetes",
        "uid": "cdhrcds8aosg0c",
        "version": 2,
        "weekStart": ""
      }
  ```

- Apply the Grafana dashboard:

  ```bash
  kubectl apply -f cert-manager-dashboard.yaml
  ```

- Create Prometheus alert rules for cert-manager and save them as `cert-manager-prometheus-rules.yaml`:

  ```yaml
  apiVersion: monitoring.coreos.com/v1
  kind: PrometheusRule
  metadata:
    name: cert-manager
    namespace: cert-manager
    labels:
      prometheus: kube-prom-stack
      role: alert-rules
      release: kube-prom-stack
  spec:
    groups:
      - name: cert-manager
        rules:
          # Certificate has been reporting Ready=False for 10 minutes
          - alert: CertManagerCertificateReadyStatus
            annotations:
              description: 'Certificate for "{{ $labels.name }}" is not ready.'
              summary: Certificate is not ready
            expr: certmanager_certificate_ready_status{condition="False"} == 1
            for: 10m
            labels:
              severity: critical
          # Expiration timestamp is already in the past
          - alert: CertManagerCertificateExpired
            annotations:
              description: 'Certificate "{{ $labels.exported_namespace }}/{{ $labels.name }}" expired {{ $value | humanizeDuration }} ago.'
              summary: Certificate has expired
            expr: time() - certmanager_certificate_expiration_timestamp_seconds > 0
            for: 5m
            labels:
              severity: critical
          # Not yet expired, but fewer than 14 days remain
          - alert: CertManagerCertificateExpiringSoon
            annotations:
              description: 'Certificate "{{ $labels.exported_namespace }}/{{ $labels.name }}" expires in {{ $value | humanizeDuration }}.'
              summary: Certificate expires in less than 14 days
            expr: certmanager_certificate_expiration_timestamp_seconds - time() > 0 and certmanager_certificate_expiration_timestamp_seconds - time() < 14 * 24 * 60 * 60
            for: 1h
            labels:
              severity: warning
  ```

- Apply the alert rules:

  ```bash
  kubectl apply -f cert-manager-prometheus-rules.yaml
  ```

- Verify that the PrometheusRule and the dashboard ConfigMap exist in the cluster:

  ```bash
  kubectl get prometheusrule -n cert-manager cert-manager
  kubectl get configmap -n observability cert-manager-dashboard
  ```

- Open Grafana and search for the `Cert-manager-Kubernetes` dashboard. You should see valid certificates, expiring certificates, ACME request count, renewal events, and time-to-expiration panels.
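To make the two expiry alert expressions from the rules step easier to reason about, the same comparisons can be reproduced in plain shell arithmetic. This is only a sketch under stated assumptions: `classify_cert` is a hypothetical helper, not part of cert-manager or Prometheus, and it ignores the `for:` durations that delay the real alerts.

```shell
#!/usr/bin/env bash
# Hypothetical helper mirroring the expiry alert expressions.
# Usage: classify_cert EXPIRY_EPOCH [NOW_EPOCH]
classify_cert() {
  local expiry=$1
  local now=${2:-$(date +%s)}          # defaults to the current time
  local window=$((14 * 24 * 60 * 60))  # 14 days in seconds
  local remaining=$((expiry - now))

  if [ "$remaining" -lt 0 ]; then
    # Same condition as: time() - expiration_timestamp > 0
    echo "expired"
  elif [ "$remaining" -gt 0 ] && [ "$remaining" -lt "$window" ]; then
    # Same condition as: expiration - time() > 0 and expiration - time() < 14d
    echo "expiring-soon"
  else
    echo "ok"
  fi
}
```

The first argument is the expiry as epoch seconds, which is the unit `certmanager_certificate_expiration_timestamp_seconds` uses. To try it against a real certificate, the expiry can be read from the Certificate's `status.notAfter` field and converted to epoch seconds (for example with GNU `date -d`, if that is available on your system) before being passed in.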
Conclusion
You now have cert-manager metrics flowing into Prometheus, a Grafana dashboard for certificate visibility, and alert rules for the most important certificate states. This gives you an early warning before certificates expire and a quick place to check renewal behavior across clusters and namespaces.