Monitor Stalwart HA clusters with Grafana Cloud

Introduction

When you run Stalwart in a high-availability (HA) configuration, every instance writes its own log files, which makes it hard to monitor performance or troubleshoot issues across the cluster. In this post, I'll show you how to send your logs, metrics, and traces to Grafana Cloud. This setup gives you a single 'pane of glass' to observe and troubleshoot your whole Stalwart cluster.

Prerequisites

  1. A Stalwart HA cluster.
  2. A free Grafana Cloud account.

Step-by-step

Metrics use a pull-based model: a Prometheus scrape job in Grafana Cloud reads them directly from Stalwart. Logs and traces use a push-based model: Stalwart sends them to Alloy, which acts as a proxy and forwards them to Loki and Tempo in Grafana Cloud.
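
To make the pull side concrete: a scrape is just an authenticated HTTP GET that returns metrics in the Prometheus text format. The Python sketch below builds such a request and counts the samples in a response, which can help you confirm the endpoint works before handing it to Grafana Cloud. It assumes Stalwart serves metrics at `/metrics/prometheus` and accepts HTTP Basic authentication with the Username and Secret from step 1. The helper names are mine and not part of any Stalwart or Grafana API.

```python
import base64
import urllib.request


def basic_auth_header(username: str, secret: str) -> str:
    """Return the HTTP Basic Authorization header value for the credentials."""
    token = base64.b64encode(f"{username}:{secret}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


def build_scrape_request(base_url: str, username: str, secret: str) -> urllib.request.Request:
    """Build (but do not send) the GET request a scraper would issue."""
    # Assumption: Stalwart exposes Prometheus metrics at /metrics/prometheus.
    url = base_url.rstrip("/") + "/metrics/prometheus"
    request = urllib.request.Request(url, method="GET")
    request.add_header("Authorization", basic_auth_header(username, secret))
    return request


def count_samples(exposition_text: str) -> int:
    """Count sample lines in Prometheus text format, skipping blanks and # comments."""
    return sum(
        1
        for line in exposition_text.splitlines()
        if line.strip() and not line.lstrip().startswith("#")
    )


def scrape(base_url: str, username: str, secret: str, timeout: float = 10.0) -> int:
    """Send the scrape request and return the number of samples received."""
    request = build_scrape_request(base_url, username, secret)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return count_samples(response.read().decode("utf-8"))
```

For example, with a hypothetical host and credentials, `scrape("https://mail.example.com", "prometheus", "s3cret")` should return a positive number if the endpoint and credentials are right. An HTTP 401 error suggests the credentials do not match what Stalwart expects.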

  1. Configure Prometheus Authentication:

    In the Stalwart web UI, set a Username and Secret for the Prometheus metrics endpoint so that only authenticated scrapers can read metrics from the Stalwart instance.

    [Screenshot: Prometheus Pull Metrics]

  2. Establish the Scrape Job:

    In the Grafana Cloud interface, navigate to Connections > Add new connection > Metrics Endpoint. Define a new scrape job by entering the URL of your Stalwart metrics endpoint and the credentials from step 1.

    [Screenshot: Prometheus Scrape Job]

  3. Generate Telemetry Tokens:

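    Open the Grafana Cloud Portal, select your stack, and go to Send Logs (Loki) and Send Traces (Tempo) to generate the authentication tokens that Alloy needs. Note the username and endpoint URL shown for each service as well; the Alloy configuration in the next step uses them.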

    [Screenshot: Grafana Cloud Portal]

  4. Deploy Alloy Configuration:

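    Create a config.alloy file on every Stalwart instance to define how telemetry is routed. Replace the placeholder usernames and API keys, and the Loki and Tempo endpoints, with the values for your own stack.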

    // Credentials for Tempo (traces), taken from the Grafana Cloud Portal.
    otelcol.auth.basic "grafanacloud_traces" {
      username = "YOUR_TEMPO_USERNAME"
      password = "YOUR_TEMPO_API_KEY"
    }

    // Receive OTLP data from Stalwart over gRPC (4317) and HTTP (4318).
    otelcol.receiver.otlp "default" {
      grpc {
        endpoint = "0.0.0.0:4317"
      }

      http {
        endpoint = "0.0.0.0:4318"
      }

      // Route logs to Loki and traces to Tempo.
      output {
        logs   = [otelcol.exporter.loki.grafanacloud.input]
        traces = [otelcol.exporter.otlp.grafanacloud_tempo.input]
      }
    }

    // Convert OTLP logs into Loki log entries.
    otelcol.exporter.loki "grafanacloud" {
      forward_to = [loki.write.grafanacloud.receiver]
    }

    // Push logs to Loki in Grafana Cloud; use your own stack's URL.
    loki.write "grafanacloud" {
      endpoint {
        url = "https://logs-prod-035.grafana.net/loki/api/v1/push"

        basic_auth {
          username = "YOUR_LOKI_USERNAME"
          password = "YOUR_LOKI_API_KEY"
        }
      }
    }

    // Send traces to Tempo in Grafana Cloud; use your own stack's endpoint.
    otelcol.exporter.otlp "grafanacloud_tempo" {
      client {
        endpoint = "tempo-prod-25-prod-gb-south-1.grafana.net:443"
        auth     = otelcol.auth.basic.grafanacloud_traces.handler
      }
    }
    
  5. Launch via Docker Compose:

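    Add an Alloy service next to Stalwart under services: in your Docker Compose file, then start it with docker compose up -d so it begins receiving and forwarding telemetry. The snippet below assumes the ip6net network is already defined elsewhere in the same file.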

    stalwart:
      image: stalwartlabs/stalwart:v0.15.5
      container_name: stalwart
      restart: unless-stopped
      networks:
        - ip6net  
      ports:
        - "443:443"
        - "8080:8080"
        - "25:25"
        - "587:587"
        - "465:465"
        - "143:143"
        - "993:993"
        - "4190:4190"
        - "110:110"
        - "995:995"
      volumes:
        - ./data:/opt/stalwart
    alloy:
      image: grafana/alloy:v1.14.2
      container_name: alloy
      networks:
        - ip6net      
      command:
        - run
        - /etc/alloy/config.alloy
      ports:
        - "4317:4317"
        - "4318:4318"
      volumes:
        - ./config.alloy:/etc/alloy/config.alloy:ro
      restart: unless-stopped    
    
  6. Configure Stalwart Export:

    Update the Stalwart configuration to send logs and traces to Alloy, which forwards them to Loki and Tempo in Grafana Cloud. The host name alloy in the endpoint is the Compose service name, which Stalwart resolves over the shared ip6net network.

    tracer.alloy.enable = true
    tracer.alloy.enable.log-exporter = true
    tracer.alloy.enable.span-exporter = true
    tracer.alloy.endpoint = "http://alloy:4317"
    tracer.alloy.level = "debug"
    tracer.alloy.lossy = false
    tracer.alloy.throttle = "1s"
    tracer.alloy.timeout = "10s"
    tracer.alloy.transport = "grpc"
    tracer.alloy.type = "open-telemetry"    
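
If nothing shows up in Grafana Cloud after this step, a quick first check is whether Alloy's OTLP ports are reachable at all. Here is a minimal Python sketch of such a check. It assumes you run it on the Docker host, where the Compose file from step 5 publishes ports 4317 and 4318, and the function names are illustrative only. It tests TCP reachability, not whether Alloy accepts or forwards the data.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


def check_otlp_ports(host: str, ports=(4317, 4318)) -> dict:
    """Map each OTLP port (gRPC 4317 and HTTP 4318 by default) to its reachability."""
    return {port: is_port_open(host, port) for port in ports}
```

Calling `check_otlp_ports("localhost")` returns a dictionary such as `{4317: True, 4318: True}` when both ports accept connections. A `False` entry points at the Alloy container or its port mapping rather than at Stalwart.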
    

Conclusion

Congratulations! Your Stalwart logs, traces, and metrics are now centralized in Grafana Cloud. With every instance reporting to one place, monitoring and troubleshooting your Stalwart HA cluster no longer means digging through separate log files on each node.

[Screenshot: Grafana Cloud Loki]

[Screenshot: Grafana Cloud Metrics]
