High Availability Self-Hosted Plausible Analytics on Kubernetes


Introduction

There are several tutorials on installing Plausible Analytics on Kubernetes, but none of them explain how to achieve high availability. In this guide, I'll walk you through setting up Plausible Analytics with a highly available ClickHouse cluster and PostgreSQL cluster.

Prerequisites

  • A Kubernetes cluster with at least three worker nodes.
  • CloudNativePG - PostgreSQL Operator for Kubernetes installed.
  • Altinity Kubernetes Operator for ClickHouse installed.
  • cert-manager.io installed with a cluster issuer.
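
If the operators are not installed yet, each project ships a single-manifest install. The commands below are only a rough sketch with example version numbers and paths that may be outdated, so check each project's documentation for the current release (Helm charts are also available):

# CloudNativePG operator (example version, check the docs for the current one)
kubectl apply --server-side -f \
    https://raw.githubusercontent.com/cloudnative-pg/cloudnative-pg/release-1.24/releases/cnpg-1.24.0.yaml

# Altinity ClickHouse operator
kubectl apply -f \
    https://raw.githubusercontent.com/Altinity/clickhouse-operator/master/deploy/operator/clickhouse-operator-install-bundle.yaml

# cert-manager (example version, check the docs for the current one)
kubectl apply -f \
    https://github.com/cert-manager/cert-manager/releases/download/v1.15.3/cert-manager.yaml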

Step-by-step Guide

Note: This might only work with Plausible CE v2.1.3, since the ClickHouse schema created manually in step 6 matches that release.

  1. Create a plausible namespace:

    apiVersion: v1
    kind: Namespace
    metadata:
      name: plausible
    

    Save this as namespace.yaml and apply it to the cluster:

    kubectl create -f namespace.yaml
    
  2. ZooKeeper is required for ClickHouse cluster replication, so deploy an HA ZooKeeper ensemble:

    ---
    apiVersion: policy/v1
    kind: PodDisruptionBudget
    metadata:
      name: zookeeper-pod-disruption-budget
      namespace: plausible
    spec:
      selector:
        matchLabels:
          app: zookeeper
      maxUnavailable: 1
    ---
    # Setup Service to provide access to Zookeeper for clients
    apiVersion: v1
    kind: Service
    metadata:
      name: zookeeper
      namespace: plausible
      labels:
        app: zookeeper
    spec:
      ports:
        - port: 2181
          name: client
        - port: 7000
          name: prometheus
      selector:
        app: zookeeper
        what: node
    ---
    # Setup Headless Service for StatefulSet
    apiVersion: v1
    kind: Service
    metadata:
      # DNS would be like zookeeper-0.zookeeper-headless
      name: zookeeper-headless
      namespace: plausible
      labels:
        app: zookeeper
    spec:
      ports:
        - port: 2181
          name: client
        - port: 2888
          name: server
        - port: 3888
          name: leader-election
      clusterIP: None
      selector:
        app: zookeeper
        what: node
    ---
    apiVersion: apps/v1
    kind: StatefulSet
    metadata:
      name: zookeeper
      namespace: plausible
    spec:
      selector:
        matchLabels:
          app: zookeeper
      serviceName: zookeeper-headless
      replicas: 3
      updateStrategy:
        type: RollingUpdate
      podManagementPolicy: Parallel
      template:
        metadata:
          labels:
            app: zookeeper
            what: node
          annotations:
            prometheus.io/port: '7000'
            prometheus.io/scrape: 'true'
        spec:
          affinity:
            podAntiAffinity:
              requiredDuringSchedulingIgnoredDuringExecution:
                - labelSelector:
                    matchExpressions:
                      - key: 'app'
                        operator: In
                        values:
                          - zookeeper
                  topologyKey: 'kubernetes.io/hostname'
          containers:
            - name: kubernetes-zookeeper
              imagePullPolicy: IfNotPresent
              image: 'docker.io/zookeeper:3.8.3'
              ports:
                - containerPort: 2181
                  name: client
                - containerPort: 2888
                  name: server
                - containerPort: 3888
                  name: leader-election
                - containerPort: 7000
                  name: prometheus
              # See those links for proper startup settings:
              # https://github.com/kow3ns/kubernetes-zookeeper/blob/master/docker/scripts/start-zookeeper
              # https://clickhouse.yandex/docs/en/operations/tips/#zookeeper
              # https://github.com/ClickHouse/ClickHouse/issues/11781
              command:
                - bash
                - -x
                - -c
                - |
                  SERVERS=3 &&
                  HOST=`hostname -s` &&
                  DOMAIN=`hostname -d` &&
                  CLIENT_PORT=2181 &&
                  SERVER_PORT=2888 &&
                  ELECTION_PORT=3888 &&
                  PROMETHEUS_PORT=7000 &&
                  ZOO_DATA_DIR=/var/lib/zookeeper/data &&
                  ZOO_DATA_LOG_DIR=/var/lib/zookeeper/datalog &&
                  {
                    echo "clientPort=${CLIENT_PORT}"
                    echo 'tickTime=2000'
                    echo 'initLimit=300'
                    echo 'syncLimit=10'
                    echo 'maxClientCnxns=2000'
                    echo 'maxTimeToWaitForEpoch=2000'
                    echo 'maxSessionTimeout=60000000'
                    echo "dataDir=${ZOO_DATA_DIR}"
                    echo "dataLogDir=${ZOO_DATA_LOG_DIR}"
                    echo 'autopurge.snapRetainCount=10'
                    echo 'autopurge.purgeInterval=1'
                    echo 'preAllocSize=131072'
                    echo 'snapCount=3000000'
                    echo 'leaderServes=yes'
                    echo 'standaloneEnabled=false'
                    echo '4lw.commands.whitelist=*'
                    echo 'metricsProvider.className=org.apache.zookeeper.metrics.prometheus.PrometheusMetricsProvider'
                    echo "metricsProvider.httpPort=${PROMETHEUS_PORT}"
                    echo "zookeeper.skipACL=true"
                  } > /conf/zoo.cfg &&
                  {
                    echo "zookeeper.root.logger=CONSOLE"
                    echo "zookeeper.console.threshold=INFO"
                    echo "log4j.rootLogger=\${zookeeper.root.logger}"
                    echo "log4j.appender.CONSOLE=org.apache.log4j.ConsoleAppender"
                    echo "log4j.appender.CONSOLE.Threshold=\${zookeeper.console.threshold}"
                    echo "log4j.appender.CONSOLE.layout=org.apache.log4j.PatternLayout"
                    echo "log4j.appender.CONSOLE.layout.ConversionPattern=%d{ISO8601} [myid:%X{myid}] - %-5p [%t:%C{1}@%L] - %m%n"
                  } > /conf/log4j.properties &&
                  echo 'JVMFLAGS="-Xms128M -Xmx1G -XX:ActiveProcessorCount=8 -XX:+AlwaysPreTouch -Djute.maxbuffer=8388608 -XX:MaxGCPauseMillis=50"' > /conf/java.env &&
                  if [[ $HOST =~ (.*)-([0-9]+)$ ]]; then
                      NAME=${BASH_REMATCH[1]}
                      ORD=${BASH_REMATCH[2]}
                  else
                      echo "Failed to parse name and ordinal of Pod"
                      exit 1
                  fi &&
                  mkdir -p ${ZOO_DATA_DIR} &&
                  mkdir -p ${ZOO_DATA_LOG_DIR} &&
                  export MY_ID=$((ORD+1)) &&
                  echo $MY_ID > $ZOO_DATA_DIR/myid &&
                  for (( i=1; i<=$SERVERS; i++ )); do
                      echo "server.$i=$NAME-$((i-1)).$DOMAIN:$SERVER_PORT:$ELECTION_PORT" >> /conf/zoo.cfg;
                  done &&
                  if [[ $SERVERS -eq 1 ]]; then
                      echo "group.1=1" >> /conf/zoo.cfg;
                  else
                      echo "group.1=1:2:3" >> /conf/zoo.cfg;
                  fi &&
                  chown -Rv zookeeper "$ZOO_DATA_DIR" "$ZOO_DATA_LOG_DIR" "$ZOO_LOG_DIR" "$ZOO_CONF_DIR" &&
                  zkServer.sh start-foreground
              readinessProbe:
                exec:
                  command:
                    - bash
                    - -c
                    - 'OK=$(echo ruok | nc 127.0.0.1 2181); if [[ "$OK" == "imok" ]]; then exit 0; else exit 1; fi'
                initialDelaySeconds: 10
                timeoutSeconds: 5
              livenessProbe:
                exec:
                  command:
                    - bash
                    - -c
                    - 'OK=$(echo ruok | nc 127.0.0.1 2181); if [[ "$OK" == "imok" ]]; then exit 0; else exit 1; fi'
                initialDelaySeconds: 10
                timeoutSeconds: 5
              volumeMounts:
                - name: zookeeper-data
                  mountPath: /var/lib/zookeeper
          securityContext:
            runAsUser: 1000
            fsGroup: 1000
      volumeClaimTemplates:
        - metadata:
            name: zookeeper-data
          spec:
            accessModes:
              - ReadWriteOnce
            resources:
              requests:
                storage: 1Gi    
    

    Save this as zookeeper.yaml and apply it to the cluster:

    kubectl create -f zookeeper.yaml
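
    Before moving on, it is worth confirming that the ensemble has formed a quorum. A quick check, assuming the Pods are named zookeeper-0 to zookeeper-2 as in the StatefulSet above:

    # Each node should report "Mode: leader" or "Mode: follower", never "standalone"
    for i in 0 1 2; do
        kubectl -n plausible exec zookeeper-$i -- bash -c 'echo srvr | nc 127.0.0.1 2181' | grep Mode
    done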
    
  3. Deploy the HA ClickHouse cluster:

    apiVersion: 'clickhouse.altinity.com/v1'
    kind: 'ClickHouseInstallation'
    metadata:
      name: plausible
      namespace: plausible
    spec:
      configuration:
        users:
          plausible/password_sha256_hex: xxxyyy # printf 'YOUR_PASSWORD' | sha256sum
          plausible/networks/ip:
            - 0.0.0.0/0
        zookeeper:
          nodes:
            - host: zookeeper-0.zookeeper-headless.plausible.svc
            - host: zookeeper-1.zookeeper-headless.plausible.svc
            - host: zookeeper-2.zookeeper-headless.plausible.svc
        clusters:
          - name: 'cluster'
            templates:
              podTemplate: pod-template
            layout:
              shardsCount: 1
              replicasCount: 3
      templates:
        podTemplates:
          - name: pod-template
            spec:
              containers:
                - name: clickhouse
                  image: clickhouse/clickhouse-server:24.3
                  volumeMounts:
                    - name: clickhouse-data
                      mountPath: /var/lib/clickhouse
                    - name: clickhouse-log
                      mountPath: /var/log/clickhouse-server
        volumeClaimTemplates:
          - name: clickhouse-data
            spec:
              accessModes:
                - ReadWriteOnce
              resources:
                requests:
                  storage: 2Gi
          - name: clickhouse-log
            spec:
              accessModes:
                - ReadWriteOnce
              resources:
                requests:
                  storage: 2Gi    
    

    Save this as clickhouse.yaml and apply it to the cluster:

    kubectl create -f clickhouse.yaml
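
    The operator needs a few minutes to create the pods. A quick sketch for checking progress (the chi short name and the pod label are provided by the Altinity operator):

    # Wait until the installation reports its status as Completed
    kubectl -n plausible get chi plausible
    kubectl -n plausible get pods -l clickhouse.altinity.com/chi=plausible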
    
  4. Deploy an HA PostgreSQL cluster:

    ---
    apiVersion: postgresql.cnpg.io/v1
    kind: Cluster
    metadata:
      name: postgresql-plausible
      namespace: plausible
    spec:
      # https://github.com/cloudnative-pg/postgres-containers/pkgs/container/postgresql
      # https://github.com/cloudnative-pg/postgis-containers/pkgs/container/postgis
      imageName: ghcr.io/cloudnative-pg/postgresql:17.0
      instances: 3
      postgresql:
        parameters:
          max_worker_processes: '60'
        pg_hba:
          - host all all all md5
      storage:
        size: 1Gi
      primaryUpdateMethod: switchover
      monitoring:
        enablePodMonitor: true
    ---
    apiVersion: postgresql.cnpg.io/v1
    kind: Pooler
    metadata:
      name: postgresql
      namespace: plausible
    spec:
      cluster:
        name: postgresql-plausible
      instances: 3
      type: rw
      pgbouncer:
        poolMode: session
        parameters:
          max_client_conn: '1000'
          default_pool_size: '10'    
    

    Save this as postgresql.yaml and apply it to the cluster:

    kubectl create -f postgresql.yaml
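
    You can verify that one primary and two replicas come up before continuing (a sketch; the optional cnpg kubectl plugin gives a more detailed report):

    kubectl -n plausible get clusters.postgresql.cnpg.io postgresql-plausible
    kubectl -n plausible get pods -l cnpg.io/cluster=postgresql-plausible
    # Optional, requires the cnpg plugin:
    kubectl cnpg status postgresql-plausible -n plausible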
    
  5. Provide the ENVs for Plausible through a ConfigMap and a Secret:

    You can get the PostgreSQL password from the postgresql-plausible-app secret:

    kubectl -n plausible get secret postgresql-plausible-app -o jsonpath="{.data.password}" | base64 --decode
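
    For convenience, you can capture it in a shell variable and substitute it into the DATABASE_URL below (PG_PASSWORD is just a helper name used in this guide):

    export PG_PASSWORD=$(kubectl -n plausible get secret postgresql-plausible-app \
        -o jsonpath="{.data.password}" | base64 --decode)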
    

    You can generate a K8s Secret manifest for the sensitive Plausible ENVs:

    export SECRET_KEY_BASE=$(openssl rand -base64 48)
    export TOTP_VAULT_KEY=$(openssl rand -base64 32)
    export DATABASE_URL='postgres://app:<PASSWORD>@<HOST>:5432/app' # password from the postgresql-plausible-app secret; host e.g. postgresql.plausible.svc (the Pooler service)
    export CLICKHOUSE_DATABASE_URL='http://default:<PASSWORD>@<HOST>:8123/default' # ClickHouse credentials and service host, e.g. clickhouse-plausible.plausible.svc; adjust the user if you use the plausible user configured above
    export SMTP_USER_PWD='toBeReplaced'
    
    kubectl create secret generic --dry-run=client \
        plausible-env \
        --namespace='plausible' \
        --from-literal=SECRET_KEY_BASE=$SECRET_KEY_BASE \
        --from-literal=TOTP_VAULT_KEY=$TOTP_VAULT_KEY \
        --from-literal=DATABASE_URL=$DATABASE_URL \
        --from-literal=CLICKHOUSE_DATABASE_URL=$CLICKHOUSE_DATABASE_URL \
        --from-literal=SMTP_USER_PWD=$SMTP_USER_PWD \
        -o yaml
    

    Combine the ConfigMap below with the Secret produced by the dry-run command above; your final YAML file for the ENVs will look like this:

    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: plausible-env
      namespace: plausible
    data:
      BASE_URL: 'https://analytics.domain.com' # Update this
      DISABLE_REGISTRATION: 'invite_only'
      ENABLE_EMAIL_VERIFICATION: 'true'
      MAILER_NAME: 'Plausible Analytics'
      MAILER_EMAIL: 'plausible@domain.com' # Update this
      SMTP_HOST_ADDR: 'smtp.tem.scw.cloud' # Update this
      SMTP_USER_NAME: 'xxxxxxxx-aaaa-bbbb-cccc-yyyyyyyyyyyyy' # Update this
    ---
    apiVersion: v1
    data:
      CLICKHOUSE_DATABASE_URL: XXX
      DATABASE_URL: YYY
      SECRET_KEY_BASE: ZZZ
      SMTP_USER_PWD: AAA
      TOTP_VAULT_KEY: BBB
    kind: Secret
    metadata:
      creationTimestamp: null
      name: plausible-env
      namespace: plausible
    

    Save this as env.yaml and apply it to the cluster:

    kubectl create -f env.yaml
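
    You can sanity-check that both objects exist and that the Secret values decode back to what you expect, for example:

    kubectl -n plausible get configmap,secret plausible-env
    kubectl -n plausible get secret plausible-env -o jsonpath='{.data.DATABASE_URL}' | base64 --decode; echo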
    
  6. Plausible doesn't natively support a highly available (HA) ClickHouse replication cluster. To work around this, we need to manually create the necessary Replicated tables on the ClickHouse cluster:

    CREATE TABLE default.location_data ON cluster '{cluster}'
    (
        `type` LowCardinality(String),
        `id` String,
        `name` String
    )
    ENGINE = ReplicatedMergeTree('/clickhouse/{cluster}/tables/{shard}/{database}/{table}', '{replica}')
    ORDER BY (type, id)
    SETTINGS index_granularity = 128
    COMMENT '2024-07-09';
    
    CREATE DICTIONARY default.location_data_dict ON cluster '{cluster}'
    (
        `type` String,
        `id` String,
        `name` String
    )
    PRIMARY KEY type, id
    SOURCE(CLICKHOUSE(TABLE location_data DB 'default'))
    LIFETIME(MIN 0 MAX 0)
    LAYOUT(COMPLEX_KEY_CACHE(SIZE_IN_CELLS 500000));
    
    CREATE TABLE default.sessions_v2 ON cluster '{cluster}'
    (
        `session_id` UInt64,
        `sign` Int8,
        `site_id` UInt64,
        `user_id` UInt64,
        `hostname` String CODEC(ZSTD(3)),
        `timestamp` DateTime CODEC(Delta(4), LZ4),
        `start` DateTime CODEC(Delta(4), LZ4),
        `is_bounce` UInt8,
        `entry_page` String CODEC(ZSTD(3)),
        `exit_page` String CODEC(ZSTD(3)),
        `pageviews` Int32,
        `events` Int32,
        `duration` UInt32,
        `referrer` String CODEC(ZSTD(3)),
        `referrer_source` String CODEC(ZSTD(3)),
        `country_code` LowCardinality(FixedString(2)),
        `screen_size` LowCardinality(String),
        `operating_system` LowCardinality(String),
        `browser` LowCardinality(String),
        `utm_medium` String CODEC(ZSTD(3)),
        `utm_source` String CODEC(ZSTD(3)),
        `utm_campaign` String CODEC(ZSTD(3)),
        `browser_version` LowCardinality(String),
        `operating_system_version` LowCardinality(String),
        `subdivision1_code` LowCardinality(String),
        `subdivision2_code` LowCardinality(String),
        `city_geoname_id` UInt32,
        `utm_content` String CODEC(ZSTD(3)),
        `utm_term` String CODEC(ZSTD(3)),
        `transferred_from` String,
        `entry_meta.key` Array(String) CODEC(ZSTD(3)),
        `entry_meta.value` Array(String) CODEC(ZSTD(3)),
        `exit_page_hostname` String CODEC(ZSTD(3)),
        `city` UInt32 ALIAS city_geoname_id,
        `country` LowCardinality(FixedString(2)) ALIAS country_code,
        `device` LowCardinality(String) ALIAS screen_size,
        `entry_page_hostname` String ALIAS hostname,
        `os` LowCardinality(String) ALIAS operating_system,
        `os_version` LowCardinality(String) ALIAS operating_system_version,
        `region` LowCardinality(String) ALIAS subdivision1_code,
        `screen` LowCardinality(String) ALIAS screen_size,
        `source` String ALIAS referrer_source,
        `country_name` String ALIAS dictGet('default.location_data_dict', 'name', ('country', country_code)),
        `region_name` String ALIAS dictGet('default.location_data_dict', 'name', ('subdivision', subdivision1_code)),
        `city_name` String ALIAS dictGet('default.location_data_dict', 'name', ('city', city_geoname_id)),
        `channel` LowCardinality(String),
        INDEX minmax_timestamp timestamp TYPE minmax GRANULARITY 1
    )
    ENGINE = ReplicatedVersionedCollapsingMergeTree('/clickhouse/{cluster}/tables/{shard}/{database}/{table}', '{replica}', sign, events)
    PARTITION BY toYYYYMM(start)
    PRIMARY KEY (site_id, toDate(start), user_id, session_id)
    ORDER BY (site_id, toDate(start), user_id, session_id)
    SAMPLE BY user_id
    SETTINGS index_granularity = 8192;
    
    
    CREATE TABLE default.ingest_counters ON cluster '{cluster}'
    (
        `event_timebucket` DateTime,
        `domain` LowCardinality(String),
        `site_id` Nullable(UInt64),
        `metric` LowCardinality(String),
        `value` UInt64
    )
    ENGINE = ReplicatedSummingMergeTree('/clickhouse/{cluster}/tables/{shard}/{database}/{table}', '{replica}', value)
    ORDER BY (domain, toDate(event_timebucket), metric, toStartOfMinute(event_timebucket))
    SETTINGS index_granularity = 8192;
    
    CREATE TABLE default.imported_visitors ON cluster '{cluster}'
    (
        `site_id` UInt64,
        `date` Date,
        `visitors` UInt64,
        `pageviews` UInt64,
        `bounces` UInt64,
        `visits` UInt64,
        `visit_duration` UInt64,
        `import_id` UInt64
    )
    ENGINE = ReplicatedMergeTree('/clickhouse/{cluster}/tables/{shard}/{database}/{table}', '{replica}')
    ORDER BY (site_id, date)
    SETTINGS index_granularity = 8192, replicated_deduplication_window = 0;
    
    CREATE TABLE default.imported_sources ON cluster '{cluster}'
    (
        `site_id` UInt64,
        `date` Date,
        `source` String,
        `utm_medium` String,
        `utm_campaign` String,
        `utm_content` String,
        `utm_term` String,
        `visitors` UInt64,
        `visits` UInt64,
        `visit_duration` UInt64,
        `bounces` UInt32,
        `import_id` UInt64,
        `pageviews` UInt64,
        `referrer` String,
        `utm_source` String
    )
    ENGINE = ReplicatedMergeTree('/clickhouse/{cluster}/tables/{shard}/{database}/{table}', '{replica}')
    ORDER BY (site_id, date, source)
    SETTINGS index_granularity = 8192, replicated_deduplication_window = 0;
    
    CREATE TABLE default.imported_pages ON cluster '{cluster}'
    (
        `site_id` UInt64,
        `date` Date,
        `hostname` String,
        `page` String,
        `visitors` UInt64,
        `pageviews` UInt64,
        `exits` UInt64,
        `time_on_page` UInt64,
        `import_id` UInt64,
        `visits` UInt64,
        `active_visitors` UInt64
    )
    ENGINE = ReplicatedMergeTree('/clickhouse/{cluster}/tables/{shard}/{database}/{table}', '{replica}')
    ORDER BY (site_id, date, hostname, page)
    SETTINGS index_granularity = 8192, replicated_deduplication_window = 0;
    
    CREATE TABLE default.imported_operating_systems ON cluster '{cluster}'
    (
        `site_id` UInt64,
        `date` Date,
        `operating_system` String,
        `visitors` UInt64,
        `visits` UInt64,
        `visit_duration` UInt64,
        `bounces` UInt32,
        `import_id` UInt64,
        `pageviews` UInt64,
        `operating_system_version` String
    )
    ENGINE = ReplicatedMergeTree('/clickhouse/{cluster}/tables/{shard}/{database}/{table}', '{replica}')
    ORDER BY (site_id, date, operating_system)
    SETTINGS index_granularity = 8192, replicated_deduplication_window = 0;
    
    CREATE TABLE default.imported_locations ON cluster '{cluster}'
    (
        `site_id` UInt64,
        `date` Date,
        `country` String,
        `region` String,
        `city` UInt64,
        `visitors` UInt64,
        `visits` UInt64,
        `visit_duration` UInt64,
        `bounces` UInt32,
        `import_id` UInt64,
        `pageviews` UInt64,
        `country_name` String ALIAS dictGet('default.location_data_dict', 'name', ('country', country)),
        `region_name` String ALIAS dictGet('default.location_data_dict', 'name', ('subdivision', region)),
        `city_name` String ALIAS dictGet('default.location_data_dict', 'name', ('city', city))
    )
    ENGINE = ReplicatedMergeTree('/clickhouse/{cluster}/tables/{shard}/{database}/{table}', '{replica}')
    ORDER BY (site_id, date, country, region, city)
    SETTINGS index_granularity = 8192, replicated_deduplication_window = 0;
    
    CREATE TABLE default.imported_exit_pages ON cluster '{cluster}'
    (
        `site_id` UInt64,
        `date` Date,
        `exit_page` String,
        `visitors` UInt64,
        `exits` UInt64,
        `import_id` UInt64,
        `pageviews` UInt64,
        `bounces` UInt32,
        `visit_duration` UInt64
    )
    ENGINE = ReplicatedMergeTree('/clickhouse/{cluster}/tables/{shard}/{database}/{table}', '{replica}')
    ORDER BY (site_id, date, exit_page)
    SETTINGS index_granularity = 8192, replicated_deduplication_window = 0;
    
    CREATE TABLE default.imported_entry_pages ON cluster '{cluster}'
    (
        `site_id` UInt64,
        `date` Date,
        `entry_page` String,
        `visitors` UInt64,
        `entrances` UInt64,
        `visit_duration` UInt64,
        `bounces` UInt32,
        `import_id` UInt64,
        `pageviews` UInt64
    )
    ENGINE = ReplicatedMergeTree('/clickhouse/{cluster}/tables/{shard}/{database}/{table}', '{replica}')
    ORDER BY (site_id, date, entry_page)
    SETTINGS index_granularity = 8192, replicated_deduplication_window = 0;
    
    CREATE TABLE default.imported_devices ON cluster '{cluster}'
    (
        `site_id` UInt64,
        `date` Date,
        `device` String,
        `visitors` UInt64,
        `visits` UInt64,
        `visit_duration` UInt64,
        `bounces` UInt32,
        `import_id` UInt64,
        `pageviews` UInt64
    )
    ENGINE = ReplicatedMergeTree('/clickhouse/{cluster}/tables/{shard}/{database}/{table}', '{replica}')
    ORDER BY (site_id, date, device)
    SETTINGS index_granularity = 8192, replicated_deduplication_window = 0;
    
    CREATE TABLE default.imported_custom_events ON cluster '{cluster}'
    (
        `site_id` UInt64,
        `import_id` UInt64,
        `date` Date,
        `name` String CODEC(ZSTD(3)),
        `link_url` String CODEC(ZSTD(3)),
        `path` String CODEC(ZSTD(3)),
        `visitors` UInt64,
        `events` UInt64
    )
    ENGINE = ReplicatedMergeTree('/clickhouse/{cluster}/tables/{shard}/{database}/{table}', '{replica}')
    ORDER BY (site_id, import_id, date, name)
    SETTINGS replicated_deduplication_window = 0, index_granularity = 8192;
    
    CREATE TABLE default.imported_browsers ON cluster '{cluster}'
    (
        `site_id` UInt64,
        `date` Date,
        `browser` String,
        `visitors` UInt64,
        `visits` UInt64,
        `visit_duration` UInt64,
        `bounces` UInt32,
        `import_id` UInt64,
        `pageviews` UInt64,
        `browser_version` String
    )
    ENGINE = ReplicatedMergeTree('/clickhouse/{cluster}/tables/{shard}/{database}/{table}', '{replica}')
    ORDER BY (site_id, date, browser)
    SETTINGS index_granularity = 8192, replicated_deduplication_window = 0;
    
    CREATE TABLE default.events_v2 ON cluster '{cluster}'
    (
        `timestamp` DateTime CODEC(Delta(4), LZ4),
        `name` LowCardinality(String),
        `site_id` UInt64,
        `user_id` UInt64,
        `session_id` UInt64,
        `hostname` String CODEC(ZSTD(3)),
        `pathname` String CODEC(ZSTD(3)),
        `referrer` String CODEC(ZSTD(3)),
        `referrer_source` String CODEC(ZSTD(3)),
        `country_code` FixedString(2),
        `screen_size` LowCardinality(String),
        `operating_system` LowCardinality(String),
        `browser` LowCardinality(String),
        `utm_medium` String CODEC(ZSTD(3)),
        `utm_source` String CODEC(ZSTD(3)),
        `utm_campaign` String CODEC(ZSTD(3)),
        `meta.key` Array(String) CODEC(ZSTD(3)),
        `meta.value` Array(String) CODEC(ZSTD(3)),
        `browser_version` LowCardinality(String),
        `operating_system_version` LowCardinality(String),
        `subdivision1_code` LowCardinality(String),
        `subdivision2_code` LowCardinality(String),
        `city_geoname_id` UInt32,
        `utm_content` String CODEC(ZSTD(3)),
        `utm_term` String CODEC(ZSTD(3)),
        `revenue_reporting_amount` Nullable(Decimal(18, 3)),
        `revenue_reporting_currency` FixedString(3),
        `revenue_source_amount` Nullable(Decimal(18, 3)),
        `revenue_source_currency` FixedString(3),
        `city` UInt32 ALIAS city_geoname_id,
        `country` LowCardinality(FixedString(2)) ALIAS country_code,
        `device` LowCardinality(String) ALIAS screen_size,
        `os` LowCardinality(String) ALIAS operating_system,
        `os_version` LowCardinality(String) ALIAS operating_system_version,
        `region` LowCardinality(String) ALIAS subdivision1_code,
        `screen` LowCardinality(String) ALIAS screen_size,
        `source` String ALIAS referrer_source,
        `country_name` String ALIAS dictGet('default.location_data_dict', 'name', ('country', country_code)),
        `region_name` String ALIAS dictGet('default.location_data_dict', 'name', ('subdivision', subdivision1_code)),
        `city_name` String ALIAS dictGet('default.location_data_dict', 'name', ('city', city_geoname_id)),
        `channel` LowCardinality(String)
    )
    ENGINE = ReplicatedMergeTree('/clickhouse/{cluster}/tables/{shard}/{database}/{table}', '{replica}')
    PARTITION BY toYYYYMM(timestamp)
    PRIMARY KEY (site_id, toDate(timestamp), name, user_id)
    ORDER BY (site_id, toDate(timestamp), name, user_id, timestamp)
    SAMPLE BY user_id
    SETTINGS index_granularity = 8192;
    
    CREATE TABLE default.schema_migrations ON cluster '{cluster}'
    (
        `version` Int64,
        `inserted_at` DateTime
    )
    ENGINE = ReplicatedMergeTree('/clickhouse/{cluster}/tables/{shard}/{database}/{table}', '{replica}')
    ORDER BY tuple();
    
    INSERT INTO "default"."schema_migrations" (version, inserted_at) VALUES
    (20200915070607,'2024-09-11 09:23:21'),
    (20200918075025,'2024-09-11 09:23:21'),
    (20201020083739,'2024-09-11 09:23:21'),
    (20201106125234,'2024-09-11 09:23:21'),
    (20210323130440,'2024-09-11 09:23:21'),
    (20210712214034,'2024-09-11 09:23:21'),
    (20211017093035,'2024-09-11 09:23:21'),
    (20211112130238,'2024-09-11 09:23:21'),
    (20220310104931,'2024-09-11 09:23:21'),
    (20220404123000,'2024-09-11 09:23:21'),
    (20220421161259,'2024-09-11 09:23:21'),
    (20220422075510,'2024-09-11 09:23:21'),
    (20230124140348,'2024-09-11 09:23:21'),
    (20230210140348,'2024-09-11 09:23:21'),
    (20230214114402,'2024-09-11 09:23:21'),
    (20230320094327,'2024-09-11 09:23:21'),
    (20230417104025,'2024-09-11 09:23:21'),
    (20230509124919,'2024-09-11 09:23:21'),
    (20231017073642,'2024-09-11 09:23:21'),
    (20240123142959,'2024-09-11 09:23:21'),
    (20240209085338,'2024-09-11 09:23:21'),
    (20240220123656,'2024-09-11 09:23:21'),
    (20240222082911,'2024-09-11 09:23:21'),
    (20240305085310,'2024-09-11 09:23:21'),
    (20240326134840,'2024-09-11 09:23:21'),
    (20240327085855,'2024-09-11 09:23:21'),
    (20240419133926,'2024-09-11 09:23:21'),
    (20240423094014,'2024-09-11 09:23:21'),
    (20240502115822,'2024-09-11 09:23:21'),
    (20240709181437,'2024-09-11 09:23:22'),
    (20240801091615,'2024-09-11 09:23:22'),
    (20240829092858,'2024-09-11 09:23:22');    
    

    Connect to ClickHouse and execute the SQL statements above using your preferred tool, such as DBeaver.
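
    Alternatively, you can run them with clickhouse-client inside one of the ClickHouse pods. A sketch, assuming the statements are saved locally as tables.sql and using the plausible user configured in step 3:

    kubectl -n plausible exec -i chi-plausible-cluster-0-0-0 -- \
        clickhouse-client --user plausible --password 'YOUR_PASSWORD' --multiquery < tables.sql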

  7. Deploy Plausible with two or more replicas:

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: plausible
      namespace: plausible
      annotations:
        reloader.stakater.com/auto: 'true'
      labels:
        app: plausible
    spec:
      replicas: 2
      selector:
        matchLabels:
          app: plausible
      template:
        metadata:
          namespace: plausible
          labels:
            app: plausible
        spec:
          initContainers:
            - name: migration
              image: ghcr.io/plausible/community-edition:v2.1.3
              command:
                - /bin/sh
                - -c
                - |
                  /entrypoint.sh db createdb && /entrypoint.sh db migrate
              envFrom:
                - secretRef:
                    name: plausible-env
                - configMapRef:
                    name: plausible-env
          containers:
            - name: plausible
              image: ghcr.io/plausible/community-edition:v2.1.3
              ports:
                - containerPort: 8000
              envFrom:
                - secretRef:
                    name: plausible-env
                - configMapRef:
                    name: plausible-env    
    

    Save this as depl.yaml and apply it to the cluster:

    kubectl create -f depl.yaml
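
    The migration init container runs the database migrations before the app starts, so the first rollout can take a minute. You can follow it with:

    kubectl -n plausible rollout status deployment/plausible
    # Inspect the migration output of one of the pods if the rollout hangs
    kubectl -n plausible logs deployment/plausible -c migration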
    
  8. Configure a Service and an Ingress for Plausible:

    ---
    apiVersion: v1
    kind: Service
    metadata:
      name: plausible
      namespace: plausible
    spec:
      selector:
        app: plausible
      ports:
        - protocol: TCP
          port: 8000
          targetPort: 8000
    ---
    apiVersion: networking.k8s.io/v1
    kind: Ingress
    metadata:
      name: plausible
      namespace: plausible
      annotations:
        cert-manager.io/cluster-issuer: letsencrypt-cluster-issuer
    spec:
      tls:
        - hosts:
            - analytics.domain.com # Update this
          secretName: plausible-tls
      rules:
        - host: analytics.domain.com
          http:
            paths:
              - path: /
                pathType: Prefix
                backend:
                  service:
                    name: plausible
                    port:
                      number: 8000
    

    Save this as ingress-service.yaml and apply it to the cluster:

    kubectl create -f ingress-service.yaml
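
    cert-manager should pick up the annotation and issue the certificate into the plausible-tls secret. You can check the Ingress and the generated Certificate resource (its name usually matches the TLS secret name):

    kubectl -n plausible get ingress plausible
    kubectl -n plausible get certificate
    # The certificate is usable once its READY column reports True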
    

Verify

Once everything is configured correctly, the pod listing should look like this:

kubectl -n plausible get pods
NAME                          READY   STATUS    RESTARTS      AGE
chi-plausible-cluster-0-0-0   1/1     Running   0             24h
chi-plausible-cluster-0-1-0   1/1     Running   0             24h
chi-plausible-cluster-0-2-0   1/1     Running   0             24h
plausible-6cc78d59bb-56wjj    1/1     Running   2 (24h ago)   24h
plausible-6cc78d59bb-kl6sf    1/1     Running   1 (24h ago)   24h
postgresql-7968d465bd-87gx4   1/1     Running   0             24h
postgresql-7968d465bd-hlrdj   1/1     Running   0             24h
postgresql-7968d465bd-wj4ln   1/1     Running   0             24h
postgresql-plausible-1        1/1     Running   0             24h
postgresql-plausible-2        1/1     Running   0             24h
postgresql-plausible-3        1/1     Running   0             24h
zookeeper-0                   1/1     Running   1 (22h ago)   24h
zookeeper-1                   1/1     Running   0             24h
zookeeper-2                   1/1     Running   0             24h

You can now access analytics.domain.com and create your Plausible account.
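
Optionally, you can also confirm that ClickHouse replication is healthy. A sketch using the plausible user from step 3; no replica should be read-only or show a large delay:

kubectl -n plausible exec chi-plausible-cluster-0-0-0 -- \
    clickhouse-client --user plausible --password 'YOUR_PASSWORD' \
    --query "SELECT table, is_readonly, absolute_delay FROM system.replicas"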

Conclusion

By following these steps, you now have a production-grade Plausible Analytics setup running on your Kubernetes cluster. I've been running my own Plausible instance this way, and it has performed well without any downtime, even during Kubernetes node maintenance and restarts.
