
monitoring-ng

Yet another monitoring stack... This time it's an all-in-one grafana/prometheus/loki/node_exporter/cadvisor/promtail stack. It is based heavily on the monitoring-lite stack, but now includes everything in one recipe. You can deploy monitoring instances that only gather metrics and logs (node_exporter/cadvisor/promtail), and instances with the full monitoring stack (grafana/prometheus/loki), from the same recipe with just a different .env configuration.

  • Category: Apps
  • Status: 2, beta
  • Image: grafana/grafana, 4, upstream
  • Healthcheck: 3
  • Backups: 1
  • Email: 3
  • Tests: No
  • SSO: 1

Set up Metrics Gathering

In the steps below, gathering.org is the node you want to gather metrics from.

  1. Configure DNS
    • monitoring.gathering.org
    • cadvisor.monitoring.gathering.org
    • node.monitoring.gathering.org
  2. Configure Traefik to use BasicAuth
    • Run abra app config traefik.gathering.org and uncomment:
# BASIC_AUTH
COMPOSE_FILE="$COMPOSE_FILE:compose.basicauth.yml"
BASIC_AUTH=1
SECRET_USERSFILE_VERSION=v1
    • Generate a users list entry with an htpasswd-hashed password (for example htpasswd -nB admin, which prompts for the password and prints admin:<hashed-secret>) and insert it: abra app secret insert traefik.gathering.org usersfile v1 'admin:<hashed-secret>'. Make sure there is no whitespace anywhere in admin:<hashed-secret>; whitespace seems to break it.
    • abra app deploy -f traefik
  3. abra app new monitoring-ng (use monitoring.gathering.org as the domain)
  4. abra app config monitoring.gathering.org: for gathering only, the main compose.yml is needed, nothing more.
  5. abra app deploy monitoring.gathering.org
  6. Check that the endpoints are up and that basic-auth works:
    • cadvisor.monitoring.gathering.org
    • node.monitoring.gathering.org
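The endpoint check at the end of this list can be scripted. The sketch below is not part of the recipe: it requests /metrics on both exporters once without credentials (expecting Traefik to answer 401) and once with them (expecting 200). The user `admin` matches the usersfile entry above, `/metrics` is the standard node_exporter/cAdvisor metrics path, and reading the plaintext password from a `BASIC_AUTH_PASSWORD` environment variable is a hypothetical convention of this sketch.

```shell
#!/usr/bin/env bash
# Sketch (not part of the recipe): verify that the exporters of a gathering
# node answer and that traefik basic-auth is enforced in front of them.

# classify_auth NOAUTH_STATUS AUTH_STATUS
# Prints "ok" when the request without credentials was rejected (401) and the
# request with credentials succeeded (200), otherwise a short reason.
classify_auth() {
  local noauth="$1" auth="$2"
  if [ "$noauth" != "401" ]; then
    echo "basic-auth not enforced (got $noauth without credentials)"
  elif [ "$auth" != "200" ]; then
    echo "credentials rejected or endpoint down (got $auth)"
  else
    echo "ok"
  fi
}

# http_status URL [USER:PASSWORD] -> prints only the HTTP status code
# (curl prints 000 when it cannot connect at all)
http_status() {
  if [ -n "${2:-}" ]; then
    curl -s -o /dev/null -w '%{http_code}' -u "$2" "$1"
  else
    curl -s -o /dev/null -w '%{http_code}' "$1"
  fi
}

# check_node DOMAIN [USER] -> checks cadvisor.monitoring.DOMAIN and node.monitoring.DOMAIN
check_node() {
  local domain="$1" user="${2:-admin}" host result rc=0
  for host in "cadvisor.monitoring.$domain" "node.monitoring.$domain"; do
    result="$(classify_auth \
      "$(http_status "https://$host/metrics")" \
      "$(http_status "https://$host/metrics" "$user:$BASIC_AUTH_PASSWORD")")"
    echo "$host: $result"
    [ "$result" = "ok" ] || rc=1
  done
  return "$rc"
}

# Network checks only run when a domain is given on the command line.
if [ "$#" -ge 1 ]; then
  check_node "$@"
fi
```

Usage, assuming the script is saved as check-gathering.sh: `BASIC_AUTH_PASSWORD='<plaintext-secret>' ./check-gathering.sh gathering.org`. Checking the 401 as well as the 200 matters: a 200 without credentials would mean the exporters are publicly readable.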

Expose node and cadvisor via ports instead of traefik

If no Traefik is running on the machine, you can expose the ports directly by uncommenting the following line in the app's .env file:

# COMPOSE_FILE="$COMPOSE_FILE:compose.expose-ports.yml"
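Uncommenting is normally done in the editor that abra app config opens. If you prefer to script it, here is a minimal sketch. `uncomment_env_line` is a hypothetical helper, not an abra command, and the env file path in the usage note is an assumption about where abra keeps app configs on your machine.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of abra or this recipe): remove the leading
# "# " from every line of an env file that contains NEEDLE as a fixed string.
# Lines starting with "## " (section headers) are left untouched.

# uncomment_env_line FILE NEEDLE
uncomment_env_line() {
  local file="$1" needle="$2" tmp line
  tmp="$(mktemp)"
  while IFS= read -r line || [ -n "$line" ]; do
    if [[ "$line" == "# "*"$needle"* ]]; then
      printf '%s\n' "${line:2}"   # drop the "# " prefix
    else
      printf '%s\n' "$line"       # keep every other line unchanged
    fi
  done < "$file" > "$tmp"
  # Write back through cat so the env file keeps its original permissions
  cat "$tmp" > "$file" && rm -f "$tmp"
}
```

Usage (path assumed): `uncomment_env_line ~/.abra/servers/<server>/monitoring.gathering.org.env compose.expose-ports.yml`, then redeploy the app. The same helper works for the Pushgateway line further down.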

Set up the Metrics Browser

  1. Configure DNS
    • monitoring.example.org
    • prometheus.monitoring.example.org
    • loki.monitoring.example.org
  2. Setup monitoring stack
    • abra app new monitoring-ng
    • abra app config monitoring.example.org and uncomment the settings for the full stack
    • abra app secret insert monitoring.example.org basic_auth v1 <secret>: this needs the plaintext traefik basic-auth secret, not the hashed one!
    • abra app secret ls monitoring.example.org
    • abra app deploy monitoring.example.org
  3. Add a scrape config to Prometheus
    • abra app cmd monitoring.example.org prometheus gathering.org
    • or manually:
    cp scrape-config.example.yml gathering.org.yml
    # adjust the domain in gathering.org.yml to gathering.org
    # mkdir scrape_configs
    abra app cp monitoring.example.org gathering.org.yml prometheus:/prometheus/scrape_configs/
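The contents of scrape-config.example.yml are not reproduced in this README. Purely as an illustration, the sketch below writes a file in the Prometheus `scrape_configs` format that scrapes node_exporter and cAdvisor on a gathering node over HTTPS with basic auth. The job names, the user `admin` and the password file `/run/secrets/basic_auth` (where Docker would mount the `basic_auth` secret inserted above) are assumptions; compare with the recipe's example file before using it.

```shell
#!/usr/bin/env bash
# Sketch: generate a Prometheus scrape config for one gathering node.
# Assumptions (not taken from the recipe): job names, user "admin",
# password file /run/secrets/basic_auth. Check against scrape-config.example.yml.

# write_scrape_config DOMAIN [USER] -> writes DOMAIN.yml in the current directory
write_scrape_config() {
  local domain="$1" user="${2:-admin}"
  cat > "$domain.yml" <<EOF
scrape_configs:
  - job_name: "node-$domain"
    scheme: https
    basic_auth:
      username: "$user"
      password_file: /run/secrets/basic_auth
    static_configs:
      - targets: ["node.monitoring.$domain"]
  - job_name: "cadvisor-$domain"
    scheme: https
    basic_auth:
      username: "$user"
      password_file: /run/secrets/basic_auth
    static_configs:
      - targets: ["cadvisor.monitoring.$domain"]
EOF
}
```

Usage: `write_scrape_config gathering.org`, then copy the resulting gathering.org.yml with the abra app cp command above. Using `password_file` instead of an inline `password` keeps the plaintext secret out of the copied file.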
    
Service          Authentication        Domain
Grafana          Email / SSO           monitoring.example.org
Prometheus       traefik basic-auth    prometheus.monitoring.example.org
Loki             traefik basic-auth    loki.monitoring.example.org
cAdvisor         traefik basic-auth    cadvisor.monitoring.example.org
Node Exporter    traefik basic-auth    node.monitoring.example.org

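Logging from a Docker host to the Loki server without anything else

Note: the commands below overwrite any existing /etc/docker/daemon.json (merge by hand if you already have one), and the new default log driver only applies to containers created after the restart; existing containers keep their current driver until they are recreated.
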
$ docker plugin install grafana/loki-docker-driver:latest --alias loki --grant-all-permissions
$ echo '{
    "debug" : true,
    "log-driver": "loki",
    "log-opts": {
        "loki-url": "https://<user>:<secret>@loki.monitoring.example.org/loki/api/v1/push",
        "loki-batch-size": "400"
    }
}' > /etc/docker/daemon.json
$ systemctl restart docker.service
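Before pointing the whole Docker daemon at Loki, the URL and credentials can be tested by pushing a single line through Loki's HTTP push API (POST /loki/api/v1/push with a JSON body of `streams`, timestamps in nanoseconds; Loki answers 204 on success). This is a sketch: the label `job="push-test"` and the argument order are assumptions, the `X-Scope-OrgID: fake` header is the same tenant header the post-setup guide uses for Grafana, and `date +%s%N` requires GNU date.

```shell
#!/usr/bin/env bash
# Sketch: push one test log line to Loki to verify URL and basic-auth
# credentials before changing /etc/docker/daemon.json.

# loki_payload TIMESTAMP_NS MESSAGE -> prints a minimal push API JSON body
# (no JSON escaping is done: keep quotes and backslashes out of MESSAGE)
loki_payload() {
  printf '{"streams":[{"stream":{"job":"push-test"},"values":[["%s","%s"]]}]}' "$1" "$2"
}

# push_test_line USER SECRET [LOKI_HOST] -> prints the HTTP status (204 = accepted)
push_test_line() {
  local user="$1" secret="$2" host="${3:-loki.monitoring.example.org}"
  loki_payload "$(date +%s%N)" "hello from $(hostname)" |
    curl -s -o /dev/null -w '%{http_code}\n' \
      -u "$user:$secret" \
      -H 'Content-Type: application/json' \
      -H 'X-Scope-OrgID: fake' \
      --data-binary @- \
      "https://$host/loki/api/v1/push"
}

# Only push when credentials are given on the command line.
if [ "$#" -ge 2 ]; then
  push_test_line "$@"
fi
```

If this prints 204, the line should show up in Grafana's Explore view for Loki under `{job="push-test"}`.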

Set up the Push Gateway

  1. Enable it in the env file by uncommenting the following lines:
## Prometheus Pushgateway
# COMPOSE_FILE="$COMPOSE_FILE:compose.pushgateway.yml"
  2. abra app deploy monitoring.example.org

This exposes the Pushgateway at https://pushgateway.${DOMAIN}, secured behind the same basic auth as the other services. After that, add pushgateway.${DOMAIN} to the scrape config.
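Pushing a value works as described in the Pushgateway README: the body is in the Prometheus text format and the URL path names the job (/metrics/job/<job>). The sketch below only adds the basic-auth credentials; the metric name, job name and domain in the usage comment are placeholders. When adding the Pushgateway to the scrape config, the Prometheus documentation recommends `honor_labels: true` so that the pushed `job` and `instance` labels are kept.

```shell
#!/usr/bin/env bash
# Sketch: push a single sample to the Pushgateway behind traefik basic-auth.

# pushgateway_url DOMAIN JOB -> prints the push URL for JOB
pushgateway_url() {
  printf 'https://pushgateway.%s/metrics/job/%s\n' "$1" "$2"
}

# push_metric DOMAIN USER SECRET JOB METRIC VALUE
push_metric() {
  local domain="$1" user="$2" secret="$3" job="$4" metric="$5" value="$6"
  # Prometheus text format; the last sample must end with a newline.
  printf '%s %s\n' "$metric" "$value" |
    curl -s --fail -u "$user:$secret" --data-binary @- \
      "$(pushgateway_url "$domain" "$job")"
}

# e.g. ./push.sh monitoring.example.org admin '<secret>' backup_job last_run_seconds 42
if [ "$#" -eq 6 ]; then
  push_metric "$@"
fi
```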

Post-setup guide

  • configure prometheus/loki/alertmanager as data sources in grafana under Configuration > Data sources
    • for loki, you need to set a "Custom HTTP Header": X-Scope-OrgID: fake
  • configure the SMTP mailer under Alerting > Contact points
    • edit the default contact point, choose "Alertmanager" as type & http://alertmanager:9093 as URL
    • use the "Test" button to send a test mail: it should fire a request at Alertmanager, which should then send the mail
  • abra app cp your scrape_configs: ... into /prometheus/scrape_configs and log into the Prometheus web UI to check that they are working
  • load your dashboards in manually under Create > Dashboard
  • from your dashboard panels, choose Edit > Alert to create alerts based on those panels
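The data sources above are configured through the UI. As an alternative, here is a sketch that creates the Loki data source, including the `X-Scope-OrgID: fake` header, through Grafana's HTTP API (POST /api/datasources; custom headers go into `jsonData.httpHeaderName1` and `secureJsonData.httpHeaderValue1`). Assumptions: Loki is reachable from Grafana as http://loki:3100 (Loki's default port; the service name is a guess) and you have a Grafana service account token that may create data sources.

```shell
#!/usr/bin/env bash
# Sketch: create the Loki data source via Grafana's HTTP API instead of the UI.
# Assumptions: Loki at http://loki:3100 as seen from Grafana, and a service
# account token with permission to add data sources.

# loki_datasource_json URL TENANT -> prints the body for POST /api/datasources
loki_datasource_json() {
  printf '{"name":"Loki","type":"loki","access":"proxy","url":"%s",' "$1"
  printf '"jsonData":{"httpHeaderName1":"X-Scope-OrgID"},'
  printf '"secureJsonData":{"httpHeaderValue1":"%s"}}\n' "$2"
}

# create_loki_datasource GRAFANA_URL TOKEN
create_loki_datasource() {
  loki_datasource_json "http://loki:3100" "fake" |
    curl -s --fail \
      -H "Authorization: Bearer $2" \
      -H 'Content-Type: application/json' \
      --data-binary @- \
      "$1/api/datasources"
}

# e.g. ./create-loki-ds.sh https://monitoring.example.org '<service-account-token>'
if [ "$#" -eq 2 ]; then
  create_loki_datasource "$@"
fi
```

Putting the header value in `secureJsonData` means Grafana stores it encrypted and does not return it in later API reads.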

Thanks to the previous work of @decentral1se @knooflok @3wc @cellarspoon @mirsal

Adding Matrix as an Alert Contact Point

  1. Enable the matrix-alertmanager-receiver:
COMPOSE_FILE="$COMPOSE_FILE:compose.matrix-alertmanager-receiver.yml"
  2. Insert the Matrix access token secret:
abra app secret insert monitoring.example.org matrix_access_token v1 <access-token>
  3. Set the required configuration:
GF_MATRIX_USER_ID=
GF_MATRIX_ROOM_ID=
GF_MATRIX_HOME_SERVER_URL=
  4. Configure an Alertmanager webhook and set its URL to http://matrix-alertmanager-receiver:12345/alerts/<room-id>
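If the bot account has no access token yet, one way to get one is the Matrix client-server login API (POST /_matrix/client/v3/login with an `m.login.password` body; the response contains an `access_token` field). A sketch; the bot user name is a placeholder, and each login creates a new device for the account.

```shell
#!/usr/bin/env bash
# Sketch: obtain an access token for the alert bot via the Matrix
# client-server login API.

# matrix_login_json USER PASSWORD -> body for POST /_matrix/client/v3/login
# (no JSON escaping is done: keep quotes and backslashes out of the password)
matrix_login_json() {
  printf '{"type":"m.login.password","identifier":{"type":"m.id.user","user":"%s"},"password":"%s"}\n' "$1" "$2"
}

# matrix_login HOMESERVER_URL USER PASSWORD -> prints the raw JSON response,
# which contains "access_token" on success
matrix_login() {
  matrix_login_json "$2" "$3" |
    curl -s --fail -H 'Content-Type: application/json' --data-binary @- \
      "$1/_matrix/client/v3/login"
}

# e.g. ./matrix-login.sh https://matrix.example.org alertbot '<password>'
if [ "$#" -eq 3 ]; then
  matrix_login "$@"
fi
```

Use the returned `access_token` value for the matrix_access_token secret, and the same homeserver URL for GF_MATRIX_HOME_SERVER_URL.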

Alerts

The following alerts can be enabled by setting the corresponding env variable to true:

  • backupbot failed: ALERT_BACKUP_FAILED_ENABLED
  • backupbot missing: ALERT_BACKUP_MISSING_ENABLED
  • backupbot not successful: ALERT_BACKUP_NOT_SUCCESSFULL_ENABLED
  • node disk space: ALERT_NODE_DISK_SPACE_ENABLED
  • node memory usage: ALERT_NODE_MEMORY_USAGE_ENABLED
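As a sketch, the switches go into the app's .env file (opened with abra app config monitoring.example.org). Enable only the alerts you need, then redeploy, for example with abra app deploy -f monitoring.example.org.

```shell
# Alert switches in the monitoring app's .env file (variable names as listed above;
# note the spelling SUCCESSFULL in the third one).
ALERT_BACKUP_FAILED_ENABLED=true
ALERT_BACKUP_MISSING_ENABLED=true
ALERT_BACKUP_NOT_SUCCESSFULL_ENABLED=true
ALERT_NODE_DISK_SPACE_ENABLED=true
ALERT_NODE_MEMORY_USAGE_ENABLED=true
```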