yet another try on the monitoring stack
Moritz 5d0b6d88fc fix default DOMAIN naming scheme 2024-01-19 12:30:58 +01:00
release chore: publish 1.0.0+v1.7.0 release 2023-12-21 02:24:47 +01:00
.env.sample fix default DOMAIN naming scheme 2024-01-19 12:30:58 +01:00
.gitignore init 2022-03-31 14:52:21 +02:00
README.md shorten basic_auth secret 2023-12-20 22:46:27 +01:00
abra.sh chore: publish 1.0.0+v1.7.0 release 2023-12-21 02:24:47 +01:00
compose.grafana.yml chore: publish 1.0.0+v1.7.0 release 2023-12-21 02:24:47 +01:00
compose.loki.yml chore: publish 1.0.0+v1.7.0 release 2023-12-21 02:24:47 +01:00
compose.prometheus.yml chore: publish 1.0.0+v1.7.0 release 2023-12-21 02:24:47 +01:00
compose.promtail.yml chore: publish 1.0.0+v1.7.0 release 2023-12-21 02:24:47 +01:00
compose.yml chore: publish 1.0.0+v1.7.0 release 2023-12-21 02:24:47 +01:00
grafana-dashboards.yml add grafana 2023-02-11 17:17:50 +01:00
grafana-datasources.yml wip loki stuff 2023-02-13 16:10:33 +01:00
grafana-logs-dashboard.json update grafana dashboards 2023-12-21 01:43:44 +01:00
grafana-stacks-dashboard.json update grafana dashboards 2023-12-21 01:43:44 +01:00
grafana-swarm-dashboard.json update grafana dashboards 2023-12-21 01:43:44 +01:00
grafana-traefik-dashboard.json update grafana dashboards 2023-12-21 01:43:44 +01:00
grafana_custom.ini chore: publish 0.3.0+v1.6.0 release 2023-07-11 14:59:37 +02:00
loki.yml.tmpl fix: increase loki limits 2023-06-20 16:03:57 +02:00
node-exporter-entrypoint.sh wip 2023-02-09 10:07:29 +01:00
prometheus.yml.tmpl shorten basic_auth secret 2023-12-20 22:46:27 +01:00
promtail.yml.tmpl shorten basic_auth secret 2023-12-20 22:46:27 +01:00
scrape-config.example.yml fix scrape config 2023-05-23 12:22:22 +02:00

README.md

monitoring-ng

Yet another monitoring stack ... This time it's an all-in-one grafana/prometheus/loki/node_exporter/cadvisor/promtail stack. It is heavily based on the monitoring-lite stack, but now includes everything in one recipe. With the same recipe and only a different .env configuration, you can deploy instances that just gather metrics and logs (node_exporter/cadvisor/promtail) as well as instances that run the full monitoring stack (grafana/prometheus/loki).

  • Category: Apps
  • Status: 2, beta
  • Image: grafana/grafana, 4, upstream
  • Healthcheck: 3
  • Backups: 1
  • Email: 3
  • Tests: No
  • SSO: 1

Setup Metrics Gathering

Here gathering.org stands for the node you want to gather metrics from.

  1. Configure DNS
    • monitoring.gathering.org
    • cadvisor.monitoring.gathering.org
    • node.monitoring.gathering.org
  2. Configure Traefik to use BasicAuth
    • abra app config traefik.gathering.org and uncomment:
      # BASIC_AUTH
      COMPOSE_FILE="$COMPOSE_FILE:compose.basicauth.yml"
      BASIC_AUTH=1
      SECRET_USERSFILE_VERSION=v1
    • Generate the userslist with an htpasswd-hashed password and insert it as a secret:
      abra app secret insert traefik.gathering.org userslist v1 'admin:<hashed-secret>'
      Make sure there is no whitespace anywhere in admin:<hashed-secret>; it seems to break things.
    • abra app deploy -f traefik
  3. abra app new monitoring-ng
  4. abra app config monitoring.gathering.org
     For gathering, only the main compose.yml is needed, nothing more.
  5. abra app deploy monitoring.gathering.org
  6. Check that the endpoints are up and basic-auth works
    • cadvisor.monitoring.gathering.org
    • node.monitoring.gathering.org
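
The hashing and endpoint-check steps above could be scripted roughly as follows. This is only a sketch: the user name, the password, and the use of openssl in place of the htpasswd tool are assumptions for illustration. It relies on Traefik's BasicAuth accepting htpasswd's apr1 (MD5) format. The check_endpoint helper is hypothetical and is not called here, since it needs a deployed node.

```shell
#!/bin/sh
# Hypothetical credentials, used only for illustration.
BASIC_AUTH_USER="admin"
BASIC_AUTH_PASS="change-me"

# Build a "user:hash" entry in htpasswd's apr1 (MD5) format using openssl.
# printf keeps the entry free of trailing newlines and stray whitespace.
make_userslist_entry() {
    printf '%s:%s' "$1" "$(openssl passwd -apr1 "$2")"
}

# Report the HTTP status of a URL without and with basic-auth credentials.
# The two codes are expected to differ once basic-auth is active
# (for example 401 without credentials and 200 with them).
check_endpoint() {
    url="$1"
    anon=$(curl -s -o /dev/null -w '%{http_code}' "$url")
    auth=$(curl -s -o /dev/null -w '%{http_code}' \
        -u "$BASIC_AUTH_USER:$BASIC_AUTH_PASS" "$url")
    echo "$url anonymous=$anon authenticated=$auth"
}

ENTRY=$(make_userslist_entry "$BASIC_AUTH_USER" "$BASIC_AUTH_PASS")
echo "$ENTRY"
```

The printed entry is what would go between the quotes of the abra app secret insert command. After deploying, something like check_endpoint https://node.monitoring.gathering.org/metrics can be used for the last step; the /metrics path is an assumption about how the exporters are exposed.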

Setup Metrics Browser

  1. Configure DNS
    • monitoring.example.org
    • prometheus.monitoring.example.org
    • loki.monitoring.example.org
  2. Set up the monitoring stack
    • abra app new monitoring-ng
    • abra app config monitoring.example.org and uncomment all the commented-out settings
    • abra app secret insert monitoring.example.org basic_auth v1 <secret>
      This needs the plaintext traefik basic-auth secret, not the hashed one!
    • abra app secret ls monitoring.example.org
    • abra app deploy monitoring.example.org
  3. Add a scrape config to prometheus
    • abra app cmd monitoring.example.org prometheus gathering.org
    • or manually:
      cp scrape-config.example.yml gathering.org.yml
      # adjust domain
      # mkdir scrape_configs
      abra app cp monitoring.example.org gathering.org.yml prometheus:/prometheus/scrape_configs/
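
The manual "copy and adjust domain" step could be automated with a small helper like the one below. It is a sketch under assumptions: render_scrape_config is a hypothetical function, and it assumes the example file uses example.org as its placeholder domain, which should be checked against the actual scrape-config.example.yml before relying on it.

```shell
#!/bin/sh
# render_scrape_config SRC PLACEHOLDER DOMAIN
# Writes DOMAIN.yml as a copy of SRC with every literal PLACEHOLDER
# replaced by DOMAIN.
render_scrape_config() {
    src="$1"
    placeholder="$2"
    domain="$3"
    # Escape dots so sed matches the placeholder literally.
    pattern=$(printf '%s' "$placeholder" | sed 's/\./\\./g')
    sed "s/$pattern/$domain/g" "$src" > "$domain.yml"
}

# Only run when the example file from the recipe is present.
# "example.org" as the placeholder is an assumption about that file.
if [ -f scrape-config.example.yml ]; then
    render_scrape_config scrape-config.example.yml example.org gathering.org
fi
```

The resulting gathering.org.yml can then be copied into the prometheus container with the abra app cp command from the list.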
    
Service        Authentication       Domain
Grafana        Email / SSO          monitoring.example.org
Prometheus     traefik basic-auth   prometheus.monitoring.example.org
Loki           traefik basic-auth   loki.monitoring.example.org
Cadvisor       traefik basic-auth   cadvisor.monitoring.example.org
Node Exporter  traefik basic-auth   node.monitoring.example.org

Logging from a Docker host to a Loki server without anything else

$ docker plugin install grafana/loki-docker-driver:latest --alias loki --grant-all-permissions
$ echo '{
    "debug" : true,
    "log-driver": "loki",
    "log-opts": {
        "loki-url": "https://<user>:<secret>@loki.monitoring.example.org/loki/api/v1/push",
        "loki-batch-size": "400"
    }
}' > /etc/docker/daemon.json
$ systemctl restart docker.service
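
Because a syntax error in daemon.json can keep the Docker daemon from starting, it may be worth validating the file before installing it. The sketch below writes the same configuration to a scratch file and checks it with python3 -m json.tool. The variable names and values are hypothetical placeholders, and using python3 for validation is an assumption about what is available on the host.

```shell
#!/bin/sh
# Hypothetical values; replace with the real basic-auth user, secret and Loki domain.
LOKI_USER="user"
LOKI_SECRET="secret"
LOKI_HOST="loki.monitoring.example.org"

# Write the config to a scratch file instead of /etc/docker/daemon.json.
CANDIDATE=$(mktemp)
cat > "$CANDIDATE" <<EOF
{
    "debug": true,
    "log-driver": "loki",
    "log-opts": {
        "loki-url": "https://${LOKI_USER}:${LOKI_SECRET}@${LOKI_HOST}/loki/api/v1/push",
        "loki-batch-size": "400"
    }
}
EOF

# Validate the JSON before it gets anywhere near the Docker daemon.
if python3 -m json.tool "$CANDIDATE" > /dev/null; then
    echo "valid JSON: $CANDIDATE"
else
    echo "invalid JSON, not installing" >&2
fi
```

Once the file validates, it can be copied to /etc/docker/daemon.json before restarting Docker. After the restart, docker info --format '{{.LoggingDriver}}' should report loki. A secret containing characters such as @ or / would need to be percent-encoded in the URL.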

Post-setup guide

  • configure prometheus/loki/alertmanager as data sources in grafana under Configuration > Data sources
    • for loki, you need to set a "Custom HTTP Header": X-Scope-OrgID: fake
  • configure the SMTP mailer under Alerting > Contact points
    • edit the default contact point, choose "Alertmanager" as type & http://alertmanager:9093 as URL
    • use the "Test" button to send a test mail. It should fire a request at the alertmanager & that should send a mail
  • abra app cp your scrape_configs: ... into /prometheus/scrape_configs, then log into your prometheus web UI to ensure they're working
  • load your dashboards in manually under Create > Dashboard
  • from your dashboard panels, choose Edit > Alert to create alerts based on those panels
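
Some of these checks can also be done from a shell instead of the web UIs. The helpers below are a sketch only: prom_get and loki_get are hypothetical names, the credentials are placeholders, and the domains follow the example.org naming used in this README. The Loki helper sends the same X-Scope-OrgID header that the Grafana data source is configured with.

```shell
#!/bin/sh
# Hypothetical settings; adjust to your own deployment.
PROM_URL="https://prometheus.monitoring.example.org"
LOKI_URL="https://loki.monitoring.example.org"
BASIC_AUTH="user:secret"

# GET a Prometheus HTTP API path behind traefik basic-auth.
prom_get() {
    curl -sS -u "$BASIC_AUTH" "$PROM_URL$1"
}

# GET a Loki HTTP API path, sending the tenant header used in Grafana.
loki_get() {
    curl -sS -u "$BASIC_AUTH" -H "X-Scope-OrgID: fake" "$LOKI_URL$1"
}
```

For example, prom_get /api/v1/targets lists the scrape targets together with their health, and loki_get /loki/api/v1/labels lists the label names Loki has seen so far.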

THX to the previous work of @decentral1se @knooflok @3wc @cellarspoon @mirsal