# monitoring-ng
Yet another monitoring stack ...
This time it's an all-in-one grafana/prometheus/loki/node_exporter/cadvisor/promtail stack.
It is heavily based on the [monitoring-lite](https://git.coopcloud.tech/coop-cloud/monitoring-lite) stack, but now includes everything in one recipe. This means you can deploy monitoring instances that only gather metrics and logs (node_exporter/cadvisor/promtail), as well as instances with the full monitoring stack (grafana/prometheus/loki), using the same recipe with a different `.env` configuration.
<!-- metadata -->
- **Category**: Apps
- **Status**: 2, beta
- **Image**: [`grafana/grafana`](https://hub.docker.com/r/grafana/grafana), 4, upstream
- **Healthcheck**: 3
- **Backups**: 1
- **Email**: 3
- **Tests**: No
- **SSO**: 1
<!-- endmetadata -->
## Set up metrics gathering
In the steps below, `gathering.org` is the node you want to gather metrics from.
1. Configure DNS
   - monitoring.gathering.org
   - cadvisor.monitoring.gathering.org
   - node.monitoring.gathering.org
1. Configure Traefik to use BasicAuth
   - `abra app config traefik.gathering.org` and uncomment:
     ```
     # BASIC_AUTH
     COMPOSE_FILE="$COMPOSE_FILE:compose.basicauth.yml"
     BASIC_AUTH=1
     SECRET_USERSFILE_VERSION=v1
     ```
   - Generate the userslist with an htpasswd-hashed password:
     `abra app secret insert traefik.gathering.org userslist v1 'admin:<hashed-secret>'`
     Make sure there is no whitespace anywhere in `admin:<hashed-secret>`; whitespace there seems to break authentication.
   - `abra app deploy -f traefik.gathering.org`
1. `abra app new monitoring-ng`
1. `abra app config monitoring.gathering.org`: for gathering, only the main `compose.yml` is needed, nothing more.
1. `abra app deploy monitoring.gathering.org`
1. Check that the endpoints are up and that basic auth works:
   - cadvisor.monitoring.gathering.org
   - node.monitoring.gathering.org
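
The two basic-auth steps above (hashing the password for the traefik `userslist` secret and checking the endpoints) can be sketched as two shell helpers. This is a minimal sketch, assuming `openssl` and `curl` are installed on your workstation; `make_userslist_entry` and `check_endpoint` are hypothetical names, not part of abra or this recipe. `openssl passwd -apr1` is used as a stand-in for `htpasswd`, since the apr1 (htpasswd MD5) format is one of the hash formats Traefik's BasicAuth middleware accepts.

```sh
# Hypothetical helpers, not part of abra or the monitoring-ng recipe.

# Print one "user:hash" line for the traefik userslist secret.
# The password is piped to openssl on stdin instead of being passed
# as an argument, so it does not show up in openssl's command line.
# apr1 is the htpasswd MD5 format, which Traefik's BasicAuth accepts.
make_userslist_entry() (
  user="$1"
  password="$2"
  hash="$(printf '%s\n' "$password" | openssl passwd -apr1 -stdin)" || exit 1
  # No whitespace anywhere between user, colon and hash.
  printf '%s:%s\n' "$user" "$hash"
)

# Print the HTTP status code of a basic-auth protected endpoint.
# Without credentials Traefik should answer 401, with valid ones 200.
check_endpoint() (
  url="$1"
  user="$2"
  password="$3"
  if [ -n "$user" ]; then
    curl -s -o /dev/null -w '%{http_code}\n' -u "$user:$password" "$url"
  else
    curl -s -o /dev/null -w '%{http_code}\n' "$url"
  fi
)
```

For example, pass `"$(make_userslist_entry admin '<secret>')"` as the last argument of `abra app secret insert traefik.gathering.org userslist v1`. The output of a command substitution is not expanded again, so the `$apr1$...` hash survives; if you paste a hash literally, keep it in single quotes for the same reason. Afterwards, `check_endpoint https://node.monitoring.gathering.org/metrics` should print `401`, and `200` once you append `admin '<secret>'` (assuming the exporter is served at `/metrics` behind Traefik).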
## Set up the metrics browser
Here `monitoring.example.org` runs the full stack (grafana/prometheus/loki).
1. Configure DNS
   - monitoring.example.org
   - prometheus.monitoring.example.org
   - loki.monitoring.example.org
1. Set up the monitoring stack
   - `abra app new monitoring-ng`
   - `abra app config monitoring.example.org` and uncomment everything, which enables the full stack
   - `abra app secret insert monitoring.example.org basic_auth v1 <secret>`
     This needs the plaintext traefik basic-auth password (the one you hashed for the `userslist` secret), not the hashed one! Prometheus uses it to authenticate against the gathering nodes.
   - `abra app secret ls monitoring.example.org` to list the app's secrets and check that they are set
   - `abra app deploy monitoring.example.org`
1. Add a scrape config to Prometheus
   - `abra app cmd monitoring.example.org prometheus gathering.org`
   - or manually:
     ```
     cp scrape-config.example.yml gathering.org.yml
     # adjust the domain in gathering.org.yml
     # mkdir scrape_configs
     abra app cp monitoring.example.org gathering.org.yml prometheus:/prometheus/scrape_configs/
     ```
   - check that all configured targets are up:
     https://prometheus.monitoring.example.org/targets
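
Since the file copied into `/prometheus/scrape_configs/` holds a `scrape_configs:` list, a gathering node could be described roughly as below. This is a sketch, not the recipe's `scrape-config.example.yml`: the helper name `write_scrape_config`, the `admin` user, the HTTPS targets on port 443 and the secret path `/run/secrets/basic_auth` (Docker's default mount point for a secret named `basic_auth`) are all assumptions, so compare against the example file and `compose.yml` before using it.

```sh
# Hypothetical helper, not part of the monitoring-ng recipe: writes a
# Prometheus scrape config for one gathering node.
# Assumptions (adjust to your setup):
#   - node_exporter and cadvisor are exposed via Traefik over HTTPS at
#     node.monitoring.<domain> and cadvisor.monitoring.<domain>
#   - the traefik basic-auth user is "admin"
#   - the basic_auth secret is mounted in the prometheus container at
#     /run/secrets/basic_auth
write_scrape_config() (
  domain="$1"
  outfile="$2"
  cat > "$outfile" <<EOF
scrape_configs:
  - job_name: "${domain}-node"
    scheme: https
    basic_auth:
      username: admin
      password_file: /run/secrets/basic_auth
    static_configs:
      - targets: ["node.monitoring.${domain}:443"]
  - job_name: "${domain}-cadvisor"
    scheme: https
    basic_auth:
      username: admin
      password_file: /run/secrets/basic_auth
    static_configs:
      - targets: ["cadvisor.monitoring.${domain}:443"]
EOF
)
```

For example, run `write_scrape_config gathering.org gathering.org.yml` and then copy the file with the same `abra app cp` command as in the manual step. `password_file` keeps the plaintext secret out of the YAML; job names include the domain because they must be unique across all scrape configs.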

| Service       | Authentication     | Domain                            |
| ------------- | ------------------ | --------------------------------- |
| Grafana       | Email / SSO        | monitoring.example.org            |
| Prometheus    | traefik basic-auth | prometheus.monitoring.example.org |
| Loki          | traefik basic-auth | loki.monitoring.example.org       |
| cAdvisor      | traefik basic-auth | cadvisor.monitoring.example.org   |
| Node Exporter | traefik basic-auth | node.monitoring.example.org       |
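### Logging from a Docker host to the Loki server without anything else
Run these commands as root. The `echo` overwrites any existing `/etc/docker/daemon.json`, so merge the settings by hand if you already have one. The new default log driver only applies to containers created after the restart.
```
$ docker plugin install grafana/loki-docker-driver:latest --alias loki --grant-all-permissions
$ echo '{
    "debug": true,
    "log-driver": "loki",
    "log-opts": {
        "loki-url": "https://<user>:<secret>@loki.monitoring.example.org/loki/api/v1/push",
        "loki-batch-size": "400"
    }
}' > /etc/docker/daemon.json
$ systemctl restart docker.service
```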
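
Before restarting Docker, you may want to check that the push endpoint and the credentials from `loki-url` actually work. A minimal sketch using Loki's HTTP push API (`POST /loki/api/v1/push` with a JSON `streams` body); the helper names `loki_payload` and `loki_push` and the `job="smoke-test"` label are hypothetical, chosen only for this test.

```sh
# Hypothetical helpers for a manual smoke test of the Loki push API.

# Print a minimal push body: one stream labelled job="smoke-test" with
# one log line. Loki wants the timestamp as a string of nanoseconds since
# the epoch; "seconds + 9 zeros" avoids the GNU-only `date +%N`.
# The message is inserted verbatim, so keep it free of quotes and backslashes.
loki_payload() (
  message="$1"
  ts="$(date +%s)000000000"
  printf '{"streams":[{"stream":{"job":"smoke-test"},"values":[["%s","%s"]]}]}\n' \
    "$ts" "$message"
)

# Send one line to Loki and print the HTTP status code
# (Loki answers 204 No Content when the push is accepted).
loki_push() (
  base_url="$1"   # e.g. https://loki.monitoring.example.org
  user="$2"
  password="$3"
  loki_payload "hello from $(uname -n)" |
    curl -s -o /dev/null -w '%{http_code}\n' \
      -u "$user:$password" \
      -H 'Content-Type: application/json' \
      --data-binary @- \
      "$base_url/loki/api/v1/push"
)
```

For example, `loki_push https://loki.monitoring.example.org <user> <secret>` should print `204`; the line should then be visible in Grafana under the label selector `{job="smoke-test"}`.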
## Post-setup guide
- configure prometheus/loki/alertmanager as data sources in Grafana under `Configuration > Data sources`
  - for Loki, you need to set a custom HTTP header: `X-Scope-OrgID: fake`
- configure the SMTP mailer under `Alerting > Contact points`
  - edit the default contact point, choose "Alertmanager" as the type and `http://alertmanager:9093` as the URL
  - use the "Test" button to send a test mail; it should fire a request at the alertmanager, which should then send the mail
- `abra app cp` your `scrape_configs: ...` into `/prometheus/scrape_configs` and log into your Prometheus web UI to make sure they work
- load your dashboards manually under `Create > Dashboard`
- from your dashboard panels, choose `Edit > Alert` to create alerts based on those panels
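
The data-source step can also be scripted against Grafana's HTTP API (`POST /api/datasources`). This is a sketch, assuming basic auth is enabled for a Grafana admin user and that Grafana reaches the services as `http://prometheus:9090` and `http://loki:3100`; those service names and ports (the upstream defaults) are guesses modelled on the `http://alertmanager:9093` URL above, so check the recipe's compose files. The Loki header is stored the way Grafana stores custom HTTP headers: the name in `jsonData.httpHeaderName1`, the value in `secureJsonData.httpHeaderValue1`. All function names are hypothetical.

```sh
# Hypothetical helpers; the data source URLs are assumptions (see above).

# Loki, with the X-Scope-OrgID header Grafana has to send.
loki_datasource_json() {
  printf '%s\n' '{"name":"Loki","type":"loki","access":"proxy","url":"http://loki:3100","jsonData":{"httpHeaderName1":"X-Scope-OrgID"},"secureJsonData":{"httpHeaderValue1":"fake"}}'
}

prometheus_datasource_json() {
  printf '%s\n' '{"name":"Prometheus","type":"prometheus","access":"proxy","url":"http://prometheus:9090"}'
}

# Alertmanager, at the URL from the contact point step.
alertmanager_datasource_json() {
  printf '%s\n' '{"name":"Alertmanager","type":"alertmanager","access":"proxy","url":"http://alertmanager:9093","jsonData":{"implementation":"prometheus"}}'
}

# POST a data source definition read from stdin; prints the HTTP status.
create_datasource() (
  grafana_url="$1"   # e.g. https://monitoring.example.org
  user="$2"          # a Grafana admin user
  password="$3"
  curl -s -o /dev/null -w '%{http_code}\n' \
    -u "$user:$password" \
    -H 'Content-Type: application/json' \
    --data-binary @- \
    "$grafana_url/api/datasources"
)
```

For example, `loki_datasource_json | create_datasource https://monitoring.example.org admin '<password>'` should print `200` when the data source was created.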
---
Thanks to the previous work of @decentral1se @knooflok @3wc @cellarspoon @mirsal