# monitoring-ng
Yet another monitoring stack...
This time it's an all-in-one grafana/prometheus/loki/node_exporter/cadvisor/promtail stack.
It's based heavily on the [monitoring-lite](https://git.coopcloud.tech/coop-cloud/monitoring-lite) stack, but now includes everything in one recipe. You can deploy instances that only gather metrics and logs (node_exporter/cadvisor/promtail) as well as instances with the full monitoring stack (grafana/prometheus/loki) from the same recipe, using just a different `.env` configuration.
<!-- metadata -->
- **Category**: Apps
- **Status**: 2, beta
- **Image**: [`grafana/grafana`](https://hub.docker.com/r/grafana/grafana), 4, upstream
- **Healthcheck**: 3
- **Backups**: 1
- **Email**: 3
- **Tests**: No
- **SSO**: 1
<!-- endmetadata -->
## Setup Metrics Gathering
In the steps below, `gathering.org` is the node you want to gather metrics from.
1. Configure DNS
   - monitoring.gathering.org
   - cadvisor.monitoring.gathering.org
   - node.monitoring.gathering.org
1. Configure Traefik to use BasicAuth
   - `abra app config traefik.gathering.org`
     and uncomment:
     ```
     # BASIC_AUTH
     COMPOSE_FILE="$COMPOSE_FILE:compose.basicauth.yml"
     BASIC_AUTH=1
     SECRET_USERSFILE_VERSION=v1
     ```
   - Generate the users list with an `htpasswd`-hashed password:
     `abra app secret insert traefik.gathering.org userslist v1 'admin:<hashed-secret>'`
     Make sure `admin:<hashed-secret>` contains no whitespace; it seems to break things.
   - `abra app deploy -f traefik`
1. `abra app new monitoring-ng`
1. `abra app config monitoring.gathering.org`
   For gathering, only the main `compose.yml` is needed, nothing more.
1. `abra app deploy monitoring.gathering.org`
1. Check that the endpoints are up and basic-auth works:
   - cadvisor.monitoring.gathering.org
   - node.monitoring.gathering.org
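
The users list entry and the final endpoint check can be scripted. The following is only a sketch: the function names are hypothetical helpers, not part of `abra`, and it assumes `openssl` and `curl` are installed. It uses the apr1 (MD5) scheme, one of the `htpasswd` formats Traefik accepts; `htpasswd -nb admin '<secret>'` should work just as well.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; none of these names come from abra.

# Print "user:hash" using the apr1 (MD5) htpasswd scheme.
# $1 = user name, $2 = plaintext password
make_userslist_entry() {
  hash="$(openssl passwd -apr1 "$2")" || return 1
  printf '%s:%s\n' "$1" "$hash"
}

# Succeed only if the entry looks like "user:hash" and contains no whitespace.
is_valid_userslist_entry() {
  stripped="$(printf '%s' "$1" | tr -d '[:space:]')"
  [ "$1" = "$stripped" ] || return 1
  case "$1" in
    ?*:?*) return 0 ;;
    *) return 1 ;;
  esac
}

# Print the HTTP status code an endpoint returns for the given credentials.
# $1 = user, $2 = plaintext password, $3 = domain
endpoint_status() {
  curl -s -o /dev/null -w '%{http_code}' -u "$1:$2" "https://$3/"
}

# Example usage (not executed here):
#   entry="$(make_userslist_entry admin '<secret>')"
#   is_valid_userslist_entry "$entry" && \
#     abra app secret insert traefik.gathering.org userslist v1 "$entry"
#   endpoint_status admin '<secret>' node.monitoring.gathering.org
```

A `401` from `endpoint_status` usually means the endpoint is reachable but the credentials were rejected; checking once with a wrong password is a quick way to confirm that basic-auth is actually enforced.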
## Setup Metrics Browser
1. Configure DNS
   - monitoring.example.org
   - prometheus.monitoring.example.org
   - loki.monitoring.example.org
1. Setup monitoring stack
   - `abra app new monitoring-ng`
   - `abra app config monitoring.example.org`
     Uncomment all the commented-out settings.
   - `abra app secret insert monitoring.example.org basic_auth v1 <secret>`
     This needs the plaintext traefik basic-auth secret, not the hashed one!
   - `abra app secret ls monitoring.example.org`
   - `abra app deploy monitoring.example.org`
1. Add scrape config to prometheus
   - `abra app cmd monitoring.example.org prometheus gathering.org`
   - or manually
     ```
     cp scrape-config.example.yml gathering.org.yml
     # adjust domain
     # mkdir scrape_configs
     abra app cp monitoring.example.org gathering.org.yml prometheus:/prometheus/scrape_configs/
     ```
   - check that all configured targets are up:
     https://prometheus.monitoring.example.org/targets
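
The target check can also be done from the command line through the Prometheus HTTP API (`/api/v1/targets`). The helper below is a rough sketch with a hypothetical name; it assumes the API returns compact JSON (no spaces around the colon) and simply counts occurrences of a health state. If `jq` is available, it is the more robust choice.

```shell
#!/bin/sh
# Hypothetical helper: count targets reporting the given health state
# ("up", "down" or "unknown") in a /api/v1/targets response read from stdin.
count_targets_by_health() {
  grep -o "\"health\":\"$1\"" | wc -l | tr -d ' '
}

# Example usage (not executed here):
#   curl -s -u 'admin:<secret>' \
#     https://prometheus.monitoring.example.org/api/v1/targets \
#     | count_targets_by_health down
```

If the command prints anything other than `0` for `down`, the `/targets` page shows which scrape job is failing and why.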
| Service | Authentication | Domain |
| ------------- | ------------------ | --------------------------------- |
| Grafana | Email / SSO | monitoring.example.org |
| Prometheus | traefik basic-auth | prometheus.monitoring.example.org |
| Loki          | traefik basic-auth | loki.monitoring.example.org       |
| Cadvisor | traefik basic-auth | cadvisor.monitoring.example.org |
| Node Exporter | traefik basic-auth | node.monitoring.example.org |
### Logging from a Docker host to a Loki server without anything else
```
# Install the Loki logging driver as a Docker plugin
$ docker plugin install grafana/loki-docker-driver:latest --alias loki --grant-all-permissions
# Note: this overwrites any existing /etc/docker/daemon.json
$ echo '{
    "debug": true,
    "log-driver": "loki",
    "log-opts": {
        "loki-url": "https://<user>:<secret>@loki.monitoring.example.org/loki/api/v1/push",
        "loki-batch-size": "400"
    }
}' > /etc/docker/daemon.json
$ systemctl restart docker.service
```
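
To check that the Loki endpoint accepts logs before switching the Docker daemon over, you can push a single test line to the same `/loki/api/v1/push` URL by hand. This is a minimal sketch: `loki_payload` is a hypothetical helper, the `job` label is an arbitrary choice, and the log line must not contain quotes or backslashes because no JSON escaping is done.

```shell
#!/bin/sh
# Hypothetical helper: build a Loki push payload containing one log line.
# $1 = value for the "job" label, $2 = log line (no quotes or backslashes)
loki_payload() {
  ts="$(date +%s)000000000"   # seconds since epoch, padded to nanoseconds
  printf '{"streams":[{"stream":{"job":"%s"},"values":[["%s","%s"]]}]}' "$1" "$ts" "$2"
}

# Example usage (not executed here):
#   loki_payload test 'hello from the docker host' \
#     | curl -s -u '<user>:<secret>' -X POST \
#         -H 'Content-Type: application/json' --data-binary @- \
#         https://loki.monitoring.example.org/loki/api/v1/push
```

After the Docker restart, `docker info --format '{{.LoggingDriver}}'` should print `loki` if the daemon picked up the new configuration.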
## Post-setup guide
- configure prometheus/loki/alertmanager as data sources in grafana under `Configuration > Data sources`
  - for loki, you need to set a "Custom HTTP Header": `X-Scope-OrgID: fake`
- configure the SMTP mailer under `Alerting > Contact points`
  - edit the default contact point, choose "Alertmanager" as type & `http://alertmanager:9093` as URL
  - use the "Test" button to send a test mail. It should fire a request at the alertmanager & that should send a mail
- `abra app cp` your `scrape_configs: ...` into `/prometheus/scrape_configs` & log into your prometheus web UI to ensure they're working
- load your dashboards in manually under `Create > Dashboard`
- from your dashboard panels, choose `Edit > Alert` to create alerts based on those panels
---
THX to the previous work of @decentral1se @knooflok @3wc @cellarspoon @mirsal