Last commit: Philipp Rothmann, a8e94af0cf

Repository files:
- .env.sample
- .gitignore
- README.md
- abra.sh
- alertmanager.yml.tmpl
- compose.grafana.yml
- compose.loki.yml
- compose.prometheus.yml
- compose.promtail.yml
- compose.yml
- grafana-dashboards.yml
- grafana-datasources.yml
- grafana-stacks-dashboard.json
- grafana-swarm-dashboard.json
- grafana-traefik-dashboard.json
- grafana_custom.ini
- loki.yml.tmpl
- node-exporter-entrypoint.sh
- prometheus.yml.tmpl
- promtail.yml.tmpl
- scrape-config.example.yml
monitoring-ng
An all-in-one Grafana/Prometheus/Loki stack. This recipe is useful for folks who need to centralise their monitoring into a single Grafana/Prometheus/Loki instance fed by several instances of node_exporter/cAdvisor/Promtail.
- Category: Apps
- Status: 2, beta
- Image: grafana/grafana, 4, upstream
- Healthcheck: 3
- Backups: 1
- Email: 3
- Tests: No
- SSO: 1
Set up metrics gathering
Here, gathering.org is the node you want to gather metrics from.
- Configure DNS
- monitoring.gathering.org
- cadvisor.monitoring.gathering.org
- node.monitoring.gathering.org
- Configure Traefik to use basic auth:
abra app config traefik.gathering.org
and uncomment:
# BASIC_AUTH
COMPOSE_FILE="$COMPOSE_FILE:compose.basicauth.yml"
BASIC_AUTH=1
SECRET_USERSFILE_VERSION=v1
- Generate the userslist with an htpasswd-hashed password:
abra app secret insert traefik.gathering.org userslist v1 'admin:<hashed-secret>'
Make sure there is no whitespace within admin:<hashed-secret>; it seems to break things.
- Redeploy Traefik:
abra app deploy -f traefik
- Create and configure the monitoring app:
abra app new monitoring-ng
abra app config monitoring.gathering.org
For gathering, only the main compose.yml is needed, nothing more.
- Deploy it:
abra app deploy monitoring.gathering.org
- check that endpoints are up and basic-auth works
- cadvisor.monitoring.gathering.org
- node.monitoring.gathering.org
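Uncommenting the BASIC_AUTH lines in the Traefik step above can also be scripted. This is a minimal sketch, assuming the lines appear in the Traefik app's env file exactly as shown above, only prefixed with "#" (optionally followed by spaces); enable_basic_auth is a hypothetical helper name, and the env file is whichever file abra app config traefik.gathering.org opens on your machine.

```sh
#!/bin/sh
# Hypothetical helper: uncomment the BASIC_AUTH settings in a Traefik app env file.
# Assumes each line is commented out with "#" (plus optional spaces) and
# otherwise matches the README text exactly.
enable_basic_auth() {
  env_file="$1"
  # Keep a backup, then write the edited copy over the original (portable, no sed -i).
  cp "$env_file" "$env_file.bak" || return 1
  sed \
    -e 's|^#[[:space:]]*\(COMPOSE_FILE="\$COMPOSE_FILE:compose\.basicauth\.yml"\)$|\1|' \
    -e 's|^#[[:space:]]*\(BASIC_AUTH=1\)$|\1|' \
    -e 's|^#[[:space:]]*\(SECRET_USERSFILE_VERSION=v1\)$|\1|' \
    "$env_file.bak" > "$env_file"
}
```

The "# BASIC_AUTH" section marker is left untouched because it does not match any of the three patterns. Review the edited file (and the .bak copy) before redeploying.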
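For the userslist step above, a hashed entry can be produced with htpasswd from Apache's utilities (e.g. the apache2-utils package on Debian): htpasswd -nbB admin '<plaintext>' prints admin:<bcrypt-hash>, a format Traefik's basic-auth middleware accepts. Because whitespace reportedly breaks the entry, this sketch rejects an entry that contains any whitespace or lacks the user:hash shape before it is inserted; check_userslist_entry is a hypothetical helper, not part of abra.

```sh
#!/bin/sh
# Hypothetical helper: sanity-check a "user:hash" entry before passing it to
# `abra app secret insert ... userslist v1 '<entry>'`.
check_userslist_entry() {
  entry="$1"
  # Delete spaces, tabs, CR and LF; any difference means whitespace was present.
  stripped=$(printf '%s' "$entry" | tr -d ' \t\r\n')
  if [ "$stripped" != "$entry" ]; then
    echo "error: userslist entry contains whitespace" >&2
    return 1
  fi
  # Require a non-empty user, a colon, and a non-empty hash.
  case "$entry" in
    ?*:?*) return 0 ;;
    *) echo "error: expected <user>:<hash>" >&2; return 1 ;;
  esac
}

# Example (requires htpasswd):
# entry=$(htpasswd -nbB admin 'plaintext-password')
# check_userslist_entry "$entry" && \
#   abra app secret insert traefik.gathering.org userslist v1 "$entry"
```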
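One way to check the endpoints and basic auth from a shell is to request each endpoint once without and once with credentials and compare the HTTP status codes. A sketch, assuming both endpoints serve /metrics (the default path for node_exporter and cAdvisor) behind Traefik basic auth, so an unauthenticated request gets 401 and an authenticated one gets 200; auth_ok, check_endpoint and the ADMIN_PASS variable are hypothetical names.

```sh
#!/bin/sh
# Decide from two HTTP status codes whether basic auth behaves as expected:
# 401 without credentials, 200 with them.
auth_ok() {
  [ "$1" = "401" ] && [ "$2" = "200" ]
}

# Query one endpoint with and without credentials (needs curl and network access).
# curl reports "000" as the code when the connection itself fails.
check_endpoint() {
  url="$1"; user="$2"; pass="$3"
  unauth=$(curl -s -o /dev/null -w '%{http_code}' "$url")
  authed=$(curl -s -o /dev/null -w '%{http_code}' -u "$user:$pass" "$url")
  if auth_ok "$unauth" "$authed"; then
    echo "OK   $url"
  else
    echo "FAIL $url (without auth: $unauth, with auth: $authed)"
    return 1
  fi
}

# Example:
# for host in cadvisor.monitoring.gathering.org node.monitoring.gathering.org; do
#   check_endpoint "https://$host/metrics" admin "$ADMIN_PASS"
# done
```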
Set up the metrics browser
- Configure DNS
- monitoring.example.org
- prometheus.monitoring.example.org
- loki.monitoring.example.org
- Set up the monitoring stack:
abra app new monitoring-ng
abra app config monitoring.example.org
Uncomment everything in the config, then insert the secret:
abra app secret insert monitoring.example.org basic_auth_admin_password v1 <secret>
This needs the plaintext Traefik basic-auth secret, not the hashed one!
abra app secret ls monitoring.example.org
abra app deploy monitoring.example.org
- add the scrape config to Prometheus:
abra app cmd monitoring.example.org prometheus gathering.org
- or manually:
cp scrape-config.example.yml gathering.org.yml  # adjust the domain
# mkdir scrape_configs
abra app cp monitoring.example.org gathering.org.yml prometheus:/prometheus/scrape_configs/
- check that all configured targets are up: https://prometheus.monitoring.example.org/targets
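Since basic_auth_admin_password in the monitoring stack setup above must hold the plaintext password, a quick guard against pasting a hash by mistake can help. This sketch only recognises common htpasswd hash prefixes (bcrypt $2a$/$2b$/$2y$, APR1 $apr1$, SHA1 {SHA}), so it is a heuristic, not an exhaustive check; looks_hashed is a hypothetical helper.

```sh
#!/bin/sh
# Heuristic: succeed if the value looks like an htpasswd hash rather than plaintext.
looks_hashed() {
  case "$1" in
    '$2a$'*|'$2b$'*|'$2y$'*|'$apr1$'*|'{SHA}'*) return 0 ;;
    *) return 1 ;;
  esac
}

# Example:
# if looks_hashed "$secret"; then
#   echo "error: use the plaintext password, not the hash" >&2
# else
#   abra app secret insert monitoring.example.org basic_auth_admin_password v1 "$secret"
# fi
```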
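Instead of editing the example by hand for the manual scrape-config step, a file could be generated per gathering node. This is only a sketch: the real layout is defined by scrape-config.example.yml and prometheus.yml.tmpl, which are not reproduced here. It assumes standard Prometheus scrape_config fields (job_name, scheme, basic_auth, static_configs) under a top-level scrape_configs key, the two gathering endpoints on HTTPS, and that the basic-auth password is mounted in the Prometheus container at /run/secrets/basic_auth_admin_password (the Docker secrets default path; whether this recipe mounts it there is an assumption). make_scrape_config is a hypothetical name.

```sh
#!/bin/sh
# Hypothetical generator for a per-node Prometheus scrape file.
# Args: domain, basic-auth user (default admin), password file path
# (default /run/secrets/basic_auth_admin_password, an assumption).
make_scrape_config() {
  domain="$1"
  user="${2:-admin}"
  password_file="${3:-/run/secrets/basic_auth_admin_password}"
  cat <<EOF
scrape_configs:
  - job_name: "$domain"
    scheme: https
    basic_auth:
      username: "$user"
      password_file: "$password_file"
    static_configs:
      - targets:
          - "node.monitoring.$domain"
          - "cadvisor.monitoring.$domain"
EOF
}

# Example:
# make_scrape_config gathering.org > gathering.org.yml
# abra app cp monitoring.example.org gathering.org.yml prometheus:/prometheus/scrape_configs/
```

Using password_file keeps the plaintext password out of the generated YAML.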
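The target check can also be done against the Prometheus HTTP API: /api/v1/targets reports each active target with a health of "up", "down" or "unknown". A rough sketch that counts targets that are not up by pattern-matching the response; it assumes the compact JSON Prometheus returns (no spaces around the colon), and a real script would be better off using jq. count_unhealthy_targets is a hypothetical name.

```sh
#!/bin/sh
# Read a /api/v1/targets JSON response on stdin and print how many targets
# report a health other than "up".
count_unhealthy_targets() {
  # grep -vc exits 1 when the count is 0; "|| true" keeps the function successful.
  grep -o '"health":"[a-z]*"' | grep -vc '"health":"up"' || true
}

# Example:
# curl -s -u "admin:$ADMIN_PASS" https://prometheus.monitoring.example.org/api/v1/targets \
#   | count_unhealthy_targets
```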
Service | Authentication | Domain |
---|---|---|
Grafana | Email / SSO | monitoring.example.org |
Prometheus | traefik basic-auth | prometheus.monitoring.example.org |
Loki | traefik basic-auth | loki.monitoring.example.org |
cAdvisor | traefik basic-auth | cadvisor.monitoring.example.org |
Node Exporter | traefik basic-auth | node.monitoring.example.org |
Logging from a Docker host to the Loki server without anything else
$ docker plugin install grafana/loki-docker-driver:latest --alias loki --grant-all-permissions
$ echo '{
"debug" : true,
"log-driver": "loki",
"log-opts": {
"loki-url": "https://<user>:<secret>@loki.monitoring.example.org/loki/api/v1/push",
"loki-batch-size": "400"
}
}' > /etc/docker/daemon.json
$ systemctl restart docker.service
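The echo above overwrites any existing /etc/docker/daemon.json. A slightly safer sketch writes the same settings but keeps a timestamped backup first; write_loki_daemon_json is a hypothetical helper, the values mirror the ones above, and it does not merge settings from the old file. Credentials containing characters such as @, : or / would need to be percent-encoded in the URL.

```sh
#!/bin/sh
# Hypothetical helper: write a daemon.json for the Loki logging driver,
# backing up any existing file first. Needs root for /etc/docker/daemon.json.
write_loki_daemon_json() {
  user="$1"; secret="$2"; target="${3:-/etc/docker/daemon.json}"
  if [ -f "$target" ]; then
    cp "$target" "$target.bak.$(date +%s)" || return 1
  fi
  cat > "$target" <<EOF
{
  "debug": true,
  "log-driver": "loki",
  "log-opts": {
    "loki-url": "https://$user:$secret@loki.monitoring.example.org/loki/api/v1/push",
    "loki-batch-size": "400"
  }
}
EOF
}

# Example:
# write_loki_daemon_json <user> <secret> && systemctl restart docker.service
```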
This stack requires three domains: one each for Grafana, Prometheus and Loki. This is because the gathering tools, such as node_exporter, need a publicly accessible URL to connect to. We use Prometheus's internal HTTP basic auth and wire up an Nginx proxy with HTTP basic auth for Loki. Grafana uses Keycloak OpenID Connect sign-in. The Alertmanager setup remains internal and is only connected to Grafana. The setup also assumes that you are deploying the coop-cloud/gathering recipe on the machines you want to gather metrics and logs from. Each instance of the gathering recipe will report back to and/or be scraped by your central install of monitoring-lite.
Post-setup guide
- configure Prometheus/Loki/Alertmanager as data sources in Grafana under Configuration > Data sources
  - for Loki, you need to set a "Custom HTTP Header": X-Scope-OrgID: fake
- configure the SMTP mailer under Alerting > Contact points
  - edit the default contact point, choose "Alertmanager" as type and http://alertmanager:9093 as URL
  - use the "Test" button to send a test mail. It should fire a request at the Alertmanager, which should then send a mail
- abra app cp your scrape_configs: ... into /prometheus/scrape_configs and log into your Prometheus web UI to ensure they're working
- load your dashboards manually under Create > Dashboard
- from your dashboard panels, choose Edit > Alert to create alerts based on those panels
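Besides Grafana's "Test" button, Alertmanager's v2 API accepts alerts directly via POST /api/v2/alerts (a JSON array of alerts), which can help when debugging the mail path. A sketch that builds a minimal payload; since Alertmanager is internal, the curl call has to run from a container on the same network (e.g. the Grafana container, assuming curl is available there), and whether a mail goes out depends on the routes in alertmanager.yml.tmpl. build_test_alert is a hypothetical helper.

```sh
#!/bin/sh
# Build a minimal Alertmanager v2 payload containing one alert.
# Arg: alert name (default TestAlert).
build_test_alert() {
  name="${1:-TestAlert}"
  printf '[{"labels":{"alertname":"%s","severity":"info"},"annotations":{"summary":"manual test alert"}}]\n' "$name"
}

# Example, run somewhere that can reach http://alertmanager:9093:
# build_test_alert | curl -s -XPOST -H 'Content-Type: application/json' \
#   --data-binary @- http://alertmanager:9093/api/v2/alerts
```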
THX to the previous work of @decentral1se @knooflok @3wc @cellarspoon @mirsal
For reasonable CPU usage, some constraints are in place ... we may have to move these out into env variables at some point to make them configurable.
Metrics are fetched every 120s, logs every 10s?