Compare commits
86 Commits
0.1.0+v1.5
...
smartctl
| Author | SHA1 | Date | |
|---|---|---|---|
| 2972272303 | |||
| 4c69cf97ab | |||
| 1110786179 | |||
| 1d9eb10004 | |||
| 23acf56637 | |||
| 03227f1907 | |||
| d085c66d68 | |||
| 1970061ff8 | |||
| fa76179987 | |||
| 64cb07a4a2 | |||
| e247677433 | |||
| f2310f2b86 | |||
| f2711fa16e | |||
| 2870b9486c | |||
| 3a1fabe4f9 | |||
| a358837922 | |||
| dd0a0c1bb0 | |||
| 31cabc36ae | |||
| d25986d5cb | |||
| f8f8004445 | |||
| aa05d022da | |||
| fb52a76247 | |||
| 2e2a52eae0 | |||
| 48419d5afa | |||
| a0a6e2c509 | |||
| 024f2a8aec | |||
| 38095e23fa | |||
| 641161329e | |||
| cdacfd035e | |||
| b2d3901f61 | |||
| 8becf1c1d6 | |||
| 777b1355dd | |||
| e83433cebd | |||
| a713f98ffb | |||
| 8dc84c591c | |||
| d9aa05a4b5 | |||
| 349df12204 | |||
| 6c33089078 | |||
| 4bedebfab1 | |||
| dd320e9f1c | |||
| 9cb997b25a | |||
| 48d137d194 | |||
| 1acb5ebd6a | |||
| 682f30cef1 | |||
| 694c8a9875 | |||
| 9dfa9cad2a | |||
| 99f8790ec4 | |||
| 310c28e735 | |||
| 16bd65f417 | |||
| 97ebcf306a | |||
| f93370b9ca | |||
| 83461e2e76 | |||
| 7dbe5bf22e | |||
| 89b5fef6ac | |||
| cd42c64544 | |||
| 70719dbee8 | |||
| 8900ace6a2 | |||
| 8b464156bd | |||
| 73c4ec3e74 | |||
| 225899785b | |||
| 0352a393de | |||
| 92e7bbc730 | |||
| 5bf3d31c0f | |||
| a14cb575a2 | |||
| 1a59dfac7f | |||
| a9b76dff65 | |||
| 0401de1d16 | |||
| aa133fcfea | |||
| 3750e75439 | |||
| 5d0b6d88fc | |||
| c06dda7205 | |||
| b497b4ce4e | |||
| 29b3fcd7d1 | |||
| d4c6bd4c12 | |||
| 42c3695bf3 | |||
| 74498b70fe | |||
| 961a17f1ad | |||
| d549c33fe5 | |||
| 9a791ef794 | |||
| 7a0292f902 | |||
| 705934abc1 | |||
| b091bf1951 | |||
| de1819521b | |||
| 8c82943289 | |||
| 39071fa512 | |||
| 3398e1d312 |
86
.env.sample
@ -1,30 +1,47 @@
|
||||
TYPE=monitoring-ng
|
||||
LETS_ENCRYPT_ENV=production
|
||||
COMPOSE_FILE=compose.yml
|
||||
DOMAIN=monitoring.example.com
|
||||
TIMEOUT=120
|
||||
DOMAIN=monitoring-ng.example.com
|
||||
#TIMEOUT=120
|
||||
ENABLE_BACKUPS=true
|
||||
|
||||
# Monitoring Client
|
||||
SECRET_BASIC_AUTH_VERSION=v1
|
||||
# Enable Live Debugging
|
||||
LIVE_DEBUGGING=false
|
||||
# Enable this to send logs to a Loki server, adapt DOMAIN if server is
|
||||
# remote
|
||||
# LOKI_PUSH_URL=https://loki.$DOMAIN/loki/api/v1/push
|
||||
# Enable on systemd hosts to read logs from the journal
|
||||
# JOURNALD=1
|
||||
#
|
||||
## Node Exporter, Cadvisor (Gathering Metrics)
|
||||
# COMPOSE_FILE="$COMPOSE_FILE:compose.metrics.yml"
|
||||
# Enable on non-systemd hosts (Alpine, older Debian/Ubuntu) to tail
|
||||
# /var/log/*log files (syslog, auth.log, kern.log, etc.) that a local
|
||||
# syslogd writes. No syslogd reconfiguration needed.
|
||||
# SYSLOG_FILES=1
|
||||
#
|
||||
## Enable this secret for Promtail / Prometheus
|
||||
# SECRET_BASIC_AUTH_ADMIN_PASSWORD_VERSION=v1
|
||||
#
|
||||
# Promtail (Gathering Logs)
|
||||
# COMPOSE_FILE="$COMPOSE_FILE:compose.promtail.yml"
|
||||
# LOKI_PUSH_URL=https://loki.monitoring.example.org/loki/api/v1/push
|
||||
# Enable to receive syslog messages over the network on port 514/tcp.
|
||||
# Use for remote devices that push syslog to this host, or for a
|
||||
# local syslogd configured to forward over the network.
|
||||
# Not needed if you just want to read local log files — use SYSLOG_FILES instead.
|
||||
# SYSLOG=1
|
||||
# COMPOSE_FILE="$COMPOSE_FILE:compose.syslog.yml"
|
||||
|
||||
# Enable this to send metrics to a Prometheus server, adapt DOMAIN if
|
||||
# server is remote
|
||||
# PROMETHEUS_REMOTE_WRITE_URL=https://prometheus.$DOMAIN/api/v1/write
|
||||
|
||||
# Monitor physical disk health
|
||||
# COMPOSE_FILE="$COMPOSE_FILE:compose.smartctl.yml"
|
||||
|
||||
# Monitoring Server
|
||||
#
|
||||
## Prometheus, Alertmanager
|
||||
## Prometheus
|
||||
# COMPOSE_FILE="$COMPOSE_FILE:compose.prometheus.yml"
|
||||
# ALERTMANAGER_SMTP_FROM=noreply@autonomic.zone
|
||||
# ALERTMANAGER_SMTP_HOST=mail.gandi.net:587
|
||||
# ALERTMANAGER_SMTP_TO=kaboom@autonomic.zone
|
||||
# SECRET_ALERTMANAGER_SMTP_PASSWORD_VERSION=v1
|
||||
# PROMETHEUS_RETENTION_TIME=1y
|
||||
#
|
||||
## Prometheus Pushgateway
|
||||
# COMPOSE_FILE="$COMPOSE_FILE:compose.pushgateway.yml"
|
||||
#
|
||||
## Loki
|
||||
# Loki Server
|
||||
#
|
||||
@ -40,28 +57,47 @@ TIMEOUT=120
|
||||
# LOKI_AWS_REGION=eu-west-1
|
||||
# LOKI_ACCESS_KEY_ID=bush-debrief-approval-robust-scraggly-molecule
|
||||
# LOKI_BUCKET_NAMES=loki
|
||||
# SECRET_LOKI_AWS_SECRET_ACCESS_KEY_VERSION=v1
|
||||
# SECRET_LOKI_AWS_KEY_VERSION=v1
|
||||
#
|
||||
## Grafana
|
||||
#
|
||||
# COMPOSE_FILE="$COMPOSE_FILE:compose.grafana.yml"
|
||||
# GF_SERVER_ROOT_URL=https://${DOMAIN}
|
||||
# SECRET_GRAFANA_ADMIN_PASSWORD_VERSION=v1
|
||||
# GF_SERVER_ROOT_URL=https://monitoring.example.com
|
||||
# SECRET_GF_ADMINPASSWD_VERSION=v1
|
||||
#
|
||||
## Single-Sign-On with OIDC
|
||||
# COMPOSE_FILE="$COMPOSE_FILE:compose.grafana-oidc.yml"
|
||||
# OIDC_ENABLED=1
|
||||
# SECRET_GRAFANA_OIDC_CLIENT_SECRET_VERSION=v1
|
||||
# SECRET_GF_OIDC_SECRET_VERSION=v1
|
||||
# OIDC_CLIENT_ID=grafana
|
||||
# OIDC_AUTH_URL="https://sso.example.com/auth/realms/autonomic/protocol/openid-connect/auth"
|
||||
# OIDC_API_URL="https://sso.example.com/auth/realms/autonomic/protocol/openid-connect/userinfo"
|
||||
# OIDC_TOKEN_URL="https://sso.example.com/auth/realms/autonomic/protocol/openid-connect/token"
|
||||
# OIDC_AUTH_URL="https://authentik.example.com/application/o/authorize/"
|
||||
# OIDC_API_URL="https://authentik.example.com/application/o/userinfo/"
|
||||
# OIDC_TOKEN_URL="https://authentik.example.com/application/o/token/"
|
||||
#
|
||||
## Additional grafana settings (unlikely to require editing)
|
||||
# GF_SECURITY_ALLOW_EMBEDDING=1
|
||||
# GF_INSTALL_PLUGINS=grafana-piechart-panel
|
||||
#
|
||||
## grafana SMTP configuration (optional)
|
||||
# COMPOSE_FILE="$COMPOSE_FILE:compose.grafana-smtp.yml"
|
||||
# GF_SMTP_HOST=changeme
|
||||
# GF_SMTP_ENABLED=1
|
||||
# GF_SMTP_USER=changeme
|
||||
# GF_SMTP_ENABLED=true
|
||||
# GF_SMTP_FROM_ADDRESS=grafana@example.com
|
||||
# GF_SMTP_SKIP_VERIFY=1
|
||||
# GF_SMTP_SKIP_VERIFY=false
|
||||
# SECRET_GF_SMTP_PASSWD_VERSION=v1
|
||||
#
|
||||
|
||||
## Grafana Matrix Contact Point (optional)
|
||||
#COMPOSE_FILE="$COMPOSE_FILE:compose.matrix-alertmanager-receiver.yml"
|
||||
#SECRET_MATRIX_TOKEN_VERSION=v1
|
||||
#GF_MATRIX_USER_ID="<user-id>"
|
||||
#GF_MATRIX_ROOM_ID="<room-id>"
|
||||
#GF_MATRIX_HOMESERVER_URL="<homeserver-url>"
|
||||
|
||||
# Alerts
|
||||
#ALERT_BACKUP_FAILED_ENABLED=true
|
||||
#ALERT_BACKUP_MISSING_ENABLED=true
|
||||
#ALERT_BACKUP_NOT_SUCCESSFULL_ENABLED=true
|
||||
#ALERT_NODE_DISK_SPACE_ENABLED=true
|
||||
#ALERT_NODE_MEMORY_USAGE_ENABLED=true
|
||||
|
||||
62
README.md
@ -1,8 +1,8 @@
|
||||
# monitoring-ng
|
||||
|
||||
Yet another monitoring stack ...
|
||||
This time it's an all-in-one grafana/prometheus/loki/node_exporter/cadvisor/promtail stack.
|
||||
It's based heavily on the [monitoring-lite](https://git.coopcloud.tech/coop-cloud/monitoring-lite) stack, but now includes everything in one recipe. You can deploy monitoring instances that only gather metrics and logs (node_exporter/cadvisor/promtail), or instances with the full monitoring stack (grafana/prometheus/loki), using the same recipe with a different .env configuration.
|
||||
This time it's an all-in-one grafana/prometheus/loki/alloy stack.
|
||||
It's based heavily on the [monitoring-lite](https://git.coopcloud.tech/coop-cloud/monitoring-lite) stack, but now includes everything in one recipe. You can deploy monitoring instances that only gather metrics and logs (alloy), or instances with the full monitoring stack (grafana/prometheus/loki), using the same recipe with a different .env configuration.
|
||||
|
||||
|
||||
<!-- metadata -->
|
||||
@ -36,7 +36,7 @@ Where gathering.org is the node you want to gather metrics from.
|
||||
SECRET_USERSFILE_VERSION=v1
|
||||
```
|
||||
- Generate userslist with htpasswd hashed password
|
||||
`abra app secret insert traefik.gathering.org userslist v1 'admin:<hashed-secret>'`
|
||||
`abra app secret insert traefik.gathering.org usersfile v1 'admin:<hashed-secret>'`
|
||||
Make sure `admin:<hashed-secret>` contains no whitespace; whitespace there seems to break things.
|
||||
- `abra app deploy -f traefik`
|
||||
1. `abra app new monitoring-ng`
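A minimal sketch of producing and checking the `admin:<hashed-secret>` entry mentioned above. It assumes `openssl` is installed on your local machine (`htpasswd -nb` from apache2-utils would be an alternative). The helper names `make_usersfile_entry` and `is_valid_usersfile_entry` are hypothetical and not part of this recipe.

```shell
# Hypothetical helper: print a "user:hash" line using an Apache MD5 (apr1) hash.
# Assumes openssl is available locally.
make_usersfile_entry() {
  user=$1
  password=$2
  hash=$(openssl passwd -apr1 "$password") || return 1
  printf '%s:%s\n' "$user" "$hash"
}

# Hypothetical helper: succeed only if the entry has a "user:hash" shape
# and contains no whitespace at all.
is_valid_usersfile_entry() {
  case "$1" in
    *[[:space:]]*) return 1 ;;
    ?*:?*) return 0 ;;
    *) return 1 ;;
  esac
}
```

A possible use: `entry=$(make_usersfile_entry admin '<password>')`, then `is_valid_usersfile_entry "$entry" && abra app secret insert traefik.gathering.org usersfile v1 "$entry"`.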
|
||||
@ -54,15 +54,15 @@ Where gathering.org is the node you want to gather metrics from.
|
||||
- monitoring.example.org
|
||||
- prometheus.monitoring.example.org
|
||||
- loki.monitoring.example.org
|
||||
1. Setup monitoring stack
|
||||
2. Setup monitoring stack
|
||||
- `abra app new monitoring-ng`
|
||||
- `abra app config monitoring.example.org`
|
||||
Uncomment all the stuff
|
||||
- `abra app secret insert monitoring.example.org basic_auth_admin_password v1 <secret>`
|
||||
- `abra app secret insert monitoring.example.org basic_auth v1 <secret>`
|
||||
this needs the plaintext traefik basic-auth secret, not the hashed one!
|
||||
- `abra app secret ls monitoring.example.org`
|
||||
- `abra app deploy monitoring.example.org`
|
||||
1. add scrape config to prometheus
|
||||
3. Add scrape config to prometheus
|
||||
- `abra app cmd monitoring.example.org prometheus gathering.org`
|
||||
- or manually
|
||||
```
|
||||
@ -85,7 +85,6 @@ Where gathering.org is the node you want to gather metrics from.
|
||||
| Cadvisor | traefik basic-auth | cadvisor.monitoring.example.org |
|
||||
| Node Exporter | traefik basic-auth | node.monitoring.example.org |
|
||||
|
||||
|
||||
### Logging from a docker host to loki server without anything else
|
||||
|
||||
```
|
||||
@ -101,8 +100,18 @@ $ echo '{
|
||||
$ systemctl restart docker.service
|
||||
```
|
||||
|
||||
## Setup Push Gateway
|
||||
|
||||
1. Enable it in the env file by uncommenting the following lines:
|
||||
```
|
||||
## Prometheus Pushgateway
|
||||
# COMPOSE_FILE="$COMPOSE_FILE:compose.pushgateway.yml"
|
||||
```
|
||||
2. `abra app deploy monitoring.example.org`
|
||||
|
||||
This will expose the pushgateway at `https://pushgateway.${DOMAIN}`.
|
||||
It is secured behind the same basic auth as the other services.
|
||||
After that, add `pushgateway.${DOMAIN}` to the scrape config.
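Once the pushgateway is deployed, a job can push a sample to it over HTTPS. The following is a sketch, not part of this recipe: the helper names and the `BASIC_AUTH_USER` / `BASIC_AUTH_PASSWORD` variables are assumptions, and the credentials are the plaintext basic-auth ones. It relies on the Pushgateway `/metrics/job/<job>` endpoint and on `curl` being installed.

```shell
# Hypothetical helper: build the push URL for a job on the pushgateway.
pushgateway_url() {
  domain=$1
  job=$2
  printf 'https://pushgateway.%s/metrics/job/%s' "$domain" "$job"
}

# Hypothetical helper: push one sample in the Prometheus text format.
# BASIC_AUTH_USER and BASIC_AUTH_PASSWORD are assumed to be set by the caller.
push_metric() {
  domain=$1
  job=$2
  metric=$3
  value=$4
  printf '%s %s\n' "$metric" "$value" |
    curl --fail --silent \
      --user "$BASIC_AUTH_USER:$BASIC_AUTH_PASSWORD" \
      --data-binary @- "$(pushgateway_url "$domain" "$job")"
}
```

For example, `push_metric monitoring.example.org nightly_backup backup_last_success_seconds "$(date +%s)"` would record the time of the last successful run. The job and metric names here are made up.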
|
||||
|
||||
## Post-setup guide
|
||||
|
||||
@ -118,4 +127,41 @@ $ systemctl restart docker.service
|
||||
|
||||
---
|
||||
|
||||
THX to the previous work of @decentral1se @knooflok @3wc @cellarspoon @mirsal
|
||||
THX to the previous work of @decentral1se @knooflok @3wc @cellarspoon @mirsal
|
||||
|
||||
## Adding Matrix as Alert Contact point
|
||||
|
||||
1. Enable the [matrix-alertmanager-receiver](https://github.com/metio/matrix-alertmanager-receiver/):
|
||||
```
|
||||
COMPOSE_FILE="$COMPOSE_FILE:compose.matrix-alertmanager-receiver.yml"
|
||||
```
|
||||
|
||||
2. Insert the matrix access token secret:
|
||||
```
|
||||
abra app secret insert monitoring.marx.klasse-methode.it matrix_token v1
|
||||
```
|
||||
|
||||
3. Set required configurations:
|
||||
```
|
||||
GF_MATRIX_USER_ID=
|
||||
GF_MATRIX_ROOM_ID=
|
||||
GF_MATRIX_HOMESERVER_URL=
|
||||
```
|
||||
4. Configure Alertmanager webhook and set the url to `http://matrix-alertmanager-receiver:12345/alerts/<room-id>`
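To smoke-test the receiver before wiring up real alerts, you could post a hand-written payload to the webhook URL from a container on the same network. This is only a sketch: `alert_payload` is a hypothetical helper, the payload is a reduced form of the Alertmanager webhook format (version 4), and the receiver may require fields that are omitted here.

```shell
# Hypothetical helper: print a minimal Alertmanager-style webhook payload.
# The arguments are inserted verbatim, so they must not contain characters
# that need JSON escaping (quotes, backslashes, newlines).
alert_payload() {
  alertname=$1
  severity=$2
  summary=$3
  printf '{"version":"4","status":"firing","alerts":[{"status":"firing","labels":{"alertname":"%s","severity":"%s"},"annotations":{"summary":"%s"}}]}\n' \
    "$alertname" "$severity" "$summary"
}

# Example call, commented out because the hostname only resolves inside the stack:
# alert_payload TestAlert warning "test message" |
#   curl --fail --silent --header 'Content-Type: application/json' \
#     --data-binary @- "http://matrix-alertmanager-receiver:12345/alerts/<room-id>"
```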
|
||||
|
||||
## Alerts
|
||||
|
||||
It is possible to enable the following alerts, by setting the corresponding env variable to `true`:
|
||||
- backupbot failed: `ALERT_BACKUP_FAILED_ENABLED`
|
||||
- backupbot missing: `ALERT_BACKUP_MISSING_ENABLED`
|
||||
- backupbot not successful: `ALERT_BACKUP_NOT_SUCCESSFULL_ENABLED`
|
||||
- node disk space: `ALERT_NODE_DISK_SPACE_ENABLED`
|
||||
- node memory usage: `ALERT_NODE_MEMORY_USAGE_ENABLED`
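The variables listed above ship commented out in `.env.sample`, so enabling an alert means uncommenting its line in the app's env file. A minimal sketch of doing that non-interactively follows. `enable_alert` is a hypothetical helper, and the env file path is whatever file `abra app config` opens for your app.

```shell
# Hypothetical helper: uncomment one "NAME=..." line in an env file.
enable_alert() {
  env_file=$1
  name=$2
  tmp=$(mktemp) || return 1
  # Strip a leading "#" (and optional spaces) in front of "NAME=".
  # Writing back with cat keeps the original file's permissions.
  sed "s/^#[[:space:]]*\(${name}=\)/\1/" "$env_file" > "$tmp" &&
    cat "$tmp" > "$env_file"
  status=$?
  rm -f "$tmp"
  return "$status"
}
```

After `enable_alert /path/to/app.env ALERT_NODE_DISK_SPACE_ENABLED`, redeploy with `abra app deploy monitoring.example.org` so the change takes effect.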
|
||||
|
||||
## SMART monitoring
|
||||
|
||||
To monitor hard drive health data, you need to run `smartd` on the
host system and also run the `collect-smartctl-json.sh` script
provided here (via a cron job or as a `smartd` hook). This works
around a limitation of Docker Swarm, which prevents the
`smartctl_exporter` from running in privileged mode.
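The repository also contains `collect-smartctl-json.service` and `collect-smartctl-json.timer`, so on a systemd host the script can be scheduled with a timer instead of cron. The sketch below assumes you run it as root from a checkout of this recipe and that `/etc/systemd/system` is where local units live on your distribution. `install_smart_collector` is a hypothetical helper, and setting `DRY_RUN=1` only prints the commands.

```shell
# Hypothetical helper: install the collector script and its systemd units.
# With DRY_RUN set to a non-empty value, the commands are printed, not run.
install_smart_collector() {
  run=${DRY_RUN:+echo}
  # The service unit expects the script at /usr/local/bin.
  $run install -m 0755 collect-smartctl-json.sh /usr/local/bin/collect-smartctl-json.sh &&
    $run install -m 0644 collect-smartctl-json.service /etc/systemd/system/collect-smartctl-json.service &&
    $run install -m 0644 collect-smartctl-json.timer /etc/systemd/system/collect-smartctl-json.timer &&
    $run systemctl daemon-reload &&
    $run systemctl enable --now collect-smartctl-json.timer
}
```

Running `DRY_RUN=1 install_smart_collector` first is a cheap way to review what would be changed on the host.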
|
||||
|
||||
122
abra.sh
@ -1,24 +1,122 @@
|
||||
export NODE_EXPORTER_ENTRYPOINT_VERSION=v1
|
||||
export GRAFANA_DATASOURCES_YML_VERSION=v1
|
||||
export GRAFANA_DASHBOARDS_YML_VERSION=v1
|
||||
export GRAFANA_SWARM_DASHBOARD_JSON_VERSION=v1
|
||||
export GRAFANA_STACKS_DASHBOARD_JSON_VERSION=v1
|
||||
export GRAFANA_TRAEFIK_DASHBOARD_JSON_VERSION=v1
|
||||
export GRAFANA_CUSTOM_INI_VERSION=v1
|
||||
export PROMTAIL_YML_VERSION=v1
|
||||
export LOKI_YML_VERSION=v1
|
||||
export PROMETHEUS_YML_VERSION=v1
|
||||
export ALERTMANAGER_CONFIG_VERSION=v1
|
||||
export ENTRYPOINT_VERSION=v1
|
||||
export GF_DATASOURCES_VERSION=v1
|
||||
export GF_DASHBOARDS_VERSION=v2
|
||||
export GF_SWARM_DASH_VERSION=v2
|
||||
export GF_STACKS_DASH_VERSION=v2
|
||||
export GF_TRAEFIK_DASH_VERSION=v2
|
||||
export GF_BACKUP_DASH_VERSION=v1
|
||||
export GF_CUSTOM_INI_VERSION=v4
|
||||
export LOKI_YML_VERSION=v3
|
||||
export PROMETHEUS_YML_VERSION=v2
|
||||
export MATRIX_ALERTMANAGER_CONFIG_VERSION=e
|
||||
export MATRIX_ALERTMANAGER_ENTRYPOINT_VERSION=a
|
||||
export GRAFANA_ALERTS_NODE_VERSION=v1c
|
||||
export CONFIG_ALLOY_VERSION=v10
|
||||
|
||||
# creates a default prometheus scrape config for a given node
|
||||
add_node(){
|
||||
name=$1
|
||||
add_domain "$name" "$name:8082"
|
||||
add_domain "$name" "metrics.traefik.$name"
|
||||
add_domain "$name" "node.monitoring.$name"
|
||||
add_domain "$name" "cadvisor.monitoring.$name"
|
||||
cat "/prometheus/scrape_configs/$name.yml"
|
||||
}
|
||||
|
||||
# migrates secrets from old names to new names by reading values from the
|
||||
# running containers on the server and re-inserting them under the new names.
|
||||
# preview changes: abra app cmd --local <app> migrate_secret_names
|
||||
# execute changes: abra app cmd --local <app> migrate_secret_names execute
|
||||
migrate_secret_names() {
|
||||
if ! command -v jq &> /dev/null; then
|
||||
echo "jq is required on your local machine to migrate secret names"
|
||||
echo "It could not be found in your PATH, please install jq to proceed."
|
||||
echo "For example: On a debian/ubuntu system, run 'apt install jq'"
|
||||
exit 1
|
||||
fi
|
||||
|
||||
# Hardcoded migration mappings: old_secret_name|new_secret_name
|
||||
MIGRATIONS="
|
||||
grafana_admin_password|gf_adminpasswd
|
||||
grafana_smtp_password|gf_smtp_passwd
|
||||
grafana_oidc_client_secret|gf_oidc_secret
|
||||
matrix_access_token|matrix_token
|
||||
loki_aws_secret_access_key|loki_aws_key
|
||||
"
|
||||
|
||||
# Determine which server the app is deployed on
|
||||
SERVER=$(abra app ls -m | jq -r --arg domain "$APP_NAME" '[.[].apps[] | select(.domain == $domain) | .server] | first' 2>/dev/null)
|
||||
|
||||
if [ -z "$SERVER" ]; then
|
||||
echo "Error: could not determine server for app '$APP_NAME'"
|
||||
exit 1
|
||||
fi
|
||||
|
||||
# Build a lookup table of all secrets currently mounted in this stack.
|
||||
# Each line: <secretID> <containerID> <secretName>
|
||||
LOOKUP=$(ssh "$SERVER" "
|
||||
docker stack services ${STACK_NAME} --format '{{.Name}}' | while read svc; do
|
||||
CID=\$(docker ps --no-trunc -q --filter \"name=\${svc}\" | head -1)
|
||||
docker service inspect \"\$svc\" --format '{{json .Spec.TaskTemplate.ContainerSpec.Secrets}}' | \
|
||||
jq -r --arg cid \"\$CID\" '.[]? | .SecretID + \" \" + \$cid + \" \" + .SecretName'
|
||||
done | sort -k3 -r
|
||||
" 2>/dev/null)
|
||||
|
||||
echo "Secret migration plan for: $APP_NAME (server: $SERVER)"
|
||||
echo ""
|
||||
printf " %-24s %-8s %s\n" "OLD NAME" "FOUND" "ACTION"
|
||||
printf " %-24s %-8s %s\n" "--------" "-----" "------"
|
||||
|
||||
# Check each old name against the lookup table and display the plan
|
||||
ANY_FOUND=false
|
||||
while IFS='|' read -r OLD_NAME NEW_NAME; do
|
||||
[ -z "$OLD_NAME" ] && continue
|
||||
MATCH=$(echo "$LOOKUP" | grep " ${STACK_NAME}_${OLD_NAME}_" | head -1)
|
||||
if [ -n "$MATCH" ]; then
|
||||
printf " %-24s %-8s %s\n" "$OLD_NAME" "yes" "recreate as '$NEW_NAME' version v1"
|
||||
ANY_FOUND=true
|
||||
else
|
||||
printf " %-24s %-8s %s\n" "$OLD_NAME" "no" "nothing (not found on server)"
|
||||
fi
|
||||
done <<< "$MIGRATIONS"
|
||||
|
||||
echo ""
|
||||
|
||||
if [ "$ANY_FOUND" = false ]; then
|
||||
echo "No old secrets found on server. Nothing to migrate."
|
||||
return 0
|
||||
fi
|
||||
|
||||
if [ "$1" != "execute" ]; then
|
||||
echo "To apply the above changes, run:"
|
||||
echo " abra app cmd --local $APP_NAME migrate_secret_names execute"
|
||||
return 0
|
||||
fi
|
||||
|
||||
# read each found secret from its container and re-insert with the new name
|
||||
while IFS='|' read -r OLD_NAME NEW_NAME; do
|
||||
[ -z "$OLD_NAME" ] && continue
|
||||
|
||||
MATCH=$(echo "$LOOKUP" | grep " ${STACK_NAME}_${OLD_NAME}_" | head -1)
|
||||
[ -z "$MATCH" ] && continue
|
||||
|
||||
SECRET_ID=$(echo "$MATCH" | awk '{print $1}')
|
||||
CID=$(echo "$MATCH" | awk '{print $2}')
|
||||
SECRET_VALUE=$(ssh "$SERVER" "cat /var/lib/docker/containers/${CID}/mounts/secrets/${SECRET_ID} 2>/dev/null || sudo cat /var/lib/docker/containers/${CID}/mounts/secrets/${SECRET_ID} 2>/dev/null")
|
||||
|
||||
if [ -z "$SECRET_VALUE" ]; then
|
||||
echo "Error: could not read value for '$OLD_NAME', skipping"
|
||||
continue
|
||||
fi
|
||||
|
||||
echo "Migrating: '$OLD_NAME' -> '$NEW_NAME' (v1)"
|
||||
printf '%s' "$SECRET_VALUE" | abra app secret insert -C "$APP_NAME" "$NEW_NAME" v1
|
||||
|
||||
done <<< "$MIGRATIONS"
|
||||
|
||||
echo ""
|
||||
echo "Done."
|
||||
}
|
||||
|
||||
# adds a domain to a scrape config or creates a new one
|
||||
add_domain(){
|
||||
name=$1
|
||||
|
||||
74
alertmanager-matrix-config.yml.tmpl
Normal file
@ -0,0 +1,74 @@
|
||||
# configuration of the HTTP server
|
||||
http:
|
||||
## address: 127.0.0.1 # bind address for this service. Can be left unspecified to bind on all interfaces
|
||||
port: 12345 # port used by this service
|
||||
alerts-path-prefix: /alerts # URL path for the webhook receiver called by an Alertmanager. Defaults to /alerts
|
||||
metrics-path: /metrics # URL path to collect metrics. Defaults to /metrics
|
||||
metrics-enabled: true # Whether to enable metrics or not. Defaults to false
|
||||
# basic-username: alertmanager # Username for basic authentication. Defaults to alertmanager
|
||||
# basic-password: secret # If set, the alerts endpoint expects basic-auth credentials with the configured username and password
|
||||
|
||||
# configuration for the Matrix connection
|
||||
matrix:
|
||||
homeserver-url: "{{ env "GF_MATRIX_HOMESERVER_URL" }}"
|
||||
user-id: "{{ env "GF_MATRIX_USER_ID" }}"
|
||||
access-token: "{{ secret "matrix_token" }}"
|
||||
room-mapping:
|
||||
matrixroom: "{{ env "GF_MATRIX_ROOM_ID" }}"
|
||||
|
||||
templating:
|
||||
# mapping of ExternalURL values
|
||||
external-url-mapping:
|
||||
# key is the original value taken from the Alertmanager payload
|
||||
# value is the mapped value which will be available as '.ExternalURL' in templates
|
||||
"http://alertmanager:9093": https://alertmanager.example.com
|
||||
# mapping of GeneratorURL values
|
||||
generator-url-mapping:
|
||||
# key is the original value taken from the Alertmanager payload
|
||||
# value is the mapped value which will be available as '.GeneratorURL' in templates
|
||||
"http://prometheus:8080": https://prometheus.example.com
|
||||
|
||||
# computation of arbitrary values based on matching alert annotations, labels, or status
|
||||
# values will be evaluated top to bottom, last entry wins
|
||||
computed-values:
|
||||
- values: # always set 'color' to 'yellow'
|
||||
color: yellow
|
||||
- values: # set 'color' to 'orange' when alert label 'severity' is 'warning'
|
||||
color: orange
|
||||
when-matching-labels:
|
||||
severity: warning
|
||||
- values: # set 'color' to 'red' when alert label 'severity' is 'critical'
|
||||
color: red
|
||||
when-matching-labels:
|
||||
severity: critical
|
||||
- values: # set 'color' to 'green' when alert status is 'resolved'
|
||||
color: green
|
||||
when-matching-status: resolved
|
||||
|
||||
# template for alerts in status 'firing'
|
||||
firing-template: '{{`
|
||||
<p>
|
||||
<strong><font color="{{ .ComputedValues.color }}">{{ .Alert.Status | ToUpper }}</font></strong>
|
||||
{{ if .Alert.Labels.name }}
|
||||
{{ .Alert.Labels.name }}
|
||||
{{ else if .Alert.Labels.alertname }}
|
||||
{{ .Alert.Labels.alertname }}
|
||||
{{ end }}
|
||||
>>
|
||||
{{ if .Alert.Labels.severity }}
|
||||
{{ .Alert.Labels.severity | ToUpper }}:
|
||||
{{ end }}
|
||||
{{ if .Alert.Annotations.description }}
|
||||
{{ .Alert.Annotations.description }}
|
||||
{{ else if .Alert.Annotations.summary }}
|
||||
{{ .Alert.Annotations.summary }}
|
||||
{{ end }}
|
||||
>>
|
||||
{{ if .Alert.Annotations.runbook }}
|
||||
<a href="{{ .Alert.Annotations.runbook }}">Runbook</a> |
|
||||
{{ end }}
|
||||
{{ if .Alert.Annotations.dashboard }}
|
||||
<a href="{{ .Alert.Annotations.dashboard }}">Dashboard</a> |
|
||||
{{ end }}
|
||||
<a href="{{ .SilenceURL }}">Silence</a>
|
||||
</p>`}}'
|
||||
@ -1,13 +0,0 @@
|
||||
global:
|
||||
smtp_from: {{ env "ALERTMANAGER_SMTP_FROM" }}
|
||||
smtp_smarthost: {{ env "ALERTMANAGER_SMTP_HOST" }}
|
||||
smtp_auth_username: {{ env "ALERTMANAGER_SMTP_FROM" }}
|
||||
smtp_auth_password: {{ secret "alertmanager_smtp_password" }}
|
||||
|
||||
route:
|
||||
receiver: "kaboom-mailer"
|
||||
|
||||
receivers:
|
||||
- name: "kaboom-mailer"
|
||||
email_configs:
|
||||
- to: {{ env "ALERTMANAGER_SMTP_TO" }}
|
||||
131
alerts/node.yml.tmpl
Normal file
@ -0,0 +1,131 @@
|
||||
apiVersion: 1
|
||||
|
||||
# List of alert rule UIDs that should be deleted
|
||||
deleteRules:
|
||||
{{ if ne (env "ALERT_NODE_DISK_SPACE_ENABLED") "true" }}
|
||||
- orgId: 1
|
||||
uid: bds8bhxu97pxca
|
||||
{{ end }}
|
||||
{{ if ne (env "ALERT_NODE_MEMORY_USAGE_ENABLED") "true" }}
|
||||
- orgId: 1
|
||||
uid: ads8cswmly96oa
|
||||
{{ end }}
|
||||
|
||||
groups:
|
||||
- orgId: 1
|
||||
name: node
|
||||
folder: node
|
||||
interval: 5m
|
||||
rules:
|
||||
{{ if eq (env "ALERT_NODE_DISK_SPACE_ENABLED") "true" }}
|
||||
- uid: bds8bhxu97pxca
|
||||
title: Node Disk Space
|
||||
condition: C
|
||||
data:
|
||||
- refId: A
|
||||
relativeTimeRange:
|
||||
from: 600
|
||||
to: 0
|
||||
datasourceUid: PBFA97CFB590B2093
|
||||
model:
|
||||
editorMode: code
|
||||
expr: (node_filesystem_free_bytes{fstype="ext4"} / node_filesystem_size_bytes{fstype="ext4"}) * 100
|
||||
instant: true
|
||||
intervalMs: 1000
|
||||
legendFormat: __auto
|
||||
maxDataPoints: 43200
|
||||
range: false
|
||||
refId: A
|
||||
- refId: C
|
||||
relativeTimeRange:
|
||||
from: 600
|
||||
to: 0
|
||||
datasourceUid: __expr__
|
||||
model:
|
||||
conditions:
|
||||
- evaluator:
|
||||
params:
|
||||
- 10
|
||||
type: lt
|
||||
operator:
|
||||
type: and
|
||||
query:
|
||||
params:
|
||||
- C
|
||||
reducer:
|
||||
params: []
|
||||
type: last
|
||||
type: query
|
||||
datasource:
|
||||
type: __expr__
|
||||
uid: __expr__
|
||||
expression: A
|
||||
intervalMs: 1000
|
||||
maxDataPoints: 43200
|
||||
refId: C
|
||||
type: threshold
|
||||
noDataState: NoData
|
||||
execErrState: Error
|
||||
for: 5m
|
||||
annotations:
|
||||
description: ""
|
||||
runbook_url: ""
|
||||
summary: Less than 10% disk space left on {{`{{ $labels.instance }}`}} ({{`{{ (index $values "A").Value }}`}}% left)
|
||||
labels:
|
||||
"": ""
|
||||
isPaused: false
|
||||
{{ end }}
|
||||
{{ if eq (env "ALERT_NODE_MEMORY_USAGE_ENABLED") "true" }}
|
||||
- uid: ads8cswmly96oa
|
||||
title: Node Memory Usage
|
||||
condition: C
|
||||
data:
|
||||
- refId: A
|
||||
relativeTimeRange:
|
||||
from: 600
|
||||
to: 0
|
||||
datasourceUid: PBFA97CFB590B2093
|
||||
model:
|
||||
editorMode: code
|
||||
expr: 100 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) * 100
|
||||
instant: true
|
||||
intervalMs: 1000
|
||||
legendFormat: __auto
|
||||
maxDataPoints: 43200
|
||||
range: false
|
||||
refId: A
|
||||
- refId: C
|
||||
relativeTimeRange:
|
||||
from: 600
|
||||
to: 0
|
||||
datasourceUid: __expr__
|
||||
model:
|
||||
conditions:
|
||||
- evaluator:
|
||||
params:
|
||||
- 85
|
||||
type: gt
|
||||
operator:
|
||||
type: and
|
||||
query:
|
||||
params:
|
||||
- C
|
||||
reducer:
|
||||
params: []
|
||||
type: last
|
||||
type: query
|
||||
datasource:
|
||||
type: __expr__
|
||||
uid: __expr__
|
||||
expression: A
|
||||
intervalMs: 1000
|
||||
maxDataPoints: 43200
|
||||
refId: C
|
||||
type: threshold
|
||||
noDataState: NoData
|
||||
execErrState: Error
|
||||
for: 5m
|
||||
annotations:
|
||||
summary: Memory usage is above 85% on {{`{{ $labels.instance }}`}} ({{`{{ printf "%.2f" (index $values "A").Value }}`}}% usage)
|
||||
isPaused: false
|
||||
{{ end }}
|
||||
6
collect-smartctl-json.service
Normal file
6
collect-smartctl-json.service
Normal file
@ -0,0 +1,6 @@
|
||||
[Unit]
|
||||
Description=Collect SMART data
|
||||
|
||||
[Service]
|
||||
Type=oneshot
|
||||
ExecStart=/usr/local/bin/collect-smartctl-json.sh
|
||||
69
collect-smartctl-json.sh
Executable file
@ -0,0 +1,69 @@
|
||||
#! /bin/bash
|
||||
# Adapted from https://github.com/prometheus-community/smartctl_exporter/blob/master/collect-smartctl-json.sh
|
||||
|
||||
script_dir=$( cd -- "$( dirname -- "${BASH_SOURCE[0]}" )" &> /dev/null && pwd )
|
||||
|
||||
# Data directory to dump smartctl output
|
||||
# This directory will be created if it doesn't exist
|
||||
data_dir="/var/lib/smartmontools/json"
|
||||
|
||||
# The original script used --xall but that doesn't work
|
||||
# This matches the command in readSMARTctl()
|
||||
smartctl_args="--json --info --health --attributes --tolerance=verypermissive \
|
||||
--nocheck=standby --format=brief --log=error"
|
||||
|
||||
# Ignore these devices
|
||||
smartctl_ignore_dev_regex="^(/dev/bus)"
|
||||
|
||||
# Determine the json query tool to use
|
||||
if command -v jq >/dev/null; then
|
||||
json_tool="jq"
|
||||
json_args="--raw-output"
|
||||
elif command -v yq >/dev/null; then
|
||||
json_tool="yq"
|
||||
json_args="--unwrapScalar"
|
||||
else
|
||||
echo -e "One of 'yq' or 'jq' is required. Please try again after \
|
||||
installing one of them"
|
||||
exit 1
|
||||
fi
|
||||
|
||||
if [[ ! "${UID}" -eq 0 ]] && ! command -v sudo >/dev/null; then
|
||||
# Not root and sudo doesn't exist
|
||||
echo "sudo does not exist. Please run this as root"
|
||||
exit 1
|
||||
fi
|
||||
|
||||
SUDO="sudo"
|
||||
if [[ "${UID}" -eq 0 ]]; then
|
||||
# Don't use sudo if root
|
||||
SUDO=""
|
||||
fi
|
||||
|
||||
[[ ! -d "${data_dir}" ]] && mkdir --parents "${data_dir}"
|
||||
|
||||
if [[ $# -ne 0 ]]; then
|
||||
devices="${1}"
|
||||
else
|
||||
devices="$(smartctl --scan --json | "${json_tool}" "${json_args}" \
|
||||
".devices[].name | select(test(\"${smartctl_ignore_dev_regex}\") | not)")"
|
||||
mapfile -t devices <<< "${devices[@]}"
|
||||
fi
|
||||
|
||||
for device in "${devices[@]}"
|
||||
do
|
||||
echo -n "Collecting data for '${device}'..."
|
||||
# shellcheck disable=SC2086
|
||||
data="$($SUDO smartctl ${smartctl_args} ${device})"
|
||||
# Accommodate a smartmontools pre-7.3 bug
|
||||
data=${data#" Pending defect count:"}
|
||||
type="$(echo "${data}" | "${json_tool}" "${json_args}" '.device.type')"
|
||||
family="$(echo "${data}" | "${json_tool}" "${json_args}" \
|
||||
'select(.model_family != null) | .model_family | sub(" |/" ; "_" ; "g")
|
||||
| sub("\"|\\(|\\)" ; "" ; "g")')"
|
||||
model="$(echo "${data}" | "${json_tool}" "${json_args}" \
|
||||
'.model_name | sub(" |/" ; "_" ; "g") | sub("\"|\\(|\\)" ; "" ; "g")')"
|
||||
device_name="$(basename "${device}")"
|
||||
echo -e "\tSaving to ${device_name}.json"
|
||||
echo "${data}" > "${data_dir}/${device_name}.json"
|
||||
done
|
||||
9
collect-smartctl-json.timer
Normal file
@ -0,0 +1,9 @@
|
||||
[Unit]
|
||||
Description=Collect SMART data
|
||||
|
||||
[Timer]
|
||||
OnCalendar=hourly
|
||||
Persistent=true
|
||||
|
||||
[Install]
|
||||
WantedBy=timers.target
|
||||
17
compose.grafana-oidc.yml
Normal file
@ -0,0 +1,17 @@
|
||||
version: '3.8'
|
||||
|
||||
services:
|
||||
grafana:
|
||||
secrets:
|
||||
- gf_oidc_secret
|
||||
environment:
|
||||
- OIDC_API_URL
|
||||
- OIDC_AUTH_URL
|
||||
- OIDC_CLIENT_ID
|
||||
- OIDC_ENABLED
|
||||
- OIDC_TOKEN_URL
|
||||
|
||||
secrets:
|
||||
gf_oidc_secret:
|
||||
external: true
|
||||
name: ${STACK_NAME}_gf_oidc_secret_${SECRET_GF_OIDC_SECRET_VERSION}
|
||||
18
compose.grafana-smtp.yml
Normal file
@ -0,0 +1,18 @@
|
||||
version: '3.8'
|
||||
|
||||
services:
|
||||
grafana:
|
||||
secrets:
|
||||
- gf_smtp_passwd
|
||||
environment:
|
||||
- GF_SMTP_HOST
|
||||
- GF_SMTP_USER
|
||||
- GF_SMTP_PASSWORD__FILE=/run/secrets/gf_smtp_passwd
|
||||
- GF_SMTP_ENABLED
|
||||
- GF_SMTP_FROM_ADDRESS
|
||||
- GF_SMTP_SKIP_VERIFY
|
||||
|
||||
secrets:
|
||||
gf_smtp_passwd:
|
||||
external: true
|
||||
name: ${STACK_NAME}_gf_smtp_passwd_${SECRET_GF_SMTP_PASSWD_VERSION}
|
||||
@ -2,86 +2,87 @@ version: '3.8'
|
||||
|
||||
services:
|
||||
grafana:
|
||||
image: grafana/grafana:9.5.2
|
||||
image: grafana/grafana:12.4.3
|
||||
volumes:
|
||||
- grafana-data:/var/lib/grafana:rw
|
||||
secrets:
|
||||
- grafana_admin_password
|
||||
- grafana_oidc_client_secret
|
||||
- gf_adminpasswd
|
||||
configs:
|
||||
- source: grafana_custom_ini
|
||||
- source: gf_custom_ini
|
||||
target: /etc/grafana/grafana.ini
|
||||
- source: grafana_datasources_yml
|
||||
- source: gf_datasources
|
||||
target: /etc/grafana/provisioning/datasources/datasources.yml
|
||||
- source: grafana_dashboards_yml
|
||||
- source: gf_dashboards
|
||||
target: /etc/grafana/provisioning/dashboards/dashboards.yml
|
||||
- source: grafana_swarm_dashboard_json
|
||||
- source: gf_swarm_dash
|
||||
target: /var/lib/grafana/dashboards/docker-swarm-nodes.json
|
||||
- source: grafana_stacks_dashboard_json
|
||||
- source: gf_stacks_dash
|
||||
target: /var/lib/grafana/dashboards/docker-swarm-stacks.json
|
||||
- source: grafana_traefik_dashboard_json
|
||||
- source: gf_traefik_dash
|
||||
target: /var/lib/grafana/dashboards/traefik.json
|
||||
- source: gf_backup_dash
|
||||
target: /var/lib/grafana/dashboards/backup.json
|
||||
- source: gf_alerts_node
|
||||
target: /etc/grafana/provisioning/alerting/node.yml
|
||||
networks:
|
||||
- proxy
|
||||
- internal
|
||||
environment:
|
||||
- GF_SERVER_ROOT_URL=https://${GRAFANA_DOMAIN}
|
||||
- GF_SECURITY_ADMIN_PASSWORD__FILE=/run/secrets/grafana_admin_password
|
||||
- GF_SMTP_HOST
|
||||
- GF_SMTP_ENABLED
|
||||
- GF_SMTP_FROM_ADDRESS
|
||||
- GF_SMTP_SKIP_VERIFY
|
||||
- GF_SERVER_ROOT_URL
|
||||
- GF_SECURITY_ADMIN_PASSWORD__FILE=/run/secrets/gf_adminpasswd
|
||||
- GF_SECURITY_ALLOW_EMBEDDING
|
||||
- GF_INSTALL_PLUGINS=grafana-piechart-panel
|
||||
- OIDC_API_URL
|
||||
- OIDC_AUTH_URL
|
||||
- OIDC_CLIENT_ID
|
||||
- OIDC_ENABLED
|
||||
- OIDC_TOKEN_URL
|
||||
- GF_INSTALL_PLUGINS
|
||||
- ALERT_NODE_DISK_SPACE_ENABLED
|
||||
- ALERT_NODE_MEMORY_USAGE_ENABLED
|
||||
deploy:
|
||||
labels:
|
||||
- "traefik.enable=true"
|
||||
- "traefik.swarm.network=proxy"
|
||||
- "traefik.http.services.${STACK_NAME}-grafana.loadbalancer.server.port=3000"
|
||||
- "traefik.http.routers.${STACK_NAME}-grafana.rule=Host(`${DOMAIN}`)"
|
||||
- "traefik.http.routers.${STACK_NAME}-grafana.entrypoints=web-secure"
|
||||
- "traefik.http.routers.${STACK_NAME}-grafana.tls=true"
|
||||
- "traefik.http.routers.${STACK_NAME}-grafana.tls.certresolver=${LETS_ENCRYPT_ENV}"
|
||||
healthcheck:
|
||||
test: "wget -q http://localhost:3000/ -O/dev/null"
|
||||
test: "wget -q http://localhost:3000/healthz -O/dev/null"
|
||||
interval: 5s
|
||||
timeout: 10s
|
||||
retries: 3
|
||||
start_period: 10s
|
||||
|
||||
configs:
|
||||
grafana_custom_ini:
|
||||
gf_custom_ini:
|
||||
template_driver: golang
|
||||
name: ${STACK_NAME}_grafana_custom_ini_${GRAFANA_CUSTOM_INI_VERSION}
|
||||
name: ${STACK_NAME}_gf_custom_ini_${GF_CUSTOM_INI_VERSION}
|
||||
file: grafana_custom.ini
|
||||
grafana_datasources_yml:
|
||||
name: ${STACK_NAME}_grafana_datasources_yml_${GRAFANA_DATASOURCES_YML_VERSION}
|
||||
gf_datasources:
|
||||
name: ${STACK_NAME}_gf_datasources_${GF_DATASOURCES_VERSION}
|
||||
file: grafana-datasources.yml
|
||||
grafana_dashboards_yml:
|
||||
name: ${STACK_NAME}_grafana_dashboards_yml_${GRAFANA_DASHBOARDS_YML_VERSION}
|
||||
gf_dashboards:
|
||||
name: ${STACK_NAME}_gf_dashboards_${GF_DASHBOARDS_VERSION}
|
||||
file: grafana-dashboards.yml
|
||||
grafana_swarm_dashboard_json:
|
||||
name: ${STACK_NAME}_grafana_swarm_dashboard_json_${GRAFANA_SWARM_DASHBOARD_JSON_VERSION}
|
||||
gf_swarm_dash:
|
||||
name: ${STACK_NAME}_gf_swarm_dash_${GF_SWARM_DASH_VERSION}
|
||||
file: grafana-swarm-dashboard.json
|
||||
grafana_stacks_dashboard_json:
|
||||
name: ${STACK_NAME}_grafana_stacks_dashboard_json_${GRAFANA_STACKS_DASHBOARD_JSON_VERSION}
|
||||
gf_stacks_dash:
|
||||
name: ${STACK_NAME}_gf_stacks_dash_${GF_STACKS_DASH_VERSION}
|
||||
file: grafana-stacks-dashboard.json
|
||||
grafana_traefik_dashboard_json:
|
||||
name: ${STACK_NAME}_grafana_traefik_dashboard_json_${GRAFANA_TRAEFIK_DASHBOARD_JSON_VERSION}
|
||||
gf_traefik_dash:
|
||||
name: ${STACK_NAME}_gf_traefik_dash_${GF_TRAEFIK_DASH_VERSION}
|
||||
file: grafana-traefik-dashboard.json
|
||||
gf_backup_dash:
|
||||
name: ${STACK_NAME}_gf_backup_dash_${GF_BACKUP_DASH_VERSION}
|
||||
file: grafana-backup-dashboard.json
|
||||
gf_alerts_node:
|
||||
template_driver: golang
|
||||
name: ${STACK_NAME}_gf_alerts_node_${GRAFANA_ALERTS_NODE_VERSION}
|
||||
file: alerts/node.yml.tmpl
|
||||
|
||||
volumes:
|
||||
grafana-data:
|
||||
|
||||
|
||||
secrets:
|
||||
grafana_admin_password:
|
||||
gf_adminpasswd:
|
||||
external: true
|
||||
name: ${STACK_NAME}_grafana_admin_password_${SECRET_GRAFANA_ADMIN_PASSWORD_VERSION}
|
||||
grafana_oidc_client_secret:
|
||||
external: true
|
||||
name: ${STACK_NAME}_grafana_oidc_client_secret_${SECRET_GRAFANA_OIDC_CLIENT_SECRET_VERSION}
|
||||
name: ${STACK_NAME}_gf_adminpasswd_${SECRET_GF_ADMINPASSWD_VERSION}
|
||||
|
||||
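The config and secret names above all follow the pattern `${STACK_NAME}_<name>_${<NAME>_VERSION}`. Swarm configs and secrets cannot be edited in place, so bumping the version variable is what produces a new object on the next deploy. A minimal Python sketch of that substitution follows; the values for `STACK_NAME` and `GF_CUSTOM_INI_VERSION` are made up, and `versioned_name` is a hypothetical helper, not part of the recipe.

```python
from string import Template


def versioned_name(pattern: str, env: dict) -> str:
    """Expand ${VAR} placeholders the way the compose file references them."""
    return Template(pattern).substitute(env)


# Assumed example values; the real ones come from the deployment's environment.
env = {"STACK_NAME": "monitoring_example_com", "GF_CUSTOM_INI_VERSION": "v2"}
name = versioned_name("${STACK_NAME}_gf_custom_ini_${GF_CUSTOM_INI_VERSION}", env)
print(name)
```

Changing only the version value yields a different object name, which is why each renamed config in this hunk also gets a renamed `*_VERSION` variable.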
@ -2,7 +2,7 @@ version: '3.8'
|
||||
|
||||
services:
|
||||
loki:
|
||||
image: grafana/loki:2.8.2
|
||||
image: grafana/loki:3.7.2
|
||||
command: -config.file=/etc/loki/local-config.yaml
|
||||
networks:
|
||||
- proxy
|
||||
@ -12,7 +12,7 @@ services:
|
||||
volumes:
|
||||
- loki-data:/loki
|
||||
# secrets:
|
||||
# - loki_aws_secret_access_key
|
||||
# - loki_aws_key
|
||||
environment:
|
||||
- LOKI_ACCESS_KEY_ID
|
||||
- LOKI_AWS_ENDPOINT
|
||||
@ -27,6 +27,7 @@ services:
|
||||
condition: on-failure
|
||||
labels:
|
||||
- "traefik.enable=true"
|
||||
- "traefik.swarm.network=proxy"
|
||||
- "traefik.http.services.${STACK_NAME}-loki.loadbalancer.server.port=3100"
|
||||
- "traefik.http.routers.${STACK_NAME}-loki.rule=Host(`loki.${DOMAIN}`)"
|
||||
- "traefik.http.routers.${STACK_NAME}-loki.entrypoints=web-secure"
|
||||
@ -46,6 +47,6 @@ volumes:
|
||||
loki-data:
|
||||
|
||||
# secrets:
|
||||
# loki_aws_secret_access_key:
|
||||
# loki_aws_key:
|
||||
# external: true
|
||||
# name: ${STACK_NAME}_loki_aws_secret_access_key_${SECRET_LOKI_AWS_SECRET_ACCESS_KEY_VERSION}
|
||||
# name: ${STACK_NAME}_loki_aws_key_${SECRET_LOKI_AWS_KEY_VERSION}
|
||||
|
||||
28
compose.matrix-alertmanager-receiver.yml
Normal file
@ -0,0 +1,28 @@
version: '3.8'

services:
  matrix-alertmanager-receiver:
    image: metio/matrix-alertmanager-receiver:2026.2.25
    secrets:
      - matrix_token
    configs:
      - source: matrix-alertmanager-receiver-config
        target: /etc/matrix-alertmanager-receiver/config.yml
    networks:
      - internal
    environment:
      - GF_MATRIX_USER_ID
      - GF_MATRIX_ROOM_ID
      - GF_MATRIX_HOMESERVER_URL
    command: "--config-path=/etc/matrix-alertmanager-receiver/config.yml"

configs:
  matrix-alertmanager-receiver-config:
    template_driver: golang
    name: ${STACK_NAME}_mar_config_${MATRIX_ALERTMANAGER_CONFIG_VERSION}
    file: alertmanager-matrix-config.yml.tmpl

secrets:
  matrix_token:
    external: true
    name: ${STACK_NAME}_matrix_token_${SECRET_MATRIX_TOKEN_VERSION}
@ -2,9 +2,9 @@ version: '3.8'
|
||||
|
||||
services:
|
||||
prometheus:
|
||||
image: prom/prometheus:v2.44.0
|
||||
image: prom/prometheus:v3.12.0
|
||||
secrets:
|
||||
- basic_auth_admin_password
|
||||
- basic_auth
|
||||
volumes:
|
||||
- prometheus-data:/prometheus:rw
|
||||
configs:
|
||||
@ -16,6 +16,8 @@ services:
|
||||
- "--web.console.libraries=/usr/share/prometheus/console_libraries"
|
||||
- "--web.console.templates=/usr/share/prometheus/consoles"
|
||||
- "--storage.tsdb.retention.time=${PROMETHEUS_RETENTION_TIME}"
|
||||
- "--enable-feature=remote-write-receiver"
|
||||
- "--web.enable-remote-write-receiver"
|
||||
networks:
|
||||
- proxy
|
||||
- internal
|
||||
@ -24,6 +26,7 @@ services:
|
||||
condition: on-failure
|
||||
labels:
|
||||
- "traefik.enable=true"
|
||||
- "traefik.swarm.network=proxy"
|
||||
- "traefik.http.services.${STACK_NAME}-prometheus.loadbalancer.server.port=9090"
|
||||
- "traefik.http.routers.${STACK_NAME}-prometheus.rule=Host(`prometheus.${DOMAIN}`)"
|
||||
- "traefik.http.routers.${STACK_NAME}-prometheus.entrypoints=web-secure"
|
||||
@ -31,42 +34,11 @@ services:
|
||||
- "traefik.http.routers.${STACK_NAME}-prometheus.tls.certresolver=${LETS_ENCRYPT_ENV}"
|
||||
- "traefik.http.routers.${STACK_NAME}-prometheus.middlewares=basicauth@file"
|
||||
|
||||
|
||||
alertmanager:
|
||||
image: prom/alertmanager:v0.25.0
|
||||
volumes:
|
||||
- alertmanager-data:/etc/alertmanager
|
||||
command:
|
||||
- "--config.file=/etc/alertmanager/config.yml"
|
||||
- "--storage.path=/alertmanager"
|
||||
networks:
|
||||
- internal
|
||||
secrets:
|
||||
- alertmanager_smtp_password
|
||||
configs:
|
||||
- source: alertmanager_config
|
||||
target: /etc/alertmanager/config.yml
|
||||
environment:
|
||||
- ALERTMANAGER_SMTP_FROM
|
||||
- ALERTMANAGER_SMTP_HOST
|
||||
- ALERTMANAGER_SMTP_TO
|
||||
|
||||
configs:
|
||||
prometheus_yml:
|
||||
template_driver: golang
|
||||
name: ${STACK_NAME}_prometheus_yml_${PROMETHEUS_YML_VERSION}
|
||||
file: prometheus.yml.tmpl
|
||||
alertmanager_config:
|
||||
template_driver: golang
|
||||
name: ${STACK_NAME}_alertmanager_config_${ALERTMANAGER_CONFIG_VERSION}
|
||||
file: ./alertmanager.yml.tmpl
|
||||
|
||||
|
||||
volumes:
|
||||
prometheus-data:
|
||||
alertmanager-data:
|
||||
|
||||
secrets:
|
||||
alertmanager_smtp_password:
|
||||
external: true
|
||||
name: ${STACK_NAME}_alertmanager_smtp_password_${SECRET_ALERTMANAGER_SMTP_PASSWORD_VERSION}
|
||||
@ -1,29 +0,0 @@
version: "3.8"

services:
  promtail:
    image: grafana/promtail:2.8.2
    volumes:
      - /var/log:/var/log:ro
      - /var/run/docker.sock:/var/run/docker.sock
    command: -config.file=/etc/promtail/config.yml
    configs:
      - source: promtail_yml
        target: /etc/promtail/config.yml
    networks:
      - internal
    secrets:
      - basic_auth_admin_password
    environment:
      - LOKI_PUSH_URL

configs:
  promtail_yml:
    name: ${STACK_NAME}_promtail_yml_${PROMTAIL_YML_VERSION}
    file: promtail.yml.tmpl
    template_driver: golang

secrets:
  basic_auth_admin_password:
    external: true
    name: ${STACK_NAME}_basic_auth_admin_password_${SECRET_BASIC_AUTH_ADMIN_PASSWORD_VERSION}
26
compose.pushgateway.yml
Normal file
@ -0,0 +1,26 @@
version: '3.8'

services:
  pushgateway:
    image: prom/pushgateway:v1.11.2
    command:
      - '--web.listen-address=:9191'
      - '--push.disable-consistency-check'
      - '--persistence.interval=5m'
    ports:
      - 9191:9191
    networks:
      - internal
      - proxy
    deploy:
      restart_policy:
        condition: on-failure
      labels:
        - "traefik.enable=true"
        - "traefik.swarm.network=proxy"
        - "traefik.http.services.${STACK_NAME}-pushgateway.loadbalancer.server.port=9191"
        - "traefik.http.routers.${STACK_NAME}-pushgateway.rule=Host(`pushgateway.${DOMAIN}`)"
        - "traefik.http.routers.${STACK_NAME}-pushgateway.entrypoints=web-secure"
        - "traefik.http.routers.${STACK_NAME}-pushgateway.tls=true"
        - "traefik.http.routers.${STACK_NAME}-pushgateway.tls.certresolver=${LETS_ENCRYPT_ENV}"
        - "traefik.http.routers.${STACK_NAME}-pushgateway.middlewares=basicauth@file"
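For orientation, a rough Python sketch of what a push to this Pushgateway could look like. It only builds the request and does not send it. The job name `backup` and the host name are assumptions for illustration; the metric name `backup` is taken from the dashboard and alert queries elsewhere in this diff. The `/metrics/job/<job>` path is the Pushgateway's documented push endpoint. Because the router above uses the `basicauth@file` middleware, a real request through Traefik would also need credentials, which are omitted here.

```python
from urllib.parse import quote
from urllib.request import Request


def build_push_request(base_url: str, job: str, metric: str, value: float) -> Request:
    """Build (but do not send) a Pushgateway push for a single gauge sample."""
    url = f"{base_url.rstrip('/')}/metrics/job/{quote(job, safe='')}"
    # Prometheus text exposition format: an optional TYPE line, then "name value".
    body = f"# TYPE {metric} gauge\n{metric} {value}\n".encode("utf-8")
    return Request(url, data=body, method="POST")


# Hypothetical host name and job; adjust to the actual deployment.
req = build_push_request("https://pushgateway.monitoring.example.com", "backup", "backup", 1)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

Sending it would be a matter of passing `req` to `urllib.request.urlopen` once an `Authorization` header has been added.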
18
compose.smartctl.yml
Normal file
@ -0,0 +1,18 @@
---
version: "3.8"
services:
  smartctl:
    image: "prometheuscommunity/smartctl-exporter:v0.14.0"
    volumes:
      - "/dev:/dev"
      - "/var/lib/smartmontools/json:/debug"
    command:
      - "--smartctl.fake-data"
      - "--smartctl.interval=1h"
    networks:
      - "proxy"
    deploy:
      labels:
        - "prometheus.io/scrape=true"
        - "prometheus.io/port=9633"
        - "prometheus.io/path=/metrics"
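The `prometheus.io/*` service labels above are matched in `config.alloy.tmpl` under names such as `__meta_dockerswarm_service_label_prometheus_io_scrape`. Prometheus-style Swarm discovery exposes each service label with unsupported characters converted to underscores. A small Python sketch of that mapping, as an illustration only (`meta_label` is a hypothetical helper, not Alloy code):

```python
import re

PREFIX = "__meta_dockerswarm_service_label_"


def meta_label(service_label: str) -> str:
    """Map a Swarm service label name to the discovery meta label name."""
    # Anything outside [a-zA-Z0-9_] becomes an underscore.
    return PREFIX + re.sub(r"[^a-zA-Z0-9_]", "_", service_label)


for label in ("prometheus.io/scrape", "prometheus.io/port", "prometheus.io/path"):
    print(label, "->", meta_label(label))
```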
6
compose.syslog.yml
Normal file
@ -0,0 +1,6 @@
---
version: "3.8"
services:
  app:
    ports:
      - "514:514"
106
compose.yml
@ -3,87 +3,53 @@ version: "3.8"
|
||||
|
||||
services:
|
||||
app:
|
||||
image: prom/node-exporter:v1.5.0
|
||||
user: root
|
||||
environment:
|
||||
- NODE_ID={{.Node.ID}}
|
||||
volumes:
|
||||
- /proc:/host/proc:ro
|
||||
- /sys:/host/sys:ro
|
||||
- /:/rootfs:ro
|
||||
- /etc/hostname:/etc/nodename:ro
|
||||
command:
|
||||
- "--path.sysfs=/host/sys"
|
||||
- "--path.procfs=/host/proc"
|
||||
- "--path.rootfs=/rootfs"
|
||||
- "--collector.textfile.directory=/etc/node-exporter/"
|
||||
- "--collector.filesystem.ignored-mount-points=^/(sys|proc|dev|host|etc)($$|/)"
|
||||
- "--no-collector.ipvs"
|
||||
image: grafana/alloy:v1.16.1
|
||||
hostname: "${DOMAIN}"
|
||||
configs:
|
||||
- source: node_exporter_entrypoint_sh
|
||||
target: /entrypoint.sh
|
||||
networks:
|
||||
- internal
|
||||
- proxy
|
||||
entrypoint: [ "/bin/sh", "-e", "/entrypoint.sh" ]
|
||||
deploy:
|
||||
restart_policy:
|
||||
condition: on-failure
|
||||
labels:
|
||||
- "traefik.enable=true"
|
||||
- "traefik.http.services.${STACK_NAME}-node.loadbalancer.server.port=9100"
|
||||
- "traefik.http.routers.${STACK_NAME}-node.rule=Host(`node.${DOMAIN}`)"
|
||||
- "traefik.http.routers.${STACK_NAME}-node.entrypoints=web-secure"
|
||||
- "traefik.http.routers.${STACK_NAME}-node.tls=true"
|
||||
- "traefik.http.routers.${STACK_NAME}-node.tls.certresolver=${LETS_ENCRYPT_ENV}"
|
||||
- "traefik.http.routers.${STACK_NAME}-node.middlewares=basicauth@file"
|
||||
- "coop-cloud.${STACK_NAME}.version=0.1.0+v1.5.0"
|
||||
- "coop-cloud.${STACK_NAME}.timeout=${TIMEOUT:-120}"
|
||||
|
||||
cadvisor:
|
||||
image: gcr.io/cadvisor/cadvisor:v0.47.1
|
||||
command:
|
||||
- "-logtostderr"
|
||||
- "--enable_metrics=cpu,cpuLoad,disk,memory,network"
|
||||
# all possible metrics: advtcp,app,cpu,cpuLoad,cpu_topology,cpuset,disk,diskIO,hugetlb,memory,memory_numa,network,oom_event,percpu,perf_event,process,referenced_memory,resctrl,sched,tcp,udp.
|
||||
- "--housekeeping_interval=120s"
|
||||
- "--docker_only=true"
|
||||
|
||||
- source: config_alloy
|
||||
target: /etc/alloy/config.alloy
|
||||
volumes:
|
||||
- /var/lib/docker/:/var/lib/docker:ro
|
||||
- /dev/disk/:/dev/disk:ro
|
||||
- /sys:/sys:ro
|
||||
- /var/run:/var/run:ro
|
||||
- /:/rootfs:ro
|
||||
- /var/run/docker.sock:/var/run/docker.sock:ro
|
||||
- /sys:/sys:ro
|
||||
- /var/lib/docker:/var/lib/docker:ro
|
||||
- alloy-data:/var/lib/alloy/data
|
||||
command:
|
||||
- "run"
|
||||
- "--storage.path=/var/lib/alloy/data"
|
||||
- "--server.http.listen-addr=0.0.0.0:12345"
|
||||
- "/etc/alloy/config.alloy"
|
||||
networks:
|
||||
- internal
|
||||
- proxy
|
||||
- internal
|
||||
secrets:
|
||||
- basic_auth
|
||||
deploy:
|
||||
restart_policy:
|
||||
condition: on-failure
|
||||
labels:
|
||||
- "backupbot.backup=${ENABLE_BACKUPS:-true}"
|
||||
- "coop-cloud.${STACK_NAME}.version=1.6.0+v1.8.1"
|
||||
- "traefik.enable=true"
|
||||
- "traefik.http.services.${STACK_NAME}-cadvisor.loadbalancer.server.port=8080"
|
||||
- "traefik.http.routers.${STACK_NAME}-cadvisor.rule=Host(`cadvisor.${DOMAIN}`)"
|
||||
- "traefik.http.routers.${STACK_NAME}-cadvisor.entrypoints=web-secure"
|
||||
- "traefik.http.routers.${STACK_NAME}-cadvisor.tls=true"
|
||||
- "traefik.http.routers.${STACK_NAME}-cadvisor.tls.certresolver=${LETS_ENCRYPT_ENV}"
|
||||
- "traefik.http.routers.${STACK_NAME}-cadvisor.middlewares=basicauth@file"
|
||||
healthcheck:
|
||||
test: wget --quiet --tries=1 --spider http://localhost:8080/healthz || exit 1
|
||||
interval: 15s
|
||||
timeout: 15s
|
||||
retries: 5
|
||||
start_period: 30s
|
||||
|
||||
- "traefik.swarm.network=proxy"
|
||||
- "traefik.http.services.${STACK_NAME}-alloy.loadbalancer.server.port=12345"
|
||||
- "traefik.http.routers.${STACK_NAME}-alloy.rule=Host(`alloy.${DOMAIN}`)"
|
||||
- "traefik.http.routers.${STACK_NAME}-alloy.entrypoints=web-secure"
|
||||
- "traefik.http.routers.${STACK_NAME}-alloy.tls=true"
|
||||
- "traefik.http.routers.${STACK_NAME}-alloy.tls.certresolver=${LETS_ENCRYPT_ENV}"
|
||||
- "traefik.http.routers.${STACK_NAME}-alloy.middlewares=basicauth@file"
|
||||
configs:
|
||||
node_exporter_entrypoint_sh:
|
||||
name: ${STACK_NAME}_node_exporter_entrypoint_${NODE_EXPORTER_ENTRYPOINT_VERSION}
|
||||
file: node-exporter-entrypoint.sh
|
||||
|
||||
|
||||
|
||||
config_alloy:
|
||||
template_driver: golang
|
||||
name: ${STACK_NAME}_config_alloy_${CONFIG_ALLOY_VERSION}
|
||||
file: config.alloy.tmpl
|
||||
networks:
|
||||
proxy:
|
||||
external: true
|
||||
internal:
|
||||
internal:
|
||||
volumes:
|
||||
alloy-data:
|
||||
secrets:
|
||||
basic_auth:
|
||||
external: true
|
||||
name: ${STACK_NAME}_basic_auth_${SECRET_BASIC_AUTH_VERSION}
|
||||
|
||||
266
config.alloy.tmpl
Normal file
@ -0,0 +1,266 @@
logging {
  level  = "info"
  format = "logfmt"
}

livedebugging {
  enabled = {{ env "LIVE_DEBUGGING" }}
}

discovery.docker "linux" {
  host = "unix:///var/run/docker.sock"
}

{{ if ne (env "PROMETHEUS_REMOTE_WRITE_URL") "" }}
prometheus.exporter.cadvisor "docker" {
  docker_only = true
  enabled_metrics = ["cpu", "cpuLoad", "disk", "diskIO", "memory", "network", "process"]
}

prometheus.exporter.unix "default" {
  include_exporter_metrics = true
  rootfs_path = "/rootfs"
  procfs_path = "/rootfs/proc"
  sysfs_path = "/rootfs/sys"

  disable_collectors = ["ipvs"]

  filesystem {
    fs_types_exclude = "^(autofs|binfmt_misc|bpf|cgroup2?|configfs|debugfs|devpts|devtmpfs|tmpfs|fusectl|hugetlbfs|iso9660|mqueue|nsfs|overlay|proc|procfs|pstore|rpc_pipefs|securityfs|selinuxfs|squashfs|sysfs|tracefs)$"
    mount_points_exclude = "^/(sys|proc|dev|host|etc)($|/)"
    mount_timeout = "5s"
  }

  netclass { ignored_devices = "^(veth.*)$" }
  netdev { device_exclude = "^(veth.*)$" }
}

prometheus.exporter.self "alloy" {}

prometheus.scrape "default" {
  scrape_interval = "120s"

  targets = array.concat(
    prometheus.exporter.self.alloy.targets,
    prometheus.exporter.unix.default.targets,
    prometheus.exporter.cadvisor.docker.targets,
    discovery.docker.containers.targets,
  )

  forward_to = [prometheus.remote_write.prometheus.receiver]
}

prometheus.remote_write "prometheus" {
  endpoint {
    url = "{{ env "PROMETHEUS_REMOTE_WRITE_URL" }}"

    basic_auth {
      username = "admin"
      password = "{{ secret "basic_auth" }}"
    }
  }
}

discovery.docker "containers" {
  host = "unix:///var/run/docker.sock"
  match_first_network = false
}

// Scrape Prometheus metrics from other Swarm services on the proxy network.
// Services opt in via service labels (deploy.labels in the compose file):
//   prometheus.io/scrape=true        required: enable scraping
//   prometheus.io/port=9090          optional: port exposing /metrics (defaults to first exposed port)
//   prometheus.io/path=/metrics      optional: path to metrics endpoint (default: /metrics)
//   prometheus.io/auth=basic|bearer  optional: authenticate with the shared basic_auth secret,
//                                    sent as the basic-auth password or as a bearer token
discovery.dockerswarm "swarm" {
  host = "unix:///var/run/docker.sock"
  role = "services"
}

discovery.relabel "metrics" {
  targets = discovery.dockerswarm.swarm.targets

  rule {
    source_labels = ["__meta_dockerswarm_network_name"]
    regex = "proxy"
    action = "keep"
  }

  rule {
    source_labels = ["__meta_dockerswarm_service_label_prometheus_io_scrape"]
    regex = "true"
    action = "keep"
  }

  rule {
    source_labels = ["__address__", "__meta_dockerswarm_service_label_prometheus_io_port"]
    regex = `(.+):\d+;(\d+)`
    target_label = "__address__"
    replacement = "$1:$2"
  }

  rule {
    source_labels = ["__meta_dockerswarm_service_label_prometheus_io_path"]
    regex = `(.+)`
    target_label = "__metrics_path__"
  }

  rule {
    source_labels = ["__meta_dockerswarm_service_name"]
    target_label = "job"
  }
}

discovery.relabel "metrics_noauth" {
  targets = discovery.relabel.metrics.output
  rule {
    source_labels = ["__meta_dockerswarm_service_label_prometheus_io_auth"]
    regex = "^$"
    action = "keep"
  }
}

discovery.relabel "metrics_basicauth" {
  targets = discovery.relabel.metrics.output
  rule {
    source_labels = ["__meta_dockerswarm_service_label_prometheus_io_auth"]
    regex = "basic"
    action = "keep"
  }
}

discovery.relabel "metrics_bearerauth" {
  targets = discovery.relabel.metrics.output
  rule {
    source_labels = ["__meta_dockerswarm_service_label_prometheus_io_auth"]
    regex = "bearer"
    action = "keep"
  }
}

prometheus.scrape "containers" {
  scrape_interval = "120s"
  targets = discovery.relabel.metrics_noauth.output
  forward_to = [prometheus.remote_write.prometheus.receiver]
}

prometheus.scrape "containers_basicauth" {
  scrape_interval = "120s"
  targets = discovery.relabel.metrics_basicauth.output
  forward_to = [prometheus.remote_write.prometheus.receiver]
  basic_auth {
    username = "admin"
    password = "{{ secret "basic_auth" }}"
  }
}

prometheus.scrape "containers_bearerauth" {
  scrape_interval = "120s"
  targets = discovery.relabel.metrics_bearerauth.output
  forward_to = [prometheus.remote_write.prometheus.receiver]
  bearer_token = "{{ secret "basic_auth" }}"
}
{{ end }}

{{ if ne (env "LOKI_PUSH_URL") "" }}
discovery.relabel "docker" {
  targets = discovery.docker.linux.targets

  rule {
    source_labels = ["__meta_docker_container_name"]
    target_label = "container_name"
  }
  rule {
    source_labels = ["__meta_docker_container_id"]
    target_label = "container_id"
  }
  rule {
    source_labels = ["__meta_docker_container_label_com_docker_stack_namespace"]
    target_label = "stack_namespace"
  }
  rule {
    source_labels = ["__meta_docker_container_label_com_docker_swarm_service_name"]
    target_label = "service_name"
  }
  rule {
    source_labels = ["__meta_docker_container_log_stream"]
    target_label = "stream"
  }
}

loki.source.docker "docker" {
  host = "unix:///var/run/docker.sock"
  targets = discovery.relabel.docker.output
  labels = {"app" = "docker"}
  forward_to = [loki.write.loki.receiver]
}

// JOURNALD: reads the systemd journal binary log directly.
// Use on systemd hosts (most modern Linux distros). Requires no syslogd.
{{ if eq (env "JOURNALD") "1" }}
loki.source.journal "journal" {
  path = "/rootfs/var/log/journal"
  labels = { job = "{{ env "DOMAIN" }}" }
  forward_to = [loki.write.loki.receiver]
}
{{ end }}

// SYSLOG_FILES: tails all /var/log/*log files (syslog, auth.log, kern.log, etc.).
// Use on non-systemd hosts where a syslogd writes to /var/log.
{{ if eq (env "SYSLOG_FILES") "1" }}
local.file_match "syslog_files" {
  path_targets = [{ __path__ = "/rootfs/var/log/*log" }]
}

loki.source.file "syslog_files" {
  targets = local.file_match.syslog_files.targets
  forward_to = [loki.process.syslog_files.receiver]
}

loki.process "syslog_files" {
  stage.static_labels {
    values = { job = "syslog" }
  }
  forward_to = [loki.write.loki.receiver]
}
{{ end }}

// SYSLOG: opens a network syslog listener on port 514.
// Use when a remote device, or a local syslogd configured to
// forward over the network, sends logs to this host.
// Requires compose.syslog.yml to publish port 514 to the host.
// This is NOT needed for reading local log files; use SYSLOG_FILES instead.
{{ if eq (env "SYSLOG") "1" }}
loki.relabel "syslog" {
  rule {
    action = "labelmap"
    regex = "__syslog_(.+)"
  }

  forward_to = []
}

loki.source.syslog "syslog" {
  listener {
    address = "[::]:514"
    label_structured_data = true
    labels = { component = "loki.source.syslog" }
  }

  relabel_rules = loki.relabel.syslog.rules
  forward_to = [loki.write.loki.receiver]
}
{{ end }}

loki.write "loki" {
  endpoint {
    url = "{{ env "LOKI_PUSH_URL" }}"

    basic_auth {
      username = "admin"
      password = "{{ secret "basic_auth" }}"
    }
  }
  external_labels = { hostname = "{{ env "DOMAIN" }}" }
}
{{ end }}
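The `__address__` rule in `discovery.relabel "metrics"` is the least obvious part of this file. Prometheus-style relabeling joins the `source_labels` values with `;`, matches the fully anchored regex against the result, and writes the replacement only on a match. A minimal Python sketch of that one rule follows; the addresses are invented and `rewrite_address` is a hypothetical helper that imitates the rule, not Alloy's implementation.

```python
import re

# Same pattern as the rule: "<host>:<discovered port>;<port from service label>"
ADDRESS_RULE = re.compile(r"(.+):\d+;(\d+)")


def rewrite_address(address: str, port_label: str) -> str:
    """Swap the discovered port for the prometheus.io/port label value, if set."""
    joined = f"{address};{port_label}"  # source_labels are joined with ';'
    match = ADDRESS_RULE.fullmatch(joined)  # relabel regexes are anchored at both ends
    if match is None:
        return address  # no port label: the rule leaves __address__ untouched
    return f"{match.group(1)}:{match.group(2)}"


print(rewrite_address("10.0.1.5:80", "9633"))
print(rewrite_address("10.0.1.5:80", ""))
```

With the `prometheus.io/port=9633` label from `compose.smartctl.yml`, the first call corresponds to what the smartctl exporter target would go through; the second shows a service without the label keeping its discovered address.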
315
grafana-alerts.json.tmpl
Normal file
@ -0,0 +1,315 @@
|
||||
{
|
||||
"apiVersion": 1,
|
||||
"groups": [
|
||||
{
|
||||
"orgId": 1,
|
||||
"name": "backupbot",
|
||||
"folder": "node",
|
||||
"interval": "1m",
|
||||
"rules": [
|
||||
{{ if eq (env "ALERT_BACKUP_FAILED_ENABLED") "true" }}
|
||||
{
|
||||
"uid": "de8e5xxup7t34a",
|
||||
"title": "Backup Failed",
|
||||
"condition": "C",
|
||||
"data": [
|
||||
{
|
||||
"refId": "A",
|
||||
"relativeTimeRange": { "from": 600, "to": 0 },
|
||||
"datasourceUid": "PBFA97CFB590B2093",
|
||||
"model": {
|
||||
"disableTextWrap": false,
|
||||
"editorMode": "builder",
|
||||
"expr": "backup",
|
||||
"fullMetaSearch": false,
|
||||
"includeNullMetadata": true,
|
||||
"instant": true,
|
||||
"intervalMs": 1000,
|
||||
"legendFormat": "__auto",
|
||||
"maxDataPoints": 43200,
|
||||
"range": false,
|
||||
"refId": "A",
|
||||
"useBackend": false
|
||||
}
|
||||
},
|
||||
{
|
||||
"refId": "C",
|
||||
"relativeTimeRange": { "from": 600, "to": 0 },
|
||||
"datasourceUid": "__expr__",
|
||||
"model": {
|
||||
"conditions": [
|
||||
{
|
||||
"evaluator": { "params": [0], "type": "lt" },
|
||||
"operator": { "type": "and" },
|
||||
"query": { "params": ["C"] },
|
||||
"reducer": { "params": [], "type": "last" },
|
||||
"type": "query"
|
||||
}
|
||||
],
|
||||
"datasource": { "type": "__expr__", "uid": "__expr__" },
|
||||
"expression": "A",
|
||||
"intervalMs": 1000,
|
||||
"maxDataPoints": 43200,
|
||||
"refId": "C",
|
||||
"type": "threshold"
|
||||
}
|
||||
}
|
||||
],
|
||||
"noDataState": "NoData",
|
||||
"execErrState": "Error",
|
||||
"for": "1m",
|
||||
"isPaused": false
|
||||
},
|
||||
{{ end }}
|
||||
{{ if eq (env "ALERT_BACKUP_MISSING_ENABLED") "true" }}
|
||||
{
|
||||
"uid": "ce8e65uddcwe8d",
|
||||
"title": "Backup Missing",
|
||||
"condition": "B",
|
||||
"data": [
|
||||
{
|
||||
"refId": "A",
|
||||
"relativeTimeRange": { "from": 600, "to": 0 },
|
||||
"datasourceUid": "PBFA97CFB590B2093",
|
||||
"model": {
|
||||
"disableTextWrap": false,
|
||||
"editorMode": "builder",
|
||||
"expr": "rate(backup[24h])",
|
||||
"fullMetaSearch": false,
|
||||
"includeNullMetadata": true,
|
||||
"instant": true,
|
||||
"intervalMs": 1000,
|
||||
"legendFormat": "__auto",
|
||||
"maxDataPoints": 43200,
|
||||
"range": false,
|
||||
"refId": "A",
|
||||
"useBackend": false
|
||||
}
|
||||
},
|
||||
{
|
||||
"refId": "B",
|
||||
"relativeTimeRange": { "from": 600, "to": 0 },
|
||||
"datasourceUid": "__expr__",
|
||||
"model": {
|
||||
"conditions": [
|
||||
{
|
||||
"evaluator": { "params": [0, 0], "type": "within_range" },
|
||||
"operator": { "type": "and" },
|
||||
"query": { "params": ["C"] },
|
||||
"reducer": { "params": [], "type": "last" },
|
||||
"type": "query"
|
||||
}
|
||||
],
|
||||
"datasource": { "type": "__expr__", "uid": "__expr__" },
|
||||
"expression": "A",
|
||||
"intervalMs": 1000,
|
||||
"maxDataPoints": 43200,
|
||||
"refId": "B",
|
||||
"type": "threshold"
|
||||
}
|
||||
}
|
||||
],
|
||||
"noDataState": "NoData",
|
||||
"execErrState": "Error",
|
||||
"for": "5m",
|
||||
"isPaused": false
|
||||
},
|
||||
{{ end }}
|
||||
{{ if eq (env "ALERT_BACKUP_NOT_SUCCESSFULL_ENABLED") "true" }}
|
||||
{
|
||||
"uid": "de8e6bc92a8lcc",
|
||||
"title": "Backup Not Successfull",
|
||||
"condition": "B",
|
||||
"data": [
|
||||
{
|
||||
"refId": "A",
|
||||
"relativeTimeRange": {
|
||||
"from": 60,
|
||||
"to": 0
|
||||
},
|
||||
"datasourceUid": "PBFA97CFB590B2093",
|
||||
"model": {
|
||||
"disableTextWrap": false,
|
||||
"editorMode": "builder",
|
||||
"expr": "backup",
|
||||
"fullMetaSearch": false,
|
||||
"includeNullMetadata": true,
|
||||
"instant": true,
|
||||
"intervalMs": 1000,
|
||||
"legendFormat": "__auto",
|
||||
"maxDataPoints": 43200,
|
||||
"range": false,
|
||||
"refId": "A",
|
||||
"useBackend": false
|
||||
}
|
||||
},
|
||||
{
|
||||
"refId": "B",
|
||||
"relativeTimeRange": {
|
||||
"from": 60,
|
||||
"to": 0
|
||||
},
|
||||
"datasourceUid": "__expr__",
|
||||
"model": {
|
||||
"conditions": [
|
||||
{
|
||||
"evaluator": {
|
||||
"params": [
|
||||
0
|
||||
],
|
||||
"type": "gt"
|
||||
},
|
||||
"operator": {
|
||||
"type": "and"
|
||||
},
|
||||
"query": {
|
||||
"params": [
|
||||
"C"
|
||||
]
|
||||
},
|
||||
"reducer": {
|
||||
"params": [],
|
||||
"type": "last"
|
||||
},
|
||||
"type": "query"
|
||||
}
|
||||
],
|
||||
"datasource": {
|
||||
"type": "__expr__",
|
||||
"uid": "__expr__"
|
||||
},
|
||||
"expression": "A",
|
||||
"intervalMs": 1000,
|
||||
"maxDataPoints": 43200,
|
||||
"refId": "B",
|
||||
"type": "threshold"
|
||||
}
|
||||
}
|
||||
],
|
||||
"noDataState": "NoData",
|
||||
"execErrState": "Error",
|
||||
"for": "20m",
|
||||
"annotations": {
|
||||
"summary": "Backup did not finish within 20 minutes"
|
||||
},
|
||||
"labels": {},
|
||||
"isPaused": false
|
||||
}
|
||||
{{ end }}
|
||||
]
|
||||
},
|
||||
{
|
||||
"orgId": 1,
|
||||
"name": "node",
|
||||
"folder": "node",
|
||||
"interval": "5m",
|
||||
"rules": [
|
||||
{{ if eq (env "ALERT_NODE_DISK_SPACE_ENABLED") "true" }}
|
||||
{
|
||||
"uid": "bds8bhxu97pxca",
|
||||
"title": "Node Disk Space",
|
||||
"condition": "C",
|
||||
"data": [
|
||||
{
|
||||
"refId": "A",
|
||||
"relativeTimeRange": { "from": 600, "to": 0 },
|
||||
"datasourceUid": "PBFA97CFB590B2093",
|
||||
"model": {
|
||||
"editorMode": "code",
|
||||
"expr": "(node_filesystem_free_bytes{fstype=\"ext4\"} / node_filesystem_size_bytes{fstype=\"ext4\"}) * 100",
|
||||
"instant": true,
|
||||
"intervalMs": 1000,
|
||||
"legendFormat": "__auto",
|
||||
"maxDataPoints": 43200,
|
||||
"range": false,
|
||||
"refId": "A"
|
||||
}
|
||||
},
|
||||
{
|
||||
"refId": "C",
|
||||
"relativeTimeRange": { "from": 600, "to": 0 },
|
||||
"datasourceUid": "__expr__",
|
||||
"model": {
|
||||
"conditions": [
|
||||
{
|
||||
"evaluator": { "params": [10], "type": "lt" },
|
||||
"operator": { "type": "and" },
|
||||
"query": { "params": ["C"] },
|
||||
"reducer": { "params": [], "type": "last" },
|
||||
"type": "query"
|
||||
}
|
||||
],
|
||||
"datasource": { "type": "__expr__", "uid": "__expr__" },
|
||||
"expression": "A",
|
||||
"intervalMs": 1000,
|
||||
"maxDataPoints": 43200,
|
||||
"refId": "C",
|
||||
"type": "threshold"
|
||||
}
|
||||
}
|
||||
],
|
||||
"noDataState": "NoData",
|
||||
"execErrState": "Error",
|
||||
"for": "5m",
|
||||
"annotations": {},
|
||||
"labels": {},
|
||||
"isPaused": false
|
||||
},
|
||||
{{ end }}
|
||||
{{ if eq (env "ALERT_NODE_MEMORY_USAGE_ENABLED") "true" }}
|
||||
{
|
||||
"uid": "ads8cswmly96oa",
|
||||
"title": "Node Memory Usage",
|
||||
"condition": "C",
|
||||
"data": [
|
||||
{
|
||||
"refId": "A",
|
||||
"relativeTimeRange": { "from": 600, "to": 0 },
|
||||
"datasourceUid": "PBFA97CFB590B2093",
|
||||
"model": {
|
||||
"editorMode": "code",
|
||||
"expr": "(node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) * 100",
|
||||
"instant": true,
|
||||
"intervalMs": 1000,
|
||||
"legendFormat": "__auto",
|
||||
"maxDataPoints": 43200,
|
||||
"range": false,
|
||||
"refId": "A"
|
||||
}
|
||||
},
|
||||
{
|
||||
"refId": "C",
|
||||
"relativeTimeRange": { "from": 600, "to": 0 },
|
||||
"datasourceUid": "__expr__",
|
||||
"model": {
|
||||
"conditions": [
|
||||
{
|
||||
"evaluator": { "params": [90], "type": "gt" },
|
||||
"operator": { "type": "and" },
|
||||
"query": { "params": ["C"] },
|
||||
"reducer": { "params": [], "type": "last" },
|
||||
"type": "query"
|
||||
}
|
||||
],
|
||||
"datasource": { "type": "__expr__", "uid": "__expr__" },
|
||||
"expression": "A",
|
||||
"intervalMs": 1000,
|
||||
"maxDataPoints": 43200,
|
||||
"refId": "C",
|
||||
"type": "threshold"
|
||||
}
|
||||
}
|
||||
],
|
||||
"noDataState": "NoData",
|
||||
"execErrState": "Error",
|
||||
"for": "5m",
|
||||
"annotations": {},
|
||||
"labels": {},
|
||||
"isPaused": false
|
||||
}
|
||||
{{ end }}
|
||||
]
|
||||
}
|
||||
]
|
||||
}
|
||||
|
||||
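One property of this template worth keeping in mind: each rule sits in its own `{{ if }}` block, and the comma separating it from the next rule lives inside that block. Which rules are enabled therefore decides whether the rendered array ends in a dangling comma. The Python sketch below imitates that layout with a hypothetical `render_rules` helper (it is not the Go template engine) and checks the result with a strict JSON parser. Whether Grafana's provisioning loader is equally strict is not established here, so treat this as a hazard to verify rather than a confirmed failure.

```python
import json

RULES = ["failed", "missing", "not_successful"]


def render_rules(enabled: set) -> str:
    """Imitate the template: every rule except the last carries its own comma."""
    parts = []
    for index, name in enumerate(RULES):
        if name in enabled:
            comma = "," if index < len(RULES) - 1 else ""
            parts.append('{"title": "%s"}%s' % (name, comma))
    return "[" + " ".join(parts) + "]"


def is_strict_json(text: str) -> bool:
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


print(is_strict_json(render_rules({"failed", "missing", "not_successful"})))
print(is_strict_json(render_rules({"failed", "missing"})))
```

Under these assumptions, the output stays strict JSON only when the last rule of a group is enabled or when no rule of that group is enabled.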
228
grafana-backup-dashboard.json
Normal file
@ -0,0 +1,228 @@
|
||||
{
|
||||
"annotations": {
|
||||
"list": [
|
||||
{
|
||||
"builtIn": 1,
|
||||
"datasource": {
|
||||
"type": "grafana",
|
||||
"uid": "-- Grafana --"
|
||||
},
|
||||
"enable": true,
|
||||
"hide": true,
|
||||
"iconColor": "rgba(0, 211, 255, 1)",
|
||||
"name": "Annotations & Alerts",
|
||||
"type": "dashboard"
|
||||
}
|
||||
]
|
||||
},
|
||||
"editable": true,
|
||||
"fiscalYearStartMonth": 0,
|
||||
"graphTooltip": 0,
|
||||
"id": 6,
|
||||
"links": [],
|
||||
"panels": [
|
||||
{
|
||||
"datasource": {
|
||||
"type": "prometheus",
|
||||
"uid": "PBFA97CFB590B2093"
|
||||
},
|
||||
"fieldConfig": {
|
||||
"defaults": {
|
||||
"color": {
|
||||
"mode": "palette-classic"
|
||||
},
|
||||
"custom": {
|
||||
"axisBorderShow": false,
|
||||
"axisCenteredZero": false,
|
||||
"axisColorMode": "text",
|
||||
"axisLabel": "",
|
||||
"axisPlacement": "auto",
|
||||
"axisSoftMax": 2,
|
||||
"axisSoftMin": -2,
|
||||
"barAlignment": 0,
|
||||
"drawStyle": "line",
|
||||
"fillOpacity": 0,
|
||||
"gradientMode": "none",
|
||||
"hideFrom": {
|
||||
"legend": false,
|
||||
"tooltip": false,
|
||||
"viz": false
|
||||
},
|
||||
"insertNulls": false,
|
||||
"lineInterpolation": "linear",
|
||||
"lineWidth": 1,
|
||||
"pointSize": 5,
|
||||
"scaleDistribution": {
|
||||
"type": "linear"
|
||||
},
|
||||
"showPoints": "auto",
|
||||
"spanNulls": false,
|
||||
"stacking": {
|
||||
"group": "A",
|
||||
"mode": "none"
|
||||
},
|
||||
"thresholdsStyle": {
|
||||
"mode": "off"
|
||||
}
|
||||
},
|
||||
"mappings": [
|
||||
{
|
||||
"options": {
|
||||
"0": {
|
||||
"color": "dark-green",
|
||||
"index": 0
|
||||
},
|
||||
"1": {
|
||||
"color": "dark-yellow",
|
||||
"index": 1,
|
||||
"text": "Running"
|
||||
},
|
||||
"-1": {
|
||||
"index": 2,
|
||||
"text": "Fail"
|
||||
}
|
||||
},
|
||||
"type": "value"
|
||||
}
|
||||
],
|
||||
"max": 1,
|
||||
"min": -1,
|
||||
"thresholds": {
|
||||
"mode": "absolute",
|
||||
"steps": [
|
||||
{
|
||||
"color": "green",
|
||||
"value": null
|
||||
}
|
||||
]
|
||||
},
|
||||
"unit": "string"
|
||||
},
|
||||
"overrides": []
|
||||
},
|
||||
"gridPos": {
|
||||
"h": 7,
|
||||
"w": 24,
|
||||
"x": 0,
|
||||
"y": 0
|
||||
},
|
||||
"id": 1,
|
||||
"options": {
|
||||
"legend": {
|
||||
"calcs": [],
|
||||
"displayMode": "list",
|
||||
"placement": "bottom",
|
||||
"showLegend": true
|
||||
},
|
||||
"tooltip": {
|
||||
"mode": "single",
|
||||
"sort": "none"
|
||||
}
|
||||
},
|
||||
"targets": [
|
||||
{
|
||||
"datasource": {
|
||||
"type": "prometheus",
|
||||
"uid": "PBFA97CFB590B2093"
|
||||
},
|
||||
"disableTextWrap": false,
|
||||
"editorMode": "builder",
|
||||
"exemplar": false,
|
||||
"expr": "backup",
|
||||
"fullMetaSearch": false,
|
||||
"includeNullMetadata": true,
|
||||
"instant": false,
|
||||
"legendFormat": "__auto",
|
||||
"range": true,
|
||||
"refId": "A",
|
||||
"useBackend": false
|
||||
}
|
||||
],
|
||||
"title": "Backup Status",
|
||||
"type": "timeseries"
|
||||
},
|
||||
{
|
||||
"datasource": {
|
||||
"type": "loki",
|
||||
"uid": "P8E80F9AEF21F6940"
|
||||
},
|
||||
"gridPos": {
|
||||
"h": 11,
|
||||
"w": 24,
|
||||
"x": 0,
|
||||
"y": 7
|
||||
},
|
||||
"id": 2,
|
||||
"options": {
|
||||
"dedupStrategy": "none",
|
||||
"enableLogDetails": true,
|
||||
"prettifyLogMessage": false,
|
||||
"showCommonLabels": false,
|
||||
"showLabels": false,
|
||||
"showTime": true,
|
||||
"sortOrder": "Descending",
|
||||
"wrapLogMessage": false
|
||||
},
|
||||
"targets": [
|
||||
{
|
||||
"datasource": {
|
||||
"type": "loki",
|
||||
"uid": "P8E80F9AEF21F6940"
|
||||
},
|
||||
"editorMode": "builder",
|
||||
"expr": "{service_name=\"$ServiceName\"} |= ``",
|
||||
"queryType": "range",
|
||||
"refId": "A"
|
||||
}
|
||||
],
|
||||
"title": "Backupbot Logs",
|
||||
"type": "logs"
|
||||
}
|
||||
],
|
||||
"refresh": "auto",
|
||||
"schemaVersion": 39,
|
||||
"tags": [],
|
||||
"templating": {
|
||||
"list": [
|
||||
{
|
||||
"current": {
|
||||
"selected": true,
|
||||
"text": "backup_marx_klasse-methode_it_app",
|
||||
"value": "backup_marx_klasse-methode_it_app"
|
||||
},
|
||||
"datasource": {
|
||||
"type": "loki",
|
||||
"uid": "P8E80F9AEF21F6940"
|
||||
},
|
||||
"definition": "",
|
||||
"hide": 0,
|
||||
"includeAll": false,
|
||||
"label": "Backupbot Service",
|
||||
"multi": false,
|
||||
"name": "ServiceName",
|
||||
"options": [],
|
||||
"query": {
|
||||
"label": "service_name",
|
||||
"refId": "LokiVariableQueryEditor-VariableQuery",
|
||||
"stream": "",
|
||||
"type": 1
|
||||
},
|
||||
"refresh": 1,
|
||||
"regex": "",
|
||||
"skipUrlSync": false,
|
||||
"sort": 1,
|
||||
"type": "query"
|
||||
}
|
||||
]
|
||||
},
|
||||
"time": {
|
||||
"from": "now-24h",
|
||||
"to": "now"
|
||||
},
|
||||
"timepicker": {},
|
||||
"timezone": "browser",
|
||||
"title": "backupbot-two",
|
||||
"uid": "be8e2xeofw4xsa",
|
||||
"version": 3,
|
||||
"weekStart": ""
|
||||
}
|
||||
@ -11,3 +11,13 @@ providers:
|
||||
options:
|
||||
path: /var/lib/grafana/dashboards
|
||||
foldersFromFilesStructure: true
|
||||
- name: 'default-alert-provider'
|
||||
orgId: 1
|
||||
folder: 'default-alerts'
|
||||
type: file
|
||||
disableDeletion: false
|
||||
updateIntervalSeconds: 10
|
||||
allowUiUpdates: true
|
||||
options:
|
||||
path: /var/lib/grafana/alerts
|
||||
foldersFromFilesStructure: true
|
||||
|
||||
1270
grafana-logs-dashboard.json
Normal file
File diff suppressed because it is too large
@ -3,7 +3,10 @@
|
||||
"list": [
|
||||
{
|
||||
"builtIn": 1,
|
||||
"datasource": "-- Grafana --",
|
||||
"datasource": {
|
||||
"type": "datasource",
|
||||
"uid": "grafana"
|
||||
},
|
||||
"enable": true,
|
||||
"hide": true,
|
||||
"iconColor": "rgba(0, 211, 255, 1)",
|
||||
@ -18,17 +21,35 @@
|
||||
}
|
||||
]
|
||||
},
|
||||
"description": "Simple dashboard for Traefik 2",
|
||||
"description": "Dashboards for Traefik Reverse Proxy",
|
||||
"editable": true,
|
||||
"fiscalYearStartMonth": 0,
|
||||
"gnetId": 11462,
|
||||
"graphTooltip": 0,
|
||||
"id": 3,
|
||||
"iteration": 1684839198931,
|
||||
"links": [],
|
||||
"links": [
|
||||
{
|
||||
"asDropdown": false,
|
||||
"icon": "external link",
|
||||
"includeVars": false,
|
||||
"keepTime": false,
|
||||
"tags": [
|
||||
"menu"
|
||||
],
|
||||
"targetBlank": false,
|
||||
"title": "dashboards",
|
||||
"tooltip": "",
|
||||
"type": "dashboards",
|
||||
"url": ""
|
||||
}
|
||||
],
|
||||
"liveNow": false,
|
||||
"panels": [
|
||||
{
|
||||
"datasource": {
|
||||
"type": "prometheus",
|
||||
"uid": "PBFA97CFB590B2093"
|
||||
},
|
||||
"fieldConfig": {
|
||||
"defaults": {
|
||||
"decimals": 0,
|
||||
@ -87,7 +108,7 @@
|
||||
},
|
||||
"textMode": "auto"
|
||||
},
|
||||
"pluginVersion": "8.4.4",
|
||||
"pluginVersion": "10.0.2",
|
||||
"targets": [
|
||||
{
|
||||
"datasource": {
|
||||
@ -108,6 +129,10 @@
|
||||
"type": "stat"
|
||||
},
|
||||
{
|
||||
"datasource": {
|
||||
"type": "prometheus",
|
||||
"uid": "PBFA97CFB590B2093"
|
||||
},
|
||||
"fieldConfig": {
|
||||
"defaults": {
|
||||
"decimals": 0,
|
||||
@ -167,7 +192,7 @@
|
||||
},
|
||||
"textMode": "auto"
|
||||
},
|
||||
"pluginVersion": "8.4.4",
|
||||
"pluginVersion": "10.0.2",
|
||||
"targets": [
|
||||
{
|
||||
"datasource": {
|
||||
@ -185,6 +210,10 @@
|
||||
"type": "stat"
|
||||
},
|
||||
{
|
||||
"datasource": {
|
||||
"type": "prometheus",
|
||||
"uid": "PBFA97CFB590B2093"
|
||||
},
|
||||
"fieldConfig": {
|
||||
"defaults": {
|
||||
"decimals": 0,
|
||||
@ -243,7 +272,7 @@
|
||||
},
|
||||
"textMode": "auto"
|
||||
},
|
||||
"pluginVersion": "8.4.4",
|
||||
"pluginVersion": "10.0.2",
|
||||
"targets": [
|
||||
{
|
||||
"datasource": {
|
||||
@ -265,6 +294,10 @@
|
||||
"type": "stat"
|
||||
},
|
||||
{
|
||||
"datasource": {
|
||||
"type": "prometheus",
|
||||
"uid": "PBFA97CFB590B2093"
|
||||
},
|
||||
"description": "",
|
||||
"fieldConfig": {
|
||||
"defaults": {
|
||||
@ -324,7 +357,7 @@
|
||||
},
|
||||
"textMode": "auto"
|
||||
},
|
||||
"pluginVersion": "8.4.4",
|
||||
"pluginVersion": "10.0.2",
|
||||
"targets": [
|
||||
{
|
||||
"datasource": {
|
||||
@ -352,6 +385,10 @@
|
||||
"label": "Others",
|
||||
"threshold": 0
|
||||
},
|
||||
"datasource": {
|
||||
"type": "prometheus",
|
||||
"uid": "PBFA97CFB590B2093"
|
||||
},
|
||||
"description": "",
|
||||
"fontSize": "80%",
|
||||
"format": "short",
|
||||
@ -397,6 +434,10 @@
|
||||
"bars": false,
|
||||
"dashLength": 10,
|
||||
"dashes": false,
|
||||
"datasource": {
|
||||
"type": "prometheus",
|
||||
"uid": "PBFA97CFB590B2093"
|
||||
},
|
||||
"decimals": 0,
|
||||
"fieldConfig": {
|
||||
"defaults": {
|
||||
@ -433,7 +474,7 @@
|
||||
"alertThreshold": true
|
||||
},
|
||||
"percentage": false,
|
||||
"pluginVersion": "8.4.4",
|
||||
"pluginVersion": "10.0.2",
|
||||
"pointradius": 5,
|
||||
"points": false,
|
||||
"renderer": "flot",
|
||||
@ -493,6 +534,10 @@
|
||||
"bars": false,
|
||||
"dashLength": 10,
|
||||
"dashes": false,
|
||||
"datasource": {
|
||||
"type": "prometheus",
|
||||
"uid": "PBFA97CFB590B2093"
|
||||
},
|
||||
"description": "",
|
||||
"fieldConfig": {
|
||||
"defaults": {
|
||||
@ -530,7 +575,7 @@
|
||||
"alertThreshold": true
|
||||
},
|
||||
"percentage": false,
|
||||
"pluginVersion": "8.4.4",
|
||||
"pluginVersion": "10.0.2",
|
||||
"pointradius": 5,
|
||||
"points": false,
|
||||
"renderer": "flot",
|
||||
@ -587,6 +632,10 @@
|
||||
}
|
||||
},
|
||||
{
|
||||
"datasource": {
|
||||
"type": "prometheus",
|
||||
"uid": "PBFA97CFB590B2093"
|
||||
},
|
||||
"fieldConfig": {
|
||||
"defaults": {
|
||||
"mappings": [
|
||||
@ -652,7 +701,7 @@
|
||||
},
|
||||
"textMode": "auto"
|
||||
},
|
||||
"pluginVersion": "8.4.4",
|
||||
"pluginVersion": "10.0.2",
|
||||
"targets": [
|
||||
{
|
||||
"datasource": {
|
||||
@ -676,6 +725,10 @@
|
||||
"bars": true,
|
||||
"dashLength": 10,
|
||||
"dashes": false,
|
||||
"datasource": {
|
||||
"type": "prometheus",
|
||||
"uid": "PBFA97CFB590B2093"
|
||||
},
|
||||
"fieldConfig": {
|
||||
"defaults": {
|
||||
"links": []
|
||||
@ -712,7 +765,7 @@
|
||||
"alertThreshold": true
|
||||
},
|
||||
"percentage": false,
|
||||
"pluginVersion": "8.4.4",
|
||||
"pluginVersion": "10.0.2",
|
||||
"pointradius": 5,
|
||||
"points": false,
|
||||
"renderer": "flot",
|
||||
@ -726,12 +779,14 @@
|
||||
"type": "prometheus",
|
||||
"uid": "PBFA97CFB590B2093"
|
||||
},
|
||||
"editorMode": "code",
|
||||
"exemplar": true,
|
||||
"expr": "sum(delta(traefik_service_requests_total{instance=\"$instance\"}[$interval]))",
|
||||
"expr": "sum(delta(traefik_service_requests_total{instance=\"${instance:raw}\"}[$interval]))",
|
||||
"format": "time_series",
|
||||
"interval": "",
|
||||
"intervalFactor": 2,
|
||||
"legendFormat": "Total requests",
|
||||
"range": true,
|
||||
"refId": "A"
|
||||
}
|
||||
],
|
||||
@ -769,6 +824,10 @@
|
||||
}
|
||||
},
|
||||
{
|
||||
"datasource": {
|
||||
"type": "prometheus",
|
||||
"uid": "PBFA97CFB590B2093"
|
||||
},
|
||||
"fieldConfig": {
|
||||
"defaults": {
|
||||
"links": [],
|
||||
@ -812,7 +871,7 @@
|
||||
},
|
||||
"textMode": "auto"
|
||||
},
|
||||
"pluginVersion": "8.4.4",
|
||||
"pluginVersion": "10.0.2",
|
||||
"targets": [
|
||||
{
|
||||
"datasource": {
|
||||
@ -832,6 +891,10 @@
|
||||
"type": "stat"
|
||||
},
|
||||
{
|
||||
"datasource": {
|
||||
"type": "prometheus",
|
||||
"uid": "PBFA97CFB590B2093"
|
||||
},
|
||||
"description": "",
|
||||
"fieldConfig": {
|
||||
"defaults": {
|
||||
@ -839,6 +902,8 @@
|
||||
"mode": "palette-classic"
|
||||
},
|
||||
"custom": {
|
||||
"axisCenteredZero": false,
|
||||
"axisColorMode": "text",
|
||||
"axisLabel": "",
|
||||
"axisPlacement": "auto",
|
||||
"barAlignment": 0,
|
||||
@ -899,8 +964,11 @@
|
||||
"lastNotNull",
|
||||
"max"
|
||||
],
|
||||
"displayMode": "list",
|
||||
"placement": "right"
|
||||
"displayMode": "table",
|
||||
"placement": "right",
|
||||
"showLegend": true,
|
||||
"sortBy": "Last *",
|
||||
"sortDesc": true
|
||||
},
|
||||
"tooltip": {
|
||||
"mode": "single",
|
||||
@ -914,12 +982,14 @@
|
||||
"type": "prometheus",
|
||||
"uid": "PBFA97CFB590B2093"
|
||||
},
|
||||
"editorMode": "code",
|
||||
"exemplar": true,
|
||||
"expr": "rate(traefik_service_request_duration_seconds_sum{ instance=\"$instance\" }[5m])",
|
||||
"expr": "sum(rate(traefik_service_request_duration_seconds_sum{ instance=\"$instance\" }[5m])) by(service)",
|
||||
"format": "time_series",
|
||||
"interval": "",
|
||||
"intervalFactor": 2,
|
||||
"legendFormat": "{{ service }}",
|
||||
"range": true,
|
||||
"refId": "A"
|
||||
}
|
||||
],
|
||||
@ -931,6 +1001,10 @@
|
||||
"bars": false,
|
||||
"dashLength": 10,
|
||||
"dashes": false,
|
||||
"datasource": {
|
||||
"type": "prometheus",
|
||||
"uid": "PBFA97CFB590B2093"
|
||||
},
|
||||
"fieldConfig": {
|
||||
"defaults": {
|
||||
"links": []
|
||||
@ -964,7 +1038,7 @@
|
||||
"alertThreshold": true
|
||||
},
|
||||
"percentage": false,
|
||||
"pluginVersion": "8.4.4",
|
||||
"pluginVersion": "10.0.2",
|
||||
"pointradius": 5,
|
||||
"points": false,
|
||||
"renderer": "flot",
|
||||
@ -1023,6 +1097,10 @@
|
||||
"bars": false,
|
||||
"dashLength": 10,
|
||||
"dashes": false,
|
||||
"datasource": {
|
||||
"type": "prometheus",
|
||||
"uid": "PBFA97CFB590B2093"
|
||||
},
|
||||
"decimals": 0,
|
||||
"fieldConfig": {
|
||||
"defaults": {
|
||||
@ -1061,7 +1139,7 @@
|
||||
"alertThreshold": true
|
||||
},
|
||||
"percentage": false,
|
||||
"pluginVersion": "8.4.4",
|
||||
"pluginVersion": "10.0.2",
|
||||
"pointradius": 5,
|
||||
"points": false,
|
||||
"renderer": "flot",
|
||||
@ -1121,6 +1199,10 @@
|
||||
"bars": false,
|
||||
"dashLength": 10,
|
||||
"dashes": false,
|
||||
"datasource": {
|
||||
"type": "prometheus",
|
||||
"uid": "PBFA97CFB590B2093"
|
||||
},
|
||||
"fieldConfig": {
|
||||
"defaults": {
|
||||
"links": []
|
||||
@ -1158,7 +1240,7 @@
|
||||
"alertThreshold": true
|
||||
},
|
||||
"percentage": false,
|
||||
"pluginVersion": "8.4.4",
|
||||
"pluginVersion": "10.0.2",
|
||||
"pointradius": 5,
|
||||
"points": false,
|
||||
"renderer": "flot",
|
||||
@ -1217,6 +1299,10 @@
|
||||
"bars": false,
|
||||
"dashLength": 10,
|
||||
"dashes": false,
|
||||
"datasource": {
|
||||
"type": "prometheus",
|
||||
"uid": "PBFA97CFB590B2093"
|
||||
},
|
||||
"fieldConfig": {
|
||||
"defaults": {
|
||||
"links": []
|
||||
@ -1254,7 +1340,7 @@
|
||||
"alertThreshold": true
|
||||
},
|
||||
"percentage": false,
|
||||
"pluginVersion": "8.4.4",
|
||||
"pluginVersion": "10.0.2",
|
||||
"pointradius": 5,
|
||||
"points": false,
|
||||
"renderer": "flot",
|
||||
@ -1313,6 +1399,10 @@
|
||||
"bars": false,
|
||||
"dashLength": 10,
|
||||
"dashes": false,
|
||||
"datasource": {
|
||||
"type": "prometheus",
|
||||
"uid": "PBFA97CFB590B2093"
|
||||
},
|
||||
"decimals": 0,
|
||||
"fieldConfig": {
|
||||
"defaults": {
|
||||
@ -1349,7 +1439,7 @@
|
||||
"alertThreshold": true
|
||||
},
|
||||
"percentage": false,
|
||||
"pluginVersion": "8.4.4",
|
||||
"pluginVersion": "10.0.2",
|
||||
"pointradius": 5,
|
||||
"points": false,
|
||||
"renderer": "flot",
|
||||
@ -1413,13 +1503,14 @@
|
||||
}
|
||||
],
|
||||
"refresh": "",
|
||||
"schemaVersion": 35,
|
||||
"schemaVersion": 38,
|
||||
"style": "dark",
|
||||
"tags": [
|
||||
"traefik",
|
||||
"load-balancer",
|
||||
"docker",
|
||||
"prometheus"
|
||||
"prometheus",
|
||||
"menu"
|
||||
],
|
||||
"templating": {
|
||||
"list": [
|
||||
@ -1455,7 +1546,7 @@
|
||||
},
|
||||
{
|
||||
"current": {
|
||||
"selected": false,
|
||||
"selected": true,
|
||||
"text": [
|
||||
"All"
|
||||
],
|
||||
@ -1492,7 +1583,7 @@
|
||||
"auto_count": 30,
|
||||
"auto_min": "10s",
|
||||
"current": {
|
||||
"selected": true,
|
||||
"selected": false,
|
||||
"text": "5m",
|
||||
"value": "5m"
|
||||
},
|
||||
@ -1562,17 +1653,12 @@
|
||||
}
|
||||
],
|
||||
"query": "1m,5m,10m,30m,1h,6h,12h,1d,7d,14d,30d",
|
||||
"queryValue": "5m",
|
||||
"queryValue": "",
|
||||
"refresh": 2,
|
||||
"skipUrlSync": false,
|
||||
"type": "interval"
|
||||
},
|
||||
{
|
||||
"current": {
|
||||
"selected": true,
|
||||
"text": "demo.local-it.cloud:8082",
|
||||
"value": "demo.local-it.cloud:8082"
|
||||
},
|
||||
"datasource": {
|
||||
"type": "prometheus",
|
||||
"uid": "PBFA97CFB590B2093"
|
||||
@ -1586,12 +1672,12 @@
|
||||
"options": [],
|
||||
"query": {
|
||||
"query": "label_values(instance)",
|
||||
"refId": "StandardVariableQuery"
|
||||
"refId": "PrometheusVariableQueryEditor-VariableQuery"
|
||||
},
|
||||
"refresh": 1,
|
||||
"regex": ".*8082.*",
|
||||
"regex": ".*8082",
|
||||
"skipUrlSync": false,
|
||||
"sort": 0,
|
||||
"sort": 1,
|
||||
"tagValuesQuery": "",
|
||||
"tagsQuery": "",
|
||||
"type": "query",
|
||||
@ -1600,7 +1686,7 @@
|
||||
]
|
||||
},
|
||||
"time": {
|
||||
"from": "now-15m",
|
||||
"from": "now-2d",
|
||||
"to": "now"
|
||||
},
|
||||
"timepicker": {
|
||||
@ -1629,8 +1715,8 @@
|
||||
]
|
||||
},
|
||||
"timezone": "",
|
||||
"title": "Traefik 2",
|
||||
"title": "Traefik Reverse Proxy",
|
||||
"uid": "3ipsWfViz",
|
||||
"version": 5,
|
||||
"version": 9,
|
||||
"weekStart": ""
|
||||
}
|
||||
}
|
||||
|
||||
@ -10,6 +10,7 @@ auto_assign_org_role = Admin
|
||||
{{ if eq (env "OIDC_ENABLED") "1" }}
|
||||
[auth]
|
||||
disable_login_form = true
|
||||
oauth_allow_insecure_email_lookup=true # https://github.com/grafana/grafana/issues/70203
|
||||
|
||||
[auth.generic_oauth]
|
||||
enabled = true
|
||||
@ -18,8 +19,9 @@ name = oauth
|
||||
icon = signin
|
||||
tls_skip_verify_insecure = false
|
||||
allow_sign_up = true
|
||||
auto_login = true
|
||||
client_id = {{ env "OIDC_CLIENT_ID" }}
|
||||
client_secret = {{ secret "grafana_oidc_client_secret" }}
|
||||
client_secret = {{ secret "gf_oidc_secret" }}
|
||||
auth_url = {{ env "OIDC_AUTH_URL" }}
|
||||
token_url = {{ env "OIDC_TOKEN_URL" }}
|
||||
api_url = {{ env "OIDC_API_URL" }}
|
||||
@ -28,6 +30,9 @@ api_url = {{ env "OIDC_API_URL" }}
|
||||
enabled = false
|
||||
{{ end }}
|
||||
|
||||
|
||||
[plugins]
|
||||
enable_alpha = true
|
||||
enable_alpha = true
|
||||
|
||||
[database]
|
||||
type = sqlite3
|
||||
wal = true
|
||||
@ -34,7 +34,6 @@ ingester:
|
||||
max_chunk_age: 1h # All chunks will be flushed when they hit this age, default is 1h
|
||||
chunk_target_size: 1048576 # Loki will attempt to build chunks up to 1MB (1048576 bytes), flushing first if chunk_idle_period or max_chunk_age is reached first
|
||||
chunk_retain_period: 30s # Must be greater than index read cache TTL if using an index cache (Default index read cache TTL is 5m)
|
||||
max_transfer_retries: 0 # Chunk transfers disabled
|
||||
wal:
|
||||
dir: "/tmp/wal"
|
||||
|
||||
@ -53,7 +52,7 @@ schema_config:
|
||||
- from: 2020-10-24
|
||||
store: boltdb-shipper
|
||||
object_store: filesystem
|
||||
schema: v11
|
||||
schema: v13
|
||||
index:
|
||||
prefix: index_
|
||||
period: 24h
|
||||
@ -63,7 +62,6 @@ storage_config:
|
||||
active_index_directory: /loki/boltdb-shipper-active
|
||||
cache_location: /loki/boltdb-shipper-cache
|
||||
cache_ttl: 24h # Can be increased for faster performance over longer query periods, uses more disk space
|
||||
shared_store: filesystem
|
||||
filesystem:
|
||||
directory: /loki/chunks
|
||||
{{ end }}
|
||||
@ -72,7 +70,6 @@ schema_config:
|
||||
configs:
|
||||
- from: 2020-11-25
|
||||
store: boltdb-shipper
|
||||
object_store: aws
|
||||
schema: v11
|
||||
index:
|
||||
prefix: index_
|
||||
@ -89,7 +86,7 @@ storage_config:
|
||||
endpoint: {{ env "LOKI_AWS_ENDPOINT" }}
|
||||
region: {{ env "LOKI_AWS_REGION" }}
|
||||
access_key_id: {{ env "LOKI_ACCESS_KEY_ID" }}
|
||||
secret_access_key: {{ secret "loki_aws_secret_access_key" }}
|
||||
secret_access_key: {{ secret "loki_aws_key" }}
|
||||
bucketnames: {{ env "LOKI_BUCKET_NAMES" }}
|
||||
insecure: false
|
||||
sse_encryption: false
|
||||
@ -103,20 +100,30 @@ storage_config:
|
||||
|
||||
compactor:
|
||||
working_directory: /loki/boltdb-shipper-compactor
|
||||
shared_store: filesystem
|
||||
compaction_interval: 10m
|
||||
retention_enabled: true
|
||||
retention_delete_delay: 2h
|
||||
retention_delete_worker_count: 150
|
||||
{{ if eq (env "LOKI_STORAGE_FILESYSTEM") "1" }}
|
||||
delete_request_store: filesystem
|
||||
{{ end }}
|
||||
{{ if eq (env "LOKI_STORAGE_S3") "1" }}
|
||||
delete_request_store: aws
|
||||
{{ end }}
|
||||
|
||||
limits_config:
|
||||
enforce_metric_name: false
|
||||
reject_old_samples: true
|
||||
reject_old_samples_max_age: 168h
|
||||
retention_period: {{ env "LOKI_RETENTION_PERIOD" }}
|
||||
split_queries_by_interval: 24h
|
||||
max_query_parallelism: 100
|
||||
allow_structured_metadata: false
|
||||
|
||||
chunk_store_config:
|
||||
max_look_back_period: 0s
|
||||
query_scheduler:
|
||||
max_outstanding_requests_per_tenant: 4096
|
||||
|
||||
frontend:
|
||||
max_outstanding_per_tenant: 4096
|
||||
|
||||
table_manager:
|
||||
retention_deletes_enabled: false
|
||||
|
||||
@ -1,11 +0,0 @@
|
||||
#!/bin/sh -e
|
||||
|
||||
NODE_NAME=$(cat /etc/nodename)
|
||||
|
||||
mkdir -p /etc/node-exporter
|
||||
|
||||
echo "node_meta{node_id=\"$NODE_ID\", container_label_com_docker_swarm_node_id=\"$NODE_ID\", node_name=\"$NODE_NAME\"} 1" > /etc/node-exporter/node-meta.prom
|
||||
|
||||
set -- /bin/node_exporter "$@"
|
||||
|
||||
exec "$@"
|
||||
@ -17,4 +17,4 @@ scrape_configs:
|
||||
- /prometheus/scrape_configs/*.yml
|
||||
basic_auth:
|
||||
username: admin
|
||||
password: {{ secret "basic_auth_admin_password" }}
|
||||
password: {{ secret "basic_auth" }}
|
||||
|
||||
@ -1,35 +0,0 @@
|
||||
server:
|
||||
http_listen_port: 9080
|
||||
grpc_listen_port: 0
|
||||
|
||||
positions:
|
||||
filename: /tmp/positions.yaml
|
||||
|
||||
clients:
|
||||
- url: {{ env "LOKI_PUSH_URL" }}
|
||||
basic_auth:
|
||||
username: admin
|
||||
password: {{ secret "basic_auth_admin_password" }}
|
||||
|
||||
scrape_configs:
|
||||
- job_name: system
|
||||
static_configs:
|
||||
- targets:
|
||||
- localhost
|
||||
labels:
|
||||
job: varlogs
|
||||
__path__: /var/log/*log
|
||||
|
||||
- job_name: "docker"
|
||||
docker_sd_configs:
|
||||
- host: "unix:///var/run/docker.sock"
|
||||
refresh_interval: "10s"
|
||||
relabel_configs:
|
||||
- source_labels: ['__meta_docker_container_name']
|
||||
target_label: "container_name"
|
||||
- source_labels: ['__meta_docker_container_id']
|
||||
target_label: "container_id"
|
||||
- source_labels: ['__meta_docker_container_label_com_docker_stack_namespace']
|
||||
target_label: "stack_namespace"
|
||||
- source_labels: ['__meta_docker_container_label_com_docker_swarm_service_name']
|
||||
target_label: "service_name"
|
||||
4
release/1.0.0+v1.7.0
Normal file
@ -0,0 +1,4 @@
|
||||
Breaking change: the secret `basic_auth_admin_password` was renamed to `basic_auth`. Insert the secret under its new name before upgrading, and rename the env variable BASIC_AUTH_ADMIN_PASSWORD to BASIC_AUTH:
|
||||
|
||||
abra app secret insert monitoring.example.com basic_auth v1 $(abra app run monitoring.example.com promtail cat /var/run/secrets/basic_auth_admin_password)
|
||||
sed -i ~/.abra/servers/example.com/monitoring.example.com.env -e 's/BASIC_AUTH_ADMIN_PASSWORD/BASIC_AUTH/'
|
||||
1
release/1.4.0+v1.8.1
Normal file
@ -0,0 +1 @@
|
||||
Adds an optional GRAFANA_DOMAIN
|
||||
1
release/1.5.0+v1.8.1
Normal file
@ -0,0 +1 @@
|
||||
Adds an optional matrix contact point for grafana
|
||||
1
release/1.6.0+v1.8.1
Normal file
@ -0,0 +1 @@
|
||||
Adds option to expose ports for node and cadvisor service
|
||||
12
release/next
Normal file
@ -0,0 +1,12 @@
|
||||
1. OIDC was moved into a separate compose file. If you have OIDC configured, you need to add the following line to your .env file:
|
||||
|
||||
COMPOSE_FILE="$COMPOSE_FILE:compose.grafana-oidc.yml"
|
||||
|
||||
2. SMTP was moved into a separate compose file. If you have SMTP configured, you need to add the following line to your .env file:
|
||||
|
||||
COMPOSE_FILE="$COMPOSE_FILE:compose.grafana-smtp.yml"
|
||||
|
||||
3. The scrape-config.example.yml file and add_node() command were updated to use a secure endpoint for the traefik metrics instead of http. This requires an updated Traefik recipe that publishes the metrics on https.
|
||||
|
||||
4. Secret and config names were shortened to max 14 characters to prevent going over Docker's 64 character limit when STACK_NAME and VERSION are added to it.
|
||||
When upgrading, you need to reinsert the secrets with their shorter names. Run `abra app secret list <domain>` to see which secrets aren't created on the server (because their name was shortened) and run `abra app secret insert <domain> <secret_name> v1 <value>` to reinsert them with the shorter name. Or you can use the migrate_secret_names function in abra.sh to reinsert all existing secrets with their shorter name automatically: `abra app cmd --local <domain> migrate_secret_names`
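
To illustrate why the names had to shrink, here is a minimal sketch of the length check. It assumes the full Docker name is composed as `<stack_name>_<secret_name>_<version>`; that layout and the helper name `secret_name_fits` are assumptions for illustration and are not part of abra or this recipe.

```shell
#!/bin/sh
# Hypothetical helper, not part of abra or this recipe.
# Assumption: Docker sees the secret as "<stack_name>_<secret_name>_<version>".
# Returns success when the composed name stays within the 64 character limit
# mentioned in the note above.
secret_name_fits() {
    full_name="$1_$2_$3"
    [ "${#full_name}" -le 64 ]
}

# Example call with made-up values.
if secret_name_fits "monitoring_example_com" "basic_auth" "v1"; then
    echo "fits"
else
    echo "too long"
fi
```

A check like this can be run against each secret name before reinserting it, so that a long stack name is caught before deployment.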
|
||||
@ -1,4 +1,4 @@
|
||||
- targets:
|
||||
- 'example.org:8082'
|
||||
- 'metrics.traefik.example.org'
|
||||
- 'node.monitoring.example.org'
|
||||
- 'cadvisor.monitoring.example.org'
|
||||
|
||||