Compare commits

10 Commits

SHA1 Message Date
682f30cef1 Add migrate_secret_names() to abra.sh to reinsert all secrets with shortened names in docker 2026-03-25 16:11:37 +01:00
694c8a9875 Add instructions for shorter secret names to release notes 2026-03-25 16:11:28 +01:00
9dfa9cad2a Shortened all the secret and config names to max 14 characters to prevent running into Docker's 64 character limit when STACK_NAME is appended to it. 2026-03-25 15:58:28 +01:00
99f8790ec4 fix: Update scrape-config example to use HTTPS for Traefik metrics (#17)
This fixes the insecure Traefik metrics endpoint. See coop-cloud/traefik#94 for details.

Reviewed-on: coop-cloud/monitoring-ng#17
Co-authored-by: Danny Groenewegen <mail@dannygroenewegen.nl>
Co-committed-by: Danny Groenewegen <mail@dannygroenewegen.nl>
2026-03-24 09:37:05 +00:00
310c28e735 refactor: provision alerts instead of putting them in the /var/lib folder (#16)
Note that I did not copy the backupbot alert since this one gets a rework soon

Reviewed-on: coop-cloud/monitoring-ng#16
Co-authored-by: p4u1 <p4u1_f4u1@riseup.net>
Co-committed-by: p4u1 <p4u1_f4u1@riseup.net>
2026-03-20 14:10:10 +00:00
16bd65f417 fix recipe part in the domain (#8)
I created a new app using this recipe and the domain wasn't automatically replaced, I'm guessing cause the part before the root domain didn't match the recipe name?

Just opening a PR real quick so I can get back to it and test the fix later when I have cycles

Co-authored-by: p4u1 <p4u1@noreply.git.coopcloud.tech>
Reviewed-on: coop-cloud/monitoring-ng#8
Reviewed-by: p4u1 <p4u1@noreply.git.coopcloud.tech>
Co-authored-by: ammaratef45 <ammaratef45@proton.me>
Co-committed-by: ammaratef45 <ammaratef45@proton.me>
2026-03-20 09:23:36 +00:00
97ebcf306a add all mountpoints to free disk space in Docker Swarm dashboard (#4)
Until now, only / and /media were monitored in the Docker Swarm dashboard. We removed the filters and changed the dashboard to a time series, so multiple mounts can be shown at once.
We also updated the alert, so it now triggers on all ext4 mount points.

Reviewed-on: coop-cloud/monitoring-ng#4
Co-authored-by: Apfelwurm <Alexander@volzit.de>
Co-committed-by: Apfelwurm <Alexander@volzit.de>
2026-03-20 09:15:52 +00:00
f93370b9ca Moves oidc to a separate compose config (#6)
Otherwise the secret has to be provided when oidc is not used

Reviewed-on: coop-cloud/monitoring-ng#6
Co-authored-by: p4u1 <p4u1_f4u1@riseup.net>
Co-committed-by: p4u1 <p4u1_f4u1@riseup.net>
2026-03-20 09:10:48 +00:00
83461e2e76 remove default TIMEOUT (abra #596) 2025-12-30 13:53:47 +01:00
7dbe5bf22e fix: Removes duplicate basic auth from prometheus and a few other improvements 2025-02-21 18:31:54 +01:00
19 changed files with 453 additions and 146 deletions
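
Commit 9dfa9cad2a shortens secret and config names because the full Docker object name is assembled as `${STACK_NAME}_<name>_<version>` and has to stay within the 64-character limit mentioned in the commit message. The following is a minimal sketch of that arithmetic, not part of the recipe: the helpers `full_name` and `full_name_fits` and the example stack name are assumptions made for illustration.

```shell
#!/bin/sh
# Limit taken from the commit message (Docker's 64 character limit).
MAX_LEN=64

# Hypothetical helper: builds "<stack>_<name>_<version>".
full_name() {
    printf '%s_%s_%s' "$1" "$2" "$3"
}

# Hypothetical helper: succeeds when the assembled name fits the limit.
# Leaves the assembled name in the global variable "full".
full_name_fits() {
    full=$(full_name "$1" "$2" "$3")
    [ "${#full}" -le "$MAX_LEN" ]
}

# Example stack name (an assumption, 40 characters long).
stack="monitoring_ng_staging_example_coop_cloud"
for name in grafana_oidc_client_secret gf_oidc_secret; do
    if full_name_fits "$stack" "$name" v1; then
        echo "$name: fits (${#full} characters)"
    else
        echo "$name: too long (${#full} characters)"
    fi
done
```

With this 40-character stack name, the old name `grafana_oidc_client_secret` yields 70 characters and the shortened `gf_oidc_secret` yields 58.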

@ -1,8 +1,8 @@
TYPE=monitoring-ng
LETS_ENCRYPT_ENV=production
COMPOSE_FILE=compose.yml
DOMAIN=monitoring.example.com
TIMEOUT=120
DOMAIN=monitoring-ng.example.com
#TIMEOUT=120
ENABLE_BACKUPS=true
## Enable this secret for Promtail / Prometheus
@ -39,19 +39,20 @@ ENABLE_BACKUPS=true
# LOKI_AWS_REGION=eu-west-1
# LOKI_ACCESS_KEY_ID=bush-debrief-approval-robust-scraggly-molecule
# LOKI_BUCKET_NAMES=loki
# SECRET_LOKI_AWS_SECRET_ACCESS_KEY_VERSION=v1
# SECRET_LOKI_AWS_KEY_VERSION=v1
#
## Grafana
#
# COMPOSE_FILE="$COMPOSE_FILE:compose.grafana.yml"
# GF_SERVER_ROOT_URL=https://monitoring.example.com
# SECRET_GRAFANA_ADMIN_PASSWORD_VERSION=v1
# SECRET_GF_ADMINPASSWD_VERSION=v1
## Separate domain for Grafana
#GRAFANA_DOMAIN=grafana.example.com
#
## Single-Sign-On with OIDC
# COMPOSE_FILE="$COMPOSE_FILE:compose.grafana-oidc.yml"
# OIDC_ENABLED=1
# SECRET_GRAFANA_OIDC_CLIENT_SECRET_VERSION=v1
# SECRET_GF_OIDC_SECRET_VERSION=v1
# OIDC_CLIENT_ID=grafana
# OIDC_AUTH_URL="https://authentik.example.com/application/o/authorize/"
# OIDC_API_URL="https://authentik.example.com/application/o/userinfo/"
@ -62,17 +63,18 @@ ENABLE_BACKUPS=true
# GF_INSTALL_PLUGINS=grafana-piechart-panel
#
## grafana SMTP configuration (optional)
# COMPOSE_FILE="$COMPOSE_FILE:compose.grafana-smtp.yml"
# GF_SMTP_HOST=changeme
# GF_SMTP_USER=changeme
# GF_SMTP_ENABLED=true
# GF_SMTP_FROM_ADDRESS=grafana@example.com
# GF_SMTP_SKIP_VERIFY=false
# SECRET_GRAFANA_SMTP_PASSWORD_VERSION=v1
# SECRET_GF_SMTP_PASSWD_VERSION=v1
#
## Grafana Matrix Contact Point (optional)
#COMPOSE_FILE="$COMPOSE_FILE:compose.matrix-alertmanager-receiver.yml"
#SECRET_MATRIX_ACCESS_TOKEN_VERSION=v1
#SECRET_MATRIX_TOKEN_VERSION=v1
#GF_MATRIX_USER_ID="<user-id>"
#GF_MATRIX_ROOM_ID="<room-id>"
#GF_MATRIX_HOMESERVER_URL="<homeserver-url>"

@ -36,7 +36,7 @@ Where gathering.org is the node you want to gather metrics from.
SECRET_USERSFILE_VERSION=v1
```
- Generate userslist with htpasswd hashed password
`abra app secret insert traefik.gathering.org userslist v1 'admin:<hashed-secret>'`
`abra app secret insert traefik.gathering.org usersfile v1 'admin:<hashed-secret>'`
make sure there is no whitespace in between `admin:<hashed-secret>`, it seems to break stuff...
- `abra app deploy -f traefik`
1. `abra app new monitoring-ng`
@ -145,7 +145,7 @@ COMPOSE_FILE="$COMPOSE_FILE:compose.matrix-alertmanager-receiver.yml"
2. Insert the matrix access token secret:
```
abra app secret insert monitoring.marx.klasse-methode.it matrix_access_token v1
abra app secret insert monitoring.marx.klasse-methode.it matrix_token v1
```
3. Set required configurations:

113
abra.sh

@ -1,27 +1,122 @@
export ENTRYPOINT_VERSION=v1
export GRAFANA_DATASOURCES_YML_VERSION=v1
export GRAFANA_DASHBOARDS_YML_VERSION=v2
export GRAFANA_SWARM_DASHBOARD_JSON_VERSION=v2
export GRAFANA_STACKS_DASHBOARD_JSON_VERSION=v2
export GRAFANA_TRAEFIK_DASHBOARD_JSON_VERSION=v2
export GRAFANA_BACKUP_DASHBOARD_JSON_VERSION=v1
export GRAFANA_ALERTS_JSON_VERSION=v3
export GRAFANA_CUSTOM_INI_VERSION=v4
export GF_DATASOURCES_VERSION=v1
export GF_DASHBOARDS_VERSION=v2
export GF_SWARM_DASH_VERSION=v2
export GF_STACKS_DASH_VERSION=v2
export GF_TRAEFIK_DASH_VERSION=v2
export GF_BACKUP_DASH_VERSION=v1
export GF_CUSTOM_INI_VERSION=v4
export PROMTAIL_YML_VERSION=v3
export LOKI_YML_VERSION=v2
export PROMETHEUS_YML_VERSION=v2
export MATRIX_ALERTMANAGER_CONFIG_VERSION=e
export MATRIX_ALERTMANAGER_ENTRYPOINT_VERSION=a
export GRAFANA_ALERTS_NODE_VERSION=v1c
# creates a default prometheus scrape config for a given node
add_node(){
name=$1
add_domain "$name" "$name:8082"
add_domain "$name" "metrics.traefik.$name"
add_domain "$name" "node.monitoring.$name"
add_domain "$name" "cadvisor.monitoring.$name"
cat "/prometheus/scrape_configs/$name.yml"
}
# migrates secrets from old names to new names by reading values from the
# running containers on the server and re-inserting them under the new names.
# preview changes: abra app cmd --local <app> migrate_secret_names
# execute changes: abra app cmd --local <app> migrate_secret_names execute
migrate_secret_names() {
if ! command -v jq &> /dev/null; then
echo "jq is required on your local machine to migrate secret names"
echo "It could not be found in your PATH, please install jq to proceed."
echo "For example: On a debian/ubuntu system, run 'apt install jq'"
exit 1
fi
# Hardcoded migration mappings: old_secret_name|new_secret_name
MIGRATIONS="
grafana_admin_password|gf_adminpasswd
grafana_smtp_password|gf_smtp_passwd
grafana_oidc_client_secret|gf_oidc_secret
matrix_access_token|matrix_token
loki_aws_secret_access_key|loki_aws_key
"
# Determine which server the app is deployed on
SERVER=$(abra app ls -m | jq -r --arg domain "$APP_NAME" '[.[].apps[] | select(.domain == $domain) | .server] | first' 2>/dev/null)
if [ -z "$SERVER" ]; then
echo "Error: could not determine server for app '$APP_NAME'"
exit 1
fi
# Build a lookup table of all secrets currently mounted in this stack.
# Each line: <secretID> <containerID> <secretName>
LOOKUP=$(ssh "$SERVER" "
docker stack services ${STACK_NAME} --format '{{.Name}}' | while read svc; do
CID=\$(docker ps --no-trunc -q --filter \"name=\${svc}\" | head -1)
docker service inspect \"\$svc\" --format '{{json .Spec.TaskTemplate.ContainerSpec.Secrets}}' | \
jq -r --arg cid \"\$CID\" '.[]? | .SecretID + \" \" + \$cid + \" \" + .SecretName'
done | sort -k3 -r
" 2>/dev/null)
echo "Secret migration plan for: $APP_NAME (server: $SERVER)"
echo ""
printf " %-24s %-8s %s\n" "OLD NAME" "FOUND" "ACTION"
printf " %-24s %-8s %s\n" "--------" "-----" "------"
# Check each old name against the lookup table and display the plan
ANY_FOUND=false
while IFS='|' read -r OLD_NAME NEW_NAME; do
[ -z "$OLD_NAME" ] && continue
MATCH=$(echo "$LOOKUP" | grep " ${STACK_NAME}_${OLD_NAME}_" | head -1)
if [ -n "$MATCH" ]; then
printf " %-24s %-8s %s\n" "$OLD_NAME" "yes" "recreate as '$NEW_NAME' version v1"
ANY_FOUND=true
else
printf " %-24s %-8s %s\n" "$OLD_NAME" "no" "nothing (not found on server)"
fi
done <<< "$MIGRATIONS"
echo ""
if [ "$ANY_FOUND" = false ]; then
echo "No old secrets found on server. Nothing to migrate."
return 0
fi
if [ "$1" != "execute" ]; then
echo "To apply the above changes, run:"
echo " abra app cmd --local $APP_NAME migrate_secret_names execute"
return 0
fi
# read each found secret from its container and re-insert with the new name
while IFS='|' read -r OLD_NAME NEW_NAME; do
[ -z "$OLD_NAME" ] && continue
MATCH=$(echo "$LOOKUP" | grep " ${STACK_NAME}_${OLD_NAME}_" | head -1)
[ -z "$MATCH" ] && continue
SECRET_ID=$(echo "$MATCH" | awk '{print $1}')
CID=$(echo "$MATCH" | awk '{print $2}')
SECRET_VALUE=$(ssh "$SERVER" "cat /var/lib/docker/containers/${CID}/mounts/secrets/${SECRET_ID} 2>/dev/null || sudo cat /var/lib/docker/containers/${CID}/mounts/secrets/${SECRET_ID} 2>/dev/null")
if [ -z "$SECRET_VALUE" ]; then
echo "Error: could not read value for '$OLD_NAME', skipping"
continue
fi
echo "Migrating: '$OLD_NAME' -> '$NEW_NAME' (v1)"
printf '%s' "$SECRET_VALUE" | abra app secret insert -C "$APP_NAME" "$NEW_NAME" v1
done <<< "$MIGRATIONS"
echo ""
echo "Done."
}
# adds a domain to a scrape config or creates a new one
add_domain(){
name=$1
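
The `migrate_secret_names` function above walks a pipe-separated `old|new` table with `while IFS='|' read -r`. The sketch below isolates just that lookup step so it can be tried without a server. It is an illustration only: `new_secret_name` is a hypothetical helper that does not exist in abra.sh, and it uses a heredoc instead of the here-string so that it also runs under a plain POSIX shell.

```shell
#!/bin/sh
# Mapping copied from migrate_secret_names: old_secret_name|new_secret_name
MIGRATIONS="
grafana_admin_password|gf_adminpasswd
grafana_smtp_password|gf_smtp_passwd
grafana_oidc_client_secret|gf_oidc_secret
matrix_access_token|matrix_token
loki_aws_secret_access_key|loki_aws_key
"

# Hypothetical helper: prints the new name for the old secret name in $1.
# Returns 1 when the old name is not listed in the table.
new_secret_name() {
    while IFS='|' read -r old new; do
        # Skip the blank lines at the start and end of the table.
        [ -z "$old" ] && continue
        if [ "$old" = "$1" ]; then
            printf '%s\n' "$new"
            return 0
        fi
    done <<EOF
$MIGRATIONS
EOF
    return 1
}

new_secret_name matrix_access_token
```

Setting `IFS='|'` only for the `read` command splits each line into the two fields without changing word splitting elsewhere in the script.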

@ -12,7 +12,7 @@ http:
matrix:
homeserver-url: "{{ env "GF_MATRIX_HOMESERVER_URL" }}"
user-id: "{{ env "GF_MATRIX_USER_ID" }}"
access-token: "{{ secret "matrix_access_token" }}"
access-token: "{{ secret "matrix_token" }}"
room-mapping:
matrixroom: "{{ env "GF_MATRIX_ROOM_ID" }}"

131
alerts/node.yml.tmpl Normal file

@ -0,0 +1,131 @@
apiVersion: 1
# List of alert rule UIDs that should be deleted
deleteRules:
{{ if ne (env "ALERT_NODE_DISK_SPACE_ENABLED") "true" }}
- orgId: 1
uid: bds8bhxu97pxca
{{ end }}
{{ if ne (env "ALERT_NODE_MEMORY_USAGE_ENABLED") "true" }}
- orgId: 1
uid: ads8cswmly96oa
{{ end }}
groups:
- orgId: 1
name: node
folder: node
interval: 5m
rules:
{{ if eq (env "ALERT_NODE_DISK_SPACE_ENABLED") "true" }}
- uid: bds8bhxu97pxca
title: Node Disk Space
condition: C
data:
- refId: A
relativeTimeRange:
from: 600
to: 0
datasourceUid: PBFA97CFB590B2093
model:
editorMode: code
expr: (node_filesystem_free_bytes{fstype="ext4"} / node_filesystem_size_bytes{fstype="ext4"}) * 100
instant: true
intervalMs: 1000
legendFormat: __auto
maxDataPoints: 43200
range: false
refId: A
- refId: C
relativeTimeRange:
from: 600
to: 0
datasourceUid: __expr__
model:
conditions:
- evaluator:
params:
- 10
type: lt
operator:
type: and
query:
params:
- C
reducer:
params: []
type: last
type: query
datasource:
type: __expr__
uid: __expr__
expression: A
intervalMs: 1000
maxDataPoints: 43200
refId: C
type: threshold
noDataState: NoData
execErrState: Error
for: 5m
annotations:
description: ""
runbook_url: ""
summary: Less than 10% disk space left on {{`{{ $labels.instance }}`}} ({{`{{ (index $values "A").Value }}`}}% left)
labels:
"": ""
isPaused: false
{{ end }}
{{ if eq (env "ALERT_NODE_MEMORY_USAGE_ENABLED") "true" }}
- uid: ads8cswmly96oa
title: Node Memory Usage
condition: C
data:
- refId: A
relativeTimeRange:
from: 600
to: 0
datasourceUid: PBFA97CFB590B2093
model:
editorMode: code
expr: (1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) * 100
instant: true
intervalMs: 1000
legendFormat: __auto
maxDataPoints: 43200
range: false
refId: A
- refId: C
relativeTimeRange:
from: 600
to: 0
datasourceUid: __expr__
model:
conditions:
- evaluator:
params:
- 85
type: gt
operator:
type: and
query:
params:
- C
reducer:
params: []
type: last
type: query
datasource:
type: __expr__
uid: __expr__
expression: A
intervalMs: 1000
maxDataPoints: 43200
refId: C
type: threshold
noDataState: NoData
execErrState: Error
for: 5m
annotations:
summary: Memory usage is above 85% on {{`{{ $labels.instance }}`}} ({{`{{ printf "%.2f" (index $values "A").Value }}`}}% usage)
isPaused: false
{{ end }}

17
compose.grafana-oidc.yml Normal file

@ -0,0 +1,17 @@
version: '3.8'
services:
grafana:
secrets:
- gf_oidc_secret
environment:
- OIDC_API_URL
- OIDC_AUTH_URL
- OIDC_CLIENT_ID
- OIDC_ENABLED
- OIDC_TOKEN_URL
secrets:
gf_oidc_secret:
external: true
name: ${STACK_NAME}_gf_oidc_secret_${SECRET_GF_OIDC_SECRET_VERSION}

18
compose.grafana-smtp.yml Normal file

@ -0,0 +1,18 @@
version: '3.8'
services:
grafana:
secrets:
- gf_smtp_passwd
environment:
- GF_SMTP_HOST
- GF_SMTP_USER
- GF_SMTP_PASSWORD__FILE=/run/secrets/gf_smtp_passwd
- GF_SMTP_ENABLED
- GF_SMTP_FROM_ADDRESS
- GF_SMTP_SKIP_VERIFY
secrets:
gf_smtp_passwd:
external: true
name: ${STACK_NAME}_gf_smtp_passwd_${SECRET_GF_SMTP_PASSWD_VERSION}

@ -6,48 +6,38 @@ services:
volumes:
- grafana-data:/var/lib/grafana:rw
secrets:
- grafana_admin_password
- grafana_oidc_client_secret
- grafana_smtp_password
- gf_adminpasswd
configs:
- source: grafana_custom_ini
- source: gf_custom_ini
target: /etc/grafana/grafana.ini
- source: grafana_datasources_yml
- source: gf_datasources
target: /etc/grafana/provisioning/datasources/datasources.yml
- source: grafana_dashboards_yml
- source: gf_dashboards
target: /etc/grafana/provisioning/dashboards/dashboards.yml
- source: grafana_swarm_dashboard_json
- source: gf_swarm_dash
target: /var/lib/grafana/dashboards/docker-swarm-nodes.json
- source: grafana_stacks_dashboard_json
- source: gf_stacks_dash
target: /var/lib/grafana/dashboards/docker-swarm-stacks.json
- source: grafana_traefik_dashboard_json
- source: gf_traefik_dash
target: /var/lib/grafana/dashboards/traefik.json
- source: grafana_backup_dashboard_json
- source: gf_backup_dash
target: /var/lib/grafana/dashboards/backup.json
- source: grafana_alerts_json
target: /var/lib/grafana/alerts/alerts.json
- source: gf_alerts_node
target: /etc/grafana/provisioning/alerting/node.yml
networks:
- proxy
- internal
environment:
- GF_SERVER_ROOT_URL
- GF_SECURITY_ADMIN_PASSWORD__FILE=/run/secrets/grafana_admin_password
- GF_SMTP_HOST
- GF_SMTP_USER
- GF_SMTP_PASSWORD__FILE=/run/secrets/grafana_smtp_password
- GF_SMTP_ENABLED
- GF_SMTP_FROM_ADDRESS
- GF_SMTP_SKIP_VERIFY
- GF_SECURITY_ADMIN_PASSWORD__FILE=/run/secrets/gf_adminpasswd
- GF_SECURITY_ALLOW_EMBEDDING
- GF_INSTALL_PLUGINS
- OIDC_API_URL
- OIDC_AUTH_URL
- OIDC_CLIENT_ID
- OIDC_ENABLED
- OIDC_TOKEN_URL
- ALERT_NODE_DISK_SPACE_ENABLED
- ALERT_NODE_MEMORY_USAGE_ENABLED
deploy:
labels:
- "traefik.enable=true"
- "traefik.docker.network=proxy"
- "traefik.http.services.${STACK_NAME}-grafana.loadbalancer.server.port=3000"
- "traefik.http.routers.${STACK_NAME}-grafana.rule=Host(`${GRAFANA_DOMAIN:-$DOMAIN}`)"
- "traefik.http.routers.${STACK_NAME}-grafana.entrypoints=web-secure"
@ -61,44 +51,38 @@ services:
start_period: 10s
configs:
grafana_custom_ini:
gf_custom_ini:
template_driver: golang
name: ${STACK_NAME}_grafana_custom_ini_${GRAFANA_CUSTOM_INI_VERSION}
name: ${STACK_NAME}_gf_custom_ini_${GF_CUSTOM_INI_VERSION}
file: grafana_custom.ini
grafana_datasources_yml:
name: ${STACK_NAME}_g_datasources_yml_${GRAFANA_DATASOURCES_YML_VERSION}
gf_datasources:
name: ${STACK_NAME}_gf_datasources_${GF_DATASOURCES_VERSION}
file: grafana-datasources.yml
grafana_dashboards_yml:
name: ${STACK_NAME}_g_dashboards_yml_${GRAFANA_DASHBOARDS_YML_VERSION}
gf_dashboards:
name: ${STACK_NAME}_gf_dashboards_${GF_DASHBOARDS_VERSION}
file: grafana-dashboards.yml
grafana_swarm_dashboard_json:
name: ${STACK_NAME}_g_swarm_dashboard_json_${GRAFANA_SWARM_DASHBOARD_JSON_VERSION}
gf_swarm_dash:
name: ${STACK_NAME}_gf_swarm_dash_${GF_SWARM_DASH_VERSION}
file: grafana-swarm-dashboard.json
grafana_stacks_dashboard_json:
name: ${STACK_NAME}_g_stacks_dashboard_json_${GRAFANA_STACKS_DASHBOARD_JSON_VERSION}
gf_stacks_dash:
name: ${STACK_NAME}_gf_stacks_dash_${GF_STACKS_DASH_VERSION}
file: grafana-stacks-dashboard.json
grafana_traefik_dashboard_json:
name: ${STACK_NAME}_g_traefik_dashboard_json_${GRAFANA_TRAEFIK_DASHBOARD_JSON_VERSION}
gf_traefik_dash:
name: ${STACK_NAME}_gf_traefik_dash_${GF_TRAEFIK_DASH_VERSION}
file: grafana-traefik-dashboard.json
grafana_backup_dashboard_json:
name: ${STACK_NAME}_g_backup_dashboard_json_${GRAFANA_BACKUP_DASHBOARD_JSON_VERSION}
gf_backup_dash:
name: ${STACK_NAME}_gf_backup_dash_${GF_BACKUP_DASH_VERSION}
file: grafana-backup-dashboard.json
grafana_alerts_json:
gf_alerts_node:
template_driver: golang
name: ${STACK_NAME}_g_alerts_json_${GRAFANA_ALERTS_JSON_VERSION}
file: grafana-alerts.json.tmpl
name: ${STACK_NAME}_gf_alerts_node_${GRAFANA_ALERTS_NODE_VERSION}
file: alerts/node.yml.tmpl
volumes:
grafana-data:
secrets:
grafana_admin_password:
gf_adminpasswd:
external: true
name: ${STACK_NAME}_grafana_admin_password_${SECRET_GRAFANA_ADMIN_PASSWORD_VERSION}
grafana_oidc_client_secret:
external: true
name: ${STACK_NAME}_grafana_oidc_client_secret_${SECRET_GRAFANA_OIDC_CLIENT_SECRET_VERSION}
grafana_smtp_password:
external: true
name: ${STACK_NAME}_grafana_smtp_password_${SECRET_GRAFANA_SMTP_PASSWORD_VERSION}
name: ${STACK_NAME}_gf_adminpasswd_${SECRET_GF_ADMINPASSWD_VERSION}

@ -12,7 +12,7 @@ services:
volumes:
- loki-data:/loki
# secrets:
# - loki_aws_secret_access_key
# - loki_aws_key
environment:
- LOKI_ACCESS_KEY_ID
- LOKI_AWS_ENDPOINT
@ -27,6 +27,7 @@ services:
condition: on-failure
labels:
- "traefik.enable=true"
- "traefik.docker.network=proxy"
- "traefik.http.services.${STACK_NAME}-loki.loadbalancer.server.port=3100"
- "traefik.http.routers.${STACK_NAME}-loki.rule=Host(`loki.${DOMAIN}`)"
- "traefik.http.routers.${STACK_NAME}-loki.entrypoints=web-secure"
@ -46,6 +47,6 @@ volumes:
loki-data:
# secrets:
# loki_aws_secret_access_key:
# loki_aws_key:
# external: true
# name: ${STACK_NAME}_loki_aws_secret_access_key_${SECRET_LOKI_AWS_SECRET_ACCESS_KEY_VERSION}
# name: ${STACK_NAME}_loki_aws_key_${SECRET_LOKI_AWS_KEY_VERSION}

@ -4,7 +4,7 @@ services:
matrix-alertmanager-receiver:
image: metio/matrix-alertmanager-receiver:2025.2.9
secrets:
- matrix_access_token
- matrix_token
configs:
- source: matrix-alertmanager-receiver-config
target: /etc/matrix-alertmanager-receiver/config.yml
@ -23,6 +23,6 @@ configs:
file: alertmanager-matrix-config.yml.tmpl
secrets:
matrix_access_token:
matrix_token:
external: true
name: ${STACK_NAME}_matrix_access_token_${SECRET_MATRIX_ACCESS_TOKEN_VERSION}
name: ${STACK_NAME}_matrix_token_${SECRET_MATRIX_TOKEN_VERSION}

@ -24,12 +24,12 @@ services:
condition: on-failure
labels:
- "traefik.enable=true"
- "traefik.docker.network=proxy"
- "traefik.http.services.${STACK_NAME}-prometheus.loadbalancer.server.port=9090"
- "traefik.http.routers.${STACK_NAME}-prometheus.rule=Host(`prometheus.${DOMAIN}`)"
- "traefik.http.routers.${STACK_NAME}-prometheus.entrypoints=web-secure"
- "traefik.http.routers.${STACK_NAME}-prometheus.tls=true"
- "traefik.http.routers.${STACK_NAME}-prometheus.tls.certresolver=${LETS_ENCRYPT_ENV}"
- "traefik.http.routers.${STACK_NAME}-prometheus.middlewares=basicauth@file"
configs:
prometheus_yml:

@ -17,6 +17,7 @@ services:
condition: on-failure
labels:
- "traefik.enable=true"
- "traefik.docker.network=proxy"
- "traefik.http.services.${STACK_NAME}-pushgateway.loadbalancer.server.port=9191"
- "traefik.http.routers.${STACK_NAME}-pushgateway.rule=Host(`pushgateway.${DOMAIN}`)"
- "traefik.http.routers.${STACK_NAME}-pushgateway.entrypoints=web-secure"

@ -32,6 +32,7 @@ services:
labels:
- "backupbot.backup=${ENABLE_BACKUPS:-true}"
- "traefik.enable=true"
- "traefik.docker.network=proxy"
- "traefik.http.services.${STACK_NAME}-node.loadbalancer.server.port=9100"
- "traefik.http.routers.${STACK_NAME}-node.rule=Host(`node.${DOMAIN}`)"
- "traefik.http.routers.${STACK_NAME}-node.entrypoints=web-secure"
@ -39,7 +40,7 @@ services:
- "traefik.http.routers.${STACK_NAME}-node.tls.certresolver=${LETS_ENCRYPT_ENV}"
- "traefik.http.routers.${STACK_NAME}-node.middlewares=basicauth@file"
- "coop-cloud.${STACK_NAME}.version=1.6.0+v1.8.1"
- "coop-cloud.${STACK_NAME}.timeout=${TIMEOUT:-120}"
- "coop-cloud.${STACK_NAME}.timeout=${TIMEOUT}"
cadvisor:
image: gcr.io/cadvisor/cadvisor:v0.49.2
@ -63,6 +64,7 @@ services:
condition: on-failure
labels:
- "traefik.enable=true"
- "traefik.docker.network=proxy"
- "traefik.http.services.${STACK_NAME}-cadvisor.loadbalancer.server.port=8080"
- "traefik.http.routers.${STACK_NAME}-cadvisor.rule=Host(`cadvisor.${DOMAIN}`)"
- "traefik.http.routers.${STACK_NAME}-cadvisor.entrypoints=web-secure"

@ -216,7 +216,7 @@
"datasourceUid": "PBFA97CFB590B2093",
"model": {
"editorMode": "code",
"expr": "(node_filesystem_free_bytes{fstype=\"ext4\",mountpoint=~\"(/$)|(/media.*)\"} / node_filesystem_size_bytes{fstype=\"ext4\",mountpoint=~\"(/$)|(/media.*)\"}) * 100",
"expr": "(node_filesystem_free_bytes{fstype=\"ext4\"} / node_filesystem_size_bytes{fstype=\"ext4\"}) * 100",
"instant": true,
"intervalMs": 1000,
"legendFormat": "__auto",

@ -93,7 +93,6 @@
},
"hideTimeOverride": true,
"id": 2,
"links": [],
"maxDataPoints": 100,
"options": {
"colorMode": "value",
@ -107,10 +106,12 @@
"fields": "",
"values": false
},
"showPercentChange": false,
"text": {},
"textMode": "auto"
"textMode": "auto",
"wideLayout": true
},
"pluginVersion": "10.0.2",
"pluginVersion": "10.4.14",
"targets": [
{
"datasource": {
@ -172,7 +173,6 @@
"y": 0
},
"id": 1,
"links": [],
"maxDataPoints": 100,
"options": {
"colorMode": "value",
@ -186,10 +186,12 @@
"fields": "",
"values": false
},
"showPercentChange": false,
"text": {},
"textMode": "auto"
"textMode": "auto",
"wideLayout": true
},
"pluginVersion": "10.0.2",
"pluginVersion": "10.4.14",
"targets": [
{
"datasource": {
@ -251,7 +253,6 @@
},
"hideTimeOverride": true,
"id": 4,
"links": [],
"maxDataPoints": 100,
"options": {
"colorMode": "value",
@ -265,10 +266,12 @@
"fields": "",
"values": false
},
"showPercentChange": false,
"text": {},
"textMode": "auto"
"textMode": "auto",
"wideLayout": true
},
"pluginVersion": "10.0.2",
"pluginVersion": "10.4.14",
"targets": [
{
"datasource": {
@ -335,9 +338,10 @@
"y": 0
},
"id": 8,
"links": [],
"maxDataPoints": 100,
"options": {
"minVizHeight": 75,
"minVizWidth": 75,
"orientation": "horizontal",
"reduceOptions": {
"calcs": [
@ -348,9 +352,10 @@
},
"showThresholdLabels": false,
"showThresholdMarkers": true,
"sizing": "auto",
"text": {}
},
"pluginVersion": "10.0.2",
"pluginVersion": "10.4.14",
"targets": [
{
"datasource": {
@ -405,13 +410,12 @@
},
"lines": true,
"linewidth": 1,
"links": [],
"nullPointMode": "null",
"options": {
"alertThreshold": true
},
"percentage": false,
"pluginVersion": "10.0.2",
"pluginVersion": "10.4.14",
"pointradius": 5,
"points": false,
"renderer": "flot",
@ -507,7 +511,6 @@
},
"hideTimeOverride": true,
"id": 3,
"links": [],
"maxDataPoints": 100,
"options": {
"colorMode": "value",
@ -521,10 +524,12 @@
"fields": "",
"values": false
},
"showPercentChange": false,
"text": {},
"textMode": "auto"
"textMode": "auto",
"wideLayout": true
},
"pluginVersion": "10.0.2",
"pluginVersion": "10.4.14",
"targets": [
{
"datasource": {
@ -585,7 +590,6 @@
},
"hideTimeOverride": true,
"id": 9,
"links": [],
"maxDataPoints": 100,
"options": {
"colorMode": "value",
@ -599,10 +603,12 @@
"fields": "",
"values": false
},
"showPercentChange": false,
"text": {},
"textMode": "auto"
"textMode": "auto",
"wideLayout": true
},
"pluginVersion": "10.0.2",
"pluginVersion": "10.4.14",
"targets": [
{
"datasource": {
@ -671,9 +677,10 @@
},
"hideTimeOverride": true,
"id": 11,
"links": [],
"maxDataPoints": 100,
"options": {
"minVizHeight": 75,
"minVizWidth": 75,
"orientation": "horizontal",
"reduceOptions": {
"calcs": [
@ -684,9 +691,10 @@
},
"showThresholdLabels": false,
"showThresholdMarkers": true,
"sizing": "auto",
"text": {}
},
"pluginVersion": "10.0.2",
"pluginVersion": "10.4.14",
"targets": [
{
"datasource": {
@ -713,7 +721,39 @@
"fieldConfig": {
"defaults": {
"color": {
"mode": "thresholds"
"mode": "palette-classic"
},
"custom": {
"axisBorderShow": false,
"axisCenteredZero": false,
"axisColorMode": "text",
"axisLabel": "",
"axisPlacement": "left",
"barAlignment": 0,
"drawStyle": "line",
"fillOpacity": 0,
"gradientMode": "none",
"hideFrom": {
"legend": false,
"tooltip": false,
"viz": false
},
"insertNulls": false,
"lineInterpolation": "linear",
"lineWidth": 1,
"pointSize": 5,
"scaleDistribution": {
"type": "linear"
},
"showPoints": "auto",
"spanNulls": false,
"stacking": {
"group": "A",
"mode": "none"
},
"thresholdsStyle": {
"mode": "off"
}
},
"mappings": [
{
@ -747,33 +787,42 @@
},
"unit": "percent"
},
"overrides": []
"overrides": [
{
"matcher": {
"id": "byType",
"options": "time"
},
"properties": [
{
"id": "custom.axisPlacement",
"value": "hidden"
}
]
}
]
},
"gridPos": {
"h": 4,
"w": 2.6666666666666665,
"h": 6,
"w": 6,
"x": 0,
"y": 8
},
"id": 10,
"links": [],
"maxDataPoints": 100,
"maxPerRow": 12,
"options": {
"colorMode": "value",
"graphMode": "area",
"justifyMode": "auto",
"orientation": "horizontal",
"reduceOptions": {
"calcs": [
"last"
],
"fields": "",
"values": false
"legend": {
"calcs": [],
"displayMode": "list",
"placement": "bottom",
"showLegend": true
},
"textMode": "auto"
"tooltip": {
"mode": "single",
"sort": "none"
}
},
"pluginVersion": "10.0.2",
"pluginVersion": "10.4.14",
"repeat": "node_id",
"repeatDirection": "h",
"targets": [
@ -782,18 +831,20 @@
"type": "prometheus",
"uid": "PBFA97CFB590B2093"
},
"editorMode": "code",
"exemplar": true,
"expr": "sum((node_filesystem_free_bytes{fstype=\"ext4\",mountpoint=~\"(/$)|(/media.*)\"} / node_filesystem_size_bytes{fstype=\"ext4\",mountpoint=~\"(/$)|(/media.*)\"}) * on(instance) group_left(node_name) node_meta{node_name=~\"$node_id\"} * 100) / count(node_meta * on(instance) group_left(node_name) node_meta{node_name=~\"$node_id\"})",
"expr": "node_filesystem_free_bytes{fstype=\"ext4\"} / node_filesystem_size_bytes{fstype=\"ext4\"} * on(instance) group_left(node_name) node_meta{node_name=~\"$node_id\"} * 100",
"format": "time_series",
"interval": "",
"intervalFactor": 2,
"legendFormat": "",
"legendFormat": "{{mountpoint}}",
"range": true,
"refId": "A",
"step": 20
}
],
"title": "Available Disk Space $node_id",
"type": "stat"
"type": "timeseries"
},
{
"aliasColors": {},
@ -811,7 +862,7 @@
"h": 7,
"w": 24,
"x": 0,
"y": 12
"y": 14
},
"hiddenSeries": false,
"id": 14,
@ -830,13 +881,12 @@
},
"lines": true,
"linewidth": 1,
"links": [],
"nullPointMode": "null as zero",
"options": {
"alertThreshold": true
},
"percentage": false,
"pluginVersion": "10.0.2",
"pluginVersion": "10.4.14",
"pointradius": 5,
"points": false,
"renderer": "flot",
@ -900,6 +950,7 @@
"mode": "palette-classic"
},
"custom": {
"axisBorderShow": false,
"axisCenteredZero": false,
"axisColorMode": "text",
"axisLabel": "",
@ -913,6 +964,7 @@
"tooltip": false,
"viz": false
},
"insertNulls": false,
"lineInterpolation": "linear",
"lineWidth": 1,
"pointSize": 5,
@ -961,12 +1013,11 @@
},
"gridPos": {
"h": 7,
"w": 2.6666666666666665,
"w": 6,
"x": 0,
"y": 19
"y": 21
},
"id": 15,
"links": [],
"maxPerRow": 12,
"options": {
"legend": {
@ -1074,7 +1125,7 @@
"h": 7,
"w": 24,
"x": 0,
"y": 26
"y": 28
},
"hiddenSeries": false,
"id": 16,
@ -1091,13 +1142,12 @@
},
"lines": true,
"linewidth": 1,
"links": [],
"nullPointMode": "null as zero",
"options": {
"alertThreshold": true
},
"percentage": false,
"pluginVersion": "10.0.2",
"pluginVersion": "10.4.14",
"pointradius": 5,
"points": false,
"renderer": "flot",
@ -1178,7 +1228,7 @@
"h": 7,
"w": 12,
"x": 0,
"y": 33
"y": 35
},
"hiddenSeries": false,
"id": 18,
@ -1195,7 +1245,6 @@
},
"lines": true,
"linewidth": 1,
"links": [],
"nullPointMode": "null as zero",
"options": {
"alertThreshold": true
@ -1281,7 +1330,7 @@
"h": 7,
"w": 12,
"x": 12,
"y": 33
"y": 35
},
"hiddenSeries": false,
"id": 19,
@ -1300,7 +1349,6 @@
},
"lines": true,
"linewidth": 1,
"links": [],
"nullPointMode": "null as zero",
"options": {
"alertThreshold": true
@ -1376,7 +1424,7 @@
"h": 7,
"w": 18,
"x": 0,
"y": 40
"y": 42
},
"hiddenSeries": false,
"id": 12,
@ -1397,7 +1445,6 @@
},
"lines": true,
"linewidth": 1,
"links": [],
"nullPointMode": "null",
"options": {
"alertThreshold": true
@ -1499,10 +1546,9 @@
"h": 7,
"w": 6,
"x": 18,
"y": 40
"y": 42
},
"id": 7,
"links": [],
"maxDataPoints": 100,
"options": {
"colorMode": "none",
@ -1600,10 +1646,9 @@
"h": 7,
"w": 24,
"x": 0,
"y": 47
"y": 49
},
"id": 17,
"links": [],
"options": {
"legend": {
"calcs": [],
@ -1658,7 +1703,7 @@
"h": 9,
"w": 24,
"x": 0,
"y": 54
"y": 56
},
"id": 30,
"options": {
@ -1688,8 +1733,7 @@
}
],
"refresh": "",
"schemaVersion": 38,
"style": "dark",
"schemaVersion": 39,
"tags": [
"swarmprom",
"prometheus",
@ -1836,6 +1880,6 @@
"timezone": "",
"title": "Docker Swarm Nodes",
"uid": "BPlb-Sgik",
"version": 24,
"version": 7,
"weekStart": ""
}
}

@ -21,7 +21,7 @@ tls_skip_verify_insecure = false
allow_sign_up = true
auto_login = true
client_id = {{ env "OIDC_CLIENT_ID" }}
client_secret = {{ secret "grafana_oidc_client_secret" }}
client_secret = {{ secret "gf_oidc_secret" }}
auth_url = {{ env "OIDC_AUTH_URL" }}
token_url = {{ env "OIDC_TOKEN_URL" }}
api_url = {{ env "OIDC_API_URL" }}

@ -89,7 +89,7 @@ storage_config:
endpoint: {{ env "LOKI_AWS_ENDPOINT" }}
region: {{ env "LOKI_AWS_REGION" }}
access_key_id: {{ env "LOKI_ACCESS_KEY_ID" }}
secret_access_key: {{ secret "loki_aws_secret_access_key" }}
secret_access_key: {{ secret "loki_aws_key" }}
bucketnames: {{ env "LOKI_BUCKET_NAMES" }}
insecure: false
sse_encryption: false

12
release/next Normal file

@ -0,0 +1,12 @@
1. OIDC was moved into a separate compose file. If you have OIDC configured, you need to add the following line to your .env file:
COMPOSE_FILE="$COMPOSE_FILE:compose.grafana-oidc.yml"
2. SMTP was moved into a separate compose file. If you have SMTP configured, you need to add the following line to your .env file:
COMPOSE_FILE="$COMPOSE_FILE:compose.grafana-smtp.yml"
3. The scrape-config.example.yml file and the add_node() command were updated to use a secure HTTPS endpoint for the Traefik metrics instead of HTTP. This requires an updated Traefik recipe that publishes the metrics over HTTPS.
4. Secret and config names were shortened to at most 14 characters so that the full name stays within Docker's 64-character limit once STACK_NAME and VERSION are added to it.
When upgrading, you need to reinsert the secrets under their shorter names. Run `abra app secret list <domain>` to see which secrets are not yet created on the server (because their name was shortened), then run `abra app secret insert <domain> <secret_name> v1 <value>` to reinsert each one under its shorter name. Alternatively, the migrate_secret_names function in abra.sh can reinsert all existing secrets under their shorter names automatically: `abra app cmd --local <domain> migrate_secret_names` prints the migration plan, and `abra app cmd --local <domain> migrate_secret_names execute` applies it.
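
The migration command is a dry run unless `execute` is passed as its first argument. A minimal sketch of that preview/execute switch follows, assuming a hypothetical stand-in function `run_migration` that only prints messages and never touches secrets.

```shell
#!/bin/sh
# Hypothetical stand-in for migrate_secret_names: shows the plan first and
# only "applies" it when the first argument is the literal word "execute".
run_migration() {
    echo "plan: matrix_access_token -> matrix_token"
    if [ "${1:-}" != "execute" ]; then
        echo "preview only; pass 'execute' to apply"
        return 0
    fi
    echo "applied"
}

run_migration           # preview
run_migration execute   # apply
```

Defaulting to the preview means a mistyped or forgotten argument cannot change anything on the server.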

@ -1,4 +1,4 @@
- targets:
- 'example.org:8082'
- 'metrics.traefik.example.org'
- 'node.monitoring.example.org'
- 'cadvisor.monitoring.example.org'
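
The updated example lists three HTTPS hostnames per node, matching the `add_domain` calls in `add_node()`. As a rough sketch of how those targets follow from a single node name, the hypothetical helper `node_targets` below prints them in the same shape. It is not the recipe's `add_node`, and the two-space indentation is an assumption.

```shell
#!/bin/sh
# Hypothetical helper (not the recipe's add_node): prints the scrape
# targets for the node name given in $1.
node_targets() {
    printf '%s\n' "- targets:"
    for prefix in metrics.traefik node.monitoring cadvisor.monitoring; do
        printf "  - '%s.%s'\n" "$prefix" "$1"
    done
}

node_targets example.org
```

The old `example.org:8082` target is intentionally absent, since the release notes state that the Traefik metrics are now scraped over HTTPS.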