Compare commits

15 Commits

Author SHA1 Message Date
8cdc3bf744 Add instructions for shorter secret names to release notes 2026-03-20 22:00:06 +01:00
cda5c89033 Shortened all the secret and config names to max 14 characters to prevent running into Docker's 64 character limit when STACK_NAME is appended to it. 2026-03-20 21:35:10 +01:00
310c28e735 refactor: provision alerts instead of putting them in the /var/lib folder (#16)
Note that I did not copy the backupbot alert since this one gets a rework soon

Reviewed-on: coop-cloud/monitoring-ng#16
Co-authored-by: p4u1 <p4u1_f4u1@riseup.net>
Co-committed-by: p4u1 <p4u1_f4u1@riseup.net>
2026-03-20 14:10:10 +00:00
16bd65f417 fix recipe part in the domain (#8)
I created a new app using this recipe and the domain wasn't automatically replaced, I'm guessing cause the part before the root domain didn't match the recipe name?

Just opening a PR real quick so I can get back to it and test the fix later when I have cycles

Co-authored-by: p4u1 <p4u1@noreply.git.coopcloud.tech>
Reviewed-on: coop-cloud/monitoring-ng#8
Reviewed-by: p4u1 <p4u1@noreply.git.coopcloud.tech>
Co-authored-by: ammaratef45 <ammaratef45@proton.me>
Co-committed-by: ammaratef45 <ammaratef45@proton.me>
2026-03-20 09:23:36 +00:00
97ebcf306a add all mountpoints to free disk space in Docker Swarm dashboard (#4)
Until now, only / and /media were monitored in the Docker Swarm dashboard. We removed the filters and changed the dashboard to a time series, so multiple mounts can be shown at once.
We also updated the alert, so it also triggers on all ext4 mount points.

Reviewed-on: coop-cloud/monitoring-ng#4
Co-authored-by: Apfelwurm <Alexander@volzit.de>
Co-committed-by: Apfelwurm <Alexander@volzit.de>
2026-03-20 09:15:52 +00:00
f93370b9ca Moves oidc to a separate compose config (#6)
Otherwise the secret has to be provided when oidc is not used

Reviewed-on: coop-cloud/monitoring-ng#6
Co-authored-by: p4u1 <p4u1_f4u1@riseup.net>
Co-committed-by: p4u1 <p4u1_f4u1@riseup.net>
2026-03-20 09:10:48 +00:00
83461e2e76 remove default TIMEOUT (abra #596) 2025-12-30 13:53:47 +01:00
7dbe5bf22e fix: Removes duplicate basic auth from prometheus and a few other improvements 2025-02-21 18:31:54 +01:00
89b5fef6ac chore: publish 1.6.0+v1.8.1 release 2025-02-21 17:23:06 +01:00
cd42c64544 feat: Adds option to expose ports for node and cadvisor service 2025-02-21 17:21:46 +01:00
70719dbee8 chore: publish 1.5.0+v1.8.1 release 2025-02-18 16:04:21 +01:00
8900ace6a2 feat: Adds matrix contact point for grafana alerts 2025-02-18 16:03:47 +01:00
8b464156bd chore: publish 1.4.0+v1.8.1 release 2025-02-17 12:18:39 +01:00
73c4ec3e74 feat: Adds optional GRAFANA_DOMAIN 2025-02-17 12:16:29 +01:00
225899785b Merge pull request 'feat: Adds dashboard and alerts for backupbot' (#2) from backup-dashboard into main
Reviewed-on: coop-cloud/monitoring-ng#2
2025-01-14 13:35:29 +00:00
22 changed files with 511 additions and 141 deletions

View File

@ -1,8 +1,8 @@
TYPE=monitoring-ng
LETS_ENCRYPT_ENV=production
COMPOSE_FILE=compose.yml
DOMAIN=monitoring.example.com
TIMEOUT=120
DOMAIN=monitoring-ng.example.com
#TIMEOUT=120
ENABLE_BACKUPS=true
## Enable this secret for Promtail / Prometheus
@ -12,6 +12,9 @@ ENABLE_BACKUPS=true
# COMPOSE_FILE="$COMPOSE_FILE:compose.promtail.yml"
# LOKI_PUSH_URL=https://loki.monitoring.example.org/loki/api/v1/push
## Expose node and cadvisor ports instead of traefik
# COMPOSE_FILE="$COMPOSE_FILE:compose.expose-ports.yml"
# Monitoring Server
#
## Prometheus
@ -36,17 +39,20 @@ ENABLE_BACKUPS=true
# LOKI_AWS_REGION=eu-west-1
# LOKI_ACCESS_KEY_ID=bush-debrief-approval-robust-scraggly-molecule
# LOKI_BUCKET_NAMES=loki
# SECRET_LOKI_AWS_SECRET_ACCESS_KEY_VERSION=v1
# SECRET_LOKI_AWS_KEY_VERSION=v1
#
## Grafana
#
# COMPOSE_FILE="$COMPOSE_FILE:compose.grafana.yml"
# GF_SERVER_ROOT_URL=https://monitoring.example.com
# SECRET_GRAFANA_ADMIN_PASSWORD_VERSION=v1
# SECRET_GF_ADMINPASSWD_VERSION=v1
## Separate domain for Grafana
#GRAFANA_DOMAIN=grafana.example.com
#
## Single-Sign-On with OIDC
# COMPOSE_FILE="$COMPOSE_FILE:compose.grafana-oidc.yml"
# OIDC_ENABLED=1
# SECRET_GRAFANA_OIDC_CLIENT_SECRET_VERSION=v1
# SECRET_GF_OIDC_SECRET_VERSION=v1
# OIDC_CLIENT_ID=grafana
# OIDC_AUTH_URL="https://authentik.example.com/application/o/authorize/"
# OIDC_API_URL="https://authentik.example.com/application/o/userinfo/"
@ -57,14 +63,22 @@ ENABLE_BACKUPS=true
# GF_INSTALL_PLUGINS=grafana-piechart-panel
#
## grafana SMTP configuration (optional)
# COMPOSE_FILE="$COMPOSE_FILE:compose.grafana-smtp.yml"
# GF_SMTP_HOST=changeme
# GF_SMTP_USER=changeme
# GF_SMTP_ENABLED=true
# GF_SMTP_FROM_ADDRESS=grafana@example.com
# GF_SMTP_SKIP_VERIFY=false
# SECRET_GRAFANA_SMTP_PASSWORD_VERSION=v1
# SECRET_GF_SMTP_PASSWD_VERSION=v1
#
## Grafana Matrix Contact Point (optional)
#COMPOSE_FILE="$COMPOSE_FILE:compose.matrix-alertmanager-receiver.yml"
#SECRET_MATRIX_TOKEN_VERSION=v1
#GF_MATRIX_USER_ID="<user-id>"
#GF_MATRIX_ROOM_ID="<room-id>"
#GF_MATRIX_HOMESERVER_URL="<homeserver-url>"
# Alerts
#ALERT_BACKUP_FAILED_ENABLED=true
#ALERT_BACKUP_MISSING_ENABLED=true

View File

@ -36,7 +36,7 @@ Where gathering.org is the node you want to gather metrics from.
SECRET_USERSFILE_VERSION=v1
```
- Generate userslist with htpasswd hashed password
`abra app secret insert traefik.gathering.org userslist v1 'admin:<hashed-secret>'`
`abra app secret insert traefik.gathering.org usersfile v1 'admin:<hashed-secret>'`
make sure there is no whitespace in between `admin:<hashed-secret>`, it seems to break stuff...
- `abra app deploy -f traefik`
1. `abra app new monitoring-ng`
@ -47,6 +47,13 @@ Where gathering.org is the node you want to gather metrics from.
- cadvisor.monitoring.gathering.org
- node.monitoring.gathering.org
### Expose node and cadvisor via ports instead of traefik
In case you have no traefik running on the machine, you can expose the ports directly by uncommenting the following line:
```
# COMPOSE_FILE="$COMPOSE_FILE:compose.expose-ports.yml"
```
## Setup Metrics Browser
@ -85,7 +92,6 @@ Where gathering.org is the node you want to gather metrics from.
| Cadvisor | traefik basic-auth | cadvisor.monitoring.example.org |
| Node Exporter | traefik basic-auth | node.monitoring.example.org |
### Logging from a docker host to loki server without anything else
```
@ -130,6 +136,26 @@ After that you need to add the `pushgateway.${DOMAIN}` to the scrape config.
THX to the previous work of @decentral1se @knooflok @3wc @cellarspoon @mirsal
## Adding Matrix as Alert Contact point
1. Enable the [matrix-alertmanager-receiver](https://github.com/metio/matrix-alertmanager-receiver/):
```
COMPOSE_FILE="$COMPOSE_FILE:compose.matrix-alertmanager-receiver.yml"
```
2. Insert the matrix access token secret:
```
abra app secret insert monitoring.marx.klasse-methode.it matrix_token v1
```
3. Set required configurations:
```
GF_MATRIX_USER_ID=
GF_MATRIX_ROOM_ID=
GF_MATRIX_HOMESERVER_URL=
```
4. Configure Alertmanager webhook and set the url to `http://matrix-alertmanager-receiver:12345/alerts/<room-id>`
## alerts
It is possible to enable the following alerts, by setting the corresponding env variable to `true`:
@ -138,3 +164,5 @@ It is possible to enable the following alerts, by setting the corresponding env
- backupbot not successfull: `ALERT_BACKUP_NOT_SUCCESSFULL_ENABLED`
- node disk space: `ALERT_NODE_DISK_SPACE_ENABLED`
- node memory usage: `ALERT_NODE_MEMORY_USAGE_ENABLED`

18
abra.sh
View File

@ -1,15 +1,17 @@
export ENTRYPOINT_VERSION=v1
export GRAFANA_DATASOURCES_YML_VERSION=v1
export GRAFANA_DASHBOARDS_YML_VERSION=v2
export GRAFANA_SWARM_DASHBOARD_JSON_VERSION=v2
export GRAFANA_STACKS_DASHBOARD_JSON_VERSION=v2
export GRAFANA_TRAEFIK_DASHBOARD_JSON_VERSION=v2
export GRAFANA_BACKUP_DASHBOARD_JSON_VERSION=v1
export GRAFANA_ALERTS_JSON_VERSION=v3
export GRAFANA_CUSTOM_INI_VERSION=v4
export GF_DATASOURCES_VERSION=v1
export GF_DASHBOARDS_VERSION=v2
export GF_SWARM_DASH_VERSION=v2
export GF_STACKS_DASH_VERSION=v2
export GF_TRAEFIK_DASH_VERSION=v2
export GF_BACKUP_DASH_VERSION=v1
export GF_CUSTOM_INI_VERSION=v4
export PROMTAIL_YML_VERSION=v3
export LOKI_YML_VERSION=v2
export PROMETHEUS_YML_VERSION=v2
export MATRIX_ALERTMANAGER_CONFIG_VERSION=e
export MATRIX_ALERTMANAGER_ENTRYPOINT_VERSION=a
export GRAFANA_ALERTS_NODE_VERSION=v1c
# creates a default prometheus scrape config for a given node
add_node(){

View File

@ -0,0 +1,74 @@
# configuration of the HTTP server
http:
## address: 127.0.0.1 # bind address for this service. Can be left unspecified to bind on all interfaces
port: 12345 # port used by this service
alerts-path-prefix: /alerts # URL path for the webhook receiver called by an Alertmanager. Defaults to /alerts
metrics-path: /metrics # URL path to collect metrics. Defaults to /metrics
metrics-enabled: true # Whether to enable metrics or not. Defaults to false
# basic-username: alertmanager # Username for basic authentication. Defaults to alertmanager
# basic-password: secret # If set, the alerts endpoint expects basic-auth credentials with the configured username and password
# configuration for the Matrix connection
matrix:
homeserver-url: "{{ env "GF_MATRIX_HOMESERVER_URL" }}"
user-id: "{{ env "GF_MATRIX_USER_ID" }}"
access-token: "{{ secret "matrix_token" }}"
room-mapping:
matrixroom: "{{ env "GF_MATRIX_ROOM_ID" }}"
templating:
# mapping of ExternalURL values
external-url-mapping:
# key is the original value taken from the Alertmanager payload
# value is the mapped value which will be available as '.ExternalURL' in templates
"http://alertmanager:9093": https://alertmanager.example.com
# mapping of GeneratorURL values
generator-url-mapping:
# key is the original value taken from the Alertmanager payload
# value is the mapped value which will be available as '.GeneratorURL' in templates
"http://prometheus:8080": https://prometheus.example.com
# computation of arbitrary values based on matching alert annotations, labels, or status
# values will be evaluated top to bottom, last entry wins
computed-values:
- values: # always set 'color' to 'yellow'
color: yellow
- values: # set 'color' to 'orange' when alert label 'severity' is 'warning'
color: orange
when-matching-labels:
severity: warning
- values: # set 'color' to 'red' when alert label 'severity' is 'critical'
color: red
when-matching-labels:
severity: critical
- values: # set 'color' to 'green' when alert status is 'resolved'
color: green
when-matching-status: resolved
# template for alerts in status 'firing'
firing-template: '{{`
<p>
<strong><font color="{{ .ComputedValues.color }}">{{ .Alert.Status | ToUpper }}</font></strong>
{{ if .Alert.Labels.name }}
{{ .Alert.Labels.name }}
{{ else if .Alert.Labels.alertname }}
{{ .Alert.Labels.alertname }}
{{ end }}
>>
{{ if .Alert.Labels.severity }}
{{ .Alert.Labels.severity | ToUpper }}:
{{ end }}
{{ if .Alert.Annotations.description }}
{{ .Alert.Annotations.description }}
{{ else if .Alert.Annotations.summary }}
{{ .Alert.Annotations.summary }}
{{ end }}
>>
{{ if .Alert.Annotations.runbook }}
<a href="{{ .Alert.Annotations.runbook }}">Runbook</a> |
{{ end }}
{{ if .Alert.Annotations.dashboard }}
<a href="{{ .Alert.Annotations.dashboard }}">Dashboard</a> |
{{ end }}
<a href="{{ .SilenceURL }}">Silence</a>
</p>`}}'

131
alerts/node.yml.tmpl Normal file
View File

@ -0,0 +1,131 @@
apiVersion: 1
# List of alert rule UIDs that should be deleted
deleteRules:
{{ if ne (env "ALERT_NODE_DISK_SPACE_ENABLED") "true" }}
- orgId: 1
uid: bds8bhxu97pxca
{{ end }}
{{ if ne (env "ALERT_NODE_MEMORY_USAGE_ENABLED") "true" }}
- orgId: 1
uid: ads8cswmly96oa
{{ end }}
groups:
- orgId: 1
name: node
folder: node
interval: 5m
rules:
{{ if eq (env "ALERT_NODE_DISK_SPACE_ENABLED") "true" }}
- uid: bds8bhxu97pxca
title: Node Disk Space
condition: C
data:
- refId: A
relativeTimeRange:
from: 600
to: 0
datasourceUid: PBFA97CFB590B2093
model:
editorMode: code
expr: (node_filesystem_free_bytes{fstype="ext4"} / node_filesystem_size_bytes{fstype="ext4"}) * 100
instant: true
intervalMs: 1000
legendFormat: __auto
maxDataPoints: 43200
range: false
refId: A
- refId: C
relativeTimeRange:
from: 600
to: 0
datasourceUid: __expr__
model:
conditions:
- evaluator:
params:
- 10
type: lt
operator:
type: and
query:
params:
- C
reducer:
params: []
type: last
type: query
datasource:
type: __expr__
uid: __expr__
expression: A
intervalMs: 1000
maxDataPoints: 43200
refId: C
type: threshold
noDataState: NoData
execErrState: Error
for: 5m
annotations:
description: ""
runbook_url: ""
summary: Less than 10% disk space left on {{`{{ $labels.instance }}`}} ({{`{{ (index $values "A").Value }}`}}% left)
labels:
"": ""
isPaused: false
{{ end }}
{{ if eq (env "ALERT_NODE_MEMORY_USAGE_ENABLED") "true" }}
- uid: ads8cswmly96oa
title: Node Memory Usage
condition: C
data:
- refId: A
relativeTimeRange:
from: 600
to: 0
datasourceUid: PBFA97CFB590B2093
model:
editorMode: code
expr: (1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) * 100
instant: true
intervalMs: 1000
legendFormat: __auto
maxDataPoints: 43200
range: false
refId: A
- refId: C
relativeTimeRange:
from: 600
to: 0
datasourceUid: __expr__
model:
conditions:
- evaluator:
params:
- 85
type: gt
operator:
type: and
query:
params:
- C
reducer:
params: []
type: last
type: query
datasource:
type: __expr__
uid: __expr__
expression: A
intervalMs: 1000
maxDataPoints: 43200
refId: C
type: threshold
noDataState: NoData
execErrState: Error
for: 5m
annotations:
summary: Memory usage is above 85% on {{`{{ $labels.instance }}`}} ({{`{{ printf "%.2f" (index $values "A").Value }}`}}% usage)
isPaused: false
{{ end }}

13
compose.expose-ports.yml Normal file
View File

@ -0,0 +1,13 @@
---
version: "3.8"
services:
app:
ports:
- "9100:9100"
deploy:
cadvisor:
ports:
- "9101:8080"
deploy:

17
compose.grafana-oidc.yml Normal file
View File

@ -0,0 +1,17 @@
version: '3.8'
services:
grafana:
secrets:
- gf_oidc_secret
environment:
- OIDC_API_URL
- OIDC_AUTH_URL
- OIDC_CLIENT_ID
- OIDC_ENABLED
- OIDC_TOKEN_URL
secrets:
gf_oidc_secret:
external: true
name: ${STACK_NAME}_gf_oidc_secret_${SECRET_GF_OIDC_SECRET_VERSION}

18
compose.grafana-smtp.yml Normal file
View File

@ -0,0 +1,18 @@
version: '3.8'
services:
grafana:
secrets:
- gf_smtp_passwd
environment:
- GF_SMTP_HOST
- GF_SMTP_USER
- GF_SMTP_PASSWORD__FILE=/run/secrets/gf_smtp_passwd
- GF_SMTP_ENABLED
- GF_SMTP_FROM_ADDRESS
- GF_SMTP_SKIP_VERIFY
secrets:
gf_smtp_passwd:
external: true
name: ${STACK_NAME}_gf_smtp_passwd_${SECRET_GF_SMTP_PASSWD_VERSION}

View File

@ -6,50 +6,40 @@ services:
volumes:
- grafana-data:/var/lib/grafana:rw
secrets:
- grafana_admin_password
- grafana_oidc_client_secret
- grafana_smtp_password
- gf_adminpasswd
configs:
- source: grafana_custom_ini
- source: gf_custom_ini
target: /etc/grafana/grafana.ini
- source: grafana_datasources_yml
- source: gf_datasources
target: /etc/grafana/provisioning/datasources/datasources.yml
- source: grafana_dashboards_yml
- source: gf_dashboards
target: /etc/grafana/provisioning/dashboards/dashboards.yml
- source: grafana_swarm_dashboard_json
- source: gf_swarm_dash
target: /var/lib/grafana/dashboards/docker-swarm-nodes.json
- source: grafana_stacks_dashboard_json
- source: gf_stacks_dash
target: /var/lib/grafana/dashboards/docker-swarm-stacks.json
- source: grafana_traefik_dashboard_json
- source: gf_traefik_dash
target: /var/lib/grafana/dashboards/traefik.json
- source: grafana_backup_dashboard_json
- source: gf_backup_dash
target: /var/lib/grafana/dashboards/backup.json
- source: grafana_alerts_json
target: /var/lib/grafana/alerts/alerts.json
- source: gf_alerts_node
target: /etc/grafana/provisioning/alerting/node.yml
networks:
- proxy
- internal
environment:
- GF_SERVER_ROOT_URL
- GF_SECURITY_ADMIN_PASSWORD__FILE=/run/secrets/grafana_admin_password
- GF_SMTP_HOST
- GF_SMTP_USER
- GF_SMTP_PASSWORD__FILE=/run/secrets/grafana_smtp_password
- GF_SMTP_ENABLED
- GF_SMTP_FROM_ADDRESS
- GF_SMTP_SKIP_VERIFY
- GF_SECURITY_ADMIN_PASSWORD__FILE=/run/secrets/gf_adminpasswd
- GF_SECURITY_ALLOW_EMBEDDING
- GF_INSTALL_PLUGINS
- OIDC_API_URL
- OIDC_AUTH_URL
- OIDC_CLIENT_ID
- OIDC_ENABLED
- OIDC_TOKEN_URL
- ALERT_NODE_DISK_SPACE_ENABLED
- ALERT_NODE_MEMORY_USAGE_ENABLED
deploy:
labels:
- "traefik.enable=true"
- "traefik.docker.network=proxy"
- "traefik.http.services.${STACK_NAME}-grafana.loadbalancer.server.port=3000"
- "traefik.http.routers.${STACK_NAME}-grafana.rule=Host(`${DOMAIN}`)"
- "traefik.http.routers.${STACK_NAME}-grafana.rule=Host(`${GRAFANA_DOMAIN:-$DOMAIN}`)"
- "traefik.http.routers.${STACK_NAME}-grafana.entrypoints=web-secure"
- "traefik.http.routers.${STACK_NAME}-grafana.tls=true"
- "traefik.http.routers.${STACK_NAME}-grafana.tls.certresolver=${LETS_ENCRYPT_ENV}"
@ -61,44 +51,38 @@ services:
start_period: 10s
configs:
grafana_custom_ini:
gf_custom_ini:
template_driver: golang
name: ${STACK_NAME}_grafana_custom_ini_${GRAFANA_CUSTOM_INI_VERSION}
name: ${STACK_NAME}_gf_custom_ini_${GF_CUSTOM_INI_VERSION}
file: grafana_custom.ini
grafana_datasources_yml:
name: ${STACK_NAME}_g_datasources_yml_${GRAFANA_DATASOURCES_YML_VERSION}
gf_datasources:
name: ${STACK_NAME}_gf_datasources_${GF_DATASOURCES_VERSION}
file: grafana-datasources.yml
grafana_dashboards_yml:
name: ${STACK_NAME}_g_dashboards_yml_${GRAFANA_DASHBOARDS_YML_VERSION}
gf_dashboards:
name: ${STACK_NAME}_gf_dashboards_${GF_DASHBOARDS_VERSION}
file: grafana-dashboards.yml
grafana_swarm_dashboard_json:
name: ${STACK_NAME}_g_swarm_dashboard_json_${GRAFANA_SWARM_DASHBOARD_JSON_VERSION}
gf_swarm_dash:
name: ${STACK_NAME}_gf_swarm_dash_${GF_SWARM_DASH_VERSION}
file: grafana-swarm-dashboard.json
grafana_stacks_dashboard_json:
name: ${STACK_NAME}_g_stacks_dashboard_json_${GRAFANA_STACKS_DASHBOARD_JSON_VERSION}
gf_stacks_dash:
name: ${STACK_NAME}_gf_stacks_dash_${GF_STACKS_DASH_VERSION}
file: grafana-stacks-dashboard.json
grafana_traefik_dashboard_json:
name: ${STACK_NAME}_g_traefik_dashboard_json_${GRAFANA_TRAEFIK_DASHBOARD_JSON_VERSION}
gf_traefik_dash:
name: ${STACK_NAME}_gf_traefik_dash_${GF_TRAEFIK_DASH_VERSION}
file: grafana-traefik-dashboard.json
grafana_backup_dashboard_json:
name: ${STACK_NAME}_g_backup_dashboard_json_${GRAFANA_BACKUP_DASHBOARD_JSON_VERSION}
gf_backup_dash:
name: ${STACK_NAME}_gf_backup_dash_${GF_BACKUP_DASH_VERSION}
file: grafana-backup-dashboard.json
grafana_alerts_json:
gf_alerts_node:
template_driver: golang
name: ${STACK_NAME}_g_alerts_json_${GRAFANA_ALERTS_JSON_VERSION}
file: grafana-alerts.json.tmpl
name: ${STACK_NAME}_gf_alerts_node_${GRAFANA_ALERTS_NODE_VERSION}
file: alerts/node.yml.tmpl
volumes:
grafana-data:
secrets:
grafana_admin_password:
gf_adminpasswd:
external: true
name: ${STACK_NAME}_grafana_admin_password_${SECRET_GRAFANA_ADMIN_PASSWORD_VERSION}
grafana_oidc_client_secret:
external: true
name: ${STACK_NAME}_grafana_oidc_client_secret_${SECRET_GRAFANA_OIDC_CLIENT_SECRET_VERSION}
grafana_smtp_password:
external: true
name: ${STACK_NAME}_grafana_smtp_password_${SECRET_GRAFANA_SMTP_PASSWORD_VERSION}
name: ${STACK_NAME}_gf_adminpasswd_${SECRET_GF_ADMINPASSWD_VERSION}

View File

@ -12,7 +12,7 @@ services:
volumes:
- loki-data:/loki
# secrets:
# - loki_aws_secret_access_key
# - loki_aws_key
environment:
- LOKI_ACCESS_KEY_ID
- LOKI_AWS_ENDPOINT
@ -27,6 +27,7 @@ services:
condition: on-failure
labels:
- "traefik.enable=true"
- "traefik.docker.network=proxy"
- "traefik.http.services.${STACK_NAME}-loki.loadbalancer.server.port=3100"
- "traefik.http.routers.${STACK_NAME}-loki.rule=Host(`loki.${DOMAIN}`)"
- "traefik.http.routers.${STACK_NAME}-loki.entrypoints=web-secure"
@ -46,6 +47,6 @@ volumes:
loki-data:
# secrets:
# loki_aws_secret_access_key:
# loki_aws_key:
# external: true
# name: ${STACK_NAME}_loki_aws_secret_access_key_${SECRET_LOKI_AWS_SECRET_ACCESS_KEY_VERSION}
# name: ${STACK_NAME}_loki_aws_key_${SECRET_LOKI_AWS_KEY_VERSION}

View File

@ -0,0 +1,28 @@
version: '3.8'
services:
matrix-alertmanager-receiver:
image: metio/matrix-alertmanager-receiver:2025.2.9
secrets:
- matrix_token
configs:
- source: matrix-alertmanager-receiver-config
target: /etc/matrix-alertmanager-receiver/config.yml
networks:
- internal
environment:
- GF_MATRIX_USER_ID
- GF_MATRIX_ROOM_ID
- GF_MATRIX_HOMESERVER_URL
command: "--config-path=/etc/matrix-alertmanager-receiver/config.yml"
configs:
matrix-alertmanager-receiver-config:
template_driver: golang
name: ${STACK_NAME}_mar_config_${MATRIX_ALERTMANAGER_CONFIG_VERSION}
file: alertmanager-matrix-config.yml.tmpl
secrets:
matrix_token:
external: true
name: ${STACK_NAME}_matrix_token_${SECRET_MATRIX_TOKEN_VERSION}

View File

@ -24,12 +24,12 @@ services:
condition: on-failure
labels:
- "traefik.enable=true"
- "traefik.docker.network=proxy"
- "traefik.http.services.${STACK_NAME}-prometheus.loadbalancer.server.port=9090"
- "traefik.http.routers.${STACK_NAME}-prometheus.rule=Host(`prometheus.${DOMAIN}`)"
- "traefik.http.routers.${STACK_NAME}-prometheus.entrypoints=web-secure"
- "traefik.http.routers.${STACK_NAME}-prometheus.tls=true"
- "traefik.http.routers.${STACK_NAME}-prometheus.tls.certresolver=${LETS_ENCRYPT_ENV}"
- "traefik.http.routers.${STACK_NAME}-prometheus.middlewares=basicauth@file"
configs:
prometheus_yml:

View File

@ -17,6 +17,7 @@ services:
condition: on-failure
labels:
- "traefik.enable=true"
- "traefik.docker.network=proxy"
- "traefik.http.services.${STACK_NAME}-pushgateway.loadbalancer.server.port=9191"
- "traefik.http.routers.${STACK_NAME}-pushgateway.rule=Host(`pushgateway.${DOMAIN}`)"
- "traefik.http.routers.${STACK_NAME}-pushgateway.entrypoints=web-secure"

View File

@ -32,14 +32,15 @@ services:
labels:
- "backupbot.backup=${ENABLE_BACKUPS:-true}"
- "traefik.enable=true"
- "traefik.docker.network=proxy"
- "traefik.http.services.${STACK_NAME}-node.loadbalancer.server.port=9100"
- "traefik.http.routers.${STACK_NAME}-node.rule=Host(`node.${DOMAIN}`)"
- "traefik.http.routers.${STACK_NAME}-node.entrypoints=web-secure"
- "traefik.http.routers.${STACK_NAME}-node.tls=true"
- "traefik.http.routers.${STACK_NAME}-node.tls.certresolver=${LETS_ENCRYPT_ENV}"
- "traefik.http.routers.${STACK_NAME}-node.middlewares=basicauth@file"
- "coop-cloud.${STACK_NAME}.version=1.3.0+v1.8.1"
- "coop-cloud.${STACK_NAME}.timeout=${TIMEOUT:-120}"
- "coop-cloud.${STACK_NAME}.version=1.6.0+v1.8.1"
- "coop-cloud.${STACK_NAME}.timeout=${TIMEOUT}"
cadvisor:
image: gcr.io/cadvisor/cadvisor:v0.49.2
@ -63,6 +64,7 @@ services:
condition: on-failure
labels:
- "traefik.enable=true"
- "traefik.docker.network=proxy"
- "traefik.http.services.${STACK_NAME}-cadvisor.loadbalancer.server.port=8080"
- "traefik.http.routers.${STACK_NAME}-cadvisor.rule=Host(`cadvisor.${DOMAIN}`)"
- "traefik.http.routers.${STACK_NAME}-cadvisor.entrypoints=web-secure"

View File

@ -216,7 +216,7 @@
"datasourceUid": "PBFA97CFB590B2093",
"model": {
"editorMode": "code",
"expr": "(node_filesystem_free_bytes{fstype=\"ext4\",mountpoint=~\"(/$)|(/media.*)\"} / node_filesystem_size_bytes{fstype=\"ext4\",mountpoint=~\"(/$)|(/media.*)\"}) * 100",
"expr": "(node_filesystem_free_bytes{fstype=\"ext4\"} / node_filesystem_size_bytes{fstype=\"ext4\"}) * 100",
"instant": true,
"intervalMs": 1000,
"legendFormat": "__auto",

View File

@ -93,7 +93,6 @@
},
"hideTimeOverride": true,
"id": 2,
"links": [],
"maxDataPoints": 100,
"options": {
"colorMode": "value",
@ -107,10 +106,12 @@
"fields": "",
"values": false
},
"showPercentChange": false,
"text": {},
"textMode": "auto"
"textMode": "auto",
"wideLayout": true
},
"pluginVersion": "10.0.2",
"pluginVersion": "10.4.14",
"targets": [
{
"datasource": {
@ -172,7 +173,6 @@
"y": 0
},
"id": 1,
"links": [],
"maxDataPoints": 100,
"options": {
"colorMode": "value",
@ -186,10 +186,12 @@
"fields": "",
"values": false
},
"showPercentChange": false,
"text": {},
"textMode": "auto"
"textMode": "auto",
"wideLayout": true
},
"pluginVersion": "10.0.2",
"pluginVersion": "10.4.14",
"targets": [
{
"datasource": {
@ -251,7 +253,6 @@
},
"hideTimeOverride": true,
"id": 4,
"links": [],
"maxDataPoints": 100,
"options": {
"colorMode": "value",
@ -265,10 +266,12 @@
"fields": "",
"values": false
},
"showPercentChange": false,
"text": {},
"textMode": "auto"
"textMode": "auto",
"wideLayout": true
},
"pluginVersion": "10.0.2",
"pluginVersion": "10.4.14",
"targets": [
{
"datasource": {
@ -335,9 +338,10 @@
"y": 0
},
"id": 8,
"links": [],
"maxDataPoints": 100,
"options": {
"minVizHeight": 75,
"minVizWidth": 75,
"orientation": "horizontal",
"reduceOptions": {
"calcs": [
@ -348,9 +352,10 @@
},
"showThresholdLabels": false,
"showThresholdMarkers": true,
"sizing": "auto",
"text": {}
},
"pluginVersion": "10.0.2",
"pluginVersion": "10.4.14",
"targets": [
{
"datasource": {
@ -405,13 +410,12 @@
},
"lines": true,
"linewidth": 1,
"links": [],
"nullPointMode": "null",
"options": {
"alertThreshold": true
},
"percentage": false,
"pluginVersion": "10.0.2",
"pluginVersion": "10.4.14",
"pointradius": 5,
"points": false,
"renderer": "flot",
@ -507,7 +511,6 @@
},
"hideTimeOverride": true,
"id": 3,
"links": [],
"maxDataPoints": 100,
"options": {
"colorMode": "value",
@ -521,10 +524,12 @@
"fields": "",
"values": false
},
"showPercentChange": false,
"text": {},
"textMode": "auto"
"textMode": "auto",
"wideLayout": true
},
"pluginVersion": "10.0.2",
"pluginVersion": "10.4.14",
"targets": [
{
"datasource": {
@ -585,7 +590,6 @@
},
"hideTimeOverride": true,
"id": 9,
"links": [],
"maxDataPoints": 100,
"options": {
"colorMode": "value",
@ -599,10 +603,12 @@
"fields": "",
"values": false
},
"showPercentChange": false,
"text": {},
"textMode": "auto"
"textMode": "auto",
"wideLayout": true
},
"pluginVersion": "10.0.2",
"pluginVersion": "10.4.14",
"targets": [
{
"datasource": {
@ -671,9 +677,10 @@
},
"hideTimeOverride": true,
"id": 11,
"links": [],
"maxDataPoints": 100,
"options": {
"minVizHeight": 75,
"minVizWidth": 75,
"orientation": "horizontal",
"reduceOptions": {
"calcs": [
@ -684,9 +691,10 @@
},
"showThresholdLabels": false,
"showThresholdMarkers": true,
"sizing": "auto",
"text": {}
},
"pluginVersion": "10.0.2",
"pluginVersion": "10.4.14",
"targets": [
{
"datasource": {
@ -713,7 +721,39 @@
"fieldConfig": {
"defaults": {
"color": {
"mode": "thresholds"
"mode": "palette-classic"
},
"custom": {
"axisBorderShow": false,
"axisCenteredZero": false,
"axisColorMode": "text",
"axisLabel": "",
"axisPlacement": "left",
"barAlignment": 0,
"drawStyle": "line",
"fillOpacity": 0,
"gradientMode": "none",
"hideFrom": {
"legend": false,
"tooltip": false,
"viz": false
},
"insertNulls": false,
"lineInterpolation": "linear",
"lineWidth": 1,
"pointSize": 5,
"scaleDistribution": {
"type": "linear"
},
"showPoints": "auto",
"spanNulls": false,
"stacking": {
"group": "A",
"mode": "none"
},
"thresholdsStyle": {
"mode": "off"
}
},
"mappings": [
{
@ -747,33 +787,42 @@
},
"unit": "percent"
},
"overrides": []
"overrides": [
{
"matcher": {
"id": "byType",
"options": "time"
},
"properties": [
{
"id": "custom.axisPlacement",
"value": "hidden"
}
]
}
]
},
"gridPos": {
"h": 4,
"w": 2.6666666666666665,
"h": 6,
"w": 6,
"x": 0,
"y": 8
},
"id": 10,
"links": [],
"maxDataPoints": 100,
"maxPerRow": 12,
"options": {
"colorMode": "value",
"graphMode": "area",
"justifyMode": "auto",
"orientation": "horizontal",
"reduceOptions": {
"calcs": [
"last"
],
"fields": "",
"values": false
"legend": {
"calcs": [],
"displayMode": "list",
"placement": "bottom",
"showLegend": true
},
"textMode": "auto"
"tooltip": {
"mode": "single",
"sort": "none"
}
},
"pluginVersion": "10.0.2",
"pluginVersion": "10.4.14",
"repeat": "node_id",
"repeatDirection": "h",
"targets": [
@ -782,18 +831,20 @@
"type": "prometheus",
"uid": "PBFA97CFB590B2093"
},
"editorMode": "code",
"exemplar": true,
"expr": "sum((node_filesystem_free_bytes{fstype=\"ext4\",mountpoint=~\"(/$)|(/media.*)\"} / node_filesystem_size_bytes{fstype=\"ext4\",mountpoint=~\"(/$)|(/media.*)\"}) * on(instance) group_left(node_name) node_meta{node_name=~\"$node_id\"} * 100) / count(node_meta * on(instance) group_left(node_name) node_meta{node_name=~\"$node_id\"})",
"expr": "node_filesystem_free_bytes{fstype=\"ext4\"} / node_filesystem_size_bytes{fstype=\"ext4\"} * on(instance) group_left(node_name) node_meta{node_name=~\"$node_id\"} * 100",
"format": "time_series",
"interval": "",
"intervalFactor": 2,
"legendFormat": "",
"legendFormat": "{{mountpoint}}",
"range": true,
"refId": "A",
"step": 20
}
],
"title": "Available Disk Space $node_id",
"type": "stat"
"type": "timeseries"
},
{
"aliasColors": {},
@ -811,7 +862,7 @@
"h": 7,
"w": 24,
"x": 0,
"y": 12
"y": 14
},
"hiddenSeries": false,
"id": 14,
@ -830,13 +881,12 @@
},
"lines": true,
"linewidth": 1,
"links": [],
"nullPointMode": "null as zero",
"options": {
"alertThreshold": true
},
"percentage": false,
"pluginVersion": "10.0.2",
"pluginVersion": "10.4.14",
"pointradius": 5,
"points": false,
"renderer": "flot",
@ -900,6 +950,7 @@
"mode": "palette-classic"
},
"custom": {
"axisBorderShow": false,
"axisCenteredZero": false,
"axisColorMode": "text",
"axisLabel": "",
@ -913,6 +964,7 @@
"tooltip": false,
"viz": false
},
"insertNulls": false,
"lineInterpolation": "linear",
"lineWidth": 1,
"pointSize": 5,
@ -961,12 +1013,11 @@
},
"gridPos": {
"h": 7,
"w": 2.6666666666666665,
"w": 6,
"x": 0,
"y": 19
"y": 21
},
"id": 15,
"links": [],
"maxPerRow": 12,
"options": {
"legend": {
@ -1074,7 +1125,7 @@
"h": 7,
"w": 24,
"x": 0,
"y": 26
"y": 28
},
"hiddenSeries": false,
"id": 16,
@ -1091,13 +1142,12 @@
},
"lines": true,
"linewidth": 1,
"links": [],
"nullPointMode": "null as zero",
"options": {
"alertThreshold": true
},
"percentage": false,
"pluginVersion": "10.0.2",
"pluginVersion": "10.4.14",
"pointradius": 5,
"points": false,
"renderer": "flot",
@ -1178,7 +1228,7 @@
"h": 7,
"w": 12,
"x": 0,
"y": 33
"y": 35
},
"hiddenSeries": false,
"id": 18,
@ -1195,7 +1245,6 @@
},
"lines": true,
"linewidth": 1,
"links": [],
"nullPointMode": "null as zero",
"options": {
"alertThreshold": true
@ -1281,7 +1330,7 @@
"h": 7,
"w": 12,
"x": 12,
"y": 33
"y": 35
},
"hiddenSeries": false,
"id": 19,
@ -1300,7 +1349,6 @@
},
"lines": true,
"linewidth": 1,
"links": [],
"nullPointMode": "null as zero",
"options": {
"alertThreshold": true
@ -1376,7 +1424,7 @@
"h": 7,
"w": 18,
"x": 0,
"y": 40
"y": 42
},
"hiddenSeries": false,
"id": 12,
@ -1397,7 +1445,6 @@
},
"lines": true,
"linewidth": 1,
"links": [],
"nullPointMode": "null",
"options": {
"alertThreshold": true
@ -1499,10 +1546,9 @@
"h": 7,
"w": 6,
"x": 18,
"y": 40
"y": 42
},
"id": 7,
"links": [],
"maxDataPoints": 100,
"options": {
"colorMode": "none",
@ -1600,10 +1646,9 @@
"h": 7,
"w": 24,
"x": 0,
"y": 47
"y": 49
},
"id": 17,
"links": [],
"options": {
"legend": {
"calcs": [],
@ -1658,7 +1703,7 @@
"h": 9,
"w": 24,
"x": 0,
"y": 54
"y": 56
},
"id": 30,
"options": {
@ -1688,8 +1733,7 @@
}
],
"refresh": "",
"schemaVersion": 38,
"style": "dark",
"schemaVersion": 39,
"tags": [
"swarmprom",
"prometheus",
@ -1836,6 +1880,6 @@
"timezone": "",
"title": "Docker Swarm Nodes",
"uid": "BPlb-Sgik",
"version": 24,
"version": 7,
"weekStart": ""
}
}

View File

@ -21,7 +21,7 @@ tls_skip_verify_insecure = false
allow_sign_up = true
auto_login = true
client_id = {{ env "OIDC_CLIENT_ID" }}
client_secret = {{ secret "grafana_oidc_client_secret" }}
client_secret = {{ secret "gf_oidc_secret" }}
auth_url = {{ env "OIDC_AUTH_URL" }}
token_url = {{ env "OIDC_TOKEN_URL" }}
api_url = {{ env "OIDC_API_URL" }}

View File

@ -89,7 +89,7 @@ storage_config:
endpoint: {{ env "LOKI_AWS_ENDPOINT" }}
region: {{ env "LOKI_AWS_REGION" }}
access_key_id: {{ env "LOKI_ACCESS_KEY_ID" }}
secret_access_key: {{ secret "loki_aws_secret_access_key" }}
secret_access_key: {{ secret "loki_aws_key" }}
bucketnames: {{ env "LOKI_BUCKET_NAMES" }}
insecure: false
sse_encryption: false

1
release/1.4.0+v1.8.1 Normal file
View File

@ -0,0 +1 @@
Adds an optional GRAFANA_DOMAIN

1
release/1.5.0+v1.8.1 Normal file
View File

@ -0,0 +1 @@
Adds an optional matrix contact point for grafana

1
release/1.6.0+v1.8.1 Normal file
View File

@ -0,0 +1 @@
Adds option to expose ports for node and cadvisor service

10
release/next Normal file
View File

@ -0,0 +1,10 @@
1. OIDC was moved into a separate compose file. If you have OIDC configured, you need to add the following line to your .env file:
COMPOSE_FILE="$COMPOSE_FILE:compose.grafana-oidc.yml"
2. SMTP was moved into a separate compose file. If you have SMTP configured, you need to add the following line to your .env file:
COMPOSE_FILE="$COMPOSE_FILE:compose.grafana-smtp.yml"
3. Secret and config names were shortened to a maximum of 14 characters to avoid exceeding Docker's 64-character limit when STACK_NAME and VERSION are added to them.
When upgrading, you need to reinsert the secrets under their shorter names. Run `abra app secret list monitoring.example.org` to see which secrets have not been created on the server (because their names were shortened), then run `abra app secret insert monitoring.example.org <secret_name> v1 <value>` to reinsert them under the shorter names.
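
The naming constraint described above can be checked locally before deploying. The following is a minimal sketch, assuming the full name follows the `${STACK_NAME}_<name>_<version>` pattern used in the compose files. The function names `full_name` and `fits_limit` and the stack name `monitoring_example_org` are hypothetical and only serve as illustration:

```shell
#!/usr/bin/env bash
# Sketch: check whether a secret/config name stays within the 64-character
# limit mentioned in the release note. Function names are hypothetical.

MAX_NAME_LENGTH=64

# Build the full name as the compose files do: <stack>_<name>_<version>.
full_name() {
  printf '%s_%s_%s' "$1" "$2" "$3"
}

# Succeed (exit status 0) when the full name fits within the limit.
fits_limit() {
  local name
  name="$(full_name "$1" "$2" "$3")"
  [ "${#name}" -le "$MAX_NAME_LENGTH" ]
}

# Example run with a hypothetical stack name and the shortened secret names.
stack="monitoring_example_org"
for secret in gf_adminpasswd gf_oidc_secret gf_smtp_passwd matrix_token; do
  if fits_limit "$stack" "$secret" v1; then
    echo "ok:       $(full_name "$stack" "$secret" v1)"
  else
    echo "too long: $(full_name "$stack" "$secret" v1)"
  fi
done
```

Substituting the real stack name of a deployment shows how much headroom remains for each name.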