31 Commits

Author SHA1 Message Date
dd0a0c1bb0 fixup! feat: read syslog 2026-06-02 20:19:45 -03:00
31cabc36ae fix: prevent traefik deprecation warnings 2026-06-02 19:16:49 -03:00
d25986d5cb fix: README 2026-06-02 18:51:10 -03:00
f8f8004445 feat: read syslog 2026-06-02 18:50:41 -03:00
aa05d022da feat: optionally push to prometheus and loki 2026-06-02 18:50:20 -03:00
fb52a76247 BREAKING CHANGE: deprecate node-exporter 2026-06-02 18:49:05 -03:00
2e2a52eae0 BREAKING CHANGE: deprecate promtail 2026-06-02 18:48:20 -03:00
48419d5afa fixup! BREAKING CHANGE: no need to expose exporters 2026-06-02 18:46:02 -03:00
a0a6e2c509 fix: basic auth secret is always needed 2026-06-02 18:44:32 -03:00
024f2a8aec feat: send docker logs to loki 2026-06-02 18:39:24 -03:00
38095e23fa BREAKING CHANGE: no need to expose exporters 2026-06-02 18:37:56 -03:00
641161329e fix: grafana alternate domain doesn't work
the variable is not expanded and the domain name label ends up as a
literal "$DOMAIN".
2026-06-02 18:00:00 -03:00
cdacfd035e fix: prometheus querying panel is accessible through basic auth 2026-06-02 17:52:25 -03:00
b2d3901f61 fix: bind mounts recommended by docs 2026-06-02 13:24:28 -03:00
8becf1c1d6 fixup! feat: node exporter 2026-05-29 16:16:37 -03:00
777b1355dd fixup! feat: node exporter 2026-05-29 16:16:08 -03:00
e83433cebd feat: node exporter 2026-05-29 16:04:19 -03:00
a713f98ffb feat: instance name is domain 2026-05-29 16:03:59 -03:00
8dc84c591c fixup! feat: enable prometheus remote write receiver 2026-05-29 15:38:52 -03:00
d9aa05a4b5 feat: send metrics to prometheus 2026-05-28 21:00:10 -03:00
349df12204 feat: enable prometheus remote write receiver 2026-05-28 20:44:00 -03:00
6c33089078 feat: cadvisor 2026-05-28 20:38:50 -03:00
4bedebfab1 BREAKING CHANGES: replace promtail and cadvisor for alloy 2026-05-28 20:33:36 -03:00
dd320e9f1c fix: Shorten all secret and config names to max 14 characters (#13)
Reviewed-on: #13
Reviewed-by: p4u1 <p4u1@noreply.git.coopcloud.tech>
Reviewed-by: moritz <moritz@noreply.git.coopcloud.tech>
2026-05-11 15:38:15 +00:00
9cb997b25a delete_request_store based on env variable 2026-04-09 04:36:03 +00:00
48d137d194 update loki config file 2026-04-09 04:36:03 +00:00
1acb5ebd6a chore: update image tags 2026-04-09 04:36:03 +00:00
682f30cef1 Add migrate_secret_names() to abra.sh to reinsert all secrets with shortened names in docker 2026-03-25 16:11:37 +01:00
694c8a9875 Add instructions for shorter secret names to release notes 2026-03-25 16:11:28 +01:00
9dfa9cad2a Shortened all the secret and config names to max 14 characters to prevent running into Docker's 64 character limit when STACK_NAME is appended to it. 2026-03-25 15:58:28 +01:00
99f8790ec4 fix: Update scape-config example to use HTTPS for Traefik metrics (#17)
This fixes the insecure Traefik metrics endpoint. See coop-cloud/traefik#94 for details.

Reviewed-on: #17
Co-authored-by: Danny Groenewegen <mail@dannygroenewegen.nl>
Co-committed-by: Danny Groenewegen <mail@dannygroenewegen.nl>
2026-03-24 09:37:05 +00:00
22 changed files with 309 additions and 259 deletions

View File

@ -5,15 +5,20 @@ DOMAIN=monitoring-ng.example.com
#TIMEOUT=120
ENABLE_BACKUPS=true
## Enable this secret for Promtail / Prometheus
# SECRET_BASIC_AUTH_VERSION=v1
#
# Promtail (Gathering Logs)
# COMPOSE_FILE="$COMPOSE_FILE:compose.promtail.yml"
# LOKI_PUSH_URL=https://loki.monitoring.example.org/loki/api/v1/push
SECRET_BASIC_AUTH_VERSION=v1
# Enable this to send logs to a Loki server, adapt DOMAIN if server is
# remote
# LOKI_PUSH_URL=https://loki.$DOMAIN/loki/api/v1/push
# Enable this on SystemD hosts to read logs
# JOURNALD=1
# Enable this on syslogd hosts and configure the syslogd to send logs to
# Alloy on port 514/tcp
# SYSLOG=1
# COMPOSE_FILE="$COMPOSE_FILE:compose.syslog.yml"
## Expose node and cadvisor ports instead of traefik
# COMPOSE_FILE="$COMPOSE_FILE:compose.expose-ports.yml"
# Enable this to send metrics to a Prometheus server, adapt DOMAIN if
# server is remote
# PROMETHEUS_REMOTE_WRITE_URL=https://prometheus.$DOMAIN/api/v1/write
# Monitoring Server
#
@ -39,20 +44,18 @@ ENABLE_BACKUPS=true
# LOKI_AWS_REGION=eu-west-1
# LOKI_ACCESS_KEY_ID=bush-debrief-approval-robust-scraggly-molecule
# LOKI_BUCKET_NAMES=loki
# SECRET_LOKI_AWS_SECRET_ACCESS_KEY_VERSION=v1
# SECRET_LOKI_AWS_KEY_VERSION=v1
#
## Grafana
#
# COMPOSE_FILE="$COMPOSE_FILE:compose.grafana.yml"
# GF_SERVER_ROOT_URL=https://monitoring.example.com
# SECRET_GRAFANA_ADMIN_PASSWORD_VERSION=v1
## Seperate domain for Grafana
#GRAFANA_DOMAIN=grafana.example.com
# SECRET_GF_ADMINPASSWD_VERSION=v1
#
## Single-Sign-On with OIDC
# COMPOSE_FILE="$COMPOSE_FILE:compose.grafana-oidc.yml"
# OIDC_ENABLED=1
# SECRET_GRAFANA_OIDC_CLIENT_SECRET_VERSION=v1
# SECRET_GF_OIDC_SECRET_VERSION=v1
# OIDC_CLIENT_ID=grafana
# OIDC_AUTH_URL="https://authentik.example.com/application/o/authorize/"
# OIDC_API_URL="https://authentik.example.com/application/o/userinfo/"
@ -69,12 +72,12 @@ ENABLE_BACKUPS=true
# GF_SMTP_ENABLED=true
# GF_SMTP_FROM_ADDRESS=grafana@example.com
# GF_SMTP_SKIP_VERIFY=false
# SECRET_GRAFANA_SMTP_PASSWORD_VERSION=v1
# SECRET_GF_SMTP_PASSWD_VERSION=v1
#
## Grafana Matrix Contact Point (optional)
#COMPOSE_FILE="$COMPOSE_FILE:compose.matrix-alertmanager-receiver.yml"
#SECRET_MATRIX_ACCESS_TOKEN_VERSION=v1
#SECRET_MATRIX_TOKEN_VERSION=v1
#GF_MATRIX_USER_ID="<user-id>"
#GF_MATRIX_ROOM_ID="<room-id>"
#GF_MATRIX_HOMESERVER_URL="<homeserver-url>"

View File

@ -1,8 +1,8 @@
# monitoring-ng
Yet another monitoring stack ...
This time its a all-in-one grafana/prometheus/loki/node_exporter/cadvisor/promtail stack.
It's based heavily on the [monitoring-lite](https://git.coopcloud.tech/coop-cloud/monitoring-lite) stack, but has everything in one recipe included now. So you can deploy monitoring instances to only gather metrics / logs (node_exporter/cadvisor/promtail) and also deploy instances with the full monitoring stack (grafana/prometheus/loki) with the same recipe and just different .env configuration.
This time it's an all-in-one grafana/prometheus/loki/alloy stack.
It's based heavily on the [monitoring-lite](https://git.coopcloud.tech/coop-cloud/monitoring-lite) stack, but has everything in one recipe included now. So you can deploy monitoring instances to only gather metrics / logs (alloy) and also deploy instances with the full monitoring stack (grafana/prometheus/loki) with the same recipe and just different .env configuration.
<!-- metadata -->
@ -47,13 +47,6 @@ Where gathering.org is the node you want to gather metrics from.
- cadvisor.monitoring.gathering.org
- node.monitoring.gathering.org
### Expose node and cadvisor via ports instead of traefik
In case you have no traefik running on the machine, you can expose the ports directly by uncommenting the following line:
```
# COMPOSE_FILE="$COMPOSE_FILE:compose.expose-ports.yml"
```
## Setup Metrics Browser
@ -145,7 +138,7 @@ COMPOSE_FILE="$COMPOSE_FILE:compose.matrix-alertmanager-receiver.yml"
2. Insert the matrix access token secret:
```
abra app secret insert monitoring.marx.klasse-methode.it matrix_access_token v1
abra app secret insert monitoring.marx.klasse-methode.it matrix_token v1
```
3. Set required configurations:

115
abra.sh
View File

@ -1,27 +1,122 @@
export ENTRYPOINT_VERSION=v1
export GRAFANA_DATASOURCES_YML_VERSION=v1
export GRAFANA_DASHBOARDS_YML_VERSION=v2
export GRAFANA_SWARM_DASHBOARD_JSON_VERSION=v2
export GRAFANA_STACKS_DASHBOARD_JSON_VERSION=v2
export GRAFANA_TRAEFIK_DASHBOARD_JSON_VERSION=v2
export GRAFANA_BACKUP_DASHBOARD_JSON_VERSION=v1
export GRAFANA_CUSTOM_INI_VERSION=v4
export PROMTAIL_YML_VERSION=v3
export LOKI_YML_VERSION=v2
export GF_DATASOURCES_VERSION=v1
export GF_DASHBOARDS_VERSION=v2
export GF_SWARM_DASH_VERSION=v2
export GF_STACKS_DASH_VERSION=v2
export GF_TRAEFIK_DASH_VERSION=v2
export GF_BACKUP_DASH_VERSION=v1
export GF_CUSTOM_INI_VERSION=v4
export LOKI_YML_VERSION=v3
export PROMETHEUS_YML_VERSION=v2
export MATRIX_ALERTMANAGER_CONFIG_VERSION=e
export MATRIX_ALERTMANAGER_ENTRYPOINT_VERSION=a
export GRAFANA_ALERTS_NODE_VERSION=v1c
export CONFIG_ALLOY_VERSION=v8
# creates a default prometheus scrape config for a given node
add_node(){
name=$1
add_domain "$name" "$name:8082"
add_domain "$name" "metrics.traefik.$name"
add_domain "$name" "node.monitoring.$name"
add_domain "$name" "cadvisor.monitoring.$name"
cat "/prometheus/scrape_configs/$name.yml"
}
# migrates secrets from old names to new names by reading values from the
# running containers on the server and re-inserting them under the new names.
# preview changes: abra app cmd --local <app> migrate_secret_names
# execute changes: abra app cmd --local <app> migrate_secret_names execute
migrate_secret_names() {
if ! command -v jq &> /dev/null; then
echo "jq is required on your local machine to migrate secret names"
echo "It could not be found in your PATH, please install jq to proceed."
echo "For example: On a debian/ubuntu system, run 'apt install jq'"
exit 1
fi
# Hardcoded migration mappings: old_secret_name|new_secret_name
MIGRATIONS="
grafana_admin_password|gf_adminpasswd
grafana_smtp_password|gf_smtp_passwd
grafana_oidc_client_secret|gf_oidc_secret
matrix_access_token|matrix_token
loki_aws_secret_access_key|loki_aws_key
"
# Determine which server the app is deployed on
SERVER=$(abra app ls -m | jq -r --arg domain "$APP_NAME" '[.[].apps[] | select(.domain == $domain) | .server] | first' 2>/dev/null)
if [ -z "$SERVER" ]; then
echo "Error: could not determine server for app '$APP_NAME'"
exit 1
fi
# Build a lookup table of all secrets currently mounted in this stack.
# Each line: <secretID> <containerID> <secretName>
LOOKUP=$(ssh "$SERVER" "
docker stack services ${STACK_NAME} --format '{{.Name}}' | while read svc; do
CID=\$(docker ps --no-trunc -q --filter \"name=\${svc}\" | head -1)
docker service inspect \"\$svc\" --format '{{json .Spec.TaskTemplate.ContainerSpec.Secrets}}' | \
jq -r --arg cid \"\$CID\" '.[]? | .SecretID + \" \" + \$cid + \" \" + .SecretName'
done | sort -k3 -r
" 2>/dev/null)
echo "Secret migration plan for: $APP_NAME (server: $SERVER)"
echo ""
printf " %-24s %-8s %s\n" "OLD NAME" "FOUND" "ACTION"
printf " %-24s %-8s %s\n" "--------" "-----" "------"
# Check each old name against the lookup table and display the plan
ANY_FOUND=false
while IFS='|' read -r OLD_NAME NEW_NAME; do
[ -z "$OLD_NAME" ] && continue
MATCH=$(echo "$LOOKUP" | grep " ${STACK_NAME}_${OLD_NAME}_" | head -1)
if [ -n "$MATCH" ]; then
printf " %-24s %-8s %s\n" "$OLD_NAME" "yes" "recreate as '$NEW_NAME' version v1"
ANY_FOUND=true
else
printf " %-24s %-8s %s\n" "$OLD_NAME" "no" "nothing (not found on server)"
fi
done <<< "$MIGRATIONS"
echo ""
if [ "$ANY_FOUND" = false ]; then
echo "No old secrets found on server. Nothing to migrate."
return 0
fi
if [ "$1" != "execute" ]; then
echo "To apply the above changes, run:"
echo " abra app cmd --local $APP_NAME migrate_secret_names execute"
return 0
fi
# read each found secret from its container and re-insert with the new name
while IFS='|' read -r OLD_NAME NEW_NAME; do
[ -z "$OLD_NAME" ] && continue
MATCH=$(echo "$LOOKUP" | grep " ${STACK_NAME}_${OLD_NAME}_" | head -1)
[ -z "$MATCH" ] && continue
SECRET_ID=$(echo "$MATCH" | awk '{print $1}')
CID=$(echo "$MATCH" | awk '{print $2}')
SECRET_VALUE=$(ssh "$SERVER" "cat /var/lib/docker/containers/${CID}/mounts/secrets/${SECRET_ID} 2>/dev/null || sudo cat /var/lib/docker/containers/${CID}/mounts/secrets/${SECRET_ID} 2>/dev/null")
if [ -z "$SECRET_VALUE" ]; then
echo "Error: could not read value for '$OLD_NAME', skipping"
continue
fi
echo "Migrating: '$OLD_NAME' -> '$NEW_NAME' (v1)"
printf '%s' "$SECRET_VALUE" | abra app secret insert -C "$APP_NAME" "$NEW_NAME" v1
done <<< "$MIGRATIONS"
echo ""
echo "Done."
}
# adds a domain to a scrape config or creates a new one
add_domain(){
name=$1
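
The `old|new` mapping that `migrate_secret_names` iterates over can be tried out in isolation before touching a server. The following is a minimal sketch, not part of `abra.sh`: `new_secret_name` is a hypothetical helper, and it feeds the list through a here-document rather than the here-string used in the function so that it also runs under a plain POSIX shell.

```shell
# Same "old|new" line format as the MIGRATIONS variable in migrate_secret_names.
MIGRATIONS="
grafana_admin_password|gf_adminpasswd
grafana_smtp_password|gf_smtp_passwd
grafana_oidc_client_secret|gf_oidc_secret
matrix_access_token|matrix_token
loki_aws_secret_access_key|loki_aws_key
"

# Hypothetical helper: print the shortened name for an old secret name.
# Returns 1 when the name has no entry in the mapping.
new_secret_name() {
  local wanted=$1 old new
  while IFS='|' read -r old new; do
    [ -z "$old" ] && continue        # skip the blank first and last lines
    if [ "$old" = "$wanted" ]; then
      printf '%s\n' "$new"
      return 0
    fi
  done <<EOF
$MIGRATIONS
EOF
  return 1
}
```

Calling `new_secret_name matrix_access_token` prints `matrix_token`; a name that is not in the list, such as `basic_auth`, prints nothing and returns a non-zero status.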

View File

@ -12,7 +12,7 @@ http:
matrix:
homeserver-url: "{{ env "GF_MATRIX_HOMESERVER_URL" }}"
user-id: "{{ env "GF_MATRIX_USER_ID" }}"
access-token: "{{ secret "matrix_access_token" }}"
access-token: "{{ secret "matrix_token" }}"
room-mapping:
matrixroom: "{{ env "GF_MATRIX_ROOM_ID" }}"

View File

@ -1,13 +0,0 @@
---
version: "3.8"
services:
app:
ports:
- "9100:9100"
deploy:
cadvisor:
ports:
- "9101:8080"
deploy:

View File

@ -3,7 +3,7 @@ version: '3.8'
services:
grafana:
secrets:
- grafana_oidc_client_secret
- gf_oidc_secret
environment:
- OIDC_API_URL
- OIDC_AUTH_URL
@ -12,6 +12,6 @@ services:
- OIDC_TOKEN_URL
secrets:
grafana_oidc_client_secret:
gf_oidc_secret:
external: true
name: ${STACK_NAME}_grafana_oidc_client_secret_${SECRET_GRAFANA_OIDC_CLIENT_SECRET_VERSION}
name: ${STACK_NAME}_gf_oidc_secret_${SECRET_GF_OIDC_SECRET_VERSION}

View File

@ -3,16 +3,16 @@ version: '3.8'
services:
grafana:
secrets:
- grafana_smtp_password
- gf_smtp_passwd
environment:
- GF_SMTP_HOST
- GF_SMTP_USER
- GF_SMTP_PASSWORD__FILE=/run/secrets/grafana_smtp_password
- GF_SMTP_PASSWORD__FILE=/run/secrets/gf_smtp_passwd
- GF_SMTP_ENABLED
- GF_SMTP_FROM_ADDRESS
- GF_SMTP_SKIP_VERIFY
secrets:
grafana_smtp_password:
gf_smtp_passwd:
external: true
name: ${STACK_NAME}_grafana_smtp_password_${SECRET_GRAFANA_SMTP_PASSWORD_VERSION}
name: ${STACK_NAME}_gf_smtp_passwd_${SECRET_GF_SMTP_PASSWD_VERSION}

View File

@ -2,25 +2,25 @@ version: '3.8'
services:
grafana:
image: grafana/grafana:10.4.14
image: grafana/grafana:12.4.0
volumes:
- grafana-data:/var/lib/grafana:rw
secrets:
- grafana_admin_password
- gf_adminpasswd
configs:
- source: grafana_custom_ini
- source: gf_custom_ini
target: /etc/grafana/grafana.ini
- source: grafana_datasources_yml
- source: gf_datasources
target: /etc/grafana/provisioning/datasources/datasources.yml
- source: grafana_dashboards_yml
- source: gf_dashboards
target: /etc/grafana/provisioning/dashboards/dashboards.yml
- source: grafana_swarm_dashboard_json
- source: gf_swarm_dash
target: /var/lib/grafana/dashboards/docker-swarm-nodes.json
- source: grafana_stacks_dashboard_json
- source: gf_stacks_dash
target: /var/lib/grafana/dashboards/docker-swarm-stacks.json
- source: grafana_traefik_dashboard_json
- source: gf_traefik_dash
target: /var/lib/grafana/dashboards/traefik.json
- source: grafana_backup_dashboard_json
- source: gf_backup_dash
target: /var/lib/grafana/dashboards/backup.json
- source: gf_alerts_node
target: /etc/grafana/provisioning/alerting/node.yml
@ -29,7 +29,7 @@ services:
- internal
environment:
- GF_SERVER_ROOT_URL
- GF_SECURITY_ADMIN_PASSWORD__FILE=/run/secrets/grafana_admin_password
- GF_SECURITY_ADMIN_PASSWORD__FILE=/run/secrets/gf_adminpasswd
- GF_SECURITY_ALLOW_EMBEDDING
- GF_INSTALL_PLUGINS
- ALERT_NODE_DISK_SPACE_ENABLED
@ -37,9 +37,9 @@ services:
deploy:
labels:
- "traefik.enable=true"
- "traefik.docker.network=proxy"
- "traefik.swarm.network=proxy"
- "traefik.http.services.${STACK_NAME}-grafana.loadbalancer.server.port=3000"
- "traefik.http.routers.${STACK_NAME}-grafana.rule=Host(`${GRAFANA_DOMAIN:-$DOMAIN}`)"
- "traefik.http.routers.${STACK_NAME}-grafana.rule=Host(`${DOMAIN}`)"
- "traefik.http.routers.${STACK_NAME}-grafana.entrypoints=web-secure"
- "traefik.http.routers.${STACK_NAME}-grafana.tls=true"
- "traefik.http.routers.${STACK_NAME}-grafana.tls.certresolver=${LETS_ENCRYPT_ENV}"
@ -51,27 +51,27 @@ services:
start_period: 10s
configs:
grafana_custom_ini:
gf_custom_ini:
template_driver: golang
name: ${STACK_NAME}_grafana_custom_ini_${GRAFANA_CUSTOM_INI_VERSION}
name: ${STACK_NAME}_gf_custom_ini_${GF_CUSTOM_INI_VERSION}
file: grafana_custom.ini
grafana_datasources_yml:
name: ${STACK_NAME}_g_datasources_yml_${GRAFANA_DATASOURCES_YML_VERSION}
gf_datasources:
name: ${STACK_NAME}_gf_datasources_${GF_DATASOURCES_VERSION}
file: grafana-datasources.yml
grafana_dashboards_yml:
name: ${STACK_NAME}_g_dashboards_yml_${GRAFANA_DASHBOARDS_YML_VERSION}
gf_dashboards:
name: ${STACK_NAME}_gf_dashboards_${GF_DASHBOARDS_VERSION}
file: grafana-dashboards.yml
grafana_swarm_dashboard_json:
name: ${STACK_NAME}_g_swarm_dashboard_json_${GRAFANA_SWARM_DASHBOARD_JSON_VERSION}
gf_swarm_dash:
name: ${STACK_NAME}_gf_swarm_dash_${GF_SWARM_DASH_VERSION}
file: grafana-swarm-dashboard.json
grafana_stacks_dashboard_json:
name: ${STACK_NAME}_g_stacks_dashboard_json_${GRAFANA_STACKS_DASHBOARD_JSON_VERSION}
gf_stacks_dash:
name: ${STACK_NAME}_gf_stacks_dash_${GF_STACKS_DASH_VERSION}
file: grafana-stacks-dashboard.json
grafana_traefik_dashboard_json:
name: ${STACK_NAME}_g_traefik_dashboard_json_${GRAFANA_TRAEFIK_DASHBOARD_JSON_VERSION}
gf_traefik_dash:
name: ${STACK_NAME}_gf_traefik_dash_${GF_TRAEFIK_DASH_VERSION}
file: grafana-traefik-dashboard.json
grafana_backup_dashboard_json:
name: ${STACK_NAME}_g_backup_dashboard_json_${GRAFANA_BACKUP_DASHBOARD_JSON_VERSION}
gf_backup_dash:
name: ${STACK_NAME}_gf_backup_dash_${GF_BACKUP_DASH_VERSION}
file: grafana-backup-dashboard.json
gf_alerts_node:
template_driver: golang
@ -83,6 +83,6 @@ volumes:
secrets:
grafana_admin_password:
gf_adminpasswd:
external: true
name: ${STACK_NAME}_grafana_admin_password_${SECRET_GRAFANA_ADMIN_PASSWORD_VERSION}
name: ${STACK_NAME}_gf_adminpasswd_${SECRET_GF_ADMINPASSWD_VERSION}

View File

@ -2,7 +2,7 @@ version: '3.8'
services:
loki:
image: grafana/loki:2.9.11
image: grafana/loki:3.6.7
command: -config.file=/etc/loki/local-config.yaml
networks:
- proxy
@ -12,7 +12,7 @@ services:
volumes:
- loki-data:/loki
# secrets:
# - loki_aws_secret_access_key
# - loki_aws_key
environment:
- LOKI_ACCESS_KEY_ID
- LOKI_AWS_ENDPOINT
@ -27,7 +27,7 @@ services:
condition: on-failure
labels:
- "traefik.enable=true"
- "traefik.docker.network=proxy"
- "traefik.swarm.network=proxy"
- "traefik.http.services.${STACK_NAME}-loki.loadbalancer.server.port=3100"
- "traefik.http.routers.${STACK_NAME}-loki.rule=Host(`loki.${DOMAIN}`)"
- "traefik.http.routers.${STACK_NAME}-loki.entrypoints=web-secure"
@ -47,6 +47,6 @@ volumes:
loki-data:
# secrets:
# loki_aws_secret_access_key:
# loki_aws_key:
# external: true
# name: ${STACK_NAME}_loki_aws_secret_access_key_${SECRET_LOKI_AWS_SECRET_ACCESS_KEY_VERSION}
# name: ${STACK_NAME}_loki_aws_key_${SECRET_LOKI_AWS_KEY_VERSION}

View File

@ -2,9 +2,9 @@ version: '3.8'
services:
matrix-alertmanager-receiver:
image: metio/matrix-alertmanager-receiver:2025.2.9
image: metio/matrix-alertmanager-receiver:2026.2.25
secrets:
- matrix_access_token
- matrix_token
configs:
- source: matrix-alertmanager-receiver-config
target: /etc/matrix-alertmanager-receiver/config.yml
@ -23,6 +23,6 @@ configs:
file: alertmanager-matrix-config.yml.tmpl
secrets:
matrix_access_token:
matrix_token:
external: true
name: ${STACK_NAME}_matrix_access_token_${SECRET_MATRIX_ACCESS_TOKEN_VERSION}
name: ${STACK_NAME}_matrix_token_${SECRET_MATRIX_TOKEN_VERSION}

View File

@ -2,7 +2,7 @@ version: '3.8'
services:
prometheus:
image: prom/prometheus:v2.55.1
image: prom/prometheus:v3.10.0
secrets:
- basic_auth
volumes:
@ -16,6 +16,8 @@ services:
- "--web.console.libraries=/usr/share/prometheus/console_libraries"
- "--web.console.templates=/usr/share/prometheus/consoles"
- "--storage.tsdb.retention.time=${PROMETHEUS_RETENTION_TIME}"
- "--enable-feature=remote-write-receiver"
- "--web.enable-remote-write-receiver"
networks:
- proxy
- internal
@ -24,12 +26,13 @@ services:
condition: on-failure
labels:
- "traefik.enable=true"
- "traefik.docker.network=proxy"
- "traefik.swarm.network=proxy"
- "traefik.http.services.${STACK_NAME}-prometheus.loadbalancer.server.port=9090"
- "traefik.http.routers.${STACK_NAME}-prometheus.rule=Host(`prometheus.${DOMAIN}`)"
- "traefik.http.routers.${STACK_NAME}-prometheus.entrypoints=web-secure"
- "traefik.http.routers.${STACK_NAME}-prometheus.tls=true"
- "traefik.http.routers.${STACK_NAME}-prometheus.tls.certresolver=${LETS_ENCRYPT_ENV}"
- "traefik.http.routers.${STACK_NAME}-prometheus.middlewares=basicauth@file"
configs:
prometheus_yml:

View File

@ -1,30 +0,0 @@
version: "3.8"
services:
promtail:
image: grafana/promtail:2.9.11
volumes:
- /var/log:/var/log:ro
- /var/run/docker.sock:/var/run/docker.sock
command: -config.file=/etc/promtail/config.yml
configs:
- source: promtail_yml
target: /etc/promtail/config.yml
networks:
- internal
secrets:
- basic_auth
environment:
- DOMAIN
- LOKI_PUSH_URL
configs:
promtail_yml:
name: ${STACK_NAME}_promtail_yml_${PROMTAIL_YML_VERSION}
file: promtail.yml.tmpl
template_driver: golang
secrets:
basic_auth:
external: true
name: ${STACK_NAME}_basic_auth_${SECRET_BASIC_AUTH_VERSION}

View File

@ -2,7 +2,7 @@ version: '3.8'
services:
pushgateway:
image: prom/pushgateway:v1.10.0
image: prom/pushgateway:v1.11.2
command:
- '--web.listen-address=:9191'
- '--push.disable-consistency-check'
@ -17,7 +17,7 @@ services:
condition: on-failure
labels:
- "traefik.enable=true"
- "traefik.docker.network=proxy"
- "traefik.swarm.network=proxy"
- "traefik.http.services.${STACK_NAME}-pushgateway.loadbalancer.server.port=9191"
- "traefik.http.routers.${STACK_NAME}-pushgateway.rule=Host(`pushgateway.${DOMAIN}`)"
- "traefik.http.routers.${STACK_NAME}-pushgateway.entrypoints=web-secure"

6
compose.syslog.yml Normal file
View File

@ -0,0 +1,6 @@
---
version: "3.8"
services:
app:
ports:
- "514:514"

View File

@ -3,89 +3,46 @@ version: "3.8"
services:
app:
image: prom/node-exporter:v1.8.1
user: root
environment:
- NODE_ID={{.Node.ID}}
volumes:
- /proc:/host/proc:ro
- /sys:/host/sys:ro
- /:/rootfs:ro
- /etc/hostname:/etc/nodename:ro
command:
- "--path.sysfs=/host/sys"
- "--path.procfs=/host/proc"
- "--path.rootfs=/rootfs"
- "--collector.textfile.directory=/etc/node-exporter/"
- "--collector.filesystem.ignored-mount-points=^/(sys|proc|dev|host|etc)($$|/)"
- "--no-collector.ipvs"
image: grafana/alloy:v1.16.1
hostname: "${DOMAIN}"
configs:
- source: entrypoint
target: /entrypoint.sh
- source: config_alloy
target: /etc/alloy/config.alloy
volumes:
- /:/rootfs:ro
- /var/run:/var/run:rw
- /var/run/docker.sock:/var/run/docker.sock
- /sys:/sys:ro
- /var/lib/docker:/var/lib/docker:ro
- /dev:/dev:ro
- alloy-data:/var/lib/alloy/data
command:
- "run"
- "--storage.path=/var/lib/alloy/data"
- "/etc/alloy/config.alloy"
networks:
- internal
- proxy
entrypoint: [ "/bin/sh", "-e", "/entrypoint.sh" ]
secrets:
- basic_auth
deploy:
restart_policy:
condition: on-failure
labels:
- "backupbot.backup=${ENABLE_BACKUPS:-true}"
- "traefik.enable=true"
- "traefik.docker.network=proxy"
- "traefik.http.services.${STACK_NAME}-node.loadbalancer.server.port=9100"
- "traefik.http.routers.${STACK_NAME}-node.rule=Host(`node.${DOMAIN}`)"
- "traefik.http.routers.${STACK_NAME}-node.entrypoints=web-secure"
- "traefik.http.routers.${STACK_NAME}-node.tls=true"
- "traefik.http.routers.${STACK_NAME}-node.tls.certresolver=${LETS_ENCRYPT_ENV}"
- "traefik.http.routers.${STACK_NAME}-node.middlewares=basicauth@file"
- "traefik.enable=false"
- "coop-cloud.${STACK_NAME}.version=1.6.0+v1.8.1"
- "coop-cloud.${STACK_NAME}.timeout=${TIMEOUT}"
cadvisor:
image: gcr.io/cadvisor/cadvisor:v0.49.2
command:
- "-logtostderr"
- "--enable_metrics=cpu,cpuLoad,disk,diskIO,process,memory,network"
# all possible metrics: advtcp,app,cpu,cpuLoad,cpu_topology,cpuset,disk,diskIO,hugetlb,memory,memory_numa,network,oom_event,percpu,perf_event,process,referenced_memory,resctrl,sched,tcp,udp.
- "--housekeeping_interval=120s"
- "--docker_only=true"
volumes:
- /var/lib/docker/:/var/lib/docker:ro
- /dev/disk/:/dev/disk:ro
- /sys:/sys:ro
- /var/run:/var/run:ro
- /:/rootfs:ro
networks:
- internal
- proxy
deploy:
restart_policy:
condition: on-failure
labels:
- "traefik.enable=true"
- "traefik.docker.network=proxy"
- "traefik.http.services.${STACK_NAME}-cadvisor.loadbalancer.server.port=8080"
- "traefik.http.routers.${STACK_NAME}-cadvisor.rule=Host(`cadvisor.${DOMAIN}`)"
- "traefik.http.routers.${STACK_NAME}-cadvisor.entrypoints=web-secure"
- "traefik.http.routers.${STACK_NAME}-cadvisor.tls=true"
- "traefik.http.routers.${STACK_NAME}-cadvisor.tls.certresolver=${LETS_ENCRYPT_ENV}"
- "traefik.http.routers.${STACK_NAME}-cadvisor.middlewares=basicauth@file"
healthcheck:
test: wget --quiet --tries=1 --spider http://localhost:8080/healthz || exit 1
interval: 15s
timeout: 15s
retries: 5
start_period: 30s
configs:
entrypoint:
name: ${STACK_NAME}_entrypoint_${ENTRYPOINT_VERSION}
file: node-exporter-entrypoint.sh
config_alloy:
template_driver: golang
name: ${STACK_NAME}_config_alloy_${CONFIG_ALLOY_VERSION}
file: config.alloy.tmpl
networks:
proxy:
external: true
internal:
volumes:
alloy-data:
secrets:
basic_auth:
external: true
name: ${STACK_NAME}_basic_auth_${SECRET_BASIC_AUTH_VERSION}

80
config.alloy.tmpl Normal file
View File

@ -0,0 +1,80 @@
logging {
level = "info"
format = "logfmt"
}
discovery.docker "linux" {
host = "unix:///var/run/docker.sock"
}
{{ if ne (env "PROMETHEUS_REMOTE_WRITE_URL") "" }}
prometheus.exporter.cadvisor "docker" {
}
prometheus.exporter.unix "default" {
include_exporter_metrics = true
rootfs_path = "/rootfs"
}
prometheus.scrape "default" {
targets = array.concat(
[{
job = "alloy",
__address__ = "127.0.0.1:12345",
}],
prometheus.exporter.unix.default.targets,
prometheus.exporter.cadvisor.docker.targets,
)
forward_to = [prometheus.remote_write.prometheus.receiver]
}
prometheus.remote_write "prometheus" {
endpoint {
url = "{{ env "PROMETHEUS_REMOTE_WRITE_URL" }}"
basic_auth {
username = "admin"
password = "{{ secret "basic_auth" }}"
}
}
}
{{ end }}
{{ if ne (env "LOKI_PUSH_URL") "" }}
loki.source.docker "docker" {
host = "unix:///var/run/docker.sock"
targets = discovery.docker.linux.targets
labels = {"app" = "docker"}
forward_to = [loki.write.loki.receiver]
}
{{ if eq (env "JOURNALD") "1" }}
loki.source.journal "journal" {
path = "/var/log/journal"
labels = { job = "{{ env "DOMAIN" }}" }
forward_to = [loki.write.loki.receiver]
}
{{ end }}
{{ if eq (env "SYSLOG") "1" }}
loki.source.syslog "syslog" {
listener {
address = "[::1]:514"
}
forward_to = [loki.write.loki.receiver]
}
{{ end }}
loki.write "loki" {
endpoint {
url = "{{ env "LOKI_PUSH_URL" }}"
basic_auth {
username = "admin"
password = "{{ secret "basic_auth" }}"
}
}
}
{{ end }}
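
Which blocks of the template above end up in the rendered Alloy config depends only on four environment variables. As a rough sketch, the same conditions can be mirrored in shell to preview the outcome for a given `.env`; `alloy_pipelines` is a hypothetical helper and not part of the recipe.

```shell
# Sketch only: mirrors the {{ if }} conditions of config.alloy.tmpl.
# alloy_pipelines is a hypothetical helper, not shipped with the recipe.
alloy_pipelines() {
  # Metrics block: rendered only when a remote write URL is set.
  if [ -n "${PROMETHEUS_REMOTE_WRITE_URL:-}" ]; then
    echo "metrics: alloy + unix + cadvisor -> ${PROMETHEUS_REMOTE_WRITE_URL}"
  fi
  # Log block: rendered only when a Loki push URL is set.
  if [ -n "${LOKI_PUSH_URL:-}" ]; then
    echo "logs: docker -> ${LOKI_PUSH_URL}"
    # journald and syslog sources are nested inside the Loki block,
    # so they have no effect without LOKI_PUSH_URL.
    if [ "${JOURNALD:-}" = "1" ]; then
      echo "logs: journald -> ${LOKI_PUSH_URL}"
    fi
    if [ "${SYSLOG:-}" = "1" ]; then
      echo "logs: syslog -> ${LOKI_PUSH_URL}"
    fi
  fi
  return 0
}
```

One consequence worth noting: setting `JOURNALD=1` or `SYSLOG=1` without `LOKI_PUSH_URL` renders no log source at all, because both checks sit inside the `LOKI_PUSH_URL` block.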

View File

@ -21,7 +21,7 @@ tls_skip_verify_insecure = false
allow_sign_up = true
auto_login = true
client_id = {{ env "OIDC_CLIENT_ID" }}
client_secret = {{ secret "grafana_oidc_client_secret" }}
client_secret = {{ secret "gf_oidc_secret" }}
auth_url = {{ env "OIDC_AUTH_URL" }}
token_url = {{ env "OIDC_TOKEN_URL" }}
api_url = {{ env "OIDC_API_URL" }}

View File

@ -34,7 +34,6 @@ ingester:
max_chunk_age: 1h # All chunks will be flushed when they hit this age, default is 1h
chunk_target_size: 1048576 # Loki will attempt to build chunks up to 1.5MB, flushing first if chunk_idle_period or max_chunk_age is reached first
chunk_retain_period: 30s # Must be greater than index read cache TTL if using an index cache (Default index read cache TTL is 5m)
max_transfer_retries: 0 # Chunk transfers disabled
wal:
dir: "/tmp/wal"
@ -53,7 +52,7 @@ schema_config:
- from: 2020-10-24
store: boltdb-shipper
object_store: filesystem
schema: v11
schema: v13
index:
prefix: index_
period: 24h
@ -63,7 +62,6 @@ storage_config:
active_index_directory: /loki/boltdb-shipper-active
cache_location: /loki/boltdb-shipper-cache
cache_ttl: 24h # Can be increased for faster performance over longer query periods, uses more disk space
shared_store: filesystem
filesystem:
directory: /loki/chunks
{{ end }}
@ -72,7 +70,6 @@ schema_config:
configs:
- from: 2020-11-25
store: boltdb-shipper
object_store: aws
schema: v11
index:
prefix: index_
@ -89,7 +86,7 @@ storage_config:
endpoint: {{ env "LOKI_AWS_ENDPOINT" }}
region: {{ env "LOKI_AWS_REGION" }}
access_key_id: {{ env "LOKI_ACCESS_KEY_ID" }}
secret_access_key: {{ secret "loki_aws_secret_access_key" }}
secret_access_key: {{ secret "loki_aws_key" }}
bucketnames: {{ env "LOKI_BUCKET_NAMES" }}
insecure: false
sse_encryption: false
@ -103,19 +100,24 @@ storage_config:
compactor:
working_directory: /loki/boltdb-shipper-compactor
shared_store: filesystem
compaction_interval: 10m
retention_enabled: true
retention_delete_delay: 2h
retention_delete_worker_count: 150
{{ if eq (env "LOKI_STORAGE_FILESYSTEM") "1" }}
delete_request_store: filesystem
{{ end }}
{{ if eq (env "LOKI_STORAGE_S3") "1" }}
delete_request_store: aws
{{ end }}
limits_config:
enforce_metric_name: false
reject_old_samples: true
reject_old_samples_max_age: 168h
retention_period: {{ env "LOKI_RETENTION_PERIOD" }}
split_queries_by_interval: 24h
max_query_parallelism: 100
allow_structured_metadata: false
query_scheduler:
max_outstanding_requests_per_tenant: 4096
@ -123,9 +125,6 @@ query_scheduler:
frontend:
max_outstanding_per_tenant: 4096
chunk_store_config:
max_look_back_period: 0s
table_manager:
retention_deletes_enabled: false
retention_period: 0s

View File

@ -1,11 +0,0 @@
#!/bin/sh -e
NODE_NAME=$(cat /etc/nodename)
mkdir -p /etc/node-exporter
echo "node_meta{node_id=\"$NODE_ID\", container_label_com_docker_swarm_node_id=\"$NODE_ID\", node_name=\"$NODE_NAME\"} 1" > /etc/node-exporter/node-meta.prom
set -- /bin/node_exporter "$@"
exec "$@"

View File

@ -1,37 +0,0 @@
server:
http_listen_port: 9080
grpc_listen_port: 0
positions:
filename: /tmp/positions.yaml
clients:
- url: {{ env "LOKI_PUSH_URL" }}
basic_auth:
username: admin
password: {{ secret "basic_auth" }}
external_labels:
hostname: {{ env "DOMAIN" }}
scrape_configs:
- job_name: system
static_configs:
- targets:
- localhost
labels:
job: varlogs
__path__: /var/log/*log
- job_name: "docker"
docker_sd_configs:
- host: "unix:///var/run/docker.sock"
refresh_interval: "10s"
relabel_configs:
- source_labels: ['__meta_docker_container_name']
target_label: "container_name"
- source_labels: ['__meta_docker_container_id']
target_label: "container_id"
- source_labels: ['__meta_docker_container_label_com_docker_stack_namespace']
target_label: "stack_namespace"
- source_labels: ['__meta_docker_container_label_com_docker_swarm_service_name']
target_label: "service_name"

View File

@ -5,3 +5,8 @@ COMPOSE_FILE="$COMPOSE_FILE:compose.grafana-oidc.yml"
2. SMTP was moved into a separate compose file. If you have smtp configured you need to add the following line to your .env file:
COMPOSE_FILE="$COMPOSE_FILE:compose.grafana-smtp.yml"
3. The scrape-config.example.yml file and add_node() command were updated to use a secure endpoint for the traefik metrics instead of http. This requires an updated Traefik recipe that publishes the metrics on https.
4. Secret and config names were shortened to max 14 characters to prevent going over Docker's 64 character limit when STACK_NAME and VERSION are added to it.
When upgrading, you need to reinsert the secrets with their shorter names. Run `abra app secret list <domain>` to see which secrets aren't created on the server (because their name was shortened) and run `abra app secret insert <domain> <secret_name> v1 <value>` to reinsert them with the shorter name. Or you can use the migrate_secret_names function in abra.sh to reinsert all existing secrets with their shorter name automatically: `abra app cmd --local <domain> migrate_secret_names`
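
To see why the shorter names matter, the full object name can be assembled the way the compose files do it (`${STACK_NAME}_<name>_<version>`) and measured against the 64 character limit mentioned above. This is a minimal sketch: `full_object_name` and `check_name_length` are hypothetical helpers, and the stack name in the usage note is made up for illustration.

```shell
# Hypothetical helpers, not part of abra.sh.

# Build the name as the compose files do: <stack>_<name>_<version>.
full_object_name() {
  printf '%s_%s_%s' "$1" "$2" "$3"
}

# Report whether the assembled name fits within 64 characters.
check_name_length() {
  local name
  name=$(full_object_name "$1" "$2" "$3")
  if [ "${#name}" -gt 64 ]; then
    echo "too long (${#name}): $name"
    return 1
  fi
  echo "ok (${#name}): $name"
}
```

With an assumed stack name of `monitoring_long_subdomain_example_org`, `check_name_length "$stack" grafana_oidc_client_secret v1` reports the name as too long, while the shortened `gf_oidc_secret` fits.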

View File

@ -1,4 +1,4 @@
- targets:
- 'example.org:8082'
- 'metrics.traefik.example.org'
- 'node.monitoring.example.org'
- 'cadvisor.monitoring.example.org'