2 Commits

Author SHA1 Message Date
6ae2d2cf51 fix(db): make the postgres major-version migration safe and correct
The in-place pg_upgrade in the db entrypoint could crash-loop or fail on real
clusters. This reworks it:

- Idempotent, crash-safe: replace the fragile migration_in_progress marker with
  a state-driven guard on the old_data/new_data scratch dirs. An empty leftover
  means a run was interrupted before any data moved (data still intact at
  $PGDATA) so it is discarded and retried; a non-empty one means data may live
  only there, so it stops for manual recovery. Removes both the
  "mkdir: File exists" crash-loop and the silent fresh-initdb-over-live-data
  window.

- Correct install user: pg_upgrade must run as the old cluster's bootstrap
  superuser (oid 10), and the new cluster must be initialised with that same
  user. It is not necessarily $POSTGRES_USER (clusters created with the default
  "postgres" superuser plus a separate app role are common). Detect it from the
  old cluster (briefly start it and read pg_roles where oid = 10) and use it for
  both the new cluster's initdb and the pg_upgrade -U argument.

- Bump DB_ENTRYPOINT_VERSION to v3 so swarm reloads the (immutable) config.

Verified on cctest: clean 13->17, interrupted-then-retried, and prod-like
clusters whose install user is "postgres" with a separate "discourse" app role.
2026-06-16 18:27:45 +00:00
7e4d85a810 Merge pull request 'Fix backup/restore' (#14) from fix-restore into main
Some checks failed
continuous-integration/drone/push Build is failing
Reviewed-on: #14
2026-06-15 17:33:39 +00:00

View File

@ -43,7 +43,7 @@ services:
#- "traefik.http.routers.${STACK_NAME}.middlewares=${STACK_NAME}-redirect"
#- "traefik.http.middlewares.${STACK_NAME}-redirect.headers.SSLForceHost=true"
#- "traefik.http.middlewares.${STACK_NAME}-redirect.headers.SSLHost=${DOMAIN}"
- "coop-cloud.${STACK_NAME}.version=0.10.0+3.5.0"
- "coop-cloud.${STACK_NAME}.version=0.8.0+3.5.0"
healthcheck:
test: "ruby -e \"require 'uri'; require 'net/http'; uri = URI('http://localhost:3000/srv/status'); res = Net::HTTP.get_response(uri); if res.is_a?(Net::HTTPSuccess) then exit (0) else exit (1) end\""
interval: 30s
@ -80,7 +80,7 @@ services:
backupbot.restore.post-hook: "/pg_backup.sh restore"
redis:
image: redis:8.8-alpine
image: redis:7.4-alpine
networks:
- internal
volumes: