6 Commits

Author SHA1 Message Date
bd5f181737 fix(db): bump DB_ENTRYPOINT_VERSION to v3 so the entrypoint config reloads
The install-user fix changed the entrypoint content; swarm configs are
immutable, so the config name (which embeds DB_ENTRYPOINT_VERSION) must change
for a redeploy to pick up the new script.
2026-06-16 18:04:05 +00:00
57f5ee2531 fix(db): run pg_upgrade as the old cluster's real install user
pg_upgrade must run as the old cluster's bootstrap superuser (oid 10), and the
new cluster must be initialised with that same user, otherwise it fails the
"database user is the install user" consistency check. The install user is not
necessarily $POSTGRES_USER: clusters created with the default "postgres"
superuser plus a separate app role (e.g. discourse) are common.

Detect it from the old cluster by briefly starting it and reading pg_roles
(oid = 10) as the known app role, then use it for both the new cluster's initdb
and the pg_upgrade -U argument.
2026-06-16 17:59:26 +00:00
101ffe1964 fix(db): make pg_upgrade migration idempotent & crash-safe
The postgres major-version migration in the db entrypoint was not safe to
re-run. If the container was killed mid-migration it could crash-loop forever
("mkdir: cannot create directory .../old_data: File exists") or silently initdb
a fresh empty cluster over the live data once PG_VERSION had been moved out of
$PGDATA but before the in-progress marker was written.

Replace the marker file with a state-driven guard keyed on the scratch dirs:
empty old_data/new_data means the run was interrupted before any data moved, so
discard and retry (idempotent); non-empty means data may only live there, so
stop for manual recovery. Bump DB_ENTRYPOINT_VERSION v1->v2 so swarm picks up
the new (immutable) config.
2026-06-16 17:00:16 +00:00
433ce12dbc Merge pull request 'chore: upgrade to 0.10.0+3.5.0' (#2) from upgrade-0.8.0+3.5.0 into main
Reviewed-on: https://git.autonomic.zone/recipe-maintainers/discourse/pulls/2
2026-06-15 17:37:14 +00:00
b7d8a244d7 chore: upgrade to 0.10.0+3.5.0 (redis 8.0->8.8-alpine) 2026-06-11 22:52:37 +00:00
7ae7b0f76e chore: upgrade to 0.9.0+3.5.0 2026-06-05 02:03:34 +00:00
3 changed files with 31 additions and 13 deletions

View File

@ -1,2 +1,2 @@
export DB_ENTRYPOINT_VERSION=v1
export DB_ENTRYPOINT_VERSION=v3
export PG_BACKUP_VERSION=v2

View File

@ -43,7 +43,7 @@ services:
#- "traefik.http.routers.${STACK_NAME}.middlewares=${STACK_NAME}-redirect"
#- "traefik.http.middlewares.${STACK_NAME}-redirect.headers.SSLForceHost=true"
#- "traefik.http.middlewares.${STACK_NAME}-redirect.headers.SSLHost=${DOMAIN}"
- "coop-cloud.${STACK_NAME}.version=0.8.0+3.5.0"
- "coop-cloud.${STACK_NAME}.version=0.10.0+3.5.0"
healthcheck:
test: "ruby -e \"require 'uri'; require 'net/http'; uri = URI('http://localhost:3000/srv/status'); res = Net::HTTP.get_response(uri); if res.is_a?(Net::HTTPSuccess) then exit (0) else exit (1) end\""
interval: 30s
@ -80,7 +80,7 @@ services:
backupbot.restore.post-hook: "/pg_backup.sh restore"
redis:
image: redis:7.4-alpine
image: redis:8.8-alpine
networks:
- internal
volumes:

View File

@ -2,16 +2,23 @@
set -e
MIGRATION_MARKER=$PGDATA/migration_in_progress
OLDDATA=$PGDATA/old_data
NEWDATA=$PGDATA/new_data
echo "Running as $(id)"
if [ -e $MIGRATION_MARKER ]; then
echo "FATAL: migration was started but did not complete in a previous run. manual recovery necessary"
exit 1
fi
# The migration uses $OLDDATA/$NEWDATA as scratch and removes them when it
# finishes; a leftover *empty* one means a run was interrupted before any data
# moved (data still intact at $PGDATA) so we clear it and retry, while a
# *non-empty* one means data may live only there, so we stop for manual recovery.
for scratch in $OLDDATA $NEWDATA; do
if [ -d "$scratch" ] && [ -n "$(ls -A "$scratch")" ]; then
echo "FATAL: $scratch exists and is not empty - a previous migration did not"
echo "complete and the data may only exist there. manual recovery necessary."
exit 1
fi
done
rm -rf $OLDDATA $NEWDATA
if [ -f $PGDATA/PG_VERSION ]; then
DATA_VERSION=$(cat $PGDATA/PG_VERSION)
@ -23,22 +30,33 @@ if [ -f $PGDATA/PG_VERSION ]; then
apt-get update && apt-get install -y --no-install-recommends \
postgresql-$DATA_VERSION \
&& rm -rf /var/lib/apt/lists/*
# pg_upgrade must run as the old cluster's bootstrap superuser (the "install
# user", oid 10), and the new cluster must be initialised with that same
# user. It is not necessarily $POSTGRES_USER (e.g. clusters created with the
# default "postgres" superuser and a separate app role), so read it from the
# old cluster: briefly start it and ask, connecting as the app role we know.
PGBIN=/usr/lib/postgresql/$DATA_VERSION/bin
gosu postgres $PGBIN/pg_ctl -D $PGDATA -w \
-o "-c listen_addresses= -c unix_socket_directories=/tmp" start
INSTALL_USER=$(gosu postgres psql -h /tmp -U "$POSTGRES_USER" -d postgres -tAc \
"select rolname from pg_roles where oid = 10")
gosu postgres $PGBIN/pg_ctl -D $PGDATA -w stop
echo "old cluster install user: $INSTALL_USER"
echo "shuffling around"
gosu postgres mkdir $OLDDATA $NEWDATA
chmod 700 $OLDDATA $NEWDATA
mv $PGDATA/* $OLDDATA/ || true
touch $MIGRATION_MARKER
echo "running initdb"
# abuse entrypoint script for initdb by making server error out
gosu postgres bash -c "export PGDATA=$NEWDATA ; /usr/local/bin/docker-entrypoint.sh --invalid-arg || true"
# abuse entrypoint script for initdb by making server error out; initialise
# the new cluster with the same superuser as the old one so pg_upgrade matches
gosu postgres bash -c "export PGDATA=$NEWDATA POSTGRES_USER=$INSTALL_USER ; /usr/local/bin/docker-entrypoint.sh --invalid-arg || true"
echo "running pg_upgrade"
cd /tmp
gosu postgres pg_upgrade --link -b /usr/lib/postgresql/$DATA_VERSION/bin -d $OLDDATA -D $NEWDATA -U $POSTGRES_USER
gosu postgres pg_upgrade --link -b /usr/lib/postgresql/$DATA_VERSION/bin -d $OLDDATA -D $NEWDATA -U $INSTALL_USER
cp $OLDDATA/pg_hba.conf $NEWDATA/
mv $NEWDATA/* $PGDATA
rm -rf $OLDDATA
rmdir $NEWDATA
rm $MIGRATION_MARKER
echo "migration complete"
fi
fi