The in-place pg_upgrade in the db entrypoint could crash-loop or fail on real
clusters. This reworks it:
- Idempotent, crash-safe: replace the fragile migration_in_progress marker with
a state-driven guard on the old_data/new_data scratch dirs. An empty leftover
dir means a run was interrupted before any data moved (data still intact at
$PGDATA), so it is discarded and the upgrade retried; a non-empty one means
data may live only there, so the entrypoint stops for manual recovery. This
removes both the "mkdir: File exists" crash-loop and the silent
fresh-initdb-over-live-data window.
- Correct install user: pg_upgrade must run as the old cluster's bootstrap
superuser (oid 10), and the new cluster must be initialised with that same
user. That user is not necessarily $POSTGRES_USER (clusters created with the
default "postgres" superuser plus a separate app role are common). Detect it
from the old cluster (briefly start it and read rolname from pg_roles where
oid = 10) and use it for both the new cluster's initdb and the pg_upgrade -U
argument.
- Bump DB_ENTRYPOINT_VERSION to v3 so swarm reloads the (immutable) config.
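
For reviewers, the scratch-dir guard from the first bullet can be sketched
roughly as below. This is a minimal illustration, not the entrypoint's actual
code: scratch_state and prepare_scratch are hypothetical names, and the
location of old_data/new_data is an assumption left to the caller.

```shell
# Hypothetical helper: classify one scratch path.
# Prints "absent", "empty", or "non-empty". Anything that exists but is not
# an empty directory is treated as non-empty, to stay on the safe side.
scratch_state() {
    _path=$1
    if [ ! -e "$_path" ]; then
        echo absent
    elif [ -d "$_path" ] && [ -z "$(ls -A "$_path")" ]; then
        echo empty
    else
        echo non-empty
    fi
}

# Hypothetical guard: returns 0 when the upgrade may (re)start, 1 when a
# scratch dir holds data and a human has to look at it first.
prepare_scratch() {
    # Pass 1: refuse before touching anything if any scratch dir has content.
    for _dir in "$@"; do
        if [ "$(scratch_state "$_dir")" = "non-empty" ]; then
            echo "$_dir is not empty; data may live only there" >&2
            return 1
        fi
    done
    # Pass 2: only absent or empty dirs remain; drop the empty leftovers.
    for _dir in "$@"; do
        if [ "$(scratch_state "$_dir")" = "empty" ]; then
            echo "discarding empty leftover $_dir" >&2
            rmdir "$_dir" || return 1
        fi
    done
    return 0
}
```

A caller would pass both scratch paths, for example
prepare_scratch "$SCRATCH/old_data" "$SCRATCH/new_data", where $SCRATCH is a
placeholder for wherever the entrypoint creates them. Checking every dir
before removing any keeps a refused run free of side effects.
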

Verified on cctest: a clean 13->17 upgrade, an interrupted-then-retried run,
and prod-like clusters whose install user is "postgres" with a separate
"discourse" app role.
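
To illustrate the install-user bullet, here is a dry-run sketch of how one
detected name feeds both commands. It only prints the command lines; it does
not start the old cluster or run the query. upgrade_commands, BOOTSTRAP_QUERY
and all paths are hypothetical, and the real entrypoint may pass further
options. The -U, -b, -B, -d and -D flags are the standard initdb and
pg_upgrade ones.

```shell
# Query the entrypoint is described as running against the old cluster;
# oid 10 is the bootstrap superuser. Shown for reference only, not executed.
BOOTSTRAP_QUERY='SELECT rolname FROM pg_roles WHERE oid = 10'

# Hypothetical dry-run helper: print the initdb and pg_upgrade command lines
# that share the same install user. Arguments are printed unquoted, so this
# is only meant for inspection with simple paths.
upgrade_commands() {
    install_user=$1
    old_bin=$2
    new_bin=$3
    old_data=$4
    new_data=$5
    printf '%s\n' \
        "$new_bin/initdb -U $install_user -D $new_data" \
        "$new_bin/pg_upgrade -U $install_user -b $old_bin -B $new_bin -d $old_data -D $new_data"
}
```

With a cluster whose bootstrap superuser is "postgres", both printed commands
carry -U postgres regardless of what $POSTGRES_USER is set to.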