post backup restore command #26

Open
opened 2023-09-07 21:23:08 +00:00 by moritz · 10 comments
Member

Do we need a post backup restore command? Something like psql -d dbname < backup.sql for restoring a database.
I am not quite sure, because the easiest way to restore a database is to simply shut down the container and restore the whole data volume. The database dumps are more like an alternative way, especially for migrating the database to another system. I played a little bit with restoring database dumps and its not really straightforward. On the one hand the container needs to run, on the other hand all database sessions need to be closed.
For postgres I hand to run this whole bunch of commands to simply restore a database dump on a running container:

psql -d dbdame -c """
REVOKE CONNECT ON DATABASE dbname FROM public;
SELECT pid, pg_terminate_backend(pid) 
FROM pg_stat_activity 
WHERE datname = current_database() AND pid <> pg_backend_pid();
"""
dropdb dbname
createdb dbname 
psql -d dbname < backup.sql
psql -d dbname -c "GRANT CONNECT ON DATABASE dbname TO public;"

What is your opinion about a backupbot.restore.post-hook label?

Do we need a post backup restore command? Something like `psql -d dbname < backup.sql` for restoring a database. I am not quite sure, because the easiest way to restore a database is to simply shut down the container and restore the whole data volume. The database dumps are more like an alternative way, especially for migrating the database to another system. I played a little bit with restoring database dumps and its not really straightforward. On the one hand the container needs to run, on the other hand all database sessions need to be closed. For postgres I hand to run this whole bunch of commands to simply restore a database dump on a running container: ``` psql -d dbdame -c """ REVOKE CONNECT ON DATABASE dbname FROM public; SELECT pid, pg_terminate_backend(pid) FROM pg_stat_activity WHERE datname = current_database() AND pid <> pg_backend_pid(); """ dropdb dbname createdb dbname psql -d dbname < backup.sql psql -d dbname -c "GRANT CONNECT ON DATABASE dbname TO public;" ``` What is your opinion about a `backupbot.restore.post-hook` label?
moritz added this to the backupbot revolution project 2023-09-07 21:23:15 +00:00
Owner

I think borgmatic has a very good approach here https://torsion.org/borgmatic/docs/how-to/backup-your-databases/

If you want to backup a database, it's best practice with most database systems to backup an exported database dump, rather than backing up your database's internal file storage. That's because the internal storage can change while you're reading from it. In contrast, a database dump creates a consistent snapshot that is more suited for backups.

So, we need to support dumps? I typically err on the side of "do both", both system level snapshot and then dump and include that. But yeh, unsure how this works out in practice... it's always a bit difficult.

I think `borgmatic` has a very good approach here https://torsion.org/borgmatic/docs/how-to/backup-your-databases/ > If you want to backup a database, it's best practice with most database systems to backup an exported database dump, rather than backing up your database's internal file storage. That's because the internal storage can change while you're reading from it. In contrast, a database dump creates a consistent snapshot that is more suited for backups. So, we need to support dumps? I typically err on the side of "do both", both system level snapshot and then dump and include that. But yeh, unsure how this works out in practice... it's always a bit difficult.
Author
Member

I also think we should support both. Maybe restoring the filesystem per default (as it's less error prone and can be done without deploying the app) and restoring the database as extra command.
And I like the idea to bake the postrgres/mysql dump and restore commands directly into the backupbot. This way the commands can be arbitrarily complex, without cluttering the recipe.
I am not sure if than a backupbot.restore.post-hook is still necessary?

I also think we should support both. Maybe restoring the filesystem per default (as it's less error prone and can be done without deploying the app) and restoring the database as extra command. And I like the idea to bake the postrgres/mysql dump and restore commands directly into the backupbot. This way the commands can be arbitrarily complex, without cluttering the recipe. I am not sure if than a `backupbot.restore.post-hook` is still necessary?
Owner

Do we need a post backup restore command?

yes

Maybe restoring the filesystem per default [..] and restoring the database as extra command.

I'd prefer the other way around (reasoning follows) but as long as both options are available then either is fine.

as it's less error prone

My experience is the opposite:

  1. DB dump backups are guaranteed consistent (as per @decentral1se's comment), vs copying the database files on disk, which - depending on which DB, and how it's configured - often leads to corrupted data. When I did a survey of docker/docker swarm backup systems before writing backup-bot-two, most folks were stopping docker services before running backups, which seems like an anti-pattern i would like to avoid

  2. DB dump backups are also more portable, e.g. postgres is very fussy about loading files (or even its own native backup format files) from different major versions

and can be done without deploying the app

This is fair; mysql backups usually restore fine over default data (e.g. wordpress installs some default tables and data, but these get overwritten when loading a backup) but postgres seems to want a completely blank DB to load a dump, which in practice means overriding entrypoint to prevent DB initialisation and doing a --chaos deploy, not ideal. I'd be sort of ok with keeping "restoring the filesystem" as default for this reason.

And I like the idea to bake the postrgres/mysql dump and restore commands directly into the backupbot. This way the commands can be arbitrarily complex, without cluttering the recipe.

can we do both? the original backupbot implementation had helper scripts which made the post-restore labels less verbose, but still allowed defining custom hooks where needed. but custom hooks could potentially come later if needed.

> Do we need a post backup restore command? yes > Maybe restoring the filesystem per default [..] and restoring the database as extra command. I'd prefer the other way around (reasoning follows) but as long as both options are available then either is fine. > as it's less error prone My experience is the opposite: 1. DB dump backups are guaranteed consistent (as per @decentral1se's comment), vs copying the database files on disk, which - depending on which DB, and how it's configured - often leads to corrupted data. When I did a survey of docker/docker swarm backup systems before writing backup-bot-two, most folks were stopping docker services before running backups, which seems like an anti-pattern i would like to avoid 2. DB dump backups are also more portable, e.g. postgres is very fussy about loading files (or even its own native backup format files) from different major versions > and can be done without deploying the app This is fair; mysql backups usually restore fine over default data (e.g. wordpress installs some default tables and data, but these get overwritten when loading a backup) but postgres seems to want a completely blank DB to load a dump, which in practice means overriding `entrypoint` to prevent DB initialisation and doing a `--chaos` deploy, not ideal. I'd be sort of ok with keeping "restoring the filesystem" as default for this reason. > And I like the idea to bake the postrgres/mysql dump and restore commands directly into the backupbot. This way the commands can be arbitrarily complex, without cluttering the recipe. can we do both? the original backupbot implementation had helper scripts which made the `post-restore` labels less verbose, but still allowed defining custom hooks where needed. but custom hooks could potentially come later if needed.
Member

I also prefer restoring with the data base commands, for the reasons @3wordchant wrote.

This is fair; mysql backups usually restore fine over default data (e.g. wordpress installs some default tables and data, but these get overwritten when loading a backup) but postgres seems to want a completely blank DB to load a dump, which in practice means overriding entrypoint to prevent DB initialisation and doing a --chaos deploy, not ideal. I'd be sort of ok with keeping "restoring the filesystem" as default for this reason.

When providing --clean to `pg_restore it overrides the existing database. So it should not be a problem.

I also prefer restoring with the data base commands, for the reasons @3wordchant wrote. > This is fair; mysql backups usually restore fine over default data (e.g. wordpress installs some default tables and data, but these get overwritten when loading a backup) but postgres seems to want a completely blank DB to load a dump, which in practice means overriding entrypoint to prevent DB initialisation and doing a --chaos deploy, not ideal. I'd be sort of ok with keeping "restoring the filesystem" as default for this reason. When providing `--clean` to `pg_restore it overrides the existing database. So it should not be a problem.
Author
Member

When providing --clean to `pg_restore it overrides the existing database. So it should not be a problem.

I think we still need to manually terminate all connections to execute this command?

We could put the restore commands into an abra.sh function and trigger this function after a restore. This would allow more flexibility without verbose labels in the compose file. Maybe a function called _restore() ?

> When providing `--clean` to `pg_restore it overrides the existing database. So it should not be a problem. I think we still need to manually terminate all connections to execute this command? We could put the restore commands into an `abra.sh` function and trigger this function after a restore. This would allow more flexibility without verbose labels in the compose file. Maybe a function called `_restore()` ?
Owner

I think we still need to manually terminate all connections to execute this command?

Interesting point, I haven't tested this myself.

We could put the restore commands into an abra.sh function and trigger this function after a restore

That sounds nice, centralising them into backup-bot could also be nice, but either seems fine to start with.

> I think we still need to manually terminate all connections to execute this command? Interesting point, I haven't tested this myself. > We could put the restore commands into an `abra.sh` function and trigger this function after a restore That sounds nice, centralising them into backup-bot could also be nice, but either seems fine to start with.
Member

When providing --clean to `pg_restore it overrides the existing database. So it should not be a problem.

I think we still need to manually terminate all connections to execute this command?

Yes you are right, terminating the existing connections is probably safer.

We could put the restore commands into an abra.sh function and trigger this function after a restore. This would allow more flexibility without verbose labels in the compose file. Maybe a function called _restore() ?

Using a _restore() in abra.sh might be more convenient, but then backup-bot-two will be tightly coupled with abra. It might be nice, if backup-bot-two could work without using abra. So maybe we can wait a bit more to add this coupling?

> > When providing --clean to `pg_restore it overrides the existing database. So it should not be a problem. > > I think we still need to manually terminate all connections to execute this command? Yes you are right, terminating the existing connections is probably safer. > We could put the restore commands into an abra.sh function and trigger this function after a restore. This would allow more flexibility without verbose labels in the compose file. Maybe a function called _restore() ? Using a `_restore()` in `abra.sh` might be more convenient, but then `backup-bot-two` will be tightly coupled with abra. It might be nice, if `backup-bot-two` could work without using abra. So maybe we can wait a bit more to add this coupling?
Author
Member

Using a _restore() in abra.sh might be more convenient, but then backup-bot-two will be tightly coupled with abra. It might be nice, if backup-bot-two could work without using abra. So maybe we can wait a bit more to add this coupling?

I don't have a good idea how to decouple this in a smart way. One way would be to have multiple modules inside the backupbot repo. Each module contains the restore command for the respecting database i.e. mysql, postgresql, mariadb, mongodb...
But this way we probably also have to maintain modules for different database versions. And in rare cases the restore command is completely unique for a specific recipe, or may change in between different recipe versions.
Therefore I think the restore command should be inside the recipe. But having a command like above #26 (comment) as label would completely clutter the compose file 🤔

> Using a `_restore()` in `abra.sh` might be more convenient, but then `backup-bot-two` will be tightly coupled with abra. It might be nice, if `backup-bot-two` could work without using abra. So maybe we can wait a bit more to add this coupling? I don't have a good idea how to decouple this in a smart way. One way would be to have multiple modules inside the backupbot repo. Each module contains the restore command for the respecting database i.e. mysql, postgresql, mariadb, mongodb... But this way we probably also have to maintain modules for different database versions. And in rare cases the restore command is completely unique for a specific recipe, or may change in between different recipe versions. Therefore I think the restore command should be inside the recipe. But having a command like above https://git.coopcloud.tech/coop-cloud/backup-bot-two/issues/26#issue-3704 as label would completely clutter the compose file 🤔
decentral1se added the
help wanted
label 2024-01-11 13:32:34 +00:00
Owner

Also see #42. (Should we merge?) (Separate issues).

Also see https://git.coopcloud.tech/coop-cloud/backup-bot-two/issues/42. ~~(Should we merge?)~~ (Separate issues).
Owner

Using a _restore() in abra.sh might be more convenient, but then backup-bot-two will be tightly coupled with abra

I don't have a good idea how to decouple this in a smart way

For me, "standard bash commands in abra.sh" is reusable enough outside abra; if someone is bringing their own plumbing to use a Co-op Cloud recipe, and backup-bot-two, without abra, then asking them to manually handle copying that script into a container and run it after a restore seems fine.

It does seem a bit non-ideal that we'll have a proliferation of nearly-identical restore_postgres scripts between a bunch of recipes, but I see where you're coming from about how centralisation would make maintaining backupbot annoying, so maybe it's fine to live with more copypasta for now.

> Using a `_restore()` in `abra.sh` might be more convenient, but then backup-bot-two will be tightly coupled with `abra` > I don't have a good idea how to decouple this in a smart way For me, "standard bash commands in `abra.sh`" is reusable enough outside abra; if someone is bringing their own plumbing to use a Co-op Cloud recipe, and backup-bot-two, without `abra`, then asking them to manually handle copying that script into a container and run it after a restore seems fine. It does seem a bit non-ideal that we'll have a proliferation of nearly-identical `restore_postgres` scripts between a bunch of recipes, but I see where you're coming from about how centralisation would make maintaining backupbot annoying, so maybe it's fine to live with more copypasta for now.
Sign in to join this conversation.
No Milestone
No Assignees
4 Participants
Notifications
Due Date
The due date is invalid or out of range. Please use the format 'yyyy-mm-dd'.

No due date set.

Dependencies

No dependencies set.

Reference: coop-cloud/backup-bot-two#26
No description provided.