post backup restore command #26

New Issue

moritz · 2023-09-07T21:23:08Z

moritz commented

2023-09-07 21:23:08 +00:00

Do we need a post backup restore command? Something like psql -d dbname < backup.sql for restoring a database.
I am not quite sure, because the easiest way to restore a database is to simply shut down the container and restore the whole data volume. The database dumps are more like an alternative way, especially for migrating the database to another system. I played a little bit with restoring database dumps and its not really straightforward. On the one hand the container needs to run, on the other hand all database sessions need to be closed.
For postgres I hand to run this whole bunch of commands to simply restore a database dump on a running container:

psql -d dbdame -c """
REVOKE CONNECT ON DATABASE dbname FROM public;
SELECT pid, pg_terminate_backend(pid) 
FROM pg_stat_activity 
WHERE datname = current_database() AND pid <> pg_backend_pid();
"""
dropdb dbname
createdb dbname 
psql -d dbname < backup.sql
psql -d dbname -c "GRANT CONNECT ON DATABASE dbname TO public;"

What is your opinion about a backupbot.restore.post-hook label?

Do we need a post backup restore command? Something like `psql -d dbname < backup.sql` for restoring a database. I am not quite sure, because the easiest way to restore a database is to simply shut down the container and restore the whole data volume. The database dumps are more like an alternative way, especially for migrating the database to another system. I played a little bit with restoring database dumps and its not really straightforward. On the one hand the container needs to run, on the other hand all database sessions need to be closed. For postgres I hand to run this whole bunch of commands to simply restore a database dump on a running container: ``` psql -d dbdame -c """ REVOKE CONNECT ON DATABASE dbname FROM public; SELECT pid, pg_terminate_backend(pid) FROM pg_stat_activity WHERE datname = current_database() AND pid <> pg_backend_pid(); """ dropdb dbname createdb dbname psql -d dbname < backup.sql psql -d dbname -c "GRANT CONNECT ON DATABASE dbname TO public;" ``` What is your opinion about a `backupbot.restore.post-hook` label?

moritz added this to the backupbot revolution project 2023-09-07 21:23:15 +00:00

decentral1se commented

2023-10-12 13:50:22 +00:00

I think borgmatic has a very good approach here https://torsion.org/borgmatic/docs/how-to/backup-your-databases/

If you want to backup a database, it's best practice with most database systems to backup an exported database dump, rather than backing up your database's internal file storage. That's because the internal storage can change while you're reading from it. In contrast, a database dump creates a consistent snapshot that is more suited for backups.

So, we need to support dumps? I typically err on the side of "do both", both system level snapshot and then dump and include that. But yeh, unsure how this works out in practice... it's always a bit difficult.

I think `borgmatic` has a very good approach here https://torsion.org/borgmatic/docs/how-to/backup-your-databases/ > If you want to backup a database, it's best practice with most database systems to backup an exported database dump, rather than backing up your database's internal file storage. That's because the internal storage can change while you're reading from it. In contrast, a database dump creates a consistent snapshot that is more suited for backups. So, we need to support dumps? I typically err on the side of "do both", both system level snapshot and then dump and include that. But yeh, unsure how this works out in practice... it's always a bit difficult.

👍 1

moritz commented

2023-10-16 14:50:13 +00:00

I also think we should support both. Maybe restoring the filesystem per default (as it's less error prone and can be done without deploying the app) and restoring the database as extra command.
And I like the idea to bake the postrgres/mysql dump and restore commands directly into the backupbot. This way the commands can be arbitrarily complex, without cluttering the recipe.
I am not sure if than a backupbot.restore.post-hook is still necessary?

I also think we should support both. Maybe restoring the filesystem per default (as it's less error prone and can be done without deploying the app) and restoring the database as extra command. And I like the idea to bake the postrgres/mysql dump and restore commands directly into the backupbot. This way the commands can be arbitrarily complex, without cluttering the recipe. I am not sure if than a `backupbot.restore.post-hook` is still necessary?

👍 1

3wordchant commented

2023-10-18 17:51:27 +00:00

Do we need a post backup restore command?

ｙｅｓ

Maybe restoring the filesystem per default [..] and restoring the database as extra command.

I'd prefer the other way around (reasoning follows) but as long as both options are available then either is fine.

as it's less error prone

My experience is the opposite:

DB dump backups are guaranteed consistent (as per @decentral1se's comment), vs copying the database files on disk, which - depending on which DB, and how it's configured - often leads to corrupted data. When I did a survey of docker/docker swarm backup systems before writing backup-bot-two, most folks were stopping docker services before running backups, which seems like an anti-pattern i would like to avoid
DB dump backups are also more portable, e.g. postgres is very fussy about loading files (or even its own native backup format files) from different major versions

and can be done without deploying the app

This is fair; mysql backups usually restore fine over default data (e.g. wordpress installs some default tables and data, but these get overwritten when loading a backup) but postgres seems to want a completely blank DB to load a dump, which in practice means overriding entrypoint to prevent DB initialisation and doing a --chaos deploy, not ideal. I'd be sort of ok with keeping "restoring the filesystem" as default for this reason.

And I like the idea to bake the postrgres/mysql dump and restore commands directly into the backupbot. This way the commands can be arbitrarily complex, without cluttering the recipe.

can we do both? the original backupbot implementation had helper scripts which made the post-restore labels less verbose, but still allowed defining custom hooks where needed. but custom hooks could potentially come later if needed.

> Do we need a post backup restore command? ｙｅｓ > Maybe restoring the filesystem per default [..] and restoring the database as extra command. I'd prefer the other way around (reasoning follows) but as long as both options are available then either is fine. > as it's less error prone My experience is the opposite: 1. DB dump backups are guaranteed consistent (as per @decentral1se's comment), vs copying the database files on disk, which - depending on which DB, and how it's configured - often leads to corrupted data. When I did a survey of docker/docker swarm backup systems before writing backup-bot-two, most folks were stopping docker services before running backups, which seems like an anti-pattern i would like to avoid 2. DB dump backups are also more portable, e.g. postgres is very fussy about loading files (or even its own native backup format files) from different major versions > and can be done without deploying the app This is fair; mysql backups usually restore fine over default data (e.g. wordpress installs some default tables and data, but these get overwritten when loading a backup) but postgres seems to want a completely blank DB to load a dump, which in practice means overriding `entrypoint` to prevent DB initialisation and doing a `--chaos` deploy, not ideal. I'd be sort of ok with keeping "restoring the filesystem" as default for this reason. > And I like the idea to bake the postrgres/mysql dump and restore commands directly into the backupbot. This way the commands can be arbitrarily complex, without cluttering the recipe. can we do both? the original backupbot implementation had helper scripts which made the `post-restore` labels less verbose, but still allowed defining custom hooks where needed. but custom hooks could potentially come later if needed.

👍 2

p4u1 commented

2023-11-11 12:29:12 +00:00

I also prefer restoring with the data base commands, for the reasons @3wordchant wrote.

This is fair; mysql backups usually restore fine over default data (e.g. wordpress installs some default tables and data, but these get overwritten when loading a backup) but postgres seems to want a completely blank DB to load a dump, which in practice means overriding entrypoint to prevent DB initialisation and doing a --chaos deploy, not ideal. I'd be sort of ok with keeping "restoring the filesystem" as default for this reason.

When providing --clean to `pg_restore it overrides the existing database. So it should not be a problem.

I also prefer restoring with the data base commands, for the reasons @3wordchant wrote. > This is fair; mysql backups usually restore fine over default data (e.g. wordpress installs some default tables and data, but these get overwritten when loading a backup) but postgres seems to want a completely blank DB to load a dump, which in practice means overriding entrypoint to prevent DB initialisation and doing a --chaos deploy, not ideal. I'd be sort of ok with keeping "restoring the filesystem" as default for this reason. When providing `--clean` to `pg_restore it overrides the existing database. So it should not be a problem.

❤️ 2

3wordchant referenced this issue

2023-11-12 01:05:13 +00:00

does `backupbot.backup.path` need to be backwards compatible? i.e. should we make a new label? #41

3wordchant referenced this issue

2023-11-12 01:10:56 +00:00

`restore` command and label design #42

moritz commented

2023-11-14 18:12:13 +00:00

When providing --clean to `pg_restore it overrides the existing database. So it should not be a problem.

I think we still need to manually terminate all connections to execute this command?

We could put the restore commands into an abra.sh function and trigger this function after a restore. This would allow more flexibility without verbose labels in the compose file. Maybe a function called _restore() ?

> When providing `--clean` to `pg_restore it overrides the existing database. So it should not be a problem. I think we still need to manually terminate all connections to execute this command? We could put the restore commands into an `abra.sh` function and trigger this function after a restore. This would allow more flexibility without verbose labels in the compose file. Maybe a function called `_restore()` ?

👀 1

3wordchant commented

2023-12-02 15:35:53 +00:00

I think we still need to manually terminate all connections to execute this command?

Interesting point, I haven't tested this myself.

We could put the restore commands into an abra.sh function and trigger this function after a restore

That sounds nice, centralising them into backup-bot could also be nice, but either seems fine to start with.

> I think we still need to manually terminate all connections to execute this command? Interesting point, I haven't tested this myself. > We could put the restore commands into an `abra.sh` function and trigger this function after a restore That sounds nice, centralising them into backup-bot could also be nice, but either seems fine to start with.

p4u1 commented

2023-12-04 08:52:46 +00:00

When providing --clean to `pg_restore it overrides the existing database. So it should not be a problem.

I think we still need to manually terminate all connections to execute this command?

Yes you are right, terminating the existing connections is probably safer.

We could put the restore commands into an abra.sh function and trigger this function after a restore. This would allow more flexibility without verbose labels in the compose file. Maybe a function called _restore() ?

Using a _restore() in abra.sh might be more convenient, but then backup-bot-two will be tightly coupled with abra. It might be nice, if backup-bot-two could work without using abra. So maybe we can wait a bit more to add this coupling?

> > When providing --clean to `pg_restore it overrides the existing database. So it should not be a problem. > > I think we still need to manually terminate all connections to execute this command? Yes you are right, terminating the existing connections is probably safer. > We could put the restore commands into an abra.sh function and trigger this function after a restore. This would allow more flexibility without verbose labels in the compose file. Maybe a function called _restore() ? Using a `_restore()` in `abra.sh` might be more convenient, but then `backup-bot-two` will be tightly coupled with abra. It might be nice, if `backup-bot-two` could work without using abra. So maybe we can wait a bit more to add this coupling?

moritz commented

2023-12-07 17:15:27 +00:00

Using a _restore() in abra.sh might be more convenient, but then backup-bot-two will be tightly coupled with abra. It might be nice, if backup-bot-two could work without using abra. So maybe we can wait a bit more to add this coupling?

I don't have a good idea how to decouple this in a smart way. One way would be to have multiple modules inside the backupbot repo. Each module contains the restore command for the respecting database i.e. mysql, postgresql, mariadb, mongodb...
But this way we probably also have to maintain modules for different database versions. And in rare cases the restore command is completely unique for a specific recipe, or may change in between different recipe versions.
Therefore I think the restore command should be inside the recipe. But having a command like above #26 (comment) as label would completely clutter the compose file 🤔

> Using a `_restore()` in `abra.sh` might be more convenient, but then `backup-bot-two` will be tightly coupled with abra. It might be nice, if `backup-bot-two` could work without using abra. So maybe we can wait a bit more to add this coupling? I don't have a good idea how to decouple this in a smart way. One way would be to have multiple modules inside the backupbot repo. Each module contains the restore command for the respecting database i.e. mysql, postgresql, mariadb, mongodb... But this way we probably also have to maintain modules for different database versions. And in rare cases the restore command is completely unique for a specific recipe, or may change in between different recipe versions. Therefore I think the restore command should be inside the recipe. But having a command like above https://git.coopcloud.tech/coop-cloud/backup-bot-two/issues/26#issue-3704 as label would completely clutter the compose file 🤔

decentral1se added the

help wanted

label 2024-01-11 13:32:34 +00:00

decentral1se referenced this issue from toolshed/abra

2024-01-12 15:50:28 +00:00

feat: backup revolution #361

decentral1se commented

2024-01-12 20:56:17 +00:00

Also see #42. ~~(Should we merge?)~~ (Separate issues).

Also see https://git.coopcloud.tech/coop-cloud/backup-bot-two/issues/42. ~~(Should we merge?)~~ (Separate issues).

3wordchant commented

2024-02-10 21:23:50 +00:00

Using a _restore() in abra.sh might be more convenient, but then backup-bot-two will be tightly coupled with abra

I don't have a good idea how to decouple this in a smart way

For me, "standard bash commands in abra.sh" is reusable enough outside abra; if someone is bringing their own plumbing to use a Co-op Cloud recipe, and backup-bot-two, without abra, then asking them to manually handle copying that script into a container and run it after a restore seems fine.

It does seem a bit non-ideal that we'll have a proliferation of nearly-identical restore_postgres scripts between a bunch of recipes, but I see where you're coming from about how centralisation would make maintaining backupbot annoying, so maybe it's fine to live with more copypasta for now.

> Using a `_restore()` in `abra.sh` might be more convenient, but then backup-bot-two will be tightly coupled with `abra` > I don't have a good idea how to decouple this in a smart way For me, "standard bash commands in `abra.sh`" is reusable enough outside abra; if someone is bringing their own plumbing to use a Co-op Cloud recipe, and backup-bot-two, without `abra`, then asking them to manually handle copying that script into a container and run it after a restore seems fine. It does seem a bit non-ideal that we'll have a proliferation of nearly-identical `restore_postgres` scripts between a bunch of recipes, but I see where you're coming from about how centralisation would make maintaining backupbot annoying, so maybe it's fine to live with more copypasta for now.

~~moritz referenced this issue from toolshed/docs.coopcloud.tech 2024-04-16 11:51:05 +00:00~~

Backupbot Specification #258

decentral1se commented

2024-10-23 07:52:21 +00:00

Seems like we have an approach now?

49dd989302

Seems like we have an approach now? https://git.coopcloud.tech/coop-cloud/backup-bot-two/commit/49dd989302fb902af7dfc22e86ea8d90b9b44dce

moritz commented

2024-10-24 15:03:47 +00:00

I pushed an example script for restoring postgres: https://git.coopcloud.tech/coop-cloud/backup-bot-two/src/branch/main/pg_backup.sh

🎉 2

moritz closed this issue