Disable deploy timeouts #596


iexos · 2025-08-13T11:03:30Z

iexos commented

2025-08-13 11:03:30 +00:00

On my VMs, deploying apps almost always times out. This is rather annoying as I cannot follow the deployment status polling. Also, this way the deploy version tags are never updated in the config files.

It pretty much only works without timeout when deploying authentik, not sure if it is because it executes post deploy commands.

I would like it to not time out at all, so I can watch what is happening. Maybe keep a timer running as well so I can compare deployment times. Some recipes seem to have a timeout var, though I think specifying it manually is always error-prone. It often times out during the "preparing" phase already, probably while the images are still being downloaded.

Maybe instead of aborting, abra could print a warning if a container seems to be stuck in startup/reboot cycle for a long time.


decentral1se added the

enhancement

label 2025-08-13 11:11:39 +00:00

decentral1se commented

2025-08-13 11:34:26 +00:00

@iexos ah I see, that seems frustrating alright. thanks for the report.

All recipes support the TIMEOUT env var and a default is used if not specified:

pkg/upstream/stack/stack.go, line 43 at commit 157d131b37:

	var WaitTimeout = 50

This is a bit tricky as people raised an issue when it didn't time out, then when the timeout wasn't customisable, and now we have an issue with the timeout itself 😁 So, we need to get into the weeds on this one to see what would be a good improvement which fits everyone.

We could add a --no-timeout/-T to all commands that honour the TIMEOUT env var but then you'd just be passing it indefinitely and I'm not sure that's a real solution? You could add it to the abra.yml but then you're locked into using that forever.

I imagine having these commands time out is useful for abra consumers (e.g. alakazam) because otherwise the command would hang forever? So only warning would probably break some people's shit.

We could tweak the TIMEOUT logic to only count down when all containers are out of the preparing stage. However, this would probably make everyone's existing TIMEOUT values less accurate...
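
The "only count down once every container has left preparing" idea can be sketched as a small predicate. This is an illustrative shell sketch, not abra's implementation: the function name `countdown_may_start` is hypothetical, and it assumes the caller has already collected one state word per container.

```bash
# Hypothetical helper: succeed (exit 0) only when none of the given task
# states is "preparing", i.e. when the timeout countdown could begin.
countdown_may_start() {
  local state
  for state in "$@"; do
    # Compare case-insensitively so "Preparing" and "preparing" both match.
    case "$(printf '%s' "$state" | tr '[:upper:]' '[:lower:]')" in
      preparing) return 1 ;;
    esac
  done
  return 0
}
```

Under this rule a slow registry or network would delay the start of the countdown instead of eating into it.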

I'm not fully against adding a timer but you can also run time abra app deploy foo.com and learn from there also?

Having the timeout not store the deployed version in the .env file is pretty damn annoying!

Send halp! Would like to improve this and it seems like there is definitely space for that.


decentral1se changed title from ~~disable deploy timeouts~~ to Disable deploy timeouts

2025-08-13 11:42:35 +00:00

iexos commented

2025-08-13 12:17:37 +00:00

> All recipes support the TIMEOUT env var

Thank you, that seems like a good enough workaround for now: just add a high timeout to every app I deploy.
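
A sketch of that workaround as a shell helper. The function name `set_env_timeout` is hypothetical, and the path to the app's `.env` file is passed in as an argument instead of being assumed. It only relies on the `TIMEOUT=...` variable mentioned in this thread.

```bash
# Hypothetical helper: set TIMEOUT=<value> in an app .env file.
# An existing TIMEOUT line (commented out or not) is rewritten in place;
# otherwise a new line is appended. A .bak copy of the file is kept.
set_env_timeout() {
  local file="$1" value="$2"
  if grep -q -E '^[[:space:]]*#?[[:space:]]*TIMEOUT=' "$file"; then
    sed -E -i.bak "s/^[[:space:]]*#?[[:space:]]*TIMEOUT=.*/TIMEOUT=${value}/" "$file"
  else
    printf 'TIMEOUT=%s\n' "$value" >> "$file"
  fi
}
```

For example, `set_env_timeout path/to/app.env 3600` would leave a single `TIMEOUT=3600` line in that file.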

> You could add it to the abra.yml but then you're locked into using that forever.

Honestly for me that would be sufficient, so I could just forget about timeouts. But it doesn't really solve the issue ofc.

Also, what is abra.yml? Cannot find any information on that.

> I'm not fully against adding a timer but you can also run time abra app deploy foo.com and learn from there also?

Not only a summary deployment time at the end (which I would like), but also a timer showing how long it has been deploying already, so I can easily see that it's taking longer than expected. Though it's not super important as I can manage that myself, and I see it might increase the output noise.
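
Until something like this exists in abra itself, a running timer can be approximated from the outside. The following is a minimal shell sketch, not an abra feature: `format_elapsed` and `with_timer` are hypothetical names, and because the wrapped command runs in the background it is assumed not to need interactive input from the terminal.

```bash
# Format a number of seconds as MM:SS.
format_elapsed() {
  printf '%02d:%02d\n' "$(($1 / 60))" "$(($1 % 60))"
}

# Hypothetical wrapper: run a command in the background and print the
# elapsed time to stderr once per second until it exits. Returns the
# wrapped command's exit status.
with_timer() {
  local start pid rc
  start="$(date +%s)"
  "$@" &
  pid=$!
  while kill -0 "$pid" 2>/dev/null; do
    printf '\relapsed %s' "$(format_elapsed "$(( $(date +%s) - start ))")" >&2
    sleep 1
  done
  rc=0
  wait "$pid" || rc=$?
  printf '\rfinished in %s\n' "$(format_elapsed "$(( $(date +%s) - start ))")" >&2
  return "$rc"
}
```

Usage would look like `with_timer abra app deploy foo.com`, which gives both the live counter and the summary time at the end.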

> We could tweak the TIMEOUT logic to only count down when all containers are out of the preparing stage. However, this would probably make everyones existing TIMEOUT values less accurate...

I think that would be a good step already. And I don't see any harm for existing values, it would just time out a little bit later. From my perspective, they are already inaccurate.

I see a problem that timeouts are chosen decentrally by whoever maintains the recipe on whatever machine they use.
We would need some kind of reference for choosing comparable values. Then we could add a server-wide timeout-coefficient (there is currently no server-specific configuration, correct?) to account for performance differences.
Another path could be generating timeout values centrally via a release CI/CD pipeline and publishing them into the catalogue.
Considering all that makes me think just disabling timeouts sounds like a more sane plan 😅


fauno commented

2025-08-14 13:47:52 +00:00

i was about to report something related about timeouts.

right now it fails with a fatal error, making people think something failed while in fact they can wait a couple minutes while everything is getting ready. maybe make it a warning instead?

> We could tweak the TIMEOUT logic to only count down when all containers are out of the preparing stage. However, this would probably make everyones existing TIMEOUT values less accurate...

so if the registry or internet are slow, the deploy could timeout? i'm ok with no timeout during prepare stage.


fauno commented

2025-08-14 13:54:13 +00:00

maybe related to timeouts: this week i deployed peertube and it timed out. i had to wait some minutes for it to be really up.

i was following logs while waiting and i could see the nginx container restart continually because it couldn't resolve the peertube container's hostname. this is something annoying nginx does, failing to start when name resolution fails, but there are workarounds (including setting the dependency order in the compose file).

meanwhile, the peertube container didn't log anything at all, so if i didn't know how to read the nginx error, i would've thought that the recipe was faulty, especially since abra had already declared it timed out.


fauno commented

2025-08-14 13:55:35 +00:00

also in general, when someone is learning to sysadmin, any error is assumed to be one's own fault, so i'd go easy on how fatal errors are and how they're communicated :D

decentral1se commented

2025-08-16 07:19:24 +00:00

@moritz @simon have you got any feedback on this re: the usage of timeouts? especially regarding alakazam or other scripts you might be making use of?

i'm thinking we need to make some behaviour changes here to how timeouts are handled. i'm still thinking about it. thanks for feedback all, proposals for practical changes which can reach a compromise between different workflows are more than welcome.


decentral1se referenced this issue

2025-08-17 09:35:29 +00:00

`abra app undeploy` fails to fetch recipe #573

decentral1se added this to the Abra v0.11.x project 2025-08-17 13:46:56 +00:00

stevensting commented

2025-08-17 15:26:03 +00:00

I would also like the timeout to be optional. A running timer also seems helpful: with longer-running deployments like Loomio, one often gets distracted by other things and loses track of how long the deployment has already been running.

ammaratef45 commented

2025-08-17 17:15:31 +00:00

Here is what I'm thinking

Backward compatible

As is, existing commands would have the existing behavior

Timeout configuration

A new flag (e.g. --timeout-config/-tc) that allows you to pass timeout configuration as a file, env vars, or inline

The configurations could be something like this

should-timeout=true/false
timeout=xxx # value in seconds
timeout-action=xxx # no-op to just exit the command, rollback to attempt an undeploy first

Default behavior change

Decide if default behavior would stay the same or a set of default configuration values would be assumed in the next major release of abra


decentral1se self-assigned this 2025-08-18 07:39:48 +00:00

decentral1se commented

2025-08-18 07:42:22 +00:00

While I can see the benefits to adding config for completeness, that would complicate the code path considerably. I don't see much justification for that right now. Also, maintaining previous behaviour seems unnecessary as many don't even want to deal with the timeout. I think we just need to maintain backwards compat for scripts (which we can by honouring the timeout if there is a TIMEOUT present in the .env). Then we can make sure recipes have TIMEOUT commented out by default and operators can remove them from their .env files for the new version of abra. It's not that much churn, I think. It seems like a ctrl-c is a much more reliable way to understand if a deployment failed or not because a human did it.
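
The rule proposed here (time out only when `TIMEOUT` is actually set in the `.env`, and treat a commented-out `#TIMEOUT=...` as "no timeout") can be illustrated with a short shell sketch. This is not abra's Go code: `env_timeout` and `deploy_mode` are hypothetical names, and the parsing assumes a plain numeric value, optionally double-quoted.

```bash
# Hypothetical helper: print the TIMEOUT value from an app .env file, or
# nothing when the variable is absent or commented out.
env_timeout() {
  # Match only uncommented assignments; take the last one if repeated.
  sed -n -E 's/^[[:space:]]*TIMEOUT="?([0-9]+)"?[[:space:]]*$/\1/p' "$1" | tail -n 1
}

# Describe what a deploy would do for the given .env file under this rule.
deploy_mode() {
  local t
  t="$(env_timeout "$1")"
  if [ -n "$t" ]; then
    echo "timeout after $t"
  else
    echo "no timeout"
  fi
}
```

Scripts that rely on a timeout keep working as long as they set `TIMEOUT` explicitly, while everyone else gets a deployment that only stops on `ctrl-c`.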


iexos commented

2025-08-18 08:36:45 +00:00

I agree, disabling timeout when there is no TIMEOUT var seems like a sensible solution. Automation tools like alakazam can simply add the env var by default.

Though existing recipes then should remove TIMEOUT, also I don't know what exactly is up with existing "coop-cloud.${STACK_NAME}.timeout=${TIMEOUT:-120}" labels.


decentral1se commented

2025-08-19 16:35:28 +00:00

Unless I hear any major objections, I will take a run at this after the translation work.

In general, I think we're running into the issue that we just need to see the logs of the containers while polling the status of the deployment. Those are like the 2 main reliable indicators that determine if the deployment is going well. I think this timeout implementation was an attempt to reduce the worst parts of a failing deployment but it might have made things worse 🙃 I will try to take another run at feeding live logs into the abra app deploy screen...



decentral1se moved this to In Progress in Abra v0.11.x on 2025-08-25 06:58:59 +00:00

decentral1se referenced this issue from a commit

2025-08-25 07:37:00 +00:00

refactor: allow timeout only from .env

decentral1se referenced this issue from a commit

2025-08-25 09:38:55 +00:00

refactor: timeout only when TIMEOUT=... in .env

decentral1se referenced this issue

2025-08-25 09:40:27 +00:00

refactor!: do not set default timeout #612

decentral1se referenced this issue from a commit

2025-08-25 09:49:28 +00:00

refactor!: do not set default timeout

decentral1se commented

2025-08-25 10:12:41 +00:00

OK, #612 should cover this now.

Ideally, we'll only see #TIMEOUT=... in the .env.sample in the future and operators can choose to use them or not. Hopefully the migration here is not that painful for scripting purposes and this change as-is is backwards compatible. There's no default any more from the abra side of things.

It would be great if folks who maintain these recipes can remove these label defaults:

> grep -r "coop-cloud.*timeout=" ~/.abra/recipes
/home/d/.abra/recipes/onlyoffice/compose.yml:        - "coop-cloud.${STACK_NAME}.timeout=${TIMEOUT:-120}"
/home/d/.abra/recipes/gitlab/compose.yml:        - "coop-cloud.${STACK_NAME}.timeout=${TIMEOUT:-240}"
/home/d/.abra/recipes/lasuite-docs/compose.yml:        - "coop-cloud.${STACK_NAME}.timeout=${TIMEOUT:-120}"
/home/d/.abra/recipes/backup-bot-two/compose.yml:        - coop-cloud.${STACK_NAME}.timeout=${TIMEOUT:-300}
/home/d/.abra/recipes/kimai/compose.yml:        - "coop-cloud.${STACK_NAME}.timeout=${TIMEOUT:-120}"
/home/d/.abra/recipes/zammad/compose.yml:        - "coop-cloud.${STACK_NAME}.timeout=${TIMEOUT:-120}"
/home/d/.abra/recipes/matrix-synapse/compose.yml:        - "coop-cloud.${STACK_NAME}.timeout=${TIMEOUT:-120}"
/home/d/.abra/recipes/hedgedoc/compose.yml:        - "coop-cloud.${STACK_NAME}.timeout=${TIMEOUT:-120}"
/home/d/.abra/recipes/mattermost/compose.yml:        - "coop-cloud.${STACK_NAME}.timeout=${TIMEOUT:-120}"
/home/d/.abra/recipes/synapse-admin/compose.yml:        - "coop-cloud.${STACK_NAME}.timeout=${TIMEOUT:-120}"
/home/d/.abra/recipes/collabora/compose.yml:        - "coop-cloud.${STACK_NAME}.timeout=${TIMEOUT:-120}"
/home/d/.abra/recipes/monitoring-ng/compose.yml:        - "coop-cloud.${STACK_NAME}.timeout=${TIMEOUT:-120}"
/home/d/.abra/recipes/traefik/compose.yml:        - "coop-cloud.${STACK_NAME}.timeout=${TIMEOUT:-120}"
/home/d/.abra/recipes/authentik/compose.yml:        - "coop-cloud.${STACK_NAME}.timeout=${TIMEOUT:-120}"
/home/d/.abra/recipes/vikunja/compose.yml:        - "coop-cloud.${STACK_NAME}.timeout=${TIMEOUT:-120}"
/home/d/.abra/recipes/foodsoft/compose.yml:        - "coop-cloud.${STACK_NAME}.timeout=${TIMEOUT:-120}"
/home/d/.abra/recipes/rallly/compose.yml:        - "coop-cloud.${STACK_NAME}.timeout=${TIMEOUT:-120}"
/home/d/.abra/recipes/wekan/compose.yml:        - "coop-cloud.${STACK_NAME}.timeout=${TIMEOUT:-120}"
/home/d/.abra/recipes/wordpress/compose.yml:        - "coop-cloud.${STACK_NAME}.timeout=${TIMEOUT:-120}"
/home/d/.abra/recipes/nextcloud/compose.yml:        - "coop-cloud.${STACK_NAME}.timeout=${TIMEOUT:-120}"
/home/d/.abra/recipes/outline/compose.yml:        - "coop-cloud.${STACK_NAME}.timeout=${TIMEOUT:-80}"

E.g.

/home/d/.abra/recipes/outline/compose.yml:        - "coop-cloud.${STACK_NAME}.timeout=${TIMEOUT}"
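
For recipe maintainers, the change from `${TIMEOUT:-120}` to `${TIMEOUT}` can be scripted. The following is a sketch under stated assumptions: `strip_timeout_default` is a hypothetical helper name, it only touches lines containing a `coop-cloud...timeout=` label, and it leaves a `.bak` copy next to each edited file so the result can be reviewed before committing.

```bash
# Hypothetical helper: turn ${TIMEOUT:-<number>} into ${TIMEOUT} on
# coop-cloud timeout label lines of a single compose file.
strip_timeout_default() {
  # Single quotes keep ${TIMEOUT} literal; sed does not expand it either.
  sed -E -i.bak '/coop-cloud\..*timeout=/ s/\$\{TIMEOUT:-[0-9]+\}/${TIMEOUT}/' "$1"
}

# Possible use across all locally cloned recipes (same pattern as the grep above):
#   grep -rl "coop-cloud.*timeout=" ~/.abra/recipes | while read -r f; do
#     strip_timeout_default "$f"
#   done
```

Both the quoted and the unquoted label variants in the listing above are matched, since the pattern does not depend on the surrounding quotes.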



decentral1se closed this issue

2025-08-25 10:12:41 +00:00

decentral1se referenced this issue

2025-08-25 10:15:00 +00:00

feat: tabbed overview of deployment/logs/errors/etc. on deploy #613

decentral1se moved this to Done in Abra v0.11.x on 2025-08-25 10:15:26 +00:00
