Disable deploy timeouts #596
On my VMs, deploying apps almost always times out. This is rather annoying as I cannot follow the deployment status polling. Also, this way the deploy version tags are never updated in the config files.
It pretty much only works without timeout when deploying authentik, not sure if it is because it executes post deploy commands.
I would like it to not time out at all, so I can watch what is happening. Maybe keep a timer running as well so I can compare deployment times. Some recipes seem to have a timeout var, though specifying it manually is always error-prone I think. Often it already times out during the "preparing" phase, where the images are probably still being downloaded.
Maybe instead of aborting, abra could print a warning if a container seems to be stuck in a startup/reboot cycle for a long time.
@iexos ah I see, that seems frustrating alright. thanks for the report.
All recipes support the `TIMEOUT` env var and a default is used if not specified.
This is a bit tricky as people raised an issue when it didn't timeout, when timeout wasn't customisable and now we have an issue with timeout itself 😁 So, we need to get into the weeds on this one to see what would be a good improvement which fits everyone.
We could add a `--no-timeout/-T` to all commands that honour the `TIMEOUT` env var but then you'd just be passing it indefinitely and I'm not sure that's a real solution? You could add it to the `abra.yml` but then you're locked into using that forever.
I imagine having these commands timeout is useful for abra consumers (e.g. alakazam) because otherwise, the command would hang forever? So only warning would probably break some people's shit.
We could tweak the TIMEOUT logic to only count down when all containers are out of the `preparing` stage. However, this would probably make everyone's existing TIMEOUT values less accurate...
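To make the "only count down once nothing is preparing" idea concrete, here is a minimal sketch. It is not abra's code: the function name `timed_out`, the poll format (a timestamp plus a list of container states) and the state names other than `preparing` are assumptions made for illustration.

```python
from typing import Iterable, Optional, Sequence, Tuple

# "preparing" is the stage named in this thread; "running" is an assumed
# name for the converged state.
PREPARING = "preparing"
RUNNING = "running"


def timed_out(polls: Iterable[Tuple[float, Sequence[str]]], timeout: float) -> bool:
    """Return True if the deployment exceeds `timeout` seconds.

    The clock only starts at the first poll where no container is still
    in the preparing stage, so slow image pulls do not eat the budget.
    """
    started_at: Optional[float] = None
    for now, states in polls:
        if started_at is None:
            if any(state == PREPARING for state in states):
                continue  # still preparing: clock not started yet
            started_at = now
        if all(state == RUNNING for state in states):
            return False  # everything is up, no timeout
        if now - started_at > timeout:
            return True
    return False  # polling ended without exceeding the budget
```

With a 60 second budget, a deployment that spends 300 seconds preparing and then converges 50 seconds later does not time out, while one that sits in a non-running state for 100 seconds after preparing does.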
I'm not fully against adding a timer but you can also run `time abra app deploy foo.com` and learn from there also?
Having the timeout not store the deployed version in the `.env` file is pretty damn annoying!
Send halp! Would like to improve this and it seems like there is definitely space for that.
Thank you, seems like a good enough workaround for now to just add a high timeout to every app i deploy.
Honestly for me that would be sufficient, so I could just forget about timeouts. But it doesn't really solve the issue ofc.
Also, what is `abra.yml`? Cannot find any information on that.
Not only a summary deployment time at the end (which i would like), but also a timer showing how long it's been deploying already, so I can easily see that it's taking longer than expected. Though it's not super important as I can manage that myself and I see it might increase the output noise.
I think that would be a good step already. And I don't see any harm for existing values, it would just time out a little bit later. From my perspective, they are already inaccurate.
I see a problem that timeouts are chosen decentrally by whoever maintains the recipe on whatever machine they use.
We would need some kind of reference for choosing comparable values. Then we could add a server-wide timeout-coefficient (there is currently no server-specific configuration, correct?) to account for performance differences.
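As a rough illustration of the coefficient idea, the sketch below scales a recipe's reference timeout by a per-server factor. Both names (`effective_timeout`, `server_coefficient`) are hypothetical; nothing like this exists in abra as far as this thread shows.

```python
import math


def effective_timeout(recipe_timeout: int, server_coefficient: float = 1.0) -> int:
    """Scale a recipe's reference timeout (seconds) for one server.

    A coefficient above 1.0 means the server is slower than the reference
    machine, below 1.0 means faster. The result is rounded up to whole seconds.
    """
    if recipe_timeout <= 0 or server_coefficient <= 0:
        raise ValueError("timeout and coefficient must be positive")
    return math.ceil(recipe_timeout * server_coefficient)
```

For example, a recipe timeout of 120 seconds with a coefficient of 2.5 gives the slow server 300 seconds.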
Another path could be generating timeout values centrally via a release CI/CD pipeline and publishing them into the catalogue.
Considering all that makes me think just disabling timeouts sounds like a more sane plan 😅
i was about to report something related about timeouts.
right now it fails with a fatal error, making people think something failed while in fact they can wait a couple minutes while everything is getting ready. maybe make it a warning instead?
so if the registry or internet are slow, the deploy could timeout? i'm ok with no timeout during prepare stage.
maybe related to timeouts: this week i deployed peertube and it timed out. i had to wait some minutes for it to be really up.
i was following logs while waiting and i could see the nginx container restart continually because it couldn't resolve the peertube container's hostname. this is something annoying nginx does, failing to start when name resolution fails, but there are workarounds (including setting the dependency order on the compose file).
meanwhile, the peertube container didn't log anything at all, so if i didn't know how to read the nginx error, i would've thought that the recipe was faulty, especially since it had already timed out in abra's opinion.
also in general, when someone is learning to sysadmin, any error is assumed to be one's fault, so i'd go easy on how fatal are errors and how to communicate them :D
@moritz @simon have you got any feedback on this re: the usage of timeouts? especially regarding alakazam or other scripts you might be making use of?
i'm thinking we need to make some behaviour changes here to how timeouts are handled. i'm still thinking about it. thanks for feedback all, proposals for practical changes which can reach a compromise between different workflows are more than welcome.
I would also like the timeout to be optional. A running timer also seems helpful, as for longer-running deployments like Loomio one often gets distracted by other things and loses track of how long the deployment process has already run.
Here is what I'm thinking
Backward compatible
As is, existing commands would have the existing behavior
Timeout configuration
A new flag (e.g. --timeout-config/-tc) that allows you to pass timeout configuration as a file, env vars, or inline
The configurations could be something like this
Default behavior change
Decide if default behavior would stay the same or a set of default configuration values would be assumed in the next major release of abra
While I can see the benefits to adding config for completeness, that would complicate the code path considerably. I don't see much justification for that right now. Also, maintaining previous behaviour seems unnecessary as many don't even want to deal with the timeout. I think we just need to maintain backwards compat for scripts (which we can by honouring the timeout if there is a `TIMEOUT` present in the `.env`). Then we can make sure recipes have `TIMEOUT` commented out by default and operators can remove them from their `.env` files for the new version of `abra`. It's not that much churn, I think. It seems like a `ctrl-c` is a much more reliable way to understand if a deployment failed or not because a human did it.
I agree, disabling timeout when there is no `TIMEOUT` var seems like a sensible solution. Automation tools like alakazam can simply add the env var by default.
Though existing recipes then should remove `TIMEOUT`, also I don't know what exactly is up with existing `"coop-cloud.${STACK_NAME}.timeout=${TIMEOUT:-120}"` labels.
Unless I hear any major objections, I will take a run at this after the translation work.
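The behaviour discussed above (honour `TIMEOUT` only when it is actually set in the `.env`, otherwise never time out) could be sketched roughly as follows. This is an illustration under assumptions, not abra's implementation: `resolve_timeout` is a hypothetical helper and the parsing is deliberately simplistic.

```python
from typing import Optional


def resolve_timeout(env_text: str) -> Optional[int]:
    """Return the timeout in seconds, or None when no timeout applies.

    None is returned when TIMEOUT is absent, commented out (e.g.
    "#TIMEOUT=300") or set to an empty value.
    """
    for raw_line in env_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # blank line or commented-out setting
        key, sep, value = line.partition("=")
        if sep and key.strip() == "TIMEOUT":
            value = value.strip().strip("\"'")
            return int(value) if value else None
    return None
```

An automation tool that needs the old behaviour would keep an uncommented `TIMEOUT=...` line in the `.env`, while an operator who wants to watch the deployment and stop it by hand would leave the line commented out.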
In general, I think we're running into the issue that we just need to see the logs of the containers while polling the status of the deployment. Those are the 2 main reliable indicators that determine if the deployment is going well. I think this timeout implementation was an attempt to reduce the worst parts of a failing deployment but it might have made things worse 🙃 I will try to take another run at feeding live logs into the `abra app deploy` screen...
OK, #612 should cover this now.
Ideally, we'll only see `#TIMEOUT=...` in the `.env.sample` in the future and operators can choose to use them or not. Hopefully the migration here is not that painful for scripting purposes and this change as-is is backwards compatible. There's no default any more from the `abra` side of things.
It would be great if folks who maintain these recipes can remove these label defaults:
E.g.