Write to env version before polling #808
When polling fails for some reason (user abort, timeout), the deployed version is not recorded in the `.env` file. In the cases I encounter, the deployed version is in fact the new one, even if it is failing. If I then retry deploy/undeploy, it will deploy an old version, which is likely not to work due to data migrations. For me it makes much more sense to write the version to the `.env` file before polling starts, i.e. once Swarm has accepted the new config. Is there anything I might be missing here?
Would that interfere with how rollback currently works? I never used it; curious how you would use that one, and why not simply `deploy -f`?

Yes, this is definitely a pain point. We are struggling because it is just really hard to understand what exactly the Swarm runtime is doing. I started to document this on https://docs.coopcloud.tech/abra/swarm/#limitations.
The Swarm runtime might automatically roll back the deployment version depending on the kind of failure. This will show in the `abra app deploy` output. Sometimes the deployment will fail but keep trying to start up and not get rolled back.

To add another duct-tape patch on top of the existing thousand duct-tape patches, I think you'd need to check the deployment version on failure (user abort, timeout, etc.), see what exactly the deployed version is, and then write that, with a warning: "it failed but we still wrote it to the `.env`".

It's messy 🙈
I see, so this would need extensive testing of Swarm failure states. I guess some of them might not be easy to provoke...
Yes, we'd need a reliable way to trigger all of Swarm's failure scenarios in an integration test suite. I am not sure what all those states are, tbh, but that is probably documented somewhere. You can start to get an idea of what is going on: the whole rabbit hole is documented well here. If you do this by hand a few times, you can start to see what the Swarm runtime is doing. Is there a way to map out what events happen and what they mean?
I'm not sure what the appropriate level of commitment is to throw into this so-far-bottomless pit of technical debt. I am personally more interested in investing in a backwards-compatible way to get away from Swarm and towards a "one after the other" linear deployment model. Both approaches are time-consuming and perilous 🙃
My pain point relates to when Swarm does not abort with a failure, i.e. the user aborting (probably because it's continually falling over) or a timeout. Does it really happen that Swarm rolls back after trying for a while? I would hope that it's safe enough to assume it's going to stay that way and write the `.env` file. If it's not, it could be wrong either way.
And yes, I would not want to invest too much time in handling Swarm anymore either.