Write to env version before polling #808

Open
opened 2026-03-26 11:08:08 +00:00 by iexos · 4 comments
Member

When polling fails for some reason (user aborted, timeout), the deployed version is not recorded in the `.env` file. Yet in the cases I encounter, the deployed version is always the new one, even though it might be failing. If I then retry deploy/undeploy, it deploys the old version instead, which is likely not to work because of data migrations. To me it makes much more sense to write the version to the `.env` file before polling starts, once Swarm has accepted the new config.

Is there anything I might be missing here? Would that interfere with how [rollback](https://docs.coopcloud.tech/operators/handbook/#abra-app-rollback) currently works? I have never used it, so I'm curious how you would use it and why you wouldn't simply run `deploy -f`.
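A minimal sketch of the proposed ordering, assuming hypothetical `deploy`, `write_env_version` and `poll` callables (these are illustrative names, not abra's actual functions). The version is recorded as soon as Swarm accepts the new config, so a later polling failure no longer leaves a stale version in `.env`:

```python
from typing import Callable


class PollError(Exception):
    """Hypothetical error raised when polling gives up (timeout or user abort)."""


def deploy_then_record(
    version: str,
    deploy: Callable[[str], None],
    write_env_version: Callable[[str], None],
    poll: Callable[[], None],
) -> None:
    """Record `version` in .env once Swarm has accepted it, then poll."""
    # Hand the new config to Swarm; if this raises, nothing is written.
    deploy(version)
    # Swarm accepted the config, so record the version before polling starts.
    write_env_version(version)
    # Poll for convergence; failures still propagate to the caller,
    # but the .env file already names the version Swarm was given.
    poll()
```

The trade-off is that `.env` may then name a version that Swarm subsequently rolls back on its own.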
iexos added the enhancement label 2026-03-26 11:08:08 +00:00
Owner

Yes, this is definitely a pain point. We are struggling because it is really hard to understand what exactly the Swarm runtime is doing. I started to document this at https://docs.coopcloud.tech/abra/swarm/#limitations.

The Swarm runtime might automatically roll back the deployment depending on the kind of failure. This will be shown in the `abra app deploy` output. Sometimes the deployment fails but keeps trying to start up and does not get rolled back.

To put another duct-tape patch on the thousands of duct-tape patches already there, I think you'd need to do a deployment version check on failure (user abort, timeout, etc.), check which version is actually deployed, and then write that. In that case, with a warning like "It failed, but we still wrote it to the `.env`"?

It's messy 🙈
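The check-on-failure alternative could look roughly like the sketch below. `get_deployed_version` is a hypothetical stand-in for whatever query abra would run against the live service, and the warning texts are illustrative only:

```python
import warnings
from typing import Callable, Optional


def record_after_failure(
    attempted: str,
    previous: str,
    get_deployed_version: Callable[[], Optional[str]],
    write_env_version: Callable[[str], None],
) -> str:
    """After a failed poll, write whatever version Swarm is actually running."""
    deployed = get_deployed_version()
    if deployed is None:
        # Running version could not be determined: leave the old record alone.
        warnings.warn("could not determine deployed version; .env left unchanged")
        return previous
    write_env_version(deployed)
    if deployed == attempted:
        # Swarm kept the new version (e.g. still retrying): record it, but warn.
        warnings.warn(f"deploying {attempted} failed, but it is running; wrote it to .env")
    else:
        # Swarm rolled back (or something else is running): record that instead.
        warnings.warn(f"deploying {attempted} failed; Swarm is running {deployed}")
    return deployed
```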
Author
Member

I see, so this would need extensive testing of Swarm failure states. I guess some of them might not be easy to provoke...
Owner

Yes, we'd need a reliable way to trigger all of Swarm's failure scenarios in an integration test suite. I am not sure what all those states are, tbh, but that is probably documented somewhere. You can start to get an idea of what is going on with:

> https://docs.docker.com/reference/cli/docker/system/events/

The whole rabbit hole is documented well [here](https://oneuptime.com/blog/post/2026-02-08-how-to-use-docker-system-events-for-real-time-monitoring/view). If you do this by hand a few times, you can start to see what the Swarm runtime is doing. Is there a way to map out which events happen and what they mean?

I'm not sure what the appropriate level of commitment is for this so-far bottomless pit of technical debt. Personally, I am more interested in investing in a backwards-compatible approach to get [away from Swarm](https://docs.coopcloud.tech/abra/swarm/#what-we-need) and towards a "one after the other" linear deployment model. Both approaches are time-consuming and perilous 🙃
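One way to start mapping events is to capture them during a deploy with `docker system events --filter type=service --format '{{json .}}' > events.jsonl`, stop the capture with Ctrl+C, and tally what was seen. The sketch below assumes each line is a JSON object with `Type`, `Action` and `Actor.Attributes` fields, and that service events carry the service name in a `name` attribute; which attributes Swarm sets for each action is exactly what would still need mapping out:

```python
import json
from collections import Counter
from typing import Iterable


def tally_service_events(lines: Iterable[str]) -> Counter:
    """Count (service name, action) pairs from `docker system events` JSON lines."""
    counts: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        if event.get("Type") != "service":
            continue  # only Swarm service events are of interest here
        attrs = (event.get("Actor") or {}).get("Attributes") or {}
        # Assumption: the service name is carried in the "name" attribute.
        name = attrs.get("name", "<unknown>")
        counts[(name, event.get("Action", "<none>"))] += 1
    return counts
```

Reading the captured file with `tally_service_events(open("events.jsonl"))` gives a per-service count of actions, which can be compared across deliberately broken deploys.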
Author
Member

My pain point relates to cases where Swarm does not abort with a failure, i.e. the user aborting (probably because the service keeps falling over) or a timeout. Does Swarm really roll back after trying for a while? I would hope it is safe enough to assume the deployment stays on that version and to write the env file. If it isn't, the recorded version could be wrong either way.

And yes, I would not want to invest much more time in handling Swarm either.
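The rule of thumb above (record the new version when polling stops because of a user abort or a timeout, but not when Swarm itself reports a rollback) can be written as a small decision function. The exception classes are hypothetical stand-ins for however abra surfaces these outcomes:

```python
from typing import Optional


class PollTimeout(Exception):
    """Hypothetical: polling gave up after the configured timeout."""


class SwarmRolledBack(Exception):
    """Hypothetical: Swarm reported that it rolled the service back."""


def version_to_record(
    attempted: str, previous: str, outcome: Optional[BaseException]
) -> str:
    """Decide which version to write to .env once polling has ended."""
    if outcome is None:
        return attempted  # deployment converged
    if isinstance(outcome, (PollTimeout, KeyboardInterrupt)):
        # Timeout or user abort (Ctrl+C raises KeyboardInterrupt): Swarm has not
        # reported a failure, so assume it keeps running the new version.
        return attempted
    if isinstance(outcome, SwarmRolledBack):
        return previous  # Swarm itself went back to the old version
    # Any other error: stay conservative and keep the previous record.
    return previous
```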
2 Participants
Reference: toolshed/abra#808