abra app undeploy doesn't hold off for container deletion #564

Closed
opened 2025-05-25 20:44:57 +00:00 by decentral1se · 1 comment
Owner

Companion issue to #557 following #563.

The original report:

Undeploy doesn't wait for container deletion

When I run abra app undeploy on recipes like matrix-synapse, it seems to return instantly, before the containers are killed. This causes problems when I try to abra app deploy soon after, since it will sometimes result in the network not being recreated.

EDIT: Tracked in #557

decentral1se added the
bug
label 2025-05-25 20:44:57 +00:00
decentral1se added the
critical fix
label 2025-06-05 12:22:44 +00:00
decentral1se added this to the Abra v0.11.x project 2025-06-10 06:10:42 +00:00
decentral1se self-assigned this 2025-07-08 09:59:30 +00:00
decentral1se moved this to In Progress in Abra v0.11.x on 2025-08-25 10:15:40 +00:00
Author
Owner

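OK, I have confirmed that we have essentially the same implementation as docker stack rm.

The logic from https://github.com/docker/cli/pull/4259 lives in our code at https://git.coopcloud.tech/toolshed/abra/src/commit/20909695e0e05c6251029dba270b3d4741aeb7a8/pkg/upstream/stack/remove.go#L213-L254: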

func waitOnTasks(ctx context.Context, client apiclient.APIClient, namespace string) (bool, error) {
    var timedOut bool

    go func() {
        if WaitTimeout == 0 {
            return
        }

        log.Debug(i18n.G("timeout: waiting on undeploy tasks (timeout=%v secs)", WaitTimeout))

        timeout := time.Duration(WaitTimeout) * time.Second
        <-time.After(timeout)

        log.Debug(i18n.G("timed out on undeploy (timeout=%v sec)", WaitTimeout))

        timedOut = true
    }()

    terminalStatesReached := 0
    for {
        tasks, err := getStackTasks(ctx, client, namespace)
        if err != nil {
            return false, errors.New(i18n.G("failed to get tasks: %w", err))
        }

        for _, task := range tasks {
            if terminalState(task.Status.State) {
                terminalStatesReached++
                break
            }
        }

        if terminalStatesReached == len(tasks) {
            break
        }

        if timedOut {
            return true, errors.New(i18n.G("deployment timed out 🟠"))
        }
    }

    return false, nil
}

I noticed that docker stack rm doesn't wait for the containers to actually go away. There is a delay between a task signalling shutdown and its container really being gone. I've added a check for that in #623 (https://git.coopcloud.tech/toolshed/abra/pulls/623). It seems to be working after a lot of manual testing.

I think there might be a bug in the upstream implementation that makes it count the tasks incorrectly. However, the task listing returned by the runtime also seems unpredictable, for reasons I don't understand. I have struggled to add more information to the debug output of the undeploy polling and can't get anything more useful out of it for now. I can try to come back to it if this issue persists.

Sample --debug output:

DEBU <recipe/compose.go:38> COMPOSE_FILE detected, loading /home/d/.abra_test/recipes/custom-html/compose.yml
DEBU <app/app.go:501> retrieved /home/d/.abra_test/recipes/custom-html/compose.yml for custom-html.test.example.org
DEBU <app/compose.go:81> get label 'coop-cloud.custom-html_test_example_org.timeout'
DEBU <app/compose.go:87> no timeout label found for custom-html_test_example_org
INFO <stack/remove.go:26> initialising undeploy
DEBU <stack/remove.go:132> removing service custom-html_test_example_org_app
DEBU <stack/remove.go:132> removing service custom-html_test_example_org_bar4
DEBU <stack/remove.go:132> removing service custom-html_test_example_org_baz2
DEBU <stack/remove.go:132> removing service custom-html_test_example_org_foo3
DEBU <stack/remove.go:180> removing config custom-html_test_example_org_nginx_default_conf_v6
INFO <stack/remove.go:87> polling undeploy status
DEBU <stack/remove.go:244> waiting for 4 task(s) to reach terminal state
DEBU <stack/remove.go:244> waiting for 3 task(s) to reach terminal state
DEBU <stack/remove.go:244> waiting for 2 task(s) to reach terminal state
DEBU <stack/remove.go:244> waiting for 1 task(s) to reach terminal state
DEBU <stack/remove.go:251> all tasks reached terminal state
DEBU <stack/remove.go:275> waiting for 3 container(s) to really go away
DEBU <stack/remove.go:275> waiting for 2 container(s) to really go away
DEBU <stack/remove.go:275> waiting for 1 container(s) to really go away
DEBU <stack/remove.go:270> all containers did really go away
INFO <app/undeploy.go:103> undeploy succeeded 🟢
DEBU <app/app.go:683> version 8a026066+U saved to custom-html.test.example.org.env
decentral1se moved this to Done in Abra v0.11.x on 2025-08-28 14:30:22 +00:00
Reference: toolshed/abra#564