abra app undeploy hangs #557

Open
opened 2025-05-16 11:05:21 +00:00 by decentral1se · 3 comments
Owner

I've noticed that `undeploy` hangs until it times out, even though the stack is already down.
decentral1se added the `bug` label 2025-05-16 11:05:21 +00:00
decentral1se added this to the Abra v0.11.x project 2025-05-19 09:04:36 +00:00
decentral1se added the `critical fix` label 2025-06-05 12:22:44 +00:00
3wordchant self-assigned this 2025-06-30 14:58:37 +00:00
Author
Owner

While pulling my hair out on #564 for a few hours, I didn't see a single hanging undeployment (I was deploying and undeploying a recipe with 4 services, so a fairly good stress test). I am not sure it's possible to reproduce this now 🤔 I've added some more useful `--debug` output to `waitOnTasks`, which is where I think the issue must be happening, if it still is:

```go
func waitOnTasks(ctx context.Context, client apiclient.APIClient, namespace string) (bool, error) {
	var timedOut bool

	go func() {
		if WaitTimeout == 0 {
			return
		}

		log.Debug(i18n.G("timeout: waiting on undeploy tasks (timeout=%v secs)", WaitTimeout))

		timeout := time.Duration(WaitTimeout) * time.Second
		<-time.After(timeout)

		log.Debug(i18n.G("timed out on undeploy (timeout=%v sec)", WaitTimeout))
		timedOut = true
	}()

	terminalStatesReached := 0
	for {
		tasks, err := getStackTasks(ctx, client, namespace)
		if err != nil {
			return false, errors.New(i18n.G("failed to get tasks: %w", err))
		}

		for _, task := range tasks {
			if terminalState(task.Status.State) {
				log.Debug(i18n.G("waiting for %d task(s) to reach terminal state", len(tasks)-terminalStatesReached))
				terminalStatesReached++
				break
			}
		}

		if terminalStatesReached == len(tasks) {
			log.Debug(i18n.G("all tasks reached terminal state"))
			break
		}

		if timedOut {
			return true, errors.New(i18n.G("deployment timed out 🟠"))
		}
	}

	lastSeenCount := -1
	for {
		containers, err := getStackContainers(ctx, client, namespace)
		if err != nil {
			return false, errors.New(i18n.G("failed to list containers of stack: %s", namespace))
		}

		numContainers := len(containers)
		if numContainers == 0 {
			log.Debug(i18n.G("all containers did really go away"))
			break
		}

		if numContainers != lastSeenCount {
			log.Debug(i18n.G("waiting for %d container(s) to really go away", numContainers))
			lastSeenCount = numContainers
		}
	}

	return false, nil
}
```

Source: https://git.coopcloud.tech/toolshed/abra/src/commit/acb617076828164e07be8af96be188d425b2fce2/pkg/upstream/stack/remove.go#L218-L281
Owner

> I am not sure it's possible to reproduce this now 🤔

Happy to close on that basis, we can always reopen if someone runs into this again?
decentral1se moved this to Done in Abra v0.11.x on 2025-09-02 22:17:34 +00:00
Author
Owner

Managed to reproduce it! No idea what is causing it state-wise on the runtime side, but here are some logs.

Hopefully these are enough to figure out how to break out of this hanging state... it seems like the counting logic is borked somehow.

```
DEBU <recipe/compose.go:38> COMPOSE_FILE detected, loading /home/d1/.abra_test/recipes/custom-html/compose.yml
DEBU <app/app.go:506> retrieved /home/d/.abra_test/recipes/custom-html/compose.yml for custom-html.foo
DEBU <app/compose.go:81> get label 'coop-cloud.custom-html_foo.timeout'
DEBU <app/compose.go:87> no timeout label found for custom-html_foo
INFO <stack/remove.go:26> initialising undeploy
DEBU <stack/remove.go:132> removing service custom-html_foo_app
DEBU <stack/remove.go:180> removing config custom-html_foo_nginx_default_conf_v6
INFO <stack/remove.go:87> polling undeploy status
DEBU <stack/remove.go:244> waiting for 5 task(s) to reach terminal state
DEBU <stack/remove.go:244> waiting for 4 task(s) to reach terminal state
DEBU <stack/remove.go:244> waiting for 3 task(s) to reach terminal state
DEBU <stack/remove.go:244> waiting for 2 task(s) to reach terminal state
DEBU <stack/remove.go:244> waiting for -3 task(s) to reach terminal state
DEBU <stack/remove.go:244> waiting for -4 task(s) to reach terminal state
DEBU <stack/remove.go:244> waiting for -5 task(s) to reach terminal state
DEBU <stack/remove.go:244> waiting for -6 task(s) to reach terminal state
DEBU <stack/remove.go:244> waiting for -7 task(s) to reach terminal state
DEBU <stack/remove.go:244> waiting for -8 task(s) to reach terminal state
```
decentral1se modified the project from Abra v0.11.x to Abra "next" 2025-09-29 16:15:57 +00:00
decentral1se moved this to Prioritised in Abra "next" on 2025-09-29 16:16:39 +00:00
Reference: toolshed/abra#557