Best practice healthcheck conventions #539
Reference: coop-cloud/organising#539
This is a very general discussion. I wonder if we could agree on some best practices or conventions for defining good healthchecks in our recipes. This is a critical topic for us, because it has recently caused us a lot of nasty problems. Here are the most annoying ones:
`start_period` in the recipe again. I would propose setting quite large `start_period` values for each recipe, or are there any arguments against a too-high `start_period`?

I think we should also state this in the docs, as this can cause a lot of pain.
I ran into the same issue on Discourse; if the forum is set to require login, then `GET /` serves a 403 instead of a 200 (`amusewiki` does something similar, but we haven't defined a healthcheck for that yet). The solution was to find the `/srv/status` endpoint, which works regardless of that setting.

Oh yeah, that sounds nightmarish. Tuning healthcheck timings is hard: too short and you run into problems like the ones you mention; too long and it increases the chance of walking away from a deployment, not noticing it failed, and then being confused later about why the app is still running an old version (or worse, mismatched versions between different services). I wonder if there's a way to make the values depend on server load? Otherwise, perhaps a little calculator for different combinations of `interval`/`retries`/`timeout`/`start_period` could help?
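
To make the calculator idea a bit more concrete, here is a rough sketch in Python. It assumes the usual Docker healthcheck semantics as I understand them: failures during `start_period` don't count towards `retries`, and after that a container is marked unhealthy once `retries` consecutive checks fail, each check taking up to `timeout` and starting `interval` after the previous one finished. The `Healthcheck` class and its method name are made up for illustration, and the result is only an approximate upper bound, not an exact model of what Docker does.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Healthcheck:
    """Hypothetical helper; all durations are in seconds."""

    interval: float
    timeout: float
    retries: int
    start_period: float

    def worst_case_unhealthy_after(self) -> float:
        """Approximate upper bound on how long a container that never
        becomes healthy takes to be marked unhealthy.

        Assumption: failures inside start_period are ignored, then
        `retries` consecutive failures are needed, each costing up to
        interval + timeout.
        """
        return self.start_period + self.retries * (self.interval + self.timeout)


if __name__ == "__main__":
    # Compare a few candidate combinations side by side.
    candidates = [
        Healthcheck(interval=30, timeout=10, retries=3, start_period=60),
        Healthcheck(interval=30, timeout=10, retries=10, start_period=60),
        Healthcheck(interval=30, timeout=10, retries=3, start_period=600),
    ]
    for hc in candidates:
        print(f"{hc}: unhealthy after at most ~{hc.worst_case_unhealthy_after():.0f}s")
```

Comparing the second and third candidates shows the trade-off discussed above: a large `start_period` and a large `retries` value both delay the moment a failed deployment is reported, so either one makes it more likely you have already walked away.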