docs: swarm mode almanac

2026-02-16 23:25:35 +01:00
parent 3eff69b530
commit 04829e51ce
2 changed files with 56 additions and 0 deletions
--- a/docs/abra/swarm.md
+++ b/docs/abra/swarm.md
@ -0,0 +1,55 @@
+---
+title: Swarm mode almanac
+---
+
+!!! warning "This page is a Work In Progress :tm:"
+    A page to understand WTF is going on with [Swarm mode](https://docs.docker.com/engine/swarm/key-concepts/) and how we rely on it, how we might not rely on it and other random threads. Please add to this page as you see fit! If we can establish some shared understanding of what is going on under the hood, we can come up with a collective solution which meets everyone's needs.
+
+## Support matrix
+
+In practice, this is what we currently rely on Swarm mode for.
+
+| Feature | Explanation |
+| ----------- | ----------- |
+| Encrypted secrets | When you run `abra secret generate`, it uses something like `printf foo | docker secret create foo -` under the hood. This only works once the host is part of a swarm, i.e. after you have run `docker swarm init` (or `docker swarm join`). Swarm mode [securely transports and stores your secret encrypted on the server](https://docs.docker.com/engine/swarm/secrets/#how-docker-manages-secrets). `docker compose` does not support encrypting or storing secrets because it only runs client-side. |
+| Template driver | If you use `template_driver: golang` in your `compose.yml` to insert secrets or environment variables into your configs, then you are using a template driver. This feature has almost 0 documentation and does not appear to be supported by [the actual Compose Spec](https://github.com/compose-spec/compose-spec/blob/main/08-configs.md) and is actually completely blocked by `docker compose` ([source](https://github.com/docker/compose/blob/f9828dfab909e9dd0dd489a49088c8619ec2ca7e/pkg/compose/create.go#L1095)). Several recipes use this feature and it seems quite critical for our usage. |
+| Stacks | Firstly, [a service](https://docs.docker.com/engine/swarm/how-swarm-mode-works/services/) is a key concept here. A stack is a shared namespace of services with networks, volumes, configs etc. The concept of a "Stack" is [unique](https://docs.docker.com/engine/swarm/stack-deploy/) to Swarm mode. Any replacement for Swarm mode would have to implement this kind of namespacing feature for compatibility. See [`psviderski/uncloud#94`](https://github.com/psviderski/uncloud/discussions/94) for more. |
+| Orchestration | When you run `abra app deploy`, we're running a slightly customised `docker stack deploy` under the hood. Swarm mode is supposed to handle zero downtime updates and rollbacks if things fail automagically. However, we're seeing the limitations of this approach (see below). |
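+
+To make the secrets and template driver rows concrete, here is a minimal sketch of how the two features tend to be combined in a stack file. The service, config, secret and file names (`app`, `app_conf`, `db_password`, `app.conf.tmpl`) and the image are hypothetical and not taken from any particular recipe.
+
+```yaml
+# compose.yml (hypothetical names, for illustration only)
+services:
+  app:
+    image: nginx:stable        # placeholder image
+    configs:
+      - source: app_conf
+        target: /etc/app/app.conf
+    secrets:
+      - db_password            # attached to the service so the template can reference it
+
+configs:
+  app_conf:
+    file: ./app.conf.tmpl
+    template_driver: golang    # rendered by Swarm mode, blocked by `docker compose`
+
+secrets:
+  db_password:
+    external: true             # created beforehand, e.g. with `docker secret create`
+```
+
+Under these assumptions, `app.conf.tmpl` could contain a line such as `password = {{ secret "db_password" }}`, which, as far as we understand, Swarm mode fills in when the service task starts.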
+
+## Unsupport matrix
+
+| Feature | Explanation |
+| ----------- | ----------- |
+| Multi-node | It is possible but it doesn't seem like anyone in our community is really doing this. We believe the majority of Co-op Cloud installs are single node. There is a lack of [CSI](https://github.com/olljanat/csi-plugins-for-docker-swarm?tab=readme-ov-file) support for coordinating storage across multiple hosts when using Swarm mode. This means we kind of throw out [the majority](https://docs.docker.com/engine/swarm/#feature-highlights) of the features of Swarm mode when it comes to running multi-node. |
+
+## Limitations
+
+* Swarm mode is still eerily underdeveloped and lacking features as a system. There are still some lurking network and stability bugs which are common. We're grateful for the undercover live reporting from people in-the-know adjacent to our network below. It does not really put us at ease 👇
+
+!!! note "Docker whiskey leaks"
+
+    > https://www.mirantis.com/blog/mirantis-guarantees-long-term-support-for-swarm/
+    >
+    > Mirantis' relationship with "swarm" is very confusing! my understanding is that there are people (or one person? lol) at mirantis who do some work on the orchestration engine that is "docker swarm," but only to the extent that it supports mirantis' platform. i don't believe there's any active feature development beyond that. you're right that it's a misleading headline -- it sounds to me that they're just saying that they'll continue swarm support in their v3 kubernetes platform, not that they're committed to developing swarm as an orchestration system.
+    >
+    > Way back when (i guess in 2019? before my time!), docker sold off its enterprise platform which was called "swarm" to mirantis, so that's still a product that mirantis has and has developed in their way, but it's not the open-source swarm(kit) that's part of the docker cli. this is a good quick explanation: https://forums.docker.com/t/docker-swarmkit-and-the-mirantis-deal-not-docker-swarm/88886
+
+* The orchestration features of Swarm mode are opaque, which makes failed deployments very difficult to understand. This can cause a litany of issues. For example, your database may already have been migrated when a failing app is rolled back to a version which doesn't support the new schema. This has been discussed extensively on [`organising#682`](https://git.coopcloud.tech/toolshed/organising/issues/682). We need more fine-grained control over the order of the deployment to gain more insight and build stable recovery operations, i.e. an imperative model which is more predictable and easier to work with.
+
+## Potential alternatives
+
+* [`uncloud.run`](https://github.com/psviderski/uncloud): As it turns out, the uncloud folks are creating a very different system. Something beyond compose but not k8s and not swarm. This means they have to implement a lot of the orchestration features from scratch. However, they're going for a nice approach: a straightforward imperative deployment model (supports `depends_on` and upgrades stuff one at a time). They're choosing which parts of the Compose Spec they implement and it's noteworthy that they [don't implement secrets yet](https://github.com/psviderski/uncloud/issues/75). See the [Compose support matrix](https://uncloud.run/docs/compose-file-reference/support-matrix) for more. It's a system to [keep an eye on](https://github.com/psviderski/uncloud/milestone/1) with the hope that we can use some part of it in the future.
+
+* [`docker compose`](https://github.com/docker/compose): Plain old `docker compose`. A more elegant weapon for a more civilised age. It is however missing features we need such as encrypted secrets and `template_driver` support. There may be more things missing. They are developing a promising [SDK](https://docs.docker.com/compose/compose-sdk/) which exposes a public API for handling various operations. This would need some serious investigation and most likely some custom solutions for the features we're missing.
+
+## What we need
+
+* Something that is backwards compatible with our existing recipe configuration commons and the current deployments. We can't re-invent the wheel because we all rely on this system. So, we need to look towards incremental improvements or changes which are backwards compatible. We can always agree to change the config commons or some shared practices but then we need to establish a clear agreement with decision making. This is the social part.
+
+* Some way of conveniently using secrets when deploying services. This method should make working in a team easy without straying too far from our established Git Ops workflow of sharing `$ABRA_DIR`. The secrets don't need to be encrypted and stored on the server (removing the need for Swarm mode handling) as long as they're mounted as secrets in the usual `/run/secrets/<name>` manner at runtime.
+
+* Template driver support so we can template values into our configurations. This is used in enough recipes to warrant continued support.
+
+* A way to namespace services into a deployment, aka a "Docker Stack". This would appear to be a minor implementation detail after all is said and done. It's services all the way down and they have some linked networks/configs/volumes/etc. and a shared naming convention.
+
+* Some way to achieve [fearless YunoHost-esque upgrades](https://git.coopcloud.tech/toolshed/organising/issues/682#issuecomment-29302). In other words, some predictable way to deploy / upgrade / rollback and some way to intervene when things go wrong. It should be easy to understand for everyone and would enable real stability for operators. I think we want some sort of anti-orchestration implementation which is super simple.
--- a/mkdocs.yml
+++ b/mkdocs.yml
@ -166,6 +166,7 @@ nav:
      - "Hack": abra/hack.md
      - "Troubleshoot": abra/trouble.md
      - "Cheat Sheet": abra/cheat-sheet.md
+      - "Swarm mode almanac": abra/swarm.md
  - "Specifications":
      - specs/index.md
      - "Backups":