Stacks are not network isolated from each other, possible security issue? #683

2026-01-24T21:11:31Z

marlon commented

2026-01-24 21:11:31 +00:00

The favored networking pattern for a coop cloud recipe is to designate a service to receive incoming web connections, which are routed to the container by the ingress controller (e.g. Traefik).
In order for the service to receive these connections, it must be assigned to the "proxy" overlay network, which Traefik is also assigned to.

The problem is that every other app deployed in the swarm also has a service assigned to the proxy network, so all these services can reach each other directly via local IPs. If one service is compromised, it can be used to connect directly to other services while bypassing the ingress controller. This could allow attacks that manipulate `X-Forwarded-For` headers, but could also allow connections to ports which are not intended to be accessible through the ingress controller. There may be other vulnerabilities; I haven't thought it out fully.
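To make the claim concrete, here is a rough sketch using plain `docker` commands on a single-node swarm (after `docker swarm init`). Everything named here is made up for the demo and not taken from any recipe: the network `proxy_demo` stands in for the shared proxy network, and `stack_a_app` / `stack_b_app` stand in for two unrelated apps. I have not run this against a real Co-op Cloud server, so treat it as a starting point.

```shell
#!/bin/sh
# Sketch: two unrelated services on one overlay network can reach each other.
# All names (proxy_demo, stack_a_app, stack_b_app) are hypothetical.

demo_setup() {
  # Stand-in for the shared "proxy" overlay network
  docker network create --driver overlay proxy_demo
  # "App A": a plain web server with no published ports
  docker service create --name stack_a_app --network proxy_demo nginx:alpine
  # "App B": a container with a shell and network tools
  docker service create --name stack_b_app --network proxy_demo \
    nicolaka/netshoot sleep 3600
}

demo_probe() {
  # Find B's task container on this node and request A directly,
  # without going through any ingress controller.
  cid=$(docker ps -q --filter name=stack_b_app | head -n 1)
  docker exec "$cid" curl -s -o /dev/null -w '%{http_code}\n' http://stack_a_app/
}

demo_cleanup() {
  docker service rm stack_a_app stack_b_app
  docker network rm proxy_demo
}
```

Run `demo_setup`, wait for both tasks to start, then `demo_probe`. If it prints an HTTP status code, B reached A over the shared network. `demo_cleanup` may need a retry, since the network can only be removed once the tasks are gone.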

What could be a mitigation?

❤️ 1

decentral1se commented

2026-01-25 12:50:24 +00:00

Thanks for reporting @marlon. I think I've had various fever dreams down through the years as to whether this is happening or not and somehow convinced myself it isn't. Can we please have a minimal reproduction using even the basic `docker` commands, with two separate stacks / dummy images with a shell etc., so we can each confirm this behaviour? That would be very appreciated because then we can also use that as a manual "test harness" to work our way around it and also come up with a mitigation and/or migration.

decentral1se added the security label 2026-01-25 12:50:30 +00:00

marlon commented

2026-01-25 19:19:18 +00:00

Here's a demo:

```
abra app new netshoot
abra app deploy <app-name>
abra app run <app-name> app -- nmap -sS -sV -T5 10.0.1.\*
```

(Not sure if the proxy overlay is always on that subnet; maybe check with `docker network inspect proxy`.)
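If you'd rather not guess the subnet, something like the following should print it. This is a small sketch that assumes the overlay network is really called `proxy` and that it runs on the swarm node itself (or with the docker context pointed at it). The helper name `proxy_subnets` is my own.

```shell
#!/bin/sh
# Print the subnet(s) configured for an overlay network, one per line.
# The default network name "proxy" is taken from the thread.
proxy_subnets() {
  docker network inspect "${1:-proxy}" \
    --format '{{range .IPAM.Config}}{{println .Subnet}}{{end}}'
}
```

nmap accepts CIDR notation, so the printed value (for example `10.0.1.0/24`) can be used directly as the scan target in place of `10.0.1.\*`.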

❤️ 1

Apfelwurm commented

2026-01-25 22:36:32 +00:00

This is also an attack vector that I am a bit concerned about.
I also thought about how this could be mitigated, but currently I only see a somewhat better solution, which would require redeploying traefik for each dependent deployment:
Each app gets its own network between traefik and the app, plus an additional label that tells traefik which network to use. But all of these networks then need to be added to traefik as well, and we have to manage them somehow (since currently the gateway network is just added manually).
Edit: maybe it would not require a redeploy of traefik if we used `docker network connect` to add traefik to each network while it is running, but I have no idea what the restart behavior would be in that case, or whether we would need some kind of daemon that takes care of wiring up the networks dynamically.
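A rough sketch of what that wiring could look like. Assumptions, all mine: the `proxy_<app>` naming scheme, the helper names, and the `traefik.docker.network` label (that is what Traefik v2's Docker provider reads; as far as I know newer Traefik versions with the separate swarm provider use `traefik.swarm.network` instead). Also, `docker network connect` works on containers, so for swarm services the sketch uses `docker service update --network-add`, which as far as I know replaces the service's tasks. So Traefik would still get restarted, just not fully redeployed.

```shell
#!/bin/sh
# Derive a per-app network name from an app name (hypothetical naming scheme).
app_network_name() {
  printf 'proxy_%s\n' "$(printf '%s' "$1" | tr -c 'a-zA-Z0-9' '_')"
}

# wire_app_network APP_NAME APP_SERVICE TRAEFIK_SERVICE
# Create a dedicated overlay network and attach both services to it.
wire_app_network() {
  net=$(app_network_name "$1")
  docker network create --driver overlay "$net"
  # Attach the app and tell Traefik which network to route over
  # (label name is an assumption, see above).
  docker service update --network-add "$net" \
    --label-add "traefik.docker.network=$net" "$2"
  # Attach Traefik; this triggers a rolling update of its tasks.
  docker service update --network-add "$net" "$3"
}
```

For example, `app_network_name cloud.example.com` prints `proxy_cloud_example_com`. The app service would also need to leave the shared proxy network (`--network-rm`) for this to isolate anything.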

❤️ 1

decentral1se commented

2026-01-26 07:10:10 +00:00

@mirsal can I deal you in for your annual coop cloud docker networking extravaganza issue? I know it's early in the year but we all look forward to it.

mirsal commented

2026-01-26 09:30:46 +00:00

That is expected. The only more-or-less sane way I can think of would be a separate overlay network for each app, although that would potentially cause a lot of networks to be created, and docker networking has a lot of moving parts and a few race conditions. Another downside to that approach is that deploying a recipe would require a traefik restart.

An alternative approach would be manually inserting netfilter rules on container start, but I would advise against going down that path of madness.

The main question, imho, is: depending on the threat model, is it really a risk worth mitigating? Normally an application container would only bind a single port on the proxy network, so this is not really increasing the attack surface. Traefik is not a firewall. I would say the best way to deal with this is to check that services attached to the traefik overlay network treat it as public and untrusted.

👍 1 👀 2 ❤️ 2

decentral1se changed title from ~~stacks are not network isolated from each other, possible security issue?~~ to Stacks are not network isolated from each other, possible security issue?

2026-01-26 20:22:56 +00:00

simon commented

2026-01-27 13:03:23 +00:00

I forwarded this to @moritz:

He doesn’t currently see a clear, concrete attack vector. The biggest potential risk would be if customers get a full Authentik admin account, since they could use policies to spawn a shell and then gain network visibility/access within the Docker network (at least enough to discover what other services run on the VM). This would be much more serious if databases were attached to the proxy network, but after a quick check it looks like the DB container isn’t in the proxy network by default. Usually only the externally exposed app container is, so with the standard setup it doesn’t seem as critical.
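For anyone who wants to repeat that quick check on their own server, here is a sketch. It assumes the network is named `proxy`, and the name patterns used to spot database-like containers are only a guess on my part. Note that for overlay networks `docker network inspect` only lists containers running on the node where it is executed.

```shell
#!/bin/sh
# List the names of containers attached to a network (default: "proxy").
proxy_members() {
  docker network inspect "${1:-proxy}" \
    --format '{{range .Containers}}{{println .Name}}{{end}}'
}

# Filter stdin down to names that look like databases or caches.
# The pattern list is a hypothetical starting point, not exhaustive.
flag_db_like() {
  grep -Ei '(^|[_.-])(db|postgres|mariadb|mysql|redis|mongo)([_.-]|$)' || true
}
```

Usage: `proxy_members | flag_db_like`. No output means nothing with a database-like name is attached on this node.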

❤️ 1

3wordchant commented

2026-01-30 05:19:56 +00:00

I will chime in to say I was 100% convinced that this is happening, amazing work with the reproducible example @marlon to prove that easily.

The ~~best~~ only attack vector I can come up with hinges on the fact that traffic on the `proxy` network isn't SSL-encrypted; I agree "Traefik is not a firewall" – but it _is_ where SSL terminates.

So (untested), what about:

1. Deploy recipe A and recipe B to the same swarm
2. Recipe A has a remote code execution vulnerability that leads to shell access
3. Attacker installs `ettercap` (sad reality is that many of our recipes' containers run as `root` 🫠), MITMs traffic between Traefik and recipe B
4. Recipe B's traffic can now be inspected and modified as well

I don't know nearly enough about Docker networking to know if `ettercap`'s ARP poisoning would work in that environment; if it doesn't then I'm back to "can't think of an attack vector".

❤️ 1

mirsal commented

2026-01-30 13:23:10 +00:00

@3wordchant an unprivileged swarm container would not be able to perform MITM without `CAP_NET_ADMIN` or host-mode networking, but I believe that's beside the point, because with code execution as root within a container, network isolation should be the least of our concerns: in many situations, it is not that hard to escape docker containers with the ability to execute arbitrary binary as root as premise.
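To see which capabilities a given container actually has, the effective capability mask can be read from `/proc/self/status` inside it. A small sketch follows; the helper names are mine, and bit 12 is `CAP_NET_ADMIN` according to `linux/capability.h`.

```shell
#!/bin/sh
# CAP_NET_ADMIN is capability number 12 in linux/capability.h.
CAP_NET_ADMIN=12

# Print the effective capability mask (hex) of the current process.
current_capeff() {
  awk '/^CapEff:/ { print $2 }' /proc/self/status
}

# has_cap MASK BIT: succeed if bit BIT is set in the hex mask MASK.
has_cap() {
  [ $(( (0x$1 >> $2) & 1 )) -eq 1 ]
}
```

Inside a container: `has_cap "$(current_capeff)" "$CAP_NET_ADMIN" && echo "NET_ADMIN present"`. With the mask `00000000a80425fb`, which is what default Docker containers commonly report (an assumption about your setup), bit 12 is not set. The same helper works for any other capability number.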

❤️ 1

marlon commented

2026-01-30 14:29:09 +00:00

I haven't seen any discussion about the spoofing IP source by tampering with `X-forwarded-for` I mentioned - is that not a concern? Am I mistaken that it works like that?

❤️ 1

moritz commented

2026-02-04 19:33:57 +00:00

> it is not that hard to escape docker containers with the ability to execute arbitrary binary as root as premise.

I just skimmed these briefly: https://book.hacktricks.wiki/en/linux-hardening/privilege-escalation/docker-security/docker-breakout-privilege-escalation/index.html and https://kayssel.substack.com/p/docker-escape-breaking-out-of-containers and I couldn't find a way that could be applied *easily* to typical coop-cloud recipes.
The main causes are:

- exposed **docker socket**: as far as I know, only traefik and the backupbot expose the docker socket, and traefik uses the socket-proxy container to reduce the attack surface. And if an attacker could access traefik or backupbot it would be too late anyway, independent of the network.
- **privileged container** (at least I haven't seen one in any recipe yet)
- **Host Path Mounts** (the only recipe I know of that uses them is the backupbot. But this container doesn't expose any ports and the attack surface is quite negligible. The only way I see is an exploit in restic while parsing the volumes, or tampering with the docker label, for which you need access to the docker socket.)
- **kernel exploits** are rare, expensive and quickly fixed once they are published. Regular kernel updates are necessary anyway!

It would be worth investigating whether our recipes are well configured or whether it's possible to escape the container.

> I haven't seen any discussion about the spoofing IP source by tampering with X-forwarded-for I mentioned - is that not a concern? Am I mistaken that it works like that?

The biggest issue with tampering with X-forwarded-for is that it can be used to circumvent rate limiting, which makes brute-forcing passwords or DoSing a service more efficient. I wouldn't see this as **critical**, but it depends on how much any of our services rely on the X-forwarded-for header for their security.
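To illustrate what "relying on the header" means, here is a sketch of the decision a service would have to make: only believe `X-Forwarded-For` when the connection really comes from the proxy. The function name and the idea of a single known proxy address are assumptions for illustration. On an overlay network Traefik's address is not fixed, which is part of why this is hard to do properly.

```shell
#!/bin/sh
# effective_client_ip PEER_IP XFF_HEADER TRUSTED_PROXY_IP
# Return the address a service should treat as the client:
# the last X-Forwarded-For entry if the peer is the trusted proxy,
# otherwise the peer address itself (the header is ignored).
effective_client_ip() {
  peer=$1
  xff=$2
  trusted=$3
  if [ "$peer" = "$trusted" ] && [ -n "$xff" ]; then
    # The last entry is the one appended by the trusted proxy
    printf '%s\n' "${xff##*,}" | tr -d ' '
  else
    printf '%s\n' "$peer"
  fi
}
```

A request arriving directly from another container (peer `10.0.1.8`, header claiming `203.0.113.7`) is then attributed to `10.0.1.8`. A service that blindly trusts the header would instead see whatever address the attacker wrote into it.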

Maybe someone can think of a few alternative attack vectors that I have missed

edit: oh I forgot the monitoring stack 😬 🙈 I think this is the most problematic!!

❤️ 1

7 Participants

Reference: toolshed/organising#683