Stacks are not network isolated from each other, possible security issue? #683
The favored networking pattern for a coop cloud recipe is to designate a service to receive incoming web connections, which are routed to the container by the ingress controller (e.g. Traefik).
In order for the service to receive these connections, it must be assigned to the "proxy" overlay network, which Traefik is also assigned to.
The problem is that each other app that's deployed in the swarm also has a service assigned to the proxy network, so all these services can reach each other directly via local IPs. If one service is compromised, it can be used to directly connect to other services while bypassing the ingress controller. This could allow attacks that manipulate x-forwarded-for headers, but could also allow connections to ports which are not intended to be accessible through the ingress controller. Maybe other vulnerabilities, I haven't thought it out fully.
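The X-Forwarded-For concern can be illustrated with a minimal Python sketch, assuming an app that resolves the client address itself. The `TRUSTED_PROXIES` value and the `client_ip` helper are hypothetical, and Traefik's address on the proxy overlay is not fixed, so a real deployment would need some other way to identify it. The point is only that the header is honoured when the TCP peer is the ingress controller and ignored when another service on the shared network connects directly.

```python
from ipaddress import ip_address, ip_network

# Hypothetical: the address Traefik happens to have on the proxy overlay.
TRUSTED_PROXIES = [ip_network("10.0.1.2/32")]


def client_ip(peer_addr: str, headers: dict[str, str]) -> str:
    """Return the address the app should treat as the client."""
    peer = ip_address(peer_addr)
    forwarded = headers.get("X-Forwarded-For")
    if forwarded and any(peer in net for net in TRUSTED_PROXIES):
        # The right-most entry is the one added by the nearest proxy hop.
        return forwarded.split(",")[-1].strip()
    # Direct connection from an untrusted peer: ignore the header entirely.
    return peer_addr


# Request that really came through the ingress controller.
via_traefik = client_ip("10.0.1.2", {"X-Forwarded-For": "203.0.113.7"})
# Request sent directly by another service on the proxy network.
direct = client_ip("10.0.1.9", {"X-Forwarded-For": "203.0.113.7"})
```

An app that skips the peer check would report `203.0.113.7` in both cases, which is the spoofing described above.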
What could be a mitigation?
Thanks for reporting @marlon. I think I've had various fever dreams down through the years as to whether this is happening or not and somehow convinced myself it isn't. Can we please have a minimal reproduction using even the basic `docker` commands, with two separate stacks / dummy images with a shell etc., so we can each confirm this behaviour? That would be very appreciated because then we can also use that as a manual "test harness" to work our way around it and also come up with a mitigation and/or migration.
Here's a demo:
(not sure if the proxy overlay is always on that subnet, maybe check with `docker network inspect proxy`)
This is also an attack vector that I am a bit concerned about.
I also thought about how this could be mitigated, but currently I only see a somewhat better solution, and it would require redeploying Traefik for each dependent deployment:
Each app gets its own network between traefik and the app, and an additional label that tells traefik which network to use. But all the networks need to be added to traefik as well then, and we have to somehow manage them (since currently the gateway network is just added manually).
Edit: maybe it would not require a redeploy of Traefik if we used `docker network connect` to add Traefik to each network while it is running, but I have no idea what the restart behavior would be in that case, or whether we would need some kind of daemon that takes care of wiring up the networks dynamically.
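A rough sketch of what the per-app network idea could look like, written as a Python helper that only builds the commands and label rather than running anything. Everything named here is an assumption: the `<app>_ingress` naming scheme, the `traefik_app` service name, and the `isolation_plan` helper are hypothetical; `traefik.docker.network` is the label assumed for telling Traefik which network to use, and its name depends on the Traefik version and provider. `docker service update --network-add` is used as the swarm-service counterpart of `docker network connect`; as far as I know it triggers a service update, so it does not by itself answer the restart question.

```python
def isolation_plan(app: str, traefik_service: str = "traefik_app") -> dict:
    """Build (but do not run) the steps for a dedicated app<->Traefik network."""
    network = f"{app}_ingress"  # hypothetical naming scheme
    return {
        "network": network,
        # One overlay network per app instead of the shared proxy network.
        "create": ["docker", "network", "create", "--driver", "overlay", network],
        # Attach the running Traefik service to the new network.
        "attach": ["docker", "service", "update",
                   "--network-add", network, traefik_service],
        # Label for the app's service so Traefik routes over this network.
        "label": f"traefik.docker.network={network}",
    }


plan = isolation_plan("nextcloud")
for step in ("create", "attach"):
    print(" ".join(plan[step]))
print(plan["label"])
```

The number of networks grows with the number of deployed apps, which is the management overhead mentioned above.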
@mirsal can I deal you in for your annual coop cloud docker networking extravaganza issue? I know it's early in the year but we all look forward to it.
That is expected. The only more-or-less sane way I can think of would be a separate overlay network for each app (although that would potentially cause a lot of networks to be created; Docker networking has a lot of moving parts and a few race conditions). Another downside to that approach is that deploying a recipe would require a Traefik restart.
An alternative approach would be manually inserting netfilter rules on container start but I would advise not going down that path of madness.
The main question, imho, is: depending on the threat model, is it really a risk worth mitigating? Normally, an application container would only bind a single port on the proxy network, so this does not really increase the attack surface. Traefik is not a firewall. I would say the best way to deal with this is to check that services attached to the Traefik overlay network treat it as public and untrusted.
I forwarded this to @moritz:
He doesn’t currently see a clear, concrete attack vector. The biggest potential risk would be if customers get a full Authentik admin account, since they could use policies to spawn a shell and then gain network visibility/access within the Docker network (at least enough to discover what other services run on the VM). This would be much more serious if databases were attached to the proxy network, but after a quick check it looks like the DB container isn’t in the proxy network by default. Usually only the externally exposed app container is, so with the standard setup it doesn’t seem as critical.
I will chime in to say I was 100% convinced that this is happening, amazing work with the reproducible example @marlon to prove that easily.
The only attack vector I can come up with hinges on the fact that traffic on the `proxy` network isn't SSL-encrypted; I agree "Traefik is not a firewall", but it is where SSL terminates.
So (untested), what about:
`ettercap` (sad reality is that many of our recipes' containers run as `root` 🫠), MITMs traffic between Traefik and recipe B
I don't know nearly enough about Docker networking to know if `ettercap`'s ARP poisoning would work in that environment; if it doesn't then I'm back to "can't think of an attack vector".
@3wordchant an unprivileged swarm container would not be able to perform MITM without CAP_NET_ADMIN or host-mode networking, but I believe that's beside the point: with code execution as root within a container, network isolation should be the least of our concerns. In many situations it is not that hard to escape Docker containers, given the ability to execute arbitrary binaries as root.
I haven't seen any discussion about spoofing the source IP by tampering with `X-Forwarded-For`, which I mentioned. Is that not a concern? Am I mistaken that it works like that?
I just skimmed this briefly: https://book.hacktricks.wiki/en/linux-hardening/privilege-escalation/docker-security/docker-breakout-privilege-escalation/index.html and https://kayssel.substack.com/p/docker-escape-breaking-out-of-containers and I couldn't find a way that could be applied easily to typical coop-cloud recipes.
The main causes are
It would be worth investigating whether our recipes are well configured or whether it's possible to escape the container.
The biggest issue with tampering with X-Forwarded-For is circumventing rate limits, which makes brute-forcing passwords or DoSing a service more efficient. I wouldn't see this as critical, but it depends on how much any of our services rely on the X-Forwarded-For header for their security.
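A toy Python sketch of the rate-limit bypass, not taken from any recipe: the `NaiveLimiter` class, the limit of 3 attempts, and the addresses are all made up for illustration. It compares a limiter keyed on the header with one keyed on the TCP peer, for a client that connects directly over the proxy network and rotates the header on every request.

```python
from collections import Counter

MAX_ATTEMPTS = 3  # hypothetical per-client limit


class NaiveLimiter:
    """Counts attempts per client key and rejects once the limit is hit."""

    def __init__(self) -> None:
        self.attempts: Counter[str] = Counter()

    def allow(self, headers: dict[str, str], peer_addr: str,
              trust_header: bool = True) -> bool:
        # Key on the header if trusted, otherwise on the actual TCP peer.
        key = headers.get("X-Forwarded-For", peer_addr) if trust_header else peer_addr
        self.attempts[key] += 1
        return self.attempts[key] <= MAX_ATTEMPTS


def run(trust_header: bool) -> int:
    """Send 10 requests from one peer, each with a different spoofed header."""
    limiter = NaiveLimiter()
    return sum(
        limiter.allow({"X-Forwarded-For": f"198.51.100.{i}"}, "10.0.1.9", trust_header)
        for i in range(10)
    )


spoofed_allowed = run(trust_header=True)    # every request looks like a new client
honest_allowed = run(trust_header=False)    # all requests share one peer address
print(spoofed_allowed, honest_allowed)
```

With the header trusted, none of the ten requests is limited; keyed on the peer address, only the first three get through.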
Maybe someone can think of a few alternative attack vectors that I have missed
edit: oh I forgot the monitoring stack 😬 🙈 I think this is the most problematic!!