Failing to deploy Traefik on server missing kernel #648

Open
opened 2024-11-21 14:46:18 +00:00 by basebuilder · 3 comments

Unclear if (or which) one of the following is true:

  1. Something has changed in installing Docker Swarm and our docs are out of date 🤷‍♀️
  2. The host, Njalla, is doing something unusual networking-wise or with its Debian image 🤔
  3. I am being incompetent 🤦‍♀️

I haven't done a new server deployment with abra in a few months, but after following the New Operators Tutorial (https://docs.coopcloud.tech/operators/tutorial/#server-setup) I am failing to deploy traefik on a fresh VPS 😢

When running abra app logs <domain> I see the following log:

2024-11-21T14:32:43.137232463Z
2024-11-21T14:32:43.137246929Z ───────────────────────────────────────
2024-11-21T14:32:43.145140164Z Linuxserver.io version: 1.26.2-r0-ls26
2024-11-21T14:32:43.145179468Z Build-date: 2024-09-16T20:44:25+00:00
2024-11-21T14:32:43.145196649Z ───────────────────────────────────────
2024-11-21T14:32:43.145212541Z
2024-11-21T14:32:43.145226526Z [ls.io-init] done.
2024-11-21T14:32:43.231930674Z 2024/11/21 14:32:43 [alert] 1#1: socketpair() failed while spawning "worker process" (13: Permission denied)
2024-11-21T14:31:27.009590489Z time="2024-11-21T14:31:27Z" level=error msg="Failed to retrieve information of the docker client and server host: Cannot connect to the Docker daemon at tcp://socket-proxy:2375. Is the docker daemon running?" providerName=docker
2024-11-21T14:31:27.009761942Z time="2024-11-21T14:31:27Z" level=error msg="Provider connection error Cannot connect to the Docker daemon at tcp://socket-proxy:2375. Is the docker daemon running?, retrying in 4.196028189s" providerName=docker
2024-11-21T14:31:31.209191013Z time="2024-11-21T14:31:31Z" level=error msg="Failed to retrieve information of the docker client and server host: Cannot connect to the Docker daemon at tcp://socket-proxy:2375. Is the docker daemon running?" providerName=docker
2024-11-21T14:31:31.209280569Z time="2024-11-21T14:31:31Z" level=error msg="Provider connection error Cannot connect to the Docker daemon at tcp://socket-proxy:2375. Is the docker daemon running?, retrying in 11.663780679s" providerName=docker
2024-11-21T14:31:42.877770751Z time="2024-11-21T14:31:42Z" level=error msg="Failed to retrieve information of the docker client and server host: Cannot connect to the Docker daemon at tcp://socket-proxy:2375. Is the docker daemon running?" providerName=docker
2024-11-21T14:31:42.877918022Z time="2024-11-21T14:31:42Z" level=error msg="Provider connection error Cannot connect to the Docker daemon at tcp://socket-proxy:2375. Is the docker daemon running?, retrying in 19.128010463s" providerName=docker

Docker is running. I even see the traefik container running 😱

basebuilder added the bug label 2024-11-21 14:46:18 +00:00
Member

I had a similar issue a while ago. I discovered that the user I logged in with on the host system lacked the necessary privileges.

For me, SSHing into the host and running this worked:

$ sudo groupadd -f docker
$ sudo usermod -aG docker $USER
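
Before changing anything, it can help to check whether the login user is already in the docker group. The following is a minimal sketch, not part of the tutorial; the helper name `user_in_group` is made up for illustration.

```shell
# Return 0 if user $1 is a member of group $2, 1 otherwise.
# (user_in_group is a hypothetical helper, not an abra or Docker command.)
user_in_group() {
  # `id -nG USER` prints the user's group names separated by spaces.
  for g in $(id -nG "$1" 2>/dev/null); do
    [ "$g" = "$2" ] && return 0
  done
  return 1
}

if user_in_group "$(id -un)" docker; then
  echo "already in the docker group"
else
  echo "not in the docker group (a new membership only applies after a fresh login)"
fi
```

If the user was only just added with usermod, an existing SSH session will not see the new group until it is reopened.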
decentral1se added awaiting-feedback and removed bug labels 2024-11-26 10:52:26 +00:00
Author

Heya thanks. That is a negative. I followed the usermod step of the tutorial 😄

These are my groups on the broken server:

root@broken:~# groups
root docker
root@broken:~# groups root
root : root docker

The groups on a server that works are the same:

root@working:~# groups
root docker
root@working:~# groups root
root : root docker

I just completely rebuilt the server with a fresh Debian 12 image and went from scratch. Same issue.

I did notice this Debian image was missing wget and apt-utils, which I didn't mention before. This is why I'm thinking there is something about the image: it's more stripped down than, say, Hetzner's images.

Continuing down the... 🕳️🐇

root@broken:~# docker run hello-world
docker: Cannot connect to the Docker daemon at tcp://localhost:2375. Is the docker daemon running?.
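
One observation about this error, offered as an assumption rather than something confirmed in this thread: on Linux the Docker CLI talks to a local Unix socket by default, so an address like tcp://localhost:2375 suggests the client was pointed elsewhere, for example through the DOCKER_HOST environment variable or a Docker context. A minimal sketch to see what the environment says (the function name is made up for illustration):

```shell
# Print the endpoint the docker CLI would take from the environment.
# Assumption: only DOCKER_HOST is inspected here; an active docker context
# (see `docker context ls`) can also change the endpoint.
docker_endpoint() {
  if [ -n "${DOCKER_HOST:-}" ]; then
    echo "DOCKER_HOST=$DOCKER_HOST"
  else
    echo "default (unix:///var/run/docker.sock)"
  fi
}

docker_endpoint
```

If DOCKER_HOST turns out to be set, running `unset DOCKER_HOST` in that shell and retrying the command would show whether the client setting was a factor.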

The service is running, but with errors:

root@broken:~# systemctl status docker
● docker.service - Docker Application Container Engine
     Loaded: loaded (/lib/systemd/system/docker.service; enabled; preset: enabled)
     Active: active (running) since Tue 2024-11-26 20:09:32 UTC; 9min ago
TriggeredBy: ● docker.socket
       Docs: https://docs.docker.com
   Main PID: 167 (dockerd)
      Tasks: 10
     Memory: 148.6M
     CGroup: /system.slice/docker.service
             └─167 /usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock

Nov 26 20:09:39 broken dockerd[167]: time="2024-11-26T20:09:39.548063114Z" level=error msg="error reading the kernel parameter net.ipv4.vs.expire_nodest_conn" error="open /proc/sys/net/ipv4/vs/expire_nodest_conn: no such file or directory"
Nov 26 20:09:39 broken dockerd[167]: time="2024-11-26T20:09:39.708354411Z" level=warning msg="Running modprobe ip_vs failed with message: ``, error: exec: \"modprobe\": executable file not found in $PATH"
Nov 26 20:09:39 broken dockerd[167]: time="2024-11-26T20:09:39.751357400Z" level=error msg="Failed to add firewall mark rule in sbox lb_fw1j (lb-trae): open /proc/sys/net/ipv4/vs/conntrack: no such file or directory"
Nov 26 20:09:48 broken dockerd[167]: time="2024-11-26T20:09:48.165488301Z" level=error msg="Failed to add firewall mark rule in sbox ingress (ingress): open /proc/sys/net/ipv4/vs/conntrack: no such file or directory"
Nov 26 20:09:48 broken dockerd[167]: time="2024-11-26T20:09:48.173308477Z" level=error msg="Failed to add firewall mark rule in sbox lb_9w7e (lb-prox): open /proc/sys/net/ipv4/vs/conntrack: no such file or directory"
Nov 26 20:09:48 broken dockerd[167]: time="2024-11-26T20:09:48.179576616Z" level=error msg="Failed to add firewall mark rule in sbox lb_fw1j (lb-trae): open /proc/sys/net/ipv4/vs/conntrack: no such file or directory"
...
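
The dockerd messages above point at two things: the /proc/sys/net/ipv4/vs directory (which only exists once the ip_vs module is loaded or built in) and the modprobe binary. A minimal sketch that checks both; the function name and its optional argument are assumptions made for illustration.

```shell
# Report whether the IPVS bits dockerd looks for are present.
# $1: root of the proc tree (defaults to /proc; a parameter only so the
#     function can be pointed at a test directory).
check_ipvs() {
  proc_root="${1:-/proc}"
  status=0
  if [ -d "$proc_root/sys/net/ipv4/vs" ]; then
    echo "ok: $proc_root/sys/net/ipv4/vs exists"
  else
    echo "missing: $proc_root/sys/net/ipv4/vs (ip_vs does not appear to be loaded)"
    status=1
  fi
  if command -v modprobe >/dev/null 2>&1; then
    echo "ok: modprobe found"
  else
    echo "missing: modprobe (on Debian it ships in the kmod package)"
    status=1
  fi
  return "$status"
}

check_ipvs || echo "swarm load balancing is likely to fail on this host"
```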

Running Docker's check-config.sh script (https://raw.githubusercontent.com/docker/docker/master/contrib/check-config.sh) returns the following:

root@broken:~# ./check-config.sh 
warning: /proc/config.gz does not exist, searching other paths for kernel config ...
error: cannot find kernel config
  try running this script again, specifying the kernel config:
    CONFIG=/path/to/kernel/.config ./check-config.sh or ./check-config.sh /path/to/kernel/.config

Exploring further, the packages kmod and linux-headers-generic were not installed and the /boot directory is empty.

root@broken:/boot# ls -l
total 0
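
The same finding can be turned into a quick check for other servers. This is a minimal sketch that assumes the Debian/Ubuntu layout (kernel images named /boot/vmlinuz-*, modules under /lib/modules/<release>); the function name is made up for illustration.

```shell
# Check whether a kernel image and a module tree for the running kernel exist.
# $1: filesystem root prefix (empty by default; a parameter only for testing).
check_kernel_files() {
  root="${1:-}"
  release="$(uname -r)"
  status=0
  # An unmatched glob stays literal, so test that the first result really exists.
  set -- "$root"/boot/vmlinuz-*
  if [ -e "$1" ]; then
    echo "ok: kernel image $1"
  else
    echo "missing: no vmlinuz-* under $root/boot"
    status=1
  fi
  if [ -d "$root/lib/modules/$release" ]; then
    echo "ok: modules for $release"
  else
    echo "missing: $root/lib/modules/$release"
    status=1
  fi
  return "$status"
}

check_kernel_files || echo "no locally installed kernel or modules found"
```

On a VPS that shares the host's kernel, `uname -r` still reports a release, but neither the image nor the module tree will be present inside the guest.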

According to the host's documentation (https://njal.la/docs/docker/), Docker is supported on Ubuntu 22.04 using the default docker.io packages, so I rebuilt my server with this OS / version (as opposed to Debian 12). I still get the same permission errors reported above.

So the issue seems to be that kernel modules are missing, or are accessed in a virtualized fashion that Docker Swarm does not appreciate... 🤷‍♀️


There are multiple issues:

  • The server needs a kernel actually installed in order to be able to load the modules required for Docker networking
  • The traefik recipe is using endpoint-mode vip for the socket proxy, and ingress routing-mesh port publishing for traefik, which makes very little sense

I'll send a PR
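
To see which endpoint mode and port publishing mode a deployed service actually uses, the service spec can be inspected on a swarm manager. This is a minimal sketch; the function name and the service names in the usage comment are hypothetical placeholders, not the names the recipe produces.

```shell
# Print the endpoint spec (mode and published ports) of a swarm service as JSON.
# $1: service name, as listed by `docker service ls`.
show_endpoint_spec() {
  docker service inspect --format '{{json .Spec.EndpointSpec}}' "$1"
}

# Usage on a swarm manager (service names are placeholders):
#   show_endpoint_spec traefik_example_com_app
#   show_endpoint_spec traefik_example_com_socket-proxy
```

In the output, "Mode" is either vip or dnsrr, and each published port carries a "PublishMode" of ingress or host.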

basebuilder changed title from Failing to deploy Traefik on new server to Failing to deploy Traefik on server missing kernel 2024-11-27 11:47:05 +00:00