It has observed defunct containerd processes accumulating over
time while dockerd was permanently failing to restart containerd.
Due to a bug in the runContainerdDaemon() function, dockerd does not clean up
its child process if containerd already exits very soon after the (re)start.
The reproducer and analysis below comes from docker 1.12.x but bug
still applies on latest master.
- from libcontainerd/remote_linux.go:
329 func (r *remote) runContainerdDaemon() error {
:
: // start the containerd child process
:
403 if err := cmd.Start(); err != nil {
404 return err
405 }
:
: // If containerd exits very soon after (re)start, it is
possible
: // that containerd is already in defunct state at the time
when
: // dockerd gets here. The setOOMScore() function tries to
write
: // to /proc/PID_OF_CONTAINERD/oom_score_adj. However, this
fails
: // with errno EINVAL because containerd is defunct. Please see
: // snippets of kernel source code and further explanation
below.
:
407 if err := setOOMScore(cmd.Process.Pid, r.oomScore); err != nil
{
408 utils.KillProcess(cmd.Process.Pid)
:
: // Due to the error from write() we return here. As
the
: // goroutine that would clean up the child has not
been
: // started yet, containerd remains in the defunct
state
: // and never gets reaped.
:
409 return err
410 }
:
417 go func() {
418 cmd.Wait()
419 close(r.daemonWaitCh)
420 }() // Reap our child when needed
:
423 }
This is the kernel function that gets invoked when dockerd tries to
write
to /proc/PID_OF_CONTAINERD/oom_score_adj.
- from fs/proc/base.c:
1197 static ssize_t oom_score_adj_write(struct file *file, ...
1198 size_t count, loff_t
*ppos)
1199 {
:
1223 task = get_proc_task(file_inode(file));
:
: // The defunct containerd process does not have a virtual
: // address space anymore, i.e. task->mm is NULL. Thus the
: // following code returns errno EINVAL to dockerd.
:
1230 if (!task->mm) {
1231 err = -EINVAL;
1232 goto err_task_lock;
1233 }
:
1253 err_task_lock:
:
1257 return err < 0 ? err : count;
1258 }
The purpose of the following program is to demonstrate the behavior of
the oom_score_adj_write() function in connection with a defunct process.
$ cat defunct_test.c
\#include <unistd.h>
main()
{
pid_t pid = fork();
if (pid == 0)
// child
_exit(0);
// parent
pause();
}
$ make defunct_test
cc defunct_test.c -o defunct_test
$ ./defunct_test &
[1] 3142
$ ps -f | grep defunct_test | grep -v grep
root 3142 2956 0 13:04 pts/0 00:00:00 ./defunct_test
root 3143 3142 0 13:04 pts/0 00:00:00 [defunct_test] <defunct>
$ echo "ps 3143" | crash -s
PID PPID CPU TASK ST %MEM VSZ RSS COMM
3143 3142 2 ffff880035def300 ZO 0.0 0 0
defunct_test
$ echo "px ((struct task_struct *)0xffff880035def300)->mm" | crash -s
$1 = (struct mm_struct *) 0x0
^^^ task->mm is NULL
$ cat /proc/3143/oom_score_adj
0
$ echo 0 > /proc/3143/oom_score_adj
-bash: echo: write error: Invalid argument"
---
This patch fixes the above issue by making sure we start the reaper
goroutine as soon as possible.
Signed-off-by: Antonio Murdaca <runcom@redhat.com>
(cherry picked from commit 27087eacbf96e6ef9d48a6d3dc89c7c1cff155b4)
Signed-off-by: Eli Uriegas <eli.uriegas@docker.com>
Signed-off-by: Eli Uriegas <eli.uriegas@docker.com>
Signed-off-by: Eli Uriegas <eli.uriegas@docker.com>
(cherry picked from commit 097516fc76a2c9acf798dc523326904520ff1fc7)
Signed-off-by: Andrew Hsu <andrewhsu@docker.com>
Signed-off-by: Eli Uriegas <eli.uriegas@docker.com>
(cherry picked from commit a1b7f6f3407546ff41ed2e0d16d7e1b2a8ac0ef4)
Conflicts:
components/packaging/static/Makefile
Signed-off-by: Andrew Hsu <andrewhsu@docker.com>
So we can use it at will
Signed-off-by: Eli Uriegas <eli.uriegas@docker.com>
(cherry picked from commit 533a843393bd7c3674074ec9af73c8e666fc7484)
Conflicts:
components/packaging/static/Makefile
Signed-off-by: Andrew Hsu <andrewhsu@docker.com>
Signed-off-by: Andrew Hsu <andrewhsu@docker.com>
(cherry picked from commit f6d30c3631abee04aa3586cba36960901f705fe2)
Signed-off-by: Andrew Hsu <andrewhsu@docker.com>
Signed-off-by: Andrew Hsu <andrewhsu@docker.com>
(cherry picked from commit ade95a2b6f8a16aa7e6c387dc9c9dd48bba9a672)
Signed-off-by: Andrew Hsu <andrewhsu@docker.com>
Signed-off-by: Andrew Hsu <andrewhsu@docker.com>
(cherry picked from commit e14fca86f8230e10d79a7bfae2873a98a16bb44d)
Signed-off-by: Andrew Hsu <andrewhsu@docker.com>
Because of cherry-pick from commit 17ec46a24316f59c808c112e3ca46d7c442a785a
into components/cli/vendor/github.com/docker/docker
Signed-off-by: Tibor Vass <tibor@docker.com>
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
(cherry picked from commit 17ec46a24316f59c808c112e3ca46d7c442a785a)
Manually update pkg/term vendoring to match changes in moby
Signed-off-by: Tibor Vass <tibor@docker.com>
Add armhf dockerfiles for deb building
Signed-off-by: Eli Uriegas <seemethere101@gmail.com>
(cherry picked from commit f0c8cea1b79b049743cd1503f7ac4a34c265f476)
Signed-off-by: Andrew Hsu <andrewhsu@docker.com>
the source was missing from the second dispatch
Signed-off-by: Daniel Nephin <dnephin@docker.com>
(cherry picked from commit 3f2604157790408acf5ad05c74cebe105f2b6979)
For stack compose files, use filepath.IsAbs instead of path.IsAbs, for
bind-mounted service volumes, because filepath.IsAbs handles Windows
paths, while path.IsAbs does not.
Signed-off-by: John Stephens <johnstep@docker.com>
(cherry picked from commit 9043d39dea)