It has observed defunct containerd processes accumulating over
time while dockerd was permanently failing to restart containerd.
Due to a bug in the runContainerdDaemon() function, dockerd does not clean up
its child process if containerd already exits very soon after the (re)start.
The reproducer and analysis below comes from docker 1.12.x but bug
still applies on latest master.
- from libcontainerd/remote_linux.go:
329 func (r *remote) runContainerdDaemon() error {
:
: // start the containerd child process
:
403 if err := cmd.Start(); err != nil {
404 return err
405 }
:
: // If containerd exits very soon after (re)start, it is
possible
: // that containerd is already in defunct state at the time
when
: // dockerd gets here. The setOOMScore() function tries to
write
: // to /proc/PID_OF_CONTAINERD/oom_score_adj. However, this
fails
: // with errno EINVAL because containerd is defunct. Please see
: // snippets of kernel source code and further explanation
below.
:
407 if err := setOOMScore(cmd.Process.Pid, r.oomScore); err != nil
{
408 utils.KillProcess(cmd.Process.Pid)
:
: // Due to the error from write() we return here. As
the
: // goroutine that would clean up the child has not
been
: // started yet, containerd remains in the defunct
state
: // and never gets reaped.
:
409 return err
410 }
:
417 go func() {
418 cmd.Wait()
419 close(r.daemonWaitCh)
420 }() // Reap our child when needed
:
423 }
This is the kernel function that gets invoked when dockerd tries to
write
to /proc/PID_OF_CONTAINERD/oom_score_adj.
- from fs/proc/base.c:
1197 static ssize_t oom_score_adj_write(struct file *file, ...
1198 size_t count, loff_t
*ppos)
1199 {
:
1223 task = get_proc_task(file_inode(file));
:
: // The defunct containerd process does not have a virtual
: // address space anymore, i.e. task->mm is NULL. Thus the
: // following code returns errno EINVAL to dockerd.
:
1230 if (!task->mm) {
1231 err = -EINVAL;
1232 goto err_task_lock;
1233 }
:
1253 err_task_lock:
:
1257 return err < 0 ? err : count;
1258 }
The purpose of the following program is to demonstrate the behavior of
the oom_score_adj_write() function in connection with a defunct process.
$ cat defunct_test.c
\#include <unistd.h>
main()
{
pid_t pid = fork();
if (pid == 0)
// child
_exit(0);
// parent
pause();
}
$ make defunct_test
cc defunct_test.c -o defunct_test
$ ./defunct_test &
[1] 3142
$ ps -f | grep defunct_test | grep -v grep
root 3142 2956 0 13:04 pts/0 00:00:00 ./defunct_test
root 3143 3142 0 13:04 pts/0 00:00:00 [defunct_test] <defunct>
$ echo "ps 3143" | crash -s
PID PPID CPU TASK ST %MEM VSZ RSS COMM
3143 3142 2 ffff880035def300 ZO 0.0 0 0
defunct_test
$ echo "px ((struct task_struct *)0xffff880035def300)->mm" | crash -s
$1 = (struct mm_struct *) 0x0
^^^ task->mm is NULL
$ cat /proc/3143/oom_score_adj
0
$ echo 0 > /proc/3143/oom_score_adj
-bash: echo: write error: Invalid argument"
---
This patch fixes the above issue by making sure we start the reaper
goroutine as soon as possible.
Signed-off-by: Antonio Murdaca <runcom@redhat.com>
Upstream-commit: 27087eacbf96e6ef9d48a6d3dc89c7c1cff155b4
Component: engine
fix when rpc reports "transport is closing" error, health check go routine will exit
Upstream-commit: e103125883ec3c03a8523682ed62f33d04e0ade9
Component: engine
Signed-off-by: John Howard <jhoward@microsoft.com>
This ensures that any compute processes in HCS are cleanedup
during daemon restore. Note Windows cannot (currently) reconnect
to containers on restore.
Upstream-commit: f59593cbd1c177fe2d5a1b1f00efe9987d25a526
Component: engine
Instead of a timeout the context is cancelled on error to ensure
proper cleanup of the associated fifos' goroutines.
Signed-off-by: Kenfe-Mickael Laventure <mickael.laventure@gmail.com>
Upstream-commit: c178700a0420118fe7f632af4da1bc43abd3a9bf
Component: engine
Keeping the current behavior for exec, i.e., inheriting
variables from main process. New variables will be added
to current ones. If there's already a variable with that
name it will be overwritten.
Example of usage: docker exec -it -e TERM=vt100 <container> top
Closes#24355.
Signed-off-by: Jonh Wendell <jonh.wendell@redhat.com>
Upstream-commit: e03bf1221ee2c863f25a57af4d415e2d8ff4f26c
Component: engine
The code that handles waiting for the servicing container to complete correctly grabs the exit code and logs a failure, but doesn't return that failure to the caller, mistakenly causing servicing operations to look successful when they really failed during processing.
Signed-off-by: Stefan J. Wernli <swernli@microsoft.com>
Upstream-commit: f65647463eb8e6ba0675d295aed492d3617306a2
Component: engine