Commit Graph

9 Commits

Author SHA1 Message Date
358c36e930 [engine] Graceful upgrade of containerd and runc state files upon live-restore
Vendors new dependency github.com/crosbymichael/upgrade

Signed-off-by: Tibor Vass <tibor@docker.com>
2017-07-22 05:54:46 +00:00
fbc1b3070d Limit max backoff delay to 2 seconds for GRPC connection
Docker use default GRPC backoff strategy to reconnect to containerd when
connection is lost. and the delay time grows exponentially, until reaches 120s.

So Change the max delay time to 2s to avoid docker and containerd
connection failure.

Signed-off-by: Wentao Zhang <zhangwentao234@huawei.com>
(cherry picked from commit d3d8c77d195ce74f36ae6eee24578b9cac48f897)
Signed-off-by: Andrew Hsu <andrewhsu@docker.com>
2017-07-04 03:39:09 +00:00
5965f0216f libcontainerd: fix reaper goroutine position
It has observed defunct containerd processes accumulating over
time while dockerd was permanently failing to restart containerd.
Due to a bug in the runContainerdDaemon() function, dockerd does not clean up
its child process if containerd already exits very soon after the (re)start.

The reproducer and analysis below comes from docker 1.12.x but bug
still applies on latest master.

- from libcontainerd/remote_linux.go:

  329 func (r *remote) runContainerdDaemon() error {
   :
   :      // start the containerd child process
   :
  403     if err := cmd.Start(); err != nil {
  404             return err
  405     }
   :
   :      // If containerd exits very soon after (re)start, it is
possible
   :      // that containerd is already in defunct state at the time
when
   :      // dockerd gets here. The setOOMScore() function tries to
write
   :      // to /proc/PID_OF_CONTAINERD/oom_score_adj. However, this
fails
   :      // with errno EINVAL because containerd is defunct. Please see
   :      // snippets of kernel source code and further explanation
below.
   :
  407     if err := setOOMScore(cmd.Process.Pid, r.oomScore); err != nil
{
  408             utils.KillProcess(cmd.Process.Pid)
   :
   :              // Due to the error from write() we return here. As
the
   :              // goroutine that would clean up the child has not
been
   :              // started yet, containerd remains in the defunct
state
   :              // and never gets reaped.
   :
  409             return err
  410     }
   :
  417     go func() {
  418             cmd.Wait()
  419             close(r.daemonWaitCh)
  420     }() // Reap our child when needed
   :
  423 }

This is the kernel function that gets invoked when dockerd tries to
write
to /proc/PID_OF_CONTAINERD/oom_score_adj.

- from fs/proc/base.c:

 1197 static ssize_t oom_score_adj_write(struct file *file, ...
 1198                                         size_t count, loff_t
*ppos)
 1199 {
   :
 1223         task = get_proc_task(file_inode(file));
   :
   :          // The defunct containerd process does not have a virtual
   :          // address space anymore, i.e. task->mm is NULL. Thus the
   :          // following code returns errno EINVAL to dockerd.
   :
 1230         if (!task->mm) {
 1231                 err = -EINVAL;
 1232                 goto err_task_lock;
 1233         }
   :
 1253 err_task_lock:
   :
 1257         return err < 0 ? err : count;
 1258 }

The purpose of the following program is to demonstrate the behavior of
the oom_score_adj_write() function in connection with a defunct process.

$ cat defunct_test.c

\#include <unistd.h>

main()
{
    pid_t pid = fork();

    if (pid == 0)
        // child
        _exit(0);

    // parent
    pause();
}

$ make defunct_test
cc     defunct_test.c   -o defunct_test

$ ./defunct_test &
[1] 3142

$ ps -f | grep defunct_test | grep -v grep
root      3142  2956  0 13:04 pts/0    00:00:00 ./defunct_test
root      3143  3142  0 13:04 pts/0    00:00:00 [defunct_test] <defunct>

$ echo "ps 3143" | crash -s
  PID    PPID  CPU       TASK        ST  %MEM     VSZ    RSS  COMM
  3143   3142   2  ffff880035def300  ZO   0.0       0      0
defunct_test

$ echo "px ((struct task_struct *)0xffff880035def300)->mm" | crash -s
$1 = (struct mm_struct *) 0x0
                          ^^^ task->mm is NULL

$ cat /proc/3143/oom_score_adj
0

$ echo 0 > /proc/3143/oom_score_adj
-bash: echo: write error: Invalid argument"

---

This patch fixes the above issue by making sure we start the reaper
goroutine as soon as possible.

Signed-off-by: Antonio Murdaca <runcom@redhat.com>

(cherry picked from commit 27087eacbf96e6ef9d48a6d3dc89c7c1cff155b4)

Signed-off-by: Eli Uriegas <eli.uriegas@docker.com>

Signed-off-by: Eli Uriegas <eli.uriegas@docker.com>
2017-06-09 13:23:20 -07:00
55f1c34adc Merge pull request #32986 from moypray/containerd_close
fix when rpc reports "transport is closing" error, health check go routine will exit
Upstream-commit: e103125883ec3c03a8523682ed62f33d04e0ade9
Component: engine
2017-05-17 17:04:05 -07:00
76789457c2 Use containerd Status variable when checking container state
Signed-off-by: Kenfe-Mickael Laventure <mickael.laventure@gmail.com>
Upstream-commit: 0ea0b2becf119ca7950e8afcf5d440e800484b15
Component: engine
2017-05-15 10:53:51 -07:00
2849edb0ec fix when rpc reports "transport is closing" error, health check go routine will exit
Signed-off-by: Wentao Zhang <zhangwentao234@huawei.com>
Upstream-commit: 60742f9a95cb5eff549a873a53724863f1ab20e2
Component: engine
2017-05-04 00:52:10 +08:00
90c004590f Move process functions to pkg/system
Signed-off-by: Vincent Demeester <vincent@sbr.pm>
Upstream-commit: 8c1ac816657a1371597c4b2d1a758bee0114e0d7
Component: engine
2016-12-12 09:28:41 +01:00
0a4f3b6f0e fix a typo
Signed-off-by: Akshay Karle <akshay.a.karle@gmail.com>
Upstream-commit: 2d08a764211035ec93aa3a97afb2baff074103da
Component: engine
2016-11-17 16:51:37 -05:00
6fb90ed484 Add functional support for Docker sub commands on Solaris
Signed-off-by: Amit Krishnan <krish.amit@gmail.com>

Signed-off-by: Alexander Morozov <lk4d4@docker.com>
Upstream-commit: 934328d8ea650bf8a9c3c719999ce2a1f5dd5df6
Component: engine
2016-11-07 09:06:34 -08:00