Commit Graph

229 Commits

Author SHA1 Message Date
bfc36e79fd Merge pull request #33764 from keloyang/fix-queue-mem-leak
Fix mem leak in libcontainerd/queue/append
Upstream-commit: f88626b270827da7e8a2d668dc2e16bbe1ac53f0
Component: engine
2017-06-22 10:57:07 -07:00
e48afa3455 Merge pull request #33774 from Microsoft/jjh/lcow-networking
LCOW: owner and network endpoints
Upstream-commit: c85f92de15d8f7162b7579143d2be74d8453996d
Component: engine
2017-06-22 16:40:50 +02:00
7ccd764533 fix mem leak in libcontainerd/queue/append
Signed-off-by: yangshukui <yangshukui@huawei.com>
Upstream-commit: 5425a5ab842894d7eb7e26bb706a04be40b67075
Component: engine
2017-06-22 16:47:47 +08:00
926eab07b2 Merge pull request #33772 from cpuguy83/optimizations
Don't json marshal then immediately unmarshal
Upstream-commit: 4fc2710dc7a8fc87849e94770fcf33eececae2fc
Component: engine
2017-06-22 01:19:08 -07:00
40224fcf4b LCOW: owner and network endpoints
Signed-off-by: John Howard <jhoward@microsoft.com>
Upstream-commit: e99a633720563d8d8d2669791cee542c0f989291
Component: engine
2017-06-21 22:35:30 -07:00
f9852d6a7a Don't json marshal then immediately unmarshal
During container startup we end up spending a fair amount of time
encoding/decoding json.
This cuts out some of that since we already have the decoded object in
memory.

The old flow looked like:

1. Start container request
2. Create file
3. Encode container spec to json
4. Write to file
5. Close file
6. Open file
7. Read file
8. Decode container spec
9. Close file
10. Send to containerd.

The new flow cuts out steps 6-9 completely, and with it a lot of time
spent in reflect and file IO.

Signed-off-by: Brian Goff <cpuguy83@gmail.com>
Upstream-commit: 8d588d9c5b5cd019e09bcfc4f790eae79405c7b1
Component: engine
2017-06-21 15:18:01 -07:00
01b491fce5 LCOW: Create layer folders with correct ACL
Signed-off-by: John Howard <jhoward@microsoft.com>
Upstream-commit: ed10ac6ee93cf5c389a735c0c97b08d5d5dff3a9
Component: engine
2017-06-20 19:50:12 -07:00
94ce604e3d LCOW: OCI Spec and Environment for container start
Signed-off-by: John Howard <jhoward@microsoft.com>
Upstream-commit: f1545882264e743fa6d6859ee8687407e28fc35c
Component: engine
2017-06-20 19:50:11 -07:00
b35fc7f268 Remove MkdirAllNewAs and update tests.
Signed-off-by: Daniel Nephin <dnephin@docker.com>
Upstream-commit: 6150ebf7b483197f4b8755df60e750b6410e95ca
Component: engine
2017-06-07 11:44:34 -04:00
5e2107c6ca Merge pull request #33496 from Microsoft/jjh/removedummy
Windows: Correct comment
Upstream-commit: 56da020e6b05c5cb15867049dccc79b51a6e0abb
Component: engine
2017-06-03 01:07:26 +02:00
3386def242 Windows: Correct comment
Signed-off-by: John Howard <jhoward@microsoft.com>
Upstream-commit: 6e33c4158cc895d9b7fb274c13ecac0612c88aaa
Component: engine
2017-06-02 11:51:30 -07:00
6d3cd41f83 Limit max backoff delay to 2 seconds for GRPC connection
Docker uses the default GRPC backoff strategy to reconnect to containerd when
the connection is lost, and the delay grows exponentially until it reaches 120s.

So change the max delay to 2s to avoid connection failures between docker and
containerd.

Signed-off-by: Wentao Zhang <zhangwentao234@huawei.com>
Upstream-commit: d3d8c77d195ce74f36ae6eee24578b9cac48f897
Component: engine
2017-06-02 18:19:09 +08:00
5f38794750 Merge pull request #32590 from moypray/containerd
Fix when containerd restarted, event handler may exit
Upstream-commit: d7c125791a3fa6cf233a8f9d017c28ae65d5b535
Component: engine
2017-06-01 08:16:24 -04:00
13833c9420 libcontainerd: fix reaper goroutine position
Defunct containerd processes have been observed accumulating over
time while dockerd was permanently failing to restart containerd.
Due to a bug in the runContainerdDaemon() function, dockerd does not clean up
its child process if containerd exits very soon after the (re)start.

The reproducer and analysis below come from docker 1.12.x, but the bug
still applies on latest master.

- from libcontainerd/remote_linux.go:

  329 func (r *remote) runContainerdDaemon() error {
   :
   :      // start the containerd child process
   :
  403     if err := cmd.Start(); err != nil {
  404             return err
  405     }
   :
   :      // If containerd exits very soon after (re)start, it is possible
   :      // that containerd is already in defunct state at the time when
   :      // dockerd gets here. The setOOMScore() function tries to write
   :      // to /proc/PID_OF_CONTAINERD/oom_score_adj. However, this fails
   :      // with errno EINVAL because containerd is defunct. Please see
   :      // snippets of kernel source code and further explanation below.
   :
  407     if err := setOOMScore(cmd.Process.Pid, r.oomScore); err != nil {
  408             utils.KillProcess(cmd.Process.Pid)
   :
   :              // Due to the error from write() we return here. As the
   :              // goroutine that would clean up the child has not been
   :              // started yet, containerd remains in the defunct state
   :              // and never gets reaped.
   :
  409             return err
  410     }
   :
  417     go func() {
  418             cmd.Wait()
  419             close(r.daemonWaitCh)
  420     }() // Reap our child when needed
   :
  423 }

This is the kernel function that gets invoked when dockerd tries to write
to /proc/PID_OF_CONTAINERD/oom_score_adj.

- from fs/proc/base.c:

 1197 static ssize_t oom_score_adj_write(struct file *file, ...
 1198                                         size_t count, loff_t *ppos)
 1199 {
   :
 1223         task = get_proc_task(file_inode(file));
   :
   :          // The defunct containerd process does not have a virtual
   :          // address space anymore, i.e. task->mm is NULL. Thus the
   :          // following code returns errno EINVAL to dockerd.
   :
 1230         if (!task->mm) {
 1231                 err = -EINVAL;
 1232                 goto err_task_lock;
 1233         }
   :
 1253 err_task_lock:
   :
 1257         return err < 0 ? err : count;
 1258 }

The purpose of the following program is to demonstrate the behavior of
the oom_score_adj_write() function in connection with a defunct process.

$ cat defunct_test.c

#include <unistd.h>

int main(void)
{
    pid_t pid = fork();

    if (pid == 0)
        // child: exit immediately
        _exit(0);

    // parent: never wait()s, so the child stays defunct
    pause();
    return 0;
}

$ make defunct_test
cc     defunct_test.c   -o defunct_test

$ ./defunct_test &
[1] 3142

$ ps -f | grep defunct_test | grep -v grep
root      3142  2956  0 13:04 pts/0    00:00:00 ./defunct_test
root      3143  3142  0 13:04 pts/0    00:00:00 [defunct_test] <defunct>

$ echo "ps 3143" | crash -s
  PID    PPID  CPU       TASK        ST  %MEM     VSZ    RSS  COMM
  3143   3142   2  ffff880035def300  ZO   0.0       0      0  defunct_test

$ echo "px ((struct task_struct *)0xffff880035def300)->mm" | crash -s
$1 = (struct mm_struct *) 0x0
                          ^^^ task->mm is NULL

$ cat /proc/3143/oom_score_adj
0

$ echo 0 > /proc/3143/oom_score_adj
-bash: echo: write error: Invalid argument

---

This patch fixes the above issue by making sure we start the reaper
goroutine as soon as possible.

Signed-off-by: Antonio Murdaca <runcom@redhat.com>
Upstream-commit: 27087eacbf96e6ef9d48a6d3dc89c7c1cff155b4
Component: engine
2017-05-27 15:13:59 +02:00
292fba1b59 Fix when containerd restarted, event handler may exit
Description:
Kill docker-containerd repeatedly, then use kill -SIGUSR1 <dockerpid>
to dump the docker call stacks. They show that an event
handler, startEventsMonitor or handleEventStream, has exited.

This only happens when the system is busy: containerd needs more time to
start up, and the monitor goroutine may exit.

Signed-off-by: Wentao Zhang <zhangwentao234@huawei.com>
Upstream-commit: 02ce73f62e73e78a4ec29b29fb2ba552221fe885
Component: engine
2017-05-25 17:32:05 +08:00
be30b971d7 Windows: Remove unused SandboxPath
Signed-off-by: John Howard <jhoward@microsoft.com>
Upstream-commit: 2f038c25868727310992104b7b267fed6c7dad39
Component: engine
2017-05-24 13:44:35 -07:00
f1ce9152c0 Use CpuMaximum instead of CpuPercent for more precision
Signed-off-by: Darren Stahl <darst@microsoft.com>
Upstream-commit: 425973cbb87aef6a32b225a57f5ef2d78d5749d5
Component: engine
2017-05-19 12:33:14 -07:00
55f1c34adc Merge pull request #32986 from moypray/containerd_close
fix when rpc reports "transport is closing" error, health check go routine will exit
Upstream-commit: e103125883ec3c03a8523682ed62f33d04e0ade9
Component: engine
2017-05-17 17:04:05 -07:00
76789457c2 Use containerd Status variable when checking container state
Signed-off-by: Kenfe-Mickael Laventure <mickael.laventure@gmail.com>
Upstream-commit: 0ea0b2becf119ca7950e8afcf5d440e800484b15
Component: engine
2017-05-15 10:53:51 -07:00
b04f3cd16c fix inconsistent state string with containerd
should be `stopped` according to containerd:
  https://github.com/containerd/containerd/blob/v0.2.x/runtime/runtime.go#L104

Signed-off-by: Deng Guangxing <dengguangxing@huawei.com>
Upstream-commit: 9771780a01e73200f96c84aa83689b7f34092772
Component: engine
2017-05-15 10:53:51 -07:00
ce61a3d4f2 Update moby to runc and oci 1.0 runtime final rc
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
Upstream-commit: 005506d36c1c9308a05592d7596f3d484359c426
Component: engine
2017-05-05 13:45:45 -07:00
2849edb0ec fix when rpc reports "transport is closing" error, health check go routine will exit
Signed-off-by: Wentao Zhang <zhangwentao234@huawei.com>
Upstream-commit: 60742f9a95cb5eff549a873a53724863f1ab20e2
Component: engine
2017-05-04 00:52:10 +08:00
8cd4fe0243 Wait to delete container when restoring on Windows
Signed-off-by: Darren Stahl <darst@microsoft.com>
Upstream-commit: dbdc8bbee4a26093e6342f93bb63a09fbe850f58
Component: engine
2017-03-31 10:59:00 -07:00
9a572f349e Merge pull request #31629 from darrenstahlmsft/ShutdownLock
Windows: Stop holding client container lock during shutdown
Upstream-commit: caf8d884aadd4cc56c1ffb524ce9b6d51ad63d88
Component: engine
2017-03-23 18:16:56 -07:00
dc089ee246 Merge pull request #31503 from Microsoft/jjh/cleanuphcsonrestore
Windows: Cleanup HCS on restore
Upstream-commit: 2fca6526d6c9dfdc714f3a1b8ee880fbbab7580e
Component: engine
2017-03-13 13:43:35 +01:00
b027885182 Handle paused container when restoring without live-restore set
Signed-off-by: Kenfe-Mickael Laventure <mickael.laventure@gmail.com>
Upstream-commit: c458d3bb98bb85359ffaf3b9e54d837ee001829a
Component: engine
2017-03-09 13:37:08 -08:00
cfda5ee425 Stop holding client container lock during shutdown
Signed-off-by: Darren Stahl <darst@microsoft.com>
Upstream-commit: b819ffdb2079bf2e805f8ca2ed84f184fe601269
Component: engine
2017-03-07 16:24:34 -08:00
b03c0791bf Windows: Cleanup HCS on restore
Signed-off-by: John Howard <jhoward@microsoft.com>

This ensures that any compute processes in HCS are cleaned up
during daemon restore. Note Windows cannot (currently) reconnect
to containers on restore.
Upstream-commit: f59593cbd1c177fe2d5a1b1f00efe9987d25a526
Component: engine
2017-03-02 15:13:12 -08:00
709434a772 (*) Support --net:container:<containername/id> for windows
(*) (vdemeester) Removed duplicate code across Windows and Unix wrt Net:Containers
(*) Return unsupported error for network sharing for hyperv isolation containers

Signed-off-by: Madhan Raj Mookkandy <MadhanRaj.Mookkandy@microsoft.com>
Upstream-commit: 040afcce8f3f54c64d328929c5115128f623deb1
Component: engine
2017-02-28 20:03:43 -08:00
f8a42143d3 Windows: Remove unused commandLine
Signed-off-by: John Howard <jhoward@microsoft.com>
Upstream-commit: b7106a92f26271e0d2c6623446ce4a8bc987c445
Component: engine
2017-02-02 11:16:11 -08:00
b304d83344 Merge pull request #30117 from msabansal/natfix
Added support for dns-search and fixes #30102
Upstream-commit: c0a1d2e0d88ff3cae6802dfbd128c7739e8c2bcc
Component: engine
2017-01-31 11:05:29 +01:00
047ed027d7 Windows: Remove GetPidsForContainer
Signed-off-by: John Howard <jhoward@microsoft.com>
Upstream-commit: f47e417466532212534a18d510a9d76e2e8f9f61
Component: engine
2017-01-18 12:28:52 -08:00
3fdf20b049 Added support for dns-search and fixes #30102
Signed-off-by: msabansal <sabansal@microsoft.com>
Upstream-commit: e6962481a032c7278bc17c8fdcc42831c6d0b88f
Component: engine
2017-01-13 12:01:10 -08:00
2d095c22c0 Remove timeout on fifos opening
Instead of a timeout the context is cancelled on error to ensure
proper cleanup of the associated fifos' goroutines.

Signed-off-by: Kenfe-Mickael Laventure <mickael.laventure@gmail.com>
Upstream-commit: c178700a0420118fe7f632af4da1bc43abd3a9bf
Component: engine
2017-01-13 11:46:48 -08:00
8928076872 fix typo in libcontainerd/client.go
Signed-off-by: Aaron.L.Xu <likexu@harmonycloud.cn>
Upstream-commit: 39a24019e3a7b2f423090dff4793698001620737
Component: engine
2017-01-11 23:10:02 +08:00
433a6ae35a Merge pull request #29314 from vdemeester/no-more-utils
Remove the utils package
Upstream-commit: b9ee31ae027bbd62477fea3f58023c90f051db00
Component: engine
2016-12-22 15:21:05 +01:00
c493517769 fix some typos in libcontainer\types_windows.go
Signed-off-by: lixiaobing10051267 <li.xiaobing1@zte.com.cn>
Upstream-commit: f385846d6ff48c17b7fc8173b8370df17c76ad40
Component: engine
2016-12-14 16:33:03 +08:00
90c004590f Move process functions to pkg/system
Signed-off-by: Vincent Demeester <vincent@sbr.pm>
Upstream-commit: 8c1ac816657a1371597c4b2d1a758bee0114e0d7
Component: engine
2016-12-12 09:28:41 +01:00
dc6f3f84fc Fix docker restart panic on machine ungracefully shutdown
An ungraceful machine shutdown leaves many containers with a
Running=true state.

```
$ cat config.v2.json | jq .

    "Running": true,
    "Paused": false,
    "Restarting": false,

```

The next docker start then fails with a panic.

```

time="2016-12-01T01:54:45.086446715-05:00" level=warning msg="libcontainerd: client is out of sync, restore was called on a fully synced container (49f41ad5ca0be860622d9190673b5816d012022fb2c1794560ec4851e7cfec6a)."
time="2016-12-01T01:54:45.087046004-05:00" level=warning msg="libcontainerd: failed to retrieve container 49f41ad5ca0be860622d9190673b5816d012022fb2c1794560ec4851e7cfec6a state: rpc error: code = 2 desc = containerd: container not found"
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x8 pc=0x5db7f3]

goroutine 57 [running]:
panic(0x16a8e60, 0xc420010130)
        /usr/local/go/src/runtime/panic.go:500 +0x1a1
github.com/docker/docker/libcontainerd.(*client).Restore(0xc4202e1a40, 0xc420415000, 0x40, 0xc42015a0b0, 0x0, 0x0, 0x0, 0x0, 0x0)
        /go/src/github.com/docker/docker/libcontainerd/client_linux.go:457 +0x553
github.com/docker/docker/daemon.(*Daemon).restore.func1(0xc4201c46f0, 0xc4202581e0, 0xc4201c46e8, 0xc42047bfb0, 0xc42047bf80, 0xc42047bf50, 0xc42024ba10, 0xc420512c00)
        /go/src/github.com/docker/docker/daemon/daemon.go:205 +0x198
created by github.com/docker/docker/daemon.(*Daemon).restore
        /go/src/github.com/docker/docker/daemon/daemon.go:260 +0x7bb

```

Signed-off-by: Lei Jitang <leijitang@huawei.com>
Upstream-commit: 267422e4d08244e701ce049ab55ca0ad9879ba78
Component: engine
2016-12-01 02:25:24 -05:00
eb1bfafb6a Fix race with containerd events stream on restore
Signed-off-by: Kenfe-Mickael Laventure <mickael.laventure@gmail.com>
Upstream-commit: 9fff9bb761b3ceb1ef09ab2d6dbdbaa4463a063c
Component: engine
2016-11-30 10:15:39 -08:00
c081d78f0c Ignore "failed to close stdin" if container or process not found
Signed-off-by: Lei Jitang <leijitang@huawei.com>
Upstream-commit: 9aedaf5b3acc0dd0df4a4b67c46cf922d42f62a3
Component: engine
2016-11-29 20:41:39 -05:00
c0d4c1f04f Fix race on sending stdin close event
Signed-off-by: Tonis Tiigi <tonistiigi@gmail.com>
Upstream-commit: 4e262f63876018ca78d54a98eee3f533352b0ac9
Component: engine
2016-11-21 17:43:01 -08:00
bee0ae1311 Shutdown instead of terminate process on Windows
Signed-off-by: Darren Stahl <darst@microsoft.com>
Upstream-commit: 8b503242734d66a223bf8d694f26901b1d02106d
Component: engine
2016-11-18 12:05:08 -08:00
0a4f3b6f0e fix a typo
Signed-off-by: Akshay Karle <akshay.a.karle@gmail.com>
Upstream-commit: 2d08a764211035ec93aa3a97afb2baff074103da
Component: engine
2016-11-17 16:51:37 -05:00
4d7a48f6ec Merge pull request #27955 from mlaventure/runc-docker-info
Add external binaries version to docker info
Upstream-commit: 0427afa409f1a2034537b4659bf7a3a1454fa617
Component: engine
2016-11-10 21:27:14 -08:00
828e14b6f7 Adding more strict resource checks on Windows
Signed-off-by: Darren Stahl <darst@microsoft.com>
Upstream-commit: 0ed00b36ff7d8651e4d11a41507f81441c081388
Component: engine
2016-11-09 16:29:54 -08:00
dc6854b4b3 Merge pull request #28184 from Microsoft/jjh/user
Windows: Plumb through user
Upstream-commit: f67d4b897adc9048269d1b91435b6dcbf69e36d1
Component: engine
2016-11-09 11:32:42 -08:00
515ebb7b5c Add expected 3rd party binaries commit ids to info
Signed-off-by: Kenfe-Mickael Laventure <mickael.laventure@gmail.com>
Upstream-commit: 2790ac68b32b399c872de88388bdccc359ed7a88
Component: engine
2016-11-09 07:42:44 -08:00
7242352354 Stop returning errors that should be ignored while closing stdin
Signed-off-by: Darren Stahl <darst@microsoft.com>
Upstream-commit: ae35c0f70e96de011ad376c8fffba8e8a52ec21f
Component: engine
2016-11-08 18:25:43 -08:00
234aecaaa6 Windows: Plumb through user
Signed-off-by: John Howard <jhoward@microsoft.com>
Upstream-commit: 5207ff7202327bd06fa7e8df4c58d6a944899b60
Component: engine
2016-11-08 17:41:56 -08:00