This is only for the case when dockerd has had to re-mount the daemon
root as shared.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
(cherry picked from commit 487c6c7e73dbb7871e80d75f176dd2a3539a2947)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
By default, if a user requests a bind mount it uses private propagation.
When the source path is a path within the daemon root this, along with
some other propagation values that the user can use, causes issues when
the daemon tries to remove a mountpoint because a container will then
have a private reference to that mount which prevents removal.
Unmouting with MNT_DETATCH can help this scenario on newer kernels, but
ultimately this is just covering up the problem and doesn't actually
free up the underlying resources until all references are destroyed.
This change does essentially 2 things:
1. Change the default propagation when unspecified to `rslave` when the
source path is within the daemon root path or a parent of the daemon
root (because everything is using rbinds).
2. Creates a validation error on create when the user tries to specify
an unacceptable propagation mode for these paths...
basically the only two acceptable modes are `rslave` and `rshared`.
In cases where we have used the new default propagation but the
underlying filesystem is not setup to handle it (fs must hvae at least
rshared propagation) instead of erroring out like we normally would,
this falls back to the old default mode of `private`, which preserves
backwards compatibility.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
(cherry picked from commit 589a0afa8cbe39b6512662fd1705873e2d236dd0)
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
This was added in #36047 just as a way to make sure the tree is fully
unmounted on shutdown.
For ZFS this could be a breaking change since there was no unmount before.
Someone could have setup the zfs tree themselves. It would be better, if
we really do want the cleanup to actually the unpacked layers checking
for mounts rather than a blind recursive unmount of the root.
BTRFS does not use mounts and does not need to unmount anyway.
These was only an unmount to begin with because for some reason the
btrfs tree was being moutned with `private` propagation.
For the other graphdrivers that still have a recursive unmount here...
these were already being unmounted and performing the recursive unmount
shouldn't break anything. If anyone had anything mounted at the
graphdriver location it would have been unmounted on shutdown anyway.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
(cherry picked from commit 2fe4f888bee52b1f256d6fa5e20f9b061d30221c)
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
This fix tries to address the issue raised in 36083 where
`network inspect` does not show Created time if the network is
created in swarm scope.
The issue was that Created was not converted from swarm api.
This fix addresses the issue.
An unit test has been added.
This fix fixes 36083.
Signed-off-by: Yong Tang <yong.tang.github@outlook.com>
(cherry picked from commit 090c439fb8a863731cc80fcb9932ce5958d8166d)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Signed-off-by: Daniel Nephin <dnephin@docker.com>
(cherry picked from commit 7d296522f68a80d540b20753ea0648171ee12825)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Signed-off-by: Daniel Nephin <dnephin@docker.com>
(cherry picked from commit 3c4537d5b33d951237ea5e4cc123953eda7a37e7)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
pickup changes which use t.Helper()
Signed-off-by: Daniel Nephin <dnephin@docker.com>
(cherry picked from commit 4ac4b690f78a645cc50030b81077fd5319b53501)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
This fixes a vulnerability in `go get` (CVE-2018-6574, http://golang.org/issue/23672),
but shouldn't really affect our code, but it's good to keep in sync.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
(cherry picked from commit 6263b1254b179af81ff4ef97563fe2e1a053993a)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
This fixes a vulnerability in `go get` (CVE-2018-6574, http://golang.org/issue/23672),
but shouldn't really affect our code, but it's good to keep in sync.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
(cherry picked from commit b32599761f)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
This workaround for golang/go#15286 was added for Nano server TP5 in
fa82c0aa10cfac8c6d5e2446876dc79b2b0c1bf9, and should no longer be
needed
Due to a security fix in Go 1.9.4/1.8.7, loading the .dll is no longer
allowed, and produces an error:
.\docker_windows.go:9:3: //go:cgo_import_dynamic main.dummy CommandLineToArgvW%2 "shell32.dll" only allowed in cgo-generated code
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
(cherry picked from commit 250193387c98a4ad69a6591d5fe5a39c1409ffba)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
This fixes a vulnerability in `go get` (CVE-2018-6574, http://golang.org/issue/23672),
but shouldn't really affect our code, but it's good to keep in sync.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
(cherry picked from commit caeab268430a033fedd27c53be16758ac1a0f71e)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Before this patch, when containerd is restarted (due to a crash, or
kill, whatever), the daemon would keep trying to process the event
stream against the old socket handles. This would lead to a CPU spin due
to the error handling when the client can't connect to containerd.
This change makes sure the containerd remote client is updated for all
registered libcontainerd clients.
This is not neccessarily the ideal fix which would likely require a
major refactor, but at least gets things to a working state with a
minimal patch.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
(cherry picked from commit 400126f8698233099259da967378c0a76bc3ea31)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
(cherry picked from commit 5c3418e38b9603e8ff582d53c2face57f0f01cce)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
This change sets an explicit mount propagation for the daemon root.
This is useful for people who need to bind mount the docker daemon root
into a container.
Since bind mounting the daemon root should only ever happen with at
least `rlsave` propagation (to prevent the container from holding
references to mounts making it impossible for the daemon to clean up its
resources), we should make sure the user is actually able to this.
Most modern systems have shared root (`/`) propagation by default
already, however there are some cases where this may not be so
(e.g. potentially docker-in-docker scenarios, but also other cases).
So this just gives the daemon a little more control here and provides
a more uniform experience across different systems.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
(cherry picked from commit a510192b86e7eb1e1112f3f625d80687fdec6578)
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
The idea behind making the graphdrivers private is to prevent leaking
mounts into other namespaces.
Unfortunately this is not really what happens.
There is one case where this does work, and that is when the namespace
was created before the daemon's namespace.
However with systemd each system servie winds up with it's own mount
namespace. This causes a race betwen daemon startup and other system
services as to if the mount is actually private.
This also means there is a negative impact when other system services
are started while the daemon is running.
Basically there are too many things that the daemon does not have
control over (nor should it) to be able to protect against these kinds
of leakages. One thing is certain, setting the graphdriver roots to
private disconnects the mount ns heirarchy preventing propagation of
unmounts... new mounts are of course not propagated either, but the
behavior is racey (or just bad in the case of restarting services)... so
it's better to just be able to keep mount propagation in tact.
It also does not protect situations like `-v
/var/lib/docker:/var/lib/docker` where all mounts are recursively bound
into the container anyway.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
(cherry picked from commit 9803272f2db84df7955b16c0d847ad72cdc494d1)
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
Since systemd version 228, a new setting, `TasksMax`, has appeared, which
limits the number of tasks used by a service (via pids cgroup
controller). Unfortunately, a default for this setting, `DefaultTaskMax`,
is set to 512. In systemd version 231 it is changed to 15% which
practically is 4195, as the value from /proc/sys/kernel/pid_max is
treated like 100%).
Either 512 or 4195 is severily limited value for Docker Engine, as it
can run thousands of containers with thousands of tasks in each, and
the number of tasks limit should be set on a per-container basis by the
Docker user. So, the most reasonable setting for `TasksMax` is `unlimited`.
Unfortunately, older versions of systemd warn about unknown `TasksMax`
parameter in `docker.service` file, and the warning is rather annoying,
therefore this setting is commented out by default, and is supposed to
be uncommented by the user.
The problem with that is, once the limit is hit, all sorts of bad things
happen and it's not really clear even to an advanced user that this
setting is the source of issues.
As Fedora 25 ships systemd 231, it (and later Fedora releases) support
TasksMax, so it makes total sense to uncomment the setting, this is what
this commit does.
[17.12: added patch for Fedora 25 spec]
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Upstream-commit: 9055832bb0725f05d518c3ebc9b7cc93a69420c7
Component: packaging
(cherry picked from commit 02b6af2e96)
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Since systemd version 228, a new setting, `TasksMax`, has appeared,
which limits the number of tasks used by a service (via pids cgroup
controller). Unfortunately, a default for this setting, `DefaultTaskMax`,
is set to 512. In systemd version 231 it is changed to 15% which
practically is 4195, as the value from /proc/sys/kernel/pid_max is
treated like 100%).
Either 512 or 4195 is severily limited value for Docker Engine,
as it can run thousands of containers with thousands of tasks in each,
and the number of tasks limit should be set on a per-container basis
by the Docker user. So, the most reasonable setting for `TasksMax`
is `unlimited`.
Unfortunately, older versions of systemd warn about unknown `TasksMax`
parameter in `docker.service` file, and the warning is rather annoying,
therefore this setting is commented out by default, and is supposed
to be uncommented by the user.
The problem with that is, once the limit is hit, all sorts of bad things
happen and it's not really clear even to an advanced user that this
setting is the source of issues.
Now, `rules` file already contain a hack to check for the systemd
version (during build time) and in case the version is greater than 227,
uncomment the `TasksMax=unlimited` line. Alas, it does not work
during normal builds, the reason being systemd is not installed
into build environments.
An obvious fix would be to add systemd to the list of installed
packages in all Dockerfiles used to build debs. Fortunately,
there is a simpler way, as libsystemd-dev is installed, and
it's a subpackage of systemd built from the same source and
carrying the same version, so it can also be checked.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Upstream-commit: d80738e4b4459816c64757a2a63e5d8058d0ccf4
Component: packaging
(cherry picked from commit 1530820600)
Daemon flags that can be specified multiple times use
singlar names for flags, but plural names for the configuration
file.
To make the daemon configuration know how to correlate
the flag with the corresponding configuration option,
`opt.NewNamedListOptsRef()` should be used instead of
`opt.NewListOptsRef()`.
Commit 6702ac590e6148cb3f606388dde93a011cb14931 attempted
to fix the daemon not corresponding the flag with the configuration
file option, but did so by changing the name of the flag
to plural.
This patch reverts that change, and uses `opt.NewNamedListOptsRef()`
instead.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
(cherry picked from commit 6e7715d65ba892a47d355e16bf9ad87fb537a2d0)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
(cherry picked from commit 0e676c4bde1d429d21ea083a8bc9f40c0fc51269)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
(cherry picked from commit d10091c86e75fb78eaba96f433dc2cc06c0a54de)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Brings in a cherry pick of 76bfcb0eb1
In sandbox creation we disable IPV6 config. But this causes problem in live-restore case
where all IPV6 configs are wiped out on running container. Hence extra check has been added
take care of this issue.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
(cherry picked from commit 54051e9e64185e442e034c7e49a5707459a9eed2)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>