The idea behind making the graphdrivers private is to prevent leaking
mounts into other namespaces.
Unfortunately this is not really what happens.
There is one case where this does work, and that is when the namespace
was created before the daemon's namespace.
However with systemd each system servie winds up with it's own mount
namespace. This causes a race betwen daemon startup and other system
services as to if the mount is actually private.
This also means there is a negative impact when other system services
are started while the daemon is running.
Basically there are too many things that the daemon does not have
control over (nor should it) to be able to protect against these kinds
of leakages. One thing is certain, setting the graphdriver roots to
private disconnects the mount ns heirarchy preventing propagation of
unmounts... new mounts are of course not propagated either, but the
behavior is racey (or just bad in the case of restarting services)... so
it's better to just be able to keep mount propagation in tact.
It also does not protect situations like `-v
/var/lib/docker:/var/lib/docker` where all mounts are recursively bound
into the container anyway.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
Upstream-commit: 9803272f2db84df7955b16c0d847ad72cdc494d1
Component: engine
Even though it's highly discouraged, there are existing
installs that are running overlay/overlay2 on filesystems
without d_type support.
This patch allows the daemon to start in such cases, instead of
refusing to start without an option to override.
For fresh installs, backing filesystems without d_type support
will still cause the overlay/overlay2 drivers to be marked as
"unsupported", and skipped during the automatic selection.
This feature is only to keep backward compatibility, but
will be removed at some point.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Upstream-commit: 0a4e793a3da9ba6d20bccfb83f7c48e20a76d895
Component: engine
Support for running overlay/overlay2 on a backing filesystem
without d_type support (most likely: xfs, as ext4 supports
this by default), was deprecated for some time.
Running without d_type support is problematic, and can
lead to difficult to debug issues ("invalid argument" errors,
or unable to remove files from the container's filesystem).
This patch turns the warning that was previously printed
into an "unsupported" error, so that the overlay/overlay2
drivers are not automatically selected when detecting supported
storage drivers.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Upstream-commit: 0abb8dec3f730f3ad2cc9a161c97968a6bfd0631
Component: engine
The fsmagic check was always performed on "data-root" (`/var/lib/docker`),
not on the storage-driver's home directory (e.g. `/var/lib/docker/<somedriver>`).
This caused detection to be done on the wrong filesystem in situations
where `/var/lib/docker/<somedriver>` was a mount, and a different
filesystem than `/var/lib/docker` itself.
This patch checks if the storage-driver's home directory exists, and only
falls back to `/var/lib/docker` if it doesn't exist.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Upstream-commit: f9c8fa305e1501d8056f8744cb193a720aab0e13
Component: engine
This change makes the VFS graphdriver use the kernel-accelerated
(copy_file_range) mechanism of copying files, which is able to
leverage reflinks.
Signed-off-by: Sargun Dhillon <sargun@sargun.me>
Upstream-commit: d2b71b26604370620630d8d3f35aba75ae474f3f
Component: engine
This subtle bug keeps lurking in because error checking for `Mkdir()`
and `MkdirAll()` is slightly different wrt to `EEXIST`/`IsExist`:
- for `Mkdir()`, `IsExist` error should (usually) be ignored
(unless you want to make sure directory was not there before)
as it means "the destination directory was already there"
- for `MkdirAll()`, `IsExist` error should NEVER be ignored.
Mostly, this commit just removes ignoring the IsExist error, as it
should not be ignored.
Also, there are a couple of cases then IsExist is handled as
"directory already exist" which is wrong. As a result, some code
that never worked as intended is now removed.
NOTE that `idtools.MkdirAndChown()` behaves like `os.MkdirAll()`
rather than `os.Mkdir()` -- so its description is amended accordingly,
and its usage is handled as such (i.e. IsExist error is not ignored).
For more details, a quote from my runc commit 6f82d4b (July 2015):
TL;DR: check for IsExist(err) after a failed MkdirAll() is both
redundant and wrong -- so two reasons to remove it.
Quoting MkdirAll documentation:
> MkdirAll creates a directory named path, along with any necessary
> parents, and returns nil, or else returns an error. If path
> is already a directory, MkdirAll does nothing and returns nil.
This means two things:
1. If a directory to be created already exists, no error is
returned.
2. If the error returned is IsExist (EEXIST), it means there exists
a non-directory with the same name as MkdirAll need to use for
directory. Example: we want to MkdirAll("a/b"), but file "a"
(or "a/b") already exists, so MkdirAll fails.
The above is a theory, based on quoted documentation and my UNIX
knowledge.
3. In practice, though, current MkdirAll implementation [1] returns
ENOTDIR in most of cases described in #2, with the exception when
there is a race between MkdirAll and someone else creating the
last component of MkdirAll argument as a file. In this very case
MkdirAll() will indeed return EEXIST.
Because of #1, IsExist check after MkdirAll is not needed.
Because of #2 and #3, ignoring IsExist error is just plain wrong,
as directory we require is not created. It's cleaner to report
the error now.
Note this error is all over the tree, I guess due to copy-paste,
or trying to follow the same usage pattern as for Mkdir(),
or some not quite correct examples on the Internet.
[1] https://github.com/golang/go/blob/f9ed2f75/src/os/path.go
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Upstream-commit: 516010e92d56cfcd6d1e343bdc02b6f04bc43039
Component: engine
This removes and recreates the merged dir with each umount/mount
respectively.
This is done to make the impact of leaking mountpoints have less
user-visible impact.
It's fairly easy to accidentally leak mountpoints (even if moby doesn't,
other tools on linux like 'unshare' are quite able to incidentally do
so).
As of recently, overlayfs reacts to these mounts being leaked (see
One trick to force an unmount is to remove the mounted directory and
recreate it. Devicemapper now does this, overlay can follow suit.
Signed-off-by: Euan Kemp <euan.kemp@coreos.com>
Upstream-commit: af0d589623eff9f8cefced8b527dbd7cf221ce61
Component: engine
From https://www.kernel.org/doc/Documentation/filesystems/overlayfs.txt:
> The lower filesystem can be any filesystem supported by Linux and does
> not need to be writable. The lower filesystem can even be another
> overlayfs. The upper filesystem will normally be writable and if it
> is it must support the creation of trusted.* extended attributes, and
> must provide valid d_type in readdir responses, so NFS is not suitable.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Upstream-commit: 90dfb1d0cc59d79ccb272997d735864615010785
Component: engine
This enables docker cp and ADD/COPY docker build support for LCOW.
Originally, the graphdriver.Get() interface returned a local path
to the container root filesystem. This does not work for LCOW, so
the Get() method now returns an interface that LCOW implements to
support copying to and from the container.
Signed-off-by: Akash Gupta <akagup@microsoft.com>
Upstream-commit: 7a7357dae1bcccb17e9b2d4c7c8f5c025fce56ca
Component: engine
This commit reverts a hunk of commit 2f5f0af3f ("Add unconvert linter")
and adds a hint for unconvert linter to ignore excessive conversion as
it is required on 32-bit platforms (e.g. armhf).
The exact error on armhf is this:
19:06:45 ---> Making bundle: dynbinary (in bundles/17.06.0-dev/dynbinary)
19:06:48 Building: bundles/17.06.0-dev/dynbinary-daemon/dockerd-17.06.0-dev
19:10:58 # github.com/docker/docker/daemon/graphdriver/overlay
19:10:58 daemon/graphdriver/overlay/copy.go:161: cannot use stat.Atim.Sec (type int32) as type int64 in argument to time.Unix
19:10:58 daemon/graphdriver/overlay/copy.go:161: cannot use stat.Atim.Nsec (type int32) as type int64 in argument to time.Unix
19:10:58 daemon/graphdriver/overlay/copy.go:162: cannot use stat.Mtim.Sec (type int32) as type int64 in argument to time.Unix
19:10:58 daemon/graphdriver/overlay/copy.go:162: cannot use stat.Mtim.Nsec (type int32) as type int64 in argument to time.Unix
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Upstream-commit: 21b2c278cc86f0fc411018becbcbf2a7e44b6057
Component: engine
Changes most references of syscall to golang.org/x/sys/
Ones aren't changes include, Errno, Signal and SysProcAttr
as they haven't been implemented in /x/sys/.
Signed-off-by: Christopher Jones <tophj@linux.vnet.ibm.com>
[s390x] switch utsname from unsigned to signed
per 33267e036f
char in s390x in the /x/sys/unix package is now signed, so
change the buildtags
Signed-off-by: Christopher Jones <tophj@linux.vnet.ibm.com>
Upstream-commit: 069fdc8a083cb1663e4f86fe3fd9b9a1aebc3e54
Component: engine
we see a lot of
```
level=debug msg="Failed to unmount a03b1bb6f569421857e5407d73d89451f92724674caa56bfc2170de7e585a00b-init overlay: device or resource busy"
```
in daemon logs and there is a lot of mountpoint leftover.
This cause failed to remove container.
Signed-off-by: Lei Jitang <leijitang@huawei.com>
Upstream-commit: f65fa1f115df896b2440f50c374f032fc781188d
Component: engine
Before this, if `forceRemove` is set the container data will be removed
no matter what, including if there are issues with removing container
on-disk state (rw layer, container root).
In practice this causes a lot of issues with leaked data sitting on
disk that users are not able to clean up themselves.
This is particularly a problem while the `EBUSY` errors on remove are so
prevalent. So for now let's not keep this behavior.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
Upstream-commit: 54dcbab25ea4771da303fa95e0c26f2d39487b49
Component: engine
This allows for easy extension of adding more parameters to existing
parameters list. Otherwise adding a single parameter changes code
at so many places.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Upstream-commit: b937aa8e6968d805527d163e6f477d496ceb88d7
Component: engine
In root case no mount call or reference count
increment actually happens so don’t try to unmount.
Signed-off-by: Tonis Tiigi <tonistiigi@gmail.com>
Upstream-commit: e4349ad90114c741c729046f7749f0c2fde415c9
Component: engine
The `archive` package defines aliases for `io.ReadCloser` and
`io.Reader`. These don't seem to provide an benefit other than type
decoration. Per this change, several unnecessary type cases were
removed.
Signed-off-by: Stephen J Day <stephen.day@docker.com>
Upstream-commit: aa2cc18745cbe0231c33782f0fa764f657e3fb88
Component: engine
If we are running in a user namespace, don't try to mknod as
it won't be allowed. libcontainer will bind-mount the host's
devices over files in the container anyway, so it's not needed.
The chrootarchive package does a chroot (without mounting /proc) before
its work, so we cannot check /proc/self/uid_map when we need to. So
compute it in advance and pass it along with the tar options.
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Upstream-commit: 617c352e9225b1d598e893aa5f89a8863808e4f2
Component: engine
Diff apply is sometimes producing a different change list causing the tests to fail.
Overlay has a known issue calculating diffs of files which occur within the same second they were created.
Signed-off-by: Derek McGowan <derek@mcgstyle.net> (github: dmcgowan)
Upstream-commit: 0e74aabbb9aa5cea0b6bf7342f9e325f989468fa
Component: engine
The mount check is now done by the FSChecker. This function is no longer needed and shouldn't be called.
Signed-off-by: Derek McGowan <derek@mcgstyle.net> (github: dmcgowan)
Upstream-commit: 5cc082473068b00dee123f8388a79d7a48842a57
Component: engine
Check for the rootDir first because the mergeDir may not exist if root
is present.
Also fix unmounting in the defer to make sure it does not have a
refcount.
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
Upstream-commit: 36a82c20321936a71b30fcfde8bc6c76d6cc8d1f
Component: engine
For things that we can check if they are mounted by using their fsmagic
we should use that and for others do it the slow way.
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
Upstream-commit: 1ba05cdb6ade7e3abd4c4c3221b5e27645460111
Component: engine
use a consistent approach for checking if the
backing filesystem is compatible with the
storage driver.
also add an error-message for the AUFS driver if
an incompatible combination is found.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Upstream-commit: 1fc0acc9ae77752858057d1f6f8487ccd82372be
Component: engine
This makes sure fsdiff doesn't try to unmount things that shouldn't be.
**Note**: This is intended as a temporary solution to have as minor a
change as possible for 1.11.1. A bigger change will be required in order
to support container re-attach.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
Upstream-commit: 7342060b070df67481f8da4f394a57cac1671d56
Component: engine