The idea behind making the graphdrivers private is to prevent leaking
mounts into other namespaces.
Unfortunately this is not really what happens.
There is one case where this does work, and that is when the namespace
was created before the daemon's namespace.
However with systemd each system servie winds up with it's own mount
namespace. This causes a race betwen daemon startup and other system
services as to if the mount is actually private.
This also means there is a negative impact when other system services
are started while the daemon is running.
Basically there are too many things that the daemon does not have
control over (nor should it) to be able to protect against these kinds
of leakages. One thing is certain, setting the graphdriver roots to
private disconnects the mount ns heirarchy preventing propagation of
unmounts... new mounts are of course not propagated either, but the
behavior is racey (or just bad in the case of restarting services)... so
it's better to just be able to keep mount propagation in tact.
It also does not protect situations like `-v
/var/lib/docker:/var/lib/docker` where all mounts are recursively bound
into the container anyway.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
Upstream-commit: 9803272f2db84df7955b16c0d847ad72cdc494d1
Component: engine
The overlay2 driver was not setting up the archive.TarOptions field
properly like other storage backend routes to "applyTarLayer"
functionality. The InUserNS field is populated now for overlay2 using
the same query function used by the other storage drivers.
Signed-off-by: Phil Estes <estesp@linux.vnet.ibm.com>
Upstream-commit: 05b8d59015f8a5ce26c8bbaa8053b5bc7cb1a77b
Component: engine
Even though it's highly discouraged, there are existing
installs that are running overlay/overlay2 on filesystems
without d_type support.
This patch allows the daemon to start in such cases, instead of
refusing to start without an option to override.
For fresh installs, backing filesystems without d_type support
will still cause the overlay/overlay2 drivers to be marked as
"unsupported", and skipped during the automatic selection.
This feature is only to keep backward compatibility, but
will be removed at some point.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Upstream-commit: 0a4e793a3da9ba6d20bccfb83f7c48e20a76d895
Component: engine
Support for running overlay/overlay2 on a backing filesystem
without d_type support (most likely: xfs, as ext4 supports
this by default), was deprecated for some time.
Running without d_type support is problematic, and can
lead to difficult to debug issues ("invalid argument" errors,
or unable to remove files from the container's filesystem).
This patch turns the warning that was previously printed
into an "unsupported" error, so that the overlay/overlay2
drivers are not automatically selected when detecting supported
storage drivers.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Upstream-commit: 0abb8dec3f730f3ad2cc9a161c97968a6bfd0631
Component: engine
The fsmagic check was always performed on "data-root" (`/var/lib/docker`),
not on the storage-driver's home directory (e.g. `/var/lib/docker/<somedriver>`).
This caused detection to be done on the wrong filesystem in situations
where `/var/lib/docker/<somedriver>` was a mount, and a different
filesystem than `/var/lib/docker` itself.
This patch checks if the storage-driver's home directory exists, and only
falls back to `/var/lib/docker` if it doesn't exist.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Upstream-commit: f9c8fa305e1501d8056f8744cb193a720aab0e13
Component: engine
The overlay2 storage-driver requires multiple lower dir
support for overlayFs. Support for this feature was added
in kernel 4.x, but some distros (RHEL 7.4, CentOS 7.4) ship with
an older kernel with this feature backported.
This patch adds feature-detection for multiple lower dirs,
and will perform this feature-detection on pre-4.x kernels
with overlayFS support.
With this patch applied, daemons running on a kernel
with multiple lower dir support will now select "overlay2"
as storage-driver, instead of falling back to "overlay".
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Upstream-commit: 955c1f881ac94af19c99f0f7d5635e6a574789f2
Component: engine
This subtle bug keeps lurking in because error checking for `Mkdir()`
and `MkdirAll()` is slightly different wrt to `EEXIST`/`IsExist`:
- for `Mkdir()`, `IsExist` error should (usually) be ignored
(unless you want to make sure directory was not there before)
as it means "the destination directory was already there"
- for `MkdirAll()`, `IsExist` error should NEVER be ignored.
Mostly, this commit just removes ignoring the IsExist error, as it
should not be ignored.
Also, there are a couple of cases then IsExist is handled as
"directory already exist" which is wrong. As a result, some code
that never worked as intended is now removed.
NOTE that `idtools.MkdirAndChown()` behaves like `os.MkdirAll()`
rather than `os.Mkdir()` -- so its description is amended accordingly,
and its usage is handled as such (i.e. IsExist error is not ignored).
For more details, a quote from my runc commit 6f82d4b (July 2015):
TL;DR: check for IsExist(err) after a failed MkdirAll() is both
redundant and wrong -- so two reasons to remove it.
Quoting MkdirAll documentation:
> MkdirAll creates a directory named path, along with any necessary
> parents, and returns nil, or else returns an error. If path
> is already a directory, MkdirAll does nothing and returns nil.
This means two things:
1. If a directory to be created already exists, no error is
returned.
2. If the error returned is IsExist (EEXIST), it means there exists
a non-directory with the same name as MkdirAll need to use for
directory. Example: we want to MkdirAll("a/b"), but file "a"
(or "a/b") already exists, so MkdirAll fails.
The above is a theory, based on quoted documentation and my UNIX
knowledge.
3. In practice, though, current MkdirAll implementation [1] returns
ENOTDIR in most of cases described in #2, with the exception when
there is a race between MkdirAll and someone else creating the
last component of MkdirAll argument as a file. In this very case
MkdirAll() will indeed return EEXIST.
Because of #1, IsExist check after MkdirAll is not needed.
Because of #2 and #3, ignoring IsExist error is just plain wrong,
as directory we require is not created. It's cleaner to report
the error now.
Note this error is all over the tree, I guess due to copy-paste,
or trying to follow the same usage pattern as for Mkdir(),
or some not quite correct examples on the Internet.
[1] https://github.com/golang/go/blob/f9ed2f75/src/os/path.go
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Upstream-commit: 516010e92d56cfcd6d1e343bdc02b6f04bc43039
Component: engine
This removes and recreates the merged dir with each umount/mount
respectively.
This is done to make the impact of leaking mountpoints have less
user-visible impact.
It's fairly easy to accidentally leak mountpoints (even if moby doesn't,
other tools on linux like 'unshare' are quite able to incidentally do
so).
As of recently, overlayfs reacts to these mounts being leaked (see
One trick to force an unmount is to remove the mounted directory and
recreate it. Devicemapper now does this, overlay can follow suit.
Signed-off-by: Euan Kemp <euan.kemp@coreos.com>
Upstream-commit: af0d589623eff9f8cefced8b527dbd7cf221ce61
Component: engine
From https://www.kernel.org/doc/Documentation/filesystems/overlayfs.txt:
> The lower filesystem can be any filesystem supported by Linux and does
> not need to be writable. The lower filesystem can even be another
> overlayfs. The upper filesystem will normally be writable and if it
> is it must support the creation of trusted.* extended attributes, and
> must provide valid d_type in readdir responses, so NFS is not suitable.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Upstream-commit: 90dfb1d0cc59d79ccb272997d735864615010785
Component: engine
When use overlay2 as the graphdriver and the kernel enable
`CONFIG_OVERLAY_FS_REDIRECT_DIR=y`, rename a dir in lower layer
will has a xattr to redirct its dir to source dir. This make the
image layer unportable. This patch fallback to use naive diff driver
when kernel enable CONFIG_OVERLAY_FS_REDIRECT_DIR
Signed-off-by: Lei Jitang <leijitang@huawei.com>
Upstream-commit: 49c3a7c4bac2877265ef8c4eaf210159560f08b4
Component: engine
The change in 7a7357dae1bcccb17e9b2d4c7c8f5c025fce56ca inadvertently
changed the `defer` error code into a no-op. This restores its behavior
prior to that code change, and also introduces a little more error
logging.
Signed-off-by: Euan Kemp <euan.kemp@coreos.com>
Upstream-commit: 639ab92f011245e17e9a293455a8dae1eb034022
Component: engine
This enables docker cp and ADD/COPY docker build support for LCOW.
Originally, the graphdriver.Get() interface returned a local path
to the container root filesystem. This does not work for LCOW, so
the Get() method now returns an interface that LCOW implements to
support copying to and from the container.
Signed-off-by: Akash Gupta <akagup@microsoft.com>
Upstream-commit: 7a7357dae1bcccb17e9b2d4c7c8f5c025fce56ca
Component: engine
Changes most references of syscall to golang.org/x/sys/
Ones aren't changes include, Errno, Signal and SysProcAttr
as they haven't been implemented in /x/sys/.
Signed-off-by: Christopher Jones <tophj@linux.vnet.ibm.com>
[s390x] switch utsname from unsigned to signed
per 33267e036f
char in s390x in the /x/sys/unix package is now signed, so
change the buildtags
Signed-off-by: Christopher Jones <tophj@linux.vnet.ibm.com>
Upstream-commit: 069fdc8a083cb1663e4f86fe3fd9b9a1aebc3e54
Component: engine
This commit adds the overlay2.size option to the daemon daemon
storage opts.
The user can override this option by the "docker run --storage-opt"
options.
Signed-off-by: Dhawal Yogesh Bhanushali <dbhanushali@vmware.com>
Upstream-commit: a63d5bc03513755015827d0fe93563240429f1e0
Component: engine
we see a lot of
```
level=debug msg="Failed to unmount a03b1bb6f569421857e5407d73d89451f92724674caa56bfc2170de7e585a00b-init overlay: device or resource busy"
```
in daemon logs and there is a lot of mountpoint leftover.
This cause failed to remove container.
Signed-off-by: Lei Jitang <leijitang@huawei.com>
Upstream-commit: f65fa1f115df896b2440f50c374f032fc781188d
Component: engine
OverlayFS is supported on top of btrfs as of Linux Kernel 4.7.
Skip the hard enforcement when on kernel 4.7 or newer and
respect the kernel check override flag on older kernels.
https://btrfs.wiki.kernel.org/index.php/Changelog#By_feature
Signed-off-by: Derek McGowan <derek@mcgstyle.net>
Upstream-commit: f64a4ad008e68996afcec3ab34a869887716f944
Component: engine
Before this, if `forceRemove` is set the container data will be removed
no matter what, including if there are issues with removing container
on-disk state (rw layer, container root).
In practice this causes a lot of issues with leaked data sitting on
disk that users are not able to clean up themselves.
This is particularly a problem while the `EBUSY` errors on remove are so
prevalent. So for now let's not keep this behavior.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
Upstream-commit: 54dcbab25ea4771da303fa95e0c26f2d39487b49
Component: engine
Naivediff fails when layers are created directly on top of
each other. Other graphdrivers which use naivediff already
skip these tests. Until naivediff is fixed, skip with overlay2
when running tests on a kernel which causes naivediff fallback.
Fix applydiff to never use the naivediff size when not applying
changes with naivediff.
Signed-off-by: Derek McGowan <derek@mcgstyle.net> (github: dmcgowan)
Upstream-commit: 5a1b557281204964f37e81cb0776758360029555
Component: engine
When running on a kernel which is not patched for the copy up bug
overlay2 will use the naive diff driver.
Signed-off-by: Derek McGowan <derek@mcgstyle.net> (github: dmcgowan)
Upstream-commit: 64b43ed5ecf3e805bd72bd6a9493c8c5d08478aa
Component: engine
This allows for easy extension of adding more parameters to existing
parameters list. Otherwise adding a single parameter changes code
at so many places.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Upstream-commit: b937aa8e6968d805527d163e6f477d496ceb88d7
Component: engine
The overlay2 change ensures that the correct path is used to resolve the
symlink. The current code will not fail since the symlinks are always given
a value of "../id/diff" which ends up ignoring the incorrect "link" value.
Fix this code so it doesn't cause unexpected errors in the future if the
symlink changes.
The layerstore cleanup ensures that the empty layer returns a tar stream if
the provided parent is empty. Any value other than empty still returns an
error since the empty layer has no parent. Currently empty layer is not
used anywhere that TarStreamFrom is used but could break in the future if
this function is called.
Signed-off-by: Derek McGowan <derek@mcgstyle.net> (github: dmcgowan)
Upstream-commit: 6622cc970e27a7bf2798d751ed276265a3a2d404
Component: engine
Allow built images to be squash to scratch.
Squashing does not destroy any images or layers, and preserves the
build cache.
Introduce a new CLI argument --squash to docker build
Introduce a new param to the build API endpoint `squash`
Once the build is complete, docker creates a new image loading the diffs
from each layer into a single new layer and references all the parent's
layers.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
Upstream-commit: 362369b4bbea38881402d281ee2015d16e8b10ce
Component: engine
This fix tries to fix logrus formatting by removing `f` from
`logrus.[Error|Warn|Debug|Fatal|Panic|Info]f` when formatting string
is not present.
Fixed issue #23459
Signed-off-by: Daehyeok Mun <daehyeok@gmail.com>
Upstream-commit: fa710e504b0e3e51d4031790c18621b02dcd2600
Component: engine
The `archive` package defines aliases for `io.ReadCloser` and
`io.Reader`. These don't seem to provide an benefit other than type
decoration. Per this change, several unnecessary type cases were
removed.
Signed-off-by: Stephen J Day <stephen.day@docker.com>
Upstream-commit: aa2cc18745cbe0231c33782f0fa764f657e3fb88
Component: engine
Go can falsely report a larger page size than supported,
causing overlay2 mount arguments to be truncated. When overlay2
detects the mount arguments have hit the page limit, it will
switch to using relative paths. If this limit is smaller than
the actual page size there is no behavioral problems, but if it
is larger mounts can fail for images with many layers.
Closes#27384
Signed-off-by: Derek McGowan <derek@mcgstyle.net> (github: dmcgowan)
Upstream-commit: 520034e35b463e8c9d69ac78b52a4e5df958bc04
Component: engine
Allow passing --storage-opt size=X to docker create/run commands
for the `overlay2` graphriver.
The size option is only available if the backing fs is xfs that is
mounted with the `pquota` mount option.
The user can pass any size less then the backing fs size.
Signed-off-by: Amir Goldstein <amir73il@aquasec.com>
Upstream-commit: 05bac4591a4519fbac9d3724f3b708e882bbad80
Component: engine
In the common case where the user is using /var/lib/docker and
an image with less than 60 layers, forking is not needed. Calculate
whether absolute paths can be used and avoid forking to mount in
those cases.
Signed-off-by: Derek McGowan <derek@mcgstyle.net> (github: dmcgowan)
Upstream-commit: c13a985fa1196a5ed782d5ac68a4bbb68dd529ca
Component: engine
Add option to skip kernel check for older kernels which have been patched to support multiple lower directories in overlayfs.
Fixes#24023
Signed-off-by: Derek McGowan <derek@mcgstyle.net> (github: dmcgowan)
Upstream-commit: ff98da0607c4d6a94a2356d9ccaa64cc9d7f6a78
Component: engine