we see a lot of
```
level=debug msg="Failed to unmount a03b1bb6f569421857e5407d73d89451f92724674caa56bfc2170de7e585a00b-init overlay: device or resource busy"
```
in daemon logs and there is a lot of mountpoint leftover.
This cause failed to remove container.
Signed-off-by: Lei Jitang <leijitang@huawei.com>
Upstream-commit: f65fa1f115df896b2440f50c374f032fc781188d
Component: engine
Before this, if `forceRemove` is set the container data will be removed
no matter what, including if there are issues with removing container
on-disk state (rw layer, container root).
In practice this causes a lot of issues with leaked data sitting on
disk that users are not able to clean up themselves.
This is particularly a problem while the `EBUSY` errors on remove are so
prevalent. So for now let's not keep this behavior.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
Upstream-commit: 54dcbab25ea4771da303fa95e0c26f2d39487b49
Component: engine
This allows for easy extension of adding more parameters to existing
parameters list. Otherwise adding a single parameter changes code
at so many places.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Upstream-commit: b937aa8e6968d805527d163e6f477d496ceb88d7
Component: engine
In root case no mount call or reference count
increment actually happens so don’t try to unmount.
Signed-off-by: Tonis Tiigi <tonistiigi@gmail.com>
Upstream-commit: e4349ad90114c741c729046f7749f0c2fde415c9
Component: engine
The `archive` package defines aliases for `io.ReadCloser` and
`io.Reader`. These don't seem to provide an benefit other than type
decoration. Per this change, several unnecessary type cases were
removed.
Signed-off-by: Stephen J Day <stephen.day@docker.com>
Upstream-commit: aa2cc18745cbe0231c33782f0fa764f657e3fb88
Component: engine
If we are running in a user namespace, don't try to mknod as
it won't be allowed. libcontainer will bind-mount the host's
devices over files in the container anyway, so it's not needed.
The chrootarchive package does a chroot (without mounting /proc) before
its work, so we cannot check /proc/self/uid_map when we need to. So
compute it in advance and pass it along with the tar options.
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Upstream-commit: 617c352e9225b1d598e893aa5f89a8863808e4f2
Component: engine
Diff apply is sometimes producing a different change list causing the tests to fail.
Overlay has a known issue calculating diffs of files which occur within the same second they were created.
Signed-off-by: Derek McGowan <derek@mcgstyle.net> (github: dmcgowan)
Upstream-commit: 0e74aabbb9aa5cea0b6bf7342f9e325f989468fa
Component: engine
The mount check is now done by the FSChecker. This function is no longer needed and shouldn't be called.
Signed-off-by: Derek McGowan <derek@mcgstyle.net> (github: dmcgowan)
Upstream-commit: 5cc082473068b00dee123f8388a79d7a48842a57
Component: engine
Check for the rootDir first because the mergeDir may not exist if root
is present.
Also fix unmounting in the defer to make sure it does not have a
refcount.
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
Upstream-commit: 36a82c20321936a71b30fcfde8bc6c76d6cc8d1f
Component: engine
For things that we can check if they are mounted by using their fsmagic
we should use that and for others do it the slow way.
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
Upstream-commit: 1ba05cdb6ade7e3abd4c4c3221b5e27645460111
Component: engine
use a consistent approach for checking if the
backing filesystem is compatible with the
storage driver.
also add an error-message for the AUFS driver if
an incompatible combination is found.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Upstream-commit: 1fc0acc9ae77752858057d1f6f8487ccd82372be
Component: engine
This makes sure fsdiff doesn't try to unmount things that shouldn't be.
**Note**: This is intended as a temporary solution to have as minor a
change as possible for 1.11.1. A bigger change will be required in order
to support container re-attach.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
Upstream-commit: 7342060b070df67481f8da4f394a57cac1671d56
Component: engine
People have reported following issue with overlay
$ docker run -ti --name=foo -v /dev/:/dev fedora bash
$ docker cp foo:/bin/bash /tmp
$ exit container
Upon container exit, /dev/pts gets unmounted too. This happens because
docker cp volume mounts get propagated to /run/docker/libcontainer/....
and when container exits, it must be tearing down mount point under
/run/docker/libcontainerd/... and as these are "shared" mounts it
propagates events to /dev/pts and it gets unmounted too.
One way to solve this problem is to make sure "docker cp" volume mounts
don't become visible under /run/docker/libcontainerd/..
Here are more details of what is actually happening.
Make overlay home directory (/var/lib/docker/overlay) private mount when
docker starts and unmount it when docker stops. Following is the reason
to do it.
In fedora and some other distributions / is "shared". That means when
docker creates a container and mounts it root in /var/lib/docker/overlay/...
that mount point is "shared".
Looks like after that containerd/runc bind mounts that rootfs into
/runc/docker/libcontainerd/container-id/rootfs. And this puts both source
and destination mounts points in shared group and they both are setup
to propagate mount events to each other.
Later when "docker cp" is run it sets up container volumes under
/var/lib/dokcer/overlay/container-id/... And all these mounts propagate
to /runc/docker/libcontainerd/... Now mountVolumes() makes these new
mount points private but by that time propagation already has happened
and private only takes affect when unmount happens.
So to stop this propagation of volumes by docker cp, make
/var/lib/docker/overlay a private mount point. That means when a container
rootfs is created, that mount point will be private too (it will inherit
property from parent). And that means when bind mount happens in /runc/
dir, overlay mount point will not propagate mounts to /runc/.
Other graphdrivers like devicemapper are already doing it and they don't
face this issue.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Upstream-commit: e076bccb458aeadab9380ce0636456ad6317a85f
Component: engine
Overlay tests were failing when /var/tmp was an overlay mount with a misleading message.
Now overlay tests will be skipped when attempting to be run on overlay.
Tests will now use the TMPDIR environment variable instead of only /var/tmp
Fixes#21686
Signed-off-by: Derek McGowan <derek@mcgstyle.net> (github: dmcgowan)
Upstream-commit: 824c72f4727504e3a8d37f87ce88733c560d4129
Component: engine
Since the layer store was introduced, the level above the graphdriver
now differentiates between read/write and read-only layers. This
distinction is useful for graphdrivers that need to take special steps
when creating a layer based on whether it is read-only or not.
Adding this parameter allows the graphdrivers to differentiate, which
in the case of the Windows graphdriver, removes our dependence on parsing
the id of the parent for "-init" in order to infer this information.
This will also set the stage for unblocking some of the layer store
unit tests in the next preview build of Windows.
Signed-off-by: Stefan J. Wernli <swernli@microsoft.com>
Upstream-commit: ef5bfad3210a9e9c8b761f2c11c0c6289490ebff
Component: engine
Instead of implementing refcounts at each graphdriver, implement this in
the layer package which is what the engine actually interacts with now.
This means interacting directly with the graphdriver is no longer
explicitly safe with regard to Get/Put calls being refcounted.
In addition, with the containerd, layers may still be mounted after
a daemon restart since we will no longer explicitly kill containers when
we shutdown or startup engine.
Because of this ref counts would need to be repopulated.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
Upstream-commit: 65d79e3e5e537039b244afd7eda29e721a93d84f
Component: engine
Instead of implementing refcounts at each graphdriver, implement this in
the layer package which is what the engine actually interacts with now.
This means interacting directly with the graphdriver is no longer
explicitly safe with regard to Get/Put calls being refcounted.
In addition, with the containerd, layers may still be mounted after
a daemon restart since we will no longer explicitly kill containers when
we shutdown or startup engine.
Because of this ref counts would need to be repopulated.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
Upstream-commit: 563d0711f83952e561a0d7d5c48fef9810b4f010
Component: engine
That function is pretty heavy used on container start. Autoallocating
buffer can be painful.
Signed-off-by: Alexander Morozov <lk4d4@docker.com>
Upstream-commit: 3f5e1c69b345b25d9b1c57f5d492a0e3fd4432a0
Component: engine
Instead of creating a "0.0" subdirectory and migrating graphroot
metadata into it when user namespaces are available in the daemon
(currently only in experimental), change the graphroot dir permissions
to only include the execute bit for "other" users.
This allows easy migration to and from user namespaces and will allow
easier integration of user namespace support into the master build.
Docker-DCO-1.1-Signed-off-by: Phil Estes <estesp@linux.vnet.ibm.com> (github: estesp)
Upstream-commit: e8532023f20498e6eb1ce5c079dc8a09aeae3061
Component: engine
All underlay dirs need proper remapped ownership. This bug was masked by the
fact that the setupInitLayer code was chown'ing the dirs at startup
time. Since that bug is now fixed, it revealed this permissions issue.
Docker-DCO-1.1-Signed-off-by: Phil Estes <estesp@linux.vnet.ibm.com> (github: estesp)
Upstream-commit: 191cefbaca45ba86341379d09d2f75d5fc1868fb
Component: engine
This change will allow us to run SELinux in a container with
BTRFS back end. We continue to work on fixing the kernel/BTRFS
but this change will allow SELinux Security separation on BTRFS.
It basically relabels the content on container creation.
Just relabling -init directory in BTRFS use case. Everything looks like it
works. I don't believe tar/achive stores the SELinux labels, so we are good
as far as docker commit.
Tested Speed on startup with BTRFS on top of loopback directory. BTRFS
not on loopback should get even better perfomance on startup time. The
more inodes inside of the container image will increase the relabel time.
This patch will give people who care more about security the option of
runnin BTRFS with SELinux. Those who don't want to take the slow down
can disable SELinux either in individual containers or for all containers
by continuing to disable SELinux in the daemon.
Without relabel:
> time docker run --security-opt label:disable fedora echo test
test
real 0m0.918s
user 0m0.009s
sys 0m0.026s
With Relabel
test
real 0m1.942s
user 0m0.007s
sys 0m0.030s
Signed-off-by: Dan Walsh <dwalsh@redhat.com>
Signed-off-by: Dan Walsh <dwalsh@redhat.com>
Upstream-commit: 1716d497a420f0cd4e53a99535704c6d215e38c7
Component: engine
Adds support for the daemon to handle user namespace maps as a
per-daemon setting.
Support for handling uid/gid mapping is added to the builder,
archive/unarchive packages and functions, all graphdrivers (except
Windows), and the test suite is updated to handle user namespace daemon
rootgraph changes.
Docker-DCO-1.1-Signed-off-by: Phil Estes <estesp@linux.vnet.ibm.com> (github: estesp)
Upstream-commit: 442b45628ee12ebd8e8bd08497896d5fa8eec4bd
Component: engine
There is no need to call `os.Stat` on the driver filesystem path of a
container as `os.RemoveAll` already handles (properly) the case where
the path no longer exists.
Given the results of the stat() were not even being used, there is no
value in erroring out because of the stat call failure, and worse, it
prevents daemon cleanup of containers in "Dead" state unless you re-create
directories that were already removed via a manual cleanup after a
failure. This brings removal in overlay in line with aufs/devicemapper
drivers which don't error out if the filesystem path no longer exists.
Docker-DCO-1.1-Signed-off-by: Phil Estes <estesp@linux.vnet.ibm.com> (github: estesp)
Upstream-commit: 6ed11b53743668dba3bcf6ecef4e57a399d95569
Component: engine