Since commit e9b9e4ace294230c6b8eb has landed, there is a chance that
container.RWLayer is nil (due to some half-removed container). Let's
check the pointer before use to avoid any potential nil pointer
dereferences, resulting in a daemon crash.
Note that even without the abovementioned commit, it's better to perform
an extra check (even it's totally redundant) rather than to have a
possibility of a daemon crash. In other words, better be safe than
sorry.
[v2: add a test case for daemon.getInspectData]
[v3: add a check for container.Dead and a special error for the case]
Fixes: e9b9e4ace294230c6b8eb
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
(cherry picked from commit 195893d38160c0893e326b8674e05ef6714aeaa4)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
ReleaseRWLayer can and should only be called once (unless it returns
an error), but might be called twice in case of a failure from
`system.EnsureRemoveAll(container.Root)`. This results in the
following error:
> Error response from daemon: driver "XXX" failed to remove root filesystem for YYY: layer not retained
The obvious fix is to set container.RWLayer to nil as soon as
ReleaseRWLayer() succeeds.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
(cherry picked from commit e9b9e4ace294230c6b8eb010eda564a2541c4564)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
This is only for the case when dockerd has had to re-mount the daemon
root as shared.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
(cherry picked from commit 487c6c7e73dbb7871e80d75f176dd2a3539a2947)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
By default, if a user requests a bind mount it uses private propagation.
When the source path is a path within the daemon root this, along with
some other propagation values that the user can use, causes issues when
the daemon tries to remove a mountpoint because a container will then
have a private reference to that mount which prevents removal.
Unmouting with MNT_DETATCH can help this scenario on newer kernels, but
ultimately this is just covering up the problem and doesn't actually
free up the underlying resources until all references are destroyed.
This change does essentially 2 things:
1. Change the default propagation when unspecified to `rslave` when the
source path is within the daemon root path or a parent of the daemon
root (because everything is using rbinds).
2. Creates a validation error on create when the user tries to specify
an unacceptable propagation mode for these paths...
basically the only two acceptable modes are `rslave` and `rshared`.
In cases where we have used the new default propagation but the
underlying filesystem is not setup to handle it (fs must hvae at least
rshared propagation) instead of erroring out like we normally would,
this falls back to the old default mode of `private`, which preserves
backwards compatibility.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
(cherry picked from commit 589a0afa8cbe39b6512662fd1705873e2d236dd0)
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
This was added in #36047 just as a way to make sure the tree is fully
unmounted on shutdown.
For ZFS this could be a breaking change since there was no unmount before.
Someone could have setup the zfs tree themselves. It would be better, if
we really do want the cleanup to actually the unpacked layers checking
for mounts rather than a blind recursive unmount of the root.
BTRFS does not use mounts and does not need to unmount anyway.
These was only an unmount to begin with because for some reason the
btrfs tree was being moutned with `private` propagation.
For the other graphdrivers that still have a recursive unmount here...
these were already being unmounted and performing the recursive unmount
shouldn't break anything. If anyone had anything mounted at the
graphdriver location it would have been unmounted on shutdown anyway.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
(cherry picked from commit 2fe4f888bee52b1f256d6fa5e20f9b061d30221c)
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
This fix tries to address the issue raised in 36083 where
`network inspect` does not show Created time if the network is
created in swarm scope.
The issue was that Created was not converted from swarm api.
This fix addresses the issue.
An unit test has been added.
This fix fixes 36083.
Signed-off-by: Yong Tang <yong.tang.github@outlook.com>
(cherry picked from commit 090c439fb8a863731cc80fcb9932ce5958d8166d)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Signed-off-by: Daniel Nephin <dnephin@docker.com>
(cherry picked from commit 3c4537d5b33d951237ea5e4cc123953eda7a37e7)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
This change sets an explicit mount propagation for the daemon root.
This is useful for people who need to bind mount the docker daemon root
into a container.
Since bind mounting the daemon root should only ever happen with at
least `rlsave` propagation (to prevent the container from holding
references to mounts making it impossible for the daemon to clean up its
resources), we should make sure the user is actually able to this.
Most modern systems have shared root (`/`) propagation by default
already, however there are some cases where this may not be so
(e.g. potentially docker-in-docker scenarios, but also other cases).
So this just gives the daemon a little more control here and provides
a more uniform experience across different systems.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
(cherry picked from commit a510192b86e7eb1e1112f3f625d80687fdec6578)
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
The idea behind making the graphdrivers private is to prevent leaking
mounts into other namespaces.
Unfortunately this is not really what happens.
There is one case where this does work, and that is when the namespace
was created before the daemon's namespace.
However with systemd each system servie winds up with it's own mount
namespace. This causes a race betwen daemon startup and other system
services as to if the mount is actually private.
This also means there is a negative impact when other system services
are started while the daemon is running.
Basically there are too many things that the daemon does not have
control over (nor should it) to be able to protect against these kinds
of leakages. One thing is certain, setting the graphdriver roots to
private disconnects the mount ns heirarchy preventing propagation of
unmounts... new mounts are of course not propagated either, but the
behavior is racey (or just bad in the case of restarting services)... so
it's better to just be able to keep mount propagation in tact.
It also does not protect situations like `-v
/var/lib/docker:/var/lib/docker` where all mounts are recursively bound
into the container anyway.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
(cherry picked from commit 9803272f2db84df7955b16c0d847ad72cdc494d1)
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
Validation of Mounts was only performed on container _creation_, not on
container _start_. As a result, if the host-path no longer existed
when the container was started, a directory was created in the given
location.
This is the wrong behavior, because when using the `Mounts` API, host paths
should never be created, and an error should be produced instead.
This patch adds a validation step on container start, and produces an
error if the host path is not found.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
(cherry picked from commit 7cb96ba308dc53824d2203fd343a4a297d17976e)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
This is a fix to regression in vfs graph driver introduced by
commit 7a1618ced359a3ac92 ("add quota support to VFS graphdriver").
On some filesystems, vfs fails to init with the following error:
> Error starting daemon: error initializing graphdriver: Failed to mknod
> /go/src/github.com/docker/docker/bundles/test-integration/d6bcf6de610e9/root/vfs/backingFsBlockDev:
> function not implemented
As quota is not essential for vfs, let's ignore (but log as a warning) any error
from quota init.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
(cherry picked from commit 1e8a087850aa9f96c5000a3ad90757d2e9c0499f)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
If mknod() returns ENOSYS, it most probably means quota is not supported
here, so return the appropriate error.
This is a conservative* fix to regression in vfs graph driver introduced
by commit 7a1618ced359a3ac92 ("add quota support to VFS graphdriver").
On some filesystems, vfs fails to init with the following error:
> Error starting daemon: error initializing graphdriver: Failed to mknod
> /go/src/github.com/docker/docker/bundles/test-integration/d6bcf6de610e9/root/vfs/backingFsBlockDev:
> function not implemented
Reported-by: Brian Goff <cpuguy83@gmail.com>
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
(cherry picked from commit 2dd39b7841bdb9968884bbedc5db97ff77d4fe3e)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Signed-off-by: Nicolas De Loof <nicolas.deloof@gmail.com>
(cherry picked from commit 852a943c773382df09cdda4f29f9e93807523178)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Follow the conventions for namespace naming set out by other projects,
such as linuxkit and cri-containerd. Typically, they are some sort of
host name, with a subdomain describing functionality of the namespace.
In the case of linuxkit, services are launched in `services.linuxkit`.
In cri-containerd, pods are launched in `k8s.io`, making it clear that
these are from kubernetes.
Signed-off-by: Stephen J Day <stephen.day@docker.com>
(cherry picked from commit 521e7eba86df25857647b93f13e5366c554e9d63)
Signed-off-by: Eli Uriegas <eli.uriegas@docker.com>
The previous bytes counter was moved out of scope was not counting the
total number of bytes in the batch. This type encapsulates the counter
and the batch for consideration and code ergonomics.
Signed-off-by: Jacob Vallejo <jakeev@amazon.com>
(cherry picked from commit ad14dbf1346742f0607d7c28a8ef3d4064f5f9fd)
Signed-off-by: Anusha Ragunathan <anusha.ragunathan@docker.com>
When the containerd 1.0 runtime changes were made, we inadvertantly
removed the functionality where any running containers are killed on
startup when not using live-restore.
This change restores that behavior.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
(cherry picked from commit e69127bd5ba4dcf8ae1f248db93a95795eb75b93)
Signed-off-by: Eli Uriegas <eli.uriegas@docker.com>
With the contianerd 1.0 migration we now have strongly typed errors that
we can check for process not found.
We also had some bad error checks looking for `ESRCH` which would only
be returned from `unix.Kill` and never from containerd even though we
were checking containerd responses for it.
Fixes some race conditions around process handling and our error checks
that could lead to errors that propagate up to the user that should not.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
(cherry picked from commit e55bead518e4c72cdecf7de2e49db6c477cb58eb)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
The `repository:shortid` syntax for referencing images is very little used,
collides with with tag references can be confused with digest references.
The `repository:shortid` notation was deprecated in Docker 1.13 through
5fc71599a0b77189f0fedf629ed43c7f7067956c, and scheduled for removal
in Docker 17.12.
This patch removes the support for this notation.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
(cherry picked from commit a942c92dd77aff229680c7ae2a6de27687527b8a)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
This change adds a Platform struct with a Name field and a general
Components field to the Version API type. This will allow API
consumers to show version information for the whole platform and
it will allow API providers to set the versions for the various
components of the platform.
All changes here are backwards compatible.
Signed-off-by: Tibor Vass <tibor@docker.com>
Upstream-commit: 9152e63290e4a4e586b811cce39082efc649b912
Component: engine
Add a new configuration option to allow the enabling
of the networkDB debug. The option is only parsed using the
reload event. This will protect the daemon on start or restart
if the option is left behind in the config file
Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>
Upstream-commit: a97e45794ea8318a08daf763a5b63b04184a886b
Component: engine
Even though it's highly discouraged, there are existing
installs that are running overlay/overlay2 on filesystems
without d_type support.
This patch allows the daemon to start in such cases, instead of
refusing to start without an option to override.
For fresh installs, backing filesystems without d_type support
will still cause the overlay/overlay2 drivers to be marked as
"unsupported", and skipped during the automatic selection.
This feature is only to keep backward compatibility, but
will be removed at some point.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Upstream-commit: 0a4e793a3da9ba6d20bccfb83f7c48e20a76d895
Component: engine
Support for running overlay/overlay2 on a backing filesystem
without d_type support (most likely: xfs, as ext4 supports
this by default), was deprecated for some time.
Running without d_type support is problematic, and can
lead to difficult to debug issues ("invalid argument" errors,
or unable to remove files from the container's filesystem).
This patch turns the warning that was previously printed
into an "unsupported" error, so that the overlay/overlay2
drivers are not automatically selected when detecting supported
storage drivers.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Upstream-commit: 0abb8dec3f730f3ad2cc9a161c97968a6bfd0631
Component: engine
The fsmagic check was always performed on "data-root" (`/var/lib/docker`),
not on the storage-driver's home directory (e.g. `/var/lib/docker/<somedriver>`).
This caused detection to be done on the wrong filesystem in situations
where `/var/lib/docker/<somedriver>` was a mount, and a different
filesystem than `/var/lib/docker` itself.
This patch checks if the storage-driver's home directory exists, and only
falls back to `/var/lib/docker` if it doesn't exist.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Upstream-commit: f9c8fa305e1501d8056f8744cb193a720aab0e13
Component: engine
Previously, the code would set the mtime on the directories before
creating files in the directory itself. This was problematic
because it resulted in the mtimes on the directories being
incorrectly set. This change makes it so that the mtime is
set only _after_ all of the files have been created.
Signed-off-by: Sargun Dhillon <sargun@sargun.me>
Upstream-commit: 77a2bc3e5bbc9be3fe166ed8321b7cd04e7bd097
Component: engine