On unix, merge secrets/configs handling. This is important because
configs can contain secrets (via templating) and potentially a config
could just simply have secret information "by accident" from the user.
This just make sure that configs are as secure as secrets and de-dups a
lot of code.
Generally this makes everything simpler and configs more secure.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
Upstream-commit: c02171802b788fb2d4d48bebcee2a57c8eabeeaa
Component: engine
It's a common scenario for admins and/or monitoring applications to
mount in the daemon root dir into a container. When doing so all mounts
get coppied into the container, often with private references.
This can prevent removal of a container due to the various mounts that
must be configured before a container is started (for example, for
shared /dev/shm, or secrets) being leaked into another namespace,
usually with private references.
This is particularly problematic on older kernels (e.g. RHEL < 7.4)
where a mount may be active in another namespace and attempting to
remove a mountpoint which is active in another namespace fails.
This change moves all container resource mounts into a common directory
so that the directory can be made unbindable.
What this does is prevents sub-mounts of this new directory from leaking
into other namespaces when mounted with `rbind`... which is how all
binds are handled for containers.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
Upstream-commit: eaa5192856c1ad09614318e88030554b96bb6e81
Component: engine
This fix tries to address the issue raised in 35920 where the filter
of `docker ps` with `health=starting` always returns nothing.
The issue was that in container view, the human readable string (`HealthString()` => `Health.String()`)
of health status was used. In case of starting it is `"health: starting"`.
However, the filter still uses `starting` so no match returned.
This fix fixes the issue by using `container.Health.Status()` instead so that it matches
the string (`starting`) passed by filter.
This fix fixes 35920.
Signed-off-by: Yong Tang <yong.tang.github@outlook.com>
Upstream-commit: 97b16aecf9275f4103c2737b79d0c5e81583aa58
Component: engine
Files that are suffixed with `_linux.go` or `_windows.go` are
already only built on Linux / Windows, so these build-tags
were redundant.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Upstream-commit: 6ed1163c98703f8dd0693cecbadc84d2cda811c3
Component: engine
Solaris is no longer being worked on, so these files
are now just dead code.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Upstream-commit: 1589cc0a85396e2768bfe9e558c7c2100dc3bc87
Component: engine
This is a fix to the following issue:
$ docker run --tmpfs /dev/shm busybox sh
docker: Error response from daemon: linux mounts: Duplicate mount point '/dev/shm'.
In current code (daemon.createSpec()), tmpfs mount from --tmpfs is added
to list of mounts (`ms`), when the mount from IpcMounts() is added.
While IpcMounts() is checking for existing mounts first, it does that
by using container.HasMountFor() function which only checks container.Mounts
but not container.Tmpfs.
Ultimately, the solution is to get rid of container.Tmpfs (moving its
data to container.Mounts). Current workaround is to add checking
of container.Tmpfs into container.HasMountFor().
A unit test case is included.
Unfortunately we can't call daemon.createSpec() from a unit test,
as the code relies a lot on various daemon structures to be initialized
properly, and it is hard to achieve. Therefore, we minimally mimick
the code flow of daemon.createSpec() -- barely enough to reproduce
the issue.
https://github.com/moby/moby/issues/35455
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Upstream-commit: 1861abdc4a31efad202a5c3d89a895bb7a62799a
Component: engine
The code in question looks up mounts two times: first by using
HasMountFor(), and then directly by looking in container.MountPoints.
There is no need to do it twice.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Upstream-commit: eab3ac3e70a510b97f9399efd13e3dc01a07c413
Component: engine
Adds a mutex to protect the status, as well. When running the race
detector with the unit test, we can see that the Status field is written
without holding this lock. Adding a mutex to read and set status
addresses the issue.
Signed-off-by: Stephen J Day <stephen.day@docker.com>
Upstream-commit: 7db30ab0cdf072956d2ceda833b7de22fe17655c
Component: engine
While this code was likely called from a single thread before, we have
now seen panics, indicating that it could be called in parallel. This
change adds a mutex to protect opening and closing of the channel. There
may be another root cause associated with this panic, such as something
that led to the calling of this in parallel, as this code is old and we
had seen this condition until recently.
This fix is by no means a permanent fix. Typically, bugs like this
indicate misplaced channel ownership. In idiomatic uses, the channel
should have a particular "owner" that coordinates sending and closure.
In this case, the owner of the channel is unclear, so it gets opened
lazily. Synchronizing this access is a decent solution, but a refactor
may yield better results.
Signed-off-by: Stephen J Day <stephen.day@docker.com>
Upstream-commit: 5b55747a523671fa6e626848060460a48d058451
Component: engine
Currently, if a container removal has failed for some reason,
any client waiting for removal (e.g. `docker run --rm`) is
stuck, waiting for removal to succeed while it has failed already.
For more details and the reproducer, please check
https://github.com/moby/moby/issues/34945
This commit addresses that by allowing `ContainerWait()` with
`container.WaitCondition == "removed"` argument to return an
error in case of removal failure. The `ContainerWaitOKBody`
stucture returned to a client is amended with a pointer to `struct Error`,
containing an error message string, and the `Client.ContainerWait()`
is modified to return the error, if any, to the client.
Note that this feature is only available for API version >= 1.34.
In order for the old clients to be unstuck, we just close the connection
without writing anything -- this causes client's error.
Now, docker-cli would need a separate commit to bump the API to 1.34
and to show an error returned, if any.
[v2: recreate the waitRemove channel after closing]
[v3: document; keep legacy behavior for older clients]
[v4: convert Error from string to pointer to a struct]
[v5: don't emulate old behavior, send empty response in error case]
[v6: rename legacy* vars to include version suffix]
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Upstream-commit: f963500c544daa3c158c0ca3d2985295c875cb6b
Component: engine
The shutdown timeout for containers in insufficient on Windows. If the daemon is shutting down, and a container takes longer than expected to shut down, this can cause the container to remain in a bad state after restart, and never be able to start again. Increasing the timeout makes this less likely to occur.
Signed-off-by: Darren Stahl <darst@microsoft.com>
Upstream-commit: ed74ee127f42f32ee98be7b908e1562b1c0554d7
Component: engine
Signed-off-by: John Howard <jhoward@microsoft.com>
This PR has the API changes described in https://github.com/moby/moby/issues/34617.
Specifically, it adds an HTTP header "X-Requested-Platform" which is a JSON-encoded
OCI Image-spec `Platform` structure.
In addition, it renames (almost all) uses of a string variable platform (and associated)
methods/functions to os. This makes it much clearer to disambiguate with the swarm
"platform" which is really os/arch. This is a stepping stone to getting the daemon towards
fully multi-platform/arch-aware, and makes it clear when "operating system" is being
referred to rather than "platform" which is misleadingly used - sometimes in the swarm
meaning, but more often as just the operating system.
Upstream-commit: 0380fbff37922cadf294851b1546f4c212c7f364
Component: engine
Take an extra reference to rwlayer while the container is being
committed or exported to avoid the removal of that layer.
Also add some checks before commit/export.
Signed-off-by: Yuanhong Peng <pengyuanhong@huawei.com>
Upstream-commit: 8c32659979150630a2c4eae4e7da944806c46297
Component: engine
The promise package represents a simple enough concurrency pattern that
replicating it in place is sufficient. To end the propagation of this
package, it has been removed and the uses have been inlined.
While this code could likely be refactored to be simpler without the
package, the changes have been minimized to reduce the possibility of
defects. Someone else may want to do further refactoring to remove
closures and reduce the number of goroutines in use.
Signed-off-by: Stephen J Day <stephen.day@docker.com>
Upstream-commit: 0cd4ab3f9a3f242468484fc62b46e632fdba5e13
Component: engine
This enables docker cp and ADD/COPY docker build support for LCOW.
Originally, the graphdriver.Get() interface returned a local path
to the container root filesystem. This does not work for LCOW, so
the Get() method now returns an interface that LCOW implements to
support copying to and from the container.
Signed-off-by: Akash Gupta <akagup@microsoft.com>
Upstream-commit: 7a7357dae1bcccb17e9b2d4c7c8f5c025fce56ca
Component: engine
Use strongly typed errors to set HTTP status codes.
Error interfaces are defined in the api/errors package and errors
returned from controllers are checked against these interfaces.
Errors can be wraeped in a pkg/errors.Causer, as long as somewhere in the
line of causes one of the interfaces is implemented. The special error
interfaces take precedence over Causer, meaning if both Causer and one
of the new error interfaces are implemented, the Causer is not
traversed.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
Upstream-commit: ebcb7d6b406fe50ea9a237c73004d75884184c33
Component: engine
Since the commit d88fe447df0e8 ("Add support for sharing /dev/shm/ and
/dev/mqueue between containers") container's /dev/shm is mounted on the
host first, then bind-mounted inside the container. This is done that
way in order to be able to share this container's IPC namespace
(and the /dev/shm mount point) with another container.
Unfortunately, this functionality breaks container checkpoint/restore
(even if IPC is not shared). Since /dev/shm is an external mount, its
contents is not saved by `criu checkpoint`, and so upon restore any
application that tries to access data under /dev/shm is severily
disappointed (which usually results in a fatal crash).
This commit solves the issue by introducing new IPC modes for containers
(in addition to 'host' and 'container:ID'). The new modes are:
- 'shareable': enables sharing this container's IPC with others
(this used to be the implicit default);
- 'private': disables sharing this container's IPC.
In 'private' mode, container's /dev/shm is truly mounted inside the
container, without any bind-mounting from the host, which solves the
issue.
While at it, let's also implement 'none' mode. The motivation, as
eloquently put by Justin Cormack, is:
> I wondered a while back about having a none shm mode, as currently it is
> not possible to have a totally unwriteable container as there is always
> a /dev/shm writeable mount. It is a bit of a niche case (and clearly
> should never be allowed to be daemon default) but it would be trivial to
> add now so maybe we should...
...so here's yet yet another mode:
- 'none': no /dev/shm mount inside the container (though it still
has its own private IPC namespace).
Now, to ultimately solve the abovementioned checkpoint/restore issue, we'd
need to make 'private' the default mode, but unfortunately it breaks the
backward compatibility. So, let's make the default container IPC mode
per-daemon configurable (with the built-in default set to 'shareable'
for now). The default can be changed either via a daemon CLI option
(--default-shm-mode) or a daemon.json configuration file parameter
of the same name.
Note one can only set either 'shareable' or 'private' IPC modes as a
daemon default (i.e. in this context 'host', 'container', or 'none'
do not make much sense).
Some other changes this patch introduces are:
1. A mount for /dev/shm is added to default OCI Linux spec.
2. IpcMode.Valid() is simplified to remove duplicated code that parsed
'container:ID' form. Note the old version used to check that ID does
not contain a semicolon -- this is no longer the case (tests are
modified accordingly). The motivation is we should either do a
proper check for container ID validity, or don't check it at all
(since it is checked in other places anyway). I chose the latter.
3. IpcMode.Container() is modified to not return container ID if the
mode value does not start with "container:", unifying the check to
be the same as in IpcMode.IsContainer().
3. IPC mode unit tests (runconfig/hostconfig_test.go) are modified
to add checks for newly added values.
[v2: addressed review at https://github.com/moby/moby/pull/34087#pullrequestreview-51345997]
[v3: addressed review at https://github.com/moby/moby/pull/34087#pullrequestreview-53902833]
[v4: addressed the case of upgrading from older daemon, in this case
container.HostConfig.IpcMode is unset and this is valid]
[v5: document old and new IpcMode values in api/swagger.yaml]
[v6: add the 'none' mode, changelog entry to docs/api/version-history.md]
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Upstream-commit: 7120976d74195a60334c688a061270a4d95f9aeb
Component: engine
Delete needs to release names related to a container even if that
container isn't present in the db. However, slightly overzealous error
checking causes the transaction to get rolled back. Ignore the error
from Delete on the container itself, since it may not be present.
Signed-off-by: Aaron Lehmann <aaron.lehmann@docker.com>
Upstream-commit: 1d9546fc62c559dbcbb3dbdce40318fb7c4d67a2
Component: engine
Currently, names are maintained by a separate system called "registrar".
This means there is no way to atomically snapshot the state of
containers and the names associated with them.
We can add this atomicity and simplify the code by storing name
associations in the memdb. This removes the need for pkg/registrar, and
makes snapshots a lot less expensive because they no longer need to copy
all the names. This change also avoids some problematic behavior from
pkg/registrar where it returns slices which may be modified later on.
Note that while this change makes the *snapshotting* atomic, it doesn't
yet do anything to make sure containers are named at the same time that
they are added to the database. We can do that by adding a transactional
interface, either as a followup, or as part of this PR.
Signed-off-by: Aaron Lehmann <aaron.lehmann@docker.com>
Upstream-commit: 1128fc1add66a849c12d2045aed39605e673abc6
Component: engine
Do not change pause state when restoring container's
status, or status in docker will be different with
status in runc.
Signed-off-by: Fengtu Wang <wangfengtu@huawei.com>
Upstream-commit: 977c4046fd2147d7c04f4b513a94138013ca0dd6
Component: engine
Do not allow sharing of container network with hyperv containers
Signed-off-by: Madhan Raj Mookkandy <madhanm@microsoft.com>
Upstream-commit: 349913ce9fde34d8acd08fad5ce866401f4d135e
Component: engine