This also update:
- runc to 3f2f8b84a77f73d38244dd690525642a72156c64
- runtime-specs to v1.0.0
Signed-off-by: Kenfe-Mickael Laventure <mickael.laventure@gmail.com>
Upstream-commit: 45d85c99139bbd16004bbedb7d5bac6a60264538
Component: engine
Use strongly typed errors to set HTTP status codes.
Error interfaces are defined in the api/errors package and errors
returned from controllers are checked against these interfaces.
Errors can be wraeped in a pkg/errors.Causer, as long as somewhere in the
line of causes one of the interfaces is implemented. The special error
interfaces take precedence over Causer, meaning if both Causer and one
of the new error interfaces are implemented, the Causer is not
traversed.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
Upstream-commit: ebcb7d6b406fe50ea9a237c73004d75884184c33
Component: engine
Since the commit d88fe447df0e8 ("Add support for sharing /dev/shm/ and
/dev/mqueue between containers") container's /dev/shm is mounted on the
host first, then bind-mounted inside the container. This is done that
way in order to be able to share this container's IPC namespace
(and the /dev/shm mount point) with another container.
Unfortunately, this functionality breaks container checkpoint/restore
(even if IPC is not shared). Since /dev/shm is an external mount, its
contents is not saved by `criu checkpoint`, and so upon restore any
application that tries to access data under /dev/shm is severily
disappointed (which usually results in a fatal crash).
This commit solves the issue by introducing new IPC modes for containers
(in addition to 'host' and 'container:ID'). The new modes are:
- 'shareable': enables sharing this container's IPC with others
(this used to be the implicit default);
- 'private': disables sharing this container's IPC.
In 'private' mode, container's /dev/shm is truly mounted inside the
container, without any bind-mounting from the host, which solves the
issue.
While at it, let's also implement 'none' mode. The motivation, as
eloquently put by Justin Cormack, is:
> I wondered a while back about having a none shm mode, as currently it is
> not possible to have a totally unwriteable container as there is always
> a /dev/shm writeable mount. It is a bit of a niche case (and clearly
> should never be allowed to be daemon default) but it would be trivial to
> add now so maybe we should...
...so here's yet yet another mode:
- 'none': no /dev/shm mount inside the container (though it still
has its own private IPC namespace).
Now, to ultimately solve the abovementioned checkpoint/restore issue, we'd
need to make 'private' the default mode, but unfortunately it breaks the
backward compatibility. So, let's make the default container IPC mode
per-daemon configurable (with the built-in default set to 'shareable'
for now). The default can be changed either via a daemon CLI option
(--default-shm-mode) or a daemon.json configuration file parameter
of the same name.
Note one can only set either 'shareable' or 'private' IPC modes as a
daemon default (i.e. in this context 'host', 'container', or 'none'
do not make much sense).
Some other changes this patch introduces are:
1. A mount for /dev/shm is added to default OCI Linux spec.
2. IpcMode.Valid() is simplified to remove duplicated code that parsed
'container:ID' form. Note the old version used to check that ID does
not contain a semicolon -- this is no longer the case (tests are
modified accordingly). The motivation is we should either do a
proper check for container ID validity, or don't check it at all
(since it is checked in other places anyway). I chose the latter.
3. IpcMode.Container() is modified to not return container ID if the
mode value does not start with "container:", unifying the check to
be the same as in IpcMode.IsContainer().
3. IPC mode unit tests (runconfig/hostconfig_test.go) are modified
to add checks for newly added values.
[v2: addressed review at https://github.com/moby/moby/pull/34087#pullrequestreview-51345997]
[v3: addressed review at https://github.com/moby/moby/pull/34087#pullrequestreview-53902833]
[v4: addressed the case of upgrading from older daemon, in this case
container.HostConfig.IpcMode is unset and this is valid]
[v5: document old and new IpcMode values in api/swagger.yaml]
[v6: add the 'none' mode, changelog entry to docs/api/version-history.md]
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Upstream-commit: 7120976d74195a60334c688a061270a4d95f9aeb
Component: engine
There is no case which would resolve in this error. The root user always exists, and if the id maps are empty, the default value of 0 is correct.
Signed-off-by: Daniel Nephin <dnephin@docker.com>
Upstream-commit: 93fbdb69acf9248283a91a1c5c6ea24711c26eda
Component: engine
This adds support to specify custom container paths for secrets.
Signed-off-by: Evan Hazlett <ejhazlett@gmail.com>
Upstream-commit: 67d282a5c95ca1d25cd4e9c688e89191f662d448
Component: engine
- Moved DefaultInitBinary from daemon/daemon.go to
daemon/config/config.go since it's a daemon config and is referred in
config package files.
- Added condition in GetInitPath to check for any explicitly configured
DefaultInitBinary. If not, the default value of DefaultInitBinary is
returned.
- Changed all references of DefaultInitBinary to refer to the variable
from new location.
- Added TestCommonUnixGetInitPath to test for the various values of
GetInitPath.
Fixes#32314
Signed-off-by: Sunny Gogoi <indiasuny000@gmail.com>
Upstream-commit: 17b128876028022991e2dbcb2cc402cc81b451e5
Component: engine
This introduce a new `--device-cgroup-rule` flag that allow a user to
add one or more entry to the container cgroup device `devices.allow`
Signed-off-by: Kenfe-Mickael Laventure <mickael.laventure@gmail.com>
Upstream-commit: 1756af6fafabd9197feb56c0324e49dd7d30b11f
Component: engine
In certain cases (unattended upgrades), system services can disable
loaded AppArmor profiles. However, since /etc being read-only is a
supported setup we cannot just write a copy of the profile to
/etc/apparmor.d.
Instead, dynamically load the docker-default AppArmor profile if a
container is started with that profile set. This code will short-cut if
the profile is already loaded.
Fixes: 2f7596aaef3a ("apparmor: do not save profile to /etc/apparmor.d")
Signed-off-by: Aleksa Sarai <asarai@suse.de>
Upstream-commit: 567ef8e7858ca4f282f598ba1f5a951cbad39e83
Component: engine
- use /secrets for swarm secret create route
- do not specify omitempty for secret and secret reference
- simplify lookup for secret ids
- do not use pointer for secret grpc conversion
Signed-off-by: Evan Hazlett <ejhazlett@gmail.com>
Upstream-commit: 189f89301e0abfee32447f2ca23dacd3a96de06d
Component: engine
- fix lint issues
- use errors pkg for wrapping errors
- cleanup on error when setting up secrets mount
- fix erroneous import
- remove unneeded switch for secret reference mode
- return single mount for secrets instead of slice
Signed-off-by: Evan Hazlett <ejhazlett@gmail.com>
Upstream-commit: 857e60c2f943a09e3ec0ac0f236821b797935900
Component: engine
containers may specify these cgroup values at runtime. This will allow
processes to change their priority to real-time within the container
when CONFIG_RT_GROUP_SCHED is enabled in the kernel. See #22380.
Also added sanity checks for the new --cpu-rt-runtime and --cpu-rt-period
flags to ensure that that the kernel supports these features and that
runtime is not greater than period.
Daemon will support a --cpu-rt-runtime flag to initialize the parent
cgroup on startup, this prevents the administrator from alotting runtime
to docker after each restart.
There are additional checks that could be added but maybe too far? Check
parent cgroups to ensure values are <= parent, inspecting rtprio ulimit
and issuing a warning.
Signed-off-by: Erik St. Martin <alakriti@gmail.com>
Upstream-commit: 56f77d5ade945b3b8816a6c8acb328b7c6dce9a7
Component: engine
This adds a small C binary for fighting zombies. It is mounted under
`/dev/init` and is prepended to the args specified by the user. You
enable it via a daemon flag, `dockerd --init`, as it is disable by
default for backwards compat.
You can also override the daemon option or specify this on a per
container basis with `docker run --init=true|false`.
You can test this by running a process like this as the pid 1 in a
container and see the extra zombie that appears in the container as it
is running.
```c
int main(int argc, char ** argv) {
pid_t pid = fork();
if (pid == 0) {
pid = fork();
if (pid == 0) {
exit(0);
}
sleep(3);
exit(0);
}
printf("got pid %d and exited\n", pid);
sleep(20);
}
```
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
Upstream-commit: ee3ac3aa66bfb27b7c21dfb253fdaa113baedd4e
Component: engine
This fix tries to fix 26326 where `docker inspect` will not show
ulimit even when daemon default ulimit has been set.
This fix merge the HostConfig's ulimit with daemon default in
`docker inspect`, so that when daemon is started with `default-ulimit`
and HostConfig's ulimit is not set, `docker inspect` will output
the daemon default.
An integration test has been added to cover the changes.
This fix fixes 26326.
Signed-off-by: Yong Tang <yong.tang.github@outlook.com>
Upstream-commit: 7d705a7355d650feffc966e08efc0f92297145a8
Component: engine
This moves the types for the `engine-api` repo to the existing types
package.
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
Upstream-commit: 91e197d614547f0202e6ae9b8a24d88ee131d950
Component: engine
This fix tries to address the issue raised in #22420. When
`--tmpfs` is specified with `/tmp`, the default value is
`rw,nosuid,nodev,noexec,relatime,size=65536k`. When `--tmpfs`
is specified with `/tmp:rw`, then the value changed to
`rw,nosuid,nodev,noexec,relatime`.
The reason for such an inconsistency is because docker tries
to add `size=65536k` option only when user provides no option.
This fix tries to address this issue by always pre-progating
`size=65536k` along with `rw,nosuid,nodev,noexec,relatime`.
If user provides a different value (e.g., `size=8192k`), it
will override the `size=65536k` anyway since the combined
options will be parsed and merged to remove any duplicates.
Additional test cases have been added to cover the changes
in this fix.
This fix fixes#22420.
Signed-off-by: Yong Tang <yong.tang.github@outlook.com>
Upstream-commit: 397a6fefadf9ac91a5c9de2447f4dea607296470
Component: engine
Signed-off-by: Mrunal Patel <mrunalp@gmail.com>
Set up the mount label in the spec for a container
Signed-off-by: Mrunal Patel <mrunalp@gmail.com>
Upstream-commit: e0f98c698b49e3790fe63bff611eeda6f5b46055
Component: engine
This patch will allow users to specify namespace specific "kernel parameters"
for running inside of a container.
Signed-off-by: Dan Walsh <dwalsh@redhat.com>
Upstream-commit: 9caf7aeefd23263a209c26c8439d26c147972d81
Component: engine
This vendors in new spec/runc that supports
setting readonly and masked paths in the
configuration. Using this allows us to make an
exception for `—-privileged`.
Signed-off-by: Tonis Tiigi <tonistiigi@gmail.com>
Upstream-commit: 3f81b4935292d5daedea9de4e2db0895986115da
Component: engine
runc expects a systemd cgroupsPath to be in slice:scopePrefix:containerName
format and the "--systemd-cgroup" option to be set. Update docker accordingly.
Fixes 21475
Signed-off-by: Anusha Ragunathan <anusha@docker.com>
Upstream-commit: 7ed3d265a4499ec03f10537fea0aac3ebaa0cec6
Component: engine