I was able to successfully use device mapper autoconfig feature
(commit 5ef07d79c) but it stopped working after a reboot.
Investigation shown that the dm device was not activated because of
a missing binary, that is not used during initial setup, but every
following time. Here's an error shown when trying to manually activate
the device:
> kir@kd:~/go/src/github.com/docker/docker$ sudo lvchange -a y /dev/docker/thinpool
> /usr/sbin/thin_check: execvp failed: No such file or directory
> Check of pool docker/thinpool failed (status:2). Manual repair required!
Surely, there is no solution to this other than to have a package that
provides the thin_check binary installed beforehand. Due to the fact
the issue revealed itself way later than DM setup was performed, it was
somewhat harder to investigate.
With this in mind, let's check for binary presense before setting up DM,
refusing to proceed if the binary is not there, saving a user from later
frustration.
While at it, eliminate repeated binary checking code. The downside is
that the binary lookup is happening more than once now -- I think the
clarity of code overweights this minor de-optimization.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Upstream-commit: 58a453f3f06c1daf34544da8aa16bb95e8e18010
Component: engine
Since the commit d88fe447df0e8 ("Add support for sharing /dev/shm/ and
/dev/mqueue between containers") container's /dev/shm is mounted on the
host first, then bind-mounted inside the container. This is done that
way in order to be able to share this container's IPC namespace
(and the /dev/shm mount point) with another container.
Unfortunately, this functionality breaks container checkpoint/restore
(even if IPC is not shared). Since /dev/shm is an external mount, its
contents is not saved by `criu checkpoint`, and so upon restore any
application that tries to access data under /dev/shm is severily
disappointed (which usually results in a fatal crash).
This commit solves the issue by introducing new IPC modes for containers
(in addition to 'host' and 'container:ID'). The new modes are:
- 'shareable': enables sharing this container's IPC with others
(this used to be the implicit default);
- 'private': disables sharing this container's IPC.
In 'private' mode, container's /dev/shm is truly mounted inside the
container, without any bind-mounting from the host, which solves the
issue.
While at it, let's also implement 'none' mode. The motivation, as
eloquently put by Justin Cormack, is:
> I wondered a while back about having a none shm mode, as currently it is
> not possible to have a totally unwriteable container as there is always
> a /dev/shm writeable mount. It is a bit of a niche case (and clearly
> should never be allowed to be daemon default) but it would be trivial to
> add now so maybe we should...
...so here's yet yet another mode:
- 'none': no /dev/shm mount inside the container (though it still
has its own private IPC namespace).
Now, to ultimately solve the abovementioned checkpoint/restore issue, we'd
need to make 'private' the default mode, but unfortunately it breaks the
backward compatibility. So, let's make the default container IPC mode
per-daemon configurable (with the built-in default set to 'shareable'
for now). The default can be changed either via a daemon CLI option
(--default-shm-mode) or a daemon.json configuration file parameter
of the same name.
Note one can only set either 'shareable' or 'private' IPC modes as a
daemon default (i.e. in this context 'host', 'container', or 'none'
do not make much sense).
Some other changes this patch introduces are:
1. A mount for /dev/shm is added to default OCI Linux spec.
2. IpcMode.Valid() is simplified to remove duplicated code that parsed
'container:ID' form. Note the old version used to check that ID does
not contain a semicolon -- this is no longer the case (tests are
modified accordingly). The motivation is we should either do a
proper check for container ID validity, or don't check it at all
(since it is checked in other places anyway). I chose the latter.
3. IpcMode.Container() is modified to not return container ID if the
mode value does not start with "container:", unifying the check to
be the same as in IpcMode.IsContainer().
3. IPC mode unit tests (runconfig/hostconfig_test.go) are modified
to add checks for newly added values.
[v2: addressed review at https://github.com/moby/moby/pull/34087#pullrequestreview-51345997]
[v3: addressed review at https://github.com/moby/moby/pull/34087#pullrequestreview-53902833]
[v4: addressed the case of upgrading from older daemon, in this case
container.HostConfig.IpcMode is unset and this is valid]
[v5: document old and new IpcMode values in api/swagger.yaml]
[v6: add the 'none' mode, changelog entry to docs/api/version-history.md]
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Upstream-commit: 7120976d74195a60334c688a061270a4d95f9aeb
Component: engine
I tried using dm.directlvm_device but it ended up with the following
error:
> Error starting daemon: error initializing graphdriver: error
> writing docker thinp autoextend profile: open
> /etc/lvm/profile/docker-thinpool.profile: no such file or directory
The reason is /etc/lvm/profile directory does not exist. I think it is
better to try creating it beforehand.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Upstream-commit: 6ca20ec771ab7c0ebf64c20021ca795746cf3ccb
Component: engine
It was noted[1] that container's HostConfig.ShmSize, if not set, should be
initialized to daemon default value during container creation.
In fact, it is already done in daemon.adaptContainerSettings, so we can use
value from container.HostConfig directly.
[1] https://github.com/moby/moby/pull/34087#discussion_r128656429
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Upstream-commit: 0fb1fb1ce0177cf31dd96e9fdb4a5f55155a5966
Component: engine
Commit db63f9370e26d725357c703cbaf9ab63cc7b6d0a
extracted daemon configuration to its own
package, but did not update the Solaris stubs.
This updates the Solaris daemon.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Upstream-commit: 335033e25fae0173217e70d4b8dfc5df682ea913
Component: engine
- Remove unused function and variables from the package
- Remove usage of it from `profiles/apparmor` where it wasn't required
- Move the package to `daemon/logger/templates` where it's only used
Signed-off-by: Vincent Demeester <vincent@sbr.pm>
Upstream-commit: 9ef3b535974612b137abae062b7a8a0f7e969871
Component: engine
It is only used in `daemon` and should really live there.
Signed-off-by: Vincent Demeester <vincent@sbr.pm>
Upstream-commit: c204fce2ee926417d1dc7d10c043a81b93d2a72b
Component: engine
Signed-off-by: John Howard <jhoward@microsoft.com>
This changes the graphdriver to perform dynamic sandbox management.
Previously, as a temporary 'hack', the service VM had a prebuilt
sandbox in it. With this change, management is under the control
of the client (docker) and executes a mkfs.ext4 on it. This enables
sandboxes of non-default sizes too (a TODO previously in the code).
It also addresses https://github.com/moby/moby/pull/33969#discussion_r127287887
Requires:
- go-winio: v0.4.3
- opengcs: v0.0.12
- hcsshim: v0.6.x
Upstream-commit: 8c279ef3ad8cd1f019789b8378d0394c80a1807f
Component: engine
Make sure user understands this is about the in-kernel driver
(not the dockerd driver or smth).
While at it, amend the comment as well.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Upstream-commit: aab2450e25b397d38cdcb5e173ef1121283196c2
Component: engine
Add daemon config to allow the user to specify the MTU of the control plane network.
The first user of this new parameter is actually libnetwork that can seed the
gossip with the proper MTU value allowing to pack multiple messages per UDP packet sent.
If the value is not specified or is lower than 1500 the logic will set it to the default.
Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>
Upstream-commit: f9f25ca5e44c89d7c1ebdfa9865076eb2cde9bb2
Component: engine
Switch some more usage of the Stat function and the Stat_t type from the
syscall package to golang.org/x/sys. Those were missing in PR #33399.
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Upstream-commit: 01f70b028e9597ef207509e8124e120688dae185
Component: engine
releaseableLayer includes automatic handling for creating a read/write layer and mounting it on a call to Mount(), but then does not correspondingly unmount the layer before trying to delete it, which will fail for some graphdrivers. Commit on a releaseable layer also leaks the tarstream for the layer. To fix this, the stream close is deferred in Commit and releaseRWLayer now correctly handles unmounting the layer before trying to delete it. In addition, the changes include better error handling in Release() to make sure that errors are returned to the caller for failures on read/write layers instead of being ignored.# Please enter the commit message for your changes. Lines starting
Signed-off-by: Stefan Wernli <swernli@ntdev.microsoft.com>
Upstream-commit: 1d457999c4540aacda68f834bdb3c6f220ce3fd5
Component: engine
GetTasks can call GetService and GetNode with the read lock held. These
methods try to aquire the read side of the same lock. According to the
sync package documentation, this is not safe:
> If a goroutine holds a RWMutex for reading, it must not expect this or
> any other goroutine to be able to also take the read lock until the
> first read lock is released. In particular, this prohibits recursive
> read locking. This is to ensure that the lock eventually becomes
> available; a blocked Lock call excludes new readers from acquiring the
> lock.
Fix GetTasks to use the lower-level getService and getNode methods
instead. Also, use lockedManagerAction to simplify GetTasks.
Signed-off-by: Aaron Lehmann <aaron.lehmann@docker.com>
Upstream-commit: bd4f66c8f1f6ad4a2f228a957f293bc157e13d9c
Component: engine
Specifically, none of the graphdrivers are supposed to return a
not-exist type of error on remove (or at least that's how they are
currently handled).
Found that AUFS still had one case where a not-exist error could escape,
when checking if the directory is mounted we call a `Statfs` on the
path.
This fixes AUFS to not return an error in this case, but also
double-checks at the daemon level on layer remove that the error is not
a `not-exist` type of error.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
Upstream-commit: d42dbdd3d48d0134f8bba7ead92a7067791dffab
Component: engine
Having a map per log entry seemed heavier than necessary. These
attributes end up being sorted and serialized, so storing them in a map
doesn't add anything (there's no random access element). In SwarmKit,
they originate as a slice, so there's an unnecessary conversion to a map
and back.
This also fixes the sort comparator, which used to inefficiently split
the string on each comparison.
Signed-off-by: Aaron Lehmann <aaron.lehmann@docker.com>
Upstream-commit: b642b3f21f17cf50c79e464d3aedc93b2dbf0fb0
Component: engine
Steps to reproduce:
```
# docker run -tid --name aaa ubuntu
57bfd00ac5559f72eec8c1b32a01fe38427d66687940f74611e65137414f0ada
# docker run -tid --name bbb --link aaa ubuntu
23ad18362950f39b638206ab4d1885fd4f50cbd1d16aac9cab8e97e0c8363471
# docker ps --no-trunc
CONTAINER ID IMAGE
COMMAND CREATED STATUS PORTS
NAMES
23ad18362950f39b638206ab4d1885fd4f50cbd1d16aac9cab8e97e0c8363471
ubuntu "/bin/bash" 4 seconds ago Up 3 seconds
bbb
57bfd00ac5559f72eec8c1b32a01fe38427d66687940f74611e65137414f0ada
ubuntu "/bin/bash" 14 seconds ago Up 14
seconds aaa,bbb/aaa
# docker rm -f bbb
bbb
# docker ps --no-trunc
CONTAINER ID IMAGE
COMMAND CREATED STATUS PORTS
NAMES
57bfd00ac5559f72eec8c1b32a01fe38427d66687940f74611e65137414f0ada
ubuntu "/bin/bash" 29 seconds ago Up 28
seconds aaa,bbb/aaa
# docker rm --link bbb/aaa
Error response from daemon: Cannot get parent /bbb for name /bbb/aaa
```
When we rm container `bbb`, we can still see `bbb/aaa` in `docker ps
--no-trunc`. And this link cannot be deleted since container `bbb` has
already been removed.
We should remove links of a container when it is deleted.
Signed-off-by: Yuanhong Peng <pengyuanhong@huawei.com>
Upstream-commit: 600ad5c1b7b736fba6b103eb99ec87efb050b9ec
Component: engine