This fix moves multiple places of serviceRunningTasksCount
to one location in integration/internal/swarm, so that
code duplication could be removed.
Signed-off-by: Yong Tang <yong.tang.github@outlook.com>
(cherry picked from commit e485a60e2bcc59860f387c94f6afaa0130ea7040)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Upstream-commit: c087f681d40332bd7d158baf615c03f6a72274f1
Component: engine
Tests which will re-deploy containers uses function serviceIsUpdated() to
make sure that service update really reached state UpdateStateCompleted.
Tests which will not re-deploy container uses function
serviceSpecIsUpdated to make sure that service version is increased.
Signed-off-by: Olli Janatuinen <olli.janatuinen@gmail.com>
(cherry picked from commit b868ada474b5c214ed9bddb792dd36f794dc1fa6)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Upstream-commit: ccc1abea092a54d99dec51345ef9303b4629572d
Component: engine
[18.09 backport] explicitly set filesystem type for mount to avoid 'invalid argument' error on arm
Upstream-commit: 51ebfcbe423c679f06cfb37f7e6be565e0955639
Component: engine
[18.09 backport] bugfix: fetch the right device number which great than 255
Upstream-commit: b236a1e78d35a62b164d00f4e19f59a556da8831
Component: engine
[18.09 backport] Ensure all integration daemon logging happens before test exit
Upstream-commit: 08f6e9c14fd0de408db93914058d75df5b32ac67
Component: engine
Currently the API spec would allow `"443/tcp": [null]`, but what should
be allowed is `"443/tcp": null`
Signed-off-by: Dominic Tubach <dominic.tubach@to.com>
(cherry picked from commit 32b5d296ea5d392c28affe2854b9d4201166bc27)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Upstream-commit: 3ee1e060fc9a31844064035cc3e8b76fddb8b341
Component: engine
This test runs on a daemon also used by other tests
so make sure we don't get failures if another test
doesn't cleanup or is running in parallel.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
(cherry picked from commit 915acffdb4cf95b934dd9872c1f54ea4487819b7)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Upstream-commit: 05599804157a38db2eda84d3e8060a879b755ab7
Component: engine
Running a bundled aufs benchmark sometimes results in this warning:
> WARN[0001] Couldn't run auplink before unmount /tmp/aufs-tests/aufs/mnt/XXXXX error="exit status 22" storage-driver=aufs
If we take a look at what aulink utility produces on stderr, we'll see:
> auplink:proc_mnt.c:96: /tmp/aufs-tests/aufs/mnt/XXXXX: Invalid argument
and auplink exits with exit code of 22 (EINVAL).
Looking into auplink source code, what happens is it tries to find a
record in /proc/self/mounts corresponding to the mount point (by using
setmntent()/getmntent_r() glibc functions), and it fails.
Some manual testing, as well as runtime testing with lots of printf
added on mount/unmount, as well as calls to check the superblock fs
magic on mount point (as in graphdriver.Mounted(graphdriver.FsMagicAufs, target)
confirmed that this record is in fact there, but sometimes auplink
can't find it. I was also able to reproduce the same error (inability
to find a mount in /proc/self/mounts that should definitely be there)
using a small C program, mocking what `auplink` does:
```c
#include <stdio.h>
#include <err.h>
#include <mntent.h>
#include <string.h>
#include <stdlib.h>
int main(int argc, char **argv)
{
FILE *fp;
struct mntent m, *p;
char a[4096];
char buf[4096 + 1024];
int found =0, lines = 0;
if (argc != 2) {
fprintf(stderr, "Usage: %s <mountpoint>\n", argv[0]);
exit(1);
}
fp = setmntent("/proc/self/mounts", "r");
if (!fp) {
err(1, "setmntent");
}
setvbuf(fp, a, _IOLBF, sizeof(a));
while ((p = getmntent_r(fp, &m, buf, sizeof(buf)))) {
lines++;
if (!strcmp(p->mnt_dir, argv[1])) {
found++;
}
}
printf("found %d entries for %s (%d lines seen)\n", found, argv[1], lines);
return !found;
}
```
I have also wrote a few other C proggies -- one that reads
/proc/self/mounts directly, one that reads /proc/self/mountinfo instead.
They are also prone to the same occasional error.
It is not perfectly clear why this happens, but so far my best theory
is when a lot of mounts/unmounts happen in parallel with reading
contents of /proc/self/mounts, sometimes the kernel fails to provide
continuity (i.e. it skips some part of file or mixes it up in some
other way). In other words, this is a kernel bug (which is probably
hard to fix unless some other interface to get a mount entry is added).
Now, there is no real fix, and a workaround I was able to come up
with is to retry when we got EINVAL. It usually works on the second
attempt, although I've once seen it took two attempts to go through.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
(cherry picked from commit ae431b10a9508e2bf3b1782e9d435855e3e13977)
Upstream-commit: c303e63ca6c3d25d29b8992451898734fa6e4e7c
Component: engine
Do not use filepath.Walk() as there's no requirement to recursively
go into every directory under mnt -- a (non-recursive) list of
directories in mnt is sufficient.
With filepath.Walk(), in case some container will fail to unmount,
it'll go through the whole container filesystem which is both
excessive and useless.
This is similar to commit f1a459229724f5e8e440b49f ("devmapper.shutdown:
optimize")
While at it, raise the priority of "unmount error" message from debug
to a warning. Note we don't have to explicitly add `m` as unmount error (from
pkg/mount) will have it.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
(cherry picked from commit 8fda12c6078ed5c86be0822a7a980c6fbc9220bf)
Upstream-commit: b85d4a4f09badf40201ef7fcbd5b930de216190d
Component: engine
In case there are a big number of layers, so that mount data won't fit
into a single memory page (4096 bytes on most platforms, which is good
enough for about 40 layers, depending on how long graphdriver root path
is), we supply additional layers with O_REMOUNT, as described in aufs
documentation.
Problem is, the current implementation does that one layer at a time
(i.e. there is one mount syscall per each additional layer).
Optimize the code to supply as many layers as we can fit in one page
(basically reusing the same code as for the original mount).
Note, per aufs docs, "[a]t remount-time, the options are interpreted
in the given order, e.g. left to right" so we should be good.
Tested on an image with ~100 layers.
Before (35 syscalls):
> [pid 22756] 1556919088.686955 mount("none", "/mnt/volume_sfo2_09/docker-aufs/aufs/mnt/a86f8c9dd0ec2486293119c20b0ec026e19bbc4d51332c554f7cf05d777c9866", "aufs", 0, "br:/mnt/volume_sfo2_09/docker-au"...) = 0 <0.000504>
> [pid 22756] 1556919088.687643 mount("none", "/mnt/volume_sfo2_09/docker-aufs/aufs/mnt/a86f8c9dd0ec2486293119c20b0ec026e19bbc4d51332c554f7cf05d777c9866", 0xc000c451b0, MS_REMOUNT, "append:/mnt/volume_sfo2_09/docke"...) = 0 <0.000105>
> [pid 22756] 1556919088.687851 mount("none", "/mnt/volume_sfo2_09/docker-aufs/aufs/mnt/a86f8c9dd0ec2486293119c20b0ec026e19bbc4d51332c554f7cf05d777c9866", 0xc000c451ba, MS_REMOUNT, "append:/mnt/volume_sfo2_09/docke"...) = 0 <0.000098>
> ..... (~30 lines skipped for clarity)
> [pid 22756] 1556919088.696182 mount("none", "/mnt/volume_sfo2_09/docker-aufs/aufs/mnt/a86f8c9dd0ec2486293119c20b0ec026e19bbc4d51332c554f7cf05d777c9866", 0xc000c45310, MS_REMOUNT, "append:/mnt/volume_sfo2_09/docke"...) = 0 <0.000266>
After (2 syscalls):
> [pid 24352] 1556919361.799889 mount("none", "/mnt/volume_sfo2_09/docker-aufs/aufs/mnt/8e7ba189e347a834e99eea4ed568f95b86cec809c227516afdc7c70286ff9a20", "aufs", 0, "br:/mnt/volume_sfo2_09/docker-au"...) = 0 <0.001717>
> [pid 24352] 1556919361.801761 mount("none", "/mnt/volume_sfo2_09/docker-aufs/aufs/mnt/8e7ba189e347a834e99eea4ed568f95b86cec809c227516afdc7c70286ff9a20", 0xc000dbecb0, MS_REMOUNT, "append:/mnt/volume_sfo2_09/docke"...) = 0 <0.001358>
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
(cherry picked from commit d58c434bffef76e48bff75ede290937874488dd3)
Upstream-commit: 75488521735325e4564e66d9cf9c2b1bc1c2c64b
Component: engine
Apparently there is some kind of race in aufs kernel module code,
which leads to the errors like:
[98221.158606] aufs au_xino_create2:186:dockerd[25801]: aufs.xino create err -17
[98221.162128] aufs au_xino_set:1229:dockerd[25801]: I/O Error, failed creating xino(-17).
[98362.239085] aufs au_xino_create2:186:dockerd[6348]: aufs.xino create err -17
[98362.243860] aufs au_xino_set:1229:dockerd[6348]: I/O Error, failed creating xino(-17).
[98373.775380] aufs au_xino_create:767:dockerd[27435]: open /dev/shm/aufs.xino(-17)
[98389.015640] aufs au_xino_create2:186:dockerd[26753]: aufs.xino create err -17
[98389.018776] aufs au_xino_set:1229:dockerd[26753]: I/O Error, failed creating xino(-17).
[98424.117584] aufs au_xino_create:767:dockerd[27105]: open /dev/shm/aufs.xino(-17)
So, we have to have a lock around mount syscall.
While at it, don't call the whole Unmount() on an error path, as
it leads to bogus error from auplink flush.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
(cherry picked from commit 5cd62852fa199f272b542d828c8c5d1db427ea53)
Upstream-commit: bb4b9fe29eba57ed71d8105ec0e5dacc8e72129e
Component: engine
1. Use mount.Unmount() which ignores EINVAL ("not mounted") error,
and provides better error diagnostics (so we don't have to explicitly
add target to error messages).
2. Since we're ignoring "not mounted" error, we can call
multiple unmounts without any locking -- but since "auplink flush"
is still involved and can produce an error in logs, let's keep
the check for fs being mounted (it's just a statfs so should be fast).
2. While at it, improve the "can't unmount" error message in Put().
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
(cherry picked from commit 4beee98026feabe4f4f0468215b8fd9b56f90d5e)
Upstream-commit: 5b68d00abc4cad8b707a9a01dde9a420ca6eb995
Component: engine
Both mount and unmount calls are already protected by fine-grained
(per id) locks in Get()/Put() introduced in commit fc1cf1911bb
("Add more locking to storage drivers"), so there's no point in
having a global lock in mount/unmount.
The only place from which unmount is called without any locking
is Cleanup() -- this is to be addressed in the next patch.
This reverts commit 824c24e6802ad3ed7e26b4f16e5ae81869b98185.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
(cherry picked from commit f93750b2c4d5f6144f0790ffa89291da3c097b80)
Upstream-commit: 701939112efc71a0c867a626f0a56df9c56030db
Component: engine
The function is not needed as it's just a shallow wrapper around
unix.Mount().
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
(cherry picked from commit 2f98b5f51fb0a21e1bd930c0e660c4a7c4918aa5)
Upstream-commit: 023b63a0f276b79277c1c961d8c98d4d5b39525d
Component: engine
The errors returned from Mount and Unmount functions are raw
syscall.Errno errors (like EPERM or EINVAL), which provides
no context about what has happened and why.
Similar to os.PathError type, introduce mount.Error type
with some context. The error messages will now look like this:
> mount /tmp/mount-tests/source:/tmp/mount-tests/target, flags: 0x1001: operation not permitted
or
> mount tmpfs:/tmp/mount-test-source-516297835: operation not permitted
Before this patch, it was just
> operation not permitted
[v2: add Cause()]
[v3: rename MountError to Error, document Cause()]
[v4: fixes; audited all users]
[v5: make Error type private; changes after @cpuguy83 reviews]
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
(cherry picked from commit 65331369617e89ce54cc9be080dba70f3a883d1c)
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Upstream-commit: 7f1c6bf5a745c8faeba695d3556dff4c4ff5f473
Component: engine
Signed-off-by: John Howard <jhoward@microsoft.com>
(cherry picked from commit 293c74ba79f0008f48073985507b34af59b45fa6)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Upstream-commit: 91f5be57af815df372371e1e989d00963ce4d02f
Component: engine
It has been pointed out that we're ignoring EINVAL from umount(2)
everywhere, so let's move it to a lower-level function. Also, its
implementation should be the same for any UNIX incarnation, so
let's consolidate it.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
(cherry picked from commit 90be078fe59a8cfeff2bcc5dc2f34a00309837b6)
Upstream-commit: 47c51447e1b6dacf92b40574f6f929958ca9d621
Component: engine
As standard mount.Unmount does what we need, let's use it.
In addition, this adds ignoring "not mounted" condition, which
was previously implemented (see PR#33329, commit cfa2591d3f26)
via a very expensive call to mount.Mounted().
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
(cherry picked from commit 77bc327e24a60791fe7e87980faf704cf7273cf9)
Upstream-commit: 893b24b80db170279d5c9532ed508a81c328de5e
Component: engine
[18.09 backport] Pass root to chroot to for chroot Tar/Untar (CVE-2018-15664)
Upstream-commit: 72797601723f6a8847027a1abbd1a3cea2667718
Component: engine
Increases the max recieved gRPC message size for Node and Secret list
operations. This has already been done for the other swarm types, but
was not done for these.
Signed-off-by: Drew Erny <drew.erny@docker.com>
(cherry picked from commit a0903e1fa3eca32065c7dbfda8d1e0879cbfbb8f)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Upstream-commit: a987f31fbc42526bf2cccf1381066f075b18dfe0
Component: engine
Previously only unpack operations were supported with chroot.
This adds chroot support for packing operations.
This prevents potential breakouts when copying data from a container.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
(cherry picked from commit 3029e765e241ea2b5249868705dbf9095bc4d529)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Upstream-commit: 61e0459053c359e322b8d5c017e855f616fd34c0
Component: engine
This is useful for preventing CVE-2018-15664 where a malicious container
process can take advantage of a race on symlink resolution/sanitization.
Before this change chrootarchive would chroot to the destination
directory which is attacker controlled. With this patch we always chroot
to the container's root which is not attacker controlled.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
(cherry picked from commit d089b639372a8f9301747ea56eaf0a42df24016a)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Upstream-commit: 155939994f453559676656bc4b05635e83ebef56
Component: engine
[18.09] Backport Forcing a nil IP specified in PortBindings to IPv4 zero (0.0.0.0).
Upstream-commit: c882c0011f0857da5e8cef1ea48e0aa80019abf1
Component: engine
As pointed out by Tonis, there's a race between ReleaseRWLayer()
and GetRWLayer():
```
----- goroutine 1 ----- ----- goroutine 2 -----
ReleaseRWLayer()
m := ls.mounts[l.Name()]
...
m.deleteReference(l)
m.hasReferences()
... GetRWLayer()
... mount := ls.mounts[id]
ls.driver.Remove(m.mountID)
ls.store.RemoveMount(m.name) return mount.getReference()
delete(ls.mounts, m.Name())
----------------------- -----------------------
```
When something like this happens, GetRWLayer will return
an RWLayer without a storage. Oops.
There might be more races like this, and it seems the best
solution is to lock by layer id/name by using pkg/locker.
With this in place, name collision could not happen, so remove
the part of previous commit that protected against it in
CreateRWLayer (temporary nil assigmment and associated rollback).
So, now we have
* layerStore.mountL sync.Mutex to protect layerStore.mount map[]
(against concurrent access);
* mountedLayer's embedded `sync.Mutex` to protect its references map[];
* layerStore.layerL (which I haven't touched);
* per-id locker, to avoid name conflicts and concurrent operations
on the same rw layer.
The whole rig seems to look more readable now (mutexes use is
straightforward, no nested locks).
Reported-by: Tonis Tiigi <tonistiigi@gmail.com>
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
(cherry picked from commit af433dd200f8287305b1531d5058780be36b7e2e)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Upstream-commit: 1ebe324c6a481de86a5b84494057a639773624f1
Component: engine
This is an additon to commit 1fea38856a ("Remove v1.10 migrator")
aka PR #38265. Since that one, CreateRWLayerByGraphID() is not
used anywhere, so let's drop it.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
(cherry picked from commit b4e9b507655e7dbdfb44d4ee284dcd658b859b3f)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Upstream-commit: 1576eaba3357b7aab5121b51b9ebc34e731082fe
Component: engine
Goroutine stack analisys shown some lock contention
while doing massively (100 instances of `docker rm`)
parallel image removal, with many goroutines waiting
for the mountL mutex. Optimize it.
With this commit, the above operation is about 3x
faster, with no noticeable change to container
creation times (tested on aufs and overlay2).
kolyshkin@:
- squashed commits
- added description
- protected CreateRWLayer against name collisions by
temporary assiging nil to ls.mounts[name], and treating
nil as "non-existent" in all the other functions.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
(cherry picked from commit 05250a4f0094e6802dd7d338d632ea632d0c7e34)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Upstream-commit: 78cbf4d1388a4eef155f08da0d2422e3a2cad11f
Component: engine