Right now we check for the existence of device but don't make sure it is
a thin pool device. We assume it is a thin pool device and call poolStatus()
on the device which returns an error EOF. And that error does not tell
anything.
So before we reach the stage of calling poolStatus() make sure we are working
with a thin pool device otherwise error out.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Upstream-commit: 6d2d0a74e8effe65175473352bf58fc4beda1718
Component: engine
Start a goroutine which runs every 30 seconds and if there are deferred
deleted devices, it tries to clean those up.
Also it moves the call to cleanupDeletedDevices() into goroutine and
moves the locking completely inside the function. Now function does not
assume that device lock is held at the time of entry.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Upstream-commit: 87de04005d61ab986d70a31e3ac82ec92912df55
Component: engine
Keep track of number of deleted devices and export this information through
"docker info".
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Upstream-commit: d295dc66521e2734390473ec1f1da8a73ad3288a
Component: engine
Finally here is the patch to implement deferred deletion functionality.
Deferred deleted devices are marked as "Deleted" in device meta file.
First we try to delete the device and only if deletion fails and user has
enabled deferred deletion, device is marked for deferred deletion.
When docker starts up again, we go through list of deleted devices and
try to delete these again.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Upstream-commit: d929589c1fc4538dcd1b2a7a3dc7d4afbdfa72fd
Component: engine
Provide a command line option dm.use_deferred_deletion to enable deferred
device deletion feature. By default feature will be turned off.
Not sure if there is much value in deferred deletion being turned on
without deferred removal being turned on. So for now, this feature can
be enabled only if deferred removal is on.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Upstream-commit: 51e059e7e90f37848596a0b6ec83f7835e6e4131
Component: engine
Currently during startup we walk through all the device files and read
their device ID and mark in a bitmap that device id is used.
We are anyway going through all device files. So we can as well load all
that data into device hash map. This will save us little time when
container is actually launched later.
Also this will help with later patches where cleanup deferred device
wants to go through all the devices and see which have been marked for
deletion and delete these.
So re-organize the code a bit.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Upstream-commit: 6b8b4feaa165af630dd33423f0538a0b987703ae
Component: engine
Simplify setupBaseImage() even further. Move some more code in a separate
function. Pure code reorganization. No functionality change.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Upstream-commit: 0fcd485626f110bbf39e8a6b1edc11b4ac6f7065
Component: engine
Move thin pool related checks in a separate function. Pure code reorganization.
Makes reading code easier.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Upstream-commit: 69051ec0a58906e5252f902063ccead2075c8bed
Component: engine
This moves base device creation function in a separate function. Pure
code reorganization. Makes reading code little easier.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Upstream-commit: efc1ddd7e3341124a2ebbb8a358f44754b32f310
Component: engine
This patch does three things. Following are the descriptions.
===
Create a separate function for delete transactions so that parent function
is little smaller.
Also close transaction if an error happens.
===
When docker is being shutdown, save deviceset metadata first before
trying to remove the devices. Generally caller gives only 10 seconds
for shutdown to complete and then kills it after that. So if some device
is busy, we will wait 20 seconds for it removal and never be able to save
metadata. So first save metadata and then deal with device removal.
===
Move issue discard operation in a separate function. This makes reading code
little easier.
Also don't issue discards if device is still open. That means devices is
still probably being used and issuing discards is not a good idea.
This is especially true in case of deferred deletion. We want to issue
discards when device is not open. At that time device can be deleted too.
Otherwise we will issue discards and deletion will actually fail. Later
we will try deletion again and issue discards again and deletion will
fail again as device is open and busy.
So this will ensure that discards are issued once when device is not open
and it can actually be deleted.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Upstream-commit: 482eca3099ad4bf3a7db62a31f88e2158cbbc933
Component: engine
Right now we seem to have 3 locks.
- devinfo.lock
This is a per device lock
- metaData.devicesLock
This is supposedely protecting map of devices.
- Global DeviceSet lock
This is protecting map of devices as well as serializing calls to libdevmapper.
Semantics of per devices lock and global deviceset lock seem to be very clear.
Even ordering between these two locks has been defined properly.
What is not clear is the need and ordering of metaData.devicesLock. Looks like
this lock is not necessary and global DeviceSet lock should be used to
protect map of devices as it is part of DeviceSet.
This patchset gets rid of metaData.devicesLock and instead uses DeviceSet
lock to protect map of devices.
Also at couple of places during initialization takes devices.Lock(). That
is not strictly necessary as there is supposed to be one thread of execution
during initializaiton. Still it makes the code clearer.
I think this makes code more clear and easier to understand and easier to
make further changes.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Upstream-commit: 289145ecc6c016b8bb6797cc50905e34da911e84
Component: engine
Looks like nobody is calling HasActivatedDevice(). Get rid of it.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Upstream-commit: 73f8b46d84e73929eacafe4c2f9676469f862c49
Component: engine
maxDeviceID is upper limit on device Id thin pool can support. Right now
we have this check only during startup. It is a good idea to move this
check in loadMetadata so that any time a device file is loaded and if it
is corrupted and device Id is more than maxDevieceID, it will be detected
right then and there.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Upstream-commit: 94caae2477f42b55adeab22f14bcef22b90373d6
Component: engine
Use deactivateDevice() instead of removeDevice() directly. This will make
sure for device deletion, deferred removal is used if user has configured
it in. Also this makes reading code litle easier as there is single function
to remove a device and that is deactivateDevice().
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Upstream-commit: 39081eb3aa8f715c1da6f798c6531efd7a8a494c
Component: engine
If a device is still mounted at the time of DeleteDevice(), that means
higher layers have not called Put() properly on the device and are trying
to delete it. This is a bug in the code where Get() and Put() have not been
properly paired up. Fail device deletion if it is still mounted.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Upstream-commit: e97e46b7376c3195383636d305d832b2ba3f82c5
Component: engine
Exists() and HasDevice() just check if device file exists or not. It does
not say anything about if device is mounted or not. Fix comments.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Upstream-commit: f5c0eb9ffe9a9d30ac6ff81fa1a4c216908189a6
Component: engine
device has map (device.Devices), contains valid devices and we skip all
the files which are not device files. transaction metadata file is not
device file. Skip this file when devices files are being read and loaded
into map.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Upstream-commit: ba02bf31cbffe329e549516e0004cf113ab4517e
Component: engine
The mount syscall does not handle string flags like "noatime",
we must use bitmasks like MS_NOATIME instead.
pkg/mount.Mount already handles this work.
Signed-off-by: Carl Henrik Lunde <chlunde@ping.uio.no>
Upstream-commit: 9a64f2bbb3d8fed35bb094c73dac54a13dcf7369
Component: engine
Fixes: #15279
Due to
7904946eeb
the devices field is dropped.
This solution works on go1.4 and go1.5
Signed-off-by: Vincent Batts <vbatts@redhat.com>
Upstream-commit: f83d05c3be3c3bcc84f6fa229504848ee8078321
Component: engine
TL;DR: check for IsExist(err) after a failed MkdirAll() is both
redundant and wrong -- so two reasons to remove it.
Quoting MkdirAll documentation:
> MkdirAll creates a directory named path, along with any necessary
> parents, and returns nil, or else returns an error. If path
> is already a directory, MkdirAll does nothing and returns nil.
This means two things:
1. If a directory to be created already exists, no error is returned.
2. If the error returned is IsExist (EEXIST), it means there exists
a non-directory with the same name as MkdirAll need to use for
directory. Example: we want to MkdirAll("a/b"), but file "a"
(or "a/b") already exists, so MkdirAll fails.
The above is a theory, based on quoted documentation and my UNIX
knowledge.
3. In practice, though, current MkdirAll implementation [1] returns
ENOTDIR in most of cases described in #2, with the exception when
there is a race between MkdirAll and someone else creating the
last component of MkdirAll argument as a file. In this very case
MkdirAll() will indeed return EEXIST.
Because of #1, IsExist check after MkdirAll is not needed.
Because of #2 and #3, ignoring IsExist error is just plain wrong,
as directory we require is not created. It's cleaner to report
the error now.
Note this error is all over the tree, I guess due to copy-paste,
or trying to follow the same usage pattern as for Mkdir(),
or some not quite correct examples on the Internet.
[v2: a separate aufs commit is merged into this one]
[1] https://github.com/golang/go/blob/f9ed2f75/src/os/path.go
Signed-off-by: Kir Kolyshkin <kir@openvz.org>
Upstream-commit: a83a76934787a20e96389d33bd56a09369f9b808
Component: engine
Current default basesize is 10G. Change it to 100G. Reason being that for
some people 10G is turning out to be too small and we don't have capabilities
to grow it dyamically.
This is just overcommitting and no real space is allocated till container
actually writes data. And this is no different then fs based graphdrivers
where virtual size of a container root is unlimited.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Upstream-commit: 424d5e55a2f863b8eadab578e3ba647de09a4354
Component: engine
Replaced github.com/docker/libcontainer with
github.com/opencontainers/runc/libcontaier.
Also I moved AppArmor profile generation to docker.
Main idea of this update is to fix mounting cgroups inside containers.
After updating docker on CI we can even remove dind.
Signed-off-by: Alexander Morozov <lk4d4@docker.com>
Upstream-commit: c86189d554ba14aa04b6314970d3699e5ddbf4de
Component: engine
The ability to save and verify base device UUID (#13896) introduced a
situation where the initialization would panic when removing the device
returns EBUSY.
Functions `verifyBaseDeviceUUID` and `saveBaseDeviceUUID` now take the
lock on the `DeviceSet`, which solves the problem as `removeDevice`
assumes it owns the lock.
Signed-off-by: Arnaud Porterie <arnaud.porterie@docker.com>
Upstream-commit: f08989902374a517b1f8e5e0bfd3b4ea59e5ba27
Component: engine
Often it happens that docker is not able to shutdown/remove the thin
pool it created because some device has leaked into some mount name
space. That means device is in use and that means pool can't be removed.
Docker will leave pool as it is and exit. Later when user starts the
docker, it finds pool is already there and docker uses it. But docker
does not know it is same pool which is using the loop devices. Now
docker thinks loop devices are not being used. That means it does not
display the data correctly in "docker info", giving user wrong information.
This patch tries to detect if loop devices as created by docker are
being used for pool and fills in the right details in "docker info".
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Upstream-commit: bebf53443981c70a6a714ea518dc966a0e2b6558
Component: engine
DeviceMapper must be explicitly selected because the Docker binary might not be linked to the right devmapper library.
With this change, Docker fails fast if the driver detection finds the devicemapper directory but the driver is not the default option.
The option `override_udev_sync_check` doesn't make sense anymore, since the user must be explicit to select devicemapper, so it's being removed.
Docker fails to use devicemapper only if Docker has been built statically unless the option was explicit.
Signed-off-by: David Calavera <david.calavera@gmail.com>
Upstream-commit: 0a376291b2213699f986a7bca1cc8c4f4ed00f8d
Component: engine
It is easy for one to use docker for a while, shut it down and restart
docker with different set of storage options for device mapper driver
which will effectively change the thin pool. That means any of the
metadata stored in /var/lib/docker/devicemapper/metadata/ is not valid
for the new pool and user will run into various kind of issues like
container not found in the pool etc.
Users think that their images or containers are lost but it might just
be the case of configuration issue. People might use wrong metadata
with wrong pool.
To detect such situations, save UUID of base image and once docker
starts later, query and compare the UUID of base image with the
stored one. If they don't match, fail the initialization with the
error that UUID failed to match.
That way user will be forced to cleanup /var/lib/docker/ directory
and start docker again.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Upstream-commit: c06b05b11e6bbe48ae3ca140096d7862e5e312f8
Component: engine
Export image/container metadata stored in graph driver. Right now 3 fields
DeviceId, DeviceSize and DeviceName are being exported from devicemapper.
Other graph drivers can export fields as they see fit.
This data can be used to mount the thin device outside of docker and tools
can look into image/container and do some kind of inspection.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Upstream-commit: 407a626be62996cd6385ea4d80e669ab83f5f04d
Component: engine
Right now devicemapper mounts thin device using online discards by default
and passes mount option "discard". Generally people discourage usage of
online discards as they can be a drain on performance. Instead it is
recommended to use fstrim once in a while to reclaim the space.
In case of containers, we recommend to keep data volumes separate. So
there might not be lot of rm, unlink operations going on and there might
not be lot of space being freed by containers. So it might not matter
much if we don't reclaim that free space in pool.
User can still pass mount option explicitly using dm.mountopt=discard to
enable discards if they would like to.
So this is more like setting the containers by default for better performance
instead of better space efficiency in pool. And user can change the behavior
if they don't like default behavior.
Reported-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Upstream-commit: 04adaaf1ee1688ff48cb8f541dcb80e965f45080
Component: engine