We currently drop the global lock while holding a per-device lock when
waiting for device removal, and then we re-acquire it when the sleep is
done. This causes an AB-BA deadlock if anyone tries to do any operation
on that device at the same time, like this:

    thread A:               thread B:
    grabs global lock
    grabs device lock
    releases global lock
    sleeps
                            grabs global lock
                            blocks on device lock
    wakes up
    blocks on global lock

To trigger this you can for instance do:

    ID=`docker run -d fedora sleep 5`
    cd /var/lib/docker/devicemapper/mnt/$ID
    docker wait $ID
    docker rm $ID &
    docker rm $ID

The unmount fails because the mount is busy (the shell is still inside
the mount point), which triggers the removal timeout, and the second rm
then triggers the deadlock.
We fix this by adding a lock ordering such that the device locks
are always grabbed before the global lock. This is safe since the
device lookups now have a separate lock.
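
As a rough illustration of the ordering described above (not the actual
devmapper code), the following Go sketch takes the device lock first and
the global lock second, with lookups going through their own lock. The
names devInfo, deviceSet, devicesLock, lookup, remove and
removeConcurrently are hypothetical, and the sleep merely stands in for
waiting on device removal:

```go
package main

import (
	"sync"
	"time"
)

// devInfo and deviceSet are hypothetical stand-ins for illustration only.
type devInfo struct {
	lock    sync.Mutex // per-device lock, always taken first
	removed bool
}

type deviceSet struct {
	sync.Mutex              // global lock, always taken after a device lock
	devicesLock sync.Mutex  // protects only the devices map
	devices     map[string]*devInfo
}

// lookup takes only devicesLock, so it can run before any other lock is held.
func (s *deviceSet) lookup(id string) *devInfo {
	s.devicesLock.Lock()
	defer s.devicesLock.Unlock()
	return s.devices[id]
}

// remove follows the ordering: device lock first, then global lock.
func (s *deviceSet) remove(id string) bool {
	info := s.lookup(id)
	if info == nil {
		return false
	}
	info.lock.Lock()
	defer info.lock.Unlock()
	s.Lock()
	defer s.Unlock()
	if info.removed {
		return false
	}

	// Simulated wait for removal: the global lock is dropped while the
	// device lock stays held. A concurrent caller blocks on the device
	// lock without holding the global lock, so we can re-acquire it.
	s.Unlock()
	time.Sleep(10 * time.Millisecond)
	s.Lock()

	info.removed = true
	s.devicesLock.Lock()
	delete(s.devices, id)
	s.devicesLock.Unlock()
	return true
}

// removeConcurrently runs n removals of the same device in parallel and
// reports how many succeeded and whether all returned before the timeout.
func removeConcurrently(s *deviceSet, id string, n int) (succeeded int, finished bool) {
	results := make(chan bool, n)
	for i := 0; i < n; i++ {
		go func() { results <- s.remove(id) }()
	}
	timeout := time.After(2 * time.Second)
	for i := 0; i < n; i++ {
		select {
		case ok := <-results:
			if ok {
				succeeded++
			}
		case <-timeout:
			return succeeded, false
		}
	}
	return succeeded, true
}
```

With this ordering, a second caller that races with the removal waits on
the device lock while holding nothing else, so the sleeping thread can
always get the global lock back when it wakes up.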
Docker-DCO-1.1-Signed-off-by: Alexander Larsson <alexl@redhat.com> (github: alexlarsson)
Upstream-commit: 2ffef1b7eb618162673c6ffabccb9ca57c7dfce3
Component: engine