Commit Graph

5672 Commits

Author SHA1 Message Date
f68b87fd47 Support a proxy in splunk log driver
Signed-off-by: Daniel Nephin <dnephin@docker.com>
Upstream-commit: 3c4537d5b33d951237ea5e4cc123953eda7a37e7
Component: engine
2018-02-07 14:52:32 -05:00
44aff4f98f Merge pull request #36191 from cpuguy83/fix_attachable_network_race
Fix race in attachable network attachment
Upstream-commit: 6987557e0cef9bd139128e62d86586a40cda6036
Component: engine
2018-02-05 09:41:35 -08:00
e73d8c24d7 Libnetwork revendoring
Diff:
5ab4ab8300...20dd462e0a

- Memberlist revendor (fix for deadlock on exit)
- Network diagnostic client
- Fix for ndots configuration

Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>
Upstream-commit: ec86547244fa329148a096db56f9ade77a7ce7eb
Component: engine
2018-02-02 14:36:32 -08:00
efc7d712c1 Fix race in attachable network attachment
Attachable networks are networks created on the cluster which can then
be attached to by non-swarm containers. These networks are lazily
created on the node that wants to attach to that network.

When no container is currently attached to one of these networks on a
node, and then multiple containers which want that network are started
concurrently, this can cause a race condition in the network attachment
where essentially we try to attach the same network to the node twice.

To easily reproduce this issue you must use a multi-node cluster with a
worker node that has lots of CPUs (I used a 36 CPU node).

Repro steps:

1. On manager, `docker network create -d overlay --attachable test`
2. On worker, `docker create --restart=always --network test busybox
top`, many times... 200 is a good number (but not much more due to
subnet size restrictions)
3. Restart the daemon

When the daemon restarts, it will attempt to start all those containers
simultaneously. Note that you could try to do this yourself over the API,
but it's harder to trigger due to the added latency from going over
the API.

The error produced happens when the daemon tries to start the container
upon allocating the network resources:

```
attaching to network failed, make sure your network options are correct and check manager logs: context deadline exceeded
```

What happens here is the worker makes a network attachment request to
the manager. This is an async call which in the happy case would cause a
task to be placed on the node, which the worker is waiting for to get
the network configuration.
In the case of this race, the error occurs on the manager like this:

```
task allocation failure" error="failed during network allocation for task n7bwwwbymj2o2h9asqkza8gom: failed to allocate network IP for task n7bwwwbymj2o2h9asqkza8gom network rj4szie2zfauqnpgh4eri1yue: could not find an available IP" module=node node.id=u3489c490fx1df8onlyfo1v6e
```

The task is not created and the worker times out waiting for the task.

---

The mitigation for this is to make sure that only one attachment request
is in flight for a given network at a time *when the network doesn't
already exist on the node*. If the network already exists on the node
there is no need for synchronization because the network is already
allocated and on the node so there is no need to request it from the
manager.

This basically comes down to a race with `Find(network) ||
Create(network)` without any sort of synchronization.
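
The mitigation described above can be sketched as a per-network lock with a second existence check after the lock is taken. This is only an illustration under stated assumptions: the `attacher` type, its fields, and `attachConcurrently` are hypothetical names and do not come from the daemon code, and the request to the manager is simulated by a counter.

```go
package main

import (
	"fmt"
	"sync"
)

// attacher is a hypothetical stand-in for the node-side attachment logic.
type attacher struct {
	mu       sync.Mutex
	networks map[string]bool        // networks already present on the node
	locks    map[string]*sync.Mutex // one lock per network name
	requests int                    // simulated attachment requests sent to the manager
}

func newAttacher() *attacher {
	return &attacher{networks: map[string]bool{}, locks: map[string]*sync.Mutex{}}
}

// lockFor returns the lock dedicated to one network, creating it on first use.
func (a *attacher) lockFor(name string) *sync.Mutex {
	a.mu.Lock()
	defer a.mu.Unlock()
	l, ok := a.locks[name]
	if !ok {
		l = &sync.Mutex{}
		a.locks[name] = l
	}
	return l
}

func (a *attacher) exists(name string) bool {
	a.mu.Lock()
	defer a.mu.Unlock()
	return a.networks[name]
}

// attach takes the per-network lock only when the network is missing,
// so at most one request per network is in flight at a time.
func (a *attacher) attach(name string) {
	if a.exists(name) {
		return // fast path: the network is already on the node
	}
	l := a.lockFor(name)
	l.Lock()
	defer l.Unlock()
	if a.exists(name) {
		return // another goroutine attached it while we waited
	}
	a.mu.Lock()
	a.requests++ // stands in for the request to the manager
	a.networks[name] = true
	a.mu.Unlock()
}

// attachConcurrently starts n goroutines that all want the same network
// and returns how many requests were actually issued.
func attachConcurrently(a *attacher, name string, n int) int {
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			a.attach(name)
		}()
	}
	wg.Wait()
	a.mu.Lock()
	defer a.mu.Unlock()
	return a.requests
}

func main() {
	fmt.Println(attachConcurrently(newAttacher(), "test", 200)) // prints 1
}
```

Without the second `exists` check under the lock, every goroutine that missed the fast path would still issue its own request once it acquired the lock.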

Signed-off-by: Brian Goff <cpuguy83@gmail.com>
Upstream-commit: c379d2681ffe8495a888fb1d0f14973fbdbdc969
Component: engine
2018-02-02 13:46:23 -05:00
4e10e192e9 Merge pull request #36160 from kolyshkin/layer-not-retained
daemon.cleanupContainer: nullify container RWLayer upon release
Upstream-commit: 53a58da551e961b3710bbbdfabbc162c3f5f30f6
Component: engine
2018-01-31 15:13:00 -08:00
877ec711a0 Fix issue of ExitCode and PID not show up in Task.Status.ContainerStatus
This fix tries to address the issue raised in 36139 where
ExitCode and PID do not show up in Task.Status.ContainerStatus

The issue was caused by `json:",omitempty"` on PID and ExitCode,
which treats 0 as an empty value and omits the field.

This is confusing, as ExitCode 0 does have a meaning.

This fix removes `json:",omitempty"` from ExitCode and PID,
but changes ContainerStatus to a pointer so that ContainerStatus
does not show up at all if it has no content. If ContainerStatus
does have content, then ExitCode and PID will show up (even if
they are 0).
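
The effect of moving `omitempty` from the integer fields to a pointer on the parent can be seen in a small sketch. The types below are trimmed-down, hypothetical versions written for illustration; they are not the real swarm API structs.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ContainerStatus is a hypothetical, reduced version of the status struct.
// PID and ExitCode carry no omitempty, so a value of 0 is still encoded.
type ContainerStatus struct {
	ContainerID string
	PID         int
	ExitCode    int
}

// TaskStatus holds ContainerStatus as a pointer with omitempty,
// so a nil pointer drops the whole object from the output.
type TaskStatus struct {
	State           string
	ContainerStatus *ContainerStatus `json:",omitempty"`
}

// encode marshals a TaskStatus and returns the JSON as a string.
func encode(s TaskStatus) string {
	b, err := json.Marshal(s)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	// No container yet: ContainerStatus is absent.
	fmt.Println(encode(TaskStatus{State: "pending"}))
	// Container exited with code 0: PID and ExitCode are both present.
	fmt.Println(encode(TaskStatus{
		State:           "complete",
		ContainerStatus: &ContainerStatus{ContainerID: "abc"},
	}))
}
```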

This fix fixes 36139.

Signed-off-by: Yong Tang <yong.tang.github@outlook.com>
Upstream-commit: 9247e09944a4c7f3c2f3f20f180c047a19fb6bae
Component: engine
2018-01-31 15:35:19 +00:00
d59bd66e1b daemon.cleanupContainer: nullify container RWLayer upon release
ReleaseRWLayer can and should only be called once (unless it returns
an error), but might be called twice in case of a failure from
`system.EnsureRemoveAll(container.Root)`. This results in the
following error:

> Error response from daemon: driver "XXX" failed to remove root filesystem for YYY: layer not retained

The obvious fix is to set container.RWLayer to nil as soon as
ReleaseRWLayer() succeeds.
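
A minimal sketch of that pattern, assuming a hypothetical `layer` type whose `release` method fails on a second call (the real layer store and `cleanupContainer` are more involved):

```go
package main

import (
	"errors"
	"fmt"
)

var errNotRetained = errors.New("layer not retained")

// layer is a hypothetical layer that may only be released once.
type layer struct{ released bool }

func (l *layer) release() error {
	if l.released {
		return errNotRetained
	}
	l.released = true
	return nil
}

// container is a hypothetical container holding a read-write layer.
type container struct{ rwLayer *layer }

// cleanup releases the layer at most once: the field is set to nil as soon
// as the release succeeds, so a retry after a later failure skips it.
func cleanup(c *container, removeRoot func() error) error {
	if c.rwLayer != nil {
		if err := c.rwLayer.release(); err != nil {
			return err
		}
		c.rwLayer = nil
	}
	return removeRoot()
}

func main() {
	c := &container{rwLayer: &layer{}}
	failing := func() error { return errors.New("remove root failed") }
	succeeding := func() error { return nil }

	fmt.Println(cleanup(c, failing))    // first attempt fails after the release
	fmt.Println(cleanup(c, succeeding)) // retry does not release again
}
```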

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Upstream-commit: e9b9e4ace294230c6b8eb010eda564a2541c4564
Component: engine
2018-01-30 18:50:59 -08:00
98a8916ad3 Merge pull request #36124 from crosbymichael/exec
Use proc/exe for reexec
Upstream-commit: 9d61e5c8c1d85f633643a8b9071393dc417d56dd
Component: engine
2018-01-29 11:41:08 -08:00
9bff0e7832 Merge pull request #36130 from yongtang/36042-secret-config-mode
Fix secret and config mode issue
Upstream-commit: d093aa0ec365e1ffd4db8a513c5b341b9a0d012e
Component: engine
2018-01-29 10:37:24 -08:00
6c313c03fe Merge pull request #36114 from Microsoft/jjh/fixdeadlock-lcowdriver-hotremove
LCOW: Graphdriver fix deadlock in hotRemoveVHDs
Upstream-commit: 03a1df95369ddead968e48697038904c84578d00
Component: engine
2018-01-29 09:57:43 -08:00
82bc59e5d6 Fix secret and config mode issue
This fix tries to address the issue raised in 36042
where secret and config are not configured with the
specified file mode.

This fix updates the file mode so that it is not affected
by umask.
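
One common way to make a file mode independent of umask is to call chmod explicitly after the file is created. The sketch below shows that general technique only; the helper names are hypothetical, it is not taken from the actual fix, and it assumes a Unix-like system where permission bits are honored.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// writeWithMode writes data and then applies mode explicitly.
// The mode passed at creation time is filtered by the process umask;
// the explicit Chmod afterwards is not.
func writeWithMode(path string, data []byte, mode os.FileMode) error {
	if err := os.WriteFile(path, data, mode); err != nil {
		return err
	}
	return os.Chmod(path, mode)
}

// permAfterWrite writes a file in a temporary directory with the requested
// mode and reports the permission bits actually found on disk.
func permAfterWrite(mode os.FileMode) (os.FileMode, error) {
	dir, err := os.MkdirTemp("", "secret-mode")
	if err != nil {
		return 0, err
	}
	defer os.RemoveAll(dir)

	path := filepath.Join(dir, "secret")
	if err := writeWithMode(path, []byte("s3cr3t"), mode); err != nil {
		return 0, err
	}
	fi, err := os.Stat(path)
	if err != nil {
		return 0, err
	}
	return fi.Mode().Perm(), nil
}

func main() {
	perm, err := permAfterWrite(0666)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%#o\n", perm)
}
```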

Additional tests have been added.

This fix fixes 36042.

Signed-off-by: Yong Tang <yong.tang.github@outlook.com>
Upstream-commit: 3305221eefd18ba7712a308c1fb05d4eeeac2cc6
Component: engine
2018-01-28 16:21:41 +00:00
5e3cb1566c Merge pull request #34992 from allencloud/simplify-shutdowntimeout
simplify codes on calculating shutdown timeout
Upstream-commit: c9f1807abbc60236f5552f8dd25e6d484584f037
Component: engine
2018-01-27 18:26:54 -08:00
8d2c67f10d Merge pull request #36095 from yongtang/36083-network-inspect-created-time
Fix issue where network inspect does not show Created time for networks in swarm scope
Upstream-commit: 924fb0e843930ca444e0f3a6632d7cb67a3da479
Component: engine
2018-01-26 17:18:30 -08:00
dce32bffba Use proc/exe for reexec
You don't need to resolve the symlink for the exec as long as the
process keeps running during execution.
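
As a rough sketch of the idea on Linux, a re-exec command can point directly at `/proc/self/exe` instead of a resolved path. The `selfCommand` helper is a hypothetical name used for illustration, and the command is only constructed here, not started.

```go
package main

import (
	"fmt"
	"os/exec"
)

// selfCommand builds a command that re-executes the current binary through
// the /proc/self/exe symlink (Linux only). The kernel resolves the link at
// exec time, which works as long as the calling process is still alive.
// args[0] becomes argv[0] of the child.
func selfCommand(args ...string) *exec.Cmd {
	cmd := exec.Command("/proc/self/exe")
	cmd.Args = args
	return cmd
}

func main() {
	cmd := selfCommand("example-init", "--flag")
	fmt.Println(cmd.Path, cmd.Args)
}
```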

Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
Upstream-commit: 59ec65cd8cec942cee6cbf2b8327ec57eb5078f0
Component: engine
2018-01-26 14:13:43 -05:00
db2a10168c Merge pull request #36047 from cpuguy83/graphdriver_improvements
Do not make graphdriver homes private mounts.
Upstream-commit: 2c05aefc99d33edde47b08e38978b6c2f4178648
Component: engine
2018-01-26 13:54:30 -05:00
4e8a0d189e Simplify codes on calculating shutdown timeout
Signed-off-by: Allen Sun <shlallen1990@gmail.com>
Signed-off-by: Vincent Demeester <vincent@sbr.pm>
Upstream-commit: de68ac8393d32d2c2028dd11c5816430ad0d8d8b
Component: engine
2018-01-26 09:18:07 -08:00
a002f8068e LCOW: Graphdriver fix deadlock
Signed-off-by: John Howard <jhoward@microsoft.com>
Upstream-commit: a44fcd3d27c06aaa60d8d1cbce169f0d982e74b1
Component: engine
2018-01-26 08:57:52 -08:00
c49971b835 Merge pull request #36052 from Microsoft/jjh/no-overlay-off-only-one-disk
LCOW: Regular mount if only one layer
Upstream-commit: a8d0e36d0329063af9205b4848d1f5c09bd4c3be
Component: engine
2018-01-25 15:46:16 -08:00
1956fc58bd Merge pull request #36096 from cpuguy83/use_rshared_prop_for_daemon_root
Set daemon root to use shared propagation
Upstream-commit: 3ca99ac2f4a7196097d8f5d037ac10ebbcbb5c3c
Component: engine
2018-01-24 12:24:33 -08:00
f813e83349 Merge pull request #36078 from mixja/multiline-max-event-processing
awslogs - don't add new lines to maximum sized events
Upstream-commit: a636ed5ff473d69e9d0cda352fef0823518f016a
Component: engine
2018-01-24 12:06:49 -08:00
61c1474fc0 Merge pull request #35938 from yongtang/35931-filter-before-since
Fix `before` and `since` filter for `docker ps`
Upstream-commit: 25e56670cf7cd69e60c0d58ed25c33dbb21d3d8e
Component: engine
2018-01-24 12:06:19 -08:00
a0a9bd7e22 Merge pull request #36077 from yongtang/35752-verifyNetworking
Verify NetworkingConfig to make sure EndpointSettings is not nil
Upstream-commit: 914ce4fde798b41144ac931619f39a2c96eab261
Component: engine
2018-01-24 12:05:58 -08:00
91c8e6e25b Merge pull request #35966 from yongtang/33661-network-alias
Fix network alias issue with `network connect`
Upstream-commit: 70a0621f2558061b93ad24f04e9491bb5e0b8fdc
Component: engine
2018-01-23 14:56:28 -08:00
b1dfd77fa4 Set daemon root to use shared propagation
This change sets an explicit mount propagation for the daemon root.
This is useful for people who need to bind mount the docker daemon root
into a container.

Since bind mounting the daemon root should only ever happen with at
least `rslave` propagation (to prevent the container from holding
references to mounts making it impossible for the daemon to clean up its
resources), we should make sure the user is actually able to do this.

Most modern systems have shared root (`/`) propagation by default
already, however there are some cases where this may not be so
(e.g. potentially docker-in-docker scenarios, but also other cases).
So this just gives the daemon a little more control here and provides
a more uniform experience across different systems.
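
To check what propagation a mount currently has, the optional fields of a `/proc/self/mountinfo` line can be inspected: a `shared:N` tag marks a shared mount. The helper below is a small sketch for that check only; `isShared` is a hypothetical name and the sample lines are made up for the example.

```go
package main

import (
	"fmt"
	"strings"
)

// isShared reports whether a single /proc/self/mountinfo line describes a
// mount with shared propagation. The first six fields are fixed (IDs,
// device, root, mount point, options); optional fields such as "shared:N"
// or "master:N" follow until the "-" separator.
func isShared(mountinfoLine string) bool {
	fields := strings.Fields(mountinfoLine)
	for i := 6; i < len(fields); i++ {
		if fields[i] == "-" {
			break
		}
		if strings.HasPrefix(fields[i], "shared:") {
			return true
		}
	}
	return false
}

func main() {
	shared := "36 35 98:0 / /var/lib/docker rw,noatime shared:1 - ext4 /dev/sda1 rw"
	private := "37 35 98:0 / /var/lib/docker rw,noatime - ext4 /dev/sda1 rw"
	fmt.Println(isShared(shared), isShared(private)) // true false
}
```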

Signed-off-by: Brian Goff <cpuguy83@gmail.com>
Upstream-commit: a510192b86e7eb1e1112f3f625d80687fdec6578
Component: engine
2018-01-23 14:17:08 -08:00
8dd7e2516b Fix issue where network inspect does not show Created time in swarm scope
This fix tries to address the issue raised in 36083 where
`network inspect` does not show Created time if the network is
created in swarm scope.

The issue was that Created was not converted from swarm api.
This fix addresses the issue.

A unit test has been added.

This fix fixes 36083.

Signed-off-by: Yong Tang <yong.tang.github@outlook.com>
Upstream-commit: 090c439fb8a863731cc80fcb9932ce5958d8166d
Component: engine
2018-01-23 18:26:51 +00:00
544ec9eef9 Merge pull request #36019 from thaJeztah/improve-config-reload
improve daemon config reload; log active configuration
Upstream-commit: 99cfb5f31ad82238573de3475bf5bb0435ac1ebc
Component: engine
2018-01-22 17:58:25 -08:00
74da78f854 Fix network alias issue
This fix tries to address the issue raised in 33661 where
network alias does not work when connecting to a network the second time.

This fix addresses the issue.

This fix fixes 33661.

Signed-off-by: Yong Tang <yong.tang.github@outlook.com>
Upstream-commit: d63a5a1ff593f14957f3e0a9678633e8237defc9
Component: engine
2018-01-23 01:04:33 +00:00
50c6561a9e Merge pull request #35949 from yongtang/34248-carry
Carry #34248 Added tag log option to json-logger and use RawAttrs
Upstream-commit: ea74dbe907f534ba2f59c1173330987c3fa84208
Component: engine
2018-01-22 15:02:54 -08:00
a53f2c40a3 Verify NetworkingConfig to make sure EndpointSettings is not nil
This fix tries to address the issue raised in 35752
where container start will trigger a crash if EndpointSettings is nil.

This fix adds the validation to make sure EndpointSettings != nil
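
The kind of check described here might look like the sketch below. The types are reduced, hypothetical versions of the API structs, and `verifyNetworkingConfig` and its error text are illustrative names only.

```go
package main

import "fmt"

// EndpointSettings and NetworkingConfig are hypothetical, trimmed-down
// stand-ins for the API types.
type EndpointSettings struct {
	Aliases []string
}

type NetworkingConfig struct {
	EndpointsConfig map[string]*EndpointSettings
}

// verifyNetworkingConfig rejects a config that maps a network name to a nil
// EndpointSettings, so later code can dereference the entries safely.
func verifyNetworkingConfig(nc *NetworkingConfig) error {
	if nc == nil {
		return nil
	}
	for name, ep := range nc.EndpointsConfig {
		if ep == nil {
			return fmt.Errorf("no EndpointSettings for network %q", name)
		}
	}
	return nil
}

func main() {
	bad := &NetworkingConfig{EndpointsConfig: map[string]*EndpointSettings{"isolated_nw": nil}}
	good := &NetworkingConfig{EndpointsConfig: map[string]*EndpointSettings{"isolated_nw": {}}}
	fmt.Println(verifyNetworkingConfig(bad))
	fmt.Println(verifyNetworkingConfig(good))
}
```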

This fix fixes 35752.

Signed-off-by: Yong Tang <yong.tang.github@outlook.com>
Upstream-commit: 8d2f4cb24129d87674a13319ca48ce8636ee527a
Component: engine
2018-01-22 16:31:10 +00:00
362cc9aedc Don't append new line for maximum sized events
Signed-off-by: Justin Menga <justin.menga@gmail.com>
Upstream-commit: d3e2d55a3d84d41c331151c9633211f0fb6a3096
Component: engine
2018-01-21 14:29:55 +13:00
36e0e57cbe Log active configuration when reloading
When successfully reloading the daemon configuration, print a message
in the logs with the active configuration:

    INFO[2018-01-15T15:36:20.901688317Z] Got signal to reload configuration, reloading from: /etc/docker/daemon.json
    INFO[2018-01-14T02:23:48.782769942Z] Reloaded configuration: {"mtu":1500,"pidfile":"/var/run/docker.pid","data-root":"/var/lib/docker","exec-root":"/var/run/docker","group":"docker","deprecated-key-path":"/etc/docker/key.json","max-concurrent-downloads":3,"max-concurrent-uploads":5,"shutdown-timeout":15,"debug":true,"hosts":["unix:///var/run/docker.sock"],"log-level":"info","swarm-default-advertise-addr":"","metrics-addr":"","log-driver":"json-file","ip":"0.0.0.0","icc":true,"iptables":true,"ip-forward":true,"ip-masq":true,"userland-proxy":true,"disable-legacy-registry":true,"experimental":false,"network-control-plane-mtu":1500,"runtimes":{"runc":{"path":"docker-runc"}},"default-runtime":"runc","oom-score-adjust":-500,"default-shm-size":67108864,"default-ipc-mode":"shareable"}
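
A message of that shape can be produced by marshalling the active configuration to JSON. The sketch below uses a hypothetical three-field `config` struct whose keys are borrowed from the output above; the real daemon configuration has many more fields and its own logging setup.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// config is a hypothetical, heavily reduced daemon configuration.
type config struct {
	MTU      int    `json:"mtu"`
	Debug    bool   `json:"debug"`
	LogLevel string `json:"log-level"`
}

// reloadedMessage renders the active configuration as a single log message.
func reloadedMessage(c config) (string, error) {
	b, err := json.Marshal(c)
	if err != nil {
		return "", err
	}
	return "Reloaded configuration: " + string(b), nil
}

func main() {
	msg, err := reloadedMessage(config{MTU: 1500, Debug: true, LogLevel: "info"})
	if err != nil {
		panic(err)
	}
	fmt.Println(msg)
}
```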

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Upstream-commit: 8378dcf46d017c70df97d6f851e0196b113b422e
Component: engine
2018-01-21 00:56:02 +01:00
c2b247fce6 Move reload-related functions to reload.go
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Upstream-commit: 6121a8429b9d3a6d20e900c521c2f50fff5db406
Component: engine
2018-01-21 00:55:49 +01:00
bde2b4704d Merge pull request #35830 from cpuguy83/unbindable_shm
Make container shm parent unbindable
Upstream-commit: c162e8eb417bbc124c1f89f676aea081ebb6251f
Component: engine
2018-01-19 17:43:30 -08:00
946a37c1e4 Merge pull request #35744 from ndeloof/35702
closes #35702 introduce « exec_die » event
Upstream-commit: f97256cbf1811740cfa9a72f705c8a70195cd468
Component: engine
2018-01-19 15:03:50 -08:00
30c97f4539 Merge pull request #36003 from pradipd/upgrade_fix
Fixing ingress network when upgrading from 17.09 to 17.12.
Upstream-commit: 949ee0e5297408e97c9b5444d500a2cecab06609
Component: engine
2018-01-19 15:46:50 -05:00
12ceea25e6 Merge pull request #36051 from Microsoft/jjh/remotefs-read-return-error
LCOW remotefs - return error in Read() implementation
Upstream-commit: 3c9d023af3428f49241a2e2385dae43151185466
Component: engine
2018-01-19 11:27:13 -08:00
3cf8a0c442 Carry 34248 Added tag log option to json-logger and use RawAttrs
This fix carries PR 34248: Added tag log option to json-logger

This fix changes to use RawAttrs based on review feedback.

This fix fixes 19803, this fix closes 34248.

Signed-off-by: Yong Tang <yong.tang.github@outlook.com>
Upstream-commit: e77267c5a682e2c5aaa32469f2c83c2479d57566
Component: engine
2018-01-19 17:51:20 +00:00
9dd65d097b Added tag log option to json-logger
Fixes #19803
Updated the json-logger to utilize the common log option
'tag' that can define container/image information to include
as part of logging.

When the 'tag' log option is not included, there is no change
to the log content via the json-logger. When the 'tag' log option
is included, the tag will be parsed as a template and the result
will be stored within each log entry as the attribute 'tag'.
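
The behavior described in the two sentences above can be sketched with `text/template`. Everything named here is an assumption made for illustration: the `tagInfo` fields, the `entry` layout, and the helper functions are hypothetical and do not reproduce the json-logger's actual types or on-disk format.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"text/template"
)

// tagInfo is a hypothetical set of container details exposed to the template.
type tagInfo struct {
	ID   string
	Name string
}

// entry is a hypothetical log entry; Attrs is omitted when no tag is set.
type entry struct {
	Log   string            `json:"log"`
	Attrs map[string]string `json:"attrs,omitempty"`
}

// renderTag parses the tag option as a template and executes it.
func renderTag(tmpl string, info tagInfo) (string, error) {
	t, err := template.New("tag").Parse(tmpl)
	if err != nil {
		return "", err
	}
	var buf bytes.Buffer
	if err := t.Execute(&buf, info); err != nil {
		return "", err
	}
	return buf.String(), nil
}

// buildEntry leaves the entry unchanged when tagTemplate is empty and
// otherwise stores the rendered tag under the "tag" attribute.
func buildEntry(line, tagTemplate string, info tagInfo) (string, error) {
	e := entry{Log: line}
	if tagTemplate != "" {
		tag, err := renderTag(tagTemplate, info)
		if err != nil {
			return "", err
		}
		e.Attrs = map[string]string{"tag": tag}
	}
	b, err := json.Marshal(e)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	info := tagInfo{ID: "abc123", Name: "web"}
	plain, _ := buildEntry("hello", "", info)
	tagged, _ := buildEntry("hello", "{{.Name}}/{{.ID}}", info)
	fmt.Println(plain)
	fmt.Println(tagged)
}
```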

Update: Removing test added to integration_cli as those have been deprecated.
Update: Using proper test calls (require and assert) in jsonfilelog_test.go based on review.
Update: Added new unit test configs for logs with tag. Updated unit test error checking.
Update: Cleanup check in jsonlogbytes_test.go to match pending changes in PR #34946.
Update: Merging to correct conflicts from PR #34946.

Signed-off-by: bonczj <josh.bonczkowski@gmail.com>
Upstream-commit: 5f50f4f511cd84e79bf005817af346b1764df27f
Component: engine
2018-01-19 17:41:19 +00:00
d00d4e32b9 Merge pull request #34859 from Microsoft/jjh/singleimagestore
LCOW: Coalesce daemon stores, allow dual LCOW and WCOW mode
Upstream-commit: bb6ce897378b4ebd0131fd835b01ad5f9af3ebb9
Component: engine
2018-01-19 11:38:30 -05:00
ebd586c561 LCOW remotefs - return error in Read() implementation
Signed-off-by: John Howard <jhoward@microsoft.com>
Upstream-commit: 6112ad6e7d5d7f5afc698447da80f91bdbf62720
Component: engine
2018-01-18 17:46:58 -08:00
40b95b8e94 Address feedback from Tonis
Signed-off-by: John Howard <jhoward@microsoft.com>
Upstream-commit: 0cba7740d41369eee33b671f26276325580bc07b
Component: engine
2018-01-18 12:30:39 -08:00
942fd3c62c LCOW: Regular mount if only one layer
Signed-off-by: John Howard <jhoward@microsoft.com>
Upstream-commit: 420dc4eeb48b155e6b83fccf62f8727ce4bf5b21
Component: engine
2018-01-18 12:01:58 -08:00
b4c44961cf Merge pull request #36030 from cpuguy83/quota_update
Ensure CPU quota/period updates are sent to runc
Upstream-commit: 0fa3962b8d8d78020c7e636c4bcea14d618929e1
Component: engine
2018-01-18 19:54:10 +01:00
852153685d LCOW: Refactor to multiple layer-stores based on feedback
Signed-off-by: John Howard <jhoward@microsoft.com>
Upstream-commit: afd305c4b5682fbc297e1685e2b7a49628b7c7f0
Component: engine
2018-01-18 08:31:05 -08:00
33860da10b LCOW: Re-coalesce stores
Signed-off-by: John Howard <jhoward@microsoft.com>

This re-coalesces the daemon stores which were split as part of the
original LCOW implementation.

This is part of the work discussed in https://github.com/moby/moby/issues/34617,
in particular see the document linked to in that issue.
Upstream-commit: ce8e529e182bde057cdfafded62c210b7293b8ba
Component: engine
2018-01-18 08:29:19 -08:00
ce1ad508f6 Merge pull request #35960 from abhi/service
Disable service on release network
Upstream-commit: 6feae060033544985e548dcf1b9127f8f634fe2b
Component: engine
2018-01-18 11:19:47 -05:00
9b47a9d16f Do not make graphdriver homes private mounts.
The idea behind making the graphdrivers private is to prevent leaking
mounts into other namespaces.
Unfortunately this is not really what happens.

There is one case where this does work, and that is when the namespace
was created before the daemon's namespace.
However, with systemd each system service winds up with its own mount
namespace. This causes a race between daemon startup and other system
services as to whether the mount is actually private.

This also means there is a negative impact when other system services
are started while the daemon is running.

Basically there are too many things that the daemon does not have
control over (nor should it) to be able to protect against these kinds
of leakages. One thing is certain, setting the graphdriver roots to
private disconnects the mount ns hierarchy, preventing propagation of
unmounts... new mounts are of course not propagated either, but the
behavior is racy (or just bad in the case of restarting services)... so
it's better to just be able to keep mount propagation intact.

It also does not protect situations like `-v
/var/lib/docker:/var/lib/docker` where all mounts are recursively bound
into the container anyway.

Signed-off-by: Brian Goff <cpuguy83@gmail.com>
Upstream-commit: 9803272f2db84df7955b16c0d847ad72cdc494d1
Component: engine
2018-01-18 09:34:00 -05:00
2012c45c5a Disable service on release network
This PR contains a fix for moby/moby#30321. There was a moby/moby#31142
PR intending to fix the issue by adding a delay between disabling the
service in the cluster and the shutdown of the tasks. However
disabling the service was not deleting the service info in the cluster.
Added a fix to delete service info from cluster and verified using siege
to ensure there is zero downtime on rolling update of a service. In order
to support it and ensure consistency of the enable and disable service knob
from the daemon, we need to ensure we disable the service when we release
the network from the container. This helps in making the enable and
disable service less racy. The corresponding part of libnetwork fix is
part of docker/libnetwork#1824

Signed-off-by: abhi <abhi@docker.com>
Upstream-commit: a042e5a20a7801efc936daf7a639487bb37ca966
Component: engine
2018-01-17 14:19:51 -08:00
0986b8a32c Fixing ingress network when upgrading from 17.09 to 17.12.
Signed-off-by: Pradip Dhara <pradipd@microsoft.com>
Upstream-commit: 2d7a50e5855ad0571e76d29cd1ab9f8f3a48433b
Component: engine
2018-01-17 17:11:18 +00:00
5a20e1240c LCOW: Fix OpenFile parameters
Signed-off-by: John Howard <jhoward@microsoft.com>
Upstream-commit: 141b9a74716c016029badf16aca21dc96975aaac
Component: engine
2018-01-17 07:58:18 -08:00