In order to do this, allow the socketcall syscall in the default
seccomp profile. This is a multiplexing syscall for the socket
operations, which is gradually becoming obsolete but is still used
on some architectures. libseccomp has special handling for it on
x86, where it is common, so we did not need it in the profile, but
it has no such handling for ppc64le. It turns out that the Debian
images we use for tests do use socketcall, while newer images such
as Ubuntu 16.04 do not. Enabling this does no harm as we allow all
the socket operations anyway, and we already allow the similar ipc
call for similar reasons.
Signed-off-by: Justin Cormack <justin.cormack@docker.com>
Upstream-commit: a83cedddc6d3e0fe1df352ec54245090df641ab8
Component: engine
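The socketcall multiplexing described in the commit above can be
sketched in Go. This is a simplified model and not the engine's code:
the `profile` type and the `entryPoint` helper are hypothetical names
introduced for illustration. It shows why allowing the individual
socket syscalls is not enough when a binary enters the kernel through
socketcall.

```go
package main

import "fmt"

// profile is a hypothetical, simplified allow list keyed by syscall name.
type profile map[string]bool

// entryPoint returns the syscall name a filter would see for a socket
// operation. When the operation is multiplexed, it arrives as
// "socketcall"; otherwise it arrives under its own name.
func entryPoint(op string, multiplexed bool) string {
	if multiplexed {
		return "socketcall"
	}
	return op
}

// allows reports whether the profile permits the operation as the
// filter would see it.
func (p profile) allows(op string, multiplexed bool) bool {
	return p[entryPoint(op, multiplexed)]
}

func main() {
	p := profile{"socket": true, "bind": true, "connect": true}
	fmt.Println("direct connect allowed:", p.allows("connect", false))
	fmt.Println("multiplexed connect allowed:", p.allows("connect", true))

	// Adding socketcall to the allow list covers the multiplexed path.
	p["socketcall"] = true
	fmt.Println("multiplexed connect allowed:", p.allows("connect", true))
}
```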
Currently the default seccomp profile is fixed. This changes it
so that it varies depending on the Linux capabilities selected with
the --cap-add and --cap-drop options. Without this, if a user adds
privileges, e.g. to allow ptrace with --cap-add sys_ptrace, they still
cannot actually use ptrace as it is still blocked by seccomp, so
they will probably disable seccomp or use --privileged. With this
change the syscalls that are needed for a capability are also
allowed by the seccomp profile when that capability is selected.
While this patch makes it easier to do things with for example
cap_sys_admin enabled, as it will now allow creating new namespaces
and use of mount, it still allows less than --cap-add cap_sys_admin
--security-opt seccomp:unconfined would have previously. It is not
recommended that users run containers with cap_sys_admin as this does
give full access to the host machine.
It also cleans up some architecture-specific system calls so that
they are only selected when needed.
Signed-off-by: Justin Cormack <justin.cormack@docker.com>
Upstream-commit: a01c4dc8f85827f32d88522e5153dddc02f11806
Component: engine
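The capability-dependent profile described in the commit above can be
sketched as follows. This is a minimal illustration, not the engine's
implementation: `baseSyscalls`, `capSyscalls` and `allowedSyscalls` are
hypothetical names, and the syscall lists are a small illustrative
subset based on the examples in the message (ptrace for sys_ptrace;
mount and namespace creation for cap_sys_admin).

```go
package main

import (
	"fmt"
	"sort"
)

// baseSyscalls stands in for the syscalls allowed regardless of
// capabilities (illustrative subset only).
var baseSyscalls = []string{"read", "write", "openat", "close"}

// capSyscalls maps a capability to the extra syscalls it unlocks.
// The entries are assumptions chosen to mirror the commit message.
var capSyscalls = map[string][]string{
	"CAP_SYS_PTRACE": {"ptrace"},
	"CAP_SYS_ADMIN":  {"mount", "setns", "unshare"},
}

// allowedSyscalls returns the sorted allow list for the selected caps.
func allowedSyscalls(caps []string) []string {
	seen := map[string]bool{}
	for _, s := range baseSyscalls {
		seen[s] = true
	}
	for _, c := range caps {
		for _, s := range capSyscalls[c] {
			seen[s] = true
		}
	}
	out := make([]string, 0, len(seen))
	for s := range seen {
		out = append(out, s)
	}
	sort.Strings(out)
	return out
}

// contains reports whether name is in list.
func contains(list []string, name string) bool {
	for _, s := range list {
		if s == name {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(allowedSyscalls(nil))
	fmt.Println(allowedSyscalls([]string{"CAP_SYS_PTRACE"}))
}
```

With no extra capabilities the list holds only the base syscalls, so
ptrace stays blocked until the matching capability is added.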
These syscalls are already blocked by the default capabilities:
mlock, mlock2 and mlockall require CAP_IPC_LOCK
vhangup requires CAP_SYS_TTY_CONFIG
There is therefore no reason to allow them in the default profile
as they cannot be used anyway.
Signed-off-by: Justin Cormack <justin.cormack@docker.com>
Upstream-commit: e7a99ae5e16f8688a0735c91856d13633f48185c
Component: engine
In order to check that we can have the `ptrace` rule, we need to
actually calculate the version of apparmor_parser.
Signed-off-by: Aleksa Sarai <asarai@suse.de>
Upstream-commit: d274456f3eb9f2a3dc518985ec22d236d3bc3f6c
Component: engine
ExecPath isn't used by anything, and the signal apparmor rule isn't used
because it refers to a peer that we don't ship.
Signed-off-by: Aleksa Sarai <asarai@suse.de>
Upstream-commit: 64fb664908f7d3368d1bbfd1efb56cd45e5ed7a3
Component: engine
This adds the following new syscalls that are supported in libseccomp 2.3.0,
including calls added up to kernel 4.5-rc4:
mlock2 - same as mlock but with a flag
copy_file_range - copy file contents, like splice but with reflink support.
The following is not added, and is mentioned in the docs:
userfaultfd - userspace page fault handling, mainly designed for process migration
The following are not added, as they only apply to less common architectures:
switch_endian
membarrier
breakpoint
set_tls
I plan to review the other architectures, some of which can now have seccomp
enabled in the build as they are now supported.
Signed-off-by: Justin Cormack <justin.cormack@docker.com>
Upstream-commit: 96896f2d0bc16269778dd4f60a4920b49953ffed
Component: engine
Fixes #20818
This syscall was blocked as there was some concern that it could be
used to bypass filtering of other syscall arguments. However none of the
potential syscalls where this could be an issue (poll, nanosleep,
clock_nanosleep, futex) are blocked in the default profile anyway.
Signed-off-by: Justin Cormack <justin.cormack@docker.com>
Upstream-commit: 5abd881883883a132f96f8adb1b07b5545af452b
Component: engine
This change centralizes the template manipulation in a single package
and adds basic string functions to their execution.
Signed-off-by: David Calavera <david.calavera@gmail.com>
Upstream-commit: 8514880997bd1bc944769dcc41e52307bb01f7ff
Component: engine
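A centralized template helper of the kind described in the commit above
might look like the sketch below. It is an assumption-laden
illustration: `basicFuncs`, `newTemplate` and `render` are hypothetical
names, and the set of string functions is a guess at what "basic string
functions" could cover. Only the standard text/template and strings
packages are used.

```go
package main

import (
	"bytes"
	"fmt"
	"strings"
	"text/template"
)

// basicFuncs is the single place where helper functions are registered
// (hypothetical selection).
var basicFuncs = template.FuncMap{
	"join":  strings.Join,
	"lower": strings.ToLower,
	"split": strings.Split,
	"upper": strings.ToUpper,
}

// newTemplate parses format with the shared helper functions attached.
func newTemplate(name, format string) (*template.Template, error) {
	return template.New(name).Funcs(basicFuncs).Parse(format)
}

// render executes format against data and returns the result.
func render(format string, data interface{}) (string, error) {
	tmpl, err := newTemplate("example", format)
	if err != nil {
		return "", err
	}
	var buf bytes.Buffer
	if err := tmpl.Execute(&buf, data); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	out, err := render(
		`{{upper .Name}} {{join (split .Tags ",") "+"}}`,
		map[string]string{"Name": "engine", "Tags": "a,b"},
	)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(out)
}
```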
On 32 bit x86 this is a multiplexing syscall for the System V
ipc syscalls such as shmget, and so needs to be allowed for
shared memory access for 32 bit binaries.
Fixes #20733
Signed-off-by: Justin Cormack <justin.cormack@docker.com>
Upstream-commit: 31410a6d79fc4ea6fa496636015bf9f53c1c8b14
Component: engine
We generally want to filter the personality(2) syscall, as it
allows disabling ASLR, and turning on some poorly supported
emulations that have been the target of CVEs. However the use
cases for reading the current value, setting the default
PER_LINUX personality, and setting PER_LINUX32 for 32 bit
emulation are fine.
See issue #20634
Signed-off-by: Justin Cormack <justin.cormack@docker.com>
Upstream-commit: 39b799ac53e2ba397edc3063432d01478416dbc8
Component: engine
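The argument filtering described in the personality(2) commit above can
be sketched as a simple predicate. This is an illustration rather than
the profile itself: `personalityAllowed` is a hypothetical name. The
constants are the values from linux/personality.h, and 0xffffffff is
the argument that queries the current personality without changing it.

```go
package main

import "fmt"

// Values from linux/personality.h.
const (
	perLinux        = 0x0000
	perLinux32      = 0x0008
	addrNoRandomize = 0x0040000  // flag that disables ASLR
	queryPersona    = 0xffffffff // read the current value only
)

// personalityAllowed reports whether the first argument to
// personality(2) is one of the three permitted exact values.
func personalityAllowed(persona uint64) bool {
	switch persona {
	case perLinux, perLinux32, queryPersona:
		return true
	}
	return false
}

func main() {
	for _, p := range []uint64{perLinux, perLinux32, queryPersona, addrNoRandomize} {
		fmt.Printf("%#x allowed: %v\n", p, personalityAllowed(p))
	}
}
```

Matching exact values, rather than masking bits, means a permitted
personality combined with a flag such as ADDR_NO_RANDOMIZE is still
rejected.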
Ubuntu ships apparmor_parser 2.9 erroneously as "2.8.95". Fix the
incorrect version check for >=2.8, when in fact 2.8 doesn't support the
required feature.
Signed-off-by: Aleksa Sarai <asarai@suse.com>
Upstream-commit: 284d9d451e93baff311b501018cae2097f76b134
Component: engine
Using {{if major}}{{if minor}} doesn't work as expected when the major
version changes. In addition, this didn't support patch levels (which is
necessary in some cases when distributions ship apparmor weirdly).
Signed-off-by: Aleksa Sarai <asarai@suse.com>
Upstream-commit: 4bf7a84c969b9309b0534a61af55b8bb824acc0a
Component: engine
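The version-check problem described in the two apparmor commits above
can be illustrated by folding the version into one comparable integer.
This is a sketch under assumptions: `parseVersion` and `nestedCheck` are
hypothetical names, the encoding (major*100000 + minor*1000 + patch) is
chosen for illustration, and `nestedCheck` is only one reading of how a
nested major/minor test behaves.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseVersion turns "major[.minor[.patch]]" into a single integer so
// that versions can be compared with one >= test.
func parseVersion(v string) (int, error) {
	parts := strings.Split(strings.TrimSpace(v), ".")
	if len(parts) > 3 {
		return 0, fmt.Errorf("too many version components: %q", v)
	}
	var nums [3]int
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil || n < 0 {
			return 0, fmt.Errorf("invalid version component %q in %q", p, v)
		}
		nums[i] = n
	}
	return nums[0]*100000 + nums[1]*1000 + nums[2], nil
}

// nestedCheck mimics testing major and minor separately; it wrongly
// rejects a newer major version with a small minor number.
func nestedCheck(major, minor int) bool {
	return major >= 2 && minor >= 8
}

func main() {
	required, _ := parseVersion("2.8.95")
	for _, s := range []string{"2.8", "2.8.95", "2.9", "3.0"} {
		v, err := parseVersion(s)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s supported: %v\n", s, v >= required)
	}
	fmt.Println("nested check for 3.0:", nestedCheck(3, 0))
}
```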
profile is created by go generate
Signed-off-by: Jessica Frazelle <acidburn@docker.com>
Upstream-commit: d57816de0293e18ecfa68ac6e8c288a888912e33
Component: engine