The container that slirp4netns runs in should already be quite difficult to do
anything malicious in beyond basic denial of service or sending of network
traffic. There is, however, one hole remaining in the case in which there is
an adversary able to run code locally: abstract unix sockets. Because these
are governed by network namespaces, not IPC namespaces, and slirp4netns is in
the root network namespace, any process in the root network namespace can
cooperate with the slirp4netns process to take over its user.
To close this, we use seccomp to block the creation of unix-domain sockets by
slirp4netns. This requires some finesse, since slirp4netns absolutely needs
to be able to create other types of sockets - at minimum AF_INET and AF_INET6
Seccomp has many, many pitfalls. To name a few:
1. Seccomp provides you with an "arch" field, but this does not uniquely
determine the ABI being used; the actual meaning of a system call number
depends on both the number (which is often the result of ORing a related
system call with a flag for an alternate ABI) and the architecture.
2. Seccomp provides no direct way of knowing what the native value for the
arch field should be; the user must do configure/compile-time testing for
every architecture+ABI combination they want to support. Amusingly enough,
the linux-internal header files have this exact information
(SECCOMP_ARCH_NATIVE), but they aren't sharing it.
3. The only system call numbers we naturally have are the native ones in
asm/unistd.h. __NR_socket will always refer to the system call number for
the target system's ABI.
4. Seccomp can only manipulate 32-bit words, but represents every system call
argument as a uint64.
5. New system call numbers with as-yet-unknown semantics can be added to the
kernel at any time.
6. Based on this comment in arch/x86/entry/syscalls/syscall_32.tbl:
# 251 is available for reuse (was briefly sys_set_zone_reclaim)
previously-invalid system call numbers may later be reused for new system
calls.
7. Most architecture+ABI combinations have system call tables with many gaps
in them. arm-eabi, for example, has 35 such gaps (note: this is just the
number of distinct gaps, not the number of system call numbers contained in
those gaps).
8. Seccomp's BPF filters require a fully-acyclic control flow graph.
Any operation on a data structure must therefore first be fully
unrolled before it can be run.
9. Seccomp cannot dereference pointers. Only the raw bits provided to the
system calls can be inspected.
10. Some architecture+ABI combos have multiplexer system calls. For example,
socketcall can perform any socket-related system call. The arguments to
the multiplexed system call are passed indirectly, via a pointer to user
memory. They therefore cannot be inspected by seccomp.
11. Some valid system calls are not listed in any table in the kernel source.
For example, __ARM_NR_cacheflush is an "ARM private" system call. It does
not appear in any *.tbl file.
12. Conditional branches are limited to relative jumps of at most 256
instructions forward.
13. Prior to Linux 4.8, any process able to spawn another process and call
ptrace could bypass seccomp restrictions.
To address (1), (2), and (3), we include preprocessor checks to identify the
native architecture value, and reject all system calls that don't use the
native architecture.
To address (4), we use the AC_C_BIGENDIAN autoconf check to conditionally
define WORDS_BIGENDIAN, and match up the proper portions of any uint64 we test
for with the value in the accumulator being tested against.
To address (5) and (6), we use system call pinning. That is, we hardcode a
snapshot of all the valid system call numbers at the time of writing, and
reject any system call numbers not in the recorded set. A set is recorded for
every architecture+ABI combo, and the native one is chosen at compile-time.
This ensures that not only are non-native architectures rejected, but so are
non-native ABIs. For the sake of conciseness, we represent these sets as sets
of disjoint ranges. Due to (7), checking each range in turn could add a lot
of overhead to each system call, so we instead binary search through the
ranges. Due to (8), this binary search has to be fully unrolled, so we do
that too.
It can be tedious and error-prone to manually produce the syscall ranges by
looking at linux's *.tbl files, since the gaps are often small and
uncommented. To address this, a script, build-aux/extract-syscall-ranges.sh,
is added that will produce them given a *.tbl filename and an ABI regex (some
tables seem to abuse the ABI field with strange values like "memfd_secret").
Note that producing the final values still requires looking at the proper
asm/unistd.h file to find any private numbers and to identify any offsets and
ABI variants used.
(10) used to have no good solution, but in the past decade most architectures
have gained dedicated system call alternatives to at least socketcall, so we
can (hopefully) just block it entirely.
To address (13), we block ptrace also.
* build-aux/extract-syscall-ranges.sh: new script.
* Makefile.am (EXTRA_DIST): register it.
* config-daemon.ac: use AC_C_BIGENDIAN.
* nix/libutil/spawn.cc (setNoNewPrivsAction, addSeccompFilterAction): new
functions.
* nix/libutil/spawn.hh (setNoNewPrivsAction, addSeccompFilterAction): new
declarations.
(SpawnContext)[setNoNewPrivs, addSeccompFilter]: new fields.
* nix/libutil/seccomp.hh: new header file.
* nix/libutil/seccomp.cc: new file.
* nix/local.mk (libutil_a_SOURCES, libutil_headers): register them.
* nix/libstore/build.cc (slirpSeccompFilter, writeSeccompFilterDot):
new functions.
(spawnSlirp4netns): use them, set seccomp filter for slirp4netns.
Change-Id: Ic92c7f564ab12596b87ed0801b22f88fbb543b95
Signed-off-by: John Kehayias <john.kehayias@protonmail.com>
This adds a mechanism for manipulating and running "spawn phases" similarly to
how builder-side code manipulates "build phases". The main difference is that
spawn phases take a (reference to a) single structure that they can both read
from and write to, with their writes being visible to subsequent phases. The
base structure type for this is SpawnContext.
It also adds some predefined phase sequences, namely basicSpawnPhases and
cloneSpawnPhases, and exposes each of the actions performed by these phases.
Finally, it modifies build.cc to replace runChild() with use of this new code.
* nix/libutil/util.cc (keepOnExec, waitForMessage): new functions.
* nix/libutil.util.hh (keepOnExec, waitForMessage): add prototypes.
* nix/libutil/spawn.cc, nix/libutil/spawn.hh: new files.
(addPhaseAfter, addPhaseBefore, prependPhase, appendPhase, deletePhase,
replacePhase, reset_writeToStderrAction, restoreAffinityAction,
setsidAction, earlyIOSetupAction, dropAmbientCapabilitiesAction,
chrootAction, chdirAction, closeMostFDsAction, setPersonalityAction,
oomSacrificeAction, setIDsAction, restoreSIGPIPEAction, setupSuccessAction,
execAction, getBasicSpawnPhases, usernsInitSyncAction, usernsSetIDsAction,
initLoopbackAction, setHostAndDomainAction, makeFilesystemsPrivateAction,
makeChrootSeparateFilesystemAction, statfsToMountFlags, bindMount,
mountIntoChroot, mountIntoChrootAction, mountProcAction, mountDevshmAction,
mountDevptsAction, pivotRootAction, lockMountsAction, getCloneSpawnPhases,
runChildSetup, runChildSetupEntry, cloneChild, idMapToIdentityMap,
unshareAndInitUserns): new procedures.
* nix/local.mk (libutil_a_SOURCES): add spawn.cc.
(libutil_headers): add spawn.hh.
* nix/libstore/build.cc (restoreSIGPIPE, DerivationGoal::runChild,
childEntry): removed procedures.
(DerivationGoal::{dirsInChroot,env,readiness}): removed.
(execBuilderOrBuiltin, execBuilderOrBuiltinAction,
clearRootWritePermsAction): new procedures.
(DerivationGoal::startBuilder): modified to use a CloneSpawnContext if
chroot builds are available, otherwise a SpawnContext.
Change-Id: Ifd50110de077378ee151502eda62b99973d083bf
Change-Id: I76e10d3f928cc30566e1e6ca79077196972349f8
spawn.cc, util.cc, util.hh changes
Change-Id: I287320e63197cb4f65665ee5b3fdb3a0e125ebac
Signed-off-by: John Kehayias <john.kehayias@protonmail.com>
Fixes <https://issues.guix.gnu.org/78318>.
This is a followup to 107eb8ee8f.
* nix/local.mk (etc/guix-%.service): Add ‘g’ for ‘@localstatedir@’
substitution. Substitute ‘@storedir@’.
Reported-by: Ido Yariv <yarivido@gmail.com>
Change-Id: I9b53d3a6d713a000bc0a7a57f667badc00d2dff8
This is a followup to 8a7bd211d2.
As it is mentioned in autoconf manual that library names should be
specified in LIBS, not LDFLAGS. See:
https://www.gnu.org/savannah-checkouts/gnu/autoconf/manual/autoconf-2.72/html_node/Preset-Output-Variables.html#index-LDFLAGS-2
This change also brings back the save_* vars trick that was there
before. I missed in my earlier change that nix/local.mk was referring
LIBGCRYPT_* vars directly.
And, instead of CXXFLAGS, CPPFLAGS is used since the latter is probably
more correct as this is used for include dirs, therefore using
preprocessor flags.
Tested with ./configure LDFLAGS="-Wl,--as-needed" --with-libgcrypt-prefix=... combinations.
* config-daemon.ac: Set ‘LIBGCRYPT_CPPFLAGS’ instead of
‘LIBGCRYPT_CXXFLAGS’. Set ‘LIBGCRYPT_LIBS’ in addition to
‘LIBGCRYPT_LDFLAGS’. Save and restore ‘CPPFLAGS’, ‘LDFLAGS’, and ‘LIBS’
around test.
* nix/local.mk (libutil_a_CPPFLAGS): Add $(LIBGCRYPT_CPPFLAGS).
(libstore_a_CXXFLAGS): Remove $(LIBGCRYPT_CFLAGS).
(guix_daemon_LDFLAGS): New variable.
Change-Id: Iadb10e1994c9a78e2927847af2cfe5e096fbb2a8
Signed-off-by: Ludovic Courtès <ludo@gnu.org>
Having substitute URLs explicitly listed in the service startup file
makes it clearer what should be modified to permanently change the list
of substitute URLs.
* config-daemon.ac: Rename ‘guix_substitute_urls’ to
‘GUIX_SUBSTITUTE_URLS’ and substitute it.
* nix/local.mk (etc/guix-%.service, etc/init.d/guix-daemon)
(etc/guix-%.conf): Substitute it.
* etc/guix-daemon.conf.in, etc/guix-daemon.service.in,
etc/init.d/guix-daemon.in: Add an explicit ‘--substitute-urls’ option.
Change-Id: Ie491b7fab5c42e54dca582801c03805a85de2bf9
This reverts commit 69f6edc1a8.
The intention is good, but nodist_systemdservice_DATA are meant to be
disposable artefacts generated from corresponding ‘.in’ files.
etc/guix-gc.timer doesn't fit that description, breaking builds:
$ make clean && make
…
make[2]: *** No rule to make target 'etc/guix-gc.timer', needed by 'all-am'. Stop.
Without this invoking ‘make clean’ would remove ‘guix-gc.timer’, and ‘make’
would fail with.
make[2]: *** No rule to make target 'etc/guix-gc.timer', needed by 'all-am'. Stop.
* nix/local.mk (nodist_systemdservice_DATA): Remove ‘guix-gc.timer’.
This follows up on 1a1faa78b0, and avoids
the (non-fatal) error seen in <https://issues.guix.gnu.org/41356>.
/gnu/store will remain writable on new foreign distribution
installations until the next release.
* etc/guix-install.sh (sys_enable_guix_daemon): Check for
‘gnu-store.mount’ presence before trying to cp it.
Update forgotten copyright header.
* etc/gnu-store.mount.in: New file.
* nix/local.mk (nodist_systemdservice_DATA): Add it.
(etc/%.mount): New rule for it.
* etc/guix-install.sh (sys_enable_guix_daemon): Install it.
* doc/guix.texi (Binary Installation): Document it.
* .gitignore: Ignore changes to it.
The daemon had a mechanism that allows it to handle a list of
substituters and try them sequentially; this removes it.
* nix/scripts/substitute.in: Remove.
* nix/local.mk (nodist_pkglibexec_SCRIPTS): Remove.
* config-daemon.ac: Don't output 'nix/scripts/substitute'.
* nix/libstore/build.cc (SubstitutionGoal)[subs, sub, hasSubstitute]:
Remove.
[tryNext]: Make private.
(SubstitutionGoal::SubstitutionGoal, SubstitutionGoal::init): Remove now
unneeded initializers.
(SubstitutionGoal::tryNext): Adjust to assume a single substituter: call
'amDone' upfront when we couldn't find substitutes.
(SubstitutionGoal::tryToRun): Adjust to run 'guix substitute' via
'settings.guixProgram'.
(SubstitutionGoal::finished): Call 'amDone(ecFailed)' upon failure
instead of setting 'state' to 'tryNext'.
* nix/libstore/globals.hh (Settings)[substituters]: Remove.
* nix/libstore/local-store.cc (LocalStore::~LocalStore): Adjust to
handle a single substituter.
(LocalStore::startSubstituter): Remove 'path' parameter. Adjust to
invoke 'settings.guixProgram'. Don't refer to 'run.program', which no
longer exists.
(LocalStore::querySubstitutablePaths): Adjust for 'runningSubstituters'
being a singleton instead of a list.
(LocalStore::querySubstitutablePathInfos): Likewise, and remove
'substituter' parameter.
* nix/libstore/local-store.hh (RunningSubstituter)[program]: Remove.
(LocalStore)[runningSubstituters]: Remove.
[runningSubstituter]: New field.
[querySubstitutablePathInfos]: Remove 'substituter' parameter.
[startSubstituter]: Remove 'substituter' parameter.
* nix/nix-daemon/guix-daemon.cc (main): Remove references to
'settings.substituters'.
* nix/nix-daemon/nix-daemon.cc (performOp): Ignore the user's
"build-use-substitutes" value when 'settings.useSubstitutes' is false.
This makes it easier to run the uninstalled daemon.
* nix/local.mk (libstore_a_CPPFLAGS): Append "/guix" to
NIX_LIBEXEC_DIR.
* build-aux/pre-inst-env.in (NIX_LIBEXEC_DIR): Adjust comment.
* nix/libstore/builtins.cc (builtinDownload): Remove SUBDIR and its
use.
* nix/libstore/local-store.cc (runAuthenticationProgram): Ditto.
* nix/libstore/gc.cc (addAdditionalRoots): Remove "/guix" prefix.
* nix/nix-daemon/guix-daemon.cc (main): Ditto.
That way it is handled in the same way as other helper scripts.
* nix/scripts/guix-authenticate.in: Rename to...
* nix/scripts/authenticate.in: ... this.
* config-daemon.ac: Adjust accordingly.
* nix/local.mk (libstore_a_CPPFLAGS): Remove -DOPENSSL_PATH.
(nodist_libexec_SCRIPTS): Remove.
(nodist_pkglibexec_SCRIPTS): New variable.
* nix/nix-daemon/guix-daemon.cc (main): Remove 'setenv' call for
"PATH".
* nix/libstore/local-store.cc (runAuthenticationProgram): New function.
(LocalStore::exportPath, LocalStore::importPath): Use it instead of
'runProgram' and OPENSSL_PATH.
* config-daemon.ac: Don't bail out when libbz2 is missing. Define
'HAVE_LIBBZ2' Automake conditional.
* nix/libstore/build.cc: Wrap relevant bits in '#if HAVE_BZLIB_H'.
* nix/libstore/globals.cc (Settings::Settings): 'logCompression'
defaults to COMPRESSION_GZIP when HAVE_BZLIB_H is false.
* nix/libstore/globals.hh (CompressionType): Make 'COMPRESSION_BZIP2'
conditional on HAVE_BZLIB_H.
* nix/local.mk (guix_register_LDADD, guix_daemon_LDADD): Add -lbz2 only
when HAVE_LIBBZ2.
* nix/nix-daemon/guix-daemon.cc (parse_opt): Ignore "bzip2" when not
HAVE_BZLIB_H.
Otherwise, users will be stuck running an old copy of guix and the guix-daemon
if they copy the service files instead of symlinking them.
* etc/guix-daemon.conf.in, etc/guix-daemon.service.in, etc/guix-publish.conf.in,
etc/guix-publish.service.in: Expand @localstatedir@ instead of @bindir@.
* nix/local.mk (etc/guix-%.service, etc/guix-%.conf): Use @localstatedir@
instead of @bindir@.
* .gitignore: add etc/guix-publish.conf and /etc/guix-publish.service.
* etc/guix-publish.conf.in: New file.
* etc/guix-publish.service.in: New file.
* nix/local.mk (etc/guix-%.service, etc/guix-%.conf): Generalized former
build-rules for by using patterns.
(nodist_systemdservice_DATA): Add etc/guix-publish.service, update
comment.
(nodist_upstartjob_DATA): Add etc/guix-publish.conf, update comment.
* doc/guix.texi (Invoking guix publish): Add description for enabling
"guix publish" on host distros using the new files.
* doc/local.mk: Use "%D%" for the directory of the fragment relative to
the base 'Makefile.am'.
* emacs/local.mk: Likewise.
* gnu/local.mk: Likewise.
* nix/local.mk: Likewise.
This follows a convention used by some other GNU packages like Autoconf,
Bison, Coreutils, and Gnulib.
* doc.am: Rename to ...
* doc/local.mk: ... this.
* emacs.am: Rename to ...
* emacs/local.mk: ... this.
* gnu-system.am: Rename to ...
* gnu/local.mk: ... this.
* daemon.am: Rename to ...
* nix/local.mk: ... this.
* Makefile.am: Adapt to them.
* doc/guix.texi (Porting to a New Platform): Adapt documentation.
* guix/config.scm.in (%state-directory, %config-directory): Adapt comments.
* emacs/guix-config.el.in (guix-config-state-directory): Likewise.