It should already be quite difficult to do anything malicious from inside the
container that slirp4netns runs in, beyond basic denial of service or sending
network traffic. There is, however, one hole remaining when there is
an adversary able to run code locally: abstract unix sockets. Because these
are governed by network namespaces, not IPC namespaces, and slirp4netns is in
the root network namespace, any process in the root network namespace can
cooperate with the slirp4netns process to take over its user.
To close this, we use seccomp to block the creation of unix-domain sockets by
slirp4netns. This requires some finesse, since slirp4netns absolutely needs
to be able to create other types of sockets - at minimum AF_INET and AF_INET6.
Seccomp has many, many pitfalls. To name a few:
1. Seccomp provides you with an "arch" field, but this does not uniquely
determine the ABI being used; the actual meaning of a system call number
depends on both the number (which is often the result of ORing a related
system call with a flag for an alternate ABI) and the architecture.
2. Seccomp provides no direct way of knowing what the native value for the
arch field should be; the user must do configure/compile-time testing for
every architecture+ABI combination they want to support. Amusingly enough,
the linux-internal header files have this exact information
(SECCOMP_ARCH_NATIVE), but they aren't sharing it.
3. The only system call numbers we naturally have are the native ones in
asm/unistd.h. __NR_socket will always refer to the system call number for
the target system's ABI.
4. Seccomp can only manipulate 32-bit words, but represents every system call
argument as a uint64.
5. New system call numbers with as-yet-unknown semantics can be added to the
kernel at any time.
6. Based on this comment in arch/x86/entry/syscalls/syscall_32.tbl:
    # 251 is available for reuse (was briefly sys_set_zone_reclaim)
previously-invalid system call numbers may later be reused for new system
calls.
7. Most architecture+ABI combinations have system call tables with many gaps
in them. arm-eabi, for example, has 35 such gaps (note: this is just the
number of distinct gaps, not the number of system call numbers contained in
those gaps).
8. Seccomp's BPF filters require a fully-acyclic control flow graph.
Any operation on a data structure must therefore first be fully
unrolled before it can be run.
9. Seccomp cannot dereference pointers. Only the raw bits provided to the
system calls can be inspected.
10. Some architecture+ABI combos have multiplexer system calls. For example,
socketcall can perform any socket-related system call. The arguments to
the multiplexed system call are passed indirectly, via a pointer to user
memory. They therefore cannot be inspected by seccomp.
11. Some valid system calls are not listed in any table in the kernel source.
For example, __ARM_NR_cacheflush is an "ARM private" system call. It does
not appear in any *.tbl file.
12. Conditional branches are limited to relative jumps of at most 256
instructions forward.
13. Prior to Linux 4.8, any process able to spawn another process and call
ptrace could bypass seccomp restrictions.
To address (1), (2), and (3), we include preprocessor checks to identify the
native architecture value, and reject all system calls that don't use the
native architecture.
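
As an illustration of why both fields matter, here is a minimal sketch of the
check for x86_64 only, modelled in plain Python rather than BPF. The three
constants are the values from the Linux UAPI headers; the function name
is_native_x86_64 is hypothetical and is not part of this change.

```python
# Values from <linux/audit.h> and <asm/unistd.h> on x86.
AUDIT_ARCH_X86_64 = 0xC000003E
AUDIT_ARCH_I386 = 0x40000003
X32_SYSCALL_BIT = 0x40000000  # __X32_SYSCALL_BIT


def is_native_x86_64(arch, nr):
    """Return True only for the plain x86_64 ABI (hypothetical helper).

    x32 calls report the same arch value as x86_64 and differ only by a
    flag ORed into the system call number, so both fields must be checked.
    """
    if arch != AUDIT_ARCH_X86_64:
        return False
    return (nr & X32_SYSCALL_BIT) == 0


# __NR_socket is 41 on x86_64.
print(is_native_x86_64(AUDIT_ARCH_X86_64, 41))                    # True
print(is_native_x86_64(AUDIT_ARCH_X86_64, 41 | X32_SYSCALL_BIT))  # False
print(is_native_x86_64(AUDIT_ARCH_I386, 41))                      # False
```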
To address (4), we use the AC_C_BIGENDIAN autoconf check to conditionally
define WORDS_BIGENDIAN, and match up the proper portions of any uint64 we test
for with the value in the accumulator being tested against.
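
The word selection can be sketched as follows. This is a Python model of the
struct seccomp_data layout (nr, arch, instruction_pointer, args[6]), not the
code added here; the helper names are hypothetical, and the big_endian
parameter stands in for WORDS_BIGENDIAN.

```python
import struct

ARGS_OFFSET = 16  # offsetof(struct seccomp_data, args)


def arg_word_offsets(index, big_endian):
    """Return (low_word_offset, high_word_offset) of args[index]."""
    base = ARGS_OFFSET + 8 * index
    return (base + 4, base) if big_endian else (base, base + 4)


def pack_seccomp_data(nr, arch, ip, args, big_endian):
    """Build the 64-byte structure a filter would see (model only)."""
    order = ">" if big_endian else "<"
    return struct.pack(order + "iIQ6Q", nr, arch, ip, *args)


def load_word(data, offset, big_endian):
    """Model a 32-bit absolute load in the host's byte order."""
    return struct.unpack_from(">I" if big_endian else "<I", data, offset)[0]


# socket(AF_UNIX, ...): args[0] is the address family, AF_UNIX == 1.
for be in (False, True):
    data = pack_seccomp_data(41, 0xC000003E, 0, [1, 0, 0, 0, 0, 0], be)
    low, high = arg_word_offsets(0, be)
    print(be, low, high, load_word(data, low, be), load_word(data, high, be))
# False 16 20 1 0
# True 20 16 1 0
```

Testing a 64-bit argument therefore takes two loads and two comparisons, and
only the offsets change between little-endian and big-endian hosts.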
To address (5) and (6), we use system call pinning. That is, we hardcode a
snapshot of all the valid system call numbers at the time of writing, and
reject any system call numbers not in the recorded set. A set is recorded for
every architecture+ABI combo, and the native one is chosen at compile-time.
This ensures that not only are non-native architectures rejected, but so are
non-native ABIs. For the sake of conciseness, we represent these sets as sets
of disjoint ranges. Due to (7), checking each range in turn could add a lot
of overhead to each system call, so we instead binary search through the
ranges. Due to (8), this binary search has to be fully unrolled, so we do
that too.
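
A rough sketch of the idea, in Python rather than BPF: in_ranges is the
ordinary looping search, and unroll emits the same search as a fixed tree of
comparisons with no backward jumps. Both function names and the example
ranges are made up for illustration; the real sets are per architecture+ABI.

```python
EXAMPLE_RANGES = [(0, 5), (8, 10), (20, 20)]  # made-up, sorted, disjoint


def in_ranges(ranges, nr):
    """Binary search for nr in a sorted list of inclusive ranges."""
    lo, hi = 0, len(ranges) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        first, last = ranges[mid]
        if nr < first:
            hi = mid - 1
        elif nr > last:
            lo = mid + 1
        else:
            return True
    return False


def unroll(ranges, lo=0, hi=None, depth=0):
    """Return the search as loop-free pseudocode lines, one node per range."""
    if hi is None:
        hi = len(ranges) - 1
    pad = "  " * depth
    if lo > hi:
        return [pad + "reject"]
    mid = (lo + hi) // 2
    first, last = ranges[mid]
    lines = [f"{pad}if nr < {first}:"]
    lines += unroll(ranges, lo, mid - 1, depth + 1)
    lines += [f"{pad}elif nr > {last}:"]
    lines += unroll(ranges, mid + 1, hi, depth + 1)
    lines += [f"{pad}else:", f"{pad}  allow"]
    return lines


print(in_ranges(EXAMPLE_RANGES, 9))  # True
print(in_ranges(EXAMPLE_RANGES, 6))  # False
print("\n".join(unroll(EXAMPLE_RANGES)))
```

The unrolled form grows linearly with the number of ranges, while the number
of comparisons executed per system call grows only logarithmically.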
It can be tedious and error-prone to manually produce the syscall ranges by
looking at Linux's *.tbl files, since the gaps are often small and
uncommented. To address this, a script, build-aux/extract-syscall-ranges.sh,
is added that will produce them given a *.tbl filename and an ABI regex (some
tables seem to abuse the ABI field with strange values like "memfd_secret").
Note that producing the final values still requires looking at the proper
asm/unistd.h file to find any private numbers and to identify any offsets and
ABI variants used.
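
The extraction step might look roughly like the following. This is a Python
sketch of the idea only, not the shell script itself; it assumes the usual
"<number> <abi> <name> ..." table layout, and the sample table is a made-up
excerpt rather than a real one.

```python
import re


def syscall_ranges(tbl_text, abi_regex):
    """Collapse the numbers of matching table rows into inclusive ranges."""
    abi = re.compile(abi_regex)
    numbers = set()
    for line in tbl_text.splitlines():
        fields = line.split("#", 1)[0].split()  # drop comments
        if len(fields) >= 3 and fields[0].isdigit() and abi.fullmatch(fields[1]):
            numbers.add(int(fields[0]))
    ranges = []
    for n in sorted(numbers):
        if ranges and n == ranges[-1][1] + 1:
            ranges[-1] = (ranges[-1][0], n)  # extend the current range
        else:
            ranges.append((n, n))            # a gap: start a new range
    return ranges


SAMPLE = """\
# <number> <abi> <name> <entry point>
0 common read sys_read
1 common write sys_write
2 common open sys_open
# 3 is unused in this made-up sample
4 64 stat sys_newstat
5 x32 fstat compat_sys_fstat
7 common poll sys_poll
"""

print(syscall_ranges(SAMPLE, "common|64"))  # [(0, 2), (4, 4), (7, 7)]
```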
(10) used to have no good solution, but in the past decade most architectures
have gained dedicated system call alternatives to at least socketcall, so we
can (hopefully) just block it entirely.
To address (13), we block ptrace also.
* build-aux/extract-syscall-ranges.sh: new script.
* Makefile.am (EXTRA_DIST): register it.
* config-daemon.ac: use AC_C_BIGENDIAN.
* nix/libutil/spawn.cc (setNoNewPrivsAction, addSeccompFilterAction): new
functions.
* nix/libutil/spawn.hh (setNoNewPrivsAction, addSeccompFilterAction): new
declarations.
(SpawnContext)[setNoNewPrivs, addSeccompFilter]: new fields.
* nix/libutil/seccomp.hh: new header file.
* nix/libutil/seccomp.cc: new file.
* nix/local.mk (libutil_a_SOURCES, libutil_headers): register them.
* nix/libstore/build.cc (slirpSeccompFilter, writeSeccompFilterDot):
new functions.
(spawnSlirp4netns): use them, set seccomp filter for slirp4netns.
Change-Id: Ic92c7f564ab12596b87ed0801b22f88fbb543b95
Signed-off-by: John Kehayias <john.kehayias@protonmail.com>
guix/build/po.go is not installed and this speeds up convert-xref.scm when
cross-compiling for a host with incompatible guile bytecode.
Fixes: guix/guix#141
* Makefile.am (guile-compilation-rule): Parameterize the host variable.
(make-core-go, make-packages*-go, make-system-go, make-cli-go): Compile for the
host triplet.
(guix/build/po.go): Compile for the build triplet.
Change-Id: I9bad5f7743dd736a2958fb8ae8dd0ee8efc190ec
Signed-off-by: Ludovic Courtès <ludo@gnu.org>
Fixes <https://issues.guix.gnu.org/31785>.
Similar to <https://github.com/NixOS/nix/issues/178>, fixed in
<29cde917fe>.
We can't rely on Goal deletion to release our locks in a timely manner. In
the case in which multiple guix-daemon processes simultaneously try producing
an output path path1, the one that gets there first (P1) will get the lock,
and the second one (P2) will continue trying to acquire the lock until it is
released. Once it has acquired the lock, it checks to see whether the path
has already become valid in the meantime, and if so it reports success to
those Goals waiting on its completion and finishes. Unfortunately, it fails
to release the locks it holds first, so those stay held until that Goal gets
deleted.
Suppose we have the following store path dependency graph:
          path4
        /   |   \
    path1 path2 path3
P2 is now sitting on path1's lock, and will continue to do so until path4 is
completed. Suppose there is also a P3, and it has been blocked while P1
builds path2. Now P3 is sitting on path2's lock, and can't acquire path1's
lock to determine that it has been completed. Likewise P2 is sitting on
path1's lock, and now can't acquire path2's lock to determine that it has been
completed. Finally, P3 completes path3 while P2 is blocked.
Now:
- P1 knows that path1 and path2 are complete, and holds no locks, but can't
determine that path3 is complete
- P2 knows that path1 and path3 are complete, and holds locks on path1 and
path3, but can't determine that path2 is complete
- P3 knows that path2 and path3 are complete, and holds a lock on path2, but
can't determine that path1 is complete
And none of these locks will be released until path4 is complete. Thus, we
have a deadlock.
To resolve this, we explicitly release these locks as soon as the path is
found to have become valid, instead of waiting for the Goal to be deleted.
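
The state listed above can be checked mechanically with a small wait-for
graph. The sketch below is an illustration in Python, not daemon code; the
function name find_cycle is hypothetical, and the holds/wants tables simply
transcribe the three bullet points.

```python
def find_cycle(holds, wants):
    """Return a list of processes that wait on each other, or None."""
    owner = {path: proc for proc, paths in holds.items() for path in paths}
    # Each process waits for whoever holds the lock it wants (if anyone).
    waits_for = {proc: owner.get(path) for proc, path in wants.items()}
    for start in waits_for:
        seen = []
        proc = start
        while proc is not None and proc not in seen:
            seen.append(proc)
            proc = waits_for.get(proc)
        if proc is not None:
            return seen[seen.index(proc):]
    return None


wants = {"P1": "path3", "P2": "path2", "P3": "path1"}

# Locks still held after the outputs became valid.
stuck = {"P1": [], "P2": ["path1", "path3"], "P3": ["path2"]}
print(find_cycle(stuck, wants))     # ['P2', 'P3']

# Locks released as soon as the outputs are seen to be valid.
released = {"P1": [], "P2": [], "P3": []}
print(find_cycle(released, wants))  # None
```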
* nix/libstore/build.cc
(DerivationGoal::tryToBuild, SubstitutionGoal::tryToRun):
Explicitly release locks in the has-become-valid case.
* tests/store-deadlock.scm: New file.
* Makefile.am (SCM_TESTS): Add it.
Change-Id: Ie510f84828892315fe6776c830db33d0f70bcef8
Signed-off-by: Ludovic Courtès <ludo@gnu.org>
* gnu/packages/aux-files/emacs/comp-integrity-next.el: New file.
* Makefile.am (AUX_FILES): Register it here.
* gnu/packages/emacs.scm (emacs-next-minimal): Update to 30.0.60-1.4e22ef8.
(emacs->emacs-next)[arguments]: Swap out ‘validate-comp-integrity’ phase,
so as to refer to the new integrity check.
* Makefile.am (AM_V_PO4A_0, AM_V_POXREF_0): Align the Automake status lines of
the po4a and the PO xref, since the common width is 10 characters.
Change-Id: Ic8c32f73294ba6e4ca71ab4aa889a558e4d7fcee
Signed-off-by: Florian Pelz <pelzflorian@pelzflorian.de>
Big thanks to Dariqq <dariqq@posteo.net> for debugging and communicating
upstream about a problem with the deblobbing scripts in this kernel
series!
* gnu/packages/linux.scm (linux-libre-6.9-version, linux-libre-6.9-gnu-revision,
deblob-scripts-6.9, linux-libre-6.9-pristine-source, linux-libre-6.9-source,
linux-libre-headers-6.9, linux-libre-6.9): New variables.
* gnu/packages/aux-files/linux-libre/6.9-arm.conf,
gnu/packages/aux-files/linux-libre/6.9-arm64.conf,
gnu/packages/aux-files/linux-libre/6.9-i686.conf,
gnu/packages/aux-files/linux-libre/6.9-x86.conf: New files.
* Makefile.am (AUX_FILES): Add them.
Signed-off-by: Leo Famulari <leo@famulari.name>
Change-Id: I8dc011a603684f0be88766b7881aa6c560b94443
The previous recommendation, running ‘make authenticate’, was insecure
because it led users to run code from the very repository they want to
authenticate:
https://lists.gnu.org/archive/html/guix-devel/2024-04/msg00252.html
* Makefile.am (commit_v1_0_0, channel_intro_commit)
(channel_intro_signer, GUIX_GIT_KEYRING, authenticate): Remove.
* Makefile.am (.git/hooks/%): New target, generalization of previous
‘.git/hooks/pre-push’ target.
(nodist_noinst_DATA): Add ‘.git/hooks/post-merge’.
* doc/contributing.texi (Building from Git): Suggest ‘guix git
authenticate’ instead of ‘make authenticate’.
* etc/git/post-merge: New file.
* etc/git/pre-push: Run ‘guix git authenticate’ instead of ‘make
authenticate’.
Reviewed-by: Maxim Cournoyer <maxim.cournoyer@gmail.com>
Reported-by: Skyler Ferris <skyvine@protonmail.com>
Change-Id: Ia415aa8375013d0dd095e891116f6ce841d93efd
This replaces Automake's `build-aux/mdate-sh' with our own
`build-aux/mdate-from-git.scm' to use reproducible timestamps from Git
instead.
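
As a rough illustration of the idea (in Python, not the Scheme script added
here), a date string can be derived from a commit's Unix timestamp alone, so
that it no longer depends on file modification times. The function name
mdate and the "day month year" output format are assumptions made for this
sketch.

```python
from datetime import datetime, timezone

MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]


def mdate(epoch):
    """Format a Unix timestamp as 'day month year' in UTC.

    The month names are spelled out here so the result does not depend on
    the locale of the build machine.
    """
    t = datetime.fromtimestamp(epoch, tz=timezone.utc)
    return f"{t.day} {MONTHS[t.month - 1]} {t.year}"


print(mdate(0))  # 1 January 1970
```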
* build-aux/mdate-from-git.scm: New script.
* bootstrap: Use it to replace build-aux/mdate-sh.
* Makefile.am (EXTRA_DIST): Add it.
Change-Id: I17d0a7de9ffea397129c0db1728f86e28a4e245f
This fixes VERSION being empty in ./configure, which may lead to documentation
having empty version strings.
* Makefile.am (EXTRA_DIST): Add build-aux/git-version-gen.
Change-Id: If127519811b25e2df0f5caa6a83a4f860fd34eb2
* Makefile.am: Use in_git_p conditional to disable Autotools' cache
consistency assert and removal when building from tarball.
(dist): Depend on doc-pot-update again when building from tarball.
(dist-hook): Remove dependencies on gen-ChangeLog and gen-AUTHORS when
building from tarball.
(gen-ChangeLog, gen-AUTHORS): Remove guarding for building from tarball.
Use set -e to avoid silently failing.
(gen-tarball-version): Use $(SOURCE_DATE_EPOCH) instead of re-generating it
using git; this also works running from a tarball.
Change-Id: I9ebdd28a70837f6a4db610c4816bb283d176e2d9
* build-aux/xgettext.scm: Move setting of environment variables to shell
header.
(main): Use SOURCE_DATE_EPOCH as fallback for timestamp. This fixes running
from a tarball.
* Makefile.am (EXTRA_DIST): Add it.
Change-Id: Ic487587b22495868fd2a21545a13dc9e3458299c
This is a follow-up to commit
8b972da068
Makefile.am: Auto-configure Git on 'make'.
* configure.ac (in_git_p): New conditional.
* Makefile.am (nodist_noinst_DATA): Use it to only enable this when building
from Git.
Change-Id: I09a90a59a4933a8cdb04124467d38209171f2a57
This is a follow-up to commit
b0c33b1997
maint: Use reproducible timestamps and name for tarball.
* Makefile.am (am__tar): Add --format=ustar.
Change-Id: I1e499c413703105704f49a84868ec10de69846fb
* doc/local.mk (doc-clean): New target.
(DIST_CONFIGURE_FLAGS): New variable.
(auto-clean): Use them in new target.
* Makefile.am (dist-doc-pot-update): Use it in new target.
(dist): Change to depend on it to clean possibly stale files, instead of
doc-pot-update directly.
Add a toplevel check to ensure that Autotools cache is up to date.
Change-Id: I2ff2d88db9fe1e708ab65e33e1f3d7ecee882cb4