* [PATCH v3 0/4] pid_namespace: make init creation more flexible
From: Pavel Tikhomirov @ 2026-02-24 16:47 UTC (permalink / raw)
To: Christian Brauner, Shuah Khan
Cc: Kees Cook, Andrew Morton, David Hildenbrand, Ingo Molnar,
Peter Zijlstra, Juri Lelli, Vincent Guittot, Jan Kara,
Oleg Nesterov, Aleksa Sarai, Andrei Vagin, Kirill Tkhai,
Alexander Mikhalitsyn, Adrian Reber, Pavel Tikhomirov,
linux-kernel, linux-mm, linux-kselftest
The first patch annotates accesses to ->child_reaper with the _ONCE
macros, to protect unlocked accesses from possible CPU/compiler
optimization problems.
The second patch makes sure that init is always the first process in
the pid namespace. Previously this was only checked for the set_tid case,
which could lead to potential bugs.
The third patch allows joining a pid namespace before its init is
created. This makes it possible to create a pid namespace in one process
and then create the pid namespace init from another process after setns().
Please see the detailed description in the patch commit message. It
depends on the second patch.
The fourth and final patch is a comprehensive test that covers both the
basic use case of creating the pid namespace and its init separately, and
a more specific use case which shows how clone3(set_tid) usability
improves after this change.
This change is generally useful as it makes clone3(set_tid) more
universal and lets it work evenly in all cases. It is also highly
useful for CRIU to handle nested containers.
v2: Use *_ONCE to make ->child_reaper accesses atomic, and avoid taking
the tasklist lock for reading it. Rebase to master.
v3: Separate *_ONCE change and "init is first" checks into separate
commits.
This series is also available here:
https://github.com/Snorch/linux/commits/allow-creating-pid-namespace-init-after-setns-v3/
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Pavel Tikhomirov (4):
pid_namespace: avoid optimization of accesses to ->child_reaper
pid: check init is created first after idr alloc
pid_namespace: allow opening pid_for_children before init was created
selftests: Add tests for creating pidns init via setns
kernel/exit.c | 3 +-
kernel/fork.c | 5 +-
kernel/pid.c | 17 +-
kernel/pid_namespace.c | 9 -
.../selftests/pid_namespace/.gitignore | 1 +
.../testing/selftests/pid_namespace/Makefile | 2 +-
.../pid_namespace/pidns_init_via_setns.c | 238 ++++++++++++++++++
7 files changed, 256 insertions(+), 19 deletions(-)
create mode 100644 tools/testing/selftests/pid_namespace/pidns_init_via_setns.c
--
2.53.0
* [PATCH v3 1/4] pid_namespace: avoid optimization of accesses to ->child_reaper
From: Pavel Tikhomirov @ 2026-02-24 16:47 UTC (permalink / raw)
To: Christian Brauner, Shuah Khan
Cc: Kees Cook, Andrew Morton, David Hildenbrand, Ingo Molnar,
Peter Zijlstra, Juri Lelli, Vincent Guittot, Jan Kara,
Oleg Nesterov, Aleksa Sarai, Andrei Vagin, Kirill Tkhai,
Alexander Mikhalitsyn, Adrian Reber, Pavel Tikhomirov,
linux-kernel, linux-mm, linux-kselftest
To avoid potential problems related to CPU/compiler optimizations around
->child_reaper, let's use WRITE_ONCE (in addition to the tasklist lock)
everywhere we write it and use READ_ONCE where we read it without
an explicit lock. Note: It also pairs with the existing lockless READ_ONCE
in nsfs_fh_to_dentry().
Also let's add ASSERT_EXCLUSIVE_WRITER before each write to tell KCSAN
that we don't expect any concurrent ->child_reaper modifications, and
that any such modification must be reported.
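For readers less familiar with the annotation, here is a minimal userspace
sketch of the access pattern. The macros below are simplified stand-ins for
the kernel's READ_ONCE/WRITE_ONCE (plain volatile casts, GCC/Clang
__typeof__), and struct task, struct pidns, set_reaper() and has_init() are
hypothetical names used only for illustration. KCSAN's
ASSERT_EXCLUSIVE_WRITER has no userspace equivalent and is left out.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified userspace stand-ins for the kernel macros (assumption). */
#define READ_ONCE(x)       (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, val) (*(volatile __typeof__(x) *)&(x) = (val))

/* Hypothetical stand-ins for task_struct and pid_namespace. */
struct task { int tgid; };
struct pidns { struct task *child_reaper; };

/* Writer side: in the kernel this store also runs under the tasklist lock. */
static void set_reaper(struct pidns *ns, struct task *t)
{
	assert(ns != NULL);
	WRITE_ONCE(ns->child_reaper, t);
}

/* Lockless reader: the pointer is loaded exactly once. */
static int has_init(struct pidns *ns)
{
	assert(ns != NULL);
	return READ_ONCE(ns->child_reaper) != NULL;
}
```

The volatile access keeps the compiler from tearing, fusing or repeating
the load and store. It does not add any ordering by itself.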
Suggested-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
--
v3: Split from main commit. Add ASSERT_EXCLUSIVE_WRITER.
---
kernel/exit.c | 3 ++-
kernel/fork.c | 5 ++++-
kernel/pid.c | 2 +-
3 files changed, 7 insertions(+), 3 deletions(-)
diff --git a/kernel/exit.c b/kernel/exit.c
index 8a87021211ae..8e5e523dcc79 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -608,7 +608,8 @@ static struct task_struct *find_child_reaper(struct task_struct *father,
reaper = find_alive_thread(father);
if (reaper) {
- pid_ns->child_reaper = reaper;
+ ASSERT_EXCLUSIVE_WRITER(pid_ns->child_reaper);
+ WRITE_ONCE(pid_ns->child_reaper, reaper);
return reaper;
}
diff --git a/kernel/fork.c b/kernel/fork.c
index e832da9d15a4..9ce2d12ec701 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -2423,7 +2423,10 @@ __latent_entropy struct task_struct *copy_process(
init_task_pid(p, PIDTYPE_SID, task_session(current));
if (is_child_reaper(pid)) {
- ns_of_pid(pid)->child_reaper = p;
+ struct pid_namespace *ns = ns_of_pid(pid);
+
+ ASSERT_EXCLUSIVE_WRITER(ns->child_reaper);
+ WRITE_ONCE(ns->child_reaper, p);
p->signal->flags |= SIGNAL_UNKILLABLE;
}
p->signal->shared_pending.signal = delayed.signal;
diff --git a/kernel/pid.c b/kernel/pid.c
index 3b96571d0fe6..76c2744493e2 100644
--- a/kernel/pid.c
+++ b/kernel/pid.c
@@ -219,7 +219,7 @@ struct pid *alloc_pid(struct pid_namespace *ns, pid_t *arg_set_tid,
* Also fail if a PID != 1 is requested and
* no PID 1 exists.
*/
- if (tid != 1 && !tmp->child_reaper)
+ if (tid != 1 && !READ_ONCE(tmp->child_reaper))
goto out_abort;
retval = -EPERM;
if (!checkpoint_restore_ns_capable(tmp->user_ns))
--
2.53.0
* [PATCH v3 2/4] pid: check init is created first after idr alloc
From: Pavel Tikhomirov @ 2026-02-24 16:47 UTC (permalink / raw)
To: Christian Brauner, Shuah Khan
Cc: Kees Cook, Andrew Morton, David Hildenbrand, Ingo Molnar,
Peter Zijlstra, Juri Lelli, Vincent Guittot, Jan Kara,
Oleg Nesterov, Aleksa Sarai, Andrei Vagin, Kirill Tkhai,
Alexander Mikhalitsyn, Adrian Reber, Pavel Tikhomirov,
linux-kernel, linux-mm, linux-kselftest
This moves the condition (tid != 1 && !tmp->child_reaper) to after the idr
allocation, so it covers not only the case where clone3(set_tid) requests
a wrong pid for the first process in the pid namespace, but also the case
where the idr itself hands out a wrong pid for some reason.
This could have happened before this patch: if the
alloc_pid()->pidfs_add_pid() code path fails while creating the first
process, idr->idr_next is no longer zero, and the next process calling
alloc_pid() gets 2 as its pid from idr_alloc_cyclic(). That effectively
leads to an init-less pid namespace, which is a bug.
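The failure sequence described above can be modelled in a few lines of
userspace C. This is only a toy sketch: struct toy_pidns, toy_alloc_cyclic()
and toy_alloc_pid() are hypothetical names, the cursor merely imitates the
role of idr->idr_next, and the -ENOMEM value for the late failure is an
arbitrary choice for illustration. In the kernel ->child_reaper is set later
in copy_process(), not in alloc_pid().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model of a pid namespace, not the kernel structures. */
struct toy_pidns {
	int next;        /* cyclic cursor, plays the role of idr->idr_next */
	bool has_reaper; /* plays the role of ns->child_reaper != NULL */
};

/* Hand out the next number starting at 1 and advance the cursor. */
static int toy_alloc_cyclic(struct toy_pidns *ns)
{
	assert(ns != NULL);
	int nr = ns->next < 1 ? 1 : ns->next;

	ns->next = nr + 1;
	return nr;
}

/*
 * Allocate a pid. fail_late simulates a failure that happens after the
 * number was handed out (similar to pidfs_add_pid() failing).
 */
static int toy_alloc_pid(struct toy_pidns *ns, bool fail_late)
{
	int nr = toy_alloc_cyclic(ns);

	/* The check moved by this patch: PID 1 (init) must be created first. */
	if (!ns->has_reaper && nr != 1)
		return -EINVAL;
	if (fail_late)
		return -ENOMEM; /* the cursor stays advanced */
	if (nr == 1)
		ns->has_reaper = true; /* simplified: init is in place */
	return nr;
}
```

In this model, a first call with fail_late set leaves the cursor at 2, and
the following call is rejected with -EINVAL instead of creating a process
with pid 2 in a namespace that has no init.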
Note: This is also preparation for the next patch in the series, which
introduces the ability to create init from a task different from the one
that created the pid namespace. The check is needed to make sure that
init is always first, even in this new case.
Suggested-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
--
v3: Split from main commit. Merge two checks of ->child_reaper into one.
---
kernel/pid.c | 17 ++++++++++-------
1 file changed, 10 insertions(+), 7 deletions(-)
diff --git a/kernel/pid.c b/kernel/pid.c
index 76c2744493e2..ebf013f35cb3 100644
--- a/kernel/pid.c
+++ b/kernel/pid.c
@@ -215,12 +215,6 @@ struct pid *alloc_pid(struct pid_namespace *ns, pid_t *arg_set_tid,
retval = -EINVAL;
if (tid < 1 || tid >= pid_max[ns->level - i])
goto out_abort;
- /*
- * Also fail if a PID != 1 is requested and
- * no PID 1 exists.
- */
- if (tid != 1 && !READ_ONCE(tmp->child_reaper))
- goto out_abort;
retval = -EPERM;
if (!checkpoint_restore_ns_capable(tmp->user_ns))
goto out_abort;
@@ -296,9 +290,18 @@ struct pid *alloc_pid(struct pid_namespace *ns, pid_t *arg_set_tid,
pid->numbers[i].nr = nr;
pid->numbers[i].ns = tmp;
- tmp = tmp->parent;
i--;
retried_preload = false;
+
+ /*
+ * PID 1 (init) must be created first.
+ */
+ if (!READ_ONCE(tmp->child_reaper) && nr != 1) {
+ retval = -EINVAL;
+ goto out_free;
+ }
+
+ tmp = tmp->parent;
}
/*
--
2.53.0
* [PATCH v3 3/4] pid_namespace: allow opening pid_for_children before init was created
From: Pavel Tikhomirov @ 2026-02-24 16:47 UTC (permalink / raw)
To: Christian Brauner, Shuah Khan
Cc: Kees Cook, Andrew Morton, David Hildenbrand, Ingo Molnar,
Peter Zijlstra, Juri Lelli, Vincent Guittot, Jan Kara,
Oleg Nesterov, Aleksa Sarai, Andrei Vagin, Kirill Tkhai,
Alexander Mikhalitsyn, Adrian Reber, Pavel Tikhomirov,
linux-kernel, linux-mm, linux-kselftest
This effectively gives us the ability to create the pid namespace init as
a child of a process (setns-ed to the pid namespace) different from the
process which created the pid namespace itself.
Original problem:
The clone3() syscall has a set_tid feature which allows you to create a
process with the desired pids on multiple pid namespace levels. This is
useful for restoring processes in CRIU in the nested pid namespace case.
In the nested container case we can potentially see this kind of pid/user
namespace tree:
Process
┌─────────┐
User NS0 ──▶ Pid NS0 ──▶ Pid p0 │
│ │ │ │
▼ ▼ │ │
User NS1 ──▶ Pid NS1 ──▶ Pid p1 │
│ │ │ │
... ... │ ... │
│ │ │ │
▼ ▼ │ │
User NSn ──▶ Pid NSn ──▶ Pid pn │
└─────────┘
So to create the "Process" and set pids {p0, p1, ... pn} for it on all
pid namespace levels we can use clone3() syscall set_tid feature, BUT
the syscall does not allow you to set pid on pid namespace levels you
don't have permission to. So basically you have to be in "User NS0" when
creating the "Process" to actually be able to set pids on all levels.
It is ok for almost any process, but with pid namespace init this does
not work, as currently we can only create pid namespace init and the pid
namespace itself simultaneously, so to make "Pid NSn" owned by "User
NSn" we have to be in the "User NSn".
We can't possibly be in "User NS0" and "User NSn" at the same time,
hence the problem.
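As a small illustration of the set_tid layout: clone3() expects the array
to start with the pid for the most deeply nested pid namespace, so the list
{p0, p1, ... pn} from the diagram has to be passed in reverse. The helper
below is a hypothetical sketch (build_set_tid() is not part of this series
or of any library) and only prepares the array, it does not call clone3().

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>

/*
 * Fill set_tid[] from pids listed outermost namespace first
 * ({p0, p1, ..., pn}). clone3() wants the innermost namespace first,
 * so the order is reversed. Returns the value for set_tid_size.
 */
static size_t build_set_tid(const pid_t *outer_first, size_t n, pid_t *set_tid)
{
	assert(outer_first != NULL && set_tid != NULL);
	for (size_t i = 0; i < n; i++)
		set_tid[i] = outer_first[n - 1 - i];
	return n;
}
```

For example, the pids {1001, 1} (outer, inner) become set_tid = {1, 1001},
which matches the array used by the selftest later in this series.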
Alternative solution:
Yes, for the case of pid namespace init we can use the good old
/proc/sys/kernel/ns_last_pid interface on the levels lower than n. But
it is much more complicated and requires a lot of extra code. It would be
nice to make the clone3() set_tid interface also applicable to this
corner case.
Implementation:
Now that anyone can setns to the pid namespace before the creation of
init, and thus multiple processes can fork children into the pid
namespace, it is important to enforce that the first process created is
always the pid namespace init. (Note that this was done by the previous
preparatory patch as a standalone useful change.) We only allow other
processes after the init sets pid_namespace->child_reaper.
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
--
v2: Use *_ONCE to make ->child_reaper accesses atomic, and avoid taking
the tasklist lock for reading it. Rebase to master, and thus remove
the now-redundant pidns_ready variable.
v3: Separate *_ONCE change and "init is first" checks into separate
commits.
Note: I didn't find anything in copy_process() around setting
->child_reaper which can influence the pid namespace, so it looks like
the pid namespace is fully set up at the point when init sets
->child_reaper to receive more processes. Thus the tasklist lock looks
unnecessary in pidns_for_children_get()'s ->child_reaper check and it
should be safe not to have it in the corresponding check in alloc_pid()
(introduced earlier in this series).
---
kernel/pid_namespace.c | 9 ---------
1 file changed, 9 deletions(-)
diff --git a/kernel/pid_namespace.c b/kernel/pid_namespace.c
index e48f5de41361..d36afc58ee1d 100644
--- a/kernel/pid_namespace.c
+++ b/kernel/pid_namespace.c
@@ -369,15 +369,6 @@ static struct ns_common *pidns_for_children_get(struct task_struct *task)
}
task_unlock(task);
- if (ns) {
- read_lock(&tasklist_lock);
- if (!ns->child_reaper) {
- put_pid_ns(ns);
- ns = NULL;
- }
- read_unlock(&tasklist_lock);
- }
-
return ns ? &ns->ns : NULL;
}
--
2.53.0
* [PATCH v3 4/4] selftests: Add tests for creating pidns init via setns
From: Pavel Tikhomirov @ 2026-02-24 16:47 UTC (permalink / raw)
To: Christian Brauner, Shuah Khan
Cc: Kees Cook, Andrew Morton, David Hildenbrand, Ingo Molnar,
Peter Zijlstra, Juri Lelli, Vincent Guittot, Jan Kara,
Oleg Nesterov, Aleksa Sarai, Andrei Vagin, Kirill Tkhai,
Alexander Mikhalitsyn, Adrian Reber, Pavel Tikhomirov,
linux-kernel, linux-mm, linux-kselftest
First testcase "pidns_init_via_setns" checks that a process can become
Pid 1 (init) in a new Pid namespace created via unshare() and joined via
setns().
Second testcase "pidns_init_via_setns_set_tid" checks that during this
process we can use clone3() + set_tid and set the pid in both the new
and old pid namespaces (owned by different user namespaces).
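The second testcase verifies the result through the NSpid: line of
/proc/self/status, which lists the pid at every namespace level with the
outermost level first. As a standalone illustration, parsing such a line
might look like the sketch below. parse_nspid() is a hypothetical helper and
is not the code used in the test, which walks the line backwards with
strrchr().

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/*
 * Parse an "NSpid:" line from /proc/<pid>/status into out[], outermost
 * namespace first. Returns the number of pids parsed (at most max), or
 * -1 if the line is not an NSpid line.
 */
static int parse_nspid(const char *line, pid_t *out, int max)
{
	assert(line != NULL && out != NULL);
	if (strncmp(line, "NSpid:", 6) != 0)
		return -1;

	const char *p = line + 6;
	int n = 0;

	while (n < max) {
		char *end;
		long v = strtol(p, &end, 10); /* skips leading tabs */

		if (end == p) /* no more numbers on the line */
			break;
		out[n++] = (pid_t)v;
		p = end;
	}
	return n;
}
```

With set_tid = {1, 1001} the grandchild is expected to see a line ending in
the values 1001 and 1, so the last two parsed entries are 1001 and 1.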
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
---
.../selftests/pid_namespace/.gitignore | 1 +
.../testing/selftests/pid_namespace/Makefile | 2 +-
.../pid_namespace/pidns_init_via_setns.c | 238 ++++++++++++++++++
3 files changed, 240 insertions(+), 1 deletion(-)
create mode 100644 tools/testing/selftests/pid_namespace/pidns_init_via_setns.c
diff --git a/tools/testing/selftests/pid_namespace/.gitignore b/tools/testing/selftests/pid_namespace/.gitignore
index 5118f0f3edf4..c647c6eb3367 100644
--- a/tools/testing/selftests/pid_namespace/.gitignore
+++ b/tools/testing/selftests/pid_namespace/.gitignore
@@ -1,2 +1,3 @@
pid_max
+pidns_init_via_setns
regression_enomem
diff --git a/tools/testing/selftests/pid_namespace/Makefile b/tools/testing/selftests/pid_namespace/Makefile
index b972f55d07ae..b01a924ac04b 100644
--- a/tools/testing/selftests/pid_namespace/Makefile
+++ b/tools/testing/selftests/pid_namespace/Makefile
@@ -1,7 +1,7 @@
# SPDX-License-Identifier: GPL-2.0
CFLAGS += -g $(KHDR_INCLUDES)
-TEST_GEN_PROGS = regression_enomem pid_max
+TEST_GEN_PROGS = regression_enomem pid_max pidns_init_via_setns
LOCAL_HDRS += $(selfdir)/pidfd/pidfd.h
diff --git a/tools/testing/selftests/pid_namespace/pidns_init_via_setns.c b/tools/testing/selftests/pid_namespace/pidns_init_via_setns.c
new file mode 100644
index 000000000000..7e4c610291d3
--- /dev/null
+++ b/tools/testing/selftests/pid_namespace/pidns_init_via_setns.c
@@ -0,0 +1,238 @@
+// SPDX-License-Identifier: GPL-2.0
+#define _GNU_SOURCE
+#include <fcntl.h>
+#include <sched.h>
+#include <stdio.h>
+#include <sys/types.h>
+#include <unistd.h>
+
+#include "kselftest_harness.h"
+#include "../pidfd/pidfd.h"
+
+/*
+ * Test that a process can become PID 1 (init) in a new PID namespace
+ * created via unshare() and joined via setns().
+ *
+ * Flow:
+ * 1. Parent creates a pipe for synchronization.
+ * 2. Parent forks a child.
+ * 3. Parent calls unshare(CLONE_NEWPID) to create a new PID namespace.
+ * 4. Parent signals the child via the pipe.
+ * 5. Child opens parent's /proc/<ppid>/ns/pid_for_children and calls
+ * setns(fd, CLONE_NEWPID) to join the new namespace.
+ * 6. Child forks a grandchild.
+ * 7. Grandchild verifies getpid() == 1.
+ */
+TEST(pidns_init_via_setns)
+{
+ pid_t child, parent_pid;
+ int pipe_fd[2];
+ char buf;
+
+ parent_pid = getpid();
+
+ ASSERT_EQ(0, pipe(pipe_fd));
+
+ child = fork();
+ ASSERT_GE(child, 0);
+
+ if (child == 0) {
+ char path[256];
+ int nsfd;
+ pid_t grandchild;
+
+ close(pipe_fd[1]);
+
+ /* Wait for parent to complete unshare */
+ ASSERT_EQ(1, read_nointr(pipe_fd[0], &buf, 1));
+ close(pipe_fd[0]);
+
+ snprintf(path, sizeof(path),
+ "/proc/%d/ns/pid_for_children", parent_pid);
+ nsfd = open(path, O_RDONLY);
+ ASSERT_GE(nsfd, 0);
+
+ ASSERT_EQ(0, setns(nsfd, CLONE_NEWPID));
+ close(nsfd);
+
+ grandchild = fork();
+ ASSERT_GE(grandchild, 0);
+
+ if (grandchild == 0) {
+ /* Should be init (PID 1) in the new namespace */
+ if (getpid() != 1)
+ _exit(1);
+ _exit(0);
+ }
+
+ ASSERT_EQ(0, wait_for_pid(grandchild));
+ _exit(0);
+ }
+
+ close(pipe_fd[0]);
+
+ if (geteuid())
+ ASSERT_EQ(0, unshare(CLONE_NEWUSER));
+
+ ASSERT_EQ(0, unshare(CLONE_NEWPID));
+
+ /* Signal child that the new PID namespace is ready */
+ buf = 0;
+ ASSERT_EQ(1, write_nointr(pipe_fd[1], &buf, 1));
+ close(pipe_fd[1]);
+
+ ASSERT_EQ(0, wait_for_pid(child));
+}
+
+/*
+ * Similar to pidns_init_via_setns, but:
+ * 1. Parent enters a new PID namespace right from the start to be able to
+ * later freely use pid 1001 in it.
+ * 2. After forking child, parent also calls unshare(CLONE_NEWUSER)
+ * before unshare(CLONE_NEWPID) so that the old and new pid namespaces have
+ * different user namespace owners.
+ * 3. Child uses clone3() with set_tid={1, 1001} instead of fork() and
+ * grandchild checks that it gets the desired pids.
+ *
+ * Flow:
+ * 1. Test process creates a new PID namespace and forks a wrapper
+ * (PID 1 in the outer namespace).
+ * 2. Wrapper forks a child.
+ * 3. Wrapper calls unshare(CLONE_NEWUSER) + unshare(CLONE_NEWPID)
+ * to create an inner PID namespace.
+ * 4. Wrapper signals the child via pipe.
+ * 5. Child opens wrapper's /proc/<pid>/ns/pid_for_children and calls
+ * setns(fd, CLONE_NEWPID) to join the inner namespace.
+ * 6. Child calls clone3() with set_tid={1, 1001}.
+ * 7. Grandchild verifies its NSpid ends with "1001 1".
+ */
+
+pid_t set_tid[] = {1, 1001};
+
+static int pidns_init_via_setns_set_tid_grandchild(struct __test_metadata *_metadata)
+{
+ char *line = NULL;
+ size_t len = 0;
+ int found = 0;
+ FILE *gf;
+
+ gf = fopen("/proc/self/status", "r");
+ ASSERT_NE(gf, NULL);
+
+ while (getline(&line, &len, gf) != -1) {
+ if (strncmp(line, "NSpid:", 6) != 0)
+ continue;
+
+ for (int i = 0; i < 2; i++) {
+ char *last = strrchr(line, '\t');
+ pid_t pid;
+
+ ASSERT_NE(last, NULL);
+ ASSERT_EQ(sscanf(last, "%d", &pid), 1);
+ ASSERT_EQ(pid, set_tid[i]);
+ *last = '\0';
+ }
+
+ found = true;
+ break;
+ }
+
+ free(line);
+ fclose(gf);
+ ASSERT_TRUE(found);
+ return 0;
+}
+
+static int pidns_init_via_setns_set_tid_child(struct __test_metadata *_metadata,
+ pid_t parent_pid, int pipe_fd[2])
+{
+ struct __clone_args args = {
+ .exit_signal = SIGCHLD,
+ .set_tid = ptr_to_u64(set_tid),
+ .set_tid_size = 2,
+ };
+ pid_t grandchild;
+ char path[256];
+ char buf;
+ int nsfd;
+
+ close(pipe_fd[1]);
+
+ ASSERT_EQ(1, read_nointr(pipe_fd[0], &buf, 1));
+ close(pipe_fd[0]);
+
+ snprintf(path, sizeof(path),
+ "/proc/%d/ns/pid_for_children", parent_pid);
+ nsfd = open(path, O_RDONLY);
+ ASSERT_GE(nsfd, 0);
+
+ ASSERT_EQ(0, setns(nsfd, CLONE_NEWPID));
+ close(nsfd);
+
+ grandchild = sys_clone3(&args, sizeof(args));
+ ASSERT_GE(grandchild, 0);
+
+ if (grandchild == 0)
+ _exit(pidns_init_via_setns_set_tid_grandchild(_metadata));
+
+ ASSERT_EQ(0, wait_for_pid(grandchild));
+ return 0;
+}
+
+static int pidns_init_via_setns_set_tid_wrapper(struct __test_metadata *_metadata)
+{
+ int pipe_fd[2];
+ pid_t child, parent_pid;
+ char buf;
+ FILE *f;
+
+ /*
+ * We are PID 1 inside the new namespace, but /proc is
+ * mounted from the host. Read our host-visible PID so
+ * the child can reach our pid_for_children via /proc.
+ */
+ f = fopen("/proc/self/stat", "r");
+ ASSERT_NE(f, NULL);
+ ASSERT_EQ(fscanf(f, "%d", &parent_pid), 1);
+ ASSERT_EQ(0, pipe(pipe_fd));
+
+ child = fork();
+ ASSERT_GE(child, 0);
+
+ if (child == 0)
+ _exit(pidns_init_via_setns_set_tid_child(_metadata, parent_pid, pipe_fd));
+
+ close(pipe_fd[0]);
+
+ ASSERT_EQ(0, unshare(CLONE_NEWUSER));
+ ASSERT_EQ(0, unshare(CLONE_NEWPID));
+
+ buf = 0;
+ ASSERT_EQ(1, write_nointr(pipe_fd[1], &buf, 1));
+ close(pipe_fd[1]);
+
+ ASSERT_EQ(0, wait_for_pid(child));
+
+ fclose(f);
+ return 0;
+}
+
+TEST(pidns_init_via_setns_set_tid)
+{
+ pid_t wrapper;
+
+ if (geteuid())
+ ASSERT_EQ(0, unshare(CLONE_NEWUSER));
+
+ ASSERT_EQ(0, unshare(CLONE_NEWPID));
+
+ wrapper = fork();
+ ASSERT_GE(wrapper, 0);
+
+ if (wrapper == 0)
+ _exit(pidns_init_via_setns_set_tid_wrapper(_metadata));
+
+ ASSERT_EQ(0, wait_for_pid(wrapper));
+}
+
+TEST_HARNESS_MAIN
--
2.53.0