From: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
To: Oleg Nesterov <oleg@redhat.com>
Cc: Christian Brauner <brauner@kernel.org>,
Shuah Khan <shuah@kernel.org>, Kees Cook <kees@kernel.org>,
Andrew Morton <akpm@linux-foundation.org>,
David Hildenbrand <david@kernel.org>,
Ingo Molnar <mingo@redhat.com>,
Peter Zijlstra <peterz@infradead.org>,
Juri Lelli <juri.lelli@redhat.com>,
Vincent Guittot <vincent.guittot@linaro.org>,
Jan Kara <jack@suse.cz>, Aleksa Sarai <cyphar@cyphar.com>,
Andrei Vagin <avagin@google.com>, Kirill Tkhai <tkhai@ya.ru>,
Alexander Mikhalitsyn <alexander@mihalicyn.com>,
Adrian Reber <areber@redhat.com>,
linux-kernel@vger.kernel.org, linux-mm@kvack.org,
linux-kselftest@vger.kernel.org,
Mateusz Guzik <mjguzik@gmail.com>
Subject: Re: [PATCH v2 1/2] pid_namespace: allow opening pid_for_children before init was created
Date: Tue, 24 Feb 2026 14:23:19 +0100 [thread overview]
Message-ID: <3f095a91-f052-4f38-a8e4-2e6dbc9a0c6a@virtuozzo.com> (raw)
In-Reply-To: <aZ2U6TwK7rQtkTvT@redhat.com>
On 2/24/26 13:09, Oleg Nesterov wrote:
> On 02/23, Pavel Tikhomirov wrote:
>>
>> To avoid possible problems related to cpu/compiler optimizations around
>> ->child_reaper, let's use WRITE_ONCE (additional to task_list lock)
>> everywhere we write it and use READ_ONCE everywhere we read it without
>> explicit lock.
>
> Yes, this is what I meant... but I can never recall if READ_ONCE() alone
> is enough to make KCSAN happy...
AFAICS this should be fine for memory safety of accesses to ->child_reaper.
I would love if someone more experienced in this area would confirm.
>
> I won't insist, but I think it would be better to do this in a separate
> change for documenation purposes and for discussion.
Ok, will do. It will be a bit ugly as I will first add READ_ONCE to the
pidns_for_children_get() and then remove the hunk with it in the next patch.
>
>> @@ -247,8 +247,9 @@ struct pid *alloc_pid(struct pid_namespace *ns, pid_t *arg_set_tid,
>> * alreay in use. Return EEXIST in that case.
>> */
>> if (nr == -ENOSPC)
>> -
>> nr = -EEXIST;
>> + } else if (!READ_ONCE(tmp->child_reaper) && idr_get_cursor(&tmp->idr) != 0) {
>> + nr = -EINVAL;
>> } else {
>
> Oh, this doesn't look clear/clean... This even looks racy even if it is not.
> Can you move this check into the "else" branch which does another get_cursor
> and unify this check with the RESERVED_PIDS check?
>
> Either way, I don't like the fact we check ->child_reaper != NULL twice.
Notice that we don't really read it twice, it is only twice in code but any real
execution either passes the first check or the second one, not both at the same
time as they are in tid set/unset branches correspondingly.
> Perhaps something like the preparational patch below makes sense ? Not
> sure this is actually better...
This looks more universal at least, as instead of two checks we have one in one
place. My only concern of putting the check where I put it was to avoid extra
idr_alloc_cyclic() + idr_remove(), if we already know we don't need it. But it's
only in last pid_namespace we can have ->child_reaper unset so we do alloc/remove
for all other namespaces anyway in error case, should not be a big deal.
I will add the preparation patches: for below patch and related to _ONCE.
>
> Oleg.
>
> --- x/kernel/pid.c
> +++ x/kernel/pid.c
> @@ -215,12 +215,6 @@ struct pid *alloc_pid(struct pid_namespa
> retval = -EINVAL;
> if (tid < 1 || tid >= pid_max[ns->level - i])
> goto out_abort;
> - /*
> - * Also fail if a PID != 1 is requested and
> - * no PID 1 exists.
> - */
> - if (tid != 1 && !tmp->child_reaper)
> - goto out_abort;
> retval = -EPERM;
> if (!checkpoint_restore_ns_capable(tmp->user_ns))
> goto out_abort;
> @@ -299,6 +293,11 @@ struct pid *alloc_pid(struct pid_namespa
> tmp = tmp->parent;
> i--;
> retried_preload = false;
> +
> + if (!READ_ONCE(tmp->child_reaper) && nr != 1) {
> + retval = -EINVAL;
> + goto out_free;
> + }
> }
>
> /*
>
--
Best regards, Pavel Tikhomirov
Senior Software Developer, Virtuozzo.
next prev parent reply other threads:[~2026-02-24 13:24 UTC|newest]
Thread overview: 7+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-02-23 20:01 [PATCH v2 0/2] pid_namespace: make init creation more flexible Pavel Tikhomirov
2026-02-23 20:01 ` [PATCH v2 1/2] pid_namespace: allow opening pid_for_children before init was created Pavel Tikhomirov
2026-02-24 7:02 ` Andrei Vagin
2026-02-24 10:37 ` Pavel Tikhomirov
2026-02-24 12:09 ` Oleg Nesterov
2026-02-24 13:23 ` Pavel Tikhomirov [this message]
2026-02-23 20:01 ` [PATCH v2 2/2] selftests: Add tests for creating pidns init via setns Pavel Tikhomirov
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=3f095a91-f052-4f38-a8e4-2e6dbc9a0c6a@virtuozzo.com \
--to=ptikhomirov@virtuozzo.com \
--cc=akpm@linux-foundation.org \
--cc=alexander@mihalicyn.com \
--cc=areber@redhat.com \
--cc=avagin@google.com \
--cc=brauner@kernel.org \
--cc=cyphar@cyphar.com \
--cc=david@kernel.org \
--cc=jack@suse.cz \
--cc=juri.lelli@redhat.com \
--cc=kees@kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-kselftest@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mingo@redhat.com \
--cc=mjguzik@gmail.com \
--cc=oleg@redhat.com \
--cc=peterz@infradead.org \
--cc=shuah@kernel.org \
--cc=tkhai@ya.ru \
--cc=vincent.guittot@linaro.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox