Re: [PATCH] kernel/fork: stop playing lockless games for exe_file replacement

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

From: Mateusz Guzik <mjguzik@gmail.com>
To: David Hildenbrand <david@redhat.com>
Cc: linux-kernel@vger.kernel.org, torvalds@linux-foundation.org,
	 brauner@kernel.org, ebiederm@xmission.com,
	akpm@linux-foundation.org,  linux-mm@kvack.org, koct9i@gmail.com,
	oleg@redhat.com, dave@stgolabs.net
Subject: Re: [PATCH] kernel/fork: stop playing lockless games for exe_file replacement
Date: Mon, 14 Aug 2023 10:54:44 +0200	[thread overview]
Message-ID: <CAGudoHGvBEpotHW0t3cDp4487SqSvba6JAW9N6jWuUz=_K7ixA@mail.gmail.com> (raw)
In-Reply-To: <88f1f73e-9879-95f9-bbc4-339c43a5c2f1@redhat.com>

On 8/14/23, David Hildenbrand <david@redhat.com> wrote:
> On 14.08.23 10:21, Mateusz Guzik wrote:
>> On 8/14/23, David Hildenbrand <david@redhat.com> wrote:
>>> On 13.08.23 14:33, Mateusz Guzik wrote:
>>>> xchg originated in 6e399cd144d8 ("prctl: avoid using mmap_sem for
>>>> exe_file serialization"). While the commit message does not explain
>>>> *why* the change, clearly the intent was to use mmap_sem less in this
>>>> codepath. I found the original submission [1] which ultimately claims
>>>> it
>>>> cleans things up.
>>>
>>> More details are apparently in v1 of that patch:
>>>
>>> https://lore.kernel.org/lkml/1424979417.10344.14.camel@stgolabs.net/
>>>
>>> Regarding your patch: adding more mmap_write_lock() where avoidable, I'm
>>> not so sure.
>>>
>>
>> But exe_file access is already synchronized with the semaphore and
>> your own commit added a mmap_read_lock/mmap_read_unlock cycle after
>> the xchg in question to accomodate this requirement.
>
> Yes, we want to handle concurrent fork() ("Don't race with dup_mmap()"),
> thus mmap_read_lock.
>
>> Then mmap_write_lock around the replacement is the obvious thing to do.
>
> Apparently to you. :)
>
> mmap_write_lock will block more than fork. For example, concurrent page
> faults (without new VMA locking), for no apparent reason to me.
>
>>
>>> Your patch doesn't look (to me) like it is removing a lot of complexity.
>>>
>>
>> The code in the current form makes the reader ask what prompts xchg +
>> read-lock instead of mere write-locking.
>>
>> This is not a hot path either and afaics it can only cause contention
>> if userspace is trying to abuse the interface to break the kernel,
>> messing with a processs hard at work (which would be an extra argument
>> to not play games on kernel side).
>>
>> That said, no, it does not remove "a lot of complexity". It does
>> remove some though at no real downside that I can see.
>>
>> But then again, it is on people who insist on xchg to justify it.
>
> Changing it now needs good justification, why we would want to block any
> concurrent MM activity. Maybe there is good justification.
>
> In any case, this commit would have to update the documentation of
> replace_mm_exe_file, that spells out existing locking behavior.
>

Perhaps it will help if I add that the prctl thingy always had a
troubled relationship with locking.

Last time I looked at it was in 2016, where I found that it was doing
down_read to update arg_start/arg_end and others while a consumer in
procfs would read them and assert on their sanity. As only a read-lock
was held, 2 threads could be used to transiently produce a bogus state
as they race with their updates and trigger the BUG. See this commit
(but also excuse weirdly bad english ;))
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=ddf1d398e517e660207e2c807f76a90df543a217

Moreover check out the following in prctl_set_auxv:

        task_lock(current);
        memcpy(mm->saved_auxv, user_auxv, len);
        task_unlock(current);

any thread in the process can reach that codepath while sharing the
same mm, thus this does not lock any updates. Not only that, but a
duplicated memcpy onto the area in prctl_set_mm_map does not even take
that lock and the code to read this does not take any locks.

[Code duplication and synchronization aside, additional points
deducted for saved_auxv storing always-NULL pointers instead of adding
them on reads.]

The above exhausts my willingness to argue about this change, I'm just
a passerby. If it is NAKed, I'm dropping the subject.

I am willing to do the comment tidy ups if this can go in though, but
not before there is consensus.

-- 
Mateusz Guzik <mjguzik gmail.com>

next prev parent reply	other threads:[~2023-08-14  8:54 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-08-13 12:33 Mateusz Guzik
2023-08-14  7:55 ` David Hildenbrand
2023-08-14  8:21   ` Mateusz Guzik
2023-08-14  8:29     ` David Hildenbrand
2023-08-14  8:54       ` Mateusz Guzik [this message]
2023-08-14 15:38         ` David Hildenbrand
2023-08-14 15:05 ` Oleg Nesterov
2023-08-14 15:20   ` Oleg Nesterov
2023-08-14 15:37     ` David Hildenbrand
2023-08-14 15:58       ` Oleg Nesterov
2023-08-14 16:09         ` David Hildenbrand

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAGudoHGvBEpotHW0t3cDp4487SqSvba6JAW9N6jWuUz=_K7ixA@mail.gmail.com' \
    --to=mjguzik@gmail.com \
    --cc=akpm@linux-foundation.org \
    --cc=brauner@kernel.org \
    --cc=dave@stgolabs.net \
    --cc=david@redhat.com \
    --cc=ebiederm@xmission.com \
    --cc=koct9i@gmail.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=oleg@redhat.com \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox