linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Mateusz Guzik <mjguzik@gmail.com>
To: Matthew Wilcox <willy@infradead.org>
Cc: torvalds@linux-foundation.org, akpm@linux-foundation.org,
	 linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [PATCH] mm: remove unintentional voluntary preemption in get_mmap_lock_carefully
Date: Sun, 20 Aug 2023 14:46:26 +0200	[thread overview]
Message-ID: <CAGudoHFRoE0Wxjj9mp6ggMJDfVyu45AMeRPHLTyyq7HusgWs-g@mail.gmail.com> (raw)
In-Reply-To: <CAGudoHG3OxoYKSwDmJYEDOj6LmDMsQDs3SD5nBdrzA5Vc1_H0A@mail.gmail.com>

On 8/20/23, Mateusz Guzik <mjguzik@gmail.com> wrote:
> On 8/20/23, Matthew Wilcox <willy@infradead.org> wrote:
>> On Sun, Aug 20, 2023 at 12:43:03PM +0200, Mateusz Guzik wrote:
>>> Should the trylock succeed (and thus blocking was avoided), the routine
>>> wants to ensure blocking was still legal to do. However, the method
>>> used ends up calling __cond_resched injecting a voluntary preemption
>>> point with the freshly acquired lock.
>>>
>>> One can hack around it using __might_sleep instead of mere might_sleep,
>>> but since threads keep going off CPU here, I figured it is better to
>>> accomodate it.
>>
>> Except now we search the exception tables every time we call it.
>> The now-deleted comment (c2508ec5a58d) suggests this is slow:
>>
>
> I completely agree it a rather unfortunate side-effect.
>
>> -       /*
>> -        * Kernel-mode access to the user address space should only occur
>> -        * on well-defined single instructions listed in the exception
>> -        * tables.  But, an erroneous kernel fault occurring outside one
>> of
>> -        * those areas which also holds mmap_lock might deadlock
>> attempting
>> -        * to validate the fault against the address space.
>> -        *
>> -        * Only do the expensive exception table search when we might be
>> at
>> -        * risk of a deadlock.  This happens if we
>> -        * 1. Failed to acquire mmap_lock, and
>> -        * 2. The access did not originate in userspace.
>> -        */
>>
>> Now, this doesn't mean we're doing it on every page fault.  We skip
>> all of this if we're able to handle the fault under the VMA lock,
>> so the effect is probably much smaller than it was.  But I'm surprised
>> not to see you send any data quantifying the effect of this change!
>>
>
> Going off CPU *after* taking the lock sounds like an obviously bad
> thing to happen and as such I did not think it warrants any
> measurements.
>
> My first patch looked like this:
> diff --git a/mm/memory.c b/mm/memory.c
> index 1ec1ef3418bf..8662fd69eae8 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -5259,7 +5259,9 @@ static inline bool
> get_mmap_lock_carefully(struct mm_struct *mm, struct pt_regs
>  {
>         /* Even if this succeeds, make it clear we *might* have slept */
>         if (likely(mmap_read_trylock(mm))) {
> -               might_sleep();
> +#if defined(CONFIG_DEBUG_ATOMIC_SLEEP)
> +               __might_sleep(__FILE__, __LINE__);
> +#endif
>                 return true;
>         }
>
> This keeps assertions while dodging __cond_resched.
>
> But then I figured someone may complain about scheduling latency which
> was not there prior to the patch.
>
> Between the 2 not so great choices I rolled with what I considered the
> safer one.
>
> However, now that I said it, I wonder if perhaps the search could be
> circumvented on x86-64? The idea would be to check if SMAP got
> disabled (and assuming the CPU supports it) -- if so and the faulting
> address belongs to userspace, assume it's all good. This is less
> precise in that SMAP can be disabled over the intended users access
> but would be fine as far as I'm concerned if the search is such a big
> deal.
>

Oof, hit send too fast.

This is less precise in that SMAP can be disabled over A LARGER AREA
THAN the intended users access but would be fine as far as I'm
concerned if the search is such a big.

there :)

Anyhow I don't feel strongly about any of this, I was mostly
interested in what happens with VFS on the off-CPU front and this one
is just a random thing I needed to check.

Now that I elaborated on my $0,03 I'm happy to respin with the
__might_sleep variant. If someone wants a different fix altogether
they are welcome to ignore these patches.

I do claim the current state *is* a problem though -- it can block
down_write for no good reason.

-- 
Mateusz Guzik <mjguzik gmail.com>


  reply	other threads:[~2023-08-20 12:46 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-08-20 10:43 Mateusz Guzik
2023-08-20 11:36 ` Matthew Wilcox
2023-08-20 12:41   ` Mateusz Guzik
2023-08-20 12:46     ` Mateusz Guzik [this message]
2023-08-20 12:47     ` Linus Torvalds
2023-08-20 12:59       ` Linus Torvalds
2023-08-20 13:08         ` Mateusz Guzik
2023-08-20 13:00       ` Mateusz Guzik
2023-08-20 18:12 ` Matthew Wilcox
2023-08-21  1:13   ` Mateusz Guzik
2023-08-21  3:58     ` Matthew Wilcox
2023-08-21  4:55       ` Linus Torvalds
2023-08-21  5:38         ` Linus Torvalds

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAGudoHFRoE0Wxjj9mp6ggMJDfVyu45AMeRPHLTyyq7HusgWs-g@mail.gmail.com \
    --to=mjguzik@gmail.com \
    --cc=akpm@linux-foundation.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=torvalds@linux-foundation.org \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox