From: Mateusz Guzik <mjguzik@gmail.com>
To: Matthew Wilcox <willy@infradead.org>
Cc: torvalds@linux-foundation.org, akpm@linux-foundation.org,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [PATCH] mm: remove unintentional voluntary preemption in get_mmap_lock_carefully
Date: Mon, 21 Aug 2023 03:13:03 +0200
Message-ID: <20230821011303.hoeqjbmjaxajh255@f>
In-Reply-To: <ZOJXgFJybD1ljCHL@casper.infradead.org>

On Sun, Aug 20, 2023 at 07:12:16PM +0100, Matthew Wilcox wrote:
> On Sun, Aug 20, 2023 at 12:43:03PM +0200, Mateusz Guzik wrote:
> > Found by checking off-CPU time during kernel build (like so:
> > "offcputime-bpfcc -Ku"), sample backtrace:
> >     finish_task_switch.isra.0
> >     __schedule
> >     __cond_resched
> >     lock_mm_and_find_vma
> >     do_user_addr_fault
> >     exc_page_fault
> >     asm_exc_page_fault
> >     -                sh (4502)
> 
> Now I'm awake, this backtrace really surprises me.  Do we not check
> need_resched on entry?  It seems terribly unlikely that need_resched
> gets set between entry and getting to this point, so I guess we must
> not.
> 
> I suggest the version of the patch which puts might_sleep() before the
> mmap_read_trylock() is the right one to apply.  It's basically what
> we've done forever, except that now we'll be rescheduling without the
> mmap lock held, which just seems like an overall win.
> 

I can't sleep and your response made me curious: is that really safe
here?

As I wrote in another email, the routine is concerned with a case of the
kernel faulting on something it should not have. For a case like that I
find rescheduling to another thread to be most concerning.

That said, I think I found a winner -- add a need_resched() check prior
to the trylock.

This adds less work than you would have added with might_sleep (a func
call), still respects the preemption point, dodges exception table
checks in the common case and does not switch away if there is
anything fishy going on.
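
To make the ordering concrete, here is a rough userspace sketch of the
idea, not kernel code. Every name in it (resched_pending, fake_lock,
fake_trylock and so on) is a hypothetical stand-in, and the lock is a
plain exclusive flag rather than a reader lock. It only illustrates that
checking for a pending reschedule first means the lock is never touched
when we are about to be switched out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-ins; none of these names are kernel APIs. */
static atomic_bool resched_pending;  /* plays the role of need_resched() */
static atomic_bool fake_lock;        /* plays the role of the mmap lock */
static int trylock_attempts;         /* single-threaded demo counter */

static bool fake_need_resched(void)
{
	return atomic_load(&resched_pending);
}

static bool fake_trylock(void)
{
	trylock_attempts++;
	/* atomic_exchange returns the old value: false means we got it. */
	return !atomic_exchange(&fake_lock, true);
}

static void fake_unlock(void)
{
	bool was_held = atomic_exchange(&fake_lock, false);

	assert(was_held);
	(void)was_held;
}

/*
 * Fast path in the shape proposed above: && short-circuits, so with a
 * reschedule pending the lock is never taken and the caller falls
 * through to its slow path, where any yielding happens with no lock held.
 */
static bool fake_lock_fast_path(void)
{
	if (!fake_need_resched() && fake_trylock())
		return true;
	return false;
}
```

With resched_pending set, fake_lock_fast_path() returns false and
trylock_attempts stays at 0. With it clear, the first call takes the
lock and a second call fails until fake_unlock() runs.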

Or just do that might_sleep.

I'm really buggering off the subject now.

====

mm: remove unintentional voluntary preemption in get_mmap_lock_carefully

Should the trylock succeed (and thus blocking was avoided), the routine
wants to ensure blocking was still legal to do. However, the
might_sleep() used for this ends up calling __cond_resched(), injecting
a voluntary preemption point with the freshly acquired lock held.

Use __might_sleep() instead while holding the lock, but check for
pending preemption prior to taking it.

Found by checking off-CPU time during kernel build (like so:
"offcputime-bpfcc -Ku"), sample backtrace:
    finish_task_switch.isra.0
    __schedule
    __cond_resched
    lock_mm_and_find_vma
    do_user_addr_fault
    exc_page_fault
    asm_exc_page_fault
    -                sh (4502)
        10

Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
---
 mm/memory.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index 1ec1ef3418bf..6dac9dbb7b59 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -5258,8 +5258,8 @@ EXPORT_SYMBOL_GPL(handle_mm_fault);
 static inline bool get_mmap_lock_carefully(struct mm_struct *mm, struct pt_regs *regs)
 {
 	/* Even if this succeeds, make it clear we *might* have slept */
-	if (likely(mmap_read_trylock(mm))) {
-		might_sleep();
+	if (likely(!need_resched() && mmap_read_trylock(mm))) {
+		__might_sleep(__FILE__, __LINE__);
 		return true;
 	}
 
-- 
2.39.2

