linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Hugh Dickins <hughd@google.com>
To: Hugh Dickins <hughd@google.com>
Cc: "Matthew Wilcox" <willy@infradead.org>,
	"José Pekkarinen" <jose.pekkarinen@foxhound.fi>,
	akpm@linux-foundation.org, skhan@linuxfoundation.org,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	linux-kernel-mentees@lists.linux.dev,
	syzbot+89edd67979b52675ddec@syzkaller.appspotmail.com,
	"Jann Horn" <jannh@google.com>
Subject: Re: [PATCH] mm/pgtable: return null if no ptl in __pte_offset_map_lock
Date: Thu, 16 Nov 2023 00:26:39 -0800 (PST)	[thread overview]
Message-ID: <1b7019f3-0abf-a2f5-46ae-373af2043e56@google.com> (raw)
In-Reply-To: <515cb9c1-abcd-c3f3-cc0d-c3cd248b9d6f@google.com>

[-- Attachment #1: Type: text/plain, Size: 6161 bytes --]

On Wed, 15 Nov 2023, Hugh Dickins wrote:
> On Wed, 15 Nov 2023, Matthew Wilcox wrote:
> > On Wed, Nov 15, 2023 at 06:05:30PM +0200, José Pekkarinen wrote:
> > 
> > > > I don't think we should be changing ptlock_ptr().
> > > 
> > >     This is where the null ptr dereference originates, so the only
> > > alternative I can think of is to protect the life cycle of the ptdesc
> > > to prevent it to die between the pte check and the spin_unlock of
> > > __pte_offset_map_lock. Would that work for you?
> 
> Thanks for pursuing this, José, but I agree with Matthew: I don't
> think your patch is right at all.  The change in ptlock_ptr() did not
> make sense to me, and the change in __pte_offset_map_lock() leaves us
> still wondering what has gone wrong (and misses an rcu_read_unlock()).
> 
> You mentioned "I tested the syzbot reproducer in x86 and it doesn't
> produce this kasan report anymore": are you saying that you were able
> to reproduce the issue on x86 (without your patch)?  That would be very
> interesting (and I think would disprove my hypothesis below).  I ought
> to try on x86 if you managed to reproduce on it, but it's not worth
> the effort if you did not.  If you have an x86 stack and registers,
> please show (though I'm uncertain how much KASAN spoils the output).
> 
> > 
> > Ah!  I think I found the problem.
> > 
> > If ALLOC_SPLIT_PTLOCKS is not set, there is no problem as ->ptl
> > is embedded in the struct page.  But if ALLOC_SPLIT_PTLOCKS is set
> > (eg you have LOCKDEP enabled), we can _return_ a NULL pointer from
> > ptlock_ptr.  The NULL pointer dereference isn't in ptlock_ptr(), it's
> > in __pte_offset_map_lock().
> > 
> > So, how to solve this?  We can't just check the ptl against NULL; the
> > memory that ptl points to may have been freed.  We could grab a reference
> > to the pmd_page, possibly conditionally on ALLOC_SPLIT_LOCK being set,
> > but that will slow down everything.  We could make page_ptl_cachep
> > SLAB_TYPESAFE_BY_RCU, preventing the memory from being freed (even if
> > the lock might not be associated with this page any more).
> 
> But I don't think that's quite right either: or, I hope it's not right,
> because it would say that all my business of rcu_read_lock() and
> pte_free_defer() etc was a failing waste of everyone's time: if the
> page table (and/or its lock) can be freed while someone is in that RCU-
> protected area, then I've simply got it wrong.
> 
> What I thought, when yesterday's flurry of syzbot messages came in,
> was perhaps the problem is at the other end - when the new page table
> is allocated, rather than when it's being freed: a barrier missing there?
> 
> But pmd_install() does have an smp_wmb(), with a (suspiciously) long
> comment about why no smp_rmb()s are needed, except on alpha.
> 
> I'm not much good on barriers, but the thought now in my mind is that
> that comment is not quite accurate: that at the bottom level an smp_rmb()
> is (very seldom!) needed - not just to avoid a NULL (or previous garbage)
> ptl pointer in the ALLOC_SPLIT_LOCK case, but to make sure that the ptl
> is visibly correctly initialized (or would spin_lock on it achieve that?);
> and that what pte_offset_map() points to is visibly empty.  (I'm imagining
> stale cache lines left behind on the oopsing CPU, which need to be
> refetched after pmdval has been read.)
> 
> And that this issue has, strictly speaking, always been there, but in
> practice never a problem, because of mmap_lock held for write while (or
> prior to) freeing page table, and racers holding mmap_lock for read
> (vma lock not changing the calculus); but collapse_pte_mapped_thp()
> can now be done with mmap_lock for read, letting those racers in
> (and maybe your filemap_map_pages() has helped speed these races up -
> any blame I can shift on to you, the better ;)
> 
> But I may well be spouting nonsense.  And even if I'm right, is adding
> an smp_rmb() in there too high a price to pay on some architectures?
> 
> I did make an attempt to reproduce on an arm64, but there's too much
> more I'd need to muck around with to get it working right.  Let's ask for
> a syz test on top of v6.7-rc1 - hmm, how do I tell syzbot that it's arm64
> that it needs to try on?  I hope it gets that from the Cc'ed syz number.

Syzbot couldn't do it from this mail, but did it from the original thread:
https://lore.kernel.org/linux-mm/000000000000ba0007060a40644f@google.com/
tells us that I was spouting nonsense: this patch does not fix it.
I have no further idea yet.

> 
> #syz test: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git b85ea95d086471afb4ad062012a4d73cd328fa86
> 
> [PATCH] mm/pgtable: smp_rmb() to match smp_wmb() in pmd_install()
> 
> Not-Yet-Signed-off-by: Hugh Dickins <hughd@google.com>
> ---
>  mm/memory.c          | 2 ++
>  mm/pgtable-generic.c | 5 +++++
>  2 files changed, 7 insertions(+)
> 
> diff --git a/mm/memory.c b/mm/memory.c
> index 1f18ed4a5497..8939357f1509 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -425,6 +425,8 @@ void pmd_install(struct mm_struct *mm, pmd_t *pmd, pgtable_t *pte)
>  		 * being the notable exception) will already guarantee loads are
>  		 * seen in-order. See the alpha page table accessors for the
>  		 * smp_rmb() barriers in page table walking code.
> +		 *
> +		 * See __pte_offset_map() for the smp_rmb() at the pte level.
>  		 */
>  		smp_wmb(); /* Could be smp_wmb__xxx(before|after)_spin_lock */
>  		pmd_populate(mm, pmd, *pte);
> diff --git a/mm/pgtable-generic.c b/mm/pgtable-generic.c
> index 4fcd959dcc4d..3330b666e9c3 100644
> --- a/mm/pgtable-generic.c
> +++ b/mm/pgtable-generic.c
> @@ -297,6 +297,11 @@ pte_t *__pte_offset_map(pmd_t *pmd, unsigned long addr, pmd_t *pmdvalp)
>  		pmd_clear_bad(pmd);
>  		goto nomap;
>  	}
> +	/*
> +	 * Pair with the smp_wmb() in pmd_install(): make sure that the
> +	 * page table lock and page table contents are visibly initialized.
> +	 */
> +	smp_rmb();
>  	return __pte_map(&pmdval, addr);
>  nomap:
>  	rcu_read_unlock();
> -- 
> 2.35.3

  reply	other threads:[~2023-11-16  8:26 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-11-15  6:55 José Pekkarinen
2023-11-15 14:19 ` Matthew Wilcox
2023-11-15 16:05   ` José Pekkarinen
2023-11-15 19:04     ` Matthew Wilcox
2023-11-16  7:23       ` Hugh Dickins
2023-11-16  8:26         ` Hugh Dickins [this message]
2023-11-16 10:04         ` José Pekkarinen
2023-11-17  6:13           ` Hugh Dickins
2023-11-17  9:16             ` José Pekkarinen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1b7019f3-0abf-a2f5-46ae-373af2043e56@google.com \
    --to=hughd@google.com \
    --cc=akpm@linux-foundation.org \
    --cc=jannh@google.com \
    --cc=jose.pekkarinen@foxhound.fi \
    --cc=linux-kernel-mentees@lists.linux.dev \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=skhan@linuxfoundation.org \
    --cc=syzbot+89edd67979b52675ddec@syzkaller.appspotmail.com \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox