linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Peter Xu <peterx@redhat.com>
To: Mike Kravetz <mike.kravetz@oracle.com>
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	Andrea Arcangeli <aarcange@redhat.com>,
	Mike Rapoport <rppt@linux.vnet.ibm.com>,
	David Hildenbrand <david@redhat.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Axel Rasmussen <axelrasmussen@google.com>,
	Nadav Amit <nadav.amit@gmail.com>
Subject: Re: [PATCH 1/3] mm/hugetlb: Fix race condition of uffd missing/minor handling
Date: Mon, 3 Oct 2022 17:27:52 -0400	[thread overview]
Message-ID: <YztT2OJake65WG3P@x1n> (raw)
In-Reply-To: <YzsZ3VV/95AvXDUz@monkey>

On Mon, Oct 03, 2022 at 10:20:29AM -0700, Mike Kravetz wrote:
> On 10/03/22 11:56, Peter Xu wrote:
> > After the recent rework patchset of hugetlb locking on pmd sharing,
> > kselftest for userfaultfd sometimes fails on hugetlb private tests with
> > unexpected write fault checks.
> > 
> > It turns out there's nothing wrong within the locking series regarding this
> > matter, but it could have changed the timing of threads so it can trigger
> > an old bug.
> > 
> > The real bug is when we call hugetlb_no_page() we're not with the pgtable
> > lock.  It means we're reading the pte values lockless.  It's perfectly fine
> > in most cases because before we do normal page allocations we'll take the
> > lock and check pte_same() again.  However before that, there are actually
> > two paths on userfaultfd missing/minor handling that may directly move on
> > with the fault process without checking the pte values.
> > 
> > It means for these two paths we may be generating an uffd message based on
> > an unstable pte, while an unstable pte can legally be anything as long as
> > the modifier holds the pgtable lock.
> > 
> > One example, which is also what happened in the failing kselftest and
> > caused the test failure, is that for private mappings CoW can happen on one
> > page.  CoW requires pte being cleared before being replaced with a new page
> > for TLB coherency, but then there can be a race condition:
> > 
> >         thread 1                              thread 2
> >         --------                              --------
> > 
> >       hugetlb_fault                         hugetlb_fault
> >         private pte RO
> >         hugetlb_wp
> >           pgtable_lock()
> >           huge_ptep_clear_flush
> >                                               pte=NULL
> >                                               hugetlb_no_page
> >                                                 generate uffd missing event
> >                                                 even if page existed!!
> >           set_huge_pte_at
> >           pgtable_unlock()
> 
> Thanks for working on this Peter!
> 
> I agree with this patch, but I suspect the above race is not possible.  Why?
> In both cases, we take the hugetlb fault mutex when processing a huegtlb
> page fault.  This means only one thread can execute the fault code for
> a specific mapping/index at a time.  This is why I was so confused, and may
> remain a bit confused about how we end up with userfault processing a write
> fault.  It was part of the reason for my 'unclear' wording about this being
> more about cpus not seeing updated values.  Note that we do drop the fault
> mutex before calling handle_usefault, but by then we have already made the
> 'missing' determination.
> 
> Thoughts?  Perhaps, I am still confused.

It's my fault to have the commit message wrong, sorry. :) And thanks for
raising this question, I could have overlooked that.

It turns out it's not the CoW that's clearing the pte... it's the
wr-protect with huge_ptep_modify_prot_start().  So the race is with
UFFDIO_WRITEPROTECT, not CoW.

Obviously when I was tracking the hpte changes I overlooked
huge_ptep_get_and_clear(), only seeing the CoW path and I'm pretty sure
that's already a bug which was obvious enough.  I didn't prove they
happened at the same time during the MISSING event.

Then after I further looked at the CoW code I start to question myself on
why CoW would trigger at all even with an available fault mutex, since for
private mappings mapcount should be 1:

	if (page_mapcount(old_page) == 1 && PageAnon(old_page)) {
		if (!PageAnonExclusive(old_page))
			page_move_anon_rmap(old_page, vma);
		if (likely(!unshare))
			set_huge_ptep_writable(vma, haddr, ptep);

		delayacct_wpcopy_end();
		return 0;
	}

There could still be something else I didn't catch, even though that may
not be relevant to this patchset anymore.

I'll wait a few more hours for other reviewers, then prepare a new version
with the corrected commit message.

Thanks,

-- 
Peter Xu



  reply	other threads:[~2022-10-03 21:28 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-10-03 15:56 [PATCH 0/3] mm/hugetlb: Fix selftest failures with write check Peter Xu
2022-10-03 15:56 ` [PATCH 1/3] mm/hugetlb: Fix race condition of uffd missing/minor handling Peter Xu
2022-10-03 17:20   ` Mike Kravetz
2022-10-03 21:27     ` Peter Xu [this message]
2022-10-03 21:45       ` Mike Kravetz
2022-10-04  0:28         ` Peter Xu
2022-10-03 21:00   ` Nadav Amit
2022-10-03 21:16     ` Peter Xu
2022-10-03 21:32       ` Nadav Amit
2022-10-03 15:56 ` [PATCH 2/3] mm/hugetlb: Use hugetlb_pte_stable in migration race check Peter Xu
2022-10-03 15:56 ` [PATCH 3/3] mm/selftest: uffd: Explain the write missing fault check Peter Xu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YztT2OJake65WG3P@x1n \
    --to=peterx@redhat.com \
    --cc=aarcange@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=axelrasmussen@google.com \
    --cc=david@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mike.kravetz@oracle.com \
    --cc=nadav.amit@gmail.com \
    --cc=rppt@linux.vnet.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox