linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: James Houghton <jthoughton@google.com>
To: Peter Xu <peterx@redhat.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>,
	Muchun Song <songmuchun@bytedance.com>,
	 Axel Rasmussen <axelrasmussen@google.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-mm@kvack.org,  linux-kernel@vger.kernel.org
Subject: Re: [PATCH] hugetlb: unshare some PMDs when splitting VMAs
Date: Wed, 4 Jan 2023 23:12:53 +0000	[thread overview]
Message-ID: <CADrL8HVMCxHPBUuXo6dxam0giVcx2kAn=8n-3NjVO0ZddFw_tQ@mail.gmail.com> (raw)
In-Reply-To: <Y7Xbo0tUO26khHCA@x1n>

On Wed, Jan 4, 2023 at 8:03 PM Peter Xu <peterx@redhat.com> wrote:
>
> On Wed, Jan 04, 2023 at 07:10:11PM +0000, James Houghton wrote:
> > > > I'll see if I can confirm that this is indeed possible and send a
> > > > repro if it is.
> > >
> > > I think your analysis above is correct.  The key being the failure to unshare
> > > in the non-PUD_SIZE vma after the split.
> >
> > I do indeed hit the WARN_ON_ONCE (repro attached), and the MADV wasn't
> > even needed (the UFFDIO_REGISTER does the VMA split before "unsharing
> > all PMDs"). With the fix, we avoid the WARN_ON_ONCE, but the behavior
> > is still incorrect: I expect the address range to be write-protected,
> > but it isn't.
> >
> > The reason why is that hugetlb_change_protection uses huge_pte_offset,
> > even if it's being called for a UFFDIO_WRITEPROTECT with
> > UFFDIO_WRITEPROTECT_MODE_WP. In that particular case, I'm pretty sure
> > we should be using huge_pte_alloc, but even so, it's not trivial to
> > get an allocation failure back up to userspace. The non-hugetlb
> > implementation of UFFDIO_WRITEPROTECT seems to also have this problem.
> >
> > Peter, what do you think?
>
> Indeed.  Thanks for spotting that, James.
>
> Non-hugetlb should be fine with having empty pgtable entries. Anon doesn't
> need to care about no-pgtable-populated ranges so far. Shmem does it with a
> few change_prepare() calls to populate the entries so the markers can be
> installed later on.

Ah ok! :)

>
> However I think the fault handling is still not well handled as you pointed
> out even for shmem: that's the path I probably never triggered myself yet
> before and the code stayed there since a very early version:
>
> #define  change_pmd_prepare(vma, pmd, cp_flags)                         \
>         do {                                                            \
>                 if (unlikely(uffd_wp_protect_file(vma, cp_flags))) {    \
>                         if (WARN_ON_ONCE(pte_alloc(vma->vm_mm, pmd)))   \
>                                 break;                                  \
>                 }                                                       \
>         } while (0)
>
> I think a better thing we can do here (instead of warning and stop the
> UFFDIO_WRITEPROTECT at the current stage) is returning with -ENOMEM
> properly so the user can know the error.  We'll need to touch the stacks up
> to uffd_wp_range() as it's the only one that can trigger the -ENOMEM so
> far, so as to not ignore retval from change_protection().
>
> Meanwhile, I'd also wonder whether we should call pagefault_out_of_memory()
> because it should be the same as when pgtable allocation failure happens in
> page faults, we may want to OOM already.  I can take care of hugetlb part
> too along the way.

I might be misunderstanding, the only case where
hugetlb_change_protection() would *need* to allocate is when it is
called from UFFDIO_WRITEPROTECT, not while handling a #pf. So I don't
think any calls to pagefault_out_of_memory() need to be added.

>
> Man page of UFFDIO_WRITEPROTECT may need a fixup too to introduce -ENOMEM.
>
> I can quickly prepare some patches for this, and hopefully it doesn't need
> to block the current fix on split.

I don't think it should block this splitting fix. I'll send another
version of this fix soon.

>
> Any thoughts?
>
> >
> > >
> > > To me, the fact it was somewhat difficult to come up with this scenario is an
> > > argument what we should just unshare at split time as you propose.  Who
> > > knows what other issues may exist.
> > >
> > > > 60dfaad65a ("mm/hugetlb: allow uffd wr-protect none ptes") is the
> > > > commit that introduced the WARN_ON_ONCE; perhaps it's a good choice
> > > > for a Fixes: tag (if above is indeed true).
> > >
> > > If the key issue in your above scenario is indeed the failure of
> > > hugetlb_unshare_all_pmds in the non-PUD_SIZE vma, then perhaps we tag?
> > >
> > > 6dfeaff93be1 ("hugetlb/userfaultfd: unshare all pmds for hugetlbfs when
> > > register wp")
> >
> > SGTM. Thanks Mike.
>
> Looks good here too.

Thanks, Peter!


  reply	other threads:[~2023-01-04 23:13 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-01-01 23:00 James Houghton
2023-01-03 19:24 ` Mike Kravetz
2023-01-03 20:26   ` James Houghton
2023-01-03 20:27     ` James Houghton
2023-01-03 22:23     ` Mike Kravetz
2023-01-04 19:10       ` James Houghton
2023-01-04 20:03         ` Peter Xu
2023-01-04 23:12           ` James Houghton [this message]
2023-01-03 23:04 ` Peter Xu
2023-01-04 19:34   ` James Houghton
2023-01-04 20:04     ` Peter Xu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CADrL8HVMCxHPBUuXo6dxam0giVcx2kAn=8n-3NjVO0ZddFw_tQ@mail.gmail.com' \
    --to=jthoughton@google.com \
    --cc=akpm@linux-foundation.org \
    --cc=axelrasmussen@google.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mike.kravetz@oracle.com \
    --cc=peterx@redhat.com \
    --cc=songmuchun@bytedance.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox