From: Peter Xu <peterx@redhat.com>
To: James Houghton <jthoughton@google.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>,
Muchun Song <songmuchun@bytedance.com>,
Axel Rasmussen <axelrasmussen@google.com>,
Andrew Morton <akpm@linux-foundation.org>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] hugetlb: unshare some PMDs when splitting VMAs
Date: Wed, 4 Jan 2023 15:03:47 -0500
Message-ID: <Y7Xbo0tUO26khHCA@x1n>
In-Reply-To: <CADrL8HV73m0nVJOK3uv4sbyGKOVZhVxSv2+i4pUV7tozu6vW5Q@mail.gmail.com>
On Wed, Jan 04, 2023 at 07:10:11PM +0000, James Houghton wrote:
> > > I'll see if I can confirm that this is indeed possible and send a
> > > repro if it is.
> >
> > I think your analysis above is correct. The key being the failure to unshare
> > in the non-PUD_SIZE vma after the split.
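
For illustration, here is a minimal userspace model of why the split matters. It assumes a check modeled on the kernel's vma_shareable() (a PMD page can only be shared when the whole PUD_SIZE-aligned region lies inside the VMA) and x86-64's 1 GiB PUD_SIZE. The struct and function names are hypothetical; this is a sketch, not the patch itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: x86-64 sizes, where one PUD entry maps 1 GiB. */
#define SKETCH_PUD_SIZE (UINT64_C(1) << 30)
#define SKETCH_PUD_MASK (~(SKETCH_PUD_SIZE - 1))

/* Hypothetical, simplified VMA covering [start, end); assumed VM_MAYSHARE. */
struct sketch_vma {
	uint64_t start;
	uint64_t end;
};

/* Modeled on vma_shareable(): the whole PUD region around addr must fit. */
static bool sketch_vma_shareable(const struct sketch_vma *vma, uint64_t addr)
{
	uint64_t base = addr & SKETCH_PUD_MASK;
	uint64_t end = base + SKETCH_PUD_SIZE;

	return vma->start <= base && end <= vma->end;
}

/*
 * Splitting vma at addr yields [start, addr) and [addr, end). If the PUD
 * region around addr could be shared before the split and addr is not
 * PUD-aligned, neither half covers that region any more, so its PMD page
 * must be unshared. The region is returned in [*ustart, *uend).
 */
static bool sketch_split_needs_unshare(const struct sketch_vma *vma,
				       uint64_t addr,
				       uint64_t *ustart, uint64_t *uend)
{
	assert(vma->start < addr && addr < vma->end);

	if (!sketch_vma_shareable(vma, addr))
		return false;		/* nothing could be shared here */
	if ((addr & ~SKETCH_PUD_MASK) == 0)
		return false;		/* aligned split keeps whole regions */

	*ustart = addr & SKETCH_PUD_MASK;
	*uend = *ustart + SKETCH_PUD_SIZE;
	return true;
}
```

In this model, splitting a 4 GiB mapping at 1 GiB + 2 MiB reports [1 GiB, 2 GiB) as the range to unshare, while a split on a 1 GiB boundary reports nothing.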
>
> I do indeed hit the WARN_ON_ONCE (repro attached), and the MADV wasn't
> even needed (the UFFDIO_REGISTER does the VMA split before "unsharing
> all PMDs"). With the fix, we avoid the WARN_ON_ONCE, but the behavior
> is still incorrect: I expect the address range to be write-protected,
> but it isn't.
>
> The reason why is that hugetlb_change_protection uses huge_pte_offset,
> even if it's being called for a UFFDIO_WRITEPROTECT with
> UFFDIO_WRITEPROTECT_MODE_WP. In that particular case, I'm pretty sure
> we should be using huge_pte_alloc, but even so, it's not trivial to
> get an allocation failure back up to userspace. The non-hugetlb
> implementation of UFFDIO_WRITEPROTECT seems to also have this problem.
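
To illustrate the difference described above, here is a toy model (hypothetical names, not kernel code) of a range walk that either only looks up existing tables, as huge_pte_offset() does, or populates them, as huge_pte_alloc() does. The lookup-only walk skips unpopulated ranges and still "succeeds"; the allocating walk protects them but then has a failure it must report.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

#define SKETCH_ENTRIES 4	/* hypothetical: slots per toy table */

/* Hypothetical leaf table; each entry only tracks a write-protect bit. */
struct sketch_table {
	bool wp[SKETCH_ENTRIES];
};

/* Hypothetical address space: a NULL slot means "no table populated". */
struct sketch_mm {
	struct sketch_table *tables[SKETCH_ENTRIES];
	bool fail_alloc;	/* test hook to simulate allocation failure */
};

/* Like huge_pte_offset(): pure lookup, NULL if nothing is populated. */
static struct sketch_table *sketch_lookup(struct sketch_mm *mm, int idx)
{
	return mm->tables[idx];
}

/* Like huge_pte_alloc(): populate if needed, NULL only on failure. */
static struct sketch_table *sketch_alloc(struct sketch_mm *mm, int idx)
{
	if (!mm->tables[idx] && !mm->fail_alloc)
		mm->tables[idx] = calloc(1, sizeof(struct sketch_table));
	return mm->tables[idx];
}

/* Write-protect every entry of tables [first, last]. */
static int sketch_wp_range(struct sketch_mm *mm, int first, int last,
			   bool alloc)
{
	for (int i = first; i <= last; i++) {
		struct sketch_table *t = alloc ? sketch_alloc(mm, i)
					       : sketch_lookup(mm, i);

		if (!t) {
			if (alloc)
				return -ENOMEM;	/* must reach userspace */
			continue;		/* silently left unprotected */
		}
		for (int j = 0; j < SKETCH_ENTRIES; j++)
			t->wp[j] = true;
	}
	return 0;
}
```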
>
> Peter, what do you think?

Indeed. Thanks for spotting that, James.

Non-hugetlb memory should be fine with empty pgtable entries. So far, anon
doesn't need to care about ranges with no pgtable populated, while shmem
uses a few change_prepare() calls to populate the entries so that the
markers can be installed later on.
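
A rough sketch of that distinction, using hypothetical entry states rather than real kernel types: for anon a none entry is left alone, while for shmem the page cache may still hold data behind a none entry, so a marker has to be installed there (which in turn needs the pgtable to exist).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical entry states, for illustration only. */
enum sketch_pte {
	SKETCH_PTE_NONE,	/* nothing mapped */
	SKETCH_PTE_PRESENT,	/* mapped, writable */
	SKETCH_PTE_PRESENT_WP,	/* mapped, write-protected */
	SKETCH_PTE_WP_MARKER,	/* not mapped, but remembers "protected" */
};

/* Apply UFFDIO_WRITEPROTECT-style protection to one entry. */
static enum sketch_pte sketch_wp_entry(enum sketch_pte pte, bool file_backed)
{
	switch (pte) {
	case SKETCH_PTE_PRESENT:
		return SKETCH_PTE_PRESENT_WP;
	case SKETCH_PTE_NONE:
		/*
		 * File-backed (e.g. shmem): the page cache may still hold
		 * data, so leave a marker for the next fault to honor.
		 * Anon: leave the entry empty, as described above.
		 */
		return file_backed ? SKETCH_PTE_WP_MARKER : SKETCH_PTE_NONE;
	default:
		return pte;	/* already protected */
	}
}
```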

However, as you pointed out, the error handling is still not right, even
for shmem. That's a path I have probably never triggered myself, and the
code has stayed the same since a very early version:

#define change_pmd_prepare(vma, pmd, cp_flags)				\
	do {								\
		if (unlikely(uffd_wp_protect_file(vma, cp_flags))) {	\
			if (WARN_ON_ONCE(pte_alloc(vma->vm_mm, pmd)))	\
				break;					\
		}							\
	} while (0)

I think a better approach here (instead of warning and stopping the
UFFDIO_WRITEPROTECT midway) is to return -ENOMEM properly so the user knows
about the error. We'll need to touch the call stack up to uffd_wp_range(),
since it's the only caller that can trigger -ENOMEM so far, so that the
return value of change_protection() is no longer ignored.
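
A minimal sketch of that propagation, with hypothetical functions standing in for change_pmd_prepare(), change_protection() and uffd_wp_range(). The "pages changed or negative errno" return convention is an assumption for illustration, not the current kernel signature.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Test hook: make the hypothetical pte_alloc() stand-in fail. */
static bool sketch_oom;

/* Stand-in for pte_alloc(): nonzero means the allocation failed. */
static int sketch_pte_alloc(void)
{
	return sketch_oom ? 1 : 0;
}

/* change_pmd_prepare() as a function: report failure instead of WARN+break. */
static int sketch_change_pmd_prepare(bool uffd_wp_file)
{
	if (uffd_wp_file && sketch_pte_alloc())
		return -ENOMEM;
	return 0;
}

/* Assumed convention: pages changed (>= 0) or a negative errno. */
static long sketch_change_protection(int nr_pmds, bool uffd_wp_file)
{
	long pages = 0;

	for (int i = 0; i < nr_pmds; i++) {
		int err = sketch_change_pmd_prepare(uffd_wp_file);

		if (err)
			return err;
		pages++;	/* pretend each PMD changed one page */
	}
	return pages;
}

/* Stand-in for uffd_wp_range(): propagate rather than ignore the retval. */
static int sketch_uffd_wp_range(int nr_pmds, bool uffd_wp_file)
{
	long ret = sketch_change_protection(nr_pmds, uffd_wp_file);

	return ret < 0 ? (int)ret : 0;
}
```

The ioctl handler would then hand the negative value back, so userspace would see the call fail with errno set to ENOMEM.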

Meanwhile, I also wonder whether we should call pagefault_out_of_memory():
this is the same situation as a pgtable allocation failure during a page
fault, where we may already want to OOM. I can take care of the hugetlb
part too along the way.

The UFFDIO_WRITEPROTECT man page may also need a fixup to document -ENOMEM.

I can quickly prepare some patches for this; hopefully they won't need to
block the current fix for the split.

Any thoughts?
>
> >
> > To me, the fact it was somewhat difficult to come up with this scenario is an
> > argument that we should just unshare at split time as you propose. Who
> > knows what other issues may exist.
> >
> > > 60dfaad65a ("mm/hugetlb: allow uffd wr-protect none ptes") is the
> > > commit that introduced the WARN_ON_ONCE; perhaps it's a good choice
> > > for a Fixes: tag (if above is indeed true).
> >
> > If the key issue in your above scenario is indeed the failure of
> > hugetlb_unshare_all_pmds in the non-PUD_SIZE vma, then perhaps we tag?
> >
> > 6dfeaff93be1 ("hugetlb/userfaultfd: unshare all pmds for hugetlbfs when
> > register wp")
>
> SGTM. Thanks Mike.
Looks good here too.
Thanks,
--
Peter Xu