linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Peter Xu <peterx@redhat.com>
To: Axel Rasmussen <axelrasmussen@google.com>
Cc: LKML <linux-kernel@vger.kernel.org>,
	Linux MM <linux-mm@kvack.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Andrea Arcangeli <aarcange@redhat.com>,
	Hugh Dickins <hughd@google.com>,
	Nadav Amit <nadav.amit@gmail.com>
Subject: Re: [PATCH] mm/khugepaged: Detecting uffd-wp vma more efficiently
Date: Wed, 22 Sep 2021 17:20:57 -0400	[thread overview]
Message-ID: <YUueOUfoamxOvEyO@t490s> (raw)
In-Reply-To: <CAJHvVch3g_UY-akMdu0O9413iCb1H83DLhR1Am8WnfUyV=s1=g@mail.gmail.com>

On Wed, Sep 22, 2021 at 01:49:42PM -0700, Axel Rasmussen wrote:
> Sorry for missing the other thread.

Heh, you didn't miss the other thread; I just posted both of the emails in a
few hours. :)

> 
> Unfortunately, I think shmem THP *doesn't* really work with minor
> faults, and what's worse, just checking the VMA flag isn't enough.

As I replied to myself (sorry to have done that), now I think minor mode is
fine, but let's see what else I've missed, which is possible...  Please see
below.

> 
> First, let me note the guarantee UFFD minor faults are trying to
> provide: for a given mapping, any minor fault (that is, pte_none() but
> a page is present in the page cache) must result in a minor userfault
> event. Furthermore, the only way the fault may be resolved (i.e., a
> PTE installed) is via a UFFDIO_CONTINUE ioctl from userspace.

Yes.

> 
> A typical use case for minor faults is, we have two mappings (i.e.,
> two VMAs), both pointing to the same underlying physical memory. It's
> typical for both to have MAP_SHARED. It's typical for one of these
> mappings to be fully faulted in (i.e., all of its PTEs exist), while
> the other one has some missing PTEs. The problem is, khugepaged might
> scan *either* of the two mappings. Say it picks the fully-faulted VMA:
> even if we set khugepaged_max_ptes_none to zero, it will still go
> ahead and collapse these pages - because *this* VMA has no missing
> PTEs.

Yes.

> 
> Why is this a problem? When we collapse, we install a PMD, for *all*
> VMAs which reference these pages. In other words, we might install
> PTEs for the other, minor-fault-registered mapping, and therefore
> userfaults will never trigger for some of those regions, even though
> userspace never UFFDIO_CONTINUE-ed them.

Nop - we don't install PMD for file-backed, do we?

Please see khugepaged_scan_pmd() - that one installs PMDs indeed, but it's
anonymous-only code.

Then please also see khugepaged_scan_file() - that one handles file-backed
(aka, shmem), and it does _not_ install pmd, afaict.  The installation is lazy.

Not installing pmd means uffd-minor can still trap any further faults just like
before, afaiu.

There's a very trivial detail that the pmd missing case will have a very slight
code path change when the next page fault happens: in __handle_mm_fault() we'll
first try to go into create_huge_pmd() once, however since shmem didn't provide
huge_fault(), we'll go the VM_FAULT_FALLBACK path, and things will go like
before when faulting on a small pte.  The next UFFDIO_CONTINUE will allocate
that missing pmd again, however it'll install a 4K page only.

> 
> I *think* the right place to check for this and solve it is in
> retract_page_tables(), and I have a patch which does this. I've been
> hesitant to send it though, as due to a lack of time and the
> complexity involved I haven't been able to write a clear reproducer
> program, which my patch clearly fixes. :/

Yes retract_page_tables() could drop pmd pgtable for minor fault, but IMHO it's
fine too as mentioned above.

Minor mode should only care about trapping the page fault when the next access
comes.  retract_page_tables() will wipe the pmd pgtable page, that's not fine
for uffd-wp, but IMHO that's still very fine for minor mode as it will keep
trapping the old missing ptes; the difference is it'll just generate even more
traps (rather than on the pte holes only, now it'll generate one message for
each 4k over the merged 2M).

As I mentioned in the other thread, I think that'll cause false positive minor
fault messages, but IMHO that's fine, and minor fault userspace should always
need to handle that.

I fully agree with you that a reproducer would be very nice to try.  So if my
understanding is correct, the reproducer won't really fail on minor mode in any
way, but it'll just need to be prepared to receive more messages than it should.

Thanks,

-- 
Peter Xu



  reply	other threads:[~2021-09-22 21:21 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-09-22 17:51 Peter Xu
2021-09-22 18:21 ` David Hildenbrand
2021-09-22 18:58   ` Peter Xu
2021-09-22 19:29     ` Yang Shi
2021-09-22 20:04       ` Peter Xu
2021-09-22 20:23         ` Peter Xu
2021-09-24 10:05     ` David Hildenbrand
2021-09-22 19:33 ` Peter Xu
2021-09-22 20:49 ` Axel Rasmussen
2021-09-22 21:20   ` Peter Xu [this message]
2021-09-22 23:18     ` Hugh Dickins
2021-09-22 23:44       ` Peter Xu
2021-09-23  1:22         ` Hugh Dickins
2021-09-23  2:18           ` Peter Xu
2021-09-23 16:47             ` Axel Rasmussen
2021-09-23 17:53               ` Peter Xu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YUueOUfoamxOvEyO@t490s \
    --to=peterx@redhat.com \
    --cc=aarcange@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=axelrasmussen@google.com \
    --cc=hughd@google.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=nadav.amit@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox