From: James Houghton <jthoughton@google.com>
To: Sean Christopherson <seanjc@google.com>
Cc: David Matlack <dmatlack@google.com>, Peter Xu <peterx@redhat.com>,
	 Andrea Arcangeli <aarcange@redhat.com>,
	Paolo Bonzini <pbonzini@redhat.com>,
	 Axel Rasmussen <axelrasmussen@google.com>,
	Linux MM <linux-mm@kvack.org>, kvm <kvm@vger.kernel.org>,
	 chao.p.peng@linux.intel.com
Subject: Re: [RFC] Improving userfaultfd scalability for live migration
Date: Tue, 6 Dec 2022 12:35:18 -0500
Message-ID: <CADrL8HVM1poR5EYCsghhMMoN2U+FYT6yZr_5hZ8pLZTXpLnu8Q@mail.gmail.com>
In-Reply-To: <Y46VgQRU+do50iuv@google.com>

On Mon, Dec 5, 2022 at 8:06 PM Sean Christopherson <seanjc@google.com> wrote:
>
> On Mon, Dec 05, 2022, James Houghton wrote:
> > On Mon, Dec 5, 2022 at 1:20 PM Sean Christopherson <seanjc@google.com> wrote:
> > >
> > > On Mon, Dec 05, 2022, David Matlack wrote:
> > > > On Mon, Dec 5, 2022 at 7:30 AM Peter Xu <peterx@redhat.com> wrote:
> > > > > ...
> > > > > I'll have a closer read on the nested part, but note that this path already
> > > > > has the mmap lock then it invalidates the goal if we want to avoid taking
> > > > > it from the first place, or maybe we don't care?
> >
> > Not taking the mmap lock would be helpful, but we still have to take
> > it in UFFDIO_CONTINUE, so it's ok if we have to still take it here.
>
> IIUC, Peter is suggesting that the kernel not even get to the point where UFFD
> is involved.  The "fault" would get propagated to userspace by KVM, userspace
> fixes the fault (gets the page from the source, does MADV_POPULATE_WRITE), and
> resumes the vCPU.

If we haven't UFFDIO_CONTINUE'd some address range yet,
MADV_POPULATE_WRITE for that range will drop into handle_userfault and
go to sleep. Not good!

So, going with the no-slow-GUP approach, resolving faults is done like this:
- If we haven't UFFDIO_CONTINUE'd yet, do that now and restart
KVM_RUN. The PTEs will be none/blank right now. This is the common
case.
- If we have UFFDIO_CONTINUE'd already, doing it again would return
EEXIST. (In this case, we probably have some type of swap entry in the
page tables.) We have to change the page tables so that fast GUP
succeeds, *without* using UFFDIO_CONTINUE. MADV_POPULATE_WRITE seems to
be the right tool for the job. This case happens if the kernel has
swapped the memory out, is migrating it, has poisoned it, etc. If
MADV_POPULATE_WRITE fails, we probably need to crash or inject a memory
error.
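
As a rough userspace sketch of the two cases above (not code from any
real VMM): the enum, struct, and function names are all hypothetical,
and the two callbacks stand in for an ioctl(UFFDIO_CONTINUE) wrapper and
a madvise(MADV_POPULATE_WRITE) wrapper that return 0 or -errno. Using
EEXIST to detect the "already continued" case, and treating every other
error as fatal, are assumptions made for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical outcome names, for illustration only. */
enum fault_action {
    FAULT_RESUME_VCPU, /* restart KVM_RUN */
    FAULT_FATAL,       /* crash, or inject a memory error */
};

/*
 * Hypothetical backend. In a real VMM, do_continue would wrap
 * ioctl(uffd, UFFDIO_CONTINUE, ...) and do_populate would wrap
 * madvise(addr, len, MADV_POPULATE_WRITE). Both return 0 or -errno.
 */
struct fault_ops {
    int (*do_continue)(void *addr, size_t len);
    int (*do_populate)(void *addr, size_t len);
};

enum fault_action resolve_fault_no_slow_gup(const struct fault_ops *ops,
                                            void *addr, size_t len)
{
    int ret;

    assert(ops && ops->do_continue && ops->do_populate);

    /* Common case: the range was never continued, so the PTEs are none. */
    ret = ops->do_continue(addr, len);
    if (ret == 0)
        return FAULT_RESUME_VCPU;

    /* Assumption: any error other than EEXIST is unrecoverable. */
    if (ret != -EEXIST)
        return FAULT_FATAL;

    /*
     * Already continued: the page was probably swapped out, migrated,
     * or poisoned. Repopulate the page tables so fast GUP can succeed.
     */
    ret = ops->do_populate(addr, len);
    return ret == 0 ? FAULT_RESUME_VCPU : FAULT_FATAL;
}

/* Stand-in backends for trying the logic without a real uffd. */
static int stub_ok(void *addr, size_t len)
{ (void)addr; (void)len; return 0; }
static int stub_eexist(void *addr, size_t len)
{ (void)addr; (void)len; return -EEXIST; }
static int stub_fail(void *addr, size_t len)
{ (void)addr; (void)len; return -EFAULT; }
```

With { stub_eexist, stub_ok } as the backend, the function takes the
second path and still asks to resume the vCPU; with
{ stub_eexist, stub_fail } it reports the fatal case.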

So with this approach, we never need to take the mmap_lock for reading
in hva_to_pfn, but we still need to take it in UFFDIO_CONTINUE.
Without removing the mmap_lock from *both*, we don't gain much.

So if we disregard this tiny mmap_lock benefit, the other approach
(the PF_NO_UFFD_WAIT approach) seems better. When KVM_RUN exits:
- If we haven't UFFDIO_CONTINUE'd yet, do that now and restart KVM_RUN.
- If we have, then something bad has happened. Slow GUP already ran
and failed, so we need to treat this in the same way we treat a
MADV_POPULATE_WRITE failure above: userspace might just want to crash
(or inject a memory error or something).
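
For comparison, a rough sketch of what the userspace side might look
like with the PF_NO_UFFD_WAIT approach. PF_NO_UFFD_WAIT is only a
proposal in this thread, so the code does not reference it; it only
models the decision made after KVM_RUN exits. All names are
hypothetical, and the callback stands in for an ioctl(UFFDIO_CONTINUE)
wrapper that returns 0 or -errno.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical outcome names, for illustration only. */
enum exit_action {
    EXIT_RESUME_VCPU, /* restart KVM_RUN */
    EXIT_FATAL,       /* crash, or inject a memory error */
};

/* Hypothetical wrapper type: returns 0 on success or -errno. */
typedef int (*continue_fn)(void *addr, size_t len);

enum exit_action handle_exit_no_uffd_wait(continue_fn do_continue,
                                          void *addr, size_t len)
{
    assert(do_continue);

    /* Not continued yet: do it now and restart KVM_RUN. */
    if (do_continue(addr, len) == 0)
        return EXIT_RESUME_VCPU;

    /*
     * -EEXIST means the range was already continued, so slow GUP has
     * already run and failed; there is nothing left to retry. Other
     * errors are also treated as fatal here (an assumption).
     */
    return EXIT_FATAL;
}

/* Stand-in backends for trying the logic without a real uffd. */
static int fake_continue_ok(void *addr, size_t len)
{ (void)addr; (void)len; return 0; }
static int fake_continue_eexist(void *addr, size_t len)
{ (void)addr; (void)len; return -EEXIST; }
```

The handler has no MADV_POPULATE_WRITE step: the kernel has already
tried slow GUP, so userspace only has to tell "never continued" apart
from "something bad happened".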

- James


