Re: folio_mmapped - Quentin Perret

From: Quentin Perret <qperret@google.com>
To: David Hildenbrand <david@redhat.com>
Cc: Fuad Tabba <tabba@google.com>,
	Matthew Wilcox <willy@infradead.org>,
	kvm@vger.kernel.org, kvmarm@lists.linux.dev, pbonzini@redhat.com,
	chenhuacai@kernel.org, mpe@ellerman.id.au, anup@brainfault.org,
	paul.walmsley@sifive.com, palmer@dabbelt.com,
	aou@eecs.berkeley.edu, seanjc@google.com, brauner@kernel.org,
	akpm@linux-foundation.org, xiaoyao.li@intel.com,
	yilun.xu@intel.com, chao.p.peng@linux.intel.com,
	jarkko@kernel.org, amoorthy@google.com, dmatlack@google.com,
	yu.c.zhang@linux.intel.com, isaku.yamahata@intel.com,
	mic@digikod.net, vbabka@suse.cz, vannapurve@google.com,
	ackerleytng@google.com, mail@maciej.szmigiero.name,
	michael.roth@amd.com, wei.w.wang@intel.com,
	liam.merwick@oracle.com, isaku.yamahata@gmail.com,
	kirill.shutemov@linux.intel.com, suzuki.poulose@arm.com,
	steven.price@arm.com, quic_mnalajal@quicinc.com,
	quic_tsoni@quicinc.com, quic_svaddagi@quicinc.com,
	quic_cvanscha@quicinc.com, quic_pderrin@quicinc.com,
	quic_pheragu@quicinc.com, catalin.marinas@arm.com,
	james.morse@arm.com, yuzenghui@huawei.com,
	oliver.upton@linux.dev, maz@kernel.org, will@kernel.org,
	keirf@google.com, linux-mm@kvack.org
Subject: Re: folio_mmapped
Date: Mon, 4 Mar 2024 12:53:36 +0000
Message-ID: <ZeXEUMPn27J5je8T@google.com>
In-Reply-To: <d8e6c848-e26a-4014-b0c2-f3a21fb4e636@redhat.com>

On Friday 01 Mar 2024 at 12:16:54 (+0100), David Hildenbrand wrote:
> > > I don't think that we can assume that only a single VMA covers a page.
> > > 
> > > > But of course, no rmap walk is always better.
> > > 
> > > We've been thinking some more about how to handle the case where the
> > > host userspace has a mapping of a page that later becomes private.
> > > 
> > > One idea is to refuse to run the guest (i.e., exit vcpu_run() back
> > > to the host with a meaningful exit reason) until the host unmaps that
> > > page, and check for the refcount to the page as you mentioned earlier.
> > > This is essentially what the RFC I sent does (minus the bugs :) ) .
> > > 
> > > The other idea is to use the rmap walk as you suggested to zap that
> > > page. If the host tries to access that page again, it would get a
> > > SIGBUS on the fault. This has the advantage that, as you'd mentioned,
> > > the host doesn't need to constantly mmap() and munmap() pages. It
> > > could potentially be optimised further as suggested if we have a
> > > cooperating VMM that would issue a MADV_DONTNEED or something like
> > > that, but that's just an optimisation and we would still need to have
> > > the option of the rmap walk. However, I was wondering how practical
> > > this idea would be if more than a single VMA covers a page?
> > > 
> > 
> > Agree with all your points here. I changed Gunyah's implementation to do
> > the unmap instead of erroring out. I didn't observe a significant
> > performance difference. However, doing unmap might be a little faster
> > because we can check folio_mapped() before doing the rmap walk. When
> > erroring out at mmap() level, we always have to do the walk.
> 
> Right. On the mmap() level you won't really have to walk page tables, as
> the munmap() already zapped the page and removed the "problematic" VMA.
> 
> Likely, you really want to avoid repeatedly calling mmap()+munmap() just to
> access shared memory; but that's just my best guess about your user space
> app :)

Ack, and expecting userspace to munmap the pages whenever we hit a valid
mapping in userspace page-tables in the KVM fault path makes for a
somewhat unusual interface IMO. Userspace can munmap, mmap again, and if
it doesn't touch the pages, it can proceed to run the guest just fine,
is that the expectation? If so, it feels like we're 'leaking' internal
kernel state somehow. The kernel is normally well within its rights to
zap userspace mappings if it wants to, e.g. to swap. (Obviously mlock is
a weird case, but even in that case, IIRC the kernel still has a certain
amount of flexibility and can use compaction and friends.) Similarly,
it should be well within its rights to proactively create them. How
would this scheme work if, 10 years from now, something like
Speculative Page Faults makes it into the kernel in a different form?

Not requiring userspace to unmap makes the userspace interface a lot
simpler I think -- once a protected guest starts, you'd better not touch
its memory if it hasn't been shared back or you'll get slapped on the
wrist. Whether or not those pages have been accessed beforehand, for
example, is irrelevant.
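
To make the contrast concrete, here is a toy userspace model of the two
schemes discussed in this thread. It is only a sketch and not kernel
code: struct toy_page and every toy_* function are hypothetical names
invented for illustration, and the model reduces a page to two booleans.
It assumes "refuse" means the guest fault fails while a host PTE exists,
and "zap" means the kernel drops the host PTE itself and later host
accesses get SIGBUS.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy userspace model, NOT kernel code. Every name below is hypothetical. */

struct toy_page {
	bool shared;       /* the guest currently shares this page with the host */
	bool host_mapped;  /* a host userspace PTE currently maps this page */
};

enum toy_run { TOY_RUN_OK, TOY_RUN_EXIT_TO_HOST };
enum toy_access { TOY_ACCESS_OK, TOY_ACCESS_SIGBUS };

/* Host touches the page: faults in a PTE if shared, SIGBUS otherwise. */
static enum toy_access toy_host_access(struct toy_page *p)
{
	assert(p != NULL);
	if (!p->shared)
		return TOY_ACCESS_SIGBUS;
	p->host_mapped = true;
	return TOY_ACCESS_OK;
}

/* Host munmap(): drops the PTE, sharing state is unchanged. */
static void toy_host_munmap(struct toy_page *p)
{
	assert(p != NULL);
	p->host_mapped = false;
}

/*
 * Scheme 1 ("refuse"): the guest fault that makes the page private fails
 * while a host PTE exists. The result depends on host_mapped, i.e. on
 * whether the host happened to touch the page earlier.
 */
static enum toy_run toy_guest_fault_refuse(struct toy_page *p)
{
	assert(p != NULL);
	if (p->host_mapped)
		return TOY_RUN_EXIT_TO_HOST;
	p->shared = false;
	return TOY_RUN_OK;
}

/*
 * Scheme 2 ("zap"): the kernel removes any host PTE itself. The result
 * depends only on the sharing state, never on earlier host accesses.
 */
static enum toy_run toy_guest_fault_zap(struct toy_page *p)
{
	assert(p != NULL);
	p->host_mapped = false;
	p->shared = false;
	return TOY_RUN_OK;
}
```

With two pages that start out shared, one touched by the host and one
not, toy_guest_fault_refuse() exits to the host for the touched page
only, until toy_host_munmap() is called on it. toy_guest_fault_zap()
succeeds for both, and a later toy_host_access() on either page returns
TOY_ACCESS_SIGBUS. In this model the first scheme exposes whether a PTE
exists, while the second depends only on the sharing state.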
