linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Fuad Tabba <tabba@google.com>
To: David Hildenbrand <david@redhat.com>
Cc: Quentin Perret <qperret@google.com>,
	Matthew Wilcox <willy@infradead.org>,
	kvm@vger.kernel.org,  kvmarm@lists.linux.dev,
	pbonzini@redhat.com, chenhuacai@kernel.org,  mpe@ellerman.id.au,
	anup@brainfault.org, paul.walmsley@sifive.com,
	 palmer@dabbelt.com, aou@eecs.berkeley.edu, seanjc@google.com,
	 brauner@kernel.org, akpm@linux-foundation.org,
	xiaoyao.li@intel.com,  yilun.xu@intel.com,
	chao.p.peng@linux.intel.com, jarkko@kernel.org,
	 amoorthy@google.com, dmatlack@google.com,
	yu.c.zhang@linux.intel.com,  isaku.yamahata@intel.com,
	mic@digikod.net, vbabka@suse.cz,  vannapurve@google.com,
	ackerleytng@google.com, mail@maciej.szmigiero.name,
	 michael.roth@amd.com, wei.w.wang@intel.com,
	liam.merwick@oracle.com,  isaku.yamahata@gmail.com,
	kirill.shutemov@linux.intel.com,  suzuki.poulose@arm.com,
	steven.price@arm.com, quic_mnalajal@quicinc.com,
	 quic_tsoni@quicinc.com, quic_svaddagi@quicinc.com,
	quic_cvanscha@quicinc.com,  quic_pderrin@quicinc.com,
	quic_pheragu@quicinc.com, catalin.marinas@arm.com,
	 james.morse@arm.com, yuzenghui@huawei.com,
	oliver.upton@linux.dev,  maz@kernel.org, will@kernel.org,
	keirf@google.com, linux-mm@kvack.org
Subject: Re: folio_mmapped
Date: Thu, 29 Feb 2024 19:01:51 +0000	[thread overview]
Message-ID: <CA+EHjTzHtsbhzrb-TWft1q3Ree3kgzZbsir+R9L0tDgSX-d-0g@mail.gmail.com> (raw)
In-Reply-To: <fc486cb4-0fe3-403f-b5e6-26d2140fcef9@redhat.com>

Hi David,

...

>>>> "mmap() the whole thing once and only access what you are supposed to
> >   (> > > access" sounds reasonable to me. If you don't play by the rules, you get a
> >>>> signal.
> >>>
> >>> "... you get a signal, or maybe you don't". But yes I understand your
> >>> point, and as per the above there are real benefits to this approach so
> >>> why not.
> >>>
> >>> What do we expect userspace to do when a page goes from shared back to
> >>> being guest-private, because e.g. the guest decides to unshare? Use
> >>> munmap() on that page? Or perhaps an madvise() call of some sort? Note
> >>> that this will be needed when starting a guest as well, as userspace
> >>> needs to copy the guest payload in the guestmem file prior to starting
> >>> the protected VM.
> >>
> >> Let's assume we have the whole guest_memfd mapped exactly once in our
> >> process, a single VMA.
> >>
> >> When setting up the VM, we'll write the payload and then fire up the VM.
> >>
> >> That will (I assume) trigger some shared -> private conversion.
> >>
> >> When we want to convert shared -> private in the kernel, we would first
> >> check if the page is currently mapped. If it is, we could try unmapping that
> >> page using an rmap walk.
> >
> > I had not considered that. That would most certainly be slow, but a well
> > behaved userspace process shouldn't hit it so, that's probably not a
> > problem...
>
> If there really only is a single VMA that covers the page (or even mmaps
> the guest_memfd), it should not be too bad. For example, any
> fallocate(PUNCHHOLE) has to do the same, to unmap the page before
> discarding it from the pagecache.

I don't think that we can assume that only a single VMA covers a page.

> But of course, no rmap walk is always better.

We've been thinking some more about how to handle the case where the
host userspace has a mapping of a page that later becomes private.

One idea is to refuse to run the guest (i.e., exit vcpu_run() to back
to the host with a meaningful exit reason) until the host unmaps that
page, and check for the refcount to the page as you mentioned earlier.
This is essentially what the RFC I sent does (minus the bugs :) ) .

The other idea is to use the rmap walk as you suggested to zap that
page. If the host tries to access that page again, it would get a
SIGBUS on the fault. This has the advantage that, as you'd mentioned,
the host doesn't need to constantly mmap() and munmap() pages. It
could potentially be optimised further as suggested if we have a
cooperating VMM that would issue a MADV_DONTNEED or something like
that, but that's just an optimisation and we would still need to have
the option of the rmap walk. However, I was wondering how practical
this idea would be if more than a single VMA covers a page?

Also, there's the question of what to do if the page is gupped? In
this case I think the only thing we can do is refuse to run the guest
until the gup (and all references) are released, which also brings us
back to the way things (kind of) are...

Thanks,
/fuad


  reply	other threads:[~2024-02-29 19:02 UTC|newest]

Thread overview: 48+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <20240222161047.402609-1-tabba@google.com>
     [not found] ` <20240222141602976-0800.eberman@hu-eberman-lv.qualcomm.com>
2024-02-23  0:35   ` folio_mmapped Matthew Wilcox
2024-02-26  9:28     ` folio_mmapped David Hildenbrand
2024-02-26 21:14       ` folio_mmapped Elliot Berman
2024-02-27 14:59         ` folio_mmapped David Hildenbrand
2024-02-28 10:48           ` folio_mmapped Quentin Perret
2024-02-28 11:11             ` folio_mmapped David Hildenbrand
2024-02-28 12:44               ` folio_mmapped Quentin Perret
2024-02-28 13:00                 ` folio_mmapped David Hildenbrand
2024-02-28 13:34                   ` folio_mmapped Quentin Perret
2024-02-28 18:43                     ` folio_mmapped Elliot Berman
2024-02-28 18:51                       ` Quentin Perret
2024-02-29 10:04                     ` folio_mmapped David Hildenbrand
2024-02-29 19:01                       ` Fuad Tabba [this message]
2024-03-01  0:40                         ` folio_mmapped Elliot Berman
2024-03-01 11:16                           ` folio_mmapped David Hildenbrand
2024-03-04 12:53                             ` folio_mmapped Quentin Perret
2024-03-04 20:22                               ` folio_mmapped David Hildenbrand
2024-03-01 11:06                         ` folio_mmapped David Hildenbrand
2024-03-04 12:36                       ` folio_mmapped Quentin Perret
2024-03-04 19:04                         ` folio_mmapped Sean Christopherson
2024-03-04 20:17                           ` folio_mmapped David Hildenbrand
2024-03-04 21:43                             ` folio_mmapped Elliot Berman
2024-03-04 21:58                               ` folio_mmapped David Hildenbrand
2024-03-19  9:47                                 ` folio_mmapped Quentin Perret
2024-03-19  9:54                                   ` folio_mmapped David Hildenbrand
2024-03-18 17:06                             ` folio_mmapped Vishal Annapurve
2024-03-18 22:02                               ` folio_mmapped David Hildenbrand
     [not found]                                 ` <CAGtprH8B8y0Khrid5X_1twMce7r-Z7wnBiaNOi-QwxVj4D+L3w@mail.gmail.com>
2024-03-19  0:10                                   ` folio_mmapped Sean Christopherson
2024-03-19 10:26                                     ` folio_mmapped David Hildenbrand
2024-03-19 13:19                                       ` folio_mmapped David Hildenbrand
2024-03-19 14:31                                       ` folio_mmapped Will Deacon
2024-03-19 23:54                                         ` folio_mmapped Elliot Berman
2024-03-22 16:36                                           ` Will Deacon
2024-03-22 18:46                                             ` Elliot Berman
2024-03-27 19:31                                               ` Will Deacon
     [not found]                                         ` <2d6fc3c0-a55b-4316-90b8-deabb065d007@redhat.com>
2024-03-22 21:21                                           ` folio_mmapped David Hildenbrand
2024-03-26 22:04                                             ` folio_mmapped Elliot Berman
2024-03-27 19:34                                           ` folio_mmapped Will Deacon
2024-03-28  9:06                                             ` folio_mmapped David Hildenbrand
2024-03-28 10:10                                               ` folio_mmapped Quentin Perret
2024-03-28 10:32                                                 ` folio_mmapped David Hildenbrand
2024-03-28 10:58                                                   ` folio_mmapped Quentin Perret
2024-03-28 11:41                                                     ` folio_mmapped David Hildenbrand
2024-03-29 18:38                                                       ` folio_mmapped Vishal Annapurve
2024-04-04  0:15                                             ` folio_mmapped Sean Christopherson
2024-03-19 15:04                                       ` folio_mmapped Sean Christopherson
2024-03-22 17:16                                         ` folio_mmapped David Hildenbrand
2024-02-26  9:03   ` [RFC PATCH v1 00/26] KVM: Restricted mapping of guest_memfd at the host and pKVM/arm64 support Fuad Tabba

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CA+EHjTzHtsbhzrb-TWft1q3Ree3kgzZbsir+R9L0tDgSX-d-0g@mail.gmail.com \
    --to=tabba@google.com \
    --cc=ackerleytng@google.com \
    --cc=akpm@linux-foundation.org \
    --cc=amoorthy@google.com \
    --cc=anup@brainfault.org \
    --cc=aou@eecs.berkeley.edu \
    --cc=brauner@kernel.org \
    --cc=catalin.marinas@arm.com \
    --cc=chao.p.peng@linux.intel.com \
    --cc=chenhuacai@kernel.org \
    --cc=david@redhat.com \
    --cc=dmatlack@google.com \
    --cc=isaku.yamahata@gmail.com \
    --cc=isaku.yamahata@intel.com \
    --cc=james.morse@arm.com \
    --cc=jarkko@kernel.org \
    --cc=keirf@google.com \
    --cc=kirill.shutemov@linux.intel.com \
    --cc=kvm@vger.kernel.org \
    --cc=kvmarm@lists.linux.dev \
    --cc=liam.merwick@oracle.com \
    --cc=linux-mm@kvack.org \
    --cc=mail@maciej.szmigiero.name \
    --cc=maz@kernel.org \
    --cc=mic@digikod.net \
    --cc=michael.roth@amd.com \
    --cc=mpe@ellerman.id.au \
    --cc=oliver.upton@linux.dev \
    --cc=palmer@dabbelt.com \
    --cc=paul.walmsley@sifive.com \
    --cc=pbonzini@redhat.com \
    --cc=qperret@google.com \
    --cc=quic_cvanscha@quicinc.com \
    --cc=quic_mnalajal@quicinc.com \
    --cc=quic_pderrin@quicinc.com \
    --cc=quic_pheragu@quicinc.com \
    --cc=quic_svaddagi@quicinc.com \
    --cc=quic_tsoni@quicinc.com \
    --cc=seanjc@google.com \
    --cc=steven.price@arm.com \
    --cc=suzuki.poulose@arm.com \
    --cc=vannapurve@google.com \
    --cc=vbabka@suse.cz \
    --cc=wei.w.wang@intel.com \
    --cc=will@kernel.org \
    --cc=willy@infradead.org \
    --cc=xiaoyao.li@intel.com \
    --cc=yilun.xu@intel.com \
    --cc=yu.c.zhang@linux.intel.com \
    --cc=yuzenghui@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox