linux-mm.kvack.org archive mirror
From: Elliot Berman <quic_eberman@quicinc.com>
To: David Hildenbrand <david@redhat.com>
Cc: Matthew Wilcox <willy@infradead.org>,
	Fuad Tabba <tabba@google.com>, <kvm@vger.kernel.org>,
	<kvmarm@lists.linux.dev>, <pbonzini@redhat.com>,
	<chenhuacai@kernel.org>, <mpe@ellerman.id.au>,
	<anup@brainfault.org>, <paul.walmsley@sifive.com>,
	<palmer@dabbelt.com>, <aou@eecs.berkeley.edu>,
	<seanjc@google.com>, <viro@zeniv.linux.org.uk>,
	<brauner@kernel.org>, <akpm@linux-foundation.org>,
	<xiaoyao.li@intel.com>, <yilun.xu@intel.com>,
	<chao.p.peng@linux.intel.com>, <jarkko@kernel.org>,
	<amoorthy@google.com>, <dmatlack@google.com>,
	<yu.c.zhang@linux.intel.com>, <isaku.yamahata@intel.com>,
	<mic@digikod.net>, <vbabka@suse.cz>, <vannapurve@google.com>,
	<ackerleytng@google.com>, <mail@maciej.szmigiero.name>,
	<michael.roth@amd.com>, <wei.w.wang@intel.com>,
	<liam.merwick@oracle.com>, <isaku.yamahata@gmail.com>,
	<kirill.shutemov@linux.intel.com>, <suzuki.poulose@arm.com>,
	<steven.price@arm.com>, <quic_mnalajal@quicinc.com>,
	<quic_tsoni@quicinc.com>, <quic_svaddagi@quicinc.com>,
	<quic_cvanscha@quicinc.com>, <quic_pderrin@quicinc.com>,
	<quic_pheragu@quicinc.com>, <catalin.marinas@arm.com>,
	<james.morse@arm.com>, <yuzenghui@huawei.com>,
	<oliver.upton@linux.dev>, <maz@kernel.org>, <will@kernel.org>,
	<qperret@google.com>, <keirf@google.com>, <linux-mm@kvack.org>
Subject: Re: Re: folio_mmapped
Date: Mon, 26 Feb 2024 13:14:38 -0800	[thread overview]
Message-ID: <20240226105258596-0800.eberman@hu-eberman-lv.qualcomm.com> (raw)
In-Reply-To: <40a8fb34-868f-4e19-9f98-7516948fc740@redhat.com>

On Mon, Feb 26, 2024 at 10:28:11AM +0100, David Hildenbrand wrote:
> On 23.02.24 01:35, Matthew Wilcox wrote:
> > On Thu, Feb 22, 2024 at 03:43:56PM -0800, Elliot Berman wrote:
> > > > This creates the situation where access to successfully mmap()'d
> > > > memory might SIGBUS at page fault. I believe there is precedent for
> > > > similar behavior in the kernel, with MADV_HWPOISON and the hugetlb
> > > > cgroup controller, which can SIGBUS at page fault time depending on
> > > > the accounting limit.
> > > 
> > > I added a test, folio_mmapped() [1], which checks whether there's a
> > > vma covering the corresponding offset into the guest_memfd. I use
> > > this test before trying to make a page private to the guest, and
> > > I've been able to ensure that userspace can't even mmap() private
> > > guest memory: when asked to make memory private, I check that it's
> > > not mmapped and otherwise refuse the conversion. In my testing so
> > > far, this is enough to prevent SIGBUS from happening.
> > > 
> > > This test probably should be moved out of Gunyah-specific code; I was
> > > looking for a maintainer to suggest the right home for it :)
> > > 
> > > [1]: https://lore.kernel.org/all/20240222-gunyah-v17-20-1e9da6763d38@quicinc.com/
> > 
> > You, um, might have wanted to send an email to linux-mm, not bury it in
> > the middle of a series of 35 patches?
> > 
> > So this isn't folio_mapped() because you're interested if anyone _could_
> > fault this page, not whether the folio is currently present in anyone's
> > page tables.
> > 
> > It's like walk_page_mapping() but with a trivial mm_walk_ops; not sure
> > it's worth the effort to use walk_page_mapping(), but I would defer to
> > David.
> 
> First, I suspect we are not only concerned about current+future VMAs
> covering the page, we are also interested in any page pins that could have
> been derived from such a VMA?
> 
> Imagine user space mmap'ed the file, faulted in the page, took a pin on the
> page using pin_user_pages() and friends, and then munmap()'ed the VMA.
> 
> You likely want to catch that as well and not allow a conversion to private?
> 
> [I assume you want to convert the page to private only if you hold all the
> folio references -- i.e., if the refcount of a small folio is 1]
> 

Ah, this was something I hadn't thought about. I think both Fuad and I
need to update our series to check the refcount rather than mapcount
(kvm_is_gmem_mapped for Fuad, gunyah_folio_lend_safe for me).

> 
> Now, regarding the original question (disallow mapping the page), I see the
> following approaches:
> 
> 1) SIGBUS during page fault. There are other cases that can trigger
>    SIGBUS during page faults: hugetlb when we are out of free hugetlb
>    pages, userfaultfd with UFFD_FEATURE_SIGBUS.
> 
> -> Simple and should get the job done.
> 
> 2) folio_mmapped() + preventing new mmaps covering that folio
> 
> -> More complicated, requires an rmap walk on every conversion.
> 
> 3) Disallow any mmaps of the file while any page is private
> 
> -> Likely not what you want.
> 
> 
> Why was 1) abandoned? It looks a lot easier and harder to mess up. Why
> are you trying to avoid page faults? What's the use case?
> 

We were chatting about whether we could do better than the SIGBUS
approach. SIGBUS at fault time usually crashes userspace, so I was
brainstorming ways to return errors earlier. One difference between
hugetlb and this use case is that running out of free hugetlb pages
isn't something we can detect at mmap() time; in the guest_memfd use
case, we should be able to detect when SIGBUS becomes possible, since
we know when memory is lent to the guest.

I can't think of a reason why userspace would want to, or be able to,
resume operation after accessing a page it shouldn't be allowed to
touch, so SIGBUS is functional. The advantage of trying to avoid SIGBUS
was better/easier error reporting to userspace.

- Elliot


