From: Ackerley Tng <ackerleytng@google.com>
To: Vishal Annapurve <vannapurve@google.com>
Cc: tabba@google.com, kvm@vger.kernel.org,
linux-arm-msm@vger.kernel.org, linux-mm@kvack.org,
pbonzini@redhat.com, chenhuacai@kernel.org, mpe@ellerman.id.au,
anup@brainfault.org, paul.walmsley@sifive.com,
palmer@dabbelt.com, aou@eecs.berkeley.edu, seanjc@google.com,
viro@zeniv.linux.org.uk, brauner@kernel.org,
willy@infradead.org, akpm@linux-foundation.org,
xiaoyao.li@intel.com, yilun.xu@intel.com,
chao.p.peng@linux.intel.com, jarkko@kernel.org,
amoorthy@google.com, dmatlack@google.com,
yu.c.zhang@linux.intel.com, isaku.yamahata@intel.com,
mic@digikod.net, vbabka@suse.cz, mail@maciej.szmigiero.name,
david@redhat.com, michael.roth@amd.com, wei.w.wang@intel.com,
liam.merwick@oracle.com, isaku.yamahata@gmail.com,
kirill.shutemov@linux.intel.com, suzuki.poulose@arm.com,
steven.price@arm.com, quic_eberman@quicinc.com,
quic_mnalajal@quicinc.com, quic_tsoni@quicinc.com,
quic_svaddagi@quicinc.com, quic_cvanscha@quicinc.com,
quic_pderrin@quicinc.com, quic_pheragu@quicinc.com,
catalin.marinas@arm.com, james.morse@arm.com,
yuzenghui@huawei.com, oliver.upton@linux.dev, maz@kernel.org,
will@kernel.org, qperret@google.com, keirf@google.com,
roypat@amazon.co.uk, shuah@kernel.org, hch@infradead.org,
jgg@nvidia.com, rientjes@google.com, jhubbard@nvidia.com,
fvdl@google.com, hughd@google.com, jthoughton@google.com
Subject: Re: [RFC PATCH v5 06/15] KVM: guest_memfd: Handle final folio_put() of guestmem pages
Date: Wed, 05 Feb 2025 04:31:53 +0000 [thread overview]
Message-ID: <diqzldulovuu.fsf@ackerleytng-ctop-specialist.c.googlers.com> (raw)
In-Reply-To: <CAGtprH-Ryn6Xqs-3_VBMkk3ew74Rf9=D8S_iHVmq2DE-YFk2-w@mail.gmail.com> (message from Vishal Annapurve on Tue, 4 Feb 2025 17:28:08 -0800)
Vishal Annapurve <vannapurve@google.com> writes:
> On Thu, Jan 23, 2025 at 1:51 AM Fuad Tabba <tabba@google.com> wrote:
>>
>> On Wed, 22 Jan 2025 at 22:16, Ackerley Tng <ackerleytng@google.com> wrote:
>> >
>> > Fuad Tabba <tabba@google.com> writes:
>> >
>> > Hey Fuad, I'm still working on verifying all this but for now this is
>> > one issue. I think this can be fixed by checking if the folio->mapping
>> > is NULL. If it's NULL, then the folio has been disassociated from the
>> > inode, and during the dissociation (removal from filemap), the
>> > mappability can also either
>> >
>> > 1. Be unset so that the default mappability can be set up based on
>> > GUEST_MEMFD_FLAG_INIT_MAPPABLE, or
>> > 2. Be directly restored based on GUEST_MEMFD_FLAG_INIT_MAPPABLE
>>
>> Thanks for pointing this out. I hadn't considered this case. I'll fix
>> in the respin.
>>
>
> Can the below scenario cause trouble?
> 1) Userspace converts a certain range of guest memfd as shared and
> grabs some refcounts on shared memory pages through existing kernel
> exposed mechanisms.
> 2) Userspace converts the same range to private which would cause the
> corresponding mappability attributes to be *MAPPABILITY_NONE.
> 3) Userspace truncates the range which will remove the page from pagecache.
> 4) Userspace does the fallocate again, leading to a new page getting
> allocated without freeing the older page which is still refcounted
> (step 1).
>
> Effectively this could allow userspace to keep allocating multiple
> pages for the same guest_memfd range.
I'm still verifying this but for now here's the flow Vishal described in
greater detail:
+ guest_memfd starts without GUEST_MEMFD_FLAG_INIT_MAPPABLE
+ All new pages will start with mappability = GUEST
+ guest uses a page
+ Get new page
+ Add page to filemap
+ guest converts page to shared
+ Mappability is now ALL
+ host uses page
+ host takes transient refcounts on page
+ Refcount on the page is now (a) filemap's refcount (b) vma's refcount
(c) transient refcount
+ guest converts page to private
+ Page is unmapped
+ Refcount on the page is now (a) filemap's refcount (b) transient
refcount
+ Since refcount is elevated, the mappabilities are left as NONE
+ Filemap's refcounts are removed from the page
+ Refcount on the page is now (a) transient refcount
+ host punches hole to deallocate page
+ Since mappability was NONE, restore filemap's refcount
+ Refcount on the page is now (a) transient refcount (b) filemap's
refcount
+ Mappabilities are reset to GUEST for truncated range
+ Folio is removed from filemap
+ Refcount on the page is now (a) transient refcount
+ Callback remains registered so that when the transient refcounts are
dropped, cleanup can happen - this is where merging will happen
with 1G page support
+ host fallocate()s in the same address range
+ will get a new page
Though the host does manage to get a new page while the old one stays
around, I think this is working as intended, since the transient
refcounts are truly holding the old folio around. When the transient
refcounts go away, the old folio will still get cleaned up (with 1G page
support: merged and returned) to as expected. The new page will also be
freed at some point later.
If the userspace program decides to keep taking transient refcounts to hold
pages around, then the userspace program is truly leaking memory and it
shouldn't be guest_memfd's bug.
next prev parent reply other threads:[~2025-02-05 4:32 UTC|newest]
Thread overview: 60+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-01-17 16:29 [RFC PATCH v5 00/15] KVM: Restricted mapping of guest_memfd at the host and arm64 support Fuad Tabba
2025-01-17 16:29 ` [RFC PATCH v5 01/15] mm: Consolidate freeing of typed folios on final folio_put() Fuad Tabba
2025-01-17 22:05 ` Elliot Berman
2025-01-19 14:39 ` Fuad Tabba
2025-01-20 10:39 ` David Hildenbrand
2025-01-20 10:50 ` Fuad Tabba
2025-01-20 10:39 ` David Hildenbrand
2025-01-20 10:43 ` Fuad Tabba
2025-01-20 10:43 ` Vlastimil Babka
2025-01-20 11:12 ` Vlastimil Babka
2025-01-20 11:28 ` David Hildenbrand
2025-01-17 16:29 ` [RFC PATCH v5 02/15] KVM: guest_memfd: Make guest mem use guest mem inodes instead of anonymous inodes Fuad Tabba
2025-01-24 4:25 ` Gavin Shan
2025-01-29 10:12 ` Fuad Tabba
2025-02-11 15:58 ` Ackerley Tng
2025-01-17 16:29 ` [RFC PATCH v5 03/15] KVM: guest_memfd: Introduce kvm_gmem_get_pfn_locked(), which retains the folio lock Fuad Tabba
2025-01-17 16:29 ` [RFC PATCH v5 04/15] KVM: guest_memfd: Track mappability within a struct kvm_gmem_private Fuad Tabba
2025-01-24 5:31 ` Gavin Shan
2025-01-29 10:15 ` Fuad Tabba
2025-02-26 22:29 ` Ackerley Tng
2025-01-17 16:29 ` [RFC PATCH v5 05/15] KVM: guest_memfd: Folio mappability states and functions that manage their transition Fuad Tabba
2025-01-20 10:30 ` Kirill A. Shutemov
2025-01-20 10:40 ` Fuad Tabba
2025-02-06 3:14 ` Ackerley Tng
2025-02-06 9:45 ` Fuad Tabba
2025-02-19 23:33 ` Ackerley Tng
2025-02-20 9:26 ` Fuad Tabba
2025-01-17 16:29 ` [RFC PATCH v5 06/15] KVM: guest_memfd: Handle final folio_put() of guestmem pages Fuad Tabba
2025-01-20 11:37 ` Vlastimil Babka
2025-01-20 12:14 ` Fuad Tabba
2025-01-22 22:24 ` Ackerley Tng
2025-01-23 11:00 ` Fuad Tabba
2025-02-06 3:18 ` Ackerley Tng
2025-02-06 3:28 ` Ackerley Tng
2025-02-06 9:47 ` Fuad Tabba
2025-01-30 14:23 ` Fuad Tabba
2025-01-22 22:16 ` Ackerley Tng
2025-01-23 9:50 ` Fuad Tabba
2025-02-05 1:28 ` Vishal Annapurve
2025-02-05 4:31 ` Ackerley Tng [this message]
2025-02-05 5:58 ` Vishal Annapurve
2025-02-05 0:42 ` Vishal Annapurve
2025-02-05 10:06 ` Fuad Tabba
2025-02-05 17:39 ` Vishal Annapurve
2025-02-05 17:42 ` Vishal Annapurve
2025-02-07 10:46 ` Ackerley Tng
2025-02-10 16:04 ` Fuad Tabba
2025-02-05 0:51 ` Vishal Annapurve
2025-02-05 10:07 ` Fuad Tabba
2025-02-06 3:37 ` Ackerley Tng
2025-02-06 9:49 ` Fuad Tabba
2025-01-17 16:29 ` [RFC PATCH v5 07/15] KVM: guest_memfd: Allow host to mmap guest_memfd() pages when shared Fuad Tabba
2025-01-17 16:29 ` [RFC PATCH v5 08/15] KVM: guest_memfd: Add guest_memfd support to kvm_(read|/write)_guest_page() Fuad Tabba
2025-01-17 16:29 ` [RFC PATCH v5 09/15] KVM: guest_memfd: Add KVM capability to check if guest_memfd is host mappable Fuad Tabba
2025-01-17 16:29 ` [RFC PATCH v5 10/15] KVM: guest_memfd: Add a guest_memfd() flag to initialize it as mappable Fuad Tabba
2025-01-17 16:29 ` [RFC PATCH v5 11/15] KVM: guest_memfd: selftests: guest_memfd mmap() test when mapping is allowed Fuad Tabba
2025-01-17 16:29 ` [RFC PATCH v5 12/15] KVM: arm64: Skip VMA checks for slots without userspace address Fuad Tabba
2025-01-17 16:29 ` [RFC PATCH v5 13/15] KVM: arm64: Refactor user_mem_abort() calculation of force_pte Fuad Tabba
2025-01-17 16:30 ` [RFC PATCH v5 14/15] KVM: arm64: Handle guest_memfd()-backed guest page faults Fuad Tabba
2025-01-17 16:30 ` [RFC PATCH v5 15/15] KVM: arm64: Enable guest_memfd private memory when pKVM is enabled Fuad Tabba
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=diqzldulovuu.fsf@ackerleytng-ctop-specialist.c.googlers.com \
--to=ackerleytng@google.com \
--cc=akpm@linux-foundation.org \
--cc=amoorthy@google.com \
--cc=anup@brainfault.org \
--cc=aou@eecs.berkeley.edu \
--cc=brauner@kernel.org \
--cc=catalin.marinas@arm.com \
--cc=chao.p.peng@linux.intel.com \
--cc=chenhuacai@kernel.org \
--cc=david@redhat.com \
--cc=dmatlack@google.com \
--cc=fvdl@google.com \
--cc=hch@infradead.org \
--cc=hughd@google.com \
--cc=isaku.yamahata@gmail.com \
--cc=isaku.yamahata@intel.com \
--cc=james.morse@arm.com \
--cc=jarkko@kernel.org \
--cc=jgg@nvidia.com \
--cc=jhubbard@nvidia.com \
--cc=jthoughton@google.com \
--cc=keirf@google.com \
--cc=kirill.shutemov@linux.intel.com \
--cc=kvm@vger.kernel.org \
--cc=liam.merwick@oracle.com \
--cc=linux-arm-msm@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mail@maciej.szmigiero.name \
--cc=maz@kernel.org \
--cc=mic@digikod.net \
--cc=michael.roth@amd.com \
--cc=mpe@ellerman.id.au \
--cc=oliver.upton@linux.dev \
--cc=palmer@dabbelt.com \
--cc=paul.walmsley@sifive.com \
--cc=pbonzini@redhat.com \
--cc=qperret@google.com \
--cc=quic_cvanscha@quicinc.com \
--cc=quic_eberman@quicinc.com \
--cc=quic_mnalajal@quicinc.com \
--cc=quic_pderrin@quicinc.com \
--cc=quic_pheragu@quicinc.com \
--cc=quic_svaddagi@quicinc.com \
--cc=quic_tsoni@quicinc.com \
--cc=rientjes@google.com \
--cc=roypat@amazon.co.uk \
--cc=seanjc@google.com \
--cc=shuah@kernel.org \
--cc=steven.price@arm.com \
--cc=suzuki.poulose@arm.com \
--cc=tabba@google.com \
--cc=vannapurve@google.com \
--cc=vbabka@suse.cz \
--cc=viro@zeniv.linux.org.uk \
--cc=wei.w.wang@intel.com \
--cc=will@kernel.org \
--cc=willy@infradead.org \
--cc=xiaoyao.li@intel.com \
--cc=yilun.xu@intel.com \
--cc=yu.c.zhang@linux.intel.com \
--cc=yuzenghui@huawei.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox