From: Sean Christopherson <seanjc@google.com>
To: Ackerley Tng <ackerleytng@google.com>
Cc: quic_eberman@quicinc.com, akpm@linux-foundation.org,
david@redhat.com, kvm@vger.kernel.org,
linux-arm-msm@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-kselftest@vger.kernel.org, linux-mm@kvack.org,
maz@kernel.org, pbonzini@redhat.com, shuah@kernel.org,
tabba@google.com, willy@infradead.org, vannapurve@google.com,
hch@infradead.org, jgg@nvidia.com, rientjes@google.com,
jhubbard@nvidia.com, qperret@google.com, smostafa@google.com,
fvdl@google.com, hughd@google.com
Subject: Re: [PATCH RFC 0/5] mm/gup: Introduce exclusive GUP pinning
Date: Tue, 16 Jul 2024 09:03:00 -0700 [thread overview]
Message-ID: <ZpaZtPKrXolEduZH@google.com> (raw)
In-Reply-To: <20240712232937.2861788-1-ackerleytng@google.com>
Thanks for doing the dirty work!
On Fri, Jul 12, 2024, Ackerley Tng wrote:
> Here’s an update from the Linux MM Alignment Session on July 10 2024, 9-10am
> PDT:
>
> The current direction is:
>
> + Allow mmap() of ranges that cover both shared and private memory, but disallow
> faulting in of private pages
> + On access to private pages, userspace will get some error, perhaps SIGBUS
> + On shared to private conversions, unmap the page and decrease refcounts
Note, I would strike the "decrease refcounts" part, as putting references is a
natural consequence of unmapping memory, not an explicit action guest_memfd will
take when converting from shared=>private.
And more importantly, guest_memfd will wait for the refcount to hit zero (or
whatever the baseline refcount is).
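Roughly, in userspace-model form (this is obviously not kernel code; every name below is invented purely for illustration), the shared=>private flow is: unmapping puts the transient references, and conversion completes only once the refcount is back at guest_memfd's baseline.

```c
#include <assert.h>
#include <stdbool.h>

#define BASELINE_REFCOUNT 1	/* guest_memfd's own long-term reference */

struct gmem_page {
	int refcount;		/* baseline + one per mapping/pin */
	bool private;
};

/* Model of unmapping: each mapper puts its reference. */
static void gmem_unmap_one(struct gmem_page *p)
{
	p->refcount--;
}

/*
 * Conversion succeeds only when no transient references remain; a real
 * implementation would wait (or retry) rather than fail outright.
 */
static bool gmem_convert_to_private(struct gmem_page *p)
{
	if (p->refcount != BASELINE_REFCOUNT)
		return false;	/* still mapped/pinned somewhere */
	p->private = true;
	return true;
}
```

I.e. there's no explicit "decrease refcounts" step; the puts fall out of unmapping, and the conversion path just observes the count.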
> + To support huge pages, guest_memfd will take ownership of the hugepages, and
> provide interested parties (userspace, KVM, iommu) with pages to be used.
> + guest_memfd will track usage of (sub)pages, for both private and shared
> memory
> + Pages will be broken into smaller (probably 4K) chunks at creation time to
> simplify implementation (as opposed to splitting at runtime when private to
> shared conversion is requested by the guest)
FWIW, I doubt we'll ever release a version with mmap()+guest_memfd support that
shatters pages at creation. I can see it being an intermediate step, e.g. to
prove correctness and provide a bisection point, but shattering hugepages at
creation would effectively make hugepage support useless.
I don't think we need to sort this out now though, as when the shattering (and
potential reconstitution) occurs doesn't affect the overall direction in any way
(AFAIK). I'm chiming in purely to stave off complaints that this would break
hugepage support :-)
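To make the trade-off concrete, here's a toy userspace model (again, not kernel code, and all names are made up) of per-4KiB-subpage state tracking for a 2MiB page. Shattering lazily at conversion time means the page stays huge until the first subpage diverges from its neighbors, whereas shattering at creation forfeits the huge mapping unconditionally.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SUBPAGES_PER_2M 512

struct gmem_hugepage {
	uint64_t private_bitmap[SUBPAGES_PER_2M / 64];
	bool split;		/* shattered into 4KiB chunks? */
};

static void gmem_set_private(struct gmem_hugepage *hp, int idx, bool private)
{
	uint64_t bit = 1ULL << (idx % 64);

	if (private)
		hp->private_bitmap[idx / 64] |= bit;
	else
		hp->private_bitmap[idx / 64] &= ~bit;
}

/* The page can be mapped huge only while every subpage agrees. */
static bool gmem_uniform(const struct gmem_hugepage *hp)
{
	uint64_t first = hp->private_bitmap[0];
	int i;

	if (first != 0 && first != ~0ULL)
		return false;
	for (i = 1; i < SUBPAGES_PER_2M / 64; i++)
		if (hp->private_bitmap[i] != first)
			return false;
	return true;
}

static void gmem_convert_subpage(struct gmem_hugepage *hp, int idx, bool private)
{
	gmem_set_private(hp, idx, private);
	if (!gmem_uniform(hp))
		hp->split = true;	/* shatter lazily, on first mixed state */
}
```

Note the model deliberately never clears ->split; that's the reconstitution question, which as above we can punt on for now.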
> + Core MM infrastructure will still be used to track page table mappings in
> mapcounts and other references (refcounts) per subpage
> + HugeTLB vmemmap Optimization (HVO) is lost when pages are broken up - to
> be optimized later. Suggestions:
> + Use a tracking data structure other than struct page
> + Remove the memory for struct pages backing private memory from the
> vmemmap, and re-populate the vmemmap on conversion from private to
> shared
> + Implementation pointers for huge page support
> + Consensus was that getting core MM to do tracking seems wrong
> + Maintaining special page refcounts for guest_memfd pages is difficult to
> get working and requires weird special casing in many places. This was
> tried for FS DAX pages and did not work out: [1]
>
> + Implementation suggestion: use infrastructure similar to what ZONE_DEVICE
> uses, to provide the huge page to interested parties
> + TBD: how to actually get huge pages into guest_memfd
> + TBD: how to provide/convert the huge pages to ZONE_DEVICE
> + Perhaps reserve them at boot time like in HugeTLB
>
> + Line of sight to compaction/migration:
> + Compaction here means making memory contiguous
> + Compaction/migration scope:
> + In scope for 4K pages
> + Out of scope for 1G pages and anything managed through ZONE_DEVICE
> + Out of scope for an initial implementation
> + Ideas for future implementations
> + Reuse the non-LRU page migration framework as used by memory ballooning
> + Have userspace drive compaction/migration via ioctls
> + Having line of sight to optimizing lost HVO means avoiding being locked
> in to any implementation requiring struct pages
> + Without struct pages, it is hard to reuse core MM’s
> compaction/migration infrastructure
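For the "compaction means making memory contiguous" point, a rough userspace model (everything below is invented for illustration; in the ioctl-driven idea, userspace would ask guest_memfd to perform moves like this): pack the in-use 4KiB chunks together so the freed space becomes one contiguous run that could back a future hugepage.

```c
#include <assert.h>
#include <stdbool.h>

#define NCHUNKS 8

/* Returns the number of in-use chunks, now packed at the front. */
static int gmem_compact(bool used[NCHUNKS])
{
	int dst = 0, src;

	for (src = 0; src < NCHUNKS; src++) {
		if (used[src]) {
			used[src] = false;	/* "migrate" chunk src... */
			used[dst++] = true;	/* ...to the first free slot */
		}
	}
	return dst;
}
```

The hard part, of course, is the "migrate" comment: without struct pages we can't lean on core MM's migration machinery, which is exactly the HVO tension called out above.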
>
> + Discuss more details at LPC in Sep 2024, such as how to use huge pages,
> shared/private conversion, huge page splitting
>
> This addresses the prerequisites set out by Fuad and Elliott at the beginning of
> the session, which were:
>
> 1. Non-destructive shared/private conversion
> + Through having guest_memfd manage and track both shared/private memory
> 2. Huge page support with the option of converting individual subpages
> + Splitting of pages will be managed by guest_memfd
> 3. Line of sight to compaction/migration of private memory
> + Possibly driven by userspace using guest_memfd ioctls
> 4. Loading binaries into guest (private) memory before VM starts
> + This was identified as a special case of (1.) above
> 5. Non-protected guests in pKVM
> + Not discussed during session, but this is a goal of guest_memfd, for all VM
> types [2]
>
> David Hildenbrand summarized this during the meeting at t=47m25s [3].
>
> [1]: https://lore.kernel.org/linux-mm/cover.66009f59a7fe77320d413011386c3ae5c2ee82eb.1719386613.git-series.apopple@nvidia.com/
> [2]: https://lore.kernel.org/lkml/ZnRMn1ObU8TFrms3@google.com/
> [3]: https://drive.google.com/file/d/17lruFrde2XWs6B1jaTrAy9gjv08FnJ45/view?t=47m25s&resourcekey=0-LiteoxLd5f4fKoPRMjMTOw