From: Ryan Roberts <ryan.roberts@arm.com>
To: David Hildenbrand <david@redhat.com>, linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org, Andrew Morton <akpm@linux-foundation.org>,
"Matthew Wilcox (Oracle)" <willy@infradead.org>,
Hugh Dickins <hughd@google.com>,
Yin Fengwei <fengwei.yin@intel.com>,
Mike Kravetz <mike.kravetz@oracle.com>,
Muchun Song <muchun.song@linux.dev>, Peter Xu <peterx@redhat.com>
Subject: Re: [PATCH RFC 00/39] mm/rmap: interface overhaul
Date: Tue, 5 Dec 2023 12:29:59 +0000 [thread overview]
Message-ID: <33fb5a78-1432-4b43-af62-afeac84cf1ca@arm.com> (raw)
In-Reply-To: <993ea322-8cdb-4ab1-84d3-0a1cb40049c9@arm.com>
On 04/12/2023 19:53, Ryan Roberts wrote:
> On 04/12/2023 14:21, David Hildenbrand wrote:
>> Baed on mm-stable from a couple of days.
>>
>> This series proposes an overhaul to our rmap interface, to get rid of the
>> "bool compound" / RMAP_COMPOUND parameter with the goal of making the
>> interface less error prone, more future proof, and more natural to extend
>> to "batching". Also, this converts the interface to always consume
>> folio+subpage, which speeds up operations on large folios.
>>
>> Further, this series adds PTE-batching variants for 4 rmap functions,
>> whereby only folio_add_anon_rmap_ptes() is used for batching in this series
>> when PTE-remapping a PMD-mapped THP.
>
> I certainly support the objective you have here; making the interfaces clearer,
> more consistent and more amenable to batching. I'll try to find some time this
> week to review.
>
>>
>> Ryan has series where we would make use of folio_remove_rmap_ptes() [1]
>> -- he carries his own batching variant right now -- and
>> folio_try_dup_anon_rmap_ptes()/folio_dup_file_rmap_ptes() [2].
>
> Note that the contpte series at [2] has a new patch in v3 (patch 2), which could
> benefit from folio_remove_rmap_ptes() or equivalent. My plan was to revive [1]
> on top of [2] once it is merged.
>
>>
>> There is some overlap with both series (and some other work, like
>> multi-size THP [3]), so that will need some coordination, and likely a
>> stepwise inclusion.
>
> Selfishly, I'd really like to get my stuff merged as soon as there is no
> technical reason not to. I'd prefer not to add this as a dependency if we can
> help it.
>
>>
>> I got that started [4], but it made sense to show the whole picture. The
>> patches of [4] are contained in here, with one additional patch added
>> ("mm/rmap: introduce and use hugetlb_try_share_anon_rmap()") and some
>> slight patch description changes.
>>
>> In general, RMAP batching is an important optimization for PTE-mapped
>> THP, especially once we want to move towards a total mapcount or further,
>> as shown with my WIP patches on "mapped shared vs. mapped exclusively" [5].
>>
>> The rmap batching part of [5] is also contained here in a slightly reworked
>> fork [and I found a bug du to the "compound" parameter handling in these
>> patches that should be fixed here :) ].
>>
>> This series performs a lot of folio conversion, that could be separated
>> if there is a good reason. Most of the added LOC in the diff are only due
>> to documentation.
>>
>> As we're moving to a pte/pmd interface where we clearly express the
>> mapping granularity we are dealing with, we first get the remainder of
>> hugetlb out of the way, as it is special and expected to remain special: it
>> treats everything as a "single logical PTE" and only currently allows
>> entire mappings.
>>
>> Even if we'd ever support partial mappings, I strongly
>> assume the interface and implementation will still differ heavily:
>> hopefull we can avoid working on subpages/subpage mapcounts completely and
>> only add a "count" parameter for them to enable batching.
>>
>>
>> New (extended) hugetlb interface that operate on entire folio:
>> * hugetlb_add_new_anon_rmap() -> Already existed
>> * hugetlb_add_anon_rmap() -> Already existed
>> * hugetlb_try_dup_anon_rmap()
>> * hugetlb_try_share_anon_rmap()
>> * hugetlb_add_file_rmap()
>> * hugetlb_remove_rmap()
>>
>> New "ordinary" interface for small folios / THP::
>> * folio_add_new_anon_rmap() -> Already existed
>> * folio_add_anon_rmap_[pte|ptes|pmd]()
>> * folio_try_dup_anon_rmap_[pte|ptes|pmd]()
>> * folio_try_share_anon_rmap_[pte|pmd]()
>> * folio_add_file_rmap_[pte|ptes|pmd]()
>> * folio_dup_file_rmap_[pte|ptes|pmd]()
>> * folio_remove_rmap_[pte|ptes|pmd]()
>
> I'm not sure if there are official guidelines, but personally if we are
> reworking the API, I'd take the opportunity to move "rmap" to the front of the
> name, rather than having it burried in the middle as it is for some of these:
>
> rmap_hugetlb_*()
>
> rmap_folio_*()
In fact, I'd be inclined to drop the "folio" to shorten the name; everything is
a folio, so its not telling us much. e.g.:
New (extended) hugetlb interface that operate on entire folio:
* rmap_hugetlb_add_new_anon() -> Already existed
* rmap_hugetlb_add_anon() -> Already existed
* rmap_hugetlb_try_dup_anon()
* rmap_hugetlb_try_share_anon()
* rmap_hugetlb_add_file()
* rmap_hugetlb_remove()
New "ordinary" interface for small folios / THP::
* rmap_add_new_anon() -> Already existed
* rmap_add_anon_[pte|ptes|pmd]()
* rmap_try_dup_anon_[pte|ptes|pmd]()
* rmap_try_share_anon_[pte|pmd]()
* rmap_add_file_[pte|ptes|pmd]()
* rmap_dup_file_[pte|ptes|pmd]()
* rmap_remove_[pte|ptes|pmd]()
>
> I guess reading the patches will tell me, but what's the point of "ptes"? Surely
> you're either mapping at pte or pmd level, and the number of pages is determined
> by the folio size? (or presumably nr param passed in)
>
> Thanks,
> Ryan
>
>>
>> folio_add_new_anon_rmap() will always map at the biggest granularity
>> possible (currently, a single PMD to cover a PMD-sized THP). Could be
>> extended if ever required.
>>
>> In the future, we might want "_pud" variants and eventually "_pmds" variants
>> for batching. Further, if hugepd is ever a thing outside hugetlb code,
>> we might want some variants for that. All stuff for the distant future.
>>
>>
>> I ran some simple microbenchmarks from [5] on an Intel(R) Xeon(R) Silver
>> 4210R: munmap(), fork(), cow, MADV_DONTNEED on each PTE ... and PTE
>> remapping PMD-mapped THPs on 1 GiB of memory.
>>
>> For small folios, there is barely a change (< 1 % performance improvement),
>> whereby fork() still stands out with 0.74% performance improvement, but
>> it might be just some noise. Folio optimizations don't help that much
>> with small folios.
>>
>> For PTE-mapped THP:
>> * PTE-remapping a PMD-mapped THP is more than 10% faster.
>> -> RMAP batching
>> * fork() is more than 4% faster.
>> -> folio conversion
>> * MADV_DONTNEED is 2% faster
>> -> folio conversion
>> * COW by writing only a single byte on a COW-shared PTE
>> -> folio conversion
>> * munmap() is only slightly faster (< 1%).
>>
>> [1] https://lkml.kernel.org/r/20230810103332.3062143-1-ryan.roberts@arm.com
>> [2] https://lkml.kernel.org/r/20231204105440.61448-1-ryan.roberts@arm.com
>> [3] https://lkml.kernel.org/r/20231204102027.57185-1-ryan.roberts@arm.com
>> [4] https://lkml.kernel.org/r/20231128145205.215026-1-david@redhat.com
>> [5] https://lkml.kernel.org/r/20231124132626.235350-1-david@redhat.com
>>
>> Cc: Andrew Morton <akpm@linux-foundation.org>
>> Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
>> Cc: Hugh Dickins <hughd@google.com>
>> Cc: Ryan Roberts <ryan.roberts@arm.com>
>> Cc: Yin Fengwei <fengwei.yin@intel.com>
>> Cc: Mike Kravetz <mike.kravetz@oracle.com>
>> Cc: Muchun Song <muchun.song@linux.dev>
>> Cc: Peter Xu <peterx@redhat.com>
>>
>> David Hildenbrand (39):
>> mm/rmap: rename hugepage_add* to hugetlb_add*
>> mm/rmap: introduce and use hugetlb_remove_rmap()
>> mm/rmap: introduce and use hugetlb_add_file_rmap()
>> mm/rmap: introduce and use hugetlb_try_dup_anon_rmap()
>> mm/rmap: introduce and use hugetlb_try_share_anon_rmap()
>> mm/rmap: add hugetlb sanity checks
>> mm/rmap: convert folio_add_file_rmap_range() into
>> folio_add_file_rmap_[pte|ptes|pmd]()
>> mm/memory: page_add_file_rmap() -> folio_add_file_rmap_[pte|pmd]()
>> mm/huge_memory: page_add_file_rmap() -> folio_add_file_rmap_pmd()
>> mm/migrate: page_add_file_rmap() -> folio_add_file_rmap_pte()
>> mm/userfaultfd: page_add_file_rmap() -> folio_add_file_rmap_pte()
>> mm/rmap: remove page_add_file_rmap()
>> mm/rmap: factor out adding folio mappings into __folio_add_rmap()
>> mm/rmap: introduce folio_add_anon_rmap_[pte|ptes|pmd]()
>> mm/huge_memory: batch rmap operations in __split_huge_pmd_locked()
>> mm/huge_memory: page_add_anon_rmap() -> folio_add_anon_rmap_pmd()
>> mm/migrate: page_add_anon_rmap() -> folio_add_anon_rmap_pte()
>> mm/ksm: page_add_anon_rmap() -> folio_add_anon_rmap_pte()
>> mm/swapfile: page_add_anon_rmap() -> folio_add_anon_rmap_pte()
>> mm/memory: page_add_anon_rmap() -> folio_add_anon_rmap_pte()
>> mm/rmap: remove page_add_anon_rmap()
>> mm/rmap: remove RMAP_COMPOUND
>> mm/rmap: introduce folio_remove_rmap_[pte|ptes|pmd]()
>> kernel/events/uprobes: page_remove_rmap() -> folio_remove_rmap_pte()
>> mm/huge_memory: page_remove_rmap() -> folio_remove_rmap_pmd()
>> mm/khugepaged: page_remove_rmap() -> folio_remove_rmap_pte()
>> mm/ksm: page_remove_rmap() -> folio_remove_rmap_pte()
>> mm/memory: page_remove_rmap() -> folio_remove_rmap_pte()
>> mm/migrate_device: page_remove_rmap() -> folio_remove_rmap_pte()
>> mm/rmap: page_remove_rmap() -> folio_remove_rmap_pte()
>> Documentation: stop referring to page_remove_rmap()
>> mm/rmap: remove page_remove_rmap()
>> mm/rmap: convert page_dup_file_rmap() to
>> folio_dup_file_rmap_[pte|ptes|pmd]()
>> mm/rmap: introduce folio_try_dup_anon_rmap_[pte|ptes|pmd]()
>> mm/huge_memory: page_try_dup_anon_rmap() ->
>> folio_try_dup_anon_rmap_pmd()
>> mm/memory: page_try_dup_anon_rmap() -> folio_try_dup_anon_rmap_pte()
>> mm/rmap: remove page_try_dup_anon_rmap()
>> mm: convert page_try_share_anon_rmap() to
>> folio_try_share_anon_rmap_[pte|pmd]()
>> mm/rmap: rename COMPOUND_MAPPED to ENTIRELY_MAPPED
>>
>> Documentation/mm/transhuge.rst | 4 +-
>> Documentation/mm/unevictable-lru.rst | 4 +-
>> include/linux/mm.h | 6 +-
>> include/linux/rmap.h | 380 +++++++++++++++++++-----
>> kernel/events/uprobes.c | 2 +-
>> mm/gup.c | 2 +-
>> mm/huge_memory.c | 85 +++---
>> mm/hugetlb.c | 21 +-
>> mm/internal.h | 12 +-
>> mm/khugepaged.c | 17 +-
>> mm/ksm.c | 15 +-
>> mm/memory-failure.c | 4 +-
>> mm/memory.c | 60 ++--
>> mm/migrate.c | 12 +-
>> mm/migrate_device.c | 41 +--
>> mm/mmu_gather.c | 2 +-
>> mm/rmap.c | 422 ++++++++++++++++-----------
>> mm/swapfile.c | 2 +-
>> mm/userfaultfd.c | 2 +-
>> 19 files changed, 709 insertions(+), 384 deletions(-)
>>
>
prev parent reply other threads:[~2023-12-05 12:30 UTC|newest]
Thread overview: 82+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-12-04 14:21 David Hildenbrand
2023-12-04 14:21 ` [PATCH RFC 01/39] mm/rmap: rename hugepage_add* to hugetlb_add* David Hildenbrand
2023-12-04 14:21 ` [PATCH RFC 02/39] mm/rmap: introduce and use hugetlb_remove_rmap() David Hildenbrand
2023-12-06 1:22 ` Yin Fengwei
2023-12-06 12:11 ` David Hildenbrand
2023-12-07 0:56 ` Yin Fengwei
2023-12-04 14:21 ` [PATCH RFC 03/39] mm/rmap: introduce and use hugetlb_add_file_rmap() David Hildenbrand
2023-12-06 1:22 ` Yin Fengwei
2023-12-04 14:21 ` [PATCH RFC 04/39] mm/rmap: introduce and use hugetlb_try_dup_anon_rmap() David Hildenbrand
2023-12-06 1:22 ` Yin Fengwei
2023-12-04 14:21 ` [PATCH RFC 05/39] mm/rmap: introduce and use hugetlb_try_share_anon_rmap() David Hildenbrand
2023-12-06 1:23 ` Yin Fengwei
2023-12-04 14:21 ` [PATCH RFC 06/39] mm/rmap: add hugetlb sanity checks David Hildenbrand
2023-12-06 1:23 ` Yin Fengwei
2023-12-04 14:21 ` [PATCH RFC 07/39] mm/rmap: convert folio_add_file_rmap_range() into folio_add_file_rmap_[pte|ptes|pmd]() David Hildenbrand
2023-12-05 12:04 ` Ryan Roberts
2023-12-05 12:25 ` David Hildenbrand
2023-12-06 1:30 ` Yin Fengwei
2023-12-06 9:17 ` David Hildenbrand
2023-12-04 14:21 ` [PATCH RFC 08/39] mm/memory: page_add_file_rmap() -> folio_add_file_rmap_[pte|pmd]() David Hildenbrand
2023-12-08 1:40 ` Yin, Fengwei
2023-12-04 14:21 ` [PATCH RFC 09/39] mm/huge_memory: page_add_file_rmap() -> folio_add_file_rmap_pmd() David Hildenbrand
2023-12-08 1:41 ` Yin, Fengwei
2023-12-04 14:21 ` [PATCH RFC 10/39] mm/migrate: page_add_file_rmap() -> folio_add_file_rmap_pte() David Hildenbrand
2023-12-08 1:42 ` Yin, Fengwei
2023-12-04 14:21 ` [PATCH RFC 11/39] mm/userfaultfd: " David Hildenbrand
2023-12-08 1:42 ` Yin, Fengwei
2023-12-04 14:21 ` [PATCH RFC 12/39] mm/rmap: remove page_add_file_rmap() David Hildenbrand
2023-12-08 1:42 ` Yin, Fengwei
2023-12-04 14:21 ` [PATCH RFC 13/39] mm/rmap: factor out adding folio mappings into __folio_add_rmap() David Hildenbrand
2023-12-08 1:44 ` Yin, Fengwei
2023-12-04 14:21 ` [PATCH RFC 14/39] mm/rmap: introduce folio_add_anon_rmap_[pte|ptes|pmd]() David Hildenbrand
2023-12-04 14:21 ` [PATCH RFC 15/39] mm/huge_memory: batch rmap operations in __split_huge_pmd_locked() David Hildenbrand
2023-12-05 12:22 ` Ryan Roberts
2023-12-05 12:26 ` David Hildenbrand
2023-12-04 14:21 ` [PATCH RFC 16/39] mm/huge_memory: page_add_anon_rmap() -> folio_add_anon_rmap_pmd() David Hildenbrand
2023-12-04 14:21 ` [PATCH RFC 17/39] mm/migrate: page_add_anon_rmap() -> folio_add_anon_rmap_pte() David Hildenbrand
2023-12-04 14:21 ` [PATCH RFC 18/39] mm/ksm: " David Hildenbrand
2023-12-04 14:21 ` [PATCH RFC 19/39] mm/swapfile: " David Hildenbrand
2023-12-04 14:21 ` [PATCH RFC 20/39] mm/memory: " David Hildenbrand
2023-12-04 14:21 ` [PATCH RFC 21/39] mm/rmap: remove page_add_anon_rmap() David Hildenbrand
2023-12-04 14:21 ` [PATCH RFC 22/39] mm/rmap: remove RMAP_COMPOUND David Hildenbrand
2023-12-04 14:21 ` [PATCH RFC 23/39] mm/rmap: introduce folio_remove_rmap_[pte|ptes|pmd]() David Hildenbrand
2023-12-05 12:52 ` Ryan Roberts
2023-12-05 13:09 ` David Hildenbrand
2023-12-05 13:37 ` Ryan Roberts
2023-12-04 14:21 ` [PATCH RFC 24/39] kernel/events/uprobes: page_remove_rmap() -> folio_remove_rmap_pte() David Hildenbrand
2023-12-04 14:21 ` [PATCH RFC 25/39] mm/huge_memory: page_remove_rmap() -> folio_remove_rmap_pmd() David Hildenbrand
2023-12-04 14:21 ` [PATCH RFC 26/39] mm/khugepaged: page_remove_rmap() -> folio_remove_rmap_pte() David Hildenbrand
2023-12-04 14:21 ` [PATCH RFC 27/39] mm/ksm: " David Hildenbrand
2023-12-04 14:21 ` [PATCH RFC 28/39] mm/memory: " David Hildenbrand
2023-12-04 14:21 ` [PATCH RFC 29/39] mm/migrate_device: " David Hildenbrand
2023-12-04 14:21 ` [PATCH RFC 30/39] mm/rmap: " David Hildenbrand
2023-12-04 14:21 ` [PATCH RFC 31/39] Documentation: stop referring to page_remove_rmap() David Hildenbrand
2023-12-04 14:21 ` [PATCH RFC 32/39] mm/rmap: remove page_remove_rmap() David Hildenbrand
2023-12-04 14:21 ` [PATCH RFC 33/39] mm/rmap: convert page_dup_file_rmap() to folio_dup_file_rmap_[pte|ptes|pmd]() David Hildenbrand
2023-12-04 14:21 ` [PATCH RFC 34/39] mm/rmap: introduce folio_try_dup_anon_rmap_[pte|ptes|pmd]() David Hildenbrand
2023-12-04 17:59 ` David Hildenbrand
2023-12-05 13:11 ` David Hildenbrand
2023-12-05 13:12 ` Ryan Roberts
2023-12-05 13:17 ` David Hildenbrand
2023-12-05 13:18 ` David Hildenbrand
2023-12-05 13:40 ` Ryan Roberts
2023-12-05 13:50 ` David Hildenbrand
2023-12-05 14:02 ` Ryan Roberts
2023-12-05 14:12 ` David Hildenbrand
2023-12-05 13:32 ` David Hildenbrand
2023-12-05 13:40 ` Ryan Roberts
2023-12-04 14:21 ` [PATCH RFC 35/39] mm/huge_memory: page_try_dup_anon_rmap() -> folio_try_dup_anon_rmap_pmd() David Hildenbrand
2023-12-04 14:21 ` [PATCH RFC 36/39] mm/memory: page_try_dup_anon_rmap() -> folio_try_dup_anon_rmap_pte() David Hildenbrand
2023-12-04 14:21 ` [PATCH RFC 37/39] mm/rmap: remove page_try_dup_anon_rmap() David Hildenbrand
2023-12-04 14:21 ` [PATCH RFC 38/39] mm: convert page_try_share_anon_rmap() to folio_try_share_anon_rmap_[pte|pmd]() David Hildenbrand
2023-12-04 14:21 ` [PATCH RFC 39/39] mm/rmap: rename COMPOUND_MAPPED to ENTIRELY_MAPPED David Hildenbrand
2023-12-04 19:53 ` [PATCH RFC 00/39] mm/rmap: interface overhaul Ryan Roberts
2023-12-05 9:56 ` David Hildenbrand
2023-12-05 13:31 ` Ryan Roberts
2023-12-05 13:39 ` David Hildenbrand
2023-12-05 13:49 ` Ryan Roberts
2023-12-05 13:55 ` David Hildenbrand
2023-12-08 11:24 ` David Hildenbrand
2023-12-08 11:38 ` Ryan Roberts
2023-12-05 12:29 ` Ryan Roberts [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=33fb5a78-1432-4b43-af62-afeac84cf1ca@arm.com \
--to=ryan.roberts@arm.com \
--cc=akpm@linux-foundation.org \
--cc=david@redhat.com \
--cc=fengwei.yin@intel.com \
--cc=hughd@google.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mike.kravetz@oracle.com \
--cc=muchun.song@linux.dev \
--cc=peterx@redhat.com \
--cc=willy@infradead.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox