* [PATCH v2 0/2] mm/page_alloc: rework conditional splitting >= pageblock_order pages when freeing
@ 2024-12-10 10:29 David Hildenbrand
2024-12-10 10:29 ` [PATCH v2 1/2] mm/page_alloc: conditionally split > pageblock_order pages in free_one_page() and move_freepages_block_isolate() David Hildenbrand
2024-12-10 10:29 ` [PATCH v2 2/2] mm/page_isolation: fixup isolate_single_pageblock() comment regarding splitting free pages David Hildenbrand
0 siblings, 2 replies; 6+ messages in thread
From: David Hildenbrand @ 2024-12-10 10:29 UTC (permalink / raw)
To: linux-kernel
Cc: linux-mm, David Hildenbrand, Andrew Morton, Zi Yan,
Vlastimil Babka, Yu Zhao
Now based on [1].
Looking into recent alloc_contig_range(__GFP_COMP) support, I realized that
we now unconditionally split up high-order pages on the page freeing path
to free in pageblock granularity, just to immediately let the buddy merge
them again in the common case.
Let's optimize for the common case (all pageblock migratetypes match), and
do the conditional splitting only in configs where it is strictly required.
Further, add some comments that explain why this special casing is required
at all.
Alongside, a fix for a stale comment in page isolation code.
Tested with runtime allocation of gigantic pages and virtio-mem.
v1 -> v2:
* "mm/page_alloc: conditionally split > pageblock_order pages in
free_one_page() and move_freepages_block_isolate()"
-> Similarly avoid pfn_to_page() on something that might not be a valid pfn
-> Add a comment regarding using "zone->nr_isolate_pageblock" in the
future to the patch description
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Zi Yan <ziy@nvidia.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Yu Zhao <yuzhao@google.com>
[1] https://lkml.kernel.org/r/20241210093437.174413-1-david@redhat.com
David Hildenbrand (2):
mm/page_alloc: conditionally split > pageblock_order pages in
free_one_page() and move_freepages_block_isolate()
mm/page_isolation: fixup isolate_single_pageblock() comment regarding
splitting free pages
mm/page_alloc.c | 73 ++++++++++++++++++++++++++++++++++++---------
mm/page_isolation.c | 9 +++---
2 files changed, 63 insertions(+), 19 deletions(-)
--
2.47.1
^ permalink raw reply [flat|nested] 6+ messages in thread
* [PATCH v2 1/2] mm/page_alloc: conditionally split > pageblock_order pages in free_one_page() and move_freepages_block_isolate()
2024-12-10 10:29 [PATCH v2 0/2] mm/page_alloc: rework conditional splitting >= pageblock_order pages when freeing David Hildenbrand
@ 2024-12-10 10:29 ` David Hildenbrand
2024-12-10 21:16 ` Johannes Weiner
2024-12-10 10:29 ` [PATCH v2 2/2] mm/page_isolation: fixup isolate_single_pageblock() comment regarding splitting free pages David Hildenbrand
1 sibling, 1 reply; 6+ messages in thread
From: David Hildenbrand @ 2024-12-10 10:29 UTC (permalink / raw)
To: linux-kernel
Cc: linux-mm, David Hildenbrand, Andrew Morton, Zi Yan,
Vlastimil Babka, Yu Zhao

Let's special-case for the common scenarios that:

(a) We are freeing pages <= pageblock_order
(b) We are freeing a page <= MAX_PAGE_ORDER and all pageblocks match
    (especially, no mixture of isolated and non-isolated pageblocks)

When we encounter a > MAX_PAGE_ORDER page, it can only come from
alloc_contig_range(), and we can process MAX_PAGE_ORDER chunks.

When we encounter a page with order > pageblock_order and
<= MAX_PAGE_ORDER, check whether all pageblocks match, and if so
(common case), don't split them up just for the buddy to merge them back.

This makes sure that when we free MAX_PAGE_ORDER chunks to the buddy,
for example during system startups, memory onlining, or when isolating
consecutive pageblocks via alloc_contig_range()/memory offlining, that
we don't unnecessarily split up what we'll immediately merge again,
because the migratetypes match.

Rename split_large_buddy() to __free_one_page_maybe_split(), to make it
clearer what's happening, and handle in it only natural buddy orders,
not the alloc_contig_range(__GFP_COMP) special case: handle that in
free_one_page() only.

In the future, we might want to assume that all pageblocks are equal if
zone->nr_isolate_pageblock == 0; however, that will require some
zone->nr_isolate_pageblock accounting changes, such that we are
guaranteed to see zone->nr_isolate_pageblock != 0 when there is an
isolated pageblock.

Reviewed-by: Zi Yan <ziy@nvidia.com>
Acked-by: Yu Zhao <yuzhao@google.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 mm/page_alloc.c | 73 +++++++++++++++++++++++++++++++++++++++----------
 1 file changed, 59 insertions(+), 14 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index a52c6022c65cb..444e4bcb9c7c6 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1225,27 +1225,53 @@ static void free_pcppages_bulk(struct zone *zone, int count,
 	spin_unlock_irqrestore(&zone->lock, flags);
 }
 
-/* Split a multi-block free page into its individual pageblocks. */
-static void split_large_buddy(struct zone *zone, struct page *page,
-			      unsigned long pfn, int order, fpi_t fpi)
+static bool pfnblock_migratetype_equal(unsigned long pfn,
+		unsigned long end_pfn, int mt)
 {
-	unsigned long end = pfn + (1 << order);
+	VM_WARN_ON_ONCE(!IS_ALIGNED(pfn | end_pfn, pageblock_nr_pages));
 
+	while (pfn != end_pfn) {
+		struct page *page = pfn_to_page(pfn);
+
+		if (unlikely(mt != get_pfnblock_migratetype(page, pfn)))
+			return false;
+		pfn += pageblock_nr_pages;
+	}
+	return true;
+}
+
+static void __free_one_page_maybe_split(struct zone *zone, struct page *page,
+		unsigned long pfn, int order, fpi_t fpi_flags)
+{
+	const unsigned long end_pfn = pfn + (1 << order);
+	int mt = get_pfnblock_migratetype(page, pfn);
+
+	VM_WARN_ON_ONCE(order > MAX_PAGE_ORDER);
 	VM_WARN_ON_ONCE(!IS_ALIGNED(pfn, 1 << order));
 	/* Caller removed page from freelist, buddy info cleared! */
 	VM_WARN_ON_ONCE(PageBuddy(page));
 
-	if (order > pageblock_order)
-		order = pageblock_order;
+	/*
+	 * With CONFIG_MEMORY_ISOLATION, we might be freeing MAX_ORDER_NR_PAGES
+	 * pages that cover pageblocks with different migratetypes; for example
+	 * only some migratetypes might be MIGRATE_ISOLATE. In that (unlikely)
+	 * case, fallback to freeing individual pageblocks so they get put
+	 * onto the right lists.
+	 */
+	if (!IS_ENABLED(CONFIG_MEMORY_ISOLATION) ||
+	    likely(order <= pageblock_order) ||
+	    pfnblock_migratetype_equal(pfn + pageblock_nr_pages, end_pfn, mt)) {
+		__free_one_page(page, pfn, zone, order, mt, fpi_flags);
+		return;
+	}
 
 	do {
-		int mt = get_pfnblock_migratetype(page, pfn);
-
-		__free_one_page(page, pfn, zone, order, mt, fpi);
-		pfn += 1 << order;
-		if (pfn == end)
+		__free_one_page(page, pfn, zone, pageblock_order, mt, fpi_flags);
+		pfn += pageblock_nr_pages;
+		if (pfn == end_pfn)
 			break;
 		page = pfn_to_page(pfn);
+		mt = get_pfnblock_migratetype(page, pfn);
 	} while (1);
 }
 
@@ -1256,7 +1282,26 @@ static void free_one_page(struct zone *zone, struct page *page,
 	unsigned long flags;
 
 	spin_lock_irqsave(&zone->lock, flags);
-	split_large_buddy(zone, page, pfn, order, fpi_flags);
+	if (likely(order <= MAX_PAGE_ORDER)) {
+		__free_one_page_maybe_split(zone, page, pfn, order, fpi_flags);
+	} else if (IS_ENABLED(CONFIG_CONTIG_ALLOC)) {
+		const unsigned long end_pfn = pfn + (1 << order);
+
+		/*
+		 * The only way we can end up with order > MAX_PAGE_ORDER is
+		 * through alloc_contig_range(__GFP_COMP).
+		 */
+		do {
+			__free_one_page_maybe_split(zone, page, pfn,
+						    MAX_PAGE_ORDER, fpi_flags);
+			pfn += MAX_ORDER_NR_PAGES;
+			if (pfn == end_pfn)
+				break;
+			page = pfn_to_page(pfn);
+		} while (1);
+	} else {
+		WARN_ON_ONCE(1);
+	}
 	spin_unlock_irqrestore(&zone->lock, flags);
 
 	__count_vm_events(PGFREE, 1 << order);
@@ -1792,7 +1837,7 @@ bool move_freepages_block_isolate(struct zone *zone, struct page *page,
 		del_page_from_free_list(buddy, zone, order,
 					get_pfnblock_migratetype(buddy, pfn));
 		set_pageblock_migratetype(page, migratetype);
-		split_large_buddy(zone, buddy, pfn, order, FPI_NONE);
+		__free_one_page_maybe_split(zone, buddy, pfn, order, FPI_NONE);
 		return true;
 	}
 
@@ -1803,7 +1848,7 @@ bool move_freepages_block_isolate(struct zone *zone, struct page *page,
 		del_page_from_free_list(page, zone, order,
 					get_pfnblock_migratetype(page, pfn));
 		set_pageblock_migratetype(page, migratetype);
-		split_large_buddy(zone, page, pfn, order, FPI_NONE);
+		__free_one_page_maybe_split(zone, page, pfn, order, FPI_NONE);
 		return true;
 	}
 move:
-- 
2.47.1

^ permalink raw reply [flat|nested] 6+ messages in thread
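For readers who want to experiment with the decision made in
__free_one_page_maybe_split() outside the kernel, here is a minimal
userspace sketch. It is not kernel code: every toy_* name, the orders and
the array of per-pageblock migratetypes are hypothetical stand-ins, and the
function only reports how many separate frees would be issued instead of
touching any free lists.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy values, not taken from any kernel configuration. */
#define TOY_PAGEBLOCK_ORDER	2
#define TOY_PAGEBLOCK_NR_PAGES	(1UL << TOY_PAGEBLOCK_ORDER)
#define TOY_MAX_PAGE_ORDER	4
#define TOY_NR_PAGEBLOCKS	8

enum toy_mt { TOY_MOVABLE, TOY_ISOLATE };

/* One migratetype per pageblock, indexed by pfn >> TOY_PAGEBLOCK_ORDER. */
static enum toy_mt toy_pageblock_mt[TOY_NR_PAGEBLOCKS];

static enum toy_mt toy_get_mt(unsigned long pfn)
{
	return toy_pageblock_mt[pfn >> TOY_PAGEBLOCK_ORDER];
}

/* True if every pageblock in [pfn, end_pfn) has migratetype mt. */
static bool toy_migratetype_equal(unsigned long pfn, unsigned long end_pfn,
				  enum toy_mt mt)
{
	assert(((pfn | end_pfn) & (TOY_PAGEBLOCK_NR_PAGES - 1)) == 0);

	for (; pfn != end_pfn; pfn += TOY_PAGEBLOCK_NR_PAGES)
		if (toy_get_mt(pfn) != mt)
			return false;
	return true;
}

/* Returns how many separate frees the range [pfn, pfn + 2^order) needs. */
static unsigned int toy_free_maybe_split(unsigned long pfn, unsigned int order)
{
	const unsigned long end_pfn = pfn + (1UL << order);

	assert(order <= TOY_MAX_PAGE_ORDER);
	assert((pfn & ((1UL << order) - 1)) == 0);
	assert(end_pfn <= TOY_NR_PAGEBLOCKS * TOY_PAGEBLOCK_NR_PAGES);

	/* Common case: a single pageblock, or all pageblocks match. */
	if (order <= TOY_PAGEBLOCK_ORDER ||
	    toy_migratetype_equal(pfn + TOY_PAGEBLOCK_NR_PAGES, end_pfn,
				  toy_get_mt(pfn)))
		return 1;

	/* Mixed migratetypes: one free per pageblock. */
	return (unsigned int)((end_pfn - pfn) >> TOY_PAGEBLOCK_ORDER);
}
```

With all pageblocks left at TOY_MOVABLE, toy_free_maybe_split(0, 4) needs a
single free; once one covered pageblock is marked TOY_ISOLATE, the same call
falls back to one free per pageblock.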
* Re: [PATCH v2 1/2] mm/page_alloc: conditionally split > pageblock_order pages in free_one_page() and move_freepages_block_isolate()
2024-12-10 10:29 ` [PATCH v2 1/2] mm/page_alloc: conditionally split > pageblock_order pages in free_one_page() and move_freepages_block_isolate() David Hildenbrand
@ 2024-12-10 21:16 ` Johannes Weiner
2024-12-10 21:40 ` David Hildenbrand
0 siblings, 1 reply; 6+ messages in thread
From: Johannes Weiner @ 2024-12-10 21:16 UTC (permalink / raw)
To: David Hildenbrand
Cc: linux-kernel, linux-mm, Andrew Morton, Zi Yan, Vlastimil Babka, Yu Zhao

On Tue, Dec 10, 2024 at 11:29:52AM +0100, David Hildenbrand wrote:
> @@ -1225,27 +1225,53 @@ static void free_pcppages_bulk(struct zone *zone, int count,
>  	spin_unlock_irqrestore(&zone->lock, flags);
>  }
>  
> -/* Split a multi-block free page into its individual pageblocks. */
> -static void split_large_buddy(struct zone *zone, struct page *page,
> -			      unsigned long pfn, int order, fpi_t fpi)
> +static bool pfnblock_migratetype_equal(unsigned long pfn,
> +		unsigned long end_pfn, int mt)
>  {
> -	unsigned long end = pfn + (1 << order);
> +	VM_WARN_ON_ONCE(!IS_ALIGNED(pfn | end_pfn, pageblock_nr_pages));
>  
> +	while (pfn != end_pfn) {
> +		struct page *page = pfn_to_page(pfn);
> +
> +		if (unlikely(mt != get_pfnblock_migratetype(page, pfn)))
> +			return false;
> +		pfn += pageblock_nr_pages;
> +	}
> +	return true;
> +}
> +
> +static void __free_one_page_maybe_split(struct zone *zone, struct page *page,
> +		unsigned long pfn, int order, fpi_t fpi_flags)
> +{
> +	const unsigned long end_pfn = pfn + (1 << order);
> +	int mt = get_pfnblock_migratetype(page, pfn);
> +
> +	VM_WARN_ON_ONCE(order > MAX_PAGE_ORDER);
>  	VM_WARN_ON_ONCE(!IS_ALIGNED(pfn, 1 << order));
>  	/* Caller removed page from freelist, buddy info cleared! */
>  	VM_WARN_ON_ONCE(PageBuddy(page));
>  
> -	if (order > pageblock_order)
> -		order = pageblock_order;
> +	/*
> +	 * With CONFIG_MEMORY_ISOLATION, we might be freeing MAX_ORDER_NR_PAGES
> +	 * pages that cover pageblocks with different migratetypes; for example
> +	 * only some migratetypes might be MIGRATE_ISOLATE. In that (unlikely)
> +	 * case, fallback to freeing individual pageblocks so they get put
> +	 * onto the right lists.
> +	 */
> +	if (!IS_ENABLED(CONFIG_MEMORY_ISOLATION) ||
> +	    likely(order <= pageblock_order) ||
> +	    pfnblock_migratetype_equal(pfn + pageblock_nr_pages, end_pfn, mt)) {
> +		__free_one_page(page, pfn, zone, order, mt, fpi_flags);
> +		return;
> +	}

Ok, if memory isolation is disabled, we know the migratetypes are all
matching up, and we can skip the check. However, if memory isolation
is enabled, but this isn't move_freepages_block_isolate() calling, we
still do the check unnecessarily and slow down the boot, no?

Having a function guess the caller is a bit of an anti-pattern. The
resulting code is hard to follow, and it's very easy to
unintentionally burden some cases with unnecessary stuff. It's better
to unshare paths until you don't need conditionals like this.

In addition to the fastpath, I think you're also punishing the
move_freepages_block_isolate() case. We *know* we just changed the
type of one of the buddy's blocks, and yet you're still checking the
range again to decide whether to split.

All of this to accommodate hugetlb, which might not even be compiled
in? Grrrr.

Like you, I was quite surprised to see that GFP_COMP patch in the
buddy hotpath splitting *everything* into blocks - on the off chance
that somebody might free a hugetlb page. Even if !CONFIG_HUGETLB. Just
- what the hell. We shouldn't merge "I only care about my niche
usecase at the expense of literally everybody else" patches like this.

My vote is NAK on this patch, and a retro-NAK on the GFP_COMP patch.

The buddy allocator operates on the assumption of MAX_PAGE_ORDER. If
we support folios of a larger size sourced from other allocators, then
it should be the folio layer discriminating. So if folio_put() detects
this is a massive alloc_contig chunk, then it should take a different
freeing path. Do the splitting in there, then pass valid chunks back
to the buddy. That would keep the layering cleaner and the corner-case
overhead out of the allocator fastpath.

It would also avoid the pointless and fragile attempt at freeing a
big, non-buddy chunk through the PCP.

^ permalink raw reply [flat|nested] 6+ messages in thread
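The layering suggested in the reply above (split an oversized chunk outside
the allocator, then hand only valid buddy orders to it) could look roughly
like the following userspace sketch. It is purely illustrative: nothing here
reflects how folio_put() works today, and toy_free_oversized(), the
toy_free_fn callback type and TOY_MAX_PAGE_ORDER are hypothetical names
introduced for this example.

```c
#include <assert.h>

/* Hypothetical toy value standing in for MAX_PAGE_ORDER. */
#define TOY_MAX_PAGE_ORDER	4
#define TOY_MAX_ORDER_NR_PAGES	(1UL << TOY_MAX_PAGE_ORDER)

/* Stand-in for "free this naturally aligned chunk to the buddy". */
typedef void (*toy_free_fn)(unsigned long pfn, unsigned int order, void *ctx);

/*
 * Free [pfn, pfn + 2^order). Anything larger than TOY_MAX_PAGE_ORDER is cut
 * into TOY_MAX_PAGE_ORDER chunks first, so the callback only ever sees
 * orders it can handle. Returns the number of chunks handed over.
 */
static unsigned long toy_free_oversized(unsigned long pfn, unsigned int order,
					toy_free_fn free_to_buddy, void *ctx)
{
	const unsigned long end_pfn = pfn + (1UL << order);
	unsigned long nr_chunks = 0;

	assert((pfn & ((1UL << order) - 1)) == 0);

	if (order <= TOY_MAX_PAGE_ORDER) {
		free_to_buddy(pfn, order, ctx);
		return 1;
	}

	for (; pfn != end_pfn; pfn += TOY_MAX_ORDER_NR_PAGES, nr_chunks++)
		free_to_buddy(pfn, TOY_MAX_PAGE_ORDER, ctx);
	return nr_chunks;
}

/* Example callback: adds up the number of pages handed over. */
static void toy_count_pages(unsigned long pfn, unsigned int order, void *ctx)
{
	(void)pfn;
	*(unsigned long *)ctx += 1UL << order;
}
```

Passing toy_count_pages with a zeroed counter shows that an order-6 chunk is
handed over as four order-4 chunks covering 64 pages in total, while the
callee never has to know that larger chunks exist.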
* Re: [PATCH v2 1/2] mm/page_alloc: conditionally split > pageblock_order pages in free_one_page() and move_freepages_block_isolate()
2024-12-10 21:16 ` Johannes Weiner
@ 2024-12-10 21:40 ` David Hildenbrand
2024-12-11 13:04 ` David Hildenbrand
0 siblings, 1 reply; 6+ messages in thread
From: David Hildenbrand @ 2024-12-10 21:40 UTC (permalink / raw)
To: Johannes Weiner
Cc: linux-kernel, linux-mm, Andrew Morton, Zi Yan, Vlastimil Babka, Yu Zhao

On 10.12.24 22:16, Johannes Weiner wrote:
> On Tue, Dec 10, 2024 at 11:29:52AM +0100, David Hildenbrand wrote:
>> @@ -1225,27 +1225,53 @@ static void free_pcppages_bulk(struct zone *zone, int count,
>>  	spin_unlock_irqrestore(&zone->lock, flags);
>>  }
>>  
>> -/* Split a multi-block free page into its individual pageblocks. */
>> -static void split_large_buddy(struct zone *zone, struct page *page,
>> -			      unsigned long pfn, int order, fpi_t fpi)
>> +static bool pfnblock_migratetype_equal(unsigned long pfn,
>> +		unsigned long end_pfn, int mt)
>>  {
>> -	unsigned long end = pfn + (1 << order);
>> +	VM_WARN_ON_ONCE(!IS_ALIGNED(pfn | end_pfn, pageblock_nr_pages));
>>  
>> +	while (pfn != end_pfn) {
>> +		struct page *page = pfn_to_page(pfn);
>> +
>> +		if (unlikely(mt != get_pfnblock_migratetype(page, pfn)))
>> +			return false;
>> +		pfn += pageblock_nr_pages;
>> +	}
>> +	return true;
>> +}
>> +
>> +static void __free_one_page_maybe_split(struct zone *zone, struct page *page,
>> +		unsigned long pfn, int order, fpi_t fpi_flags)
>> +{
>> +	const unsigned long end_pfn = pfn + (1 << order);
>> +	int mt = get_pfnblock_migratetype(page, pfn);
>> +
>> +	VM_WARN_ON_ONCE(order > MAX_PAGE_ORDER);
>>  	VM_WARN_ON_ONCE(!IS_ALIGNED(pfn, 1 << order));
>>  	/* Caller removed page from freelist, buddy info cleared! */
>>  	VM_WARN_ON_ONCE(PageBuddy(page));
>>  
>> -	if (order > pageblock_order)
>> -		order = pageblock_order;
>> +	/*
>> +	 * With CONFIG_MEMORY_ISOLATION, we might be freeing MAX_ORDER_NR_PAGES
>> +	 * pages that cover pageblocks with different migratetypes; for example
>> +	 * only some migratetypes might be MIGRATE_ISOLATE. In that (unlikely)
>> +	 * case, fallback to freeing individual pageblocks so they get put
>> +	 * onto the right lists.
>> +	 */
>> +	if (!IS_ENABLED(CONFIG_MEMORY_ISOLATION) ||
>> +	    likely(order <= pageblock_order) ||
>> +	    pfnblock_migratetype_equal(pfn + pageblock_nr_pages, end_pfn, mt)) {
>> +		__free_one_page(page, pfn, zone, order, mt, fpi_flags);
>> +		return;
>> +	}

Hi Johannes,

>
> Ok, if memory isolation is disabled, we know the migratetypes are all
> matching up, and we can skip the check. However, if memory isolation
> is enabled, but this isn't move_freepages_block_isolate() calling, we
> still do the check unnecessarily and slow down the boot, no?

Yes, although on most machines it's one additional pageblock check (x86),
and on some a few more (e.g., 3 on s390x).

As mentioned:

"
In the future, we might want to assume that all pageblocks are equal if
zone->nr_isolate_pageblock == 0; however, that will require some
zone->nr_isolate_pageblock accounting changes, such that we are
guaranteed to see zone->nr_isolate_pageblock != 0 when there is an
isolated pageblock.
"

With that, boot time wouldn't suffer in any significant way.

>
> Having a function guess the caller is a bit of an anti-pattern. The
> resulting code is hard to follow, and it's very easy to
> unintentionally burden some cases with unnecessary stuff. It's better
> to unshare paths until you don't need conditionals like this.
>
> In addition to the fastpath, I think you're also punishing the
> move_freepages_block_isolate() case. We *know* we just changed the
> type of one of the buddy's blocks, and yet you're still checking the
> range again to decide whether to split.

Yes, that's not ideal, and it would be easy to unshare that case (call
the "split" function instead of a "maybe_split" function).

I am not 100% sure though, if move_freepages_block_isolate() can always
decide "I really have a mixture", but that code is simply quite advanced :)

>
> All of this to accommodate hugetlb, which might not even be compiled
> in? Grrrr.

Jup. But at the same time, it's frequently compiled in but never used
(or barely used; I mean, how often do people actually free 1Gig hugetlb
pages compared to ordinary pages).

>
> Like you, I was quite surprised to see that GFP_COMP patch in the
> buddy hotpath splitting *everything* into blocks - on the off chance
> that somebody might free a hugetlb page. Even if !CONFIG_HUGETLB. Just
> - what the hell. We shouldn't merge "I only care about my niche
> usecase at the expense of literally everybody else" patches like this.

After talking to Willy, the whole __GFP_COMP stuff might get removed
sooner or later again once we hand out frozen refcount in
alloc_contig_range(). It might take a while, though.

>
> My vote is NAK on this patch, and a retro-NAK on the GFP_COMP patch.

I won't fight for this patch *if* the GFP_COMP patch gets reverted. It
improves the situation, which can be improved further. But if it doesn't
get reverted, we have to think about something else.

>
> The buddy allocator operates on the assumption of MAX_PAGE_ORDER. If
> we support folios of a larger size sourced from other allocators, then
> it should be the folio layer discriminating. So if folio_put() detects
> this is a massive alloc_contig chunk, then it should take a different
> freeing path. Do the splitting in there, then pass valid chunks back
> to the buddy. That would keep the layering cleaner and the corner-case
> overhead out of the allocator fastpath.

That might be better, although not completely trivial I assume.

How to handle the "MAX_PAGE_ORDER page is getting freed but one
pageblock is isolated" case cleanly is a bit of a head scratcher, at
least to me. But I suspect we had it fully working before the GFP_COMP
patch.

-- 
Cheers,

David / dhildenb

^ permalink raw reply [flat|nested] 6+ messages in thread
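The "one additional pageblock check" versus "3" figures in the reply above
follow from how many pageblocks a single MAX_PAGE_ORDER page spans, since the
first pageblock's migratetype is looked up anyway. A small sketch of that
arithmetic follows; the helper name is hypothetical, and the orders used in
the usage note (MAX_PAGE_ORDER of 10, pageblock_order of 9 for x86 and 8 for
s390x) are assumptions chosen because they reproduce the quoted counts, not
values stated in this thread.

```c
#include <assert.h>

/*
 * Number of pageblock migratetype lookups beyond the first one when freeing
 * a page of the given order: a page spans 2^(order - pageblock_order)
 * pageblocks, and the first one is checked in any case.
 */
static unsigned int extra_pageblock_checks(unsigned int order,
					   unsigned int pageblock_order)
{
	assert(order < 32 && pageblock_order < 32);

	if (order <= pageblock_order)
		return 0;
	return (1u << (order - pageblock_order)) - 1;
}
```

Under those assumed orders, extra_pageblock_checks(10, 9) gives 1 and
extra_pageblock_checks(10, 8) gives 3, and any page that fits in a single
pageblock needs no extra lookup at all.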
* Re: [PATCH v2 1/2] mm/page_alloc: conditionally split > pageblock_order pages in free_one_page() and move_freepages_block_isolate()
2024-12-10 21:40 ` David Hildenbrand
@ 2024-12-11 13:04 ` David Hildenbrand
0 siblings, 0 replies; 6+ messages in thread
From: David Hildenbrand @ 2024-12-11 13:04 UTC (permalink / raw)
To: Johannes Weiner
Cc: linux-kernel, linux-mm, Andrew Morton, Zi Yan, Vlastimil Babka, Yu Zhao

>> Having a function guess the caller is a bit of an anti-pattern. The
>> resulting code is hard to follow, and it's very easy to
>> unintentionally burden some cases with unnecessary stuff. It's better
>> to unshare paths until you don't need conditionals like this.
>>
>> In addition to the fastpath, I think you're also punishing the
>> move_freepages_block_isolate() case. We *know* we just changed the
>> type of one of the buddy's blocks, and yet you're still checking the
>> range again to decide whether to split.
>
> Yes, that's not ideal, and it would be easy to unshare that case (call
> the "split" function instead of a "maybe_split" function).
>
> I am not 100% sure though, if move_freepages_block_isolate() can always
> decide "I really have a mixture", but that code is simply quite advanced :)

I played with it, and I think we can indeed assume that
move_freepages_block_isolate() will always have to split.

-- 
Cheers,

David / dhildenb

^ permalink raw reply [flat|nested] 6+ messages in thread
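If move_freepages_block_isolate() indeed always has to split, the unshared
variant discussed above would not need any migratetype comparison at all. A
hedged userspace sketch of such an unconditional "split" helper is shown
below; it is not the kernel implementation, and toy_split_free() together
with the per-type counters is a hypothetical model of "put each pageblock
onto the list matching its own migratetype".

```c
#include <assert.h>

/* Hypothetical toy values, not taken from any kernel configuration. */
#define TOY_PAGEBLOCK_ORDER	2
#define TOY_PAGEBLOCK_NR_PAGES	(1UL << TOY_PAGEBLOCK_ORDER)
#define TOY_NR_PAGEBLOCKS	4

enum toy_mt { TOY_MOVABLE, TOY_ISOLATE, TOY_NR_TYPES };

/* One migratetype per pageblock, indexed by pfn >> TOY_PAGEBLOCK_ORDER. */
static enum toy_mt toy_pageblock_mt[TOY_NR_PAGEBLOCKS];

/* Number of pageblocks queued on each (modelled) per-type free list. */
static unsigned int toy_freelist_count[TOY_NR_TYPES];

/* Always free pageblock by pageblock; the caller knows the types differ. */
static void toy_split_free(unsigned long pfn, unsigned int order)
{
	const unsigned long end_pfn = pfn + (1UL << order);

	assert(order > TOY_PAGEBLOCK_ORDER);
	assert((pfn & ((1UL << order) - 1)) == 0);
	assert(end_pfn <= TOY_NR_PAGEBLOCKS * TOY_PAGEBLOCK_NR_PAGES);

	for (; pfn != end_pfn; pfn += TOY_PAGEBLOCK_NR_PAGES)
		toy_freelist_count[toy_pageblock_mt[pfn >> TOY_PAGEBLOCK_ORDER]]++;
}
```

Marking one of the four pageblocks as TOY_ISOLATE and calling
toy_split_free(0, 4) leaves one pageblock on the isolate list and three on
the movable list, without ever scanning the range up front.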
* [PATCH v2 2/2] mm/page_isolation: fixup isolate_single_pageblock() comment regarding splitting free pages
2024-12-10 10:29 [PATCH v2 0/2] mm/page_alloc: rework conditional splitting >= pageblock_order pages when freeing David Hildenbrand
2024-12-10 10:29 ` [PATCH v2 1/2] mm/page_alloc: conditionally split > pageblock_order pages in free_one_page() and move_freepages_block_isolate() David Hildenbrand
@ 2024-12-10 10:29 ` David Hildenbrand
1 sibling, 0 replies; 6+ messages in thread
From: David Hildenbrand @ 2024-12-10 10:29 UTC (permalink / raw)
To: linux-kernel
Cc: linux-mm, David Hildenbrand, Andrew Morton, Zi Yan,
Vlastimil Babka, Yu Zhao

Let's fixup the comment, documenting how __free_one_page_maybe_split()
comes into play.

Reviewed-by: Zi Yan <ziy@nvidia.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 mm/page_isolation.c | 9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/mm/page_isolation.c b/mm/page_isolation.c
index c608e9d728655..63fddf283e681 100644
--- a/mm/page_isolation.c
+++ b/mm/page_isolation.c
@@ -298,11 +298,10 @@ __first_valid_page(unsigned long pfn, unsigned long nr_pages)
  * pagelbocks.
  * [ MAX_PAGE_ORDER ]
  * [ pageblock0 | pageblock1 ]
- * When either pageblock is isolated, if it is a free page, the page is not
- * split into separate migratetype lists, which is supposed to; if it is an
- * in-use page and freed later, __free_one_page() does not split the free page
- * either. The function handles this by splitting the free page or migrating
- * the in-use page then splitting the free page.
+ * When either pageblock is isolated, if it is an in-use page and freed later,
+ * __free_one_page_maybe_split() will split the free page if required. If the
+ * page is already free, this function handles this by splitting the free page
+ * through move_freepages_block_isolate()->__free_one_page_maybe_split().
  */
 static int isolate_single_pageblock(unsigned long boundary_pfn, int flags,
 			bool isolate_before, bool skip_isolation, int migratetype)
-- 
2.47.1

^ permalink raw reply [flat|nested] 6+ messages in thread