* [PATCH mm-new 0/3] mm: accelerate gigantic folio allocation
From: Kefeng Wang @ 2026-01-10 4:21 UTC (permalink / raw)
To: Andrew Morton, David Hildenbrand, Oscar Salvador, Muchun Song, linux-mm
Cc: sidhartha.kumar, jane.chu, Zi Yan, Vlastimil Babka,
Brendan Jackman, Johannes Weiner, Matthew Wilcox, Kefeng Wang
Optimize pfn_range_valid_contig() and replace_free_hugepage_folios() in
alloc_contig_frozen_pages() to speed up gigantic folio allocation. The
allocation time for 120×1G folios drops from 3.605s to 0.431s.
---
This is part of "[PATCH v2 0/8] mm: hugetlb: allocate frozen gigantic folio"[1]
and, as suggested by David[2], reuses some logic from has_unmovable_pages().
[1] https://lore.kernel.org/linux-mm/20250918132000.1951232-1-wangkefeng.wang@huawei.com/
[2] https://lore.kernel.org/linux-mm/17bd2977-6a04-402f-ae84-306167cd16d6@redhat.com/
Kefeng Wang (3):
mm: page_alloc: optimize pfn_range_valid_contig()
mm: hugetlb: optimize replace_free_hugepage_folios()
mm: hugetlb_cma: optimize hugetlb_cma_alloc_frozen_folio()
include/linux/page-isolation.h | 2 +
mm/hugetlb.c | 53 +++++++---
mm/hugetlb_cma.c | 5 +-
mm/page_alloc.c | 25 ++---
mm/page_isolation.c | 187 +++++++++++++++++----------------
5 files changed, 154 insertions(+), 118 deletions(-)
--
2.27.0
* [PATCH 1/3] mm: page_alloc: optimize pfn_range_valid_contig()
From: Kefeng Wang @ 2026-01-10 4:21 UTC (permalink / raw)
To: Andrew Morton, David Hildenbrand, Oscar Salvador, Muchun Song, linux-mm
Cc: sidhartha.kumar, jane.chu, Zi Yan, Vlastimil Babka,
Brendan Jackman, Johannes Weiner, Matthew Wilcox, Kefeng Wang
The alloc_contig_pages() path spends a significant amount of time in
pfn_range_valid_contig(), as the following perf call graph shows:
  - set_max_huge_pages
     - 99.98% alloc_pool_huge_folio
          only_alloc_fresh_hugetlb_folio.isra.0
        - alloc_contig_frozen_pages_noprof
           - 87.00% pfn_range_valid_contig
                pfn_to_online_page
           - 12.91% alloc_contig_frozen_range_noprof
                4.51% replace_free_hugepage_folios
              - 4.02% prep_new_page
                   prep_compound_page
              - 2.98% undo_isolate_page_range
                 - 2.79% unset_migratetype_isolate
                    - 2.75% __move_freepages_block_isolate
                         2.71% __move_freepages_block
              - 0.98% start_isolate_page_range
                   0.66% set_migratetype_isolate
To optimize this, introduce a new helper, page_is_unmovable(), which reuses
the logic from has_unmovable_pages(). It lets the caller skip unnecessary
per-pfn iterations over compound pages (such as THP) and over non-compound
high-order buddy pages, which significantly improves the efficiency of
contiguous memory allocation.
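The resulting scan pattern in pfn_range_valid_contig() then looks roughly
like the sketch below (condensed from the diff in this patch; the
hugetlb-specific handling in the caller is omitted for brevity):

	while (start_pfn < end_pfn) {
		unsigned long step = 1;
		struct page *page = pfn_to_online_page(start_pfn);

		if (!page || page_zone(page) != z)
			return false;

		/*
		 * For compound and free buddy pages, page_is_unmovable()
		 * sets 'step' to cover the rest of the page, so the loop
		 * skips it in one go instead of testing every pfn.
		 */
		if (page_is_unmovable(z, page, PB_ISOLATE_MODE_OTHER, &step))
			return false;

		start_pfn += step;
	}
	return true;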
A simple test on a machine with 114G of free memory, allocating 120 1G
HugeTLB folios (104 allocated successfully):
time echo 120 > /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages
Before: 0m3.605s
After: 0m0.602s
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
---
include/linux/page-isolation.h | 2 +
mm/page_alloc.c | 25 ++---
mm/page_isolation.c | 187 +++++++++++++++++----------------
3 files changed, 109 insertions(+), 105 deletions(-)
diff --git a/include/linux/page-isolation.h b/include/linux/page-isolation.h
index 3e2f960e166c..6f8638c9904f 100644
--- a/include/linux/page-isolation.h
+++ b/include/linux/page-isolation.h
@@ -67,4 +67,6 @@ void undo_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn);
int test_pages_isolated(unsigned long start_pfn, unsigned long end_pfn,
enum pb_isolate_mode mode);
+bool page_is_unmovable(struct zone *zone, struct page *page,
+ enum pb_isolate_mode mode, unsigned long *step);
#endif
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index d8d5379c44dc..813c5f57883f 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -7157,18 +7157,20 @@ static bool pfn_range_valid_contig(struct zone *z, unsigned long start_pfn,
unsigned long nr_pages, bool skip_hugetlb,
bool *skipped_hugetlb)
{
- unsigned long i, end_pfn = start_pfn + nr_pages;
+ unsigned long end_pfn = start_pfn + nr_pages;
struct page *page;
- for (i = start_pfn; i < end_pfn; i++) {
- page = pfn_to_online_page(i);
+ while (start_pfn < end_pfn) {
+ unsigned long step = 1;
+
+ page = pfn_to_online_page(start_pfn);
if (!page)
return false;
if (page_zone(page) != z)
return false;
- if (PageReserved(page))
+ if (page_is_unmovable(z, page, PB_ISOLATE_MODE_OTHER, &step))
return false;
/*
@@ -7183,9 +7185,6 @@ static bool pfn_range_valid_contig(struct zone *z, unsigned long start_pfn,
if (PageHuge(page)) {
unsigned int order;
- if (!IS_ENABLED(CONFIG_ARCH_ENABLE_HUGEPAGE_MIGRATION))
- return false;
-
if (skip_hugetlb) {
*skipped_hugetlb = true;
return false;
@@ -7196,17 +7195,9 @@ static bool pfn_range_valid_contig(struct zone *z, unsigned long start_pfn,
if ((order >= MAX_FOLIO_ORDER) ||
(nr_pages <= (1 << order)))
return false;
-
- /*
- * Reaching this point means we've encounted a huge page
- * smaller than nr_pages, skip all pfn's for that page.
- *
- * We can't get here from a tail-PageHuge, as it implies
- * we started a scan in the middle of a hugepage larger
- * than nr_pages - which the prior check filters for.
- */
- i += (1 << order) - 1;
}
+
+ start_pfn += step;
}
return true;
}
diff --git a/mm/page_isolation.c b/mm/page_isolation.c
index b5924eff4f8b..c48ff5c00244 100644
--- a/mm/page_isolation.c
+++ b/mm/page_isolation.c
@@ -15,6 +15,100 @@
#define CREATE_TRACE_POINTS
#include <trace/events/page_isolation.h>
+bool page_is_unmovable(struct zone *zone, struct page *page,
+ enum pb_isolate_mode mode, unsigned long *step)
+{
+ /*
+ * Both, bootmem allocations and memory holes are marked
+ * PG_reserved and are unmovable. We can even have unmovable
+ * allocations inside ZONE_MOVABLE, for example when
+ * specifying "movablecore".
+ */
+ if (PageReserved(page))
+ return true;
+
+ /*
+ * If the zone is movable and we have ruled out all reserved
+ * pages then it should be reasonably safe to assume the rest
+ * is movable.
+ */
+ if (zone_idx(zone) == ZONE_MOVABLE)
+ return false;
+
+ /*
+ * Hugepages are not in LRU lists, but they're movable.
+ * THPs are on the LRU, but need to be counted as #small pages.
+ * We need not scan over tail pages because we don't
+ * handle each tail page individually in migration.
+ */
+ if (PageHuge(page) || PageCompound(page)) {
+ struct folio *folio = page_folio(page);
+
+ if (folio_test_hugetlb(folio)) {
+ struct hstate *h;
+
+ if (!IS_ENABLED(CONFIG_ARCH_ENABLE_HUGEPAGE_MIGRATION))
+ return true;
+
+ /*
+ * The huge page may be freed so can not
+ * use folio_hstate() directly.
+ */
+ h = size_to_hstate(folio_size(folio));
+ if (h && !hugepage_migration_supported(h))
+ return true;
+
+ } else if (!folio_test_lru(folio)) {
+ return true;
+ }
+
+ *step = folio_nr_pages(folio) - folio_page_idx(folio, page);
+ return false;
+ }
+
+ /*
+ * We can't use page_count without pin a page
+ * because another CPU can free compound page.
+ * This check already skips compound tails of THP
+ * because their page->_refcount is zero at all time.
+ */
+ if (!page_ref_count(page)) {
+ if (PageBuddy(page))
+ *step = (1 << buddy_order(page));
+ return false;
+ }
+
+ /*
+ * The HWPoisoned page may be not in buddy system, and
+ * page_count() is not 0.
+ */
+ if ((mode == PB_ISOLATE_MODE_MEM_OFFLINE) && PageHWPoison(page))
+ return false;
+
+ /*
+ * We treat all PageOffline() pages as movable when offlining
+ * to give drivers a chance to decrement their reference count
+ * in MEM_GOING_OFFLINE in order to indicate that these pages
+ * can be offlined as there are no direct references anymore.
+ * For actually unmovable PageOffline() where the driver does
+ * not support this, we will fail later when trying to actually
+ * move these pages that still have a reference count > 0.
+ * (false negatives in this function only)
+ */
+ if ((mode == PB_ISOLATE_MODE_MEM_OFFLINE) && PageOffline(page))
+ return false;
+
+ if (PageLRU(page) || page_has_movable_ops(page))
+ return false;
+
+ /*
+ * If there are RECLAIMABLE pages, we need to check
+ * it. But now, memory offline itself doesn't call
+ * shrink_node_slabs() and it still to be fixed.
+ */
+ return true;
+}
+
/*
* This function checks whether the range [start_pfn, end_pfn) includes
* unmovable pages or not. The range must fall into a single pageblock and
@@ -35,7 +129,6 @@ static struct page *has_unmovable_pages(unsigned long start_pfn, unsigned long e
{
struct page *page = pfn_to_page(start_pfn);
struct zone *zone = page_zone(page);
- unsigned long pfn;
VM_BUG_ON(pageblock_start_pfn(start_pfn) !=
pageblock_start_pfn(end_pfn - 1));
@@ -52,96 +145,14 @@ static struct page *has_unmovable_pages(unsigned long start_pfn, unsigned long e
return page;
}
- for (pfn = start_pfn; pfn < end_pfn; pfn++) {
- page = pfn_to_page(pfn);
+ while (start_pfn < end_pfn) {
+ unsigned long step = 1;
- /*
- * Both, bootmem allocations and memory holes are marked
- * PG_reserved and are unmovable. We can even have unmovable
- * allocations inside ZONE_MOVABLE, for example when
- * specifying "movablecore".
- */
- if (PageReserved(page))
+ page = pfn_to_page(start_pfn);
+ if (page_is_unmovable(zone, page, mode, &step))
return page;
- /*
- * If the zone is movable and we have ruled out all reserved
- * pages then it should be reasonably safe to assume the rest
- * is movable.
- */
- if (zone_idx(zone) == ZONE_MOVABLE)
- continue;
-
- /*
- * Hugepages are not in LRU lists, but they're movable.
- * THPs are on the LRU, but need to be counted as #small pages.
- * We need not scan over tail pages because we don't
- * handle each tail page individually in migration.
- */
- if (PageHuge(page) || PageTransCompound(page)) {
- struct folio *folio = page_folio(page);
- unsigned int skip_pages;
-
- if (PageHuge(page)) {
- struct hstate *h;
-
- /*
- * The huge page may be freed so can not
- * use folio_hstate() directly.
- */
- h = size_to_hstate(folio_size(folio));
- if (h && !hugepage_migration_supported(h))
- return page;
- } else if (!folio_test_lru(folio)) {
- return page;
- }
-
- skip_pages = folio_nr_pages(folio) - folio_page_idx(folio, page);
- pfn += skip_pages - 1;
- continue;
- }
-
- /*
- * We can't use page_count without pin a page
- * because another CPU can free compound page.
- * This check already skips compound tails of THP
- * because their page->_refcount is zero at all time.
- */
- if (!page_ref_count(page)) {
- if (PageBuddy(page))
- pfn += (1 << buddy_order(page)) - 1;
- continue;
- }
-
- /*
- * The HWPoisoned page may be not in buddy system, and
- * page_count() is not 0.
- */
- if ((mode == PB_ISOLATE_MODE_MEM_OFFLINE) && PageHWPoison(page))
- continue;
-
- /*
- * We treat all PageOffline() pages as movable when offlining
- * to give drivers a chance to decrement their reference count
- * in MEM_GOING_OFFLINE in order to indicate that these pages
- * can be offlined as there are no direct references anymore.
- * For actually unmovable PageOffline() where the driver does
- * not support this, we will fail later when trying to actually
- * move these pages that still have a reference count > 0.
- * (false negatives in this function only)
- */
- if ((mode == PB_ISOLATE_MODE_MEM_OFFLINE) && PageOffline(page))
- continue;
-
- if (PageLRU(page) || page_has_movable_ops(page))
- continue;
-
- /*
- * If there are RECLAIMABLE pages, we need to check
- * it. But now, memory offline itself doesn't call
- * shrink_node_slabs() and it still to be fixed.
- */
- return page;
+ start_pfn += step;
}
return NULL;
}
--
2.27.0
* [PATCH 2/3] mm: hugetlb: optimize replace_free_hugepage_folios()
From: Kefeng Wang @ 2026-01-10 4:21 UTC (permalink / raw)
To: Andrew Morton, David Hildenbrand, Oscar Salvador, Muchun Song, linux-mm
Cc: sidhartha.kumar, jane.chu, Zi Yan, Vlastimil Babka,
Brendan Jackman, Johannes Weiner, Matthew Wilcox, Kefeng Wang
If no free hugepage folios are available, there is no need to perform any
replacement. Additionally, gigantic folios must never be replaced, so only
check for the presence of free non-gigantic folios.

To ensure that gigantic folios are not mistakenly replaced, use
isolate_or_dissolve_huge_folio().

Lastly, skip unnecessary per-pfn iterations over compound pages (such as
THP) and over non-compound high-order buddy pages, which saves processing
time.
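The early exit takes roughly the following shape (condensed from the diff
below); if the only hstates with free folios are gigantic ones, the whole
pfn walk is skipped:

	unsigned long nr = 0;
	struct hstate *h;

	/*
	 * Count free non-gigantic hugetlb folios; only a non-zero count
	 * means a replacement may actually be needed.
	 */
	for_each_hstate(h) {
		if (hstate_is_gigantic(h))
			continue;
		nr += h->free_huge_pages;
		if (nr)
			break;
	}
	if (!nr)
		return 0;	/* nothing to replace */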
A simple test on a machine with 114G of free memory, allocating 120 1G
HugeTLB folios (104 allocated successfully):
time echo 120 > /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages
Before: 0m0.602s
After: 0m0.431s
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
---
mm/hugetlb.c | 53 ++++++++++++++++++++++++++++++++++++++++------------
1 file changed, 41 insertions(+), 12 deletions(-)
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 8c197307db0c..a80cef879f61 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -2806,26 +2806,55 @@ int isolate_or_dissolve_huge_folio(struct folio *folio, struct list_head *list)
*/
int replace_free_hugepage_folios(unsigned long start_pfn, unsigned long end_pfn)
{
- struct folio *folio;
- int ret = 0;
+ unsigned long nr = 0;
+ struct page *page;
+ struct hstate *h;
+ LIST_HEAD(list);
- LIST_HEAD(isolate_list);
+ /* Avoid pfn iterations if no free non-gigantic huge pages */
+ for_each_hstate(h) {
+ if (hstate_is_gigantic(h))
+ continue;
+
+ nr += h->free_huge_pages;
+ if (nr)
+ break;
+ }
+
+ if (!nr)
+ return 0;
while (start_pfn < end_pfn) {
- folio = pfn_folio(start_pfn);
+ page = pfn_to_page(start_pfn);
+ nr = 1;
- /* Not to disrupt normal path by vainly holding hugetlb_lock */
- if (folio_test_hugetlb(folio) && !folio_ref_count(folio)) {
- ret = alloc_and_dissolve_hugetlb_folio(folio, &isolate_list);
- if (ret)
- break;
+ if (PageHuge(page) || PageCompound(page)) {
+ struct folio *folio = page_folio(page);
+
+ nr = folio_nr_pages(folio) - folio_page_idx(folio, page);
+
+ if (folio_test_hugetlb(folio) && !folio_ref_count(folio)) {
+ if (isolate_or_dissolve_huge_folio(folio, &list))
+ return -ENOMEM;
- putback_movable_pages(&isolate_list);
+ putback_movable_pages(&list);
+ }
+ } else if (PageBuddy(page)) {
+ /*
+ * Buddy order check without zone lock is unsafe and
+ * the order is maybe invalid, but race should be
+ * small, and the worst thing is skipping free hugetlb.
+ */
+ const unsigned int order = buddy_order_unsafe(page);
+
+ if (order <= MAX_PAGE_ORDER)
+ nr = 1UL << order;
}
- start_pfn++;
+
+ start_pfn += nr;
}
- return ret;
+ return 0;
}
void wait_for_freed_hugetlb_folios(void)
--
2.27.0
* [PATCH 3/3] mm: hugetlb_cma: optimize hugetlb_cma_alloc_frozen_folio()
From: Kefeng Wang @ 2026-01-10 4:21 UTC (permalink / raw)
To: Andrew Morton, David Hildenbrand, Oscar Salvador, Muchun Song, linux-mm
Cc: sidhartha.kumar, jane.chu, Zi Yan, Vlastimil Babka,
Brendan Jackman, Johannes Weiner, Matthew Wilcox, Kefeng Wang
Check hugetlb_cma_size up front, which avoids an unnecessary gfp check and
nodemask traversal when no hugetlb CMA area is configured. Since
hugetlb_cma_size is now read at runtime in the allocation path, it can no
longer be __initdata and is annotated __ro_after_init instead.
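In simplified form (condensed from the diff below),
hugetlb_cma_alloc_frozen_folio() now bails out immediately when no hugetlb
CMA area was reserved:

	/*
	 * No hugetlb CMA area was reserved at boot: nothing to allocate
	 * from, so skip the gfp check and nodemask traversal entirely.
	 */
	if (!hugetlb_cma_size)
		return NULL;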
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
---
mm/hugetlb_cma.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/mm/hugetlb_cma.c b/mm/hugetlb_cma.c
index 0ddf9755c090..d8fa93825992 100644
--- a/mm/hugetlb_cma.c
+++ b/mm/hugetlb_cma.c
@@ -16,7 +16,7 @@
static struct cma *hugetlb_cma[MAX_NUMNODES];
static unsigned long hugetlb_cma_size_in_node[MAX_NUMNODES] __initdata;
static bool hugetlb_cma_only;
-static unsigned long hugetlb_cma_size __initdata;
+static unsigned long hugetlb_cma_size __ro_after_init;
void hugetlb_cma_free_frozen_folio(struct folio *folio)
{
@@ -31,6 +31,9 @@ struct folio *hugetlb_cma_alloc_frozen_folio(int order, gfp_t gfp_mask,
struct folio *folio;
struct page *page = NULL;
+ if (!hugetlb_cma_size)
+ return NULL;
+
if (hugetlb_cma[nid])
page = cma_alloc_frozen_compound(hugetlb_cma[nid], order);
--
2.27.0