* [PATCH v2 0/8] mm: hugetlb: allocate frozen gigantic folio
@ 2025-09-18 13:19 Kefeng Wang
2025-09-18 13:19 ` [PATCH v2 1/8] mm: page_alloc: optimize pfn_range_valid_contig() Kefeng Wang
` (7 more replies)
0 siblings, 8 replies; 23+ messages in thread
From: Kefeng Wang @ 2025-09-18 13:19 UTC (permalink / raw)
To: Andrew Morton, David Hildenbrand, Oscar Salvador, Muchun Song,
Zi Yan, Matthew Wilcox
Cc: sidhartha.kumar, jane.chu, Vlastimil Babka, Brendan Jackman,
Johannes Weiner, linux-mm, Kefeng Wang
First, optimize pfn_range_valid_contig()/replace_free_hugepage_folios()
to speed up gigantic folio allocation; the time drops from 2.124s to
0.429s when allocating 200 1G HugeTLB folios.
Then introduce alloc_contig_frozen_pages() and cma_alloc_frozen_compound(),
which avoid the atomic operations on the page refcount, and convert hugetlb
to allocate frozen gigantic folios with the new helpers, which also cleans
up alloc_gigantic_folio().
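For a quick picture of the end state, the gigantic allocation path after
this series looks roughly like the sketch below (simplified from patch 8;
error handling is omitted and gfp_mask is assumed to already carry
__GFP_COMP, as it does for hugetlb callers):

  static struct folio *alloc_gigantic_folio(int order, gfp_t gfp_mask,
                                            int nid, nodemask_t *nodemask)
  {
          struct folio *folio;

          /* try the per-node hugetlb CMA areas first; returns a frozen folio */
          folio = hugetlb_cma_alloc_frozen_folio(order, gfp_mask, nid, nodemask);
          if (folio)
                  return folio;

          if (hugetlb_cma_exclusive_alloc())
                  return NULL;

          /* fall back to the generic frozen contiguous allocation */
          return (struct folio *)alloc_contig_frozen_pages(1 << order, gfp_mask,
                                                           nid, nodemask);
  }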
v2:
- Optimize gigantic folio allocation speed
- Use HPAGE_PUD_ORDER in debug_vm_pgtable
- Address some of David's comments:
- kill folio_alloc_gigantic()
- add generic cma_alloc_frozen{_compound}() instead of
cma_{alloc,free}_folio
Kefeng Wang (8):
mm: page_alloc: optimize pfn_range_valid_contig()
mm: hugetlb: optimize replace_free_hugepage_folios()
mm: debug_vm_pgtable: add debug_vm_pgtable_free_huge_page()
mm: page_alloc: add split_non_compound_page()
mm: page_alloc: add alloc_contig_{range_frozen,frozen_pages}()
mm: cma: add __cma_release()
mm: cma: add cma_alloc_frozen{_compound}()
mm: hugetlb: allocate frozen pages in alloc_gigantic_folio()
include/linux/cma.h | 26 +----
include/linux/gfp.h | 52 ++++------
mm/cma.c | 104 +++++++++-----------
mm/debug_vm_pgtable.c | 38 ++++---
mm/hugetlb.c | 103 +++++++++----------
mm/hugetlb_cma.c | 27 ++---
mm/hugetlb_cma.h | 10 +-
mm/internal.h | 6 ++
mm/page_alloc.c | 224 +++++++++++++++++++++++++++++-------------
9 files changed, 318 insertions(+), 272 deletions(-)
--
2.27.0
^ permalink raw reply [flat|nested] 23+ messages in thread
* [PATCH v2 1/8] mm: page_alloc: optimize pfn_range_valid_contig()
2025-09-18 13:19 [PATCH v2 0/8] mm: hugetlb: allocate frozen gigantic folio Kefeng Wang
@ 2025-09-18 13:19 ` Kefeng Wang
2025-09-18 15:49 ` Zi Yan
` (3 more replies)
2025-09-18 13:19 ` [PATCH v2 2/8] mm: hugetlb: optimize replace_free_hugepage_folios() Kefeng Wang
` (6 subsequent siblings)
7 siblings, 4 replies; 23+ messages in thread
From: Kefeng Wang @ 2025-09-18 13:19 UTC (permalink / raw)
To: Andrew Morton, David Hildenbrand, Oscar Salvador, Muchun Song,
Zi Yan, Matthew Wilcox
Cc: sidhartha.kumar, jane.chu, Vlastimil Babka, Brendan Jackman,
Johannes Weiner, linux-mm, Kefeng Wang
alloc_contig_pages() spends a lot of time in pfn_range_valid_contig().
Check whether the pages in the pfn range can be allocated before calling
alloc_contig_range(): if a page cannot be migrated, no further action is
required. Also skip unnecessary iterations for compound pages such as THP
and for non-compound high-order buddy pages, which saves a lot of time as
well. The check is racy, but the only danger is skipping too much.
A simple test on a machine with 116G free memory, allocating 120 1G
HugeTLB folios (107 successfully allocated):
time echo 120 > /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages
Before: 0m2.124s
After: 0m0.602s
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
---
mm/page_alloc.c | 20 ++++++++++++++++++--
1 file changed, 18 insertions(+), 2 deletions(-)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 478beaf95f84..5b7d705e9710 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -7012,6 +7012,7 @@ static bool pfn_range_valid_contig(struct zone *z, unsigned long start_pfn,
{
unsigned long i, end_pfn = start_pfn + nr_pages;
struct page *page;
+ struct folio *folio;
for (i = start_pfn; i < end_pfn; i++) {
page = pfn_to_online_page(i);
@@ -7021,11 +7022,26 @@ static bool pfn_range_valid_contig(struct zone *z, unsigned long start_pfn,
if (page_zone(page) != z)
return false;
- if (PageReserved(page))
+ folio = page_folio(page);
+ if (folio_test_reserved(folio))
return false;
- if (PageHuge(page))
+ if (folio_test_hugetlb(folio))
return false;
+
+ /* The following type of folios aren't migrated */
+ if (folio_test_pgtable(folio) | folio_test_stack(folio))
+ return false;
+
+ /*
+ * For compound pages such as THP and non-compound high
+ * order buddy pages, save potentially a lot of iterations
+ * if we can skip them at once.
+ */
+ if (PageCompound(page))
+ i += (1UL << compound_order(page)) - 1;
+ else if (PageBuddy(page))
+ i += (1UL << buddy_order(page)) - 1;
}
return true;
}
--
2.27.0
^ permalink raw reply [flat|nested] 23+ messages in thread
* [PATCH v2 2/8] mm: hugetlb: optimize replace_free_hugepage_folios()
2025-09-18 13:19 [PATCH v2 0/8] mm: hugetlb: allocate frozen gigantic folio Kefeng Wang
2025-09-18 13:19 ` [PATCH v2 1/8] mm: page_alloc: optimize pfn_range_valid_contig() Kefeng Wang
@ 2025-09-18 13:19 ` Kefeng Wang
2025-09-30 9:57 ` David Hildenbrand
2025-09-18 13:19 ` [PATCH v2 3/8] mm: debug_vm_pgtable: add debug_vm_pgtable_free_huge_page() Kefeng Wang
` (5 subsequent siblings)
7 siblings, 1 reply; 23+ messages in thread
From: Kefeng Wang @ 2025-09-18 13:19 UTC (permalink / raw)
To: Andrew Morton, David Hildenbrand, Oscar Salvador, Muchun Song,
Zi Yan, Matthew Wilcox
Cc: sidhartha.kumar, jane.chu, Vlastimil Babka, Brendan Jackman,
Johannes Weiner, linux-mm, Kefeng Wang
There is no need to replace free hugepage folios when there are no free
hugetlb folios. Since gigantic folios are not replaced, use
isolate_or_dissolve_huge_folio(), and also skip some pfn iterations for
compound pages such as THP and non-compound high-order buddy pages to
save time.
A simple test on a machine with 116G free memory, allocating 120 1G
HugeTLB folios (107 successfully allocated):
time echo 120 > /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages
Before: 0m0.602s
After: 0m0.429s
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
---
mm/hugetlb.c | 49 +++++++++++++++++++++++++++++++++++++------------
1 file changed, 37 insertions(+), 12 deletions(-)
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 1806685ea326..bc88b659a88b 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -2890,26 +2890,51 @@ int isolate_or_dissolve_huge_folio(struct folio *folio, struct list_head *list)
*/
int replace_free_hugepage_folios(unsigned long start_pfn, unsigned long end_pfn)
{
- struct folio *folio;
- int ret = 0;
+ unsigned long nr = 0;
+ struct page *page;
+ struct hstate *h;
+ LIST_HEAD(list);
+
+ /* Avoid pfn iterations if no free non-gigantic huge pages */
+ for_each_hstate(h) {
+ if (!hstate_is_gigantic(h))
+ nr += h->free_huge_pages;
+ }
- LIST_HEAD(isolate_list);
+ if (!nr)
+ return 0;
while (start_pfn < end_pfn) {
- folio = pfn_folio(start_pfn);
+ page = pfn_to_page(start_pfn);
+ nr = 1;
- /* Not to disrupt normal path by vainly holding hugetlb_lock */
- if (folio_test_hugetlb(folio) && !folio_ref_count(folio)) {
- ret = alloc_and_dissolve_hugetlb_folio(folio, &isolate_list);
- if (ret)
- break;
+ if (PageHuge(page) || PageCompound(page)) {
+ struct folio *folio = page_folio(page);
+
+ nr = 1UL << compound_order(page);
- putback_movable_pages(&isolate_list);
+ if (folio_test_hugetlb(folio) && !folio_ref_count(folio)) {
+ if (isolate_or_dissolve_huge_folio(folio, &list))
+ return -ENOMEM;
+
+ putback_movable_pages(&list);
+ }
+ } else if (PageBuddy(page)) {
+ /*
+ * Buddy order check without zone lock is unsafe and
+ * the order is maybe invalid, but race should be
+ * small, and the worst thing is skipping free hugetlb.
+ */
+ const unsigned int order = buddy_order_unsafe(page);
+
+ if (order <= MAX_PAGE_ORDER)
+ nr = 1UL << order;
}
- start_pfn++;
+
+ start_pfn += nr;
}
- return ret;
+ return 0;
}
void wait_for_freed_hugetlb_folios(void)
--
2.27.0
^ permalink raw reply [flat|nested] 23+ messages in thread
* [PATCH v2 3/8] mm: debug_vm_pgtable: add debug_vm_pgtable_free_huge_page()
2025-09-18 13:19 [PATCH v2 0/8] mm: hugetlb: allocate frozen gigantic folio Kefeng Wang
2025-09-18 13:19 ` [PATCH v2 1/8] mm: page_alloc: optimize pfn_range_valid_contig() Kefeng Wang
2025-09-18 13:19 ` [PATCH v2 2/8] mm: hugetlb: optimize replace_free_hugepage_folios() Kefeng Wang
@ 2025-09-18 13:19 ` Kefeng Wang
2025-09-30 10:01 ` David Hildenbrand
2025-09-18 13:19 ` [PATCH v2 4/8] mm: page_alloc: add split_non_compound_page() Kefeng Wang
` (4 subsequent siblings)
7 siblings, 1 reply; 23+ messages in thread
From: Kefeng Wang @ 2025-09-18 13:19 UTC (permalink / raw)
To: Andrew Morton, David Hildenbrand, Oscar Salvador, Muchun Song,
Zi Yan, Matthew Wilcox
Cc: sidhartha.kumar, jane.chu, Vlastimil Babka, Brendan Jackman,
Johannes Weiner, linux-mm, Kefeng Wang
Add a new helper to free the huge page, for consistency with
debug_vm_pgtable_alloc_huge_page(), and use HPAGE_PUD_ORDER instead of
open-coding it.
Also move free_contig_range() under CONFIG_CONTIG_ALLOC since all callers
are built with CONFIG_CONTIG_ALLOC.
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
---
include/linux/gfp.h | 2 +-
mm/debug_vm_pgtable.c | 38 +++++++++++++++++---------------------
mm/page_alloc.c | 2 +-
3 files changed, 19 insertions(+), 23 deletions(-)
diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index 0ceb4e09306c..1fefb63e0480 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -437,8 +437,8 @@ extern struct page *alloc_contig_pages_noprof(unsigned long nr_pages, gfp_t gfp_
int nid, nodemask_t *nodemask);
#define alloc_contig_pages(...) alloc_hooks(alloc_contig_pages_noprof(__VA_ARGS__))
-#endif
void free_contig_range(unsigned long pfn, unsigned long nr_pages);
+#endif
#ifdef CONFIG_CONTIG_ALLOC
static inline struct folio *folio_alloc_gigantic_noprof(int order, gfp_t gfp,
diff --git a/mm/debug_vm_pgtable.c b/mm/debug_vm_pgtable.c
index 830107b6dd08..d7f82aa58711 100644
--- a/mm/debug_vm_pgtable.c
+++ b/mm/debug_vm_pgtable.c
@@ -946,22 +946,26 @@ static unsigned long __init get_random_vaddr(void)
return random_vaddr;
}
-static void __init destroy_args(struct pgtable_debug_args *args)
+static void __init
+debug_vm_pgtable_free_huge_page(struct pgtable_debug_args *args,
+ unsigned long pfn, int order)
{
- struct page *page = NULL;
+#ifdef CONFIG_CONTIG_ALLOC
+ if (args->is_contiguous_page) {
+ free_contig_range(pfn, 1 << order);
+ return;
+ }
+#endif
+ __free_pages(pfn_to_page(pfn), order);
+}
+static void __init destroy_args(struct pgtable_debug_args *args)
+{
/* Free (huge) page */
if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE) &&
has_transparent_pud_hugepage() &&
args->pud_pfn != ULONG_MAX) {
- if (args->is_contiguous_page) {
- free_contig_range(args->pud_pfn,
- (1 << (HPAGE_PUD_SHIFT - PAGE_SHIFT)));
- } else {
- page = pfn_to_page(args->pud_pfn);
- __free_pages(page, HPAGE_PUD_SHIFT - PAGE_SHIFT);
- }
-
+ debug_vm_pgtable_free_huge_page(args, args->pud_pfn, HPAGE_PUD_ORDER);
args->pud_pfn = ULONG_MAX;
args->pmd_pfn = ULONG_MAX;
args->pte_pfn = ULONG_MAX;
@@ -970,20 +974,13 @@ static void __init destroy_args(struct pgtable_debug_args *args)
if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE) &&
has_transparent_hugepage() &&
args->pmd_pfn != ULONG_MAX) {
- if (args->is_contiguous_page) {
- free_contig_range(args->pmd_pfn, (1 << HPAGE_PMD_ORDER));
- } else {
- page = pfn_to_page(args->pmd_pfn);
- __free_pages(page, HPAGE_PMD_ORDER);
- }
-
+ debug_vm_pgtable_free_huge_page(args, args->pmd_pfn, HPAGE_PMD_ORDER);
args->pmd_pfn = ULONG_MAX;
args->pte_pfn = ULONG_MAX;
}
if (args->pte_pfn != ULONG_MAX) {
- page = pfn_to_page(args->pte_pfn);
- __free_page(page);
+ __free_page(pfn_to_page(args->pte_pfn));
args->pte_pfn = ULONG_MAX;
}
@@ -1215,8 +1212,7 @@ static int __init init_args(struct pgtable_debug_args *args)
*/
if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE) &&
has_transparent_pud_hugepage()) {
- page = debug_vm_pgtable_alloc_huge_page(args,
- HPAGE_PUD_SHIFT - PAGE_SHIFT);
+ page = debug_vm_pgtable_alloc_huge_page(args, HPAGE_PUD_ORDER);
if (page) {
args->pud_pfn = page_to_pfn(page);
args->pmd_pfn = args->pud_pfn;
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 5b7d705e9710..b6eeae39f4d0 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -7113,7 +7113,6 @@ struct page *alloc_contig_pages_noprof(unsigned long nr_pages, gfp_t gfp_mask,
}
return NULL;
}
-#endif /* CONFIG_CONTIG_ALLOC */
void free_contig_range(unsigned long pfn, unsigned long nr_pages)
{
@@ -7140,6 +7139,7 @@ void free_contig_range(unsigned long pfn, unsigned long nr_pages)
WARN(count != 0, "%lu pages are still in use!\n", count);
}
EXPORT_SYMBOL(free_contig_range);
+#endif /* CONFIG_CONTIG_ALLOC */
/*
* Effectively disable pcplists for the zone by setting the high limit to 0
--
2.27.0
^ permalink raw reply [flat|nested] 23+ messages in thread
* [PATCH v2 4/8] mm: page_alloc: add split_non_compound_page()
2025-09-18 13:19 [PATCH v2 0/8] mm: hugetlb: allocate frozen gigantic folio Kefeng Wang
` (2 preceding siblings ...)
2025-09-18 13:19 ` [PATCH v2 3/8] mm: debug_vm_pgtable: add debug_vm_pgtable_free_huge_page() Kefeng Wang
@ 2025-09-18 13:19 ` Kefeng Wang
2025-09-30 10:06 ` David Hildenbrand
2025-09-18 13:19 ` [PATCH v2 5/8] mm: page_alloc: add alloc_contig_{range_frozen,frozen_pages}() Kefeng Wang
` (3 subsequent siblings)
7 siblings, 1 reply; 23+ messages in thread
From: Kefeng Wang @ 2025-09-18 13:19 UTC (permalink / raw)
To: Andrew Morton, David Hildenbrand, Oscar Salvador, Muchun Song,
Zi Yan, Matthew Wilcox
Cc: sidhartha.kumar, jane.chu, Vlastimil Babka, Brendan Jackman,
Johannes Weiner, linux-mm, Kefeng Wang
Add a new split_non_compound_page() helper to simplify make_alloc_exact().
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
---
mm/page_alloc.c | 19 ++++++++++++-------
1 file changed, 12 insertions(+), 7 deletions(-)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index b6eeae39f4d0..e1d229b75f27 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3042,6 +3042,15 @@ void free_unref_folios(struct folio_batch *folios)
folio_batch_reinit(folios);
}
+static void split_non_compound_page(struct page *page, unsigned int order)
+{
+ VM_BUG_ON_PAGE(PageCompound(page), page);
+
+ split_page_owner(page, order, 0);
+ pgalloc_tag_split(page_folio(page), order, 0);
+ split_page_memcg(page, order);
+}
+
/*
* split_page takes a non-compound higher-order page, and splits it into
* n (1<<order) sub-pages: page[0..n]
@@ -3054,14 +3063,12 @@ void split_page(struct page *page, unsigned int order)
{
int i;
- VM_BUG_ON_PAGE(PageCompound(page), page);
VM_BUG_ON_PAGE(!page_count(page), page);
for (i = 1; i < (1 << order); i++)
set_page_refcounted(page + i);
- split_page_owner(page, order, 0);
- pgalloc_tag_split(page_folio(page), order, 0);
- split_page_memcg(page, order);
+
+ split_non_compound_page(page, order);
}
EXPORT_SYMBOL_GPL(split_page);
@@ -5315,9 +5322,7 @@ static void *make_alloc_exact(unsigned long addr, unsigned int order,
struct page *page = virt_to_page((void *)addr);
struct page *last = page + nr;
- split_page_owner(page, order, 0);
- pgalloc_tag_split(page_folio(page), order, 0);
- split_page_memcg(page, order);
+ split_non_compound_page(page, order);
while (page < --last)
set_page_refcounted(last);
--
2.27.0
^ permalink raw reply [flat|nested] 23+ messages in thread
* [PATCH v2 5/8] mm: page_alloc: add alloc_contig_{range_frozen,frozen_pages}()
2025-09-18 13:19 [PATCH v2 0/8] mm: hugetlb: allocate frozen gigantic folio Kefeng Wang
` (3 preceding siblings ...)
2025-09-18 13:19 ` [PATCH v2 4/8] mm: page_alloc: add split_non_compound_page() Kefeng Wang
@ 2025-09-18 13:19 ` Kefeng Wang
2025-09-18 13:19 ` [PATCH v2 6/8] mm: cma: add __cma_release() Kefeng Wang
` (2 subsequent siblings)
7 siblings, 0 replies; 23+ messages in thread
From: Kefeng Wang @ 2025-09-18 13:19 UTC (permalink / raw)
To: Andrew Morton, David Hildenbrand, Oscar Salvador, Muchun Song,
Zi Yan, Matthew Wilcox
Cc: sidhartha.kumar, jane.chu, Vlastimil Babka, Brendan Jackman,
Johannes Weiner, linux-mm, Kefeng Wang
In order to allocate a given range of pages, or compound pages, without
incrementing their refcount, add two new helpers,
alloc_contig_{range_frozen,frozen_pages}(), which may be beneficial to
some users (e.g. hugetlb). free_contig_range_frozen() is also provided to
match alloc_contig_range_frozen(), but frozen compound pages are better
freed with free_frozen_pages().
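A minimal usage sketch for a hypothetical caller (names as introduced in
this patch; the __GFP_COMP case requires a power-of-two page count):

          struct page *page;

          /* all returned pages have a frozen (zero) refcount */
          page = alloc_contig_frozen_pages(1 << order, GFP_KERNEL | __GFP_COMP,
                                           nid, nodemask);
          if (!page)
                  return -ENOMEM;

          /* ... use the frozen compound page ... */

          /* frozen compound pages are freed with free_frozen_pages() */
          free_frozen_pages(page, order);

          /*
           * A non-compound range would instead be freed with
           * free_contig_range_frozen(page_to_pfn(page), nr_pages).
           */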
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
---
include/linux/gfp.h | 29 +++++--
mm/page_alloc.c | 183 +++++++++++++++++++++++++++++---------------
2 files changed, 143 insertions(+), 69 deletions(-)
diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index 1fefb63e0480..fbbdd8c88483 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -429,14 +429,27 @@ typedef unsigned int __bitwise acr_flags_t;
#define ACR_FLAGS_CMA ((__force acr_flags_t)BIT(0)) // allocate for CMA
/* The below functions must be run on a range from a single zone. */
-extern int alloc_contig_range_noprof(unsigned long start, unsigned long end,
- acr_flags_t alloc_flags, gfp_t gfp_mask);
-#define alloc_contig_range(...) alloc_hooks(alloc_contig_range_noprof(__VA_ARGS__))
-
-extern struct page *alloc_contig_pages_noprof(unsigned long nr_pages, gfp_t gfp_mask,
- int nid, nodemask_t *nodemask);
-#define alloc_contig_pages(...) alloc_hooks(alloc_contig_pages_noprof(__VA_ARGS__))
-
+int alloc_contig_range_frozen_noprof(unsigned long start, unsigned long end,
+ acr_flags_t alloc_flags, gfp_t gfp_mask);
+#define alloc_contig_range_frozen(...) \
+ alloc_hooks(alloc_contig_range_frozen_noprof(__VA_ARGS__))
+
+int alloc_contig_range_noprof(unsigned long start, unsigned long end,
+ acr_flags_t alloc_flags, gfp_t gfp_mask);
+#define alloc_contig_range(...) \
+ alloc_hooks(alloc_contig_range_noprof(__VA_ARGS__))
+
+struct page *alloc_contig_frozen_pages_noprof(unsigned long nr_pages,
+ gfp_t gfp_mask, int nid, nodemask_t *nodemask);
+#define alloc_contig_frozen_pages(...) \
+ alloc_hooks(alloc_contig_frozen_pages_noprof(__VA_ARGS__))
+
+struct page *alloc_contig_pages_noprof(unsigned long nr_pages, gfp_t gfp_mask,
+ int nid, nodemask_t *nodemask);
+#define alloc_contig_pages(...) \
+ alloc_hooks(alloc_contig_pages_noprof(__VA_ARGS__))
+
+void free_contig_range_frozen(unsigned long pfn, unsigned long nr_pages);
void free_contig_range(unsigned long pfn, unsigned long nr_pages);
#endif
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index e1d229b75f27..05db9b5d584f 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -6782,7 +6782,7 @@ static int __alloc_contig_migrate_range(struct compact_control *cc,
return (ret < 0) ? ret : 0;
}
-static void split_free_pages(struct list_head *list, gfp_t gfp_mask)
+static void split_free_frozen_pages(struct list_head *list, gfp_t gfp_mask)
{
int order;
@@ -6794,11 +6794,10 @@ static void split_free_pages(struct list_head *list, gfp_t gfp_mask)
int i;
post_alloc_hook(page, order, gfp_mask);
- set_page_refcounted(page);
if (!order)
continue;
- split_page(page, order);
+ split_non_compound_page(page, order);
/* Add all subpages to the order-0 head, in sequence. */
list_del(&page->lru);
@@ -6842,28 +6841,8 @@ static int __alloc_contig_verify_gfp_mask(gfp_t gfp_mask, gfp_t *gfp_cc_mask)
return 0;
}
-/**
- * alloc_contig_range() -- tries to allocate given range of pages
- * @start: start PFN to allocate
- * @end: one-past-the-last PFN to allocate
- * @alloc_flags: allocation information
- * @gfp_mask: GFP mask. Node/zone/placement hints are ignored; only some
- * action and reclaim modifiers are supported. Reclaim modifiers
- * control allocation behavior during compaction/migration/reclaim.
- *
- * The PFN range does not have to be pageblock aligned. The PFN range must
- * belong to a single zone.
- *
- * The first thing this routine does is attempt to MIGRATE_ISOLATE all
- * pageblocks in the range. Once isolated, the pageblocks should not
- * be modified by others.
- *
- * Return: zero on success or negative error code. On success all
- * pages which PFN is in [start, end) are allocated for the caller and
- * need to be freed with free_contig_range().
- */
-int alloc_contig_range_noprof(unsigned long start, unsigned long end,
- acr_flags_t alloc_flags, gfp_t gfp_mask)
+int alloc_contig_range_frozen_noprof(unsigned long start, unsigned long end,
+ acr_flags_t alloc_flags, gfp_t gfp_mask)
{
const unsigned int order = ilog2(end - start);
unsigned long outer_start, outer_end;
@@ -6979,19 +6958,18 @@ int alloc_contig_range_noprof(unsigned long start, unsigned long end,
}
if (!(gfp_mask & __GFP_COMP)) {
- split_free_pages(cc.freepages, gfp_mask);
+ split_free_frozen_pages(cc.freepages, gfp_mask);
/* Free head and tail (if any) */
if (start != outer_start)
- free_contig_range(outer_start, start - outer_start);
+ free_contig_range_frozen(outer_start, start - outer_start);
if (end != outer_end)
- free_contig_range(end, outer_end - end);
+ free_contig_range_frozen(end, outer_end - end);
} else if (start == outer_start && end == outer_end && is_power_of_2(end - start)) {
struct page *head = pfn_to_page(start);
check_new_pages(head, order);
prep_new_page(head, order, gfp_mask, 0);
- set_page_refcounted(head);
} else {
ret = -EINVAL;
WARN(true, "PFN range: requested [%lu, %lu), allocated [%lu, %lu)\n",
@@ -7001,16 +6979,48 @@ int alloc_contig_range_noprof(unsigned long start, unsigned long end,
undo_isolate_page_range(start, end);
return ret;
}
-EXPORT_SYMBOL(alloc_contig_range_noprof);
-static int __alloc_contig_pages(unsigned long start_pfn,
- unsigned long nr_pages, gfp_t gfp_mask)
+/**
+ * alloc_contig_range() -- tries to allocate given range of pages
+ * @start: start PFN to allocate
+ * @end: one-past-the-last PFN to allocate
+ * @alloc_flags: allocation information
+ * @gfp_mask: GFP mask. Node/zone/placement hints are ignored; only some
+ * action and reclaim modifiers are supported. Reclaim modifiers
+ * control allocation behavior during compaction/migration/reclaim.
+ *
+ * The PFN range does not have to be pageblock aligned. The PFN range must
+ * belong to a single zone.
+ *
+ * The first thing this routine does is attempt to MIGRATE_ISOLATE all
+ * pageblocks in the range. Once isolated, the pageblocks should not
+ * be modified by others.
+ *
+ * Return: zero on success or negative error code. On success all
+ * pages which PFN is in [start, end) are allocated for the caller and
+ * need to be freed with free_contig_range().
+ */
+int alloc_contig_range_noprof(unsigned long start, unsigned long end,
+ acr_flags_t alloc_flags, gfp_t gfp_mask)
{
- unsigned long end_pfn = start_pfn + nr_pages;
+ int ret;
+
+ ret = alloc_contig_range_frozen_noprof(start, end, alloc_flags, gfp_mask);
+ if (ret)
+ return ret;
+
+ if (gfp_mask & __GFP_COMP) {
+ set_page_refcounted(pfn_to_page(start));
+ } else {
+ unsigned long pfn;
+
+ for (pfn = start; pfn < end; pfn++)
+ set_page_refcounted(pfn_to_page(pfn));
+ }
- return alloc_contig_range_noprof(start_pfn, end_pfn, ACR_FLAGS_NONE,
- gfp_mask);
+ return 0;
}
+EXPORT_SYMBOL(alloc_contig_range_noprof);
static bool pfn_range_valid_contig(struct zone *z, unsigned long start_pfn,
unsigned long nr_pages)
@@ -7059,31 +7069,8 @@ static bool zone_spans_last_pfn(const struct zone *zone,
return zone_spans_pfn(zone, last_pfn);
}
-/**
- * alloc_contig_pages() -- tries to find and allocate contiguous range of pages
- * @nr_pages: Number of contiguous pages to allocate
- * @gfp_mask: GFP mask. Node/zone/placement hints limit the search; only some
- * action and reclaim modifiers are supported. Reclaim modifiers
- * control allocation behavior during compaction/migration/reclaim.
- * @nid: Target node
- * @nodemask: Mask for other possible nodes
- *
- * This routine is a wrapper around alloc_contig_range(). It scans over zones
- * on an applicable zonelist to find a contiguous pfn range which can then be
- * tried for allocation with alloc_contig_range(). This routine is intended
- * for allocation requests which can not be fulfilled with the buddy allocator.
- *
- * The allocated memory is always aligned to a page boundary. If nr_pages is a
- * power of two, then allocated range is also guaranteed to be aligned to same
- * nr_pages (e.g. 1GB request would be aligned to 1GB).
- *
- * Allocated pages can be freed with free_contig_range() or by manually calling
- * __free_page() on each allocated page.
- *
- * Return: pointer to contiguous pages on success, or NULL if not successful.
- */
-struct page *alloc_contig_pages_noprof(unsigned long nr_pages, gfp_t gfp_mask,
- int nid, nodemask_t *nodemask)
+struct page *alloc_contig_frozen_pages_noprof(unsigned long nr_pages,
+ gfp_t gfp_mask, int nid, nodemask_t *nodemask)
{
unsigned long ret, pfn, flags;
struct zonelist *zonelist;
@@ -7106,7 +7093,9 @@ struct page *alloc_contig_pages_noprof(unsigned long nr_pages, gfp_t gfp_mask,
* and cause alloc_contig_range() to fail...
*/
spin_unlock_irqrestore(&zone->lock, flags);
- ret = __alloc_contig_pages(pfn, nr_pages,
+ ret = alloc_contig_range_frozen_noprof(pfn,
+ pfn + nr_pages,
+ ACR_FLAGS_NONE,
gfp_mask);
if (!ret)
return pfn_to_page(pfn);
@@ -7118,6 +7107,78 @@ struct page *alloc_contig_pages_noprof(unsigned long nr_pages, gfp_t gfp_mask,
}
return NULL;
}
+EXPORT_SYMBOL(alloc_contig_range_frozen_noprof);
+
+void free_contig_range_frozen(unsigned long pfn, unsigned long nr_pages)
+{
+ struct folio *folio = pfn_folio(pfn);
+
+ if (folio_test_large(folio)) {
+ int expected = folio_nr_pages(folio);
+
+ WARN_ON(folio_ref_count(folio));
+
+ if (nr_pages == expected)
+ free_frozen_pages(&folio->page, folio_order(folio));
+ else
+ WARN(true, "PFN %lu: nr_pages %lu != expected %d\n",
+ pfn, nr_pages, expected);
+ return;
+ }
+
+ for (; nr_pages--; pfn++) {
+ struct page *page = pfn_to_page(pfn);
+
+ WARN_ON(page_ref_count(page));
+ free_frozen_pages(page, 0);
+ }
+}
+EXPORT_SYMBOL(free_contig_range_frozen);
+
+/**
+ * alloc_contig_pages() -- tries to find and allocate contiguous range of pages
+ * @nr_pages: Number of contiguous pages to allocate
+ * @gfp_mask: GFP mask. Node/zone/placement hints limit the search; only some
+ * action and reclaim modifiers are supported. Reclaim modifiers
+ * control allocation behavior during compaction/migration/reclaim.
+ * @nid: Target node
+ * @nodemask: Mask for other possible nodes
+ *
+ * This routine is a wrapper around alloc_contig_range(). It scans over zones
+ * on an applicable zonelist to find a contiguous pfn range which can then be
+ * tried for allocation with alloc_contig_range(). This routine is intended
+ * for allocation requests which can not be fulfilled with the buddy allocator.
+ *
+ * The allocated memory is always aligned to a page boundary. If nr_pages is a
+ * power of two, then allocated range is also guaranteed to be aligned to same
+ * nr_pages (e.g. 1GB request would be aligned to 1GB).
+ *
+ * Allocated pages can be freed with free_contig_range() or by manually calling
+ * __free_page() on each allocated page.
+ *
+ * Return: pointer to contiguous pages on success, or NULL if not successful.
+ */
+struct page *alloc_contig_pages_noprof(unsigned long nr_pages, gfp_t gfp_mask,
+ int nid, nodemask_t *nodemask)
+{
+ struct page *page;
+
+ page = alloc_contig_frozen_pages_noprof(nr_pages, gfp_mask, nid,
+ nodemask);
+ if (!page)
+ return NULL;
+
+ if (gfp_mask & __GFP_COMP) {
+ set_page_refcounted(page);
+ } else {
+ unsigned long pfn = page_to_pfn(page);
+
+ for (; nr_pages--; pfn++)
+ set_page_refcounted(pfn_to_page(pfn));
+ }
+
+ return page;
+}
void free_contig_range(unsigned long pfn, unsigned long nr_pages)
{
--
2.27.0
^ permalink raw reply [flat|nested] 23+ messages in thread
* [PATCH v2 6/8] mm: cma: add __cma_release()
2025-09-18 13:19 [PATCH v2 0/8] mm: hugetlb: allocate frozen gigantic folio Kefeng Wang
` (4 preceding siblings ...)
2025-09-18 13:19 ` [PATCH v2 5/8] mm: page_alloc: add alloc_contig_{range_frozen,frozen_pages}() Kefeng Wang
@ 2025-09-18 13:19 ` Kefeng Wang
2025-09-30 10:15 ` David Hildenbrand
2025-09-18 13:19 ` [PATCH v2 7/8] mm: cma: add cma_alloc_frozen{_compound}() Kefeng Wang
2025-09-18 13:20 ` [PATCH v2 8/8] mm: hugetlb: allocate frozen pages in alloc_gigantic_folio() Kefeng Wang
7 siblings, 1 reply; 23+ messages in thread
From: Kefeng Wang @ 2025-09-18 13:19 UTC (permalink / raw)
To: Andrew Morton, David Hildenbrand, Oscar Salvador, Muchun Song,
Zi Yan, Matthew Wilcox
Cc: sidhartha.kumar, jane.chu, Vlastimil Babka, Brendan Jackman,
Johannes Weiner, linux-mm, Kefeng Wang
Kill cma_pages_valid(), which is only used in cma_release(), and clean up
the code duplication between the cma pages validity check and the cma
memrange lookup. Add a __cma_release() helper to prepare for the upcoming
frozen page release.
Reviewed-by: Jane Chu <jane.chu@oracle.com>
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
---
include/linux/cma.h | 1 -
mm/cma.c | 57 ++++++++++++---------------------------------
2 files changed, 15 insertions(+), 43 deletions(-)
diff --git a/include/linux/cma.h b/include/linux/cma.h
index 62d9c1cf6326..e5745d2aec55 100644
--- a/include/linux/cma.h
+++ b/include/linux/cma.h
@@ -49,7 +49,6 @@ extern int cma_init_reserved_mem(phys_addr_t base, phys_addr_t size,
struct cma **res_cma);
extern struct page *cma_alloc(struct cma *cma, unsigned long count, unsigned int align,
bool no_warn);
-extern bool cma_pages_valid(struct cma *cma, const struct page *pages, unsigned long count);
extern bool cma_release(struct cma *cma, const struct page *pages, unsigned long count);
extern int cma_for_each_area(int (*it)(struct cma *cma, void *data), void *data);
diff --git a/mm/cma.c b/mm/cma.c
index 813e6dc7b095..2af8c5bc58dd 100644
--- a/mm/cma.c
+++ b/mm/cma.c
@@ -942,34 +942,36 @@ struct folio *cma_alloc_folio(struct cma *cma, int order, gfp_t gfp)
return page ? page_folio(page) : NULL;
}
-bool cma_pages_valid(struct cma *cma, const struct page *pages,
- unsigned long count)
+static bool __cma_release(struct cma *cma, const struct page *pages,
+ unsigned long count)
{
unsigned long pfn, end;
int r;
struct cma_memrange *cmr;
- bool ret;
+
+ pr_debug("%s(page %p, count %lu)\n", __func__, (void *)pages, count);
if (!cma || !pages || count > cma->count)
return false;
pfn = page_to_pfn(pages);
- ret = false;
for (r = 0; r < cma->nranges; r++) {
cmr = &cma->ranges[r];
end = cmr->base_pfn + cmr->count;
- if (pfn >= cmr->base_pfn && pfn < end) {
- ret = pfn + count <= end;
+ if (pfn >= cmr->base_pfn && pfn < end && pfn + count <= end)
break;
- }
}
- if (!ret)
- pr_debug("%s(page %p, count %lu)\n",
- __func__, (void *)pages, count);
+ if (r == cma->nranges)
+ return false;
- return ret;
+ free_contig_range(pfn, count);
+ cma_clear_bitmap(cma, cmr, pfn, count);
+ cma_sysfs_account_release_pages(cma, count);
+ trace_cma_release(cma->name, pfn, pages, count);
+
+ return true;
}
/**
@@ -985,36 +987,7 @@ bool cma_pages_valid(struct cma *cma, const struct page *pages,
bool cma_release(struct cma *cma, const struct page *pages,
unsigned long count)
{
- struct cma_memrange *cmr;
- unsigned long pfn, end_pfn;
- int r;
-
- pr_debug("%s(page %p, count %lu)\n", __func__, (void *)pages, count);
-
- if (!cma_pages_valid(cma, pages, count))
- return false;
-
- pfn = page_to_pfn(pages);
- end_pfn = pfn + count;
-
- for (r = 0; r < cma->nranges; r++) {
- cmr = &cma->ranges[r];
- if (pfn >= cmr->base_pfn &&
- pfn < (cmr->base_pfn + cmr->count)) {
- VM_BUG_ON(end_pfn > cmr->base_pfn + cmr->count);
- break;
- }
- }
-
- if (r == cma->nranges)
- return false;
-
- free_contig_range(pfn, count);
- cma_clear_bitmap(cma, cmr, pfn, count);
- cma_sysfs_account_release_pages(cma, count);
- trace_cma_release(cma->name, pfn, pages, count);
-
- return true;
+ return __cma_release(cma, pages, count);
}
bool cma_free_folio(struct cma *cma, const struct folio *folio)
@@ -1022,7 +995,7 @@ bool cma_free_folio(struct cma *cma, const struct folio *folio)
if (WARN_ON(!folio_test_large(folio)))
return false;
- return cma_release(cma, &folio->page, folio_nr_pages(folio));
+ return __cma_release(cma, &folio->page, folio_nr_pages(folio));
}
int cma_for_each_area(int (*it)(struct cma *cma, void *data), void *data)
--
2.27.0
^ permalink raw reply [flat|nested] 23+ messages in thread
* [PATCH v2 7/8] mm: cma: add cma_alloc_frozen{_compound}()
2025-09-18 13:19 [PATCH v2 0/8] mm: hugetlb: allocate frozen gigantic folio Kefeng Wang
` (5 preceding siblings ...)
2025-09-18 13:19 ` [PATCH v2 6/8] mm: cma: add __cma_release() Kefeng Wang
@ 2025-09-18 13:19 ` Kefeng Wang
2025-09-18 13:20 ` [PATCH v2 8/8] mm: hugetlb: allocate frozen pages in alloc_gigantic_folio() Kefeng Wang
7 siblings, 0 replies; 23+ messages in thread
From: Kefeng Wang @ 2025-09-18 13:19 UTC (permalink / raw)
To: Andrew Morton, David Hildenbrand, Oscar Salvador, Muchun Song,
Zi Yan, Matthew Wilcox
Cc: sidhartha.kumar, jane.chu, Vlastimil Babka, Brendan Jackman,
Johannes Weiner, linux-mm, Kefeng Wang
Introduce the cma_alloc_frozen{_compound}() helpers to allocate pages
without incrementing their refcount, and convert hugetlb cma to use
cma_alloc_frozen_compound(). Also move cma_validate_zones() into
mm/internal.h since it has no user outside mm.
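A minimal sketch of the intended calling pattern for a hypothetical
in-mm user (hugetlb cma is converted by this patch and the next one):

          struct page *page;

          /* returns a frozen compound page of the requested order, or NULL */
          page = cma_alloc_frozen_compound(cma, order);
          if (!page)
                  return NULL;

          /*
           * Use the page while frozen, or set_page_refcounted() it if a
           * refcounted page is needed (as the hugetlb conversion below does).
           */

          /* a frozen page is released with cma_release_frozen() */
          cma_release_frozen(cma, page, 1UL << order);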
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
---
include/linux/cma.h | 25 +++++----------------
mm/cma.c | 55 +++++++++++++++++++++++++++++----------------
mm/hugetlb_cma.c | 22 ++++++++++--------
mm/internal.h | 6 +++++
4 files changed, 60 insertions(+), 48 deletions(-)
diff --git a/include/linux/cma.h b/include/linux/cma.h
index e5745d2aec55..4981c151ef84 100644
--- a/include/linux/cma.h
+++ b/include/linux/cma.h
@@ -51,29 +51,14 @@ extern struct page *cma_alloc(struct cma *cma, unsigned long count, unsigned int
bool no_warn);
extern bool cma_release(struct cma *cma, const struct page *pages, unsigned long count);
+struct page *cma_alloc_frozen(struct cma *cma, unsigned long count,
+ unsigned int align, bool no_warn);
+bool cma_release_frozen(struct cma *cma, const struct page *pages,
+ unsigned long count);
+
extern int cma_for_each_area(int (*it)(struct cma *cma, void *data), void *data);
extern bool cma_intersects(struct cma *cma, unsigned long start, unsigned long end);
extern void cma_reserve_pages_on_error(struct cma *cma);
-#ifdef CONFIG_CMA
-struct folio *cma_alloc_folio(struct cma *cma, int order, gfp_t gfp);
-bool cma_free_folio(struct cma *cma, const struct folio *folio);
-bool cma_validate_zones(struct cma *cma);
-#else
-static inline struct folio *cma_alloc_folio(struct cma *cma, int order, gfp_t gfp)
-{
- return NULL;
-}
-
-static inline bool cma_free_folio(struct cma *cma, const struct folio *folio)
-{
- return false;
-}
-static inline bool cma_validate_zones(struct cma *cma)
-{
- return false;
-}
-#endif
-
#endif
diff --git a/mm/cma.c b/mm/cma.c
index 2af8c5bc58dd..aa237eab49bf 100644
--- a/mm/cma.c
+++ b/mm/cma.c
@@ -836,7 +836,7 @@ static int cma_range_alloc(struct cma *cma, struct cma_memrange *cmr,
spin_unlock_irq(&cma->lock);
mutex_lock(&cma->alloc_mutex);
- ret = alloc_contig_range(pfn, pfn + count, ACR_FLAGS_CMA, gfp);
+ ret = alloc_contig_range_frozen(pfn, pfn + count, ACR_FLAGS_CMA, gfp);
mutex_unlock(&cma->alloc_mutex);
if (!ret)
break;
@@ -856,8 +856,8 @@ static int cma_range_alloc(struct cma *cma, struct cma_memrange *cmr,
return ret;
}
-static struct page *__cma_alloc(struct cma *cma, unsigned long count,
- unsigned int align, gfp_t gfp)
+static struct page *__cma_alloc_frozen(struct cma *cma,
+ unsigned long count, unsigned int align, gfp_t gfp)
{
struct page *page = NULL;
int ret = -ENOMEM, r;
@@ -914,6 +914,21 @@ static struct page *__cma_alloc(struct cma *cma, unsigned long count,
return page;
}
+struct page *cma_alloc_frozen(struct cma *cma, unsigned long count,
+ unsigned int align, bool no_warn)
+{
+ gfp_t gfp = GFP_KERNEL | (no_warn ? __GFP_NOWARN : 0);
+
+ return __cma_alloc_frozen(cma, count, align, gfp);
+}
+
+struct page *cma_alloc_frozen_compound(struct cma *cma, unsigned int order)
+{
+ gfp_t gfp = GFP_KERNEL | __GFP_COMP | __GFP_NOWARN;
+
+ return __cma_alloc_frozen(cma, 1 << order, order, gfp);
+}
+
/**
* cma_alloc() - allocate pages from contiguous area
* @cma: Contiguous memory region for which the allocation is performed.
@@ -927,23 +942,23 @@ static struct page *__cma_alloc(struct cma *cma, unsigned long count,
struct page *cma_alloc(struct cma *cma, unsigned long count,
unsigned int align, bool no_warn)
{
- return __cma_alloc(cma, count, align, GFP_KERNEL | (no_warn ? __GFP_NOWARN : 0));
-}
-
-struct folio *cma_alloc_folio(struct cma *cma, int order, gfp_t gfp)
-{
+ unsigned long pfn;
struct page *page;
- if (WARN_ON(!order || !(gfp & __GFP_COMP)))
+ page = cma_alloc_frozen(cma, count, align, no_warn);
+ if (!page)
return NULL;
- page = __cma_alloc(cma, 1 << order, order, gfp);
+ pfn = page_to_pfn(page);
- return page ? page_folio(page) : NULL;
+ for (; count--; pfn++)
+ set_page_refcounted(pfn_to_page(pfn));
+
+ return page;
}
static bool __cma_release(struct cma *cma, const struct page *pages,
- unsigned long count)
+ unsigned long count, bool frozen)
{
unsigned long pfn, end;
int r;
@@ -966,7 +981,11 @@ static bool __cma_release(struct cma *cma, const struct page *pages,
if (r == cma->nranges)
return false;
- free_contig_range(pfn, count);
+ if (frozen)
+ free_contig_range_frozen(pfn, count);
+ else
+ free_contig_range(pfn, count);
+
cma_clear_bitmap(cma, cmr, pfn, count);
cma_sysfs_account_release_pages(cma, count);
trace_cma_release(cma->name, pfn, pages, count);
@@ -987,15 +1006,13 @@ static bool __cma_release(struct cma *cma, const struct page *pages,
bool cma_release(struct cma *cma, const struct page *pages,
unsigned long count)
{
- return __cma_release(cma, pages, count);
+ return __cma_release(cma, pages, count, false);
}
-bool cma_free_folio(struct cma *cma, const struct folio *folio)
+bool cma_release_frozen(struct cma *cma, const struct page *pages,
+ unsigned long count)
{
- if (WARN_ON(!folio_test_large(folio)))
- return false;
-
- return __cma_release(cma, &folio->page, folio_nr_pages(folio));
+ return __cma_release(cma, pages, count, true);
}
int cma_for_each_area(int (*it)(struct cma *cma, void *data), void *data)
diff --git a/mm/hugetlb_cma.c b/mm/hugetlb_cma.c
index e8e4dc7182d5..fc41f3b949f8 100644
--- a/mm/hugetlb_cma.c
+++ b/mm/hugetlb_cma.c
@@ -22,33 +22,37 @@ void hugetlb_cma_free_folio(struct folio *folio)
{
int nid = folio_nid(folio);
- WARN_ON_ONCE(!cma_free_folio(hugetlb_cma[nid], folio));
+ WARN_ON_ONCE(!cma_release(hugetlb_cma[nid], &folio->page,
+ folio_nr_pages(folio)));
}
-
struct folio *hugetlb_cma_alloc_folio(int order, gfp_t gfp_mask,
int nid, nodemask_t *nodemask)
{
int node;
- struct folio *folio = NULL;
+ struct folio *folio;
+ struct page *page = NULL;
if (hugetlb_cma[nid])
- folio = cma_alloc_folio(hugetlb_cma[nid], order, gfp_mask);
+ page = cma_alloc_frozen_compound(hugetlb_cma[nid], order);
- if (!folio && !(gfp_mask & __GFP_THISNODE)) {
+ if (!page && !(gfp_mask & __GFP_THISNODE)) {
for_each_node_mask(node, *nodemask) {
if (node == nid || !hugetlb_cma[node])
continue;
- folio = cma_alloc_folio(hugetlb_cma[node], order, gfp_mask);
- if (folio)
+ page = cma_alloc_frozen_compound(hugetlb_cma[node], order);
+ if (page)
break;
}
}
- if (folio)
- folio_set_hugetlb_cma(folio);
+ if (!page)
+ return NULL;
+ set_page_refcounted(page);
+ folio = page_folio(page);
+ folio_set_hugetlb_cma(folio);
return folio;
}
diff --git a/mm/internal.h b/mm/internal.h
index 1561fc2ff5b8..ffcfde60059e 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -936,9 +936,15 @@ void init_cma_reserved_pageblock(struct page *page);
struct cma;
#ifdef CONFIG_CMA
+struct page *cma_alloc_frozen_compound(struct cma *cma, unsigned int order);
+bool cma_validate_zones(struct cma *cma);
void *cma_reserve_early(struct cma *cma, unsigned long size);
void init_cma_pageblock(struct page *page);
#else
+static inline bool cma_validate_zones(struct cma *cma)
+{
+ return false;
+}
static inline void *cma_reserve_early(struct cma *cma, unsigned long size)
{
return NULL;
--
2.27.0
^ permalink raw reply [flat|nested] 23+ messages in thread
* [PATCH v2 8/8] mm: hugetlb: allocate frozen pages in alloc_gigantic_folio()
2025-09-18 13:19 [PATCH v2 0/8] mm: hugetlb: allocate frozen gigantic folio Kefeng Wang
` (6 preceding siblings ...)
2025-09-18 13:19 ` [PATCH v2 7/8] mm: cma: add cma_alloc_frozen{_compound}() Kefeng Wang
@ 2025-09-18 13:20 ` Kefeng Wang
7 siblings, 0 replies; 23+ messages in thread
From: Kefeng Wang @ 2025-09-18 13:20 UTC (permalink / raw)
To: Andrew Morton, David Hildenbrand, Oscar Salvador, Muchun Song,
Zi Yan, Matthew Wilcox
Cc: sidhartha.kumar, jane.chu, Vlastimil Babka, Brendan Jackman,
Johannes Weiner, linux-mm, Kefeng Wang
alloc_gigantic_folio() allocates a folio via alloc_contig_range() with
its refcount incremented and then freezes it. Convert it to allocate a
frozen folio directly, which removes the atomic operations on the folio
refcount during allocation and also saves an atomic operation in
__update_and_free_hugetlb_folio(). Also rename
hugetlb_cma_{alloc,free}_folio() with a "frozen" suffix to make them more
self-explanatory.
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
---
include/linux/gfp.h | 23 -------------------
mm/hugetlb.c | 54 ++++++++++-----------------------------------
mm/hugetlb_cma.c | 11 +++++----
mm/hugetlb_cma.h | 10 ++++-----
4 files changed, 22 insertions(+), 76 deletions(-)
diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index fbbdd8c88483..82aba162f352 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -453,27 +453,4 @@ void free_contig_range_frozen(unsigned long pfn, unsigned long nr_pages);
void free_contig_range(unsigned long pfn, unsigned long nr_pages);
#endif
-#ifdef CONFIG_CONTIG_ALLOC
-static inline struct folio *folio_alloc_gigantic_noprof(int order, gfp_t gfp,
- int nid, nodemask_t *node)
-{
- struct page *page;
-
- if (WARN_ON(!order || !(gfp & __GFP_COMP)))
- return NULL;
-
- page = alloc_contig_pages_noprof(1 << order, gfp, nid, node);
-
- return page ? page_folio(page) : NULL;
-}
-#else
-static inline struct folio *folio_alloc_gigantic_noprof(int order, gfp_t gfp,
- int nid, nodemask_t *node)
-{
- return NULL;
-}
-#endif
-/* This should be paired with folio_put() rather than free_contig_range(). */
-#define folio_alloc_gigantic(...) alloc_hooks(folio_alloc_gigantic_noprof(__VA_ARGS__))
-
#endif /* __LINUX_GFP_H */
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index bc88b659a88b..ce5f94c15268 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -125,16 +125,6 @@ static void hugetlb_unshare_pmds(struct vm_area_struct *vma,
unsigned long start, unsigned long end, bool take_locks);
static struct resv_map *vma_resv_map(struct vm_area_struct *vma);
-static void hugetlb_free_folio(struct folio *folio)
-{
- if (folio_test_hugetlb_cma(folio)) {
- hugetlb_cma_free_folio(folio);
- return;
- }
-
- folio_put(folio);
-}
-
static inline bool subpool_is_free(struct hugepage_subpool *spool)
{
if (spool->count)
@@ -1472,44 +1462,22 @@ static int hstate_next_node_to_free(struct hstate *h, nodemask_t *nodes_allowed)
nr_nodes--)
#ifdef CONFIG_ARCH_HAS_GIGANTIC_PAGE
-#ifdef CONFIG_CONTIG_ALLOC
static struct folio *alloc_gigantic_folio(int order, gfp_t gfp_mask,
int nid, nodemask_t *nodemask)
{
struct folio *folio;
- bool retried = false;
-retry:
- folio = hugetlb_cma_alloc_folio(order, gfp_mask, nid, nodemask);
- if (!folio) {
- if (hugetlb_cma_exclusive_alloc())
- return NULL;
-
- folio = folio_alloc_gigantic(order, gfp_mask, nid, nodemask);
- if (!folio)
- return NULL;
- }
-
- if (folio_ref_freeze(folio, 1))
+ folio = hugetlb_cma_alloc_frozen_folio(order, gfp_mask, nid, nodemask);
+ if (folio)
return folio;
- pr_warn("HugeTLB: unexpected refcount on PFN %lu\n", folio_pfn(folio));
- hugetlb_free_folio(folio);
- if (!retried) {
- retried = true;
- goto retry;
- }
- return NULL;
-}
+ if (hugetlb_cma_exclusive_alloc())
+ return NULL;
-#else /* !CONFIG_CONTIG_ALLOC */
-static struct folio *alloc_gigantic_folio(int order, gfp_t gfp_mask, int nid,
- nodemask_t *nodemask)
-{
- return NULL;
+ folio = (struct folio *)alloc_contig_frozen_pages(1 << order, gfp_mask,
+ nid, nodemask);
+ return folio;
}
-#endif /* CONFIG_CONTIG_ALLOC */
-
#else /* !CONFIG_ARCH_HAS_GIGANTIC_PAGE */
static struct folio *alloc_gigantic_folio(int order, gfp_t gfp_mask, int nid,
nodemask_t *nodemask)
@@ -1641,9 +1609,11 @@ static void __update_and_free_hugetlb_folio(struct hstate *h,
if (unlikely(folio_test_hwpoison(folio)))
folio_clear_hugetlb_hwpoison(folio);
- folio_ref_unfreeze(folio, 1);
-
- hugetlb_free_folio(folio);
+ VM_BUG_ON_FOLIO(folio_ref_count(folio), folio);
+ if (folio_test_hugetlb_cma(folio))
+ hugetlb_cma_free_frozen_folio(folio);
+ else
+ free_frozen_pages(&folio->page, folio_order(folio));
}
/*
diff --git a/mm/hugetlb_cma.c b/mm/hugetlb_cma.c
index fc41f3b949f8..af9caaf007e4 100644
--- a/mm/hugetlb_cma.c
+++ b/mm/hugetlb_cma.c
@@ -18,16 +18,16 @@ static unsigned long hugetlb_cma_size_in_node[MAX_NUMNODES] __initdata;
static bool hugetlb_cma_only;
static unsigned long hugetlb_cma_size __initdata;
-void hugetlb_cma_free_folio(struct folio *folio)
+void hugetlb_cma_free_frozen_folio(struct folio *folio)
{
int nid = folio_nid(folio);
- WARN_ON_ONCE(!cma_release(hugetlb_cma[nid], &folio->page,
- folio_nr_pages(folio)));
+ WARN_ON_ONCE(!cma_release_frozen(hugetlb_cma[nid], &folio->page,
+ folio_nr_pages(folio)));
}
-struct folio *hugetlb_cma_alloc_folio(int order, gfp_t gfp_mask,
- int nid, nodemask_t *nodemask)
+struct folio *hugetlb_cma_alloc_frozen_folio(int order, gfp_t gfp_mask,
+ int nid, nodemask_t *nodemask)
{
int node;
struct folio *folio;
@@ -50,7 +50,6 @@ struct folio *hugetlb_cma_alloc_folio(int order, gfp_t gfp_mask,
if (!page)
return NULL;
- set_page_refcounted(page);
folio = page_folio(page);
folio_set_hugetlb_cma(folio);
return folio;
diff --git a/mm/hugetlb_cma.h b/mm/hugetlb_cma.h
index 2c2ec8a7e134..3bc295c8c38e 100644
--- a/mm/hugetlb_cma.h
+++ b/mm/hugetlb_cma.h
@@ -3,8 +3,8 @@
#define _LINUX_HUGETLB_CMA_H
#ifdef CONFIG_CMA
-void hugetlb_cma_free_folio(struct folio *folio);
-struct folio *hugetlb_cma_alloc_folio(int order, gfp_t gfp_mask,
+void hugetlb_cma_free_frozen_folio(struct folio *folio);
+struct folio *hugetlb_cma_alloc_frozen_folio(int order, gfp_t gfp_mask,
int nid, nodemask_t *nodemask);
struct huge_bootmem_page *hugetlb_cma_alloc_bootmem(struct hstate *h, int *nid,
bool node_exact);
@@ -14,12 +14,12 @@ unsigned long hugetlb_cma_total_size(void);
void hugetlb_cma_validate_params(void);
bool hugetlb_early_cma(struct hstate *h);
#else
-static inline void hugetlb_cma_free_folio(struct folio *folio)
+static inline void hugetlb_cma_free_frozen_folio(struct folio *folio)
{
}
-static inline struct folio *hugetlb_cma_alloc_folio(int order, gfp_t gfp_mask,
- int nid, nodemask_t *nodemask)
+static inline struct folio *hugetlb_cma_alloc_frozen_folio(int order,
+ gfp_t gfp_mask, int nid, nodemask_t *nodemask)
{
return NULL;
}
--
2.27.0
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH v2 1/8] mm: page_alloc: optimize pfn_range_valid_contig()
2025-09-18 13:19 ` [PATCH v2 1/8] mm: page_alloc: optimize pfn_range_valid_contig() Kefeng Wang
@ 2025-09-18 15:49 ` Zi Yan
2025-09-19 2:03 ` Kefeng Wang
2025-09-19 1:40 ` kernel test robot
` (2 subsequent siblings)
3 siblings, 1 reply; 23+ messages in thread
From: Zi Yan @ 2025-09-18 15:49 UTC (permalink / raw)
To: Kefeng Wang
Cc: Andrew Morton, David Hildenbrand, Oscar Salvador, Muchun Song,
Matthew Wilcox, sidhartha.kumar, jane.chu, Vlastimil Babka,
Brendan Jackman, Johannes Weiner, linux-mm
On 18 Sep 2025, at 9:19, Kefeng Wang wrote:
> The alloc_contig_pages() spends a lot of time in pfn_range_valid_contig(),
> we could check whether the page in this pfn range could be allocated
> before alloc_contig_range(), if the page can't be migrated, no further
> action is required, and also skip some unnecessary iterations for
> compound pages such as THP and non-compound high order buddy, which
> save times a lot too. The check is racy, but the only danger is skipping
> too much.
>
> A simple test on machine with 116G free memory, allocate 120 * 1G
> HugeTLB folios(107 successfully returned),
>
> time echo 120 > /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages
>
> Before: 0m2.124s
> After: 0m0.602s
>
> Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
> ---
> mm/page_alloc.c | 20 ++++++++++++++++++--
> 1 file changed, 18 insertions(+), 2 deletions(-)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 478beaf95f84..5b7d705e9710 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -7012,6 +7012,7 @@ static bool pfn_range_valid_contig(struct zone *z, unsigned long start_pfn,
> {
> unsigned long i, end_pfn = start_pfn + nr_pages;
> struct page *page;
> + struct folio *folio;
>
> for (i = start_pfn; i < end_pfn; i++) {
> page = pfn_to_online_page(i);
> @@ -7021,11 +7022,26 @@ static bool pfn_range_valid_contig(struct zone *z, unsigned long start_pfn,
> if (page_zone(page) != z)
> return false;
>
> - if (PageReserved(page))
> + folio = page_folio(page);
> + if (folio_test_reserved(folio))
> return false;
>
> - if (PageHuge(page))
> + if (folio_test_hugetlb(folio))
> return false;
> +
> + /* The following type of folios aren't migrated */
s/aren’t/cannot be/
> + if (folio_test_pgtable(folio) | folio_test_stack(folio))
> + return false;
Maybe worth explicitly stating these two types of pages in the commit log.
> +
> + /*
> + * For compound pages such as THP and non-compound high
> + * order buddy pages, save potentially a lot of iterations
> + * if we can skip them at once.
> + */
> + if (PageCompound(page))
> + i += (1UL << compound_order(page)) - 1;
Just a note here: if the page is a tail page, this just moves i to the
next page instead of the next folio.
> + else if (PageBuddy(page))
> + i += (1UL << buddy_order(page)) - 1;
> }
> return true;
> }
> --
> 2.27.0
Otherwise, LGTM. Reviewed-by: Zi Yan <ziy@nvidia.com>
Best Regards,
Yan, Zi
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH v2 1/8] mm: page_alloc: optimize pfn_range_valid_contig()
2025-09-18 13:19 ` [PATCH v2 1/8] mm: page_alloc: optimize pfn_range_valid_contig() Kefeng Wang
2025-09-18 15:49 ` Zi Yan
@ 2025-09-19 1:40 ` kernel test robot
2025-09-19 5:00 ` Dev Jain
2025-09-30 9:56 ` David Hildenbrand
3 siblings, 0 replies; 23+ messages in thread
From: kernel test robot @ 2025-09-19 1:40 UTC (permalink / raw)
To: Kefeng Wang, Andrew Morton, David Hildenbrand, Oscar Salvador,
Muchun Song, Zi Yan, Matthew Wilcox
Cc: llvm, oe-kbuild-all, Linux Memory Management List,
sidhartha.kumar, jane.chu, Vlastimil Babka, Brendan Jackman,
Johannes Weiner, Kefeng Wang
Hi Kefeng,
kernel test robot noticed the following build warnings:
[auto build test WARNING on akpm-mm/mm-everything]
url: https://github.com/intel-lab-lkp/linux/commits/Kefeng-Wang/mm-page_alloc-optimize-pfn_range_valid_contig/20250918-212431
base: https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-everything
patch link: https://lore.kernel.org/r/20250918132000.1951232-2-wangkefeng.wang%40huawei.com
patch subject: [PATCH v2 1/8] mm: page_alloc: optimize pfn_range_valid_contig()
config: x86_64-kexec (https://download.01.org/0day-ci/archive/20250919/202509190917.wgDdVYHL-lkp@intel.com/config)
compiler: clang version 20.1.8 (https://github.com/llvm/llvm-project 87f0227cb60147a26a1eeb4fb06e3b505e9c7261)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250919/202509190917.wgDdVYHL-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202509190917.wgDdVYHL-lkp@intel.com/
All warnings (new ones prefixed by >>):
>> mm/page_alloc.c:7033:7: warning: use of bitwise '|' with boolean operands [-Wbitwise-instead-of-logical]
7033 | if (folio_test_pgtable(folio) | folio_test_stack(folio))
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
| ||
mm/page_alloc.c:7033:7: note: cast one or both operands to int to silence this warning
1 warning generated.
vim +7033 mm/page_alloc.c
7009
7010 static bool pfn_range_valid_contig(struct zone *z, unsigned long start_pfn,
7011 unsigned long nr_pages)
7012 {
7013 unsigned long i, end_pfn = start_pfn + nr_pages;
7014 struct page *page;
7015 struct folio *folio;
7016
7017 for (i = start_pfn; i < end_pfn; i++) {
7018 page = pfn_to_online_page(i);
7019 if (!page)
7020 return false;
7021
7022 if (page_zone(page) != z)
7023 return false;
7024
7025 folio = page_folio(page);
7026 if (folio_test_reserved(folio))
7027 return false;
7028
7029 if (folio_test_hugetlb(folio))
7030 return false;
7031
7032 /* The following type of folios aren't migrated */
> 7033 if (folio_test_pgtable(folio) | folio_test_stack(folio))
7034 return false;
7035
7036 /*
7037 * For compound pages such as THP and non-compound high
7038 * order buddy pages, save potentially a lot of iterations
7039 * if we can skip them at once.
7040 */
7041 if (PageCompound(page))
7042 i += (1UL << compound_order(page)) - 1;
7043 else if (PageBuddy(page))
7044 i += (1UL << buddy_order(page)) - 1;
7045 }
7046 return true;
7047 }
7048
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH v2 1/8] mm: page_alloc: optimize pfn_range_valid_contig()
2025-09-18 15:49 ` Zi Yan
@ 2025-09-19 2:03 ` Kefeng Wang
0 siblings, 0 replies; 23+ messages in thread
From: Kefeng Wang @ 2025-09-19 2:03 UTC (permalink / raw)
To: Zi Yan
Cc: Andrew Morton, David Hildenbrand, Oscar Salvador, Muchun Song,
Matthew Wilcox, sidhartha.kumar, jane.chu, Vlastimil Babka,
Brendan Jackman, Johannes Weiner, linux-mm
On 2025/9/18 23:49, Zi Yan wrote:
> On 18 Sep 2025, at 9:19, Kefeng Wang wrote:
>
>> The alloc_contig_pages() spends a lot of time in pfn_range_valid_contig(),
>> we could check whether the page in this pfn range could be allocated
>> before alloc_contig_range(), if the page can't be migrated, no further
>> action is required, and also skip some unnecessary iterations for
>> compound pages such as THP and non-compound high order buddy, which
>> save times a lot too. The check is racy, but the only danger is skipping
>> too much.
>>
>> A simple test on machine with 116G free memory, allocate 120 * 1G
>> HugeTLB folios(107 successfully returned),
>>
>> time echo 120 > /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages
>>
>> Before: 0m2.124s
>> After: 0m0.602s
>>
>> Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
>> ---
>> mm/page_alloc.c | 20 ++++++++++++++++++--
>> 1 file changed, 18 insertions(+), 2 deletions(-)
>>
>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> index 478beaf95f84..5b7d705e9710 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -7012,6 +7012,7 @@ static bool pfn_range_valid_contig(struct zone *z, unsigned long start_pfn,
>> {
>> unsigned long i, end_pfn = start_pfn + nr_pages;
>> struct page *page;
>> + struct folio *folio;
>>
>> for (i = start_pfn; i < end_pfn; i++) {
>> page = pfn_to_online_page(i);
>> @@ -7021,11 +7022,26 @@ static bool pfn_range_valid_contig(struct zone *z, unsigned long start_pfn,
>> if (page_zone(page) != z)
>> return false;
>>
>> - if (PageReserved(page))
>> + folio = page_folio(page);
>> + if (folio_test_reserved(folio))
>> return false;
>>
>> - if (PageHuge(page))
>> + if (folio_test_hugetlb(folio))
>> return false;
>> +
>> + /* The following type of folios aren't migrated */
>
> s/aren’t/cannot be/
ACK.
>
>> + if (folio_test_pgtable(folio) | folio_test_stack(folio))
>> + return false;
should be "||", will fix.
>
> Maybe worth explicitly stating these two types of pages in the commit log.
>
OK, will update.
>> +
>> + /*
>> + * For compound pages such as THP and non-compound high
>> + * order buddy pages, save potentially a lot of iterations
>> + * if we can skip them at once.
>> + */
>> + if (PageCompound(page))
>> + i += (1UL << compound_order(page)) - 1;
>
> Just a note here, if page is tail, this just move i to the next page
> instead of next folio.
As no reference is held, it is not too precise, but we optimize for the
most common scenarios.
>
>> + else if (PageBuddy(page))
>> + i += (1UL << buddy_order(page)) - 1;
>> }
>> return true;
>> }
>> --
>> 2.27.0
>
> Otherwise, LGTM. Reviewed-by: Zi Yan <ziy@nvidia.com>
>
>
Thanks.
> Best Regards,
> Yan, Zi
>
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH v2 1/8] mm: page_alloc: optimize pfn_range_valid_contig()
2025-09-18 13:19 ` [PATCH v2 1/8] mm: page_alloc: optimize pfn_range_valid_contig() Kefeng Wang
2025-09-18 15:49 ` Zi Yan
2025-09-19 1:40 ` kernel test robot
@ 2025-09-19 5:00 ` Dev Jain
2025-09-20 8:19 ` Kefeng Wang
2025-09-30 9:56 ` David Hildenbrand
3 siblings, 1 reply; 23+ messages in thread
From: Dev Jain @ 2025-09-19 5:00 UTC (permalink / raw)
To: Kefeng Wang, Andrew Morton, David Hildenbrand, Oscar Salvador,
Muchun Song, Zi Yan, Matthew Wilcox
Cc: sidhartha.kumar, jane.chu, Vlastimil Babka, Brendan Jackman,
Johannes Weiner, linux-mm
On 18/09/25 6:49 pm, Kefeng Wang wrote:
> The alloc_contig_pages() spends a lot of time in pfn_range_valid_contig(),
> we could check whether the page in this pfn range could be allocated
> before alloc_contig_range(), if the page can't be migrated, no further
> action is required, and also skip some unnecessary iterations for
> compound pages such as THP and non-compound high order buddy, which
> save times a lot too. The check is racy, but the only danger is skipping
> too much.
>
> A simple test on machine with 116G free memory, allocate 120 * 1G
> HugeTLB folios(107 successfully returned),
>
> time echo 120 > /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages
>
> Before: 0m2.124s
> After: 0m0.602s
>
> Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
> ---
> mm/page_alloc.c | 20 ++++++++++++++++++--
> 1 file changed, 18 insertions(+), 2 deletions(-)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 478beaf95f84..5b7d705e9710 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -7012,6 +7012,7 @@ static bool pfn_range_valid_contig(struct zone *z, unsigned long start_pfn,
> {
> unsigned long i, end_pfn = start_pfn + nr_pages;
> struct page *page;
> + struct folio *folio;
>
> for (i = start_pfn; i < end_pfn; i++) {
> page = pfn_to_online_page(i);
> @@ -7021,11 +7022,26 @@ static bool pfn_range_valid_contig(struct zone *z, unsigned long start_pfn,
> if (page_zone(page) != z)
> return false;
>
> - if (PageReserved(page))
> + folio = page_folio(page);
> + if (folio_test_reserved(folio))
> return false;
>
> - if (PageHuge(page))
> + if (folio_test_hugetlb(folio))
> return false;
> +
> + /* The following type of folios aren't migrated */
> + if (folio_test_pgtable(folio) | folio_test_stack(folio))
> + return false;
> +
> + /*
> + * For compound pages such as THP and non-compound high
> + * order buddy pages, save potentially a lot of iterations
> + * if we can skip them at once.
> + */
> + if (PageCompound(page))
> + i += (1UL << compound_order(page)) - 1;
Can we instead do
if (folio_test_large(folio))
i += folio_nr_pages(folio) - folio_page_idx(folio, page).
> + else if (PageBuddy(page))
> + i += (1UL << buddy_order(page)) - 1;
> }
> return true;
> }
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH v2 1/8] mm: page_alloc: optimize pfn_range_valid_contig()
2025-09-19 5:00 ` Dev Jain
@ 2025-09-20 8:19 ` Kefeng Wang
0 siblings, 0 replies; 23+ messages in thread
From: Kefeng Wang @ 2025-09-20 8:19 UTC (permalink / raw)
To: Dev Jain, Andrew Morton, David Hildenbrand, Oscar Salvador,
Muchun Song, Zi Yan, Matthew Wilcox
Cc: sidhartha.kumar, jane.chu, Vlastimil Babka, Brendan Jackman,
Johannes Weiner, linux-mm
On 2025/9/19 13:00, Dev Jain wrote:
>
> On 18/09/25 6:49 pm, Kefeng Wang wrote:
>> The alloc_contig_pages() spends a lot of time in
>> pfn_range_valid_contig(),
>> we could check whether the page in this pfn range could be allocated
>> before alloc_contig_range(), if the page can't be migrated, no further
>> action is required, and also skip some unnecessary iterations for
>> compound pages such as THP and non-compound high order buddy, which
>> save times a lot too. The check is racy, but the only danger is skipping
>> too much.
>>
>> A simple test on machine with 116G free memory, allocate 120 * 1G
>> HugeTLB folios(107 successfully returned),
>>
>> time echo 120 > /sys/kernel/mm/hugepages/hugepages-1048576kB/
>> nr_hugepages
>>
>> Before: 0m2.124s
>> After: 0m0.602s
>>
>> Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
>> ---
>> mm/page_alloc.c | 20 ++++++++++++++++++--
>> 1 file changed, 18 insertions(+), 2 deletions(-)
>>
>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> index 478beaf95f84..5b7d705e9710 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -7012,6 +7012,7 @@ static bool pfn_range_valid_contig(struct zone
>> *z, unsigned long start_pfn,
>> {
>> unsigned long i, end_pfn = start_pfn + nr_pages;
>> struct page *page;
>> + struct folio *folio;
>> for (i = start_pfn; i < end_pfn; i++) {
>> page = pfn_to_online_page(i);
>> @@ -7021,11 +7022,26 @@ static bool pfn_range_valid_contig(struct zone
>> *z, unsigned long start_pfn,
>> if (page_zone(page) != z)
>> return false;
>> - if (PageReserved(page))
>> + folio = page_folio(page);
>> + if (folio_test_reserved(folio))
>> return false;
>> - if (PageHuge(page))
>> + if (folio_test_hugetlb(folio))
>> return false;
>> +
>> + /* The following type of folios aren't migrated */
>> + if (folio_test_pgtable(folio) | folio_test_stack(folio))
>> + return false;
>> +
>> + /*
>> + * For compound pages such as THP and non-compound high
>> + * order buddy pages, save potentially a lot of iterations
>> + * if we can skip them at once.
>> + */
>> + if (PageCompound(page))
>> + i += (1UL << compound_order(page)) - 1;
>
> Can we instead do
> if (folio_test_large(folio))
> i += folio_nr_pages(folio) - folio_page_idx(folio, page).
I'm afraid not, see 9342bc134ae7 ("mm/memory_hotplug: fix call
folio_test_large with tail page in do_migrate_range").
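Without a reference, something along those lines could act on stale
folio metadata (a sketch of the hazard only, not a proposal):

	folio = page_folio(page);	/* page may be a tail page */
	/*
	 * Nothing pins the folio here, so it can be split or freed
	 * concurrently; folio_test_large()/folio_nr_pages()/
	 * folio_page_idx() may then read metadata of something that is
	 * no longer a head page and compute a bogus jump.
	 */
	if (folio_test_large(folio))
		i += folio_nr_pages(folio) - folio_page_idx(folio, page);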
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH v2 1/8] mm: page_alloc: optimize pfn_range_valid_contig()
2025-09-18 13:19 ` [PATCH v2 1/8] mm: page_alloc: optimize pfn_range_valid_contig() Kefeng Wang
` (2 preceding siblings ...)
2025-09-19 5:00 ` Dev Jain
@ 2025-09-30 9:56 ` David Hildenbrand
2025-10-09 12:40 ` Kefeng Wang
3 siblings, 1 reply; 23+ messages in thread
From: David Hildenbrand @ 2025-09-30 9:56 UTC (permalink / raw)
To: Kefeng Wang, Andrew Morton, Oscar Salvador, Muchun Song, Zi Yan,
Matthew Wilcox
Cc: sidhartha.kumar, jane.chu, Vlastimil Babka, Brendan Jackman,
Johannes Weiner, linux-mm
On 18.09.25 15:19, Kefeng Wang wrote:
> The alloc_contig_pages() spends a lot of time in pfn_range_valid_contig(),
> we could check whether the page in this pfn range could be allocated
> before alloc_contig_range(), if the page can't be migrated, no further
> action is required, and also skip some unnecessary iterations for
> compound pages such as THP and non-compound high order buddy, which
> save times a lot too. The check is racy, but the only danger is skipping
> too much.
>
> A simple test on machine with 116G free memory, allocate 120 * 1G
> HugeTLB folios(107 successfully returned),
>
> time echo 120 > /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages
>
> Before: 0m2.124s
> After: 0m0.602s
>
> Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
> ---
> mm/page_alloc.c | 20 ++++++++++++++++++--
> 1 file changed, 18 insertions(+), 2 deletions(-)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 478beaf95f84..5b7d705e9710 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -7012,6 +7012,7 @@ static bool pfn_range_valid_contig(struct zone *z, unsigned long start_pfn,
> {
> unsigned long i, end_pfn = start_pfn + nr_pages;
> struct page *page;
> + struct folio *folio;
>
> for (i = start_pfn; i < end_pfn; i++) {
> page = pfn_to_online_page(i);
> @@ -7021,11 +7022,26 @@ static bool pfn_range_valid_contig(struct zone *z, unsigned long start_pfn,
> if (page_zone(page) != z)
> return false;
>
> - if (PageReserved(page))
> + folio = page_folio(page);
> + if (folio_test_reserved(folio))
> return false;
>
> - if (PageHuge(page))
> + if (folio_test_hugetlb(folio))
> return false;
> +
> + /* The following type of folios aren't migrated */
> + if (folio_test_pgtable(folio) | folio_test_stack(folio))
> + return false;
> +
I don't enjoy us open-coding this here. has_unmovable_pages() has much
better heuristics.
I suggest you drop this patch for now from this series, as it seems to
be independent from the rest, and instead see if you could reuse some of
the has_unmovable_pages() logic instead.
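Roughly along these lines (an untested sketch only; has_unmovable_pages()
is currently a static helper in mm/page_isolation.c, so it would have to
be exposed first, and its exact signature may differ):

	/* hypothetical reuse instead of the open-coded page checks */
	if (has_unmovable_pages(start_pfn, end_pfn, MIGRATE_MOVABLE, 0))
		return false;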
--
Cheers
David / dhildenb
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH v2 2/8] mm: hugetlb: optimize replace_free_hugepage_folios()
2025-09-18 13:19 ` [PATCH v2 2/8] mm: hugetlb: optimize replace_free_hugepage_folios() Kefeng Wang
@ 2025-09-30 9:57 ` David Hildenbrand
2025-10-09 12:40 ` Kefeng Wang
0 siblings, 1 reply; 23+ messages in thread
From: David Hildenbrand @ 2025-09-30 9:57 UTC (permalink / raw)
To: Kefeng Wang, Andrew Morton, Oscar Salvador, Muchun Song, Zi Yan,
Matthew Wilcox
Cc: sidhartha.kumar, jane.chu, Vlastimil Babka, Brendan Jackman,
Johannes Weiner, linux-mm
On 18.09.25 15:19, Kefeng Wang wrote:
> No need to replace free hugepage folios if no free hugetlb folios,
> we don't replace gigantic folio, so use isolate_or_dissolve_huge_folio(),
> also skip some pfn iterations for compound pages such as THP and
> non-compound high order buddy to save time.
>
> A simple test on machine with 116G free memory, allocate 120 * 1G
> HugeTLB folios(107 successfully returned),
>
> time echo 120 > /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages
>
> Before: 0m0.602s
> After: 0m0.429s
Also this patch feels misplaced in this series. I suggest you send that
out separately.
Or is there anything important that I am missing?
--
Cheers
David / dhildenb
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH v2 3/8] mm: debug_vm_pgtable: add debug_vm_pgtable_free_huge_page()
2025-09-18 13:19 ` [PATCH v2 3/8] mm: debug_vm_pgtable: add debug_vm_pgtable_free_huge_page() Kefeng Wang
@ 2025-09-30 10:01 ` David Hildenbrand
0 siblings, 0 replies; 23+ messages in thread
From: David Hildenbrand @ 2025-09-30 10:01 UTC (permalink / raw)
To: Kefeng Wang, Andrew Morton, Oscar Salvador, Muchun Song, Zi Yan,
Matthew Wilcox
Cc: sidhartha.kumar, jane.chu, Vlastimil Babka, Brendan Jackman,
Johannes Weiner, linux-mm
On 18.09.25 15:19, Kefeng Wang wrote:
> Add a new helper to free a huge page, for consistency with
> debug_vm_pgtable_alloc_huge_page(), and use HPAGE_PUD_ORDER
> instead of open-coding it.
>
> Also move free_contig_range() under CONFIG_ALLOC_CONTIG
> since all callers are built with CONFIG_ALLOC_CONTIG.
>
> Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
> ---
Acked-by: David Hildenbrand <david@redhat.com>
--
Cheers
David / dhildenb
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH v2 4/8] mm: page_alloc: add split_non_compound_page()
2025-09-18 13:19 ` [PATCH v2 4/8] mm: page_alloc: add split_non_compound_page() Kefeng Wang
@ 2025-09-30 10:06 ` David Hildenbrand
2025-10-09 12:40 ` Kefeng Wang
0 siblings, 1 reply; 23+ messages in thread
From: David Hildenbrand @ 2025-09-30 10:06 UTC (permalink / raw)
To: Kefeng Wang, Andrew Morton, Oscar Salvador, Muchun Song, Zi Yan,
Matthew Wilcox
Cc: sidhartha.kumar, jane.chu, Vlastimil Babka, Brendan Jackman,
Johannes Weiner, linux-mm
On 18.09.25 15:19, Kefeng Wang wrote:
> Add new split_non_compound_page() to simplify make_alloc_exact().
>
"Factor out the splitting of non-compound page from make_alloc_exact()
and split_page() into a new helper function split_non_compound_page()".
Not sure I enjoy the name "split_non_compound_page()", but it matches
the existing theme of split_page(): we're not really splitting any
pages, we're just adjusting tracking metadata for pages part of the
original-higher-order-page so it can be freed separately later.
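For context, the typical split_page() usage pattern is roughly the
following (illustrative sketch only, error handling elided):

	/* a non-compound higher-order allocation ... */
	struct page *page = alloc_pages(GFP_KERNEL, 3);

	/* ... whose sub-pages become individually freeable after the split */
	split_page(page, 3);
	__free_page(page + 1);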
But now I think of it, the terminology is bad if you look at the
description of split_page(): "split_page takes a non-compound
higher-order page, and splits it into n (1<<order) sub-pages:
page[0..n]". It's unclear how split_non_compound_page() would really differ.
I would suggest you call the new helper simply "__split_page" ?
--
Cheers
David / dhildenb
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH v2 6/8] mm: cma: add __cma_release()
2025-09-18 13:19 ` [PATCH v2 6/8] mm: cma: add __cma_release() Kefeng Wang
@ 2025-09-30 10:15 ` David Hildenbrand
2025-10-09 12:40 ` Kefeng Wang
0 siblings, 1 reply; 23+ messages in thread
From: David Hildenbrand @ 2025-09-30 10:15 UTC (permalink / raw)
To: Kefeng Wang, Andrew Morton, Oscar Salvador, Muchun Song, Zi Yan,
Matthew Wilcox
Cc: sidhartha.kumar, jane.chu, Vlastimil Babka, Brendan Jackman,
Johannes Weiner, linux-mm
On 18.09.25 15:19, Kefeng Wang wrote:
> Kill cma_pages_valid() which only used in cma_release(), also
> cleanup code duplication between cma pages valid checking and
> cma memrange finding, add __cma_release() helper to prepare for
> the upcoming frozen page release.
>
> Reviewed-by: Jane Chu <jane.chu@oracle.com>
> Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
> ---
> include/linux/cma.h | 1 -
> mm/cma.c | 57 ++++++++++++---------------------------------
> 2 files changed, 15 insertions(+), 43 deletions(-)
>
> diff --git a/include/linux/cma.h b/include/linux/cma.h
> index 62d9c1cf6326..e5745d2aec55 100644
> --- a/include/linux/cma.h
> +++ b/include/linux/cma.h
> @@ -49,7 +49,6 @@ extern int cma_init_reserved_mem(phys_addr_t base, phys_addr_t size,
> struct cma **res_cma);
> extern struct page *cma_alloc(struct cma *cma, unsigned long count, unsigned int align,
> bool no_warn);
> -extern bool cma_pages_valid(struct cma *cma, const struct page *pages, unsigned long count);
> extern bool cma_release(struct cma *cma, const struct page *pages, unsigned long count);
>
> extern int cma_for_each_area(int (*it)(struct cma *cma, void *data), void *data);
> diff --git a/mm/cma.c b/mm/cma.c
> index 813e6dc7b095..2af8c5bc58dd 100644
> --- a/mm/cma.c
> +++ b/mm/cma.c
> @@ -942,34 +942,36 @@ struct folio *cma_alloc_folio(struct cma *cma, int order, gfp_t gfp)
> return page ? page_folio(page) : NULL;
> }
>
> -bool cma_pages_valid(struct cma *cma, const struct page *pages,
> - unsigned long count)
> +static bool __cma_release(struct cma *cma, const struct page *pages,
> + unsigned long count)
> {
> unsigned long pfn, end;
> int r;
> struct cma_memrange *cmr;
> - bool ret;
> +
> + pr_debug("%s(page %p, count %lu)\n", __func__, (void *)pages, count);
>
> if (!cma || !pages || count > cma->count)
> return false;
>
> pfn = page_to_pfn(pages);
> - ret = false;
>
> for (r = 0; r < cma->nranges; r++) {
> cmr = &cma->ranges[r];
> end = cmr->base_pfn + cmr->count;
> - if (pfn >= cmr->base_pfn && pfn < end) {
> - ret = pfn + count <= end;
> + if (pfn >= cmr->base_pfn && pfn < end && pfn + count <= end)
Are you afraid of overflows here, or why can't it simply be
if (pfn >= cmr->base_pfn && pfn + count <= end)
But I wonder if we want to keep here
if (pfn >= cmr->base_pfn && pfn < end)
And VM_WARN if the area does not completely fit into the range. See below.
> break;
> - }
> }
>
> - if (!ret)
> - pr_debug("%s(page %p, count %lu)\n",
> - __func__, (void *)pages, count);
> + if (r == cma->nranges)
> + return false;
Would we want to warn one way or the other in that case? Is it valid
that someone tries to free a wrong range?
Note that the original code had this pr_debug() in case no range for the
start pfn was found (IIUC, it's confusing) and this VM_BUG_ON(end_pfn >
cmr->base_pfn + cmr->count) in case a range was found but the area would
not completely fit into it.
You're not discussing that behavioral change in the changelog, and I
think we would want to keep some sanity checks, likely in a
VM_WARN_ON_ONCE() form.
--
Cheers
David / dhildenb
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH v2 6/8] mm: cma: add __cma_release()
2025-09-30 10:15 ` David Hildenbrand
@ 2025-10-09 12:40 ` Kefeng Wang
0 siblings, 0 replies; 23+ messages in thread
From: Kefeng Wang @ 2025-10-09 12:40 UTC (permalink / raw)
To: David Hildenbrand, Andrew Morton, Oscar Salvador, Muchun Song,
Zi Yan, Matthew Wilcox
Cc: sidhartha.kumar, jane.chu, Vlastimil Babka, Brendan Jackman,
Johannes Weiner, linux-mm
On 2025/9/30 18:15, David Hildenbrand wrote:
> On 18.09.25 15:19, Kefeng Wang wrote:
>> Kill cma_pages_valid() which only used in cma_release(), also
>> cleanup code duplication between cma pages valid checking and
>> cma memrange finding, add __cma_release() helper to prepare for
>> the upcoming frozen page release.
>>
>> Reviewed-by: Jane Chu <jane.chu@oracle.com>
>> Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
>> ---
>> include/linux/cma.h | 1 -
>> mm/cma.c | 57 ++++++++++++---------------------------------
>> 2 files changed, 15 insertions(+), 43 deletions(-)
>>
>> diff --git a/include/linux/cma.h b/include/linux/cma.h
>> index 62d9c1cf6326..e5745d2aec55 100644
>> --- a/include/linux/cma.h
>> +++ b/include/linux/cma.h
>> @@ -49,7 +49,6 @@ extern int cma_init_reserved_mem(phys_addr_t base,
>> phys_addr_t size,
>> struct cma **res_cma);
>> extern struct page *cma_alloc(struct cma *cma, unsigned long count,
>> unsigned int align,
>> bool no_warn);
>> -extern bool cma_pages_valid(struct cma *cma, const struct page
>> *pages, unsigned long count);
>> extern bool cma_release(struct cma *cma, const struct page *pages,
>> unsigned long count);
>> extern int cma_for_each_area(int (*it)(struct cma *cma, void *data),
>> void *data);
>> diff --git a/mm/cma.c b/mm/cma.c
>> index 813e6dc7b095..2af8c5bc58dd 100644
>> --- a/mm/cma.c
>> +++ b/mm/cma.c
>> @@ -942,34 +942,36 @@ struct folio *cma_alloc_folio(struct cma *cma,
>> int order, gfp_t gfp)
>> return page ? page_folio(page) : NULL;
>> }
>> -bool cma_pages_valid(struct cma *cma, const struct page *pages,
>> - unsigned long count)
>> +static bool __cma_release(struct cma *cma, const struct page *pages,
>> + unsigned long count)
>> {
>> unsigned long pfn, end;
>> int r;
>> struct cma_memrange *cmr;
>> - bool ret;
>> +
>> + pr_debug("%s(page %p, count %lu)\n", __func__, (void *)pages,
>> count);
>> if (!cma || !pages || count > cma->count)
>> return false;
>> pfn = page_to_pfn(pages);
>> - ret = false;
>> for (r = 0; r < cma->nranges; r++) {
>> cmr = &cma->ranges[r];
>> end = cmr->base_pfn + cmr->count;
>> - if (pfn >= cmr->base_pfn && pfn < end) {
>> - ret = pfn + count <= end;
>> + if (pfn >= cmr->base_pfn && pfn < end && pfn + count <= end)
>
> Are you afraid of overflows here, or why can't it simply be
>
> if (pfn >= cmr->base_pfn && pfn + count <= end)
>
> But I wonder if we want to keep here
>
> if (pfn >= cmr->base_pfn && pfn < end)
>
> And VM_WARN if the area does not completely fit into the range. See below.
>
>
>> break;
>> - }
>> }
>> - if (!ret)
>> - pr_debug("%s(page %p, count %lu)\n",
>> - __func__, (void *)pages, count);
>> + if (r == cma->nranges)
>> + return false;
>
> Would we want to warn one way or the other in that case? Is it valid
> that someone tries to free a wrong range?
The original cma_pages_valid() checks whether the start pfn lies within a
cma range, and the whole area must fit completely within that range.
The repeated check "VM_BUG_ON(pfn + count > end)" in cma_release()
can never trigger, since we return early when cma_pages_valid() fails.
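I.e. the pre-series flow is roughly the following (trimmed sketch from
memory, not the exact code):

	/* cma_release(), trimmed */
	if (!cma_pages_valid(cma, pages, count))  /* ensures pfn + count <= end */
		return false;

	for (r = 0; r < cma->nranges; r++) {
		cmr = &cma->ranges[r];
		if (pfn >= cmr->base_pfn && pfn < cmr->base_pfn + cmr->count) {
			/* cannot fire, cma_pages_valid() checked the same bound */
			VM_BUG_ON(end_pfn > cmr->base_pfn + cmr->count);
			break;
		}
	}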
>
> Note that the original code had this pr_debug() in case no range for the
> start pfn was found (IIUC, it's confusing) and this VM_BUG_ON(end_pfn >
> cmr->base_pfn + cmr->count) in case a range was found but the area would
> not completely fit into it.
So the VM_BUG_ON is not useful.
>
> You're not discussing that behavioral change in the changelog, and I
> think we would want to keep some sanity checks, likely in a
> VM_WARN_ON_ONCE() form.
>
>
But for the error path, adding some debug info is better. Here is a quick
diff based on this patch; what do you think?
diff --git a/mm/cma.c b/mm/cma.c
index 2af8c5bc58dd..88016f4aef7f 100644
--- a/mm/cma.c
+++ b/mm/cma.c
@@ -959,12 +959,19 @@ static bool __cma_release(struct cma *cma, const struct page *pages,
for (r = 0; r < cma->nranges; r++) {
cmr = &cma->ranges[r];
end = cmr->base_pfn + cmr->count;
- if (pfn >= cmr->base_pfn && pfn < end && pfn + count <= end)
- break;
+ if (pfn >= cmr->base_pfn && pfn < end) {
+ if (pfn + count <= end)
+ break;
+
+ VM_WARN_ON_ONCE(1);
+ }
}
- if (r == cma->nranges)
+ if (r == cma->nranges) {
+ pr_debug("%s(no cma range match the page %p)\n",
+ __func__, (void *)pages);
return false;
+ }
free_contig_range(pfn, count);
cma_clear_bitmap(cma, cmr, pfn, count);
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH v2 1/8] mm: page_alloc: optimize pfn_range_valid_contig()
2025-09-30 9:56 ` David Hildenbrand
@ 2025-10-09 12:40 ` Kefeng Wang
0 siblings, 0 replies; 23+ messages in thread
From: Kefeng Wang @ 2025-10-09 12:40 UTC (permalink / raw)
To: David Hildenbrand, Andrew Morton, Oscar Salvador, Muchun Song,
Zi Yan, Matthew Wilcox
Cc: sidhartha.kumar, jane.chu, Vlastimil Babka, Brendan Jackman,
Johannes Weiner, linux-mm
On 2025/9/30 17:56, David Hildenbrand wrote:
> On 18.09.25 15:19, Kefeng Wang wrote:
>> The alloc_contig_pages() spends a lot of time in
>> pfn_range_valid_contig(),
>> we could check whether the page in this pfn range could be allocated
>> before alloc_contig_range(), if the page can't be migrated, no further
>> action is required, and also skip some unnecessary iterations for
>> compound pages such as THP and non-compound high order buddy, which
>> save times a lot too. The check is racy, but the only danger is skipping
>> too much.
>>
>> A simple test on machine with 116G free memory, allocate 120 * 1G
>> HugeTLB folios(107 successfully returned),
>>
>> time echo 120 > /sys/kernel/mm/hugepages/hugepages-1048576kB/
>> nr_hugepages
>>
>> Before: 0m2.124s
>> After: 0m0.602s
>>
>> Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
>> ---
>> mm/page_alloc.c | 20 ++++++++++++++++++--
>> 1 file changed, 18 insertions(+), 2 deletions(-)
>>
>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> index 478beaf95f84..5b7d705e9710 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -7012,6 +7012,7 @@ static bool pfn_range_valid_contig(struct zone
>> *z, unsigned long start_pfn,
>> {
>> unsigned long i, end_pfn = start_pfn + nr_pages;
>> struct page *page;
>> + struct folio *folio;
>> for (i = start_pfn; i < end_pfn; i++) {
>> page = pfn_to_online_page(i);
>> @@ -7021,11 +7022,26 @@ static bool pfn_range_valid_contig(struct zone
>> *z, unsigned long start_pfn,
>> if (page_zone(page) != z)
>> return false;
>> - if (PageReserved(page))
>> + folio = page_folio(page);
>> + if (folio_test_reserved(folio))
>> return false;
>> - if (PageHuge(page))
>> + if (folio_test_hugetlb(folio))
>> return false;
>> +
>> + /* The following type of folios aren't migrated */
>> + if (folio_test_pgtable(folio) | folio_test_stack(folio))
>> + return false;
>> +
>
> I don't enjoy us open coding this here. has_unmovable_pages() has a much
> better heuristics.
>
> I suggest you drop this patch for now from this series, as it seems to
> be independent from the rest, and instead see if you could reuse some of
> the has_unmovable_pages() logic instead.
>
OK, I will try to check whether has_unmovable_pages() could be used.
The new patches were added when I tested alloc_contig_pages() and
alloc_contig_frozen_pages() with different GFP flags [1].
Let me remove them and resend them separately.
[1]
https://lore.kernel.org/linux-mm/39ea6d31-ec9c-4053-a875-8e86a8676a62@huawei.com/
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH v2 2/8] mm: hugetlb: optimize replace_free_hugepage_folios()
2025-09-30 9:57 ` David Hildenbrand
@ 2025-10-09 12:40 ` Kefeng Wang
0 siblings, 0 replies; 23+ messages in thread
From: Kefeng Wang @ 2025-10-09 12:40 UTC (permalink / raw)
To: David Hildenbrand, Andrew Morton, Oscar Salvador, Muchun Song,
Zi Yan, Matthew Wilcox
Cc: sidhartha.kumar, jane.chu, Vlastimil Babka, Brendan Jackman,
Johannes Weiner, linux-mm
On 2025/9/30 17:57, David Hildenbrand wrote:
> On 18.09.25 15:19, Kefeng Wang wrote:
>> No need to replace free hugepage folios if no free hugetlb folios,
>> we don't replace gigantic folio, so use isolate_or_dissolve_huge_folio(),
>> also skip some pfn iterations for compound pages such as THP and
>> non-compound high order buddy to save time.
>>
>> A simple test on machine with 116G free memory, allocate 120 * 1G
>> HugeTLB folios(107 successfully returned),
>>
>> time echo 120 > /sys/kernel/mm/hugepages/hugepages-1048576kB/
>> nr_hugepages
>>
>> Before: 0m0.602s
>> After: 0m0.429s
>
> Also this patch feels misplaced in this series. I suggest you send that
> out separately.
>
> Or is there anything important that I am missing?
>
Sure, let me do it separately.
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH v2 4/8] mm: page_alloc: add split_non_compound_page()
2025-09-30 10:06 ` David Hildenbrand
@ 2025-10-09 12:40 ` Kefeng Wang
0 siblings, 0 replies; 23+ messages in thread
From: Kefeng Wang @ 2025-10-09 12:40 UTC (permalink / raw)
To: David Hildenbrand, Andrew Morton, Oscar Salvador, Muchun Song,
Zi Yan, Matthew Wilcox
Cc: sidhartha.kumar, jane.chu, Vlastimil Babka, Brendan Jackman,
Johannes Weiner, linux-mm
On 2025/9/30 18:06, David Hildenbrand wrote:
> On 18.09.25 15:19, Kefeng Wang wrote:
>> Add new split_non_compound_page() to simplify make_alloc_exact().
>>
>
> "Factor out the splitting of non-compound page from make_alloc_exact()
> and split_page() into a new helper function split_non_compound_page()".
>
Thanks, will update the changelog.
> Not sure I enjoy the name "split_non_compound_page()", but it matches
> the existing theme of split_page(): we're not really splitting any
> pages, we're just adjusting tracking metadata for pages part of the
> original-higher-order-page so it can be freed separately later.
>
> But now I think of it, the terminology is bad if you look at the
> description of split_page(): "split_page takes a non-compound higher-
> order page, and splits it into n (1<<order) sub-pages: page[0..n]". It's
> unclear how split_non_compound_page() would really differ.
>
> I would suggest you call the new helper simply "__split_page" ?
>
>
OK, naming always beats me; will change it.
^ permalink raw reply [flat|nested] 23+ messages in thread