linux-mm.kvack.org archive mirror
* [PATCH mm-unstable v1 0/3] mm/hugetlb: alloc/free gigantic folios
@ 2024-08-11 21:21 Yu Zhao
  2024-08-11 21:21 ` [PATCH mm-unstable v1 1/3] mm/contig_alloc: support __GFP_COMP Yu Zhao
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Yu Zhao @ 2024-08-11 21:21 UTC (permalink / raw)
  To: Andrew Morton, Muchun Song
  Cc: Matthew Wilcox (Oracle), Zi Yan, linux-mm, linux-kernel, Yu Zhao

Using __GFP_COMP for gigantic folios greatly reduces not only the
complexity of the code but also the allocation and free time.

Approximate LOC to mm/hugetlb.c: -200, +50

Allocate and free 500 1GB hugeTLB folios without HVO by:
  time echo 500 >/sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages
  time echo 0 >/sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages

       Before  After
Alloc  ~13s    ~10s
Free   ~15s    <1s

The above magnitude generally holds for multiple x86 and arm64 CPU
models.

Yu Zhao (3):
  mm/contig_alloc: support __GFP_COMP
  mm/cma: add cma_alloc_folio()
  mm/hugetlb: use __GFP_COMP for gigantic folios

 include/linux/cma.h     |   1 +
 include/linux/hugetlb.h |   9 +-
 mm/cma.c                |  47 +++++---
 mm/compaction.c         |  48 +-------
 mm/hugetlb.c            | 244 ++++++++--------------------------------
 mm/internal.h           |   9 ++
 mm/page_alloc.c         | 111 +++++++++++++-----
 7 files changed, 177 insertions(+), 292 deletions(-)


base-commit: b447504e1fed49fabbc03d6c2530126824f87c92
prerequisite-patch-id: 9fe502f7c87a9f951d0aee61f426bd85bc43ef74
-- 
2.46.0.76.ge559c4bf1a-goog




* [PATCH mm-unstable v1 1/3] mm/contig_alloc: support __GFP_COMP
  2024-08-11 21:21 [PATCH mm-unstable v1 0/3] mm/hugetlb: alloc/free gigantic folios Yu Zhao
@ 2024-08-11 21:21 ` Yu Zhao
  2024-08-11 21:21 ` [PATCH mm-unstable v1 2/3] mm/cma: add cma_alloc_folio() Yu Zhao
  2024-08-11 21:21 ` [PATCH mm-unstable v1 3/3] mm/hugetlb: use __GFP_COMP for gigantic folios Yu Zhao
  2 siblings, 0 replies; 8+ messages in thread
From: Yu Zhao @ 2024-08-11 21:21 UTC (permalink / raw)
  To: Andrew Morton, Muchun Song
  Cc: Matthew Wilcox (Oracle), Zi Yan, linux-mm, linux-kernel, Yu Zhao

Support __GFP_COMP in alloc_contig_range(). When the flag is set, upon
success the function returns a large folio prepared by
prep_new_page(), rather than a range of order-0 pages prepared by
split_free_pages() (which is renamed from split_map_pages()).

alloc_contig_range() can return folios larger than MAX_PAGE_ORDER,
e.g., gigantic hugeTLB folios. As a result, on the free path,
free_one_page() needs to handle this case via split_large_buddy(), and
free_contig_range() needs to handle large folios properly via
folio_put().
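
As an illustration only (not part of the patch), a minimal sketch of how a
caller could allocate a gigantic folio with the new flag and free it again.
It assumes CONFIG_CONTIG_ALLOC and an order of at least pageblock_order;
grab_contig_folio()/drop_contig_folio() are made-up names and error
handling is trimmed:

#include <linux/gfp.h>
#include <linux/mm.h>

static struct folio *grab_contig_folio(int order, int nid, nodemask_t *nodemask)
{
	struct page *page;

	/* With __GFP_COMP, the allocated range comes back as one large folio. */
	page = alloc_contig_pages(1 << order, GFP_KERNEL | __GFP_COMP, nid, nodemask);

	return page ? page_folio(page) : NULL;
}

static void drop_contig_folio(struct folio *folio)
{
	/* free_contig_range() detects large folios and frees them by folio_put(). */
	free_contig_range(folio_pfn(folio), folio_nr_pages(folio));
}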

Signed-off-by: Yu Zhao <yuzhao@google.com>
---
 mm/compaction.c |  48 +++------------------
 mm/internal.h   |   9 ++++
 mm/page_alloc.c | 111 ++++++++++++++++++++++++++++++++++--------------
 3 files changed, 94 insertions(+), 74 deletions(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index eb95e9b435d0..1ebfef98e1d0 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -79,40 +79,6 @@ static inline bool is_via_compact_memory(int order) { return false; }
 #define COMPACTION_HPAGE_ORDER	(PMD_SHIFT - PAGE_SHIFT)
 #endif
 
-static struct page *mark_allocated_noprof(struct page *page, unsigned int order, gfp_t gfp_flags)
-{
-	post_alloc_hook(page, order, __GFP_MOVABLE);
-	return page;
-}
-#define mark_allocated(...)	alloc_hooks(mark_allocated_noprof(__VA_ARGS__))
-
-static void split_map_pages(struct list_head *freepages)
-{
-	unsigned int i, order;
-	struct page *page, *next;
-	LIST_HEAD(tmp_list);
-
-	for (order = 0; order < NR_PAGE_ORDERS; order++) {
-		list_for_each_entry_safe(page, next, &freepages[order], lru) {
-			unsigned int nr_pages;
-
-			list_del(&page->lru);
-
-			nr_pages = 1 << order;
-
-			mark_allocated(page, order, __GFP_MOVABLE);
-			if (order)
-				split_page(page, order);
-
-			for (i = 0; i < nr_pages; i++) {
-				list_add(&page->lru, &tmp_list);
-				page++;
-			}
-		}
-		list_splice_init(&tmp_list, &freepages[0]);
-	}
-}
-
 static unsigned long release_free_list(struct list_head *freepages)
 {
 	int order;
@@ -742,11 +708,11 @@ static unsigned long isolate_freepages_block(struct compact_control *cc,
  *
  * Non-free pages, invalid PFNs, or zone boundaries within the
  * [start_pfn, end_pfn) range are considered errors, cause function to
- * undo its actions and return zero.
+ * undo its actions and return zero. cc->freepages[] are empty.
  *
  * Otherwise, function returns one-past-the-last PFN of isolated page
  * (which may be greater then end_pfn if end fell in a middle of
- * a free page).
+ * a free page). cc->freepages[] contain free pages isolated.
  */
 unsigned long
 isolate_freepages_range(struct compact_control *cc,
@@ -754,10 +720,9 @@ isolate_freepages_range(struct compact_control *cc,
 {
 	unsigned long isolated, pfn, block_start_pfn, block_end_pfn;
 	int order;
-	struct list_head tmp_freepages[NR_PAGE_ORDERS];
 
 	for (order = 0; order < NR_PAGE_ORDERS; order++)
-		INIT_LIST_HEAD(&tmp_freepages[order]);
+		INIT_LIST_HEAD(&cc->freepages[order]);
 
 	pfn = start_pfn;
 	block_start_pfn = pageblock_start_pfn(pfn);
@@ -788,7 +753,7 @@ isolate_freepages_range(struct compact_control *cc,
 			break;
 
 		isolated = isolate_freepages_block(cc, &isolate_start_pfn,
-					block_end_pfn, tmp_freepages, 0, true);
+					block_end_pfn, cc->freepages, 0, true);
 
 		/*
 		 * In strict mode, isolate_freepages_block() returns 0 if
@@ -807,13 +772,10 @@ isolate_freepages_range(struct compact_control *cc,
 
 	if (pfn < end_pfn) {
 		/* Loop terminated early, cleanup. */
-		release_free_list(tmp_freepages);
+		release_free_list(cc->freepages);
 		return 0;
 	}
 
-	/* __isolate_free_page() does not map the pages */
-	split_map_pages(tmp_freepages);
-
 	/* We don't use freelists for anything. */
 	return pfn;
 }
diff --git a/mm/internal.h b/mm/internal.h
index acda347620c6..03e795ce755f 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -679,6 +679,15 @@ extern void prep_compound_page(struct page *page, unsigned int order);
 
 extern void post_alloc_hook(struct page *page, unsigned int order,
 					gfp_t gfp_flags);
+
+static inline struct page *post_alloc_hook_noprof(struct page *page, unsigned int order,
+						  gfp_t gfp_flags)
+{
+	post_alloc_hook(page, order, __GFP_MOVABLE);
+	return page;
+}
+#define mark_allocated(...) alloc_hooks(post_alloc_hook_noprof(__VA_ARGS__))
+
 extern bool free_pages_prepare(struct page *page, unsigned int order);
 
 extern int user_min_free_kbytes;
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 84a7154fde93..6c801404a108 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1196,16 +1196,36 @@ static void free_pcppages_bulk(struct zone *zone, int count,
 	spin_unlock_irqrestore(&zone->lock, flags);
 }
 
+/* Split a multi-block free page into its individual pageblocks */
+static void split_large_buddy(struct zone *zone, struct page *page,
+			      unsigned long pfn, int order, fpi_t fpi)
+{
+	unsigned long end = pfn + (1 << order);
+
+	VM_WARN_ON_ONCE(!IS_ALIGNED(pfn, 1 << order));
+	/* Caller removed page from freelist, buddy info cleared! */
+	VM_WARN_ON_ONCE(PageBuddy(page));
+
+	if (order > pageblock_order)
+		order = pageblock_order;
+
+	while (pfn != end) {
+		int mt = get_pfnblock_migratetype(page, pfn);
+
+		__free_one_page(page, pfn, zone, order, mt, fpi);
+		pfn += 1 << order;
+		page = pfn_to_page(pfn);
+	}
+}
+
 static void free_one_page(struct zone *zone, struct page *page,
 			  unsigned long pfn, unsigned int order,
 			  fpi_t fpi_flags)
 {
 	unsigned long flags;
-	int migratetype;
 
 	spin_lock_irqsave(&zone->lock, flags);
-	migratetype = get_pfnblock_migratetype(page, pfn);
-	__free_one_page(page, pfn, zone, order, migratetype, fpi_flags);
+	split_large_buddy(zone, page, pfn, order, fpi_flags);
 	spin_unlock_irqrestore(&zone->lock, flags);
 }
 
@@ -1697,27 +1717,6 @@ static unsigned long find_large_buddy(unsigned long start_pfn)
 	return start_pfn;
 }
 
-/* Split a multi-block free page into its individual pageblocks */
-static void split_large_buddy(struct zone *zone, struct page *page,
-			      unsigned long pfn, int order)
-{
-	unsigned long end_pfn = pfn + (1 << order);
-
-	VM_WARN_ON_ONCE(order <= pageblock_order);
-	VM_WARN_ON_ONCE(pfn & (pageblock_nr_pages - 1));
-
-	/* Caller removed page from freelist, buddy info cleared! */
-	VM_WARN_ON_ONCE(PageBuddy(page));
-
-	while (pfn != end_pfn) {
-		int mt = get_pfnblock_migratetype(page, pfn);
-
-		__free_one_page(page, pfn, zone, pageblock_order, mt, FPI_NONE);
-		pfn += pageblock_nr_pages;
-		page = pfn_to_page(pfn);
-	}
-}
-
 /**
  * move_freepages_block_isolate - move free pages in block for page isolation
  * @zone: the zone
@@ -1758,7 +1757,7 @@ bool move_freepages_block_isolate(struct zone *zone, struct page *page,
 		del_page_from_free_list(buddy, zone, order,
 					get_pfnblock_migratetype(buddy, pfn));
 		set_pageblock_migratetype(page, migratetype);
-		split_large_buddy(zone, buddy, pfn, order);
+		split_large_buddy(zone, buddy, pfn, order, FPI_NONE);
 		return true;
 	}
 
@@ -1769,7 +1768,7 @@ bool move_freepages_block_isolate(struct zone *zone, struct page *page,
 		del_page_from_free_list(page, zone, order,
 					get_pfnblock_migratetype(page, pfn));
 		set_pageblock_migratetype(page, migratetype);
-		split_large_buddy(zone, page, pfn, order);
+		split_large_buddy(zone, page, pfn, order, FPI_NONE);
 		return true;
 	}
 move:
@@ -6482,6 +6481,31 @@ int __alloc_contig_migrate_range(struct compact_control *cc,
 	return (ret < 0) ? ret : 0;
 }
 
+static void split_free_pages(struct list_head *list)
+{
+	int order;
+
+	for (order = 0; order < NR_PAGE_ORDERS; order++) {
+		struct page *page, *next;
+		int nr_pages = 1 << order;
+
+		list_for_each_entry_safe(page, next, &list[order], lru) {
+			int i;
+
+			mark_allocated(page, order, __GFP_MOVABLE);
+			if (!order)
+				continue;
+
+			split_page(page, order);
+
+			/* add all subpages to the order-0 head, in sequence */
+			list_del(&page->lru);
+			for (i = 0; i < nr_pages; i++)
+				list_add_tail(&page[i].lru, &list[0]);
+		}
+	}
+}
+
 /**
  * alloc_contig_range() -- tries to allocate given range of pages
  * @start:	start PFN to allocate
@@ -6594,12 +6618,25 @@ int alloc_contig_range_noprof(unsigned long start, unsigned long end,
 		goto done;
 	}
 
-	/* Free head and tail (if any) */
-	if (start != outer_start)
-		free_contig_range(outer_start, start - outer_start);
-	if (end != outer_end)
-		free_contig_range(end, outer_end - end);
+	if (!(gfp_mask & __GFP_COMP)) {
+		split_free_pages(cc.freepages);
 
+		/* Free head and tail (if any) */
+		if (start != outer_start)
+			free_contig_range(outer_start, start - outer_start);
+		if (end != outer_end)
+			free_contig_range(end, outer_end - end);
+	} else if (start == outer_start && end == outer_end && is_power_of_2(end - start)) {
+		struct page *head = pfn_to_page(start);
+		int order = ilog2(end - start);
+
+		check_new_pages(head, order);
+		prep_new_page(head, order, gfp_mask, 0);
+	} else {
+		ret = -EINVAL;
+		WARN(true, "PFN range: requested [%lu, %lu), allocated [%lu, %lu)\n",
+		     start, end, outer_start, outer_end);
+	}
 done:
 	undo_isolate_page_range(start, end, migratetype);
 	return ret;
@@ -6708,6 +6745,18 @@ struct page *alloc_contig_pages_noprof(unsigned long nr_pages, gfp_t gfp_mask,
 void free_contig_range(unsigned long pfn, unsigned long nr_pages)
 {
 	unsigned long count = 0;
+	struct folio *folio = pfn_folio(pfn);
+
+	if (folio_test_large(folio)) {
+		int expected = folio_nr_pages(folio);
+
+		if (nr_pages == expected)
+			folio_put(folio);
+		else
+			WARN(true, "PFN %lu: nr_pages %lu != expected %d\n",
+			     pfn, nr_pages, expected);
+		return;
+	}
 
 	for (; nr_pages--; pfn++) {
 		struct page *page = pfn_to_page(pfn);
-- 
2.46.0.76.ge559c4bf1a-goog




* [PATCH mm-unstable v1 2/3] mm/cma: add cma_alloc_folio()
  2024-08-11 21:21 [PATCH mm-unstable v1 0/3] mm/hugetlb: alloc/free gigantic folios Yu Zhao
  2024-08-11 21:21 ` [PATCH mm-unstable v1 1/3] mm/contig_alloc: support __GFP_COMP Yu Zhao
@ 2024-08-11 21:21 ` Yu Zhao
  2024-08-15 14:40   ` Kefeng Wang
  2024-08-11 21:21 ` [PATCH mm-unstable v1 3/3] mm/hugetlb: use __GFP_COMP for gigantic folios Yu Zhao
  2 siblings, 1 reply; 8+ messages in thread
From: Yu Zhao @ 2024-08-11 21:21 UTC (permalink / raw)
  To: Andrew Morton, Muchun Song
  Cc: Matthew Wilcox (Oracle), Zi Yan, linux-mm, linux-kernel, Yu Zhao

With alloc_contig_range() and free_contig_range() supporting large
folios, CMA can allocate and free large folios too, by
cma_alloc_folio() and cma_release().
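
As an illustration only (not part of the patch), a minimal sketch of the
intended usage. "my_cma" stands for an area set up earlier, e.g. via
cma_declare_contiguous() or cma_init_reserved_mem(); the helper names are
made up and error handling is trimmed:

#include <linux/cma.h>
#include <linux/gfp.h>
#include <linux/mm.h>

/* Allocate an order-N folio from a CMA area; __GFP_COMP is required for order > 0. */
static struct folio *cma_grab_folio(struct cma *my_cma, int order)
{
	return cma_alloc_folio(my_cma, order, GFP_KERNEL | __GFP_COMP | __GFP_NOWARN);
}

/* Return the folio to the CMA area; cma_release() now handles large folios too. */
static void cma_drop_folio(struct cma *my_cma, struct folio *folio)
{
	cma_release(my_cma, &folio->page, folio_nr_pages(folio));
}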

Signed-off-by: Yu Zhao <yuzhao@google.com>
---
 include/linux/cma.h |  1 +
 mm/cma.c            | 47 ++++++++++++++++++++++++++++++---------------
 2 files changed, 33 insertions(+), 15 deletions(-)

diff --git a/include/linux/cma.h b/include/linux/cma.h
index 9db877506ea8..086553fbda73 100644
--- a/include/linux/cma.h
+++ b/include/linux/cma.h
@@ -46,6 +46,7 @@ extern int cma_init_reserved_mem(phys_addr_t base, phys_addr_t size,
 					struct cma **res_cma);
 extern struct page *cma_alloc(struct cma *cma, unsigned long count, unsigned int align,
 			      bool no_warn);
+extern struct folio *cma_alloc_folio(struct cma *cma, int order, gfp_t gfp);
 extern bool cma_pages_valid(struct cma *cma, const struct page *pages, unsigned long count);
 extern bool cma_release(struct cma *cma, const struct page *pages, unsigned long count);
 
diff --git a/mm/cma.c b/mm/cma.c
index 95d6950e177b..46feb06db8e7 100644
--- a/mm/cma.c
+++ b/mm/cma.c
@@ -403,18 +403,8 @@ static void cma_debug_show_areas(struct cma *cma)
 	spin_unlock_irq(&cma->lock);
 }
 
-/**
- * cma_alloc() - allocate pages from contiguous area
- * @cma:   Contiguous memory region for which the allocation is performed.
- * @count: Requested number of pages.
- * @align: Requested alignment of pages (in PAGE_SIZE order).
- * @no_warn: Avoid printing message about failed allocation
- *
- * This function allocates part of contiguous memory on specific
- * contiguous memory area.
- */
-struct page *cma_alloc(struct cma *cma, unsigned long count,
-		       unsigned int align, bool no_warn)
+static struct page *__cma_alloc(struct cma *cma, unsigned long count,
+				unsigned int align, gfp_t gfp)
 {
 	unsigned long mask, offset;
 	unsigned long pfn = -1;
@@ -463,8 +453,7 @@ struct page *cma_alloc(struct cma *cma, unsigned long count,
 
 		pfn = cma->base_pfn + (bitmap_no << cma->order_per_bit);
 		mutex_lock(&cma_mutex);
-		ret = alloc_contig_range(pfn, pfn + count, MIGRATE_CMA,
-				     GFP_KERNEL | (no_warn ? __GFP_NOWARN : 0));
+		ret = alloc_contig_range(pfn, pfn + count, MIGRATE_CMA, gfp);
 		mutex_unlock(&cma_mutex);
 		if (ret == 0) {
 			page = pfn_to_page(pfn);
@@ -494,7 +483,7 @@ struct page *cma_alloc(struct cma *cma, unsigned long count,
 			page_kasan_tag_reset(nth_page(page, i));
 	}
 
-	if (ret && !no_warn) {
+	if (ret && !(gfp & __GFP_NOWARN)) {
 		pr_err_ratelimited("%s: %s: alloc failed, req-size: %lu pages, ret: %d\n",
 				   __func__, cma->name, count, ret);
 		cma_debug_show_areas(cma);
@@ -513,6 +502,34 @@ struct page *cma_alloc(struct cma *cma, unsigned long count,
 	return page;
 }
 
+/**
+ * cma_alloc() - allocate pages from contiguous area
+ * @cma:   Contiguous memory region for which the allocation is performed.
+ * @count: Requested number of pages.
+ * @align: Requested alignment of pages (in PAGE_SIZE order).
+ * @no_warn: Avoid printing message about failed allocation
+ *
+ * This function allocates part of contiguous memory on specific
+ * contiguous memory area.
+ */
+struct page *cma_alloc(struct cma *cma, unsigned long count,
+		       unsigned int align, bool no_warn)
+{
+	return __cma_alloc(cma, count, align, GFP_KERNEL | (no_warn ? __GFP_NOWARN : 0));
+}
+
+struct folio *cma_alloc_folio(struct cma *cma, int order, gfp_t gfp)
+{
+	struct page *page;
+
+	if (WARN_ON(order && !(gfp & __GFP_COMP)))
+		return NULL;
+
+	page = __cma_alloc(cma, 1 << order, order, gfp);
+
+	return page ? page_folio(page) : NULL;
+}
+
 bool cma_pages_valid(struct cma *cma, const struct page *pages,
 		     unsigned long count)
 {
-- 
2.46.0.76.ge559c4bf1a-goog




* [PATCH mm-unstable v1 3/3] mm/hugetlb: use __GFP_COMP for gigantic folios
  2024-08-11 21:21 [PATCH mm-unstable v1 0/3] mm/hugetlb: alloc/free gigantic folios Yu Zhao
  2024-08-11 21:21 ` [PATCH mm-unstable v1 1/3] mm/contig_alloc: support __GFP_COMP Yu Zhao
  2024-08-11 21:21 ` [PATCH mm-unstable v1 2/3] mm/cma: add cma_alloc_folio() Yu Zhao
@ 2024-08-11 21:21 ` Yu Zhao
  2 siblings, 0 replies; 8+ messages in thread
From: Yu Zhao @ 2024-08-11 21:21 UTC (permalink / raw)
  To: Andrew Morton, Muchun Song
  Cc: Matthew Wilcox (Oracle), Zi Yan, linux-mm, linux-kernel, Yu Zhao

Use __GFP_COMP for gigantic folios to greatly reduce not only the amount
of code but also the allocation and free time.

LOC (approximately): -200, +50

Allocate and free 500 1GB hugeTLB folios without HVO by:
  time echo 500 >/sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages
  time echo 0 >/sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages

       Before  After
Alloc  ~13s    ~10s
Free   ~15s    <1s

The above magnitude generally holds for multiple x86 and arm64 CPU
models.
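
As an illustration only (not part of the patch), a sketch of the resulting
gigantic-folio lifecycle inside mm/hugetlb.c. It assumes
CONFIG_ARCH_HAS_GIGANTIC_PAGE and CONFIG_CONTIG_ALLOC; hugetlb accounting,
HVO and error paths are omitted, and gigantic_folio_roundtrip() is a
made-up name:

static void gigantic_folio_roundtrip(struct hstate *h, int nid, nodemask_t *nodemask)
{
	/* htlb_alloc_mask() now carries __GFP_COMP | __GFP_NOWARN. */
	gfp_t gfp = htlb_alloc_mask(h);
	struct folio *folio = alloc_gigantic_folio(h, gfp, nid, nodemask);

	if (!folio)
		return;

	/* alloc_gigantic_folio() hands the folio back with a frozen refcount. */
	folio_ref_unfreeze(folio, 1);

	/* Returns the folio to CMA or to the buddy allocator via folio_put(). */
	free_gigantic_folio(folio, huge_page_order(h));
}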

Signed-off-by: Yu Zhao <yuzhao@google.com>
---
 include/linux/hugetlb.h |   9 +-
 mm/hugetlb.c            | 244 ++++++++--------------------------------
 2 files changed, 50 insertions(+), 203 deletions(-)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 3100a52ceb73..98c47c394b89 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -896,10 +896,11 @@ static inline bool hugepage_movable_supported(struct hstate *h)
 /* Movability of hugepages depends on migration support. */
 static inline gfp_t htlb_alloc_mask(struct hstate *h)
 {
-	if (hugepage_movable_supported(h))
-		return GFP_HIGHUSER_MOVABLE;
-	else
-		return GFP_HIGHUSER;
+	gfp_t gfp = __GFP_COMP | __GFP_NOWARN;
+
+	gfp |= hugepage_movable_supported(h) ? GFP_HIGHUSER_MOVABLE : GFP_HIGHUSER;
+
+	return gfp;
 }
 
 static inline gfp_t htlb_modify_alloc_mask(struct hstate *h, gfp_t gfp_mask)
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 1c13e65ab119..691f63408d50 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1512,43 +1512,7 @@ static int hstate_next_node_to_free(struct hstate *h, nodemask_t *nodes_allowed)
 		((node = hstate_next_node_to_free(hs, mask)) || 1);	\
 		nr_nodes--)
 
-/* used to demote non-gigantic_huge pages as well */
-static void __destroy_compound_gigantic_folio(struct folio *folio,
-					unsigned int order, bool demote)
-{
-	int i;
-	int nr_pages = 1 << order;
-	struct page *p;
-
-	atomic_set(&folio->_entire_mapcount, 0);
-	atomic_set(&folio->_large_mapcount, 0);
-	atomic_set(&folio->_pincount, 0);
-
-	for (i = 1; i < nr_pages; i++) {
-		p = folio_page(folio, i);
-		p->flags &= ~PAGE_FLAGS_CHECK_AT_FREE;
-		p->mapping = NULL;
-		clear_compound_head(p);
-		if (!demote)
-			set_page_refcounted(p);
-	}
-
-	__folio_clear_head(folio);
-}
-
-static void destroy_compound_hugetlb_folio_for_demote(struct folio *folio,
-					unsigned int order)
-{
-	__destroy_compound_gigantic_folio(folio, order, true);
-}
-
 #ifdef CONFIG_ARCH_HAS_GIGANTIC_PAGE
-static void destroy_compound_gigantic_folio(struct folio *folio,
-					unsigned int order)
-{
-	__destroy_compound_gigantic_folio(folio, order, false);
-}
-
 static void free_gigantic_folio(struct folio *folio, unsigned int order)
 {
 	/*
@@ -1569,38 +1533,52 @@ static void free_gigantic_folio(struct folio *folio, unsigned int order)
 static struct folio *alloc_gigantic_folio(struct hstate *h, gfp_t gfp_mask,
 		int nid, nodemask_t *nodemask)
 {
-	struct page *page;
-	unsigned long nr_pages = pages_per_huge_page(h);
+	struct folio *folio;
+	int order = huge_page_order(h);
+	bool retry = false;
+
 	if (nid == NUMA_NO_NODE)
 		nid = numa_mem_id();
-
+retry:
+	folio = NULL;
 #ifdef CONFIG_CMA
 	{
 		int node;
 
-		if (hugetlb_cma[nid]) {
-			page = cma_alloc(hugetlb_cma[nid], nr_pages,
-					huge_page_order(h), true);
-			if (page)
-				return page_folio(page);
-		}
+		if (hugetlb_cma[nid])
+			folio = cma_alloc_folio(hugetlb_cma[nid], order, gfp_mask);
 
-		if (!(gfp_mask & __GFP_THISNODE)) {
+		if (!folio && !(gfp_mask & __GFP_THISNODE)) {
 			for_each_node_mask(node, *nodemask) {
 				if (node == nid || !hugetlb_cma[node])
 					continue;
 
-				page = cma_alloc(hugetlb_cma[node], nr_pages,
-						huge_page_order(h), true);
-				if (page)
-					return page_folio(page);
+				folio = cma_alloc_folio(hugetlb_cma[node], order, gfp_mask);
+				if (folio)
+					break;
 			}
 		}
 	}
 #endif
+	if (!folio) {
+		struct page *page = alloc_contig_pages(1 << order, gfp_mask, nid, nodemask);
 
-	page = alloc_contig_pages(nr_pages, gfp_mask, nid, nodemask);
-	return page ? page_folio(page) : NULL;
+		if (!page)
+			return NULL;
+
+		folio = page_folio(page);
+	}
+
+	if (folio_ref_freeze(folio, 1))
+		return folio;
+
+	pr_warn("HugeTLB: unexpected refcount on PFN %lu\n", folio_pfn(folio));
+	free_gigantic_folio(folio, order);
+	if (!retry) {
+		retry = true;
+		goto retry;
+	}
+	return NULL;
 }
 
 #else /* !CONFIG_CONTIG_ALLOC */
@@ -1619,8 +1597,6 @@ static struct folio *alloc_gigantic_folio(struct hstate *h, gfp_t gfp_mask,
 }
 static inline void free_gigantic_folio(struct folio *folio,
 						unsigned int order) { }
-static inline void destroy_compound_gigantic_folio(struct folio *folio,
-						unsigned int order) { }
 #endif
 
 /*
@@ -1747,19 +1723,17 @@ static void __update_and_free_hugetlb_folio(struct hstate *h,
 		folio_clear_hugetlb_hwpoison(folio);
 
 	folio_ref_unfreeze(folio, 1);
+	INIT_LIST_HEAD(&folio->_deferred_list);
 
 	/*
 	 * Non-gigantic pages demoted from CMA allocated gigantic pages
 	 * need to be given back to CMA in free_gigantic_folio.
 	 */
 	if (hstate_is_gigantic(h) ||
-	    hugetlb_cma_folio(folio, huge_page_order(h))) {
-		destroy_compound_gigantic_folio(folio, huge_page_order(h));
+	    hugetlb_cma_folio(folio, huge_page_order(h)))
 		free_gigantic_folio(folio, huge_page_order(h));
-	} else {
-		INIT_LIST_HEAD(&folio->_deferred_list);
+	else
 		folio_put(folio);
-	}
 }
 
 /*
@@ -2032,95 +2006,6 @@ static void prep_new_hugetlb_folio(struct hstate *h, struct folio *folio, int ni
 	spin_unlock_irq(&hugetlb_lock);
 }
 
-static bool __prep_compound_gigantic_folio(struct folio *folio,
-					unsigned int order, bool demote)
-{
-	int i, j;
-	int nr_pages = 1 << order;
-	struct page *p;
-
-	__folio_clear_reserved(folio);
-	for (i = 0; i < nr_pages; i++) {
-		p = folio_page(folio, i);
-
-		/*
-		 * For gigantic hugepages allocated through bootmem at
-		 * boot, it's safer to be consistent with the not-gigantic
-		 * hugepages and clear the PG_reserved bit from all tail pages
-		 * too.  Otherwise drivers using get_user_pages() to access tail
-		 * pages may get the reference counting wrong if they see
-		 * PG_reserved set on a tail page (despite the head page not
-		 * having PG_reserved set).  Enforcing this consistency between
-		 * head and tail pages allows drivers to optimize away a check
-		 * on the head page when they need know if put_page() is needed
-		 * after get_user_pages().
-		 */
-		if (i != 0)	/* head page cleared above */
-			__ClearPageReserved(p);
-		/*
-		 * Subtle and very unlikely
-		 *
-		 * Gigantic 'page allocators' such as memblock or cma will
-		 * return a set of pages with each page ref counted.  We need
-		 * to turn this set of pages into a compound page with tail
-		 * page ref counts set to zero.  Code such as speculative page
-		 * cache adding could take a ref on a 'to be' tail page.
-		 * We need to respect any increased ref count, and only set
-		 * the ref count to zero if count is currently 1.  If count
-		 * is not 1, we return an error.  An error return indicates
-		 * the set of pages can not be converted to a gigantic page.
-		 * The caller who allocated the pages should then discard the
-		 * pages using the appropriate free interface.
-		 *
-		 * In the case of demote, the ref count will be zero.
-		 */
-		if (!demote) {
-			if (!page_ref_freeze(p, 1)) {
-				pr_warn("HugeTLB page can not be used due to unexpected inflated ref count\n");
-				goto out_error;
-			}
-		} else {
-			VM_BUG_ON_PAGE(page_count(p), p);
-		}
-		if (i != 0)
-			set_compound_head(p, &folio->page);
-	}
-	__folio_set_head(folio);
-	/* we rely on prep_new_hugetlb_folio to set the hugetlb flag */
-	folio_set_order(folio, order);
-	atomic_set(&folio->_entire_mapcount, -1);
-	atomic_set(&folio->_large_mapcount, -1);
-	atomic_set(&folio->_pincount, 0);
-	return true;
-
-out_error:
-	/* undo page modifications made above */
-	for (j = 0; j < i; j++) {
-		p = folio_page(folio, j);
-		if (j != 0)
-			clear_compound_head(p);
-		set_page_refcounted(p);
-	}
-	/* need to clear PG_reserved on remaining tail pages  */
-	for (; j < nr_pages; j++) {
-		p = folio_page(folio, j);
-		__ClearPageReserved(p);
-	}
-	return false;
-}
-
-static bool prep_compound_gigantic_folio(struct folio *folio,
-							unsigned int order)
-{
-	return __prep_compound_gigantic_folio(folio, order, false);
-}
-
-static bool prep_compound_gigantic_folio_for_demote(struct folio *folio,
-							unsigned int order)
-{
-	return __prep_compound_gigantic_folio(folio, order, true);
-}
-
 /*
  * Find and lock address space (mapping) in write mode.
  *
@@ -2159,7 +2044,6 @@ static struct folio *alloc_buddy_hugetlb_folio(struct hstate *h,
 	 */
 	if (node_alloc_noretry && node_isset(nid, *node_alloc_noretry))
 		alloc_try_hard = false;
-	gfp_mask |= __GFP_COMP|__GFP_NOWARN;
 	if (alloc_try_hard)
 		gfp_mask |= __GFP_RETRY_MAYFAIL;
 	if (nid == NUMA_NO_NODE)
@@ -2206,48 +2090,14 @@ static struct folio *alloc_buddy_hugetlb_folio(struct hstate *h,
 	return folio;
 }
 
-static struct folio *__alloc_fresh_hugetlb_folio(struct hstate *h,
-				gfp_t gfp_mask, int nid, nodemask_t *nmask,
-				nodemask_t *node_alloc_noretry)
-{
-	struct folio *folio;
-	bool retry = false;
-
-retry:
-	if (hstate_is_gigantic(h))
-		folio = alloc_gigantic_folio(h, gfp_mask, nid, nmask);
-	else
-		folio = alloc_buddy_hugetlb_folio(h, gfp_mask,
-				nid, nmask, node_alloc_noretry);
-	if (!folio)
-		return NULL;
-
-	if (hstate_is_gigantic(h)) {
-		if (!prep_compound_gigantic_folio(folio, huge_page_order(h))) {
-			/*
-			 * Rare failure to convert pages to compound page.
-			 * Free pages and try again - ONCE!
-			 */
-			free_gigantic_folio(folio, huge_page_order(h));
-			if (!retry) {
-				retry = true;
-				goto retry;
-			}
-			return NULL;
-		}
-	}
-
-	return folio;
-}
-
 static struct folio *only_alloc_fresh_hugetlb_folio(struct hstate *h,
 		gfp_t gfp_mask, int nid, nodemask_t *nmask,
 		nodemask_t *node_alloc_noretry)
 {
 	struct folio *folio;
 
-	folio = __alloc_fresh_hugetlb_folio(h, gfp_mask, nid, nmask,
-						node_alloc_noretry);
+	folio = hstate_is_gigantic(h) ? alloc_gigantic_folio(h, gfp_mask, nid, nmask) :
+		alloc_buddy_hugetlb_folio(h, gfp_mask, nid, nmask, node_alloc_noretry);
 	if (folio)
 		init_new_hugetlb_folio(h, folio);
 	return folio;
@@ -2265,7 +2115,8 @@ static struct folio *alloc_fresh_hugetlb_folio(struct hstate *h,
 {
 	struct folio *folio;
 
-	folio = __alloc_fresh_hugetlb_folio(h, gfp_mask, nid, nmask, NULL);
+	folio = hstate_is_gigantic(h) ? alloc_gigantic_folio(h, gfp_mask, nid, nmask) :
+		alloc_buddy_hugetlb_folio(h, gfp_mask, nid, nmask, NULL);
 	if (!folio)
 		return NULL;
 
@@ -2549,9 +2400,8 @@ struct folio *alloc_buddy_hugetlb_folio_with_mpol(struct hstate *h,
 
 	nid = huge_node(vma, addr, gfp_mask, &mpol, &nodemask);
 	if (mpol_is_preferred_many(mpol)) {
-		gfp_t gfp = gfp_mask | __GFP_NOWARN;
+		gfp_t gfp = gfp_mask & ~(__GFP_DIRECT_RECLAIM | __GFP_NOFAIL);
 
-		gfp &=  ~(__GFP_DIRECT_RECLAIM | __GFP_NOFAIL);
 		folio = alloc_surplus_hugetlb_folio(h, gfp, nid, nodemask);
 
 		/* Fallback to all nodes if page==NULL */
@@ -3333,6 +3183,7 @@ static void __init hugetlb_folio_init_tail_vmemmap(struct folio *folio,
 	for (pfn = head_pfn + start_page_number; pfn < end_pfn; pfn++) {
 		struct page *page = pfn_to_page(pfn);
 
+		__ClearPageReserved(folio_page(folio, pfn - head_pfn));
 		__init_single_page(page, pfn, zone, nid);
 		prep_compound_tail((struct page *)folio, pfn - head_pfn);
 		ret = page_ref_freeze(page, 1);
@@ -3949,21 +3800,16 @@ static long demote_free_hugetlb_folios(struct hstate *src, struct hstate *dst,
 			continue;
 
 		list_del(&folio->lru);
-		/*
-		 * Use destroy_compound_hugetlb_folio_for_demote for all huge page
-		 * sizes as it will not ref count folios.
-		 */
-		destroy_compound_hugetlb_folio_for_demote(folio, huge_page_order(src));
+
+		split_page_owner(&folio->page, huge_page_order(src), huge_page_order(dst));
+		pgalloc_tag_split(&folio->page, 1 <<  huge_page_order(src));
 
 		for (i = 0; i < pages_per_huge_page(src); i += pages_per_huge_page(dst)) {
 			struct page *page = folio_page(folio, i);
 
-			if (hstate_is_gigantic(dst))
-				prep_compound_gigantic_folio_for_demote(page_folio(page),
-									dst->order);
-			else
-				prep_compound_page(page, dst->order);
-			set_page_private(page, 0);
+			page->mapping = NULL;
+			clear_compound_head(page);
+			prep_compound_page(page, dst->order);
 
 			init_new_hugetlb_folio(dst, page_folio(page));
 			list_add(&page->lru, &dst_list);
-- 
2.46.0.76.ge559c4bf1a-goog




* Re: [PATCH mm-unstable v1 2/3] mm/cma: add cma_alloc_folio()
  2024-08-11 21:21 ` [PATCH mm-unstable v1 2/3] mm/cma: add cma_alloc_folio() Yu Zhao
@ 2024-08-15 14:40   ` Kefeng Wang
  2024-08-15 18:04     ` Yu Zhao
  0 siblings, 1 reply; 8+ messages in thread
From: Kefeng Wang @ 2024-08-15 14:40 UTC (permalink / raw)
  To: Yu Zhao, Andrew Morton, Muchun Song
  Cc: Matthew Wilcox (Oracle), Zi Yan, linux-mm, linux-kernel



On 2024/8/12 5:21, Yu Zhao wrote:
> With alloc_contig_range() and free_contig_range() supporting large
> folios, CMA can allocate and free large folios too, by
> cma_alloc_folio() and cma_release().
> 
> Signed-off-by: Yu Zhao <yuzhao@google.com>
> ---
>   include/linux/cma.h |  1 +
>   mm/cma.c            | 47 ++++++++++++++++++++++++++++++---------------
>   2 files changed, 33 insertions(+), 15 deletions(-)
> 
> diff --git a/include/linux/cma.h b/include/linux/cma.h
> index 9db877506ea8..086553fbda73 100644
> --- a/include/linux/cma.h
> +++ b/include/linux/cma.h
> @@ -46,6 +46,7 @@ extern int cma_init_reserved_mem(phys_addr_t base, phys_addr_t size,
>   					struct cma **res_cma);
>   extern struct page *cma_alloc(struct cma *cma, unsigned long count, unsigned int align,
>   			      bool no_warn);
> +extern struct folio *cma_alloc_folio(struct cma *cma, int order, gfp_t gfp);
>   extern bool cma_pages_valid(struct cma *cma, const struct page *pages, unsigned long count);
>   extern bool cma_release(struct cma *cma, const struct page *pages, unsigned long count);
>   
> diff --git a/mm/cma.c b/mm/cma.c
> index 95d6950e177b..46feb06db8e7 100644
> --- a/mm/cma.c
> +++ b/mm/cma.c
> @@ -403,18 +403,8 @@ static void cma_debug_show_areas(struct cma *cma)
>   	spin_unlock_irq(&cma->lock);
>   }
>   
> -/**
> - * cma_alloc() - allocate pages from contiguous area
> - * @cma:   Contiguous memory region for which the allocation is performed.
> - * @count: Requested number of pages.
> - * @align: Requested alignment of pages (in PAGE_SIZE order).
> - * @no_warn: Avoid printing message about failed allocation
> - *
> - * This function allocates part of contiguous memory on specific
> - * contiguous memory area.
> - */
> -struct page *cma_alloc(struct cma *cma, unsigned long count,
> -		       unsigned int align, bool no_warn)
> +static struct page *__cma_alloc(struct cma *cma, unsigned long count,
> +				unsigned int align, gfp_t gfp)
>   {
>   	unsigned long mask, offset;
>   	unsigned long pfn = -1;
> @@ -463,8 +453,7 @@ struct page *cma_alloc(struct cma *cma, unsigned long count,
>   
>   		pfn = cma->base_pfn + (bitmap_no << cma->order_per_bit);
>   		mutex_lock(&cma_mutex);
> -		ret = alloc_contig_range(pfn, pfn + count, MIGRATE_CMA,
> -				     GFP_KERNEL | (no_warn ? __GFP_NOWARN : 0));
> +		ret = alloc_contig_range(pfn, pfn + count, MIGRATE_CMA, gfp);
>   		mutex_unlock(&cma_mutex);
>   		if (ret == 0) {
>   			page = pfn_to_page(pfn);
> @@ -494,7 +483,7 @@ struct page *cma_alloc(struct cma *cma, unsigned long count,
>   			page_kasan_tag_reset(nth_page(page, i));
>   	}
>   
> -	if (ret && !no_warn) {
> +	if (ret && !(gfp & __GFP_NOWARN)) {
>   		pr_err_ratelimited("%s: %s: alloc failed, req-size: %lu pages, ret: %d\n",
>   				   __func__, cma->name, count, ret);
>   		cma_debug_show_areas(cma);
> @@ -513,6 +502,34 @@ struct page *cma_alloc(struct cma *cma, unsigned long count,
>   	return page;
>   }
>   
> +/**
> + * cma_alloc() - allocate pages from contiguous area
> + * @cma:   Contiguous memory region for which the allocation is performed.
> + * @count: Requested number of pages.
> + * @align: Requested alignment of pages (in PAGE_SIZE order).
> + * @no_warn: Avoid printing message about failed allocation
> + *
> + * This function allocates part of contiguous memory on specific
> + * contiguous memory area.
> + */
> +struct page *cma_alloc(struct cma *cma, unsigned long count,
> +		       unsigned int align, bool no_warn)
> +{
> +	return __cma_alloc(cma, count, align, GFP_KERNEL | (no_warn ? __GFP_NOWARN : 0));
> +}
> +
> +struct folio *cma_alloc_folio(struct cma *cma, int order, gfp_t gfp)
> +{
> +	struct page *page;
> +
> +	if (WARN_ON(order && !(gfp & __GFP_COMP)))
> +		return NULL;
> +
> +	page = __cma_alloc(cma, 1 << order, order, gfp);
> +
> +	return page ? page_folio(page) : NULL;

We don't set large_rmappable for the folio returned by cma_alloc_folio(),
which is inconsistent with other folio allocations, e.g.
folio_alloc()/folio_alloc_mpol(). There is no issue for hugeTLB folios,
since a hugeTLB folio must not have large_rmappable set, but once we use
this for mTHP/THP it will need some extra handling. Maybe we could set
large_rmappable here and clear it in init_new_hugetlb_folio()?

> +}
> +
>   bool cma_pages_valid(struct cma *cma, const struct page *pages,
>   		     unsigned long count)
>   {



* Re: [PATCH mm-unstable v1 2/3] mm/cma: add cma_alloc_folio()
  2024-08-15 14:40   ` Kefeng Wang
@ 2024-08-15 18:04     ` Yu Zhao
  2024-08-15 18:25       ` Yu Zhao
  0 siblings, 1 reply; 8+ messages in thread
From: Yu Zhao @ 2024-08-15 18:04 UTC (permalink / raw)
  To: Kefeng Wang
  Cc: Andrew Morton, Muchun Song, Matthew Wilcox (Oracle),
	Zi Yan, linux-mm, linux-kernel

On Thu, Aug 15, 2024 at 8:41 AM Kefeng Wang <wangkefeng.wang@huawei.com> wrote:
>
>
>
> On 2024/8/12 5:21, Yu Zhao wrote:
> > With alloc_contig_range() and free_contig_range() supporting large
> > folios, CMA can allocate and free large folios too, by
> > cma_alloc_folio() and cma_release().
> >
> > Signed-off-by: Yu Zhao <yuzhao@google.com>
> > ---
> >   include/linux/cma.h |  1 +
> >   mm/cma.c            | 47 ++++++++++++++++++++++++++++++---------------
> >   2 files changed, 33 insertions(+), 15 deletions(-)
> >
> > diff --git a/include/linux/cma.h b/include/linux/cma.h
> > index 9db877506ea8..086553fbda73 100644
> > --- a/include/linux/cma.h
> > +++ b/include/linux/cma.h
> > @@ -46,6 +46,7 @@ extern int cma_init_reserved_mem(phys_addr_t base, phys_addr_t size,
> >                                       struct cma **res_cma);
> >   extern struct page *cma_alloc(struct cma *cma, unsigned long count, unsigned int align,
> >                             bool no_warn);
> > +extern struct folio *cma_alloc_folio(struct cma *cma, int order, gfp_t gfp);
> >   extern bool cma_pages_valid(struct cma *cma, const struct page *pages, unsigned long count);
> >   extern bool cma_release(struct cma *cma, const struct page *pages, unsigned long count);
> >
> > diff --git a/mm/cma.c b/mm/cma.c
> > index 95d6950e177b..46feb06db8e7 100644
> > --- a/mm/cma.c
> > +++ b/mm/cma.c
> > @@ -403,18 +403,8 @@ static void cma_debug_show_areas(struct cma *cma)
> >       spin_unlock_irq(&cma->lock);
> >   }
> >
> > -/**
> > - * cma_alloc() - allocate pages from contiguous area
> > - * @cma:   Contiguous memory region for which the allocation is performed.
> > - * @count: Requested number of pages.
> > - * @align: Requested alignment of pages (in PAGE_SIZE order).
> > - * @no_warn: Avoid printing message about failed allocation
> > - *
> > - * This function allocates part of contiguous memory on specific
> > - * contiguous memory area.
> > - */
> > -struct page *cma_alloc(struct cma *cma, unsigned long count,
> > -                    unsigned int align, bool no_warn)
> > +static struct page *__cma_alloc(struct cma *cma, unsigned long count,
> > +                             unsigned int align, gfp_t gfp)
> >   {
> >       unsigned long mask, offset;
> >       unsigned long pfn = -1;
> > @@ -463,8 +453,7 @@ struct page *cma_alloc(struct cma *cma, unsigned long count,
> >
> >               pfn = cma->base_pfn + (bitmap_no << cma->order_per_bit);
> >               mutex_lock(&cma_mutex);
> > -             ret = alloc_contig_range(pfn, pfn + count, MIGRATE_CMA,
> > -                                  GFP_KERNEL | (no_warn ? __GFP_NOWARN : 0));
> > +             ret = alloc_contig_range(pfn, pfn + count, MIGRATE_CMA, gfp);
> >               mutex_unlock(&cma_mutex);
> >               if (ret == 0) {
> >                       page = pfn_to_page(pfn);
> > @@ -494,7 +483,7 @@ struct page *cma_alloc(struct cma *cma, unsigned long count,
> >                       page_kasan_tag_reset(nth_page(page, i));
> >       }
> >
> > -     if (ret && !no_warn) {
> > +     if (ret && !(gfp & __GFP_NOWARN)) {
> >               pr_err_ratelimited("%s: %s: alloc failed, req-size: %lu pages, ret: %d\n",
> >                                  __func__, cma->name, count, ret);
> >               cma_debug_show_areas(cma);
> > @@ -513,6 +502,34 @@ struct page *cma_alloc(struct cma *cma, unsigned long count,
> >       return page;
> >   }
> >
> > +/**
> > + * cma_alloc() - allocate pages from contiguous area
> > + * @cma:   Contiguous memory region for which the allocation is performed.
> > + * @count: Requested number of pages.
> > + * @align: Requested alignment of pages (in PAGE_SIZE order).
> > + * @no_warn: Avoid printing message about failed allocation
> > + *
> > + * This function allocates part of contiguous memory on specific
> > + * contiguous memory area.
> > + */
> > +struct page *cma_alloc(struct cma *cma, unsigned long count,
> > +                    unsigned int align, bool no_warn)
> > +{
> > +     return __cma_alloc(cma, count, align, GFP_KERNEL | (no_warn ? __GFP_NOWARN : 0));
> > +}
> > +
> > +struct folio *cma_alloc_folio(struct cma *cma, int order, gfp_t gfp)
> > +{
> > +     struct page *page;
> > +
> > +     if (WARN_ON(order && !(gfp & __GFP_COMP)))
> > +             return NULL;
> > +
> > +     page = __cma_alloc(cma, 1 << order, order, gfp);
> > +
> > +     return page ? page_folio(page) : NULL;
>
> We don't set large_rmappable for the folio returned by cma_alloc_folio(),
> which is inconsistent with other folio allocations, e.g.
> folio_alloc()/folio_alloc_mpol(). There is no issue for hugeTLB folios,
> since a hugeTLB folio must not have large_rmappable set, but once we use
> this for mTHP/THP it will need some extra handling. Maybe we could set
> large_rmappable here and clear it in init_new_hugetlb_folio()?

I want to hear what Matthew thinks about this.

My opinion is that we don't want to couple large rmappable (or
deferred splittable) with __GFP_COMP, and for that matter, with large
folios, because the former are specific to THPs whereas the latter can
potentially work for most types of high order allocations.

Again, IMO, if we want to seriously answer the question of
  Can we get rid of non-compound multi-page allocations? [1]
then we should start planning on decoupling large rmappable from the
generic folio allocation API.

[1] https://lpc.events/event/18/sessions/184/#20240920



* Re: [PATCH mm-unstable v1 2/3] mm/cma: add cma_alloc_folio()
  2024-08-15 18:04     ` Yu Zhao
@ 2024-08-15 18:25       ` Yu Zhao
  2024-08-15 18:48         ` Usama Arif
  0 siblings, 1 reply; 8+ messages in thread
From: Yu Zhao @ 2024-08-15 18:25 UTC (permalink / raw)
  To: Kefeng Wang, Usama Arif
  Cc: Andrew Morton, Muchun Song, Matthew Wilcox (Oracle),
	Zi Yan, linux-mm, linux-kernel

On Thu, Aug 15, 2024 at 12:04 PM Yu Zhao <yuzhao@google.com> wrote:
>
> On Thu, Aug 15, 2024 at 8:41 AM Kefeng Wang <wangkefeng.wang@huawei.com> wrote:
> >
> >
> >
> > On 2024/8/12 5:21, Yu Zhao wrote:
> > > With alloc_contig_range() and free_contig_range() supporting large
> > > folios, CMA can allocate and free large folios too, by
> > > cma_alloc_folio() and cma_release().
> > >
> > > Signed-off-by: Yu Zhao <yuzhao@google.com>
> > > ---
> > >   include/linux/cma.h |  1 +
> > >   mm/cma.c            | 47 ++++++++++++++++++++++++++++++---------------
> > >   2 files changed, 33 insertions(+), 15 deletions(-)
> > >
> > > diff --git a/include/linux/cma.h b/include/linux/cma.h
> > > index 9db877506ea8..086553fbda73 100644
> > > --- a/include/linux/cma.h
> > > +++ b/include/linux/cma.h
> > > @@ -46,6 +46,7 @@ extern int cma_init_reserved_mem(phys_addr_t base, phys_addr_t size,
> > >                                       struct cma **res_cma);
> > >   extern struct page *cma_alloc(struct cma *cma, unsigned long count, unsigned int align,
> > >                             bool no_warn);
> > > +extern struct folio *cma_alloc_folio(struct cma *cma, int order, gfp_t gfp);
> > >   extern bool cma_pages_valid(struct cma *cma, const struct page *pages, unsigned long count);
> > >   extern bool cma_release(struct cma *cma, const struct page *pages, unsigned long count);
> > >
> > > diff --git a/mm/cma.c b/mm/cma.c
> > > index 95d6950e177b..46feb06db8e7 100644
> > > --- a/mm/cma.c
> > > +++ b/mm/cma.c
> > > @@ -403,18 +403,8 @@ static void cma_debug_show_areas(struct cma *cma)
> > >       spin_unlock_irq(&cma->lock);
> > >   }
> > >
> > > -/**
> > > - * cma_alloc() - allocate pages from contiguous area
> > > - * @cma:   Contiguous memory region for which the allocation is performed.
> > > - * @count: Requested number of pages.
> > > - * @align: Requested alignment of pages (in PAGE_SIZE order).
> > > - * @no_warn: Avoid printing message about failed allocation
> > > - *
> > > - * This function allocates part of contiguous memory on specific
> > > - * contiguous memory area.
> > > - */
> > > -struct page *cma_alloc(struct cma *cma, unsigned long count,
> > > -                    unsigned int align, bool no_warn)
> > > +static struct page *__cma_alloc(struct cma *cma, unsigned long count,
> > > +                             unsigned int align, gfp_t gfp)
> > >   {
> > >       unsigned long mask, offset;
> > >       unsigned long pfn = -1;
> > > @@ -463,8 +453,7 @@ struct page *cma_alloc(struct cma *cma, unsigned long count,
> > >
> > >               pfn = cma->base_pfn + (bitmap_no << cma->order_per_bit);
> > >               mutex_lock(&cma_mutex);
> > > -             ret = alloc_contig_range(pfn, pfn + count, MIGRATE_CMA,
> > > -                                  GFP_KERNEL | (no_warn ? __GFP_NOWARN : 0));
> > > +             ret = alloc_contig_range(pfn, pfn + count, MIGRATE_CMA, gfp);
> > >               mutex_unlock(&cma_mutex);
> > >               if (ret == 0) {
> > >                       page = pfn_to_page(pfn);
> > > @@ -494,7 +483,7 @@ struct page *cma_alloc(struct cma *cma, unsigned long count,
> > >                       page_kasan_tag_reset(nth_page(page, i));
> > >       }
> > >
> > > -     if (ret && !no_warn) {
> > > +     if (ret && !(gfp & __GFP_NOWARN)) {
> > >               pr_err_ratelimited("%s: %s: alloc failed, req-size: %lu pages, ret: %d\n",
> > >                                  __func__, cma->name, count, ret);
> > >               cma_debug_show_areas(cma);
> > > @@ -513,6 +502,34 @@ struct page *cma_alloc(struct cma *cma, unsigned long count,
> > >       return page;
> > >   }
> > >
> > > +/**
> > > + * cma_alloc() - allocate pages from contiguous area
> > > + * @cma:   Contiguous memory region for which the allocation is performed.
> > > + * @count: Requested number of pages.
> > > + * @align: Requested alignment of pages (in PAGE_SIZE order).
> > > + * @no_warn: Avoid printing message about failed allocation
> > > + *
> > > + * This function allocates part of contiguous memory on specific
> > > + * contiguous memory area.
> > > + */
> > > +struct page *cma_alloc(struct cma *cma, unsigned long count,
> > > +                    unsigned int align, bool no_warn)
> > > +{
> > > +     return __cma_alloc(cma, count, align, GFP_KERNEL | (no_warn ? __GFP_NOWARN : 0));
> > > +}
> > > +
> > > +struct folio *cma_alloc_folio(struct cma *cma, int order, gfp_t gfp)
> > > +{
> > > +     struct page *page;
> > > +
> > > +     if (WARN_ON(order && !(gfp & __GFP_COMP)))
> > > +             return NULL;
> > > +
> > > +     page = __cma_alloc(cma, 1 << order, order, gfp);
> > > +
> > > +     return page ? page_folio(page) : NULL;
> >
> > We don't set large_rmappable for the folio returned by cma_alloc_folio(),
> > which is inconsistent with other folio allocations, e.g.
> > folio_alloc()/folio_alloc_mpol(). There is no issue for hugeTLB folios,
> > since a hugeTLB folio must not have large_rmappable set, but once we use
> > this for mTHP/THP it will need some extra handling. Maybe we could set
> > large_rmappable here and clear it in init_new_hugetlb_folio()?
>
> I want to hear what Matthew thinks about this.
>
> My opinion is that we don't want to couple large rmappable (or
> deferred splittable) with __GFP_COMP, and for that matter, with large
> folios, because the former are specific to THPs whereas the latter can
> potentially work for most types of high order allocations.
>
> Again, IMO, if we want to seriously answer the question of
>   Can we get rid of non-compound multi-page allocations? [1]
> then we should start planning on decoupling large rmappable from the
> generic folio allocation API.
>
> [1] https://lpc.events/event/18/sessions/184/#20240920

Also along similar lines, Usama is trying to add PG_partially_mapped [1],
and I have explicitly asked him not to introduce that flag to hugeTLB
unless there are good reasons (none ATM).

[1] https://lore.kernel.org/CAOUHufbmgwZwzUuHVvEDMqPGcsxE2hEreRZ4PhK5yz27GdK-Tw@mail.gmail.com/



* Re: [PATCH mm-unstable v1 2/3] mm/cma: add cma_alloc_folio()
  2024-08-15 18:25       ` Yu Zhao
@ 2024-08-15 18:48         ` Usama Arif
  0 siblings, 0 replies; 8+ messages in thread
From: Usama Arif @ 2024-08-15 18:48 UTC (permalink / raw)
  To: Yu Zhao, Kefeng Wang
  Cc: Andrew Morton, Muchun Song, Matthew Wilcox (Oracle),
	Zi Yan, linux-mm, linux-kernel



On 15/08/2024 19:25, Yu Zhao wrote:
> On Thu, Aug 15, 2024 at 12:04 PM Yu Zhao <yuzhao@google.com> wrote:
>>
>> On Thu, Aug 15, 2024 at 8:41 AM Kefeng Wang <wangkefeng.wang@huawei.com> wrote:
>>>
>>>
>>>
>>> On 2024/8/12 5:21, Yu Zhao wrote:
>>>> With alloc_contig_range() and free_contig_range() supporting large
>>>> folios, CMA can allocate and free large folios too, by
>>>> cma_alloc_folio() and cma_release().
>>>>
>>>> Signed-off-by: Yu Zhao <yuzhao@google.com>
>>>> ---
>>>>   include/linux/cma.h |  1 +
>>>>   mm/cma.c            | 47 ++++++++++++++++++++++++++++++---------------
>>>>   2 files changed, 33 insertions(+), 15 deletions(-)
>>>>
>>>> diff --git a/include/linux/cma.h b/include/linux/cma.h
>>>> index 9db877506ea8..086553fbda73 100644
>>>> --- a/include/linux/cma.h
>>>> +++ b/include/linux/cma.h
>>>> @@ -46,6 +46,7 @@ extern int cma_init_reserved_mem(phys_addr_t base, phys_addr_t size,
>>>>                                       struct cma **res_cma);
>>>>   extern struct page *cma_alloc(struct cma *cma, unsigned long count, unsigned int align,
>>>>                             bool no_warn);
>>>> +extern struct folio *cma_alloc_folio(struct cma *cma, int order, gfp_t gfp);
>>>>   extern bool cma_pages_valid(struct cma *cma, const struct page *pages, unsigned long count);
>>>>   extern bool cma_release(struct cma *cma, const struct page *pages, unsigned long count);
>>>>
>>>> diff --git a/mm/cma.c b/mm/cma.c
>>>> index 95d6950e177b..46feb06db8e7 100644
>>>> --- a/mm/cma.c
>>>> +++ b/mm/cma.c
>>>> @@ -403,18 +403,8 @@ static void cma_debug_show_areas(struct cma *cma)
>>>>       spin_unlock_irq(&cma->lock);
>>>>   }
>>>>
>>>> -/**
>>>> - * cma_alloc() - allocate pages from contiguous area
>>>> - * @cma:   Contiguous memory region for which the allocation is performed.
>>>> - * @count: Requested number of pages.
>>>> - * @align: Requested alignment of pages (in PAGE_SIZE order).
>>>> - * @no_warn: Avoid printing message about failed allocation
>>>> - *
>>>> - * This function allocates part of contiguous memory on specific
>>>> - * contiguous memory area.
>>>> - */
>>>> -struct page *cma_alloc(struct cma *cma, unsigned long count,
>>>> -                    unsigned int align, bool no_warn)
>>>> +static struct page *__cma_alloc(struct cma *cma, unsigned long count,
>>>> +                             unsigned int align, gfp_t gfp)
>>>>   {
>>>>       unsigned long mask, offset;
>>>>       unsigned long pfn = -1;
>>>> @@ -463,8 +453,7 @@ struct page *cma_alloc(struct cma *cma, unsigned long count,
>>>>
>>>>               pfn = cma->base_pfn + (bitmap_no << cma->order_per_bit);
>>>>               mutex_lock(&cma_mutex);
>>>> -             ret = alloc_contig_range(pfn, pfn + count, MIGRATE_CMA,
>>>> -                                  GFP_KERNEL | (no_warn ? __GFP_NOWARN : 0));
>>>> +             ret = alloc_contig_range(pfn, pfn + count, MIGRATE_CMA, gfp);
>>>>               mutex_unlock(&cma_mutex);
>>>>               if (ret == 0) {
>>>>                       page = pfn_to_page(pfn);
>>>> @@ -494,7 +483,7 @@ struct page *cma_alloc(struct cma *cma, unsigned long count,
>>>>                       page_kasan_tag_reset(nth_page(page, i));
>>>>       }
>>>>
>>>> -     if (ret && !no_warn) {
>>>> +     if (ret && !(gfp & __GFP_NOWARN)) {
>>>>               pr_err_ratelimited("%s: %s: alloc failed, req-size: %lu pages, ret: %d\n",
>>>>                                  __func__, cma->name, count, ret);
>>>>               cma_debug_show_areas(cma);
>>>> @@ -513,6 +502,34 @@ struct page *cma_alloc(struct cma *cma, unsigned long count,
>>>>       return page;
>>>>   }
>>>>
>>>> +/**
>>>> + * cma_alloc() - allocate pages from contiguous area
>>>> + * @cma:   Contiguous memory region for which the allocation is performed.
>>>> + * @count: Requested number of pages.
>>>> + * @align: Requested alignment of pages (in PAGE_SIZE order).
>>>> + * @no_warn: Avoid printing message about failed allocation
>>>> + *
>>>> + * This function allocates part of contiguous memory on specific
>>>> + * contiguous memory area.
>>>> + */
>>>> +struct page *cma_alloc(struct cma *cma, unsigned long count,
>>>> +                    unsigned int align, bool no_warn)
>>>> +{
>>>> +     return __cma_alloc(cma, count, align, GFP_KERNEL | (no_warn ? __GFP_NOWARN : 0));
>>>> +}
>>>> +
>>>> +struct folio *cma_alloc_folio(struct cma *cma, int order, gfp_t gfp)
>>>> +{
>>>> +     struct page *page;
>>>> +
>>>> +     if (WARN_ON(order && !(gfp & __GFP_COMP)))
>>>> +             return NULL;
>>>> +
>>>> +     page = __cma_alloc(cma, 1 << order, order, gfp);
>>>> +
>>>> +     return page ? page_folio(page) : NULL;
>>>
>>> We don't set large_rmappable for the folio returned by cma_alloc_folio(),
>>> which is inconsistent with other folio allocations, e.g.
>>> folio_alloc()/folio_alloc_mpol(). There is no issue for hugeTLB folios,
>>> since a hugeTLB folio must not have large_rmappable set, but once we use
>>> this for mTHP/THP it will need some extra handling. Maybe we could set
>>> large_rmappable here and clear it in init_new_hugetlb_folio()?
>>
>> I want to hear what Matthew thinks about this.
>>
>> My opinion is that we don't want to couple large rmappable (or
>> deferred splittable) with __GFP_COMP, and for that matter, with large
>> folios, because the former are specific to THPs whereas the latter can
>> potentially work for most types of high order allocations.
>>
>> Again, IMO, if we want to seriously answer the question of
>>   Can we get rid of non-compound multi-page allocations? [1]
>> then we should start planning on decoupling large rmappable from the
>> generic folio allocation API.
>>
>> [1] https://lpc.events/event/18/sessions/184/#20240920
> 
> Also along similar lines, Usama is trying to add PG_partially_mapped [1],
> and I have explicitly asked him not to introduce that flag to hugeTLB
> unless there are good reasons (none ATM).
> 
> [1] https://lore.kernel.org/CAOUHufbmgwZwzUuHVvEDMqPGcsxE2hEreRZ4PhK5yz27GdK-Tw@mail.gmail.com/

PG_partially_mapped won't be cleared for hugeTLB in the next revision of the series, as suggested by Yu.
It's also not there in the fix patch I posted: https://lore.kernel.org/all/4acdf2b7-ed65-4087-9806-8f4a187b4eb5@gmail.com/



Thread overview: 8+ messages
2024-08-11 21:21 [PATCH mm-unstable v1 0/3] mm/hugetlb: alloc/free gigantic folios Yu Zhao
2024-08-11 21:21 ` [PATCH mm-unstable v1 1/3] mm/contig_alloc: support __GFP_COMP Yu Zhao
2024-08-11 21:21 ` [PATCH mm-unstable v1 2/3] mm/cma: add cma_alloc_folio() Yu Zhao
2024-08-15 14:40   ` Kefeng Wang
2024-08-15 18:04     ` Yu Zhao
2024-08-15 18:25       ` Yu Zhao
2024-08-15 18:48         ` Usama Arif
2024-08-11 21:21 ` [PATCH mm-unstable v1 3/3] mm/hugetlb: use __GFP_COMP for gigantic folios Yu Zhao
