From: Ryan Roberts <ryan.roberts@arm.com>
To: Andrew Morton <akpm@linux-foundation.org>,
	David Hildenbrand <david@kernel.org>,
	Lorenzo Stoakes <lorenzo.stoakes@oracle.com>,
	"Liam R. Howlett" <Liam.Howlett@oracle.com>,
	Vlastimil Babka <vbabka@suse.cz>, Mike Rapoport <rppt@kernel.org>,
	Suren Baghdasaryan <surenb@google.com>,
	Michal Hocko <mhocko@suse.com>,
	Brendan Jackman <jackmanb@google.com>,
	Johannes Weiner <hannes@cmpxchg.org>, Zi Yan <ziy@nvidia.com>,
	Uladzislau Rezki <urezki@gmail.com>,
	"Vishal Moola (Oracle)" <vishal.moola@gmail.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: [PATCH v1 1/2] mm/page_alloc: Optimize free_contig_range()
Date: Mon,  5 Jan 2026 16:17:37 +0000
Message-ID: <20260105161741.3952456-2-ryan.roberts@arm.com>
In-Reply-To: <20260105161741.3952456-1-ryan.roberts@arm.com>

Decompose the range of order-0 pages to be freed into the largest
possible power-of-2 sized and aligned chunks, and free each chunk to the
pcp or buddy as a unit. This improves on the previous approach, which
freed each order-0 page individually in a loop. Testing shows a more
than 10x performance improvement in some cases.
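
As a worked example (assuming pageblock_order >= 3): a range starting at
pfn 3 with nr_pages == 13 decomposes into three chunks, each limited by
both the alignment of the current pfn and the number of pages remaining:

	pfn 3: 1 page  (order 0; __ffs(3) == 0 limits the order)
	pfn 4: 4 pages (order 2; __ffs(4) == 2 limits the order)
	pfn 8: 8 pages (order 3; ilog2(8) == 3 for the 8 remaining pages)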

Since each page is order-0, we must decrement each page's reference
count individually, and only consider a page for freeing as part of a
high-order chunk if its reference count drops to zero. Additionally,
free_pages_prepare() must be called for each individual order-0 page so
that the struct page state and global accounting can be appropriately
managed. Once this is done, the resulting high-order chunks can be freed
as a unit to the pcp or buddy.
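
In pseudocode, the per-page phase looks roughly like this (a simplified
sketch of the implementation below; accumulation of contiguous runs and
the new FPI_PREPARED flag are elided):

	for (i = 0; i < nr_pages; i++, page++) {
		if (!put_page_testzero(page))
			continue;	/* page still has references */
		if (!free_pages_prepare(page, 0))
			continue;	/* deliberately leaked */
		/* else: page joins the current contiguous run */
	}
	/* each accumulated run is then freed as high-order chunks */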

This significantly speeds up the free operation, and also has the side
benefit that high-order blocks are added to the pcp instead of each page
ending up on the pcp order-0 list; memory remains more readily available
in high orders.

vmalloc will shortly become a user of this new optimized
free_contig_range(), since it aggressively allocates high-order
non-compound pages but then calls split_page() to end up with
contiguous order-0 pages. These can now be freed much more efficiently.

The execution time of the following function was measured in a VM on an
Apple M2 system:

static int page_alloc_high_order_test(void)
{
	unsigned int order = HPAGE_PMD_ORDER;
	struct page *page;
	int i;

	for (i = 0; i < 100000; i++) {
		page = alloc_pages(GFP_KERNEL, order);
		if (!page)
			return -1;
		split_page(page, order);
		free_contig_range(page_to_pfn(page), 1UL << order);
	}

	return 0;
}

Execution time before: 1684366 usec
Execution time after:   136216 usec (~12.4x faster)

Perf trace before:

    60.93%     0.00%  kthreadd     [kernel.kallsyms]      [k] ret_from_fork
            |
            ---ret_from_fork
               kthread
               0xffffbba283e63980
               |
               |--60.01%--0xffffbba283e636dc
               |          |
               |          |--58.57%--free_contig_range
               |          |          |
               |          |          |--57.19%--___free_pages
               |          |          |          |
               |          |          |          |--46.65%--__free_frozen_pages
               |          |          |          |          |
               |          |          |          |          |--28.08%--free_pcppages_bulk
               |          |          |          |          |
               |          |          |          |           --12.05%--free_frozen_page_commit.constprop.0
               |          |          |          |
               |          |          |          |--5.10%--__get_pfnblock_flags_mask.isra.0
               |          |          |          |
               |          |          |          |--1.13%--_raw_spin_unlock
               |          |          |          |
               |          |          |          |--0.78%--free_frozen_page_commit.constprop.0
               |          |          |          |
               |          |          |           --0.75%--_raw_spin_trylock
               |          |          |
               |          |           --0.95%--__free_frozen_pages
               |          |
               |           --1.44%--___free_pages
               |
                --0.78%--0xffffbba283e636c0
                          split_page

Perf trace after:

    10.62%     0.00%  kthreadd     [kernel.kallsyms]  [k] ret_from_fork
            |
            ---ret_from_fork
               kthread
               0xffffbbd55ef74980
               |
               |--8.74%--0xffffbbd55ef746dc
               |          free_contig_range
               |          |
               |           --8.72%--__free_contig_range
               |
                --1.56%--0xffffbbd55ef746c0
                          |
                           --1.54%--split_page

Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
---
 include/linux/gfp.h |   1 +
 mm/page_alloc.c     | 116 +++++++++++++++++++++++++++++++++++++++-----
 2 files changed, 106 insertions(+), 11 deletions(-)

diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index b155929af5b1..3ed0bef34d0c 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -439,6 +439,7 @@ extern struct page *alloc_contig_pages_noprof(unsigned long nr_pages, gfp_t gfp_
 #define alloc_contig_pages(...)			alloc_hooks(alloc_contig_pages_noprof(__VA_ARGS__))

 #endif
+unsigned long __free_contig_range(unsigned long pfn, unsigned long nr_pages);
 void free_contig_range(unsigned long pfn, unsigned long nr_pages);

 #ifdef CONFIG_CONTIG_ALLOC
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index a045d728ae0f..1015c8edf8a4 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -91,6 +91,9 @@ typedef int __bitwise fpi_t;
 /* Free the page without taking locks. Rely on trylock only. */
 #define FPI_TRYLOCK		((__force fpi_t)BIT(2))

+/* free_pages_prepare() has already been called for page(s) being freed. */
+#define FPI_PREPARED		((__force fpi_t)BIT(3))
+
 /* prevent >1 _updater_ of zone percpu pageset ->high and ->batch fields */
 static DEFINE_MUTEX(pcp_batch_high_lock);
 #define MIN_PERCPU_PAGELIST_HIGH_FRACTION (8)
@@ -1582,8 +1585,12 @@ static void __free_pages_ok(struct page *page, unsigned int order,
 	unsigned long pfn = page_to_pfn(page);
 	struct zone *zone = page_zone(page);

-	if (free_pages_prepare(page, order))
-		free_one_page(zone, page, pfn, order, fpi_flags);
+	if (!(fpi_flags & FPI_PREPARED)) {
+		if (!free_pages_prepare(page, order))
+			return;
+	}
+
+	free_one_page(zone, page, pfn, order, fpi_flags);
 }

 void __meminit __free_pages_core(struct page *page, unsigned int order,
@@ -2943,8 +2950,10 @@ static void __free_frozen_pages(struct page *page, unsigned int order,
 		return;
 	}

-	if (!free_pages_prepare(page, order))
-		return;
+	if (!(fpi_flags & FPI_PREPARED)) {
+		if (!free_pages_prepare(page, order))
+			return;
+	}

 	/*
 	 * We only track unmovable, reclaimable and movable on pcp lists.
@@ -7250,9 +7259,99 @@ struct page *alloc_contig_pages_noprof(unsigned long nr_pages, gfp_t gfp_mask,
 }
 #endif /* CONFIG_CONTIG_ALLOC */

+static void free_prepared_contig_range(struct page *page,
+				       unsigned long nr_pages)
+{
+	while (nr_pages) {
+		unsigned int fit_order, align_order, order;
+		unsigned long pfn;
+
+		/*
+		 * Find the largest aligned power-of-2 number of pages that
+		 * starts at the current page, does not exceed nr_pages, and
+		 * whose order is at most pageblock_order.
+		 */
+		pfn = page_to_pfn(page);
+		fit_order = ilog2(nr_pages);
+		align_order = pfn ? __ffs(pfn) : fit_order;
+		order = min3(fit_order, align_order, pageblock_order);
+
+		/*
+		 * Free the chunk as a single block. Our caller has already
+		 * called free_pages_prepare() for each order-0 page.
+		 */
+		__free_frozen_pages(page, order, FPI_PREPARED);
+
+		page += 1UL << order;
+		nr_pages -= 1UL << order;
+	}
+}
+
+/**
+ * __free_contig_range - Free contiguous range of order-0 pages.
+ * @pfn: Page frame number of the first page in the range.
+ * @nr_pages: Number of pages to free.
+ *
+ * For each order-0 struct page in the physically contiguous range, put a
+ * reference. Free any page whose reference count falls to zero. The
+ * implementation is functionally equivalent to, but significantly faster
+ * than, calling __free_page() for each struct page in a loop.
+ *
+ * Memory allocated with alloc_pages(order>=1) then subsequently split to
+ * order-0 with split_page() is an example of appropriate contiguous pages that
+ * can be freed with this API.
+ *
+ * Return: The number of pages that were not freed because their reference
+ * count did not fall to zero.
+ *
+ * Context: May be called in interrupt context or while holding a normal
+ * spinlock, but not in NMI context or while holding a raw spinlock.
+ */
+unsigned long __free_contig_range(unsigned long pfn, unsigned long nr_pages)
+{
+	struct page *page = pfn_to_page(pfn);
+	unsigned long not_freed = 0;
+	struct page *start = NULL;
+	unsigned long i;
+	bool can_free;
+
+	/*
+	 * Chunk the range into contiguous runs of pages for which the refcount
+	 * went to zero and for which free_pages_prepare() succeeded. If
+	 * free_pages_prepare() fails, we consider the page to have been freed
+	 * and deliberately leak it.
+	 *
+	 * Code assumes contiguous PFNs have contiguous struct pages, but not
+	 * vice versa.
+	 */
+	for (i = 0; i < nr_pages; i++, page++) {
+		VM_BUG_ON_PAGE(PageHead(page), page);
+		VM_BUG_ON_PAGE(PageTail(page), page);
+
+		can_free = put_page_testzero(page);
+		if (!can_free)
+			not_freed++;
+		else if (!free_pages_prepare(page, 0))
+			can_free = false;
+
+		if (!can_free && start) {
+			free_prepared_contig_range(start, page - start);
+			start = NULL;
+		} else if (can_free && !start) {
+			start = page;
+		}
+	}
+
+	if (start)
+		free_prepared_contig_range(start, page - start);
+
+	return not_freed;
+}
+EXPORT_SYMBOL(__free_contig_range);
+
 void free_contig_range(unsigned long pfn, unsigned long nr_pages)
 {
-	unsigned long count = 0;
+	unsigned long count;
 	struct folio *folio = pfn_folio(pfn);

 	if (folio_test_large(folio)) {
@@ -7266,12 +7365,7 @@ void free_contig_range(unsigned long pfn, unsigned long nr_pages)
 		return;
 	}

-	for (; nr_pages--; pfn++) {
-		struct page *page = pfn_to_page(pfn);
-
-		count += page_count(page) != 1;
-		__free_page(page);
-	}
+	count = __free_contig_range(pfn, nr_pages);
 	WARN(count != 0, "%lu pages are still in use!\n", count);
 }
 EXPORT_SYMBOL(free_contig_range);
--
2.43.0




Thread overview: 15+ messages
2026-01-05 16:17 [PATCH v1 0/2] Free contiguous order-0 pages efficiently Ryan Roberts
2026-01-05 16:17 ` Ryan Roberts [this message]
2026-01-05 17:15   ` [PATCH v1 1/2] mm/page_alloc: Optimize free_contig_range() Zi Yan
2026-01-05 17:31     ` Ryan Roberts
2026-01-07  3:32       ` Zi Yan
2026-01-05 16:17 ` [PATCH v1 2/2] vmalloc: Optimize vfree Ryan Roberts
2026-01-06  4:36   ` Matthew Wilcox
2026-01-06  9:47     ` David Laight
2026-01-06 11:04     ` Ryan Roberts
2026-01-05 16:26 ` [PATCH v1 0/2] Free contiguous order-0 pages efficiently David Hildenbrand (Red Hat)
2026-01-05 16:36 ` Zi Yan
2026-01-05 16:41   ` Ryan Roberts
2026-01-06  4:38 ` Matthew Wilcox
2026-01-06 11:10   ` Ryan Roberts
2026-01-06 11:34   ` Uladzislau Rezki
