* [PATCH 1/2] mm: page_alloc: speed up fallbacks in rmqueue_bulk()
@ 2025-04-07 18:01 Johannes Weiner
  2025-04-07 18:01 ` [PATCH 2/2] mm: page_alloc: tighten up find_suitable_fallback() Johannes Weiner
                   ` (6 more replies)
  0 siblings, 7 replies; 18+ messages in thread
From: Johannes Weiner @ 2025-04-07 18:01 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Vlastimil Babka, Brendan Jackman, Mel Gorman, Carlos Song,
	linux-mm, linux-kernel, kernel test robot, stable

The test robot identified c2f6ea38fc1b ("mm: page_alloc: don't steal
single pages from biggest buddy") as the root cause of a 56.4%
regression in vm-scalability::lru-file-mmap-read.

Carlos reports an earlier patch, c0cd6f557b90 ("mm: page_alloc: fix
freelist movement during block conversion"), as the root cause for a
regression in worst-case zone->lock+irqoff hold times.

Both of these patches modify the page allocator's fallback path to be
less greedy in an effort to stave off fragmentation. The flip side of
this is that fallbacks are also less productive each time around,
which means the fallback search can run much more frequently.

Carlos' traces point to rmqueue_bulk() specifically, which tries to
refill the percpu cache by allocating a large batch of pages in a
loop. The traces highlight how once the native freelists are
exhausted, the fallback code first scans orders top-down for whole
blocks to claim, then falls back to a bottom-up search for the
smallest buddy to steal. For the next page in the batch, it goes
through the same search again.

This can be made more efficient. Since rmqueue_bulk() holds the
zone->lock over the entire batch, the freelists are not subject to
outside changes; when the search for a block to claim has already
failed, there is no point in trying again for the next page.

Modify __rmqueue() to remember the last successful fallback mode, and
restart directly from there on the next rmqueue_bulk() iteration.
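
A minimal userspace sketch of the resulting control flow, to
illustrate the caching idea (the stage names, stub "allocators" and
numbers below are made up for illustration, and the CMA stage is
omitted; the real implementation is in the diff):

#include <stdio.h>

enum fb_mode { FB_NORMAL, FB_CLAIM, FB_STEAL };

static int normal_left = 2, claim_left = 0, steal_left = 5;
static char page;

/* stand-in for pulling a page off a freelist */
static void *take(int *pool)
{
        return (*pool)-- > 0 ? &page : NULL;
}

/* stand-in for __rmqueue(): start the search at *mode */
static void *alloc_one(enum fb_mode *mode)
{
        void *p;

        switch (*mode) {
        case FB_NORMAL:
                if ((p = take(&normal_left)))
                        return p;
                /* fall through */
        case FB_CLAIM:
                if ((p = take(&claim_left))) {
                        *mode = FB_NORMAL; /* block claimed, list refilled */
                        return p;
                }
                /* fall through */
        case FB_STEAL:
                if ((p = take(&steal_left))) {
                        *mode = FB_STEAL;
                        return p;
                }
        }
        return NULL;
}

int main(void)
{
        enum fb_mode mode = FB_NORMAL; /* reset once per locked batch */
        int i, got = 0;

        for (i = 0; i < 8; i++) /* stand-in for the rmqueue_bulk() loop */
                got += alloc_one(&mode) != NULL;
        printf("allocated %d pages, final mode %d\n", got, mode);
        return 0;
}

Once a page has had to be stolen, *mode stays at FB_STEAL and later
iterations in the same batch skip straight to that stage instead of
re-running the failed claim search.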

Oliver confirms that this improves beyond the regression that the test
robot reported against c2f6ea38fc1b:

commit:
  f3b92176f4 ("tools/selftests: add guard region test for /proc/$pid/pagemap")
  c2f6ea38fc ("mm: page_alloc: don't steal single pages from biggest buddy")
  acc4d5ff0b ("Merge tag 'net-6.15-rc0' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net")
  2c847f27c3 ("mm: page_alloc: speed up fallbacks in rmqueue_bulk()")   <--- your patch

f3b92176f4f7100f c2f6ea38fc1b640aa7a2e155cc1 acc4d5ff0b61eb1715c498b6536 2c847f27c37da65a93d23c237c5
---------------- --------------------------- --------------------------- ---------------------------
         %stddev     %change         %stddev     %change         %stddev     %change         %stddev
             \          |                \          |                \          |                \
  25525364 ±  3%     -56.4%   11135467           -57.8%   10779336           +31.6%   33581409        vm-scalability.throughput

Carlos confirms that worst-case times are almost fully recovered
compared to before the earlier culprit patch:

  2dd482ba627d (before freelist hygiene):    1ms
  c0cd6f557b90  (after freelist hygiene):   90ms
 next-20250319    (steal smallest buddy):  280ms
    this patch                          :    8ms

Reported-by: kernel test robot <oliver.sang@intel.com>
Reported-by: Carlos Song <carlos.song@nxp.com>
Tested-by: kernel test robot <oliver.sang@intel.com>
Fixes: c0cd6f557b90 ("mm: page_alloc: fix freelist movement during block conversion")
Fixes: c2f6ea38fc1b ("mm: page_alloc: don't steal single pages from biggest buddy")
Closes: https://lore.kernel.org/oe-lkp/202503271547.fc08b188-lkp@intel.com
Cc: stable@vger.kernel.org	# 6.10+
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
---
 mm/page_alloc.c | 100 +++++++++++++++++++++++++++++++++++-------------
 1 file changed, 74 insertions(+), 26 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index f51aa6051a99..03b0d45ed45a 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2194,11 +2194,11 @@ try_to_claim_block(struct zone *zone, struct page *page,
  * The use of signed ints for order and current_order is a deliberate
  * deviation from the rest of this file, to make the for loop
  * condition simpler.
- *
- * Return the stolen page, or NULL if none can be found.
  */
+
+/* Try to claim a whole foreign block, take a page, expand the remainder */
 static __always_inline struct page *
-__rmqueue_fallback(struct zone *zone, int order, int start_migratetype,
+__rmqueue_claim(struct zone *zone, int order, int start_migratetype,
 						unsigned int alloc_flags)
 {
 	struct free_area *area;
@@ -2236,14 +2236,26 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype,
 		page = try_to_claim_block(zone, page, current_order, order,
 					  start_migratetype, fallback_mt,
 					  alloc_flags);
-		if (page)
-			goto got_one;
+		if (page) {
+			trace_mm_page_alloc_extfrag(page, order, current_order,
+						    start_migratetype, fallback_mt);
+			return page;
+		}
 	}
 
-	if (alloc_flags & ALLOC_NOFRAGMENT)
-		return NULL;
+	return NULL;
+}
+
+/* Try to steal a single page from a foreign block */
+static __always_inline struct page *
+__rmqueue_steal(struct zone *zone, int order, int start_migratetype)
+{
+	struct free_area *area;
+	int current_order;
+	struct page *page;
+	int fallback_mt;
+	bool claim_block;
 
-	/* No luck claiming pageblock. Find the smallest fallback page */
 	for (current_order = order; current_order < NR_PAGE_ORDERS; current_order++) {
 		area = &(zone->free_area[current_order]);
 		fallback_mt = find_suitable_fallback(area, current_order,
@@ -2253,25 +2265,28 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype,
 
 		page = get_page_from_free_area(area, fallback_mt);
 		page_del_and_expand(zone, page, order, current_order, fallback_mt);
-		goto got_one;
+		trace_mm_page_alloc_extfrag(page, order, current_order,
+					    start_migratetype, fallback_mt);
+		return page;
 	}
 
 	return NULL;
-
-got_one:
-	trace_mm_page_alloc_extfrag(page, order, current_order,
-		start_migratetype, fallback_mt);
-
-	return page;
 }
 
+enum rmqueue_mode {
+	RMQUEUE_NORMAL,
+	RMQUEUE_CMA,
+	RMQUEUE_CLAIM,
+	RMQUEUE_STEAL,
+};
+
 /*
  * Do the hard work of removing an element from the buddy allocator.
  * Call me with the zone->lock already held.
  */
 static __always_inline struct page *
 __rmqueue(struct zone *zone, unsigned int order, int migratetype,
-						unsigned int alloc_flags)
+	  unsigned int alloc_flags, enum rmqueue_mode *mode)
 {
 	struct page *page;
 
@@ -2290,16 +2305,47 @@ __rmqueue(struct zone *zone, unsigned int order, int migratetype,
 		}
 	}
 
-	page = __rmqueue_smallest(zone, order, migratetype);
-	if (unlikely(!page)) {
-		if (alloc_flags & ALLOC_CMA)
+	/*
+	 * Try the different freelists, native then foreign.
+	 *
+	 * The fallback logic is expensive and rmqueue_bulk() calls in
+	 * a loop with the zone->lock held, meaning the freelists are
+	 * not subject to any outside changes. Remember in *mode where
+	 * we found pay dirt, to save us the search on the next call.
+	 */
+	switch (*mode) {
+	case RMQUEUE_NORMAL:
+		page = __rmqueue_smallest(zone, order, migratetype);
+		if (page)
+			return page;
+		fallthrough;
+	case RMQUEUE_CMA:
+		if (alloc_flags & ALLOC_CMA) {
 			page = __rmqueue_cma_fallback(zone, order);
-
-		if (!page)
-			page = __rmqueue_fallback(zone, order, migratetype,
-						  alloc_flags);
+			if (page) {
+				*mode = RMQUEUE_CMA;
+				return page;
+			}
+		}
+		fallthrough;
+	case RMQUEUE_CLAIM:
+		page = __rmqueue_claim(zone, order, migratetype, alloc_flags);
+		if (page) {
+			/* Replenished native freelist, back to normal mode */
+			*mode = RMQUEUE_NORMAL;
+			return page;
+		}
+		fallthrough;
+	case RMQUEUE_STEAL:
+		if (!(alloc_flags & ALLOC_NOFRAGMENT)) {
+			page = __rmqueue_steal(zone, order, migratetype);
+			if (page) {
+				*mode = RMQUEUE_STEAL;
+				return page;
+			}
+		}
 	}
-	return page;
+	return NULL;
 }
 
 /*
@@ -2311,6 +2357,7 @@ static int rmqueue_bulk(struct zone *zone, unsigned int order,
 			unsigned long count, struct list_head *list,
 			int migratetype, unsigned int alloc_flags)
 {
+	enum rmqueue_mode rmqm = RMQUEUE_NORMAL;
 	unsigned long flags;
 	int i;
 
@@ -2321,7 +2368,7 @@ static int rmqueue_bulk(struct zone *zone, unsigned int order,
 	}
 	for (i = 0; i < count; ++i) {
 		struct page *page = __rmqueue(zone, order, migratetype,
-								alloc_flags);
+					      alloc_flags, &rmqm);
 		if (unlikely(page == NULL))
 			break;
 
@@ -2934,6 +2981,7 @@ struct page *rmqueue_buddy(struct zone *preferred_zone, struct zone *zone,
 {
 	struct page *page;
 	unsigned long flags;
+	enum rmqueue_mode rmqm = RMQUEUE_NORMAL;
 
 	do {
 		page = NULL;
@@ -2945,7 +2993,7 @@ struct page *rmqueue_buddy(struct zone *preferred_zone, struct zone *zone,
 		if (alloc_flags & ALLOC_HIGHATOMIC)
 			page = __rmqueue_smallest(zone, order, MIGRATE_HIGHATOMIC);
 		if (!page) {
-			page = __rmqueue(zone, order, migratetype, alloc_flags);
+			page = __rmqueue(zone, order, migratetype, alloc_flags, &rmqm);
 
 			/*
 			 * If the allocation fails, allow OOM handling and
-- 
2.49.0




* [PATCH 2/2] mm: page_alloc: tighten up find_suitable_fallback()
  2025-04-07 18:01 [PATCH 1/2] mm: page_alloc: speed up fallbacks in rmqueue_bulk() Johannes Weiner
@ 2025-04-07 18:01 ` Johannes Weiner
  2025-04-10  8:51   ` Vlastimil Babka
                     ` (2 more replies)
  2025-04-08 17:22 ` [PATCH 1/2] mm: page_alloc: speed up fallbacks in rmqueue_bulk() Brendan Jackman
                   ` (5 subsequent siblings)
  6 siblings, 3 replies; 18+ messages in thread
From: Johannes Weiner @ 2025-04-07 18:01 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Vlastimil Babka, Brendan Jackman, Mel Gorman, Carlos Song,
	linux-mm, linux-kernel

find_suitable_fallback() is not as efficient as it could be, and
somewhat difficult to follow.

1. should_try_claim_block() is a loop invariant. There is no point in
   checking fallback areas if the caller is interested in claimable
   blocks but the order and the migratetype don't allow for that.

2. __rmqueue_steal() doesn't care about claimability, so it shouldn't
   have to run those tests.

Different callers want different things from this helper:

1. __compact_finished() scans orders up until it finds a claimable block
2. __rmqueue_claim() scans orders down as long as blocks are claimable
3. __rmqueue_steal() doesn't care about claimability at all

Move should_try_claim_block() out of the loop. Only test it for the
two callers who care in the first place. Distinguish "no blocks" from
"order + mt are not claimable" in the return value; __rmqueue_claim()
can stop once order becomes unclaimable, while __compact_finished() can
keep advancing until order becomes claimable.
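
A toy model of the new return-value contract and how the two scan
directions consume it (illustrative only; the stub below and its
numbers are made up and are not the kernel helper):

#include <stdio.h>

/*
 * Stand-in for find_suitable_fallback():
 *   >= 0  fallback migratetype with free pages at this order
 *   -1    nothing free at this order
 *   -2    order/migratetype can never claim a whole block
 */
static int suitable(int order, int claimable)
{
        if (claimable && order < 5)
                return -2;
        return order == 7 ? 1 : -1; /* pretend order 7 has a fallback */
}

int main(void)
{
        int order;

        /* __rmqueue_claim() style: scan down, stop once unclaimable */
        for (order = 10; order >= 0; order--) {
                int mt = suitable(order, 1);

                if (mt == -1)
                        continue; /* no block in that order */
                if (mt == -2)
                        break;    /* orders too low to claim, abort */
                printf("claim order %d, migratetype %d\n", order, mt);
                break;
        }

        /* __rmqueue_steal() style: scan up, claimability irrelevant */
        for (order = 0; order <= 10; order++) {
                int mt = suitable(order, 0);

                if (mt >= 0) {
                        printf("steal order %d, migratetype %d\n", order, mt);
                        break;
                }
        }
        return 0;
}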

Before:

 Performance counter stats for './run case-lru-file-mmap-read' (5 runs):

	 85,294.85 msec task-clock                       #    5.644 CPUs utilized               ( +-  0.32% )
	    15,968      context-switches                 #  187.209 /sec                        ( +-  3.81% )
	       153      cpu-migrations                   #    1.794 /sec                        ( +-  3.29% )
	   801,808      page-faults                      #    9.400 K/sec                       ( +-  0.10% )
   733,358,331,786      instructions                     #    1.87  insn per cycle              ( +-  0.20% )  (64.94%)
   392,622,904,199      cycles                           #    4.603 GHz                         ( +-  0.31% )  (64.84%)
   148,563,488,531      branches                         #    1.742 G/sec                       ( +-  0.18% )  (63.86%)
       152,143,228      branch-misses                    #    0.10% of all branches             ( +-  1.19% )  (62.82%)

	   15.1128 +- 0.0637 seconds time elapsed  ( +-  0.42% )

After:

 Performance counter stats for './run case-lru-file-mmap-read' (5 runs):

         84,380.21 msec task-clock                       #    5.664 CPUs utilized               ( +-  0.21% )
            16,656      context-switches                 #  197.392 /sec                        ( +-  3.27% )
               151      cpu-migrations                   #    1.790 /sec                        ( +-  3.28% )
           801,703      page-faults                      #    9.501 K/sec                       ( +-  0.09% )
   731,914,183,060      instructions                     #    1.88  insn per cycle              ( +-  0.38% )  (64.90%)
   388,673,535,116      cycles                           #    4.606 GHz                         ( +-  0.24% )  (65.06%)
   148,251,482,143      branches                         #    1.757 G/sec                       ( +-  0.37% )  (63.92%)
       149,766,550      branch-misses                    #    0.10% of all branches             ( +-  1.22% )  (62.88%)

           14.8968 +- 0.0486 seconds time elapsed  ( +-  0.33% )

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
---
 mm/compaction.c |  4 +---
 mm/internal.h   |  2 +-
 mm/page_alloc.c | 31 +++++++++++++------------------
 3 files changed, 15 insertions(+), 22 deletions(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index 139f00c0308a..7462a02802a5 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -2348,7 +2348,6 @@ static enum compact_result __compact_finished(struct compact_control *cc)
 	ret = COMPACT_NO_SUITABLE_PAGE;
 	for (order = cc->order; order < NR_PAGE_ORDERS; order++) {
 		struct free_area *area = &cc->zone->free_area[order];
-		bool claim_block;
 
 		/* Job done if page is free of the right migratetype */
 		if (!free_area_empty(area, migratetype))
@@ -2364,8 +2363,7 @@ static enum compact_result __compact_finished(struct compact_control *cc)
 		 * Job done if allocation would steal freepages from
 		 * other migratetype buddy lists.
 		 */
-		if (find_suitable_fallback(area, order, migratetype,
-						true, &claim_block) != -1)
+		if (find_suitable_fallback(area, order, migratetype, true) >= 0)
 			/*
 			 * Movable pages are OK in any pageblock. If we are
 			 * stealing for a non-movable allocation, make sure
diff --git a/mm/internal.h b/mm/internal.h
index 50c2f590b2d0..55384b9971c3 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -915,7 +915,7 @@ static inline void init_cma_pageblock(struct page *page)
 
 
 int find_suitable_fallback(struct free_area *area, unsigned int order,
-			int migratetype, bool claim_only, bool *claim_block);
+			   int migratetype, bool claimable);
 
 static inline bool free_area_empty(struct free_area *area, int migratetype)
 {
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 03b0d45ed45a..1522e3a29b16 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2077,31 +2077,25 @@ static bool should_try_claim_block(unsigned int order, int start_mt)
 
 /*
  * Check whether there is a suitable fallback freepage with requested order.
- * Sets *claim_block to instruct the caller whether it should convert a whole
- * pageblock to the returned migratetype.
- * If only_claim is true, this function returns fallback_mt only if
+ * If claimable is true, this function returns fallback_mt only if
  * we would do this whole-block claiming. This would help to reduce
  * fragmentation due to mixed migratetype pages in one pageblock.
  */
 int find_suitable_fallback(struct free_area *area, unsigned int order,
-			int migratetype, bool only_claim, bool *claim_block)
+			   int migratetype, bool claimable)
 {
 	int i;
-	int fallback_mt;
+
+	if (claimable && !should_try_claim_block(order, migratetype))
+		return -2;
 
 	if (area->nr_free == 0)
 		return -1;
 
-	*claim_block = false;
 	for (i = 0; i < MIGRATE_PCPTYPES - 1 ; i++) {
-		fallback_mt = fallbacks[migratetype][i];
-		if (free_area_empty(area, fallback_mt))
-			continue;
+		int fallback_mt = fallbacks[migratetype][i];
 
-		if (should_try_claim_block(order, migratetype))
-			*claim_block = true;
-
-		if (*claim_block || !only_claim)
+		if (!free_area_empty(area, fallback_mt))
 			return fallback_mt;
 	}
 
@@ -2206,7 +2200,6 @@ __rmqueue_claim(struct zone *zone, int order, int start_migratetype,
 	int min_order = order;
 	struct page *page;
 	int fallback_mt;
-	bool claim_block;
 
 	/*
 	 * Do not steal pages from freelists belonging to other pageblocks
@@ -2225,11 +2218,14 @@ __rmqueue_claim(struct zone *zone, int order, int start_migratetype,
 				--current_order) {
 		area = &(zone->free_area[current_order]);
 		fallback_mt = find_suitable_fallback(area, current_order,
-				start_migratetype, false, &claim_block);
+						     start_migratetype, true);
+
+		/* No block in that order */
 		if (fallback_mt == -1)
 			continue;
 
-		if (!claim_block)
+		/* Advanced into orders too low to claim, abort */
+		if (fallback_mt == -2)
 			break;
 
 		page = get_page_from_free_area(area, fallback_mt);
@@ -2254,12 +2250,11 @@ __rmqueue_steal(struct zone *zone, int order, int start_migratetype)
 	int current_order;
 	struct page *page;
 	int fallback_mt;
-	bool claim_block;
 
 	for (current_order = order; current_order < NR_PAGE_ORDERS; current_order++) {
 		area = &(zone->free_area[current_order]);
 		fallback_mt = find_suitable_fallback(area, current_order,
-				start_migratetype, false, &claim_block);
+						     start_migratetype, false);
 		if (fallback_mt == -1)
 			continue;
 
-- 
2.49.0




* Re: [PATCH 1/2] mm: page_alloc: speed up fallbacks in rmqueue_bulk()
  2025-04-07 18:01 [PATCH 1/2] mm: page_alloc: speed up fallbacks in rmqueue_bulk() Johannes Weiner
  2025-04-07 18:01 ` [PATCH 2/2] mm: page_alloc: tighten up find_suitable_fallback() Johannes Weiner
@ 2025-04-08 17:22 ` Brendan Jackman
  2025-04-08 18:50   ` Johannes Weiner
  2025-04-09  8:02 ` Yunsheng Lin
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 18+ messages in thread
From: Brendan Jackman @ 2025-04-08 17:22 UTC (permalink / raw)
  To: Johannes Weiner, Andrew Morton
  Cc: Vlastimil Babka, Mel Gorman, Carlos Song, linux-mm, linux-kernel,
	kernel test robot, stable

On Mon Apr 7, 2025 at 6:01 PM UTC, Johannes Weiner wrote:
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -2194,11 +2194,11 @@ try_to_claim_block(struct zone *zone, struct page *page,
>   * The use of signed ints for order and current_order is a deliberate
>   * deviation from the rest of this file, to make the for loop
>   * condition simpler.
> - *
> - * Return the stolen page, or NULL if none can be found.
>   */

This commentary is pretty confusing now: there's a block of text that
kinda vaguely applies to the aggregate of __rmqueue_steal(),
__rmqueue_fallback() and half of __rmqueue(). I think this new code does
a better job of speaking for itself so I think we should just delete
this block comment and replace it with some more verbosity elsewhere.

> +
> +/* Try to claim a whole foreign block, take a page, expand the remainder */

Also on the commentary front, I am not a fan of "foreign" and "native":

- "Foreign" is already used in this file to mean NUMA-nonlocal.

- We already have "start" and "fallback" being used in identifiers
  as adjectives to describe the migratetype concept.

  I wouldn't say those are _better_, "native" and "foreign" might be
  clearer, but it's not worth introducing inconsistency IMO.

>  static __always_inline struct page *
> -__rmqueue_fallback(struct zone *zone, int order, int start_migratetype,
> +__rmqueue_claim(struct zone *zone, int order, int start_migratetype,
>  						unsigned int alloc_flags)
>  {
>  	struct free_area *area;

[pasting in more context that wasn't in the original diff..]
>	/*
>	 * Find the largest available free page in the other list. This roughly
>	 * approximates finding the pageblock with the most free pages, which
>	 * would be too costly to do exactly.
>	 */
>	for (current_order = MAX_PAGE_ORDER; current_order >= min_order;
>				--current_order) {

IIUC we could go one step further here and also avoid repeating this
iteration? Maybe something for a separate patch though?

Anyway, the approach seems like a clear improvement, thanks. I will need
to take a closer look at it tomorrow, I've run out of brain juice today.

Here's what I got from redistributing the block comment and flipping
the terminology:

diff --git i/mm/page_alloc.c w/mm/page_alloc.c
index dfb2b3f508af..b8142d605691 100644
--- i/mm/page_alloc.c
+++ w/mm/page_alloc.c
@@ -2183,21 +2183,13 @@ try_to_claim_block(struct zone *zone, struct page *page,
 }
 
 /*
- * Try finding a free buddy page on the fallback list.
- *
- * This will attempt to claim a whole pageblock for the requested type
- * to ensure grouping of such requests in the future.
- *
- * If a whole block cannot be claimed, steal an individual page, regressing to
- * __rmqueue_smallest() logic to at least break up as little contiguity as
- * possible.
+ * Try to allocate from some fallback migratetype by claiming the entire block,
+ * i.e. converting it to the allocation's start migratetype.
  *
  * The use of signed ints for order and current_order is a deliberate
  * deviation from the rest of this file, to make the for loop
  * condition simpler.
  */
-
-/* Try to claim a whole foreign block, take a page, expand the remainder */
 static __always_inline struct page *
 __rmqueue_claim(struct zone *zone, int order, int start_migratetype,
                                                unsigned int alloc_flags)
@@ -2247,7 +2239,10 @@ __rmqueue_claim(struct zone *zone, int order, int start_migratetype,
        return NULL;
 }
 
-/* Try to steal a single page from a foreign block */
+/*
+ * Try to steal a single page from some fallback migratetype. Leave the rest of
+ * the block as its current migratetype, potentially causing fragmentation.
+ */
 static __always_inline struct page *
 __rmqueue_steal(struct zone *zone, int order, int start_migratetype)
 {
@@ -2307,7 +2302,9 @@ __rmqueue(struct zone *zone, unsigned int order, int migratetype,
        }
 
        /*
-        * Try the different freelists, native then foreign.
+        * First try the freelists of the requested migratetype, then try
+        * fallbacks. Roughly, each fallback stage poses more of a fragmentation
+        * risk.
         *
         * The fallback logic is expensive and rmqueue_bulk() calls in
         * a loop with the zone->lock held, meaning the freelists are
@@ -2332,7 +2329,7 @@ __rmqueue(struct zone *zone, unsigned int order, int migratetype,
        case RMQUEUE_CLAIM:
                page = __rmqueue_claim(zone, order, migratetype, alloc_flags);
                if (page) {
-                       /* Replenished native freelist, back to normal mode */
+                       /* Replenished requested migratetype's freelist, back to normal mode */
                        *mode = RMQUEUE_NORMAL;
                        return page;
                }





* Re: [PATCH 1/2] mm: page_alloc: speed up fallbacks in rmqueue_bulk()
  2025-04-08 17:22 ` [PATCH 1/2] mm: page_alloc: speed up fallbacks in rmqueue_bulk() Brendan Jackman
@ 2025-04-08 18:50   ` Johannes Weiner
  2025-04-09 17:30     ` Brendan Jackman
  0 siblings, 1 reply; 18+ messages in thread
From: Johannes Weiner @ 2025-04-08 18:50 UTC (permalink / raw)
  To: Brendan Jackman
  Cc: Andrew Morton, Vlastimil Babka, Mel Gorman, Carlos Song,
	linux-mm, linux-kernel, kernel test robot, stable

On Tue, Apr 08, 2025 at 05:22:00PM +0000, Brendan Jackman wrote:
> On Mon Apr 7, 2025 at 6:01 PM UTC, Johannes Weiner wrote:
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -2194,11 +2194,11 @@ try_to_claim_block(struct zone *zone, struct page *page,
> >   * The use of signed ints for order and current_order is a deliberate
> >   * deviation from the rest of this file, to make the for loop
> >   * condition simpler.
> > - *
> > - * Return the stolen page, or NULL if none can be found.
> >   */
> 
> This commentary is pretty confusing now, there's a block of text that
> kinda vaguely applies to the aggregate of __rmqueue_steal(),
> __rmqueue_fallback() and half of __rmqueue(). I think this new code does
> a better job of speaking for itself so I think we should just delete
> this block comment and replace it with some more verbosity elsewhere.

I'm glad you think so, let's remove it then!

> > +/* Try to claim a whole foreign block, take a page, expand the remainder */
> 
> Also on the commentary front, I am not a fan of "foreign" and "native":
> 
> - "Foreign" is already used in this file to mean NUMA-nonlocal.
> 
> - We already have "start" and "fallback" being used in identifiers
>   as adjectives to describe the migratetype concept.
> 
>   I wouldn't say those are _better_, "native" and "foreign" might be
>   clearer, but it's not worth introducing inconsistency IMO.

That's a fair point, no objection to renaming them.

> >  static __always_inline struct page *
> > -__rmqueue_fallback(struct zone *zone, int order, int start_migratetype,
> > +__rmqueue_claim(struct zone *zone, int order, int start_migratetype,
> >  						unsigned int alloc_flags)
> >  {
> >  	struct free_area *area;
> 
> [pasting in more context that wasn't in the original diff..]
> >	/*
> >	 * Find the largest available free page in the other list. This roughly
> >	 * approximates finding the pageblock with the most free pages, which
> >	 * would be too costly to do exactly.
> >	 */
> >	for (current_order = MAX_PAGE_ORDER; current_order >= min_order;
> >				--current_order) {
> 
> IIUC we could go one step further here and also avoid repeating this
> iteration? Maybe something for a separate patch though?

That might be worth a test, but agree this should be a separate patch.

AFAICS, in the most common configurations MAX_PAGE_ORDER is only one
step above pageblock_order or even the same. It might not be worth the
complication.

> Anyway, the approach seems like a clear improvement, thanks. I will need
> to take a closer look at it tomorrow, I've run out of brain juice today.

Much appreciate you taking a look, thanks.

> Here's what I got from redistributing the block comment and flipping
> the terminology:
> 
> diff --git i/mm/page_alloc.c w/mm/page_alloc.c
> index dfb2b3f508af..b8142d605691 100644
> --- i/mm/page_alloc.c
> +++ w/mm/page_alloc.c
> @@ -2183,21 +2183,13 @@ try_to_claim_block(struct zone *zone, struct page *page,
>  }
>  
>  /*
> - * Try finding a free buddy page on the fallback list.
> - *
> - * This will attempt to claim a whole pageblock for the requested type
> - * to ensure grouping of such requests in the future.
> - *
> - * If a whole block cannot be claimed, steal an individual page, regressing to
> - * __rmqueue_smallest() logic to at least break up as little contiguity as
> - * possible.
> + * Try to allocate from some fallback migratetype by claiming the entire block,
> + * i.e. converting it to the allocation's start migratetype.
>   *
>   * The use of signed ints for order and current_order is a deliberate
>   * deviation from the rest of this file, to make the for loop
>   * condition simpler.
>   */
> -
> -/* Try to claim a whole foreign block, take a page, expand the remainder */
>  static __always_inline struct page *
>  __rmqueue_claim(struct zone *zone, int order, int start_migratetype,
>                                                 unsigned int alloc_flags)
> @@ -2247,7 +2239,10 @@ __rmqueue_claim(struct zone *zone, int order, int start_migratetype,
>         return NULL;
>  }
>  
> -/* Try to steal a single page from a foreign block */
> +/*
> + * Try to steal a single page from some fallback migratetype. Leave the rest of
> + * the block as its current migratetype, potentially causing fragmentation.
> + */
>  static __always_inline struct page *
>  __rmqueue_steal(struct zone *zone, int order, int start_migratetype)
>  {
> @@ -2307,7 +2302,9 @@ __rmqueue(struct zone *zone, unsigned int order, int migratetype,
>         }
>  
>         /*
> -        * Try the different freelists, native then foreign.
> +        * First try the freelists of the requested migratetype, then try
> +        * fallbacks. Roughly, each fallback stage poses more of a fragmentation
> +        * risk.

How about "then try fallback modes with increasing levels of
fragmentation risk."

>          * The fallback logic is expensive and rmqueue_bulk() calls in
>          * a loop with the zone->lock held, meaning the freelists are
> @@ -2332,7 +2329,7 @@ __rmqueue(struct zone *zone, unsigned int order, int migratetype,
>         case RMQUEUE_CLAIM:
>                 page = __rmqueue_claim(zone, order, migratetype, alloc_flags);
>                 if (page) {
> -                       /* Replenished native freelist, back to normal mode */
> +                       /* Replenished requested migratetype's freelist, back to normal mode */
>                         *mode = RMQUEUE_NORMAL;

This line is kind of long now. How about:

			/* Replenished preferred freelist, back to normal mode */

But yeah, I like your proposed changes. Would you care to send a
proper patch?

Acked-by: Johannes Weiner <hannes@cmpxchg.org>



* Re: [PATCH 1/2] mm: page_alloc: speed up fallbacks in rmqueue_bulk()
  2025-04-07 18:01 [PATCH 1/2] mm: page_alloc: speed up fallbacks in rmqueue_bulk() Johannes Weiner
  2025-04-07 18:01 ` [PATCH 2/2] mm: page_alloc: tighten up find_suitable_fallback() Johannes Weiner
  2025-04-08 17:22 ` [PATCH 1/2] mm: page_alloc: speed up fallbacks in rmqueue_bulk() Brendan Jackman
@ 2025-04-09  8:02 ` Yunsheng Lin
  2025-04-09 14:00   ` Johannes Weiner
  2025-04-10  2:02 ` Zi Yan
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 18+ messages in thread
From: Yunsheng Lin @ 2025-04-09  8:02 UTC (permalink / raw)
  To: Johannes Weiner, Andrew Morton
  Cc: Vlastimil Babka, Brendan Jackman, Mel Gorman, Carlos Song,
	linux-mm, linux-kernel, kernel test robot, stable

On 2025/4/8 2:01, Johannes Weiner wrote:

...

>  
> @@ -2934,6 +2981,7 @@ struct page *rmqueue_buddy(struct zone *preferred_zone, struct zone *zone,
>  {
>  	struct page *page;
>  	unsigned long flags;
> +	enum rmqueue_mode rmqm = RMQUEUE_NORMAL;
>  
>  	do {
>  		page = NULL;
> @@ -2945,7 +2993,7 @@ struct page *rmqueue_buddy(struct zone *preferred_zone, struct zone *zone,
>  		if (alloc_flags & ALLOC_HIGHATOMIC)
>  			page = __rmqueue_smallest(zone, order, MIGRATE_HIGHATOMIC);
>  		if (!page) {
> -			page = __rmqueue(zone, order, migratetype, alloc_flags);
> +			page = __rmqueue(zone, order, migratetype, alloc_flags, &rmqm);
>  
>  			/*
>  			 * If the allocation fails, allow OOM handling and

It was not in the diff, but it seems the zone->lock is held inside the do..while loop,
doesn't it mean that the freelists are subject to outside changes and rmqm is stale?



* Re: [PATCH 1/2] mm: page_alloc: speed up fallbacks in rmqueue_bulk()
  2025-04-09  8:02 ` Yunsheng Lin
@ 2025-04-09 14:00   ` Johannes Weiner
  0 siblings, 0 replies; 18+ messages in thread
From: Johannes Weiner @ 2025-04-09 14:00 UTC (permalink / raw)
  To: Yunsheng Lin
  Cc: Andrew Morton, Vlastimil Babka, Brendan Jackman, Mel Gorman,
	Carlos Song, linux-mm, linux-kernel, kernel test robot, stable

On Wed, Apr 09, 2025 at 04:02:39PM +0800, Yunsheng Lin wrote:
> On 2025/4/8 2:01, Johannes Weiner wrote:
> > @@ -2934,6 +2981,7 @@ struct page *rmqueue_buddy(struct zone *preferred_zone, struct zone *zone,
> >  {
> >  	struct page *page;
> >  	unsigned long flags;
> > +	enum rmqueue_mode rmqm = RMQUEUE_NORMAL;
> >  
> >  	do {
> >  		page = NULL;
> > @@ -2945,7 +2993,7 @@ struct page *rmqueue_buddy(struct zone *preferred_zone, struct zone *zone,
> >  		if (alloc_flags & ALLOC_HIGHATOMIC)
> >  			page = __rmqueue_smallest(zone, order, MIGRATE_HIGHATOMIC);
> >  		if (!page) {
> > -			page = __rmqueue(zone, order, migratetype, alloc_flags);
> > +			page = __rmqueue(zone, order, migratetype, alloc_flags, &rmqm);
> >  
> >  			/*
> >  			 * If the allocation fails, allow OOM handling and
> 
> It was not in the diff, but it seems the zone->lock is held inside the do..while loop,
> doesn't it mean that the freelists are subject to outside changes and rmqm is stale?

Yes. Note that it only loops when there is a bug/corrupted page, so it
won't make much difference in practice. But it's still kind of weird.

Thanks for your review, Yunsheng!

Andrew, could you please fold the below fixlet?

---

From 71b1eea7ded41c1f674909f9755c23b9ee9bcb6a Mon Sep 17 00:00:00 2001
From: Johannes Weiner <hannes@cmpxchg.org>
Date: Wed, 9 Apr 2025 09:56:52 -0400
Subject: [PATCH] mm: page_alloc: speed up fallbacks in rmqueue_bulk() fix

reset rmqueue_mode in rmqueue_buddy() error loop, per Yunsheng Lin

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
---
 mm/page_alloc.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index dfb2b3f508af..7ffeeb0f62d3 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2983,7 +2983,6 @@ struct page *rmqueue_buddy(struct zone *preferred_zone, struct zone *zone,
 {
 	struct page *page;
 	unsigned long flags;
-	enum rmqueue_mode rmqm = RMQUEUE_NORMAL;
 
 	do {
 		page = NULL;
@@ -2996,6 +2995,8 @@ struct page *rmqueue_buddy(struct zone *preferred_zone, struct zone *zone,
 		if (alloc_flags & ALLOC_HIGHATOMIC)
 			page = __rmqueue_smallest(zone, order, MIGRATE_HIGHATOMIC);
 		if (!page) {
+			enum rmqueue_mode rmqm = RMQUEUE_NORMAL;
+
 			page = __rmqueue(zone, order, migratetype, alloc_flags, &rmqm);
 
 			/*
-- 
2.49.0




* Re: [PATCH 1/2] mm: page_alloc: speed up fallbacks in rmqueue_bulk()
  2025-04-08 18:50   ` Johannes Weiner
@ 2025-04-09 17:30     ` Brendan Jackman
  2025-04-10  8:16       ` Vlastimil Babka
  0 siblings, 1 reply; 18+ messages in thread
From: Brendan Jackman @ 2025-04-09 17:30 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Andrew Morton, Vlastimil Babka, Mel Gorman, Carlos Song,
	linux-mm, linux-kernel, kernel test robot, stable

On Tue Apr 8, 2025 at 6:50 PM UTC, Johannes Weiner wrote:
>> >	/*
>> >	 * Find the largest available free page in the other list. This roughly
>> >	 * approximates finding the pageblock with the most free pages, which
>> >	 * would be too costly to do exactly.
>> >	 */
>> >	for (current_order = MAX_PAGE_ORDER; current_order >= min_order;
>> >				--current_order) {
>> 
>> IIUC we could go one step further here and also avoid repeating this
>> iteration? Maybe something for a separate patch though?
>
> That might be worth a test, but agree this should be a separate patch.
>
> AFAICS, in the most common configurations MAX_PAGE_ORDER is only one
> step above pageblock_order or even the same. It might not be worth the
> complication.

Oh yeah, makes sense.

>>         /*
>> -        * Try the different freelists, native then foreign.
>> +        * First try the freelists of the requested migratetype, then try
>> +        * fallbacks. Roughly, each fallback stage poses more of a fragmentation
>> +        * risk.
>
> How about "then try fallback modes with increasing levels of
> fragmentation risk."

Yep, nice thanks.

>>          * The fallback logic is expensive and rmqueue_bulk() calls in
>>          * a loop with the zone->lock held, meaning the freelists are
>> @@ -2332,7 +2329,7 @@ __rmqueue(struct zone *zone, unsigned int order, int migratetype,
>>         case RMQUEUE_CLAIM:
>>                 page = __rmqueue_claim(zone, order, migratetype, alloc_flags);
>>                 if (page) {
>> -                       /* Replenished native freelist, back to normal mode */
>> +                       /* Replenished requested migratetype's freelist, back to normal mode */
>>                         *mode = RMQUEUE_NORMAL;
>
> This line is kind of long now. How about:
>
> 			/* Replenished preferred freelist, back to normal mode */

Yep, sounds good - it's still 81 characters; the rest of this file
sticks to 80 for comments. I guess I'll leave it to Andrew to decide
if that is an issue?

> But yeah, I like your proposed changes. Would you care to send a
> proper patch?

Sure, pasting below. Andrew, could you fold this in? Also, I haven't
done this style of patch sending before, so please let me know if I'm doing
something to make your life difficult.

> Acked-by: Johannes Weiner <hannes@cmpxchg.org>

Aside from the comment stuff fixed by the patch below:

Reviewed-by: Brendan Jackman <jackmanb@google.com>

---

From 8ff20dbb52770d082e182482d2b47e521de028d1 Mon Sep 17 00:00:00 2001
From: Brendan Jackman <jackmanb@google.com>
Date: Wed, 9 Apr 2025 17:22:14 +0000
Subject: [PATCH] page_alloc: speed up fallbacks in rmqueue_bulk() - comment updates

Tidy up some terminology and redistribute commentary.

Signed-off-by: Brendan Jackman <jackmanb@google.com>
---
 mm/page_alloc.c | 22 +++++++++-------------
 1 file changed, 9 insertions(+), 13 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index dfb2b3f508af4..220bd0bcc38c3 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2183,21 +2183,13 @@ try_to_claim_block(struct zone *zone, struct page *page,
 }
  
 /*
- * Try finding a free buddy page on the fallback list.
- *
- * This will attempt to claim a whole pageblock for the requested type
- * to ensure grouping of such requests in the future.
- *
- * If a whole block cannot be claimed, steal an individual page, regressing to
- * __rmqueue_smallest() logic to at least break up as little contiguity as
- * possible.
+ * Try to allocate from some fallback migratetype by claiming the entire block,
+ * i.e. converting it to the allocation's start migratetype.
  *
  * The use of signed ints for order and current_order is a deliberate 
  * deviation from the rest of this file, to make the for loop
  * condition simpler.
  */
-
-/* Try to claim a whole foreign block, take a page, expand the remainder */
 static __always_inline struct page *
 __rmqueue_claim(struct zone *zone, int order, int start_migratetype,
                                                unsigned int alloc_flags)
@@ -2247,7 +2239,10 @@ __rmqueue_claim(struct zone *zone, int order, int start_migratetype,
        return NULL;
 }
  
-/* Try to steal a single page from a foreign block */
+/*
+ * Try to steal a single page from some fallback migratetype. Leave the rest of
+ * the block as its current migratetype, potentially causing fragmentation.
+ */
 static __always_inline struct page *
 __rmqueue_steal(struct zone *zone, int order, int start_migratetype)
 {
@@ -2307,7 +2302,8 @@ __rmqueue(struct zone *zone, unsigned int order, int migratetype,
        }
  
        /*
-        * Try the different freelists, native then foreign.
+        * First try the freelists of the requested migratetype, then try
+        * fallback modes with increasing levels of fragmentation risk.
         *
         * The fallback logic is expensive and rmqueue_bulk() calls in
         * a loop with the zone->lock held, meaning the freelists are
@@ -2332,7 +2328,7 @@ __rmqueue(struct zone *zone, unsigned int order, int migratetype,
        case RMQUEUE_CLAIM:
                page = __rmqueue_claim(zone, order, migratetype, alloc_flags);
                if (page) {
-                       /* Replenished native freelist, back to normal mode */
+                       /* Replenished preferred freelist, back to normal mode. */
                        *mode = RMQUEUE_NORMAL;
                        return page;
                }

base-commit: aa42382db4e2a4ed1f4ba97ffc50e2ce45accb0c
-- 
2.49.0.504.g3bcea36a83-goog





* Re: [PATCH 1/2] mm: page_alloc: speed up fallbacks in rmqueue_bulk()
  2025-04-07 18:01 [PATCH 1/2] mm: page_alloc: speed up fallbacks in rmqueue_bulk() Johannes Weiner
                   ` (2 preceding siblings ...)
  2025-04-09  8:02 ` Yunsheng Lin
@ 2025-04-10  2:02 ` Zi Yan
  2025-04-10  7:03 ` [EXT] " Carlos Song
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 18+ messages in thread
From: Zi Yan @ 2025-04-10  2:02 UTC (permalink / raw)
  To: Johannes Weiner, Andrew Morton
  Cc: Vlastimil Babka, Brendan Jackman, Mel Gorman, Carlos Song,
	linux-mm, linux-kernel, kernel test robot, stable

On Mon Apr 7, 2025 at 2:01 PM EDT, Johannes Weiner wrote:
> The test robot identified c2f6ea38fc1b ("mm: page_alloc: don't steal
> single pages from biggest buddy") as the root cause of a 56.4%
> regression in vm-scalability::lru-file-mmap-read.
>
> Carlos reports an earlier patch, c0cd6f557b90 ("mm: page_alloc: fix
> freelist movement during block conversion"), as the root cause for a
> regression in worst-case zone->lock+irqoff hold times.
>
> Both of these patches modify the page allocator's fallback path to be
> less greedy in an effort to stave off fragmentation. The flip side of
> this is that fallbacks are also less productive each time around,
> which means the fallback search can run much more frequently.
>
> Carlos' traces point to rmqueue_bulk() specifically, which tries to
> refill the percpu cache by allocating a large batch of pages in a
> loop. It highlights how once the native freelists are exhausted, the
> fallback code first scans orders top-down for whole blocks to claim,
> then falls back to a bottom-up search for the smallest buddy to steal.
> For the next batch page, it goes through the same thing again.
>
> This can be made more efficient. Since rmqueue_bulk() holds the
> zone->lock over the entire batch, the freelists are not subject to
> outside changes; when the search for a block to claim has already
> failed, there is no point in trying again for the next page.
>
> Modify __rmqueue() to remember the last successful fallback mode, and
> restart directly from there on the next rmqueue_bulk() iteration.
>
> Oliver confirms that this improves beyond the regression that the test
> robot reported against c2f6ea38fc1b:
>
> commit:
>   f3b92176f4 ("tools/selftests: add guard region test for /proc/$pid/pagemap")
>   c2f6ea38fc ("mm: page_alloc: don't steal single pages from biggest buddy")
>   acc4d5ff0b ("Merge tag 'net-6.15-rc0' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net")
>   2c847f27c3 ("mm: page_alloc: speed up fallbacks in rmqueue_bulk()")   <--- your patch
>
> f3b92176f4f7100f c2f6ea38fc1b640aa7a2e155cc1 acc4d5ff0b61eb1715c498b6536 2c847f27c37da65a93d23c237c5
> ---------------- --------------------------- --------------------------- ---------------------------
>          %stddev     %change         %stddev     %change         %stddev     %change         %stddev
>              \          |                \          |                \          |                \
>   25525364 ±  3%     -56.4%   11135467           -57.8%   10779336           +31.6%   33581409        vm-scalability.throughput
>
> Carlos confirms that worst-case times are almost fully recovered
> compared to before the earlier culprit patch:
>
>   2dd482ba627d (before freelist hygiene):    1ms
>   c0cd6f557b90  (after freelist hygiene):   90ms
>  next-20250319    (steal smallest buddy):  280ms
>     this patch                          :    8ms
>
> Reported-by: kernel test robot <oliver.sang@intel.com>
> Reported-by: Carlos Song <carlos.song@nxp.com>
> Tested-by: kernel test robot <oliver.sang@intel.com>
> Fixes: c0cd6f557b90 ("mm: page_alloc: fix freelist movement during block conversion")
> Fixes: c2f6ea38fc1b ("mm: page_alloc: don't steal single pages from biggest buddy")
> Closes: https://lore.kernel.org/oe-lkp/202503271547.fc08b188-lkp@intel.com
> Cc: stable@vger.kernel.org	# 6.10+
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
> ---
>  mm/page_alloc.c | 100 +++++++++++++++++++++++++++++++++++-------------
>  1 file changed, 74 insertions(+), 26 deletions(-)
>

It is a really nice cleanup. It improves my understanding of the rmqueue*()
and the whole flow a lot. Thank you for the patch.

Acked-by: Zi Yan <ziy@nvidia.com>

-- 
Best Regards,
Yan, Zi




* RE: [EXT] [PATCH 1/2] mm: page_alloc: speed up fallbacks in rmqueue_bulk()
  2025-04-07 18:01 [PATCH 1/2] mm: page_alloc: speed up fallbacks in rmqueue_bulk() Johannes Weiner
                   ` (3 preceding siblings ...)
  2025-04-10  2:02 ` Zi Yan
@ 2025-04-10  7:03 ` Carlos Song
  2025-04-10  8:12 ` Vlastimil Babka
  2025-04-10 10:48 ` Shivank Garg
  6 siblings, 0 replies; 18+ messages in thread
From: Carlos Song @ 2025-04-10  7:03 UTC (permalink / raw)
  To: Johannes Weiner, Andrew Morton
  Cc: Vlastimil Babka, Brendan Jackman, Mel Gorman, linux-mm,
	linux-kernel, kernel test robot, stable

Hi,

That is a nice fix! I tested it on imx7d.

Tested-by: Carlos Song <carlos.song@nxp.com>


> -----Original Message-----
> From: Johannes Weiner <hannes@cmpxchg.org>
> Sent: Tuesday, April 8, 2025 2:02 AM
> To: Andrew Morton <akpm@linux-foundation.org>
> Cc: Vlastimil Babka <vbabka@suse.cz>; Brendan Jackman
> <jackmanb@google.com>; Mel Gorman <mgorman@techsingularity.net>;
> Carlos Song <carlos.song@nxp.com>; linux-mm@kvack.org;
> linux-kernel@vger.kernel.org; kernel test robot <oliver.sang@intel.com>;
> stable@vger.kernel.org
> Subject: [EXT] [PATCH 1/2] mm: page_alloc: speed up fallbacks in
> rmqueue_bulk()
>
> Caution: This is an external email. Please take care when clicking links or
> opening attachments. When in doubt, report the message using the 'Report this
> email' button
>
>
> The test robot identified c2f6ea38fc1b ("mm: page_alloc: don't steal single
> pages from biggest buddy") as the root cause of a 56.4% regression in
> vm-scalability::lru-file-mmap-read.
>
> Carlos reports an earlier patch, c0cd6f557b90 ("mm: page_alloc: fix freelist
> movement during block conversion"), as the root cause for a regression in
> worst-case zone->lock+irqoff hold times.
>
> Both of these patches modify the page allocator's fallback path to be less greedy
> in an effort to stave off fragmentation. The flip side of this is that fallbacks are
> also less productive each time around, which means the fallback search can run
> much more frequently.
>
> Carlos' traces point to rmqueue_bulk() specifically, which tries to refill the
> percpu cache by allocating a large batch of pages in a loop. It highlights how
> once the native freelists are exhausted, the fallback code first scans orders
> top-down for whole blocks to claim, then falls back to a bottom-up search for the
> smallest buddy to steal.
> For the next batch page, it goes through the same thing again.
>
> This can be made more efficient. Since rmqueue_bulk() holds the
> zone->lock over the entire batch, the freelists are not subject to
> outside changes; when the search for a block to claim has already failed, there is
> no point in trying again for the next page.
>
> Modify __rmqueue() to remember the last successful fallback mode, and restart
> directly from there on the next rmqueue_bulk() iteration.
>
> Oliver confirms that this improves beyond the regression that the test robot
> reported against c2f6ea38fc1b:
>
> commit:
>   f3b92176f4 ("tools/selftests: add guard region test for /proc/$pid/pagemap")
>   c2f6ea38fc ("mm: page_alloc: don't steal single pages from biggest buddy")
>   acc4d5ff0b ("Merge tag 'net-6.15-rc0' of
> git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net")
>   2c847f27c3 ("mm: page_alloc: speed up fallbacks in rmqueue_bulk()")   <---
> your patch
>
> f3b92176f4f7100f c2f6ea38fc1b640aa7a2e155cc1
> acc4d5ff0b61eb1715c498b6536 2c847f27c37da65a93d23c237c5
> ---------------- --------------------------- --------------------------- ---------------------------
>          %stddev     %change         %stddev     %change
>  %stddev     %change         %stddev
>              \          |                \          |
> \          |                \
>   25525364 ±  3%     -56.4%   11135467           -57.8%
> 10779336           +31.6%   33581409
> vm-scalability.throughput
>
> Carlos confirms that worst-case times are almost fully recovered compared to
> before the earlier culprit patch:
>
>   2dd482ba627d (before freelist hygiene):    1ms
>   c0cd6f557b90  (after freelist hygiene):   90ms
>  next-20250319    (steal smallest buddy):  280ms
>     this patch                          :    8ms
>
> Reported-by: kernel test robot <oliver.sang@intel.com>
> Reported-by: Carlos Song <carlos.song@nxp.com>
> Tested-by: kernel test robot <oliver.sang@intel.com>
> Fixes: c0cd6f557b90 ("mm: page_alloc: fix freelist movement during block
> conversion")
> Fixes: c2f6ea38fc1b ("mm: page_alloc: don't steal single pages from biggest
> buddy")
> Closes:
> https://lore.kern/
> el.org%2Foe-lkp%2F202503271547.fc08b188-lkp%40intel.com&data=05%7C02
> %7Ccarlos.song%40nxp.com%7C3325bfa02b7942cca36508dd75fe4a3d%7C686
> ea1d3bc2b4c6fa92cd99c5c301635%7C0%7C0%7C638796457225141027%7CUn
> known%7CTWFpbGZsb3d8eyJFbXB0eU1hcGkiOnRydWUsIlYiOiIwLjAuMDAwMCI
> sIlAiOiJXaW4zMiIsIkFOIjoiTWFpbCIsIldUIjoyfQ%3D%3D%7C0%7C%7C%7C&sdat
> a=gl5xQ8OLJNIccLDsYgVejUC%2B9ZQrxmt%2FxkfQXsmDuNY%3D&reserved=0
> Cc: stable@vger.kernel.org      # 6.10+
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
> ---
>  mm/page_alloc.c | 100 +++++++++++++++++++++++++++++++++++-------------
>  1 file changed, 74 insertions(+), 26 deletions(-)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c index
> f51aa6051a99..03b0d45ed45a 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -2194,11 +2194,11 @@ try_to_claim_block(struct zone *zone, struct page
> *page,
>   * The use of signed ints for order and current_order is a deliberate
>   * deviation from the rest of this file, to make the for loop
>   * condition simpler.
> - *
> - * Return the stolen page, or NULL if none can be found.
>   */
> +
> +/* Try to claim a whole foreign block, take a page, expand the
> +remainder */
>  static __always_inline struct page *
> -__rmqueue_fallback(struct zone *zone, int order, int start_migratetype,
> +__rmqueue_claim(struct zone *zone, int order, int start_migratetype,
>                                                 unsigned int
> alloc_flags)  {
>         struct free_area *area;
> @@ -2236,14 +2236,26 @@ __rmqueue_fallback(struct zone *zone, int order,
> int start_migratetype,
>                 page = try_to_claim_block(zone, page, current_order, order,
>                                           start_migratetype,
> fallback_mt,
>                                           alloc_flags);
> -               if (page)
> -                       goto got_one;
> +               if (page) {
> +                       trace_mm_page_alloc_extfrag(page, order,
> current_order,
> +
> start_migratetype, fallback_mt);
> +                       return page;
> +               }
>         }
>
> -       if (alloc_flags & ALLOC_NOFRAGMENT)
> -               return NULL;
> +       return NULL;
> +}
> +
> +/* Try to steal a single page from a foreign block */ static
> +__always_inline struct page * __rmqueue_steal(struct zone *zone, int
> +order, int start_migratetype) {
> +       struct free_area *area;
> +       int current_order;
> +       struct page *page;
> +       int fallback_mt;
> +       bool claim_block;
>
> -       /* No luck claiming pageblock. Find the smallest fallback page */
>         for (current_order = order; current_order < NR_PAGE_ORDERS;
> current_order++) {
>                 area = &(zone->free_area[current_order]);
>                 fallback_mt = find_suitable_fallback(area, current_order,
> @@ -2253,25 +2265,28 @@ __rmqueue_fallback(struct zone *zone, int order,
> int start_migratetype,
>
>                 page = get_page_from_free_area(area, fallback_mt);
>                 page_del_and_expand(zone, page, order, current_order,
> fallback_mt);
> -               goto got_one;
> +               trace_mm_page_alloc_extfrag(page, order, current_order,
> +                                           start_migratetype,
> fallback_mt);
> +               return page;
>         }
>
>         return NULL;
> -
> -got_one:
> -       trace_mm_page_alloc_extfrag(page, order, current_order,
> -               start_migratetype, fallback_mt);
> -
> -       return page;
>  }
>
> +enum rmqueue_mode {
> +       RMQUEUE_NORMAL,
> +       RMQUEUE_CMA,
> +       RMQUEUE_CLAIM,
> +       RMQUEUE_STEAL,
> +};
> +
>  /*
>   * Do the hard work of removing an element from the buddy allocator.
>   * Call me with the zone->lock already held.
>   */
>  static __always_inline struct page *
>  __rmqueue(struct zone *zone, unsigned int order, int migratetype,
> -                                               unsigned int
> alloc_flags)
> +         unsigned int alloc_flags, enum rmqueue_mode *mode)
>  {
>         struct page *page;
>
> @@ -2290,16 +2305,47 @@ __rmqueue(struct zone *zone, unsigned int order,
> int migratetype,
>                 }
>         }
>
> -       page = __rmqueue_smallest(zone, order, migratetype);
> -       if (unlikely(!page)) {
> -               if (alloc_flags & ALLOC_CMA)
> +       /*
> +        * Try the different freelists, native then foreign.
> +        *
> +        * The fallback logic is expensive and rmqueue_bulk() calls in
> +        * a loop with the zone->lock held, meaning the freelists are
> +        * not subject to any outside changes. Remember in *mode where
> +        * we found pay dirt, to save us the search on the next call.
> +        */
> +       switch (*mode) {
> +       case RMQUEUE_NORMAL:
> +               page = __rmqueue_smallest(zone, order, migratetype);
> +               if (page)
> +                       return page;
> +               fallthrough;
> +       case RMQUEUE_CMA:
> +               if (alloc_flags & ALLOC_CMA) {
>                         page = __rmqueue_cma_fallback(zone, order);
> -
> -               if (!page)
> -                       page = __rmqueue_fallback(zone, order,
> migratetype,
> -                                                 alloc_flags);
> +                       if (page) {
> +                               *mode = RMQUEUE_CMA;
> +                               return page;
> +                       }
> +               }
> +               fallthrough;
> +       case RMQUEUE_CLAIM:
> +               page = __rmqueue_claim(zone, order, migratetype,
> alloc_flags);
> +               if (page) {
> +                       /* Replenished native freelist, back to normal
> mode */
> +                       *mode = RMQUEUE_NORMAL;
> +                       return page;
> +               }
> +               fallthrough;
> +       case RMQUEUE_STEAL:
> +               if (!(alloc_flags & ALLOC_NOFRAGMENT)) {
> +                       page = __rmqueue_steal(zone, order,
> migratetype);
> +                       if (page) {
> +                               *mode = RMQUEUE_STEAL;
> +                               return page;
> +                       }
> +               }
>         }
> -       return page;
> +       return NULL;
>  }
>
>  /*
> @@ -2311,6 +2357,7 @@ static int rmqueue_bulk(struct zone *zone, unsigned
> int order,
>                         unsigned long count, struct list_head *list,
>                         int migratetype, unsigned int alloc_flags)  {
> +       enum rmqueue_mode rmqm = RMQUEUE_NORMAL;
>         unsigned long flags;
>         int i;
>
> @@ -2321,7 +2368,7 @@ static int rmqueue_bulk(struct zone *zone, unsigned
> int order,
>         }
>         for (i = 0; i < count; ++i) {
>                 struct page *page = __rmqueue(zone, order, migratetype,
> -
> alloc_flags);
> +                                             alloc_flags, &rmqm);
>                 if (unlikely(page == NULL))
>                         break;
>
> @@ -2934,6 +2981,7 @@ struct page *rmqueue_buddy(struct zone
> *preferred_zone, struct zone *zone,  {
>         struct page *page;
>         unsigned long flags;
> +       enum rmqueue_mode rmqm = RMQUEUE_NORMAL;
>
>         do {
>                 page = NULL;
> @@ -2945,7 +2993,7 @@ struct page *rmqueue_buddy(struct zone
> *preferred_zone, struct zone *zone,
>                 if (alloc_flags & ALLOC_HIGHATOMIC)
>                         page = __rmqueue_smallest(zone, order,
> MIGRATE_HIGHATOMIC);
>                 if (!page) {
> -                       page = __rmqueue(zone, order, migratetype, alloc_flags);
> +                       page = __rmqueue(zone, order, migratetype,
> +                                        alloc_flags, &rmqm);
>
>                         /*
>                          * If the allocation fails, allow OOM handling and
> --
> 2.49.0
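
For anyone following along without the kernel tree handy, here is a compressed
userspace sketch of the mode-caching idea in the patch above. The enum values,
the counter "freelists" and alloc_one() are invented stand-ins for illustration
only; the kernel's real data structures and locking are not modeled, just the
pattern of remembering where the last page came from:

#include <stdbool.h>
#include <stdio.h>

enum rmqueue_mode { MODE_NORMAL, MODE_CMA, MODE_CLAIM, MODE_STEAL };

/* Toy counters standing in for the per-migratetype buddy freelists. */
static int native_pages = 2, cma_pages = 0, claimable_blocks = 1, stealable_pages = 8;

/* Take one page; remember in *mode where we last found one. */
static bool alloc_one(enum rmqueue_mode *mode)
{
	switch (*mode) {
	case MODE_NORMAL:
		if (native_pages > 0) { native_pages--; return true; }
		/* fall through */
	case MODE_CMA:
		if (cma_pages > 0) { cma_pages--; *mode = MODE_CMA; return true; }
		/* fall through */
	case MODE_CLAIM:
		if (claimable_blocks > 0) {
			claimable_blocks--;
			native_pages += 4;	/* claiming refills the native list... */
			native_pages--;		/* ...and hands out one page */
			*mode = MODE_NORMAL;	/* so future allocations go back to normal */
			return true;
		}
		/* fall through */
	case MODE_STEAL:
		if (stealable_pages > 0) { stealable_pages--; *mode = MODE_STEAL; return true; }
	}
	return false;
}

int main(void)
{
	/* Like rmqueue_bulk(): one mode per batch, reset for every new batch. */
	enum rmqueue_mode mode = MODE_NORMAL;

	for (int i = 0; i < 12; i++) {
		if (!alloc_one(&mode))
			break;
		printf("page %2d taken in mode %d\n", i, mode);
	}
	return 0;
}

Because mode survives across loop iterations, a search that has already failed
(say, the exhausted native list) is not repeated for every page of the batch,
while a successful claim resets the mode because it restocks the preferred
list, mirroring what the RMQUEUE_CLAIM case above does.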




* Re: [PATCH 1/2] mm: page_alloc: speed up fallbacks in rmqueue_bulk()
  2025-04-07 18:01 [PATCH 1/2] mm: page_alloc: speed up fallbacks in rmqueue_bulk() Johannes Weiner
                   ` (4 preceding siblings ...)
  2025-04-10  7:03 ` [EXT] " Carlos Song
@ 2025-04-10  8:12 ` Vlastimil Babka
  2025-04-10 10:48 ` Shivank Garg
  6 siblings, 0 replies; 18+ messages in thread
From: Vlastimil Babka @ 2025-04-10  8:12 UTC (permalink / raw)
  To: Johannes Weiner, Andrew Morton
  Cc: Brendan Jackman, Mel Gorman, Carlos Song, linux-mm, linux-kernel,
	kernel test robot, stable

On 4/7/25 20:01, Johannes Weiner wrote:
> The test robot identified c2f6ea38fc1b ("mm: page_alloc: don't steal
> single pages from biggest buddy") as the root cause of a 56.4%
> regression in vm-scalability::lru-file-mmap-read.
> 
> Carlos reports an earlier patch, c0cd6f557b90 ("mm: page_alloc: fix
> freelist movement during block conversion"), as the root cause for a
> regression in worst-case zone->lock+irqoff hold times.
> 
> Both of these patches modify the page allocator's fallback path to be
> less greedy in an effort to stave off fragmentation. The flip side of
> this is that fallbacks are also less productive each time around,
> which means the fallback search can run much more frequently.
> 
> Carlos' traces point to rmqueue_bulk() specifically, which tries to
> refill the percpu cache by allocating a large batch of pages in a
> loop. It highlights how once the native freelists are exhausted, the
> fallback code first scans orders top-down for whole blocks to claim,
> then falls back to a bottom-up search for the smallest buddy to steal.
> For the next batch page, it goes through the same thing again.
> 
> This can be made more efficient. Since rmqueue_bulk() holds the
> zone->lock over the entire batch, the freelists are not subject to
> outside changes; when the search for a block to claim has already
> failed, there is no point in trying again for the next page.
> 
> Modify __rmqueue() to remember the last successful fallback mode, and
> restart directly from there on the next rmqueue_bulk() iteration.
> 
> Oliver confirms that this improves beyond the regression that the test
> robot reported against c2f6ea38fc1b:
> 
> commit:
>   f3b92176f4 ("tools/selftests: add guard region test for /proc/$pid/pagemap")
>   c2f6ea38fc ("mm: page_alloc: don't steal single pages from biggest buddy")
>   acc4d5ff0b ("Merge tag 'net-6.15-rc0' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net")
>   2c847f27c3 ("mm: page_alloc: speed up fallbacks in rmqueue_bulk()")   <--- your patch
> 
> f3b92176f4f7100f c2f6ea38fc1b640aa7a2e155cc1 acc4d5ff0b61eb1715c498b6536 2c847f27c37da65a93d23c237c5
> ---------------- --------------------------- --------------------------- ---------------------------
>          %stddev     %change         %stddev     %change         %stddev     %change         %stddev
>              \          |                \          |                \          |                \
>   25525364 ±  3%     -56.4%   11135467           -57.8%   10779336           +31.6%   33581409        vm-scalability.throughput
> 
> Carlos confirms that worst-case times are almost fully recovered
> compared to before the earlier culprit patch:
> 
>   2dd482ba627d (before freelist hygiene):    1ms
>   c0cd6f557b90  (after freelist hygiene):   90ms
>  next-20250319    (steal smallest buddy):  280ms
>     this patch                          :    8ms
> 
> Reported-by: kernel test robot <oliver.sang@intel.com>
> Reported-by: Carlos Song <carlos.song@nxp.com>
> Tested-by: kernel test robot <oliver.sang@intel.com>
> Fixes: c0cd6f557b90 ("mm: page_alloc: fix freelist movement during block conversion")
> Fixes: c2f6ea38fc1b ("mm: page_alloc: don't steal single pages from biggest buddy")
> Closes: https://lore.kernel.org/oe-lkp/202503271547.fc08b188-lkp@intel.com
> Cc: stable@vger.kernel.org	# 6.10+

Might be fun for stable with the dependency commits.

> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>

Cool stuff.

Reviewed-by: Vlastimil Babka <vbabka@suse.cz>




* Re: [PATCH 1/2] mm: page_alloc: speed up fallbacks in rmqueue_bulk()
  2025-04-09 17:30     ` Brendan Jackman
@ 2025-04-10  8:16       ` Vlastimil Babka
  0 siblings, 0 replies; 18+ messages in thread
From: Vlastimil Babka @ 2025-04-10  8:16 UTC (permalink / raw)
  To: Brendan Jackman, Johannes Weiner
  Cc: Andrew Morton, Mel Gorman, Carlos Song, linux-mm, linux-kernel,
	kernel test robot, stable

On 4/9/25 19:30, Brendan Jackman wrote:
> 
> From 8ff20dbb52770d082e182482d2b47e521de028d1 Mon Sep 17 00:00:00 2001
> From: Brendan Jackman <jackmanb@google.com>
> Date: Wed, 9 Apr 2025 17:22:14 +0000
> Subject: [PATCH] page_alloc: speed up fallbacks in rmqueue_bulk() - comment updates
> 
> Tidy up some terminology and redistribute commentary.
> Signed-off-by: Brendan Jackman <jackmanb@google.com>

LGTM (assuming folding)

> ---
>  mm/page_alloc.c | 22 +++++++++-------------
>  1 file changed, 9 insertions(+), 13 deletions(-)
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index dfb2b3f508af4..220bd0bcc38c3 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -2183,21 +2183,13 @@ try_to_claim_block(struct zone *zone, struct page *page,
>  }
>   
>  /*
> - * Try finding a free buddy page on the fallback list.
> - *
> - * This will attempt to claim a whole pageblock for the requested type
> - * to ensure grouping of such requests in the future.
> - *
> - * If a whole block cannot be claimed, steal an individual page, regressing to
> - * __rmqueue_smallest() logic to at least break up as little contiguity as
> - * possible.
> + * Try to allocate from some fallback migratetype by claiming the entire block,
> + * i.e. converting it to the allocation's start migratetype.
>   *
>   * The use of signed ints for order and current_order is a deliberate 
>   * deviation from the rest of this file, to make the for loop
>   * condition simpler.
>   */
> -
> -/* Try to claim a whole foreign block, take a page, expand the remainder */
>  static __always_inline struct page *
>  __rmqueue_claim(struct zone *zone, int order, int start_migratetype,
>                                                 unsigned int alloc_flags)
> @@ -2247,7 +2239,10 @@ __rmqueue_claim(struct zone *zone, int order, int start_migratetype,
>         return NULL;
>  }
>   
> -/* Try to steal a single page from a foreign block */
> +/*
> + * Try to steal a single page from some fallback migratetype. Leave the rest of
> + * the block as its current migratetype, potentially causing fragmentation.
> + */
>  static __always_inline struct page *
>  __rmqueue_steal(struct zone *zone, int order, int start_migratetype)
>  {
> @@ -2307,7 +2302,8 @@ __rmqueue(struct zone *zone, unsigned int order, int migratetype,
>         }
>   
>         /*
> -        * Try the different freelists, native then foreign.
> +        * First try the freelists of the requested migratetype, then try
> +        * fallbacks modes with increasing levels of fragmentation risk.
>          *
>          * The fallback logic is expensive and rmqueue_bulk() calls in
>          * a loop with the zone->lock held, meaning the freelists are
> @@ -2332,7 +2328,7 @@ __rmqueue(struct zone *zone, unsigned int order, int migratetype,
>         case RMQUEUE_CLAIM:
>                 page = __rmqueue_claim(zone, order, migratetype, alloc_flags);
>                 if (page) {
> -                       /* Replenished native freelist, back to normal mode */
> +                       /* Replenished preferred freelist, back to normal mode. */
>                         *mode = RMQUEUE_NORMAL;
>                         return page;
>                 }
> 
> base-commit: aa42382db4e2a4ed1f4ba97ffc50e2ce45accb0c
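
As a rough illustration of the claim-versus-steal distinction the reworded
comments describe, here is a toy model in plain C. The pageblock struct and
the numbers are invented for the example; in the kernel the ownership lives in
pageblock flags and the pages sit on per-migratetype free lists:

#include <stdio.h>

struct pageblock {
	int migratetype;	/* which allocation type currently owns the block */
	int free_pages;		/* free pages left in the block */
};

/* Claim: convert the whole block to the requesting type, then take a page.
 * The remaining free pages become "native" for future allocations of want_mt. */
static void claim_block(struct pageblock *pb, int want_mt)
{
	pb->migratetype = want_mt;
	pb->free_pages--;
}

/* Steal: take a single page but leave the block's type alone. The remaining
 * free pages stay foreign, which is what risks long-term fragmentation. */
static void steal_page(struct pageblock *pb)
{
	pb->free_pages--;
}

int main(void)
{
	struct pageblock a = { .migratetype = 1, .free_pages = 8 };
	struct pageblock b = { .migratetype = 1, .free_pages = 8 };

	claim_block(&a, 0);
	steal_page(&b);
	printf("claimed block: mt=%d free=%d\n", a.migratetype, a.free_pages);
	printf("stolen from:   mt=%d free=%d\n", b.migratetype, b.free_pages);
	return 0;
}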




* Re: [PATCH 2/2] mm: page_alloc: tighten up find_suitable_fallback()
  2025-04-07 18:01 ` [PATCH 2/2] mm: page_alloc: tighten up find_suitable_fallback() Johannes Weiner
@ 2025-04-10  8:51   ` Vlastimil Babka
  2025-04-10 10:50   ` Shivank Garg
  2025-04-10 13:55   ` Brendan Jackman
  2 siblings, 0 replies; 18+ messages in thread
From: Vlastimil Babka @ 2025-04-10  8:51 UTC (permalink / raw)
  To: Johannes Weiner, Andrew Morton
  Cc: Brendan Jackman, Mel Gorman, Carlos Song, linux-mm, linux-kernel

On 4/7/25 20:01, Johannes Weiner wrote:
> find_suitable_fallback() is not as efficient as it could be, and
> somewhat difficult to follow.
> 
> 1. should_try_claim_block() is a loop invariant. There is no point in
>    checking fallback areas if the caller is interested in claimable
>    blocks but the order and the migratetype don't allow for that.
> 
> 2. __rmqueue_steal() doesn't care about claimability, so it shouldn't
>    have to run those tests.
> 
> Different callers want different things from this helper:
> 
> 1. __compact_finished() scans orders up until it finds a claimable block
> 2. __rmqueue_claim() scans orders down as long as blocks are claimable
> 3. __rmqueue_steal() doesn't care about claimability at all
> 
> Move should_try_claim_block() out of the loop. Only test it for the
> two callers who care in the first place. Distinguish "no blocks" from
> "order + mt are not claimable" in the return value; __rmqueue_claim()
> can stop once order becomes unclaimable, __compact_finished() can keep
> advancing until order becomes claimable.
> 
> Before:
> 
>  Performance counter stats for './run case-lru-file-mmap-read' (5 runs):
> 
> 	 85,294.85 msec task-clock                       #    5.644 CPUs utilized               ( +-  0.32% )
> 	    15,968      context-switches                 #  187.209 /sec                        ( +-  3.81% )
> 	       153      cpu-migrations                   #    1.794 /sec                        ( +-  3.29% )
> 	   801,808      page-faults                      #    9.400 K/sec                       ( +-  0.10% )
>    733,358,331,786      instructions                     #    1.87  insn per cycle              ( +-  0.20% )  (64.94%)
>    392,622,904,199      cycles                           #    4.603 GHz                         ( +-  0.31% )  (64.84%)
>    148,563,488,531      branches                         #    1.742 G/sec                       ( +-  0.18% )  (63.86%)
>        152,143,228      branch-misses                    #    0.10% of all branches             ( +-  1.19% )  (62.82%)
> 
> 	   15.1128 +- 0.0637 seconds time elapsed  ( +-  0.42% )
> 
> After:
> 
>  Performance counter stats for './run case-lru-file-mmap-read' (5 runs):
> 
>          84,380.21 msec task-clock                       #    5.664 CPUs utilized               ( +-  0.21% )
>             16,656      context-switches                 #  197.392 /sec                        ( +-  3.27% )
>                151      cpu-migrations                   #    1.790 /sec                        ( +-  3.28% )
>            801,703      page-faults                      #    9.501 K/sec                       ( +-  0.09% )
>    731,914,183,060      instructions                     #    1.88  insn per cycle              ( +-  0.38% )  (64.90%)
>    388,673,535,116      cycles                           #    4.606 GHz                         ( +-  0.24% )  (65.06%)
>    148,251,482,143      branches                         #    1.757 G/sec                       ( +-  0.37% )  (63.92%)
>        149,766,550      branch-misses                    #    0.10% of all branches             ( +-  1.22% )  (62.88%)
> 
>            14.8968 +- 0.0486 seconds time elapsed  ( +-  0.33% )
> 
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>

Yay, you found a way to get rid of the ugly "bool claim_only, bool
*claim_block" parameter combo. Great!

Reviewed-by: Vlastimil Babka <vbabka@suse.cz>




* Re: [PATCH 1/2] mm: page_alloc: speed up fallbacks in rmqueue_bulk()
  2025-04-07 18:01 [PATCH 1/2] mm: page_alloc: speed up fallbacks in rmqueue_bulk() Johannes Weiner
                   ` (5 preceding siblings ...)
  2025-04-10  8:12 ` Vlastimil Babka
@ 2025-04-10 10:48 ` Shivank Garg
  6 siblings, 0 replies; 18+ messages in thread
From: Shivank Garg @ 2025-04-10 10:48 UTC (permalink / raw)
  To: Johannes Weiner, Andrew Morton
  Cc: Vlastimil Babka, Brendan Jackman, Mel Gorman, Carlos Song,
	linux-mm, linux-kernel, kernel test robot, stable



On 4/7/2025 11:31 PM, Johannes Weiner wrote:
> The test robot identified c2f6ea38fc1b ("mm: page_alloc: don't steal
> single pages from biggest buddy") as the root cause of a 56.4%
> regression in vm-scalability::lru-file-mmap-read.
> 
> Carlos reports an earlier patch, c0cd6f557b90 ("mm: page_alloc: fix
> freelist movement during block conversion"), as the root cause for a
> regression in worst-case zone->lock+irqoff hold times.
> 
> Both of these patches modify the page allocator's fallback path to be
> less greedy in an effort to stave off fragmentation. The flip side of
> this is that fallbacks are also less productive each time around,
> which means the fallback search can run much more frequently.
> 
> Carlos' traces point to rmqueue_bulk() specifically, which tries to
> refill the percpu cache by allocating a large batch of pages in a
> loop. It highlights how once the native freelists are exhausted, the
> fallback code first scans orders top-down for whole blocks to claim,
> then falls back to a bottom-up search for the smallest buddy to steal.
> For the next batch page, it goes through the same thing again.
> 
> This can be made more efficient. Since rmqueue_bulk() holds the
> zone->lock over the entire batch, the freelists are not subject to
> outside changes; when the search for a block to claim has already
> failed, there is no point in trying again for the next page.
> 
> Modify __rmqueue() to remember the last successful fallback mode, and
> restart directly from there on the next rmqueue_bulk() iteration.
> 
> Oliver confirms that this improves beyond the regression that the test
> robot reported against c2f6ea38fc1b:
> 
> commit:
>   f3b92176f4 ("tools/selftests: add guard region test for /proc/$pid/pagemap")
>   c2f6ea38fc ("mm: page_alloc: don't steal single pages from biggest buddy")
>   acc4d5ff0b ("Merge tag 'net-6.15-rc0' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net")
>   2c847f27c3 ("mm: page_alloc: speed up fallbacks in rmqueue_bulk()")   <--- your patch
> 
> f3b92176f4f7100f c2f6ea38fc1b640aa7a2e155cc1 acc4d5ff0b61eb1715c498b6536 2c847f27c37da65a93d23c237c5
> ---------------- --------------------------- --------------------------- ---------------------------
>          %stddev     %change         %stddev     %change         %stddev     %change         %stddev
>              \          |                \          |                \          |                \
>   25525364 ±  3%     -56.4%   11135467           -57.8%   10779336           +31.6%   33581409        vm-scalability.throughput
> 
> Carlos confirms that worst-case times are almost fully recovered
> compared to before the earlier culprit patch:
> 
>   2dd482ba627d (before freelist hygiene):    1ms
>   c0cd6f557b90  (after freelist hygiene):   90ms
>  next-20250319    (steal smallest buddy):  280ms
>     this patch                          :    8ms
> 
> Reported-by: kernel test robot <oliver.sang@intel.com>
> Reported-by: Carlos Song <carlos.song@nxp.com>
> Tested-by: kernel test robot <oliver.sang@intel.com>
> Fixes: c0cd6f557b90 ("mm: page_alloc: fix freelist movement during block conversion")
> Fixes: c2f6ea38fc1b ("mm: page_alloc: don't steal single pages from biggest buddy")
> Closes: https://lore.kernel.org/oe-lkp/202503271547.fc08b188-lkp@intel.com
> Cc: stable@vger.kernel.org	# 6.10+
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
> ---

Tested on an AMD Zen 3 EPYC (2-socket, 1 NUMA node, 64 CPUs per socket) with
vm-scalability/300s-lru-file-mmap-read.

		Vanilla		Patched		Change
Throughput	32267451	36112127 	+11.9% improvement
Stddev% 	2477.36		2969.18		+19.8%
Free Time	0.144072	0.148774	+3.2%
Median		227967		249851		+9.6%

Tested-by: Shivank Garg <shivankg@amd.com>

Thanks,
Shivank






* Re: [PATCH 2/2] mm: page_alloc: tighten up find_suitable_fallback()
  2025-04-07 18:01 ` [PATCH 2/2] mm: page_alloc: tighten up find_suitable_fallback() Johannes Weiner
  2025-04-10  8:51   ` Vlastimil Babka
@ 2025-04-10 10:50   ` Shivank Garg
  2025-04-10 13:55   ` Brendan Jackman
  2 siblings, 0 replies; 18+ messages in thread
From: Shivank Garg @ 2025-04-10 10:50 UTC (permalink / raw)
  To: Johannes Weiner, Andrew Morton
  Cc: Vlastimil Babka, Brendan Jackman, Mel Gorman, Carlos Song,
	linux-mm, linux-kernel



On 4/7/2025 11:31 PM, Johannes Weiner wrote:
> find_suitable_fallback() is not as efficient as it could be, and
> somewhat difficult to follow.
> 
> 1. should_try_claim_block() is a loop invariant. There is no point in
>    checking fallback areas if the caller is interested in claimable
>    blocks but the order and the migratetype don't allow for that.
> 
> 2. __rmqueue_steal() doesn't care about claimability, so it shouldn't
>    have to run those tests.
> 
> Different callers want different things from this helper:
> 
> 1. __compact_finished() scans orders up until it finds a claimable block
> 2. __rmqueue_claim() scans orders down as long as blocks are claimable
> 3. __rmqueue_steal() doesn't care about claimability at all
> 
> Move should_try_claim_block() out of the loop. Only test it for the
> two callers who care in the first place. Distinguish "no blocks" from
> "order + mt are not claimable" in the return value; __rmqueue_claim()
> can stop once order becomes unclaimable, __compact_finished() can keep
> advancing until order becomes claimable.
> 
> Before:
> 
>  Performance counter stats for './run case-lru-file-mmap-read' (5 runs):
> 
> 	 85,294.85 msec task-clock                       #    5.644 CPUs utilized               ( +-  0.32% )
> 	    15,968      context-switches                 #  187.209 /sec                        ( +-  3.81% )
> 	       153      cpu-migrations                   #    1.794 /sec                        ( +-  3.29% )
> 	   801,808      page-faults                      #    9.400 K/sec                       ( +-  0.10% )
>    733,358,331,786      instructions                     #    1.87  insn per cycle              ( +-  0.20% )  (64.94%)
>    392,622,904,199      cycles                           #    4.603 GHz                         ( +-  0.31% )  (64.84%)
>    148,563,488,531      branches                         #    1.742 G/sec                       ( +-  0.18% )  (63.86%)
>        152,143,228      branch-misses                    #    0.10% of all branches             ( +-  1.19% )  (62.82%)
> 
> 	   15.1128 +- 0.0637 seconds time elapsed  ( +-  0.42% )
> 
> After:
> 
>  Performance counter stats for './run case-lru-file-mmap-read' (5 runs):
> 
>          84,380.21 msec task-clock                       #    5.664 CPUs utilized               ( +-  0.21% )
>             16,656      context-switches                 #  197.392 /sec                        ( +-  3.27% )
>                151      cpu-migrations                   #    1.790 /sec                        ( +-  3.28% )
>            801,703      page-faults                      #    9.501 K/sec                       ( +-  0.09% )
>    731,914,183,060      instructions                     #    1.88  insn per cycle              ( +-  0.38% )  (64.90%)
>    388,673,535,116      cycles                           #    4.606 GHz                         ( +-  0.24% )  (65.06%)
>    148,251,482,143      branches                         #    1.757 G/sec                       ( +-  0.37% )  (63.92%)
>        149,766,550      branch-misses                    #    0.10% of all branches             ( +-  1.22% )  (62.88%)
> 
>            14.8968 +- 0.0486 seconds time elapsed  ( +-  0.33% )
> 
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
> ---
>  mm/compaction.c |  4 +---
>  mm/internal.h   |  2 +-
>  mm/page_alloc.c | 31 +++++++++++++------------------
>  3 files changed, 15 insertions(+), 22 deletions(-)
> 
> diff --git a/mm/compaction.c b/mm/compaction.c
> index 139f00c0308a..7462a02802a5 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -2348,7 +2348,6 @@ static enum compact_result __compact_finished(struct compact_control *cc)
>  	ret = COMPACT_NO_SUITABLE_PAGE;
>  	for (order = cc->order; order < NR_PAGE_ORDERS; order++) {
>  		struct free_area *area = &cc->zone->free_area[order];
> -		bool claim_block;
>  
>  		/* Job done if page is free of the right migratetype */
>  		if (!free_area_empty(area, migratetype))
> @@ -2364,8 +2363,7 @@ static enum compact_result __compact_finished(struct compact_control *cc)
>  		 * Job done if allocation would steal freepages from
>  		 * other migratetype buddy lists.
>  		 */
> -		if (find_suitable_fallback(area, order, migratetype,
> -						true, &claim_block) != -1)
> +		if (find_suitable_fallback(area, order, migratetype, true) >= 0)
>  			/*
>  			 * Movable pages are OK in any pageblock. If we are
>  			 * stealing for a non-movable allocation, make sure
> diff --git a/mm/internal.h b/mm/internal.h
> index 50c2f590b2d0..55384b9971c3 100644
> --- a/mm/internal.h
> +++ b/mm/internal.h
> @@ -915,7 +915,7 @@ static inline void init_cma_pageblock(struct page *page)
>  
>  
>  int find_suitable_fallback(struct free_area *area, unsigned int order,
> -			int migratetype, bool claim_only, bool *claim_block);
> +			   int migratetype, bool claimable);
>  
>  static inline bool free_area_empty(struct free_area *area, int migratetype)
>  {
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 03b0d45ed45a..1522e3a29b16 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -2077,31 +2077,25 @@ static bool should_try_claim_block(unsigned int order, int start_mt)
>  
>  /*
>   * Check whether there is a suitable fallback freepage with requested order.
> - * Sets *claim_block to instruct the caller whether it should convert a whole
> - * pageblock to the returned migratetype.
> - * If only_claim is true, this function returns fallback_mt only if
> + * If claimable is true, this function returns fallback_mt only if
>   * we would do this whole-block claiming. This would help to reduce
>   * fragmentation due to mixed migratetype pages in one pageblock.
>   */
>  int find_suitable_fallback(struct free_area *area, unsigned int order,
> -			int migratetype, bool only_claim, bool *claim_block)
> +			   int migratetype, bool claimable)
>  {
>  	int i;
> -	int fallback_mt;
> +
> +	if (claimable && !should_try_claim_block(order, migratetype))
> +		return -2;
>  
>  	if (area->nr_free == 0)
>  		return -1;
>  
> -	*claim_block = false;
>  	for (i = 0; i < MIGRATE_PCPTYPES - 1 ; i++) {
> -		fallback_mt = fallbacks[migratetype][i];
> -		if (free_area_empty(area, fallback_mt))
> -			continue;
> +		int fallback_mt = fallbacks[migratetype][i];
>  
> -		if (should_try_claim_block(order, migratetype))
> -			*claim_block = true;
> -
> -		if (*claim_block || !only_claim)
> +		if (!free_area_empty(area, fallback_mt))
>  			return fallback_mt;
>  	}
>  
> @@ -2206,7 +2200,6 @@ __rmqueue_claim(struct zone *zone, int order, int start_migratetype,
>  	int min_order = order;
>  	struct page *page;
>  	int fallback_mt;
> -	bool claim_block;
>  
>  	/*
>  	 * Do not steal pages from freelists belonging to other pageblocks
> @@ -2225,11 +2218,14 @@ __rmqueue_claim(struct zone *zone, int order, int start_migratetype,
>  				--current_order) {
>  		area = &(zone->free_area[current_order]);
>  		fallback_mt = find_suitable_fallback(area, current_order,
> -				start_migratetype, false, &claim_block);
> +						     start_migratetype, true);
> +
> +		/* No block in that order */
>  		if (fallback_mt == -1)
>  			continue;
>  
> -		if (!claim_block)
> +		/* Advanced into orders too low to claim, abort */
> +		if (fallback_mt == -2)
>  			break;
>  
>  		page = get_page_from_free_area(area, fallback_mt);
> @@ -2254,12 +2250,11 @@ __rmqueue_steal(struct zone *zone, int order, int start_migratetype)
>  	int current_order;
>  	struct page *page;
>  	int fallback_mt;
> -	bool claim_block;
>  
>  	for (current_order = order; current_order < NR_PAGE_ORDERS; current_order++) {
>  		area = &(zone->free_area[current_order]);
>  		fallback_mt = find_suitable_fallback(area, current_order,
> -				start_migratetype, false, &claim_block);
> +						     start_migratetype, false);
>  		if (fallback_mt == -1)
>  			continue;
>  


Tested-by: Shivank Garg <shivankg@amd.com>

Thanks,
Shivank
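
To make the return-value convention of the reworked find_suitable_fallback()
concrete, here is a minimal sketch of how the two lookup callers react to it.
The helper bodies are fake placeholders chosen only so the program produces
output; just the meanings of -1 ("no free pages in this order") and -2 ("this
order/migratetype is not claimable") follow the patch quoted above:

#include <stdbool.h>
#include <stdio.h>

#define NR_ORDERS 5

/* Placeholder predicates; the real ones inspect pageblock_order and the freelists. */
static bool claimable(int order, int mt)
{
	(void)mt;		/* the real check also looks at the migratetype */
	return order >= 2;	/* toy threshold */
}

static int first_nonempty_fallback(int order)
{
	return order == 3 ? 1 : -1;	/* pretend only order 3 has a fallback page */
}

/* -2: not claimable at this order, -1: nothing free, >= 0: fallback migratetype */
static int find_suitable_fallback(int order, int mt, bool want_claimable)
{
	if (want_claimable && !claimable(order, mt))
		return -2;
	return first_nonempty_fallback(order);
}

int main(void)
{
	int mt = 0;

	/* Like __rmqueue_claim(): scan orders top-down, stop once claiming is impossible. */
	for (int order = NR_ORDERS - 1; order >= 0; order--) {
		int fb = find_suitable_fallback(order, mt, true);
		if (fb == -2)
			break;		/* advanced into orders too low to claim */
		if (fb == -1)
			continue;	/* no block in that order */
		printf("claim path: order %d, fallback mt %d\n", order, fb);
		break;
	}

	/* Like __rmqueue_steal(): claimability is irrelevant. */
	for (int order = 0; order < NR_ORDERS; order++) {
		int fb = find_suitable_fallback(order, mt, false);
		if (fb == -1)
			continue;
		printf("steal path: order %d, fallback mt %d\n", order, fb);
		break;
	}
	return 0;
}

Hoisting the claimability test to the top of the helper is what lets the steal
path skip it entirely while the claim path can bail out as soon as it sees -2.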



* Re: [PATCH 2/2] mm: page_alloc: tighten up find_suitable_fallback()
  2025-04-07 18:01 ` [PATCH 2/2] mm: page_alloc: tighten up find_suitable_fallback() Johannes Weiner
  2025-04-10  8:51   ` Vlastimil Babka
  2025-04-10 10:50   ` Shivank Garg
@ 2025-04-10 13:55   ` Brendan Jackman
  2025-04-11 13:45     ` Johannes Weiner
  2 siblings, 1 reply; 18+ messages in thread
From: Brendan Jackman @ 2025-04-10 13:55 UTC (permalink / raw)
  To: Johannes Weiner, Andrew Morton
  Cc: Vlastimil Babka, Brendan Jackman, Mel Gorman, Carlos Song,
	linux-mm, linux-kernel

On Mon Apr 7, 2025 at 6:01 PM UTC, Johannes Weiner wrote:
> find_suitable_fallback() is not as efficient as it could be, and
> somewhat difficult to follow.
>
> 1. should_try_claim_block() is a loop invariant. There is no point in
>    checking fallback areas if the caller is interested in claimable
>    blocks but the order and the migratetype don't allow for that.
>
> 2. __rmqueue_steal() doesn't care about claimability, so it shouldn't
>    have to run those tests.
>
> Different callers want different things from this helper:
>
> 1. __compact_finished() scans orders up until it finds a claimable block
> 2. __rmqueue_claim() scans orders down as long as blocks are claimable
> 3. __rmqueue_steal() doesn't care about claimability at all
>
> Move should_try_claim_block() out of the loop. Only test it for the
> two callers who care in the first place. Distinguish "no blocks" from
> "order + mt are not claimable" in the return value; __rmqueue_claim()
> can stop once order becomes unclaimable, __compact_finished() can keep
> advancing until order becomes claimable.

Nice!

My initial thought was: now we can drop the boolean arg and have the
callers who care about claimability call should_try_claim_block()
themselves. Then we can also get rid of the magic -2 return value, and
find_suitable_fallback() becomes a pretty obvious function.

I think it's a win on balance, but it does add some verbosity at the
callsites and an extra interface between page_alloc.c and compaction.c.
So maybe it's a wash; maybe you already considered it and decided you
don't prefer it.

So, LGTM either way, I will attempt to attach the optional additional
patch for your perusal, hopefully without molesting the mail encoding
this time...

Reviewed-by: Brendan Jackman <jackmanb@google.com>

---

From 25f77012674db95354fb2496bc89954b8f8b4e6c Mon Sep 17 00:00:00 2001
From: Brendan Jackman <jackmanb@google.com>
Date: Thu, 10 Apr 2025 13:22:58 +0000
Subject: [PATCH] mm: page_alloc: Split up find_suitable_fallback()

Now that it's been simplified, it's clear that the bool arg isn't
needed, callers can just use should_try_claim_block(). Once that logic
is stripped out, the function becomes very obvious and can get a more
straightforward name and comment.

Signed-off-by: Brendan Jackman <jackmanb@google.com>
---
 mm/compaction.c |  3 ++-
 mm/internal.h   |  5 +++--
 mm/page_alloc.c | 33 +++++++++++++--------------------
 3 files changed, 18 insertions(+), 23 deletions(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index 39a4d178dff3c..d735c22e71029 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -2363,7 +2363,8 @@ static enum compact_result __compact_finished(struct compact_control *cc)
 		 * Job done if allocation would steal freepages from
 		 * other migratetype buddy lists.
 		 */
-		if (find_suitable_fallback(area, order, migratetype, true) >= 0)
+		if (should_try_claim_block(order, migratetype) &&
+		    find_fallback_migratetype(area, order, migratetype) >= 0)
 			/*
 			 * Movable pages are OK in any pageblock. If we are
 			 * stealing for a non-movable allocation, make sure
diff --git a/mm/internal.h b/mm/internal.h
index 4e0ea83aaf1c8..93a8be54924f4 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -914,8 +914,9 @@ static inline void init_cma_pageblock(struct page *page)
 #endif
 
 
-int find_suitable_fallback(struct free_area *area, unsigned int order,
-			   int migratetype, bool claimable);
+int find_fallback_migratetype(struct free_area *area, unsigned int order,
+			   int migratetype);
+bool should_try_claim_block(unsigned int order, int start_mt);
 
 static inline bool free_area_empty(struct free_area *area, int migratetype)
 {
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 0a1f28bf5255c..604cad16e1b5a 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2034,7 +2034,7 @@ static inline bool boost_watermark(struct zone *zone)
  * try to claim an entire block to satisfy further allocations, instead of
  * polluting multiple pageblocks?
  */
-static bool should_try_claim_block(unsigned int order, int start_mt)
+bool should_try_claim_block(unsigned int order, int start_mt)
 {
 	/*
 	 * Leaving this order check is intended, although there is
@@ -2076,20 +2076,12 @@ static bool should_try_claim_block(unsigned int order, int start_mt)
 	return false;
 }
 
-/*
- * Check whether there is a suitable fallback freepage with requested order.
- * If claimable is true, this function returns fallback_mt only if
- * we would do this whole-block claiming. This would help to reduce
- * fragmentation due to mixed migratetype pages in one pageblock.
- */
-int find_suitable_fallback(struct free_area *area, unsigned int order,
-			   int migratetype, bool claimable)
+/* Find a fallback migratetype with at least one page of the given order. */
+int find_fallback_migratetype(struct free_area *area, unsigned int order,
+			      int migratetype)
 {
 	int i;
 
-	if (claimable && !should_try_claim_block(order, migratetype))
-		return -2;
-
 	if (area->nr_free == 0)
 		return -1;
 
@@ -2209,18 +2201,19 @@ __rmqueue_claim(struct zone *zone, int order, int start_migratetype,
 	 */
 	for (current_order = MAX_PAGE_ORDER; current_order >= min_order;
 				--current_order) {
+
+		/* Advanced into orders too low to claim, abort */
+		if (!should_try_claim_block(order, start_migratetype))
+			break;
+
 		area = &(zone->free_area[current_order]);
-		fallback_mt = find_suitable_fallback(area, current_order,
-						     start_migratetype, true);
+		fallback_mt = find_fallback_migratetype(area, current_order,
+						     start_migratetype);
 
 		/* No block in that order */
 		if (fallback_mt == -1)
 			continue;
 
-		/* Advanced into orders too low to claim, abort */
-		if (fallback_mt == -2)
-			break;
-
 		page = get_page_from_free_area(area, fallback_mt);
 		page = try_to_claim_block(zone, page, current_order, order,
 					  start_migratetype, fallback_mt,
@@ -2249,8 +2242,8 @@ __rmqueue_steal(struct zone *zone, int order, int start_migratetype)
 
 	for (current_order = order; current_order < NR_PAGE_ORDERS; current_order++) {
 		area = &(zone->free_area[current_order]);
-		fallback_mt = find_suitable_fallback(area, current_order,
-						     start_migratetype, false);
+		fallback_mt = find_fallback_migratetype(area, current_order,
+							start_migratetype);
 		if (fallback_mt == -1)
 			continue;
 

base-commit: 0e56a6f04d3b06eafe0000d2e3c3d7c2d554366a
-- 
2.49.0.504.g3bcea36a83-goog




* Re: [PATCH 2/2] mm: page_alloc: tighten up find_suitable_fallback()
  2025-04-10 13:55   ` Brendan Jackman
@ 2025-04-11 13:45     ` Johannes Weiner
  2025-04-11 15:07       ` Brendan Jackman
  0 siblings, 1 reply; 18+ messages in thread
From: Johannes Weiner @ 2025-04-11 13:45 UTC (permalink / raw)
  To: Brendan Jackman
  Cc: Andrew Morton, Vlastimil Babka, Mel Gorman, Carlos Song,
	linux-mm, linux-kernel

On Thu, Apr 10, 2025 at 01:55:27PM +0000, Brendan Jackman wrote:
> On Mon Apr 7, 2025 at 6:01 PM UTC, Johannes Weiner wrote:
> > find_suitable_fallback() is not as efficient as it could be, and
> > somewhat difficult to follow.
> >
> > 1. should_try_claim_block() is a loop invariant. There is no point in
> >    checking fallback areas if the caller is interested in claimable
> >    blocks but the order and the migratetype don't allow for that.
> >
> > 2. __rmqueue_steal() doesn't care about claimability, so it shouldn't
> >    have to run those tests.
> >
> > Different callers want different things from this helper:
> >
> > 1. __compact_finished() scans orders up until it finds a claimable block
> > 2. __rmqueue_claim() scans orders down as long as blocks are claimable
> > 3. __rmqueue_steal() doesn't care about claimability at all
> >
> > Move should_try_claim_block() out of the loop. Only test it for the
> > two callers who care in the first place. Distinguish "no blocks" from
> > "order + mt are not claimable" in the return value; __rmqueue_claim()
> > can stop once order becomes unclaimable, __compact_finished() can keep
> > advancing until order becomes claimable.
> 
> Nice!
> 
> My initial thought was: now we can drop the boolean arg and just have
> the callers who care about claimability just call
> should_try_claim_block() themselves. Then we can also get rid of the
> magic -2 return value and find_suitable_fallback() becomes a pretty
> obvious function.
> 
> I think it's a win on balance but it does make more verbosity at the
> callsites, and an extra interface between page_alloc.c and compaction.c
> So maybe it's a wash, maybe you already considered it and decided you
> don't prefer it.
> 
> So, LGTM either way, I will attempt to attach the optional additional
> patch for your perusal, hopefully without molesting the mail encoding
> this time...
> 
> Reviewed-by: Brendan Jackman <jackmanb@google.com>

Thanks!

> From 25f77012674db95354fb2496bc89954b8f8b4e6c Mon Sep 17 00:00:00 2001
> From: Brendan Jackman <jackmanb@google.com>
> Date: Thu, 10 Apr 2025 13:22:58 +0000
> Subject: [PATCH] mm: page_alloc: Split up find_suitable_fallback()
> 
> Now that it's been simplified, it's clear that the bool arg isn't
> needed, callers can just use should_try_claim_block(). Once that logic
> is stripped out, the function becomes very obvious and can get a more
> straightforward name and comment.
> 
> Signed-off-by: Brendan Jackman <jackmanb@google.com>
> ---
>  mm/compaction.c |  3 ++-
>  mm/internal.h   |  5 +++--
>  mm/page_alloc.c | 33 +++++++++++++--------------------
>  3 files changed, 18 insertions(+), 23 deletions(-)
> 
> diff --git a/mm/compaction.c b/mm/compaction.c
> index 39a4d178dff3c..d735c22e71029 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -2363,7 +2363,8 @@ static enum compact_result __compact_finished(struct compact_control *cc)
>  		 * Job done if allocation would steal freepages from
>  		 * other migratetype buddy lists.
>  		 */
> -		if (find_suitable_fallback(area, order, migratetype, true) >= 0)
> +		if (should_try_claim_block(order, migratetype) &&
> +		    find_fallback_migratetype(area, order, migratetype) >= 0)

So I agree with pushing the test into the callers. However, I think
the name "should_try_claim_block()" is not great for this. It makes
sense in the alloc/fallback path, but compaction here doesn't claim
anything. It just wants to know if this order + migratetype is
eligible under block claiming rules.

IMO this would be more readable with the old terminology:

		if (can_claim_block(order, migratetype) &&
		    find_fallback_migratetype(area, order, migratetype) >= 0)



* Re: [PATCH 2/2] mm: page_alloc: tighten up find_suitable_fallback()
  2025-04-11 13:45     ` Johannes Weiner
@ 2025-04-11 15:07       ` Brendan Jackman
  2025-04-11 17:07         ` Johannes Weiner
  0 siblings, 1 reply; 18+ messages in thread
From: Brendan Jackman @ 2025-04-11 15:07 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Andrew Morton, Vlastimil Babka, Mel Gorman, Carlos Song,
	linux-mm, linux-kernel

On Fri Apr 11, 2025 at 1:45 PM UTC, Johannes Weiner wrote:
>> -		if (find_suitable_fallback(area, order, migratetype, true) >= 0)
>> +		if (should_try_claim_block(order, migratetype) &&
>> +		    find_fallback_migratetype(area, order, migratetype) >= 0)
>
> So I agree with pushing the test into the callers. However, I think
> the name "should_try_claim_block()" is not great for this. It makes
> sense in the alloc/fallback path, but compaction here doesn't claim
> anything. It just wants to know if this order + migratetype is
> eligible under block claiming rules.
>
> IMO this would be more readable with the old terminology:
>
> 		if (can_claim_block(order, migratetype) &&
> 		    find_fallback_migratetype(area, order, migratetype) >= 0)

Sure, that makes sense, here's a modified version of the patch:

---

From 85be0fca4627c5b832a3382c92b6310609e14ca4 Mon Sep 17 00:00:00 2001
From: Brendan Jackman <jackmanb@google.com>
Date: Thu, 10 Apr 2025 13:22:58 +0000
Subject: [PATCH] mm: page_alloc: Split up find_suitable_fallback()

Now that it's been simplified, it's clear that the bool arg isn't
needed, callers can just use should_try_claim_block(). Once that logic
is stripped out, the function becomes very obvious and can get a more
straightforward name and comment.

Since should_try_claim_block() is now exported to compaction.c, give it
a name that makes more sense outside the context of allocation -
should_try_claim_block() reads confusingly in code that has no interest
in actually claiming a block.

Signed-off-by: Brendan Jackman <jackmanb@google.com>
---
 mm/compaction.c |  3 ++-
 mm/internal.h   |  5 +++--
 mm/page_alloc.c | 33 +++++++++++++--------------------
 3 files changed, 18 insertions(+), 23 deletions(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index 39a4d178dff3c..0528996c40507 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -2363,7 +2363,8 @@ static enum compact_result __compact_finished(struct compact_control *cc)
 		 * Job done if allocation would steal freepages from
 		 * other migratetype buddy lists.
 		 */
-		if (find_suitable_fallback(area, order, migratetype, true) >= 0)
+		if (can_claim_block(order, migratetype) &&
+		    find_fallback_migratetype(area, order, migratetype) >= 0)
 			/*
 			 * Movable pages are OK in any pageblock. If we are
 			 * stealing for a non-movable allocation, make sure
diff --git a/mm/internal.h b/mm/internal.h
index 4e0ea83aaf1c8..5450ea7f5b1ec 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -914,8 +914,9 @@ static inline void init_cma_pageblock(struct page *page)
 #endif
 
 
-int find_suitable_fallback(struct free_area *area, unsigned int order,
-			   int migratetype, bool claimable);
+int find_fallback_migratetype(struct free_area *area, unsigned int order,
+			      int migratetype);
+bool can_claim_block(unsigned int order, int start_mt);
 
 static inline bool free_area_empty(struct free_area *area, int migratetype)
 {
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 0a1f28bf5255c..c27a106ec5985 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2034,7 +2034,7 @@ static inline bool boost_watermark(struct zone *zone)
  * try to claim an entire block to satisfy further allocations, instead of
  * polluting multiple pageblocks?
  */
-static bool should_try_claim_block(unsigned int order, int start_mt)
+bool can_claim_block(unsigned int order, int start_mt)
 {
 	/*
 	 * Leaving this order check is intended, although there is
@@ -2076,20 +2076,12 @@ static bool should_try_claim_block(unsigned int order, int start_mt)
 	return false;
 }
 
-/*
- * Check whether there is a suitable fallback freepage with requested order.
- * If claimable is true, this function returns fallback_mt only if
- * we would do this whole-block claiming. This would help to reduce
- * fragmentation due to mixed migratetype pages in one pageblock.
- */
-int find_suitable_fallback(struct free_area *area, unsigned int order,
-			   int migratetype, bool claimable)
+/* Find a fallback migratetype with at least one page of the given order. */
+int find_fallback_migratetype(struct free_area *area, unsigned int order,
+			      int migratetype)
 {
 	int i;
 
-	if (claimable && !should_try_claim_block(order, migratetype))
-		return -2;
-
 	if (area->nr_free == 0)
 		return -1;
 
@@ -2209,18 +2201,19 @@ __rmqueue_claim(struct zone *zone, int order, int start_migratetype,
 	 */
 	for (current_order = MAX_PAGE_ORDER; current_order >= min_order;
 				--current_order) {
+
+		/* Advanced into orders too low to claim, abort */
+		if (!can_claim_block(order, start_migratetype))
+			break;
+
 		area = &(zone->free_area[current_order]);
-		fallback_mt = find_suitable_fallback(area, current_order,
-						     start_migratetype, true);
+		fallback_mt = find_fallback_migratetype(area, current_order,
+							start_migratetype);
 
 		/* No block in that order */
 		if (fallback_mt == -1)
 			continue;
 
-		/* Advanced into orders too low to claim, abort */
-		if (fallback_mt == -2)
-			break;
-
 		page = get_page_from_free_area(area, fallback_mt);
 		page = try_to_claim_block(zone, page, current_order, order,
 					  start_migratetype, fallback_mt,
@@ -2249,8 +2242,8 @@ __rmqueue_steal(struct zone *zone, int order, int start_migratetype)
 
 	for (current_order = order; current_order < NR_PAGE_ORDERS; current_order++) {
 		area = &(zone->free_area[current_order]);
-		fallback_mt = find_suitable_fallback(area, current_order,
-						     start_migratetype, false);
+		fallback_mt = find_fallback_migratetype(area, current_order,
+							start_migratetype);
 		if (fallback_mt == -1)
 			continue;
 
-- 
2.49.0.604.gff1f9ca942-goog




* Re: [PATCH 2/2] mm: page_alloc: tighten up find_suitable_fallback()
  2025-04-11 15:07       ` Brendan Jackman
@ 2025-04-11 17:07         ` Johannes Weiner
  0 siblings, 0 replies; 18+ messages in thread
From: Johannes Weiner @ 2025-04-11 17:07 UTC (permalink / raw)
  To: Brendan Jackman
  Cc: Andrew Morton, Vlastimil Babka, Mel Gorman, Carlos Song,
	linux-mm, linux-kernel

On Fri, Apr 11, 2025 at 03:07:01PM +0000, Brendan Jackman wrote:
> On Fri Apr 11, 2025 at 1:45 PM UTC, Johannes Weiner wrote:
> >> -		if (find_suitable_fallback(area, order, migratetype, true) >= 0)
> >> +		if (should_try_claim_block(order, migratetype) &&
> >> +		    find_fallback_migratetype(area, order, migratetype) >= 0)
> >
> > So I agree with pushing the test into the callers. However, I think
> > the name "should_try_claim_block()" is not great for this. It makes
> > sense in the alloc/fallback path, but compaction here doesn't claim
> > anything. It just wants to know if this order + migratetype is
> > eligible under block claiming rules.
> >
> > IMO this would be more readable with the old terminology:
> >
> > 		if (can_claim_block(order, migratetype) &&
> > 		    find_fallback_migratetype(area, order, migratetype) >= 0)
> 
> Sure, that makes sense, here's a modified version of the patch:
> 
> ---
> 
> From 85be0fca4627c5b832a3382c92b6310609e14ca4 Mon Sep 17 00:00:00 2001
> From: Brendan Jackman <jackmanb@google.com>
> Date: Thu, 10 Apr 2025 13:22:58 +0000
> Subject: [PATCH] mm: page_alloc: Split up find_suitable_fallback()
> 
> Now that it's been simplified, it's clear that the bool arg isn't
> needed, callers can just use should_try_claim_block(). Once that logic
> is stripped out, the function becomes very obvious and can get a more
> straightforward name and comment.
> 
> Since should_try_claim_block() is now exported to compaction.c, give it
> a name that makes more sense outside the context of allocation -
> should_claim_block() seems confusing in code that has no interest in
> actually claiming a block.
> 
> Signed-off-by: Brendan Jackman <jackmanb@google.com>

Acked-by: Johannes Weiner <hannes@cmpxchg.org>

Thanks!

One minor nit:

> @@ -914,8 +914,9 @@ static inline void init_cma_pageblock(struct page *page)
>  #endif
>  
>  
> -int find_suitable_fallback(struct free_area *area, unsigned int order,
> -			   int migratetype, bool claimable);
> +int find_fallback_migratetype(struct free_area *area, unsigned int order,
> +			      int migratetype);
> +bool can_claim_block(unsigned int order, int start_mt);

Switch those around to match the C file order?

(Just being extra, and this is probably a losing battle, but hey...)



Thread overview: 18+ messages
2025-04-07 18:01 [PATCH 1/2] mm: page_alloc: speed up fallbacks in rmqueue_bulk() Johannes Weiner
2025-04-07 18:01 ` [PATCH 2/2] mm: page_alloc: tighten up find_suitable_fallback() Johannes Weiner
2025-04-10  8:51   ` Vlastimil Babka
2025-04-10 10:50   ` Shivank Garg
2025-04-10 13:55   ` Brendan Jackman
2025-04-11 13:45     ` Johannes Weiner
2025-04-11 15:07       ` Brendan Jackman
2025-04-11 17:07         ` Johannes Weiner
2025-04-08 17:22 ` [PATCH 1/2] mm: page_alloc: speed up fallbacks in rmqueue_bulk() Brendan Jackman
2025-04-08 18:50   ` Johannes Weiner
2025-04-09 17:30     ` Brendan Jackman
2025-04-10  8:16       ` Vlastimil Babka
2025-04-09  8:02 ` Yunsheng Lin
2025-04-09 14:00   ` Johannes Weiner
2025-04-10  2:02 ` Zi Yan
2025-04-10  7:03 ` [EXT] " Carlos Song
2025-04-10  8:12 ` Vlastimil Babka
2025-04-10 10:48 ` Shivank Garg
