* [PATCH 2/2] mm: page_alloc: tighten up find_suitable_fallback()
2025-04-07 18:01 [PATCH 1/2] mm: page_alloc: speed up fallbacks in rmqueue_bulk() Johannes Weiner
@ 2025-04-07 18:01 ` Johannes Weiner
2025-04-10 8:51 ` Vlastimil Babka
` (2 more replies)
2025-04-08 17:22 ` [PATCH 1/2] mm: page_alloc: speed up fallbacks in rmqueue_bulk() Brendan Jackman
` (5 subsequent siblings)
6 siblings, 3 replies; 18+ messages in thread
From: Johannes Weiner @ 2025-04-07 18:01 UTC (permalink / raw)
To: Andrew Morton
Cc: Vlastimil Babka, Brendan Jackman, Mel Gorman, Carlos Song,
linux-mm, linux-kernel
find_suitable_fallback() is not as efficient as it could be, and
somewhat difficult to follow.
1. should_try_claim_block() is a loop invariant. There is no point in
checking fallback areas if the caller is interested in claimable
blocks but the order and the migratetype don't allow for that.
2. __rmqueue_steal() doesn't care about claimability, so it shouldn't
have to run those tests.
Different callers want different things from this helper:
1. __compact_finished() scans orders up until it finds a claimable block
2. __rmqueue_claim() scans orders down as long as blocks are claimable
3. __rmqueue_steal() doesn't care about claimability at all
Move should_try_claim_block() out of the loop. Only test it for the
two callers who care in the first place. Distinguish "no blocks" from
"order + mt are not claimable" in the return value; __rmqueue_claim()
can stop once order becomes unclaimable, __compact_finished() can keep
advancing until order becomes claimable.
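In concrete terms, the helper's new contract looks like this (an
illustrative summary only; the authoritative hunks are in the diff
below):

/*
 * find_suitable_fallback(area, order, migratetype, claimable):
 *   -2  order/migratetype is not eligible for whole-block claiming
 *       (only possible when claimable == true)
 *   -1  no suitable fallback page of this order in the area
 *  >=0  a fallback migratetype with free pages of this order
 *
 * Caller behavior:
 *   __rmqueue_claim():    claimable=true;  -1 means try the next order,
 *                         -2 means stop scanning altogether
 *   __rmqueue_steal():    claimable=false; only -1 is possible as a miss
 *   __compact_finished(): claimable=true;  done as soon as the result is >= 0
 */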
Before:
Performance counter stats for './run case-lru-file-mmap-read' (5 runs):
85,294.85 msec task-clock # 5.644 CPUs utilized ( +- 0.32% )
15,968 context-switches # 187.209 /sec ( +- 3.81% )
153 cpu-migrations # 1.794 /sec ( +- 3.29% )
801,808 page-faults # 9.400 K/sec ( +- 0.10% )
733,358,331,786 instructions # 1.87 insn per cycle ( +- 0.20% ) (64.94%)
392,622,904,199 cycles # 4.603 GHz ( +- 0.31% ) (64.84%)
148,563,488,531 branches # 1.742 G/sec ( +- 0.18% ) (63.86%)
152,143,228 branch-misses # 0.10% of all branches ( +- 1.19% ) (62.82%)
15.1128 +- 0.0637 seconds time elapsed ( +- 0.42% )
After:
Performance counter stats for './run case-lru-file-mmap-read' (5 runs):
84,380.21 msec task-clock # 5.664 CPUs utilized ( +- 0.21% )
16,656 context-switches # 197.392 /sec ( +- 3.27% )
151 cpu-migrations # 1.790 /sec ( +- 3.28% )
801,703 page-faults # 9.501 K/sec ( +- 0.09% )
731,914,183,060 instructions # 1.88 insn per cycle ( +- 0.38% ) (64.90%)
388,673,535,116 cycles # 4.606 GHz ( +- 0.24% ) (65.06%)
148,251,482,143 branches # 1.757 G/sec ( +- 0.37% ) (63.92%)
149,766,550 branch-misses # 0.10% of all branches ( +- 1.22% ) (62.88%)
14.8968 +- 0.0486 seconds time elapsed ( +- 0.33% )
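That works out to roughly a 1.4% reduction in elapsed time and about 1%
fewer cycles, on ~0.2% fewer instructions.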
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
---
mm/compaction.c | 4 +---
mm/internal.h | 2 +-
mm/page_alloc.c | 31 +++++++++++++------------------
3 files changed, 15 insertions(+), 22 deletions(-)
diff --git a/mm/compaction.c b/mm/compaction.c
index 139f00c0308a..7462a02802a5 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -2348,7 +2348,6 @@ static enum compact_result __compact_finished(struct compact_control *cc)
ret = COMPACT_NO_SUITABLE_PAGE;
for (order = cc->order; order < NR_PAGE_ORDERS; order++) {
struct free_area *area = &cc->zone->free_area[order];
- bool claim_block;
/* Job done if page is free of the right migratetype */
if (!free_area_empty(area, migratetype))
@@ -2364,8 +2363,7 @@ static enum compact_result __compact_finished(struct compact_control *cc)
* Job done if allocation would steal freepages from
* other migratetype buddy lists.
*/
- if (find_suitable_fallback(area, order, migratetype,
- true, &claim_block) != -1)
+ if (find_suitable_fallback(area, order, migratetype, true) >= 0)
/*
* Movable pages are OK in any pageblock. If we are
* stealing for a non-movable allocation, make sure
diff --git a/mm/internal.h b/mm/internal.h
index 50c2f590b2d0..55384b9971c3 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -915,7 +915,7 @@ static inline void init_cma_pageblock(struct page *page)
int find_suitable_fallback(struct free_area *area, unsigned int order,
- int migratetype, bool claim_only, bool *claim_block);
+ int migratetype, bool claimable);
static inline bool free_area_empty(struct free_area *area, int migratetype)
{
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 03b0d45ed45a..1522e3a29b16 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2077,31 +2077,25 @@ static bool should_try_claim_block(unsigned int order, int start_mt)
/*
* Check whether there is a suitable fallback freepage with requested order.
- * Sets *claim_block to instruct the caller whether it should convert a whole
- * pageblock to the returned migratetype.
- * If only_claim is true, this function returns fallback_mt only if
+ * If claimable is true, this function returns fallback_mt only if
* we would do this whole-block claiming. This would help to reduce
* fragmentation due to mixed migratetype pages in one pageblock.
*/
int find_suitable_fallback(struct free_area *area, unsigned int order,
- int migratetype, bool only_claim, bool *claim_block)
+ int migratetype, bool claimable)
{
int i;
- int fallback_mt;
+
+ if (claimable && !should_try_claim_block(order, migratetype))
+ return -2;
if (area->nr_free == 0)
return -1;
- *claim_block = false;
for (i = 0; i < MIGRATE_PCPTYPES - 1 ; i++) {
- fallback_mt = fallbacks[migratetype][i];
- if (free_area_empty(area, fallback_mt))
- continue;
+ int fallback_mt = fallbacks[migratetype][i];
- if (should_try_claim_block(order, migratetype))
- *claim_block = true;
-
- if (*claim_block || !only_claim)
+ if (!free_area_empty(area, fallback_mt))
return fallback_mt;
}
@@ -2206,7 +2200,6 @@ __rmqueue_claim(struct zone *zone, int order, int start_migratetype,
int min_order = order;
struct page *page;
int fallback_mt;
- bool claim_block;
/*
* Do not steal pages from freelists belonging to other pageblocks
@@ -2225,11 +2218,14 @@ __rmqueue_claim(struct zone *zone, int order, int start_migratetype,
--current_order) {
area = &(zone->free_area[current_order]);
fallback_mt = find_suitable_fallback(area, current_order,
- start_migratetype, false, &claim_block);
+ start_migratetype, true);
+
+ /* No block in that order */
if (fallback_mt == -1)
continue;
- if (!claim_block)
+ /* Advanced into orders too low to claim, abort */
+ if (fallback_mt == -2)
break;
page = get_page_from_free_area(area, fallback_mt);
@@ -2254,12 +2250,11 @@ __rmqueue_steal(struct zone *zone, int order, int start_migratetype)
int current_order;
struct page *page;
int fallback_mt;
- bool claim_block;
for (current_order = order; current_order < NR_PAGE_ORDERS; current_order++) {
area = &(zone->free_area[current_order]);
fallback_mt = find_suitable_fallback(area, current_order,
- start_migratetype, false, &claim_block);
+ start_migratetype, false);
if (fallback_mt == -1)
continue;
--
2.49.0
* Re: [PATCH 2/2] mm: page_alloc: tighten up find_suitable_fallback()
2025-04-07 18:01 ` [PATCH 2/2] mm: page_alloc: tighten up find_suitable_fallback() Johannes Weiner
@ 2025-04-10 8:51 ` Vlastimil Babka
2025-04-10 10:50 ` Shivank Garg
2025-04-10 13:55 ` Brendan Jackman
2 siblings, 0 replies; 18+ messages in thread
From: Vlastimil Babka @ 2025-04-10 8:51 UTC (permalink / raw)
To: Johannes Weiner, Andrew Morton
Cc: Brendan Jackman, Mel Gorman, Carlos Song, linux-mm, linux-kernel
On 4/7/25 20:01, Johannes Weiner wrote:
> find_suitable_fallback() is not as efficient as it could be, and
> somewhat difficult to follow.
>
> 1. should_try_claim_block() is a loop invariant. There is no point in
> checking fallback areas if the caller is interested in claimable
> blocks but the order and the migratetype don't allow for that.
>
> 2. __rmqueue_steal() doesn't care about claimability, so it shouldn't
> have to run those tests.
>
> Different callers want different things from this helper:
>
> 1. __compact_finished() scans orders up until it finds a claimable block
> 2. __rmqueue_claim() scans orders down as long as blocks are claimable
> 3. __rmqueue_steal() doesn't care about claimability at all
>
> Move should_try_claim_block() out of the loop. Only test it for the
> two callers who care in the first place. Distinguish "no blocks" from
> "order + mt are not claimable" in the return value; __rmqueue_claim()
> can stop once order becomes unclaimable, __compact_finished() can keep
> advancing until order becomes claimable.
>
> Before:
>
> Performance counter stats for './run case-lru-file-mmap-read' (5 runs):
>
> 85,294.85 msec task-clock # 5.644 CPUs utilized ( +- 0.32% )
> 15,968 context-switches # 187.209 /sec ( +- 3.81% )
> 153 cpu-migrations # 1.794 /sec ( +- 3.29% )
> 801,808 page-faults # 9.400 K/sec ( +- 0.10% )
> 733,358,331,786 instructions # 1.87 insn per cycle ( +- 0.20% ) (64.94%)
> 392,622,904,199 cycles # 4.603 GHz ( +- 0.31% ) (64.84%)
> 148,563,488,531 branches # 1.742 G/sec ( +- 0.18% ) (63.86%)
> 152,143,228 branch-misses # 0.10% of all branches ( +- 1.19% ) (62.82%)
>
> 15.1128 +- 0.0637 seconds time elapsed ( +- 0.42% )
>
> After:
>
> Performance counter stats for './run case-lru-file-mmap-read' (5 runs):
>
> 84,380.21 msec task-clock # 5.664 CPUs utilized ( +- 0.21% )
> 16,656 context-switches # 197.392 /sec ( +- 3.27% )
> 151 cpu-migrations # 1.790 /sec ( +- 3.28% )
> 801,703 page-faults # 9.501 K/sec ( +- 0.09% )
> 731,914,183,060 instructions # 1.88 insn per cycle ( +- 0.38% ) (64.90%)
> 388,673,535,116 cycles # 4.606 GHz ( +- 0.24% ) (65.06%)
> 148,251,482,143 branches # 1.757 G/sec ( +- 0.37% ) (63.92%)
> 149,766,550 branch-misses # 0.10% of all branches ( +- 1.22% ) (62.88%)
>
> 14.8968 +- 0.0486 seconds time elapsed ( +- 0.33% )
>
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Yay, you found a way to get rid of the ugly "bool claim_only, bool
*claim_block" parameter combo. Great!
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
* Re: [PATCH 2/2] mm: page_alloc: tighten up find_suitable_fallback()
2025-04-07 18:01 ` [PATCH 2/2] mm: page_alloc: tighten up find_suitable_fallback() Johannes Weiner
2025-04-10 8:51 ` Vlastimil Babka
@ 2025-04-10 10:50 ` Shivank Garg
2025-04-10 13:55 ` Brendan Jackman
2 siblings, 0 replies; 18+ messages in thread
From: Shivank Garg @ 2025-04-10 10:50 UTC (permalink / raw)
To: Johannes Weiner, Andrew Morton
Cc: Vlastimil Babka, Brendan Jackman, Mel Gorman, Carlos Song,
linux-mm, linux-kernel
On 4/7/2025 11:31 PM, Johannes Weiner wrote:
> [...]
Tested-by: Shivank Garg <shivankg@amd.com>
Thanks,
Shivank
* Re: [PATCH 2/2] mm: page_alloc: tighten up find_suitable_fallback()
2025-04-07 18:01 ` [PATCH 2/2] mm: page_alloc: tighten up find_suitable_fallback() Johannes Weiner
2025-04-10 8:51 ` Vlastimil Babka
2025-04-10 10:50 ` Shivank Garg
@ 2025-04-10 13:55 ` Brendan Jackman
2025-04-11 13:45 ` Johannes Weiner
2 siblings, 1 reply; 18+ messages in thread
From: Brendan Jackman @ 2025-04-10 13:55 UTC (permalink / raw)
To: Johannes Weiner, Andrew Morton
Cc: Vlastimil Babka, Brendan Jackman, Mel Gorman, Carlos Song,
linux-mm, linux-kernel
On Mon Apr 7, 2025 at 6:01 PM UTC, Johannes Weiner wrote:
> find_suitable_fallback() is not as efficient as it could be, and
> somewhat difficult to follow.
>
> 1. should_try_claim_block() is a loop invariant. There is no point in
> checking fallback areas if the caller is interested in claimable
> blocks but the order and the migratetype don't allow for that.
>
> 2. __rmqueue_steal() doesn't care about claimability, so it shouldn't
> have to run those tests.
>
> Different callers want different things from this helper:
>
> 1. __compact_finished() scans orders up until it finds a claimable block
> 2. __rmqueue_claim() scans orders down as long as blocks are claimable
> 3. __rmqueue_steal() doesn't care about claimability at all
>
> Move should_try_claim_block() out of the loop. Only test it for the
> two callers who care in the first place. Distinguish "no blocks" from
> "order + mt are not claimable" in the return value; __rmqueue_claim()
> can stop once order becomes unclaimable, __compact_finished() can keep
> advancing until order becomes claimable.
Nice!
My initial thought was: now we can drop the boolean arg and just have
the callers who care about claimability just call
should_try_claim_block() themselves. Then we can also get rid of the
magic -2 return value and find_suitable_fallback() becomes a pretty
obvious function.
I think it's a win on balance, but it does add verbosity at the
callsites and an extra interface between page_alloc.c and compaction.c.
So maybe it's a wash; maybe you already considered it and decided you
don't prefer it.
So, LGTM either way, I will attempt to attach the optional additional
patch for your perusal, hopefully without molesting the mail encoding
this time...
Reviewed-by: Brendan Jackman <jackmanb@google.com>
---
From 25f77012674db95354fb2496bc89954b8f8b4e6c Mon Sep 17 00:00:00 2001
From: Brendan Jackman <jackmanb@google.com>
Date: Thu, 10 Apr 2025 13:22:58 +0000
Subject: [PATCH] mm: page_alloc: Split up find_suitable_fallback()
Now that it's been simplified, it's clear that the bool arg isn't
needed: callers can just use should_try_claim_block(). Once that logic
is stripped out, the function becomes very obvious and can get a more
straightforward name and comment.
Signed-off-by: Brendan Jackman <jackmanb@google.com>
---
mm/compaction.c | 3 ++-
mm/internal.h | 5 +++--
mm/page_alloc.c | 33 +++++++++++++--------------------
3 files changed, 18 insertions(+), 23 deletions(-)
diff --git a/mm/compaction.c b/mm/compaction.c
index 39a4d178dff3c..d735c22e71029 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -2363,7 +2363,8 @@ static enum compact_result __compact_finished(struct compact_control *cc)
* Job done if allocation would steal freepages from
* other migratetype buddy lists.
*/
- if (find_suitable_fallback(area, order, migratetype, true) >= 0)
+ if (should_try_claim_block(order, migratetype) &&
+ find_fallback_migratetype(area, order, migratetype) >= 0)
/*
* Movable pages are OK in any pageblock. If we are
* stealing for a non-movable allocation, make sure
diff --git a/mm/internal.h b/mm/internal.h
index 4e0ea83aaf1c8..93a8be54924f4 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -914,8 +914,9 @@ static inline void init_cma_pageblock(struct page *page)
#endif
-int find_suitable_fallback(struct free_area *area, unsigned int order,
- int migratetype, bool claimable);
+int find_fallback_migratetype(struct free_area *area, unsigned int order,
+ int migratetype);
+bool should_try_claim_block(unsigned int order, int start_mt);
static inline bool free_area_empty(struct free_area *area, int migratetype)
{
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 0a1f28bf5255c..604cad16e1b5a 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2034,7 +2034,7 @@ static inline bool boost_watermark(struct zone *zone)
* try to claim an entire block to satisfy further allocations, instead of
* polluting multiple pageblocks?
*/
-static bool should_try_claim_block(unsigned int order, int start_mt)
+bool should_try_claim_block(unsigned int order, int start_mt)
{
/*
* Leaving this order check is intended, although there is
@@ -2076,20 +2076,12 @@ static bool should_try_claim_block(unsigned int order, int start_mt)
return false;
}
-/*
- * Check whether there is a suitable fallback freepage with requested order.
- * If claimable is true, this function returns fallback_mt only if
- * we would do this whole-block claiming. This would help to reduce
- * fragmentation due to mixed migratetype pages in one pageblock.
- */
-int find_suitable_fallback(struct free_area *area, unsigned int order,
- int migratetype, bool claimable)
+/* Find a fallback migratetype with at least one page of the given order. */
+int find_fallback_migratetype(struct free_area *area, unsigned int order,
+ int migratetype)
{
int i;
- if (claimable && !should_try_claim_block(order, migratetype))
- return -2;
-
if (area->nr_free == 0)
return -1;
@@ -2209,18 +2201,19 @@ __rmqueue_claim(struct zone *zone, int order, int start_migratetype,
*/
for (current_order = MAX_PAGE_ORDER; current_order >= min_order;
--current_order) {
+
+ /* Advanced into orders too low to claim, abort */
+ if (!should_try_claim_block(order, start_migratetype))
+ break;
+
area = &(zone->free_area[current_order]);
- fallback_mt = find_suitable_fallback(area, current_order,
- start_migratetype, true);
+ fallback_mt = find_fallback_migratetype(area, current_order,
+ start_migratetype);
/* No block in that order */
if (fallback_mt == -1)
continue;
- /* Advanced into orders too low to claim, abort */
- if (fallback_mt == -2)
- break;
-
page = get_page_from_free_area(area, fallback_mt);
page = try_to_claim_block(zone, page, current_order, order,
start_migratetype, fallback_mt,
@@ -2249,8 +2242,8 @@ __rmqueue_steal(struct zone *zone, int order, int start_migratetype)
for (current_order = order; current_order < NR_PAGE_ORDERS; current_order++) {
area = &(zone->free_area[current_order]);
- fallback_mt = find_suitable_fallback(area, current_order,
- start_migratetype, false);
+ fallback_mt = find_fallback_migratetype(area, current_order,
+ start_migratetype);
if (fallback_mt == -1)
continue;
base-commit: 0e56a6f04d3b06eafe0000d2e3c3d7c2d554366a
--
2.49.0.504.g3bcea36a83-goog
* Re: [PATCH 2/2] mm: page_alloc: tighten up find_suitable_fallback()
2025-04-10 13:55 ` Brendan Jackman
@ 2025-04-11 13:45 ` Johannes Weiner
2025-04-11 15:07 ` Brendan Jackman
0 siblings, 1 reply; 18+ messages in thread
From: Johannes Weiner @ 2025-04-11 13:45 UTC (permalink / raw)
To: Brendan Jackman
Cc: Andrew Morton, Vlastimil Babka, Mel Gorman, Carlos Song,
linux-mm, linux-kernel
On Thu, Apr 10, 2025 at 01:55:27PM +0000, Brendan Jackman wrote:
> On Mon Apr 7, 2025 at 6:01 PM UTC, Johannes Weiner wrote:
> > find_suitable_fallback() is not as efficient as it could be, and
> > somewhat difficult to follow.
> >
> > 1. should_try_claim_block() is a loop invariant. There is no point in
> > checking fallback areas if the caller is interested in claimable
> > blocks but the order and the migratetype don't allow for that.
> >
> > 2. __rmqueue_steal() doesn't care about claimability, so it shouldn't
> > have to run those tests.
> >
> > Different callers want different things from this helper:
> >
> > 1. __compact_finished() scans orders up until it finds a claimable block
> > 2. __rmqueue_claim() scans orders down as long as blocks are claimable
> > 3. __rmqueue_steal() doesn't care about claimability at all
> >
> > Move should_try_claim_block() out of the loop. Only test it for the
> > two callers who care in the first place. Distinguish "no blocks" from
> > "order + mt are not claimable" in the return value; __rmqueue_claim()
> > can stop once order becomes unclaimable, __compact_finished() can keep
> > advancing until order becomes claimable.
>
> Nice!
>
> My initial thought was: now we can drop the boolean arg and just have
> the callers who care about claimability just call
> should_try_claim_block() themselves. Then we can also get rid of the
> magic -2 return value and find_suitable_fallback() becomes a pretty
> obvious function.
>
> I think it's a win on balance, but it does add verbosity at the
> callsites and an extra interface between page_alloc.c and compaction.c.
> So maybe it's a wash; maybe you already considered it and decided you
> don't prefer it.
>
> So, LGTM either way, I will attempt to attach the optional additional
> patch for your perusal, hopefully without molesting the mail encoding
> this time...
>
> Reviewed-by: Brendan Jackman <jackmanb@google.com>
Thanks!
> From 25f77012674db95354fb2496bc89954b8f8b4e6c Mon Sep 17 00:00:00 2001
> From: Brendan Jackman <jackmanb@google.com>
> Date: Thu, 10 Apr 2025 13:22:58 +0000
> Subject: [PATCH] mm: page_alloc: Split up find_suitable_fallback()
>
> Now that it's been simplified, it's clear that the bool arg isn't
> needed: callers can just use should_try_claim_block(). Once that logic
> is stripped out, the function becomes very obvious and can get a more
> straightforward name and comment.
>
> Signed-off-by: Brendan Jackman <jackmanb@google.com>
> ---
> mm/compaction.c | 3 ++-
> mm/internal.h | 5 +++--
> mm/page_alloc.c | 33 +++++++++++++--------------------
> 3 files changed, 18 insertions(+), 23 deletions(-)
>
> diff --git a/mm/compaction.c b/mm/compaction.c
> index 39a4d178dff3c..d735c22e71029 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -2363,7 +2363,8 @@ static enum compact_result __compact_finished(struct compact_control *cc)
> * Job done if allocation would steal freepages from
> * other migratetype buddy lists.
> */
> - if (find_suitable_fallback(area, order, migratetype, true) >= 0)
> + if (should_try_claim_block(order, migratetype) &&
> + find_fallback_migratetype(area, order, migratetype) >= 0)
So I agree with pushing the test into the callers. However, I think
the name "should_try_claim_block()" is not great for this. It makes
sense in the alloc/fallback path, but compaction here doesn't claim
anything. It just wants to know if this order + migratetype is
eligible under block claiming rules.
IMO this would be more readable with the old terminology:
if (can_claim_block(order, migratetype) &&
find_fallback_migratetype(area, order, migratetype) >= 0)
* Re: [PATCH 2/2] mm: page_alloc: tighten up find_suitable_fallback()
2025-04-11 13:45 ` Johannes Weiner
@ 2025-04-11 15:07 ` Brendan Jackman
2025-04-11 17:07 ` Johannes Weiner
0 siblings, 1 reply; 18+ messages in thread
From: Brendan Jackman @ 2025-04-11 15:07 UTC (permalink / raw)
To: Johannes Weiner
Cc: Andrew Morton, Vlastimil Babka, Mel Gorman, Carlos Song,
linux-mm, linux-kernel
On Fri Apr 11, 2025 at 1:45 PM UTC, Johannes Weiner wrote:
>> - if (find_suitable_fallback(area, order, migratetype, true) >= 0)
>> + if (should_try_claim_block(order, migratetype) &&
>> + find_fallback_migratetype(area, order, migratetype) >= 0)
>
> So I agree with pushing the test into the callers. However, I think
> the name "should_try_claim_block()" is not great for this. It makes
> sense in the alloc/fallback path, but compaction here doesn't claim
> anything. It just wants to know if this order + migratetype is
> eligible under block claiming rules.
>
> IMO this would be more readable with the old terminology:
>
> if (can_claim_block(order, migratetype) &&
> find_fallback_migratetype(area, order, migratetype) >= 0)
Sure, that makes sense, here's a modified version of the patch:
---
From 85be0fca4627c5b832a3382c92b6310609e14ca4 Mon Sep 17 00:00:00 2001
From: Brendan Jackman <jackmanb@google.com>
Date: Thu, 10 Apr 2025 13:22:58 +0000
Subject: [PATCH] mm: page_alloc: Split up find_suitable_fallback()
Now that it's been simplified, it's clear that the bool arg isn't
needed: callers can just use should_try_claim_block(). Once that logic
is stripped out, the function becomes very obvious and can get a more
straightforward name and comment.
Since should_try_claim_block() is now exported to compaction.c, give it
a name that makes more sense outside the context of allocation -
should_claim_block() seems confusing in code that has no interest in
actually claiming a block.
Signed-off-by: Brendan Jackman <jackmanb@google.com>
---
mm/compaction.c | 3 ++-
mm/internal.h | 5 +++--
mm/page_alloc.c | 33 +++++++++++++--------------------
3 files changed, 18 insertions(+), 23 deletions(-)
diff --git a/mm/compaction.c b/mm/compaction.c
index 39a4d178dff3c..0528996c40507 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -2363,7 +2363,8 @@ static enum compact_result __compact_finished(struct compact_control *cc)
* Job done if allocation would steal freepages from
* other migratetype buddy lists.
*/
- if (find_suitable_fallback(area, order, migratetype, true) >= 0)
+ if (can_claim_block(order, migratetype) &&
+ find_fallback_migratetype(area, order, migratetype) >= 0)
/*
* Movable pages are OK in any pageblock. If we are
* stealing for a non-movable allocation, make sure
diff --git a/mm/internal.h b/mm/internal.h
index 4e0ea83aaf1c8..5450ea7f5b1ec 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -914,8 +914,9 @@ static inline void init_cma_pageblock(struct page *page)
#endif
-int find_suitable_fallback(struct free_area *area, unsigned int order,
- int migratetype, bool claimable);
+int find_fallback_migratetype(struct free_area *area, unsigned int order,
+ int migratetype);
+bool can_claim_block(unsigned int order, int start_mt);
static inline bool free_area_empty(struct free_area *area, int migratetype)
{
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 0a1f28bf5255c..c27a106ec5985 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2034,7 +2034,7 @@ static inline bool boost_watermark(struct zone *zone)
* try to claim an entire block to satisfy further allocations, instead of
* polluting multiple pageblocks?
*/
-static bool should_try_claim_block(unsigned int order, int start_mt)
+bool can_claim_block(unsigned int order, int start_mt)
{
/*
* Leaving this order check is intended, although there is
@@ -2076,20 +2076,12 @@ static bool should_try_claim_block(unsigned int order, int start_mt)
return false;
}
-/*
- * Check whether there is a suitable fallback freepage with requested order.
- * If claimable is true, this function returns fallback_mt only if
- * we would do this whole-block claiming. This would help to reduce
- * fragmentation due to mixed migratetype pages in one pageblock.
- */
-int find_suitable_fallback(struct free_area *area, unsigned int order,
- int migratetype, bool claimable)
+/* Find a fallback migratetype with at least one page of the given order. */
+int find_fallback_migratetype(struct free_area *area, unsigned int order,
+ int migratetype)
{
int i;
- if (claimable && !should_try_claim_block(order, migratetype))
- return -2;
-
if (area->nr_free == 0)
return -1;
@@ -2209,18 +2201,19 @@ __rmqueue_claim(struct zone *zone, int order, int start_migratetype,
*/
for (current_order = MAX_PAGE_ORDER; current_order >= min_order;
--current_order) {
+
+ /* Advanced into orders too low to claim, abort */
+ if (!can_claim_block(order, start_migratetype))
+ break;
+
area = &(zone->free_area[current_order]);
- fallback_mt = find_suitable_fallback(area, current_order,
- start_migratetype, true);
+ fallback_mt = find_fallback_migratetype(area, current_order,
+ start_migratetype);
/* No block in that order */
if (fallback_mt == -1)
continue;
- /* Advanced into orders too low to claim, abort */
- if (fallback_mt == -2)
- break;
-
page = get_page_from_free_area(area, fallback_mt);
page = try_to_claim_block(zone, page, current_order, order,
start_migratetype, fallback_mt,
@@ -2249,8 +2242,8 @@ __rmqueue_steal(struct zone *zone, int order, int start_migratetype)
for (current_order = order; current_order < NR_PAGE_ORDERS; current_order++) {
area = &(zone->free_area[current_order]);
- fallback_mt = find_suitable_fallback(area, current_order,
- start_migratetype, false);
+ fallback_mt = find_fallback_migratetype(area, current_order,
+ start_migratetype);
if (fallback_mt == -1)
continue;
--
2.49.0.604.gff1f9ca942-goog
* Re: [PATCH 2/2] mm: page_alloc: tighten up find_suitable_fallback()
2025-04-11 15:07 ` Brendan Jackman
@ 2025-04-11 17:07 ` Johannes Weiner
0 siblings, 0 replies; 18+ messages in thread
From: Johannes Weiner @ 2025-04-11 17:07 UTC (permalink / raw)
To: Brendan Jackman
Cc: Andrew Morton, Vlastimil Babka, Mel Gorman, Carlos Song,
linux-mm, linux-kernel
On Fri, Apr 11, 2025 at 03:07:01PM +0000, Brendan Jackman wrote:
> On Fri Apr 11, 2025 at 1:45 PM UTC, Johannes Weiner wrote:
> >> - if (find_suitable_fallback(area, order, migratetype, true) >= 0)
> >> + if (should_try_claim_block(order, migratetype) &&
> >> + find_fallback_migratetype(area, order, migratetype) >= 0)
> >
> > So I agree with pushing the test into the callers. However, I think
> > the name "should_try_claim_block()" is not great for this. It makes
> > sense in the alloc/fallback path, but compaction here doesn't claim
> > anything. It just wants to know if this order + migratetype is
> > eligible under block claiming rules.
> >
> > IMO this would be more readable with the old terminology:
> >
> > if (can_claim_block(order, migratetype) &&
> > find_fallback_migratetype(area, order, migratetype) >= 0)
>
> Sure, that makes sense, here's a modified version of the patch:
>
> ---
>
> From 85be0fca4627c5b832a3382c92b6310609e14ca4 Mon Sep 17 00:00:00 2001
> From: Brendan Jackman <jackmanb@google.com>
> Date: Thu, 10 Apr 2025 13:22:58 +0000
> Subject: [PATCH] mm: page_alloc: Split up find_suitable_fallback()
>
> Now that it's been simplified, it's clear that the bool arg isn't
> needed: callers can just use should_try_claim_block(). Once that logic
> is stripped out, the function becomes very obvious and can get a more
> straightforward name and comment.
>
> Since should_try_claim_block() is now exported to compaction.c, give it
> a name that makes more sense outside the context of allocation -
> should_claim_block() seems confusing in code that has no interest in
> actually claiming a block.
>
> Signed-off-by: Brendan Jackman <jackmanb@google.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Thanks!
One minor nit:
> @@ -914,8 +914,9 @@ static inline void init_cma_pageblock(struct page *page)
> #endif
>
>
> -int find_suitable_fallback(struct free_area *area, unsigned int order,
> - int migratetype, bool claimable);
> +int find_fallback_migratetype(struct free_area *area, unsigned int order,
> + int migratetype);
> +bool can_claim_block(unsigned int order, int start_mt);
Switch those around to match the C file order?
(Just being extra, and this is probably a losing battle, but hey...)
* Re: [PATCH 1/2] mm: page_alloc: speed up fallbacks in rmqueue_bulk()
2025-04-07 18:01 [PATCH 1/2] mm: page_alloc: speed up fallbacks in rmqueue_bulk() Johannes Weiner
2025-04-07 18:01 ` [PATCH 2/2] mm: page_alloc: tighten up find_suitable_fallback() Johannes Weiner
@ 2025-04-08 17:22 ` Brendan Jackman
2025-04-08 18:50 ` Johannes Weiner
2025-04-09 8:02 ` Yunsheng Lin
` (4 subsequent siblings)
6 siblings, 1 reply; 18+ messages in thread
From: Brendan Jackman @ 2025-04-08 17:22 UTC (permalink / raw)
To: Johannes Weiner, Andrew Morton
Cc: Vlastimil Babka, Mel Gorman, Carlos Song, linux-mm, linux-kernel,
kernel test robot, stable
On Mon Apr 7, 2025 at 6:01 PM UTC, Johannes Weiner wrote:
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -2194,11 +2194,11 @@ try_to_claim_block(struct zone *zone, struct page *page,
> * The use of signed ints for order and current_order is a deliberate
> * deviation from the rest of this file, to make the for loop
> * condition simpler.
> - *
> - * Return the stolen page, or NULL if none can be found.
> */
This commentary is pretty confusing now: there's a block of text that
kind of vaguely applies to the aggregate of __rmqueue_steal(),
__rmqueue_fallback() and half of __rmqueue(). I think this new code does
a better job of speaking for itself, so we should just delete this
block comment and replace it with some more verbosity elsewhere.
> +
> +/* Try to claim a whole foreign block, take a page, expand the remainder */
Also on the commentary front, I am not a fan of "foreign" and "native":
- "Foreign" is already used in this file to mean NUMA-nonlocal.
- We already have "start" and "fallback" being used in identifiers
as adjectives to describe the migratetype concept.
I wouldn't say those are _better_, "native" and "foreign" might be
clearer, but it's not worth introducing inconsistency IMO.
> static __always_inline struct page *
> -__rmqueue_fallback(struct zone *zone, int order, int start_migratetype,
> +__rmqueue_claim(struct zone *zone, int order, int start_migratetype,
> unsigned int alloc_flags)
> {
> struct free_area *area;
[pasting in more context that wasn't in the original diff..]
> /*
> * Find the largest available free page in the other list. This roughly
> * approximates finding the pageblock with the most free pages, which
> * would be too costly to do exactly.
> */
> for (current_order = MAX_PAGE_ORDER; current_order >= min_order;
> --current_order) {
IIUC we could go one step further here and also avoid repeating this
iteration? Maybe something for a separate patch though?
Anyway, the approach seems like a clear improvement, thanks. I will need
to take a closer look at it tomorrow, I've run out of brain juice today.
Here's what I got from redistributing the block comment and flipping
the terminology:
diff --git i/mm/page_alloc.c w/mm/page_alloc.c
index dfb2b3f508af..b8142d605691 100644
--- i/mm/page_alloc.c
+++ w/mm/page_alloc.c
@@ -2183,21 +2183,13 @@ try_to_claim_block(struct zone *zone, struct page *page,
}
/*
- * Try finding a free buddy page on the fallback list.
- *
- * This will attempt to claim a whole pageblock for the requested type
- * to ensure grouping of such requests in the future.
- *
- * If a whole block cannot be claimed, steal an individual page, regressing to
- * __rmqueue_smallest() logic to at least break up as little contiguity as
- * possible.
+ * Try to allocate from some fallback migratetype by claiming the entire block,
+ * i.e. converting it to the allocation's start migratetype.
*
* The use of signed ints for order and current_order is a deliberate
* deviation from the rest of this file, to make the for loop
* condition simpler.
*/
-
-/* Try to claim a whole foreign block, take a page, expand the remainder */
static __always_inline struct page *
__rmqueue_claim(struct zone *zone, int order, int start_migratetype,
unsigned int alloc_flags)
@@ -2247,7 +2239,10 @@ __rmqueue_claim(struct zone *zone, int order, int start_migratetype,
return NULL;
}
-/* Try to steal a single page from a foreign block */
+/*
+ * Try to steal a single page from some fallback migratetype. Leave the rest of
+ * the block as its current migratetype, potentially causing fragmentation.
+ */
static __always_inline struct page *
__rmqueue_steal(struct zone *zone, int order, int start_migratetype)
{
@@ -2307,7 +2302,9 @@ __rmqueue(struct zone *zone, unsigned int order, int migratetype,
}
/*
- * Try the different freelists, native then foreign.
+ * First try the freelists of the requested migratetype, then try
+ * fallbacks. Roughly, each fallback stage poses more of a fragmentation
+ * risk.
*
* The fallback logic is expensive and rmqueue_bulk() calls in
* a loop with the zone->lock held, meaning the freelists are
@@ -2332,7 +2329,7 @@ __rmqueue(struct zone *zone, unsigned int order, int migratetype,
case RMQUEUE_CLAIM:
page = __rmqueue_claim(zone, order, migratetype, alloc_flags);
if (page) {
- /* Replenished native freelist, back to normal mode */
+ /* Replenished requested migratetype's freelist, back to normal mode */
*mode = RMQUEUE_NORMAL;
return page;
}
* Re: [PATCH 1/2] mm: page_alloc: speed up fallbacks in rmqueue_bulk()
2025-04-08 17:22 ` [PATCH 1/2] mm: page_alloc: speed up fallbacks in rmqueue_bulk() Brendan Jackman
@ 2025-04-08 18:50 ` Johannes Weiner
2025-04-09 17:30 ` Brendan Jackman
0 siblings, 1 reply; 18+ messages in thread
From: Johannes Weiner @ 2025-04-08 18:50 UTC (permalink / raw)
To: Brendan Jackman
Cc: Andrew Morton, Vlastimil Babka, Mel Gorman, Carlos Song,
linux-mm, linux-kernel, kernel test robot, stable
On Tue, Apr 08, 2025 at 05:22:00PM +0000, Brendan Jackman wrote:
> On Mon Apr 7, 2025 at 6:01 PM UTC, Johannes Weiner wrote:
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -2194,11 +2194,11 @@ try_to_claim_block(struct zone *zone, struct page *page,
> > * The use of signed ints for order and current_order is a deliberate
> > * deviation from the rest of this file, to make the for loop
> > * condition simpler.
> > - *
> > - * Return the stolen page, or NULL if none can be found.
> > */
>
> This commentary is pretty confusing now: there's a block of text that
> kind of vaguely applies to the aggregate of __rmqueue_steal(),
> __rmqueue_fallback() and half of __rmqueue(). I think this new code does
> a better job of speaking for itself, so we should just delete this
> block comment and replace it with some more verbosity elsewhere.
I'm glad you think so, let's remove it then!
> > +/* Try to claim a whole foreign block, take a page, expand the remainder */
>
> Also on the commentary front, I am not a fan of "foreign" and "native":
>
> - "Foreign" is already used in this file to mean NUMA-nonlocal.
>
> - We already have "start" and "fallback" being used in identifiers
> as adjectives to describe the migratetype concept.
>
> I wouldn't say those are _better_, "native" and "foreign" might be
> clearer, but it's not worth introducing inconsistency IMO.
That's a fair point, no objection to renaming them.
> > static __always_inline struct page *
> > -__rmqueue_fallback(struct zone *zone, int order, int start_migratetype,
> > +__rmqueue_claim(struct zone *zone, int order, int start_migratetype,
> > unsigned int alloc_flags)
> > {
> > struct free_area *area;
>
> [pasting in more context that wasn't in the original diff..]
> > /*
> > * Find the largest available free page in the other list. This roughly
> > * approximates finding the pageblock with the most free pages, which
> > * would be too costly to do exactly.
> > */
> > for (current_order = MAX_PAGE_ORDER; current_order >= min_order;
> > --current_order) {
>
> IIUC we could go one step further here and also avoid repeating this
> iteration? Maybe something for a separate patch though?
That might be worth a test, but agree this should be a separate patch.
AFAICS, in the most common configurations MAX_PAGE_ORDER is only one
step above pageblock_order or even the same. It might not be worth the
complication.
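(On x86-64 with 4k pages, for instance, pageblock_order is 9 with
THP/hugetlb configured and MAX_PAGE_ORDER is 10; without those configs
the two are equal, IIRC.)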
> Anyway, the approach seems like a clear improvement, thanks. I will need
> to take a closer look at it tomorrow, I've run out of brain juice today.
Much appreciate you taking a look, thanks.
> Here's what I got from redistributing the block comment and flipping
> the terminology:
>
> diff --git i/mm/page_alloc.c w/mm/page_alloc.c
> index dfb2b3f508af..b8142d605691 100644
> --- i/mm/page_alloc.c
> +++ w/mm/page_alloc.c
> @@ -2183,21 +2183,13 @@ try_to_claim_block(struct zone *zone, struct page *page,
> }
>
> /*
> - * Try finding a free buddy page on the fallback list.
> - *
> - * This will attempt to claim a whole pageblock for the requested type
> - * to ensure grouping of such requests in the future.
> - *
> - * If a whole block cannot be claimed, steal an individual page, regressing to
> - * __rmqueue_smallest() logic to at least break up as little contiguity as
> - * possible.
> + * Try to allocate from some fallback migratetype by claiming the entire block,
> + * i.e. converting it to the allocation's start migratetype.
> *
> * The use of signed ints for order and current_order is a deliberate
> * deviation from the rest of this file, to make the for loop
> * condition simpler.
> */
> -
> -/* Try to claim a whole foreign block, take a page, expand the remainder */
> static __always_inline struct page *
> __rmqueue_claim(struct zone *zone, int order, int start_migratetype,
> unsigned int alloc_flags)
> @@ -2247,7 +2239,10 @@ __rmqueue_claim(struct zone *zone, int order, int start_migratetype,
> return NULL;
> }
>
> -/* Try to steal a single page from a foreign block */
> +/*
> + * Try to steal a single page from some fallback migratetype. Leave the rest of
> + * the block as its current migratetype, potentially causing fragmentation.
> + */
> static __always_inline struct page *
> __rmqueue_steal(struct zone *zone, int order, int start_migratetype)
> {
> @@ -2307,7 +2302,9 @@ __rmqueue(struct zone *zone, unsigned int order, int migratetype,
> }
>
> /*
> - * Try the different freelists, native then foreign.
> + * First try the freelists of the requested migratetype, then try
> + * fallbacks. Roughly, each fallback stage poses more of a fragmentation
> + * risk.
How about "then try fallback modes with increasing levels of
fragmentation risk."
> * The fallback logic is expensive and rmqueue_bulk() calls in
> * a loop with the zone->lock held, meaning the freelists are
> @@ -2332,7 +2329,7 @@ __rmqueue(struct zone *zone, unsigned int order, int migratetype,
> case RMQUEUE_CLAIM:
> page = __rmqueue_claim(zone, order, migratetype, alloc_flags);
> if (page) {
> - /* Replenished native freelist, back to normal mode */
> + /* Replenished requested migratetype's freelist, back to normal mode */
> *mode = RMQUEUE_NORMAL;
This line is kind of long now. How about:
/* Replenished preferred freelist, back to normal mode */
But yeah, I like your proposed changes. Would you care to send a
proper patch?
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
* Re: [PATCH 1/2] mm: page_alloc: speed up fallbacks in rmqueue_bulk()
2025-04-08 18:50 ` Johannes Weiner
@ 2025-04-09 17:30 ` Brendan Jackman
2025-04-10 8:16 ` Vlastimil Babka
0 siblings, 1 reply; 18+ messages in thread
From: Brendan Jackman @ 2025-04-09 17:30 UTC (permalink / raw)
To: Johannes Weiner
Cc: Andrew Morton, Vlastimil Babka, Mel Gorman, Carlos Song,
linux-mm, linux-kernel, kernel test robot, stable
On Tue Apr 8, 2025 at 6:50 PM UTC, Johannes Weiner wrote:
>> > /*
>> > * Find the largest available free page in the other list. This roughly
>> > * approximates finding the pageblock with the most free pages, which
>> > * would be too costly to do exactly.
>> > */
>> > for (current_order = MAX_PAGE_ORDER; current_order >= min_order;
>> > --current_order) {
>>
>> IIUC we could go one step further here and also avoid repeating this
>> iteration? Maybe something for a separate patch though?
>
> That might be worth a test, but agree this should be a separate patch.
>
> AFAICS, in the most common configurations MAX_PAGE_ORDER is only one
> step above pageblock_order or even the same. It might not be worth the
> complication.
Oh yeah, makes sense.
>> /*
>> - * Try the different freelists, native then foreign.
>> + * First try the freelists of the requested migratetype, then try
>> + * fallbacks. Roughly, each fallback stage poses more of a fragmentation
>> + * risk.
>
> How about "then try fallback modes with increasing levels of
> fragmentation risk."
Yep, nice thanks.
>> * The fallback logic is expensive and rmqueue_bulk() calls in
>> * a loop with the zone->lock held, meaning the freelists are
>> @@ -2332,7 +2329,7 @@ __rmqueue(struct zone *zone, unsigned int order, int migratetype,
>> case RMQUEUE_CLAIM:
>> page = __rmqueue_claim(zone, order, migratetype, alloc_flags);
>> if (page) {
>> - /* Replenished native freelist, back to normal mode */
>> + /* Replenished requested migratetype's freelist, back to normal mode */
>> *mode = RMQUEUE_NORMAL;
>
> This line is kind of long now. How about:
>
> /* Replenished preferred freelist, back to normal mode */
Yep, sounds good - it's still 81 characters and the rest of this file
sticks to 80 for comments; I guess I'll leave it to Andrew to decide
whether that is an issue.
> But yeah, I like your proposed changes. Would you care to send a
> proper patch?
Sure, pasting below. Andrew, could you fold this in? Also, I haven't
done this style of patch sending before, so please let me know if I'm
doing something that makes your life difficult.
> Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Aside from the comment stuff fixed by the patch below:
Reviewed-by: Brendan Jackman <jackmanb@google.com>
---
From 8ff20dbb52770d082e182482d2b47e521de028d1 Mon Sep 17 00:00:00 2001
From: Brendan Jackman <jackmanb@google.com>
Date: Wed, 9 Apr 2025 17:22:14 +0000
Subject: [PATCH] page_alloc: speed up fallbacks in rmqueue_bulk() - comment updates
Tidy up some terminology and redistribute commentary.
Signed-off-by: Brendan Jackman <jackmanb@google.com>
---
mm/page_alloc.c | 22 +++++++++-------------
1 file changed, 9 insertions(+), 13 deletions(-)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index dfb2b3f508af4..220bd0bcc38c3 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2183,21 +2183,13 @@ try_to_claim_block(struct zone *zone, struct page *page,
}
/*
- * Try finding a free buddy page on the fallback list.
- *
- * This will attempt to claim a whole pageblock for the requested type
- * to ensure grouping of such requests in the future.
- *
- * If a whole block cannot be claimed, steal an individual page, regressing to
- * __rmqueue_smallest() logic to at least break up as little contiguity as
- * possible.
+ * Try to allocate from some fallback migratetype by claiming the entire block,
+ * i.e. converting it to the allocation's start migratetype.
*
* The use of signed ints for order and current_order is a deliberate
* deviation from the rest of this file, to make the for loop
* condition simpler.
*/
-
-/* Try to claim a whole foreign block, take a page, expand the remainder */
static __always_inline struct page *
__rmqueue_claim(struct zone *zone, int order, int start_migratetype,
unsigned int alloc_flags)
@@ -2247,7 +2239,10 @@ __rmqueue_claim(struct zone *zone, int order, int start_migratetype,
return NULL;
}
-/* Try to steal a single page from a foreign block */
+/*
+ * Try to steal a single page from some fallback migratetype. Leave the rest of
+ * the block as its current migratetype, potentially causing fragmentation.
+ */
static __always_inline struct page *
__rmqueue_steal(struct zone *zone, int order, int start_migratetype)
{
@@ -2307,7 +2302,8 @@ __rmqueue(struct zone *zone, unsigned int order, int migratetype,
}
/*
- * Try the different freelists, native then foreign.
+ * First try the freelists of the requested migratetype, then try
+ * fallback modes with increasing levels of fragmentation risk.
*
* The fallback logic is expensive and rmqueue_bulk() calls in
* a loop with the zone->lock held, meaning the freelists are
@@ -2332,7 +2328,7 @@ __rmqueue(struct zone *zone, unsigned int order, int migratetype,
case RMQUEUE_CLAIM:
page = __rmqueue_claim(zone, order, migratetype, alloc_flags);
if (page) {
- /* Replenished native freelist, back to normal mode */
+ /* Replenished preferred freelist, back to normal mode. */
*mode = RMQUEUE_NORMAL;
return page;
}
base-commit: aa42382db4e2a4ed1f4ba97ffc50e2ce45accb0c
--
2.49.0.504.g3bcea36a83-goog
* Re: [PATCH 1/2] mm: page_alloc: speed up fallbacks in rmqueue_bulk()
2025-04-09 17:30 ` Brendan Jackman
@ 2025-04-10 8:16 ` Vlastimil Babka
0 siblings, 0 replies; 18+ messages in thread
From: Vlastimil Babka @ 2025-04-10 8:16 UTC (permalink / raw)
To: Brendan Jackman, Johannes Weiner
Cc: Andrew Morton, Mel Gorman, Carlos Song, linux-mm, linux-kernel,
kernel test robot, stable
On 4/9/25 19:30, Brendan Jackman wrote:
>
> From 8ff20dbb52770d082e182482d2b47e521de028d1 Mon Sep 17 00:00:00 2001
> From: Brendan Jackman <jackmanb@google.com>
> Date: Wed, 9 Apr 2025 17:22:14 +000
> Subject: [PATCH] page_alloc: speed up fallbacks in rmqueue_bulk() - comment updates
>
> Tidy up some terminology and redistribute commentary.
> Signed-off-by: Brendan Jackman <jackmanb@google.com>
LGTM (assuming folding)
> ---
> mm/page_alloc.c | 22 +++++++++-------------
> 1 file changed, 9 insertions(+), 13 deletions(-)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index dfb2b3f508af4..220bd0bcc38c3 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -2183,21 +2183,13 @@ try_to_claim_block(struct zone *zone, struct page *page,
> }
>
> /*
> - * Try finding a free buddy page on the fallback list.
> - *
> - * This will attempt to claim a whole pageblock for the requested type
> - * to ensure grouping of such requests in the future.
> - *
> - * If a whole block cannot be claimed, steal an individual page, regressing to
> - * __rmqueue_smallest() logic to at least break up as little contiguity as
> - * possible.
> + * Try to allocate from some fallback migratetype by claiming the entire block,
> + * i.e. converting it to the allocation's start migratetype.
> *
> * The use of signed ints for order and current_order is a deliberate
> * deviation from the rest of this file, to make the for loop
> * condition simpler.
> */
> -
> -/* Try to claim a whole foreign block, take a page, expand the remainder */
> static __always_inline struct page *
> __rmqueue_claim(struct zone *zone, int order, int start_migratetype,
> unsigned int alloc_flags)
> @@ -2247,7 +2239,10 @@ __rmqueue_claim(struct zone *zone, int order, int start_migratetype,
> return NULL;
> }
>
> -/* Try to steal a single page from a foreign block */
> +/*
> + * Try to steal a single page from some fallback migratetype. Leave the rest of
> + * the block as its current migratetype, potentially causing fragmentation.
> + */
> static __always_inline struct page *
> __rmqueue_steal(struct zone *zone, int order, int start_migratetype)
> {
> @@ -2307,7 +2302,8 @@ __rmqueue(struct zone *zone, unsigned int order, int migratetype,
> }
>
> /*
> - * Try the different freelists, native then foreign.
> + * First try the freelists of the requested migratetype, then try
> + * fallback modes with increasing levels of fragmentation risk.
> *
> * The fallback logic is expensive and rmqueue_bulk() calls in
> * a loop with the zone->lock held, meaning the freelists are
> @@ -2332,7 +2328,7 @@ __rmqueue(struct zone *zone, unsigned int order, int migratetype,
> case RMQUEUE_CLAIM:
> page = __rmqueue_claim(zone, order, migratetype, alloc_flags);
> if (page) {
> - /* Replenished native freelist, back to normal mode */
> + /* Replenished preferred freelist, back to normal mode. */
> *mode = RMQUEUE_NORMAL;
> return page;
> }
>
> base-commit: aa42382db4e2a4ed1f4ba97ffc50e2ce45accb0c
* Re: [PATCH 1/2] mm: page_alloc: speed up fallbacks in rmqueue_bulk()
2025-04-07 18:01 [PATCH 1/2] mm: page_alloc: speed up fallbacks in rmqueue_bulk() Johannes Weiner
2025-04-07 18:01 ` [PATCH 2/2] mm: page_alloc: tighten up find_suitable_fallback() Johannes Weiner
2025-04-08 17:22 ` [PATCH 1/2] mm: page_alloc: speed up fallbacks in rmqueue_bulk() Brendan Jackman
@ 2025-04-09 8:02 ` Yunsheng Lin
2025-04-09 14:00 ` Johannes Weiner
2025-04-10 2:02 ` Zi Yan
` (3 subsequent siblings)
6 siblings, 1 reply; 18+ messages in thread
From: Yunsheng Lin @ 2025-04-09 8:02 UTC (permalink / raw)
To: Johannes Weiner, Andrew Morton
Cc: Vlastimil Babka, Brendan Jackman, Mel Gorman, Carlos Song,
linux-mm, linux-kernel, kernel test robot, stable
On 2025/4/8 2:01, Johannes Weiner wrote:
...
>
> @@ -2934,6 +2981,7 @@ struct page *rmqueue_buddy(struct zone *preferred_zone, struct zone *zone,
> {
> struct page *page;
> unsigned long flags;
> + enum rmqueue_mode rmqm = RMQUEUE_NORMAL;
>
> do {
> page = NULL;
> @@ -2945,7 +2993,7 @@ struct page *rmqueue_buddy(struct zone *preferred_zone, struct zone *zone,
> if (alloc_flags & ALLOC_HIGHATOMIC)
> page = __rmqueue_smallest(zone, order, MIGRATE_HIGHATOMIC);
> if (!page) {
> - page = __rmqueue(zone, order, migratetype, alloc_flags);
> + page = __rmqueue(zone, order, migratetype, alloc_flags, &rmqm);
>
> /*
> * If the allocation fails, allow OOM handling and
It was not in the diff, but it seems the zone->lock is taken and released inside the
do..while loop; doesn't that mean the freelists are subject to outside changes between
iterations, and that rmqm can go stale?
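For orientation, here is a condensed sketch of the shape being questioned. The lock
placement, the rmqm hint, and the __rmqueue() call follow the patch quoted above; the
function name rmqueue_buddy_sketch() and the page_fails_checks() retry test are
hypothetical placeholders, so treat this as an illustration rather than the kernel code:

static struct page *rmqueue_buddy_sketch(struct zone *zone, unsigned int order,
                                         int migratetype, unsigned int alloc_flags)
{
        /* Set once, outside the retry loop, as in the patch under review. */
        enum rmqueue_mode rmqm = RMQUEUE_NORMAL;
        struct page *page;
        unsigned long flags;

        do {
                page = NULL;
                spin_lock_irqsave(&zone->lock, flags);
                if (alloc_flags & ALLOC_HIGHATOMIC)
                        page = __rmqueue_smallest(zone, order, MIGRATE_HIGHATOMIC);
                if (!page)
                        page = __rmqueue(zone, order, migratetype,
                                         alloc_flags, &rmqm);
                spin_unlock_irqrestore(&zone->lock, flags);
                /*
                 * The lock is dropped here. If the check below fails and the
                 * loop retries, the freelists may have changed under us while
                 * rmqm still carries the hint from the previous iteration.
                 */
        } while (page && page_fails_checks(page, order));  /* hypothetical check */

        return page;
}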
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 1/2] mm: page_alloc: speed up fallbacks in rmqueue_bulk()
2025-04-09 8:02 ` Yunsheng Lin
@ 2025-04-09 14:00 ` Johannes Weiner
0 siblings, 0 replies; 18+ messages in thread
From: Johannes Weiner @ 2025-04-09 14:00 UTC (permalink / raw)
To: Yunsheng Lin
Cc: Andrew Morton, Vlastimil Babka, Brendan Jackman, Mel Gorman,
Carlos Song, linux-mm, linux-kernel, kernel test robot, stable
On Wed, Apr 09, 2025 at 04:02:39PM +0800, Yunsheng Lin wrote:
> On 2025/4/8 2:01, Johannes Weiner wrote:
> > @@ -2934,6 +2981,7 @@ struct page *rmqueue_buddy(struct zone *preferred_zone, struct zone *zone,
> > {
> > struct page *page;
> > unsigned long flags;
> > + enum rmqueue_mode rmqm = RMQUEUE_NORMAL;
> >
> > do {
> > page = NULL;
> > @@ -2945,7 +2993,7 @@ struct page *rmqueue_buddy(struct zone *preferred_zone, struct zone *zone,
> > if (alloc_flags & ALLOC_HIGHATOMIC)
> > page = __rmqueue_smallest(zone, order, MIGRATE_HIGHATOMIC);
> > if (!page) {
> > - page = __rmqueue(zone, order, migratetype, alloc_flags);
> > + page = __rmqueue(zone, order, migratetype, alloc_flags, &rmqm);
> >
> > /*
> > * If the allocation fails, allow OOM handling and
>
> It was not in the diff, but it seems the zone->lock is held inside the do..while loop,
> doesn't it mean that the freelists are subject to outside changes and rmqm is stale?
Yes. Note that it only loops when there is a bug/corrupted page, so it
won't make much difference in practice. But it's still kind of weird.
Thanks for your review, Yunsheng!
Andrew, could you please fold the below fixlet?
---
From 71b1eea7ded41c1f674909f9755c23b9ee9bcb6a Mon Sep 17 00:00:00 2001
From: Johannes Weiner <hannes@cmpxchg.org>
Date: Wed, 9 Apr 2025 09:56:52 -0400
Subject: [PATCH] mm: page_alloc: speed up fallbacks in rmqueue_bulk() fix
reset rmqueue_mode in rmqueue_buddy() error loop, per Yunsheng Lin
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
---
mm/page_alloc.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index dfb2b3f508af..7ffeeb0f62d3 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2983,7 +2983,6 @@ struct page *rmqueue_buddy(struct zone *preferred_zone, struct zone *zone,
{
struct page *page;
unsigned long flags;
- enum rmqueue_mode rmqm = RMQUEUE_NORMAL;
do {
page = NULL;
@@ -2996,6 +2995,8 @@ struct page *rmqueue_buddy(struct zone *preferred_zone, struct zone *zone,
if (alloc_flags & ALLOC_HIGHATOMIC)
page = __rmqueue_smallest(zone, order, MIGRATE_HIGHATOMIC);
if (!page) {
+ enum rmqueue_mode rmqm = RMQUEUE_NORMAL;
+
page = __rmqueue(zone, order, migratetype, alloc_flags, &rmqm);
/*
--
2.49.0
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 1/2] mm: page_alloc: speed up fallbacks in rmqueue_bulk()
2025-04-07 18:01 [PATCH 1/2] mm: page_alloc: speed up fallbacks in rmqueue_bulk() Johannes Weiner
` (2 preceding siblings ...)
2025-04-09 8:02 ` Yunsheng Lin
@ 2025-04-10 2:02 ` Zi Yan
2025-04-10 7:03 ` [EXT] " Carlos Song
` (2 subsequent siblings)
6 siblings, 0 replies; 18+ messages in thread
From: Zi Yan @ 2025-04-10 2:02 UTC (permalink / raw)
To: Johannes Weiner, Andrew Morton
Cc: Vlastimil Babka, Brendan Jackman, Mel Gorman, Carlos Song,
linux-mm, linux-kernel, kernel test robot, stable
On Mon Apr 7, 2025 at 2:01 PM EDT, Johannes Weiner wrote:
> The test robot identified c2f6ea38fc1b ("mm: page_alloc: don't steal
> single pages from biggest buddy") as the root cause of a 56.4%
> regression in vm-scalability::lru-file-mmap-read.
>
> Carlos reports an earlier patch, c0cd6f557b90 ("mm: page_alloc: fix
> freelist movement during block conversion"), as the root cause for a
> regression in worst-case zone->lock+irqoff hold times.
>
> Both of these patches modify the page allocator's fallback path to be
> less greedy in an effort to stave off fragmentation. The flip side of
> this is that fallbacks are also less productive each time around,
> which means the fallback search can run much more frequently.
>
> Carlos' traces point to rmqueue_bulk() specifically, which tries to
> refill the percpu cache by allocating a large batch of pages in a
> loop. It highlights how once the native freelists are exhausted, the
> fallback code first scans orders top-down for whole blocks to claim,
> then falls back to a bottom-up search for the smallest buddy to steal.
> For the next batch page, it goes through the same thing again.
>
> This can be made more efficient. Since rmqueue_bulk() holds the
> zone->lock over the entire batch, the freelists are not subject to
> outside changes; when the search for a block to claim has already
> failed, there is no point in trying again for the next page.
>
> Modify __rmqueue() to remember the last successful fallback mode, and
> restart directly from there on the next rmqueue_bulk() iteration.
>
> Oliver confirms that this improves beyond the regression that the test
> robot reported against c2f6ea38fc1b:
>
> commit:
> f3b92176f4 ("tools/selftests: add guard region test for /proc/$pid/pagemap")
> c2f6ea38fc ("mm: page_alloc: don't steal single pages from biggest buddy")
> acc4d5ff0b ("Merge tag 'net-6.15-rc0' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net")
> 2c847f27c3 ("mm: page_alloc: speed up fallbacks in rmqueue_bulk()") <--- your patch
>
> f3b92176f4f7100f c2f6ea38fc1b640aa7a2e155cc1 acc4d5ff0b61eb1715c498b6536 2c847f27c37da65a93d23c237c5
> ---------------- --------------------------- --------------------------- ---------------------------
> %stddev %change %stddev %change %stddev %change %stddev
> \ | \ | \ | \
> 25525364 ± 3% -56.4% 11135467 -57.8% 10779336 +31.6% 33581409 vm-scalability.throughput
>
> Carlos confirms that worst-case times are almost fully recovered
> compared to before the earlier culprit patch:
>
> 2dd482ba627d (before freelist hygiene): 1ms
> c0cd6f557b90 (after freelist hygiene): 90ms
> next-20250319 (steal smallest buddy): 280ms
> this patch : 8ms
>
> Reported-by: kernel test robot <oliver.sang@intel.com>
> Reported-by: Carlos Song <carlos.song@nxp.com>
> Tested-by: kernel test robot <oliver.sang@intel.com>
> Fixes: c0cd6f557b90 ("mm: page_alloc: fix freelist movement during block conversion")
> Fixes: c2f6ea38fc1b ("mm: page_alloc: don't steal single pages from biggest buddy")
> Closes: https://lore.kernel.org/oe-lkp/202503271547.fc08b188-lkp@intel.com
> Cc: stable@vger.kernel.org # 6.10+
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
> ---
> mm/page_alloc.c | 100 +++++++++++++++++++++++++++++++++++-------------
> 1 file changed, 74 insertions(+), 26 deletions(-)
>
It is a really nice cleanup. It improves my understanding of the rmqueue*()
and the whole flow a lot. Thank you for the patch.
Acked-by: Zi Yan <ziy@nvidia.com>
--
Best Regards,
Yan, Zi
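To make the lifetime of the hint concrete, here is a minimal caller-side sketch of the
pattern the quoted commit message describes. Only the rmqm hint, enum rmqueue_mode, and
the extended __rmqueue() signature come from the patch; the function name
rmqueue_bulk_sketch(), the omitted per-cpu list handling, and the surrounding
declarations are illustrative assumptions:

static int rmqueue_bulk_sketch(struct zone *zone, unsigned int order,
                               unsigned long count, int migratetype,
                               unsigned int alloc_flags)
{
        /* Reset for every zone->lock hold, so the hint never outlives the lock. */
        enum rmqueue_mode rmqm = RMQUEUE_NORMAL;
        unsigned long flags;
        int i;

        spin_lock_irqsave(&zone->lock, flags);
        for (i = 0; i < count; i++) {
                struct page *page = __rmqueue(zone, order, migratetype,
                                              alloc_flags, &rmqm);
                if (!page)
                        break;
                /* ... hand the page to the caller's per-cpu list (omitted) ... */
        }
        spin_unlock_irqrestore(&zone->lock, flags);

        return i;
}

Because the freelists cannot change while zone->lock is held, every call after the first
can safely start from the mode that last produced a page instead of rescanning freelists
that are already known to be empty.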
^ permalink raw reply [flat|nested] 18+ messages in thread
* RE: [EXT] [PATCH 1/2] mm: page_alloc: speed up fallbacks in rmqueue_bulk()
2025-04-07 18:01 [PATCH 1/2] mm: page_alloc: speed up fallbacks in rmqueue_bulk() Johannes Weiner
` (3 preceding siblings ...)
2025-04-10 2:02 ` Zi Yan
@ 2025-04-10 7:03 ` Carlos Song
2025-04-10 8:12 ` Vlastimil Babka
2025-04-10 10:48 ` Shivank Garg
6 siblings, 0 replies; 18+ messages in thread
From: Carlos Song @ 2025-04-10 7:03 UTC (permalink / raw)
To: Johannes Weiner, Andrew Morton
Cc: Vlastimil Babka, Brendan Jackman, Mel Gorman, linux-mm,
linux-kernel, kernel test robot, stable
Hi,
That is a nice fix! I tested it on imx7d.
Tested-by: Carlos Song <carlos.song@nxp.com>
> -----Original Message-----
> From: Johannes Weiner <hannes@cmpxchg.org>
> Sent: Tuesday, April 8, 2025 2:02 AM
> To: Andrew Morton <akpm@linux-foundation.org>
> Cc: Vlastimil Babka <vbabka@suse.cz>; Brendan Jackman
> <jackmanb@google.com>; Mel Gorman <mgorman@techsingularity.net>;
> Carlos Song <carlos.song@nxp.com>; linux-mm@kvack.org;
> linux-kernel@vger.kernel.org; kernel test robot <oliver.sang@intel.com>;
> stable@vger.kernel.org
> Subject: [EXT] [PATCH 1/2] mm: page_alloc: speed up fallbacks in
> rmqueue_bulk()
>
> The test robot identified c2f6ea38fc1b ("mm: page_alloc: don't steal single
> pages from biggest buddy") as the root cause of a 56.4% regression in
> vm-scalability::lru-file-mmap-read.
>
> Carlos reports an earlier patch, c0cd6f557b90 ("mm: page_alloc: fix freelist
> movement during block conversion"), as the root cause for a regression in
> worst-case zone->lock+irqoff hold times.
>
> Both of these patches modify the page allocator's fallback path to be less greedy
> in an effort to stave off fragmentation. The flip side of this is that fallbacks are
> also less productive each time around, which means the fallback search can run
> much more frequently.
>
> Carlos' traces point to rmqueue_bulk() specifically, which tries to refill the
> percpu cache by allocating a large batch of pages in a loop. It highlights how
> once the native freelists are exhausted, the fallback code first scans orders
> top-down for whole blocks to claim, then falls back to a bottom-up search for the
> smallest buddy to steal.
> For the next batch page, it goes through the same thing again.
>
> This can be made more efficient. Since rmqueue_bulk() holds the
> zone->lock over the entire batch, the freelists are not subject to
> outside changes; when the search for a block to claim has already failed, there is
> no point in trying again for the next page.
>
> Modify __rmqueue() to remember the last successful fallback mode, and restart
> directly from there on the next rmqueue_bulk() iteration.
>
> Oliver confirms that this improves beyond the regression that the test robot
> reported against c2f6ea38fc1b:
>
> commit:
> f3b92176f4 ("tools/selftests: add guard region test for /proc/$pid/pagemap")
> c2f6ea38fc ("mm: page_alloc: don't steal single pages from biggest buddy")
> acc4d5ff0b ("Merge tag 'net-6.15-rc0' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net")
> 2c847f27c3 ("mm: page_alloc: speed up fallbacks in rmqueue_bulk()") <--- your patch
>
> f3b92176f4f7100f c2f6ea38fc1b640aa7a2e155cc1 acc4d5ff0b61eb1715c498b6536 2c847f27c37da65a93d23c237c5
> ---------------- --------------------------- --------------------------- ---------------------------
> %stddev %change %stddev %change %stddev %change %stddev
> \ | \ | \ | \
> 25525364 ± 3% -56.4% 11135467 -57.8% 10779336 +31.6% 33581409 vm-scalability.throughput
>
> Carlos confirms that worst-case times are almost fully recovered compared to
> before the earlier culprit patch:
>
> 2dd482ba627d (before freelist hygiene): 1ms
> c0cd6f557b90 (after freelist hygiene): 90ms
> next-20250319 (steal smallest buddy): 280ms
> this patch : 8ms
>
> Reported-by: kernel test robot <oliver.sang@intel.com>
> Reported-by: Carlos Song <carlos.song@nxp.com>
> Tested-by: kernel test robot <oliver.sang@intel.com>
> Fixes: c0cd6f557b90 ("mm: page_alloc: fix freelist movement during block conversion")
> Fixes: c2f6ea38fc1b ("mm: page_alloc: don't steal single pages from biggest buddy")
> Closes: https://lore.kernel.org/oe-lkp/202503271547.fc08b188-lkp@intel.com
> Cc: stable@vger.kernel.org # 6.10+
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
> ---
> mm/page_alloc.c | 100 +++++++++++++++++++++++++++++++++++-------------
> 1 file changed, 74 insertions(+), 26 deletions(-)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index f51aa6051a99..03b0d45ed45a 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -2194,11 +2194,11 @@ try_to_claim_block(struct zone *zone, struct page *page,
>   * The use of signed ints for order and current_order is a deliberate
>   * deviation from the rest of this file, to make the for loop
>   * condition simpler.
> - *
> - * Return the stolen page, or NULL if none can be found.
>   */
> +
> +/* Try to claim a whole foreign block, take a page, expand the remainder */
>  static __always_inline struct page *
> -__rmqueue_fallback(struct zone *zone, int order, int start_migratetype,
> +__rmqueue_claim(struct zone *zone, int order, int start_migratetype,
>                     unsigned int alloc_flags)
>  {
>          struct free_area *area;
> @@ -2236,14 +2236,26 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype,
>                  page = try_to_claim_block(zone, page, current_order, order,
>                                            start_migratetype, fallback_mt,
>                                            alloc_flags);
> -                if (page)
> -                        goto got_one;
> +                if (page) {
> +                        trace_mm_page_alloc_extfrag(page, order, current_order,
> +                                                    start_migratetype, fallback_mt);
> +                        return page;
> +                }
>          }
>
> -        if (alloc_flags & ALLOC_NOFRAGMENT)
> -                return NULL;
> +        return NULL;
> +}
> +
> +/* Try to steal a single page from a foreign block */
> +static __always_inline struct page *
> +__rmqueue_steal(struct zone *zone, int order, int start_migratetype)
> +{
> +        struct free_area *area;
> +        int current_order;
> +        struct page *page;
> +        int fallback_mt;
> +        bool claim_block;
>
> -        /* No luck claiming pageblock. Find the smallest fallback page */
>          for (current_order = order; current_order < NR_PAGE_ORDERS; current_order++) {
>                  area = &(zone->free_area[current_order]);
>                  fallback_mt = find_suitable_fallback(area, current_order,
> @@ -2253,25 +2265,28 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype,
>
>                  page = get_page_from_free_area(area, fallback_mt);
>                  page_del_and_expand(zone, page, order, current_order, fallback_mt);
> -                goto got_one;
> +                trace_mm_page_alloc_extfrag(page, order, current_order,
> +                                            start_migratetype, fallback_mt);
> +                return page;
>          }
>
>          return NULL;
> -
> -got_one:
> -        trace_mm_page_alloc_extfrag(page, order, current_order,
> -                                    start_migratetype, fallback_mt);
> -
> -        return page;
>  }
>
> +enum rmqueue_mode {
> +        RMQUEUE_NORMAL,
> +        RMQUEUE_CMA,
> +        RMQUEUE_CLAIM,
> +        RMQUEUE_STEAL,
> +};
> +
>  /*
>   * Do the hard work of removing an element from the buddy allocator.
>   * Call me with the zone->lock already held.
>   */
>  static __always_inline struct page *
>  __rmqueue(struct zone *zone, unsigned int order, int migratetype,
> -                                                unsigned int alloc_flags)
> +          unsigned int alloc_flags, enum rmqueue_mode *mode)
>  {
>          struct page *page;
>
> @@ -2290,16 +2305,47 @@ __rmqueue(struct zone *zone, unsigned int order, int migratetype,
>                  }
>          }
>
> -        page = __rmqueue_smallest(zone, order, migratetype);
> -        if (unlikely(!page)) {
> -                if (alloc_flags & ALLOC_CMA)
> +        /*
> +         * Try the different freelists, native then foreign.
> +         *
> +         * The fallback logic is expensive and rmqueue_bulk() calls in
> +         * a loop with the zone->lock held, meaning the freelists are
> +         * not subject to any outside changes. Remember in *mode where
> +         * we found pay dirt, to save us the search on the next call.
> +         */
> +        switch (*mode) {
> +        case RMQUEUE_NORMAL:
> +                page = __rmqueue_smallest(zone, order, migratetype);
> +                if (page)
> +                        return page;
> +                fallthrough;
> +        case RMQUEUE_CMA:
> +                if (alloc_flags & ALLOC_CMA) {
>                          page = __rmqueue_cma_fallback(zone, order);
> -
> -                if (!page)
> -                        page = __rmqueue_fallback(zone, order, migratetype,
> -                                                  alloc_flags);
> +                        if (page) {
> +                                *mode = RMQUEUE_CMA;
> +                                return page;
> +                        }
> +                }
> +                fallthrough;
> +        case RMQUEUE_CLAIM:
> +                page = __rmqueue_claim(zone, order, migratetype, alloc_flags);
> +                if (page) {
> +                        /* Replenished native freelist, back to normal mode */
> +                        *mode = RMQUEUE_NORMAL;
> +                        return page;
> +                }
> +                fallthrough;
> +        case RMQUEUE_STEAL:
> +                if (!(alloc_flags & ALLOC_NOFRAGMENT)) {
> +                        page = __rmqueue_steal(zone, order, migratetype);
> +                        if (page) {
> +                                *mode = RMQUEUE_STEAL;
> +                                return page;
> +                        }
> +                }
>          }
> -        return page;
> +        return NULL;
>  }
>
>  /*
> @@ -2311,6 +2357,7 @@ static int rmqueue_bulk(struct zone *zone, unsigned int order,
>                          unsigned long count, struct list_head *list,
>                          int migratetype, unsigned int alloc_flags)
>  {
> +        enum rmqueue_mode rmqm = RMQUEUE_NORMAL;
>          unsigned long flags;
>          int i;
>
> @@ -2321,7 +2368,7 @@ static int rmqueue_bulk(struct zone *zone, unsigned int order,
>          }
>          for (i = 0; i < count; ++i) {
>                  struct page *page = __rmqueue(zone, order, migratetype,
> -                                                                alloc_flags);
> +                                              alloc_flags, &rmqm);
>                  if (unlikely(page == NULL))
>                          break;
>
> @@ -2934,6 +2981,7 @@ struct page *rmqueue_buddy(struct zone *preferred_zone, struct zone *zone,
>  {
>          struct page *page;
>          unsigned long flags;
> +        enum rmqueue_mode rmqm = RMQUEUE_NORMAL;
>
>          do {
>                  page = NULL;
> @@ -2945,7 +2993,7 @@ struct page *rmqueue_buddy(struct zone *preferred_zone, struct zone *zone,
>                  if (alloc_flags & ALLOC_HIGHATOMIC)
>                          page = __rmqueue_smallest(zone, order, MIGRATE_HIGHATOMIC);
>                  if (!page) {
> -                        page = __rmqueue(zone, order, migratetype, alloc_flags);
> +                        page = __rmqueue(zone, order, migratetype, alloc_flags, &rmqm);
>
>                          /*
>                           * If the allocation fails, allow OOM handling and
> --
> 2.49.0
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 1/2] mm: page_alloc: speed up fallbacks in rmqueue_bulk()
2025-04-07 18:01 [PATCH 1/2] mm: page_alloc: speed up fallbacks in rmqueue_bulk() Johannes Weiner
` (4 preceding siblings ...)
2025-04-10 7:03 ` [EXT] " Carlos Song
@ 2025-04-10 8:12 ` Vlastimil Babka
2025-04-10 10:48 ` Shivank Garg
6 siblings, 0 replies; 18+ messages in thread
From: Vlastimil Babka @ 2025-04-10 8:12 UTC (permalink / raw)
To: Johannes Weiner, Andrew Morton
Cc: Brendan Jackman, Mel Gorman, Carlos Song, linux-mm, linux-kernel,
kernel test robot, stable
On 4/7/25 20:01, Johannes Weiner wrote:
> The test robot identified c2f6ea38fc1b ("mm: page_alloc: don't steal
> single pages from biggest buddy") as the root cause of a 56.4%
> regression in vm-scalability::lru-file-mmap-read.
>
> Carlos reports an earlier patch, c0cd6f557b90 ("mm: page_alloc: fix
> freelist movement during block conversion"), as the root cause for a
> regression in worst-case zone->lock+irqoff hold times.
>
> Both of these patches modify the page allocator's fallback path to be
> less greedy in an effort to stave off fragmentation. The flip side of
> this is that fallbacks are also less productive each time around,
> which means the fallback search can run much more frequently.
>
> Carlos' traces point to rmqueue_bulk() specifically, which tries to
> refill the percpu cache by allocating a large batch of pages in a
> loop. It highlights how once the native freelists are exhausted, the
> fallback code first scans orders top-down for whole blocks to claim,
> then falls back to a bottom-up search for the smallest buddy to steal.
> For the next batch page, it goes through the same thing again.
>
> This can be made more efficient. Since rmqueue_bulk() holds the
> zone->lock over the entire batch, the freelists are not subject to
> outside changes; when the search for a block to claim has already
> failed, there is no point in trying again for the next page.
>
> Modify __rmqueue() to remember the last successful fallback mode, and
> restart directly from there on the next rmqueue_bulk() iteration.
>
> Oliver confirms that this improves beyond the regression that the test
> robot reported against c2f6ea38fc1b:
>
> commit:
> f3b92176f4 ("tools/selftests: add guard region test for /proc/$pid/pagemap")
> c2f6ea38fc ("mm: page_alloc: don't steal single pages from biggest buddy")
> acc4d5ff0b ("Merge tag 'net-6.15-rc0' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net")
> 2c847f27c3 ("mm: page_alloc: speed up fallbacks in rmqueue_bulk()") <--- your patch
>
> f3b92176f4f7100f c2f6ea38fc1b640aa7a2e155cc1 acc4d5ff0b61eb1715c498b6536 2c847f27c37da65a93d23c237c5
> ---------------- --------------------------- --------------------------- ---------------------------
> %stddev %change %stddev %change %stddev %change %stddev
> \ | \ | \ | \
> 25525364 ± 3% -56.4% 11135467 -57.8% 10779336 +31.6% 33581409 vm-scalability.throughput
>
> Carlos confirms that worst-case times are almost fully recovered
> compared to before the earlier culprit patch:
>
> 2dd482ba627d (before freelist hygiene): 1ms
> c0cd6f557b90 (after freelist hygiene): 90ms
> next-20250319 (steal smallest buddy): 280ms
> this patch : 8ms
>
> Reported-by: kernel test robot <oliver.sang@intel.com>
> Reported-by: Carlos Song <carlos.song@nxp.com>
> Tested-by: kernel test robot <oliver.sang@intel.com>
> Fixes: c0cd6f557b90 ("mm: page_alloc: fix freelist movement during block conversion")
> Fixes: c2f6ea38fc1b ("mm: page_alloc: don't steal single pages from biggest buddy")
> Closes: https://lore.kernel.org/oe-lkp/202503271547.fc08b188-lkp@intel.com
> Cc: stable@vger.kernel.org # 6.10+
Might turn out to be fun with the dependency commits.
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Cool stuff.
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 1/2] mm: page_alloc: speed up fallbacks in rmqueue_bulk()
2025-04-07 18:01 [PATCH 1/2] mm: page_alloc: speed up fallbacks in rmqueue_bulk() Johannes Weiner
` (5 preceding siblings ...)
2025-04-10 8:12 ` Vlastimil Babka
@ 2025-04-10 10:48 ` Shivank Garg
6 siblings, 0 replies; 18+ messages in thread
From: Shivank Garg @ 2025-04-10 10:48 UTC (permalink / raw)
To: Johannes Weiner, Andrew Morton
Cc: Vlastimil Babka, Brendan Jackman, Mel Gorman, Carlos Song,
linux-mm, linux-kernel, kernel test robot, stable
On 4/7/2025 11:31 PM, Johannes Weiner wrote:
> The test robot identified c2f6ea38fc1b ("mm: page_alloc: don't steal
> single pages from biggest buddy") as the root cause of a 56.4%
> regression in vm-scalability::lru-file-mmap-read.
>
> Carlos reports an earlier patch, c0cd6f557b90 ("mm: page_alloc: fix
> freelist movement during block conversion"), as the root cause for a
> regression in worst-case zone->lock+irqoff hold times.
>
> Both of these patches modify the page allocator's fallback path to be
> less greedy in an effort to stave off fragmentation. The flip side of
> this is that fallbacks are also less productive each time around,
> which means the fallback search can run much more frequently.
>
> Carlos' traces point to rmqueue_bulk() specifically, which tries to
> refill the percpu cache by allocating a large batch of pages in a
> loop. It highlights how once the native freelists are exhausted, the
> fallback code first scans orders top-down for whole blocks to claim,
> then falls back to a bottom-up search for the smallest buddy to steal.
> For the next batch page, it goes through the same thing again.
>
> This can be made more efficient. Since rmqueue_bulk() holds the
> zone->lock over the entire batch, the freelists are not subject to
> outside changes; when the search for a block to claim has already
> failed, there is no point in trying again for the next page.
>
> Modify __rmqueue() to remember the last successful fallback mode, and
> restart directly from there on the next rmqueue_bulk() iteration.
>
> Oliver confirms that this improves beyond the regression that the test
> robot reported against c2f6ea38fc1b:
>
> commit:
> f3b92176f4 ("tools/selftests: add guard region test for /proc/$pid/pagemap")
> c2f6ea38fc ("mm: page_alloc: don't steal single pages from biggest buddy")
> acc4d5ff0b ("Merge tag 'net-6.15-rc0' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net")
> 2c847f27c3 ("mm: page_alloc: speed up fallbacks in rmqueue_bulk()") <--- your patch
>
> f3b92176f4f7100f c2f6ea38fc1b640aa7a2e155cc1 acc4d5ff0b61eb1715c498b6536 2c847f27c37da65a93d23c237c5
> ---------------- --------------------------- --------------------------- ---------------------------
> %stddev %change %stddev %change %stddev %change %stddev
> \ | \ | \ | \
> 25525364 ± 3% -56.4% 11135467 -57.8% 10779336 +31.6% 33581409 vm-scalability.throughput
>
> Carlos confirms that worst-case times are almost fully recovered
> compared to before the earlier culprit patch:
>
> 2dd482ba627d (before freelist hygiene): 1ms
> c0cd6f557b90 (after freelist hygiene): 90ms
> next-20250319 (steal smallest buddy): 280ms
> this patch : 8ms
>
> Reported-by: kernel test robot <oliver.sang@intel.com>
> Reported-by: Carlos Song <carlos.song@nxp.com>
> Tested-by: kernel test robot <oliver.sang@intel.com>
> Fixes: c0cd6f557b90 ("mm: page_alloc: fix freelist movement during block conversion")
> Fixes: c2f6ea38fc1b ("mm: page_alloc: don't steal single pages from biggest buddy")
> Closes: https://lore.kernel.org/oe-lkp/202503271547.fc08b188-lkp@intel.com
> Cc: stable@vger.kernel.org # 6.10+
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
> ---
Tested vm-scalability/300s-lru-file-mmap-read on an AMD Zen 3 EPYC
(2-socket, 1 NUMA node, 64 CPUs on each socket).

            Vanilla      Patched      Change
Throughput  32267451     36112127     +11.9% improvement
Stddev%     2477.36      2969.18      +19.8%
Free Time   0.144072     0.148774     +3.2%
Median      227967       249851       +9.6%
Tested-by: Shivank Garg <shivankg@amd.com>
Thanks,
Shivank
^ permalink raw reply [flat|nested] 18+ messages in thread