linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] mm/page_alloc: optimize lowmem_reserve max lookup using monotonicity
@ 2025-11-14 10:40 fujunjie
  2025-11-14 12:36 ` Brendan Jackman
  0 siblings, 1 reply; 6+ messages in thread
From: fujunjie @ 2025-11-14 10:40 UTC (permalink / raw)
  To: akpm
  Cc: vbabka, surenb, mhocko, jackmanb, hannes, ziy, linux-mm,
	linux-kernel, fujunjie

calculate_totalreserve_pages() currently finds the maximum
lowmem_reserve[j] for a zone by scanning the full range
[j = zone_idx .. MAX_NR_ZONES). However,
setup_per_zone_lowmem_reserve() constructs lowmem_reserve[]
monotonically increasing in j for a fixed zone (and never populates
lowmem_reserve[zone_idx] itself). This means the maximum valid reserve
entry always resides at the highest j > zone_idx that has a non-zero value.

Rewrite the loop to walk backwards from MAX_NR_ZONES - 1 down to
zone_idx + 1, stopping at the first non-zero lowmem_reserve[j]. This
avoids a full-range scan and makes the intent clearer. Behavior remains
unchanged.

Although this code is not on a hot path, the revised form is clearer
and avoids an unnecessary full scan.

Signed-off-by: fujunjie <fujunjie1@qq.com>
---
 mm/page_alloc.c | 18 +++++++++++++++---
 1 file changed, 15 insertions(+), 3 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 600d9e981c23d..414c5ba978418 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -6285,10 +6285,22 @@ static void calculate_totalreserve_pages(void)
 			long max = 0;
 			unsigned long managed_pages = zone_managed_pages(zone);
 
-			/* Find valid and maximum lowmem_reserve in the zone */
-			for (j = i; j < MAX_NR_ZONES; j++)
-				max = max(max, zone->lowmem_reserve[j]);
+			/*
+			 * Find valid and maximum lowmem_reserve in the zone.
+			 *
+			 * setup_per_zone_lowmem_reserve() builds
+			 * lowmem_reserve[j] monotonically increasing in j
+			 * for a fixed zone, so the maximum lives at the
+			 * highest index that has a non-zero value.  Walk
+			 * backwards and stop at the first hit.
+			 */
+			for (j = MAX_NR_ZONES - 1; j > i; j--) {
+				if (!zone->lowmem_reserve[j])
+					continue;
 
+				max = zone->lowmem_reserve[j];
+				break;
+			}
 			/* we treat the high watermark as reserved pages. */
 			max += high_wmark_pages(zone);
 
-- 
2.34.1



^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] mm/page_alloc: optimize lowmem_reserve max lookup using monotonicity
  2025-11-14 10:40 [PATCH] mm/page_alloc: optimize lowmem_reserve max lookup using monotonicity fujunjie
@ 2025-11-14 12:36 ` Brendan Jackman
  2025-11-14 14:55   ` Fujunjie
  2025-11-14 16:12   ` Zi Yan
  0 siblings, 2 replies; 6+ messages in thread
From: Brendan Jackman @ 2025-11-14 12:36 UTC (permalink / raw)
  To: fujunjie, akpm
  Cc: vbabka, surenb, mhocko, jackmanb, hannes, ziy, linux-mm, linux-kernel

On Fri Nov 14, 2025 at 10:40 AM UTC, fujunjie wrote:
> Although this code is not on a hot path, the revised form is clearer

Is it...?

If people do think it is clearer, let's at least write the right comment
in the right place. Instead of having one piece of code
(calculate_totalreserve_pages()) describe at a distance the behaviour of
another piece of code (setup_per_zone_lowmem_reserve()), let's describe
an invariant of the data ("lowmem_reserve is monotonic, up to the first
zero value"), at the site where the data is defined.

I know sometimes in code this complex we do need these
"spooky-action-at-a-distance" comments but this doesn't seem like one of
those places to me.


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] mm/page_alloc: optimize lowmem_reserve max lookup using monotonicity
  2025-11-14 12:36 ` Brendan Jackman
@ 2025-11-14 14:55   ` Fujunjie
  2025-11-14 16:12   ` Zi Yan
  1 sibling, 0 replies; 6+ messages in thread
From: Fujunjie @ 2025-11-14 14:55 UTC (permalink / raw)
  To: Brendan Jackman, akpm
  Cc: vbabka, surenb, linux-kernel, linux-mm, linux-kernel, hannes

[-- Attachment #1: Type: text/plain, Size: 3135 bytes --]


On Fri Nov 14, 2025 at 8:36 PM UTC, Brendan Jackman wrote:
> On Fri Nov 14, 2025 at 10:40 AM UTC, fujunjie wrote:
>> Although this code is not on a hot path, the revised form is clearer
> Is it...?
>
> If people do think it is clearer, let's at least write the right comment
> in the right place. Instead of having one piece of code
> (calculate_totalreserve_pages()) describe at a distance the behaviour of
> another piece of code (setup_per_zone_lowmem_reserve()), let's describe
> an invariant of the data ("lowmem_reserve is monotonic, up to the first
> zero value"), at the site where the data is defined.
>
> I know sometimes in code this complex we do need these
> "spooky-action-at-a-distance" comments but this doesn't seem like one of
> those places to me.

Thanks for the review, Brendan! Let me clarify the motivation using the 
actual semantics of zone->lowmem_reserve[j], since that meaning is what 
leads to the monotonic property. For a given zone “i” (i.e., 
zone_idx(zone) == i), the entry zone->lowmem_reserve[j] (for j > i) 
represents how many pages in this zone “i” must be treated as reserved 
when evaluating whether an allocation class that is allowed to use zones 
up to “j” may fall back into zone “i”. These reserved pages protect more 
constrained allocations that cannot use higher zones and may rely 
heavily on this lower zone. Because of this meaning, as j increases we 
are considering allocation classes that are able to use a strictly 
larger set of zones. Such classes are more flexible, and therefore we 
should not allow them to consume more low memory than allocation classes 
with a smaller j. Consequently, the reserved amount for zone “i” can 
only stay the same or increase as j increases; it should not decrease. 
setup_per_zone_lowmem_reserve() encodes this semantics directly: as j 
increases, it accumulates the managed pages of higher zones and computes 
zone->lowmem_reserve[j] as managed_pages / ratio[i]. This makes 
zone->lowmem_reserve[j] monotonically non-decreasing in j by design, 
reflecting the intended protection model rather than an accidental 
implementation detail. Given this structure, scanning backward from the 
highest j and taking the first non-zero entry in 
calculate_totalreserve_pages() is a natural way to use the data: the 
maximum reserve for zone “i” must appear at the highest j for which the 
reserve is defined. For readers familiar with how 
zone->lowmem_reserve[j] is constructed, encountering a full forward 
maximum scan can raise the question “why search the entire range when 
the reserve grows monotonically with j?” That said, I agree with your 
point that if users rely on this property, the monotonicity should be 
documented where zone->lowmem_reserve[j] is populated (or near the field 
definition), rather than explained indirectly from 
calculate_totalreserve_pages(). I can move the comment there and keep 
the consuming code simple. I’ll wait a bit to see if others have 
opinions, and if the direction seems agreeable I can send a v2 with the 
comment relocated accordingly. Thanks again for the feedback!

[-- Attachment #2: Type: text/html, Size: 3718 bytes --]

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] mm/page_alloc: optimize lowmem_reserve max lookup using monotonicity
  2025-11-14 12:36 ` Brendan Jackman
  2025-11-14 14:55   ` Fujunjie
@ 2025-11-14 16:12   ` Zi Yan
  2025-11-14 16:34     ` Fujunjie
  1 sibling, 1 reply; 6+ messages in thread
From: Zi Yan @ 2025-11-14 16:12 UTC (permalink / raw)
  To: Brendan Jackman
  Cc: fujunjie, akpm, vbabka, surenb, mhocko, hannes, linux-mm, linux-kernel

On 14 Nov 2025, at 7:36, Brendan Jackman wrote:

> On Fri Nov 14, 2025 at 10:40 AM UTC, fujunjie wrote:
>> Although this code is not on a hot path, the revised form is clearer
>
> Is it...?
>
> If people do think it is clearer, let's at least write the right comment
> in the right place. Instead of having one piece of code
> (calculate_totalreserve_pages()) describe at a distance the behaviour of
> another piece of code (setup_per_zone_lowmem_reserve()), let's describe
> an invariant of the data ("lowmem_reserve is monotonic, up to the first
> zero value"), at the site where the data is defined.

I agree.

>
> I know sometimes in code this complex we do need these
> "spooky-action-at-a-distance" comments but this doesn't seem like one of
> those places to me.

My concern on this change is that the correctness of
calculate_totalreserve_pages() now relies on the implementation of
setup_per_zone_lowmem_reserve(). How can we make sure in the future
this will not break when setup_per_zone_lowmem_reserve() is changed?
Hoping people read the comment and do the right thing?

Best Regards,
Yan, Zi


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] mm/page_alloc: optimize lowmem_reserve max lookup using monotonicity
  2025-11-14 16:12   ` Zi Yan
@ 2025-11-14 16:34     ` Fujunjie
  2025-11-14 17:15       ` Zi Yan
  0 siblings, 1 reply; 6+ messages in thread
From: Fujunjie @ 2025-11-14 16:34 UTC (permalink / raw)
  To: Zi Yan, Brendan Jackman
  Cc: akpm, vbabka, surenb, mhocko, hannes, linux-mm, linux-kernel



On Sat Nov 15, 2025 at 00:12 AM UTC, Zi Yan wrote:

> My concern on this change is that the correctness of
> calculate_totalreserve_pages() now relies on the implementation of
> setup_per_zone_lowmem_reserve(). How can we make sure in the future
> this will not break when setup_per_zone_lowmem_reserve() is changed?
> Hoping people read the comment and do the right thing?
Thanks for raising this, Zi.

I agree it would be a real problem if calculate_totalreserve_pages()
were relying on a fragile detail of how setup_per_zone_lowmem_reserve()
happens to be written today.

What I intended to rely on is not an implementation detail, but the
semantics of zone->lowmem_reserve[j] for a given zone (with
zone_idx(zone) == i).

For such a zone "i", zone->lowmem_reserve[j] (j > i) represents how many
pages in zone "i" must effectively be kept in reserve when deciding
whether an allocation class that is allowed to allocate from zones up to
"j" may fall back into zone "i". The purpose of these reserves is to
protect allocation classes that cannot use higher zones and therefore
depend more heavily on this lower zone.

When viewed this way, the partial ordering in j comes from the meaning
of the field: as j increases, we are considering allocation classes that
can use a strictly larger set of fallback zones. Those more flexible
allocations should not be allowed to consume more low memory than the
less flexible ones. It would be quite unexpected—in terms of the reserve
semantics—if a higher-j allocation class were permitted to deplete zone
"i" more aggressively than a lower-j one.

So the “non-decreasing in j” property is really a data invariant implied
by the reserve semantics, rather than an assumption about how
setup_per_zone_lowmem_reserve() happens to be implemented today.

setup_per_zone_lowmem_reserve() currently encodes this meaning by
accumulating managed pages from higher zones and applying the configured
ratio. If some future change were to alter that implementation in a way
that breaks monotonicity, that would likely reflect a change in the
intended semantics of lowmem_reserve itself—at which point consumers
like calculate_totalreserve_pages() would naturally need to be updated
as well.

Best Regards,
Junjie,Fu



^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] mm/page_alloc: optimize lowmem_reserve max lookup using monotonicity
  2025-11-14 16:34     ` Fujunjie
@ 2025-11-14 17:15       ` Zi Yan
  0 siblings, 0 replies; 6+ messages in thread
From: Zi Yan @ 2025-11-14 17:15 UTC (permalink / raw)
  To: Fujunjie
  Cc: Brendan Jackman, akpm, vbabka, surenb, mhocko, hannes, linux-mm,
	linux-kernel

On 14 Nov 2025, at 11:34, Fujunjie wrote:

> On Sat Nov 15, 2025 at 00:12 AM UTC, Zi Yan wrote:
>
>> My concern on this change is that the correctness of
>> calculate_totalreserve_pages() now relies on the implementation of
>> setup_per_zone_lowmem_reserve(). How can we make sure in the future
>> this will not break when setup_per_zone_lowmem_reserve() is changed?
>> Hoping people read the comment and do the right thing?
> Thanks for raising this, Zi.
>
> I agree it would be a real problem if calculate_totalreserve_pages()
> were relying on a fragile detail of how setup_per_zone_lowmem_reserve()
> happens to be written today.
>
> What I intended to rely on is not an implementation detail, but the
> semantics of zone->lowmem_reserve[j] for a given zone (with
> zone_idx(zone) == i).
>
> For such a zone "i", zone->lowmem_reserve[j] (j > i) represents how many
> pages in zone "i" must effectively be kept in reserve when deciding
> whether an allocation class that is allowed to allocate from zones up to
> "j" may fall back into zone "i". The purpose of these reserves is to
> protect allocation classes that cannot use higher zones and therefore
> depend more heavily on this lower zone.
>
> When viewed this way, the partial ordering in j comes from the meaning
> of the field: as j increases, we are considering allocation classes that
> can use a strictly larger set of fallback zones. Those more flexible
> allocations should not be allowed to consume more low memory than the
> less flexible ones. It would be quite unexpected—in terms of the reserve
> semantics—if a higher-j allocation class were permitted to deplete zone
> "i" more aggressively than a lower-j one.
>
> So the “non-decreasing in j” property is really a data invariant implied
> by the reserve semantics, rather than an assumption about how
> setup_per_zone_lowmem_reserve() happens to be implemented today.
>
> setup_per_zone_lowmem_reserve() currently encodes this meaning by
> accumulating managed pages from higher zones and applying the configured
> ratio. If some future change were to alter that implementation in a way
> that breaks monotonicity, that would likely reflect a change in the
> intended semantics of lowmem_reserve itself—at which point consumers
> like calculate_totalreserve_pages() would naturally need to be updated
> as well.

Thank you for the explanation. Now your changes make more sense to me.
Like Brendan mentioned, at least add a comment in
setup_per_zone_lowmem_reserve() to state this monotonicity requirement
and mention the correctness of calculate_totalreserve_pages() relies on
it. And also please add the above text to the commit log to clarify
the purpose of the patch.

Thanks.

Best Regards,
Yan, Zi


^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2025-11-14 17:15 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-11-14 10:40 [PATCH] mm/page_alloc: optimize lowmem_reserve max lookup using monotonicity fujunjie
2025-11-14 12:36 ` Brendan Jackman
2025-11-14 14:55   ` Fujunjie
2025-11-14 16:12   ` Zi Yan
2025-11-14 16:34     ` Fujunjie
2025-11-14 17:15       ` Zi Yan

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox