linux-mm.kvack.org archive mirror
* [PATCH v3] mm: pcp: increase pcp->free_count threshold to trigger free_high
@ 2025-04-07 10:52 Nikhil Dhama
  2025-04-11  2:16 ` Huang, Ying
  0 siblings, 1 reply; 8+ messages in thread
From: Nikhil Dhama @ 2025-04-07 10:52 UTC (permalink / raw)
  To: akpm
  Cc: bharata, raghavendra.kodsarathimmappa, ying.huang, oe-lkp, lkp,
	Nikhil Dhama, Huang Ying, linux-mm, linux-kernel, Mel Gorman

In the old pcp design, pcp->free_factor was incremented in nr_pcp_free(),
which is invoked by free_pcppages_bulk(). So free_factor increased by 1
only when we tried to reduce the size of the pcp list or flush for high
order, and free_high used to trigger only for order > 0, order <
costly_order, and pcp->free_factor > 0.

For iperf3 I noticed that with the older design in kernel v6.6, the pcp
list was drained mostly when pcp->count > high (more often when count
went above 530), and most of the time pcp->free_factor was 0, triggering
very few high-order flushes.

But this changed in the current design, introduced in commit 6ccdcb6d3a74
("mm, pcp: reduce detecting time of consecutive high order page freeing"),
where pcp->free_factor became pcp->free_count, which keeps track of the
number of pages freed contiguously. In this design, pcp->free_count is
incremented on every deallocation, irrespective of whether the pcp list
was reduced or not. And free_high now triggers if pcp->free_count goes
above batch (which is 63) and two contiguous pages are freed without any
allocation in between.

With this design, for iperf3, the pcp list is flushed more frequently
because the free_high heuristic is triggered more often. I observed that
the high-order pcp list is drained as soon as both count and free_count
go above 63.

Due to this more aggressive high-order flushing, applications doing
contiguous high-order allocations have to go to the global list more
frequently.

On a 2-node AMD machine with 384 vCPUs on each node, connected via a
Mellanox ConnectX-7, I see a ~30% performance reduction when scaling the
number of iperf3 client/server pairs from 32 to 64.

Though this new design reduced the time needed to detect consecutive
high-order freeing, for applications which allocate high-order pages more
frequently it may flush the high-order list prematurely. This motivates
tuning how late or early we should flush high-order lists.

So, in this patch, we increase the pcp->free_count threshold to trigger
free_high from "batch" to "batch + pcp->high_min / 2". This new threshold
keeps high-order pages on the pcp list for a longer duration, which can
help applications doing frequent high-order allocations.

With this patch, iperf3 performance is restored, and scores for the other
benchmarks on the same machine are as follows:

		      iperf3    lmbench3        netperf         kbuild
                               (AF_UNIX)   (SCTP_STREAM_MANY)
                     -------   ---------   -----------------    ------
v6.6  vanilla (base)    100          100              100          100
v6.12 vanilla            69          113             98.5         98.8
v6.12 + this patch      100        110.3            100.2         99.3


netperf-tcp:

                                  6.12                      6.12
                               vanilla    	      this_patch
Hmean     64         732.14 (   0.00%)         730.45 (  -0.23%)
Hmean     128       1417.46 (   0.00%)        1419.44 (   0.14%)
Hmean     256       2679.67 (   0.00%)        2676.45 (  -0.12%)
Hmean     1024      8328.52 (   0.00%)        8339.34 (   0.13%)
Hmean     2048     12716.98 (   0.00%)       12743.68 (   0.21%)
Hmean     3312     15787.79 (   0.00%)       15887.25 (   0.63%)
Hmean     4096     17311.91 (   0.00%)       17332.68 (   0.12%)
Hmean     8192     20310.73 (   0.00%)       20465.09 (   0.76%)

Fixes: 6ccdcb6d3a74 ("mm, pcp: reduce detecting time of consecutive high order page freeing")

Signed-off-by: Nikhil Dhama <nikhil.dhama@amd.com>
Suggested-by: Huang Ying <ying.huang@linux.alibaba.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Huang Ying <huang.ying.caritas@gmail.com>
Cc: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org
Cc: Mel Gorman <mgorman@techsingularity.net>

---
 v1: https://lore.kernel.org/linux-mm/20250107091724.35287-1-nikhil.dhama@amd.com/
 v2: https://lore.kernel.org/linux-mm/20250325171915.14384-1-nikhil.dhama@amd.com/

 mm/page_alloc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index b6958333054d..569dcf1f731f 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2617,7 +2617,7 @@ static void free_unref_page_commit(struct zone *zone, struct per_cpu_pages *pcp,
 	 * stops will be drained from vmstat refresh context.
 	 */
 	if (order && order <= PAGE_ALLOC_COSTLY_ORDER) {
-		free_high = (pcp->free_count >= batch &&
+		free_high = (pcp->free_count >= (batch + pcp->high_min / 2) &&
 			     (pcp->flags & PCPF_PREV_FREE_HIGH_ORDER) &&
 			     (!(pcp->flags & PCPF_FREE_HIGH_BATCH) ||
 			      pcp->count >= READ_ONCE(batch)));
-- 
2.25.1




* Re: [PATCH v3] mm: pcp: increase pcp->free_count threshold to trigger free_high
  2025-04-07 10:52 [PATCH v3] mm: pcp: increase pcp->free_count threshold to trigger free_high Nikhil Dhama
@ 2025-04-11  2:16 ` Huang, Ying
  2025-04-11  6:02   ` Raghavendra K T
  0 siblings, 1 reply; 8+ messages in thread
From: Huang, Ying @ 2025-04-11  2:16 UTC (permalink / raw)
  To: Nikhil Dhama
  Cc: akpm, bharata, raghavendra.kodsarathimmappa, oe-lkp, lkp,
	Huang Ying, linux-mm, linux-kernel, Mel Gorman

Hi, Nikhil,

Sorry for the late reply.

Nikhil Dhama <nikhil.dhama@amd.com> writes:

> In old pcp design, pcp->free_factor gets incremented in nr_pcp_free()
> which is invoked by free_pcppages_bulk(). So, it used to increase
> free_factor by 1 only when we try to reduce the size of pcp list or
> flush for high order, and free_high used to trigger only 
> for order > 0 and order < costly_order and pcp->free_factor > 0.
>
> For iperf3 I noticed that with older design in kernel v6.6, pcp list was
> drained mostly when pcp->count > high (more often when count goes above
> 530). and most of the time pcp->free_factor was 0, triggering very few
> high order flushes.
>
> But this is changed in the current design, introduced in commit 6ccdcb6d3a74 
> ("mm, pcp: reduce detecting time of consecutive high order page freeing"), 
> where pcp->free_factor is changed to pcp->free_count to keep track of the 
> number of pages freed contiguously. In this design, pcp->free_count is 
> incremented on every deallocation, irrespective of whether pcp list was 
> reduced or not. And logic to trigger free_high is if pcp->free_count goes 
> above batch (which is 63) and there are two contiguous page free without 
> any allocation.

The design changed because pcp->high can become much higher than it was
before.  That made it much harder to trigger free_high, which caused
some performance regressions too.

> With this design, for iperf3, pcp list is getting flushed more frequently 
> because free_high heuristics is triggered more often now. I observed that 
> high order pcp list is drained as soon as both count and free_count goes 
> above 63.
>
> Due to this more aggressive high order flushing, applications
> doing contiguous high order allocation will require to go to global list
> more frequently.
>
> On a 2-node AMD machine with 384 vCPUs on each node,
> connected via Mellonox connectX-7, I am seeing a ~30% performance
> reduction if we scale number of iperf3 client/server pairs from 32 to 64.
>
> Though this new design reduced the time to detect high order flushes,
> but for application which are allocating high order pages more
> frequently it may be flushing the high order list pre-maturely.
> This motivates towards tuning on how late or early we should flush
> high order lists. 
>
> So, in this patch, we increased the pcp->free_count threshold to 
> trigger free_high from "batch" to "batch + pcp->high_min / 2". 
> This new threshold keeps high order pages in pcp list for a 
> longer duration which can help the application doing high order
> allocations frequently.

IIUC, we restore the original behavior with "batch + pcp->high / 2" as
in my analysis in

https://lore.kernel.org/all/875xjmuiup.fsf@DESKTOP-5N7EMDA/

If you think my analysis is correct, can you add it to the patch
description too?  That makes it easier for people to know why the code
looks this way.

> With this patch performace to Iperf3 is restored and 
> score for other benchmarks on the same machine are as follows:
>
> 		      iperf3    lmbench3        netperf         kbuild
>                                (AF_UNIX)   (SCTP_STREAM_MANY)
>                      -------   ---------   -----------------    ------
> v6.6  vanilla (base)    100          100              100          100
> v6.12 vanilla            69          113             98.5         98.8
> v6.12 + this patch      100        110.3            100.2         99.3
>
>
> netperf-tcp:
>
>                                   6.12                      6.12
>                                vanilla    	      this_patch
> Hmean     64         732.14 (   0.00%)         730.45 (  -0.23%)
> Hmean     128       1417.46 (   0.00%)        1419.44 (   0.14%)
> Hmean     256       2679.67 (   0.00%)        2676.45 (  -0.12%)
> Hmean     1024      8328.52 (   0.00%)        8339.34 (   0.13%)
> Hmean     2048     12716.98 (   0.00%)       12743.68 (   0.21%)
> Hmean     3312     15787.79 (   0.00%)       15887.25 (   0.63%)
> Hmean     4096     17311.91 (   0.00%)       17332.68 (   0.12%)
> Hmean     8192     20310.73 (   0.00%)       20465.09 (   0.76%)
>
> Fixes: 6ccdcb6d3a74 ("mm, pcp: reduce detecting time of consecutive high order page freeing")
>
> Signed-off-by: Nikhil Dhama <nikhil.dhama@amd.com>
> Suggested-by: Huang Ying <ying.huang@linux.alibaba.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Huang Ying <huang.ying.caritas@gmail.com>
> Cc: linux-mm@kvack.org
> Cc: linux-kernel@vger.kernel.org
> Cc: Mel Gorman <mgorman@techsingularity.net>
>
> ---
>  v1: https://lore.kernel.org/linux-mm/20250107091724.35287-1-nikhil.dhama@amd.com/
>  v2: https://lore.kernel.org/linux-mm/20250325171915.14384-1-nikhil.dhama@amd.com/
>
>  mm/page_alloc.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index b6958333054d..569dcf1f731f 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -2617,7 +2617,7 @@ static void free_unref_page_commit(struct zone *zone, struct per_cpu_pages *pcp,
>  	 * stops will be drained from vmstat refresh context.
>  	 */
>  	if (order && order <= PAGE_ALLOC_COSTLY_ORDER) {
> -		free_high = (pcp->free_count >= batch &&
> +		free_high = (pcp->free_count >= (batch + pcp->high_min / 2) &&
>  			     (pcp->flags & PCPF_PREV_FREE_HIGH_ORDER) &&
>  			     (!(pcp->flags & PCPF_FREE_HIGH_BATCH) ||
>  			      pcp->count >= READ_ONCE(batch)));

---
Best Regards,
Huang, Ying



* Re: [PATCH v3] mm: pcp: increase pcp->free_count threshold to trigger free_high
  2025-04-11  2:16 ` Huang, Ying
@ 2025-04-11  6:02   ` Raghavendra K T
  2025-04-11  6:15     ` Huang, Ying
  0 siblings, 1 reply; 8+ messages in thread
From: Raghavendra K T @ 2025-04-11  6:02 UTC (permalink / raw)
  To: Huang, Ying, Nikhil Dhama
  Cc: akpm, bharata, raghavendra.kodsarathimmappa, oe-lkp, lkp,
	Huang Ying, linux-mm, linux-kernel, Mel Gorman



On 4/11/2025 7:46 AM, Huang, Ying wrote:
> Hi, Nikhil,
> 
> Sorry for late reply.
> 
> Nikhil Dhama <nikhil.dhama@amd.com> writes:
> 
>> In old pcp design, pcp->free_factor gets incremented in nr_pcp_free()
>> which is invoked by free_pcppages_bulk(). So, it used to increase
>> free_factor by 1 only when we try to reduce the size of pcp list or
>> flush for high order, and free_high used to trigger only
>> for order > 0 and order < costly_order and pcp->free_factor > 0.
>>
>> For iperf3 I noticed that with older design in kernel v6.6, pcp list was
>> drained mostly when pcp->count > high (more often when count goes above
>> 530). and most of the time pcp->free_factor was 0, triggering very few
>> high order flushes.
>>
>> But this is changed in the current design, introduced in commit 6ccdcb6d3a74
>> ("mm, pcp: reduce detecting time of consecutive high order page freeing"),
>> where pcp->free_factor is changed to pcp->free_count to keep track of the
>> number of pages freed contiguously. In this design, pcp->free_count is
>> incremented on every deallocation, irrespective of whether pcp list was
>> reduced or not. And logic to trigger free_high is if pcp->free_count goes
>> above batch (which is 63) and there are two contiguous page free without
>> any allocation.
> 
> The design changes because pcp->high can become much higher than that
> before it.  This makes it much harder to trigger free_high, which causes
> some performance regressions too.
> 
>> With this design, for iperf3, pcp list is getting flushed more frequently
>> because free_high heuristics is triggered more often now. I observed that
>> high order pcp list is drained as soon as both count and free_count goes
>> above 63.
>>
>> Due to this more aggressive high order flushing, applications
>> doing contiguous high order allocation will require to go to global list
>> more frequently.
>>
>> On a 2-node AMD machine with 384 vCPUs on each node,
>> connected via Mellonox connectX-7, I am seeing a ~30% performance
>> reduction if we scale number of iperf3 client/server pairs from 32 to 64.
>>
>> Though this new design reduced the time to detect high order flushes,
>> but for application which are allocating high order pages more
>> frequently it may be flushing the high order list pre-maturely.
>> This motivates towards tuning on how late or early we should flush
>> high order lists.
>>
>> So, in this patch, we increased the pcp->free_count threshold to
>> trigger free_high from "batch" to "batch + pcp->high_min / 2".
>> This new threshold keeps high order pages in pcp list for a
>> longer duration which can help the application doing high order
>> allocations frequently.
> 
> IIUC, we restore the original behavior with "batch + pcp->high / 2" as
> in my analysis in
> 
> https://lore.kernel.org/all/875xjmuiup.fsf@DESKTOP-5N7EMDA/
> 
> If you think my analysis is correct, can you add that in patch
> description too?  This makes it easier for people to know why the code
> looks this way.
> 

Yes, this makes sense. Andrew has already included the patch in the mm tree.

Nikhil,

Could you please help with an updated write-up based on Ying's
suggestion, assuming that works for Andrew?

- Raghu





* Re: [PATCH v3] mm: pcp: increase pcp->free_count threshold to trigger free_high
  2025-04-11  6:02   ` Raghavendra K T
@ 2025-04-11  6:15     ` Huang, Ying
  2025-04-26  2:11       ` Andrew Morton
  0 siblings, 1 reply; 8+ messages in thread
From: Huang, Ying @ 2025-04-11  6:15 UTC (permalink / raw)
  To: Raghavendra K T, Nikhil Dhama
  Cc: akpm, bharata, raghavendra.kodsarathimmappa, oe-lkp, lkp,
	Huang Ying, linux-mm, linux-kernel, Mel Gorman

Raghavendra K T <raghavendra.kt@amd.com> writes:

> On 4/11/2025 7:46 AM, Huang, Ying wrote:
>> Hi, Nikhil,
>> Sorry for late reply.
>> Nikhil Dhama <nikhil.dhama@amd.com> writes:
>> 
>>> In old pcp design, pcp->free_factor gets incremented in nr_pcp_free()
>>> which is invoked by free_pcppages_bulk(). So, it used to increase
>>> free_factor by 1 only when we try to reduce the size of pcp list or
>>> flush for high order, and free_high used to trigger only
>>> for order > 0 and order < costly_order and pcp->free_factor > 0.
>>>
>>> For iperf3 I noticed that with older design in kernel v6.6, pcp list was
>>> drained mostly when pcp->count > high (more often when count goes above
>>> 530). and most of the time pcp->free_factor was 0, triggering very few
>>> high order flushes.
>>>
>>> But this is changed in the current design, introduced in commit 6ccdcb6d3a74
>>> ("mm, pcp: reduce detecting time of consecutive high order page freeing"),
>>> where pcp->free_factor is changed to pcp->free_count to keep track of the
>>> number of pages freed contiguously. In this design, pcp->free_count is
>>> incremented on every deallocation, irrespective of whether pcp list was
>>> reduced or not. And logic to trigger free_high is if pcp->free_count goes
>>> above batch (which is 63) and there are two contiguous page free without
>>> any allocation.
>> The design changes because pcp->high can become much higher than
>> that
>> before it.  This makes it much harder to trigger free_high, which causes
>> some performance regressions too.
>> 
>>> With this design, for iperf3, pcp list is getting flushed more frequently
>>> because free_high heuristics is triggered more often now. I observed that
>>> high order pcp list is drained as soon as both count and free_count goes
>>> above 63.
>>>
>>> Due to this more aggressive high order flushing, applications
>>> doing contiguous high order allocation will require to go to global list
>>> more frequently.
>>>
>>> On a 2-node AMD machine with 384 vCPUs on each node,
>>> connected via Mellonox connectX-7, I am seeing a ~30% performance
>>> reduction if we scale number of iperf3 client/server pairs from 32 to 64.
>>>
>>> Though this new design reduced the time to detect high order flushes,
>>> but for application which are allocating high order pages more
>>> frequently it may be flushing the high order list pre-maturely.
>>> This motivates towards tuning on how late or early we should flush
>>> high order lists.
>>>
>>> So, in this patch, we increased the pcp->free_count threshold to
>>> trigger free_high from "batch" to "batch + pcp->high_min / 2".
>>> This new threshold keeps high order pages in pcp list for a
>>> longer duration which can help the application doing high order
>>> allocations frequently.
>> IIUC, we restore the original behavior with "batch + pcp->high / 2"
>> as
>> in my analysis in
>> https://lore.kernel.org/all/875xjmuiup.fsf@DESKTOP-5N7EMDA/
>> If you think my analysis is correct, can you add that in patch
>> description too?  This makes it easier for people to know why the code
>> looks this way.
>> 
>
> Yes. This makes sense. Andrew has already included the patch in mm tree.
>
> Nikhil,
>
> Could you please help with the updated write up based on Ying's
> suggestion assuming it works for Andrew?

Thanks!

Just send an updated version; Andrew will update the patch in the mm tree
unless it has been merged into mm-stable.

---
Best Regards,
Huang, Ying



* Re: [PATCH v3] mm: pcp: increase pcp->free_count threshold to trigger free_high
  2025-04-11  6:15     ` Huang, Ying
@ 2025-04-26  2:11       ` Andrew Morton
  2025-04-28  5:00         ` Nikhil Dhama
  0 siblings, 1 reply; 8+ messages in thread
From: Andrew Morton @ 2025-04-26  2:11 UTC (permalink / raw)
  To: Huang, Ying
  Cc: Raghavendra K T, Nikhil Dhama, bharata,
	raghavendra.kodsarathimmappa, oe-lkp, lkp, Huang Ying, linux-mm,
	linux-kernel, Mel Gorman

On Fri, 11 Apr 2025 14:15:42 +0800 "Huang, Ying" <ying.huang@linux.alibaba.com> wrote:

> >> in my analysis in
> >> https://lore.kernel.org/all/875xjmuiup.fsf@DESKTOP-5N7EMDA/
> >> If you think my analysis is correct, can you add that in patch
> >> description too?  This makes it easier for people to know why the code
> >> looks this way.
> >> 
> >
> > Yes. This makes sense. Andrew has already included the patch in mm tree.
> >
> > Nikhil,
> >
> > Could you please help with the updated write up based on Ying's
> > suggestion assuming it works for Andrew?
> 
> Thanks!
> 
> Just send a updated version, Andrew will update the patch in mm tree
> unless it has been merged by mm-stable.

[ two weeks pass ]

Nikhil's attentions are presumably elsewhere.  Could someone (Ying or
Raghavendra?) please send along altered changelog text which I can
paste in there?



* Re: [PATCH v3] mm: pcp: increase pcp->free_count threshold to trigger free_high
  2025-04-26  2:11       ` Andrew Morton
@ 2025-04-28  5:00         ` Nikhil Dhama
  2025-05-11  4:30           ` Andrew Morton
  0 siblings, 1 reply; 8+ messages in thread
From: Nikhil Dhama @ 2025-04-28  5:00 UTC (permalink / raw)
  To: akpm
  Cc: bharata, huang.ying.caritas, linux-kernel, linux-mm, lkp,
	mgorman, nikhil.dhama, oe-lkp, raghavendra.kt, ying.huang

Hi Andrew, 

Sorry, I forgot to CC linux-mm; maybe that's why it went unnoticed.

On 4/26/2025 7:41 AM, Andrew Morton wrote:


> On Fri, 11 Apr 2025 14:15:42 +0800 "Huang, Ying" <ying.huang@linux.alibaba.com> wrote:
>
>>>> in my analysis in
>>>> https://lore.kernel.org/all/875xjmuiup.fsf@DESKTOP-5N7EMDA/
>>>> If you think my analysis is correct, can you add that in patch
>>>> description too?  This makes it easier for people to know why the code
>>>> looks this way.
>>>>
>>>
>>> Yes. This makes sense. Andrew has already included the patch in mm tree.
>>>
>>> Nikhil,
>>>
>>> Could you please help with the updated write up based on Ying's
>>> suggestion assuming it works for Andrew?
>>
>> Thanks!
>>
>> Just send a updated version, Andrew will update the patch in mm tree
>> unless it has been merged by mm-stable.
>
> [ two weeks pass ]
>
> Nikhil's attentions are presumably elsewhere.  Could someone (Ying or
> Raghavendra?) please send along altered changelog text which I can
> paste in there?

Please find the updated changelog text as follows:


In the old pcp design, pcp->free_factor was incremented in nr_pcp_free(),
which is invoked by free_pcppages_bulk().  So free_factor increased by 1
only when we tried to reduce the size of the pcp list, and free_high used
to trigger only for order > 0, order < costly_order, and
pcp->free_factor > 0.

For iperf3 I noticed that with the older design in kernel v6.6, the pcp
list was drained mostly when pcp->count > high (more often when count
went above 530), and most of the time pcp->free_factor was 0, triggering
very few high-order flushes.

But this changed in the current design, introduced in commit
6ccdcb6d3a74 ("mm, pcp: reduce detecting time of consecutive high order
page freeing"), where pcp->free_factor became pcp->free_count, which
keeps track of the number of pages freed contiguously.  In this design,
pcp->free_count is incremented on every deallocation, irrespective of
whether the pcp list was reduced or not.  And free_high now triggers if
pcp->free_count goes above batch (which is 63) and two contiguous pages
are freed without any allocation in between.

With this design, for iperf3, the pcp list is flushed more frequently
because the free_high heuristic is triggered more often.  I observed
that the high-order pcp list is drained as soon as both count and
free_count go above 63.

Due to this more aggressive high-order flushing, applications doing
contiguous high-order allocations have to go to the global list more
frequently.

On a 2-node AMD machine with 384 vCPUs on each node, connected via a
Mellanox ConnectX-7, I see a ~30% performance reduction when scaling the
number of iperf3 client/server pairs from 32 to 64.

Though this new design reduced the time needed to detect consecutive
high-order freeing, for applications which allocate high-order pages
more frequently it may flush the high-order list prematurely.  This
motivates tuning how late or early we should flush high-order lists.

So, in this patch, we increase the pcp->free_count threshold to trigger
free_high from "batch" to "batch + pcp->high_min / 2", as suggested by
Ying [1].  In the original pcp->free_factor solution, free_high was
triggered for contiguous freeing with size ranging from "batch" to
"pcp->high + batch", so the average value is "batch + pcp->high / 2".
In the pcp->free_count solution, free_high is triggered for contiguous
freeing of size "batch".  So, to restore the original behavior, we use
the threshold "batch + pcp->high_min / 2".

This new threshold keeps high-order pages on the pcp list for a longer
duration, which can help applications doing frequent high-order
allocations.

With this patch, iperf3 performance is restored, and scores for the
other benchmarks on the same machine are as follows:

		      iperf3    lmbench3        netperf         kbuild
                               (AF_UNIX)   (SCTP_STREAM_MANY)
                     -------   ---------   -----------------    ------
v6.6  vanilla (base)    100          100              100          100
v6.12 vanilla            69          113             98.5         98.8
v6.12 + this patch      100        110.3            100.2         99.3


netperf-tcp:

                                  6.12                      6.12
                               vanilla    	      this_patch
Hmean     64         732.14 (   0.00%)         730.45 (  -0.23%)
Hmean     128       1417.46 (   0.00%)        1419.44 (   0.14%)
Hmean     256       2679.67 (   0.00%)        2676.45 (  -0.12%)
Hmean     1024      8328.52 (   0.00%)        8339.34 (   0.13%)
Hmean     2048     12716.98 (   0.00%)       12743.68 (   0.21%)
Hmean     3312     15787.79 (   0.00%)       15887.25 (   0.63%)
Hmean     4096     17311.91 (   0.00%)       17332.68 (   0.12%)
Hmean     8192     20310.73 (   0.00%)       20465.09 (   0.76%)

Link: https://lore.kernel.org/all/875xjmuiup.fsf@DESKTOP-5N7EMDA/ [1]
Link: https://lkml.kernel.org/r/20250407105219.55351-1-nikhil.dhama@amd.com
Fixes: 6ccdcb6d3a74 ("mm, pcp: reduce detecting time of consecutive high order page freeing")
Signed-off-by: Nikhil Dhama <nikhil.dhama@amd.com>
Suggested-by: Huang Ying <ying.huang@linux.alibaba.com>
Reviewed-by: Huang Ying <ying.huang@linux.alibaba.com>
Cc: Raghavendra K T <raghavendra.kt@amd.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Bharata B Rao <bharata@amd.com>


---
 v1: https://lore.kernel.org/linux-mm/20250107091724.35287-1-nikhil.dhama@amd.com/
 v2: https://lore.kernel.org/linux-mm/20250325171915.14384-1-nikhil.dhama@amd.com/

 mm/page_alloc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index b6958333054d..569dcf1f731f 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2617,7 +2617,7 @@ static void free_unref_page_commit(struct zone *zone, struct per_cpu_pages *pcp,
 	 * stops will be drained from vmstat refresh context.
 	 */
 	if (order && order <= PAGE_ALLOC_COSTLY_ORDER) {
-		free_high = (pcp->free_count >= batch &&
+		free_high = (pcp->free_count >= (batch + pcp->high_min / 2) &&
 			     (pcp->flags & PCPF_PREV_FREE_HIGH_ORDER) &&
 			     (!(pcp->flags & PCPF_FREE_HIGH_BATCH) ||
 			      pcp->count >= READ_ONCE(batch)));
-- 
2.25.1




* Re: [PATCH v3] mm: pcp: increase pcp->free_count threshold to trigger free_high
  2025-04-28  5:00         ` Nikhil Dhama
@ 2025-05-11  4:30           ` Andrew Morton
  2025-05-12  6:50             ` Nikhil Dhama
  0 siblings, 1 reply; 8+ messages in thread
From: Andrew Morton @ 2025-05-11  4:30 UTC (permalink / raw)
  To: Nikhil Dhama
  Cc: bharata, huang.ying.caritas, linux-kernel, linux-mm, lkp,
	mgorman, oe-lkp, raghavendra.kt, ying.huang

On Mon, 28 Apr 2025 10:30:47 +0530 Nikhil Dhama <nikhil.dhama@amd.com> wrote:

> > Nikhil's attentions are presumably elsewhere.  Could someone (Ying or
> > Raghavendra?) please send along altered changelog text which I can
> > paste in there?
> 
> Please find the updated changelog text as follows:

As far as I can tell, this replacement text is identical to that of the
original patch.




* Re: [PATCH v3] mm: pcp: increase pcp->free_count threshold to trigger free_high
  2025-05-11  4:30           ` Andrew Morton
@ 2025-05-12  6:50             ` Nikhil Dhama
  0 siblings, 0 replies; 8+ messages in thread
From: Nikhil Dhama @ 2025-05-12  6:50 UTC (permalink / raw)
  To: akpm
  Cc: bharata, huang.ying.caritas, linux-kernel, linux-mm, lkp,
	mgorman, nikhil.dhama, oe-lkp, raghavendra.kt, ying.huang

Hi Andrew, 

On 5/11/2025 10:00 AM, Andrew Morton wrote:
> On Mon, 28 Apr 2025 10:30:47 +0530 Nikhil Dhama <nikhil.dhama@amd.com> wrote:
>
>>> Nikhil's attentions are presumably elsewhere.  Could someone (Ying or
>>> Raghavendra?) please send along altered changelog text which I can
>>> paste in there?
>>
>> Please find the updated changelog text as follows:
>
> As far as I can tell, this replacement text is identical to that of the
> orginal patch.
>

As per Ying's suggestion following changes were made to changelog:

In para 1, 

> free_factor by 1 only when we try to reduce the size of pcp list or
> flush for high order, and free_high used to trigger only
> for order > 0 and order < costly_order and pcp->free_factor > 0.

removed "or flush for high order", updating it as 

> free_factor by 1 only when we try to reduce the size of pcp list and
> free_high used to trigger only for order > 0 and order < costly_order
> and pcp->free_factor > 0. 


In para 8, I added the idea suggested by Ying [1] behind changing the
threshold from "batch" to "batch + pcp->high_min / 2". Changed it from

> So, in this patch, we increased the pcp->free_count threshold to 
> trigger free_high from "batch" to "batch + pcp->high_min / 2". 

to 

> So, in this patch, we increased the pcp->free_count threshold to
> trigger free_high from "batch" to "batch + pcp->high_min / 2" as
> suggested by Ying [1], In the original pcp->free_factor solution,
> free_high is triggered for contiguous freeing with size ranging from
> "batch" to "pcp->high + batch".  So, the average value is "batch +
> pcp->high / 2".  While in the pcp->free_count solution, free_high will
> be triggered for contiguous freeing with size "batch".  So, to restore
> the original behavior, we can use the threshold "batch + pcp->high_min
> / 2"
[...]
> Link: https://lore.kernel.org/all/875xjmuiup.fsf@DESKTOP-5N7EMDA/ [1]

You seem to already have the updated copy in the mm tree from the email
where I forgot to add linux-mm.

Thanks,
Nikhil



end of thread, newest: ~2025-05-12  6:51 UTC

Thread overview: 8+ messages
2025-04-07 10:52 [PATCH v3] mm: pcp: increase pcp->free_count threshold to trigger free_high Nikhil Dhama
2025-04-11  2:16 ` Huang, Ying
2025-04-11  6:02   ` Raghavendra K T
2025-04-11  6:15     ` Huang, Ying
2025-04-26  2:11       ` Andrew Morton
2025-04-28  5:00         ` Nikhil Dhama
2025-05-11  4:30           ` Andrew Morton
2025-05-12  6:50             ` Nikhil Dhama
