From: Raghavendra K T <raghavendra.kt@amd.com>
To: "Huang, Ying" <ying.huang@linux.alibaba.com>,
	Nikhil Dhama <nikhil.dhama@amd.com>
Cc: akpm@linux-foundation.org, bharata@amd.com,
	raghavendra.kodsarathimmappa@amd.com, oe-lkp@lists.linux.dev,
	lkp@intel.com, Huang Ying <huang.ying.caritas@gmail.com>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	Mel Gorman <mgorman@techsingularity.net>
Subject: Re: [PATCH v3] mm: pcp: increase pcp->free_count threshold to trigger free_high
Date: Fri, 11 Apr 2025 11:32:08 +0530
Message-ID: <c8b2a3c9-2252-4c0a-85a9-26fa6b519757@amd.com>
In-Reply-To: <87mscn8msp.fsf@DESKTOP-5N7EMDA>



On 4/11/2025 7:46 AM, Huang, Ying wrote:
> Hi, Nikhil,
> 
> Sorry for late reply.
> 
> Nikhil Dhama <nikhil.dhama@amd.com> writes:
> 
>> In old pcp design, pcp->free_factor gets incremented in nr_pcp_free()
>> which is invoked by free_pcppages_bulk(). So, it used to increase
>> free_factor by 1 only when we try to reduce the size of pcp list or
>> flush for high order, and free_high used to trigger only
>> for order > 0 and order < costly_order and pcp->free_factor > 0.
>>
>> For iperf3, I noticed that with the older design in kernel v6.6, the pcp
>> list was drained mostly when pcp->count > high (more often when count
>> goes above 530), and most of the time pcp->free_factor was 0, triggering
>> very few high order flushes.
>>
>> But this is changed in the current design, introduced in commit 6ccdcb6d3a74
>> ("mm, pcp: reduce detecting time of consecutive high order page freeing"),
>> where pcp->free_factor is changed to pcp->free_count to keep track of the
>> number of pages freed contiguously. In this design, pcp->free_count is
>> incremented on every deallocation, irrespective of whether pcp list was
>> reduced or not. free_high is now triggered if pcp->free_count goes
>> above batch (which is 63) and there are two consecutive page frees
>> without any allocation in between.
> 
> The design changes because pcp->high can become much higher than that
> before it.  This makes it much harder to trigger free_high, which causes
> some performance regressions too.
> 
>> With this design, for iperf3, the pcp list is flushed more frequently
>> because the free_high heuristic is triggered more often now. I observed
>> that the high order pcp list is drained as soon as both count and
>> free_count go above 63.
>>
>> Due to this more aggressive high order flushing, applications doing
>> contiguous high order allocations need to go to the global list more
>> frequently.
>>
>> On a 2-node AMD machine with 384 vCPUs on each node,
>> connected via Mellanox ConnectX-7, I am seeing a ~30% performance
>> reduction if we scale the number of iperf3 client/server pairs from
>> 32 to 64.
>>
>> Though this new design reduced the time to detect high order flushes,
>> for applications that allocate high order pages more frequently it
>> may be flushing the high order list prematurely.
>> This motivates tuning how late or early we should flush
>> high order lists.
>>
>> So, in this patch, we increased the pcp->free_count threshold to
>> trigger free_high from "batch" to "batch + pcp->high_min / 2".
>> This new threshold keeps high order pages in pcp list for a
>> longer duration which can help the application doing high order
>> allocations frequently.
> 
> IIUC, we restore the original behavior with "batch + pcp->high / 2" as
> in my analysis in
> 
> https://lore.kernel.org/all/875xjmuiup.fsf@DESKTOP-5N7EMDA/
> 
> If you think my analysis is correct, can you add that in patch
> description too?  This makes it easier for people to know why the code
> looks this way.
> 

Yes. This makes sense. Andrew has already included the patch in the mm tree.

Nikhil,

Could you please help with the updated write-up based on Ying's
suggestion, assuming it works for Andrew?

- Raghu
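
The threshold change described in the quoted patch text can be modelled in
a few lines. This is only a rough userspace sketch, not the code in
mm/page_alloc.c: struct pcp_model, old_threshold(), new_threshold() and
free_high() are hypothetical names, the ">=" comparison is an assumption,
and the high_min value of 200 in the usage note is purely illustrative.
Only batch = 63 and the "batch + pcp->high_min / 2" formula come from the
thread.

```c
#include <stdbool.h>

/* Hypothetical model; field names mirror those used in the thread. */
struct pcp_model {
	int free_count;            /* pages freed with no allocation in between */
	int high_min;              /* stands in for pcp->high_min */
	bool prev_free_high_order; /* previous operation was a high order free */
};

/* Threshold before the patch: just "batch". */
int old_threshold(int batch)
{
	return batch;
}

/* Threshold after the patch: "batch + pcp->high_min / 2". */
int new_threshold(int batch, const struct pcp_model *pcp)
{
	return batch + pcp->high_min / 2;
}

/*
 * Assumed trigger shape: free_count has reached the threshold and the
 * previous operation was also a high order free (no allocation between).
 */
bool free_high(const struct pcp_model *pcp, int threshold)
{
	return pcp->free_count >= threshold && pcp->prev_free_high_order;
}
```

With batch = 63 and an illustrative high_min of 200, the new threshold is
163, so a pcp list with free_count = 100 would be flushed under the old
threshold but kept under the new one.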




