linux-mm.kvack.org archive mirror
From: "Huang, Ying" <ying.huang@linux.alibaba.com>
To: Raghavendra K T <raghavendra.kt@amd.com>,
	Nikhil Dhama <nikhil.dhama@amd.com>
Cc: akpm@linux-foundation.org,  bharata@amd.com,
	raghavendra.kodsarathimmappa@amd.com,  oe-lkp@lists.linux.dev,
	lkp@intel.com,  Huang Ying <huang.ying.caritas@gmail.com>,
	linux-mm@kvack.org,  linux-kernel@vger.kernel.org,
	 Mel Gorman <mgorman@techsingularity.net>
Subject: Re: [PATCH v3] mm: pcp: increase pcp->free_count threshold to trigger free_high
Date: Fri, 11 Apr 2025 14:15:42 +0800	[thread overview]
Message-ID: <87mscn5ilt.fsf@DESKTOP-5N7EMDA> (raw)
In-Reply-To: <c8b2a3c9-2252-4c0a-85a9-26fa6b519757@amd.com> (Raghavendra K. T.'s message of "Fri, 11 Apr 2025 11:32:08 +0530")

Raghavendra K T <raghavendra.kt@amd.com> writes:

> On 4/11/2025 7:46 AM, Huang, Ying wrote:
>> Hi, Nikhil,
>> Sorry for late reply.
>> Nikhil Dhama <nikhil.dhama@amd.com> writes:
>> 
>>> In the old pcp design, pcp->free_factor was incremented in nr_pcp_free(),
>>> which is invoked by free_pcppages_bulk(). So free_factor increased by 1
>>> only when we tried to reduce the size of the pcp list or flush it for
>>> high order, and free_high used to trigger only
>>> for order > 0 and order < costly_order and pcp->free_factor > 0.
>>>
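
(As a rough illustration of that old rule, here is a minimal userspace
sketch reconstructed from the paragraph above; the names mirror the kernel,
but the condition is paraphrased from this description, not copied from the
v6.6 source.)

    #include <stdbool.h>

    #define PAGE_ALLOC_COSTLY_ORDER 3   /* kernel constant */

    /*
     * Old rule (sketch): free_factor was bumped only when the pcp list was
     * actually trimmed, so free_high fired rarely.
     */
    bool old_free_high(unsigned int order, int free_factor)
    {
            return order > 0 && order < PAGE_ALLOC_COSTLY_ORDER &&
                   free_factor > 0;
    }
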
>>> For iperf3 I noticed that with the older design in kernel v6.6, the pcp
>>> list was drained mostly when pcp->count > high (more often when count went
>>> above 530), and most of the time pcp->free_factor was 0, triggering very
>>> few high order flushes.
>>>
>>> But this changed in the current design, introduced in commit 6ccdcb6d3a74
>>> ("mm, pcp: reduce detecting time of consecutive high order page freeing"),
>>> where pcp->free_factor was changed to pcp->free_count to keep track of the
>>> number of pages freed contiguously. In this design, pcp->free_count is
>>> incremented on every deallocation, irrespective of whether the pcp list
>>> was reduced or not. The logic to trigger free_high is now: pcp->free_count
>>> goes above batch (which is 63) and there are two contiguous page frees
>>> without any allocation in between.
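
(Again only as a sketch reconstructed from this description, not from the
source: after commit 6ccdcb6d3a74 the trigger looks roughly like the
following, with the "no allocation in between" condition reduced to a plain
bool here.)

    #include <stdbool.h>

    /*
     * New rule (sketch): free_count grows on every free, so once more than
     * "batch" (typically 63) pages have been freed back to back, the next
     * eligible high-order free drains the pcp list.
     */
    bool new_free_high(unsigned int order, int free_count, int batch,
                       bool consecutive_frees)
    {
            return order > 0 && consecutive_frees && free_count > batch;
    }
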
>> The design changed because pcp->high can become much higher than it
>> could before.  With the old trigger, this made it much harder to trigger
>> free_high, which caused some performance regressions too.
>> 
>>> With this design, for iperf3, the pcp list is flushed more frequently
>>> because the free_high heuristic is triggered more often now. I observed
>>> that the high order pcp list is drained as soon as both count and
>>> free_count go above 63.
>>>
>>> Due to this more aggressive high order flushing, applications
>>> doing contiguous high order allocations will need to go to the global
>>> list more frequently.
>>>
>>> On a 2-node AMD machine with 384 vCPUs on each node,
>>> connected via Mellanox ConnectX-7, I am seeing a ~30% performance
>>> reduction when scaling the number of iperf3 client/server pairs from 32 to 64.
>>>
>>> Though this new design reduced the time to detect consecutive high order
>>> page freeing, for applications which allocate high order pages more
>>> frequently it may flush the high order list prematurely.
>>> This motivates tuning how late or early we should flush
>>> high order lists.
>>>
>>> So, in this patch, we increased the pcp->free_count threshold to
>>> trigger free_high from "batch" to "batch + pcp->high_min / 2".
>>> This new threshold keeps high order pages in the pcp list for a
>>> longer duration, which can help applications doing high order
>>> allocations frequently.
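
(Illustrative sketch only, not the actual diff: the patch effectively raises
the comparison point in the condition sketched earlier.)

    #include <stdbool.h>

    /*
     * Patched rule (sketch): the free_count threshold moves from "batch" to
     * "batch + high_min / 2", so high-order pages stay on the pcp list
     * longer before a free_high drain is triggered.
     */
    bool patched_free_high(unsigned int order, int free_count, int batch,
                           int high_min, bool consecutive_frees)
    {
            return order > 0 && consecutive_frees &&
                   free_count > batch + high_min / 2;
    }
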
>> IIUC, we restore the original behavior with "batch + pcp->high / 2",
>> as in my analysis in
>> https://lore.kernel.org/all/875xjmuiup.fsf@DESKTOP-5N7EMDA/
>> If you think my analysis is correct, can you add that to the patch
>> description too?  This makes it easier for people to know why the code
>> looks this way.
>> 
>
> Yes, this makes sense. Andrew has already included the patch in the mm tree.
>
> Nikhil,
>
> Could you please help with the updated write-up based on Ying's
> suggestion, assuming it works for Andrew?

Thanks!

Just send an updated version; Andrew will update the patch in the mm tree
unless it has already been merged into mm-stable.

---
Best Regards,
Huang, Ying



Thread overview: 8 messages
2025-04-07 10:52 Nikhil Dhama
2025-04-11  2:16 ` Huang, Ying
2025-04-11  6:02   ` Raghavendra K T
2025-04-11  6:15     ` Huang, Ying [this message]
2025-04-26  2:11       ` Andrew Morton
2025-04-28  5:00         ` Nikhil Dhama
2025-05-11  4:30           ` Andrew Morton
2025-05-12  6:50             ` Nikhil Dhama
