Re: [PATCH linux-mm v2] mm: make pcp_decay_high working better with NOHZ full

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

From: "Huang, Ying" <ying.huang@intel.com>
To: MengEn Sun <mengensun88@gmail.com>
Cc: akpm@linux-foundation.org,  alexjlzheng@tencent.com,
	linux-kernel@vger.kernel.org,  linux-mm@kvack.org,
	 mengensun@tencent.com
Subject: Re: [PATCH linux-mm v2] mm: make pcp_decay_high working better with NOHZ full
Date: Tue, 22 Oct 2024 14:36:56 +0800	[thread overview]
Message-ID: <87v7xk4p9z.fsf@yhuang6-desk2.ccr.corp.intel.com> (raw)
In-Reply-To: <1729574046-3392-1-git-send-email-mengensun@tencent.com> (MengEn Sun's message of "Tue, 22 Oct 2024 13:14:06 +0800")

MengEn Sun <mengensun88@gmail.com> writes:

> Thank you for your suggestion. I understand and am ready to make
> some changes  
>
>> 
>> Have verified the issue with some test?  If not, I suggest you to do
>> that.
>> 
>
> I have conducted tests:
> Applying this patch or not does not have a significant impact on the
> test results.

I don't expect some measurable performance difference with the patch.
If we can observe that the PCP size isn't tuned down to high_min before
and is after, that should be a valid test result to show the value of
the patch.  Can you try that?  The PCP size can be observed via
/proc/zoneinfo.

> perhaps my testing was not thorough enough. #^_^
>
> But, the logic of the code is like following:
> CPU0                              CPUx
> ----                              -----
>                                   T0: vmstat_work is pending
> T1: vmstat_shepherd
>     check vmstat_work
>     and do nothing
>                                   T2: vmstat_work is in unpending state.
>
>                                   T3: alloc many pages
>                                   T4: free all the pages allocated at T3
>                                   T5: entry NOHZ, flushing all zonestats
>                                       and nodestats
> T6: next vmstat_shepherd fired
>
> In my opinion, there are indeed some issues. I'm not sure if there's
> something I haven't understood?
>
>
> By the way,
> There are two other questions for me:
> Q1:
> Vmstat_work is a **deferreable work** So, It may be delayed for a long time
> by NOHZ. As a result, "vmstat_update() may not be executed once every
> second in the above scenario. Therefore, I'm not sure if using a deferrable
> work to reduce pcp->high is appropriate. In my tests, if I don't use
> deferrable work, it takes about a minute to reduce high to high_min, but
> using deferrable work may take several minutes to reduce high to high_min.

It's not a big issue to take longer time to decay pcp->high.

> Q2:
> On a big machine, for example, with 1TB of memory, the default maximum
> memory on PCP can be 1TB * 0.125.
> This portion of memory is not accounted for in MemFree in /proc/meminfo.
> Users can see this portion of memory from /proc/zoneinfo, but the memory
> reported by the `free` command is reduced.
> can we include the PCP memory in the MemFree statistic in /proc/meminfo?

This has been discussed before.

https://lore.kernel.org/linux-mm/20220816084426.135528-1-wangkefeng.wang@huawei.com/
https://lore.kernel.org/linux-mm/20240830014453.3070909-1-mawupeng1@huawei.com/

>> > While, This seems to be fine:
>> > - if freeing and allocating memory occur later, it may the
>> >   high_max may be adjust automatically
>> > - If memory is tight, the memory reclamation process will
>> >   release the pcp
>> 
>> This could be a real issue for me.
>
> Thanks, I will test more carefully for those issue
>
>> 
>> > Whatever, we make vmstat_shepherd to checking whether we need
>> > decay pcp high_max, and fire pcp_decay_high early if we need.
>> >
>> > Fixes: 51a755c56dc0 ("mm: tune PCP high automatically")
>> > Reviewed-by: Jinliang Zheng <alexjlzheng@tencent.com>
>> > Signed-off-by: MengEn Sun <mengensun@tencent.com>
>> > ---
>> > changelog:
>> > v1: https://lore.kernel.org/lkml/20241012154328.015f57635566485ad60712f3@linux-foundation.org/T/#t
>> > v2: Make the commit message clearer by adding some comments.
>> > ---
>> >  mm/vmstat.c | 9 +++++++++
>> >  1 file changed, 9 insertions(+)
>> >
>> > diff --git a/mm/vmstat.c b/mm/vmstat.c
>> > index 1917c034c045..07b494b06872 100644
>> > --- a/mm/vmstat.c
>> > +++ b/mm/vmstat.c
>> > @@ -2024,8 +2024,17 @@ static bool need_update(int cpu)
>> >  
>> >  	for_each_populated_zone(zone) {
>> >  		struct per_cpu_zonestat *pzstats = per_cpu_ptr(zone->per_cpu_zonestats, cpu);
>> > +		struct per_cpu_pages *pcp = per_cpu_ptr(zone->per_cpu_pageset, cpu);
>> >  		struct per_cpu_nodestat *n;
>> >  
>> > +		/* per_cpu_nodestats and per_cpu_zonestats maybe flush when cpu
>> > +		 * entering NOHZ full, see quiet_vmstat. so, we check pcp
>> > +		 * high_{min,max} to determine whether it is necessary to run
>> > +		 * decay_pcp_high on the corresponding CPU
>> > +		 */
>> 
>> Please follow the comments coding style.
>> 
>>                 /*
>>                  * comments line 1
>>                  * comments line 2
>>                  */
>> 
>
> Thank you for your suggestion. I understand and am ready to make
> some changes
>
>> > +		if (pcp->high_max > pcp->high_min)
>> > +			return true;
>> > +
>> 
>> We don't tune pcp->high_max/min in fact.  Instead, we tune pcp->high.
>> Your code may make need_update() return true in most cases.
>
> You are right, using high_max is incorrect. May i use pcp->high > pcp->high_min?
>
>> 
>> >  		/*
>> >  		 * The fast way of checking if there are any vmstat diffs.
>> >  		 */

--
Best Regards,
Huang, Ying

     prev parent reply	other threads:[~2024-10-22  6:40 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-10-18  7:57 mengensun88
2024-10-21  8:40 ` Huang, Ying
2024-10-22  5:14   ` MengEn Sun
2024-10-22  6:36     ` Huang, Ying [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87v7xk4p9z.fsf@yhuang6-desk2.ccr.corp.intel.com \
    --to=ying.huang@intel.com \
    --cc=akpm@linux-foundation.org \
    --cc=alexjlzheng@tencent.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mengensun88@gmail.com \
    --cc=mengensun@tencent.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox