From: Kefeng Wang <wangkefeng.wang@huawei.com>
To: Dmytro Maluka <dmaluka@chromium.org>, Michal Hocko <mhocko@suse.com>
Cc: Liu Shixin <liushixin2@huawei.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	huang ying <huang.ying.caritas@gmail.com>,
	Aaron Lu <aaron.lu@intel.com>,
	Dave Hansen <dave.hansen@intel.com>,
	Jesper Dangaard Brouer <brouer@redhat.com>,
	Vlastimil Babka <vbabka@suse.cz>, Kemi Wang <kemi.wang@intel.com>,
	<linux-kernel@vger.kernel.org>, <linux-mm@kvack.org>
Subject: Re: [PATCH -next v2] mm, proc: collect percpu free pages into the free pages
Date: Sat, 25 Nov 2023 10:22:10 +0800	[thread overview]
Message-ID: <2dbbd4e7-036f-4643-b05f-5967f4253ab8@huawei.com> (raw)
In-Reply-To: <ZWDjbrHx6XNzAtl_@google.com>



On 2023/11/25 1:54, Dmytro Maluka wrote:
> On Tue, Aug 23, 2022 at 03:37:52PM +0200, Michal Hocko wrote:
>> On Tue 23-08-22 20:46:43, Liu Shixin wrote:
>>> On 2022/8/23 15:50, Michal Hocko wrote:
>>>> On Mon 22-08-22 14:12:07, Andrew Morton wrote:
>>>>> On Mon, 22 Aug 2022 11:33:54 +0800 Liu Shixin <liushixin2@huawei.com> wrote:
>>>>>
>>>>>> The pages on the pcplists can be used, but are not counted into free or
>>>>>> available memory; the pcp free count is only shown by show_mem() for now.
>>>>>> Since commit d8a759b57035 ("mm, page_alloc: double zone's batchsize"),
>>>>>> there has been a significant decrease in the reported free memory. With a
>>>>>> large number of CPUs and zones, the number of pages on the percpu lists
>>>>>> can be very large, so it is better to let the user know the pcp count.
>>>>>>
>>>>>> Take a machine with 3 zones and 72 CPUs: before commit d8a759b57035, the
>>>>>> pcp lists could theoretically hold at most 162MB (3*72*768KB); after the
>>>>>> patch they can hold 324MB. In practice, 114MB has been observed in the
>>>>>> idle state after system startup (an increase of 80MB).
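
(Spelling that arithmetic out as a back-of-the-envelope sketch; the
768KB figure is the assumed per-zone, per-CPU pcp budget on that
machine, not something read from a live system:)

#include <stdio.h>

/* Worst case for memory parked on pcp lists: zones/cpus match the
 * machine described above, per_list_kb is the assumed pcp->high
 * budget per zone per CPU.
 */
int main(void)
{
	long zones = 3, cpus = 72, per_list_kb = 768;
	long before_mb = zones * cpus * per_list_kb / 1024;	/* 162MB */
	long after_mb = before_mb * 2;	/* batchsize doubled: 324MB */

	printf("max pcp: %ldMB -> %ldMB\n", before_mb, after_mb);
	return 0;
}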
>>>>>>
>>>>> Seems reasonable.
>>>> I have asked in the previous incarnation of the patch but haven't really
>>>> received any answer[1]. Is this a _real_ problem? The absolute amount of
>>>> memory could be perceived as a lot but is this really noticeable wrt
>>>> overall memory on those systems?
> 
> Let me provide some other numbers, from the desktop side. On a low-end
> chromebook with 4GB RAM and a dual-core CPU, after commit b92ca18e8ca5
> (mm/page_alloc: disassociate the pcp->high from pcp->batch) the max
> amount of PCP pages increased 56x: from 2.9MB (1.45MB per CPU) to
> 165MB (82.5MB per CPU).
> 
> On such a system, memory pressure conditions are not a rare occurrence,
> so several dozen MB make a lot of difference.

And with the "mm: PCP high auto-tuning" series merged in v6.7, the pcp
lists could grow even bigger than before.

> 
> (The reason it increased so much is that the PCP limit now corresponds
> to the low watermark, which is 165MB. And the low watermark, in turn, is
> so high because of khugepaged, which bumps min_free_kbytes up to 132MB
> regardless of the total amount of memory.)
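
(That 165MB figure is consistent with the default per-zone watermark
calculation, assuming the watermark_scale_factor term is the smaller
one on a 4GB machine:

	min = min_free_kbytes                                  = 132MB
	low = min + max(min/4, managed * watermark_scale_factor / 10000)
	   ~= min + min/4 = 132MB + 33MB                       = 165MB

so the PCP budget ends up tracking min_free_kbytes rather than the
total amount of RAM.)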
> 
>>> This may not be obvious when memory is sufficient. However, products
>>> monitor memory in order to plan its use, and the change has triggered warnings.
>>
>> Is it possible that the said monitor is oversensitive and looking at
>> the wrong numbers? Overall free memory doesn't really tell much TBH.
>> MemAvailable is a very rough estimation as well.
>>
>> In reality what really matters much more is whether the memory is
>> readily available when it is required, and neither MemFree nor MemAvailable
>> gives you that information in the general case.
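
(For reference, MemAvailable comes from si_mem_available(), which is
roughly the following heuristic; a paraphrase, not the verbatim code:

	/* free pages minus all watermark reserves */
	available = free_pages - totalreserve_pages;

	/* plus page cache, discounting the half deemed hard to reclaim */
	pagecache = active_file + inactive_file;
	available += pagecache - min(pagecache / 2, wmark_low);

	/* plus reclaimable slab etc., with the same discount */
	available += reclaimable - min(reclaimable / 2, wmark_low);

Note that per-cpu pages do not appear anywhere in it.)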
>>
>>> We have also considered using /proc/zoneinfo to calculate the total
>>> number of pages on the pcplists. However, we think it is more appropriate
>>> to add that total to the free and available pages. After all, these are
>>> also free pages.
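
(FWIW the zoneinfo route is easy to script. A minimal sketch, assuming
4KB pages and that "count:" lines only occur in the per-cpu pageset
blocks of /proc/zoneinfo:)

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Sum the per-cpu pcp "count:" fields from /proc/zoneinfo. */
int main(void)
{
	FILE *f = fopen("/proc/zoneinfo", "r");
	char line[256];
	long total = 0;

	if (!f) {
		perror("/proc/zoneinfo");
		return 1;
	}
	while (fgets(line, sizeof(line), f)) {
		char *p = strstr(line, "count:");
		if (p)	/* assumed unique to the pageset blocks */
			total += strtol(p + strlen("count:"), NULL, 10);
	}
	fclose(f);
	printf("pcp pages: %ld (%ld KB)\n", total, total * 4);
	return 0;
}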
>>
>> Those free pages are not generally available as explained. They are
>> available to a specific CPU, drained under memory pressure and other
>> events, but still there is no guarantee a specific process can harvest
>> that memory because the pcp caches are replenished all the time.
>> So in a sense it is semi-hidden memory.
> 
> I was intuitively assuming that per-CPU pages should always be available
> for allocation without resorting to paging out allocated pages (and thus
> that it should be uncontroversially a good idea to include per-CPU pages
> in MemFree, to make it more accurate).
> 
> But looking at the code in __alloc_pages() and around, I see you are
> right: we don't try draining other CPUs' PCP lists *before* resorting to
> direct reclaim, compaction etc.
> 
> BTW, why not? Shouldn't draining PCP lists be cheaper than pageout() in
> any case?

Same question here: could we drain the pcp lists before direct reclaim?
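
For context, today the slow path only drains after a direct reclaim
round has already failed to produce a page. Roughly, from
__alloc_pages_direct_reclaim() in mm/page_alloc.c (simplified, not the
verbatim code):

	/* direct reclaim has already run at this point */
retry:
	page = get_page_from_freelist(gfp_mask, order, alloc_flags, ac);

	/*
	 * If the allocation failed even after direct reclaim, pages
	 * may still be pinned on the per-cpu lists or in the
	 * high-order atomic reserves: shrink those and try once more.
	 */
	if (!page && !drained) {
		unreserve_highatomic_pageblock(ac, false);
		drain_all_pages(NULL);
		drained = true;
		goto retry;
	}

Draining up front would mean kicking the per-cpu drain workers on
every CPU for each slow-path allocation, which is presumably why it is
kept as a second-chance step instead.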

> 
>> That being said, I am still not convinced this is actually going to help
>> all that much. You will see slightly different numbers which do not
>> tell much one way or another, and if the sole reason for tweaking these
>> numbers is that some monitor is complaining because X became X-epsilon,
>> then this sounds like a weak justification to me. That epsilon happens
>> all the time because there are quite a few hidden caches that are
>> released under memory pressure. I am not sure it is maintainable to
>> consider each one of them and pretend that MemFree/MemAvailable is
>> somehow precise. It has never been and likely never will be.
>> -- 
>> Michal Hocko
>> SUSE Labs



Thread overview: 13+ messages
2022-08-22  2:33 [PATCH -next] " Liu Shixin
2022-08-22  3:33 ` [PATCH -next v2] " Liu Shixin
2022-08-22 21:12   ` Andrew Morton
2022-08-22 21:13     ` Andrew Morton
2022-08-23 13:12       ` Liu Shixin
2022-08-23  7:50     ` Michal Hocko
2022-08-23 12:46       ` Liu Shixin
2022-08-23 13:37         ` Michal Hocko
2022-08-24 10:05           ` Liu Shixin
2022-08-24 10:12             ` Michal Hocko
2023-11-24 17:54           ` Dmytro Maluka
2023-11-25  2:22             ` Kefeng Wang [this message]
2023-11-27  8:50             ` Michal Hocko
