From: Andrew Morton <akpm@linux-foundation.org>
To: Huang Ying <ying.huang@intel.com>
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
Christoph Lameter <cl@linux.com>,
Mel Gorman <mgorman@techsingularity.net>,
Vlastimil Babka <vbabka@suse.cz>,
Michal Hocko <mhocko@kernel.org>
Subject: Re: [PATCH -V2] mm: fix draining PCP of remote zone
Date: Mon, 9 Oct 2023 17:41:35 -0700 [thread overview]
Message-ID: <20231009174135.2357dcfcdc691a6ef61dbd9a@linux-foundation.org> (raw)
In-Reply-To: <20231007062356.187621-1-ying.huang@intel.com>
On Sat, 7 Oct 2023 14:23:56 +0800 Huang Ying <ying.huang@intel.com> wrote:
> If there is no memory allocation/freeing in the PCP (Per-CPU Pageset)
> of a remote zone (zone in remote NUMA node) after some time (3 seconds
> for now), the pages of the PCP of the remote zone will be drained to
> avoid memory wastage.
>
> This behavior was introduced in commit 4ae7c03943fc ("[PATCH]
> Periodically drain non local pagesets") and commit
> 4037d452202e ("Move remote node draining out of slab allocators").
>
> But after commit 7cc36bbddde5 ("vmstat: on-demand vmstat workers
> V8"), the vmstat updater worker, which is used to drain the PCP of
> remote zones, may not be re-queued while we are waiting for the
> timeout (pcp->expire != 0) if there are no vmstat changes on this
> CPU, for example when the CPU goes idle or runs only user-space
> workloads. This can keep the pages of a remote zone in the PCP of
> this CPU for a long time, so page reclaim in the remote zone may be
> triggered prematurely. This isn't a severe problem in practice,
> because the PCP of the remote zone will be drained once some memory
> is allocated or freed again on this CPU, and the PCP will eventually
> be drained during direct reclaim if necessary.
>
> Still, the problem deserves a fix: guarantee that the vmstat updater
> worker is always re-queued while we are waiting for the timeout. In
> effect, this restores the original behavior from before commit
> 7cc36bbddde5.
>
> We can reproduce the bug by allocating/freeing pages in a remote
> zone and then going idle, as follows. The patch fixes it.
>
> - Run some workloads, using `numactl` to bind the CPU to node 0 and
>   memory to node 1, so the PCP of the node 0 CPU for the node 1
>   zone is filled.
>
> - After workloads finish, idle for 60s
>
> - Check /proc/zoneinfo
>
> With the original kernel, the number of pages in the PCP of the CPU
> on node 0 for the zone on node 1 is non-zero after idling. With the
> patched kernel, it becomes 0. That is, we avoid keeping pages in
> the remote PCP while idle.
>
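The quoted reproduction steps can be sketched as a shell script. This is a hypothetical recipe, not part of the patch: it assumes a two-node NUMA machine with `numactl` installed, uses `dd` as a stand-in workload, and exits early on machines without `numactl`.

```shell
#!/bin/sh
set -e
# This repro only makes sense on a NUMA machine with numactl available.
if ! command -v numactl >/dev/null 2>&1; then
    echo "numactl not installed; skipping repro" >&2
    exit 0
fi
# Step 1: run a workload with the CPU bound to node 0 and memory bound
# to node 1, filling the node-0 CPU's PCP for node-1 zones.  The dd
# workload here is just an example allocator/freer.
numactl --cpunodebind=0 --membind=1 \
    dd if=/dev/zero of=/dev/null bs=1M count=1024
# Step 2: idle for 60 seconds, giving the vmstat updater worker time
# to (with the fix) drain the remote PCP.
sleep 60
# Step 3: check the per-CPU pageset counts for node-1 zones; with the
# patched kernel the counts should have dropped to 0.
grep -E 'Node 1|cpu:|count:' /proc/zoneinfo
```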
Thanks, I updated the changelog in place and queued this for mm-stable.