From: MengEn Sun <mengensun88@gmail.com>
To: rientjes@google.com
Cc: akpm@linux-foundation.org, alexjlzheng@tencent.com,
linux-mm@kvack.org, mengensun88@gmail.com, mengensun@tencent.com
Subject: Re: [PATCH] mm/page_alloc: add cond_resched in __drain_all_pages()
Date: Wed, 8 Jan 2025 01:39:38 +0800
Message-ID: <1736271578-29364-1-git-send-email-mengensun@tencent.com>
In-Reply-To: <3b000941-b1b6-befa-4ec9-2bff63d557c1@google.com>
Hi, David
>
> >  		else
> >  			drain_pages(cpu);
> > +		cond_resched();
> >  	}
> >
> >  	mutex_unlock(&pcpu_drain_mutex);
>
> This is another example of a soft lockup that we haven't observed and we
> have systems with many more cores than 64.
It seems that the cause of this issue is not really the number of CPUs,
but rather the ratio of memory capacity to the number of CPUs, or the
total memory capacity itself.
For example, my machine has 64 CPUs and 256 GB of memory in a single
NUMA node. Under the current kernel, for a single zone, the amount of
memory that can sit in the PCP (per-CPU pages) lists across all CPUs is
roughly one-eighth of that zone's total memory.
So, in the worst case on my machine: the total PCP capacity for the
NORMAL zone is about 32 GB (one-eighth of the total), and with 64 CPUs
each CPU's PCP list can hold approximately 512 MB. With a 4 KB page
size, that is about 131072 (100K+) pages per CPU sitting in the PCP.
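Laid out as back-of-envelope arithmetic (the same numbers as above):

  256 GB total  / 8        = ~32 GB   PCP capacity across all CPUs
  32 GB         / 64 CPUs  = ~512 MB  PCP capacity per CPU
  512 MB        / 4 KB     = ~131072  (~128K) pages per CPU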
Although the PCP auto-tuning does start to shrink the PCP capacity when
memory is tight (for example, when free memory falls below the high
watermark or when the zone is under reclaim), that shrinking relies on
allocations/frees happening on that CPU, or on a delayed work, to
trigger it. Neither the delayed work nor the allocation/free activity is
very controllable, so a CPU's PCP list can stay near this worst-case
size for quite a while.
>
> Is this happening because of contention on pcp->lock or zone->lock? I
> would assume the latter, but best to confirm.
You are right; we are running memory stress tests, and zone->lock is
indeed the hotspot.
> I think this is just papering over a scalability problem with zone->lock.
> How many NUMA nodes and zones does this 223GB system have?
>
> If this is a problem with zone->lock, this problem should likely be
> addressed more holistically.
You are right; zone->lock can indeed become a hotspot on larger
machines, but I feel that solving it fundamentally is not easy. The PCP
feature already takes the approach of aggregating work and processing it
in batches. Another idea is to break the drain into smaller critical
sections, as in the sketch below, but I am not sure whether that
approach is feasible.
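For illustration only, here is a rough sketch of what I mean, modelled
loosely on drain_pages_zone() but with a hypothetical name; this is not
mainline code and it assumes a sleepable calling context. The idea is to
free at most pcp->batch pages per lock hold and allow rescheduling
between the chunks, so zone->lock is never held while 100K+ pages are
freed in one go:

/*
 * Hypothetical sketch: drain one CPU's PCP list for a zone in
 * pcp->batch sized chunks instead of emptying it under a single
 * lock hold.
 */
static void drain_pages_zone_chunked(unsigned int cpu, struct zone *zone)
{
	struct per_cpu_pages *pcp = per_cpu_ptr(zone->per_cpu_pageset, cpu);
	int count;

	do {
		spin_lock(&pcp->lock);
		count = min(pcp->count, pcp->batch);
		if (count)
			free_pcppages_bulk(zone, count, pcp, 0);
		spin_unlock(&pcp->lock);
		/* give other tasks a chance between chunks */
		cond_resched();
	} while (count);
}

Something along these lines would bound each zone->lock hold, but
whether it is acceptable for every caller of the drain path is exactly
the part I am not sure about.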
Best Regards