From: Joshua Hahn <joshua.hahnjy@gmail.com>
To: "Harry Yoo (Oracle)" <harry@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>,
Andrew Morton <akpm@linux-foundation.org>,
Michal Hocko <mhocko@kernel.org>, Yosry Ahmed <yosry@kernel.org>,
Roman Gushchin <roman.gushchin@linux.dev>,
Shakeel Butt <shakeel.butt@linux.dev>,
Muchun Song <muchun.song@linux.dev>,
David Hildenbrand <david@kernel.org>,
Lorenzo Stoakes <ljs@kernel.org>,
Vlastimil Babka <vbabka@kernel.org>,
Dennis Zhou <dennis@kernel.org>, Tejun Heo <tj@kernel.org>,
Christoph Lameter <cl@gentwo.org>,
cgroups@vger.kernel.org, linux-mm@kvack.org,
linux-kernel@vger.kernel.org, kernel-team@meta.com
Subject: Re: [PATCH v2] mm/percpu, memcontrol: Per-memcg-lruvec percpu accounting
Date: Tue, 7 Apr 2026 20:40:24 -0700 [thread overview]
Message-ID: <20260408034025.3317937-1-joshua.hahnjy@gmail.com> (raw)
In-Reply-To: <adXAG52R6WVHd0n9@hyeyoo>
On Wed, 8 Apr 2026 11:40:27 +0900 "Harry Yoo (Oracle)" <harry@kernel.org> wrote:
> On Fri, Apr 03, 2026 at 08:38:43PM -0700, Joshua Hahn wrote:
> > enum memcg_stat_item includes memory that is tracked on a per-memcg
> > level, but not at a per-node (and per-lruvec) level. Diagnosing
> > memory pressure for memcgs in multi-NUMA systems can be difficult,
> > since not all of the memory accounted in memcg can be traced back
> > to a node. In scenarios where NUMA nodes in a memcg are asymmetrically
> > stressed, this difference can be invisible to the user.
> >
> > Convert MEMCG_PERCPU_B from a memcg_stat_item to a memcg_node_stat_item
> > to give visibility into per-node breakdowns for percpu allocations.
> >
> > This will get us closer to being able to know the memcg and physical
> > association of all memory on the system. Specifically for percpu, this
> > granularity will help demonstrate footprint differences on systems with
> > asymmetric NUMA nodes.
> >
> > Because percpu memory is accounted at a sub-PAGE_SIZE level, we must
> > account node level statistics (accounted in PAGE_SIZE units) and
> > memcg-lruvec statistics separately. Account node statistics when the pcpu
> > pages are allocated, and account memcg-lruvec statistics when pcpu
> > objects are handed out.
> >
> > To account these separately, expose mod_memcg_lruvec_state for use
> > outside of memcontrol.
> >
> > The memory overhead of this patch is small; it adds 16 bytes
> > per-cgroup-node-cpu. For an example machine with 200 CPUs split across
> > 2 nodes and 50 cgroups in the system, we see a 312.5 kB increase. Note
> > that this is the same cost as any other item in memcg_node_stat_item.
> >
> > Performance impact is also negligible. These are results from a kernel
> > module which performs 100k percpu allocations via __alloc_percpu_gfp
> > with GFP_KERNEL | __GFP_ACCOUNT in a cgroup, across 20 trials.
> > Batched performs 100k allocations followed by 100k frees, while
> > interleaved performs allocation --> free --> allocation ...
> >
> > +-------------+----------------+--------------+--------------+
> > | Test        | linus-upstream | patch        | diff         |
> > +-------------+----------------+--------------+--------------+
> > | Batched     | 6586 +/- 51    | 6595 +/- 35  | +9 (+0.13%)  |
> > | Interleaved | 1053 +/- 126   | 1085 +/- 113 | +32 (+0.85%) |
> > +-------------+----------------+--------------+--------------+
> >
> > One functional change is that there can be a tiny inconsistency between
> > the size of the allocation used for memcg limit checking and what is
> > charged to each lruvec due to dropping fractional charges when rounding.
> > In practice this difference is tiny and always errs on the side of
> > checking the memcg limit against a slightly larger value, so there is
> > no behavioral change visible to userspace.
> >
> > Signed-off-by: Joshua Hahn <joshua.hahnjy@gmail.com>
> > ---
> > include/linux/memcontrol.h | 4 +++-
> > include/linux/mmzone.h | 4 +++-
> > mm/memcontrol.c | 12 +++++-----
> > mm/percpu-vm.c | 14 ++++++++++--
> > mm/percpu.c | 45 ++++++++++++++++++++++++++++++++++----
> > mm/vmstat.c | 1 +
> > 6 files changed, 66 insertions(+), 14 deletions(-)
> >
> > diff --git a/mm/percpu-vm.c b/mm/percpu-vm.c
> > index 4f5937090590d..e36b639f521dd 100644
> > --- a/mm/percpu-vm.c
> > +++ b/mm/percpu-vm.c
> > @@ -65,6 +66,10 @@ static void pcpu_free_pages(struct pcpu_chunk *chunk,
> > __free_page(page);
> > }
> > }
> > +
> > + for_each_node(nid)
> > + mod_node_page_state(NODE_DATA(nid), NR_PERCPU_B,
> > + -1L * nr_pages * nr_cpus_node(nid) * PAGE_SIZE);
>
> Can this end up with mis-accounting due to CPU hotplug?
Hey Harry, thanks for giving this patch a look!
Yes, definitely. I think the solution is just to charge based on possible
CPUs, even if that might lead to some inaccuracy (by however many CPUs
aren't online at that moment). Seems like that's what already happens
in memcg anyways, so I think this discrepancy is OK to tolerate.
Will spin up a v3! Thanks a lot, Harry! Have a great day :-)
Joshua
Thread overview: 6+ messages
2026-04-04 3:38 Joshua Hahn
2026-04-04 4:56 ` Matthew Wilcox
2026-04-04 5:03 ` Joshua Hahn
2026-04-08 2:40 ` Harry Yoo (Oracle)
2026-04-08 3:40 ` Joshua Hahn [this message]
2026-04-08 3:52 ` Harry Yoo (Oracle)