From: Joshua Hahn <joshua.hahnjy@gmail.com>
To: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>,
Roman Gushchin <roman.gushchin@linux.dev>,
Shakeel Butt <shakeel.butt@linux.dev>,
Muchun Song <muchun.song@linux.dev>,
Andrew Morton <akpm@linux-foundation.org>,
David Hildenbrand <david@kernel.org>,
Lorenzo Stoakes <ljs@kernel.org>,
"Liam R . Howlett" <liam.howlett@oracle.com>,
Vlastimil Babka <vbabka@kernel.org>,
Mike Rapoport <rppt@kernel.org>,
Suren Baghdasaryan <surenb@google.com>,
cgroups@vger.kernel.org, linux-mm@kvack.org,
linux-kernel@vger.kernel.org, kernel-team@meta.com
Subject: [PATCH 0/8 RFC] mm/memcontrol, page_counter: move stock from mem_cgroup to page_counter
Date: Fri, 10 Apr 2026 14:06:54 -0700
Message-ID: <20260410210742.550489-1-joshua.hahnjy@gmail.com>
Memcg currently keeps a per-cpu "stock" of 64 pages to cache pre-charged
allocations, allowing small allocations and frees to avoid an expensive
walk of the mem_cgroup hierarchy on each charge. This design introduces
a fastpath for charge/uncharge, but has several limitations:
1. Each CPU can track up to 7 (NR_MEMCG_STOCK) mem_cgroups. When more
than 7 mem_cgroups are actively charging on a single CPU, a random
victim is evicted, and its associated stock is drained, which
triggers unnecessary hierarchy walks.
Note that there used to be a 1-1 mapping between CPU and memcg
stock; it was bumped up to 7 in f735eebe55f8f ("multi-memcg
percpu charge cache") because it was observed that stock would
frequently get flushed and refilled.
2. Stock management is tightly coupled to struct mem_cgroup, which
makes it difficult to add a new page_counter to struct mem_cgroup
and do its own stock management, since each operation has to be
duplicated.
3. Each stock slot requires a css reference, and every stock operation
pays the overhead of traversing the slots to find the one belonging
to the memcg we are trying to consume stock for.
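To make the three limitations above concrete, here is a minimal userspace
sketch of a slot-based per-cpu cache. It is a model, not the memcontrol.c
code: struct group, struct cpu_stock, charge() and drain_slot() are
hypothetical names, only one CPU is modelled, and css refcounting is left
out. It only illustrates the slot scan on every charge and the random
eviction once more than 7 groups charge on the same CPU.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define NR_SLOTS 7   /* NR_MEMCG_STOCK in the text above */
#define BATCH    64  /* pages pre-charged per refill */

/* Hypothetical stand-in for a memcg; counts slow-path hierarchy walks. */
struct group {
	unsigned long hierarchy_walks;
};

/* Hypothetical model of one CPU's stock: 7 slots, one group per slot. */
struct cpu_stock {
	struct group *cached[NR_SLOTS];
	unsigned int nr_pages[NR_SLOTS];
};

/* Return a slot's cached pages to its group; that costs one walk. */
static void drain_slot(struct cpu_stock *s, int i)
{
	if (s->cached[i] && s->nr_pages[i])
		s->cached[i]->hierarchy_walks++;
	s->cached[i] = NULL;
	s->nr_pages[i] = 0;
}

/* Charge nr pages to g. Returns 1 for the fast path, 0 for the slow path. */
static int charge(struct cpu_stock *s, struct group *g, unsigned int nr)
{
	int i, slot = -1, empty = -1;

	assert(nr <= BATCH);

	/* Slot traversal: every charge scans for this group's slot. */
	for (i = 0; i < NR_SLOTS; i++) {
		if (s->cached[i] == g)
			slot = i;
		else if (!s->cached[i] && empty < 0)
			empty = i;
	}

	if (slot >= 0 && s->nr_pages[slot] >= nr) {
		s->nr_pages[slot] -= nr;
		return 1;
	}

	/* Slow path: charge a whole batch against the hierarchy. */
	g->hierarchy_walks++;

	if (slot < 0)
		slot = empty;
	if (slot < 0) {
		/* All slots busy: evict a random victim and drain it. */
		slot = rand() % NR_SLOTS;
		drain_slot(s, slot);
	}
	s->cached[slot] = g;
	s->nr_pages[slot] += BATCH - nr;
	return 0;
}
```

In this model, an eighth group charging on the CPU costs two walks: its own
refill plus the drain of whichever victim it evicted.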
This series moves the per-cpu stock down into the page_counter, which
consolidates stock limit checking and page_counter limit checking into
page_counter_try_charge. This eliminates the 7-memcg-per-cpu slot
limit, the random evictions (drain & refill), slot traversal, and
css refcounting.
In addition, it makes independent stock management scalable for future
users. As a demonstration, this series also introduces independent
stock management for the cgroup v1 memsw page_counter, which curbs
the likelihood of the worst-case scenario (traversing both the
memsw and memory page_counter hierarchies).
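As a rough userspace illustration of what moving the stock into the counter
means, the sketch below gives each counter its own per-cpu stock and checks
it at the top of the charge function. This is not the code from the series:
struct counter, try_charge(), charge_hierarchy() and drain_cpu() are
hypothetical names, NR_CPUS is fixed at 4 for the example, and locking and
atomics are omitted entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_CPUS 4   /* hypothetical, fixed for the sketch */
#define BATCH   64  /* pages pre-charged per refill */

/* Hypothetical model of a counter that owns its per-cpu stock. */
struct counter {
	unsigned long usage;          /* includes pages sitting in stock */
	unsigned long max;
	struct counter *parent;
	unsigned int stock[NR_CPUS];  /* pre-charged pages, per CPU */
	unsigned long walks;          /* slow-path hierarchy walks */
};

/* Charge nr pages to c and all ancestors, undoing on a limit hit. */
static bool charge_hierarchy(struct counter *c, unsigned long nr)
{
	struct counter *p, *fail;

	for (p = c; p; p = p->parent) {
		if (p->usage + nr > p->max) {
			fail = p;
			goto undo;
		}
		p->usage += nr;
	}
	return true;
undo:
	for (p = c; p != fail; p = p->parent)
		p->usage -= nr;
	return false;
}

/* Stock check and limit check live in the same function. */
static bool try_charge(struct counter *c, int cpu, unsigned long nr)
{
	if (c->stock[cpu] >= nr) {
		c->stock[cpu] -= nr;  /* fast path: no walk, no slot lookup */
		return true;
	}

	c->walks++;
	if (nr <= BATCH && charge_hierarchy(c, BATCH)) {
		c->stock[cpu] += BATCH - nr;
		return true;
	}
	/* Not enough room for a full batch: charge exactly nr. */
	return charge_hierarchy(c, nr);
}

/* Return one CPU's cached pages to the hierarchy. */
static void drain_cpu(struct counter *c, int cpu)
{
	struct counter *p;
	unsigned int nr = c->stock[cpu];

	c->stock[cpu] = 0;
	for (p = c; p; p = p->parent)
		p->usage -= nr;
}
```

Because the stock is indexed by counter rather than by a fixed set of
per-cpu slots, the model has no slot limit and nothing to evict; a second
counter (for example one standing in for memsw) would simply carry its own
stock array.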
One change that should be noted is that draining is simplified to use
work_on_cpu() for synchronous remote CPU drain. This eliminates the
need for backpointers and embedded work_structs in the per-cpu stock
struct, which minimizes memory overhead. This replaces the existing
async drain scheduling because drains are now much rarer: they only
happen under memory pressure and on cgroup death (as opposed to the
previous arbitrary scenario where more than 7 memcgs are charging on
a CPU).
Performance testing of single-cgroup, 4-cgroup (under the 7-memcg
limit), and 32-cgroup scenarios on a 40-CPU system with 50G of memory
shows negligible performance differences. In the tests, I repeatedly
fault and release anonymous pages using madvise(MADV_DONTNEED) to
stress the charge/uncharge path, across 30 trials of 50 iterations.
The metric is the time taken by each iteration (ms).
+----------+--------+-------+--------+-----------+
| #cgroups | before | after | stddev | delta (%) |
+----------+--------+-------+--------+-----------+
|        1 |    446 |   441 |  5.097 |    -1.195 |
|        4 |   1832 |  1822 | 11.897 |    -0.582 |
|       32 |  14730 | 14739 | 54.089 |     0.061 |
+----------+--------+-------+--------+-----------+
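For readers who want to try something similar, the fault/release loop
described above could look roughly like the sketch below. It is not the
actual test program: fault_and_release() is a hypothetical helper, the
buffer size is left to the caller, and placing the process into a cgroup
and running multiple cgroups in parallel are not shown.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <time.h>
#include <unistd.h>

/*
 * Fault in len bytes of anonymous memory and release them again,
 * iterations times. Returns the elapsed time in ms, or -1.0 on error.
 */
static double fault_and_release(size_t len, int iterations)
{
	long page = sysconf(_SC_PAGESIZE);
	struct timespec t0, t1;
	size_t off;
	char *buf;
	int it;

	if (page <= 0)
		return -1.0;

	buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (buf == MAP_FAILED)
		return -1.0;

	clock_gettime(CLOCK_MONOTONIC, &t0);
	for (it = 0; it < iterations; it++) {
		/* Touch one byte per page so each page is faulted in. */
		for (off = 0; off < len; off += (size_t)page)
			((volatile char *)buf)[off] = 1;

		/* Drop the pages again, which uncharges them. */
		if (madvise(buf, len, MADV_DONTNEED)) {
			munmap(buf, len);
			return -1.0;
		}
	}
	clock_gettime(CLOCK_MONOTONIC, &t1);

	munmap(buf, len);
	return (t1.tv_sec - t0.tv_sec) * 1e3 +
	       (t1.tv_nsec - t0.tv_nsec) / 1e6;
}
```

A call such as fault_and_release(1UL << 30, 50) would correspond to one
trial of 50 iterations; the 1G buffer size is an assumption, not a value
taken from the tests above.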
Signed-off-by: Joshua Hahn <joshua.hahnjy@gmail.com>
Joshua Hahn (8):
mm/page_counter: introduce per-page_counter stock
mm/page_counter: use page_counter_stock in page_counter_try_charge
mm/page_counter: use page_counter_stock in page_counter_uncharge
mm/page_counter: introduce stock drain APIs
mm/memcontrol: convert memcg to use page_counter_stock
mm/memcontrol: optimize memsw stock for cgroup v1
mm/memcontrol: optimize stock usage for cgroup v2
mm/memcontrol: remove unused memcg_stock code
include/linux/page_counter.h | 15 ++
mm/memcontrol.c | 269 ++++++-----------------------------
mm/page_counter.c | 173 +++++++++++++++++++++-
3 files changed, 224 insertions(+), 233 deletions(-)
--
2.52.0