linux-mm.kvack.org archive mirror
From: Joshua Hahn <joshua.hahnjy@gmail.com>
To: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>,
	Roman Gushchin <roman.gushchin@linux.dev>,
	Shakeel Butt <shakeel.butt@linux.dev>,
	Muchun Song <muchun.song@linux.dev>,
	Andrew Morton <akpm@linux-foundation.org>,
	David Hildenbrand <david@kernel.org>,
	Lorenzo Stoakes <ljs@kernel.org>,
	"Liam R . Howlett" <liam.howlett@oracle.com>,
	Vlastimil Babka <vbabka@kernel.org>,
	Mike Rapoport <rppt@kernel.org>,
	Suren Baghdasaryan <surenb@google.com>,
	cgroups@vger.kernel.org, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org, kernel-team@meta.com
Subject: [PATCH 0/8 RFC] mm/memcontrol, page_counter: move stock from mem_cgroup to page_counter
Date: Fri, 10 Apr 2026 14:06:54 -0700
Message-ID: <20260410210742.550489-1-joshua.hahnjy@gmail.com>

Memcg currently keeps a per-cpu "stock" of 64 pre-charged pages,
allowing small allocations and frees to avoid the expensive mem_cgroup
hierarchy walk on each charge. This design gives charge/uncharge a
fastpath, but has several limitations:

1. Each CPU can track up to 7 (NR_MEMCG_STOCK) mem_cgroups. When more
   than 7 mem_cgroups are actively charging on a single CPU, a random
   victim is evicted, and its associated stock is drained, which
   triggers unnecessary hierarchy walks.

   Note that previously there used to be a 1-1 mapping between CPU and
   memcg stock; it was bumped up to 7 in f735eebe55f8f ("multi-memcg
   percpu charge cache") because it was observed that stock would
   frequently get flushed and refilled.

2. Stock management is tightly coupled to struct mem_cgroup, which
   makes it difficult to add a new page_counter to struct mem_cgroup
   with its own stock management, since each stock operation has to
   be duplicated for it.

3. Each stock slot requires a css reference, and every stock operation
   has to traverse the slots to find the one that belongs to the
   memcg we are trying to consume stock for.
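
To make limitations 1 and 3 concrete, here is a small userspace toy
model of a slot-based per-cpu stock. It is not the kernel code: the
toy_* names and struct layouts are made up for illustration, and only
the slot count (7) and the random-victim eviction come from the
description above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define NR_SLOTS 7	/* mirrors NR_MEMCG_STOCK from the text */

/* Hypothetical stand-in for a memcg; drained_pages counts evicted stock. */
struct toy_memcg {
	long drained_pages;
};

/* Hypothetical stand-in for one CPU's stock: 7 (memcg, nr_pages) slots. */
struct toy_cpu_stock {
	struct toy_memcg *cached[NR_SLOTS];
	unsigned int nr_pages[NR_SLOTS];
};

/* Fast path: every consume has to scan the slots for a matching memcg. */
static int toy_consume(struct toy_cpu_stock *s, struct toy_memcg *m,
		       unsigned int nr)
{
	assert(nr > 0);
	for (int i = 0; i < NR_SLOTS; i++) {
		if (s->cached[i] == m && s->nr_pages[i] >= nr) {
			s->nr_pages[i] -= nr;
			return 1;
		}
	}
	return 0;
}

/* Draining a slot hands its pages back (a hierarchy walk in the kernel). */
static void toy_drain_slot(struct toy_cpu_stock *s, int i)
{
	if (s->cached[i])
		s->cached[i]->drained_pages += s->nr_pages[i];
	s->cached[i] = NULL;
	s->nr_pages[i] = 0;
}

/* Refill: reuse a matching or empty slot, else evict a random victim. */
static void toy_refill(struct toy_cpu_stock *s, struct toy_memcg *m,
		       unsigned int nr)
{
	int empty = -1;

	for (int i = 0; i < NR_SLOTS; i++) {
		if (s->cached[i] == m) {
			s->nr_pages[i] += nr;
			return;
		}
		if (!s->cached[i] && empty < 0)
			empty = i;
	}
	if (empty < 0) {
		empty = rand() % NR_SLOTS;
		toy_drain_slot(s, empty);
	}
	s->cached[empty] = m;
	s->nr_pages[empty] = nr;
}
```

With eight toy memcgs refilling on the same toy CPU, the eighth refill
always drains exactly one of the first seven, even though none of them
was under any pressure.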

This series moves the per-cpu stock down into the page_counter, which
consolidates stock limit checking and page_counter limit checking into
page_counter_try_charge. This eliminates the 7-memcg-per-cpu slot
limit, the random evictions (drain & refill), slot traversal, and
css refcounting.
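
As a rough illustration of the idea (not the code in this series),
the following userspace toy model keeps the stock inside the counter
itself, so a charge checks the stock first and only walks the
hierarchy on a miss. All toy_* names, the single "stock" field
standing in for one CPU's stock, and the fall-back-to-exact-size
policy are assumptions made for this sketch.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_BATCH 64	/* stock size quoted in the text */

/* Hypothetical counter; "stock" stands in for one CPU's pre-charged pages. */
struct toy_counter {
	long usage;
	long max;
	struct toy_counter *parent;
	long stock;
	long walks;	/* how often we had to walk the hierarchy */
};

/* Slow path: charge nr pages to c and every ancestor, or undo and fail. */
static int toy_charge_hierarchy(struct toy_counter *c, long nr)
{
	struct toy_counter *p, *q;

	for (p = c; p; p = p->parent) {
		if (p->usage + nr > p->max)
			goto undo;
		p->usage += nr;
	}
	return 1;
undo:
	for (q = c; q != p; q = q->parent)
		q->usage -= nr;
	return 0;
}

/* Stock check and limit check live in the same function. */
static int toy_try_charge(struct toy_counter *c, long nr)
{
	long batch = nr > TOY_BATCH ? nr : TOY_BATCH;

	assert(nr > 0);
	if (c->stock >= nr) {		/* fast path: no hierarchy walk */
		c->stock -= nr;
		return 1;
	}
	c->walks++;
	if (toy_charge_hierarchy(c, batch)) {
		c->stock += batch - nr;	/* keep the surplus as stock */
		return 1;
	}
	/* Assumed policy: retry with the exact size, leaving no stock. */
	return toy_charge_hierarchy(c, nr);
}
```

In this model, 64 single-page charges against a fresh counter cost
one hierarchy walk, and there is no slot array to search and no
per-slot reference to hold.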

In addition, it makes independent stock management scalable for future
users. As a demonstration, this series also introduces independent
stock management for the cgroup v1 memsw page_counter, which curbs
the likelihood of the worst-case scenario (traversing both the
memsw and memory page_counter hierarchies).

One change to note is that draining now uses work_on_cpu() for a
synchronous remote CPU drain. This eliminates the need for
backpointers and embedded work_structs in the per-cpu stock struct,
which minimizes memory overhead. I moved away from the existing async
drain scheduling because draining is much rarer now: it only happens
under memory pressure and on cgroup death (as opposed to the previous
arbitrary scenario where more than 7 memcgs are charging on a CPU).
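
The shape of a synchronous drain can be sketched in userspace as
follows. This is only a model: toy_work_on_cpu() is a stand-in that
calls the function directly, whereas the kernel's work_on_cpu(cpu,
fn, arg) runs fn on the given CPU and waits for its long return
value. The structs and the other toy_* names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_NR_CPUS 4

/* Hypothetical counter with one stock value per (modelled) CPU. */
struct toy_counter {
	long usage;
	struct toy_counter *parent;
	long stock[TOY_NR_CPUS];
};

/* The argument lives on the caller's stack; no work_struct is embedded. */
struct toy_drain_arg {
	struct toy_counter *counter;
	int cpu;
};

/* Same callback shape that work_on_cpu() takes: long (*)(void *). */
static long toy_drain_local(void *data)
{
	struct toy_drain_arg *arg = data;
	struct toy_counter *c = arg->counter;
	long nr;

	assert(arg->cpu >= 0 && arg->cpu < TOY_NR_CPUS);
	nr = c->stock[arg->cpu];
	c->stock[arg->cpu] = 0;
	/* Return the pre-charged pages to the counter and its ancestors. */
	for (struct toy_counter *p = c; p; p = p->parent)
		p->usage -= nr;
	return nr;
}

/* Userspace stand-in: runs fn synchronously on the calling thread. */
static long toy_work_on_cpu(int cpu, long (*fn)(void *), void *arg)
{
	(void)cpu;
	return fn(arg);
}

/* Drain every CPU's stock; each call returns only once that CPU is done. */
static long toy_drain_all(struct toy_counter *c)
{
	long total = 0;

	for (int cpu = 0; cpu < TOY_NR_CPUS; cpu++) {
		struct toy_drain_arg arg = { .counter = c, .cpu = cpu };

		total += toy_work_on_cpu(cpu, toy_drain_local, &arg);
	}
	return total;
}
```

Because the caller waits for each drain to finish, the per-cpu stock
needs neither a pointer back to its owner nor its own work item in
this model.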

Performance testing across single-cgroup, 4-cgroup (under the 7-memcg
limit), and 32-cgroup scenarios on a 40-CPU, 50G memory system shows
negligible performance differences. In the tests, I repeatedly fault
in anonymous pages and release them with madvise(MADV_DONTNEED) to
stress the charge/uncharge path, across 30 trials of 50 iterations.
The metric is the time taken by each iteration (ms).

+----------+-------------+------------+--------+-----------+
| #cgroups | before (ms) | after (ms) | stddev | delta (%) |
+----------+-------------+------------+--------+-----------+
|        1 |         446 |        441 |  5.097 |    -1.195 |
|        4 |        1832 |       1822 | 11.897 |    -0.582 |
|       32 |       14730 |      14739 | 54.089 |     0.061 |
+----------+-------------+------------+--------+-----------+
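
For anyone who wants to try a similar workload, a fault-and-release
loop of the kind described above can be approximated like this. It
is a minimal sketch, not the actual test program behind the numbers:
the function name, the region size, and the timing method are
assumptions, and running one instance per cgroup is left to the
caller.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <time.h>
#include <unistd.h>

/*
 * Fault in an anonymous region one page at a time, then release it
 * with MADV_DONTNEED, for the given number of iterations.
 * Returns the elapsed time in milliseconds, or -1.0 on error.
 */
static double toy_fault_release_ms(size_t bytes, int iterations)
{
	long page = sysconf(_SC_PAGESIZE);
	struct timespec t0, t1;
	char *buf;

	assert(page > 0);
	buf = mmap(NULL, bytes, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (buf == MAP_FAILED)
		return -1.0;

	clock_gettime(CLOCK_MONOTONIC, &t0);
	for (int i = 0; i < iterations; i++) {
		/* Touching each page faults it in and charges it. */
		for (size_t off = 0; off < bytes; off += (size_t)page)
			buf[off] = 1;
		/* Dropping the pages uncharges them again. */
		if (madvise(buf, bytes, MADV_DONTNEED) != 0) {
			munmap(buf, bytes);
			return -1.0;
		}
	}
	clock_gettime(CLOCK_MONOTONIC, &t1);

	munmap(buf, bytes);
	return (t1.tv_sec - t0.tv_sec) * 1e3 +
	       (t1.tv_nsec - t0.tv_nsec) / 1e6;
}
```

Calling toy_fault_release_ms() from a process placed in each cgroup
under test, and repeating the call per trial, gives a timing that can
be compared across kernels.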

Signed-off-by: Joshua Hahn <joshua.hahnjy@gmail.com>

Joshua Hahn (8):
  mm/page_counter: introduce per-page_counter stock
  mm/page_counter: use page_counter_stock in page_counter_try_charge
  mm/page_counter: use page_counter_stock in page_counter_uncharge
  mm/page_counter: introduce stock drain APIs
  mm/memcontrol: convert memcg to use page_counter_stock
  mm/memcontrol: optimize memsw stock for cgroup v1
  mm/memcontrol: optimize stock usage for cgroup v2
  mm/memcontrol: remove unused memcg_stock code

 include/linux/page_counter.h |  15 ++
 mm/memcontrol.c              | 269 ++++++-----------------------------
 mm/page_counter.c            | 173 +++++++++++++++++++++-
 3 files changed, 224 insertions(+), 233 deletions(-)

-- 
2.52.0



