From: Joshua Hahn <joshua.hahnjy@gmail.com>
To: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>,
Roman Gushchin <roman.gushchin@linux.dev>,
Shakeel Butt <shakeel.butt@linux.dev>,
Muchun Song <muchun.song@linux.dev>,
Andrew Morton <akpm@linux-foundation.org>,
David Hildenbrand <david@kernel.org>,
Lorenzo Stoakes <ljs@kernel.org>,
"Liam R . Howlett" <liam.howlett@oracle.com>,
Vlastimil Babka <vbabka@kernel.org>,
Mike Rapoport <rppt@kernel.org>,
Suren Baghdasaryan <surenb@google.com>,
cgroups@vger.kernel.org, linux-mm@kvack.org,
linux-kernel@vger.kernel.org, kernel-team@meta.com
Subject: [PATCH 1/8 RFC] mm/page_counter: introduce per-page_counter stock
Date: Fri, 10 Apr 2026 14:06:55 -0700
Message-ID: <20260410210742.550489-2-joshua.hahnjy@gmail.com>
In-Reply-To: <20260410210742.550489-1-joshua.hahnjy@gmail.com>
In order to avoid expensive hierarchy walks on every memcg charge and
limit check, memcontrol uses per-CPU stocks (memcg_stock_pcp) to cache
pre-charged pages, providing a fast path in try_charge_memcg().
However, the current implementation has a few quirks that could be
improved upon.
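For context, the fast-path idea behind these stocks can be sketched as
follows. This is a simplified userspace model, not the kernel code; the
struct and function names here are hypothetical:

```c
/*
 * Model of the per-CPU stock fast path: a charge is satisfied from
 * cached pre-charged pages when the cache belongs to the right owner,
 * avoiding the hierarchy walk entirely on a hit.
 */
struct stock {
	void *owner;		/* which counter the cached pages belong to */
	unsigned long nr_pages;	/* pre-charged pages still available */
};

/* Returns 1 on a fast-path hit, 0 when the caller must take the slow path. */
static int consume_stock(struct stock *s, void *owner, unsigned long nr)
{
	if (s->owner == owner && s->nr_pages >= nr) {
		s->nr_pages -= nr;	/* hit: no hierarchy walk needed */
		return 1;
	}
	return 0;			/* miss: fall back to slow charge */
}
```

On a miss, the real code charges a full batch up the hierarchy and
deposits the surplus into the stock so subsequent charges hit the fast
path.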
First, each memcg_stock_pcp can only cache the charges of 7 memcgs
(defined as NR_MEMCG_STOCK). Once a CPU handles charging for more than
7 memcgs, it randomly selects a victim memcg whose stock is evicted and
drained from that CPU. This can cause unnecessarily high latencies and
thrashing as memcgs continually evict each other's stock.
Second, the stock is tightly coupled with memcg: all page counters in a
memcg share the same cache. This simplifies some of the charging logic,
but it prevents new page counters from being added with a separate
stock of their own.
We can address both concerns by pushing the concept of stock down to
the page_counter level. This eliminates the random eviction problem by
removing the 7-slot limit, and makes it simpler to enable separate
stock caches for other page_counters.
Introduce a generic per-CPU stock directly in struct page_counter.
Stock can optionally be enabled per page_counter, limiting the overhead
increase for page_counters that do not benefit much from caching
charges.
This patch introduces the page_counter_stock struct and its
enable/disable/free functions; it does not use them yet.
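The drain-on-disable semantics can be modeled in a self-contained
userspace sketch: residual per-CPU stock is summed and uncharged along
the whole parent chain. Everything below (struct counter, NR_CPUS,
stock_pages, cancel(), disable_stock()) is a hypothetical stand-in for
the kernel code, shown only to illustrate the accounting:

```c
/* Userspace model of draining residual stock when it is disabled. */
struct counter {
	unsigned long usage;		/* pages currently charged */
	struct counter *parent;		/* hierarchy link, NULL at root */
};

#define NR_CPUS 4

/* Stands in for the per-CPU page_counter_stock nr_pages fields. */
static unsigned long stock_pages[NR_CPUS];

/* Model of page_counter_cancel(): uncharge one level. */
static void cancel(struct counter *c, unsigned long nr)
{
	c->usage -= nr;
}

/*
 * Model of page_counter_disable_stock(): collect every CPU's cached
 * pages, then return the total along the entire hierarchy, since the
 * cached pages were charged all the way up when they were stocked.
 */
static void disable_stock(struct counter *c)
{
	unsigned long to_drain = 0;
	int cpu;

	for (cpu = 0; cpu < NR_CPUS; cpu++) {
		to_drain += stock_pages[cpu];
		stock_pages[cpu] = 0;
	}
	for (; c; c = c->parent)
		cancel(c, to_drain);
}
```

Because the stocked pages were pre-charged against every ancestor, the
drain must also walk to the root rather than stop at the counter being
disabled.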
Suggested-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Joshua Hahn <joshua.hahnjy@gmail.com>
---
include/linux/page_counter.h | 13 ++++++++
mm/page_counter.c | 60 ++++++++++++++++++++++++++++++++++++
2 files changed, 73 insertions(+)
diff --git a/include/linux/page_counter.h b/include/linux/page_counter.h
index d649b6bbbc871..c7e3ab3356d20 100644
--- a/include/linux/page_counter.h
+++ b/include/linux/page_counter.h
@@ -5,8 +5,15 @@
#include <linux/atomic.h>
#include <linux/cache.h>
#include <linux/limits.h>
+#include <linux/local_lock.h>
+#include <linux/percpu.h>
#include <asm/page.h>
+struct page_counter_stock {
+ local_trylock_t lock;
+ unsigned long nr_pages;
+};
+
struct page_counter {
/*
* Make sure 'usage' does not share cacheline with any other field in
@@ -41,6 +48,8 @@ struct page_counter {
unsigned long high;
unsigned long max;
struct page_counter *parent;
+ struct page_counter_stock __percpu *stock;
+ unsigned int batch;
} ____cacheline_internodealigned_in_smp;
#if BITS_PER_LONG == 32
@@ -99,6 +108,10 @@ static inline void page_counter_reset_watermark(struct page_counter *counter)
counter->watermark = usage;
}
+int page_counter_enable_stock(struct page_counter *counter, unsigned int batch);
+void page_counter_disable_stock(struct page_counter *counter);
+void page_counter_free_stock(struct page_counter *counter);
+
#if IS_ENABLED(CONFIG_MEMCG) || IS_ENABLED(CONFIG_CGROUP_DMEM)
void page_counter_calculate_protection(struct page_counter *root,
struct page_counter *counter,
diff --git a/mm/page_counter.c b/mm/page_counter.c
index 661e0f2a5127a..965021993e161 100644
--- a/mm/page_counter.c
+++ b/mm/page_counter.c
@@ -8,6 +8,7 @@
#include <linux/page_counter.h>
#include <linux/atomic.h>
#include <linux/kernel.h>
+#include <linux/percpu.h>
#include <linux/string.h>
#include <linux/sched.h>
#include <linux/bug.h>
@@ -289,6 +290,65 @@ int page_counter_memparse(const char *buf, const char *max,
return 0;
}
+int page_counter_enable_stock(struct page_counter *counter, unsigned int batch)
+{
+ struct page_counter_stock __percpu *stock;
+ int cpu;
+
+ stock = alloc_percpu(struct page_counter_stock);
+ if (!stock)
+ return -ENOMEM;
+
+ for_each_possible_cpu(cpu) {
+ struct page_counter_stock *s = per_cpu_ptr(stock, cpu);
+
+ local_trylock_init(&s->lock);
+ }
+ counter->stock = stock;
+ counter->batch = batch;
+
+ return 0;
+}
+
+void page_counter_disable_stock(struct page_counter *counter)
+{
+ unsigned int stock_to_drain = 0;
+ int cpu;
+
+ if (!counter->stock)
+ return;
+
+ for_each_possible_cpu(cpu) {
+ struct page_counter_stock *stock;
+
+ /*
+ * No need for local lock; this is called during css_offline,
+ * after the cgroup has already been removed.
+ */
+ stock = per_cpu_ptr(counter->stock, cpu);
+ stock_to_drain += stock->nr_pages;
+ }
+
+ if (stock_to_drain) {
+ struct page_counter *c;
+
+ for (c = counter; c; c = c->parent)
+ page_counter_cancel(c, stock_to_drain);
+ }
+
+ /* This prevents future charges from trying to deposit pages */
+ counter->batch = 0;
+}
+
+void page_counter_free_stock(struct page_counter *counter)
+{
+ if (!counter->stock)
+ return;
+
+ free_percpu(counter->stock);
+ counter->stock = NULL;
+}
+
#if IS_ENABLED(CONFIG_MEMCG) || IS_ENABLED(CONFIG_CGROUP_DMEM)
/*
--
2.52.0
Thread overview: 9+ messages
2026-04-10 21:06 [PATCH 0/8 RFC] mm/memcontrol, page_counter: move stock from mem_cgroup to page_counter Joshua Hahn
2026-04-10 21:06 ` Joshua Hahn [this message]
2026-04-10 21:06 ` [PATCH 2/8 RFC] mm/page_counter: use page_counter_stock in page_counter_try_charge Joshua Hahn
2026-04-10 21:06 ` [PATCH 3/8 RFC] mm/page_counter: use page_counter_stock in page_counter_uncharge Joshua Hahn
2026-04-10 21:06 ` [PATCH 4/8 RFC] mm/page_counter: introduce stock drain APIs Joshua Hahn
2026-04-10 21:06 ` [PATCH 5/8 RFC] mm/memcontrol: convert memcg to use page_counter_stock Joshua Hahn
2026-04-10 21:07 ` [PATCH 6/8 RFC] mm/memcontrol: optimize memsw stock for cgroup v1 Joshua Hahn
2026-04-10 21:07 ` [PATCH 7/8 RFC] mm/memcontrol: optimize stock usage for cgroup v2 Joshua Hahn
2026-04-10 21:07 ` [PATCH 8/8 RFC] mm/memcontrol: remove unused memcg_stock code Joshua Hahn