* [PATCH 0/9] memcg: cleanup per-cpu stock
@ 2025-03-15 17:49 Shakeel Butt
2025-03-15 17:49 ` [PATCH 1/9] memcg: remove root memcg check from refill_stock Shakeel Butt
` (10 more replies)
0 siblings, 11 replies; 33+ messages in thread
From: Shakeel Butt @ 2025-03-15 17:49 UTC (permalink / raw)
To: Andrew Morton
Cc: Johannes Weiner, Michal Hocko, Roman Gushchin, Muchun Song,
Vlastimil Babka, Sebastian Andrzej Siewior, linux-mm, cgroups,
linux-kernel, Meta kernel team
This is a cleanup series which simplifies the memcg per-cpu stock code;
in particular, it removes unnecessary dependencies on the local_lock of
the per-cpu memcg stock. The eighth patch, from Vlastimil, optimizes the
charge path by combining the charging and accounting.
This series is based on next-20250314 plus two following patches:
Link: https://lore.kernel.org/all/20250312222552.3284173-1-shakeel.butt@linux.dev/
Link: https://lore.kernel.org/all/20250313054812.2185900-1-shakeel.butt@linux.dev/
Shakeel Butt (8):
memcg: remove root memcg check from refill_stock
memcg: decouple drain_obj_stock from local stock
memcg: introduce memcg_uncharge
memcg: manually inline __refill_stock
memcg: no refilling stock from obj_cgroup_release
memcg: do obj_cgroup_put inside drain_obj_stock
memcg: use __mod_memcg_state in drain_obj_stock
memcg: manually inline replace_stock_objcg
Vlastimil Babka (1):
memcg: combine slab obj stock charging and accounting
mm/memcontrol.c | 195 +++++++++++++++++++++++-------------------------
1 file changed, 95 insertions(+), 100 deletions(-)
--
2.47.1
^ permalink raw reply [flat|nested] 33+ messages in thread
* [PATCH 1/9] memcg: remove root memcg check from refill_stock
2025-03-15 17:49 [PATCH 0/9] memcg: cleanup per-cpu stock Shakeel Butt
@ 2025-03-15 17:49 ` Shakeel Butt
2025-03-18 0:39 ` Roman Gushchin
2025-03-18 7:59 ` Vlastimil Babka
2025-03-15 17:49 ` [PATCH 2/9] memcg: decouple drain_obj_stock from local stock Shakeel Butt
` (9 subsequent siblings)
10 siblings, 2 replies; 33+ messages in thread
From: Shakeel Butt @ 2025-03-15 17:49 UTC (permalink / raw)
To: Andrew Morton
Cc: Johannes Weiner, Michal Hocko, Roman Gushchin, Muchun Song,
Vlastimil Babka, Sebastian Andrzej Siewior, linux-mm, cgroups,
linux-kernel, Meta kernel team
refill_stock() cannot be called with the root memcg, so there is no need
to check for it.
Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
---
mm/memcontrol.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index b29433eb17fa..c09a32e93d39 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1883,6 +1883,7 @@ static void __refill_stock(struct mem_cgroup *memcg, unsigned int nr_pages)
drain_stock(stock);
}
+/* Should never be called with root_mem_cgroup. */
static void refill_stock(struct mem_cgroup *memcg, unsigned int nr_pages)
{
unsigned long flags;
@@ -1892,8 +1893,6 @@ static void refill_stock(struct mem_cgroup *memcg, unsigned int nr_pages)
* In case of unlikely failure to lock percpu stock_lock
* uncharge memcg directly.
*/
- if (mem_cgroup_is_root(memcg))
- return;
page_counter_uncharge(&memcg->memory, nr_pages);
if (do_memsw_account())
page_counter_uncharge(&memcg->memsw, nr_pages);
--
2.47.1
^ permalink raw reply [flat|nested] 33+ messages in thread
* [PATCH 2/9] memcg: decouple drain_obj_stock from local stock
2025-03-15 17:49 [PATCH 0/9] memcg: cleanup per-cpu stock Shakeel Butt
2025-03-15 17:49 ` [PATCH 1/9] memcg: remove root memcg check from refill_stock Shakeel Butt
@ 2025-03-15 17:49 ` Shakeel Butt
2025-03-18 0:44 ` Roman Gushchin
2025-03-15 17:49 ` [PATCH 3/9] memcg: introduce memcg_uncharge Shakeel Butt
` (8 subsequent siblings)
10 siblings, 1 reply; 33+ messages in thread
From: Shakeel Butt @ 2025-03-15 17:49 UTC (permalink / raw)
To: Andrew Morton
Cc: Johannes Weiner, Michal Hocko, Roman Gushchin, Muchun Song,
Vlastimil Babka, Sebastian Andrzej Siewior, linux-mm, cgroups,
linux-kernel, Meta kernel team
Currently drain_obj_stock() can potentially call __refill_stock(), which
accesses the local cpu stock and thus requires the memcg stock's
local_lock. However, if we look at the code paths leading to
drain_obj_stock(), there is never a good reason to refill the memcg stock
from it at all.
At the moment, drain_obj_stock() can be called from reclaim, hotplug cpu
teardown, mod_objcg_state() and refill_obj_stock(). For reclaim and
hotplug there is no need to refill. For the other two paths, the newly
switched objcg will most probably be used in the near future, so there is
no need to refill the stock with the older objcg.
In addition, __refill_stock() from drain_obj_stock() happens only in rare
cases, so performance is not really an issue. Let's just uncharge directly
instead of refilling, which will also decouple drain_obj_stock() from the
local cpu stock and its local_lock requirements.
Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
---
mm/memcontrol.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index c09a32e93d39..28cb75b5bc66 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2855,7 +2855,12 @@ static struct obj_cgroup *drain_obj_stock(struct memcg_stock_pcp *stock)
mod_memcg_state(memcg, MEMCG_KMEM, -nr_pages);
memcg1_account_kmem(memcg, -nr_pages);
- __refill_stock(memcg, nr_pages);
+ if (!mem_cgroup_is_root(memcg)) {
+ page_counter_uncharge(&memcg->memory, nr_pages);
+ if (do_memsw_account())
+ page_counter_uncharge(&memcg->memsw,
+ nr_pages);
+ }
css_put(&memcg->css);
}
--
2.47.1
^ permalink raw reply [flat|nested] 33+ messages in thread
* [PATCH 3/9] memcg: introduce memcg_uncharge
2025-03-15 17:49 [PATCH 0/9] memcg: cleanup per-cpu stock Shakeel Butt
2025-03-15 17:49 ` [PATCH 1/9] memcg: remove root memcg check from refill_stock Shakeel Butt
2025-03-15 17:49 ` [PATCH 2/9] memcg: decouple drain_obj_stock from local stock Shakeel Butt
@ 2025-03-15 17:49 ` Shakeel Butt
2025-03-18 0:50 ` Roman Gushchin
2025-03-15 17:49 ` [PATCH 4/9] memcg: manually inline __refill_stock Shakeel Butt
` (7 subsequent siblings)
10 siblings, 1 reply; 33+ messages in thread
From: Shakeel Butt @ 2025-03-15 17:49 UTC (permalink / raw)
To: Andrew Morton
Cc: Johannes Weiner, Michal Hocko, Roman Gushchin, Muchun Song,
Vlastimil Babka, Sebastian Andrzej Siewior, linux-mm, cgroups,
linux-kernel, Meta kernel team
At multiple places in memcontrol.c, the memory and memsw page counters
are being uncharged. This is error-prone. Let's move the functionality
to a newly introduced memcg_uncharge and call it from all those places.
Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
---
mm/memcontrol.c | 28 ++++++++++++----------------
1 file changed, 12 insertions(+), 16 deletions(-)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 28cb75b5bc66..b54e3a1d23bd 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1816,6 +1816,13 @@ static bool consume_stock(struct mem_cgroup *memcg, unsigned int nr_pages,
return ret;
}
+static void memcg_uncharge(struct mem_cgroup *memcg, unsigned int nr_pages)
+{
+ page_counter_uncharge(&memcg->memory, nr_pages);
+ if (do_memsw_account())
+ page_counter_uncharge(&memcg->memsw, nr_pages);
+}
+
/*
* Returns stocks cached in percpu and reset cached information.
*/
@@ -1828,10 +1835,7 @@ static void drain_stock(struct memcg_stock_pcp *stock)
return;
if (stock_pages) {
- page_counter_uncharge(&old->memory, stock_pages);
- if (do_memsw_account())
- page_counter_uncharge(&old->memsw, stock_pages);
-
+ memcg_uncharge(old, stock_pages);
WRITE_ONCE(stock->nr_pages, 0);
}
@@ -1893,9 +1897,7 @@ static void refill_stock(struct mem_cgroup *memcg, unsigned int nr_pages)
* In case of unlikely failure to lock percpu stock_lock
* uncharge memcg directly.
*/
- page_counter_uncharge(&memcg->memory, nr_pages);
- if (do_memsw_account())
- page_counter_uncharge(&memcg->memsw, nr_pages);
+ memcg_uncharge(memcg, nr_pages);
return;
}
__refill_stock(memcg, nr_pages);
@@ -2855,12 +2857,8 @@ static struct obj_cgroup *drain_obj_stock(struct memcg_stock_pcp *stock)
mod_memcg_state(memcg, MEMCG_KMEM, -nr_pages);
memcg1_account_kmem(memcg, -nr_pages);
- if (!mem_cgroup_is_root(memcg)) {
- page_counter_uncharge(&memcg->memory, nr_pages);
- if (do_memsw_account())
- page_counter_uncharge(&memcg->memsw,
- nr_pages);
- }
+ if (!mem_cgroup_is_root(memcg))
+ memcg_uncharge(memcg, nr_pages);
css_put(&memcg->css);
}
@@ -4689,9 +4687,7 @@ static inline void uncharge_gather_clear(struct uncharge_gather *ug)
static void uncharge_batch(const struct uncharge_gather *ug)
{
if (ug->nr_memory) {
- page_counter_uncharge(&ug->memcg->memory, ug->nr_memory);
- if (do_memsw_account())
- page_counter_uncharge(&ug->memcg->memsw, ug->nr_memory);
+ memcg_uncharge(ug->memcg, ug->nr_memory);
if (ug->nr_kmem) {
mod_memcg_state(ug->memcg, MEMCG_KMEM, -ug->nr_kmem);
memcg1_account_kmem(ug->memcg, -ug->nr_kmem);
--
2.47.1
^ permalink raw reply [flat|nested] 33+ messages in thread
* [PATCH 4/9] memcg: manually inline __refill_stock
2025-03-15 17:49 [PATCH 0/9] memcg: cleanup per-cpu stock Shakeel Butt
` (2 preceding siblings ...)
2025-03-15 17:49 ` [PATCH 3/9] memcg: introduce memcg_uncharge Shakeel Butt
@ 2025-03-15 17:49 ` Shakeel Butt
2025-03-18 0:58 ` Roman Gushchin
2025-03-15 17:49 ` [PATCH 5/9] memcg: no refilling stock from obj_cgroup_release Shakeel Butt
` (6 subsequent siblings)
10 siblings, 1 reply; 33+ messages in thread
From: Shakeel Butt @ 2025-03-15 17:49 UTC (permalink / raw)
To: Andrew Morton
Cc: Johannes Weiner, Michal Hocko, Roman Gushchin, Muchun Song,
Vlastimil Babka, Sebastian Andrzej Siewior, linux-mm, cgroups,
linux-kernel, Meta kernel team
__refill_stock() no longer has multiple callers, so simply inline it into
refill_stock().
Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
---
mm/memcontrol.c | 32 ++++++++++++--------------------
1 file changed, 12 insertions(+), 20 deletions(-)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index b54e3a1d23bd..7054b0ebd207 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1865,14 +1865,21 @@ static void drain_local_stock(struct work_struct *dummy)
obj_cgroup_put(old);
}
-/*
- * Cache charges(val) to local per_cpu area.
- * This will be consumed by consume_stock() function, later.
- */
-static void __refill_stock(struct mem_cgroup *memcg, unsigned int nr_pages)
+/* Should never be called with root_mem_cgroup. */
+static void refill_stock(struct mem_cgroup *memcg, unsigned int nr_pages)
{
struct memcg_stock_pcp *stock;
unsigned int stock_pages;
+ unsigned long flags;
+
+ if (!localtry_trylock_irqsave(&memcg_stock.stock_lock, flags)) {
+ /*
+ * In case of unlikely failure to lock percpu stock_lock
+ * uncharge memcg directly.
+ */
+ memcg_uncharge(memcg, nr_pages);
+ return;
+ }
stock = this_cpu_ptr(&memcg_stock);
if (READ_ONCE(stock->cached) != memcg) { /* reset if necessary */
@@ -1885,22 +1892,7 @@ static void __refill_stock(struct mem_cgroup *memcg, unsigned int nr_pages)
if (stock_pages > MEMCG_CHARGE_BATCH)
drain_stock(stock);
-}
-/* Should never be called with root_mem_cgroup. */
-static void refill_stock(struct mem_cgroup *memcg, unsigned int nr_pages)
-{
- unsigned long flags;
-
- if (!localtry_trylock_irqsave(&memcg_stock.stock_lock, flags)) {
- /*
- * In case of unlikely failure to lock percpu stock_lock
- * uncharge memcg directly.
- */
- memcg_uncharge(memcg, nr_pages);
- return;
- }
- __refill_stock(memcg, nr_pages);
localtry_unlock_irqrestore(&memcg_stock.stock_lock, flags);
}
--
2.47.1
^ permalink raw reply [flat|nested] 33+ messages in thread
* [PATCH 5/9] memcg: no refilling stock from obj_cgroup_release
2025-03-15 17:49 [PATCH 0/9] memcg: cleanup per-cpu stock Shakeel Butt
` (3 preceding siblings ...)
2025-03-15 17:49 ` [PATCH 4/9] memcg: manually inline __refill_stock Shakeel Butt
@ 2025-03-15 17:49 ` Shakeel Butt
2025-03-18 1:06 ` Roman Gushchin
2025-03-15 17:49 ` [PATCH 6/9] memcg: do obj_cgroup_put inside drain_obj_stock Shakeel Butt
` (5 subsequent siblings)
10 siblings, 1 reply; 33+ messages in thread
From: Shakeel Butt @ 2025-03-15 17:49 UTC (permalink / raw)
To: Andrew Morton
Cc: Johannes Weiner, Michal Hocko, Roman Gushchin, Muchun Song,
Vlastimil Babka, Sebastian Andrzej Siewior, linux-mm, cgroups,
linux-kernel, Meta kernel team
obj_cgroup_release() is called when all the references to the objcg have
been released, i.e. no more memory objects are pointing to it. Most
probably objcg->memcg will be pointing to some ancestor memcg. In
obj_cgroup_release(), the kernel calls obj_cgroup_uncharge_pages(), which
refills the local stock.
There is no need to refill the local stock with some ancestor memcg and
flush whatever is currently cached there. Let's decouple
obj_cgroup_release() from the local stock by uncharging instead of
refilling. One additional benefit of this change is that it removes the
requirement that obj_cgroup_put() only be called outside of the
local_lock.
Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
---
mm/memcontrol.c | 15 +++++++++++----
1 file changed, 11 insertions(+), 4 deletions(-)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 7054b0ebd207..83db180455a1 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -129,8 +129,7 @@ bool mem_cgroup_kmem_disabled(void)
return cgroup_memory_nokmem;
}
-static void obj_cgroup_uncharge_pages(struct obj_cgroup *objcg,
- unsigned int nr_pages);
+static void memcg_uncharge(struct mem_cgroup *memcg, unsigned int nr_pages);
static void obj_cgroup_release(struct percpu_ref *ref)
{
@@ -163,8 +162,16 @@ static void obj_cgroup_release(struct percpu_ref *ref)
WARN_ON_ONCE(nr_bytes & (PAGE_SIZE - 1));
nr_pages = nr_bytes >> PAGE_SHIFT;
- if (nr_pages)
- obj_cgroup_uncharge_pages(objcg, nr_pages);
+ if (nr_pages) {
+ struct mem_cgroup *memcg;
+
+ memcg = get_mem_cgroup_from_objcg(objcg);
+ mod_memcg_state(memcg, MEMCG_KMEM, -nr_pages);
+ memcg1_account_kmem(memcg, -nr_pages);
+ if (!mem_cgroup_is_root(memcg))
+ memcg_uncharge(memcg, nr_pages);
+ css_put(&memcg->css);
+ }
spin_lock_irqsave(&objcg_lock, flags);
list_del(&objcg->list);
--
2.47.1
^ permalink raw reply [flat|nested] 33+ messages in thread
* [PATCH 6/9] memcg: do obj_cgroup_put inside drain_obj_stock
2025-03-15 17:49 [PATCH 0/9] memcg: cleanup per-cpu stock Shakeel Butt
` (4 preceding siblings ...)
2025-03-15 17:49 ` [PATCH 5/9] memcg: no refilling stock from obj_cgroup_release Shakeel Butt
@ 2025-03-15 17:49 ` Shakeel Butt
2025-03-18 1:07 ` Roman Gushchin
2025-03-15 17:49 ` [PATCH 7/9] memcg: use __mod_memcg_state in drain_obj_stock Shakeel Butt
` (4 subsequent siblings)
10 siblings, 1 reply; 33+ messages in thread
From: Shakeel Butt @ 2025-03-15 17:49 UTC (permalink / raw)
To: Andrew Morton
Cc: Johannes Weiner, Michal Hocko, Roman Gushchin, Muchun Song,
Vlastimil Babka, Sebastian Andrzej Siewior, linux-mm, cgroups,
linux-kernel, Meta kernel team
Previously we could not call obj_cgroup_put() inside the local lock
because, on the put of the last reference, the release function
obj_cgroup_release() may try to re-acquire the local lock. However, that
chain has now been broken. Simply do obj_cgroup_put() inside
drain_obj_stock() instead of returning the old objcg to the caller.
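For reference, the chain that used to make this unsafe looked roughly
like this (simplified sketch of the pre-series call path, not an exact
trace):
  drain_obj_stock()                      /* memcg_stock.stock_lock held */
    obj_cgroup_put(old)                  /* drops the last reference */
      obj_cgroup_release()
        obj_cgroup_uncharge_pages()
          refill_stock()                 /* would take stock_lock again */
Since the previous patch makes obj_cgroup_release() uncharge the page
counters directly instead of going through the per-cpu stock, this
re-entry on stock_lock can no longer happen.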
Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
---
mm/memcontrol.c | 37 +++++++++++--------------------------
1 file changed, 11 insertions(+), 26 deletions(-)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 83db180455a1..3c4de384b5a0 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1778,7 +1778,7 @@ static DEFINE_PER_CPU(struct memcg_stock_pcp, memcg_stock) = {
};
static DEFINE_MUTEX(percpu_charge_mutex);
-static struct obj_cgroup *drain_obj_stock(struct memcg_stock_pcp *stock);
+static void drain_obj_stock(struct memcg_stock_pcp *stock);
static bool obj_stock_flush_required(struct memcg_stock_pcp *stock,
struct mem_cgroup *root_memcg);
@@ -1853,7 +1853,6 @@ static void drain_stock(struct memcg_stock_pcp *stock)
static void drain_local_stock(struct work_struct *dummy)
{
struct memcg_stock_pcp *stock;
- struct obj_cgroup *old = NULL;
unsigned long flags;
/*
@@ -1864,12 +1863,11 @@ static void drain_local_stock(struct work_struct *dummy)
localtry_lock_irqsave(&memcg_stock.stock_lock, flags);
stock = this_cpu_ptr(&memcg_stock);
- old = drain_obj_stock(stock);
+ drain_obj_stock(stock);
drain_stock(stock);
clear_bit(FLUSHING_CACHED_CHARGE, &stock->flags);
localtry_unlock_irqrestore(&memcg_stock.stock_lock, flags);
- obj_cgroup_put(old);
}
/* Should never be called with root_mem_cgroup. */
@@ -1951,18 +1949,16 @@ void drain_all_stock(struct mem_cgroup *root_memcg)
static int memcg_hotplug_cpu_dead(unsigned int cpu)
{
struct memcg_stock_pcp *stock;
- struct obj_cgroup *old;
unsigned long flags;
stock = &per_cpu(memcg_stock, cpu);
/* drain_obj_stock requires stock_lock */
localtry_lock_irqsave(&memcg_stock.stock_lock, flags);
- old = drain_obj_stock(stock);
+ drain_obj_stock(stock);
localtry_unlock_irqrestore(&memcg_stock.stock_lock, flags);
drain_stock(stock);
- obj_cgroup_put(old);
return 0;
}
@@ -2745,24 +2741,20 @@ void __memcg_kmem_uncharge_page(struct page *page, int order)
}
/* Replace the stock objcg with objcg, return the old objcg */
-static struct obj_cgroup *replace_stock_objcg(struct memcg_stock_pcp *stock,
- struct obj_cgroup *objcg)
+static void replace_stock_objcg(struct memcg_stock_pcp *stock,
+ struct obj_cgroup *objcg)
{
- struct obj_cgroup *old = NULL;
-
- old = drain_obj_stock(stock);
+ drain_obj_stock(stock);
obj_cgroup_get(objcg);
stock->nr_bytes = atomic_read(&objcg->nr_charged_bytes)
? atomic_xchg(&objcg->nr_charged_bytes, 0) : 0;
WRITE_ONCE(stock->cached_objcg, objcg);
- return old;
}
static void mod_objcg_state(struct obj_cgroup *objcg, struct pglist_data *pgdat,
enum node_stat_item idx, int nr)
{
struct memcg_stock_pcp *stock;
- struct obj_cgroup *old = NULL;
unsigned long flags;
int *bytes;
@@ -2775,7 +2767,7 @@ static void mod_objcg_state(struct obj_cgroup *objcg, struct pglist_data *pgdat,
* changes.
*/
if (READ_ONCE(stock->cached_objcg) != objcg) {
- old = replace_stock_objcg(stock, objcg);
+ replace_stock_objcg(stock, objcg);
stock->cached_pgdat = pgdat;
} else if (stock->cached_pgdat != pgdat) {
/* Flush the existing cached vmstat data */
@@ -2816,7 +2808,6 @@ static void mod_objcg_state(struct obj_cgroup *objcg, struct pglist_data *pgdat,
__mod_objcg_mlstate(objcg, pgdat, idx, nr);
localtry_unlock_irqrestore(&memcg_stock.stock_lock, flags);
- obj_cgroup_put(old);
}
static bool consume_obj_stock(struct obj_cgroup *objcg, unsigned int nr_bytes)
@@ -2838,12 +2829,12 @@ static bool consume_obj_stock(struct obj_cgroup *objcg, unsigned int nr_bytes)
return ret;
}
-static struct obj_cgroup *drain_obj_stock(struct memcg_stock_pcp *stock)
+static void drain_obj_stock(struct memcg_stock_pcp *stock)
{
struct obj_cgroup *old = READ_ONCE(stock->cached_objcg);
if (!old)
- return NULL;
+ return;
if (stock->nr_bytes) {
unsigned int nr_pages = stock->nr_bytes >> PAGE_SHIFT;
@@ -2896,11 +2887,7 @@ static struct obj_cgroup *drain_obj_stock(struct memcg_stock_pcp *stock)
}
WRITE_ONCE(stock->cached_objcg, NULL);
- /*
- * The `old' objects needs to be released by the caller via
- * obj_cgroup_put() outside of memcg_stock_pcp::stock_lock.
- */
- return old;
+ obj_cgroup_put(old);
}
static bool obj_stock_flush_required(struct memcg_stock_pcp *stock,
@@ -2922,7 +2909,6 @@ static void refill_obj_stock(struct obj_cgroup *objcg, unsigned int nr_bytes,
bool allow_uncharge)
{
struct memcg_stock_pcp *stock;
- struct obj_cgroup *old = NULL;
unsigned long flags;
unsigned int nr_pages = 0;
@@ -2930,7 +2916,7 @@ static void refill_obj_stock(struct obj_cgroup *objcg, unsigned int nr_bytes,
stock = this_cpu_ptr(&memcg_stock);
if (READ_ONCE(stock->cached_objcg) != objcg) { /* reset if necessary */
- old = replace_stock_objcg(stock, objcg);
+ replace_stock_objcg(stock, objcg);
allow_uncharge = true; /* Allow uncharge when objcg changes */
}
stock->nr_bytes += nr_bytes;
@@ -2941,7 +2927,6 @@ static void refill_obj_stock(struct obj_cgroup *objcg, unsigned int nr_bytes,
}
localtry_unlock_irqrestore(&memcg_stock.stock_lock, flags);
- obj_cgroup_put(old);
if (nr_pages)
obj_cgroup_uncharge_pages(objcg, nr_pages);
--
2.47.1
^ permalink raw reply [flat|nested] 33+ messages in thread
* [PATCH 7/9] memcg: use __mod_memcg_state in drain_obj_stock
2025-03-15 17:49 [PATCH 0/9] memcg: cleanup per-cpu stock Shakeel Butt
` (5 preceding siblings ...)
2025-03-15 17:49 ` [PATCH 6/9] memcg: do obj_cgroup_put inside drain_obj_stock Shakeel Butt
@ 2025-03-15 17:49 ` Shakeel Butt
2025-03-17 20:56 ` Vlastimil Babka
2025-03-18 1:13 ` Roman Gushchin
2025-03-15 17:49 ` [PATCH 8/9] memcg: combine slab obj stock charging and accounting Shakeel Butt
` (3 subsequent siblings)
10 siblings, 2 replies; 33+ messages in thread
From: Shakeel Butt @ 2025-03-15 17:49 UTC (permalink / raw)
To: Andrew Morton
Cc: Johannes Weiner, Michal Hocko, Roman Gushchin, Muchun Song,
Vlastimil Babka, Sebastian Andrzej Siewior, linux-mm, cgroups,
linux-kernel, Meta kernel team
For non-PREEMPT_RT kernels, drain_obj_stock() is always called with irqs
disabled, so we can use __mod_memcg_state() instead of mod_memcg_state().
For PREEMPT_RT, we need to add memcg_stats_[un]lock() in
__mod_memcg_state().
Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
mm/memcontrol.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 3c4de384b5a0..dfe9c2eb7816 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -707,10 +707,12 @@ void __mod_memcg_state(struct mem_cgroup *memcg, enum memcg_stat_item idx,
if (WARN_ONCE(BAD_STAT_IDX(i), "%s: missing stat item %d\n", __func__, idx))
return;
+ memcg_stats_lock();
__this_cpu_add(memcg->vmstats_percpu->state[i], val);
val = memcg_state_val_in_pages(idx, val);
memcg_rstat_updated(memcg, val);
trace_mod_memcg_state(memcg, idx, val);
+ memcg_stats_unlock();
}
#ifdef CONFIG_MEMCG_V1
@@ -2845,7 +2847,7 @@ static void drain_obj_stock(struct memcg_stock_pcp *stock)
memcg = get_mem_cgroup_from_objcg(old);
- mod_memcg_state(memcg, MEMCG_KMEM, -nr_pages);
+ __mod_memcg_state(memcg, MEMCG_KMEM, -nr_pages);
memcg1_account_kmem(memcg, -nr_pages);
if (!mem_cgroup_is_root(memcg))
memcg_uncharge(memcg, nr_pages);
--
2.47.1
^ permalink raw reply [flat|nested] 33+ messages in thread
* [PATCH 8/9] memcg: combine slab obj stock charging and accounting
2025-03-15 17:49 [PATCH 0/9] memcg: cleanup per-cpu stock Shakeel Butt
` (6 preceding siblings ...)
2025-03-15 17:49 ` [PATCH 7/9] memcg: use __mod_memcg_state in drain_obj_stock Shakeel Butt
@ 2025-03-15 17:49 ` Shakeel Butt
2025-03-18 1:20 ` Roman Gushchin
2025-03-15 17:49 ` [PATCH 9/9] memcg: manually inline replace_stock_objcg Shakeel Butt
` (2 subsequent siblings)
10 siblings, 1 reply; 33+ messages in thread
From: Shakeel Butt @ 2025-03-15 17:49 UTC (permalink / raw)
To: Andrew Morton
Cc: Johannes Weiner, Michal Hocko, Roman Gushchin, Muchun Song,
Vlastimil Babka, Sebastian Andrzej Siewior, linux-mm, cgroups,
linux-kernel, Meta kernel team
From: Vlastimil Babka <vbabka@suse.cz>
When handling slab objects, we use obj_cgroup_[un]charge() for
(un)charging and mod_objcg_state() to account NR_SLAB_[UN]RECLAIMABLE_B.
All these operations use the percpu stock for performance. However with
the calls being separate, the stock_lock is taken twice in each case.
By refactoring the code, we can turn mod_objcg_state() into
__account_obj_stock() which is called on a stock that's already locked
and validated. On the charging side we can call this function from
consume_obj_stock() when it succeeds, and refill_obj_stock() in the
fallback. We just expand parameters of these functions as necessary.
The uncharge side from __memcg_slab_free_hook() is just the call to
refill_obj_stock().
Other callers of obj_cgroup_[un]charge() (i.e. not slab) simply pass the
extra parameters as NULL/zeroes to skip the __account_obj_stock()
operation.
In __memcg_slab_post_alloc_hook() we now charge each object separately,
but that's not a problem as we did call mod_objcg_state() for each
object separately, and most allocations are non-bulk anyway. This
could be improved by batching all operations until slab_pgdat(slab)
changes.
Some preliminary benchmarking with a kfree(kmalloc()) loop of 10M
iterations with/without __GFP_ACCOUNT:
Before the patch:
kmalloc/kfree !memcg: 581390144 cycles
kmalloc/kfree memcg: 783689984 cycles
After the patch:
kmalloc/kfree memcg: 658723808 cycles
More than half of the overhead of __GFP_ACCOUNT relative to the
non-accounted case seems to be eliminated.
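For reference, the measured pattern was of roughly this shape (a minimal
sketch; the object size and the surrounding harness here are assumptions,
not taken from the actual benchmark):
#include <linux/slab.h>
static void bench_kmalloc_kfree(void)
{
	int i;
	/* drop __GFP_ACCOUNT for the "!memcg" case */
	for (i = 0; i < 10 * 1000 * 1000; i++)
		kfree(kmalloc(64, GFP_KERNEL | __GFP_ACCOUNT));
}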
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
---
mm/memcontrol.c | 77 +++++++++++++++++++++++++++++--------------------
1 file changed, 46 insertions(+), 31 deletions(-)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index dfe9c2eb7816..553eb1d7250a 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2753,25 +2753,17 @@ static void replace_stock_objcg(struct memcg_stock_pcp *stock,
WRITE_ONCE(stock->cached_objcg, objcg);
}
-static void mod_objcg_state(struct obj_cgroup *objcg, struct pglist_data *pgdat,
- enum node_stat_item idx, int nr)
+static void __account_obj_stock(struct obj_cgroup *objcg,
+ struct memcg_stock_pcp *stock, int nr,
+ struct pglist_data *pgdat, enum node_stat_item idx)
{
- struct memcg_stock_pcp *stock;
- unsigned long flags;
int *bytes;
- localtry_lock_irqsave(&memcg_stock.stock_lock, flags);
- stock = this_cpu_ptr(&memcg_stock);
-
/*
* Save vmstat data in stock and skip vmstat array update unless
- * accumulating over a page of vmstat data or when pgdat or idx
- * changes.
+ * accumulating over a page of vmstat data or when pgdat changes.
*/
- if (READ_ONCE(stock->cached_objcg) != objcg) {
- replace_stock_objcg(stock, objcg);
- stock->cached_pgdat = pgdat;
- } else if (stock->cached_pgdat != pgdat) {
+ if (stock->cached_pgdat != pgdat) {
/* Flush the existing cached vmstat data */
struct pglist_data *oldpg = stock->cached_pgdat;
@@ -2808,11 +2800,10 @@ static void mod_objcg_state(struct obj_cgroup *objcg, struct pglist_data *pgdat,
}
if (nr)
__mod_objcg_mlstate(objcg, pgdat, idx, nr);
-
- localtry_unlock_irqrestore(&memcg_stock.stock_lock, flags);
}
-static bool consume_obj_stock(struct obj_cgroup *objcg, unsigned int nr_bytes)
+static bool consume_obj_stock(struct obj_cgroup *objcg, unsigned int nr_bytes,
+ struct pglist_data *pgdat, enum node_stat_item idx)
{
struct memcg_stock_pcp *stock;
unsigned long flags;
@@ -2824,6 +2815,9 @@ static bool consume_obj_stock(struct obj_cgroup *objcg, unsigned int nr_bytes)
if (objcg == READ_ONCE(stock->cached_objcg) && stock->nr_bytes >= nr_bytes) {
stock->nr_bytes -= nr_bytes;
ret = true;
+
+ if (pgdat)
+ __account_obj_stock(objcg, stock, nr_bytes, pgdat, idx);
}
localtry_unlock_irqrestore(&memcg_stock.stock_lock, flags);
@@ -2908,7 +2902,8 @@ static bool obj_stock_flush_required(struct memcg_stock_pcp *stock,
}
static void refill_obj_stock(struct obj_cgroup *objcg, unsigned int nr_bytes,
- bool allow_uncharge)
+ bool allow_uncharge, int nr_acct, struct pglist_data *pgdat,
+ enum node_stat_item idx)
{
struct memcg_stock_pcp *stock;
unsigned long flags;
@@ -2923,6 +2918,9 @@ static void refill_obj_stock(struct obj_cgroup *objcg, unsigned int nr_bytes,
}
stock->nr_bytes += nr_bytes;
+ if (pgdat)
+ __account_obj_stock(objcg, stock, nr_acct, pgdat, idx);
+
if (allow_uncharge && (stock->nr_bytes > PAGE_SIZE)) {
nr_pages = stock->nr_bytes >> PAGE_SHIFT;
stock->nr_bytes &= (PAGE_SIZE - 1);
@@ -2934,12 +2932,13 @@ static void refill_obj_stock(struct obj_cgroup *objcg, unsigned int nr_bytes,
obj_cgroup_uncharge_pages(objcg, nr_pages);
}
-int obj_cgroup_charge(struct obj_cgroup *objcg, gfp_t gfp, size_t size)
+static int obj_cgroup_charge_account(struct obj_cgroup *objcg, gfp_t gfp, size_t size,
+ struct pglist_data *pgdat, enum node_stat_item idx)
{
unsigned int nr_pages, nr_bytes;
int ret;
- if (consume_obj_stock(objcg, size))
+ if (likely(consume_obj_stock(objcg, size, pgdat, idx)))
return 0;
/*
@@ -2972,15 +2971,21 @@ int obj_cgroup_charge(struct obj_cgroup *objcg, gfp_t gfp, size_t size)
nr_pages += 1;
ret = obj_cgroup_charge_pages(objcg, gfp, nr_pages);
- if (!ret && nr_bytes)
- refill_obj_stock(objcg, PAGE_SIZE - nr_bytes, false);
+ if (!ret && (nr_bytes || pgdat))
+ refill_obj_stock(objcg, nr_bytes ? PAGE_SIZE - nr_bytes : 0,
+ false, size, pgdat, idx);
return ret;
}
+int obj_cgroup_charge(struct obj_cgroup *objcg, gfp_t gfp, size_t size)
+{
+ return obj_cgroup_charge_account(objcg, gfp, size, NULL, 0);
+}
+
void obj_cgroup_uncharge(struct obj_cgroup *objcg, size_t size)
{
- refill_obj_stock(objcg, size, true);
+ refill_obj_stock(objcg, size, true, 0, NULL, 0);
}
static inline size_t obj_full_size(struct kmem_cache *s)
@@ -3032,23 +3037,32 @@ bool __memcg_slab_post_alloc_hook(struct kmem_cache *s, struct list_lru *lru,
return false;
}
- if (obj_cgroup_charge(objcg, flags, size * obj_full_size(s)))
- return false;
-
for (i = 0; i < size; i++) {
slab = virt_to_slab(p[i]);
if (!slab_obj_exts(slab) &&
alloc_slab_obj_exts(slab, s, flags, false)) {
- obj_cgroup_uncharge(objcg, obj_full_size(s));
continue;
}
+ /*
+ * if we fail and size is 1, memcg_alloc_abort_single() will
+ * just free the object, which is ok as we have not assigned
+ * objcg to its obj_ext yet
+ *
+ * for larger sizes, kmem_cache_free_bulk() will uncharge
+ * any objects that were already charged and obj_ext assigned
+ *
+ * TODO: we could batch this until slab_pgdat(slab) changes
+ * between iterations, with a more complicated undo
+ */
+ if (obj_cgroup_charge_account(objcg, flags, obj_full_size(s),
+ slab_pgdat(slab), cache_vmstat_idx(s)))
+ return false;
+
off = obj_to_index(s, slab, p[i]);
obj_cgroup_get(objcg);
slab_obj_exts(slab)[off].objcg = objcg;
- mod_objcg_state(objcg, slab_pgdat(slab),
- cache_vmstat_idx(s), obj_full_size(s));
}
return true;
@@ -3057,6 +3071,8 @@ bool __memcg_slab_post_alloc_hook(struct kmem_cache *s, struct list_lru *lru,
void __memcg_slab_free_hook(struct kmem_cache *s, struct slab *slab,
void **p, int objects, struct slabobj_ext *obj_exts)
{
+ size_t obj_size = obj_full_size(s);
+
for (int i = 0; i < objects; i++) {
struct obj_cgroup *objcg;
unsigned int off;
@@ -3067,9 +3083,8 @@ void __memcg_slab_free_hook(struct kmem_cache *s, struct slab *slab,
continue;
obj_exts[off].objcg = NULL;
- obj_cgroup_uncharge(objcg, obj_full_size(s));
- mod_objcg_state(objcg, slab_pgdat(slab), cache_vmstat_idx(s),
- -obj_full_size(s));
+ refill_obj_stock(objcg, obj_size, true, -obj_size,
+ slab_pgdat(slab), cache_vmstat_idx(s));
obj_cgroup_put(objcg);
}
}
--
2.47.1
^ permalink raw reply [flat|nested] 33+ messages in thread
* [PATCH 9/9] memcg: manually inline replace_stock_objcg
2025-03-15 17:49 [PATCH 0/9] memcg: cleanup per-cpu stock Shakeel Butt
` (7 preceding siblings ...)
2025-03-15 17:49 ` [PATCH 8/9] memcg: combine slab obj stock charging and accounting Shakeel Butt
@ 2025-03-15 17:49 ` Shakeel Butt
2025-03-18 1:21 ` Roman Gushchin
2025-03-18 8:00 ` Vlastimil Babka
2025-03-16 3:57 ` [PATCH 0/9] memcg: cleanup per-cpu stock Andrew Morton
2025-04-02 20:40 ` Shakeel Butt
10 siblings, 2 replies; 33+ messages in thread
From: Shakeel Butt @ 2025-03-15 17:49 UTC (permalink / raw)
To: Andrew Morton
Cc: Johannes Weiner, Michal Hocko, Roman Gushchin, Muchun Song,
Vlastimil Babka, Sebastian Andrzej Siewior, linux-mm, cgroups,
linux-kernel, Meta kernel team
replace_stock_objcg() is now called only by refill_obj_stock(), so
manually inline it.
Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
---
mm/memcontrol.c | 18 ++++++------------
1 file changed, 6 insertions(+), 12 deletions(-)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 553eb1d7250a..f6e3fc418866 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2742,17 +2742,6 @@ void __memcg_kmem_uncharge_page(struct page *page, int order)
obj_cgroup_put(objcg);
}
-/* Replace the stock objcg with objcg, return the old objcg */
-static void replace_stock_objcg(struct memcg_stock_pcp *stock,
- struct obj_cgroup *objcg)
-{
- drain_obj_stock(stock);
- obj_cgroup_get(objcg);
- stock->nr_bytes = atomic_read(&objcg->nr_charged_bytes)
- ? atomic_xchg(&objcg->nr_charged_bytes, 0) : 0;
- WRITE_ONCE(stock->cached_objcg, objcg);
-}
-
static void __account_obj_stock(struct obj_cgroup *objcg,
struct memcg_stock_pcp *stock, int nr,
struct pglist_data *pgdat, enum node_stat_item idx)
@@ -2913,7 +2902,12 @@ static void refill_obj_stock(struct obj_cgroup *objcg, unsigned int nr_bytes,
stock = this_cpu_ptr(&memcg_stock);
if (READ_ONCE(stock->cached_objcg) != objcg) { /* reset if necessary */
- replace_stock_objcg(stock, objcg);
+ drain_obj_stock(stock);
+ obj_cgroup_get(objcg);
+ stock->nr_bytes = atomic_read(&objcg->nr_charged_bytes)
+ ? atomic_xchg(&objcg->nr_charged_bytes, 0) : 0;
+ WRITE_ONCE(stock->cached_objcg, objcg);
+
allow_uncharge = true; /* Allow uncharge when objcg changes */
}
stock->nr_bytes += nr_bytes;
--
2.47.1
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH 0/9] memcg: cleanup per-cpu stock
2025-03-15 17:49 [PATCH 0/9] memcg: cleanup per-cpu stock Shakeel Butt
` (8 preceding siblings ...)
2025-03-15 17:49 ` [PATCH 9/9] memcg: manually inline replace_stock_objcg Shakeel Butt
@ 2025-03-16 3:57 ` Andrew Morton
2025-03-16 4:43 ` Shakeel Butt
2025-03-16 15:59 ` Alexei Starovoitov
2025-04-02 20:40 ` Shakeel Butt
10 siblings, 2 replies; 33+ messages in thread
From: Andrew Morton @ 2025-03-16 3:57 UTC (permalink / raw)
To: Shakeel Butt
Cc: Johannes Weiner, Michal Hocko, Roman Gushchin, Muchun Song,
Vlastimil Babka, Sebastian Andrzej Siewior, linux-mm, cgroups,
linux-kernel, Meta kernel team
On Sat, 15 Mar 2025 10:49:21 -0700 Shakeel Butt <shakeel.butt@linux.dev> wrote:
>
> This is a cleanup series which is trying to simplify the memcg per-cpu
> stock code, particularly it tries to remove unnecessary dependencies on
> local_lock of per-cpu memcg stock. The eight patch from Vlastimil
> optimizes the charge path by combining the charging and accounting.
>
> This series is based on next-20250314 plus two following patches:
>
> Link: https://lore.kernel.org/all/20250312222552.3284173-1-shakeel.butt@linux.dev/
> Link: https://lore.kernel.org/all/20250313054812.2185900-1-shakeel.butt@linux.dev/
Unfortunately the bpf tree has been making changes in the same area of
memcontrol.c. 01d37228d331 ("memcg: Use trylock to access memcg
stock_lock.")
Sigh. We're at -rc7 and I don't think it's worth working around that
for a cleanup series. So I'm inclined to just defer this series until
the next -rc cycle.
If BPF merges reasonably early in the next merge window then please
promptly send this along and I should be able to squeak it into
6.15-rc1.
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH 0/9] memcg: cleanup per-cpu stock
2025-03-16 3:57 ` [PATCH 0/9] memcg: cleanup per-cpu stock Andrew Morton
@ 2025-03-16 4:43 ` Shakeel Butt
2025-03-16 15:59 ` Alexei Starovoitov
1 sibling, 0 replies; 33+ messages in thread
From: Shakeel Butt @ 2025-03-16 4:43 UTC (permalink / raw)
To: Andrew Morton
Cc: Johannes Weiner, Michal Hocko, Roman Gushchin, Muchun Song,
Vlastimil Babka, Sebastian Andrzej Siewior, linux-mm, cgroups,
linux-kernel, Meta kernel team
On Sat, Mar 15, 2025 at 08:57:59PM -0700, Andrew Morton wrote:
> On Sat, 15 Mar 2025 10:49:21 -0700 Shakeel Butt <shakeel.butt@linux.dev> wrote:
>
> >
> > This is a cleanup series which is trying to simplify the memcg per-cpu
> > stock code, particularly it tries to remove unnecessary dependencies on
> > local_lock of per-cpu memcg stock. The eight patch from Vlastimil
> > optimizes the charge path by combining the charging and accounting.
> >
> > This series is based on next-20250314 plus two following patches:
> >
> > Link: https://lore.kernel.org/all/20250312222552.3284173-1-shakeel.butt@linux.dev/
> > Link: https://lore.kernel.org/all/20250313054812.2185900-1-shakeel.butt@linux.dev/
>
> Unfortunately the bpf tree has been making changes in the same area of
> memcontrol.c. 01d37228d331 ("memcg: Use trylock to access memcg
> stock_lock.")
>
> Sigh. We're at -rc7 and I don't think it's worth working around that
> for a cleanup series. So I'm inclined to just defer this series until
> the next -rc cycle.
>
> If BPF merges reasonably early in the next merge window then please
> promptly send this along and I should be able to squeak it into
> 6.15-rc1.
>
Seems reasonable to me and thanks for taking a look.
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH 0/9] memcg: cleanup per-cpu stock
2025-03-16 3:57 ` [PATCH 0/9] memcg: cleanup per-cpu stock Andrew Morton
2025-03-16 4:43 ` Shakeel Butt
@ 2025-03-16 15:59 ` Alexei Starovoitov
2025-03-17 18:11 ` Shakeel Butt
2025-03-17 20:27 ` Andrew Morton
1 sibling, 2 replies; 33+ messages in thread
From: Alexei Starovoitov @ 2025-03-16 15:59 UTC (permalink / raw)
To: Andrew Morton
Cc: Shakeel Butt, Johannes Weiner, Michal Hocko, Roman Gushchin,
Muchun Song, Vlastimil Babka, Sebastian Andrzej Siewior,
linux-mm, cgroups, linux-kernel, Meta kernel team
On Sat, Mar 15, 2025 at 08:57:59PM -0700, Andrew Morton wrote:
> On Sat, 15 Mar 2025 10:49:21 -0700 Shakeel Butt <shakeel.butt@linux.dev> wrote:
>
> >
> > This is a cleanup series which is trying to simplify the memcg per-cpu
> > stock code, particularly it tries to remove unnecessary dependencies on
> > local_lock of per-cpu memcg stock. The eight patch from Vlastimil
> > optimizes the charge path by combining the charging and accounting.
> >
> > This series is based on next-20250314 plus two following patches:
> >
> > Link: https://lore.kernel.org/all/20250312222552.3284173-1-shakeel.butt@linux.dev/
> > Link: https://lore.kernel.org/all/20250313054812.2185900-1-shakeel.butt@linux.dev/
>
> Unfortunately the bpf tree has been making changes in the same area of
> memcontrol.c. 01d37228d331 ("memcg: Use trylock to access memcg
> stock_lock.")
>
> Sigh. We're at -rc7 and I don't think it's worth working around that
> for a cleanup series. So I'm inclined to just defer this series until
> the next -rc cycle.
>
> If BPF merges reasonably early in the next merge window then please
> promptly send this along and I should be able to squeak it into
> 6.15-rc1.
Ohh. I didn't realize that try_alloc changes are causing so much trouble.
Sorry about that.
Andrew,
could you please instead take bpf-next.git try_alloc_pages branch
into your tree and resolve two trivial conflicts:
1. https://lore.kernel.org/bpf/20250311120422.1d9a8f80@canb.auug.org.au/
2. https://lore.kernel.org/bpf/20250312145247.380c2aa5@canb.auug.org.au/
There are 7 commits there.
You can also squash Vlastimil's fix
"Fix the flipped condition in gfpflags_allow_spinning" into
"Introduce try_alloc_pages" patch or keep everything as-is.
I'll drop it from bpf-next right after.
Then Shakeel can rebase/resend his set without conflicts and everything
will be nicely ready for the merge window.
I'll defer other bpf side things to after merge window when trees converge.
Thanks!
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH 0/9] memcg: cleanup per-cpu stock
2025-03-16 15:59 ` Alexei Starovoitov
@ 2025-03-17 18:11 ` Shakeel Butt
2025-03-17 20:27 ` Andrew Morton
1 sibling, 0 replies; 33+ messages in thread
From: Shakeel Butt @ 2025-03-17 18:11 UTC (permalink / raw)
To: Alexei Starovoitov
Cc: Andrew Morton, Johannes Weiner, Michal Hocko, Roman Gushchin,
Muchun Song, Vlastimil Babka, Sebastian Andrzej Siewior,
linux-mm, cgroups, linux-kernel, Meta kernel team
On Sun, Mar 16, 2025 at 08:59:20AM -0700, Alexei Starovoitov wrote:
> On Sat, Mar 15, 2025 at 08:57:59PM -0700, Andrew Morton wrote:
> > On Sat, 15 Mar 2025 10:49:21 -0700 Shakeel Butt <shakeel.butt@linux.dev> wrote:
> >
> > >
> > > This is a cleanup series which is trying to simplify the memcg per-cpu
> > > stock code, particularly it tries to remove unnecessary dependencies on
> > > local_lock of per-cpu memcg stock. The eight patch from Vlastimil
> > > optimizes the charge path by combining the charging and accounting.
> > >
> > > This series is based on next-20250314 plus two following patches:
> > >
> > > Link: https://lore.kernel.org/all/20250312222552.3284173-1-shakeel.butt@linux.dev/
> > > Link: https://lore.kernel.org/all/20250313054812.2185900-1-shakeel.butt@linux.dev/
> >
> > Unfortunately the bpf tree has been making changes in the same area of
> > memcontrol.c. 01d37228d331 ("memcg: Use trylock to access memcg
> > stock_lock.")
> >
> > Sigh. We're at -rc7 and I don't think it's worth working around that
> > for a cleanup series. So I'm inclined to just defer this series until
> > the next -rc cycle.
> >
> > If BPF merges reasonably early in the next merge window then please
> > promptly send this along and I should be able to squeak it into
> > 6.15-rc1.
>
> Ohh. I didn't realize that try_alloc changes are causing so much trouble.
> Sorry about that.
>
> Andrew,
>
> could you please instead take bpf-next.git try_alloc_pages branch
> into your tree and resolve two trivial conflicts:
> 1. https://lore.kernel.org/bpf/20250311120422.1d9a8f80@canb.auug.org.au/
> 2. https://lore.kernel.org/bpf/20250312145247.380c2aa5@canb.auug.org.au/
> There are 7 commits there.
> You can also squash Vlastimil's fix
> "Fix the flipped condition in gfpflags_allow_spinning" into
> "Introduce try_alloc_pages" patch or keep everything as-is.
>
> I'll drop it from bpf-next right after.
>
> Then Shakeel can rebase/resend his set without conflicts and everything
> will be nicely ready for the merge window.
Thanks Alexei.
Andrew, if you decide to take the try_alloc_pages branch into the mm tree,
then this series should apply cleanly, as it is based on next-20250314
which already has all the try_alloc_pages patches and the conflict
resolutions.
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH 0/9] memcg: cleanup per-cpu stock
2025-03-16 15:59 ` Alexei Starovoitov
2025-03-17 18:11 ` Shakeel Butt
@ 2025-03-17 20:27 ` Andrew Morton
1 sibling, 0 replies; 33+ messages in thread
From: Andrew Morton @ 2025-03-17 20:27 UTC (permalink / raw)
To: Alexei Starovoitov
Cc: Shakeel Butt, Johannes Weiner, Michal Hocko, Roman Gushchin,
Muchun Song, Vlastimil Babka, Sebastian Andrzej Siewior,
linux-mm, cgroups, linux-kernel, Meta kernel team
On Sun, 16 Mar 2025 08:59:20 -0700 Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
> On Sat, Mar 15, 2025 at 08:57:59PM -0700, Andrew Morton wrote:
> > On Sat, 15 Mar 2025 10:49:21 -0700 Shakeel Butt <shakeel.butt@linux.dev> wrote:
> >
> > >
> > > This is a cleanup series which is trying to simplify the memcg per-cpu
> > > stock code, particularly it tries to remove unnecessary dependencies on
> > > local_lock of per-cpu memcg stock. The eight patch from Vlastimil
> > > optimizes the charge path by combining the charging and accounting.
> > >
> > > This series is based on next-20250314 plus two following patches:
> > >
> > > Link: https://lore.kernel.org/all/20250312222552.3284173-1-shakeel.butt@linux.dev/
> > > Link: https://lore.kernel.org/all/20250313054812.2185900-1-shakeel.butt@linux.dev/
> >
> > Unfortunately the bpf tree has been making changes in the same area of
> > memcontrol.c. 01d37228d331 ("memcg: Use trylock to access memcg
> > stock_lock.")
> >
> > Sigh. We're at -rc7 and I don't think it's worth working around that
> > for a cleanup series. So I'm inclined to just defer this series until
> > the next -rc cycle.
> >
> > If BPF merges reasonably early in the next merge window then please
> > promptly send this along and I should be able to squeak it into
> > 6.15-rc1.
>
> Ohh. I didn't realize that try_alloc changes are causing so much trouble.
> Sorry about that.
>
> Andrew,
>
> could you please instead take bpf-next.git try_alloc_pages branch
> into your tree and resolve two trivial conflicts:
> 1. https://lore.kernel.org/bpf/20250311120422.1d9a8f80@canb.auug.org.au/
> 2. https://lore.kernel.org/bpf/20250312145247.380c2aa5@canb.auug.org.au/
> There are 7 commits there.
> You can also squash Vlastimil's fix
> "Fix the flipped condition in gfpflags_allow_spinning" into
> "Introduce try_alloc_pages" patch or keep everything as-is.
>
> I'll drop it from bpf-next right after.
>
> Then Shakeel can rebase/resend his set without conflicts and everything
> will be nicely ready for the merge window.
>
> I'll defer other bpf side things to after merge window when trees converge.
Let's just leave things as they are, please. It's only a cleanup
series and merging cleanups after -rc7 is rather dubious even without
issues such as these.
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH 7/9] memcg: use __mod_memcg_state in drain_obj_stock
2025-03-15 17:49 ` [PATCH 7/9] memcg: use __mod_memcg_state in drain_obj_stock Shakeel Butt
@ 2025-03-17 20:56 ` Vlastimil Babka
2025-03-17 21:54 ` Shakeel Butt
2025-03-18 1:13 ` Roman Gushchin
1 sibling, 1 reply; 33+ messages in thread
From: Vlastimil Babka @ 2025-03-17 20:56 UTC (permalink / raw)
To: Shakeel Butt, Andrew Morton
Cc: Johannes Weiner, Michal Hocko, Roman Gushchin, Muchun Song,
Sebastian Andrzej Siewior, linux-mm, cgroups, linux-kernel,
Meta kernel team
On 3/15/25 18:49, Shakeel Butt wrote:
> For non-PREEMPT_RT kernels, drain_obj_stock() is always called with irq
> disabled, so we can use __mod_memcg_state() instead of
> mod_memcg_state(). For PREEMPT_RT, we need to add memcg_stats_[un]lock
> in __mod_memcg_state().
>
> Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
> Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
I asked this in the RFC and, from Sebastian's answer, I think my question
was misunderstood, so let me try again.
After this patch we'll have from mod_memcg_state():
mod_memcg_state()
  local_irq_save(flags);
  __mod_memcg_state()
    memcg_stats_lock(); <- new and unnecessary?
Instead of modifying __mod_memcg_state() we could be more targeted and
just do this in drain_obj_stock():
memcg_stats_lock();
__mod_memcg_state();
memcg_stats_unlock();
Am I missing something?
> ---
> mm/memcontrol.c | 4 +++-
> 1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 3c4de384b5a0..dfe9c2eb7816 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -707,10 +707,12 @@ void __mod_memcg_state(struct mem_cgroup *memcg, enum memcg_stat_item idx,
> if (WARN_ONCE(BAD_STAT_IDX(i), "%s: missing stat item %d\n", __func__, idx))
> return;
>
> + memcg_stats_lock();
> __this_cpu_add(memcg->vmstats_percpu->state[i], val);
> val = memcg_state_val_in_pages(idx, val);
> memcg_rstat_updated(memcg, val);
> trace_mod_memcg_state(memcg, idx, val);
> + memcg_stats_unlock();
> }
>
> #ifdef CONFIG_MEMCG_V1
> @@ -2845,7 +2847,7 @@ static void drain_obj_stock(struct memcg_stock_pcp *stock)
>
> memcg = get_mem_cgroup_from_objcg(old);
>
> - mod_memcg_state(memcg, MEMCG_KMEM, -nr_pages);
> + __mod_memcg_state(memcg, MEMCG_KMEM, -nr_pages);
> memcg1_account_kmem(memcg, -nr_pages);
> if (!mem_cgroup_is_root(memcg))
> memcg_uncharge(memcg, nr_pages);
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH 7/9] memcg: use __mod_memcg_state in drain_obj_stock
2025-03-17 20:56 ` Vlastimil Babka
@ 2025-03-17 21:54 ` Shakeel Butt
2025-03-18 8:02 ` Vlastimil Babka
0 siblings, 1 reply; 33+ messages in thread
From: Shakeel Butt @ 2025-03-17 21:54 UTC (permalink / raw)
To: Vlastimil Babka
Cc: Andrew Morton, Johannes Weiner, Michal Hocko, Roman Gushchin,
Muchun Song, Sebastian Andrzej Siewior, linux-mm, cgroups,
linux-kernel, Meta kernel team
On Mon, Mar 17, 2025 at 09:56:39PM +0100, Vlastimil Babka wrote:
> On 3/15/25 18:49, Shakeel Butt wrote:
> > For non-PREEMPT_RT kernels, drain_obj_stock() is always called with irq
> > disabled, so we can use __mod_memcg_state() instead of
> > mod_memcg_state(). For PREEMPT_RT, we need to add memcg_stats_[un]lock
> > in __mod_memcg_state().
> >
> > Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
> > Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
>
> I've asked in the RFC and from Sebastian's answer I think my question was
> misunderstod, so let me try again.
>
> After this patch we'll have from mod_memcg_state():
>
> mod_memcg_state()
> local_irq_save(flags);
> __mod_memcg_state()
> memcg_stats_lock(); <- new and unnecessary?
>
> Instead of modifying __mod_memcg_state() we could be more targetted and just
> do in drain_obj_stock():
>
> memcg_stats_lock();
> __mod_memcg_state();
> memcg_stats_unlock();
>
> Am I missing something?
This seems unnecessary because this patch is adding the first user of
__mod_memcg_state(), but I think maintainability is better with
memcg_stats_[un]lock() inside __mod_memcg_state().
Let's take the example of __mod_memcg_lruvec_state(). It is being called
from places where, on a non-RT kernel, irqs are disabled through
spin_lock_irq*, so on an RT kernel irqs would not be disabled there and
preemption thus needs to be disabled explicitly. What if, in the future,
__mod_memcg_state() is used by a caller which assumes preemption is
disabled through irq disabling? The RT kernel would be buggy there. I am
not sure it is easy to force future users to explicitly add
memcg_stats_[un]lock() around the call to __mod_memcg_state().
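For context, memcg_stats_[un]lock() is basically a no-op assertion on
non-RT kernels and a preempt-disable on PREEMPT_RT. A simplified sketch
of the behavior being relied upon (the actual helpers in memcontrol.c
differ in detail):
static void memcg_stats_lock(void)
{
#ifdef CONFIG_PREEMPT_RT
	preempt_disable();			/* RT: irqs may stay enabled here */
#else
	VM_WARN_ON_ONCE(!irqs_disabled());	/* non-RT: callers disable irqs */
#endif
}
static void memcg_stats_unlock(void)
{
#ifdef CONFIG_PREEMPT_RT
	preempt_enable();
#endif
}
So keeping the lock inside __mod_memcg_state() means a future caller that
only has irqs disabled (implicitly, on non-RT) still gets preemption
disabled on RT without having to know about it.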
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH 1/9] memcg: remove root memcg check from refill_stock
2025-03-15 17:49 ` [PATCH 1/9] memcg: remove root memcg check from refill_stock Shakeel Butt
@ 2025-03-18 0:39 ` Roman Gushchin
2025-03-18 7:59 ` Vlastimil Babka
1 sibling, 0 replies; 33+ messages in thread
From: Roman Gushchin @ 2025-03-18 0:39 UTC (permalink / raw)
To: Shakeel Butt
Cc: Andrew Morton, Johannes Weiner, Michal Hocko, Muchun Song,
Vlastimil Babka, Sebastian Andrzej Siewior, linux-mm, cgroups,
linux-kernel, Meta kernel team
On Sat, Mar 15, 2025 at 10:49:22AM -0700, Shakeel Butt wrote:
> refill_stock can not be called with root memcg, so there is no need to
> check it.
>
> Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
Reviewed-by: Roman Gushchin <roman.gushchin@linux.dev>
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH 2/9] memcg: decouple drain_obj_stock from local stock
2025-03-15 17:49 ` [PATCH 2/9] memcg: decouple drain_obj_stock from local stock Shakeel Butt
@ 2025-03-18 0:44 ` Roman Gushchin
0 siblings, 0 replies; 33+ messages in thread
From: Roman Gushchin @ 2025-03-18 0:44 UTC (permalink / raw)
To: Shakeel Butt
Cc: Andrew Morton, Johannes Weiner, Michal Hocko, Muchun Song,
Vlastimil Babka, Sebastian Andrzej Siewior, linux-mm, cgroups,
linux-kernel, Meta kernel team
On Sat, Mar 15, 2025 at 10:49:23AM -0700, Shakeel Butt wrote:
> Currently drain_obj_stock() can potentially call __refill_stock which
> accesses local cpu stock and thus requires memcg stock's local_lock.
> However if we look at the code paths leading to drain_obj_stock(), there
> is never a good reason to refill the memcg stock at all from it.
>
> At the moment, drain_obj_stock can be called from reclaim, hotplug cpu
> teardown, mod_objcg_state() and refill_obj_stock(). For reclaim and
> hotplug there is no need to refill. For the other two paths, most
> probably the newly switched objcg would be used in near future and thus
> no need to refill stock with the older objcg.
>
> In addition, __refill_stock() from drain_obj_stock() happens on rare
> cases, so performance is not really an issue. Let's just uncharge
> directly instead of refill which will also decouple drain_obj_stock from
> local cpu stock and local_lock requirements.
>
> Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
> Acked-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Roman Gushchin <roman.gushchin@linux.dev>
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH 3/9] memcg: introduce memcg_uncharge
2025-03-15 17:49 ` [PATCH 3/9] memcg: introduce memcg_uncharge Shakeel Butt
@ 2025-03-18 0:50 ` Roman Gushchin
0 siblings, 0 replies; 33+ messages in thread
From: Roman Gushchin @ 2025-03-18 0:50 UTC (permalink / raw)
To: Shakeel Butt
Cc: Andrew Morton, Johannes Weiner, Michal Hocko, Muchun Song,
Vlastimil Babka, Sebastian Andrzej Siewior, linux-mm, cgroups,
linux-kernel, Meta kernel team
On Sat, Mar 15, 2025 at 10:49:24AM -0700, Shakeel Butt wrote:
> At multiple places in memcontrol.c, the memory and memsw page counters
> are being uncharged. This is error-prone. Let's move the functionality
> to a newly introduced memcg_uncharge and call it from all those places.
>
> Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
> Acked-by: Vlastimil Babka <vbabka@suse.cz>
This is a nice one!
Reviewed-by: Roman Gushchin <roman.gushchin@linux.dev>
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH 4/9] memcg: manually inline __refill_stock
2025-03-15 17:49 ` [PATCH 4/9] memcg: manually inline __refill_stock Shakeel Butt
@ 2025-03-18 0:58 ` Roman Gushchin
2025-03-18 7:58 ` Vlastimil Babka
0 siblings, 1 reply; 33+ messages in thread
From: Roman Gushchin @ 2025-03-18 0:58 UTC (permalink / raw)
To: Shakeel Butt
Cc: Andrew Morton, Johannes Weiner, Michal Hocko, Muchun Song,
Vlastimil Babka, Sebastian Andrzej Siewior, linux-mm, cgroups,
linux-kernel, Meta kernel team
On Sat, Mar 15, 2025 at 10:49:25AM -0700, Shakeel Butt wrote:
> There are no more multiple callers of __refill_stock(), so simply inline
> it to refill_stock().
>
> Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
> Acked-by: Vlastimil Babka <vbabka@suse.cz>
> ---
> mm/memcontrol.c | 32 ++++++++++++--------------------
> 1 file changed, 12 insertions(+), 20 deletions(-)
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index b54e3a1d23bd..7054b0ebd207 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -1865,14 +1865,21 @@ static void drain_local_stock(struct work_struct *dummy)
> obj_cgroup_put(old);
> }
>
> -/*
> - * Cache charges(val) to local per_cpu area.
> - * This will be consumed by consume_stock() function, later.
> - */
> -static void __refill_stock(struct mem_cgroup *memcg, unsigned int nr_pages)
> +/* Should never be called with root_mem_cgroup. */
How about adding something like this?
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 768d6b15dbfa..5c26002f2168 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1881,6 +1881,8 @@ static void refill_stock(struct mem_cgroup *memcg, unsigned int nr_pages)
{
unsigned long flags;
+ VM_WARN_ON_ONCE(mem_cgroup_is_root(memcg));
+
local_lock_irqsave(&memcg_stock.stock_lock, flags);
__refill_stock(memcg, nr_pages);
local_unlock_irqrestore(&memcg_stock.stock_lock, flags);
Other than that,
Reviewed-by: Roman Gushchin <roman.gushchin@linux.dev>
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH 5/9] memcg: no refilling stock from obj_cgroup_release
2025-03-15 17:49 ` [PATCH 5/9] memcg: no refilling stock from obj_cgroup_release Shakeel Butt
@ 2025-03-18 1:06 ` Roman Gushchin
0 siblings, 0 replies; 33+ messages in thread
From: Roman Gushchin @ 2025-03-18 1:06 UTC (permalink / raw)
To: Shakeel Butt
Cc: Andrew Morton, Johannes Weiner, Michal Hocko, Muchun Song,
Vlastimil Babka, Sebastian Andrzej Siewior, linux-mm, cgroups,
linux-kernel, Meta kernel team
On Sat, Mar 15, 2025 at 10:49:26AM -0700, Shakeel Butt wrote:
> obj_cgroup_release is called when all the references to the objcg have
> been released i.e. no more memory objects are pointing to it. Most
> probably objcg->memcg will be pointing to some ancestor memcg. In
> obj_cgroup_release(), the kernel calls obj_cgroup_uncharge_pages() which
> refills the local stock.
>
> There is no need to refill the local stock with some ancestor memcg and
> flush the local stock. Let's decouple obj_cgroup_release() from the
> local stock by uncharging instead of refilling. One additional benefit
> of this change is that it removes the requirement to only call
> obj_cgroup_put() outside of local_lock.
>
> Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
> Acked-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Roman Gushchin <roman.gushchin@linux.dev>
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH 6/9] memcg: do obj_cgroup_put inside drain_obj_stock
2025-03-15 17:49 ` [PATCH 6/9] memcg: do obj_cgroup_put inside drain_obj_stock Shakeel Butt
@ 2025-03-18 1:07 ` Roman Gushchin
0 siblings, 0 replies; 33+ messages in thread
From: Roman Gushchin @ 2025-03-18 1:07 UTC (permalink / raw)
To: Shakeel Butt
Cc: Andrew Morton, Johannes Weiner, Michal Hocko, Muchun Song,
Vlastimil Babka, Sebastian Andrzej Siewior, linux-mm, cgroups,
linux-kernel, Meta kernel team
On Sat, Mar 15, 2025 at 10:49:27AM -0700, Shakeel Butt wrote:
> Previously we could not call obj_cgroup_put() inside the local lock
> because, on the put of the last reference, the release function
> obj_cgroup_release() may try to re-acquire the local lock. However that
> chain has been broken. Now simply do obj_cgroup_put() inside
> drain_obj_stock() instead of returning the old objcg.
>
> Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
Nice one too!
Reviewed-by: Roman Gushchin <roman.gushchin@linux.dev>
^ permalink raw reply [flat|nested] 33+ messages in thread
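For readers following the thread, a rough sketch of the shape
drain_obj_stock() takes after this patch: the final reference drop now
happens right here, under the stock lock, which is only safe because
obj_cgroup_release() no longer touches the local stock (patch 5). The
flushing details are elided and approximate:

/* Rough sketch; the stat/byte flushing details are elided. */
static void drain_obj_stock(struct memcg_stock_pcp *stock)
{
	struct obj_cgroup *old = READ_ONCE(stock->cached_objcg);

	if (!old)
		return;

	/* ... flush stock->nr_bytes and the cached vmstat deltas ... */

	WRITE_ONCE(stock->cached_objcg, NULL);
	/*
	 * Previously the caller had to drop this reference outside the
	 * local lock; now the put can be done directly.
	 */
	obj_cgroup_put(old);
}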
* Re: [PATCH 7/9] memcg: use __mod_memcg_state in drain_obj_stock
2025-03-15 17:49 ` [PATCH 7/9] memcg: use __mod_memcg_state in drain_obj_stock Shakeel Butt
2025-03-17 20:56 ` Vlastimil Babka
@ 2025-03-18 1:13 ` Roman Gushchin
2025-03-18 7:50 ` Vlastimil Babka
1 sibling, 1 reply; 33+ messages in thread
From: Roman Gushchin @ 2025-03-18 1:13 UTC (permalink / raw)
To: Shakeel Butt
Cc: Andrew Morton, Johannes Weiner, Michal Hocko, Muchun Song,
Vlastimil Babka, Sebastian Andrzej Siewior, linux-mm, cgroups,
linux-kernel, Meta kernel team
On Sat, Mar 15, 2025 at 10:49:28AM -0700, Shakeel Butt wrote:
> For non-PREEMPT_RT kernels, drain_obj_stock() is always called with irq
> disabled, so we can use __mod_memcg_state() instead of
> mod_memcg_state(). For PREEMPT_RT, we need to add memcg_stats_[un]lock
> in __mod_memcg_state().
>
> Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
> Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> ---
> mm/memcontrol.c | 4 +++-
> 1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 3c4de384b5a0..dfe9c2eb7816 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -707,10 +707,12 @@ void __mod_memcg_state(struct mem_cgroup *memcg, enum memcg_stat_item idx,
> if (WARN_ONCE(BAD_STAT_IDX(i), "%s: missing stat item %d\n", __func__, idx))
> return;
>
> + memcg_stats_lock();
> __this_cpu_add(memcg->vmstats_percpu->state[i], val);
> val = memcg_state_val_in_pages(idx, val);
> memcg_rstat_updated(memcg, val);
> trace_mod_memcg_state(memcg, idx, val);
> + memcg_stats_unlock();
> }
>
> #ifdef CONFIG_MEMCG_V1
> @@ -2845,7 +2847,7 @@ static void drain_obj_stock(struct memcg_stock_pcp *stock)
VM_WARN_ON_IRQS_ENABLED() ?
Reviewed-by: Roman Gushchin <roman.gushchin@linux.dev>
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH 8/9] memcg: combine slab obj stock charging and accounting
2025-03-15 17:49 ` [PATCH 8/9] memcg: combine slab obj stock charging and accounting Shakeel Butt
@ 2025-03-18 1:20 ` Roman Gushchin
0 siblings, 0 replies; 33+ messages in thread
From: Roman Gushchin @ 2025-03-18 1:20 UTC (permalink / raw)
To: Shakeel Butt
Cc: Andrew Morton, Johannes Weiner, Michal Hocko, Muchun Song,
Vlastimil Babka, Sebastian Andrzej Siewior, linux-mm, cgroups,
linux-kernel, Meta kernel team
On Sat, Mar 15, 2025 at 10:49:29AM -0700, Shakeel Butt wrote:
> From: Vlastimil Babka <vbabka@suse.cz>
>
> When handling slab objects, we use obj_cgroup_[un]charge() for
> (un)charging and mod_objcg_state() to account NR_SLAB_[UN]RECLAIMABLE_B.
> All these operations use the percpu stock for performance. However with
> the calls being separate, the stock_lock is taken twice in each case.
>
> By refactoring the code, we can turn mod_objcg_state() into
> __account_obj_stock() which is called on a stock that's already locked
> and validated. On the charging side we can call this function from
> consume_obj_stock() when it succeeds, and refill_obj_stock() in the
> fallback. We just expand parameters of these functions as necessary.
> The uncharge side from __memcg_slab_free_hook() is just the call to
> refill_obj_stock().
>
> Other callers of obj_cgroup_[un]charge() (i.e. not slab) simply pass the
> extra parameters as NULL/zeroes to skip the __account_obj_stock()
> operation.
>
> In __memcg_slab_post_alloc_hook() we now charge each object separately,
> but that's not a problem as we did call mod_objcg_state() for each
> object separately, and most allocations are non-bulk anyway. This
> could be improved by batching all operations until slab_pgdat(slab)
> changes.
>
> Some preliminary benchmarking with a kfree(kmalloc()) loop of 10M
> iterations with/without __GFP_ACCOUNT:
>
> Before the patch:
> kmalloc/kfree !memcg: 581390144 cycles
> kmalloc/kfree memcg: 783689984 cycles
>
> After the patch:
> kmalloc/kfree memcg: 658723808 cycles
>
> More than half of the overhead of __GFP_ACCOUNT relative to the
> non-accounted case seems to be eliminated.
Oh, this is huge!
I believe the next step is to also integrate the refcnt management;
it might shave off a few more percent.
Reviewed-by: Roman Gushchin <roman.gushchin@linux.dev>
^ permalink raw reply [flat|nested] 33+ messages in thread
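The benchmark behind the numbers above is not part of the patch; a minimal
sketch of that kind of kmalloc()/kfree() microbenchmark, written as a
throwaway kernel module, could look like the following. The module name,
object size and cycle counting here are illustrative assumptions, not the
harness actually used for the quoted figures:

/* Illustrative microbenchmark sketch, not the actual harness. */
#include <linux/module.h>
#include <linux/slab.h>
#include <linux/timex.h>

static int __init kmalloc_bench_init(void)
{
	cycles_t start, end;
	unsigned long i;

	start = get_cycles();
	for (i = 0; i < 10 * 1000 * 1000UL; i++) {
		/* drop __GFP_ACCOUNT for the !memcg baseline */
		void *p = kmalloc(64, GFP_KERNEL | __GFP_ACCOUNT);

		if (!p)
			break;
		kfree(p);
	}
	end = get_cycles();

	pr_info("kmalloc/kfree memcg: %llu cycles\n",
		(unsigned long long)(end - start));
	return 0;
}
module_init(kmalloc_bench_init);

MODULE_LICENSE("GPL");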
* Re: [PATCH 9/9] memcg: manually inline replace_stock_objcg
2025-03-15 17:49 ` [PATCH 9/9] memcg: manually inline replace_stock_objcg Shakeel Butt
@ 2025-03-18 1:21 ` Roman Gushchin
2025-03-18 8:00 ` Vlastimil Babka
1 sibling, 0 replies; 33+ messages in thread
From: Roman Gushchin @ 2025-03-18 1:21 UTC (permalink / raw)
To: Shakeel Butt
Cc: Andrew Morton, Johannes Weiner, Michal Hocko, Muchun Song,
Vlastimil Babka, Sebastian Andrzej Siewior, linux-mm, cgroups,
linux-kernel, Meta kernel team
On Sat, Mar 15, 2025 at 10:49:30AM -0700, Shakeel Butt wrote:
> replace_stock_objcg() is only called by refill_obj_stock(), so
> manually inline it.
>
> Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
Reviewed-by: Roman Gushchin <roman.gushchin@linux.dev>
Thanks!
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH 7/9] memcg: use __mod_memcg_state in drain_obj_stock
2025-03-18 1:13 ` Roman Gushchin
@ 2025-03-18 7:50 ` Vlastimil Babka
0 siblings, 0 replies; 33+ messages in thread
From: Vlastimil Babka @ 2025-03-18 7:50 UTC (permalink / raw)
To: Roman Gushchin, Shakeel Butt
Cc: Andrew Morton, Johannes Weiner, Michal Hocko, Muchun Song,
Sebastian Andrzej Siewior, linux-mm, cgroups, linux-kernel,
Meta kernel team
On 3/18/25 02:13, Roman Gushchin wrote:
> On Sat, Mar 15, 2025 at 10:49:28AM -0700, Shakeel Butt wrote:
>> For non-PREEMPT_RT kernels, drain_obj_stock() is always called with irq
>> disabled, so we can use __mod_memcg_state() instead of
>> mod_memcg_state(). For PREEMPT_RT, we need to add memcg_stats_[un]lock
>> in __mod_memcg_state().
>>
>> Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
>> Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
>> ---
>> mm/memcontrol.c | 4 +++-
>> 1 file changed, 3 insertions(+), 1 deletion(-)
>>
>> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
>> index 3c4de384b5a0..dfe9c2eb7816 100644
>> --- a/mm/memcontrol.c
>> +++ b/mm/memcontrol.c
>> @@ -707,10 +707,12 @@ void __mod_memcg_state(struct mem_cgroup *memcg, enum memcg_stat_item idx,
>> if (WARN_ONCE(BAD_STAT_IDX(i), "%s: missing stat item %d\n", __func__, idx))
>> return;
>>
>> + memcg_stats_lock();
>> __this_cpu_add(memcg->vmstats_percpu->state[i], val);
>> val = memcg_state_val_in_pages(idx, val);
>> memcg_rstat_updated(memcg, val);
>> trace_mod_memcg_state(memcg, idx, val);
>> + memcg_stats_unlock();
>> }
>>
>> #ifdef CONFIG_MEMCG_V1
>> @@ -2845,7 +2847,7 @@ static void drain_obj_stock(struct memcg_stock_pcp *stock)
>
> VM_WARN_ON_IRQS_ENABLED() ?
It's part of memcg_stats_lock()
> Reviewed-by: Roman Gushchin <roman.gushchin@linux.dev>
^ permalink raw reply [flat|nested] 33+ messages in thread
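For context on the short exchange above: the assertion Roman asked about is
already part of memcg_stats_lock() itself. At the base of this series the
helpers in mm/memcontrol.c look roughly like this (paraphrased, not a quote
of the exact source):

/* Paraphrased from mm/memcontrol.c; the exact source may differ. */
static void memcg_stats_lock(void)
{
	preempt_disable_nested();	/* preempt_disable() only on PREEMPT_RT */
	VM_WARN_ON_IRQS_ENABLED();	/* IRQ check on !PREEMPT_RT debug builds */
}

static void memcg_stats_unlock(void)
{
	preempt_enable_nested();
}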
* Re: [PATCH 4/9] memcg: manually inline __refill_stock
2025-03-18 0:58 ` Roman Gushchin
@ 2025-03-18 7:58 ` Vlastimil Babka
0 siblings, 0 replies; 33+ messages in thread
From: Vlastimil Babka @ 2025-03-18 7:58 UTC (permalink / raw)
To: Roman Gushchin, Shakeel Butt
Cc: Andrew Morton, Johannes Weiner, Michal Hocko, Muchun Song,
Sebastian Andrzej Siewior, linux-mm, cgroups, linux-kernel,
Meta kernel team
On 3/18/25 01:58, Roman Gushchin wrote:
> On Sat, Mar 15, 2025 at 10:49:25AM -0700, Shakeel Butt wrote:
>> __refill_stock() no longer has multiple callers, so simply inline it
>> into refill_stock().
>>
>> Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
>> Acked-by: Vlastimil Babka <vbabka@suse.cz>
>> ---
>> mm/memcontrol.c | 32 ++++++++++++--------------------
>> 1 file changed, 12 insertions(+), 20 deletions(-)
>>
>> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
>> index b54e3a1d23bd..7054b0ebd207 100644
>> --- a/mm/memcontrol.c
>> +++ b/mm/memcontrol.c
>> @@ -1865,14 +1865,21 @@ static void drain_local_stock(struct work_struct *dummy)
>> obj_cgroup_put(old);
>> }
>>
>> -/*
>> - * Cache charges(val) to local per_cpu area.
>> - * This will be consumed by consume_stock() function, later.
>> - */
>> -static void __refill_stock(struct mem_cgroup *memcg, unsigned int nr_pages)
>> +/* Should never be called with root_mem_cgroup. */
>
> How about adding something like this?
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 768d6b15dbfa..5c26002f2168 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -1881,6 +1881,8 @@ static void refill_stock(struct mem_cgroup *memcg, unsigned int nr_pages)
> {
> unsigned long flags;
>
> + VM_WARN_ON_ONCE(mem_cgroup_is_root(memcg));
Already in patch 1 though? But I agree.
> +
> local_lock_irqsave(&memcg_stock.stock_lock, flags);
> __refill_stock(memcg, nr_pages);
> local_unlock_irqrestore(&memcg_stock.stock_lock, flags);
>
>
> Other than that,
> Reviewed-by: Roman Gushchin <roman.gushchin@linux.dev>
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH 1/9] memcg: remove root memcg check from refill_stock
2025-03-15 17:49 ` [PATCH 1/9] memcg: remove root memcg check from refill_stock Shakeel Butt
2025-03-18 0:39 ` Roman Gushchin
@ 2025-03-18 7:59 ` Vlastimil Babka
2025-03-21 16:55 ` Shakeel Butt
1 sibling, 1 reply; 33+ messages in thread
From: Vlastimil Babka @ 2025-03-18 7:59 UTC (permalink / raw)
To: Shakeel Butt, Andrew Morton
Cc: Johannes Weiner, Michal Hocko, Roman Gushchin, Muchun Song,
Sebastian Andrzej Siewior, linux-mm, cgroups, linux-kernel,
Meta kernel team
On 3/15/25 18:49, Shakeel Butt wrote:
> refill_stock() cannot be called with the root memcg, so there is no
> need to check for it.
>
> Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
It's not trivial to verify this, so I'd add
VM_WARN_ON_ONCE(mem_cgroup_is_root(memcg)); as Roman suggested in the patch 4 reply.
> ---
> mm/memcontrol.c | 3 +--
> 1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index b29433eb17fa..c09a32e93d39 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -1883,6 +1883,7 @@ static void __refill_stock(struct mem_cgroup *memcg, unsigned int nr_pages)
> drain_stock(stock);
> }
>
> +/* Should never be called with root_mem_cgroup. */
> static void refill_stock(struct mem_cgroup *memcg, unsigned int nr_pages)
> {
> unsigned long flags;
> @@ -1892,8 +1893,6 @@ static void refill_stock(struct mem_cgroup *memcg, unsigned int nr_pages)
> * In case of unlikely failure to lock percpu stock_lock
> * uncharge memcg directly.
> */
> - if (mem_cgroup_is_root(memcg))
> - return;
> page_counter_uncharge(&memcg->memory, nr_pages);
> if (do_memsw_account())
> page_counter_uncharge(&memcg->memsw, nr_pages);
^ permalink raw reply [flat|nested] 33+ messages in thread
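The open-coded page_counter_uncharge() pair visible in the hunk above is
what patch 3 of the series ("memcg: introduce memcg_uncharge", not quoted in
this excerpt) factors out into a helper. Roughly, as a sketch of the idea
rather than the exact patch:

/* Sketch of the helper introduced by patch 3; the exact form may differ. */
static void memcg_uncharge(struct mem_cgroup *memcg, unsigned int nr_pages)
{
	page_counter_uncharge(&memcg->memory, nr_pages);
	if (do_memsw_account())
		page_counter_uncharge(&memcg->memsw, nr_pages);
}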
* Re: [PATCH 9/9] memcg: manually inline replace_stock_objcg
2025-03-15 17:49 ` [PATCH 9/9] memcg: manually inline replace_stock_objcg Shakeel Butt
2025-03-18 1:21 ` Roman Gushchin
@ 2025-03-18 8:00 ` Vlastimil Babka
1 sibling, 0 replies; 33+ messages in thread
From: Vlastimil Babka @ 2025-03-18 8:00 UTC (permalink / raw)
To: Shakeel Butt, Andrew Morton
Cc: Johannes Weiner, Michal Hocko, Roman Gushchin, Muchun Song,
Sebastian Andrzej Siewior, linux-mm, cgroups, linux-kernel,
Meta kernel team
On 3/15/25 18:49, Shakeel Butt wrote:
> replace_stock_objcg() is only called by refill_obj_stock(), so
> manually inline it.
>
> Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
> ---
> mm/memcontrol.c | 18 ++++++------------
> 1 file changed, 6 insertions(+), 12 deletions(-)
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 553eb1d7250a..f6e3fc418866 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -2742,17 +2742,6 @@ void __memcg_kmem_uncharge_page(struct page *page, int order)
> obj_cgroup_put(objcg);
> }
>
> -/* Replace the stock objcg with objcg, return the old objcg */
> -static void replace_stock_objcg(struct memcg_stock_pcp *stock,
> - struct obj_cgroup *objcg)
> -{
> - drain_obj_stock(stock);
> - obj_cgroup_get(objcg);
> - stock->nr_bytes = atomic_read(&objcg->nr_charged_bytes)
> - ? atomic_xchg(&objcg->nr_charged_bytes, 0) : 0;
> - WRITE_ONCE(stock->cached_objcg, objcg);
> -}
> -
> static void __account_obj_stock(struct obj_cgroup *objcg,
> struct memcg_stock_pcp *stock, int nr,
> struct pglist_data *pgdat, enum node_stat_item idx)
> @@ -2913,7 +2902,12 @@ static void refill_obj_stock(struct obj_cgroup *objcg, unsigned int nr_bytes,
>
> stock = this_cpu_ptr(&memcg_stock);
> if (READ_ONCE(stock->cached_objcg) != objcg) { /* reset if necessary */
> - replace_stock_objcg(stock, objcg);
> + drain_obj_stock(stock);
> + obj_cgroup_get(objcg);
> + stock->nr_bytes = atomic_read(&objcg->nr_charged_bytes)
> + ? atomic_xchg(&objcg->nr_charged_bytes, 0) : 0;
> + WRITE_ONCE(stock->cached_objcg, objcg);
> +
> allow_uncharge = true; /* Allow uncharge when objcg changes */
> }
> stock->nr_bytes += nr_bytes;
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH 7/9] memcg: use __mod_memcg_state in drain_obj_stock
2025-03-17 21:54 ` Shakeel Butt
@ 2025-03-18 8:02 ` Vlastimil Babka
0 siblings, 0 replies; 33+ messages in thread
From: Vlastimil Babka @ 2025-03-18 8:02 UTC (permalink / raw)
To: Shakeel Butt
Cc: Andrew Morton, Johannes Weiner, Michal Hocko, Roman Gushchin,
Muchun Song, Sebastian Andrzej Siewior, linux-mm, cgroups,
linux-kernel, Meta kernel team
On 3/17/25 22:54, Shakeel Butt wrote:
> On Mon, Mar 17, 2025 at 09:56:39PM +0100, Vlastimil Babka wrote:
>> On 3/15/25 18:49, Shakeel Butt wrote:
>> > For non-PREEMPT_RT kernels, drain_obj_stock() is always called with irq
>> > disabled, so we can use __mod_memcg_state() instead of
>> > mod_memcg_state(). For PREEMPT_RT, we need to add memcg_stats_[un]lock
>> > in __mod_memcg_state().
>> >
>> > Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
>> > Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
>>
>> I've asked in the RFC and from Sebastian's answer I think my question was
>> misunderstood, so let me try again.
>>
>> After this patch we'll have from mod_memcg_state():
>>
>> mod_memcg_state()
>> local_irq_save(flags);
>> __mod_memcg_state()
>> memcg_stats_lock(); <- new and unnecessary?
>>
>> Instead of modifying __mod_memcg_state() we could be more targetted and just
>> do in drain_obj_stock():
>>
>> memcg_stats_lock();
>> __mod_memcg_state();
>> memcg_stats_unlock();
>>
>> Am I missing something?
>
> This seems unnecessary because this patch is adding the first user of
> __mod_memcg_state()
You mean the first user other than mod_memcg_state() itself.
> but I think maintainability is better with
> memcg_stats_[un]lock() inside __mod_memcg_state().
>
> Let's take the example of __mod_memcg_lruvec_state(). It is being
> called from places where, on a non-RT kernel, the irqs are disabled
> through spin_lock_irq*, so on an RT kernel the irqs would not be
> disabled and preemption thus explicitly needs to be disabled. What if,
> in the future, __mod_memcg_state() is used by a caller which assumes
> preemption is disabled through irq disabling? The RT kernel would be
> buggy there.
>
> I am not sure if it is easy to force the future users to explicitly add
> memcg_stats_[un]lock() across the call to __mod_memcg_state().
I see the point. Well, at least memcg_stats_lock() isn't expensive, and
it's a no-op on non-debug !RT anyway.
Acked-by: Vlastimil Babka <vbabka@suse.cz>
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH 1/9] memcg: remove root memcg check from refill_stock
2025-03-18 7:59 ` Vlastimil Babka
@ 2025-03-21 16:55 ` Shakeel Butt
0 siblings, 0 replies; 33+ messages in thread
From: Shakeel Butt @ 2025-03-21 16:55 UTC (permalink / raw)
To: Vlastimil Babka
Cc: Andrew Morton, Johannes Weiner, Michal Hocko, Roman Gushchin,
Muchun Song, Sebastian Andrzej Siewior, linux-mm, cgroups,
linux-kernel, Meta kernel team
On Tue, Mar 18, 2025 at 08:59:59AM +0100, Vlastimil Babka wrote:
> On 3/15/25 18:49, Shakeel Butt wrote:
> > refill_stock() cannot be called with the root memcg, so there is no
> > need to check for it.
> >
> > Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
>
> Acked-by: Vlastimil Babka <vbabka@suse.cz>
>
> It's not trivial to verify this so I'd add
> VM_WARN_ON_ONCE(mem_cgroup_is_root(memcg)); as Roman suggested in patch 4 reply.
>
Ack, will do in the next version.
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH 0/9] memcg: cleanup per-cpu stock
2025-03-15 17:49 [PATCH 0/9] memcg: cleanup per-cpu stock Shakeel Butt
` (9 preceding siblings ...)
2025-03-16 3:57 ` [PATCH 0/9] memcg: cleanup per-cpu stock Andrew Morton
@ 2025-04-02 20:40 ` Shakeel Butt
10 siblings, 0 replies; 33+ messages in thread
From: Shakeel Butt @ 2025-04-02 20:40 UTC (permalink / raw)
To: Andrew Morton
Cc: Johannes Weiner, Michal Hocko, Roman Gushchin, Muchun Song,
Vlastimil Babka, Sebastian Andrzej Siewior, linux-mm, cgroups,
linux-kernel, Meta kernel team
On Sat, Mar 15, 2025 at 10:49:21AM -0700, Shakeel Butt wrote:
> This is a cleanup series which is trying to simplify the memcg per-cpu
> stock code, particularly it tries to remove unnecessary dependencies on
> local_lock of per-cpu memcg stock. The eighth patch from Vlastimil
> optimizes the charge path by combining the charging and accounting.
>
> This series is based on next-20250314 plus two following patches:
>
> Link: https://lore.kernel.org/all/20250312222552.3284173-1-shakeel.butt@linux.dev/
> Link: https://lore.kernel.org/all/20250313054812.2185900-1-shakeel.butt@linux.dev/
>
> Shakeel Butt (8):
> memcg: remove root memcg check from refill_stock
> memcg: decouple drain_obj_stock from local stock
> memcg: introduce memcg_uncharge
> memcg: manually inline __refill_stock
> memcg: no refilling stock from obj_cgroup_release
> memcg: do obj_cgroup_put inside drain_obj_stock
> memcg: use __mod_memcg_state in drain_obj_stock
> memcg: manually inline replace_stock_objcg
>
> Vlastimil Babka (1):
> memcg: combine slab obj stock charging and accounting
>
> mm/memcontrol.c | 195 +++++++++++++++++++++++-------------------------
> 1 file changed, 95 insertions(+), 100 deletions(-)
>
I am waiting for [1] to be merged into Linus' tree, and then I will rebase
and resend this series.
[1] https://lore.kernel.org/all/20250401205245.70838-1-alexei.starovoitov@gmail.com/
^ permalink raw reply [flat|nested] 33+ messages in thread
end of thread
Thread overview: 33+ messages
2025-03-15 17:49 [PATCH 0/9] memcg: cleanup per-cpu stock Shakeel Butt
2025-03-15 17:49 ` [PATCH 1/9] memcg: remove root memcg check from refill_stock Shakeel Butt
2025-03-18 0:39 ` Roman Gushchin
2025-03-18 7:59 ` Vlastimil Babka
2025-03-21 16:55 ` Shakeel Butt
2025-03-15 17:49 ` [PATCH 2/9] memcg: decouple drain_obj_stock from local stock Shakeel Butt
2025-03-18 0:44 ` Roman Gushchin
2025-03-15 17:49 ` [PATCH 3/9] memcg: introduce memcg_uncharge Shakeel Butt
2025-03-18 0:50 ` Roman Gushchin
2025-03-15 17:49 ` [PATCH 4/9] memcg: manually inline __refill_stock Shakeel Butt
2025-03-18 0:58 ` Roman Gushchin
2025-03-18 7:58 ` Vlastimil Babka
2025-03-15 17:49 ` [PATCH 5/9] memcg: no refilling stock from obj_cgroup_release Shakeel Butt
2025-03-18 1:06 ` Roman Gushchin
2025-03-15 17:49 ` [PATCH 6/9] memcg: do obj_cgroup_put inside drain_obj_stock Shakeel Butt
2025-03-18 1:07 ` Roman Gushchin
2025-03-15 17:49 ` [PATCH 7/9] memcg: use __mod_memcg_state in drain_obj_stock Shakeel Butt
2025-03-17 20:56 ` Vlastimil Babka
2025-03-17 21:54 ` Shakeel Butt
2025-03-18 8:02 ` Vlastimil Babka
2025-03-18 1:13 ` Roman Gushchin
2025-03-18 7:50 ` Vlastimil Babka
2025-03-15 17:49 ` [PATCH 8/9] memcg: combine slab obj stock charging and accounting Shakeel Butt
2025-03-18 1:20 ` Roman Gushchin
2025-03-15 17:49 ` [PATCH 9/9] memcg: manually inline replace_stock_objcg Shakeel Butt
2025-03-18 1:21 ` Roman Gushchin
2025-03-18 8:00 ` Vlastimil Babka
2025-03-16 3:57 ` [PATCH 0/9] memcg: cleanup per-cpu stock Andrew Morton
2025-03-16 4:43 ` Shakeel Butt
2025-03-16 15:59 ` Alexei Starovoitov
2025-03-17 18:11 ` Shakeel Butt
2025-03-17 20:27 ` Andrew Morton
2025-04-02 20:40 ` Shakeel Butt