linux-mm.kvack.org archive mirror
* [RFC PATCH 00/10] memcg: stock code cleanups
@ 2025-03-14  6:15 Shakeel Butt
  2025-03-14  6:15 ` [RFC PATCH 01/10] memcg: remove root memcg check from refill_stock Shakeel Butt
                   ` (10 more replies)
  0 siblings, 11 replies; 34+ messages in thread
From: Shakeel Butt @ 2025-03-14  6:15 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Johannes Weiner, Michal Hocko, Roman Gushchin, Muchun Song,
	Vlastimil Babka, Sebastian Andrzej Siewior, linux-mm, cgroups,
	linux-kernel, Meta kernel team

This is a cleanup series: the first 7 patches try to simplify the memcg
stock code, in particular by removing unnecessary dependencies. The
last 3 patches are a prototype to make the per-cpu memcg stock work
without disabling irqs.

My plan is to send out the first 7 cleanup patches separately for the
next release window, and to iterate more on the last 3 patches as well
as add functionality for multiple memcgs.
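
To give an idea of where the prototype ends up, below is a rough sketch
(not an actual hunk from the series; stats handling and the batch drain
are elided) of refill_stock() after the last patch. The stock_lock is
only ever trylocked, irqs stay enabled, and the unlikely contended case
falls back to uncharging directly:

  static void refill_stock(struct mem_cgroup *memcg, unsigned int nr_pages)
  {
  	struct memcg_stock_pcp *stock;

  	if (!localtry_trylock(&memcg_stock.stock_lock)) {
  		/* e.g. we interrupted the stock code on this cpu: uncharge directly */
  		memcg_uncharge(memcg, nr_pages);
  		return;
  	}

  	stock = this_cpu_ptr(&memcg_stock);
  	/* ... cache nr_pages in stock->nr_pages, drain if over the batch ... */

  	localtry_unlock(&memcg_stock.stock_lock);
  }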

This series is based on next-20250313 plus the following two patches to
simplify the memcg stock code:

Link: https://lore.kernel.org/all/20250312222552.3284173-1-shakeel.butt@linux.dev/
Link: https://lore.kernel.org/all/20250313054812.2185900-1-shakeel.butt@linux.dev/


Shakeel Butt (10):
  memcg: remove root memcg check from refill_stock
  memcg: decouple drain_obj_stock from local stock
  memcg: introduce memcg_uncharge
  memcg: manually inline __refill_stock
  memcg: no refilling stock from obj_cgroup_release
  memcg: do obj_cgroup_put inside drain_obj_stock
  memcg: use __mod_memcg_state in drain_obj_stock
  memcg: assert in_task for couple of local_lock holders
  memcg: trylock stock for objcg
  memcg: no more irq disabling for stock locks

 mm/memcontrol.c | 201 +++++++++++++++++++++++++++---------------------
 1 file changed, 112 insertions(+), 89 deletions(-)

-- 
2.47.1




* [RFC PATCH 01/10] memcg: remove root memcg check from refill_stock
  2025-03-14  6:15 [RFC PATCH 00/10] memcg: stock code cleanups Shakeel Butt
@ 2025-03-14  6:15 ` Shakeel Butt
  2025-03-14  6:15 ` [RFC PATCH 02/10] memcg: decouple drain_obj_stock from local stock Shakeel Butt
                   ` (9 subsequent siblings)
  10 siblings, 0 replies; 34+ messages in thread
From: Shakeel Butt @ 2025-03-14  6:15 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Johannes Weiner, Michal Hocko, Roman Gushchin, Muchun Song,
	Vlastimil Babka, Sebastian Andrzej Siewior, linux-mm, cgroups,
	linux-kernel, Meta kernel team

refill_stock() cannot be called with the root memcg, so there is no
need to check for it.

Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
---
 mm/memcontrol.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index b29433eb17fa..c09a32e93d39 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1883,6 +1883,7 @@ static void __refill_stock(struct mem_cgroup *memcg, unsigned int nr_pages)
 		drain_stock(stock);
 }
 
+/* Should never be called with root_mem_cgroup. */
 static void refill_stock(struct mem_cgroup *memcg, unsigned int nr_pages)
 {
 	unsigned long flags;
@@ -1892,8 +1893,6 @@ static void refill_stock(struct mem_cgroup *memcg, unsigned int nr_pages)
 		 * In case of unlikely failure to lock percpu stock_lock
 		 * uncharge memcg directly.
 		 */
-		if (mem_cgroup_is_root(memcg))
-			return;
 		page_counter_uncharge(&memcg->memory, nr_pages);
 		if (do_memsw_account())
 			page_counter_uncharge(&memcg->memsw, nr_pages);
-- 
2.47.1




* [RFC PATCH 02/10] memcg: decouple drain_obj_stock from local stock
  2025-03-14  6:15 [RFC PATCH 00/10] memcg: stock code cleanups Shakeel Butt
  2025-03-14  6:15 ` [RFC PATCH 01/10] memcg: remove root memcg check from refill_stock Shakeel Butt
@ 2025-03-14  6:15 ` Shakeel Butt
  2025-03-14  9:57   ` Vlastimil Babka
  2025-03-14  6:15 ` [RFC PATCH 03/10] memcg: introduce memcg_uncharge Shakeel Butt
                   ` (8 subsequent siblings)
  10 siblings, 1 reply; 34+ messages in thread
From: Shakeel Butt @ 2025-03-14  6:15 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Johannes Weiner, Michal Hocko, Roman Gushchin, Muchun Song,
	Vlastimil Babka, Sebastian Andrzej Siewior, linux-mm, cgroups,
	linux-kernel, Meta kernel team

Currently drain_obj_stock() can potentially call __refill_stock(),
which accesses the local cpu stock and thus requires the memcg stock's
local_lock. However, if we look at the code paths leading to
drain_obj_stock(), there is never a good reason to refill the memcg
stock from it at all.

At the moment, drain_obj_stock() can be called from reclaim, hotplug
cpu teardown, mod_objcg_state() and refill_obj_stock(). For reclaim and
hotplug there is no need to refill. For the other two paths, most
probably the newly switched objcg will be used in the near future, so
there is no need to refill the stock with the older objcg.

In addition, __refill_stock() from drain_obj_stock() happens only in
rare cases, so performance is not really an issue. Let's just uncharge
directly instead of refilling, which also decouples drain_obj_stock()
from the local cpu stock and its local_lock requirement.

Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
---
 mm/memcontrol.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index c09a32e93d39..28cb75b5bc66 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2855,7 +2855,12 @@ static struct obj_cgroup *drain_obj_stock(struct memcg_stock_pcp *stock)
 
 			mod_memcg_state(memcg, MEMCG_KMEM, -nr_pages);
 			memcg1_account_kmem(memcg, -nr_pages);
-			__refill_stock(memcg, nr_pages);
+			if (!mem_cgroup_is_root(memcg)) {
+				page_counter_uncharge(&memcg->memory, nr_pages);
+				if (do_memsw_account())
+					page_counter_uncharge(&memcg->memsw,
+							      nr_pages);
+			}
 
 			css_put(&memcg->css);
 		}
-- 
2.47.1




* [RFC PATCH 03/10] memcg: introduce memcg_uncharge
  2025-03-14  6:15 [RFC PATCH 00/10] memcg: stock code cleanups Shakeel Butt
  2025-03-14  6:15 ` [RFC PATCH 01/10] memcg: remove root memcg check from refill_stock Shakeel Butt
  2025-03-14  6:15 ` [RFC PATCH 02/10] memcg: decouple drain_obj_stock from local stock Shakeel Butt
@ 2025-03-14  6:15 ` Shakeel Butt
  2025-03-14 10:01   ` Vlastimil Babka
  2025-03-14  6:15 ` [RFC PATCH 04/10] memcg: manually inline __refill_stock Shakeel Butt
                   ` (7 subsequent siblings)
  10 siblings, 1 reply; 34+ messages in thread
From: Shakeel Butt @ 2025-03-14  6:15 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Johannes Weiner, Michal Hocko, Roman Gushchin, Muchun Song,
	Vlastimil Babka, Sebastian Andrzej Siewior, linux-mm, cgroups,
	linux-kernel, Meta kernel team

At multiple places in memcontrol.c, the memory and memsw page counters
are being uncharged. This is error-prone. Let's move the functionality
to a newly introduced memcg_uncharge and call it from all those places.

Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
---
 mm/memcontrol.c | 28 ++++++++++++----------------
 1 file changed, 12 insertions(+), 16 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 28cb75b5bc66..b54e3a1d23bd 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1816,6 +1816,13 @@ static bool consume_stock(struct mem_cgroup *memcg, unsigned int nr_pages,
 	return ret;
 }
 
+static void memcg_uncharge(struct mem_cgroup *memcg, unsigned int nr_pages)
+{
+	page_counter_uncharge(&memcg->memory, nr_pages);
+	if (do_memsw_account())
+		page_counter_uncharge(&memcg->memsw, nr_pages);
+}
+
 /*
  * Returns stocks cached in percpu and reset cached information.
  */
@@ -1828,10 +1835,7 @@ static void drain_stock(struct memcg_stock_pcp *stock)
 		return;
 
 	if (stock_pages) {
-		page_counter_uncharge(&old->memory, stock_pages);
-		if (do_memsw_account())
-			page_counter_uncharge(&old->memsw, stock_pages);
-
+		memcg_uncharge(old, stock_pages);
 		WRITE_ONCE(stock->nr_pages, 0);
 	}
 
@@ -1893,9 +1897,7 @@ static void refill_stock(struct mem_cgroup *memcg, unsigned int nr_pages)
 		 * In case of unlikely failure to lock percpu stock_lock
 		 * uncharge memcg directly.
 		 */
-		page_counter_uncharge(&memcg->memory, nr_pages);
-		if (do_memsw_account())
-			page_counter_uncharge(&memcg->memsw, nr_pages);
+		memcg_uncharge(memcg, nr_pages);
 		return;
 	}
 	__refill_stock(memcg, nr_pages);
@@ -2855,12 +2857,8 @@ static struct obj_cgroup *drain_obj_stock(struct memcg_stock_pcp *stock)
 
 			mod_memcg_state(memcg, MEMCG_KMEM, -nr_pages);
 			memcg1_account_kmem(memcg, -nr_pages);
-			if (!mem_cgroup_is_root(memcg)) {
-				page_counter_uncharge(&memcg->memory, nr_pages);
-				if (do_memsw_account())
-					page_counter_uncharge(&memcg->memsw,
-							      nr_pages);
-			}
+			if (!mem_cgroup_is_root(memcg))
+				memcg_uncharge(memcg, nr_pages);
 
 			css_put(&memcg->css);
 		}
@@ -4689,9 +4687,7 @@ static inline void uncharge_gather_clear(struct uncharge_gather *ug)
 static void uncharge_batch(const struct uncharge_gather *ug)
 {
 	if (ug->nr_memory) {
-		page_counter_uncharge(&ug->memcg->memory, ug->nr_memory);
-		if (do_memsw_account())
-			page_counter_uncharge(&ug->memcg->memsw, ug->nr_memory);
+		memcg_uncharge(ug->memcg, ug->nr_memory);
 		if (ug->nr_kmem) {
 			mod_memcg_state(ug->memcg, MEMCG_KMEM, -ug->nr_kmem);
 			memcg1_account_kmem(ug->memcg, -ug->nr_kmem);
-- 
2.47.1




* [RFC PATCH 04/10] memcg: manually inline __refill_stock
  2025-03-14  6:15 [RFC PATCH 00/10] memcg: stock code cleanups Shakeel Butt
                   ` (2 preceding siblings ...)
  2025-03-14  6:15 ` [RFC PATCH 03/10] memcg: introduce memcg_uncharge Shakeel Butt
@ 2025-03-14  6:15 ` Shakeel Butt
  2025-03-14 10:05   ` Vlastimil Babka
  2025-03-14  6:15 ` [RFC PATCH 05/10] memcg: no refilling stock from obj_cgroup_release Shakeel Butt
                   ` (6 subsequent siblings)
  10 siblings, 1 reply; 34+ messages in thread
From: Shakeel Butt @ 2025-03-14  6:15 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Johannes Weiner, Michal Hocko, Roman Gushchin, Muchun Song,
	Vlastimil Babka, Sebastian Andrzej Siewior, linux-mm, cgroups,
	linux-kernel, Meta kernel team

__refill_stock() no longer has multiple callers, so simply inline it
into refill_stock().

Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
---
 mm/memcontrol.c | 32 ++++++++++++--------------------
 1 file changed, 12 insertions(+), 20 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index b54e3a1d23bd..7054b0ebd207 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1865,14 +1865,21 @@ static void drain_local_stock(struct work_struct *dummy)
 	obj_cgroup_put(old);
 }
 
-/*
- * Cache charges(val) to local per_cpu area.
- * This will be consumed by consume_stock() function, later.
- */
-static void __refill_stock(struct mem_cgroup *memcg, unsigned int nr_pages)
+/* Should never be called with root_mem_cgroup. */
+static void refill_stock(struct mem_cgroup *memcg, unsigned int nr_pages)
 {
 	struct memcg_stock_pcp *stock;
 	unsigned int stock_pages;
+	unsigned long flags;
+
+	if (!localtry_trylock_irqsave(&memcg_stock.stock_lock, flags)) {
+		/*
+		 * In case of unlikely failure to lock percpu stock_lock
+		 * uncharge memcg directly.
+		 */
+		memcg_uncharge(memcg, nr_pages);
+		return;
+	}
 
 	stock = this_cpu_ptr(&memcg_stock);
 	if (READ_ONCE(stock->cached) != memcg) { /* reset if necessary */
@@ -1885,22 +1892,7 @@ static void __refill_stock(struct mem_cgroup *memcg, unsigned int nr_pages)
 
 	if (stock_pages > MEMCG_CHARGE_BATCH)
 		drain_stock(stock);
-}
 
-/* Should never be called with root_mem_cgroup. */
-static void refill_stock(struct mem_cgroup *memcg, unsigned int nr_pages)
-{
-	unsigned long flags;
-
-	if (!localtry_trylock_irqsave(&memcg_stock.stock_lock, flags)) {
-		/*
-		 * In case of unlikely failure to lock percpu stock_lock
-		 * uncharge memcg directly.
-		 */
-		memcg_uncharge(memcg, nr_pages);
-		return;
-	}
-	__refill_stock(memcg, nr_pages);
 	localtry_unlock_irqrestore(&memcg_stock.stock_lock, flags);
 }
 
-- 
2.47.1




* [RFC PATCH 05/10] memcg: no refilling stock from obj_cgroup_release
  2025-03-14  6:15 [RFC PATCH 00/10] memcg: stock code cleanups Shakeel Butt
                   ` (3 preceding siblings ...)
  2025-03-14  6:15 ` [RFC PATCH 04/10] memcg: manually inline __refill_stock Shakeel Butt
@ 2025-03-14  6:15 ` Shakeel Butt
  2025-03-14 10:09   ` Vlastimil Babka
  2025-03-14 11:26   ` Sebastian Andrzej Siewior
  2025-03-14  6:15 ` [RFC PATCH 06/10] memcg: do obj_cgroup_put inside drain_obj_stock Shakeel Butt
                   ` (5 subsequent siblings)
  10 siblings, 2 replies; 34+ messages in thread
From: Shakeel Butt @ 2025-03-14  6:15 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Johannes Weiner, Michal Hocko, Roman Gushchin, Muchun Song,
	Vlastimil Babka, Sebastian Andrzej Siewior, linux-mm, cgroups,
	linux-kernel, Meta kernel team

obj_cgroup_release is called when all the references to the objcg has
been released i.e. no more memory objects are pointing to it. Most
probably objcg->memcg will be pointing to some ancestor memcg and at the
moment, in obj_cgroup_release, the kernel call
obj_cgroup_uncharge_pages() to uncharge last remaining memory.

However obj_cgroup_uncharge_pages() refills the local stock. There is
no need to refill the local stock with some ancestor memcg and flush the
local stock. In addition this removes the requirement to only call
obj_cgroup_put() outside of local_lock.

Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
---
 mm/memcontrol.c | 15 +++++++++++----
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 7054b0ebd207..83db180455a1 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -129,8 +129,7 @@ bool mem_cgroup_kmem_disabled(void)
 	return cgroup_memory_nokmem;
 }
 
-static void obj_cgroup_uncharge_pages(struct obj_cgroup *objcg,
-				      unsigned int nr_pages);
+static void memcg_uncharge(struct mem_cgroup *memcg, unsigned int nr_pages);
 
 static void obj_cgroup_release(struct percpu_ref *ref)
 {
@@ -163,8 +162,16 @@ static void obj_cgroup_release(struct percpu_ref *ref)
 	WARN_ON_ONCE(nr_bytes & (PAGE_SIZE - 1));
 	nr_pages = nr_bytes >> PAGE_SHIFT;
 
-	if (nr_pages)
-		obj_cgroup_uncharge_pages(objcg, nr_pages);
+	if (nr_pages) {
+		struct mem_cgroup *memcg;
+
+		memcg = get_mem_cgroup_from_objcg(objcg);
+		mod_memcg_state(memcg, MEMCG_KMEM, -nr_pages);
+		memcg1_account_kmem(memcg, -nr_pages);
+		if (!mem_cgroup_is_root(memcg))
+			memcg_uncharge(memcg, nr_pages);
+		css_put(&memcg->css);
+	}
 
 	spin_lock_irqsave(&objcg_lock, flags);
 	list_del(&objcg->list);
-- 
2.47.1




* [RFC PATCH 06/10] memcg: do obj_cgroup_put inside drain_obj_stock
  2025-03-14  6:15 [RFC PATCH 00/10] memcg: stock code cleanups Shakeel Butt
                   ` (4 preceding siblings ...)
  2025-03-14  6:15 ` [RFC PATCH 05/10] memcg: no refilling stock from obj_cgroup_release Shakeel Butt
@ 2025-03-14  6:15 ` Shakeel Butt
  2025-03-14 10:17   ` Vlastimil Babka
  2025-03-14  6:15 ` [RFC PATCH 07/10] memcg: use __mod_memcg_state in drain_obj_stock Shakeel Butt
                   ` (4 subsequent siblings)
  10 siblings, 1 reply; 34+ messages in thread
From: Shakeel Butt @ 2025-03-14  6:15 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Johannes Weiner, Michal Hocko, Roman Gushchin, Muchun Song,
	Vlastimil Babka, Sebastian Andrzej Siewior, linux-mm, cgroups,
	linux-kernel, Meta kernel team

Previously we could not call obj_cgroup_put() inside the local lock
because, on the put of the last reference, the release function
obj_cgroup_release() may try to re-acquire the local lock. However,
that chain has been broken. Now simply do obj_cgroup_put() inside
drain_obj_stock() instead of returning the old objcg.

Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
---
 mm/memcontrol.c | 37 +++++++++++--------------------------
 1 file changed, 11 insertions(+), 26 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 83db180455a1..3c4de384b5a0 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1778,7 +1778,7 @@ static DEFINE_PER_CPU(struct memcg_stock_pcp, memcg_stock) = {
 };
 static DEFINE_MUTEX(percpu_charge_mutex);
 
-static struct obj_cgroup *drain_obj_stock(struct memcg_stock_pcp *stock);
+static void drain_obj_stock(struct memcg_stock_pcp *stock);
 static bool obj_stock_flush_required(struct memcg_stock_pcp *stock,
 				     struct mem_cgroup *root_memcg);
 
@@ -1853,7 +1853,6 @@ static void drain_stock(struct memcg_stock_pcp *stock)
 static void drain_local_stock(struct work_struct *dummy)
 {
 	struct memcg_stock_pcp *stock;
-	struct obj_cgroup *old = NULL;
 	unsigned long flags;
 
 	/*
@@ -1864,12 +1863,11 @@ static void drain_local_stock(struct work_struct *dummy)
 	localtry_lock_irqsave(&memcg_stock.stock_lock, flags);
 
 	stock = this_cpu_ptr(&memcg_stock);
-	old = drain_obj_stock(stock);
+	drain_obj_stock(stock);
 	drain_stock(stock);
 	clear_bit(FLUSHING_CACHED_CHARGE, &stock->flags);
 
 	localtry_unlock_irqrestore(&memcg_stock.stock_lock, flags);
-	obj_cgroup_put(old);
 }
 
 /* Should never be called with root_mem_cgroup. */
@@ -1951,18 +1949,16 @@ void drain_all_stock(struct mem_cgroup *root_memcg)
 static int memcg_hotplug_cpu_dead(unsigned int cpu)
 {
 	struct memcg_stock_pcp *stock;
-	struct obj_cgroup *old;
 	unsigned long flags;
 
 	stock = &per_cpu(memcg_stock, cpu);
 
 	/* drain_obj_stock requires stock_lock */
 	localtry_lock_irqsave(&memcg_stock.stock_lock, flags);
-	old = drain_obj_stock(stock);
+	drain_obj_stock(stock);
 	localtry_unlock_irqrestore(&memcg_stock.stock_lock, flags);
 
 	drain_stock(stock);
-	obj_cgroup_put(old);
 
 	return 0;
 }
@@ -2745,24 +2741,20 @@ void __memcg_kmem_uncharge_page(struct page *page, int order)
 }
 
 /* Replace the stock objcg with objcg, return the old objcg */
-static struct obj_cgroup *replace_stock_objcg(struct memcg_stock_pcp *stock,
-					     struct obj_cgroup *objcg)
+static void replace_stock_objcg(struct memcg_stock_pcp *stock,
+				struct obj_cgroup *objcg)
 {
-	struct obj_cgroup *old = NULL;
-
-	old = drain_obj_stock(stock);
+	drain_obj_stock(stock);
 	obj_cgroup_get(objcg);
 	stock->nr_bytes = atomic_read(&objcg->nr_charged_bytes)
 			? atomic_xchg(&objcg->nr_charged_bytes, 0) : 0;
 	WRITE_ONCE(stock->cached_objcg, objcg);
-	return old;
 }
 
 static void mod_objcg_state(struct obj_cgroup *objcg, struct pglist_data *pgdat,
 		     enum node_stat_item idx, int nr)
 {
 	struct memcg_stock_pcp *stock;
-	struct obj_cgroup *old = NULL;
 	unsigned long flags;
 	int *bytes;
 
@@ -2775,7 +2767,7 @@ static void mod_objcg_state(struct obj_cgroup *objcg, struct pglist_data *pgdat,
 	 * changes.
 	 */
 	if (READ_ONCE(stock->cached_objcg) != objcg) {
-		old = replace_stock_objcg(stock, objcg);
+		replace_stock_objcg(stock, objcg);
 		stock->cached_pgdat = pgdat;
 	} else if (stock->cached_pgdat != pgdat) {
 		/* Flush the existing cached vmstat data */
@@ -2816,7 +2808,6 @@ static void mod_objcg_state(struct obj_cgroup *objcg, struct pglist_data *pgdat,
 		__mod_objcg_mlstate(objcg, pgdat, idx, nr);
 
 	localtry_unlock_irqrestore(&memcg_stock.stock_lock, flags);
-	obj_cgroup_put(old);
 }
 
 static bool consume_obj_stock(struct obj_cgroup *objcg, unsigned int nr_bytes)
@@ -2838,12 +2829,12 @@ static bool consume_obj_stock(struct obj_cgroup *objcg, unsigned int nr_bytes)
 	return ret;
 }
 
-static struct obj_cgroup *drain_obj_stock(struct memcg_stock_pcp *stock)
+static void drain_obj_stock(struct memcg_stock_pcp *stock)
 {
 	struct obj_cgroup *old = READ_ONCE(stock->cached_objcg);
 
 	if (!old)
-		return NULL;
+		return;
 
 	if (stock->nr_bytes) {
 		unsigned int nr_pages = stock->nr_bytes >> PAGE_SHIFT;
@@ -2896,11 +2887,7 @@ static struct obj_cgroup *drain_obj_stock(struct memcg_stock_pcp *stock)
 	}
 
 	WRITE_ONCE(stock->cached_objcg, NULL);
-	/*
-	 * The `old' objects needs to be released by the caller via
-	 * obj_cgroup_put() outside of memcg_stock_pcp::stock_lock.
-	 */
-	return old;
+	obj_cgroup_put(old);
 }
 
 static bool obj_stock_flush_required(struct memcg_stock_pcp *stock,
@@ -2922,7 +2909,6 @@ static void refill_obj_stock(struct obj_cgroup *objcg, unsigned int nr_bytes,
 			     bool allow_uncharge)
 {
 	struct memcg_stock_pcp *stock;
-	struct obj_cgroup *old = NULL;
 	unsigned long flags;
 	unsigned int nr_pages = 0;
 
@@ -2930,7 +2916,7 @@ static void refill_obj_stock(struct obj_cgroup *objcg, unsigned int nr_bytes,
 
 	stock = this_cpu_ptr(&memcg_stock);
 	if (READ_ONCE(stock->cached_objcg) != objcg) { /* reset if necessary */
-		old = replace_stock_objcg(stock, objcg);
+		replace_stock_objcg(stock, objcg);
 		allow_uncharge = true;	/* Allow uncharge when objcg changes */
 	}
 	stock->nr_bytes += nr_bytes;
@@ -2941,7 +2927,6 @@ static void refill_obj_stock(struct obj_cgroup *objcg, unsigned int nr_bytes,
 	}
 
 	localtry_unlock_irqrestore(&memcg_stock.stock_lock, flags);
-	obj_cgroup_put(old);
 
 	if (nr_pages)
 		obj_cgroup_uncharge_pages(objcg, nr_pages);
-- 
2.47.1




* [RFC PATCH 07/10] memcg: use __mod_memcg_state in drain_obj_stock
  2025-03-14  6:15 [RFC PATCH 00/10] memcg: stock code cleanups Shakeel Butt
                   ` (5 preceding siblings ...)
  2025-03-14  6:15 ` [RFC PATCH 06/10] memcg: do obj_cgroup_put inside drain_obj_stock Shakeel Butt
@ 2025-03-14  6:15 ` Shakeel Butt
  2025-03-14 10:27   ` Vlastimil Babka
  2025-03-14 11:38   ` Sebastian Andrzej Siewior
  2025-03-14  6:15 ` [RFC PATCH 08/10] memcg: assert in_task for couple of local_lock holders Shakeel Butt
                   ` (3 subsequent siblings)
  10 siblings, 2 replies; 34+ messages in thread
From: Shakeel Butt @ 2025-03-14  6:15 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Johannes Weiner, Michal Hocko, Roman Gushchin, Muchun Song,
	Vlastimil Babka, Sebastian Andrzej Siewior, linux-mm, cgroups,
	linux-kernel, Meta kernel team

For non-PREEMPT_RT kernels, drain_obj_stock() is always called with irq
disabled, so we can use __mod_memcg_state() instead of
mod_memcg_state(). For PREEMPT_RT, we need to add memcg_stats_[un]lock
in __mod_memcg_state().

Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
---
 mm/memcontrol.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 3c4de384b5a0..dfe9c2eb7816 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -707,10 +707,12 @@ void __mod_memcg_state(struct mem_cgroup *memcg, enum memcg_stat_item idx,
 	if (WARN_ONCE(BAD_STAT_IDX(i), "%s: missing stat item %d\n", __func__, idx))
 		return;
 
+	memcg_stats_lock();
 	__this_cpu_add(memcg->vmstats_percpu->state[i], val);
 	val = memcg_state_val_in_pages(idx, val);
 	memcg_rstat_updated(memcg, val);
 	trace_mod_memcg_state(memcg, idx, val);
+	memcg_stats_unlock();
 }
 
 #ifdef CONFIG_MEMCG_V1
@@ -2845,7 +2847,7 @@ static void drain_obj_stock(struct memcg_stock_pcp *stock)
 
 			memcg = get_mem_cgroup_from_objcg(old);
 
-			mod_memcg_state(memcg, MEMCG_KMEM, -nr_pages);
+			__mod_memcg_state(memcg, MEMCG_KMEM, -nr_pages);
 			memcg1_account_kmem(memcg, -nr_pages);
 			if (!mem_cgroup_is_root(memcg))
 				memcg_uncharge(memcg, nr_pages);
-- 
2.47.1




* [RFC PATCH 08/10] memcg: assert in_task for couple of local_lock holders
  2025-03-14  6:15 [RFC PATCH 00/10] memcg: stock code cleanups Shakeel Butt
                   ` (6 preceding siblings ...)
  2025-03-14  6:15 ` [RFC PATCH 07/10] memcg: use __mod_memcg_state in drain_obj_stock Shakeel Butt
@ 2025-03-14  6:15 ` Shakeel Butt
  2025-03-14  6:15 ` [RFC PATCH 09/10] memcg: trylock stock for objcg Shakeel Butt
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 34+ messages in thread
From: Shakeel Butt @ 2025-03-14  6:15 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Johannes Weiner, Michal Hocko, Roman Gushchin, Muchun Song,
	Vlastimil Babka, Sebastian Andrzej Siewior, linux-mm, cgroups,
	linux-kernel, Meta kernel team

drain_local_stock() and memcg_hotplug_cpu_dead() only run in task
context, so there is no need to localtry_trylock_irqsave() the local
stock_lock in those functions. The plan is to convert all stock_lock
users which can be called in multiple contexts to use
localtry_trylock_irqsave() and subsequently switch to the non-irq
disabling interface. So, for functions which are never called in
non-task context, this patch adds the asserts.

Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
---
 mm/memcontrol.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index dfe9c2eb7816..c803d2f5e322 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1857,6 +1857,8 @@ static void drain_local_stock(struct work_struct *dummy)
 	struct memcg_stock_pcp *stock;
 	unsigned long flags;
 
+	lockdep_assert_once(in_task());
+
 	/*
 	 * The only protection from cpu hotplug (memcg_hotplug_cpu_dead) vs.
 	 * drain_stock races is that we always operate on local CPU stock
@@ -1953,6 +1955,8 @@ static int memcg_hotplug_cpu_dead(unsigned int cpu)
 	struct memcg_stock_pcp *stock;
 	unsigned long flags;
 
+	lockdep_assert_once(in_task());
+
 	stock = &per_cpu(memcg_stock, cpu);
 
 	/* drain_obj_stock requires stock_lock */
-- 
2.47.1




* [RFC PATCH 09/10] memcg: trylock stock for objcg
  2025-03-14  6:15 [RFC PATCH 00/10] memcg: stock code cleanups Shakeel Butt
                   ` (7 preceding siblings ...)
  2025-03-14  6:15 ` [RFC PATCH 08/10] memcg: assert in_task for couple of local_lock holders Shakeel Butt
@ 2025-03-14  6:15 ` Shakeel Butt
  2025-03-14 11:47   ` Sebastian Andrzej Siewior
  2025-03-14  6:15 ` [RFC PATCH 10/10] memcg: no more irq disabling for stock locks Shakeel Butt
  2025-03-14 13:33 ` [RFC PATCH 00/10] memcg: stock code cleanups Vlastimil Babka
  10 siblings, 1 reply; 34+ messages in thread
From: Shakeel Butt @ 2025-03-14  6:15 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Johannes Weiner, Michal Hocko, Roman Gushchin, Muchun Song,
	Vlastimil Babka, Sebastian Andrzej Siewior, linux-mm, cgroups,
	linux-kernel, Meta kernel team

To make the objcg stock functions work without disabling irqs, we need
to convert those functions to use localtry_trylock_irqsave() instead of
localtry_lock_irqsave(). For now, this patch just does the conversion;
a later patch will eliminate the irq disabling code.
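
For reference, the trylock-failure fallbacks in the three converted
functions are (collected here from the hunks below; the intent is that
a failed trylock simply bypasses the per-cpu cache):

  consume_obj_stock():  return false (nothing consumed from the stock)
  mod_objcg_state():    call __mod_objcg_mlstate() directly
  refill_obj_stock():   add nr_bytes to objcg->nr_charged_bytes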

Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
---
 mm/memcontrol.c | 14 +++++++++++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index c803d2f5e322..ba5d004049d3 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2764,7 +2764,11 @@ static void mod_objcg_state(struct obj_cgroup *objcg, struct pglist_data *pgdat,
 	unsigned long flags;
 	int *bytes;
 
-	localtry_lock_irqsave(&memcg_stock.stock_lock, flags);
+	if (!localtry_trylock_irqsave(&memcg_stock.stock_lock, flags)) {
+		__mod_objcg_mlstate(objcg, pgdat, idx, nr);
+		return;
+	}
+
 	stock = this_cpu_ptr(&memcg_stock);
 
 	/*
@@ -2822,7 +2826,8 @@ static bool consume_obj_stock(struct obj_cgroup *objcg, unsigned int nr_bytes)
 	unsigned long flags;
 	bool ret = false;
 
-	localtry_lock_irqsave(&memcg_stock.stock_lock, flags);
+	if (!localtry_trylock_irqsave(&memcg_stock.stock_lock, flags))
+		return ret;
 
 	stock = this_cpu_ptr(&memcg_stock);
 	if (objcg == READ_ONCE(stock->cached_objcg) && stock->nr_bytes >= nr_bytes) {
@@ -2918,7 +2923,10 @@ static void refill_obj_stock(struct obj_cgroup *objcg, unsigned int nr_bytes,
 	unsigned long flags;
 	unsigned int nr_pages = 0;
 
-	localtry_lock_irqsave(&memcg_stock.stock_lock, flags);
+	if (!localtry_trylock_irqsave(&memcg_stock.stock_lock, flags)) {
+		atomic_add(nr_bytes, &objcg->nr_charged_bytes);
+		return;
+	}
 
 	stock = this_cpu_ptr(&memcg_stock);
 	if (READ_ONCE(stock->cached_objcg) != objcg) { /* reset if necessary */
-- 
2.47.1




* [RFC PATCH 10/10] memcg: no more irq disabling for stock locks
  2025-03-14  6:15 [RFC PATCH 00/10] memcg: stock code cleanups Shakeel Butt
                   ` (8 preceding siblings ...)
  2025-03-14  6:15 ` [RFC PATCH 09/10] memcg: trylock stock for objcg Shakeel Butt
@ 2025-03-14  6:15 ` Shakeel Butt
  2025-03-14 10:54   ` Vlastimil Babka
  2025-03-14 13:33 ` [RFC PATCH 00/10] memcg: stock code cleanups Vlastimil Babka
  10 siblings, 1 reply; 34+ messages in thread
From: Shakeel Butt @ 2025-03-14  6:15 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Johannes Weiner, Michal Hocko, Roman Gushchin, Muchun Song,
	Vlastimil Babka, Sebastian Andrzej Siewior, linux-mm, cgroups,
	linux-kernel, Meta kernel team

Let's switch all the places where the memcg_stock lock is acquired and
released to not disable and enable irqs. There are still two functions
(i.e. mod_objcg_state() and drain_obj_stock()) which need to disable
irqs to update the stats on non-RT kernels. For now, add a simple
wrapper for that functionality.

Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
---
 mm/memcontrol.c | 83 ++++++++++++++++++++++++++++++++-----------------
 1 file changed, 54 insertions(+), 29 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index ba5d004049d3..fa28efa298f4 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1796,22 +1796,17 @@ static bool obj_stock_flush_required(struct memcg_stock_pcp *stock,
  *
  * returns true if successful, false otherwise.
  */
-static bool consume_stock(struct mem_cgroup *memcg, unsigned int nr_pages,
-			  gfp_t gfp_mask)
+static bool consume_stock(struct mem_cgroup *memcg, unsigned int nr_pages)
 {
 	struct memcg_stock_pcp *stock;
 	unsigned int stock_pages;
-	unsigned long flags;
 	bool ret = false;
 
 	if (nr_pages > MEMCG_CHARGE_BATCH)
 		return ret;
 
-	if (!localtry_trylock_irqsave(&memcg_stock.stock_lock, flags)) {
-		if (!gfpflags_allow_spinning(gfp_mask))
+	if (!localtry_trylock(&memcg_stock.stock_lock))
 			return ret;
-		localtry_lock_irqsave(&memcg_stock.stock_lock, flags);
-	}
 
 	stock = this_cpu_ptr(&memcg_stock);
 	stock_pages = READ_ONCE(stock->nr_pages);
@@ -1820,7 +1815,7 @@ static bool consume_stock(struct mem_cgroup *memcg, unsigned int nr_pages,
 		ret = true;
 	}
 
-	localtry_unlock_irqrestore(&memcg_stock.stock_lock, flags);
+	localtry_unlock(&memcg_stock.stock_lock);
 
 	return ret;
 }
@@ -1855,7 +1850,6 @@ static void drain_stock(struct memcg_stock_pcp *stock)
 static void drain_local_stock(struct work_struct *dummy)
 {
 	struct memcg_stock_pcp *stock;
-	unsigned long flags;
 
 	lockdep_assert_once(in_task());
 
@@ -1864,14 +1858,14 @@ static void drain_local_stock(struct work_struct *dummy)
 	 * drain_stock races is that we always operate on local CPU stock
 	 * here with IRQ disabled
 	 */
-	localtry_lock_irqsave(&memcg_stock.stock_lock, flags);
+	localtry_lock(&memcg_stock.stock_lock);
 
 	stock = this_cpu_ptr(&memcg_stock);
 	drain_obj_stock(stock);
 	drain_stock(stock);
 	clear_bit(FLUSHING_CACHED_CHARGE, &stock->flags);
 
-	localtry_unlock_irqrestore(&memcg_stock.stock_lock, flags);
+	localtry_unlock(&memcg_stock.stock_lock);
 }
 
 /* Should never be called with root_mem_cgroup. */
@@ -1879,9 +1873,8 @@ static void refill_stock(struct mem_cgroup *memcg, unsigned int nr_pages)
 {
 	struct memcg_stock_pcp *stock;
 	unsigned int stock_pages;
-	unsigned long flags;
 
-	if (!localtry_trylock_irqsave(&memcg_stock.stock_lock, flags)) {
+	if (!localtry_trylock(&memcg_stock.stock_lock)) {
 		/*
 		 * In case of unlikely failure to lock percpu stock_lock
 		 * uncharge memcg directly.
@@ -1902,7 +1895,7 @@ static void refill_stock(struct mem_cgroup *memcg, unsigned int nr_pages)
 	if (stock_pages > MEMCG_CHARGE_BATCH)
 		drain_stock(stock);
 
-	localtry_unlock_irqrestore(&memcg_stock.stock_lock, flags);
+	localtry_unlock(&memcg_stock.stock_lock);
 }
 
 /*
@@ -1953,17 +1946,12 @@ void drain_all_stock(struct mem_cgroup *root_memcg)
 static int memcg_hotplug_cpu_dead(unsigned int cpu)
 {
 	struct memcg_stock_pcp *stock;
-	unsigned long flags;
 
 	lockdep_assert_once(in_task());
 
 	stock = &per_cpu(memcg_stock, cpu);
 
-	/* drain_obj_stock requires stock_lock */
-	localtry_lock_irqsave(&memcg_stock.stock_lock, flags);
 	drain_obj_stock(stock);
-	localtry_unlock_irqrestore(&memcg_stock.stock_lock, flags);
-
 	drain_stock(stock);
 
 	return 0;
@@ -2254,7 +2242,7 @@ static int try_charge_memcg(struct mem_cgroup *memcg, gfp_t gfp_mask,
 	unsigned long pflags;
 
 retry:
-	if (consume_stock(memcg, nr_pages, gfp_mask))
+	if (consume_stock(memcg, nr_pages))
 		return 0;
 
 	if (!gfpflags_allow_spinning(gfp_mask))
@@ -2757,6 +2745,28 @@ static void replace_stock_objcg(struct memcg_stock_pcp *stock,
 	WRITE_ONCE(stock->cached_objcg, objcg);
 }
 
+static unsigned long rt_lock(void)
+{
+#ifdef CONFIG_PREEMPT_RT
+	migrate_disable();
+	return 0;
+#else
+	unsigned long flags = 0;
+
+	local_irq_save(flags);
+	return flags;
+#endif
+}
+
+static void rt_unlock(unsigned long flags)
+{
+#ifdef CONFIG_PREEMPT_RT
+	migrate_enable();
+#else
+	local_irq_restore(flags);
+#endif
+}
+
 static void mod_objcg_state(struct obj_cgroup *objcg, struct pglist_data *pgdat,
 		     enum node_stat_item idx, int nr)
 {
@@ -2764,7 +2774,8 @@ static void mod_objcg_state(struct obj_cgroup *objcg, struct pglist_data *pgdat,
 	unsigned long flags;
 	int *bytes;
 
-	if (!localtry_trylock_irqsave(&memcg_stock.stock_lock, flags)) {
+	if (!localtry_trylock(&memcg_stock.stock_lock)) {
+		/* Do we need mix_rt_[un]lock here too. */
 		__mod_objcg_mlstate(objcg, pgdat, idx, nr);
 		return;
 	}
@@ -2783,6 +2794,8 @@ static void mod_objcg_state(struct obj_cgroup *objcg, struct pglist_data *pgdat,
 		/* Flush the existing cached vmstat data */
 		struct pglist_data *oldpg = stock->cached_pgdat;
 
+		flags = rt_lock();
+
 		if (stock->nr_slab_reclaimable_b) {
 			__mod_objcg_mlstate(objcg, oldpg, NR_SLAB_RECLAIMABLE_B,
 					  stock->nr_slab_reclaimable_b);
@@ -2793,6 +2806,8 @@ static void mod_objcg_state(struct obj_cgroup *objcg, struct pglist_data *pgdat,
 					  stock->nr_slab_unreclaimable_b);
 			stock->nr_slab_unreclaimable_b = 0;
 		}
+
+		rt_unlock(flags);
 		stock->cached_pgdat = pgdat;
 	}
 
@@ -2814,19 +2829,21 @@ static void mod_objcg_state(struct obj_cgroup *objcg, struct pglist_data *pgdat,
 			nr = 0;
 		}
 	}
-	if (nr)
+	if (nr) {
+		flags = rt_lock();
 		__mod_objcg_mlstate(objcg, pgdat, idx, nr);
+		rt_unlock(flags);
+	}
 
-	localtry_unlock_irqrestore(&memcg_stock.stock_lock, flags);
+	localtry_unlock(&memcg_stock.stock_lock);
 }
 
 static bool consume_obj_stock(struct obj_cgroup *objcg, unsigned int nr_bytes)
 {
 	struct memcg_stock_pcp *stock;
-	unsigned long flags;
 	bool ret = false;
 
-	if (!localtry_trylock_irqsave(&memcg_stock.stock_lock, flags))
+	if (!localtry_trylock(&memcg_stock.stock_lock))
 		return ret;
 
 	stock = this_cpu_ptr(&memcg_stock);
@@ -2835,7 +2852,7 @@ static bool consume_obj_stock(struct obj_cgroup *objcg, unsigned int nr_bytes)
 		ret = true;
 	}
 
-	localtry_unlock_irqrestore(&memcg_stock.stock_lock, flags);
+	localtry_unlock(&memcg_stock.stock_lock);
 
 	return ret;
 }
@@ -2843,10 +2860,16 @@ static bool consume_obj_stock(struct obj_cgroup *objcg, unsigned int nr_bytes)
 static void drain_obj_stock(struct memcg_stock_pcp *stock)
 {
 	struct obj_cgroup *old = READ_ONCE(stock->cached_objcg);
+	unsigned long flags;
+	bool locked = stock->nr_bytes || stock->nr_slab_reclaimable_b ||
+		stock->nr_slab_unreclaimable_b;
 
 	if (!old)
 		return;
 
+	if (locked)
+		flags = rt_lock();
+
 	if (stock->nr_bytes) {
 		unsigned int nr_pages = stock->nr_bytes >> PAGE_SHIFT;
 		unsigned int nr_bytes = stock->nr_bytes & (PAGE_SIZE - 1);
@@ -2897,6 +2920,9 @@ static void drain_obj_stock(struct memcg_stock_pcp *stock)
 		stock->cached_pgdat = NULL;
 	}
 
+	if (locked)
+		rt_unlock(flags);
+
 	WRITE_ONCE(stock->cached_objcg, NULL);
 	obj_cgroup_put(old);
 }
@@ -2920,10 +2946,9 @@ static void refill_obj_stock(struct obj_cgroup *objcg, unsigned int nr_bytes,
 			     bool allow_uncharge)
 {
 	struct memcg_stock_pcp *stock;
-	unsigned long flags;
 	unsigned int nr_pages = 0;
 
-	if (!localtry_trylock_irqsave(&memcg_stock.stock_lock, flags)) {
+	if (!localtry_trylock(&memcg_stock.stock_lock)) {
 		atomic_add(nr_bytes, &objcg->nr_charged_bytes);
 		return;
 	}
@@ -2940,7 +2965,7 @@ static void refill_obj_stock(struct obj_cgroup *objcg, unsigned int nr_bytes,
 		stock->nr_bytes &= (PAGE_SIZE - 1);
 	}
 
-	localtry_unlock_irqrestore(&memcg_stock.stock_lock, flags);
+	localtry_unlock(&memcg_stock.stock_lock);
 
 	if (nr_pages)
 		obj_cgroup_uncharge_pages(objcg, nr_pages);
-- 
2.47.1




* Re: [RFC PATCH 02/10] memcg: decouple drain_obj_stock from local stock
  2025-03-14  6:15 ` [RFC PATCH 02/10] memcg: decouple drain_obj_stock from local stock Shakeel Butt
@ 2025-03-14  9:57   ` Vlastimil Babka
  0 siblings, 0 replies; 34+ messages in thread
From: Vlastimil Babka @ 2025-03-14  9:57 UTC (permalink / raw)
  To: Shakeel Butt, Andrew Morton
  Cc: Johannes Weiner, Michal Hocko, Roman Gushchin, Muchun Song,
	Sebastian Andrzej Siewior, linux-mm, cgroups, linux-kernel,
	Meta kernel team

On 3/14/25 07:15, Shakeel Butt wrote:
> Currently drain_obj_stock() can potentially call __refill_stock(),
> which accesses the local cpu stock and thus requires the memcg stock's
> local_lock. However, if we look at the code paths leading to
> drain_obj_stock(), there is never a good reason to refill the memcg
> stock from it at all.
> 
> At the moment, drain_obj_stock() can be called from reclaim, hotplug
> cpu teardown, mod_objcg_state() and refill_obj_stock(). For reclaim and
> hotplug there is no need to refill. For the other two paths, most
> probably the newly switched objcg will be used in the near future, so
> there is no need to refill the stock with the older objcg.
> 
> In addition, __refill_stock() from drain_obj_stock() happens only in
> rare cases, so performance is not really an issue. Let's just uncharge
> directly instead of refilling, which also decouples drain_obj_stock()
> from the local cpu stock and its local_lock requirement.
> 
> Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>

Acked-by: Vlastimil Babka <vbabka@suse.cz>





* Re: [RFC PATCH 03/10] memcg: introduce memcg_uncharge
  2025-03-14  6:15 ` [RFC PATCH 03/10] memcg: introduce memcg_uncharge Shakeel Butt
@ 2025-03-14 10:01   ` Vlastimil Babka
  0 siblings, 0 replies; 34+ messages in thread
From: Vlastimil Babka @ 2025-03-14 10:01 UTC (permalink / raw)
  To: Shakeel Butt, Andrew Morton
  Cc: Johannes Weiner, Michal Hocko, Roman Gushchin, Muchun Song,
	Sebastian Andrzej Siewior, linux-mm, cgroups, linux-kernel,
	Meta kernel team

On 3/14/25 07:15, Shakeel Butt wrote:
> At multiple places in memcontrol.c, the memory and memsw page counters
> are being uncharged. This is error-prone. Let's move the functionality
> to a newly introduced memcg_uncharge and call it from all those places.
> 
> Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>

Acked-by: Vlastimil Babka <vbabka@suse.cz>




* Re: [RFC PATCH 04/10] memcg: manually inline __refill_stock
  2025-03-14  6:15 ` [RFC PATCH 04/10] memcg: manually inline __refill_stock Shakeel Butt
@ 2025-03-14 10:05   ` Vlastimil Babka
  0 siblings, 0 replies; 34+ messages in thread
From: Vlastimil Babka @ 2025-03-14 10:05 UTC (permalink / raw)
  To: Shakeel Butt, Andrew Morton
  Cc: Johannes Weiner, Michal Hocko, Roman Gushchin, Muchun Song,
	Sebastian Andrzej Siewior, linux-mm, cgroups, linux-kernel,
	Meta kernel team

On 3/14/25 07:15, Shakeel Butt wrote:
> __refill_stock() no longer has multiple callers, so simply inline it
> into refill_stock().
> 
> Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>

Acked-by: Vlastimil Babka <vbabka@suse.cz>




* Re: [RFC PATCH 05/10] memcg: no refilling stock from obj_cgroup_release
  2025-03-14  6:15 ` [RFC PATCH 05/10] memcg: no refilling stock from obj_cgroup_release Shakeel Butt
@ 2025-03-14 10:09   ` Vlastimil Babka
  2025-03-14 11:26   ` Sebastian Andrzej Siewior
  1 sibling, 0 replies; 34+ messages in thread
From: Vlastimil Babka @ 2025-03-14 10:09 UTC (permalink / raw)
  To: Shakeel Butt, Andrew Morton
  Cc: Johannes Weiner, Michal Hocko, Roman Gushchin, Muchun Song,
	Sebastian Andrzej Siewior, linux-mm, cgroups, linux-kernel,
	Meta kernel team

On 3/14/25 07:15, Shakeel Butt wrote:
> obj_cgroup_release is called when all the references to the objcg has
> been released i.e. no more memory objects are pointing to it. Most
> probably objcg->memcg will be pointing to some ancestor memcg and at the
> moment, in obj_cgroup_release, the kernel call
> obj_cgroup_uncharge_pages() to uncharge last remaining memory.
> 
> However obj_cgroup_uncharge_pages() refills the local stock. There is
> no need to refill the local stock with some ancestor memcg and flush the
> local stock. In addition this removes the requirement to only call
> obj_cgroup_put() outside of local_lock.
> 
> Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>

Acked-by: Vlastimil Babka <vbabka@suse.cz>





* Re: [RFC PATCH 06/10] memcg: do obj_cgroup_put inside drain_obj_stock
  2025-03-14  6:15 ` [RFC PATCH 06/10] memcg: do obj_cgroup_put inside drain_obj_stock Shakeel Butt
@ 2025-03-14 10:17   ` Vlastimil Babka
  2025-03-14 11:35     ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 34+ messages in thread
From: Vlastimil Babka @ 2025-03-14 10:17 UTC (permalink / raw)
  To: Shakeel Butt, Andrew Morton
  Cc: Johannes Weiner, Michal Hocko, Roman Gushchin, Muchun Song,
	Sebastian Andrzej Siewior, linux-mm, cgroups, linux-kernel,
	Meta kernel team

On 3/14/25 07:15, Shakeel Butt wrote:
> Previously we could not call obj_cgroup_put() inside the local lock
> because, on the put of the last reference, the release function
> obj_cgroup_release() may try to re-acquire the local lock. However,
> that chain has been broken. Now simply do obj_cgroup_put() inside
> drain_obj_stock() instead of returning the old objcg.
> 
> Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>

Hm, is this really safe? I can see obj_cgroup_release() doing
percpu_ref_exit() -> kfree(); is it guaranteed that the allocation
won't also be in a kmemcg and recurse?
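
To spell out the chain I'm worried about (the part after kfree() is my
assumption of how it could recurse, I haven't actually traced it):

  drain_obj_stock()                <- stock_lock held
    obj_cgroup_put(old)            <- drops the last reference
      obj_cgroup_release()
        percpu_ref_exit()
          kfree()                  <- if that allocation is kmemcg charged
            obj_cgroup_uncharge()
              refill_obj_stock()   <- wants stock_lock again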






* Re: [RFC PATCH 07/10] memcg: use __mod_memcg_state in drain_obj_stock
  2025-03-14  6:15 ` [RFC PATCH 07/10] memcg: use __mod_memcg_state in drain_obj_stock Shakeel Butt
@ 2025-03-14 10:27   ` Vlastimil Babka
  2025-03-14 11:44     ` Sebastian Andrzej Siewior
  2025-03-14 11:38   ` Sebastian Andrzej Siewior
  1 sibling, 1 reply; 34+ messages in thread
From: Vlastimil Babka @ 2025-03-14 10:27 UTC (permalink / raw)
  To: Shakeel Butt, Andrew Morton
  Cc: Johannes Weiner, Michal Hocko, Roman Gushchin, Muchun Song,
	Sebastian Andrzej Siewior, linux-mm, cgroups, linux-kernel,
	Meta kernel team

On 3/14/25 07:15, Shakeel Butt wrote:
> For non-PREEMPT_RT kernels, drain_obj_stock() is always called with irq
> disabled, so we can use __mod_memcg_state() instead of
> mod_memcg_state(). For PREEMPT_RT, we need to add memcg_stats_[un]lock
> in __mod_memcg_state().
> 
> Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>

Maybe it'll make sense later, but as of this patch itself it begs the
question: why put memcg_stats_lock()/unlock() in __mod_memcg_state()
itself and not just around the call in drain_obj_stock()?
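
I.e. something like this directly in drain_obj_stock(), just as a
sketch:

	memcg_stats_lock();
	__mod_memcg_state(memcg, MEMCG_KMEM, -nr_pages);
	memcg_stats_unlock();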





* Re: [RFC PATCH 10/10] memcg: no more irq disabling for stock locks
  2025-03-14  6:15 ` [RFC PATCH 10/10] memcg: no more irq disabling for stock locks Shakeel Butt
@ 2025-03-14 10:54   ` Vlastimil Babka
  2025-03-14 11:58     ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 34+ messages in thread
From: Vlastimil Babka @ 2025-03-14 10:54 UTC (permalink / raw)
  To: Shakeel Butt, Andrew Morton
  Cc: Johannes Weiner, Michal Hocko, Roman Gushchin, Muchun Song,
	Sebastian Andrzej Siewior, linux-mm, cgroups, linux-kernel,
	Meta kernel team

On 3/14/25 07:15, Shakeel Butt wrote:
> Let's switch all the places where the memcg_stock lock is acquired and
> released to not disable and enable irqs. There are still two functions
> (i.e. mod_objcg_state() and drain_obj_stock()) which need to disable
> irqs to update the stats on non-RT kernels. For now, add a simple
> wrapper for that functionality.

BTW, which part of __mod_objcg_mlstate() really needs irqs disabled and
not just preemption? I see it does rcu_read_lock() anyway, which
disables preemption. Then in __mod_memcg_lruvec_state() we do some
__this_cpu_add() updates. I think these are also fine with just
preemption disabled, as they are atomic vs irqs (but don't need a LOCK
prefix to be atomic vs updates from other cpus).

Is it just memcg_rstat_updated() which does READ_ONCE/WRITE_ONCE? Could
we perhaps just change it to operations for which disabled preemption
is enough?





* Re: [RFC PATCH 05/10] memcg: no refilling stock from obj_cgroup_release
  2025-03-14  6:15 ` [RFC PATCH 05/10] memcg: no refilling stock from obj_cgroup_release Shakeel Butt
  2025-03-14 10:09   ` Vlastimil Babka
@ 2025-03-14 11:26   ` Sebastian Andrzej Siewior
  2025-03-14 15:25     ` Shakeel Butt
  1 sibling, 1 reply; 34+ messages in thread
From: Sebastian Andrzej Siewior @ 2025-03-14 11:26 UTC (permalink / raw)
  To: Shakeel Butt
  Cc: Andrew Morton, Johannes Weiner, Michal Hocko, Roman Gushchin,
	Muchun Song, Vlastimil Babka, linux-mm, cgroups, linux-kernel,
	Meta kernel team

On 2025-03-13 23:15:06 [-0700], Shakeel Butt wrote:
> obj_cgroup_release is called when all the references to the objcg has

"references to the objcg have"

> been released i.e. no more memory objects are pointing to it. Most
> probably objcg->memcg will be pointing to some ancestor memcg and at the
> moment, in obj_cgroup_release, the kernel call
> obj_cgroup_uncharge_pages() to uncharge last remaining memory.

This sounds somewhat odd. I think the point is to uncharge the pages
without tampering with memcg_stock, because going through it is unnecessary.

> However obj_cgroup_uncharge_pages() refills the local stock. There is
> no need to refill the local stock with some ancestor memcg and flush the
> local stock. In addition this removes the requirement to only call
> obj_cgroup_put() outside of local_lock.
> 
> Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>

Sebastian


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFC PATCH 06/10] memcg: do obj_cgroup_put inside drain_obj_stock
  2025-03-14 10:17   ` Vlastimil Babka
@ 2025-03-14 11:35     ` Sebastian Andrzej Siewior
  2025-03-14 15:29       ` Shakeel Butt
  0 siblings, 1 reply; 34+ messages in thread
From: Sebastian Andrzej Siewior @ 2025-03-14 11:35 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Shakeel Butt, Andrew Morton, Johannes Weiner, Michal Hocko,
	Roman Gushchin, Muchun Song, linux-mm, cgroups, linux-kernel,
	Meta kernel team

On 2025-03-14 11:17:28 [+0100], Vlastimil Babka wrote:
> On 3/14/25 07:15, Shakeel Butt wrote:
> > Previously we could not call obj_cgroup_put() inside the local lock
> > because on the put on the last reference, the release function
> > obj_cgroup_release() may try to re-acquire the local lock. However that
> > chain has been broken. Now simply do obj_cgroup_put() inside
> > drain_obj_stock() instead of returning the old objcg.
> > 
> > Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
> 
> Hm is this really safe? I can see obj_cgroup_release() doing
> percpu_ref_exit() -> kfree(), do we have guaranteed that allocation won't be
> also in a kmemcg and recurse?

It was like this until commit
	5675114623872 ("mm/memcg: protect memcg_stock with a local_lock_t")

at which point the put had to happen outside. The "percpu_ref_exit() ->
kfree()" path also existed prior to this commit.

Sebastian


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFC PATCH 07/10] memcg: use __mod_memcg_state in drain_obj_stock
  2025-03-14  6:15 ` [RFC PATCH 07/10] memcg: use __mod_memcg_state in drain_obj_stock Shakeel Butt
  2025-03-14 10:27   ` Vlastimil Babka
@ 2025-03-14 11:38   ` Sebastian Andrzej Siewior
  1 sibling, 0 replies; 34+ messages in thread
From: Sebastian Andrzej Siewior @ 2025-03-14 11:38 UTC (permalink / raw)
  To: Shakeel Butt
  Cc: Andrew Morton, Johannes Weiner, Michal Hocko, Roman Gushchin,
	Muchun Song, Vlastimil Babka, linux-mm, cgroups, linux-kernel,
	Meta kernel team

On 2025-03-13 23:15:08 [-0700], Shakeel Butt wrote:
> For non-PREEMPT_RT kernels, drain_obj_stock() is always called with irq
> disabled, so we can use __mod_memcg_state() instead of
> mod_memcg_state(). For PREEMPT_RT, we need to add memcg_stats_[un]lock
> in __mod_memcg_state().
> 
> Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>

Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>

Sebastian


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFC PATCH 07/10] memcg: use __mod_memcg_state in drain_obj_stock
  2025-03-14 10:27   ` Vlastimil Babka
@ 2025-03-14 11:44     ` Sebastian Andrzej Siewior
  0 siblings, 0 replies; 34+ messages in thread
From: Sebastian Andrzej Siewior @ 2025-03-14 11:44 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Shakeel Butt, Andrew Morton, Johannes Weiner, Michal Hocko,
	Roman Gushchin, Muchun Song, linux-mm, cgroups, linux-kernel,
	Meta kernel team

On 2025-03-14 11:27:40 [+0100], Vlastimil Babka wrote:
> On 3/14/25 07:15, Shakeel Butt wrote:
> > For non-PREEMPT_RT kernels, drain_obj_stock() is always called with irq
> > disabled, so we can use __mod_memcg_state() instead of
> > mod_memcg_state(). For PREEMPT_RT, we need to add memcg_stats_[un]lock
> > in __mod_memcg_state().
> > 
> > Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
> 
> Maybe it'll make sense later but as of this patch itself it begs a question
> why put memcg_stats_lock()/unlock() in __mod_memcg_state itself and not just
> around the call in drain_obj_stock()?

The memcg_stats_lock() was introduced to protect the per-CPU counters
(vmstats_percpu) on PREEMPT_RT, which are protected on !PREEMPT_RT by
disabling interrupts. The other modifiers already have this, except for
__mod_memcg_state(), because mod_memcg_state() was its only user and
already disables interrupts for the operation.
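
For illustration, a rough sketch of the resulting function (simplified, not
the exact patch; field and helper names assumed from current mm/memcontrol.c):

void __mod_memcg_state(struct mem_cgroup *memcg, enum memcg_stat_item idx,
		       int val)
{
	if (mem_cgroup_disabled())
		return;

	/* no-op beyond a debug check on !PREEMPT_RT, disables preemption on PREEMPT_RT */
	memcg_stats_lock();
	__this_cpu_add(memcg->vmstats_percpu->state[idx], val);
	memcg_rstat_updated(memcg, val);
	memcg_stats_unlock();
}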

Sebastian


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFC PATCH 09/10] memcg: trylock stock for objcg
  2025-03-14  6:15 ` [RFC PATCH 09/10] memcg: trylock stock for objcg Shakeel Butt
@ 2025-03-14 11:47   ` Sebastian Andrzej Siewior
  2025-03-14 15:33     ` Shakeel Butt
  0 siblings, 1 reply; 34+ messages in thread
From: Sebastian Andrzej Siewior @ 2025-03-14 11:47 UTC (permalink / raw)
  To: Shakeel Butt
  Cc: Andrew Morton, Johannes Weiner, Michal Hocko, Roman Gushchin,
	Muchun Song, Vlastimil Babka, linux-mm, cgroups, linux-kernel,
	Meta kernel team

On 2025-03-13 23:15:10 [-0700], Shakeel Butt wrote:
> To make objcg stock functions work without disabling irq, we need to
> convert those function to use localtry_trylock_irqsave() instead of
> localtry_lock_irqsave(). This patch for now just does the conversion and
> later patch will eliminate the irq disabling code.
> 
> Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
> ---
>  mm/memcontrol.c | 14 +++++++++++---
>  1 file changed, 11 insertions(+), 3 deletions(-)
> 
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index c803d2f5e322..ba5d004049d3 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -2764,7 +2764,11 @@ static void mod_objcg_state(struct obj_cgroup *objcg, struct pglist_data *pgdat,
>  	unsigned long flags;
>  	int *bytes;
>  
> -	localtry_lock_irqsave(&memcg_stock.stock_lock, flags);
> +	if (!localtry_trylock_irqsave(&memcg_stock.stock_lock, flags)) {

Don't you need to change the type of memcg_stock.stock_lock? Didn't we
introduce an explicitly different type for this trylock feature?

> +		__mod_objcg_mlstate(objcg, pgdat, idx, nr);
> +		return;
> +	}
> +
>  	stock = this_cpu_ptr(&memcg_stock);
>  
>  	/*

Sebastian


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFC PATCH 10/10] memcg: no more irq disabling for stock locks
  2025-03-14 10:54   ` Vlastimil Babka
@ 2025-03-14 11:58     ` Sebastian Andrzej Siewior
  2025-03-14 15:55       ` Shakeel Butt
  0 siblings, 1 reply; 34+ messages in thread
From: Sebastian Andrzej Siewior @ 2025-03-14 11:58 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Shakeel Butt, Andrew Morton, Johannes Weiner, Michal Hocko,
	Roman Gushchin, Muchun Song, linux-mm, cgroups, linux-kernel,
	Meta kernel team

On 2025-03-14 11:54:34 [+0100], Vlastimil Babka wrote:
> On 3/14/25 07:15, Shakeel Butt wrote:
> > Let's switch all memcg_stock locks acquire and release places to not
> > disable and enable irqs. There are two still functions (i.e.
> > mod_objcg_state() and drain_obj_stock) which needs to disable irqs to
> > update the stats on non-RT kernels. For now add a simple wrapper for
> > that functionality.
> 
> BTW, which part of __mod_objcg_mlstate() really needs disabled irqs and not
> just preemption? I see it does rcu_read_lock() anyway, which disables
> preemption. Then in __mod_memcg_lruvec_state() we do some __this_cpu_add()
> updates. I think these also are fine with just disabled preemption as they
> are atomic vs irqs (but don't need LOCK prefix to be atomic vs other cpus
> updates).

__this_cpu_add() is not safe if it is also used in interrupt context. Some
architectures (not x86) implement it as read, add, write.
this_cpu_add() does the same but disables interrupts during the
operation.
So __this_cpu_add() should not be used if interrupts are not disabled
and a modification can happen from interrupt context.
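
As an illustration, the generic (non-x86) fallbacks look roughly like this
(paraphrased from memory of include/asm-generic/percpu.h, the exact macros
may differ):

/*
 * __this_cpu_add() and friends: a plain read-modify-write on the per-CPU
 * slot. An interrupt that updates the same slot between the load and the
 * store gets its update lost.
 */
#define raw_cpu_generic_to_op(pcp, val, op)				\
do {									\
	*raw_cpu_ptr(&(pcp)) op val;					\
} while (0)

/*
 * this_cpu_add() and friends: the same operation, but with interrupts
 * disabled around it, so an interrupt cannot interleave.
 */
#define this_cpu_generic_to_op(pcp, val, op)				\
do {									\
	unsigned long __flags;						\
	raw_local_irq_save(__flags);					\
	raw_cpu_generic_to_op(pcp, val, op);				\
	raw_local_irq_restore(__flags);					\
} while (0)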

> Is it just memcg_rstat_updated() which does READ_ONCE/WRITE_ONCE? Could we
> perhaps just change it to operations where disabled preemption is enough?
> 
> > Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
> > @@ -2757,6 +2745,28 @@ static void replace_stock_objcg(struct memcg_stock_pcp *stock,
> >  	WRITE_ONCE(stock->cached_objcg, objcg);
> >  }
> >  
> > +static unsigned long rt_lock(void)
> > +{

No, we don't name it rt_lock(). We have local_lock() for this exact
reason. And migrate_disable() does not protect against re-entry of the
function on the same CPU, while local_irq_save() does.

> > +#ifdef CONFIG_PREEMPT_RT
> > +	migrate_disable();
> > +	return 0;
> > +#else
> > +	unsigned long flags = 0;
> > +
> > +	local_irq_save(flags);
> > +	return flags;
> > +#endif
> > +}
> > +
> > +static void rt_unlock(unsigned long flags)
> > +{
> > +#ifdef CONFIG_PREEMPT_RT
> > +	migrate_enable();
> > +#else
> > +	local_irq_restore(flags);
> > +#endif
> > +}
> > +
> >  static void mod_objcg_state(struct obj_cgroup *objcg, struct pglist_data *pgdat,
> >  		     enum node_stat_item idx, int nr)
> >  {
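
For reference, a minimal sketch of the local_lock() pattern referred to
above, instead of the open-coded rt_lock() helpers quoted here (illustrative
only, names like pcp_cache are made up; this is not the actual memcg_stock
code). On !PREEMPT_RT local_lock_irqsave() disables interrupts, on
PREEMPT_RT it takes a per-CPU rtmutex-based lock, which also serializes
re-entry on the same CPU:

#include <linux/local_lock.h>
#include <linux/percpu.h>

struct pcp_cache {
	local_lock_t	lock;
	int		nr_bytes;
};

static DEFINE_PER_CPU(struct pcp_cache, pcp_cache) = {
	.lock = INIT_LOCAL_LOCK(lock),
};

static void pcp_cache_add(int nr)
{
	unsigned long flags;

	local_lock_irqsave(&pcp_cache.lock, flags);
	this_cpu_add(pcp_cache.nr_bytes, nr);	/* per-CPU update under the lock */
	local_unlock_irqrestore(&pcp_cache.lock, flags);
}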

Sebastian


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFC PATCH 00/10] memcg: stock code cleanups
  2025-03-14  6:15 [RFC PATCH 00/10] memcg: stock code cleanups Shakeel Butt
                   ` (9 preceding siblings ...)
  2025-03-14  6:15 ` [RFC PATCH 10/10] memcg: no more irq disabling for stock locks Shakeel Butt
@ 2025-03-14 13:33 ` Vlastimil Babka
  2025-03-14 16:03   ` Shakeel Butt
  10 siblings, 1 reply; 34+ messages in thread
From: Vlastimil Babka @ 2025-03-14 13:33 UTC (permalink / raw)
  To: Shakeel Butt, Andrew Morton
  Cc: Johannes Weiner, Michal Hocko, Roman Gushchin, Muchun Song,
	Sebastian Andrzej Siewior, linux-mm, cgroups, linux-kernel,
	Meta kernel team

On 3/14/25 07:15, Shakeel Butt wrote:
> This is a cleanup series (first 7 patches) is trying to simplify the
> memcg stock code, particularly it tries to remove unnecessary
> dependencies. The last 3 patches are a prototype to make per-cpu memcg
> stock work without disabling irqs.
> 
> My plan is to send out the first 7 cleanup patches separately for the
> next release window and iterate more on the last 3 patches plus add
> functionality for multiple memcgs.
> 
> This series is based on next-20250313 plus two following patches:
> 
> Link: https://lore.kernel.org/all/20250312222552.3284173-1-shakeel.butt@linux.dev/
> Link: https://lore.kernel.org/all/20250313054812.2185900-1-shakeel.butt@linux.dev/
> 
>  to simply the memcg stock code

Hi, 

I've been looking at this area too, and noticed a different opportunity for
cleanup+perf improvement yesterday. I rebased it on top of patch 7 as it
would make sense to do it before changing the locking - it reduces the
number of places where the local_trylock is taken. If it's ok to you, please
incorporate to your series.

Thanks,
Vlastimil

----8<----
From 8dda8e4225ee91b48a73759f727a28d448c634b5 Mon Sep 17 00:00:00 2001
From: Vlastimil Babka <vbabka@suse.cz>
Date: Thu, 13 Mar 2025 20:08:32 +0100
Subject: [PATCH] memcg: combine slab obj stock charging and accounting

When handling slab objects, we use obj_cgroup_[un]charge() for
(un)charging and mod_objcg_state() to account NR_SLAB_[UN]RECLAIMABLE_B.
All these operations use the percpu stock for performance. However with
the calls being separate, the stock_lock is taken twice in each case.

By refactoring the code, we can turn mod_objcg_state() into
__account_obj_stock() which is called on a stock that's already locked
and validated. On the charging side we can call this function from
consume_obj_stock() when it succeeds, and refill_obj_stock() in the
fallback. We just expand parameters of these functions as necessary.
The uncharge side from __memcg_slab_free_hook() is just the call to
refill_obj_stock().

Other callers of obj_cgroup_[un]charge() (i.e. not slab) simply pass the
extra parameters as NULL/zeroes to skip the __account_obj_stock()
operation.

In __memcg_slab_post_alloc_hook() we now charge each object separately,
but that's not a problem as we did call mod_objcg_state() for each
object separately, and most allocations are non-bulk anyway. This
could be improved by batching all operations until slab_pgdat(slab)
changes.

Some preliminary benchmarking with a kfree(kmalloc()) loop of 10M
iterations with/without __GFP_ACCOUNT:

Before the patch:
kmalloc/kfree !memcg:    581390144 cycles
kmalloc/kfree memcg:     783689984 cycles

After the patch:
kmalloc/kfree memcg:     658723808 cycles

More than half of the overhead of __GFP_ACCOUNT relative to
non-accounted case seems eliminated.

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
 mm/memcontrol.c | 77 +++++++++++++++++++++++++++++--------------------
 1 file changed, 46 insertions(+), 31 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index dfe9c2eb7816..553eb1d7250a 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2753,25 +2753,17 @@ static void replace_stock_objcg(struct memcg_stock_pcp *stock,
 	WRITE_ONCE(stock->cached_objcg, objcg);
 }
 
-static void mod_objcg_state(struct obj_cgroup *objcg, struct pglist_data *pgdat,
-		     enum node_stat_item idx, int nr)
+static void __account_obj_stock(struct obj_cgroup *objcg,
+				struct memcg_stock_pcp *stock, int nr,
+				struct pglist_data *pgdat, enum node_stat_item idx)
 {
-	struct memcg_stock_pcp *stock;
-	unsigned long flags;
 	int *bytes;
 
-	localtry_lock_irqsave(&memcg_stock.stock_lock, flags);
-	stock = this_cpu_ptr(&memcg_stock);
-
 	/*
 	 * Save vmstat data in stock and skip vmstat array update unless
-	 * accumulating over a page of vmstat data or when pgdat or idx
-	 * changes.
+	 * accumulating over a page of vmstat data or when pgdat changes.
 	 */
-	if (READ_ONCE(stock->cached_objcg) != objcg) {
-		replace_stock_objcg(stock, objcg);
-		stock->cached_pgdat = pgdat;
-	} else if (stock->cached_pgdat != pgdat) {
+	if (stock->cached_pgdat != pgdat) {
 		/* Flush the existing cached vmstat data */
 		struct pglist_data *oldpg = stock->cached_pgdat;
 
@@ -2808,11 +2800,10 @@ static void mod_objcg_state(struct obj_cgroup *objcg, struct pglist_data *pgdat,
 	}
 	if (nr)
 		__mod_objcg_mlstate(objcg, pgdat, idx, nr);
-
-	localtry_unlock_irqrestore(&memcg_stock.stock_lock, flags);
 }
 
-static bool consume_obj_stock(struct obj_cgroup *objcg, unsigned int nr_bytes)
+static bool consume_obj_stock(struct obj_cgroup *objcg, unsigned int nr_bytes,
+			      struct pglist_data *pgdat, enum node_stat_item idx)
 {
 	struct memcg_stock_pcp *stock;
 	unsigned long flags;
@@ -2824,6 +2815,9 @@ static bool consume_obj_stock(struct obj_cgroup *objcg, unsigned int nr_bytes)
 	if (objcg == READ_ONCE(stock->cached_objcg) && stock->nr_bytes >= nr_bytes) {
 		stock->nr_bytes -= nr_bytes;
 		ret = true;
+
+		if (pgdat)
+			__account_obj_stock(objcg, stock, nr_bytes, pgdat, idx);
 	}
 
 	localtry_unlock_irqrestore(&memcg_stock.stock_lock, flags);
@@ -2908,7 +2902,8 @@ static bool obj_stock_flush_required(struct memcg_stock_pcp *stock,
 }
 
 static void refill_obj_stock(struct obj_cgroup *objcg, unsigned int nr_bytes,
-			     bool allow_uncharge)
+		bool allow_uncharge, int nr_acct, struct pglist_data *pgdat,
+		enum node_stat_item idx)
 {
 	struct memcg_stock_pcp *stock;
 	unsigned long flags;
@@ -2923,6 +2918,9 @@ static void refill_obj_stock(struct obj_cgroup *objcg, unsigned int nr_bytes,
 	}
 	stock->nr_bytes += nr_bytes;
 
+	if (pgdat)
+		__account_obj_stock(objcg, stock, nr_acct, pgdat, idx);
+
 	if (allow_uncharge && (stock->nr_bytes > PAGE_SIZE)) {
 		nr_pages = stock->nr_bytes >> PAGE_SHIFT;
 		stock->nr_bytes &= (PAGE_SIZE - 1);
@@ -2934,12 +2932,13 @@ static void refill_obj_stock(struct obj_cgroup *objcg, unsigned int nr_bytes,
 		obj_cgroup_uncharge_pages(objcg, nr_pages);
 }
 
-int obj_cgroup_charge(struct obj_cgroup *objcg, gfp_t gfp, size_t size)
+static int obj_cgroup_charge_account(struct obj_cgroup *objcg, gfp_t gfp, size_t size,
+				     struct pglist_data *pgdat, enum node_stat_item idx)
 {
 	unsigned int nr_pages, nr_bytes;
 	int ret;
 
-	if (consume_obj_stock(objcg, size))
+	if (likely(consume_obj_stock(objcg, size, pgdat, idx)))
 		return 0;
 
 	/*
@@ -2972,15 +2971,21 @@ int obj_cgroup_charge(struct obj_cgroup *objcg, gfp_t gfp, size_t size)
 		nr_pages += 1;
 
 	ret = obj_cgroup_charge_pages(objcg, gfp, nr_pages);
-	if (!ret && nr_bytes)
-		refill_obj_stock(objcg, PAGE_SIZE - nr_bytes, false);
+	if (!ret && (nr_bytes || pgdat))
+		refill_obj_stock(objcg, nr_bytes ? PAGE_SIZE - nr_bytes : 0,
+					 false, size, pgdat, idx);
 
 	return ret;
 }
 
+int obj_cgroup_charge(struct obj_cgroup *objcg, gfp_t gfp, size_t size)
+{
+	return obj_cgroup_charge_account(objcg, gfp, size, NULL, 0);
+}
+
 void obj_cgroup_uncharge(struct obj_cgroup *objcg, size_t size)
 {
-	refill_obj_stock(objcg, size, true);
+	refill_obj_stock(objcg, size, true, 0, NULL, 0);
 }
 
 static inline size_t obj_full_size(struct kmem_cache *s)
@@ -3032,23 +3037,32 @@ bool __memcg_slab_post_alloc_hook(struct kmem_cache *s, struct list_lru *lru,
 			return false;
 	}
 
-	if (obj_cgroup_charge(objcg, flags, size * obj_full_size(s)))
-		return false;
-
 	for (i = 0; i < size; i++) {
 		slab = virt_to_slab(p[i]);
 
 		if (!slab_obj_exts(slab) &&
 		    alloc_slab_obj_exts(slab, s, flags, false)) {
-			obj_cgroup_uncharge(objcg, obj_full_size(s));
 			continue;
 		}
 
+		/*
+		 * if we fail and size is 1, memcg_alloc_abort_single() will
+		 * just free the object, which is ok as we have not assigned
+		 * objcg to its obj_ext yet
+		 *
+		 * for larger sizes, kmem_cache_free_bulk() will uncharge
+		 * any objects that were already charged and obj_ext assigned
+		 *
+		 * TODO: we could batch this until slab_pgdat(slab) changes
+		 * between iterations, with a more complicated undo
+		 */
+		if (obj_cgroup_charge_account(objcg, flags, obj_full_size(s),
+					slab_pgdat(slab), cache_vmstat_idx(s)))
+			return false;
+
 		off = obj_to_index(s, slab, p[i]);
 		obj_cgroup_get(objcg);
 		slab_obj_exts(slab)[off].objcg = objcg;
-		mod_objcg_state(objcg, slab_pgdat(slab),
-				cache_vmstat_idx(s), obj_full_size(s));
 	}
 
 	return true;
@@ -3057,6 +3071,8 @@ bool __memcg_slab_post_alloc_hook(struct kmem_cache *s, struct list_lru *lru,
 void __memcg_slab_free_hook(struct kmem_cache *s, struct slab *slab,
 			    void **p, int objects, struct slabobj_ext *obj_exts)
 {
+	size_t obj_size = obj_full_size(s);
+
 	for (int i = 0; i < objects; i++) {
 		struct obj_cgroup *objcg;
 		unsigned int off;
@@ -3067,9 +3083,8 @@ void __memcg_slab_free_hook(struct kmem_cache *s, struct slab *slab,
 			continue;
 
 		obj_exts[off].objcg = NULL;
-		obj_cgroup_uncharge(objcg, obj_full_size(s));
-		mod_objcg_state(objcg, slab_pgdat(slab), cache_vmstat_idx(s),
-				-obj_full_size(s));
+		refill_obj_stock(objcg, obj_size, true, -obj_size,
+				 slab_pgdat(slab), cache_vmstat_idx(s));
 		obj_cgroup_put(objcg);
 	}
 }
-- 
2.48.1
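
For reference, a sketch of the kind of benchmark loop described in the
commit message above (an assumption, the actual test code is not part of
this mail): a throwaway module timing 10M kfree(kmalloc()) iterations, with
or without __GFP_ACCOUNT.

#include <linux/module.h>
#include <linux/slab.h>
#include <linux/timex.h>

static int __init kmalloc_bench_init(void)
{
	/* drop __GFP_ACCOUNT for the !memcg case */
	const gfp_t gfp = GFP_KERNEL | __GFP_ACCOUNT;
	cycles_t start, end;
	long i;

	start = get_cycles();
	for (i = 0; i < 10 * 1000 * 1000L; i++)
		kfree(kmalloc(64, gfp));
	end = get_cycles();

	pr_info("kmalloc/kfree: %llu cycles\n", (unsigned long long)(end - start));

	/* fail the load on purpose, the module only exists for the measurement */
	return -EAGAIN;
}
module_init(kmalloc_bench_init);

MODULE_LICENSE("GPL");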




^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFC PATCH 05/10] memcg: no refilling stock from obj_cgroup_release
  2025-03-14 11:26   ` Sebastian Andrzej Siewior
@ 2025-03-14 15:25     ` Shakeel Butt
  0 siblings, 0 replies; 34+ messages in thread
From: Shakeel Butt @ 2025-03-14 15:25 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Andrew Morton, Johannes Weiner, Michal Hocko, Roman Gushchin,
	Muchun Song, Vlastimil Babka, linux-mm, cgroups, linux-kernel,
	Meta kernel team

On Fri, Mar 14, 2025 at 12:26:27PM +0100, Sebastian Andrzej Siewior wrote:
> On 2025-03-13 23:15:06 [-0700], Shakeel Butt wrote:
> > obj_cgroup_release is called when all the references to the objcg has
> 
> "references to the objcg have"
> 
> > been released i.e. no more memory objects are pointing to it. Most
> > probably objcg->memcg will be pointing to some ancestor memcg and at the
> > moment, in obj_cgroup_release, the kernel call
> > obj_cgroup_uncharge_pages() to uncharge last remaining memory.
> 
> This sounds somehow funny. I think the point is to uncharge the pages
> without tampering memcg_stock because it is unnecessary.
> 

Thanks, I will try to make the point clearer in the next version.


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFC PATCH 06/10] memcg: do obj_cgroup_put inside drain_obj_stock
  2025-03-14 11:35     ` Sebastian Andrzej Siewior
@ 2025-03-14 15:29       ` Shakeel Butt
  0 siblings, 0 replies; 34+ messages in thread
From: Shakeel Butt @ 2025-03-14 15:29 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Vlastimil Babka, Andrew Morton, Johannes Weiner, Michal Hocko,
	Roman Gushchin, Muchun Song, linux-mm, cgroups, linux-kernel,
	Meta kernel team

On Fri, Mar 14, 2025 at 12:35:33PM +0100, Sebastian Andrzej Siewior wrote:
> On 2025-03-14 11:17:28 [+0100], Vlastimil Babka wrote:
> > On 3/14/25 07:15, Shakeel Butt wrote:
> > > Previously we could not call obj_cgroup_put() inside the local lock
> > > because on the put on the last reference, the release function
> > > obj_cgroup_release() may try to re-acquire the local lock. However that
> > > chain has been broken. Now simply do obj_cgroup_put() inside
> > > drain_obj_stock() instead of returning the old objcg.
> > > 
> > > Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
> > 
> > Hm is this really safe? I can see obj_cgroup_release() doing
> > percpu_ref_exit() -> kfree(), do we have guaranteed that allocation won't be
> > also in a kmemcg and recurse?
> 
> This was like this until commit
> 	5675114623872 ("mm/memcg: protect memcg_stock with a local_lock_t")
> 
> at which point the put had to happen outside. This "percpu_ref_exit() ->
> kfree()" was also prior this commit.

Yes, as Sebastian said, this is as safe as before commit 567511462387.
Also the ref->data that gets kfree()'d from percpu_ref_exit() is not a
__GFP_ACCOUNT allocation, so it can't recurse.
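
For reference, the allocation side looks roughly like this (quoted from
memory, details may differ); note the plain GFP_KERNEL, without
__GFP_ACCOUNT, for the refcount init:

static struct obj_cgroup *obj_cgroup_alloc(void)
{
	struct obj_cgroup *objcg;
	int ret;

	objcg = kzalloc(sizeof(struct obj_cgroup), GFP_KERNEL);
	if (!objcg)
		return NULL;

	/* percpu_ref_init() allocates ref->data with this gfp as well */
	ret = percpu_ref_init(&objcg->refcnt, obj_cgroup_release, 0,
			      GFP_KERNEL);
	if (ret) {
		kfree(objcg);
		return NULL;
	}
	INIT_LIST_HEAD(&objcg->list);
	return objcg;
}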


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFC PATCH 09/10] memcg: trylock stock for objcg
  2025-03-14 11:47   ` Sebastian Andrzej Siewior
@ 2025-03-14 15:33     ` Shakeel Butt
  0 siblings, 0 replies; 34+ messages in thread
From: Shakeel Butt @ 2025-03-14 15:33 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Andrew Morton, Johannes Weiner, Michal Hocko, Roman Gushchin,
	Muchun Song, Vlastimil Babka, linux-mm, cgroups, linux-kernel,
	Meta kernel team

On Fri, Mar 14, 2025 at 12:47:00PM +0100, Sebastian Andrzej Siewior wrote:
> On 2025-03-13 23:15:10 [-0700], Shakeel Butt wrote:
> > To make objcg stock functions work without disabling irq, we need to
> > convert those function to use localtry_trylock_irqsave() instead of
> > localtry_lock_irqsave(). This patch for now just does the conversion and
> > later patch will eliminate the irq disabling code.
> > 
> > Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
> > ---
> >  mm/memcontrol.c | 14 +++++++++++---
> >  1 file changed, 11 insertions(+), 3 deletions(-)
> > 
> > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > index c803d2f5e322..ba5d004049d3 100644
> > --- a/mm/memcontrol.c
> > +++ b/mm/memcontrol.c
> > @@ -2764,7 +2764,11 @@ static void mod_objcg_state(struct obj_cgroup *objcg, struct pglist_data *pgdat,
> >  	unsigned long flags;
> >  	int *bytes;
> >  
> > -	localtry_lock_irqsave(&memcg_stock.stock_lock, flags);
> > +	if (!localtry_trylock_irqsave(&memcg_stock.stock_lock, flags)) {
> 
> Don't you need to change the of memcg_stock.stock_lock? Didn't we
> introduce an explicit different type for this trylock feature?
> 

Yes, Alexei has already changed the type of this exact lock at [1].

[1] https://lore.kernel.org/r/20250222024427.30294-5-alexei.starovoitov@gmail.com


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFC PATCH 10/10] memcg: no more irq disabling for stock locks
  2025-03-14 11:58     ` Sebastian Andrzej Siewior
@ 2025-03-14 15:55       ` Shakeel Butt
  2025-03-14 16:42         ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 34+ messages in thread
From: Shakeel Butt @ 2025-03-14 15:55 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Vlastimil Babka, Andrew Morton, Johannes Weiner, Michal Hocko,
	Roman Gushchin, Muchun Song, linux-mm, cgroups, linux-kernel,
	Meta kernel team

On Fri, Mar 14, 2025 at 12:58:02PM +0100, Sebastian Andrzej Siewior wrote:
> On 2025-03-14 11:54:34 [+0100], Vlastimil Babka wrote:
> > On 3/14/25 07:15, Shakeel Butt wrote:
> > > Let's switch all memcg_stock locks acquire and release places to not
> > > disable and enable irqs. There are two still functions (i.e.
> > > mod_objcg_state() and drain_obj_stock) which needs to disable irqs to
> > > update the stats on non-RT kernels. For now add a simple wrapper for
> > > that functionality.
> > 
> > BTW, which part of __mod_objcg_mlstate() really needs disabled irqs and not
> > just preemption? I see it does rcu_read_lock() anyway, which disables
> > preemption. Then in __mod_memcg_lruvec_state() we do some __this_cpu_add()
> > updates. I think these also are fine with just disabled preemption as they
> > are atomic vs irqs (but don't need LOCK prefix to be atomic vs other cpus
> > updates).
> 
> __this_cpu_add() is not safe if also used in interrupt context. Some
> architectures (not x86) implemented as read, add, write.
> this_cpu_add()() does the same but disables interrupts during the
> operation.
> So __this_cpu_add() should not be used if interrupts are not disabled
> and a modification can happen from interrupt context.

So, if I use this_cpu_add() instead of __this_cpu_add() in
__mod_memcg_state(), __mod_memcg_lruvec_state(), __count_memcg_events()
then I can call these functions without disabling interrupts. Also
this_cpu_add() does not disable interrupts for x86 and arm64, correct?
For x86 and arm64, can I assume that the cost of this_cpu_add() is the
same as __this_cpu_add()?

> 
> > Is it just memcg_rstat_updated() which does READ_ONCE/WRITE_ONCE? Could we
> > perhaps just change it to operations where disabled preemption is enough?

Yes, I will look into it.

> > 
> > > Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
> …
> > > @@ -2757,6 +2745,28 @@ static void replace_stock_objcg(struct memcg_stock_pcp *stock,
> > >  	WRITE_ONCE(stock->cached_objcg, objcg);
> > >  }
> > >  
> > > +static unsigned long rt_lock(void)
> > > +{
> 
> No, we don't name it rt_lock(). We have local_lock() for this exact
> reason. And migrate_disable() does not protect vs re-enter of the
> function on the CPU while local_irq_save() does.

Thanks for the clarification. Here I will do nothing for RT kernels and
disable interrupts for non-RT kernels. (Let's see how the other
conversation goes, maybe we can remove the interrupt disabling requirement.)
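
A minimal sketch of that interim helper (hypothetical names, and it may go
away entirely if the irq-disable requirement can be dropped):

static unsigned long stock_stats_lock(void)
{
	unsigned long flags = 0;

	if (!IS_ENABLED(CONFIG_PREEMPT_RT))
		local_irq_save(flags);
	return flags;
}

static void stock_stats_unlock(unsigned long flags)
{
	if (!IS_ENABLED(CONFIG_PREEMPT_RT))
		local_irq_restore(flags);
}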


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFC PATCH 00/10] memcg: stock code cleanups
  2025-03-14 13:33 ` [RFC PATCH 00/10] memcg: stock code cleanups Vlastimil Babka
@ 2025-03-14 16:03   ` Shakeel Butt
  0 siblings, 0 replies; 34+ messages in thread
From: Shakeel Butt @ 2025-03-14 16:03 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Andrew Morton, Johannes Weiner, Michal Hocko, Roman Gushchin,
	Muchun Song, Sebastian Andrzej Siewior, linux-mm, cgroups,
	linux-kernel, Meta kernel team

On Fri, Mar 14, 2025 at 02:33:28PM +0100, Vlastimil Babka wrote:
> On 3/14/25 07:15, Shakeel Butt wrote:
> > This is a cleanup series (first 7 patches) is trying to simplify the
> > memcg stock code, particularly it tries to remove unnecessary
> > dependencies. The last 3 patches are a prototype to make per-cpu memcg
> > stock work without disabling irqs.
> > 
> > My plan is to send out the first 7 cleanup patches separately for the
> > next release window and iterate more on the last 3 patches plus add
> > functionality for multiple memcgs.
> > 
> > This series is based on next-20250313 plus two following patches:
> > 
> > Link: https://lore.kernel.org/all/20250312222552.3284173-1-shakeel.butt@linux.dev/
> > Link: https://lore.kernel.org/all/20250313054812.2185900-1-shakeel.butt@linux.dev/
> > 
> >  to simply the memcg stock code
> 
> Hi, 
> 
> I've been looking at this area too, and noticed a different opportunity for
> cleanup+perf improvement yesterday. I rebased it on top of patch 7 as it
> would make sense to do it before changing the locking - it reduces the
> number of places where the local_trylock is taken. If it's ok to you, please
> incorporate to your series.

Thanks a lot Vlastimil, I will take it and add a review tag after
reviewing it.

Andrew, I will post only the cleanup series soon. It will be based on
next-20250314. If you decide to take it for the upcoming open window
(6.15-rc), then some coordination with the bpf tree would be needed.
However, for the next window (6.16-rc), I don't expect conflicts.


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFC PATCH 10/10] memcg: no more irq disabling for stock locks
  2025-03-14 15:55       ` Shakeel Butt
@ 2025-03-14 16:42         ` Sebastian Andrzej Siewior
  2025-03-14 17:02           ` Shakeel Butt
  0 siblings, 1 reply; 34+ messages in thread
From: Sebastian Andrzej Siewior @ 2025-03-14 16:42 UTC (permalink / raw)
  To: Shakeel Butt
  Cc: Vlastimil Babka, Andrew Morton, Johannes Weiner, Michal Hocko,
	Roman Gushchin, Muchun Song, linux-mm, cgroups, linux-kernel,
	Meta kernel team

On 2025-03-14 08:55:51 [-0700], Shakeel Butt wrote:
> On Fri, Mar 14, 2025 at 12:58:02PM +0100, Sebastian Andrzej Siewior wrote:
> > On 2025-03-14 11:54:34 [+0100], Vlastimil Babka wrote:
> > > On 3/14/25 07:15, Shakeel Butt wrote:
> > > > Let's switch all memcg_stock locks acquire and release places to not
> > > > disable and enable irqs. There are two still functions (i.e.
> > > > mod_objcg_state() and drain_obj_stock) which needs to disable irqs to
> > > > update the stats on non-RT kernels. For now add a simple wrapper for
> > > > that functionality.
> > > 
> > > BTW, which part of __mod_objcg_mlstate() really needs disabled irqs and not
> > > just preemption? I see it does rcu_read_lock() anyway, which disables
> > > preemption. Then in __mod_memcg_lruvec_state() we do some __this_cpu_add()
> > > updates. I think these also are fine with just disabled preemption as they
> > > are atomic vs irqs (but don't need LOCK prefix to be atomic vs other cpus
> > > updates).
> > 
> > __this_cpu_add() is not safe if also used in interrupt context. Some
> > architectures (not x86) implemented as read, add, write.
> > this_cpu_add()() does the same but disables interrupts during the
> > operation.
> > So __this_cpu_add() should not be used if interrupts are not disabled
> > and a modification can happen from interrupt context.
> 
> So, if I use this_cpu_add() instead of __this_cpu_add() in
> __mod_memcg_state(), __mod_memcg_lruvec_state(), __count_memcg_events()
> then I can call these functions without disabling interrupts. Also
> this_cpu_add() does not disable interrupts for x86 and arm64, correct?
> For x86 and arm64, can I assume that the cost of this_cpu_add() is the
> same as __this_cpu_add()?

On arm64, __this_cpu_add() is "load, add, store", and it is preemptible.
this_cpu_add() is "disable preemption, atomic-load, add, atomic-store (or
start over with the atomic-load); if it succeeded, enable preemption and
move on".

So no, it is not the same. On x86 it is possible to increment a memory
value directly with one opcode, so you get preempted either before or
after that operation.

Sebastian


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFC PATCH 10/10] memcg: no more irq disabling for stock locks
  2025-03-14 16:42         ` Sebastian Andrzej Siewior
@ 2025-03-14 17:02           ` Shakeel Butt
  2025-03-14 17:38             ` Shakeel Butt
  2025-03-14 18:19             ` Sebastian Andrzej Siewior
  0 siblings, 2 replies; 34+ messages in thread
From: Shakeel Butt @ 2025-03-14 17:02 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Vlastimil Babka, Andrew Morton, Johannes Weiner, Michal Hocko,
	Roman Gushchin, Muchun Song, linux-mm, cgroups, linux-kernel,
	Meta kernel team

On Fri, Mar 14, 2025 at 05:42:34PM +0100, Sebastian Andrzej Siewior wrote:
> On 2025-03-14 08:55:51 [-0700], Shakeel Butt wrote:
> > On Fri, Mar 14, 2025 at 12:58:02PM +0100, Sebastian Andrzej Siewior wrote:
> > > On 2025-03-14 11:54:34 [+0100], Vlastimil Babka wrote:
> > > > On 3/14/25 07:15, Shakeel Butt wrote:
> > > > > Let's switch all memcg_stock locks acquire and release places to not
> > > > > disable and enable irqs. There are two still functions (i.e.
> > > > > mod_objcg_state() and drain_obj_stock) which needs to disable irqs to
> > > > > update the stats on non-RT kernels. For now add a simple wrapper for
> > > > > that functionality.
> > > > 
> > > > BTW, which part of __mod_objcg_mlstate() really needs disabled irqs and not
> > > > just preemption? I see it does rcu_read_lock() anyway, which disables
> > > > preemption. Then in __mod_memcg_lruvec_state() we do some __this_cpu_add()
> > > > updates. I think these also are fine with just disabled preemption as they
> > > > are atomic vs irqs (but don't need LOCK prefix to be atomic vs other cpus
> > > > updates).
> > > 
> > > __this_cpu_add() is not safe if also used in interrupt context. Some
> > > architectures (not x86) implemented as read, add, write.
> > > this_cpu_add()() does the same but disables interrupts during the
> > > operation.
> > > So __this_cpu_add() should not be used if interrupts are not disabled
> > > and a modification can happen from interrupt context.
> > 
> > So, if I use this_cpu_add() instead of __this_cpu_add() in
> > __mod_memcg_state(), __mod_memcg_lruvec_state(), __count_memcg_events()
> > then I can call these functions without disabling interrupts. Also
> > this_cpu_add() does not disable interrupts for x86 and arm64, correct?
> > For x86 and arm64, can I assume that the cost of this_cpu_add() is the
> > same as __this_cpu_add()?
> 
> on arm64, __this_cpu_add will "load, add, store". preemptible.
> this_cpu_add() will "disable preemption, atomic-load, add, atomic-store or
> start over with atomic-load. if succeeded enable preemption and move an"

So, this_cpu_add() on arm64 is not protected against interrupts but is
protected against preemption. We have the following comment in
include/linux/percpu-defs.h. Is this not true anymore?

/*
 * Operations with implied preemption/interrupt protection.  These
 * operations can be used without worrying about preemption or interrupt.
 */
...
#define this_cpu_add(pcp, val)          __pcpu_size_call(this_cpu_add_, pcp, val)

> 
> so no, this is not the same. On x86 it is possible to increment a memory
> value directly with one opcode so you get preempted either before or
> after that operation.
> 
> Sebastian


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFC PATCH 10/10] memcg: no more irq disabling for stock locks
  2025-03-14 17:02           ` Shakeel Butt
@ 2025-03-14 17:38             ` Shakeel Butt
  2025-03-14 18:19             ` Sebastian Andrzej Siewior
  1 sibling, 0 replies; 34+ messages in thread
From: Shakeel Butt @ 2025-03-14 17:38 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Vlastimil Babka, Tejun Heo, Andrew Morton, Johannes Weiner,
	Michal Hocko, Roman Gushchin, Muchun Song, linux-mm, cgroups,
	linux-kernel, Meta kernel team

On Fri, Mar 14, 2025 at 10:02:47AM -0700, Shakeel Butt wrote:
> On Fri, Mar 14, 2025 at 05:42:34PM +0100, Sebastian Andrzej Siewior wrote:
> > On 2025-03-14 08:55:51 [-0700], Shakeel Butt wrote:
> > > On Fri, Mar 14, 2025 at 12:58:02PM +0100, Sebastian Andrzej Siewior wrote:
> > > > On 2025-03-14 11:54:34 [+0100], Vlastimil Babka wrote:
> > > > > On 3/14/25 07:15, Shakeel Butt wrote:
> > > > > > Let's switch all memcg_stock locks acquire and release places to not
> > > > > > disable and enable irqs. There are two still functions (i.e.
> > > > > > mod_objcg_state() and drain_obj_stock) which needs to disable irqs to
> > > > > > update the stats on non-RT kernels. For now add a simple wrapper for
> > > > > > that functionality.
> > > > > 
> > > > > BTW, which part of __mod_objcg_mlstate() really needs disabled irqs and not
> > > > > just preemption? I see it does rcu_read_lock() anyway, which disables
> > > > > preemption. Then in __mod_memcg_lruvec_state() we do some __this_cpu_add()
> > > > > updates. I think these also are fine with just disabled preemption as they
> > > > > are atomic vs irqs (but don't need LOCK prefix to be atomic vs other cpus
> > > > > updates).
> > > > 
> > > > __this_cpu_add() is not safe if also used in interrupt context. Some
> > > > architectures (not x86) implemented as read, add, write.
> > > > this_cpu_add()() does the same but disables interrupts during the
> > > > operation.
> > > > So __this_cpu_add() should not be used if interrupts are not disabled
> > > > and a modification can happen from interrupt context.
> > > 
> > > So, if I use this_cpu_add() instead of __this_cpu_add() in
> > > __mod_memcg_state(), __mod_memcg_lruvec_state(), __count_memcg_events()
> > > then I can call these functions without disabling interrupts. Also
> > > this_cpu_add() does not disable interrupts for x86 and arm64, correct?
> > > For x86 and arm64, can I assume that the cost of this_cpu_add() is the
> > > same as __this_cpu_add()?
> > 
> > on arm64, __this_cpu_add will "load, add, store". preemptible.
> > this_cpu_add() will "disable preemption, atomic-load, add, atomic-store or
> > start over with atomic-load. if succeeded enable preemption and move an"
> 
> So, this_cpu_add() on arm64 is not protected against interrupts but is
> protected against preemption. We have a following comment in
> include/linux/percpu-defs.h. Is this not true anymore?
> 
> /*
>  * Operations with implied preemption/interrupt protection.  These
>  * operations can be used without worrying about preemption or interrupt.
>  */
> ...
> #define this_cpu_add(pcp, val)          __pcpu_size_call(this_cpu_add_, pcp, val)
> 

Just got clarification from Johannes & Tejun that this_cpu_add() is
indeed safe against irqs on arm64 as well. Basically arm64 uses a loop of
Load Exclusive and Store Exclusive instructions to protect against irqs.
This is defined in the __PERCPU_OP_CASE() macro in arch/arm64/include/asm/percpu.h.
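
Illustrative sketch of that loop (hand-simplified from what
__PERCPU_OP_CASE() expands to, not the exact kernel code). The stxr fails
and the loop retries if the exclusive monitor saw any other store to the
location in between, including one from an interrupt handler on the same
CPU:

static inline void percpu_add_llsc(unsigned long *ptr, unsigned long val)
{
	unsigned long tmp;
	unsigned int fail;

	asm volatile(
	"1:	ldxr	%[tmp], %[mem]\n"
	"	add	%[tmp], %[tmp], %[val]\n"
	"	stxr	%w[fail], %[tmp], %[mem]\n"
	"	cbnz	%w[fail], 1b\n"
	: [tmp] "=&r" (tmp), [fail] "=&r" (fail), [mem] "+Q" (*ptr)
	: [val] "r" (val));
}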


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFC PATCH 10/10] memcg: no more irq disabling for stock locks
  2025-03-14 17:02           ` Shakeel Butt
  2025-03-14 17:38             ` Shakeel Butt
@ 2025-03-14 18:19             ` Sebastian Andrzej Siewior
  1 sibling, 0 replies; 34+ messages in thread
From: Sebastian Andrzej Siewior @ 2025-03-14 18:19 UTC (permalink / raw)
  To: Shakeel Butt
  Cc: Vlastimil Babka, Andrew Morton, Johannes Weiner, Michal Hocko,
	Roman Gushchin, Muchun Song, linux-mm, cgroups, linux-kernel,
	Meta kernel team

On 2025-03-14 10:02:47 [-0700], Shakeel Butt wrote:
> > 
> > on arm64, __this_cpu_add will "load, add, store". preemptible.
> > this_cpu_add() will "disable preemption, atomic-load, add, atomic-store or
> > start over with atomic-load. if succeeded enable preemption and move an"
> 
> So, this_cpu_add() on arm64 is not protected against interrupts but is
> protected against preemption. We have a following comment in
> include/linux/percpu-defs.h. Is this not true anymore?

It performs an atomic update. So it loads exclusive from memory and then
stores conditionally; the store fails if the exclusive monitor observed
another access to this address in between. Disabling preemption is only
done to ensure that the operation happens on the local CPU and the task
does not get moved to another CPU during the operation. The concurrent
update to the same memory address from an interrupt will be caught by the
exclusive monitor.

The reason to remain on the same CPU is probably to ensure that
__this_cpu_add() in an IRQ-off region does not clash with an atomic
update performed elsewhere.

While looking at it, there is also the LSE extension, which turns this
into a single atomic add instruction.

Sebastian


^ permalink raw reply	[flat|nested] 34+ messages in thread

end of thread

Thread overview: 34+ messages
2025-03-14  6:15 [RFC PATCH 00/10] memcg: stock code cleanups Shakeel Butt
2025-03-14  6:15 ` [RFC PATCH 01/10] memcg: remove root memcg check from refill_stock Shakeel Butt
2025-03-14  6:15 ` [RFC PATCH 02/10] memcg: decouple drain_obj_stock from local stock Shakeel Butt
2025-03-14  9:57   ` Vlastimil Babka
2025-03-14  6:15 ` [RFC PATCH 03/10] memcg: introduce memcg_uncharge Shakeel Butt
2025-03-14 10:01   ` Vlastimil Babka
2025-03-14  6:15 ` [RFC PATCH 04/10] memcg: manually inline __refill_stock Shakeel Butt
2025-03-14 10:05   ` Vlastimil Babka
2025-03-14  6:15 ` [RFC PATCH 05/10] memcg: no refilling stock from obj_cgroup_release Shakeel Butt
2025-03-14 10:09   ` Vlastimil Babka
2025-03-14 11:26   ` Sebastian Andrzej Siewior
2025-03-14 15:25     ` Shakeel Butt
2025-03-14  6:15 ` [RFC PATCH 06/10] memcg: do obj_cgroup_put inside drain_obj_stock Shakeel Butt
2025-03-14 10:17   ` Vlastimil Babka
2025-03-14 11:35     ` Sebastian Andrzej Siewior
2025-03-14 15:29       ` Shakeel Butt
2025-03-14  6:15 ` [RFC PATCH 07/10] memcg: use __mod_memcg_state in drain_obj_stock Shakeel Butt
2025-03-14 10:27   ` Vlastimil Babka
2025-03-14 11:44     ` Sebastian Andrzej Siewior
2025-03-14 11:38   ` Sebastian Andrzej Siewior
2025-03-14  6:15 ` [RFC PATCH 08/10] memcg: assert in_task for couple of local_lock holders Shakeel Butt
2025-03-14  6:15 ` [RFC PATCH 09/10] memcg: trylock stock for objcg Shakeel Butt
2025-03-14 11:47   ` Sebastian Andrzej Siewior
2025-03-14 15:33     ` Shakeel Butt
2025-03-14  6:15 ` [RFC PATCH 10/10] memcg: no more irq disabling for stock locks Shakeel Butt
2025-03-14 10:54   ` Vlastimil Babka
2025-03-14 11:58     ` Sebastian Andrzej Siewior
2025-03-14 15:55       ` Shakeel Butt
2025-03-14 16:42         ` Sebastian Andrzej Siewior
2025-03-14 17:02           ` Shakeel Butt
2025-03-14 17:38             ` Shakeel Butt
2025-03-14 18:19             ` Sebastian Andrzej Siewior
2025-03-14 13:33 ` [RFC PATCH 00/10] memcg: stock code cleanups Vlastimil Babka
2025-03-14 16:03   ` Shakeel Butt
