* [PATCH 0/4] memcg: cleanup the memcg stats interfaces
@ 2025-11-10 23:20 Shakeel Butt
  2025-11-10 23:20 ` [PATCH 1/4] memcg: use mod_node_page_state to update stats Shakeel Butt
                   ` (8 more replies)
  0 siblings, 9 replies; 31+ messages in thread
From: Shakeel Butt @ 2025-11-10 23:20 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Johannes Weiner, Michal Hocko, Roman Gushchin, Muchun Song,
	Harry Yoo, Qi Zheng, Vlastimil Babka, linux-mm, cgroups,
	linux-kernel, Meta kernel team

The memcg stats are safe against irq (and nmi) context and thus do not
require disabling irqs. However, for some stats which are also
maintained at the node level, the memcg code is using the irq-unsafe
interface, which still requires the users to either disable irqs
themselves or call wrappers which explicitly disable irqs. Let's move
the memcg code to the irq-safe node-level stats function, which is
already optimized for architectures with HAVE_CMPXCHG_LOCAL (all major
ones), so there will be no performance penalty.
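
As a sketch of the change in calling convention (illustrative only;
the stat item and the pgdat are placeholders):

	/* before: node-level update is irq-unsafe, callers disable irqs */
	local_irq_save(flags);
	__mod_node_page_state(pgdat, NR_SHMEM, nr);
	local_irq_restore(flags);

	/* after: the update is irq-safe, no irq fiddling in the caller */
	mod_node_page_state(pgdat, NR_SHMEM, nr);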

Shakeel Butt (4):
  memcg: use mod_node_page_state to update stats
  memcg: remove __mod_lruvec_kmem_state
  memcg: remove __mod_lruvec_state
  memcg: remove __lruvec_stat_mod_folio

 include/linux/memcontrol.h | 28 ++++------------------
 include/linux/mm_inline.h  |  2 +-
 include/linux/vmstat.h     | 48 ++------------------------------------
 mm/filemap.c               | 20 ++++++++--------
 mm/huge_memory.c           |  4 ++--
 mm/khugepaged.c            |  8 +++----
 mm/memcontrol.c            | 20 ++++++++--------
 mm/migrate.c               | 20 ++++++++--------
 mm/page-writeback.c        |  2 +-
 mm/rmap.c                  |  4 ++--
 mm/shmem.c                 |  6 ++---
 mm/vmscan.c                |  4 ++--
 mm/workingset.c            |  2 +-
 13 files changed, 53 insertions(+), 115 deletions(-)

-- 
2.47.3




* [PATCH 1/4] memcg: use mod_node_page_state to update stats
  2025-11-10 23:20 [PATCH 0/4] memcg: cleanup the memcg stats interfaces Shakeel Butt
@ 2025-11-10 23:20 ` Shakeel Butt
  2025-11-11  1:39   ` Harry Yoo
  2025-11-11 18:58   ` Roman Gushchin
  2025-11-10 23:20 ` [PATCH 2/4] memcg: remove __mod_lruvec_kmem_state Shakeel Butt
                   ` (7 subsequent siblings)
  8 siblings, 2 replies; 31+ messages in thread
From: Shakeel Butt @ 2025-11-10 23:20 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Johannes Weiner, Michal Hocko, Roman Gushchin, Muchun Song,
	Harry Yoo, Qi Zheng, Vlastimil Babka, linux-mm, cgroups,
	linux-kernel, Meta kernel team

The memcg stats are safe against irq (and nmi) context and thus do not
require disabling irqs. However, some memcg stats code paths also
update the node-level stats through the irq-unsafe interface and thus
require the users to disable irqs. The node-level stats, on
architectures with HAVE_CMPXCHG_LOCAL (all major ones), have an
interface which does not require irq disabling. Let's move the memcg
stats code to that interface for the node-level stats.
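
On those architectures the irq-safe helper boils down to a local
cmpxchg loop over the per-cpu diff, roughly (heavily simplified from
mm/vmstat.c; the real code also folds the per-cpu threshold and
overstep handling into the loop):

	s8 o = this_cpu_read(*p);	/* per-cpu diff of this stat item */
	s8 n;

	do {
		n = o + delta;
		/* spill to the atomic node counter on overflow (elided) */
	} while (!this_cpu_try_cmpxchg(*p, &o, n));

The local cmpxchg is atomic with respect to irqs (and nmis) on the
same CPU, which is why no irq disabling is needed.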

Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
---
 include/linux/memcontrol.h | 2 +-
 include/linux/vmstat.h     | 4 ++--
 mm/memcontrol.c            | 6 +++---
 3 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 8c0f15e5978f..f82fac2fd988 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -1408,7 +1408,7 @@ static inline void __mod_lruvec_kmem_state(void *p, enum node_stat_item idx,
 {
 	struct page *page = virt_to_head_page(p);
 
-	__mod_node_page_state(page_pgdat(page), idx, val);
+	mod_node_page_state(page_pgdat(page), idx, val);
 }
 
 static inline void mod_lruvec_kmem_state(void *p, enum node_stat_item idx,
diff --git a/include/linux/vmstat.h b/include/linux/vmstat.h
index c287998908bf..11a37aaa4dd9 100644
--- a/include/linux/vmstat.h
+++ b/include/linux/vmstat.h
@@ -557,7 +557,7 @@ static inline void mod_lruvec_page_state(struct page *page,
 static inline void __mod_lruvec_state(struct lruvec *lruvec,
 				      enum node_stat_item idx, int val)
 {
-	__mod_node_page_state(lruvec_pgdat(lruvec), idx, val);
+	mod_node_page_state(lruvec_pgdat(lruvec), idx, val);
 }
 
 static inline void mod_lruvec_state(struct lruvec *lruvec,
@@ -569,7 +569,7 @@ static inline void mod_lruvec_state(struct lruvec *lruvec,
 static inline void __lruvec_stat_mod_folio(struct folio *folio,
 					 enum node_stat_item idx, int val)
 {
-	__mod_node_page_state(folio_pgdat(folio), idx, val);
+	mod_node_page_state(folio_pgdat(folio), idx, val);
 }
 
 static inline void lruvec_stat_mod_folio(struct folio *folio,
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 025da46d9959..f4b8a6414ed3 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -770,7 +770,7 @@ void __mod_lruvec_state(struct lruvec *lruvec, enum node_stat_item idx,
 			int val)
 {
 	/* Update node */
-	__mod_node_page_state(lruvec_pgdat(lruvec), idx, val);
+	mod_node_page_state(lruvec_pgdat(lruvec), idx, val);
 
 	/* Update memcg and lruvec */
 	if (!mem_cgroup_disabled())
@@ -789,7 +789,7 @@ void __lruvec_stat_mod_folio(struct folio *folio, enum node_stat_item idx,
 	/* Untracked pages have no memcg, no lruvec. Update only the node */
 	if (!memcg) {
 		rcu_read_unlock();
-		__mod_node_page_state(pgdat, idx, val);
+		mod_node_page_state(pgdat, idx, val);
 		return;
 	}
 
@@ -815,7 +815,7 @@ void __mod_lruvec_kmem_state(void *p, enum node_stat_item idx, int val)
 	 * vmstats to keep it correct for the root memcg.
 	 */
 	if (!memcg) {
-		__mod_node_page_state(pgdat, idx, val);
+		mod_node_page_state(pgdat, idx, val);
 	} else {
 		lruvec = mem_cgroup_lruvec(memcg, pgdat);
 		__mod_lruvec_state(lruvec, idx, val);
-- 
2.47.3




* [PATCH 2/4] memcg: remove __mod_lruvec_kmem_state
  2025-11-10 23:20 [PATCH 0/4] memcg: cleanup the memcg stats interfaces Shakeel Butt
  2025-11-10 23:20 ` [PATCH 1/4] memcg: use mod_node_page_state to update stats Shakeel Butt
@ 2025-11-10 23:20 ` Shakeel Butt
  2025-11-11  1:46   ` Harry Yoo
                     ` (2 more replies)
  2025-11-10 23:20 ` [PATCH 3/4] memcg: remove __mod_lruvec_state Shakeel Butt
                   ` (6 subsequent siblings)
  8 siblings, 3 replies; 31+ messages in thread
From: Shakeel Butt @ 2025-11-10 23:20 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Johannes Weiner, Michal Hocko, Roman Gushchin, Muchun Song,
	Harry Yoo, Qi Zheng, Vlastimil Babka, linux-mm, cgroups,
	linux-kernel, Meta kernel team

__mod_lruvec_kmem_state() is already safe against irqs, so there is no
need for a separate interface (i.e. mod_lruvec_kmem_state()) which
wraps calls to it with irq disabling and re-enabling. Let's rename
__mod_lruvec_kmem_state to mod_lruvec_kmem_state.

Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
---
 include/linux/memcontrol.h | 28 +++++-----------------------
 mm/memcontrol.c            |  2 +-
 mm/workingset.c            |  2 +-
 3 files changed, 7 insertions(+), 25 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index f82fac2fd988..1384a9d305e1 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -957,17 +957,7 @@ unsigned long lruvec_page_state_local(struct lruvec *lruvec,
 void mem_cgroup_flush_stats(struct mem_cgroup *memcg);
 void mem_cgroup_flush_stats_ratelimited(struct mem_cgroup *memcg);
 
-void __mod_lruvec_kmem_state(void *p, enum node_stat_item idx, int val);
-
-static inline void mod_lruvec_kmem_state(void *p, enum node_stat_item idx,
-					 int val)
-{
-	unsigned long flags;
-
-	local_irq_save(flags);
-	__mod_lruvec_kmem_state(p, idx, val);
-	local_irq_restore(flags);
-}
+void mod_lruvec_kmem_state(void *p, enum node_stat_item idx, int val);
 
 void count_memcg_events(struct mem_cgroup *memcg, enum vm_event_item idx,
 			unsigned long count);
@@ -1403,14 +1393,6 @@ static inline void mem_cgroup_flush_stats_ratelimited(struct mem_cgroup *memcg)
 {
 }
 
-static inline void __mod_lruvec_kmem_state(void *p, enum node_stat_item idx,
-					   int val)
-{
-	struct page *page = virt_to_head_page(p);
-
-	mod_node_page_state(page_pgdat(page), idx, val);
-}
-
 static inline void mod_lruvec_kmem_state(void *p, enum node_stat_item idx,
 					 int val)
 {
@@ -1470,14 +1452,14 @@ struct slabobj_ext {
 #endif
 } __aligned(8);
 
-static inline void __inc_lruvec_kmem_state(void *p, enum node_stat_item idx)
+static inline void inc_lruvec_kmem_state(void *p, enum node_stat_item idx)
 {
-	__mod_lruvec_kmem_state(p, idx, 1);
+	mod_lruvec_kmem_state(p, idx, 1);
 }
 
-static inline void __dec_lruvec_kmem_state(void *p, enum node_stat_item idx)
+static inline void dec_lruvec_kmem_state(void *p, enum node_stat_item idx)
 {
-	__mod_lruvec_kmem_state(p, idx, -1);
+	mod_lruvec_kmem_state(p, idx, -1);
 }
 
 static inline struct lruvec *parent_lruvec(struct lruvec *lruvec)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index f4b8a6414ed3..3a59d3ee92a7 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -799,7 +799,7 @@ void __lruvec_stat_mod_folio(struct folio *folio, enum node_stat_item idx,
 }
 EXPORT_SYMBOL(__lruvec_stat_mod_folio);
 
-void __mod_lruvec_kmem_state(void *p, enum node_stat_item idx, int val)
+void mod_lruvec_kmem_state(void *p, enum node_stat_item idx, int val)
 {
 	pg_data_t *pgdat = page_pgdat(virt_to_page(p));
 	struct mem_cgroup *memcg;
diff --git a/mm/workingset.c b/mm/workingset.c
index d32dc2e02a61..892f6fe94ea9 100644
--- a/mm/workingset.c
+++ b/mm/workingset.c
@@ -749,7 +749,7 @@ static enum lru_status shadow_lru_isolate(struct list_head *item,
 	if (WARN_ON_ONCE(node->count != node->nr_values))
 		goto out_invalid;
 	xa_delete_node(node, workingset_update_node);
-	__inc_lruvec_kmem_state(node, WORKINGSET_NODERECLAIM);
+	inc_lruvec_kmem_state(node, WORKINGSET_NODERECLAIM);
 
 out_invalid:
 	xa_unlock_irq(&mapping->i_pages);
-- 
2.47.3




* [PATCH 3/4] memcg: remove __mod_lruvec_state
  2025-11-10 23:20 [PATCH 0/4] memcg: cleanup the memcg stats interfaces Shakeel Butt
  2025-11-10 23:20 ` [PATCH 1/4] memcg: use mod_node_page_state to update stats Shakeel Butt
  2025-11-10 23:20 ` [PATCH 2/4] memcg: remove __mod_lruvec_kmem_state Shakeel Butt
@ 2025-11-10 23:20 ` Shakeel Butt
  2025-11-11  5:21   ` Harry Yoo
  2025-11-11 18:58   ` Roman Gushchin
  2025-11-10 23:20 ` [PATCH 4/4] memcg: remove __lruvec_stat_mod_folio Shakeel Butt
                   ` (5 subsequent siblings)
  8 siblings, 2 replies; 31+ messages in thread
From: Shakeel Butt @ 2025-11-10 23:20 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Johannes Weiner, Michal Hocko, Roman Gushchin, Muchun Song,
	Harry Yoo, Qi Zheng, Vlastimil Babka, linux-mm, cgroups,
	linux-kernel, Meta kernel team

__mod_lruvec_state() is already safe against irqs, so there is no
need for a separate interface (i.e. mod_lruvec_state()) which
wraps calls to it with irq disabling and re-enabling. Let's rename
__mod_lruvec_state to mod_lruvec_state.

Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
---
 include/linux/mm_inline.h |  2 +-
 include/linux/vmstat.h    | 18 +-----------------
 mm/memcontrol.c           |  8 ++++----
 mm/migrate.c              | 20 ++++++++++----------
 mm/vmscan.c               |  4 ++--
 5 files changed, 18 insertions(+), 34 deletions(-)

diff --git a/include/linux/mm_inline.h b/include/linux/mm_inline.h
index 795b255abf65..d7b963255012 100644
--- a/include/linux/mm_inline.h
+++ b/include/linux/mm_inline.h
@@ -44,7 +44,7 @@ static __always_inline void __update_lru_size(struct lruvec *lruvec,
 	lockdep_assert_held(&lruvec->lru_lock);
 	WARN_ON_ONCE(nr_pages != (int)nr_pages);
 
-	__mod_lruvec_state(lruvec, NR_LRU_BASE + lru, nr_pages);
+	mod_lruvec_state(lruvec, NR_LRU_BASE + lru, nr_pages);
 	__mod_zone_page_state(&pgdat->node_zones[zid],
 				NR_ZONE_LRU_BASE + lru, nr_pages);
 }
diff --git a/include/linux/vmstat.h b/include/linux/vmstat.h
index 11a37aaa4dd9..4eb7753e6e5c 100644
--- a/include/linux/vmstat.h
+++ b/include/linux/vmstat.h
@@ -520,19 +520,9 @@ static inline const char *vm_event_name(enum vm_event_item item)
 
 #ifdef CONFIG_MEMCG
 
-void __mod_lruvec_state(struct lruvec *lruvec, enum node_stat_item idx,
+void mod_lruvec_state(struct lruvec *lruvec, enum node_stat_item idx,
 			int val);
 
-static inline void mod_lruvec_state(struct lruvec *lruvec,
-				    enum node_stat_item idx, int val)
-{
-	unsigned long flags;
-
-	local_irq_save(flags);
-	__mod_lruvec_state(lruvec, idx, val);
-	local_irq_restore(flags);
-}
-
 void __lruvec_stat_mod_folio(struct folio *folio,
 			     enum node_stat_item idx, int val);
 
@@ -554,12 +544,6 @@ static inline void mod_lruvec_page_state(struct page *page,
 
 #else
 
-static inline void __mod_lruvec_state(struct lruvec *lruvec,
-				      enum node_stat_item idx, int val)
-{
-	mod_node_page_state(lruvec_pgdat(lruvec), idx, val);
-}
-
 static inline void mod_lruvec_state(struct lruvec *lruvec,
 				    enum node_stat_item idx, int val)
 {
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 3a59d3ee92a7..c31074e5852b 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -757,7 +757,7 @@ static void mod_memcg_lruvec_state(struct lruvec *lruvec,
 }
 
 /**
- * __mod_lruvec_state - update lruvec memory statistics
+ * mod_lruvec_state - update lruvec memory statistics
  * @lruvec: the lruvec
  * @idx: the stat item
  * @val: delta to add to the counter, can be negative
@@ -766,7 +766,7 @@ static void mod_memcg_lruvec_state(struct lruvec *lruvec,
  * function updates the all three counters that are affected by a
  * change of state at this level: per-node, per-cgroup, per-lruvec.
  */
-void __mod_lruvec_state(struct lruvec *lruvec, enum node_stat_item idx,
+void mod_lruvec_state(struct lruvec *lruvec, enum node_stat_item idx,
 			int val)
 {
 	/* Update node */
@@ -794,7 +794,7 @@ void __lruvec_stat_mod_folio(struct folio *folio, enum node_stat_item idx,
 	}
 
 	lruvec = mem_cgroup_lruvec(memcg, pgdat);
-	__mod_lruvec_state(lruvec, idx, val);
+	mod_lruvec_state(lruvec, idx, val);
 	rcu_read_unlock();
 }
 EXPORT_SYMBOL(__lruvec_stat_mod_folio);
@@ -818,7 +818,7 @@ void mod_lruvec_kmem_state(void *p, enum node_stat_item idx, int val)
 		mod_node_page_state(pgdat, idx, val);
 	} else {
 		lruvec = mem_cgroup_lruvec(memcg, pgdat);
-		__mod_lruvec_state(lruvec, idx, val);
+		mod_lruvec_state(lruvec, idx, val);
 	}
 	rcu_read_unlock();
 }
diff --git a/mm/migrate.c b/mm/migrate.c
index 567dfae4d9f8..be00c3c82f3a 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -675,27 +675,27 @@ static int __folio_migrate_mapping(struct address_space *mapping,
 		old_lruvec = mem_cgroup_lruvec(memcg, oldzone->zone_pgdat);
 		new_lruvec = mem_cgroup_lruvec(memcg, newzone->zone_pgdat);
 
-		__mod_lruvec_state(old_lruvec, NR_FILE_PAGES, -nr);
-		__mod_lruvec_state(new_lruvec, NR_FILE_PAGES, nr);
+		mod_lruvec_state(old_lruvec, NR_FILE_PAGES, -nr);
+		mod_lruvec_state(new_lruvec, NR_FILE_PAGES, nr);
 		if (folio_test_swapbacked(folio) && !folio_test_swapcache(folio)) {
-			__mod_lruvec_state(old_lruvec, NR_SHMEM, -nr);
-			__mod_lruvec_state(new_lruvec, NR_SHMEM, nr);
+			mod_lruvec_state(old_lruvec, NR_SHMEM, -nr);
+			mod_lruvec_state(new_lruvec, NR_SHMEM, nr);
 
 			if (folio_test_pmd_mappable(folio)) {
-				__mod_lruvec_state(old_lruvec, NR_SHMEM_THPS, -nr);
-				__mod_lruvec_state(new_lruvec, NR_SHMEM_THPS, nr);
+				mod_lruvec_state(old_lruvec, NR_SHMEM_THPS, -nr);
+				mod_lruvec_state(new_lruvec, NR_SHMEM_THPS, nr);
 			}
 		}
 #ifdef CONFIG_SWAP
 		if (folio_test_swapcache(folio)) {
-			__mod_lruvec_state(old_lruvec, NR_SWAPCACHE, -nr);
-			__mod_lruvec_state(new_lruvec, NR_SWAPCACHE, nr);
+			mod_lruvec_state(old_lruvec, NR_SWAPCACHE, -nr);
+			mod_lruvec_state(new_lruvec, NR_SWAPCACHE, nr);
 		}
 #endif
 		if (dirty && mapping_can_writeback(mapping)) {
-			__mod_lruvec_state(old_lruvec, NR_FILE_DIRTY, -nr);
+			mod_lruvec_state(old_lruvec, NR_FILE_DIRTY, -nr);
 			__mod_zone_page_state(oldzone, NR_ZONE_WRITE_PENDING, -nr);
-			__mod_lruvec_state(new_lruvec, NR_FILE_DIRTY, nr);
+			mod_lruvec_state(new_lruvec, NR_FILE_DIRTY, nr);
 			__mod_zone_page_state(newzone, NR_ZONE_WRITE_PENDING, nr);
 		}
 	}
diff --git a/mm/vmscan.c b/mm/vmscan.c
index ba760072830b..b3231bdde4e6 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2019,7 +2019,7 @@ static unsigned long shrink_inactive_list(unsigned long nr_to_scan,
 	spin_lock_irq(&lruvec->lru_lock);
 	move_folios_to_lru(lruvec, &folio_list);
 
-	__mod_lruvec_state(lruvec, PGDEMOTE_KSWAPD + reclaimer_offset(sc),
+	mod_lruvec_state(lruvec, PGDEMOTE_KSWAPD + reclaimer_offset(sc),
 					stat.nr_demoted);
 	__mod_node_page_state(pgdat, NR_ISOLATED_ANON + file, -nr_taken);
 	item = PGSTEAL_KSWAPD + reclaimer_offset(sc);
@@ -4745,7 +4745,7 @@ static int evict_folios(unsigned long nr_to_scan, struct lruvec *lruvec,
 		reset_batch_size(walk);
 	}
 
-	__mod_lruvec_state(lruvec, PGDEMOTE_KSWAPD + reclaimer_offset(sc),
+	mod_lruvec_state(lruvec, PGDEMOTE_KSWAPD + reclaimer_offset(sc),
 					stat.nr_demoted);
 
 	item = PGSTEAL_KSWAPD + reclaimer_offset(sc);
-- 
2.47.3




* [PATCH 4/4] memcg: remove __lruvec_stat_mod_folio
  2025-11-10 23:20 [PATCH 0/4] memcg: cleanup the memcg stats interfaces Shakeel Butt
                   ` (2 preceding siblings ...)
  2025-11-10 23:20 ` [PATCH 3/4] memcg: remove __mod_lruvec_state Shakeel Butt
@ 2025-11-10 23:20 ` Shakeel Butt
  2025-11-11  5:41   ` Harry Yoo
  2025-11-11 18:59   ` Roman Gushchin
  2025-11-11  0:59 ` [PATCH 0/4] memcg: cleanup the memcg stats interfaces Harry Yoo
                   ` (4 subsequent siblings)
  8 siblings, 2 replies; 31+ messages in thread
From: Shakeel Butt @ 2025-11-10 23:20 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Johannes Weiner, Michal Hocko, Roman Gushchin, Muchun Song,
	Harry Yoo, Qi Zheng, Vlastimil Babka, linux-mm, cgroups,
	linux-kernel, Meta kernel team

__lruvec_stat_mod_folio() is already safe against irqs, so there is no
need for a separate interface (i.e. lruvec_stat_mod_folio()) which
wraps calls to it with irq disabling and re-enabling. Let's rename
__lruvec_stat_mod_folio to lruvec_stat_mod_folio.

Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
---
 include/linux/vmstat.h | 30 +-----------------------------
 mm/filemap.c           | 20 ++++++++++----------
 mm/huge_memory.c       |  4 ++--
 mm/khugepaged.c        |  8 ++++----
 mm/memcontrol.c        |  4 ++--
 mm/page-writeback.c    |  2 +-
 mm/rmap.c              |  4 ++--
 mm/shmem.c             |  6 +++---
 8 files changed, 25 insertions(+), 53 deletions(-)

diff --git a/include/linux/vmstat.h b/include/linux/vmstat.h
index 4eb7753e6e5c..3398a345bda8 100644
--- a/include/linux/vmstat.h
+++ b/include/linux/vmstat.h
@@ -523,19 +523,9 @@ static inline const char *vm_event_name(enum vm_event_item item)
 void mod_lruvec_state(struct lruvec *lruvec, enum node_stat_item idx,
 			int val);
 
-void __lruvec_stat_mod_folio(struct folio *folio,
+void lruvec_stat_mod_folio(struct folio *folio,
 			     enum node_stat_item idx, int val);
 
-static inline void lruvec_stat_mod_folio(struct folio *folio,
-					 enum node_stat_item idx, int val)
-{
-	unsigned long flags;
-
-	local_irq_save(flags);
-	__lruvec_stat_mod_folio(folio, idx, val);
-	local_irq_restore(flags);
-}
-
 static inline void mod_lruvec_page_state(struct page *page,
 					 enum node_stat_item idx, int val)
 {
@@ -550,12 +540,6 @@ static inline void mod_lruvec_state(struct lruvec *lruvec,
 	mod_node_page_state(lruvec_pgdat(lruvec), idx, val);
 }
 
-static inline void __lruvec_stat_mod_folio(struct folio *folio,
-					 enum node_stat_item idx, int val)
-{
-	mod_node_page_state(folio_pgdat(folio), idx, val);
-}
-
 static inline void lruvec_stat_mod_folio(struct folio *folio,
 					 enum node_stat_item idx, int val)
 {
@@ -570,18 +554,6 @@ static inline void mod_lruvec_page_state(struct page *page,
 
 #endif /* CONFIG_MEMCG */
 
-static inline void __lruvec_stat_add_folio(struct folio *folio,
-					   enum node_stat_item idx)
-{
-	__lruvec_stat_mod_folio(folio, idx, folio_nr_pages(folio));
-}
-
-static inline void __lruvec_stat_sub_folio(struct folio *folio,
-					   enum node_stat_item idx)
-{
-	__lruvec_stat_mod_folio(folio, idx, -folio_nr_pages(folio));
-}
-
 static inline void lruvec_stat_add_folio(struct folio *folio,
 					 enum node_stat_item idx)
 {
diff --git a/mm/filemap.c b/mm/filemap.c
index 63eb163af99c..9a52fb3ba093 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -182,13 +182,13 @@ static void filemap_unaccount_folio(struct address_space *mapping,
 
 	nr = folio_nr_pages(folio);
 
-	__lruvec_stat_mod_folio(folio, NR_FILE_PAGES, -nr);
+	lruvec_stat_mod_folio(folio, NR_FILE_PAGES, -nr);
 	if (folio_test_swapbacked(folio)) {
-		__lruvec_stat_mod_folio(folio, NR_SHMEM, -nr);
+		lruvec_stat_mod_folio(folio, NR_SHMEM, -nr);
 		if (folio_test_pmd_mappable(folio))
-			__lruvec_stat_mod_folio(folio, NR_SHMEM_THPS, -nr);
+			lruvec_stat_mod_folio(folio, NR_SHMEM_THPS, -nr);
 	} else if (folio_test_pmd_mappable(folio)) {
-		__lruvec_stat_mod_folio(folio, NR_FILE_THPS, -nr);
+		lruvec_stat_mod_folio(folio, NR_FILE_THPS, -nr);
 		filemap_nr_thps_dec(mapping);
 	}
 	if (test_bit(AS_KERNEL_FILE, &folio->mapping->flags))
@@ -831,13 +831,13 @@ void replace_page_cache_folio(struct folio *old, struct folio *new)
 	old->mapping = NULL;
 	/* hugetlb pages do not participate in page cache accounting. */
 	if (!folio_test_hugetlb(old))
-		__lruvec_stat_sub_folio(old, NR_FILE_PAGES);
+		lruvec_stat_sub_folio(old, NR_FILE_PAGES);
 	if (!folio_test_hugetlb(new))
-		__lruvec_stat_add_folio(new, NR_FILE_PAGES);
+		lruvec_stat_add_folio(new, NR_FILE_PAGES);
 	if (folio_test_swapbacked(old))
-		__lruvec_stat_sub_folio(old, NR_SHMEM);
+		lruvec_stat_sub_folio(old, NR_SHMEM);
 	if (folio_test_swapbacked(new))
-		__lruvec_stat_add_folio(new, NR_SHMEM);
+		lruvec_stat_add_folio(new, NR_SHMEM);
 	xas_unlock_irq(&xas);
 	if (free_folio)
 		free_folio(old);
@@ -920,9 +920,9 @@ noinline int __filemap_add_folio(struct address_space *mapping,
 
 		/* hugetlb pages do not participate in page cache accounting */
 		if (!huge) {
-			__lruvec_stat_mod_folio(folio, NR_FILE_PAGES, nr);
+			lruvec_stat_mod_folio(folio, NR_FILE_PAGES, nr);
 			if (folio_test_pmd_mappable(folio))
-				__lruvec_stat_mod_folio(folio,
+				lruvec_stat_mod_folio(folio,
 						NR_FILE_THPS, nr);
 		}
 
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 949250932bb4..943099eae8d5 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -3866,10 +3866,10 @@ static int __folio_split(struct folio *folio, unsigned int new_order,
 			if (folio_test_pmd_mappable(folio) &&
 			    new_order < HPAGE_PMD_ORDER) {
 				if (folio_test_swapbacked(folio)) {
-					__lruvec_stat_mod_folio(folio,
+					lruvec_stat_mod_folio(folio,
 							NR_SHMEM_THPS, -nr);
 				} else {
-					__lruvec_stat_mod_folio(folio,
+					lruvec_stat_mod_folio(folio,
 							NR_FILE_THPS, -nr);
 					filemap_nr_thps_dec(mapping);
 				}
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 1a08673b0d8b..2a460664a67d 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -2174,14 +2174,14 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
 	}
 
 	if (is_shmem)
-		__lruvec_stat_mod_folio(new_folio, NR_SHMEM_THPS, HPAGE_PMD_NR);
+		lruvec_stat_mod_folio(new_folio, NR_SHMEM_THPS, HPAGE_PMD_NR);
 	else
-		__lruvec_stat_mod_folio(new_folio, NR_FILE_THPS, HPAGE_PMD_NR);
+		lruvec_stat_mod_folio(new_folio, NR_FILE_THPS, HPAGE_PMD_NR);
 
 	if (nr_none) {
-		__lruvec_stat_mod_folio(new_folio, NR_FILE_PAGES, nr_none);
+		lruvec_stat_mod_folio(new_folio, NR_FILE_PAGES, nr_none);
 		/* nr_none is always 0 for non-shmem. */
-		__lruvec_stat_mod_folio(new_folio, NR_SHMEM, nr_none);
+		lruvec_stat_mod_folio(new_folio, NR_SHMEM, nr_none);
 	}
 
 	/*
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index c31074e5852b..7f074d72dabc 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -777,7 +777,7 @@ void mod_lruvec_state(struct lruvec *lruvec, enum node_stat_item idx,
 		mod_memcg_lruvec_state(lruvec, idx, val);
 }
 
-void __lruvec_stat_mod_folio(struct folio *folio, enum node_stat_item idx,
+void lruvec_stat_mod_folio(struct folio *folio, enum node_stat_item idx,
 			     int val)
 {
 	struct mem_cgroup *memcg;
@@ -797,7 +797,7 @@ void __lruvec_stat_mod_folio(struct folio *folio, enum node_stat_item idx,
 	mod_lruvec_state(lruvec, idx, val);
 	rcu_read_unlock();
 }
-EXPORT_SYMBOL(__lruvec_stat_mod_folio);
+EXPORT_SYMBOL(lruvec_stat_mod_folio);
 
 void mod_lruvec_kmem_state(void *p, enum node_stat_item idx, int val)
 {
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index a124ab6a205d..ccdeb0e84d39 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -2652,7 +2652,7 @@ static void folio_account_dirtied(struct folio *folio,
 		inode_attach_wb(inode, folio);
 		wb = inode_to_wb(inode);
 
-		__lruvec_stat_mod_folio(folio, NR_FILE_DIRTY, nr);
+		lruvec_stat_mod_folio(folio, NR_FILE_DIRTY, nr);
 		__zone_stat_mod_folio(folio, NR_ZONE_WRITE_PENDING, nr);
 		__node_stat_mod_folio(folio, NR_DIRTIED, nr);
 		wb_stat_mod(wb, WB_RECLAIMABLE, nr);
diff --git a/mm/rmap.c b/mm/rmap.c
index 60c3cd70b6ea..1b3a3c7b0aeb 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1212,12 +1212,12 @@ static void __folio_mod_stat(struct folio *folio, int nr, int nr_pmdmapped)
 
 	if (nr) {
 		idx = folio_test_anon(folio) ? NR_ANON_MAPPED : NR_FILE_MAPPED;
-		__lruvec_stat_mod_folio(folio, idx, nr);
+		lruvec_stat_mod_folio(folio, idx, nr);
 	}
 	if (nr_pmdmapped) {
 		if (folio_test_anon(folio)) {
 			idx = NR_ANON_THPS;
-			__lruvec_stat_mod_folio(folio, idx, nr_pmdmapped);
+			lruvec_stat_mod_folio(folio, idx, nr_pmdmapped);
 		} else {
 			/* NR_*_PMDMAPPED are not maintained per-memcg */
 			idx = folio_test_swapbacked(folio) ?
diff --git a/mm/shmem.c b/mm/shmem.c
index c3ed2dcd17f8..4fba8a597256 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -882,9 +882,9 @@ static unsigned int shmem_huge_global_enabled(struct inode *inode, pgoff_t index
 static void shmem_update_stats(struct folio *folio, int nr_pages)
 {
 	if (folio_test_pmd_mappable(folio))
-		__lruvec_stat_mod_folio(folio, NR_SHMEM_THPS, nr_pages);
-	__lruvec_stat_mod_folio(folio, NR_FILE_PAGES, nr_pages);
-	__lruvec_stat_mod_folio(folio, NR_SHMEM, nr_pages);
+		lruvec_stat_mod_folio(folio, NR_SHMEM_THPS, nr_pages);
+	lruvec_stat_mod_folio(folio, NR_FILE_PAGES, nr_pages);
+	lruvec_stat_mod_folio(folio, NR_SHMEM, nr_pages);
 }
 
 /*
-- 
2.47.3




* Re: [PATCH 0/4] memcg: cleanup the memcg stats interfaces
  2025-11-10 23:20 [PATCH 0/4] memcg: cleanup the memcg stats interfaces Shakeel Butt
                   ` (3 preceding siblings ...)
  2025-11-10 23:20 ` [PATCH 4/4] memcg: remove __lruvec_stat_mod_folio Shakeel Butt
@ 2025-11-11  0:59 ` Harry Yoo
  2025-11-11  2:23   ` Qi Zheng
  2025-11-11  8:36 ` Qi Zheng
                   ` (3 subsequent siblings)
  8 siblings, 1 reply; 31+ messages in thread
From: Harry Yoo @ 2025-11-11  0:59 UTC (permalink / raw)
  To: Shakeel Butt
  Cc: Andrew Morton, Johannes Weiner, Michal Hocko, Roman Gushchin,
	Muchun Song, Qi Zheng, Vlastimil Babka, linux-mm, cgroups,
	linux-kernel, Meta kernel team

On Mon, Nov 10, 2025 at 03:20:04PM -0800, Shakeel Butt wrote:
> The memcg stats are safe against irq (and nmi) context and thus do not
> require disabling irqs. However, for some stats which are also
> maintained at the node level, the memcg code is using the irq-unsafe
> interface, which still requires the users to either disable irqs
> themselves or call wrappers which explicitly disable irqs. Let's move
> the memcg code to the irq-safe node-level stats function, which is
> already optimized for architectures with HAVE_CMPXCHG_LOCAL (all major
> ones), so there will be no performance penalty.

Are you or Qi planning a follow-up that converts spin_lock_irq() to
spin_lock() in places where IRQs were disabled just to update vmstat?

Qi's zombie memcg series will depend on that work, I guess.
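
Concretely, the kind of conversion meant here, as a hypothetical
sketch ("some_lock" stands for a lock that today has irqs disabled
only to make the stat update safe):

	/* before */
	spin_lock_irq(&some_lock);
	__mod_node_page_state(pgdat, NR_FILE_PAGES, -nr);
	spin_unlock_irq(&some_lock);

	/* after: mod_node_page_state() is irq-safe on its own */
	spin_lock(&some_lock);
	mod_node_page_state(pgdat, NR_FILE_PAGES, -nr);
	spin_unlock(&some_lock);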

-- 
Cheers,
Harry / Hyeonggon

> Shakeel Butt (4):
>   memcg: use mod_node_page_state to update stats
>   memcg: remove __mod_lruvec_kmem_state
>   memcg: remove __mod_lruvec_state
>   memcg: remove __lruvec_stat_mod_folio
> 
>  include/linux/memcontrol.h | 28 ++++------------------
>  include/linux/mm_inline.h  |  2 +-
>  include/linux/vmstat.h     | 48 ++------------------------------------
>  mm/filemap.c               | 20 ++++++++--------
>  mm/huge_memory.c           |  4 ++--
>  mm/khugepaged.c            |  8 +++----
>  mm/memcontrol.c            | 20 ++++++++--------
>  mm/migrate.c               | 20 ++++++++--------
>  mm/page-writeback.c        |  2 +-
>  mm/rmap.c                  |  4 ++--
>  mm/shmem.c                 |  6 ++---
>  mm/vmscan.c                |  4 ++--
>  mm/workingset.c            |  2 +-
>  13 files changed, 53 insertions(+), 115 deletions(-)
> 
> -- 
> 2.47.3
> 



* Re: [PATCH 1/4] memcg: use mod_node_page_state to update stats
  2025-11-10 23:20 ` [PATCH 1/4] memcg: use mod_node_page_state to update stats Shakeel Butt
@ 2025-11-11  1:39   ` Harry Yoo
  2025-11-11 18:58   ` Roman Gushchin
  1 sibling, 0 replies; 31+ messages in thread
From: Harry Yoo @ 2025-11-11  1:39 UTC (permalink / raw)
  To: Shakeel Butt
  Cc: Andrew Morton, Johannes Weiner, Michal Hocko, Roman Gushchin,
	Muchun Song, Qi Zheng, Vlastimil Babka, linux-mm, cgroups,
	linux-kernel, Meta kernel team

On Mon, Nov 10, 2025 at 03:20:05PM -0800, Shakeel Butt wrote:
> The memcg stats are safe against irq (and nmi) context and thus do not
> require disabling irqs. However, some memcg stats code paths also
> update the node-level stats through the irq-unsafe interface and thus
> require the users to disable irqs. The node-level stats, on
> architectures with HAVE_CMPXCHG_LOCAL (all major ones), have an
> interface which does not require irq disabling. Let's move the memcg
> stats code to that interface for the node-level stats.
> 
> Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
> ---

Looks good to me,
Reviewed-by: Harry Yoo <harry.yoo@oracle.com>

-- 
Cheers,
Harry / Hyeonggon



* Re: [PATCH 2/4] memcg: remove __mod_lruvec_kmem_state
  2025-11-10 23:20 ` [PATCH 2/4] memcg: remove __mod_lruvec_kmem_state Shakeel Butt
@ 2025-11-11  1:46   ` Harry Yoo
  2025-11-11  8:23   ` Qi Zheng
  2025-11-11 18:58   ` Roman Gushchin
  2 siblings, 0 replies; 31+ messages in thread
From: Harry Yoo @ 2025-11-11  1:46 UTC (permalink / raw)
  To: Shakeel Butt
  Cc: Andrew Morton, Johannes Weiner, Michal Hocko, Roman Gushchin,
	Muchun Song, Qi Zheng, Vlastimil Babka, linux-mm, cgroups,
	linux-kernel, Meta kernel team

On Mon, Nov 10, 2025 at 03:20:06PM -0800, Shakeel Butt wrote:
> __mod_lruvec_kmem_state() is already safe against irqs, so there is no
> need for a separate interface (i.e. mod_lruvec_kmem_state()) which
> wraps calls to it with irq disabling and re-enabling. Let's rename
> __mod_lruvec_kmem_state to mod_lruvec_kmem_state.
> 
> Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
> ---

Looks good to me,
Reviewed-by: Harry Yoo <harry.yoo@oracle.com>

-- 
Cheers,
Harry / Hyeonggon



* Re: [PATCH 0/4] memcg: cleanup the memcg stats interfaces
  2025-11-11  0:59 ` [PATCH 0/4] memcg: cleanup the memcg stats interfaces Harry Yoo
@ 2025-11-11  2:23   ` Qi Zheng
  2025-11-11  2:39     ` Shakeel Butt
  0 siblings, 1 reply; 31+ messages in thread
From: Qi Zheng @ 2025-11-11  2:23 UTC (permalink / raw)
  To: Harry Yoo, Shakeel Butt
  Cc: Andrew Morton, Johannes Weiner, Michal Hocko, Roman Gushchin,
	Muchun Song, Vlastimil Babka, linux-mm, cgroups, linux-kernel,
	Meta kernel team

Hi,

On 11/11/25 8:59 AM, Harry Yoo wrote:
> On Mon, Nov 10, 2025 at 03:20:04PM -0800, Shakeel Butt wrote:
>> The memcg stats are safe against irq (and nmi) context and thus do not
>> require disabling irqs. However, for some stats which are also
>> maintained at the node level, the memcg code is using the irq-unsafe
>> interface, which still requires the users to either disable irqs
>> themselves or call wrappers which explicitly disable irqs. Let's move
>> the memcg code to the irq-safe node-level stats function, which is
>> already optimized for architectures with HAVE_CMPXCHG_LOCAL (all major
>> ones), so there will be no performance penalty.

Good job. Thanks!

> 
> Are you or Qi planning a follow-up that converts spin_lock_irq() to
> spin_lock() in places where IRQs were disabled just to update vmstat?

Perhaps this change could be implemented together in [PATCH 1/4]?

Of course, it's also reasonable to make it a separate patch. If we
choose this method, I’m fine with either me or Shakeel doing it.

> 
> Qi's zombie memcg series will depend on that work, I guess.

Yes, and there are other places that also need to be converted, such as
__folio_migrate_mapping().

Thanks,
Qi

> 




* Re: [PATCH 0/4] memcg: cleanup the memcg stats interfaces
  2025-11-11  2:23   ` Qi Zheng
@ 2025-11-11  2:39     ` Shakeel Butt
  2025-11-11  2:48       ` Qi Zheng
  0 siblings, 1 reply; 31+ messages in thread
From: Shakeel Butt @ 2025-11-11  2:39 UTC (permalink / raw)
  To: Qi Zheng
  Cc: Harry Yoo, Andrew Morton, Johannes Weiner, Michal Hocko,
	Roman Gushchin, Muchun Song, Vlastimil Babka, linux-mm, cgroups,
	linux-kernel, Meta kernel team

On Tue, Nov 11, 2025 at 10:23:15AM +0800, Qi Zheng wrote:
> Hi,
> 
[...]
> > 
> > Are you or Qi planning a follow-up that converts spin_lock_irq() to
> > spin_lock() in places where IRQs were disabled just to update vmstat?
> 
> Perhaps this change could be implemented together in [PATCH 1/4]?
> 
> Of course, it's also reasonable to make it a separate patch. If we
> choose this method, I’m fine with either me or Shakeel doing it.
> 

Let's do it separately as I wanted to keep the memcg related changes
self-contained.

Qi, can you please take a stab at that?

> > 
> > Qi's zombie memcg series will depend on that work, I guess.
> 
> Yes, and there are other places that also need to be converted, such as
> __folio_migrate_mapping().

I see __mod_zone_page_state() usage in __folio_migrate_mapping() and
using the same reasoning we can convert it to use mod_zone_page_state().
Where else do you need to do these conversions (other than
__folio_migrate_mapping)?



* Re: [PATCH 0/4] memcg: cleanup the memcg stats interfaces
  2025-11-11  2:39     ` Shakeel Butt
@ 2025-11-11  2:48       ` Qi Zheng
  2025-11-11  3:00         ` Shakeel Butt
  2025-11-11  3:05         ` Harry Yoo
  0 siblings, 2 replies; 31+ messages in thread
From: Qi Zheng @ 2025-11-11  2:48 UTC (permalink / raw)
  To: Shakeel Butt
  Cc: Harry Yoo, Andrew Morton, Johannes Weiner, Michal Hocko,
	Roman Gushchin, Muchun Song, Vlastimil Babka, linux-mm, cgroups,
	linux-kernel, Meta kernel team

Hi Shakeel,

On 11/11/25 10:39 AM, Shakeel Butt wrote:
> On Tue, Nov 11, 2025 at 10:23:15AM +0800, Qi Zheng wrote:
>> Hi,
>>
> [...]
>>>
>>> Are you or Qi planning a follow-up that converts spin_lock_irq() to
>>> spin_lock() in places where IRQs were disabled just to update vmstat?
>>
>> Perhaps this change could be implemented together in [PATCH 1/4]?
>>
>> Of course, it's also reasonable to make it a separate patch. If we
>> choose this method, I’m fine with either me or Shakeel doing it.
>>
> 
> Let's do it separately as I wanted to keep the memcg related changes
> self-contained.

OK.

> 
> Qi, can you please take a stab at that?

Sure, I will do it.

> 
>>>
>>> Qi's zombie memcg series will depend on that work, I guess.
>>
>> Yes, and there are other places that also need to be converted, such as
>> __folio_migrate_mapping().
> 
> I see __mod_zone_page_state() usage in __folio_migrate_mapping() and
> using the same reasoning we can convert it to use mod_zone_page_state().
> Where else do you need to do these conversions (other than
> __folio_migrate_mapping)?

I mean converting these places to use spin_lock() instead of
spin_lock_irq().

Thanks,
Qi




* Re: [PATCH 0/4] memcg: cleanup the memcg stats interfaces
  2025-11-11  2:48       ` Qi Zheng
@ 2025-11-11  3:00         ` Shakeel Butt
  2025-11-11  3:07           ` Qi Zheng
  2025-11-11  3:05         ` Harry Yoo
  1 sibling, 1 reply; 31+ messages in thread
From: Shakeel Butt @ 2025-11-11  3:00 UTC (permalink / raw)
  To: Qi Zheng
  Cc: Harry Yoo, Andrew Morton, Johannes Weiner, Michal Hocko,
	Roman Gushchin, Muchun Song, Vlastimil Babka, linux-mm, cgroups,
	linux-kernel, Meta kernel team

On Tue, Nov 11, 2025 at 10:48:18AM +0800, Qi Zheng wrote:
> Hi Shakeel,
> 
> On 11/11/25 10:39 AM, Shakeel Butt wrote:
> > On Tue, Nov 11, 2025 at 10:23:15AM +0800, Qi Zheng wrote:
> > > Hi,
> > > 
> > [...]
> > > > 
> > > > Are you or Qi planning a follow-up that converts spin_lock_irq() to
> > > > spin_lock() in places where IRQs were disabled just to update vmstat?
> > > 
> > > Perhaps this change could be implemented together in [PATCH 1/4]?
> > > 
> > > Of course, it's also reasonable to make it a separate patch. If we
> > > choose this method, I’m fine with either me or Shakeel doing it.
> > > 
> > 
> > Let's do it separately as I wanted to keep the memcg related changes
> > self-contained.
> 
> OK.
> 
> > 
> > Qi, can you please take a stab at that?
> 
> Sure, I will do it.
> 
> > 
> > > > 
> > > > Qi's zombie memcg series will depend on that work, I guess.
> > > 
> > > Yes, and there are other places that also need to be converted, such as
> > > __folio_migrate_mapping().
> > 
> > I see __mod_zone_page_state() usage in __folio_migrate_mapping() and
> > using the same reasoning we can convert it to use mod_zone_page_state().
> > Where else do you need to do these conversions (other than
> > __folio_migrate_mapping)?
> 
> I mean converting these places to use spin_lock() instead of
> spin_lock_irq().

For only stats, right?



* Re: [PATCH 0/4] memcg: cleanup the memcg stats interfaces
  2025-11-11  2:48       ` Qi Zheng
  2025-11-11  3:00         ` Shakeel Butt
@ 2025-11-11  3:05         ` Harry Yoo
  2025-11-11  8:01           ` Sebastian Andrzej Siewior
  1 sibling, 1 reply; 31+ messages in thread
From: Harry Yoo @ 2025-11-11  3:05 UTC (permalink / raw)
  To: Qi Zheng
  Cc: Shakeel Butt, Andrew Morton, Johannes Weiner, Michal Hocko,
	Roman Gushchin, Muchun Song, Vlastimil Babka, linux-mm, cgroups,
	linux-kernel, Meta kernel team, Sebastian Andrzej Siewior,
	Clark Williams, linux-rt-devel

On Tue, Nov 11, 2025 at 10:48:18AM +0800, Qi Zheng wrote:
> Hi Shakeel,
> 
> On 11/11/25 10:39 AM, Shakeel Butt wrote:
> > On Tue, Nov 11, 2025 at 10:23:15AM +0800, Qi Zheng wrote:
> > > Hi,
> > > 
> > [...]
> > > > 
> > > > Are you or Qi planning a follow-up that converts spin_lock_irq() to
> > > > spin_lock() in places where IRQs were disabled just to update vmstat?
> > > 
> > > Perhaps this change could be implemented together in [PATCH 1/4]?
> > > 
> > > Of course, it's also reasonable to make it a separate patch. If we
> > > choose this method, I’m fine with either me or Shakeel doing it.
> > > 
> > 
> > Let's do it separately as I wanted to keep the memcg related changes
> > self-contained.
> 
> OK.

Agreed.

> > Qi, can you please take a stab at that?
> 
> Sure, I will do it.

I'll be more than happy to review that ;)

> > > > Qi's zombie memcg series will depend on that work, I guess.
> > > 
> > > Yes, and there are other places that also need to be converted, such as
> > > __folio_migrate_mapping().
> > 
> > I see __mod_zone_page_state() usage in __folio_migrate_mapping() and
> > using the same reasoning we can convert it to use mod_zone_page_state().
> > Where else do you need to do these conversions (other than
> > __folio_migrate_mapping)?
> 
> I mean converting these places to use spin_lock() instead of
> spin_lock_irq().

Just one thing I noticed while looking at __folio_migrate_mapping()...

- xas_lock_irq() -> xas_unlock() -> local_irq_enable()
- swap_cluster_get_and_lock_irq() -> swap_cluster_unlock() -> local_irq_enable()

is wrong because spin_lock_irq() doesn't disable IRQs on PREEMPT_RT.
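
Spelled out as a sketch:

	xas_lock_irq(&xas);	/* PREEMPT_RT: rtmutex, irqs stay enabled */
	...
	xas_unlock(&xas);	/* drops only the lock */
	local_irq_enable();	/* enables irqs that RT never disabled */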

Not 100% sure if it would be benign or lead to actual bugs that need
to be fixed in -stable kernels.

Cc'ing RT folks again :)

-- 
Cheers,
Harry / Hyeonggon



* Re: [PATCH 0/4] memcg: cleanup the memcg stats interfaces
  2025-11-11  3:00         ` Shakeel Butt
@ 2025-11-11  3:07           ` Qi Zheng
  2025-11-11  3:18             ` Harry Yoo
  0 siblings, 1 reply; 31+ messages in thread
From: Qi Zheng @ 2025-11-11  3:07 UTC (permalink / raw)
  To: Shakeel Butt
  Cc: Harry Yoo, Andrew Morton, Johannes Weiner, Michal Hocko,
	Roman Gushchin, Muchun Song, Vlastimil Babka, linux-mm, cgroups,
	linux-kernel, Meta kernel team



On 11/11/25 11:00 AM, Shakeel Butt wrote:
> On Tue, Nov 11, 2025 at 10:48:18AM +0800, Qi Zheng wrote:
>> Hi Shakeel,
>>
>> On 11/11/25 10:39 AM, Shakeel Butt wrote:
>>> On Tue, Nov 11, 2025 at 10:23:15AM +0800, Qi Zheng wrote:
>>>> Hi,
>>>>
>>> [...]
>>>>>
>>>>> Are you or Qi planning a follow-up that converts spin_lock_irq() to
>>>>> spin_lock() in places where IRQs were disabled just to update vmstat?
>>>>
>>>> Perhaps this change could be implemented together in [PATCH 1/4]?
>>>>
>>>> Of course, it's also reasonable to make it a separate patch. If we
>>>> choose this method, I’m fine with either me or Shakeel doing it.
>>>>
>>>
>>> Let's do it separately as I wanted to keep the memcg related changes
>>> self-contained.
>>
>> OK.
>>
>>>
>>> Qi, can you please take a stab at that?
>>
>> Sure, I will do it.
>>
>>>
>>>>>
>>>>> Qi's zombie memcg series will depend on that work, I guess.
>>>>
>>>> Yes, and there are other places that also need to be converted, such as
>>>> __folio_migrate_mapping().
>>>
>>> I see __mod_zone_page_state() usage in __folio_migrate_mapping() and
>>> using the same reasoning we can convert it to use mod_zone_page_state().
>>> Where else do you need to do these conversions (other than
>>> __folio_migrate_mapping)?
>>
>> I mean converting these places to use spin_lock() instead of
>> spin_lock_irq().
> 
> For only stats, right?

Right, for those places where IRQs were disabled just to update
vmstat.





* Re: [PATCH 0/4] memcg: cleanup the memcg stats interfaces
  2025-11-11  3:07           ` Qi Zheng
@ 2025-11-11  3:18             ` Harry Yoo
  2025-11-11  3:29               ` Qi Zheng
  0 siblings, 1 reply; 31+ messages in thread
From: Harry Yoo @ 2025-11-11  3:18 UTC (permalink / raw)
  To: Qi Zheng
  Cc: Shakeel Butt, Andrew Morton, Johannes Weiner, Michal Hocko,
	Roman Gushchin, Muchun Song, Vlastimil Babka, linux-mm, cgroups,
	linux-kernel, Meta kernel team

On Tue, Nov 11, 2025 at 11:07:09AM +0800, Qi Zheng wrote:
> 
> 
> On 11/11/25 11:00 AM, Shakeel Butt wrote:
> > On Tue, Nov 11, 2025 at 10:48:18AM +0800, Qi Zheng wrote:
> > > Hi Shakeel,
> > > 
> > > On 11/11/25 10:39 AM, Shakeel Butt wrote:
> > > > On Tue, Nov 11, 2025 at 10:23:15AM +0800, Qi Zheng wrote:
> > > > > Hi,
> > > > > 
> > > > [...]
> > > > > > 
> > > > > > Are you or Qi planning a follow-up that converts spin_lock_irq() to
> > > > > > spin_lock() in places where IRQs were disabled just to update vmstat?
> > > > > 
> > > > > Perhaps this change could be implemented together in [PATCH 1/4]?
> > > > > 
> > > > > Of course, it's also reasonable to make it a separate patch. If we
> > > > > choose this method, I’m fine with either me or Shakeel doing it.
> > > > > 
> > > > 
> > > > Let's do it separately as I wanted to keep the memcg related changes
> > > > self-contained.
> > > 
> > > OK.
> > > 
> > > > 
> > > > Qi, can you please take a stab at that?
> > > 
> > > Sure, I will do it.
> > > 
> > > > 
> > > > > > 
> > > > > > Qi's zombie memcg series will depend on that work, I guess.
> > > > > 
> > > > > Yes, and there are other places that also need to be converted, such as
> > > > > __folio_migrate_mapping().
> > > > 
> > > > I see __mod_zone_page_state() usage in __folio_migrate_mapping() and
> > > > using the same reasoning we can convert it to use mod_zone_page_state().
> > > > Where else do you need to do these conversions (other than
> > > > __folio_migrate_mapping)?
> > > 
> > > I mean converting these places to use spin_lock() instead of
> > > spin_lock_irq().
> > 
> > For only stats, right?
> 
> Right, for those places where IRQs were disabled just to update
> vmstat.

...Or if IRQs are disabled for other reasons as well, we can still move
the vmstat update code outside the IRQ-disabled region.
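
I.e. something like this hypothetical sketch:

	spin_lock_irq(&some_lock);	/* still needed for other state */
	/* ... updates that really do need irqs off ... */
	spin_unlock_irq(&some_lock);

	/* stat update moved out; the helper is irq-safe by itself */
	mod_node_page_state(pgdat, idx, nr);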

-- 
Cheers,
Harry / Hyeonggon



* Re: [PATCH 0/4] memcg: cleanup the memcg stats interfaces
  2025-11-11  3:18             ` Harry Yoo
@ 2025-11-11  3:29               ` Qi Zheng
  0 siblings, 0 replies; 31+ messages in thread
From: Qi Zheng @ 2025-11-11  3:29 UTC (permalink / raw)
  To: Harry Yoo
  Cc: Shakeel Butt, Andrew Morton, Johannes Weiner, Michal Hocko,
	Roman Gushchin, Muchun Song, Vlastimil Babka, linux-mm, cgroups,
	linux-kernel, Meta kernel team



On 11/11/25 11:18 AM, Harry Yoo wrote:
> On Tue, Nov 11, 2025 at 11:07:09AM +0800, Qi Zheng wrote:
>>
>>
>> On 11/11/25 11:00 AM, Shakeel Butt wrote:
>>> On Tue, Nov 11, 2025 at 10:48:18AM +0800, Qi Zheng wrote:
>>>> Hi Shakeel,
>>>>
>>>> On 11/11/25 10:39 AM, Shakeel Butt wrote:
>>>>> On Tue, Nov 11, 2025 at 10:23:15AM +0800, Qi Zheng wrote:
>>>>>> Hi,
>>>>>>
>>>>> [...]
>>>>>>>
>>>>>>> Are you or Qi planning a follow-up that converts spin_lock_irq() to
>>>>>>> spin_lock() in places where IRQs were disabled just to update vmstat?
>>>>>>
>>>>>> Perhaps this change could be implemented together in [PATCH 1/4]?
>>>>>>
>>>>>> Of course, it's also reasonable to make it a separate patch. If we
>>>>>> choose this method, I’m fine with either me or Shakeel doing it.
>>>>>>
>>>>>
>>>>> Let's do it separately as I wanted to keep the memcg related changes
>>>>> self-contained.
>>>>
>>>> OK.
>>>>
>>>>>
>>>>> Qi, can you please take a stab at that?
>>>>
>>>> Sure, I will do it.
>>>>
>>>>>
>>>>>>>
>>>>>>> Qi's zombie memcg series will depend on that work, I guess.
>>>>>>
>>>>>> Yes, and there are other places that also need to be converted, such as
>>>>>> __folio_migrate_mapping().
>>>>>
>>>>> I see __mod_zone_page_state() usage in __folio_migrate_mapping() and
>>>>> using the same reasoning we can convert it to use mod_zone_page_state().
>>>>> Where else do you need to do these conversions (other than
>>>>> __folio_migrate_mapping)?
>>>>
>>>> I mean converting these places to use spin_lock() instead of
>>>> spin_lock_irq().
>>>
>>> For only stats, right?
>>
>> Right, for those places where IRQs were disabled just to update
>> vmstat.
> 
> ...Or if IRQs are disabled for other reasons as well, we can still move
> the vmstat update code outside the IRQ-disabled region.

Ok, I will take a closer look.

> 




* Re: [PATCH 3/4] memcg: remove __mod_lruvec_state
  2025-11-10 23:20 ` [PATCH 3/4] memcg: remove __mod_lruvec_state Shakeel Butt
@ 2025-11-11  5:21   ` Harry Yoo
  2025-11-11 18:58   ` Roman Gushchin
  1 sibling, 0 replies; 31+ messages in thread
From: Harry Yoo @ 2025-11-11  5:21 UTC (permalink / raw)
  To: Shakeel Butt
  Cc: Andrew Morton, Johannes Weiner, Michal Hocko, Roman Gushchin,
	Muchun Song, Qi Zheng, Vlastimil Babka, linux-mm, cgroups,
	linux-kernel, Meta kernel team

On Mon, Nov 10, 2025 at 03:20:07PM -0800, Shakeel Butt wrote:
> __mod_lruvec_state() is already safe against irqs, so there is no
> need for a separate interface (i.e. mod_lruvec_state()) which
> wraps calls to it with irq disabling and re-enabling. Let's rename
> __mod_lruvec_state to mod_lruvec_state.
> 
> Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
> ---

Looks good to me,
Reviewed-by: Harry Yoo <harry.yoo@oracle.com>

-- 
Cheers,
Harry / Hyeonggon



* Re: [PATCH 4/4] memcg: remove __lruvec_stat_mod_folio
  2025-11-10 23:20 ` [PATCH 4/4] memcg: remove __lruvec_stat_mod_folio Shakeel Butt
@ 2025-11-11  5:41   ` Harry Yoo
  2025-11-11 18:59   ` Roman Gushchin
  1 sibling, 0 replies; 31+ messages in thread
From: Harry Yoo @ 2025-11-11  5:41 UTC (permalink / raw)
  To: Shakeel Butt
  Cc: Andrew Morton, Johannes Weiner, Michal Hocko, Roman Gushchin,
	Muchun Song, Qi Zheng, Vlastimil Babka, linux-mm, cgroups,
	linux-kernel, Meta kernel team

On Mon, Nov 10, 2025 at 03:20:08PM -0800, Shakeel Butt wrote:
> __lruvec_stat_mod_folio() is already safe against irqs, so there is no
> need for a separate interface (i.e. lruvec_stat_mod_folio()) which
> wraps calls to it with irq disabling and re-enabling. Let's rename
> __lruvec_stat_mod_folio to lruvec_stat_mod_folio.
> 
> Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
> ---

Reviewed-by: Harry Yoo <harry.yoo@oracle.com>

-- 
Cheers,
Harry / Hyeonggon



* Re: [PATCH 0/4] memcg: cleanup the memcg stats interfaces
  2025-11-11  3:05         ` Harry Yoo
@ 2025-11-11  8:01           ` Sebastian Andrzej Siewior
  0 siblings, 0 replies; 31+ messages in thread
From: Sebastian Andrzej Siewior @ 2025-11-11  8:01 UTC (permalink / raw)
  To: Harry Yoo
  Cc: Qi Zheng, Shakeel Butt, Andrew Morton, Johannes Weiner,
	Michal Hocko, Roman Gushchin, Muchun Song, Vlastimil Babka,
	linux-mm, cgroups, linux-kernel, Meta kernel team,
	Clark Williams, linux-rt-devel

> > I mean converting these places to use spin_lock() instead of
> > spin_lock_irq().
> 
> Just one thing I noticed while looking at __folio_migrate_mapping()...
> 
> - xas_lock_irq() -> xas_unlock() -> local_irq_enable()
> - swap_cluster_get_and_lock_irq() -> swap_cluster_unlock() -> local_irq_enable()
> 
> is wrong because spin_lock_irq() doesn't disable IRQ on PREEMPT_RT.
> 
> Not 100% sure if it would be benign or lead to actual bugs that need
> to be fixed in -stable kernels.
> 
> Cc'ing RT folks again :)

The tail in __folio_migrate_mapping() after xas_unlock()/
swap_cluster_unlock() is limited to __mod_lruvec_state()-based stats
updates. There is a preempt_disable_nested() in __mod_zone_page_state()
to ensure that the update happens on the same CPU and is not preempted.
On PREEMPT_RT there should be no in-IRQ updates of these counters.
The IRQ enable at the end does nothing. There might be CPU migration
between the individual stats updates.
If it remains like this, it is fine. Please don't advertise ;)
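
For reference, the guard sits roughly like this (abridged from
__mod_zone_page_state() in mm/vmstat.c; details vary by version):

	preempt_disable_nested();	/* preempt off on PREEMPT_RT only */
	x = delta + __this_cpu_read(*p);
	if (unlikely(abs(x) > __this_cpu_read(pcp->stat_threshold))) {
		zone_page_state_add(x, zone, item);
		x = 0;
	}
	__this_cpu_write(*p, x);
	preempt_enable_nested();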

Sebastian



* Re: [PATCH 2/4] memcg: remove __mod_lruvec_kmem_state
  2025-11-10 23:20 ` [PATCH 2/4] memcg: remove __mod_lruvec_kmem_state Shakeel Butt
  2025-11-11  1:46   ` Harry Yoo
@ 2025-11-11  8:23   ` Qi Zheng
  2025-11-11 18:58   ` Roman Gushchin
  2 siblings, 0 replies; 31+ messages in thread
From: Qi Zheng @ 2025-11-11  8:23 UTC (permalink / raw)
  To: Shakeel Butt
  Cc: Johannes Weiner, Michal Hocko, Roman Gushchin, Muchun Song,
	Harry Yoo, Vlastimil Babka, linux-mm, cgroups, linux-kernel,
	Meta kernel team, Andrew Morton

Hi Shakeel,

On 11/11/25 7:20 AM, Shakeel Butt wrote:
> __mod_lruvec_kmem_state() is already safe against irqs, so there is no
> need for a separate interface (i.e. mod_lruvec_kmem_state()) which
> wraps calls to it with irq disabling and re-enabling. Let's rename
> __mod_lruvec_kmem_state to mod_lruvec_kmem_state.
> 
> Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>

Reviewed-by: Qi Zheng <zhengqi.arch@bytedance.com>

One nit below:

> ---
>   include/linux/memcontrol.h | 28 +++++-----------------------
>   mm/memcontrol.c            |  2 +-
>   mm/workingset.c            |  2 +-
>   3 files changed, 7 insertions(+), 25 deletions(-)
> 
> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> index f82fac2fd988..1384a9d305e1 100644
> --- a/include/linux/memcontrol.h
> +++ b/include/linux/memcontrol.h
> @@ -957,17 +957,7 @@ unsigned long lruvec_page_state_local(struct lruvec *lruvec,
>   void mem_cgroup_flush_stats(struct mem_cgroup *memcg);
>   void mem_cgroup_flush_stats_ratelimited(struct mem_cgroup *memcg);
>   
> -void __mod_lruvec_kmem_state(void *p, enum node_stat_item idx, int val);
> -
> -static inline void mod_lruvec_kmem_state(void *p, enum node_stat_item idx,
> -					 int val)
> -{
> -	unsigned long flags;
> -
> -	local_irq_save(flags);
> -	__mod_lruvec_kmem_state(p, idx, val);
> -	local_irq_restore(flags);
> -}
> +void mod_lruvec_kmem_state(void *p, enum node_stat_item idx, int val);
>   
>   void count_memcg_events(struct mem_cgroup *memcg, enum vm_event_item idx,
>   			unsigned long count);
> @@ -1403,14 +1393,6 @@ static inline void mem_cgroup_flush_stats_ratelimited(struct mem_cgroup *memcg)
>   {
>   }
>   
> -static inline void __mod_lruvec_kmem_state(void *p, enum node_stat_item idx,
> -					   int val)
> -{
> -	struct page *page = virt_to_head_page(p);
> -
> -	mod_node_page_state(page_pgdat(page), idx, val);
> -}
> -
>   static inline void mod_lruvec_kmem_state(void *p, enum node_stat_item idx,
>   					 int val)
>   {
> @@ -1470,14 +1452,14 @@ struct slabobj_ext {
>   #endif
>   } __aligned(8);
>   
> -static inline void __inc_lruvec_kmem_state(void *p, enum node_stat_item idx)
> +static inline void inc_lruvec_kmem_state(void *p, enum node_stat_item idx)
>   {
> -	__mod_lruvec_kmem_state(p, idx, 1);
> +	mod_lruvec_kmem_state(p, idx, 1);
>   }

inc_lruvec_kmem_state() has only one user.

>   
> -static inline void __dec_lruvec_kmem_state(void *p, enum node_stat_item idx)
> +static inline void dec_lruvec_kmem_state(void *p, enum node_stat_item idx)
>   {
> -	__mod_lruvec_kmem_state(p, idx, -1);
> +	mod_lruvec_kmem_state(p, idx, -1);
>   }

dec_lruvec_kmem_state() has no users.

Not sure whether inc_lruvec_kmem_state() and dec_lruvec_kmem_state()
also need to be removed.

Thanks,
Qi

>   
>   static inline struct lruvec *parent_lruvec(struct lruvec *lruvec)
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index f4b8a6414ed3..3a59d3ee92a7 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -799,7 +799,7 @@ void __lruvec_stat_mod_folio(struct folio *folio, enum node_stat_item idx,
>   }
>   EXPORT_SYMBOL(__lruvec_stat_mod_folio);
>   
> -void __mod_lruvec_kmem_state(void *p, enum node_stat_item idx, int val)
> +void mod_lruvec_kmem_state(void *p, enum node_stat_item idx, int val)
>   {
>   	pg_data_t *pgdat = page_pgdat(virt_to_page(p));
>   	struct mem_cgroup *memcg;
> diff --git a/mm/workingset.c b/mm/workingset.c
> index d32dc2e02a61..892f6fe94ea9 100644
> --- a/mm/workingset.c
> +++ b/mm/workingset.c
> @@ -749,7 +749,7 @@ static enum lru_status shadow_lru_isolate(struct list_head *item,
>   	if (WARN_ON_ONCE(node->count != node->nr_values))
>   		goto out_invalid;
>   	xa_delete_node(node, workingset_update_node);
> -	__inc_lruvec_kmem_state(node, WORKINGSET_NODERECLAIM);
> +	inc_lruvec_kmem_state(node, WORKINGSET_NODERECLAIM);
>   
>   out_invalid:
>   	xa_unlock_irq(&mapping->i_pages);




* Re: [PATCH 0/4] memcg: cleanup the memcg stats interfaces
  2025-11-10 23:20 [PATCH 0/4] memcg: cleanup the memcg stats interfaces Shakeel Butt
                   ` (4 preceding siblings ...)
  2025-11-11  0:59 ` [PATCH 0/4] memcg: cleanup the memcg stats interfaces Harry Yoo
@ 2025-11-11  8:36 ` Qi Zheng
  2025-11-11 16:45   ` Shakeel Butt
  2025-11-11  9:54 ` Vlastimil Babka
                   ` (2 subsequent siblings)
  8 siblings, 1 reply; 31+ messages in thread
From: Qi Zheng @ 2025-11-11  8:36 UTC (permalink / raw)
  To: Shakeel Butt
  Cc: Johannes Weiner, Michal Hocko, Roman Gushchin, Muchun Song,
	Harry Yoo, Vlastimil Babka, linux-mm, cgroups, linux-kernel,
	Meta kernel team, Andrew Morton

Hi Shakeel,

On 11/11/25 7:20 AM, Shakeel Butt wrote:
> The memcg stats are safe in irq (and nmi) context and thus do not
> require disabling irqs. However, some stats which are also maintained
> at the node level use an irq-unsafe interface, requiring the users to
> either disable irqs themselves or call interfaces which explicitly
> disable irqs. Let's move the memcg code to the irq-safe node-level
> stats function, which is already optimized for architectures with
> HAVE_CMPXCHG_LOCAL (all major ones), so there will be no performance
> penalty from its use.

Generally, places that call __mod_lruvec_state() also call
__mod_zone_page_state(), which likewise has an optimized variant
(mod_zone_page_state()). It seems worth cleaning that up as well, so
that the IRQ-disabling that exists only to update vmstat can be
removed.
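For example, a typical caller today does roughly the following (the
stat items here are only illustrative):

	unsigned long flags;

	local_irq_save(flags);
	__mod_lruvec_state(lruvec, NR_FILE_PAGES, nr);
	__mod_zone_page_state(zone, NR_ZONE_INACTIVE_FILE, nr);
	local_irq_restore(flags);

and with the irq-safe variants it could shrink to just:

	mod_lruvec_state(lruvec, NR_FILE_PAGES, nr);
	mod_zone_page_state(zone, NR_ZONE_INACTIVE_FILE, nr);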

Thanks,
Qi

> 
> Shakeel Butt (4):
>    memcg: use mod_node_page_state to update stats
>    memcg: remove __mod_lruvec_kmem_state
>    memcg: remove __mod_lruvec_state
>    memcg: remove __lruvec_stat_mod_folio
> 
>   include/linux/memcontrol.h | 28 ++++------------------
>   include/linux/mm_inline.h  |  2 +-
>   include/linux/vmstat.h     | 48 ++------------------------------------
>   mm/filemap.c               | 20 ++++++++--------
>   mm/huge_memory.c           |  4 ++--
>   mm/khugepaged.c            |  8 +++----
>   mm/memcontrol.c            | 20 ++++++++--------
>   mm/migrate.c               | 20 ++++++++--------
>   mm/page-writeback.c        |  2 +-
>   mm/rmap.c                  |  4 ++--
>   mm/shmem.c                 |  6 ++---
>   mm/vmscan.c                |  4 ++--
>   mm/workingset.c            |  2 +-
>   13 files changed, 53 insertions(+), 115 deletions(-)
> 




* Re: [PATCH 0/4] memcg: cleanup the memcg stats interfaces
  2025-11-10 23:20 [PATCH 0/4] memcg: cleanup the memcg stats interfaces Shakeel Butt
                   ` (5 preceding siblings ...)
  2025-11-11  8:36 ` Qi Zheng
@ 2025-11-11  9:54 ` Vlastimil Babka
  2025-11-11 19:01 ` Roman Gushchin
  2025-11-15 19:27 ` Shakeel Butt
  8 siblings, 0 replies; 31+ messages in thread
From: Vlastimil Babka @ 2025-11-11  9:54 UTC (permalink / raw)
  To: Shakeel Butt, Andrew Morton
  Cc: Johannes Weiner, Michal Hocko, Roman Gushchin, Muchun Song,
	Harry Yoo, Qi Zheng, linux-mm, cgroups, linux-kernel,
	Meta kernel team

On 11/11/25 00:20, Shakeel Butt wrote:
> The memcg stats are safe in irq (and nmi) context and thus do not
> require disabling irqs. However, some stats which are also maintained
> at the node level use an irq-unsafe interface, requiring the users to
> either disable irqs themselves or call interfaces which explicitly
> disable irqs. Let's move the memcg code to the irq-safe node-level
> stats function, which is already optimized for architectures with
> HAVE_CMPXCHG_LOCAL (all major ones), so there will be no performance
> penalty from its use.
> 
> Shakeel Butt (4):
>   memcg: use mod_node_page_state to update stats
>   memcg: remove __mod_lruvec_kmem_state
>   memcg: remove __mod_lruvec_state
>   memcg: remove __lruvec_stat_mod_folio

Acked-by: Vlastimil Babka <vbabka@suse.cz>

> 
>  include/linux/memcontrol.h | 28 ++++------------------
>  include/linux/mm_inline.h  |  2 +-
>  include/linux/vmstat.h     | 48 ++------------------------------------
>  mm/filemap.c               | 20 ++++++++--------
>  mm/huge_memory.c           |  4 ++--
>  mm/khugepaged.c            |  8 +++----
>  mm/memcontrol.c            | 20 ++++++++--------
>  mm/migrate.c               | 20 ++++++++--------
>  mm/page-writeback.c        |  2 +-
>  mm/rmap.c                  |  4 ++--
>  mm/shmem.c                 |  6 ++---
>  mm/vmscan.c                |  4 ++--
>  mm/workingset.c            |  2 +-
>  13 files changed, 53 insertions(+), 115 deletions(-)
> 




* Re: [PATCH 0/4] memcg: cleanup the memcg stats interfaces
  2025-11-11  8:36 ` Qi Zheng
@ 2025-11-11 16:45   ` Shakeel Butt
  2025-11-12  2:11     ` Qi Zheng
  0 siblings, 1 reply; 31+ messages in thread
From: Shakeel Butt @ 2025-11-11 16:45 UTC (permalink / raw)
  To: Qi Zheng
  Cc: Johannes Weiner, Michal Hocko, Roman Gushchin, Muchun Song,
	Harry Yoo, Vlastimil Babka, linux-mm, cgroups, linux-kernel,
	Meta kernel team, Andrew Morton

On Tue, Nov 11, 2025 at 04:36:14PM +0800, Qi Zheng wrote:
> Hi Shakeel,
> 
> On 11/11/25 7:20 AM, Shakeel Butt wrote:
> > The memcg stats are safe in irq (and nmi) context and thus do not
> > require disabling irqs. However, some stats which are also maintained
> > at the node level use an irq-unsafe interface, requiring the users to
> > either disable irqs themselves or call interfaces which explicitly
> > disable irqs. Let's move the memcg code to the irq-safe node-level
> > stats function, which is already optimized for architectures with
> > HAVE_CMPXCHG_LOCAL (all major ones), so there will be no performance
> > penalty from its use.
> 
> Generally, places that call __mod_lruvec_state() also call
> __mod_zone_page_state(), which likewise has an optimized variant
> (mod_zone_page_state()). It seems worth cleaning that up as well, so
> that the IRQ-disabling that exists only to update vmstat can be
> removed.

I agree, please take a stab at that.



* Re: [PATCH 1/4] memcg: use mod_node_page_state to update stats
  2025-11-10 23:20 ` [PATCH 1/4] memcg: use mod_node_page_state to update stats Shakeel Butt
  2025-11-11  1:39   ` Harry Yoo
@ 2025-11-11 18:58   ` Roman Gushchin
  1 sibling, 0 replies; 31+ messages in thread
From: Roman Gushchin @ 2025-11-11 18:58 UTC (permalink / raw)
  To: Shakeel Butt
  Cc: Andrew Morton, Johannes Weiner, Michal Hocko, Muchun Song,
	Harry Yoo, Qi Zheng, Vlastimil Babka, linux-mm, cgroups,
	linux-kernel, Meta kernel team

Shakeel Butt <shakeel.butt@linux.dev> writes:

> The memcg stats are safe in irq (and nmi) context and thus do not
> require disabling irqs. However, some code paths for memcg stats also
> update the node-level stats through an irq-unsafe interface, which
> requires the users to disable irqs. The node-level stats, on
> architectures with HAVE_CMPXCHG_LOCAL (all major ones), have an
> interface which does not require irq disabling. Let's move the memcg
> stats code to that interface for the node-level stats.
>
> Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>

Acked-by: Roman Gushchin <roman.gushchin@linux.dev>



* Re: [PATCH 2/4] memcg: remove __mod_lruvec_kmem_state
  2025-11-10 23:20 ` [PATCH 2/4] memcg: remove __mod_lruvec_kmem_state Shakeel Butt
  2025-11-11  1:46   ` Harry Yoo
  2025-11-11  8:23   ` Qi Zheng
@ 2025-11-11 18:58   ` Roman Gushchin
  2 siblings, 0 replies; 31+ messages in thread
From: Roman Gushchin @ 2025-11-11 18:58 UTC (permalink / raw)
  To: Shakeel Butt
  Cc: Andrew Morton, Johannes Weiner, Michal Hocko, Muchun Song,
	Harry Yoo, Qi Zheng, Vlastimil Babka, linux-mm, cgroups,
	linux-kernel, Meta kernel team

Shakeel Butt <shakeel.butt@linux.dev> writes:

> __mod_lruvec_kmem_state() is already irq-safe, so there is no need
> for a separate interface (i.e. mod_lruvec_kmem_state()) that wraps
> calls to it with irq disabling and re-enabling. Let's rename
> __mod_lruvec_kmem_state() to mod_lruvec_kmem_state().
>
> Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
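For context, the wrapper being dropped here is just the following (a
rough sketch of the old memcontrol.h definition):

	static inline void mod_lruvec_kmem_state(void *p,
						 enum node_stat_item idx,
						 int val)
	{
		unsigned long flags;

		local_irq_save(flags);
		__mod_lruvec_kmem_state(p, idx, val);
		local_irq_restore(flags);
	}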

Acked-by: Roman Gushchin <roman.gushchin@linux.dev>



* Re: [PATCH 3/4] memcg: remove __mod_lruvec_state
  2025-11-10 23:20 ` [PATCH 3/4] memcg: remove __mod_lruvec_state Shakeel Butt
  2025-11-11  5:21   ` Harry Yoo
@ 2025-11-11 18:58   ` Roman Gushchin
  1 sibling, 0 replies; 31+ messages in thread
From: Roman Gushchin @ 2025-11-11 18:58 UTC (permalink / raw)
  To: Shakeel Butt
  Cc: Andrew Morton, Johannes Weiner, Michal Hocko, Muchun Song,
	Harry Yoo, Qi Zheng, Vlastimil Babka, linux-mm, cgroups,
	linux-kernel, Meta kernel team

Shakeel Butt <shakeel.butt@linux.dev> writes:

> __mod_lruvec_state() is already irq-safe, so there is no need for a
> separate interface (i.e. mod_lruvec_state()) that wraps calls to it
> with irq disabling and re-enabling. Let's rename __mod_lruvec_state()
> to mod_lruvec_state().
>
> Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>

Acked-by: Roman Gushchin <roman.gushchin@linux.dev>



* Re: [PATCH 4/4] memcg: remove __lruvec_stat_mod_folio
  2025-11-10 23:20 ` [PATCH 4/4] memcg: remove __lruvec_stat_mod_folio Shakeel Butt
  2025-11-11  5:41   ` Harry Yoo
@ 2025-11-11 18:59   ` Roman Gushchin
  1 sibling, 0 replies; 31+ messages in thread
From: Roman Gushchin @ 2025-11-11 18:59 UTC (permalink / raw)
  To: Shakeel Butt
  Cc: Andrew Morton, Johannes Weiner, Michal Hocko, Muchun Song,
	Harry Yoo, Qi Zheng, Vlastimil Babka, linux-mm, cgroups,
	linux-kernel, Meta kernel team

Shakeel Butt <shakeel.butt@linux.dev> writes:

> __lruvec_stat_mod_folio() is already irq-safe, so there is no need
> for a separate interface (i.e. lruvec_stat_mod_folio()) that wraps
> calls to it with irq disabling and re-enabling. Let's rename
> __lruvec_stat_mod_folio() to lruvec_stat_mod_folio().
>
> Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>

Acked-by: Roman Gushchin <roman.gushchin@linux.dev>



* Re: [PATCH 0/4] memcg: cleanup the memcg stats interfaces
  2025-11-10 23:20 [PATCH 0/4] memcg: cleanup the memcg stats interfaces Shakeel Butt
                   ` (6 preceding siblings ...)
  2025-11-11  9:54 ` Vlastimil Babka
@ 2025-11-11 19:01 ` Roman Gushchin
  2025-11-11 19:34   ` Shakeel Butt
  2025-11-15 19:27 ` Shakeel Butt
  8 siblings, 1 reply; 31+ messages in thread
From: Roman Gushchin @ 2025-11-11 19:01 UTC (permalink / raw)
  To: Shakeel Butt
  Cc: Andrew Morton, Johannes Weiner, Michal Hocko, Muchun Song,
	Harry Yoo, Qi Zheng, Vlastimil Babka, linux-mm, cgroups,
	linux-kernel, Meta kernel team

Shakeel Butt <shakeel.butt@linux.dev> writes:

> The memcg stats are safe in irq (and nmi) context and thus do not
> require disabling irqs. However, some stats which are also maintained
> at the node level use an irq-unsafe interface, requiring the users to
> either disable irqs themselves or call interfaces which explicitly
> disable irqs. Let's move the memcg code to the irq-safe node-level
> stats function, which is already optimized for architectures with
> HAVE_CMPXCHG_LOCAL (all major ones), so there will be no performance
> penalty from its use.

Do you have any production data for this, or is it theory-based?

In general I feel we need a benchmark focused on memcg stats:
there have been a number of performance improvements and regressions
in this code over the last few years, so a dedicated benchmark would
help measure them.

Nice cleanup btw, thanks!



* Re: [PATCH 0/4] memcg: cleanup the memcg stats interfaces
  2025-11-11 19:01 ` Roman Gushchin
@ 2025-11-11 19:34   ` Shakeel Butt
  0 siblings, 0 replies; 31+ messages in thread
From: Shakeel Butt @ 2025-11-11 19:34 UTC (permalink / raw)
  To: Roman Gushchin
  Cc: Andrew Morton, Johannes Weiner, Michal Hocko, Muchun Song,
	Harry Yoo, Qi Zheng, Vlastimil Babka, linux-mm, cgroups,
	linux-kernel, Meta kernel team

On Tue, Nov 11, 2025 at 11:01:47AM -0800, Roman Gushchin wrote:
> Shakeel Butt <shakeel.butt@linux.dev> writes:
> 
> > The memcg stats are safe in irq (and nmi) context and thus do not
> > require disabling irqs. However, some stats which are also maintained
> > at the node level use an irq-unsafe interface, requiring the users to
> > either disable irqs themselves or call interfaces which explicitly
> > disable irqs. Let's move the memcg code to the irq-safe node-level
> > stats function, which is already optimized for architectures with
> > HAVE_CMPXCHG_LOCAL (all major ones), so there will be no performance
> > penalty from its use.
> 
> Do you have any production data for this, or is it theory-based?

At the moment it is theory-based, or more specifically, based on the
comments on the HAVE_CMPXCHG_LOCAL variants of the stats update
functions.
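For reference, the HAVE_CMPXCHG_LOCAL path is roughly the following (a
heavily simplified sketch of mod_node_state() in mm/vmstat.c; the real
code also handles the overstep modes, and the function name here is
made up for the sketch):

	static void node_stat_sketch(struct pglist_data *pgdat,
				     enum node_stat_item item, int delta)
	{
		struct per_cpu_nodestat __percpu *pcp = pgdat->per_cpu_nodestats;
		s8 __percpu *p = pcp->vm_node_stat_diff + item;
		long n, t, z;
		s8 o = this_cpu_read(*p);

		do {
			z = 0;
			t = this_cpu_read(pcp->stat_threshold);
			n = delta + (long)o;
			if (abs(n) > t) {
				/* fold the overflow into the global node counter */
				z = n;
				n = 0;
			}
			/* an irq hitting here just makes the cmpxchg retry */
		} while (!this_cpu_try_cmpxchg(*p, &o, n));

		if (z)
			node_page_state_add(z, pgdat, item);
	}

The whole read-modify-write is one local cmpxchg loop, so an interrupt
cannot corrupt it and there is nothing to disable.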

> 
> In general I feel we need a benchmark focused on memcg stats:
> there have been a number of performance improvements and regressions
> in this code over the last few years, so a dedicated benchmark would
> help measure them.

Yeah, it makes sense to have a benchmark. Let me see which benchmarks
trigger these code paths a lot. At a high level, these interfaces are
used in reclaim and migration, which are not really performance
critical. I will try benchmarks that do a lot of allocs/frees.
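Something along these lines, run inside a memcg (just an illustration;
will-it-scale's page_fault tests do roughly the same thing, possibly
multi-threaded), should hammer the charge/uncharge and stat update
paths:

	#include <string.h>
	#include <sys/mman.h>

	int main(void)
	{
		const size_t len = 64UL << 12;	/* 64 pages */

		for (int i = 0; i < 1000000; i++) {
			char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
				       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

			memset(p, 1, len);	/* fault the pages in */
			munmap(p, len);		/* and free them back */
		}
		return 0;
	}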

> 
> Nice cleanup btw, thanks!

Thanks.



* Re: [PATCH 0/4] memcg: cleanup the memcg stats interfaces
  2025-11-11 16:45   ` Shakeel Butt
@ 2025-11-12  2:11     ` Qi Zheng
  0 siblings, 0 replies; 31+ messages in thread
From: Qi Zheng @ 2025-11-12  2:11 UTC (permalink / raw)
  To: Shakeel Butt
  Cc: Johannes Weiner, Michal Hocko, Roman Gushchin, Muchun Song,
	Harry Yoo, Vlastimil Babka, linux-mm, cgroups, linux-kernel,
	Meta kernel team, Andrew Morton



On 11/12/25 12:45 AM, Shakeel Butt wrote:
> On Tue, Nov 11, 2025 at 04:36:14PM +0800, Qi Zheng wrote:
>> Hi Shakeel,
>>
>> On 11/11/25 7:20 AM, Shakeel Butt wrote:
>>> The memcg stats are safe in irq (and nmi) context and thus do not
>>> require disabling irqs. However, some stats which are also maintained
>>> at the node level use an irq-unsafe interface, requiring the users to
>>> either disable irqs themselves or call interfaces which explicitly
>>> disable irqs. Let's move the memcg code to the irq-safe node-level
>>> stats function, which is already optimized for architectures with
>>> HAVE_CMPXCHG_LOCAL (all major ones), so there will be no performance
>>> penalty from its use.
>>
>> Generally, places that call __mod_lruvec_state() also call
>> __mod_zone_page_state(), which likewise has an optimized variant
>> (mod_zone_page_state()). It seems worth cleaning that up as well, so
>> that the IRQ-disabling that exists only to update vmstat can be
>> removed.
> 
> I agree, please take a stab at that.

OK, will do.





* Re: [PATCH 0/4] memcg: cleanup the memcg stats interfaces
  2025-11-10 23:20 [PATCH 0/4] memcg: cleanup the memcg stats interfaces Shakeel Butt
                   ` (7 preceding siblings ...)
  2025-11-11 19:01 ` Roman Gushchin
@ 2025-11-15 19:27 ` Shakeel Butt
  8 siblings, 0 replies; 31+ messages in thread
From: Shakeel Butt @ 2025-11-15 19:27 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Johannes Weiner, Michal Hocko, Roman Gushchin, Muchun Song,
	Harry Yoo, Qi Zheng, Vlastimil Babka, linux-mm, cgroups,
	linux-kernel, Meta kernel team

Hi Andrew, can you please pick up this series? It is ready for wider
testing.

thanks,
Shakeel

On Mon, Nov 10, 2025 at 03:20:04PM -0800, Shakeel Butt wrote:
> The memcg stats are safe in irq (and nmi) context and thus do not
> require disabling irqs. However, some stats which are also maintained
> at the node level use an irq-unsafe interface, requiring the users to
> either disable irqs themselves or call interfaces which explicitly
> disable irqs. Let's move the memcg code to the irq-safe node-level
> stats function, which is already optimized for architectures with
> HAVE_CMPXCHG_LOCAL (all major ones), so there will be no performance
> penalty from its use.
> 
> Shakeel Butt (4):
>   memcg: use mod_node_page_state to update stats
>   memcg: remove __mod_lruvec_kmem_state
>   memcg: remove __mod_lruvec_state
>   memcg: remove __lruvec_stat_mod_folio
> 
>  include/linux/memcontrol.h | 28 ++++------------------
>  include/linux/mm_inline.h  |  2 +-
>  include/linux/vmstat.h     | 48 ++------------------------------------
>  mm/filemap.c               | 20 ++++++++--------
>  mm/huge_memory.c           |  4 ++--
>  mm/khugepaged.c            |  8 +++----
>  mm/memcontrol.c            | 20 ++++++++--------
>  mm/migrate.c               | 20 ++++++++--------
>  mm/page-writeback.c        |  2 +-
>  mm/rmap.c                  |  4 ++--
>  mm/shmem.c                 |  6 ++---
>  mm/vmscan.c                |  4 ++--
>  mm/workingset.c            |  2 +-
>  13 files changed, 53 insertions(+), 115 deletions(-)
> 
> -- 
> 2.47.3
> 


