* [PATCH v2] mm: move pgscan and pgsteal to node stats
@ 2026-02-18 3:29 JP Kobryn (Meta)
2026-02-18 6:44 ` Michael S. Tsirkin
` (2 more replies)
0 siblings, 3 replies; 6+ messages in thread
From: JP Kobryn (Meta) @ 2026-02-18 3:29 UTC (permalink / raw)
To: linux-mm, mst, mhocko, vbabka
Cc: apopple, akpm, axelrasmussen, byungchul, cgroups, david,
eperezma, gourry, jasowang, hannes, joshua.hahnjy, Liam.Howlett,
linux-kernel, lorenzo.stoakes, matthew.brost, rppt, muchun.song,
zhengqi.arch, rakie.kim, roman.gushchin, shakeel.butt, surenb,
virtualization, weixugc, xuanzhuo, ying.huang, yuanchu, ziy,
kernel-team
From: JP Kobryn <jp.kobryn@linux.dev>
There are situations where reclaim kicks in on a system with free memory.
One possible cause is a NUMA imbalance scenario where one or more nodes are
under pressure. It would help if we could easily identify such nodes.
Move the pgscan and pgsteal counters from vm_event_item to node_stat_item
to provide per-node reclaim visibility. With these counters as node stats,
the values are now displayed in the per-node section of /proc/zoneinfo,
which allows for quick identification of the affected nodes.
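For illustration, the per-node stats section of /proc/zoneinfo will then
contain entries along these lines (node number and values below are made
up):
  Node 1, zone   Normal
    per-node stats
        ...
        pgsteal_kswapd 81943
        pgsteal_direct 10274
        pgscan_kswapd  123605
        pgscan_direct  17350
        ...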
/proc/vmstat continues to report the same counters, aggregated across all
nodes. But the ordering of these items within the readout changes as they
move from the vm events section to the node stats section.
Memcg accounting of these counters is preserved. The relocated counters
remain visible in memory.stat alongside the existing aggregate pgscan and
pgsteal counters.
However, this change affects how the global counters are accumulated.
Previously, the global event count update was gated on !cgroup_reclaim(),
excluding memcg-based reclaim from /proc/vmstat. Now that
mod_lruvec_state() is being used to update the counters, the global
counters will include all reclaim. This is consistent with how pgdemote
counters are already tracked.
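For reference, mod_lruvec_state() folds both updates into a single call. A
simplified sketch of what it boils down to (illustrative, not the exact
implementation; the helper name here is made up):

	/* sketch: one call updates the node stat and the memcg stat */
	static void mod_lruvec_state_sketch(struct lruvec *lruvec,
					    enum node_stat_item idx, int val)
	{
		/* node side: feeds /proc/vmstat and /proc/zoneinfo */
		mod_node_page_state(lruvec_pgdat(lruvec), idx, val);
		/* memcg side: feeds memory.stat */
		if (!mem_cgroup_disabled())
			mod_memcg_lruvec_state(lruvec, idx, val);
	}

With no separate global-only update left, there is nothing to gate on
!cgroup_reclaim().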
Finally, the virtio_balloon driver is updated to use
global_node_page_state() to fetch the counters, as they are no longer
accessible through the vm_events array.
Signed-off-by: JP Kobryn <jp.kobryn@linux.dev>
Suggested-by: Johannes Weiner <hannes@cmpxchg.org>
---
drivers/virtio/virtio_balloon.c | 8 ++---
include/linux/mmzone.h | 12 ++++++++
include/linux/vm_event_item.h | 12 --------
mm/memcontrol.c | 52 +++++++++++++++++++++++----------
mm/vmscan.c | 32 ++++++++------------
mm/vmstat.c | 24 +++++++--------
6 files changed, 76 insertions(+), 64 deletions(-)
diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index 4e549abe59ff..ab945532ceef 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -369,13 +369,13 @@ static inline unsigned int update_balloon_vm_stats(struct virtio_balloon *vb)
update_stat(vb, idx++, VIRTIO_BALLOON_S_ALLOC_STALL, stall);
update_stat(vb, idx++, VIRTIO_BALLOON_S_ASYNC_SCAN,
- pages_to_bytes(events[PGSCAN_KSWAPD]));
+ pages_to_bytes(global_node_page_state(PGSCAN_KSWAPD)));
update_stat(vb, idx++, VIRTIO_BALLOON_S_DIRECT_SCAN,
- pages_to_bytes(events[PGSCAN_DIRECT]));
+ pages_to_bytes(global_node_page_state(PGSCAN_DIRECT)));
update_stat(vb, idx++, VIRTIO_BALLOON_S_ASYNC_RECLAIM,
- pages_to_bytes(events[PGSTEAL_KSWAPD]));
+ pages_to_bytes(global_node_page_state(PGSTEAL_KSWAPD)));
update_stat(vb, idx++, VIRTIO_BALLOON_S_DIRECT_RECLAIM,
- pages_to_bytes(events[PGSTEAL_DIRECT]));
+ pages_to_bytes(global_node_page_state(PGSTEAL_DIRECT)));
#ifdef CONFIG_HUGETLB_PAGE
update_stat(vb, idx++, VIRTIO_BALLOON_S_HTLB_PGALLOC,
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 3e51190a55e4..1aa9c7aec889 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -255,6 +255,18 @@ enum node_stat_item {
PGDEMOTE_DIRECT,
PGDEMOTE_KHUGEPAGED,
PGDEMOTE_PROACTIVE,
+ PGSTEAL_KSWAPD,
+ PGSTEAL_DIRECT,
+ PGSTEAL_KHUGEPAGED,
+ PGSTEAL_PROACTIVE,
+ PGSTEAL_ANON,
+ PGSTEAL_FILE,
+ PGSCAN_KSWAPD,
+ PGSCAN_DIRECT,
+ PGSCAN_KHUGEPAGED,
+ PGSCAN_PROACTIVE,
+ PGSCAN_ANON,
+ PGSCAN_FILE,
#ifdef CONFIG_HUGETLB_PAGE
NR_HUGETLB,
#endif
diff --git a/include/linux/vm_event_item.h b/include/linux/vm_event_item.h
index 22a139f82d75..1fa3b3ad0ff9 100644
--- a/include/linux/vm_event_item.h
+++ b/include/linux/vm_event_item.h
@@ -40,19 +40,7 @@ enum vm_event_item { PGPGIN, PGPGOUT, PSWPIN, PSWPOUT,
PGLAZYFREED,
PGREFILL,
PGREUSE,
- PGSTEAL_KSWAPD,
- PGSTEAL_DIRECT,
- PGSTEAL_KHUGEPAGED,
- PGSTEAL_PROACTIVE,
- PGSCAN_KSWAPD,
- PGSCAN_DIRECT,
- PGSCAN_KHUGEPAGED,
- PGSCAN_PROACTIVE,
PGSCAN_DIRECT_THROTTLE,
- PGSCAN_ANON,
- PGSCAN_FILE,
- PGSTEAL_ANON,
- PGSTEAL_FILE,
#ifdef CONFIG_NUMA
PGSCAN_ZONE_RECLAIM_SUCCESS,
PGSCAN_ZONE_RECLAIM_FAILED,
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 007413a53b45..e89e77457701 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -328,6 +328,18 @@ static const unsigned int memcg_node_stat_items[] = {
PGDEMOTE_DIRECT,
PGDEMOTE_KHUGEPAGED,
PGDEMOTE_PROACTIVE,
+ PGSTEAL_KSWAPD,
+ PGSTEAL_DIRECT,
+ PGSTEAL_KHUGEPAGED,
+ PGSTEAL_PROACTIVE,
+ PGSTEAL_ANON,
+ PGSTEAL_FILE,
+ PGSCAN_KSWAPD,
+ PGSCAN_DIRECT,
+ PGSCAN_KHUGEPAGED,
+ PGSCAN_PROACTIVE,
+ PGSCAN_ANON,
+ PGSCAN_FILE,
#ifdef CONFIG_HUGETLB_PAGE
NR_HUGETLB,
#endif
@@ -441,14 +453,6 @@ static const unsigned int memcg_vm_event_stat[] = {
#endif
PSWPIN,
PSWPOUT,
- PGSCAN_KSWAPD,
- PGSCAN_DIRECT,
- PGSCAN_KHUGEPAGED,
- PGSCAN_PROACTIVE,
- PGSTEAL_KSWAPD,
- PGSTEAL_DIRECT,
- PGSTEAL_KHUGEPAGED,
- PGSTEAL_PROACTIVE,
PGFAULT,
PGMAJFAULT,
PGREFILL,
@@ -1382,6 +1386,14 @@ static const struct memory_stat memory_stats[] = {
{ "pgdemote_direct", PGDEMOTE_DIRECT },
{ "pgdemote_khugepaged", PGDEMOTE_KHUGEPAGED },
{ "pgdemote_proactive", PGDEMOTE_PROACTIVE },
+ { "pgsteal_kswapd", PGSTEAL_KSWAPD },
+ { "pgsteal_direct", PGSTEAL_DIRECT },
+ { "pgsteal_khugepaged", PGSTEAL_KHUGEPAGED },
+ { "pgsteal_proactive", PGSTEAL_PROACTIVE },
+ { "pgscan_kswapd", PGSCAN_KSWAPD },
+ { "pgscan_direct", PGSCAN_DIRECT },
+ { "pgscan_khugepaged", PGSCAN_KHUGEPAGED },
+ { "pgscan_proactive", PGSCAN_PROACTIVE },
#ifdef CONFIG_NUMA_BALANCING
{ "pgpromote_success", PGPROMOTE_SUCCESS },
#endif
@@ -1425,6 +1437,14 @@ static int memcg_page_state_output_unit(int item)
case PGDEMOTE_DIRECT:
case PGDEMOTE_KHUGEPAGED:
case PGDEMOTE_PROACTIVE:
+ case PGSTEAL_KSWAPD:
+ case PGSTEAL_DIRECT:
+ case PGSTEAL_KHUGEPAGED:
+ case PGSTEAL_PROACTIVE:
+ case PGSCAN_KSWAPD:
+ case PGSCAN_DIRECT:
+ case PGSCAN_KHUGEPAGED:
+ case PGSCAN_PROACTIVE:
#ifdef CONFIG_NUMA_BALANCING
case PGPROMOTE_SUCCESS:
#endif
@@ -1496,15 +1516,15 @@ static void memcg_stat_format(struct mem_cgroup *memcg, struct seq_buf *s)
/* Accumulated memory events */
seq_buf_printf(s, "pgscan %lu\n",
- memcg_events(memcg, PGSCAN_KSWAPD) +
- memcg_events(memcg, PGSCAN_DIRECT) +
- memcg_events(memcg, PGSCAN_PROACTIVE) +
- memcg_events(memcg, PGSCAN_KHUGEPAGED));
+ memcg_page_state(memcg, PGSCAN_KSWAPD) +
+ memcg_page_state(memcg, PGSCAN_DIRECT) +
+ memcg_page_state(memcg, PGSCAN_PROACTIVE) +
+ memcg_page_state(memcg, PGSCAN_KHUGEPAGED));
seq_buf_printf(s, "pgsteal %lu\n",
- memcg_events(memcg, PGSTEAL_KSWAPD) +
- memcg_events(memcg, PGSTEAL_DIRECT) +
- memcg_events(memcg, PGSTEAL_PROACTIVE) +
- memcg_events(memcg, PGSTEAL_KHUGEPAGED));
+ memcg_page_state(memcg, PGSTEAL_KSWAPD) +
+ memcg_page_state(memcg, PGSTEAL_DIRECT) +
+ memcg_page_state(memcg, PGSTEAL_PROACTIVE) +
+ memcg_page_state(memcg, PGSTEAL_KHUGEPAGED));
for (i = 0; i < ARRAY_SIZE(memcg_vm_event_stat); i++) {
#ifdef CONFIG_MEMCG_V1
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 44e4fcd6463c..dd6d87340941 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1984,7 +1984,7 @@ static unsigned long shrink_inactive_list(unsigned long nr_to_scan,
unsigned long nr_taken;
struct reclaim_stat stat;
bool file = is_file_lru(lru);
- enum vm_event_item item;
+ enum node_stat_item item;
struct pglist_data *pgdat = lruvec_pgdat(lruvec);
bool stalled = false;
@@ -2010,10 +2010,8 @@ static unsigned long shrink_inactive_list(unsigned long nr_to_scan,
__mod_node_page_state(pgdat, NR_ISOLATED_ANON + file, nr_taken);
item = PGSCAN_KSWAPD + reclaimer_offset(sc);
- if (!cgroup_reclaim(sc))
- __count_vm_events(item, nr_scanned);
- count_memcg_events(lruvec_memcg(lruvec), item, nr_scanned);
- __count_vm_events(PGSCAN_ANON + file, nr_scanned);
+ mod_lruvec_state(lruvec, item, nr_scanned);
+ mod_lruvec_state(lruvec, PGSCAN_ANON + file, nr_scanned);
spin_unlock_irq(&lruvec->lru_lock);
@@ -2030,10 +2028,8 @@ static unsigned long shrink_inactive_list(unsigned long nr_to_scan,
stat.nr_demoted);
__mod_node_page_state(pgdat, NR_ISOLATED_ANON + file, -nr_taken);
item = PGSTEAL_KSWAPD + reclaimer_offset(sc);
- if (!cgroup_reclaim(sc))
- __count_vm_events(item, nr_reclaimed);
- count_memcg_events(lruvec_memcg(lruvec), item, nr_reclaimed);
- __count_vm_events(PGSTEAL_ANON + file, nr_reclaimed);
+ mod_lruvec_state(lruvec, item, nr_reclaimed);
+ mod_lruvec_state(lruvec, PGSTEAL_ANON + file, nr_reclaimed);
lru_note_cost_unlock_irq(lruvec, file, stat.nr_pageout,
nr_scanned - nr_reclaimed);
@@ -4542,7 +4538,7 @@ static int scan_folios(unsigned long nr_to_scan, struct lruvec *lruvec,
{
int i;
int gen;
- enum vm_event_item item;
+ enum node_stat_item item;
int sorted = 0;
int scanned = 0;
int isolated = 0;
@@ -4601,13 +4597,11 @@ static int scan_folios(unsigned long nr_to_scan, struct lruvec *lruvec,
}
item = PGSCAN_KSWAPD + reclaimer_offset(sc);
- if (!cgroup_reclaim(sc)) {
- __count_vm_events(item, isolated);
+ if (!cgroup_reclaim(sc))
__count_vm_events(PGREFILL, sorted);
- }
- count_memcg_events(memcg, item, isolated);
+ mod_lruvec_state(lruvec, item, isolated);
count_memcg_events(memcg, PGREFILL, sorted);
- __count_vm_events(PGSCAN_ANON + type, isolated);
+ mod_lruvec_state(lruvec, PGSCAN_ANON + type, isolated);
trace_mm_vmscan_lru_isolate(sc->reclaim_idx, sc->order, scan_batch,
scanned, skipped, isolated,
type ? LRU_INACTIVE_FILE : LRU_INACTIVE_ANON);
@@ -4692,7 +4686,7 @@ static int evict_folios(unsigned long nr_to_scan, struct lruvec *lruvec,
LIST_HEAD(clean);
struct folio *folio;
struct folio *next;
- enum vm_event_item item;
+ enum node_stat_item item;
struct reclaim_stat stat;
struct lru_gen_mm_walk *walk;
bool skip_retry = false;
@@ -4756,10 +4750,8 @@ static int evict_folios(unsigned long nr_to_scan, struct lruvec *lruvec,
stat.nr_demoted);
item = PGSTEAL_KSWAPD + reclaimer_offset(sc);
- if (!cgroup_reclaim(sc))
- __count_vm_events(item, reclaimed);
- count_memcg_events(memcg, item, reclaimed);
- __count_vm_events(PGSTEAL_ANON + type, reclaimed);
+ mod_lruvec_state(lruvec, item, reclaimed);
+ mod_lruvec_state(lruvec, PGSTEAL_ANON + type, reclaimed);
spin_unlock_irq(&lruvec->lru_lock);
diff --git a/mm/vmstat.c b/mm/vmstat.c
index 99270713e0c1..d952c1e763e6 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -1276,6 +1276,18 @@ const char * const vmstat_text[] = {
[I(PGDEMOTE_DIRECT)] = "pgdemote_direct",
[I(PGDEMOTE_KHUGEPAGED)] = "pgdemote_khugepaged",
[I(PGDEMOTE_PROACTIVE)] = "pgdemote_proactive",
+ [I(PGSTEAL_KSWAPD)] = "pgsteal_kswapd",
+ [I(PGSTEAL_DIRECT)] = "pgsteal_direct",
+ [I(PGSTEAL_KHUGEPAGED)] = "pgsteal_khugepaged",
+ [I(PGSTEAL_PROACTIVE)] = "pgsteal_proactive",
+ [I(PGSTEAL_ANON)] = "pgsteal_anon",
+ [I(PGSTEAL_FILE)] = "pgsteal_file",
+ [I(PGSCAN_KSWAPD)] = "pgscan_kswapd",
+ [I(PGSCAN_DIRECT)] = "pgscan_direct",
+ [I(PGSCAN_KHUGEPAGED)] = "pgscan_khugepaged",
+ [I(PGSCAN_PROACTIVE)] = "pgscan_proactive",
+ [I(PGSCAN_ANON)] = "pgscan_anon",
+ [I(PGSCAN_FILE)] = "pgscan_file",
#ifdef CONFIG_HUGETLB_PAGE
[I(NR_HUGETLB)] = "nr_hugetlb",
#endif
@@ -1320,19 +1332,7 @@ const char * const vmstat_text[] = {
[I(PGREFILL)] = "pgrefill",
[I(PGREUSE)] = "pgreuse",
- [I(PGSTEAL_KSWAPD)] = "pgsteal_kswapd",
- [I(PGSTEAL_DIRECT)] = "pgsteal_direct",
- [I(PGSTEAL_KHUGEPAGED)] = "pgsteal_khugepaged",
- [I(PGSTEAL_PROACTIVE)] = "pgsteal_proactive",
- [I(PGSCAN_KSWAPD)] = "pgscan_kswapd",
- [I(PGSCAN_DIRECT)] = "pgscan_direct",
- [I(PGSCAN_KHUGEPAGED)] = "pgscan_khugepaged",
- [I(PGSCAN_PROACTIVE)] = "pgscan_proactive",
[I(PGSCAN_DIRECT_THROTTLE)] = "pgscan_direct_throttle",
- [I(PGSCAN_ANON)] = "pgscan_anon",
- [I(PGSCAN_FILE)] = "pgscan_file",
- [I(PGSTEAL_ANON)] = "pgsteal_anon",
- [I(PGSTEAL_FILE)] = "pgsteal_file",
#ifdef CONFIG_NUMA
[I(PGSCAN_ZONE_RECLAIM_SUCCESS)] = "zone_reclaim_success",
--
2.47.3
* Re: [PATCH v2] mm: move pgscan and pgsteal to node stats
2026-02-18 3:29 [PATCH v2] mm: move pgscan and pgsteal to node stats JP Kobryn (Meta)
@ 2026-02-18 6:44 ` Michael S. Tsirkin
2026-02-18 7:57 ` Michal Hocko
2026-02-18 8:54 ` Vlastimil Babka (SUSE)
2 siblings, 0 replies; 6+ messages in thread
From: Michael S. Tsirkin @ 2026-02-18 6:44 UTC (permalink / raw)
To: JP Kobryn (Meta)
Cc: linux-mm, mhocko, vbabka, apopple, akpm, axelrasmussen,
byungchul, cgroups, david, eperezma, gourry, jasowang, hannes,
joshua.hahnjy, Liam.Howlett, linux-kernel, lorenzo.stoakes,
matthew.brost, rppt, muchun.song, zhengqi.arch, rakie.kim,
roman.gushchin, shakeel.butt, surenb, virtualization, weixugc,
xuanzhuo, ying.huang, yuanchu, ziy, kernel-team
On Tue, Feb 17, 2026 at 07:29:41PM -0800, JP Kobryn (Meta) wrote:
> From: JP Kobryn <jp.kobryn@linux.dev>
>
> There are situations where reclaim kicks in on a system with free memory.
> One possible cause is a NUMA imbalance scenario where one or more nodes are
> under pressure. It would help if we could easily identify such nodes.
>
> Move the pgscan and pgsteal counters from vm_event_item to node_stat_item
> to provide per-node reclaim visibility. With these counters as node stats,
> the values are now displayed in the per-node section of /proc/zoneinfo,
> which allows for quick identification of the affected nodes.
>
> /proc/vmstat continues to report the same counters, aggregated across all
> nodes. But the ordering of these items within the readout changes as they
> move from the vm events section to the node stats section.
>
> Memcg accounting of these counters is preserved. The relocated counters
> remain visible in memory.stat alongside the existing aggregate pgscan and
> pgsteal counters.
>
> However, this change affects how the global counters are accumulated.
> Previously, the global event count update was gated on !cgroup_reclaim(),
> excluding memcg-based reclaim from /proc/vmstat. Now that
> mod_lruvec_state() is being used to update the counters, the global
> counters will include all reclaim. This is consistent with how pgdemote
> counters are already tracked.
>
> Finally, the virtio_balloon driver is updated to use
> global_node_page_state() to fetch the counters, as they are no longer
> accessible through the vm_events array.
>
> Signed-off-by: JP Kobryn <jp.kobryn@linux.dev>
> Suggested-by: Johannes Weiner <hannes@cmpxchg.org>
For the virtio_balloon change:
Acked-by: Michael S. Tsirkin <mst@redhat.com>
[...]
* Re: [PATCH v2] mm: move pgscan and pgsteal to node stats
2026-02-18 3:29 [PATCH v2] mm: move pgscan and pgsteal to node stats JP Kobryn (Meta)
2026-02-18 6:44 ` Michael S. Tsirkin
@ 2026-02-18 7:57 ` Michal Hocko
2026-02-18 8:54 ` Vlastimil Babka (SUSE)
2 siblings, 0 replies; 6+ messages in thread
From: Michal Hocko @ 2026-02-18 7:57 UTC (permalink / raw)
To: JP Kobryn (Meta)
Cc: linux-mm, mst, vbabka, apopple, akpm, axelrasmussen, byungchul,
cgroups, david, eperezma, gourry, jasowang, hannes,
joshua.hahnjy, Liam.Howlett, linux-kernel, lorenzo.stoakes,
matthew.brost, rppt, muchun.song, zhengqi.arch, rakie.kim,
roman.gushchin, shakeel.butt, surenb, virtualization, weixugc,
xuanzhuo, ying.huang, yuanchu, ziy, kernel-team
On Tue 17-02-26 19:29:41, JP Kobryn (Meta) wrote:
> From: JP Kobryn <jp.kobryn@linux.dev>
>
> There are situations where reclaim kicks in on a system with free memory.
> One possible cause is a NUMA imbalance scenario where one or more nodes are
> under pressure. It would help if we could easily identify such nodes.
>
> Move the pgscan and pgsteal counters from vm_event_item to node_stat_item
> to provide per-node reclaim visibility. With these counters as node stats,
> the values are now displayed in the per-node section of /proc/zoneinfo,
> which allows for quick identification of the affected nodes.
>
> /proc/vmstat continues to report the same counters, aggregated across all
> nodes. But the ordering of these items within the readout changes as they
> move from the vm events section to the node stats section.
>
> Memcg accounting of these counters is preserved. The relocated counters
> remain visible in memory.stat alongside the existing aggregate pgscan and
> pgsteal counters.
>
> However, this change affects how the global counters are accumulated.
> Previously, the global event count update was gated on !cgroup_reclaim(),
> excluding memcg-based reclaim from /proc/vmstat. Now that
> mod_lruvec_state() is being used to update the counters, the global
> counters will include all reclaim. This is consistent with how pgdemote
> counters are already tracked.
>
> Finally, the virtio_balloon driver is updated to use
> global_node_page_state() to fetch the counters, as they are no longer
> accessible through the vm_events array.
>
> Signed-off-by: JP Kobryn <jp.kobryn@linux.dev>
> Suggested-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Thanks!
[...]
--
Michal Hocko
SUSE Labs
* Re: [PATCH v2] mm: move pgscan and pgsteal to node stats
2026-02-18 3:29 [PATCH v2] mm: move pgscan and pgsteal to node stats JP Kobryn (Meta)
2026-02-18 6:44 ` Michael S. Tsirkin
2026-02-18 7:57 ` Michal Hocko
@ 2026-02-18 8:54 ` Vlastimil Babka (SUSE)
2026-02-18 18:02 ` JP Kobryn (Meta)
2 siblings, 1 reply; 6+ messages in thread
From: Vlastimil Babka (SUSE) @ 2026-02-18 8:54 UTC (permalink / raw)
To: JP Kobryn (Meta), linux-mm, mst, mhocko, vbabka
Cc: apopple, akpm, axelrasmussen, byungchul, cgroups, david,
eperezma, gourry, jasowang, hannes, joshua.hahnjy, Liam.Howlett,
linux-kernel, lorenzo.stoakes, matthew.brost, rppt, muchun.song,
zhengqi.arch, rakie.kim, roman.gushchin, shakeel.butt, surenb,
virtualization, weixugc, xuanzhuo, ying.huang, yuanchu, ziy,
kernel-team
On 2/18/26 04:29, JP Kobryn (Meta) wrote:
> From: JP Kobryn <jp.kobryn@linux.dev>
>
> There are situations where reclaim kicks in on a system with free memory.
> One possible cause is a NUMA imbalance scenario where one or more nodes are
> under pressure. It would help if we could easily identify such nodes.
>
> Move the pgscan and pgsteal counters from vm_event_item to node_stat_item
> to provide per-node reclaim visibility. With these counters as node stats,
> the values are now displayed in the per-node section of /proc/zoneinfo,
> which allows for quick identification of the affected nodes.
>
> /proc/vmstat continues to report the same counters, aggregated across all
> nodes. But the ordering of these items within the readout changes as they
> move from the vm events section to the node stats section.
>
> Memcg accounting of these counters is preserved. The relocated counters
> remain visible in memory.stat alongside the existing aggregate pgscan and
> pgsteal counters.
>
> However, this change affects how the global counters are accumulated.
> Previously, the global event count update was gated on !cgroup_reclaim(),
> excluding memcg-based reclaim from /proc/vmstat. Now that
> mod_lruvec_state() is being used to update the counters, the global
> counters will include all reclaim. This is consistent with how pgdemote
> counters are already tracked.
Hm, so that leaves PGREFILL (pages scanned on the active list) as the odd
one out, right? It is not per-node, and its global count is still gated on
!cgroup_reclaim(). Should we change it too, for full consistency?
[...]
* Re: [PATCH v2] mm: move pgscan and pgsteal to node stats
2026-02-18 8:54 ` Vlastimil Babka (SUSE)
@ 2026-02-18 18:02 ` JP Kobryn (Meta)
2026-02-18 19:53 ` JP Kobryn (Meta)
0 siblings, 1 reply; 6+ messages in thread
From: JP Kobryn (Meta) @ 2026-02-18 18:02 UTC (permalink / raw)
To: Vlastimil Babka (SUSE), linux-mm, mst, mhocko, vbabka
Cc: apopple, akpm, axelrasmussen, byungchul, cgroups, david,
eperezma, gourry, jasowang, hannes, joshua.hahnjy, Liam.Howlett,
linux-kernel, lorenzo.stoakes, matthew.brost, rppt, muchun.song,
zhengqi.arch, rakie.kim, roman.gushchin, shakeel.butt, surenb,
virtualization, weixugc, xuanzhuo, ying.huang, yuanchu, ziy,
kernel-team
On 2/18/26 12:54 AM, Vlastimil Babka (SUSE) wrote:
> On 2/18/26 04:29, JP Kobryn (Meta) wrote:
>> [...]
>> However, this change affects how the global counters are accumulated.
>> Previously, the global event count update was gated on !cgroup_reclaim(),
>> excluding memcg-based reclaim from /proc/vmstat. Now that
>> mod_lruvec_state() is being used to update the counters, the global
>> counters will include all reclaim. This is consistent with how pgdemote
>> counters are already tracked.
>
> Hm, so that leaves PGREFILL (pages scanned on the active list) as the odd
> one out, right? It is not per-node, and its global count is still gated on
> !cgroup_reclaim(). Should we change it too, for full consistency?
I'm fine with adding coverage for the active list side as well. For
completeness, I could also include PGDEACTIVATE.
* Re: [PATCH v2] mm: move pgscan and pgsteal to node stats
2026-02-18 18:02 ` JP Kobryn (Meta)
@ 2026-02-18 19:53 ` JP Kobryn (Meta)
0 siblings, 0 replies; 6+ messages in thread
From: JP Kobryn (Meta) @ 2026-02-18 19:53 UTC (permalink / raw)
To: Vlastimil Babka (SUSE), linux-mm, mst, mhocko, vbabka
Cc: apopple, akpm, axelrasmussen, byungchul, cgroups, david,
eperezma, gourry, jasowang, hannes, joshua.hahnjy, Liam.Howlett,
linux-kernel, lorenzo.stoakes, matthew.brost, rppt, muchun.song,
zhengqi.arch, rakie.kim, roman.gushchin, shakeel.butt, surenb,
virtualization, weixugc, xuanzhuo, ying.huang, yuanchu, ziy,
kernel-team
On 2/18/26 10:02 AM, JP Kobryn (Meta) wrote:
> On 2/18/26 12:54 AM, Vlastimil Babka (SUSE) wrote:
>> On 2/18/26 04:29, JP Kobryn (Meta) wrote:
>>> [...]
>>
>> Hm, so that leaves PGREFILL (pages scanned on the active list) as the odd
>> one out, right? It is not per-node, and its global count is still gated
>> on !cgroup_reclaim(). Should we change it too, for full consistency?
>
> I'm fine with adding coverage for the active list side as well. For
> completeness, I could also include PGDEACTIVATE.
Actually, I see PGDEACTIVATE is not gated, so I'll leave that one out.
I'll send v3 and include PGREFILL.
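To sketch the direction (untested, just illustrating the same pattern as
this patch), the scan_folios() site would go from:

	if (!cgroup_reclaim(sc))
		__count_vm_events(PGREFILL, sorted);
	count_memcg_events(memcg, PGREFILL, sorted);

to:

	mod_lruvec_state(lruvec, PGREFILL, sorted);

with PGREFILL moved from vm_event_item to node_stat_item like the counters
here.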