linux-mm.kvack.org archive mirror
* [PATCH mm-unstable v1 0/2] Track pages allocated for struct
@ 2024-10-31 22:45 Kinsey Ho
  2024-10-31 22:45 ` [PATCH mm-unstable v1 1/2] mm: add generic system-wide page counters Kinsey Ho
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Kinsey Ho @ 2024-10-31 22:45 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Johannes Weiner, Michal Hocko, Roman Gushchin, Shakeel Butt,
	Muchun Song, Pasha Tatashin, David Rientjes, willy,
	Vlastimil Babka, David Hildenbrand, Kinsey Ho, Joel Granados,
	Kaiyang Zhao, Sourav Panda, linux-kernel, cgroups, linux-mm

We noticed high overhead from pages allocated for struct swap_cgroup in
our fleet. This patchset adds the number of pages allocated for struct
swap_cgroup to vmstat. This can be a useful metric for identifying
unneeded overhead on systems that configure swap.

Before adding the new stat, Patch 1 introduces a generic system-wide
counting interface. 
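
As a rough sketch, the call pattern the interface is meant to support
looks like this (names are taken from the two patches below; nr_pages
is illustrative):

	/* account pages as they are allocated or freed: */
	mod_global_page_state(NR_SWAP_CGROUP_PAGES, nr_pages);
	mod_global_page_state(NR_SWAP_CGROUP_PAGES, -nr_pages);

	/* read the current value, e.g. when generating /proc/vmstat: */
	unsigned long nr = global_page_state(NR_SWAP_CGROUP_PAGES);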

Kinsey Ho (2):
  mm: add generic system-wide page counters
  mm, swap: add pages allocated for struct swap_cgroup to vmstat

 include/linux/vmstat.h | 11 +++++++++++
 mm/swap_cgroup.c       |  3 +++
 mm/vmstat.c            | 35 ++++++++++++++++++++++++++---------
 3 files changed, 40 insertions(+), 9 deletions(-)

-- 
2.47.0.163.g1226f6d8fa-goog




* [PATCH mm-unstable v1 1/2] mm: add generic system-wide page counters
  2024-10-31 22:45 [PATCH mm-unstable v1 0/2] Track pages allocated for struct Kinsey Ho
@ 2024-10-31 22:45 ` Kinsey Ho
  2024-10-31 22:45 ` [PATCH mm-unstable v1 2/2] mm, swap: add pages allocated for struct swap_cgroup to vmstat Kinsey Ho
  2024-10-31 23:06 ` [PATCH mm-unstable v1 0/2] Track pages allocated for struct Andrew Morton
  2 siblings, 0 replies; 8+ messages in thread
From: Kinsey Ho @ 2024-10-31 22:45 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Johannes Weiner, Michal Hocko, Roman Gushchin, Shakeel Butt,
	Muchun Song, Pasha Tatashin, David Rientjes, willy,
	Vlastimil Babka, David Hildenbrand, Kinsey Ho, Joel Granados,
	Kaiyang Zhao, Sourav Panda, linux-kernel, cgroups, linux-mm

Commit f4cb78af91e3 ("mm: add system wide stats items category") and
commit 9d8573111024 ("mm: don't account memmap per-node") renamed
NR_VM_WRITEBACK_STAT_ITEMS to NR_VM_STAT_ITEMS in order to track
memmap/memmap_boot pages system-wide.

Extend the implementation so that system-wide page statistics can be
tracked through a generic interface. This patch is in preparation for
the next patch, which adds a rarely modified system-wide vmstat
counter.

Note that this implementation uses global atomic fields with no
per-cpu optimizations, as the existing use case (memmap pages) is also
rarely modified.

Signed-off-by: Kinsey Ho <kinseyho@google.com>
---
 include/linux/vmstat.h |  8 ++++++++
 mm/vmstat.c            | 32 +++++++++++++++++++++++---------
 2 files changed, 31 insertions(+), 9 deletions(-)

diff --git a/include/linux/vmstat.h b/include/linux/vmstat.h
index d2761bf8ff32..ac4d42c4fabd 100644
--- a/include/linux/vmstat.h
+++ b/include/linux/vmstat.h
@@ -145,6 +145,11 @@ extern atomic_long_t vm_zone_stat[NR_VM_ZONE_STAT_ITEMS];
 extern atomic_long_t vm_node_stat[NR_VM_NODE_STAT_ITEMS];
 extern atomic_long_t vm_numa_event[NR_VM_NUMA_EVENT_ITEMS];
 
+/*
+ * Global page accounting (no per cpu differentials).
+ */
+extern atomic_long_t vm_global_stat[NR_VM_STAT_ITEMS];
+
 #ifdef CONFIG_NUMA
 static inline void zone_numa_event_add(long x, struct zone *zone,
 				enum numa_stat_item item)
@@ -491,6 +496,9 @@ static inline void node_stat_sub_folio(struct folio *folio,
 	mod_node_page_state(folio_pgdat(folio), item, -folio_nr_pages(folio));
 }
 
+void mod_global_page_state(enum vm_stat_item item, long nr);
+unsigned long global_page_state(enum vm_stat_item item);
+
 extern const char * const vmstat_text[];
 
 static inline const char *zone_stat_name(enum zone_stat_item item)
diff --git a/mm/vmstat.c b/mm/vmstat.c
index 22a294556b58..e5a6dd5106c2 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -161,9 +161,11 @@ void vm_events_fold_cpu(int cpu)
  */
 atomic_long_t vm_zone_stat[NR_VM_ZONE_STAT_ITEMS] __cacheline_aligned_in_smp;
 atomic_long_t vm_node_stat[NR_VM_NODE_STAT_ITEMS] __cacheline_aligned_in_smp;
+atomic_long_t vm_global_stat[NR_VM_STAT_ITEMS] __cacheline_aligned_in_smp;
 atomic_long_t vm_numa_event[NR_VM_NUMA_EVENT_ITEMS] __cacheline_aligned_in_smp;
 EXPORT_SYMBOL(vm_zone_stat);
 EXPORT_SYMBOL(vm_node_stat);
+EXPORT_SYMBOL(vm_global_stat);
 
 #ifdef CONFIG_NUMA
 static void fold_vm_zone_numa_events(struct zone *zone)
@@ -1033,22 +1035,34 @@ unsigned long node_page_state(struct pglist_data *pgdat,
 }
 #endif
 
+void mod_global_page_state(enum vm_stat_item item, long nr)
+{
+	atomic_long_add(nr, &vm_global_stat[item]);
+}
+
+unsigned long global_page_state(enum vm_stat_item item)
+{
+	long x = atomic_long_read(&vm_global_stat[item]);
+#ifdef CONFIG_SMP
+	if (x < 0)
+		x = 0;
+#endif
+	return x;
+}
+
 /*
  * Count number of pages "struct page" and "struct page_ext" consume.
- * nr_memmap_boot_pages: # of pages allocated by boot allocator
- * nr_memmap_pages: # of pages that were allocated by buddy allocator
+ * NR_MEMMAP_BOOT_PAGES: # of pages allocated by boot allocator
+ * NR_MEMMAP_PAGES: # of pages that were allocated by buddy allocator
  */
-static atomic_long_t nr_memmap_boot_pages = ATOMIC_LONG_INIT(0);
-static atomic_long_t nr_memmap_pages = ATOMIC_LONG_INIT(0);
-
 void memmap_boot_pages_add(long delta)
 {
-	atomic_long_add(delta, &nr_memmap_boot_pages);
+	mod_global_page_state(NR_MEMMAP_BOOT_PAGES, delta);
 }
 
 void memmap_pages_add(long delta)
 {
-	atomic_long_add(delta, &nr_memmap_pages);
+	mod_global_page_state(NR_MEMMAP_PAGES, delta);
 }
 
 #ifdef CONFIG_COMPACTION
@@ -1880,8 +1894,8 @@ static void *vmstat_start(struct seq_file *m, loff_t *pos)
 
 	global_dirty_limits(v + NR_DIRTY_BG_THRESHOLD,
 			    v + NR_DIRTY_THRESHOLD);
-	v[NR_MEMMAP_PAGES] = atomic_long_read(&nr_memmap_pages);
-	v[NR_MEMMAP_BOOT_PAGES] = atomic_long_read(&nr_memmap_boot_pages);
+	for (int i = NR_MEMMAP_PAGES; i < NR_VM_STAT_ITEMS; i++)
+		v[i] = global_page_state(i);
 	v += NR_VM_STAT_ITEMS;
 
 #ifdef CONFIG_VM_EVENT_COUNTERS
-- 
2.47.0.163.g1226f6d8fa-goog




* [PATCH mm-unstable v1 2/2] mm, swap: add pages allocated for struct swap_cgroup to vmstat
  2024-10-31 22:45 [PATCH mm-unstable v1 0/2] Track pages allocated for struct Kinsey Ho
  2024-10-31 22:45 ` [PATCH mm-unstable v1 1/2] mm: add generic system-wide page counters Kinsey Ho
@ 2024-10-31 22:45 ` Kinsey Ho
  2024-11-04 16:22   ` Michal Koutný
  2024-10-31 23:06 ` [PATCH mm-unstable v1 0/2] Track pages allocated for struct Andrew Morton
  2 siblings, 1 reply; 8+ messages in thread
From: Kinsey Ho @ 2024-10-31 22:45 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Johannes Weiner, Michal Hocko, Roman Gushchin, Shakeel Butt,
	Muchun Song, Pasha Tatashin, David Rientjes, willy,
	Vlastimil Babka, David Hildenbrand, Kinsey Ho, Joel Granados,
	Kaiyang Zhao, Sourav Panda, linux-kernel, cgroups, linux-mm

Export to vmstat the number of pages allocated for storing struct
swap_cgroup, using the global system-wide counters.

Signed-off-by: Kinsey Ho <kinseyho@google.com>
---
 include/linux/vmstat.h | 3 +++
 mm/swap_cgroup.c       | 3 +++
 mm/vmstat.c            | 3 +++
 3 files changed, 9 insertions(+)

diff --git a/include/linux/vmstat.h b/include/linux/vmstat.h
index ac4d42c4fabd..227e951d1219 100644
--- a/include/linux/vmstat.h
+++ b/include/linux/vmstat.h
@@ -41,6 +41,9 @@ enum vm_stat_item {
 	NR_DIRTY_BG_THRESHOLD,
 	NR_MEMMAP_PAGES,	/* page metadata allocated through buddy allocator */
 	NR_MEMMAP_BOOT_PAGES,	/* page metadata allocated through boot allocator */
+#if defined(CONFIG_MEMCG) && defined(CONFIG_SWAP)
+	NR_SWAP_CGROUP_PAGES,	/* allocated to store struct swap_cgroup */
+#endif
 	NR_VM_STAT_ITEMS,
 };
 
diff --git a/mm/swap_cgroup.c b/mm/swap_cgroup.c
index da1278f0563b..82eda8a3efe1 100644
--- a/mm/swap_cgroup.c
+++ b/mm/swap_cgroup.c
@@ -53,6 +53,8 @@ static int swap_cgroup_prepare(int type)
 		if (!(idx % SWAP_CLUSTER_MAX))
 			cond_resched();
 	}
+	mod_global_page_state(NR_SWAP_CGROUP_PAGES, ctrl->length);
+
 	return 0;
 not_enough_page:
 	max = idx;
@@ -228,6 +230,7 @@ void swap_cgroup_swapoff(int type)
 			if (!(i % SWAP_CLUSTER_MAX))
 				cond_resched();
 		}
+		mod_global_page_state(NR_SWAP_CGROUP_PAGES, -length);
 		vfree(map);
 	}
 }
diff --git a/mm/vmstat.c b/mm/vmstat.c
index e5a6dd5106c2..259574261ec1 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -1295,6 +1295,9 @@ const char * const vmstat_text[] = {
 	"nr_dirty_background_threshold",
 	"nr_memmap_pages",
 	"nr_memmap_boot_pages",
+#if defined(CONFIG_MEMCG) && defined(CONFIG_SWAP)
+	"nr_swap_cgroup_pages",
+#endif
 
 #if defined(CONFIG_VM_EVENT_COUNTERS) || defined(CONFIG_MEMCG)
 	/* enum vm_event_item counters */
-- 
2.47.0.163.g1226f6d8fa-goog




* Re: [PATCH mm-unstable v1 0/2] Track pages allocated for struct
  2024-10-31 22:45 [PATCH mm-unstable v1 0/2] Track pages allocated for struct Kinsey Ho
  2024-10-31 22:45 ` [PATCH mm-unstable v1 1/2] mm: add generic system-wide page counters Kinsey Ho
  2024-10-31 22:45 ` [PATCH mm-unstable v1 2/2] mm, swap: add pages allocated for struct swap_cgroup to vmstat Kinsey Ho
@ 2024-10-31 23:06 ` Andrew Morton
  2024-11-01 15:54   ` Matthew Wilcox
                     ` (2 more replies)
  2 siblings, 3 replies; 8+ messages in thread
From: Andrew Morton @ 2024-10-31 23:06 UTC (permalink / raw)
  To: Kinsey Ho
  Cc: Johannes Weiner, Michal Hocko, Roman Gushchin, Shakeel Butt,
	Muchun Song, Pasha Tatashin, David Rientjes, willy,
	Vlastimil Babka, David Hildenbrand, Joel Granados, Kaiyang Zhao,
	Sourav Panda, linux-kernel, cgroups, linux-mm

hm.

On Thu, 31 Oct 2024 22:45:49 +0000 Kinsey Ho <kinseyho@google.com> wrote:

> We noticed high overhead for pages allocated for struct swap_cgroup in
> our fleet.

This is scanty.  Please describe the problem further.

> This patchset adds the number of pages allocated for struct
> swap_cgroup to vmstat. This can be a useful metric for identifying
> unneeded overhead on systems which configure swap.

Possibly dumb question: can we switch swap_cgroup_prepare to kmalloc()
(or kmem_cache_alloc()) and use slab's accounting to satisfy this
requirement?

> Before adding the new stat, Patch 1 introduces a generic system-wide
> counting interface. 

I don't really understand this one and would like to hear much more
about the motivation.

And: "the existing use case" is OK with a global counter, but what about
future use cases?

And: what are the future use cases?

Thanks.



* Re: [PATCH mm-unstable v1 0/2] Track pages allocated for struct
  2024-10-31 23:06 ` [PATCH mm-unstable v1 0/2] Track pages allocated for struct Andrew Morton
@ 2024-11-01 15:54   ` Matthew Wilcox
  2024-11-13  0:19   ` Kinsey Ho
  2024-11-13 19:18   ` Roman Gushchin
  2 siblings, 0 replies; 8+ messages in thread
From: Matthew Wilcox @ 2024-11-01 15:54 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Kinsey Ho, Johannes Weiner, Michal Hocko, Roman Gushchin,
	Shakeel Butt, Muchun Song, Pasha Tatashin, David Rientjes,
	Vlastimil Babka, David Hildenbrand, Joel Granados, Kaiyang Zhao,
	Sourav Panda, linux-kernel, cgroups, linux-mm

On Thu, Oct 31, 2024 at 04:06:04PM -0700, Andrew Morton wrote:
> Possibly dumb question: can we switch swap_cgroup_prepare to kmalloc()
> (or kmem_cache_alloc()) and use slab's accounting to satisfy this
> requirement?

It looks to me like a bad reimplementation of vmalloc().  This code
seems to have made more sense once upon a time, but now it's really
just vmalloc().  Or did I miss something?



* Re: [PATCH mm-unstable v1 2/2] mm, swap: add pages allocated for struct swap_cgroup to vmstat
  2024-10-31 22:45 ` [PATCH mm-unstable v1 2/2] mm, swap: add pages allocated for struct swap_cgroup to vmstat Kinsey Ho
@ 2024-11-04 16:22   ` Michal Koutný
  0 siblings, 0 replies; 8+ messages in thread
From: Michal Koutný @ 2024-11-04 16:22 UTC (permalink / raw)
  To: Kinsey Ho
  Cc: Andrew Morton, Johannes Weiner, Michal Hocko, Roman Gushchin,
	Shakeel Butt, Muchun Song, Pasha Tatashin, David Rientjes, willy,
	Vlastimil Babka, David Hildenbrand, Joel Granados, Kaiyang Zhao,
	Sourav Panda, linux-kernel, cgroups, linux-mm


Hello Kinsey.

On Thu, Oct 31, 2024 at 10:45:51PM GMT, Kinsey Ho <kinseyho@google.com> wrote:
> Export the number of pages allocated for storing struct swap_cgroup in
> vmstat using global system-wide counters.

This consumption is quite static (it only changes between swapon/swapoff
switches). The resulting value can be calculated as a linear combination
of entries in /proc/swaps already (if you know the right coefficients).

I'm not sure this warrants the new entry (or is my assumption about
static-ness wrong?)
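
For concreteness, one way to write that linear combination (a sketch,
assuming 4 KiB pages and the current 2-byte struct swap_cgroup, i.e.
one unsigned short per swap slot):

	/* per-device pages backing struct swap_cgroup, computed from
	 * the Size column (KiB) of /proc/swaps: */
	static unsigned long swap_cgroup_pages(unsigned long size_kb)
	{
		unsigned long slots = size_kb / 4;	/* 4 KiB swap slots */
		unsigned long sc_per_page = 4096 / 2;	/* SC_PER_PAGE */

		/* DIV_ROUND_UP(slots, sc_per_page) */
		return (slots + sc_per_page - 1) / sc_per_page;
	}

	/* e.g. one 32 GiB swap file: swap_cgroup_pages(32UL << 20)
	 * == 4096 pages, i.e. 16 MiB of struct swap_cgroup data. */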

Michal



* Re: [PATCH mm-unstable v1 0/2] Track pages allocated for struct
  2024-10-31 23:06 ` [PATCH mm-unstable v1 0/2] Track pages allocated for struct Andrew Morton
  2024-11-01 15:54   ` Matthew Wilcox
@ 2024-11-13  0:19   ` Kinsey Ho
  2024-11-13 19:18   ` Roman Gushchin
  2 siblings, 0 replies; 8+ messages in thread
From: Kinsey Ho @ 2024-11-13  0:19 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Johannes Weiner, Michal Hocko, Roman Gushchin, Shakeel Butt,
	Muchun Song, Pasha Tatashin, David Rientjes, willy,
	Vlastimil Babka, David Hildenbrand, Joel Granados, Kaiyang Zhao,
	Sourav Panda, linux-kernel, cgroups, linux-mm

Hi Andrew,

Thank you for the review and comments!

On Fri, Nov 1, 2024 at 6:57 AM Andrew Morton <akpm@linux-foundation.org> wrote:
>
> hm.
>
> On Thu, 31 Oct 2024 22:45:49 +0000 Kinsey Ho <kinseyho@google.com> wrote:
>
> > We noticed high overhead for pages allocated for struct swap_cgroup in
> > our fleet.
>
> This is scanty.  Please describe the problem further.

In our fleet, we had machines with multiple large swap files
configured, and we noticed that we hadn't accounted for the overhead
from the pages allocated for struct swap_cgroup. In some cases, we saw
a couple GiB of overhead from these pages, so this patchset's goal is
to expose this overhead value for easier detection.

> And: "the existing use case" is OK with a global counter, but what about
> future use cases?
>
> And: what are the future use cases?

As global counting already exists for the memmap/memmap_boot pages,
the generic global counter interface was introduced to consolidate
that counting code while adding another use case. However, since the
number of pages allocated for struct swap_cgroup can be derived from
/proc/swaps, a new vmstat entry doesn't seem warranted. We've decided
to drop this patchset. Thanks again!



* Re: [PATCH mm-unstable v1 0/2] Track pages allocated for struct
  2024-10-31 23:06 ` [PATCH mm-unstable v1 0/2] Track pages allocated for struct Andrew Morton
  2024-11-01 15:54   ` Matthew Wilcox
  2024-11-13  0:19   ` Kinsey Ho
@ 2024-11-13 19:18   ` Roman Gushchin
  2 siblings, 0 replies; 8+ messages in thread
From: Roman Gushchin @ 2024-11-13 19:18 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Kinsey Ho, Johannes Weiner, Michal Hocko, Shakeel Butt,
	Muchun Song, Pasha Tatashin, David Rientjes, willy,
	Vlastimil Babka, David Hildenbrand, Joel Granados, Kaiyang Zhao,
	Sourav Panda, linux-kernel, cgroups, linux-mm

On Thu, Oct 31, 2024 at 04:06:04PM -0700, Andrew Morton wrote:
> hm.
> 
> On Thu, 31 Oct 2024 22:45:49 +0000 Kinsey Ho <kinseyho@google.com> wrote:
> 
> > We noticed high overhead for pages allocated for struct swap_cgroup in
> > our fleet.
> 
> This is scanty.  Please describe the problem further.
> 
> > This patchset adds the number of pages allocated for struct
> > swap_cgroup to vmstat. This can be a useful metric for identifying
> > unneeded overhead on systems which configure swap.
> 
> Possibly dumb question: can we switch swap_cgroup_prepare to kmalloc()
> (or kmem_cache_alloc()) and use slab's accounting to satisfy this
> requirement?

Or vzalloc()/kvmalloc(), which are used for allocating the rest of the
swap-related metadata.
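
A minimal sketch of that direction (hypothetical -- no such patch
exists in this thread, and the field layout is illustrative):

	/* one virtually contiguous, zeroed array with one entry per
	 * swap slot, instead of the hand-rolled page table: */
	static int swap_cgroup_prepare(struct swap_cgroup_ctrl *ctrl,
				       unsigned long maxpages)
	{
		ctrl->map = vzalloc(array_size(maxpages,
					       sizeof(struct swap_cgroup)));
		if (!ctrl->map)
			return -ENOMEM;
		ctrl->length = maxpages;
		return 0;
	}

vzalloc() memory also shows up in /proc/meminfo's VmallocUsed, which
would give some visibility without adding a dedicated counter.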



end of thread

Thread overview: 8+ messages
2024-10-31 22:45 [PATCH mm-unstable v1 0/2] Track pages allocated for struct Kinsey Ho
2024-10-31 22:45 ` [PATCH mm-unstable v1 1/2] mm: add generic system-wide page counters Kinsey Ho
2024-10-31 22:45 ` [PATCH mm-unstable v1 2/2] mm, swap: add pages allocated for struct swap_cgroup to vmstat Kinsey Ho
2024-11-04 16:22   ` Michal Koutný
2024-10-31 23:06 ` [PATCH mm-unstable v1 0/2] Track pages allocated for struct Andrew Morton
2024-11-01 15:54   ` Matthew Wilcox
2024-11-13  0:19   ` Kinsey Ho
2024-11-13 19:18   ` Roman Gushchin
