* [PATCH linux-next v3 1/6] memcg: add per-memcg ksm_rmap_items stat
2025-09-21 15:07 [PATCH linux-next v3 0/6] memcg: Support per-memcg KSM metrics xu.xin16
@ 2025-09-21 15:08 ` xu.xin16
2025-09-23 8:28 ` David Hildenbrand
2025-09-21 15:11 ` [PATCH linux-next v3 2/6] memcg: show ksm_zero_pages count in memory.stat xu.xin16
` (7 subsequent siblings)
8 siblings, 1 reply; 14+ messages in thread
From: xu.xin16 @ 2025-09-21 15:08 UTC (permalink / raw)
To: xu.xin16, akpm
Cc: shakeel.butt, hannes, mhocko, roman.gushchin, david,
chengming.zhou, muchun.song, linux-kernel, linux-mm, cgroups
From: xu xin <xu.xin16@zte.com.cn>
With the enablement of container-level KSM (e.g., via prctl), there is
a growing demand for container-level observability of KSM behavior.
The value of "ksm_rmap_items" indicates the total allocated ksm
rmap_items of this memcg, which could be used to determine how
unbeneficial the ksm-policy (like madvise), they are using brings,
since the bigger the ratio of ksm_rmap_items over ksm_merging_pages,
the more unbeneficial the ksm bring.
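As a rough worked example (assuming 4 KiB pages and a ~64-byte struct
ksm_rmap_item on a 64-bit kernel): a memcg with ksm_rmap_items = 10000 but
ksm_merging_pages = 100 pays about 625 KiB in rmap_item metadata to save
about 400 KiB of page memory, a net loss, whereas the same metadata cost
against 10000 merged pages would save roughly 40 MiB.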
Add the counter in the existing memory.stat without adding a new interface.
We traverse all processes of the memcg and sum the processes'
ksm_rmap_items counters, instead of adding an enum item in
memcg_stat_item or node_stat_item and updating the corresponding
counter whenever ksmd manipulates pages.
Finally, the per-memcg ksm_rmap_items value can be looked up simply by:
cat /sys/fs/cgroup/memory.stat | grep ksm_rmap_items
Signed-off-by: xu xin <xu.xin16@zte.com.cn>
---
include/linux/ksm.h | 1 +
mm/ksm.c | 39 +++++++++++++++++++++++++++++++++++++++
mm/memcontrol.c | 5 +++++
3 files changed, 45 insertions(+)
diff --git a/include/linux/ksm.h b/include/linux/ksm.h
index 067538fc4d58..ce2a32b73f95 100644
--- a/include/linux/ksm.h
+++ b/include/linux/ksm.h
@@ -100,6 +100,7 @@ void collect_procs_ksm(const struct folio *folio, const struct page *page,
struct list_head *to_kill, int force_early);
long ksm_process_profit(struct mm_struct *);
bool ksm_process_mergeable(struct mm_struct *mm);
+void memcg_stat_ksm_show(struct mem_cgroup *memcg, struct seq_buf *s);
#else /* !CONFIG_KSM */
diff --git a/mm/ksm.c b/mm/ksm.c
index 2ef29802a49b..be0efa0f8f2b 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -40,6 +40,7 @@
#include <linux/oom.h>
#include <linux/numa.h>
#include <linux/pagewalk.h>
+#include <linux/seq_buf.h>
#include <asm/tlbflush.h>
#include "internal.h"
@@ -3308,6 +3309,44 @@ long ksm_process_profit(struct mm_struct *mm)
}
#endif /* CONFIG_PROC_FS */
+#ifdef CONFIG_MEMCG
+struct memcg_ksm_stat {
+ unsigned long ksm_rmap_items;
+};
+
+static int evaluate_memcg_ksm_stat(struct task_struct *task, void *arg)
+{
+ struct mm_struct *mm;
+ struct memcg_ksm_stat *ksm_stat = arg;
+
+ mm = get_task_mm(task);
+ if (mm) {
+ ksm_stat->ksm_rmap_items += mm->ksm_rmap_items;
+ mmput(mm);
+ }
+
+ return 0;
+}
+
+/* Show the ksm statistic count at memory.stat under cgroup mountpoint */
+void memcg_stat_ksm_show(struct mem_cgroup *memcg, struct seq_buf *s)
+{
+ struct memcg_ksm_stat ksm_stat;
+
+ if (mem_cgroup_is_root(memcg)) {
+ /* Just use the global counters when root memcg */
+ ksm_stat.ksm_rmap_items = ksm_rmap_items;
+ } else {
+ /* Initialization */
+ ksm_stat.ksm_rmap_items = 0;
+ /* Summing all processes' ksm statistic items */
+ mem_cgroup_scan_tasks(memcg, evaluate_memcg_ksm_stat, &ksm_stat);
+ }
+ /* Print memcg ksm statistic items */
+ seq_buf_printf(s, "ksm_rmap_items %lu\n", ksm_stat.ksm_rmap_items);
+}
+#endif
+
#ifdef CONFIG_SYSFS
/*
* This all compiles without CONFIG_SYSFS, but is a waste of space.
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index e090f29eb03b..705717f73b89 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -63,6 +63,7 @@
#include <linux/seq_buf.h>
#include <linux/sched/isolation.h>
#include <linux/kmemleak.h>
+#include <linux/ksm.h>
#include "internal.h"
#include <net/sock.h>
#include <net/ip.h>
@@ -1493,6 +1494,10 @@ static void memcg_stat_format(struct mem_cgroup *memcg, struct seq_buf *s)
}
}
+#ifdef CONFIG_KSM
+ memcg_stat_ksm_show(memcg, s);
+#endif
+
/* Accumulated memory events */
seq_buf_printf(s, "pgscan %lu\n",
memcg_events(memcg, PGSCAN_KSWAPD) +
--
2.25.1
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH linux-next v3 1/6] memcg: add per-memcg ksm_rmap_items stat
2025-09-21 15:08 ` [PATCH linux-next v3 1/6] memcg: add per-memcg ksm_rmap_items stat xu.xin16
@ 2025-09-23 8:28 ` David Hildenbrand
0 siblings, 0 replies; 14+ messages in thread
From: David Hildenbrand @ 2025-09-23 8:28 UTC (permalink / raw)
To: xu.xin16, akpm
Cc: shakeel.butt, hannes, mhocko, roman.gushchin, chengming.zhou,
muchun.song, linux-kernel, linux-mm, cgroups
> +
> +/* Show the ksm statistic count at memory.stat under cgroup mountpoint */
> +void memcg_stat_ksm_show(struct mem_cgroup *memcg, struct seq_buf *s)
> +{
> + struct memcg_ksm_stat ksm_stat;
> +
> + if (mem_cgroup_is_root(memcg)) {
> + /* Just use the global counters when root memcg */
> + ksm_stat.ksm_rmap_items = ksm_rmap_items;
> + } else {
> + /* Initialization */
> + ksm_stat.ksm_rmap_items = 0;
> + /* Summing all processes' ksm statistic items */
> + mem_cgroup_scan_tasks(memcg, evaluate_memcg_ksm_stat, &ksm_stat);
What happens here if you have two tasks that share the same MM (CLONE_VM
without CLONE_THREAD)?
Wouldn't we end up counting the same MM multiple times?
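One way to avoid such double counting would be to attribute each mm to a
single task, e.g. its mm->owner (which exists when CONFIG_MEMCG is
enabled). A minimal sketch of that idea, not taken from the posted
series:

static int evaluate_memcg_ksm_stat(struct task_struct *task, void *arg)
{
	struct memcg_ksm_stat *ksm_stat = arg;
	struct mm_struct *mm = get_task_mm(task);

	if (!mm)
		return 0;
	/* Account each mm only once, via the task that owns it. */
	if (rcu_access_pointer(mm->owner) == task)
		ksm_stat->ksm_rmap_items += mm->ksm_rmap_items;
	mmput(mm);
	return 0;
}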
--
Cheers
David / dhildenb
^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH linux-next v3 2/6] memcg: show ksm_zero_pages count in memory.stat
2025-09-21 15:07 [PATCH linux-next v3 0/6] memcg: Support per-memcg KSM metrics xu.xin16
2025-09-21 15:08 ` [PATCH linux-next v3 1/6] memcg: add per-memcg ksm_rmap_items stat xu.xin16
@ 2025-09-21 15:11 ` xu.xin16
2025-09-21 15:12 ` [PATCH linux-next v3 3/6] memcg: show ksm_merging_pages xu.xin16
` (6 subsequent siblings)
8 siblings, 0 replies; 14+ messages in thread
From: xu.xin16 @ 2025-09-21 15:11 UTC (permalink / raw)
To: xu.xin16, akpm
Cc: shakeel.butt, hannes, mhocko, roman.gushchin, david,
chengming.zhou, muchun.song, linux-kernel, linux-mm, cgroups
From: xu xin <xu.xin16@zte.com.cn>
Users can obtain ksm_zero_pages of a cgroup just by:
cat /sys/fs/cgroup/memory.stat | grep ksm_zero_pages
Signed-off-by: xu xin <xu.xin16@zte.com.cn>
---
mm/ksm.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/mm/ksm.c b/mm/ksm.c
index be0efa0f8f2b..2fb4198458a4 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -3312,6 +3312,7 @@ long ksm_process_profit(struct mm_struct *mm)
#ifdef CONFIG_MEMCG
struct memcg_ksm_stat {
unsigned long ksm_rmap_items;
+ long ksm_zero_pages;
};
static int evaluate_memcg_ksm_stat(struct task_struct *task, void *arg)
@@ -3322,6 +3323,7 @@ static int evaluate_memcg_ksm_stat(struct task_struct *task, void *arg)
mm = get_task_mm(task);
if (mm) {
ksm_stat->ksm_rmap_items += mm->ksm_rmap_items;
+ ksm_stat->ksm_zero_pages += mm_ksm_zero_pages(mm);
mmput(mm);
}
@@ -3336,14 +3338,17 @@ void memcg_stat_ksm_show(struct mem_cgroup *memcg, struct seq_buf *s)
if (mem_cgroup_is_root(memcg)) {
/* Just use the global counters when root memcg */
ksm_stat.ksm_rmap_items = ksm_rmap_items;
+ ksm_stat.ksm_zero_pages = atomic_long_read(&ksm_zero_pages);
} else {
/* Initialization */
ksm_stat.ksm_rmap_items = 0;
+ ksm_stat.ksm_zero_pages = 0;
/* Summing all processes' ksm statistic items */
mem_cgroup_scan_tasks(memcg, evaluate_memcg_ksm_stat, &ksm_stat);
}
/* Print memcg ksm statistic items */
seq_buf_printf(s, "ksm_rmap_items %lu\n", ksm_stat.ksm_rmap_items);
+ seq_buf_printf(s, "ksm_zero_pages %lu\n", ksm_stat.ksm_zero_pages);
}
#endif
--
2.25.1
^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH linux-next v3 3/6] memcg: show ksm_merging_pages
2025-09-21 15:07 [PATCH linux-next v3 0/6] memcg: Support per-memcg KSM metrics xu.xin16
2025-09-21 15:08 ` [PATCH linux-next v3 1/6] memcg: add per-memcg ksm_rmap_items stat xu.xin16
2025-09-21 15:11 ` [PATCH linux-next v3 2/6] memcg: show ksm_zero_pages count in memory.stat xu.xin16
@ 2025-09-21 15:12 ` xu.xin16
2025-09-21 15:13 ` [PATCH linux-next v3 4/6] ksm: make ksm_process_profit available on CONFIG_PROCFS=n xu.xin16
` (5 subsequent siblings)
8 siblings, 0 replies; 14+ messages in thread
From: xu.xin16 @ 2025-09-21 15:12 UTC (permalink / raw)
To: xu.xin16, akpm
Cc: shakeel.butt, hannes, mhocko, roman.gushchin, david,
chengming.zhou, muchun.song, linux-kernel, linux-mm, cgroups
From: xu xin <xu.xin16@zte.com.cn>
Users can obtain ksm_merging_pages of a cgroup just by:
cat /sys/fs/cgroup/memory.stat | grep ksm_merging_pages
Signed-off-by: xu xin <xu.xin16@zte.com.cn>
---
mm/ksm.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/mm/ksm.c b/mm/ksm.c
index 2fb4198458a4..e49f4b86ffb0 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -3313,6 +3313,7 @@ long ksm_process_profit(struct mm_struct *mm)
struct memcg_ksm_stat {
unsigned long ksm_rmap_items;
long ksm_zero_pages;
+ unsigned long ksm_merging_pages;
};
static int evaluate_memcg_ksm_stat(struct task_struct *task, void *arg)
@@ -3324,6 +3325,7 @@ static int evaluate_memcg_ksm_stat(struct task_struct *task, void *arg)
if (mm) {
ksm_stat->ksm_rmap_items += mm->ksm_rmap_items;
ksm_stat->ksm_zero_pages += mm_ksm_zero_pages(mm);
+ ksm_stat->ksm_merging_pages += mm->ksm_merging_pages;
mmput(mm);
}
@@ -3339,16 +3341,20 @@ void memcg_stat_ksm_show(struct mem_cgroup *memcg, struct seq_buf *s)
/* Just use the global counters when root memcg */
ksm_stat.ksm_rmap_items = ksm_rmap_items;
ksm_stat.ksm_zero_pages = atomic_long_read(&ksm_zero_pages);
+ ksm_stat.ksm_merging_pages = ksm_pages_shared +
+ ksm_pages_sharing;
} else {
/* Initialization */
ksm_stat.ksm_rmap_items = 0;
ksm_stat.ksm_zero_pages = 0;
+ ksm_stat.ksm_merging_pages = 0;
/* Summing all processes' ksm statistic items */
mem_cgroup_scan_tasks(memcg, evaluate_memcg_ksm_stat, &ksm_stat);
}
/* Print memcg ksm statistic items */
seq_buf_printf(s, "ksm_rmap_items %lu\n", ksm_stat.ksm_rmap_items);
seq_buf_printf(s, "ksm_zero_pages %lu\n", ksm_stat.ksm_zero_pages);
+ seq_buf_printf(s, "ksm_merging_pages %lu\n", ksm_stat.ksm_merging_pages);
}
#endif
--
2.25.1
^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH linux-next v3 4/6] ksm: make ksm_process_profit available on CONFIG_PROCFS=n
2025-09-21 15:07 [PATCH linux-next v3 0/6] memcg: Support per-memcg KSM metrics xu.xin16
` (2 preceding siblings ...)
2025-09-21 15:12 ` [PATCH linux-next v3 3/6] memcg: show ksm_merging_pages xu.xin16
@ 2025-09-21 15:13 ` xu.xin16
2025-09-23 8:25 ` David Hildenbrand
2025-09-21 15:14 ` [PATCH linux-next v3 5/6] memcg: add per-memcg ksm_profit xu.xin16
` (4 subsequent siblings)
8 siblings, 1 reply; 14+ messages in thread
From: xu.xin16 @ 2025-09-21 15:13 UTC (permalink / raw)
To: xu.xin16, akpm
Cc: shakeel.butt, hannes, mhocko, roman.gushchin, david,
chengming.zhou, muchun.song, linux-kernel, linux-mm, cgroups
From: xu xin <xu.xin16@zte.com.cn>
This remove the restriction CONFIG_PROCFS=y for the heler function
ksm_process_profit(), then we can use it for the later patches on
CONFIG_PROCFS=n.
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/all/202509142046.QatEaTQV-lkp@intel.com/
Signed-off-by: xu xin <xu.xin16@zte.com.cn>
---
mm/ksm.c | 2 --
1 file changed, 2 deletions(-)
diff --git a/mm/ksm.c b/mm/ksm.c
index e49f4b86ffb0..a68d4b37b503 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -3282,7 +3282,6 @@ static void wait_while_offlining(void)
}
#endif /* CONFIG_MEMORY_HOTREMOVE */
-#ifdef CONFIG_PROC_FS
/*
* The process is mergeable only if any VMA is currently
* applicable to KSM.
@@ -3307,7 +3306,6 @@ long ksm_process_profit(struct mm_struct *mm)
return (long)(mm->ksm_merging_pages + mm_ksm_zero_pages(mm)) * PAGE_SIZE -
mm->ksm_rmap_items * sizeof(struct ksm_rmap_item);
}
-#endif /* CONFIG_PROC_FS */
#ifdef CONFIG_MEMCG
struct memcg_ksm_stat {
--
2.25.1
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH linux-next v3 4/6] ksm: make ksm_process_profit available on CONFIG_PROCFS=n
2025-09-21 15:13 ` [PATCH linux-next v3 4/6] ksm: make ksm_process_profit available on CONFIG_PROCFS=n xu.xin16
@ 2025-09-23 8:25 ` David Hildenbrand
0 siblings, 0 replies; 14+ messages in thread
From: David Hildenbrand @ 2025-09-23 8:25 UTC (permalink / raw)
To: xu.xin16, akpm
Cc: shakeel.butt, hannes, mhocko, roman.gushchin, chengming.zhou,
muchun.song, linux-kernel, linux-mm, cgroups
On 21.09.25 17:13, xu.xin16@zte.com.cn wrote:
> From: xu xin <xu.xin16@zte.com.cn>
>
> This remove the restriction CONFIG_PROCFS=y for the heler function
s/heler/helper.
> ksm_process_profit(), then we can use it for the later patches on
> CONFIG_PROCFS=n.
Better to something like this:
"Let's provide ksm_process_profit() also without CONFIG_PROCFS so we can
use it from memcg code next."
?
>
> Reported-by: kernel test robot <lkp@intel.com>
> Closes: https://lore.kernel.org/all/202509142046.QatEaTQV-lkp@intel.com/
Both tags should be dropped as there is nothing fixed here.
> Signed-off-by: xu xin <xu.xin16@zte.com.cn>
> ---
> mm/ksm.c | 2 --
> 1 file changed, 2 deletions(-)
>
> diff --git a/mm/ksm.c b/mm/ksm.c
> index e49f4b86ffb0..a68d4b37b503 100644
> --- a/mm/ksm.c
> +++ b/mm/ksm.c
> @@ -3282,7 +3282,6 @@ static void wait_while_offlining(void)
> }
> #endif /* CONFIG_MEMORY_HOTREMOVE */
>
> -#ifdef CONFIG_PROC_FS
> /*
> * The process is mergeable only if any VMA is currently
> * applicable to KSM.
> @@ -3307,7 +3306,6 @@ long ksm_process_profit(struct mm_struct *mm)
> return (long)(mm->ksm_merging_pages + mm_ksm_zero_pages(mm)) * PAGE_SIZE -
> mm->ksm_rmap_items * sizeof(struct ksm_rmap_item);
> }
> -#endif /* CONFIG_PROC_FS */
>
> #ifdef CONFIG_MEMCG
> struct memcg_ksm_stat {
--
Cheers
David / dhildenb
^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH linux-next v3 5/6] memcg: add per-memcg ksm_profit
2025-09-21 15:07 [PATCH linux-next v3 0/6] memcg: Support per-memcg KSM metrics xu.xin16
` (3 preceding siblings ...)
2025-09-21 15:13 ` [PATCH linux-next v3 4/6] ksm: make ksm_process_profit available on CONFIG_PROCFS=n xu.xin16
@ 2025-09-21 15:14 ` xu.xin16
2025-09-21 15:15 ` [PATCH linux-next v3 6/6] Documentation: add KSM statistic counters description xu.xin16
` (3 subsequent siblings)
8 siblings, 0 replies; 14+ messages in thread
From: xu.xin16 @ 2025-09-21 15:14 UTC (permalink / raw)
To: xu.xin16, akpm
Cc: shakeel.butt, hannes, mhocko, roman.gushchin, david,
chengming.zhou, muchun.song, linux-kernel, linux-mm, cgroups
From: xu xin <xu.xin16@zte.com.cn>
Users can obtain ksm_profit of a cgroup just by:
cat /sys/fs/cgroup/memory.stat | grep ksm_profit
Signed-off-by: xu xin <xu.xin16@zte.com.cn>
---
mm/ksm.c | 18 ++++++++++++------
1 file changed, 12 insertions(+), 6 deletions(-)
diff --git a/mm/ksm.c b/mm/ksm.c
index a68d4b37b503..55329398797f 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -3307,11 +3307,18 @@ long ksm_process_profit(struct mm_struct *mm)
mm->ksm_rmap_items * sizeof(struct ksm_rmap_item);
}
+static inline long ksm_general_profit(void)
+{
+ return (ksm_pages_sharing + atomic_long_read(&ksm_zero_pages)) * PAGE_SIZE -
+ ksm_rmap_items * sizeof(struct ksm_rmap_item);
+}
+
#ifdef CONFIG_MEMCG
struct memcg_ksm_stat {
unsigned long ksm_rmap_items;
long ksm_zero_pages;
unsigned long ksm_merging_pages;
+ long ksm_profit;
};
static int evaluate_memcg_ksm_stat(struct task_struct *task, void *arg)
@@ -3324,6 +3331,7 @@ static int evaluate_memcg_ksm_stat(struct task_struct *task, void *arg)
ksm_stat->ksm_rmap_items += mm->ksm_rmap_items;
ksm_stat->ksm_zero_pages += mm_ksm_zero_pages(mm);
ksm_stat->ksm_merging_pages += mm->ksm_merging_pages;
+ ksm_stat->ksm_profit += ksm_process_profit(mm);
mmput(mm);
}
@@ -3341,11 +3349,13 @@ void memcg_stat_ksm_show(struct mem_cgroup *memcg, struct seq_buf *s)
ksm_stat.ksm_zero_pages = atomic_long_read(&ksm_zero_pages);
ksm_stat.ksm_merging_pages = ksm_pages_shared +
ksm_pages_sharing;
+ ksm_stat.ksm_profit = ksm_general_profit();
} else {
/* Initialization */
ksm_stat.ksm_rmap_items = 0;
ksm_stat.ksm_zero_pages = 0;
ksm_stat.ksm_merging_pages = 0;
+ ksm_stat.ksm_profit = 0;
/* Summing all processes' ksm statistic items */
mem_cgroup_scan_tasks(memcg, evaluate_memcg_ksm_stat, &ksm_stat);
}
@@ -3353,6 +3363,7 @@ void memcg_stat_ksm_show(struct mem_cgroup *memcg, struct seq_buf *s)
seq_buf_printf(s, "ksm_rmap_items %lu\n", ksm_stat.ksm_rmap_items);
seq_buf_printf(s, "ksm_zero_pages %lu\n", ksm_stat.ksm_zero_pages);
seq_buf_printf(s, "ksm_merging_pages %lu\n", ksm_stat.ksm_merging_pages);
+ seq_buf_printf(s, "ksm_profit %lu\n", ksm_stat.ksm_profit);
}
#endif
@@ -3647,12 +3658,7 @@ KSM_ATTR_RO(ksm_zero_pages);
static ssize_t general_profit_show(struct kobject *kobj,
struct kobj_attribute *attr, char *buf)
{
- long general_profit;
-
- general_profit = (ksm_pages_sharing + atomic_long_read(&ksm_zero_pages)) * PAGE_SIZE -
- ksm_rmap_items * sizeof(struct ksm_rmap_item);
-
- return sysfs_emit(buf, "%ld\n", general_profit);
+ return sysfs_emit(buf, "%ld\n", ksm_general_profit());
}
KSM_ATTR_RO(general_profit);
--
2.25.1
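For a sense of scale of what ksm_general_profit() reports: with
ksm_pages_sharing = 1000, ksm_zero_pages = 200 and ksm_rmap_items = 5000
on a 4 KiB-page system, the profit comes to (1000 + 200) * 4096 -
5000 * sizeof(struct ksm_rmap_item), i.e. about 4.4 MiB assuming the
usual ~64-byte rmap_item on a 64-bit kernel.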
^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH linux-next v3 6/6] Documentation: add KSM statistic counters description
2025-09-21 15:07 [PATCH linux-next v3 0/6] memcg: Support per-memcg KSM metrics xu.xin16
` (4 preceding siblings ...)
2025-09-21 15:14 ` [PATCH linux-next v3 5/6] memcg: add per-memcg ksm_profit xu.xin16
@ 2025-09-21 15:15 ` xu.xin16
2025-09-22 8:20 ` [PATCH linux-next v3 0/6] memcg: Support per-memcg KSM metrics Michal Hocko
` (2 subsequent siblings)
8 siblings, 0 replies; 14+ messages in thread
From: xu.xin16 @ 2025-09-21 15:15 UTC (permalink / raw)
To: xu.xin16, akpm
Cc: shakeel.butt, hannes, mhocko, roman.gushchin, david,
chengming.zhou, muchun.song, linux-kernel, linux-mm, cgroups
From: xu xin <xu.xin16@zte.com.cn>
This adds descriptions of the KSM-related statistic counters to
cgroup-v2.rst, including "ksm_rmap_items", "ksm_zero_pages",
"ksm_merging_pages" and "ksm_profit".
Signed-off-by: xu xin <xu.xin16@zte.com.cn>
---
Documentation/admin-guide/cgroup-v2.rst | 17 +++++++++++++++++
1 file changed, 17 insertions(+)
diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
index a1e3d431974c..c8c4faa4e3fd 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -1776,6 +1776,23 @@ The following nested keys are defined.
up if hugetlb usage is accounted for in memory.current (i.e.
cgroup is mounted with the memory_hugetlb_accounting option).
+ ksm_rmap_items
+ Number of ksm_rmap_item structures in use. The structure
+ ksm_rmap_item stores the reverse mapping information for virtual
+ addresses. KSM will generate a ksm_rmap_item for each
+ ksm-scanned page of the process.
+
+ ksm_zero_pages
+ Number of empty pages merged with the kernel zero page by KSM,
+ which is only meaningful when /sys/kernel/mm/ksm/use_zero_pages is enabled.
+
+ ksm_merging_pages
+ Number of pages of this cgroup's processes involved in KSM merging
+ (not including ksm_zero_pages).
+
+ ksm_profit
+ Amount of memory saved by KSM, in bytes.
+
memory.numa_stat
A read-only nested-keyed file which exists on non-root cgroups.
--
2.25.1
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH linux-next v3 0/6] memcg: Support per-memcg KSM metrics
2025-09-21 15:07 [PATCH linux-next v3 0/6] memcg: Support per-memcg KSM metrics xu.xin16
` (5 preceding siblings ...)
2025-09-21 15:15 ` [PATCH linux-next v3 6/6] Documentation: add KSM statistic counters description xu.xin16
@ 2025-09-22 8:20 ` Michal Hocko
2025-09-22 9:31 ` Re: " xu.xin16
2025-09-23 8:26 ` David Hildenbrand
2025-09-23 17:58 ` Shakeel Butt
8 siblings, 1 reply; 14+ messages in thread
From: Michal Hocko @ 2025-09-22 8:20 UTC (permalink / raw)
To: xu.xin16
Cc: akpm, shakeel.butt, hannes, roman.gushchin, david,
chengming.zhou, muchun.song, linux-kernel, linux-mm, cgroups
On Sun 21-09-25 23:07:26, xu.xin16@zte.com.cn wrote:
> From: xu xin <xu.xin16@zte.com.cn>
>
> v2->v3:
> ------
> Some fixes of compilation error due to missed inclusion of header or missed
> function definition on some kernel config.
> https://lore.kernel.org/all/202509142147.WQI0impC-lkp@intel.com/
> https://lore.kernel.org/all/202509142046.QatEaTQV-lkp@intel.com/
>
> v1->v2:
> ------
> According to Shakeel's suggestion, expose these metric item into memory.stat
> instead of a new interface.
> https://lore.kernel.org/all/ir2s6sqi6hrbz7ghmfngbif6fbgmswhqdljlntesurfl2xvmmv@yp3w2lqyipb5/
>
> Background
> ==========
>
> With the enablement of container-level KSM (e.g., via prctl [1]), there is
> a growing demand for container-level observability of KSM behavior. However,
> current cgroup implementations lack support for exposing KSM-related metrics.
Could you be more specific why this is needed and what it will be used
for?
--
Michal Hocko
SUSE Labs
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH linux-next v3 0/6] memcg: Support per-memcg KSM metrics
2025-09-22 8:20 ` [PATCH linux-next v3 0/6] memcg: Support per-memcg KSM metrics Michal Hocko
@ 2025-09-22 9:31 ` xu.xin16
2025-09-23 14:17 ` Michal Hocko
0 siblings, 1 reply; 14+ messages in thread
From: xu.xin16 @ 2025-09-22 9:31 UTC (permalink / raw)
To: mhocko
Cc: akpm, shakeel.butt, hannes, roman.gushchin, david,
chengming.zhou, muchun.song, linux-kernel, linux-mm, cgroups
> > From: xu xin <xu.xin16@zte.com.cn>
> >
> > v2->v3:
> > ------
> > Some fixes of compilation error due to missed inclusion of header or missed
> > function definition on some kernel config.
> > https://lore.kernel.org/all/202509142147.WQI0impC-lkp@intel.com/
> > https://lore.kernel.org/all/202509142046.QatEaTQV-lkp@intel.com/
> >
> > v1->v2:
> > ------
> > According to Shakeel's suggestion, expose these metric item into memory.stat
> > instead of a new interface.
> > https://lore.kernel.org/all/ir2s6sqi6hrbz7ghmfngbif6fbgmswhqdljlntesurfl2xvmmv@yp3w2lqyipb5/
> >
> > Background
> > ==========
> >
> > With the enablement of container-level KSM (e.g., via prctl [1]), there is
> > a growing demand for container-level observability of KSM behavior. However,
> > current cgroup implementations lack support for exposing KSM-related metrics.
>
> Could you be more specific why this is needed and what it will be used
> for?
Yes. Some Linux application developers and vendors are eager to deploy the
container-level KSM feature in containers (docker, containerd, runc and so
on). They have found significant memory savings without needing to modify
application source code as before, for example by adding a prctl call to
enable KSM in the container's startup program. Processes within the
container inherit the KSM attribute via fork, allowing the entire container
to have KSM enabled.
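A minimal sketch of such a startup hook (assuming a kernel with
PR_SET_MEMORY_MERGE, added in 6.4, and CAP_SYS_RESOURCE inside the
container):

#include <stdio.h>
#include <sys/prctl.h>

#ifndef PR_SET_MEMORY_MERGE
#define PR_SET_MEMORY_MERGE 67	/* for older userspace headers */
#endif

int main(void)
{
	/* Opt this process into KSM; children inherit the flag on fork. */
	if (prctl(PR_SET_MEMORY_MERGE, 1, 0, 0, 0))
		perror("prctl(PR_SET_MEMORY_MERGE)");
	/* ... exec the real container entrypoint here ... */
	return 0;
}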
However, in practice, not all containers benefit from KSM’s memory savings. Some
containers may have few identical pages but incur additional memory overhead due
to excessive ksm_rmap_items generation from KSM scanning. Therefore, we need to
provide a container-level KSM monitoring method, enabling users to adjust their
strategies based on actual KSM merging performance.
> --
> Michal Hocko
> SUSE Labs
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Re: [PATCH linux-next v3 0/6] memcg: Support per-memcg KSM metrics
2025-09-22 9:31 ` Re: " xu.xin16
@ 2025-09-23 14:17 ` Michal Hocko
0 siblings, 0 replies; 14+ messages in thread
From: Michal Hocko @ 2025-09-23 14:17 UTC (permalink / raw)
To: xu.xin16
Cc: akpm, shakeel.butt, hannes, roman.gushchin, david,
chengming.zhou, muchun.song, linux-kernel, linux-mm, cgroups
On Mon 22-09-25 17:31:58, xu.xin16@zte.com.cn wrote:
> > > From: xu xin <xu.xin16@zte.com.cn>
> > >
> > > v2->v3:
> > > ------
> > > Some fixes of compilation error due to missed inclusion of header or missed
> > > function definition on some kernel config.
> > > https://lore.kernel.org/all/202509142147.WQI0impC-lkp@intel.com/
> > > https://lore.kernel.org/all/202509142046.QatEaTQV-lkp@intel.com/
> > >
> > > v1->v2:
> > > ------
> > > According to Shakeel's suggestion, expose these metric item into memory.stat
> > > instead of a new interface.
> > > https://lore.kernel.org/all/ir2s6sqi6hrbz7ghmfngbif6fbgmswhqdljlntesurfl2xvmmv@yp3w2lqyipb5/
> > >
> > > Background
> > > ==========
> > >
> > > With the enablement of container-level KSM (e.g., via prctl [1]), there is
> > > a growing demand for container-level observability of KSM behavior. However,
> > > current cgroup implementations lack support for exposing KSM-related metrics.
> >
> > Could you be more specific why this is needed and what it will be used
> > for?
>
> Yes. Some Linux application developers or vendors are eager to deploy container-level
> KSM feature in containers (docker, containerd or runc and so on). They have found
> significant memory savings without needing to modify application source code as
> before—for example, by adding prctl to enable KSM in the container’s startup
> program. Processes within the container can inherit KSM attributes via fork,
> allowing the entire container to have KSM enabled.
>
> However, in practice, not all containers benefit from KSM’s memory savings. Some
> containers may have few identical pages but incur additional memory overhead due
> to excessive ksm_rmap_items generation from KSM scanning. Therefore, we need to
> provide a container-level KSM monitoring method, enabling users to adjust their
> strategies based on actual KSM merging performance.
So what is the strategy here? You watch the runtime behavior and then
disable KSM based on the previous run? I do not think this could be
changed during the runtime, right? So it would only work for the next
run, and that would rely on the workload behaving consistently across
re-runs, right?
I am not really convinced TBH, but not as much as to NAK this. What
concerns me a bit is that these per memcg stats are slightly different
from global ones without a very good explanation (or maybe I have just
not understood it properly).
Also the usecase sounds a bit shaky as it doesn't really give admins
great control other than a hope that a new execution of the container
will behave consistently with previous runs. I thought the whole concept
of per process KSM is based on "we know our userspace benefits" rather
than "let's try and see".
All in all I worry this will turn out not really used in the end and we
will have yet another set of counters to maintain without real users.
--
Michal Hocko
SUSE Labs
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH linux-next v3 0/6] memcg: Support per-memcg KSM metrics
2025-09-21 15:07 [PATCH linux-next v3 0/6] memcg: Support per-memcg KSM metrics xu.xin16
` (6 preceding siblings ...)
2025-09-22 8:20 ` [PATCH linux-next v3 0/6] memcg: Support per-memcg KSM metrics Michal Hocko
@ 2025-09-23 8:26 ` David Hildenbrand
2025-09-23 17:58 ` Shakeel Butt
8 siblings, 0 replies; 14+ messages in thread
From: David Hildenbrand @ 2025-09-23 8:26 UTC (permalink / raw)
To: xu.xin16, akpm, shakeel.butt, hannes, mhocko, roman.gushchin
Cc: chengming.zhou, muchun.song, linux-kernel, linux-mm, cgroups
On 21.09.25 17:07, xu.xin16@zte.com.cn wrote:
> From: xu xin <xu.xin16@zte.com.cn>
>
> v2->v3:
> ------
> Some fixes of compilation error due to missed inclusion of header or missed
> function definition on some kernel config.
> https://lore.kernel.org/all/202509142147.WQI0impC-lkp@intel.com/
> https://lore.kernel.org/all/202509142046.QatEaTQV-lkp@intel.com/
>
> v1->v2:
> ------
> According to Shakeel's suggestion, expose these metric item into memory.stat
> instead of a new interface.
> https://lore.kernel.org/all/ir2s6sqi6hrbz7ghmfngbif6fbgmswhqdljlntesurfl2xvmmv@yp3w2lqyipb5/
>
> Background
> ==========
>
> With the enablement of container-level KSM (e.g., via prctl [1]), there is
> a growing demand for container-level observability of KSM behavior. However,
> current cgroup implementations lack support for exposing KSM-related metrics.
>
> So add the counter in the existing memory.stat without adding a new interface.
> To display per-memcg KSM statistic counters, we traverse all processes of a
> memcg and sum the processes' ksm_rmap_items counters instead of adding an enum
> item in memcg_stat_item or node_stat_item and updating the corresponding enum
> counter when ksmd manipulates pages.
>
> Now Linux users can look up all per-memcg KSM counters by:
>
> # cat /sys/fs/cgroup/xuxin/memory.stat | grep ksm
> ksm_rmap_items 0
> ksm_zero_pages 0
> ksm_merging_pages 0
> ksm_profit 0
No strong opinion from my side: seems to mostly only collect stats from
all tasks to summarize them per memcg.
--
Cheers
David / dhildenb
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH linux-next v3 0/6] memcg: Support per-memcg KSM metrics
2025-09-21 15:07 [PATCH linux-next v3 0/6] memcg: Support per-memcg KSM metrics xu.xin16
` (7 preceding siblings ...)
2025-09-23 8:26 ` David Hildenbrand
@ 2025-09-23 17:58 ` Shakeel Butt
8 siblings, 0 replies; 14+ messages in thread
From: Shakeel Butt @ 2025-09-23 17:58 UTC (permalink / raw)
To: xu.xin16
Cc: akpm, hannes, mhocko, roman.gushchin, david, chengming.zhou,
muchun.song, linux-kernel, linux-mm, cgroups
Hi Xu,
On Sun, Sep 21, 2025 at 11:07:26PM +0800, xu.xin16@zte.com.cn wrote:
> From: xu xin <xu.xin16@zte.com.cn>
>
> v2->v3:
> ------
> Some fixes of compilation error due to missed inclusion of header or missed
> function definition on some kernel config.
> https://lore.kernel.org/all/202509142147.WQI0impC-lkp@intel.com/
> https://lore.kernel.org/all/202509142046.QatEaTQV-lkp@intel.com/
>
> v1->v2:
> ------
> According to Shakeel's suggestion, expose these metric item into memory.stat
> instead of a new interface.
> https://lore.kernel.org/all/ir2s6sqi6hrbz7ghmfngbif6fbgmswhqdljlntesurfl2xvmmv@yp3w2lqyipb5/
>
> Background
> ==========
>
> With the enablement of container-level KSM (e.g., via prctl [1]), there is
> a growing demand for container-level observability of KSM behavior. However,
> current cgroup implementations lack support for exposing KSM-related metrics.
>
> So add the counter in the existing memory.stat without adding a new interface.
> To display per-memcg KSM statistic counters, we traverse all processes of a
> memcg and sum the processes' ksm_rmap_items counters instead of adding an enum
> item in memcg_stat_item or node_stat_item and updating the corresponding enum
> counter when ksmd manipulates pages.
>
> Now Linux users can look up all per-memcg KSM counters by:
>
> # cat /sys/fs/cgroup/xuxin/memory.stat | grep ksm
> ksm_rmap_items 0
> ksm_zero_pages 0
> ksm_merging_pages 0
> ksm_profit 0
>
> Q&A
> ====
> Why don't I add an enum item in memcg_stat_item or node_stat_item like
> the other items in memory.stat?
>
> I tried adding an enum item in memcg_stat_item and updating it when
> ksmd manipulates pages, but it mis-accounted the memcg's ksm counters.
> This is because of the following reasons:
>
> 1) The KSM counter of the memcg can be correctly incremented, but cannot be
> properly decremented. E.g., when ksmd scans pages of a process, it can use
> the mm_struct of the struct ksm_rmap_item to reverse-lookup the memcg
> and then increase the value via mod_memcg_state(memcg, MEMCG_KSM_COUNT, 1).
> However, when the process exits abruptly, since ksmd asynchronously scans
> the mmslot list in the background, it is no longer able to correctly locate
> the original memcg through mm_struct by get_mem_cgroup_from_mm(), as the
> task_struct has already been freed.
>
> 2) The first issue could potentially be addressed by adding a memcg
> pointer directly into the ksm_rmap_item structure. However, this
> increases memory overhead, especially when there are a large
> number of ksm_rmap_items in the system (due to a high volume of
> pages being scanned by ksmd). Moreover, this approach does not
> resolve the same problem for ksm_zero_pages, because updates to
> ksm_zero_pages are not performed through ksm_rmap_item, but
> rather directly during unmap or page table entry (pte) faults
> based on the mm_struct. At that point, if the process has
> already exited, the corresponding memcg can no longer be
> accurately identified.
>
Thanks for writing this up and sorry to disappoint you but this
explanation is giving me more reasons that memcg is not the right place
for these stats.
If you take a look at the memcg stats exposed through memory.stat, there
are generally two types. First are the ones that describe the type
or property of the underlying memory and that memory is associated or
charged to the memcg e.g. anon or file or kernel (and other types)
memory. Please note that this memory lifetime can be independent from
the process that might have allocated them.
Second are the events that are faced by the processes in that
memcg like page faults, reclaim etc.
The ksm stats are about the process and not about the memcg of the
process. Process jumping from one memcg to another will take all these
stats with it. You can easily get ksm stats in userspace by traversing
/proc/pids/ksm_stats with the pids from cgroup.procs. You are just
looking for an easier way to get such stats instead of manual traversal.
I would suggest exploring a cgroup iter based bpf program which can
collect the stats and expose them to userspace for a given cgroup
hierarchy.
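For reference, the manual traversal mentioned above can be as simple as
this sketch (assuming the per-process file /proc/<pid>/ksm_stat and a
cgroup mounted at /sys/fs/cgroup/mycg):

while read -r pid; do
	cat "/proc/$pid/ksm_stat" 2>/dev/null
done < /sys/fs/cgroup/mycg/cgroup.procs |
awk '{sum[$1] += $2} END {for (k in sum) print k, sum[k]}'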
^ permalink raw reply [flat|nested] 14+ messages in thread