From: Shakeel Butt <shakeel.butt@linux.dev>
To: xu.xin16@zte.com.cn
Cc: akpm@linux-foundation.org, hannes@cmpxchg.org, mhocko@kernel.org,
roman.gushchin@linux.dev, david@redhat.com,
chengming.zhou@linux.dev, muchun.song@linux.dev,
linux-kernel@vger.kernel.org, linux-mm@kvack.org,
cgroups@vger.kernel.org
Subject: Re: [PATCH linux-next v3 0/6] memcg: Support per-memcg KSM metrics
Date: Tue, 23 Sep 2025 10:58:47 -0700 [thread overview]
Message-ID: <4sunwlleii5mrlwvnio4rm4uvrngzcdbsig7xer3ytyixpu543@7dlwpeeocjbl> (raw)
In-Reply-To: <20250921230726978agBBWNsPLi2hCp9Sxed1Y@zte.com.cn>
Hi Xu,
On Sun, Sep 21, 2025 at 11:07:26PM +0800, xu.xin16@zte.com.cn wrote:
> From: xu xin <xu.xin16@zte.com.cn>
>
> v2->v3:
> ------
> Some fixes of compilation error due to missed inclusion of header or missed
> function definition on some kernel config.
> https://lore.kernel.org/all/202509142147.WQI0impC-lkp@intel.com/
> https://lore.kernel.org/all/202509142046.QatEaTQV-lkp@intel.com/
>
> v1->v2:
> ------
> According to Shakeel's suggestion, expose these metric item into memory.stat
> instead of a new interface.
> https://lore.kernel.org/all/ir2s6sqi6hrbz7ghmfngbif6fbgmswhqdljlntesurfl2xvmmv@yp3w2lqyipb5/
>
> Background
> ==========
>
> With the enablement of container-level KSM (e.g., via prctl [1]), there is
> a growing demand for container-level observability of KSM behavior. However,
> current cgroup implementations lack support for exposing KSM-related metrics.
>
> So add the counter in the existing memory.stat without adding a new interface.
> To diaplay per-memcg KSM statistic counters, we traverse all processes of a
> memcg and summing the processes' ksm_rmap_items counters instead of adding enum
> item in memcg_stat_item or node_stat_item and updating the corresponding enum
> counter when ksmd manipulate pages.
>
> Now Linux users can look up all per-memcg KSM counters by:
>
> # cat /sys/fs/cgroup/xuxin/memory.stat | grep ksm
> ksm_rmap_items 0
> ksm_zero_pages 0
> ksm_merging_pages 0
> ksm_profit 0
>
> Q&A
> ====
> why don't I add enum item in memcg_stat_item or node_stat_item like
> other items in memory.stat ?
>
> I tried the way of adding enum item in memcg_stat_item and updating them when
> ksmd manipulate pages, but it failed with error statistic ksm counters of
> memcg. This is because of the following reasons:
>
> 1) The KSM counter of memcgroup can be correctly incremented, but cannot be
> properly decremented. E,g,, when ksmd scans pages of a process, it can use
> the mm_struct of the struct ksm_rmap_item to reverse-lookup the memcg
> and then increase the value via mod_memcg_state(memcg, MEMCG_KSM_COUNT, 1).
> However, when the process exits abruptly, since ksmd asynchronously scans
> the mmslot list in the background, it is no longer able to correctly locate
> the original memcg through mm_struct by get_mem_cgroup_from_mm(), as the
> task_struct has already been freed.
>
> 2) The first issue could potentially be addressed by adding a memcg
> pointer directly into the ksm_rmap_item structure. However, this
> increases memory overhead, especially when there are a large
> number of ksm_rmap_items in the system (due to a high volume of
> pages being scanned by ksmd). Moreover, this approach does not
> resolve the same problem for ksm_zero_pages, because updates to
> ksm_zero_pages are not performed through ksm_rmap_item, but
> rather directly during unmap or page table entry (pte) faults
> based on the mm_struct. At that point, if the process has
> already exited, the corresponding memcg can no longer be
> accurately identified.
>
Thanks for writing this up and sorry to disappoint you but this
explanation is giving me more reasons that memcg is not the right place
for these stats.
If you take a look at the memcg stats exposed through memory.stat, there
are two generally two types. First are the ones that describe the type
or property of the underlying memory and that memory is associated or
charged to the memcg e.g. anon or file or kernel (and other types)
memory. Please note that this memory lifetime can be independent from
the process that might have allocated them.
Second are the events that are faced by the processes in that
memcg like page faults, reclaim etc.
The ksm stats are about the process and not about the memcg of the
process. Process jumping from one memcg to another will take all these
stats with it. You can easily get ksm stats in userspace by traversing
/proc/pids/ksm_stats with the pids from cgroup.procs. You are just
looking for an easier way to get such stats instead of manual traversal.
I would suggest exploring cgroup iter based bpf program which can do
the stats collect and expose to userspace for a given cgroup hierarchy.
prev parent reply other threads:[~2025-09-23 17:58 UTC|newest]
Thread overview: 14+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-09-21 15:07 xu.xin16
2025-09-21 15:08 ` [PATCH linux-next v3 1/6] memcg: add per-memcg ksm_rmap_items stat xu.xin16
2025-09-23 8:28 ` David Hildenbrand
2025-09-21 15:11 ` [PATCH linux-next v3 2/6] memcg: show ksm_zero_pages count in memory.stat xu.xin16
2025-09-21 15:12 ` [PATCH linux-next v3 3/6] memcg: show ksm_merging_pages xu.xin16
2025-09-21 15:13 ` [PATCH linux-next v3 4/6] ksm: make ksm_process_profit available on CONFIG_PROCFS=n xu.xin16
2025-09-23 8:25 ` David Hildenbrand
2025-09-21 15:14 ` [PATCH linux-next v3 5/6] memcg: add per-memcg ksm_profit xu.xin16
2025-09-21 15:15 ` [PATCH linux-next v3 6/6] Documentation: add KSM statistic counters description xu.xin16
2025-09-22 8:20 ` [PATCH linux-next v3 0/6] memcg: Support per-memcg KSM metrics Michal Hocko
2025-09-22 9:31 ` 答复: " xu.xin16
2025-09-23 14:17 ` Michal Hocko
2025-09-23 8:26 ` David Hildenbrand
2025-09-23 17:58 ` Shakeel Butt [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=4sunwlleii5mrlwvnio4rm4uvrngzcdbsig7xer3ytyixpu543@7dlwpeeocjbl \
--to=shakeel.butt@linux.dev \
--cc=akpm@linux-foundation.org \
--cc=cgroups@vger.kernel.org \
--cc=chengming.zhou@linux.dev \
--cc=david@redhat.com \
--cc=hannes@cmpxchg.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mhocko@kernel.org \
--cc=muchun.song@linux.dev \
--cc=roman.gushchin@linux.dev \
--cc=xu.xin16@zte.com.cn \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox