From: "Yasunori Gotou (Fujitsu)" <y-goto@fujitsu.com>
To: "Huang, Ying" <ying.huang@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
"rafael@kernel.org" <rafael@kernel.org>,
"linux-mm@kvack.org" <linux-mm@kvack.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"Zhijian Li (Fujitsu)" <lizhijian@fujitsu.com>
Subject: RE: [PATCH RFC 3/4] mm/vmstat: rename pgdemote_* to pgdemote_dst_* and add pgdemote_src_*
Date: Thu, 2 Nov 2023 07:38:19 +0000
Message-ID: <TYWPR01MB1008262A8FCBBEF0331EB16FD90A6A@TYWPR01MB10082.jpnprd01.prod.outlook.com>
In-Reply-To: <fbca99e1-40c4-4ffd-a0a1-89728dd0b900@fujitsu.com>
Hello,
> On 02/11/2023 13:45, Huang, Ying wrote:
> > Li Zhijian <lizhijian@fujitsu.com> writes:
> >
> >> pgdemote_src_*: pages demoted from this node.
> >> pgdemote_dst_*: pages demoted to this node.
> >>
> >> With these counters, the per-node demotion stats can be read directly.
> >>
> >> In the environment, node0 and node1 are DRAM, node3 is PMEM.
> >>
> >> Global stats:
> >> $ grep -E 'demote' /proc/vmstat
> >> pgdemote_src_kswapd 130155
> >> pgdemote_src_direct 113497
> >> pgdemote_src_khugepaged 0
> >> pgdemote_dst_kswapd 130155
> >> pgdemote_dst_direct 113497
> >> pgdemote_dst_khugepaged 0
> >>
> >> Per-node stats:
> >> $ grep demote /sys/devices/system/node/node0/vmstat
> >> pgdemote_src_kswapd 68454
> >> pgdemote_src_direct 83431
> >> pgdemote_src_khugepaged 0
> >> pgdemote_dst_kswapd 0
> >> pgdemote_dst_direct 0
> >> pgdemote_dst_khugepaged 0
> >>
> >> $ grep demote /sys/devices/system/node/node1/vmstat
> >> pgdemote_src_kswapd 185834
> >> pgdemote_src_direct 30066
> >> pgdemote_src_khugepaged 0
> >> pgdemote_dst_kswapd 0
> >> pgdemote_dst_direct 0
> >> pgdemote_dst_khugepaged 0
> >>
> >> $ grep demote /sys/devices/system/node/node3/vmstat
> >> pgdemote_src_kswapd 0
> >> pgdemote_src_direct 0
> >> pgdemote_src_khugepaged 0
> >> pgdemote_dst_kswapd 254288
> >> pgdemote_dst_direct 113497
> >> pgdemote_dst_khugepaged 0
> >>
> >> From the stats above, we can see that node3 is the demotion destination
> >> for both node0 and node1.
> >
> > Why do we need this information? Do you have a use case?
>
> I recall that our customers have mentioned wanting to know how much memory
> is demoted to the CXL memory device in a specific period.
Let me add some more detail.
I had a conversation with one of our customers. He expressed a desire for more detailed
profile information to analyze the behavior of demotion (and promotion) when
his workloads are executed.
If the results are not satisfactory, he wants to use these profiles to tune his servers
for his workloads.
Additionally, depending on the results, he may want to change his server configuration.
For example, he may want to buy more expensive DDR memory rather than cheaper CXL memory.
My impression is that our customers do not yet consider CXL memory to be as reliable as DDR memory.
Therefore, they want to prepare for the new world that CXL will bring, and they want as much
profiling information as possible to help with that preparation.
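To illustrate the kind of user-space profiling I mean, here is a rough sketch, not a finished tool. It assumes this RFC is applied, so that the pgdemote_src_* and pgdemote_dst_* counters exist in the per-node vmstat files shown in the patch description. The function names and the sampling interval are made up for the example.

```python
import time
from pathlib import Path

# Per-node sysfs directory, as used in the patch description.
NODE_ROOT = Path("/sys/devices/system/node")


def parse_demote_counters(text):
    """Return {counter_name: value} for every pgdemote_* line in a vmstat dump."""
    counters = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 2 and fields[0].startswith("pgdemote_"):
            counters[fields[0]] = int(fields[1])
    return counters


def snapshot(root=NODE_ROOT):
    """Read the pgdemote_* counters of every nodeN directory under root."""
    snap = {}
    for vmstat in sorted(root.glob("node[0-9]*/vmstat")):
        snap[vmstat.parent.name] = parse_demote_counters(vmstat.read_text())
    return snap


def delta(before, after):
    """Per-node, per-counter difference between two snapshots."""
    return {
        node: {
            name: value - before.get(node, {}).get(name, 0)
            for name, value in counters.items()
        }
        for node, counters in after.items()
    }


def sample_interval(seconds, root=NODE_ROOT):
    """Pages demoted from/to each node during a window of `seconds` seconds."""
    first = snapshot(root)
    time.sleep(seconds)
    return delta(first, snapshot(root))


# Hypothetical usage: sample_interval(10) returns a dict keyed by node name,
# e.g. the pgdemote_dst_* deltas of the CXL/PMEM node for the last 10 seconds.
```

With per-node src/dst counters, a delta like this would show how many pages left each DRAM node and how many arrived on the CXL node while a workload was running.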
Is this enough to answer your question?
Thanks,
>
>
> >>> mod_node_page_state(NODE_DATA(target_nid),
> >>> - PGDEMOTE_KSWAPD + reclaimer_offset(), nr_succeeded);
> >>> + PGDEMOTE_DST_KSWAPD + reclaimer_offset(), nr_succeeded);
>
> But if *target_nid* only indicates the preferred node, this accounting
> may not be accurate.
>
>
> Thanks
> Zhijian
>
> >
> > --
> > Best Regards,
> > Huang, Ying
> >
> >> Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
> >> ---
> >> RFC: the names are open to discussion; maybe pgdemote_from/to_*.
> >> Another drawback of this patch is that SUM(pgdemote_src_*) is always the same
> >> as SUM(pgdemote_dst_*) in the global stats. Should we hide one of them?
> >> ---
> >> include/linux/mmzone.h | 9 ++++++---
> >> mm/vmscan.c | 13 ++++++++++---
> >> mm/vmstat.c | 9 ++++++---
> >> 3 files changed, 22 insertions(+), 9 deletions(-)
> >>
> >> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> >> index ad0309eea850..a6140d894bec 100644
> >> --- a/include/linux/mmzone.h
> >> +++ b/include/linux/mmzone.h
> >> @@ -207,9 +207,12 @@ enum node_stat_item {
> >> PGPROMOTE_SUCCESS, /* promote successfully */
> >> PGPROMOTE_CANDIDATE, /* candidate pages to promote */
> >> /* PGDEMOTE_*: pages demoted */
> >> - PGDEMOTE_KSWAPD,
> >> - PGDEMOTE_DIRECT,
> >> - PGDEMOTE_KHUGEPAGED,
> >> + PGDEMOTE_SRC_KSWAPD,
> >> + PGDEMOTE_SRC_DIRECT,
> >> + PGDEMOTE_SRC_KHUGEPAGED,
> >> + PGDEMOTE_DST_KSWAPD,
> >> + PGDEMOTE_DST_DIRECT,
> >> + PGDEMOTE_DST_KHUGEPAGED,
> >> #endif
> >> NR_VM_NODE_STAT_ITEMS
> >> };
> >> diff --git a/mm/vmscan.c b/mm/vmscan.c
> >> index 2f1fb4ec3235..55d2287d7150 100644
> >> --- a/mm/vmscan.c
> >> +++ b/mm/vmscan.c
> >> @@ -1111,13 +1111,18 @@ void drop_slab(void)
> >> static int reclaimer_offset(void)
> >> {
> >> BUILD_BUG_ON(PGSTEAL_DIRECT - PGSTEAL_KSWAPD !=
> >> - PGDEMOTE_DIRECT - PGDEMOTE_KSWAPD);
> >> + PGDEMOTE_SRC_DIRECT - PGDEMOTE_SRC_KSWAPD);
> >> BUILD_BUG_ON(PGSTEAL_DIRECT - PGSTEAL_KSWAPD !=
> >> PGSCAN_DIRECT - PGSCAN_KSWAPD);
> >> BUILD_BUG_ON(PGSTEAL_KHUGEPAGED - PGSTEAL_KSWAPD !=
> >> - PGDEMOTE_KHUGEPAGED - PGDEMOTE_KSWAPD);
> >> + PGDEMOTE_SRC_KHUGEPAGED - PGDEMOTE_SRC_KSWAPD);
> >> BUILD_BUG_ON(PGSTEAL_KHUGEPAGED - PGSTEAL_KSWAPD !=
> >> PGSCAN_KHUGEPAGED - PGSCAN_KSWAPD);
> >> + BUILD_BUG_ON(PGDEMOTE_SRC_DIRECT - PGDEMOTE_SRC_KSWAPD !=
> >> + PGDEMOTE_DST_DIRECT - PGDEMOTE_DST_KSWAPD);
> >> + BUILD_BUG_ON(PGDEMOTE_SRC_KHUGEPAGED - PGDEMOTE_SRC_KSWAPD !=
> >> + PGDEMOTE_DST_KHUGEPAGED - PGDEMOTE_DST_KSWAPD);
> >> +
> >>
> >> if (current_is_kswapd())
> >> return 0;
> >> @@ -1678,8 +1683,10 @@ static unsigned int demote_folio_list(struct list_head *demote_folios,
> >> (unsigned long)&mtc, MIGRATE_ASYNC, MR_DEMOTION,
> >> &nr_succeeded);
> >>
> >> + mod_node_page_state(pgdat,
> >> + PGDEMOTE_SRC_KSWAPD + reclaimer_offset(), nr_succeeded);
> >> mod_node_page_state(NODE_DATA(target_nid),
> >> - PGDEMOTE_KSWAPD + reclaimer_offset(), nr_succeeded);
> >> + PGDEMOTE_DST_KSWAPD + reclaimer_offset(), nr_succeeded);
> >>
> >> return nr_succeeded;
> >> }
> >> diff --git a/mm/vmstat.c b/mm/vmstat.c
> >> index f141c48c39e4..63f106a5e008 100644
> >> --- a/mm/vmstat.c
> >> +++ b/mm/vmstat.c
> >> @@ -1244,9 +1244,12 @@ const char * const vmstat_text[] = {
> >> #ifdef CONFIG_NUMA_BALANCING
> >> "pgpromote_success",
> >> "pgpromote_candidate",
> >> - "pgdemote_kswapd",
> >> - "pgdemote_direct",
> >> - "pgdemote_khugepaged",
> >> + "pgdemote_src_kswapd",
> >> + "pgdemote_src_direct",
> >> + "pgdemote_src_khugepaged",
> >> + "pgdemote_dst_kswapd",
> >> + "pgdemote_dst_direct",
> >> + "pgdemote_dst_khugepaged",
> >> #endif
> >>
> >> /* enum writeback_stat_item counters */