From: "Huang, Ying"
To: "Yasunori Gotou (Fujitsu)"
Cc: Andrew Morton, Greg Kroah-Hartman, "rafael@kernel.org", "linux-mm@kvack.org", "linux-kernel@vger.kernel.org", "Zhijian Li (Fujitsu)"
Subject: Re: [PATCH RFC 3/4] mm/vmstat: rename pgdemote_* to pgdemote_dst_* and add pgdemote_src_*
In-Reply-To: (Yasunori Gotou's message of "Thu, 2 Nov 2023 09:45:38 +0000")
References: <20231102025648.1285477-1-lizhijian@fujitsu.com> <20231102025648.1285477-4-lizhijian@fujitsu.com> <87r0l81zfd.fsf@yhuang6-desk2.ccr.corp.intel.com> <871qd81ttm.fsf@yhuang6-desk2.ccr.corp.intel.com>
Date: Fri, 03 Nov 2023 14:14:21 +0800
Message-ID: <87sf5nz7lu.fsf@yhuang6-desk2.ccr.corp.intel.com>
User-Agent: Gnus/5.13 (Gnus v5.13)
MIME-Version: 1.0
Content-Type: text/plain; charset=ascii

"Yasunori Gotou (Fujitsu)" writes:

>> > Hello,
>> >
>> >> On 02/11/2023 13:45, Huang, Ying wrote:
>> >> > Li Zhijian writes:
>> >> >
>> >> >> pgdemote_src_*: pages demoted from this node.
>> >> >> pgdemote_dst_*: pages demoted to this node.
>> >> >>
>> >> >> So that we are able to know their demotion per-node stats by
>> >> >> checking this.
>> >> >>
>> >> >> In the environment, node0 and node1 are DRAM, node3 is PMEM.
>> >> >>
>> >> >> Global stats:
>> >> >> $ grep -E 'demote' /proc/vmstat
>> >> >> pgdemote_src_kswapd 130155
>> >> >> pgdemote_src_direct 113497
>> >> >> pgdemote_src_khugepaged 0
>> >> >> pgdemote_dst_kswapd 130155
>> >> >> pgdemote_dst_direct 113497
>> >> >> pgdemote_dst_khugepaged 0
>> >> >>
>> >> >> Per-node stats:
>> >> >> $ grep demote /sys/devices/system/node/node0/vmstat
>> >> >> pgdemote_src_kswapd 68454
>> >> >> pgdemote_src_direct 83431
>> >> >> pgdemote_src_khugepaged 0
>> >> >> pgdemote_dst_kswapd 0
>> >> >> pgdemote_dst_direct 0
>> >> >> pgdemote_dst_khugepaged 0
>> >> >>
>> >> >> $ grep demote /sys/devices/system/node/node1/vmstat
>> >> >> pgdemote_src_kswapd 185834
>> >> >> pgdemote_src_direct 30066
>> >> >> pgdemote_src_khugepaged 0
>> >> >> pgdemote_dst_kswapd 0
>> >> >> pgdemote_dst_direct 0
>> >> >> pgdemote_dst_khugepaged 0
>> >> >>
>> >> >> $ grep demote /sys/devices/system/node/node3/vmstat
>> >> >> pgdemote_src_kswapd 0
>> >> >> pgdemote_src_direct 0
>> >> >> pgdemote_src_khugepaged 0
>> >> >> pgdemote_dst_kswapd 254288
>> >> >> pgdemote_dst_direct 113497
>> >> >> pgdemote_dst_khugepaged 0
>> >> >>
>> >> >> From the above stats, we know node3 is the demotion destination
>> >> >> that node0 and node1 will demote to.
>> >> >
>> >> > Why do we need this information? Do you have a use case?
>> >>
>> >> I recall our customers have mentioned that they want to know how much
>> >> memory is demoted to the CXL memory device in a specific period.
>> >
>> > I'll say more about it.
>> >
>> > I had a conversation with one of our customers. He expressed a desire
>> > for more detailed profile information to analyze the behavior of
>> > demotion (and promotion) when his workloads are executed.
>> > If the results are not satisfactory for his workloads, he wants to
>> > tune his servers for his workloads with these profiles.
>> > Additionally, depending on the results, he may want to change his
>> > server configuration.
>> > For example, he may want to buy more expensive DDR memory rather
>> > than cheaper CXL memory.
>> >
>> > My impression is that our customers seem to think that CXL memory is
>> > NOT as reliable as DDR memory yet.
>> > Therefore, they want to prepare for the new world that CXL will bring,
>> > and want a method for that preparation that gives them as much
>> > profiling information as possible.
>> >
>> > Is this enough to answer your question?
>>
>> I want some more detailed information about how these stats are used.
>> Why isn't per-node pgdemote_xxx counter enough?
>
> I rechecked the customer's original request.
>
> - If a memory area is demoted to a CXL memory node, he wanted to analyze
>   how it affects the performance of their workload, such as latency.
>   He wanted to use CXL node memory usage as basic information for the
>   analysis.
>
> - If he notices that demotion occurs well on a server and CXL memory is
>   constantly 85% used, he may want to add DDR DRAM or select some other
>   way to avoid demotion.
>   (He probably has something like swap free/used in mind.)
>   IIRC, the demotion target is not spread across all of the CXL memory
>   nodes, right?
>   Then, he needs to know how CXL memory is occupied by demoted memory.
>
> If I misunderstand something, or you have any better idea,
> please let us know. I'll talk with him again. (It will be next week...)

To check CXL memory usage, /proc/PID/numa_maps,
/sys/fs/cgroup/CGROUP/memory.numa_stat, and
/sys/devices/system/node/nodeN/meminfo can be used for process, cgroup,
and NUMA node respectively.  Is this enough?

--
Best Regards,
Huang, Ying

>> >
>> >>
>> >> >>
>> >> >>> mod_node_page_state(NODE_DATA(target_nid),
>> >> >>> -                    PGDEMOTE_KSWAPD + reclaimer_offset(), nr_succeeded);
>> >> >>> +                    PGDEMOTE_DST_KSWAPD + reclaimer_offset(), nr_succeeded);
>> >>
>> >> But if *target_nid* only indicates the preferred node, this
>> >> accounting may not be accurate.
>> >>
>> >> [snip]
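
For illustration only, here is a minimal shell sketch of the NUMA-node
check suggested above (reading nodeN/meminfo), plus a sum of the per-node
demotion counters. The function names node_used_pct and sum_pgdemote are
hypothetical helpers, not part of any patch in this thread. The sketch
assumes the usual "Node N MemTotal: X kB" layout of the node meminfo
file, and assumes the running kernel exposes pgdemote_* counters in the
per-node vmstat file, which depends on the kernel version. node3 is
simply the PMEM node from the quoted environment.

```shell
# Print the used percentage of one NUMA node.
# Reads node meminfo text on stdin, e.g. "Node 3 MemTotal: 1000 kB".
# (Hypothetical helper; assumes MemTotal and MemUsed lines are present.)
node_used_pct() {
    awk '$3 == "MemTotal:" { total = $4 }
         $3 == "MemUsed:"  { used = $4 }
         END { if (total > 0) printf "%d\n", used * 100 / total }'
}

# Sum every pgdemote_* counter found in vmstat-style text on stdin.
# (Hypothetical helper; with src/dst counters split as in this RFC,
# filter on pgdemote_dst_ or pgdemote_src_ instead of summing both.)
sum_pgdemote() {
    awk '$1 ~ /^pgdemote_/ { sum += $2 } END { printf "%.0f\n", sum }'
}

# Example usage (paths assume node3 is the CXL/PMEM node):
#   node_used_pct < /sys/devices/system/node/node3/meminfo
#   sum_pgdemote  < /sys/devices/system/node/node3/vmstat
```

Sampling both values periodically and comparing them would show whether
a node that stays near a threshold such as 85% used is also the one
receiving demoted pages during that period.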