From: "Huang, Ying"
To: "Zhijian Li (Fujitsu)"
Cc: "linux-mm@kvack.org", "akpm@linux-foundation.org", "linux-kernel@vger.kernel.org", "Yasunori Gotou (Fujitsu)", Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider, kernel test robot
Subject: Re: [PATCH RFC] mm: memory-tiering: Fix PGPROMOTE_CANDIDATE accounting
In-Reply-To: <47f42c60-9752-4bc6-9079-627b6e0b9cfc@fujitsu.com> (Zhijian Li's message of "Mon, 23 Jun 2025 08:54:28 +0000")
References: <20250619075245.3272384-1-lizhijian@fujitsu.com> <87ldpn2afw.fsf@DESKTOP-5N7EMDA> <47f42c60-9752-4bc6-9079-627b6e0b9cfc@fujitsu.com>
Date: Tue, 24 Jun 2025 10:46:44 +0800
Message-ID: <87ms9xonzf.fsf@DESKTOP-5N7EMDA>

"Zhijian Li (Fujitsu)" writes:

> On 20/06/2025 14:28, Huang, Ying wrote:
>> Li Zhijian writes:
>>
>>> Goto-san reported confusing pgpromote statistics where
>>> the pgpromote_success count significantly exceeded pgpromote_candidate.
>>> The issue manifests under specific memory pressure conditions:
>>> when top-tier memory (DRAM) is exhausted by memhog and allocation begins
>>> in lower-tier memory (CXL). After terminating memhog, the stats show:
>>
>> The above description is confusing.  The page promotion occurs when the
>> size of the top-tier free space is large enough (after killing the
>> memhog above).  The accessed lower-tier memory will be promoted upon
>> access to take full advantage of the more expensive top-tier memory.
>
> Yeah, that's what the promotion does.
>
> Let's clarify the reproducer steps specifically (thanks Goto-san for the reproducer):
> On a system with three nodes (nodes 0-1: DRAM 4GB, node 2: NVDIMM 4GB):
>
> # Enable demotion only
> echo 1 > /sys/kernel/mm/numa/demotion_enabled
> numactl -m 0-1 memhog -r200 3500M >/dev/null &
> pid=$!
> sleep 2
> numactl memhog -r100 2500M >/dev/null &
> sleep 10
> kill -9 $pid
> # Enable promotion
> echo 2 > /proc/sys/kernel/numa_balancing
>
> # After a few seconds, we observe `pgpromote_candidate < pgpromote_success`
>
> In this scenario, after terminating the first memhog, the conditions
> for pgdat_free_space_enough() are quickly met, triggering promotion.
> However, these migrated pages are only accounted for in PGPROMOTE_SUCCESS,
> not in PGPROMOTE_CANDIDATE.

Yes.  This is the expected behavior of the current implementation.

>
>>
>>> $ grep -e pgpromote /proc/vmstat
>>> pgpromote_success 2579
>>> pgpromote_candidate 1
>>>
>>> This update increments PGPROMOTE_CANDIDATE within the free space branch
>>> when a promotion decision is made, which may alter the mechanism of the
>>> rate limit.
>>> Consequently, it becomes easier to reach the rate limit than
>>> it was previously.
>>>
>>> For example:
>>> Rate Limit = 100 pages/sec
>>> Scenario:
>>>   T0: 90 free-space migrations
>>>   T0+100ms: 20-page migration request
>>>
>>> Before:
>>>   Rate limit is *not* reached: 0 + 20 = 20 < 100
>>>   PGPROMOTE_CANDIDATE: 20
>>> After:
>>>   Rate limit is reached: 90 + 20 = 110 > 100
>>>   PGPROMOTE_CANDIDATE: 110
>>
>> Yes.  The rate limit will be influenced by the change.  So, more tests
>> may be needed to verify that it will not incur regressions.
>
>
> Testing this might be challenging due to workload dependencies. Do you
> have any recommended workloads for evaluation?

Some in-memory databases should be good workloads, for example, redis.

> Alternatively, could we rely on the LKP project for impact assessment
> (the current patch has not really been tested by LKP due to a compile
> error, I will post a V2 soon).

LKP has some basic workloads to test this, for example, pmbench with
the Gauss-ih access pattern.

> However, regarding the rate limit change itself, I consider this patch
> logically correct. As stated in the numa_promotion_rate_limit()
> comment:
>> "For memory tiering mode, too high promotion/demotion throughput may hurt application latency."
> It seems there is no justification for excluding
> pgdat_free_space_enough() triggered promotions from the rate limiting
> mechanism.

In fact, we deliberately don't rate limit promotion when there is enough
free space on fast memory, so that the fast memory can be filled
quickly.  I think it's necessary to stop the fast memory from being
under-utilized ASAP.

>
>
>>
>>> Reported-by: Yasunori Gotou (Fujitsu)
>>> Signed-off-by: Li Zhijian

[snip]

---
Best Regards,
Huang, Ying
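
Editorial note: the rate-limit arithmetic in the quoted example (limit of
100 pages, 90 free-space migrations, then a 20-page request) can be
modeled with a small user-space sketch in C.  This is not kernel code.
struct promo_window, account_free_space() and account_request() are
hypothetical names introduced only for illustration, and the single
fixed window is a simplifying assumption rather than how
numa_promotion_rate_limit() actually tracks candidates over time.  The
count_free_space flag selects between the current behavior (free-space
promotions are not counted as candidates) and the behavior proposed by
the patch (they are counted).

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical user-space model of one rate-limit window.
 * None of these names exist in the kernel.
 */
struct promo_window {
	unsigned long candidates; /* pages counted as PGPROMOTE_CANDIDATE */
	unsigned long limit;      /* rate limit, in pages per window */
};

/*
 * Account nr pages promoted through the free-space path.
 * count_free_space == false models the current behavior,
 * count_free_space == true models the proposed patch.
 */
void account_free_space(struct promo_window *w, unsigned long nr,
			bool count_free_space)
{
	if (count_free_space)
		w->candidates += nr;
}

/*
 * Account an ordinary promotion request of nr pages.
 * Returns true if the window total now exceeds the limit.
 */
bool account_request(struct promo_window *w, unsigned long nr)
{
	assert(w->limit > 0);
	w->candidates += nr;
	return w->candidates > w->limit;
}
```

With limit = 100, calling account_free_space(&w, 90, false) and then
account_request(&w, 20) leaves 20 candidates and does not reach the
limit, which matches the "Before" case.  The same sequence with
count_free_space = true leaves 110 candidates and reaches the limit,
which matches the "After" case.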