From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <29936a4b-95c3-4592-8eae-7d4741e4a51f@linux.ibm.com>
Date: Mon, 25 Mar 2024 10:30:32 +0530
MIME-Version: 1.0
User-Agent: Mozilla Thunderbird
Subject: Re: [PATCH v3 2/2] mm/numa_balancing:Allow migrate on protnone reference with MPOL_PREFERRED_MANY policy
Content-Language: en-US
To: "Huang, Ying"
Cc: Andrew Morton, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Aneesh Kumar, Michal Hocko, Dave Hansen, Mel Gorman, Feng Tang, Andrea Arcangeli, Peter Zijlstra, Ingo Molnar, Rik van Riel, Johannes Weiner, Matthew Wilcox, Vlastimil Babka, Dan Williams, Hugh Dickins, Kefeng Wang, Suren Baghdasaryan
References: <87h6gyr7jf.fsf@yhuang6-desk2.ccr.corp.intel.com> <875xxbqb51.fsf@yhuang6-desk2.ccr.corp.intel.com>
From: Donet Tom <donettom@linux.ibm.com>
In-Reply-To: <875xxbqb51.fsf@yhuang6-desk2.ccr.corp.intel.com>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit

On 3/25/24 08:18, Huang, Ying wrote:
> Donet Tom writes:
>
>> On 3/22/24 14:02, Huang, Ying wrote:
>>> Donet Tom writes:
>>>
>>>> commit bda420b98505 ("numa balancing: migrate on fault among multiple
>>>> bound nodes") added support for migrate on protnone reference with the
>>>> MPOL_BIND memory policy. This allowed NUMA fault migration when the
>>>> executing node is part of the policy mask for MPOL_BIND.
>>>> This patch extends migration support to the MPOL_PREFERRED_MANY policy.
>>>>
>>>> Currently, we cannot specify MPOL_PREFERRED_MANY with the mempolicy flag
>>>> MPOL_F_NUMA_BALANCING. This causes issues when we want to use
>>>> NUMA_BALANCING_MEMORY_TIERING. To effectively use the slow memory tier,
>>>> the kernel should not allocate pages from the slower memory tier via
>>>> allocation-control zonelist fallback. Instead, we should move cold pages
>>>> from the faster memory node via memory demotion. For a page allocation,
>>>> kswapd is only woken up after we try to allocate pages from all nodes in
>>>> the allocation zone list. This implies that, without using memory
>>>> policies, we will end up allocating hot pages in the slower memory tier.
>>>>
>>>> MPOL_PREFERRED_MANY was added by commit b27abaccf8e8 ("mm/mempolicy: add
>>>> MPOL_PREFERRED_MANY for multiple preferred nodes") to allow better
>>>> allocation control when we have memory tiers in the system. With
>>>> MPOL_PREFERRED_MANY, the user can use a policy node mask consisting only
>>>> of faster memory nodes. When we fail to allocate pages from the faster
>>>> memory node, kswapd would be woken up, allowing demotion of cold pages
>>>> to slower memory nodes.
>>>>
>>>> With the current kernel, such usage of memory policies implies we can't
>>>> do page promotion from a slower memory tier to a faster memory tier
>>>> using NUMA faults. This patch fixes this issue.
>>>>
>>>> For MPOL_PREFERRED_MANY, if the executing node is in the policy node
>>>> mask, we allow NUMA migration to the executing node. If the executing
>>>> node is not in the policy node mask, we do not allow NUMA migration.
>>>
>>> Can we provide more information about this? I suggest using an
>>> example; for instance, pages may be distributed among multiple sockets
>>> unexpectedly.
>>
>> Thank you for your suggestion. However, this commit message explains all
>> the scenarios.
>
> Yes. The commit message is correct and covers many cases.
> What I suggested is to describe why we do that. An example cannot cover
> every possibility, but it is easy to understand. For example, something
> like the below?
>
> For example, on a 2-socket system, there are N0, N1, N2 in socket 0 and
> N3 in socket 1. N0, N1, N3 have fast memory and CPUs, while N2 has slow
> memory and no CPUs. For a workload, we may use MPOL_PREFERRED_MANY with
> a nodemask with N0 and N1 set, because the workload runs on the CPUs of
> socket 0 most of the time. Then, even if the workload runs on the CPUs
> of N3 occasionally, we will not try to migrate the workload pages from
> N2 to N3, because users may want to avoid cross-socket access as much as
> possible in the long term.

Thank you. I will change the commit message and post V4.

Thanks
Donet Tom

>
>> For example, consider a system with 3 NUMA nodes (N0, N1 and N6).
>> N0 and N1 are tier-1 DRAM nodes and N6 is a tier-2 PMEM node.
>>
>> Scenario 1: The process is executing on N1,
>> and the executing node is in the policy node mask.
>> Curr Loc Pages - the NUMA node where the pages are present (folio node)
>> ==================================================================================
>> Process    Policy          Curr Loc Pages      Observations
>> -----------------------------------------------------------------------------------
>> N1           N0 N1 N6              N0                   Pages Migrated from N0 to N1
>> N1           N0 N1 N6              N6                   Pages Migrated from N6 to N1
>> N1           N0 N1                 N1                   Pages Migrated from N1 to N6
> Pages are not Migrating ?
>
>> N1           N0 N1                 N6                   Pages Migrated from N6 to N1
>> ------------------------------------------------------------------------------------
>>
>> Scenario 2: The process is executing on N1,
>> and the executing node is NOT in the policy node mask.
>> Curr Loc Pages - the NUMA node where the pages are present (folio node)
>> ===================================================================================
>> Process    Policy       Curr Loc Pages    Observations
>> -----------------------------------------------------------------------------------
>> N1            N0 N6             N0              Pages are not Migrating
>> N1            N0 N6             N6              Pages are not Migrating
>> N1            N0                N0              Pages are not Migrating
>> ------------------------------------------------------------------------------------
>>
>> Scenario 3: The process is executing on N1,
>> and the executing node and folio nodes are NOT in the policy node mask.
>> Curr Loc Pages - the NUMA node where the pages are present (folio node)
>> ====================================================================================
>> Thread    Policy       Curr Loc Pages        Observations
>> ------------------------------------------------------------------------------------
>> N1          N0               N6                 Pages are not Migrating
>> N1          N6               N0                 Pages are not Migrating
>> ------------------------------------------------------------------------------------
>>
>> We can conclude that even if the pages are distributed among multiple
>> sockets, if the executing node is in the policy node mask, we allow NUMA
>> migration to the executing node. If the executing node is not in the
>> policy node mask, we do not allow NUMA migration.
>>
> [snip]
>
> --
> Best Regards,
> Huang, Ying
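
P.S. The scenario tables above reduce to a single check on the executing
node. Below is a minimal model of that decision for MPOL_PREFERRED_MANY
with MPOL_F_NUMA_BALANCING, a sketch of the intended behavior only, not
the kernel implementation; the helper name is made up for illustration:

```python
# Model (not kernel code) of the NUMA fault migration decision described
# above for MPOL_PREFERRED_MANY with MPOL_F_NUMA_BALANCING.

def should_migrate_on_fault(policy_nodes, exec_node):
    """Migrate the faulting page to the executing node only when the
    executing node is in the policy node mask. The folio's current node
    does not change the decision; migration is simply a no-op when the
    page is already on the executing node."""
    return exec_node in policy_nodes

# Scenario 1: executing node N1 is in the mask {N0, N1, N6} -> migrate.
assert should_migrate_on_fault({0, 1, 6}, exec_node=1)
# Scenario 2: executing node N1 is not in the mask {N0, N6} -> no migration.
assert not should_migrate_on_fault({0, 6}, exec_node=1)
# Scenario 3: the executing node is not in the mask {N0} -> no migration.
assert not should_migrate_on_fault({0}, exec_node=1)
```

The folio node deliberately does not appear as an input: once the
executing node is in the mask, the target of any migration is the
executing node itself.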