From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <61054afa-9f18-45f1-987d-e6f242012096@linux.ibm.com>
Date: Mon, 25 Mar 2024 10:32:18 +0530
Subject: Re: [PATCH v3 2/2] mm/numa_balancing:Allow migrate on protnone reference with MPOL_PREFERRED_MANY policy
From: Donet Tom <donettom@linux.ibm.com>
To: "Huang, Ying"
Cc: Andrew Morton, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
 Aneesh Kumar, Michal Hocko, Dave Hansen, Mel Gorman, Feng Tang,
 Andrea Arcangeli, Peter Zijlstra, Ingo Molnar, Rik van Riel,
 Johannes Weiner, Matthew Wilcox, Vlastimil Babka, Dan Williams,
 Hugh Dickins, Kefeng Wang, Suren Baghdasaryan
References: <87h6gyr7jf.fsf@yhuang6-desk2.ccr.corp.intel.com>
 <875xxbqb51.fsf@yhuang6-desk2.ccr.corp.intel.com>
In-Reply-To: <875xxbqb51.fsf@yhuang6-desk2.ccr.corp.intel.com>
Content-Type: text/plain; charset=UTF-8; format=flowed

On 3/25/24 08:18, Huang, Ying wrote:
> Donet Tom writes:
>
>> On 3/22/24 14:02, Huang, Ying wrote:
>>> Donet Tom writes:
>>>
>>>> commit bda420b98505 ("numa balancing: migrate on fault among multiple
>>>> bound nodes") added support for migrate on protnone reference with
>>>> MPOL_BIND memory policy. This allowed numa fault migration when the
>>>> executing node is part of the policy mask for MPOL_BIND.
>>>> This patch extends migration support to MPOL_PREFERRED_MANY policy.
>>>>
>>>> Currently, we cannot specify MPOL_PREFERRED_MANY with the mempolicy flag
>>>> MPOL_F_NUMA_BALANCING. This causes issues when we want to use
>>>> NUMA_BALANCING_MEMORY_TIERING. To effectively use the slow memory tier,
>>>> the kernel should not allocate pages from the slower memory tier via
>>>> allocation control zonelist fallback. Instead, we should move cold pages
>>>> from the faster memory node via memory demotion. For a page allocation,
>>>> kswapd is only woken up after we try to allocate pages from all nodes in
>>>> the allocation zone list. This implies that, without using memory
>>>> policies, we will end up allocating hot pages in the slower memory tier.
>>>>
>>>> MPOL_PREFERRED_MANY was added by commit b27abaccf8e8 ("mm/mempolicy: add
>>>> MPOL_PREFERRED_MANY for multiple preferred nodes") to allow better
>>>> allocation control when we have memory tiers in the system. With
>>>> MPOL_PREFERRED_MANY, the user can use a policy node mask consisting only
>>>> of faster memory nodes. When we fail to allocate pages from the faster
>>>> memory node, kswapd would be woken up, allowing demotion of cold pages
>>>> to slower memory nodes.
>>>>
>>>> With the current kernel, such usage of memory policies implies we can't
>>>> do page promotion from a slower memory tier to a faster memory tier
>>>> using numa fault. This patch fixes this issue.
>>>>
>>>> For MPOL_PREFERRED_MANY, if the executing node is in the policy node
>>>> mask, we allow numa migration to the executing nodes. If the executing
>>>> node is not in the policy node mask, we do not allow numa migration.
>>>
>>> Can we provide more information about this? I suggest to use an
>>> example, for instance, pages may be distributed among multiple sockets
>>> unexpectedly.
>>
>> Thank you for your suggestion. However, this commit message explains
>> all the scenarios.
>
> Yes. The commit message is correct and covers many cases.
> What I suggested is to describe why we do that. An example cannot cover
> all possibilities, but it is easy to understand. For example, something
> as below?
>
> For example, on a 2-socket system, there are N0, N1, N2 in socket 0 and
> N3 in socket 1. N0, N1, N3 have fast memory and CPU, while N2 has slow
> memory and no CPU. For a workload, we may use MPOL_PREFERRED_MANY with a
> nodemask with N0 and N1 set because the workload runs on CPUs of socket
> 0 most of the time. Then, even if the workload runs on CPUs of N3
> occasionally, we will not try to migrate the workload pages from N2 to
> N3 because users may want to avoid cross-socket access as much as
> possible in the long term.
>
>> For example, consider a system with 3 numa nodes (N0, N1 and N6).
>> N0 and N1 are tier 1 DRAM nodes and N6 is a tier 2 PMEM node.
>>
>> Scenario 1: The process is executing on N1,
>> and the executing node is in the policy node mask.
>> Curr Loc Pages - The numa node where the page is present (folio node)
>> ====================================================================
>> Process   Policy     Curr Loc Pages   Observations
>> --------------------------------------------------------------------
>> N1        N0 N1 N6   N0               Pages Migrated from N0 to N1
>> N1        N0 N1 N6   N6               Pages Migrated from N6 to N1
>> N1        N0 N1      N1               Pages Migrated from N1 to N6
>
> Pages are not Migrating ?

Sorry, this is a mistake. In this case pages are not migrating.

Thanks,
Donet.
>
>> N1        N0 N1      N6               Pages Migrated from N6 to N1
>> --------------------------------------------------------------------
>>
>> Scenario 2: The process is executing on N1,
>> and the executing node is NOT in the policy node mask.
>> Curr Loc Pages - The numa node where the page is present (folio node)
>> ====================================================================
>> Process   Policy     Curr Loc Pages   Observations
>> --------------------------------------------------------------------
>> N1        N0 N6      N0               Pages are not Migrating
>> N1        N0 N6      N6               Pages are not Migrating
>> N1        N0         N0               Pages are not Migrating
>> --------------------------------------------------------------------
>>
>> Scenario 3: The process is executing on N1,
>> and the executing node and folio nodes are NOT in the policy node mask.
>> Curr Loc Pages - The numa node where the page is present (folio node)
>> ====================================================================
>> Thread    Policy     Curr Loc Pages   Observations
>> --------------------------------------------------------------------
>> N1        N0         N6               Pages are not Migrating
>> N1        N6         N0               Pages are not Migrating
>> --------------------------------------------------------------------
>>
>> We can conclude that even if the pages are distributed among multiple
>> sockets, if the executing node is in the policy node mask, we allow
>> numa migration to the executing nodes. If the executing node is not in
>> the policy node mask, we do not allow numa migration.
>>
> [snip]
>
> --
> Best Regards,
> Huang, Ying