From: "Huang, Ying" <ying.huang@intel.com>
To: "Aneesh Kumar K.V"
Cc: Donet Tom, Michal Hocko, Andrew Morton, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Dave Hansen, Mel Gorman, Ben Widawsky, Feng Tang, Andrea Arcangeli, Peter Zijlstra, Ingo Molnar, Rik van Riel, Johannes Weiner, Matthew Wilcox, Mike Kravetz, Vlastimil Babka, Dan Williams, Hugh Dickins, Kefeng Wang, Suren Baghdasaryan
Subject: Re: [PATCH 3/3] mm/numa_balancing: Allow migrate on protnone reference with MPOL_PREFERRED_MANY policy
In-Reply-To: (Aneesh Kumar K.V's message of "Tue, 20 Feb 2024 12:14:48 +0530")
References: <9c3f7b743477560d1c5b12b8c111a584a2cc92ee.1708097962.git.donettom@linux.ibm.com> <8d7737208bd24e754dc7a538a3f7f02de84f1f72.1708097962.git.donettom@linux.ibm.com> <87bk8bprpr.fsf@yhuang6-desk2.ccr.corp.intel.com>
Date: Tue, 20 Feb 2024 15:23:54 +0800
Message-ID: <87y1bfoayd.fsf@yhuang6-desk2.ccr.corp.intel.com>

"Aneesh Kumar K.V" writes:

> On 2/20/24 12:06 PM, Huang, Ying wrote:
>> Donet Tom writes:
>>
>>> On 2/19/24 17:37, Michal Hocko wrote:
>>>> On Sat 17-02-24 01:31:35, Donet Tom wrote:
>>>>> commit bda420b98505 ("numa balancing: migrate on fault among multiple bound
>>>>> nodes") added support for migrate on protnone reference with the MPOL_BIND
>>>>> memory policy. This allowed numa fault migration when the executing node
>>>>> is part of the policy mask for MPOL_BIND. This patch extends migration
>>>>> support to the MPOL_PREFERRED_MANY policy.
>>>>>
>>>>> Currently, we cannot specify MPOL_PREFERRED_MANY with the mempolicy flag
>>>>> MPOL_F_NUMA_BALANCING. This causes issues when we want to use
>>>>> NUMA_BALANCING_MEMORY_TIERING.
>>>>> To effectively use the slow memory tier,
>>>>> the kernel should not allocate pages from the slower memory tier via
>>>>> the allocation zonelist fallback. Instead, we should move cold pages
>>>>> from the faster memory node via memory demotion. For a page allocation,
>>>>> kswapd is only woken up after we try to allocate pages from all nodes in
>>>>> the allocation zone list. This implies that, without using memory
>>>>> policies, we will end up allocating hot pages in the slower memory tier.
>>>>>
>>>>> MPOL_PREFERRED_MANY was added by commit b27abaccf8e8 ("mm/mempolicy: add
>>>>> MPOL_PREFERRED_MANY for multiple preferred nodes") to allow better
>>>>> allocation control when we have memory tiers in the system. With
>>>>> MPOL_PREFERRED_MANY, the user can use a policy node mask consisting only
>>>>> of faster memory nodes. When we fail to allocate pages from the faster
>>>>> memory nodes, kswapd would be woken up, allowing demotion of cold pages
>>>>> to slower memory nodes.
>>>>>
>>>>> With the current kernel, such usage of memory policies implies we can't
>>>>> do page promotion from a slower memory tier to a faster memory tier
>>>>> using numa fault. This patch fixes this issue.
>>>>>
>>>>> For MPOL_PREFERRED_MANY, if the executing node is in the policy node
>>>>> mask, we allow numa migration to the executing node. If the executing
>>>>> node is not in the policy node mask but the folio is already allocated
>>>>> based on policy preference (the folio node is in the policy node mask),
>>>>> we don't allow numa migration. If both the executing node and the folio
>>>>> node are outside the policy node mask, we allow numa migration to the
>>>>> executing node.
>>>> The feature makes sense to me. How has this been tested? Do you have any
>>>> numbers to present?
>>>
>>> Hi Michal,
>>>
>>> I have a test program which allocates memory on a specified node and
>>> triggers promotion or migration (it keeps accessing the pages).
>>>
>>> Without this patch, if we set MPOL_PREFERRED_MANY, promotion or migration
>>> was not happening; with this patch I could see pages getting migrated or
>>> promoted.
>>>
>>> My system has 2 CPU+DRAM nodes (Tier 1) and 1 PMEM node (Tier 2). Below
>>> are my test results.
>>>
>>> In the tables below, N0 and N1 are Tier 1 nodes and N6 is the Tier 2 node.
>>> Exec_Node is the execution node, Policy is the set of nodes in the
>>> nodemask, and "Curr Location Pages" is the node where the pages are
>>> located before migration or promotion starts.
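
For reference, a minimal userspace sketch of such a reproducer might look
like the following. This is an illustration under stated assumptions, not
Donet's actual test program: the buffer size and node numbers are made up,
and the uapi mempolicy constants are spelled out so the sketch is
self-contained.

    #define _GNU_SOURCE
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>
    #include <sys/mman.h>
    #include <sys/syscall.h>

    /* Values from the uapi <linux/mempolicy.h>, spelled out here. */
    #define MPOL_PREFERRED_MANY    5
    #define MPOL_F_NUMA_BALANCING  (1 << 13)

    int main(void)
    {
            size_t len = 1UL << 28;                       /* 256 MiB, illustrative */
            unsigned long mask = (1UL << 0) | (1UL << 1); /* prefer N0 and N1 */
            char *buf;

            /* On kernels without this series, combining these two is
             * rejected with EINVAL; with it, the call succeeds. */
            if (syscall(SYS_set_mempolicy,
                        MPOL_PREFERRED_MANY | MPOL_F_NUMA_BALANCING,
                        &mask, 8 * sizeof(mask))) {
                    perror("set_mempolicy");
                    return 1;
            }

            buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
            if (buf == MAP_FAILED) {
                    perror("mmap");
                    return 1;
            }
            memset(buf, 1, len);                          /* fault the pages in */

            /* Keep touching the pages so NUMA hint faults can migrate or
             * promote them toward the executing node. */
            for (;;)
                    for (size_t i = 0; i < len; i += 4096)
                            buf[i]++;
    }

Placing the buffer on a specific source node first (e.g., N6 for the
promotion rows below) would additionally require an mbind() of the region
before the first touch, and NUMA balancing must be enabled (sysctl
kernel.numa_balancing, mode 2 for memory tiering).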
>>>
>>> Test Results
>>> ------------
>>>
>>> Scenario 1: the executing node is in the policy node mask
>>> ===========================================================================
>>> Exec_Node   Policy     Curr Location Pages   Observations
>>> ===========================================================================
>>> N0          N0 N1 N6   N1                    Pages migrated from N1 to N0
>>> N0          N0 N1 N6   N6                    Pages promoted from N6 to N0
>>> N0          N0 N1      N1                    Pages migrated from N1 to N0
>>> N0          N0 N1      N6                    Pages promoted from N6 to N0
>>>
>>> Scenario 2: the folio node is in the policy node mask and the executing node is not
>>> ===========================================================================
>>> Exec_Node   Policy     Curr Location Pages   Observations
>>> ===========================================================================
>>> N0          N1 N6      N1                    Pages are not migrating to N0
>>> N0          N1 N6      N6                    Pages are not migrating to N0
>>> N0          N1         N1                    Pages are not migrating to N0
>>>
>>> Scenario 3: both the folio node and the executing node are outside the policy node mask
>>> ===========================================================================
>>> Exec_Node   Policy     Curr Location Pages   Observations
>>> ===========================================================================
>>> N0          N1         N6                    Pages promoted from N6 to N0
>>> N0          N6         N1                    Pages migrated from N1 to N0
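
The three scenarios above correspond to the three cases in the patch
description. As a condensed sketch of that rule (a hypothetical helper in
kernel context; the name and exact form are illustrative, not the patch's
actual code):

    /* Hypothetical condensation of the three-case rule above. */
    static bool numa_migrate_ok(const nodemask_t *policy_nodes,
                                int exec_node, int folio_node)
    {
            /* Scenario 1: the executing node is preferred, so migrate
             * or promote the folio toward it. */
            if (node_isset(exec_node, *policy_nodes))
                    return true;

            /* Scenario 2: the folio already sits on a preferred node,
             * so leave it where it is. */
            if (node_isset(folio_node, *policy_nodes))
                    return false;

            /* Scenario 3: both are outside the preferred set, so
             * follow the accessing CPU. */
            return true;
    }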
>>>
>>
>> Please use some benchmarks (e.g., redis + memtier) and show the
>> proc-vmstat stats and benchmark scores.
>
> Without this change, numa fault migration is not supported with the
> MPOL_PREFERRED_MANY policy, so there is no performance comparison with
> and without the patch. W.r.t. the effectiveness of numa fault migration,
> that is a different topic from this patch.

IIUC, the goal of the patch is to optimize performance, right? If so,
benchmark scores will help justify the change.

--
Best Regards,
Huang, Ying