From: "Huang, Ying"
To: Donet Tom
Cc: Michal Hocko, Andrew Morton, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org, Aneesh Kumar, Dave Hansen, Mel Gorman,
 Ben Widawsky, Feng Tang, Andrea Arcangeli, Peter Zijlstra, Ingo Molnar,
 Rik van Riel, Johannes Weiner, Matthew Wilcox, Mike Kravetz,
 Vlastimil Babka, Dan Williams, Hugh Dickins, Kefeng Wang,
 Suren Baghdasaryan
Subject: Re: [PATCH 3/3] mm/numa_balancing:Allow migrate on protnone reference with MPOL_PREFERRED_MANY policy
In-Reply-To: (Donet Tom's message of "Mon, 19 Feb 2024 19:14:48 +0530")
References: <9c3f7b743477560d1c5b12b8c111a584a2cc92ee.1708097962.git.donettom@linux.ibm.com>
 <8d7737208bd24e754dc7a538a3f7f02de84f1f72.1708097962.git.donettom@linux.ibm.com>
Date: Tue, 20 Feb 2024 14:36:32 +0800
Message-ID: <87bk8bprpr.fsf@yhuang6-desk2.ccr.corp.intel.com>

Donet Tom writes:

> On 2/19/24 17:37, Michal Hocko wrote:
>> On Sat 17-02-24 01:31:35, Donet Tom wrote:
>>> commit bda420b98505 ("numa balancing: migrate on fault among multiple bound
>>> nodes") added support for migrate on protnone reference with MPOL_BIND
>>> memory policy. This allowed numa fault migration when the executing node
>>> is part of the policy mask for MPOL_BIND. This patch extends migration
>>> support to MPOL_PREFERRED_MANY policy.
>>>
>>> Currently, we cannot specify MPOL_PREFERRED_MANY with the mempolicy flag
>>> MPOL_F_NUMA_BALANCING. This causes issues when we want to use
>>> NUMA_BALANCING_MEMORY_TIERING. To effectively use the slow memory tier,
>>> the kernel should not allocate pages from the slower memory tier via
>>> allocation control zonelist fallback. Instead, we should move cold pages
>>> from the faster memory node via memory demotion. For a page allocation,
>>> kswapd is only woken up after we try to allocate pages from all nodes in
>>> the allocation zone list. This implies that, without using memory
>>> policies, we will end up allocating hot pages in the slower memory tier.
>>>
>>> MPOL_PREFERRED_MANY was added by commit b27abaccf8e8 ("mm/mempolicy: add
>>> MPOL_PREFERRED_MANY for multiple preferred nodes") to allow better
>>> allocation control when we have memory tiers in the system. With
>>> MPOL_PREFERRED_MANY, the user can use a policy node mask consisting only
>>> of faster memory nodes. When we fail to allocate pages from the faster
>>> memory node, kswapd would be woken up, allowing demotion of cold pages
>>> to slower memory nodes.
>>>
>>> With the current kernel, such usage of memory policies implies we can't
>>> do page promotion from a slower memory tier to a faster memory tier
>>> using numa fault. This patch fixes this issue.
>>>
>>> For MPOL_PREFERRED_MANY, if the executing node is in the policy node
>>> mask, we allow numa migration to the executing nodes. If the executing
>>> node is not in the policy node mask but the folio is already allocated
>>> based on policy preference (the folio node is in the policy node mask),
>>> we don't allow numa migration. If both the executing node and folio node
>>> are outside the policy node mask, we allow numa migration to the
>>> executing nodes.
>>
>> The feature makes sense to me. How has this been tested? Do you have any
>> numbers to present?
>
> Hi Michal,
>
> I have a test program which allocates memory on a specified node and
> triggers promotion or migration by continuously accessing the pages.
>
> Without this patch, promotion or migration was not happening when
> MPOL_PREFERRED_MANY was set; with this patch I could see the pages
> getting migrated or promoted.
>
> My system has 2 CPU+DRAM nodes (Tier 1) and 1 PMEM node (Tier 2). Below
> are my test results.
>
> In the tables below, N0 and N1 are Tier 1 nodes and N6 is the Tier 2
> node. Exec_Node is the execution node, Policy is the set of nodes in the
> nodemask, and "Curr Location Pages" is the node where the pages were
> located before migration or promotion started.
>
> Test Results
> ------------
>
> Scenario 1: the executing node is in the policy node mask
> ===========================================================================
> Exec_Node    Policy      Curr Location Pages    Observations
> ===========================================================================
> N0           N0 N1 N6    N1                     Pages Migrated from N1 to N0
> N0           N0 N1 N6    N6                     Pages Promoted from N6 to N0
> N0           N0 N1       N1                     Pages Migrated from N1 to N0
> N0           N0 N1       N6                     Pages Promoted from N6 to N0
>
> Scenario 2: the folio node is in the policy node mask and the executing
> node is not
> ===========================================================================
> Exec_Node    Policy      Curr Location Pages    Observations
> ===========================================================================
> N0           N1 N6       N1                     Pages are not migrating to N0
> N0           N1 N6       N6                     Pages are not migrating to N0
> N0           N1          N1                     Pages are not migrating to N0
>
> Scenario 3: both the folio node and the executing node are outside the
> policy node mask
> ===========================================================================
> Exec_Node    Policy      Curr Location Pages    Observations
> ===========================================================================
> N0           N1          N6                     Pages Promoted from N6 to N0
> N0           N6          N1                     Pages Migrated from N1 to N0

Please use some benchmarks (e.g., redis + memtier) and show the
proc-vmstat stats and benchmark score.

Not part of the kernel series, but don't forget to submit patches to the
man-pages project and the numactl tool too, so that users can use it.

--
Best Regards,
Huang, Ying

> Thanks
> Donet Tom
>
>>
>>> Signed-off-by: Aneesh Kumar K.V (IBM)
>>> Signed-off-by: Donet Tom
>>> ---
>>>  mm/mempolicy.c | 28 ++++++++++++++++++++++++++--
>>>  1 file changed, 26 insertions(+), 2 deletions(-)
>>
>> I haven't spotted anything obviously wrong in the patch itself, but I
>> admit this is not an area I am actively familiar with, so I might be
>> missing something.
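
For readers who want to try the combination discussed above, the userspace
side would look roughly like the minimal sketch below. It is only an
illustration, not part of the series: the node numbers are hypothetical,
and it assumes libnuma's <numaif.h> declarations (build with -lnuma) on a
kernel carrying this patch; without the patch the set_mempolicy() call is
expected to fail with EINVAL, which is exactly the restriction being
lifted.

/*
 * Minimal illustration (not from the series): prefer the fast-tier nodes
 * with MPOL_PREFERRED_MANY and let NUMA balancing promote hot pages back
 * to them.  Nodes 0 and 1 are hypothetical; adjust to your topology.
 */
#include <numaif.h>
#include <stdio.h>
#include <stdlib.h>

#ifndef MPOL_PREFERRED_MANY
#define MPOL_PREFERRED_MANY	5		/* uapi value; older headers may lack it */
#endif
#ifndef MPOL_F_NUMA_BALANCING
#define MPOL_F_NUMA_BALANCING	(1 << 13)	/* optimize placement with NUMA balancing */
#endif

int main(void)
{
	/* Fast (DRAM) tier: nodes 0 and 1 in this example. */
	unsigned long nodemask = (1UL << 0) | (1UL << 1);

	if (set_mempolicy(MPOL_PREFERRED_MANY | MPOL_F_NUMA_BALANCING,
			  &nodemask, 8 * sizeof(nodemask)) != 0) {
		/* Fails with EINVAL on kernels without this patch. */
		perror("set_mempolicy");
		return EXIT_FAILURE;
	}

	/*
	 * Memory touched from here on is preferentially allocated on nodes
	 * 0-1; under pressure cold pages can be demoted to the slow tier,
	 * and NUMA hint faults may now promote hot pages back.
	 */
	return EXIT_SUCCESS;
}

With such a policy in place, the three scenarios in the tables above map
onto whether the executing node and the folio's current node fall inside
or outside the nodemask passed to set_mempolicy().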