From: "Huang, Ying" <ying.huang@intel.com>
To: "Aneesh Kumar K.V"
Cc: Donet Tom, Michal Hocko, Andrew Morton, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Dave Hansen, Mel Gorman, Ben Widawsky, Feng Tang, Andrea Arcangeli, Peter Zijlstra, Ingo Molnar, Rik van Riel, Johannes Weiner, Matthew Wilcox, Mike Kravetz, Vlastimil Babka, Dan Williams, Hugh Dickins, Kefeng Wang, Suren Baghdasaryan
Subject: Re: [PATCH 3/3] mm/numa_balancing: Allow migrate on protnone reference with MPOL_PREFERRED_MANY policy
In-Reply-To: (Aneesh Kumar K.V's message of "Tue, 20 Feb 2024 12:14:48 +0530")
References: <9c3f7b743477560d1c5b12b8c111a584a2cc92ee.1708097962.git.donettom@linux.ibm.com> <8d7737208bd24e754dc7a538a3f7f02de84f1f72.1708097962.git.donettom@linux.ibm.com> <87bk8bprpr.fsf@yhuang6-desk2.ccr.corp.intel.com>
Date: Tue, 20 Feb 2024 15:23:54 +0800
Message-ID: <87y1bfoayd.fsf@yhuang6-desk2.ccr.corp.intel.com>

"Aneesh Kumar K.V" writes:

> On 2/20/24 12:06 PM, Huang, Ying wrote:
>> Donet Tom writes:
>>
>>> On 2/19/24 17:37, Michal Hocko wrote:
>>>> On Sat 17-02-24 01:31:35, Donet Tom wrote:
>>>>> commit bda420b98505 ("numa balancing: migrate on fault among multiple bound
>>>>> nodes") added support for migrate on protnone reference with the MPOL_BIND
>>>>> memory policy. This allowed numa fault migration when the executing node
>>>>> is part of the policy mask for MPOL_BIND. This patch extends migration
>>>>> support to the MPOL_PREFERRED_MANY policy.
>>>>>
>>>>> Currently, we cannot specify MPOL_PREFERRED_MANY with the mempolicy flag
>>>>> MPOL_F_NUMA_BALANCING. This causes issues when we want to use
>>>>> NUMA_BALANCING_MEMORY_TIERING.
>>>>> To effectively use the slow memory tier,
>>>>> the kernel should not allocate pages from the slower memory tier via
>>>>> the allocation zonelist fallback. Instead, we should move cold pages
>>>>> from the faster memory node via memory demotion. For a page allocation,
>>>>> kswapd is only woken up after we try to allocate pages from all nodes in
>>>>> the allocation zone list. This implies that, without using memory
>>>>> policies, we will end up allocating hot pages in the slower memory tier.
>>>>>
>>>>> MPOL_PREFERRED_MANY was added by commit b27abaccf8e8 ("mm/mempolicy: add
>>>>> MPOL_PREFERRED_MANY for multiple preferred nodes") to allow better
>>>>> allocation control when we have memory tiers in the system. With
>>>>> MPOL_PREFERRED_MANY, the user can use a policy node mask consisting only
>>>>> of faster memory nodes. When we fail to allocate pages from the faster
>>>>> memory nodes, kswapd would be woken up, allowing demotion of cold pages
>>>>> to slower memory nodes.
>>>>>
>>>>> With the current kernel, such usage of memory policies implies we can't
>>>>> do page promotion from a slower memory tier to a faster memory tier
>>>>> using numa fault. This patch fixes this issue.
>>>>>
>>>>> For MPOL_PREFERRED_MANY, if the executing node is in the policy node
>>>>> mask, we allow numa migration to the executing node. If the executing
>>>>> node is not in the policy node mask but the folio is already allocated
>>>>> based on policy preference (the folio node is in the policy node mask),
>>>>> we don't allow numa migration. If both the executing node and the folio
>>>>> node are outside the policy node mask, we allow numa migration to the
>>>>> executing node.
>>>> The feature makes sense to me. How has this been tested? Do you have any
>>>> numbers to present?
>>>
>>> Hi Michal,
>>>
>>> I have a test program which allocates memory on a specified node and
>>> triggers promotion or migration (it keeps accessing the pages).
>>>
>>> Without this patch, if we set MPOL_PREFERRED_MANY, promotion or migration
>>> was not happening; with this patch I could see pages getting migrated or
>>> promoted.
>>>
>>> My system has 2 CPU+DRAM nodes (Tier 1) and 1 PMEM node (Tier 2). Below
>>> are my test results.
>>>
>>> In the tables below, N0 and N1 are Tier 1 nodes and N6 is the Tier 2 node.
>>> Exec_Node is the execution node, Policy is the set of nodes in the
>>> nodemask, and "Curr Location Pages" is the node where the pages are
>>> located before migration or promotion starts.
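
For reference, a minimal userspace sketch of such a reproducer might look
like the following. This is an illustration under stated assumptions, not
Donet's actual test program: the buffer size and node numbers are made up,
and the uapi mempolicy constants are spelled out so the sketch is
self-contained.

    #define _GNU_SOURCE
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>
    #include <sys/mman.h>
    #include <sys/syscall.h>

    /* Values from the uapi <linux/mempolicy.h>, spelled out here. */
    #define MPOL_PREFERRED_MANY    5
    #define MPOL_F_NUMA_BALANCING  (1 << 13)

    int main(void)
    {
            size_t len = 1UL << 28;                       /* 256 MiB, illustrative */
            unsigned long mask = (1UL << 0) | (1UL << 1); /* prefer N0 and N1 */
            char *buf;

            /* On kernels without this series, combining these two is
             * rejected with EINVAL; with it, the call succeeds. */
            if (syscall(SYS_set_mempolicy,
                        MPOL_PREFERRED_MANY | MPOL_F_NUMA_BALANCING,
                        &mask, 8 * sizeof(mask))) {
                    perror("set_mempolicy");
                    return 1;
            }

            buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
            if (buf == MAP_FAILED) {
                    perror("mmap");
                    return 1;
            }
            memset(buf, 1, len);                          /* fault the pages in */

            /* Keep touching the pages so NUMA hint faults can migrate or
             * promote them toward the executing node. */
            for (;;)
                    for (size_t i = 0; i < len; i += 4096)
                            buf[i]++;
    }

Placing the buffer on a specific source node first (e.g., N6 for the
promotion rows below) would additionally require an mbind() of the region
before the first touch, and NUMA balancing must be enabled (sysctl
kernel.numa_balancing, mode 2 for memory tiering).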
>>>
>>> Test Results
>>> ------------
>>>
>>> Scenario 1: the executing node is in the policy node mask
>>> ===========================================================================
>>> Exec_Node   Policy     Curr Location Pages   Observations
>>> ===========================================================================
>>> N0          N0 N1 N6   N1                    Pages migrated from N1 to N0
>>> N0          N0 N1 N6   N6                    Pages promoted from N6 to N0
>>> N0          N0 N1      N1                    Pages migrated from N1 to N0
>>> N0          N0 N1      N6                    Pages promoted from N6 to N0
>>>
>>> Scenario 2: the folio node is in the policy node mask and the executing node is not
>>> ===========================================================================
>>> Exec_Node   Policy     Curr Location Pages   Observations
>>> ===========================================================================
>>> N0          N1 N6      N1                    Pages are not migrating to N0
>>> N0          N1 N6      N6                    Pages are not migrating to N0
>>> N0          N1         N1                    Pages are not migrating to N0
>>>
>>> Scenario 3: both the folio node and the executing node are outside the policy node mask
>>> ===========================================================================
>>> Exec_Node   Policy     Curr Location Pages   Observations
>>> ===========================================================================
>>> N0          N1         N6                    Pages promoted from N6 to N0
>>> N0          N6         N1                    Pages migrated from N1 to N0
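
The three scenarios above correspond to the three cases in the patch
description. As a condensed sketch of that rule (a hypothetical helper in
kernel context; the name and exact form are illustrative, not the patch's
actual code):

    /* Hypothetical condensation of the three-case rule above. */
    static bool numa_migrate_ok(const nodemask_t *policy_nodes,
                                int exec_node, int folio_node)
    {
            /* Scenario 1: the executing node is preferred, so migrate
             * or promote the folio toward it. */
            if (node_isset(exec_node, *policy_nodes))
                    return true;

            /* Scenario 2: the folio already sits on a preferred node,
             * so leave it where it is. */
            if (node_isset(folio_node, *policy_nodes))
                    return false;

            /* Scenario 3: both are outside the preferred set, so
             * follow the accessing CPU. */
            return true;
    }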
>>>
>>
>> Please use some benchmarks (e.g., redis + memtier) and show the
>> proc-vmstat stats and benchmark scores.
>
> Without this change, numa fault migration is not supported with the
> MPOL_PREFERRED_MANY policy, so there is no performance comparison with
> and without the patch. W.r.t. the effectiveness of numa fault migration,
> that is a different topic from this patch.

IIUC, the goal of the patch is to optimize performance, right? If so,
benchmark scores will help justify the change.

--
Best Regards,
Huang, Ying