From: "Huang, Ying"
To: Donet Tom
Cc: Andrew Morton, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	Aneesh Kumar, Michal Hocko, Dave Hansen, Mel Gorman, Feng Tang,
	Andrea Arcangeli, Peter Zijlstra, Ingo Molnar, Rik van Riel,
	Johannes Weiner, Matthew Wilcox, Vlastimil Babka, Dan Williams,
	Hugh Dickins, Kefeng Wang, Suren Baghdasaryan
Subject: Re: [PATCH v3 2/2] mm/numa_balancing: Allow migrate on protnone reference with MPOL_PREFERRED_MANY policy
In-Reply-To: (Donet Tom's message of "Thu, 21 Mar 2024 06:29:51 -0500")
Date: Fri, 22 Mar 2024 16:32:20 +0800
Message-ID: <87h6gyr7jf.fsf@yhuang6-desk2.ccr.corp.intel.com>

Donet Tom writes:

> commit bda420b98505 ("numa balancing: migrate on fault among multiple
> bound nodes") added support for migrate on protnone reference with
> MPOL_BIND memory policy. This allowed numa fault migration when the
> executing node is part of the policy mask for MPOL_BIND. This patch
> extends migration support to the MPOL_PREFERRED_MANY policy.
>
> Currently, we cannot specify MPOL_PREFERRED_MANY with the mempolicy flag
> MPOL_F_NUMA_BALANCING. This causes issues when we want to use
> NUMA_BALANCING_MEMORY_TIERING. To effectively use the slow memory tier,
> the kernel should not allocate pages from the slower memory tier via
> allocation control zonelist fallback. Instead, we should move cold pages
> from the faster memory node via memory demotion. For a page allocation,
> kswapd is only woken up after we try to allocate pages from all nodes in
> the allocation zone list. This implies that, without using memory
> policies, we will end up allocating hot pages in the slower memory tier.
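To make the intended usage concrete, here is a minimal userspace sketch
assuming the set_mempolicy(2) interface from libnuma's numaif.h (build
with -lnuma); the node numbers and the fallback #defines are
illustrative assumptions, not taken from the patch:

#define _GNU_SOURCE
#include <numaif.h>		/* set_mempolicy(), MPOL_* (link with -lnuma) */
#include <stdio.h>
#include <string.h>
#include <errno.h>

#ifndef MPOL_PREFERRED_MANY
#define MPOL_PREFERRED_MANY 5		/* uapi value, kernels v5.15+ */
#endif
#ifndef MPOL_F_NUMA_BALANCING
#define MPOL_F_NUMA_BALANCING (1 << 13)	/* uapi value, kernels v5.12+ */
#endif

int main(void)
{
	/* Prefer (hypothetical) fast DRAM nodes 0 and 1 and leave the
	 * slow tier out of the mask, so cold pages can be demoted and,
	 * with this patch, hot pages promoted back by NUMA balancing. */
	unsigned long nodemask = (1UL << 0) | (1UL << 1);

	if (set_mempolicy(MPOL_PREFERRED_MANY | MPOL_F_NUMA_BALANCING,
			  &nodemask, sizeof(nodemask) * 8)) {
		fprintf(stderr, "set_mempolicy: %s\n", strerror(errno));
		return 1;
	}
	return 0;
}

Before this patch, sanitize_mpol_flags() rejects this combination and
the call fails with EINVAL; with it, the flags are accepted and
MPOL_F_MOF | MPOL_F_MORON are set internally (see the first hunk below).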
>
> MPOL_PREFERRED_MANY was added by commit b27abaccf8e8 ("mm/mempolicy: add
> MPOL_PREFERRED_MANY for multiple preferred nodes") to allow better
> allocation control when we have memory tiers in the system. With
> MPOL_PREFERRED_MANY, the user can use a policy node mask consisting only
> of faster memory nodes. When we fail to allocate pages from the faster
> memory node, kswapd would be woken up, allowing demotion of cold pages
> to slower memory nodes.
>
> With the current kernel, such usage of memory policies implies we can't
> do page promotion from a slower memory tier to a faster memory tier
> using numa fault. This patch fixes this issue.
>
> For MPOL_PREFERRED_MANY, if the executing node is in the policy node
> mask, we allow numa migration to the executing node. If the executing
> node is not in the policy node mask, we do not allow numa migration.

Can we provide more information about this? I suggest using an example;
for instance, pages may be distributed among multiple sockets
unexpectedly.

--
Best Regards,
Huang, Ying

> Signed-off-by: Aneesh Kumar K.V (IBM)
> Signed-off-by: Donet Tom
> ---
>  mm/mempolicy.c | 22 +++++++++++++++++-----
>  1 file changed, 17 insertions(+), 5 deletions(-)
>
> diff --git a/mm/mempolicy.c b/mm/mempolicy.c
> index aa48376e2d34..13100a290918 100644
> --- a/mm/mempolicy.c
> +++ b/mm/mempolicy.c
> @@ -1504,9 +1504,10 @@ static inline int sanitize_mpol_flags(int *mode, unsigned short *flags)
>  	if ((*flags & MPOL_F_STATIC_NODES) && (*flags & MPOL_F_RELATIVE_NODES))
>  		return -EINVAL;
>  	if (*flags & MPOL_F_NUMA_BALANCING) {
> -		if (*mode != MPOL_BIND)
> +		if (*mode == MPOL_BIND || *mode == MPOL_PREFERRED_MANY)
> +			*flags |= (MPOL_F_MOF | MPOL_F_MORON);
> +		else
>  			return -EINVAL;
> -		*flags |= (MPOL_F_MOF | MPOL_F_MORON);
>  	}
>  	return 0;
>  }
> @@ -2770,15 +2771,26 @@ int mpol_misplaced(struct folio *folio, struct vm_fault *vmf,
>  		break;
>
>  	case MPOL_BIND:
> -		/* Optimize placement among multiple nodes via NUMA balancing */
> +	case MPOL_PREFERRED_MANY:
> +		/*
> +		 * Even though MPOL_PREFERRED_MANY can allocate pages outside
> +		 * policy nodemask we don't allow numa migration to nodes
> +		 * outside policy nodemask for now. This is done so that if we
> +		 * want demotion to slow memory to happen, before allocating
> +		 * from some DRAM node say 'x', we will end up using a
> +		 * MPOL_PREFERRED_MANY mask excluding node 'x'. In such scenario
> +		 * we should not promote to node 'x' from slow memory node.
> +		 */
>  		if (pol->flags & MPOL_F_MORON) {
> +			/*
> +			 * Optimize placement among multiple nodes
> +			 * via NUMA balancing
> +			 */
>  			if (node_isset(thisnid, pol->nodes))
>  				break;
>  			goto out;
>  		}
> -		fallthrough;
>
> -	case MPOL_PREFERRED_MANY:
>  		/*
>  		 * use current page if in policy nodemask,
>  		 * else select nearest allowed node, if any.
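In case it helps review, the new rule in the mpol_misplaced() hunk
above, restated as a standalone sketch (simplified types and
illustrative names, not the kernel function itself):

#include <stdbool.h>

/* Simplified model of the decision above: thisnid is the node of the
 * CPU taking the NUMA hint fault, and nodes is the policy nodemask
 * reduced to a plain bitmask. */
struct policy_model {
	bool bind_or_preferred_many;	/* mode is MPOL_BIND or MPOL_PREFERRED_MANY */
	bool moron;			/* MPOL_F_MORON (migrate on protnone) set */
	unsigned long nodes;		/* policy nodemask */
};

/* Return true if the faulting folio may be migrated to thisnid. */
bool may_numa_migrate(const struct policy_model *pol, int thisnid)
{
	if (pol->bind_or_preferred_many && pol->moron) {
		/* Promote only toward nodes inside the policy mask, so a
		 * slow-tier node deliberately left out of the mask never
		 * receives promotions. */
		return (pol->nodes >> thisnid) & 1;
	}
	return false;
}

This mirrors the hunk: with MPOL_F_MORON set, node_isset(thisnid,
pol->nodes) decides between break (allow migration to the executing
node) and goto out (refuse migration).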