From: "Huang, Ying" <ying.huang@intel.com>
To: Donet Tom
Cc: Andrew Morton, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	Aneesh Kumar, Dave Hansen, Mel Gorman, Ben Widawsky, Feng Tang,
	Michal Hocko, Andrea Arcangeli, Peter Zijlstra, Ingo Molnar,
	Rik van Riel, Johannes Weiner, Matthew Wilcox, Mike Kravetz,
	Vlastimil Babka, Dan Williams, Hugh Dickins, Kefeng Wang,
	Suren Baghdasaryan
Subject: Re: [PATCH 3/3] mm/numa_balancing: Allow migrate on protnone
	reference with MPOL_PREFERRED_MANY policy
In-Reply-To: <8d7737208bd24e754dc7a538a3f7f02de84f1f72.1708097962.git.donettom@linux.ibm.com>
	(Donet Tom's message of "Sat, 17 Feb 2024 01:31:35 -0600")
References: <9c3f7b743477560d1c5b12b8c111a584a2cc92ee.1708097962.git.donettom@linux.ibm.com>
	<8d7737208bd24e754dc7a538a3f7f02de84f1f72.1708097962.git.donettom@linux.ibm.com>
Date: Tue, 20 Feb 2024 15:18:13 +0800
Message-ID: <877cizppsa.fsf@yhuang6-desk2.ccr.corp.intel.com>

Donet Tom writes:

> commit bda420b98505 ("numa balancing: migrate on fault among multiple
> bound nodes") added support for migrate on protnone reference with
> MPOL_BIND memory policy. This allowed numa fault migration when the
> executing node is part of the policy mask for MPOL_BIND. This patch
> extends migration support to the MPOL_PREFERRED_MANY policy.
>
> Currently, we cannot specify MPOL_PREFERRED_MANY with the mempolicy flag
> MPOL_F_NUMA_BALANCING. This causes issues when we want to use
> NUMA_BALANCING_MEMORY_TIERING. To effectively use the slow memory tier,
> the kernel should not allocate pages from the slower memory tier via
> allocation control zonelist fallback. Instead, we should move cold pages
> from the faster memory node via memory demotion.
> For a page allocation, kswapd is only woken up after we try to allocate
> pages from all nodes in the allocation zone list. This implies that,
> without using memory policies, we will end up allocating hot pages in
> the slower memory tier.
>
> MPOL_PREFERRED_MANY was added by commit b27abaccf8e8 ("mm/mempolicy: add
> MPOL_PREFERRED_MANY for multiple preferred nodes") to allow better
> allocation control when we have memory tiers in the system. With
> MPOL_PREFERRED_MANY, the user can use a policy node mask consisting only
> of faster memory nodes. When we fail to allocate pages from the faster
> memory node, kswapd would be woken up, allowing demotion of cold pages
> to slower memory nodes.
>
> With the current kernel, such usage of memory policies implies we can't
> do page promotion from a slower memory tier to a faster memory tier
> using numa fault. This patch fixes this issue.
>
> For MPOL_PREFERRED_MANY, if the executing node is in the policy node
> mask, we allow numa migration to the executing node. If the executing
> node is not in the policy node mask but the folio is already allocated
> based on policy preference (the folio node is in the policy node mask),
> we don't allow numa migration. If both the executing node and folio node
> are outside the policy node mask, we allow numa migration to the
> executing node.
>
> Signed-off-by: Aneesh Kumar K.V (IBM)
> Signed-off-by: Donet Tom
> ---
>  mm/mempolicy.c | 28 ++++++++++++++++++++++++++--
>  1 file changed, 26 insertions(+), 2 deletions(-)
>
> diff --git a/mm/mempolicy.c b/mm/mempolicy.c
> index 73d698e21dae..8c4c92b10371 100644
> --- a/mm/mempolicy.c
> +++ b/mm/mempolicy.c
> @@ -1458,9 +1458,10 @@ static inline int sanitize_mpol_flags(int *mode, unsigned short *flags)
>  	if ((*flags & MPOL_F_STATIC_NODES) && (*flags & MPOL_F_RELATIVE_NODES))
>  		return -EINVAL;
>  	if (*flags & MPOL_F_NUMA_BALANCING) {
> -		if (*mode != MPOL_BIND)
> +		if (*mode == MPOL_BIND || *mode == MPOL_PREFERRED_MANY)
> +			*flags |= (MPOL_F_MOF | MPOL_F_MORON);
> +		else
>  			return -EINVAL;
> -		*flags |= (MPOL_F_MOF | MPOL_F_MORON);
>  	}
>  	return 0;
>  }
> @@ -2463,6 +2464,23 @@ static void sp_free(struct sp_node *n)
>  	kmem_cache_free(sn_cache, n);
>  }
>
> +static inline bool mpol_preferred_should_numa_migrate(int exec_node, int folio_node,
> +						      struct mempolicy *pol)
> +{
> +	/* if the executing node is in the policy node mask, migrate */
> +	if (node_isset(exec_node, pol->nodes))
> +		return true;
> +
> +	/* If the folio node is in policy node mask, don't migrate */
> +	if (node_isset(folio_node, pol->nodes))
> +		return false;
> +	/*
> +	 * both the folio node and executing node are outside the policy nodemask,
> +	 * migrate as normal numa fault migration.
> +	 */
> +	return true;

Why?  This may cause some unexpected result.  For example, pages may be
distributed among multiple sockets unexpectedly.  So, I prefer the more
conservative policy, that is, only migrate if this node is in
pol->nodes.  A sketch of that variant appears below, after the quoted
patch.

--
Best Regards,
Huang, Ying

> +}
> +
>  /**
>   * mpol_misplaced - check whether current folio node is valid in policy
>   *
> @@ -2526,6 +2544,12 @@ int mpol_misplaced(struct folio *folio, struct vm_area_struct *vma,
>  		break;
>
>  	case MPOL_PREFERRED_MANY:
> +		if (pol->flags & MPOL_F_MORON) {
> +			if (!mpol_preferred_should_numa_migrate(thisnid, curnid, pol))
> +				goto out;
> +			break;
> +		}
> +
>  		/*
>  		 * use current page if in policy nodemask,
>  		 * else select nearest allowed node, if any.
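
For concreteness, the conservative variant argued for in the review
above might look like the following minimal sketch. This is an
editorial illustration, not code posted in the thread; the signature
is kept identical to the patch's helper so it reads as a drop-in
comparison.

	/*
	 * Hypothetical conservative variant: treat the fault as a
	 * migration candidate only when the executing node is itself in
	 * the policy nodemask, so pages are never pulled toward nodes
	 * outside pol->nodes.
	 */
	static inline bool mpol_preferred_should_numa_migrate(int exec_node,
							      int folio_node,
							      struct mempolicy *pol)
	{
		/* folio_node is deliberately unused in this variant. */
		return node_isset(exec_node, pol->nodes);
	}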
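
Separately, for readers unfamiliar with the userspace side: below is a
minimal sketch of how an application might request the behavior the
quoted commit message describes, once MPOL_F_NUMA_BALANCING is accepted
for MPOL_PREFERRED_MANY. The node numbers are invented for
illustration; the fallback #defines mirror
include/uapi/linux/mempolicy.h for systems whose numaif.h predates
these constants.

	#include <numaif.h>	/* set_mempolicy(); link with -lnuma */
	#include <stdio.h>
	#include <stdlib.h>

	#ifndef MPOL_PREFERRED_MANY
	#define MPOL_PREFERRED_MANY	5		/* since v5.15 */
	#endif
	#ifndef MPOL_F_NUMA_BALANCING
	#define MPOL_F_NUMA_BALANCING	(1 << 13)	/* since v5.12 */
	#endif

	int main(void)
	{
		/* Suppose nodes 0 and 1 are the fast (DRAM) tier. */
		unsigned long nodemask = (1UL << 0) | (1UL << 1);

		if (set_mempolicy(MPOL_PREFERRED_MANY | MPOL_F_NUMA_BALANCING,
				  &nodemask, 8 * sizeof(nodemask)) < 0) {
			/* Without this patch, the kernel rejects the
			 * combination with EINVAL. */
			perror("set_mempolicy");
			return EXIT_FAILURE;
		}

		/* Allocations now prefer nodes 0-1; pages that spill to a
		 * slower node remain candidates for NUMA-fault promotion. */
		return EXIT_SUCCESS;
	}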