From: Aneesh Kumar K.V <aneesh.kumar@kernel.org>
To: "Huang, Ying", Donet Tom
Cc: Andrew Morton, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	Dave Hansen, Mel Gorman, Ben Widawsky, Feng Tang, Michal Hocko,
	Andrea Arcangeli, Peter Zijlstra, Ingo Molnar, Rik van Riel,
	Johannes Weiner, Matthew Wilcox, Mike
	Kravetz, Vlastimil Babka, Dan Williams, Hugh Dickins,
	Kefeng Wang, Suren Baghdasaryan
Subject: Re: [PATCH 3/3] mm/numa_balancing:Allow migrate on protnone reference with MPOL_PREFERRED_MANY policy
In-Reply-To: <877cizppsa.fsf@yhuang6-desk2.ccr.corp.intel.com>
References: <9c3f7b743477560d1c5b12b8c111a584a2cc92ee.1708097962.git.donettom@linux.ibm.com>
	<8d7737208bd24e754dc7a538a3f7f02de84f1f72.1708097962.git.donettom@linux.ibm.com>
	<877cizppsa.fsf@yhuang6-desk2.ccr.corp.intel.com>
Date: Tue, 20 Feb 2024 13:23:59 +0530
Message-ID: <87sf1nzi3s.fsf@kernel.org>

"Huang, Ying" writes:

> Donet Tom writes:
>
>> commit bda420b98505 ("numa balancing: migrate on fault among multiple bound
>> nodes") added support for migrate on protnone reference with MPOL_BIND
>> memory policy. This allowed numa fault migration when the executing node
>> is part of the policy mask for MPOL_BIND. This patch extends migration
>> support to MPOL_PREFERRED_MANY policy.
>>
>> Currently, we cannot specify MPOL_PREFERRED_MANY with the mempolicy flag
>> MPOL_F_NUMA_BALANCING. This causes issues when we want to use
>> NUMA_BALANCING_MEMORY_TIERING. To effectively use the slow memory tier,
>> the kernel should not allocate pages from the slower memory tier via
>> allocation control zonelist fallback. Instead, we should move cold pages
>> from the faster memory node via memory demotion. For a page allocation,
>> kswapd is only woken up after we try to allocate pages from all nodes in
>> the allocation zone list. This implies that, without using memory
>> policies, we will end up allocating hot pages in the slower memory tier.
>>
>> MPOL_PREFERRED_MANY was added by commit b27abaccf8e8 ("mm/mempolicy: add
>> MPOL_PREFERRED_MANY for multiple preferred nodes") to allow better
>> allocation control when we have memory tiers in the system. With
>> MPOL_PREFERRED_MANY, the user can use a policy node mask consisting only
>> of faster memory nodes. When we fail to allocate pages from the faster
>> memory node, kswapd would be woken up, allowing demotion of cold pages
>> to slower memory nodes.
>>
>> With the current kernel, such usage of memory policies implies we can't
>> do page promotion from a slower memory tier to a faster memory tier
>> using numa fault. This patch fixes this issue.
>>
>> For MPOL_PREFERRED_MANY, if the executing node is in the policy node
>> mask, we allow numa migration to the executing nodes. If the executing
>> node is not in the policy node mask but the folio is already allocated
>> based on policy preference (the folio node is in the policy node mask),
>> we don't allow numa migration.
>> If both the executing node and folio node
>> are outside the policy node mask, we allow numa migration to the
>> executing nodes.
>>
>> Signed-off-by: Aneesh Kumar K.V (IBM)
>> Signed-off-by: Donet Tom
>> ---
>>  mm/mempolicy.c | 28 ++++++++++++++++++++++++++--
>>  1 file changed, 26 insertions(+), 2 deletions(-)
>>
>> diff --git a/mm/mempolicy.c b/mm/mempolicy.c
>> index 73d698e21dae..8c4c92b10371 100644
>> --- a/mm/mempolicy.c
>> +++ b/mm/mempolicy.c
>> @@ -1458,9 +1458,10 @@ static inline int sanitize_mpol_flags(int *mode, unsigned short *flags)
>>  	if ((*flags & MPOL_F_STATIC_NODES) && (*flags & MPOL_F_RELATIVE_NODES))
>>  		return -EINVAL;
>>  	if (*flags & MPOL_F_NUMA_BALANCING) {
>> -		if (*mode != MPOL_BIND)
>> +		if (*mode == MPOL_BIND || *mode == MPOL_PREFERRED_MANY)
>> +			*flags |= (MPOL_F_MOF | MPOL_F_MORON);
>> +		else
>>  			return -EINVAL;
>> -		*flags |= (MPOL_F_MOF | MPOL_F_MORON);
>>  	}
>>  	return 0;
>>  }
>> @@ -2463,6 +2464,23 @@ static void sp_free(struct sp_node *n)
>>  	kmem_cache_free(sn_cache, n);
>>  }
>>
>> +static inline bool mpol_preferred_should_numa_migrate(int exec_node, int folio_node,
>> +						      struct mempolicy *pol)
>> +{
>> +	/* if the executing node is in the policy node mask, migrate */
>> +	if (node_isset(exec_node, pol->nodes))
>> +		return true;
>> +
>> +	/* If the folio node is in policy node mask, don't migrate */
>> +	if (node_isset(folio_node, pol->nodes))
>> +		return false;
>> +	/*
>> +	 * both the folio node and executing node are outside the policy nodemask,
>> +	 * migrate as normal numa fault migration.
>> +	 */
>> +	return true;
>
> Why? This may cause some unexpected result. For example, pages may be
> distributed among multiple sockets unexpectedly. So, I prefer the more
> conservative policy, that is, only migrate if this node is in
> pol->nodes.
>

This will only have an impact if the user specifies MPOL_F_NUMA_BALANCING,
which means the user is explicitly requesting that frequently accessed
memory pages be migrated.
Memory policy MPOL_PREFERRED_MANY is able to allocate pages from nodes
outside of policy->nodes. For the specific use case that I am interested
in, it should be okay to restrict it to policy->nodes. However, I am
wondering if this is too restrictive given the definition of
MPOL_PREFERRED_MANY.

-aneesh