Message-ID: <0f0fd8e3-98b2-4001-ba6a-6a8a26a5393f@kernel.org>
Date: Tue, 20 Feb 2024 09:27:25 +0530
Subject: Re: [PATCH 3/3] mm/numa_balancing:Allow migrate on protnone reference with MPOL_PREFERRED_MANY policy
From: "Aneesh Kumar K.V"
To: Michal Hocko, Donet Tom
Cc: Andrew Morton, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
 Huang Ying, Dave Hansen, Mel Gorman, Ben Widawsky, Feng Tang,
 Andrea Arcangeli, Peter Zijlstra, Ingo Molnar, Rik van Riel,
 Johannes Weiner, Matthew Wilcox, Mike Kravetz, Vlastimil Babka,
 Dan Williams, Hugh Dickins, Kefeng Wang, Suren Baghdasaryan
References: <9c3f7b743477560d1c5b12b8c111a584a2cc92ee.1708097962.git.donettom@linux.ibm.com>
 <8d7737208bd24e754dc7a538a3f7f02de84f1f72.1708097962.git.donettom@linux.ibm.com>
 <25b420aa-3fe6-40a4-8d60-a46ab61ee7b7@linux.ibm.com>

On 2/20/24 12:42 AM, Michal Hocko wrote:
> On Mon 19-02-24 20:37:17, Donet Tom wrote:
>>
>> On 2/19/24 19:50, Michal Hocko wrote:
>>> On Sat 17-02-24 01:31:35, Donet Tom wrote:
>>> [...]
>>>> +static inline bool mpol_preferred_should_numa_migrate(int exec_node, int folio_node,
>>>> +						       struct mempolicy *pol)
>>>> +{
>>>> +	/* if the executing node is in the policy node mask, migrate */
>>>> +	if (node_isset(exec_node, pol->nodes))
>>>> +		return true;
>>>> +
>>>> +	/* If the folio node is in policy node mask, don't migrate */
>>>> +	if (node_isset(folio_node, pol->nodes))
>>>> +		return false;
>>>> +	/*
>>>> +	 * both the folio node and executing node are outside the policy nodemask,
>>>> +	 * migrate as normal numa fault migration.
>>>> +	 */
>>>> +	return true;
>>>> +}
>>> I have looked at this again and only now noticed that this doesn't
>>> really work as one would expect.
>>>
>>> 	case MPOL_PREFERRED_MANY:
>>> 		/*
>>> 		 * use current page if in policy nodemask,
>>> 		 * else select nearest allowed node, if any.
>>> 		 * If no allowed nodes, use current [!misplaced].
>>> 		 */
>>> 		if (node_isset(curnid, pol->nodes))
>>> 			goto out;
>>> 		z = first_zones_zonelist(
>>> 				node_zonelist(numa_node_id(), GFP_HIGHUSER),
>>> 				gfp_zone(GFP_HIGHUSER),
>>> 				&pol->nodes);
>>> 		polnid = zone_to_nid(z->zone);
>>> 		break;
>>>
>>> Will collapse the whole MPOL_PREFERRED_MANY nodemask into the first
>>> node in that mask. Is that really what we want here? Shouldn't we use
>>> the full nodemask as the migration target?
>>
>> With this patch it will take full nodemask and find out the correct
>> migration target. It will not collapse into first node.
>
> Correct me if I am wrong, but mpol_misplaced will return the first node
> of the preferred node mask and then migrate_misplaced_folio would use
> it as a target node for alloc_misplaced_dst_folio which performs
> __GFP_THISNODE allocation so it won't fall back to a different node.

I think the confusion is between MPOL_F_MOF (migrate on fault) and
MPOL_F_MORON (protnone fault/NUMA fault).

With MPOL_F_MOF alone, what we wanted to achieve was to have mbind()
lazily migrate the pages based on the policy node mask. The change was
introduced in commit b24f53a0bea3 ("mm: mempolicy: Add MPOL_MF_LAZY")
and later dropped by commit 2cafb582173f ("mempolicy: remove confusing
MPOL_MF_LAZY dead code"). We still have the mpol_misplaced changes that
handle node selection for the MPOL_F_MOF flag (this is dead code IIUC).

MPOL_F_MORON was added in commit 5606e3877ad8 ("mm: numa: Migrate on
reference policy"), and in current upstream only MPOL_BIND supports
that flag. With that flag specified and with the changes in this patch,
mpol_misplaced becomes

	case MPOL_PREFERRED_MANY:
		if (pol->flags & MPOL_F_MORON) {
			if (!mpol_preferred_should_numa_migrate(thisnid, curnid, pol))
				goto out;
			break;
		}

		/*
		 * use current page if in policy nodemask,
		 * else select nearest allowed node, if any.
		 * If no allowed nodes, use current [!misplaced].
		 */
		if (node_isset(curnid, pol->nodes))
			goto out;
		z = first_zones_zonelist(
				node_zonelist(thisnid, GFP_HIGHUSER),
				gfp_zone(GFP_HIGHUSER),
				&pol->nodes);
		polnid = zone_to_nid(z->zone);
		break;
	....
	...
	}

	/* Migrate the folio towards the node whose CPU is referencing it */
	if (pol->flags & MPOL_F_MORON) {
		polnid = thisnid;

		if (!should_numa_migrate_memory(current, folio, curnid, thiscpu))
			goto out;
	}

	if (curnid != polnid)
		ret = polnid;
out:
	mpol_cond_put(pol);

	return ret;
}

i.e., if we can do NUMA migration, we select the currently executing
node as the target node; otherwise we end up returning from the
function with ret = NUMA_NO_NODE.

-aneesh
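
To make the MPOL_PREFERRED_MANY + MPOL_F_MORON path above easier to
check by hand, here is a rough userspace sketch of just that decision.
It is not kernel code: nodemask_sketch_t, node_in_mask(),
preferred_many_should_migrate() and misplaced_target_sketch() are
hypothetical names, the nodemask is modelled as a plain 64-bit bitmap,
and the result of should_numa_migrate_memory() is passed in as a
boolean instead of being computed.

```c
#include <stdbool.h>
#include <stdint.h>

#define NUMA_NO_NODE (-1)

/* Hypothetical stand-in for nodemask_t: bit n set means node n is in the mask. */
typedef uint64_t nodemask_sketch_t;

/* Simplified model of node_isset() on the bitmap above. */
static bool node_in_mask(int nid, nodemask_sketch_t mask)
{
	return nid >= 0 && nid < 64 && ((mask >> nid) & 1ULL);
}

/* Same three-way decision as mpol_preferred_should_numa_migrate() in the patch. */
static bool preferred_many_should_migrate(int exec_node, int folio_node,
					  nodemask_sketch_t policy_nodes)
{
	/* Executing node is in the policy mask: migrate towards it. */
	if (node_in_mask(exec_node, policy_nodes))
		return true;

	/* Folio already sits on a policy node: leave it there. */
	if (node_in_mask(folio_node, policy_nodes))
		return false;

	/* Both are outside the mask: behave like a normal NUMA fault. */
	return true;
}

/*
 * Models only the MPOL_PREFERRED_MANY + MPOL_F_MORON path of
 * mpol_misplaced(). hint_allows is an assumption standing in for the
 * return value of should_numa_migrate_memory().
 * Returns the target node, or NUMA_NO_NODE if the folio stays put.
 */
static int misplaced_target_sketch(int thisnid, int curnid,
				   nodemask_sketch_t policy_nodes,
				   bool hint_allows)
{
	int polnid;

	if (!preferred_many_should_migrate(thisnid, curnid, policy_nodes))
		return NUMA_NO_NODE;

	/* With MPOL_F_MORON the target is the node that is executing. */
	polnid = thisnid;
	if (!hint_allows)
		return NUMA_NO_NODE;

	return curnid != polnid ? polnid : NUMA_NO_NODE;
}
```

With a policy mask of nodes {0,1} (0x3), a fault from node 0 on a folio
on node 2 yields node 0 as the target, while a fault from node 2 on a
folio on node 0 yields NUMA_NO_NODE. In neither case is the target
derived from the first node of the mask.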