From: Huang Ying
To: Peter Zijlstra
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Huang Ying,
	Andrew Morton, Ingo Molnar, Mel Gorman, Rik van Riel,
	Johannes Weiner, "Matthew Wilcox (Oracle)", Dave Hansen,
	Andi Kleen, Michal Hocko, David Rientjes
Subject: [PATCH -V2 2/2] autonuma: Migrate on fault among multiple bound nodes
Date: Wed, 28 Oct 2020 10:34:11 +0800
Message-Id: <20201028023411.15045-3-ying.huang@intel.com>
In-Reply-To: <20201028023411.15045-1-ying.huang@intel.com>
References: <20201028023411.15045-1-ying.huang@intel.com>

Now, AutoNUMA can only optimize the page placement among the NUMA nodes if
the default memory policy is used.
This is because an explicitly specified memory policy should take
precedence. But this seems too strict in some situations. For example, on
a system with 4 NUMA nodes, if the memory of an application is bound to
nodes 0 and 1, AutoNUMA can potentially migrate pages between nodes 0 and
1 to reduce cross-node accesses without breaking the explicit memory
binding policy.

So in this patch, if mbind(.mode=MPOL_BIND, .flags=MPOL_MF_LAZY) is used
to bind the memory of the application to multiple nodes, and in the hint
page fault handler both the faulting page node and the accessing node are
in the policy nodemask, the kernel will try to migrate the page to the
accessing node to reduce cross-node accesses.

[Peter Zijlstra: provided the simplified implementation method.]

Questions:

The sysctl knob kernel.numa_balancing can enable/disable AutoNUMA
optimization globally. But for memory areas that are bound to multiple
NUMA nodes, even if AutoNUMA is enabled globally via the sysctl knob, we
still need to enable AutoNUMA again with a special flag. Why not just
optimize the page placement where possible as long as AutoNUMA is enabled
globally? The interface would look simpler that way.

Signed-off-by: "Huang, Ying"
Cc: Andrew Morton
Cc: Ingo Molnar
Cc: Mel Gorman
Cc: Rik van Riel
Cc: Johannes Weiner
Cc: "Matthew Wilcox (Oracle)"
Cc: Dave Hansen
Cc: Andi Kleen
Cc: Michal Hocko
Cc: David Rientjes
---
 mm/mempolicy.c | 17 +++++++++++------
 1 file changed, 11 insertions(+), 6 deletions(-)

diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index f6948b659643..d0d25c2601a0 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -2490,15 +2490,19 @@ int mpol_misplaced(struct page *page, struct vm_area_struct *vma, unsigned long
 		break;
 
 	case MPOL_BIND:
-
 		/*
-		 * allows binding to multiple nodes.
-		 * use current page if in policy nodemask,
-		 * else select nearest allowed node, if any.
-		 * If no allowed nodes, use current [!misplaced].
+		 * Allows binding to multiple nodes. If both current and
+		 * accessing nodes are in policy nodemask, migrate to
+		 * accessing node to optimize page placement. Otherwise,
+		 * use current page if in policy nodemask, else select
+		 * nearest allowed node, if any. If no allowed nodes, use
+		 * current [!misplaced].
 		 */
-		if (node_isset(curnid, pol->v.nodes))
+		if (node_isset(curnid, pol->v.nodes)) {
+			if (node_isset(thisnid, pol->v.nodes))
+				goto mopron;
 			goto out;
+		}
 		z = first_zones_zonelist(
 				node_zonelist(numa_node_id(), GFP_HIGHUSER),
 				gfp_zone(GFP_HIGHUSER),
@@ -2512,6 +2516,7 @@ int mpol_misplaced(struct page *page, struct vm_area_struct *vma, unsigned long
 
 	/* Migrate the page towards the node whose CPU is referencing it */
 	if (pol->flags & MPOL_F_MOPRON) {
+mopron:
 		polnid = thisnid;
 
 		if (!should_numa_migrate_memory(current, page, curnid, thiscpu))
-- 
2.28.0
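
Note: the decision made by the patched MPOL_BIND branch can be modelled in
plain userspace C, which may help when reasoning about the change. This is
only a sketch under stated assumptions and is not kernel code: the names
nodemask_model_t, node_in_mask(), enum bind_decision and bind_decide() are
hypothetical, the nodemask is reduced to a 32-bit bitmap, and the
should_numa_migrate_memory() filter and the zonelist fallback are not
modelled. It also assumes that a page already on the accessing node is left
alone.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the kernel's nodemask_t: one bit per node. */
typedef unsigned int nodemask_model_t;

/* Possible outcomes of the modelled MPOL_BIND branch. */
enum bind_decision {
	BIND_KEEP,	/* page stays on its current node */
	BIND_MIGRATE,	/* try to migrate to the accessing node */
	BIND_FALLBACK,	/* current node not allowed: nearest-node path, unchanged by the patch */
};

/* Return true if node nid is set in mask; out-of-range ids are never set. */
static bool node_in_mask(int nid, nodemask_model_t mask)
{
	if (nid < 0 || nid >= 32)
		return false;
	return (mask >> nid) & 1u;
}

/*
 * curnid:  node the faulting page currently resides on
 * thisnid: node of the CPU that is accessing the page
 * allowed: nodes named in the MPOL_BIND policy
 */
static enum bind_decision bind_decide(int curnid, int thisnid,
				      nodemask_model_t allowed)
{
	if (!node_in_mask(curnid, allowed))
		return BIND_FALLBACK;
	/* New in this patch: both nodes allowed, so follow the accessor. */
	if (node_in_mask(thisnid, allowed) && curnid != thisnid)
		return BIND_MIGRATE;
	return BIND_KEEP;
}
```

With the example from the changelog (4 nodes, memory bound to nodes 0 and
1, so allowed = 0x3), bind_decide(0, 1, 0x3) gives BIND_MIGRATE, while an
access from node 2 to a page on node 0 gives BIND_KEEP, so the page never
leaves the bound set.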