From: Huang Ying
To: Peter Zijlstra
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Huang Ying,
	Andrew Morton, Ingo Molnar, Mel Gorman, Rik van Riel,
	Johannes Weiner, "Matthew Wilcox (Oracle)", Dave Hansen,
	Andi Kleen, Michal Hocko, David Rientjes
Subject: [PATCH 2/2] autonuma: Migrate on fault among multiple bound nodes
Date: Thu, 24 Sep 2020 16:25:09 +0800
Message-Id: <20200924082509.445336-2-ying.huang@intel.com>
In-Reply-To: <20200924082509.445336-1-ying.huang@intel.com>
References: <20200924082509.445336-1-ying.huang@intel.com>

Now, AutoNUMA can only optimize the page placement among the NUMA nodes
if the default memory policy is used.
This is because an explicitly specified memory policy should take
precedence. But this seems too strict in some situations. For example,
on a system with 4 NUMA nodes, if the memory of an application is bound
to nodes 0 and 1, AutoNUMA can potentially migrate pages between nodes
0 and 1 to reduce cross-node accesses without breaking the explicit
memory binding policy.

So in this patch, if mbind(.mode=MPOL_BIND, .flags=MPOL_MF_LAZY) is used
to bind the memory of the application to multiple nodes, and in the hint
page fault handler both the faulting page's node and the accessing node
are in the policy nodemask, the kernel tries to migrate the page to the
accessing node to reduce cross-node accesses.

[Peter Zijlstra: provided the simplified implementation method.]

Questions:

The sysctl knob kernel.numa_balancing can enable/disable AutoNUMA
optimization globally. But for memory areas that are bound to multiple
NUMA nodes, even if AutoNUMA is enabled globally via the sysctl knob, we
still need to enable AutoNUMA again with a special flag. Why not just
optimize the page placement whenever possible as long as AutoNUMA is
enabled globally? The interface would look simpler with that.

Signed-off-by: "Huang, Ying"
Cc: Andrew Morton
Cc: Ingo Molnar
Cc: Mel Gorman
Cc: Rik van Riel
Cc: Johannes Weiner
Cc: "Matthew Wilcox (Oracle)"
Cc: Dave Hansen
Cc: Andi Kleen
Cc: Michal Hocko
Cc: David Rientjes
---
 mm/mempolicy.c | 17 +++++++++++------
 1 file changed, 11 insertions(+), 6 deletions(-)

diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 62cd159aa46d..73119ee460c6 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -2494,15 +2494,19 @@ int mpol_misplaced(struct page *page, struct vm_area_struct *vma, unsigned long
 		break;
 
 	case MPOL_BIND:
-
 		/*
-		 * allows binding to multiple nodes.
-		 * use current page if in policy nodemask,
-		 * else select nearest allowed node, if any.
-		 * If no allowed nodes, use current [!misplaced].
+		 * Allows binding to multiple nodes.  If both current and
+		 * accessing nodes are in policy nodemask, migrate to
+		 * accessing node to optimize page placement. Otherwise,
+		 * use current page if in policy nodemask, else select
+		 * nearest allowed node, if any.  If no allowed nodes, use
+		 * current [!misplaced].
 		 */
-		if (node_isset(curnid, pol->v.nodes))
+		if (node_isset(curnid, pol->v.nodes)) {
+			if (node_isset(thisnid, pol->v.nodes))
+				goto mopron;
 			goto out;
+		}
 		z = first_zones_zonelist(
 				node_zonelist(numa_node_id(), GFP_HIGHUSER),
 				gfp_zone(GFP_HIGHUSER),
@@ -2516,6 +2520,7 @@ int mpol_misplaced(struct page *page, struct vm_area_struct *vma, unsigned long
 
 	/* Migrate the page towards the node whose CPU is referencing it */
 	if (pol->flags & MPOL_F_MOPRON) {
+mopron:
 		polnid = thisnid;
 
 		if (!should_numa_migrate_memory(current, page, curnid, thiscpu))
-- 
2.28.0
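
For readers following the MPOL_BIND branch, the decision the patch
introduces can be modeled in user space roughly as below. This is only
an illustrative sketch, not kernel code: nodemask64, enum placement,
node_in_mask() and bind_placement() are hypothetical names, the nodemask
is simplified to a 64-bit bitmap, and the should_numa_migrate_memory()
check that can still veto a migration is left out. Treating the case
where the page already sits on the accessing node as "keep" is also an
assumption of the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a policy nodemask: bit n set => node n allowed. */
typedef uint64_t nodemask64;

/* Possible outcomes for a page under a bind policy (illustrative only). */
enum placement {
	PLACEMENT_KEEP,        /* leave the page on its current node */
	PLACEMENT_TRY_ACCESS,  /* try to migrate to the accessing node */
	PLACEMENT_NEAREST      /* current node not allowed: pick nearest allowed */
};

static bool node_in_mask(unsigned int nid, nodemask64 mask)
{
	assert(nid < 64);      /* the sketch supports at most 64 nodes */
	return (mask >> nid) & 1u;
}

/*
 * curnid:  node the faulting page currently lives on
 * thisnid: node of the CPU that touched the page
 */
static enum placement bind_placement(nodemask64 policy,
				     unsigned int curnid,
				     unsigned int thisnid)
{
	if (node_in_mask(curnid, policy)) {
		/* Both nodes allowed and different: migrate towards the accessor. */
		if (node_in_mask(thisnid, policy) && curnid != thisnid)
			return PLACEMENT_TRY_ACCESS;
		/* Accessor outside the policy (or same node): page stays put. */
		return PLACEMENT_KEEP;
	}
	/* Page is on a node the policy does not allow. */
	return PLACEMENT_NEAREST;
}
```

With a policy mask of 0x3 (nodes 0 and 1), a page on node 0 touched from
node 1 becomes a migration candidate, while the same page touched from
node 2 stays where it is, which matches the 4-node example in the
changelog.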