From: Huang Ying
To: Peter Zijlstra
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Huang Ying,
    Andrew Morton, Ingo Molnar, Mel Gorman, Rik van Riel,
    Johannes Weiner, "Matthew Wilcox (Oracle)", Dave Hansen,
    Andi Kleen, Michal Hocko, David Rientjes
Subject: [RFC -V2] autonuma: Migrate on fault among multiple bound nodes
Date: Tue, 22 Sep 2020 14:54:01 +0800
Message-Id: <20200922065401.376348-1-ying.huang@intel.com>

Now, AutoNUMA can only optimize page placement among NUMA nodes if the
default memory policy is used, because an explicitly specified memory
policy should take precedence. But this seems too strict in some
situations.
For example, on a system with 4 NUMA nodes, if the memory of an
application is bound to nodes 0 and 1, AutoNUMA could still migrate pages
between nodes 0 and 1 to reduce cross-node accesses without breaking the
explicit memory binding policy.

So in this patch, if mbind(.mode=MPOL_BIND, .flags=MPOL_MF_LAZY) is used
to bind the memory of the application to multiple nodes, and in the hint
page fault handler both the node of the faulting page and the accessing
node are in the policy nodemask, the kernel will try to migrate the page
to the accessing node to reduce cross-node accesses.

[Peter Zijlstra: provided the simplified implementation method.]

Questions:

The sysctl knob kernel.numa_balancing can enable/disable AutoNUMA
optimization globally. But for memory areas that are bound to multiple
NUMA nodes, even if AutoNUMA is enabled globally via the sysctl knob, we
still need to enable AutoNUMA again with a special flag. Why not just
optimize the page placement whenever possible, as long as AutoNUMA is
enabled globally? The interface would look simpler that way.

Signed-off-by: "Huang, Ying"
Cc: Andrew Morton
Cc: Ingo Molnar
Cc: Mel Gorman
Cc: Rik van Riel
Cc: Johannes Weiner
Cc: "Matthew Wilcox (Oracle)"
Cc: Dave Hansen
Cc: Andi Kleen
Cc: Michal Hocko
Cc: David Rientjes
---
 mm/mempolicy.c | 17 +++++++++++------
 1 file changed, 11 insertions(+), 6 deletions(-)

diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index eddbe4e56c73..273969204732 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -2494,15 +2494,19 @@ int mpol_misplaced(struct page *page, struct vm_area_struct *vma, unsigned long
 		break;
 
 	case MPOL_BIND:
-
 		/*
-		 * allows binding to multiple nodes.
-		 * use current page if in policy nodemask,
-		 * else select nearest allowed node, if any.
-		 * If no allowed nodes, use current [!misplaced].
+		 * Allows binding to multiple nodes. If both current and
+		 * accessing nodes are in policy nodemask, migrate to
+		 * accessing node to optimize page placement. Otherwise,
+		 * use current page if in policy nodemask, else select
+		 * nearest allowed node, if any. If no allowed nodes, use
+		 * current [!misplaced].
 		 */
-		if (node_isset(curnid, pol->v.nodes))
+		if (node_isset(curnid, pol->v.nodes)) {
+			if (node_isset(thisnid, pol->v.nodes))
+				goto moron;
 			goto out;
+		}
 		z = first_zones_zonelist(
 				node_zonelist(numa_node_id(), GFP_HIGHUSER),
 				gfp_zone(GFP_HIGHUSER),
@@ -2516,6 +2520,7 @@ int mpol_misplaced(struct page *page, struct vm_area_struct *vma, unsigned long
 
 	/* Migrate the page towards the node whose CPU is referencing it */
 	if (pol->flags & MPOL_F_MORON) {
+moron:
 		polnid = thisnid;
 
 		if (!should_numa_migrate_memory(current, page, curnid, thiscpu))
-- 
2.28.0