From: "Huang, Ying" <ying.huang@intel.com>
To: Phil Auld
Cc: Peter Zijlstra, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
 Andrew Morton, Ingo Molnar, Mel Gorman, Johannes Weiner,
 "Matthew Wilcox (Oracle)", Dave Hansen, Andi Kleen, Michal Hocko,
 David Rientjes
Subject: Re: [RFC -V2] autonuma: Migrate on fault among multiple bound nodes
References: <20200922065401.376348-1-ying.huang@intel.com>
 <20200922125049.GA10420@lorien.usersys.redhat.com>
Date: Wed, 23 Sep 2020 13:44:12 +0800
In-Reply-To: <20200922125049.GA10420@lorien.usersys.redhat.com> (Phil Auld's
 message of "Tue, 22 Sep 2020 08:51:53 -0400")
Message-ID: <87o8lxoxn7.fsf@yhuang-dev.intel.com>

Phil Auld writes:

> Hi,
>
> On Tue, Sep 22, 2020 at 02:54:01PM +0800 Huang Ying wrote:
>> Now, AutoNUMA can only optimize page placement among the NUMA nodes
>> if the default memory policy is used, because an explicitly specified
>> memory policy should take precedence.  But this seems too strict in
>> some situations.  For example, on a system with 4 NUMA nodes, if the
>> memory of an application is bound to nodes 0 and 1, AutoNUMA could
>> still migrate pages between nodes 0 and 1 to reduce cross-node
>> accesses without breaking the explicit memory binding policy.
>>
>> So in this patch, if mbind(.mode=MPOL_BIND, .flags=MPOL_MF_LAZY) is
>> used to bind the memory of the application to multiple nodes, and in
>> the hint page fault handler both the faulting page's node and the
>> accessing node are in the policy nodemask, the kernel will try to
>> migrate the page to the accessing node to reduce cross-node accesses.
>>
>
> Do you have any performance numbers that show the effects of this on
> a workload?

I have done some simple tests to confirm that NUMA balancing works in
the target configuration.  As for performance numbers, they are exactly
the same as those of the original NUMA balancing, just in a different
configuration: with the memory bound to all NUMA nodes instead of
unbound.

>
>> [Peter Zijlstra: provided the simplified implementation method.]
>>
>> Questions:
>>
>> Sysctl knob kernel.numa_balancing can enable/disable AutoNUMA
>> optimization globally.  But for memory areas that are bound to
>> multiple NUMA nodes, even if AutoNUMA is enabled globally via the
>> sysctl knob, we still need to enable AutoNUMA again with a special
>> flag.  Why not just optimize the page placement whenever possible, as
>> long as AutoNUMA is enabled globally?  The interface would look
>> simpler that way.
>
> I agree.  I think it should try to do this if globally enabled.

Thanks!
>>
>> Signed-off-by: "Huang, Ying"
>> Cc: Andrew Morton
>> Cc: Ingo Molnar
>> Cc: Mel Gorman
>> Cc: Rik van Riel
>> Cc: Johannes Weiner
>> Cc: "Matthew Wilcox (Oracle)"
>> Cc: Dave Hansen
>> Cc: Andi Kleen
>> Cc: Michal Hocko
>> Cc: David Rientjes
>> ---
>>  mm/mempolicy.c | 17 +++++++++++------
>>  1 file changed, 11 insertions(+), 6 deletions(-)
>>
>> diff --git a/mm/mempolicy.c b/mm/mempolicy.c
>> index eddbe4e56c73..273969204732 100644
>> --- a/mm/mempolicy.c
>> +++ b/mm/mempolicy.c
>> @@ -2494,15 +2494,19 @@ int mpol_misplaced(struct page *page, struct vm_area_struct *vma, unsigned long
>>  		break;
>>
>>  	case MPOL_BIND:
>> -
>>  		/*
>> -		 * allows binding to multiple nodes.
>> -		 * use current page if in policy nodemask,
>> -		 * else select nearest allowed node, if any.
>> -		 * If no allowed nodes, use current [!misplaced].
>> +		 * Allows binding to multiple nodes.  If both current and
>> +		 * accessing nodes are in policy nodemask, migrate to
>> +		 * accessing node to optimize page placement.  Otherwise,
>> +		 * use current page if in policy nodemask, else select
>> +		 * nearest allowed node, if any.  If no allowed nodes, use
>> +		 * current [!misplaced].
>>  		 */
>> -		if (node_isset(curnid, pol->v.nodes))
>> +		if (node_isset(curnid, pol->v.nodes)) {
>> +			if (node_isset(thisnid, pol->v.nodes))
>> +				goto moron;
>
> Nice label :)

OK.  Since quite a few people have commented on this, I will rename all
"moron" labels to "mopron", as suggested by Matthew.  Although
MPOL_F_MORON is defined in include/uapi/linux/mempolicy.h, it is
explicitly marked there as an internal flag.
Best Regards,
Huang, Ying

>>  			goto out;
>> +		}
>>  		z = first_zones_zonelist(
>>  				node_zonelist(numa_node_id(), GFP_HIGHUSER),
>>  				gfp_zone(GFP_HIGHUSER),
>> @@ -2516,6 +2520,7 @@ int mpol_misplaced(struct page *page, struct vm_area_struct *vma, unsigned long
>>
>>  	/* Migrate the page towards the node whose CPU is referencing it */
>>  	if (pol->flags & MPOL_F_MORON) {
>> +moron:
>>  		polnid = thisnid;
>>
>>  		if (!should_numa_migrate_memory(current, page, curnid, thiscpu))
>> --
>> 2.28.0
>>
>
>
> Cheers,
> Phil