Date: Wed, 16 Sep 2020 10:10:52 +0200
From: peterz@infradead.org
To: Huang Ying
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Andrew Morton, Ingo Molnar, Mel Gorman, Rik van Riel, Johannes Weiner, "Matthew Wilcox (Oracle)", Dave Hansen, Andi Kleen, Michal
Hocko, David Rientjes
Subject: Re: [RFC] autonuma: Migrate on fault among multiple bound nodes
Message-ID: <20200916081052.GI2674@hirez.programming.kicks-ass.net>
References: <20200916005936.232788-1-ying.huang@intel.com>
In-Reply-To: <20200916005936.232788-1-ying.huang@intel.com>

On Wed, Sep 16, 2020 at 08:59:36AM +0800, Huang Ying wrote:

> So in this patch, if MPOL_BIND is used to bind the memory of the
> application to multiple nodes, and in the hint page fault handler both
> the faulting page node and the accessing node are in the policy
> nodemask, the page will be tried to be migrated to the accessing node
> to reduce the cross-node accessing.

Seems fair enough..

> Questions:
>
> Sysctl knob kernel.numa_balancing can enable/disable AutoNUMA
> optimizing globally.  And now, it appears that the explicit NUMA
> memory policy specifying (e.g. via numactl, mbind(), etc.) acts like
> an implicit per-thread/VMA knob to enable/disable the AutoNUMA
> optimizing for the thread/VMA.  Although this looks like a side effect
> instead of an API, from commit fc3147245d19 ("mm: numa: Limit NUMA
> scanning to migrate-on-fault VMAs"), this is used by some users?  So
> the question is, do we need an explicit per-thread/VMA knob to
> enable/disable AutoNUMA optimizing for the thread/VMA?  Or just use
> the global knob, either optimize all thread/VMAs as long as the
> explicitly specified memory policies are respected, or don't optimize
> at all.

I don't understand the question; that commit is not about disabling numa
balancing, it's about avoiding pointless work and overhead.
What's the point of scanning memory if you're not going to be allowed
to move it anyway?

> Signed-off-by: "Huang, Ying"
> Cc: Andrew Morton
> Cc: Ingo Molnar
> Cc: Mel Gorman
> Cc: Rik van Riel
> Cc: Johannes Weiner
> Cc: "Matthew Wilcox (Oracle)"
> Cc: Dave Hansen
> Cc: Andi Kleen
> Cc: Michal Hocko
> Cc: David Rientjes
> ---
>  mm/mempolicy.c | 43 +++++++++++++++++++++++++++++++------------
>  1 file changed, 31 insertions(+), 12 deletions(-)
>
> diff --git a/mm/mempolicy.c b/mm/mempolicy.c
> index eddbe4e56c73..a941eab2de24 100644
> --- a/mm/mempolicy.c
> +++ b/mm/mempolicy.c
> @@ -1827,6 +1827,13 @@ static struct mempolicy *get_vma_policy(struct vm_area_struct *vma,
>  	return pol;
>  }
>
> +static bool mpol_may_mof(struct mempolicy *pol)
> +{
> +	/* May migrate among bound nodes for MPOL_BIND */
> +	return pol->flags & MPOL_F_MOF ||
> +		(pol->mode == MPOL_BIND && nodes_weight(pol->v.nodes) > 1);
> +}

This is weird, why not just set F_MOF on the policy?

In fact, why wouldn't something like:

	mbind(.mode=MPOL_BIND, .flags=MPOL_MF_LAZY);

work today? Afaict MF_LAZY will unconditionally result in F_MOF.

> @@ -2494,20 +2503,30 @@ int mpol_misplaced(struct page *page, struct vm_area_struct *vma, unsigned long
>  		break;
>
>  	case MPOL_BIND:
>  		/*
> +		 * Allows binding to multiple nodes.  If both current and
> +		 * accessing nodes are in policy nodemask, migrate to
> +		 * accessing node to optimize page placement.  Otherwise,
> +		 * use current page if in policy nodemask or MPOL_F_MOF not
> +		 * set, else select nearest allowed node, if any.  If no
> +		 * allowed nodes, use current [!misplaced].
>  		 */
> +		if (node_isset(curnid, pol->v.nodes)) {
> +			if (node_isset(thisnid, pol->v.nodes)) {
> +				moron = true;
> +				polnid = thisnid;
> +			} else {
> +				goto out;
> +			}
> +		} else if (!(pol->flags & MPOL_F_MOF)) {
>  			goto out;
> +		} else {
> +			z = first_zones_zonelist(
>  				node_zonelist(numa_node_id(), GFP_HIGHUSER),
>  				gfp_zone(GFP_HIGHUSER),
>  				&pol->v.nodes);
> +			polnid = zone_to_nid(z->zone);
> +		}
>  		break;
>
>  	default:

Did that want to be this instead? I don't think I follow the other
changes.

diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index eddbe4e56c73..2a64913f9ac6 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -2501,8 +2501,11 @@ int mpol_misplaced(struct page *page, struct vm_area_struct *vma, unsigned long
 		 * else select nearest allowed node, if any.
 		 * If no allowed nodes, use current [!misplaced].
 		 */
-		if (node_isset(curnid, pol->v.nodes))
+		if (node_isset(curnid, pol->v.nodes)) {
+			if (node_isset(thisnid, pol->v.nodes))
+				goto moron;
 			goto out;
+		}
 		z = first_zones_zonelist(
 				node_zonelist(numa_node_id(), GFP_HIGHUSER),
 				gfp_zone(GFP_HIGHUSER),
@@ -2516,6 +2519,7 @@ int mpol_misplaced(struct page *page, struct vm_area_struct *vma, unsigned long
 
 	/* Migrate the page towards the node whose CPU is referencing it */
 	if (pol->flags & MPOL_F_MORON) {
+moron:
 		polnid = thisnid;
 
 		if (!should_numa_migrate_memory(current, page, curnid, thiscpu))
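
For anyone following along, the MPOL_BIND decision in the diff above can
be modelled in userspace roughly as below. This is only a sketch under
assumptions, not kernel code: nodemask_model_t, enum bind_action,
node_in_mask(), bind_misplaced_model() and bind_model_selftest() are
made-up names, a 64-bit bitmask stands in for nodemask_t, and neither the
should_numa_migrate_memory() filter nor the zonelist lookup for the
nearest allowed node is modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for nodemask_t: bit n set means node n is allowed. */
typedef uint64_t nodemask_model_t;

/* Hypothetical outcome of the MPOL_BIND branch of mpol_misplaced(). */
enum bind_action {
	BIND_KEEP,            /* page stays where it is ("goto out") */
	BIND_TRY_ACCESSING,   /* candidate for the accessing node ("goto moron") */
	BIND_NEAREST_ALLOWED, /* page is outside the mask, pull it back in */
};

static bool node_in_mask(int nid, nodemask_model_t mask)
{
	return nid >= 0 && nid < 64 && ((mask >> nid) & 1u);
}

/*
 * curnid:  node the page currently lives on
 * thisnid: node of the CPU taking the hint fault
 * mask:    nodes the MPOL_BIND policy allows
 */
static enum bind_action bind_misplaced_model(int curnid, int thisnid,
					     nodemask_model_t mask)
{
	if (node_in_mask(curnid, mask)) {
		/*
		 * Both nodes are allowed: the page may follow the accessor.
		 * If it is already local there is nothing to move, which
		 * mirrors the later "curnid != polnid" check.
		 */
		if (node_in_mask(thisnid, mask) && curnid != thisnid)
			return BIND_TRY_ACCESSING;
		return BIND_KEEP;
	}
	/* Page violates the binding: existing behaviour, unchanged. */
	return BIND_NEAREST_ALLOWED;
}

/* Small usage example: policy bound to nodes 0 and 1 (mask 0x3). */
static void bind_model_selftest(void)
{
	assert(bind_misplaced_model(0, 1, 0x3) == BIND_TRY_ACCESSING);
	assert(bind_misplaced_model(0, 2, 0x3) == BIND_KEEP);
	assert(bind_misplaced_model(2, 0, 0x3) == BIND_NEAREST_ALLOWED);
}
```

The only new behaviour relative to the existing code is the
BIND_TRY_ACCESSING case; the other two outcomes are what MPOL_BIND
already does today.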