Date: Wed, 24 Jun 2020 10:25:59 +0200
From: Michal Hocko
To: Ben Widawsky
Cc: linux-mm, Andrew Morton, Lee Schermerhorn
Subject: Re: [PATCH 02/18] mm/mempolicy: Use node_mem_id() instead of node_id()
Message-ID: <20200624082559.GE1320@dhcp22.suse.cz>
References: <20200619162425.1052382-1-ben.widawsky@intel.com> <20200619162425.1052382-3-ben.widawsky@intel.com>
In-Reply-To: <20200619162425.1052382-3-ben.widawsky@intel.com>

On Fri 19-06-20 09:24:09, Ben Widawsky wrote:
> Calling out some distinctions first as I understand it, and the
> reasoning of the patch:
> numa_node_id() - The node id for the currently running CPU.
> numa_mem_id() - The node id for the closest memory node.

Correct.
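For reference, numa_mem_id() boils down to something like this (a
simplified sketch of include/linux/topology.h, not the verbatim
definition):

	#ifdef CONFIG_HAVE_MEMORYLESS_NODES
	/* nearest node with memory, maintained via set_numa_mem() */
	static inline int numa_mem_id(void)
	{
		return __this_cpu_read(_numa_mem_);
	}
	#else
	/* without memoryless nodes the two ids are always the same */
	static inline int numa_mem_id(void)
	{
		return numa_node_id();
	}
	#endif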
> The case where they are not the same is CONFIG_HAVE_MEMORYLESS_NODES.
> Only ia64 and powerpc support this option, so it is perhaps not a very
> interesting situation to most.

Other arches can have nodes without any memory as well. Just offline
all the managed memory via hotplug... (please note that such a node
might still have memory present! Just not usable by the page
allocator)

> The question is, when you do want which?

The short answer is that you shouldn't really care. The fact that we
do, and that we have a distinct API for it, is IMHO a mistake of the
past. A slightly longer answer would be that the allocator should fall
back to a proper node(s) even if you specify a memoryless node as the
preferred one. That is achieved by a proper zonelist for each existing
NUMA node. There are bugs here and there when some nodes do not get
their zonelists initialized, but fundamentally this should be a
no-brainer. There are also corner cases where an application might
have been bound to a node which went offline completely, which would
be "fun".

> numa_node_id() is definitely
> what's desired if MPOL_PREFERRED, or MPOL_LOCAL were used, since the ABI
> states "This mode specifies "local allocation"; the memory is allocated
> on the node of the CPU that triggered the allocation (the "local
> node")."

In fact, from the allocator's point of view there is no real
difference, because there is obviously nothing to allocate from on a
node without memory, so it would fall back to the next node/zones from
the closest node...

> It would be weird, though not impossible, to set this policy on
> a CPU that has memoryless nodes.

Keep in mind that the memory might have gone away via hotplug.

> A more likely way to hit this is with
> interleaving. The current interfaces will return some equally weird
> thing, but at least it's symmetric. Therefore, in cases where the node
> is being queried for the currently running process, it probably makes
> sense to use numa_node_id(). For other cases however, when the CPU is
> trying to obtain the "local" memory, numa_mem_id() already contains
> this and should be used instead.
>
> This really should only affect configurations where
> CONFIG_HAVE_MEMORYLESS_NODES=y, and even on those machines it's quite
> possible the ultimate behavior would be identical.
>
> Cc: Andrew Morton
> Cc: Lee Schermerhorn
> Signed-off-by: Ben Widawsky

Well, mempolicy.c uses numa_node_id() in most cases and I do not see
these two being special, so if we want to change that it should be
done consistently. I even suspect that these changes are mostly
no-ops, because the respective zonelists will do the right thing. But
there might be land mines here and there - e.g. if __GFP_THISNODE was
used, then somebody might expect a failure rather than a misplaced
allocation, because there is no other fallback mechanism on a depleted
NUMA node.
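To illustrate the __GFP_THISNODE point (a sketch with a hypothetical
caller running on a memoryless node N whose nearest memory node is M -
this is not code from the patch):

	/* nid == N: no populated zones, no fallback -> returns NULL */
	page = __alloc_pages_node(numa_node_id(),
				  GFP_KERNEL | __GFP_THISNODE, 0);

	/*
	 * nid == M: succeeds on M - but a caller that asked for N
	 * might have relied on getting the failure instead.
	 */
	page = __alloc_pages_node(numa_mem_id(),
				  GFP_KERNEL | __GFP_THISNODE, 0);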
All that being said, I am not sure this is an actual improvement.

> ---
>  mm/mempolicy.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/mm/mempolicy.c b/mm/mempolicy.c
> index 36ee3267c25f..99e0f3f9c4a6 100644
> --- a/mm/mempolicy.c
> +++ b/mm/mempolicy.c
> @@ -1991,7 +1991,7 @@ static unsigned offset_il_node(struct mempolicy *pol, unsigned long n)
>  	int nid;
>
>  	if (!nnodes)
> -		return numa_node_id();
> +		return numa_mem_id();
>  	target = (unsigned int)n % nnodes;
>  	nid = first_node(pol->v.nodes);
>  	for (i = 0; i < target; i++)
> @@ -2049,7 +2049,7 @@ int huge_node(struct vm_area_struct *vma, unsigned long addr, gfp_t gfp_flags,
>  		nid = interleave_nid(*mpol, vma, addr,
>  				huge_page_shift(hstate_vma(vma)));
>  	} else {
> -		nid = policy_node(gfp_flags, *mpol, numa_node_id());
> +		nid = policy_node(gfp_flags, *mpol, numa_mem_id());
>  		if ((*mpol)->mode == MPOL_BIND)
>  			*nodemask = &(*mpol)->v.nodes;
>  	}
> --
> 2.27.0

-- 
Michal Hocko
SUSE Labs