Date: Tue, 8 Oct 2019 12:17:29 +0100
From: Jonathan Cameron
To: Ingo Molnar
CC: Keith Busch, "Rafael J . Wysocki", "Andrew Morton", Dan Williams
Subject: Re: [PATCH V5 3/4] x86: Support Generic Initiator only proximity domains
Message-ID: <20191008121729.00005ee9@huawei.com>
In-Reply-To: <20191007145505.GB88143@gmail.com>
References: <20191004114330.104746-1-Jonathan.Cameron@huawei.com> <20191004114330.104746-4-Jonathan.Cameron@huawei.com> <20191007145505.GB88143@gmail.com>
Organization: Huawei

On Mon, 7 Oct 2019 16:55:05 +0200
Ingo Molnar wrote:

> * Jonathan Cameron wrote:
>
> > Done in a somewhat different fashion to arm64.
> > Here the infrastructure for memoryless domains was already
> > in place. That infrastructure applies just as well to
> > domains that also don't have a CPU, hence it works for
> > Generic Initiator Domains.
> >
> > In common with memoryless domains we only register GI domains
> > if the proximity node is not online. If a domain is already
> > a memory containing domain, or a memoryless domain there is
> > nothing to do just because it also contains a Generic Initiator.
> >
> > Signed-off-by: Jonathan Cameron
> > ---
> >  arch/x86/include/asm/numa.h |  2 ++
> >  arch/x86/kernel/setup.c     |  1 +
> >  arch/x86/mm/numa.c          | 14 ++++++++++++++
> >  3 files changed, 17 insertions(+)
> >
> > diff --git a/arch/x86/include/asm/numa.h b/arch/x86/include/asm/numa.h
> > index bbfde3d2662f..f631467272a3 100644
> > --- a/arch/x86/include/asm/numa.h
> > +++ b/arch/x86/include/asm/numa.h
> > @@ -62,12 +62,14 @@ extern void numa_clear_node(int cpu);
> >  extern void __init init_cpu_to_node(void);
> >  extern void numa_add_cpu(int cpu);
> >  extern void numa_remove_cpu(int cpu);
> > +extern void init_gi_nodes(void);
> >  #else	/* CONFIG_NUMA */
> >  static inline void numa_set_node(int cpu, int node)	{ }
> >  static inline void numa_clear_node(int cpu)		{ }
> >  static inline void init_cpu_to_node(void)		{ }
> >  static inline void numa_add_cpu(int cpu)		{ }
> >  static inline void numa_remove_cpu(int cpu)		{ }
> > +static inline void init_gi_nodes(void)			{ }
> >  #endif	/* CONFIG_NUMA */
> >
> >  #ifdef CONFIG_DEBUG_PER_CPU_MAPS
> > diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> > index cfb533d42371..b6c977907ea5 100644
> > --- a/arch/x86/kernel/setup.c
> > +++ b/arch/x86/kernel/setup.c
> > @@ -1264,6 +1264,7 @@ void __init setup_arch(char **cmdline_p)
> >  	prefill_possible_map();
> >
> >  	init_cpu_to_node();
> > +	init_gi_nodes();
> >
> >  	io_apic_init_mappings();
> >
> > diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c
> > index 4123100e0eaf..50bf724a425e 100644
> > --- a/arch/x86/mm/numa.c
> > +++ b/arch/x86/mm/numa.c
> > @@ -733,6 +733,20 @@ static void __init init_memory_less_node(int nid)
> >  	 */
> >  }
> >
> > +/*
> > + * Generic Initiator Nodes may have neither CPU nor Memory.
> > + * At this stage if either of the others were present we would
> > + * already be online.
> > + */
> > +void __init init_gi_nodes(void)
> > +{
> > +	int nid;
> > +
> > +	for_each_node_state(nid, N_GENERIC_INITIATOR)
> > +		if (!node_online(nid))
> > +			init_memory_less_node(nid);
> > +}
>
> Nit: missing curly braces.

Good point.

>
> How do these work in practice, will a system that only had nodes 0-1
> today grow a third node '2' that won't have any CPUs or memory on them?

Yes. Exactly that. The result is that fallback lists etc work when
_PXM is used to assign a device into that new node.

The interesting bit comes when a driver goes further and queries the
NUMA distances from SLIT. At that point the driver can elect to do
load balancing across multiple nodes at similar distances.

In theory you can also specify a device you wish to put into the node
via the SRAT entry (IIRC using segment + BDF for PCI devices), but
for now I haven't implemented that method.

>
> Thanks,
>
> 	Ingo

Thanks,

Jonathan
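
To illustrate the load-balancing point above, here is a rough userspace
sketch, not kernel code. The three-node distance table, GI_NODE, and the
helpers nearest_nodes() and pick_node() are all made up for illustration;
in the kernel a driver would presumably read the distances via
node_distance() rather than a hard-coded table.

```c
#define NR_NODES 3
#define GI_NODE  2	/* hypothetical node with neither CPUs nor memory */

/* Hypothetical SLIT-style table: 10 means local, larger means further. */
static const int node_dist[NR_NODES][NR_NODES] = {
	{ 10, 20, 15 },
	{ 20, 10, 15 },
	{ 15, 15, 10 },
};

/*
 * Fill 'out' with every node other than 'from' that sits at the minimum
 * distance from 'from'. Returns how many nodes were written.
 */
int nearest_nodes(int from, int out[NR_NODES])
{
	int best = -1;
	int count = 0;
	int n;

	/* First pass: find the smallest distance to any other node. */
	for (n = 0; n < NR_NODES; n++) {
		if (n == from)
			continue;
		if (best < 0 || node_dist[from][n] < best)
			best = node_dist[from][n];
	}

	/* Second pass: collect all nodes at exactly that distance. */
	for (n = 0; n < NR_NODES; n++) {
		if (n != from && node_dist[from][n] == best)
			out[count++] = n;
	}

	return count;
}

/* Round-robin over the equidistant nodes; 'seq' is a running counter. */
int pick_node(const int *nodes, int count, unsigned int seq)
{
	return nodes[seq % count];
}
```

With this table, nodes 0 and 1 are equally close to node 2, so
successive allocations on behalf of a device in node 2 would alternate
between them.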