Message-ID: <57eba42c-732a-4a30-a714-5e5538f2e5d5@linux.alibaba.com>
Date: Thu, 12 Oct 2023 21:30:50 +0800
Subject: Re: [PATCH RFC 0/5] support NUMA emulation for arm64
From: Rongwei Wang
To: Pierre Gondois, linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: akpm@linux-foundation.org, willy@infradead.org, catalin.marinas@arm.com, dave.hansen@linux.intel.com, tj@kernel.org, mingo@redhat.com
References: <20231012024842.99703-1-rongwei.wang@linux.alibaba.com>

On 2023/10/12 20:37, Pierre Gondois wrote:
> Hello Rongwei,
>
> On 10/12/23 04:48, Rongwei Wang wrote:
>> A brief introduction
>> ====================
>>
>> The NUMA emulation can fake more nodes on top of a single-node
>> system, e.g.
>>
>> one node system:
>>
>> [root@localhost ~]# numactl -H
>> available: 1 nodes (0)
>> node 0 cpus: 0 1 2 3 4 5 6 7
>> node 0 size: 31788 MB
>> node 0 free: 31446 MB
>> node distances:
>> node   0
>>    0:  10
>>
>> add numa=fake=2 (fake 2 nodes on each original node):
>>
>> [root@localhost ~]# numactl -H
>> available: 2 nodes (0-1)
>> node 0 cpus: 0 1 2 3 4 5 6 7
>> node 0 size: 15806 MB
>> node 0 free: 15451 MB
>> node 1 cpus: 0 1 2 3 4 5 6 7
>> node 1 size: 16029 MB
>> node 1 free: 15989 MB
>> node distances:
>> node   0   1
>>    0:  10  10
>>    1:  10  10
>>
>> As shown above, a new node has been faked. As for CPUs, the behavior
>> of the x86 NUMA emulation is kept. Maybe it would be better for each
>> node to have 4 cores (not sure; something to do next if so).
>>
>> Why do this
>> ===========
>>
>> There seem to be the following reasons:
>>    (1) On an x86 host, NUMA emulation can fake a multi-node environment
>>        to test or verify performance behavior, but on arm64 the only
>>        way to do this is to modify the ACPI table, which is more or
>>        less troublesome.
>>    (2) Reduce competition for some locks. Here is an example we found:
>>        will-it-scale/tlb_flush1_processes -t 96 -s 10 shows an obvious
>>        hotspot on lruvec->lock when tested in a single-node environment.
>>        What's more, the performance improved greatly when tested on a
>>        system with two or more nodes.
>>        The data is shown below (more is better):
>>
>> ---------------------------------------------------------------------
>>        threads/process |    1    |    12    |    24    |    48    |    96
>> ---------------------------------------------------------------------
>>        one node        | 14 1122 | 110 5372 | 111 2615 | 79 7084  | 72 4516
>> ---------------------------------------------------------------------
>>        numa=fake=2     | 14 1168 | 144 4848 | 215 9070 | 157 0412 | 142 3968
>> ---------------------------------------------------------------------
>>        hotspot         | For concurrency 12, no lruvec->lock hotspot. For 24,
>>                        | one node has a 24% hotspot on lruvec->lock, but
>>                        | the two-node environment does not.
>> ---------------------------------------------------------------------
>>
>> As for risks (e.g. numa balance...), they need to be discussed here.
>>
>> Lastly, this is just a draft; I can improve it next if it's acceptable.
>
> I'm not engaging on the utility/relevance of the patch-set, but I tried
> them on an arm64 system with the 'numa=fake=2' parameter and could not

Sorry, my fault. I should have mentioned this in the brief introduction:
acpi=on numa=fake=2.

With ACPI turned off, the default arm64 NUMA initialization path is
numa_init() -> dummy_numa_init() (this patch set does not take that path
into account yet; I will handle it next).

What's more, if you test this patch set in qemu-kvm, you should add the
parameters below to the script.
-object memory-backend-ram,id=mem0,size=32G \
-numa node,memdev=mem0,cpus=0-7,nodeid=0 \

(The above parameters just make sure the SRAT table has a NUMA
configuration, avoiding the numa_init() -> dummy_numa_init() path.)

> see 2 nodes being created under:
>   /sys/devices/system/node/
> Indeed it seems that even though numa_emulation() is moved to a generic
> mm/numa.c file, the function is only called from:
>   arch/x86/mm/numa.c:numa_init()
> (or maybe I'm misinterpreting the intent of the patches).

Here drivers/base/arch_numa.c:numa_init() does call numa_emulation()
(I guess it works if you add acpi=on :-)).

>
> Also I had the following errors when building (still for arm64):
> mm/numa.c:862:8: error: implicit declaration of function
> 'early_cpu_to_node' is invalid in C99
> [-Werror,-Wimplicit-function-declaration]
>         nid = early_cpu_to_node(cpu);

It seems CONFIG_DEBUG_PER_CPU_MAPS is enabled in your environment? You can
disable CONFIG_DEBUG_PER_CPU_MAPS and test again. I have not tested with
CONFIG_DEBUG_PER_CPU_MAPS enabled.

This is very helpful; I will fix it in the next version.

If you have any questions, please let me know.

Regards,
-wrw

>               ^
> mm/numa.c:862:8: note: did you mean 'early_map_cpu_to_node'?
> ./include/asm-generic/numa.h:37:13: note: 'early_map_cpu_to_node'
> declared here
> void __init early_map_cpu_to_node(unsigned int cpu, int nid);
>             ^
> mm/numa.c:874:3: error: implicit declaration of function
> 'debug_cpumask_set_cpu' is invalid in C99
> [-Werror,-Wimplicit-function-declaration]
>                 debug_cpumask_set_cpu(cpu, nid, enable);
>                 ^
> mm/numa.c:874:3: note: did you mean '__cpumask_set_cpu'?
> ./include/linux/cpumask.h:474:29: note: '__cpumask_set_cpu' declared here
> static __always_inline void __cpumask_set_cpu(unsigned int cpu, struct
> cpumask *dstp)
>                             ^
> 2 errors generated.
>
> Regards,
> Pierre
>
>>
>> Thanks!
>>
>> Rongwei Wang (5):
>>    mm/numa: move numa emulation APIs into generic files
>>    mm: percpu: fix variable type of cpu
>>    arch_numa: remove __init in early_cpu_to_node()
>>    mm/numa: support CONFIG_NUMA_EMU for arm64
>>    mm/numa: migrate leftover numa emulation into mm/numa.c
>>
>>   arch/x86/Kconfig                          |   8 -
>>   arch/x86/include/asm/numa.h               |   3 -
>>   arch/x86/mm/Makefile                      |   1 -
>>   arch/x86/mm/numa.c                        | 216 +-------------
>>   arch/x86/mm/numa_internal.h               |  14 +-
>>   drivers/base/arch_numa.c                  |   7 +-
>>   include/asm-generic/numa.h                |  33 +++
>>   include/linux/percpu.h                    |   2 +-
>>   mm/Kconfig                                |   8 +
>>   mm/Makefile                               |   1 +
>>   arch/x86/mm/numa_emulation.c => mm/numa.c | 333 +++++++++++++++++++++-
>>   11 files changed, 373 insertions(+), 253 deletions(-)
>>   rename arch/x86/mm/numa_emulation.c => mm/numa.c (63%)
>>
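
To check from userspace whether the fake nodes were created, here is a
minimal sketch (not part of the patch set) that counts the nodeN
directories under /sys/devices/system/node/, the location mentioned in
the thread above. The function name list_numa_nodes and its node_dir
parameter are hypothetical and chosen only for illustration; the sketch
assumes the usual sysfs layout of one nodeN directory per node.

```python
import os
import re

# Standard sysfs location for NUMA node directories on Linux.
NODE_DIR = "/sys/devices/system/node"


def list_numa_nodes(node_dir=NODE_DIR):
    """Return the sorted IDs of the nodeN directories found in node_dir."""
    pattern = re.compile(r"^node(\d+)$")
    ids = []
    for entry in os.listdir(node_dir):
        match = pattern.match(entry)
        # Only directories named node<digits> count; files such as
        # "possible" or "online" in the same directory are skipped.
        if match and os.path.isdir(os.path.join(node_dir, entry)):
            ids.append(int(match.group(1)))
    return sorted(ids)


if __name__ == "__main__":
    nodes = list_numa_nodes()
    print(f"{len(nodes)} node(s) visible in sysfs: {nodes}")
```

If booting with acpi=on numa=fake=2 works as described, this should
report two nodes on a system that previously reported one; a single node
would suggest the numa_init() -> dummy_numa_init() path was taken.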