linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Rongwei Wang <rongwei.wang@linux.alibaba.com>
To: Pierre Gondois <pierre.gondois@arm.com>,
	linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: akpm@linux-foundation.org, willy@infradead.org,
	catalin.marinas@arm.com, dave.hansen@linux.intel.com,
	tj@kernel.org, mingo@redhat.com
Subject: Re: [PATCH RFC 0/5] support NUMA emulation for arm64
Date: Thu, 12 Oct 2023 21:30:50 +0800	[thread overview]
Message-ID: <57eba42c-732a-4a30-a714-5e5538f2e5d5@linux.alibaba.com> (raw)
In-Reply-To: <f43112b2-ef8a-46ba-b57f-e3f0cc81b638@arm.com>


On 2023/10/12 20:37, Pierre Gondois wrote:
> Hello Rongwei,
>
> On 10/12/23 04:48, Rongwei Wang wrote:
>> A brief introduction
>> ====================
>>
>> The NUMA emulation can fake more node base on a single
>> node system, e.g.
>>
>> one node system:
>>
>> [root@localhost ~]# numactl -H
>> available: 1 nodes (0)
>> node 0 cpus: 0 1 2 3 4 5 6 7
>> node 0 size: 31788 MB
>> node 0 free: 31446 MB
>> node distances:
>> node   0
>>    0:  10
>>
>> add numa=fake=2 (fake 2 node on each origin node):
>>
>> [root@localhost ~]# numactl -H
>> available: 2 nodes (0-1)
>> node 0 cpus: 0 1 2 3 4 5 6 7
>> node 0 size: 15806 MB
>> node 0 free: 15451 MB
>> node 1 cpus: 0 1 2 3 4 5 6 7
>> node 1 size: 16029 MB
>> node 1 free: 15989 MB
>> node distances:
>> node   0   1
>>    0:  10  10
>>    1:  10  10
>>
>> As above shown, a new node has been faked. As cpus, the realization
>> of x86 NUMA emulation is kept. Maybe each node should has 4 cores is
>> better (not sure, next to do if so).
>>
>> Why do this
>> ===========
>>
>> It seems has following reasons:
>>    (1) In x86 host, apply NUMA emulation can fake more nodes environment
>>        to test or verify some performance stuff, but arm64 only has
>>        one method that modify ACPI table to do this. It's troublesome
>>        more or less.
>>    (2) Reduce competition for some locks. Here an example we found:
>>        will-it-scale/tlb_flush1_processes -t 96 -s 10, it shows obvious
>>        hotspot on lruvec->lock when test in single environment. What's
>>        more, The performance improved greatly if test in two more nodes
>>        system. The data shows below (more is better):
>>
>> ---------------------------------------------------------------------
>>        threads/process |   1     |     12   |     24   | 48     |   96
>> ---------------------------------------------------------------------
>>        one node        | 14 1122 | 110 5372 | 111 2615 | 79 7084  | 
>> 72 4516
>> ---------------------------------------------------------------------
>>        numa=fake=2     | 14 1168 | 144 4848 | 215 9070 | 157 0412 | 
>> 142 3968
>> ---------------------------------------------------------------------
>>                        | For concurrency 12, no lruvec->lock hotspot. 
>> For 24,
>>        hotspot         | one node has 24% hotspot on lruvec->lock, but
>>                        | two nodes env hasn't.
>> ---------------------------------------------------------------------
>>
>> As for risks (e.g. numa balance...), they need to be discussed here.
>>
>> Lastly, this just is a draft, I can improve next if it's acceptable.
>
> I'm not engaging on the utility/relevance of the patch-set, but I tried
> them on an arm64 system with the 'numa=fake=2' parameter and could not

Sorry, my fault.

I should mention this in previous brief introduction: acpi=on numa=fake=2.

The default patch of arm64 numa initialize is numa_init() -> 
dummy_numa_init() if turn off acpi (this path has not been taken into 
account yet in this patch, next will to do).

What's more, if you test these patchset in qemu-kvm, you should add 
below parameters in the script.

object memory-backend-ram,id=mem0,size=32G \
numa node,memdev=mem0,cpus=0-7,nodeid=0 \

(Above parameters just make sure SRAT table has NUMA configure, avoiding 
path of numa_init() -> dummy_numa_init())

> see 2 nodes being created under:
>   /sys/devices/system/node/
> Indeed it seems that even though numa_emulation() is moved to a generic
> mm/numa.c file, the function is only called from:
>   arch/x86/mm/numa.c:numa_init()
> (or maybe I'm misinterpreting the intent of the patches).

Here drivers/base/arch_numa.c:numa_init() has called numa_emulation() (I 
guess it works if you add acpi=on :-)).


>
> Also I had the following errors when building (still for arm64):
> mm/numa.c:862:8: error: implicit declaration of function 
> 'early_cpu_to_node' is invalid in C99 
> [-Werror,-Wimplicit-function-declaration]
>         nid = early_cpu_to_node(cpu);

It seems CONFIG_DEBUG_PER_CPU_MAPS enabled in your environment? You can 
disable CONFIG_DEBUG_PER_CPU_MAPS and test it again.

I have not test it with CONFIG_DEBUG_PER_CPU_MAPS enabled. It's very 
helpful, I will fix it next time.

If you have any questions, please let me know.

Regards,

-wrw

> ^
> mm/numa.c:862:8: note: did you mean 'early_map_cpu_to_node'?
> ./include/asm-generic/numa.h:37:13: note: 'early_map_cpu_to_node' 
> declared here
> void __init early_map_cpu_to_node(unsigned int cpu, int nid);
>             ^
> mm/numa.c:874:3: error: implicit declaration of function 
> 'debug_cpumask_set_cpu' is invalid in C99 
> [-Werror,-Wimplicit-function-declaration]
>                 debug_cpumask_set_cpu(cpu, nid, enable);
>                 ^
> mm/numa.c:874:3: note: did you mean '__cpumask_set_cpu'?
> ./include/linux/cpumask.h:474:29: note: '__cpumask_set_cpu' declared here
> static __always_inline void __cpumask_set_cpu(unsigned int cpu, struct 
> cpumask *dstp)
>                             ^
> 2 errors generated.
>
> Regards,
> Pierre
>
>>
>> Thanks!
>>
>> Rongwei Wang (5):
>>    mm/numa: move numa emulation APIs into generic files
>>    mm: percpu: fix variable type of cpu
>>    arch_numa: remove __init in early_cpu_to_node()
>>    mm/numa: support CONFIG_NUMA_EMU for arm64
>>    mm/numa: migrate leftover numa emulation into mm/numa.c
>>
>>   arch/x86/Kconfig                          |   8 -
>>   arch/x86/include/asm/numa.h               |   3 -
>>   arch/x86/mm/Makefile                      |   1 -
>>   arch/x86/mm/numa.c                        | 216 +-------------
>>   arch/x86/mm/numa_internal.h               |  14 +-
>>   drivers/base/arch_numa.c                  |   7 +-
>>   include/asm-generic/numa.h                |  33 +++
>>   include/linux/percpu.h                    |   2 +-
>>   mm/Kconfig                                |   8 +
>>   mm/Makefile                               |   1 +
>>   arch/x86/mm/numa_emulation.c => mm/numa.c | 333 +++++++++++++++++++++-
>>   11 files changed, 373 insertions(+), 253 deletions(-)
>>   rename arch/x86/mm/numa_emulation.c => mm/numa.c (63%)
>>


  reply	other threads:[~2023-10-12 13:31 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-10-12  2:48 Rongwei Wang
2023-10-12  2:48 ` [PATCH RFC 1/5] mm/numa: move numa emulation APIs into generic files Rongwei Wang
2023-10-12  6:05   ` Ingo Molnar
2023-10-12  2:48 ` [PATCH RFC 2/5] mm: percpu: fix variable type of cpu Rongwei Wang
2023-10-12  2:48 ` [PATCH RFC 3/5] arch_numa: remove __init in early_cpu_to_node() Rongwei Wang
2023-10-12  2:48 ` [PATCH RFC 4/5] mm/numa: support CONFIG_NUMA_EMU for arm64 Rongwei Wang
2023-10-12  2:48 ` [PATCH RFC 5/5] mm/numa: migrate leftover numa emulation into mm/numa.c Rongwei Wang
2023-10-12 12:37 ` [PATCH RFC 0/5] support NUMA emulation for arm64 Pierre Gondois
2023-10-12 13:30   ` Rongwei Wang [this message]
2023-10-23 13:03     ` Pierre Gondois
2024-02-20 11:36 ` [PATCH v1 0/2] support NUMA emulation for genertic arch Rongwei Wang
2024-02-20 11:36   ` [PATCH v1 1/2] arch_numa: remove __init for early_cpu_to_node Rongwei Wang
2024-02-20 11:36   ` [PATCH v1 2/2] numa: introduce numa emulation for genertic arch Rongwei Wang
2024-02-21  6:12   ` [PATCH v1 0/2] support NUMA " Mike Rapoport
2024-02-21 15:51     ` Pierre Gondois
2024-02-29  3:26       ` Rongwei Wang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=57eba42c-732a-4a30-a714-5e5538f2e5d5@linux.alibaba.com \
    --to=rongwei.wang@linux.alibaba.com \
    --cc=akpm@linux-foundation.org \
    --cc=catalin.marinas@arm.com \
    --cc=dave.hansen@linux.intel.com \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mingo@redhat.com \
    --cc=pierre.gondois@arm.com \
    --cc=tj@kernel.org \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox