From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <99eee28a-4e45-4d34-ace8-88f9a8e3ca2a@sk.com>
Date: Mon, 20 Jan 2025 13:35:37 +0900
MIME-Version: 1.0
User-Agent: Mozilla Thunderbird
From: Hyeonggon Yoo
To: Zi Yan
Cc: kernel_team@skhynix.com, 42.hyeyoo@gmail.com, Bruno Faccini,
 "linux-kernel@vger.kernel.org", "linux-mm@kvack.org",
 "akpm@linux-foundation.org", "rppt@kernel.org", "david@redhat.com",
 "jhubbard@nvidia.com", "mrusiniak@nvidia.com"
Subject: Re: [fake numa not working] [PATCH 1/1] mm/fake-numa: allow later numa node hotplug
References: <20250106120659.359610-1-bfaccini@nvidia.com>
 <20250106120659.359610-2-bfaccini@nvidia.com>
 <1DC02706-256B-4B61-B309-7D86595F4B22@nvidia.com>
 <7a641617-a4e5-45cd-bb1f-628bacd44046@sk.com>
Content-Language: en-US
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Sender: owner-linux-mm@kvack.org
Precedence: bulk

On 1/20/2025 12:27 PM, Zi Yan wrote:
> On Sun Jan 19, 2025 at 9:37 PM EST, Hyeonggon Yoo wrote:
>>
>>
>> On 1/20/2025 11:16 AM, Zi Yan wrote:
>>> On Sun Jan 19, 2025 at 7:37 PM EST, Hyeonggon Yoo wrote:
>>>>
>>>>
>>>> On 1/18/2025 4:09 AM, Zi Yan wrote:
>>>>> On 17 Jan 2025, at 0:00, Hyeonggon Yoo wrote:
>>>>>
>>>>>> On 1/6/2025 9:06 PM, Bruno Faccini wrote:
>>>>>>> Current fake-numa implementation prevents new Numa
>>>>>>> nodes to be later hot-plugged by drivers.
>>>>>>> A common symptom of this limitation is the
>>>>>>> "node was absent from the node_possible_map"
>>>>>>> message by associated warning in mm/memory_hotplug.c:
>>>>>>> add_memory_resource().
>>>>>>> This comes from the lack of remapping in both
>>>>>>> pxm_to_node_map[] and node_to_pxm_map[] tables
>>>>>>> to take fake-numa nodes into account and thus
>>>>>>> triggers collisions with original and physical nodes
>>>>>>> only-mapping that had been determined from BIOS tables.
>>>>>>> This patch fixes this by doing the necessary node-ids
>>>>>>> translation in both pxm_to_node_map[]/node_to_pxm_map[]
>>>>>>> tables.
>>>>>>> node_distance[] table has also been fixed accordingly.
>>>>>>>
>>>>>>> Signed-off-by: Bruno Faccini
>>>>>>
>>>>>> Hi Bruno,
>>>>>>
>>>>>> This commit causes WARN() and disables fakenuma
>>>>>> when I pass "numa=fake=4U" on my QEMU setup.
>>>>>>
>>>>>> Attaching WARN() log and git bisect log to help debugging.
>>>>>
>>>>> Is your VM getting 4 NUMA nodes at the end?
>>>>
>>>> No, it only gets node 0 at the end.
>>>
>>> Got it.
>>>
>>>>
>>>>> Can you also share your QEMU command line and your kernel config?
>>>>
>>>> The config is attached (fakenuma.config)
>>>>
>>>> QEMU command line:
>>>> $ qemu-system-x86_64 \
>>>>     -cpu host \
>>>>     -enable-kvm \
>>>>     -smp $(nproc) \
>>>>     -m 32G \
>>>>     -nographic \
>>>>     -kernel arch/x86/boot/bzImage \
>>>>     -initrd ../../rootfs.cpio.gz \
>>>>     -nic user,model=virtio-net-pci \
>>>>     -append "console=ttyS0 numa=fake=4U"
>>>>
>>>>> This can help Bruno reproduce the issue locally. At least, I
>>>>> cannot reproduce it on mm-everything-2025-01-16-23-18.
>>>>
>>>> Still reproduced on mm-everything-2025-01-18-04-55 with my setup.
>>>>
>>>>> I also
>>>>> notice that my VM does not have “No NUMA configuration found”
>>>>> (in your log) in its kernel log and that might explain
>>>>> why I could not reproduce it.
>>>>
>>>> Did you do NUMA configuration (-numa in QEMU) for your VM?
>>>> I didn't, but it's still expected to work 'cause it's fake numa.
>>>
>>> No. It seems that the "No NUMA configuration found" is caused by your
>>> qemu, where dummy_numa_init() is used (maybe ACPI SRAT is not
>>> populated?)
>>
>> FYI, Yes. ACPI SRAT is not populated.
>>
>> $ qemu-system-x86_64 --version
>> QEMU emulator version 6.2.0 (Debian 1:6.2+dfsg-2ubuntu6.24)
>> Copyright (c) 2003-2021 Fabrice Bellard and the QEMU Project developers
>
> Basically, fix_pxm_node_maps() breaks fakenuma when it cannot fix the
> pxm_to_node_map() due to ACPI SRAT is missing. IMHO, it should let
> fakenuma proceed with a warning. The diff below fixed the issue
> locally, but Bruno can tell if I miss anything.
>
> diff --git a/drivers/acpi/numa/srat.c b/drivers/acpi/numa/srat.c
> index 59fffe34c9d0..18d04f054e93 100644
> --- a/drivers/acpi/numa/srat.c
> +++ b/drivers/acpi/numa/srat.c
> @@ -95,9 +95,12 @@ int __init fix_pxm_node_maps(int max_nid)
>  	int i, j, index = -1, count = 0;
>  	nodemask_t nodes_to_enable;
>
> -	if (numa_off || srat_disabled())
> +	if (numa_off)
>  		return -1;
>
> +	if (srat_disabled())
> +		return 0;
> +
>  	/* find fake nodes PXM mapping */
>  	for (i = 0; i < MAX_NUMNODES; i++) {
>  		if (node_to_pxm_map[i] != PXM_INVAL) {
> @@ -117,9 +120,9 @@ int __init fix_pxm_node_maps(int max_nid)
>  			}
>  		}
>  	}
> -	if (WARN(index != max_nid, "%d max nid when expected %d\n",
> -		 index, max_nid))
> -		return -1;
> +	/* No srat data */
> +	if (index != max_nid)
> +		return 0;
>
>  	nodes_clear(nodes_to_enable);

Thanks, this fixes the bug in my environment as well.

>>
>>> I used the patch below to emulate the situation and
>>> reproduced the issue. Since dummy_numa_init() is used when there is
>>> no underlying NUMA architecture, NUMA initialization fails, or NUMA
>>> is disabled on the command line (the last does not apply here),
>>> it should be properly handled. I will let Bruno look into it.
>>>
>>> Thank you for the info.
>>>
>>> diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c
>>> index 64e5cdb2460a..70dfd9bfb23f 100644
>>> --- a/arch/x86/mm/numa.c
>>> +++ b/arch/x86/mm/numa.c
>>> @@ -223,7 +223,7 @@ static int __init dummy_numa_init(void)
>>>   */
>>>  void __init x86_numa_init(void)
>>>  {
>>> -	if (!numa_off) {
>>> +	if (0 && !numa_off) {
>>>  #ifdef CONFIG_ACPI_NUMA
>>>  		if (!numa_init(x86_acpi_numa_init))
>>>  			return;
>>>
>>>>>>
>>>>>> [WARN() splat]
>>>>>>
>>>>>> [    0.009676] No NUMA configuration found
>>>>>> [    0.009677] Faking a node at [mem 0x0000000000000000-0x000000083fffffff]
>>>>>> [    0.009705] Fake node size 16895MB too small, increasing to 16896MB
>>>>>> [    0.009708] Faking node 0 at [mem 0x0000000000001000-0x0000000420000fff] (16896MB)
>>>>>> [    0.009711] Faking node 1 at [mem 0x0000000420001000-0x000000083fffffff] (16895MB)
>>>>>> [    0.009738] ------------[ cut here ]------------
>>>>>> [    0.009739] -1 max nid when expected 1
>>>>>> [    0.009760] WARNING: CPU: 0 PID: 0 at drivers/acpi/numa/srat.c:120 fix_pxm_node_maps+0x577/0x810
>>>>>> [    0.009766] Modules linked in:
>>>>>> [    0.009769] CPU: 0 UID: 0 PID: 0 Comm: swapper Not tainted 6.13.0-rc6+ #31
>>>>>> [    0.009771] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
>>>>>> [    0.009772] RIP: 0010:fix_pxm_node_maps+0x577/0x810
>>>>>> [    0.009774] Code: 8b 85 40 ff ff ff 44 8b 8d 44 ff ff ff 8b 85 48 ff ff ff e9 83 fb ff ff 44 89 f2 44 89 ce 48 c7 c7 aa 2f 84 8d e8 b9 71 01 fe <0f> 0b 41 b8 ff ff ff ff e9 ba fb ff ff 48 c7 c7 a0 d8 d1 8d 44 89
>>>>>> [    0.009776] RSP: 0000:ffffffff8da03be0 EFLAGS: 00010046 ORIG_RAX: 0000000000000000
>>>>>> [    0.009778] RAX: 0000000000000000 RBX: 0000000000000400 RCX: 0000000000000000
>>>>>> [    0.009779] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
>>>>>> [    0.009780] RBP: ffffffff8da03ca8 R08: 0000000000000000 R09: 0000000000000000
>>>>>> [    0.009781] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000001
>>>>>> [    0.009782] R13: 00000000000007ff R14: 0000000000000001 R15: 0000000000000000
>>>>>> [    0.009783] FS: 0000000000000000(0000) GS:ffffffff8de8c000(0000) knlGS:0000000000000000
>>>>>> [    0.009784] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>>>> [    0.009785] CR2: ffff90e4bde01000 CR3: 00000001bd03e000 CR4: 00000000000200b0
>>>>>> [    0.009788] Call Trace:
>>>>>> [    0.009790]
>>>>>> [    0.009792]  ? show_regs+0x71/0x90
>>>>>> [    0.009797]  ? __warn+0x8d/0x150
>>>>>> [    0.009799]  ? fix_pxm_node_maps+0x577/0x810
>>>>>> [    0.009802]  ? report_bug+0x1ab/0x1c0
>>>>>> [    0.009805]  ? fixup_exception+0x27/0x380
>>>>>> [    0.009809]  ? early_fixup_exception+0xa2/0xf0
>>>>>> [    0.009813]  ? do_early_exception+0x28/0x90
>>>>>> [    0.009816]  ? early_idt_handler_common+0x2f/0x3a
>>>>>> [    0.009819]  ? fix_pxm_node_maps+0x577/0x810
>>>>>> [    0.009822]  ? numa_cleanup_meminfo+0x8c/0x5b0
>>>>>> [    0.009828]  numa_emulation+0x4e3/0xad0
>>>>>> [    0.009830]  ? __pfx_dummy_numa_init+0x10/0x10
>>>>>> [    0.009833]  ? __pfx_dummy_numa_init+0x10/0x10
>>>>>> [    0.009836]  numa_memblks_init+0x10c/0x2c0
>>>>>> [    0.009838]  ? __pfx_dummy_numa_init+0x10/0x10
>>>>>> [    0.009840]  numa_init+0x61/0x3e0
>>>>>> [    0.009842]  ? topology_register_boot_apic+0x2a/0x40
>>>>>> [    0.009846]  x86_numa_init+0x65/0x80
>>>>>> [    0.009848]  initmem_init+0xe/0x20
>>>>>> [    0.009850]  setup_arch+0x9d8/0xfa0
>>>>>> [    0.009852]  ? _printk+0x58/0x90
>>>>>> [    0.009855]  start_kernel+0x5f/0xb50
>>>>>> [    0.009860]  x86_64_start_reservations+0x18/0x30
>>>>>> [    0.009863]  x86_64_start_kernel+0xc0/0x110
>>>>>> [    0.009864]  ? setup_ghcb+0xe/0x140
>>>>>> [    0.009867]  common_startup_64+0x13e/0x141
>>>>>> [    0.009871]
>>>>>> [    0.009871] ---[ end trace 0000000000000000 ]---
>>>>>>
>>>>>> [Git bisect log]
>>>>>>
>>>>>> # bad: [f378252a2168c2fbf8fc08b635061e5f6748c1f2] kasan: sw_tags: use str_on_off() helper in kasan_init_sw_tags()
>>>>>> git bisect bad f378252a2168c2fbf8fc08b635061e5f6748c1f2
>>>>>> # good: [cbc5dde0a461240046e8a41c43d7c3b76d5db952] fs/proc: fix softlockup in __read_vmcore (part 2)
>>>>>> git bisect good cbc5dde0a461240046e8a41c43d7c3b76d5db952
>>>>>> # good: [fb73203263021f2ee9dd54d280b1c543d10acd76] mm: move common part of pagetable_*_ctor to helper
>>>>>> git bisect good fb73203263021f2ee9dd54d280b1c543d10acd76
>>>>>> # bad: [8fd966c9310f14d5bbe653cf087853582170aad6] mm, swap: remove old allocation path for HDD
>>>>>> git bisect bad 8fd966c9310f14d5bbe653cf087853582170aad6
>>>>>> # bad: [8646971f02651146554233d63fdd721f5f060973] mm: shmem: skip swapcache for swapin of synchronous swap device
>>>>>> git bisect bad 8646971f02651146554233d63fdd721f5f060973
>>>>>> # good: [b8ba614eaa8dd31702c61e7df7231b1b18b99259] mm/damon/paddr: report filter-passed bytes back for DAMOS_STAT action
>>>>>> git bisect good b8ba614eaa8dd31702c61e7df7231b1b18b99259
>>>>>> # good: [fd0935b8e9e8c2edf05a3d0ffa09d0aa3e9cf2dd] Docs/ABI/damon: document per-region DAMOS filter-passed bytes stat file
>>>>>> git bisect good fd0935b8e9e8c2edf05a3d0ffa09d0aa3e9cf2dd
>>>>>> # good: [3ffd3cd7e2e8793d3b5fbb83c03668c47ab6d599] selftests/damon: remove tests for DAMON debugfs interface
>>>>>> git bisect good 3ffd3cd7e2e8793d3b5fbb83c03668c47ab6d599
>>>>>> # good: [bf9f93f0b7bff8af780330f5094775a39caf3612] mm/damon: remove DAMON debugfs interface
>>>>>> git bisect good bf9f93f0b7bff8af780330f5094775a39caf3612
>>>>>> # bad: [1746744be6ff271e29345a5be4cc144aa33b10ab] mm/memmap: prevent double scanning of memmap by kmemleak
>>>>>> git bisect bad 1746744be6ff271e29345a5be4cc144aa33b10ab
>>>>>> # bad: [ca19a5ab8756ef78c34d02d2b8c266a9cc7bf57d] mm/fake-numa: allow later numa node hotplug
>>>>>> git bisect bad ca19a5ab8756ef78c34d02d2b8c266a9cc7bf57d
>>>>>> # first bad commit: [ca19a5ab8756ef78c34d02d2b8c266a9cc7bf57d] mm/fake-numa: allow later numa node hotplug
>>>>>>
>>>>>>> ---
>>>>>>>  drivers/acpi/numa/srat.c     | 86 ++++++++++++++++++++++++++++++++++++
>>>>>>>  include/acpi/acpi_numa.h     |  5 +++
>>>>>>>  include/linux/numa_memblks.h |  3 ++
>>>>>>>  mm/numa_emulation.c          | 45 ++++++++++++++++---
>>>>>>>  mm/numa_memblks.c            |  2 +-
>>>>>>>  5 files changed, 133 insertions(+), 8 deletions(-)
>>>>>>>
>>>>>>> diff --git a/drivers/acpi/numa/srat.c b/drivers/acpi/numa/srat.c
>>>>>>> index bec0dcd1f9c3..59fffe34c9d0 100644
>>>>>>> --- a/drivers/acpi/numa/srat.c
>>>>>>> +++ b/drivers/acpi/numa/srat.c
>>>>>>> @@ -81,6 +81,92 @@ int acpi_map_pxm_to_node(int pxm)
>>>>>>>  }
>>>>>>>  EXPORT_SYMBOL(acpi_map_pxm_to_node);
>>>>>>> +#ifdef CONFIG_NUMA_EMU
>>>>>>> +/*
>>>>>>> + * Take max_nid - 1 fake-numa nodes into account in both
>>>>>>> + * pxm_to_node_map()/node_to_pxm_map[] tables.
>>>>>>> + */
>>>>>>> +int __init fix_pxm_node_maps(int max_nid)
>>>>>>> +{
>>>>>>> +	static int pxm_to_node_map_copy[MAX_PXM_DOMAINS] __initdata
>>>>>>> +			= { [0 ... MAX_PXM_DOMAINS - 1] = NUMA_NO_NODE };
>>>>>>> +	static int node_to_pxm_map_copy[MAX_NUMNODES] __initdata
>>>>>>> +			= { [0 ... MAX_NUMNODES - 1] = PXM_INVAL };
>>>>>>> +	int i, j, index = -1, count = 0;
>>>>>>> +	nodemask_t nodes_to_enable;
>>>>>>> +
>>>>>>> +	if (numa_off || srat_disabled())
>>>>>>> +		return -1;
>>>>>>> +
>>>>>>> +	/* find fake nodes PXM mapping */
>>>>>>> +	for (i = 0; i < MAX_NUMNODES; i++) {
>>>>>>> +		if (node_to_pxm_map[i] != PXM_INVAL) {
>>>>>>> +			for (j = 0; j <= max_nid; j++) {
>>>>>>> +				if ((emu_nid_to_phys[j] == i) &&
>>>>>>> +				    WARN(node_to_pxm_map_copy[j] != PXM_INVAL,
>>>>>>> +					 "Node %d is already binded to PXM %d\n",
>>>>>>> +					 j, node_to_pxm_map_copy[j]))
>>>>>>> +					return -1;
>>>>>>> +				if (emu_nid_to_phys[j] == i) {
>>>>>>> +					node_to_pxm_map_copy[j] =
>>>>>>> +						node_to_pxm_map[i];
>>>>>>> +					if (j > index)
>>>>>>> +						index = j;
>>>>>>> +					count++;
>>>>>>> +				}
>>>>>>> +			}
>>>>>>> +		}
>>>>>>> +	}
>>>>>>> +	if (WARN(index != max_nid, "%d max nid when expected %d\n",
>>>>>>> +		 index, max_nid))
>>>>>>> +		return -1;
>>>>>>> +
>>>>>>> +	nodes_clear(nodes_to_enable);
>>>>>>> +
>>>>>>> +	/* map phys nodes not used for fake nodes */
>>>>>>> +	for (i = 0; i < MAX_NUMNODES; i++) {
>>>>>>> +		if (node_to_pxm_map[i] != PXM_INVAL) {
>>>>>>> +			for (j = 0; j <= max_nid; j++)
>>>>>>> +				if (emu_nid_to_phys[j] == i)
>>>>>>> +					break;
>>>>>>> +			/* fake nodes PXM mapping has been done */
>>>>>>> +			if (j <= max_nid)
>>>>>>> +				continue;
>>>>>>> +			/* find first hole */
>>>>>>> +			for (j = 0;
>>>>>>> +			     j < MAX_NUMNODES &&
>>>>>>> +				 node_to_pxm_map_copy[j] != PXM_INVAL;
>>>>>>> +			     j++)
>>>>>>> +				;
>>>>>>> +			if (WARN(j == MAX_NUMNODES,
>>>>>>> +				 "Number of nodes exceeds MAX_NUMNODES\n"))
>>>>>>> +				return -1;
>>>>>>> +			node_to_pxm_map_copy[j] = node_to_pxm_map[i];
>>>>>>> +			node_set(j, nodes_to_enable);
>>>>>>> +			count++;
>>>>>>> +		}
>>>>>>> +	}
>>>>>>> +
>>>>>>> +	/* creating reverse mapping in pxm_to_node_map[] */
>>>>>>> +	for (i = 0; i < MAX_NUMNODES; i++)
>>>>>>> +		if (node_to_pxm_map_copy[i] != PXM_INVAL &&
>>>>>>> +		    pxm_to_node_map_copy[node_to_pxm_map_copy[i]] == NUMA_NO_NODE)
>>>>>>> +			pxm_to_node_map_copy[node_to_pxm_map_copy[i]] = i;
>>>>>>> +
>>>>>>> +	/* overwrite with new mapping */
>>>>>>> +	for (i = 0; i < MAX_NUMNODES; i++) {
>>>>>>> +		node_to_pxm_map[i] = node_to_pxm_map_copy[i];
>>>>>>> +		pxm_to_node_map[i] = pxm_to_node_map_copy[i];
>>>>>>> +	}
>>>>>>> +
>>>>>>> +	/* enable other nodes found in PXM for hotplug */
>>>>>>> +	nodes_or(numa_nodes_parsed, nodes_to_enable, numa_nodes_parsed);
>>>>>>> +
>>>>>>> +	pr_debug("found %d total number of nodes\n", count);
>>>>>>> +	return 0;
>>>>>>> +}
>>>>>>> +#endif
>>>>>>> +
>>>>>>>  static void __init
>>>>>>>  acpi_table_print_srat_entry(struct acpi_subtable_header *header)
>>>>>>>  {
>>>>>>> diff --git a/include/acpi/acpi_numa.h b/include/acpi/acpi_numa.h
>>>>>>> index b5f594754a9e..99b960bd473c 100644
>>>>>>> --- a/include/acpi/acpi_numa.h
>>>>>>> +++ b/include/acpi/acpi_numa.h
>>>>>>> @@ -17,11 +17,16 @@ extern int node_to_pxm(int);
>>>>>>>  extern int acpi_map_pxm_to_node(int);
>>>>>>>  extern unsigned char acpi_srat_revision;
>>>>>>>  extern void disable_srat(void);
>>>>>>> +extern int fix_pxm_node_maps(int max_nid);
>>>>>>>  extern void bad_srat(void);
>>>>>>>  extern int srat_disabled(void);
>>>>>>>  #else /* CONFIG_ACPI_NUMA */
>>>>>>> +static inline int fix_pxm_node_maps(int max_nid)
>>>>>>> +{
>>>>>>> +	return 0;
>>>>>>> +}
>>>>>>>  static inline void disable_srat(void)
>>>>>>>  {
>>>>>>>  }
>>>>>>> diff --git a/include/linux/numa_memblks.h b/include/linux/numa_memblks.h
>>>>>>> index cfad6ce7e1bd..dd85613cdd86 100644
>>>>>>> --- a/include/linux/numa_memblks.h
>>>>>>> +++ b/include/linux/numa_memblks.h
>>>>>>> @@ -29,7 +29,10 @@ int __init numa_cleanup_meminfo(struct numa_meminfo *mi);
>>>>>>>  int __init numa_memblks_init(int (*init_func)(void),
>>>>>>>  			     bool memblock_force_top_down);
>>>>>>> +extern int numa_distance_cnt;
>>>>>>> +
>>>>>>>  #ifdef CONFIG_NUMA_EMU
>>>>>>> +extern int emu_nid_to_phys[MAX_NUMNODES];
>>>>>>>  int numa_emu_cmdline(char *str);
>>>>>>>  void __init numa_emu_update_cpu_to_node(int *emu_nid_to_phys,
>>>>>>>  					unsigned int nr_emu_nids);
>>>>>>> diff --git a/mm/numa_emulation.c b/mm/numa_emulation.c
>>>>>>> index 031fb9961bf7..9d55679d99ce 100644
>>>>>>> --- a/mm/numa_emulation.c
>>>>>>> +++ b/mm/numa_emulation.c
>>>>>>> @@ -8,11 +8,12 @@
>>>>>>>  #include
>>>>>>>  #include
>>>>>>>  #include
>>>>>>> +#include
>>>>>>>  #define FAKE_NODE_MIN_SIZE ((u64)32 << 20)
>>>>>>>  #define FAKE_NODE_MIN_HASH_MASK (~(FAKE_NODE_MIN_SIZE - 1UL))
>>>>>>> -static int emu_nid_to_phys[MAX_NUMNODES];
>>>>>>> +int emu_nid_to_phys[MAX_NUMNODES];
>>>>>>>  static char *emu_cmdline __initdata;
>>>>>>>  int __init numa_emu_cmdline(char *str)
>>>>>>> @@ -379,6 +380,7 @@ void __init numa_emulation(struct numa_meminfo *numa_meminfo, int numa_dist_cnt)
>>>>>>>  	size_t phys_size = numa_dist_cnt * numa_dist_cnt * sizeof(phys_dist[0]);
>>>>>>>  	int max_emu_nid, dfl_phys_nid;
>>>>>>>  	int i, j, ret;
>>>>>>> +	nodemask_t physnode_mask = numa_nodes_parsed;
>>>>>>>  	if (!emu_cmdline)
>>>>>>>  		goto no_emu;
>>>>>>> @@ -395,7 +397,6 @@ void __init numa_emulation(struct numa_meminfo *numa_meminfo, int numa_dist_cnt)
>>>>>>>  	 * split the system RAM into N fake nodes.
>>>>>>>  	 */
>>>>>>>  	if (strchr(emu_cmdline, 'U')) {
>>>>>>> -		nodemask_t physnode_mask = numa_nodes_parsed;
>>>>>>>  		unsigned long n;
>>>>>>>  		int nid = 0;
>>>>>>> @@ -465,9 +466,6 @@ void __init numa_emulation(struct numa_meminfo *numa_meminfo, int numa_dist_cnt)
>>>>>>>  	 */
>>>>>>>  	max_emu_nid = setup_emu2phys_nid(&dfl_phys_nid);
>>>>>>> -	/* commit */
>>>>>>> -	*numa_meminfo = ei;
>>>>>>> -
>>>>>>>  	/* Make sure numa_nodes_parsed only contains emulated nodes */
>>>>>>>  	nodes_clear(numa_nodes_parsed);
>>>>>>>  	for (i = 0; i < ARRAY_SIZE(ei.blk); i++)
>>>>>>> @@ -475,10 +473,21 @@ void __init numa_emulation(struct numa_meminfo *numa_meminfo, int numa_dist_cnt)
>>>>>>>  		    ei.blk[i].nid != NUMA_NO_NODE)
>>>>>>>  			node_set(ei.blk[i].nid, numa_nodes_parsed);
>>>>>>> -	numa_emu_update_cpu_to_node(emu_nid_to_phys, ARRAY_SIZE(emu_nid_to_phys));
>>>>>>> +	/* fix pxm_to_node_map[] and node_to_pxm_map[] to avoid collision
>>>>>>> +	 * with faked numa nodes, particularly during later memory hotplug
>>>>>>> +	 * handling, and also update numa_nodes_parsed accordingly.
>>>>>>> +	 */
>>>>>>> +	ret = fix_pxm_node_maps(max_emu_nid);
>>>>>>> +	if (ret < 0)
>>>>>>> +		goto no_emu;
>>>>>>> +
>>>>>>> +	/* commit */
>>>>>>> +	*numa_meminfo = ei;
>>>>>>> +
>>>>>>> +	numa_emu_update_cpu_to_node(emu_nid_to_phys, max_emu_nid + 1);
>>>>>>>  	/* make sure all emulated nodes are mapped to a physical node */
>>>>>>> -	for (i = 0; i < ARRAY_SIZE(emu_nid_to_phys); i++)
>>>>>>> +	for (i = 0; i < max_emu_nid + 1; i++)
>>>>>>>  		if (emu_nid_to_phys[i] == NUMA_NO_NODE)
>>>>>>>  			emu_nid_to_phys[i] = dfl_phys_nid;
>>>>>>> @@ -501,12 +510,34 @@ void __init numa_emulation(struct numa_meminfo *numa_meminfo, int numa_dist_cnt)
>>>>>>>  			numa_set_distance(i, j, dist);
>>>>>>>  		}
>>>>>>>  	}
>>>>>>> +	for (i = 0; i < numa_distance_cnt; i++) {
>>>>>>> +		for (j = 0; j < numa_distance_cnt; j++) {
>>>>>>> +			int physi, physj;
>>>>>>> +			u8 dist;
>>>>>>> +
>>>>>>> +			/* distance between fake nodes is already ok */
>>>>>>> +			if (emu_nid_to_phys[i] != NUMA_NO_NODE &&
>>>>>>> +			    emu_nid_to_phys[j] != NUMA_NO_NODE)
>>>>>>> +				continue;
>>>>>>> +			if (emu_nid_to_phys[i] != NUMA_NO_NODE)
>>>>>>> +				physi = emu_nid_to_phys[i];
>>>>>>> +			else
>>>>>>> +				physi = i - max_emu_nid;
>>>>>>> +			if (emu_nid_to_phys[j] != NUMA_NO_NODE)
>>>>>>> +				physj = emu_nid_to_phys[j];
>>>>>>> +			else
>>>>>>> +				physj = j - max_emu_nid;
>>>>>>> +			dist = phys_dist[physi * numa_dist_cnt + physj];
>>>>>>> +			numa_set_distance(i, j, dist);
>>>>>>> +		}
>>>>>>> +	}
>>>>>>>  	/* free the copied physical distance table */
>>>>>>>  	memblock_free(phys_dist, phys_size);
>>>>>>>  	return;
>>>>>>>  no_emu:
>>>>>>> +	numa_nodes_parsed = physnode_mask;
>>>>>>>  	/* No emulation. Build identity emu_nid_to_phys[] for numa_add_cpu() */
>>>>>>>  	for (i = 0; i < ARRAY_SIZE(emu_nid_to_phys); i++)
>>>>>>>  		emu_nid_to_phys[i] = i;
>>>>>>> diff --git a/mm/numa_memblks.c b/mm/numa_memblks.c
>>>>>>> index a3877e9bc878..ff4054f4334d 100644
>>>>>>> --- a/mm/numa_memblks.c
>>>>>>> +++ b/mm/numa_memblks.c
>>>>>>> @@ -7,7 +7,7 @@
>>>>>>>  #include
>>>>>>>  #include
>>>>>>> -static int numa_distance_cnt;
>>>>>>> +int numa_distance_cnt;
>>>>>>>  static u8 *numa_distance;
>>>>>>>  nodemask_t numa_nodes_parsed __initdata;
>>>>>
>>>>>
>>>>> Best Regards,
>>>>> Yan, Zi
>>>>>
>>>
>>>
>>>
>>>
>
>
>
>
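
For anyone following the thread, the failure mode can be sketched in
plain userspace C. This is only a simplified model of the first loop in
fix_pxm_node_maps(), not the kernel code: MAX_NODES, NO_PXM,
map_fake_nodes() and fix_maps_model() are names made up for
illustration, and the 1/0 return convention is an assumption chosen to
make the two outcomes visible (the diff quoted above returns 0 in both
cases).

```c
#include <assert.h>

#define MAX_NODES 8	/* hypothetical stand-in for MAX_NUMNODES */
#define NO_PXM (-1)	/* hypothetical stand-in for PXM_INVAL */

/*
 * Copy the PXM of each physical node to every fake node emulated on
 * top of it. Returns the highest fake nid that received a PXM, or -1
 * if none did (for example when there is no SRAT and every entry of
 * node_to_pxm[] is NO_PXM).
 */
static int map_fake_nodes(const int node_to_pxm[MAX_NODES],
			  const int emu_nid_to_phys[MAX_NODES],
			  int max_nid, int out[MAX_NODES])
{
	int i, j, index = -1;

	assert(max_nid >= 0 && max_nid < MAX_NODES);

	for (j = 0; j < MAX_NODES; j++)
		out[j] = NO_PXM;

	for (i = 0; i < MAX_NODES; i++) {
		if (node_to_pxm[i] == NO_PXM)
			continue;
		for (j = 0; j <= max_nid; j++) {
			if (emu_nid_to_phys[j] != i)
				continue;
			out[j] = node_to_pxm[i];
			if (j > index)
				index = j;
		}
	}
	return index;
}

/*
 * Model of the behaviour proposed in the thread: when no PXM data is
 * available the maps cannot be fixed up, but that is not treated as an
 * error, so emulation may proceed.
 * Returns 1 if the fake nodes were remapped, 0 if there was nothing to fix.
 */
static int fix_maps_model(const int node_to_pxm[MAX_NODES],
			  const int emu_nid_to_phys[MAX_NODES],
			  int max_nid, int out[MAX_NODES])
{
	int index = map_fake_nodes(node_to_pxm, emu_nid_to_phys,
				   max_nid, out);

	if (index != max_nid)
		return 0;	/* no SRAT data: skip remapping, keep fakenuma */
	return 1;
}
```

With no SRAT, every node_to_pxm[] entry is invalid, so map_fake_nodes()
yields -1 for max_nid = 1. That corresponds to the "-1 max nid when
expected 1" line in the splat, and the model then reports "nothing to
fix" instead of failing.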