Date: Fri, 17 Jan 2025 14:00:52 +0900
From: Hyeonggon Yoo
To: Bruno Faccini, "linux-kernel@vger.kernel.org"
Cc: kernel_team@skhynix.com, 42.hyeyoo@gmail.com, "linux-mm@kvack.org", "akpm@linux-foundation.org", "rppt@kernel.org", "david@redhat.com", "ziy@nvidia.com", "jhubbard@nvidia.com", "mrusiniak@nvidia.com"
Subject: Re: [fake numa not working][PATCH 1/1] mm/fake-numa: allow later numa node hotplug
References: <20250106120659.359610-1-bfaccini@nvidia.com> <20250106120659.359610-2-bfaccini@nvidia.com>
In-Reply-To: <20250106120659.359610-2-bfaccini@nvidia.com>

On 1/6/2025 9:06 PM, Bruno Faccini wrote:
> Current fake-numa implementation prevents new Numa
> nodes to be later hot-plugged by drivers.
> A common symptom of this limitation is the
> "node was absent from the node_possible_map"
> message by associated warning in mm/memory_hotplug.c:
> add_memory_resource().
> This comes from the lack of remapping in both
> pxm_to_node_map[] and node_to_pxm_map[] tables
> to take fake-numa nodes into account and thus
> triggers collisions with original and physical nodes
> only-mapping that had been determined from BIOS tables.
> This patch fixes this by doing the necessary node-ids
> translation in both pxm_to_node_map[]/node_to_pxm_map[]
> tables.
> node_distance[] table has also been fixed accordingly.
>
> Signed-off-by: Bruno Faccini

Hi Bruno,

This commit triggers a WARN() and disables fake NUMA when I pass
"numa=fake=4U" on my QEMU setup.

I'm attaching the WARN() splat and the git bisect log to help with
debugging.

Best,
Hyeonggon

[WARN() splat]
[    0.009676] No NUMA configuration found
[    0.009677] Faking a node at [mem 0x0000000000000000-0x000000083fffffff]
[    0.009705] Fake node size 16895MB too small, increasing to 16896MB
[    0.009708] Faking node 0 at [mem 0x0000000000001000-0x0000000420000fff] (16896MB)
[    0.009711] Faking node 1 at [mem 0x0000000420001000-0x000000083fffffff] (16895MB)
[    0.009738] ------------[ cut here ]------------
[    0.009739] -1 max nid when expected 1
[    0.009760] WARNING: CPU: 0 PID: 0 at drivers/acpi/numa/srat.c:120 fix_pxm_node_maps+0x577/0x810
[    0.009766] Modules linked in:
[    0.009769] CPU: 0 UID: 0 PID: 0 Comm: swapper Not tainted 6.13.0-rc6+ #31
[    0.009771] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
[    0.009772] RIP: 0010:fix_pxm_node_maps+0x577/0x810
[    0.009774] Code: 8b 85 40 ff ff ff 44 8b 8d 44 ff ff ff 8b 85 48 ff ff ff e9 83 fb ff ff 44 89 f2 44 89 ce 48 c7 c7 aa 2f 84 8d e8 b9 71 01 fe <0f> 0b 41 b8 ff ff ff ff e9 ba fb ff ff 48 c7 c7 a0 d8 d1 8d 44 89
[    0.009776] RSP: 0000:ffffffff8da03be0 EFLAGS: 00010046 ORIG_RAX: 0000000000000000
[    0.009778] RAX: 0000000000000000 RBX: 0000000000000400 RCX: 0000000000000000
[    0.009779] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[    0.009780] RBP: ffffffff8da03ca8 R08: 0000000000000000 R09: 0000000000000000
[    0.009781] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000001
[    0.009782] R13: 00000000000007ff R14: 0000000000000001 R15: 0000000000000000
[    0.009783] FS: 0000000000000000(0000) GS:ffffffff8de8c000(0000) knlGS:0000000000000000
[    0.009784] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    0.009785] CR2: ffff90e4bde01000 CR3: 00000001bd03e000 CR4: 00000000000200b0
[    0.009788] Call Trace:
[    0.009790]  <TASK>
[    0.009792]  ? show_regs+0x71/0x90
[    0.009797]  ? __warn+0x8d/0x150
[    0.009799]  ? fix_pxm_node_maps+0x577/0x810
[    0.009802]  ? report_bug+0x1ab/0x1c0
[    0.009805]  ? fixup_exception+0x27/0x380
[    0.009809]  ? early_fixup_exception+0xa2/0xf0
[    0.009813]  ? do_early_exception+0x28/0x90
[    0.009816]  ? early_idt_handler_common+0x2f/0x3a
[    0.009819]  ? fix_pxm_node_maps+0x577/0x810
[    0.009822]  ? numa_cleanup_meminfo+0x8c/0x5b0
[    0.009828]  numa_emulation+0x4e3/0xad0
[    0.009830]  ? __pfx_dummy_numa_init+0x10/0x10
[    0.009833]  ? __pfx_dummy_numa_init+0x10/0x10
[    0.009836]  numa_memblks_init+0x10c/0x2c0
[    0.009838]  ? __pfx_dummy_numa_init+0x10/0x10
[    0.009840]  numa_init+0x61/0x3e0
[    0.009842]  ? topology_register_boot_apic+0x2a/0x40
[    0.009846]  x86_numa_init+0x65/0x80
[    0.009848]  initmem_init+0xe/0x20
[    0.009850]  setup_arch+0x9d8/0xfa0
[    0.009852]  ? _printk+0x58/0x90
[    0.009855]  start_kernel+0x5f/0xb50
[    0.009860]  x86_64_start_reservations+0x18/0x30
[    0.009863]  x86_64_start_kernel+0xc0/0x110
[    0.009864]  ? setup_ghcb+0xe/0x140
[    0.009867]  common_startup_64+0x13e/0x141
[    0.009871]  </TASK>
[    0.009871] ---[ end trace 0000000000000000 ]---

[Git bisect log]
# bad: [f378252a2168c2fbf8fc08b635061e5f6748c1f2] kasan: sw_tags: use str_on_off() helper in kasan_init_sw_tags()
git bisect bad f378252a2168c2fbf8fc08b635061e5f6748c1f2
# good: [cbc5dde0a461240046e8a41c43d7c3b76d5db952] fs/proc: fix softlockup in __read_vmcore (part 2)
git bisect good cbc5dde0a461240046e8a41c43d7c3b76d5db952
# good: [fb73203263021f2ee9dd54d280b1c543d10acd76] mm: move common part of pagetable_*_ctor to helper
git bisect good fb73203263021f2ee9dd54d280b1c543d10acd76
# bad: [8fd966c9310f14d5bbe653cf087853582170aad6] mm, swap: remove old allocation path for HDD
git bisect bad 8fd966c9310f14d5bbe653cf087853582170aad6
# bad: [8646971f02651146554233d63fdd721f5f060973] mm: shmem: skip swapcache for swapin of synchronous swap device
git bisect bad 8646971f02651146554233d63fdd721f5f060973
# good: [b8ba614eaa8dd31702c61e7df7231b1b18b99259] mm/damon/paddr: report filter-passed bytes back for DAMOS_STAT action
git bisect good b8ba614eaa8dd31702c61e7df7231b1b18b99259
# good: [fd0935b8e9e8c2edf05a3d0ffa09d0aa3e9cf2dd] Docs/ABI/damon: document per-region DAMOS filter-passed bytes stat file
git bisect good fd0935b8e9e8c2edf05a3d0ffa09d0aa3e9cf2dd
# good: [3ffd3cd7e2e8793d3b5fbb83c03668c47ab6d599] selftests/damon: remove tests for DAMON debugfs interface
git bisect good 3ffd3cd7e2e8793d3b5fbb83c03668c47ab6d599
# good: [bf9f93f0b7bff8af780330f5094775a39caf3612] mm/damon: remove DAMON debugfs interface
git bisect good bf9f93f0b7bff8af780330f5094775a39caf3612
# bad: [1746744be6ff271e29345a5be4cc144aa33b10ab] mm/memmap: prevent double scanning of memmap by kmemleak
git bisect bad 1746744be6ff271e29345a5be4cc144aa33b10ab
# bad: [ca19a5ab8756ef78c34d02d2b8c266a9cc7bf57d] mm/fake-numa: allow later numa node hotplug
git bisect bad ca19a5ab8756ef78c34d02d2b8c266a9cc7bf57d
# first bad commit: [ca19a5ab8756ef78c34d02d2b8c266a9cc7bf57d] mm/fake-numa: allow later numa node hotplug

> ---
>  drivers/acpi/numa/srat.c     | 86 ++++++++++++++++++++++++++++++++++++
>  include/acpi/acpi_numa.h     |  5 +++
>  include/linux/numa_memblks.h |  3 ++
>  mm/numa_emulation.c          | 45 ++++++++++++++++---
>  mm/numa_memblks.c            |  2 +-
>  5 files changed, 133 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/acpi/numa/srat.c b/drivers/acpi/numa/srat.c
> index bec0dcd1f9c3..59fffe34c9d0 100644
> --- a/drivers/acpi/numa/srat.c
> +++ b/drivers/acpi/numa/srat.c
> @@ -81,6 +81,92 @@ int acpi_map_pxm_to_node(int pxm)
>  }
>  EXPORT_SYMBOL(acpi_map_pxm_to_node);
>
> +#ifdef CONFIG_NUMA_EMU
> +/*
> + * Take max_nid - 1 fake-numa nodes into account in both
> + * pxm_to_node_map()/node_to_pxm_map[] tables.
> + */
> +int __init fix_pxm_node_maps(int max_nid)
> +{
> +	static int pxm_to_node_map_copy[MAX_PXM_DOMAINS] __initdata
> +			= { [0 ... MAX_PXM_DOMAINS - 1] = NUMA_NO_NODE };
> +	static int node_to_pxm_map_copy[MAX_NUMNODES] __initdata
> +			= { [0 ... MAX_NUMNODES - 1] = PXM_INVAL };
> +	int i, j, index = -1, count = 0;
> +	nodemask_t nodes_to_enable;
> +
> +	if (numa_off || srat_disabled())
> +		return -1;
> +
> +	/* find fake nodes PXM mapping */
> +	for (i = 0; i < MAX_NUMNODES; i++) {
> +		if (node_to_pxm_map[i] != PXM_INVAL) {
> +			for (j = 0; j <= max_nid; j++) {
> +				if ((emu_nid_to_phys[j] == i) &&
> +				    WARN(node_to_pxm_map_copy[j] != PXM_INVAL,
> +					 "Node %d is already binded to PXM %d\n",
> +					 j, node_to_pxm_map_copy[j]))
> +					return -1;
> +				if (emu_nid_to_phys[j] == i) {
> +					node_to_pxm_map_copy[j] =
> +						node_to_pxm_map[i];
> +					if (j > index)
> +						index = j;
> +					count++;
> +				}
> +			}
> +		}
> +	}
> +	if (WARN(index != max_nid, "%d max nid when expected %d\n",
> +		 index, max_nid))
> +		return -1;
> +
> +	nodes_clear(nodes_to_enable);
> +
> +	/* map phys nodes not used for fake nodes */
> +	for (i = 0; i < MAX_NUMNODES; i++) {
> +		if (node_to_pxm_map[i] != PXM_INVAL) {
> +			for (j = 0; j <= max_nid; j++)
> +				if (emu_nid_to_phys[j] == i)
> +					break;
> +			/* fake nodes PXM mapping has been done */
> +			if (j <= max_nid)
> +				continue;
> +			/* find first hole */
> +			for (j = 0;
> +			     j < MAX_NUMNODES &&
> +				 node_to_pxm_map_copy[j] != PXM_INVAL;
> +			     j++)
> +			;
> +			if (WARN(j == MAX_NUMNODES,
> +			    "Number of nodes exceeds MAX_NUMNODES\n"))
> +				return -1;
> +			node_to_pxm_map_copy[j] = node_to_pxm_map[i];
> +			node_set(j, nodes_to_enable);
> +			count++;
> +		}
> +	}
> +
> +	/* creating reverse mapping in pxm_to_node_map[] */
> +	for (i = 0; i < MAX_NUMNODES; i++)
> +		if (node_to_pxm_map_copy[i] != PXM_INVAL &&
> +		    pxm_to_node_map_copy[node_to_pxm_map_copy[i]] == NUMA_NO_NODE)
> +			pxm_to_node_map_copy[node_to_pxm_map_copy[i]] = i;
> +
> +	/* overwrite with new mapping */
> +	for (i = 0; i < MAX_NUMNODES; i++) {
> +		node_to_pxm_map[i] = node_to_pxm_map_copy[i];
> +		pxm_to_node_map[i] = pxm_to_node_map_copy[i];
> +	}
> +
> +	/* enable other nodes found in PXM for hotplug */
> +	nodes_or(numa_nodes_parsed, nodes_to_enable, numa_nodes_parsed);
> +
> +	pr_debug("found %d total number of nodes\n", count);
> +	return 0;
> +}
> +#endif
> +
>  static void __init
>  acpi_table_print_srat_entry(struct acpi_subtable_header *header)
>  {
> diff --git a/include/acpi/acpi_numa.h b/include/acpi/acpi_numa.h
> index b5f594754a9e..99b960bd473c 100644
> --- a/include/acpi/acpi_numa.h
> +++ b/include/acpi/acpi_numa.h
> @@ -17,11 +17,16 @@ extern int node_to_pxm(int);
>  extern int acpi_map_pxm_to_node(int);
>  extern unsigned char acpi_srat_revision;
>  extern void disable_srat(void);
> +extern int fix_pxm_node_maps(int max_nid);
>
>  extern void bad_srat(void);
>  extern int srat_disabled(void);
>
>  #else /* CONFIG_ACPI_NUMA */
> +static inline int fix_pxm_node_maps(int max_nid)
> +{
> +	return 0;
> +}
>  static inline void disable_srat(void)
>  {
>  }
> diff --git a/include/linux/numa_memblks.h b/include/linux/numa_memblks.h
> index cfad6ce7e1bd..dd85613cdd86 100644
> --- a/include/linux/numa_memblks.h
> +++ b/include/linux/numa_memblks.h
> @@ -29,7 +29,10 @@ int __init numa_cleanup_meminfo(struct numa_meminfo *mi);
>  int __init numa_memblks_init(int (*init_func)(void),
>  			     bool memblock_force_top_down);
>
> +extern int numa_distance_cnt;
> +
>  #ifdef CONFIG_NUMA_EMU
> +extern int emu_nid_to_phys[MAX_NUMNODES];
>  int numa_emu_cmdline(char *str);
>  void __init numa_emu_update_cpu_to_node(int *emu_nid_to_phys,
>  					unsigned int nr_emu_nids);
> diff --git a/mm/numa_emulation.c b/mm/numa_emulation.c
> index 031fb9961bf7..9d55679d99ce 100644
> --- a/mm/numa_emulation.c
> +++ b/mm/numa_emulation.c
> @@ -8,11 +8,12 @@
>  #include
>  #include
>  #include
> +#include
>
>  #define FAKE_NODE_MIN_SIZE ((u64)32 << 20)
>  #define FAKE_NODE_MIN_HASH_MASK (~(FAKE_NODE_MIN_SIZE - 1UL))
>
> -static int emu_nid_to_phys[MAX_NUMNODES];
> +int emu_nid_to_phys[MAX_NUMNODES];
>  static char *emu_cmdline __initdata;
>
>  int __init numa_emu_cmdline(char *str)
> @@ -379,6 +380,7 @@ void __init numa_emulation(struct numa_meminfo *numa_meminfo, int numa_dist_cnt)
>  	size_t phys_size = numa_dist_cnt * numa_dist_cnt * sizeof(phys_dist[0]);
>  	int max_emu_nid, dfl_phys_nid;
>  	int i, j, ret;
> +	nodemask_t physnode_mask = numa_nodes_parsed;
>
>  	if (!emu_cmdline)
>  		goto no_emu;
> @@ -395,7 +397,6 @@ void __init numa_emulation(struct numa_meminfo *numa_meminfo, int numa_dist_cnt)
>  	 * split the system RAM into N fake nodes.
>  	 */
>  	if (strchr(emu_cmdline, 'U')) {
> -		nodemask_t physnode_mask = numa_nodes_parsed;
>  		unsigned long n;
>  		int nid = 0;
>
> @@ -465,9 +466,6 @@ void __init numa_emulation(struct numa_meminfo *numa_meminfo, int numa_dist_cnt)
>  	 */
>  	max_emu_nid = setup_emu2phys_nid(&dfl_phys_nid);
>
> -	/* commit */
> -	*numa_meminfo = ei;
> -
>  	/* Make sure numa_nodes_parsed only contains emulated nodes */
>  	nodes_clear(numa_nodes_parsed);
>  	for (i = 0; i < ARRAY_SIZE(ei.blk); i++)
> @@ -475,10 +473,21 @@ void __init numa_emulation(struct numa_meminfo *numa_meminfo, int numa_dist_cnt)
>  		    ei.blk[i].nid != NUMA_NO_NODE)
>  			node_set(ei.blk[i].nid, numa_nodes_parsed);
>
> -	numa_emu_update_cpu_to_node(emu_nid_to_phys, ARRAY_SIZE(emu_nid_to_phys));
> +	/* fix pxm_to_node_map[] and node_to_pxm_map[] to avoid collision
> +	 * with faked numa nodes, particularly during later memory hotplug
> +	 * handling, and also update numa_nodes_parsed accordingly.
> +	 */
> +	ret = fix_pxm_node_maps(max_emu_nid);
> +	if (ret < 0)
> +		goto no_emu;
> +
> +	/* commit */
> +	*numa_meminfo = ei;
> +
> +	numa_emu_update_cpu_to_node(emu_nid_to_phys, max_emu_nid + 1);
>
>  	/* make sure all emulated nodes are mapped to a physical node */
> -	for (i = 0; i < ARRAY_SIZE(emu_nid_to_phys); i++)
> +	for (i = 0; i < max_emu_nid + 1; i++)
>  		if (emu_nid_to_phys[i] == NUMA_NO_NODE)
>  			emu_nid_to_phys[i] = dfl_phys_nid;
>
> @@ -501,12 +510,34 @@ void __init numa_emulation(struct numa_meminfo *numa_meminfo, int numa_dist_cnt)
>  			numa_set_distance(i, j, dist);
>  		}
>  	}
> +	for (i = 0; i < numa_distance_cnt; i++) {
> +		for (j = 0; j < numa_distance_cnt; j++) {
> +			int physi, physj;
> +			u8 dist;
> +
> +			/* distance between fake nodes is already ok */
> +			if (emu_nid_to_phys[i] != NUMA_NO_NODE &&
> +			    emu_nid_to_phys[j] != NUMA_NO_NODE)
> +				continue;
> +			if (emu_nid_to_phys[i] != NUMA_NO_NODE)
> +				physi = emu_nid_to_phys[i];
> +			else
> +				physi = i - max_emu_nid;
> +			if (emu_nid_to_phys[j] != NUMA_NO_NODE)
> +				physj = emu_nid_to_phys[j];
> +			else
> +				physj = j - max_emu_nid;
> +			dist = phys_dist[physi * numa_dist_cnt + physj];
> +			numa_set_distance(i, j, dist);
> +		}
> +	}
>
>  	/* free the copied physical distance table */
>  	memblock_free(phys_dist, phys_size);
>  	return;
>
>  no_emu:
> +	numa_nodes_parsed = physnode_mask;
>  	/* No emulation. Build identity emu_nid_to_phys[] for numa_add_cpu() */
>  	for (i = 0; i < ARRAY_SIZE(emu_nid_to_phys); i++)
>  		emu_nid_to_phys[i] = i;
> diff --git a/mm/numa_memblks.c b/mm/numa_memblks.c
> index a3877e9bc878..ff4054f4334d 100644
> --- a/mm/numa_memblks.c
> +++ b/mm/numa_memblks.c
> @@ -7,7 +7,7 @@
>  #include
>  #include
>
> -static int numa_distance_cnt;
> +int numa_distance_cnt;
>  static u8 *numa_distance;
>
>  nodemask_t numa_nodes_parsed __initdata;