From: Liam Ni <zhiguangni01@gmail.com>
Date: Wed, 25 Oct 2023 21:15:21 +0800
Subject: [PATCH V7] NUMA: optimize detection of memory with no node id assigned by firmware
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org, loongarch@lists.linux.dev, Mike Rapoport
Cc: chenhuacai@kernel.org, kernel@xen0n.name, Dave Hansen, luto@kernel.org, peterz@infradead.org, Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86@kernel.org, "H. Peter Anvin", Andrew Morton, maobibo@loongson.cn, chenfeiyang@loongson.cn, Zhiguang Ni, zhoubinbin@loongson.cn

The sanity check that makes sure the nodes cover all memory loops over
numa_meminfo to count the pages that have node id assigned by the
firmware, then loops again over memblock.memory to find the total amount
of memory, and in the end checks that the
difference between the total memory and the memory covered by nodes is
less than some threshold. Worse, the loop over numa_meminfo calls
__absent_pages_in_range() that also partially traverses memblock.memory.

It's much simpler and more efficient to have a single traversal of
memblock.memory that verifies that the amount of memory not covered by
nodes is less than a threshold.

Introduce memblock_validate_numa_coverage() that does exactly that and
use it instead of numa_meminfo_cover_memory().

Signed-off-by: Liam Ni <zhiguangni01@gmail.com>
---
 arch/loongarch/kernel/numa.c | 28 +---------------------------
 arch/x86/mm/numa.c           | 34 ++--------------------------------
 include/linux/memblock.h     |  1 +
 mm/memblock.c                | 34 ++++++++++++++++++++++++++++++++++
 4 files changed, 38 insertions(+), 59 deletions(-)
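Note for reviewers (not part of the commit message): passing SZ_1M keeps
the slack of the old check, since the old comparison against
(1 << (20 - PAGE_SHIFT)) pages is exactly 1 MiB of memory. The sketch
below is a stand-alone, compilable illustration of the single-pass idea
on a toy range table; toy_range, NO_NODE, TOY_PAGE_SHIFT and
validate_coverage() are made-up names for illustration only, not kernel
APIs.

/* Stand-alone sketch of the single-pass coverage check; toy_range and
 * NO_NODE are hypothetical stand-ins for memblock regions and
 * NUMA_NO_NODE, so this is illustrative, not kernel code. */
#include <stdbool.h>
#include <stdio.h>

#define TOY_PAGE_SHIFT 12
#define NO_NODE        (-1)

struct toy_range {
	unsigned long start_pfn;
	unsigned long end_pfn;
	int nid;	/* node id from "firmware", or NO_NODE */
};

static bool validate_coverage(const struct toy_range *r, int n,
			      unsigned long threshold_bytes)
{
	unsigned long nr_pages = 0;
	int i;

	/* Single traversal: count only the pages with no node id. */
	for (i = 0; i < n; i++)
		if (r[i].nid == NO_NODE)
			nr_pages += r[i].end_pfn - r[i].start_pfn;

	/* Same comparison as the patch: fail once the uncovered
	 * memory reaches the threshold. */
	return (nr_pages << TOY_PAGE_SHIFT) < threshold_bytes;
}

int main(void)
{
	struct toy_range ranges[] = {
		{ 0x000, 0x800, 0 },	   /* covered by node 0 */
		{ 0x800, 0x900, NO_NODE }, /* 0x100 pages = 1 MiB uncovered */
	};

	/* 1 MiB uncovered is not below a 1 MiB threshold, so this fails. */
	printf("%s\n", validate_coverage(ranges, 2, 1UL << 20) ?
	       "coverage ok" : "coverage bad");
	return 0;
}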
diff --git a/arch/loongarch/kernel/numa.c b/arch/loongarch/kernel/numa.c
index cb00804826f7..0e69679bfc8d 100644
--- a/arch/loongarch/kernel/numa.c
+++ b/arch/loongarch/kernel/numa.c
@@ -226,32 +226,6 @@ static void __init node_mem_init(unsigned int node)
 
 #ifdef CONFIG_ACPI_NUMA
 
-/*
- * Sanity check to catch more bad NUMA configurations (they are amazingly
- * common). Make sure the nodes cover all memory.
- */
-static bool __init numa_meminfo_cover_memory(const struct numa_meminfo *mi)
-{
-	int i;
-	u64 numaram, biosram;
-
-	numaram = 0;
-	for (i = 0; i < mi->nr_blks; i++) {
-		u64 s = mi->blk[i].start >> PAGE_SHIFT;
-		u64 e = mi->blk[i].end >> PAGE_SHIFT;
-
-		numaram += e - s;
-		numaram -= __absent_pages_in_range(mi->blk[i].nid, s, e);
-		if ((s64)numaram < 0)
-			numaram = 0;
-	}
-	max_pfn = max_low_pfn;
-	biosram = max_pfn - absent_pages_in_range(0, max_pfn);
-
-	BUG_ON((s64)(biosram - numaram) >= (1 << (20 - PAGE_SHIFT)));
-	return true;
-}
-
 static void __init add_node_intersection(u32 node, u64 start, u64 size, u32 type)
 {
 	static unsigned long num_physpages;
@@ -396,7 +370,7 @@ int __init init_numa_memory(void)
 		return -EINVAL;
 
 	init_node_memblock();
-	if (numa_meminfo_cover_memory(&numa_meminfo) == false)
+	if (!memblock_validate_numa_coverage(SZ_1M))
 		return -EINVAL;
 
 	for_each_node_mask(node, node_possible_map) {
diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c
index 2aadb2019b4f..4079c9edaa93 100644
--- a/arch/x86/mm/numa.c
+++ b/arch/x86/mm/numa.c
@@ -447,37 +447,6 @@ int __node_distance(int from, int to)
 }
 EXPORT_SYMBOL(__node_distance);
 
-/*
- * Sanity check to catch more bad NUMA configurations (they are amazingly
- * common). Make sure the nodes cover all memory.
- */
-static bool __init numa_meminfo_cover_memory(const struct numa_meminfo *mi)
-{
-	u64 numaram, e820ram;
-	int i;
-
-	numaram = 0;
-	for (i = 0; i < mi->nr_blks; i++) {
-		u64 s = mi->blk[i].start >> PAGE_SHIFT;
-		u64 e = mi->blk[i].end >> PAGE_SHIFT;
-		numaram += e - s;
-		numaram -= __absent_pages_in_range(mi->blk[i].nid, s, e);
-		if ((s64)numaram < 0)
-			numaram = 0;
-	}
-
-	e820ram = max_pfn - absent_pages_in_range(0, max_pfn);
-
-	/* We seem to lose 3 pages somewhere. Allow 1M of slack. */
-	if ((s64)(e820ram - numaram) >= (1 << (20 - PAGE_SHIFT))) {
-		printk(KERN_ERR "NUMA: nodes only cover %LuMB of your %LuMB e820 RAM. Not used.\n",
-		       (numaram << PAGE_SHIFT) >> 20,
-		       (e820ram << PAGE_SHIFT) >> 20);
-		return false;
-	}
-	return true;
-}
-
 /*
  * Mark all currently memblock-reserved physical memory (which covers the
  * kernel's own memory ranges) as hot-unswappable.
@@ -583,7 +552,8 @@ static int __init numa_register_memblks(struct numa_meminfo *mi)
 			return -EINVAL;
 		}
 	}
-	if (!numa_meminfo_cover_memory(mi))
+
+	if (!memblock_validate_numa_coverage(SZ_1M))
 		return -EINVAL;
 
 	/* Finally register nodes. */
diff --git a/include/linux/memblock.h b/include/linux/memblock.h
index 1c1072e3ca06..a94efe977539 100644
--- a/include/linux/memblock.h
+++ b/include/linux/memblock.h
@@ -120,6 +120,7 @@ int memblock_physmem_add(phys_addr_t base, phys_addr_t size);
 void memblock_trim_memory(phys_addr_t align);
 bool memblock_overlaps_region(struct memblock_type *type,
 			      phys_addr_t base, phys_addr_t size);
+bool memblock_validate_numa_coverage(unsigned long threshold_bytes);
 int memblock_mark_hotplug(phys_addr_t base, phys_addr_t size);
 int memblock_clear_hotplug(phys_addr_t base, phys_addr_t size);
 int memblock_mark_mirror(phys_addr_t base, phys_addr_t size);
diff --git a/mm/memblock.c b/mm/memblock.c
index 0863222af4a4..a1917aa331d6 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -734,6 +734,40 @@ int __init_memblock memblock_add(phys_addr_t base, phys_addr_t size)
 	return memblock_add_range(&memblock.memory, base, size, MAX_NUMNODES, 0);
 }
 
+/**
+ * memblock_validate_numa_coverage - check if the amount of memory with
+ * no node ID assigned is less than a threshold
+ * @threshold_bytes: maximum amount of memory (in bytes) that may have no
+ * node ID assigned
+ *
+ * Buggy firmware may report memory that does not belong to any node.
+ * Check that the amount of such memory is below @threshold_bytes.
+ *
+ * Return: true if the uncovered memory is below the threshold, false otherwise.
+ */
+bool __init_memblock memblock_validate_numa_coverage(unsigned long threshold_bytes)
+{
+	unsigned long nr_pages = 0;
+	unsigned long start_pfn, end_pfn, mem_size_mb;
+	int nid, i;
+
+	/* count pages with no node ID assigned */
+	for_each_mem_pfn_range(i, MAX_NUMNODES, &start_pfn, &end_pfn, &nid) {
+		if (nid == NUMA_NO_NODE)
+			nr_pages += end_pfn - start_pfn;
+	}
+
+	if ((nr_pages << PAGE_SHIFT) >= threshold_bytes) {
+		mem_size_mb = memblock_phys_mem_size() >> 20;
+		pr_err("NUMA: no nodes coverage for %luMB of %luMB RAM\n",
+		       (nr_pages << PAGE_SHIFT) >> 20, mem_size_mb);
+		return false;
+	}
+
+	return true;
+}
+
+
 /**
  * memblock_isolate_range - isolate given range into disjoint memblocks
  * @type: memblock type to isolate range for
-- 
2.25.1