From mboxrd@z Thu Jan  1 00:00:00 1970
From: Liam Ni <zhiguangni01@gmail.com>
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org, loongarch@lists.linux.dev, rppt@kernel.org
Cc: chenhuacai@kernel.org, kernel@xen0n.name, dave.hansen@linux.intel.com, luto@kernel.org, peterz@infradead.org, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, x86@kernel.org, hpa@zytor.com, akpm@linux-foundation.org, maobibo@loongson.cn, chenfeiyang@loongson.cn, zhiguangni01@gmail.com, zhoubinbin@loongson.cn
Subject: [PATCH V5] NUMA: optimize detection of memory with no node id assigned by firmware
Date: Tue, 17 Oct 2023 16:30:33 +0800
Message-Id: <20231017083033.118643-1-zhiguangni01@gmail.com>
X-Mailer: git-send-email 2.25.1
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit

The sanity check that makes sure the nodes cover all memory loops over numa_meminfo to count the pages that have a node ID assigned by the firmware, then loops again over memblock.memory to find the total amount of memory, and finally checks that the difference between the total memory and the memory covered by nodes is less than some threshold.
Worse, the loop over numa_meminfo calls __absent_pages_in_range() that also partially traverses memblock.memory.

It's much simpler and more efficient to have a single traversal of memblock.memory that verifies that the amount of memory not covered by nodes is less than a threshold.

Introduce memblock_validate_numa_coverage() that does exactly that and use it instead of numa_meminfo_cover_memory().

Signed-off-by: Liam Ni <zhiguangni01@gmail.com>
---
 arch/loongarch/kernel/numa.c | 28 +---------------------------
 arch/x86/mm/numa.c           | 34 ++--------------------------------
 include/linux/memblock.h     |  1 +
 mm/memblock.c                | 34 ++++++++++++++++++++++++++++++++++
 4 files changed, 38 insertions(+), 59 deletions(-)

diff --git a/arch/loongarch/kernel/numa.c b/arch/loongarch/kernel/numa.c
index cb00804826f7..fca94d16be34 100644
--- a/arch/loongarch/kernel/numa.c
+++ b/arch/loongarch/kernel/numa.c
@@ -226,32 +226,6 @@ static void __init node_mem_init(unsigned int node)
 
 #ifdef CONFIG_ACPI_NUMA
 
-/*
- * Sanity check to catch more bad NUMA configurations (they are amazingly
- * common).  Make sure the nodes cover all memory.
- */
-static bool __init numa_meminfo_cover_memory(const struct numa_meminfo *mi)
-{
-	int i;
-	u64 numaram, biosram;
-
-	numaram = 0;
-	for (i = 0; i < mi->nr_blks; i++) {
-		u64 s = mi->blk[i].start >> PAGE_SHIFT;
-		u64 e = mi->blk[i].end >> PAGE_SHIFT;
-
-		numaram += e - s;
-		numaram -= __absent_pages_in_range(mi->blk[i].nid, s, e);
-		if ((s64)numaram < 0)
-			numaram = 0;
-	}
-	max_pfn = max_low_pfn;
-	biosram = max_pfn - absent_pages_in_range(0, max_pfn);
-
-	BUG_ON((s64)(biosram - numaram) >= (1 << (20 - PAGE_SHIFT)));
-	return true;
-}
-
 static void __init add_node_intersection(u32 node, u64 start, u64 size, u32 type)
 {
 	static unsigned long num_physpages;
@@ -396,7 +370,7 @@ int __init init_numa_memory(void)
 		return -EINVAL;
 
 	init_node_memblock();
-	if (numa_meminfo_cover_memory(&numa_meminfo) == false)
+	if (!memblock_validate_numa_coverage(SZ_1M >> 12))
 		return -EINVAL;
 
 	for_each_node_mask(node, node_possible_map) {
diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c
index 2aadb2019b4f..95376e7c263e 100644
--- a/arch/x86/mm/numa.c
+++ b/arch/x86/mm/numa.c
@@ -447,37 +447,6 @@ int __node_distance(int from, int to)
 }
 EXPORT_SYMBOL(__node_distance);
 
-/*
- * Sanity check to catch more bad NUMA configurations (they are amazingly
- * common).  Make sure the nodes cover all memory.
- */
-static bool __init numa_meminfo_cover_memory(const struct numa_meminfo *mi)
-{
-	u64 numaram, e820ram;
-	int i;
-
-	numaram = 0;
-	for (i = 0; i < mi->nr_blks; i++) {
-		u64 s = mi->blk[i].start >> PAGE_SHIFT;
-		u64 e = mi->blk[i].end >> PAGE_SHIFT;
-		numaram += e - s;
-		numaram -= __absent_pages_in_range(mi->blk[i].nid, s, e);
-		if ((s64)numaram < 0)
-			numaram = 0;
-	}
-
-	e820ram = max_pfn - absent_pages_in_range(0, max_pfn);
-
-	/* We seem to lose 3 pages somewhere. Allow 1M of slack. */
-	if ((s64)(e820ram - numaram) >= (1 << (20 - PAGE_SHIFT))) {
-		printk(KERN_ERR "NUMA: nodes only cover %LuMB of your %LuMB e820 RAM. Not used.\n",
-		       (numaram << PAGE_SHIFT) >> 20,
-		       (e820ram << PAGE_SHIFT) >> 20);
-		return false;
-	}
-	return true;
-}
-
 /*
  * Mark all currently memblock-reserved physical memory (which covers the
  * kernel's own memory ranges) as hot-unswappable.
@@ -583,7 +552,8 @@ static int __init numa_register_memblks(struct numa_meminfo *mi)
 			return -EINVAL;
 		}
 	}
-	if (!numa_meminfo_cover_memory(mi))
+
+	if (!memblock_validate_numa_coverage(SZ_1M >> 12))
 		return -EINVAL;
 
 	/* Finally register nodes. */
diff --git a/include/linux/memblock.h b/include/linux/memblock.h
index 1c1072e3ca06..727242f4b54a 100644
--- a/include/linux/memblock.h
+++ b/include/linux/memblock.h
@@ -120,6 +120,7 @@ int memblock_physmem_add(phys_addr_t base, phys_addr_t size);
 void memblock_trim_memory(phys_addr_t align);
 bool memblock_overlaps_region(struct memblock_type *type,
 			      phys_addr_t base, phys_addr_t size);
+bool memblock_validate_numa_coverage(const u64 threshold_pages);
 int memblock_mark_hotplug(phys_addr_t base, phys_addr_t size);
 int memblock_clear_hotplug(phys_addr_t base, phys_addr_t size);
 int memblock_mark_mirror(phys_addr_t base, phys_addr_t size);
diff --git a/mm/memblock.c b/mm/memblock.c
index 0863222af4a4..4f1f2d8a8119 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -734,6 +734,40 @@ int __init_memblock memblock_add(phys_addr_t base, phys_addr_t size)
 	return memblock_add_range(&memblock.memory, base, size, MAX_NUMNODES, 0);
 }
 
+/**
+ * memblock_validate_numa_coverage - check if the amount of memory with
+ * no node ID assigned by firmware is below a threshold
+ * @threshold_pages: maximal number of pages that may have no node ID assigned
+ *
+ * Count the pages that have no node ID assigned by the firmware.
+ *
+ * Return:
+ * true when the amount of memory with no node ID assigned is below
+ * @threshold_pages, false otherwise.
+ */
+bool __init_memblock memblock_validate_numa_coverage(const u64 threshold_pages)
+{
+	unsigned long nr_pages = 0;
+	unsigned long start_pfn, end_pfn, mem_size_mb;
+	int nid, i;
+
+	/* count pages that have no node ID assigned */
+	for_each_mem_pfn_range(i, MAX_NUMNODES, &start_pfn, &end_pfn, &nid) {
+		if (nid == NUMA_NO_NODE)
+			nr_pages += end_pfn - start_pfn;
+	}
+
+	if (nr_pages >= threshold_pages) {
+		mem_size_mb = memblock_phys_mem_size() >> 20;
+		pr_err("NUMA: no nodes cover %luMB of %luMB RAM\n",
+		       (nr_pages << PAGE_SHIFT) >> 20, mem_size_mb);
+		return false;
+	}
+
+	return true;
+}
+
+
 /**
  * memblock_isolate_range - isolate given range into disjoint memblocks
  * @type: memblock type to isolate range for
-- 
2.25.1