linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v6 1/5] drivers/base/node: Optimize memory block registration to reduce boot time
@ 2025-05-26 14:50 Donet Tom
  2025-05-26 14:50 ` [PATCH v6 2/5] drivers/base/node: remove register_mem_block_under_node_early() Donet Tom
                   ` (4 more replies)
  0 siblings, 5 replies; 8+ messages in thread
From: Donet Tom @ 2025-05-26 14:50 UTC (permalink / raw)
  To: David Hildenbrand, Andrew Morton, Mike Rapoport, Oscar Salvador,
	Zi Yan, Greg Kroah-Hartman
  Cc: Ritesh Harjani, linux-mm, linux-kernel, Rafael J . Wysocki,
	Danilo Krummrich, Jonathan Cameron, Alison Schofield, Yury Norov,
	Dave Jiang, Madhavan Srinivasan, Nilay Shroff, linuxppc-dev,
	Donet Tom

During node device initialization, `memory blocks` are registered under
each NUMA node. The `memory blocks` to be registered are identified using
the node’s start and end PFNs, which are obtained from the node's pg_data

However, not all PFNs within this range necessarily belong to the same
node—some may belong to other nodes. Additionally, due to the
discontiguous nature of physical memory, certain sections within a
`memory block` may be absent.

As a result, `memory blocks` that fall between a node’s start and end
PFNs may span across multiple nodes, and some sections within those blocks
may be missing. `Memory blocks` have a fixed size, which is architecture
dependent.

Due to these considerations, the memory block registration is currently
performed as follows:

for_each_online_node(nid):
    start_pfn = pgdat->node_start_pfn;
    end_pfn = pgdat->node_start_pfn + node_spanned_pages;
    for_each_memory_block_between(PFN_PHYS(start_pfn), PFN_PHYS(end_pfn))
        mem_blk = memory_block_id(pfn_to_section_nr(pfn));
        pfn_mb_start=section_nr_to_pfn(mem_blk->start_section_nr)
        pfn_mb_end = pfn_start + memory_block_pfns - 1
        for (pfn = pfn_mb_start; pfn < pfn_mb_end; pfn++):
            if (get_nid_for_pfn(pfn) != nid):
                continue;
            else
                do_register_memory_block_under_node(nid, mem_blk,
                                                        MEMINIT_EARLY);

Here, we derive the start and end PFNs from the node's pg_data, then
determine the memory blocks that may belong to the node. For each
`memory block` in this range, we inspect all PFNs it contains and check
their associated NUMA node ID. If a PFN within the block matches the
current node, the memory block is registered under that node.

If CONFIG_DEFERRED_STRUCT_PAGE_INIT is enabled, get_nid_for_pfn() performs
a binary search in the `memblock regions` to determine the NUMA node ID
for a given PFN. If it is not enabled, the node ID is retrieved directly
from the struct page.

On large systems, this process can become time-consuming, especially since
we iterate over each `memory block` and all PFNs within it until a match is
found. When CONFIG_DEFERRED_STRUCT_PAGE_INIT is enabled, the additional
overhead of the binary search increases the execution time significantly,
potentially leading to soft lockups during boot.

In this patch, we iterate over `memblock region` to identify the
`memory blocks` that belong to the current NUMA node. `memblock regions`
are contiguous memory ranges, each associated with a single NUMA node, and
they do not span across multiple nodes.

for_each_memory_region(r): // r => region
  if (!node_online(r->nid)):
    continue;
  else
    for_each_memory_block_between(r->base, r->base + r->size - 1):
      do_register_memory_block_under_node(r->nid, mem_blk, MEMINIT_EARLY);

We iterate over all memblock regions, and if the node associated with the
region is online, we calculate the start and end memory blocks based on the
region's start and end PFNs. We then register all the memory blocks within
that range under the region node.

Test Results on My system with 32TB RAM
=======================================
1. Boot time with CONFIG_DEFERRED_STRUCT_PAGE_INIT enabled.

Without this patch
------------------
Startup finished in 1min 16.528s (kernel)

With this patch
---------------
Startup finished in 17.236s (kernel) - 78% Improvement

2. Boot time with CONFIG_DEFERRED_STRUCT_PAGE_INIT disabled.

Without this patch
------------------
Startup finished in 28.320s (kernel)

With this patch
---------------
Startup finished in 15.621s (kernel) - 46% Improvement

Acked-by: Oscar Salvador <osalvador@suse.de>
Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Acked-by: Zi Yan <ziy@nvidia.com>
Signed-off-by: Donet Tom <donettom@linux.ibm.com>

---

v5 -> v6

1. Changed function name register_memory_blocks_under_node_early() to register_memory_blocks_under_nodes()
2. Added memblock_get_region_node() to get the region node.

v5 - https://lore.kernel.org/all/d2490e807b2c13950bc1d4199f22ec078cc4c56a.1747904868.git.donettom@linux.ibm.com/
v4 - https://lore.kernel.org/all/f94685be9cdc931a026999d236d7e92de29725c7.1747376551.git.donettom@linux.ibm.com/
v3 - https://lore.kernel.org/all/b49ed289096643ff5b5fbedcf1d1c1be42845a74.1746250339.git.donettom@linux.ibm.com/
v2 - https://lore.kernel.org/all/fbe1e0c7d91bf3fa9a64ff5d84b53ded1d0d5ac7.1745852397.git.donettom@linux.ibm.com/
v1 - https://lore.kernel.org/all/50142a29010463f436dc5c4feb540e5de3bb09df.1744175097.git.donettom@linux.ibm.com/
---
---
 drivers/base/memory.c  | 21 ++++-------------
 drivers/base/node.c    | 51 ++++++++++++++++++++++++++++++++++++++++--
 include/linux/memory.h | 19 +++++++++++++++-
 include/linux/node.h   |  3 +++
 4 files changed, 74 insertions(+), 20 deletions(-)

diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index 19469e7f88c2..39fcc075a36f 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -22,6 +22,7 @@
 #include <linux/stat.h>
 #include <linux/slab.h>
 #include <linux/xarray.h>
+#include <linux/export.h>
 
 #include <linux/atomic.h>
 #include <linux/uaccess.h>
@@ -48,22 +49,8 @@ int mhp_online_type_from_str(const char *str)
 
 #define to_memory_block(dev) container_of(dev, struct memory_block, dev)
 
-static int sections_per_block;
-
-static inline unsigned long memory_block_id(unsigned long section_nr)
-{
-	return section_nr / sections_per_block;
-}
-
-static inline unsigned long pfn_to_block_id(unsigned long pfn)
-{
-	return memory_block_id(pfn_to_section_nr(pfn));
-}
-
-static inline unsigned long phys_to_block_id(unsigned long phys)
-{
-	return pfn_to_block_id(PFN_DOWN(phys));
-}
+int sections_per_block;
+EXPORT_SYMBOL(sections_per_block);
 
 static int memory_subsys_online(struct device *dev);
 static int memory_subsys_offline(struct device *dev);
@@ -632,7 +619,7 @@ int __weak arch_get_memory_phys_device(unsigned long start_pfn)
  *
  * Called under device_hotplug_lock.
  */
-static struct memory_block *find_memory_block_by_id(unsigned long block_id)
+struct memory_block *find_memory_block_by_id(unsigned long block_id)
 {
 	struct memory_block *mem;
 
diff --git a/drivers/base/node.c b/drivers/base/node.c
index cd13ef287011..20b6f4496e1b 100644
--- a/drivers/base/node.c
+++ b/drivers/base/node.c
@@ -20,6 +20,7 @@
 #include <linux/pm_runtime.h>
 #include <linux/swap.h>
 #include <linux/slab.h>
+#include <linux/memblock.h>
 
 static const struct bus_type node_subsys = {
 	.name = "node",
@@ -850,6 +851,47 @@ void unregister_memory_block_under_nodes(struct memory_block *mem_blk)
 			  kobject_name(&node_devices[mem_blk->nid]->dev.kobj));
 }
 
+/*
+ * register_memory_blocks_under_nodes : Register the memory blocks
+ *                 under the nodes.
+ *
+ * This function registers all memory blocks to their corresponding nodes
+ * based on the associated memory regions. Each memory region is tied to
+ * a specific node and does not span multiple nodes. Therefore, all memory
+ * blocks within a given region are considered to belong to that node. The
+ * function iterates through each memory region and registers the memory
+ * blocks contained within that region to the respective node. Since memory
+ * blocks can span across multiple regions (and hence multiple nodes), a
+ * single memory block may be registered under more than one node if it
+ * overlaps with regions belonging to different nodes.
+ */
+static void register_memory_blocks_under_nodes(void)
+{
+	struct memblock_region *r;
+
+	for_each_mem_region(r) {
+		const unsigned long start_block_id = phys_to_block_id(r->base);
+		const unsigned long end_block_id = phys_to_block_id(r->base + r->size - 1);
+		unsigned long block_id;
+		int nid = memblock_get_region_node(r);
+
+		if (!node_online(nid))
+			continue;
+
+		for (block_id = start_block_id; block_id <= end_block_id; block_id++) {
+			struct memory_block *mem;
+
+			mem = find_memory_block_by_id(block_id);
+			if (!mem)
+				continue;
+
+			do_register_memory_block_under_node(nid, mem, MEMINIT_EARLY);
+			put_device(&mem->dev);
+		}
+
+	}
+}
+
 void register_memory_blocks_under_node(int nid, unsigned long start_pfn,
 				       unsigned long end_pfn,
 				       enum meminit_context context)
@@ -971,11 +1013,16 @@ void __init node_dev_init(void)
 
 	/*
 	 * Create all node devices, which will properly link the node
-	 * to applicable memory block devices and already created cpu devices.
+	 * to already created cpu devices.
 	 */
 	for_each_online_node(i) {
-		ret = register_one_node(i);
+		ret =  __register_one_node(i);
 		if (ret)
 			panic("%s() failed to add node: %d\n", __func__, ret);
 	}
+
+	/*
+	 * Link the node to memory block devices
+	 */
+	register_memory_blocks_under_nodes();
 }
diff --git a/include/linux/memory.h b/include/linux/memory.h
index 12daa6ec7d09..2a61088e17ad 100644
--- a/include/linux/memory.h
+++ b/include/linux/memory.h
@@ -171,12 +171,30 @@ struct memory_group *memory_group_find_by_id(int mgid);
 typedef int (*walk_memory_groups_func_t)(struct memory_group *, void *);
 int walk_dynamic_memory_groups(int nid, walk_memory_groups_func_t func,
 			       struct memory_group *excluded, void *arg);
+struct memory_block *find_memory_block_by_id(unsigned long block_id);
 #define hotplug_memory_notifier(fn, pri) ({		\
 	static __meminitdata struct notifier_block fn##_mem_nb =\
 		{ .notifier_call = fn, .priority = pri };\
 	register_memory_notifier(&fn##_mem_nb);			\
 })
 
+extern int sections_per_block;
+
+static inline unsigned long memory_block_id(unsigned long section_nr)
+{
+	return section_nr / sections_per_block;
+}
+
+static inline unsigned long pfn_to_block_id(unsigned long pfn)
+{
+	return memory_block_id(pfn_to_section_nr(pfn));
+}
+
+static inline unsigned long phys_to_block_id(unsigned long phys)
+{
+	return pfn_to_block_id(PFN_DOWN(phys));
+}
+
 #ifdef CONFIG_NUMA
 void memory_block_add_nid(struct memory_block *mem, int nid,
 			  enum meminit_context context);
@@ -188,5 +206,4 @@ void memory_block_add_nid(struct memory_block *mem, int nid,
  * can sleep.
  */
 extern struct mutex text_mutex;
-
 #endif /* _LINUX_MEMORY_H_ */
diff --git a/include/linux/node.h b/include/linux/node.h
index 2b7517892230..485370f3bc17 100644
--- a/include/linux/node.h
+++ b/include/linux/node.h
@@ -120,6 +120,9 @@ static inline void register_memory_blocks_under_node(int nid, unsigned long star
 						     enum meminit_context context)
 {
 }
+static inline void register_memory_blocks_under_nodes(void)
+{
+}
 #endif
 
 extern void unregister_node(struct node *node);
-- 
2.43.5



^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH v6 2/5] drivers/base/node: remove register_mem_block_under_node_early()
  2025-05-26 14:50 [PATCH v6 1/5] drivers/base/node: Optimize memory block registration to reduce boot time Donet Tom
@ 2025-05-26 14:50 ` Donet Tom
  2025-05-26 14:50 ` [PATCH v6 3/5] drivers/base/node: Remove register_memory_blocks_under_node() function call from register_one_node Donet Tom
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 8+ messages in thread
From: Donet Tom @ 2025-05-26 14:50 UTC (permalink / raw)
  To: David Hildenbrand, Andrew Morton, Mike Rapoport, Oscar Salvador,
	Zi Yan, Greg Kroah-Hartman
  Cc: Ritesh Harjani, linux-mm, linux-kernel, Rafael J . Wysocki,
	Danilo Krummrich, Jonathan Cameron, Alison Schofield, Yury Norov,
	Dave Jiang, Madhavan Srinivasan, Nilay Shroff, linuxppc-dev,
	Donet Tom

The function register_mem_block_under_node_early() is no longer used,
as register_memory_blocks_under_node_early() now handles memory block
registration during early boot.

Removed register_mem_block_under_node_early() and get_nid_for_pfn(),
the latter was only used by the former.

Acked-by: Oscar Salvador <osalvador@suse.de>
Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: Zi Yan <ziy@nvidia.com>
Signed-off-by: Donet Tom <donettom@linux.ibm.com>

---
v5 -> v6

No change

v5 - https://lore.kernel.org/all/d2490e807b2c13950bc1d4199f22ec078cc4c56a.1747904868.git.donettom@linux.ibm.com/
v4 - https://lore.kernel.org/all/f94685be9cdc931a026999d236d7e92de29725c7.1747376551.git.donettom@linux.ibm.com/
v3 - https://lore.kernel.org/all/b49ed289096643ff5b5fbedcf1d1c1be42845a74.1746250339.git.donettom@linux.ibm.com/
v2 - https://lore.kernel.org/all/fbe1e0c7d91bf3fa9a64ff5d84b53ded1d0d5ac7.1745852397.git.donettom@linux.ibm.com/
v1 - https://lore.kernel.org/all/50142a29010463f436dc5c4feb540e5de3bb09df.1744175097.git.donettom@linux.ibm.com/
---
---
 drivers/base/node.c | 58 +--------------------------------------------
 1 file changed, 1 insertion(+), 57 deletions(-)

diff --git a/drivers/base/node.c b/drivers/base/node.c
index 20b6f4496e1b..71ddf2d0d1af 100644
--- a/drivers/base/node.c
+++ b/drivers/base/node.c
@@ -748,15 +748,6 @@ int unregister_cpu_under_node(unsigned int cpu, unsigned int nid)
 }
 
 #ifdef CONFIG_MEMORY_HOTPLUG
-static int __ref get_nid_for_pfn(unsigned long pfn)
-{
-#ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT
-	if (system_state < SYSTEM_RUNNING)
-		return early_pfn_to_nid(pfn);
-#endif
-	return pfn_to_nid(pfn);
-}
-
 static void do_register_memory_block_under_node(int nid,
 						struct memory_block *mem_blk,
 						enum meminit_context context)
@@ -783,46 +774,6 @@ static void do_register_memory_block_under_node(int nid,
 				    ret);
 }
 
-/* register memory section under specified node if it spans that node */
-static int register_mem_block_under_node_early(struct memory_block *mem_blk,
-					       void *arg)
-{
-	unsigned long memory_block_pfns = memory_block_size_bytes() / PAGE_SIZE;
-	unsigned long start_pfn = section_nr_to_pfn(mem_blk->start_section_nr);
-	unsigned long end_pfn = start_pfn + memory_block_pfns - 1;
-	int nid = *(int *)arg;
-	unsigned long pfn;
-
-	for (pfn = start_pfn; pfn <= end_pfn; pfn++) {
-		int page_nid;
-
-		/*
-		 * memory block could have several absent sections from start.
-		 * skip pfn range from absent section
-		 */
-		if (!pfn_in_present_section(pfn)) {
-			pfn = round_down(pfn + PAGES_PER_SECTION,
-					 PAGES_PER_SECTION) - 1;
-			continue;
-		}
-
-		/*
-		 * We need to check if page belongs to nid only at the boot
-		 * case because node's ranges can be interleaved.
-		 */
-		page_nid = get_nid_for_pfn(pfn);
-		if (page_nid < 0)
-			continue;
-		if (page_nid != nid)
-			continue;
-
-		do_register_memory_block_under_node(nid, mem_blk, MEMINIT_EARLY);
-		return 0;
-	}
-	/* mem section does not span the specified node */
-	return 0;
-}
-
 /*
  * During hotplug we know that all pages in the memory block belong to the same
  * node.
@@ -896,15 +847,8 @@ void register_memory_blocks_under_node(int nid, unsigned long start_pfn,
 				       unsigned long end_pfn,
 				       enum meminit_context context)
 {
-	walk_memory_blocks_func_t func;
-
-	if (context == MEMINIT_HOTPLUG)
-		func = register_mem_block_under_node_hotplug;
-	else
-		func = register_mem_block_under_node_early;
-
 	walk_memory_blocks(PFN_PHYS(start_pfn), PFN_PHYS(end_pfn - start_pfn),
-			   (void *)&nid, func);
+			   (void *)&nid, register_mem_block_under_node_hotplug);
 	return;
 }
 #endif /* CONFIG_MEMORY_HOTPLUG */
-- 
2.43.5



^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH v6 3/5] drivers/base/node: Remove register_memory_blocks_under_node() function call from register_one_node
  2025-05-26 14:50 [PATCH v6 1/5] drivers/base/node: Optimize memory block registration to reduce boot time Donet Tom
  2025-05-26 14:50 ` [PATCH v6 2/5] drivers/base/node: remove register_mem_block_under_node_early() Donet Tom
@ 2025-05-26 14:50 ` Donet Tom
  2025-05-26 14:50 ` [PATCH v6 4/5] drivers/base/node: Rename register_memory_blocks_under_node() and remove context argument Donet Tom
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 8+ messages in thread
From: Donet Tom @ 2025-05-26 14:50 UTC (permalink / raw)
  To: David Hildenbrand, Andrew Morton, Mike Rapoport, Oscar Salvador,
	Zi Yan, Greg Kroah-Hartman
  Cc: Ritesh Harjani, linux-mm, linux-kernel, Rafael J . Wysocki,
	Danilo Krummrich, Jonathan Cameron, Alison Schofield, Yury Norov,
	Dave Jiang, Madhavan Srinivasan, Nilay Shroff, linuxppc-dev,
	Donet Tom

register_one_node() is now only called via cpu_up() → __try_online_node()
during CPU hotplug operations to online a node.

At this stage, the node has not yet had any memory added. As a result,
there are no memory blocks to walk or register, so calling
register_memory_blocks_under_node() is unnecessary.

Therefore, the call to register_memory_blocks_under_node()
has been removed from register_one_node().

Acked-by: Oscar Salvador <osalvador@suse.de>
Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Acked-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Donet Tom <donettom@linux.ibm.com>

---

v5 -> v6

Removed node_online() check.

v5 - https://lore.kernel.org/all/d2490e807b2c13950bc1d4199f22ec078cc4c56a.1747904868.git.donettom@linux.ibm.com/
v4 - https://lore.kernel.org/all/f94685be9cdc931a026999d236d7e92de29725c7.1747376551.git.donettom@linux.ibm.com/
v3 - https://lore.kernel.org/all/b49ed289096643ff5b5fbedcf1d1c1be42845a74.1746250339.git.donettom@linux.ibm.com/
v2 - https://lore.kernel.org/all/fbe1e0c7d91bf3fa9a64ff5d84b53ded1d0d5ac7.1745852397.git.donettom@linux.ibm.com/
v1 - https://lore.kernel.org/all/50142a29010463f436dc5c4feb540e5de3bb09df.1744175097.git.donettom@linux.ibm.com/
---
---
 include/linux/node.h | 16 +---------------
 1 file changed, 1 insertion(+), 15 deletions(-)

diff --git a/include/linux/node.h b/include/linux/node.h
index 485370f3bc17..b15de78e0408 100644
--- a/include/linux/node.h
+++ b/include/linux/node.h
@@ -134,21 +134,7 @@ extern int __register_one_node(int nid);
 /* Registers an online node */
 static inline int register_one_node(int nid)
 {
-	int error = 0;
-
-	if (node_online(nid)) {
-		struct pglist_data *pgdat = NODE_DATA(nid);
-		unsigned long start_pfn = pgdat->node_start_pfn;
-		unsigned long end_pfn = start_pfn + pgdat->node_spanned_pages;
-
-		error = __register_one_node(nid);
-		if (error)
-			return error;
-		register_memory_blocks_under_node(nid, start_pfn, end_pfn,
-						  MEMINIT_EARLY);
-	}
-
-	return error;
+	return __register_one_node(nid);
 }
 
 extern void unregister_one_node(int nid);
-- 
2.43.5



^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH v6 4/5] drivers/base/node: Rename register_memory_blocks_under_node() and remove context argument
  2025-05-26 14:50 [PATCH v6 1/5] drivers/base/node: Optimize memory block registration to reduce boot time Donet Tom
  2025-05-26 14:50 ` [PATCH v6 2/5] drivers/base/node: remove register_mem_block_under_node_early() Donet Tom
  2025-05-26 14:50 ` [PATCH v6 3/5] drivers/base/node: Remove register_memory_blocks_under_node() function call from register_one_node Donet Tom
@ 2025-05-26 14:50 ` Donet Tom
  2025-05-26 14:50 ` [PATCH v6 5/5] drivers/base/node: Rename __register_one_node() to register_one_node() Donet Tom
  2025-05-26 17:17 ` [PATCH v6 1/5] drivers/base/node: Optimize memory block registration to reduce boot time David Hildenbrand
  4 siblings, 0 replies; 8+ messages in thread
From: Donet Tom @ 2025-05-26 14:50 UTC (permalink / raw)
  To: David Hildenbrand, Andrew Morton, Mike Rapoport, Oscar Salvador,
	Zi Yan, Greg Kroah-Hartman
  Cc: Ritesh Harjani, linux-mm, linux-kernel, Rafael J . Wysocki,
	Danilo Krummrich, Jonathan Cameron, Alison Schofield, Yury Norov,
	Dave Jiang, Madhavan Srinivasan, Nilay Shroff, linuxppc-dev,
	Donet Tom

The function register_memory_blocks_under_node() is now only called from
the memory hotplug path, as register_memory_blocks_under_node_early()
handles registration during early boot. Therefore, the context argument
used to differentiate between early boot and hotplug is no longer needed
and was removed.

Since the function is only called from the hotplug path, we renamed
register_memory_blocks_under_node() to
register_memory_blocks_under_node_hotplug()

Acked-by: Oscar Salvador <osalvador@suse.de>
Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: Zi Yan <ziy@nvidia.com>
Signed-off-by: Donet Tom <donettom@linux.ibm.com>

---
v5 -> v6

No change

v5 - https://lore.kernel.org/all/d2490e807b2c13950bc1d4199f22ec078cc4c56a.1747904868.git.donettom@linux.ibm.com/
v4 - https://lore.kernel.org/all/f94685be9cdc931a026999d236d7e92de29725c7.1747376551.git.donettom@linux.ibm.com/
v3 - https://lore.kernel.org/all/b49ed289096643ff5b5fbedcf1d1c1be42845a74.1746250339.git.donettom@linux.ibm.com/
v2 - https://lore.kernel.org/all/fbe1e0c7d91bf3fa9a64ff5d84b53ded1d0d5ac7.1745852397.git.donettom@linux.ibm.com/
v1 - https://lore.kernel.org/all/50142a29010463f436dc5c4feb540e5de3bb09df.1744175097.git.donettom@linux.ibm.com/
---
---
 drivers/base/node.c  |  5 ++---
 include/linux/node.h | 11 +++++------
 mm/memory_hotplug.c  |  5 ++---
 3 files changed, 9 insertions(+), 12 deletions(-)

diff --git a/drivers/base/node.c b/drivers/base/node.c
index 71ddf2d0d1af..9d0977fa50e3 100644
--- a/drivers/base/node.c
+++ b/drivers/base/node.c
@@ -843,9 +843,8 @@ static void register_memory_blocks_under_nodes(void)
 	}
 }
 
-void register_memory_blocks_under_node(int nid, unsigned long start_pfn,
-				       unsigned long end_pfn,
-				       enum meminit_context context)
+void register_memory_blocks_under_node_hotplug(int nid, unsigned long start_pfn,
+					       unsigned long end_pfn)
 {
 	walk_memory_blocks(PFN_PHYS(start_pfn), PFN_PHYS(end_pfn - start_pfn),
 			   (void *)&nid, register_mem_block_under_node_hotplug);
diff --git a/include/linux/node.h b/include/linux/node.h
index b15de78e0408..75b036a100d2 100644
--- a/include/linux/node.h
+++ b/include/linux/node.h
@@ -111,13 +111,12 @@ struct memory_block;
 extern struct node *node_devices[];
 
 #if defined(CONFIG_MEMORY_HOTPLUG) && defined(CONFIG_NUMA)
-void register_memory_blocks_under_node(int nid, unsigned long start_pfn,
-				       unsigned long end_pfn,
-				       enum meminit_context context);
+void register_memory_blocks_under_node_hotplug(int nid, unsigned long start_pfn,
+					       unsigned long end_pfn);
 #else
-static inline void register_memory_blocks_under_node(int nid, unsigned long start_pfn,
-						     unsigned long end_pfn,
-						     enum meminit_context context)
+static inline void register_memory_blocks_under_node_hotplug(int nid,
+							     unsigned long start_pfn,
+							     unsigned long end_pfn)
 {
 }
 static inline void register_memory_blocks_under_nodes(void)
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 8305483de38b..f734cc924b51 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1575,9 +1575,8 @@ int add_memory_resource(int nid, struct resource *res, mhp_t mhp_flags)
 		BUG_ON(ret);
 	}
 
-	register_memory_blocks_under_node(nid, PFN_DOWN(start),
-					  PFN_UP(start + size - 1),
-					  MEMINIT_HOTPLUG);
+	register_memory_blocks_under_node_hotplug(nid, PFN_DOWN(start),
+					  PFN_UP(start + size - 1));
 
 	/* create new memmap entry */
 	if (!strcmp(res->name, "System RAM"))
-- 
2.43.5



^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH v6 5/5] drivers/base/node: Rename __register_one_node() to register_one_node()
  2025-05-26 14:50 [PATCH v6 1/5] drivers/base/node: Optimize memory block registration to reduce boot time Donet Tom
                   ` (2 preceding siblings ...)
  2025-05-26 14:50 ` [PATCH v6 4/5] drivers/base/node: Rename register_memory_blocks_under_node() and remove context argument Donet Tom
@ 2025-05-26 14:50 ` Donet Tom
  2025-05-26 17:17   ` David Hildenbrand
  2025-05-26 17:17 ` [PATCH v6 1/5] drivers/base/node: Optimize memory block registration to reduce boot time David Hildenbrand
  4 siblings, 1 reply; 8+ messages in thread
From: Donet Tom @ 2025-05-26 14:50 UTC (permalink / raw)
  To: David Hildenbrand, Andrew Morton, Mike Rapoport, Oscar Salvador,
	Zi Yan, Greg Kroah-Hartman
  Cc: Ritesh Harjani, linux-mm, linux-kernel, Rafael J . Wysocki,
	Danilo Krummrich, Jonathan Cameron, Alison Schofield, Yury Norov,
	Dave Jiang, Madhavan Srinivasan, Nilay Shroff, linuxppc-dev,
	Donet Tom

The register_one_node() function was a simple wrapper around
__register_one_node(). To simplify the code, register_one_node()
has been removed, and __register_one_node() has been renamed to
register_one_node().

Signed-off-by: Donet Tom <donettom@linux.ibm.com>
---
 arch/powerpc/platforms/pseries/pci_dlpar.c |  2 +-
 drivers/base/node.c                        |  4 ++--
 include/linux/node.h                       | 13 +------------
 mm/memory_hotplug.c                        |  2 +-
 4 files changed, 5 insertions(+), 16 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/pci_dlpar.c b/arch/powerpc/platforms/pseries/pci_dlpar.c
index 52e2623a741d..aeb8633a3d00 100644
--- a/arch/powerpc/platforms/pseries/pci_dlpar.c
+++ b/arch/powerpc/platforms/pseries/pci_dlpar.c
@@ -29,7 +29,7 @@ struct pci_controller *init_phb_dynamic(struct device_node *dn)
 	nid = of_node_to_nid(dn);
 	if (likely((nid) >= 0)) {
 		if (!node_online(nid)) {
-			if (__register_one_node(nid)) {
+			if (register_one_node(nid)) {
 				pr_err("PCI: Failed to register node %d\n", nid);
 			} else {
 				update_numa_distance(dn);
diff --git a/drivers/base/node.c b/drivers/base/node.c
index 9d0977fa50e3..94b8ac116aa4 100644
--- a/drivers/base/node.c
+++ b/drivers/base/node.c
@@ -852,7 +852,7 @@ void register_memory_blocks_under_node_hotplug(int nid, unsigned long start_pfn,
 }
 #endif /* CONFIG_MEMORY_HOTPLUG */
 
-int __register_one_node(int nid)
+int register_one_node(int nid)
 {
 	int error;
 	int cpu;
@@ -959,7 +959,7 @@ void __init node_dev_init(void)
 	 * to already created cpu devices.
 	 */
 	for_each_online_node(i) {
-		ret =  __register_one_node(i);
+		ret =  register_one_node(i);
 		if (ret)
 			panic("%s() failed to add node: %d\n", __func__, ret);
 	}
diff --git a/include/linux/node.h b/include/linux/node.h
index 75b036a100d2..88bceebcbfa5 100644
--- a/include/linux/node.h
+++ b/include/linux/node.h
@@ -128,14 +128,7 @@ extern void unregister_node(struct node *node);
 #ifdef CONFIG_NUMA
 extern void node_dev_init(void);
 /* Core of the node registration - only memory hotplug should use this */
-extern int __register_one_node(int nid);
-
-/* Registers an online node */
-static inline int register_one_node(int nid)
-{
-	return __register_one_node(nid);
-}
-
+extern int register_one_node(int nid);
 extern void unregister_one_node(int nid);
 extern int register_cpu_under_node(unsigned int cpu, unsigned int nid);
 extern int unregister_cpu_under_node(unsigned int cpu, unsigned int nid);
@@ -148,10 +141,6 @@ extern int register_memory_node_under_compute_node(unsigned int mem_nid,
 static inline void node_dev_init(void)
 {
 }
-static inline int __register_one_node(int nid)
-{
-	return 0;
-}
 static inline int register_one_node(int nid)
 {
 	return 0;
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index f734cc924b51..4dadd156f836 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1571,7 +1571,7 @@ int add_memory_resource(int nid, struct resource *res, mhp_t mhp_flags)
 		 * We online node here. We can't roll back from here.
 		 */
 		node_set_online(nid);
-		ret = __register_one_node(nid);
+		ret = register_one_node(nid);
 		BUG_ON(ret);
 	}
 
-- 
2.43.5



^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v6 1/5] drivers/base/node: Optimize memory block registration to reduce boot time
  2025-05-26 14:50 [PATCH v6 1/5] drivers/base/node: Optimize memory block registration to reduce boot time Donet Tom
                   ` (3 preceding siblings ...)
  2025-05-26 14:50 ` [PATCH v6 5/5] drivers/base/node: Rename __register_one_node() to register_one_node() Donet Tom
@ 2025-05-26 17:17 ` David Hildenbrand
  2025-05-26 17:41   ` Donet Tom
  4 siblings, 1 reply; 8+ messages in thread
From: David Hildenbrand @ 2025-05-26 17:17 UTC (permalink / raw)
  To: Donet Tom, Andrew Morton, Mike Rapoport, Oscar Salvador, Zi Yan,
	Greg Kroah-Hartman
  Cc: Ritesh Harjani, linux-mm, linux-kernel, Rafael J . Wysocki,
	Danilo Krummrich, Jonathan Cameron, Alison Schofield, Yury Norov,
	Dave Jiang, Madhavan Srinivasan, Nilay Shroff, linuxppc-dev

On 26.05.25 16:50, Donet Tom wrote:
> During node device initialization, `memory blocks` are registered under
> each NUMA node. The `memory blocks` to be registered are identified using
> the node’s start and end PFNs, which are obtained from the node's pg_data
> 
> However, not all PFNs within this range necessarily belong to the same
> node—some may belong to other nodes. Additionally, due to the
> discontiguous nature of physical memory, certain sections within a
> `memory block` may be absent.
> 
> As a result, `memory blocks` that fall between a node’s start and end
> PFNs may span across multiple nodes, and some sections within those blocks
> may be missing. `Memory blocks` have a fixed size, which is architecture
> dependent.
> 
> Due to these considerations, the memory block registration is currently
> performed as follows:
> 
> for_each_online_node(nid):
>      start_pfn = pgdat->node_start_pfn;
>      end_pfn = pgdat->node_start_pfn + node_spanned_pages;
>      for_each_memory_block_between(PFN_PHYS(start_pfn), PFN_PHYS(end_pfn))
>          mem_blk = memory_block_id(pfn_to_section_nr(pfn));
>          pfn_mb_start=section_nr_to_pfn(mem_blk->start_section_nr)
>          pfn_mb_end = pfn_start + memory_block_pfns - 1
>          for (pfn = pfn_mb_start; pfn < pfn_mb_end; pfn++):
>              if (get_nid_for_pfn(pfn) != nid):
>                  continue;
>              else
>                  do_register_memory_block_under_node(nid, mem_blk,
>                                                          MEMINIT_EARLY);
> 
> Here, we derive the start and end PFNs from the node's pg_data, then
> determine the memory blocks that may belong to the node. For each
> `memory block` in this range, we inspect all PFNs it contains and check
> their associated NUMA node ID. If a PFN within the block matches the
> current node, the memory block is registered under that node.
> 
> If CONFIG_DEFERRED_STRUCT_PAGE_INIT is enabled, get_nid_for_pfn() performs
> a binary search in the `memblock regions` to determine the NUMA node ID
> for a given PFN. If it is not enabled, the node ID is retrieved directly
> from the struct page.
> 
> On large systems, this process can become time-consuming, especially since
> we iterate over each `memory block` and all PFNs within it until a match is
> found. When CONFIG_DEFERRED_STRUCT_PAGE_INIT is enabled, the additional
> overhead of the binary search increases the execution time significantly,
> potentially leading to soft lockups during boot.
> 
> In this patch, we iterate over `memblock region` to identify the
> `memory blocks` that belong to the current NUMA node. `memblock regions`
> are contiguous memory ranges, each associated with a single NUMA node, and
> they do not span across multiple nodes.
> 
> for_each_memory_region(r): // r => region
>    if (!node_online(r->nid)):
>      continue;
>    else
>      for_each_memory_block_between(r->base, r->base + r->size - 1):
>        do_register_memory_block_under_node(r->nid, mem_blk, MEMINIT_EARLY);
> 
> We iterate over all memblock regions, and if the node associated with the
> region is online, we calculate the start and end memory blocks based on the
> region's start and end PFNs. We then register all the memory blocks within
> that range under the region node.
> 
> Test Results on My system with 32TB RAM
> =======================================
> 1. Boot time with CONFIG_DEFERRED_STRUCT_PAGE_INIT enabled.
> 
> Without this patch
> ------------------
> Startup finished in 1min 16.528s (kernel)
> 
> With this patch
> ---------------
> Startup finished in 17.236s (kernel) - 78% Improvement
> 
> 2. Boot time with CONFIG_DEFERRED_STRUCT_PAGE_INIT disabled.
> 
> Without this patch
> ------------------
> Startup finished in 28.320s (kernel)
> 
> With this patch
> ---------------
> Startup finished in 15.621s (kernel) - 46% Improvement
> 
> Acked-by: Oscar Salvador <osalvador@suse.de>
> Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
> Acked-by: Zi Yan <ziy@nvidia.com>
> Signed-off-by: Donet Tom <donettom@linux.ibm.com>
> 


Only a couple of nits:

> diff --git a/drivers/base/node.c b/drivers/base/node.c
> index cd13ef287011..20b6f4496e1b 100644
> --- a/drivers/base/node.c
> +++ b/drivers/base/node.c
> @@ -20,6 +20,7 @@
>   #include <linux/pm_runtime.h>
>   #include <linux/swap.h>
>   #include <linux/slab.h>
> +#include <linux/memblock.h>
>   
>   static const struct bus_type node_subsys = {
>   	.name = "node",
> @@ -850,6 +851,47 @@ void unregister_memory_block_under_nodes(struct memory_block *mem_blk)
>   			  kobject_name(&node_devices[mem_blk->nid]->dev.kobj));
>   }
>   
> +/*
> + * register_memory_blocks_under_nodes : Register the memory blocks
 > + *                 under the nodes.> + *
> + * This function registers all memory blocks to their corresponding nodes
> + * based on the associated memory regions. Each memory region is tied to
> + * a specific node and does not span multiple nodes. Therefore, all memory
> + * blocks within a given region are considered to belong to that node. The
> + * function iterates through each memory region and registers the memory
> + * blocks contained within that region to the respective node. Since memory
> + * blocks can span across multiple regions (and hence multiple nodes), a
> + * single memory block may be registered under more than one node if it
> + * overlaps with regions belonging to different nodes.

a) Do we need excessive doc for that?

b) It looks partially like kerneldoc, do we want to convert it to proper 
one?

/**
  * register_memory_blocks_under_nodes - register all memory blocks
  * 					under the corresponding nodes
  *
  ...

c) Maybe add a line break .. or two to make it a bit more readable.

 > + */> +static void register_memory_blocks_under_nodes(void)
> +{
> +	struct memblock_region *r;
> +
> +	for_each_mem_region(r) {
> +		const unsigned long start_block_id = phys_to_block_id(r->base);
> +		const unsigned long end_block_id = phys_to_block_id(r->base + r->size - 1);
> +		unsigned long block_id;
> +		int nid = memblock_get_region_node(r);

const int nid = memblock_get_region_node(r);
unsigned long block_id;

> +
> +		if (!node_online(nid))
> +			continue;
> +
> +		for (block_id = start_block_id; block_id <= end_block_id; block_id++) {
> +			struct memory_block *mem;
> +
> +			mem = find_memory_block_by_id(block_id);
> +			if (!mem)
> +				continue;
> +
> +			do_register_memory_block_under_node(nid, mem, MEMINIT_EARLY);
> +			put_device(&mem->dev);
> +		}
> +
> +	}
> +}
> +
>   void register_memory_blocks_under_node(int nid, unsigned long start_pfn,
>   				       unsigned long end_pfn,
>   				       enum meminit_context context)
> @@ -971,11 +1013,16 @@ void __init node_dev_init(void)
>   
>   	/*
>   	 * Create all node devices, which will properly link the node
> -	 * to applicable memory block devices and already created cpu devices.
> +	 * to already created cpu devices.
>   	 */
>   	for_each_online_node(i) {
> -		ret = register_one_node(i);
> +		ret =  __register_one_node(i);
>   		if (ret)
>   			panic("%s() failed to add node: %d\n", __func__, ret);
>   	}
> +
> +	/*
> +	 * Link the node to memory block devices
> +	 */


This comment is rather ... superfluous.  ... and it would fit into a 
single line.

> +	register_memory_blocks_under_nodes();
>   }
> diff --git a/include/linux/memory.h b/include/linux/memory.h
> index 12daa6ec7d09..2a61088e17ad 100644
> --- a/include/linux/memory.h
> +++ b/include/linux/memory.h
> @@ -171,12 +171,30 @@ struct memory_group *memory_group_find_by_id(int mgid);
>   typedef int (*walk_memory_groups_func_t)(struct memory_group *, void *);
>   int walk_dynamic_memory_groups(int nid, walk_memory_groups_func_t func,
>   			       struct memory_group *excluded, void *arg);
> +struct memory_block *find_memory_block_by_id(unsigned long block_id);
>   #define hotplug_memory_notifier(fn, pri) ({		\
>   	static __meminitdata struct notifier_block fn##_mem_nb =\
>   		{ .notifier_call = fn, .priority = pri };\
>   	register_memory_notifier(&fn##_mem_nb);			\
>   })
>   
> +extern int sections_per_block;
> +
> +static inline unsigned long memory_block_id(unsigned long section_nr)
> +{
> +	return section_nr / sections_per_block;
> +}
> +
> +static inline unsigned long pfn_to_block_id(unsigned long pfn)
> +{
> +	return memory_block_id(pfn_to_section_nr(pfn));
> +}
> +
> +static inline unsigned long phys_to_block_id(unsigned long phys)
> +{
> +	return pfn_to_block_id(PFN_DOWN(phys));
> +}
> +
>   #ifdef CONFIG_NUMA
>   void memory_block_add_nid(struct memory_block *mem, int nid,
>   			  enum meminit_context context);
> @@ -188,5 +206,4 @@ void memory_block_add_nid(struct memory_block *mem, int nid,
>    * can sleep.
>    */
>   extern struct mutex text_mutex;
> -

Unrelated change.


Thanks

Acked-by: David Hildenbrand <david@redhat.com>

-- 
Cheers,

David / dhildenb



^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v6 5/5] drivers/base/node: Rename __register_one_node() to register_one_node()
  2025-05-26 14:50 ` [PATCH v6 5/5] drivers/base/node: Rename __register_one_node() to register_one_node() Donet Tom
@ 2025-05-26 17:17   ` David Hildenbrand
  0 siblings, 0 replies; 8+ messages in thread
From: David Hildenbrand @ 2025-05-26 17:17 UTC (permalink / raw)
  To: Donet Tom, Andrew Morton, Mike Rapoport, Oscar Salvador, Zi Yan,
	Greg Kroah-Hartman
  Cc: Ritesh Harjani, linux-mm, linux-kernel, Rafael J . Wysocki,
	Danilo Krummrich, Jonathan Cameron, Alison Schofield, Yury Norov,
	Dave Jiang, Madhavan Srinivasan, Nilay Shroff, linuxppc-dev

On 26.05.25 16:50, Donet Tom wrote:
> The register_one_node() function was a simple wrapper around
> __register_one_node(). To simplify the code, register_one_node()
> has been removed, and __register_one_node() has been renamed to
> register_one_node().
> 
> Signed-off-by: Donet Tom <donettom@linux.ibm.com>

Acked-by: David Hildenbrand <david@redhat.com>

-- 
Cheers,

David / dhildenb



^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v6 1/5] drivers/base/node: Optimize memory block registration to reduce boot time
  2025-05-26 17:17 ` [PATCH v6 1/5] drivers/base/node: Optimize memory block registration to reduce boot time David Hildenbrand
@ 2025-05-26 17:41   ` Donet Tom
  0 siblings, 0 replies; 8+ messages in thread
From: Donet Tom @ 2025-05-26 17:41 UTC (permalink / raw)
  To: David Hildenbrand, Andrew Morton, Mike Rapoport, Oscar Salvador,
	Zi Yan, Greg Kroah-Hartman
  Cc: Ritesh Harjani, linux-mm, linux-kernel, Rafael J . Wysocki,
	Danilo Krummrich, Jonathan Cameron, Alison Schofield, Yury Norov,
	Dave Jiang, Madhavan Srinivasan, Nilay Shroff, linuxppc-dev


On 5/26/25 10:47 PM, David Hildenbrand wrote:
> On 26.05.25 16:50, Donet Tom wrote:
>> During node device initialization, `memory blocks` are registered under
>> each NUMA node. The `memory blocks` to be registered are identified 
>> using
>> the node’s start and end PFNs, which are obtained from the node's 
>> pg_data
>>
>> However, not all PFNs within this range necessarily belong to the same
>> node—some may belong to other nodes. Additionally, due to the
>> discontiguous nature of physical memory, certain sections within a
>> `memory block` may be absent.
>>
>> As a result, `memory blocks` that fall between a node’s start and end
>> PFNs may span across multiple nodes, and some sections within those 
>> blocks
>> may be missing. `Memory blocks` have a fixed size, which is architecture
>> dependent.
>>
>> Due to these considerations, the memory block registration is currently
>> performed as follows:
>>
>> for_each_online_node(nid):
>>      start_pfn = pgdat->node_start_pfn;
>>      end_pfn = pgdat->node_start_pfn + node_spanned_pages;
>>      for_each_memory_block_between(PFN_PHYS(start_pfn), 
>> PFN_PHYS(end_pfn))
>>          mem_blk = memory_block_id(pfn_to_section_nr(pfn));
>> pfn_mb_start=section_nr_to_pfn(mem_blk->start_section_nr)
>>          pfn_mb_end = pfn_start + memory_block_pfns - 1
>>          for (pfn = pfn_mb_start; pfn < pfn_mb_end; pfn++):
>>              if (get_nid_for_pfn(pfn) != nid):
>>                  continue;
>>              else
>>                  do_register_memory_block_under_node(nid, mem_blk,
>> MEMINIT_EARLY);
>>
>> Here, we derive the start and end PFNs from the node's pg_data, then
>> determine the memory blocks that may belong to the node. For each
>> `memory block` in this range, we inspect all PFNs it contains and check
>> their associated NUMA node ID. If a PFN within the block matches the
>> current node, the memory block is registered under that node.
>>
>> If CONFIG_DEFERRED_STRUCT_PAGE_INIT is enabled, get_nid_for_pfn() 
>> performs
>> a binary search in the `memblock regions` to determine the NUMA node ID
>> for a given PFN. If it is not enabled, the node ID is retrieved directly
>> from the struct page.
>>
>> On large systems, this process can become time-consuming, especially 
>> since
>> we iterate over each `memory block` and all PFNs within it until a 
>> match is
>> found. When CONFIG_DEFERRED_STRUCT_PAGE_INIT is enabled, the additional
>> overhead of the binary search increases the execution time 
>> significantly,
>> potentially leading to soft lockups during boot.
>>
>> In this patch, we iterate over `memblock region` to identify the
>> `memory blocks` that belong to the current NUMA node. `memblock regions`
>> are contiguous memory ranges, each associated with a single NUMA 
>> node, and
>> they do not span across multiple nodes.
>>
>> for_each_memory_region(r): // r => region
>>    if (!node_online(r->nid)):
>>      continue;
>>    else
>>      for_each_memory_block_between(r->base, r->base + r->size - 1):
>>        do_register_memory_block_under_node(r->nid, mem_blk, 
>> MEMINIT_EARLY);
>>
>> We iterate over all memblock regions, and if the node associated with 
>> the
>> region is online, we calculate the start and end memory blocks based 
>> on the
>> region's start and end PFNs. We then register all the memory blocks 
>> within
>> that range under the region node.
>>
>> Test Results on My system with 32TB RAM
>> =======================================
>> 1. Boot time with CONFIG_DEFERRED_STRUCT_PAGE_INIT enabled.
>>
>> Without this patch
>> ------------------
>> Startup finished in 1min 16.528s (kernel)
>>
>> With this patch
>> ---------------
>> Startup finished in 17.236s (kernel) - 78% Improvement
>>
>> 2. Boot time with CONFIG_DEFERRED_STRUCT_PAGE_INIT disabled.
>>
>> Without this patch
>> ------------------
>> Startup finished in 28.320s (kernel)
>>
>> With this patch
>> ---------------
>> Startup finished in 15.621s (kernel) - 46% Improvement
>>
>> Acked-by: Oscar Salvador <osalvador@suse.de>
>> Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
>> Acked-by: Zi Yan <ziy@nvidia.com>
>> Signed-off-by: Donet Tom <donettom@linux.ibm.com>
>>
>
>
> Only a couple of nits:
>
>> diff --git a/drivers/base/node.c b/drivers/base/node.c
>> index cd13ef287011..20b6f4496e1b 100644
>> --- a/drivers/base/node.c
>> +++ b/drivers/base/node.c
>> @@ -20,6 +20,7 @@
>>   #include <linux/pm_runtime.h>
>>   #include <linux/swap.h>
>>   #include <linux/slab.h>
>> +#include <linux/memblock.h>
>>     static const struct bus_type node_subsys = {
>>       .name = "node",
>> @@ -850,6 +851,47 @@ void unregister_memory_block_under_nodes(struct 
>> memory_block *mem_blk)
>> kobject_name(&node_devices[mem_blk->nid]->dev.kobj));
>>   }
>>   +/*
>> + * register_memory_blocks_under_nodes : Register the memory blocks
> > + *                 under the nodes.> + *
>> + * This function registers all memory blocks to their corresponding 
>> nodes
>> + * based on the associated memory regions. Each memory region is 
>> tied to
>> + * a specific node and does not span multiple nodes. Therefore, all 
>> memory
>> + * blocks within a given region are considered to belong to that 
>> node. The
>> + * function iterates through each memory region and registers the 
>> memory
>> + * blocks contained within that region to the respective node. Since 
>> memory
>> + * blocks can span across multiple regions (and hence multiple 
>> nodes), a
>> + * single memory block may be registered under more than one node if it
>> + * overlaps with regions belonging to different nodes.
>
> a) Do we need excessive doc for that?
>
> b) It looks partially like kerneldoc, do we want to convert it to 
> proper one?
>
> /**
>  * register_memory_blocks_under_nodes - register all memory blocks
>  *                     under the corresponding nodes
>  *
>  ...
>
> c) Maybe add a line break .. or two to make it a bit more readable.


Sure David, I will change it to a proper comment.


>
> > + */> +static void register_memory_blocks_under_nodes(void)
>> +{
>> +    struct memblock_region *r;
>> +
>> +    for_each_mem_region(r) {
>> +        const unsigned long start_block_id = phys_to_block_id(r->base);
>> +        const unsigned long end_block_id = phys_to_block_id(r->base 
>> + r->size - 1);
>> +        unsigned long block_id;
>> +        int nid = memblock_get_region_node(r);
>
> const int nid = memblock_get_region_node(r);
> unsigned long block_id;


Sure. I will change it.


>
>> +
>> +        if (!node_online(nid))
>> +            continue;
>> +
>> +        for (block_id = start_block_id; block_id <= end_block_id; 
>> block_id++) {
>> +            struct memory_block *mem;
>> +
>> +            mem = find_memory_block_by_id(block_id);
>> +            if (!mem)
>> +                continue;
>> +
>> +            do_register_memory_block_under_node(nid, mem, 
>> MEMINIT_EARLY);
>> +            put_device(&mem->dev);
>> +        }
>> +
>> +    }
>> +}
>> +
>>   void register_memory_blocks_under_node(int nid, unsigned long 
>> start_pfn,
>>                          unsigned long end_pfn,
>>                          enum meminit_context context)
>> @@ -971,11 +1013,16 @@ void __init node_dev_init(void)
>>         /*
>>        * Create all node devices, which will properly link the node
>> -     * to applicable memory block devices and already created cpu 
>> devices.
>> +     * to already created cpu devices.
>>        */
>>       for_each_online_node(i) {
>> -        ret = register_one_node(i);
>> +        ret =  __register_one_node(i);
>>           if (ret)
>>               panic("%s() failed to add node: %d\n", __func__, ret);
>>       }
>> +
>> +    /*
>> +     * Link the node to memory block devices
>> +     */
>
>
> This comment is rather ... superfluous.  ... and it would fit into a 
> single line.


I will remove it.


>
>> +    register_memory_blocks_under_nodes();
>>   }
>> diff --git a/include/linux/memory.h b/include/linux/memory.h
>> index 12daa6ec7d09..2a61088e17ad 100644
>> --- a/include/linux/memory.h
>> +++ b/include/linux/memory.h
>> @@ -171,12 +171,30 @@ struct memory_group 
>> *memory_group_find_by_id(int mgid);
>>   typedef int (*walk_memory_groups_func_t)(struct memory_group *, 
>> void *);
>>   int walk_dynamic_memory_groups(int nid, walk_memory_groups_func_t 
>> func,
>>                      struct memory_group *excluded, void *arg);
>> +struct memory_block *find_memory_block_by_id(unsigned long block_id);
>>   #define hotplug_memory_notifier(fn, pri) ({        \
>>       static __meminitdata struct notifier_block fn##_mem_nb =\
>>           { .notifier_call = fn, .priority = pri };\
>>       register_memory_notifier(&fn##_mem_nb);            \
>>   })
>>   +extern int sections_per_block;
>> +
>> +static inline unsigned long memory_block_id(unsigned long section_nr)
>> +{
>> +    return section_nr / sections_per_block;
>> +}
>> +
>> +static inline unsigned long pfn_to_block_id(unsigned long pfn)
>> +{
>> +    return memory_block_id(pfn_to_section_nr(pfn));
>> +}
>> +
>> +static inline unsigned long phys_to_block_id(unsigned long phys)
>> +{
>> +    return pfn_to_block_id(PFN_DOWN(phys));
>> +}
>> +
>>   #ifdef CONFIG_NUMA
>>   void memory_block_add_nid(struct memory_block *mem, int nid,
>>                 enum meminit_context context);
>> @@ -188,5 +206,4 @@ void memory_block_add_nid(struct memory_block 
>> *mem, int nid,
>>    * can sleep.
>>    */
>>   extern struct mutex text_mutex;
>> -
>
> Unrelated change.
>
>
> Thanks
>
> Acked-by: David Hildenbrand <david@redhat.com>


Thank you.



^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2025-05-26 17:41 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-05-26 14:50 [PATCH v6 1/5] drivers/base/node: Optimize memory block registration to reduce boot time Donet Tom
2025-05-26 14:50 ` [PATCH v6 2/5] drivers/base/node: remove register_mem_block_under_node_early() Donet Tom
2025-05-26 14:50 ` [PATCH v6 3/5] drivers/base/node: Remove register_memory_blocks_under_node() function call from register_one_node Donet Tom
2025-05-26 14:50 ` [PATCH v6 4/5] drivers/base/node: Rename register_memory_blocks_under_node() and remove context argument Donet Tom
2025-05-26 14:50 ` [PATCH v6 5/5] drivers/base/node: Rename __register_one_node() to register_one_node() Donet Tom
2025-05-26 17:17   ` David Hildenbrand
2025-05-26 17:17 ` [PATCH v6 1/5] drivers/base/node: Optimize memory block registration to reduce boot time David Hildenbrand
2025-05-26 17:41   ` Donet Tom

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox