* Re: [PATCH v4] mm: pass nid to reserve_bootmem_region()
@ 2025-04-25 10:02 Gutierrez Cantu, Bernardo
2025-04-25 10:20 ` [PATCH] mm: memblock: Fix arguments passed to memblock_set_node() Bernardo C. Gutierrez Cantu
0 siblings, 1 reply; 10+ messages in thread
From: Gutierrez Cantu, Bernardo @ 2025-04-25 10:02 UTC (permalink / raw)
To: yajun.deng; +Cc: akpm, linux-kernel, linux-mm, lkp, rppt, dwmw2
> + if (memblock_is_nomap(region))
> + reserve_bootmem_region(start, end, nid);
> +
> + memblock_set_node(start, end, &memblock.reserved, nid);
memblock_set_node() receives a `base` and a `size` argument. Passing `end` as the
size would cause us to set the node id on an incorrect range. Will send a fixup
patch shortly.
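To make the effect concrete, here is a minimal userspace sketch (plain C with
made-up addresses, not kernel code) of the range that ends up being covered:
#include <stdio.h>
#include <stdint.h>
int main(void)
{
	/* hypothetical reserved region: 64 MiB starting at 4 GiB */
	uint64_t base = 0x100000000ULL;
	uint64_t size = 0x004000000ULL;
	uint64_t end  = base + size;
	/* memblock_set_node(base, size, ...) operates on [base, base + size) */
	printf("intended range: [%#llx, %#llx)\n",
	       (unsigned long long)base, (unsigned long long)(base + size));
	/* passing `end` where `size` is expected yields [base, base + end),
	 * i.e. [base, 2*base + size), which spills far past the region
	 */
	printf("buggy range:    [%#llx, %#llx)\n",
	       (unsigned long long)base, (unsigned long long)(base + end));
	return 0;
}
In this example the buggy call covers an extra 4 GiB of physical address space
beyond the end of the region.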
Best regards
Bernardo
* [PATCH] mm: memblock: Fix arguments passed to memblock_set_node()
2025-04-25 10:02 [PATCH v4] mm: pass nid to reserve_bootmem_region() Gutierrez Cantu, Bernardo
@ 2025-04-25 10:20 ` Bernardo C. Gutierrez Cantu
2025-04-25 14:18 ` David Woodhouse
` (2 more replies)
0 siblings, 3 replies; 10+ messages in thread
From: Bernardo C. Gutierrez Cantu @ 2025-04-25 10:20 UTC (permalink / raw)
To: bercantu; +Cc: akpm, dwmw2, linux-kernel, linux-mm, lkp, rppt, yajun.deng
memblock_set_node() receives `base` and `size` arguments, but we are
passing the `start` and `end` of the memory regions when iterating over
them in memmap_init_reserved_pages() to set their node ids.
This results in the function setting the node ids for the reserved memory
regions in `[base, base + base + size)` instead of `[base, base + size)`.
Pass `start` and `size`, so that we iterate over the correct range.
Fixes: 61167ad5fecd ("mm: pass nid to reserve_bootmem_region()")
Signed-off-by: Bernardo C. Gutierrez Cantu <bercantu@amazon.de>
---
mm/memblock.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/mm/memblock.c b/mm/memblock.c
index 0a53db4d9f7b..9639f04b4fdf 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -2196,7 +2196,7 @@ static void __init memmap_init_reserved_pages(void)
if (memblock_is_nomap(region))
reserve_bootmem_region(start, end, nid);
- memblock_set_node(start, end, &memblock.reserved, nid);
+ memblock_set_node(start, region->size, &memblock.reserved, nid);
}
/*
--
2.47.1
* Re: [PATCH] mm: memblock: Fix arguments passed to memblock_set_node()
2025-04-25 10:20 ` [PATCH] mm: memblock: Fix arguments passed to memblock_set_node() Bernardo C. Gutierrez Cantu
@ 2025-04-25 14:18 ` David Woodhouse
2025-04-25 14:37 ` Jiaxun Yang
2025-04-26 1:05 ` Andrew Morton
2025-04-26 8:22 ` Mike Rapoport
2 siblings, 1 reply; 10+ messages in thread
From: David Woodhouse @ 2025-04-25 14:18 UTC (permalink / raw)
To: Bernardo C. Gutierrez Cantu
Cc: akpm, linux-kernel, linux-mm, lkp, rppt, yajun.deng, Huang Pei,
Thomas Bogendoerfer, Huacai Chen, Jiaxun Yang, linux-mips
On Fri, 2025-04-25 at 10:20 +0000, Bernardo C. Gutierrez Cantu wrote:
> memblock_set_node() receives `base` and `size` arguments, but we are
> passing the `start` and `end` of the memory regions when iterating over
> them in memmap_init_reserved_pages() to set their node ids.
>
> This results in the function setting the node ids for the reserved memory
> regions in `[base, base + base + size)` instead of `[base, base + size)`.
>
> Pass `start` and `size`, so that we iterate over the correct range.
>
> Fixes: 61167ad5fecd ("mm: pass nid to reserve_bootmem_region()")
>
> Signed-off-by: Bernardo C. Gutierrez Cantu <bercantu@amazon.de>
Thanks, Bernardo. Good catch!
Acked-by: David Woodhouse <dwmw2@infradead.org>
That function taking (start, size) arguments seems a little surprising
to me; I instinctively expect physical memory management functions to
use (start, end). Clearly the author of that commit did too :)
As you pointed out in our private chat though, the rest of the
memblock_* functions are all (start, size) too, so they are at least
consistent. I don't think it makes a lot of sense to talk about
changing them *all* at this point?
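For reference, a rough paraphrase of the declarations being contrasted here
(paraphrased from include/linux/memblock.h and include/linux/mm.h; the
__init_memblock/__meminit annotations are omitted, so check the tree for the
exact prototypes):
/* memblock API: (base, size) pairs */
int memblock_reserve(phys_addr_t base, phys_addr_t size);
int memblock_set_node(phys_addr_t base, phys_addr_t size,
		      struct memblock_type *type, int nid);
/* the odd one out in this thread: (start, end), plus nid as of this patch */
void reserve_bootmem_region(phys_addr_t start, phys_addr_t end, int nid);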
I did a quick grep for memblock_set_node() callers, and the one in
szmem() in arch/mips/loongson64/init.c looks odd.
/* set nid for reserved memory */
memblock_set_node((u64)node << 44, (u64)(node + 1) << 44,
&memblock.reserved, node);
At first glance I suspect the 'size' should just be (1<<44) or maybe it
should be inside the loop over the memmap, and called with mem_start,
mem_size each time?
And why are we calling memblock_reserve() for what appears to be a
single system-wide vgabios_addr, repeatedly each time szmem() is called
for a different NUMA node?
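If both suspicions are right, the fixed snippet might look roughly like the
sketch below (the VGA BIOS reservation size and its name are placeholders, not
the actual identifiers in arch/mips/loongson64/init.c):
	/* set nid for reserved memory: each node owns a 1 << 44 physical window */
	memblock_set_node((u64)node << 44, 1ULL << 44,
			  &memblock.reserved, node);
	/* reserve the single, system-wide VGA BIOS area only once */
	if (node == 0)
		memblock_reserve(vgabios_addr, VGABIOS_RESERVED_SIZE); /* placeholder size */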
* Re: [PATCH] mm: memblock: Fix arguments passed to memblock_set_node()
2025-04-25 14:18 ` David Woodhouse
@ 2025-04-25 14:37 ` Jiaxun Yang
0 siblings, 0 replies; 10+ messages in thread
From: Jiaxun Yang @ 2025-04-25 14:37 UTC (permalink / raw)
To: David Woodhouse, Bernardo C. Gutierrez Cantu
Cc: Andrew Morton, linux-kernel, linux-mm, kernel test robot, rppt,
yajun.deng, Huang Pei, Thomas Bogendoerfer, Huacai Chen,
linux-mips
On Friday, 25 April 2025 at 3:18 PM, David Woodhouse wrote:
[...]
Hi David & co,
>
>
> I did a quick grep for memblock_set_node() callers, and the one in
> szmem() in arch/mips/loongson64/init.c looks odd.
>
> /* set nid for reserved memory */
> memblock_set_node((u64)node << 44, (u64)(node + 1) << 44,
> &memblock.reserved, node);
>
> At first glance I suspect the 'size' should just be (1<<44) or maybe it
> should be inside the loop over the memmap, and called with mem_start,
> mem_size each time?
You are right, it should be (1 << 44); it was an oversight when I was
converting the MIPS private boot allocator to memblock.
>
> And why are we calling memblock_reserve() for what appears to be a
> single system-wide vgabios_addr, repeatedly each time szmem() is called
> for a different NUMA node?
Alas this should be done only for node 0.
Thanks
--
- Jiaxun
* Re: [PATCH] mm: memblock: Fix arguments passed to memblock_set_node()
2025-04-25 10:20 ` [PATCH] mm: memblock: Fix arguments passed to memblock_set_node() Bernardo C. Gutierrez Cantu
2025-04-25 14:18 ` David Woodhouse
@ 2025-04-26 1:05 ` Andrew Morton
2025-04-28 9:13 ` Bernardo C. Gutierrez Cantu
2025-04-26 8:22 ` Mike Rapoport
2 siblings, 1 reply; 10+ messages in thread
From: Andrew Morton @ 2025-04-26 1:05 UTC (permalink / raw)
To: Bernardo C. Gutierrez Cantu
Cc: dwmw2, linux-kernel, linux-mm, lkp, rppt, yajun.deng
On Fri, 25 Apr 2025 10:20:03 +0000 "Bernardo C. Gutierrez Cantu" <bercantu@amazon.de> wrote:
> memblock_set_node() receives `base` and `size` arguments, but we are
> passing the `start` and `end` of the memory regions when iterating over
> them in memmap_init_reserved_pages() to set their node ids.
>
> This results in the function setting the node ids for the reserved memory
> regions in `[base, base + base + size)` instead of `[base, base + size)`.
>
> Pass `start` and `size`, so that we iterate over the correct range.
>
> Fixes: 61167ad5fecd ("mm: pass nid to reserve_bootmem_region()")
>
> ...
>
> --- a/mm/memblock.c
> +++ b/mm/memblock.c
> @@ -2196,7 +2196,7 @@ static void __init memmap_init_reserved_pages(void)
> if (memblock_is_nomap(region))
> reserve_bootmem_region(start, end, nid);
>
> - memblock_set_node(start, end, &memblock.reserved, nid);
> + memblock_set_node(start, region->size, &memblock.reserved, nid);
> }
>
> /*
What were the runtime effects of this bug?
* Re: [PATCH] mm: memblock: Fix arguments passed to memblock_set_node()
2025-04-26 1:05 ` Andrew Morton
@ 2025-04-28 9:13 ` Bernardo C. Gutierrez Cantu
0 siblings, 0 replies; 10+ messages in thread
From: Bernardo C. Gutierrez Cantu @ 2025-04-28 9:13 UTC (permalink / raw)
To: akpm; +Cc: bercantu, dwmw2, linux-kernel, linux-mm, lkp, rppt, yajun.deng
On Sat, 26 Apr 2025 03:05:39 +0200 Andrew Morton <akpm@linux-foundation.org> wrote:
> On Fri, 25 Apr 2025 10:20:03 +0000 "Bernardo C. Gutierrez Cantu" <bercantu@amazon.de> wrote:
>
> > memblock_set_node() receives `base` and `size` arguments, but we are
> > passing the `start` and `end` of the memory regions when iterating over
> > them in memmap_init_reserved_pages() to set their node ids.
> >
> > This results in the function setting the node ids for the reserved memory
> > regions in `[base, base + base + size)` instead of `[base, base + size)`.
> >
> > Pass `start` and `size`, so that we iterate over the correct range.
> >
> > Fixes: 61167ad5fecd ("mm: pass nid to reserve_bootmem_region()")
> >
> > ...
> >
> > --- a/mm/memblock.c
> > +++ b/mm/memblock.c
> > @@ -2196,7 +2196,7 @@ static void __init memmap_init_reserved_pages(void)
> > if (memblock_is_nomap(region))
> > reserve_bootmem_region(start, end, nid);
> >
> > - memblock_set_node(start, end, &memblock.reserved, nid);
> > + memblock_set_node(start, region->size, &memblock.reserved, nid);
> > }
> >
> > /*
>
> What were the runtime effects of this bug?
I found the bug by code inspection while trying to learn how Linux's boot-time
memory allocation works. I was not actively trying to fix a runtime issue that I
could then prove was fixed by this patch.
But I see that in most cases this bug should be benign (which could explain why
it was not caught before). In kernels compiled with CONFIG_NUMA,
memblock_set_node() would iterate over more memblock regions than intended and
potentially set incorrect node ids, but those would then be overwritten with the
correct node ids as the caller, memmap_init_reserved_pages(), processes the
subsequent regions. So the resulting node ids after initialization should still
be correct.
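For instance, with a purely hypothetical layout of three 4 GiB nodes (G below is
shorthand for GiB-sized physical addresses):
  node 0: [ 0G,  4G)    node 1: [ 4G,  8G)    node 2: [ 8G, 12G)
  memblock_set_node( 0G,  4G, &memblock.reserved, 0)  ->  [ 0G,  4G)  correct, since start == 0
  memblock_set_node( 4G,  8G, &memblock.reserved, 1)  ->  [ 4G, 12G)  spills into node 2 and tags
                                                           its reserved ranges with nid 1
  memblock_set_node( 8G, 12G, &memblock.reserved, 2)  ->  [ 8G, 20G)  re-tags node 2's reserved
                                                           ranges with the correct nid 2; the part
                                                           past 12G covers no memory at all
Each over-extended call is corrected when the next region is processed, so the
final per-region node ids match what the fixed code produces.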
It could still affect the duration of the mm init process at boot, but I did not
measure the improvement, as it is highly dependent on the system's memory
configuration; on my specific system under test it was negligible.
* Re: [PATCH] mm: memblock: Fix arguments passed to memblock_set_node()
2025-04-25 10:20 ` [PATCH] mm: memblock: Fix arguments passed to memblock_set_node() Bernardo C. Gutierrez Cantu
2025-04-25 14:18 ` David Woodhouse
2025-04-26 1:05 ` Andrew Morton
@ 2025-04-26 8:22 ` Mike Rapoport
2025-04-28 9:53 ` Bernardo C. Gutierrez Cantu
2 siblings, 1 reply; 10+ messages in thread
From: Mike Rapoport @ 2025-04-26 8:22 UTC (permalink / raw)
To: Bernardo C. Gutierrez Cantu
Cc: akpm, dwmw2, linux-kernel, linux-mm, lkp, yajun.deng
Hi Bernardo,
On Fri, Apr 25, 2025 at 10:20:03AM +0000, Bernardo C. Gutierrez Cantu wrote:
> memblock_set_node() receives `base` and `size` arguments, but we are
> passing the `start` and `end` of the memory regions when iterating over
> them in memmap_init_reserved_pages() to set their node ids.
>
> This results in the function setting the node ids for the reserved memory
> regions in `[base, base + base + size)` instead of `[base, base + size)`.
>
> Pass `start` and `size`, so that we iterate over the correct range.
>
> Fixes: 61167ad5fecd ("mm: pass nid to reserve_bootmem_region()")
>
> Signed-off-by: Bernardo C. Gutierrez Cantu <bercantu@amazon.de>
There's already a fix in memblock tree:
https://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock.git/commit/?h=for-next&id=06eaa824fd239edd1eab2754f29b2d03da313003
Will send PR to Linus soon.
> ---
> mm/memblock.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/mm/memblock.c b/mm/memblock.c
> index 0a53db4d9f7b..9639f04b4fdf 100644
> --- a/mm/memblock.c
> +++ b/mm/memblock.c
> @@ -2196,7 +2196,7 @@ static void __init memmap_init_reserved_pages(void)
> if (memblock_is_nomap(region))
> reserve_bootmem_region(start, end, nid);
>
> - memblock_set_node(start, end, &memblock.reserved, nid);
> + memblock_set_node(start, region->size, &memblock.reserved, nid);
> }
>
> /*
> --
> 2.47.1
--
Sincerely yours,
Mike.
* Re: [PATCH] mm: memblock: Fix arguments passed to memblock_set_node()
2025-04-26 8:22 ` Mike Rapoport
@ 2025-04-28 9:53 ` Bernardo C. Gutierrez Cantu
0 siblings, 0 replies; 10+ messages in thread
From: Bernardo C. Gutierrez Cantu @ 2025-04-28 9:53 UTC (permalink / raw)
To: rppt; +Cc: akpm, bercantu, dwmw2, linux-kernel, linux-mm, lkp, yajun.deng
On Sat, Apr 26, 2025 at 10:22:39AM +0200, Mike Rapoport wrote:
> Hi Bernardo,
>
> On Fri, Apr 25, 2025 at 10:20:03AM +0000, Bernardo C. Gutierrez Cantu wrote:
> > memblock_set_node() receives `base` and `size` arguments, but we are
> > passing the `start` and `end` of the memory regions when iterating over
> > them in memmap_init_reserved_pages() to set their node ids.
> >
> > This results in the function setting the node ids for the reserved memory
> > regions in `[base, base + base + size)` instead of `[base, base + size)`.
> >
> > Pass `start` and `size`, so that we iterate over the correct range.
> >
> > Fixes: 61167ad5fecd ("mm: pass nid to reserve_bootmem_region()")
> >
> > Signed-off-by: Bernardo C. Gutierrez Cantu <bercantu@amazon.de>
>
> There's already a fix in memblock tree:
> https://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock.git/commit/?h=for-next&id=06eaa824fd239edd1eab2754f29b2d03da313003
>
> Will send PR to Linus soon.
>
Hi Mike
Oh ok, I see. Thank you, that patch should fix the issue.
Will it also be backported to LTS kernels affected by the bug, which was
introduced in v6.5?
> > ---
> > mm/memblock.c | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/mm/memblock.c b/mm/memblock.c
> > index 0a53db4d9f7b..9639f04b4fdf 100644
> > --- a/mm/memblock.c
> > +++ b/mm/memblock.c
> > @@ -2196,7 +2196,7 @@ static void __init memmap_init_reserved_pages(void)
> > if (memblock_is_nomap(region))
> > reserve_bootmem_region(start, end, nid);
> >
> > - memblock_set_node(start, end, &memblock.reserved, nid);
> > + memblock_set_node(start, region->size, &memblock.reserved, nid);
> > }
> >
> > /*
> > --
> > 2.47.1
>
> --
> Sincerely yours,
> Mike.
Best regards
Bernardo
* [PATCH v4] mm: pass nid to reserve_bootmem_region()
@ 2023-06-19 2:34 Yajun Deng
2023-06-20 8:45 ` Mike Rapoport
0 siblings, 1 reply; 10+ messages in thread
From: Yajun Deng @ 2023-06-19 2:34 UTC (permalink / raw)
To: akpm, rppt; +Cc: linux-mm, linux-kernel, Yajun Deng, kernel test robot
early_pfn_to_nid() is called frequently in init_reserved_page(); it
returns the node id of the PFN. These PFNs are probably from the same
memory region, so they have the same node id. It's not necessary to call
early_pfn_to_nid() for each PFN.
Pass nid to reserve_bootmem_region() and drop the call to
early_pfn_to_nid() in init_reserved_page(). Also, set nid on all
reserved pages before doing this, as some reserved memory regions may
not have a nid set.
The most beneficial function is memmap_init_reserved_pages() if
CONFIG_DEFERRED_STRUCT_PAGE_INIT is enabled.
The following data was tested on an x86 machine with 190GB of RAM.
before:
memmap_init_reserved_pages() 67ms
after:
memmap_init_reserved_pages() 20ms
Signed-off-by: Yajun Deng <yajun.deng@linux.dev>
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202306160145.juJMr3Bi-lkp@intel.com
---
V3 -> V4: make the test for early_page_initialised() inside if
(IS_ENABLED(CONFIG_DEFERRED_STRUCT_PAGE_INIT))
V2 -> V3: set nid on all reserved pages before passing nid.
V1 -> V2: fix build error when CONFIG_NUMA is not enabled.
---
include/linux/mm.h | 3 ++-
mm/memblock.c | 31 +++++++++++++++++++++----------
mm/mm_init.c | 30 +++++++++++++++++-------------
3 files changed, 40 insertions(+), 24 deletions(-)
diff --git a/include/linux/mm.h b/include/linux/mm.h
index fdd966b11f79..a7a0e692d44d 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2960,7 +2960,8 @@ extern unsigned long free_reserved_area(void *start, void *end,
extern void adjust_managed_page_count(struct page *page, long count);
-extern void reserve_bootmem_region(phys_addr_t start, phys_addr_t end);
+extern void reserve_bootmem_region(phys_addr_t start,
+ phys_addr_t end, int nid);
/* Free the reserved page into the buddy system, so it gets managed. */
static inline void free_reserved_page(struct page *page)
diff --git a/mm/memblock.c b/mm/memblock.c
index ff0da1858778..f9e61e565a53 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -2091,19 +2091,30 @@ static void __init memmap_init_reserved_pages(void)
{
struct memblock_region *region;
phys_addr_t start, end;
- u64 i;
+ int nid;
+
+ /*
+ * set nid on all reserved pages and also treat struct
+ * pages for the NOMAP regions as PageReserved
+ */
+ for_each_mem_region(region) {
+ nid = memblock_get_region_node(region);
+ start = region->base;
+ end = start + region->size;
+
+ if (memblock_is_nomap(region))
+ reserve_bootmem_region(start, end, nid);
+
+ memblock_set_node(start, end, &memblock.reserved, nid);
+ }
/* initialize struct pages for the reserved regions */
- for_each_reserved_mem_range(i, &start, &end)
- reserve_bootmem_region(start, end);
+ for_each_reserved_mem_region(region) {
+ nid = memblock_get_region_node(region);
+ start = region->base;
+ end = start + region->size;
- /* and also treat struct pages for the NOMAP regions as PageReserved */
- for_each_mem_region(region) {
- if (memblock_is_nomap(region)) {
- start = region->base;
- end = start + region->size;
- reserve_bootmem_region(start, end);
- }
+ reserve_bootmem_region(start, end, nid);
}
}
diff --git a/mm/mm_init.c b/mm/mm_init.c
index d393631599a7..a1963c3322af 100644
--- a/mm/mm_init.c
+++ b/mm/mm_init.c
@@ -646,10 +646,8 @@ static inline void pgdat_set_deferred_range(pg_data_t *pgdat)
}
/* Returns true if the struct page for the pfn is initialised */
-static inline bool __meminit early_page_initialised(unsigned long pfn)
+static inline bool __meminit early_page_initialised(unsigned long pfn, int nid)
{
- int nid = early_pfn_to_nid(pfn);
-
if (node_online(nid) && pfn >= NODE_DATA(nid)->first_deferred_pfn)
return false;
@@ -695,15 +693,14 @@ defer_init(int nid, unsigned long pfn, unsigned long end_pfn)
return false;
}
-static void __meminit init_reserved_page(unsigned long pfn)
+static void __meminit init_reserved_page(unsigned long pfn, int nid)
{
pg_data_t *pgdat;
- int nid, zid;
+ int zid;
- if (early_page_initialised(pfn))
+ if (early_page_initialised(pfn, nid))
return;
- nid = early_pfn_to_nid(pfn);
pgdat = NODE_DATA(nid);
for (zid = 0; zid < MAX_NR_ZONES; zid++) {
@@ -717,7 +714,7 @@ static void __meminit init_reserved_page(unsigned long pfn)
#else
static inline void pgdat_set_deferred_range(pg_data_t *pgdat) {}
-static inline bool early_page_initialised(unsigned long pfn)
+static inline bool early_page_initialised(unsigned long pfn, int nid)
{
return true;
}
@@ -727,7 +724,7 @@ static inline bool defer_init(int nid, unsigned long pfn, unsigned long end_pfn)
return false;
}
-static inline void init_reserved_page(unsigned long pfn)
+static inline void init_reserved_page(unsigned long pfn, int nid)
{
}
#endif /* CONFIG_DEFERRED_STRUCT_PAGE_INIT */
@@ -738,7 +735,8 @@ static inline void init_reserved_page(unsigned long pfn)
* marks the pages PageReserved. The remaining valid pages are later
* sent to the buddy page allocator.
*/
-void __meminit reserve_bootmem_region(phys_addr_t start, phys_addr_t end)
+void __meminit reserve_bootmem_region(phys_addr_t start,
+ phys_addr_t end, int nid)
{
unsigned long start_pfn = PFN_DOWN(start);
unsigned long end_pfn = PFN_UP(end);
@@ -747,7 +745,7 @@ void __meminit reserve_bootmem_region(phys_addr_t start, phys_addr_t end)
if (pfn_valid(start_pfn)) {
struct page *page = pfn_to_page(start_pfn);
- init_reserved_page(start_pfn);
+ init_reserved_page(start_pfn, nid);
/* Avoid false-positive PageTail() */
INIT_LIST_HEAD(&page->lru);
@@ -2579,8 +2577,14 @@ void __init set_dma_reserve(unsigned long new_dma_reserve)
void __init memblock_free_pages(struct page *page, unsigned long pfn,
unsigned int order)
{
- if (!early_page_initialised(pfn))
- return;
+
+ if (IS_ENABLED(CONFIG_DEFERRED_STRUCT_PAGE_INIT)) {
+ int nid = early_pfn_to_nid(pfn);
+
+ if (!early_page_initialised(pfn, nid))
+ return;
+ }
+
if (!kmsan_memblock_free_pages(page, order)) {
/* KMSAN will take care of these pages. */
return;
--
2.25.1
* Re: [PATCH v4] mm: pass nid to reserve_bootmem_region()
2023-06-19 2:34 [PATCH v4] mm: pass nid to reserve_bootmem_region() Yajun Deng
@ 2023-06-20 8:45 ` Mike Rapoport
0 siblings, 0 replies; 10+ messages in thread
From: Mike Rapoport @ 2023-06-20 8:45 UTC (permalink / raw)
To: Yajun Deng; +Cc: akpm, linux-mm, linux-kernel, kernel test robot
On Mon, Jun 19, 2023 at 10:34:06AM +0800, Yajun Deng wrote:
> early_pfn_to_nid() is called frequently in init_reserved_page(); it
> returns the node id of the PFN. These PFNs are probably from the same
> memory region, so they have the same node id. It's not necessary to call
> early_pfn_to_nid() for each PFN.
>
> Pass nid to reserve_bootmem_region() and drop the call to
> early_pfn_to_nid() in init_reserved_page(). Also, set nid on all
> reserved pages before doing this, as some reserved memory regions may
> not have a nid set.
>
> The most beneficial function is memmap_init_reserved_pages() if
> CONFIG_DEFERRED_STRUCT_PAGE_INIT is enabled.
>
> The following data was tested on an x86 machine with 190GB of RAM.
>
> before:
> memmap_init_reserved_pages() 67ms
>
> after:
> memmap_init_reserved_pages() 20ms
>
> Signed-off-by: Yajun Deng <yajun.deng@linux.dev>
> Reported-by: kernel test robot <lkp@intel.com>
> Closes: https://lore.kernel.org/oe-kbuild-all/202306160145.juJMr3Bi-lkp@intel.com
Reviewed-by: Mike Rapoport (IBM) <rppt@kernel.org>
> ---
> V3 -> V4: make the test for early_page_initialised() inside if
> (IS_ENABLED(CONFIG_DEFERRED_STRUCT_PAGE_INIT))
> V2 -> V3: set nid on all reserved pages before passing nid.
> V1 -> V2: fix build error when CONFIG_NUMA is not enabled.
> ---
> include/linux/mm.h | 3 ++-
> mm/memblock.c | 31 +++++++++++++++++++++----------
> mm/mm_init.c | 30 +++++++++++++++++-------------
> 3 files changed, 40 insertions(+), 24 deletions(-)
>
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index fdd966b11f79..a7a0e692d44d 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -2960,7 +2960,8 @@ extern unsigned long free_reserved_area(void *start, void *end,
>
> extern void adjust_managed_page_count(struct page *page, long count);
>
> -extern void reserve_bootmem_region(phys_addr_t start, phys_addr_t end);
> +extern void reserve_bootmem_region(phys_addr_t start,
> + phys_addr_t end, int nid);
>
> /* Free the reserved page into the buddy system, so it gets managed. */
> static inline void free_reserved_page(struct page *page)
> diff --git a/mm/memblock.c b/mm/memblock.c
> index ff0da1858778..f9e61e565a53 100644
> --- a/mm/memblock.c
> +++ b/mm/memblock.c
> @@ -2091,19 +2091,30 @@ static void __init memmap_init_reserved_pages(void)
> {
> struct memblock_region *region;
> phys_addr_t start, end;
> - u64 i;
> + int nid;
> +
> + /*
> + * set nid on all reserved pages and also treat struct
> + * pages for the NOMAP regions as PageReserved
> + */
> + for_each_mem_region(region) {
> + nid = memblock_get_region_node(region);
> + start = region->base;
> + end = start + region->size;
> +
> + if (memblock_is_nomap(region))
> + reserve_bootmem_region(start, end, nid);
> +
> + memblock_set_node(start, end, &memblock.reserved, nid);
> + }
>
> /* initialize struct pages for the reserved regions */
> - for_each_reserved_mem_range(i, &start, &end)
> - reserve_bootmem_region(start, end);
> + for_each_reserved_mem_region(region) {
> + nid = memblock_get_region_node(region);
> + start = region->base;
> + end = start + region->size;
>
> - /* and also treat struct pages for the NOMAP regions as PageReserved */
> - for_each_mem_region(region) {
> - if (memblock_is_nomap(region)) {
> - start = region->base;
> - end = start + region->size;
> - reserve_bootmem_region(start, end);
> - }
> + reserve_bootmem_region(start, end, nid);
> }
> }
>
> diff --git a/mm/mm_init.c b/mm/mm_init.c
> index d393631599a7..a1963c3322af 100644
> --- a/mm/mm_init.c
> +++ b/mm/mm_init.c
> @@ -646,10 +646,8 @@ static inline void pgdat_set_deferred_range(pg_data_t *pgdat)
> }
>
> /* Returns true if the struct page for the pfn is initialised */
> -static inline bool __meminit early_page_initialised(unsigned long pfn)
> +static inline bool __meminit early_page_initialised(unsigned long pfn, int nid)
> {
> - int nid = early_pfn_to_nid(pfn);
> -
> if (node_online(nid) && pfn >= NODE_DATA(nid)->first_deferred_pfn)
> return false;
>
> @@ -695,15 +693,14 @@ defer_init(int nid, unsigned long pfn, unsigned long end_pfn)
> return false;
> }
>
> -static void __meminit init_reserved_page(unsigned long pfn)
> +static void __meminit init_reserved_page(unsigned long pfn, int nid)
> {
> pg_data_t *pgdat;
> - int nid, zid;
> + int zid;
>
> - if (early_page_initialised(pfn))
> + if (early_page_initialised(pfn, nid))
> return;
>
> - nid = early_pfn_to_nid(pfn);
> pgdat = NODE_DATA(nid);
>
> for (zid = 0; zid < MAX_NR_ZONES; zid++) {
> @@ -717,7 +714,7 @@ static void __meminit init_reserved_page(unsigned long pfn)
> #else
> static inline void pgdat_set_deferred_range(pg_data_t *pgdat) {}
>
> -static inline bool early_page_initialised(unsigned long pfn)
> +static inline bool early_page_initialised(unsigned long pfn, int nid)
> {
> return true;
> }
> @@ -727,7 +724,7 @@ static inline bool defer_init(int nid, unsigned long pfn, unsigned long end_pfn)
> return false;
> }
>
> -static inline void init_reserved_page(unsigned long pfn)
> +static inline void init_reserved_page(unsigned long pfn, int nid)
> {
> }
> #endif /* CONFIG_DEFERRED_STRUCT_PAGE_INIT */
> @@ -738,7 +735,8 @@ static inline void init_reserved_page(unsigned long pfn)
> * marks the pages PageReserved. The remaining valid pages are later
> * sent to the buddy page allocator.
> */
> -void __meminit reserve_bootmem_region(phys_addr_t start, phys_addr_t end)
> +void __meminit reserve_bootmem_region(phys_addr_t start,
> + phys_addr_t end, int nid)
> {
> unsigned long start_pfn = PFN_DOWN(start);
> unsigned long end_pfn = PFN_UP(end);
> @@ -747,7 +745,7 @@ void __meminit reserve_bootmem_region(phys_addr_t start, phys_addr_t end)
> if (pfn_valid(start_pfn)) {
> struct page *page = pfn_to_page(start_pfn);
>
> - init_reserved_page(start_pfn);
> + init_reserved_page(start_pfn, nid);
>
> /* Avoid false-positive PageTail() */
> INIT_LIST_HEAD(&page->lru);
> @@ -2579,8 +2577,14 @@ void __init set_dma_reserve(unsigned long new_dma_reserve)
> void __init memblock_free_pages(struct page *page, unsigned long pfn,
> unsigned int order)
> {
> - if (!early_page_initialised(pfn))
> - return;
> +
> + if (IS_ENABLED(CONFIG_DEFERRED_STRUCT_PAGE_INIT)) {
> + int nid = early_pfn_to_nid(pfn);
> +
> + if (!early_page_initialised(pfn, nid))
> + return;
> + }
> +
> if (!kmsan_memblock_free_pages(page, order)) {
> /* KMSAN will take care of these pages. */
> return;
> --
> 2.25.1
>
--
Sincerely yours,
Mike.