linux-mm.kvack.org archive mirror
* [PATCH v2] mm/hugetlb: Fix hugepage allocation for interleaved memory nodes
@ 2025-01-11 11:06 Ritesh Harjani (IBM)
  2025-01-13  4:47 ` Sourabh Jain
  2025-01-20 21:39 ` Luiz Capitulino
  0 siblings, 2 replies; 3+ messages in thread
From: Ritesh Harjani (IBM) @ 2025-01-11 11:06 UTC (permalink / raw)
  To: linux-mm
  Cc: Ritesh Harjani (IBM),
	Donet Tom, Gang Li, Daniel Jordan, Muchun Song, David Rientjes,
	Sourabh Jain, Pavithra Prakash

The gather_bootmem_prealloc() function assumes the start nid is 0 and the
size is num_node_state(N_MEMORY). That means that if the memory-attached
NUMA nodes are interleaved, gather_bootmem_prealloc_parallel() will fail
to scan some of these nodes.

Since memory-attached NUMA nodes can be interleaved in any fashion,
ensure that the code checks all NUMA node IDs (.size = nr_node_ids).
Still keep max_threads as num_node_state(N_MEMORY), so that all
nr_node_ids are distributed among that many threads.

e.g. qemu cmdline
========================
numa_cmd="-numa node,nodeid=1,memdev=mem1,cpus=2-3 -numa node,nodeid=0,cpus=0-1 -numa dist,src=0,dst=1,val=20"
mem_cmd="-object memory-backend-ram,id=mem1,size=16G"

w/o this patch for cmdline (default_hugepagesz=1GB hugepagesz=1GB hugepages=2):
==========================
~ # cat /proc/meminfo  |grep -i huge
AnonHugePages:         0 kB
ShmemHugePages:        0 kB
FileHugePages:         0 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:    1048576 kB
Hugetlb:               0 kB

with this patch for cmdline (default_hugepagesz=1GB hugepagesz=1GB hugepages=2):
===========================
~ # cat /proc/meminfo |grep -i huge
AnonHugePages:         0 kB
ShmemHugePages:        0 kB
FileHugePages:         0 kB
HugePages_Total:       2
HugePages_Free:        2
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:    1048576 kB
Hugetlb:         2097152 kB

Fixes: b78b27d02930 ("hugetlb: parallelize 1G hugetlb initialization")
Cc: Donet Tom <donettom@linux.ibm.com>
Cc: Gang Li <gang.li@linux.dev>
Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: David Rientjes <rientjes@google.com>
Cc: Sourabh Jain <sourabhjain@linux.ibm.com>
Cc: linux-mm@kvack.org
Suggested-by: Muchun Song <muchun.song@linux.dev>
Reported-by: Pavithra Prakash <pavrampu@linux.ibm.com>
Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
---
v1 -> v2:
1. Made .size = nr_node_ids instead of only online nodes as suggested by Muchun.

[v1]: https://lore.kernel.org/linux-mm/7e0ca1e8acd7dd5c1fe7cbb252de4eb55a8e851b.1727984881.git.ritesh.list@gmail.com

 mm/hugetlb.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index c498874a7170..4e2a1e907ec5 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -3289,7 +3289,7 @@ static void __init gather_bootmem_prealloc(void)
 		.thread_fn	= gather_bootmem_prealloc_parallel,
 		.fn_arg		= NULL,
 		.start		= 0,
-		.size		= num_node_state(N_MEMORY),
+		.size		= nr_node_ids,
 		.align		= 1,
 		.min_chunk	= 1,
 		.max_threads	= num_node_state(N_MEMORY),
--
2.39.5



^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH v2] mm/hugetlb: Fix hugepage allocation for interleaved memory nodes
  2025-01-11 11:06 [PATCH v2] mm/hugetlb: Fix hugepage allocation for interleaved memory nodes Ritesh Harjani (IBM)
@ 2025-01-13  4:47 ` Sourabh Jain
  2025-01-20 21:39 ` Luiz Capitulino
  1 sibling, 0 replies; 3+ messages in thread
From: Sourabh Jain @ 2025-01-13  4:47 UTC (permalink / raw)
  To: Ritesh Harjani (IBM), linux-mm
  Cc: Donet Tom, Gang Li, Daniel Jordan, Muchun Song, David Rientjes,
	Pavithra Prakash

Hello Ritesh,


On 11/01/25 16:36, Ritesh Harjani (IBM) wrote:
> The gather_bootmem_prealloc() function assumes the start nid is 0 and the
> size is num_node_state(N_MEMORY). That means that if the memory-attached
> NUMA nodes are interleaved, gather_bootmem_prealloc_parallel() will fail
> to scan some of these nodes.

Yes, I observed this issue on a PowerVM platform where:

[root ~]# dmesg | grep -i numa
[    0.000000] numa: Partition configured for 32 NUMA nodes.
[    0.631132] numa: Node 1 CPUs: 0-15

With the above NUMA configuration, passing `hugepagesz=1G hugepages=2`
left the Hugetlb stats in /proc/meminfo at 0.

#cat /proc/meminfo
snip...
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
Hugetlb:               0 kB

Whereas with this fix included, passing `hugepagesz=1G hugepages=2` on
the same machine shows the correct Hugetlb stats in /proc/meminfo:

# cat /proc/meminfo
snip...
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
Hugetlb:         2097152 kB


Thanks for the fix:

Tested-by: Sourabh Jain <sourabhjain@linux.ibm.com>

- Sourabh Jain




* Re: [PATCH v2] mm/hugetlb: Fix hugepage allocation for interleaved memory nodes
  2025-01-11 11:06 [PATCH v2] mm/hugetlb: Fix hugepage allocation for interleaved memory nodes Ritesh Harjani (IBM)
  2025-01-13  4:47 ` Sourabh Jain
@ 2025-01-20 21:39 ` Luiz Capitulino
  1 sibling, 0 replies; 3+ messages in thread
From: Luiz Capitulino @ 2025-01-20 21:39 UTC (permalink / raw)
  To: Ritesh Harjani (IBM), linux-mm
  Cc: Donet Tom, Gang Li, Daniel Jordan, Muchun Song, David Rientjes,
	Sourabh Jain, Pavithra Prakash

On 2025-01-11 06:06, Ritesh Harjani (IBM) wrote:
> The gather_bootmem_prealloc() function assumes the start nid is 0 and the
> size is num_node_state(N_MEMORY). That means that if the memory-attached
> NUMA nodes are interleaved, gather_bootmem_prealloc_parallel() will fail
> to scan some of these nodes.
> 
> Since memory-attached NUMA nodes can be interleaved in any fashion,
> ensure that the code checks all NUMA node IDs (.size = nr_node_ids).
> Still keep max_threads as num_node_state(N_MEMORY), so that all
> nr_node_ids are distributed among that many threads.

With this change we guarantee that we check huge_boot_pages[] for all
available nodes (vs. only num_node_state(N_MEMORY) nodes), which looks
correct to me, so:

   Reviewed-by: Luiz Capitulino <luizcap@redhat.com>

Now, although not really related to this patch, there's one detail: in
gather_bootmem_prealloc_node() we call prep_and_add_bootmem_folios()
even when folio_list is empty. This may cause a few calls to
flush_tlb_all() down the code path when CONFIG_HUGETLB_PAGE_OPTIMIZE_VMEMMAP=y
even when huge_boot_pages[] is empty...




