linux-mm.kvack.org archive mirror
* [PATCH] mm: hugetlb: avoid fallback for specific node allocation of 1G pages
@ 2025-02-11  3:48 Luiz Capitulino
  2025-02-11  9:06 ` Oscar Salvador
  2025-02-11 13:49 ` David Hildenbrand
  0 siblings, 2 replies; 5+ messages in thread
From: Luiz Capitulino @ 2025-02-11  3:48 UTC (permalink / raw)
  To: linux-kernel, yaozhenguo1, muchun.song
  Cc: linux-mm, akpm, david, rppt, luizcap

When using the HugeTLB kernel command-line to allocate 1G pages from
a specific node, such as:

   default_hugepagesz=1G hugepages=1:1

If node 1 happens to not have enough memory for the requested number of
1G pages, the allocation falls back to other nodes. A quick way to
reproduce this is to create a KVM guest with a memory-less node and
try to allocate one 1G page from it. Instead of failing, the allocation
will fall back to other nodes.

This defeats the purpose of node-specific allocation. Also, node-specific
allocation of 2M pages doesn't have this behavior: the allocation
simply fails for the pages it can't satisfy.

This issue happens because HugeTLB calls memblock_alloc_try_nid_raw()
for the 1G boot-time allocation, and this function falls back to other
nodes if the allocation can't be satisfied. Use
memblock_alloc_exact_nid_raw() instead, which ensures that the
allocation is only satisfied from the specified node.

Fixes: b5389086ad7b ("hugetlbfs: extend the definition of hugepages parameter to support node allocation")
Signed-off-by: Luiz Capitulino <luizcap@redhat.com>
---
 mm/hugetlb.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 65068671e460..163190e89ea1 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -3145,7 +3145,7 @@ int __alloc_bootmem_huge_page(struct hstate *h, int nid)
 
 	/* do node specific alloc */
 	if (nid != NUMA_NO_NODE) {
-		m = memblock_alloc_try_nid_raw(huge_page_size(h), huge_page_size(h),
+		m = memblock_alloc_exact_nid_raw(huge_page_size(h), huge_page_size(h),
 				0, MEMBLOCK_ALLOC_ACCESSIBLE, nid);
 		if (!m)
 			return 0;
-- 
2.48.1



^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] mm: hugetlb: avoid fallback for specific node allocation of 1G pages
  2025-02-11  3:48 [PATCH] mm: hugetlb: avoid fallback for specific node allocation of 1G pages Luiz Capitulino
@ 2025-02-11  9:06 ` Oscar Salvador
  2025-02-11 14:51   ` Luiz Capitulino
  2025-02-11 13:49 ` David Hildenbrand
  1 sibling, 1 reply; 5+ messages in thread
From: Oscar Salvador @ 2025-02-11  9:06 UTC (permalink / raw)
  To: Luiz Capitulino
  Cc: linux-kernel, yaozhenguo1, muchun.song, linux-mm, akpm, david,
	rppt, fvdl

On Mon, Feb 10, 2025 at 10:48:56PM -0500, Luiz Capitulino wrote:
> When using the HugeTLB kernel command-line to allocate 1G pages from
> a specific node, such as:
> 
>    default_hugepagesz=1G hugepages=1:1
> 
> If node 1 happens to not have enough memory for the requested number of
> 1G pages, the allocation falls back to other nodes. A quick way to
> reproduce this is by creating a KVM guest with a memory-less node and
> trying to allocate 1 1G page from it. Instead of failing, the allocation
> will fallback to other nodes.
> 
> This defeats the purpose of node specific allocation. Also, specific
> node allocation for 2M pages don't have this behavior: the allocation
> will just fail for the pages it can't satisfy.
> 
> This issue happens because HugeTLB calls memblock_alloc_try_nid_raw()
> for 1G boot-time allocation as this function falls back to other nodes
> if the allocation can't be satisfied. Use memblock_alloc_exact_nid_raw()
> instead, which ensures that the allocation will only be satisfied from
> the specified node.
> 
> Fixes: b5389086ad7b ("hugetlbfs: extend the definition of hugepages parameter to support node allocation")
> 
> Signed-off-by: Luiz Capitulino <luizcap@redhat.com>

Acked-by: Oscar Salvador <osalvador@suse.de>

This was discussed yesterday in [1], ccing Frank for awareness.

[1] https://patchwork.kernel.org/project/linux-mm/patch/20250206185109.1210657-6-fvdl@google.com/


-- 
Oscar Salvador
SUSE Labs



* Re: [PATCH] mm: hugetlb: avoid fallback for specific node allocation of 1G pages
  2025-02-11  3:48 [PATCH] mm: hugetlb: avoid fallback for specific node allocation of 1G pages Luiz Capitulino
  2025-02-11  9:06 ` Oscar Salvador
@ 2025-02-11 13:49 ` David Hildenbrand
  1 sibling, 0 replies; 5+ messages in thread
From: David Hildenbrand @ 2025-02-11 13:49 UTC (permalink / raw)
  To: Luiz Capitulino, linux-kernel, yaozhenguo1, muchun.song
  Cc: linux-mm, akpm, rppt

On 11.02.25 04:48, Luiz Capitulino wrote:
> When using the HugeTLB kernel command-line to allocate 1G pages from
> a specific node, such as:
> 
>     default_hugepagesz=1G hugepages=1:1
> 
> If node 1 happens to not have enough memory for the requested number of
> 1G pages, the allocation falls back to other nodes. A quick way to
> reproduce this is by creating a KVM guest with a memory-less node and
> trying to allocate 1 1G page from it. Instead of failing, the allocation
> will fallback to other nodes.
> 
> This defeats the purpose of node specific allocation. Also, specific
> node allocation for 2M pages don't have this behavior: the allocation
> will just fail for the pages it can't satisfy.
> 
> This issue happens because HugeTLB calls memblock_alloc_try_nid_raw()
> for 1G boot-time allocation as this function falls back to other nodes
> if the allocation can't be satisfied. Use memblock_alloc_exact_nid_raw()
> instead, which ensures that the allocation will only be satisfied from
> the specified node.
> 
> Fixes: b5389086ad7b ("hugetlbfs: extend the definition of hugepages parameter to support node allocation")
> 
> Signed-off-by: Luiz Capitulino <luizcap@redhat.com>
> ---
>   mm/hugetlb.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 65068671e460..163190e89ea1 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -3145,7 +3145,7 @@ int __alloc_bootmem_huge_page(struct hstate *h, int nid)
>   
>   	/* do node specific alloc */
>   	if (nid != NUMA_NO_NODE) {
> -		m = memblock_alloc_try_nid_raw(huge_page_size(h), huge_page_size(h),
> +		m = memblock_alloc_exact_nid_raw(huge_page_size(h), huge_page_size(h),
>   				0, MEMBLOCK_ALLOC_ACCESSIBLE, nid);
>   		if (!m)
>   			return 0;

Yeah, documentation says "The node format specifies the number of huge 
pages to allocate on specific nodes."

Likely the patch simply copied the memblock_alloc_try_nid_raw() call; 
memblock_alloc_exact_nid_raw() seems to be the right thing to do.

Acked-by: David Hildenbrand <david@redhat.com>

-- 
Cheers,

David / dhildenb




* Re: [PATCH] mm: hugetlb: avoid fallback for specific node allocation of 1G pages
  2025-02-11  9:06 ` Oscar Salvador
@ 2025-02-11 14:51   ` Luiz Capitulino
  2025-02-11 16:49     ` Frank van der Linden
  0 siblings, 1 reply; 5+ messages in thread
From: Luiz Capitulino @ 2025-02-11 14:51 UTC (permalink / raw)
  To: Oscar Salvador
  Cc: linux-kernel, yaozhenguo1, muchun.song, linux-mm, akpm, david,
	rppt, fvdl

On 2025-02-11 04:06, Oscar Salvador wrote:
> On Mon, Feb 10, 2025 at 10:48:56PM -0500, Luiz Capitulino wrote:
>> [...]
> 
> Acked-by: Oscar Salvador <osalvador@suse.de>
> 
> This was discussed yesterday in [1], ccing Frank for awareness.
> 
> [1] https://patchwork.kernel.org/project/linux-mm/patch/20250206185109.1210657-6-fvdl@google.com/

Interesting, thanks for the reference.

I stumbled over this issue back in December when debugging a HugeTLB issue
at Red Hat (David knows it ;) ) and had this patch pending for more than a
week now...




* Re: [PATCH] mm: hugetlb: avoid fallback for specific node allocation of 1G pages
  2025-02-11 14:51   ` Luiz Capitulino
@ 2025-02-11 16:49     ` Frank van der Linden
  0 siblings, 0 replies; 5+ messages in thread
From: Frank van der Linden @ 2025-02-11 16:49 UTC (permalink / raw)
  To: Luiz Capitulino
  Cc: Oscar Salvador, linux-kernel, yaozhenguo1, muchun.song, linux-mm,
	akpm, david, rppt

On Tue, Feb 11, 2025 at 6:51 AM Luiz Capitulino <luizcap@redhat.com> wrote:
>
> On 2025-02-11 04:06, Oscar Salvador wrote:
> > On Mon, Feb 10, 2025 at 10:48:56PM -0500, Luiz Capitulino wrote:
> >> [...]
> >
> > Acked-by: Oscar Salvador <osalvador@suse.de>
> >
> > This was discussed yesterday in [1], ccing Frank for awareness.
> >
> > [1] https://patchwork.kernel.org/project/linux-mm/patch/20250206185109.1210657-6-fvdl@google.com/
>
> Interesting, thanks for the reference.
>
> I stumbled over this issue back in December when debugging a HugeTLB issue
> at Red Hat (David knows it ;) ) and had this patch pending for more than a
> week now...
>

Looks good, I'll drop the same change from my upcoming v4 series. This
will create a contextual dependency, but that's OK; this one will go
in first in any case.

Reviewed-by: Frank van der Linden <fvdl@google.com>


