linux-mm.kvack.org archive mirror
* [PATCH v2] mm/hugetlb: fix surplus pages in dissolve_free_huge_page()
@ 2025-03-04 13:21 Jinjiang Tu
  2025-03-04 13:43 ` Oscar Salvador
  0 siblings, 1 reply; 3+ messages in thread
From: Jinjiang Tu @ 2025-03-04 13:21 UTC (permalink / raw)
  To: akpm, muchun.song, david, osalvador
  Cc: linux-mm, wangkefeng.wang, sunnanyong, tujinjiang

In dissolve_free_huge_page(), free huge pages are dissolved without
adjusting the surplus count. However, free huge pages may be accounted as
surplus pages, so dissolving them leads to a wrong surplus count.

I reproduced this issue on qemu. The steps are:
1) Node1 is memory-less at first. Hot-add memory to node1 by executing
the following two commands in the qemu monitor:
  object_add memory-backend-ram,id=mem1,size=1G
  device_add pc-dimm,id=dimm1,memdev=mem1,node=1
2) online one memory block of Node1 with:
  echo online_movable > /sys/devices/system/node/node1/memoryX/state
3) create 64 huge pages for node1
4) run a program to reserve (but not consume) all the huge pages; a
minimal sketch of such a program is shown after this list
5) echo 0 > nr_huge_pages for node1. After this step, free huge pages in
Node1 are surplus.
6) create 80 huge pages for node0
7) offline the memory of node1. The memory range to offline contains the
free surplus huge pages created in steps 3) to 5):
  echo offline > /sys/devices/system/node/node1/memoryX/state
8) kill the program in step 4)
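
For reference, a minimal sketch of the reservation program from step 4).
This is an assumption about how the pages can be held reserved, not the
exact program used; it assumes the default 2MB hugepage size and maps all
64 huge pages without ever touching them:

  #define _GNU_SOURCE        /* for MAP_ANONYMOUS / MAP_HUGETLB */
  #include <stdio.h>
  #include <sys/mman.h>
  #include <unistd.h>

  int main(void)
  {
          size_t huge_size = 2UL << 20;   /* assumed default 2MB hugepage size */
          size_t len = 64 * huge_size;    /* all 64 huge pages from step 3) */
          void *p;

          /* MAP_HUGETLB reserves the huge pages at mmap() time ... */
          p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
          if (p == MAP_FAILED) {
                  perror("mmap");
                  return 1;
          }

          /*
           * ... but the mapping is never touched, so the pages stay on the
           * free list while being reserved.  "echo 0 > nr_huge_pages" in
           * step 5) therefore cannot free them and accounts them as surplus.
           */
          pause();        /* hold the reservation until killed in step 8) */
          return 0;
  }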

The result:
           Node0     Node1
total       80        0
free        80        0
surplus     0         61

To fix it, adjust the surplus count in dissolve_free_hugetlb_folio() when
destroying a huge page if the node has surplus pages.

The result with this patch:
           Node0     Node1
total       80        0
free        80        0
surplus     0         0

Fixes: c8721bbbdd36 ("mm: memory-hotplug: enable memory hotplug to handle hugepage")
Acked-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Jinjiang Tu <tujinjiang@huawei.com>
---
Changelog since v1:
 improved the commit message, as suggested by David Hildenbrand

 mm/hugetlb.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 163190e89ea1..2a24ade9d157 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -2135,6 +2135,8 @@ int dissolve_free_hugetlb_folio(struct folio *folio)
 
 	if (!folio_ref_count(folio)) {
 		struct hstate *h = folio_hstate(folio);
+		bool adjust_surplus = false;
+
 		if (!available_huge_pages(h))
 			goto out;
 
@@ -2157,7 +2159,9 @@ int dissolve_free_hugetlb_folio(struct folio *folio)
 			goto retry;
 		}
 
-		remove_hugetlb_folio(h, folio, false);
+		if (h->surplus_huge_pages_node[folio_nid(folio)])
+			adjust_surplus = true;
+		remove_hugetlb_folio(h, folio, adjust_surplus);
 		h->max_huge_pages--;
 		spin_unlock_irq(&hugetlb_lock);
 
@@ -2177,7 +2181,7 @@ int dissolve_free_hugetlb_folio(struct folio *folio)
 			rc = hugetlb_vmemmap_restore_folio(h, folio);
 			if (rc) {
 				spin_lock_irq(&hugetlb_lock);
-				add_hugetlb_folio(h, folio, false);
+				add_hugetlb_folio(h, folio, adjust_surplus);
 				h->max_huge_pages++;
 				goto out;
 			}
-- 
2.43.0




* Re: [PATCH v2] mm/hugetlb: fix surplus pages in dissolve_free_huge_page()
  2025-03-04 13:21 [PATCH v2] mm/hugetlb: fix surplus pages in dissolve_free_huge_page() Jinjiang Tu
@ 2025-03-04 13:43 ` Oscar Salvador
  2025-03-05  3:46   ` Jinjiang Tu
  0 siblings, 1 reply; 3+ messages in thread
From: Oscar Salvador @ 2025-03-04 13:43 UTC (permalink / raw)
  To: Jinjiang Tu
  Cc: akpm, muchun.song, david, linux-mm, wangkefeng.wang, sunnanyong

On Tue, Mar 04, 2025 at 09:21:06PM +0800, Jinjiang Tu wrote:
> In dissolve_free_huge_page(), free huge pages are dissolved without
> adjusting surplus count. However, free huge pages may be accounted as
> surplus pages, and will lead to wrong surplus count.
> 
> I reproduce this issue on qemu. The steps are:
> 1) Node1 is memory-less at first. Hot-add memory to node1 by executing
> the two commands in qemu monitor:
>   object_add memory-backend-ram,id=mem1,size=1G
>   device_add pc-dimm,id=dimm1,memdev=mem1,node=1
> 2) online one memory block of Node1 with:
>   echo online_movable > /sys/devices/system/node/node1/memoryX/state
> 3) create 64 huge pages for node1
> 4) run a program to reserve (don't consume) all the huge pages
> 5) echo 0 > nr_huge_pages for node1. After this step, free huge pages in
> Node1 are surplus.
> 6) create 80 huge pages for node0
> 7) offline memory of node1, The memory range to offline contains the free
> surplus huge pages created in step3) ~ step5)
>   echo offline > /sys/devices/system/node/node1/memoryX/state
> 8) kill the program in step 4)
> 
> The result:
>            Node0     Node1
> total       80        0
> free        80        0
> surplus     0         61
> 
> To fix it, adjust surplus when destroying huge pages if the node has
> surplus pages in dissolve_free_hugetlb_folio().
> 
> The result with this patch:
>            Node0     Node1
> total       80        0
> free        80        0
> surplus     0         0
> 
> Fixes: c8721bbbdd36 ("mm: memory-hotplug: enable memory hotplug to handle hugepage")
> Acked-by: David Hildenbrand <david@redhat.com>
> Signed-off-by: Jinjiang Tu <tujinjiang@huawei.com>

Acked-by: Oscar Salvador <osalvador@suse.de>

> @@ -2157,7 +2159,9 @@ int dissolve_free_hugetlb_folio(struct folio *folio)
>  			goto retry;
>  		}
>  
> -		remove_hugetlb_folio(h, folio, false);
> +		if (h->surplus_huge_pages_node[folio_nid(folio)])
> +			adjust_surplus = true;
> +		remove_hugetlb_folio(h, folio, adjust_surplus);
>  		h->max_huge_pages--;
>  		spin_unlock_irq(&hugetlb_lock);
>  
> @@ -2177,7 +2181,7 @@ int dissolve_free_hugetlb_folio(struct folio *folio)
>  			rc = hugetlb_vmemmap_restore_folio(h, folio);
>  			if (rc) {
>  				spin_lock_irq(&hugetlb_lock);
> -				add_hugetlb_folio(h, folio, false);
> +				add_hugetlb_folio(h, folio, adjust_surplus);

I was about to point this out, but checking v1 I saw that David had
already done so.
My alternative would have been to just get rid of the adjust_surplus
boolean and do the check right within the lock cycle, e.g.:

 if (h->surplus_huge_pages_node[folio_nid(folio)])
        add_hugetlb_folio(h, folio, true);
 else
        add_hugetlb_folio(h, folio, false);

But I guess that's fine as you already explained.

 

-- 
Oscar Salvador
SUSE Labs



* Re: [PATCH v2] mm/hugetlb: fix surplus pages in dissolve_free_huge_page()
  2025-03-04 13:43 ` Oscar Salvador
@ 2025-03-05  3:46   ` Jinjiang Tu
  0 siblings, 0 replies; 3+ messages in thread
From: Jinjiang Tu @ 2025-03-05  3:46 UTC (permalink / raw)
  To: Oscar Salvador
  Cc: akpm, muchun.song, david, linux-mm, wangkefeng.wang, sunnanyong


On 2025/3/4 21:43, Oscar Salvador wrote:
> On Tue, Mar 04, 2025 at 09:21:06PM +0800, Jinjiang Tu wrote:
>> In dissolve_free_huge_page(), free huge pages are dissolved without
>> adjusting surplus count. However, free huge pages may be accounted as
>> surplus pages, and will lead to wrong surplus count.
>>
>> I reproduce this issue on qemu. The steps are:
>> 1) Node1 is memory-less at first. Hot-add memory to node1 by executing
>> the two commands in qemu monitor:
>>    object_add memory-backend-ram,id=mem1,size=1G
>>    device_add pc-dimm,id=dimm1,memdev=mem1,node=1
>> 2) online one memory block of Node1 with:
>>    echo online_movable > /sys/devices/system/node/node1/memoryX/state
>> 3) create 64 huge pages for node1
>> 4) run a program to reserve (don't consume) all the huge pages
>> 5) echo 0 > nr_huge_pages for node1. After this step, free huge pages in
>> Node1 are surplus.
>> 6) create 80 huge pages for node0
>> 7) offline memory of node1, The memory range to offline contains the free
>> surplus huge pages created in step3) ~ step5)
>>    echo offline > /sys/devices/system/node/node1/memoryX/state
>> 8) kill the program in step 4)
>>
>> The result:
>>             Node0     Node1
>> total       80        0
>> free        80        0
>> surplus     0         61
>>
>> To fix it, adjust surplus when destroying huge pages if the node has
>> surplus pages in dissolve_free_hugetlb_folio().
>>
>> The result with this patch:
>>             Node0     Node1
>> total       80        0
>> free        80        0
>> surplus     0         0
>>
>> Fixes: c8721bbbdd36 ("mm: memory-hotplug: enable memory hotplug to handle hugepage")
>> Acked-by: David Hildenbrand <david@redhat.com>
>> Signed-off-by: Jinjiang Tu <tujinjiang@huawei.com>
> Acked-by: Oscar Salvador <osalvador@suse.de>
>
>> @@ -2157,7 +2159,9 @@ int dissolve_free_hugetlb_folio(struct folio *folio)
>>   			goto retry;
>>   		}
>>   
>> -		remove_hugetlb_folio(h, folio, false);
>> +		if (h->surplus_huge_pages_node[folio_nid(folio)])
>> +			adjust_surplus = true;
>> +		remove_hugetlb_folio(h, folio, adjust_surplus);
>>   		h->max_huge_pages--;
>>   		spin_unlock_irq(&hugetlb_lock);
>>   
>> @@ -2177,7 +2181,7 @@ int dissolve_free_hugetlb_folio(struct folio *folio)
>>   			rc = hugetlb_vmemmap_restore_folio(h, folio);
>>   			if (rc) {
>>   				spin_lock_irq(&hugetlb_lock);
>> -				add_hugetlb_folio(h, folio, false);
>> +				add_hugetlb_folio(h, folio, adjust_surplus);
> I was about to point this out, but checking v1 I saw that David had
> already done so.
> My alternative would have been to just get rid of the adjust_surplus
> boolean and do the check right within the lock cycle, e.g.:
>
>   if (h->surplus_huge_pages_node[folio_nid(folio)])
>          add_hugetlb_folio(h, folio, true);
>   else
>          add_hugetlb_folio(h, folio, false);

It seems wrong to fix it like this. h->surplus_huge_pages_node[folio_nid(folio)] != 0
at that point means there are still surplus pages on the node after removing the
folio (the variable named folio); it doesn't say whether the folio itself needs
to be treated as surplus.
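
In other words, the intended pairing is (these are the relevant lines from
the patch above, elided with "...", with my reasoning added as comments):

  spin_lock_irq(&hugetlb_lock);
  /* decide, before removal, whether the node has surplus pages and hence
   * whether this removal should be accounted against the surplus count */
  adjust_surplus = h->surplus_huge_pages_node[folio_nid(folio)] != 0;
  remove_hugetlb_folio(h, folio, adjust_surplus);
  ...
  /* error path: re-add with the same flag, so exactly the accounting that
   * was undone on removal is restored; re-reading the counter here would
   * reflect the state *after* the removal instead */
  add_hugetlb_folio(h, folio, adjust_surplus);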

> But I guess that's fine as you already explained.
>
>   
>


