linux-mm.kvack.org archive mirror
* [RFC PATCH] docs: hugetlbpage.rst: add free surplus huge pages description
@ 2025-04-19  7:32 Jinjiang Tu
  2025-04-19 17:20 ` Randy Dunlap
  2025-04-22 13:02 ` Jinjiang Tu
  0 siblings, 2 replies; 4+ messages in thread
From: Jinjiang Tu @ 2025-04-19  7:32 UTC
  To: osalvador, muchun.song, akpm, david, corbet
  Cc: linux-mm, linux-doc, wangkefeng.wang, tujinjiang

When echo 0 > /proc/sys/vm/nr_hugepages runs concurrently with in-use huge
pages being freed back to the huge page pool, some free huge pages may fail
to be destroyed and are accounted as surplus instead. The counters then
look like this:

  HugePages_Total: 1024
  HugePages_Free: 1024
  HugePages_Surp: 1024

When set_max_huge_pages() decreases the pool size, it first returns free
pages to the buddy allocator and then accounts the remaining pages as
surplus. Between the two steps, the hugetlb_lock is released to free memory
and then reacquired. If another process frees huge pages to the pool
between the two steps, these free huge pages will be accounted as surplus.
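
The race described above can be sketched with a toy model. This is not
kernel code: the HugePool class and its method names are hypothetical,
per-node accounting and reservations are ignored, and the model assumes
that a page freed while the surplus count is non-zero is destroyed rather
than returned to the pool.

```python
from dataclasses import dataclass


@dataclass
class HugePool:
    """Toy counters loosely mirroring HugePages_Total/Free/Surp."""
    total: int = 0
    free: int = 0
    surplus: int = 0

    def persistent(self) -> int:
        return self.total - self.surplus

    def shrink_step1(self, count: int) -> None:
        """Step 1: give free pages back to the buddy allocator."""
        in_use = self.total - self.free
        target = max(count, in_use)  # in-use pages cannot be destroyed here
        while self.persistent() > target and self.free > 0:
            self.total -= 1
            self.free -= 1

    def shrink_step2(self, count: int) -> None:
        """Step 2: account whatever still exceeds `count` as surplus."""
        while self.persistent() > count:
            self.surplus += 1

    def release_in_use(self, n: int) -> None:
        """Another process frees up to n in-use pages."""
        n = min(n, self.total - self.free)
        for _ in range(n):
            if self.surplus > 0:
                # Assumption: a surplus page is destroyed when it is freed.
                self.total -= 1
                self.surplus -= 1
            else:
                self.free += 1


def racy_shrink_to_zero(pool: HugePool, freed_in_window: int) -> HugePool:
    """Model `echo 0 > nr_hugepages` with frees landing between the steps."""
    pool.shrink_step1(0)
    pool.release_in_use(freed_in_window)  # the lock is dropped in this window
    pool.shrink_step2(0)
    return pool
```

Starting from 1024 in-use pages, freeing all of them inside the window
leaves the model with total, free and surplus all equal to 1024, matching
the counters above. If nothing is freed inside the window, the pages are
destroyed as they are released later and the pool drains to zero.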

Besides, free surplus huge pages can also come from a failure to restore
vmemmap.

Once either situation occurs, users cannot shrink the huge page pool
directly via echo 0 > nr_hugepages and should use one of two ways to
destroy these free surplus huge pages:
 1) echo $nr_surplus > nr_hugepages to convert the free surplus huge pages
to persistent free huge pages first, and then echo 0 > nr_hugepages to
destroy these huge pages.
 2) allocate these free surplus huge pages; the kernel will try to destroy
them when they are freed.
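
Way 1) can be sketched as the helper below. The function names
drain_writes and apply_writes are hypothetical, the path of the knob is
supplied by the caller, and writing to the real
/proc/sys/vm/nr_hugepages requires root.

```python
from pathlib import Path
from typing import List


def drain_writes(nr_surplus: int) -> List[int]:
    """Return the values to write to nr_hugepages, in order, for way 1)."""
    if nr_surplus < 0:
        raise ValueError("nr_surplus must be non-negative")
    # First write: convert the surplus pages to persistent pages.
    # Second write: shrink the now-persistent pool to zero.
    return [nr_surplus, 0] if nr_surplus > 0 else [0]


def apply_writes(values: List[int], knob: Path) -> None:
    """Write each value to the given file, one write per value."""
    for value in values:
        knob.write_text(f"{value}\n")
```

For the counters shown earlier, apply_writes(drain_writes(1024),
Path("/proc/sys/vm/nr_hugepages")) would issue the two writes from way 1).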

However, there is no documentation describing this, so users may be
confused and not know how to handle such a case. So update the
documentation.

Signed-off-by: Jinjiang Tu <tujinjiang@huawei.com>
---
 Documentation/admin-guide/mm/hugetlbpage.rst | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/Documentation/admin-guide/mm/hugetlbpage.rst b/Documentation/admin-guide/mm/hugetlbpage.rst
index 67a941903fd2..0456cefae039 100644
--- a/Documentation/admin-guide/mm/hugetlbpage.rst
+++ b/Documentation/admin-guide/mm/hugetlbpage.rst
@@ -239,6 +239,17 @@ this condition holds--that is, until ``nr_hugepages+nr_overcommit_hugepages`` is
 increased sufficiently, or the surplus huge pages go out of use and are freed--
 no more surplus huge pages will be allowed to be allocated.
 
+Caveat: Shrinking the persistent huge page pool via ``nr_hugepages`` may be
+concurrent with freeing in-use huge pages to the huge page pool, leading to some
+huge pages are still in the huge page pool and accounted as surplus. Besides,
+When the feature of freeing unused vmemmap pages associated with each hugetlb page
+is enabled, free huge page may be accounted as surplus too. In such two cases, users
+couldn't directly shrink the huge page pool via echo 0 to ``nr_hugepages``, should
+echo $nr_surplus to ``nr_hugepages`` to convert the surplus free huge pages to
+persistent free huge pages first, and then echo 0 to ``nr_hugepages`` to destroy
+these huge pages. Another way to destroy is allocating these free surplus huge
+pages and these huge pages will be tried to destroy when they are freed.
+
 With support for multiple huge page pools at run-time available, much of
 the huge page userspace interface in ``/proc/sys/vm`` has been duplicated in
 sysfs.
-- 
2.43.0




* Re: [RFC PATCH] docs: hugetlbpage.rst: add free surplus huge pages description
  2025-04-19  7:32 [RFC PATCH] docs: hugetlbpage.rst: add free surplus huge pages description Jinjiang Tu
@ 2025-04-19 17:20 ` Randy Dunlap
  2025-04-21  1:56   ` Jinjiang Tu
  2025-04-22 13:02 ` Jinjiang Tu
  1 sibling, 1 reply; 4+ messages in thread
From: Randy Dunlap @ 2025-04-19 17:20 UTC
  To: Jinjiang Tu, osalvador, muchun.song, akpm, david, corbet
  Cc: linux-mm, linux-doc, wangkefeng.wang



On 4/19/25 12:32 AM, Jinjiang Tu wrote:
> When echo 0 > /proc/sys/vm/nr_hugepages runs concurrently with in-use huge
> pages being freed back to the huge page pool, some free huge pages may fail
> to be destroyed and are accounted as surplus instead. The counters then
> look like this:
> 
>   HugePages_Total: 1024
>   HugePages_Free: 1024
>   HugePages_Surp: 1024
> 
> When set_max_huge_pages() decreases the pool size, it first returns free
> pages to the buddy allocator and then accounts the remaining pages as
> surplus. Between the two steps, the hugetlb_lock is released to free memory
> and then reacquired. If another process frees huge pages to the pool
> between the two steps, these free huge pages will be accounted as surplus.
> 
> Besides, free surplus huge pages can also come from a failure to restore
> vmemmap.
> 
> Once either situation occurs, users cannot shrink the huge page pool
> directly via echo 0 > nr_hugepages and should use one of two ways to
> destroy these free surplus huge pages:
>  1) echo $nr_surplus > nr_hugepages to convert the free surplus huge pages
> to persistent free huge pages first, and then echo 0 > nr_hugepages to
> destroy these huge pages.
>  2) allocate these free surplus huge pages; the kernel will try to destroy
> them when they are freed.
> 
> However, there is no documentation describing this, so users may be
> confused and not know how to handle such a case. So update the
> documentation.
> 
> Signed-off-by: Jinjiang Tu <tujinjiang@huawei.com>
> ---
>  Documentation/admin-guide/mm/hugetlbpage.rst | 11 +++++++++++
>  1 file changed, 11 insertions(+)
> 
> diff --git a/Documentation/admin-guide/mm/hugetlbpage.rst b/Documentation/admin-guide/mm/hugetlbpage.rst
> index 67a941903fd2..0456cefae039 100644
> --- a/Documentation/admin-guide/mm/hugetlbpage.rst
> +++ b/Documentation/admin-guide/mm/hugetlbpage.rst
> @@ -239,6 +239,17 @@ this condition holds--that is, until ``nr_hugepages+nr_overcommit_hugepages`` is
>  increased sufficiently, or the surplus huge pages go out of use and are freed--
>  no more surplus huge pages will be allowed to be allocated.
>  
> +Caveat: Shrinking the persistent huge page pool via ``nr_hugepages`` may be
> +concurrent with freeing in-use huge pages to the huge page pool, leading to some
> +huge pages are still in the huge page pool and accounted as surplus. Besides,
> +When the feature of freeing unused vmemmap pages associated with each hugetlb page

   when

> +is enabled, free huge page may be accounted as surplus too. In such two cases, users
> +couldn't directly shrink the huge page pool via echo 0 to ``nr_hugepages``, should

                                                                               but should


Also, please limit each line to <80 characters.

> +echo $nr_surplus to ``nr_hugepages`` to convert the surplus free huge pages to
> +persistent free huge pages first, and then echo 0 to ``nr_hugepages`` to destroy
> +these huge pages. Another way to destroy is allocating these free surplus huge
> +pages and these huge pages will be tried to destroy when they are freed.
> +

But I don't see why this is a user problem to be solved by users...

>  With support for multiple huge page pools at run-time available, much of
>  the huge page userspace interface in ``/proc/sys/vm`` has been duplicated in
>  sysfs.

-- 
~Randy




* Re: [RFC PATCH] docs: hugetlbpage.rst: add free surplus huge pages description
  2025-04-19 17:20 ` Randy Dunlap
@ 2025-04-21  1:56   ` Jinjiang Tu
  0 siblings, 0 replies; 4+ messages in thread
From: Jinjiang Tu @ 2025-04-21  1:56 UTC
  To: Randy Dunlap, osalvador, muchun.song, akpm, david, corbet
  Cc: linux-mm, linux-doc, wangkefeng.wang


On 2025/4/20 1:20, Randy Dunlap wrote:
>
> On 4/19/25 12:32 AM, Jinjiang Tu wrote:
>> When echo 0 > /proc/sys/vm/nr_hugepages runs concurrently with in-use huge
>> pages being freed back to the huge page pool, some free huge pages may fail
>> to be destroyed and are accounted as surplus instead. The counters then
>> look like this:
>>
>>    HugePages_Total: 1024
>>    HugePages_Free: 1024
>>    HugePages_Surp: 1024
>>
>> When set_max_huge_pages() decreases the pool size, it first returns free
>> pages to the buddy allocator and then accounts the remaining pages as
>> surplus. Between the two steps, the hugetlb_lock is released to free memory
>> and then reacquired. If another process frees huge pages to the pool
>> between the two steps, these free huge pages will be accounted as surplus.
>>
>> Besides, free surplus huge pages can also come from a failure to restore
>> vmemmap.
>>
>> Once either situation occurs, users cannot shrink the huge page pool
>> directly via echo 0 > nr_hugepages and should use one of two ways to
>> destroy these free surplus huge pages:
>>   1) echo $nr_surplus > nr_hugepages to convert the free surplus huge pages
>> to persistent free huge pages first, and then echo 0 > nr_hugepages to
>> destroy these huge pages.
>>   2) allocate these free surplus huge pages; the kernel will try to destroy
>> them when they are freed.
>>
>> However, there is no documentation describing this, so users may be
>> confused and not know how to handle such a case. So update the
>> documentation.
>>
>> Signed-off-by: Jinjiang Tu <tujinjiang@huawei.com>
>> ---
>>   Documentation/admin-guide/mm/hugetlbpage.rst | 11 +++++++++++
>>   1 file changed, 11 insertions(+)
>>
>> diff --git a/Documentation/admin-guide/mm/hugetlbpage.rst b/Documentation/admin-guide/mm/hugetlbpage.rst
>> index 67a941903fd2..0456cefae039 100644
>> --- a/Documentation/admin-guide/mm/hugetlbpage.rst
>> +++ b/Documentation/admin-guide/mm/hugetlbpage.rst
>> @@ -239,6 +239,17 @@ this condition holds--that is, until ``nr_hugepages+nr_overcommit_hugepages`` is
>>   increased sufficiently, or the surplus huge pages go out of use and are freed--
>>   no more surplus huge pages will be allowed to be allocated.
>>   
>> +Caveat: Shrinking the persistent huge page pool via ``nr_hugepages`` may be
>> +concurrent with freeing in-use huge pages to the huge page pool, leading to some
>> +huge pages are still in the huge page pool and accounted as surplus. Besides,
>> +When the feature of freeing unused vmemmap pages associated with each hugetlb page
>     when
>
>> +is enabled, free huge page may be accounted as surplus too. In such two cases, users
>> +couldn't directly shrink the huge page pool via echo 0 to ``nr_hugepages``, should
>                                                                                 but should
>
>
> Also, please limit each line to <80 characters.
>
>> +echo $nr_surplus to ``nr_hugepages`` to convert the surplus free huge pages to
>> +persistent free huge pages first, and then echo 0 to ``nr_hugepages`` to destroy
>> +these huge pages. Another way to destroy is allocating these free surplus huge
>> +pages and these huge pages will be tried to destroy when they are freed.
>> +
> But I don't see why this is a user problem to be solved by users...

echo xx > nr_hugepages isn't an atomic operation with respect to huge page
allocation/free, so we can't guarantee that all huge pages will be destroyed
after this operation. Users therefore have to check whether the huge pages
were successfully destroyed.
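
One way to do that check from user space is to read the HugePages_*
counters from /proc/meminfo after writing 0. A minimal sketch, assuming
the requested pool size was 0 and treating any free page not covered by
HugePages_Rsvd as left over; the function names are hypothetical:

```python
import re
from typing import Dict


def hugepage_counters(meminfo_text: str) -> Dict[str, int]:
    """Extract the HugePages_* counters from /proc/meminfo-style text."""
    counters = {}
    for line in meminfo_text.splitlines():
        m = re.match(r"(HugePages_\w+):\s+(\d+)", line)
        if m:
            counters[m.group(1)] = int(m.group(2))
    return counters


def stuck_free_pages(meminfo_text: str) -> int:
    """Free huge pages still in the pool after a shrink to 0.

    Heuristic: free pages kept back for reservations (HugePages_Rsvd)
    are expected, so only the excess is reported.
    """
    c = hugepage_counters(meminfo_text)
    free = c.get("HugePages_Free", 0)
    reserved = c.get("HugePages_Rsvd", 0)
    return max(free - reserved, 0)
```

A non-zero result after echo 0 > nr_hugepages suggests that some free
huge pages were not destroyed and need one of the two ways described in
the patch.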

>
>>   With support for multiple huge page pools at run-time available, much of
>>   the huge page userspace interface in ``/proc/sys/vm`` has been duplicated in
>>   sysfs.



* Re: [RFC PATCH] docs: hugetlbpage.rst: add free surplus huge pages description
  2025-04-19  7:32 [RFC PATCH] docs: hugetlbpage.rst: add free surplus huge pages description Jinjiang Tu
  2025-04-19 17:20 ` Randy Dunlap
@ 2025-04-22 13:02 ` Jinjiang Tu
  1 sibling, 0 replies; 4+ messages in thread
From: Jinjiang Tu @ 2025-04-22 13:02 UTC
  To: osalvador, muchun.song, akpm, david, corbet
  Cc: linux-mm, linux-doc, wangkefeng.wang


On 2025/4/19 15:32, Jinjiang Tu wrote:

Hi

> When echo 0 > /proc/sys/vm/nr_hugepages runs concurrently with in-use huge
> pages being freed back to the huge page pool, some free huge pages may fail
> to be destroyed and are accounted as surplus instead. The counters then
> look like this:
>
>    HugePages_Total: 1024
>    HugePages_Free: 1024
>    HugePages_Surp: 1024
>
> When set_max_huge_pages() decreases the pool size, it first returns free
> pages to the buddy allocator and then accounts the remaining pages as
> surplus. Between the two steps, the hugetlb_lock is released to free memory
> and then reacquired. If another process frees huge pages to the pool
> between the two steps, these free huge pages will be accounted as surplus.

I think this is a constraint of the nr_hugepages interface: the interface
cannot guarantee that all huge pages will be freed. What do you think about
it?

Thanks.

> Besides, free surplus huge pages can also come from a failure to restore
> vmemmap.
>
> Once either situation occurs, users cannot shrink the huge page pool
> directly via echo 0 > nr_hugepages and should use one of two ways to
> destroy these free surplus huge pages:
>   1) echo $nr_surplus > nr_hugepages to convert the free surplus huge pages
> to persistent free huge pages first, and then echo 0 > nr_hugepages to
> destroy these huge pages.
>   2) allocate these free surplus huge pages; the kernel will try to destroy
> them when they are freed.
>
> However, there is no documentation describing this, so users may be
> confused and not know how to handle such a case. So update the
> documentation.
>
> Signed-off-by: Jinjiang Tu <tujinjiang@huawei.com>
> ---
>   Documentation/admin-guide/mm/hugetlbpage.rst | 11 +++++++++++
>   1 file changed, 11 insertions(+)
>
> diff --git a/Documentation/admin-guide/mm/hugetlbpage.rst b/Documentation/admin-guide/mm/hugetlbpage.rst
> index 67a941903fd2..0456cefae039 100644
> --- a/Documentation/admin-guide/mm/hugetlbpage.rst
> +++ b/Documentation/admin-guide/mm/hugetlbpage.rst
> @@ -239,6 +239,17 @@ this condition holds--that is, until ``nr_hugepages+nr_overcommit_hugepages`` is
>   increased sufficiently, or the surplus huge pages go out of use and are freed--
>   no more surplus huge pages will be allowed to be allocated.
>   
> +Caveat: Shrinking the persistent huge page pool via ``nr_hugepages`` may be
> +concurrent with freeing in-use huge pages to the huge page pool, leading to some
> +huge pages are still in the huge page pool and accounted as surplus. Besides,
> +When the feature of freeing unused vmemmap pages associated with each hugetlb page
> +is enabled, free huge page may be accounted as surplus too. In such two cases, users
> +couldn't directly shrink the huge page pool via echo 0 to ``nr_hugepages``, should
> +echo $nr_surplus to ``nr_hugepages`` to convert the surplus free huge pages to
> +persistent free huge pages first, and then echo 0 to ``nr_hugepages`` to destroy
> +these huge pages. Another way to destroy is allocating these free surplus huge
> +pages and these huge pages will be tried to destroy when they are freed.
> +
>   With support for multiple huge page pools at run-time available, much of
>   the huge page userspace interface in ``/proc/sys/vm`` has been duplicated in
>   sysfs.


