From: Jinjiang Tu <tujinjiang@huawei.com>
To: <osalvador@suse.de>, <david@redhat.com>,
<akpm@linux-foundation.org>, <muchun.song@linux.dev>
Cc: <linux-mm@kvack.org>, <wangkefeng.wang@huawei.com>,
<tujinjiang@huawei.com>
Subject: [PATCH v3] mm/hugetlb: fix set_max_huge_pages() when there are surplus pages
Date: Mon, 7 Apr 2025 20:47:06 +0800
Message-ID: <20250407124706.2688092-1-tujinjiang@huawei.com>
In set_max_huge_pages(), min_count should mean the number of acquired
persistent huge pages, but it also includes surplus huge pages. This leads
to failing to free the free huge pages of a node.
Steps to reproduce:
1) create 5 hugetlb folios in Node0
2) run a program to use all the hugetlb folios
3) echo 0 > nr_hugepages for Node0 to free the hugetlb folios. Thus the 5
hugetlb folios in Node0 are accounted as surplus.
4) create 5 hugetlb folios in Node1
5) echo 0 > nr_hugepages for Node1 to free the hugetlb folios
The result:
Node0 Node1
Total 5 5
Free 0 5
Surp 5 5
We cannot simply subtract surplus_huge_pages from min_count, since free
hugetlb folios may be surplus due to HVO. In __update_and_free_hugetlb_folio(),
hugetlb_vmemmap_restore_folio() may fail; in that case the folio is added
back to the pool and treated as surplus. If we directly subtracted
surplus_huge_pages from min_count, some free folios would be subtracted
twice.
To fix it, check if count is less than the number of free huge pages that
could be destroyed (i.e., available_huge_pages(h)), and remove hugetlb
folios if so.
Since there may exist free surplus hugetlb folios, we should remove
surplus folios first to keep the surplus count correct.
The result with this patch:
Node0 Node1
Total 5 0
Free 0 0
Surp 5 0
Fixes: 9a30523066cd ("hugetlb: add per node hstate attributes")
Signed-off-by: Jinjiang Tu <tujinjiang@huawei.com>
---
Changelog since v2:
* Fix this issue by calculating the free surplus count first, and add
comments, as suggested by Oscar Salvador
mm/hugetlb.c | 19 ++++++++++++++++++-
1 file changed, 18 insertions(+), 1 deletion(-)
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 39f92aad7bd1..e4aed3557339 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -3825,6 +3825,7 @@ static int adjust_pool_surplus(struct hstate *h, nodemask_t *nodes_allowed,
static int set_max_huge_pages(struct hstate *h, unsigned long count, int nid,
nodemask_t *nodes_allowed)
{
+ unsigned long persistent_free_count;
unsigned long min_count;
unsigned long allocated;
struct folio *folio;
@@ -3959,8 +3960,24 @@ static int set_max_huge_pages(struct hstate *h, unsigned long count, int nid,
* though, we'll note that we're not allowed to exceed surplus
* and won't grow the pool anywhere else. Not until one of the
* sysctls are changed, or the surplus pages go out of use.
+ *
+ * min_count is the expected number of persistent pages, we
+ * shouldn't calculate min_count by using
+ * resv_huge_pages + persistent_huge_pages() - free_huge_pages,
+ * because there may exist free surplus huge pages, and this will
+ * lead to subtracting twice. Free surplus huge pages come from HVO
+ * failing to restore vmemmap, see comments in the callers of
+ * hugetlb_vmemmap_restore_folio(). Thus, we should calculate
+ * persistent free count first.
*/
- min_count = h->resv_huge_pages + h->nr_huge_pages - h->free_huge_pages;
+ persistent_free_count = h->free_huge_pages;
+ if (h->free_huge_pages > persistent_huge_pages(h)) {
+ if (h->free_huge_pages > h->surplus_huge_pages)
+ persistent_free_count -= h->surplus_huge_pages;
+ else
+ persistent_free_count = 0;
+ }
+ min_count = h->resv_huge_pages + persistent_huge_pages(h) - persistent_free_count;
min_count = max(count, min_count);
try_to_free_low(h, min_count, nodes_allowed);
--
2.43.0
Thread overview: 3+ messages
2025-04-07 12:47 Jinjiang Tu [this message]
2025-04-08 13:10 ` Oscar Salvador
2025-04-09 3:29 ` Jinjiang Tu