From: Jinjiang Tu <tujinjiang@huawei.com>
To: <osalvador@suse.de>, <david@redhat.com>,
	<akpm@linux-foundation.org>, <muchun.song@linux.dev>
Cc: <linux-mm@kvack.org>, <wangkefeng.wang@huawei.com>,
	<tujinjiang@huawei.com>
Subject: [PATCH v3] mm/hugetlb: fix set_max_huge_pages() when there are surplus pages
Date: Mon, 7 Apr 2025 20:47:06 +0800
Message-ID: <20250407124706.2688092-1-tujinjiang@huawei.com>

In set_max_huge_pages(), min_count should mean the number of acquired
persistent huge pages, but it also includes surplus huge pages. This
leads to failing to free the free huge pages of a node.

Steps to reproduce:
1) create 5 hugetlb folios in Node0
2) run a program to use all the hugetlb folios
3) echo 0 > nr_hugepages for Node0 to free the hugetlb folios. Thus the 5
hugetlb folios in Node0 are accounted as surplus.
4) create 5 hugetlb folios in Node1
5) echo 0 > nr_hugepages for Node1 to free the hugetlb folios

The result:
        Node0    Node1
Total     5         5
Free      0         5
Surp      5         5

We can't simply subtract surplus_huge_pages from min_count, since free
hugetlb folios may be surplus due to HVO. In
__update_and_free_hugetlb_folio(), hugetlb_vmemmap_restore_folio() may
fail, in which case the folio is added back to the pool and treated as
surplus. If we directly subtract surplus_huge_pages from min_count, some
free folios will be subtracted twice.

To fix it, check whether count is less than the number of free huge
pages that could be destroyed (i.e., available_huge_pages(h)), and
remove hugetlb folios if so.

Since there may exist free surplus hugetlb folios, we should remove
surplus folios first to make surplus count correct.

The result with this patch:
        Node0    Node1
Total     5         0
Free      0         0
Surp      5         0

Fixes: 9a30523066cd ("hugetlb: add per node hstate attributes")
Signed-off-by: Jinjiang Tu <tujinjiang@huawei.com>
---
Changelog since v2:
 * Fix this issue by calculating free surplus count, and add comments,
suggested by Oscar Salvador

 mm/hugetlb.c | 19 ++++++++++++++++++-
 1 file changed, 18 insertions(+), 1 deletion(-)
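
For reviewers who want to check the arithmetic, here is a minimal userspace
sketch of the old and new min_count calculations. It is not kernel code:
struct pool_counters and the helper names are hypothetical stand-ins for the
global counters in struct hstate, and it assumes resv_huge_pages is 0 at
step 5 of the reproducer (the program has already faulted in all of its
folios).

```c
#include <assert.h>

/* Hypothetical, simplified stand-in for the global counters in struct hstate. */
struct pool_counters {
	unsigned long nr_huge_pages;
	unsigned long free_huge_pages;
	unsigned long resv_huge_pages;
	unsigned long surplus_huge_pages;
};

/* Models persistent_huge_pages(h): total pages minus surplus pages. */
static unsigned long persistent_pages(const struct pool_counters *h)
{
	assert(h->surplus_huge_pages <= h->nr_huge_pages);
	return h->nr_huge_pages - h->surplus_huge_pages;
}

/* min_count as computed before this patch. */
static unsigned long min_count_old(const struct pool_counters *h)
{
	return h->resv_huge_pages + h->nr_huge_pages - h->free_huge_pages;
}

/* min_count as computed by this patch. */
static unsigned long min_count_new(const struct pool_counters *h)
{
	unsigned long persistent_free_count = h->free_huge_pages;

	/* More free pages than persistent pages: some free pages are surplus. */
	if (h->free_huge_pages > persistent_pages(h)) {
		if (h->free_huge_pages > h->surplus_huge_pages)
			persistent_free_count -= h->surplus_huge_pages;
		else
			persistent_free_count = 0;
	}
	return h->resv_huge_pages + persistent_pages(h) - persistent_free_count;
}
```

With the reproducer's global state before step 5 (nr_huge_pages = 10,
free_huge_pages = 5, surplus_huge_pages = 5), min_count_old() yields 5, which
is not below the 5 persistent pages, so nothing is removed from Node1.
min_count_new() yields 0, so the 5 free folios can be removed. In a
hypothetical HVO case with nr_huge_pages = 5, free_huge_pages = 5 and
surplus_huge_pages = 2 (two free folios accounted as surplus),
min_count_new() also yields 0 without subtracting those two folios twice.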

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 39f92aad7bd1..e4aed3557339 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -3825,6 +3825,7 @@ static int adjust_pool_surplus(struct hstate *h, nodemask_t *nodes_allowed,
 static int set_max_huge_pages(struct hstate *h, unsigned long count, int nid,
 			      nodemask_t *nodes_allowed)
 {
+	unsigned long persistent_free_count;
 	unsigned long min_count;
 	unsigned long allocated;
 	struct folio *folio;
@@ -3959,8 +3960,24 @@ static int set_max_huge_pages(struct hstate *h, unsigned long count, int nid,
 	 * though, we'll note that we're not allowed to exceed surplus
 	 * and won't grow the pool anywhere else. Not until one of the
 	 * sysctls are changed, or the surplus pages go out of use.
+	 *
+	 * min_count is the expected number of persistent pages, we
+	 * shouldn't calculate min_count by using
+	 * resv_huge_pages + persistent_huge_pages() - free_huge_pages,
+	 * because there may exist free surplus huge pages, and this will
+	 * lead to subtracting twice. Free surplus huge pages come from HVO
+	 * failing to restore vmemmap, see comments in the callers of
+	 * hugetlb_vmemmap_restore_folio(). Thus, we should calculate
+	 * persistent free count first.
 	 */
-	min_count = h->resv_huge_pages + h->nr_huge_pages - h->free_huge_pages;
+	persistent_free_count = h->free_huge_pages;
+	if (h->free_huge_pages > persistent_huge_pages(h)) {
+		if (h->free_huge_pages > h->surplus_huge_pages)
+			persistent_free_count -= h->surplus_huge_pages;
+		else
+			persistent_free_count = 0;
+	}
+	min_count = h->resv_huge_pages + persistent_huge_pages(h) - persistent_free_count;
 	min_count = max(count, min_count);
 	try_to_free_low(h, min_count, nodes_allowed);
 
-- 
2.43.0
