[PATCH v2] mm/hugetlb: Restore failed global reservations to subpool

* [PATCH v2] mm/hugetlb: Restore failed global reservations to subpool
@ 2026-01-16 20:40 Joshua Hahn
  2026-01-21 17:47 ` Andrew Morton
  2026-02-11  0:44 ` Usama Arif
  0 siblings, 2 replies; 5+ messages in thread
From: Joshua Hahn @ 2026-01-16 20:40 UTC (permalink / raw)
  To: Andrew Morton
  Cc: David Hildenbrand, Muchun Song, Oscar Salvador, Wupeng Ma,
	linux-kernel, linux-mm, stable, kernel-team

Commit a833a693a490 ("mm: hugetlb: fix incorrect fallback for subpool")
fixed an underflow error for hstate->resv_huge_pages caused by
incorrectly attributing globally requested pages to the subpool's
reservation.

Unfortunately, this fix also introduced the opposite problem, which would
leave spool->used_hpages elevated if the globally requested pages could
not be acquired. This is because while a subpool's reserve counter
only accounts for what is requested and allocated from the subpool, its
"used" counter keeps track of what is consumed in total, both from the
subpool and globally. Thus, we need to adjust spool->used_hpages in the
other direction, and make sure that globally requested pages are
uncharged from the subpool's used counter.

Each failed allocation attempt increments the used_hpages counter by
the number of pages that were requested from the global pool. Ultimately,
this renders the subpool unusable once used_hpages reaches the max limit.

The issue can be reproduced as follows:
1. Allocate 4 hugetlb pages
2. Create a hugetlb mount with max=4, min=2
3. Consume 2 pages globally
4. Request 3 pages from the subpool (2 from subpool + 1 from global)
	4.1 hugepage_subpool_get_pages(spool, 3) succeeds.
		used_hpages += 3
	4.2 hugetlb_acct_memory(h, 1) fails: no global pages left
		used_hpages -= 2
5. Subpool now has used_hpages = 1, despite not being able to
   successfully allocate any hugepages. It believes it can now only
   allocate 3 more hugepages, not 4.

Repeating this process will ultimately render the subpool unable to
allocate any hugepages, since it believes that it is using the maximum
number of hugepages that the subpool has been allotted.

The underflow issue fixed by the original commit remains fixed with this
change.

Fixes: a833a693a490 ("mm: hugetlb: fix incorrect fallback for subpool")
Signed-off-by: Joshua Hahn <joshua.hahnjy@gmail.com>
Cc: stable@vger.kernel.org
---
v1 --> v2
- Moved "unsigned long flags" definition into the if statement it is used in
- Separated fix patch from cleanup patches for easier backporting for stable.

 mm/hugetlb.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 5a147026633f..e48ff0c771f8 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -6713,6 +6713,15 @@ long hugetlb_reserve_pages(struct inode *inode,
 		 */
 		hugetlb_acct_memory(h, -gbl_resv);
 	}
+	/* Restore used_hpages for pages that failed global reservation */
+	if (gbl_reserve && spool) {
+		unsigned long flags;
+
+		spin_lock_irqsave(&spool->lock, flags);
+		if (spool->max_hpages != -1)
+			spool->used_hpages -= gbl_reserve;
+		unlock_or_release_subpool(spool, flags);
+	}
 out_uncharge_cgroup:
 	hugetlb_cgroup_uncharge_cgroup_rsvd(hstate_index(h),
 					    chg * pages_per_huge_page(h), h_cg);

base-commit: c1a60bf0f6df5c8a6cb6840a0d2fb0e9caf9f7cc
-- 
2.47.3

* Re: [PATCH v2] mm/hugetlb: Restore failed global reservations to subpool
  2026-01-16 20:40 [PATCH v2] mm/hugetlb: Restore failed global reservations to subpool Joshua Hahn
@ 2026-01-21 17:47 ` Andrew Morton
  2026-02-03  2:39   ` Andrew Morton
  2026-02-11  0:44 ` Usama Arif
  1 sibling, 1 reply; 5+ messages in thread
From: Andrew Morton @ 2026-01-21 17:47 UTC (permalink / raw)
  To: Joshua Hahn
  Cc: David Hildenbrand, Muchun Song, Oscar Salvador, Wupeng Ma,
	linux-kernel, linux-mm, stable, kernel-team

On Fri, 16 Jan 2026 15:40:36 -0500 Joshua Hahn <joshua.hahnjy@gmail.com> wrote:

> Commit a833a693a490 ("mm: hugetlb: fix incorrect fallback for subpool")
> fixed an underflow error for hstate->resv_huge_pages caused by
> incorrectly attributing globally requested pages to the subpool's
> reservation.
> 
> Unfortunately, this fix also introduced the opposite problem, which would
> leave spool->used_hpages elevated if the globally requested pages could
> not be acquired. This is because while a subpool's reserve pages only
> accounts for what is requested and allocated from the subpool, its
> "used" counter keeps track of what is consumed in total, both from the
> subpool and globally. Thus, we need to adjust spool->used_hpages in the
> other direction, and make sure that globally requested pages are
> uncharged from the subpool's used counter.
> 
> ...
> 
> Fixes: a833a693a490 ("mm: hugetlb: fix incorrect fallback for subpool")
> Signed-off-by: Joshua Hahn <joshua.hahnjy@gmail.com>
> Cc: stable@vger.kernel.org

This (simple, cc:stable) patch presently has no reviews, if someone
could please be so kind.

> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -6713,6 +6713,15 @@ long hugetlb_reserve_pages(struct inode *inode,
>  		 */
>  		hugetlb_acct_memory(h, -gbl_resv);
>  	}
> +	/* Restore used_hpages for pages that failed global reservation */
> +	if (gbl_reserve && spool) {
> +		unsigned long flags;
> +
> +		spin_lock_irqsave(&spool->lock, flags);
> +		if (spool->max_hpages != -1)
> +			spool->used_hpages -= gbl_reserve;
> +		unlock_or_release_subpool(spool, flags);
> +	}
>  out_uncharge_cgroup:
>  	hugetlb_cgroup_uncharge_cgroup_rsvd(hstate_index(h),
>  					    chg * pages_per_huge_page(h), h_cg);
> 


* Re: [PATCH v2] mm/hugetlb: Restore failed global reservations to subpool
  2026-01-21 17:47 ` Andrew Morton
@ 2026-02-03  2:39   ` Andrew Morton
  2026-02-03  3:23     ` Joshua Hahn
  0 siblings, 1 reply; 5+ messages in thread
From: Andrew Morton @ 2026-02-03  2:39 UTC (permalink / raw)
  To: Joshua Hahn, David Hildenbrand, Muchun Song, Oscar Salvador,
	Wupeng Ma, linux-kernel, linux-mm, stable, kernel-team

On Wed, 21 Jan 2026 09:47:54 -0800 Andrew Morton <akpm@linux-foundation.org> wrote:

> On Fri, 16 Jan 2026 15:40:36 -0500 Joshua Hahn <joshua.hahnjy@gmail.com> wrote:
> 
> > Commit a833a693a490 ("mm: hugetlb: fix incorrect fallback for subpool")
> > fixed an underflow error for hstate->resv_huge_pages caused by
> > incorrectly attributing globally requested pages to the subpool's
> > reservation.
> > 
> > Unfortunately, this fix also introduced the opposite problem, which would
> > leave spool->used_hpages elevated if the globally requested pages could
> > not be acquired. This is because while a subpool's reserve pages only
> > accounts for what is requested and allocated from the subpool, its
> > "used" counter keeps track of what is consumed in total, both from the
> > subpool and globally. Thus, we need to adjust spool->used_hpages in the
> > other direction, and make sure that globally requested pages are
> > uncharged from the subpool's used counter.
> > 
> > ...
> > 
> > Fixes: a833a693a490 ("mm: hugetlb: fix incorrect fallback for subpool")
> > Signed-off-by: Joshua Hahn <joshua.hahnjy@gmail.com>
> > Cc: stable@vger.kernel.org
> 
> This (simple, cc:stable) patch presently has no reviews, if someone
> could please be so kind.

Oh.

Joshua, it's unclear from the changelog - what are the userspace-visible
effects of the bug?


* Re: [PATCH v2] mm/hugetlb: Restore failed global reservations to subpool
  2026-02-03  2:39   ` Andrew Morton
@ 2026-02-03  3:23     ` Joshua Hahn
  0 siblings, 0 replies; 5+ messages in thread
From: Joshua Hahn @ 2026-02-03  3:23 UTC (permalink / raw)
  To: Andrew Morton
  Cc: David Hildenbrand, Muchun Song, Oscar Salvador, Wupeng Ma,
	linux-kernel, linux-mm, stable, kernel-team

On Mon, 2 Feb 2026 18:39:18 -0800 Andrew Morton <akpm@linux-foundation.org> wrote:

> On Wed, 21 Jan 2026 09:47:54 -0800 Andrew Morton <akpm@linux-foundation.org> wrote:
> 
> > On Fri, 16 Jan 2026 15:40:36 -0500 Joshua Hahn <joshua.hahnjy@gmail.com> wrote:
> > 
> > > Commit a833a693a490 ("mm: hugetlb: fix incorrect fallback for subpool")
> > > fixed an underflow error for hstate->resv_huge_pages caused by
> > > incorrectly attributing globally requested pages to the subpool's
> > > reservation.
> > > 
> > > Unfortunately, this fix also introduced the opposite problem, which would
> > > leave spool->used_hpages elevated if the globally requested pages could
> > > not be acquired. This is because while a subpool's reserve pages only
> > > accounts for what is requested and allocated from the subpool, its
> > > "used" counter keeps track of what is consumed in total, both from the
> > > subpool and globally. Thus, we need to adjust spool->used_hpages in the
> > > other direction, and make sure that globally requested pages are
> > > uncharged from the subpool's used counter.
> > > 
> > > ...
> > > 
> > > Fixes: a833a693a490 ("mm: hugetlb: fix incorrect fallback for subpool")
> > > Signed-off-by: Joshua Hahn <joshua.hahnjy@gmail.com>
> > > Cc: stable@vger.kernel.org
> > 
> > This (simple, cc:stable) patch presently has no reviews, if someone
> > could please be so kind.
> 
> Oh.
> 
> Joshua, it's unclear from the changelog - what are the userspace-visible
> effects of the bug?

Hello Andrew,

Sorry about that, I definitely could have been more explicit with the
userspace behavior. What ends up happening is that the subpool will
imagine that all of its hugeTLB pages are consumed, so it will be
unable to service allocations trying to get hugeTLB pages from it,
despite none of the hugeTLB pages in the system really being used.

Maybe we can reword the following block:

> > > Repeating this process will ultimately render the subpool unable to
> > > allocate any hugepages, since it believes that it is using the maximum
> > > number of hugepages that the subpool has been allotted.

Into this block, to make it more explicit?

With each failed allocation attempt incrementing the used counter, the
subpool eventually reaches a point where its used counter equals its
max counter. At that point, any future allocations that try to allocate
hugeTLB pages from the subpool will fail, despite the subpool not having
any of its hugeTLB pages consumed by any user.

Once this happens, there is no way to make the subpool usable again,
since there is no way to decrement the used counter as no process
is really consuming the hugeTLB pages.

I hope this makes it a bit more clear, and please let me know if there is
anything else I can do! I hope you have a great evening,

Joshua

* Re: [PATCH v2] mm/hugetlb: Restore failed global reservations to subpool
  2026-01-16 20:40 [PATCH v2] mm/hugetlb: Restore failed global reservations to subpool Joshua Hahn
  2026-01-21 17:47 ` Andrew Morton
@ 2026-02-11  0:44 ` Usama Arif
  1 sibling, 0 replies; 5+ messages in thread
From: Usama Arif @ 2026-02-11  0:44 UTC (permalink / raw)
  To: Joshua Hahn
  Cc: Usama Arif, Andrew Morton, David Hildenbrand, Muchun Song,
	Oscar Salvador, Wupeng Ma, linux-kernel, linux-mm, stable,
	kernel-team

On Fri, 16 Jan 2026 15:40:36 -0500 Joshua Hahn <joshua.hahnjy@gmail.com> wrote:

> Commit a833a693a490 ("mm: hugetlb: fix incorrect fallback for subpool")
> fixed an underflow error for hstate->resv_huge_pages caused by
> incorrectly attributing globally requested pages to the subpool's
> reservation.
> 
> Unfortunately, this fix also introduced the opposite problem, which would
> leave spool->used_hpages elevated if the globally requested pages could
> not be acquired. This is because while a subpool's reserve pages only
> accounts for what is requested and allocated from the subpool, its
> "used" counter keeps track of what is consumed in total, both from the
> subpool and globally. Thus, we need to adjust spool->used_hpages in the
> other direction, and make sure that globally requested pages are
> uncharged from the subpool's used counter.
> 
> Each failed allocation attempt increments the used_hpages counter by
> how many pages were requested from the global pool. Ultimately, this
> renders the subpool unusable, as used_hpages approaches the max limit.
> 
> The issue can be reproduced as follows:
> 1. Allocate 4 hugetlb pages
> 2. Create a hugetlb mount with max=4, min=2
> 3. Consume 2 pages globally
> 4. Request 3 pages from the subpool (2 from subpool + 1 from global)
> 	4.1 hugepage_subpool_get_pages(spool, 3) succeeds.
> 		used_hpages += 3
> 	4.2 hugetlb_acct_memory(h, 1) fails: no global pages left
> 		used_hpages -= 2
> 5. Subpool now has used_hpages = 1, despite not being able to
>    successfully allocate any hugepages. It believes it can now only
>    allocate 3 more hugepages, not 4.
> 
> Repeating this process will ultimately render the subpool unable to
> allocate any hugepages, since it believes that it is using the maximum
> number of hugepages that the subpool has been allotted.
> 
> The underflow issue that the original commit fixes still remains fixed
> as well.
> 
> Fixes: a833a693a490 ("mm: hugetlb: fix incorrect fallback for subpool")
> Signed-off-by: Joshua Hahn <joshua.hahnjy@gmail.com>
> Cc: stable@vger.kernel.org
> ---
> v1 --> v2
> - Moved "unsigned long flags" definition into the if statement it is used in
> - Separated fix patch from cleanup patches for easier backporting for stable.
> 
>  mm/hugetlb.c | 9 +++++++++
>  1 file changed, 9 insertions(+)

Makes sense. Without this, used_hpages would keep on leaking if
hugetlb_acct_memory fails.

Acked-by: Usama Arif <usama.arif@linux.dev>

> 
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 5a147026633f..e48ff0c771f8 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -6713,6 +6713,15 @@ long hugetlb_reserve_pages(struct inode *inode,
>  		 */
>  		hugetlb_acct_memory(h, -gbl_resv);
>  	}
> +	/* Restore used_hpages for pages that failed global reservation */
> +	if (gbl_reserve && spool) {
> +		unsigned long flags;
> +
> +		spin_lock_irqsave(&spool->lock, flags);
> +		if (spool->max_hpages != -1)
> +			spool->used_hpages -= gbl_reserve;
> +		unlock_or_release_subpool(spool, flags);
> +	}
>  out_uncharge_cgroup:
>  	hugetlb_cgroup_uncharge_cgroup_rsvd(hstate_index(h),
>  					    chg * pages_per_huge_page(h), h_cg);
> 
> base-commit: c1a60bf0f6df5c8a6cb6840a0d2fb0e9caf9f7cc
> -- 
> 2.47.3
> 

