* [PATCH] mm: hugetlb: Fix incorrect fallback for subpool
@ 2025-04-10 6:26 Wupeng Ma
2025-04-18 3:03 ` Andrew Morton
2025-04-28 8:41 ` Oscar Salvador
0 siblings, 2 replies; 4+ messages in thread
From: Wupeng Ma @ 2025-04-10 6:26 UTC
To: akpm, mike.kravetz, david, joshua.hahnjy
Cc: muchun.song, mawupeng1, linux-mm, linux-kernel
During our testing with hugetlb subpool enabled, we observe that
hstate->resv_huge_pages may underflow into negative values. Root cause
analysis reveals a race condition in subpool reservation fallback handling
as follows:

hugetlb_reserve_pages()
    /* Attempt subpool reservation */
    gbl_reserve = hugepage_subpool_get_pages(spool, chg);

    /* Global reservation may fail after subpool allocation */
    if (hugetlb_acct_memory(h, gbl_reserve) < 0)
        goto out_put_pages;

out_put_pages:
    /* This incorrectly restores reservation to subpool */
    hugepage_subpool_put_pages(spool, chg);

When hugetlb_acct_memory() fails after subpool allocation, the current
implementation over-commits subpool reservations by returning the full
'chg' value instead of the actual allocated 'gbl_reserve' amount. This
discrepancy propagates to global reservations during subsequent releases,
eventually causing resv_huge_pages underflow.
This problem can be triggered easily with the following steps:
1. reserve hugepages for hugetlb allocation
2. mount hugetlbfs with min_size to enable the hugetlb subpool
3. allocate hugepages with two tasks (make sure the second fails due to
an insufficient number of hugepages)
4. wait a few seconds and repeat step 3, which makes
hstate->resv_huge_pages go below zero.

To fix this problem, return the correct number of pages to the subpool
during the fallback after hugepage_subpool_get_pages() is called.
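
For readers following the accounting, here is a minimal userspace sketch
of the scenario. It is not kernel code. It assumes min_size accounting
only (no max_size, no locking) and a global pool that cannot grow. The
names subpool_model, hstate_model, model_subpool_get(),
model_subpool_put(), model_acct() and run_scenario() are hypothetical
stand-ins for the subpool and hstate logic. The unmount step assumes the
subpool's minimum reservation is dropped from the global count when the
subpool is released.

```c
/* Simplified model: min_size accounting only, no locking, no max_size. */
struct subpool_model {
	long min_hpages;	/* minimum pages the subpool keeps reserved */
	long rsv_hpages;	/* reserved pages the subpool still holds */
};

struct hstate_model {
	long nr_huge_pages;	/* total pages in the global pool */
	long resv_huge_pages;	/* globally reserved pages */
};

/* Returns how many pages must still be reserved globally. */
static long model_subpool_get(struct subpool_model *sp, long delta)
{
	long ret = delta;

	if (sp->rsv_hpages) {
		if (delta > sp->rsv_hpages) {
			ret = delta - sp->rsv_hpages;
			sp->rsv_hpages = 0;
		} else {
			ret = 0;
			sp->rsv_hpages -= delta;
		}
	}
	return ret;
}

/* Returns how many pages the subpool could not absorb. */
static long model_subpool_put(struct subpool_model *sp, long delta)
{
	long ret = 0;

	if (sp->rsv_hpages + delta > sp->min_hpages)
		ret = sp->rsv_hpages + delta - sp->min_hpages;
	sp->rsv_hpages += delta;
	if (sp->rsv_hpages > sp->min_hpages)
		sp->rsv_hpages = sp->min_hpages;
	return ret;
}

/* Adjusts the global reservation; fails if the pool is too small. */
static int model_acct(struct hstate_model *h, long delta)
{
	if (delta > 0 && h->resv_huge_pages + delta > h->nr_huge_pages)
		return -1;
	h->resv_huge_pages += delta;
	return 0;
}

/* Returns resv_huge_pages after the modelled unmount. */
long run_scenario(int fixed)
{
	/* Mount with min_size worth 4 pages: all 4 pool pages are reserved. */
	struct subpool_model sp = { .min_hpages = 4, .rsv_hpages = 4 };
	struct hstate_model h = { .nr_huge_pages = 4, .resv_huge_pages = 4 };
	long chg, gbl_reserve, gbl_resv;

	/* Task 1 reserves 2 pages, fully covered by the subpool's reserve. */
	gbl_reserve = model_subpool_get(&sp, 2);
	model_acct(&h, gbl_reserve);

	/* Task 2 asks for 4 pages; 2 come from the subpool, 2 must be global. */
	chg = 4;
	gbl_reserve = model_subpool_get(&sp, chg);
	if (model_acct(&h, gbl_reserve) < 0) {
		if (fixed) {
			/* Return only what the subpool actually handed out. */
			gbl_resv = model_subpool_put(&sp, chg - gbl_reserve);
			model_acct(&h, -gbl_resv);
		} else {
			/* Old rollback: returns the full chg to the subpool. */
			(void)model_subpool_put(&sp, chg);
		}
	}

	/* Task 1 releases its 2 reserved pages. */
	gbl_resv = model_subpool_put(&sp, 2);
	model_acct(&h, -gbl_resv);

	/* Unmount: the subpool gives up its minimum reservation. */
	model_acct(&h, -sp.min_hpages);
	return h.resv_huge_pages;
}
```

In this model, run_scenario(0) (rollback of the full 'chg') ends with
resv_huge_pages at -2, while run_scenario(1) (rollback of
chg - gbl_reserve) ends at 0.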
Fixes: 1c5ecae3a93f ("hugetlbfs: add minimum size accounting to subpools")
Signed-off-by: Wupeng Ma <mawupeng1@huawei.com>
Tested-by: Joshua Hahn <joshua.hahnjy@gmail.com>
---
mm/hugetlb.c | 28 ++++++++++++++++++++++------
1 file changed, 22 insertions(+), 6 deletions(-)
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 39f92aad7bd1e..50bd1fe3ab400 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -3010,7 +3010,7 @@ struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
struct hugepage_subpool *spool = subpool_vma(vma);
struct hstate *h = hstate_vma(vma);
struct folio *folio;
- long retval, gbl_chg;
+ long retval, gbl_chg, gbl_reserve;
map_chg_state map_chg;
int ret, idx;
struct hugetlb_cgroup *h_cg = NULL;
@@ -3163,8 +3163,16 @@ struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
hugetlb_cgroup_uncharge_cgroup_rsvd(idx, pages_per_huge_page(h),
h_cg);
out_subpool_put:
- if (map_chg)
- hugepage_subpool_put_pages(spool, 1);
+ /*
+ * put page to subpool iff the quota of subpool's rsv_hpages is used
+ * during hugepage_subpool_get_pages.
+ */
+ if (map_chg && !gbl_chg) {
+ gbl_reserve = hugepage_subpool_put_pages(spool, 1);
+ hugetlb_acct_memory(h, -gbl_reserve);
+ }
+
+
out_end_reservation:
if (map_chg != MAP_CHG_ENFORCED)
vma_end_reservation(h, vma, addr);
@@ -7216,7 +7224,7 @@ bool hugetlb_reserve_pages(struct inode *inode,
struct vm_area_struct *vma,
vm_flags_t vm_flags)
{
- long chg = -1, add = -1;
+ long chg = -1, add = -1, spool_resv, gbl_resv;
struct hstate *h = hstate_inode(inode);
struct hugepage_subpool *spool = subpool_inode(inode);
struct resv_map *resv_map;
@@ -7351,8 +7359,16 @@ bool hugetlb_reserve_pages(struct inode *inode,
return true;
out_put_pages:
- /* put back original number of pages, chg */
- (void)hugepage_subpool_put_pages(spool, chg);
+ spool_resv = chg - gbl_reserve;
+ if (spool_resv) {
+ /* put sub pool's reservation back, chg - gbl_reserve */
+ gbl_resv = hugepage_subpool_put_pages(spool, spool_resv);
+ /*
+ * subpool's reserved pages can not be put back due to race,
+ * return to hstate.
+ */
+ hugetlb_acct_memory(h, -gbl_resv);
+ }
out_uncharge_cgroup:
hugetlb_cgroup_uncharge_cgroup_rsvd(hstate_index(h),
chg * pages_per_huge_page(h), h_cg);
--
2.43.0
* Re: [PATCH] mm: hugetlb: Fix incorrect fallback for subpool
2025-04-10 6:26 [PATCH] mm: hugetlb: Fix incorrect fallback for subpool Wupeng Ma
@ 2025-04-18 3:03 ` Andrew Morton
2025-04-18 8:46 ` Oscar Salvador
2025-04-28 8:41 ` Oscar Salvador
1 sibling, 1 reply; 4+ messages in thread
From: Andrew Morton @ 2025-04-18 3:03 UTC
To: Wupeng Ma
Cc: mike.kravetz, david, joshua.hahnjy, muchun.song, linux-mm, linux-kernel
On Thu, 10 Apr 2025 14:26:33 +0800 Wupeng Ma <mawupeng1@huawei.com> wrote:
> During our testing with hugetlb subpool enabled, we observe that
> hstate->resv_huge_pages may underflow into negative values. Root cause
> analysis reveals a race condition in subpool reservation fallback handling
> as follows:
>
> hugetlb_reserve_pages()
> /* Attempt subpool reservation */
> gbl_reserve = hugepage_subpool_get_pages(spool, chg);
>
> /* Global reservation may fail after subpool allocation */
> if (hugetlb_acct_memory(h, gbl_reserve) < 0)
> goto out_put_pages;
>
> out_put_pages:
> /* This incorrectly restores reservation to subpool */
> hugepage_subpool_put_pages(spool, chg);
>
> When hugetlb_acct_memory() fails after subpool allocation, the current
> implementation over-commits subpool reservations by returning the full
> 'chg' value instead of the actual allocated 'gbl_reserve' amount. This
> discrepancy propagates to global reservations during subsequent releases,
> eventually causing resv_huge_pages underflow.
>
> This problem can be triggered easily with the following steps:
> 1. reserve hugepages for hugetlb allocation
> 2. mount hugetlbfs with min_size to enable the hugetlb subpool
> 3. allocate hugepages with two tasks (make sure the second fails due to
> an insufficient number of hugepages)
> 4. wait a few seconds and repeat step 3, which makes
> hstate->resv_huge_pages go below zero.
>
> To fix this problem, return the correct number of pages to the subpool
> during the fallback after hugepage_subpool_get_pages() is called.
>
This has been in mm-hotfixes since April 1. Do we have any reviewers?
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -3010,7 +3010,7 @@ struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
> struct hugepage_subpool *spool = subpool_vma(vma);
> struct hstate *h = hstate_vma(vma);
> struct folio *folio;
> - long retval, gbl_chg;
> + long retval, gbl_chg, gbl_reserve;
> map_chg_state map_chg;
> int ret, idx;
> struct hugetlb_cgroup *h_cg = NULL;
> @@ -3163,8 +3163,16 @@ struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
> hugetlb_cgroup_uncharge_cgroup_rsvd(idx, pages_per_huge_page(h),
> h_cg);
> out_subpool_put:
> - if (map_chg)
> - hugepage_subpool_put_pages(spool, 1);
> + /*
> + * put page to subpool iff the quota of subpool's rsv_hpages is used
> + * during hugepage_subpool_get_pages.
> + */
> + if (map_chg && !gbl_chg) {
> + gbl_reserve = hugepage_subpool_put_pages(spool, 1);
> + hugetlb_acct_memory(h, -gbl_reserve);
> + }
> +
> +
> out_end_reservation:
> if (map_chg != MAP_CHG_ENFORCED)
> vma_end_reservation(h, vma, addr);
> @@ -7216,7 +7224,7 @@ bool hugetlb_reserve_pages(struct inode *inode,
> struct vm_area_struct *vma,
> vm_flags_t vm_flags)
> {
> - long chg = -1, add = -1;
> + long chg = -1, add = -1, spool_resv, gbl_resv;
> struct hstate *h = hstate_inode(inode);
> struct hugepage_subpool *spool = subpool_inode(inode);
> struct resv_map *resv_map;
> @@ -7351,8 +7359,16 @@ bool hugetlb_reserve_pages(struct inode *inode,
> return true;
>
> out_put_pages:
> - /* put back original number of pages, chg */
> - (void)hugepage_subpool_put_pages(spool, chg);
> + spool_resv = chg - gbl_reserve;
> + if (spool_resv) {
> + /* put sub pool's reservation back, chg - gbl_reserve */
> + gbl_resv = hugepage_subpool_put_pages(spool, spool_resv);
> + /*
> + * subpool's reserved pages can not be put back due to race,
> + * return to hstate.
> + */
> + hugetlb_acct_memory(h, -gbl_resv);
> + }
> out_uncharge_cgroup:
> hugetlb_cgroup_uncharge_cgroup_rsvd(hstate_index(h),
> chg * pages_per_huge_page(h), h_cg);
> --
> 2.43.0
>
* Re: [PATCH] mm: hugetlb: Fix incorrect fallback for subpool
2025-04-18 3:03 ` Andrew Morton
@ 2025-04-18 8:46 ` Oscar Salvador
0 siblings, 0 replies; 4+ messages in thread
From: Oscar Salvador @ 2025-04-18 8:46 UTC
To: Andrew Morton
Cc: Wupeng Ma, mike.kravetz, david, joshua.hahnjy, muchun.song,
linux-mm, linux-kernel
On Thu, Apr 17, 2025 at 08:03:27PM -0700, Andrew Morton wrote:
> This has been in mm-hotfixes since April 1. Do we have any reviewers?
Sorry, this slipped through the cracks.
I plan to review this, but I will not be able to do so until the middle
of next week.
--
Oscar Salvador
SUSE Labs
* Re: [PATCH] mm: hugetlb: Fix incorrect fallback for subpool
2025-04-10 6:26 [PATCH] mm: hugetlb: Fix incorrect fallback for subpool Wupeng Ma
2025-04-18 3:03 ` Andrew Morton
@ 2025-04-28 8:41 ` Oscar Salvador
1 sibling, 0 replies; 4+ messages in thread
From: Oscar Salvador @ 2025-04-28 8:41 UTC
To: Wupeng Ma
Cc: akpm, mike.kravetz, david, joshua.hahnjy, muchun.song, linux-mm,
linux-kernel
On Thu, Apr 10, 2025 at 02:26:33PM +0800, Wupeng Ma wrote:
> During our testing with hugetlb subpool enabled, we observe that
> hstate->resv_huge_pages may underflow into negative values. Root cause
> analysis reveals a race condition in subpool reservation fallback handling
> as follows:
>
> hugetlb_reserve_pages()
> /* Attempt subpool reservation */
> gbl_reserve = hugepage_subpool_get_pages(spool, chg);
>
> /* Global reservation may fail after subpool allocation */
> if (hugetlb_acct_memory(h, gbl_reserve) < 0)
> goto out_put_pages;
>
> out_put_pages:
> /* This incorrectly restores reservation to subpool */
> hugepage_subpool_put_pages(spool, chg);
>
> When hugetlb_acct_memory() fails after subpool allocation, the current
> implementation over-commits subpool reservations by returning the full
> 'chg' value instead of the actual allocated 'gbl_reserve' amount. This
> discrepancy propagates to global reservations during subsequent releases,
> eventually causing resv_huge_pages underflow.
>
> This problem can be triggered easily with the following steps:
> 1. reserve hugepages for hugetlb allocation
> 2. mount hugetlbfs with min_size to enable the hugetlb subpool
> 3. allocate hugepages with two tasks (make sure the second fails due to
> an insufficient number of hugepages)
> 4. wait a few seconds and repeat step 3, which makes
> hstate->resv_huge_pages go below zero.
>
> To fix this problem, return the correct number of pages to the subpool
> during the fallback after hugepage_subpool_get_pages() is called.
>
> Fixes: 1c5ecae3a93f ("hugetlbfs: add minimum size accounting to subpools")
> Signed-off-by: Wupeng Ma <mawupeng1@huawei.com>
> Tested-by: Joshua Hahn <joshua.hahnjy@gmail.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
--
Oscar Salvador
SUSE Labs