From: Andrew Morton <akpm@linux-foundation.org>
To: Wupeng Ma <mawupeng1@huawei.com>
Cc: <mike.kravetz@oracle.com>, <david@redhat.com>,
	<joshua.hahnjy@gmail.com>, <muchun.song@linux.dev>,
	<linux-mm@kvack.org>, <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH] mm: hugetlb: Fix incorrect fallback for subpool
Date: Thu, 17 Apr 2025 20:03:27 -0700
Message-ID: <20250417200327.ef9d1aed59e198aa2c8b046e@linux-foundation.org>
In-Reply-To: <20250410062633.3102457-1-mawupeng1@huawei.com>

On Thu, 10 Apr 2025 14:26:33 +0800 Wupeng Ma <mawupeng1@huawei.com> wrote:

> During our testing with hugetlb subpool enabled, we observe that
> hstate->resv_huge_pages may underflow into negative values. Root cause
> analysis reveals a race condition in subpool reservation fallback handling
> as follows:
> 
> hugetlb_reserve_pages()
>     /* Attempt subpool reservation */
>     gbl_reserve = hugepage_subpool_get_pages(spool, chg);
> 
>     /* Global reservation may fail after subpool allocation */
>     if (hugetlb_acct_memory(h, gbl_reserve) < 0)
>         goto out_put_pages;
> 
> out_put_pages:
>     /* This incorrectly restores reservation to subpool */
>     hugepage_subpool_put_pages(spool, chg);
> 
> When hugetlb_acct_memory() fails after subpool allocation, the current
> implementation over-commits subpool reservations by returning the full
> 'chg' value instead of the actual allocated 'gbl_reserve' amount. This
> discrepancy propagates to global reservations during subsequent releases,
> eventually causing resv_huge_pages underflow.
> 
> This problem can be triggered easily with the following steps:
> 1. reserve hugepages for hugetlb allocation
> 2. mount hugetlbfs with min_size to enable the hugetlb subpool
> 3. allocate hugepages from two tasks (make sure the second fails due to
>    an insufficient number of hugepages)
> 4. wait a few seconds and repeat step 3, which will make
>    hstate->resv_huge_pages go below zero.
> 
> To fix this problem, return the correct number of pages to the subpool
> during the fallback after hugepage_subpool_get_pages() is called.
> 

This has been in mm-hotfixes since April 1.  Do we have any reviewers?

> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -3010,7 +3010,7 @@ struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
>  	struct hugepage_subpool *spool = subpool_vma(vma);
>  	struct hstate *h = hstate_vma(vma);
>  	struct folio *folio;
> -	long retval, gbl_chg;
> +	long retval, gbl_chg, gbl_reserve;
>  	map_chg_state map_chg;
>  	int ret, idx;
>  	struct hugetlb_cgroup *h_cg = NULL;
> @@ -3163,8 +3163,16 @@ struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
>  		hugetlb_cgroup_uncharge_cgroup_rsvd(idx, pages_per_huge_page(h),
>  						    h_cg);
>  out_subpool_put:
> -	if (map_chg)
> -		hugepage_subpool_put_pages(spool, 1);
> +	/*
> +	 * put page to subpool iff the quota of subpool's rsv_hpages is used
> +	 * during hugepage_subpool_get_pages.
> +	 */
> +	if (map_chg && !gbl_chg) {
> +		gbl_reserve = hugepage_subpool_put_pages(spool, 1);
> +		hugetlb_acct_memory(h, -gbl_reserve);
> +	}
> +
> +
>  out_end_reservation:
>  	if (map_chg != MAP_CHG_ENFORCED)
>  		vma_end_reservation(h, vma, addr);
> @@ -7216,7 +7224,7 @@ bool hugetlb_reserve_pages(struct inode *inode,
>  					struct vm_area_struct *vma,
>  					vm_flags_t vm_flags)
>  {
> -	long chg = -1, add = -1;
> +	long chg = -1, add = -1, spool_resv, gbl_resv;
>  	struct hstate *h = hstate_inode(inode);
>  	struct hugepage_subpool *spool = subpool_inode(inode);
>  	struct resv_map *resv_map;
> @@ -7351,8 +7359,16 @@ bool hugetlb_reserve_pages(struct inode *inode,
>  	return true;
>  
>  out_put_pages:
> -	/* put back original number of pages, chg */
> -	(void)hugepage_subpool_put_pages(spool, chg);
> +	spool_resv = chg - gbl_reserve;
> +	if (spool_resv) {
> +		/* put sub pool's reservation back, chg - gbl_reserve */
> +		gbl_resv = hugepage_subpool_put_pages(spool, spool_resv);
> +		/*
> +		 * subpool's reserved pages can not be put back due to race,
> +		 * return to hstate.
> +		 */
> +		hugetlb_acct_memory(h, -gbl_resv);
> +	}
>  out_uncharge_cgroup:
>  	hugetlb_cgroup_uncharge_cgroup_rsvd(hstate_index(h),
>  					    chg * pages_per_huge_page(h), h_cg);
> -- 
> 2.43.0
> 


Thread overview: 4+ messages
2025-04-10  6:26 Wupeng Ma
2025-04-18  3:03 ` Andrew Morton [this message]
2025-04-18  8:46   ` Oscar Salvador
2025-04-28  8:41 ` Oscar Salvador
