linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Mike Kravetz <mike.kravetz@oracle.com>
To: Miaohe Lin <linmiaohe@huawei.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Muchun Song <songmuchun@bytedance.com>,
	Linux-MM <linux-mm@kvack.org>,
	linux-kernel <linux-kernel@vger.kernel.org>
Subject: Re: [bug report] mm/hugetlb: various bugs with avoid_reserve case in alloc_huge_page()
Date: Fri, 19 Aug 2022 12:11:12 -0700	[thread overview]
Message-ID: <Yv/gUCt+ayp/KSoO@monkey> (raw)
In-Reply-To: <5b1b60d6-e699-2330-0b6f-14c8dd5d78d4@huawei.com>

On 08/19/22 15:20, Miaohe Lin wrote:
> On 2022/8/19 6:43, Mike Kravetz wrote:
> > On 08/17/22 16:31, Miaohe Lin wrote:
> >> Hi all:
> >>     When I investigate the mm/hugetlb.c code again, I found there are a few possible issues
> >> with avoid_reserve case. (It's really hard to follow the relevant code for me.) Please take
> >> a look at the below analysis:
> > 
> > Thank you for taking a close look at this code!
> > 
> > I agree that the code is hard to follow.  I have spent many hours/days/weeks
> > chasing down the cause of incorrect reservation counts.  I imagine there could
> > be more issues, especially when you add the uncommon avoid_reserve and
> > MAP_NORESERVE processing.
> 
> Many thanks for your time and reply, Mike!

Well, hugetlb reservations interrupted my sleep again :)  See below.

> > 
> >> 1.avoid_reserve issue with h->resv_huge_pages in alloc_huge_page.
> > 
> > Did you actually see this issue, or is it just based on code inspection?
> 
> No, it's based on code inspection. ;)
> 
> > I tried to recreate, but could not.  When looking closer, this may not
> > even be possible.
> > 
> >>     Assume:
> >> 	h->free_huge_pages 60
> >> 	h->resv_huge_pages 30
> >> 	spool->rsv_hpages  30
> > 
> > OK.
> > 
> >>
> >>     When avoid_reserve is true, after alloc_huge_page(), we will have:
> > 
> > Take a close look at the calling paths for alloc_huge_page when avoid_reserve
> > is true.  There are only two such call paths.
> > 1) copy_hugetlb_page_range - We allocate pages in the 'early COW' processing.
> >    In such cases, the pages are private and not associated with a file, or
> >    filesystem or subpool (spool).  Therefore, there should be no spool
> >    modifications.
> 
> Agree.
> 
> > 2) hugetlb_wp (formerly called hugetlb_cow) - Again, we are allocating a
> >    private page and should not be modifying spool.
> 
> Agree.
> 
> > 
> > If the above is correct, then we will not modify spool->rsv_hpages which
> > leads to the inconsistent results.
> 
> I missed to verify whether spool will be modified in avoid_reserve case. Sorry about that.
> 

That is how it SHOULD work.  However, there is a problem with a MAP_PRIVATE
mapping of a hugetlb file.  In this case, subpool_vma will return the
subpool associated with the file, and we will end up with a leaked reservation
as in your example.  I have verified with test code.

The first thought I had is that something like the following should be added.

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 474bfbe9929e..5aa19574e890 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -254,7 +258,9 @@ static inline struct hugepage_subpool *subpool_inode(struct inode *inode)
 
 static inline struct hugepage_subpool *subpool_vma(struct vm_area_struct *vma)
 {
-	return subpool_inode(file_inode(vma->vm_file));
+	if (vma->vm_flags & (VM_MAYSHARE | VM_SHARED))
+		return subpool_inode(file_inode(vma->vm_file));
+	return NULL;		/* no subpool for private mappings */
 }
 
 /* Helper that removes a struct file_region from the resv_map cache and returns


That certainly addresses the MAP_PRIVATE mapping of a hugetlb file issue.
I will collect up patches for issues we discover and submit together.

> > It is confusing that MAP_NORESERVE does not imply avoid_reserve will be
> > passed to alloc_huge_page.
> 
> It's introduced to guarantee that COW faults for a process that called mmap(MAP_PRIVATE) will succeed via commit
> 04f2cbe35699 ("hugetlb: guarantee that COW faults for a process that called mmap(MAP_PRIVATE) on hugetlbfs will succeed").
> It seems it has nothing to do with MAP_NORESERVE.
> 
> > 
> >> 	spool->rsv_hpages  29 /* hugepage_subpool_get_pages decreases it. */
> >> 	h->free_huge_pages 59
> >> 	h->resv_huge_pages 30 /* rsv_hpages is used, but *h->resv_huge_pages is not modified accordingly*. */
> >>
> >>     If the hugetlb page is freed later, we will have:
> >> 	spool->rsv_hpages  30 /* hugepage_subpool_put_pages increases it. */
> >> 	h->free_huge_pages 60
> >> 	h->resv_huge_pages 31 /* *increased wrongly* due to hugepage_subpool_put_pages(spool, 1) == 0. */
> >> 			   ^^
> >>
> > 
> > I'll take a closer look at 2 and 3 when we determine if 1 is a possible
> > issue or not.
> 
> I want to propose removing the avoid_reserve code. When called from above case 1) or 2), vma_needs_reservation()
> will always return 1 as there's no reservation for it. Also hugepage_subpool_get_pages() will always return 1 as
> it's not associated with a spool. So when avoid_reserve == true, map_chg and gbl_chg must be 1 and vma_has_reserves()
> will always return "false". As a result, passing in avoid_reserve == true will do nothing in fact. So it can be simply
> removed. Or am I miss something again?

I will take a closer look.  But, at a high level if avoid_reserve == true and
all pages are reserved we must fail the allocation or attempt dynamic
allocation if overcommit is allowed.  So, it seems we at least need the
flag to make this decision.
-- 
Mike Kravetz

> 
> If avoid_reserve code can be removed, below issue 2 and 3 won't be possible as they rely on avoid_reserve doing its work.
> 
> Thanks!
> Miaohe Lin


  reply	other threads:[~2022-08-19 19:11 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-08-17  8:31 Miaohe Lin
2022-08-18 22:43 ` Mike Kravetz
2022-08-19  7:20   ` Miaohe Lin
2022-08-19 19:11     ` Mike Kravetz [this message]
2022-08-22  3:13       ` Miaohe Lin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Yv/gUCt+ayp/KSoO@monkey \
    --to=mike.kravetz@oracle.com \
    --cc=akpm@linux-foundation.org \
    --cc=linmiaohe@huawei.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=songmuchun@bytedance.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox