From: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
To: Mike Kravetz <mike.kravetz@oracle.com>
Cc: "linux-mm@kvack.org" <linux-mm@kvack.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Michal Hocko <mhocko@kernel.org>,
	"Andrea Arcangeli" <aarcange@redhat.com>,
	"Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>,
	Mel Gorman <mgorman@techsingularity.net>,
	Davidlohr Bueso <dave@stgolabs.net>,
	Andrew Morton <akpm@linux-foundation.org>,
	"stable@vger.kernel.org" <stable@vger.kernel.org>
Subject: Re: [PATCH] huegtlbfs: fix page leak during migration of file pages
Date: Tue, 12 Feb 2019 02:24:28 +0000
Message-ID: <20190212022428.GA12369@hori1.linux.bs1.fc.nec.co.jp>
In-Reply-To: <ffe58925-a301-6791-44d5-e3bec7f9ebf3@oracle.com>

On Mon, Feb 11, 2019 at 03:06:27PM -0800, Mike Kravetz wrote:
> On 2/7/19 11:31 PM, Naoya Horiguchi wrote:
> > On Thu, Feb 07, 2019 at 09:50:30PM -0800, Mike Kravetz wrote:
> >> On 2/7/19 6:31 PM, Naoya Horiguchi wrote:
> >>> On Thu, Feb 07, 2019 at 10:50:55AM -0800, Mike Kravetz wrote:
> >>>> On 1/30/19 1:14 PM, Mike Kravetz wrote:
> >>>>> +++ b/fs/hugetlbfs/inode.c
> >>>>> @@ -859,6 +859,16 @@ static int hugetlbfs_migrate_page(struct address_space *mapping,
> >>>>>  	rc = migrate_huge_page_move_mapping(mapping, newpage, page);
> >>>>>  	if (rc != MIGRATEPAGE_SUCCESS)
> >>>>>  		return rc;
> >>>>> +
> >>>>> +	/*
> >>>>> +	 * page_private is subpool pointer in hugetlb pages, transfer
> >>>>> +	 * if needed.
> >>>>> +	 */
> >>>>> +	if (page_private(page) && !page_private(newpage)) {
> >>>>> +		set_page_private(newpage, page_private(page));
> >>>>> +		set_page_private(page, 0);
> >>>
> >>> You don't have to copy PagePrivate flag?
> >>>
> >>
> >> Well my original thought was no.  For hugetlb pages, PagePrivate is not
> >> associated with page_private.  It indicates a reservation was consumed.
> >> It is set when a hugetlb page is newly allocated and the allocation is
> >> associated with a reservation and the global reservation count is
> >> decremented.  When the page is added to the page cache or rmap,
> >> PagePrivate is cleared.  If the page is freed before being added to page
> >> cache or rmap, PagePrivate tells free_huge_page to restore (increment) the
> >> reserve count as we did not 'instantiate' the page.
> >>
> >> So, PagePrivate is only set from the time a huge page is allocated until
> >> it is added to page cache or rmap.  My original thought was that the page
> >> could not be migrated during this time.  However, I am not sure if that
> >> reasoning is correct.  The page is not locked, so it would appear that it
> >> could be migrated?  But, if it can be migrated at this time then perhaps
> >> there are bigger issues for the (hugetlb) page fault code?
> > 
> > In my understanding, free hugetlb pages are not expected to be passed to
> > migrate_pages(), and currently that's ensured by each migration caller
> > which checks and avoids free hugetlb pages on its own.
> > migrate_pages() and its internal code are probably not aware of handling
> > free hugetlb pages, so if they are accidentally passed to migration code,
> > that's a big problem as you are concerned.
> > So the above reasoning should hold, at least as long as this assumption
> > is correct.
> > 
> > Most migration callers are not interested in moving free hugepages.
> > The one I'm not sure of is the code path from alloc_contig_range().
> > If someone thinks it's worthwhile to migrate free hugepages to get bigger
> > contiguous memory and enables that code path, the assumption will be
> > broken.
> 
> You are correct.  We do not migrate free huge pages.  I was thinking more
> about problems if we migrate a page while it is being added to a task's page
> table as in hugetlb_no_page.
> 
> Commit bcc54222309c ("mm: hugetlb: introduce page_huge_active") addresses
> this issue, but I believe there is a bug in the implementation.
> isolate_huge_page contains this test:
> 
> 	if (!page_huge_active(page) || !get_page_unless_zero(page)) {
> 		ret = false;
> 		goto unlock;
> 	}
> 
> If the condition is not met, then the huge page can be isolated and migrated.
> 
> In hugetlb_no_page, there is this block of code:
> 
>                 page = alloc_huge_page(vma, haddr, 0);
>                 if (IS_ERR(page)) {
>                         ret = vmf_error(PTR_ERR(page));
>                         goto out;
>                 }
>                 clear_huge_page(page, address, pages_per_huge_page(h));
>                 __SetPageUptodate(page);
>                 set_page_huge_active(page);
> 
>                 if (vma->vm_flags & VM_MAYSHARE) {
>                         int err = huge_add_to_page_cache(page, mapping, idx);
>                         if (err) {
>                                 put_page(page);
>                                 if (err == -EEXIST)
>                                         goto retry;
>                                 goto out;
>                         }
>                 } else {
>                         lock_page(page);
>                         if (unlikely(anon_vma_prepare(vma))) {
>                                 ret = VM_FAULT_OOM;
>                                 goto backout_unlocked;
>                         }
>                         anon_rmap = 1;
>                 }
>         } else {
> 
> Note that we call set_page_huge_active BEFORE locking the page.  This
> means that we can isolate the page and have migration take place while
> we continue to add the page to page tables.  I was able to make this
> happen by adding a udelay() after set_page_huge_active to simulate worst
> case scheduling behavior.  It resulted in VM_BUG_ON while unlocking page.
> My test had several threads faulting in huge pages.  Another thread was
> offlining the memory blocks forcing migration.

This shows another problem, so I agree we need a fix.

> 
> To fix this, we need to delay the set_page_huge_active call until after
> the page is locked.  I am testing a patch with this change.  Perhaps we
> should even delay calling set_page_huge_active until we know there are
> no errors and we know the page is actually in page tables?

Yes, calling set_page_huge_active after the page table is set up sounds good
to me.
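
To make the ordering concrete, here is a rough userspace model (not kernel
code) of the fault path under both orderings.  Everything prefixed with toy_
is made up for illustration.  Only the isolate check mirrors the test quoted
above, and the "probe" in the middle just stands in for a concurrent
isolate_huge_page() call landing in the window before the page is locked.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for struct page: only the state discussed in this thread. */
struct toy_page {
	bool locked;		/* models PageLocked */
	bool huge_active;	/* models page_huge_active() */
	bool mapped;		/* models "page is in the page tables" */
	int refcount;
};

/* Models the test in isolate_huge_page(): refuse inactive or free pages. */
static bool toy_isolate_huge_page(struct toy_page *page)
{
	if (!page->huge_active || page->refcount == 0)
		return false;
	page->refcount++;	/* get_page_unless_zero() succeeded */
	return true;
}

/*
 * Toy fault path.  Returns true if a migration probe could isolate the
 * page before it was locked and mapped, i.e. if the race window is open.
 */
static bool toy_fault_in(struct toy_page *page, bool active_before_lock)
{
	bool isolated_in_window;

	assert(!page->locked);		/* stand-in for a VM_BUG_ON */
	page->refcount = 1;		/* freshly allocated */
	if (active_before_lock)
		page->huge_active = true;	/* current ordering */

	/* Hypothetical concurrent isolation attempt hits here. */
	isolated_in_window = toy_isolate_huge_page(page);

	page->locked = true;
	page->mapped = true;		/* page table entry set up */
	if (!active_before_lock)
		page->huge_active = true;	/* proposed ordering */
	page->locked = false;

	return isolated_in_window;
}
```

With active_before_lock set, the probe succeeds while the page is still being
set up.  With it clear, the probe fails and the page only becomes isolatable
once it is mapped and unlocked.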

> 
> While looking at this, I think there is another issue.  When a hugetlb
> page is migrated, we do not migrate the 'page_huge_active' state of the
> page.  That should be moved as the page is migrated.  Correct?

Yes, and I think that putback_active_hugepage(new_hpage), called as the last
step of the migration sequence, handles the copying of 'page_huge_active' state.
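
As a rough illustration of what I mean, here is a userspace model (again not
kernel code, and the toy_ names are hypothetical).  It assumes that
putback_active_hugepage() marks the page active and drops the reference held
for migration.  The list handling and locking of the real function are left
out, and the old page's state is deliberately not touched.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a hugetlb head page. */
struct toy_hpage {
	bool huge_active;	/* models page_huge_active() */
	int refcount;
};

/*
 * Model of putback_active_hugepage(): mark the page active again and
 * drop the reference that was held while it was off the active list.
 */
static void toy_putback_active_hugepage(struct toy_hpage *page)
{
	assert(page->refcount > 0);
	page->huge_active = true;	/* set_page_huge_active() */
	page->refcount--;		/* put_page() */
}

/*
 * Toy tail of a successful migration: the new page has the data and the
 * mapping by now, so putting it back is what makes it "active".
 */
static void toy_finish_migration(struct toy_hpage *new_hpage)
{
	toy_putback_active_hugepage(new_hpage);
}
```

So the state is not copied bit-for-bit from the old page.  The new page simply
ends up active because it is put back through the same helper.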

Thanks,
Naoya Horiguchi
