From: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
To: Mike Kravetz <mike.kravetz@oracle.com>
Cc: "linux-mm@kvack.org" <linux-mm@kvack.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
Michal Hocko <mhocko@kernel.org>,
"Andrea Arcangeli" <aarcange@redhat.com>,
"Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>,
Mel Gorman <mgorman@techsingularity.net>,
Davidlohr Bueso <dave@stgolabs.net>,
Andrew Morton <akpm@linux-foundation.org>,
"stable@vger.kernel.org" <stable@vger.kernel.org>
Subject: Re: [PATCH] huegtlbfs: fix page leak during migration of file pages
Date: Fri, 8 Feb 2019 02:31:32 +0000
Message-ID: <20190208023132.GA25778@hori1.linux.bs1.fc.nec.co.jp>
In-Reply-To: <917e7673-051b-e475-8711-ed012cff4c44@oracle.com>

On Thu, Feb 07, 2019 at 10:50:55AM -0800, Mike Kravetz wrote:
> On 1/30/19 1:14 PM, Mike Kravetz wrote:
> > Files can be created and mapped in an explicitly mounted hugetlbfs
> > filesystem. If pages in such files are migrated, the filesystem
> > usage will not be decremented for the associated pages. This can
> > result in mmap or page allocation failures as it appears there are
> > fewer pages in the filesystem than there should be.
>
> Does anyone have a little time to take a look at this?
>
> While migration of hugetlb pages 'should' not be a common issue, we
> have seen it happen via soft memory errors/page poisoning in production
> environments. Didn't see a leak in that case as it was with pages in a
> Sys V shared mem segment. However, our DB code is starting to make use
> of files in explicitly mounted hugetlbfs filesystems. Therefore, we are
> more likely to hit this bug in the field.

Hi Mike,

Thank you for finding and reporting the problem.
# Sorry for my late response.

>
> >
> > For example, a test program which hole punches, faults and migrates
> > pages in such a file (1G in size) will eventually fail because it
> > can not allocate a page. Reported counts and usage at time of failure:
> >
> > node0
> > 537 free_hugepages
> > 1024 nr_hugepages
> > 0 surplus_hugepages
> > node1
> > 1000 free_hugepages
> > 1024 nr_hugepages
> > 0 surplus_hugepages
> >
> > Filesystem Size Used Avail Use% Mounted on
> > nodev 4.0G 4.0G 0 100% /var/opt/hugepool
> >
> > Note that the filesystem shows 4G of pages used, while actual usage is
> > 511 pages (just under 1G). Failed trying to allocate page 512.
> >
> > If a hugetlb page is associated with an explicitly mounted filesystem,
> > this information is contained in the page_private field. At migration
> > time, this information is not preserved. To fix, simply transfer
> > page_private from old to new page at migration time if necessary. Also,
> > migrate_page_states() unconditionally clears page_private and PagePrivate
> > of the old page. It is unlikely, but possible that these fields could
> > be non-NULL and are needed at hugetlb free page time. So, do not touch
> > these fields for hugetlb pages.
> >
> > Cc: <stable@vger.kernel.org>
> > Fixes: 290408d4a250 ("hugetlb: hugepage migration core")
> > Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
> > ---
> > fs/hugetlbfs/inode.c | 10 ++++++++++
> > mm/migrate.c | 10 ++++++++--
> > 2 files changed, 18 insertions(+), 2 deletions(-)
> >
> > diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
> > index 32920a10100e..fb6de1db8806 100644
> > --- a/fs/hugetlbfs/inode.c
> > +++ b/fs/hugetlbfs/inode.c
> > @@ -859,6 +859,16 @@ static int hugetlbfs_migrate_page(struct address_space *mapping,
> > rc = migrate_huge_page_move_mapping(mapping, newpage, page);
> > if (rc != MIGRATEPAGE_SUCCESS)
> > return rc;
> > +
> > + /*
> > + * page_private is subpool pointer in hugetlb pages, transfer
> > + * if needed.
> > + */
> > + if (page_private(page) && !page_private(newpage)) {
> > + set_page_private(newpage, page_private(page));
> > + set_page_private(page, 0);

Don't you also need to copy the PagePrivate flag?

> > + }
> > +
> > if (mode != MIGRATE_SYNC_NO_COPY)
> > migrate_page_copy(newpage, page);
> > else
> > diff --git a/mm/migrate.c b/mm/migrate.c
> > index f7e4bfdc13b7..0d9708803553 100644
> > --- a/mm/migrate.c
> > +++ b/mm/migrate.c
> > @@ -703,8 +703,14 @@ void migrate_page_states(struct page *newpage, struct page *page)
> > */
> > if (PageSwapCache(page))
> > ClearPageSwapCache(page);
> > - ClearPagePrivate(page);
> > - set_page_private(page, 0);
> > + /*
> > + * Unlikely, but PagePrivate and page_private could potentially
> > + * contain information needed at hugetlb free page time.
> > + */
> > + if (!PageHuge(page)) {
> > + ClearPagePrivate(page);
> > + set_page_private(page, 0);
> > + }

# This comment is mainly about the existing code...

According to the comment on migrate_page():

  /*
   * Common logic to directly migrate a single LRU page suitable for
   * pages that do not use PagePrivate/PagePrivate2.
   *
   * Pages are locked upon entry and exit.
   */
  int migrate_page(struct address_space *mapping, ...

this common logic assumes that page_private is not used. So why do we
explicitly clear page_private in migrate_page_states()?

buffer_migrate_page(), which is commonly used when page_private is in
use, does that clearing outside migrate_page_states(). So I thought that
hugetlbfs_migrate_page() could handle it in a similar manner. IOW,
migrate_page_states() should not touch PagePrivate at all.

But there are a few other .migratepage callbacks, and I'm not sure all of
them are safe with that change, so this approach might not be suitable
for a small fix.

# BTW, there seems to be a typo in $SUBJECT.

Thanks,
Naoya Horiguchi