From: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
To: Hugh Dickins <hughd@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
David Rientjes <rientjes@google.com>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org,
Naoya Horiguchi <nao.horiguchi@gmail.com>
Subject: Re: [PATCH v3 3/6] mm/hugetlb: fix getting refcount 0 page in hugetlb_fault()
Date: Fri, 5 Sep 2014 01:28:27 -0400 [thread overview]
Message-ID: <20140905052827.GB6883@nhori.redhat.com> (raw)
In-Reply-To: <alpine.LSU.2.11.1409031649190.10884@eggly.anvils>
On Wed, Sep 03, 2014 at 05:20:59PM -0700, Hugh Dickins wrote:
> On Thu, 28 Aug 2014, Naoya Horiguchi wrote:
>
> > When running the test which causes the race as shown in the previous patch,
> > we can hit the BUG "get_page() on refcount 0 page" in hugetlb_fault().
> >
> > This race happens when pte turns into migration entry just after the first
> > check of is_hugetlb_entry_migration() in hugetlb_fault() passed with false.
> > To fix this, we need to check pte_present() again with holding ptl.
> >
> > This patch also reorders taking ptl and doing pte_page(), because pte_page()
> > should be done in ptl. Due to this reordering, we need use trylock_page()
> > in page != pagecache_page case to respect locking order.
> >
> > ChangeLog v3:
> > - doing pte_page() and taking refcount under page table lock
> > - check pte_present after taking ptl, which makes it unnecessary to use
> > get_page_unless_zero()
> > - use trylock_page in page != pagecache_page case
> > - fixed target stable version
>
> ChangeLog vN below the --- (or am I contradicting some other advice?)
no, this is a practical advice.
> >
> > Fixes: 66aebce747ea ("hugetlb: fix race condition in hugetlb_fault()")
> > Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
> > Cc: <stable@vger.kernel.org> # [3.2+]
>
> One bug, one warning, a couple of suboptimals...
>
> > ---
> > mm/hugetlb.c | 32 ++++++++++++++++++--------------
> > 1 file changed, 18 insertions(+), 14 deletions(-)
> >
> > diff --git mmotm-2014-08-25-16-52.orig/mm/hugetlb.c mmotm-2014-08-25-16-52/mm/hugetlb.c
> > index c5345c5edb50..2aafe073cb06 100644
> > --- mmotm-2014-08-25-16-52.orig/mm/hugetlb.c
> > +++ mmotm-2014-08-25-16-52/mm/hugetlb.c
> > @@ -3184,6 +3184,15 @@ int hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
> > vma, address);
> > }
> >
> > + ptl = huge_pte_lock(h, mm, ptep);
> > +
> > + /* Check for a racing update before calling hugetlb_cow */
> > + if (unlikely(!pte_same(entry, huge_ptep_get(ptep))))
> > + goto out_ptl;
> > +
> > + if (!pte_present(entry))
> > + goto out_ptl;
>
> A comment on that test would be helpful. Is a migration entry
> the only !pte_present() case you would expect to find there?
No, we can have the same race with hwpoisoned entry, although it's
very rare.
> It would be better to test "entry" for this (or for being a migration
> entry) higher up, just after getting "entry": less to unwind on error.
Right, thanks.
> And better to call migration_entry_wait_huge(), after dropping locks,
> before returning 0, so that we don't keep the cpu busy faulting while
> the migration entry remains there. Maybe not important, but better.
OK.
> Probably best done with a goto unwinding code at end of function.
>
> (Whereas we don't worry about "wait"s in the !pte_same case,
> because !pte_same indicates that change is already occurring:
> it's prolonged pte_same cases that we want to get away from.)
>
> > +
> > /*
> > * hugetlb_cow() requires page locks of pte_page(entry) and
> > * pagecache_page, so here we need take the former one
> > @@ -3192,22 +3201,17 @@ int hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
> > * so no worry about deadlock.
> > */
> > page = pte_page(entry);
> > - get_page(page);
> > if (page != pagecache_page)
> > - lock_page(page);
> > -
> > - ptl = huge_pte_lockptr(h, mm, ptep);
> > - spin_lock(ptl);
> > - /* Check for a racing update before calling hugetlb_cow */
> > - if (unlikely(!pte_same(entry, huge_ptep_get(ptep))))
> > - goto out_ptl;
> > + if (!trylock_page(page))
> > + goto out_ptl;
>
> And, again to avoid keeping the cpu busy refaulting, it would be better
> to wait_on_page_locked(), after dropping locks, before returning 0;
> probably best done with another goto end of function.
OK.
> >
> > + get_page(page);
> >
> > if (flags & FAULT_FLAG_WRITE) {
> > if (!huge_pte_write(entry)) {
> > ret = hugetlb_cow(mm, vma, address, ptep, entry,
> > pagecache_page, ptl);
> > - goto out_ptl;
> > + goto out_put_page;
> > }
> > entry = huge_pte_mkdirty(entry);
> > }
> > @@ -3215,7 +3219,11 @@ int hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
> > if (huge_ptep_set_access_flags(vma, address, ptep, entry,
> > flags & FAULT_FLAG_WRITE))
> > update_mmu_cache(vma, address, ptep);
> > -
> > +out_put_page:
> > + put_page(page);
>
> If I'm reading this correctly, there's now a small but nasty chance that
> this put_page will be the one which frees the page, and the unlock_page
> below will then be unlocking a freed page. Our "Bad page" checks should
> detect that case, so it won't be as serious as unlocking someone else's
> page; but you still should avoid that possibility.
I shouldn't change the order of put_page and unlock_page.
>
> > +out_unlock_page:
>
> mm/hugetlb.c:3231:1: warning: label `out_unlock_page' defined but not used [-Wunused-label]
Sorry, I fix it.
> > + if (page != pagecache_page)
> > + unlock_page(page);
> > out_ptl:
> > spin_unlock(ptl);
> >
> > @@ -3223,10 +3231,6 @@ int hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
> > unlock_page(pagecache_page);
> > put_page(pagecache_page);
> > }
> > - if (page != pagecache_page)
> > - unlock_page(page);
> > - put_page(page);
> > -
> > out_mutex:
> > mutex_unlock(&htlb_fault_mutex_table[hash]);
> > return ret;
> > --
> > 1.9.3
>
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2014-09-05 6:03 UTC|newest]
Thread overview: 24+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-08-29 1:38 [PATCH 0/6] hugepage migration fixes (v3) Naoya Horiguchi
2014-08-29 1:38 ` [PATCH v3 1/6] mm/hugetlb: reduce arch dependent code around follow_huge_* Naoya Horiguchi
2014-09-03 19:40 ` Hugh Dickins
2014-08-29 1:38 ` [PATCH v3 2/6] mm/hugetlb: take page table lock in follow_huge_(addr|pmd|pud)() Naoya Horiguchi
2014-09-03 21:17 ` Hugh Dickins
2014-09-05 5:27 ` Naoya Horiguchi
2014-09-08 7:13 ` Hugh Dickins
2014-09-08 21:37 ` Naoya Horiguchi
2014-09-09 19:03 ` Hugh Dickins
2014-09-30 16:54 ` Kirill A. Shutemov
2014-08-29 1:38 ` [PATCH v3 3/6] mm/hugetlb: fix getting refcount 0 page in hugetlb_fault() Naoya Horiguchi
2014-09-04 0:20 ` Hugh Dickins
2014-09-05 5:28 ` Naoya Horiguchi [this message]
2014-08-29 1:38 ` [PATCH v3 4/6] mm/hugetlb: add migration entry check in hugetlb_change_protection Naoya Horiguchi
2014-09-04 1:06 ` Hugh Dickins
2014-09-05 5:28 ` Naoya Horiguchi
2014-08-29 1:38 ` [PATCH v3 5/6] mm/hugetlb: add migration entry check in __unmap_hugepage_range Naoya Horiguchi
2014-09-04 1:47 ` Hugh Dickins
2014-09-05 5:28 ` Naoya Horiguchi
2014-08-29 1:39 ` [PATCH v3 6/6] mm/hugetlb: remove unused argument of follow_huge_addr() Naoya Horiguchi
2014-09-03 21:26 ` Hugh Dickins
2014-09-05 5:29 ` Naoya Horiguchi
2014-08-31 15:27 ` [PATCH 0/6] hugepage migration fixes (v3) Andi Kleen
2014-09-01 4:08 ` Naoya Horiguchi
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20140905052827.GB6883@nhori.redhat.com \
--to=n-horiguchi@ah.jp.nec.com \
--cc=akpm@linux-foundation.org \
--cc=hughd@google.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=nao.horiguchi@gmail.com \
--cc=rientjes@google.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox