Re: [PATCH v3 2/5] mm/hugetlb: take page table lock in follow_huge_pmd()

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

From: Hugh Dickins <hughd@google.com>
To: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Hugh Dickins <hughd@google.com>,
	David Rientjes <rientjes@google.com>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	Naoya Horiguchi <nao.horiguchi@gmail.com>,
	stable@vger.kernel.org
Subject: Re: [PATCH v3 2/5] mm/hugetlb: take page table lock in follow_huge_pmd()
Date: Mon, 29 Sep 2014 21:32:20 -0700 (PDT)	[thread overview]
Message-ID: <alpine.LSU.2.11.1409292041540.4640@eggly.anvils> (raw)
In-Reply-To: <1410820799-27278-3-git-send-email-n-horiguchi@ah.jp.nec.com>

On Mon, 15 Sep 2014, Naoya Horiguchi wrote:
> We have a race condition between move_pages() and freeing hugepages,

I've been looking through these 5 today, and they're much better now,
thank you.  But a new concern below, and a minor correction to 3/5.

> --- mmotm-2014-09-09-14-42.orig/mm/gup.c
> +++ mmotm-2014-09-09-14-42/mm/gup.c
> @@ -162,33 +162,16 @@ struct page *follow_page_mask(struct vm_area_struct *vma,
>  
>  	pmd = pmd_offset(pud, address);
>  	if (pmd_none(*pmd))
>  		return no_page_table(vma, flags);
> -	if (pmd_huge(*pmd) && vma->vm_flags & VM_HUGETLB) {
> -		page = follow_huge_pmd(mm, address, pmd, flags & FOLL_WRITE);
> -		if (flags & FOLL_GET) {
> -			/*
> -			 * Refcount on tail pages are not well-defined and
> -			 * shouldn't be taken. The caller should handle a NULL
> -			 * return when trying to follow tail pages.
> -			 */
> -			if (PageHead(page))
> -				get_page(page);
> -			else
> -				page = NULL;
> -		}
> -		return page;
> -	}
> +	if (pmd_huge(*pmd) && vma->vm_flags & VM_HUGETLB)
> +		return follow_huge_pmd(mm, address, pmd, flags);

This code here allows for pmd_none() and pmd_huge(), and for pmd_numa()
and pmd_trans_huge() below; but makes no explicit allowance for !present
migration and hwpoison entries.

Is it assumed that the pmd_bad() test in follow_page_pte() will catch
those?  But what of races? migration entries are highly volatile.  And
is it assumed that a migration entry cannot pass the pmd_huge() test?

That may be true of x86 today, I'm not certain; but if the soft-dirty
people catch up with the hugetlb-migration people, that might change
(they #define _PAGE_SWP_SOFT_DIRTY _PAGE_PSE).

Why pmd_huge() does not itself test for present, I cannot say; but it
probably didn't matter at all before hwpoison and migration were added.

Mind you, with __get_user_pages()'s is_vm_hugtlb_page() test avoiding
all this code, maybe the only thing that can stumble here is your own
hugetlb migration code; but that appears to be guarded only by
down_read of mmap_sem, so races would be possible (if userspace
is silly enough or malicious enough to do so).

What we have here today looks too fragile to me, but it's probably
best dealt with by a separate patch.

Or I may be over-anxious, and there may be something "obvious"
that I'm missing, which saves us from further change.

>  	if ((flags & FOLL_NUMA) && pmd_numa(*pmd))
>  		return no_page_table(vma, flags);
>  	if (pmd_trans_huge(*pmd)) {
> diff --git mmotm-2014-09-09-14-42.orig/mm/hugetlb.c mmotm-2014-09-09-14-42/mm/hugetlb.c
> index 34351251e164..941832ee3d5a 100644
> --- mmotm-2014-09-09-14-42.orig/mm/hugetlb.c
> +++ mmotm-2014-09-09-14-42/mm/hugetlb.c
> @@ -3668,26 +3668,34 @@ follow_huge_addr(struct mm_struct *mm, unsigned long address,
>  
>  struct page * __weak
>  follow_huge_pmd(struct mm_struct *mm, unsigned long address,
> -		pmd_t *pmd, int write)
> +		pmd_t *pmd, int flags)
>  {
> -	struct page *page;
> +	struct page *page = NULL;
> +	spinlock_t *ptl;
>  
> -	page = pte_page(*(pte_t *)pmd);
> -	if (page)
> -		page += ((address & ~PMD_MASK) >> PAGE_SHIFT);
> +	ptl = pmd_lockptr(mm, pmd);
> +	spin_lock(ptl);
> +
> +	if (!pmd_huge(*pmd))
> +		goto out;

And similarly here.  Though at least here we now have the necessary
lock, so it's no longer racy, and maybe this pmd_huge() test just needs
to be replaced by a pmd_present() test?  Or are both needed?

> +
> +	page = pte_page(*(pte_t *)pmd) + ((address & ~PMD_MASK) >> PAGE_SHIFT);
> +
> +	if (flags & FOLL_GET)
> +		get_page(page);
> +out:
> +	spin_unlock(ptl);
>  	return page;
>  }

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

next prev parent reply	other threads:[~2014-09-30  4:34 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-09-15 22:39 [PATCH 0/5] hugepage migration fixes (v4) Naoya Horiguchi
2014-09-15 22:39 ` [PATCH v3 1/5] mm/hugetlb: reduce arch dependent code around follow_huge_* Naoya Horiguchi
2014-09-16 15:56   ` Naoya Horiguchi
2014-09-15 22:39 ` [PATCH v3 2/5] mm/hugetlb: take page table lock in follow_huge_pmd() Naoya Horiguchi
2014-09-30  4:32   ` Hugh Dickins [this message]
2014-10-07 16:56     ` Naoya Horiguchi
2014-09-15 22:39 ` [PATCH v3 3/5] mm/hugetlb: fix getting refcount 0 page in hugetlb_fault() Naoya Horiguchi
2014-09-30  4:52   ` Hugh Dickins
2014-10-07 16:56     ` Naoya Horiguchi
2014-09-15 22:39 ` [PATCH v3 4/5] mm/hugetlb: add migration/hwpoisoned entry check in hugetlb_change_protection Naoya Horiguchi
2014-09-15 22:39 ` [PATCH v3 5/5] mm/hugetlb: add migration entry check in __unmap_hugepage_range Naoya Horiguchi

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=alpine.LSU.2.11.1409292041540.4640@eggly.anvils \
    --to=hughd@google.com \
    --cc=akpm@linux-foundation.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=n-horiguchi@ah.jp.nec.com \
    --cc=nao.horiguchi@gmail.com \
    --cc=rientjes@google.com \
    --cc=stable@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox