Re: [PATCH 2/3] mm, hugetlbfs: fix rmapping for anonymous hugepages with page_pgoff()

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

From: Andrew Morton <akpm@linux-foundation.org>
To: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Sasha Levin <sasha.levin@oracle.com>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	Rik van Riel <riel@redhat.com>
Subject: Re: [PATCH 2/3] mm, hugetlbfs: fix rmapping for anonymous hugepages with page_pgoff()
Date: Thu, 27 Feb 2014 13:19:57 -0800	[thread overview]
Message-ID: <20140227131957.d81cf9a643f4d3fd6b8d8b16@linux-foundation.org> (raw)
In-Reply-To: <1393475977-3381-3-git-send-email-n-horiguchi@ah.jp.nec.com>

On Wed, 26 Feb 2014 23:39:36 -0500 Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> wrote:

> page->index stores pagecache index when the page is mapped into file mapping
> region, and the index is in pagecache size unit, so it depends on the page
> size. Some of users of reverse mapping obviously assumes that page->index
> is in PAGE_CACHE_SHIFT unit, so they don't work for anonymous hugepage.
> 
> For example, consider that we have 3-hugepage vma and try to mbind the 2nd
> hugepage to migrate to another node. Then the vma is split and migrate_page()
> is called for the 2nd hugepage (belonging to the middle vma.)
> In migrate operation, rmap_walk_anon() tries to find the relevant vma to
> which the target hugepage belongs, but here we miscalculate pgoff.
> So anon_vma_interval_tree_foreach() grabs invalid vma, which fires VM_BUG_ON.
> 
> This patch introduces a new API that is usable both for normal page and
> hugepage to get PAGE_SIZE offset from page->index. Users should clearly
> distinguish page_index for pagecache index and page_pgoff for page offset.

So this patch is really independent of the page-walker changes, but the
page walker changes need it.  So it is appropriate that this patch be
staged before that series, and separately.  Agree?

> Reported-by: Sasha Levin <sasha.levin@oracle.com> # if the reported problem is fixed
> Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
> Cc: stable@vger.kernel.org # 3.12+

Why cc:stable?  This problem will only cause runtime issues when the
page walker patches are present?

> index 4f591df66778..a8bd14f42032 100644
> --- next-20140220.orig/include/linux/pagemap.h
> +++ next-20140220/include/linux/pagemap.h
> @@ -316,6 +316,19 @@ static inline loff_t page_file_offset(struct page *page)
>  	return ((loff_t)page_file_index(page)) << PAGE_CACHE_SHIFT;
>  }
>  
> +extern pgoff_t hugepage_pgoff(struct page *page);
> +
> +/*
> + * page->index stores pagecache index whose unit is not always PAGE_SIZE.
> + * This function converts it into PAGE_SIZE offset.
> + */
> +#define page_pgoff(page)					\
> +({								\
> +	unlikely(PageHuge(page)) ?				\
> +		hugepage_pgoff(page) :				\
> +		page->index >> (PAGE_CACHE_SHIFT - PAGE_SHIFT);	\
> +})

- I don't think this needs to be implemented in a macro?  Can we do
  it in good old C?

- Is PageHuge() the appropriate test?
  /*
   * PageHuge() only returns true for hugetlbfs pages, but not for normal or
   * transparent huge pages.  See the PageTransHuge() documentation for more
   * details.
   */

- Should page->index be shifted right or left?  Probably left - I
  doubt if PAGE_CACHE_SHIFT will ever be less than PAGE_SHIFT.

- I'm surprised we don't have a general what-is-this-page's-order
  function, so you can just do

	static inline pgoff_t page_pgoff(struct page *page)
	{
		return page->index << page_size_order(page);
	}

  And I think this would be a better implementation, as the (new)
  page_size_order() could be used elsewhere.

  page_size_order() would be a crappy name - can't think of anything
  better at present.

> --- next-20140220.orig/mm/memory-failure.c
> +++ next-20140220/mm/memory-failure.c
> @@ -404,7 +404,7 @@ static void collect_procs_anon(struct page *page, struct list_head *to_kill,
>  	if (av == NULL)	/* Not actually mapped anymore */
>  		return;
>  
> -	pgoff = page->index << (PAGE_CACHE_SHIFT - PAGE_SHIFT);

See this did a left shift.

> +	pgoff = page_pgoff(page);
>  	read_lock(&tasklist_lock);
>  	for_each_process (tsk) {
>  		struct anon_vma_chain *vmac;
> @@ -437,7 +437,7 @@ static void collect_procs_file(struct page *page, struct list_head *to_kill,
>  	mutex_lock(&mapping->i_mmap_mutex);
>  	read_lock(&tasklist_lock);
>  	for_each_process(tsk) {
> -		pgoff_t pgoff = page->index << (PAGE_CACHE_SHIFT - PAGE_SHIFT);

him too.

> +		pgoff_t pgoff = page_pgoff(page);
>  
>  		if (!task_early_kill(tsk))
>  			continue;
> diff --git next-20140220.orig/mm/rmap.c next-20140220/mm/rmap.c
> index 9056a1f00b87..78405051474a 100644
> --- next-20140220.orig/mm/rmap.c
> +++ next-20140220/mm/rmap.c
> @@ -515,11 +515,7 @@ void page_unlock_anon_vma_read(struct anon_vma *anon_vma)
>  static inline unsigned long
>  __vma_address(struct page *page, struct vm_area_struct *vma)
>  {
> -	pgoff_t pgoff = page->index << (PAGE_CACHE_SHIFT - PAGE_SHIFT);

And him.

> -
> -	if (unlikely(is_vm_hugetlb_page(vma)))
> -		pgoff = page->index << huge_page_order(page_hstate(page));
> -
> +	pgoff_t pgoff = page_pgoff(page);
>  	return vma->vm_start + ((pgoff - vma->vm_pgoff) << PAGE_SHIFT);
>  }
>  
> @@ -1598,7 +1594,7 @@ static struct anon_vma *rmap_walk_anon_lock(struct page *page,
>  static int rmap_walk_anon(struct page *page, struct rmap_walk_control *rwc)
>  {
>  	struct anon_vma *anon_vma;
> -	pgoff_t pgoff = page->index << (PAGE_CACHE_SHIFT - PAGE_SHIFT);

again.

> +	pgoff_t pgoff = page_pgoff(page);
>  	struct anon_vma_chain *avc;
>  	int ret = SWAP_AGAIN;

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

next prev parent reply	other threads:[~2014-02-27 21:20 UTC|newest]

Thread overview: 28+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-02-27  4:39 [PATCH 0/3] fixes on page table walker and hugepage rmapping Naoya Horiguchi
2014-02-27  4:39 ` [PATCH 1/3] mm/pagewalk.c: fix end address calculation in walk_page_range() Naoya Horiguchi
2014-02-27 21:03   ` Andrew Morton
2014-02-27 21:19     ` Naoya Horiguchi
2014-02-27 21:20       ` Kirill A. Shutemov
2014-02-27 21:54         ` Naoya Horiguchi
2014-02-27  4:39 ` [PATCH 2/3] mm, hugetlbfs: fix rmapping for anonymous hugepages with page_pgoff() Naoya Horiguchi
2014-02-27 21:19   ` Andrew Morton [this message]
2014-02-27 21:53     ` Naoya Horiguchi
2014-02-28 19:59       ` [PATCH v2] " Naoya Horiguchi
     [not found]       ` <5310ea8b.c425e00a.2cd9.ffffe097SMTPIN_ADDED_BROKEN@mx.google.com>
2014-02-28 23:14         ` Andrew Morton
2014-03-01  3:35           ` [PATCH v3] " Naoya Horiguchi
     [not found]           ` <1393644926-49vw3qw9@n-horiguchi@ah.jp.nec.com>
2014-03-01 23:08             ` Sasha Levin
2014-03-03  5:02               ` [PATCH] mm: add pte_present() check on existing hugetlb_entry callbacks Naoya Horiguchi
2014-03-03 20:06                 ` Sasha Levin
2014-03-03 21:38                   ` Sasha Levin
2014-03-04 21:32                     ` Naoya Horiguchi
     [not found]                     ` <1393968743-imrxpynb@n-horiguchi@ah.jp.nec.com>
2014-03-04 22:46                       ` Sasha Levin
2014-03-04 23:49                         ` Naoya Horiguchi
     [not found]                         ` <1393976967-lnmm5xcs@n-horiguchi@ah.jp.nec.com>
2014-03-06  4:31                           ` Sasha Levin
2014-03-06 16:08                             ` Naoya Horiguchi
     [not found]                             ` <1394122113-xsq3i6vw@n-horiguchi@ah.jp.nec.com>
2014-03-06 21:16                               ` Sasha Levin
2014-03-07  6:35                                 ` Naoya Horiguchi
2014-03-15  6:45                                   ` Naoya Horiguchi
2014-02-27  4:39 ` [PATCH 3/3] mm: call vma_adjust_trans_huge() only for thp-enabled vma Naoya Horiguchi
2014-02-27 21:23   ` Andrew Morton
2014-02-27 22:08     ` Naoya Horiguchi
2014-02-27 22:56   ` Kirill A. Shutemov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20140227131957.d81cf9a643f4d3fd6b8d8b16@linux-foundation.org \
    --to=akpm@linux-foundation.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=n-horiguchi@ah.jp.nec.com \
    --cc=riel@redhat.com \
    --cc=sasha.levin@oracle.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox