From: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
To: akpm@linux-foundation.org
Cc: sasha.levin@oracle.com, linux-mm@kvack.org,
linux-kernel@vger.kernel.org, riel@redhat.com
Subject: Re: [PATCH 2/3] mm, hugetlbfs: fix rmapping for anonymous hugepages with page_pgoff()
Date: Thu, 27 Feb 2014 16:53:42 -0500 [thread overview]
Message-ID: <530fb3ee.03cb0e0a.407a.ffffffbcSMTPIN_ADDED_BROKEN@mx.google.com> (raw)
In-Reply-To: <20140227131957.d81cf9a643f4d3fd6b8d8b16@linux-foundation.org>
On Thu, Feb 27, 2014 at 01:19:57PM -0800, Andrew Morton wrote:
> On Wed, 26 Feb 2014 23:39:36 -0500 Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> wrote:
>
> > page->index stores pagecache index when the page is mapped into file mapping
> > region, and the index is in pagecache size unit, so it depends on the page
> > size. Some of users of reverse mapping obviously assumes that page->index
> > is in PAGE_CACHE_SHIFT unit, so they don't work for anonymous hugepage.
> >
> > For example, consider that we have 3-hugepage vma and try to mbind the 2nd
> > hugepage to migrate to another node. Then the vma is split and migrate_page()
> > is called for the 2nd hugepage (belonging to the middle vma.)
> > In migrate operation, rmap_walk_anon() tries to find the relevant vma to
> > which the target hugepage belongs, but here we miscalculate pgoff.
> > So anon_vma_interval_tree_foreach() grabs invalid vma, which fires VM_BUG_ON.
> >
> > This patch introduces a new API that is usable both for normal page and
> > hugepage to get PAGE_SIZE offset from page->index. Users should clearly
> > distinguish page_index for pagecache index and page_pgoff for page offset.
>
> So this patch is really independent of the page-walker changes, but the
> page walker changes need it. So it is appropriate that this patch be
> staged before that series, and separately. Agree?
I think that this patch is independen of page walker patch and vice versa.
But I agree that we can push this patch separately.
> > Reported-by: Sasha Levin <sasha.levin@oracle.com> # if the reported problem is fixed
> > Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
> > Cc: stable@vger.kernel.org # 3.12+
>
> Why cc:stable? This problem will only cause runtime issues when the
> page walker patches are present?
I think that we had this bug after "hugepage migration extension" patches
was merged at 3.12.
I did reproduce this bug on 3.14-rc1 (without page walker patch.)
> > index 4f591df66778..a8bd14f42032 100644
> > --- next-20140220.orig/include/linux/pagemap.h
> > +++ next-20140220/include/linux/pagemap.h
> > @@ -316,6 +316,19 @@ static inline loff_t page_file_offset(struct page *page)
> > return ((loff_t)page_file_index(page)) << PAGE_CACHE_SHIFT;
> > }
> >
> > +extern pgoff_t hugepage_pgoff(struct page *page);
> > +
> > +/*
> > + * page->index stores pagecache index whose unit is not always PAGE_SIZE.
> > + * This function converts it into PAGE_SIZE offset.
> > + */
> > +#define page_pgoff(page) \
> > +({ \
> > + unlikely(PageHuge(page)) ? \
> > + hugepage_pgoff(page) : \
> > + page->index >> (PAGE_CACHE_SHIFT - PAGE_SHIFT); \
> > +})
>
> - I don't think this needs to be implemented in a macro? Can we do
> it in good old C?
Yes, I have a headache on this because defining this here as a simple
C function causes build error.
We can't use PageHuge in include/linux/pagemap.h, but we can in callers'
side (they include linux/hugetlb.h.)
But yes, I think this looks like a dirty workaround, so it's nice if
we could have any better idea.
> - Is PageHuge() the appropriate test?
> /*
> * PageHuge() only returns true for hugetlbfs pages, but not for normal or
> * transparent huge pages. See the PageTransHuge() documentation for more
> * details.
> */
I think yes at least for now, because we have page->index in page unit
for transparent hugepage. And as long as I know, we don't do rmapping for
transparent hugepage (callers should split thps in advancde if they need do
rmapping.) If this situation changes, we will need change on this function.
> - Should page->index be shifted right or left? Probably left - I
> doubt if PAGE_CACHE_SHIFT will ever be less than PAGE_SHIFT.
My big mistake, sorry. It must be a left shift.
> - I'm surprised we don't have a general what-is-this-page's-order
> function, so you can just do
>
> static inline pgoff_t page_pgoff(struct page *page)
> {
> return page->index << page_size_order(page);
> }
>
> And I think this would be a better implementation, as the (new)
> page_size_order() could be used elsewhere.
>
> page_size_order() would be a crappy name - can't think of anything
> better at present.
I can't, either. If nobody suggests a better one, I'll take yours.
Thanks,
Naoya
> > --- next-20140220.orig/mm/memory-failure.c
> > +++ next-20140220/mm/memory-failure.c
> > @@ -404,7 +404,7 @@ static void collect_procs_anon(struct page *page, struct list_head *to_kill,
> > if (av == NULL) /* Not actually mapped anymore */
> > return;
> >
> > - pgoff = page->index << (PAGE_CACHE_SHIFT - PAGE_SHIFT);
>
> See this did a left shift.
>
> > + pgoff = page_pgoff(page);
> > read_lock(&tasklist_lock);
> > for_each_process (tsk) {
> > struct anon_vma_chain *vmac;
> > @@ -437,7 +437,7 @@ static void collect_procs_file(struct page *page, struct list_head *to_kill,
> > mutex_lock(&mapping->i_mmap_mutex);
> > read_lock(&tasklist_lock);
> > for_each_process(tsk) {
> > - pgoff_t pgoff = page->index << (PAGE_CACHE_SHIFT - PAGE_SHIFT);
>
> him too.
>
> > + pgoff_t pgoff = page_pgoff(page);
> >
> > if (!task_early_kill(tsk))
> > continue;
> > diff --git next-20140220.orig/mm/rmap.c next-20140220/mm/rmap.c
> > index 9056a1f00b87..78405051474a 100644
> > --- next-20140220.orig/mm/rmap.c
> > +++ next-20140220/mm/rmap.c
> > @@ -515,11 +515,7 @@ void page_unlock_anon_vma_read(struct anon_vma *anon_vma)
> > static inline unsigned long
> > __vma_address(struct page *page, struct vm_area_struct *vma)
> > {
> > - pgoff_t pgoff = page->index << (PAGE_CACHE_SHIFT - PAGE_SHIFT);
>
> And him.
>
> > -
> > - if (unlikely(is_vm_hugetlb_page(vma)))
> > - pgoff = page->index << huge_page_order(page_hstate(page));
> > -
> > + pgoff_t pgoff = page_pgoff(page);
> > return vma->vm_start + ((pgoff - vma->vm_pgoff) << PAGE_SHIFT);
> > }
> >
> > @@ -1598,7 +1594,7 @@ static struct anon_vma *rmap_walk_anon_lock(struct page *page,
> > static int rmap_walk_anon(struct page *page, struct rmap_walk_control *rwc)
> > {
> > struct anon_vma *anon_vma;
> > - pgoff_t pgoff = page->index << (PAGE_CACHE_SHIFT - PAGE_SHIFT);
>
> again.
>
> > + pgoff_t pgoff = page_pgoff(page);
> > struct anon_vma_chain *avc;
> > int ret = SWAP_AGAIN;
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org. For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
>
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2014-02-27 21:53 UTC|newest]
Thread overview: 28+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-02-27 4:39 [PATCH 0/3] fixes on page table walker and hugepage rmapping Naoya Horiguchi
2014-02-27 4:39 ` [PATCH 1/3] mm/pagewalk.c: fix end address calculation in walk_page_range() Naoya Horiguchi
2014-02-27 21:03 ` Andrew Morton
2014-02-27 21:19 ` Naoya Horiguchi
2014-02-27 21:20 ` Kirill A. Shutemov
2014-02-27 21:54 ` Naoya Horiguchi
2014-02-27 4:39 ` [PATCH 2/3] mm, hugetlbfs: fix rmapping for anonymous hugepages with page_pgoff() Naoya Horiguchi
2014-02-27 21:19 ` Andrew Morton
2014-02-27 21:53 ` Naoya Horiguchi [this message]
2014-02-28 19:59 ` [PATCH v2] " Naoya Horiguchi
[not found] ` <5310ea8b.c425e00a.2cd9.ffffe097SMTPIN_ADDED_BROKEN@mx.google.com>
2014-02-28 23:14 ` Andrew Morton
2014-03-01 3:35 ` [PATCH v3] " Naoya Horiguchi
[not found] ` <1393644926-49vw3qw9@n-horiguchi@ah.jp.nec.com>
2014-03-01 23:08 ` Sasha Levin
2014-03-03 5:02 ` [PATCH] mm: add pte_present() check on existing hugetlb_entry callbacks Naoya Horiguchi
2014-03-03 20:06 ` Sasha Levin
2014-03-03 21:38 ` Sasha Levin
2014-03-04 21:32 ` Naoya Horiguchi
[not found] ` <1393968743-imrxpynb@n-horiguchi@ah.jp.nec.com>
2014-03-04 22:46 ` Sasha Levin
2014-03-04 23:49 ` Naoya Horiguchi
[not found] ` <1393976967-lnmm5xcs@n-horiguchi@ah.jp.nec.com>
2014-03-06 4:31 ` Sasha Levin
2014-03-06 16:08 ` Naoya Horiguchi
[not found] ` <1394122113-xsq3i6vw@n-horiguchi@ah.jp.nec.com>
2014-03-06 21:16 ` Sasha Levin
2014-03-07 6:35 ` Naoya Horiguchi
2014-03-15 6:45 ` Naoya Horiguchi
2014-02-27 4:39 ` [PATCH 3/3] mm: call vma_adjust_trans_huge() only for thp-enabled vma Naoya Horiguchi
2014-02-27 21:23 ` Andrew Morton
2014-02-27 22:08 ` Naoya Horiguchi
2014-02-27 22:56 ` Kirill A. Shutemov
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=530fb3ee.03cb0e0a.407a.ffffffbcSMTPIN_ADDED_BROKEN@mx.google.com \
--to=n-horiguchi@ah.jp.nec.com \
--cc=akpm@linux-foundation.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=riel@redhat.com \
--cc=sasha.levin@oracle.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox