From: Hugh Dickins <hughd@google.com>
To: Sasha Levin <sasha.levin@oracle.com>
Cc: Hugh Dickins <hughd@google.com>, Dave Jones <davej@redhat.com>,
Andrew Morton <akpm@linux-foundation.org>,
"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
Minchan Kim <minchan@kernel.org>,
David Rientjes <rientjes@google.com>,
Andrea Arcangeli <aarcange@redhat.com>,
"H. Peter Anvin" <hpa@zytor.com>, Mel Gorman <mgorman@suse.de>,
linux-mm <linux-mm@kvack.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: mm: BUG in do_huge_pmd_wp_page
Date: Wed, 5 Feb 2014 14:50:08 -0800 (PST)
Message-ID: <alpine.LSU.2.11.1402051416220.4008@eggly.anvils>
In-Reply-To: <52F27F1C.10601@oracle.com>

On Wed, 5 Feb 2014, Sasha Levin wrote:
> On 02/03/2014 10:59 PM, Hugh Dickins wrote:
> > On Mon, 3 Feb 2014, Sasha Levin wrote:
> > >
> > > [ 762.701278] BUG: unable to handle kernel paging request at
> > > ffff88009eae6000
> > > [ 762.702462] IP: [<ffffffff81ae8455>] copy_page_rep+0x5/0x10
> > > [ 762.710135] Call Trace:
> > > [ 762.710135] [<ffffffff81298995>] ? copy_user_huge_page+0x1a5/0x210
> > > [ 762.710135] [<ffffffff812d7260>] do_huge_pmd_wp_page+0x3d0/0x650
> > > [ 762.710135] [<ffffffff811a308e>] ? put_lock_stats+0xe/0x30
> > > [ 762.710135] [<ffffffff8129b511>] __handle_mm_fault+0x2b1/0x3d0
> > > [ 762.710135] [<ffffffff8129b763>] handle_mm_fault+0x133/0x1c0
> > > [ 762.710135] [<ffffffff8129bcf8>] __get_user_pages+0x438/0x630
> > > [ 762.710135] [<ffffffff811a308e>] ? put_lock_stats+0xe/0x30
> > > [ 762.710135] [<ffffffff8129cfc4>] __mlock_vma_pages_range+0xd4/0xe0
> > > [ 762.710135] [<ffffffff8129d0e0>] __mm_populate+0x110/0x190
> > > [ 762.710135] [<ffffffff8129dcd0>] SyS_mlockall+0x160/0x1b0
> > > [ 762.710135] [<ffffffff84450650>] tracesys+0xdd/0xe2
> >
> > Here's what I suggested about that one in eecc1e426d68
> > "thp: fix copy_page_rep GPF by testing is_huge_zero_pmd once only":
> > Note: this is not the same issue as trinity's DEBUG_PAGEALLOC BUG
> > in copy_page_rep with RSI: ffff88009c422000, reported by Sasha Levin
> > in https://lkml.org/lkml/2013/3/29/103. I believe that one is due
> > to the source page being split, and a tail page freed, while copy
> > is in progress; and not a problem without DEBUG_PAGEALLOC, since
> > the pmd_same check will prevent a miscopy from being made visible.
> >
> > It could be fixed by additional locking, or by taking an additional
> > reference on every tail page, in the DEBUG_PAGEALLOC case (we wouldn't
> > want to add to the overhead in the normal case). I didn't feel very
> > motivated to uglify the code in that way just for DEBUG_PAGEALLOC and
> > trinity: if it only comes up once in seven months, I'm inclined to
> > live with it myself, but you may have a different perspective.
>
> Either something changed in the kernel or in trinity, but I'm now hitting it
> 3-4 times a day.
>
> I've been trying to look at the code to understand the issue you've
> described, but I can't pinpoint the exact location where that happens.
>
> Could you please point me to the relevant code sections?

I'm not sure which part of it is unclear.

copy_page_rep (arch/x86/lib/copy_page_64.S) is what copy_user_huge_page
(mm/memory.c) ends up calling, when it's invoked from do_huge_pmd_wp_page
(mm/huge_memory.c). At this point we hold down_read of this mm's mmap_sem,
and a get_page on the head of the THP; but we don't have down_write or
page_table_lock or compound_lock or anon_vma lock, some of which might
prevent concurrent THP splitting (I say "some" and "might" because I've
not gone back to check precisely which are actually relevant here: THP
locking rules are not the simplest...).

Do you accept that the THP might be split while we're copying? And if
that happens, then, for example, there might be a WP fault from another
thread to one of the 4k pages it gets split into, which results in that
particular 4k page being freed after it's been copied (I'm thinking its
refcount demands that it be copied at the time of the fault, but then
the additional ref gets freed - a fork proceeds to exec and frees it,
for example).

When the page is freed, free_pages_prepare (mm/page_alloc.c) calls
kernel_map_pages (arch/x86/mm/pageattr.c if CONFIG_DEBUG_PAGEALLOC) to
unmap the freed page from kernel virtual address space (__set_pages_np).
Hence "unable to handle kernel paging request" when copy_page_rep
reaches that part of what used to be the THP.

Hugh