From: Jan Stancek <jstancek@redhat.com>
To: Mike Kravetz <mike.kravetz@oracle.com>
Cc: linux-mm@kvack.org,
kirill shutemov <kirill.shutemov@linux.intel.com>,
Andrew Morton <akpm@linux-foundation.org>,
ltp@lists.linux.it, mhocko@kernel.org,
Rachel Sibley <rasibley@redhat.com>,
hughd@google.com, n-horiguchi@ah.jp.nec.com, aarcange@redhat.com,
aneesh kumar <aneesh.kumar@linux.vnet.ibm.com>,
dave@stgolabs.net, prakash sangappa <prakash.sangappa@oracle.com>,
colin king <colin.king@canonical.com>
Subject: Re: [bug] problems with migration of huge pages with v4.20-10214-ge1ef035d272e
Date: Thu, 3 Jan 2019 12:06:09 -0500 (EST)
Message-ID: <495081357.93179893.1546535169172.JavaMail.zimbra@redhat.com>
In-Reply-To: <1808265696.93134171.1546519652798.JavaMail.zimbra@redhat.com>
----- Original Message -----
<snip>
> > That commit does cause BUGs for migration and page poisoning of anon huge
> > pages. The patch was trying to take care of i_mmap_rwsem locking outside
> > try_to_unmap infrastructure. This is because try_to_unmap will take the
> > semaphore in read mode (for file mappings) and we really need it to be
> > taken in write mode.
> >
> > The patch below continues to take the semaphore outside try_to_unmap for
> > the file mapping case. For anon mappings, the locking is done as a special
> > case in try_to_unmap_one. This is something I was trying to avoid, as it
> > makes the code harder to follow/understand. Any suggestions on how to
> > restructure this or make it clearer are welcome.
> >
> > Adding Andrew on Cc as he already sent the commit causing the BUGs
> > upstream.
> >
> > From: Mike Kravetz <mike.kravetz@oracle.com>
> >
> > hugetlbfs: fix migration and poisoning of anon huge pages
> >
> > Expanded use of i_mmap_rwsem for pmd sharing synchronization incorrectly
> > used page_mapping() of anon huge pages to get to address_space
> > i_mmap_rwsem. Since page_mapping() is NULL for pages of anon mappings,
> > an "unable to handle kernel NULL pointer" BUG would occur with stack
> > similar to:
> >
> > RIP: 0010:down_write+0x1b/0x40
> > Call Trace:
> > migrate_pages+0x81f/0xb90
> > __ia32_compat_sys_migrate_pages+0x190/0x190
> > do_move_pages_to_node.isra.53.part.54+0x2a/0x50
> > kernel_move_pages+0x566/0x7b0
> > __x64_sys_move_pages+0x24/0x30
> > do_syscall_64+0x5b/0x180
> > entry_SYSCALL_64_after_hwframe+0x44/0xa9
> >
> > To fix, only use page_mapping() for non-anon or file pages. For anon
> > pages wait until we find a vma in which the page is mapped and get the
> > address_space from vm_file.
> >
> > Fixes: b43a99900559 ("hugetlbfs: use i_mmap_rwsem for more pmd sharing
> > synchronization")
> > Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
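In pseudocode, the approach the commit message describes looks roughly like this (an illustrative sketch only, not the actual patch; hugetlb MAP_PRIVATE vmas still carry the hugetlbfs pseudo-file in vm_file, which is why vma->vm_file->f_mapping works for the anon case):

```c
/* pseudocode sketch -- anon huge pages have page_mapping() == NULL */
if (!PageAnon(hpage)) {
        /* file-backed: the address_space is known up front */
        mapping = page_mapping(hpage);
        i_mmap_lock_write(mapping);
        try_to_unmap(hpage, ...);
        i_mmap_unlock_write(mapping);
} else {
        /* anon: defer locking to try_to_unmap_one(), which takes
         * vma->vm_file->f_mapping once a vma mapping the page is found */
        try_to_unmap(hpage, ...);
}
```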
>
> Mike,
>
> 1) with LTP move_pages12 (the MAP_PRIVATE version of the reproducer)
> The patch below fixes the panic for me.
> It didn't apply cleanly to the latest master, but the conflicts were easy to resolve.
>
> 2) with the MAP_SHARED version of the reproducer
> It still hangs in user-space.
> The v4.19 kernel appears to work fine, so I've started a bisect.

My bisect with the MAP_SHARED version arrived at the same 2 commits:
  c86aa7bbfd55 hugetlbfs: Use i_mmap_rwsem to fix page fault/truncate race
  b43a99900559 hugetlbfs: use i_mmap_rwsem for more pmd sharing synchronization

Maybe a deadlock between the page lock and mapping->i_mmap_rwsem?
thread1:
  hugetlbfs_evict_inode
    i_mmap_lock_write(mapping);
    remove_inode_hugepages
      lock_page(page);

thread2:
  __unmap_and_move
    trylock_page(page) / lock_page(page)
    remove_migration_ptes
      rmap_walk_file
        i_mmap_lock_read(mapping);

Here's the strace output:
<snip>
1196 11:27:16 mmap(NULL, 4194304, PROT_READ|PROT_WRITE, MAP_SHARED|MAP_ANONYMOUS|MAP_HUGETLB, -1, 0) = 0x7f646c400000
1197 11:27:16 set_robust_list(0x7f646d5b0e60, 24) = 0
1197 11:27:16 getppid() = 1196
1197 11:27:16 move_pages(1196, 1024, [0x7f646c400000, 0x7f646c401000, 0x7f646c402000, 0x7f646c403000, 0x7f646c404000, 0x7f646c405000, 0x7f646c406000, 0x7f646c407000, 0x7f646c408000, 0x7f646c409000, 0x7f646c40a000, 0x7f646c40b000, 0x7f646c40c000, 0x7f646c40d000, 0x7f646c40e000, 0x7f646c40f000, 0x7f646c410000, 0x7f646c411000, 0x7f646c412000, 0x7f646c413000, 0x7f646c414000, 0x7f646c415000, 0x7f646c416000, 0x7f646c417000, 0x7f646c418000, 0x7f646c419000, 0x7f646c41a000, 0x7f646c41b000, 0x7f646c41c000, 0x7f646c41d000, 0x7f646c41e000, 0x7f646c41f000, ...], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...], [-ENOENT, -ENOENT, -ENOENT, -ENOENT, -ENOENT, -ENOENT, -ENOENT, -ENOENT, -ENOENT, -ENOENT, -ENOENT, -ENOENT, -ENOENT, -ENOENT, -ENOENT, -ENOENT, -ENOENT, -ENOENT, -ENOENT, -ENOENT, -ENOENT, -ENOENT, -ENOENT, -ENOENT, -ENOENT, -ENOENT, -ENOENT, -ENOENT, -ENOENT, -ENOENT, -ENOENT, -ENOENT, ...], MPOL_MF_MOVE_ALL) = 0
1197 11:27:16 move_pages(1196, 1024, [0x7f646c400000, 0x7f646c401000, 0x7f646c402000, 0x7f646c403000, 0x7f646c404000, 0x7f646c405000, 0x7f646c406000, 0x7f646c407000, 0x7f646c408000, 0x7f646c409000, 0x7f646c40a000, 0x7f646c40b000, 0x7f646c40c000, 0x7f646c40d000, 0x7f646c40e000, 0x7f646c40f000, 0x7f646c410000, 0x7f646c411000, 0x7f646c412000, 0x7f646c413000, 0x7f646c414000, 0x7f646c415000, 0x7f646c416000, 0x7f646c417000, 0x7f646c418000, 0x7f646c419000, 0x7f646c41a000, 0x7f646c41b000, 0x7f646c41c000, 0x7f646c41d000, 0x7f646c41e000, 0x7f646c41f000, ...], [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...], [1, -EACCES, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...], MPOL_MF_MOVE_ALL) = 0
1197 11:27:16 move_pages(1196, 1024, [0x7f646c400000, 0x7f646c401000, 0x7f646c402000, 0x7f646c403000, 0x7f646c404000, 0x7f646c405000, 0x7f646c406000, 0x7f646c407000, 0x7f646c408000, 0x7f646c409000, 0x7f646c40a000, 0x7f646c40b000, 0x7f646c40c000, 0x7f646c40d000, 0x7f646c40e000, 0x7f646c40f000, 0x7f646c410000, 0x7f646c411000, 0x7f646c412000, 0x7f646c413000, 0x7f646c414000, 0x7f646c415000, 0x7f646c416000, 0x7f646c417000, 0x7f646c418000, 0x7f646c419000, 0x7f646c41a000, 0x7f646c41b000, 0x7f646c41c000, 0x7f646c41d000, 0x7f646c41e000, 0x7f646c41f000, ...], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...], <unfinished ...>
1196 11:27:16 munmap(0x7f646c400000, 4194304 <unfinished ...>
<hangs>
Regards,
Jan