From: Mike Kravetz <mike.kravetz@oracle.com>
To: Jan Stancek <jstancek@redhat.com>,
Andrew Morton <akpm@linux-foundation.org>
Cc: linux-mm@kvack.org,
kirill shutemov <kirill.shutemov@linux.intel.com>,
ltp@lists.linux.it, mhocko@kernel.org,
Rachel Sibley <rasibley@redhat.com>,
hughd@google.com, n-horiguchi@ah.jp.nec.com, aarcange@redhat.com,
aneesh kumar <aneesh.kumar@linux.vnet.ibm.com>,
dave@stgolabs.net, prakash sangappa <prakash.sangappa@oracle.com>,
colin king <colin.king@canonical.com>
Subject: Re: [bug] problems with migration of huge pages with v4.20-10214-ge1ef035d272e
Date: Thu, 3 Jan 2019 13:44:20 -0800 [thread overview]
Message-ID: <6e341052-fe38-b71c-ebb2-47e2e34f5963@oracle.com> (raw)
In-Reply-To: <495081357.93179893.1546535169172.JavaMail.zimbra@redhat.com>
On 1/3/19 9:06 AM, Jan Stancek wrote:
<snip>
>> 1) with LTP move_pages12 (MAP_PRIVATE version of reproducer)
>> Patch below fixes the panic for me.
>> It didn't apply cleanly to latest master, but conflicts were easy to resolve.
>>
>> 2) with MAP_SHARED version of reproducer
>> It still hangs in user-space.
>> v4.19 kernel appears to work fine so I've started a bisect.
>
> My bisect with MAP_SHARED version arrived at same 2 commits:
> c86aa7bbfd55 hugetlbfs: Use i_mmap_rwsem to fix page fault/truncate race
> b43a99900559 hugetlbfs: use i_mmap_rwsem for more pmd sharing synchronization
>
> Maybe a deadlock between page lock and mapping->i_mmap_rwsem?
>
> thread1:
> hugetlbfs_evict_inode
> i_mmap_lock_write(mapping);
> remove_inode_hugepages
> lock_page(page);
>
> thread2:
> __unmap_and_move
> trylock_page(page) / lock_page(page)
> remove_migration_ptes
> rmap_walk_file
> i_mmap_lock_read(mapping);
Thanks Jan! That is an ABBA deadlock. :(
Commit c86aa7bbfd55 ("Use i_mmap_rwsem to fix page fault/truncate race") is
the patch which causes remove_inode_hugepages to be called with i_mmap_rwsem
held in write mode. Clearly, i_mmap_rwsem should not be held when calling
remove_inode_hugepages. If you back out that patch, then the deadlock will
go away.
But, the whole point of that patch is to expand the locking so that
remove_inode_hugepages can not race with a page fault. If they do race, then
hugetlbfs specific metadata becomes inconsistent. With some tweaks to
c86aa7bbfd55, I think we could make truncate/page fault races safe. However,
the issue would still exist for hole punch/page fault races. We need some
way to prevent page faults while in remove_inode_hugepages.
Andrew, it might be best to revert these patches. I am not sure if all the
issues with this approach to synchronization can be fixed. To do so would
likely require more 'special case' conditions to code paths. The code is
already difficult to understand. I'd like to step back and take another look
at the best way to fix these problems. As mentioned before, the issues these
patches address have existed for at least 10 years. AFAIK, they have not been
seen in real world use cases. They were discovered via code inspection and
can only be reproduced with highly targeted test programs. So, waiting for
another release cycle to get a better solution might be the best approach.
I will continue to work this, but if you agree that backing out is the best
approach for now please let me know the process. Do I simply send a 'revert'
patch to you and the list?
--
Mike Kravetz
next prev parent reply other threads:[~2019-01-03 21:44 UTC|newest]
Thread overview: 10+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <1038135449.92986364.1546459244292.JavaMail.zimbra@redhat.com>
2019-01-02 20:30 ` Jan Stancek
2019-01-02 20:30 ` Jan Stancek
2019-01-02 21:24 ` Mike Kravetz
2019-01-03 1:44 ` Mike Kravetz
2019-01-03 12:47 ` Jan Stancek
2019-01-03 12:47 ` Jan Stancek
2019-01-03 17:06 ` Jan Stancek
2019-01-03 17:06 ` Jan Stancek
2019-01-03 21:44 ` Mike Kravetz [this message]
2019-01-03 21:59 ` Andrew Morton
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=6e341052-fe38-b71c-ebb2-47e2e34f5963@oracle.com \
--to=mike.kravetz@oracle.com \
--cc=aarcange@redhat.com \
--cc=akpm@linux-foundation.org \
--cc=aneesh.kumar@linux.vnet.ibm.com \
--cc=colin.king@canonical.com \
--cc=dave@stgolabs.net \
--cc=hughd@google.com \
--cc=jstancek@redhat.com \
--cc=kirill.shutemov@linux.intel.com \
--cc=linux-mm@kvack.org \
--cc=ltp@lists.linux.it \
--cc=mhocko@kernel.org \
--cc=n-horiguchi@ah.jp.nec.com \
--cc=prakash.sangappa@oracle.com \
--cc=rasibley@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox