From: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
To: Motohiro Kosaki <Motohiro.Kosaki@us.fujitsu.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
Vlastimil Babka <vbabka@suse.cz>,
Sasha Levin <sasha.levin@oracle.com>,
Andrew Morton <akpm@linux-foundation.org>,
Wanpeng Li <liwanp@linux.vnet.ibm.com>,
Michel Lespinasse <walken@google.com>,
Bob Liu <bob.liu@oracle.com>, Nick Piggin <npiggin@suse.de>,
Rik van Riel <riel@redhat.com>,
David Rientjes <rientjes@google.com>,
Mel Gorman <mgorman@suse.de>, Minchan Kim <minchan@kernel.org>,
Hugh Dickins <hughd@google.com>,
Johannes Weiner <hannes@cmpxchg.org>,
linux-mm <linux-mm@kvack.org>,
Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH] mm/mlock: fix BUG_ON unlocked page for nolinear VMAs
Date: Mon, 6 Jan 2014 17:01:07 -0500 [thread overview]
Message-ID: <CAHGf_=rf7HGD-TBZQQZCCv8iQLC9NN9pbcZ6Mx1Z-LdwrEUsJw@mail.gmail.com> (raw)
In-Reply-To: <6B2BA408B38BA1478B473C31C3D2074E2BF812BC82@SV-EXCHANGE1.Corp.FC.LOCAL>
On Mon, Jan 6, 2014 at 11:47 AM, Motohiro Kosaki
<Motohiro.Kosaki@us.fujitsu.com> wrote:
>
>
>> -----Original Message-----
>> From: linus971@gmail.com [mailto:linus971@gmail.com] On Behalf Of Linus
>> Torvalds
>> Sent: Friday, January 03, 2014 7:18 PM
>> To: Vlastimil Babka
>> Cc: Sasha Levin; Andrew Morton; Wanpeng Li; Michel Lespinasse; Bob Liu;
>> Nick Piggin; Motohiro Kosaki JP; Rik van Riel; David Rientjes; Mel Gorman;
>> Minchan Kim; Hugh Dickins; Johannes Weiner; linux-mm; Linux Kernel Mailing
>> List
>> Subject: Re: [PATCH] mm/mlock: fix BUG_ON unlocked page for nolinear
>> VMAs
>>
>> On Fri, Jan 3, 2014 at 3:36 PM, Vlastimil Babka <vbabka@suse.cz> wrote:
>> >
>> > I'm for going with the removal of BUG_ON. The TestSetPageMlocked
>> > should provide enough race protection.
>>
>> Maybe. But dammit, that's subtle, and I don't think you're even right.
>>
>> It basically depends on mlock_vma_page() and munlock_vma_page() being
>> able to run CONCURRENTLY on the same page. In particular, you could have a
>> mlock_vma_page() set the bit on one CPU, and munlock_vma_page()
>> immediately clearing it on another, and then the rest of those functions
>> could run with a totally arbitrary interleaving when working with the exact
>> same page.
>>
>> They both do basically
>>
>> if (!isolate_lru_page(page))
>> putback_lru_page(page);
>>
>> but one or the other would randomly win the race (it's internally protected
>> by the lru lock), and *if* the munlock_vma_page() wins it, it would also do
>>
>> try_to_munlock(page);
>>
>> but if mlock_vma_page() wins it, that wouldn't happen. That looks entirely
>> broken - you end up with the PageMlocked bit clear, but
>> try_to_munlock() was never called on that page, because
>> mlock_vma_page() got to the page isolation before the "subsequent"
>> munlock_vma_page().
>>
>> And this is very much what the page lock serialization would prevent.
>> So no, the PageMlocked in *no* way gives serialization. It's an atomic bit op,
>> yes, but that only "serializes" in one direction, not when you can have a mix
>> of bit setting and clearing.
>>
>> So quite frankly, I think you're wrong. The BUG_ON() is correct, or at least
>> enforces some kind of ordering. And try_to_unmap_cluster() is just broken
>> in calling that without the page being locked. That's my opinion. There may
>> be some *other* reason why it all happens to work, but no,
>> "TestSetPageMlocked should provide enough race protection" is simply not
>> true, and even if it were, it's way too subtle and odd to be a good rule.
>>
>> So I really object to just removing the BUG_ON(). Not with a *lot* more
>> explanation as to why these kinds of issues wouldn't matter.
>
> I don't have a perfect answer. But I can explain a bit history. Let's me try.
>
> First off, 5 years ago, Lee's original putback_lru_page() implementation required
> page-lock, but I removed the restriction months later. That's why we can see
> strange BUG_ON here.
>
> 5 years ago, both mlock(2) and munlock(2) called do_mlock() and it was protected by
> mmap_sem (write mdoe). Then, mlock and munlock had no race.
> Now, __mm_populate() (called by mlock(2)) is only protected by mmap_sem read-mode. However it is enough to
> protect against munlock.
>
> Next, In case of mlock vs reclaim, the key is that mlock(2) has two step operation. 1) turn on VM_LOCKED under
> mmap_sem write-mode, 2) turn on Page_Mlocked under mmap_sem read-mode. If reclaim race against step (1),
> reclaim must lose because it uses trylock. On the other hand, if reclaim race against step (2), reclaim must detect
> VM_LOCKED because both VM_LOCKED modifier and observer take mmap-sem.
>
> By the way, page isolation is still necessary because we need to protect another page modification like page migration.
>
>
> My memory was alomostly flushed and I might lost some technical concern and past discussion. Please point me out,
> If I am overlooking something.
No. I did talk about completely different issue. My memory is
completely broken as I said. I need to read latest code and dig past
discussion. Sorry again, please ignore my last mail.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2014-01-06 22:01 UTC|newest]
Thread overview: 29+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-12-17 8:05 Wanpeng Li
2013-12-18 3:14 ` Wanpeng Li
2013-12-18 3:16 ` Wanpeng Li
2013-12-18 3:23 ` Wanpeng Li
[not found] ` <20131218032329.GA6044@hacker.(null)>
2013-12-18 3:32 ` Sasha Levin
2013-12-18 4:12 ` Wanpeng Li
2013-12-18 9:11 ` Vlastimil Babka
2013-12-18 9:23 ` Wanpeng Li
[not found] ` <52b1699f.87293c0a.75d1.34d3SMTPIN_ADDED_BROKEN@mx.google.com>
2013-12-18 21:43 ` Andrew Morton
2014-01-03 20:17 ` Sasha Levin
2014-01-03 20:52 ` Linus Torvalds
2014-01-03 23:36 ` Vlastimil Babka
2014-01-03 23:56 ` Andrew Morton
2014-01-04 3:03 ` Sasha Levin
2014-01-04 0:18 ` Linus Torvalds
2014-01-04 8:09 ` Vlastimil Babka
2014-01-05 0:27 ` Wanpeng Li
2014-01-06 16:47 ` Motohiro Kosaki
2014-01-06 22:01 ` KOSAKI Motohiro [this message]
2014-01-07 5:27 ` Wanpeng Li
2014-01-07 15:01 ` Vlastimil Babka
2014-01-08 1:06 ` Bob Liu
2014-01-08 2:44 ` Linus Torvalds
2014-01-10 17:48 ` Motohiro Kosaki
2014-01-13 14:03 ` Vlastimil Babka
2014-01-14 11:05 ` Vlastimil Babka
2014-01-04 3:31 ` Bob Liu
2013-12-18 8:54 ` Vlastimil Babka
2013-12-18 9:01 ` Wanpeng Li
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to='CAHGf_=rf7HGD-TBZQQZCCv8iQLC9NN9pbcZ6Mx1Z-LdwrEUsJw@mail.gmail.com' \
--to=kosaki.motohiro@jp.fujitsu.com \
--cc=Motohiro.Kosaki@us.fujitsu.com \
--cc=akpm@linux-foundation.org \
--cc=bob.liu@oracle.com \
--cc=hannes@cmpxchg.org \
--cc=hughd@google.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=liwanp@linux.vnet.ibm.com \
--cc=mgorman@suse.de \
--cc=minchan@kernel.org \
--cc=npiggin@suse.de \
--cc=riel@redhat.com \
--cc=rientjes@google.com \
--cc=sasha.levin@oracle.com \
--cc=torvalds@linux-foundation.org \
--cc=vbabka@suse.cz \
--cc=walken@google.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox