From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail-qc0-f172.google.com (mail-qc0-f172.google.com [209.85.216.172])
	by kanga.kvack.org (Postfix) with ESMTP id 2E98A6B0035
	for ; Mon, 6 Jan 2014 17:01:30 -0500 (EST)
Received: by mail-qc0-f172.google.com with SMTP id e16so18277830qcx.3
	for ; Mon, 06 Jan 2014 14:01:29 -0800 (PST)
Received: from mail-oa0-x236.google.com (mail-oa0-x236.google.com [2607:f8b0:4003:c02::236])
	by mx.google.com with ESMTPS id r6si73169783qaj.127.2014.01.06.14.01.27
	for (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128);
	Mon, 06 Jan 2014 14:01:28 -0800 (PST)
Received: by mail-oa0-f54.google.com with SMTP id o6so2899930oag.13
	for ; Mon, 06 Jan 2014 14:01:27 -0800 (PST)
MIME-Version: 1.0
In-Reply-To: <6B2BA408B38BA1478B473C31C3D2074E2BF812BC82@SV-EXCHANGE1.Corp.FC.LOCAL>
References: <1387267550-8689-1-git-send-email-liwanp@linux.vnet.ibm.com>
	<52b1138b.0201430a.19a8.605dSMTPIN_ADDED_BROKEN@mx.google.com>
	<52B11765.8030005@oracle.com>
	<52b120a5.a3b2440a.3acf.ffffd7c3SMTPIN_ADDED_BROKEN@mx.google.com>
	<52B166CF.6080300@suse.cz>
	<52b1699f.87293c0a.75d1.34d3SMTPIN_ADDED_BROKEN@mx.google.com>
	<20131218134316.977d5049209d9278e1dad225@linux-foundation.org>
	<52C71ACC.20603@oracle.com>
	<52C74972.6050909@suse.cz>
	<6B2BA408B38BA1478B473C31C3D2074E2BF812BC82@SV-EXCHANGE1.Corp.FC.LOCAL>
From: KOSAKI Motohiro
Date: Mon, 6 Jan 2014 17:01:07 -0500
Message-ID:
Subject: Re: [PATCH] mm/mlock: fix BUG_ON unlocked page for nolinear VMAs
Content-Type: text/plain; charset=ISO-8859-1
Sender: owner-linux-mm@kvack.org
List-ID:
To: Motohiro Kosaki
Cc: Linus Torvalds, Vlastimil Babka, Sasha Levin, Andrew Morton,
	Wanpeng Li, Michel Lespinasse, Bob Liu, Nick Piggin, Rik van Riel,
	David Rientjes, Mel Gorman, Minchan Kim, Hugh Dickins,
	Johannes Weiner, linux-mm, Linux Kernel Mailing List

On Mon, Jan 6, 2014 at 11:47 AM, Motohiro Kosaki wrote:
>
>
>> -----Original Message-----
>> From: linus971@gmail.com [mailto:linus971@gmail.com] On Behalf
Of Linus
>> Torvalds
>> Sent: Friday, January 03, 2014 7:18 PM
>> To: Vlastimil Babka
>> Cc: Sasha Levin; Andrew Morton; Wanpeng Li; Michel Lespinasse; Bob Liu;
>> Nick Piggin; Motohiro Kosaki JP; Rik van Riel; David Rientjes; Mel Gorman;
>> Minchan Kim; Hugh Dickins; Johannes Weiner; linux-mm; Linux Kernel Mailing
>> List
>> Subject: Re: [PATCH] mm/mlock: fix BUG_ON unlocked page for nolinear
>> VMAs
>>
>> On Fri, Jan 3, 2014 at 3:36 PM, Vlastimil Babka wrote:
>> >
>> > I'm for going with the removal of BUG_ON. The TestSetPageMlocked
>> > should provide enough race protection.
>>
>> Maybe. But dammit, that's subtle, and I don't think you're even right.
>>
>> It basically depends on mlock_vma_page() and munlock_vma_page() being
>> able to run CONCURRENTLY on the same page. In particular, you could have a
>> mlock_vma_page() set the bit on one CPU, and munlock_vma_page()
>> immediately clearing it on another, and then the rest of those functions
>> could run with a totally arbitrary interleaving when working with the exact
>> same page.
>>
>> They both do basically
>>
>>     if (!isolate_lru_page(page))
>>         putback_lru_page(page);
>>
>> but one or the other would randomly win the race (it's internally protected
>> by the lru lock), and *if* the munlock_vma_page() wins it, it would also do
>>
>>     try_to_munlock(page);
>>
>> but if mlock_vma_page() wins it, that wouldn't happen. That looks entirely
>> broken - you end up with the PageMlocked bit clear, but
>> try_to_munlock() was never called on that page, because
>> mlock_vma_page() got to the page isolation before the "subsequent"
>> munlock_vma_page().
>>
>> And this is very much what the page lock serialization would prevent.
>> So no, the PageMlocked in *no* way gives serialization. It's an atomic bit op,
>> yes, but that only "serializes" in one direction, not when you can have a mix
>> of bit setting and clearing.
>>
>> So quite frankly, I think you're wrong.
The BUG_ON() is correct, or at least
>> enforces some kind of ordering. And try_to_unmap_cluster() is just broken
>> in calling that without the page being locked. That's my opinion. There may
>> be some *other* reason why it all happens to work, but no,
>> "TestSetPageMlocked should provide enough race protection" is simply not
>> true, and even if it were, it's way too subtle and odd to be a good rule.
>>
>> So I really object to just removing the BUG_ON(). Not with a *lot* more
>> explanation as to why these kinds of issues wouldn't matter.
>
> I don't have a perfect answer, but I can explain a bit of the history. Let me try.
>
> First off, 5 years ago, Lee's original putback_lru_page() implementation required
> the page lock, but I removed that restriction months later. That's why we see a
> strange BUG_ON here.
>
> 5 years ago, both mlock(2) and munlock(2) called do_mlock(), and it was protected by
> mmap_sem (write mode). So mlock and munlock had no race.
> Now, __mm_populate() (called by mlock(2)) is only protected by mmap_sem in read mode. However, that is enough to
> protect against munlock.
>
> Next, in the case of mlock vs. reclaim, the key is that mlock(2) is a two-step operation: 1) turn on VM_LOCKED under
> mmap_sem write mode, 2) turn on Page_Mlocked under mmap_sem read mode. If reclaim races against step (1),
> reclaim must lose because it uses trylock. On the other hand, if reclaim races against step (2), reclaim must detect
> VM_LOCKED because both the VM_LOCKED modifier and the observer take mmap_sem.
>
> By the way, page isolation is still necessary because we need to protect against other page modifications such as page migration.
>
>
> My memory has been almost completely flushed, and I may have lost some technical concerns and past discussion. Please point it out
> if I am overlooking something.

No. I was talking about a completely different issue. My memory is completely
broken, as I said. I need to read the latest code and dig through the past
discussion. Sorry again, please ignore my last mail.
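
For anyone following the thread, the interleaving Linus describes in the quoted text can be replayed with a small user-space toy. This is only a sketch under stated assumptions, not kernel code: it is single-threaded and replays one fixed ordering, and `struct toy_page`, `toy_isolate()`, `toy_putback()` and `toy_run()` are hypothetical names made up for illustration. It does not model the lru lock or what the real try_to_munlock() does; it only records whether that step would have been reached.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for struct page; none of these names are kernel API. */
struct toy_page {
    bool mlocked;          /* models the PageMlocked bit */
    bool on_lru;           /* true while the page sits on an LRU list */
    bool munlock_scanned;  /* set when the try_to_munlock() stand-in is reached */
};

/* Models a test-and-set of the mlocked bit: returns the previous value. */
static bool toy_test_set_mlocked(struct toy_page *p)
{
    bool old = p->mlocked;
    p->mlocked = true;
    return old;
}

/* Models a test-and-clear of the mlocked bit: returns the previous value. */
static bool toy_test_clear_mlocked(struct toy_page *p)
{
    bool old = p->mlocked;
    p->mlocked = false;
    return old;
}

/*
 * Models page isolation. Returns 0 on success and -1 on failure, to match
 * the quoted "if (!isolate_lru_page(page))" idiom.
 */
static int toy_isolate(struct toy_page *p)
{
    if (!p->on_lru)
        return -1;
    p->on_lru = false;
    return 0;
}

/* Models putting an isolated page back on the LRU. */
static void toy_putback(struct toy_page *p)
{
    assert(!p->on_lru);
    p->on_lru = true;
}

/*
 * Replays: mlock side sets the bit, munlock side clears it immediately,
 * then both sides attempt isolation in the order chosen by the caller.
 */
static struct toy_page toy_run(bool mlock_wins_isolation)
{
    struct toy_page p = { .mlocked = false, .on_lru = true,
                          .munlock_scanned = false };
    int mlock_ret, munlock_ret;

    toy_test_set_mlocked(&p);             /* "CPU A": mlock side */
    if (!toy_test_clear_mlocked(&p))      /* "CPU B": munlock side */
        return p;                         /* bit was not set; nothing to undo */

    if (mlock_wins_isolation) {
        mlock_ret = toy_isolate(&p);      /* mlock side isolates first */
        munlock_ret = toy_isolate(&p);    /* munlock side finds the page gone */
    } else {
        munlock_ret = toy_isolate(&p);    /* munlock side isolates first */
        mlock_ret = toy_isolate(&p);      /* mlock side finds the page gone */
    }

    if (!munlock_ret) {
        p.munlock_scanned = true;         /* stand-in for try_to_munlock(page) */
        toy_putback(&p);
    }
    if (!mlock_ret)
        toy_putback(&p);

    return p;
}
```

Calling `toy_run(true)` from a small main() leaves the bit clear with `munlock_scanned` still false, which is the state Linus objects to; `toy_run(false)` leaves the bit clear with `munlock_scanned` set. Both runs perform exactly the same atomic bit operations, so in this toy the difference comes only from who wins the isolation step.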