Message-ID: <49336D26.2060607@google.com>
Date: Sun, 30 Nov 2008 20:50:46 -0800
From: Mike Waychison
Subject: Re: [RFC v1][PATCH]page_fault retry with NOPAGE_RETRY
In-Reply-To: <4932EF90.9070601@gmail.com>
To: Török Edwin
Cc: Nick Piggin, Ying Han, Ingo Molnar, linux-mm@kvack.org,
    linux-kernel@vger.kernel.org, akpm, David Rientjes, Rohit Seth,
    Hugh Dickins, Peter Zijlstra, "H. Peter Anvin"

Torok Edwin wrote:
> On 2008-11-29 01:02, Mike Waychison wrote:
>> Nick Piggin wrote:
>>> On Thu, Nov 27, 2008 at 11:03:40AM -0800, Mike Waychison wrote:
>>>> Nick Piggin wrote:
>>>>> On Thu, Nov 27, 2008 at 01:28:41AM -0800, Mike Waychison wrote:
>>>>>> Torok however identified mmap taking on the order of several
>>>>>> milliseconds due to this exact problem:
>>>>>>
>>>>>> http://lkml.org/lkml/2008/9/12/185
>>>>> Turns out to be a different problem.
>>>>>
>>>> What do you mean?
>>> His is just contending on the write side. The retry patch doesn't help.
>>>
>> I disagree. How do you get 'write contention' from the following
>> paragraph:
>>
>> "Just to confirm that the problem is with pagefaults and mmap, I dropped
>> the mmap_sem in filemap_fault, and then
>> I got same performance in my testprogram for mmap and read. Of course
>> this is totally unsafe, because the mapping could change at any time."
>>
>> It reads to me that the writers were held off by the readers sleeping
>> in IO.
>
> It is true that I have a write/write contention too, but do_page_fault
> shows up too on lock_stat.
>
> This is my guess at what happens:
> * filemap_fault used to sleep with mmap_sem held while waiting for the
> page lock.
> * the google patch avoids that, which is fine: if page lock can't be
> taken, it drops mmap_sem, waits, then retries the fault once
> * however after we acquired the page lock, mapping->a_ops->readpage is
> invoked, mmap_sem is NOT dropped here:
>
>         error = mapping->a_ops->readpage(file, page);
>         if (!error) {
>                 wait_on_page_locked(page);
>
> If my understanding is correct ->readpage does the actual disk I/O, and
> it keeps the page locked, when the lock is released we know it has finished.
> So wait_on_page_locked(page) holds mmap_sem locked for read during the
> disk I/O, preventing sys_mmap/sys_munmap from making progress.
>
> I don't know how to prove/disprove my guess above, suggestions welcome.
>
> Could the patch be changed to also release the mmap_sem after readpage,
> and before wait_on_page_locked?

Ya, my suspicion is that there is still some other code path where we
are waiting on the locked page with mmap_sem still held.  Ying and I
will take a closer look this week.