From mboxrd@z Thu Jan 1 00:00:00 1970 Message-ID: <45186DC3.7000902@oracle.com> Date: Mon, 25 Sep 2006 20:01:07 -0400 From: Chuck Lever Reply-To: chuck.lever@oracle.com MIME-Version: 1.0 Subject: Re: Checking page_count(page) in invalidate_complete_page References: <4518333E.2060101@oracle.com> <20060925141036.73f1e2b3.akpm@osdl.org> <45185D7E.6070104@yahoo.com.au> <451862C5.1010900@oracle.com> <45186481.1090306@yahoo.com.au> In-Reply-To: <45186481.1090306@yahoo.com.au> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org Return-Path: To: Nick Piggin Cc: Andrew Morton , Trond Myklebust , Steve Dickson , linux-mm@kvack.org List-ID: Nick Piggin wrote: > Chuck Lever wrote: > >> Nick Piggin wrote: >> >>> Andrew Morton wrote: >>> Also, you can't guarantee anything much about its refcount even then >>> (because it could be on a private reclaim list or pagevec somewhere). >>> >>>> We could retry the invalidation a few times, but that stinks. >>>> >>>> I think invalidate_inode_pages2() is sufficiently different from (ie: >>>> stronger than) invalidate_inode_pages() to justify the addition of a >>>> new >>>> invalidate_complete_page2(), which skips the page refcount check. >>>> >>> >>> Yes, I think that would be possible using the lock_page in do_no_page >>> trick. >>> That would also enable you to invalidate pages that have direct IO going >>> into them, and other weird and wonderful get_user_pages happenings. >>> >>> I haven't thrown away those patches, and I am looking for a >>> justification >>> for them because they make the code look nicer ;) >>> >>> For 2.6.18.stable, Andrew's idea of checking the return value and retry >>> might be the only option. >> >> >> I think allowing callers of invalidate_inode_pages2() to get the >> previous behavior is reasonable here. There are only 2 of them: v9fs >> and the NFS client. > > > > That still reintroduces the page fault race, but if the dumb > check'n'retry is > no good then it may be OK for 2.6.18.stable, considering the page fault > race > is much less common than the reclaim one. Not sure, not my call. The NFS client uses invalidate_inode_pages2 for files, symlinks, and directories. The latter two won't have the do_no_page race since you can't map those types of file objects. For NFS files, the do_no_page race does exist (at least theoretically -- I've never seen a report of such a problem). Most people are not brave enough to use shared mapped files on NFS... so that race may be very rare indeed. > Upstream, it should be fixed properly without re-introducing bugs along the > way. Of course... thanks for sending the history. I'm wondering aloud here, because I'm a VM neophyte. I'm not sure how common the reclaim race is in real environments. For instance, the test I'm running is pretty simple, and I run it just after the client has rebooted. Why is try_to_free_pages being called here? If I knew that I could probably make a better guess about how common this race is. Also, the last get_page() call is from pagevec_strip(). Why do we need to try to strip buffers off of a page that is guaranteed not to have any buffers? -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org