From mboxrd@z Thu Jan 1 00:00:00 1970 Date: Mon, 25 Sep 2006 14:10:36 -0700 From: Andrew Morton Subject: Re: Checking page_count(page) in invalidate_complete_page Message-Id: <20060925141036.73f1e2b3.akpm@osdl.org> In-Reply-To: <4518333E.2060101@oracle.com> References: <4518333E.2060101@oracle.com> Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org Return-Path: To: chuck.lever@oracle.com Cc: Trond Myklebust , Steve Dickson , linux-mm@kvack.org List-ID: (Added linux-mm) On Mon, 25 Sep 2006 15:51:26 -0400 Chuck Lever wrote: > Hi Andrew- > > Steve Dickson and I have independently discovered some cache > invalidation problems in 2.6.18's NFS client. Using git bisect, I was > able to track it back to this commit: > > > commit 016eb4a0ed06a3677d67a584da901f0e9a63c666 > > Author: Andrew Morton > > Date: Fri Sep 8 09:48:38 2006 -0700 > > > > [PATCH] invalidate_complete_page() race fix > > > > If a CPU faults this page into pagetables after invalidate_mapping_pages() > > checked page_mapped(), invalidate_complete_page() will still proceed to remove > > the page from pagecache. This leaves the page-faulting process with a > > detached page. If it was MAP_SHARED then file data loss will ensue. > > > > Fix that up by checking the page's refcount after taking tree_lock. > > > > Cc: Nick Piggin > > Cc: Hugh Dickins > > Cc: > > Signed-off-by: Andrew Morton > > Signed-off-by: Linus Torvalds > > Instrumenting get_page and put_page has shown that the page reclaim > logic is temporarily bumping the page count on otherwise active but idle > pages, racing with the new page_count check in invalidate_complete_page > and preventing pages from being invalidated. > > One problem for the NFS client is that invalidate_inode_pages2 is being > used to invalidate the pages associated with a cached directory. If the > directory pages can't be invalidated because of this race, the contents > of the directory pages don't match the dentry cache, and all kinds of > strange behavior results. NFS is presently ignoring the return value from invalidate_inode_pages2(), in two places. Could I suggest you fix that? Then we'd at least not be seeing "strange behaviour" and things will be easier to diagnose next time. > I haven't checked the data invalidation behavior for regular files, but > the result will be file data corruption that comes and goes. > > Would it be acceptable to revert that page_count(page) != 2 check in > invalidate_complete_page ? Unfortunately not - that patch fixes cramfs failures and potential file corruption. The way to keep memory reclaim away from that page is to take zone->lru_lock, but that's quite impractical for several reasons. We could retry the invalidation a few times, but that stinks. I think invalidate_inode_pages2() is sufficiently different from (ie: stronger than) invalidate_inode_pages() to justify the addition of a new invalidate_complete_page2(), which skips the page refcount check. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org