Message-ID: <4518589E.1070705@oracle.com>
Date: Mon, 25 Sep 2006 18:30:54 -0400
From: Chuck Lever
Reply-To: chuck.lever@oracle.com
MIME-Version: 1.0
Subject: Re: Checking page_count(page) in invalidate_complete_page
References: <4518333E.2060101@oracle.com> <20060925141036.73f1e2b3.akpm@osdl.org>
In-Reply-To: <20060925141036.73f1e2b3.akpm@osdl.org>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Sender: owner-linux-mm@kvack.org
To: Andrew Morton
Cc: Trond Myklebust, Steve Dickson, linux-mm@kvack.org

Andrew Morton wrote:
> (Added linux-mm)
>
> On Mon, 25 Sep 2006 15:51:26 -0400
> Chuck Lever wrote:
>
>> Hi Andrew-
>>
>> Steve Dickson and I have independently discovered some cache
>> invalidation problems in 2.6.18's NFS client. Using git bisect, I was
>> able to track it back to this commit:
>>
>>> commit 016eb4a0ed06a3677d67a584da901f0e9a63c666
>>> Author: Andrew Morton
>>> Date: Fri Sep 8 09:48:38 2006 -0700
>>>
>>> [PATCH] invalidate_complete_page() race fix
>>>
>>> If a CPU faults this page into pagetables after
>>> invalidate_mapping_pages() checked page_mapped(),
>>> invalidate_complete_page() will still proceed to remove the page
>>> from pagecache. This leaves the page-faulting process with a
>>> detached page. If it was MAP_SHARED then file data loss will ensue.
>>>
>>> Fix that up by checking the page's refcount after taking tree_lock.
>>>
>>> Cc: Nick Piggin
>>> Cc: Hugh Dickins
>>> Cc:
>>> Signed-off-by: Andrew Morton
>>> Signed-off-by: Linus Torvalds
>>
>> Instrumenting get_page and put_page has shown that the page reclaim
>> logic is temporarily bumping the page count on otherwise active but
>> idle pages, racing with the new page_count check in
>> invalidate_complete_page and preventing pages from being invalidated.
>>
>> One problem for the NFS client is that invalidate_inode_pages2 is
>> being used to invalidate the pages associated with a cached directory.
>> If the directory pages can't be invalidated because of this race, the
>> contents of the directory pages don't match the dentry cache, and all
>> kinds of strange behavior results.
>
> NFS is presently ignoring the return value from
> invalidate_inode_pages2(), in two places. Could I suggest you fix
> that? Then we'd at least not be seeing "strange behaviour" and things
> will be easier to diagnose next time.

Yes, it certainly should check the return code there for directories and
symlinks. For files, it is expected that some pages could be left in the
cache if they are still being used for other I/O operations.

Another place that is missing a return code check is the
invalidate_inode_pages2_range call in nfs_readdir_filler.

>> Would it be acceptable to revert that page_count(page) != 2 check in
>> invalidate_complete_page?
>
> Unfortunately not - that patch fixes cramfs failures and potential file
> corruption.

It seems that the NFS client could now safely use a page cache
invalidator that waits for other page users, ensuring that every page is
invalidated properly, instead of skipping the pages that can't be
invalidated immediately. In my opinion that would be the correct fix
here for NFS.
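For reference, the check in question is in mm/truncate.c's
invalidate_complete_page(). The following is a sketch of the 2.6.18-era
function after the commit quoted above, reconstructed from memory for
illustration, so the surrounding details may not match the tree exactly:

/* Sketch of mm/truncate.c:invalidate_complete_page(), 2.6.18-era */
static int
invalidate_complete_page(struct address_space *mapping, struct page *page)
{
        if (page->mapping != mapping)
                return 0;

        /* Buffers still attached and busy?  Give up. */
        if (PagePrivate(page) && !try_to_release_page(page, 0))
                return 0;

        write_lock_irq(&mapping->tree_lock);
        if (PageDirty(page))
                goto failed;

        BUG_ON(PagePrivate(page));

        /*
         * Expect exactly two references: the caller's and the page
         * cache's.  A transient extra reference, such as one taken by
         * page reclaim while it inspects the page, makes this test
         * fail, the page is skipped, and invalidate_inode_pages2()
         * returns with stale pages still in the cache.
         */
        if (page_count(page) != 2)
                goto failed;

        __remove_from_page_cache(page);
        write_unlock_irq(&mapping->tree_lock);
        ClearPageUptodate(page);
        page_cache_release(page);       /* drop the page cache's reference */
        return 1;
failed:
        write_unlock_irq(&mapping->tree_lock);
        return 0;
}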
> The way to keep memory reclaim away from that page is to take
> zone->lru_lock, but that's quite impractical for several reasons.
>
> We could retry the invalidation a few times, but that stinks.
>
> I think invalidate_inode_pages2() is sufficiently different from (ie:
> stronger than) invalidate_inode_pages() to justify the addition of a new
> invalidate_complete_page2(), which skips the page refcount check.
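As a rough illustration of that idea, and not an actual patch, an
invalidate_complete_page2() could be the function sketched earlier minus
the refcount test, something like:

/*
 * Hypothetical sketch of the proposed invalidate_complete_page2():
 * identical to invalidate_complete_page() except that the
 * page_count(page) != 2 test is dropped, so a transient reference
 * taken elsewhere (for example by page reclaim) no longer prevents
 * removal from the page cache.
 */
static int
invalidate_complete_page2(struct address_space *mapping, struct page *page)
{
        if (page->mapping != mapping)
                return 0;

        if (PagePrivate(page) && !try_to_release_page(page, 0))
                return 0;

        write_lock_irq(&mapping->tree_lock);
        if (PageDirty(page))
                goto failed;

        BUG_ON(PagePrivate(page));
        __remove_from_page_cache(page);
        write_unlock_irq(&mapping->tree_lock);
        ClearPageUptodate(page);
        page_cache_release(page);       /* drop the page cache's reference */
        return 1;
failed:
        write_unlock_irq(&mapping->tree_lock);
        return 0;
}

Even with something like this in place, the NFS callers (including the
invalidate_inode_pages2_range call in nfs_readdir_filler mentioned above)
would still want to check the return value, since dirty pages and pages
with busy private buffers can still cause the invalidation to fail.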