From mboxrd@z Thu Jan 1 00:00:00 1970 Date: Wed, 31 Oct 2007 19:53:06 -0400 Message-Id: <200710312353.l9VNr67n013016@agora.fsl.cs.sunysb.edu> From: Erez Zadok Subject: Re: msync(2) bug(?), returns AOP_WRITEPAGE_ACTIVATE to userland In-reply-to: Your message of "Mon, 29 Oct 2007 20:33:45 -0000." Sender: owner-linux-mm@kvack.org Return-Path: To: Hugh Dickins Cc: Erez Zadok , Pekka Enberg , Ryan Finnie , Andrew Morton , linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, cjwatson@ubuntu.com, linux-mm@kvack.org List-ID: Hi Hugh, I've addressed all of your concerns and am happy to report that the newly revised unionfs_writepage works even better, including under my memory-pressure conditions. To summarize my changes since the last time: - I'm only masking __GFP_FS, not __GFP_IO - using find_or_create_page to avoid locking issues around mapping mask - handle for_reclaim case more efficiently - using copy_highpage so we handle KM_USER* - un/locking upper/lower page as/when needed - updated comments to clarify what/why - unionfs_sync_page: gone (yes, vfs.txt did confuse me, plus ecryptfs used to have it) Below is the newest version of unionfs_writepage. Let me know what you think. I have to say that with these changes, unionfs appears visibly faster under memory pressure. I suspect the for_reclaim handling is probably the largest contributor to this speedup. Many thanks, Erez. ////////////////////////////////////////////////////////////////////////////// static int unionfs_writepage(struct page *page, struct writeback_control *wbc) { int err = -EIO; struct inode *inode; struct inode *lower_inode; struct page *lower_page; struct address_space *lower_mapping; /* lower inode mapping */ gfp_t mask; inode = page->mapping->host; lower_inode = unionfs_lower_inode(inode); lower_mapping = lower_inode->i_mapping; /* * find lower page (returns a locked page) * * We turn off __GFP_FS while we look for or create a new lower * page. This prevents a recursion into the file system code, which * under memory pressure conditions could lead to a deadlock. This * is similar to how the loop driver behaves (see loop_set_fd in * drivers/block/loop.c). If we can't find the lower page, we * redirty our page and return "success" so that the VM will call us * again in the (hopefully near) future. */ mask = mapping_gfp_mask(lower_mapping) & ~(__GFP_FS); lower_page = find_or_create_page(lower_mapping, page->index, mask); if (!lower_page) { err = 0; set_page_dirty(page); goto out; } /* copy page data from our upper page to the lower page */ copy_highpage(lower_page, page); /* * Call lower writepage (expects locked page). However, if we are * called with wbc->for_reclaim, then the VFS/VM just wants to * reclaim our page. Therefore, we don't need to call the lower * ->writepage: just copy our data to the lower page (already done * above), then mark the lower page dirty and unlock it, and return * success. */ if (wbc->for_reclaim) { set_page_dirty(lower_page); unlock_page(lower_page); goto out_release; } BUG_ON(!lower_mapping->a_ops->writepage); clear_page_dirty_for_io(lower_page); /* emulate VFS behavior */ err = lower_mapping->a_ops->writepage(lower_page, wbc); if (err < 0) { ClearPageUptodate(page); goto out_release; } /* * Lower file systems such as ramfs and tmpfs, may return * AOP_WRITEPAGE_ACTIVATE so that the VM won't try to (pointlessly) * write the page again for a while. But those lower file systems * also set the page dirty bit back again. Since we successfully * copied our page data to the lower page, then the VM will come * back to the lower page (directly) and try to flush it. So we can * save the VM the hassle of coming back to our page and trying to * flush too. Therefore, we don't re-dirty our own page, and we * never return AOP_WRITEPAGE_ACTIVATE back to the VM (we consider * this a success). * * We also unlock the lower page if the lower ->writepage returned * AOP_WRITEPAGE_ACTIVATE. (This "anomalous" behaviour may be * addressed in future shmem/VM code.) */ if (err == AOP_WRITEPAGE_ACTIVATE) { err = 0; unlock_page(lower_page); } /* all is well */ SetPageUptodate(page); /* lower mtimes have changed: update ours */ unionfs_copy_attr_times(inode); out_release: /* b/c find_or_create_page increased refcnt */ page_cache_release(lower_page); out: /* * We unlock our page unconditionally, because we never return * AOP_WRITEPAGE_ACTIVATE. */ unlock_page(page); return err; } ////////////////////////////////////////////////////////////////////////////// -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org