From: Erez Zadok <ezk@cs.sunysb.edu>
To: Hugh Dickins <hugh@veritas.com>
Cc: Erez Zadok <ezk@cs.sunysb.edu>,
Pekka Enberg <penberg@cs.helsinki.fi>,
Ryan Finnie <ryan@finnie.org>,
Andrew Morton <akpm@linux-foundation.org>,
linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
cjwatson@ubuntu.com, linux-mm@kvack.org
Subject: Re: msync(2) bug(?), returns AOP_WRITEPAGE_ACTIVATE to userland
Date: Wed, 31 Oct 2007 19:53:06 -0400
Message-ID: <200710312353.l9VNr67n013016@agora.fsl.cs.sunysb.edu>
In-Reply-To: Your message of "Mon, 29 Oct 2007 20:33:45 -0000." <Pine.LNX.4.64.0710292027310.21528@blonde.wat.veritas.com>
Hi Hugh,

I've addressed all of your concerns, and I'm happy to report that the
newly revised unionfs_writepage works even better, including under my
memory-pressure tests. To summarize my changes since last time:
- I'm only masking __GFP_FS, not __GFP_IO
- using find_or_create_page, which takes a gfp mask directly, to avoid
  locking issues around changing the mapping's mask
- handling the for_reclaim case more efficiently
- using copy_highpage, so the KM_USER* kmap slots are handled for us
- locking and unlocking the upper and lower pages only as needed
- updated comments to explain what the code does and why
- removed unionfs_sync_page (yes, vfs.txt did confuse me, and ecryptfs
  used to have one)
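Since the first point is easy to get backwards, here is a minimal
userland sketch of the mask arithmetic. The SKETCH_GFP_* names and their
bit values are hypothetical stand-ins, not the kernel's definitions from
<linux/gfp.h>; only the operation (clear the FS bit, leave every other
bit alone) mirrors the mask line in unionfs_writepage.

```c
#include <assert.h>

/*
 * Hypothetical stand-ins for the kernel's gfp bits; the real __GFP_*
 * values live in <linux/gfp.h> and are not guaranteed to match these.
 */
typedef unsigned int sketch_gfp_t;
#define SKETCH_GFP_WAIT 0x10u /* may sleep */
#define SKETCH_GFP_IO   0x40u /* may start physical I/O */
#define SKETCH_GFP_FS   0x80u /* may call back into file system code */

/*
 * Mask used for the lower page lookup: drop only the FS bit, so the
 * allocation may still sleep and start I/O, but reclaim triggered by
 * it cannot recurse into file system code.
 */
static sketch_gfp_t lower_lookup_mask(sketch_gfp_t mapping_mask)
{
	sketch_gfp_t mask = mapping_mask & ~SKETCH_GFP_FS;

	/* every bit other than FS must be unchanged */
	assert((mask | SKETCH_GFP_FS) == (mapping_mask | SKETCH_GFP_FS));
	return mask;
}
```

With a GFP_KERNEL-like input of SKETCH_GFP_WAIT | SKETCH_GFP_IO |
SKETCH_GFP_FS (0xd0), lower_lookup_mask returns 0x50: I/O is still
allowed, recursion into the file system is not.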
Below is the newest version of unionfs_writepage. Let me know what you
think.
I have to say that with these changes, unionfs appears visibly faster under
memory pressure. I suspect the for_reclaim handling is the largest
contributor to this speedup.
Many thanks,
Erez.
//////////////////////////////////////////////////////////////////////////////
static int unionfs_writepage(struct page *page, struct writeback_control *wbc)
{
	int err = -EIO;
	struct inode *inode;
	struct inode *lower_inode;
	struct page *lower_page;
	struct address_space *lower_mapping; /* lower inode mapping */
	gfp_t mask;

	inode = page->mapping->host;
	lower_inode = unionfs_lower_inode(inode);
	lower_mapping = lower_inode->i_mapping;

	/*
	 * find lower page (returns a locked page)
	 *
	 * We turn off __GFP_FS while we look for or create a new lower
	 * page. This prevents a recursion into the file system code, which
	 * under memory pressure conditions could lead to a deadlock. This
	 * is similar to how the loop driver behaves (see loop_set_fd in
	 * drivers/block/loop.c). If we can't find or create the lower page,
	 * we redirty our page and return "success" so that the VM will call
	 * us again in the (hopefully near) future.
	 */
	mask = mapping_gfp_mask(lower_mapping) & ~(__GFP_FS);
	lower_page = find_or_create_page(lower_mapping, page->index, mask);
	if (!lower_page) {
		err = 0;
		set_page_dirty(page);
		goto out;
	}

	/* copy page data from our upper page to the lower page */
	copy_highpage(lower_page, page);
	flush_dcache_page(lower_page);
	/* a full-page copy makes a newly created lower page uptodate */
	SetPageUptodate(lower_page);
	/* dirty the lower page before taking either path below */
	set_page_dirty(lower_page);

	/*
	 * Call lower writepage (expects locked page). However, if we are
	 * called with wbc->for_reclaim, then the VFS/VM just wants to
	 * reclaim our page. Therefore, we don't need to call the lower
	 * ->writepage: just copy our data to the lower page and mark it
	 * dirty (both already done above), then unlock it and return
	 * success.
	 */
	if (wbc->for_reclaim) {
		unlock_page(lower_page);
		goto out_release;
	}

	BUG_ON(!lower_mapping->a_ops->writepage);
	/* emulate VFS behavior: no writeback in flight, dirty bit cleared */
	wait_on_page_writeback(lower_page);
	clear_page_dirty_for_io(lower_page);
	err = lower_mapping->a_ops->writepage(lower_page, wbc);
	if (err < 0) {
		ClearPageUptodate(page);
		goto out_release;
	}

	/*
	 * Lower file systems such as ramfs and tmpfs may return
	 * AOP_WRITEPAGE_ACTIVATE so that the VM won't try to (pointlessly)
	 * write the page again for a while. But those lower file systems
	 * also set the page dirty bit back again. Since we have
	 * successfully copied our page data to the lower page, the VM will
	 * come back to the lower page (directly) and try to flush it. So we
	 * can save the VM the hassle of coming back to our page and trying
	 * to flush it too. Therefore, we don't re-dirty our own page, and
	 * we never return AOP_WRITEPAGE_ACTIVATE back to the VM (we
	 * consider this a success).
	 *
	 * We also unlock the lower page if the lower ->writepage returned
	 * AOP_WRITEPAGE_ACTIVATE. (This "anomalous" behaviour may be
	 * addressed in future shmem/VM code.)
	 */
	if (err == AOP_WRITEPAGE_ACTIVATE) {
		err = 0;
		unlock_page(lower_page);
	}

	/* all is well */
	SetPageUptodate(page);
	/* lower mtimes have changed: update ours */
	unionfs_copy_attr_times(inode);

out_release:
	/* b/c find_or_create_page increased refcnt */
	page_cache_release(lower_page);
out:
	/*
	 * We unlock our page unconditionally, because we never return
	 * AOP_WRITEPAGE_ACTIVATE.
	 */
	unlock_page(page);
	return err;
}
//////////////////////////////////////////////////////////////////////////////
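To make the control flow above easier to review, here is a userland
model of the decisions unionfs_writepage makes on each path. It is not
kernel code: struct wp_outcome, model_writepage and
SKETCH_AOP_WRITEPAGE_ACTIVATE are hypothetical names, and the constant's
value is assumed to match AOP_WRITEPAGE_ACTIVATE in <linux/fs.h>. The
model only records which actions are taken; the upper page, which is
always unlocked at out:, is not modeled.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed to match AOP_WRITEPAGE_ACTIVATE in <linux/fs.h>. */
#define SKETCH_AOP_WRITEPAGE_ACTIVATE 0x80000

/* What the modeled unionfs_writepage does in one call. */
struct wp_outcome {
	int ret;            /* value returned to the VM */
	bool redirty_upper; /* set_page_dirty(page) on our own page */
	bool call_lower;    /* lower ->writepage was invoked */
	bool unlock_lower;  /* unionfs itself unlocks the lower page */
};

/*
 * lower_found: whether find_or_create_page() returned a page.
 * for_reclaim: wbc->for_reclaim.
 * lower_err:   what the lower ->writepage returns (ignored if not called).
 */
static struct wp_outcome model_writepage(bool lower_found, bool for_reclaim,
					 int lower_err)
{
	struct wp_outcome o = { 0, false, false, false };

	if (!lower_found) {
		/* no lower page: redirty ours, report success, retry later */
		o.redirty_upper = true;
		return o;
	}
	if (for_reclaim) {
		/* data already copied down: unlock the lower page, skip I/O */
		o.unlock_lower = true;
		return o;
	}

	o.call_lower = true;
	/* a negative errno is passed up; the lower ->writepage unlocked */
	o.ret = lower_err;
	if (lower_err == SKETCH_AOP_WRITEPAGE_ACTIVATE) {
		/* lower page came back still locked; never pass ACTIVATE up */
		o.ret = 0;
		o.unlock_lower = true;
	}
	return o;
}
```

For example, model_writepage(true, false, SKETCH_AOP_WRITEPAGE_ACTIVATE)
yields ret == 0 with unlock_lower set, matching the rule that
AOP_WRITEPAGE_ACTIVATE is never returned to the VM.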