linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Erez Zadok <ezk@cs.sunysb.edu>
To: Hugh Dickins <hugh@veritas.com>
Cc: Erez Zadok <ezk@cs.sunysb.edu>,
	Pekka Enberg <penberg@cs.helsinki.fi>,
	Ryan Finnie <ryan@finnie.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	cjwatson@ubuntu.com, linux-mm@kvack.org
Subject: Re: msync(2) bug(?), returns AOP_WRITEPAGE_ACTIVATE to userland
Date: Sun, 28 Oct 2007 16:23:20 -0400	[thread overview]
Message-ID: <200710282023.l9SKNKK0031790@agora.fsl.cs.sunysb.edu> (raw)
In-Reply-To: Your message of "Thu, 25 Oct 2007 19:03:14 BST." <Pine.LNX.4.64.0710251743190.9834@blonde.wat.veritas.com>

Huge,

I took your advise regarding ~(__GFP_FS|__GFP_IO), AOP_WRITEPAGE_ACTIVATE,
and such.  I revised my unionfs_writepage and unionfs_sync_page, and tested
it under memory pressure: I have a couple of live CDs that use tmpfs and can
deterministically reproduce the conditions resulting in A_W_A.  I also went
back to using grab_cache_page but with the gfp_mask suggestions you made.

I'm happy to report that it all works great now!  Below is the entirety of
the new unionfs_mmap and unionfs_sync_page code.  I'd appreciate if you and
others can look it over and see if you find any problems.

Thanks,
Erez.


static int unionfs_writepage(struct page *page, struct writeback_control *wbc)
{
	int err = -EIO;
	struct inode *inode;
	struct inode *lower_inode;
	struct page *lower_page;
	char *kaddr, *lower_kaddr;
	struct address_space *mapping; /* lower inode mapping */
	gfp_t old_gfp_mask;

	inode = page->mapping->host;
	lower_inode = unionfs_lower_inode(inode);
	mapping = lower_inode->i_mapping;

	/*
	 * find lower page (returns a locked page)
	 *
	 * We turn off __GFP_IO|__GFP_FS so as to prevent a deadlock under
	 * memory pressure conditions.  This is similar to how the loop
	 * driver behaves (see loop_set_fd in drivers/block/loop.c).
	 * If we can't find the lower page, we redirty our page and return
	 * "success" so that the VM will call us again in the (hopefully
	 * near) future.
	 */
	old_gfp_mask = mapping_gfp_mask(mapping);
	mapping_set_gfp_mask(mapping, old_gfp_mask & ~(__GFP_IO|__GFP_FS));

	lower_page = grab_cache_page(mapping, page->index);
	mapping_set_gfp_mask(mapping, old_gfp_mask);

	if (!lower_page) {
		err = 0;
		set_page_dirty(page);
		goto out;
	}

	/* get page address, and encode it */
	kaddr = kmap(page);
	lower_kaddr = kmap(lower_page);

	memcpy(lower_kaddr, kaddr, PAGE_CACHE_SIZE);

	kunmap(page);
	kunmap(lower_page);

	BUG_ON(!mapping->a_ops->writepage);

	/* call lower writepage (expects locked page) */
	clear_page_dirty_for_io(lower_page); /* emulate VFS behavior */
	err = mapping->a_ops->writepage(lower_page, wbc);

	/* b/c grab_cache_page locked it and ->writepage unlocks on success */
	if (err)
		unlock_page(lower_page);
	/* b/c grab_cache_page increased refcnt */
	page_cache_release(lower_page);

	if (err < 0) {
		ClearPageUptodate(page);
		goto out;
	}
	/*
	 * Lower file systems such as ramfs and tmpfs, may return
	 * AOP_WRITEPAGE_ACTIVATE so that the VM won't try to (pointlessly)
	 * write the page again for a while.  But those lower file systems
	 * also set the page dirty bit back again.  Since we successfully
	 * copied our page data to the lower page, then the VM will come
	 * back to the lower page (directly) and try to flush it.  So we can
	 * save the VM the hassle of coming back to our page and trying to
	 * flush too.  Therefore, we don't re-dirty our own page, and we
	 * don't return AOP_WRITEPAGE_ACTIVATE back to the VM (we consider
	 * this a success).
	 */
	if (err == AOP_WRITEPAGE_ACTIVATE)
		err = 0;

	/* all is well */
	SetPageUptodate(page);
	/* lower mtimes has changed: update ours */
	unionfs_copy_attr_times(inode);

	unlock_page(page);

out:
	return err;
}


static void unionfs_sync_page(struct page *page)
{
	struct inode *inode;
	struct inode *lower_inode;
	struct page *lower_page;
	struct address_space *mapping; /* lower inode mapping */
	gfp_t old_gfp_mask;

	inode = page->mapping->host;
	lower_inode = unionfs_lower_inode(inode);
	mapping = lower_inode->i_mapping;

	/*
	 * Find lower page (returns a locked page).
	 *
	 * We turn off __GFP_IO|__GFP_FS so as to prevent a deadlock under
	 * memory pressure conditions.  This is similar to how the loop
	 * driver behaves (see loop_set_fd in drivers/block/loop.c).
	 * If we can't find the lower page, we redirty our page and return
	 * "success" so that the VM will call us again in the (hopefully
	 * near) future.
	 */
	old_gfp_mask = mapping_gfp_mask(mapping);
	mapping_set_gfp_mask(mapping, old_gfp_mask & ~(__GFP_IO|__GFP_FS));

	lower_page = grab_cache_page(mapping, page->index);
	mapping_set_gfp_mask(mapping, old_gfp_mask);

	if (!lower_page) {
		printk(KERN_ERR "unionfs: grab_cache_page failed\n");
		goto out;
	}

	/* do the actual sync */

	/*
	 * XXX: can we optimize ala RAIF and set the lower page to be
	 * discarded after a successful sync_page?
	 */
	if (mapping && mapping->a_ops && mapping->a_ops->sync_page)
		mapping->a_ops->sync_page(lower_page);

	/* b/c grab_cache_page locked it */
	unlock_page(lower_page);
	/* b/c grab_cache_page increased refcnt */
	page_cache_release(lower_page);

out:
	return;
}

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  parent reply	other threads:[~2007-10-28 20:23 UTC|newest]

Thread overview: 49+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <200710071920.l97JKJX5018871@agora.fsl.cs.sunysb.edu>
2007-10-11 21:47 ` Andrew Morton
2007-10-11 22:12   ` Ryan Finnie
2007-10-12  0:38     ` Hugh Dickins
2007-10-12 21:45       ` Pekka Enberg
2007-10-14  8:44         ` Hugh Dickins
2007-10-14 17:09           ` Pekka Enberg
2007-10-14 17:23             ` Erez Zadok
2007-10-14 17:50               ` Pekka J Enberg
2007-10-14 22:32                 ` Erez Zadok
2007-10-15 11:47                   ` Pekka Enberg
2007-10-16 18:02                     ` Erez Zadok
2007-10-22 20:16                     ` Hugh Dickins
2007-10-22 20:48                       ` Pekka Enberg
2007-10-25 15:36                         ` Hugh Dickins
2007-10-25 16:44                           ` Erez Zadok
2007-10-25 18:23                             ` Hugh Dickins
2007-10-26  2:00                           ` Neil Brown
2007-10-26  8:09                             ` Pekka Enberg
2007-10-26 11:26                             ` Hugh Dickins
2007-10-26  8:05                           ` Pekka Enberg
2007-10-22 21:04                       ` Erez Zadok
2007-10-25 16:40                         ` Hugh Dickins
2007-10-24 21:02                       ` [PATCH] fix tmpfs BUG and AOP_WRITEPAGE_ACTIVATE Hugh Dickins
2007-10-24 21:08                         ` Andrew Morton
2007-10-24 21:37                           ` [PATCH+comment] " Hugh Dickins
2007-10-25  5:37                             ` Pekka Enberg
2007-10-25  6:30                               ` Hugh Dickins
2007-10-25  7:24                                 ` Pekka Enberg
2007-10-25 16:01                                 ` Erez Zadok
2007-10-25 20:51                                   ` H. Peter Anvin
2007-10-22 20:01                   ` msync(2) bug(?), returns AOP_WRITEPAGE_ACTIVATE to userland Hugh Dickins
2007-10-22 20:40                     ` Pekka Enberg
2007-10-22 19:42               ` Hugh Dickins
2007-10-22 21:38                 ` Erez Zadok
2007-10-25 18:03                   ` Hugh Dickins
2007-10-27 20:47                     ` Erez Zadok
2007-10-28 20:23                     ` Erez Zadok [this message]
2007-10-29 20:33                       ` Hugh Dickins
2007-10-31 23:53                         ` Erez Zadok
2007-11-05 15:40                           ` Hugh Dickins
2007-11-05 16:38                             ` Dave Hansen
2007-11-05 18:57                               ` Hugh Dickins
2007-11-09  2:47                               ` Erez Zadok
2007-11-09  6:05                             ` Erez Zadok
2007-11-12  5:41                               ` Hugh Dickins
2007-11-12 17:01                               ` Hugh Dickins
2007-11-13 10:18                                 ` Erez Zadok
2007-11-17 21:24                                   ` Hugh Dickins
2007-11-20  1:30                                     ` Erez Zadok

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=200710282023.l9SKNKK0031790@agora.fsl.cs.sunysb.edu \
    --to=ezk@cs.sunysb.edu \
    --cc=akpm@linux-foundation.org \
    --cc=cjwatson@ubuntu.com \
    --cc=hugh@veritas.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=penberg@cs.helsinki.fi \
    --cc=ryan@finnie.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox