Re: msync(2) bug(?), returns AOP_WRITEPAGE_ACTIVATE to userland

linux-mm.kvack.org archive mirror

From: Erez Zadok <ezk@cs.sunysb.edu>
To: Hugh Dickins <hugh@veritas.com>
Cc: Erez Zadok <ezk@cs.sunysb.edu>, Dave Hansen <haveblue@us.ibm.com>,
	Pekka Enberg <penberg@cs.helsinki.fi>,
	Ryan Finnie <ryan@finnie.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	cjwatson@ubuntu.com, linux-mm@kvack.org
Subject: Re: msync(2) bug(?), returns AOP_WRITEPAGE_ACTIVATE to userland
Date: Fri, 9 Nov 2007 01:05:11 -0500
Message-ID: <200711090605.lA965B1S024066@agora.fsl.cs.sunysb.edu>
In-Reply-To: Your message of "Mon, 05 Nov 2007 15:40:51 GMT." <Pine.LNX.4.64.0711051358440.7629@blonde.wat.veritas.com>

In message <Pine.LNX.4.64.0711051358440.7629@blonde.wat.veritas.com>, Hugh Dickins writes:
> [Dave, I've Cc'ed you re handle_write_count_underflow, see below.]
> 
> On Wed, 31 Oct 2007, Erez Zadok wrote:
> > 
> > Hi Hugh, I've addressed all of your concerns and am happy to report that the
> > newly revised unionfs_writepage works even better, including under my
> > memory-pressure conditions.  To summarize my changes since the last time:
> > 
> > - I'm only masking __GFP_FS, not __GFP_IO
> > - using find_or_create_page to avoid locking issues around mapping mask
> > - handle for_reclaim case more efficiently
> > - using copy_highpage so we handle KM_USER*
> > - un/locking upper/lower page as/when needed
> > - updated comments to clarify what/why
> > - unionfs_sync_page: gone (yes, vfs.txt did confuse me, plus ecryptfs used
> >   to have it)
> > 
> > Below is the newest version of unionfs_writepage.  Let me know what you
> > think.
> > 
> > I have to say that with these changes, unionfs appears visibly faster under
> > memory pressure.  I suspect the for_reclaim handling is probably the largest
> > contributor to this speedup.
> 
> That's good news, and that unionfs_writepage looks good to me -
> with three reservations I've not observed before.
> 
> One, I think you would be safer to do a set_page_dirty(lower_page)
> before your clear_page_dirty_for_io(lower_page).  I know that sounds
> silly, but see Linus' "Yes, Virginia" comment in clear_page_dirty_for_io:
> there's a lot of subtlety hereabouts, and I think you'd be mimicking the
> usual path closer if you set_page_dirty first - there's nothing else
> doing it on that lower_page, is there?  I'm not certain that you need
> to, but I think you'd do well to look into it and make up your own mind.

Hugh, my code looks like:

	if (wbc->for_reclaim) {
		set_page_dirty(lower_page);
		unlock_page(lower_page);
		goto out_release;
	}
	BUG_ON(!lower_mapping->a_ops->writepage);
	clear_page_dirty_for_io(lower_page); /* emulate VFS behavior */
	err = lower_mapping->a_ops->writepage(lower_page, wbc);

Do you mean I should call set_page_dirty(lower_page) unconditionally before
clear_page_dirty_for_io()?  (I already call it inside the 'if' block above.)
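
A toy userspace model may help frame the question.  It assumes,
hypothetically, that the freshly copied lower page can be clean (nothing has
dirtied it yet), and it stands in for set_page_dirty(),
clear_page_dirty_for_io() and the lower ->writepage with invented functions
(model_set_dirty() and friends are not kernel APIs).  It shows that without
the extra set_page_dirty() step, clear_page_dirty_for_io() may run on a page
that was never dirty, which the generic writeback path, as far as I can
tell, does not do before calling ->writepage.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a page: only the dirty bit, plus a counter of
 * how many times the lower ->writepage ran.  Not kernel structures. */
struct model_page {
	bool dirty;
	int writes;
};

/* Models set_page_dirty(): mark the page dirty. */
static void model_set_dirty(struct model_page *p)
{
	p->dirty = true;
}

/* Models clear_page_dirty_for_io(): clear the bit and report whether
 * the page was dirty beforehand. */
static bool model_clear_dirty_for_io(struct model_page *p)
{
	bool was_dirty = p->dirty;

	p->dirty = false;
	return was_dirty;
}

/* Models a lower ->writepage that always succeeds. */
static int model_lower_writepage(struct model_page *p)
{
	p->writes++;
	return 0;
}

/*
 * Models the tail of the upper writepage after the data has been copied
 * into the lower page.  Returns 1 if the lower page was dirty when it
 * was cleared for I/O, 0 if it was already clean, or -1 if the write was
 * deferred because of reclaim.
 */
static int model_upper_writepage(struct model_page *lower,
				 bool for_reclaim, bool set_first)
{
	int was_dirty;

	if (for_reclaim) {
		model_set_dirty(lower); /* as in the code above */
		return -1;              /* lower page stays dirty */
	}
	if (set_first)
		model_set_dirty(lower); /* the suggested extra step */
	was_dirty = model_clear_dirty_for_io(lower);
	model_lower_writepage(lower);
	return was_dirty;
}
```

With set_first the model reports a dirty page at clear time, matching the
usual path; without it, the clear sees a clean page while the write still
goes ahead.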

> Two, I'm unsure of the way you're clearing or setting PageUptodate on
> the upper page there.  The rules for PageUptodate are fairly obvious
> when reading, but when a write fails, it's not so obvious.  Again, I'm
> not saying what you've got is wrong (it may be unavoidable, to keep
> synch between lower and upper), but it deserves a second thought.

I looked at every mainline filesystem's ->writepage to see what, if
anything, it does with the page's uptodate flag.  Most f/s don't touch the
flag one way or the other.

cifs_writepage sets the uptodate flag unconditionally: why?

ecryptfs_writepage has a legitimate reason: if encrypting the page fails, it
doesn't want anyone to use the page, so it clears the page's uptodate flag
(otherwise it marks the page uptodate).

hostfs_writepage clears PageUptodate if write_file() fails; I'm not sure
whether that makes sense.

ntfs_writepage goes as far as BUG_ON(!PageUptodate(page)), which suggests
to me that a page passed to ->writepage should always be uptodate.  Is that
a fair statement?

smb_writepage pretty much unconditionally calls SetPageUptodate(page).  Why?

Is there a reason smbfs and cifs both do this unconditionally?  If so, then
why is ntfs calling BUG_ON if the page isn't uptodate?  Either that BUG_ON
in ntfs is redundant, or cifs/smbfs's SetPageUptodate is redundant, but they
can't both be right.

And finally, unionfs clears the uptodate flag when the lower ->writepage
fails, and sets it when the lower ->writepage succeeds.  My gut feeling is
that unionfs shouldn't change the page's uptodate flag at all: if the VFS
passes unionfs_writepage a page which isn't uptodate, then the VFS has a
serious problem, because it would be asking a f/s to write out a page which
isn't up-to-date, right?  And whether or not unionfs_writepage manages to
write the lower page, why should that invalidate the state of the unionfs
page itself?

Come to think of it, clearing PageUptodate on error from
->writepage(lower_page) may be harmful.  Imagine that after such a failed
unionfs_writepage, I get a unionfs_readpage: that ->readpage will get data
from the lower f/s page and copy it *over* the unionfs page, even if the
upper page's data was more recent before the failed call to
unionfs_writepage.  IOW, we could be reverting a user-visible mmap'ed page
to a previous on-disk version.  What do you think: could this happen?
Anyway, I'll run some exhaustive testing next and see what happens if I
don't set/clear the uptodate flag in unionfs_writepage.
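
The revert scenario above can be sketched as a toy userspace model.  It
assumes, hypothetically, that after the failed lower write the lower side
still serves the older on-disk contents (for example, because the lower page
was dropped), and that a later access triggers a readpage only when the
upper page is not uptodate.  The structures, the CLEAR_ON_ERROR/LEAVE_ALONE
policies and the model_* functions are invented for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <string.h>

#define MODEL_PAGE_SIZE 8 /* tiny "page", for illustration only */

/* Hypothetical models of the upper (unionfs) and lower pages. */
struct upper_page {
	char data[MODEL_PAGE_SIZE];
	bool uptodate;
};

struct lower_page {
	char data[MODEL_PAGE_SIZE]; /* assumed: older on-disk contents */
};

enum uptodate_policy {
	CLEAR_ON_ERROR, /* current unionfs behaviour */
	LEAVE_ALONE     /* the alternative under consideration */
};

/* Models unionfs_writepage when the lower ->writepage fails: the newer
 * upper data never reaches the lower side. */
static int model_failed_writepage(struct upper_page *up,
				  enum uptodate_policy policy)
{
	if (policy == CLEAR_ON_ERROR)
		up->uptodate = false;
	return -EIO;
}

/* Models a later access: only a non-uptodate upper page is re-read, and
 * the readpage copies the lower data over the upper page. */
static void model_access(struct upper_page *up, const struct lower_page *low)
{
	if (!up->uptodate) {
		memcpy(up->data, low->data, MODEL_PAGE_SIZE);
		up->uptodate = true;
	}
}
```

Under these assumptions, CLEAR_ON_ERROR lets the later readpage replace the
newer data with "old", while LEAVE_ALONE keeps the newer data.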

> Three, I believe you need to add a flush_dcache_page(lower_page)
> after the copy_highpage(lower_page): some architectures will need
> that to see the new data if they have lower_page mapped (though I
> expect it's anyway shaky ground to be accessing through the lower
> mount at the same time as modifying through the upper).

OK.
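
To illustrate why flush_dcache_page() matters after copy_highpage(), here is
a toy model of a virtually indexed, write-back data cache with two aliases of
one physical page (a kernel mapping and a user mmap).  It is a
simplification under stated assumptions: real aliasing behaviour is
architecture-specific, and model_flush_dcache() only imitates the effect
flush_dcache_page() is meant to have here, not its implementation.  All
names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MODEL_LINE 8 /* tiny "page" size for the model */

/* Hypothetical cache slot for one virtual alias of the page. */
struct alias {
	char data[MODEL_LINE];
	bool valid;
	bool dirty;
};

struct model_mem {
	char phys[MODEL_LINE]; /* the physical page */
	struct alias kern;     /* kernel alias (e.g. a kmap address) */
	struct alias user;     /* a user mmap of the same page */
};

/* Read through the user alias: hit in its cache slot, else fill it. */
static const char *user_read(struct model_mem *m)
{
	if (!m->user.valid) {
		memcpy(m->user.data, m->phys, MODEL_LINE);
		m->user.valid = true;
	}
	return m->user.data;
}

/* Models copy_highpage() into the page: the stores land only in the
 * kernel alias's cache slot (write-back), not in memory. */
static void kernel_copy_in(struct model_mem *m, const char src[MODEL_LINE])
{
	memcpy(m->kern.data, src, MODEL_LINE);
	m->kern.valid = true;
	m->kern.dirty = true;
}

/* Models the intended effect of flush_dcache_page(): write back the
 * kernel alias and drop the stale user alias. */
static void model_flush_dcache(struct model_mem *m)
{
	if (m->kern.dirty) {
		memcpy(m->phys, m->kern.data, MODEL_LINE);
		m->kern.dirty = false;
	}
	m->user.valid = false;
}
```

In this model, a user mapping that already cached the page keeps seeing the
old contents after the kernel copy until the flush is done.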

> For now I'm doing repeated make -j20 kernel builds, pushing into
> swap, in a unionfs mount of just a single dir on tmpfs.  This has
> shown up several problems, two of which I've had to hack around to
> get further.
[...]

Thanks.  I'll look more closely into these issues and your patches, and post
my findings.

Erez.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
