From: Sage Weil <sage@newdream.net>
To: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: Nick Piggin <npiggin@suse.de>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-fsdevel@vger.kernel.org,
	Linux Memory Management List <linux-mm@kvack.org>,
	Chris Mason <chris.mason@oracle.com>
Subject: Re: [patch 2/2] fs: fix page_mkwrite error cases in core code and btrfs
Date: Thu, 12 Mar 2009 16:03:57 -0700 (PDT)
Message-ID: <Pine.LNX.4.64.0903121511300.30231@cobra.newdream.net>
In-Reply-To: <1236895724.7179.71.camel@heimdal.trondhjem.org>

On Thu, 12 Mar 2009, Trond Myklebust wrote:
> On Wed, 2009-03-11 at 04:55 +0100, Nick Piggin wrote:
> > page_mkwrite is called with neither the page lock nor the ptl held. This
> > means a page can be concurrently truncated or invalidated out from underneath
> > it. Callers are supposed to prevent truncate races themselves, however
> > previously the only thing they can do in case they hit one is to raise a
> > SIGBUS. A sigbus is wrong for the case that the page has been invalidated
> > or truncated within i_size (eg. hole punched). Callers may also have to
> > perform memory allocations in this path, where again, SIGBUS would be wrong.
> > 
> > The previous patch made it possible to properly specify errors. Convert
> > the generic buffer.c code and btrfs to return sane error values
> > (in the case of page removed from pagecache, VM_FAULT_NOPAGE will cause the
> > fault handler to exit without doing anything, and the fault will be retried 
> > properly).
> > 
> > This fixes core code, and converts btrfs as a template/example. All other
> > filesystems defining their own page_mkwrite should be fixed in a similar
> > manner.
> 
> There appears to be another atomicity problem in the same area of
> code...
> 
> The lack of locking between the call to ->page_mkwrite() and the
> subsequent call to set_page_dirty_balance() means that the filesystem
> may actually already have written out the page by the time you get round
> to calling set_page_dirty_balance().

We were just banging our heads against this issue last week.

Among other things, if ->set_page_dirty sets up anything in page->private, 
you can get an ->invalidatepage on a non-dirty page (which confused the 
hell out of me until I realized do_wp_page() was calling set_page_dirty 
too).

> How then is the filesystem supposed to guarantee that whatever structure
> it allocated in page_mkwrite() is still around when the page gets marked
> as dirty a second time?

Can page_mkwrite() be made responsible for marking the page dirty, instead 
of doing it from do_wp_page()?  That would allow the fs to do the dirtying 
under the protection of the page lock, or whatever other internal locking 
scheme it has.  That's how the regular write path works, and it would be 
nice to be able to just call write_{begin,end} from ->page_mkwrite() (as 
at least ext4 does) without being followed by a second racy call to 
->set_page_dirty()...
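
To make the ordering concrete, here is a rough userspace toy model of the 
two sequences.  It is not kernel code: every toy_* name is made up for 
illustration, the page lock is a plain flag, and PTE handling is not 
modelled at all.  It assumes an fs whose page_mkwrite sets up per-page 
state and dirties the page under the page lock (roughly what calling 
write_{begin,end} would do), and whose writeback drops that state once 
the page is clean.

```c
/* Userspace toy model only; none of these names exist in the kernel. */
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

struct toy_page {
	bool locked;	/* stand-in for the page lock */
	bool dirty;	/* stand-in for PG_dirty */
	void *private;	/* fs state set up by toy_page_mkwrite() */
};

static void toy_lock_page(struct toy_page *p)
{
	assert(!p->locked);
	p->locked = true;
}

static void toy_unlock_page(struct toy_page *p)
{
	assert(p->locked);
	p->locked = false;
}

/* The fs sets up its state and dirties the page, all under the lock. */
static int toy_page_mkwrite(struct toy_page *p)
{
	toy_lock_page(p);
	if (!p->private)
		p->private = malloc(1);
	if (!p->private) {
		toy_unlock_page(p);
		return -1;
	}
	p->dirty = true;
	toy_unlock_page(p);
	return 0;
}

/* Writeback cleans a dirty page and releases the fs state. */
static void toy_writeback(struct toy_page *p)
{
	toy_lock_page(p);
	if (p->dirty) {
		p->dirty = false;
		free(p->private);
		p->private = NULL;
	}
	toy_unlock_page(p);
}

/* Current ordering: the fault path dirties a second time, unlocked. */
static int toy_fault_current(struct toy_page *p, bool writeback_races)
{
	if (toy_page_mkwrite(p))
		return -1;
	if (writeback_races)
		toy_writeback(p);	/* runs in the unlocked window */
	p->dirty = true;		/* second, racy dirtying */
	return 0;
}

/* Proposed ordering: page_mkwrite alone is responsible for dirtying. */
static int toy_fault_proposed(struct toy_page *p, bool writeback_races)
{
	if (toy_page_mkwrite(p))
		return -1;
	if (writeback_races)
		toy_writeback(p);
	return 0;
}

/* Invariant the fs wants: a dirty page always has its state attached. */
static bool toy_page_consistent(const struct toy_page *p)
{
	return !p->dirty || p->private != NULL;
}
```

With writeback_races set, toy_fault_current() leaves the page dirty with 
no private state, so toy_page_consistent() fails; toy_fault_proposed() 
keeps the invariant because nothing dirties the page outside the lock.  
Whether the write fault itself is then handled correctly is a separate 
question this sketch does not try to answer.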

sage

