Re: Status of buffered write path (deadlock fixes)

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

From: Mark Fasheh <mark.fasheh@oracle.com>
To: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Linux Memory Management <linux-mm@kvack.org>,
	linux-fsdevel@vger.kernel.org,
	linux-kernel <linux-kernel@vger.kernel.org>,
	OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>,
	Andrew Morton <akpm@google.com>
Subject: Re: Status of buffered write path (deadlock fixes)
Date: Tue, 12 Dec 2006 14:31:09 -0800	[thread overview]
Message-ID: <20061212223109.GG6831@ca-server1.us.oracle.com> (raw)
In-Reply-To: <457D7EBA.7070005@yahoo.com.au>

On Tue, Dec 12, 2006 at 02:52:26AM +1100, Nick Piggin wrote:
> Nick Piggin wrote:
> >Mark Fasheh wrote:
> 
> >>->commit_write() would probably do fine. Currently, block_prepare_write()
> >>uses it to know which buffers were newly allocated (the file system 
> >>specific
> >>get_block_t sets the bit after allocation). I think we could safely move
> >>the clearing of that bit to block_commit_write(), thus still allowing 
> >>us to
> >>detect and zero those blocks in generic_file_buffered_write()
> >
> >
> >OK, great, I'll make a few patches and see how they look. What did you
> >think of those other uninitialised buffer problems in my first email?
> 
> Hmm, doesn't look like we can do this either because at least GFS2
> uses BH_New for its own special things.
> 
> Also, I don't know if the trick of only walking over BH_New buffers
> will work anyway, since we may still need to release resources on
> other buffers as well.

Oh, my idea was that only the range passed to ->commit() would be walked,
but any BH_New buffers (regardless of where they are in the page) would be
passed to the journal as well. So the logic would be:

for all the buffers in the page:
  If the buffer is new, or it is within the range passed to commit, pass to
  the journal.

Is there anything I'm still missing here?


> As you say, filesystems are simply not set up to expect this, which is a
> problem.
> 
> Maybe it isn't realistic to change the API this way, no matter how
> bad it is presently.

We definitely agree. It's not intuitive that the range should change
between ->prepare_write() and ->commit_write() and IMHO, all the issues
we've found are good evidence that this particular approach will be
problematic.


> What if we tackle the problem a different way?
> 
> 1. In the case of no page in the pagecache (or an otherwise
> !uptodate page), if the operation is a full-page write then we
> first copy all the user data *then* lock the page *then* insert it
> into pagecache and go on to call into the filesystem.

Silly question - what's preventing a reader from filling the !uptodate page with disk
data while the writer is copying the user buffer into it?


> 2. In the case of a !uptodate page and a partial-page write, then
> we can first bring the page uptodate, then continue (goto 3).
> 
> 3. In the case of an uptodate page, we could perform a full-length
> commit_write so long as we didn't expand i_size further than was
> copied, and were sure to trim off blocks allocated past that
> point.
> 
> This scheme IMO is not as "nice" as the partial commit patches,
> but in practical terms it may be much more realistic.

It seems more realistic in that it makes sure the write is properly setup
before calling into the file system. What do you think is not as nice about
it?
	--Mark

--
Mark Fasheh
Senior Software Developer, Oracle
mark.fasheh@oracle.com

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

next prev parent reply	other threads:[~2006-12-12 22:31 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2006-12-05  6:52 Nick Piggin
2006-12-07 19:55 ` Mark Fasheh
2006-12-08  3:28   ` Nick Piggin
2006-12-08 23:48     ` Mark Fasheh
2006-12-11  9:11       ` Nick Piggin
2006-12-11 14:20         ` Nick Piggin
2006-12-11 15:52         ` Nick Piggin
2006-12-11 16:12           ` Steven Whitehouse
2006-12-11 16:39             ` Nick Piggin
2006-12-11 17:18               ` Steven Whitehouse
2006-12-12 22:31           ` Mark Fasheh [this message]
2006-12-13  0:53             ` Nick Piggin
2006-12-13  1:47               ` Trond Myklebust
2006-12-13  1:56                 ` Nick Piggin
2006-12-13  2:31                   ` Trond Myklebust
2006-12-13  4:03                     ` Nick Piggin
2006-12-13 12:21                       ` Trond Myklebust
2006-12-13 13:49                     ` Peter Staubach
2006-12-13 13:55                       ` Trond Myklebust
2006-12-11 18:17 ` OGAWA Hirofumi

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20061212223109.GG6831@ca-server1.us.oracle.com \
    --to=mark.fasheh@oracle.com \
    --cc=akpm@google.com \
    --cc=hirofumi@mail.parknet.co.jp \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=nickpiggin@yahoo.com.au \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox