linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Dmitriy Monakhov <dmonakhov@openvz.org>
To: Andrew Morton <akpm@osdl.org>
Cc: Dmitriy Monakhov <dmonakhov@openvz.org>,
	linux-kernel@vger.kernel.org,
	Linux Memory Management <linux-mm@kvack.org>,
	devel@openvz.org, xfs@oss.sgi.com
Subject: Re: [PATCH]  incorrect error handling inside generic_file_direct_write
Date: Wed, 13 Dec 2006 02:14:18 +0300	[thread overview]
Message-ID: <87y7pcn1v9.fsf@sw.ru> (raw)
In-Reply-To: <20061212024027.6c2a79d3.akpm@osdl.org> (Andrew Morton's message of "Tue, 12 Dec 2006 02:40:27 -0800")

[-- Attachment #1: Type: text/plain, Size: 1535 bytes --]

Andrew Morton <akpm@osdl.org> writes:

> On Tue, 12 Dec 2006 16:18:32 +0300
> Dmitriy Monakhov <dmonakhov@sw.ru> wrote:
>
>> >> but according to filemaps locking rules: mm/filemap.c:77
>> >>  ..
>> >>  *  ->i_mutex			(generic_file_buffered_write)
>> >>  *    ->mmap_sem		(fault_in_pages_readable->do_page_fault)
>> >>  ..
>> >> I'm confused a litle bit, where is the truth? 
>> >
>> > xfs_write() calls generic_file_direct_write() without taking i_mutex for
>> > O_DIRECT writes.
>> Yes, but my quastion is about __generic_file_aio_write_nolock().
>> As i understand _nolock sufix means that i_mutex was already locked 
>> by caller, am i right ?
>
> Nope.  It just means that __generic_file_aio_write_nolock() doesn't take
> the lock.  We don't assume or require that the caller took it.  For example
> the raw driver calls generic_file_aio_write_nolock() without taking
> i_mutex.  Raw isn't relevant to the problem (although ocfs2 might be).  But
> we cannot assume that all callers have taken i_mutex, I think.
>
> I guess we can make that a rule (document it, add
> BUG_ON(!mutex_is_locked(..)) if it isn't a blockdev) if needs be.  After
> really checking that this matches reality for all callers.
I've checked generic_file_aio_write_nolock() callers for non blockdev.
Only ocfs2 call it explicitly, and do it under i_mutex.
So we need to do: 
1) Change wrong comments.
2) Add BUG_ON(!mutex_is_locked(..)) for non blkdev.
3) Invoke vmtruncate only for non blkdev.

Signed-off-by: Dmitriy Monakhov <dmonakhov@openvz.org>
-------

[-- Attachment #2: direct-io-fix.patch --]
[-- Type: text/plain, Size: 2123 bytes --]

diff --git a/mm/filemap.c b/mm/filemap.c
index 7b84dc8..540ef5e 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -2046,8 +2046,8 @@ generic_file_direct_write(struct kiocb *
 	/*
 	 * Sync the fs metadata but not the minor inode changes and
 	 * of course not the data as we did direct DMA for the IO.
-	 * i_mutex is held, which protects generic_osync_inode() from
-	 * livelocking.
+	 * i_mutex may not being held, if so some specific locking
+	 * ordering must protect generic_osync_inode() from livelocking.
 	 */
 	if (written >= 0 && ((file->f_flags & O_SYNC) || IS_SYNC(inode))) {
 		int err = generic_osync_inode(inode, mapping, OSYNC_METADATA);
@@ -2282,6 +2282,17 @@ __generic_file_aio_write_nolock(struct k
 
 		written = generic_file_direct_write(iocb, iov, &nr_segs, pos,
 							ppos, count, ocount);
+		/*
+		 * If host is not S_ISBLK generic_file_direct_write() may 
+		 * have instantiated a few blocks outside i_size  files
+		 * Trim these off again.
+		 */
+		if (unlikely(written < 0) && !S_ISBLK(inode->i_mode)) {
+			loff_t isize = i_size_read(inode);
+			if (pos + count > isize)
+				vmtruncate(inode, isize);
+		}
+
 		if (written < 0 || written == count)
 			goto out;
 		/*
@@ -2344,6 +2355,13 @@ ssize_t generic_file_aio_write_nolock(st
 	ssize_t ret;
 
 	BUG_ON(iocb->ki_pos != pos);
+	/*
+	 *  generic_file_buffered_write() may be called inside 
+	 *  __generic_file_aio_write_nolock() even in case of
+	 *  O_DIRECT for non S_ISBLK files. So i_mutex must be held.
+	 */
+	if (!S_ISBLK(inode->i_mode))
+		BUG_ON(!mutex_is_locked(&inode->i_mutex));
 
 	ret = __generic_file_aio_write_nolock(iocb, iov, nr_segs,
 			&iocb->ki_pos);
@@ -2386,8 +2404,8 @@ ssize_t generic_file_aio_write(struct ki
 EXPORT_SYMBOL(generic_file_aio_write);
 
 /*
- * Called under i_mutex for writes to S_ISREG files.   Returns -EIO if something
- * went wrong during pagecache shootdown.
+ * May be called without i_mutex for writes to S_ISREG files.
+ * Returns -EIO if something went wrong during pagecache shootdown.
  */
 static ssize_t
 generic_file_direct_IO(int rw, struct kiocb *iocb, const struct iovec *iov,

  reply	other threads:[~2006-12-12 23:14 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2006-12-11 13:34 Dmitriy Monakhov
2006-12-11 12:38 ` [Devel] " Kirill Korotaev
2006-12-11 20:40 ` Andrew Morton
2006-12-12  9:22   ` Dmitriy Monakhov
2006-12-12  6:36     ` Andrew Morton
2006-12-12 12:20   ` Dmitriy Monakhov
2006-12-12  9:52     ` Andrew Morton
2006-12-12 13:18       ` Dmitriy Monakhov
2006-12-12 10:40         ` Andrew Morton
2006-12-12 23:14           ` Dmitriy Monakhov [this message]
2006-12-13  2:43           ` Chen, Kenneth W
2006-12-15 10:43             ` 'Christoph Hellwig'
2006-12-15 18:53               ` Chen, Kenneth W
2007-01-02 11:17                 ` 'Christoph Hellwig'

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87y7pcn1v9.fsf@sw.ru \
    --to=dmonakhov@openvz.org \
    --cc=akpm@osdl.org \
    --cc=devel@openvz.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=xfs@oss.sgi.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox