Re: 2.6.0-test9-mm2 - AIO tests still gets slab corruption

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

From: Suparna Bhattacharya <suparna@in.ibm.com>
To: Andrew Morton <akpm@osdl.org>
Cc: Daniel McNeil <daniel@osdl.org>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	linux-aio@kvack.org
Subject: Re: 2.6.0-test9-mm2 - AIO tests still gets slab corruption
Date: Tue, 11 Nov 2003 20:32:29 +0530	[thread overview]
Message-ID: <20031111150229.GA4345@in.ibm.com> (raw)
In-Reply-To: <20031110154232.55eb9b10.akpm@osdl.org>

On Mon, Nov 10, 2003 at 03:42:32PM -0800, Andrew Morton wrote:
> Daniel McNeil <daniel@osdl.org> wrote:
> >
> > Andrew,
> > 
> > test9-mm2 is still getting slab corruption with AIO:
> 
> Why?
> 
> > Maximal retry count.  Bytes done 0
> > Slab corruption: start=dc70f91c, expend=dc70f9eb, problemat=dc70f91c
> > Last user: [<c0192fa3>](__aio_put_req+0xbf/0x200)
> > Data: 00 01 10 00 00 02 20 00 *********6C ******************************A5
> > Next: 71 F0 2C .A3 2F 19 C0 71 F0 2C .********************
> > slab error in check_poison_obj(): cache `kiocb': object was modified after freeing
> > 
> > With suparna's retry-based-aio-dio patch, there are no kernel messages
> > and the tests do not see any uninitialized data.
> > 
> > Any reason not to add suparna's patch to -mm to fix these problems?
> 
> It relies on infrastructure which is not present in Linus's kernel.  We
> should only be interested in fixing mainline 2.6.x.
> 
> Furthermore I'd like to see the direct-vs-buffered locking fixes fully
> implemented against Linus's tree, not -mm.  They're almost there, but are
> not quite complete.  Running off and making it dependent on the retry
> infrastructure is not really helpful.
> 

It was just easier to do this in a non-kludgy way, if we used the
retry infrastructure. Here's why:

For fixing some of the cases, we run into a situation when we've 
already submitted some of the I/O as AIO (with AIO callbacks set up)
by the time we realise that we actually need to wait for that to 
complete synchronously before falling back to buffered i/o (otherwise
we can corrupt file data). 

With the retry model it is only the actual wait that occurs 
differently for AIO and Sync I/O, not the submission. So we 
can simply switch to be synchronous at the latter stage.

Having done that, though, I was actually working on trying 
to find a way to do this that could hold for the mainline as well
(i.e. without using retry infrastructure). The attached patch has 
some special casing tweaks that might do the job; it 
modifies the AIO-DIO callback to wakeup the caller synchronously 
instead of issuing an aio_complete in such situations. 

However the existing aio-dio tests do not seem to exercise some 
of those code paths, so I haven't had a chance to verify 
if it really works for that case (i.e. A single AIO-DIO request 
overwriting an allocated region followed by a hole).

The patch should apply to 2.6.0-test9-mm2.

Regards
Suparna

-- 
Suparna Bhattacharya (suparna@in.ibm.com)
Linux Technology Center
IBM Software Labs, India


--- pure-mm/fs/direct-io.c	2003-10-30 14:22:51.000000000 +0530
+++ linux-2.6.0-test9-mm2/fs/direct-io.c	2003-10-31 17:09:35.000000000 +0530
@@ -209,7 +209,7 @@
  */
 static void dio_complete(struct dio *dio, loff_t offset, ssize_t bytes)
 {
-	if (dio->end_io)
+	if (dio->end_io && dio->result)
 		dio->end_io(dio->inode, offset, bytes, dio->map_bh.b_private);
 	if (dio->needs_locking)
 		up_read(&dio->inode->i_alloc_sem);
@@ -225,8 +225,14 @@
 		if (dio->is_async) {
 			dio_complete(dio, dio->block_in_file << dio->blkbits,
 					dio->result);
-			aio_complete(dio->iocb, dio->result, 0);
-			kfree(dio);
+			/* Complete AIO later if falling back to buffered i/o */
+			if (dio->result != -ENOTBLK) {
+				aio_complete(dio->iocb, dio->result, 0);
+				kfree(dio);
+			} else {
+				if (dio->waiter)
+					wake_up_process(dio->waiter);
+			}
 		}
 	}
 }
@@ -877,8 +883,6 @@
 	int ret2;
 	size_t bytes;
 
-	dio->is_async = !is_sync_kiocb(iocb);
-
 	dio->bio = NULL;
 	dio->inode = inode;
 	dio->rw = rw;
@@ -969,10 +973,11 @@
 		dio_bio_submit(dio);
 
 	/*
-	 * All new block allocations have been performed.  We can let i_sem
-	 * go now.
+	 * All block lookups have been performed. For READ requests
+	 * we can let i_sem go now that its achieved its purpose
+	 * of protecting us from looking up uninitialized blocks.
 	 */
-	if (dio->needs_locking)
+	if ((rw == READ) && dio->needs_locking)
 		up(&dio->inode->i_sem);
 
 	/*
@@ -982,8 +987,30 @@
 	if (dio->is_async) {
 		if (ret == 0)
 			ret = dio->result;	/* Bytes written */
+		if (ret == -ENOTBLK) {
+			/*
+			 * The request will be reissued via buffered I/O
+			 * when we return; Any I/O already issued
+			 * effectively becomes redundant.
+			 */
+			dio->result = ret;
+			dio->waiter = current;
+		}
 		finished_one_bio(dio);		/* This can free the dio */
 		blk_run_queues();
+		if (ret == -ENOTBLK) {
+			/*
+			 * Wait for already issued I/O to drain out and
+			 * release its references to user-space pages
+			 * before returning to fallback on buffered I/O
+			 */
+			while (atomic_read(&dio->bio_count)) {
+				set_current_state(TASK_UNINTERRUPTIBLE);
+				io_schedule();
+			}
+			set_current_state(TASK_RUNNING);
+			dio->waiter = NULL;
+		}
 	} else {
 		finished_one_bio(dio);
 		ret2 = dio_await_completion(dio);
@@ -1003,6 +1029,9 @@
 				ret = i_size - offset;
 		}
 		dio_complete(dio, offset, ret);
+		/* We could have also come here on an AIO file extend */
+		if (!is_sync_kiocb(iocb) && (ret != -ENOTBLK))
+			aio_complete(iocb, ret, 0);
 		kfree(dio);
 	}
 	return ret;
@@ -1029,6 +1058,7 @@
 	unsigned bdev_blkbits = 0;
 	unsigned blocksize_mask = (1 << blkbits) - 1;
 	ssize_t retval = -EINVAL;
+	loff_t end = offset;
 	struct dio *dio;
 	int needs_locking;
 
@@ -1047,6 +1077,7 @@
 	for (seg = 0; seg < nr_segs; seg++) {
 		addr = (unsigned long)iov[seg].iov_base;
 		size = iov[seg].iov_len;
+		end += size;
 		if ((addr & blocksize_mask) || (size & blocksize_mask))  {
 			if (bdev)
 				 blkbits = bdev_blkbits;
@@ -1081,11 +1112,17 @@
 		down_read(&inode->i_alloc_sem);
 	}
 	dio->needs_locking = needs_locking;
+	/*
+	 * For file extending writes updating i_size before data
+	 * writeouts complete can expose uninitialized blocks. So
+	 * even for AIO, we need to wait for i/o to complete before
+	 * returning in this case.
+	 */
+	dio->is_async = !is_sync_kiocb(iocb) && !((rw == WRITE) &&
+		(end > i_size_read(inode)));
 
 	retval = direct_io_worker(rw, iocb, inode, iov, offset,
 				nr_segs, blkbits, get_blocks, end_io, dio);
-	if (needs_locking && rw == WRITE)
-		down(&inode->i_sem);
 out:
 	return retval;
 }
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"aart@kvack.org"> aart@kvack.org </a>

next prev parent reply	other threads:[~2003-11-11 15:02 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2003-11-05  6:55 2.6.0-test9-mm2 Andrew Morton
2003-11-05 12:30 ` 2.6.0-test9-mm2 Alexander Hoogerhuis
2003-11-05 16:10 ` 2.6.0-test9-mm2 (compile stats) John Cherry
2003-11-05 17:02 ` 2.6.0-test9-mm2 Alistair John Strachan
2003-11-05 23:07 ` 2.6.0-test9-mm2 Martin J. Bligh
2003-11-10 23:06 ` 2.6.0-test9-mm2 - AIO tests still gets slab corruption Daniel McNeil
2003-11-10 23:42   ` Andrew Morton
2003-11-11 15:02     ` Suparna Bhattacharya [this message]
2003-11-12 20:10       ` Daniel McNeil
2003-11-13 11:29         ` Suparna Bhattacharya
2003-11-11 18:01     ` [PATCH 2.6.0-test9] AIO-ref-count.patch Daniel McNeil
2003-11-11 17:25 ` 2.6.0-test9-mm2 Daniel Drake
2003-11-12  1:18   ` 2.6.0-test9-mm2 Nick Piggin
2003-11-12  3:45     ` 2.6.0-test9-mm2 Mike Fedyk

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20031111150229.GA4345@in.ibm.com \
    --to=suparna@in.ibm.com \
    --cc=akpm@osdl.org \
    --cc=daniel@osdl.org \
    --cc=linux-aio@kvack.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox