linux-mm.kvack.org archive mirror
* [PATCH 0/7] Reduce GFP_ATOMIC allocation failures, candidate fix V3
@ 2009-11-12 19:30 Mel Gorman
  2009-11-12 19:30 ` [PATCH 1/5] page allocator: Always wake kswapd when restarting an allocation attempt after direct reclaim failed Mel Gorman
  2009-11-12 20:27 ` [PATCH 0/7] Reduce GFP_ATOMIC allocation failures, candidate fix V3 Chris Mason
  0 siblings, 2 replies; 16+ messages in thread
From: Mel Gorman @ 2009-11-12 19:30 UTC (permalink / raw)
  To: Andrew Morton, Frans Pop, Jiri Kosina, Sven Geggus,
	Karol Lewandowski, Tobias Oetiker
  Cc: linux-kernel, linux-mm@kvack.org,
	KOSAKI Motohiro, Pekka Enberg, Rik van Riel, Christoph Lameter,
	Stephan von Krawczynski, Rafael J. Wysocki, Kernel Testers List,
	Mel Gorman

Sorry for the long delay in posting another version. Testing is extremely
time-consuming and I wasn't getting to work on this as much as I'd have liked.

Changelog since V2
  o Dropped the kswapd-quickly-notice-high-order patch. In more detailed
    testing, it made latencies even worse as kswapd slept more on high-order
    congestion causing order-0 direct reclaims.
  o Added changes to how congestion_wait() works
  o Added a number of new patches altering the behaviour of reclaim

Since 2.6.31-rc1, there have been an increasing number of GFP_ATOMIC
failures. A significant number of these have been high-order GFP_ATOMIC
failures and while they are generally brushed away, there has been a large
increase in them recently and there are a number of possible areas the
problem could be in - core vm, page writeback and a specific driver. The
bugs affected by this that I am aware of are;

[Bug #14141] order 2 page allocation failures in iwlagn
[Bug #14141] order 2 page allocation failures (generic)
[Bug #14265] ifconfig: page allocation failure. order:5, mode:0x8020 w/ e100
[No BZ ID]   Kernel crash on 2.6.31.x (kcryptd: page allocation failure..)
[No BZ ID]   page allocation failure message kernel 2.6.31.4 (tty-related)

The following are a series of patches that bring the behaviour of reclaim
and the page allocator more in line with 2.6.30.

Patches 1-3 should be tested first. The testing I've done shows that with
them applied, the behaviour of the page allocator and congestion_wait() is
more in line with 2.6.30 than the vanilla kernels are.

It'd be nice to have 2 more tests, applying each patch on top and noting
any behaviour change, i.e. ideally there would be results for

 o patches 1+2+3
 o patches 1+2+3+4
 o patches 1+2+3+4+5

Of course, any test results are welcome. The rest of the mail is the
results of my own tests.

I've tested against 2.6.31 and 2.6.32-rc6. I've somewhat replicated the
problem in Bug #14141 and believe the other bugs are variations of the same
style of problem. The basic reproduction case was;

1. X86-64 AMD Phenom and X86 P4 booted with mem=512MB. The expectation is
	that any machine will do as long as it is limited to 512MB, given the
	size of the workload involved.

2. A crypted work partition and swap partition was created. On my
   own setup, I gave no passphrase so it'd be easier to activate without
   interaction but there are multiple options. I should have taken better
   notes but the setup goes something like this;

	cryptsetup create -y crypt-partition /dev/sda5
	pvcreate /dev/mapper/crypt-partition
	vgcreate crypt-volume /dev/mapper/crypt-partition
	lvcreate -L 5G -n crypt-logical crypt-volume
	lvcreate -L 2G -n crypt-swap crypt-volume
	mkfs -t ext3 /dev/crypt-volume/crypt-logical
	mkswap /dev/crypt-volume/crypt-swap

3. With the partition mounted on /scratch, I
	cd /scratch
	mkdir music
	git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git linux-2.6

4. On a normal partition, I expand a tarball containing test scripts available at
	http://www.csn.ul.ie/~mel/postings/latency-20091112/latency-tests-with-results.tar.gz

	There are two helper programs that run as part of the test - a fake
	music player and a fake gitk.

	The fake music player uses rsync with bandwidth limits to start
	downloading a music folder from another machine. It's bandwidth
	limited to simulate playing music over NFS. I believe it generates
	similar, if not identical, traffic to a music player. It occurred to me
	afterwards that if one patched ogg123 to print a line when 1/10th of
	a second's worth of music had played, it could be used as an indirect
	measure of desktop interactivity and help pin down pesky "audio skips"
	bug reports.

	The fake gitk is based on observing roughly what gitk does using
	strace. It loads all the logs into a large buffer and then builds a
	very basic hash map of parent to child commits.  The data is kept
	in an in-memory buffer because just reading the logs was not enough
	to generate swap.  It then discards the data and repeats the whole
	process in a loop a small number of times so that the test is finite.
	Each time it processes a large number of commits,
	it outputs a line to stdout so that stalls can be observed. Ideal
	behaviour is that commits are read at a constant rate and latencies
	look flat.
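
	Very loosely, the helper behaves like the following (an illustrative
	sketch only, not the actual script from the tarball):

	# fake-gitk sketch: slurp the whole history, build a parent->child
	# map in memory, print progress lines, and repeat a few times
	cd /scratch/linux-2.6
	for pass in 1 2 3; do
		git log --pretty=format:'%H %P' | awk '
			{ for (p = 2; p <= NF; p++) child[$p] = child[$p] " " $1 }
			NR % 10000 == 0 { print "processed " NR " commits"; fflush() }'
	done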

	Output from the two programs is piped through another script -
	latency-output. It records how far into the test it was when the
	line was outputted and what the latency was since the last line
	appeared. The latency should always be very smooth. Because pipes
	buffer IO, they are all run by expect_unbuffered which is available
	from expect-dev on Debian at least.
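
	In essence it does something like this (a gawk sketch of the idea,
	not the script itself):

	# prefix each line with seconds since the test started and the gap
	# since the previous line; stalls show up as large gaps
	awk 'BEGIN { start = systime(); last = start }
	     { now = systime()
	       printf "%4ds %3ds %s\n", now - start, now - last, $0
	       last = now; fflush() }'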

	All the tests are driven via run-test.sh. While the tests run,
	it records the kern.log to track page allocation failures, records
	nr_writeback at regular intervals and tracks Page IO and Swap IO.
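
	The monitoring boils down to something like this (a sketch only; the
	exact file names and sampling interval are whatever run-test.sh uses):

	tail -f /var/log/kern.log > kern.log.during-test &
	while true; do
		date +%s
		grep -E 'nr_writeback|pgpgin|pgpgout|pswpin|pswpout' /proc/vmstat
		sleep 10
	done > vmstat.samples &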

5. For running an actual test, a kernel is built, booted, the
	crypted partition activated, lvm restarted,
	/dev/crypt-volume/crypt-logical mounted on /scratch, all
	swap partitions turned off and then the swap partition on
	/dev/crypt-volume/crypt-swap activated. I then run run-test.sh from
	the tarball.
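
	In command terms, that is approximately the following (a sketch; the
	lvm step in particular may differ depending on the setup):

	cryptsetup create crypt-partition /dev/sda5
	vgchange -ay crypt-volume
	mount /dev/crypt-volume/crypt-logical /scratch
	swapoff -a
	swapon /dev/crypt-volume/crypt-swap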

6. Run the test script

To evaluate the patches, I considered three basic metrics.

o The length of time it takes fake-gitk to complete on average
o How often and how long fake-gitk stalled for
o How long was spent in congestion_wait

All generated data is in the tarball.

On X86, the results I got were

2.6.30-0000000-force-highorder           Elapsed:10:59.095  Failures:0

2.6.31-0000000-force-highorder           Elapsed:11:53.505  Failures:0
2.6.31-revert-8aa7e847                   Elapsed:14:01.595  Failures:0
2.6.31-0000012-pgalloc-2.6.30            Elapsed:13:32.237  Failures:0
2.6.31-0000123-congestion-both           Elapsed:12:44.170  Failures:0
2.6.31-0001234-kswapd-quick-recheck      Elapsed:10:35.327  Failures:0
2.6.31-0012345-adjust-priority           Elapsed:11:02.995  Failures:0

2.6.32-rc6-0000000-force-highorder       Elapsed:18:18.562  Failures:0
2.6.32-rc6-revert-8aa7e847               Elapsed:10:29.278  Failures:0
2.6.32-rc6-0000012-pgalloc-2.6.30        Elapsed:13:32.393  Failures:0
2.6.32-rc6-0000123-congestion-both       Elapsed:14:55.265  Failures:0
2.6.32-rc6-0001234-kswapd-quick-recheck  Elapsed:13:35.628  Failures:0
2.6.32-rc6-0012345-adjust-priority       Elapsed:12:41.278  Failures:0

The 0000000-force-highorder is a vanilla kernel patched so that network
receive always results in an order-2 allocation. This machine wasn't
suffering page allocation failures even under this circumstance. However,
note how slow 2.6.32-rc6 is and how much the revert helps. With the patches
applied, there is comparable performance.

Latencies were generally reduced with the patches applied. 2.6.32-rc6 was
particularly crazy, with long stalls measured over the duration of the test,
but with the patches applied its latencies are comparable to 2.6.30.

congestion_wait behaviour is more in line with 2.6.30 after the
patches with similar amounts of time being spent.  In general,
2.6.32-rc6-0012345-adjust-priority waits for longer than 2.6.30 or the
reverted kernels did. It also waits in more instances such as inside
shrink_inactive_list() where it didn't before. Forcing behaviour like 2.6.30
resulted in good figures but I couldn't justify the patches with anything
more solid than "in tests, it behaves well even though it doesn't make a
lot of sense"

On X86-64, the results I got were

2.6.30-0000000-force-highorder           Elapsed:09:48.545  Failures:0

2.6.31-0000000-force-highorder           Elapsed:09:13.020  Failures:0
2.6.31-revert-8aa7e847                   Elapsed:09:02.120  Failures:0
2.6.31-0000012-pgalloc-2.6.30            Elapsed:08:52.742  Failures:0
2.6.31-0000123-congestion-both           Elapsed:08:59.375  Failures:0
2.6.31-0001234-kswapd-quick-recheck      Elapsed:09:19.208  Failures:0
2.6.31-0012345-adjust-priority           Elapsed:09:39.225  Failures:0

2.6.32-rc6-0000000-force-highorder       Elapsed:19:38.585  Failures:5
2.6.32-rc6-revert-8aa7e847               Elapsed:17:21.257  Failures:0
2.6.32-rc6-0000012-pgalloc-2.6.30        Elapsed:18:56.682  Failures:1
2.6.32-rc6-0000123-congestion-both       Elapsed:16:08.340  Failures:0
2.6.32-rc6-0001234-kswapd-quick-recheck  Elapsed:18:11.200  Failures:7
2.6.32-rc6-0012345-adjust-priority       Elapsed:21:33.158  Failures:0

Failures were down and my impression was that it was much harder to cause
failures. Performance on mainline is still not as good as 2.6.30. On
this particular machine, I was able to force performance to be in line
but not with any patch I could justify in the general case.

Latencies were slightly reduced by applying the patches against 2.6.31.
Against 2.6.32-rc6, applying the patches reduced the latencies considerably
but they are still significant. I'll continue to investigate what can be
done to improve this further.

Again, congestion_wait() is more in line with 2.6.30 when the patches
are applied. Similarly to X86, almost identical behaviour can be forced
by waiting on BLK_ASYNC_BOTH for each caller to congestion_wait() in the
reclaim and allocator paths.

Bottom line, the patches made triggering allocation failures much harder
in a number of instances, and latencies are reduced when the system is
under load. I will keep looking at this area - particularly the performance
under load on 2.6.32-rc6 - but with 2.6.32 almost out the door, I am
releasing what I have now.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH 1/5] page allocator: Always wake kswapd when restarting an allocation attempt after direct reclaim failed
  2009-11-12 19:30 [PATCH 0/7] Reduce GFP_ATOMIC allocation failures, candidate fix V3 Mel Gorman
@ 2009-11-12 19:30 ` Mel Gorman
  2009-11-12 20:27 ` [PATCH 0/7] Reduce GFP_ATOMIC allocation failures, candidate fix V3 Chris Mason
  1 sibling, 0 replies; 16+ messages in thread
From: Mel Gorman @ 2009-11-12 19:30 UTC (permalink / raw)
  To: Andrew Morton, Frans Pop, Jiri Kosina, Sven Geggus,
	Karol Lewandowski, Tobias Oetiker
  Cc: linux-kernel, linux-mm@kvack.org,
	KOSAKI Motohiro, Pekka Enberg, Rik van Riel, Christoph Lameter,
	Stephan von Krawczynski, Rafael J. Wysocki, Kernel Testers List,
	Mel Gorman

If a direct reclaim makes no forward progress, it considers whether it
should go OOM or not. Whether OOM is triggered or not, it may retry the
allocation afterwards. In times past, this would always wake kswapd as well
but currently, kswapd is not woken up after direct reclaim fails. For order-0
allocations, this makes little difference but if there is a heavy mix of
higher-order allocations that direct reclaim is failing for, it might mean
that kswapd is not rewoken for higher orders as often as it was previously.

This patch wakes up kswapd when an allocation is being retried after a direct
reclaim failure. It would be expected that kswapd is already awake, but
this has the effect of telling kswapd to reclaim at the higher order as well.

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Reviewed-by: Christoph Lameter <cl@linux-foundation.org>
Reviewed-by: Pekka Enberg <penberg@cs.helsinki.fi>
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
---
 mm/page_alloc.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index cdcedf6..250d055 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1817,9 +1817,9 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 	if (NUMA_BUILD && (gfp_mask & GFP_THISNODE) == GFP_THISNODE)
 		goto nopage;
 
+restart:
 	wake_all_kswapd(order, zonelist, high_zoneidx);
 
-restart:
 	/*
 	 * OK, we're below the kswapd watermark and have kicked background
 	 * reclaim. Now things get more complex, so set up alloc_flags according
-- 
1.6.5

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 0/7] Reduce GFP_ATOMIC allocation failures, candidate fix V3
  2009-11-12 19:30 [PATCH 0/7] Reduce GFP_ATOMIC allocation failures, candidate fix V3 Mel Gorman
  2009-11-12 19:30 ` [PATCH 1/5] page allocator: Always wake kswapd when restarting an allocation attempt after direct reclaim failed Mel Gorman
@ 2009-11-12 20:27 ` Chris Mason
  2009-11-12 22:00   ` Chris Mason
  1 sibling, 1 reply; 16+ messages in thread
From: Chris Mason @ 2009-11-12 20:27 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Andrew Morton, Frans Pop, Jiri Kosina, Sven Geggus,
	Karol Lewandowski, Tobias Oetiker, linux-kernel, linux-mm,
	KOSAKI Motohiro, Pekka Enberg, Rik van Riel, Christoph Lameter,
	Stephan von Krawczynski, Rafael J. Wysocki, Kernel Testers List

On Thu, Nov 12, 2009 at 07:30:06PM +0000, Mel Gorman wrote:
> Sorry for the long delay in posting another version. Testing is extremely
> time-consuming and I wasn't getting to work on this as much as I'd have liked.
> 
> Changelog since V2
>   o Dropped the kswapd-quickly-notice-high-order patch. In more detailed
>     testing, it made latencies even worse as kswapd slept more on high-order
>     congestion causing order-0 direct reclaims.
>   o Added changes to how congestion_wait() works
>   o Added a number of new patches altering the behaviour of reclaim
> 
> Since 2.6.31-rc1, there have been an increasing number of GFP_ATOMIC
> failures. A significant number of these have been high-order GFP_ATOMIC
> failures and while they are generally brushed away, there has been a large
> increase in them recently and there are a number of possible areas the
> problem could be in - core vm, page writeback and a specific driver. The
> bugs affected by this that I am aware of are;

Thanks for all the time you've spent on this one.  Let me start with
some more questions about the workload ;)

> 2. A crypted work partition and swap partition was created. On my
>    own setup, I gave no passphrase so it'd be easier to activate without
>    interaction but there are multiple options. I should have taken better
>    notes but the setup goes something like this;
> 
> 	cryptsetup create -y crypt-partition /dev/sda5
> 	pvcreate /dev/mapper/crypt-partition
> 	vgcreate crypt-volume /dev/mapper/crypt-partition
> 	lvcreate -L 5G -n crypt-logical crypt-volume
> 	lvcreate -L 2G -n crypt-swap crypt-volume
> 	mkfs -t ext3 /dev/crypt-volume/crypt-logical
> 	mkswap /dev/crypt-volume/crypt-swap
> 
> 3. With the partition mounted on /scratch, I
> 	cd /scratch
> 	mkdir music
> 	git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git linux-2.6
> 
> 4. On a normal partition, I expand a tarball containing test scripts available at
> 	http://www.csn.ul.ie/~mel/postings/latency-20091112/latency-tests-with-results.tar.gz
> 
> 	There are two helper programs that run as part of the test - a fake
> 	music player and a fake gitk.
> 
> 	The fake music player uses rsync with bandwidth limits to start
> 	downloading a music folder from another machine. It's bandwidth
> 	limited to simulate playing music over NFS.

So the workload is gitk reading a git repo and a program reading data
over the network.  Which part of the workload writes to disk?

-chris

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 0/7] Reduce GFP_ATOMIC allocation failures, candidate fix V3
  2009-11-12 20:27 ` [PATCH 0/7] Reduce GFP_ATOMIC allocation failures, candidate fix V3 Chris Mason
@ 2009-11-12 22:00   ` Chris Mason
  2009-11-13  2:46     ` Chris Mason
                       ` (2 more replies)
  0 siblings, 3 replies; 16+ messages in thread
From: Chris Mason @ 2009-11-12 22:00 UTC (permalink / raw)
  To: Mel Gorman, Andrew Morton, Frans Pop, Jiri Kosina, Sven Geggus,
	Karol Lewandowski, Tobias Oetiker, linux-kernel, linux-mm,
	KOSAKI Motohiro, Pekka Enberg, Rik van Riel, Christoph Lameter,
	Stephan von Krawczynski, Rafael J. Wysocki, Kernel Testers List

On Thu, Nov 12, 2009 at 03:27:48PM -0500, Chris Mason wrote:
> On Thu, Nov 12, 2009 at 07:30:06PM +0000, Mel Gorman wrote:
> > Sorry for the long delay in posting another version. Testing is extremely
> > time-consuming and I wasn't getting to work on this as much as I'd have liked.
> > 
> > Changelog since V2
> >   o Dropped the kswapd-quickly-notice-high-order patch. In more detailed
> >     testing, it made latencies even worse as kswapd slept more on high-order
> >     congestion causing order-0 direct reclaims.
> >   o Added changes to how congestion_wait() works
> >   o Added a number of new patches altering the behaviour of reclaim
> > 
> > Since 2.6.31-rc1, there have been an increasing number of GFP_ATOMIC
> > failures. A significant number of these have been high-order GFP_ATOMIC
> > failures and while they are generally brushed away, there has been a large
> > increase in them recently and there are a number of possible areas the
> > problem could be in - core vm, page writeback and a specific driver. The
> > bugs affected by this that I am aware of are;
> 
> Thanks for all the time you've spent on this one.  Let me start with
> some more questions about the workload ;)
> 
> So the workload is gitk reading a git repo and a program reading data
> over the network.  Which part of the workload writes to disk?

Sorry for the self reply, I started digging through your data (man,
that's a lot of data ;).  I took another tour through dm-crypt and
things make more sense now.

dm-crypt has two different single threaded workqueues for each dm-crypt
device.  The first one is meant to deal with the actual encryption and
decryption, and the second one is meant to do the IO.

So the path for a write looks something like this:

filesystem -> crypt thread -> encrypt the data -> io thread -> disk

And the path for read looks something like this:

filesystem -> io thread -> disk -> crypt thread -> decrypt data -> FS

One thread does encryption and one thread does IO, and these threads are
shared for reads and writes.  The end result is that all of the sync
reads get stuck behind any async write congestion and all of the async
writes get stuck behind any sync read congestion.

It's almost like you need to check for both sync and async congestion
before you have any hopes of a new IO making progress.

The confusing part is that dm hasn't gotten any worse in this regard
since 2.6.30 but the workload here is generating more sync reads
(hopefully from gitk and swapin) than async writes (from the low
bandwidth rsync).  So in general if you were to change mm/*.c to wait
for sync congestion instead of async, things should appear better.

The punch line is that the btrfs guy thinks we can solve all of this with
just one more thread.  If we change dm-crypt to have a thread dedicated
to sync IO and a thread dedicated to async IO the system should smooth
out.

-chris

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 0/7] Reduce GFP_ATOMIC allocation failures, candidate fix V3
  2009-11-12 22:00   ` Chris Mason
@ 2009-11-13  2:46     ` Chris Mason
  2009-11-13 12:58       ` [PATCH] make crypto unplug " Chris Mason
                         ` (2 more replies)
  2009-11-13 13:44     ` Mel Gorman
  2009-11-13 13:44     ` Mel Gorman
  2 siblings, 3 replies; 16+ messages in thread
From: Chris Mason @ 2009-11-13  2:46 UTC (permalink / raw)
  To: Mel Gorman, Andrew Morton, Frans Pop, Jiri Kosina, Sven Geggus,
	Karol Lewandowski, Tobias Oetiker, linux-kernel, linux-mm,
	KOSAKI Motohiro, Pekka Enberg, Rik van Riel, Christoph Lameter,
	Stephan von Krawczynski, Rafael J. Wysocki, Kernel Testers List

On Thu, Nov 12, 2009 at 05:00:05PM -0500, Chris Mason wrote:

[ ...]

> 
> The punch line is that the btrfs guy thinks we can solve all of this with
> just one more thread.  If we change dm-crypt to have a thread dedicated
> to sync IO and a thread dedicated to async IO the system should smooth
> out.

This is pretty likely to set your dm data on fire.  It's only for Mel
who starts his script w/mkfs.

It adds the second thread and more importantly makes sure the kcryptd
thread doesn't get stuck waiting for requests.

-chris

diff --git a/drivers/md/dm-crypt.c b/drivers/md/dm-crypt.c
index ed10381..295ffeb 100644
--- a/drivers/md/dm-crypt.c
+++ b/drivers/md/dm-crypt.c
@@ -94,6 +94,7 @@ struct crypt_config {
 	struct bio_set *bs;
 
 	struct workqueue_struct *io_queue;
+	struct workqueue_struct *async_io_queue;
 	struct workqueue_struct *crypt_queue;
 
 	/*
@@ -691,7 +692,10 @@ static void kcryptd_queue_io(struct dm_crypt_io *io)
 	struct crypt_config *cc = io->target->private;
 
 	INIT_WORK(&io->work, kcryptd_io);
-	queue_work(cc->io_queue, &io->work);
+	if (io->base_bio->bi_rw & (1 << BIO_RW_SYNCIO))
+		queue_work(cc->io_queue, &io->work);
+	else
+		queue_work(cc->async_io_queue, &io->work);
 }
 
 static void kcryptd_crypt_write_io_submit(struct dm_crypt_io *io,
@@ -759,8 +763,7 @@ static void kcryptd_crypt_write_convert(struct dm_crypt_io *io)
 
 		/* Encryption was already finished, submit io now */
 		if (crypt_finished) {
-			kcryptd_crypt_write_io_submit(io, r, 0);
-
+			kcryptd_crypt_write_io_submit(io, r, 1);
 			/*
 			 * If there was an error, do not try next fragments.
 			 * For async, error is processed in async handler.
@@ -1120,6 +1123,12 @@ static int crypt_ctr(struct dm_target *ti, unsigned int argc, char **argv)
 	} else
 		cc->iv_mode = NULL;
 
+	cc->async_io_queue = create_singlethread_workqueue("kcryptd_async_io");
+	if (!cc->async_io_queue) {
+		ti->error = "Couldn't create kcryptd io queue";
+		goto bad_async_io_queue;
+	}
+
 	cc->io_queue = create_singlethread_workqueue("kcryptd_io");
 	if (!cc->io_queue) {
 		ti->error = "Couldn't create kcryptd io queue";
@@ -1139,6 +1148,8 @@ static int crypt_ctr(struct dm_target *ti, unsigned int argc, char **argv)
 bad_crypt_queue:
 	destroy_workqueue(cc->io_queue);
 bad_io_queue:
+	destroy_workqueue(cc->async_io_queue);
+bad_async_io_queue:
 	kfree(cc->iv_mode);
 bad_ivmode_string:
 	dm_put_device(ti, cc->dev);
@@ -1166,6 +1177,7 @@ static void crypt_dtr(struct dm_target *ti)
 	struct crypt_config *cc = (struct crypt_config *) ti->private;
 
 	destroy_workqueue(cc->io_queue);
+	destroy_workqueue(cc->async_io_queue);
 	destroy_workqueue(cc->crypt_queue);
 
 	if (cc->req)

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH] make crypto unplug fix V3
  2009-11-13  2:46     ` Chris Mason
@ 2009-11-13 12:58       ` Chris Mason
  2009-11-13 17:34         ` Mel Gorman
  2009-11-13 17:34         ` Mel Gorman
  2009-11-16 16:44       ` [PATCH 0/7] Reduce GFP_ATOMIC allocation failures, candidate " Milan Broz
  2009-11-16 16:44       ` Milan Broz
  2 siblings, 2 replies; 16+ messages in thread
From: Chris Mason @ 2009-11-13 12:58 UTC (permalink / raw)
  To: Mel Gorman, Andrew Morton, Frans Pop, Jiri Kosina, Sven Geggus,
	Karol Lewandowski, Tobias Oetiker, linux-kernel, linux-mm,
	KOSAKI Motohiro, Pekka Enberg, Rik van Riel, Christoph Lameter,
	Stephan von Krawczynski, Rafael J. Wysocki, Kernel Testers List

This is still likely to set your dm data on fire.  It is only meant for
testers that start with mkfs and don't have any valuable dm data.

It includes my patch from last night, along with changes to force dm to
unplug when its IO queues empty.

The problem goes like this:

Process: submit read bio
dm: put bio onto work queue
process: unplug
dm: work queue finds bio, does a generic_make_request

The end result is that we miss the unplug completely.  dm-crypt needs to
unplug for sync bios.  This patch also changes it to unplug whenever the
queue is empty, which is far from ideal but better than missing the
unplugs.

This doesn't completely fix the io stalls I'm seeing with dm-crypt, but it's
my best guess.  If it works, I'll break it up and submit for real to
the dm people.

-chris

diff --git a/drivers/md/dm-crypt.c b/drivers/md/dm-crypt.c
index ed10381..729ae01 100644
--- a/drivers/md/dm-crypt.c
+++ b/drivers/md/dm-crypt.c
@@ -94,8 +94,12 @@ struct crypt_config {
 	struct bio_set *bs;
 
 	struct workqueue_struct *io_queue;
+	struct workqueue_struct *async_io_queue;
 	struct workqueue_struct *crypt_queue;
 
+	atomic_t sync_bios_in_queue;
+	atomic_t async_bios_in_queue;
+
 	/*
 	 * crypto related data
 	 */
@@ -679,11 +683,29 @@ static void kcryptd_io_write(struct dm_crypt_io *io)
 static void kcryptd_io(struct work_struct *work)
 {
 	struct dm_crypt_io *io = container_of(work, struct dm_crypt_io, work);
+	struct crypt_config *cc = io->target->private;
+	int zero_sync = 0;
+	int zero_async = 0;
+	int was_sync = 0;
+
+	if (io->base_bio->bi_rw & (1 << BIO_RW_SYNCIO)) {
+		zero_sync = atomic_dec_and_test(&cc->sync_bios_in_queue);
+		was_sync = 1;
+	} else
+		zero_async = atomic_dec_and_test(&cc->async_bios_in_queue);
 
 	if (bio_data_dir(io->base_bio) == READ)
 		kcryptd_io_read(io);
 	else
 		kcryptd_io_write(io);
+
+	if ((was_sync && zero_sync) ||
+	    (!was_sync && zero_async &&
+	     atomic_read(&cc->sync_bios_in_queue) == 0)) {
+		struct backing_dev_info *bdi;
+		bdi = blk_get_backing_dev_info(io->base_bio->bi_bdev);
+		blk_run_backing_dev(bdi, NULL);
+	}
 }
 
 static void kcryptd_queue_io(struct dm_crypt_io *io)
@@ -691,7 +713,13 @@ static void kcryptd_queue_io(struct dm_crypt_io *io)
 	struct crypt_config *cc = io->target->private;
 
 	INIT_WORK(&io->work, kcryptd_io);
-	queue_work(cc->io_queue, &io->work);
+	if (io->base_bio->bi_rw & (1 << BIO_RW_SYNCIO)) {
+		atomic_inc(&cc->sync_bios_in_queue);
+		queue_work(cc->io_queue, &io->work);
+	} else {
+		atomic_inc(&cc->async_bios_in_queue);
+		queue_work(cc->async_io_queue, &io->work);
+	}
 }
 
 static void kcryptd_crypt_write_io_submit(struct dm_crypt_io *io,
@@ -759,8 +787,7 @@ static void kcryptd_crypt_write_convert(struct dm_crypt_io *io)
 
 		/* Encryption was already finished, submit io now */
 		if (crypt_finished) {
-			kcryptd_crypt_write_io_submit(io, r, 0);
-
+			kcryptd_crypt_write_io_submit(io, r, 1);
 			/*
 			 * If there was an error, do not try next fragments.
 			 * For async, error is processed in async handler.
@@ -1120,6 +1147,15 @@ static int crypt_ctr(struct dm_target *ti, unsigned int argc, char **argv)
 	} else
 		cc->iv_mode = NULL;
 
+	atomic_set(&cc->sync_bios_in_queue, 0);
+	atomic_set(&cc->async_bios_in_queue, 0);
+
+	cc->async_io_queue = create_singlethread_workqueue("kcryptd_async_io");
+	if (!cc->async_io_queue) {
+		ti->error = "Couldn't create kcryptd io queue";
+		goto bad_async_io_queue;
+	}
+
 	cc->io_queue = create_singlethread_workqueue("kcryptd_io");
 	if (!cc->io_queue) {
 		ti->error = "Couldn't create kcryptd io queue";
@@ -1139,6 +1175,8 @@ static int crypt_ctr(struct dm_target *ti, unsigned int argc, char **argv)
 bad_crypt_queue:
 	destroy_workqueue(cc->io_queue);
 bad_io_queue:
+	destroy_workqueue(cc->async_io_queue);
+bad_async_io_queue:
 	kfree(cc->iv_mode);
 bad_ivmode_string:
 	dm_put_device(ti, cc->dev);
@@ -1166,6 +1204,7 @@ static void crypt_dtr(struct dm_target *ti)
 	struct crypt_config *cc = (struct crypt_config *) ti->private;
 
 	destroy_workqueue(cc->io_queue);
+	destroy_workqueue(cc->async_io_queue);
 	destroy_workqueue(cc->crypt_queue);
 
 	if (cc->req)

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 0/7] Reduce GFP_ATOMIC allocation failures, candidate fix V3
  2009-11-12 22:00   ` Chris Mason
  2009-11-13  2:46     ` Chris Mason
  2009-11-13 13:44     ` Mel Gorman
@ 2009-11-13 13:44     ` Mel Gorman
  2 siblings, 0 replies; 16+ messages in thread
From: Mel Gorman @ 2009-11-13 13:44 UTC (permalink / raw)
  To: Chris Mason, Andrew Morton, Frans Pop, Jiri Kosina, Sven Geggus,
	Karol Lewandowski, Tobias Oetiker, linux-kernel, linux-mm,
	KOSAKI Motohiro, Pekka Enberg, Rik van Riel, Christoph Lameter,
	Stephan von Krawczynski, Rafael J. Wysocki, Kernel Testers List

On Thu, Nov 12, 2009 at 05:00:05PM -0500, Chris Mason wrote:
> On Thu, Nov 12, 2009 at 03:27:48PM -0500, Chris Mason wrote:
> > On Thu, Nov 12, 2009 at 07:30:06PM +0000, Mel Gorman wrote:
> > > Sorry for the long delay in posting another version. Testing is extremely
> > > time-consuming and I wasn't getting to work on this as much as I'd have liked.
> > > 
> > > Changelog since V2
> > >   o Dropped the kswapd-quickly-notice-high-order patch. In more detailed
> > >     testing, it made latencies even worse as kswapd slept more on high-order
> > >     congestion causing order-0 direct reclaims.
> > >   o Added changes to how congestion_wait() works
> > >   o Added a number of new patches altering the behaviour of reclaim
> > > 
> > > Since 2.6.31-rc1, there have been an increasing number of GFP_ATOMIC
> > > failures. A significant number of these have been high-order GFP_ATOMIC
> > > failures and while they are generally brushed away, there has been a large
> > > increase in them recently and there are a number of possible areas the
> > > problem could be in - core vm, page writeback and a specific driver. The
> > > bugs affected by this that I am aware of are;
> > 
> > Thanks for all the time you've spent on this one.  Let me start with
> > some more questions about the workload ;)
> > 
> > So the workload is gitk reading a git repo and a program reading data
> > over the network.  Which part of the workload writes to disk?
> 
> Sorry for the self reply, I started digging through your data (man,
> that's a lot of data ;). 

Yeah, sorry about that. Because I lacked a credible explanation as to
why waiting on sync really made such a difference, I had little choice
but to punt everything I had for people to dig through.

To be clear, I'm not actually running gitk. The fake-gitk is reading the
commits into memory and building a tree in a similar fashion to what gitk
does. I didn't want to use gitk itself because there wasn't a way of measuring
whether it was stalling or not, other than looking at it and making a guess.

> I took another tour through dm-crypt and
> things make more sense now.
> 
> dm-crypt has two different single threaded workqueues for each dm-crypt
> device.  The first one is meant to deal with the actual encryption and
> decryption, and the second one is meant to do the IO.
> 
> So the path for a write looks something like this:
> 
> filesystem -> crypt thread -> encrypt the data -> io thread -> disk
> 
> And the path for read looks something like this:
> 
> filesystem -> io thread -> disk -> crypt thread -> decrypt data -> FS
> 
> One thread does encryption and one thread does IO, and these threads are
> shared for reads and writes.  The end result is that all of the sync
> reads get stuck behind any async write congestion and all of the async
> writes get stuck behind any sync read congestion.
> 
> It's almost like you need to check for both sync and async congestion
> before you have any hopes of a new IO making progress.
> 
> The confusing part is that dm hasn't gotten any worse in this regard
> since 2.6.30 but the workload here is generating more sync reads
> (hopefully from gitk and swapin) than async writes (from the low
> bandwidth rsync).  So in general if you were to change mm/*.c wait
> for sync congestion instead of async, things should appear better.
> 

Thanks very much for that explanation. It makes a lot of sense and
explains why waiting on sync-congestion made such a difference on the
test setup.

> The punch line is that the btrfs guy thinks we can solve all of this with
> just one more thread.  If we change dm-crypt to have a thread dedicated
> to sync IO and a thread dedicated to async IO the system should smooth
> out.
> 

I see you have posted another patch so I'll test that out first before
looking into that.

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] make crypto unplug fix V3
  2009-11-13 12:58       ` [PATCH] make crypto unplug " Chris Mason
  2009-11-13 17:34         ` Mel Gorman
@ 2009-11-13 17:34         ` Mel Gorman
  2009-11-13 18:40           ` Chris Mason
  1 sibling, 1 reply; 16+ messages in thread
From: Mel Gorman @ 2009-11-13 17:34 UTC (permalink / raw)
  To: Chris Mason, Andrew Morton, Frans Pop, Jiri Kosina, Sven Geggus,
	Karol Lewandowski, Tobias Oetiker, linux-kernel, linux-mm,
	KOSAKI Motohiro, Pekka Enberg, Rik van Riel, Christoph Lameter,
	Stephan von Krawczynski, Rafael J. Wysocki, Kernel Testers List

On Fri, Nov 13, 2009 at 07:58:12AM -0500, Chris Mason wrote:
> This is still likely to set your dm data on fire.  It is only meant for
> testers that start with mkfs and don't have any valuable dm data.
> 

The good news is that my room remains fire-free. Despite swap also
running from dm-crypt, I had no corruption or instability issues.

Here is an updated set of results for fake-gitk running.

X86
2.6.30-0000000-force-highorder           Elapsed:12:08.908    Failures:0
2.6.31-0000000-force-highorder           Elapsed:10:56.283    Failures:0
2.6.31-0000006-dm-crypt-unplug           Elapsed:11:51.653    Failures:0
2.6.31-0000012-pgalloc-2.6.30            Elapsed:12:26.587    Failures:0
2.6.31-0000123-congestion-both           Elapsed:10:55.298    Failures:0
2.6.31-0001234-kswapd-quick-recheck      Elapsed:18:01.523    Failures:0
2.6.31-0123456-dm-crypt-unplug           Elapsed:10:45.720    Failures:0
2.6.31-revert-8aa7e847                   Elapsed:15:08.020    Failures:0
2.6.32-rc6-0000000-force-highorder       Elapsed:16:20.765    Failures:4
2.6.32-rc6-0000006-dm-crypt-unplug       Elapsed:13:42.920    Failures:0
2.6.32-rc6-0000012-pgalloc-2.6.30        Elapsed:16:13.380    Failures:1
2.6.32-rc6-0000123-congestion-both       Elapsed:18:39.118    Failures:0
2.6.32-rc6-0001234-kswapd-quick-recheck  Elapsed:15:04.398    Failures:0
2.6.32-rc6-0123456-dm-crypt-unplug       Elapsed:12:50.438    Failures:0
2.6.32-rc6-revert-8aa7e847               Elapsed:20:50.888    Failures:0

X86-64
2.6.30-0000000-force-highorder           Elapsed:10:37.300    Failures:0
2.6.31-0000000-force-highorder           Elapsed:08:49.338    Failures:0
2.6.31-0000006-dm-crypt-unplug           Elapsed:09:37.840    Failures:0
2.6.31-0000012-pgalloc-2.6.30            Elapsed:15:49.690    Failures:0
2.6.31-0000123-congestion-both           Elapsed:09:18.790    Failures:0
2.6.31-0001234-kswapd-quick-recheck      Elapsed:08:39.268    Failures:0
2.6.31-0123456-dm-crypt-unplug           Elapsed:08:20.965    Failures:0
2.6.31-revert-8aa7e847                   Elapsed:08:07.457    Failures:0
2.6.32-rc6-0000000-force-highorder       Elapsed:18:29.103    Failures:1
2.6.32-rc6-0000006-dm-crypt-unplug       Elapsed:25:53.515    Failures:3
2.6.32-rc6-0000012-pgalloc-2.6.30        Elapsed:19:55.570    Failures:6
2.6.32-rc6-0000123-congestion-both       Elapsed:17:29.255    Failures:2
2.6.32-rc6-0001234-kswapd-quick-recheck  Elapsed:14:41.068    Failures:0
2.6.32-rc6-0123456-dm-crypt-unplug       Elapsed:15:48.028    Failures:1
2.6.32-rc6-revert-8aa7e847               Elapsed:14:48.647    Failures:0

The numbering in the kernel name indicates which patches are applied. I tested
the dm-crypt patch both in isolation and in combination with the patches
in this series.

Basically, the dm-crypt-unplug makes a small difference in performance
overall, mostly slight gains and losses. There was one massive regression
with the dm-crypt patch applied to 2.6.32-rc6 but at the moment, I don't
know what that is.

In general, the patch reduces the amount of time direct reclaimers are
spending on congestion_wait.

> It includes my patch from last night, along with changes to force dm to
> unplug when its IO queues empty.
> 
> The problem goes like this:
> 
> Process: submit read bio
> dm: put bio onto work queue
> process: unplug
> dm: work queue finds bio, does a generic_make_request
> 
> The end result is that we miss the unplug completely.  dm-crypt needs to
> unplug for sync bios.  This patch also changes it to unplug whenever the
> queue is empty, which is far from ideal but better than missing the
> unplugs.
> 
> This doesn't completely fix io stalls I'm seeing with dm-crypt, but its
> my best guess.  If it works, I'll break it up and submit for real to
> the dm people.
> 

Out of curiosity, how are you measuring IO stalls? In the tests I'm doing,
the worker processes output their progress and it should be at a steady
rate. I considered a stall to be an excessive delay between updates which
is a pretty indirect measure.

> -chris
> 
> diff --git a/drivers/md/dm-crypt.c b/drivers/md/dm-crypt.c
> index ed10381..729ae01 100644
> --- a/drivers/md/dm-crypt.c
> +++ b/drivers/md/dm-crypt.c
> @@ -94,8 +94,12 @@ struct crypt_config {
>  	struct bio_set *bs;
>  
>  	struct workqueue_struct *io_queue;
> +	struct workqueue_struct *async_io_queue;
>  	struct workqueue_struct *crypt_queue;
>  
> +	atomic_t sync_bios_in_queue;
> +	atomic_t async_bios_in_queue;
> +
>  	/*
>  	 * crypto related data
>  	 */
> @@ -679,11 +683,29 @@ static void kcryptd_io_write(struct dm_crypt_io *io)
>  static void kcryptd_io(struct work_struct *work)
>  {
>  	struct dm_crypt_io *io = container_of(work, struct dm_crypt_io, work);
> +	struct crypt_config *cc = io->target->private;
> +	int zero_sync = 0;
> +	int zero_async = 0;
> +	int was_sync = 0;
> +
> +	if (io->base_bio->bi_rw & (1 << BIO_RW_SYNCIO)) {
> +		zero_sync = atomic_dec_and_test(&cc->sync_bios_in_queue);
> +		was_sync = 1;
> +	} else
> +		zero_async = atomic_dec_and_test(&cc->async_bios_in_queue);
>  
>  	if (bio_data_dir(io->base_bio) == READ)
>  		kcryptd_io_read(io);
>  	else
>  		kcryptd_io_write(io);
> +
> +	if ((was_sync && zero_sync) ||
> +	    (!was_sync && zero_async &&
> +	     atomic_read(&cc->sync_bios_in_queue) == 0)) {
> +		struct backing_dev_info *bdi;
> +		bdi = blk_get_backing_dev_info(io->base_bio->bi_bdev);
> +		blk_run_backing_dev(bdi, NULL);
> +	}
>  }
>  
>  static void kcryptd_queue_io(struct dm_crypt_io *io)
> @@ -691,7 +713,13 @@ static void kcryptd_queue_io(struct dm_crypt_io *io)
>  	struct crypt_config *cc = io->target->private;
>  
>  	INIT_WORK(&io->work, kcryptd_io);
> -	queue_work(cc->io_queue, &io->work);
> +	if (io->base_bio->bi_rw & (1 << BIO_RW_SYNCIO)) {
> +		atomic_inc(&cc->sync_bios_in_queue);
> +		queue_work(cc->io_queue, &io->work);
> +	} else {
> +		atomic_inc(&cc->async_bios_in_queue);
> +		queue_work(cc->async_io_queue, &io->work);
> +	}
>  }
>  
>  static void kcryptd_crypt_write_io_submit(struct dm_crypt_io *io,
> @@ -759,8 +787,7 @@ static void kcryptd_crypt_write_convert(struct dm_crypt_io *io)
>  
>  		/* Encryption was already finished, submit io now */
>  		if (crypt_finished) {
> -			kcryptd_crypt_write_io_submit(io, r, 0);
> -
> +			kcryptd_crypt_write_io_submit(io, r, 1);
>  			/*
>  			 * If there was an error, do not try next fragments.
>  			 * For async, error is processed in async handler.
> @@ -1120,6 +1147,15 @@ static int crypt_ctr(struct dm_target *ti, unsigned int argc, char **argv)
>  	} else
>  		cc->iv_mode = NULL;
>  
> +	atomic_set(&cc->sync_bios_in_queue, 0);
> +	atomic_set(&cc->async_bios_in_queue, 0);
> +
> +	cc->async_io_queue = create_singlethread_workqueue("kcryptd_async_io");
> +	if (!cc->async_io_queue) {
> +		ti->error = "Couldn't create kcryptd io queue";
> +		goto bad_async_io_queue;
> +	}
> +
>  	cc->io_queue = create_singlethread_workqueue("kcryptd_io");
>  	if (!cc->io_queue) {
>  		ti->error = "Couldn't create kcryptd io queue";
> @@ -1139,6 +1175,8 @@ static int crypt_ctr(struct dm_target *ti, unsigned int argc, char **argv)
>  bad_crypt_queue:
>  	destroy_workqueue(cc->io_queue);
>  bad_io_queue:
> +	destroy_workqueue(cc->async_io_queue);
> +bad_async_io_queue:
>  	kfree(cc->iv_mode);
>  bad_ivmode_string:
>  	dm_put_device(ti, cc->dev);
> @@ -1166,6 +1204,7 @@ static void crypt_dtr(struct dm_target *ti)
>  	struct crypt_config *cc = (struct crypt_config *) ti->private;
>  
>  	destroy_workqueue(cc->io_queue);
> +	destroy_workqueue(cc->async_io_queue);
>  	destroy_workqueue(cc->crypt_queue);
>  
>  	if (cc->req)
> 

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] make crypto unplug fix V3
  2009-11-13 17:34         ` Mel Gorman
@ 2009-11-13 18:40           ` Chris Mason
  2009-11-13 20:29             ` Mel Gorman
  0 siblings, 1 reply; 16+ messages in thread
From: Chris Mason @ 2009-11-13 18:40 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Andrew Morton, Frans Pop, Jiri Kosina, Sven Geggus,
	Karol Lewandowski, Tobias Oetiker, linux-kernel, linux-mm,
	KOSAKI Motohiro, Pekka Enberg, Rik van Riel, Christoph Lameter,
	Stephan von Krawczynski, Rafael J. Wysocki, Kernel Testers List

On Fri, Nov 13, 2009 at 05:34:46PM +0000, Mel Gorman wrote:
> On Fri, Nov 13, 2009 at 07:58:12AM -0500, Chris Mason wrote:
> > This is still likely to set your dm data on fire.  It is only meant for
> > testers that start with mkfs and don't have any valuable dm data.
> > 
> 
> The good news is that my room remains fire-free. Despite swap also
> running from dm-crypt, I had no corruption or instability issues.

Ok, definitely not so convincing I'd try and shove it into a late rc.

> 
> Here is an updated set of results for fake-gitk running.
> 
> X86
> 2.6.30-0000000-force-highorder           Elapsed:12:08.908    Failures:0
> 2.6.31-0000000-force-highorder           Elapsed:10:56.283    Failures:0
> 2.6.31-0000006-dm-crypt-unplug           Elapsed:11:51.653    Failures:0
> 2.6.31-0000012-pgalloc-2.6.30            Elapsed:12:26.587    Failures:0
> 2.6.31-0000123-congestion-both           Elapsed:10:55.298    Failures:0
> 2.6.31-0001234-kswapd-quick-recheck      Elapsed:18:01.523    Failures:0
> 2.6.31-0123456-dm-crypt-unplug           Elapsed:10:45.720    Failures:0
> 2.6.31-revert-8aa7e847                   Elapsed:15:08.020    Failures:0
> 2.6.32-rc6-0000000-force-highorder       Elapsed:16:20.765    Failures:4
> 2.6.32-rc6-0000006-dm-crypt-unplug       Elapsed:13:42.920    Failures:0
> 2.6.32-rc6-0000012-pgalloc-2.6.30        Elapsed:16:13.380    Failures:1
> 2.6.32-rc6-0000123-congestion-both       Elapsed:18:39.118    Failures:0
> 2.6.32-rc6-0001234-kswapd-quick-recheck  Elapsed:15:04.398    Failures:0
> 2.6.32-rc6-0123456-dm-crypt-unplug       Elapsed:12:50.438    Failures:0
> 2.6.32-rc6-revert-8aa7e847               Elapsed:20:50.888    Failures:0
> 
> X86-64
> 2.6.30-0000000-force-highorder           Elapsed:10:37.300    Failures:0
> 2.6.31-0000000-force-highorder           Elapsed:08:49.338    Failures:0
> 2.6.31-0000006-dm-crypt-unplug           Elapsed:09:37.840    Failures:0
> 2.6.31-0000012-pgalloc-2.6.30            Elapsed:15:49.690    Failures:0
> 2.6.31-0000123-congestion-both           Elapsed:09:18.790    Failures:0
> 2.6.31-0001234-kswapd-quick-recheck      Elapsed:08:39.268    Failures:0
> 2.6.31-0123456-dm-crypt-unplug           Elapsed:08:20.965    Failures:0
> 2.6.31-revert-8aa7e847                   Elapsed:08:07.457    Failures:0
> 2.6.32-rc6-0000000-force-highorder       Elapsed:18:29.103    Failures:1
> 2.6.32-rc6-0000006-dm-crypt-unplug       Elapsed:25:53.515    Failures:3
> 2.6.32-rc6-0000012-pgalloc-2.6.30        Elapsed:19:55.570    Failures:6
> 2.6.32-rc6-0000123-congestion-both       Elapsed:17:29.255    Failures:2
> 2.6.32-rc6-0001234-kswapd-quick-recheck  Elapsed:14:41.068    Failures:0
> 2.6.32-rc6-0123456-dm-crypt-unplug       Elapsed:15:48.028    Failures:1
> 2.6.32-rc6-revert-8aa7e847               Elapsed:14:48.647    Failures:0
> 
> The numbering in the kernel indicates what patches are applied. I tested
> the dm-crypt patch both in isolation and in combination with the patches
> in this series.
> 
> Basically, the dm-crypt-unplug makes a small difference in performance
> overall, mostly slight gains and losses. There was one massive regression
> with the dm-crypt patch applied to 2.6.32-rc6 but at the moment, I don't
> know what that is.

How consistent are your numbers between runs?  I was trying to match
this up with your last email and things were pretty different.

> 
> In general, the patch reduces the amount of time direct reclaimers are
> spending on congestion_wait.
> 
> > It includes my patch from last night, along with changes to force dm to
> > unplug when its IO queues empty.
> > 
> > The problem goes like this:
> > 
> > Process: submit read bio
> > dm: put bio onto work queue
> > process: unplug
> > dm: work queue finds bio, does a generic_make_request
> > 
> > The end result is that we miss the unplug completely.  dm-crypt needs to
> > unplug for sync bios.  This patch also changes it to unplug whenever the
> > queue is empty, which is far from ideal but better than missing the
> > unplugs.
> > 
> > This doesn't completely fix the IO stalls I'm seeing with dm-crypt, but it's
> > my best guess.  If it works, I'll break it up and submit for real to
> > the dm people.
> > 
> 
> Out of curiosity, how are you measuring IO stalls? In the tests I'm doing,
> the worker processes output their progress and it should be at a steady
> rate. I considered a stall to be an excessive delay between updates, which
> is a pretty indirect measure.

I just set up a crypto disk and did dd if=/dev/zero of=/mnt/foo bs=1M

If you watch vmstat 1, there's supposed to be a constant stream of IO to
the disk.  If a whole second goes by with zero IO, we're doing something
wrong.  I get a number of multi-second stalls where we are just waiting
for IO to happen.

Most of the time I was able to catch a sysrq-w for it; someone was
waiting on a read to finish.  It isn't completely clear to me whether the
unplugging is working properly.
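
To spell out what I mean by unplugging when the queue empties, here is a
rough sketch.  It is illustration only, not the actual patch; the single
bios_in_queue counter stands in for the separate sync/async counters, and
read/write dispatch and error handling are omitted.

	/*
	 * Sketch only.  struct dm_crypt_io and struct crypt_config are the
	 * dm-crypt ones; bios_in_queue is an assumed counter incremented in
	 * kcryptd_queue_io() when the bio is queued.
	 */
	#include <linux/blkdev.h>
	#include <linux/workqueue.h>

	static void kcryptd_io(struct work_struct *work)
	{
		struct dm_crypt_io *io = container_of(work, struct dm_crypt_io, work);
		struct crypt_config *cc = io->target->private;

		/* Hand the bio down to the underlying device as before. */
		generic_make_request(io->base_bio);

		/*
		 * The submitter's unplug may already have run before this
		 * worker saw the bio, so if the queue just drained, unplug
		 * the underlying queue here instead.
		 */
		if (atomic_dec_and_test(&cc->bios_in_queue))
			blk_unplug(bdev_get_queue(cc->dev->bdev));
	}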

-chris

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] make crypto unplug fix V3
  2009-11-13 18:40           ` Chris Mason
@ 2009-11-13 20:29             ` Mel Gorman
  0 siblings, 0 replies; 16+ messages in thread
From: Mel Gorman @ 2009-11-13 20:29 UTC (permalink / raw)
  To: Chris Mason, Andrew Morton, Frans Pop, Jiri Kosina, Sven Geggus,
	Karol Lewandowski, Tobias Oetiker, linux-kernel, linux-mm,
	KOSAKI Motohiro, Pekka Enberg, Rik van Riel, Christoph Lameter,
	Stephan von Krawczynski, Rafael J. Wysocki, Kernel Testers List

On Fri, Nov 13, 2009 at 01:40:04PM -0500, Chris Mason wrote:
> On Fri, Nov 13, 2009 at 05:34:46PM +0000, Mel Gorman wrote:
> > On Fri, Nov 13, 2009 at 07:58:12AM -0500, Chris Mason wrote:
> > > This is still likely to set your dm data on fire.  It is only meant for
> > > testers that start with mkfs and don't have any valuable dm data.
> > > 
> > 
> > The good news is that my room remains fire-free. Despite swap also
> > running from dm-crypt, I had no corruption or instability issues.
> 
> Ok, definitely not so convincing that I'd try to shove it into a late rc.
> 
> > 
> > Here is an updated set of results for fake-gitk running.
> > 
> > X86
> > 2.6.30-0000000-force-highorder           Elapsed:12:08.908    Failures:0
> > 2.6.31-0000000-force-highorder           Elapsed:10:56.283    Failures:0
> > 2.6.31-0000006-dm-crypt-unplug           Elapsed:11:51.653    Failures:0
> > 2.6.31-0000012-pgalloc-2.6.30            Elapsed:12:26.587    Failures:0
> > 2.6.31-0000123-congestion-both           Elapsed:10:55.298    Failures:0
> > 2.6.31-0001234-kswapd-quick-recheck      Elapsed:18:01.523    Failures:0
> > 2.6.31-0123456-dm-crypt-unplug           Elapsed:10:45.720    Failures:0
> > 2.6.31-revert-8aa7e847                   Elapsed:15:08.020    Failures:0
> > 2.6.32-rc6-0000000-force-highorder       Elapsed:16:20.765    Failures:4
> > 2.6.32-rc6-0000006-dm-crypt-unplug       Elapsed:13:42.920    Failures:0
> > 2.6.32-rc6-0000012-pgalloc-2.6.30        Elapsed:16:13.380    Failures:1
> > 2.6.32-rc6-0000123-congestion-both       Elapsed:18:39.118    Failures:0
> > 2.6.32-rc6-0001234-kswapd-quick-recheck  Elapsed:15:04.398    Failures:0
> > 2.6.32-rc6-0123456-dm-crypt-unplug       Elapsed:12:50.438    Failures:0
> > 2.6.32-rc6-revert-8aa7e847               Elapsed:20:50.888    Failures:0
> > 
> > X86-64
> > 2.6.30-0000000-force-highorder           Elapsed:10:37.300    Failures:0
> > 2.6.31-0000000-force-highorder           Elapsed:08:49.338    Failures:0
> > 2.6.31-0000006-dm-crypt-unplug           Elapsed:09:37.840    Failures:0
> > 2.6.31-0000012-pgalloc-2.6.30            Elapsed:15:49.690    Failures:0
> > 2.6.31-0000123-congestion-both           Elapsed:09:18.790    Failures:0
> > 2.6.31-0001234-kswapd-quick-recheck      Elapsed:08:39.268    Failures:0
> > 2.6.31-0123456-dm-crypt-unplug           Elapsed:08:20.965    Failures:0
> > 2.6.31-revert-8aa7e847                   Elapsed:08:07.457    Failures:0
> > 2.6.32-rc6-0000000-force-highorder       Elapsed:18:29.103    Failures:1
> > 2.6.32-rc6-0000006-dm-crypt-unplug       Elapsed:25:53.515    Failures:3
> > 2.6.32-rc6-0000012-pgalloc-2.6.30        Elapsed:19:55.570    Failures:6
> > 2.6.32-rc6-0000123-congestion-both       Elapsed:17:29.255    Failures:2
> > 2.6.32-rc6-0001234-kswapd-quick-recheck  Elapsed:14:41.068    Failures:0
> > 2.6.32-rc6-0123456-dm-crypt-unplug       Elapsed:15:48.028    Failures:1
> > 2.6.32-rc6-revert-8aa7e847               Elapsed:14:48.647    Failures:0
> > 
> > The numbering in the kernel indicates what patches are applied. I tested
> > the dm-crypt patch both in isolation and in combination with the patches
> > in this series.
> > 
> > Basically, the dm-crypt-unplug makes a small difference in performance
> > overall, mostly slight gains and losses. There was one massive regression
> > with the dm-crypt patch applied to 2.6.32-rc6 but at the moment, I don't
> > know what that is.
> 
> How consistent are your numbers between runs?  I was trying to match
> this up with your last email and things were pretty different.
> 

The figures from the first mail were based on kernels that were not
instrumented. It so happened that this run used an instrumented kernel
to get the congestion_wait figures, so the results are different.

However, the results vary a lot. In some cases, a run will spike just as
2.6.32-rc6-0000006-dm-crypt-unplug did. The reported figure is the
average of 4 fake-gitk runs. I don't have the standard deviation handy
but it's high.

> > 
> > In general, the patch reduces the amount of time direct reclaimers are
> > spending on congestion_wait.
> > 
> > > It includes my patch from last night, along with changes to force dm to
> > > unplug when its IO queues empty.
> > > 
> > > The problem goes like this:
> > > 
> > > Process: submit read bio
> > > dm: put bio onto work queue
> > > process: unplug
> > > dm: work queue finds bio, does a generic_make_request
> > > 
> > > The end result is that we miss the unplug completely.  dm-crypt needs to
> > > unplug for sync bios.  This patch also changes it to unplug whenever the
> > > queue is empty, which is far from ideal but better than missing the
> > > unplugs.
> > > 
> > > This doesn't completely fix the IO stalls I'm seeing with dm-crypt, but it's
> > > my best guess.  If it works, I'll break it up and submit for real to
> > > the dm people.
> > > 
> > 
> > Out of curiosity, how are you measuring IO stalls? In the tests I'm doing,
> > the worker processes output their progress and it should be at a steady
> > rate. I considered a stall to be an excessive delay between updates, which
> > is a pretty indirect measure.
> 
> I just set up a crypto disk and did dd if=/dev/zero of=/mnt/foo bs=1M
> 
> If you watch vmstat 1, there's supposed to be a constant stream of IO to
> the disk.  If a whole second goes by with zero IO, we're doing something
> wrong.  I get a number of multi-second stalls where we are just waiting
> for IO to happen.
> 

Ok, cool.

> Most of the time I was able to catch a sysrq-w for it; someone was
> waiting on a read to finish.  It isn't completely clear to me whether the
> unplugging is working properly.
> 

The machines I'm using are now tied up doing the same tests with dm-crypt,
and then I need to rerun the dm-crypt tests with the changed patches. It'll
be early next week before the machines free up again for me to investigate
your patch further, but initial results look promising at least.

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 0/7] Reduce GFP_ATOMIC allocation failures, candidate fix V3
  2009-11-13  2:46     ` Chris Mason
  2009-11-13 12:58       ` [PATCH] make crypto unplug " Chris Mason
@ 2009-11-16 16:44       ` Milan Broz
  2009-11-16 18:36         ` Chris Mason
  2009-11-16 16:44       ` Milan Broz
  2 siblings, 1 reply; 16+ messages in thread
From: Milan Broz @ 2009-11-16 16:44 UTC (permalink / raw)
  To: Chris Mason, Mel Gorman, Andrew Morton, Frans Pop, Jiri Kosina,
	Sven Geggus, Karol Lewandowski, Tobias Oetiker, linux-kernel,
	linux-mm, KOSAKI Motohiro, Pekka Enberg, Rik van Riel,
	Christoph Lameter, Stephan von Krawczynski, Rafael J. Wysocki,
	Kernel Testers List, device-mapper development,
	Alasdair G Kergon

On 11/13/2009 03:46 AM, Chris Mason wrote:
> On Thu, Nov 12, 2009 at 05:00:05PM -0500, Chris Mason wrote:
> 
> [ ...]
> 
>>
>> The punch line is that the btrfs guy thinks we can solve all of this with
>> just one more thread.  If we change dm-crypt to have a thread dedicated
>> to sync IO and a thread dedicated to async IO the system should smooth
>> out.

Please, can you cc the DM maintainers on these kinds of patches? The dm-devel list at least.

Note that the crypt requests can already be processed synchronously or asynchronously,
depending on the crypto module used (it is async in the case of some hw acceleration).

Adding another queue makes the situation more complicated, and because the crypt
requests can be queued in the crypto layer, I am not sure that this solution will
help in this situation at all.
(Try to run that with AES-NI acceleration for example.)
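
To illustrate the point: it is the crypto driver, not dm-crypt, that decides
whether a request completes in the caller's context or asynchronously.  A
minimal sketch of what the caller sees (written from memory, not copied from
the current dm-crypt code, and the helper name is made up):

	/*
	 * Sketch only: return-code handling follows the usual ablkcipher
	 * convention; the crypto driver picks sync or async completion.
	 */
	#include <linux/crypto.h>
	#include <linux/errno.h>

	static int submit_crypt_request(struct ablkcipher_request *req)
	{
		int r = crypto_ablkcipher_encrypt(req);

		switch (r) {
		case 0:
			return 0;	/* completed synchronously in this context */
		case -EINPROGRESS:	/* async hw/driver, completion callback will run */
		case -EBUSY:		/* backlogged, also completes via the callback */
			return r;
		default:
			return r;	/* real error */
		}
	}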


Milan
--
mbroz@redhat.com

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 0/7] Reduce GFP_ATOMIC allocation failures, candidate fix V3
  2009-11-16 16:44       ` [PATCH 0/7] Reduce GFP_ATOMIC allocation failures, candidate " Milan Broz
@ 2009-11-16 18:36         ` Chris Mason
  2009-11-19  8:12           ` Mel Gorman
  0 siblings, 1 reply; 16+ messages in thread
From: Chris Mason @ 2009-11-16 18:36 UTC (permalink / raw)
  To: Milan Broz
  Cc: Mel Gorman, Andrew Morton, Frans Pop, Jiri Kosina, Sven Geggus,
	Karol Lewandowski, Tobias Oetiker, linux-kernel, linux-mm,
	KOSAKI Motohiro, Pekka Enberg, Rik van Riel, Christoph Lameter,
	Stephan von Krawczynski, Rafael J. Wysocki, Kernel Testers List,
	device-mapper development, Alasdair G Kergon

On Mon, Nov 16, 2009 at 05:44:07PM +0100, Milan Broz wrote:
> On 11/13/2009 03:46 AM, Chris Mason wrote:
> > On Thu, Nov 12, 2009 at 05:00:05PM -0500, Chris Mason wrote:
> > 
> > [ ...]
> > 
> >>
> >> The punch line is that the btrfs guy thinks we can solve all of this with
> >> just one more thread.  If we change dm-crypt to have a thread dedicated
> >> to sync IO and a thread dedicated to async IO the system should smooth
> >> out.
> 
> Please, can you cc the DM maintainers on these kinds of patches? The dm-devel list at least.
> 

Well, my current patch is a hack.  If I had come up with a proven theory
(hopefully Mel can prove it ;), it definitely would have gone through
the dm-devel lists.

> Note that the crypt requests can already be processed synchronously or asynchronously,
> depending on the crypto module used (it is async in the case of some hw acceleration).
> 
> Adding another queue makes the situation more complicated, and because the crypt
> requests can be queued in the crypto layer, I am not sure that this solution will
> help in this situation at all.
> (Try to run that with AES-NI acceleration for example.)

The problem is that async threads still imply a kind of ordering.
If there's a fifo serviced by one thread or 10, the latency ramifications
are very similar for a new entry on the list.  We have to wait for a
large portion of the low-prio items in order to service a high prio
item.

With a queue dedicated to sync requests and one dedicated to async,
you'll get better read latencies.  Btrfs has a similar problem around
the crc helper threads and it ends up solving things with two different
lists (high and low prio) processed by one thread.
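
Very roughly, that two-list pattern looks like the sketch below.  All of the
names are made up for illustration; this is neither the btrfs code nor a
proposed dm-crypt patch.

	/*
	 * Illustrative only: one worker, two lists, high-priority entries
	 * always drained before low-priority ones.
	 */
	#include <linux/list.h>
	#include <linux/spinlock.h>
	#include <linux/workqueue.h>

	struct prio_pool {
		spinlock_t lock;
		struct list_head high;		/* e.g. sync/read work */
		struct list_head low;		/* e.g. async/background work */
		struct work_struct work;
	};

	static void handle_entry(struct list_head *entry);	/* made-up per-item handler */

	static void prio_pool_worker(struct work_struct *work)
	{
		struct prio_pool *pool = container_of(work, struct prio_pool, work);

		spin_lock(&pool->lock);
		while (!list_empty(&pool->high) || !list_empty(&pool->low)) {
			struct list_head *entry;

			/* Service every high-prio entry before any low-prio one. */
			if (!list_empty(&pool->high))
				entry = pool->high.next;
			else
				entry = pool->low.next;
			list_del_init(entry);
			spin_unlock(&pool->lock);

			handle_entry(entry);

			spin_lock(&pool->lock);
		}
		spin_unlock(&pool->lock);
	}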

-chris

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 0/7] Reduce GFP_ATOMIC allocation failures, candidate fix V3
  2009-11-16 18:36         ` Chris Mason
@ 2009-11-19  8:12           ` Mel Gorman
  0 siblings, 0 replies; 16+ messages in thread
From: Mel Gorman @ 2009-11-19  8:12 UTC (permalink / raw)
  To: Chris Mason, Milan Broz, Andrew Morton, Frans Pop, Jiri Kosina,
	Sven Geggus, Karol Lewandowski, Tobias Oetiker, linux-kernel,
	linux-mm, KOSAKI Motohiro, Pekka Enberg, Rik van Riel,
	Christoph Lameter, Stephan von Krawczynski, Rafael J. Wysocki,
	Kernel Testers List, device-mapper development,
	Alasdair G Kergon

On Mon, Nov 16, 2009 at 01:36:13PM -0500, Chris Mason wrote:
> On Mon, Nov 16, 2009 at 05:44:07PM +0100, Milan Broz wrote:
> > On 11/13/2009 03:46 AM, Chris Mason wrote:
> > > On Thu, Nov 12, 2009 at 05:00:05PM -0500, Chris Mason wrote:
> > > 
> > > [ ...]
> > > 
> > >>
> > >> The punch line is that the btrfs guy thinks we can solve all of this with
> > >> just one more thread.  If we change dm-crypt to have a thread dedicated
> > >> to sync IO and a thread dedicated to async IO the system should smooth
> > >> out.
> > 
> > Please, can you cc the DM maintainers on these kinds of patches? The dm-devel list at least.
> > 
> 
> Well, my current patch is a hack.  If I had come up with a proven theory
> (hopefully Mel can prove it ;), it definitely would have gone through
> the dm-devel lists.
> 

I can't prove it for sure, but the workload might not be targeted enough
to show better or worse read latencies.

I adjusted the workload to run fake-gitk multiple times to get a better
sense of the deviation between runs.

On X86-64, the timings were
2.6.30-0000000-force-highorder                Elapsed:10:52.218(stddev:008.085)   Failures:0
2.6.31-0000000-force-highorder                Elapsed:11:32.258(stddev:130.779)   Failures:0
2.6.31-0012345-kswapd-stay-awake-when-min     Elapsed:09:34.662(stddev:022.239)   Failures:0
2.6.31-0123456-dm-crypt-unplug                Elapsed:10:28.718(stddev:060.897)   Failures:0
2.6.32-rc6-0000000-force-highorder            Elapsed:27:53.686(stddev:207.508)   Failures:37
2.6.32-rc6-0012345-kswapd-stay-awake-when-min Elapsed:27:26.735(stddev:221.214)   Failures:6
2.6.32-rc6-0123456-dm-crypt-unplug            Elapsed:27:35.462(stddev:205.017)   Failures:4

On X86, they were
2.6.30-0000000-force-highorder                Elapsed:13:36.768(stddev:019.514)   Failures:0
2.6.31-0000000-force-highorder                Elapsed:16:27.922(stddev:134.839)   Failures:0
2.6.31-0000006-dm-crypt-unplug                Elapsed:15:47.160(stddev:183.488)   Failures:0
2.6.31-0012345-kswapd-stay-awake-when-min     Elapsed:18:32.458(stddev:182.164)   Failures:0
2.6.31-0123456-dm-crypt-unplug                Elapsed:17:07.482(stddev:210.404)   Failures:0
2.6.32-rc6-0000000-force-highorder            Elapsed:26:08.763(stddev:123.926)   Failures:4
2.6.32-rc6-0000006-dm-crypt-unplug            Elapsed:17:57.550(stddev:254.412)   Failures:1
2.6.32-rc6-0012345-kswapd-stay-awake-when-min Elapsed:25:03.435(stddev:234.685)   Failures:1
2.6.32-rc6-0123456-dm-crypt-unplug            Elapsed:25:21.382(stddev:211.252)   Failures:0

(I forgot to queue up the dm-crypt patches on their own for X86-64, which
is why those results are missing.)

While the dm-crypt patch shows small differences, they are well within
the noise for each run of fake-gitk, so I can't draw any major conclusion
from it.

On X86 for 2.6.31, roughly the same amount of time is spent in
congestion_wait() with or without the patch. On 2.6.32-rc6, the time
kswapd spends waiting on congestion on the ASYNC queue is reduced by
about 20%, both compared against mainline and against the other patches
in the series. There is very little difference in the congestion on the
SYNC queue.

On X86-64 for 2.6.31, the story is slightly different. I don't think
it's an architecture thing because the X86-64 machine has twice as many
cores as the X86 test machine. Here, the time spent in congestion_wait()
on the ASYNC queue remains roughly the same, but the time spent on the
SYNC queue for direct reclaim is reduced by almost a third. Against
2.6.32-rc6, there was very little difference.

Again, it's hard to draw solid conclusions from this. I know from other
testing that the low_latency tunable for the IO scheduler is an important
factor for the performance of this test on 2.6.32-rc6, so disabling it
might show a clearer picture, but right now I can't say for sure that it's
an improvement.

> > Note that the crypt requests can already be processed synchronously or asynchronously,
> > depending on the crypto module used (it is async in the case of some hw acceleration).
> > 
> > Adding another queue makes the situation more complicated, and because the crypt
> > requests can be queued in the crypto layer, I am not sure that this solution will
> > help in this situation at all.
> > (Try to run that with AES-NI acceleration for example.)
> 
> The problem is that async threads still imply a kind of ordering.
> If there's a fifo serviced by one thread or 10, the latency ramifications
> are very similar for a new entry on the list.  We have to wait for a
> large portion of the low-prio items in order to service a high prio
> item.
> 
> With a queue dedicated to sync requests and one dedicated to async,
> you'll get better read latencies.  Btrfs has a similar problem around
> the crc helper threads and it ends up solving things with two different
> lists (high and low prio) processed by one thread.
> 
> -chris
> 

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2009-11-19  8:12 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-11-12 19:30 [PATCH 0/7] Reduce GFP_ATOMIC allocation failures, candidate fix V3 Mel Gorman
2009-11-12 19:30 ` [PATCH 1/5] page allocator: Always wake kswapd when restarting an allocation attempt after direct reclaim failed Mel Gorman
2009-11-12 20:27 ` [PATCH 0/7] Reduce GFP_ATOMIC allocation failures, candidate fix V3 Chris Mason
2009-11-12 22:00   ` Chris Mason
2009-11-13  2:46     ` Chris Mason
2009-11-13 12:58       ` [PATCH] make crypto unplug " Chris Mason
2009-11-13 17:34         ` Mel Gorman
2009-11-13 17:34         ` Mel Gorman
2009-11-13 18:40           ` Chris Mason
2009-11-13 20:29             ` Mel Gorman
2009-11-16 16:44       ` [PATCH 0/7] Reduce GFP_ATOMIC allocation failures, candidate " Milan Broz
2009-11-16 18:36         ` Chris Mason
2009-11-19  8:12           ` Mel Gorman
2009-11-13 13:44     ` Mel Gorman
2009-11-13 13:44     ` Mel Gorman

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox