From: Torsten Kaiser <just.for.lkml@googlemail.com>
To: Neil Brown <neilb@suse.de>
Cc: Wu Fengguang <fengguang.wu@intel.com>,
Rik van Riel <riel@redhat.com>,
Andrew Morton <akpm@linux-foundation.org>,
KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"linux-mm@kvack.org" <linux-mm@kvack.org>,
Li Shaohua <shaohua.li@intel.com>
Subject: Re: Deadlock possibly caused by too_many_isolated.
Date: Tue, 19 Oct 2010 10:43:31 +0200
Message-ID: <AANLkTin3wXWwA-HXhjx6wvzznp3p57Pg6fee8YNkZB79@mail.gmail.com>
In-Reply-To: <20101019101151.57c6dd56@notabene>
On Tue, Oct 19, 2010 at 1:11 AM, Neil Brown <neilb@suse.de> wrote:
> On Mon, 18 Oct 2010 12:58:17 +0200
> Torsten Kaiser <just.for.lkml@googlemail.com> wrote:
>
>> On Mon, Oct 18, 2010 at 6:14 AM, Neil Brown <neilb@suse.de> wrote:
>> > Testing shows that this patch seems to work.
>> > The test load (essentially kernbench) doesn't deadlock any more, though it
>> > does get bogged down thrashing in swap so it doesn't make a lot more
>> > progress :-) I guess that is to be expected.
>>
>> I just noticed this thread, as your mail from today pushed it up.
>>
>> In your original mail you wrote: " I recently had a customer (running
>> 2.6.32) report a deadlock during very intensive IO with lots of
>> processes. " and " Some threads that are blocked there, hold some IO
>> lock (probably in the filesystem) and are trying to allocate memory
>> inside the block device (md/raid1 to be precise) which is allocating
>> with GFP_NOIO and has a mempool to fall back on."
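For context on the mempool mechanics involved here: a mempool holds
min_nr pre-allocated reserve elements, and mempool_alloc() dips into
those reserves only after the regular allocator fails; with a waiting
gfp mask it then sleeps until an element is freed back. A minimal
sketch of that usage pattern (hypothetical code, not the md source):

#include <linux/mempool.h>
#include <linux/slab.h>

static mempool_t *pool;

static int __init pool_setup(void)
{
	/* Two reserve elements backed by plain kmalloc/kfree, matching
	 * the 2-entry fs_bio_set discussed below; 128 is an arbitrary
	 * element size for this sketch. */
	pool = mempool_create(2, mempool_kmalloc, mempool_kfree,
			      (void *)(unsigned long)128);
	return pool ? 0 : -ENOMEM;
}

static void writeout_step(void)
{
	/*
	 * GFP_NOIO: reclaim may not recurse into the IO path.  If the
	 * underlying kmalloc() fails, mempool_alloc() falls back to the
	 * two reserves and then sleeps waiting for mempool_free() --
	 * which is where the blocked threads in the report sit.
	 */
	void *elem = mempool_alloc(pool, GFP_NOIO);

	/* do the IO that will eventually release the element */
	mempool_free(elem, pool);
}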
>>
>> I recently hit the same problem (intense IO from a swap storm created
>> by 20 gcc processes hung my system). After I initially blamed the
>> workqueue changes in 2.6.36, Tejun Heo determined that my problem was
>> not the workqueues getting locked up, but that it was caused by an
>> exhausted mempool:
>> http://marc.info/?l=linux-kernel&m=128655737012549&w=2
>>
>> Instrumenting mm/mempool.c and retrying my workload showed that
>> fs_bio_set from fs/bio.c appeared to be the mempool to blame, and that
>> the code in drivers/md/raid1.c appeared to be the misuser:
>> http://marc.info/?l=linux-kernel&m=128671179817823&w=2
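The instrumentation was roughly of this shape -- a hypothetical
reconstruction, not the actual debug patch: in mm/mempool.c's
mempool_alloc(), right after an element is taken from the reserve
array, log that the pool is running dry.

	/* hypothetical instrumentation inside mempool_alloc():
	 * pool->curr_nr drops below pool->min_nr once the normal
	 * allocator has failed and a reserve element was consumed */
	if (pool->curr_nr < pool->min_nr)
		printk(KERN_WARNING
		       "mempool %p: using reserves (%d of %d left)\n",
		       pool, pool->curr_nr, pool->min_nr);

Combined with dump_stack() or WARN_ON_ONCE(), such a message identifies
both the starved pool and the calling path.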
>>
>> I was even able to reproduce this hang using only a normal RAID1
>> md device as swap space and then using dd to fill a tmpfs until
>> swapping set in:
>> http://marc.info/?l=linux-raid&m=128699402805191&w=2
>>
>> Looking back in the history of raid1.c and bio.c I found the following
>> interesting parts:
>>
>> * the change to allocate more than one bio via bio_clone() is from
>> 2005, but it looks like it was OK back then, because at that point
>> fs_bio_set was allocated with 256 entries
>> * in 2007 the size of the mempool was changed from 256 to only 2
>> entries (5972511b77809cb7c9ccdb79b825c54921c5c546 "A single unit is
>> enough, lets scale it down to 2 just to be on the safe side.")
>> * only in 2009 was the comment "To make this work, callers must never
>> allocate more than 1 bio at a time from this pool. Callers that need
>> to allocate more than 1 bio must always submit the previously allocated
>> bio for IO before attempting to allocate a new one. Failure to do so
>> can cause livelocks under memory pressure." added to bio_alloc();
>> that comment is the basis for my reasoning that raid1.c is broken.
>> (And no such comment was added to bio_clone(), although both calls
>> use the same mempool; see the sketch after this list.)
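To make the suspected misuse concrete, here is a schematic sketch of
the pattern in question -- hypothetical code, not the actual raid1.c
source, with MAX_MIRRORS invented for the sketch:

#include <linux/bio.h>

#define MAX_MIRRORS 4	/* hypothetical bound for this sketch */

/* All clones are drawn from the shared fs_bio_set before any of them
 * is submitted.  With only 2 reserve entries in that pool, two writers
 * under memory pressure can each grab one reserve and then sleep in
 * bio_clone() waiting for the other to release one: deadlock. */
static void mirror_write(struct bio *master, int nr_mirrors)
{
	struct bio *clones[MAX_MIRRORS];
	int i;

	for (i = 0; i < nr_mirrors; i++)
		clones[i] = bio_clone(master, GFP_NOIO);

	for (i = 0; i < nr_mirrors; i++)
		generic_make_request(clones[i]);
}

This is exactly what the bio_alloc() comment forbids: holding an
unsubmitted bio from the pool while allocating another from it.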
>>
>> So could someone please look into raid1.c to confirm or deny that
>> using multiple bio_clone() calls (one per drive) before submitting
>> them together could also cause such deadlocks?
>>
>> Thanks for looking
>>
>> Torsten
>
> Yes, thanks for the report.
> This is a real bug exactly as you describe.
>
> This is how I think I will fix it, though it needs a bit of review and
> testing before I can be certain.
> Also I need to check raid10 etc to see if they can suffer too.
>
> If you can test it I would really appreciate it.
I did test it, but while it seemed to fix the deadlock, the system
still became unusable.
The still-running "vmstat 1" showed that swapout was still
progressing, but only in bursts of ~20k every 5 to 20 seconds.
I also tried applying Wu's patch on top:
--- linux-next.orig/mm/vmscan.c	2010-10-13 12:35:14.000000000 +0800
+++ linux-next/mm/vmscan.c	2010-10-19 00:13:04.000000000 +0800
@@ -1163,6 +1163,13 @@ static int too_many_isolated(struct zone
 		isolated = zone_page_state(zone, NR_ISOLATED_ANON);
 	}
 
+	/*
+	 * GFP_NOIO/GFP_NOFS callers are allowed to isolate more pages, so that
+	 * they won't get blocked by normal ones and form circular deadlock
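The quoted patch breaks off at this point. The rest of the hunk, as I
understand the intent -- a reconstruction consistent with the hunk
header's seven added lines and with the form this check later took,
not the verbatim patch -- would be along these lines:

+	 */
+	if ((sc->gfp_mask & GFP_IOFS) == GFP_IOFS)
+		inactive >>= 3;
+
 	return isolated > inactive;
 }

That is, normal reclaimers (those allowed both __GFP_IO and __GFP_FS)
get throttled once isolated pages exceed an eighth of the inactive
list, leaving headroom so that GFP_NOIO/GFP_NOFS callers are never
blocked indefinitely behind them.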