From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx183.postini.com [74.125.245.183]) by kanga.kvack.org (Postfix) with SMTP id 4222C6B005A for ; Mon, 17 Sep 2012 19:59:21 -0400 (EDT) Received: by qady1 with SMTP id y1so2460763qad.14 for ; Mon, 17 Sep 2012 16:59:20 -0700 (PDT) Date: Mon, 17 Sep 2012 16:58:35 -0700 (PDT) From: Hugh Dickins Subject: Re: blk, mm: lockdep irq lock inversion in linux-next In-Reply-To: <20120917162248.d998afe3.akpm@linux-foundation.org> Message-ID: References: <5054878F.1030908@gmail.com> <20120917162248.d998afe3.akpm@linux-foundation.org> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-linux-mm@kvack.org List-ID: To: Andrew Morton Cc: Sasha Levin , axboe@kernel.dk, Tejun Heo , linux-mm , "linux-kernel@vger.kernel.org" , Dave Jones On Mon, 17 Sep 2012, Andrew Morton wrote: > On Sat, 15 Sep 2012 15:50:07 +0200 > Sasha Levin wrote: > > > Hi all, > > > > While fuzzing with trinity within a KVM tools guest on a linux-next kernel, I > > got the lockdep warning at the bottom of this mail. > > > > I've tried figuring out where it was introduced, but haven't found any sign that > > any of the code in that area changed recently, so I'm probably missing something... > > > > > > [ 157.966399] ========================================================= > > [ 157.968523] [ INFO: possible irq lock inversion dependency detected ] > > [ 157.970029] 3.6.0-rc5-next-20120914-sasha-00001-g802bf6c-dirty #340 Tainted: G W > > [ 157.970029] --------------------------------------------------------- > > [ 157.970029] trinity-child38/6642 just changed the state of lock: > > [ 157.970029] (&(&mapping->tree_lock)->rlock){+.+...}, at: [] > > invalidate_inode_pages2_range+0x20c/0x3c0 > > [ 157.970029] but this lock was taken by another, SOFTIRQ-safe lock in the past: > > [ 157.970029] (&(&new->queue_lock)->rlock){..-...} > > > > [snippage] > > gack, what a mess. Thanks for the report. AFAICT, what has happened is: > > invalidate_complete_page2() > ->spin_lock_irq(&mapping->tree_lock) > ->clear_page_mlock() > __clear_page_mlock() > ->isolate_lru_page() > ->spin_lock_irq(&zone->lru_lock) > ->spin_unlock_irq(&zone->lru_lock) > > whoops. isolate_lru_page() just enabled local interrupts while we're > holding ->tree_lock, which is supposed to be an irq-save lock. And in > a rather obscure way, lockdep caught it. Congratulations on deciphering the lockdep report, I soon gave up. But it looks like a bigger problem than your patch addresses: both filemap.c and rmap.c document tree_lock as nesting within lru_lock; and although it's possible that time has changed that, I doubt it. I think invalidate_complete_page2() is simply wrong to be calling clear_page_mlock() while holding mapping->tree_lock (other callsites avoid doing so). Maybe it should do a preliminary PageDirty test, then clear_page_mlock(), then take mapping->tree_lock, then repeat PageDirty test, without worrying about the odd case when it might clear mlock but then decide to back off the page. Oh, hold on, that reminds me: a few months ago I was putting together a tidy-up patch near there, and it seemed to me inappropriate to be clearing mlock down in truncate/invalidate, that belongs better to when unmapping the page, doesn't it? I'll look that out and try to finish it off. Hugh > > Problem is, I cannot find any recent change which might have triggered > this. > > I don't know how repeatable this is for you (not very at all, I > suspect). This? > > > From: Andrew Morton > Subject: mm: isolate_lru_page(): don't enable local interrupts > > isolate_lru_page() is called with local interrupts disabled, via > > invalidate_complete_page2() > ->spin_lock_irq(&mapping->tree_lock) > ->clear_page_mlock() > __clear_page_mlock() > ->isolate_lru_page() > > so it should not unconditionally enable local interrupts. > > Sasha hit a lockdep warning when running Trinity as a result of this. > > Reported-by: Sasha Levin > Cc: Mel Gorman > Signed-off-by: Andrew Morton > --- > > mm/vmscan.c | 5 +++-- > 1 file changed, 3 insertions(+), 2 deletions(-) > > diff -puN mm/vmscan.c~mm-isolate_lru_page-dont-enable-local-interrupts mm/vmscan.c > --- a/mm/vmscan.c~mm-isolate_lru_page-dont-enable-local-interrupts > +++ a/mm/vmscan.c > @@ -1161,8 +1161,9 @@ int isolate_lru_page(struct page *page) > if (PageLRU(page)) { > struct zone *zone = page_zone(page); > struct lruvec *lruvec; > + unsigned long flags; > > - spin_lock_irq(&zone->lru_lock); > + spin_lock_irqsave(&zone->lru_lock, flags); > lruvec = mem_cgroup_page_lruvec(page, zone); > if (PageLRU(page)) { > int lru = page_lru(page); > @@ -1171,7 +1172,7 @@ int isolate_lru_page(struct page *page) > del_page_from_lru_list(page, lruvec, lru); > ret = 0; > } > - spin_unlock_irq(&zone->lru_lock); > + spin_unlock_irqrestore(&zone->lru_lock, flags); > } > return ret; > } > _ -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org