linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Michal Hocko <mhocko@suse.cz>
To: Glauber Costa <glommer@gmail.com>
Cc: Dave Chinner <david@fromorbit.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-mm@kvack.org, LKML <linux-kernel@vger.kernel.org>
Subject: Re: linux-next: slab shrinkers: BUG at mm/list_lru.c:92
Date: Fri, 28 Jun 2013 17:12:53 +0200	[thread overview]
Message-ID: <20130628151253.GE5125@dhcp22.suse.cz> (raw)
In-Reply-To: <20130628143124.GA6552@localhost.localdomain>

On Fri 28-06-13 18:31:26, Glauber Costa wrote:
> On Fri, Jun 28, 2013 at 10:39:43AM +0200, Michal Hocko wrote:
> > I have just triggered this one.
> > 
> > [37955.364062] RIP: 0010:[<ffffffff81127e5b>]  [<ffffffff81127e5b>] list_lru_walk_node+0xab/0x140
> > [37955.364062] RSP: 0000:ffff8800374af7b8  EFLAGS: 00010286
> > [37955.364062] RAX: 0000000000000106 RBX: ffff88002ead7838 RCX: ffff8800374af830
> Note ebx
> 
> > [37955.364062] RDX: 0000000000000107 RSI: ffff88001d250dc0 RDI: ffff88002ead77d0
> > [37955.364062] RBP: ffff8800374af818 R08: 0000000000000000 R09: ffff88001ffeafc0
> > [37955.364062] R10: 0000000000000002 R11: 0000000000000000 R12: ffff88001d250dc0
> > [37955.364062] R13: 00000000000000a0 R14: 000000572ead7838 R15: ffff88001d250dc8
> Note r14

Hmm the upper part is 57 which is also weird. Do you think this might be
a HW issue?

It would be strange that I cannot reproduce it without the series
applied and I was testing the "good" case for a long time to be sure
that this is just not a "consistent good luck".

> > [37955.364062] Process as (pid: 3351, threadinfo ffff8800374ae000, task ffff880036d665c0)
> > [37955.364062] Stack:
> > [37955.364062]  ffff88001da3e700 ffff8800374af830 ffff8800374af838 ffffffff811846d0
> > [37955.364062]  0000000000000000 ffff88001ce75c48 01ff8800374af838 ffff8800374af838
> > [37955.364062]  0000000000000000 ffff88001ce75800 ffff8800374afa08 0000000000001014
> > [37955.364062] Call Trace:
> > [37955.364062]  [<ffffffff811846d0>] ? insert_inode_locked+0x160/0x160
> > [37955.364062]  [<ffffffff8118496c>] prune_icache_sb+0x3c/0x60
> > [37955.364062]  [<ffffffff8116dcbe>] super_cache_scan+0x12e/0x1b0
> > [37955.364062]  [<ffffffff8111354a>] shrink_slab_node+0x13a/0x250
> > [37955.364062]  [<ffffffff8111671b>] shrink_slab+0xab/0x120
> > [37955.364062]  [<ffffffff81117944>] do_try_to_free_pages+0x264/0x360
> > [37955.364062]  [<ffffffff81117d90>] try_to_free_pages+0x130/0x180
> > [37955.364062]  [<ffffffff81001974>] ? __switch_to+0x1b4/0x550
> > [37955.364062]  [<ffffffff8110a2fe>] __alloc_pages_slowpath+0x39e/0x790
> > [37955.364062]  [<ffffffff8110a8ea>] __alloc_pages_nodemask+0x1fa/0x210
> > [37955.364062]  [<ffffffff8114d1b0>] alloc_pages_vma+0xa0/0x120
> > [37955.364062]  [<ffffffff81129ebb>] do_anonymous_page+0x16b/0x350
> > [37955.364062]  [<ffffffff8112f9c5>] handle_pte_fault+0x235/0x240
> > [37955.364062]  [<ffffffff8107b8b0>] ? set_next_entity+0xb0/0xd0
> > [37955.364062]  [<ffffffff8112fcbf>] handle_mm_fault+0x2ef/0x400
> > [37955.364062]  [<ffffffff8157e927>] __do_page_fault+0x237/0x4f0
> > [37955.364062]  [<ffffffff8116a8a8>] ? fsnotify_access+0x68/0x80
> > [37955.364062]  [<ffffffff8116b0b8>] ? vfs_read+0xd8/0x130
> > [37955.364062]  [<ffffffff8157ebe9>] do_page_fault+0x9/0x10ffff88002ead7838
> > [37955.364062]  [<ffffffff8157b348>] page_fault+0x28/0x30
> > [37955.364062] Code: 44 24 18 0f 84 87 00 00 00 49 83 7c 24 18 00 78 7b 49 83 c5 01 48 8b 4d a8 48 8b 11 48 8d 42 ff 48 85 d2 48 89 01 74 78 4d 39 f7 <49> 8b 06 4c 89 f3 74 6d 49 89 c6 eb a6 0f 1f 84 00 00 00 00 00 
> > [37955.364062] RIP  [<ffffffff81127e5b>] list_lru_walk_node+0xab/0x140
> > 
> > ffffffff81127e0e:       48 8b 55 b0             mov    -0x50(%rbp),%rdx
> > ffffffff81127e12:       4c 89 e6                mov    %r12,%rsi
> > ffffffff81127e15:       48 89 df                mov    %rbx,%rdi
> > ffffffff81127e18:       ff 55 b8                callq  *-0x48(%rbp)		# isolate(item, &nlru->lock, cb_arg)
> > ffffffff81127e1b:       83 f8 01                cmp    $0x1,%eax
> > ffffffff81127e1e:       74 78                   je     ffffffff81127e98 <list_lru_walk_node+0xe8>
> > ffffffff81127e20:       73 4e                   jae    ffffffff81127e70 <list_lru_walk_node+0xc0>
> > [...]
> One interesting thing I have noted here, is that r14 is basically the lower half of rbx, with
> the upper part borked.
> 
> Because we are talking about a single word, this does not seem the usual update-half-of-double-word
> without locking issue.
> 
> From your excerpt, it is not totally clear what r14 is. But by looking at rdi which
> is 0xffff88002ead77d0 and very probable nlru->lock due to the calling convention,
> that would indicate that this is nlru->list in case you have spinlock debugging enabled.
> 
> So yes, someone destroyed our next pointer, and amazingly only half of it.
> 
> Still, the only time we ever release this lock is when isolate returns LRU_RETRY. Maybe the
> way we restart is wrong? (although I can't see how)
> 
> An iput() happens outside the lock in that case, but it seems safe : if that ends up manipulating
> the lru it will do so through our accessors.
> 
> I will have to think a bit more... Any other strange thing happening before it ?

Nothing. I basically see two cases here. One is hang and the other is
this crash followed up by many soft lockups as we died with the lock
held.

When I think about it, the hang case might still be a false positive
(for xfs at least) because it might just happen that those processes I
have listed are in D state just too long. I have gave them tens of
seconds and than compared traces and consider them hung if they didn't
change. I can retest and try to give them hours to be absolutely sure.

-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  reply	other threads:[~2013-06-28 15:12 UTC|newest]

Thread overview: 67+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-06-17 14:18 Michal Hocko
2013-06-17 15:14 ` Glauber Costa
2013-06-17 15:33   ` Michal Hocko
2013-06-17 16:54     ` Glauber Costa
2013-06-18  7:42       ` Michal Hocko
2013-06-17 21:35   ` Andrew Morton
2013-06-17 22:30     ` Glauber Costa
2013-06-18  2:46       ` Dave Chinner
2013-06-18  6:31         ` Glauber Costa
2013-06-18  8:24           ` Michal Hocko
2013-06-18 10:44             ` Michal Hocko
2013-06-18 13:50               ` Michal Hocko
2013-06-25  2:27                 ` Dave Chinner
2013-06-26  8:15                   ` Michal Hocko
2013-06-26 23:24                     ` Dave Chinner
2013-06-27 14:54                       ` Michal Hocko
2013-06-28  8:39                         ` Michal Hocko
2013-06-28 14:31                           ` Glauber Costa
2013-06-28 15:12                             ` Michal Hocko [this message]
2013-06-29  2:55                         ` Dave Chinner
2013-06-30 18:33                           ` Michal Hocko
2013-07-01  1:25                             ` Dave Chinner
2013-07-01  7:50                               ` Michal Hocko
2013-07-01  8:10                                 ` Dave Chinner
2013-07-02  9:22                                   ` Michal Hocko
2013-07-02 12:19                                     ` Dave Chinner
2013-07-02 12:44                                       ` Michal Hocko
2013-07-03 11:24                                         ` Dave Chinner
2013-07-03 14:08                                           ` Glauber Costa
2013-07-04 16:36                                           ` Michal Hocko
2013-07-08 12:53                                             ` Michal Hocko
2013-07-08 21:04                                               ` Andrew Morton
2013-07-09 17:34                                                 ` Glauber Costa
2013-07-09 17:51                                                   ` Andrew Morton
2013-07-09 17:32                                               ` Glauber Costa
2013-07-09 17:50                                                 ` Andrew Morton
2013-07-09 17:57                                                   ` Glauber Costa
2013-07-09 17:57                                                 ` Michal Hocko
2013-07-09 21:39                                                   ` Andrew Morton
2013-07-10  2:31                                               ` Dave Chinner
2013-07-10  7:34                                                 ` Michal Hocko
2013-07-10  8:06                                                 ` Michal Hocko
2013-07-11  2:26                                                   ` Dave Chinner
2013-07-11  3:03                                                     ` Andrew Morton
2013-07-11 13:23                                                     ` Michal Hocko
2013-07-12  1:42                                                       ` Hugh Dickins
2013-07-13  3:29                                                         ` Dave Chinner
2013-07-15  9:14                                             ` Michal Hocko
2013-06-18  6:26       ` Glauber Costa
2013-06-18  8:25         ` Michal Hocko
2013-06-19  7:13         ` Michal Hocko
2013-06-19  7:35           ` Glauber Costa
2013-06-19  8:52             ` Glauber Costa
2013-06-19 13:57             ` Michal Hocko
2013-06-19 14:02               ` Glauber Costa
2013-06-19 14:28           ` Michal Hocko
2013-06-20 14:11             ` Glauber Costa
2013-06-20 15:12               ` Michal Hocko
2013-06-20 15:16                 ` Michal Hocko
2013-06-21  9:00                 ` Michal Hocko
2013-06-23 11:51                   ` Glauber Costa
2013-06-23 11:55                     ` Glauber Costa
2013-06-25  2:29                     ` Dave Chinner
2013-06-26  8:22                     ` Michal Hocko
2013-06-18  8:19       ` Michal Hocko
2013-06-18  8:21         ` Glauber Costa
2013-06-18  8:26           ` Michal Hocko

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20130628151253.GE5125@dhcp22.suse.cz \
    --to=mhocko@suse.cz \
    --cc=akpm@linux-foundation.org \
    --cc=david@fromorbit.com \
    --cc=glommer@gmail.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox