From: Dave Chinner <david@fromorbit.com>
To: Glauber Costa <glommer@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
Michal Hocko <mhocko@suse.cz>,
linux-mm@kvack.org, LKML <linux-kernel@vger.kernel.org>
Subject: Re: linux-next: slab shrinkers: BUG at mm/list_lru.c:92
Date: Tue, 18 Jun 2013 12:46:23 +1000 [thread overview]
Message-ID: <20130618024623.GP29338@dastard> (raw)
In-Reply-To: <20130617223004.GB2538@localhost.localdomain>
On Tue, Jun 18, 2013 at 02:30:05AM +0400, Glauber Costa wrote:
> On Mon, Jun 17, 2013 at 02:35:08PM -0700, Andrew Morton wrote:
> > On Mon, 17 Jun 2013 19:14:12 +0400 Glauber Costa <glommer@gmail.com> wrote:
> >
> > > > I managed to trigger:
> > > > [ 1015.776029] kernel BUG at mm/list_lru.c:92!
> > > > [ 1015.776029] invalid opcode: 0000 [#1] SMP
> > > > with Linux next (next-20130607) with https://lkml.org/lkml/2013/6/17/203
> > > > on top.
> > > >
> > > > This is obviously BUG_ON(nlru->nr_items < 0) and
> > > > ffffffff81122d0b: 48 85 c0 test %rax,%rax
> > > > ffffffff81122d0e: 49 89 44 24 18 mov %rax,0x18(%r12)
> > > > ffffffff81122d13: 0f 84 87 00 00 00 je ffffffff81122da0 <list_lru_walk_node+0x110>
> > > > ffffffff81122d19: 49 83 7c 24 18 00 cmpq $0x0,0x18(%r12)
> > > > ffffffff81122d1f: 78 7b js ffffffff81122d9c <list_lru_walk_node+0x10c>
> > > > [...]
> > > > ffffffff81122d9c: 0f 0b ud2
> > > >
> > > > RAX is -1UL.
> > > Yes, fearing those kind of imbalances, we decided to leave the counter as a signed quantity
> > > and BUG, instead of an unsigned quantity.
> > >
> > > >
> > > > I assume that the current backtrace is of no use and it would most
> > > > probably be some shrinker which doesn't behave.
> > > >
> > > There are currently 3 users of list_lru in tree: dentries, inodes and xfs.
> > > Assuming you are not using xfs, we are left with dentries and inodes.
> > >
> > > The first thing to do is to find which one of them is misbehaving. You can try finding
> > > this out by the address of the list_lru, and where it lays in the superblock.
> > >
> > > Once we know each of them is misbehaving, then we'll have to figure out why.
> >
> > The trace says shrink_slab_node->super_cache_scan->prune_icache_sb. So
> > it's inodes?
> >
> Assuming there is no memory corruption of any sort going on , let's check the code.
> nr_item is only manipulated in 3 places:
>
> 1) list_lru_add, where it is increased
> 2) list_lru_del, where it is decreased in case the user have voluntarily removed the
> element from the list
> 3) list_lru_walk_node, where an element is removing during shrink.
>
> All three excerpts seem to be correctly locked, so something like this indicates an imbalance.
inode_lru_isolate() looks suspicious to me:
WARN_ON(inode->i_state & I_NEW);
inode->i_state |= I_FREEING;
spin_unlock(&inode->i_lock);
list_move(&inode->i_lru, freeable);
this_cpu_dec(nr_unused);
return LRU_REMOVED;
}
All the other cases where I_FREEING is set and the inode is removed
from the LRU are completely done under the inode->i_lock. i.e. from
an external POV, the state change to I_FREEING and removal from LRU
are supposed to be atomic, but they are not here.
I'm not sure this is the source of the problem, but it definitely
needs fixing.
> callers:
> iput_final, evict_inodes, invalidate_inodes.
> Both evict_inodes and invalidate_inodes will do the following pattern:
>
> inode->i_state |= I_FREEING;
> inode_lru_list_del(inode);
> spin_unlock(&inode->i_lock);
> list_add(&inode->i_lru, &dispose);
>
> IOW, they will remove the element from the LRU, and add it to the dispose list.
> Both of them will also bail out if they see I_FREEING already set, so they are safe
> against each other - because the flag is manipulated inside the lock.
>
> But how about iput_final? It seems to me that if we are calling iput_final at the
> same time as the other two, this *could* happen (maybe there is some extra protection
> that can be seen from Australia but not from here. Dave?)
If I_FREEING is set before we enter iput_final(), then something
else is screwed up. I_FREEING is only set once the last reference
has gone away and we are killing the inode. All the other callers
that set I_FREEING check that the reference count on the inode is
zero before they set I_FREEING. Hence I_FREEING cannot be set on the
transition of i_count from 1 to 0 when iput_final() is called. So
the patch won't do anything to avoid the problem being seen.
Keep in mind that we this is actually a new warning on the count of
inodes on the LRU - we never had a check that it didn't go negative
before....
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2013-06-18 2:46 UTC|newest]
Thread overview: 67+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-06-17 14:18 Michal Hocko
2013-06-17 15:14 ` Glauber Costa
2013-06-17 15:33 ` Michal Hocko
2013-06-17 16:54 ` Glauber Costa
2013-06-18 7:42 ` Michal Hocko
2013-06-17 21:35 ` Andrew Morton
2013-06-17 22:30 ` Glauber Costa
2013-06-18 2:46 ` Dave Chinner [this message]
2013-06-18 6:31 ` Glauber Costa
2013-06-18 8:24 ` Michal Hocko
2013-06-18 10:44 ` Michal Hocko
2013-06-18 13:50 ` Michal Hocko
2013-06-25 2:27 ` Dave Chinner
2013-06-26 8:15 ` Michal Hocko
2013-06-26 23:24 ` Dave Chinner
2013-06-27 14:54 ` Michal Hocko
2013-06-28 8:39 ` Michal Hocko
2013-06-28 14:31 ` Glauber Costa
2013-06-28 15:12 ` Michal Hocko
2013-06-29 2:55 ` Dave Chinner
2013-06-30 18:33 ` Michal Hocko
2013-07-01 1:25 ` Dave Chinner
2013-07-01 7:50 ` Michal Hocko
2013-07-01 8:10 ` Dave Chinner
2013-07-02 9:22 ` Michal Hocko
2013-07-02 12:19 ` Dave Chinner
2013-07-02 12:44 ` Michal Hocko
2013-07-03 11:24 ` Dave Chinner
2013-07-03 14:08 ` Glauber Costa
2013-07-04 16:36 ` Michal Hocko
2013-07-08 12:53 ` Michal Hocko
2013-07-08 21:04 ` Andrew Morton
2013-07-09 17:34 ` Glauber Costa
2013-07-09 17:51 ` Andrew Morton
2013-07-09 17:32 ` Glauber Costa
2013-07-09 17:50 ` Andrew Morton
2013-07-09 17:57 ` Glauber Costa
2013-07-09 17:57 ` Michal Hocko
2013-07-09 21:39 ` Andrew Morton
2013-07-10 2:31 ` Dave Chinner
2013-07-10 7:34 ` Michal Hocko
2013-07-10 8:06 ` Michal Hocko
2013-07-11 2:26 ` Dave Chinner
2013-07-11 3:03 ` Andrew Morton
2013-07-11 13:23 ` Michal Hocko
2013-07-12 1:42 ` Hugh Dickins
2013-07-13 3:29 ` Dave Chinner
2013-07-15 9:14 ` Michal Hocko
2013-06-18 6:26 ` Glauber Costa
2013-06-18 8:25 ` Michal Hocko
2013-06-19 7:13 ` Michal Hocko
2013-06-19 7:35 ` Glauber Costa
2013-06-19 8:52 ` Glauber Costa
2013-06-19 13:57 ` Michal Hocko
2013-06-19 14:02 ` Glauber Costa
2013-06-19 14:28 ` Michal Hocko
2013-06-20 14:11 ` Glauber Costa
2013-06-20 15:12 ` Michal Hocko
2013-06-20 15:16 ` Michal Hocko
2013-06-21 9:00 ` Michal Hocko
2013-06-23 11:51 ` Glauber Costa
2013-06-23 11:55 ` Glauber Costa
2013-06-25 2:29 ` Dave Chinner
2013-06-26 8:22 ` Michal Hocko
2013-06-18 8:19 ` Michal Hocko
2013-06-18 8:21 ` Glauber Costa
2013-06-18 8:26 ` Michal Hocko
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20130618024623.GP29338@dastard \
--to=david@fromorbit.com \
--cc=akpm@linux-foundation.org \
--cc=glommer@gmail.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mhocko@suse.cz \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox