From: Chuck Lever III <chuck.lever@oracle.com>
To: Jeff Layton <jlayton@kernel.org>
Cc: kernel test robot <yujie.liu@intel.com>,
"oe-lkp@lists.linux.dev" <oe-lkp@lists.linux.dev>,
kernel test robot <lkp@intel.com>,
Linux Memory Management List <linux-mm@kvack.org>
Subject: Re: [cel:topic-shmem-stable-dir-cookies] [shmem] 5fd403eb6c: WARNING:inconsistent_lock_state
Date: Wed, 12 Apr 2023 21:00:02 +0000
Message-ID: <6B05671F-7E27-4820-A59C-02DC9EFD8A80@oracle.com>
In-Reply-To: <bcd5eb8bd0ae863025ced85f9d0a76eb40fa64e7.camel@kernel.org>
> On Apr 12, 2023, at 3:34 PM, Jeff Layton <jlayton@kernel.org> wrote:
>
> On Wed, 2023-04-12 at 19:09 +0000, Chuck Lever III wrote:
>>
>>> On Apr 12, 2023, at 3:05 PM, Jeff Layton <jlayton@kernel.org> wrote:
>>>
>>> On Wed, 2023-04-12 at 18:03 +0000, Chuck Lever III wrote:
>>>>
>>>>
>>>>> On Apr 10, 2023, at 8:36 PM, kernel test robot <yujie.liu@intel.com>
>>>>> wrote:
>>>>>
>>>>> Hello,
>>>>>
>>>>> kernel test robot noticed "WARNING:inconsistent_lock_state" on:
>>>>>
>>>>> commit: 5fd403eb6c181c63a3aacd55d92b80256a0670cf ("shmem: stable directory cookies")
>>>>> git://git.kernel.org/cgit/linux/kernel/git/cel/linux topic-shmem-stable-dir-cookies
>>>>>
>>>>> in testcase: boot
>>>>>
>>>>> compiler: gcc-11
>>>>> test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G
>>>>>
>>>>> (please refer to attached dmesg/kmsg for entire log/backtrace)
>>>>>
>>>>>
>>>>> If you fix the issue, kindly add following tag
>>>>> | Reported-by: kernel test robot <yujie.liu@intel.com>
>>>>> | Link: https://lore.kernel.org/oe-lkp/202304101606.79aea62f-yujie.liu@intel.com
>>>>>
>>>>>
>>>>> [ 21.279213][ C0] WARNING: inconsistent lock state
>>>>> [ 21.279668][ C0] 6.3.0-rc5-00001-g5fd403eb6c18 #1 Not tainted
>>>>> [ 21.280199][ C0] --------------------------------
>>>>> [ 21.280657][ C0] inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
>>>>> [ 21.281238][ C0] swapper/0/0 [HC0[0]:SC1[1]:HE0:SE0] takes:
>>>>> [ 21.281773][ C0] ffff8881102e9b50 (&xa->xa_lock#3){+.?.}-{2:2}, at: xa_destroy (lib/xarray.c:2214)
>>>>> [ 21.283140][ C0] {SOFTIRQ-ON-W} state was registered at:
>>>>> [ 21.283640][ C0] __lock_acquire (kernel/locking/lockdep.c:5010)
>>>>> [ 21.284089][ C0] lock_acquire (kernel/locking/lockdep.c:467 kernel/locking/lockdep.c:5671 kernel/locking/lockdep.c:5634)
>>>>> [ 21.284513][ C0] _raw_spin_lock (include/linux/spinlock_api_smp.h:134 kernel/locking/spinlock.c:154)
>>>>> [ 21.284937][ C0] shmem_doff_add (include/linux/xarray.h:965 mm/shmem.c:2943)
>>>>> [ 21.285375][ C0] shmem_mknod (mm/shmem.c:3014)
>>>>> [ 21.285791][ C0] vfs_mknod (fs/namei.c:3916)
>>>>> [ 21.286195][ C0] devtmpfs_work_loop (drivers/base/devtmpfs.c:228 drivers/base/devtmpfs.c:393 drivers/base/devtmpfs.c:408)
>>>>> [ 21.286653][ C0] devtmpfsd (devtmpfs.c:?)
>>>>> [ 21.287046][ C0] kthread (kernel/kthread.c:376)
>>>>> [ 21.287441][ C0] ret_from_fork (arch/x86/entry/entry_64.S:314)
>>>>> [ 21.287864][ C0] irq event stamp: 167451
>>>>> [ 21.288264][ C0] hardirqs last enabled at (167450): kasan_quarantine_put (arch/x86/include/asm/irqflags.h:42 (discriminator 1) arch/x86/include/asm/irqflags.h:77 (discriminator 1) arch/x86/include/asm/irqflags.h:135 (discriminator 1) mm/kasan/quarantine.c:242 (discriminator 1))
>>>>> [ 21.289095][ C0] hardirqs last disabled at (167451): _raw_spin_lock_irqsave (include/linux/spinlock_api_smp.h:108 kernel/locking/spinlock.c:162)
>>>>> [ 21.289969][ C0] softirqs last enabled at (167330): __do_softirq (kernel/softirq.c:415 kernel/softirq.c:600)
>>>>> [ 21.290755][ C0] softirqs last disabled at (167355): irq_exit_rcu (kernel/softirq.c:445 kernel/softirq.c:650 kernel/softirq.c:640 kernel/softirq.c:662)
>>>>> [ 21.291540][ C0]
>>>>> [ 21.291540][ C0] other info that might help us debug this:
>>>>> [ 21.292230][ C0] Possible unsafe locking scenario:
>>>>> [ 21.292230][ C0]
>>>>> [ 21.292905][ C0] CPU0
>>>>> [ 21.293235][ C0] ----
>>>>> [ 21.293575][ C0] lock(&xa->xa_lock#3);
>>>>> [ 21.293987][ C0] <Interrupt>
>>>>> [ 21.294327][ C0] lock(&xa->xa_lock#3);
>>>>> [ 21.294753][ C0]
>>>>> [ 21.294753][ C0] *** DEADLOCK ***
>>>>> [ 21.294753][ C0]
>>>>> [ 21.295483][ C0] 1 lock held by swapper/0/0:
>>>>> [ 21.295914][ C0] #0: ffffffff8597a260 (rcu_callback){....}-{0:0}, at: rcu_do_batch (kernel/rcu/tree.c:2104)
>>>>
>>>> It appears that RCU is trying to evict a tmpfs directory inode
>>>> prematurely. lockdep catches this because someone else is trying to
>>>> add an entry to it while RCU is trying to free it. Classic
>>>> use-after-free.
>>>>
>>>> Jeff, the only new iput() in this patch is the one you suggested in
>>>> shmem_symlink(). Are you sure it is needed (and also correct)?
>>>>
>>>
>>> The code in your topic-shmem-stable-dir-cookies branch looks correct to
>>> me. After shmem_get_inode, it holds an inode reference and that must be
>>> explicitly put on error, unless you attach it to the dentry (via
>>> d_instantiate).
>>>
>>> I'm not sure how to interpret this. The log is a bit of a mess. It
>>> looks like it ended up in some sort of recursive call into the same
>>> xarray due to an interrupt?
>>
>> I think it's easier to see if you look at the dmesg.xz that was
>> attached to the original report.
>>
>> The thing calling xa_destroy is being invoked from i_callback,
>> which is the RCU-deferred "inode destroy" method. It's running
>> in softIRQ context.
>>
>
> Right, but why is it trying to add an entry to an xarray that is being
> destroyed?
I think it's because the directory inode's reference count is too low,
and it's being destroyed too early (i.e., while there are still
legitimate users).
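
For reference, the create side follows the rule you described: hold the
reference that shmem_get_inode() returns, and either iput() it on error
or hand it to the dentry via d_instantiate(). A simplified sketch of
shmem_mknod() from my branch (ACL and security init elided, and the
shmem_doff_add() signature is approximate):

static int shmem_mknod(struct mnt_idmap *idmap, struct inode *dir,
		       struct dentry *dentry, umode_t mode, dev_t dev)
{
	struct inode *inode;
	int error;

	inode = shmem_get_inode(dir->i_sb, dir, mode, dev, VM_NORESERVE);
	if (!inode)
		return -ENOSPC;

	/* the new step: insert a stable directory cookie for this entry */
	error = shmem_doff_add(dir, dentry);	/* takes the directory's xa_lock */
	if (error)
		goto out_iput;

	dir->i_size += BOGO_DIRENT_SIZE;
	d_instantiate(dentry, inode);	/* the dentry now owns the reference */
	dget(dentry);			/* extra count - pin the dentry in core */
	return 0;

out_iput:
	iput(inode);	/* drop the reference shmem_get_inode() handed us */
	return error;
}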
> Or maybe it isn't, and lockdep is just confused and is
> classifying the various per-inode xarrays together? I have a hard time
> interpreting these reports sometimes. :-/
>
>>
>>> One thing that looks suspicious to me is that this patch has the call to
>>> shmem_doff_map_destroy in free_inode (which is the RCU callback). I
>>> think you probably want to do that in destroy_inode instead since that
>>> involves taking locks and such.
>>
>> I'll have a look!
>>
>
> Cool, I think that's probably safest here. In principle, the xarray
> should be empty when we get to this point so there ought not be much to
> do anyway.
I expected shmem_free_in_core_inode() to be invoked after
shmem_destroy_inode(), once everything had been completely torn
down.
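
If I'm reading your suggestion right, the change would look roughly
like this (an untested sketch; shmem_doff_map_destroy() is the helper
from my topic branch, so treat the exact names as approximate):

/* ->destroy_inode runs in process context, before the RCU grace period */
static void shmem_destroy_inode(struct inode *inode)
{
	if (S_ISREG(inode->i_mode))
		mpol_free_shared_policy(&SHMEM_I(inode)->policy);
	if (S_ISDIR(inode->i_mode))
		shmem_doff_map_destroy(inode);	/* xa_destroy() moves here... */
}

/* ->free_inode is the RCU callback and runs in softirq context */
static void shmem_free_in_core_inode(struct inode *inode)
{
	struct shmem_inode_info *info = SHMEM_I(inode);

	if (S_ISLNK(inode->i_mode))
		kfree(inode->i_link);
	/* ...and out of here, where lockdep saw xa_lock taken in softirq */
	kmem_cache_free(shmem_inode_cachep, info);
}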
I'm shooting in the dark with the fix, though. I can't make heads
or tails of the job file that was attached, so I don't have a
reproducer.
--
Chuck Lever