From: Marco Elver <elver@google.com>
To: Vlastimil Babka <vbabka@suse.cz>
Cc: paulmck@kernel.org, linux-next@vger.kernel.org,
linux-kernel@vger.kernel.org, kasan-dev@googlegroups.com,
linux-mm@kvack.org, sfr@canb.auug.org.au, bigeasy@linutronix.de,
longman@redhat.com, boqun.feng@gmail.com, cl@linux.com,
penberg@kernel.org, rientjes@google.com, iamjoonsoo.kim@lge.com,
akpm@linux-foundation.org
Subject: Re: [BUG] -next lockdep invalid wait context
Date: Wed, 30 Oct 2024 23:34:08 +0100 [thread overview]
Message-ID: <ZyK0YPgtWExT4deh@elver.google.com>
In-Reply-To: <e06d69c9-f067-45c6-b604-fd340c3bd612@suse.cz>
On Wed, Oct 30, 2024 at 10:48PM +0100, Vlastimil Babka wrote:
> On 10/30/24 22:05, Paul E. McKenney wrote:
> > Hello!
>
> Hi!
>
> > The next-20241030 release gets the splat shown below when running
> > scftorture in a preemptible kernel. This bisects to this commit:
> >
> > 560af5dc839e ("lockdep: Enable PROVE_RAW_LOCK_NESTING with PROVE_LOCKING")
> >
> > Except that all this is doing is enabling lockdep to find the problem.
> >
> > The obvious way to fix this is to make the kmem_cache structure's
> > cpu_slab field's ->lock be a raw spinlock, but this might not be what
> > we want for real-time response.
>
> But it's a local_lock, not a spinlock, and it's doing local_lock_irqsave(). I'm
> confused about what's happening here; the code has been like this for years now.
>
> > This can be reproduced deterministically as follows:
> >
> > tools/testing/selftests/rcutorture/bin/kvm.sh --torture scf --allcpus --duration 2 --configs PREEMPT --kconfig CONFIG_NR_CPUS=64 --memory 7G --trust-make --kasan --bootargs "scftorture.nthreads=64 torture.disable_onoff_at_boot csdlock_debug=1"
> >
> > I doubt that the number of CPUs or amount of memory makes any difference,
> > but that is what I used.
> >
> > Thoughts?
> >
> > Thanx, Paul
> >
> > ------------------------------------------------------------------------
> >
> > [ 35.659746] =============================
> > [ 35.659746] [ BUG: Invalid wait context ]
> > [ 35.659746] 6.12.0-rc5-next-20241029 #57233 Not tainted
> > [ 35.659746] -----------------------------
> > [ 35.659746] swapper/37/0 is trying to lock:
> > [ 35.659746] ffff8881ff4bf2f0 (&c->lock){....}-{3:3}, at: put_cpu_partial+0x49/0x1b0
> > [ 35.659746] other info that might help us debug this:
> > [ 35.659746] context-{2:2}
> > [ 35.659746] no locks held by swapper/37/0.
> > [ 35.659746] stack backtrace:
> > [ 35.659746] CPU: 37 UID: 0 PID: 0 Comm: swapper/37 Not tainted 6.12.0-rc5-next-20241029 #57233
> > [ 35.659746] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
> > [ 35.659746] Call Trace:
> > [ 35.659746] <IRQ>
> > [ 35.659746] dump_stack_lvl+0x68/0xa0
> > [ 35.659746] __lock_acquire+0x8fd/0x3b90
> > [ 35.659746] ? start_secondary+0x113/0x210
> > [ 35.659746] ? __pfx___lock_acquire+0x10/0x10
> > [ 35.659746] ? __pfx___lock_acquire+0x10/0x10
> > [ 35.659746] ? __pfx___lock_acquire+0x10/0x10
> > [ 35.659746] ? __pfx___lock_acquire+0x10/0x10
> > [ 35.659746] lock_acquire+0x19b/0x520
> > [ 35.659746] ? put_cpu_partial+0x49/0x1b0
> > [ 35.659746] ? __pfx_lock_acquire+0x10/0x10
> > [ 35.659746] ? __pfx_lock_release+0x10/0x10
> > [ 35.659746] ? lock_release+0x20f/0x6f0
> > [ 35.659746] ? __pfx_lock_release+0x10/0x10
> > [ 35.659746] ? lock_release+0x20f/0x6f0
> > [ 35.659746] ? kasan_save_track+0x14/0x30
> > [ 35.659746] put_cpu_partial+0x52/0x1b0
> > [ 35.659746] ? put_cpu_partial+0x49/0x1b0
> > [ 35.659746] ? __pfx_scf_handler_1+0x10/0x10
> > [ 35.659746] __flush_smp_call_function_queue+0x2d2/0x600
>
> How did we even get to put_cpu_partial() directly from flushing SMP calls?
> SLUB doesn't use them; it uses queue_work_on() for flushing, and that
> flushing doesn't involve put_cpu_partial() AFAIK.
>
> I think only a slab allocation or free can lead to put_cpu_partial(), which
> would mean the backtrace is missing something, and that somebody does a slab
> alloc/free from an SMP callback, which I'd then assume isn't allowed?
Tail-call optimization is hiding the caller. Compiling with
-fno-optimize-sibling-calls exposes it and gives the full picture:
[ 40.321505] =============================
[ 40.322711] [ BUG: Invalid wait context ]
[ 40.323927] 6.12.0-rc5-next-20241030-dirty #4 Not tainted
[ 40.325502] -----------------------------
[ 40.326653] cpuhp/47/253 is trying to lock:
[ 40.327869] ffff8881ff9bf2f0 (&c->lock){....}-{3:3}, at: put_cpu_partial+0x48/0x1a0
[ 40.330081] other info that might help us debug this:
[ 40.331540] context-{2:2}
[ 40.332305] 3 locks held by cpuhp/47/253:
[ 40.333468] #0: ffffffffae6e6910 (cpu_hotplug_lock){++++}-{0:0}, at: cpuhp_thread_fun+0xe0/0x590
[ 40.336048] #1: ffffffffae6e9060 (cpuhp_state-down){+.+.}-{0:0}, at: cpuhp_thread_fun+0xe0/0x590
[ 40.338607] #2: ffff8881002a6948 (&root->kernfs_rwsem){++++}-{4:4}, at: kernfs_remove_by_name_ns+0x78/0x100
[ 40.341454] stack backtrace:
[ 40.342291] CPU: 47 UID: 0 PID: 253 Comm: cpuhp/47 Not tainted 6.12.0-rc5-next-20241030-dirty #4
[ 40.344807] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
[ 40.347482] Call Trace:
[ 40.348199] <IRQ>
[ 40.348827] dump_stack_lvl+0x6b/0xa0
[ 40.349899] dump_stack+0x10/0x20
[ 40.350850] __lock_acquire+0x900/0x4010
[ 40.360290] lock_acquire+0x191/0x4f0
[ 40.364850] put_cpu_partial+0x51/0x1a0
[ 40.368341] scf_handler+0x1bd/0x290
[ 40.370590] scf_handler_1+0x4e/0xb0
[ 40.371630] __flush_smp_call_function_queue+0x2dd/0x600
[ 40.373142] generic_smp_call_function_single_interrupt+0xe/0x20
[ 40.374801] __sysvec_call_function_single+0x50/0x280
[ 40.376214] sysvec_call_function_single+0x6c/0x80
[ 40.377543] </IRQ>
[ 40.378142] <TASK>
And scf_handler does indeed tail-call kfree:
        static void scf_handler(void *scfc_in)
        {
                [...]
                } else {
                        kfree(scfcp);
                }
        }
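
For context, the lock in the splat (&c->lock) is the SLUB per-CPU cpu_slab
local_lock that put_cpu_partial() takes; with PROVE_RAW_LOCK_NESTING it may
not be acquired from hardirq context, which is exactly where a kfree() from
an SMP-call handler ends up. The follow-up patches in this thread move the
kfree() out of the handler. Below is a minimal sketch of the lock-less-list
approach; the names scf_free_list, scf_free_work, scf_free_worker and the
scf_llist member are made up for illustration and are not necessarily what
the actual patch uses:

        /* Sketch only: defer the kfree() to process context instead of doing
         * it from the SMP-call (hardirq) handler.  Identifiers here are
         * illustrative; see the scftorture patches later in this thread for
         * the real thing.
         */
        #include <linux/llist.h>
        #include <linux/slab.h>
        #include <linux/workqueue.h>

        struct scf_check {
                /* ... existing scftorture fields ... */
                struct llist_node scf_llist;    /* assumed extra member for deferral */
        };

        static LLIST_HEAD(scf_free_list);

        static void scf_free_worker(struct work_struct *work)
        {
                struct llist_node *first = llist_del_all(&scf_free_list);
                struct scf_check *scfcp, *next;

                /* Runs in process context, where taking the slab local_lock is fine. */
                llist_for_each_entry_safe(scfcp, next, first, scf_llist)
                        kfree(scfcp);
        }
        static DECLARE_WORK(scf_free_work, scf_free_worker);

        static void scf_handler(void *scfc_in)
        {
                struct scf_check *scfcp = scfc_in;

                /* ... */
                /* Instead of kfree(scfcp) here in hardirq context: */
                if (llist_add(&scfcp->scf_llist, &scf_free_list))
                        schedule_work(&scf_free_work);  /* safe from hardirq */
        }

An irq_work, or handing the object back to the torture thread, would work
just as well; the point is only that the slab free must happen in a context
where the cpu_slab local_lock may be taken.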
Thread overview: 23+ messages
2024-10-30 21:05 Paul E. McKenney
2024-10-30 21:48 ` Vlastimil Babka
2024-10-30 22:34 ` Marco Elver [this message]
2024-10-30 23:04 ` Boqun Feng
2024-10-30 23:10 ` Paul E. McKenney
2024-10-31 7:21 ` Sebastian Andrzej Siewior
2024-10-31 7:35 ` Vlastimil Babka
2024-10-31 7:55 ` Sebastian Andrzej Siewior
2024-10-31 8:18 ` Vlastimil Babka
2024-11-01 17:14 ` Paul E. McKenney
2024-10-31 17:50 ` Paul E. McKenney
2024-11-01 19:50 ` Boqun Feng
2024-11-01 19:54 ` [PATCH] scftorture: Use workqueue to free scf_check Boqun Feng
2024-11-01 23:35 ` Paul E. McKenney
2024-11-03 3:35 ` Boqun Feng
2024-11-03 15:03 ` Paul E. McKenney
2024-11-04 10:50 ` [PATCH 1/2] scftorture: Move memory allocation outside of preempt_disable region Sebastian Andrzej Siewior
2024-11-04 10:50 ` [PATCH 2/2] scftorture: Use a lock-less list to free memory Sebastian Andrzej Siewior
2024-11-05 1:00 ` Boqun Feng
2024-11-07 11:21 ` Sebastian Andrzej Siewior
2024-11-07 14:08 ` Paul E. McKenney
2024-11-07 14:43 ` Sebastian Andrzej Siewior
2024-11-07 14:59 ` Paul E. McKenney