Date: Wed, 30 Oct 2024 16:10:58 -0700
From: "Paul E. McKenney"
To: Marco Elver
Cc: Vlastimil Babka, linux-next@vger.kernel.org,
	linux-kernel@vger.kernel.org, kasan-dev@googlegroups.com,
	linux-mm@kvack.org, sfr@canb.auug.org.au, bigeasy@linutronix.de,
	longman@redhat.com, boqun.feng@gmail.com, cl@linux.com,
	penberg@kernel.org, rientjes@google.com, iamjoonsoo.kim@lge.com,
	akpm@linux-foundation.org
Subject: Re: [BUG] -next lockdep invalid wait context
Message-ID: <66a745bb-d381-471c-aeee-3800a504f87d@paulmck-laptop>
Reply-To: paulmck@kernel.org
References: <41619255-cdc2-4573-a360-7794fc3614f7@paulmck-laptop>

On Wed, Oct 30, 2024 at 11:34:08PM +0100, Marco Elver wrote:
> On Wed, Oct 30, 2024 at 10:48PM +0100, Vlastimil Babka wrote:
> > On 10/30/24 22:05, Paul E. McKenney wrote:
> > > Hello!
> >
> > Hi!
> >
> > > The next-20241030 release gets the splat shown below when running
> > > scftorture in a preemptible kernel.  This bisects to this commit:
> > >
> > > 560af5dc839e ("lockdep: Enable PROVE_RAW_LOCK_NESTING with PROVE_LOCKING")
> > >
> > > Except that all this is doing is enabling lockdep to find the problem.
> > >
> > > The obvious way to fix this is to make the kmem_cache structure's
> > > cpu_slab field's ->lock be a raw spinlock, but this might not be what
> > > we want for real-time response.
> >
> > But it's a local_lock, not spinlock and it's doing local_lock_irqsave(). I'm
> > confused what's happening here, the code has been like this for years now.
> >
> > > This can be reproduced deterministically as follows:
> > >
> > > tools/testing/selftests/rcutorture/bin/kvm.sh --torture scf --allcpus --duration 2 --configs PREEMPT --kconfig CONFIG_NR_CPUS=64 --memory 7G --trust-make --kasan --bootargs "scftorture.nthreads=64 torture.disable_onoff_at_boot csdlock_debug=1"
> > >
> > > I doubt that the number of CPUs or amount of memory makes any difference,
> > > but that is what I used.
> > >
> > > Thoughts?
> > >
> > > 							Thanx, Paul
> > >
> > > ------------------------------------------------------------------------
> > >
> > > [   35.659746] =============================
> > > [   35.659746] [ BUG: Invalid wait context ]
> > > [   35.659746] 6.12.0-rc5-next-20241029 #57233 Not tainted
> > > [   35.659746] -----------------------------
> > > [   35.659746] swapper/37/0 is trying to lock:
> > > [   35.659746] ffff8881ff4bf2f0 (&c->lock){....}-{3:3}, at: put_cpu_partial+0x49/0x1b0
> > > [   35.659746] other info that might help us debug this:
> > > [   35.659746] context-{2:2}
> > > [   35.659746] no locks held by swapper/37/0.
> > > [   35.659746] stack backtrace:
> > > [   35.659746] CPU: 37 UID: 0 PID: 0 Comm: swapper/37 Not tainted 6.12.0-rc5-next-20241029 #57233
> > > [   35.659746] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
> > > [   35.659746] Call Trace:
> > > [   35.659746]
> > > [   35.659746]  dump_stack_lvl+0x68/0xa0
> > > [   35.659746]  __lock_acquire+0x8fd/0x3b90
> > > [   35.659746]  ? start_secondary+0x113/0x210
> > > [   35.659746]  ? __pfx___lock_acquire+0x10/0x10
> > > [   35.659746]  ? __pfx___lock_acquire+0x10/0x10
> > > [   35.659746]  ? __pfx___lock_acquire+0x10/0x10
> > > [   35.659746]  ? __pfx___lock_acquire+0x10/0x10
> > > [   35.659746]  lock_acquire+0x19b/0x520
> > > [   35.659746]  ? put_cpu_partial+0x49/0x1b0
> > > [   35.659746]  ? __pfx_lock_acquire+0x10/0x10
> > > [   35.659746]  ? __pfx_lock_release+0x10/0x10
> > > [   35.659746]  ? lock_release+0x20f/0x6f0
> > > [   35.659746]  ? __pfx_lock_release+0x10/0x10
> > > [   35.659746]  ? lock_release+0x20f/0x6f0
> > > [   35.659746]  ? kasan_save_track+0x14/0x30
> > > [   35.659746]  put_cpu_partial+0x52/0x1b0
> > > [   35.659746]  ? put_cpu_partial+0x49/0x1b0
> > > [   35.659746]  ? __pfx_scf_handler_1+0x10/0x10
> > > [   35.659746]  __flush_smp_call_function_queue+0x2d2/0x600
> >
> > How did we even get to put_cpu_partial directly from flushing smp calls?
> > SLUB doesn't use them, it uses queue_work_on() for flushing and that
> > flushing doesn't involve put_cpu_partial() AFAIK.
> >
> > I think only slab allocation or free can lead to put_cpu_partial() that
> > would mean the backtrace is missing something. And that somebody does a slab
> > alloc/free from a smp callback, which I'd then assume isn't allowed?
>
> Tail-call optimization is hiding the caller. Compiling with
> -fno-optimize-sibling-calls exposes the caller. This gives the full
> picture:
>
> [   40.321505] =============================
> [   40.322711] [ BUG: Invalid wait context ]
> [   40.323927] 6.12.0-rc5-next-20241030-dirty #4 Not tainted
> [   40.325502] -----------------------------
> [   40.326653] cpuhp/47/253 is trying to lock:
> [   40.327869] ffff8881ff9bf2f0 (&c->lock){....}-{3:3}, at: put_cpu_partial+0x48/0x1a0
> [   40.330081] other info that might help us debug this:
> [   40.331540] context-{2:2}
> [   40.332305] 3 locks held by cpuhp/47/253:
> [   40.333468]  #0: ffffffffae6e6910 (cpu_hotplug_lock){++++}-{0:0}, at: cpuhp_thread_fun+0xe0/0x590
> [   40.336048]  #1: ffffffffae6e9060 (cpuhp_state-down){+.+.}-{0:0}, at: cpuhp_thread_fun+0xe0/0x590
> [   40.338607]  #2: ffff8881002a6948 (&root->kernfs_rwsem){++++}-{4:4}, at: kernfs_remove_by_name_ns+0x78/0x100
> [   40.341454] stack backtrace:
> [   40.342291] CPU: 47 UID: 0 PID: 253 Comm: cpuhp/47 Not tainted 6.12.0-rc5-next-20241030-dirty #4
> [   40.344807] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
> [   40.347482] Call Trace:
> [   40.348199]
> [   40.348827]  dump_stack_lvl+0x6b/0xa0
> [   40.349899]  dump_stack+0x10/0x20
> [   40.350850]  __lock_acquire+0x900/0x4010
> [   40.360290]  lock_acquire+0x191/0x4f0
> [   40.364850]  put_cpu_partial+0x51/0x1a0
> [   40.368341]  scf_handler+0x1bd/0x290
> [   40.370590]  scf_handler_1+0x4e/0xb0
> [   40.371630]  __flush_smp_call_function_queue+0x2dd/0x600
> [   40.373142]  generic_smp_call_function_single_interrupt+0xe/0x20
> [   40.374801]  __sysvec_call_function_single+0x50/0x280
> [   40.376214]  sysvec_call_function_single+0x6c/0x80
> [   40.377543]
> [   40.378142]
>
> And scf_handler does indeed tail-call kfree:
>
>     static void scf_handler(void *scfc_in)
>     {
>     [...]
>         } else {
>             kfree(scfcp);
>         }
>     }

So I need to avoid calling kfree() within an smp_call_function() handler?

							Thanx, Paul
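
For anyone decoding the splats above: the numbers in "{3:3}" and
"context-{2:2}" are lockdep wait types.  What follows is only a minimal
userspace sketch of the comparison that produces "Invalid wait context",
not the kernel's code.  The enum values are meant to mirror enum
lockdep_wait_type as built with CONFIG_PROVE_RAW_LOCK_NESTING=y, and the
helper name wait_context_ok() is made up for illustration.  The real
check also takes the locks already held into account, which this sketch
leaves out.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Assumed to mirror the kernel's enum lockdep_wait_type when
 * CONFIG_PROVE_RAW_LOCK_NESTING=y.  Lockdep prints these values as the
 * {outer:inner} pair after each lock name.
 */
enum wait_type {
	LD_WAIT_INV = 0,	/* not checked, e.g. cpu_hotplug_lock {0:0} */
	LD_WAIT_FREE,		/* 1: wait-free */
	LD_WAIT_SPIN,		/* 2: raw_spinlock_t and other spin loops */
	LD_WAIT_CONFIG,		/* 3: spinlock_t/local_lock_t, may sleep on PREEMPT_RT */
	LD_WAIT_SLEEP,		/* 4: sleeping locks such as mutexes and rwsems */
};

/*
 * Hypothetical helper: return true if a lock whose inner wait type is
 * lock_inner may be acquired in a context that tolerates at most ctx.
 * Simplification: locks already held by the task are ignored.
 */
static bool wait_context_ok(enum wait_type ctx, enum wait_type lock_inner)
{
	assert(ctx <= LD_WAIT_SLEEP && lock_inner <= LD_WAIT_SLEEP);

	/* LD_WAIT_INV is 0, so unchecked lock classes always pass. */
	return lock_inner <= ctx;
}
```

With these values, wait_context_ok(LD_WAIT_SPIN, LD_WAIT_CONFIG) is
false, which is the case in both reports: &c->lock is {3:3} and the
smp_call_function() handler runs in context-{2:2}.  The same call with
LD_WAIT_SPIN as the lock type would pass, which is why making ->lock a
raw spinlock would silence the report.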