To: Thomas Gleixner, Zqiang, Kees Cook, Stephen Boyd, Tejun Heo,
 Lai Jiangshan, Peter Zijlstra, Ingo Molnar, Will Deacon, Waiman Long,
 Boqun Feng, linux-mm
From: Tetsuo Handa
Subject: [5.14] problem with dynamic memory allocation for object debugging
Date: Sun, 15 Aug 2021 19:23:03 +0900
Hello.

I'm frequently hitting "BUG: Invalid wait context", and it seems that
this problem is not fixed yet.

This is a conflict between spinlock and raw_spinlock.
check_wait_context() complains if we try to take a spinlock while we
already hold a raw_spinlock. raw_spinlock is preferred for debugging
code, while normal spinlock is preferred for normal code. But sometimes
debugging code wants to allocate memory.

fill_pool() (a function in the debugobjects code) uses a raw_spinlock
but does kmalloc(GFP_ATOMIC | __GFP_NORETRY | __GFP_NOWARN).
fill_pool() can work as long as it is called without raw_spinlocks
held. However, functions in the workqueue code (e.g. __queue_work())
also use raw_spinlocks, and trigger fill_pool() via the
debug_object_activate() call. If fill_pool() hits the kmalloc() path,
lockdep complains, because get_partial_node() in the SLUB allocator
takes a normal spinlock (n->list_lock). A minimal reproducer sketch of
this nesting rule follows after the splat below.

[ 17.726419][ T41]
[ 17.727873][ T41] =============================
[ 17.730788][ T41] [ BUG: Invalid wait context ]
[ 17.733519][ T41] 5.14.0-rc5+ #742 Not tainted
[ 17.735111][ T41] -----------------------------
[ 17.736747][ T41] kworker/2:1/41 is trying to lock:
[ 17.738468][ T41] ffff888005854d18 (&n->list_lock){-.-.}-{3:3}, at: get_partial_node.isra.0.part.0+0x48/0x4c0
[ 17.741896][ T41] other info that might help us debug this:
[ 17.743970][ T41] context-{5:5}
[ 17.745132][ T41] 3 locks held by kworker/2:1/41:
[ 17.746798][ T41] #0: ffff888005871548 ((wq_completion)events){+.+.}-{0:0}, at: process_one_work+0x21e/0x600
[ 17.750250][ T41] #1: ffffc90000823e58 ((work_completion)(&(&krcp->monitor_work)->work)){+.+.}-{0:0}, at: process_one_work+0x21e/0x600
[ 17.754416][ T41] #2: ffff88807cfebe40 (krc.lock){....}-{2:2}, at: kfree_rcu_monitor+0x1d/0x150
[ 17.757302][ T52] cfg80211: failed to load regulatory.db
[ 17.757462][ T41] stack backtrace:
[ 17.757464][ T41] CPU: 2 PID: 41 Comm: kworker/2:1 Not tainted 5.14.0-rc5+ #742
[ 17.757466][ T41] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 02/27/2020
[ 17.757468][ T41] Workqueue: events kfree_rcu_monitor
[ 17.757474][ T41] Call Trace:
[ 17.757476][ T41]  dump_stack_lvl+0x57/0x7d
[ 17.757481][ T41]  __lock_acquire.cold+0x213/0x2f2
[ 17.757486][ T41]  lock_acquire+0xd0/0x300
[ 17.757490][ T41]  ? get_partial_node.isra.0.part.0+0x48/0x4c0
[ 17.757494][ T41]  ? __bfs+0x10a/0x240
[ 17.757496][ T41]  _raw_spin_lock+0x2c/0x70
[ 17.757500][ T41]  ? get_partial_node.isra.0.part.0+0x48/0x4c0
[ 17.757503][ T41]  get_partial_node.isra.0.part.0+0x48/0x4c0
[ 17.757506][ T41]  ? ret_from_fork+0x1f/0x30
[ 17.757510][ T41]  ___slab_alloc+0x4a0/0x9c0
[ 17.757513][ T41]  ? fill_pool+0x14d/0x220
[ 17.757517][ T41]  ? check_prev_add+0x32f/0xc00
[ 17.757519][ T41]  ? __slab_alloc.isra.0+0x22/0x80
[ 17.757522][ T41]  __slab_alloc.isra.0+0x22/0x80
[ 17.757524][ T41]  kmem_cache_alloc+0x334/0x360
[ 17.757527][ T41]  ? fill_pool+0x14d/0x220
[ 17.757530][ T41]  fill_pool+0x14d/0x220
[ 17.757533][ T41]  ? debug_object_activate+0xb2/0x210
[ 17.757536][ T41]  ? find_held_lock+0x2b/0x80
[ 17.757537][ T41]  __debug_object_init+0x2e/0x3a0
[ 17.757540][ T41]  debug_object_activate+0x12a/0x210
[ 17.757543][ T41]  ? delayed_work_timer_fn+0x10/0x10
[ 17.757547][ T41]  call_rcu+0x2c/0x330
[ 17.757550][ T41]  queue_rcu_work+0x22/0x30
[ 17.757553][ T41]  kfree_rcu_monitor+0xff/0x150
[ 17.757555][ T41]  process_one_work+0x2aa/0x600
[ 17.757559][ T41]  worker_thread+0x48/0x3d0
[ 17.757562][ T41]  ? process_one_work+0x600/0x600
[ 17.757564][ T41]  kthread+0x13e/0x170
[ 17.757567][ T41]  ? set_kthread_struct+0x40/0x40
[ 17.757570][ T41]  ret_from_fork+0x1f/0x30
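For reference, the rule that check_wait_context() enforces is easy to
reproduce in isolation. Below is a minimal, untested module sketch
(both lock names are made up for illustration; outer_raw_lock plays the
role of workqueue's pool->lock and inner_lock the role of SLUB's
n->list_lock) that triggers the same splat when
CONFIG_PROVE_RAW_LOCK_NESTING=y:

#include <linux/module.h>
#include <linux/spinlock.h>

static DEFINE_RAW_SPINLOCK(outer_raw_lock); /* cf. workqueue's pool->lock */
static DEFINE_SPINLOCK(inner_lock);         /* cf. SLUB's n->list_lock */

static int __init nesting_demo_init(void)
{
        raw_spin_lock(&outer_raw_lock);
        /*
         * Taking a spinlock_t (a sleeping lock on PREEMPT_RT) while a
         * raw_spinlock_t is held is exactly what check_wait_context()
         * reports as "BUG: Invalid wait context".
         */
        spin_lock(&inner_lock);
        spin_unlock(&inner_lock);
        raw_spin_unlock(&outer_raw_lock);
        return 0;
}
module_init(nesting_demo_init);
MODULE_LICENSE("GPL");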
The question is how to solve this problem.

Can all locks held in the atomic memory allocation path be updated to
use raw_spinlock? If yes, that would be nice.

Can the debugobjects code preallocate memory asynchronously, so that
debug_object_activate() can be called from any context? (A rough sketch
of this idea follows below.)

Can the workqueue code stop using raw_spinlocks (e.g. by reverting
commit a9b8a985294debae ("workqueue: Convert the pool::lock and
wq_mayday_lock to raw_spinlock_t"))?
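To make the second question more concrete, here is a rough, untested
sketch of what asynchronous preallocation could look like.
pool_needs_refill(), add_to_pool() and the work item are hypothetical
names standing in for debugobjects internals, not the current
lib/debugobjects.c code:

#include <linux/debugobjects.h>
#include <linux/workqueue.h>
#include <linux/slab.h>

/* Hypothetical helpers standing in for debugobjects internals. */
extern struct kmem_cache *obj_cache;
extern bool pool_needs_refill(void);
extern void add_to_pool(struct debug_obj *obj); /* takes pool_lock briefly */

/* Refill the object pool from process context only. */
static void fill_pool_work_fn(struct work_struct *work)
{
        /*
         * GFP_KERNEL is fine here because we run in process context
         * and hold no raw_spinlock.
         */
        while (pool_needs_refill()) {
                struct debug_obj *obj;

                obj = kmem_cache_zalloc(obj_cache, GFP_KERNEL);
                if (!obj)
                        break;
                add_to_pool(obj);
        }
}
static DECLARE_WORK(fill_pool_work, fill_pool_work_fn);

/* Atomic-context callers would only kick the refill instead of allocating. */
static void request_pool_refill(void)
{
        schedule_work(&fill_pool_work);
}

One caveat with this approach: kicking a work item from inside the
workqueue code itself (which is where debug_object_activate() gets
called in the splat above) would need care to avoid recursing into
__queue_work().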