Date: Wed, 19 Feb 2025 11:23:08 -0500
From: Steven Rostedt
To: "Masami Hiramatsu (Google)"
Cc: Peter Zijlstra, Ingo Molnar, Will Deacon, Andrew Morton, Boqun Feng,
 Waiman Long, Joel Granados, Anna Schumaker, Lance Yang, Kent Overstreet,
 Yongliang Gao, Tomasz Figa, Sergey Senozhatsky,
 linux-kernel@vger.kernel.org, Linux Memory Management List, Lance Yang
Subject: Re: [PATCH 1/2] hung_task: Show the blocker task if the task is hung on mutex
Message-ID: <20250219112308.5d905680@gandalf.local.home>
In-Reply-To: <173997004932.2137198.7959507113210521328.stgit@mhiramat.tok.corp.google.com>
References: <173997003868.2137198.9462617208992136056.stgit@mhiramat.tok.corp.google.com>
 <173997004932.2137198.7959507113210521328.stgit@mhiramat.tok.corp.google.com>

On Wed, 19 Feb 2025 22:00:49 +0900
"Masami Hiramatsu (Google)" wrote:

> From: Masami Hiramatsu (Google)
>
> The "hung_task" shows a long-time uninterruptible slept task, but most
> often, it's blocked on a mutex acquired by another task.
> Without dumping such a task, investigating the root cause of the hung
> task problem is very difficult.
>
> Fortunately CONFIG_DEBUG_MUTEXES=y allows us to identify the mutex
> blocking the task. And the mutex has "owner" information, which can
> be used to find the owner task and dump it with hung tasks.
>
> With this change, the hung task shows blocker task's info like below;
>

We've hit bugs like this in the field a few times, and they were very
difficult to debug. Something like this would have made our lives much
easier!

> Signed-off-by: Masami Hiramatsu (Google)
> ---
>  kernel/hung_task.c           | 38 ++++++++++++++++++++++++++++++++++++++
>  kernel/locking/mutex-debug.c |  1 +
>  kernel/locking/mutex.c       |  9 +++++++++
>  kernel/locking/mutex.h       |  6 ++++++
>  4 files changed, 54 insertions(+)
>
> diff --git a/kernel/hung_task.c b/kernel/hung_task.c
> index 04efa7a6e69b..d1ce69504090 100644
> --- a/kernel/hung_task.c
> +++ b/kernel/hung_task.c
> @@ -25,6 +25,8 @@
>
>  #include
>
> +#include "locking/mutex.h"
> +
>  /*
>   * The number of tasks checked:
>   */
> @@ -93,6 +95,41 @@ static struct notifier_block panic_block = {
>  	.notifier_call = hung_task_panic,
>  };
>
> +
> +#ifdef CONFIG_DEBUG_MUTEXES
> +static void debug_show_blocker(struct task_struct *task)
> +{
> +	struct task_struct *g, *t;
> +	unsigned long owner;
> +	struct mutex *lock;
> +
> +	if (!task->blocked_on)
> +		return;
> +
> +	lock = task->blocked_on->mutex;

This is a catch-22. To look at the task's blocked_on safely, we need the
lock->wait_lock held. But to get that lock, we first need to look at the
task's blocked_on field, and that read can race.

Another thing is that the waiter is on the task's stack. Perhaps we need to
move this into sched/core.c and be able to lock the task's rq, because even
something like:

	waiter = READ_ONCE(task->blocked_on);

may be garbage if the task were to suddenly wake up and run.

Now if we were able to lock the task's rq, which would prevent it from
being woken up, then the blocked_on field would not be at risk of being
corrupted.

-- Steve

> +	if (unlikely(!lock)) {
> +		pr_err("INFO: task %s:%d is blocked on a mutex, but the mutex is not found.\n",
> +			task->comm, task->pid);
> +		return;
> +	}
> +	owner = debug_mutex_get_owner(lock);
> +	if (likely(owner)) {
> +		/* Ensure the owner information is correct. */
> +		for_each_process_thread(g, t)
> +			if ((unsigned long)t == owner) {
> +				pr_err("INFO: task %s:%d is blocked on a mutex owned by task %s:%d.\n",
> +					task->comm, task->pid, t->comm, t->pid);
> +				sched_show_task(t);
> +				return;
> +			}
> +	}
> +	pr_err("INFO: task %s:%d is blocked on a mutex, but the owner is not found.\n",
> +		task->comm, task->pid);
> +}
> +#else
> +#define debug_show_blocker(t) do {} while (0)
> +#endif
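
To make the rq-lock idea above a bit more concrete, here is a rough
userspace sketch. It is not kernel code, and it is not a proposal for how
the patch should look. Every name in it (model_task, model_waiter,
model_pin() and friends) is made up for illustration, and the "pin" flag is
only an assumed stand-in for the task's rq lock. The one property it models
is this: the inspector holds a lock that the blocked task must also take
before it can wake up and let its on-stack waiter go away, so the
dereference of blocked_on cannot see a stale stack pointer.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; none of these are kernel types. */
struct model_mutex {
	int id;
};

struct model_waiter {
	struct model_mutex *mutex;	/* the lock this waiter is queued on */
};

struct model_task {
	atomic_flag pin;			/* assumed stand-in for the task's rq lock */
	struct model_waiter *blocked_on;	/* points into the task's own stack */
};

static void model_pin(struct model_task *task)
{
	/* Spin until whoever holds the pin releases it. */
	while (atomic_flag_test_and_set_explicit(&task->pin,
						 memory_order_acquire))
		;
}

static void model_unpin(struct model_task *task)
{
	atomic_flag_clear_explicit(&task->pin, memory_order_release);
}

/* Task side: publish the on-stack waiter before going to sleep. */
static void model_block_on(struct model_task *task,
			   struct model_waiter *waiter,
			   struct model_mutex *lock)
{
	assert(waiter && lock);
	waiter->mutex = lock;
	model_pin(task);
	task->blocked_on = waiter;
	model_unpin(task);
}

/*
 * Task side: the wake path takes the pin before clearing blocked_on, so
 * the waiter's stack frame cannot disappear under a pinned inspector.
 */
static void model_wake(struct model_task *task)
{
	model_pin(task);
	task->blocked_on = NULL;
	model_unpin(task);
}

/* Inspector side: id of the blocking mutex, or -1 if the task is not blocked. */
static int model_blocker_id(struct model_task *task)
{
	int id = -1;

	model_pin(task);
	if (task->blocked_on && task->blocked_on->mutex)
		id = task->blocked_on->mutex->id;
	model_unpin(task);
	return id;
}
```

In this model a caller would initialize the pin with ATOMIC_FLAG_INIT, have
the "task" call model_block_on() with a waiter on its own stack, and let a
second thread call model_blocker_id() at any time. Whether the real rq lock
gives the same guarantee for blocked_on is exactly the open question in
this thread, so treat this as an illustration of the locking shape only.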