From: Yafang Shao
Date: Mon, 23 Dec 2024 19:55:16 +0800
Subject: Re: [PATCH] hung_task: fix missing hung task detection for kthread in TASK_WAKEKILL state
To: akpm@linux-foundation.org
Cc: linux-mm@kvack.org, linux-xfs@vger.kernel.org, linux-kernel@vger.kernel.org, Dave Chinner
In-Reply-To: <20241223093722.78570-1-laoar.shao@gmail.com>

On Mon, Dec 23, 2024 at 5:37 PM Yafang Shao wrote:
>
> We recently encountered an XFS deadlock issue, which is a known problem
> resolved in the upstream kernel [0].
> During the analysis of this issue, I
> observed that a kernel thread in the TASK_WAKEKILL state could not be
> detected as a hung task by the hung_task detector. The details are as
> follows:
>
> Using the following command, I identified nine tasks stuck in the D state:
>
> $ ps -eLo state,comm,tid,wchan | grep ^D
> D java            4177339 xfs_buf_lock
> D kworker/93:3+xf 3025535 xfs_buf_lock
> D kworker/87:0+xf 3426612 xfs_extent_busy_flush
> D kworker/85:0+xf 3479378 xfs_buf_lock
> D kworker/91:1+xf 3584478 xfs_buf_lock
> D kworker/80:3+xf 3655680 xfs_buf_lock
> D kworker/89:0+xf 3671691 xfs_buf_lock
> D kworker/84:1+xf 3708397 xfs_buf_lock
> D kworker/81:1+xf 4005763 xfs_buf_lock
>
> However, the hung_task detector only reported eight of these tasks:
>
> [3108840.650652] INFO: task java:4177339 blocked for more than 247779 seconds.
> [3108840.654197] INFO: task kworker/93:3:3025535 blocked for more than 248427 seconds.
> [3108840.657711] INFO: task kworker/85:0:3479378 blocked for more than 247836 seconds.
> [3108840.661483] INFO: task kworker/91:1:3584478 blocked for more than 249638 seconds.
> [3108840.664871] INFO: task kworker/80:3:3655680 blocked for more than 249638 seconds.
> [3108840.668495] INFO: task kworker/89:0:3671691 blocked for more than 249047 seconds.
> [3108840.672418] INFO: task kworker/84:1:3708397 blocked for more than 247836 seconds.
> [3108840.676175] INFO: task kworker/81:1:4005763 blocked for more than 247836 seconds.
>
> Task 3426612, although in the D state, was not reported as a hung task.
>
> I confirmed that task 3426612 remained in the D (disk sleep) state and
> experienced no context switches over a long period:
>
> $ cat /proc/3426612/status | grep -E "State:|ctxt_switches:"; \
>   sleep 60; echo "----"; \
>   cat /proc/3426612/status | grep -E "State:|ctxt_switches:"
> State: D (disk sleep)
> voluntary_ctxt_switches: 7516
> nonvoluntary_ctxt_switches: 0
> ----
> State: D (disk sleep)
> voluntary_ctxt_switches: 7516
> nonvoluntary_ctxt_switches: 0
>
> The system's hung_task detector settings were configured as follows:
>
> kernel.hung_task_timeout_secs = 28
> kernel.hung_task_warnings = -1
>
> The issue lies in the handling of task state in the XFS code. Specifically,
> the thread in question (3426612) was set to the TASK_KILLABLE state in
> xfs_extent_busy_flush():
>
> xfs_extent_busy_flush
>   prepare_to_wait(&pag->pagb_wait, &wait, TASK_KILLABLE);
>
> When a task is in the TASK_WAKEKILL state (a subset of TASK_KILLABLE), the
> hung_task detector ignores it, as it assumes such tasks can be terminated.
> However, in this case, the kernel thread cannot be killed, meaning it
> effectively becomes a hung task.
>
> To address this issue, the hung_task detector should report the kthreads in
> the TASK_WAKEKILL state.
>
> Link: https://lore.kernel.org/linux-xfs/20230620002021.1038067-5-david@fromorbit.com/ [0]
> Signed-off-by: Yafang Shao
> Cc: Dave Chinner
> ---
>  kernel/hung_task.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/hung_task.c b/kernel/hung_task.c
> index c18717189f32..ed63fd84ce2e 100644
> --- a/kernel/hung_task.c
> +++ b/kernel/hung_task.c
> @@ -220,8 +220,9 @@ static void check_hung_uninterruptible_tasks(unsigned long timeout)
>                  */
>                 state = READ_ONCE(t->__state);
>                 if ((state & TASK_UNINTERRUPTIBLE) &&
> +                   (t->flags & PF_KTHREAD ||
>                     !(state & TASK_WAKEKILL) &&
> -                   !(state & TASK_NOLOAD))
> +                   !(state & TASK_NOLOAD)))
>                         check_hung_task(t, timeout);
>         }
>   unlock:
> --
> 2.43.5
>

Please disregard this. There may be multiple hung tasks in the TASK_IDLE
state. I will send a new one.

--
Regards
Yafang
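
For readers following the thread, the TASK_IDLE concern can be illustrated
with a small user-space sketch of the state filter before and after the
quoted diff. It is only a model, not kernel code. The constants are copied
by hand from include/linux/sched.h as an assumption and may differ between
kernel versions. The function names checked_before(), checked_after() and
hung_filter_self_check() are hypothetical and exist only for this
illustration.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Assumed values, mirrored from include/linux/sched.h so that the sketch
 * builds in user space. Verify them against the kernel tree in use.
 */
#define TASK_UNINTERRUPTIBLE 0x00000002u
#define TASK_WAKEKILL        0x00000100u
#define TASK_NOLOAD          0x00000400u
#define TASK_KILLABLE        (TASK_WAKEKILL | TASK_UNINTERRUPTIBLE)
#define TASK_IDLE            (TASK_UNINTERRUPTIBLE | TASK_NOLOAD)
#define PF_KTHREAD           0x00200000u

/* Filter as in the unpatched context lines of the quoted diff. */
bool checked_before(unsigned int state, unsigned int flags)
{
	(void)flags; /* the old condition does not look at task flags */
	return (state & TASK_UNINTERRUPTIBLE) &&
	       !(state & TASK_WAKEKILL) &&
	       !(state & TASK_NOLOAD);
}

/* Filter as proposed by the quoted diff (&& binds tighter than ||). */
bool checked_after(unsigned int state, unsigned int flags)
{
	return (state & TASK_UNINTERRUPTIBLE) &&
	       ((flags & PF_KTHREAD) ||
		(!(state & TASK_WAKEKILL) && !(state & TASK_NOLOAD)));
}

/* Hypothetical helper that walks through the cases discussed above. */
void hung_filter_self_check(void)
{
	/* A TASK_KILLABLE kthread is skipped before and checked after. */
	assert(!checked_before(TASK_KILLABLE, PF_KTHREAD));
	assert(checked_after(TASK_KILLABLE, PF_KTHREAD));

	/* Side effect: a TASK_IDLE kthread is now checked as well. */
	assert(!checked_before(TASK_IDLE, PF_KTHREAD));
	assert(checked_after(TASK_IDLE, PF_KTHREAD));

	/* A TASK_KILLABLE user task is skipped in both versions. */
	assert(!checked_before(TASK_KILLABLE, 0));
	assert(!checked_after(TASK_KILLABLE, 0));
}
```

Under these assumptions, the PF_KTHREAD shortcut bypasses the TASK_NOLOAD
test as well as the TASK_WAKEKILL test, so every kthread sleeping in
TASK_IDLE would pass the filter, which appears to be the problem noted in
the reply.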