From: Sasha Levin
To: linux-kernel@vger.kernel.org, stable@vger.kernel.org
Cc: Mika Westerberg, AceLan Kao, "Rafael J . Wysocki", Jens Axboe,
    Sasha Levin, linux-mm@kvack.org
Subject: [PATCH AUTOSEL 4.4 16/16] bdi: Do not use freezable workqueue
Date: Fri, 25 Oct 2019 09:58:40 -0400
Message-Id: <20191025135842.25977-16-sashal@kernel.org>
In-Reply-To: <20191025135842.25977-1-sashal@kernel.org>
References: <20191025135842.25977-1-sashal@kernel.org>

From: Mika Westerberg

[ Upstream commit a2b90f11217790ec0964ba9c93a4abb369758c26 ]

A removable block device, such as NVMe or SSD connected over Thunderbolt
can be hot-removed any time including when the system is suspended. When
device is hot-removed during suspend and the system gets resumed, kernel
first resumes devices and then thaws the userspace including freezable
workqueues.

What happens in that case is that the NVMe driver notices that the device
is unplugged and removes it from the system. This ends up calling
bdi_unregister() for the gendisk which then schedules wb_workfn() to be
run one more time.

However, since the bdi_wq is still frozen flush_delayed_work() call in
wb_shutdown() blocks forever halting system resume process. User sees
this as hang as nothing is happening anymore.

Triggering sysrq-w reveals this:

  Workqueue: nvme-wq nvme_remove_dead_ctrl_work [nvme]
  Call Trace:
   ? __schedule+0x2c5/0x630
   ? wait_for_completion+0xa4/0x120
   schedule+0x3e/0xc0
   schedule_timeout+0x1c9/0x320
   ? resched_curr+0x1f/0xd0
   ? wait_for_completion+0xa4/0x120
   wait_for_completion+0xc3/0x120
   ? wake_up_q+0x60/0x60
   __flush_work+0x131/0x1e0
   ? flush_workqueue_prep_pwqs+0x130/0x130
   bdi_unregister+0xb9/0x130
   del_gendisk+0x2d2/0x2e0
   nvme_ns_remove+0xed/0x110 [nvme_core]
   nvme_remove_namespaces+0x96/0xd0 [nvme_core]
   nvme_remove+0x5b/0x160 [nvme]
   pci_device_remove+0x36/0x90
   device_release_driver_internal+0xdf/0x1c0
   nvme_remove_dead_ctrl_work+0x14/0x30 [nvme]
   process_one_work+0x1c2/0x3f0
   worker_thread+0x48/0x3e0
   kthread+0x100/0x140
   ? current_work+0x30/0x30
   ? kthread_park+0x80/0x80
   ret_from_fork+0x35/0x40

This is not limited to NVMes so exactly same issue can be reproduced by
hot-removing SSD (over Thunderbolt) while the system is suspended.

Prevent this from happening by removing WQ_FREEZABLE from bdi_wq.

Reported-by: AceLan Kao
Link: https://marc.info/?l=linux-kernel&m=138695698516487
Link: https://bugzilla.kernel.org/show_bug.cgi?id=204385
Link: https://lore.kernel.org/lkml/20191002122136.GD2819@lahna.fi.intel.com/#t
Acked-by: Rafael J. Wysocki
Signed-off-by: Mika Westerberg
Signed-off-by: Jens Axboe
Signed-off-by: Sasha Levin
---
 mm/backing-dev.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/mm/backing-dev.c b/mm/backing-dev.c
index 07e3b3b8e8469..c682fb90cf356 100644
--- a/mm/backing-dev.c
+++ b/mm/backing-dev.c
@@ -245,8 +245,8 @@ static int __init default_bdi_init(void)
 {
 	int err;
 
-	bdi_wq = alloc_workqueue("writeback", WQ_MEM_RECLAIM | WQ_FREEZABLE |
-					      WQ_UNBOUND | WQ_SYSFS, 0);
+	bdi_wq = alloc_workqueue("writeback", WQ_MEM_RECLAIM | WQ_UNBOUND |
+				 WQ_SYSFS, 0);
 	if (!bdi_wq)
 		return -ENOMEM;
 
-- 
2.20.1
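
Illustration (not part of the patch): the hang described in the commit
message can be modelled in a few lines of user-space C. This is only a
sketch under stated assumptions. Every model_* name and every flag value
below is hypothetical and chosen for readability; none of it is the
kernel's workqueue implementation. It captures just the one property the
commit message relies on: a flush of a freezable queue cannot finish while
the queue is still frozen, whereas a non-freezable queue can drain.

```c
#include <stdbool.h>

/*
 * Hypothetical flag bits for this model only. The real WQ_* values are
 * defined in include/linux/workqueue.h and are not reproduced here.
 */
enum model_wq_flags {
	MODEL_WQ_UNBOUND     = 1 << 0,
	MODEL_WQ_FREEZABLE   = 1 << 1,
	MODEL_WQ_MEM_RECLAIM = 1 << 2,
	MODEL_WQ_SYSFS       = 1 << 3,
};

/* Hypothetical stand-in for a workqueue: its flags plus queued items. */
struct model_wq {
	unsigned int flags;
	int pending;		/* work items queued but not yet run */
};

/* A freezable queue runs nothing while the system is still frozen. */
static inline bool model_wq_can_run(const struct model_wq *wq,
				    bool system_frozen)
{
	return !(system_frozen && (wq->flags & MODEL_WQ_FREEZABLE));
}

/*
 * Models a flush. Returns true if the flush completes, false if the
 * caller would wait forever because pending work can never run.
 */
static inline bool model_flush(struct model_wq *wq, bool system_frozen)
{
	if (wq->pending == 0)
		return true;		/* nothing to wait for */
	if (!model_wq_can_run(wq, system_frozen))
		return false;		/* the hang from the commit message */
	wq->pending = 0;		/* pending work runs, flush returns */
	return true;
}
```

In this model, a queue created with the pre-patch flag set and one pending
item fails model_flush() while system_frozen is true, and the same queue
without MODEL_WQ_FREEZABLE drains normally. That mirrors why dropping
WQ_FREEZABLE from bdi_wq lets device removal proceed during resume.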