Date: Sat, 29 Apr 2023 00:56:03 -0400
From: "Theodore Ts'o"
To: Ming Lei
Cc: Baokun Li, Matthew Wilcox, linux-ext4@vger.kernel.org,
 Andreas Dilger, linux-block@vger.kernel.org, Andrew Morton,
 linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, Dave Chinner,
 Eric Sandeen, Christoph Hellwig, Zhang Yi, yangerkun
Subject: Re: [ext4 io hang] buffered write io hang in balance_dirty_pages
References: <663b10eb-4b61-c445-c07c-90c99f629c74@huawei.com>

On Sat, Apr 29, 2023 at 11:16:14AM +0800, Ming Lei wrote:
>
> bdi_unregister() is called in del_gendisk(), since bdi_register() has
> to be called in add_disk() where major/minor is figured out.
>
> > problem is that the block device shouldn't just *vanish*, with the
>
> That looks not realistic, removable disk can be gone any time, and device
> driver error handler often deletes disk as the last straw, and it shouldn't
> be hard to observe such error.

It's not realistic to think that the file system can write back any
dirty pages, sure.  At this point, the user has already yanked out the
thumb drive, and the physical device is gone.

However, various fields like bdi->dev shouldn't get deinitialized until
after the s_ops->shutdown() function has returned.  We need to give the
file system a chance to shut down any pending writebacks; otherwise, we
could be racing with writeback happening in some other kernel thread,
and while the I/O is certainly not going to succeed, it would be nice
if attempts to write to the block device returned an error, instead of
potentially causing the kernel to crash.

The shutdown function might need to sleep while it waits for workqueues
or kernel threads to exit, or while it iterates over all inodes and
clears all of the dirty bits and/or drops all of the pages associated
with the file system on the disconnected block device.  So while this
happens, I/O should just fail, and not result in a kernel BUG or oops.

Once the s_ops->shutdown() has returned, then del_gendisk can shut down
and/or deallocate anything it wants, and if the file system tries to
use the bdi after s_ops->shutdown() has returned, well, it deserves
anything it gets.

(Well, it would be nice if things didn't bug/oops in fs/buffer.c if
there is no s_ops->shutdown() function, since there are a lot of legacy
file systems that use the buffer cache and until we can add some kind
of generic shutdown function to fs/libfs.c and make sure that all of
the legacy file systems that are likely to be used on a USB thumb drive
are fixed, it would be nice if they were protected.
At the very least, we should make sure that things are no worse than
they currently are.)

                                        - Ted

P.S.  Note that the semantics I've described here for s_ops->shutdown()
are slightly different than what the FS_IOC_SHUTDOWN ioctl currently
does.  For example, after FS_IOC_SHUTDOWN, writes to files will fail,
but reads from already open files will succeed.  I know this because
the original ext4 shutdown implementation did actually prevent reads
from going through, but we got objections from those that wanted ext4's
FS_IOC_SHUTDOWN to work the same way as xfs's.

So we have an out-of-tree patch for ext4's FS_IOC_SHUTDOWN
implementation in our kernels at $WORK, because we were using it when
we knew that the back-end server providing the iSCSI or remote block
had died, and we wanted to make sure our borg (think Kubernetes) jobs
would fast fail when they tried reading from the dead file system, as
opposed to failing only after some timeout had elapsed.

To avoid confusion, we should probably either use a different name than
s_ops->shutdown(), or add a new mode to FS_IOC_SHUTDOWN which
corresponds to "the block device is gone, shut *everything* down:
reads, writes, everything."  My preference would be the latter, since
it would mean we could stop carrying that out-of-tree patch in our data
center kernels...
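
To make the ordering and the two shutdown flavors above concrete, here
is a toy userspace model in C.  It is only a sketch and not kernel
code: every name in it (toy_fs_shutdown(), TOY_SHUTDOWN_ALL,
toy_del_gendisk(), and so on) is made up for illustration, the
"writes only" mode merely mimics the FS_IOC_SHUTDOWN behavior described
above, and a real s_ops->shutdown() would also have to drain writeback
and might sleep while doing so.

```c
/* Userspace toy model, NOT kernel code.  All names are hypothetical. */
#include <assert.h>
#include <errno.h>
#include <stddef.h>

enum toy_shutdown_mode {
	TOY_RUNNING,         /* normal operation */
	TOY_SHUTDOWN_WRITES, /* writes fail, reads of open files still work */
	TOY_SHUTDOWN_ALL,    /* device is gone: reads and writes both fail */
};

struct toy_bdi {
	void *dev;           /* stands in for bdi->dev */
};

struct toy_fs {
	enum toy_shutdown_mode mode;
	struct toy_bdi *bdi;
};

int toy_dummy_device;    /* placeholder object for bdi->dev to point at */

void toy_fs_init(struct toy_fs *fs, struct toy_bdi *bdi)
{
	bdi->dev = &toy_dummy_device;
	fs->bdi = bdi;
	fs->mode = TOY_RUNNING;
}

/*
 * Stands in for the proposed s_ops->shutdown().  The mode only ever
 * gets stricter; a real implementation would also wait for workqueues
 * and drop dirty pages before returning.
 */
void toy_fs_shutdown(struct toy_fs *fs, enum toy_shutdown_mode mode)
{
	if (mode > fs->mode)
		fs->mode = mode;
}

int toy_fs_write(struct toy_fs *fs)
{
	if (fs->mode != TOY_RUNNING)
		return -EIO;             /* fail cleanly, never touch the bdi */
	assert(fs->bdi->dev != NULL);    /* would be the oops in real life */
	return 0;
}

int toy_fs_read(struct toy_fs *fs)
{
	if (fs->mode == TOY_SHUTDOWN_ALL)
		return -EIO;
	assert(fs->bdi->dev != NULL);
	return 0;
}

/*
 * Stands in for del_gendisk(): the file system is told to shut
 * everything down first, and only after that call has returned is the
 * bdi state torn down.
 */
void toy_del_gendisk(struct toy_fs *fs)
{
	toy_fs_shutdown(fs, TOY_SHUTDOWN_ALL);
	fs->bdi->dev = NULL;
}
```

In this model, calling toy_fs_shutdown(&fs, TOY_SHUTDOWN_WRITES) makes
toy_fs_write() return -EIO while toy_fs_read() still returns 0; after
toy_del_gendisk(&fs), both return -EIO and neither dereferences the
torn-down bdi.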