Date: Sat, 29 Apr 2023 11:16:14 +0800
From: Ming Lei
To: Theodore Ts'o
Cc: Baokun Li, Matthew Wilcox, linux-ext4@vger.kernel.org, Andreas Dilger,
    linux-block@vger.kernel.org, Andrew Morton, linux-fsdevel@vger.kernel.org,
    linux-mm@kvack.org, Dave Chinner, Eric Sandeen, Christoph Hellwig,
    Zhang Yi, yangerkun, ming.lei@redhat.com
Subject: Re: [ext4 io hang] buffered write io hang in balance_dirty_pages
References: <663b10eb-4b61-c445-c07c-90c99f629c74@huawei.com>

On Fri, Apr 28, 2023 at 01:47:22AM -0400, Theodore Ts'o wrote:
> On Fri, Apr 28, 2023 at 11:47:26AM +0800, Baokun Li wrote:
> > Ext4 just detects I/O Error and remounts it as read-only, it doesn't know
> > if the current disk is dead or not.
> >
> > I asked Yu Kuai and he said that disk_live() can be used to determine
> > whether a disk has been removed based on the status of the inode
> > corresponding to the block device, but this is generally not done in
> > file systems.
>
> What really needs to happen is that del_gendisk() needs to inform file
> systems that the disk is gone, so that the file system can shutdown
> the file system and tear everything down.

OK, it looks like both Dave and you have the same suggestion. IMO it isn't
hard to add an interface for notifying the FS; it can be either an
s_ops->shutdown() method or shutdown_filesystem(struct super_block *sb).

But the main job is how this interface is implemented on the FS/VFS side,
so it looks like more of an FS job. The block layer can then simply call
shutdown_filesystem() from del_gendisk().

>
> disk_live() is relatively new; it was added in August 2021.  Back in

An I/O failure plus a disk_live() check could be one way of handling the
failure, but this kind of interface isn't friendly.

> 2015, I had added the following in fs/ext4/super.c:
>
> /*
>  * The del_gendisk() function uninitializes the disk-specific data
>  * structures, including the bdi structure, without telling anyone
>  * else.  Once this happens, any attempt to call mark_buffer_dirty()
>  * (for example, by ext4_commit_super), will cause a kernel OOPS.
>  * This is a kludge to prevent these oops until we can put in a proper
>  * hook in del_gendisk() to inform the VFS and file system layers.
>  */
> static int block_device_ejected(struct super_block *sb)
> {
> 	struct inode *bd_inode = sb->s_bdev->bd_inode;
> 	struct backing_dev_info *bdi = inode_to_bdi(bd_inode);
>
> 	return bdi->dev == NULL;
> }
>
> As the comment states, it's rather awkward to have the file system
> check to see if the block device is dead in various places; the real

I can understand the awkwardness :-(

bdi_unregister() is called in del_gendisk() because bdi_register() has
to be called in add_disk(), where the major/minor is figured out.

> problem is that the block device shouldn't just *vanish*, with the

That doesn't look realistic. A removable disk can be gone at any time, and
a device driver's error handler often deletes the disk as a last resort, so
such errors shouldn't be hard to observe.

It is also not realistic to wait until all openers close the bdev, given
that this may wait forever.

> block device structures getting partially de-initialized, without the
> block layer being polite enough to let the file system know.

The block device and gendisk instances won't go away while the bdev is
open. I guess only a few fields are deinitialized, such as bdi->dev, and
the bdi could be the only one of them used by FS code.

>
> > Those dirty pages that are already there are piling up and can't be
> > written back, which I think is a real problem. Can the block layer
> > clear those dirty pages when it detects that the disk is deleted?
>
> Well, the dirty pages belong to the file system, and so it needs to be
> up to the file system to clear out the dirty pages.  But I'll also
> note that what the right thing to do when a disk gets removed is not
> necessarily obvious.

Yeah, clearing dirty pages doesn't belong to the block layer.
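
To make the notification idea from earlier in this mail a bit more concrete,
here is a minimal userspace sketch, not kernel code. All model_* names are
made up for illustration; the only names taken from this thread are the
shutdown() method idea, shutdown_filesystem() and del_gendisk(). Whether the
FS should be notified before or after the disk is marked dead is an open
question, and the sketch simply assumes "before". Dropping the dirty pages
in the FS callback is likewise just one possible policy.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace model only: none of these types are the kernel's. */
struct model_super_block;

struct model_super_operations {
	/* Hypothetical hook, modelled on the s_ops->shutdown() idea. */
	void (*shutdown)(struct model_super_block *sb);
};

struct model_super_block {
	const struct model_super_operations *s_op;
	bool shut_down;		/* set once the FS has stopped issuing I/O */
	int dirty_pages;	/* stand-in for dirty page cache owned by the FS */
};

struct model_disk {
	bool live;			/* stand-in for "bdi->dev != NULL" */
	struct model_super_block *sb;	/* mounted FS, or NULL if none */
};

/* FS side: the file system decides what happens to its own dirty pages. */
void model_fs_shutdown(struct model_super_block *sb)
{
	sb->shut_down = true;
	sb->dirty_pages = 0;	/* assumed policy, chosen by the FS */
}

/* VFS side: generic entry point that the block layer would call. */
void model_shutdown_filesystem(struct model_super_block *sb)
{
	if (sb && sb->s_op && sb->s_op->shutdown)
		sb->s_op->shutdown(sb);
}

/* Block side: notify the FS first (assumption), then mark the disk dead. */
void model_del_gendisk(struct model_disk *disk)
{
	assert(disk != NULL);
	model_shutdown_filesystem(disk->sb);
	disk->live = false;
}
```

A caller would fill in a model_super_operations with model_fs_shutdown, hang
it off a model_super_block, attach that to a model_disk and then call
model_del_gendisk(). The point of the split is that the block layer only
delivers the notification, while every decision about dirty data stays in
the FS callback.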
>
> For example, suppose some process has a file mmap'ed into its address
> space, and that file is on the disk which the user has rudely yanked
> out from their laptop; what is the right thing to do?  Do we kill the
> process?  Do we let the process write to the mmap'ed region, and
> silently let the modified data go *poof* when the process exits?  What
> if there is an executable file on the removable disk, and there are
> one or more processes running that executable when the device
> disappears?  Do we kill the process?  Do we let the process run until
> it tries to access a page which hasn't been paged in and then kill the
> process?
>
> We should design a proper solution for What Should Happen when a
> removable disk gets removed unceremoniously without unmounting the
> file system first.  It's not just a matter of making some tests go
> green....

Agreed, the real trouble is how the FS should handle the disk removal.

Thanks,
Ming