Date: Fri, 28 Apr 2023 09:33:20 +1000
From: Dave Chinner
To: Ming Lei
Cc: Theodore Ts'o, linux-ext4@vger.kernel.org, Andreas Dilger,
    linux-block@vger.kernel.org, Andrew Morton,
    linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, Eric Sandeen,
    Christoph Hellwig, Zhang Yi
Subject: Re: [ext4 io hang] buffered write io hang in balance_dirty_pages

On Thu, Apr 27, 2023 at 10:20:28AM +0800, Ming Lei wrote:
> Hello Guys,
>
> I got one report in which buffered write IO hangs in balance_dirty_pages,
> after one nvme block device is unplugged physically, then umount can't
> succeed.

The bug here is that the device unplug code has not told the
filesystem that it's gone away permanently.

This is the same problem we've been having for the past 15 years -
when a block device goes away permanently it leaves the filesystem
and everything else dependent on the block device completely unaware
that they are unable to function anymore. IOWs, the block device
remove path is relying on -unreliable side effects- of filesystem IO
error handling to produce what we'd call "correct behaviour".

The block device needs to be shutting down the filesystem when it
has some sort of fatal, unrecoverable error like this (e.g. hot
unplug). We have the XFS_IOC_GOINGDOWN ioctl for telling the
filesystem it can't function anymore. This ioctl
(_IOR('X',125,__u32)) has also been replicated into ext4, f2fs and
CIFS and it gets exercised heavily by fstests. Hence this isn't XFS
specific functionality, nor is it untested functionality.

The ioctl should be lifted to the VFS as FS_IOC_SHUTDOWN and a
super_operations method added to trigger a filesystem shutdown.
That way the block device removal code could simply call
sb->s_ops->shutdown(sb, REASON) if it exists rather than
sync_filesystem(sb) if there's a superblock associated with the
block device.
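
A minimal userspace sketch of the dispatch described above, not kernel
code. The structs are stand-ins that only model the relevant fields,
and every name prefixed with model_ is hypothetical, as is the reason
enum (the text above only says "REASON"). The field is spelled s_ops
to match the text above; struct super_block in the kernel names it
s_op.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical reason code; stands in for "REASON". */
enum model_shutdown_reason { MODEL_SHUTDOWN_DEVICE_REMOVED };

struct super_block;

/* Stand-in for the kernel's super_operations table. */
struct super_operations {
	/* Optional hook: a filesystem without shutdown support leaves it NULL. */
	void (*shutdown)(struct super_block *sb, enum model_shutdown_reason why);
};

/* Stand-in for the kernel's super_block, reduced to what the model needs. */
struct super_block {
	const struct super_operations *s_ops;
	bool shut_down;  /* set when the shutdown hook ran */
	bool sync_tried; /* set when the sync fallback ran */
};

/* Stand-in for sync_filesystem(sb); it only records that it was called. */
void model_sync_filesystem(struct super_block *sb)
{
	sb->sync_tried = true;
}

/* Example hook a filesystem might register in this model. */
void model_fs_shutdown(struct super_block *sb, enum model_shutdown_reason why)
{
	assert(sb != NULL);
	(void)why; /* a real filesystem could log or act on the reason */
	sb->shut_down = true;
}

/* Device removal path: prefer the shutdown hook, else fall back to sync. */
void model_bdev_removed(struct super_block *sb)
{
	if (sb == NULL)
		return; /* no filesystem is associated with the device */

	if (sb->s_ops != NULL && sb->s_ops->shutdown != NULL)
		sb->s_ops->shutdown(sb, MODEL_SHUTDOWN_DEVICE_REMOVED);
	else
		model_sync_filesystem(sb);
}
```

The point of the optional hook is that filesystems can opt in one at a
time: anything that does not provide it keeps the existing
sync_filesystem() behaviour.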

This way we won't have to spend another two decades listening to
people complain about how applications and filesystems hang when
they pull the storage device out from under them and the filesystem
didn't do something that made it notice before the system hung....

> So far only observed on ext4 FS, not see it on XFS.

Pure dumb luck - a journal IO failed on XFS (probably during the
sync_filesystem() call) and that shut the filesystem down.

> I guess it isn't
> related with disk type, and not tried such test on other type of disks yet,
> but will do.

It can happen on any block device based storage that gets pulled
from under any filesystem without warning.

> Seems like dirty pages aren't cleaned after ext4 bio is failed in this
> situation?

Yes, because the filesystem wasn't shut down on device removal to
tell it that it's allowed to toss away dirty pages as they cannot be
cleaned via the IO path....

-Dave.
-- 
Dave Chinner
dchinner@redhat.com