Date: Fri, 5 May 2023 10:06:28 +0800
From: Ming Lei
To: Keith Busch
Cc: Theodore Ts'o, linux-ext4@vger.kernel.org, Andreas Dilger,
    linux-block@vger.kernel.org, Andrew Morton, linux-fsdevel@vger.kernel.org,
    linux-mm@kvack.org, Dave Chinner, Eric Sandeen, Christoph Hellwig,
    Zhang Yi, ming.lei@redhat.com
Subject: Re: [ext4 io hang] buffered write io hang in balance_dirty_pages

On Thu, May 04, 2023 at 09:59:52AM -0600, Keith Busch wrote:
> On Thu, Apr 27, 2023 at 10:20:28AM +0800, Ming Lei wrote:
> > Hello Guys,
> >
> > I got a report in which buffered write IO hangs in balance_dirty_pages
> > after an nvme block device is physically unplugged; umount then can't
> > succeed.
> >
> > It turns out to be a long-standing issue: it can be triggered at least
> > from v5.14 up to the latest v6.3.
> >
> > The issue can be reproduced reliably in a KVM guest:
> >
> > 1) run the following script inside the guest:
> >
> > mkfs.ext4 -F /dev/nvme0n1
> > mount /dev/nvme0n1 /mnt
> > dd if=/dev/zero of=/mnt/z.img &
> > sleep 10
> > echo 1 > /sys/block/nvme0n1/device/device/remove
> >
> > 2) dd hangs, and /dev/nvme0n1 is actually gone
>
> Sorry to jump in so late.
>
> For an ungraceful nvme removal, like a surprise hot unplug, the driver
> sets the capacity to 0, and that effectively ends all dirty page writers
> that could stall forward progress on the removal. And that 0 capacity
> should also cause 'dd' to exit.

Actually the nvme device is already gone, and the hang happens in
balance_dirty_pages(), reached from generic_perform_write(). The issue
can be triggered on any kind of disk that can be hot-unplugged, and it
is easy to reproduce with both ublk and nvme.
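
For reference, here is a heavily simplified sketch of the loop dd ends up
stuck in. The helper names below are hypothetical stand-ins, not real
kernel symbols; the real logic is mm/page-writeback.c:balance_dirty_pages(),
reached from generic_perform_write() via balance_dirty_pages_ratelimited():

/* hypothetical stand-ins for the real dirty-limit bookkeeping */
bool over_dirty_limit(void);             /* dirty pages above the threshold? */
void kick_background_writeback(void);    /* wake the flusher threads */
void sleep_for_writeback_progress(void); /* throttle this writer */

static void balance_dirty_pages_sketch(void)
{
	while (over_dirty_limit()) {
		/* ask writeback to clean some dirty pages */
		kick_background_writeback();

		/*
		 * Sleep until the dirty counters drop.  With the disk
		 * gone, its dirty pages can never be written back, the
		 * counters never drop, and the writer never leaves
		 * this loop.
		 */
		sleep_for_writeback_progress();
	}
}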

> But this is not an ungraceful removal, so we're not getting that forced
> behavior. Could we use the same capacity trick here after flushing any
> outstanding dirty pages?

set_capacity(0) is already called in del_gendisk(), after fsync_bdev() &
__invalidate_device() (roughly the ordering sketched below), but I
understand the FS code only makes a best effort to flush dirty pages. And
once the bdev is gone, these un-flushed dirty pages still need to be
cleaned up, otherwise they can't be used any more.

Thanks,
Ming
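
In case it helps, a trimmed-down sketch of the ordering I mean; most of
block/genhd.c:del_gendisk() is omitted here and only the calls discussed
above are kept:

void del_gendisk(struct gendisk *disk)	/* trimmed sketch, not the full function */
{
	/* best-effort flush of dirty pages belonging to this bdev */
	fsync_bdev(disk->part0);

	/* invalidate cached pages/buffers for the bdev (kill_dirty == true) */
	__invalidate_device(disk->part0, true);

	/* the capacity goes to zero only after the flush attempt above */
	set_capacity(disk, 0);

	/*
	 * Dirty pages that could not be written back are still accounted
	 * in the page cache, so writers keep being throttled against them
	 * even though the device is gone.
	 */
}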