Date: Fri, 28 Apr 2023 09:41:15 +0800
From: Ming Lei <ming.lei@redhat.com>
To: Baokun Li
Cc: Matthew Wilcox, Theodore Ts'o, linux-ext4@vger.kernel.org,
    Andreas Dilger, linux-block@vger.kernel.org, Andrew Morton,
    linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, Dave Chinner,
    Eric Sandeen, Christoph Hellwig, Zhang Yi, yangerkun,
    ming.lei@redhat.com
Subject: Re: [ext4 io hang] buffered write io hang in balance_dirty_pages
References: <663b10eb-4b61-c445-c07c-90c99f629c74@huawei.com>

On Thu, Apr 27, 2023 at 07:27:04PM +0800, Ming Lei wrote:
> On Thu, Apr 27, 2023 at 07:19:35PM +0800, Baokun Li wrote:
> > On 2023/4/27 18:01, Ming Lei wrote:
> > > On Thu, Apr 27, 2023 at 02:36:51PM +0800, Baokun Li wrote:
> > > > On 2023/4/27 12:50, Ming Lei wrote:
> > > > > Hello Matthew,
> > > > >
> > > > > On Thu, Apr 27, 2023 at 04:58:36AM +0100, Matthew Wilcox wrote:
> > > > > > On Thu, Apr 27, 2023 at 10:20:28AM +0800, Ming Lei wrote:
> > > > > > > Hello Guys,
> > > > > > >
> > > > > > > I got a report in which buffered write IO hangs in
> > > > > > > balance_dirty_pages after an nvme block device is physically
> > > > > > > unplugged; after that, umount can't succeed.
> > > > > > That's a feature, not a bug ... the dd should continue
> > > > > > indefinitely?
> > > > >
> > > > > Can you explain what the feature is? I don't see such an
> > > > > 'issue' or 'feature' on xfs.
> > > > >
> > > > > The device is gone, so IMO it is reasonable for FS buffered
> > > > > write IO to fail. Actually dmesg has already shown
> > > > > 'EXT4-fs (nvme0n1): Remounting filesystem read-only'. It seems
> > > > > these things may confuse users.
> > > >
> > > > The reason for this difference is that ext4 and xfs handle errors
> > > > differently.
> > > >
> > > > ext4 remounts the filesystem read-only, or even just continues;
> > > > vfs_write does not check for either of these.
> > >
> > > vfs_write may not find anything wrong, but an ext4 remount could
> > > see that the disk is gone; the removal might happen during or
> > > after the remount, though.
> > >
> > > > xfs shuts down the filesystem, so it returns a failure at
> > > > xfs_file_write_iter when it finds an error.
> > > >
> > > > ``` ext4
> > > > ksys_write
> > > >  vfs_write
> > > >   ext4_file_write_iter
> > > >    ext4_buffered_write_iter
> > > >     ext4_write_checks
> > > >      file_modified
> > > >       file_modified_flags
> > > >        __file_update_time
> > > >         inode_update_time
> > > >          generic_update_time
> > > >           __mark_inode_dirty
> > > >            ext4_dirty_inode ---> 2. void function, errors are not propagated out
> > > >             __ext4_journal_start_sb
> > > >              ext4_journal_check_start ---> 1. error found, remount-ro
> > > >     generic_perform_write ---> 3. no error sensed, continue
> > > >      balance_dirty_pages_ratelimited
> > > >       balance_dirty_pages_ratelimited_flags
> > > >        balance_dirty_pages
> > > >         // 4. sleep waiting for dirty pages to be freed
> > > >         __set_current_state(TASK_KILLABLE)
> > > >         io_schedule_timeout(pause);
> > > > ```
> > > >
> > > > ``` xfs
> > > > ksys_write
> > > >  vfs_write
> > > >   xfs_file_write_iter
> > > >    if (xfs_is_shutdown(ip->i_mount))
> > > >      return -EIO;    ---> dd fails
> > > > ```
> > >
> > > Thanks for the info, which is really helpful for me to understand
> > > the problem.
> > >
> > > > > > balance_dirty_pages() is sleeping in KILLABLE state, so
> > > > > > kill -9 of the dd process should succeed.
> > > > >
> > > > > Yeah, dd can be killed; however, it may be any application(s), :-)
> > > > >
> > > > > Fortunately it won't cause trouble during reboot/power off,
> > > > > given userspace will be killed at that time.
> > > > >
> > > > > Thanks,
> > > > > Ming
> > > >
> > > > Don't worry about that, we always set the current thread to
> > > > TASK_KILLABLE while waiting in balance_dirty_pages().
> > >
> > > I have another concern: if 'dd' isn't killed, the dirty pages won't
> > > be cleaned, and that (potentially large amount of) memory becomes
> > > unusable; a typical scenario could be an unplugged USB HDD.
> > >
> > > thanks,
> > > Ming
> >
> > Yes, it is unreasonable to keep writing data through the previously
> > opened fd after the file system has become read-only, resulting in
> > dirty page accumulation.
> >
> > I provided a patch in another reply. Could you test whether it
> > solves your problem? If it does, I will formally send it to the
> > mailing list.
>
> OK, I will test it tomorrow.

Your patch avoids the dd hang with the default bs of 512, but if bs is
increased to 1G and more 'dd' tasks are started, the hang can still be
observed. The reason should be the one in my paragraph quoted below.
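(For reference, here is a minimal C sketch of the writer pattern being
discussed; the mount point /mnt/test, the file name, and the 64 MiB
buffer size are assumptions for illustration, not details from the
report. Run several instances against the ext4 mount, then physically
unplug the underlying disk.)

```
/*
 * Hedged reproducer sketch; paths and sizes are made up here.
 * The report used dd with bs from the default 512 up to 1G.
 */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
	size_t bs = 64UL << 20;		/* 64 MiB per buffered write */
	char *buf = malloc(bs);
	int fd;

	if (!buf)
		return 1;
	memset(buf, 0xa5, bs);

	fd = open("/mnt/test/file", O_WRONLY | O_CREAT | O_TRUNC, 0644);
	if (fd < 0) {
		perror("open");
		return 1;
	}

	for (;;) {
		/*
		 * After the unplug, ext4 only remounts read-only, so this
		 * write keeps "succeeding" into the page cache until the
		 * task is throttled (killably) in balance_dirty_pages().
		 */
		if (write(fd, buf, bs) < 0) {
			perror("write");
			break;
		}
	}
	close(fd);
	free(buf);
	return 0;
}
```

Each instance ends up blocked in balance_dirty_pages() exactly as in
the ext4 call trace above, and kill -9 remains the only way out.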
Another question is whether remounting read-only even makes sense on a
dead disk. Granted, the block layer doesn't export an interface for
querying whether a bdev is dead; however, I think it is reasonable to
export such an interface if filesystems need it.

>
> But I am afraid it can't avoid the issue completely, because the old
> write task hanging in balance_dirty_pages() may still write/dirty
> pages if it is one very big write IO.

thanks,
Ming
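P.S. To make the querying-interface idea concrete, here is a hedged
sketch of what such an export could look like, assuming the block
layer's existing QUEUE_FLAG_DYING tracking is the right signal; the
helper name bdev_is_dead() is invented here, and nothing like it is
exported to filesystems today.

```
/*
 * Hypothetical sketch only: bdev_is_dead() does not exist.
 * blk_queue_dying() tests QUEUE_FLAG_DYING, which the block layer
 * sets internally once the request queue starts dying, so a
 * filesystem-facing wrapper could be as small as this.
 */
#include <linux/blkdev.h>

static inline bool bdev_is_dead(struct block_device *bdev)
{
	/* true once the underlying request queue has started dying */
	return blk_queue_dying(bdev_get_queue(bdev));
}
```

A filesystem could then fail its write path early on a dead bdev,
similar to the xfs_is_shutdown() check shown earlier in the thread,
instead of only remounting read-only.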