Subject: Re: Known and unfixed active data loss bug in MM + XFS with large folios since Dec 2021 (any kernel from 6.1 upwards)
From: Christian Theune
In-Reply-To: <686D222E-3CA3-49BE-A9E5-E5E2F5AFD5DA@flyingcircus.io>
Date: Wed, 18 Sep 2024 10:31:09 +0200
Cc: Linus Torvalds, Jens Axboe, Matthew Wilcox, linux-mm@kvack.org, linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, Daniel Dao, clm@meta.com, regressions@lists.linux.dev, regressions@leemhuis.info
Message-Id: <0C392D79-DAB1-4730-B2AB-B2B8CF100F11@flyingcircus.io>
References: <0fc8c3e7-e5d2-40db-8661-8c7199f84e43@kernel.dk>
 <686D222E-3CA3-49BE-A9E5-E5E2F5AFD5DA@flyingcircus.io>
To: Dave Chinner

> On 16. Sep 2024, at 09:14, Christian Theune wrote:
>
>>
>> On 16. Sep 2024, at 02:00, Dave Chinner wrote:
>>
>> I don't think this is a data corruption/loss problem - it certainly
>> hasn't ever appeared that way to me.
>> The "data loss" appeared to be
>> in incomplete postgres dump files after the system was rebooted and
>> this is exactly what would happen when you randomly crash the
>> system. i.e. dirty data in memory is lost, and application data
>> being written at the time is in an inconsistent state after the
>> system recovers. IOWs, there was no clear evidence of actual data
>> corruption occurring, and data loss is definitely expected when the
>> page cache iteration hangs and the system is forcibly rebooted
>> without being able to sync or unmount the filesystems…
>> All the hangs seem to be caused by folio lookup getting stuck
>> on a rogue xarray entry in truncate or readahead. If we find an
>> invalid entry or a folio from a different mapping or with an
>> unexpected index, we skip it and try again. Hence this does not
>> appear to be a data corruption vector, either - it results in a
>> livelock from endless retry because of the bad entry in the xarray.
>> This endless retry livelock appears to be what is being reported.
>>
>> IOWs, there is no evidence of real runtime data corruption or loss
>> from this pagecache livelock bug. We also haven't heard of any
>> random file data corruption events since we've enabled large folios
>> on XFS. Hence there really is no evidence to indicate that there is
>> a large folio xarray lookup bug that results in data corruption in
>> the existing code, and therefore there is no obvious reason for
>> turning off the functionality we are already building significant
>> new functionality on top of.

I’ve been chewing more on this and reviewed the tickets I have. We did
see a PostgreSQL database ending up reporting "ERROR: invalid page in
block 30896 of relation base/16389/103292".

My understanding of the argument that this bug does not corrupt data is
that the bug would only lead to a crash-consistent state.
So applications that can properly recover from a crash-consistent state
would only experience data loss up to the point of the crash (which is
fine and expected) but should not end up in a further corrupted state.

PostgreSQL reporting this error indicates - to my knowledge - that it
did not see a crash-consistent state of the file system.

Christian

-- 
Christian Theune · ct@flyingcircus.io · +49 345 219401 0
Flying Circus Internet Operations GmbH · https://flyingcircus.io
Leipziger Str. 70/71 · 06108 Halle (Saale) · Deutschland
HR Stendal HRB 21169 · Geschäftsführer: Christian Theune, Christian Zagrodnick
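
For readers following the thread, the "skip it and try again" behaviour described in the quoted text can be modelled with a small toy. This is only a sketch under stated assumptions, not kernel code: `Folio`, `lookup`, the dict standing in for the xarray, and the `max_retries` cap are all hypothetical names introduced here for illustration. The real lookup path has no such cap, which is why a bad entry would show up as an endless retry rather than an error.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Folio:
    mapping: str  # hypothetical: identifies the file the folio belongs to
    index: int    # hypothetical: offset of the folio within that file


def lookup(slots: dict[int, Optional[Folio]], mapping: str, index: int,
           max_retries: int = 1000) -> tuple[Optional[Folio], int]:
    """Toy lookup: validate the entry found at `index`, retry on mismatch.

    Returns (folio, retries). Hitting `max_retries` stands in for the
    livelock; the cap exists only so this model terminates.
    """
    retries = 0
    while retries < max_retries:
        entry = slots.get(index)
        if entry is None:
            return None, retries      # nothing cached at this index
        if entry.mapping == mapping and entry.index == index:
            return entry, retries     # entry is valid, hand it back
        # Rogue entry: skip it and try again. Nothing in this loop repairs
        # the slot, so the same bad entry is seen on every iteration.
        retries += 1
    return None, retries


healthy = {5: Folio("fileA", 5)}
rogue = {5: Folio("fileB", 9)}        # wrong mapping and wrong index
print(lookup(healthy, "fileA", 5))    # valid folio, zero retries
print(lookup(rogue, "fileA", 5, max_retries=50))  # gives up after 50 retries
```

In this model the caller never receives the mismatched folio, which matches the quoted argument that the retry loop itself is a hang rather than a corruption vector. It says nothing about how the rogue entry got into the slot in the first place, which is the open question in this thread.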