Subject: Re: Known and unfixed active data loss bug in MM + XFS with large folios since Dec 2021 (any kernel from 6.1 upwards)
From: Christian Theune <ct@flyingcircus.io>
Date: Mon, 16 Sep 2024 09:14:45 +0200
To: Dave Chinner
Cc: Linus Torvalds, Jens Axboe, Matthew Wilcox, linux-mm@kvack.org,
"linux-xfs@vger.kernel.org" , linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, Daniel Dao , clm@meta.com, regressions@lists.linux.dev, regressions@leemhuis.info Content-Transfer-Encoding: quoted-printable Message-Id: <686D222E-3CA3-49BE-A9E5-E5E2F5AFD5DA@flyingcircus.io> References: <0fc8c3e7-e5d2-40db-8661-8c7199f84e43@kernel.dk> To: Dave Chinner X-Stat-Signature: amn5dpe61uoz1maraoss8nqew9rpf7o3 X-Rspamd-Queue-Id: 025611C0007 X-Rspam-User: X-Rspamd-Server: rspam08 X-HE-Tag: 1726470912-243842 X-HE-Meta: U2FsdGVkX1/rQvC5l+obqa5BfAQYRMQCU6ACyYR5htnurs8WJxr4W0XYPpJ1q3m9gs11r1kGHR03E91y+UI9DAlNhlNYZYZ0wClHtpUHCcNi1MPEGJbjERfA718KcnbKYIHyLhcJTeAKG2UF/odGk+fvvUGBI6CmnMmsq9xlY1aS8o3z+sZu8qhsRrgi6cFo48OAFKpF08XXr44+JkH+bmceToDGBsesl9uC/AMwiGktJNkq6LYyt+BxErFhdsNd5Yfjqofsp0kf3+E62fTF5nJ7HUfC3PnWEy1YUwVtrhmlaAbQtUa+iZnDTBFJk1dPxtRxCNKLChbnCVXD43Hr9yLJLGdmPev/I2OTJUr9PH3s5CrhrV68BkOl177niB/p7+y3CR/B9fBia/l+/5jv+3YLy5TGW67vm7QqETkahRvxPouzW5MSRrvowja69pxcxq2hKnJt9IXIZ6O6v5pxZLQlSrzP6Jh5R3npj/e6DSGi8x5uzBvRzqKLoqGGfwbDh8E88Xs6wlvZf6as4AKJ+PAaZ8wnGrmhlkFr/hIYUPxaE5d06kBzDhJjcv0GpSMMKDZlAAx8WsM3NnlktIu5Lkyy2BA/sVdbrbrNCG2yPOdaJU0ACwdgXJdejH0t0H9kSK4HkYz2u0YUKKH6+mqjoHG+0XO2p4//DhD7a70Tt9AzYU0ozPI+kYtVVWDS26BTvxiwDUxT8C/pL042O73bl9IfPzaPtEgysjNn5CyvrMTExqlkDSg43eGfO8p0rH3FMQ/DMbOQ33KaUiED5YTkTl1DQlJMljaKe4kByi8gX15Cyt6MxjXDRPmKBbYFgsBAOqpGx/LPQ4HSR49y2say5te5aqRjVKNXp9UhXSbHQpFbuMpTNStb0MXTGkFOak8xOrGWFXkO/baFsyCieXR3pyvzSMIyIPaN89+0172wjSeSI34I3Q3FbDR4cyj03b5ZEMvTK3oFsAOD7yxMQKE 5CC6BtbI HAFVPTWbo2WxGrdQqlRCNSYe6t5rRQ44GkPD5b6nV2XoJWpslXdNZo02Zn/6cLy2LIbbKuMohXEGdp0c3Nubyiyrh937wZKAOt5eEyNFbXcdhbWzlahZdVMXvOBoOwXuSfpIYgd33mldoA4knUfWqI0MOsOOj5m2tc9zT51z8y13gnj+3fp5JprmFBhudnPlXbVJ/urq0eLXmg3c7DX5M7TZQr9qOXsgFXdvJyaVExnU11ivea5mv3ml9pEtPbob4gZR6njQD8WxJvZGEBQ7PDyDTMefCxnGHBQsDjNHLHZltJDQJ2EGTVAQiX6KVt/Satdnb+6MmqoaJjK4= X-Bogosity: Ham, tests=bogofilter, spamicity=0.000007, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: > On 16. Sep 2024, at 02:00, Dave Chinner wrote: >=20 > On Thu, Sep 12, 2024 at 03:25:50PM -0700, Linus Torvalds wrote: >> On Thu, 12 Sept 2024 at 15:12, Jens Axboe wrote: >> Honestly, the fact that it hasn't been reverted after apparently >> people knowing about it for months is a bit shocking to me. = Filesystem >> people tend to take unknown corruption issues as a big deal. What >> makes this so special? Is it because the XFS people don't consider it >> an XFS issue, so... >=20 > I don't think this is a data corruption/loss problem - it certainly > hasn't ever appeared that way to me. The "data loss" appeared to be > in incomplete postgres dump files after the system was rebooted and > this is exactly what would happen when you randomly crash the > system. i.e. dirty data in memory is lost, and application data > being written at the time is in an inconsistent state after the > system recovers. IOWs, there was no clear evidence of actual data > corruption occuring, and data loss is definitely expected when the > page cache iteration hangs and the system is forcibly rebooted > without being able to sync or unmount the filesystems=E2=80=A6 > All the hangs seem to be caused by folio lookup getting stuck > on a rogue xarray entry in truncate or readahead. If we find an > invalid entry or a folio from a different mapping or with a > unexpected index, we skip it and try again. 
> IOWs, there is no evidence of real runtime data corruption or loss
> from this pagecache livelock bug. We also haven't heard of any
> random file data corruption events since we've enabled large folios
> on XFS. Hence there really is no evidence to indicate that there is
> a large folio xarray lookup bug that results in data corruption in
> the existing code, and therefore there is no obvious reason for
> turning off the functionality we are already building significant
> new functionality on top of.

Right, understood.

However, the timeline of one of the PostgreSQL incidents (the first comment in the Bugzilla entry) still makes me feel uneasy:

  T0                   : one postgresql process blocked with a different
                         trace (not involving xas_load)
  T+a few minutes      : another process stuck with the relevant
                         xas_load/descend trace
  T+a few more minutes : other processes blocked in xas_load (this time
                         the systemd journal)
  T+14m                : the journal gets coredumped, likely due to some
                         watchdog; things go back to normal
  T+14h                : another postgres process gets fully stuck on the
                         xas_load/descend trace

I agree with your analysis as long as the process stays stuck in the infinite loop, but I've seen at least one instance where it appears to have left the loop at some point, and IMHO that would be a condition that could allow data corruption.

> It's been 10 months since I asked Christian to help isolate a
> reproducer so we can track this down. Nothing came from that, so
> we're still exactly where we were back in November 2023 -
> waiting for information on a way to reproduce this issue more
> reliably.

Sorry for dropping the ball on my side as well - I've learned my lesson about trying to handle this through Bugzilla. ;)

You mentioned above that this might involve the readahead code, and that is something I noticed before: the machines that carry the databases run with a higher readahead setting (1 MiB vs. 128 KiB elsewhere).

Also, I'm still puzzled by the one variation that seems to involve page faults and not XFS. I haven't yet seen a response on whether that is in fact interesting or not.

Christian

--
Christian Theune · ct@flyingcircus.io · +49 345 219401 0
Flying Circus Internet Operations GmbH · https://flyingcircus.io
Leipziger Str. 70/71 · 06108 Halle (Saale) · Deutschland
HR Stendal HRB 21169 · Geschäftsführer: Christian Theune, Christian Zagrodnick