Subject: Re: Known and unfixed active data loss bug in MM + XFS with large folios since Dec 2021 (any kernel from 6.1 upwards)
From: Christian Theune <ct@flyingcircus.io>
Date: Mon, 16 Sep 2024 09:14:45 +0200
To: Dave Chinner
Cc: Linus Torvalds, Jens Axboe, Matthew Wilcox, linux-mm@kvack.org,
"linux-xfs@vger.kernel.org" , linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, Daniel Dao , clm@meta.com, regressions@lists.linux.dev, regressions@leemhuis.info Content-Transfer-Encoding: quoted-printable Message-Id: <686D222E-3CA3-49BE-A9E5-E5E2F5AFD5DA@flyingcircus.io> References: <0fc8c3e7-e5d2-40db-8661-8c7199f84e43@kernel.dk> To: Dave Chinner X-Stat-Signature: amn5dpe61uoz1maraoss8nqew9rpf7o3 X-Rspamd-Queue-Id: 025611C0007 X-Rspam-User: X-Rspamd-Server: rspam08 X-HE-Tag: 1726470912-243842 X-HE-Meta: U2FsdGVkX1/rQvC5l+obqa5BfAQYRMQCU6ACyYR5htnurs8WJxr4W0XYPpJ1q3m9gs11r1kGHR03E91y+UI9DAlNhlNYZYZ0wClHtpUHCcNi1MPEGJbjERfA718KcnbKYIHyLhcJTeAKG2UF/odGk+fvvUGBI6CmnMmsq9xlY1aS8o3z+sZu8qhsRrgi6cFo48OAFKpF08XXr44+JkH+bmceToDGBsesl9uC/AMwiGktJNkq6LYyt+BxErFhdsNd5Yfjqofsp0kf3+E62fTF5nJ7HUfC3PnWEy1YUwVtrhmlaAbQtUa+iZnDTBFJk1dPxtRxCNKLChbnCVXD43Hr9yLJLGdmPev/I2OTJUr9PH3s5CrhrV68BkOl177niB/p7+y3CR/B9fBia/l+/5jv+3YLy5TGW67vm7QqETkahRvxPouzW5MSRrvowja69pxcxq2hKnJt9IXIZ6O6v5pxZLQlSrzP6Jh5R3npj/e6DSGi8x5uzBvRzqKLoqGGfwbDh8E88Xs6wlvZf6as4AKJ+PAaZ8wnGrmhlkFr/hIYUPxaE5d06kBzDhJjcv0GpSMMKDZlAAx8WsM3NnlktIu5Lkyy2BA/sVdbrbrNCG2yPOdaJU0ACwdgXJdejH0t0H9kSK4HkYz2u0YUKKH6+mqjoHG+0XO2p4//DhD7a70Tt9AzYU0ozPI+kYtVVWDS26BTvxiwDUxT8C/pL042O73bl9IfPzaPtEgysjNn5CyvrMTExqlkDSg43eGfO8p0rH3FMQ/DMbOQ33KaUiED5YTkTl1DQlJMljaKe4kByi8gX15Cyt6MxjXDRPmKBbYFgsBAOqpGx/LPQ4HSR49y2say5te5aqRjVKNXp9UhXSbHQpFbuMpTNStb0MXTGkFOak8xOrGWFXkO/baFsyCieXR3pyvzSMIyIPaN89+0172wjSeSI34I3Q3FbDR4cyj03b5ZEMvTK3oFsAOD7yxMQKE 5CC6BtbI HAFVPTWbo2WxGrdQqlRCNSYe6t5rRQ44GkPD5b6nV2XoJWpslXdNZo02Zn/6cLy2LIbbKuMohXEGdp0c3Nubyiyrh937wZKAOt5eEyNFbXcdhbWzlahZdVMXvOBoOwXuSfpIYgd33mldoA4knUfWqI0MOsOOj5m2tc9zT51z8y13gnj+3fp5JprmFBhudnPlXbVJ/urq0eLXmg3c7DX5M7TZQr9qOXsgFXdvJyaVExnU11ivea5mv3ml9pEtPbob4gZR6njQD8WxJvZGEBQ7PDyDTMefCxnGHBQsDjNHLHZltJDQJ2EGTVAQiX6KVt/Satdnb+6MmqoaJjK4= X-Bogosity: Ham, tests=bogofilter, spamicity=0.000007, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: > On 16. Sep 2024, at 02:00, Dave Chinner wrote: >=20 > On Thu, Sep 12, 2024 at 03:25:50PM -0700, Linus Torvalds wrote: >> On Thu, 12 Sept 2024 at 15:12, Jens Axboe wrote: >> Honestly, the fact that it hasn't been reverted after apparently >> people knowing about it for months is a bit shocking to me. = Filesystem >> people tend to take unknown corruption issues as a big deal. What >> makes this so special? Is it because the XFS people don't consider it >> an XFS issue, so... >=20 > I don't think this is a data corruption/loss problem - it certainly > hasn't ever appeared that way to me. The "data loss" appeared to be > in incomplete postgres dump files after the system was rebooted and > this is exactly what would happen when you randomly crash the > system. i.e. dirty data in memory is lost, and application data > being written at the time is in an inconsistent state after the > system recovers. IOWs, there was no clear evidence of actual data > corruption occuring, and data loss is definitely expected when the > page cache iteration hangs and the system is forcibly rebooted > without being able to sync or unmount the filesystems=E2=80=A6 > All the hangs seem to be caused by folio lookup getting stuck > on a rogue xarray entry in truncate or readahead. If we find an > invalid entry or a folio from a different mapping or with a > unexpected index, we skip it and try again. 
> IOWs, there is no evidence of real runtime data corruption or loss
> from this pagecache livelock bug. We also haven't heard of any
> random file data corruption events since we've enabled large folios
> on XFS. Hence there really is no evidence to indicate that there is
> a large folio xarray lookup bug that results in data corruption in
> the existing code, and therefore there is no obvious reason for
> turning off the functionality we are already building significant
> new functionality on top of.

Right, understood.

However, the timeline of one of the PostgreSQL incidents (the first comment in the Bugzilla entry) still makes me feel uneasy:

  T0                   : one postgresql process blocked with a different
                         trace (not involving xas_load)
  T+a few minutes      : another process stuck with the relevant
                         xas_load/descend trace
  T+a few more minutes : other processes blocked in xas_load (this time
                         the systemd journal)
  T+14m                : the journal gets coredumped, likely due to some
                         watchdog; things go back to normal
  T+14h                : another postgres process gets fully stuck on the
                         xas_load/descend trace

I agree with your analysis as long as the process stays stuck in the infinite loop, but I've seen at least one instance where it appears to have left the loop at some point, and IMHO that would be a condition that could allow data corruption.

> It's been 10 months since I asked Christian to help isolate a
> reproducer so we can track this down. Nothing came from that, so
> we're still exactly where we were back in November 2023 -
> waiting for information on a way to reproduce this issue more
> reliably.

Sorry for dropping the ball on my side as well - I've learned my lesson about trying to handle this through Bugzilla. ;)

You mentioned above that this might involve the readahead code, and that is something I noticed before: the machines that carry the databases run with a higher readahead setting (1 MiB vs. 128 KiB elsewhere).

Also, I'm still puzzled by the one variation that seems to involve page faults and not XFS. I haven't yet seen a response on whether that is in fact interesting or not.

Christian

--
Christian Theune · ct@flyingcircus.io · +49 345 219401 0
Flying Circus Internet Operations GmbH · https://flyingcircus.io
Leipziger Str. 70/71 · 06108 Halle (Saale) · Deutschland
HR Stendal HRB 21169 · Geschäftsführer: Christian Theune, Christian Zagrodnick