From: Christian Theune <ct@flyingcircus.io>
Date: Thu, 12 Sep 2024 23:18:34 +0200
Subject: Known and unfixed active data loss bug in MM + XFS with large folios since Dec 2021 (any kernel from 6.1 upwards)
Cc: torvalds@linux-foundation.org, axboe@kernel.dk, Daniel Dao, Dave Chinner, willy@infradead.org,
    clm@meta.com, regressions@lists.linux.dev, regressions@leemhuis.info
To: linux-mm@kvack.org, linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org

Hello everyone,

I'd like to raise awareness of a bug causing data loss somewhere in the interaction between MM and XFS that appears to have existed since Dec 2021 (https://github.com/torvalds/linux/commit/6795801366da0cd3d99e27c37f020a8f16714886).

We started encountering this bug when upgrading to 6.1 around June 2023, and we have had at least 16 instances of data loss in a fleet of 1.5k VMs. The bug is very hard to reproduce, but has been known to exist as a "fluke" for a while already. I have invested a number of days trying to come up with workloads that trigger it faster than the stochastic "once every few weeks in a fleet of 1.5k machines", but it has eluded me so far. I know that this also affects Facebook/Meta as well as Cloudflare, who are both running newer kernels (at least 6.1, 6.6, and 6.9) with the above-mentioned patch reverted (I sketch my understanding of that revert below). I'm from a much smaller team and company, and seeing that those guys run with this patch reverted - which makes their kernels basically untested/unsupported deviations from mainline - smells like desperation. I'm wondering why this isn't being tackled more urgently, with more hands to (hopefully) make the bug shallow.

The issue appears to happen mostly on nodes running databases or other storage-oriented loads. In our case we see it with PostgreSQL and MySQL; Cloudflare IIRC saw it with a RocksDB load, and Meta is talking about an nfsd load.
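Regarding that revert: as far as I understand it (and I'm happy to be corrected, since I'm not a kernel developer), the 2021 commit essentially opts every XFS inode's page-cache mapping into large (multi-page) folios when the inode is set up, and the downstream revert that Meta and Cloudflare reportedly carry simply drops that opt-in again. A rough sketch of my understanding - not the literal upstream diff:

	/*
	 * Sketch only: my reading of commit 6795801366da ("xfs: Support
	 * large folios").  When XFS sets up an inode, it marks the inode's
	 * page-cache mapping as eligible for large folios; the downstream
	 * revert removes this call, so the page cache for XFS files falls
	 * back to order-0 (single-page) folios.
	 */
	mapping_set_large_folios(VFS_I(ip)->i_mapping);

If that reading is wrong, the larger point still stands: the mitigation those companies carry is tiny, while the underlying page-cache issue remains unfixed upstream.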
I suspect that low-memory (but not OOM-low) pressure, and possibly swap activity, increases the chance of triggering it - but I might be completely wrong about that suspicion.

There is a bug report I started back then: https://bugzilla.kernel.org/show_bug.cgi?id=217572 and there have been discussions on the XFS list: https://lore.kernel.org/lkml/CA+wXwBS7YTHUmxGP3JrhcKMnYQJcd6=7HE+E1v-guk01L2K3Zw@mail.gmail.com/T/ - but ultimately this didn't receive sufficient interest to keep it moving forward and I ran out of steam. Unfortunately we can't stay stuck on 5.15 forever, and other kernel developers correctly keep pointing out that we should be updating - but that isn't an option as long as this time bomb exists.

Jens pointed out that Meta's findings and their notes on the revert included: "When testing nfsd on top of v5.19, we hit lockups in filemap_read(). These ended up being because the xarray for the files being read had pages from other files mixed in."

I know and admire XFS for the very high standards its developers hold regarding testing and avoiding data loss, but ultimately that doesn't matter if we're going to be stuck with this bug forever.

I'm able to help fund efforts, help create a reproducer, generally donate my time (I'm not a kernel developer myself), and even provide access to machines that saw the crash (but don't carry customer data) - yet I'm not making any progress or getting any traction here.

Jens encouraged me to raise visibility in this way - so that's what I'm trying to do here. Please help.

In appreciation of all the hard work everyone is putting in, and with hugs and love,
Christian

-- 
Christian Theune · ct@flyingcircus.io · +49 345 219401 0
Flying Circus Internet Operations GmbH · https://flyingcircus.io
Leipziger Str. 70/71 · 06108 Halle (Saale) · Deutschland
HR Stendal HRB 21169 · Geschäftsführer: Christian Theune, Christian Zagrodnick