From: Christian Theune <ct@flyingcircus.io>
Date: Thu, 12 Sep 2024 23:18:34 +0200
Subject: Known and unfixed active data loss bug in MM + XFS with large folios since Dec 2021 (any kernel from 6.1 upwards)
Cc: torvalds@linux-foundation.org, axboe@kernel.dk, Daniel Dao, Dave Chinner, willy@infradead.org,
    clm@meta.com, regressions@lists.linux.dev, regressions@leemhuis.info
To: linux-mm@kvack.org, linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org

Hello everyone,

I'd like to raise awareness of a bug causing data loss somewhere in the interaction between MM and XFS that appears to have existed since Dec 2021 (https://github.com/torvalds/linux/commit/6795801366da0cd3d99e27c37f020a8f16714886).

We started encountering this bug when upgrading to 6.1 around June 2023, and we have had at least 16 instances of data loss in a fleet of 1.5k VMs. The bug is very hard to reproduce, but has been known to exist as a "fluke" for a while already. I have invested a number of days trying to come up with workloads that trigger it faster than the stochastic "once every few weeks in a fleet of 1.5k machines", but it has eluded me so far. I know that this also affects Facebook/Meta as well as Cloudflare, who are both running newer kernels (at least 6.1, 6.6, and 6.9) with the above-mentioned patch reverted (I sketch my understanding of that revert below). I'm from a much smaller team and company, and seeing that those guys run with this patch reverted - which makes their kernels basically untested/unsupported deviations from mainline - smells like desperation. I'm wondering why this isn't being tackled more urgently, with more hands to (hopefully) make the bug shallow.

The issue appears to happen mostly on nodes running databases or other storage-oriented loads. In our case we see it with PostgreSQL and MySQL; Cloudflare IIRC saw it with a RocksDB load, and Meta is talking about an nfsd load.
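Regarding that revert: as far as I understand it (and I'm happy to be corrected, since I'm not a kernel developer), the 2021 commit essentially opts every XFS inode's page-cache mapping into large (multi-page) folios when the inode is set up, and the downstream revert that Meta and Cloudflare reportedly carry simply drops that opt-in again. A rough sketch of my understanding - not the literal upstream diff:

	/*
	 * Sketch only: my reading of commit 6795801366da ("xfs: Support
	 * large folios").  When XFS sets up an inode, it marks the inode's
	 * page-cache mapping as eligible for large folios; the downstream
	 * revert removes this call, so the page cache for XFS files falls
	 * back to order-0 (single-page) folios.
	 */
	mapping_set_large_folios(VFS_I(ip)->i_mapping);

If that reading is wrong, the larger point still stands: the mitigation those companies carry is tiny, while the underlying page-cache issue remains unfixed upstream.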
I suspect that low-memory (but not OOM-low) pressure, and possibly swap activity, increases the chance of triggering it - but I might be completely wrong about that suspicion.

There is a bug report I started back then: https://bugzilla.kernel.org/show_bug.cgi?id=217572 and there have been discussions on the XFS list: https://lore.kernel.org/lkml/CA+wXwBS7YTHUmxGP3JrhcKMnYQJcd6=7HE+E1v-guk01L2K3Zw@mail.gmail.com/T/ - but ultimately this didn't receive sufficient interest to keep it moving forward and I ran out of steam. Unfortunately we can't stay stuck on 5.15 forever, and other kernel developers correctly keep pointing out that we should be updating - but that isn't an option as long as this time bomb exists.

Jens pointed out that Meta's findings and their notes on the revert included: "When testing nfsd on top of v5.19, we hit lockups in filemap_read(). These ended up being because the xarray for the files being read had pages from other files mixed in."

I know and admire XFS for the very high standards its developers hold regarding testing and avoiding data loss, but ultimately that doesn't matter if we're going to be stuck with this bug forever.

I'm able to help fund efforts, help create a reproducer, generally donate my time (I'm not a kernel developer myself), and even provide access to machines that saw the crash (but don't carry customer data) - yet I'm not making any progress or getting any traction here.

Jens encouraged me to raise visibility in this way - so that's what I'm trying to do here. Please help.

In appreciation of all the hard work everyone is putting in, and with hugs and love,
Christian

-- 
Christian Theune · ct@flyingcircus.io · +49 345 219401 0
Flying Circus Internet Operations GmbH · https://flyingcircus.io
Leipziger Str. 70/71 · 06108 Halle (Saale) · Deutschland
HR Stendal HRB 21169 · Geschäftsführer: Christian Theune, Christian Zagrodnick