Subject: Re: Known and unfixed active data loss bug in MM + XFS with large folios since Dec 2021 (any kernel from 6.1 upwards)
From: Christian Theune
Date: Fri, 13 Sep 2024 00:11:14 +0200
To: Matthew Wilcox
Cc: linux-mm@kvack.org, linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, torvalds@linux-foundation.org, axboe@kernel.dk, Daniel Dao, Dave Chinner, clm@meta.com, regressions@lists.linux.dev, regressions@leemhuis.info
Message-Id: <969BEE75-323B-4331-8E09-60AA3E662EC6@flyingcircus.io>

Hi Matthew,

> On 12. Sep 2024, at 23:55, Matthew Wilcox wrote:
>
> On Thu, Sep 12, 2024 at 11:18:34PM +0200, Christian Theune wrote:
>> This bug is very hard to reproduce but has been known to exist as a
>> “fluke” for a while already. I have invested a number of days trying
>> to come up with workloads to trigger it quicker than that stochastic
>> “once every few weeks in a fleet of 1.5k machines”, but it eludes
>> me so far. I know that this also affects Facebook/Meta as well as
>> Cloudflare who are both running newer kernels (at least 6.1, 6.6,
>> and 6.9) with the above mentioned patch reverted. I’m from a much
>> smaller company and seeing that those guys are running with this patch
>> reverted (that now makes their kernel basically an untested/unsupported
>> deviation from the mainline) smells like desperation. I’m with a
>> much smaller team and company and I’m wondering why this isn’t
>> tackled more urgently from more hands to make it shallow (hopefully).
>
> This passive-aggressive nonsense is deeply aggravating. I've known
> about this bug for much longer, but like you I am utterly unable to
> reproduce it. I've spent months looking for the bug, and I cannot.

I’m sorry. I’ve honestly tried my best to not make this message personally injuring to anybody involved while trying to also communicate the seriousness of this issue that we’re stuck with. Apparently I failed.

As I’m not a kernel developer I tried to stick to describing the issue and am not sure what strategies would typically need to be applied when individual efforts fail.

I’m not sure why it’s nonsense, though.

Kind regards,
Christian Theune

-- 
Christian Theune · ct@flyingcircus.io · +49 345 219401 0
Flying Circus Internet Operations GmbH · https://flyingcircus.io
Leipziger Str. 70/71 · 06108 Halle (Saale) · Germany
HR Stendal HRB 21169 · Managing Directors: Christian Theune, Christian Zagrodnick