Subject: Re: Known and unfixed active data loss bug in MM + XFS with large folios since Dec 2021 (any kernel from 6.1 upwards)
From: Christian Theune
Date: Thu, 19 Sep 2024 08:34:37 +0200
To: Linus Torvalds
Cc: Dave Chinner, Matthew Wilcox, Chris Mason, Jens Axboe, linux-mm@kvack.org, linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, Daniel Dao, regressions@lists.linux.dev, regressions@leemhuis.info
References: <74cceb67-2e71-455f-a4d4-6c5185ef775b@meta.com> <52d45d22-e108-400e-a63f-f50ef1a0ae1a@meta.com> <5bee194c-9cd3-47e7-919b-9f352441f855@kernel.dk> <459beb1c-defd-4836-952c-589203b7005c@meta.com>

> On 19. Sep 2024, at 05:12, Linus Torvalds wrote:
>
> On Thu, 19 Sept 2024 at 05:03, Linus Torvalds wrote:
>>
>> I think we should just do the simple one-liner of adding a
>> "xas_reset()" to after doing xas_split_alloc() (or do it inside the
>> xas_split_alloc()).
>
> .. and obviously that should be actually *verified* to fix the issue
> not just with the test-case that Chris and Jens have been using, but
> on Christian's real PostgreSQL load.
>
> Christian?

Happy to! I see there’s still some back and forth on the specific patches. Let me know which kernel version and which patches I should start trying out. I’m losing track while following the discussion.

In preparation: I’m wondering whether the known reproducer gives any insight into how I might force my load to trigger the bug more easily. Would it make sense to run the reproducer above alongside a running PostgreSQL benchmark?

Otherwise we’d likely only be getting insight after weeks of not seeing crashes …

Christian

-- 
Christian Theune · ct@flyingcircus.io · +49 345 219401 0
Flying Circus Internet Operations GmbH · https://flyingcircus.io
Leipziger Str. 70/71 · 06108 Halle (Saale) · Deutschland
HR Stendal HRB 21169 · Geschäftsführer: Christian Theune, Christian Zagrodnick