Date: Wed, 18 Sep 2024 14:34:57 +0100
From: Matthew Wilcox
To: Chris Mason
Cc: Jens Axboe, Linus Torvalds, Dave Chinner, Christian Theune, linux-mm@kvack.org, linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, Daniel Dao, regressions@lists.linux.dev, regressions@leemhuis.info
Subject: Re: Known and unfixed active data loss bug in MM + XFS with large folios since Dec 2021 (any kernel from 6.1 upwards)
References: <0fc8c3e7-e5d2-40db-8661-8c7199f84e43@kernel.dk> <74cceb67-2e71-455f-a4d4-6c5185ef775b@meta.com> <52d45d22-e108-400e-a63f-f50ef1a0ae1a@meta.com> <5bee194c-9cd3-47e7-919b-9f352441f855@kernel.dk> <459beb1c-defd-4836-952c-589203b7005c@meta.com>
In-Reply-To: <459beb1c-defd-4836-952c-589203b7005c@meta.com>

On Wed, Sep 18, 2024 at 11:28:52AM +0200, Chris Mason wrote:
> I think the bug was in __filemap_add_folio()'s usage of xarray_split_alloc()
> and the tree changing before taking the lock.  It's just a guess, but that
> was always my biggest suspect.

Oh god, that's it.

there should have been an xas_reset() after calling xas_split_alloc().

and 6758c1128ceb calls xas_reset() after calling xas_split_alloc().

i wonder if xas_split_alloc() should call xas_reset() to prevent this
from ever being a problem again?
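
For readers following the thread, here is a toy userspace sketch of the kind of failure being discussed. It assumes the problem is a cursor that keeps a cached tree position across a window in which the tree can change. This is not kernel code: toy_tree, toy_cursor, toy_walk(), toy_reset() and store_after_tree_change() are hypothetical names made up for illustration, and toy_reset() only mimics the role the thread gives to xas_reset(). It is not the actual xarray implementation, nor the fix in 6758c1128ceb.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: a "tree" with a single replaceable node. */
struct toy_node {
	int value;
};

struct toy_tree {
	struct toy_node *root;
};

/* The cursor caches the node it last walked to. */
struct toy_cursor {
	struct toy_tree *tree;
	struct toy_node *cached;	/* NULL means "walk again from the root" */
};

/* Hypothetical stand-in for the role of xas_reset(): forget the position. */
static void toy_reset(struct toy_cursor *c)
{
	c->cached = NULL;
}

/* Walk from the root only if no position is cached. */
static struct toy_node *toy_walk(struct toy_cursor *c)
{
	if (!c->cached)
		c->cached = c->tree->root;
	return c->cached;
}

static void toy_store(struct toy_cursor *c, int v)
{
	toy_walk(c)->value = v;
}

/*
 * Returns the value visible in the live tree after the store.
 * do_reset == 0: the store goes through the stale cached node and is lost.
 * do_reset != 0: the cursor re-walks and the store reaches the live node.
 */
static int store_after_tree_change(int do_reset)
{
	struct toy_node old_node = { 0 }, new_node = { 0 };
	struct toy_tree tree = { &old_node };
	struct toy_cursor cur = { &tree, NULL };

	toy_walk(&cur);		/* unlocked preparation caches old_node */
	tree.root = &new_node;	/* the tree changes before the lock is taken */
	if (do_reset)
		toy_reset(&cur);	/* drop the stale position */
	toy_store(&cur, 42);	/* the store that would run under the lock */
	return tree.root->value;
}
```

Calling store_after_tree_change(0) models the missing reset: the store lands in a node the tree no longer points at. Calling store_after_tree_change(1) models resetting the cursor before the locked store, so the walk restarts from the current root.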