From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 36DBBCE8D46 for ; Thu, 19 Sep 2024 04:36:21 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id BDDCA6B0085; Thu, 19 Sep 2024 00:36:20 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id B8EB76B0088; Thu, 19 Sep 2024 00:36:20 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id A568A6B0089; Thu, 19 Sep 2024 00:36:20 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0016.hostedemail.com [216.40.44.16]) by kanga.kvack.org (Postfix) with ESMTP id 8126D6B0085 for ; Thu, 19 Sep 2024 00:36:20 -0400 (EDT) Received: from smtpin01.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay01.hostedemail.com (Postfix) with ESMTP id 3FD8A1C516F for ; Thu, 19 Sep 2024 04:36:20 +0000 (UTC) X-FDA: 82580226120.01.79A1455 Received: from casper.infradead.org (casper.infradead.org [90.155.50.34]) by imf26.hostedemail.com (Postfix) with ESMTP id 27B7A140005 for ; Thu, 19 Sep 2024 04:36:16 +0000 (UTC) Authentication-Results: imf26.hostedemail.com; dkim=pass header.d=infradead.org header.s=casper.20170209 header.b=cYrmvAV9; spf=none (imf26.hostedemail.com: domain of willy@infradead.org has no SPF policy when checking 90.155.50.34) smtp.mailfrom=willy@infradead.org; dmarc=none ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1726720466; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=yBIDFtJtFLiiz28ldZ89Z7XDnfs0+KKtU9f88uP3b5k=; b=A9CwomNZzUBmA7LBOKefbmDgwbDpPH52Bn/o+V2HVFO3tx80NLMK9ROJ/syH7LCJw9W0Sy pMZGDxQr3Tp7EZhjhr90IUsnhwkT7ztEa6aj+DlcUcqZo82HjbYyPblhFPq8bLqsSRI8kW jyvDbFOuCjIcTnC1w3zKV+0htRr1atc= ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1726720466; a=rsa-sha256; cv=none; b=UKXoGf2zMaxcFg6/BXZFLDod9bVFD7omAL1Sxq0ph0sJnIThHBFp7RJGD7gFD0/LSrOBck r+gwR7BQWAsmlaXZk6rmvPK+b0QhJoIQGKhccQHqWcWRZqdemRauANy7c327//BckMfZT1 RqbclzeFX64DcbH7a//q/XMTILhX9pE= ARC-Authentication-Results: i=1; imf26.hostedemail.com; dkim=pass header.d=infradead.org header.s=casper.20170209 header.b=cYrmvAV9; spf=none (imf26.hostedemail.com: domain of willy@infradead.org has no SPF policy when checking 90.155.50.34) smtp.mailfrom=willy@infradead.org; dmarc=none DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=casper.20170209; h=In-Reply-To:Content-Type:MIME-Version: References:Message-ID:Subject:Cc:To:From:Date:Sender:Reply-To: Content-Transfer-Encoding:Content-ID:Content-Description; bh=yBIDFtJtFLiiz28ldZ89Z7XDnfs0+KKtU9f88uP3b5k=; b=cYrmvAV9dPjwaQCl4qdeehbhDV AkQrf7ebQb5Qa90kM3q7m66KALEh9cenOR90RYjnSm+H8k3wH0iAPARUMVAjIJ/gqS7OP+HTR+Lb6 XZsZLQ7teaTuRbDxU/gH0MkLl+AG4WkPFXCNrwks3mRUc/BDAn70vL1Hf9fglI9Zrg+qh5lCvQFNU a9/N/6KyoCvmTsQm+bzmmx3xlWFGm20T1SkWAdgajPrwgI2dIDaUdxcEkyHgQLygSNKKwh+zSAjVw dnDTumvpRbMeie4SipEvEYm8l8Ok9fPkWgD3pd0awvdR42uvLTB99voKGzA+DaL03GK7pMnEIAS8q XS9Wbh7w==; Received: from willy by casper.infradead.org with local (Exim 4.98 #2 (Red Hat Linux)) id 1sr8tp-00000006aAy-0WS8; Thu, 19 Sep 2024 04:36:13 +0000 Date: Thu, 19 Sep 2024 05:36:12 +0100 From: Matthew Wilcox To: Jens Axboe Cc: Linus Torvalds , Dave Chinner , Chris Mason , Christian Theune , linux-mm@kvack.org, "linux-xfs@vger.kernel.org" , linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, Daniel Dao , regressions@lists.linux.dev, regressions@leemhuis.info Subject: Re: Known and unfixed active data loss bug in MM + XFS with large folios since Dec 2021 (any kernel from 6.1 upwards) Message-ID: References: <52d45d22-e108-400e-a63f-f50ef1a0ae1a@meta.com> <5bee194c-9cd3-47e7-919b-9f352441f855@kernel.dk> <459beb1c-defd-4836-952c-589203b7005c@meta.com> <8697e349-d22f-43a0-8469-beb857eb44a1@kernel.dk> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <8697e349-d22f-43a0-8469-beb857eb44a1@kernel.dk> X-Rspamd-Queue-Id: 27B7A140005 X-Stat-Signature: hhspmcfdh5grt5ju4mef8edqhxgpufni X-Rspamd-Server: rspam09 X-Rspam-User: X-HE-Tag: 1726720576-217569 X-HE-Meta: U2FsdGVkX1+NRzyjf8J5skY/KWI52HGBav50c3MOMItgELyjkJAFqq0Na8vYXdW0KuGQjaYdy6224+fPjLVa47XN/vP7u1Yy0ITnBZtja5C8RXVpHGv9Hy7GU7XPfPFqt/fUwTxHGiGMU6D042Dq5ZjV8Ox89MOTOg0iqRbatDHwcgEFlRgJc7f6NHFMJDtuPkXzw+9Wo/VxCjV2Hr7Ydl+b3VKJp+UegzExRtv9O7HeGIElj4fVgLvRPzx/E6E8nRUi2sGWgPhlD3YroxaAISjxpV3WPgfKrDjh4yHOdnJWuyKtEIgVI0DeB0DqzVyn87empL/s4xZL/rAs0Y0Eft2Waq/388EAdKwzNXzZviFqaxvvX0FWlgXLYoGh/1U+FcxsnKTCb9mbSQ5nHbwcYERzf74hhBZ17arEdoqVDJNsL/uiuo9oT92/qPaRPPKOht3QbdpnNHgsh4K9edKR7LJEcoY8vLNfn5FSd7dKQliFunBV6M9OhPywKbS26Oz9wubfVZFpYWltW7zfdGEm6IyqOc8WBm3Jjj3ePKeJZEX43+/7IYD4Dqi4cEQ0vpJC66fFhk9BcTknyC1VkDyjipTI/tjH1Ci3vGtWoiPjzvE8TmL6HOqG3oF21YnInr0hn9AA7+JwXTVZMwn0D1mkFneCKZ3fPbMt3mj7adMYmtCtGv9B8AIhtVkLKATlmY51DxmZRMrm+dW4hjobeQofWvA9BxpHFznTc6yj4h2xXPEs0o9bkQjNqhEYaxuwhKQu1Ds9jKUKUsqh+IKu4kU/MN8l+1V0MA04A0N3mf9+cW9BbkRTCbbBVSFpZOIBvucnzTm044F9vZvWD1q7KzI1ti/hfuHGKJS8BDFNpNPAanPLAiBejV/5hWNUWjhVd9lw5X0JKTABkFXAq7GCo/389pAYUee/v7Mh9IXolaoSiySMI+gafs8AOut2SQWuXCk1DPBpSNYh7aWq9i7LzD1 96xA5OCg 3kq5aqmA5qMDQly2jVqeEo4B3u8Ycg3e6o0N4Og+h47Q/2qc2d+LPMu9FiiAI5DEbYXxD8IC1KnHhnNRwY44viQIK1vvVTzniI8e11TdNcoavIYZG+5Atl5B2H4enaEFnaaZFYYPqugq7fs+HE4xtkVMV4SWfaWCku6hBDtJqDVSgNDXTVPJbIqqvf/KeTBuXw0qxMejSD2N6CdB3aBa1LFJ6HNCqw0Mdi+8LYavpmFTegF3cqLczsKLlkfeLPiXBCP15DRzbSvZZkLXhUYOhdg4Wm07OyZk2MQjyQS932iUzORkBEjZDf99mbqPjYhj018NkkIKz1sUOhn4= X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On Wed, Sep 18, 2024 at 09:38:41PM -0600, Jens Axboe wrote: > On 9/18/24 9:12 PM, Linus Torvalds wrote: > > On Thu, 19 Sept 2024 at 05:03, Linus Torvalds > > wrote: > >> > >> I think we should just do the simple one-liner of adding a > >> "xas_reset()" to after doing xas_split_alloc() (or do it inside the > >> xas_split_alloc()). > > > > .. and obviously that should be actually *verified* to fix the issue > > not just with the test-case that Chris and Jens have been using, but > > on Christian's real PostgreSQL load. > > > > Christian? > > > > Note that the xas_reset() needs to be done after the check for errors > > - or like Willy suggested, xas_split_alloc() needs to be re-organized. > > > > So the simplest fix is probably to just add a > > > > if (xas_error(&xas)) > > goto error; > > } > > + xas_reset(&xas); > > xas_lock_irq(&xas); > > xas_for_each_conflict(&xas, entry) { > > old = entry; > > > > in __filemap_add_folio() in mm/filemap.c > > > > (The above is obviously a whitespace-damaged pseudo-patch for the > > pre-6758c1128ceb state. I don't actually carry a stable tree around on > > my laptop, but I hope it's clear enough what I'm rambling about) > > I kicked off a quick run with this on 6.9 with my debug patch as well, > and it still fails for me... I'll double check everything is sane. For > reference, below is the 6.9 filemap patch. > > diff --git a/mm/filemap.c b/mm/filemap.c > index 30de18c4fd28..88093e2b7256 100644 > --- a/mm/filemap.c > +++ b/mm/filemap.c > @@ -883,6 +883,7 @@ noinline int __filemap_add_folio(struct address_space *mapping, > if (order > folio_order(folio)) > xas_split_alloc(&xas, xa_load(xas.xa, xas.xa_index), > order, gfp); > + xas_reset(&xas); > xas_lock_irq(&xas); > xas_for_each_conflict(&xas, entry) { > old = entry; My brain is still mushy, but I think there is still a problem (both with the simple fix for 6.9 and indeed with 6.10). For splitting a folio, we have the folio locked, so we know it's not going anywhere. The tree may get rearranged around it while we don't have the xa_lock, but we're somewhat protected. In this case we're splitting something that was, at one point, a shadow entry. There's no struct there to lock. So I think we can have a situation where we replicate 'old' (in 6.10) or xa_load() (in 6.9) into the nodes we allocate in xas_split_alloc(). In 6.10, that's at least guaranteed to be a shadow entry, but in 6.9, it might already be a folio by this point because we've raced with something else also doing a split. Probably xas_split_alloc() needs to just do the alloc, like the name says, and drop the 'entry' argument. ICBW, but I think it explains what you're seeing? Maybe it doesn't?