Date: Fri, 13 Sep 2024 04:44:10 +0100
From: Matthew Wilcox
To: Linus Torvalds
Cc: Jens Axboe, Christian Theune, linux-mm@kvack.org,
 linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org,
 linux-kernel@vger.kernel.org, Daniel Dao, Dave Chinner, clm@meta.com,
 regressions@lists.linux.dev, regressions@leemhuis.info
Subject: Re: Known and unfixed active data loss bug in MM + XFS with large
 folios since Dec 2021 (any kernel from 6.1 upwards)
References: <0fc8c3e7-e5d2-40db-8661-8c7199f84e43@kernel.dk>
 <415b0e1a-c92f-4bf9-bccd-613f903f3c75@kernel.dk>

On Thu, Sep 12, 2024 at 03:56:17PM -0700, Linus Torvalds wrote:
> On Thu, 12 Sept 2024 at 15:30, Jens Axboe wrote:
> >
> > It might be an iomap
> > thing... Other file systems do use it, but to
> > various degrees, and XFS is definitely the primary user.
>
> I have to say, I looked at the iomap code, and it's disgusting.

I'm not going to comment on this because I think it's unrelated to
the problem.

We have reports of bad entries being returned from page cache lookups.
Sometimes they're pages which have been freed, sometimes they're pages
which are very definitely in use by a different filesystem.

I think that's what the underlying problem is here (or else we have two
problems). I'm not convinced that it's necessarily related to large
folios, but it's certainly easier to reproduce with large folios.

I've looked at a number of explanations for this. Could it be a page
that's being freed without being removed from the xarray? We seem to
have debug that would trigger in that case, so I don't think so.

Could it be a page with a messed-up refcount? Again, I think we'd
notice the VM_BUG_ON_PAGE() in put_page_testzero(), so I don't think
it's that either.

My current best guess is that we have an xarray node with a stray
pointer in it; that the node is freed from one xarray, allocated to a
different xarray, but not properly cleared.

But I can't reproduce the problem, so that's pure speculation on my part.
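
To make the stray-pointer guess above concrete, here is a minimal
userspace sketch of that failure mode. It is a toy model, not kernel
code: toy_node, toy_tree, toy_pool and every function below are
invented for illustration and do not correspond to xarray internals.
The model also assumes an allocator that hands nodes back out without
zeroing them, trusting that they were emptied before being freed.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_SLOTS 4   /* slots per node (hypothetical, tiny on purpose) */
#define TOY_POOL  2   /* nodes in the shared pool */

struct toy_node {
    void *slots[TOY_SLOTS];
    int in_use;
};

struct toy_tree {
    struct toy_node *root;   /* single-level tree: one node or none */
};

static struct toy_node toy_pool[TOY_POOL];

/* Hand out the first free node WITHOUT zeroing it. The model assumes the
 * allocator trusts that nodes were emptied before they were freed. */
static struct toy_node *toy_node_alloc(void)
{
    for (size_t i = 0; i < TOY_POOL; i++) {
        if (!toy_pool[i].in_use) {
            toy_pool[i].in_use = 1;
            return &toy_pool[i];
        }
    }
    return NULL;
}

static void toy_store(struct toy_tree *tree, size_t index, void *entry)
{
    assert(index < TOY_SLOTS);
    if (!tree->root)
        tree->root = toy_node_alloc();
    assert(tree->root != NULL);   /* an exhausted pool is a model bug */
    tree->root->slots[index] = entry;
}

static void *toy_load(const struct toy_tree *tree, size_t index)
{
    if (!tree->root || index >= TOY_SLOTS)
        return NULL;
    return tree->root->slots[index];
}

/* Tear down a tree. With clear_slots == 0 this models the suspected bug:
 * the node goes back to the pool with an old pointer still in it. */
static void toy_destroy(struct toy_tree *tree, int clear_slots)
{
    if (!tree->root)
        return;
    if (clear_slots)
        memset(tree->root->slots, 0, sizeof(tree->root->slots));
    tree->root->in_use = 0;
    tree->root = NULL;
}

/* Returns 1 if a lookup in tree B hands back an entry that was only ever
 * stored in tree A, 0 otherwise. */
static int toy_stray_lookup(int clear_slots)
{
    static int page_a, page_b;   /* stand-ins for pages of two files */
    struct toy_tree a = { NULL }, b = { NULL };

    memset(toy_pool, 0, sizeof(toy_pool));   /* fresh pool for each run */

    toy_store(&a, 2, &page_a);
    toy_destroy(&a, clear_slots);

    toy_store(&b, 0, &page_b);               /* reuses A's old node */
    return toy_load(&b, 2) == &page_a;       /* index 2 was never set in B */
}
```

In this model, toy_stray_lookup(0) returns 1: tree B's lookup at an
index it never stored to yields tree A's old entry, which is the shape
of "a page very definitely in use by a different filesystem".
toy_stray_lookup(1), where the node is cleared before it is freed,
returns 0. Whether anything like this actually happens in the kernel
remains unconfirmed.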