Date: Fri, 13 Sep 2024 19:15:26 +0100
From: Matthew Wilcox
To: Chris Mason
Cc: Linus Torvalds, Jens Axboe, Christian Theune, linux-mm@kvack.org, "linux-xfs@vger.kernel.org", linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, Daniel Dao, Dave Chinner, regressions@lists.linux.dev, regressions@leemhuis.info
Subject: Re: Known and unfixed active data loss bug in MM + XFS with large folios since Dec 2021 (any kernel from 6.1 upwards)
References: <0fc8c3e7-e5d2-40db-8661-8c7199f84e43@kernel.dk>

On Fri, Sep 13, 2024 at 12:33:49PM -0400, Chris Mason wrote:
> > If you could get the precise index numbers, that would be an important
> > clue.
> > It would be interesting to know the index number in the xarray
> > where the folio was found rather than folio->index (as I suspect that
> > folio->index is completely bogus because folio->mapping is wrong).
> > But gathering that info is going to be hard.
>
> This particular debug session was late at night while we were urgently
> trying to roll out some NFS features. I didn't really save many of the
> details because my plan was to reproduce it and make a full bug report.
>
> Also, I was explaining the details to people in workplace chat, which is
> wildly bad at rendering long lines of structured text, especially when
> half the people in the chat are on a mobile device.
>
> You're probably wondering why all of that is important...what I'm really
> trying to say is that I've attached a screenshot of the debugging output.
>
> It came from an older drgn script, where I'm still clinging to "radix",
> and you probably can't trust the string representation of the page flags
> because I wasn't yet using Omar's helpers and may have hard coded them
> from an older kernel.

That's all _fine_. This is enormously helpful.

First, we see the same folio appear three times. I think that's
particularly significant. Modulo 64 (the number of entries per node),
the indices at which the bad folio is found are 16, 32 and 48. So I
think the _current_ order of the folio is 4, but at the time the folio
was put in the xarray, it was order 6. Except ... at order-6 we elide
a level of the xarray. So we shouldn't be able to see this. Hm.

Oh! I think split is the key. Let's say we have an order-6 (or larger)
folio. And we call split_huge_page() (whatever it's called in your
kernel version). That calls xas_split_alloc() followed by xas_split().
xas_split_alloc() puts the entry in node->slots[0] and initialises
node->slots[1..XA_CHUNK_SIZE-1] to a sibling entry.

Now, if we do allocate those nodes in xas_split_alloc(), we're supposed
to free them with radix_tree_node_rcu_free(), which zeroes all the slots.
But what if we don't, somehow? (This is my best current theory.) Then
we allocate the node to a different tree, but any time we try to look
something up, unless it's the index for which we allocated the node, we
find a sibling entry and it points to a stale pointer.

I'm going to think on this a bit more, but so far this is all good
evidence for my leading theory.
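
For anyone following the arithmetic, here is a minimal Python sketch of the
reasoning above. It is not taken from the drgn script, and the page-cache
indices in it are made up for illustration (the real ones are only in the
screenshot). It assumes 64 slots per node, and the helper names are
hypothetical. node_shift_and_sibs() is meant to mirror, as far as I
understand it, how xas_set_order() derives the node shift and sibling count
for a multi-index entry.

```python
from functools import reduce
from math import gcd

XA_CHUNK_SHIFT = 6                      # 64 slots per node, as stated above
XA_CHUNK_SIZE = 1 << XA_CHUNK_SHIFT
XA_CHUNK_MASK = XA_CHUNK_SIZE - 1


def slot_offset(index: int) -> int:
    """Slot within the bottom-level node that a page-cache index maps to."""
    return index & XA_CHUNK_MASK


def order_from_offsets(offsets) -> int:
    """Guess a folio order from the slot offsets at which it was seen.

    Assumption: the offsets share a power-of-two stride, and that stride
    is 1 << order. For other inputs this rounds down to a power of two.
    """
    stride = reduce(gcd, offsets, XA_CHUNK_SIZE)
    return stride.bit_length() - 1


def node_shift_and_sibs(order: int):
    """Shift of the node holding an entry of this order, and its sibling count.

    Assumption: same arithmetic as xas_set_order() in the kernel.
    """
    shift = order - (order % XA_CHUNK_SHIFT)
    sibs = (1 << (order % XA_CHUNK_SHIFT)) - 1
    return shift, sibs


# Hypothetical indices, chosen only so that they land on slots 16, 32, 48.
observed = [0x1050, 0x1060, 0x1070]
offsets = [slot_offset(i) for i in observed]

print(offsets)                       # [16, 32, 48]
print(order_from_offsets(offsets))   # 4: stride of 16 slots
print(node_shift_and_sibs(4))        # (0, 15): lives in a bottom-level node
print(node_shift_and_sibs(6))        # (6, 0): one slot, one level up
```

The last two lines are the "Except ..." above: an order-4 entry takes a
canonical slot plus 15 siblings in a bottom-level node, while an order-6
entry takes a single slot in the node one level up, so there would normally
be no bottom-level node for it at all.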