Date: Tue, 17 Sep 2024 14:25:17 +0100
From: Matthew Wilcox
To: Chris Mason
Cc: Linus Torvalds, Dave Chinner, Jens Axboe, Christian Theune,
	linux-mm@kvack.org, linux-xfs@vger.kernel.org,
	linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
	Daniel Dao, regressions@lists.linux.dev, regressions@leemhuis.info
Subject: Re: Known and unfixed active data loss bug in MM + XFS with large
	folios since Dec 2021 (any kernel from 6.1 upwards)
In-Reply-To: <52d45d22-e108-400e-a63f-f50ef1a0ae1a@meta.com>

On Tue, Sep 17, 2024 at 01:13:05PM +0200, Chris Mason wrote:
> On 9/17/24 5:32 AM, Matthew Wilcox wrote:
> > On Mon, Sep 16, 2024 at 10:47:10AM +0200, Chris Mason wrote:
> >> I've got a bunch of assertions around incorrect folio->mapping and I'm
> >> trying to bash on the ENOMEM for readahead case. There's a GFP_NOWARN
> >> on those, and our systems do run pretty short on ram, so it feels right
> >> at least. We'll see.
> >
> > I've been running with some variant of this patch the whole way across
> > the Atlantic, and not hit any problems. But maybe with the right
> > workload ...?
> >
> > There are two things being tested here. One is whether we have a
> > cross-linked node (ie a node that's in two trees at the same time).
> > The other is whether the slab allocator is giving us a node that already
> > contains non-NULL entries.
> >
> > If you could throw this on top of your kernel, we might stand a chance
> > of catching the problem sooner. If it is one of these problems and not
> > something weirder.
> >
>
> This fires in roughly 10 seconds for me on top of v6.11. Since array seems
> to always be 1, I'm not sure if the assertion is right, but hopefully you
> can trigger yourself.

Whoops.

$ git grep XA_RCU_FREE
lib/xarray.c:#define XA_RCU_FREE ((struct xarray *)1)
lib/xarray.c: node->array = XA_RCU_FREE;

so you walked into a node which is currently being freed by RCU. Which
isn't a problem, of course. I don't know why I do that; it doesn't seem
like anyone tests it. The jetlag is seriously kicking in right now,
so I'm going to refrain from saying anything more because it probably
won't be coherent.
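
The debugging patch itself is not quoted in this message, so what follows is only a userspace sketch of the idea, not the kernel code. It shows how a "cross-linked node" check might skip nodes whose array pointer holds the RCU-free sentinel. The `_sketch` types, the `XA_RCU_FREE_SKETCH` macro and all three helper functions are hypothetical stand-ins. Only the sentinel value `(struct xarray *)1` comes from the `git grep` output above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; these are NOT the kernel's definitions. */
struct xarray_sketch {
	int unused;
};

struct node_sketch {
	struct xarray_sketch *array;	/* owning array, or the sentinel below */
	void *slots[4];
};

/* Same value as the XA_RCU_FREE define quoted from lib/xarray.c. */
#define XA_RCU_FREE_SKETCH ((struct xarray_sketch *)1)

/* True if the node has been marked as waiting for its RCU free. */
bool node_is_rcu_free(const struct node_sketch *node)
{
	return node->array == XA_RCU_FREE_SKETCH;
}

/*
 * True if the node claims an owner other than the array being walked.
 * A node that is waiting for its RCU free is not counted as cross-linked.
 */
bool node_is_cross_linked(const struct node_sketch *node,
			  const struct xarray_sketch *expected)
{
	if (node_is_rcu_free(node))
		return false;
	return node->array != expected;
}

/* Assertion form of the check; an assumption about what such a test could look like. */
void assert_node_owner(const struct node_sketch *node,
		       const struct xarray_sketch *expected)
{
	assert(!node_is_cross_linked(node, expected));
}
```

With a check shaped like this, a walker that finds `array == 1` would pass rather than fire, which matches the observation that hitting a node being freed by RCU is not itself a problem.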