From: Jens Axboe
Date: Wed, 18 Sep 2024 21:38:41 -0600
Subject: Re: Known and unfixed active data loss bug in MM + XFS with large folios since Dec 2021 (any kernel from 6.1 upwards)
To: Linus Torvalds, Dave Chinner
Cc: Matthew Wilcox, Chris Mason, Christian Theune, linux-mm@kvack.org, "linux-xfs@vger.kernel.org", linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, Daniel Dao, regressions@lists.linux.dev, regressions@leemhuis.info
Message-ID: <8697e349-d22f-43a0-8469-beb857eb44a1@kernel.dk>
References: <74cceb67-2e71-455f-a4d4-6c5185ef775b@meta.com> <52d45d22-e108-400e-a63f-f50ef1a0ae1a@meta.com> <5bee194c-9cd3-47e7-919b-9f352441f855@kernel.dk> <459beb1c-defd-4836-952c-589203b7005c@meta.com>

On 9/18/24 9:12 PM, Linus Torvalds wrote:
> On Thu, 19 Sept 2024 at 05:03, Linus Torvalds wrote:
>>
>> I think we should just do the simple one-liner of adding a
>> "xas_reset()" to after doing xas_split_alloc() (or do it inside the
>> xas_split_alloc()).
>
> .. and obviously that should be actually *verified* to fix the issue
> not just with the test-case that Chris and Jens have been using, but
> on Christian's real PostgreSQL load.
>
> Christian?
>
> Note that the xas_reset() needs to be done after the check for errors
> - or like Willy suggested, xas_split_alloc() needs to be re-organized.
>
> So the simplest fix is probably to just add a
>
>                         if (xas_error(&xas))
>                                 goto error;
>                 }
> +               xas_reset(&xas);
>                 xas_lock_irq(&xas);
>                 xas_for_each_conflict(&xas, entry) {
>                         old = entry;
>
> in __filemap_add_folio() in mm/filemap.c
>
> (The above is obviously a whitespace-damaged pseudo-patch for the
> pre-6758c1128ceb state. I don't actually carry a stable tree around on
> my laptop, but I hope it's clear enough what I'm rambling about)

I kicked off a quick run with this on 6.9 with my debug patch as well,
and it still fails for me... I'll double check everything is sane. For
reference, below is the 6.9 filemap patch.

diff --git a/mm/filemap.c b/mm/filemap.c
index 30de18c4fd28..88093e2b7256 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -883,6 +883,7 @@ noinline int __filemap_add_folio(struct address_space *mapping,
 		if (order > folio_order(folio))
 			xas_split_alloc(&xas, xa_load(xas.xa, xas.xa_index),
 					order, gfp);
+		xas_reset(&xas);
 		xas_lock_irq(&xas);
 		xas_for_each_conflict(&xas, entry) {
 			old = entry;

-- 
Jens Axboe