Message-ID: 4ef7647f-80d1-48e5-9cff-9ab612054ff8@kernel.dk
Date: Wed, 18 Sep 2024 22:42:10 -0600
From: Jens Axboe
Subject: Re: Known and unfixed active data loss bug in MM + XFS with large folios since Dec 2021 (any kernel from 6.1 upwards)
To: Linus Torvalds
Cc: Dave Chinner, Matthew Wilcox, Chris Mason, Christian Theune, linux-mm@kvack.org, linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, Daniel Dao, regressions@lists.linux.dev, regressions@leemhuis.info

On 9/18/24 10:32 PM, Linus Torvalds wrote:
> On Thu, 19 Sept 2024 at 05:38, Jens Axboe wrote:
>>
>> I kicked off a quick run with this on 6.9 with my debug patch as well,
>> and it still fails for me... I'll double check everything is sane. For
>> reference, below is the 6.9 filemap patch.

Confirmed with a few more runs, still hits, basically as quickly as it
did before. So no real change observed with the added xas_reset().

> Ok, that's interesting. So it's *not* just about "that code didn't do
> xas_reset() after xas_split_alloc()".
>
> Now, another thing that commit 6758c1128ceb ("mm/filemap: optimize
> filemap folio adding") does is that it now *only* calls xa_get_order()
> under the xa lock, and then it verifies it against the
> xas_split_alloc() that it did earlier.
>
> The old code did "xas_split_alloc()" with one order (all outside the
> lock), and then re-did the xas_get_order() lookup inside the lock. But
> if it changed in between, it ended up doing the "xas_split()" with the
> new order, even though "xas_split_alloc()" was done with the *old*
> order.
>
> That seems dangerous, and maybe the lack of xas_reset() was never the
> *major* issue?
>
> Willy? You know this code much better than I do. Maybe we should just
> back-port 6758c1128ceb in its entirety.
>
> Regardless, I'd want to make sure that we really understand the root
> cause. Because it certainly looks like *just* the lack of xas_reset()
> wasn't it.

Just for sanity's sake, I backported 6758c1128ceb (and the associated
xarray xas_get_order() change) to 6.9 and kicked that off.

-- 
Jens Axboe
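For readers following the quoted discussion, the ordering hazard it describes (preallocating for one order outside the lock, then splitting with an order re-read inside the lock) can be modeled with a small toy. This is only a sketch: every name below (ToyEntry, prealloc_for_split, split_unchecked, split_checked, add_with_retry) is hypothetical and is not a kernel API, and retrying on a mismatch is an assumption, since the thread only says the newer code "verifies" the order against the earlier allocation.

```python
# Toy model of "allocate for one order, split with another".
# All names here are hypothetical; this is not kernel code.

class ToyEntry:
    """A slot whose order may be changed by someone else between steps."""
    def __init__(self, order):
        self.order = order


def prealloc_for_split(order):
    """Stand-in for preallocating split resources sized for `order`."""
    return {"prepared_for": order}


def split_unchecked(entry, prealloc):
    """Old pattern: re-read the order "under the lock" and split with it,
    without comparing it to what the preallocation was sized for."""
    current = entry.order
    return {"split_order": current,
            "mismatch": current != prealloc["prepared_for"]}


def split_checked(entry, prealloc):
    """Newer pattern: re-read the order and refuse to split on a mismatch.
    Returning None (so the caller can retry) is an assumption of this toy."""
    current = entry.order
    if current != prealloc["prepared_for"]:
        return None
    return {"split_order": current, "mismatch": False}


def add_with_retry(entry, interfere=None, max_tries=8):
    """Preallocate, let an optional interferer change the order once,
    then split only if the order still matches; otherwise start over."""
    for attempt in range(max_tries):
        prealloc = prealloc_for_split(entry.order)  # "outside the lock"
        if interfere is not None and attempt == 0:
            interfere(entry)                        # simulated concurrent change
        result = split_checked(entry, prealloc)     # "inside the lock"
        if result is not None:
            return result, attempt + 1
    raise RuntimeError("order kept changing")


if __name__ == "__main__":
    entry = ToyEntry(order=4)
    pre = prealloc_for_split(entry.order)
    entry.order = 2  # order changes between prealloc and split
    print(split_unchecked(entry, pre))
    print(add_with_retry(ToyEntry(4), interfere=lambda e: setattr(e, "order", 2)))
```

In the toy, the unchecked variant happily splits with an order the preallocation was never sized for, while the checked variant notices the change and redoes the preallocation. It says nothing about whether this mismatch is the actual root cause, which the thread leaves open.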