Message-ID: <415b0e1a-c92f-4bf9-bccd-613f903f3c75@kernel.dk>
Date: Thu, 12 Sep 2024 16:30:28 -0600
Subject: Re: Known and unfixed active data loss bug in MM + XFS with large folios since Dec 2021 (any kernel from 6.1 upwards)
From: Jens Axboe
To: Linus Torvalds
Cc: Matthew Wilcox, Christian Theune, linux-mm@kvack.org, "linux-xfs@vger.kernel.org", linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, Daniel Dao, Dave Chinner, clm@meta.com, regressions@lists.linux.dev, regressions@leemhuis.info
References: <0fc8c3e7-e5d2-40db-8661-8c7199f84e43@kernel.dk>

On 9/12/24 4:25 PM, Linus Torvalds wrote:
> On Thu, 12 Sept 2024 at 15:12, Jens Axboe wrote:
>>
>> When I saw Christian's report, I seemed to recall that we ran into this
>> at Meta too. And we did, and hence have been reverting it since our 5.19
>> release (and hence 6.4, 6.9, and 6.11 next). We should not be shipping
>> things that are known broken.
>
> I do think that if we have big sites just reverting it as known broken
> and can't figure out why, we should do so upstream too.

Agree. I suspect it would've come up internally shortly too, as we're
just now preparing to roll 6.11 as the next kernel. That always starts
with a list of "what commits are in our 6.9 tree that aren't upstream"
and then porting those, and this one is in that (pretty short) list.

> Yes, it's going to make it even harder to figure out what's wrong.
> Not great. But if this causes filesystem corruption, that sure isn't
> great either. And people end up going "I'll use ext4 which doesn't
> have the problem", that's not exactly helpful either.

Until someone has a good reproducer for it, it is going to remain
elusive. And it's a two-liner to enable it again for testing, hence
should not be a hard thing to do.

> And yeah, the reason ext4 doesn't have the problem is simply because
> ext4 doesn't enable large folios. So that doesn't pin anything down
> either (ie it does *not* say "this is an xfs bug" - it obviously might
> be, but it's probably more likely some large-folio issue).
>
> Other filesystems do enable large folios (afs, bcachefs, erofs, nfs,
> smb), but maybe just not be used under the kind of load to show it.

It might be an iomap thing... Other file systems do use it, but to
various degrees, and XFS is definitely the primary user.

> Honestly, the fact that it hasn't been reverted after apparently
> people knowing about it for months is a bit shocking to me. Filesystem
> people tend to take unknown corruption issues as a big deal. What
> makes this so special? Is it because the XFS people don't consider it
> an XFS issue, so...

Double agree, I was pretty surprised when I learned of all this today.

-- 
Jens Axboe