From: Linus Torvalds
Date: Thu, 12 Sep 2024 15:56:17 -0700
Subject: Re: Known and unfixed active data loss bug in MM + XFS with large folios since Dec 2021 (any kernel from 6.1 upwards)
To: Jens Axboe
Cc: Matthew Wilcox, Christian Theune, linux-mm@kvack.org, linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, Daniel Dao, Dave Chinner, clm@meta.com, regressions@lists.linux.dev, regressions@leemhuis.info
In-Reply-To: <415b0e1a-c92f-4bf9-bccd-613f903f3c75@kernel.dk>
References: <0fc8c3e7-e5d2-40db-8661-8c7199f84e43@kernel.dk> <415b0e1a-c92f-4bf9-bccd-613f903f3c75@kernel.dk>

On Thu, 12 Sept 2024 at 15:30, Jens Axboe wrote:
>
> It might be an iomap thing...
> Other file systems do use it, but to
> various degrees, and XFS is definitely the primary user.

I have to say, I looked at the iomap code, and it's disgusting.

The "I don't support large folios" check doesn't even say "don't do large folios". That's what the regular __filemap_get_folio() code does for reads, and that's the sane thing to do. But that's not what the iomap code does. AT ALL.

No, the iomap code limits "len" of a write in iomap_write_begin() to be within one page, and then magically depends on

 (a) __iomap_get_folio() using that length to decide how big a folio to allocate

 (b) iomap_write_begin() doing its own "what is the real length:" based on that.

 (c) the *caller* then having to do the same thing, to see what length iomap_write_begin() _actually_ used (because it wasn't the 'bytes' that was passed in).

Honestly, the iomap code is just odd. Having these kinds of subtle interdependencies doesn't make sense. The two code sequences don't even use the same logic, with iomap_write_begin() doing

        if (!mapping_large_folio_support(iter->inode->i_mapping))
                len = min_t(size_t, len, PAGE_SIZE - offset_in_page(pos));

        [... alloc folio ...]

        if (pos + len > folio_pos(folio) + folio_size(folio))
                len = folio_pos(folio) + folio_size(folio) - pos;

and the caller (iomap_write_iter) doing

                offset = offset_in_folio(folio, pos);
                if (bytes > folio_size(folio) - offset)
                        bytes = folio_size(folio) - offset;

and yes, the two completely different ways of picking 'len' (called 'bytes' in the second case) had *better* match. I do think they match, but code shouldn't be organized this way.
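To make the "had better match" point concrete, here is a minimal user-space sketch (not kernel code) that models the two clamping sequences quoted above and counts the inputs on which they disagree. Everything named here is an assumption for illustration: clamp_write_begin(), clamp_write_iter() and count_clamp_mismatches() are hypothetical helpers, PAGE_SIZE is fixed at 4096, and the folio is described by an explicit (fpos, fsize) pair that is assumed to contain 'pos'. The sketch also assumes the folio is order-0 whenever large folios are not supported, which is exactly the interdependency described above.

```c
#include <stddef.h>

#define PAGE_SIZE 4096UL /* assumed page size for this sketch */

/* Hypothetical model of the length clamping in iomap_write_begin(). */
static size_t clamp_write_begin(unsigned long pos, size_t len, int large_ok,
                                unsigned long fpos, size_t fsize)
{
        if (!large_ok) {
                /* PAGE_SIZE - offset_in_page(pos) */
                size_t room = PAGE_SIZE - (pos & (PAGE_SIZE - 1));

                if (len > room)
                        len = room;
        }

        /* ... the real code allocates the folio at this point ... */

        if (pos + len > fpos + fsize)
                len = fpos + fsize - pos;
        return len;
}

/* Hypothetical model of the length clamping in iomap_write_iter(). */
static size_t clamp_write_iter(unsigned long pos, size_t bytes,
                               unsigned long fpos, size_t fsize)
{
        size_t offset = pos - fpos; /* offset_in_folio(folio, pos) */

        if (bytes > fsize - offset)
                bytes = fsize - offset;
        return bytes;
}

/*
 * Walk a grid of positions, lengths and folio orders and count the
 * cases where the two clamps pick different lengths. Large folios are
 * modelled as "supported" only for order > 0, so an unsupported mapping
 * always sees an order-0 folio.
 */
unsigned long count_clamp_mismatches(void)
{
        unsigned long mismatches = 0;

        for (unsigned int order = 0; order <= 2; order++) {
                size_t fsize = PAGE_SIZE << order;
                int large_ok = order > 0;

                for (unsigned long pos = 0; pos < 2 * fsize; pos += 509) {
                        /* start of the naturally aligned folio holding pos */
                        unsigned long fpos = pos & ~(fsize - 1);

                        for (size_t len = 1; len <= 3 * fsize; len += 1021) {
                                size_t a = clamp_write_begin(pos, len, large_ok,
                                                             fpos, fsize);
                                size_t b = clamp_write_iter(pos, len, fpos, fsize);

                                if (a != b)
                                        mismatches++;
                        }
                }
        }
        return mismatches;
}
```

Under those assumptions the count comes out as zero, which is consistent with "I do think they match". The agreement holds only because the order-0 assumption is baked into the loop: pair an unsupported mapping with a larger folio in this model and the two clamps can diverge.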

It's not just the above kind of odd thing either, it's things like iomap_get_folio() using that fgf_set_order(len), which does

        unsigned int shift = ilog2(size);

        if (shift <= PAGE_SHIFT)
                return 0;

so now it has done that potentially expensive ilog2() for the common case of "len < PAGE_SIZE", but dammit, it should never have even bothered looking at 'len' if the inode didn't support large folios in the first place, and we shouldn't have had that special odd 'len = min_t(..)' magic rule to force an order-0 thing, because

Yeah, yeah, modern CPUs all have reasonably cheap bit finding instructions. But the code simply shouldn't have this kind of thing in the first place.

The folio should have been allocated *before* iomap_write_begin(), the "no large folios" should just have fixed the order to zero there, and the actual real-life length of the write should have been limited in *one* piece of code after the allocation point instead of then having two different pieces of code depending on matching (subtle and undocumented) logic.

Put another way: I most certainly don't see the bug here - it may look _odd_, but not wrong - but at the same time, looking at that code doesn't make me get the warm and fuzzies about the iomap large-folio situation either.

                Linus