Date: Fri, 13 Sep 2024 22:30:15 +0100
From: Matthew Wilcox
To: Linus Torvalds
Cc: Chris Mason, Jens Axboe, Christian Theune, linux-mm@kvack.org,
	"linux-xfs@vger.kernel.org", linux-fsdevel@vger.kernel.org,
	linux-kernel@vger.kernel.org, Daniel Dao, Dave Chinner,
	regressions@lists.linux.dev, regressions@leemhuis.info
Subject: Re: Known and unfixed active data loss bug in MM + XFS with large
	folios since Dec 2021 (any kernel from 6.1 upwards)
References: <0fc8c3e7-e5d2-40db-8661-8c7199f84e43@kernel.dk>

On Fri, Sep 13, 2024 at 02:24:02PM -0700, Linus Torvalds wrote:
> On Fri, 13 Sept 2024 at 11:15, Matthew Wilcox wrote:
> >
> > Oh!
> > I think split is the key. Let's say we have an order-6 (or
> > larger) folio. And we call split_huge_page() (whatever it's called
> > in your kernel version). That calls xas_split_alloc() followed
> > by xas_split(). xas_split_alloc() puts entry in node->slots[0] and
> > initialises node->slots[1..XA_CHUNK_SIZE] to a sibling entry.
>
> Hmm. The splitting does seem to be not just indicated by the debug
> logs, but it ends up being a fairly complicated case. *The* most
> complicated case of adding a new folio by far, I'd say.
>
> And I wonder if it's even necessary?

Unfortunately, we need to handle things like "we are truncating a file
which has a folio which now extends many pages beyond the end of the
file" and so we have to split the folio which now crosses EOF. Or we
could write it back and drop it, but that has its own problems. Part of
the "large block size" patches sitting in Christian's tree is solving
these problems for folios which can't be split down to order-0, so there
may be ways we can handle this better now, but if we don't split we
might end up wasting a lot of memory in file tails.

> It's possible that I'm entirely missing something, but at least the
> filemap_add_folio() case looks like it really would actually be
> happier with a "oh, that size conflicts with an existing entry, let's
> just allocate a smaller size then"

Pretty sure we already do that; it's mostly handled through the
readahead path which checks for conflicting folios already in the cache.
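
To make the "conflict, so fall back to a smaller size" idea concrete,
here is a rough userspace sketch. It is a toy model, not kernel code:
struct toy_cache, toy_range_busy() and toy_add_folio() are made-up names
for illustration, the "cache" is just an array of occupied flags, and it
says nothing about how the readahead path or filemap_add_folio() are
actually implemented.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define CACHE_SLOTS 64	/* toy cache: one flag per page index */

/* Hypothetical toy page cache: slot[i] is true when index i is occupied. */
struct toy_cache {
	bool slot[CACHE_SLOTS];
};

/* Return true if any index in [start, start + 2^order) is occupied. */
static bool toy_range_busy(const struct toy_cache *c, size_t start,
			   unsigned int order)
{
	size_t n = (size_t)1 << order;

	for (size_t i = start; i < start + n; i++)
		if (c->slot[i])
			return true;
	return false;
}

/*
 * Try to add a naturally aligned "folio" of the given order covering
 * index.  If the aligned range conflicts with an existing entry (or
 * runs past the end of the toy cache), retry with the next smaller
 * order.  Returns the order actually used, or -1 if index itself is
 * already occupied.
 */
static int toy_add_folio(struct toy_cache *c, size_t index,
			 unsigned int order)
{
	assert(index < CACHE_SLOTS);

	for (;;) {
		size_t n = (size_t)1 << order;
		size_t start = index & ~(n - 1);	/* align down */

		if (start + n <= CACHE_SLOTS &&
		    !toy_range_busy(c, start, order)) {
			for (size_t i = start; i < start + n; i++)
				c->slot[i] = true;
			return (int)order;
		}
		if (order == 0)
			return -1;
		order--;
	}
}
```

With index 5 already occupied, asking for an order-4 entry at index 8
conflicts on the range 0..15, so the sketch settles for order 3 covering
8..15. Nothing is ever split in this model; it only shows the fallback
on insertion, which is a separate problem from the truncate case above.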