From: Kent Overstreet
To: Linus Torvalds
Cc: Lorenzo Stoakes, linux-bcachefs@vger.kernel.org, linux-mm@kvack.org, Vlastimil Babka, Andrew Morton, Uladzislau Rezki, Christoph Hellwig
Subject: Re: [PATCH] mm: Drop INT_MAX limit from kvmalloc()
Date: Sun, 20 Oct 2024 17:40:55 -0400

On Sun, Oct 20, 2024 at 02:21:50PM -0700, Linus Torvalds wrote:
> On Sun, 20 Oct 2024 at 13:54, Linus Torvalds wrote:
> >
> > On Sun, 20 Oct 2024 at 13:30, Kent Overstreet wrote:
> > >
> > > Latency for journal replay?
> >
> > No, latency for the journaling itself.
>
> Side note: latency of the journal replay can actually be quite
> critical indeed for any "five nines" operation, and big journals are
> not necessarily a good idea for that reason.
>
> There's a very real reason many places don't use filesystems that do
> fsck any more.

I need to ask one of the guys with a huge filesystem (if you're
listening and have numbers, please chime in), but I don't think journal
replay is bad compared to system boot time.

At this point it would be completely trivial to do journal replay in the
background, after the filesystem is mounted: all we need to do prior to
mount is read the journal and sort+dedup the keys. Replaying all the
updates is the expensive part, but like I mentioned, the btree API
transparently overlays the journal keys until journal replay is
finished (this was necessary for solving various bootstrap issues).

So if someone complains, I'll flip that on and we'll start testing it.

Fsck is the real concern, yes, and there's lots to be done there. I have
the majority of the work completed for online fsck, but that isn't
enough: if fsck takes a week to complete and uses most of the system's
capacity while it's running, that's not acceptable either (and that
would be the case today if you tried bcachefs on a petabyte filesystem).

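For readers unfamiliar with the journal overlay mentioned above, here is a toy Python model of the idea, not bcachefs code. It assumes a journal entry is a hypothetical `(seq, key, value)` tuple, with `value=None` standing for a deletion, and uses a plain dict in place of the on-disk btree. The names `sort_and_dedup`, `OverlayBtree`, and `replay_step` are invented for this sketch.

```python
from bisect import bisect_left


def sort_and_dedup(journal_entries):
    """Keep only the newest entry per key, then sort by key.

    journal_entries: iterable of (seq, key, value); value None = deletion.
    """
    newest = {}
    for seq, key, value in journal_entries:
        if key not in newest or seq >= newest[key][0]:
            newest[key] = (seq, value)
    return sorted(((k, v) for k, (_, v) in newest.items()),
                  key=lambda kv: kv[0])


class OverlayBtree:
    """Lookups see journal keys on top of the btree until replay is done."""

    def __init__(self, btree, journal_keys):
        self.btree = btree                # dict standing in for the btree
        self.journal_keys = journal_keys  # sorted, deduplicated (key, value)
        self._keys = [k for k, _ in journal_keys]
        self.replayed = 0                 # journal keys already applied

    def lookup(self, key):
        # Only keys that have not been replayed yet need the overlay.
        i = bisect_left(self._keys, key, lo=self.replayed)
        if i < len(self._keys) and self._keys[i] == key:
            return self.journal_keys[i][1]
        return self.btree.get(key)

    def replay_step(self):
        """Apply one journal key to the btree; return False when finished."""
        if self.replayed == len(self.journal_keys):
            return False
        key, value = self.journal_keys[self.replayed]
        if value is None:
            self.btree.pop(key, None)
        else:
            self.btree[key] = value
        self.replayed += 1
        return True
```

In this model only `sort_and_dedup` has to run before "mount"; `replay_step` can then be called from a background loop, and `lookup` returns the same answers before, during, and after replay.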
So to make fsck scale, we need to turn as many of the consistency checks
and repairs that fsck does as possible into things we can do whenever
other operations are touching that metadata (this is mainly what I mean
by self healing), and we need to either reduce our dependency on passes
that go "walk everything and check references", or add ways to shard
them (and only check the parts of the filesystem that are suspected to
have damage).

Checking extent backpointers is the big offender, and fortunately that's
the easiest one to fix.
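As a rough illustration of what sharding a reference-checking pass could look like, the toy Python sketch below restricts a backpointer check to one range of buckets. It is not how bcachefs implements this: the data model (a dict mapping hypothetical extent keys to bucket numbers, and a set of `(bucket, extent_key)` backpointers) and the function name `check_backpointers_shard` are assumptions made for the example.

```python
def check_backpointers_shard(extents, backpointers, bucket_start, bucket_end):
    """Check extent/backpointer consistency for buckets in [start, end).

    extents:      dict of extent_key -> bucket the extent points into
    backpointers: set of (bucket, extent_key) pairs
    Returns a list of (problem, bucket, extent_key) tuples.
    """
    problems = []

    # Backpointers in this shard must refer to an extent in the same bucket.
    for bucket, extent_key in sorted(backpointers):
        if bucket_start <= bucket < bucket_end:
            if extents.get(extent_key) != bucket:
                problems.append(("dangling backpointer", bucket, extent_key))

    # Extents pointing into this shard must have a matching backpointer.
    # Note: this toy version still scans every extent to find the ones
    # that fall in the shard; avoiding that scan is not addressed here.
    for extent_key, bucket in sorted(extents.items()):
        if bucket_start <= bucket < bucket_end:
            if (bucket, extent_key) not in backpointers:
                problems.append(("missing backpointer", bucket, extent_key))

    return problems
```

A caller that suspects damage in one region would pass only that bucket range, for example `check_backpointers_shard(extents, backpointers, 4, 8)`, instead of walking the whole filesystem.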