Date: Sun, 5 Mar 2023 05:02:43 +0000
From: Matthew Wilcox
To: Luis Chamberlain
Cc: James Bottomley, Keith Busch, Theodore Ts'o,
    lsf-pc@lists.linux-foundation.org, linux-fsdevel@vger.kernel.org,
    linux-mm@kvack.org, linux-block@vger.kernel.org
Subject: Re: [LSF/MM/BPF TOPIC] Cloud storage optimizations
References: <2600732b9ed0ddabfda5831aff22fd7e4270e3be.camel@HansenPartnership.com>

On Sat, Mar 04, 2023 at 08:15:50PM -0800, Luis Chamberlain wrote:
> On Sat, Mar 04, 2023 at 04:39:02PM +0000, Matthew Wilcox wrote:
> > I'm getting more and more
> > comfortable with the idea that "Linux doesn't support block sizes
> > > PAGE_SIZE on 32-bit machines" is an acceptable answer.
>
> First of all filesystems would need to add support for larger block
> sizes > PAGE_SIZE, and that takes effort. It is also a support question
> too.
>
> I think garnering consensus from filesystem developers that we don't
> want to support block sizes > PAGE_SIZE on 32-bit systems would be a
> good thing to review at LSFMM or even on this list. I highly doubt
> anyone is interested in that support.

Agreed.

> > XFS already works with arbitrary-order folios.
>
> But block sizes > PAGE_SIZE is work which is still not merged. It
> *can* be with time. That would allow one to muck with larger block
> sizes than 4k on x86-64 for instance. Without this, you can't play
> ball.

Do you mean that XFS is checking that fs block size <= PAGE_SIZE and
that check needs to be dropped?  If so, I don't see where that happens.

Or do you mean that the blockdev "filesystem" needs to be enhanced to
support large folios?  That's going to be kind of a pain because it
uses buffer_heads.  And ext4 depends on it using buffer_heads.  So,
yup, more work needed than I remembered (but as I said, it's FS side,
not block layer or driver work).

Or were you referring to the NVMe PAGE_SIZE sanity check that Keith
mentioned upthread?

> > The only needed piece is
> > specifying to the VFS that there's a minimum order for this particular
> > inode, and having the VFS honour that everywhere.
>
> Other than the above too, don't we still also need to figure out what
> fs APIs would incur larger order folios? And then what about corner cases
> with the page cache?
>
> I was hoping some of these nooks and crannies could be explored with tmpfs.

I think we're exploring all those with XFS.  Or at least, many of
them.  A lot of the folio conversion patches you see flowing past
are pure efficiency gains -- no need to convert between pages and
folios implicitly; do the explicit conversions and save instructions.
Most of the correctness issues were found & fixed a long time ago when
PMD support was added to tmpfs.  One notable exception would be the
writeback path, since tmpfs doesn't do writeback; it has that special
thing it does with swap.

tmpfs is a rather special case as far as its use of the filesystem APIs
goes, but I suspect I've done most of the needed work to have it work
with arbitrary-order folios instead of just PTE and PMD sizes.  There
are probably some leftover assumptions that I haven't found yet.  Maybe
in the swap path, for example ;-)
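
For anyone following along, the "minimum order" idea quoted above can
be made concrete with a little arithmetic.  This is a userspace sketch
only, not kernel code: min_folio_order() and align_index() are
hypothetical names invented for illustration, and it assumes both the
block size and the page size are powers of two.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (not a kernel API): the smallest folio order
 * whose size, page_size << order, covers one filesystem block.
 * Assumption: both arguments are non-zero powers of two.
 */
unsigned int min_folio_order(size_t block_size, size_t page_size)
{
	unsigned int order = 0;

	assert(block_size && (block_size & (block_size - 1)) == 0);
	assert(page_size && (page_size & (page_size - 1)) == 0);

	while ((page_size << order) < block_size)
		order++;
	return order;
}

/*
 * Hypothetical helper: round a page index down to the first page of
 * the naturally aligned folio of the given order that contains it.
 */
unsigned long align_index(unsigned long index, unsigned int order)
{
	return index & ~((1UL << order) - 1);
}
```

With 16KiB blocks on a machine with 4KiB pages this gives order 2, so
page index 7 would belong to the folio starting at index 4.  A block
size at or below the page size gives order 0, i.e. no constraint.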