Date: Sat, 4 Mar 2023 17:54:38 +0000
From: Matthew Wilcox
To: Hannes Reinecke
Cc: Luis Chamberlain, Keith Busch, Theodore Ts'o, Pankaj Raghav,
    Daniel Gomez, Javier González, lsf-pc@lists.linux-foundation.org,
    linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
    linux-block@vger.kernel.org
Subject: Re: [LSF/MM/BPF TOPIC] Cloud storage optimizations

On Sat, Mar 04, 2023 at 06:17:35PM +0100, Hannes Reinecke wrote:
> On 3/4/23 17:47, Matthew Wilcox wrote:
> > On Sat, Mar 04, 2023 at 12:08:36PM +0100, Hannes Reinecke wrote:
> > > We could implement a (virtual) zoned device, and expose each zone as a
> > > block. That gives us the required large block characteristics, and with
> > > a bit of luck we might be able to dial up to really large block sizes
> > > like the 256M sizes on current SMR drives.
> > > ublk might be a good starting point.
> >
> > Ummmm.
> > Is supporting 256MB block sizes really a desired goal? I suggest
> > that is far past the knee of the curve; if we can only write 256MB chunks
> > as a single entity, we're looking more at a filesystem redesign than we
> > are at making filesystems and the MM support 256MB size blocks.
> >
> Naa, not really. It _would_ be cool as we could get rid of all the cludges
> which have nowadays re sequential writes.
> And, remember, 256M is just a number someone thought to be a good
> compromise. If we end up with a lower number (16M?) we might be able
> to convince the powers that be to change their zone size.
> Heck, with 16M block size there wouldn't be a _need_ for zones in
> the first place.
>
> But yeah, 256M is excessive. Initially I would shoot for something
> like 2M.

I think we're talking about different things (probably different storage
vendors want different things, or even different people at the same
storage vendor want different things).

Luis and I are talking about larger LBA sizes. That is, the minimum
read/write size from the block device is 16kB or 64kB or whatever.
In this scenario, the minimum amount of space occupied by a file goes
up from 512 bytes or 4kB to 64kB. That's doable, even if somewhat
suboptimal.

Your concern seems to be more around shingled devices (or their equivalent
in SSD terms) where there are large zones which are append-only, but
you can still random-read 512 byte LBAs. I think there are different
solutions to these problems, and people are working on both of these
problems.

But if storage vendors are really pushing for 256MB LBAs, then that's
going to need a third kind of solution, and I'm not aware of anyone
working on that.
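
To put rough numbers on the "minimum amount of space occupied by a file"
point, here is a small sketch of the rounding involved. It assumes a
filesystem that allocates whole blocks equal to the LBA size, with no
inline data or tail packing, and that an empty file uses no data blocks.
The function name allocated_bytes is made up for illustration and is not
taken from any kernel or filesystem code.

```python
def allocated_bytes(file_size: int, block_size: int) -> int:
    """Space a file occupies if allocation is rounded up to whole blocks.

    Assumption: one allocation unit equals one LBA of block_size bytes,
    and a zero-length file consumes no data blocks.
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    if file_size <= 0:
        return 0
    blocks = -(-file_size // block_size)  # ceiling division
    return blocks * block_size


if __name__ == "__main__":
    # Compare a 1-byte file and a 100,000-byte file across LBA sizes.
    for block_size in (512, 4096, 16 * 1024, 64 * 1024):
        small = allocated_bytes(1, block_size)
        medium = allocated_bytes(100_000, block_size)
        print(f"LBA {block_size:>6} B: 1 B file -> {small} B, "
              f"100000 B file -> {medium} B")
```

Under those assumptions the overhead is bounded by one block per file, so
it is negligible at 64kB for large files and only matters for workloads
with many tiny files. At 256MB the same rounding would apply to every
file, which is why that case looks like a different problem.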