Date: Mon, 6 Mar 2023 08:23:00 +0000
From: Matthew Wilcox
To: Hannes Reinecke
Cc: Luis Chamberlain, Keith Busch, Theodore Ts'o, Pankaj Raghav,
    Daniel Gomez, Javier González, lsf-pc@lists.linux-foundation.org,
    linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
    linux-block@vger.kernel.org
Subject: Re: [LSF/MM/BPF TOPIC] Cloud storage optimizations
References: <0b70deae-9fc7-ca33-5737-85d7532b3d33@suse.de>
In-Reply-To: <0b70deae-9fc7-ca33-5737-85d7532b3d33@suse.de>

On Sun, Mar 05, 2023 at 12:22:15PM +0100, Hannes Reinecke wrote:
> On 3/4/23 18:54, Matthew Wilcox wrote:
> > I think we're talking about different things (probably different storage
> > vendors want different things, or even different people at the same
> > storage vendor want different things).
> >
> > Luis and I are talking about larger LBA sizes.
> > That is, the minimum read/write size from the block device is 16kB or
> > 64kB or whatever.
> > In this scenario, the minimum amount of space occupied by a file goes
> > up from 512 bytes or 4kB to 64kB. That's doable, even if somewhat
> > suboptimal.
> >
> And so do I. One can view zones as really large LBAs.
>
> Indeed it might be suboptimal from the OS point of view.
> But from the device point of view it won't.
> And, in fact, with devices becoming faster and faster the question is
> whether sticking with relatively small sectors won't become a limiting
> factor eventually.
>
> > Your concern seems to be more around shingled devices (or their equivalent
> > in SSD terms) where there are large zones which are append-only, but
> > you can still random-read 512 byte LBAs. I think there are different
> > solutions to these problems, and people are working on both of these
> > problems.
> >
> My point being that zones are just there because the I/O stack can only deal
> with sectors up to 4k. If the I/O stack would be capable of dealing
> with larger LBAs one could identify a zone with an LBA, and the entire issue
> of append-only and sequential writes would be moot.
> Even the entire concept of zones becomes irrelevant as the OS would
> trivially only write entire zones.

All current filesystems that I'm aware of require their fs block size
to be >= LBA size. That is, you can't take a 512-byte blocksize ext2
filesystem and put it on a 4kB LBA storage device.

If a zone becomes a single LBA, and a zone is, say, 256MB, that means
files can only grow/shrink in 256MB increments. I don't think that
amount of wasted space is going to be acceptable. So if we're serious
about going down this path, we need to tell filesystem people to start
working out how to support fs block size < LBA size.

That's a big ask, so let's be sure storage vendors actually want this.
Both supporting zoned devices & supporting 16k/64k block sizes are
easier asks.
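
To put rough numbers on the space overhead, here is a small sketch in Python. It assumes a non-empty file occupies a whole number of allocation units and that the unit is at least the LBA size. The helper names (allocated_bytes, fs_fits_device) are made up for illustration and are not kernel interfaces.

```python
KB = 1024
MB = 1024 * KB


def allocated_bytes(file_size: int, unit: int) -> int:
    """Space taken by a file that must be stored in whole units.

    Assumption: an empty file takes no data blocks; anything else is
    rounded up to the next multiple of `unit`.
    """
    if file_size < 0 or unit <= 0:
        raise ValueError("file_size must be >= 0 and unit must be > 0")
    return -(-file_size // unit) * unit  # ceiling division, then scale


def fs_fits_device(fs_block_size: int, lba_size: int) -> bool:
    """The constraint described above: fs block size must be >= LBA size."""
    return fs_block_size >= lba_size


if __name__ == "__main__":
    # Cost of storing a 1-byte file at various allocation granularities.
    for unit in (512, 4 * KB, 64 * KB, 256 * MB):
        print(f"unit={unit:>11} bytes -> 1-byte file occupies "
              f"{allocated_bytes(1, unit)} bytes")

    # A 512-byte-block filesystem on a 4kB LBA device is not possible today.
    print("512B fs on 4kB LBA device:", fs_fits_device(512, 4 * KB))
```

With a 64kB unit the worst-case waste per file stays just under 64kB. With a zone-sized unit it is just under 256MB per file, which is the overhead objected to above.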