From: Hannes Reinecke
Date: Mon, 6 Mar 2023 11:05:20 +0100
Subject: Re: [LSF/MM/BPF TOPIC] Cloud storage optimizations
To: Matthew Wilcox
Cc: Luis Chamberlain, Keith Busch, Theodore Ts'o, Pankaj Raghav,
    Daniel Gomez, Javier González, lsf-pc@lists.linux-foundation.org,
    linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
    linux-block@vger.kernel.org
References: <0b70deae-9fc7-ca33-5737-85d7532b3d33@suse.de>

On 3/6/23 09:23, Matthew Wilcox wrote:
> On Sun, Mar 05, 2023 at 12:22:15PM +0100, Hannes Reinecke wrote:
>> On 3/4/23 18:54, Matthew Wilcox wrote:
>>> I think we're talking about different things (probably different storage
>>> vendors want different things, or even different people at the same
>>> storage vendor want different things).
>>>
>>> Luis and I are talking about larger LBA sizes. That is, the minimum
>>> read/write size from the block device is 16kB or 64kB or whatever.
>>> In this scenario, the minimum amount of space occupied by a file goes
>>> up from 512 bytes or 4kB to 64kB. That's doable, even if somewhat
>>> suboptimal.
>>>
>> And so do I. One can view zones as really large LBAs.
>>
>> Indeed it might be suboptimal from the OS point of view.
>> But from the device point of view it won't.
>> And, in fact, with devices becoming faster and faster the question is
>> whether sticking with relatively small sectors won't become a limiting
>> factor eventually.
>>
>>> Your concern seems to be more around shingled devices (or their equivalent
>>> in SSD terms) where there are large zones which are append-only, but
>>> you can still random-read 512 byte LBAs. I think there are different
>>> solutions to these problems, and people are working on both of these
>>> problems.
>>>
>> My point being that zones are just there because the I/O stack can only deal
>> with sectors up to 4k. If the I/O stack would be capable of dealing
>> with larger LBAs one could identify a zone with an LBA, and the entire issue
>> of append-only and sequential writes would be moot.
>> Even the entire concept of zones becomes irrelevant as the OS would
>> trivially only write entire zones.
>
> All current filesystems that I'm aware of require their fs block size
> to be >= LBA size. That is, you can't take a 512-byte blocksize ext2
> filesystem and put it on a 4kB LBA storage device.
>
> That means that files can only grow/shrink in 256MB increments. I
> don't think that amount of wasted space is going to be acceptable.
> So if we're serious about going down this path, we need to tell
> filesystem people to start working out how to support fs block
> size < LBA size.
>
> That's a big ask, so let's be sure storage vendors actually want
> this. Both supporting zoned devices & supporting 16k/64k block
> sizes are easier asks.

Why, I know. And this really is a future goal.
(Possibly a very _distant_ future goal.)

Indeed we should concentrate on getting 16k/64k blocks initially.
Or maybe 128k blocks to help our RAIDed friends.

Cheers,

Hannes