From mboxrd@z Thu Jan  1 00:00:00 1970
Date: Sat, 4 Mar 2023 07:34:33 +0000
From: Matthew Wilcox <willy@infradead.org>
To: James Bottomley
Cc: Keith Busch, Luis Chamberlain, Theodore Ts'o,
	lsf-pc@lists.linux-foundation.org, linux-fsdevel@vger.kernel.org,
	linux-mm@kvack.org, linux-block@vger.kernel.org
Subject: Re: [LSF/MM/BPF TOPIC] Cloud storage optimizations
MIME-Version: 1.0
Content-Type: text/plain; charset=iso-8859-1

On Fri, Mar 03, 2023 at 08:11:47AM -0500, James Bottomley wrote:
> On Fri, 2023-03-03 at 03:49 +0000, Matthew Wilcox wrote:
> > On Thu, Mar 02, 2023 at 06:58:58PM -0700, Keith Busch wrote:
> > > That said, I was hoping you were going to suggest supporting 16k
> > > logical block sizes.  Not a problem on some arch's, but still
> > > problematic when PAGE_SIZE is 4k. :)
> >
> > I was hoping Luis was going to propose a session on LBA size >
> > PAGE_SIZE.
> > Funnily, while the pressure is coming from the storage
> > vendors, I don't think there's any work to be done in the storage
> > layers.  It's purely a FS+MM problem.
>
> Heh, I can do the fools-rush-in bit, especially if what we're
> interested in is the minimum it would take to support this ...
>
> The FS problem could be solved simply by saying FS block size must
> equal device block size; then it becomes purely a MM issue.

Spoken like somebody who's never converted a filesystem to supporting
large folios.  There are a number of issues:

1. The obvious: use of PAGE_SIZE and/or PAGE_SHIFT.
2. Use of the kmap family to access, e.g., directories.  You can't
   kmap an entire folio, only one page at a time.  And if a dentry is
   split across a page boundary ...
3. buffer_heads do not currently support large folios.  Working on it.

Probably a few other things I forget.  But look through the recent
patches to AFS, CIFS, NFS, XFS, and iomap that do folio conversions.
A lot of it is pretty mechanical, but some of it takes hard thought.
And if you have ideas about how to handle ext2 directories, I'm all
ears.

> The MM issue could be solved by adding a page order attribute to
> struct address_space and insisting that pagecache/filemap functions
> in mm/filemap.c all have to operate on objects that are an integer
> multiple of the address space order.  The base allocator is
> filemap_alloc_folio, which already has an apparently always-zero
> order parameter (hmmm...) and it always seems to be called from
> sites that have the address_space, so it could simply be modified to
> always operate at the address_space order.

Oh, I have a patch for that.  That's the easy part.  The hard part is
plugging your ears to the screams of the MM people who are convinced
that fragmentation will make it impossible to mount your filesystem.