From: David Howells
To: Johannes Weiner
Cc: dhowells@redhat.com, Kent Overstreet, Matthew Wilcox, Jeff Layton,
    "Darrick J. Wong", Christoph Hellwig, Linus Torvalds, Andrew Morton,
    linux-mm@kvack.org, linux-cachefs@redhat.com,
    linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Decoupling filesystems from pages
Date: Sun, 12 Sep 2021 14:21:07 +0100
Message-ID: <1086693.1631452867@warthog.procyon.org.uk>

Hi Johannes,

> Wouldn't it make more sense to decouple filesystems from "paginess",
> as David puts it, now instead? Avoid the risk of doing it twice, avoid
> the more questionable churn inside mm code, avoid the confusing
> proximity to the page and its API in the long-term...

Let me seize that opening.

I've been working on doing this for network filesystems - at least those that
want to buy in.  If you look here:

	https://lore.kernel.org/ceph-devel/162687506932.276387.14456718890524355509.stgit@warthog.procyon.org.uk/T/#m23428c315a77d8c5206b9646bf74c8ef18d4d38c

the current state of which is here:

	https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git/log/?h=netfs-folio-regions

I've been looking at abstracting anything to do with pages out of the netfs
and putting that stuff into a helper library.
The library handles all the caching stuff and just presents the filesystem
with requests to read into/write from an iov_iter.  The filesystem doesn't
then see pages at all.

The motivation behind this is to make content encryption and compression
transparent and automatically available to all participating filesystems -
with the requirement that the data stored in the local disk cache (ie.
fscache) is *also* encrypted.

I have content encryption working for basic read and write on afs and Jeff
Layton is looking at how to make it work with ceph - but it's very much a
work in progress and things like truncate and mmap don't yet work with it.

Anyway, the library, as I'm currently writing it, maintains a list of
byte-range dirty regions on each inode, where a dirty region may span
multiple folios and a folio may be contributory to multiple regions.  The
fact that pages are involved is really then merely an implementation detail.

Content encryption/compression blocks may be any power-of-2 size, from 2
bytes to megabytes, and this need bear no relation to page size.  The library
calls the crypto hooks for each crypto block in the chunk[*] to be crypted.

[*] Terminology is such fun.  I have to deal with pages, crypto blocks,
    object layout blocks, I/O blocks (rsize/wsize settings), regions.

In fact ->readpage(), ->writepage() and ->launder_page() are difficult when I
may be required to deal with blocks larger than the size of a page.  The page
being poked may be in the middle of a block, so I'm endeavouring to work
around that.  Using the regions should allow me to 'launder' an inode before
invalidating the pages attached to it, and the dirty region objects can act
instead of the dirty, writeback and fscache flags on a page.

I've been building this on top of Willy's folio patchset, and so I've paused
for the moment whilst I wait to see what becomes of that.  If folios doesn't
get in or gets renamed, I have a load of reworking to do.
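
To illustrate the block-versus-page arithmetic described above, here is a
minimal sketch.  It is not code from the netfs library: the struct, the
helper names and the 4096-byte page size are all hypothetical, chosen only
for this example.  It shows a byte-range dirty region being widened to whole
power-of-2 crypto blocks, and how the widened range can cover pages the
original write never touched.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical illustration only; none of these names exist in netfs. */
#define SKETCH_PAGE_SIZE 4096ULL	/* assumed page size for the example */

struct byte_range {
	uint64_t start;		/* first byte of the range (inclusive) */
	uint64_t end;		/* one past the last byte (exclusive) */
};

static inline bool is_pow2(uint64_t x)
{
	return x != 0 && (x & (x - 1)) == 0;
}

/*
 * Widen a dirty byte range so that it starts and ends on crypto block
 * boundaries.  bsize must be a power of 2; it may be smaller or larger
 * than a page.
 */
static inline struct byte_range expand_to_blocks(struct byte_range r,
						 uint64_t bsize)
{
	struct byte_range out;

	assert(is_pow2(bsize));
	assert(r.start < r.end);
	out.start = r.start & ~(bsize - 1);		  /* round down */
	out.end = (r.end + bsize - 1) & ~(bsize - 1);	  /* round up */
	return out;
}

/* Number of whole crypto blocks in an already block-aligned range. */
static inline uint64_t nr_blocks(struct byte_range r, uint64_t bsize)
{
	return (r.end - r.start) / bsize;
}

/* Index of the first page that the range touches. */
static inline uint64_t first_page(struct byte_range r)
{
	return r.start / SKETCH_PAGE_SIZE;
}

/* Index of the last page that the range touches. */
static inline uint64_t last_page(struct byte_range r)
{
	return (r.end - 1) / SKETCH_PAGE_SIZE;
}
```

With these assumptions, a write to bytes 5001-9000 dirties pages 1 and 2, but
with a 16KiB crypto block the range to be crypted becomes bytes 0-16383,
which covers pages 0 to 3.  That is the case in which a per-page hook would
find itself poking a page in the middle of a block.
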
Does this sound like something you'd be interested in looking at more
generally than just network filesystems?

David