From: David Howells <dhowells@redhat.com>
To: Johannes Weiner <hannes@cmpxchg.org>
Cc: dhowells@redhat.com, Kent Overstreet <kent.overstreet@gmail.com>,
	Matthew Wilcox <willy@infradead.org>,
	Jeff Layton <jlayton@kernel.org>,
	"Darrick J. Wong" <djwong@kernel.org>,
	Christoph Hellwig <hch@infradead.org>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-mm@kvack.org, linux-cachefs@redhat.com,
	linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Decoupling filesystems from pages
Date: Sun, 12 Sep 2021 14:21:07 +0100
Message-ID: <1086693.1631452867@warthog.procyon.org.uk>

Hi Johannes,

> Wouldn't it make more sense to decouple filesystems from "paginess",
> as David puts it, now instead? Avoid the risk of doing it twice, avoid
> the more questionable churn inside mm code, avoid the confusing
> proximity to the page and its API in the long-term...

Let me seize that opening.  I've been working on doing this for network
filesystems - at least those that want to buy in.  Have a look here:

https://lore.kernel.org/ceph-devel/162687506932.276387.14456718890524355509.stgit@warthog.procyon.org.uk/T/#m23428c315a77d8c5206b9646bf74c8ef18d4d38c

The current state of that work is here:

https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git/log/?h=netfs-folio-regions

I've been looking at abstracting anything to do with pages out of the netfs
and putting that stuff into a helper library.  The library handles all the
caching stuff and just presents the filesystem with requests to read
into/write from an iov_iter.  The filesystem doesn't then see pages at all.
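
To give a rough feel for the shape of that interface, here is a minimal
userspace sketch.  It is not the netfs library code: it uses the POSIX
struct iovec in place of the kernel's iov_iter, and every sketch_* name is
made up for illustration.  The point is only that the filesystem side sees a
byte range and a set of buffer segments, never a page.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/uio.h>

/* Hypothetical request: a file byte range plus a segmented buffer. */
struct sketch_read_request {
	unsigned long long start;	/* file offset of the first byte wanted */
	size_t len;			/* number of bytes wanted */
	const struct iovec *iov;	/* destination segments */
	int iovcnt;			/* number of segments */
};

/*
 * Stand-in for a filesystem's read hook.  'src' plays the part of the data
 * the server returned for the requested range.  Returns bytes copied.
 */
static size_t sketch_fill_request(const struct sketch_read_request *req,
				  const char *src, size_t src_len)
{
	size_t want, done = 0;

	assert(req != NULL && src != NULL);
	want = req->len < src_len ? req->len : src_len;

	for (int i = 0; i < req->iovcnt && done < want; i++) {
		size_t n = req->iov[i].iov_len;

		if (n > want - done)
			n = want - done;
		memcpy(req->iov[i].iov_base, src + done, n);
		done += n;
	}
	return done;
}
```

Whether the segments are backed by pages, folios or a bounce buffer is the
caller's business.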

The motivation behind this is to make content encryption and compression
transparent and automatically available to all participating filesystems -
with the requirement that the data stored in the local disk cache
(ie. fscache) is *also* encrypted.

I have content encryption working for basic read and write on afs and Jeff
Layton is looking at how to make it work with ceph - but it's very much a work
in progress and things like truncate and mmap don't yet work with it.

Anyway, the library, as I'm currently writing it, maintains a list of
byte-range dirty regions on each inode, where a dirty region may span multiple
folios and a folio may contribute to multiple regions.  That pages are
involved is then merely an implementation detail.
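
The relationship between regions and folios can be sketched as follows.
This is an illustration only, not the library's data structure: it assumes a
fixed 4096-byte folio size (real folios vary in size), keeps regions in a
plain array rather than a per-inode list, and all the sketch_* names are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_FOLIO_SIZE 4096ULL	/* assumed fixed for the sketch */

/* A dirty byte range [start, end) on an inode. */
struct sketch_dirty_region {
	unsigned long long start;	/* first dirty byte */
	unsigned long long end;		/* one past the last dirty byte */
};

/* Index of the first folio the region touches. */
static unsigned long long sketch_first_folio(const struct sketch_dirty_region *r)
{
	return r->start / SKETCH_FOLIO_SIZE;
}

/* Index of the last folio the region touches. */
static unsigned long long sketch_last_folio(const struct sketch_dirty_region *r)
{
	assert(r->end > r->start);
	return (r->end - 1) / SKETCH_FOLIO_SIZE;
}

/* Count how many regions have at least one byte in the given folio. */
static size_t sketch_regions_touching_folio(const struct sketch_dirty_region *regions,
					    size_t nr, unsigned long long index)
{
	size_t count = 0;

	for (size_t i = 0; i < nr; i++)
		if (sketch_first_folio(&regions[i]) <= index &&
		    index <= sketch_last_folio(&regions[i]))
			count++;
	return count;
}
```

A region covering bytes 100-8999 spans folios 0 to 2, and a second region at
bytes 9000-9099 shares folio 2 with it, so both cases from the paragraph
above show up.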

Content encryption/compression blocks may be any power-of-2 size, from 2 bytes
to megabytes, and this need bear no relation to page size.  The library calls
the crypto hooks for each crypto block in the chunk[*] to be encrypted or
decrypted.

[*] Terminology is such fun.  I have to deal with pages, crypto blocks, object
    layout blocks, I/O blocks (rsize/wsize settings), regions.
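
The per-block calling pattern might look something like this.  It is a
sketch under assumptions, not the actual hook interface: the callback
signature, the requirement that the chunk be block-aligned, and the toy XOR
"cipher" are all invented for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical hook: transform one crypto block in place. */
typedef void (*sketch_crypt_fn)(unsigned char *block, size_t block_size,
				unsigned long long block_no);

/*
 * Walk a chunk starting at file position 'chunk_pos' and call the hook once
 * per crypto block.  Returns the number of blocks processed.
 */
static size_t sketch_crypt_chunk(unsigned char *chunk, size_t chunk_len,
				 unsigned long long chunk_pos,
				 size_t block_size, sketch_crypt_fn fn)
{
	size_t nr = 0;

	/* Block size must be a power of two, at least 2 bytes. */
	assert(block_size >= 2 && (block_size & (block_size - 1)) == 0);
	/* The sketch assumes the chunk starts and ends on block boundaries. */
	assert(chunk_pos % block_size == 0 && chunk_len % block_size == 0);

	for (size_t off = 0; off < chunk_len; off += block_size) {
		fn(chunk + off, block_size, (chunk_pos + off) / block_size);
		nr++;
	}
	return nr;
}

/* Toy hook, NOT real encryption: XOR each byte with the block number. */
static void sketch_xor_hook(unsigned char *block, size_t block_size,
			    unsigned long long block_no)
{
	for (size_t i = 0; i < block_size; i++)
		block[i] ^= (unsigned char)block_no;
}
```

Nothing in the loop refers to page size; only the crypto block size and the
chunk bounds matter.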

In fact ->readpage(), ->writepage() and ->launder_page() are difficult when I
may be required to deal with blocks larger than the size of a page.  The page
being poked may be in the middle of a block, so I'm endeavouring to work
around that.  Using the regions should allow me to 'launder' an inode before
invalidating the pages attached to it, and the dirty region objects can stand
in for the dirty, writeback and fscache flags on a page.
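
The mid-block problem comes down to widening the byte range of the page
being poked until it lands on block boundaries.  A minimal sketch of that
arithmetic, assuming a power-of-2 block size as above (the sketch_* names
are hypothetical and this is not code from the library):

```c
#include <assert.h>

/* A byte range [start, end). */
struct sketch_span {
	unsigned long long start;
	unsigned long long end;
};

/*
 * Expand the range [pos, pos + len) outwards so that it starts and ends on
 * a block boundary.  'block_size' must be a power of two.
 */
static struct sketch_span sketch_expand_to_blocks(unsigned long long pos,
						  unsigned long long len,
						  unsigned long long block_size)
{
	struct sketch_span s;

	assert(len > 0);
	assert(block_size >= 2 && (block_size & (block_size - 1)) == 0);

	s.start = pos & ~(block_size - 1);				/* round down */
	s.end = (pos + len + block_size - 1) & ~(block_size - 1);	/* round up */
	return s;
}
```

With a 16KiB block, a 4KiB page at offset 20480 expands to the range
16384-32768, so the operation has to take in data from the neighbouring
pages as well.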

I've been building this on top of Willy's folio patchset, and so I've paused
for the moment whilst I wait to see what becomes of that.  If folios don't
get in or get renamed, I have a load of reworking to do.

Does this sound like something you'd be interested in looking at more
generally than just network filesystems?

David


