From: David Howells <dhowells@redhat.com>
To: Matthew Wilcox <willy@infradead.org>
Cc: dhowells@redhat.com, lsf-pc@lists.linux-foundation.org,
netfs@lists.linux.dev, linux-fsdevel@vger.kernel.org,
linux-mm@kvack.org
Subject: Re: [LSF/MM/BPF TOPIC] Large folios, swap and fscache
Date: Fri, 02 Feb 2024 15:57:44 +0000
Message-ID: <2761655.1706889464@warthog.procyon.org.uk>
In-Reply-To: <Zbz8VAKcO56rBh6b@casper.infradead.org>
Matthew Wilcox <willy@infradead.org> wrote:
> So my modest proposal is that we completely rearchitect how we handle
> swap. Instead of putting swp entries in the page tables (and in shmem's
> case in the page cache), we turn swap into an (object, offset) lookup
> (just like a filesystem). That means that each anon_vma becomes its
> own swap object and each shmem inode becomes its own swap object.
> The swap system can then borrow techniques from whichever filesystem
> it likes to do (object, offset, length) -> n x (device, block) mappings.
That's basically what I'm suggesting, I think, but offloading the mechanics
down to a filesystem. That would be fine with me. bcachefs is a {key,val}
store, right?
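To make the (object, offset, length) -> n x (device, block) idea concrete,
here is a minimal sketch of an extent-style lookup. It is only an
illustration under assumptions: the names Extent, SwapObject and map_range
are hypothetical, all quantities are counted in blocks, and nothing here
reflects existing kernel, swap or bcachefs code.

```python
from typing import List, NamedTuple, Tuple


class Extent(NamedTuple):
    offset: int   # first block of the range within the object
    length: int   # number of blocks covered
    device: int   # hypothetical backing device number
    block: int    # first block on that device


class SwapObject:
    """Hypothetical per-object map, e.g. one per anon_vma or shmem inode."""

    def __init__(self) -> None:
        self.extents: List[Extent] = []  # kept sorted by offset, non-overlapping

    def add(self, ext: Extent) -> None:
        self.extents.append(ext)
        self.extents.sort()

    def map_range(self, offset: int, length: int) -> List[Tuple[int, int, int]]:
        """Return (device, block, length) pieces backing [offset, offset+length)."""
        end = offset + length
        pieces = []
        for ext in self.extents:
            if ext.offset >= end:
                break  # sorted, so nothing later can overlap
            lo = max(offset, ext.offset)
            hi = min(end, ext.offset + ext.length)
            if lo < hi:
                pieces.append((ext.device, ext.block + (lo - ext.offset), hi - lo))
        return pieces


obj = SwapObject()
obj.add(Extent(offset=0, length=4, device=0, block=100))
obj.add(Extent(offset=4, length=4, device=1, block=500))
pieces = obj.map_range(2, 4)  # spans both extents, so two pieces come back
```

A range that falls in a hole simply yields no pieces; a real implementation
would need to decide what a hole means for swap.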
> > Further to this, we have at least two ways to cache data on
> > disk/flash/etc. - swap and fscache - and both want to set aside disk space
> > for their operation. Might it be possible to combine the two?
> >
> > One thing I want to look at for fscache is the possibility of switching
> > from a file-per-object-based approach to a tagged cache more akin to the
> > way OpenAFS does things. In OpenAFS, you have a whole bunch of small
> > files, each containing a single block (e.g. 256K) of data, and an index
> > that maps a particular {volume,file,version,block} to one of these files
> > in the cache.
>
> I think my proposal above works for you? For each file you want to cache,
> create a swap object, and then tell swap when you want to read/write to
> the local swap object. What you do need is to persist the objects over
> a power cycle. That shouldn't be too hard ... after all, filesystems
> manage to do it.
Sure - but there is an integrity constraint that doesn't exist with swap.
There is also an additional feature of fscache: unless the cache entry is
locked in the cache (e.g. we're doing disconnected operation), we can throw
away an object from fscache and recycle it if we need space. In fact, this is
the way OpenAFS works: every write transaction done on a file/dir on the
server is done atomically and is given a monotonically increasing data version
number that is then used as part of the index key in the cache. So old
versions of the data get recycled as the cache needs to make space.
Which also means that if swap needs more space, it can just kick stuff out of
fscache if it is not locked in.
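The recycling behaviour described above can be sketched roughly as follows.
This is a toy model, not fscache or OpenAFS code: the Key layout follows the
{volume,file,version,block} tuple mentioned earlier, while VersionedCache,
its slot-count capacity, the least-recently-used eviction order and the
reclaim() hook for swap are all assumptions made for illustration.

```python
from collections import OrderedDict
from typing import NamedTuple, Optional


class Key(NamedTuple):
    volume: int
    file: int
    version: int  # data version bumped by the server on each write transaction
    block: int


class VersionedCache:
    """Toy cache with a fixed number of slots; unlocked entries are recyclable."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.entries: "OrderedDict[Key, bytes]" = OrderedDict()  # oldest first
        self.locked = set()  # keys pinned, e.g. for disconnected operation

    def lookup(self, key: Key) -> Optional[bytes]:
        data = self.entries.get(key)
        if data is not None:
            self.entries.move_to_end(key)  # mark as recently used
        return data

    def _evict_one(self) -> bool:
        for key in self.entries:
            if key not in self.locked:
                del self.entries[key]
                return True  # return at once; never iterate after deleting
        return False

    def store(self, key: Key, data: bytes, lock: bool = False) -> bool:
        if key not in self.entries:
            while len(self.entries) >= self.capacity:
                if not self._evict_one():
                    return False  # everything is locked in; no room
        self.entries[key] = data
        self.entries.move_to_end(key)
        if lock:
            self.locked.add(key)
        return True

    def reclaim(self, nslots: int) -> int:
        """Free up to nslots unlocked entries, e.g. when swap wants space."""
        freed = 0
        while freed < nslots and self._evict_one():
            freed += 1
        return freed


cache = VersionedCache(capacity=2)
cache.store(Key(1, 1, 1, 0), b"a", lock=True)
cache.store(Key(1, 1, 1, 1), b"b")
cache.store(Key(1, 1, 2, 1), b"c")  # new data version pushes out the old block
```

Because the data version is part of the key, a stale version is never
matched by a lookup for the current one; it just ages out when space is
needed.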
> All we need to do is figure out how to name the lookup (I don't think we
> need to use strings to name the swap object, but obviously we could). Maybe
> it's just a stream of bytes.
A binary blob would probably be better.
I would use a separate index to map higher level organisations, such as
cell+volume in afs or the server address + share name in cifs, to an index
number that can be used in the cache.
Further, I could do with a way to invalidate all objects matching a particular
subkey.
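As a rough sketch of how a binary key, a separate index and subkey
invalidation could fit together: the field widths, the VolumeIndex class and
the make_key() and invalidate_subkey() helpers below are all hypothetical,
and a plain dict stands in for the cache. With fixed-width big-endian
fields, a subkey is simply a prefix of the blob.

```python
import struct
from typing import Dict


class VolumeIndex:
    """Maps a higher-level name (cell+volume, server+share) to a small number."""

    def __init__(self) -> None:
        self._ids: Dict[bytes, int] = {}

    def index_for(self, name: bytes) -> int:
        # Existing names keep their number; new names get the next one.
        return self._ids.setdefault(name, len(self._ids) + 1)


def make_key(index_no: int, file_id: int, version: int, block: int) -> bytes:
    # Big-endian, no padding: 4 + 8 + 8 + 4 = 24 bytes (assumed layout).
    return struct.pack(">IQQI", index_no, file_id, version, block)


def invalidate_subkey(cache: Dict[bytes, bytes], prefix: bytes) -> int:
    """Drop every object whose key starts with prefix; return how many."""
    doomed = [key for key in cache if key.startswith(prefix)]
    for key in doomed:
        del cache[key]
    return len(doomed)


index = VolumeIndex()
afs_vol = index.index_for(b"example.org/root.cell")    # made-up cell+volume
cifs_share = index.index_for(b"//server/share")        # made-up server+share

cache = {
    make_key(afs_vol, 10, 1, 0): b"x",
    make_key(afs_vol, 11, 1, 0): b"y",
    make_key(cifs_share, 10, 1, 0): b"z",
}
# Invalidate everything belonging to the afs volume in one go.
dropped = invalidate_subkey(cache, struct.pack(">I", afs_vol))
```

A linear scan is used here only for brevity; an ordered store would let the
same prefix match be done as a range delete.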
David