linux-mm.kvack.org archive mirror
From: Chris Li <chrisl@kernel.org>
To: David Howells <dhowells@redhat.com>
Cc: lsf-pc@lists.linux-foundation.org,
	Matthew Wilcox <willy@infradead.org>,
	 netfs@lists.linux.dev, linux-fsdevel@vger.kernel.org,
	linux-mm@kvack.org
Subject: Re: [LSF/MM/BPF TOPIC] Large folios, swap and fscache
Date: Thu, 22 Feb 2024 14:45:30 -0800	[thread overview]
Message-ID: <CAF8kJuNt2Vqk0yGkuz7qHAui7tb9B1W6U+SLyTmc6N2ngCU53A@mail.gmail.com> (raw)
In-Reply-To: <2701740.1706864989@warthog.procyon.org.uk>

Hi David,

On Fri, Feb 2, 2024 at 1:10 AM David Howells <dhowells@redhat.com> wrote:
>
> Hi,
>
> The topic came up in a recent discussion about how to deal with large folios
> when it comes to swap as a swap device is normally considered a simple array
> of PAGE_SIZE-sized elements that can be indexed by a single integer.

Sorry for being late to the party. I think I was the one who brought
this topic up in the online discussion with Will and you. Let me know
if you are referring to a different discussion.

>
> With the advent of large folios, however, we might need to change this in
> order to be better able to swap out a compound page efficiently.  Swap
> fragmentation raises its head, as does the need to potentially save multiple
> indices per folio.  Does swap need to grow more filesystem features?

Yes, with a large folio it is harder to allocate contiguous swap
entries, because 4K swap entries are allocated and freed all the time.
The resulting fragmentation will likely leave the swap file with very
few contiguous runs of swap entries.
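
To make the fragmentation point concrete, here is a minimal userspace
sketch (plain C, made-up data, not kernel code): the swap map is
modeled as a small bitmap and we scan for a run of contiguous free
slots, the way an order-2 folio (four 4K entries) would need:

#include <stdio.h>

#define NR_SLOTS 32

/* 1 = slot in use, 0 = free.  A fragmented map: free slots exist,
 * but no run of 4 contiguous free slots for an order-2 folio. */
static const int swap_map[NR_SLOTS] = {
        1,0,1,0, 1,1,0,1, 0,1,1,0, 1,0,1,1,
        0,0,1,0, 1,0,1,1, 0,1,0,1, 1,0,0,1,
};

/* Return the first offset of a run of 'count' free slots, or -1. */
static int find_contiguous_free(int count)
{
        int run = 0;

        for (int i = 0; i < NR_SLOTS; i++) {
                run = swap_map[i] ? 0 : run + 1;
                if (run == count)
                        return i - count + 1;
        }
        return -1;
}

int main(void)
{
        printf("4K entry  (order 0): offset %d\n", find_contiguous_free(1));
        printf("16K folio (order 2): offset %d\n", find_contiguous_free(4));
        return 0;
}

With the data above, a single 4K entry still finds a slot but the
order-2 folio does not, even though plenty of free space remains.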

We could change that assumption and allow a large folio to read and
write discontiguous blocks at the block device level. We would likely
need a filesystem-like indirection layer to store the locations of
those blocks. In other words, the folio needs to read/write a list of
IO vectors, not just one contiguous block.
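
Roughly, I am thinking of something like the following userspace
sketch (all structure and function names here are hypothetical, not
existing kernel APIs): each large folio gets a small extent list
recording where its pieces landed, and swap-out submits one IO segment
per extent instead of assuming a single contiguous range:

#include <stdio.h>

/* One contiguous piece of a folio on the swap device (hypothetical). */
struct swap_extent {
        unsigned long dev_block;        /* block offset on the swap device */
        unsigned int nr_blocks;         /* length of this contiguous run */
};

/* Hypothetical indirection record for one large folio. */
struct folio_swap_map {
        unsigned int nr_extents;
        struct swap_extent extents[4];
};

/* Writeback would submit one IO segment per extent, roughly one
 * iovec/bio_vec per discontiguous run. */
static void submit_folio_swapout(const struct folio_swap_map *map)
{
        for (unsigned int i = 0; i < map->nr_extents; i++)
                printf("  segment %u: %u block(s) at device block %lu\n",
                       i, map->extents[i].nr_blocks, map->extents[i].dev_block);
}

int main(void)
{
        /* A 4-block folio scattered across three discontiguous runs. */
        struct folio_swap_map map = {
                .nr_extents = 3,
                .extents = {
                        { .dev_block = 100, .nr_blocks = 2 },
                        { .dev_block = 340, .nr_blocks = 1 },
                        { .dev_block = 512, .nr_blocks = 1 },
                },
        };

        printf("swapping out one order-2 folio as %u IO segments:\n",
               map.nr_extents);
        submit_folio_swapout(&map);
        return 0;
}

The extent list is exactly the metadata the indirection layer would
have to persist and look up again at swap-in time.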

>
> Further to this, we have at least two ways to cache data on disk/flash/etc. -
> swap and fscache - and both want to set aside disk space for their operation.
> Might it be possible to combine the two?
>
> One thing I want to look at for fscache is the possibility of switching from a
> file-per-object-based approach to a tagged cache more akin to the way OpenAFS
> does things.  In OpenAFS, you have a whole bunch of small files, each
> containing a single block (e.g. 256K) of data, and an index that maps a
> particular {volume,file,version,block} to one of these files in the cache.
>
> Now, I could also consider holding all the data blocks in a single file (or
> blockdev) - and this might work for swap.  For fscache, I do, however, need to
> have some sort of integrity across reboots that swap does not require.

The main trade-off is the memory usage of the metadata and the latency
of reads and writes. A file system typically has a different IO
pattern than swap, e.g. file reads can be batched and have good
locality, whereas swap does a lot of reads and writes at random
locations.

The current swap code uses an array-like swap entry; one advantage of
that is that only one IO is required per folio.
Performance gets worse if swap first has to read metadata to locate
the block and only then read the data block in.
Page fault latency will get longer. That is one of the trade-offs we
need to consider.
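
To spell the trade-off out, here is a small userspace sketch
(hypothetical names, not kernel code): with the current array-like
scheme the device offset is pure arithmetic on the swap entry, so
swap-in is a single IO; with an indirection layer the metadata lookup
comes first and may itself cost an extra IO on the page fault path:

#include <stdio.h>

#define PAGE_SHIFT 12

/* Current scheme: the swap entry offset *is* the device offset, so
 * locating the data is pure arithmetic, followed by one read. */
static unsigned long direct_location(unsigned long swp_offset)
{
        return swp_offset << PAGE_SHIFT;        /* one IO: read the data */
}

/* Hypothetical indirection table: swap entry -> device block. */
static const unsigned long indirection_table[] = { 700, 23, 408, 91 };

/* Indirected scheme: first consult the metadata (possibly another IO
 * if the table is not cached), then read the data block. */
static unsigned long indirected_location(unsigned long swp_offset)
{
        unsigned long dev_block = indirection_table[swp_offset]; /* IO #1 (metadata) */
        return dev_block << PAGE_SHIFT;                          /* IO #2 (data) */
}

int main(void)
{
        unsigned long entry = 2;

        printf("direct:     entry %lu -> byte offset %lu (1 IO)\n",
               entry, direct_location(entry));
        printf("indirected: entry %lu -> byte offset %lu (up to 2 IOs)\n",
               entry, indirected_location(entry));
        return 0;
}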

Chris

