From: Chris Li <chrisl@kernel.org>
To: Jared Hulbert <jaredeh@gmail.com>
Cc: linux-mm <linux-mm@kvack.org>
Subject: Re: [LSF/MM/BPF TOPIC] Swap Abstraction "the pony"
Date: Wed, 6 Mar 2024 10:16:12 -0800
Message-ID: <CANeU7QmYcHmyFMqjyBe1ij6KNCiFMKQx7MkWNANhmy8A8wnMjA@mail.gmail.com>
In-Reply-To: <CA+ZsKJ7JE56NS6hu4L_uyywxZO7ixgftvfKjdND9e5SOyn+72Q@mail.gmail.com>

On Wed, Mar 6, 2024 at 2:39 AM Jared Hulbert <jaredeh@gmail.com> wrote:
>
> On Tue, Mar 5, 2024 at 9:51 PM Chris Li <chrisl@kernel.org> wrote:

> > If your files are 4K each and you need to store millions of such
> > small files, referenced by an integer as the filename, typical file
> > systems like ext4 or btrfs will definitely not be able to keep the
> > metadata at 1% of storage for that kind of usage. 1% of 4K is about
> > 40 bytes. Your typical inode struct is much bigger than that. Last I
> > checked, sizeof(struct inode) is 632.
>
> Okay, that is an interesting difference in assumptions.  I see no need
> to have file == page; I think it would be insane to have an inode per
> swap page.  You'd have one big "file" and use offsets.  Or a file per
> cgroup, etc.

Then you are back to designing your own data structure to manage how to
map swap entries into offsets within that large file. The swap file is
already one large file, and it can group clusters as smaller large
files internally. Why not use the swap file directly? The VFS does not
really help; it is more of a burden to maintain all those superblocks,
directories, inodes, etc.
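
For illustration, mapping a swap entry to a byte position in that one
large swap file is essentially just a shift. A minimal sketch
(user-space toy code; swap_file_pos() is a made-up helper for this
example, not the kernel API):

#include <stdint.h>
#include <stdio.h>

#define PAGE_SHIFT 12

/* a swap entry is conceptually (type, page-sized slot index) */
struct swp_entry {
	unsigned int type;   /* which swap file/device */
	uint64_t offset;     /* slot index within it */
};

/* hypothetical helper: byte position of the entry in its swap file */
static uint64_t swap_file_pos(struct swp_entry e)
{
	return e.offset << PAGE_SHIFT;
}

int main(void)
{
	struct swp_entry e = { .type = 0, .offset = 12345 };

	printf("slot %llu -> byte offset %llu in the swap file\n",
	       (unsigned long long)e.offset,
	       (unsigned long long)swap_file_pos(e));
	return 0;
}

No inode, dentry, or extent tree is needed for that lookup; the slot
index is the whole story.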


> Remember, I'm advocating a subset of the VFS interface, learning from
> it, not using it as is.

You can't really use a subset without having the other parts dragged
along. Most of the VFS operations, those op callback functions, do not
apply to swap directly anyway.
If you say the VFS is just an inspiration, then that is more or less
what I had in mind earlier :-)
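
To make that concrete, here is a purely hypothetical sketch of the
handful of operations a swap backend actually needs, as opposed to the
dozens of callbacks in file_operations / inode_operations /
super_operations. The names below are made up for illustration, not an
existing kernel interface:

/* hypothetical, for illustration only */
struct swap_backend;    /* opaque backend instance */
struct page;            /* declared here just to keep the sketch standalone */

struct swap_backend_ops {
	/* allocate nr contiguous slots; return first slot or negative errno */
	long (*alloc_slots)(struct swap_backend *be, unsigned int nr);
	/* release a previously allocated slot */
	void (*free_slot)(struct swap_backend *be, unsigned long slot);
	/* move one page between memory and its slot */
	int (*read_page)(struct swap_backend *be, unsigned long slot,
			 struct page *page);
	int (*write_page)(struct swap_backend *be, unsigned long slot,
			  struct page *page);
};

Everything else in the VFS (lookup, permissions, directory handling) is
the dead weight mentioned above.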

>
> > > From a fundamental architecture standpoint it's not a stretch to think
> > > that a modified filesystem would be meet or beat existing swap engines
> > > on metadata overhead.
> >
> > Please show me one file system that can beat the existing swap system
> > in the swap specific usage case (load/store of individual 4K pages), I
> > am interested in learning.
>
> Well mind you I'm suggesting a modified filesystem and this is hard to
> compare apples to apples, but sure... here we go :)
>
> Consider an unmodified EXT4 vs ZRAM with a backing device of the same
> sizes, same hardware.
>
> Using the page cache as a (bad) proxy for RAM caching in the EXT4 case,
> and comparing against ZRAM without sending anything to the backing
> store: ZRAM is faster at reads, while EXT4 is a little faster at
> writes.
>
>       | ZRAM     | EXT4     |
> -----------------------------
> read  | 4.4 GB/s | 2.5 GB/s |
> write | 643 MB/s | 658 MB/s |
>
> If you look at what happens when you actually get things to and from
> the disk, ZRAM is a tiny bit faster at reads but way slower at writes.
>
>       | ZRAM      | EXT4      |
> -------------------------------
> read  | 1.14 GB/s | 1.10 GB/s |
> write | 82.3 MB/s |  548 MB/s |

I am more interested in the per-swap-entry memory overhead.

Without knowing how you map the swap entry onto file reads/writes, I
have no idea how to interpret those numbers in the swap backend usage
context. ZRAM is just a block device; it does not participate in how
the swap entry is allocated or freed. ZRAM does compression, which is
CPU intensive, while EXT4 doesn't, so it is understandable that ZRAM
might have lower write bandwidth. I am not sure how those numbers
translate into a prediction of how a file-system-based swap backend
would perform.
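
To put a rough number on the overhead question, here is a
back-of-the-envelope sketch using the figures already mentioned in this
thread. The 1 byte per slot for swap_map and 2 bytes per slot for the
memcg swap record are the main fixed per-slot costs in the current
implementation; swap cache and cluster bookkeeping are deliberately not
counted here:

#include <stdio.h>

int main(void)
{
	const double page = 4096.0;      /* one 4K swap slot */
	const double swap_map = 1.0;     /* swap_map: 1 byte per slot */
	const double swap_cgroup = 2.0;  /* memcg swap record: 2 bytes per slot */
	const double inode = 632.0;      /* sizeof(struct inode) quoted earlier */

	printf("swap_map + cgroup record: %.3f%% of a 4K page\n",
	       100.0 * (swap_map + swap_cgroup) / page);
	printf("one inode per 4K page:    %.1f%% of a 4K page\n",
	       100.0 * inode / page);
	return 0;
}

That prints roughly 0.07% versus 15.4%, which illustrates why the
per-entry memory overhead is the comparison I care about here.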

Regards,

Chris

