From: Barry Song <21cnbao@gmail.com>
To: Nhat Pham <nphamcs@gmail.com>
Cc: Chris Li <chrisl@kernel.org>,
lsf-pc@lists.linux-foundation.org, linux-mm <linux-mm@kvack.org>,
ryan.roberts@arm.com, David Hildenbrand <david@redhat.com>,
Chuanhua Han <hanchuanhua@oppo.com>
Subject: Re: [LSF/MM/BPF TOPIC] Swap Abstraction "the pony"
Date: Wed, 6 Mar 2024 14:33:14 +1300
Message-ID: <CAGsJ_4ysCN6f7qt=6gvee1x3ttbOnifGneqcRm9Hoeun=uFQ2w@mail.gmail.com>
In-Reply-To: <CAKEwX=P7AE8Ofqi4CyL0UOwSOVvHEG4kUFmRBzHH_N=NPxPDuA@mail.gmail.com>
On Fri, Mar 1, 2024 at 10:53 PM Nhat Pham <nphamcs@gmail.com> wrote:
>
> On Fri, Mar 1, 2024 at 4:24 PM Chris Li <chrisl@kernel.org> wrote:
> >
> > In last year's LSF/MM I talked about a VFS-like swap system. That is
> > the pony that was chosen.
> > However, I did not have much chance to go into details.
>
> I'd love to attend this talk/chat :)
>
> >
> > This year, I would like to discuss what it takes to re-architect the
> > whole swap back end from scratch.
> >
> > Let’s start from the requirements for the swap back end.
> >
> > 1) support the existing swap usage (not the implementation).
> >
> > Some other design goals:
> >
> > 2) low per swap entry memory usage.
> >
> > 3) low I/O latency.
> >
> > What are the functions the swap system needs to support?
> >
> > At the device level, the swap system needs to support a list of swap
> > files with a priority order. Swap devices with the same priority are
> > written in round-robin fashion. The swap device types include zswap,
> > zram, SSD, spinning hard disk, and swap files in a file system.
> >
> > At the swap entry level, here is the list of existing swap entry usage:
> >
> > * Swap entry allocation and free. Each swap entry needs to be
> > associated with a location of the disk space in the swapfile. (offset
> > of swap entry).
> > * Each swap entry needs to track the map count of the entry. (swap_map)
> > * Each swap entry needs to be able to find the associated memory
> > cgroup. (swap_cgroup_ctrl->map)
> > * Swap cache. Look up the folio/shadow from the swap entry.
> > * Swap page writes through a swapfile in a file system other than a
> > block device. (swap_extent)
> > * Shadow entry. (stored in the swap cache)
>
> IMHO, one thing this new abstraction should support is seamless
> transfer/migration of pages from one backend to another (perhaps from
> high to low priority backends, i.e. writeback).
>
> I think this will require some careful redesigns. The closest thing we
> have right now is zswap -> backing swapfile. But it is currently
> handled in a rather peculiar manner - the underlying swap slot has
> already been reserved for the zswap entry. But there are a couple of
> problems with this:
>
> a) This is wasteful. We're essentially having the same piece of data
> occupying space at two levels of the hierarchy.
> b) How do we generalize to a multi-tier hierarchy?
> c) This is a bit too backend-specific. It'd be nice if we could make
> this as backend-agnostic as possible.
>
> Motivation: I'm currently working on/thinking about decoupling zswap
> and swap, and this is one of the more challenging aspects (I can't
> seem to find a precedent in the swap world for migrating pages between
> swap backends), especially with respect to concurrent loads (and
> swapcache interactions).
>
> I don't have good answers/designs quite yet - just raising some
> questions/concerns :)

I actually have one more problem here. To swap in a large folio
with, say, 16 subpages, it could happen that 5 subpages are in
zswap and 11 are on the backend swap device. We have no way to
tell these apart unless we iterate over the subpages one by one
within the large folio before calling zswap_load(). Right now,
swap_read_folio() can't handle this:

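void swap_read_folio(struct folio *folio, bool synchronous,
		     struct swap_iocb **plug)
{
	...
	/*
	 * zswap_load() is asked about the folio as a whole, and the
	 * branches below are mutually exclusive, so the entire folio is
	 * read from exactly one source.
	 */
	if (zswap_load(folio)) {
		/* Data came from zswap. */
		folio_mark_uptodate(folio);
		folio_unlock(folio);
	} else if (data_race(sis->flags & SWP_FS_OPS)) {
		/* Swapfile on a file system: read through the fs ops. */
		swap_read_folio_fs(folio, plug);
	} else if (synchronous || (sis->flags & SWP_SYNCHRONOUS_IO)) {
		/* Synchronous block-device read. */
		swap_read_folio_bdev_sync(folio, sis);
	} else {
		/* Asynchronous block-device read. */
		swap_read_folio_bdev_async(folio, sis);
	}
	...
}
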
Thanks
Barry