From: Chris Li <chrisl@kernel.org>
To: Yosry Ahmed <yosryahmed@google.com>
Cc: Minchan Kim <minchan@kernel.org>,
Johannes Weiner <hannes@cmpxchg.org>,
Sergey Senozhatsky <senozhatsky@chromium.org>,
lsf-pc@lists.linux-foundation.org, Linux-MM <linux-mm@kvack.org>,
Michal Hocko <mhocko@kernel.org>,
Shakeel Butt <shakeelb@google.com>,
David Rientjes <rientjes@google.com>,
Hugh Dickins <hughd@google.com>,
Seth Jennings <sjenning@redhat.com>,
Dan Streetman <ddstreet@ieee.org>,
Vitaly Wool <vitaly.wool@konsulko.com>,
Yang Shi <shy828301@gmail.com>, Peter Xu <peterx@redhat.com>,
Andrew Morton <akpm@linux-foundation.org>,
Rik van Riel <riel@surriel.com>
Subject: Re: [LSF/MM/BPF TOPIC] Swap Abstraction / Native Zswap
Date: Thu, 2 Mar 2023 09:05:28 -0800
Message-ID: <ZADXWMtKDDq1xP3s@google.com>
In-Reply-To: <CAJD7tkYbLG1=YbBCF7bj9MbRcH0FkdjTi4tRZ+vAxE5DUodFAg@mail.gmail.com>
On Wed, Mar 01, 2023 at 04:58:08PM -0800, Yosry Ahmed wrote:
> > The indirection layer would be essential to support it, but it would
> > also be great if we didn't waste any memory for users who don't
> > want the feature.
>
> I can't currently think of a way to eliminate the overhead for people only
> using swapfiles, as a lot of the core implementation changes, unless
> we want to maintain considerably more code with a lot of repeated
> functionality implemented differently. Perhaps this will change as I
> implement this; maybe things are better (or worse) than I think
> they are. I am actively working on a proof-of-concept right now. Maybe
> a discussion at LSF/MM/BPF will help come up with optimizations as
> well :)
>
> >
> > Just FYI, there was a similar discussion a long time ago about the
> > indirection layer.
> > https://lore.kernel.org/linux-mm/4DA25039.3020700@redhat.com/
>
> Yeah Hugh shared this one with me earlier, but there are a few things
> that I don't understand how they would work, at least in today's
> world.
Let's add Rik to the discussion; maybe he can help refresh some details.
Chris
>
> Firstly, the proposal suggests that we store a radix tree index in the
> page tables, and in the radix tree store the swap entry AND the swap
> count. I am not really sure how they would fit in 8 bytes, especially
> if we need continuation and 1 byte is not enough for the swap count.
> Continuation logic now depends on linking vmalloc'd pages using the
> lru field in struct page/folio. Perhaps we can figure out a split that
> gives enough space for swap count without continuation while also not
> limiting swapfile sizes too much.
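To make the 8-byte question concrete, here is a minimal C sketch of one
possible split of a 64-bit slot into swap count, swap type, and swap
offset. The field widths (16/5/43) and all slot_* names are assumptions
made up for illustration; they are not taken from the kernel or from the
earlier proposal. With 4 KiB pages, a 43-bit offset addresses 2^43 pages
(32 PiB per swapfile), and a 16-bit count saturates at 65535 instead of
needing continuation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical field widths; every number here is an assumption. */
#define SLOT_COUNT_BITS   16u
#define SLOT_TYPE_BITS     5u
#define SLOT_OFFSET_BITS  43u

static_assert(SLOT_COUNT_BITS + SLOT_TYPE_BITS + SLOT_OFFSET_BITS == 64,
              "the three fields must fill exactly 8 bytes");

#define SLOT_TYPE_SHIFT   SLOT_OFFSET_BITS
#define SLOT_COUNT_SHIFT  (SLOT_OFFSET_BITS + SLOT_TYPE_BITS)

#define SLOT_OFFSET_MAX   ((UINT64_C(1) << SLOT_OFFSET_BITS) - 1)
#define SLOT_TYPE_MAX     ((UINT64_C(1) << SLOT_TYPE_BITS) - 1)
#define SLOT_COUNT_MAX    ((UINT64_C(1) << SLOT_COUNT_BITS) - 1)

/* Pack the three fields; out-of-range inputs are truncated by the masks. */
static inline uint64_t slot_pack(uint64_t type, uint64_t offset, uint64_t count)
{
    return ((count & SLOT_COUNT_MAX) << SLOT_COUNT_SHIFT) |
           ((type & SLOT_TYPE_MAX) << SLOT_TYPE_SHIFT) |
           (offset & SLOT_OFFSET_MAX);
}

static inline uint64_t slot_offset(uint64_t slot)
{
    return slot & SLOT_OFFSET_MAX;
}

static inline uint64_t slot_type(uint64_t slot)
{
    return (slot >> SLOT_TYPE_SHIFT) & SLOT_TYPE_MAX;
}

static inline uint64_t slot_count(uint64_t slot)
{
    return slot >> SLOT_COUNT_SHIFT;
}

/* Bump the count in place; returns false when the field is saturated. */
static inline bool slot_count_inc(uint64_t *slot)
{
    if (slot_count(*slot) == SLOT_COUNT_MAX)
        return false;
    *slot += UINT64_C(1) << SLOT_COUNT_SHIFT;
    return true;
}
```

Widening the count field shrinks the maximum swapfile size by the same
number of bits, which is the trade-off described in the quoted paragraph.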
>
> Secondly, IIUC in that proposal once we swap a page in, we free the
> swap entry and add the swapcache page to the radix tree instead. In
> that case, where does the swap count go? IIUC we still need to
> maintain it to be able to tell when all processes mapping the page
> have faulted it back, otherwise the radix tree entry is maintained
> indefinitely. We can maybe stash the swap count somewhere else in this
> case, and bring it back to the radix tree if we swap the page out
> again. I am not really sure where: we could have a separate radix tree
> for swap counts when the page is in swapcache, or we could always keep
> it in a separate radix tree so that the swap entry fits comfortably in
> the first radix tree.
>
> To be able to accommodate zswap in this design, I think we always need
> a separate radix tree for swap counts. In that case, one radix tree
> contains swap_entry/zswap_entry/swapcache, and the other radix tree
> contains the swap count. I think this may work, but I am not sure if
> the overhead of always doing a lookup to read the swap count is okay.
> I am also sure there would be some fun synchronization problems
> between both trees (but we already need to synchronize today between
> the swapcache and swap counts?).
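A toy C model of the two-tree layout may help when reasoning about the
extra lookup. Two flat arrays stand in for the two radix trees, and every
name below (entry_tree, count_tree, desc_*) is hypothetical; this is a
sketch of the idea under those assumptions, not kernel code, and it
ignores locking entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_SLOTS 64u  /* toy capacity; a flat array replaces each radix tree */

/* What the first "tree" can point at, per the quoted design. */
enum backing { BACKING_NONE, BACKING_SWAP, BACKING_ZSWAP, BACKING_SWAPCACHE };

struct swap_desc {
    enum backing kind;
    uint64_t payload;   /* swap entry, zswap entry, or page, as an opaque value */
};

static struct swap_desc entry_tree[NR_SLOTS];  /* index -> backing */
static uint32_t count_tree[NR_SLOTS];          /* index -> swap count */

/* Install or replace the backing for an index; the count is untouched. */
static bool desc_store(uint32_t idx, enum backing kind, uint64_t payload)
{
    if (idx >= NR_SLOTS || kind == BACKING_NONE)
        return false;
    entry_tree[idx].kind = kind;
    entry_tree[idx].payload = payload;
    return true;
}

/* Take a reference: needs a lookup in both structures. */
static bool desc_get(uint32_t idx)
{
    if (idx >= NR_SLOTS || entry_tree[idx].kind == BACKING_NONE)
        return false;
    count_tree[idx]++;
    return true;
}

/* Drop a reference; returns true when the last one went away and
 * the entry was removed from both structures. */
static bool desc_put(uint32_t idx)
{
    if (idx >= NR_SLOTS || count_tree[idx] == 0)
        return false;
    assert(entry_tree[idx].kind != BACKING_NONE);
    if (--count_tree[idx] > 0)
        return false;
    entry_tree[idx].kind = BACKING_NONE;
    entry_tree[idx].payload = 0;
    return true;
}
```

Because the count lives apart from the backing, desc_store() can switch an
index between swap, zswap, and swapcache without touching the count, while
desc_get() and desc_put() each touch both structures. That second lookup
is the overhead the quoted paragraph is worried about.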
>
> It sounds like it is possible to make it work. I will spend some time
> thinking about it. Having 2 radix trees also solves the 32-bit systems
> problem, but I am not sure if it's a generally better design. Radix
> trees also take up some extra space other than the entry size itself,
> so I am not sure how much memory we would end up actually saving.
>
> Johannes, I am curious if you have any thoughts about this alternative design?
>