From: Chris Li <chrisl@kernel.org>
To: Yosry Ahmed <yosry.ahmed@linux.dev>
Cc: Johannes Weiner <hannes@cmpxchg.org>,
Andrew Morton <akpm@linux-foundation.org>,
Kairui Song <kasong@tencent.com>,
Kemeng Shi <shikemeng@huaweicloud.com>,
Nhat Pham <nphamcs@gmail.com>, Baoquan He <bhe@redhat.com>,
Barry Song <baohua@kernel.org>,
Chengming Zhou <chengming.zhou@linux.dev>,
linux-mm@kvack.org, Rik van Riel <riel@surriel.com>,
linux-kernel@vger.kernel.org, pratmal@google.com,
sweettea@google.com, gthelen@google.com, weixugc@google.com
Subject: Re: [PATCH RFC] mm: ghost swapfile support for zswap
Date: Tue, 25 Nov 2025 22:50:02 +0400
Message-ID: <CACePvbUB8wjEH1rsaRs+LwX4RKGrjLFSGzjhgrdS3e7Lcz6BeQ@mail.gmail.com>
In-Reply-To: <2a8fd7bd35939b9aa4a7267c93e1fda995137966@linux.dev>
On Mon, Nov 24, 2025 at 11:32 PM Yosry Ahmed <yosry.ahmed@linux.dev> wrote:
>
> I think what Chris's idea is (and Chris correct me if I am wrong), is
> that we use ghost swapfiles (that are not backed by disk space) for
> zswap. So zswap has its own swapfiles, separate from disk swapfiles.
Ack.
> memory.tiers establishes the ordering between swapfiles, so you put
> "ghost" -> "real" to get today's zswap writeback behavior. When you
> writeback, you keep page tables pointing at the swap entry in the ghost
> swapfile. What you do is:
> - Allocate a new swap entry in the "real" swapfile.
> - Update the swap table of the "ghost" swapfile to point at the swap
> entry in the "real" swapfile, reusing the pointer used for the
> swapcache.
Ack, with a minor adjustment: the redirection maps the swap entry to a
physical location. The swap entry has a swap cache; the physical
location does not.
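
To make the redirection concrete, here is a toy user-space sketch of a
front end slot whose payload switches from a zswap entry to a physical
location on writeback. It is not code from the patch. Every name in it
(swap_slot, phys_loc, slot_writeback) is hypothetical and only
illustrates the idea that the slot stays put while its payload changes.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical tag for what a front end swap table slot currently holds. */
enum slot_kind { SLOT_FREE, SLOT_ZSWAP, SLOT_REDIRECT };

/* Hypothetical physical location: a device index plus a block offset.
 * It has no swap cache of its own. */
struct phys_loc {
	uint32_t dev;
	uint64_t block;
};

/* Hypothetical front end slot. Page tables keep pointing at this slot. */
struct swap_slot {
	enum slot_kind kind;
	union {
		void *zswap_entry;    /* compressed copy held in memory */
		struct phys_loc phys; /* where the data went after writeback */
	} u;
};

/* Writeback: the slot stays allocated, only its payload changes.
 * Returns false if the slot does not currently hold a zswap entry. */
static bool slot_writeback(struct swap_slot *slot, struct phys_loc dst)
{
	if (slot->kind != SLOT_ZSWAP)
		return false;
	slot->kind = SLOT_REDIRECT;
	slot->u.phys = dst;
	return true;
}
```

In this model the page table entry never changes on writeback, which is
why the redirection can reuse the per-slot storage.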
> Then, on swapin, you read the swap table of the "ghost" swapfile, find
> the redirection, and read to the swap table of the "real" swapfile, then
> read the page from disk into the swap cache. The redirection in the
> "ghost" swapfile will keep existing, wasting that slot, until all
> references to it are dropped.
Ack. That is assuming we don't have something like rmap for the swap
entry.
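
A toy sketch of that lifetime rule, under the assumption that there is
no rmap-like mechanism to rewrite the page tables: swapin follows the
redirection, and the ghost slot becomes reusable only once its last
reference is dropped. The names (ghost_slot, ghost_swapin_target,
ghost_slot_put) are hypothetical and not from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical ghost swapfile slot. */
struct ghost_slot {
	unsigned int refs;   /* page table entries still naming this slot */
	bool redirected;     /* true once the data was written back */
	uint64_t phys_block; /* valid only when redirected is true */
};

/* Swapin lookup: if the slot was redirected, report the block to read
 * from disk and return true. Otherwise the data is still in zswap. */
static bool ghost_swapin_target(const struct ghost_slot *slot,
				uint64_t *block)
{
	if (!slot->redirected)
		return false;
	*block = slot->phys_block;
	return true;
}

/* Drop one reference. Returns true when no references remain, which is
 * the only point at which the slot can be reused. */
static bool ghost_slot_put(struct ghost_slot *slot)
{
	if (slot->refs > 0)
		slot->refs--;
	return slot->refs == 0;
}
```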
> I think this might work for this specific use case, with less overhead
> than the xarray. BUT there are a few scenarios that are not covered
> AFAICT:
>
> - You still need to statically size the ghost swapfiles and their
> overheads.
Not true. Both the ghost swapfile and the physical swapfile can expand
with additional clusters beyond the original physical size, for
allocating contiguous high-order entries or redirections. A ghost
swapfile has no physical layer, only the front end, so its size can
grow dynamically: just allocate more clusters. The size in the current
swapfile header is just an initial size. My current patch does not
implement that; it will need a later swap table phase to make it
happen. But that is not an architectural limit; it has been considered
as part of normal business.
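
As a rough illustration of "just allocate more clusters", here is a toy
user-space model of a front end that grows by appending clusters. It
is not code from the patch. The names (ghost_swapfile, ghost_grow) and
the cluster size of 512 slots are assumptions made for the example.

```c
#include <stddef.h>
#include <stdlib.h>

#define SLOTS_PER_CLUSTER 512 /* assumed value, for illustration only */

/* Hypothetical cluster: a fixed-size array of front end slots. */
struct cluster {
	unsigned long slots[SLOTS_PER_CLUSTER];
};

/* Hypothetical ghost swapfile: only a front end, no backing blocks. */
struct ghost_swapfile {
	struct cluster **clusters;
	size_t nr_clusters;
};

/* Append `extra` zeroed clusters. Returns 0 on success, -1 if an
 * allocation fails. Clusters added before a failure stay in place. */
static int ghost_grow(struct ghost_swapfile *gs, size_t extra)
{
	struct cluster **tbl;
	size_t i;

	if (extra == 0)
		return 0;
	tbl = realloc(gs->clusters,
		      (gs->nr_clusters + extra) * sizeof(*tbl));
	if (!tbl)
		return -1;
	gs->clusters = tbl;
	for (i = 0; i < extra; i++) {
		struct cluster *c = calloc(1, sizeof(*c));

		if (!c)
			return -1;
		tbl[gs->nr_clusters++] = c;
	}
	return 0;
}
```

The initial size then only decides how many clusters exist at swapon
time in this model. Later growth does not touch existing slots.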
> - Wasting a slot in the ghost swapfile for the redirection. This
> complicates static provisioning a bit, because you have to account for
> entries that will be in zswap as well as writtenback. Furthermore,
> IIUC swap.tiers is intended to be generic and cover other use cases
> beyond zswap like SSD -> HDD. For that, I think wasting a slot in the
> SSD when we writeback to the HDD is a much bigger problem.
Yes and no. Yes, it wastes a front end swap entry (with swap cache),
but only that. The physical location is a separate layer. No, the
physical SSD space is not wasted, because you can allocate an
additional front end swap entry by growing the swap entry front end,
then have that additional entry point to the physical location you
just redirected away from. There is a lot more to consider about the
front end vs the physical layer. The physical layer does not care
about 2^N order-size alignment of locations. It cares a bit about
contiguity and the number of IOVs it needs to issue. The swap entry
front end and the physical layer have slightly different constraints.
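
The two kinds of constraint can be sketched as two small checks. This
is a toy illustration with hypothetical names (frontend_aligned,
phys_count_extents), assuming the front end wants an order-N entry to
start on a 2^N slot boundary and the physical layer wants few
contiguous runs of blocks.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Front end constraint: an order-N entry starts at a multiple of 2^N
 * slots. The caller must pass order < 64. */
static bool frontend_aligned(uint64_t offset, unsigned int order)
{
	return (offset & ((UINT64_C(1) << order) - 1)) == 0;
}

/* Physical constraint: count the runs of consecutive block numbers.
 * Each run would need one I/O vector in this model. */
static size_t phys_count_extents(const uint64_t *blocks, size_t n)
{
	size_t extents = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (i == 0 || blocks[i] != blocks[i - 1] + 1)
			extents++;
	}
	return extents;
}
```

For example, blocks 100, 101, 102 and 200 form two runs, regardless of
whether block 100 is aligned to any power of two.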
> - We still cannot do swapoff efficiently as we need to walk the page
> tables (and some swap tables) to find and swapin all entries in a
> swapfile. Not as important as other things, but worth mentioning.
That needs rmap for swap entries. It is an independent issue.
Chris