From: Kairui Song <ryncsn@gmail.com>
To: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kairui Song via B4 Relay <devnull+kasong.tencent.com@kernel.org>,
linux-mm@kvack.org, Andrew Morton <akpm@linux-foundation.org>,
David Hildenbrand <david@kernel.org>,
Lorenzo Stoakes <lorenzo.stoakes@oracle.com>,
Zi Yan <ziy@nvidia.com>,
Baolin Wang <baolin.wang@linux.alibaba.com>,
Barry Song <baohua@kernel.org>, Hugh Dickins <hughd@google.com>,
Chris Li <chrisl@kernel.org>,
Kemeng Shi <shikemeng@huaweicloud.com>,
Nhat Pham <nphamcs@gmail.com>, Baoquan He <bhe@redhat.com>,
Yosry Ahmed <yosry.ahmed@linux.dev>,
Youngjun Park <youngjun.park@lge.com>,
Chengming Zhou <chengming.zhou@linux.dev>,
Roman Gushchin <roman.gushchin@linux.dev>,
Shakeel Butt <shakeel.butt@linux.dev>,
Muchun Song <muchun.song@linux.dev>,
Qi Zheng <zhengqi.arch@bytedance.com>,
linux-kernel@vger.kernel.org, cgroups@vger.kernel.org
Subject: Re: [PATCH RFC 00/15] mm, swap: swap table phase IV with dynamic ghost swapfile
Date: Tue, 24 Feb 2026 10:10:42 +0800
Message-ID: <CAMgjq7AyL4=cN1mQ=i56j-kOvEaZXyT-3Wu063vM5JijXcFDLg@mail.gmail.com>
In-Reply-To: <aZyFxKGXc8J6PIij@cmpxchg.org>
On Tue, Feb 24, 2026 at 1:00 AM Johannes Weiner <hannes@cmpxchg.org> wrote:
>
> On Fri, Feb 20, 2026 at 07:42:01AM +0800, Kairui Song via B4 Relay wrote:
> > - 8 bytes per slot memory usage, when using only plain swap.
> > - And the memory usage can be reduced to 3 or only 1 byte.
> > - 16 bytes per slot memory usage, when using ghost / virtual zswap.
> > - Zswap can just use ci_dyn->virtual_table to free up its content
> >   completely.
> > - And the memory usage can be reduced to 11 or 8 bytes using the same
> > code above.
> > - 24 bytes per slot only if reverse mapping is in use.
>
> That seems to tie us pretty permanently to duplicate metadata.
>
> For every page that was written to disk through zswap, we have an
> entry in the ghost swapfile, and an entry in the backend swapfile, no?
No, there is only one entry, in the ghost swapfile (xswap or virtual
swapfile; it's just a name anyway). The entry in the physical swapfile
is a reverse mapping entry: it records which slot in the ghost
swapfile points to the physical slot, so swapoff / migration of the
physical slot can be done in O(1) time.
So there is no duplication of any data.
>
> > - Minimal code review or maintenance burden. All layers are using the exact
> > same infrastructure for metadata / allocation / synchronization, making
> > all API and conventions consistent and easy to maintain.
> > - Writeback, migration and compaction are easily supportable since both
> > reverse mapping and reallocation are prepared. We just need a
> > folio_realloc_swap to allocate new entries for the existing entry, and
> > fill the swap table with a reserve map entry.
> > - Fast swapoff: Just read into ghost / virtual swap cache.
>
> Can we get this for disk swap as well? ;)
>
> Zswap swapoff is already fairly fast, albeit CPU intense. It's the
> scattered IO that makes swapoff on disks so terrible.
I am talking about disk swap here, not zswap. Swapping off a physical
entry just loads the swap data into the virtual slot according to the
reverse mapping entry.
> > free -m
> >                total        used        free      shared  buff/cache   available
> > Mem:            1465         250         927           1         356        1215
> > Swap:       15269887           0    15269887
>
> I'm not a fan of this. This makes free(1) output kind of useless, and
> very misleading. The swap space presented here has nothing to do with
> actual swap capacity, and the actual disk swap capacity is obscured.
>
> And how would a user choose this size? How would a distribution?
It can be dynamic (just si->max += 2M on every cluster allocation,
since it's really just a number now). It can be hidden, and it can
have an infinite size. That's just an interface design detail that can
be changed flexibly.
For example, if we set this to a very large value and hide it, it will
look identical to vss from the userspace perspective, while staying
optional and zero-overhead for existing ZRAM or plain swap users.
> The only limit is compression ratio, and you don't know this in
> advance. This restriction seems pretty arbitrary and avoidable.
Just as a reference: in practice we limit our ZRAM setup to 1/4 of, or
1:1 with, the total RAM, to keep the machine from going into endless
reclaim without ever going OOM.
But with this series we can also have an infinitely sized zswap.