From: Nhat Pham <nphamcs@gmail.com>
To: Kairui Song <ryncsn@gmail.com>
Cc: Wenchao Hao <haowenchao22@gmail.com>,
Andrew Morton <akpm@linux-foundation.org>,
Chengming Zhou <chengming.zhou@linux.dev>,
Jens Axboe <axboe@kernel.dk>,
Johannes Weiner <hannes@cmpxchg.org>,
Minchan Kim <minchan@kernel.org>,
Sergey Senozhatsky <senozhatsky@chromium.org>,
Yosry Ahmed <yosry@kernel.org>,
linux-block@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-mm@kvack.org, Barry Song <baohua@kernel.org>,
Xueyuan Chen <xueyuan.chen21@gmail.com>,
Wenchao Hao <haowenchao@xiaomi.com>
Subject: Re: [RFC PATCH v2 0/4] mm/zsmalloc: reduce zs_free() latency on swap release path
Date: Tue, 21 Apr 2026 11:25:17 -0700
Message-ID: <CAKEwX=P6L=CeJCXkC+cKbZKYk_ORho3pLy476c8=BbDianx=vg@mail.gmail.com>
In-Reply-To: <CAKEwX=PkFiP+u+ThrzjTKBi+usQf2uuhTZcfB2BNNA8RboOFDQ@mail.gmail.com>
On Tue, Apr 21, 2026 at 11:07 AM Nhat Pham <nphamcs@gmail.com> wrote:
>
> On Tue, Apr 21, 2026 at 10:18 AM Kairui Song <ryncsn@gmail.com> wrote:
> >
> > On Tue, Apr 21, 2026 at 11:55 PM Nhat Pham <nphamcs@gmail.com> wrote:
> > >
> >
> > Thanks for adding me to the Cc list :), Barry started this idea with
> > ZRAM, which looks very interesting to me.
> >
> > > On Tue, Apr 21, 2026 at 5:16 AM Wenchao Hao <haowenchao22@gmail.com> wrote:
> > > >
> > > > Swap freeing can be expensive when unmapping a VMA containing
> > > > many swap entries. This has been reported to significantly
> > > > delay memory reclamation during Android's low-memory killing,
> > > > especially when multiple processes are terminated to free
> > > > memory, with slot_free() accounting for more than 80% of
> > > > the total cost of freeing swap entries.
> > > >
> > > > Two earlier attempts by Lei and Zhiguo added a new thread in the mm core
> > > > to asynchronously collect and free swap entries [1][2], but the
> > > > design itself is fairly complex.
> > > >
> > > > When anon folios and swap entries are mixed within a
> > > > process, reclaiming anon folios from killed processes
> > > > helps return memory to the system as quickly as possible,
> > > > so that newly launched applications can satisfy their
> > > > memory demands. It is not ideal for swap freeing to block
> > > > anon folio freeing. On the other hand, swap freeing can
> > > > still return memory to the system, although at a slower
> > > > rate due to memory compression.
> > >
> > > Is this correct? I don't think we do decompression in the
> > > zswap_invalidate() path. We do decompression in zswap_load(), but as a
> > > separate step from zswap_invalidate().
> >
> > It's not about decompression. I think what Wenchao means here is:
> > freeing a swap entry also releases the backing compressed data, but
> > compared to freeing an actual folio (which returns a whole free folio
> > to relieve memory pressure), you may need to free many swap entries
> > to get one whole folio back, because the compressed data can be much
> > smaller than a folio, and fragmented. And swap entry freeing is
> > still not fast enough to be ignored.
>
> Ah, I see. That falls into the "not as much bang-for-your-buck as folio
> freeing" category. I agree on this point.
>
> >
> > >
> > > zswap/zsmalloc entry freeing is decoupled from decompression. For
> > > example, on process teardown, we free the zsmalloc memory but never
> > > decompress (if we do then it's a bug to be fixed lol, but I doubt it).
> > >
> > > Zsmalloc freeing might not be worth as much bang-for-your-buck wise
> > > compared to anon folio freeing, but if it's "expensive", then I think
> > > that points to a different root-cause: zsmalloc's poor scalability in
> > > the free path.
> >
> > That's a very nice insight. I previously had an idea: could we have
> > something like a zs_free() bulk API? Freeing handles one by one does
> > seem expensive.
> > https://lore.kernel.org/linux-mm/adt3Q_SRToF6fb3W@KASONG-MC4/
> >
> > It might be tricky to do so though.
> >
> > It would be best if we can speed up everything. Doing things async
> > doesn't reduce the total amount of work, and might cause more trouble,
> > like worker overhead, or delayed freeing causing more memory pressure
> > if the workqueue doesn't run in time. And if a process is almost
> > completely swapped out, this won't help at all.
> >
> > I'm not against the async idea, they might combine well.
>
> Completely agree! I was thinking about batching the free operations
> for zsmalloc. Right now it seems that even if we have a contiguous
> range of swap slots to be freed, we call one
> zram_slot_free_notify()/zswap_invalidate() at a time, which then calls
> zs_free() one handle at a time? I wonder if there's any batching
> opportunity here. Might be complicated with the pool lock and class
> lock dance in zs_free() though :)
>
> And yeah the async stuff is orthogonal too.
>
> >
> > >
> > > I've stared at this code path for a bit, because my other patch series
> > > (vswap - see [1]) was reported to show a regression on the free path
> > > of the usemem benchmark. And one of the issues was the contention
> > > between compaction (both systemwide compaction, i.e. zs_page_migrate,
> > > and zsmalloc's internal compaction, but mostly the former):
> > >
> > > * zs_free() read-acquires pool->lock, and compaction write-acquires the
> > > same lock. So the compaction thread will make all zs_free-ers wait for
> > > it. I saw this read-lock delay when I perfed the free step of usemem.
> > >
> > > * If this lock has fair queueing semantics (I have not checked), then
> > > if a compaction is queued behind a bunch of zs_free() calls, all the
> > > subsequent zs_free-ers are blocked :)
> > >
> > > * I'm also curious about the cache-friendliness of this rwlock,
> > > bouncing across CPUs, if you have multiple processes being torn down
> > > concurrently.
> >
> > That's interesting. When I mentioned a zs free bulk API, I was thinking
> > that if we have a percpu queue, we could at least try to take the read
> > lock on every enqueue, free the whole queue if successful, then release
> > the lock. I'm sure there are more ways to optimize that, just a random
> > idea :)
>
> Yep! Would be nice to have some perf trace to pinpoint where the overhead is.
>
Ah OK - I found this thread now:
https://lore.kernel.org/linux-mm/20260414054930.225853-1-xueyuan.chen21@gmail.com/
Hmm, free_zspage() and kmem_cache_free().
* kmem_cache_free() is just handle freeing. Bulk-freeing?
* free_zspage() looks like just ordinary teardown work :( It seems
we're not spinning on any lock here - we just trylock the backing pages,
and the rest is normal work. Not sure how to optimize this - perhaps
deferring is the only way.