From: YoungJun Park <youngjun.park@lge.com>
To: Pedro Falcato <pfalcato@suse.de>
Cc: Chris Li <chrisl@kernel.org>,
Christoph Hellwig <hch@infradead.org>,
lsf-pc@lists.linux-foundation.org, linux-mm@kvack.org,
nphamcs@gmail.com, bhe@redhat.com, taejoon.song@lge.com,
ryncsn@gmail.com
Subject: Re: [LSF/MM/BPF TOPIC] Flash Friendly Swap
Date: Tue, 24 Feb 2026 13:02:22 +0900
Message-ID: <aZ0izpnK+QMqxYbM@yjaykim-PowerEdge-T330>
In-Reply-To: <aZ0L48fzXzC9IOfj@yjaykim-PowerEdge-T330>

On Tue, Feb 24, 2026 at 11:24:35AM +0900, YoungJun Park wrote:
> On Mon, Feb 23, 2026 at 06:53:12PM +0000, Pedro Falcato wrote:
> > On Mon, Feb 23, 2026 at 10:15:14AM -0800, Chris Li wrote:
> > > On Mon, Feb 23, 2026 at 5:23 AM Christoph Hellwig <hch@infradead.org> wrote:
> > > >
> > > > On Fri, Feb 20, 2026 at 03:47:18PM -0800, Chris Li wrote:
> > > > > Hi Christoph,
> > > > >
> > > > > On Fri, Feb 20, 2026 at 8:22 AM Christoph Hellwig <hch@infradead.org> wrote:
> > > > > >
> > > > > > Honestly, I think always writing sequentially when swapping and
> > > > > > reclaiming in lumps (I'd call them "zones" :)) is probably the best
> > > > > > idea. Even for the these days unlikely case of swapping to HDD it
> > > > >
> > > > > For the flash device with FTL, the location of the data written is
> > > > > most likely logical anyway. The flash devices tend to group the new
> > > > > data internally to the same erase block together even when they are
> > > > > discontinuous from the block device point of view.
> > > >
> > > > Yes, but that's not the point..
> > > >
> > > > > > It is easy to write
> > > > > > out sequentially when the swap device is mostly empty. That is how the
> > > > > > cluster allocator currently works anyway. However, the tricky part is
> > > > > > what happens when some random 4K blocks get swapped in: that creates
> > > > > > holes on both the swap device and the internally written-out data. Very
> > > > > > quickly the free clusters on the swap device get used up, and you can
> > > > > > no longer write out sequentially. The FTL layer
> > > > > > internally wants to GC those holes to create a large empty erase
> > > > > > block. I do see that where we pick the next write location can have a
> > > > > > huge impact on the flash-internal GC behavior and write amplification
> > > > > > factor.
> > > >
> > > > > And that is the point. The FTL will always do a bad job with these
> > > > > workloads. You should not do overwrites, and can do much better
> > >
> > > > I am not sure I understand "You should not do overwrites". Can you
> > > > help clarify it for me? Let's say we always prefer to write to new
> > > > clusters while some swap entries have been freed. What happens when we
> > > > run out of new clusters to write to? Wouldn't we be forced to overwrite
> > > > the previously freed swap locations? It seems to me the "overwrite" is
> > > > unavoidable if you keep swapping in and out. That is the part I am
> > > > missing.
> >
> > > See log-structured filesystems. I suspect that's close to what we want for flash
> > > storage swap.
> >
> > > Also, FWIW: the cloud vendors have fake SSDs that, while having negligible seek
> > > latency, have extremely low IOPS values (e.g. AWS gp2 can do 100 IOPS on its
> > > base setting, and scales up to 16K IOPS. gp3 can do 3000 up to 80K on the
> > > maximum size). I suspect swapping on these is a huge slog, and we would also
> > like to write out as much sequentially as we can here (though I hope no one
> > is *actually* swapping on these things). Also mechanical drives. Log-structured
> > filesystems were originally invented for these too :)
>
> +CC Nhat Pham, He Baoquan, Taejoon
>
> Hi Pedro,
>
> The motivation is indeed similar to that of log-structured filesystems, and it
> employs a similar management mechanism.
>
> That is why I thought a management style similar to filesystems might be
> necessary at the swap layer as well (the swap abstraction layer mentioned in
> the proposal document).
>
> Previously, the direction for upstreaming our solution was somewhat ambiguous,
> so we have been maintaining it privately for several years.
>
> Now, however, I would like to discuss how to proceed with upstreaming in
> the context of Baoquan's "swap_ops and pluggable swap backend"
> (https://lore.kernel.org/linux-mm/aZiFvzlBJiYBUDre@MiWiFi-R3L-srv/) and
> Nhat's "Virtual Swap Space"
> (https://lore.kernel.org/linux-mm/20260208215839.87595-1-nphamcs@gmail.com/).
>
> Best regards
> Youngjun Park

+CC Kairui

Oops, I forgot to include the discussion involving Kairui (now CC'd). This is
also a direction currently being discussed:

https://lore.kernel.org/linux-mm/CAMgjq7D6n0H2=di0SrMQbJ48cVeKhGeQMH_mY0y-au4OJbE2GQ@mail.gmail.com/T/#m2feb4489b29075136169ff3efd28dc365062f66a

I hope our proposal can be considered alongside, or aligned with, these
ongoing discussions.
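
To make the "no overwrites" point from the quoted discussion concrete, here
is a minimal sketch of log-structured slot allocation. It is only an
illustration under assumptions: it is not our implementation and not existing
kernel code, and every name in it (LogStructuredSwap, SEGMENT_SLOTS,
swap_out, swap_in) is hypothetical. Writes always go to the head of the
current segment. When no untouched segment is left, the segment with the
fewest live pages is reset as a whole and its survivors are rewritten
sequentially, so holes are never overwritten in place.

```python
SEGMENT_SLOTS = 4  # hypothetical number of page slots per segment ("zone")


class LogStructuredSwap:
    """Toy model: sequential writes only, whole-segment reclaim."""

    def __init__(self, num_segments):
        # slots[seg] maps offset -> page_id for live pages only
        self.slots = [dict() for _ in range(num_segments)]
        self.free_segments = list(range(num_segments))
        self.where = {}               # page_id -> (segment, offset)
        self.head = None              # segment currently being appended to
        self.offset = SEGMENT_SLOTS   # forces a segment to be opened first
        self.relocated = 0            # pages rewritten during reclaim

    def swap_out(self, page_id):
        """Append a page at the write head and return its location."""
        if self.offset == SEGMENT_SLOTS:
            self._open_segment()
        loc = (self.head, self.offset)
        self.slots[self.head][self.offset] = page_id
        self.where[page_id] = loc
        self.offset += 1
        return loc

    def swap_in(self, page_id):
        """Free the slot. This leaves a hole; nothing is overwritten."""
        seg, off = self.where.pop(page_id)
        del self.slots[seg][off]

    def _open_segment(self):
        if self.free_segments:
            self.head, self.offset = self.free_segments.pop(0), 0
            return
        # No untouched segment left: reclaim the one with the fewest
        # live pages (the current head is full, so it is a candidate too).
        victim = min(range(len(self.slots)), key=lambda s: len(self.slots[s]))
        survivors = [self.slots[victim][o] for o in sorted(self.slots[victim])]
        if len(survivors) == SEGMENT_SLOTS:
            raise MemoryError("swap device full")
        self.slots[victim] = {}       # reset the whole segment at once
        self.head, self.offset = victim, 0
        for page_id in survivors:     # rewrite survivors sequentially
            self.slots[victim][self.offset] = page_id
            self.where[page_id] = (victim, self.offset)
            self.offset += 1
            self.relocated += 1
```

In this model the relocated counter, relative to the number of swap_out
calls, is the write amplification paid by the swap layer itself. A real
design would need a smarter victim policy and would have to keep swap
entries stable across relocation, which is where an indirection layer such
as the ones in the discussions above comes in.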