linux-mm.kvack.org archive mirror
* [LSF/MM/BPF TOPIC] Large folio (z)swapin
@ 2025-01-09 20:06 Usama Arif
From: Usama Arif @ 2025-01-09 20:06 UTC (permalink / raw)
  To: lsf-pc, Linux Memory Management List
  Cc: Johannes Weiner, Barry Song, Yosry Ahmed, Shakeel Butt

I would like to propose a session to discuss the ongoing work
around large folio swapin, whether it is backed by traditional
swap, zswap, or zram.

Large folios have obvious advantages that have been discussed before,
like fewer page faults, batched PTE and rmap manipulation, shorter
LRU lists, and TLB coalescing (on arm64 and AMD).
However, swapping in large folios has its own drawbacks, like higher
swap thrashing.
I had initially sent an RFC for zswapin of large folios in [1],
but it caused a swap thrashing regression in kernel build time,
which I am confident is happening with zram large folio swapin as
well (which is already merged in the kernel).

Some of the points we could discuss in the session:

- What is the right (preferably open source) benchmark to test
swapin of large folios with? Kernel build time in a limited memory
cgroup shows a regression, while microbenchmarks show a massive
improvement; maybe there are benchmarks where TLB misses are a big
factor and show an improvement. (A sketch of one such microbenchmark
follows this point.)
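
For illustration, the kind of microbenchmark that flatters large
folio swapin looks roughly like the below: populate a buffer, let
the cgroup memory limit push it out to (z)swap, then time faulting
it back in. This is only a sketch under assumed settings, not one
of the benchmarks referenced in this thread:

/*
 * Run inside a cgroup whose memory.max is well below BUF_SIZE so
 * that memset() forces the head of the buffer out to (z)swap and
 * the timed walk has to swap it back in.
 */
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <time.h>

#define BUF_SIZE (1UL << 30)	/* 1G working set */
#define PAGE_SZ  4096UL

int main(void)
{
	char *buf = mmap(NULL, BUF_SIZE, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	struct timespec t0, t1;
	long sum = 0;

	if (buf == MAP_FAILED)
		return 1;
	madvise(buf, BUF_SIZE, MADV_HUGEPAGE);	/* allow (m)THP */
	memset(buf, 1, BUF_SIZE);	/* populate; head gets swapped out */

	clock_gettime(CLOCK_MONOTONIC, &t0);
	for (unsigned long i = 0; i < BUF_SIZE; i += PAGE_SZ)
		sum += buf[i];		/* fault swapped pages back in */
	clock_gettime(CLOCK_MONOTONIC, &t1);

	printf("swapin walk: %.2fs (sum %ld)\n",
	       (t1.tv_sec - t0.tv_sec) +
	       (t1.tv_nsec - t0.tv_nsec) / 1e9, sum);
	return 0;
}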

- We could have something like
/sys/kernel/mm/transparent_hugepage/hugepages-*kB/swapin_enabled
to enable/disable swapin per size, but it's going to be difficult to
tune, might have different optimum values based on workloads, and is
likely to be left at its default values. Is there some dynamic way to
decide when to swap in large folios and when to fall back to smaller
folios? The swapin_readahead swapcache path, which only supports 4K
folios at the moment, sizes its readahead window based on hits;
however, readahead is a folio flag and not a per-page flag, so this
method can't be used: once a large folio is swapped in, we won't get
a fault on its remaining pages, and subsequent hits on them won't be
recorded. (A rough sketch of how such a knob could gate the swapin
order follows this point.)
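
To make the knob concrete, a per-size swapin_enabled setting would
presumably gate the folio order chosen at fault time, along the
lines of the rough kernel-style sketch below. swapin_allowed_orders
and swapin_entries_contiguous() are hypothetical names and not
existing kernel API; the real decision point would sit near
do_swap_page():

static unsigned long swapin_allowed_orders;	/* set from sysfs */

static int swapin_pick_order(struct vm_fault *vmf, swp_entry_t entry)
{
	int order;

	for (order = PMD_ORDER; order > 0; order--) {
		/* skip orders disabled via the hypothetical knob */
		if (!(swapin_allowed_orders & BIT(order)))
			continue;
		/* fault must cover a naturally aligned range */
		if (!IS_ALIGNED(vmf->address, PAGE_SIZE << order))
			continue;
		/* backing swap entries must be contiguous (hypothetical) */
		if (swapin_entries_contiguous(vmf, entry, order))
			return order;
	}
	return 0;	/* fall back to a 4K folio */
}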

- For zswap and zram, it might be that compressing/decompressing
larger blocks offsets the regression from swap thrashing, but it
brings its own issues. For example, once a large folio is swapped
out, swapping it back in as a large folio could fail and fall back
to 4K, resulting in redundant decompressions (modeled in the sketch
after this point).
Would this also mean that swapin of large folios from traditional
swap isn't something we should proceed with?
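
A small userspace model of the redundant decompression problem, with
illustrative names rather than real zswap/zram API: a 64K folio
stored as one compressed block has to be fully decompressed for
every 4K page swapped back in after such a fallback.

#include <stdio.h>
#include <string.h>

#define PAGE_SIZE 4096
#define SUBPAGES  16			/* 64K folio */

static long decompress_calls;

/* Stand-in for the compressor backend (just a copy here); the
 * point is only to count how often the full block is expanded. */
static void decompress_block(const char *src, char *dst, size_t len)
{
	decompress_calls++;
	memcpy(dst, src, len);
}

/* Fallback path: each 4K swapin redoes the full 64K decompression
 * and keeps only one page of the result. */
static void swapin_one_subpage(const char *blob, int idx, char *page)
{
	char buf[SUBPAGES * PAGE_SIZE];

	decompress_block(blob, buf, sizeof(buf));
	memcpy(page, buf + (size_t)idx * PAGE_SIZE, PAGE_SIZE);
}

int main(void)
{
	static char blob[SUBPAGES * PAGE_SIZE];	/* "compressed" block */
	char page[PAGE_SIZE];

	for (int i = 0; i < SUBPAGES; i++)	/* 16 x 4K swapin faults */
		swapin_one_subpage(blob, i, page);
	printf("%ld full-block decompressions for one folio\n",
	       decompress_calls);		/* 16 instead of 1 */
	return 0;
}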

- Should we even support large folio swapin at all? You often have
high swap activity when the system/cgroup is close to running out of
memory; at that point, maybe the best way forward is to just swap in
4K pages and let khugepaged [2], [3] collapse them if the surrounding
pages are swapped in as well.

[1] https://lore.kernel.org/all/20241018105026.2521366-1-usamaarif642@gmail.com/
[2] https://lore.kernel.org/all/20250108233128.14484-1-npache@redhat.com/
[3] https://lore.kernel.org/lkml/20241216165105.56185-1-dev.jain@arm.com/

Thanks,
Usama


