Re: [LSF/MM/BPF TOPIC] Large folio (z)swapin

From: Yosry Ahmed <yosryahmed@google.com>
To: Usama Arif <usamaarif642@gmail.com>
Cc: lsf-pc@lists.linux-foundation.org,
	 Linux Memory Management List <linux-mm@kvack.org>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Barry Song <21cnbao@gmail.com>,
	 Shakeel Butt <shakeel.butt@linux.dev>
Subject: Re: [LSF/MM/BPF TOPIC] Large folio (z)swapin
Date: Thu, 9 Jan 2025 13:34:25 -0800
Message-ID: <CAJD7tkZHZ+9A8mdo+=5umfh7-=AqHNKq2rr4kjw0+-27QkT6Gw@mail.gmail.com>
In-Reply-To: <58716200-fd10-4487-aed3-607a10e9fdd0@gmail.com>

On Thu, Jan 9, 2025 at 12:06 PM Usama Arif <usamaarif642@gmail.com> wrote:
>
> I would like to propose a session to discuss the work going on
> around large folio swapin, whether it's traditional swap,
> zswap, or zram.
>
> Large folios have obvious advantages that have been discussed before,
> like fewer page faults, batched PTE and rmap manipulation, a shorter
> LRU list, and TLB coalescing (on arm64 and AMD).
> However, swapping in large folios has its own drawbacks, like higher
> swap thrashing.
> I had initially sent an RFC of zswapin of large folios in [1],
> but it causes a regression in kernel build time due to swap
> thrashing, which I am confident is happening with zram large
> folio swapin as well (which is merged in the kernel).

I am obviously interested in this discussion, but unfortunately I
won't be able to make it this year. I will try to attend remotely
though if possible!

>
> Some of the points we could discuss in the session:
>
> - What is the right (preferably open source) benchmark to test for
> swapin of large folios? Kernel build time in a memory-limited
> cgroup shows a regression, microbenchmarks show a massive
> improvement, and maybe there are benchmarks where TLB misses are
> a big factor that show an improvement.
>
> - We could have something like
> /sys/kernel/mm/transparent_hugepage/hugepages-*kB/swapin_enabled
> to enable/disable swapin, but it's going to be difficult to tune, might
> have different optimum values based on workloads, and is likely to be
> left at its default value. Is there some dynamic way to decide when
> to swap in large folios and when to fall back to smaller folios?
> The swapin_readahead swapcache path, which only supports 4K folios atm,
> has a readahead window based on hits. However, readahead is a folio flag
> and not a page flag, so this method can't be used: once a large folio
> is swapped in, we won't get a fault, and subsequent hits on other
> pages of the large folio won't be recorded.
>
> - For zswap and zram, it might be that doing larger block compression/
> decompression might offset the regression from swap thrashing, but it
> brings about its own issues. For example, once a large folio is swapped
> out, it could fail to swap in as a large folio and fall back
> to 4K, resulting in redundant decompressions.
> Would this also mean that swapin of large folios from traditional swap
> isn't something we should proceed with?
>
> - Should we even support large folio swapin? You often have high swap
> activity when the system/cgroup is close to running out of memory. At this
> point, maybe the best way forward is to just swap in 4K pages and let
> khugepaged [2], [3] collapse them if the surrounding pages are swapped in
> as well.
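
As an illustration only (not from the thread), the readahead point above
can be made concrete with a toy model. A hit-based window needs one event
per page to learn from, and a large folio swapin collapses those events
into a single fault. The function name and the "one fault per non-resident
folio" rule are assumptions made for this sketch; it does not model the
kernel's actual readahead code.

```python
def count_faults(accessed_pages, folio_pages):
    """Count the faults a heuristic could observe.

    Toy assumption: touching any page of a non-resident folio swaps in
    the whole naturally aligned folio of `folio_pages` 4K pages, so later
    touches of that folio produce no further fault.
    """
    resident = set()  # indices of folios that are already swapped in
    faults = 0
    for page in accessed_pages:
        folio = page // folio_pages
        if folio not in resident:
            resident.add(folio)
            faults += 1
    return faults


accesses = list(range(16))               # touch 16 consecutive 4K pages
faults_4k = count_faults(accesses, 1)    # 4K folios: one event per page
faults_64k = count_faults(accesses, 16)  # 64K folio: a single event
print(faults_4k, faults_64k)
```

With 4K folios every one of the 16 touches is visible as feedback; with a
64K folio only the first one is, so there is nothing left to count as a
hit.
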
>
> [1] https://lore.kernel.org/all/20241018105026.2521366-1-usamaarif642@gmail.com/
> [2] https://lore.kernel.org/all/20250108233128.14484-1-npache@redhat.com/
> [3] https://lore.kernel.org/lkml/20241216165105.56185-1-dev.jain@arm.com/
>
> Thanks,
> Usama
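
To put a rough number on the "redundant decompressions" concern above,
here is a back-of-the-envelope sketch, not taken from the thread. It
assumes a folio is stored as one compressed block and that any swapin,
however small, has to decompress the whole block; real zswap/zram
backends may behave differently. The helper name is hypothetical.

```python
PAGE_SIZE = 4096  # bytes in one 4K page


def bytes_decompressed(folio_pages, swapin_pages):
    """Bytes decompressed to bring back one folio of `folio_pages` pages.

    Assumption for this sketch: the folio was compressed as a single
    block, and each swapin of `swapin_pages` pages must decompress the
    entire block to reach its own pages.
    """
    if swapin_pages <= 0 or folio_pages % swapin_pages != 0:
        raise ValueError("swapin size must evenly divide the folio size")
    swapins = folio_pages // swapin_pages
    return swapins * folio_pages * PAGE_SIZE


whole = bytes_decompressed(16, 16)    # 64K folio swapped in as 64K
fallback = bytes_decompressed(16, 1)  # same folio, falling back to 4K
print(whole, fallback, fallback // whole)
```

Under these assumptions, the 4K fallback decompresses 16 times as much
data as the large folio swapin to recover the same 64K.
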
