linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Barry Song <21cnbao@gmail.com>
To: Kairui Song <ryncsn@gmail.com>
Cc: linux-mm@kvack.org, Andrew Morton <akpm@linux-foundation.org>,
	 Baoquan He <bhe@redhat.com>, Chris Li <chrisl@kernel.org>,
	Nhat Pham <nphamcs@gmail.com>,
	 Johannes Weiner <hannes@cmpxchg.org>,
	Yosry Ahmed <yosry.ahmed@linux.dev>,
	 David Hildenbrand <david@redhat.com>,
	Youngjun Park <youngjun.park@lge.com>,
	 Hugh Dickins <hughd@google.com>,
	Baolin Wang <baolin.wang@linux.alibaba.com>,
	 "Huang, Ying" <ying.huang@linux.alibaba.com>,
	Kemeng Shi <shikemeng@huaweicloud.com>,
	 Lorenzo Stoakes <lorenzo.stoakes@oracle.com>,
	 "Matthew Wilcox (Oracle)" <willy@infradead.org>,
	linux-kernel@vger.kernel.org,  Kairui Song <kasong@tencent.com>
Subject: Re: [PATCH 03/19] mm, swap: never bypass the swap cache even for SWP_SYNCHRONOUS_IO
Date: Tue, 4 Nov 2025 11:47:35 +0800	[thread overview]
Message-ID: <CAGsJ_4xsUwUH_VyeYaXHURqTS66Fbuxa00GTM5izwK-=Vg_20g@mail.gmail.com> (raw)
In-Reply-To: <20251029-swap-table-p2-v1-3-3d43f3b6ec32@tencent.com>

On Wed, Oct 29, 2025 at 11:59 PM Kairui Song <ryncsn@gmail.com> wrote:
>
> From: Kairui Song <kasong@tencent.com>
>
> Now the overhead of the swap cache is trivial, bypassing the swap
> cache is no longer a valid optimization. So unify the swapin path using
> the swap cache. This changes the swap in behavior in multiple ways:
>
> We used to rely on `SWP_SYNCHRONOUS_IO && __swap_count(entry) == 1` as
> the indicator to bypass both the swap cache and readahead. The swap
> count check is not a good indicator for readahead. It existed because
> the previously swap design made readahead strictly coupled with swap
> cache bypassing. We actually want to always bypass readahead for
> SWP_SYNCHRONOUS_IO devices even if swap count > 1, But bypassing the
> swap cache will cause redundant IO.

I suppose it’s not only redundant I/O, but also causes additional memory
copies, as each swap-in allocates a new folio. Using swapcache allows the
folio to be shared instead?

>
> Now that limitation is gone, with the new introduced helpers and design,
> we will always swap cache, so this check can be simplified to check
> SWP_SYNCHRONOUS_IO only, effectively disabling readahead for all
> SWP_SYNCHRONOUS_IO cases, this is a huge win for many workloads.
>
> The second thing here is that this enabled a large swap for all swap
> entries on SWP_SYNCHRONOUS_IO devices. Previously, the large swap in is
> also coupled with swap cache bypassing, and so the count checking side
> effect also makes large swap in less effective. Now this is also fixed.
> We will always have a large swap in support for all SWP_SYNCHRONOUS_IO
> cases.
>

In your cover letter, you mentioned: “it’s especially better for workloads
with swap count > 1 on SYNC_IO devices, about ~20% gain in the above test.”
Is this improvement mainly from mTHP swap-in?


> And to catch potential issues with large swap in, especially with page
> exclusiveness and swap cache, more debug sanity checks and comments are
> added. But overall, the code is simpler. And new helper and routines
> will be used by other components in later commits too. And now it's
> possible to rely on the swap cache layer for resolving synchronization
> issues, which will also be done by a later commit.
>
> Worth mentioning that for a large folio workload, this may cause more
> serious thrashing. This isn't a problem with this commit, but a generic
> large folio issue. For a 4K workload, this commit increases the
> performance.
>
> Signed-off-by: Kairui Song <kasong@tencent.com>
> ---
>  mm/memory.c     | 136 +++++++++++++++++++++-----------------------------------
>  mm/swap.h       |   6 +++
>  mm/swap_state.c |  27 +++++++++++
>  3 files changed, 84 insertions(+), 85 deletions(-)
>
> diff --git a/mm/memory.c b/mm/memory.c
> index 4c3a7e09a159..9a43d4811781 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -4613,7 +4613,15 @@ static struct folio *alloc_swap_folio(struct vm_fault *vmf)
>  }
>  #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
>
> -static DECLARE_WAIT_QUEUE_HEAD(swapcache_wq);
> +/* Sanity check that a folio is fully exclusive */
> +static void check_swap_exclusive(struct folio *folio, swp_entry_t entry,
> +                                unsigned int nr_pages)
> +{
> +       do {
> +               VM_WARN_ON_ONCE_FOLIO(__swap_count(entry) != 1, folio);
> +               entry.val++;
> +       } while (--nr_pages);
> +}
>
>  /*
>   * We enter with non-exclusive mmap_lock (to exclude vma changes,
> @@ -4626,17 +4634,14 @@ static DECLARE_WAIT_QUEUE_HEAD(swapcache_wq);
>  vm_fault_t do_swap_page(struct vm_fault *vmf)
>  {
>         struct vm_area_struct *vma = vmf->vma;
> -       struct folio *swapcache, *folio = NULL;
> -       DECLARE_WAITQUEUE(wait, current);
> +       struct folio *swapcache = NULL, *folio;
>         struct page *page;
>         struct swap_info_struct *si = NULL;
>         rmap_t rmap_flags = RMAP_NONE;
> -       bool need_clear_cache = false;
>         bool exclusive = false;
>         swp_entry_t entry;
>         pte_t pte;
>         vm_fault_t ret = 0;
> -       void *shadow = NULL;
>         int nr_pages;
>         unsigned long page_idx;
>         unsigned long address;
> @@ -4707,57 +4712,21 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
>         folio = swap_cache_get_folio(entry);
>         if (folio)
>                 swap_update_readahead(folio, vma, vmf->address);
> -       swapcache = folio;
> -

I wonder if we should move swap_update_readahead() elsewhere. Since for
sync IO you’ve completely dropped readahead, why do we still need to call
update_readahead()?

Thanks
Barry


  reply	other threads:[~2025-11-04  3:47 UTC|newest]

Thread overview: 54+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-10-29 15:58 [PATCH 00/19] mm, swap: never bypass swap cache and cleanup flags (swap table phase II) Kairui Song
2025-10-29 15:58 ` [PATCH 01/19] mm/swap: rename __read_swap_cache_async to swap_cache_alloc_folio Kairui Song
2025-10-30 22:53   ` Yosry Ahmed
2025-11-03  8:28     ` Barry Song
2025-11-03  9:02       ` Kairui Song
2025-11-03  9:10         ` Barry Song
2025-11-03 16:50         ` Yosry Ahmed
2025-10-29 15:58 ` [PATCH 02/19] mm, swap: split swap cache preparation loop into a standalone helper Kairui Song
2025-10-29 15:58 ` [PATCH 03/19] mm, swap: never bypass the swap cache even for SWP_SYNCHRONOUS_IO Kairui Song
2025-11-04  3:47   ` Barry Song [this message]
2025-11-04 10:44     ` Kairui Song
2025-10-29 15:58 ` [PATCH 04/19] mm, swap: always try to free swap cache for SWP_SYNCHRONOUS_IO devices Kairui Song
2025-11-04  4:19   ` Barry Song
2025-11-04  8:26     ` Barry Song
2025-11-04 10:55       ` Kairui Song
2025-10-29 15:58 ` [PATCH 05/19] mm, swap: simplify the code and reduce indention Kairui Song
2025-10-29 15:58 ` [PATCH 06/19] mm, swap: free the swap cache after folio is mapped Kairui Song
2025-11-04  9:14   ` Barry Song
2025-11-04 10:50     ` Kairui Song
2025-11-04 19:52       ` Barry Song
2025-10-29 15:58 ` [PATCH 07/19] mm/shmem: never bypass the swap cache for SWP_SYNCHRONOUS_IO Kairui Song
2025-10-29 15:58 ` [PATCH 08/19] mm/shmem, swap: remove SWAP_MAP_SHMEM Kairui Song
2025-10-29 15:58 ` [PATCH 09/19] mm, swap: swap entry of a bad slot should not be considered as swapped out Kairui Song
2025-10-29 15:58 ` [PATCH 10/19] mm, swap: consolidate cluster reclaim and check logic Kairui Song
2025-10-31  5:25   ` YoungJun Park
2025-10-31  7:11     ` Kairui Song
2025-10-29 15:58 ` [PATCH 11/19] mm, swap: split locked entry duplicating into a standalone helper Kairui Song
2025-10-29 15:58 ` [PATCH 12/19] mm, swap: use swap cache as the swap in synchronize layer Kairui Song
2025-10-29 19:25   ` kernel test robot
2025-10-29 15:58 ` [PATCH 13/19] mm, swap: remove workaround for unsynchronized swap map cache state Kairui Song
2025-11-07  3:07   ` Barry Song
2025-11-09 14:18     ` Kairui Song
2025-11-10  7:21       ` Barry Song
2025-11-16 16:01         ` Kairui Song
2025-10-29 15:58 ` [PATCH 14/19] mm, swap: sanitize swap entry management workflow Kairui Song
2025-10-29 19:25   ` kernel test robot
2025-10-30  5:25     ` Kairui Song
2025-10-29 19:25   ` kernel test robot
2025-11-01  4:51   ` YoungJun Park
2025-11-01  8:59     ` Kairui Song
2025-11-01  9:08       ` YoungJun Park
2025-10-29 15:58 ` [PATCH 15/19] mm, swap: add folio to swap cache directly on allocation Kairui Song
2025-10-29 16:52   ` Kairui Song
2025-10-31  5:56   ` YoungJun Park
2025-10-31  7:02     ` Kairui Song
2025-10-29 15:58 ` [PATCH 16/19] mm, swap: check swap table directly for checking cache Kairui Song
2025-11-06 21:02   ` Barry Song
2025-11-07  3:13     ` Kairui Song
2025-10-29 15:58 ` [PATCH 17/19] mm, swap: clean up and improve swap entries freeing Kairui Song
2025-10-29 15:58 ` [PATCH 18/19] mm, swap: drop the SWAP_HAS_CACHE flag Kairui Song
2025-10-29 15:58 ` [PATCH 19/19] mm, swap: remove no longer needed _swap_info_get Kairui Song
2025-10-30 23:04 ` [PATCH 00/19] mm, swap: never bypass swap cache and cleanup flags (swap table phase II) Yosry Ahmed
2025-10-31  6:58   ` Kairui Song
2025-11-05  7:39 ` Chris Li

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAGsJ_4xsUwUH_VyeYaXHURqTS66Fbuxa00GTM5izwK-=Vg_20g@mail.gmail.com' \
    --to=21cnbao@gmail.com \
    --cc=akpm@linux-foundation.org \
    --cc=baolin.wang@linux.alibaba.com \
    --cc=bhe@redhat.com \
    --cc=chrisl@kernel.org \
    --cc=david@redhat.com \
    --cc=hannes@cmpxchg.org \
    --cc=hughd@google.com \
    --cc=kasong@tencent.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=lorenzo.stoakes@oracle.com \
    --cc=nphamcs@gmail.com \
    --cc=ryncsn@gmail.com \
    --cc=shikemeng@huaweicloud.com \
    --cc=willy@infradead.org \
    --cc=ying.huang@linux.alibaba.com \
    --cc=yosry.ahmed@linux.dev \
    --cc=youngjun.park@lge.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox