linux-mm.kvack.org archive mirror
From: Kairui Song <ryncsn@gmail.com>
To: linux-mm@kvack.org
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Baoquan He <bhe@redhat.com>,  Barry Song <baohua@kernel.org>,
	Chris Li <chrisl@kernel.org>,  Nhat Pham <nphamcs@gmail.com>,
	Yosry Ahmed <yosry.ahmed@linux.dev>,
	 David Hildenbrand <david@kernel.org>,
	Johannes Weiner <hannes@cmpxchg.org>,
	 Youngjun Park <youngjun.park@lge.com>,
	Hugh Dickins <hughd@google.com>,
	 Baolin Wang <baolin.wang@linux.alibaba.com>,
	 Ying Huang <ying.huang@linux.alibaba.com>,
	 Kemeng Shi <shikemeng@huaweicloud.com>,
	 Lorenzo Stoakes <lorenzo.stoakes@oracle.com>,
	 "Matthew Wilcox (Oracle)" <willy@infradead.org>,
	 linux-kernel@vger.kernel.org, Kairui Song <kasong@tencent.com>
Subject: [PATCH v5 06/19] mm, swap: free the swap cache after folio is mapped
Date: Sat, 20 Dec 2025 03:43:35 +0800	[thread overview]
Message-ID: <20251220-swap-table-p2-v5-6-8862a265a033@tencent.com> (raw)
In-Reply-To: <20251220-swap-table-p2-v5-0-8862a265a033@tencent.com>

From: Kairui Song <kasong@tencent.com>

Currently, we free the swap entry and conditionally drop the folio from
the swap cache before mapping the PTE. To reduce repeated faults caused
by parallel swapins of the same PTE, remove the folio from the swap
cache only after it is mapped, so new faults on the swap PTE are much
more likely to find the folio in the swap cache and wait on it.

This does not eliminate all swapin races: an ongoing swapin fault may
still see an empty swap cache. That is harmless, because the PTE is
changed before the swap cache is cleared, so such a fault simply returns
without triggering a repeated fault. The window for repeated faults is
still much smaller.

Since the check now runs after the folio is mapped,
should_try_to_free_swap() must account for the references held by the
new PTE mappings, so pass the expected number of extra references in
explicitly.

Reviewed-by: Baoquan He <bhe@redhat.com>
Signed-off-by: Kairui Song <kasong@tencent.com>
---
 mm/memory.c | 21 +++++++++++----------
 1 file changed, 11 insertions(+), 10 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index ca54009cd586..a4c58341c44a 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4362,6 +4362,7 @@ static vm_fault_t remove_device_exclusive_entry(struct vm_fault *vmf)
 static inline bool should_try_to_free_swap(struct swap_info_struct *si,
 					   struct folio *folio,
 					   struct vm_area_struct *vma,
+					   unsigned int extra_refs,
 					   unsigned int fault_flags)
 {
 	if (!folio_test_swapcache(folio))
@@ -4384,7 +4385,7 @@ static inline bool should_try_to_free_swap(struct swap_info_struct *si,
 	 * reference only in case it's likely that we'll be the exclusive user.
 	 */
 	return (fault_flags & FAULT_FLAG_WRITE) && !folio_test_ksm(folio) &&
-		folio_ref_count(folio) == (1 + folio_nr_pages(folio));
+		folio_ref_count(folio) == (extra_refs + folio_nr_pages(folio));
 }
 
 static vm_fault_t pte_marker_clear(struct vm_fault *vmf)
@@ -4936,15 +4937,6 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
 	 */
 	arch_swap_restore(folio_swap(entry, folio), folio);
 
-	/*
-	 * Remove the swap entry and conditionally try to free up the swapcache.
-	 * We're already holding a reference on the page but haven't mapped it
-	 * yet.
-	 */
-	swap_free_nr(entry, nr_pages);
-	if (should_try_to_free_swap(si, folio, vma, vmf->flags))
-		folio_free_swap(folio);
-
 	add_mm_counter(vma->vm_mm, MM_ANONPAGES, nr_pages);
 	add_mm_counter(vma->vm_mm, MM_SWAPENTS, -nr_pages);
 	pte = mk_pte(page, vma->vm_page_prot);
@@ -4998,6 +4990,15 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
 	arch_do_swap_page_nr(vma->vm_mm, vma, address,
 			pte, pte, nr_pages);
 
+	/*
+	 * Remove the swap entry and conditionally try to free up the swapcache.
+	 * Do it after mapping, so raced page faults will likely see the folio
+	 * in swap cache and wait on the folio lock.
+	 */
+	swap_free_nr(entry, nr_pages);
+	if (should_try_to_free_swap(si, folio, vma, nr_pages, vmf->flags))
+		folio_free_swap(folio);
+
 	folio_unlock(folio);
 	if (unlikely(folio != swapcache)) {
 		/*

-- 
2.52.0



