* [PATCH v5 0/6] large folios swap-in: handle refault cases first
From: Barry Song @ 2024-05-29  8:28 UTC (permalink / raw)
  To: akpm, linux-mm
  Cc: baolin.wang, chrisl, david, hanchuanhua, hannes, hughd, kasong,
	linux-kernel, ryan.roberts, surenb, v-songbaohua, willy, xiang,
	ying.huang, yosryahmed, yuzhao, ziy

From: Barry Song <v-songbaohua@oppo.com>

This patchset is extracted from the large folio swap-in series[1] and primarily
addresses the handling of large folios that are already in the swap cache. In
particular, it focuses on refaults of mTHPs which are still undergoing
reclamation. Splitting this part out is intended to streamline code review and
expedite its integration into the MM tree.

It relies on Ryan's swap-out series[2], leveraging the helper function
swap_pte_batch() introduced by that series.
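
For reference, a condensed sketch (not verbatim code from this series; the
real hunk is in patch 6/6) of how the fault path consumes that helper:
starting from the PTE of the folio's first subpage, it batches only when
every PTE holds a swap entry of that same folio:

	int nr = folio_nr_pages(folio);
	pte_t *folio_ptep = vmf->pte - folio_page_idx(folio, page);
	pte_t folio_pte = ptep_get(folio_ptep);

	if (swap_pte_batch(folio_ptep, nr, folio_pte) == nr) {
		/* all nr swap PTEs belong to this folio: map them in one go */
	}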

Presently, do_swap_page() only encounters a large folio in the swap
cache before the large folio is released by vmscan. However, the code
should remain equally useful once we support large folio swap-in via
swapin_readahead(). This approach can effectively reduce page faults
and eliminates most of the redundant checks and early exits for MTE
restoration in the recent MTE patchset[3].

The large folio swap-in for SWP_SYNCHRONOUS_IO and swapin_readahead()
will be split into separate patch sets and sent at a later time.

-v5:
   collect reviewed-by of Ryan, "Huang, Ying", thanks!

-v4:
 - collect acked-by/reviewed-by of Ryan, "Huang, Ying", Chris, David and
   Khalid, many thanks!
 - Simplify reuse code in do_swap_page() by checking refcount==1, per
   David;
 - Initialize large folio-related variables later in do_swap_page(), per
   Ryan;
 - define swap_free() as swap_free_nr(1) per Ying and Ryan.

-v3:
 - optimize swap_free_nr() using a bitmap consisting of a single "long", per
   "Huang, Ying";
 - drop swap_free() as suggested by "Huang, Ying", now hibernation can get
   batched;
 - lots of cleanup in do_swap_page() as commented by Ryan Roberts and "Huang,
   Ying";
 - handle arch_do_swap_page() with nr pages though the only platform which
   needs it, sparc, doesn't support THP_SWAPOUT as suggested by "Huang,
   Ying";
 - introduce pte_move_swp_offset() as suggested by "Huang, Ying";
 - drop the "any_shared" of checking swap entries with respect to David's
   comment;
 - drop the counter of swapin_refault and keep it for debug purpose per
   Ying
 - collect reviewed-by tags
 Link:
  https://lore.kernel.org/linux-mm/20240503005023.174597-1-21cnbao@gmail.com/

-v2:
 - rebase on top of mm-unstable in which Ryan's swap_pte_batch() has changed
   a lot.
 - remove folio_add_new_anon_rmap() for !folio_test_anon(),
   as currently large folios are always anon (refault case).
 - add mTHP swpin refault counters
  Link:
  https://lore.kernel.org/linux-mm/20240409082631.187483-1-21cnbao@gmail.com/

-v1:
  Link: https://lore.kernel.org/linux-mm/20240402073237.240995-1-21cnbao@gmail.com/

Differences from the original large folios swap-in series
 - collect reviewed-by and acked-by tags;
 - rename swap_nr_free() to swap_free_nr(), according to Ryan;
 - limit the maximum kernel stack usage of swap_free_nr(), per Ryan;
 - add an output argument to swap_pte_batch() to expose whether all entries
   are exclusive;
 - many cleanup refinements; handle the corner case where a folio's virtual
   address might not be naturally aligned.

[1] https://lore.kernel.org/linux-mm/20240304081348.197341-1-21cnbao@gmail.com/
[2] https://lore.kernel.org/linux-mm/20240408183946.2991168-1-ryan.roberts@arm.com/
[3] https://lore.kernel.org/linux-mm/20240322114136.61386-1-21cnbao@gmail.com/

Barry Song (3):
  mm: remove the implementation of swap_free() and always use
    swap_free_nr()
  mm: introduce pte_move_swp_offset() helper which can move offset
    bidirectionally
  mm: introduce arch_do_swap_page_nr() which allows restore metadata for
    nr pages

Chuanhua Han (3):
  mm: swap: introduce swap_free_nr() for batched swap_free()
  mm: swap: make should_try_to_free_swap() support large-folio
  mm: swap: entirely map large folios found in swapcache

 include/linux/pgtable.h | 26 +++++++++++++-----
 include/linux/swap.h    |  9 +++++--
 kernel/power/swap.c     |  5 ++--
 mm/internal.h           | 25 ++++++++++++++---
 mm/memory.c             | 60 +++++++++++++++++++++++++++++++++--------
 mm/swapfile.c           | 48 +++++++++++++++++++++++++++++----
 6 files changed, 142 insertions(+), 31 deletions(-)

-- 
2.34.1




* [PATCH v5 1/6] mm: swap: introduce swap_free_nr() for batched swap_free()
From: Barry Song @ 2024-05-29  8:28 UTC (permalink / raw)
  To: akpm, linux-mm
  Cc: baolin.wang, chrisl, david, hanchuanhua, hannes, hughd, kasong,
	linux-kernel, ryan.roberts, surenb, v-songbaohua, willy, xiang,
	ying.huang, yosryahmed, yuzhao, ziy

From: Chuanhua Han <hanchuanhua@oppo.com>

While swapping in a large folio, we need to free the swap entries covering
the whole folio. To avoid frequently acquiring and releasing swap locks, it
is better to introduce an API for batched freeing. This new function,
swap_free_nr(), is designed to efficiently handle the various scenarios for
releasing a specified number, nr, of swap entries.
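
As an illustration only (not part of this patch; "folio" here stands for a
swapped-in large folio), a caller holding the folio's first swap entry can
then free all of its entries with one call instead of a per-subpage loop:

	int i, nr = folio_nr_pages(folio);
	swp_entry_t entry = folio->swap;	/* first swap entry of the folio */

	/* before: one locked free per subpage */
	for (i = 0; i < nr; i++)
		swap_free(swp_entry(swp_type(entry), swp_offset(entry) + i));

	/* after: a single batched call covering the whole folio */
	swap_free_nr(entry, nr);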

Signed-off-by: Chuanhua Han <hanchuanhua@oppo.com>
Co-developed-by: Barry Song <v-songbaohua@oppo.com>
Signed-off-by: Barry Song <v-songbaohua@oppo.com>
Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
Acked-by: Chris Li <chrisl@kernel.org>
Reviewed-by: "Huang, Ying" <ying.huang@intel.com>
---
 include/linux/swap.h |  5 +++++
 mm/swapfile.c        | 47 ++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 52 insertions(+)

diff --git a/include/linux/swap.h b/include/linux/swap.h
index a11c75e897ec..45f76dfe29b1 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -481,6 +481,7 @@ extern void swap_shmem_alloc(swp_entry_t);
 extern int swap_duplicate(swp_entry_t);
 extern int swapcache_prepare(swp_entry_t);
 extern void swap_free(swp_entry_t);
+extern void swap_free_nr(swp_entry_t entry, int nr_pages);
 extern void swapcache_free_entries(swp_entry_t *entries, int n);
 extern void free_swap_and_cache_nr(swp_entry_t entry, int nr);
 int swap_type_of(dev_t device, sector_t offset);
@@ -562,6 +563,10 @@ static inline void swap_free(swp_entry_t swp)
 {
 }
 
+static inline void swap_free_nr(swp_entry_t entry, int nr_pages)
+{
+}
+
 static inline void put_swap_folio(struct folio *folio, swp_entry_t swp)
 {
 }
diff --git a/mm/swapfile.c b/mm/swapfile.c
index f1e559e216bd..92a045d34a97 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -1356,6 +1356,53 @@ void swap_free(swp_entry_t entry)
 		__swap_entry_free(p, entry);
 }
 
+static void cluster_swap_free_nr(struct swap_info_struct *sis,
+		unsigned long offset, int nr_pages)
+{
+	struct swap_cluster_info *ci;
+	DECLARE_BITMAP(to_free, BITS_PER_LONG) = { 0 };
+	int i, nr;
+
+	ci = lock_cluster_or_swap_info(sis, offset);
+	while (nr_pages) {
+		nr = min(BITS_PER_LONG, nr_pages);
+		for (i = 0; i < nr; i++) {
+			if (!__swap_entry_free_locked(sis, offset + i, 1))
+				bitmap_set(to_free, i, 1);
+		}
+		if (!bitmap_empty(to_free, BITS_PER_LONG)) {
+			unlock_cluster_or_swap_info(sis, ci);
+			for_each_set_bit(i, to_free, BITS_PER_LONG)
+				free_swap_slot(swp_entry(sis->type, offset + i));
+			if (nr == nr_pages)
+				return;
+			bitmap_clear(to_free, 0, BITS_PER_LONG);
+			ci = lock_cluster_or_swap_info(sis, offset);
+		}
+		offset += nr;
+		nr_pages -= nr;
+	}
+	unlock_cluster_or_swap_info(sis, ci);
+}
+
+void swap_free_nr(swp_entry_t entry, int nr_pages)
+{
+	int nr;
+	struct swap_info_struct *sis;
+	unsigned long offset = swp_offset(entry);
+
+	sis = _swap_info_get(entry);
+	if (!sis)
+		return;
+
+	while (nr_pages) {
+		nr = min_t(int, nr_pages, SWAPFILE_CLUSTER - offset % SWAPFILE_CLUSTER);
+		cluster_swap_free_nr(sis, offset, nr);
+		offset += nr;
+		nr_pages -= nr;
+	}
+}
+
 /*
  * Called after dropping swapcache to decrease refcnt to swap entries.
  */
-- 
2.34.1




* [PATCH v5 2/6] mm: remove the implementation of swap_free() and always use swap_free_nr()
From: Barry Song @ 2024-05-29  8:28 UTC (permalink / raw)
  To: akpm, linux-mm
  Cc: baolin.wang, chrisl, david, hanchuanhua, hannes, hughd, kasong,
	linux-kernel, ryan.roberts, surenb, v-songbaohua, willy, xiang,
	ying.huang, yosryahmed, yuzhao, ziy, Rafael J. Wysocki,
	Pavel Machek, Len Brown, Christoph Hellwig

From: Barry Song <v-songbaohua@oppo.com>

To streamline maintenance efforts, remove the standalone implementation of
swap_free() and simply invoke swap_free_nr() with nr set to 1.
swap_free_nr() is designed around a bitmap consisting of only one long, so
its overhead is negligible when nr equals 1.

A prime candidate for leveraging swap_free_nr() is kernel/power/swap.c:
with this change, hibernation can free each swap extent in a single
batched call.

Suggested-by: "Huang, Ying" <ying.huang@intel.com>
Signed-off-by: Barry Song <v-songbaohua@oppo.com>
Reviewed-by: "Huang, Ying" <ying.huang@intel.com>
Acked-by: Chris Li <chrisl@kernel.org>
Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Len Brown <len.brown@intel.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Christoph Hellwig <hch@infradead.org>
---
 include/linux/swap.h | 10 +++++-----
 kernel/power/swap.c  |  5 ++---
 mm/swapfile.c        | 17 ++++-------------
 3 files changed, 11 insertions(+), 21 deletions(-)

diff --git a/include/linux/swap.h b/include/linux/swap.h
index 45f76dfe29b1..3df75d62a835 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -480,7 +480,6 @@ extern int add_swap_count_continuation(swp_entry_t, gfp_t);
 extern void swap_shmem_alloc(swp_entry_t);
 extern int swap_duplicate(swp_entry_t);
 extern int swapcache_prepare(swp_entry_t);
-extern void swap_free(swp_entry_t);
 extern void swap_free_nr(swp_entry_t entry, int nr_pages);
 extern void swapcache_free_entries(swp_entry_t *entries, int n);
 extern void free_swap_and_cache_nr(swp_entry_t entry, int nr);
@@ -559,10 +558,6 @@ static inline int swapcache_prepare(swp_entry_t swp)
 	return 0;
 }
 
-static inline void swap_free(swp_entry_t swp)
-{
-}
-
 static inline void swap_free_nr(swp_entry_t entry, int nr_pages)
 {
 }
@@ -611,6 +606,11 @@ static inline void free_swap_and_cache(swp_entry_t entry)
 	free_swap_and_cache_nr(entry, 1);
 }
 
+static inline void swap_free(swp_entry_t entry)
+{
+	swap_free_nr(entry, 1);
+}
+
 #ifdef CONFIG_MEMCG
 static inline int mem_cgroup_swappiness(struct mem_cgroup *memcg)
 {
diff --git a/kernel/power/swap.c b/kernel/power/swap.c
index d9abb7ab031d..85a8b5f4a081 100644
--- a/kernel/power/swap.c
+++ b/kernel/power/swap.c
@@ -200,12 +200,11 @@ void free_all_swap_pages(int swap)
 
 	while ((node = swsusp_extents.rb_node)) {
 		struct swsusp_extent *ext;
-		unsigned long offset;
 
 		ext = rb_entry(node, struct swsusp_extent, node);
 		rb_erase(node, &swsusp_extents);
-		for (offset = ext->start; offset <= ext->end; offset++)
-			swap_free(swp_entry(swap, offset));
+		swap_free_nr(swp_entry(swap, ext->start),
+			     ext->end - ext->start + 1);
 
 		kfree(ext);
 	}
diff --git a/mm/swapfile.c b/mm/swapfile.c
index 92a045d34a97..9c6d8e557c0f 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -1343,19 +1343,6 @@ static void swap_entry_free(struct swap_info_struct *p, swp_entry_t entry)
 	swap_range_free(p, offset, 1);
 }
 
-/*
- * Caller has made sure that the swap device corresponding to entry
- * is still around or has not been recycled.
- */
-void swap_free(swp_entry_t entry)
-{
-	struct swap_info_struct *p;
-
-	p = _swap_info_get(entry);
-	if (p)
-		__swap_entry_free(p, entry);
-}
-
 static void cluster_swap_free_nr(struct swap_info_struct *sis,
 		unsigned long offset, int nr_pages)
 {
@@ -1385,6 +1372,10 @@ static void cluster_swap_free_nr(struct swap_info_struct *sis,
 	unlock_cluster_or_swap_info(sis, ci);
 }
 
+/*
+ * Caller has made sure that the swap device corresponding to entry
+ * is still around or has not been recycled.
+ */
 void swap_free_nr(swp_entry_t entry, int nr_pages)
 {
 	int nr;
-- 
2.34.1




* [PATCH v5 3/6] mm: introduce pte_move_swp_offset() helper which can move offset bidirectionally
From: Barry Song @ 2024-05-29  8:28 UTC (permalink / raw)
  To: akpm, linux-mm
  Cc: baolin.wang, chrisl, david, hanchuanhua, hannes, hughd, kasong,
	linux-kernel, ryan.roberts, surenb, v-songbaohua, willy, xiang,
	ying.huang, yosryahmed, yuzhao, ziy

From: Barry Song <v-songbaohua@oppo.com>

We may need to obtain the first swap pte_t of a large folio from a swap
pte_t located somewhere in the middle of it. For instance, this occurs in
do_swap_page(), where a page fault can happen on any PTE of a large folio.
To address this, this patch introduces pte_move_swp_offset(), which can
move the swap offset forward or backward by a specified delta.
pte_next_swp_offset() then simply invokes it with delta = 1.
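
For illustration (mirroring the later do_swap_page() hunk in patch 6/6),
a negative delta recovers the swap pte of the folio's first subpage from
the faulting pte, while a positive delta walks forward:

	/* idx: position of the faulting page within its folio */
	pte_t first_pte = pte_move_swp_offset(vmf->orig_pte, -idx);
	pte_t second_pte = pte_move_swp_offset(first_pte, 1);
	/* pte_move_swp_offset(pte, 1) == pte_next_swp_offset(pte) */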

Suggested-by: "Huang, Ying" <ying.huang@intel.com>
Signed-off-by: Barry Song <v-songbaohua@oppo.com>
Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
Reviewed-by: "Huang, Ying" <ying.huang@intel.com>
---
 mm/internal.h | 25 +++++++++++++++++++++----
 1 file changed, 21 insertions(+), 4 deletions(-)

diff --git a/mm/internal.h b/mm/internal.h
index bbec99cc9d9d..3419c329b3bc 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -211,18 +211,21 @@ static inline int folio_pte_batch(struct folio *folio, unsigned long addr,
 }
 
 /**
- * pte_next_swp_offset - Increment the swap entry offset field of a swap pte.
+ * pte_move_swp_offset - Move the swap entry offset field of a swap pte
+ *	 forward or backward by delta
  * @pte: The initial pte state; is_swap_pte(pte) must be true and
  *	 non_swap_entry() must be false.
+ * @delta: The direction and the offset we are moving; forward if delta
+ *	 is positive; backward if delta is negative
  *
- * Increments the swap offset, while maintaining all other fields, including
+ * Moves the swap offset, while maintaining all other fields, including
  * swap type, and any swp pte bits. The resulting pte is returned.
  */
-static inline pte_t pte_next_swp_offset(pte_t pte)
+static inline pte_t pte_move_swp_offset(pte_t pte, long delta)
 {
 	swp_entry_t entry = pte_to_swp_entry(pte);
 	pte_t new = __swp_entry_to_pte(__swp_entry(swp_type(entry),
-						   (swp_offset(entry) + 1)));
+						   (swp_offset(entry) + delta)));
 
 	if (pte_swp_soft_dirty(pte))
 		new = pte_swp_mksoft_dirty(new);
@@ -234,6 +237,20 @@ static inline pte_t pte_next_swp_offset(pte_t pte)
 	return new;
 }
 
+
+/**
+ * pte_next_swp_offset - Increment the swap entry offset field of a swap pte.
+ * @pte: The initial pte state; is_swap_pte(pte) must be true and
+ *	 non_swap_entry() must be false.
+ *
+ * Increments the swap offset, while maintaining all other fields, including
+ * swap type, and any swp pte bits. The resulting pte is returned.
+ */
+static inline pte_t pte_next_swp_offset(pte_t pte)
+{
+	return pte_move_swp_offset(pte, 1);
+}
+
 /**
  * swap_pte_batch - detect a PTE batch for a set of contiguous swap entries
  * @start_ptep: Page table pointer for the first entry.
-- 
2.34.1




* [PATCH v5 4/6] mm: introduce arch_do_swap_page_nr() which allows restore metadata for nr pages
From: Barry Song @ 2024-05-29  8:28 UTC (permalink / raw)
  To: akpm, linux-mm
  Cc: baolin.wang, chrisl, david, hanchuanhua, hannes, hughd, kasong,
	linux-kernel, ryan.roberts, surenb, v-songbaohua, willy, xiang,
	ying.huang, yosryahmed, yuzhao, ziy, Khalid Aziz,
	David S. Miller, Andreas Larsson

From: Barry Song <v-songbaohua@oppo.com>

Once do_swap_page() is able to map a large folio directly, metadata must
be restored for a specified number of pages, nr. Note that metadata
restoration is currently required only by the SPARC platform, which does
not enable THP_SWAP. Consequently, with present kernel configurations,
there is no practical scenario that needs to restore metadata for nr
pages. Platforms implementing THP_SWAP might invoke this function with nr
greater than 1 after do_swap_page() successfully maps an entire large
folio; nonetheless, their arch_do_swap_page_nr() remains empty.
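
For illustration, a sketch of the intended call site once do_swap_page()
maps a whole large folio (the actual hunk lands in patch 6/6); on
architectures without __HAVE_ARCH_DO_SWAP_PAGE the call compiles to a
no-op:

	set_ptes(vma->vm_mm, address, ptep, pte, nr_pages);
	/* restore per-page metadata for all nr_pages pages (sparc ADI today) */
	arch_do_swap_page_nr(vma->vm_mm, vma, address, pte, pte, nr_pages);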

Signed-off-by: Barry Song <v-songbaohua@oppo.com>
Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
Reviewed-by: Khalid Aziz <khalid.aziz@oracle.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Andreas Larsson <andreas@gaisler.com>
---
 include/linux/pgtable.h | 26 ++++++++++++++++++++------
 mm/memory.c             |  3 ++-
 2 files changed, 22 insertions(+), 7 deletions(-)

diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index 117b807e3f89..2f32eaccf0b9 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -1089,6 +1089,15 @@ static inline int pgd_same(pgd_t pgd_a, pgd_t pgd_b)
 })
 
 #ifndef __HAVE_ARCH_DO_SWAP_PAGE
+static inline void arch_do_swap_page_nr(struct mm_struct *mm,
+				     struct vm_area_struct *vma,
+				     unsigned long addr,
+				     pte_t pte, pte_t oldpte,
+				     int nr)
+{
+
+}
+#else
 /*
  * Some architectures support metadata associated with a page. When a
  * page is being swapped out, this metadata must be saved so it can be
@@ -1097,12 +1106,17 @@ static inline int pgd_same(pgd_t pgd_a, pgd_t pgd_b)
  * page as metadata for the page. arch_do_swap_page() can restore this
  * metadata when a page is swapped back in.
  */
-static inline void arch_do_swap_page(struct mm_struct *mm,
-				     struct vm_area_struct *vma,
-				     unsigned long addr,
-				     pte_t pte, pte_t oldpte)
-{
-
+static inline void arch_do_swap_page_nr(struct mm_struct *mm,
+					struct vm_area_struct *vma,
+					unsigned long addr,
+					pte_t pte, pte_t oldpte,
+					int nr)
+{
+	for (int i = 0; i < nr; i++) {
+		arch_do_swap_page(vma->vm_mm, vma, addr + i * PAGE_SIZE,
+				pte_advance_pfn(pte, i),
+				pte_advance_pfn(oldpte, i));
+	}
 }
 #endif
 
diff --git a/mm/memory.c b/mm/memory.c
index 100f54fc9e6c..c0c6de31a313 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4314,7 +4314,8 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
 	VM_BUG_ON(!folio_test_anon(folio) ||
 			(pte_write(pte) && !PageAnonExclusive(page)));
 	set_pte_at(vma->vm_mm, vmf->address, vmf->pte, pte);
-	arch_do_swap_page(vma->vm_mm, vma, vmf->address, pte, vmf->orig_pte);
+	arch_do_swap_page_nr(vma->vm_mm, vma, vmf->address,
+			pte, vmf->orig_pte, 1);
 
 	folio_unlock(folio);
 	if (folio != swapcache && swapcache) {
-- 
2.34.1




* [PATCH v5 5/6] mm: swap: make should_try_to_free_swap() support large-folio
From: Barry Song @ 2024-05-29  8:28 UTC (permalink / raw)
  To: akpm, linux-mm
  Cc: baolin.wang, chrisl, david, hanchuanhua, hannes, hughd, kasong,
	linux-kernel, ryan.roberts, surenb, v-songbaohua, willy, xiang,
	ying.huang, yosryahmed, yuzhao, ziy

From: Chuanhua Han <hanchuanhua@oppo.com>

The function should_try_to_free_swap() assumes that swap-in always occurs
at normal page granularity, i.e., folio_nr_pages() == 1. In reality, for a
large folio, add_to_swap_cache() invokes folio_ref_add(folio, nr), so the
swapcache holds folio_nr_pages() references. To accommodate large folio
swap-in, this patch removes that assumption.
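
For example (illustrative numbers only), a 64KB, 16-page mTHP sitting in
the swapcache at fault time:

	/*
	 * references from the swapcache (add_to_swap_cache): 16 == folio_nr_pages()
	 * reference held by this fault path                :  1
	 *
	 * so "we are likely the exclusive user" becomes
	 *     folio_ref_count(folio) == 1 + folio_nr_pages(folio)   (17 here)
	 * which reduces to the old "== 2" check for order-0 folios.
	 */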

Signed-off-by: Chuanhua Han <hanchuanhua@oppo.com>
Co-developed-by: Barry Song <v-songbaohua@oppo.com>
Signed-off-by: Barry Song <v-songbaohua@oppo.com>
Acked-by: Chris Li <chrisl@kernel.org>
Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
Reviewed-by: "Huang, Ying" <ying.huang@intel.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
---
 mm/memory.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/memory.c b/mm/memory.c
index c0c6de31a313..10719e5afecb 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3925,7 +3925,7 @@ static inline bool should_try_to_free_swap(struct folio *folio,
 	 * reference only in case it's likely that we'll be the exlusive user.
 	 */
 	return (fault_flags & FAULT_FLAG_WRITE) && !folio_test_ksm(folio) &&
-		folio_ref_count(folio) == 2;
+		folio_ref_count(folio) == (1 + folio_nr_pages(folio));
 }
 
 static vm_fault_t pte_marker_clear(struct vm_fault *vmf)
-- 
2.34.1




* [PATCH v5 6/6] mm: swap: entirely map large folios found in swapcache
From: Barry Song @ 2024-05-29  8:28 UTC (permalink / raw)
  To: akpm, linux-mm
  Cc: baolin.wang, chrisl, david, hanchuanhua, hannes, hughd, kasong,
	linux-kernel, ryan.roberts, surenb, v-songbaohua, willy, xiang,
	ying.huang, yosryahmed, yuzhao, ziy

From: Chuanhua Han <hanchuanhua@oppo.com>

When a large folio is found in the swapcache, the current implementation
requires calling do_swap_page() nr_pages times, resulting in nr_pages page
faults. This patch instead maps the entire large folio at once to minimize
page faults. Additionally, redundant checks and early exits for ARM64 MTE
restoration are removed.

Signed-off-by: Chuanhua Han <hanchuanhua@oppo.com>
Co-developed-by: Barry Song <v-songbaohua@oppo.com>
Signed-off-by: Barry Song <v-songbaohua@oppo.com>
Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
Reviewed-by: "Huang, Ying" <ying.huang@intel.com>
---
 mm/memory.c | 59 +++++++++++++++++++++++++++++++++++++++++++----------
 1 file changed, 48 insertions(+), 11 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index 10719e5afecb..eef4e482c0c2 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4016,6 +4016,10 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
 	pte_t pte;
 	vm_fault_t ret = 0;
 	void *shadow = NULL;
+	int nr_pages;
+	unsigned long page_idx;
+	unsigned long address;
+	pte_t *ptep;
 
 	if (!pte_unmap_same(vmf))
 		goto out;
@@ -4214,6 +4218,38 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
 		goto out_nomap;
 	}
 
+	nr_pages = 1;
+	page_idx = 0;
+	address = vmf->address;
+	ptep = vmf->pte;
+	if (folio_test_large(folio) && folio_test_swapcache(folio)) {
+		int nr = folio_nr_pages(folio);
+		unsigned long idx = folio_page_idx(folio, page);
+		unsigned long folio_start = address - idx * PAGE_SIZE;
+		unsigned long folio_end = folio_start + nr * PAGE_SIZE;
+		pte_t *folio_ptep;
+		pte_t folio_pte;
+
+		if (unlikely(folio_start < max(address & PMD_MASK, vma->vm_start)))
+			goto check_folio;
+		if (unlikely(folio_end > pmd_addr_end(address, vma->vm_end)))
+			goto check_folio;
+
+		folio_ptep = vmf->pte - idx;
+		folio_pte = ptep_get(folio_ptep);
+		if (!pte_same(folio_pte, pte_move_swp_offset(vmf->orig_pte, -idx)) ||
+		    swap_pte_batch(folio_ptep, nr, folio_pte) != nr)
+			goto check_folio;
+
+		page_idx = idx;
+		address = folio_start;
+		ptep = folio_ptep;
+		nr_pages = nr;
+		entry = folio->swap;
+		page = &folio->page;
+	}
+
+check_folio:
 	/*
 	 * PG_anon_exclusive reuses PG_mappedtodisk for anon pages. A swap pte
 	 * must never point at an anonymous page in the swapcache that is
@@ -4273,12 +4309,12 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
 	 * We're already holding a reference on the page but haven't mapped it
 	 * yet.
 	 */
-	swap_free(entry);
+	swap_free_nr(entry, nr_pages);
 	if (should_try_to_free_swap(folio, vma, vmf->flags))
 		folio_free_swap(folio);
 
-	inc_mm_counter(vma->vm_mm, MM_ANONPAGES);
-	dec_mm_counter(vma->vm_mm, MM_SWAPENTS);
+	add_mm_counter(vma->vm_mm, MM_ANONPAGES, nr_pages);
+	add_mm_counter(vma->vm_mm, MM_SWAPENTS, -nr_pages);
 	pte = mk_pte(page, vma->vm_page_prot);
 
 	/*
@@ -4295,27 +4331,28 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
 		}
 		rmap_flags |= RMAP_EXCLUSIVE;
 	}
-	flush_icache_page(vma, page);
+	folio_ref_add(folio, nr_pages - 1);
+	flush_icache_pages(vma, page, nr_pages);
 	if (pte_swp_soft_dirty(vmf->orig_pte))
 		pte = pte_mksoft_dirty(pte);
 	if (pte_swp_uffd_wp(vmf->orig_pte))
 		pte = pte_mkuffd_wp(pte);
-	vmf->orig_pte = pte;
+	vmf->orig_pte = pte_advance_pfn(pte, page_idx);
 
 	/* ksm created a completely new copy */
 	if (unlikely(folio != swapcache && swapcache)) {
-		folio_add_new_anon_rmap(folio, vma, vmf->address);
+		folio_add_new_anon_rmap(folio, vma, address);
 		folio_add_lru_vma(folio, vma);
 	} else {
-		folio_add_anon_rmap_pte(folio, page, vma, vmf->address,
+		folio_add_anon_rmap_ptes(folio, page, nr_pages, vma, address,
 					rmap_flags);
 	}
 
 	VM_BUG_ON(!folio_test_anon(folio) ||
 			(pte_write(pte) && !PageAnonExclusive(page)));
-	set_pte_at(vma->vm_mm, vmf->address, vmf->pte, pte);
-	arch_do_swap_page_nr(vma->vm_mm, vma, vmf->address,
-			pte, vmf->orig_pte, 1);
+	set_ptes(vma->vm_mm, address, ptep, pte, nr_pages);
+	arch_do_swap_page_nr(vma->vm_mm, vma, address,
+			pte, pte, nr_pages);
 
 	folio_unlock(folio);
 	if (folio != swapcache && swapcache) {
@@ -4339,7 +4376,7 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
 	}
 
 	/* No need to invalidate - it was non-present before */
-	update_mmu_cache_range(vmf, vma, vmf->address, vmf->pte, 1);
+	update_mmu_cache_range(vmf, vma, address, ptep, nr_pages);
 unlock:
 	if (vmf->pte)
 		pte_unmap_unlock(vmf->pte, vmf->ptl);
-- 
2.34.1



