From: Nhat Pham <nphamcs@gmail.com>
To: linux-mm@kvack.org
Cc: akpm@linux-foundation.org, hannes@cmpxchg.org, hughd@google.com,
	yosry.ahmed@linux.dev, mhocko@kernel.org,
	roman.gushchin@linux.dev, shakeel.butt@linux.dev,
	muchun.song@linux.dev, len.brown@intel.com,
	chengming.zhou@linux.dev, kasong@tencent.com, chrisl@kernel.org,
	huang.ying.caritas@gmail.com, ryan.roberts@arm.com,
	viro@zeniv.linux.org.uk, baohua@kernel.org, osalvador@suse.de,
	lorenzo.stoakes@oracle.com, christophe.leroy@csgroup.eu,
	pavel@kernel.org, kernel-team@meta.com,
	linux-kernel@vger.kernel.org, cgroups@vger.kernel.org,
	linux-pm@vger.kernel.org
Subject: [RFC PATCH 12/14] vswap: support THP swapin and batch free_swap_and_cache
Date: Mon,  7 Apr 2025 16:42:13 -0700
Message-ID: <20250407234223.1059191-13-nphamcs@gmail.com>
In-Reply-To: <20250407234223.1059191-1-nphamcs@gmail.com>

This patch implements the functionality required for THP swapin and
batched free_swap_and_cache() in the virtual swap space design.

The central requirement is that the range of entries we are working with
must not have mixed backing states (a minimal sketch of such a check
follows the list):

1. For now, zswap-backed entries are not supported for these batched
   operations.
2. All the entries must be backed by the same type of backing store.
3. If the swap entries in the batch are backed by an in-memory folio, it
   must be the same folio (i.e., they correspond to the subpages of that
   folio).
4. If the swap entries in the batch are backed by slots on a swapfile,
   they must all reside on the same swapfile, and the physical swap
   slots must be contiguous.

Signed-off-by: Nhat Pham <nphamcs@gmail.com>
---
 include/linux/swap.h |  6 +++
 mm/internal.h        | 14 +------
 mm/memory.c          | 16 ++++++--
 mm/vswap.c           | 89 ++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 109 insertions(+), 16 deletions(-)

diff --git a/include/linux/swap.h b/include/linux/swap.h
index 98cdfe0c1da7..c3a10c952116 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -763,6 +763,7 @@ bool vswap_folio_backed(swp_entry_t entry, int nr);
 void vswap_store_folio(swp_entry_t entry, struct folio *folio);
 void swap_zeromap_folio_set(struct folio *folio);
 void vswap_assoc_zswap(swp_entry_t entry, struct zswap_entry *zswap_entry);
+bool vswap_can_swapin_thp(swp_entry_t entry, int nr);
 
 static inline bool trylock_swapoff(swp_entry_t entry,
 				struct swap_info_struct **si)
@@ -862,6 +863,11 @@ static inline void vswap_assoc_zswap(swp_entry_t entry,
 {
 }
 
+static inline bool vswap_can_swapin_thp(swp_entry_t entry, int nr)
+{
+	return true;
+}
+
 static inline bool trylock_swapoff(swp_entry_t entry,
 				struct swap_info_struct **si)
 {
diff --git a/mm/internal.h b/mm/internal.h
index 51061691a731..6694e7a14745 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -268,14 +268,7 @@ static inline swp_entry_t swap_nth(swp_entry_t entry, long n)
 	return (swp_entry_t) { entry.val + n };
 }
 
-/* temporary disallow batched swap operations */
-static inline swp_entry_t swap_move(swp_entry_t entry, long delta)
-{
-	swp_entry_t next_entry;
-
-	next_entry.val = 0;
-	return next_entry;
-}
+swp_entry_t swap_move(swp_entry_t entry, long delta);
 #else
 static inline swp_entry_t swap_nth(swp_entry_t entry, long n)
 {
@@ -344,8 +337,6 @@ static inline pte_t pte_next_swp_offset(pte_t pte)
  * max_nr must be at least one and must be limited by the caller so scanning
  * cannot exceed a single page table.
  *
- * Note that for virtual swap space, we will not batch anything for now.
- *
  * Return: the number of table entries in the batch.
  */
 static inline int swap_pte_batch(pte_t *start_ptep, int max_nr, pte_t pte)
@@ -360,9 +351,6 @@ static inline int swap_pte_batch(pte_t *start_ptep, int max_nr, pte_t pte)
 	VM_WARN_ON(!is_swap_pte(pte));
 	VM_WARN_ON(non_swap_entry(entry));
 
-	if (IS_ENABLED(CONFIG_VIRTUAL_SWAP))
-		return 1;
-
 	cgroup_id = lookup_swap_cgroup_id(entry);
 	while (ptep < end_ptep) {
 		pte = ptep_get(ptep);
diff --git a/mm/memory.c b/mm/memory.c
index c5c34efafa81..5abb464913ef 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4226,10 +4226,8 @@ static struct folio *alloc_swap_folio(struct vm_fault *vmf)
 	 * A large swapped out folio could be partially or fully in zswap. We
 	 * lack handling for such cases, so fallback to swapping in order-0
 	 * folio.
-	 *
-	 * We also disable THP swapin on the virtual swap implementation, for now.
 	 */
-	if (!zswap_never_enabled() || IS_ENABLED(CONFIG_VIRTUAL_SWAP))
+	if (!zswap_never_enabled())
 		goto fallback;
 
 	entry = pte_to_swp_entry(vmf->orig_pte);
@@ -4419,6 +4417,18 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
 				}
 				need_clear_cache = true;
 
+				/*
+				 * Recheck to make sure the entire range is still
+				 * THP-swapin-able. Note that before we call
+				 * swapcache_prepare(), entries in the range can
+				 * still have their backing status changed.
+				 */
+				if (IS_ENABLED(CONFIG_VIRTUAL_SWAP) &&
+						!vswap_can_swapin_thp(entry, nr_pages)) {
+					schedule_timeout_uninterruptible(1);
+					goto out_page;
+				}
+
 				mem_cgroup_swapin_uncharge_swap(entry, nr_pages);
 
 				shadow = get_shadow_from_swap_cache(entry);
diff --git a/mm/vswap.c b/mm/vswap.c
index fcc7807ba89b..c09a7efc2aeb 100644
--- a/mm/vswap.c
+++ b/mm/vswap.c
@@ -9,6 +9,7 @@
 #include <linux/swap.h>
 #include <linux/swapops.h>
 #include <linux/swap_cgroup.h>
+#include "internal.h"
 #include "swap.h"
 
 /*
@@ -1104,6 +1105,94 @@ bool vswap_folio_backed(swp_entry_t entry, int nr)
 				&& type == VSWAP_FOLIO;
 }
 
+/**
+ * vswap_can_swapin_thp - check if the swap entries can be swapped in as a THP.
+ * @entry: the first virtual swap slot in the range.
+ * @nr: the number of slots in the range.
+ *
+ * For now, we can only swap in a THP if the entire range is zero-filled, or if
+ * the entire range is backed by a contiguous range of physical swap slots on a
+ * swapfile.
+ */
+bool vswap_can_swapin_thp(swp_entry_t entry, int nr)
+{
+	enum swap_type type;
+
+	return vswap_check_backing(entry, &type, nr) == nr &&
+		(type == VSWAP_ZERO || type == VSWAP_SWAPFILE);
+}
+
+/**
+ * swap_move - increment the swap slot by delta, checking the backing state,
+ *             and return 0 if the backing state does not match (i.e., wrong
+ *             backing type, or wrong offset on the backing store).
+ * @entry: the original virtual swap slot.
+ * @delta: the offset to increment the original slot.
+ *
+ * Note that this function is racy unless we can pin the backing state of these
+ * swap slots down with swapcache_prepare().
+ *
+ * Otherwise, the caller should rely on this function only as a best-effort
+ * hint, and should re-check once the whole range has been pinned down.
+ *
+ * Return: the incremented virtual swap slot if the backing state matches, or
+ *         0 if the backing state does not match.
+ */
+swp_entry_t swap_move(swp_entry_t entry, long delta)
+{
+	struct swp_desc *desc, *next_desc;
+	swp_entry_t next_entry;
+	bool invalid = true;
+	struct folio *folio;
+	enum swap_type type;
+	swp_slot_t slot;
+
+	next_entry.val = entry.val + delta;
+
+	rcu_read_lock();
+	desc = xa_load(&vswap_map, entry.val);
+	next_desc = xa_load(&vswap_map, next_entry.val);
+
+	if (!desc || !next_desc) {
+		rcu_read_unlock();
+		return (swp_entry_t){0};
+	}
+
+	read_lock(&desc->lock);
+	if (desc->type == VSWAP_ZSWAP) {
+		read_unlock(&desc->lock);
+		goto rcu_unlock;
+	}
+
+	type = desc->type;
+	if (type == VSWAP_FOLIO)
+		folio = desc->folio;
+
+	if (type == VSWAP_SWAPFILE)
+		slot = desc->slot;
+	read_unlock(&desc->lock);
+
+	read_lock(&next_desc->lock);
+	if (next_desc->type != type)
+		goto next_unlock;
+
+	if (type == VSWAP_SWAPFILE &&
+			(swp_slot_type(next_desc->slot) != swp_slot_type(slot) ||
+				swp_slot_offset(next_desc->slot) !=
+							swp_slot_offset(slot) + delta))
+		goto next_unlock;
+
+	if (type == VSWAP_FOLIO && next_desc->folio != folio)
+		goto next_unlock;
+
+	invalid = false;
+next_unlock:
+	read_unlock(&next_desc->lock);
+rcu_unlock:
+	rcu_read_unlock();
+	return invalid ? (swp_entry_t){0} : next_entry;
+}
+
 /*
  * Return the count of contiguous swap entries that share the same
  * VSWAP_ZERO status as the starting entry. If is_zeromap is not NULL,
-- 
2.47.1




Thread overview: 35+ messages
2025-04-07 23:42 [RFC PATCH 00/14] Virtual Swap Space Nhat Pham
2025-04-07 23:42 ` [RFC PATCH 01/14] swapfile: rearrange functions Nhat Pham
2025-04-07 23:42 ` [RFC PATCH 02/14] mm: swap: add an abstract API for locking out swapoff Nhat Pham
2025-04-07 23:42 ` [RFC PATCH 03/14] mm: swap: add a separate type for physical swap slots Nhat Pham
2025-04-08 14:15   ` Johannes Weiner
2025-04-08 15:11     ` Nhat Pham
2025-04-22 14:41     ` Yosry Ahmed
     [not found]     ` <6807ab09.670a0220.152ca3.502fSMTPIN_ADDED_BROKEN@mx.google.com>
2025-04-22 15:50       ` Nhat Pham
2025-04-22 18:50         ` Kairui Song
2025-04-07 23:42 ` [RFC PATCH 04/14] mm: swap: swap cache support for virtualized swap Nhat Pham
2025-04-08 15:00   ` Johannes Weiner
2025-04-08 15:34     ` Nhat Pham
2025-04-08 15:43       ` Nhat Pham
2025-04-07 23:42 ` [RFC PATCH 05/14] zswap: unify zswap tree " Nhat Pham
2025-04-07 23:42 ` [RFC PATCH 06/14] mm: swap: allocate a virtual swap slot for each swapped out page Nhat Pham
2025-04-07 23:42 ` [RFC PATCH 07/14] swap: implement the swap_cgroup API using virtual swap Nhat Pham
2025-04-07 23:42 ` [RFC PATCH 08/14] swap: manage swap entry lifetime at the virtual swap layer Nhat Pham
2025-04-07 23:42 ` [RFC PATCH 09/14] swap: implement locking out swapoff using virtual swap slot Nhat Pham
2025-04-07 23:42 ` [RFC PATCH 10/14] mm: swap: decouple virtual swap slot from backing store Nhat Pham
2025-04-07 23:42 ` [RFC PATCH 11/14] memcg: swap: only charge physical swap slots Nhat Pham
2025-04-07 23:42 ` Nhat Pham [this message]
2025-04-07 23:42 ` [RFC PATCH 13/14] swap: simplify swapoff using virtual swap Nhat Pham
2025-04-07 23:42 ` [RFC PATCH 14/14] zswap: do not start zswap shrinker if there is no physical swap slots Nhat Pham
2025-04-08 13:04 ` [RFC PATCH 00/14] Virtual Swap Space Usama Arif
2025-04-08 15:20   ` Nhat Pham
2025-04-08 15:45   ` Johannes Weiner
2025-04-08 16:25     ` Nhat Pham
2025-04-08 16:27       ` Nhat Pham
2025-04-08 16:22 ` Kairui Song
2025-04-08 16:47   ` Nhat Pham
2025-04-08 16:59     ` Kairui Song
2025-04-22 14:43       ` Yosry Ahmed
2025-04-22 14:56 ` Yosry Ahmed
     [not found] ` <6807afd0.a70a0220.2ae8b9.e07cSMTPIN_ADDED_BROKEN@mx.google.com>
2025-04-22 17:15   ` Nhat Pham
2025-04-22 19:29     ` Nhat Pham
