* [PATCH v2 0/2] mm/swap: hibernate: improve hibernate performance with new allocator
@ 2026-02-15 11:15 Kairui Song via B4 Relay
2026-02-15 11:15 ` [PATCH v2 1/2] mm, swap: speed up hibernation allocation and writeout Kairui Song via B4 Relay
2026-02-15 11:15 ` [PATCH v2 2/2] mm, swap: merge common convention and simplify allocation helper Kairui Song via B4 Relay
0 siblings, 2 replies; 5+ messages in thread
From: Kairui Song via B4 Relay @ 2026-02-15 11:15 UTC (permalink / raw)
To: linux-mm
Cc: Andrew Morton, Chris Li, Kemeng Shi, Nhat Pham, Baoquan He,
Barry Song, Rafael J. Wysocki, Carsten Grohmann, linux-kernel,
open list:SUSPEND TO RAM, Kairui Song
The new swap allocator didn't provide a high-performance allocation
path for hibernation and simply left it on the slow path. As a
result, hibernation performance is quite poor on some devices.
Fix it by implementing hibernate support for the fast allocation path.
This regression seems to happen only with SSDs that have poor 4K
performance. I've tested on several different NVMe and SSD setups;
the performance difference is tiny on most of them, but a Samsung
SSD 830 Series (SATA II, 3.0 Gbps) showed a big difference (test
thanks to Carsten Grohmann) [1]:
6.19: 324 seconds
After this series: 35 seconds
Test result with SAMSUNG MZ7LH480HAHQ-00005 (SATA 3.2, 6.0 Gb/s):
Before 0ff67f990bd4: Wrote 2230700 kbytes in 4.47 seconds (499.03 MB/s)
After 0ff67f990bd4: Wrote 2215472 kbytes in 4.44 seconds (498.98 MB/s)
After this series: Wrote 2038748 kbytes in 4.04 seconds (504.64 MB/s)
Test result with Memblaze P5910DT0384M00:
Before 0ff67f990bd4: Wrote 2222772 kbytes in 0.84 seconds (2646.15 MB/s)
After 0ff67f990bd4: Wrote 2224184 kbytes in 0.90 seconds (2471.31 MB/s)
After this series: Wrote 1559088 kbytes in 0.55 seconds (2834.70 MB/s)
The performance is almost the same for blazing fast SSDs, but
several times better for some others.
Patch 1 improves the hibernate performance by using the fast path, and
patch 2 cleans up the code a bit since there are now multiple fast path
users using similar conventions.
Signed-off-by: Kairui Song <kasong@tencent.com>
Tested-by: Carsten Grohmann <carstengrohmann@gmx.de>
Link: https://lore.kernel.org/linux-mm/8b4bdcfa-ce3f-4e23-839f-31367df7c18f@gmx.de/ [1]
---
Changes in v2:
- Based on mm-unstable; resent using b4's relay to fix mismatched patch content.
- Link to v1: https://lore.kernel.org/r/20260215-hibernate-perf-v1-0-f55ee9ee67db@tencent.com
---
Kairui Song (2):
mm, swap: speed up hibernation allocation and writeout
mm, swap: merge common convention and simplify allocation helper
mm/swapfile.c | 56 ++++++++++++++++++++++++++------------------------------
1 file changed, 26 insertions(+), 30 deletions(-)
---
base-commit: 53f061047924205138ad9bc315885255f7cc4944
change-id: 20260212-hibernate-perf-fb7783b2b252
Best regards,
--
Kairui Song <kasong@tencent.com>
* [PATCH v2 1/2] mm, swap: speed up hibernation allocation and writeout
From: Kairui Song via B4 Relay @ 2026-02-15 11:15 UTC (permalink / raw)
To: linux-mm
Cc: Andrew Morton, Chris Li, Kemeng Shi, Nhat Pham, Baoquan He,
    Barry Song, Rafael J. Wysocki, Carsten Grohmann, linux-kernel,
    open list:SUSPEND TO RAM, Kairui Song

From: Kairui Song <kasong@tencent.com>

Since commit 0ff67f990bd4 ("mm, swap: remove swap slot cache"),
hibernation has been using the swap slot slow allocation path for
simplification, which turns out might cause regression for some
devices because the allocator now rotates clusters too often, leading to
slower allocation and more random distribution of data.

Fast allocation is not complex, so implement hibernation support as
well.

And reduce the indent of the code too, while at it. It doesn't have to
check the device flag, as the allocator will also check the device flag
and refuse to allocate if the device is not writable.

Test result with Samsung SSD 830 Series (SATA II, 3.0 Gbps) shows the
performance is several times better [1]:
6.19: 324 seconds
After this series: 35 seconds

Fixes: 0ff67f990bd4 ("mm, swap: remove swap slot cache")
Reported-by: Carsten Grohmann <carstengrohmann@gmx.de>
Closes: https://lore.kernel.org/linux-mm/20260206121151.dea3633d1f0ded7bbf49c22e@linux-foundation.org/
Link: https://lore.kernel.org/linux-mm/8b4bdcfa-ce3f-4e23-839f-31367df7c18f@gmx.de/ [1]
Signed-off-by: Kairui Song <kasong@tencent.com>
---
 mm/swapfile.c | 34 ++++++++++++++++++++++------------
 1 file changed, 22 insertions(+), 12 deletions(-)

diff --git a/mm/swapfile.c b/mm/swapfile.c
index c6863ff7152c..bcac10d96fb5 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -1926,8 +1926,9 @@ void swap_put_entries_direct(swp_entry_t entry, int nr)
 /* Allocate a slot for hibernation */
 swp_entry_t swap_alloc_hibernation_slot(int type)
 {
-	struct swap_info_struct *si = swap_type_to_info(type);
-	unsigned long offset;
+	struct swap_info_struct *pcp_si, *si = swap_type_to_info(type);
+	unsigned long pcp_offset, offset = SWAP_ENTRY_INVALID;
+	struct swap_cluster_info *ci;
 	swp_entry_t entry = {0};
 
 	if (!si)
@@ -1935,17 +1936,26 @@ swp_entry_t swap_alloc_hibernation_slot(int type)
 
 	/* This is called for allocating swap entry, not cache */
 	if (get_swap_device_info(si)) {
-		if (si->flags & SWP_WRITEOK) {
-			/*
-			 * Grab the local lock to be compliant
-			 * with swap table allocation.
-			 */
-			local_lock(&percpu_swap_cluster.lock);
-			offset = cluster_alloc_swap_entry(si, NULL);
-			local_unlock(&percpu_swap_cluster.lock);
-			if (offset)
-				entry = swp_entry(si->type, offset);
+		/*
+		 * Try the local cluster first if it matches the device. If
+		 * not, try grab a new cluster and override local cluster.
+		 */
+		local_lock(&percpu_swap_cluster.lock);
+		pcp_si = this_cpu_read(percpu_swap_cluster.si[0]);
+		pcp_offset = this_cpu_read(percpu_swap_cluster.offset[0]);
+		if (pcp_si == si && pcp_offset) {
+			ci = swap_cluster_lock(si, pcp_offset);
+			if (cluster_is_usable(ci, 0))
+				offset = alloc_swap_scan_cluster(si, ci, NULL, pcp_offset);
+			else
+				swap_cluster_unlock(ci);
 		}
+		if (!offset)
+			offset = cluster_alloc_swap_entry(si, NULL);
+		if (offset)
+			entry = swp_entry(si->type, offset);
+		local_unlock(&percpu_swap_cluster.lock);
+		put_swap_device(si);
 	}
 fail:
-- 
2.52.0
* Re: [PATCH v2 1/2] mm, swap: speed up hibernation allocation and writeout
From: Andrew Morton @ 2026-02-15 17:02 UTC (permalink / raw)
To: kasong
Cc: Kairui Song via B4 Relay, linux-mm, Chris Li, Kemeng Shi,
    Nhat Pham, Baoquan He, Barry Song, Rafael J. Wysocki,
    Carsten Grohmann, linux-kernel, open list:SUSPEND TO RAM

On Sun, 15 Feb 2026 19:15:05 +0800 Kairui Song via B4 Relay <devnull+kasong.tencent.com@kernel.org> wrote:

> Since commit 0ff67f990bd4 ("mm, swap: remove swap slot cache"),
> hibernation has been using the swap slot slow allocation path for
> simplification, which turns out might cause regression for some
> devices because the allocator now rotates clusters too often, leading to
> slower allocation and more random distribution of data.
>
> Fast allocation is not complex, so implement hibernation support as
> well.
>
> And reduce the indent of the code too, while at it. It doesn't have to
> check the device flag, as the allocator will also check the device flag
> and refuse to allocate if the device is not writable.
>
> Test result with Samsung SSD 830 Series (SATA II, 3.0 Gbps) shows the
> performance is several times better [1]:
> 6.19: 324 seconds
> After this series: 35 seconds

10x is a lot, so I think we should offer this to -stable kernels.

If you agree, could you please prepare a more backportable fix?
Something minimal, separated from the [2/2] cleanup and without the
incidental whitespace alteration?

We can look at the indenting alteration and [2/2] after 7.0-rc1.
* Re: [PATCH v2 1/2] mm, swap: speed up hibernation allocation and writeout
From: Kairui Song @ 2026-02-15 18:25 UTC (permalink / raw)
To: Andrew Morton
Cc: kasong, Kairui Song via B4 Relay, linux-mm, Chris Li, Kemeng Shi,
    Nhat Pham, Baoquan He, Barry Song, Rafael J. Wysocki,
    Carsten Grohmann, linux-kernel, open list:SUSPEND TO RAM

On Sun, Feb 15, 2026 at 09:02:36AM +0800, Andrew Morton wrote:
> On Sun, 15 Feb 2026 19:15:05 +0800 Kairui Song via B4 Relay <devnull+kasong.tencent.com@kernel.org> wrote:
>
> > Since commit 0ff67f990bd4 ("mm, swap: remove swap slot cache"),
> > hibernation has been using the swap slot slow allocation path for
> > simplification, which turns out might cause regression for some
> > devices because the allocator now rotates clusters too often, leading to
> > slower allocation and more random distribution of data.
> >
> > Fast allocation is not complex, so implement hibernation support as
> > well.
> >
> > And reduce the indent of the code too, while at it. It doesn't have to
> > check the device flag, as the allocator will also check the device flag
> > and refuse to allocate if the device is not writable.
> >
> > Test result with Samsung SSD 830 Series (SATA II, 3.0 Gbps) shows the
> > performance is several times better [1]:
> > 6.19: 324 seconds
> > After this series: 35 seconds
>
> 10x is a lot, so I think we should offer this to -stable kernels.
>
> If you agree, could you please prepare a more backportable fix?
> Something minimal, separated from the [2/2] cleanup and without the
> incidental whitespace alteration?

Hi Andrew,

I think this is already very close to minimal, but I can send a v3 that
splits the indentation change into a standalone patch, just to reduce
the LOC changed for the stable backport. I'll also cc stable. I think
we only need to fix 6.18 and 6.19, right?

They will still need manual conflict resolution even without the
indentation change, but I can help with that.
* [PATCH v2 2/2] mm, swap: merge common convention and simplify allocation helper
From: Kairui Song via B4 Relay @ 2026-02-15 11:15 UTC (permalink / raw)
To: linux-mm
Cc: Andrew Morton, Chris Li, Kemeng Shi, Nhat Pham, Baoquan He,
    Barry Song, Rafael J. Wysocki, Carsten Grohmann, linux-kernel,
    open list:SUSPEND TO RAM, Kairui Song

From: Kairui Song <kasong@tencent.com>

Almost all callers of the cluster scan helper follow the same routine:
lock -> usability/emptiness check -> allocate -> unlock. So merge these
steps into the helper itself to simplify the code.

Signed-off-by: Kairui Song <kasong@tencent.com>
---
 mm/swapfile.c | 30 ++++++++----------------------
 1 file changed, 8 insertions(+), 22 deletions(-)

diff --git a/mm/swapfile.c b/mm/swapfile.c
index bcac10d96fb5..03cc0ff4dc8c 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -923,11 +923,14 @@ static unsigned int alloc_swap_scan_cluster(struct swap_info_struct *si,
 	bool need_reclaim, ret, usable;
 
 	lockdep_assert_held(&ci->lock);
-	VM_WARN_ON(!cluster_is_usable(ci, order));
-	if (end < nr_pages || ci->count + nr_pages > SWAPFILE_CLUSTER)
+	if (!cluster_is_usable(ci, order) || end < nr_pages ||
+	    ci->count + nr_pages > SWAPFILE_CLUSTER)
 		goto out;
 
+	if (cluster_is_empty(ci))
+		offset = cluster_offset(si, ci);
+
 	for (end -= nr_pages; offset <= end; offset += nr_pages) {
 		need_reclaim = false;
 		if (!cluster_scan_range(si, ci, offset, nr_pages, &need_reclaim))
@@ -1060,14 +1063,7 @@ static unsigned long cluster_alloc_swap_entry(struct swap_info_struct *si,
 			goto new_cluster;
 		ci = swap_cluster_lock(si, offset);
-		/* Cluster could have been used by another order */
-		if (cluster_is_usable(ci, order)) {
-			if (cluster_is_empty(ci))
-				offset = cluster_offset(si, ci);
-			found = alloc_swap_scan_cluster(si, ci, folio, offset);
-		} else {
-			swap_cluster_unlock(ci);
-		}
+		found = alloc_swap_scan_cluster(si, ci, folio, offset);
 		if (found)
 			goto done;
 	}
@@ -1332,14 +1328,7 @@ static bool swap_alloc_fast(struct folio *folio)
 		return false;
 
 	ci = swap_cluster_lock(si, offset);
-	if (cluster_is_usable(ci, order)) {
-		if (cluster_is_empty(ci))
-			offset = cluster_offset(si, ci);
-		alloc_swap_scan_cluster(si, ci, folio, offset);
-	} else {
-		swap_cluster_unlock(ci);
-	}
-
+	alloc_swap_scan_cluster(si, ci, folio, offset);
 	put_swap_device(si);
 	return folio_test_swapcache(folio);
 }
@@ -1945,10 +1934,7 @@ swp_entry_t swap_alloc_hibernation_slot(int type)
 		pcp_offset = this_cpu_read(percpu_swap_cluster.offset[0]);
 		if (pcp_si == si && pcp_offset) {
 			ci = swap_cluster_lock(si, pcp_offset);
-			if (cluster_is_usable(ci, 0))
-				offset = alloc_swap_scan_cluster(si, ci, NULL, pcp_offset);
-			else
-				swap_cluster_unlock(ci);
+			offset = alloc_swap_scan_cluster(si, ci, NULL, pcp_offset);
 		}
 		if (!offset)
 			offset = cluster_alloc_swap_entry(si, NULL);
-- 
2.52.0