* [PATCH v3 0/3] mm/swap: hibernate: improve hibernate performance with new allocator
@ 2026-02-15 19:00 Kairui Song via B4 Relay
2026-02-15 19:00 ` [PATCH v3 1/3] mm, swap: speed up hibernation allocation and writeout Kairui Song via B4 Relay
` (2 more replies)
0 siblings, 3 replies; 11+ messages in thread
From: Kairui Song via B4 Relay @ 2026-02-15 19:00 UTC (permalink / raw)
To: linux-mm
Cc: Andrew Morton, Chris Li, Kemeng Shi, Nhat Pham, Baoquan He,
Barry Song, Carsten Grohmann, Rafael J. Wysocki, linux-kernel,
open list:SUSPEND TO RAM, Carsten Grohmann, Kairui Song, stable
The new swap allocator didn't provide a high-performance allocation
method for hibernate, and just left it using the simple slow path. As a
result, hibernate performance is quite bad on some devices.
Fix it by implementing hibernate support for the fast allocation path.
This regression seems to only happen with SSD devices with poor 4k
performance. I've tested on several different NVMe and SSD setups; the
performance diff is tiny on most of them, but testing on a Samsung SSD
830 Series (SATA II, 3.0 Gbps) showed a big difference.
Test result with Samsung SSD 830 Series (SATA II, 3.0 Gbps), thanks
to Carsten Grohmann [1]:
  6.19:              324 seconds
  After this series:  35 seconds
Test result with SAMSUNG MZ7LH480HAHQ-00005 (SATA 3.2, 6.0 Gb/s):
Before 0ff67f990bd4: Wrote 2230700 kbytes in 4.47 seconds (499.03 MB/s)
After 0ff67f990bd4: Wrote 2215472 kbytes in 4.44 seconds (498.98 MB/s)
After this series: Wrote 2038748 kbytes in 4.04 seconds (504.64 MB/s)
Test result with Memblaze P5910DT0384M00:
Before 0ff67f990bd4: Wrote 2222772 kbytes in 0.84 seconds (2646.15 MB/s)
After 0ff67f990bd4: Wrote 2224184 kbytes in 0.90 seconds (2471.31 MB/s)
After this series: Wrote 1559088 kbytes in 0.55 seconds (2834.70 MB/s)
The performance is almost the same for blazing fast SSDs, but for some
SSDs, the performance is several times better.
Patch 1 improves the hibernate performance by using the fast path, and
patches 2 and 3 clean up the code a bit, since there are now multiple
fast path users sharing similar conventions.
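The fast-path idea the series describes can be sketched as a standalone
userspace model: keep a per-CPU-style cache of the last cluster used, try it
first, and fall back to a full (slow) scan only on a miss. Every name below
(struct swapdev, struct pcp_cache, alloc_slot, slow_scan) is invented for
illustration; this is a simplified sketch, not the kernel code.

```c
#include <assert.h>
#include <stddef.h>

#define ENTRY_INVALID 0UL

struct swapdev {
    unsigned long next_free;   /* next slot a slow scan would find */
};

struct pcp_cache {
    struct swapdev *dev;       /* device the cached offset belongs to */
    unsigned long offset;      /* next offset to try, or ENTRY_INVALID */
};

/* Slow path: modeled as a scan that hands out the next free slot. */
static unsigned long slow_scan(struct swapdev *dev)
{
    return ++dev->next_free;
}

static unsigned long alloc_slot(struct pcp_cache *pcp, struct swapdev *dev)
{
    unsigned long offset = ENTRY_INVALID;

    /* Try the cached cluster first if it matches the device... */
    if (pcp->dev == dev && pcp->offset != ENTRY_INVALID)
        offset = pcp->offset++;
    /* ...otherwise fall back to the slow path and refill the cache. */
    if (offset == ENTRY_INVALID) {
        offset = slow_scan(dev);
        pcp->offset = offset + 1;
    }
    pcp->dev = dev;
    return offset;
}
```

Repeated cache hits hand out sequential offsets, which is the locality win the
series is after on devices with poor random 4k write performance.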
Signed-off-by: Kairui Song <kasong@tencent.com>
Tested-by: Carsten Grohmann <mail@carstengrohmann.de>
Link: https://lore.kernel.org/linux-mm/8b4bdcfa-ce3f-4e23-839f-31367df7c18f@gmx.de/ [1]
---
Changes in v3:
- Split the indention change into a standalone patch.
- Update mail address and add Cc stable.
- Link to v2: https://lore.kernel.org/r/20260215-hibernate-perf-v2-0-cf28c75b04b7@tencent.com
Changes in v2:
- Based on mm-unstable, resend using b4's relay to fix mismatched patch content.
- Link to v1: https://lore.kernel.org/r/20260215-hibernate-perf-v1-0-f55ee9ee67db@tencent.com
---
Kairui Song (3):
mm, swap: speed up hibernation allocation and writeout
mm, swap: reduce indention for hibernate allocation helper
mm, swap: merge common convention and simplify allocation helper
mm/swapfile.c | 55 +++++++++++++++++++++++++------------------------------
1 file changed, 25 insertions(+), 30 deletions(-)
---
base-commit: 53f061047924205138ad9bc315885255f7cc4944
change-id: 20260212-hibernate-perf-fb7783b2b252
Best regards,
--
Kairui Song <kasong@tencent.com>
^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH v3 1/3] mm, swap: speed up hibernation allocation and writeout
  2026-02-15 19:00 [PATCH v3 0/3] mm/swap: hibernate: improve hibernate performance with new allocator Kairui Song via B4 Relay
@ 2026-02-15 19:00 ` Kairui Song via B4 Relay
  2026-02-15 20:43   ` Barry Song
  2026-02-15 19:00 ` [PATCH v3 2/3] mm, swap: reduce indention for hibernate allocation helper Kairui Song via B4 Relay
  2026-02-15 19:00 ` [PATCH v3 3/3] mm, swap: merge common convention and simplify " Kairui Song via B4 Relay
  2 siblings, 1 reply; 11+ messages in thread
From: Kairui Song via B4 Relay @ 2026-02-15 19:00 UTC (permalink / raw)
To: linux-mm
Cc: Andrew Morton, Chris Li, Kemeng Shi, Nhat Pham, Baoquan He,
    Barry Song, Carsten Grohmann, Rafael J. Wysocki, linux-kernel,
    open list:SUSPEND TO RAM, Carsten Grohmann, Kairui Song, stable

From: Kairui Song <kasong@tencent.com>

Since commit 0ff67f990bd4 ("mm, swap: remove swap slot cache"),
hibernation has been using the swap slot slow allocation path for
simplicity, which turns out to cause a regression on some devices:
the allocator now rotates clusters too often, leading to slower
allocation and a more random distribution of data.

Fast allocation is not complex, so implement hibernation support for
it as well.

Test result with Samsung SSD 830 Series (SATA II, 3.0 Gbps) shows the
performance is several times better [1]:
  6.19:              324 seconds
  After this series:  35 seconds

Fixes: 0ff67f990bd4 ("mm, swap: remove swap slot cache")
Reported-by: Carsten Grohmann <mail@carstengrohmann.de>
Closes: https://lore.kernel.org/linux-mm/20260206121151.dea3633d1f0ded7bbf49c22e@linux-foundation.org/
Link: https://lore.kernel.org/linux-mm/8b4bdcfa-ce3f-4e23-839f-31367df7c18f@gmx.de/ [1]
Cc: stable@vger.kernel.org
Signed-off-by: Kairui Song <kasong@tencent.com>
---
 mm/swapfile.c | 21 ++++++++++++++++-----
 1 file changed, 16 insertions(+), 5 deletions(-)

diff --git a/mm/swapfile.c b/mm/swapfile.c
index c6863ff7152c..32e0e7545ab8 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -1926,8 +1926,9 @@ void swap_put_entries_direct(swp_entry_t entry, int nr)
 /* Allocate a slot for hibernation */
 swp_entry_t swap_alloc_hibernation_slot(int type)
 {
-	struct swap_info_struct *si = swap_type_to_info(type);
-	unsigned long offset;
+	struct swap_info_struct *pcp_si, *si = swap_type_to_info(type);
+	unsigned long pcp_offset, offset = SWAP_ENTRY_INVALID;
+	struct swap_cluster_info *ci;
 	swp_entry_t entry = {0};
 
 	if (!si)
@@ -1937,11 +1938,21 @@ swp_entry_t swap_alloc_hibernation_slot(int type)
 	if (get_swap_device_info(si)) {
 		if (si->flags & SWP_WRITEOK) {
 			/*
-			 * Grab the local lock to be compliant
-			 * with swap table allocation.
+			 * Try the local cluster first if it matches the device. If
+			 * not, try grab a new cluster and override local cluster.
 			 */
 			local_lock(&percpu_swap_cluster.lock);
-			offset = cluster_alloc_swap_entry(si, NULL);
+			pcp_si = this_cpu_read(percpu_swap_cluster.si[0]);
+			pcp_offset = this_cpu_read(percpu_swap_cluster.offset[0]);
+			if (pcp_si == si && pcp_offset) {
+				ci = swap_cluster_lock(si, pcp_offset);
+				if (cluster_is_usable(ci, 0))
+					offset = alloc_swap_scan_cluster(si, ci, NULL, pcp_offset);
+				else
+					swap_cluster_unlock(ci);
+			}
+			if (!offset)
+				offset = cluster_alloc_swap_entry(si, NULL);
 			local_unlock(&percpu_swap_cluster.lock);
 			if (offset)
 				entry = swp_entry(si->type, offset);

-- 
2.52.0

^ permalink raw reply	[flat|nested] 11+ messages in thread
* Re: [PATCH v3 1/3] mm, swap: speed up hibernation allocation and writeout
  2026-02-15 19:00 ` [PATCH v3 1/3] mm, swap: speed up hibernation allocation and writeout Kairui Song via B4 Relay
@ 2026-02-15 20:43   ` Barry Song
  2026-02-16  6:06     ` Kairui Song
  0 siblings, 1 reply; 11+ messages in thread
From: Barry Song @ 2026-02-15 20:43 UTC (permalink / raw)
To: kasong
Cc: linux-mm, Andrew Morton, Chris Li, Kemeng Shi, Nhat Pham,
    Baoquan He, Carsten Grohmann, Rafael J. Wysocki, linux-kernel,
    open list:SUSPEND TO RAM, Carsten Grohmann, stable

On Mon, Feb 16, 2026 at 3:00 AM Kairui Song via B4 Relay
<devnull+kasong.tencent.com@kernel.org> wrote:
>
> From: Kairui Song <kasong@tencent.com>
>
> Since commit 0ff67f990bd4 ("mm, swap: remove swap slot cache"),
> hibernation has been using the swap slot slow allocation path for
> simplification, which turns out might cause regression for some
> devices because the allocator now rotates clusters too often, leading to
> slower allocation and more random distribution of data.
>
> Fast allocation is not complex, so implement hibernation support as
> well.
>
> Test result with Samsung SSD 830 Series (SATA II, 3.0 Gbps) shows the
> performance is several times better [1]:
>   6.19: 324 seconds
>   After this series: 35 seconds
>
> Fixes: 0ff67f990bd4 ("mm, swap: remove swap slot cache")
> Reported-by: Carsten Grohmann <mail@carstengrohmann.de>
> Closes: https://lore.kernel.org/linux-mm/20260206121151.dea3633d1f0ded7bbf49c22e@linux-foundation.org/
> Link: https://lore.kernel.org/linux-mm/8b4bdcfa-ce3f-4e23-839f-31367df7c18f@gmx.de/ [1]
> Cc: stable@vger.kernel.org
> Signed-off-by: Kairui Song <kasong@tencent.com>
> ---
>  mm/swapfile.c | 21 ++++++++++++++++-----
>  1 file changed, 16 insertions(+), 5 deletions(-)
>
> diff --git a/mm/swapfile.c b/mm/swapfile.c
> index c6863ff7152c..32e0e7545ab8 100644
> --- a/mm/swapfile.c
> +++ b/mm/swapfile.c
> @@ -1926,8 +1926,9 @@ void swap_put_entries_direct(swp_entry_t entry, int nr)
>  /* Allocate a slot for hibernation */
>  swp_entry_t swap_alloc_hibernation_slot(int type)
>  {
> -       struct swap_info_struct *si = swap_type_to_info(type);
> -       unsigned long offset;
> +       struct swap_info_struct *pcp_si, *si = swap_type_to_info(type);
> +       unsigned long pcp_offset, offset = SWAP_ENTRY_INVALID;
> +       struct swap_cluster_info *ci;
>         swp_entry_t entry = {0};
>
>         if (!si)
> @@ -1937,11 +1938,21 @@ swp_entry_t swap_alloc_hibernation_slot(int type)
>         if (get_swap_device_info(si)) {
>                 if (si->flags & SWP_WRITEOK) {
>                         /*
> -                        * Grab the local lock to be compliant
> -                        * with swap table allocation.
> +                        * Try the local cluster first if it matches the device. If
> +                        * not, try grab a new cluster and override local cluster.
>                          */
>                         local_lock(&percpu_swap_cluster.lock);
> -                       offset = cluster_alloc_swap_entry(si, NULL);
> +                       pcp_si = this_cpu_read(percpu_swap_cluster.si[0]);
> +                       pcp_offset = this_cpu_read(percpu_swap_cluster.offset[0]);
> +                       if (pcp_si == si && pcp_offset) {
> +                               ci = swap_cluster_lock(si, pcp_offset);
> +                               if (cluster_is_usable(ci, 0))
> +                                       offset = alloc_swap_scan_cluster(si, ci, NULL, pcp_offset);
> +                               else
> +                                       swap_cluster_unlock(ci);
> +                       }
> +                       if (!offset)

I assume you mean SWAP_ENTRY_INVALID? Would that be more readable?

> +                               offset = cluster_alloc_swap_entry(si, NULL);
>                         local_unlock(&percpu_swap_cluster.lock);
>                         if (offset)
>                                 entry = swp_entry(si->type, offset);
>
> --
> 2.52.0

Thanks
Barry

^ permalink raw reply	[flat|nested] 11+ messages in thread
* Re: [PATCH v3 1/3] mm, swap: speed up hibernation allocation and writeout
  2026-02-15 20:43   ` Barry Song
@ 2026-02-16  6:06     ` Kairui Song
  0 siblings, 0 replies; 11+ messages in thread
From: Kairui Song @ 2026-02-16  6:06 UTC (permalink / raw)
To: Barry Song
Cc: kasong, linux-mm, Andrew Morton, Chris Li, Kemeng Shi, Nhat Pham,
    Baoquan He, Carsten Grohmann, Rafael J. Wysocki, linux-kernel,
    open list:SUSPEND TO RAM, Carsten Grohmann, stable

On Mon, Feb 16, 2026 at 04:43:40AM +0800, Barry Song wrote:
> > @@ -1937,11 +1938,21 @@ swp_entry_t swap_alloc_hibernation_slot(int type)
> >         if (get_swap_device_info(si)) {
> >                 if (si->flags & SWP_WRITEOK) {
> >                         /*
> > -                        * Grab the local lock to be compliant
> > -                        * with swap table allocation.
> > +                        * Try the local cluster first if it matches the device. If
> > +                        * not, try grab a new cluster and override local cluster.
> >                          */
> >                         local_lock(&percpu_swap_cluster.lock);
> > -                       offset = cluster_alloc_swap_entry(si, NULL);
> > +                       pcp_si = this_cpu_read(percpu_swap_cluster.si[0]);
> > +                       pcp_offset = this_cpu_read(percpu_swap_cluster.offset[0]);
> > +                       if (pcp_si == si && pcp_offset) {
> > +                               ci = swap_cluster_lock(si, pcp_offset);
> > +                               if (cluster_is_usable(ci, 0))
> > +                                       offset = alloc_swap_scan_cluster(si, ci, NULL, pcp_offset);
> > +                               else
> > +                                       swap_cluster_unlock(ci);
> > +                       }
> > +                       if (!offset)
>
> I assume you mean SWAP_ENTRY_INVALID? Would that be more readable?

Yes, it's very common in swapfile.c to check !offset since
SWAP_ENTRY_INVALID is zero. But I agree that checking against
SWAP_ENTRY_INVALID is more readable and maintainable. I'll change it to
SWAP_ENTRY_INVALID, and also use this macro more in future code.

Thanks!

^ permalink raw reply	[flat|nested] 11+ messages in thread
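The sentinel question settled above can be illustrated with a tiny standalone
sketch: since SWAP_ENTRY_INVALID is zero, `!offset` and an explicit comparison
are equivalent, and the named constant only documents intent. The helper below
(pick_offset) is invented for illustration, not kernel code.

```c
#include <assert.h>

#define SWAP_ENTRY_INVALID 0UL

/* Reads as "use the cached offset unless it is invalid", instead of
 * relying on the reader knowing the sentinel happens to be zero. */
static unsigned long pick_offset(unsigned long cached, unsigned long fallback)
{
    if (cached == SWAP_ENTRY_INVALID)
        return fallback;
    return cached;
}
```

The behavior is identical to `return cached ? cached : fallback;`; the named
form survives a future change of the sentinel value.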
* [PATCH v3 2/3] mm, swap: reduce indention for hibernate allocation helper
  2026-02-15 19:00 [PATCH v3 0/3] mm/swap: hibernate: improve hibernate performance with new allocator Kairui Song via B4 Relay
  2026-02-15 19:00 ` [PATCH v3 1/3] mm, swap: speed up hibernation allocation and writeout Kairui Song via B4 Relay
@ 2026-02-15 19:00 ` Kairui Song via B4 Relay
  2026-02-15 23:20   ` Barry Song
  2026-02-15 19:00 ` [PATCH v3 3/3] mm, swap: merge common convention and simplify " Kairui Song via B4 Relay
  2 siblings, 1 reply; 11+ messages in thread
From: Kairui Song via B4 Relay @ 2026-02-15 19:00 UTC (permalink / raw)
To: linux-mm
Cc: Andrew Morton, Chris Li, Kemeng Shi, Nhat Pham, Baoquan He,
    Barry Song, Carsten Grohmann, Rafael J. Wysocki, linux-kernel,
    open list:SUSPEND TO RAM, Carsten Grohmann, Kairui Song

From: Kairui Song <kasong@tencent.com>

The hibernation allocation helper doesn't have to check the device
flag, as the allocator will also check it and refuse to allocate if
the device is not writable. This might cause a trivial waste of CPU
cycles if a hibernate allocation races with swapoff, but that is very
unlikely to happen. Removing the check from the common path should be
more helpful.

Signed-off-by: Kairui Song <kasong@tencent.com>
---
 mm/swapfile.c | 38 ++++++++++++++++++--------------------
 1 file changed, 18 insertions(+), 20 deletions(-)

diff --git a/mm/swapfile.c b/mm/swapfile.c
index 32e0e7545ab8..0d1b17c99221 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -1936,27 +1936,25 @@ swp_entry_t swap_alloc_hibernation_slot(int type)
 
 	/* This is called for allocating swap entry, not cache */
 	if (get_swap_device_info(si)) {
-		if (si->flags & SWP_WRITEOK) {
-			/*
-			 * Try the local cluster first if it matches the device. If
-			 * not, try grab a new cluster and override local cluster.
-			 */
-			local_lock(&percpu_swap_cluster.lock);
-			pcp_si = this_cpu_read(percpu_swap_cluster.si[0]);
-			pcp_offset = this_cpu_read(percpu_swap_cluster.offset[0]);
-			if (pcp_si == si && pcp_offset) {
-				ci = swap_cluster_lock(si, pcp_offset);
-				if (cluster_is_usable(ci, 0))
-					offset = alloc_swap_scan_cluster(si, ci, NULL, pcp_offset);
-				else
-					swap_cluster_unlock(ci);
-			}
-			if (!offset)
-				offset = cluster_alloc_swap_entry(si, NULL);
-			local_unlock(&percpu_swap_cluster.lock);
-			if (offset)
-				entry = swp_entry(si->type, offset);
+		/*
+		 * Try the local cluster first if it matches the device. If
+		 * not, try grab a new cluster and override local cluster.
+		 */
+		local_lock(&percpu_swap_cluster.lock);
+		pcp_si = this_cpu_read(percpu_swap_cluster.si[0]);
+		pcp_offset = this_cpu_read(percpu_swap_cluster.offset[0]);
+		if (pcp_si == si && pcp_offset) {
+			ci = swap_cluster_lock(si, pcp_offset);
+			if (cluster_is_usable(ci, 0))
+				offset = alloc_swap_scan_cluster(si, ci, NULL, pcp_offset);
+			else
+				swap_cluster_unlock(ci);
 		}
+		if (!offset)
+			offset = cluster_alloc_swap_entry(si, NULL);
+		local_unlock(&percpu_swap_cluster.lock);
+		if (offset)
+			entry = swp_entry(si->type, offset);
 		put_swap_device(si);
 	}
 fail:

-- 
2.52.0

^ permalink raw reply	[flat|nested] 11+ messages in thread
* Re: [PATCH v3 2/3] mm, swap: reduce indention for hibernate allocation helper
  2026-02-15 19:00 ` [PATCH v3 2/3] mm, swap: reduce indention for hibernate allocation helper Kairui Song via B4 Relay
@ 2026-02-15 23:20   ` Barry Song
  2026-02-16  6:21     ` Kairui Song
  0 siblings, 1 reply; 11+ messages in thread
From: Barry Song @ 2026-02-15 23:20 UTC (permalink / raw)
To: kasong
Cc: linux-mm, Andrew Morton, Chris Li, Kemeng Shi, Nhat Pham,
    Baoquan He, Carsten Grohmann, Rafael J. Wysocki, linux-kernel,
    open list:SUSPEND TO RAM, Carsten Grohmann

On Mon, Feb 16, 2026 at 3:00 AM Kairui Song via B4 Relay
<devnull+kasong.tencent.com@kernel.org> wrote:
>
> From: Kairui Song <kasong@tencent.com>
>
> It doesn't have to check the device flag, as the allocator will also
> check the device flag and refuse to allocate if the device is not
> writable. This might cause a trivial waste of CPU cycles of hibernate
> allocation raced with swapoff, but that is very unlikely to happen.
> Removing the check on the common path should be more helpful.
>
> Signed-off-by: Kairui Song <kasong@tencent.com>
> ---
>  mm/swapfile.c | 38 ++++++++++++++++++--------------------
>  1 file changed, 18 insertions(+), 20 deletions(-)
>
> diff --git a/mm/swapfile.c b/mm/swapfile.c
> index 32e0e7545ab8..0d1b17c99221 100644
> --- a/mm/swapfile.c
> +++ b/mm/swapfile.c
> @@ -1936,27 +1936,25 @@ swp_entry_t swap_alloc_hibernation_slot(int type)
>
>         /* This is called for allocating swap entry, not cache */
>         if (get_swap_device_info(si)) {

I guess we could further reduce indentation by doing:
        if (!get_swap_device_info(si))
                goto fail;

> -               if (si->flags & SWP_WRITEOK) {
> -                       /*
> -                        * Try the local cluster first if it matches the device. If
> -                        * not, try grab a new cluster and override local cluster.
> -                        */
> -                       local_lock(&percpu_swap_cluster.lock);
> -                       pcp_si = this_cpu_read(percpu_swap_cluster.si[0]);
> -                       pcp_offset = this_cpu_read(percpu_swap_cluster.offset[0]);
> -                       if (pcp_si == si && pcp_offset) {
> -                               ci = swap_cluster_lock(si, pcp_offset);
> -                               if (cluster_is_usable(ci, 0))
> -                                       offset = alloc_swap_scan_cluster(si, ci, NULL, pcp_offset);
> -                               else
> -                                       swap_cluster_unlock(ci);
> -                       }
> -                       if (!offset)
> -                               offset = cluster_alloc_swap_entry(si, NULL);
> -                       local_unlock(&percpu_swap_cluster.lock);
> -                       if (offset)
> -                               entry = swp_entry(si->type, offset);
> +               /*
> +                * Try the local cluster first if it matches the device. If
> +                * not, try grab a new cluster and override local cluster.
> +                */
> +               local_lock(&percpu_swap_cluster.lock);
> +               pcp_si = this_cpu_read(percpu_swap_cluster.si[0]);
> +               pcp_offset = this_cpu_read(percpu_swap_cluster.offset[0]);
> +               if (pcp_si == si && pcp_offset) {
> +                       ci = swap_cluster_lock(si, pcp_offset);
> +                       if (cluster_is_usable(ci, 0))
> +                               offset = alloc_swap_scan_cluster(si, ci, NULL, pcp_offset);
> +                       else
> +                               swap_cluster_unlock(ci);
>                 }
> +               if (!offset)
> +                       offset = cluster_alloc_swap_entry(si, NULL);
> +               local_unlock(&percpu_swap_cluster.lock);
> +               if (offset)
> +                       entry = swp_entry(si->type, offset);
>                 put_swap_device(si);
>         }
>  fail:
>
> --
> 2.52.0
>
>

Thanks
Barry

^ permalink raw reply	[flat|nested] 11+ messages in thread
* Re: [PATCH v3 2/3] mm, swap: reduce indention for hibernate allocation helper
  2026-02-15 23:20   ` Barry Song
@ 2026-02-16  6:21     ` Kairui Song
  2026-02-16  7:37       ` Barry Song
  0 siblings, 1 reply; 11+ messages in thread
From: Kairui Song @ 2026-02-16  6:21 UTC (permalink / raw)
To: Barry Song
Cc: kasong, linux-mm, Andrew Morton, Chris Li, Kemeng Shi, Nhat Pham,
    Baoquan He, Carsten Grohmann, Rafael J. Wysocki, linux-kernel,
    open list:SUSPEND TO RAM, Carsten Grohmann

On Mon, Feb 16, 2026 at 07:20:49AM +0800, Barry Song wrote:
> On Mon, Feb 16, 2026 at 3:00 AM Kairui Song via B4 Relay
> <devnull+kasong.tencent.com@kernel.org> wrote:
> >
> > From: Kairui Song <kasong@tencent.com>
> >
> > It doesn't have to check the device flag, as the allocator will also
> > check the device flag and refuse to allocate if the device is not
> > writable. This might cause a trivial waste of CPU cycles of hibernate
> > allocation raced with swapoff, but that is very unlikely to happen.
> > Removing the check on the common path should be more helpful.
> >
> > Signed-off-by: Kairui Song <kasong@tencent.com>
> > ---
> >  mm/swapfile.c | 38 ++++++++++++++++++--------------------
> >  1 file changed, 18 insertions(+), 20 deletions(-)
> >
> > diff --git a/mm/swapfile.c b/mm/swapfile.c
> > index 32e0e7545ab8..0d1b17c99221 100644
> > --- a/mm/swapfile.c
> > +++ b/mm/swapfile.c
> > @@ -1936,27 +1936,25 @@ swp_entry_t swap_alloc_hibernation_slot(int type)
> >
> >         /* This is called for allocating swap entry, not cache */
> >         if (get_swap_device_info(si)) {
>
> I guess we could further reduce indentation by doing:
>         if (!get_swap_device_info(si))
>                 goto fail;
>

Agree, I think we can make it even simpler by having:

	/* Return empty entry if device is not usable (swapoff or full) */
	if (!si || !get_swap_device_info(si))
		return entry;

Then the `fail` label is also gone.

I'll post a v4 later today combined with your other suggestion. Thanks!

^ permalink raw reply	[flat|nested] 11+ messages in thread
* Re: [PATCH v3 2/3] mm, swap: reduce indention for hibernate allocation helper
  2026-02-16  6:21     ` Kairui Song
@ 2026-02-16  7:37       ` Barry Song
  0 siblings, 0 replies; 11+ messages in thread
From: Barry Song @ 2026-02-16  7:37 UTC (permalink / raw)
To: Kairui Song
Cc: kasong, linux-mm, Andrew Morton, Chris Li, Kemeng Shi, Nhat Pham,
    Baoquan He, Carsten Grohmann, Rafael J. Wysocki, linux-kernel,
    open list:SUSPEND TO RAM, Carsten Grohmann

On Mon, Feb 16, 2026 at 2:21 PM Kairui Song <ryncsn@gmail.com> wrote:
>
> On Mon, Feb 16, 2026 at 07:20:49AM +0800, Barry Song wrote:
> > On Mon, Feb 16, 2026 at 3:00 AM Kairui Song via B4 Relay
> > <devnull+kasong.tencent.com@kernel.org> wrote:
> > >
> > > From: Kairui Song <kasong@tencent.com>
> > >
> > > It doesn't have to check the device flag, as the allocator will also
> > > check the device flag and refuse to allocate if the device is not
> > > writable. This might cause a trivial waste of CPU cycles of hibernate
> > > allocation raced with swapoff, but that is very unlikely to happen.
> > > Removing the check on the common path should be more helpful.
> > >
> > > Signed-off-by: Kairui Song <kasong@tencent.com>
> > > ---
> > >  mm/swapfile.c | 38 ++++++++++++++++++--------------------
> > >  1 file changed, 18 insertions(+), 20 deletions(-)
> > >
> > > diff --git a/mm/swapfile.c b/mm/swapfile.c
> > > index 32e0e7545ab8..0d1b17c99221 100644
> > > --- a/mm/swapfile.c
> > > +++ b/mm/swapfile.c
> > > @@ -1936,27 +1936,25 @@ swp_entry_t swap_alloc_hibernation_slot(int type)
> > >
> > >         /* This is called for allocating swap entry, not cache */
> > >         if (get_swap_device_info(si)) {
> >
> > I guess we could further reduce indentation by doing:
> >         if (!get_swap_device_info(si))
> >                 goto fail;
> >
>
> Agree, I think we can make it even simpler by having:
>
>     /* Return empty entry if device is not usable (swapoff or full) */
>     if (!si || !get_swap_device_info(si))
>         return entry;
>
> Then the `fail` label is also gone.

Yes, this looks even nicer to me. :-)

>
> I'll post a v4 later today combined with your another suggestion. Thanks!

^ permalink raw reply	[flat|nested] 11+ messages in thread
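The guard-clause shape agreed on in this subthread can be shown as a
standalone sketch: failing checks return the empty entry up front, so the
allocation body needs no extra indentation and no `fail:` label. All names
below (struct dev, dev_get, dev_alloc) are invented stand-ins, not the
kernel's helpers.

```c
#include <assert.h>
#include <stddef.h>

struct dev { int usable; };

/* Stand-in for a "get a reference if usable" helper. */
static int dev_get(struct dev *d)
{
    return d && d->usable;
}

static long dev_alloc(struct dev *d)
{
    long entry = 0;   /* "empty entry" sentinel */

    /* Return the empty entry if the device is not usable. */
    if (!d || !dev_get(d))
        return entry;

    entry = 42;       /* allocation work, now at top level */
    return entry;
}
```

The early return replaces one level of nesting and the goto target, which is
the whole point of the refactor being discussed.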
* [PATCH v3 3/3] mm, swap: merge common convention and simplify allocation helper
  2026-02-15 19:00 [PATCH v3 0/3] mm/swap: hibernate: improve hibernate performance with new allocator Kairui Song via B4 Relay
  2026-02-15 19:00 ` [PATCH v3 1/3] mm, swap: speed up hibernation allocation and writeout Kairui Song via B4 Relay
  2026-02-15 19:00 ` [PATCH v3 2/3] mm, swap: reduce indention for hibernate allocation helper Kairui Song via B4 Relay
@ 2026-02-15 19:00 ` Kairui Song via B4 Relay
  2026-02-16  7:34   ` Barry Song
  2 siblings, 1 reply; 11+ messages in thread
From: Kairui Song via B4 Relay @ 2026-02-15 19:00 UTC (permalink / raw)
To: linux-mm
Cc: Andrew Morton, Chris Li, Kemeng Shi, Nhat Pham, Baoquan He,
    Barry Song, Carsten Grohmann, Rafael J. Wysocki, linux-kernel,
    open list:SUSPEND TO RAM, Carsten Grohmann, Kairui Song

From: Kairui Song <kasong@tencent.com>

Almost all callers of the cluster scan helper require the same
lock -> usability/emptiness check -> allocate -> unlock routine.
So merge these steps into the helper to simplify the code.

Signed-off-by: Kairui Song <kasong@tencent.com>
---
 mm/swapfile.c | 30 ++++++++----------------------
 1 file changed, 8 insertions(+), 22 deletions(-)

diff --git a/mm/swapfile.c b/mm/swapfile.c
index 0d1b17c99221..68dbbbd0dd24 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -923,11 +923,14 @@ static unsigned int alloc_swap_scan_cluster(struct swap_info_struct *si,
 	bool need_reclaim, ret, usable;
 
 	lockdep_assert_held(&ci->lock);
-	VM_WARN_ON(!cluster_is_usable(ci, order));
 
-	if (end < nr_pages || ci->count + nr_pages > SWAPFILE_CLUSTER)
+	if (!cluster_is_usable(ci, order) || end < nr_pages ||
+	    ci->count + nr_pages > SWAPFILE_CLUSTER)
 		goto out;
 
+	if (cluster_is_empty(ci))
+		offset = cluster_offset(si, ci);
+
 	for (end -= nr_pages; offset <= end; offset += nr_pages) {
 		need_reclaim = false;
 		if (!cluster_scan_range(si, ci, offset, nr_pages, &need_reclaim))
@@ -1060,14 +1063,7 @@ static unsigned long cluster_alloc_swap_entry(struct swap_info_struct *si,
 			goto new_cluster;
 
 		ci = swap_cluster_lock(si, offset);
-		/* Cluster could have been used by another order */
-		if (cluster_is_usable(ci, order)) {
-			if (cluster_is_empty(ci))
-				offset = cluster_offset(si, ci);
-			found = alloc_swap_scan_cluster(si, ci, folio, offset);
-		} else {
-			swap_cluster_unlock(ci);
-		}
+		found = alloc_swap_scan_cluster(si, ci, folio, offset);
 		if (found)
 			goto done;
 	}
@@ -1332,14 +1328,7 @@ static bool swap_alloc_fast(struct folio *folio)
 		return false;
 
 	ci = swap_cluster_lock(si, offset);
-	if (cluster_is_usable(ci, order)) {
-		if (cluster_is_empty(ci))
-			offset = cluster_offset(si, ci);
-		alloc_swap_scan_cluster(si, ci, folio, offset);
-	} else {
-		swap_cluster_unlock(ci);
-	}
-
+	alloc_swap_scan_cluster(si, ci, folio, offset);
 	put_swap_device(si);
 	return folio_test_swapcache(folio);
 }
@@ -1945,10 +1934,7 @@ swp_entry_t swap_alloc_hibernation_slot(int type)
 		pcp_offset = this_cpu_read(percpu_swap_cluster.offset[0]);
 		if (pcp_si == si && pcp_offset) {
 			ci = swap_cluster_lock(si, pcp_offset);
-			if (cluster_is_usable(ci, 0))
-				offset = alloc_swap_scan_cluster(si, ci, NULL, pcp_offset);
-			else
-				swap_cluster_unlock(ci);
+			offset = alloc_swap_scan_cluster(si, ci, NULL, pcp_offset);
 		}
 		if (!offset)
 			offset = cluster_alloc_swap_entry(si, NULL);

-- 
2.52.0

^ permalink raw reply	[flat|nested] 11+ messages in thread
* Re: [PATCH v3 3/3] mm, swap: merge common convention and simplify allocation helper
  2026-02-15 19:00 ` [PATCH v3 3/3] mm, swap: merge common convention and simplify " Kairui Song via B4 Relay
@ 2026-02-16  7:34   ` Barry Song
  2026-02-16  7:53     ` Kairui Song
  0 siblings, 1 reply; 11+ messages in thread
From: Barry Song @ 2026-02-16  7:34 UTC (permalink / raw)
To: kasong
Cc: linux-mm, Andrew Morton, Chris Li, Kemeng Shi, Nhat Pham,
    Baoquan He, Carsten Grohmann, Rafael J. Wysocki, linux-kernel,
    open list:SUSPEND TO RAM, Carsten Grohmann

On Mon, Feb 16, 2026 at 3:00 AM Kairui Song via B4 Relay
<devnull+kasong.tencent.com@kernel.org> wrote:
>
> From: Kairui Song <kasong@tencent.com>
>
> Almost all callers of the cluster scan helper require the: lock -> check
> usefulness/emptiness check -> allocate -> unlock routine. So merge them
> into the same helper to simplify the code.

Previously, when !cluster_is_usable(ci, order), we only called
swap_cluster_unlock(). Now we do more work in this path:

out:
	relocate_cluster(si, ci);
	swap_cluster_unlock(ci);
	if (si->flags & SWP_SOLIDSTATE) {
		this_cpu_write(percpu_swap_cluster.offset[order], next);
		this_cpu_write(percpu_swap_cluster.si[order], si);
	} else {
		si->global_cluster->next[order] = next;
	}
	return found;

I assume this is what you want to do as well, but can we add
some explanation here?

Also, it would be better to add a comment that
alloc_swap_scan_cluster() expects ci->lock to be held on
entry and releases ci->lock before returning.

> Signed-off-by: Kairui Song <kasong@tencent.com>
> ---
>  mm/swapfile.c | 30 ++++++++----------------------
>  1 file changed, 8 insertions(+), 22 deletions(-)
>
> diff --git a/mm/swapfile.c b/mm/swapfile.c
> index 0d1b17c99221..68dbbbd0dd24 100644
> --- a/mm/swapfile.c
> +++ b/mm/swapfile.c
> @@ -923,11 +923,14 @@ static unsigned int alloc_swap_scan_cluster(struct swap_info_struct *si,
>         bool need_reclaim, ret, usable;
>
>         lockdep_assert_held(&ci->lock);
> -       VM_WARN_ON(!cluster_is_usable(ci, order));
>
> -       if (end < nr_pages || ci->count + nr_pages > SWAPFILE_CLUSTER)
> +       if (!cluster_is_usable(ci, order) || end < nr_pages ||
> +           ci->count + nr_pages > SWAPFILE_CLUSTER)
>                 goto out;
>
> +       if (cluster_is_empty(ci))
> +               offset = cluster_offset(si, ci);
> +
>         for (end -= nr_pages; offset <= end; offset += nr_pages) {
>                 need_reclaim = false;
>                 if (!cluster_scan_range(si, ci, offset, nr_pages, &need_reclaim))
> @@ -1060,14 +1063,7 @@ static unsigned long cluster_alloc_swap_entry(struct swap_info_struct *si,
>                         goto new_cluster;
>
>                 ci = swap_cluster_lock(si, offset);
> -               /* Cluster could have been used by another order */
> -               if (cluster_is_usable(ci, order)) {
> -                       if (cluster_is_empty(ci))
> -                               offset = cluster_offset(si, ci);
> -                       found = alloc_swap_scan_cluster(si, ci, folio, offset);
> -               } else {
> -                       swap_cluster_unlock(ci);
> -               }
> +               found = alloc_swap_scan_cluster(si, ci, folio, offset);
>                 if (found)
>                         goto done;
>         }
> @@ -1332,14 +1328,7 @@ static bool swap_alloc_fast(struct folio *folio)
>                 return false;
>
>         ci = swap_cluster_lock(si, offset);
> -       if (cluster_is_usable(ci, order)) {
> -               if (cluster_is_empty(ci))
> -                       offset = cluster_offset(si, ci);
> -               alloc_swap_scan_cluster(si, ci, folio, offset);
> -       } else {
> -               swap_cluster_unlock(ci);
> -       }
> -
> +       alloc_swap_scan_cluster(si, ci, folio, offset);
>         put_swap_device(si);
>         return folio_test_swapcache(folio);
>  }
> @@ -1945,10 +1934,7 @@ swp_entry_t swap_alloc_hibernation_slot(int type)
>                 pcp_offset = this_cpu_read(percpu_swap_cluster.offset[0]);
>                 if (pcp_si == si && pcp_offset) {
>                         ci = swap_cluster_lock(si, pcp_offset);
> -                       if (cluster_is_usable(ci, 0))
> -                               offset = alloc_swap_scan_cluster(si, ci, NULL, pcp_offset);
> -                       else
> -                               swap_cluster_unlock(ci);
> +                       offset = alloc_swap_scan_cluster(si, ci, NULL, pcp_offset);
>                 }
>                 if (!offset)
>                         offset = cluster_alloc_swap_entry(si, NULL);
>
> --
> 2.52.0
>
>

Thanks
Barry

^ permalink raw reply	[flat|nested] 11+ messages in thread
* Re: [PATCH v3 3/3] mm, swap: merge common convention and simplify allocation helper
  2026-02-16  7:34   ` Barry Song
@ 2026-02-16  7:53     ` Kairui Song
  0 siblings, 0 replies; 11+ messages in thread
From: Kairui Song @ 2026-02-16  7:53 UTC (permalink / raw)
To: Barry Song
Cc: kasong, linux-mm, Andrew Morton, Chris Li, Kemeng Shi, Nhat Pham,
    Baoquan He, Carsten Grohmann, Rafael J. Wysocki, linux-kernel,
    open list:SUSPEND TO RAM, Carsten Grohmann

On Mon, Feb 16, 2026 at 03:34:54PM +0800, Barry Song wrote:
> On Mon, Feb 16, 2026 at 3:00 AM Kairui Song via B4 Relay
> <devnull+kasong.tencent.com@kernel.org> wrote:
> >
> > From: Kairui Song <kasong@tencent.com>
> >
> > Almost all callers of the cluster scan helper require the: lock -> check
> > usefulness/emptiness check -> allocate -> unlock routine. So merge them
> > into the same helper to simplify the code.
>
> Previously, when !cluster_is_usable(ci, order), we only called
> swap_cluster_unlock(). Now we do more work in this path:
>
> out:
> 	relocate_cluster(si, ci);
> 	swap_cluster_unlock(ci);
> 	if (si->flags & SWP_SOLIDSTATE) {
> 		this_cpu_write(percpu_swap_cluster.offset[order], next);
> 		this_cpu_write(percpu_swap_cluster.si[order], si);
> 	} else {
> 		si->global_cluster->next[order] = next;
> 	}
> 	return found;
>
> I assume this is what you want to do as well, but can we add
> some explanation here?

Yes, that's fine. alloc_swap_scan_cluster() is supposed to update the
percpu offset cache, so if the cluster is not usable, writing
SWAP_ENTRY_INVALID to invalidate the cache might even be helpful for a
future scan. At least it's not harmful. I'll add some explanation in
the comments.

> Also, it would be better to add a comment that
> alloc_swap_scan_cluster() expects ci->lock to be held on
> entry and releases ci->lock before returning.

Thanks for the suggestion. I even thought about renaming the helper to
indicate that it will try to update the percpu offset and release the
lock, but I didn't have a better name, and we also have
alloc_swap_scan_list, so leaving the name untouched seems more
consistent. I'll just add a comment then.

^ permalink raw reply	[flat|nested] 11+ messages in thread
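The "helper consumes the lock" convention requested in this exchange can be
modeled as a standalone sketch: the caller locks, the helper validates, does
the work, and always releases the lock before returning, so every caller
shares one lock/check/alloc/unlock routine. The cluster struct and helpers
below are invented for illustration, not the kernel's types.

```c
#include <assert.h>

struct cluster {
    int locked;
    int usable;
    int count;
};

static void cluster_lock(struct cluster *c)   { c->locked = 1; }
static void cluster_unlock(struct cluster *c) { c->locked = 0; }

/* Expects c->locked on entry; releases the lock on every path, and
 * folds the usability check into the helper itself. */
static int scan_cluster(struct cluster *c)
{
    int found = 0;

    assert(c->locked);        /* caller must hold the lock */
    if (c->usable)
        found = ++c->count;   /* "allocate" one slot */
    cluster_unlock(c);        /* unlocked before returning */
    return found;
}
```

Centralizing the unlock in the helper is what lets the three call sites in
the patch drop their duplicated check/unlock branches.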
end of thread, other threads:[~2026-02-16  7:54 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2026-02-15 19:00 [PATCH v3 0/3] mm/swap: hibernate: improve hibernate performance with new allocator Kairui Song via B4 Relay
2026-02-15 19:00 ` [PATCH v3 1/3] mm, swap: speed up hibernation allocation and writeout Kairui Song via B4 Relay
2026-02-15 20:43   ` Barry Song
2026-02-16  6:06     ` Kairui Song
2026-02-15 19:00 ` [PATCH v3 2/3] mm, swap: reduce indention for hibernate allocation helper Kairui Song via B4 Relay
2026-02-15 23:20   ` Barry Song
2026-02-16  6:21     ` Kairui Song
2026-02-16  7:37       ` Barry Song
2026-02-15 19:00 ` [PATCH v3 3/3] mm, swap: merge common convention and simplify " Kairui Song via B4 Relay
2026-02-16  7:34   ` Barry Song
2026-02-16  7:53     ` Kairui Song