* [PATCH] PM: hibernate: preserve uswsusp swap pin across SNAPSHOT_SET_SWAP_AREA re-set failures
@ 2026-04-14 14:32 DaeMyung Kang
2026-04-14 16:18 ` YoungJun Park
2026-04-14 16:49 ` [PATCH v2] PM: hibernate: keep existing uswsusp swap pin if re-selection fails DaeMyung Kang
0 siblings, 2 replies; 3+ messages in thread
From: DaeMyung Kang @ 2026-04-14 14:32 UTC (permalink / raw)
To: Andrew Morton, Rafael J . Wysocki
Cc: Youngjun Park, Kairui Song, Chris Li, Kemeng Shi, Nhat Pham,
Baoquan He, Barry Song, Len Brown, Pavel Machek, linux-mm,
linux-pm, linux-kernel, DaeMyung Kang, stable
Commit 5b2b0c6e4577 ("mm/swap, PM: hibernate: fix swapoff race in uswsusp
by pinning swap device") introduced SWP_HIBERNATION so that the swap
device chosen via /dev/snapshot is held against swapoff for the entire
uswsusp session. The intended invariant is: from the first successful
SNAPSHOT_SET_SWAP_AREA until the /dev/snapshot fd is closed, exactly one
swap device is pinned.
snapshot_set_swap_area() breaks that invariant on the re-set path:
unpin_hibernation_swap_type(data->swap);
data->swap = pin_hibernation_swap_type(swdev, offset);
if (data->swap < 0)
return swdev ? -ENODEV : -EINVAL;
The unpin happens unconditionally before the new pin is attempted. If
the new pin fails (e.g. user space supplies an offset/device that is not
an active swap area), the session continues with no swap device pinned,
reopening exactly the swapoff race the original commit was meant to
close. A subsequent swapoff on the previously selected device now
succeeds where it would have been blocked with EBUSY.
As a secondary consequence, data->swap is overwritten with the negative
error return from pin_hibernation_swap_type(). The value is harmless at
close time (swap_type_to_info() on the invalid type returns NULL, so the
release-side unpin is a no-op and there is no pin to leak), but leaving
a negative sentinel in data->swap for the rest of the session is still
a state-hygiene defect: any future reader of data->swap cannot
distinguish it from a never-set session.
The bug is observable with ioctls alone; it does not require an actual
hibernation cycle. A user-space caller that supplies one valid and then
one invalid resume_swap_area is enough to strand the session without a
pin.
Reordering pin/unpin in the caller cannot fix this cleanly. Each of
pin_hibernation_swap_type() / unpin_hibernation_swap_type() acquires
swap_lock independently, so any two-call sequence leaves a window in
which swapoff can observe an inconsistent pin state. The same-area
re-set case (type == old_type) also cannot be expressed with pin+unpin
without either toggling the bit (racy) or returning EBUSY (a false
error).
Introduce repin_hibernation_swap_type(), which performs the transition
atomically under a single swap_lock acquisition:
- verify that old_type, if held, still carries SWP_HIBERNATION;
- look up the new swap area;
- if it is the same as old_type, return without touching any flags;
- otherwise clear SWP_HIBERNATION on the old si and set it on the
new si within the same critical section;
- on any failure, return without modifying either si's flags, so the
previous pin is preserved.
Update snapshot_set_swap_area() to use the new helper and to stage the
result in a local variable, committing to data->swap only on success.
This closes the protection-loss window and also avoids the data->swap
corruption on failure.
Fixes: 5b2b0c6e4577 ("mm/swap, PM: hibernate: fix swapoff race in uswsusp by pinning swap device")
Cc: stable@vger.kernel.org
Signed-off-by: DaeMyung Kang <charsyam@gmail.com>
---
Notes (not part of the commit, stripped by git am):
Baseline
--------
This patch is generated against linux-next at commit 5b2b0c6e4577
("mm/swap, PM: hibernate: fix swapoff race in uswsusp by pinning swap
device"). Mainline does not yet carry that commit, and neither the
helpers it introduces (pin/unpin_hibernation_swap_type) nor the code
site this patch modifies exist there. The base-commit trailer at the
bottom of the mbox records the exact commit.
Testing
-------
The bug does not require an actual hibernation cycle. The ioctl path
alone is enough to re-open the swapoff race. A targeted reproducer is
included below; run it as root in a throwaway VM with two active swap
block devices and one non-swap block device (three arguments).
Run inside a VM on linux-next at 5b2b0c6e4577 with this patch applied:
step1: pinned active swap /dev/vda
step2: swapoff blocked with EBUSY while pin is held
step3: repinned active swap to /dev/vdb
step4: swapoff(/dev/vda) succeeded after repinning away
step5: repinned swap is blocked with EBUSY
step6: bogus SNAPSHOT_SET_SWAP_AREA failed as expected: No such device
step7: swapoff(/dev/vdb) is still blocked with EBUSY
result: FIXED kernel, hibernation pin was preserved
step8: swapoff succeeded after closing /dev/snapshot
Run on the same tree without this patch applied: step7 instead reports
"swapoff(/dev/vdb) succeeded after failed re-set" and the program exits
with status 1 ("BUGGY kernel, hibernation pin was dropped").
What the reproducer covers:
- SWP_HIBERNATION is actually enforced against swapoff (step2, step5);
- the success path of repin_hibernation_swap_type() atomically moves
the pin from one active swap to another (step3, step4, step5);
- the failure path of repin_hibernation_swap_type() preserves the
existing pin (step6, step7);
- the pin lifetime ends on /dev/snapshot close (step8).
What it does not cover:
- snapshot_open(O_RDONLY) initial resume-device pin path;
- the full suspend-to-disk image create/restore flow;
- concurrent swapoff racing against SNAPSHOT_SET_SWAP_AREA;
- the type == old_type idempotent branch (not externally observable).
A normal sysfs-based suspend-to-disk cycle continues to work; the
find_hibernation_swap_type() path is unchanged. Build tested with
allmodconfig and run-tested with CONFIG_PROVE_LOCKING=y and
CONFIG_KASAN=y. The VM was booted with oops=panic panic=-1 so any
WARN/Oops/BUG would have halted the run; the full test completed
cleanly with no kernel log diagnostics, including the three
WARN_ON_ONCE() invariant checks inside repin_hibernation_swap_type().
Reproducer (C source, for reference only -- not added to the tree):
// SPDX-License-Identifier: GPL-2.0
/*
* Reproduce the uswsusp SNAPSHOT_SET_SWAP_AREA pin lifetime regression.
*
* This targets the bug introduced after hibernation swap pinning was added:
* a failed SNAPSHOT_SET_SWAP_AREA() could drop the existing pin, letting a
* subsequent swapoff() succeed while /dev/snapshot was still open.
*
* Run only inside a throwaway VM. The test manipulates swap state and leaves
* the target swap area disabled on success.
*/
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <linux/types.h>
#include <linux/suspend_ioctls.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/stat.h>
#include <sys/swap.h>
#include <sys/sysmacros.h>
#include <unistd.h>
static void print_usage(const char *prog)
{
fprintf(stderr,
"usage: %s <active-swap-dev-1> <active-swap-dev-2> <bogus-block-dev>\n"
" <active-swap-dev-1> must be an active swap block device.\n"
" <active-swap-dev-2> must be a second active swap block device.\n"
" <bogus-block-dev> must be a block device that is not a swap area.\n",
prog);
}
static int encode_dev(dev_t dev)
{
unsigned int major_num = major(dev);
unsigned int minor_num = minor(dev);
/*
* Match the kernel's new_encode_dev() layout; SNAPSHOT_SET_SWAP_AREA
* decodes this with new_decode_dev() on the kernel side.
*/
return (major_num & 0xfff) << 8 |
(minor_num & 0xff) |
((minor_num & ~0xff) << 12);
}
static int get_block_dev(const char *path, dev_t *dev)
{
struct stat st;
if (stat(path, &st) < 0) {
fprintf(stderr, "stat(%s): %s\n", path, strerror(errno));
return -errno;
}
if (!S_ISBLK(st.st_mode)) {
fprintf(stderr, "%s is not a block device\n", path);
return -EINVAL;
}
*dev = st.st_rdev;
return 0;
}
static int snapshot_set_swap_area(int fd, dev_t dev, long long offset)
{
struct resume_swap_area area = {
.offset = offset,
.dev = encode_dev(dev),
};
if (ioctl(fd, SNAPSHOT_SET_SWAP_AREA, &area) < 0)
return -errno;
return 0;
}
int main(int argc, char **argv)
{
const char *swap_path_1, *swap_path_2, *bogus_path;
dev_t swap_dev_1, swap_dev_2, bogus_dev;
int fd, ret;
bool buggy = false;
if (argc != 4) {
print_usage(argv[0]);
return 2;
}
if (geteuid() != 0) {
fprintf(stderr, "must run as root\n");
return 2;
}
swap_path_1 = argv[1];
swap_path_2 = argv[2];
bogus_path = argv[3];
ret = get_block_dev(swap_path_1, &swap_dev_1);
if (ret < 0)
return 2;
ret = get_block_dev(swap_path_2, &swap_dev_2);
if (ret < 0)
return 2;
ret = get_block_dev(bogus_path, &bogus_dev);
if (ret < 0)
return 2;
fd = open("/dev/snapshot", O_WRONLY);
if (fd < 0) {
fprintf(stderr, "open(/dev/snapshot): %s\n", strerror(errno));
return 2;
}
ret = snapshot_set_swap_area(fd, swap_dev_1, 0);
if (ret < 0) {
fprintf(stderr, "step1: valid SNAPSHOT_SET_SWAP_AREA failed: %s\n",
strerror(-ret));
close(fd);
return 2;
}
printf("step1: pinned active swap %s\n", swap_path_1);
if (swapoff(swap_path_1) == 0) {
fprintf(stderr,
"step2: swapoff(%s) unexpectedly succeeded while pinned\n",
swap_path_1);
close(fd);
return 1;
}
if (errno != EBUSY) {
fprintf(stderr,
"step2: swapoff(%s) failed with %s, expected EBUSY\n",
swap_path_1, strerror(errno));
close(fd);
return 2;
}
printf("step2: swapoff blocked with EBUSY while pin is held\n");
ret = snapshot_set_swap_area(fd, swap_dev_2, 0);
if (ret < 0) {
fprintf(stderr,
"step3: second valid SNAPSHOT_SET_SWAP_AREA failed: %s\n",
strerror(-ret));
close(fd);
return 2;
}
printf("step3: repinned active swap to %s\n", swap_path_2);
if (swapoff(swap_path_1) < 0) {
fprintf(stderr,
"step4: swapoff(%s) failed after repin: %s\n",
swap_path_1, strerror(errno));
close(fd);
return 2;
}
printf("step4: swapoff(%s) succeeded after repinning away\n",
swap_path_1);
if (swapoff(swap_path_2) == 0) {
fprintf(stderr,
"step5: swapoff(%s) unexpectedly succeeded while pinned\n",
swap_path_2);
close(fd);
return 1;
}
if (errno != EBUSY) {
fprintf(stderr,
"step5: swapoff(%s) failed with %s, expected EBUSY\n",
swap_path_2, strerror(errno));
close(fd);
return 2;
}
printf("step5: repinned swap is blocked with EBUSY\n");
ret = snapshot_set_swap_area(fd, bogus_dev, 0);
if (!ret) {
fprintf(stderr,
"step6: bogus SNAPSHOT_SET_SWAP_AREA unexpectedly succeeded\n");
close(fd);
return 2;
}
printf("step6: bogus SNAPSHOT_SET_SWAP_AREA failed as expected: %s\n",
strerror(-ret));
if (swapoff(swap_path_2) == 0) {
printf("step7: swapoff(%s) succeeded after failed re-set\n",
swap_path_2);
printf("result: BUGGY kernel, hibernation pin was dropped\n");
buggy = true;
} else if (errno == EBUSY) {
printf("step7: swapoff(%s) is still blocked with EBUSY\n",
swap_path_2);
printf("result: FIXED kernel, hibernation pin was preserved\n");
} else {
fprintf(stderr, "step7: unexpected swapoff(%s) error: %s\n",
swap_path_2, strerror(errno));
close(fd);
return 2;
}
close(fd);
if (!buggy) {
if (swapoff(swap_path_2) < 0) {
fprintf(stderr,
"step8: swapoff(%s) after close failed: %s\n",
swap_path_2, strerror(errno));
return 2;
}
printf("step8: swapoff succeeded after closing /dev/snapshot\n");
}
printf("note: re-enable swap with `swapon %s` and `swapon %s`\n",
swap_path_1, swap_path_2);
return buggy ? 1 : 0;
}
include/linux/swap.h | 1 +
kernel/power/user.c | 12 +++------
mm/swapfile.c | 61 ++++++++++++++++++++++++++++++++++++++++++++
3 files changed, 66 insertions(+), 8 deletions(-)
diff --git a/include/linux/swap.h b/include/linux/swap.h
index 1930f81e6be4..720347ae8ce1 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -435,6 +435,7 @@ static inline long get_nr_swap_pages(void)
extern void si_swapinfo(struct sysinfo *);
extern int pin_hibernation_swap_type(dev_t device, sector_t offset);
+extern int repin_hibernation_swap_type(int old_type, dev_t device, sector_t offset);
extern void unpin_hibernation_swap_type(int type);
extern int find_hibernation_swap_type(dev_t device, sector_t offset);
int find_first_swap(dev_t *device);
diff --git a/kernel/power/user.c b/kernel/power/user.c
index 4406f5644a56..869371ad4a5f 100644
--- a/kernel/power/user.c
+++ b/kernel/power/user.c
@@ -218,6 +218,7 @@ static int snapshot_set_swap_area(struct snapshot_data *data,
{
sector_t offset;
dev_t swdev;
+ int swap;
if (swsusp_swap_in_use())
return -EPERM;
@@ -238,19 +239,14 @@ static int snapshot_set_swap_area(struct snapshot_data *data,
offset = swap_area.offset;
}
- /*
- * Unpin the swap device if a swap area was already
- * set by SNAPSHOT_SET_SWAP_AREA.
- */
- unpin_hibernation_swap_type(data->swap);
-
/*
* User space encodes device types as two-byte values,
* so we need to recode them
*/
- data->swap = pin_hibernation_swap_type(swdev, offset);
- if (data->swap < 0)
+ swap = repin_hibernation_swap_type(data->swap, swdev, offset);
+ if (swap < 0)
return swdev ? -ENODEV : -EINVAL;
+ data->swap = swap;
data->dev = swdev;
return 0;
}
diff --git a/mm/swapfile.c b/mm/swapfile.c
index c5b459a18f43..4d3b41125e6a 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -2215,6 +2215,67 @@ int pin_hibernation_swap_type(dev_t device, sector_t offset)
return type;
}
+/**
+ * repin_hibernation_swap_type - Retarget a hibernation pin without dropping it
+ * @old_type: Currently pinned swap type, or a negative value if none is pinned
+ * @device: Block device containing the resume image
+ * @offset: Offset identifying the swap area
+ *
+ * Locate the swap device for @device/@offset and make it the hibernation-pinned
+ * device. If @old_type already refers to the same swap area, the existing pin
+ * is kept. On failure, the previous pin is preserved.
+ *
+ * Return:
+ * >= 0 on success (new swap type).
+ * -EINVAL if @device is invalid.
+ * -ENODEV if the swap device is not found.
+ * -EBUSY if another device is already pinned for hibernation.
+ */
+int repin_hibernation_swap_type(int old_type, dev_t device, sector_t offset)
+{
+ int type;
+ struct swap_info_struct *old_si = NULL, *new_si;
+
+ spin_lock(&swap_lock);
+
+ if (old_type >= 0) {
+ old_si = swap_type_to_info(old_type);
+ if (WARN_ON_ONCE(!old_si || !(old_si->flags & SWP_HIBERNATION))) {
+ spin_unlock(&swap_lock);
+ return -EINVAL;
+ }
+ }
+
+ type = __find_hibernation_swap_type(device, offset);
+ if (type < 0) {
+ spin_unlock(&swap_lock);
+ return type;
+ }
+
+ if (type == old_type) {
+ spin_unlock(&swap_lock);
+ return type;
+ }
+
+ new_si = swap_type_to_info(type);
+ if (WARN_ON_ONCE(!new_si)) {
+ spin_unlock(&swap_lock);
+ return -ENODEV;
+ }
+
+ if (WARN_ON_ONCE(new_si->flags & SWP_HIBERNATION)) {
+ spin_unlock(&swap_lock);
+ return -EBUSY;
+ }
+
+ if (old_si)
+ old_si->flags &= ~SWP_HIBERNATION;
+ new_si->flags |= SWP_HIBERNATION;
+
+ spin_unlock(&swap_lock);
+ return type;
+}
+
/**
* unpin_hibernation_swap_type - Unpin the swap device for hibernation
* @type: Swap type previously returned by pin_hibernation_swap_type()
base-commit: 5b2b0c6e457765adbe96fb2d464ff1bcd3d72158
--
2.43.0
^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: [PATCH] PM: hibernate: preserve uswsusp swap pin across SNAPSHOT_SET_SWAP_AREA re-set failures
2026-04-14 14:32 [PATCH] PM: hibernate: preserve uswsusp swap pin across SNAPSHOT_SET_SWAP_AREA re-set failures DaeMyung Kang
@ 2026-04-14 16:18 ` YoungJun Park
2026-04-14 16:49 ` [PATCH v2] PM: hibernate: keep existing uswsusp swap pin if re-selection fails DaeMyung Kang
1 sibling, 0 replies; 3+ messages in thread
From: YoungJun Park @ 2026-04-14 16:18 UTC (permalink / raw)
To: DaeMyung Kang
Cc: Andrew Morton, Rafael J . Wysocki, Kairui Song, Chris Li,
Kemeng Shi, Nhat Pham, Baoquan He, Barry Song, Len Brown,
Pavel Machek, linux-mm, linux-pm, linux-kernel, stable
On Tue, Apr 14, 2026 at 11:32:00PM +0900, DaeMyung Kang wrote:
Hi Daemyung :)
> Commit 5b2b0c6e4577 ("mm/swap, PM: hibernate: fix swapoff
> race in uswsusp by pinning swap device") introduced
> SWP_HIBERNATION so that the swap device chosen via
> /dev/snapshot is held against swapoff for the entire uswsusp
> session. The intended invariant is: from the first successful
> SNAPSHOT_SET_SWAP_AREA until the /dev/snapshot fd is closed,
> exactly one swap device is pinned.
>
> snapshot_set_swap_area() breaks that invariant on the re-set
> path:
>
> unpin_hibernation_swap_type(data->swap);
> data->swap = pin_hibernation_swap_type(swdev, offset);
> if (data->swap < 0)
> return swdev ? -ENODEV : -EINVAL;
>
> The unpin happens unconditionally before the new pin is
> attempted. If the new pin fails (e.g. user space supplies an
> offset/device that is not an active swap area), the session
> continues with no swap device pinned, reopening exactly the
> swapoff race the original commit was meant to close. A
> subsequent swapoff on the previously selected device now
> succeeds where it would have been blocked with EBUSY.
Hmm.. This was actually intentional. The API semantic
is that a second SNAPSHOT_SET_SWAP_AREA abandons the
previous pin. If the new pin fails, the ioctl returns
an error and userspace is responsible for aborting the
session -- proceeding without a pinned device makes no
sense.
The only benefit of preserving the old pin on failure
would be protecting against userspace that ignores the
error. But even in that case, the session has no valid
swap target, so the hibernation image write itself
would fail before swapoff becomes a concern. I think
this is an edge case rather than a bug.
IOW, Looking at your test case, I think this part is
userspace's responsibility:
> ret = snapshot_set_swap_area(fd, bogus_dev, 0);
> if (!ret) {
> fprintf(stderr,
> "step6: bogus SNAPSHOT_SET_SWAP_AREA unexpectedly succeeded\n");
> close(fd);
> return 2;
> }
The ioctl has already returned an error here. Userspace
should close /dev/snapshot and stop, not continue and
expect the old pin to still be in place.
(BTW, the tests are well written and easy to follow.
Thanks!)
For this patch to have real value, there should be
something that concretely breaks after the swapoff
succeeds. But since the session has no valid swap
target at that point, is there any actual broken
behavior that follows?
> if (!buggy) {
> if (swapoff(swap_path_2) < 0) {
> fprintf(stderr,
> "step8: swapoff(%s) after close failed: %s\n",
> swap_path_2, strerror(errno));
> return 2;
> }
> printf("step8: swapoff succeeded after closing /dev/snapshot\n");
> }
If you still see concrete value, I would be happy to
take this as an improvement (without Fixes: and
Cc: stable) -- see my suggestion below for a lighter
approach.
> Reordering pin/unpin in the caller cannot fix this
> cleanly. Each of pin_hibernation_swap_type() /
> unpin_hibernation_swap_type() acquires swap_lock
> independently, so any two-call sequence leaves a window
> in which swapoff can observe an inconsistent pin state.
> The same-area re-set case (type == old_type) also cannot
> be expressed with pin+unpin without either toggling the
> bit (racy) or returning EBUSY (a false error).
>
> Introduce repin_hibernation_swap_type(), which performs
> the transition atomically under a single swap_lock
> acquisition:
I understand the intent. If you still see enough value
in preserving the pin on failure, I would suggest a
lighter approach -- see below.
> - unpin_hibernation_swap_type(data->swap);
> -
> - data->swap = pin_hibernation_swap_type(swdev, offset);
> - if (data->swap < 0)
> + swap = repin_hibernation_swap_type(data->swap, swdev,
> + offset);
> + if (swap < 0)
> return swdev ? -ENODEV : -EINVAL;
> + data->swap = swap;
Would it be simpler to use find_hibernation_swap_type()
to look up the new type first, and if it differs from
data->swap, call pin_hibernation_swap_type() on the new
one? If pin succeeds, unpin the old one. If it returns
-EBUSY, just keep the existing pin.
If swapoff sneaks in between find and pin, pin will
simply fail -- I don't think the kernel needs to
guarantee atomicity for that window. This does acquire
swap_lock multiple times, but SNAPSHOT_SET_SWAP_AREA is
an extremely rare operation, so the extra lock
round-trips should be negligible.
Reusing the existing helpers seems preferable to adding
a new function with this much code for a single call
site.
How do you think?
Thanks again!
Youngjun Park
^ permalink raw reply [flat|nested] 3+ messages in thread
* [PATCH v2] PM: hibernate: keep existing uswsusp swap pin if re-selection fails
2026-04-14 14:32 [PATCH] PM: hibernate: preserve uswsusp swap pin across SNAPSHOT_SET_SWAP_AREA re-set failures DaeMyung Kang
2026-04-14 16:18 ` YoungJun Park
@ 2026-04-14 16:49 ` DaeMyung Kang
1 sibling, 0 replies; 3+ messages in thread
From: DaeMyung Kang @ 2026-04-14 16:49 UTC (permalink / raw)
To: Andrew Morton, Rafael J . Wysocki
Cc: Youngjun Park, Kairui Song, Chris Li, Kemeng Shi, Nhat Pham,
Baoquan He, Barry Song, Len Brown, Pavel Machek, linux-mm,
linux-pm, linux-kernel, DaeMyung Kang
Commit 5b2b0c6e4577 ("mm/swap, PM: hibernate: fix swapoff race in
uswsusp by pinning swap device") introduced SWP_HIBERNATION so that
the swap area selected through /dev/snapshot remains protected against
swapoff() for the lifetime of the uswsusp session.
When user space issues SNAPSHOT_SET_SWAP_AREA again,
snapshot_set_swap_area() currently drops the old pin before attempting
to pin the new swap area. If the new selection fails, the ioctl
returns an error and user space is expected to abort the session.
However, preserving the existing pin in that case makes the kernel
side more robust against a failed re-selection, while keeping the
existing userspace-visible behavior unchanged.
Implement this with the existing swap helpers:
- look up the requested swap area first
- treat re-selecting the already pinned area as a no-op
- pin the new area before unpinning the old one
- leave the existing pin in place if the new pin attempt fails
This keeps the hibernation session protected against swapoff() until
/dev/snapshot is closed, even after a failed attempt to switch to a
different swap area.
Suggested-by: Youngjun Park <youngjun.park@lge.com>
Signed-off-by: DaeMyung Kang <charsyam@gmail.com>
---
Notes (not part of the commit, stripped by git am):
Changes in v2:
- Drop Fixes: and Cc: stable; reframe as a hardening improvement
rather than a regression fix, per Youngjun's feedback that the
current behavior is intentional and there is no concrete
user-observable harm.
- Drop the new repin_hibernation_swap_type() helper. Rework
snapshot_set_swap_area() in place using the existing find / pin /
unpin helpers as Youngjun suggested; the change now touches only
kernel/power/user.c and adds no new API.
- Update the subject and commit log accordingly.
- Add Suggested-by: trailer.
v1: https://lore.kernel.org/lkml/20260414143200.1267932-1-charsyam@gmail.com/
Baseline
--------
This patch is generated against linux-next at commit 5b2b0c6e4577
("mm/swap, PM: hibernate: fix swapoff race in uswsusp by pinning swap
device"). Mainline does not yet carry that commit, and neither the
helpers used here (find/pin/unpin_hibernation_swap_type) nor the code
site this patch modifies exist there. The base-commit trailer at the
bottom of the mbox records the exact commit.
Testing
-------
The behavior change can be exercised entirely through the
/dev/snapshot ioctl path; no actual hibernation cycle is required.
A targeted assertion test is below; run it as root in a throwaway VM
with two active swap block devices and one non-swap block device
(three arguments).
Run inside a VM on linux-next at 5b2b0c6e4577 with this patch applied:
step1: pinned active swap /dev/vda
step2: swapoff blocked with EBUSY while pin is held
step3: repinned active swap to /dev/vdb
step4: swapoff(/dev/vda) succeeded after repinning away
step5: repinned swap is blocked with EBUSY
step6: bogus SNAPSHOT_SET_SWAP_AREA failed as expected: No such device
step7: swapoff(/dev/vdb) is still blocked with EBUSY
result: pin preserved across failed re-set (hardened behavior)
step8: swapoff succeeded after closing /dev/snapshot
Without the patch, step7 instead reports
swapoff(/dev/vdb) succeeded after failed re-set
because the old pin had been released before the failed pin attempt.
What the assertion test covers:
- SWP_HIBERNATION is enforced against swapoff (step2, step5);
- the success path moves the pin from one active swap to another
(step3, step4, step5);
- a failed re-selection preserves the existing pin (step6, step7);
- the pin lifetime ends on /dev/snapshot close (step8).
What it does not cover:
- the snapshot_open(O_RDONLY) initial resume-device pin path;
- the full suspend-to-disk image create/restore flow;
- concurrent swapoff racing against SNAPSHOT_SET_SWAP_AREA;
- the type == data->swap idempotent branch (not externally
observable since it intentionally skips the bit toggle).
A normal sysfs-based suspend-to-disk cycle continues to work; the
find_hibernation_swap_type() / pin / unpin paths themselves are
unchanged. Build tested with allmodconfig and run-tested with
CONFIG_PROVE_LOCKING=y and CONFIG_KASAN=y. The VM was booted with
oops=panic panic=-1 so any WARN/Oops/BUG would have halted the run;
the full test completed cleanly with no kernel log diagnostics.
Reproducer (C source, for reference only -- not added to the tree):
// SPDX-License-Identifier: GPL-2.0
/*
* Reproduce / verify the SNAPSHOT_SET_SWAP_AREA pin-lifetime behavior.
*
* Run only inside a throwaway VM. The test manipulates swap state and
* leaves the target swap area disabled on success.
*
* Usage:
* ./uswsusp_swapoff_repro <active-swap-1> <active-swap-2> <bogus-blk>
*
* Exit codes:
* 0 = expected (hardened) behavior: pin preserved across failed re-set
* 1 = old behavior: pin dropped on failed re-set
* 2 = setup error / inconclusive
*/
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <linux/types.h>
#include <linux/suspend_ioctls.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/stat.h>
#include <sys/swap.h>
#include <sys/sysmacros.h>
#include <unistd.h>
static int encode_dev(dev_t dev)
{
unsigned int major_num = major(dev);
unsigned int minor_num = minor(dev);
/* Match new_encode_dev() / new_decode_dev() in the kernel. */
return (major_num & 0xfff) << 8 |
(minor_num & 0xff) |
((minor_num & ~0xff) << 12);
}
static int get_block_dev(const char *path, dev_t *dev)
{
struct stat st;
if (stat(path, &st) < 0) {
fprintf(stderr, "stat(%s): %s\n", path, strerror(errno));
return -errno;
}
if (!S_ISBLK(st.st_mode)) {
fprintf(stderr, "%s is not a block device\n", path);
return -EINVAL;
}
*dev = st.st_rdev;
return 0;
}
static int snapshot_set_swap_area(int fd, dev_t dev, long long offset)
{
struct resume_swap_area area = {
.offset = offset,
.dev = encode_dev(dev),
};
if (ioctl(fd, SNAPSHOT_SET_SWAP_AREA, &area) < 0)
return -errno;
return 0;
}
int main(int argc, char **argv)
{
const char *p1, *p2, *pb;
dev_t d1, d2, db;
int fd, ret;
bool buggy = false;
if (argc != 4) {
fprintf(stderr,
"usage: %s <swap1> <swap2> <bogus>\n", argv[0]);
return 2;
}
if (geteuid() != 0) {
fprintf(stderr, "must run as root\n");
return 2;
}
p1 = argv[1]; p2 = argv[2]; pb = argv[3];
if (get_block_dev(p1, &d1) < 0 ||
get_block_dev(p2, &d2) < 0 ||
get_block_dev(pb, &db) < 0)
return 2;
fd = open("/dev/snapshot", O_WRONLY);
if (fd < 0) {
fprintf(stderr, "open(/dev/snapshot): %s\n", strerror(errno));
return 2;
}
ret = snapshot_set_swap_area(fd, d1, 0);
if (ret < 0) { fprintf(stderr, "step1: %s\n", strerror(-ret)); goto setup_err; }
printf("step1: pinned active swap %s\n", p1);
if (swapoff(p1) == 0) {
fprintf(stderr, "step2: swapoff unexpectedly succeeded\n");
close(fd); return 1;
}
if (errno != EBUSY) {
fprintf(stderr, "step2: expected EBUSY, got %s\n", strerror(errno));
goto setup_err;
}
printf("step2: swapoff blocked with EBUSY while pin is held\n");
ret = snapshot_set_swap_area(fd, d2, 0);
if (ret < 0) { fprintf(stderr, "step3: %s\n", strerror(-ret)); goto setup_err; }
printf("step3: repinned active swap to %s\n", p2);
if (swapoff(p1) < 0) {
fprintf(stderr, "step4: swapoff(%s): %s\n", p1, strerror(errno));
goto setup_err;
}
printf("step4: swapoff(%s) succeeded after repinning away\n", p1);
if (swapoff(p2) == 0) {
fprintf(stderr, "step5: swapoff unexpectedly succeeded\n");
close(fd); return 1;
}
if (errno != EBUSY) {
fprintf(stderr, "step5: expected EBUSY, got %s\n", strerror(errno));
goto setup_err;
}
printf("step5: repinned swap is blocked with EBUSY\n");
ret = snapshot_set_swap_area(fd, db, 0);
if (!ret) {
fprintf(stderr, "step6: bogus unexpectedly succeeded\n");
goto setup_err;
}
printf("step6: bogus SNAPSHOT_SET_SWAP_AREA failed as expected: %s\n",
strerror(-ret));
if (swapoff(p2) == 0) {
printf("step7: swapoff(%s) succeeded after failed re-set\n", p2);
printf("result: pin was dropped on failure (old behavior)\n");
buggy = true;
} else if (errno == EBUSY) {
printf("step7: swapoff(%s) is still blocked with EBUSY\n", p2);
printf("result: pin preserved across failed re-set (hardened behavior)\n");
} else {
fprintf(stderr, "step7: unexpected: %s\n", strerror(errno));
goto setup_err;
}
close(fd);
if (!buggy) {
if (swapoff(p2) < 0) {
fprintf(stderr, "step8: swapoff(%s): %s\n", p2, strerror(errno));
return 2;
}
printf("step8: swapoff succeeded after closing /dev/snapshot\n");
}
printf("note: re-enable with `swapon %s` and `swapon %s`\n", p1, p2);
return buggy ? 1 : 0;
setup_err:
close(fd);
return 2;
}
kernel/power/user.c | 35 ++++++++++++++++++++++++++---------
1 file changed, 26 insertions(+), 9 deletions(-)
diff --git a/kernel/power/user.c b/kernel/power/user.c
index 4406f5644a56..e1ab85db2e95 100644
--- a/kernel/power/user.c
+++ b/kernel/power/user.c
@@ -218,6 +218,7 @@ static int snapshot_set_swap_area(struct snapshot_data *data,
{
sector_t offset;
dev_t swdev;
+ int type, swap;
if (swsusp_swap_in_use())
return -EPERM;
@@ -239,18 +240,34 @@ static int snapshot_set_swap_area(struct snapshot_data *data,
}
/*
- * Unpin the swap device if a swap area was already
- * set by SNAPSHOT_SET_SWAP_AREA.
+ * User space encodes device types as two-byte values, so we need to
+ * recode them.
*/
- unpin_hibernation_swap_type(data->swap);
+ type = find_hibernation_swap_type(swdev, offset);
+ if (type < 0)
+ return swdev ? -ENODEV : -EINVAL;
- /*
- * User space encodes device types as two-byte values,
- * so we need to recode them
- */
- data->swap = pin_hibernation_swap_type(swdev, offset);
- if (data->swap < 0)
+ if (type == data->swap) {
+ /*
+ * Re-selecting the already pinned swap area is a no-op.
+ * Keep the existing pin and just refresh the cached device id.
+ */
+ data->dev = swdev;
+ return 0;
+ }
+
+ swap = pin_hibernation_swap_type(swdev, offset);
+ if (swap < 0) {
+ /*
+ * Preserve the existing pin on failure. This can happen if the
+ * target swap area disappears before pinning, or via the
+ * defensive -EBUSY path in pin_hibernation_swap_type().
+ */
return swdev ? -ENODEV : -EINVAL;
+ }
+
+ unpin_hibernation_swap_type(data->swap);
+ data->swap = swap;
data->dev = swdev;
return 0;
}
base-commit: 5b2b0c6e457765adbe96fb2d464ff1bcd3d72158
--
2.43.0
^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2026-04-14 16:49 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2026-04-14 14:32 [PATCH] PM: hibernate: preserve uswsusp swap pin across SNAPSHOT_SET_SWAP_AREA re-set failures DaeMyung Kang
2026-04-14 16:18 ` YoungJun Park
2026-04-14 16:49 ` [PATCH v2] PM: hibernate: keep existing uswsusp swap pin if re-selection fails DaeMyung Kang
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox