linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: DaeMyung Kang <charsyam@gmail.com>
To: Andrew Morton <akpm@linux-foundation.org>,
	"Rafael J . Wysocki" <rafael@kernel.org>
Cc: Youngjun Park <youngjun.park@lge.com>,
	Kairui Song <kasong@tencent.com>, Chris Li <chrisl@kernel.org>,
	Kemeng Shi <shikemeng@huaweicloud.com>,
	Nhat Pham <nphamcs@gmail.com>, Baoquan He <bhe@redhat.com>,
	Barry Song <baohua@kernel.org>, Len Brown <lenb@kernel.org>,
	Pavel Machek <pavel@kernel.org>,
	linux-mm@kvack.org, linux-pm@vger.kernel.org,
	linux-kernel@vger.kernel.org, DaeMyung Kang <charsyam@gmail.com>
Subject: [PATCH v2] PM: hibernate: keep existing uswsusp swap pin if re-selection fails
Date: Wed, 15 Apr 2026 01:49:36 +0900	[thread overview]
Message-ID: <20260414164937.1363887-1-charsyam@gmail.com> (raw)
In-Reply-To: <20260414143200.1267932-1-charsyam@gmail.com>

Commit 5b2b0c6e4577 ("mm/swap, PM: hibernate: fix swapoff race in
uswsusp by pinning swap device") introduced SWP_HIBERNATION so that
the swap area selected through /dev/snapshot remains protected against
swapoff() for the lifetime of the uswsusp session.

When user space issues SNAPSHOT_SET_SWAP_AREA again,
snapshot_set_swap_area() currently drops the old pin before attempting
to pin the new swap area.  If the new selection fails, the ioctl
returns an error and user space is expected to abort the session.
However, preserving the existing pin in that case makes the kernel
side more robust against a failed re-selection, while keeping the
existing userspace-visible behavior unchanged.

Implement this with the existing swap helpers:

  - look up the requested swap area first
  - treat re-selecting the already pinned area as a no-op
  - pin the new area before unpinning the old one
  - leave the existing pin in place if the new pin attempt fails

This keeps the hibernation session protected against swapoff() until
/dev/snapshot is closed, even after a failed attempt to switch to a
different swap area.

Suggested-by: Youngjun Park <youngjun.park@lge.com>
Signed-off-by: DaeMyung Kang <charsyam@gmail.com>
---
Notes (not part of the commit, stripped by git am):

Changes in v2:
  - Drop Fixes: and Cc: stable; reframe as a hardening improvement
    rather than a regression fix, per Youngjun's feedback that the
    current behavior is intentional and there is no concrete
    user-observable harm.
  - Drop the new repin_hibernation_swap_type() helper. Rework
    snapshot_set_swap_area() in place using the existing find / pin /
    unpin helpers as Youngjun suggested; the change now touches only
    kernel/power/user.c and adds no new API.
  - Update the subject and commit log accordingly.
  - Add Suggested-by: trailer.

v1: https://lore.kernel.org/lkml/20260414143200.1267932-1-charsyam@gmail.com/

Baseline
--------
This patch is generated against linux-next at commit 5b2b0c6e4577
("mm/swap, PM: hibernate: fix swapoff race in uswsusp by pinning swap
device"). Mainline does not yet carry that commit, and neither the
helpers used here (find/pin/unpin_hibernation_swap_type) nor the code
site this patch modifies exist there. The base-commit trailer at the
bottom of the mbox records the exact commit.

Testing
-------
The behavior change can be exercised entirely through the
/dev/snapshot ioctl path; no actual hibernation cycle is required.
A targeted assertion test is below; run it as root in a throwaway VM
with two active swap block devices and one non-swap block device
(three arguments).

Run inside a VM on linux-next at 5b2b0c6e4577 with this patch applied:

  step1: pinned active swap /dev/vda
  step2: swapoff blocked with EBUSY while pin is held
  step3: repinned active swap to /dev/vdb
  step4: swapoff(/dev/vda) succeeded after repinning away
  step5: repinned swap is blocked with EBUSY
  step6: bogus SNAPSHOT_SET_SWAP_AREA failed as expected: No such device
  step7: swapoff(/dev/vdb) is still blocked with EBUSY
  result: pin preserved across failed re-set (hardened behavior)
  step8: swapoff succeeded after closing /dev/snapshot

Without the patch, step7 instead reports
  swapoff(/dev/vdb) succeeded after failed re-set
because the old pin had been released before the failed pin attempt.

What the assertion test covers:
  - SWP_HIBERNATION is enforced against swapoff (step2, step5);
  - the success path moves the pin from one active swap to another
    (step3, step4, step5);
  - a failed re-selection preserves the existing pin (step6, step7);
  - the pin lifetime ends on /dev/snapshot close (step8).

What it does not cover:
  - the snapshot_open(O_RDONLY) initial resume-device pin path;
  - the full suspend-to-disk image create/restore flow;
  - concurrent swapoff racing against SNAPSHOT_SET_SWAP_AREA;
  - the type == data->swap idempotent branch (not externally
    observable since it intentionally skips the bit toggle).

A normal sysfs-based suspend-to-disk cycle continues to work; the
find_hibernation_swap_type() / pin / unpin paths themselves are
unchanged. Build tested with allmodconfig and run-tested with
CONFIG_PROVE_LOCKING=y and CONFIG_KASAN=y. The VM was booted with
oops=panic panic=-1 so any WARN/Oops/BUG would have halted the run;
the full test completed cleanly with no kernel log diagnostics.

Reproducer (C source, for reference only -- not added to the tree):

 // SPDX-License-Identifier: GPL-2.0
 /*
  * Reproduce / verify the SNAPSHOT_SET_SWAP_AREA pin-lifetime behavior.
  *
  * Run only inside a throwaway VM. The test manipulates swap state and
  * leaves the target swap area disabled on success.
  *
  * Usage:
  *   ./uswsusp_swapoff_repro <active-swap-1> <active-swap-2> <bogus-blk>
  *
  * Exit codes:
  *   0 = expected (hardened) behavior: pin preserved across failed re-set
  *   1 = old behavior: pin dropped on failed re-set
  *   2 = setup error / inconclusive
  */

 #define _GNU_SOURCE
 #include <errno.h>
 #include <fcntl.h>
 #include <linux/types.h>
 #include <linux/suspend_ioctls.h>
 #include <stdbool.h>
 #include <stdio.h>
 #include <stdlib.h>
 #include <string.h>
 #include <sys/ioctl.h>
 #include <sys/stat.h>
 #include <sys/swap.h>
 #include <sys/sysmacros.h>
 #include <unistd.h>

 static int encode_dev(dev_t dev)
 {
 	unsigned int major_num = major(dev);
 	unsigned int minor_num = minor(dev);

 	/* Match new_encode_dev() / new_decode_dev() in the kernel. */
 	return (major_num & 0xfff) << 8 |
 	       (minor_num & 0xff) |
 	       ((minor_num & ~0xff) << 12);
 }

 static int get_block_dev(const char *path, dev_t *dev)
 {
 	struct stat st;

 	if (stat(path, &st) < 0) {
 		fprintf(stderr, "stat(%s): %s\n", path, strerror(errno));
 		return -errno;
 	}
 	if (!S_ISBLK(st.st_mode)) {
 		fprintf(stderr, "%s is not a block device\n", path);
 		return -EINVAL;
 	}
 	*dev = st.st_rdev;
 	return 0;
 }

 static int snapshot_set_swap_area(int fd, dev_t dev, long long offset)
 {
 	struct resume_swap_area area = {
 		.offset = offset,
 		.dev = encode_dev(dev),
 	};

 	if (ioctl(fd, SNAPSHOT_SET_SWAP_AREA, &area) < 0)
 		return -errno;
 	return 0;
 }

 int main(int argc, char **argv)
 {
 	const char *p1, *p2, *pb;
 	dev_t d1, d2, db;
 	int fd, ret;
 	bool buggy = false;

 	if (argc != 4) {
 		fprintf(stderr,
 			"usage: %s <swap1> <swap2> <bogus>\n", argv[0]);
 		return 2;
 	}
 	if (geteuid() != 0) {
 		fprintf(stderr, "must run as root\n");
 		return 2;
 	}
 	p1 = argv[1]; p2 = argv[2]; pb = argv[3];

 	if (get_block_dev(p1, &d1) < 0 ||
 	    get_block_dev(p2, &d2) < 0 ||
 	    get_block_dev(pb, &db) < 0)
 		return 2;

 	fd = open("/dev/snapshot", O_WRONLY);
 	if (fd < 0) {
 		fprintf(stderr, "open(/dev/snapshot): %s\n", strerror(errno));
 		return 2;
 	}

 	ret = snapshot_set_swap_area(fd, d1, 0);
 	if (ret < 0) { fprintf(stderr, "step1: %s\n", strerror(-ret)); goto setup_err; }
 	printf("step1: pinned active swap %s\n", p1);

 	if (swapoff(p1) == 0) {
 		fprintf(stderr, "step2: swapoff unexpectedly succeeded\n");
 		close(fd); return 1;
 	}
 	if (errno != EBUSY) {
 		fprintf(stderr, "step2: expected EBUSY, got %s\n", strerror(errno));
 		goto setup_err;
 	}
 	printf("step2: swapoff blocked with EBUSY while pin is held\n");

 	ret = snapshot_set_swap_area(fd, d2, 0);
 	if (ret < 0) { fprintf(stderr, "step3: %s\n", strerror(-ret)); goto setup_err; }
 	printf("step3: repinned active swap to %s\n", p2);

 	if (swapoff(p1) < 0) {
 		fprintf(stderr, "step4: swapoff(%s): %s\n", p1, strerror(errno));
 		goto setup_err;
 	}
 	printf("step4: swapoff(%s) succeeded after repinning away\n", p1);

 	if (swapoff(p2) == 0) {
 		fprintf(stderr, "step5: swapoff unexpectedly succeeded\n");
 		close(fd); return 1;
 	}
 	if (errno != EBUSY) {
 		fprintf(stderr, "step5: expected EBUSY, got %s\n", strerror(errno));
 		goto setup_err;
 	}
 	printf("step5: repinned swap is blocked with EBUSY\n");

 	ret = snapshot_set_swap_area(fd, db, 0);
 	if (!ret) {
 		fprintf(stderr, "step6: bogus unexpectedly succeeded\n");
 		goto setup_err;
 	}
 	printf("step6: bogus SNAPSHOT_SET_SWAP_AREA failed as expected: %s\n",
 	       strerror(-ret));

 	if (swapoff(p2) == 0) {
 		printf("step7: swapoff(%s) succeeded after failed re-set\n", p2);
 		printf("result: pin was dropped on failure (old behavior)\n");
 		buggy = true;
 	} else if (errno == EBUSY) {
 		printf("step7: swapoff(%s) is still blocked with EBUSY\n", p2);
 		printf("result: pin preserved across failed re-set (hardened behavior)\n");
 	} else {
 		fprintf(stderr, "step7: unexpected: %s\n", strerror(errno));
 		goto setup_err;
 	}

 	close(fd);
 	if (!buggy) {
 		if (swapoff(p2) < 0) {
 			fprintf(stderr, "step8: swapoff(%s): %s\n", p2, strerror(errno));
 			return 2;
 		}
 		printf("step8: swapoff succeeded after closing /dev/snapshot\n");
 	}
 	printf("note: re-enable with `swapon %s` and `swapon %s`\n", p1, p2);
 	return buggy ? 1 : 0;

 setup_err:
 	close(fd);
 	return 2;
 }

 kernel/power/user.c | 35 ++++++++++++++++++++++++++---------
 1 file changed, 26 insertions(+), 9 deletions(-)

diff --git a/kernel/power/user.c b/kernel/power/user.c
index 4406f5644a56..e1ab85db2e95 100644
--- a/kernel/power/user.c
+++ b/kernel/power/user.c
@@ -218,6 +218,7 @@ static int snapshot_set_swap_area(struct snapshot_data *data,
 {
 	sector_t offset;
 	dev_t swdev;
+	int type, swap;
 
 	if (swsusp_swap_in_use())
 		return -EPERM;
@@ -239,18 +240,34 @@ static int snapshot_set_swap_area(struct snapshot_data *data,
 	}
 
 	/*
-	 * Unpin the swap device if a swap area was already
-	 * set by SNAPSHOT_SET_SWAP_AREA.
+	 * User space encodes device types as two-byte values, so we need to
+	 * recode them.
 	 */
-	unpin_hibernation_swap_type(data->swap);
+	type = find_hibernation_swap_type(swdev, offset);
+	if (type < 0)
+		return swdev ? -ENODEV : -EINVAL;
 
-	/*
-	 * User space encodes device types as two-byte values,
-	 * so we need to recode them
-	 */
-	data->swap = pin_hibernation_swap_type(swdev, offset);
-	if (data->swap < 0)
+	if (type == data->swap) {
+		/*
+		 * Re-selecting the already pinned swap area is a no-op.
+		 * Keep the existing pin and just refresh the cached device id.
+		 */
+		data->dev = swdev;
+		return 0;
+	}
+
+	swap = pin_hibernation_swap_type(swdev, offset);
+	if (swap < 0) {
+		/*
+		 * Preserve the existing pin on failure.  This can happen if the
+		 * target swap area disappears before pinning, or via the
+		 * defensive -EBUSY path in pin_hibernation_swap_type().
+		 */
 		return swdev ? -ENODEV : -EINVAL;
+	}
+
+	unpin_hibernation_swap_type(data->swap);
+	data->swap = swap;
 	data->dev = swdev;
 	return 0;
 }

base-commit: 5b2b0c6e457765adbe96fb2d464ff1bcd3d72158
-- 
2.43.0



      parent reply	other threads:[~2026-04-14 16:49 UTC|newest]

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-04-14 14:32 [PATCH] PM: hibernate: preserve uswsusp swap pin across SNAPSHOT_SET_SWAP_AREA re-set failures DaeMyung Kang
2026-04-14 16:18 ` YoungJun Park
2026-04-14 16:49 ` DaeMyung Kang [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260414164937.1363887-1-charsyam@gmail.com \
    --to=charsyam@gmail.com \
    --cc=akpm@linux-foundation.org \
    --cc=baohua@kernel.org \
    --cc=bhe@redhat.com \
    --cc=chrisl@kernel.org \
    --cc=kasong@tencent.com \
    --cc=lenb@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-pm@vger.kernel.org \
    --cc=nphamcs@gmail.com \
    --cc=pavel@kernel.org \
    --cc=rafael@kernel.org \
    --cc=shikemeng@huaweicloud.com \
    --cc=youngjun.park@lge.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox