linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: YoungJun Park <youngjun.park@lge.com>
To: DaeMyung Kang <charsyam@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	"Rafael J . Wysocki" <rafael@kernel.org>,
	Kairui Song <kasong@tencent.com>, Chris Li <chrisl@kernel.org>,
	Kemeng Shi <shikemeng@huaweicloud.com>,
	Nhat Pham <nphamcs@gmail.com>, Baoquan He <bhe@redhat.com>,
	Barry Song <baohua@kernel.org>, Len Brown <lenb@kernel.org>,
	Pavel Machek <pavel@kernel.org>,
	linux-mm@kvack.org, linux-pm@vger.kernel.org,
	linux-kernel@vger.kernel.org, stable@vger.kernel.org
Subject: Re: [PATCH] PM: hibernate: preserve uswsusp swap pin across SNAPSHOT_SET_SWAP_AREA re-set failures
Date: Wed, 15 Apr 2026 01:18:31 +0900	[thread overview]
Message-ID: <ad5o1z1hSzLtUxMo@yjaykim-PowerEdge-T330> (raw)
In-Reply-To: <20260414143200.1267932-1-charsyam@gmail.com>

On Tue, Apr 14, 2026 at 11:32:00PM +0900, DaeMyung Kang wrote:

Hi Daemyung :)

> Commit 5b2b0c6e4577 ("mm/swap, PM: hibernate: fix swapoff
> race in uswsusp by pinning swap device") introduced
> SWP_HIBERNATION so that the swap device chosen via
> /dev/snapshot is held against swapoff for the entire uswsusp
> session. The intended invariant is: from the first successful
> SNAPSHOT_SET_SWAP_AREA until the /dev/snapshot fd is closed,
> exactly one swap device is pinned.
>
> snapshot_set_swap_area() breaks that invariant on the re-set
> path:
>
>       unpin_hibernation_swap_type(data->swap);
>       data->swap = pin_hibernation_swap_type(swdev, offset);
>       if (data->swap < 0)
>               return swdev ? -ENODEV : -EINVAL;
>
> The unpin happens unconditionally before the new pin is
> attempted. If the new pin fails (e.g. user space supplies an
> offset/device that is not an active swap area), the session
> continues with no swap device pinned, reopening exactly the
> swapoff race the original commit was meant to close. A
> subsequent swapoff on the previously selected device now
> succeeds where it would have been blocked with EBUSY.

Hmm.. This was actually intentional.  The API semantic
is that a second SNAPSHOT_SET_SWAP_AREA abandons the
previous pin.  If the new pin fails, the ioctl returns
an error and userspace is responsible for aborting the
session -- proceeding without a pinned device makes no
sense.

The only benefit of preserving the old pin on failure
would be protecting against userspace that ignores the
error.  But even in that case, the session has no valid
swap target, so the hibernation image write itself
would fail before swapoff becomes a concern.  I think
this is an edge case rather than a bug.

IOW, Looking at your test case, I think this part is
userspace's responsibility:

>       ret = snapshot_set_swap_area(fd, bogus_dev, 0);
>       if (!ret) {
>               fprintf(stderr,
>                       "step6: bogus SNAPSHOT_SET_SWAP_AREA unexpectedly succeeded\n");
>               close(fd);
>               return 2;
>       }

The ioctl has already returned an error here.  Userspace
should close /dev/snapshot and stop, not continue and
expect the old pin to still be in place.

(BTW, the tests are well written and easy to follow.
Thanks!)

For this patch to have real value, there should be
something that concretely breaks after the swapoff
succeeds.  But since the session has no valid swap
target at that point, is there any actual broken
behavior that follows?

>       if (!buggy) {
>               if (swapoff(swap_path_2) < 0) {
>                       fprintf(stderr,
>                               "step8: swapoff(%s) after close failed: %s\n",
>                               swap_path_2, strerror(errno));
>                       return 2;
>               }
>               printf("step8: swapoff succeeded after closing /dev/snapshot\n");
>       }

If you still see concrete value, I would be happy to
take this as an improvement (without Fixes: and
Cc: stable) -- see my suggestion below for a lighter
approach.

> Reordering pin/unpin in the caller cannot fix this
> cleanly. Each of pin_hibernation_swap_type() /
> unpin_hibernation_swap_type() acquires swap_lock
> independently, so any two-call sequence leaves a window
> in which swapoff can observe an inconsistent pin state.
> The same-area re-set case (type == old_type) also cannot
> be expressed with pin+unpin without either toggling the
> bit (racy) or returning EBUSY (a false error).
>
> Introduce repin_hibernation_swap_type(), which performs
> the transition atomically under a single swap_lock
> acquisition:

I understand the intent.  If you still see enough value
in preserving the pin on failure, I would suggest a
lighter approach -- see below.

> -     unpin_hibernation_swap_type(data->swap);
> -
> -     data->swap = pin_hibernation_swap_type(swdev, offset);
> -     if (data->swap < 0)
> +     swap = repin_hibernation_swap_type(data->swap, swdev,
> +                                        offset);
> +     if (swap < 0)
>               return swdev ? -ENODEV : -EINVAL;
> +     data->swap = swap;

Would it be simpler to use find_hibernation_swap_type()
to look up the new type first, and if it differs from
data->swap, call pin_hibernation_swap_type() on the new
one?  If pin succeeds, unpin the old one.  If it returns
-EBUSY, just keep the existing pin.

If swapoff sneaks in between find and pin, pin will
simply fail -- I don't think the kernel needs to
guarantee atomicity for that window.  This does acquire
swap_lock multiple times, but SNAPSHOT_SET_SWAP_AREA is
an extremely rare operation, so the extra lock
round-trips should be negligible.

Reusing the existing helpers seems preferable to adding
a new function with this much code for a single call
site.

How do you think?

Thanks again!
Youngjun Park


  reply	other threads:[~2026-04-14 16:18 UTC|newest]

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-04-14 14:32 DaeMyung Kang
2026-04-14 16:18 ` YoungJun Park [this message]
2026-04-14 16:49 ` [PATCH v2] PM: hibernate: keep existing uswsusp swap pin if re-selection fails DaeMyung Kang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ad5o1z1hSzLtUxMo@yjaykim-PowerEdge-T330 \
    --to=youngjun.park@lge.com \
    --cc=akpm@linux-foundation.org \
    --cc=baohua@kernel.org \
    --cc=bhe@redhat.com \
    --cc=charsyam@gmail.com \
    --cc=chrisl@kernel.org \
    --cc=kasong@tencent.com \
    --cc=lenb@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-pm@vger.kernel.org \
    --cc=nphamcs@gmail.com \
    --cc=pavel@kernel.org \
    --cc=rafael@kernel.org \
    --cc=shikemeng@huaweicloud.com \
    --cc=stable@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox