From: YoungJun Park <youngjun.park@lge.com>
To: Chris Li <chrisl@kernel.org>
Cc: rafael@kernel.org, akpm@linux-foundation.org, kasong@tencent.com,
pavel@kernel.org, shikemeng@huaweicloud.com, nphamcs@gmail.com,
bhe@redhat.com, baohua@kernel.org, usama.arif@linux.dev,
linux-pm@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [RFC PATCH v2] mm/swap, PM: hibernate: hold swap device reference across swap operation
Date: Fri, 6 Mar 2026 17:02:45 +0900 [thread overview]
Message-ID: <aaqKJQeO8wLQL7Zn@yjaykim-PowerEdge-T330> (raw)
In-Reply-To: <CACePvbXVvPp_a89UFztZo5nGawpFea9t=NRisf468VcxHgkX7A@mail.gmail.com>
On Thu, Mar 05, 2026 at 10:55:15PM -0800, Chris Li wrote:
> On Thu, Mar 5, 2026 at 6:46 PM Youngjun Park <youngjun.park@lge.com> wrote:
> >
> > Currently, in the uswsusp path, only the swap type value is retrieved at
> > lookup time without holding a reference. If swapoff races after the type
> > is acquired, subsequent slot allocations operate on a stale swap device.
>
> Just from you above description, I am not sure how the bug is actually
> triggered yet. That sounds possible. I want more detail.
To be honest, I am not deeply familiar with the snapshot code, which is why
I submitted this as an RFC. However, I believe the race is theoretically
possible and I was able to trigger it with a simple PoC user program.
(not in-kernel swsusp as I think, cuz every user thread freezed
before creating snapshot, only on uswsusp)
The race occurs in `power/user.c`
1. snapshot_open() calls swap_type_of() to find the swap device.
2. We get the swap type, but hold no reference at this point.
3. [Race Window]: Another thread triggers swapoff() and swapon()
4. snapshot_ioctl(SNAPSHOT_ALLOC_SWAP_PAGE) is called.
-> The swap device is gone or the type ID is reused by another device or
swap device is missing.
> Can you show me which code path triggered this bug?
> e.g. Thread A wants to suspend, with this back trace call graph.
> Then in this function foo() A grabs the swap device without holding a reference.
> Meanwhile, thread B is performing a swap off while A is at function foo().
>
> > Additionally, grabbing and releasing the swap device reference on every
> > slot allocation is inefficient across the entire hibernation swap path.
>
> If the swap entry is already allocated by the suspend code on that
> swap device, the follow up allocation does not need to grab the
> reference again because the swap device's swapped count will not drop
> to zero until resume.
You are right. Since the swap device is pinned once a swap entry is
allocated, we could indeed rely on that pinning mechanism to ensure safety
for subsequent allocations (instead of doing get/put every time).
However, relying on that pinning alone does not protect the window between
the initial lookup (step 1) and the *first* allocation.
My proposal is to grab the reference at the lookup point to close this
initial race. If we do that, I believe we can remove the per-slot
get/put calls entirely, as the initial reference is sufficient to keep the
device alive until the operation completes.
Regarding the reference release strategy in this patch:
1. uswsusp: The reference is released when the snapshot device file
is closed(snapshot_release) and error paths.
2. not uswsusp`: I only added reference release in the error paths.
About 2.. I conclude that on a successful resume, the system state reverts to
the snapshot point, making an explicit release unnecessary. However,
I am not 100% certain if this holds true for the swap reference
context.
This part is the primary reason I submitted this as an RFC. I
would appreciate it if you could review this part specifically to
confirm whether my understanding is correct.
> > Address these issues by holding the swap device reference from the point
> > the swap device is looked up, and releasing it once at each exit path.
> > This ensures the device remains valid throughout the operation and
> > removes the overhead of per-slot reference counting.
>
> I want to understand how to trigger the buggy code path first. It
> might be obvious to you. It is not obvious to me yet.
I hope the explanation above clarifies the trace. Please let me know if
there are still parts that are not obvious, and I will explain further or
investigate more.
Thank you for the review
Youngjun Park
prev parent reply other threads:[~2026-03-06 8:02 UTC|newest]
Thread overview: 3+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-03-06 2:46 Youngjun Park
2026-03-06 6:55 ` Chris Li
2026-03-06 8:02 ` YoungJun Park [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=aaqKJQeO8wLQL7Zn@yjaykim-PowerEdge-T330 \
--to=youngjun.park@lge.com \
--cc=akpm@linux-foundation.org \
--cc=baohua@kernel.org \
--cc=bhe@redhat.com \
--cc=chrisl@kernel.org \
--cc=kasong@tencent.com \
--cc=linux-mm@kvack.org \
--cc=linux-pm@vger.kernel.org \
--cc=nphamcs@gmail.com \
--cc=pavel@kernel.org \
--cc=rafael@kernel.org \
--cc=shikemeng@huaweicloud.com \
--cc=usama.arif@linux.dev \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox