Re: [PATCH v4 1/3] mm/swap, PM: hibernate: fix swapoff race in uswsusp by getting swap reference

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

From: YoungJun Park <youngjun.park@lge.com>
To: Kairui Song <ryncsn@gmail.com>
Cc: rafael@kernel.org, akpm@linux-foundation.org, chrisl@kernel.org,
	kasong@tencent.com, pavel@kernel.org, shikemeng@huaweicloud.com,
	nphamcs@gmail.com, bhe@redhat.com, baohua@kernel.org,
	usama.arif@linux.dev, linux-pm@vger.kernel.org,
	linux-mm@kvack.org
Subject: Re: [PATCH v4 1/3] mm/swap, PM: hibernate: fix swapoff race in uswsusp by getting swap reference
Date: Fri, 20 Mar 2026 16:59:43 +0900	[thread overview]
Message-ID: <abz+b8MwnqO0ZOeL@yjaykim-PowerEdge-T330> (raw)
In-Reply-To: <abp7aDgYLrxF3Me8@KASONG-MC4>

On Fri, Mar 20, 2026 at 12:34:02AM +0800, Kairui Song wrote:
> On Wed, Mar 18, 2026 at 03:13:16AM +0800, Youngjun Park wrote:
 
> Hi Youngjun,
> 
> Thanks! This set of API looks better that before.

Hello Kairui! :)

> >  
> > -/*
> > - * Find the swap type that corresponds to given device (if any).
> > - *
> > - * @offset - number of the PAGE_SIZE-sized block of the device, starting
> > - * from 0, in which the swap header is expected to be located.
> > - *
> > - * This is needed for the suspend to disk (aka swsusp).
> > - */
> > -int swap_type_of(dev_t device, sector_t offset)
> > +static int swap_type_of(dev_t device, sector_t offset)
> 
> Maybe rename it while at it, it's a internal helper now and
> no one is using it.

I will change!

> > +/*
> > + * Finds the swap type and safely acquires a reference to the swap device
> > + * to prevent race conditions with swapoff.
> > + *
> > + * This should be used in environments like uswsusp where a race condition
> > + * exists between configuring the resume device and allocating a swap slot.
> > + * For sysfs hibernation where user-space is frozen (making swapoff
> > + * impossible), use find_hibernation_swap_type() instead.
> > + *
> > + * The caller must drop the reference using put_hibernation_swap_type().
> > + */
> 
> You can follow the kernel doc format.

I will convert this to proper kernel-doc format as well. Thanks.

> > +int get_hibernation_swap_type(dev_t device, sector_t offset)
> > +{
> > +	int type;
> > +	struct swap_info_struct *sis;
> > +
> > +	spin_lock(&swap_lock);
> > +	type = swap_type_of(device, offset);
> > +	sis = swap_type_to_info(type);
> > +	if (!sis || !get_swap_device_info(sis))
> > +		type = -1;
> > +
> > +	spin_unlock(&swap_lock);
> > +	return type;
> > +}
> > +
> > +/*
> > + * Drops the reference to the swap device previously acquired by
> > + * get_hibernation_swap_type().
> > + */
> > +void put_hibernation_swap_type(int type)
> > +{
> > +	struct swap_info_struct *sis;
> > +
> > +	sis = swap_type_to_info(type);
> > +	if (!sis)
> > +		return;
> 
> I think it should never see sis == NULL here?

It can happen in the following cases:
1. The snapshot device is not configured at open time (i.e. no resume=
   parameter and userspace intends to configure it later via ioctl).
2. The process exits or snapshot_release() is called with type = -1.

Although callers would normally check `type >= 0`, this helper
already handles invalid types defensively, so an extra check at the
call site is not strictly necessary.

> This looks really bad... It means swapoff will hung unless user send a
> signal to stop it? How does that happen?
> 
> If hibernate may hold a reference without allocating any slot for a
> long time, then maybe we better introduce a new si->flags bit to
> indicate a device is used by hibernate, so swapoff can abort early
> with a -EBUSY.
> 
> If hibernate only holds the swap device reference after it allocated
> any slot, then we can avoid use a flag but use a hibernate type in
> swap table instead:
> 
> diff --git a/mm/swap_table.h b/mm/swap_table.h
> index 8415ffbe2b9c..025e004fc2e7 100644
> --- a/mm/swap_table.h
> +++ b/mm/swap_table.h
> @@ -25,6 +25,7 @@ struct swap_table {
>   * PFN:      | SWAP_COUNT |------ PFN -------|10| - Cached slot
>   * Pointer:  |----------- Pointer ----------|100| - (Unused)
>   * Bad:      |------------- 1 -------------|1000| - Bad slot
> + * Exclusive:|---- 0  ----| ------ 1 -----|10000| - Exclusive usage
>   *
>   * SWAP_COUNT is `SWP_TB_COUNT_BITS` long, each entry is an atomic long.
>   *
> @@ -46,6 +47,8 @@ struct swap_table {
>   *   because only the lower three bits can be used as a marker for 8 bytes
>   *   aligned pointers.
>   *
> + * - Exclusive usage: e.g. for hibernation.
> + *
>   * - Bad: Swap slot is reserved, protects swap header or holes on swap devices.
>   */
> 
> Both readahead and swapoff (unuse scan) will abort upon seeing such slot,
> we solve both readahead and swapoff issue together.
> 
> How do you think? We can use the flag idea first.

That makes sense. I also considered the pinning flag approach (and
mentioned my previously rejected strategy in response to Andrew’s
question).

Based on your feedback, I’m now convinced that introducing a dedicated
SWP_HIBERNATION pinning flag in `si->flags` is the right direction. It
makes the interaction with swapoff explicit and avoids long waits or
implicit dependency on slot allocation timing.

In the future, this pinning flag could potentially be reused in other
swap paths if we ever need to distinguish hibernation-specific
allocations. (this part is just imagination)

Regarding the swap table “exclusive bit” idea: I don’t think it fits
well here because there is a window between selecting the swap device
type and actually allocating a swap slot. During that gap, the user
context may change, so encoding exclusivity at the slot level would not
fully cover the race.

So I will proceed with the pinning flag approach !

> >  	 */
> >  	percpu_ref_kill(&p->users);
> >  	synchronize_rcu();
> > -	wait_for_completion(&p->comp);
> > +	err = wait_for_completion_interruptible(&p->comp);
> > +	if (err) {
> > +		percpu_ref_resurrect(&p->users);
> > +		synchronize_rcu();
> 
> Do we need a synchronize_rcu here? It's required above because we are
> releasing the resources so all RCU readers must exit from grace
> period. But here we are aborting and not releasing anything so
> there is no such need?

This strategy has been dropped with the introduction of the
SWP_HIBERNATION pinning flag.

For clarification: the `synchronize_rcu()` here was intended to wait for
the final percpu_ref callback, since that callback runs in RCU context.
However, as you pointed out, since we are aborting rather than releasing
resources, that extra grace period is likely unnecessary.

I will update the patch accordingly and include additional testing
results.

Thanks again for the valuable review.

Best regards,
Youngjun

next prev parent reply	other threads:[~2026-03-20  7:59 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-03-17 18:13 [PATCH v4 0/3] mm/swap, PM: hibernate: fix swapoff race and optimize swap Youngjun Park
2026-03-17 18:13 ` [PATCH v4 1/3] mm/swap, PM: hibernate: fix swapoff race in uswsusp by getting swap reference Youngjun Park
2026-03-19 16:34   ` Kairui Song
2026-03-20  7:59     ` YoungJun Park [this message]
2026-03-17 18:13 ` [PATCH v4 2/3] mm/swap: remove redundant swap device reference in alloc/free Youngjun Park
2026-03-17 18:13 ` [PATCH v4 3/3] PM: hibernate: fix spurious GFP mask WARNING in uswsusp path Youngjun Park
2026-03-17 19:16 ` [PATCH v4 0/3] mm/swap, PM: hibernate: fix swapoff race and optimize swap Andrew Morton
2026-03-18  2:16   ` YoungJun Park
2026-03-19 13:33 ` Rafael J. Wysocki
2026-03-19 13:48   ` YoungJun Park

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=abz+b8MwnqO0ZOeL@yjaykim-PowerEdge-T330 \
    --to=youngjun.park@lge.com \
    --cc=akpm@linux-foundation.org \
    --cc=baohua@kernel.org \
    --cc=bhe@redhat.com \
    --cc=chrisl@kernel.org \
    --cc=kasong@tencent.com \
    --cc=linux-mm@kvack.org \
    --cc=linux-pm@vger.kernel.org \
    --cc=nphamcs@gmail.com \
    --cc=pavel@kernel.org \
    --cc=rafael@kernel.org \
    --cc=ryncsn@gmail.com \
    --cc=shikemeng@huaweicloud.com \
    --cc=usama.arif@linux.dev \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox