linux-mm.kvack.org archive mirror
* [PATCH 0/2] mm/swapfile: fix and cleanup swap list iterations
@ 2025-11-25 16:30 Youngjun Park
  2025-11-25 16:30 ` [PATCH 1/2] mm/swapfile: fix list iteration in swap_sync_discard Youngjun Park
  2025-11-25 16:30 ` [PATCH 2/2] mm/swapfile: use plist_for_each_entry in __folio_throttle_swaprate Youngjun Park
  0 siblings, 2 replies; 13+ messages in thread
From: Youngjun Park @ 2025-11-25 16:30 UTC (permalink / raw)
  To: akpm, chrisl, kasong
  Cc: shikemeng, nphamcs, bhe, baohua, youngjun.park, linux-mm

This series addresses an issue in swap_sync_discard() where list
iteration can fail when devices are removed, and includes a cleanup
for __folio_throttle_swaprate().

The issue was identified during discussion at:
https://lore.kernel.org/linux-mm/aR53i1G45XAqCzBd@yjaykim-PowerEdge-T330/

Youngjun Park (2):
  mm/swapfile: fix list iteration in swap_sync_discard
  mm/swapfile: use plist_for_each_entry in __folio_throttle_swaprate

 mm/swapfile.c | 23 +++++++++++++----------
 1 file changed, 13 insertions(+), 10 deletions(-)

-- 
base-commit: 624c4d7b8fac3fdf9894a5f89f84709771db4dbb
2.34.1

Best regards,
-- 
Youngjun Park


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 1/2] mm/swapfile: fix list iteration in swap_sync_discard
  2025-11-25 16:30 [PATCH 0/2] mm/swapfile: fix and cleanup swap list iterations Youngjun Park
@ 2025-11-25 16:30 ` Youngjun Park
  2025-11-26 18:23   ` Kairui Song
  2025-11-27  2:15   ` Baoquan He
  2025-11-25 16:30 ` [PATCH 2/2] mm/swapfile: use plist_for_each_entry in __folio_throttle_swaprate Youngjun Park
  1 sibling, 2 replies; 13+ messages in thread
From: Youngjun Park @ 2025-11-25 16:30 UTC (permalink / raw)
  To: akpm, chrisl, kasong
  Cc: shikemeng, nphamcs, bhe, baohua, youngjun.park, linux-mm

swap_sync_discard() has an issue where if the next device becomes full
and is removed from the plist during iteration, the operation fails
even when other swap devices with pending discard entries remain
available.

Fix by checking plist_node_empty(&next->list) and restarting iteration
when the next node is removed during discard operations.

Additionally, switch from swap_avail_lock/swap_avail_head to swap_lock/
swap_active_head. This means the iteration is only affected by swapoff
operations rather than frequent availability changes, reducing
exceptional condition checks and lock contention.

Fixes: 686ea517f471 ("mm, swap: do not perform synchronous discard during allocation")
Suggested-by: Kairui Song <kasong@tencent.com>
Signed-off-by: Youngjun Park <youngjun.park@lge.com>
---
 mm/swapfile.c | 18 +++++++++++-------
 1 file changed, 11 insertions(+), 7 deletions(-)

diff --git a/mm/swapfile.c b/mm/swapfile.c
index d12332423a06..998271aa09c3 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -1387,21 +1387,25 @@ static bool swap_sync_discard(void)
 	bool ret = false;
 	struct swap_info_struct *si, *next;
 
-	spin_lock(&swap_avail_lock);
-	plist_for_each_entry_safe(si, next, &swap_avail_head, avail_list) {
-		spin_unlock(&swap_avail_lock);
+	spin_lock(&swap_lock);
+start_over:
+	plist_for_each_entry_safe(si, next, &swap_active_head, list) {
+		spin_unlock(&swap_lock);
 		if (get_swap_device_info(si)) {
 			if (si->flags & SWP_PAGE_DISCARD)
 				ret = swap_do_scheduled_discard(si);
 			put_swap_device(si);
 		}
 		if (ret)
-			return true;
-		spin_lock(&swap_avail_lock);
+			return ret;
+
+		spin_lock(&swap_lock);
+		if (plist_node_empty(&next->list))
+			goto start_over;
 	}
-	spin_unlock(&swap_avail_lock);
+	spin_unlock(&swap_lock);
 
-	return false;
+	return ret;
 }
 
 /**
-- 
2.34.1




* [PATCH 2/2] mm/swapfile: use plist_for_each_entry in __folio_throttle_swaprate
  2025-11-25 16:30 [PATCH 0/2] mm/swapfile: fix and cleanup swap list iterations Youngjun Park
  2025-11-25 16:30 ` [PATCH 1/2] mm/swapfile: fix list iteration in swap_sync_discard Youngjun Park
@ 2025-11-25 16:30 ` Youngjun Park
  1 sibling, 0 replies; 13+ messages in thread
From: Youngjun Park @ 2025-11-25 16:30 UTC (permalink / raw)
  To: akpm, chrisl, kasong
  Cc: shikemeng, nphamcs, bhe, baohua, youngjun.park, linux-mm

The loop breaks immediately after finding the first swap device and
never modifies the list. Replace plist_for_each_entry_safe() with
plist_for_each_entry() and remove the unused next variable.

Signed-off-by: Youngjun Park <youngjun.park@lge.com>
---
 mm/swapfile.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/mm/swapfile.c b/mm/swapfile.c
index 998271aa09c3..cc0990141171 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -4023,7 +4023,7 @@ static bool __has_usable_swap(void)
 
 void __folio_throttle_swaprate(struct folio *folio, gfp_t gfp)
 {
-	struct swap_info_struct *si, *next;
+	struct swap_info_struct *si;
 
 	if (!(gfp & __GFP_IO))
 		return;
@@ -4042,8 +4042,7 @@ void __folio_throttle_swaprate(struct folio *folio, gfp_t gfp)
 		return;
 
 	spin_lock(&swap_avail_lock);
-	plist_for_each_entry_safe(si, next, &swap_avail_head,
-				  avail_list) {
+	plist_for_each_entry(si, &swap_avail_head, avail_list) {
 		if (si->bdev) {
 			blkcg_schedule_throttle(si->bdev->bd_disk, true);
 			break;
-- 
2.34.1




* Re: [PATCH 1/2] mm/swapfile: fix list iteration in swap_sync_discard
  2025-11-25 16:30 ` [PATCH 1/2] mm/swapfile: fix list iteration in swap_sync_discard Youngjun Park
@ 2025-11-26 18:23   ` Kairui Song
  2025-11-27  2:22     ` YoungJun Park
  2025-11-27  2:15   ` Baoquan He
  1 sibling, 1 reply; 13+ messages in thread
From: Kairui Song @ 2025-11-26 18:23 UTC (permalink / raw)
  To: Youngjun Park; +Cc: akpm, chrisl, shikemeng, nphamcs, bhe, baohua, linux-mm

On Wed, Nov 26, 2025 at 12:31 AM Youngjun Park <youngjun.park@lge.com> wrote:
>
> swap_sync_discard() has an issue where if the next device becomes full
> and is removed from the plist during iteration, the operation fails
> even when other swap devices with pending discard entries remain
> available.
>
> Fix by checking plist_node_empty(&next->list) and restarting iteration
> when the next node is removed during discard operations.
>
> Additionally, switch from swap_avail_lock/swap_avail_head to swap_lock/
> swap_active_head. This means the iteration is only affected by swapoff
> operations rather than frequent availability changes, reducing
> exceptional condition checks and lock contention.
>
> Fixes: 686ea517f471 ("mm, swap: do not perform synchronous discard during allocation")
> Suggested-by: Kairui Song <kasong@tencent.com>
> Signed-off-by: Youngjun Park <youngjun.park@lge.com>
> ---
>  mm/swapfile.c | 18 +++++++++++-------
>  1 file changed, 11 insertions(+), 7 deletions(-)
>
> diff --git a/mm/swapfile.c b/mm/swapfile.c
> index d12332423a06..998271aa09c3 100644
> --- a/mm/swapfile.c
> +++ b/mm/swapfile.c
> @@ -1387,21 +1387,25 @@ static bool swap_sync_discard(void)
>         bool ret = false;
>         struct swap_info_struct *si, *next;
>
> -       spin_lock(&swap_avail_lock);
> -       plist_for_each_entry_safe(si, next, &swap_avail_head, avail_list) {
> -               spin_unlock(&swap_avail_lock);
> +       spin_lock(&swap_lock);
> +start_over:
> +       plist_for_each_entry_safe(si, next, &swap_active_head, list) {
> +               spin_unlock(&swap_lock);
>                 if (get_swap_device_info(si)) {
>                         if (si->flags & SWP_PAGE_DISCARD)
>                                 ret = swap_do_scheduled_discard(si);
>                         put_swap_device(si);
>                 }
>                 if (ret)
> -                       return true;
> -               spin_lock(&swap_avail_lock);
> +                       return ret;

Nit: I'd prefer to avoid unnecessary changes like this, `return true`
is the same thing. No need to send an update for this though, a really
trivial issue.

> +
> +               spin_lock(&swap_lock);
> +               if (plist_node_empty(&next->list))
> +                       goto start_over;
>         }
> -       spin_unlock(&swap_avail_lock);
> +       spin_unlock(&swap_lock);
>
> -       return false;
> +       return ret;
>  }
>
>  /**
> --
> 2.34.1

Thanks!

Acked-by: Kairui Song <kasong@tencent.com>



* Re: [PATCH 1/2] mm/swapfile: fix list iteration in swap_sync_discard
  2025-11-25 16:30 ` [PATCH 1/2] mm/swapfile: fix list iteration in swap_sync_discard Youngjun Park
  2025-11-26 18:23   ` Kairui Song
@ 2025-11-27  2:15   ` Baoquan He
  2025-11-27  2:54     ` YoungJun Park
  2025-11-27  5:42     ` YoungJun Park
  1 sibling, 2 replies; 13+ messages in thread
From: Baoquan He @ 2025-11-27  2:15 UTC (permalink / raw)
  To: Youngjun Park; +Cc: akpm, chrisl, kasong, shikemeng, nphamcs, baohua, linux-mm

On 11/26/25 at 01:30am, Youngjun Park wrote:
> swap_sync_discard() has an issue where if the next device becomes full
> and is removed from the plist during iteration, the operation fails
> even when other swap devices with pending discard entries remain
> available.
> 
> Fix by checking plist_node_empty(&next->list) and restarting iteration
> when the next node is removed during discard operations.
> 
> Additionally, switch from swap_avail_lock/swap_avail_head to swap_lock/
> swap_active_head. This means the iteration is only affected by swapoff
> operations rather than frequent availability changes, reducing
> exceptional condition checks and lock contention.
> 
> Fixes: 686ea517f471 ("mm, swap: do not perform synchronous discard during allocation")
> Suggested-by: Kairui Song <kasong@tencent.com>
> Signed-off-by: Youngjun Park <youngjun.park@lge.com>
> ---
>  mm/swapfile.c | 18 +++++++++++-------
>  1 file changed, 11 insertions(+), 7 deletions(-)
> 
> diff --git a/mm/swapfile.c b/mm/swapfile.c
> index d12332423a06..998271aa09c3 100644
> --- a/mm/swapfile.c
> +++ b/mm/swapfile.c
> @@ -1387,21 +1387,25 @@ static bool swap_sync_discard(void)
>  	bool ret = false;
>  	struct swap_info_struct *si, *next;
>  
> -	spin_lock(&swap_avail_lock);
> -	plist_for_each_entry_safe(si, next, &swap_avail_head, avail_list) {
> -		spin_unlock(&swap_avail_lock);
> +	spin_lock(&swap_lock);
> +start_over:
> +	plist_for_each_entry_safe(si, next, &swap_active_head, list) {
> +		spin_unlock(&swap_lock);
>  		if (get_swap_device_info(si)) {
>  			if (si->flags & SWP_PAGE_DISCARD)
>  				ret = swap_do_scheduled_discard(si);
>  			put_swap_device(si);
>  		}
>  		if (ret)
> -			return true;
> -		spin_lock(&swap_avail_lock);
> +			return ret;
> +
> +		spin_lock(&swap_lock);
> +		if (plist_node_empty(&next->list))
> +			goto start_over;

If there are many si with the same priority, or there are several si
spread across different memcgs when swap.tier is available, could we
keep looping here, starting over again and again? The old code was
supposed to go through the plist and do one round of discarding. Not
sure if I got the code wrong, or whether the chance is just very tiny.

Thanks
Baoquan

>  	}
> -	spin_unlock(&swap_avail_lock);
> +	spin_unlock(&swap_lock);
>  
> -	return false;
> +	return ret;
>  }
>  
>  /**
> -- 
> 2.34.1
> 




* Re: [PATCH 1/2] mm/swapfile: fix list iteration in swap_sync_discard
  2025-11-26 18:23   ` Kairui Song
@ 2025-11-27  2:22     ` YoungJun Park
  0 siblings, 0 replies; 13+ messages in thread
From: YoungJun Park @ 2025-11-27  2:22 UTC (permalink / raw)
  To: Kairui Song; +Cc: akpm, chrisl, shikemeng, nphamcs, bhe, baohua, linux-mm

On Thu, Nov 27, 2025 at 02:23:05AM +0800, Kairui Song wrote:
> On Wed, Nov 26, 2025 at 12:31 AM Youngjun Park <youngjun.park@lge.com> wrote:
> >
> > swap_sync_discard() has an issue where if the next device becomes full
> > and is removed from the plist during iteration, the operation fails
> > even when other swap devices with pending discard entries remain
> > available.
> >
> > Fix by checking plist_node_empty(&next->list) and restarting iteration
> > when the next node is removed during discard operations.
> >
> > Additionally, switch from swap_avail_lock/swap_avail_head to swap_lock/
> > swap_active_head. This means the iteration is only affected by swapoff
> > operations rather than frequent availability changes, reducing
> > exceptional condition checks and lock contention.
> >
> > Fixes: 686ea517f471 ("mm, swap: do not perform synchronous discard during allocation")
> > Suggested-by: Kairui Song <kasong@tencent.com>
> > Signed-off-by: Youngjun Park <youngjun.park@lge.com>
> > ---
> >  mm/swapfile.c | 18 +++++++++++-------
> >  1 file changed, 11 insertions(+), 7 deletions(-)
> >
> > diff --git a/mm/swapfile.c b/mm/swapfile.c
> > index d12332423a06..998271aa09c3 100644
> > --- a/mm/swapfile.c
> > +++ b/mm/swapfile.c
> > @@ -1387,21 +1387,25 @@ static bool swap_sync_discard(void)
> >         bool ret = false;
> >         struct swap_info_struct *si, *next;
> >
> > -       spin_lock(&swap_avail_lock);
> > -       plist_for_each_entry_safe(si, next, &swap_avail_head, avail_list) {
> > -               spin_unlock(&swap_avail_lock);
> > +       spin_lock(&swap_lock);
> > +start_over:
> > +       plist_for_each_entry_safe(si, next, &swap_active_head, list) {
> > +               spin_unlock(&swap_lock);
> >                 if (get_swap_device_info(si)) {
> >                         if (si->flags & SWP_PAGE_DISCARD)
> >                                 ret = swap_do_scheduled_discard(si);
> >                         put_swap_device(si);
> >                 }
> >                 if (ret)
> > -                       return true;
> > -               spin_lock(&swap_avail_lock);
> > +                       return ret;
> 
> Nit: I'd prefer to avoid unnecessary changes like this, `return true`
> is the same thing. No need to send an update for this though, a really
> trivial issue.

Okay, I will revert it to the original code.

> > +
> > +               spin_lock(&swap_lock);
> > +               if (plist_node_empty(&next->list))
> > +                       goto start_over;
> >         }
> > -       spin_unlock(&swap_avail_lock);
> > +       spin_unlock(&swap_lock);
> >
> > -       return false;
> > +       return ret;
> >  }
> >
> >  /**
> > --
> > 2.34.1
> 
> Thanks!
> 
> Acked-by: Kairui Song <kasong@tencent.com>
> 

Thank you
Youngjun Park



* Re: [PATCH 1/2] mm/swapfile: fix list iteration in swap_sync_discard
  2025-11-27  2:15   ` Baoquan He
@ 2025-11-27  2:54     ` YoungJun Park
  2025-11-27  5:42     ` YoungJun Park
  1 sibling, 0 replies; 13+ messages in thread
From: YoungJun Park @ 2025-11-27  2:54 UTC (permalink / raw)
  To: Baoquan He; +Cc: akpm, chrisl, kasong, shikemeng, nphamcs, baohua, linux-mm

On Thu, Nov 27, 2025 at 10:15:50AM +0800, Baoquan He wrote:
> On 11/26/25 at 01:30am, Youngjun Park wrote:
> > swap_sync_discard() has an issue where if the next device becomes full
> > and is removed from the plist during iteration, the operation fails
> > even when other swap devices with pending discard entries remain
> > available.
> > 
> > Fix by checking plist_node_empty(&next->list) and restarting iteration
> > when the next node is removed during discard operations.
> > 
> > Additionally, switch from swap_avail_lock/swap_avail_head to swap_lock/
> > swap_active_head. This means the iteration is only affected by swapoff
> > operations rather than frequent availability changes, reducing
> > exceptional condition checks and lock contention.
> > 
> > Fixes: 686ea517f471 ("mm, swap: do not perform synchronous discard during allocation")
> > Suggested-by: Kairui Song <kasong@tencent.com>
> > Signed-off-by: Youngjun Park <youngjun.park@lge.com>
> > ---
> >  mm/swapfile.c | 18 +++++++++++-------
> >  1 file changed, 11 insertions(+), 7 deletions(-)
> > 
> > diff --git a/mm/swapfile.c b/mm/swapfile.c
> > index d12332423a06..998271aa09c3 100644
> > --- a/mm/swapfile.c
> > +++ b/mm/swapfile.c
> > @@ -1387,21 +1387,25 @@ static bool swap_sync_discard(void)
> >  	bool ret = false;
> >  	struct swap_info_struct *si, *next;
> >  
> > -	spin_lock(&swap_avail_lock);
> > -	plist_for_each_entry_safe(si, next, &swap_avail_head, avail_list) {
> > -		spin_unlock(&swap_avail_lock);
> > +	spin_lock(&swap_lock);
> > +start_over:
> > +	plist_for_each_entry_safe(si, next, &swap_active_head, list) {
> > +		spin_unlock(&swap_lock);
> >  		if (get_swap_device_info(si)) {
> >  			if (si->flags & SWP_PAGE_DISCARD)
> >  				ret = swap_do_scheduled_discard(si);
> >  			put_swap_device(si);
> >  		}
> >  		if (ret)
> > -			return true;
> > -		spin_lock(&swap_avail_lock);
> > +			return ret;
> > +
> > +		spin_lock(&swap_lock);
> > +		if (plist_node_empty(&next->list))
> > +			goto start_over;
> 
> If there are many si with the same priority, or there are several si

Is this because of the requeue that happens while iterating over
`swap_avail_head`? However, a requeue does not make the node empty.
Also, since we are now iterating over `swap_active_head`, it seems
like that wouldn't happen.

> spread in different memcg when swap.tier is available, are we going to
> keep looping here to start over and over again possibly?

I think the loop cannot happen here for that reason.
But a loop can possibly happen between swap_alloc_slow and swap_sync_discard.

If `swap.tier` is applied, I think you're referring to the situation where
`si`s not belonging to the current tier are discarded successfully, and then
the next iteration goes through the available list again for the swap devices
in the same tier. As you mentioned, a needless looping situation could occur
(if discards accumulate very quickly, it could even lead to an infinite loop).
So if `swap.tier` is applied, this part may also need to be modified.

> The old code is supposed to go through the plist to do one round of discarding? 

After your review, I thought about it more: if continuous swap on/off occurs
while `swap_lock` is released, it seems we could keep hitting
`plist_node_empty` and restarting. However, I think this case is very
unlikely, so it shouldn't be a problem. Actually, swap_alloc_slow already
works that way. What do you think?

In the old code, if a swapoff occurs and swap usage becomes zero, causing the
device to be removed from the `avail_list`, it ends up doing one round of
discarding. If we don't like the idea of looping due to continuous swap
on/off, we could consider adding a retry count or removing the
`plist_node_empty` check.

> Not sure if I got the code wrong, or the chance it very tiny.
> Thanks
> Baoquan

I answered based on my understanding, but please correct me if I misunderstood your point.

Thanks for the review.
Youngjun Park



* Re: [PATCH 1/2] mm/swapfile: fix list iteration in swap_sync_discard
  2025-11-27  2:15   ` Baoquan He
  2025-11-27  2:54     ` YoungJun Park
@ 2025-11-27  5:42     ` YoungJun Park
  2025-11-27  8:06       ` Baoquan He
  1 sibling, 1 reply; 13+ messages in thread
From: YoungJun Park @ 2025-11-27  5:42 UTC (permalink / raw)
  To: Baoquan He; +Cc: akpm, chrisl, kasong, shikemeng, nphamcs, baohua, linux-mm

On Thu, Nov 27, 2025 at 10:15:50AM +0800, Baoquan He wrote:
> On 11/26/25 at 01:30am, Youngjun Park wrote:
> > swap_sync_discard() has an issue where if the next device becomes full
> > and is removed from the plist during iteration, the operation fails
> > even when other swap devices with pending discard entries remain
> > available.
> > 
> > Fix by checking plist_node_empty(&next->list) and restarting iteration
> > when the next node is removed during discard operations.
> > 
> > Additionally, switch from swap_avail_lock/swap_avail_head to swap_lock/
> > swap_active_head. This means the iteration is only affected by swapoff
> > operations rather than frequent availability changes, reducing
> > exceptional condition checks and lock contention.
> > 
> > Fixes: 686ea517f471 ("mm, swap: do not perform synchronous discard during allocation")
> > Suggested-by: Kairui Song <kasong@tencent.com>
> > Signed-off-by: Youngjun Park <youngjun.park@lge.com>
> > ---
> >  mm/swapfile.c | 18 +++++++++++-------
> >  1 file changed, 11 insertions(+), 7 deletions(-)
> > 
> > diff --git a/mm/swapfile.c b/mm/swapfile.c
> > index d12332423a06..998271aa09c3 100644
> > --- a/mm/swapfile.c
> > +++ b/mm/swapfile.c
> > @@ -1387,21 +1387,25 @@ static bool swap_sync_discard(void)
> >  	bool ret = false;
> >  	struct swap_info_struct *si, *next;
> >  
> > -	spin_lock(&swap_avail_lock);
> > -	plist_for_each_entry_safe(si, next, &swap_avail_head, avail_list) {
> > -		spin_unlock(&swap_avail_lock);
> > +	spin_lock(&swap_lock);
> > +start_over:
> > +	plist_for_each_entry_safe(si, next, &swap_active_head, list) {
> > +		spin_unlock(&swap_lock);
> >  		if (get_swap_device_info(si)) {
> >  			if (si->flags & SWP_PAGE_DISCARD)
> >  				ret = swap_do_scheduled_discard(si);
> >  			put_swap_device(si);
> >  		}
> >  		if (ret)
> > -			return true;
> > -		spin_lock(&swap_avail_lock);
> > +			return ret;
> > +
> > +		spin_lock(&swap_lock);
> > +		if (plist_node_empty(&next->list))
> > +			goto start_over;

By forcing a brief delay right before taking the swap_lock, I was able to
observe at runtime that, when the next node is removed (due to swapoff) and
there is no plist_node_empty check, plist_del makes the node point to itself.
As a result, when the iteration advances to the next entry, it keeps retrying
on itself, since the traversal's termination condition only checks whether
the current node is the head.

At first glance, I had assumed that plist_node_empty also implicitly served as
a termination condition of plist_for_each_entry_safe.

Therefore, the real reason for this patch is not:
    "swap_sync_discard() has an issue where if the next device becomes full
    and is removed from the plist during iteration, the operation fails even
    when other swap devices with pending discard entries remain available."
but rather:
    "When the next node is removed, its next pointer loops back to itself,
    possibly causing a loop until it is reinserted into the list."

So the plist_node_empty check is necessary, either as it is now (as in this
patch, not the original code) or as a break condition (if we want to avoid
the swap on/off loop situation I mentioned in my previous email).

Best regards,
Youngjun Park



* Re: [PATCH 1/2] mm/swapfile: fix list iteration in swap_sync_discard
  2025-11-27  5:42     ` YoungJun Park
@ 2025-11-27  8:06       ` Baoquan He
  2025-11-27  9:34         ` YoungJun Park
  0 siblings, 1 reply; 13+ messages in thread
From: Baoquan He @ 2025-11-27  8:06 UTC (permalink / raw)
  To: YoungJun Park; +Cc: akpm, chrisl, kasong, shikemeng, nphamcs, baohua, linux-mm

On 11/27/25 at 02:42pm, YoungJun Park wrote:
> On Thu, Nov 27, 2025 at 10:15:50AM +0800, Baoquan He wrote:
> > On 11/26/25 at 01:30am, Youngjun Park wrote:
> > > swap_sync_discard() has an issue where if the next device becomes full
> > > and is removed from the plist during iteration, the operation fails
> > > even when other swap devices with pending discard entries remain
> > > available.
> > > 
> > > Fix by checking plist_node_empty(&next->list) and restarting iteration
> > > when the next node is removed during discard operations.
> > > 
> > > Additionally, switch from swap_avail_lock/swap_avail_head to swap_lock/
> > > swap_active_head. This means the iteration is only affected by swapoff
> > > operations rather than frequent availability changes, reducing
> > > exceptional condition checks and lock contention.
> > > 
> > > Fixes: 686ea517f471 ("mm, swap: do not perform synchronous discard during allocation")
> > > Suggested-by: Kairui Song <kasong@tencent.com>
> > > Signed-off-by: Youngjun Park <youngjun.park@lge.com>
> > > ---
> > >  mm/swapfile.c | 18 +++++++++++-------
> > >  1 file changed, 11 insertions(+), 7 deletions(-)
> > > 
> > > diff --git a/mm/swapfile.c b/mm/swapfile.c
> > > index d12332423a06..998271aa09c3 100644
> > > --- a/mm/swapfile.c
> > > +++ b/mm/swapfile.c
> > > @@ -1387,21 +1387,25 @@ static bool swap_sync_discard(void)
> > >  	bool ret = false;
> > >  	struct swap_info_struct *si, *next;
> > >  
> > > -	spin_lock(&swap_avail_lock);
> > > -	plist_for_each_entry_safe(si, next, &swap_avail_head, avail_list) {
> > > -		spin_unlock(&swap_avail_lock);
> > > +	spin_lock(&swap_lock);
> > > +start_over:
> > > +	plist_for_each_entry_safe(si, next, &swap_active_head, list) {
> > > +		spin_unlock(&swap_lock);
> > >  		if (get_swap_device_info(si)) {
> > >  			if (si->flags & SWP_PAGE_DISCARD)
> > >  				ret = swap_do_scheduled_discard(si);
> > >  			put_swap_device(si);
> > >  		}
> > >  		if (ret)
> > > -			return true;
> > > -		spin_lock(&swap_avail_lock);
> > > +			return ret;
> > > +
> > > +		spin_lock(&swap_lock);
> > > +		if (plist_node_empty(&next->list))
> > > +			goto start_over;
> 
> By forcing a brief delay right before the swap_lock, I was able to observe at
> runtime that when the next node is removed (due to swapoff), and there is no
> plist_node_empty check, plist_del makes the node point to itself. As a result,
> when the iteration continues to the next entry, it keeps retrying on itself,
> since the list traversal termination condition is based on whether the current
> node is the head or not.
> 
> At first glance, I had assumed that plist_node_empty also implicitly served as
> a termination condition of plist_for_each_entry_safe.
> 
> Therefore, the real reason for this patch is not:
>     "swap_sync_discard() has an issue where if the next device becomes full
>     and is removed from the plist during iteration, the operation fails even
>     when other swap devices with pending discard entries remain available."
> but rather:
>     "When the next node is removed, the next pointer loops back to the current
>     entry, possibly causing an loop until it will be reinserted on the list."
> 
> So, the plist_node_empty check is necessary — either as it is now (not the original
> code, the patch I modified) or as a break condition 
> (if we want to avoid the swap on/off loop situation I mentioned in my previous email.)

OK, I only thought of the swap on/off case and didn't think it through. As
you analyzed, the plist_node_empty check is necessary, so this patch looks
good to me. Or one alternative would be fetching the new next, as below?
No strong opinion though.

                if (plist_node_empty(&next->list)) {
                        if (!plist_node_empty(&si->list)) {
                                next = list_next_entry(si, list.node_list);
                                continue;
                        }
                        return false;
                }




* Re: [PATCH 1/2] mm/swapfile: fix list iteration in swap_sync_discard
  2025-11-27  8:06       ` Baoquan He
@ 2025-11-27  9:34         ` YoungJun Park
  2025-11-27 10:32           ` Baoquan He
  0 siblings, 1 reply; 13+ messages in thread
From: YoungJun Park @ 2025-11-27  9:34 UTC (permalink / raw)
  To: Baoquan He; +Cc: akpm, chrisl, kasong, shikemeng, nphamcs, baohua, linux-mm

On Thu, Nov 27, 2025 at 04:06:56PM +0800, Baoquan He wrote:
> On 11/27/25 at 02:42pm, YoungJun Park wrote:
> > On Thu, Nov 27, 2025 at 10:15:50AM +0800, Baoquan He wrote:
> > > On 11/26/25 at 01:30am, Youngjun Park wrote:
> > > > swap_sync_discard() has an issue where if the next device becomes full
> > > > and is removed from the plist during iteration, the operation fails
> > > > even when other swap devices with pending discard entries remain
> > > > available.
> > > > 
> > > > Fix by checking plist_node_empty(&next->list) and restarting iteration
> > > > when the next node is removed during discard operations.
> > > > 
> > > > Additionally, switch from swap_avail_lock/swap_avail_head to swap_lock/
> > > > swap_active_head. This means the iteration is only affected by swapoff
> > > > operations rather than frequent availability changes, reducing
> > > > exceptional condition checks and lock contention.
> > > > 
> > > > Fixes: 686ea517f471 ("mm, swap: do not perform synchronous discard during allocation")
> > > > Suggested-by: Kairui Song <kasong@tencent.com>
> > > > Signed-off-by: Youngjun Park <youngjun.park@lge.com>
> > > > ---
> > > >  mm/swapfile.c | 18 +++++++++++-------
> > > >  1 file changed, 11 insertions(+), 7 deletions(-)
> > > > 
> > > > diff --git a/mm/swapfile.c b/mm/swapfile.c
> > > > index d12332423a06..998271aa09c3 100644
> > > > --- a/mm/swapfile.c
> > > > +++ b/mm/swapfile.c
> > > > @@ -1387,21 +1387,25 @@ static bool swap_sync_discard(void)
> > > >  	bool ret = false;
> > > >  	struct swap_info_struct *si, *next;
> > > >  
> > > > -	spin_lock(&swap_avail_lock);
> > > > -	plist_for_each_entry_safe(si, next, &swap_avail_head, avail_list) {
> > > > -		spin_unlock(&swap_avail_lock);
> > > > +	spin_lock(&swap_lock);
> > > > +start_over:
> > > > +	plist_for_each_entry_safe(si, next, &swap_active_head, list) {
> > > > +		spin_unlock(&swap_lock);
> > > >  		if (get_swap_device_info(si)) {
> > > >  			if (si->flags & SWP_PAGE_DISCARD)
> > > >  				ret = swap_do_scheduled_discard(si);
> > > >  			put_swap_device(si);
> > > >  		}
> > > >  		if (ret)
> > > > -			return true;
> > > > -		spin_lock(&swap_avail_lock);
> > > > +			return ret;
> > > > +
> > > > +		spin_lock(&swap_lock);
> > > > +		if (plist_node_empty(&next->list))
> > > > +			goto start_over;
> > 
> > By forcing a brief delay right before the swap_lock, I was able to observe at
> > runtime that when the next node is removed (due to swapoff), and there is no
> > plist_node_empty check, plist_del makes the node point to itself. As a result,
> > when the iteration continues to the next entry, it keeps retrying on itself,
> > since the list traversal termination condition is based on whether the current
> > node is the head or not.
> > 
> > At first glance, I had assumed that plist_node_empty also implicitly served as
> > a termination condition of plist_for_each_entry_safe.
> > 
> > Therefore, the real reason for this patch is not:
> >     "swap_sync_discard() has an issue where if the next device becomes full
> >     and is removed from the plist during iteration, the operation fails even
> >     when other swap devices with pending discard entries remain available."
> > but rather:
> >     "When the next node is removed, the next pointer loops back to the current
> >     entry, possibly causing an loop until it will be reinserted on the list."
> > 
> > So, the plist_node_empty check is necessary — either as it is now (not the original
> > code, the patch I modified) or as a break condition 
> > (if we want to avoid the swap on/off loop situation I mentioned in my previous email.)
> 
> OK, I only thought of swap on/off case, didn't think much. As you
> analyzed, the plist_node_empty check is necessary. So this patch looks
> good to me. Or one alternative way is fetching the new next? Not strong
> opinion though.
> 
>                 if (plist_node_empty(&next->list)) {
>                         if (!plist_node_empty(&si->list)) {
>                                 next = list_next_entry(si, list.node_list);
>                                 continue;
>                         }
>                         return false;
>                 }

Thank you for the suggestion :D
I agree it could be an improvement in some cases. Personally, I feel the
current code works fine, and from a readability perspective the current
approach might be a bit clearer. It also seems the alternative would only
make a difference in very minor cases
(an order-0 allocation, a swap failure, and a swapoff during this routine).

Youngjun Park



* Re: [PATCH 1/2] mm/swapfile: fix list iteration in swap_sync_discard
  2025-11-27  9:34         ` YoungJun Park
@ 2025-11-27 10:32           ` Baoquan He
  2025-11-27 10:44             ` YoungJun Park
  0 siblings, 1 reply; 13+ messages in thread
From: Baoquan He @ 2025-11-27 10:32 UTC (permalink / raw)
  To: YoungJun Park; +Cc: akpm, chrisl, kasong, shikemeng, nphamcs, baohua, linux-mm

On 11/27/25 at 06:34pm, YoungJun Park wrote:
> On Thu, Nov 27, 2025 at 04:06:56PM +0800, Baoquan He wrote:
> > On 11/27/25 at 02:42pm, YoungJun Park wrote:
> > > On Thu, Nov 27, 2025 at 10:15:50AM +0800, Baoquan He wrote:
> > > > On 11/26/25 at 01:30am, Youngjun Park wrote:
> > > > > swap_sync_discard() has an issue where if the next device becomes full
> > > > > and is removed from the plist during iteration, the operation fails
> > > > > even when other swap devices with pending discard entries remain
> > > > > available.
> > > > > 
> > > > > Fix by checking plist_node_empty(&next->list) and restarting iteration
> > > > > when the next node is removed during discard operations.
> > > > > 
> > > > > Additionally, switch from swap_avail_lock/swap_avail_head to swap_lock/
> > > > > swap_active_head. This means the iteration is only affected by swapoff
> > > > > operations rather than frequent availability changes, reducing
> > > > > exceptional condition checks and lock contention.
> > > > > 
> > > > > Fixes: 686ea517f471 ("mm, swap: do not perform synchronous discard during allocation")
> > > > > Suggested-by: Kairui Song <kasong@tencent.com>
> > > > > Signed-off-by: Youngjun Park <youngjun.park@lge.com>
> > > > > ---
> > > > >  mm/swapfile.c | 18 +++++++++++-------
> > > > >  1 file changed, 11 insertions(+), 7 deletions(-)
> > > > > 
> > > > > diff --git a/mm/swapfile.c b/mm/swapfile.c
> > > > > index d12332423a06..998271aa09c3 100644
> > > > > --- a/mm/swapfile.c
> > > > > +++ b/mm/swapfile.c
> > > > > @@ -1387,21 +1387,25 @@ static bool swap_sync_discard(void)
> > > > >  	bool ret = false;
> > > > >  	struct swap_info_struct *si, *next;
> > > > >  
> > > > > -	spin_lock(&swap_avail_lock);
> > > > > -	plist_for_each_entry_safe(si, next, &swap_avail_head, avail_list) {
> > > > > -		spin_unlock(&swap_avail_lock);
> > > > > +	spin_lock(&swap_lock);
> > > > > +start_over:
> > > > > +	plist_for_each_entry_safe(si, next, &swap_active_head, list) {
> > > > > +		spin_unlock(&swap_lock);
> > > > >  		if (get_swap_device_info(si)) {
> > > > >  			if (si->flags & SWP_PAGE_DISCARD)
> > > > >  				ret = swap_do_scheduled_discard(si);
> > > > >  			put_swap_device(si);
> > > > >  		}
> > > > >  		if (ret)
> > > > > -			return true;
> > > > > -		spin_lock(&swap_avail_lock);
> > > > > +			return ret;
> > > > > +
> > > > > +		spin_lock(&swap_lock);
> > > > > +		if (plist_node_empty(&next->list))
> > > > > +			goto start_over;
> > > 
> > > By forcing a brief delay right before the swap_lock, I was able to observe at
> > > runtime that when the next node is removed (due to swapoff), and there is no
> > > plist_node_empty check, plist_del makes the node point to itself. As a result,
> > > when the iteration continues to the next entry, it keeps retrying on itself,
> > > since the list traversal termination condition is based on whether the current
> > > node is the head or not.
> > > 
> > > At first glance, I had assumed that plist_node_empty also implicitly served as
> > > a termination condition of plist_for_each_entry_safe.
> > > 
> > > Therefore, the real reason for this patch is not:
> > >     "swap_sync_discard() has an issue where if the next device becomes full
> > >     and is removed from the plist during iteration, the operation fails even
> > >     when other swap devices with pending discard entries remain available."
> > > but rather:
> > >     "When the next node is removed, the next pointer loops back to the current
> > >     entry, possibly causing a loop until it is reinserted on the list."
> > > 
> > > So, the plist_node_empty check is necessary — either as it is now (not the original
> > > code, the patch I modified) or as a break condition 
> > > (if we want to avoid the swap on/off loop situation I mentioned in my previous email.)
> > 
> > OK, I only thought of swap on/off case, didn't think much. As you
> > analyzed, the plist_node_empty check is necessary. So this patch looks
> > good to me. Or one alternative way is fetching the new next? Not strong
> > opinion though.
> > 
> >                 if (plist_node_empty(&next->list)) {
> >                         if (!plist_node_empty(&si->list)) {
> >                                 next = list_next_entry(si, list.node_list);
> >                                 continue;
> >                         }
> >                         return false;
> >                 }
> 
> Thank you for the suggestion :D
> I agree it could be an improvement in some cases.
> Personally, I feel the current code works fine,
> and from a readability perspective, the current approach might be a bit clearer.
> It also seems that the alternative would only make a difference in very minor cases
> (order 0, swap failure, and swapoff occurring during this routine).

Agree. Will you post v2 to update the patch log? I would like to add my
reviewing tag if no v2 is planned.




* Re: [PATCH 1/2] mm/swapfile: fix list iteration in swap_sync_discard
  2025-11-27 10:32           ` Baoquan He
@ 2025-11-27 10:44             ` YoungJun Park
  2025-11-27 10:50               ` Baoquan He
  0 siblings, 1 reply; 13+ messages in thread
From: YoungJun Park @ 2025-11-27 10:44 UTC (permalink / raw)
  To: Baoquan He; +Cc: akpm, chrisl, kasong, shikemeng, nphamcs, baohua, linux-mm

On Thu, Nov 27, 2025 at 06:32:53PM +0800, Baoquan He wrote:
> On 11/27/25 at 06:34pm, YoungJun Park wrote:
> > On Thu, Nov 27, 2025 at 04:06:56PM +0800, Baoquan He wrote:
> > > On 11/27/25 at 02:42pm, YoungJun Park wrote:
> > > > On Thu, Nov 27, 2025 at 10:15:50AM +0800, Baoquan He wrote:
> > > > > On 11/26/25 at 01:30am, Youngjun Park wrote:
> > > > > > swap_sync_discard() has an issue where if the next device becomes full
> > > > > > and is removed from the plist during iteration, the operation fails
> > > > > > even when other swap devices with pending discard entries remain
> > > > > > available.
> > > > > > 
> > > > > > Fix by checking plist_node_empty(&next->list) and restarting iteration
> > > > > > when the next node is removed during discard operations.
> > > > > > 
> > > > > > Additionally, switch from swap_avail_lock/swap_avail_head to swap_lock/
> > > > > > swap_active_head. This means the iteration is only affected by swapoff
> > > > > > operations rather than frequent availability changes, reducing
> > > > > > exceptional condition checks and lock contention.
> > > > > > 
> > > > > > Fixes: 686ea517f471 ("mm, swap: do not perform synchronous discard during allocation")
> > > > > > Suggested-by: Kairui Song <kasong@tencent.com>
> > > > > > Signed-off-by: Youngjun Park <youngjun.park@lge.com>
> > > > > > ---
> > > > > >  mm/swapfile.c | 18 +++++++++++-------
> > > > > >  1 file changed, 11 insertions(+), 7 deletions(-)
> > > > > > 
> > > > > > diff --git a/mm/swapfile.c b/mm/swapfile.c
> > > > > > index d12332423a06..998271aa09c3 100644
> > > > > > --- a/mm/swapfile.c
> > > > > > +++ b/mm/swapfile.c
> > > > > > @@ -1387,21 +1387,25 @@ static bool swap_sync_discard(void)
> > > > > >  	bool ret = false;
> > > > > >  	struct swap_info_struct *si, *next;
> > > > > >  
> > > > > > -	spin_lock(&swap_avail_lock);
> > > > > > -	plist_for_each_entry_safe(si, next, &swap_avail_head, avail_list) {
> > > > > > -		spin_unlock(&swap_avail_lock);
> > > > > > +	spin_lock(&swap_lock);
> > > > > > +start_over:
> > > > > > +	plist_for_each_entry_safe(si, next, &swap_active_head, list) {
> > > > > > +		spin_unlock(&swap_lock);
> > > > > >  		if (get_swap_device_info(si)) {
> > > > > >  			if (si->flags & SWP_PAGE_DISCARD)
> > > > > >  				ret = swap_do_scheduled_discard(si);
> > > > > >  			put_swap_device(si);
> > > > > >  		}
> > > > > >  		if (ret)
> > > > > > -			return true;
> > > > > > -		spin_lock(&swap_avail_lock);
> > > > > > +			return ret;
> > > > > > +
> > > > > > +		spin_lock(&swap_lock);
> > > > > > +		if (plist_node_empty(&next->list))
> > > > > > +			goto start_over;
> > > > 
> > > > By forcing a brief delay right before the swap_lock, I was able to observe at
> > > > runtime that when the next node is removed (due to swapoff), and there is no
> > > > plist_node_empty check, plist_del makes the node point to itself. As a result,
> > > > when the iteration continues to the next entry, it keeps retrying on itself,
> > > > since the list traversal termination condition is based on whether the current
> > > > node is the head or not.
> > > > 
> > > > At first glance, I had assumed that plist_node_empty also implicitly served as
> > > > a termination condition of plist_for_each_entry_safe.
> > > > 
> > > > Therefore, the real reason for this patch is not:
> > > >     "swap_sync_discard() has an issue where if the next device becomes full
> > > >     and is removed from the plist during iteration, the operation fails even
> > > >     when other swap devices with pending discard entries remain available."
> > > > but rather:
> > > >     "When the next node is removed, the next pointer loops back to the current
> > > >     entry, possibly causing a loop until it is reinserted on the list."
> > > > 
> > > > So, the plist_node_empty check is necessary — either as it is now (not the original
> > > > code, the patch I modified) or as a break condition 
> > > > (if we want to avoid the swap on/off loop situation I mentioned in my previous email.)
> > > 
> > > OK, I only thought of swap on/off case, didn't think much. As you
> > > analyzed, the plist_node_empty check is necessary. So this patch looks
> > > good to me. Or one alternative way is fetching the new next? Not strong
> > > opinion though.
> > > 
> > >                 if (plist_node_empty(&next->list)) {
> > >                         if (!plist_node_empty(&si->list)) {
> > >                                 next = list_next_entry(si, list.node_list);
> > >                                 continue;
> > >                         }
> > >                         return false;
> > >                 }
> > 
> > Thank you for the suggestion :D
> > I agree it could be an improvement in some cases.
> > Personally, I feel the current code works fine,
> > and from a readability perspective, the current approach might be a bit clearer.
> > It also seems that the alternative would only make a difference in very minor cases
> > (order 0, swap failure, and swapoff occurring during this routine).
> 
> Agree. Will you post v2 to update the patch log? I would like to add my
> reviewing tag if no v2 is planned.

Oops, I’ve just posted v2 to update the patch log.
Link: https://lore.kernel.org/linux-mm/20251127100303.783198-1-youngjun.park@lge.com/T/#m920503bf9bac0d35bd2c8467a926481e58d7ab53



* Re: [PATCH 1/2] mm/swapfile: fix list iteration in swap_sync_discard
  2025-11-27 10:44             ` YoungJun Park
@ 2025-11-27 10:50               ` Baoquan He
  0 siblings, 0 replies; 13+ messages in thread
From: Baoquan He @ 2025-11-27 10:50 UTC (permalink / raw)
  To: YoungJun Park; +Cc: akpm, chrisl, kasong, shikemeng, nphamcs, baohua, linux-mm

On 11/27/25 at 07:44pm, YoungJun Park wrote:
> On Thu, Nov 27, 2025 at 06:32:53PM +0800, Baoquan He wrote:
> > On 11/27/25 at 06:34pm, YoungJun Park wrote:
> > > On Thu, Nov 27, 2025 at 04:06:56PM +0800, Baoquan He wrote:
> > > > On 11/27/25 at 02:42pm, YoungJun Park wrote:
> > > > > On Thu, Nov 27, 2025 at 10:15:50AM +0800, Baoquan He wrote:
> > > > > > On 11/26/25 at 01:30am, Youngjun Park wrote:
> > > > > > > swap_sync_discard() has an issue where if the next device becomes full
> > > > > > > and is removed from the plist during iteration, the operation fails
> > > > > > > even when other swap devices with pending discard entries remain
> > > > > > > available.
> > > > > > > 
> > > > > > > Fix by checking plist_node_empty(&next->list) and restarting iteration
> > > > > > > when the next node is removed during discard operations.
> > > > > > > 
> > > > > > > Additionally, switch from swap_avail_lock/swap_avail_head to swap_lock/
> > > > > > > swap_active_head. This means the iteration is only affected by swapoff
> > > > > > > operations rather than frequent availability changes, reducing
> > > > > > > exceptional condition checks and lock contention.
> > > > > > > 
> > > > > > > Fixes: 686ea517f471 ("mm, swap: do not perform synchronous discard during allocation")
> > > > > > > Suggested-by: Kairui Song <kasong@tencent.com>
> > > > > > > Signed-off-by: Youngjun Park <youngjun.park@lge.com>
> > > > > > > ---
> > > > > > >  mm/swapfile.c | 18 +++++++++++-------
> > > > > > >  1 file changed, 11 insertions(+), 7 deletions(-)
> > > > > > > 
> > > > > > > diff --git a/mm/swapfile.c b/mm/swapfile.c
> > > > > > > index d12332423a06..998271aa09c3 100644
> > > > > > > --- a/mm/swapfile.c
> > > > > > > +++ b/mm/swapfile.c
> > > > > > > @@ -1387,21 +1387,25 @@ static bool swap_sync_discard(void)
> > > > > > >  	bool ret = false;
> > > > > > >  	struct swap_info_struct *si, *next;
> > > > > > >  
> > > > > > > -	spin_lock(&swap_avail_lock);
> > > > > > > -	plist_for_each_entry_safe(si, next, &swap_avail_head, avail_list) {
> > > > > > > -		spin_unlock(&swap_avail_lock);
> > > > > > > +	spin_lock(&swap_lock);
> > > > > > > +start_over:
> > > > > > > +	plist_for_each_entry_safe(si, next, &swap_active_head, list) {
> > > > > > > +		spin_unlock(&swap_lock);
> > > > > > >  		if (get_swap_device_info(si)) {
> > > > > > >  			if (si->flags & SWP_PAGE_DISCARD)
> > > > > > >  				ret = swap_do_scheduled_discard(si);
> > > > > > >  			put_swap_device(si);
> > > > > > >  		}
> > > > > > >  		if (ret)
> > > > > > > -			return true;
> > > > > > > -		spin_lock(&swap_avail_lock);
> > > > > > > +			return ret;
> > > > > > > +
> > > > > > > +		spin_lock(&swap_lock);
> > > > > > > +		if (plist_node_empty(&next->list))
> > > > > > > +			goto start_over;
> > > > > 
> > > > > By forcing a brief delay right before the swap_lock, I was able to observe at
> > > > > runtime that when the next node is removed (due to swapoff), and there is no
> > > > > plist_node_empty check, plist_del makes the node point to itself. As a result,
> > > > > when the iteration continues to the next entry, it keeps retrying on itself,
> > > > > since the list traversal termination condition is based on whether the current
> > > > > node is the head or not.
> > > > > 
> > > > > At first glance, I had assumed that plist_node_empty also implicitly served as
> > > > > a termination condition of plist_for_each_entry_safe.
> > > > > 
> > > > > Therefore, the real reason for this patch is not:
> > > > >     "swap_sync_discard() has an issue where if the next device becomes full
> > > > >     and is removed from the plist during iteration, the operation fails even
> > > > >     when other swap devices with pending discard entries remain available."
> > > > > but rather:
> > > > >     "When the next node is removed, the next pointer loops back to the current
> > > > >     entry, possibly causing a loop until it is reinserted on the list."
> > > > > 
> > > > > So, the plist_node_empty check is necessary — either as it is now (not the original
> > > > > code, the patch I modified) or as a break condition 
> > > > > (if we want to avoid the swap on/off loop situation I mentioned in my previous email.)
> > > > 
> > > > OK, I only thought of swap on/off case, didn't think much. As you
> > > > analyzed, the plist_node_empty check is necessary. So this patch looks
> > > > good to me. Or one alternative way is fetching the new next? Not strong
> > > > opinion though.
> > > > 
> > > >                 if (plist_node_empty(&next->list)) {
> > > >                         if (!plist_node_empty(&si->list)) {
> > > >                                 next = list_next_entry(si, list.node_list);
> > > >                                 continue;
> > > >                         }
> > > >                         return false;
> > > >                 }
> > > 
> > > Thank you for the suggestion :D
> > > I agree it could be an improvement in some cases.
> > > Personally, I feel the current code works fine,
> > > and from a readability perspective, the current approach might be a bit clearer.
> > > It also seems that the alternative would only make a difference in very minor cases
> > > (order 0, swap failure, and swapoff occurring during this routine).
> > 
> > Agree. Will you post v2 to update the patch log? I would like to add my
> > reviewing tag if no v2 is planned.
> 
> Oops, I’ve just posted v2 to update the patch log.
> Link: https://lore.kernel.org/linux-mm/20251127100303.783198-1-youngjun.park@lge.com/T/#m920503bf9bac0d35bd2c8467a926481e58d7ab53

Saw it, thanks.




end of thread, other threads:[~2025-11-27 10:50 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-11-25 16:30 [PATCH 0/2] mm/swapfile: fix and cleanup swap list iterations Youngjun Park
2025-11-25 16:30 ` [PATCH 1/2] mm/swapfile: fix list iteration in swap_sync_discard Youngjun Park
2025-11-26 18:23   ` Kairui Song
2025-11-27  2:22     ` YoungJun Park
2025-11-27  2:15   ` Baoquan He
2025-11-27  2:54     ` YoungJun Park
2025-11-27  5:42     ` YoungJun Park
2025-11-27  8:06       ` Baoquan He
2025-11-27  9:34         ` YoungJun Park
2025-11-27 10:32           ` Baoquan He
2025-11-27 10:44             ` YoungJun Park
2025-11-27 10:50               ` Baoquan He
2025-11-25 16:30 ` [PATCH 2/2] mm/swapfile: use plist_for_each_entry in __folio_throttle_swaprate Youngjun Park

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox