* [PATCH 0/2] mm/swapfile: fix and cleanup swap list iterations @ 2025-11-25 16:30 Youngjun Park
  2025-11-25 16:30 ` [PATCH 1/2] mm/swapfile: fix list iteration in swap_sync_discard Youngjun Park
  2025-11-25 16:30 ` [PATCH 2/2] mm/swapfile: use plist_for_each_entry in __folio_throttle_swaprate Youngjun Park

From: Youngjun Park @ 2025-11-25 16:30 UTC
To: akpm, chrisl, kasong
Cc: shikemeng, nphamcs, bhe, baohua, youngjun.park, linux-mm

This series addresses an issue in swap_sync_discard() where list
iteration can fail when devices are removed, and includes a cleanup
for __folio_throttle_swaprate().

The issue was identified during discussion at:
https://lore.kernel.org/linux-mm/aR53i1G45XAqCzBd@yjaykim-PowerEdge-T330/

Youngjun Park (2):
  mm/swapfile: fix list iteration in swap_sync_discard
  mm/swapfile: use plist_for_each_entry in __folio_throttle_swaprate

 mm/swapfile.c | 23 +++++++++++++---------
 1 file changed, 13 insertions(+), 10 deletions(-)


base-commit: 624c4d7b8fac3fdf9894a5f89f84709771db4dbb
--
2.34.1

Best regards,
--
Youngjun Park
* [PATCH 1/2] mm/swapfile: fix list iteration in swap_sync_discard
From: Youngjun Park @ 2025-11-25 16:30 UTC
To: akpm, chrisl, kasong
Cc: shikemeng, nphamcs, bhe, baohua, youngjun.park, linux-mm

swap_sync_discard() has an issue where if the next device becomes full
and is removed from the plist during iteration, the operation fails
even when other swap devices with pending discard entries remain
available.

Fix by checking plist_node_empty(&next->list) and restarting iteration
when the next node is removed during discard operations.

Additionally, switch from swap_avail_lock/swap_avail_head to swap_lock/
swap_active_head. This means the iteration is only affected by swapoff
operations rather than frequent availability changes, reducing
exceptional condition checks and lock contention.

Fixes: 686ea517f471 ("mm, swap: do not perform synchronous discard during allocation")
Suggested-by: Kairui Song <kasong@tencent.com>
Signed-off-by: Youngjun Park <youngjun.park@lge.com>
---
 mm/swapfile.c | 18 +++++++++++-------
 1 file changed, 11 insertions(+), 7 deletions(-)

diff --git a/mm/swapfile.c b/mm/swapfile.c
index d12332423a06..998271aa09c3 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -1387,21 +1387,25 @@ static bool swap_sync_discard(void)
 	bool ret = false;
 	struct swap_info_struct *si, *next;
 
-	spin_lock(&swap_avail_lock);
-	plist_for_each_entry_safe(si, next, &swap_avail_head, avail_list) {
-		spin_unlock(&swap_avail_lock);
+	spin_lock(&swap_lock);
+start_over:
+	plist_for_each_entry_safe(si, next, &swap_active_head, list) {
+		spin_unlock(&swap_lock);
 		if (get_swap_device_info(si)) {
 			if (si->flags & SWP_PAGE_DISCARD)
 				ret = swap_do_scheduled_discard(si);
 			put_swap_device(si);
 		}
 		if (ret)
-			return true;
-		spin_lock(&swap_avail_lock);
+			return ret;
+
+		spin_lock(&swap_lock);
+		if (plist_node_empty(&next->list))
+			goto start_over;
 	}
-	spin_unlock(&swap_avail_lock);
+	spin_unlock(&swap_lock);
 
-	return false;
+	return ret;
 }
 
 /**
--
2.34.1
* Re: [PATCH 1/2] mm/swapfile: fix list iteration in swap_sync_discard
From: Kairui Song @ 2025-11-26 18:23 UTC
To: Youngjun Park
Cc: akpm, chrisl, shikemeng, nphamcs, bhe, baohua, linux-mm

On Wed, Nov 26, 2025 at 12:31 AM Youngjun Park <youngjun.park@lge.com> wrote:
>
> swap_sync_discard() has an issue where if the next device becomes full
> and is removed from the plist during iteration, the operation fails
> even when other swap devices with pending discard entries remain
> available.
>
> Fix by checking plist_node_empty(&next->list) and restarting iteration
> when the next node is removed during discard operations.
>
> Additionally, switch from swap_avail_lock/swap_avail_head to swap_lock/
> swap_active_head. This means the iteration is only affected by swapoff
> operations rather than frequent availability changes, reducing
> exceptional condition checks and lock contention.
>
> Fixes: 686ea517f471 ("mm, swap: do not perform synchronous discard during allocation")
> Suggested-by: Kairui Song <kasong@tencent.com>
> Signed-off-by: Youngjun Park <youngjun.park@lge.com>
> ---
>  mm/swapfile.c | 18 +++++++++++-------
>  1 file changed, 11 insertions(+), 7 deletions(-)
>
> diff --git a/mm/swapfile.c b/mm/swapfile.c
> index d12332423a06..998271aa09c3 100644
> --- a/mm/swapfile.c
> +++ b/mm/swapfile.c
> @@ -1387,21 +1387,25 @@ static bool swap_sync_discard(void)
>  	bool ret = false;
>  	struct swap_info_struct *si, *next;
>
> -	spin_lock(&swap_avail_lock);
> -	plist_for_each_entry_safe(si, next, &swap_avail_head, avail_list) {
> -		spin_unlock(&swap_avail_lock);
> +	spin_lock(&swap_lock);
> +start_over:
> +	plist_for_each_entry_safe(si, next, &swap_active_head, list) {
> +		spin_unlock(&swap_lock);
>  		if (get_swap_device_info(si)) {
>  			if (si->flags & SWP_PAGE_DISCARD)
>  				ret = swap_do_scheduled_discard(si);
>  			put_swap_device(si);
>  		}
>  		if (ret)
> -			return true;
> -		spin_lock(&swap_avail_lock);
> +			return ret;

Nit: I'd prefer to avoid unnecessary changes like this, `return true`
is the same thing. No need to send an update for this though, a really
trivial issue.

> +
> +		spin_lock(&swap_lock);
> +		if (plist_node_empty(&next->list))
> +			goto start_over;
>  	}
> -	spin_unlock(&swap_avail_lock);
> +	spin_unlock(&swap_lock);
>
> -	return false;
> +	return ret;
>  }
>
>  /**
> --
> 2.34.1

Thanks!

Acked-by: Kairui Song <kasong@tencent.com>
* Re: [PATCH 1/2] mm/swapfile: fix list iteration in swap_sync_discard
From: YoungJun Park @ 2025-11-27 2:22 UTC
To: Kairui Song
Cc: akpm, chrisl, shikemeng, nphamcs, bhe, baohua, linux-mm

On Thu, Nov 27, 2025 at 02:23:05AM +0800, Kairui Song wrote:
> On Wed, Nov 26, 2025 at 12:31 AM Youngjun Park <youngjun.park@lge.com> wrote:
> >
> > swap_sync_discard() has an issue where if the next device becomes full
> > and is removed from the plist during iteration, the operation fails
> > even when other swap devices with pending discard entries remain
> > available.
> >
> > Fix by checking plist_node_empty(&next->list) and restarting iteration
> > when the next node is removed during discard operations.
> >
> > Additionally, switch from swap_avail_lock/swap_avail_head to swap_lock/
> > swap_active_head. This means the iteration is only affected by swapoff
> > operations rather than frequent availability changes, reducing
> > exceptional condition checks and lock contention.
> >
> > Fixes: 686ea517f471 ("mm, swap: do not perform synchronous discard during allocation")
> > Suggested-by: Kairui Song <kasong@tencent.com>
> > Signed-off-by: Youngjun Park <youngjun.park@lge.com>
> > ---
> >  mm/swapfile.c | 18 +++++++++++-------
> >  1 file changed, 11 insertions(+), 7 deletions(-)
> >
> > diff --git a/mm/swapfile.c b/mm/swapfile.c
> > index d12332423a06..998271aa09c3 100644
> > --- a/mm/swapfile.c
> > +++ b/mm/swapfile.c
> > @@ -1387,21 +1387,25 @@ static bool swap_sync_discard(void)
> >  	bool ret = false;
> >  	struct swap_info_struct *si, *next;
> >
> > -	spin_lock(&swap_avail_lock);
> > -	plist_for_each_entry_safe(si, next, &swap_avail_head, avail_list) {
> > -		spin_unlock(&swap_avail_lock);
> > +	spin_lock(&swap_lock);
> > +start_over:
> > +	plist_for_each_entry_safe(si, next, &swap_active_head, list) {
> > +		spin_unlock(&swap_lock);
> >  		if (get_swap_device_info(si)) {
> >  			if (si->flags & SWP_PAGE_DISCARD)
> >  				ret = swap_do_scheduled_discard(si);
> >  			put_swap_device(si);
> >  		}
> >  		if (ret)
> > -			return true;
> > -		spin_lock(&swap_avail_lock);
> > +			return ret;
>
> Nit: I'd prefer to avoid unnecessary changes like this, `return true`
> is the same thing. No need to send an update for this though, a really
> trivial issue.

Okay, I will revert it to the original code.

> > +
> > +		spin_lock(&swap_lock);
> > +		if (plist_node_empty(&next->list))
> > +			goto start_over;
> >  	}
> > -	spin_unlock(&swap_avail_lock);
> > +	spin_unlock(&swap_lock);
> >
> > -	return false;
> > +	return ret;
> >  }
> >
> >  /**
> > --
> > 2.34.1
>
> Thanks!
>
> Acked-by: Kairui Song <kasong@tencent.com>

Thank you
Youngjun Park
* Re: [PATCH 1/2] mm/swapfile: fix list iteration in swap_sync_discard
From: Baoquan He @ 2025-11-27 2:15 UTC
To: Youngjun Park
Cc: akpm, chrisl, kasong, shikemeng, nphamcs, baohua, linux-mm

On 11/26/25 at 01:30am, Youngjun Park wrote:
> swap_sync_discard() has an issue where if the next device becomes full
> and is removed from the plist during iteration, the operation fails
> even when other swap devices with pending discard entries remain
> available.
>
> Fix by checking plist_node_empty(&next->list) and restarting iteration
> when the next node is removed during discard operations.
>
> Additionally, switch from swap_avail_lock/swap_avail_head to swap_lock/
> swap_active_head. This means the iteration is only affected by swapoff
> operations rather than frequent availability changes, reducing
> exceptional condition checks and lock contention.
>
> Fixes: 686ea517f471 ("mm, swap: do not perform synchronous discard during allocation")
> Suggested-by: Kairui Song <kasong@tencent.com>
> Signed-off-by: Youngjun Park <youngjun.park@lge.com>
> ---
>  mm/swapfile.c | 18 +++++++++++-------
>  1 file changed, 11 insertions(+), 7 deletions(-)
>
> diff --git a/mm/swapfile.c b/mm/swapfile.c
> index d12332423a06..998271aa09c3 100644
> --- a/mm/swapfile.c
> +++ b/mm/swapfile.c
> @@ -1387,21 +1387,25 @@ static bool swap_sync_discard(void)
>  	bool ret = false;
>  	struct swap_info_struct *si, *next;
>
> -	spin_lock(&swap_avail_lock);
> -	plist_for_each_entry_safe(si, next, &swap_avail_head, avail_list) {
> -		spin_unlock(&swap_avail_lock);
> +	spin_lock(&swap_lock);
> +start_over:
> +	plist_for_each_entry_safe(si, next, &swap_active_head, list) {
> +		spin_unlock(&swap_lock);
>  		if (get_swap_device_info(si)) {
>  			if (si->flags & SWP_PAGE_DISCARD)
>  				ret = swap_do_scheduled_discard(si);
>  			put_swap_device(si);
>  		}
>  		if (ret)
> -			return true;
> -		spin_lock(&swap_avail_lock);
> +			return ret;
> +
> +		spin_lock(&swap_lock);
> +		if (plist_node_empty(&next->list))
> +			goto start_over;

If there are many si with the same priority, or there are several si
spread in different memcgs when swap.tier is available, are we going to
keep looping here to start over and over again? The old code is
supposed to go through the plist to do one round of discarding?

Not sure if I got the code wrong, or the chance is very tiny.

Thanks
Baoquan

>  	}
> -	spin_unlock(&swap_avail_lock);
> +	spin_unlock(&swap_lock);
>
> -	return false;
> +	return ret;
>  }
>
>  /**
> --
> 2.34.1
* Re: [PATCH 1/2] mm/swapfile: fix list iteration in swap_sync_discard
From: YoungJun Park @ 2025-11-27 2:54 UTC
To: Baoquan He
Cc: akpm, chrisl, kasong, shikemeng, nphamcs, baohua, linux-mm

On Thu, Nov 27, 2025 at 10:15:50AM +0800, Baoquan He wrote:
> On 11/26/25 at 01:30am, Youngjun Park wrote:
> > swap_sync_discard() has an issue where if the next device becomes full
> > and is removed from the plist during iteration, the operation fails
> > even when other swap devices with pending discard entries remain
> > available.
> >
> > Fix by checking plist_node_empty(&next->list) and restarting iteration
> > when the next node is removed during discard operations.
> >
> > Additionally, switch from swap_avail_lock/swap_avail_head to swap_lock/
> > swap_active_head. This means the iteration is only affected by swapoff
> > operations rather than frequent availability changes, reducing
> > exceptional condition checks and lock contention.
> >
> > Fixes: 686ea517f471 ("mm, swap: do not perform synchronous discard during allocation")
> > Suggested-by: Kairui Song <kasong@tencent.com>
> > Signed-off-by: Youngjun Park <youngjun.park@lge.com>
> > ---
> >  mm/swapfile.c | 18 +++++++++++-------
> >  1 file changed, 11 insertions(+), 7 deletions(-)
> >
> > diff --git a/mm/swapfile.c b/mm/swapfile.c
> > index d12332423a06..998271aa09c3 100644
> > --- a/mm/swapfile.c
> > +++ b/mm/swapfile.c
> > @@ -1387,21 +1387,25 @@ static bool swap_sync_discard(void)
> >  	bool ret = false;
> >  	struct swap_info_struct *si, *next;
> >
> > -	spin_lock(&swap_avail_lock);
> > -	plist_for_each_entry_safe(si, next, &swap_avail_head, avail_list) {
> > -		spin_unlock(&swap_avail_lock);
> > +	spin_lock(&swap_lock);
> > +start_over:
> > +	plist_for_each_entry_safe(si, next, &swap_active_head, list) {
> > +		spin_unlock(&swap_lock);
> >  		if (get_swap_device_info(si)) {
> >  			if (si->flags & SWP_PAGE_DISCARD)
> >  				ret = swap_do_scheduled_discard(si);
> >  			put_swap_device(si);
> >  		}
> >  		if (ret)
> > -			return true;
> > -		spin_lock(&swap_avail_lock);
> > +			return ret;
> > +
> > +		spin_lock(&swap_lock);
> > +		if (plist_node_empty(&next->list))
> > +			goto start_over;
>
> If there are many si with the same priority, or there are several si

Is this because of the requeue that happens while iterating over
`swap_avail_head`? But a requeue does not make the node empty. Also,
since we are iterating over `swap_active_head`, it seems like it
wouldn't happen.

> spread in different memcgs when swap.tier is available, are we going to
> keep looping here to start over and over again?

I don't think a loop can happen here for that reason. But a loop can
possibly happen between swap_alloc_slow and swap_sync_discard.

If `swap.tier` is applied, I think you're referring to the situation
where `si`s not belonging to the current tier are discarded
successfully, and then the next iteration goes through the available
list again for the swap devices in the same tier.

As you mentioned, a needless looping situation could occur (if discards
accumulate very quickly, it could even lead to an infinite loop). If
`swap.tier` is applied, this part may also need to be modified.

> The old code is supposed to go through the plist to do one round of discarding?

After your review, I thought more about it: if continuous swap on/off
occurs while the `swap_lock` is released, it seems that we could keep
hitting `plist_node_empty`. However, I think this case is very
unlikely, so it shouldn't be a problem. Actually, swap_alloc_slow
already works that way. What do you think?

In the old code, if a swapoff occurs and swap usage becomes zero,
causing the device to be removed from the `avail_list`, it ends up
doing one round of discarding.

If we don't like the idea of looping due to continuous swap on/off, we
could consider adding a retry count or removing the `plist_node_empty`
check.

> Not sure if I got the code wrong, or the chance is very tiny.
> Thanks
> Baoquan

I answered based on my understanding, but please correct me if I
misunderstood your point. Thanks for the review.

Youngjun Park
* Re: [PATCH 1/2] mm/swapfile: fix list iteration in swap_sync_discard
From: YoungJun Park @ 2025-11-27 5:42 UTC
To: Baoquan He
Cc: akpm, chrisl, kasong, shikemeng, nphamcs, baohua, linux-mm

On Thu, Nov 27, 2025 at 10:15:50AM +0800, Baoquan He wrote:
> On 11/26/25 at 01:30am, Youngjun Park wrote:
> > swap_sync_discard() has an issue where if the next device becomes full
> > and is removed from the plist during iteration, the operation fails
> > even when other swap devices with pending discard entries remain
> > available.
> >
> > Fix by checking plist_node_empty(&next->list) and restarting iteration
> > when the next node is removed during discard operations.
> >
> > Additionally, switch from swap_avail_lock/swap_avail_head to swap_lock/
> > swap_active_head. This means the iteration is only affected by swapoff
> > operations rather than frequent availability changes, reducing
> > exceptional condition checks and lock contention.
> >
> > Fixes: 686ea517f471 ("mm, swap: do not perform synchronous discard during allocation")
> > Suggested-by: Kairui Song <kasong@tencent.com>
> > Signed-off-by: Youngjun Park <youngjun.park@lge.com>
> > ---
> >  mm/swapfile.c | 18 +++++++++++-------
> >  1 file changed, 11 insertions(+), 7 deletions(-)
> >
> > diff --git a/mm/swapfile.c b/mm/swapfile.c
> > index d12332423a06..998271aa09c3 100644
> > --- a/mm/swapfile.c
> > +++ b/mm/swapfile.c
> > @@ -1387,21 +1387,25 @@ static bool swap_sync_discard(void)
> >  	bool ret = false;
> >  	struct swap_info_struct *si, *next;
> >
> > -	spin_lock(&swap_avail_lock);
> > -	plist_for_each_entry_safe(si, next, &swap_avail_head, avail_list) {
> > -		spin_unlock(&swap_avail_lock);
> > +	spin_lock(&swap_lock);
> > +start_over:
> > +	plist_for_each_entry_safe(si, next, &swap_active_head, list) {
> > +		spin_unlock(&swap_lock);
> >  		if (get_swap_device_info(si)) {
> >  			if (si->flags & SWP_PAGE_DISCARD)
> >  				ret = swap_do_scheduled_discard(si);
> >  			put_swap_device(si);
> >  		}
> >  		if (ret)
> > -			return true;
> > -		spin_lock(&swap_avail_lock);
> > +			return ret;
> > +
> > +		spin_lock(&swap_lock);
> > +		if (plist_node_empty(&next->list))
> > +			goto start_over;

By forcing a brief delay right before the swap_lock, I was able to observe at
runtime that when the next node is removed (due to swapoff) and there is no
plist_node_empty check, plist_del makes the node point to itself. As a result,
when the iteration continues to the next entry, it keeps retrying on itself,
since the list traversal termination condition is based on whether the current
node is the head or not.

At first glance, I had assumed that plist_node_empty also implicitly served as
a termination condition of plist_for_each_entry_safe.

Therefore, the real reason for this patch is not:
"swap_sync_discard() has an issue where if the next device becomes full
and is removed from the plist during iteration, the operation fails even
when other swap devices with pending discard entries remain available."
but rather:
"When the next node is removed, the next pointer loops back to the current
entry, possibly causing a loop until it is reinserted into the list."

So the plist_node_empty check is necessary, either as it is now (not the
original code, but the patch as I modified it) or as a break condition
(if we want to avoid the swap on/off loop situation I mentioned in my
previous email).

Best regards,
Youngjun Park
* Re: [PATCH 1/2] mm/swapfile: fix list iteration in swap_sync_discard
From: Baoquan He @ 2025-11-27 8:06 UTC
To: YoungJun Park
Cc: akpm, chrisl, kasong, shikemeng, nphamcs, baohua, linux-mm

On 11/27/25 at 02:42pm, YoungJun Park wrote:
> On Thu, Nov 27, 2025 at 10:15:50AM +0800, Baoquan He wrote:
> > On 11/26/25 at 01:30am, Youngjun Park wrote:
> > > swap_sync_discard() has an issue where if the next device becomes full
> > > and is removed from the plist during iteration, the operation fails
> > > even when other swap devices with pending discard entries remain
> > > available.
> > >
> > > Fix by checking plist_node_empty(&next->list) and restarting iteration
> > > when the next node is removed during discard operations.
> > >
> > > Additionally, switch from swap_avail_lock/swap_avail_head to swap_lock/
> > > swap_active_head. This means the iteration is only affected by swapoff
> > > operations rather than frequent availability changes, reducing
> > > exceptional condition checks and lock contention.
> > >
> > > Fixes: 686ea517f471 ("mm, swap: do not perform synchronous discard during allocation")
> > > Suggested-by: Kairui Song <kasong@tencent.com>
> > > Signed-off-by: Youngjun Park <youngjun.park@lge.com>
> > > ---
> > >  mm/swapfile.c | 18 +++++++++++-------
> > >  1 file changed, 11 insertions(+), 7 deletions(-)
> > >
> > > diff --git a/mm/swapfile.c b/mm/swapfile.c
> > > index d12332423a06..998271aa09c3 100644
> > > --- a/mm/swapfile.c
> > > +++ b/mm/swapfile.c
> > > @@ -1387,21 +1387,25 @@ static bool swap_sync_discard(void)
> > >  	bool ret = false;
> > >  	struct swap_info_struct *si, *next;
> > >
> > > -	spin_lock(&swap_avail_lock);
> > > -	plist_for_each_entry_safe(si, next, &swap_avail_head, avail_list) {
> > > -		spin_unlock(&swap_avail_lock);
> > > +	spin_lock(&swap_lock);
> > > +start_over:
> > > +	plist_for_each_entry_safe(si, next, &swap_active_head, list) {
> > > +		spin_unlock(&swap_lock);
> > >  		if (get_swap_device_info(si)) {
> > >  			if (si->flags & SWP_PAGE_DISCARD)
> > >  				ret = swap_do_scheduled_discard(si);
> > >  			put_swap_device(si);
> > >  		}
> > >  		if (ret)
> > > -			return true;
> > > -		spin_lock(&swap_avail_lock);
> > > +			return ret;
> > > +
> > > +		spin_lock(&swap_lock);
> > > +		if (plist_node_empty(&next->list))
> > > +			goto start_over;
>
> By forcing a brief delay right before the swap_lock, I was able to observe at
> runtime that when the next node is removed (due to swapoff) and there is no
> plist_node_empty check, plist_del makes the node point to itself. As a result,
> when the iteration continues to the next entry, it keeps retrying on itself,
> since the list traversal termination condition is based on whether the current
> node is the head or not.
>
> At first glance, I had assumed that plist_node_empty also implicitly served as
> a termination condition of plist_for_each_entry_safe.
>
> Therefore, the real reason for this patch is not:
> "swap_sync_discard() has an issue where if the next device becomes full
> and is removed from the plist during iteration, the operation fails even
> when other swap devices with pending discard entries remain available."
> but rather:
> "When the next node is removed, the next pointer loops back to the current
> entry, possibly causing a loop until it is reinserted into the list."
>
> So the plist_node_empty check is necessary, either as it is now (not the
> original code, but the patch as I modified it) or as a break condition
> (if we want to avoid the swap on/off loop situation I mentioned in my
> previous email).

OK, I only thought of the swap on/off case and didn't think it through.
As you analyzed, the plist_node_empty check is necessary, so this patch
looks good to me. Or one alternative way is fetching the new next? Not
a strong opinion though.

	if (plist_node_empty(&next->list)) {
		if (!plist_node_empty(&si->list)) {
			next = list_next_entry(si, list.node_list);
			continue;
		}
		return false;
	}
* Re: [PATCH 1/2] mm/swapfile: fix list iteration in swap_sync_discard
From: YoungJun Park @ 2025-11-27 9:34 UTC
To: Baoquan He
Cc: akpm, chrisl, kasong, shikemeng, nphamcs, baohua, linux-mm

On Thu, Nov 27, 2025 at 04:06:56PM +0800, Baoquan He wrote:
> On 11/27/25 at 02:42pm, YoungJun Park wrote:
> > On Thu, Nov 27, 2025 at 10:15:50AM +0800, Baoquan He wrote:
> > > On 11/26/25 at 01:30am, Youngjun Park wrote:
> > > > swap_sync_discard() has an issue where if the next device becomes full
> > > > and is removed from the plist during iteration, the operation fails
> > > > even when other swap devices with pending discard entries remain
> > > > available.
> > > >
> > > > Fix by checking plist_node_empty(&next->list) and restarting iteration
> > > > when the next node is removed during discard operations.
> > > >
> > > > Additionally, switch from swap_avail_lock/swap_avail_head to swap_lock/
> > > > swap_active_head. This means the iteration is only affected by swapoff
> > > > operations rather than frequent availability changes, reducing
> > > > exceptional condition checks and lock contention.
> > > >
> > > > Fixes: 686ea517f471 ("mm, swap: do not perform synchronous discard during allocation")
> > > > Suggested-by: Kairui Song <kasong@tencent.com>
> > > > Signed-off-by: Youngjun Park <youngjun.park@lge.com>
> > > > ---
> > > >  mm/swapfile.c | 18 +++++++++++-------
> > > >  1 file changed, 11 insertions(+), 7 deletions(-)
> > > >
> > > > diff --git a/mm/swapfile.c b/mm/swapfile.c
> > > > index d12332423a06..998271aa09c3 100644
> > > > --- a/mm/swapfile.c
> > > > +++ b/mm/swapfile.c
> > > > @@ -1387,21 +1387,25 @@ static bool swap_sync_discard(void)
> > > >  	bool ret = false;
> > > >  	struct swap_info_struct *si, *next;
> > > >
> > > > -	spin_lock(&swap_avail_lock);
> > > > -	plist_for_each_entry_safe(si, next, &swap_avail_head, avail_list) {
> > > > -		spin_unlock(&swap_avail_lock);
> > > > +	spin_lock(&swap_lock);
> > > > +start_over:
> > > > +	plist_for_each_entry_safe(si, next, &swap_active_head, list) {
> > > > +		spin_unlock(&swap_lock);
> > > > 		if (get_swap_device_info(si)) {
> > > > 			if (si->flags & SWP_PAGE_DISCARD)
> > > > 				ret = swap_do_scheduled_discard(si);
> > > > 			put_swap_device(si);
> > > > 		}
> > > > 		if (ret)
> > > > -			return true;
> > > > -		spin_lock(&swap_avail_lock);
> > > > +			return ret;
> > > > +
> > > > +		spin_lock(&swap_lock);
> > > > +		if (plist_node_empty(&next->list))
> > > > +			goto start_over;
> >
> > By forcing a brief delay right before the swap_lock, I was able to observe at
> > runtime that when the next node is removed (due to swapoff) and there is no
> > plist_node_empty check, plist_del makes the node point to itself. As a result,
> > when the iteration continues to the next entry, it keeps retrying on itself,
> > since the list traversal termination condition is based on whether the current
> > node is the head or not.
> >
> > At first glance, I had assumed that plist_node_empty also implicitly served as
> > a termination condition of plist_for_each_entry_safe.
> >
> > Therefore, the real reason for this patch is not:
> > "swap_sync_discard() has an issue where if the next device becomes full
> > and is removed from the plist during iteration, the operation fails even
> > when other swap devices with pending discard entries remain available."
> > but rather:
> > "When the next node is removed, the next pointer loops back to the current
> > entry, possibly causing a loop until it is reinserted into the list."
> >
> > So the plist_node_empty check is necessary, either as it is now (not the
> > original code, but the patch as I modified it) or as a break condition
> > (if we want to avoid the swap on/off loop situation I mentioned in my
> > previous email).
>
> OK, I only thought of the swap on/off case and didn't think it through.
> As you analyzed, the plist_node_empty check is necessary, so this patch
> looks good to me. Or one alternative way is fetching the new next? Not
> a strong opinion though.
>
> 	if (plist_node_empty(&next->list)) {
> 		if (!plist_node_empty(&si->list)) {
> 			next = list_next_entry(si, list.node_list);
> 			continue;
> 		}
> 		return false;
> 	}

Thank you for the suggestion :D
I agree it could be an improvement in some cases. Personally, I feel
the current code works fine, and from a readability perspective, the
current approach might be a bit clearer. It also seems that the
alternative would only make a difference in very minor cases (order 0,
a swap allocation failure, and a swapoff during this routine).

Youngjun Park
* Re: [PATCH 1/2] mm/swapfile: fix list iteration in swap_sync_discard
From: Baoquan He @ 2025-11-27 10:32 UTC
To: YoungJun Park
Cc: akpm, chrisl, kasong, shikemeng, nphamcs, baohua, linux-mm

On 11/27/25 at 06:34pm, YoungJun Park wrote:
> On Thu, Nov 27, 2025 at 04:06:56PM +0800, Baoquan He wrote:
> > On 11/27/25 at 02:42pm, YoungJun Park wrote:
> > > On Thu, Nov 27, 2025 at 10:15:50AM +0800, Baoquan He wrote:
> > > > On 11/26/25 at 01:30am, Youngjun Park wrote:
> > > > > swap_sync_discard() has an issue where if the next device becomes full
> > > > > and is removed from the plist during iteration, the operation fails
> > > > > even when other swap devices with pending discard entries remain
> > > > > available.
> > > > >
> > > > > Fix by checking plist_node_empty(&next->list) and restarting iteration
> > > > > when the next node is removed during discard operations.
> > > > >
> > > > > Additionally, switch from swap_avail_lock/swap_avail_head to swap_lock/
> > > > > swap_active_head. This means the iteration is only affected by swapoff
> > > > > operations rather than frequent availability changes, reducing
> > > > > exceptional condition checks and lock contention.
> > > > >
> > > > > Fixes: 686ea517f471 ("mm, swap: do not perform synchronous discard during allocation")
> > > > > Suggested-by: Kairui Song <kasong@tencent.com>
> > > > > Signed-off-by: Youngjun Park <youngjun.park@lge.com>
> > > > > ---
> > > > >  mm/swapfile.c | 18 +++++++++++-------
> > > > >  1 file changed, 11 insertions(+), 7 deletions(-)
> > > > >
> > > > > diff --git a/mm/swapfile.c b/mm/swapfile.c
> > > > > index d12332423a06..998271aa09c3 100644
> > > > > --- a/mm/swapfile.c
> > > > > +++ b/mm/swapfile.c
> > > > > @@ -1387,21 +1387,25 @@ static bool swap_sync_discard(void)
> > > > >  	bool ret = false;
> > > > >  	struct swap_info_struct *si, *next;
> > > > >
> > > > > -	spin_lock(&swap_avail_lock);
> > > > > -	plist_for_each_entry_safe(si, next, &swap_avail_head, avail_list) {
> > > > > -		spin_unlock(&swap_avail_lock);
> > > > > +	spin_lock(&swap_lock);
> > > > > +start_over:
> > > > > +	plist_for_each_entry_safe(si, next, &swap_active_head, list) {
> > > > > +		spin_unlock(&swap_lock);
> > > > > 		if (get_swap_device_info(si)) {
> > > > > 			if (si->flags & SWP_PAGE_DISCARD)
> > > > > 				ret = swap_do_scheduled_discard(si);
> > > > > 			put_swap_device(si);
> > > > > 		}
> > > > > 		if (ret)
> > > > > -			return true;
> > > > > -		spin_lock(&swap_avail_lock);
> > > > > +			return ret;
> > > > > +
> > > > > +		spin_lock(&swap_lock);
> > > > > +		if (plist_node_empty(&next->list))
> > > > > +			goto start_over;
> > >
> > > By forcing a brief delay right before the swap_lock, I was able to observe at
> > > runtime that when the next node is removed (due to swapoff) and there is no
> > > plist_node_empty check, plist_del makes the node point to itself. As a result,
> > > when the iteration continues to the next entry, it keeps retrying on itself,
> > > since the list traversal termination condition is based on whether the current
> > > node is the head or not.
> > >
> > > At first glance, I had assumed that plist_node_empty also implicitly served as
> > > a termination condition of plist_for_each_entry_safe.
> > >
> > > Therefore, the real reason for this patch is not:
> > > "swap_sync_discard() has an issue where if the next device becomes full
> > > and is removed from the plist during iteration, the operation fails even
> > > when other swap devices with pending discard entries remain available."
> > > but rather:
> > > "When the next node is removed, the next pointer loops back to the current
> > > entry, possibly causing a loop until it is reinserted into the list."
> > >
> > > So the plist_node_empty check is necessary, either as it is now (not the
> > > original code, but the patch as I modified it) or as a break condition
> > > (if we want to avoid the swap on/off loop situation I mentioned in my
> > > previous email).
> >
> > OK, I only thought of the swap on/off case and didn't think it through.
> > As you analyzed, the plist_node_empty check is necessary, so this patch
> > looks good to me. Or one alternative way is fetching the new next? Not
> > a strong opinion though.
> >
> > 	if (plist_node_empty(&next->list)) {
> > 		if (!plist_node_empty(&si->list)) {
> > 			next = list_next_entry(si, list.node_list);
> > 			continue;
> > 		}
> > 		return false;
> > 	}
>
> Thank you for the suggestion :D
> I agree it could be an improvement in some cases. Personally, I feel
> the current code works fine, and from a readability perspective, the
> current approach might be a bit clearer. It also seems that the
> alternative would only make a difference in very minor cases (order 0,
> a swap allocation failure, and a swapoff during this routine).

Agree. Will you post a v2 to update the patch log? I would like to add
my review tag if no v2 is planned.
* Re: [PATCH 1/2] mm/swapfile: fix list iteration in swap_sync_discard
  2025-11-27 10:32             ` Baoquan He
@ 2025-11-27 10:44               ` YoungJun Park
  2025-11-27 10:50                 ` Baoquan He
  0 siblings, 1 reply; 13+ messages in thread
From: YoungJun Park @ 2025-11-27 10:44 UTC (permalink / raw)
  To: Baoquan He; +Cc: akpm, chrisl, kasong, shikemeng, nphamcs, baohua, linux-mm

On Thu, Nov 27, 2025 at 06:32:53PM +0800, Baoquan He wrote:
> On 11/27/25 at 06:34pm, YoungJun Park wrote:
> > On Thu, Nov 27, 2025 at 04:06:56PM +0800, Baoquan He wrote:
> > > On 11/27/25 at 02:42pm, YoungJun Park wrote:
> > > > On Thu, Nov 27, 2025 at 10:15:50AM +0800, Baoquan He wrote:
> > > > > On 11/26/25 at 01:30am, Youngjun Park wrote:
> > > > > > swap_sync_discard() has an issue where if the next device becomes full
> > > > > > and is removed from the plist during iteration, the operation fails
> > > > > > even when other swap devices with pending discard entries remain
> > > > > > available.
> > > > > >
> > > > > > Fix by checking plist_node_empty(&next->list) and restarting iteration
> > > > > > when the next node is removed during discard operations.
> > > > > >
> > > > > > Additionally, switch from swap_avail_lock/swap_avail_head to swap_lock/
> > > > > > swap_active_head. This means the iteration is only affected by swapoff
> > > > > > operations rather than frequent availability changes, reducing
> > > > > > exceptional condition checks and lock contention.
> > > > > >
> > > > > > Fixes: 686ea517f471 ("mm, swap: do not perform synchronous discard during allocation")
> > > > > > Suggested-by: Kairui Song <kasong@tencent.com>
> > > > > > Signed-off-by: Youngjun Park <youngjun.park@lge.com>
> > > > > > ---
> > > > > >  mm/swapfile.c | 18 +++++++++++-------
> > > > > >  1 file changed, 11 insertions(+), 7 deletions(-)
> > > > > >
> > > > > > diff --git a/mm/swapfile.c b/mm/swapfile.c
> > > > > > index d12332423a06..998271aa09c3 100644
> > > > > > --- a/mm/swapfile.c
> > > > > > +++ b/mm/swapfile.c
> > > > > > @@ -1387,21 +1387,25 @@ static bool swap_sync_discard(void)
> > > > > >  	bool ret = false;
> > > > > >  	struct swap_info_struct *si, *next;
> > > > > >
> > > > > > -	spin_lock(&swap_avail_lock);
> > > > > > -	plist_for_each_entry_safe(si, next, &swap_avail_head, avail_list) {
> > > > > > -		spin_unlock(&swap_avail_lock);
> > > > > > +	spin_lock(&swap_lock);
> > > > > > +start_over:
> > > > > > +	plist_for_each_entry_safe(si, next, &swap_active_head, list) {
> > > > > > +		spin_unlock(&swap_lock);
> > > > > >  		if (get_swap_device_info(si)) {
> > > > > >  			if (si->flags & SWP_PAGE_DISCARD)
> > > > > >  				ret = swap_do_scheduled_discard(si);
> > > > > >  			put_swap_device(si);
> > > > > >  		}
> > > > > >  		if (ret)
> > > > > > -			return true;
> > > > > > -		spin_lock(&swap_avail_lock);
> > > > > > +			return ret;
> > > > > > +
> > > > > > +		spin_lock(&swap_lock);
> > > > > > +		if (plist_node_empty(&next->list))
> > > > > > +			goto start_over;
> > > >
> > > > By forcing a brief delay right before the swap_lock, I was able to observe at
> > > > runtime that when the next node is removed (due to swapoff), and there is no
> > > > plist_node_empty check, plist_del makes the node point to itself. As a result,
> > > > when the iteration continues to the next entry, it keeps retrying on itself,
> > > > since the list traversal termination condition is based on whether the current
> > > > node is the head or not.
> > > >
> > > > At first glance, I had assumed that plist_node_empty also implicitly served as
> > > > a termination condition of plist_for_each_entry_safe.
> > > >
> > > > Therefore, the real reason for this patch is not:
> > > > "swap_sync_discard() has an issue where if the next device becomes full
> > > > and is removed from the plist during iteration, the operation fails even
> > > > when other swap devices with pending discard entries remain available."
> > > > but rather:
> > > > "When the next node is removed, the next pointer loops back to the current
> > > > entry, possibly causing a loop until it is reinserted on the list."
> > > >
> > > > So, the plist_node_empty check is necessary — either as it is now (not the original
> > > > code, the patch I modified) or as a break condition
> > > > (if we want to avoid the swap on/off loop situation I mentioned in my previous email.)
> > >
> > > OK, I only thought of the swap on/off case, didn't think much. As you
> > > analyzed, the plist_node_empty check is necessary. So this patch looks
> > > good to me. Or one alternative way is fetching the new next? Not a strong
> > > opinion though.
> > >
> > > 	if (plist_node_empty(&next->list)) {
> > > 		if (!plist_node_empty(&si->list)) {
> > > 			next = list_next_entry(si, list.node_list);
> > > 			continue;
> > > 		}
> > > 		return false;
> > > 	}
> >
> > Thank you for the suggestion :D
> > I agree it could be an improvement in some cases.
> > Personally, I feel the current code works fine,
> > and from a readability perspective, the current approach might be a bit clearer.
> > It also seems that the alternative would only make a difference in very minor cases.
> > (order 0, swapfail and swapoff during this routine)
>
> Agree. Will you post v2 to update the patch log? I would like to add my
> reviewing tag if no v2 is planned.

Oops, I’ve just posted v2 to update the patch log.
Link: https://lore.kernel.org/linux-mm/20251127100303.783198-1-youngjun.park@lge.com/T/#m920503bf9bac0d35bd2c8467a926481e58d7ab53
* Re: [PATCH 1/2] mm/swapfile: fix list iteration in swap_sync_discard
  2025-11-27 10:44               ` YoungJun Park
@ 2025-11-27 10:50                 ` Baoquan He
  0 siblings, 0 replies; 13+ messages in thread
From: Baoquan He @ 2025-11-27 10:50 UTC (permalink / raw)
  To: YoungJun Park; +Cc: akpm, chrisl, kasong, shikemeng, nphamcs, baohua, linux-mm

On 11/27/25 at 07:44pm, YoungJun Park wrote:
> On Thu, Nov 27, 2025 at 06:32:53PM +0800, Baoquan He wrote:
> > On 11/27/25 at 06:34pm, YoungJun Park wrote:
> > > On Thu, Nov 27, 2025 at 04:06:56PM +0800, Baoquan He wrote:
> > > > On 11/27/25 at 02:42pm, YoungJun Park wrote:
> > > > > On Thu, Nov 27, 2025 at 10:15:50AM +0800, Baoquan He wrote:
> > > > > > On 11/26/25 at 01:30am, Youngjun Park wrote:
> > > > > > > swap_sync_discard() has an issue where if the next device becomes full
> > > > > > > and is removed from the plist during iteration, the operation fails
> > > > > > > even when other swap devices with pending discard entries remain
> > > > > > > available.
> > > > > > >
> > > > > > > Fix by checking plist_node_empty(&next->list) and restarting iteration
> > > > > > > when the next node is removed during discard operations.
> > > > > > >
> > > > > > > Additionally, switch from swap_avail_lock/swap_avail_head to swap_lock/
> > > > > > > swap_active_head. This means the iteration is only affected by swapoff
> > > > > > > operations rather than frequent availability changes, reducing
> > > > > > > exceptional condition checks and lock contention.
> > > > > > >
> > > > > > > Fixes: 686ea517f471 ("mm, swap: do not perform synchronous discard during allocation")
> > > > > > > Suggested-by: Kairui Song <kasong@tencent.com>
> > > > > > > Signed-off-by: Youngjun Park <youngjun.park@lge.com>
> > > > > > > ---
> > > > > > >  mm/swapfile.c | 18 +++++++++++-------
> > > > > > >  1 file changed, 11 insertions(+), 7 deletions(-)
> > > > > > >
> > > > > > > diff --git a/mm/swapfile.c b/mm/swapfile.c
> > > > > > > index d12332423a06..998271aa09c3 100644
> > > > > > > --- a/mm/swapfile.c
> > > > > > > +++ b/mm/swapfile.c
> > > > > > > @@ -1387,21 +1387,25 @@ static bool swap_sync_discard(void)
> > > > > > >  	bool ret = false;
> > > > > > >  	struct swap_info_struct *si, *next;
> > > > > > >
> > > > > > > -	spin_lock(&swap_avail_lock);
> > > > > > > -	plist_for_each_entry_safe(si, next, &swap_avail_head, avail_list) {
> > > > > > > -		spin_unlock(&swap_avail_lock);
> > > > > > > +	spin_lock(&swap_lock);
> > > > > > > +start_over:
> > > > > > > +	plist_for_each_entry_safe(si, next, &swap_active_head, list) {
> > > > > > > +		spin_unlock(&swap_lock);
> > > > > > >  		if (get_swap_device_info(si)) {
> > > > > > >  			if (si->flags & SWP_PAGE_DISCARD)
> > > > > > >  				ret = swap_do_scheduled_discard(si);
> > > > > > >  			put_swap_device(si);
> > > > > > >  		}
> > > > > > >  		if (ret)
> > > > > > > -			return true;
> > > > > > > -		spin_lock(&swap_avail_lock);
> > > > > > > +			return ret;
> > > > > > > +
> > > > > > > +		spin_lock(&swap_lock);
> > > > > > > +		if (plist_node_empty(&next->list))
> > > > > > > +			goto start_over;
> > > > >
> > > > > By forcing a brief delay right before the swap_lock, I was able to observe at
> > > > > runtime that when the next node is removed (due to swapoff), and there is no
> > > > > plist_node_empty check, plist_del makes the node point to itself. As a result,
> > > > > when the iteration continues to the next entry, it keeps retrying on itself,
> > > > > since the list traversal termination condition is based on whether the current
> > > > > node is the head or not.
> > > > >
> > > > > At first glance, I had assumed that plist_node_empty also implicitly served as
> > > > > a termination condition of plist_for_each_entry_safe.
> > > > >
> > > > > Therefore, the real reason for this patch is not:
> > > > > "swap_sync_discard() has an issue where if the next device becomes full
> > > > > and is removed from the plist during iteration, the operation fails even
> > > > > when other swap devices with pending discard entries remain available."
> > > > > but rather:
> > > > > "When the next node is removed, the next pointer loops back to the current
> > > > > entry, possibly causing a loop until it is reinserted on the list."
> > > > >
> > > > > So, the plist_node_empty check is necessary — either as it is now (not the original
> > > > > code, the patch I modified) or as a break condition
> > > > > (if we want to avoid the swap on/off loop situation I mentioned in my previous email.)
> > > >
> > > > OK, I only thought of the swap on/off case, didn't think much. As you
> > > > analyzed, the plist_node_empty check is necessary. So this patch looks
> > > > good to me. Or one alternative way is fetching the new next? Not a strong
> > > > opinion though.
> > > >
> > > > 	if (plist_node_empty(&next->list)) {
> > > > 		if (!plist_node_empty(&si->list)) {
> > > > 			next = list_next_entry(si, list.node_list);
> > > > 			continue;
> > > > 		}
> > > > 		return false;
> > > > 	}
> > >
> > > Thank you for the suggestion :D
> > > I agree it could be an improvement in some cases.
> > > Personally, I feel the current code works fine,
> > > and from a readability perspective, the current approach might be a bit clearer.
> > > It also seems that the alternative would only make a difference in very minor cases.
> > > (order 0, swapfail and swapoff during this routine)
> >
> > Agree. Will you post v2 to update the patch log? I would like to add my
> > reviewing tag if no v2 is planned.
>
> Oops, I’ve just posted v2 to update the patch log.
> Link: https://lore.kernel.org/linux-mm/20251127100303.783198-1-youngjun.park@lge.com/T/#m920503bf9bac0d35bd2c8467a926481e58d7ab53

Saw it, thanks.
* [PATCH 2/2] mm/swapfile: use plist_for_each_entry in __folio_throttle_swaprate
  2025-11-25 16:30 [PATCH 0/2] mm/swapfile: fix and cleanup swap list iterations Youngjun Park
  2025-11-25 16:30 ` [PATCH 1/2] mm/swapfile: fix list iteration in swap_sync_discard Youngjun Park
@ 2025-11-25 16:30 ` Youngjun Park
  1 sibling, 0 replies; 13+ messages in thread
From: Youngjun Park @ 2025-11-25 16:30 UTC (permalink / raw)
  To: akpm, chrisl, kasong
  Cc: shikemeng, nphamcs, bhe, baohua, youngjun.park, linux-mm

The loop breaks immediately after finding the first swap device and
never modifies the list. Replace plist_for_each_entry_safe() with
plist_for_each_entry() and remove the unused next variable.

Signed-off-by: Youngjun Park <youngjun.park@lge.com>
---
 mm/swapfile.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/mm/swapfile.c b/mm/swapfile.c
index 998271aa09c3..cc0990141171 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -4023,7 +4023,7 @@ static bool __has_usable_swap(void)

 void __folio_throttle_swaprate(struct folio *folio, gfp_t gfp)
 {
-	struct swap_info_struct *si, *next;
+	struct swap_info_struct *si;

 	if (!(gfp & __GFP_IO))
 		return;
@@ -4042,8 +4042,7 @@ void __folio_throttle_swaprate(struct folio *folio, gfp_t gfp)
 		return;

 	spin_lock(&swap_avail_lock);
-	plist_for_each_entry_safe(si, next, &swap_avail_head,
-				  avail_list) {
+	plist_for_each_entry(si, &swap_avail_head, avail_list) {
 		if (si->bdev) {
 			blkcg_schedule_throttle(si->bdev->bd_disk, true);
 			break;
-- 
2.34.1
end of thread, other threads:[~2025-11-27 10:50 UTC | newest]

Thread overview: 13+ messages -- links below jump to the message on this page --
2025-11-25 16:30 [PATCH 0/2] mm/swapfile: fix and cleanup swap list iterations Youngjun Park
2025-11-25 16:30 ` [PATCH 1/2] mm/swapfile: fix list iteration in swap_sync_discard Youngjun Park
2025-11-26 18:23   ` Kairui Song
2025-11-27  2:22     ` YoungJun Park
2025-11-27  2:15   ` Baoquan He
2025-11-27  2:54     ` YoungJun Park
2025-11-27  5:42       ` YoungJun Park
2025-11-27  8:06         ` Baoquan He
2025-11-27  9:34           ` YoungJun Park
2025-11-27 10:32             ` Baoquan He
2025-11-27 10:44               ` YoungJun Park
2025-11-27 10:50                 ` Baoquan He
2025-11-25 16:30 ` [PATCH 2/2] mm/swapfile: use plist_for_each_entry in __folio_throttle_swaprate Youngjun Park