From: Baoquan He <bhe@redhat.com>
To: YoungJun Park <youngjun.park@lge.com>
Cc: akpm@linux-foundation.org, chrisl@kernel.org, kasong@tencent.com,
shikemeng@huaweicloud.com, nphamcs@gmail.com, baohua@kernel.org,
linux-mm@kvack.org
Subject: Re: [PATCH 1/2] mm/swapfile: fix list iteration in swap_sync_discard
Date: Thu, 27 Nov 2025 18:50:03 +0800
Message-ID: <aSgs2y3b/DhsMMHD@MiWiFi-R3L-srv>
In-Reply-To: <aSgrdLkaLjzVh6kv@yjaykim-PowerEdge-T330>
On 11/27/25 at 07:44pm, YoungJun Park wrote:
> On Thu, Nov 27, 2025 at 06:32:53PM +0800, Baoquan He wrote:
> > On 11/27/25 at 06:34pm, YoungJun Park wrote:
> > > On Thu, Nov 27, 2025 at 04:06:56PM +0800, Baoquan He wrote:
> > > > On 11/27/25 at 02:42pm, YoungJun Park wrote:
> > > > > On Thu, Nov 27, 2025 at 10:15:50AM +0800, Baoquan He wrote:
> > > > > > On 11/26/25 at 01:30am, Youngjun Park wrote:
> > > > > > > swap_sync_discard() has an issue where if the next device becomes full
> > > > > > > and is removed from the plist during iteration, the operation fails
> > > > > > > even when other swap devices with pending discard entries remain
> > > > > > > available.
> > > > > > >
> > > > > > > Fix by checking plist_node_empty(&next->list) and restarting iteration
> > > > > > > when the next node is removed during discard operations.
> > > > > > >
> > > > > > > Additionally, switch from swap_avail_lock/swap_avail_head to swap_lock/
> > > > > > > swap_active_head. This means the iteration is only affected by swapoff
> > > > > > > operations rather than frequent availability changes, reducing
> > > > > > > exceptional condition checks and lock contention.
> > > > > > >
> > > > > > > Fixes: 686ea517f471 ("mm, swap: do not perform synchronous discard during allocation")
> > > > > > > Suggested-by: Kairui Song <kasong@tencent.com>
> > > > > > > Signed-off-by: Youngjun Park <youngjun.park@lge.com>
> > > > > > > ---
> > > > > > > mm/swapfile.c | 18 +++++++++++-------
> > > > > > > 1 file changed, 11 insertions(+), 7 deletions(-)
> > > > > > >
> > > > > > > diff --git a/mm/swapfile.c b/mm/swapfile.c
> > > > > > > index d12332423a06..998271aa09c3 100644
> > > > > > > --- a/mm/swapfile.c
> > > > > > > +++ b/mm/swapfile.c
> > > > > > > @@ -1387,21 +1387,25 @@ static bool swap_sync_discard(void)
> > > > > > > >  	bool ret = false;
> > > > > > > >  	struct swap_info_struct *si, *next;
> > > > > > > >
> > > > > > > > -	spin_lock(&swap_avail_lock);
> > > > > > > > -	plist_for_each_entry_safe(si, next, &swap_avail_head, avail_list) {
> > > > > > > > -		spin_unlock(&swap_avail_lock);
> > > > > > > > +	spin_lock(&swap_lock);
> > > > > > > > +start_over:
> > > > > > > > +	plist_for_each_entry_safe(si, next, &swap_active_head, list) {
> > > > > > > > +		spin_unlock(&swap_lock);
> > > > > > > >  		if (get_swap_device_info(si)) {
> > > > > > > >  			if (si->flags & SWP_PAGE_DISCARD)
> > > > > > > >  				ret = swap_do_scheduled_discard(si);
> > > > > > > >  			put_swap_device(si);
> > > > > > > >  		}
> > > > > > > >  		if (ret)
> > > > > > > > -			return true;
> > > > > > > > -		spin_lock(&swap_avail_lock);
> > > > > > > > +			return ret;
> > > > > > > > +
> > > > > > > > +		spin_lock(&swap_lock);
> > > > > > > > +		if (plist_node_empty(&next->list))
> > > > > > > > +			goto start_over;
> > > > >
> > > > > By forcing a brief delay right before the swap_lock, I was able to observe at
> > > > > runtime that when the next node is removed (due to swapoff), and there is no
> > > > > plist_node_empty check, plist_del makes the node point to itself. As a result,
> > > > > when the iteration continues to the next entry, it keeps retrying on itself,
> > > > > since the list traversal termination condition is based on whether the current
> > > > > node is the head or not.
> > > > >
> > > > > At first glance, I had assumed that plist_node_empty also implicitly served as
> > > > > a termination condition of plist_for_each_entry_safe.
> > > > >
> > > > > Therefore, the real reason for this patch is not:
> > > > > "swap_sync_discard() has an issue where if the next device becomes full
> > > > > and is removed from the plist during iteration, the operation fails even
> > > > > when other swap devices with pending discard entries remain available."
> > > > > but rather:
> > > > > > "When the next node is removed, the next pointer points back to itself,
> > > > > > possibly causing a loop that lasts until the node is reinserted into the list."
> > > > >
> > > > > > So, the plist_node_empty check is necessary, either as it is now (in the
> > > > > > patch I modified, not the original code) or as a break condition
> > > > > > (if we want to avoid the swap on/off loop situation I mentioned in my previous email).
> > > >
> > > > > OK, I had only considered the swap on/off case and hadn't thought it
> > > > > through further. As you analyzed, the plist_node_empty check is necessary,
> > > > > so this patch looks good to me. One alternative would be to fetch the new
> > > > > next instead? No strong opinion though.
> > > >
> > > > > 	if (plist_node_empty(&next->list)) {
> > > > > 		if (!plist_node_empty(&si->list)) {
> > > > > 			next = list_next_entry(si, list.node_list);
> > > > > 			continue;
> > > > > 		}
> > > > > 		return false;
> > > > > 	}
> > >
> > > Thank you for the suggestion :D
> > > I agree it could be an improvement in some cases.
> > > Personally, I feel the current code works fine,
> > > and from a readability perspective, the current approach might be a bit clearer.
> > > It also seems that the alternative would only make a difference in very minor cases.
> > > > (order 0, swap failure, and a swapoff happening during this routine)
> >
> > > Agree. Will you post v2 to update the patch log? I would like to add my
> > > Reviewed-by tag if no v2 is planned.
>
> Oops, I’ve just posted v2 to update the patch log.
> Link: https://lore.kernel.org/linux-mm/20251127100303.783198-1-youngjun.park@lge.com/T/#m920503bf9bac0d35bd2c8467a926481e58d7ab53
Saw it, thanks.