linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: YoungJun Park <youngjun.park@lge.com>
To: Kairui Song <ryncsn@gmail.com>
Cc: Chris Li <chrisl@kernel.org>,
	linux-mm@kvack.org, Andrew Morton <akpm@linux-foundation.org>,
	Kemeng Shi <shikemeng@huaweicloud.com>,
	Nhat Pham <nphamcs@gmail.com>, Baoquan He <bhe@redhat.com>,
	Barry Song <baohua@kernel.org>,
	Baolin Wang <baolin.wang@linux.alibaba.com>,
	David Hildenbrand <david@redhat.com>,
	"Matthew Wilcox (Oracle)" <willy@infradead.org>,
	Ying Huang <ying.huang@linux.alibaba.com>,
	linux-kernel@vger.kernel.org, stable@vger.kernel.org
Subject: Re: [PATCH 1/4] mm, swap: do not perform synchronous discard during allocation
Date: Tue, 21 Oct 2025 16:34:46 +0900	[thread overview]
Message-ID: <aPc3lmbJEVTXoV6h@yjaykim-PowerEdge-T330> (raw)
In-Reply-To: <CAMgjq7CsYhEjvtN85XGkrONYAJxve7gG593TFeOGV-oax++kWA@mail.gmail.com>

> > Thanks, I was composing a reply on this and just saw your new comment.
> > I agree with this.
> 
> Hmm, it turns out modifying V1 to handle non-order 0 allocation
> failure also has some minor issues. Every mTHP SWAP allocation failure
> will have a slight higher overhead due to the discard check. V1 is
> fine since it only checks discard for order 0, and order 0 alloc
> failure is uncommon and usually means OOM already.

Looking at the original proposed patch.

 +	spin_lock(&swap_avail_lock);
 +	plist_for_each_entry_safe(si, next, &swap_avail_heads[nid], avail_lists[nid]) {
 +		spin_unlock(&swap_avail_lock);
 +		if (get_swap_device_info(si)) {
 +			if (si->flags & SWP_PAGE_DISCARD)
 +				ret = swap_do_scheduled_discard(si);
 +			put_swap_device(si);
 +		}
 +		if (ret)
 +			break;

if ret is true and we break, 
wouldn’t that cause spin_unlock to run without the lock being held?

 +		spin_lock(&swap_avail_lock);
 +	}
 +	spin_unlock(&swap_avail_lock); <- unlocked without lock grab.
 +
 +	return ret;
 +}

> I'm not saying V1 is the final solution, but I think maybe we can just
> keep V1 as it is? That's easier for a stable backport too, and this is
> doing far better than what it was like. The sync discard was added in
> 2013 and the later added percpu cluster at the same year never treated
> it carefully. And the discard during allocation after recent swap
> allocator rework has been kind of broken for a while.
> 
> To optimize it further in a clean way, we have to reverse the
> allocator's handling order of the plist and fast / slow path. Current
> order is local_lock -> fast -> slow (plist).
> We can walk the plist first, then do the fast / slow path: plist (or
> maybe something faster than plist but handles the priority) ->
> local_lock -> fast -> slow (bonus: this is more friendly to RT kernels

I think the idea is good, but when approaching it that way, 
I am curious about rotation handling.

In the current code, rotation is always done when traversing the plist in the slow path.
If we traverse the plist first, how should rotation be handled?

1. Do a naive rotation at plist traversal time. 
(But then fast path might allocate from an si we didn’t select.)
2. Rotate when allocating in the slow path. 
(But between releasing swap_avail_lock, we might access an si that wasn’t rotated.)

Both cases could break rotation behavior — what do you think?

> too I think). That way we don't need to rewalk the plist after
> releasing the local_lock and save a lot of trouble. I remember I
> discussed with Youngjun on this sometime ago in the mail list, I know

Recapping your earlier idea: cache only the swap device per cgroup in percpu, 
and keep the cluster inside the swap device.
Applied to swap tiers, cache only the percpu si per tier, 
and keep the cluster in the swap device.
This seems to fit well with your previous suggestion.

However, since we shifted from per-cgroup swap priority to swap tier, 
and will re-submit RFC for swap tier, we’ll need to revisit the discussion.

Youngjun Park


  parent reply	other threads:[~2025-10-21  7:34 UTC|newest]

Thread overview: 32+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-10-06 20:02 [PATCH 0/4] mm, swap: misc cleanup and bugfix Kairui Song
2025-10-06 20:02 ` [PATCH 1/4] mm, swap: do not perform synchronous discard during allocation Kairui Song
2025-10-07 23:52   ` Nhat Pham
2025-10-08 20:54   ` Chris Li
2025-10-09 15:32     ` Kairui Song
2025-10-09 16:58       ` Chris Li
2025-10-12 16:49     ` Kairui Song
2025-10-14 21:27       ` Chris Li
2025-10-15  2:55         ` Chris Li
2025-10-15  6:24           ` Kairui Song
2025-10-15 16:45             ` Kairui Song
2025-10-21  6:48               ` Chris Li
2025-10-21  8:44                 ` Kairui Song
2025-10-21  7:34               ` YoungJun Park [this message]
2025-10-24  4:00                 ` Kairui Song
2025-10-06 20:02 ` [PATCH 2/4] mm, swap: rename helper for setup bad slots Kairui Song
2025-10-07 23:47   ` Nhat Pham
2025-10-08 10:25   ` David Hildenbrand
2025-10-08 20:58   ` Chris Li
2025-10-06 20:02 ` [PATCH 3/4] mm, swap: cleanup swap entry allocation parameter Kairui Song
2025-10-06 20:07   ` Kairui Song
2025-10-07 23:49     ` Nhat Pham
2025-10-08 10:26       ` David Hildenbrand
2025-10-08 20:59   ` Chris Li
2025-10-14  3:12   ` Baolin Wang
2025-10-06 20:02 ` [PATCH 4/4] mm/migrate, swap: drop usage of folio_index Kairui Song
2025-10-07 23:48   ` Nhat Pham
2025-10-08  1:20     ` Andrew Morton
2025-10-09 15:33     ` Kairui Song
2025-10-08 21:03   ` Chris Li
2025-10-14  3:15   ` Baolin Wang
2025-10-07 22:20 ` [PATCH 0/4] mm, swap: misc cleanup and bugfix Andrew Morton

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aPc3lmbJEVTXoV6h@yjaykim-PowerEdge-T330 \
    --to=youngjun.park@lge.com \
    --cc=akpm@linux-foundation.org \
    --cc=baohua@kernel.org \
    --cc=baolin.wang@linux.alibaba.com \
    --cc=bhe@redhat.com \
    --cc=chrisl@kernel.org \
    --cc=david@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=nphamcs@gmail.com \
    --cc=ryncsn@gmail.com \
    --cc=shikemeng@huaweicloud.com \
    --cc=stable@vger.kernel.org \
    --cc=willy@infradead.org \
    --cc=ying.huang@linux.alibaba.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox