From: Baoquan He <bhe@redhat.com>
To: Kairui Song <ryncsn@gmail.com>
Cc: linux-mm@kvack.org, Andrew Morton <akpm@linux-foundation.org>,
Barry Song <baohua@kernel.org>, Chris Li <chrisl@kernel.org>,
Nhat Pham <nphamcs@gmail.com>,
Yosry Ahmed <yosry.ahmed@linux.dev>,
David Hildenbrand <david@kernel.org>,
Johannes Weiner <hannes@cmpxchg.org>,
Youngjun Park <youngjun.park@lge.com>,
Hugh Dickins <hughd@google.com>,
Baolin Wang <baolin.wang@linux.alibaba.com>,
Ying Huang <ying.huang@linux.alibaba.com>,
Kemeng Shi <shikemeng@huaweicloud.com>,
Lorenzo Stoakes <lorenzo.stoakes@oracle.com>,
"Matthew Wilcox (Oracle)" <willy@infradead.org>,
linux-kernel@vger.kernel.org
Subject: Re: [PATCH v4 10/19] mm, swap: consolidate cluster reclaim and usability check
Date: Thu, 18 Dec 2025 11:33:55 +0800
Message-ID: <aUN2I9MhVJFPAMb6@MiWiFi-R3L-srv>
In-Reply-To: <CAMgjq7D4xa+UpGfwP-Ji1HEoxB1=iaPzypGHL+hozwPW6v6tPQ@mail.gmail.com>
On 12/18/25 at 02:30am, Kairui Song wrote:
> On Wed, Dec 17, 2025 at 7:16 PM Baoquan He <bhe@redhat.com> wrote:
> >
> > On 12/15/25 at 12:38pm, Kairui Song wrote:
> > > On Mon, Dec 15, 2025 at 12:13 PM Baoquan He <bhe@redhat.com> wrote:
> > > >
> > > > On 12/05/25 at 03:29am, Kairui Song wrote:
> > > > > From: Kairui Song <kasong@tencent.com>
> > > > >
> > > > > Swap cluster cache reclaim requires releasing the lock, so the cluster
> > > > > may become unusable after the reclaim. To prepare for checking swap
> > > > > cache using the swap table directly, consolidate the swap cluster
> > > > > reclaim and the check logic.
> > > > >
> > > > > We will want to avoid touching the cluster's data completely with the
> > > > ~~~~~~~~
> > > > 'want to' means 'will'?
> > >
> > > > Sorry about my English. I mean that in the following commit, we need to
> > > > avoid accessing the cluster's table (ci->table) when the cluster is
> > > > empty, so the reclaim helper needs to check the cluster status before
> > > > accessing it.
> >
> > > Got it, I could be wrong. Please ignore this nitpick unless a native
> > > English speaker raises a concern about it.
> >
> > >
> > > >
> > > > > swap table, to avoid RCU overhead here. And by moving the cluster usable
> > > > > check into the reclaim helper, it will also help avoid a redundant scan of
> > > > > the slots if the cluster is no longer usable, and we will want to avoid
> > > > ~~~~~~~~~~~~
> > > > this place too.
> > > > > touching the cluster.
> > > > >
> > > > > Also, adjust it very slightly while at it: always scan the whole region
> > > > > during reclaim, don't skip slots covered by a reclaimed folio. Because
> > > > > the reclaim is lockless, it's possible that new cache lands at any time.
> > > > > And for allocation, we want all caches to be reclaimed to avoid
> > > > > fragmentation. Besides, if the scan offset is not aligned with the size
> > > > > of the reclaimed folio, we might skip some existing cache and fail the
> > > > > reclaim unexpectedly.
> > > > >
> > > > > There should be no observable behavior change. It might slightly improve
> > > > > the fragmentation issue or performance.
> > > > >
> > > > > Signed-off-by: Kairui Song <kasong@tencent.com>
> > > > > ---
> > > > > mm/swapfile.c | 45 +++++++++++++++++++++++++++++----------------
> > > > > 1 file changed, 29 insertions(+), 16 deletions(-)
> > > > >
> > > > > diff --git a/mm/swapfile.c b/mm/swapfile.c
> > > > > index 5a766d4fcaa5..2703dfafc632 100644
> > > > > --- a/mm/swapfile.c
> > > > > +++ b/mm/swapfile.c
> > > > > @@ -777,33 +777,51 @@ static int swap_cluster_setup_bad_slot(struct swap_cluster_info *cluster_info,
> > > > > return 0;
> > > > > }
> > > > >
> > > > > +/*
> > > > > + * Reclaim drops the ci lock, so the cluster may become unusable (freed or
> > > > > + * stolen by a lower order). @usable will be set to false if that happens.
> > > > > + */
> > > > > static bool cluster_reclaim_range(struct swap_info_struct *si,
> > > > > struct swap_cluster_info *ci,
> > > > > - unsigned long start, unsigned long end)
> > > > > + unsigned long start, unsigned int order,
> > > > > + bool *usable)
> > > > > {
> > > > > + unsigned int nr_pages = 1 << order;
> > > > > + unsigned long offset = start, end = start + nr_pages;
> > > > > unsigned char *map = si->swap_map;
> > > > > - unsigned long offset = start;
> > > > > int nr_reclaim;
> > > > >
> > > > > spin_unlock(&ci->lock);
> > > > > do {
> > > > > switch (READ_ONCE(map[offset])) {
> > > > > case 0:
> > > > > - offset++;
> > > > > break;
> > > > > case SWAP_HAS_CACHE:
> > > > > nr_reclaim = __try_to_reclaim_swap(si, offset, TTRS_ANYWAY);
> > > > > - if (nr_reclaim > 0)
> > > > > - offset += nr_reclaim;
> > > > > - else
> > > > > + if (nr_reclaim < 0)
> > > > > goto out;
> > > > > break;
> > > > > default:
> > > > > goto out;
> > > > > }
> > > > > - } while (offset < end);
> > > > > + } while (++offset < end);
> > > > ~~~~~ '++offset' is conflicting with nr_reclaim
> > > > returned from __try_to_reclaim_swap(). can you explain?
> > >
> > > > What do you mean by conflicting? If (nr_reclaim < 0), reclaim failed and
> > > > this loop ends. If (nr_reclaim == 0), the slot was likely concurrently
> > > > freed, so the loop should just continue to iterate & reclaim to ensure
> > > > all slots are freed. If nr_reclaim > 0, the reclaim just freed a folio
> > > > of nr_reclaim pages. We could round up by nr_reclaim to skip the slots
> > > > that were occupied by the folio, but note that here we are not holding
> > > > the ci lock, so new folios could land in that range. Simply continuing
> > > > to iterate still seems a good option, and it makes the
> > > > code simpler, and in practice maybe faster as there are fewer branches
> > > > and calculations involved.
> >
> > > I see now. 'Conflicting' may not be precise. I didn't understand
> > > this because __try_to_reclaim_swap() is called in several places, and
> > > all of them have the same situation of releasing and retaking
> > > ci->lock around __try_to_reclaim_swap(). As you said, we may need to
> > > refactor __try_to_reclaim_swap() and change all those places.
>
> It's a bit different. Other callers of __try_to_reclaim_swap() just make a
> best-effort attempt to reclaim a slot's swap cache, because ultimately the
> allocator will reclaim the slot if needed anyway. But here, it is the
> allocator doing the reclaim, so we want every single slot to be
> cleaned.
OK, I see. Thanks for the explanation. That said, I think that is why we
do the recheck in the later for loop, so the old way and your change may
have a similar effect.
>
> Avoiding the align / round_up also makes the code a bit cleaner.
>
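
For illustration only, here is a minimal userspace sketch (not kernel code)
of the two scan strategies discussed above. Everything in it is an
assumption made for the example: struct model, model_add_folio(),
model_reclaim() and model_scan() are hypothetical names, HAS_CACHE is just a
stand-in for SWAP_HAS_CACHE, and there is no locking or concurrency.
Starting the scan at slot 2 is a simplification that stands in for a scan
offset that is not aligned with the size of the reclaimed folio.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_SLOTS  8
#define HAS_CACHE 0x40	/* stand-in for SWAP_HAS_CACHE; value is illustrative */

/* Hypothetical model of a cluster: per-slot state plus the folio covering it. */
struct model {
	unsigned char map[NR_SLOTS];	/* 0 = free, HAS_CACHE = cache only */
	int folio_start[NR_SLOTS];	/* first slot of the covering folio */
	int folio_nr[NR_SLOTS];		/* number of slots in that folio */
};

/* Mark [start, start + nr) as covered by one cached folio. */
static void model_add_folio(struct model *m, int start, int nr)
{
	for (int i = start; i < start + nr; i++) {
		m->map[i] = HAS_CACHE;
		m->folio_start[i] = start;
		m->folio_nr[i] = nr;
	}
}

/* Stand-in for the reclaim call: free the whole folio, return its size. */
static int model_reclaim(struct model *m, int offset)
{
	int start = m->folio_start[offset], nr = m->folio_nr[offset];

	if (m->map[offset] != HAS_CACHE)
		return 0;
	for (int i = start; i < start + nr; i++)
		m->map[i] = 0;
	return nr;
}

/*
 * Scan [start, end). With skip_by_folio the offset jumps by the reclaimed
 * folio size (old behaviour); without it every slot is visited (new
 * behaviour). Returns true if the final recheck finds the range free.
 */
static bool model_scan(struct model *m, int start, int end, bool skip_by_folio)
{
	int offset = start;

	while (offset < end) {
		int step = 1;

		if (m->map[offset] == HAS_CACHE) {
			int nr = model_reclaim(m, offset);

			if (skip_by_folio && nr > 0)
				step = nr;
		} else if (m->map[offset] != 0) {
			return false;	/* slot still in use, give up */
		}
		offset += step;
	}
	for (int i = start; i < end; i++)	/* recheck the whole range */
		if (m->map[i])
			return false;
	return true;
}
```

With one 4-slot folio at slots 0-3 and one 2-slot folio at slots 4-5,
model_scan(&m, 2, 8, true) jumps from slot 2 to slot 6, never reclaims the
second folio, and fails at the recheck, while model_scan(&m, 2, 8, false)
visits every slot and succeeds. In this model the recheck keeps both
variants correct, and the skipping variant only gives up on a range it
could have reclaimed.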