From: Kairui Song <ryncsn@gmail.com>
To: YoungJun Park <youngjun.park@lge.com>
Cc: linux-mm@kvack.org, Andrew Morton <akpm@linux-foundation.org>,
	 Baoquan He <bhe@redhat.com>, Barry Song <baohua@kernel.org>,
	Chris Li <chrisl@kernel.org>,  Nhat Pham <nphamcs@gmail.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	 Yosry Ahmed <yosry.ahmed@linux.dev>,
	David Hildenbrand <david@redhat.com>,
	 Hugh Dickins <hughd@google.com>,
	Baolin Wang <baolin.wang@linux.alibaba.com>,
	 "Huang, Ying" <ying.huang@linux.alibaba.com>,
	Kemeng Shi <shikemeng@huaweicloud.com>,
	 Lorenzo Stoakes <lorenzo.stoakes@oracle.com>,
	 "Matthew Wilcox (Oracle)" <willy@infradead.org>,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH 10/19] mm, swap: consolidate cluster reclaim and check logic
Date: Fri, 31 Oct 2025 15:11:13 +0800
Message-ID: <CAMgjq7A2gs+CMRftP9r4Pt=GKDAO=NaZVuKFYBVkZZjgz8c96g@mail.gmail.com>
In-Reply-To: <aQRIOMsAkDciWFw/@yjaykim-PowerEdge-T330>

On Fri, Oct 31, 2025 at 1:25 PM YoungJun Park <youngjun.park@lge.com> wrote:
>
> On Wed, Oct 29, 2025 at 11:58:36PM +0800, Kairui Song wrote:
>
> > From: Kairui Song <kasong@tencent.com>
> >
>
> Hello Kairui, great work on your patch series. :)
>
> > Swap cluster cache reclaim requires releasing the lock, so some extra
> > checks are needed after the reclaim. To prepare for checking swap cache
> > using the swap table directly, consolidate the swap cluster reclaim and
> > check the logic.
> >
> > Also, adjust it very slightly. By moving the cluster empty and usable
> > check into the reclaim helper, it will avoid a redundant scan of the
> > slots if the cluster is empty.
>
> This is Change 1
>
> > And always scan the whole region during reclaim, don't skip slots
> > covered by a reclaimed folio. Because the reclaim is lockless, it's
> > possible that new cache lands at any time. And for allocation, we want
> > all caches to be reclaimed to avoid fragmentation. And besides, if the
> > scan offset is not aligned with the size of the reclaimed folio, we are
> > skipping some existing caches.
>
> This is Change 2
>
> > There should be no observable behavior change, which might slightly
> > improve the fragmentation issue or performance.
> >
> > Signed-off-by: Kairui Song <kasong@tencent.com>
> > ---
> >  mm/swapfile.c | 47 +++++++++++++++++++++++------------------------
> >  1 file changed, 23 insertions(+), 24 deletions(-)
> >
> > diff --git a/mm/swapfile.c b/mm/swapfile.c
> > index d66141f1c452..e4c521528817 100644
> > --- a/mm/swapfile.c
> > +++ b/mm/swapfile.c
> > @@ -778,42 +778,50 @@ static int swap_cluster_setup_bad_slot(struct swap_cluster_info *cluster_info,
> >       return 0;
> >  }
> >
> > -static bool cluster_reclaim_range(struct swap_info_struct *si,
> > -                               struct swap_cluster_info *ci,
> > -                               unsigned long start, unsigned long end)
> > +static unsigned int cluster_reclaim_range(struct swap_info_struct *si,
> > +                                       struct swap_cluster_info *ci,
> > +                                       unsigned long start, unsigned int order)
> >  {
> > +     unsigned int nr_pages = 1 << order;
> > +     unsigned long offset = start, end = start + nr_pages;
> >       unsigned char *map = si->swap_map;
> > -     unsigned long offset = start;
> >       int nr_reclaim;
> >
> >       spin_unlock(&ci->lock);
> >       do {
> >               switch (READ_ONCE(map[offset])) {
> >               case 0:
> > -                     offset++;
> >                       break;
> >               case SWAP_HAS_CACHE:
> >                       nr_reclaim = __try_to_reclaim_swap(si, offset, TTRS_ANYWAY);
> > -                     if (nr_reclaim > 0)
> > -                             offset += nr_reclaim;
> > -                     else
> > +                     if (nr_reclaim < 0)
> >                               goto out;
> >                       break;
> >               default:
> >                       goto out;
> >               }
> > -     } while (offset < end);
> > +     } while (++offset < end);
>
> Change 2
>
> >  out:
> >       spin_lock(&ci->lock);
> > +
> > +     /*
> > +      * We just dropped ci->lock so cluster could be used by another
> > +      * order or got freed, check if it's still usable or empty.
> > +      */
> > +     if (!cluster_is_usable(ci, order))
> > +             return SWAP_ENTRY_INVALID;
> > +     if (cluster_is_empty(ci))
> > +             return cluster_offset(si, ci);
> > +
>
> Change 1
>
> >       /*
> >        * Recheck the range no matter reclaim succeeded or not, the slot
> >        * could have been be freed while we are not holding the lock.
> >        */
> >       for (offset = start; offset < end; offset++)
> >               if (READ_ONCE(map[offset]))
> > -                     return false;
> > +                     return SWAP_ENTRY_INVALID;
> >
> > -     return true;
> > +     return start;
> >  }
> >
> >  static bool cluster_scan_range(struct swap_info_struct *si,
> > @@ -901,7 +909,7 @@ static unsigned int alloc_swap_scan_cluster(struct swap_info_struct *si,
> >       unsigned long start = ALIGN_DOWN(offset, SWAPFILE_CLUSTER);
> >       unsigned long end = min(start + SWAPFILE_CLUSTER, si->max);
> >       unsigned int nr_pages = 1 << order;
> > -     bool need_reclaim, ret;
> > +     bool need_reclaim;
> >
> >       lockdep_assert_held(&ci->lock);
> >
> > @@ -913,20 +921,11 @@ static unsigned int alloc_swap_scan_cluster(struct swap_info_struct *si,
> >               if (!cluster_scan_range(si, ci, offset, nr_pages, &need_reclaim))
> >                       continue;
> >               if (need_reclaim) {
> > -                     ret = cluster_reclaim_range(si, ci, offset, offset + nr_pages);
> > -                     /*
> > -                      * Reclaim drops ci->lock and cluster could be used
> > -                      * by another order. Not checking flag as off-list
> > -                      * cluster has no flag set, and change of list
> > -                      * won't cause fragmentation.
> > -                      */
> > -                     if (!cluster_is_usable(ci, order))
> > -                             goto out;
> > -                     if (cluster_is_empty(ci))
> > -                             offset = start;
> > +                     found = cluster_reclaim_range(si, ci, offset, order);
> >                       /* Reclaim failed but cluster is usable, try next */
> > -                     if (!ret)
>
> Part of Change 1 (apply return value change)
>
> As I understand it, Change 1 just removes redundant checking.
> But I think another part of the behavior changed as well.
> (Maybe I don't fully understand the comment or something.)
>
> cluster_reclaim_range() can now return SWAP_ENTRY_INVALID
> if the cluster becomes unusable for the requested order
> (the !cluster_is_usable() branch returns SWAP_ENTRY_INVALID),
> and the loop then continues to the next offset to try reclaim again.
> Is this the intended behavior?

Thanks for the very careful review! I should keep the
cluster_is_usable check, or abort in some other way, to avoid touching
an unusable cluster. Will fix it.
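
To make the problem concrete, here is a minimal user-space sketch, not
the actual fix. Everything prefixed with toy_ is hypothetical and only
models the control flow discussed above: the reclaim helper reports
"cluster no longer usable" separately from "range still busy", so the
caller can stop scanning instead of moving on to the next offset. The
real code would use cluster_is_usable() and ci->lock; this sketch assumes
a plain boolean and no locking.

```c
#include <assert.h>
#include <stdbool.h>

#define SWAP_ENTRY_INVALID 0u
#define TOY_CLUSTER_SLOTS 8u

/* Hypothetical stand-in for struct swap_cluster_info plus its swap_map. */
struct toy_cluster {
	bool usable;                              /* models cluster_is_usable() */
	unsigned char map[TOY_CLUSTER_SLOTS + 1]; /* slot 0 unused: offset 0 is invalid */
};

/*
 * Models the consolidated helper: returns @start if the range is free,
 * SWAP_ENTRY_INVALID otherwise. *unusable tells the caller whether the
 * failure means "stop using this cluster" rather than "try next range".
 */
static unsigned int toy_reclaim_range(const struct toy_cluster *ci,
				      unsigned int start, unsigned int nr,
				      bool *unusable)
{
	*unusable = !ci->usable;
	if (*unusable)
		return SWAP_ENTRY_INVALID;

	for (unsigned int i = start; i < start + nr; i++)
		if (ci->map[i])
			return SWAP_ENTRY_INVALID;

	return start;
}

/* Models the caller's loop; *visited counts how many ranges were tried. */
static unsigned int toy_scan(const struct toy_cluster *ci, unsigned int nr,
			     unsigned int *visited)
{
	assert(nr > 0);

	for (unsigned int offset = 1; offset + nr <= TOY_CLUSTER_SLOTS + 1;
	     offset += nr) {
		bool unusable;
		unsigned int found;

		(*visited)++;
		found = toy_reclaim_range(ci, offset, nr, &unusable);
		if (unusable)
			break;        /* abort: do not touch this cluster again */
		if (found != SWAP_ENTRY_INVALID)
			return found; /* free range found */
		/* range busy but cluster still usable: try the next one */
	}
	return SWAP_ENTRY_INVALID;
}
```

With an unusable cluster, toy_scan() tries one range and stops. With a
usable cluster whose first range is busy, it moves on to the next range.
Returning a separate flag is only one option; a distinct return code or
a recheck in the caller would serve the same purpose.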

