linux-mm.kvack.org archive mirror
From: Barry Song <baohua@kernel.org>
To: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: akpm@linux-foundation.org, linux-mm@kvack.org,
	 linux-kernel@vger.kernel.org,
	Johannes Weiner <hannes@cmpxchg.org>,
	 David Hildenbrand <david@kernel.org>,
	Michal Hocko <mhocko@kernel.org>,
	 Qi Zheng <zhengqi.arch@bytedance.com>,
	Shakeel Butt <shakeel.butt@linux.dev>,
	 Lorenzo Stoakes <ljs@kernel.org>,
	Kairui Song <kasong@tencent.com>,
	 Axel Rasmussen <axelrasmussen@google.com>,
	Yuanchu Xie <yuanchu@google.com>,  Wei Xu <weixugc@google.com>,
	Wang Lian <wanglian@kylinos.cn>,  Kunwu Chan <chentao@kylinos.cn>
Subject: Re: [RFC PATCH v2] mm: Improve pgdat_balanced() to avoid over-reclamation for higher-order allocation
Date: Wed, 22 Apr 2026 18:56:26 +0800
Message-ID: <CAGsJ_4wyDqnoBXcBQL932pkg8QY79EWrbmKaVqNvm_s5RQrNFw@mail.gmail.com>
In-Reply-To: <8d4df864-2954-4eb6-b8d7-ae6595646e6e@linux.alibaba.com>

On Wed, Apr 22, 2026 at 2:59 PM Baolin Wang
<baolin.wang@linux.alibaba.com> wrote:
>
>
>
> On 4/22/26 10:18 AM, Barry Song (Xiaomi) wrote:
> > We may encounter cases where the system still has plenty of free
> > memory, but cannot satisfy higher-order allocations. On phones, we
> > have observed that bursty network transfers can cause devices to
> > heat up. Baolin and Kairui have seen similar behavior on servers.
> >
> > Currently, kswapd behaves as follows: when a higher-order allocation
> > is issued with __GFP_KSWAPD_RECLAIM, pgdat_balanced() returns false
> > because __zone_watermark_ok() fails if no suitable higher-order
> > pages exist, even when free memory is well above the high watermark.
> > As a result, kswapd_shrink_node() sets an excessively large
> > sc->nr_to_reclaim and attempts aggressive reclamation:
> >
> >   for_each_managed_zone_pgdat(zone, pgdat, z, sc->reclaim_idx) {
> >       sc->nr_to_reclaim += max(high_wmark_pages(zone), SWAP_CLUSTER_MAX);
> >   }
> >
> > We have an opportunity to re-evaluate the balance by resetting
> > sc->order to 0 after shrink_node() with the following code
> > in kswapd_shrink_node():
> >   /*
> >    * Fragmentation may mean that the system cannot be rebalanced for
> >    * high-order allocations. If twice the allocation size has been
> >    * reclaimed then recheck watermarks only at order-0 to prevent
> >    * excessive reclaim.
> >    */
> >   if (sc->order && sc->nr_reclaimed >= compact_gap(sc->order))
> >       sc->order = 0;
> >
> > But we have actually scanned and over-reclaimed far more than
> > compact_gap(sc->order). If higher-order allocations continue, we may
> > see persistently high kswapd CPU utilization coexisting with plenty of
> > free memory in the system.
> >
> > We may want to evaluate the situation up front.
> > If there is plenty of free memory, we could avoid triggering
> > reclamation with an excessively large sc->nr_to_reclaim value
> > and instead prefer compaction.
> >
> > Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
> > Cc: Johannes Weiner <hannes@cmpxchg.org>
> > Cc: David Hildenbrand <david@kernel.org>
> > Cc: Michal Hocko <mhocko@kernel.org>
> > Cc: Qi Zheng <zhengqi.arch@bytedance.com>
> > Cc: Shakeel Butt <shakeel.butt@linux.dev>
> > Cc: Lorenzo Stoakes <ljs@kernel.org>
> > Cc: Kairui Song <kasong@tencent.com>
> > Cc: Axel Rasmussen <axelrasmussen@google.com>
> > Cc: Yuanchu Xie <yuanchu@google.com>
> > Cc: Wei Xu <weixugc@google.com>
> > Co-developed-by: Wang Lian <wanglian@kylinos.cn>
> > Co-developed-by: Kunwu Chan <chentao@kylinos.cn>
> > Signed-off-by: Barry Song (Xiaomi) <baohua@kernel.org>
> > ---
>
> Thanks Barry for sending out the RFC patch for discussion.
>
> Yes, we have indeed seen reports from our customers' scenarios where
> fragmentation caused kswapd to be woken up and reclaim too many file
> folios (even when free memory was sufficient), leading to severe I/O
> contention that impacted some applications.
>
> However, I'm concerned that this patch might also have side effects,
> such as affecting system defragmentation. In some scenarios, directly
> reclaiming clean pagecache to free up space might be a faster way to

balance_pgdat() can still reclaim clean page cache even when
pgdat_balanced() returns true, provided that nr_boost_reclaim is
non-zero.

                /*
                 * If boosting is not active then only reclaim if there are no
                 * eligible zones. Note that sc.reclaim_idx is not used as
                 * buffer_heads_over_limit may have adjusted it.
                 */
                if (!nr_boost_reclaim && balanced)
                        goto out;

                /* Limit the priority of boosting to avoid reclaim writeback */
                if (nr_boost_reclaim && sc.priority == DEF_PRIORITY - 2)
                        raise_priority = false;

                /*
                 * Do not writeback or swap pages for boosted reclaim. The
                 * intent is to relieve pressure not issue sub-optimal IO
                 * from reclaim context. If no pages are reclaimed, the
                 * reclaim will be aborted.
                 */
                sc.may_writepage = !nr_boost_reclaim;
                sc.may_swap = !nr_boost_reclaim;

In bursty network scenarios, I find that nr_boost_reclaim is almost
always non-zero. So I expect clean page cache would still be reclaimed,
just with much lower kswapd pressure, since boosted reclaim does no
writeback or swap.

> defragment. At the very least, I think under defrag_mode, we should be
> more aggressive about defragmentation (including reclaiming some memory
> by kswapd).

I guess we can keep the current behavior when defrag_mode prefers
over-reclaiming to form contiguous pages. Would it be as simple as an
if (defrag_mode) check?

>
> >   -RFC v1 was "mm: net: disable kswapd for high-order network
> >    buffer allocation":
> >   https://lore.kernel.org/linux-mm/20251013101636.69220-1-21cnbao@gmail.com/
> >
> >   mm/vmscan.c | 7 +++++++
> >   1 file changed, 7 insertions(+)
> >
> > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > index bd1b1aa12581..4f9668aa8eef 100644
> > --- a/mm/vmscan.c
> > +++ b/mm/vmscan.c
> > @@ -6964,6 +6964,13 @@ static bool pgdat_balanced(pg_data_t *pgdat, int order, int highest_zoneidx)
> >               if (__zone_watermark_ok(zone, order, mark, highest_zoneidx,
> >                                       0, free_pages))
> >                       return true;
> > +             /*
> > +              * Free pages may be well above the watermark, but if
> > +              * higher-order pages are unavailable, kswapd may still
> > +              * trigger excessive reclamation.
> > +              */
> > +             if (order && compaction_suitable(zone, order, mark, highest_zoneidx))
> > +                     return true;
> >       }
> >
> >       /*
>

Thanks
Barry


Thread overview: 4+ messages
2026-04-22  2:18 Barry Song (Xiaomi)
2026-04-22  6:58 ` Baolin Wang
2026-04-22 10:56   ` Barry Song [this message]
2026-04-22 15:47 ` Johannes Weiner
