linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Barry Song <21cnbao@gmail.com>
To: mhocko@suse.com
Cc: 21cnbao@gmail.com, alexei.starovoitov@gmail.com, corbet@lwn.net,
	davem@davemloft.net, david@redhat.com, edumazet@google.com,
	hannes@cmpxchg.org, harry.yoo@oracle.com, horms@kernel.org,
	jackmanb@google.com, kuba@kernel.org, kuniyu@google.com,
	linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org, linyunsheng@huawei.com,
	netdev@vger.kernel.org, pabeni@redhat.com,
	roman.gushchin@linux.dev, surenb@google.com,
	v-songbaohua@oppo.com, vbabka@suse.cz, willemb@google.com,
	willy@infradead.org, zhouhuacai@oppo.com, ziy@nvidia.com,
	baolin.wang@linux.alibaba.com
Subject: Re: [RFC PATCH] mm: net: disable kswapd for high-order network buffer allocation
Date: Tue, 14 Oct 2025 16:08:12 +0800	[thread overview]
Message-ID: <20251014080812.2985-1-21cnbao@gmail.com> (raw)
In-Reply-To: <aO37Od0VxOGmWCjm@tiehlicka>

On Tue, Oct 14, 2025 at 3:26 PM Michal Hocko <mhocko@suse.com> wrote:
>
> On Mon 13-10-25 20:30:13, Vlastimil Babka wrote:
> > On 10/13/25 12:16, Barry Song wrote:
> > > From: Barry Song <v-songbaohua@oppo.com>
> [...]
> > I wonder if we should either:
> >
> > 1) sacrifice a new __GFP flag specifically for "!allow_spin" case to
> > determine it precisely.
>
> As said in other reply I do not think this is a good fit for this
> specific case as it is all or nothing approach. Soon enough we discover
> that "no effort to reclaim/compact" hurts other usecases. So I do not
> think we need a dedicated flag for this specific case. We need a way to
> tell kswapd/kcompactd how much to try instead.

+Baolin, who may have observed the same issue.

An issue with vmscan is that kcompactd is woken up very late, only after
reclaiming a large number of order-0 pages to satisfy an order-3
application.

static int balance_pgdat(pg_data_t *pgdat, int order, int highest_zoneidx)
{

...
                balanced = pgdat_balanced(pgdat, sc.order, highest_zoneidx);
                if (!balanced && nr_boost_reclaim) {
                        nr_boost_reclaim = 0;
                        goto restart;
                }

                /*
                 * If boosting is not active then only reclaim if there are no
                 * eligible zones. Note that sc.reclaim_idx is not used as
                 * buffer_heads_over_limit may have adjusted it.
                 */
                if (!nr_boost_reclaim && balanced)
                        goto out;
...
                if (kswapd_shrink_node(pgdat, &sc))
                        raise_priority = false;
...

out:

                ...
                /*
                 * As there is now likely space, wakeup kcompact to defragment
                 * pageblocks.
                 */
                wakeup_kcompactd(pgdat, pageblock_order, highest_zoneidx);
}

As pgdat_balanced() needs at least one 3-order pages to return true:

bool __zone_watermark_ok(struct zone *z, unsigned int order, unsigned long mark,
                         int highest_zoneidx, unsigned int alloc_flags,
                         long free_pages)
{
        ...  
        if (free_pages <= min + z->lowmem_reserve[highest_zoneidx])
                return false;

        /* If this is an order-0 request then the watermark is fine */
        if (!order)
                return true;

        /* For a high-order request, check at least one suitable page is free */
        for (o = order; o < NR_PAGE_ORDERS; o++) {
                struct free_area *area = &z->free_area[o];
                int mt;

                if (!area->nr_free)
                        continue;

                for (mt = 0; mt < MIGRATE_PCPTYPES; mt++) {
                        if (!free_area_empty(area, mt)) 
                                return true;
                }    

#ifdef CONFIG_CMA
                if ((alloc_flags & ALLOC_CMA) &&
                    !free_area_empty(area, MIGRATE_CMA)) {
                        return true;
                }    
#endif
                if ((alloc_flags & (ALLOC_HIGHATOMIC|ALLOC_OOM)) &&
                    !free_area_empty(area, MIGRATE_HIGHATOMIC)) {
                        return true;
                }

}

This appears to be incorrect and will always lead to over-reclamation in order0
to satisfy high-order applications.

I wonder if we should "goto out" earlier to wake up kcompactd when there
is plenty of memory available, even if no order-3 pages exist.

Conceptually, what I mean is:

diff --git a/mm/vmscan.c b/mm/vmscan.c
index c80fcae7f2a1..d0e03066bbaa 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -7057,9 +7057,8 @@ static int balance_pgdat(pg_data_t *pgdat, int order, int highest_zoneidx)
                 * eligible zones. Note that sc.reclaim_idx is not used as
                 * buffer_heads_over_limit may have adjusted it.
                 */
-               if (!nr_boost_reclaim && balanced)
+               if (!nr_boost_reclaim && (balanced || we_have_plenty_memory_to_compact()))
                        goto out;

                /* Limit the priority of boosting to avoid reclaim writeback */
                if (nr_boost_reclaim && sc.priority == DEF_PRIORITY - 2)
                        raise_priority = false;


Thanks
Barry


  reply	other threads:[~2025-10-14  8:15 UTC|newest]

Thread overview: 36+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-10-13 10:16 Barry Song
2025-10-13 18:30 ` Vlastimil Babka
2025-10-13 21:35   ` Shakeel Butt
2025-10-13 21:53     ` Alexei Starovoitov
2025-10-13 22:25       ` Shakeel Butt
2025-10-13 22:46   ` Roman Gushchin
2025-10-14  4:31     ` Barry Song
2025-10-14  7:24     ` Michal Hocko
2025-10-14  7:26   ` Michal Hocko
2025-10-14  8:08     ` Barry Song [this message]
2025-10-14 14:27     ` Shakeel Butt
2025-10-14 15:14       ` Michal Hocko
2025-10-14 17:22         ` Shakeel Butt
2025-10-15  6:21           ` Michal Hocko
2025-10-15 18:26             ` Shakeel Butt
2025-10-13 18:53 ` Eric Dumazet
2025-10-14  3:58   ` Barry Song
2025-10-14  5:07     ` Eric Dumazet
2025-10-14  6:43       ` Barry Song
2025-10-14  7:01         ` Eric Dumazet
2025-10-14  8:17           ` Barry Song
2025-10-14  8:25             ` Eric Dumazet
2025-10-13 21:56 ` Matthew Wilcox
2025-10-14  4:09   ` Barry Song
2025-10-14  5:04     ` Eric Dumazet
2025-10-14  8:58       ` Barry Song
2025-10-14  9:49         ` Eric Dumazet
2025-10-14 10:19           ` Barry Song
2025-10-14 10:39             ` Eric Dumazet
2025-10-14 20:17               ` Barry Song
2025-10-15  6:39                 ` Eric Dumazet
2025-10-15  7:35                   ` Barry Song
2025-10-15 16:39                     ` Suren Baghdasaryan
2025-10-14 14:37             ` Shakeel Butt
2025-10-14 20:28               ` Barry Song
2025-10-15 18:13                 ` Shakeel Butt

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20251014080812.2985-1-21cnbao@gmail.com \
    --to=21cnbao@gmail.com \
    --cc=alexei.starovoitov@gmail.com \
    --cc=baolin.wang@linux.alibaba.com \
    --cc=corbet@lwn.net \
    --cc=davem@davemloft.net \
    --cc=david@redhat.com \
    --cc=edumazet@google.com \
    --cc=hannes@cmpxchg.org \
    --cc=harry.yoo@oracle.com \
    --cc=horms@kernel.org \
    --cc=jackmanb@google.com \
    --cc=kuba@kernel.org \
    --cc=kuniyu@google.com \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linyunsheng@huawei.com \
    --cc=mhocko@suse.com \
    --cc=netdev@vger.kernel.org \
    --cc=pabeni@redhat.com \
    --cc=roman.gushchin@linux.dev \
    --cc=surenb@google.com \
    --cc=v-songbaohua@oppo.com \
    --cc=vbabka@suse.cz \
    --cc=willemb@google.com \
    --cc=willy@infradead.org \
    --cc=zhouhuacai@oppo.com \
    --cc=ziy@nvidia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox