From: Barry Song <21cnbao@gmail.com>
To: mhocko@suse.com
Cc: 21cnbao@gmail.com, alexei.starovoitov@gmail.com, corbet@lwn.net,
davem@davemloft.net, david@redhat.com, edumazet@google.com,
hannes@cmpxchg.org, harry.yoo@oracle.com, horms@kernel.org,
jackmanb@google.com, kuba@kernel.org, kuniyu@google.com,
linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-mm@kvack.org, linyunsheng@huawei.com,
netdev@vger.kernel.org, pabeni@redhat.com,
roman.gushchin@linux.dev, surenb@google.com,
v-songbaohua@oppo.com, vbabka@suse.cz, willemb@google.com,
willy@infradead.org, zhouhuacai@oppo.com, ziy@nvidia.com,
baolin.wang@linux.alibaba.com
Subject: Re: [RFC PATCH] mm: net: disable kswapd for high-order network buffer allocation
Date: Tue, 14 Oct 2025 16:08:12 +0800
Message-ID: <20251014080812.2985-1-21cnbao@gmail.com>
In-Reply-To: <aO37Od0VxOGmWCjm@tiehlicka>
On Tue, Oct 14, 2025 at 3:26 PM Michal Hocko <mhocko@suse.com> wrote:
>
> On Mon 13-10-25 20:30:13, Vlastimil Babka wrote:
> > On 10/13/25 12:16, Barry Song wrote:
> > > From: Barry Song <v-songbaohua@oppo.com>
> [...]
> > I wonder if we should either:
> >
> > 1) sacrifice a new __GFP flag specifically for "!allow_spin" case to
> > determine it precisely.
>
> As said in other reply I do not think this is a good fit for this
> specific case as it is all or nothing approach. Soon enough we discover
> that "no effort to reclaim/compact" hurts other usecases. So I do not
> think we need a dedicated flag for this specific case. We need a way to
> tell kswapd/kcompactd how much to try instead.
+Baolin, who may have observed the same issue.

An issue with vmscan is that kcompactd is woken up very late, only after
kswapd has reclaimed a large number of order-0 pages to satisfy an order-3
allocation.

static int balance_pgdat(pg_data_t *pgdat, int order, int highest_zoneidx)
{
	...
		balanced = pgdat_balanced(pgdat, sc.order, highest_zoneidx);
		if (!balanced && nr_boost_reclaim) {
			nr_boost_reclaim = 0;
			goto restart;
		}

		/*
		 * If boosting is not active then only reclaim if there are no
		 * eligible zones. Note that sc.reclaim_idx is not used as
		 * buffer_heads_over_limit may have adjusted it.
		 */
		if (!nr_boost_reclaim && balanced)
			goto out;
	...
		if (kswapd_shrink_node(pgdat, &sc))
			raise_priority = false;
	...
out:
	...
	/*
	 * As there is now likely space, wakeup kcompact to defragment
	 * pageblocks.
	 */
	wakeup_kcompactd(pgdat, pageblock_order, highest_zoneidx);
}

This is because, for an order-3 request, pgdat_balanced() returns true only
when at least one free page of order 3 or higher exists:

bool __zone_watermark_ok(struct zone *z, unsigned int order, unsigned long mark,
			 int highest_zoneidx, unsigned int alloc_flags,
			 long free_pages)
{
	...
	if (free_pages <= min + z->lowmem_reserve[highest_zoneidx])
		return false;

	/* If this is an order-0 request then the watermark is fine */
	if (!order)
		return true;

	/* For a high-order request, check at least one suitable page is free */
	for (o = order; o < NR_PAGE_ORDERS; o++) {
		struct free_area *area = &z->free_area[o];
		int mt;

		if (!area->nr_free)
			continue;

		for (mt = 0; mt < MIGRATE_PCPTYPES; mt++) {
			if (!free_area_empty(area, mt))
				return true;
		}

#ifdef CONFIG_CMA
		if ((alloc_flags & ALLOC_CMA) &&
		    !free_area_empty(area, MIGRATE_CMA)) {
			return true;
		}
#endif
		if ((alloc_flags & (ALLOC_HIGHATOMIC|ALLOC_OOM)) &&
		    !free_area_empty(area, MIGRATE_HIGHATOMIC)) {
			return true;
		}
	}

This appears to be incorrect: it will always lead to over-reclaim of order-0
pages in an attempt to satisfy high-order allocations.
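
To make the consequence of that check concrete, here is a small userspace
model of it. This is only a sketch: struct toy_zone, toy_free_pages() and
toy_watermark_ok() are hypothetical names invented for illustration, and the
model ignores migrate types, lowmem_reserve and alloc_flags. It only keeps
the two-step shape of the check: total free pages first, then "is any block
of the requested order or larger free".

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_NR_ORDERS 11	/* assumption: orders 0..10 */

/* Hypothetical, simplified stand-in for struct zone. */
struct toy_zone {
	unsigned long nr_free[TOY_NR_ORDERS];	/* free blocks per order */
	unsigned long min_wmark;		/* watermark, in base pages */
};

/* Total free memory in base pages, summed over all orders. */
static unsigned long toy_free_pages(const struct toy_zone *z)
{
	unsigned long pages = 0;

	for (unsigned int o = 0; o < TOY_NR_ORDERS; o++)
		pages += z->nr_free[o] << o;
	return pages;
}

/* Same two-step shape as the check quoted above, heavily simplified. */
static bool toy_watermark_ok(const struct toy_zone *z, unsigned int order)
{
	assert(order < TOY_NR_ORDERS);

	/* Step 1: enough free memory overall? */
	if (toy_free_pages(z) <= z->min_wmark)
		return false;

	/* Step 2: an order-0 request is satisfied by step 1 alone. */
	if (!order)
		return true;

	/* Step 3: a high-order request needs one free block of >= order. */
	for (unsigned int o = order; o < TOY_NR_ORDERS; o++) {
		if (z->nr_free[o])
			return true;
	}
	return false;
}
```

In this model, a zone holding 100000 free order-0 pages and nothing else
passes the order-0 check but fails the order-3 check, no matter how many more
order-0 pages are freed. Only a single free order-3 (or larger) block flips
the result, and reclaim alone does not reliably produce one.
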
I wonder if we should "goto out" earlier to wake up kcompactd when there
is plenty of memory available, even if no order-3 pages exist.
Conceptually, what I mean is:

diff --git a/mm/vmscan.c b/mm/vmscan.c
index c80fcae7f2a1..d0e03066bbaa 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -7057,9 +7057,8 @@ static int balance_pgdat(pg_data_t *pgdat, int order, int highest_zoneidx)
 		 * eligible zones. Note that sc.reclaim_idx is not used as
 		 * buffer_heads_over_limit may have adjusted it.
 		 */
-		if (!nr_boost_reclaim && balanced)
+		if (!nr_boost_reclaim && (balanced || we_have_plenty_memory_to_compact()))
 			goto out;
 
 		/* Limit the priority of boosting to avoid reclaim writeback */
 		if (nr_boost_reclaim && sc.priority == DEF_PRIORITY - 2)
 			raise_priority = false;

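
we_have_plenty_memory_to_compact() is only a placeholder in the diff above.
One possible reading, sketched below as a userspace model, is "free memory
exceeds the high watermark by at least the headroom compaction needs for this
order". Everything here is an assumption made for illustration: struct
toy_zone_totals and the toy_* functions are hypothetical, and the headroom
formula (twice the request size) is borrowed from my understanding of the
kernel's compact_gap() helper rather than from anything in this thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified view of a zone: totals only, in base pages. */
struct toy_zone_totals {
	unsigned long free_pages;
	unsigned long high_wmark;
};

/* Assumed headroom for compaction: twice the size of the request. */
static unsigned long toy_compact_gap(unsigned int order)
{
	return 2UL << order;
}

/*
 * Assumed meaning of "plenty of memory to compact": enough order-0 memory
 * is free above the high watermark that compaction has room to migrate
 * pages, even though no block of the requested order is free yet.
 */
static bool toy_plenty_memory_to_compact(const struct toy_zone_totals *z,
					 unsigned int order)
{
	/* Keep the shift in toy_compact_gap() well defined. */
	assert(order < 8 * sizeof(unsigned long) - 1);

	return z->free_pages >= z->high_wmark + toy_compact_gap(order);
}
```

With a high watermark of 1000 pages and an order-3 request, this model says
"plenty" from 1016 free pages upward, at which point kswapd could stop
reclaiming and hand over to kcompactd. Whether that threshold is the right
one is exactly the open question.
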
Thanks
Barry