linux-mm.kvack.org archive mirror
From: Shakeel Butt <shakeel.butt@linux.dev>
To: Vlastimil Babka <vbabka@suse.cz>
Cc: Barry Song <21cnbao@gmail.com>,
	netdev@vger.kernel.org,  linux-mm@kvack.org,
	linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
	 Barry Song <v-songbaohua@oppo.com>,
	Jonathan Corbet <corbet@lwn.net>,
	 Eric Dumazet <edumazet@google.com>,
	Kuniyuki Iwashima <kuniyu@google.com>,
	 Paolo Abeni <pabeni@redhat.com>,
	Willem de Bruijn <willemb@google.com>,
	 "David S. Miller" <davem@davemloft.net>,
	Jakub Kicinski <kuba@kernel.org>,
	 Simon Horman <horms@kernel.org>,
	Suren Baghdasaryan <surenb@google.com>,
	 Michal Hocko <mhocko@suse.com>,
	Brendan Jackman <jackmanb@google.com>,
	 Johannes Weiner <hannes@cmpxchg.org>, Zi Yan <ziy@nvidia.com>,
	Yunsheng Lin <linyunsheng@huawei.com>,
	 Huacai Zhou <zhouhuacai@oppo.com>,
	Alexei Starovoitov <alexei.starovoitov@gmail.com>,
	 Harry Yoo <harry.yoo@oracle.com>,
	David Hildenbrand <david@redhat.com>,
	 Matthew Wilcox <willy@infradead.org>,
	Roman Gushchin <roman.gushchin@linux.dev>
Subject: Re: [RFC PATCH] mm: net: disable kswapd for high-order network buffer allocation
Date: Mon, 13 Oct 2025 14:35:28 -0700
Message-ID: <dhmafwxu2jj4lu6acoqdhqh46k33sbsj5jvepcfzly4c7dn2t7@ln5dgubll4ac>
In-Reply-To: <927bcdf7-1283-4ddd-bd5e-d2e399b26f7d@suse.cz>

On Mon, Oct 13, 2025 at 08:30:13PM +0200, Vlastimil Babka wrote:
> On 10/13/25 12:16, Barry Song wrote:
> > From: Barry Song <v-songbaohua@oppo.com>
> > 
> > On phones, we have observed significant phone heating when running apps
> > with high network bandwidth. This is caused by the network stack frequently
> > waking kswapd for order-3 allocations. As a result, memory reclamation becomes
> > constantly active, even though plenty of memory is still available for network
> > allocations which can fall back to order-0.
> > 
> > Commit ce27ec60648d ("net: add high_order_alloc_disable sysctl/static key")
> > introduced high_order_alloc_disable for the transmit (TX) path
> > (skb_page_frag_refill()) to mitigate some memory reclamation issues,
> > allowing the TX path to fall back to order-0 immediately, while leaving the
> > receive (RX) path (__page_frag_cache_refill()) unaffected. Users are
> > generally unaware of the sysctl and cannot easily adjust it for specific use
> > cases. Enabling high_order_alloc_disable also completely disables the
> > benefit of order-3 allocations. Additionally, the sysctl does not apply to the
> > RX path.
> > 
> > An alternative approach is to disable kswapd for these frequent
> > allocations and provide best-effort order-3 service for both TX and RX paths,
> > while removing the sysctl entirely.
> > 
> > Cc: Jonathan Corbet <corbet@lwn.net>
> > Cc: Eric Dumazet <edumazet@google.com>
> > Cc: Kuniyuki Iwashima <kuniyu@google.com>
> > Cc: Paolo Abeni <pabeni@redhat.com>
> > Cc: Willem de Bruijn <willemb@google.com>
> > Cc: "David S. Miller" <davem@davemloft.net>
> > Cc: Jakub Kicinski <kuba@kernel.org>
> > Cc: Simon Horman <horms@kernel.org>
> > Cc: Vlastimil Babka <vbabka@suse.cz>
> > Cc: Suren Baghdasaryan <surenb@google.com>
> > Cc: Michal Hocko <mhocko@suse.com>
> > Cc: Brendan Jackman <jackmanb@google.com>
> > Cc: Johannes Weiner <hannes@cmpxchg.org>
> > Cc: Zi Yan <ziy@nvidia.com>
> > Cc: Yunsheng Lin <linyunsheng@huawei.com>
> > Cc: Huacai Zhou <zhouhuacai@oppo.com>
> > Signed-off-by: Barry Song <v-songbaohua@oppo.com>
> > ---
> >  Documentation/admin-guide/sysctl/net.rst | 12 ------------
> >  include/net/sock.h                       |  1 -
> >  mm/page_frag_cache.c                     |  2 +-
> >  net/core/sock.c                          |  8 ++------
> >  net/core/sysctl_net_core.c               |  7 -------
> >  5 files changed, 3 insertions(+), 27 deletions(-)
> > 
> > diff --git a/Documentation/admin-guide/sysctl/net.rst b/Documentation/admin-guide/sysctl/net.rst
> > index 2ef50828aff1..b903bbae239c 100644
> > --- a/Documentation/admin-guide/sysctl/net.rst
> > +++ b/Documentation/admin-guide/sysctl/net.rst
> > @@ -415,18 +415,6 @@ GRO has decided not to coalesce, it is placed on a per-NAPI list. This
> >  list is then passed to the stack when the number of segments reaches the
> >  gro_normal_batch limit.
> >  
> > -high_order_alloc_disable
> > -------------------------
> > -
> > -By default the allocator for page frags tries to use high order pages (order-3
> > -on x86). While the default behavior gives good results in most cases, some users
> > -might have hit a contention in page allocations/freeing. This was especially
> > -true on older kernels (< 5.14) when high-order pages were not stored on per-cpu
> > -lists. This allows to opt-in for order-0 allocation instead but is now mostly of
> > -historical importance.
> > -
> > -Default: 0
> > -
> >  2. /proc/sys/net/unix - Parameters for Unix domain sockets
> >  ----------------------------------------------------------
> >  
> > diff --git a/include/net/sock.h b/include/net/sock.h
> > index 60bcb13f045c..62306c1095d5 100644
> > --- a/include/net/sock.h
> > +++ b/include/net/sock.h
> > @@ -3011,7 +3011,6 @@ extern __u32 sysctl_wmem_default;
> >  extern __u32 sysctl_rmem_default;
> >  
> >  #define SKB_FRAG_PAGE_ORDER	get_order(32768)
> > -DECLARE_STATIC_KEY_FALSE(net_high_order_alloc_disable_key);
> >  
> >  static inline int sk_get_wmem0(const struct sock *sk, const struct proto *proto)
> >  {
> > diff --git a/mm/page_frag_cache.c b/mm/page_frag_cache.c
> > index d2423f30577e..dd36114dd16f 100644
> > --- a/mm/page_frag_cache.c
> > +++ b/mm/page_frag_cache.c
> > @@ -54,7 +54,7 @@ static struct page *__page_frag_cache_refill(struct page_frag_cache *nc,
> >  	gfp_t gfp = gfp_mask;
> >  
> >  #if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE)
> > -	gfp_mask = (gfp_mask & ~__GFP_DIRECT_RECLAIM) |  __GFP_COMP |
> > +	gfp_mask = (gfp_mask & ~__GFP_RECLAIM) |  __GFP_COMP |
> >  		   __GFP_NOWARN | __GFP_NORETRY | __GFP_NOMEMALLOC;
> 
> I'm a bit worried about proliferating "~__GFP_RECLAIM" allocations now that
> we introduced alloc_pages_nolock() and kmalloc_nolock() where it's
> interpreted as "cannot spin" - see gfpflags_allow_spinning(). Currently it's
> fine for the page allocator itself where we have a different entry point
> that uses ALLOC_TRYLOCK, but it can affect nested allocations of all kinds
> of debugging and accounting metadata (page_owner, memcg, alloc tags for slab
> objects etc). kmalloc_nolock() relies on gfpflags_allow_spinning() fully.
> 
> I wonder if we should either:
> 
> 1) sacrifice a new __GFP flag specifically for the "!allow_spin" case to
> determine it precisely.
> 
> 2) keep __GFP_KSWAPD_RECLAIM for allocations that would drop it only to
> avoid disturbing the system (like proposed here), but that can in fact allow
> spinning. Instead, decide not to wake up kswapd for those when other
> information indicates it's an opportunistic allocation
> (~__GFP_DIRECT_RECLAIM, __GFP_NOWARN | __GFP_NORETRY | __GFP_NOMEMALLOC,
> order > 0...)
> 
> 3) something better?
> 

For the !allow_spin allocations, I think we should just add a new __GFP
flag instead of adding more complexity to other allocators, which may or
may not want a kswapd wakeup for many different reasons.
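
To make the trade-off concrete, here is a minimal user-space sketch, not
kernel code. It assumes, as described above, that gfpflags_allow_spinning()
treats a mask with no reclaim bits as "cannot spin". The SK_* names, their
bit values, and SK_GFP_NO_SPIN are hypothetical stand-ins chosen for
illustration; they are not the kernel's definitions.

```c
#include <assert.h>
#include <stdbool.h>

typedef unsigned int gfp_t;

/* Illustrative stand-ins; the real kernel bit values differ. */
#define SK_GFP_DIRECT_RECLAIM 0x01u
#define SK_GFP_KSWAPD_RECLAIM 0x02u
#define SK_GFP_RECLAIM (SK_GFP_DIRECT_RECLAIM | SK_GFP_KSWAPD_RECLAIM)
/* Hypothetical dedicated flag: the caller must not spin on locks. */
#define SK_GFP_NO_SPIN 0x04u

/* Current rule as described in the thread: no reclaim bit means "cannot spin". */
static inline bool allow_spinning_today(gfp_t gfp)
{
	return (gfp & SK_GFP_RECLAIM) != 0;
}

/* With a dedicated flag, the answer no longer depends on the reclaim bits. */
static inline bool allow_spinning_with_flag(gfp_t gfp)
{
	return (gfp & SK_GFP_NO_SPIN) == 0;
}

/* kswapd is only woken when the kswapd reclaim bit is present. */
static inline bool wakes_kswapd(gfp_t gfp)
{
	return (gfp & SK_GFP_KSWAPD_RECLAIM) != 0;
}

/* Mask used by the frag refill before the patch: only direct reclaim is dropped. */
static inline gfp_t frag_refill_mask_before(gfp_t gfp)
{
	return gfp & ~SK_GFP_DIRECT_RECLAIM;
}

/* Mask proposed by the patch: both reclaim bits are dropped. */
static inline gfp_t frag_refill_mask_after(gfp_t gfp)
{
	return gfp & ~SK_GFP_RECLAIM;
}

/* Small self-check showing the ambiguity and how a dedicated flag removes it. */
static inline void sketch_self_check(void)
{
	gfp_t after = frag_refill_mask_after(SK_GFP_RECLAIM);

	assert(!wakes_kswapd(after));            /* what the patch wants */
	assert(!allow_spinning_today(after));    /* unintended side effect */
	assert(allow_spinning_with_flag(after)); /* decoupled by the new flag */
}
```

In this sketch, dropping both reclaim bits stops the kswapd wakeup but also
makes the allocation look like a "cannot spin" one under today's rule. With
a separate flag, only callers that set it would be treated that way.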





