Re: [PATCH v4 18/22] slab: refill sheaves from all nodes

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

From: Mateusz Guzik <mjguzik@gmail.com>
To: Vlastimil Babka <vbabka@suse.cz>
Cc: Harry Yoo <harry.yoo@oracle.com>,
	Petr Tesarik <ptesarik@suse.com>,
	 Christoph Lameter <cl@gentwo.org>,
	David Rientjes <rientjes@google.com>,
	 Roman Gushchin <roman.gushchin@linux.dev>,
	Hao Li <hao.li@linux.dev>,
	 Andrew Morton <akpm@linux-foundation.org>,
	Uladzislau Rezki <urezki@gmail.com>,
	 "Liam R. Howlett" <Liam.Howlett@oracle.com>,
	Suren Baghdasaryan <surenb@google.com>,
	 Sebastian Andrzej Siewior <bigeasy@linutronix.de>,
	Alexei Starovoitov <ast@kernel.org>,
	linux-mm@kvack.org,  linux-kernel@vger.kernel.org,
	linux-rt-devel@lists.linux.dev, bpf@vger.kernel.org,
	 kasan-dev@googlegroups.com
Subject: Re: [PATCH v4 18/22] slab: refill sheaves from all nodes
Date: Tue, 27 Jan 2026 15:28:53 +0100	[thread overview]
Message-ID: <cburjqy3r73ojiaathpxwayvq7up263m3lvrikicrkkybdj2iz@vefohvamiqr4> (raw)
In-Reply-To: <20260123-sheaves-for-all-v4-18-041323d506f7@suse.cz>

On Fri, Jan 23, 2026 at 07:52:56AM +0100, Vlastimil Babka wrote:
> __refill_objects() currently only attempts to get partial slabs from the
> local node and then allocates new slab(s). Expand it to trying also
> other nodes while observing the remote node defrag ratio, similarly to
> get_any_partial().
> 
> This will prevent allocating new slabs on a node while other nodes have
> many free slabs. It does mean sheaves will contain non-local objects in
> that case. Allocations that care about specific node will still be
> served appropriately, but might get a slowpath allocation.

While I can agree pulling memory from other nodes is necessary in some
cases, I believe the patch as proposed is way too agressive and the
commit message does not justify it.

Interestingly there were already reports concerning this, for example:
https://lore.kernel.org/oe-lkp/202601132136.77efd6d7-lkp@intel.com/T/#u

quoting:
* [vbabka:b4/sheaves-for-all-rebased] [slab]  aa8fdb9e25: will-it-scale.per_process_ops 46.5% regression

The system at hand has merely 2 nodes and it already got:

         %stddev     %change         %stddev
             \          |                \  
      7274 ± 13%     -27.0%       5310 ± 16%  perf-c2c.DRAM.local
      1458 ± 14%    +272.3%       5431 ± 10%  perf-c2c.DRAM.remote
     77502 ±  9%     -58.6%      32066 ± 11%  perf-c2c.HITM.local
    150.83 ± 12%   +2150.3%       3394 ± 12%  perf-c2c.HITM.remote
     77653 ±  9%     -54.3%      35460 ± 10%  perf-c2c.HITM.total

As in a significant increase in traffic.

Things have to be way worse on systems with 4 and more nodes.

This is not a microbenchmark-specific problem either -- any cache miss
on memory allocated like that induces interconnect traffic. That's a
real slowdown in real workloads.

Admittedly I don't know what the policy is at the moment, it may be
things already suck.

A basic test for sanity is this: suppose you have a process whose all
threads are bound to one node. absent memory shortage in the local
node and allocations which somehow explicitly request a different node,
is it going to get local memory from kmalloc et al?

To my understanding with the patch at hand the answer is no.

Then not only this particular process is penalized for its lifetime, but
everything else is penalized on top -- even ignoring straight up penalty
for interconnect traffic, there is only so much it can handle to begin
with.

Readily usable slabs in other nodes should be of no significance as long
as there are enough resources locally.

If you are looking to reduce total memory usage, I would instead check
how things work out for resuing the same backing pages for differently
sizes objects (I mean is it even implemented?) and would investigate if
additional kmalloc slab sizes would help -- there are power-of-2 jumps
all the way to 8k. Chances are decent sizes like 384 and 768 bytes would
in fact drop real memory requirement.

iow, I think this patch should be dropped at least for the time being

next prev parent reply	other threads:[~2026-01-27 14:29 UTC|newest]

Thread overview: 82+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-01-23  6:52 [PATCH v4 00/22] slab: replace cpu (partial) slabs with sheaves Vlastimil Babka
2026-01-23  6:52 ` [PATCH v4 01/22] mm/slab: add rcu_barrier() to kvfree_rcu_barrier_on_cache() Vlastimil Babka
2026-01-27 16:08   ` Liam R. Howlett
2026-01-23  6:52 ` [PATCH v4 02/22] mm/slab: fix false lockdep warning in __kfree_rcu_sheaf() Vlastimil Babka
2026-01-23 12:03   ` Sebastian Andrzej Siewior
2026-01-24 10:58     ` Harry Yoo
2026-01-23  6:52 ` [PATCH v4 03/22] slab: add SLAB_CONSISTENCY_CHECKS to SLAB_NEVER_MERGE Vlastimil Babka
2026-01-23  6:52 ` [PATCH v4 04/22] mm/slab: move and refactor __kmem_cache_alias() Vlastimil Babka
2026-01-27 16:17   ` Liam R. Howlett
2026-01-27 16:59     ` Vlastimil Babka
2026-01-23  6:52 ` [PATCH v4 05/22] mm/slab: make caches with sheaves mergeable Vlastimil Babka
2026-01-27 16:23   ` Liam R. Howlett
2026-01-23  6:52 ` [PATCH v4 06/22] slab: add sheaves to most caches Vlastimil Babka
2026-01-26  6:36   ` Hao Li
2026-01-26  8:39     ` Vlastimil Babka
2026-01-26 13:59   ` Breno Leitao
2026-01-27 16:34   ` Liam R. Howlett
2026-01-27 17:01     ` Vlastimil Babka
2026-01-29  7:24   ` Zhao Liu
2026-01-29  8:21     ` Vlastimil Babka
2026-01-30  7:15       ` Zhao Liu
2026-02-04 18:01         ` Vlastimil Babka
2026-01-23  6:52 ` [PATCH v4 07/22] slab: introduce percpu sheaves bootstrap Vlastimil Babka
2026-01-26  6:13   ` Hao Li
2026-01-26  8:42     ` Vlastimil Babka
2026-01-27 17:31   ` Liam R. Howlett
2026-01-23  6:52 ` [PATCH v4 08/22] slab: make percpu sheaves compatible with kmalloc_nolock()/kfree_nolock() Vlastimil Babka
2026-01-23 18:05   ` Alexei Starovoitov
2026-01-27 17:36   ` Liam R. Howlett
2026-01-29  8:25     ` Vlastimil Babka
2026-03-02 11:56   ` D, Suneeth
2026-03-02 12:16     ` Vlastimil Babka
2026-01-23  6:52 ` [PATCH v4 09/22] slab: handle kmalloc sheaves bootstrap Vlastimil Babka
2026-01-27 18:30   ` Liam R. Howlett
2026-01-23  6:52 ` [PATCH v4 10/22] slab: add optimized sheaf refill from partial list Vlastimil Babka
2026-01-26  7:12   ` Hao Li
2026-01-29  7:43     ` Harry Yoo
2026-01-29  8:29       ` Vlastimil Babka
2026-01-27 20:05   ` Liam R. Howlett
2026-01-29  8:01   ` Harry Yoo
2026-01-23  6:52 ` [PATCH v4 11/22] slab: remove cpu (partial) slabs usage from allocation paths Vlastimil Babka
2026-01-23 18:17   ` Alexei Starovoitov
2026-01-23  6:52 ` [PATCH v4 12/22] slab: remove SLUB_CPU_PARTIAL Vlastimil Babka
2026-01-23  6:52 ` [PATCH v4 13/22] slab: remove the do_slab_free() fastpath Vlastimil Babka
2026-01-23 18:15   ` Alexei Starovoitov
2026-01-23  6:52 ` [PATCH v4 14/22] slab: remove defer_deactivate_slab() Vlastimil Babka
2026-01-23 17:31   ` Alexei Starovoitov
2026-01-23  6:52 ` [PATCH v4 15/22] slab: simplify kmalloc_nolock() Vlastimil Babka
2026-01-23  6:52 ` [PATCH v4 16/22] slab: remove struct kmem_cache_cpu Vlastimil Babka
2026-01-23  6:52 ` [PATCH v4 17/22] slab: remove unused PREEMPT_RT specific macros Vlastimil Babka
2026-01-23  6:52 ` [PATCH v4 18/22] slab: refill sheaves from all nodes Vlastimil Babka
2026-01-27 14:28   ` Mateusz Guzik [this message]
2026-01-27 22:04     ` Vlastimil Babka
2026-01-29  9:16   ` Harry Yoo
2026-01-23  6:52 ` [PATCH v4 19/22] slab: update overview comments Vlastimil Babka
2026-01-23  6:52 ` [PATCH v4 20/22] slab: remove frozen slab checks from __slab_free() Vlastimil Babka
2026-01-29  7:16   ` Harry Yoo
2026-01-23  6:52 ` [PATCH v4 21/22] mm/slub: remove DEACTIVATE_TO_* stat items Vlastimil Babka
2026-01-29  7:21   ` Harry Yoo
2026-01-23  6:53 ` [PATCH v4 22/22] mm/slub: cleanup and repurpose some " Vlastimil Babka
2026-01-29  7:40   ` Harry Yoo
2026-01-29 15:18 ` [PATCH v4 00/22] slab: replace cpu (partial) slabs with sheaves Hao Li
2026-01-29 15:28   ` Vlastimil Babka
2026-01-29 16:06     ` Hao Li
2026-01-29 16:44       ` Liam R. Howlett
2026-01-30  4:38         ` Hao Li
2026-01-30  4:50     ` Hao Li
2026-01-30  6:17       ` Hao Li
2026-02-04 18:02       ` Vlastimil Babka
2026-02-04 18:24         ` Christoph Lameter (Ampere)
2026-02-06 16:44           ` Vlastimil Babka
2026-03-26 12:43 ` [REGRESSION] " Aishwarya Rambhadran
2026-03-26 14:42   ` Vlastimil Babka (SUSE)
2026-03-26 18:16     ` Uladzislau Rezki
2026-03-26 18:24       ` Vlastimil Babka (SUSE)
2026-03-26 18:50         ` Ryan Roberts
2026-03-27  7:54           ` Vlastimil Babka (SUSE)
2026-03-27  8:58             ` Ryan Roberts
2026-03-27 10:00               ` Harry Yoo (Oracle)
2026-03-27 11:21                 ` Vlastimil Babka (SUSE)
2026-03-27 16:24                   ` Aishwarya Rambhadran
2026-03-27  3:20     ` Harry Yoo (Oracle)

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=cburjqy3r73ojiaathpxwayvq7up263m3lvrikicrkkybdj2iz@vefohvamiqr4 \
    --to=mjguzik@gmail.com \
    --cc=Liam.Howlett@oracle.com \
    --cc=akpm@linux-foundation.org \
    --cc=ast@kernel.org \
    --cc=bigeasy@linutronix.de \
    --cc=bpf@vger.kernel.org \
    --cc=cl@gentwo.org \
    --cc=hao.li@linux.dev \
    --cc=harry.yoo@oracle.com \
    --cc=kasan-dev@googlegroups.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-rt-devel@lists.linux.dev \
    --cc=ptesarik@suse.com \
    --cc=rientjes@google.com \
    --cc=roman.gushchin@linux.dev \
    --cc=surenb@google.com \
    --cc=urezki@gmail.com \
    --cc=vbabka@suse.cz \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox