linux-mm.kvack.org archive mirror
From: Mateusz Guzik <mjguzik@gmail.com>
To: Vlastimil Babka <vbabka@suse.cz>
Cc: Suren Baghdasaryan <surenb@google.com>,
	 "Liam R. Howlett" <Liam.Howlett@oracle.com>,
	Christoph Lameter <cl@gentwo.org>,
	 David Rientjes <rientjes@google.com>,
	Roman Gushchin <roman.gushchin@linux.dev>,
	 Harry Yoo <harry.yoo@oracle.com>,
	Uladzislau Rezki <urezki@gmail.com>,
	linux-mm@kvack.org,  linux-kernel@vger.kernel.org,
	rcu@vger.kernel.org, maple-tree@lists.infradead.org
Subject: Re: [PATCH v5 01/14] slab: add opt-in caching layer of percpu sheaves
Date: Sat, 13 Sep 2025 16:35:03 +0200
Message-ID: <ja3gd6bdckiuanm3xu4hszpjm7euvffm3k7tmu7drh6mdel7m6@oyr635vikumk>
In-Reply-To: <20250723-slub-percpu-caches-v5-1-b792cd830f5d@suse.cz>

On Wed, Jul 23, 2025 at 03:34:34PM +0200, Vlastimil Babka wrote:
> The sheaves do not distinguish NUMA locality of the cached objects.

While currently sheaves are opt-in, to my understanding the plan is to
make this the default.

I would argue a hard requirement for a general purpose allocator in this
day and age is to provide node-local memory by default. Notably if you
have a workload which was careful to bind itself to one node, it should
not receive memory backed by other nodes unless there is no other
option. AFAIU this is satisfied with the stock allocator on the grounds
of running on a given domain, without having to explicitly demand memory
from it for everything.

I expect the lack of NUMA-awareness to result in increased accumulation
of "mismatched" memory as uptime goes up, violating the above.

Some examples of how I expect that to happen should this get expanded to
all allocations:
- wherever init happens to reap a zombie, task_struct and some more
  stuff may be "misplaced"
- even ignoring init, literally any fork/exec/exit heavy workload which
  runs on more than one node will be rife with mismatched frees as the
  scheduler moves things around and the original parent reaps children
- a process passes a file descriptor to a process on another domain and
  the latter is the last to fput
- a container creates a bunch of dentries and whacks them
etc.

In all of these cases getting unlucky means you are using non-local
memory, which in turn will result in weird anomalies which suddenly
clear themselves up if you restart the program (or which show up out of
nowhere).

Arguably, the fork thing is a problem as is and *probably* could be
reduced by asking the scheduler upfront where it would run the child
domain-wise if it had to do it right now and making fork allocate memory
from that domain.

But even with this or some other mitigation in place there would be
plenty of potential to free non-local memory, so the general problem
statement stands.

I admit though I don't have a good solution as to how to handle the
"bad" frees. Someone (I think you?) stated that one of the previous
allocators was just freeing to per-domain lists or arrays and that was
causing trouble -- perhaps this would work if it came with small limits
in place for how big these can get?

