* Re: [LSF/MM/BPF TOPIC] Strategies for memory deallocation/movement for Dynamic Capacity Pooling
2026-04-14 7:08 ` [LSF/MM/BPF TOPIC] Strategies for memory deallocation/movement for Dynamic Capacity Pooling Hannes Reinecke
@ 2026-04-15 0:26 ` Gregory Price
0 siblings, 0 replies; 2+ messages in thread
From: Gregory Price @ 2026-04-15 0:26 UTC (permalink / raw)
To: Hannes Reinecke
Cc: Jonathan Cameron, lsf-pc, linux-cxl, linux-fsdevel, linux-mm
On Tue, Apr 14, 2026 at 09:08:22AM +0200, Hannes Reinecke wrote:
> On 4/13/26 23:10, Gregory Price wrote:
> > On Mon, Apr 13, 2026 at 04:43:59PM +0100, Jonathan Cameron wrote:
> > > >
> > > > So quite some things to discuss; however, not sure if this isn't too
> > > > much of an arcane topic which should rather be directed at places like
> > > > LPC. But I'll let the PC decide.
> > >
> > > Superficially feels a bit arcane, particularly as we are currently
> > > kicking untagged memory into the long grass as there are too many
> > > open questions on how to present it at all (e.g. related to Gregory's
> > > recent work on private nodes). On recent CXL sync calls the proposal
> > > has been to do tagged memory first and only support allocation of
> > > all memory with a given tag in one go and full release.
> > >
> >
> > General consensus after last few months seems to be:
> >
> > "While technically possible, untagged memory is a bad idea for $REASONS"
> >
> > I do not think the private node case changes this; if anything, it only
> > changes where the capacity ends up.
> >
> Thing is, there will be things like CXL switches. And with that we'll get
> CXL memory behind the switch, making it possible to reshuffle memory
> 'behind the back' of the application.
> While the situation is similar to the current memory hotplug case
> (and, in fact, the mechanism on the host side will be the same I guess),
> the problem is now that we have a bit more flexibility.
>
> The reason why one would want to reshuffle memory behind a CXL switch
> is to deallocate memory from one machine to reassign it to another
> machine. But as the request is just for 'memory' (not 'this particular
> CXL card holding _that_ memory'), the admin gets to decide _which_
> of the memory areas assigned to machine A should be moved to machine B.
> But how?
>
> And that basically is the question: Can we give the admin / orchestration
> a better idea of which memory blocks should be preferred for
> reassignment?
> I'm sure there are applications which have a pretty flexible memory
> allocation strategy which, with some prodding, they would be happy to
> relinquish. But I'm equally sure there are applications which react
> extremely allergically to memory being pulled out from underneath them.
> And then there are 'modern' applications which also don't like that,
> but for them it doesn't really matter, as one can simply restart them.
>
> So it would be cool if we could address this, as then the admin /
> orchestration could make a far better choice about which memory area to
> reassign.
> And it might even help in other scenarios (VM ballooning?), too.
>
I'm a little confused by how you imagine this memory actually gets used.
1) Are you hotplugging directly into the buddy as a normal NUMA node
and letting the kernel dole out allocations to anything?
- i.e.: existing add_memory_driver_managed() interface
2) Are you trying to plop the entire dynamically added extent into
a specific workload? Something like ioremap/mremap or ZONE_DEVICE
exposed by a driver's /dev/fd ?
3) Are you reserving this region specifically for in-kernel/driver
use but not doled out to random users?
4) Are you trying to just plop an entire extent into a VM (in which
case you technically shouldn't even need to hotplug, in theory)?
5) Are you trying to just decide which memory to release based on how
much of it is used / hot / cold / etc?
I see a lot of "Wondering if..." here based on what a switch COULD do, but
divorced from real use cases, 99.999% of what COULD be done is useless.
There are basically an infinite number of ways we could shuffle this
memory around - the actual question is: what's useful?
Some use-case clarity here would be helpful.
~Gregory