* Re: [LSF/MM/BPF TOPIC] Strategies for memory deallocation/movement for Dynamic Capacity Pooling
2026-04-14 7:08 ` [LSF/MM/BPF TOPIC] Strategies for memory deallocation/movement for Dynamic Capacity Pooling Hannes Reinecke
@ 2026-04-15 0:26 ` Gregory Price
0 siblings, 0 replies; 2+ messages in thread
From: Gregory Price @ 2026-04-15 0:26 UTC (permalink / raw)
To: Hannes Reinecke
Cc: Jonathan Cameron, lsf-pc, linux-cxl, linux-fsdevel, linux-mm
On Tue, Apr 14, 2026 at 09:08:22AM +0200, Hannes Reinecke wrote:
> On 4/13/26 23:10, Gregory Price wrote:
> > On Mon, Apr 13, 2026 at 04:43:59PM +0100, Jonathan Cameron wrote:
> > > >
> > > > So quite some things to discuss; however, not sure if this isn't too
> > > > much of an arcane topic which should rather be directed at places like
> > > > LPC. But I'll let the PC decide.
> > >
> > > Superficially feels a bit arcane, particularly as we are currently
> > > kicking untagged memory into the long grass as there are too many
> > > open questions on how to present it at all (e.g. related to Gregory's
> > > recent work on private nodes). On recent CXL sync calls the proposal
> > > has been to do tagged memory first and only support allocation of
> > > all memory with a given tag in one go and full release.
> > >
> >
> > General consensus after last few months seems to be:
> >
> > "While technically possible, untagged memory is a bad idea for $REASONS"
> >
> > I do not think the private node case changes this; if anything, it only
> > changes where the capacity ends up.
> >
> Thing is, there will be things like CXL switches. And with that we'll get
> CXL memory behind the switch, making it possible to reshuffle memory
> 'behind the back' of the application.
> While the situation is similar to the current memory hotplug case
> (and, in fact, the mechanism on the host side will be the same I guess),
> the problem is now that we have a bit more flexibility.
>
> The reason why one would want to reshuffle memory behind a CXL switch
> is to deallocate memory from one machine to reassign it to another
> machine. But as the request is just for 'memory' (not 'this particular
> CXL card holding _that_ memory'), the admin gets to decide _which_
> of the memory areas assigned to machine A should be moved to machine B.
> But how?
>
> And that basically is the question: Can we give the admin / orchestration
> a better idea of which memory blocks should be preferred for
> reassignment?
> I'm sure there are applications which have a pretty flexible memory
> allocation strategy which, with some prodding, they would be happy to
> relinquish. But I'm equally sure there are applications which react
> extremely allergically to memory being pulled out from underneath them.
> And then there are 'modern' applications which also don't like that,
> but for them it doesn't really matter, as one can simply restart them.
>
> So it would be cool if we could address this, as then the admin /
> orchestration could make a far better choice about which memory area to
> reassign.
> And it might even help in other scenarios (VM ballooning?), too.
>
I'm a little confused by how you imagine this memory actually gets used.
1) Are you hotplugging directly into the buddy as a normal NUMA node
and letting the kernel dole out allocations to anything?
- i.e.: existing add_memory_driver_managed() interface
2) Are you trying to plop the entire dynamically added extent into
a specific workload? Something like ioremap/mremap or ZONE_DEVICE
exposed by a driver's /dev/fd ?
3) Are you reserving this region specifically for in-kernel/driver
use but not doled out to random users?
4) Are you trying to just plop an entire extent into a VM (in which
case you technically shouldn't even need to hotplug, in theory)?
5) Are you trying to just decide which memory to release based on how
much of it is used / hot / cold / etc?
I see a lot of "Wondering if..." here based on what a switch COULD do, but
divorced from real use cases, 99.999% of what COULD be done is useless.
There are basically an infinite number of ways we could shuffle this
memory around - the actual question is: what's useful?
Some use-case clarity here would be helpful.
~Gregory