linux-mm.kvack.org archive mirror
From: Gregory Price <gregory.price@memverge.com>
To: Dragan Stancevic <dragan@stancevic.com>
Cc: lsf-pc@lists.linux-foundation.org, nil-migration@lists.linux.dev,
	linux-cxl@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [LSF/MM/BPF TOPIC] BoF VM live migration over CXL memory
Date: Fri, 7 Apr 2023 20:05:50 -0400
Message-ID: <ZDCv3lxLbquITy8M@memverge.com>
In-Reply-To: <5d1156eb-02ae-a6cc-54bb-db3df3ca0597@stancevic.com>

On Fri, Apr 07, 2023 at 04:05:31PM -0500, Dragan Stancevic wrote:
> Hi folks-
> 
> if it's not too late for the schedule...
> 
> I am starting to tackle VM live migration and hypervisor clustering over
> switched CXL memory[1][2], intended for cloud virtualization types of loads.
> 
> I'd be interested in doing a small BoF session with some slides and get into
> a discussion/brainstorming with other people that deal with VM/LM cloud
> loads. Among other things to discuss would be page migrations over switched
> CXL memory, shared in-memory ABI to allow VM hand-off between hypervisors,
> etc...
> 
> A few of us discussed some of this under the ZONE_XMEM thread, but I figured
> it might be better to start a separate thread.
> 
> If there is interest, thank you.
> 
> 
> [1]. High-level overview available at http://nil-migration.org/
> [2]. Based on CXL spec 3.0
> 
> --
> Peace can only come as a natural consequence
> of universal enlightenment -Dr. Nikola Tesla

I've been chatting about this with folks offline, so I figure I'll toss
my thoughts on the issue here.


Some things to consider:


1. If secure-compute is being used, then this mechanism won't work, as
   pages will be pinned and therefore not movable, which excludes them
   from using CXL memory at all.

   This issue does not exist with traditional live migration, because
   typically some kind of copy is used from one virtual space to another
   (e.g. RDMA), so pages aren't really migrated in the kernel memory
   block/NUMA node sense.


2. During the migration process, the memory must be prevented from
   being migrated to another node by other means (tiering software,
   swap, etc.).  The obvious way of doing this would be to migrate and
   temporarily pin the page... but going back to problem #1, we see
   that ZONE_MOVABLE and pinning are mutually exclusive.  So that's
   troublesome.


3. This is changing the semantics of migration from a virtual memory
   movement to a physical memory movement.  Typically you would expect
   the RDMA process for live migration to work something like...

   a) migration request arrives
   b) source host informs destination host of size requirements
   c) destination host allocates memory and passes a Virtual Address
      back to source host
   d) source host initiates an RDMA from HostA-VA to HostB-VA
   e) CPU task is migrated

   Importantly, the allocation of memory by Host B handles the important
   step of creating HVA->HPA mappings, and the Extended/Nested Page
   Tables can simply be flushed and re-created after the VM is fully
   migrated.

   TL;DR: live migration is a virtual address operation, and
   node-migration is a PHYSICAL address operation in which the virtual
   addresses remain the same.

   This is problematic, as it's changing the underlying semantics of the
   migration operation.
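
To make the "virtual address operation" point in #3 concrete, here is a
toy model of steps (b) through (d).  It's only a sketch: the Host class,
its bump allocator, and the addresses are made up for illustration, and
a plain per-page copy stands in for the RDMA transfer.  Step (e), moving
the CPU task, isn't modeled.

```python
from dataclasses import dataclass, field

PAGE_SIZE = 4096


@dataclass
class Host:
    """Hypothetical host: a page table (HVA -> HPA) plus backing memory."""
    name: str
    next_free_hpa: int                               # toy bump allocator
    page_table: dict = field(default_factory=dict)   # HVA -> HPA
    memory: dict = field(default_factory=dict)       # HPA -> page contents

    def allocate(self, hva_base, num_pages):
        """Step (c): back a virtual range and create the HVA->HPA mappings."""
        for i in range(num_pages):
            hva = hva_base + i * PAGE_SIZE
            self.page_table[hva] = self.next_free_hpa
            self.memory[self.next_free_hpa] = b""
            self.next_free_hpa += PAGE_SIZE
        return hva_base

    def write(self, hva, data):
        self.memory[self.page_table[hva]] = data

    def read(self, hva):
        return self.memory[self.page_table[hva]]


def live_migrate(src, dst, src_hva, dst_hva, num_pages):
    """Copy guest memory VA-to-VA; each host keeps its own physical layout."""
    # (b)/(c): the destination learns the size and allocates its own pages.
    dst.allocate(dst_hva, num_pages)
    # (d): the copy is addressed purely by virtual address on both ends.
    for i in range(num_pages):
        off = i * PAGE_SIZE
        dst.write(dst_hva + off, src.read(src_hva + off))


host_a = Host("A", next_free_hpa=0x1000_0000)
host_b = Host("B", next_free_hpa=0x9000_0000)
host_a.allocate(0x7F00_0000_0000, 2)
host_a.write(0x7F00_0000_0000, b"guest page 0")
host_a.write(0x7F00_0000_1000, b"guest page 1")
live_migrate(host_a, host_b, 0x7F00_0000_0000, 0x7F80_0000_0000, 2)
```

Note that the source never learns Host B's physical addresses: the
destination's allocation is what creates its HVA->HPA mappings, which is
exactly the step a physical-address hand-off would have to replace.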



Problem #1 and #2 are head-scratchers, but maybe solvable.

Problem #3 is the meat and potatoes of the issue in my opinion. So let's
consider that a little more closely.

Generically: NIL Migration is basically a pass by reference operation.

The reference in this case is... the page tables.  You need to know how
to interpret the data in the CXL memory region on the remote host, and
that's a "relative page table translation" (to coin a phrase? I'm not
sure how to best describe it).

That's... complicated to say the least.
1) Pages on the physical hardware do not need to be contiguous
2) The CFMW on source and target host do not need to be mapped at the
   same place
3) There's no pre-allocation in these charts, and migration isn't
   targeted, so having the source host "expertly place" the data isn't
   possible (right now; I suppose you could make kernel extensions).
4) Similar to problem #2 above, even with a pre-allocate added in, you
   would need to ensure those mappings were pinned during migration,
   lest the target host end up swapping a page or something.
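
As a rough illustration of points 1) and 2), here is what a "relative
page table translation" might look like in the simplest possible case.
This is a sketch under loud assumptions: the Window class and
rebase_mappings() are hypothetical names, both hosts are assumed to map
the same device range linearly (no interleave), and both windows are
assumed to be the same size.  Real decoder setups can be messier.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Window:
    """One host's view of the shared CXL region (toy model of a CFMW)."""
    base_hpa: int
    size: int

    def to_offset(self, hpa):
        """Host physical address -> offset relative to the window base."""
        if not (self.base_hpa <= hpa < self.base_hpa + self.size):
            raise ValueError(f"HPA {hpa:#x} is outside the window")
        return hpa - self.base_hpa

    def to_hpa(self, offset):
        """Window-relative offset -> this host's physical address."""
        if not (0 <= offset < self.size):
            raise ValueError(f"offset {offset:#x} is outside the window")
        return self.base_hpa + offset


def rebase_mappings(src_table, src_win, dst_win):
    """Rewrite {guest addr: source HPA} as {guest addr: target HPA}."""
    return {
        guest: dst_win.to_hpa(src_win.to_offset(hpa))
        for guest, hpa in src_table.items()
    }


# The two hosts map the same region at different physical bases, and the
# guest's pages are not physically contiguous inside it.
src_win = Window(base_hpa=0x2_0000_0000, size=0x4000_0000)
dst_win = Window(base_hpa=0x8_0000_0000, size=0x4000_0000)
src_table = {0x0000: 0x2_0000_3000, 0x1000: 0x2_0010_0000}
dst_table = rebase_mappings(src_table, src_win, dst_win)
```

Even in this idealized form, every page needs its own entry rewritten,
and nothing here stops either host from moving a page mid-translation,
which is the pinning concern from point 4).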



An Option:  Make pages physically contiguous on migration to CXL

In this case, you don't necessarily care about the Host Virtual
Addresses; what you actually care about is the structure of the pages
in memory (are they physically contiguous? or do you need to
reconstruct the contiguity by inspecting the page tables?).

If a migration API were capable of reserving large swaths of contiguous
CXL memory, you could discard individual page information and instead
send page range information, reconstructing the virtual-physical
mappings this way.
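
A minimal sketch of the "send ranges instead of pages" idea, assuming
the pages really were placed contiguously.  The (virt_start, off_start,
num_pages) tuple format and the coalesce()/expand() helpers are made up
for illustration; no such migration API exists as far as I know.

```python
PAGE_SIZE = 4096


def coalesce(mappings):
    """Collapse per-page {virt: offset} entries into contiguous ranges.

    Each range is (virt_start, off_start, num_pages).  A page extends the
    previous range only if it is adjacent both virtually and physically.
    """
    ranges = []
    for virt in sorted(mappings):
        off = mappings[virt]
        if ranges:
            v0, o0, n = ranges[-1]
            if virt == v0 + n * PAGE_SIZE and off == o0 + n * PAGE_SIZE:
                ranges[-1] = (v0, o0, n + 1)
                continue
        ranges.append((virt, off, 1))
    return ranges


def expand(ranges):
    """Receiver side: rebuild the per-page mappings from the ranges."""
    return {
        v0 + i * PAGE_SIZE: o0 + i * PAGE_SIZE
        for v0, o0, n in ranges
        for i in range(n)
    }


# Four contiguous pages plus one stray page elsewhere in the region.
mappings = {0x10000 + i * PAGE_SIZE: 0x200000 + i * PAGE_SIZE for i in range(4)}
mappings[0x50000] = 0x900000
ranges = coalesce(mappings)
```

The payoff scales with how contiguous the placement is: a fully
contiguous guest collapses to a single range, while a fragmented one
degenerates back to one entry per page.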



That's about as far as I've thought about it so far.  Feel free to rip
it apart! :]

~Gregory


