From: Gregory Price <gregory.price@memverge.com>
To: Dragan Stancevic <dragan@stancevic.com>
Cc: lsf-pc@lists.linux-foundation.org, nil-migration@lists.linux.dev,
linux-cxl@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [LSF/MM/BPF TOPIC] BoF VM live migration over CXL memory
Date: Fri, 7 Apr 2023 20:05:50 -0400
Message-ID: <ZDCv3lxLbquITy8M@memverge.com>
In-Reply-To: <5d1156eb-02ae-a6cc-54bb-db3df3ca0597@stancevic.com>
On Fri, Apr 07, 2023 at 04:05:31PM -0500, Dragan Stancevic wrote:
> Hi folks-
>
> if it's not too late for the schedule...
>
> I am starting to tackle VM live migration and hypervisor clustering over
> switched CXL memory[1][2], intended for cloud virtualization types of loads.
>
> I'd be interested in doing a small BoF session with some slides and get into
> a discussion/brainstorming with other people that deal with VM/LM cloud
> loads. Among other things to discuss would be page migrations over switched
> CXL memory, shared in-memory ABI to allow VM hand-off between hypervisors,
> etc...
>
> A few of us discussed some of this under the ZONE_XMEM thread, but I figured
> it might be better to start a separate thread.
>
> If there is interest, thank you.
>
>
> [1]. High-level overview available at http://nil-migration.org/
> [2]. Based on CXL spec 3.0
>
> --
> Peace can only come as a natural consequence
> of universal enlightenment -Dr. Nikola Tesla
I've been chatting about this with folks offline, figured I'd toss my
thoughts on the issue here.

Some things to consider:
1. If secure-compute is being used, then this mechanism won't work, as
   pages will be pinned, and therefore not movable and excluded from
   using CXL memory at all.

   This issue does not exist with traditional live migration, because
   typically some kind of copy is used from one virtual space to another
   (e.g. RDMA), so pages aren't really migrated in the kernel memory
   block/NUMA node sense.
2. During the migration process, the memory must be kept from being
   migrated to another node by other means (tiering software, swap,
   etc.). The obvious way of doing this would be to migrate and
   temporarily pin the page... but going back to problem #1 we see that
   ZONE_MOVABLE and pinning are mutually exclusive. So that's
   troublesome.
3. This is changing the semantics of migration from a virtual memory
   movement to a physical memory movement. Typically you would expect
   the RDMA process for live migration to work something like...

   a) migration request arrives
   b) source host informs destination host of size requirements
   c) destination host allocates memory and passes a Virtual Address
      back to source host
   d) source host initiates an RDMA from HostA-VA to HostB-VA
   e) CPU task is migrated

   Importantly, the allocation of memory by Host B handles the
   step of creating HVA->HPA mappings, and the Extended/Nested Page
   Tables can simply be flushed and re-created after the VM is fully
   migrated.
   TL;DR: live migration is a virtual address operation,
   and node-migration is a PHYSICAL address operation; the virtual
   addresses remain the same.

   This is problematic, as it's changing the underlying semantics of the
   migration operation.
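
To make the distinction in #3 concrete, here is a toy Python model. It is
not kernel code: the `Host` class, the node names, and the frame numbers
are all made up for illustration. Copy-style live migration has the
destination allocate memory and build its own VA->PA mappings, while node
migration keeps the virtual address and only swaps the physical backing.

```python
# Toy model only: the Host class, node names, and frame numbers are hypothetical.
from dataclasses import dataclass, field


@dataclass
class Host:
    name: str
    next_free: dict = field(default_factory=dict)   # node -> next free frame number
    page_table: dict = field(default_factory=dict)  # VA -> (node, frame)
    memory: dict = field(default_factory=dict)      # (node, frame) -> contents

    def alloc(self, va, node):
        """Allocate a frame on `node` and map `va` to it."""
        frame = self.next_free.get(node, 0)
        self.next_free[node] = frame + 1
        self.page_table[va] = (node, frame)
        return (node, frame)


def copy_migrate(src, dst, node="dram"):
    """RDMA-style migration: dst allocates and maps, then the data is copied."""
    for va, pa in src.page_table.items():
        new_pa = dst.alloc(va, node)          # dst creates its own VA->PA mapping
        dst.memory[new_pa] = src.memory[pa]   # stand-in for the RDMA copy


def node_migrate(host, va, new_node):
    """Node migration: same VA, different physical backing on the same host."""
    old_pa = host.page_table[va]
    data = host.memory.pop(old_pa)
    new_pa = host.alloc(va, new_node)         # remaps va to the new frame
    host.memory[new_pa] = data


host_a, host_b = Host("A"), Host("B")
for va in (0x1000, 0x2000):
    host_a.memory[host_a.alloc(va, "dram")] = f"page@{va:#x}"

copy_migrate(host_a, host_b)          # virtual-address operation across hosts
node_migrate(host_a, 0x1000, "cxl")   # physical-address operation within a host
```

In the copy case Host B never needs to know Host A's physical layout; in
the node case the physical layout is the only thing that changes.
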
Problems #1 and #2 are head-scratchers, but maybe solvable.

Problem #3 is the meat and potatoes of the issue in my opinion. So let's
consider that a little more closely.
Generically: NIL Migration is basically a pass by reference operation.
The reference in this case is... the page tables. You need to know how
to interpret the data in the CXL memory region on the remote host, and
that's a "relative page table translation" (to coin a phrase? I'm not
sure how to best describe it).
That's... complicated to say the least.
1) Pages on the physical hardware do not need to be contiguous
2) The CFMW on source and target host do not need to be mapped at the
same place
3) There's no pre-allocation in these charts, and migration isn't
targeted, so having the source-host "expertly place" the data isn't
possible (right now, I suppose you could make kernel extensions).
4) Similar to problem #2 above, even with a pre-allocate added in, you
would need to ensure those mappings were pinned during migration,
lest the target host end up swapping a page or something.
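
A rough Python sketch of what a "relative" translation could look like for
points 1) and 2). Everything here is an assumption for illustration: the
window bases and size, the table layout, and above all the premise that
both hosts see the same device memory at the same offset inside their
respective windows (no interleave differences, no pinning handled). It
just rewrites each source HPA as window-relative and rebases it on the
target.

```python
# Hypothetical sketch: window bases, sizes, and the mapping table are made up.
PAGE_SIZE = 4096


def to_window_offset(hpa, cfmw_base, cfmw_size):
    """Convert a host physical address inside a CFMW to a window-relative offset."""
    offset = hpa - cfmw_base
    if not 0 <= offset < cfmw_size:
        raise ValueError(f"HPA {hpa:#x} is outside the CXL window")
    return offset


def rebase_mappings(src_table, src_base, dst_base, cfmw_size):
    """Rewrite guest-frame -> HPA mappings for a host that maps the window elsewhere."""
    return {
        gfn: dst_base + to_window_offset(hpa, src_base, cfmw_size)
        for gfn, hpa in src_table.items()
    }


SRC_BASE = 0x10_0000_0000   # where the source host maps the window (assumed)
DST_BASE = 0x28_0000_0000   # where the target host maps the same window (assumed)
WINDOW_SIZE = 1 << 30

# Guest frames backed by non-contiguous pages inside the window.
src_table = {0: SRC_BASE + 7 * PAGE_SIZE, 1: SRC_BASE + 2 * PAGE_SIZE}
dst_table = rebase_mappings(src_table, SRC_BASE, DST_BASE, WINDOW_SIZE)
```

Note the table still has one entry per page, which is the cost the
contiguity idea is trying to avoid.
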
An Option: Make pages physically contiguous on migration to CXL
In this case, you don't necessarily care about the Host Virtual
Addresses; what you actually care about is the structure of the pages
in memory (are they physically contiguous? or do you need to
reconstruct the contiguity by inspecting the page tables?).
If a migration API were capable of reserving large swaths of contiguous
CXL memory, you could discard individual page information and instead
send page range information, reconstructing the virtual-physical
mappings this way.
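
As a sketch of the "page range information" idea (plain Python, with
made-up frame numbers and hypothetical helper names), per-page mappings
can be collapsed into (guest start, device start, count) runs and expanded
again on the other side. When the pages are physically contiguous the
whole table becomes a single range; when they are scattered you are back
to roughly one entry per page.

```python
def pages_to_ranges(mappings):
    """Collapse {guest_frame: device_frame} into (guest_start, device_start, count) runs."""
    ranges = []
    for gfn in sorted(mappings):
        dfn = mappings[gfn]
        if ranges:
            g0, d0, n = ranges[-1]
            # Extend the last run only if both sides continue contiguously.
            if gfn == g0 + n and dfn == d0 + n:
                ranges[-1] = (g0, d0, n + 1)
                continue
        ranges.append((gfn, dfn, 1))
    return ranges


def ranges_to_pages(ranges):
    """Expand range descriptors back into per-page mappings."""
    return {g0 + i: d0 + i for g0, d0, n in ranges for i in range(n)}


# 1024 guest frames placed contiguously vs. a handful placed at random.
contiguous = {gfn: 512 + gfn for gfn in range(1024)}
scattered = {0: 90, 1: 17, 2: 18, 3: 400}

contiguous_ranges = pages_to_ranges(contiguous)
scattered_ranges = pages_to_ranges(scattered)
```
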
That's about as far as I've thought about it so far. Feel free to rip
it apart! :]
~Gregory