From: Dragan Stancevic
Date: Thu, 13 Apr 2023 23:16:20 -0500
Subject: Re: [LSF/MM/BPF TOPIC] BoF VM live migration over CXL memory
To: Gregory Price, David Hildenbrand
Cc: "Huang, Ying", lsf-pc@lists.linux-foundation.org,
    nil-migration@lists.linux.dev, linux-cxl@vger.kernel.org,
    linux-mm@kvack.org
Message-ID: <30f254de-5bbb-bfb9-7321-62dc70db0ba9@stancevic.com>
References: <5d1156eb-02ae-a6cc-54bb-db3df3ca0597@stancevic.com>
    <87v8i22abl.fsf@yhuang6-desk2.ccr.corp.intel.com>
    <87bkjtzu7e.fsf@yhuang6-desk2.ccr.corp.intel.com>

Hi Gregory-

On 4/12/23 11:34, Gregory Price wrote:
> On Wed, Apr 12, 2023 at 05:50:55PM +0200, David Hildenbrand wrote:
>>
>> long-term: possibly forever, controlled by user space. In practice, anything
>> longer than ~10 seconds ( best guess :) ). There can be long-term pinnings
>> that are of very short duration, we just don't know what user space is up to
>> and when it will decide to unpin.
>>
>> Assume user space requests to trigger read/write of a user space page to a
>> file: the page is pinned, DMA is started, once DMA completes the page is
>> unpinned. Short-term. User space does not control how long the page remains
>> pinned.
>>
>> In contrast:
>>
>> Example #1: mapping VM guest memory into an IOMMU using vfio for PCI
>> passthrough requires pinning the pages. Until user space decides to unmap
>> the pages from the IOMMU, the pages will remain pinned. -> long-term
>>
>> Example #2: mapping a user space address range into an IOMMU to repeatedly
>> perform RDMA using that address range requires pinning the pages. Until user
>> space decides to unregister that range, the pages remain pinned. ->
>> long-term
>>
>> Example #3: registering a user space address range with io_uring as a fixed
>> buffer, such that io_uring OPS can avoid the page table walks by simply
>> using the pinned pages that were looked up once. As long as the fixed buffer
>> remains registered, the pages stay pinned. -> long-term
>>
>> --
>> Thanks,
>>
>> David / dhildenb
>>
>
> That pretty much precludes live migration from using CXL as a transport
> mechanism, since live migration would be a user-initiated process, you
> would need what amounts to an atomic move between hosts to ensure pages
> are not left pinned.

Do you really need an atomic move between hosts? I mean, it's not really
a failure if you are in the process of migrating pages onto the switched
CXL memory and one of the pages is pulled out of CXL and back onto the
source hypervisor. The running VM's CPU can do loads and stores to
either. So it's running, it's not affected. It's just that your
migration is potentially "stalled" or "canceled". You only encounter
issues when all your pages are on CXL and the other hypervisor is
pulling pages out.

> The more i'm reading the more i'm somewhat convinced CXL memory should
> not allow pinning at all.

I think you want to be able to somehow pin the pages on one hypervisor
and unpin them on the other hypervisor. Or in some other way "pass
ownership" between the hypervisors. Right? Because of the scenario I
mention above, if your source hypervisor takes a page out of CXL, then
your destination hypervisor has a hole in the VM's address space and
can't run it.

> I suppose you could implement a new RDMA feature where the remote host's
> CXL memory is temporarily mapped, data is migrated, and then that area
> is unmapped. Basically the exact same RDMA mechanism, but using memory
> instead of network. This would make the operation a kernel-controlled
> if pin/unpin is required.

I think that would move us from the shared memory part of the CXL 3 spec
into the sections on direct memory placement. In order of preference,
that is #2 for me personally, a "backup" plan if #1, shared memory,
doesn't pan out.

> Lots to talk about.
>
> ~Gregory
>

--
--
Peace can only come as a natural consequence
of universal enlightenment -Dr. Nikola Tesla
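
P.S. A rough sketch of the two cases discussed above, as a toy Python
model and nothing resembling kernel code. All names here (`vm_state`,
`try_pull`, the "local"/"cxl" labels, the `owner` string) are made up
for illustration, and the model assumes the destination hypervisor can
only reach guest pages that sit in the switched CXL memory.

```python
# Toy model (hypothetical names): each guest page is either in the source
# hypervisor's local memory ("local") or in switched CXL memory ("cxl").

def vm_state(pages, running_on):
    """Describe the VM given page placement and which hypervisor runs it.

    pages:      dict mapping page number -> "local" or "cxl"
    running_on: "source" or "destination"
    """
    not_on_cxl = [p for p, loc in pages.items() if loc != "cxl"]
    if running_on == "source":
        # The source can load/store to both local and CXL memory, so the VM
        # keeps running either way; only the migration is held up.
        if not_on_cxl:
            return "running, migration stalled"
        return "running, ready to hand over"
    # Assumption: the destination sees only the CXL-resident pages.
    if not_on_cxl:
        return "cannot run: hole in guest address space"
    return "running"


def try_pull(pages, page, owner):
    """Source pulls a page out of CXL only while it still owns the VM."""
    if owner != "source":
        return False  # ownership was passed on; refuse the pull
    pages[page] = "local"
    return True


pages = {0: "cxl", 1: "cxl", 2: "local"}
print(vm_state(pages, "source"))       # page 2 is still local
pages[2] = "cxl"
print(vm_state(pages, "source"))       # everything is on CXL
print(try_pull(pages, 1, "destination"), vm_state(pages, "destination"))
```

The `owner` check in `try_pull` stands in for whatever "pass ownership"
mechanism ends up being used; without it, a pull after the handover
would put the destination into the "hole" state.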