Date: Wed, 12 Apr 2023 17:50:55 +0200
From: David Hildenbrand
Organization: Red Hat
To: Gregory Price
Cc: "Huang, Ying", Dragan Stancevic, lsf-pc@lists.linux-foundation.org,
    nil-migration@lists.linux.dev, linux-cxl@vger.kernel.org,
    linux-mm@kvack.org
Subject: Re: [LSF/MM/BPF TOPIC] BoF VM live migration over CXL memory

On 12.04.23 17:26, Gregory Price wrote:
> On Wed, Apr 12, 2023 at 10:38:04AM +0200, David Hildenbrand wrote:
>> On 12.04.23 04:54, Huang, Ying wrote:
>>> Gregory Price writes:
>>>
>>>> On Tue, Apr 11, 2023 at 02:37:50PM +0800, Huang, Ying wrote:
>>>>> Gregory Price writes:
>>>>>
>>>>> [snip]
>>>>>
>>>>>> 2. During the migration process, the memory needs to be forced not to be
>>>>>> migrated to another node by other means (tiering software, swap,
>>>>>> etc). The obvious way of doing this would be to migrate and
>>>>>> temporarily pin the page... but going back to problem #1 we see that
>>>>>> ZONE_MOVABLE and Pinning are mutually exclusive. So that's
>>>>>> troublesome.
>>>>>
>>>>> Can we use memory policy (cpusets, mbind(), set_mempolicy(), etc.) to
>>>>> avoid move pages out of CXL.mem node? Now, there are gaps in tiering,
>>>>> but I think it is fixable.
>>>>>
>>>>> Best Regards,
>>>>> Huang, Ying
>>>>>
>>>>> [snip]
>>>>
>>>> That feels like a hack/bodge rather than a proper solution to me.
>>>>
>>>> Maybe this is an affirmative argument for the creation of an EXMEM
>>>> zone.
>>>
>>> Let's start with requirements. What is the requirements for a new zone
>>> type?
>>
>> I'm still scratching my head regarding this. I keep hearing all different
>> kind of statements that just add more confusions "we want it to be
>> hotunpluggable" "we want to allow for long-term pinning memory" "but we
>> still want it to be movable" "we want to place some unmovable allocations on
>> it". Huh?
>>
>> Just to clarify: ZONE_MOVABLE allows for pinning. It just doesn't allow for
>> long-term pinning of memory.
>>
>
> I apologize for the confusion, this is my fault. I had assumed that
> since dax regions can't be pinned, subsequent nodes backed by a dax
> device could not be pinned. In testing this, this is not the case.
>
> Re: long-term pinning, can you be more explicit as to what is considered
> long-term? Minutes? hours? days? etc

long-term: possibly forever, controlled by user space. In practice,
anything longer than ~10 seconds ( best guess :) ). There can be
long-term pinnings that are of very short duration; we just don't know
what user space is up to and when it will decide to unpin.

Assume user space requests to trigger read/write of a user space page to
a file: the page is pinned, DMA is started, once DMA completes the page
is unpinned. Short-term. User space does not control how long the page
remains pinned.

In contrast:

Example #1: mapping VM guest memory into an IOMMU using vfio for PCI
passthrough requires pinning the pages. Until user space decides to
unmap the pages from the IOMMU, the pages will remain pinned.
-> long-term

Example #2: mapping a user space address range into an IOMMU to
repeatedly perform RDMA using that address range requires pinning the
pages. Until user space decides to unregister that range, the pages
remain pinned.
-> long-term

Example #3: registering a user space address range with io_uring as a
fixed buffer, such that io_uring OPS can avoid the page table walks by
simply using the pinned pages that were looked up once. As long as the
fixed buffer remains registered, the pages stay pinned.
-> long-term

-- 
Thanks,

David / dhildenb
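
To make the distinction in the mail above concrete, here is a toy model in
plain C. It is only a sketch, not kernel code: every type and function name
below is made up for illustration. The one assumption it encodes is the rule
stated above, namely that a pin counts as long-term when user space decides
when to unpin, regardless of how long the pin actually lasts. (In the kernel
itself, callers mark such pins with the FOLL_LONGTERM GUP flag.)

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Who decides when the pin is dropped (hypothetical model type). */
enum unpin_trigger {
    UNPIN_ON_DMA_COMPLETION, /* kernel unpins as soon as the I/O finishes */
    UNPIN_ON_USER_REQUEST,   /* pinned until user space unmaps/unregisters */
};

enum pin_class { PIN_SHORT_TERM, PIN_LONG_TERM };

struct pin_scenario {
    const char *name;
    enum unpin_trigger trigger;
};

/* The four cases described in the mail. */
static const struct pin_scenario scenarios[] = {
    { "file read/write DMA",      UNPIN_ON_DMA_COMPLETION },
    { "vfio IOMMU mapping",       UNPIN_ON_USER_REQUEST },
    { "RDMA memory registration", UNPIN_ON_USER_REQUEST },
    { "io_uring fixed buffer",    UNPIN_ON_USER_REQUEST },
};

/* A pin is long-term iff user space controls when it ends. */
enum pin_class classify_pin(enum unpin_trigger trigger)
{
    return trigger == UNPIN_ON_USER_REQUEST ? PIN_LONG_TERM : PIN_SHORT_TERM;
}

/* ZONE_MOVABLE allows pinning, just not long-term pinning. */
bool movable_zone_allows(enum pin_class cls)
{
    return cls == PIN_SHORT_TERM;
}

/* Look up a scenario by name; returns NULL if it is not in the table. */
const struct pin_scenario *find_scenario(const char *name)
{
    for (size_t i = 0; i < sizeof(scenarios) / sizeof(scenarios[0]); i++)
        if (strcmp(scenarios[i].name, name) == 0)
            return &scenarios[i];
    return NULL;
}
```

Calling classify_pin() on the trigger of each table entry yields
PIN_SHORT_TERM for the file I/O case and PIN_LONG_TERM for the vfio, RDMA and
io_uring cases; movable_zone_allows() then says which of them could stay on
ZONE_MOVABLE memory. Note the model deliberately has no duration field: the
~10 second figure is a best guess, the controlling party is what matters.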