From: Dan Williams
Date: Mon, 2 Nov 2020 10:03:16 -0800
Subject: Re: Onlining CXL Type2 device coherent memory
To: David Hildenbrand
Cc: Vikram Sethi, "linux-cxl@vger.kernel.org", "Natu, Mahesh", "Rudoff, Andy", Jeff Smith, Mark Hairgrove, "jglisse@redhat.com", Linux MM, Linux ACPI, Anshuman Khandual, "alex.williamson@redhat.com", Samer El-Haj-Mahmoud, Shanker Donthineni, Joao Martins

On Mon, Nov 2, 2020 at 9:53 AM David Hildenbrand wrote:
>
> On 02.11.20 17:17, Vikram Sethi wrote:
> > Hi David,
> >> From: David Hildenbrand
> >> On 31.10.20 17:51, Dan Williams wrote:
> >>> On Sat, Oct 31, 2020 at 3:21 AM David Hildenbrand wrote:
> >>>>
> >>>> On 30.10.20 21:37, Dan Williams wrote:
> >>>>> On Wed, Oct 28, 2020 at 4:06 PM Vikram Sethi wrote:
> >>>>>>
> >>>>>> Hello,
> >>>>>>
> >>>>>> I wanted to kick off a discussion on how Linux onlining of CXL [1] type 2
> >>>>>> device coherent memory, aka host-managed device memory (HDM), will work
> >>>>>> for type 2 CXL devices that are available/plugged in at boot. A type 2
> >>>>>> CXL device can be simply thought of as an accelerator with coherent
> >>>>>> device memory that also has a CXL.cache to cache system memory.
> >>>>>>
> >>>>>> One could envision that BIOS/UEFI could expose the HDM in the EFI memory
> >>>>>> map as conventional memory as well as in ACPI SRAT/SLIT/HMAT.
> >>>>>> However, at least on some architectures (arm64), EFI conventional memory
> >>>>>> available at kernel boot cannot be offlined, so this may not be suitable
> >>>>>> on all architectures.
> >>>>>
> >>>>> That seems an odd restriction. Adding David, linux-mm, and linux-acpi as
> >>>>> they might be interested in / have comments on this restriction as well.
> >>>>>
> >>>>
> >>>> I am missing some important details.
> >>>>
> >>>> a) What happens after offlining? Will the memory be remove_memory()'ed?
> >>>> Will the device get physically unplugged?
> >>>>
> > Not always IMO. If the device was getting reset, the HDM memory is going to be
> > unavailable while the device is reset. Offlining the memory around the reset would
>
> Ouch, that speaks IMHO completely against exposing it as System RAM by
> default.
>
> > be sufficient, but depending on whether the driver had done the add_memory() in
> > probe, it perhaps would be onerous to have to remove_memory() as well before
> > reset, and then add it back after reset. I realize you're saying such a procedure
> > would be abusing the hotplug framework, and we could perhaps require that memory
> > be removed prior to reset, but it is not clear to me that it *must* be removed for
> > correctness.
> >
> > Another use case of offlining without removing HDM could be around
> > virtualization/passing the entire device with its memory to a VM. If the device
> > was being used in the host kernel, and is then unbound and bound to vfio-pci
> > (vfio-cxl?), would we expect vfio-pci to call add_memory_driver_managed()?
>
> At least for passing through memory to VMs (via KVM), you don't actually
> need struct pages / memory exposed to the buddy via
> add_memory_driver_managed(). Actually, doing that sounds like the wrong
> approach.
>
> E.g., you would "allocate" the memory via devdax/dax_hmat and directly
> map the resulting device into guest address space.
> At least that's what
> some people are doing with

...and Joao is working to see if the host kernel can skip allocating
'struct page', or allocate it on demand if the guest ever requests host
kernel services on its memory. Typically it does not, so the host
'struct page' space for devdax memory ranges goes to waste.
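For context on the offline-around-reset scenario: once HDM is exposed as System RAM, offlining is done per memory block through sysfs. The blocks covering a physical range are `memoryN`, with N = address / block_size_bytes, and an administrator writes `offline` to each block's `state` file (and `online` or `online_movable` to bring it back). A minimal Python sketch, assuming a hypothetical 1 GiB HDM range at 64 GiB and 128 MiB blocks; the helper names are hypothetical, and on a real system the block size should be read from `/sys/devices/system/memory/block_size_bytes`:

```python
"""List the memory-block sysfs 'state' files covering a physical range.

Sketch only: helper names and the example range are hypothetical.
"""

SYSFS_MEMORY = "/sys/devices/system/memory"


def read_block_size(path: str = SYSFS_MEMORY + "/block_size_bytes") -> int:
    """block_size_bytes is exported as a hexadecimal string."""
    with open(path) as f:
        return int(f.read().strip(), 16)


def memory_blocks(base: int, size: int, block_size: int) -> range:
    """Indices N of every memoryN block overlapping [base, base + size)."""
    if size <= 0 or block_size <= 0:
        raise ValueError("size and block_size must be positive")
    first = base // block_size
    last = (base + size - 1) // block_size
    return range(first, last + 1)


def state_paths(base: int, size: int, block_size: int) -> list[str]:
    """sysfs 'state' files one would write 'offline' to for this range."""
    return [f"{SYSFS_MEMORY}/memory{n}/state"
            for n in memory_blocks(base, size, block_size)]


if __name__ == "__main__":
    # Hypothetical HDM range: 1 GiB at 64 GiB, with 128 MiB memory blocks.
    for p in state_paths(64 << 30, 1 << 30, 128 << 20):
        print(p)
```

Note that writing `offline` can still fail if a block holds unmovable allocations, which is one reason where the memory gets onlined matters for a reset flow like the one described.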
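To put a rough number on the wasted host 'struct page' space for devdax ranges, a back-of-the-envelope Python sketch, assuming 4 KiB base pages and a 64-byte struct page (typical on x86_64, but architecture- and config-dependent); `memmap_bytes` is a hypothetical helper:

```python
"""Estimate memmap ('struct page') overhead for a memory range.

Assumes 4 KiB base pages and a 64-byte struct page; both vary by
architecture and kernel configuration.
"""

PAGE_SIZE = 4096        # bytes per base page (assumption)
STRUCT_PAGE_SIZE = 64   # bytes per struct page (assumption)


def memmap_bytes(range_bytes: int,
                 page_size: int = PAGE_SIZE,
                 struct_page_size: int = STRUCT_PAGE_SIZE) -> int:
    """Bytes of struct page metadata needed to cover range_bytes."""
    pages = -(-range_bytes // page_size)  # ceiling division
    return pages * struct_page_size


if __name__ == "__main__":
    tib = 1 << 40
    overhead = memmap_bytes(tib)
    print(f"1 TiB of devdax memory -> {overhead >> 30} GiB of struct page "
          f"({100 * overhead / tib:.2f}% overhead)")
```

Under these assumptions every TiB of devdax memory costs 16 GiB (about 1.56%) of host memory for 'struct page' that a passthrough guest never uses.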