From: Dan Williams
Date: Sat, 31 Oct 2020 09:51:23 -0700
Subject: Re: Onlining CXL Type2 device coherent memory
To: David Hildenbrand
Cc: Vikram Sethi, "linux-cxl@vger.kernel.org", "Natu, Mahesh",
    "Rudoff, Andy", Jeff Smith, Mark Hairgrove, "jglisse@redhat.com",
    Linux MM, Linux ACPI, Anshuman Khandual
In-Reply-To: <451b2571-c3e8-97d8-bfd0-f8054a1b75c5@redhat.com>

On Sat, Oct 31, 2020 at 3:21 AM David Hildenbrand wrote:
>
> On 30.10.20 21:37, Dan Williams wrote:
> > On Wed, Oct 28, 2020 at 4:06 PM Vikram Sethi wrote:
> >>
> >> Hello,
> >>
> >> I wanted to kick off a discussion on how Linux onlining of CXL [1] type 2 device
> >> Coherent memory aka Host managed device memory (HDM) will work for type 2 CXL
> >> devices which are available/plugged in at boot. A type 2 CXL device can be simply
> >> thought of as an accelerator with coherent device memory, that also has a
> >> CXL.cache to cache system memory.
> >>
> >> One could envision that BIOS/UEFI could expose the HDM in EFI memory map
> >> as conventional memory as well as in ACPI SRAT/SLIT/HMAT. However, at least
> >> on some architectures (arm64) EFI conventional memory available at kernel boot
> >> memory cannot be offlined, so this may not be suitable on all architectures.
> >
> > That seems an odd restriction. Add David, linux-mm, and linux-acpi as
> > they might be interested / have comments on this restriction as well.
> >
>
> I am missing some important details.
>
> a) What happens after offlining? Will the memory be remove_memory()'ed?
> Will the device get physically unplugged?
>
> b) What's the general purpose of the memory and its intended usage when
> *not* exposed as system RAM?
> What's the main point of treating it like
> ordinary system RAM as default?
>
> Also, can you be sure that you can offline that memory? If it's
> ZONE_NORMAL (as usually all system RAM in the initial map), there are no
> such guarantees, especially once the system ran for long enough, but
> also in other cases (e.g., shuffling), or if allocation policies change
> in the future.
>
> So I *guess* you would already have to use kernel cmdline hacks like
> "movablecore" to make it work. In that case, you can directly specify
> what you *actually* want (which I am not sure yet I completely
> understood) - e.g., something like "memmap=16G!16G" ... or something
> similar.
>
> I consider offlining+removing *boot* memory to not physically unplug it
> (e.g., a DIMM getting unplugged) abusing the memory hotunplug
> infrastructure. It's a different thing when manually adding memory like
> dax_kmem does via add_memory_driver_managed().
>
>
> Now, back to your original question: arm64 does not support physically
> unplugging DIMMs that were part of the initial map. If you'd reboot
> after unplugging a DIMM, your system would crash. We achieve that by
> disallowing to offline boot memory - we could also try to handle it in
> ACPI code. But again, most uses of offlining+removing boot memory are
> abusing the memory hotunplug infrastructure and should rather be solved
> cleanly via a different mechanism (firmware, kernel cmdline, ...).
>
> Just recently discussed in
>
> https://lkml.kernel.org/r/de8388df2fbc5a6a33aab95831ba7db4@codeaurora.org
>
> >> Further, the device driver associated with the type 2 device/accelerator may
> >> want to save off a chunk of HDM for driver private use.
> >> So it seems the more appropriate model may be something like dev dax model
> >> where the device driver probe/open calls add_memory_driver_managed, and
> >> the driver could choose how much of the HDM it wants to reserve and how
> >> much to make generally available for application mmap/malloc.
> >
> > Sure, it can always be driver managed. The trick will be getting the
> > platform firmware to agree to not map it by default, but I suspect
> > you'll have a hard time convincing platform-firmware to take that
> > stance. The BIOS does not know, and should not care what OS is booting
> > when it produces the memory map. So I think CXL memory unplug after
> > the fact is more realistic than trying to get the BIOS not to map it.
> > So, to me it looks like arm64 needs to reconsider its unplug stance.
>
> My personal opinion is, if memory isn't just "ordinary system RAM", then
> let the system know early that memory is special (as we do with
> soft-reserved).
>
> Ideally, you could configure the firmware (e.g., via BIOS setup) on what
> to do, that's the cleanest solution, but I can understand that's rather
> hard to achieve.

Yes, my hope, which is about the most influence I can have on
platform-firmware implementations, is that it marks CXL attached
memory as soft-reserved by default and allows OS policy to decide
where it goes. Barring that, for the configuration that Vikram
mentioned, the only other way to get this differentiated /
not-ordinary system-ram back to being driver managed would be to
unplug it. The soft-reserved path is cleaner.
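
As an illustration of the soft-reserved path, here is a minimal userspace sketch of how one might check whether firmware marked a range that way. It assumes the kernel reports such ranges in /proc/iomem under the resource name "Soft Reserved". The helper name soft_reserved_ranges and the sample listing are made up for the example and do not come from any real machine.

```python
import re

# Matches /proc/iomem-style lines such as
# "500000000-8ffffffff : Soft Reserved" (child entries are indented).
IOMEM_LINE = re.compile(r"^\s*([0-9a-fA-F]+)-([0-9a-fA-F]+) : (.+?)\s*$")


def soft_reserved_ranges(iomem_text):
    """Return (start, end, size_in_bytes) for each 'Soft Reserved' entry.

    Hypothetical helper for illustration; not an existing tool or API.
    """
    ranges = []
    for line in iomem_text.splitlines():
        match = IOMEM_LINE.match(line)
        if match and match.group(3) == "Soft Reserved":
            start = int(match.group(1), 16)
            end = int(match.group(2), 16)  # end address is inclusive
            ranges.append((start, end, end - start + 1))
    return ranges


# Made-up sample: 16 GiB of boot RAM plus a 16 GiB soft-reserved range
# that a device-dax instance has claimed.
SAMPLE_IOMEM = """\
00001000-0009ffff : System RAM
100000000-4ffffffff : System RAM
500000000-8ffffffff : Soft Reserved
  500000000-8ffffffff : dax0.0
"""

if __name__ == "__main__":
    for start, end, size in soft_reserved_ranges(SAMPLE_IOMEM):
        print(f"{start:#x}-{end:#x} ({size // 2**30} GiB)")
```

On a real system the text would come from reading /proc/iomem, which generally needs root to show actual addresses. A range listed this way was kept out of the general page allocator at boot, so a driver can later decide how much of it to hand back as system RAM.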