Re: Onlining CXL Type2 device coherent memory

linux-mm.kvack.org archive mirror

From: David Hildenbrand <david@redhat.com>
To: Dan Williams <dan.j.williams@intel.com>,
	Vikram Sethi <vsethi@nvidia.com>
Cc: "linux-cxl@vger.kernel.org" <linux-cxl@vger.kernel.org>,
	"Natu, Mahesh" <mahesh.natu@intel.com>,
	"Rudoff, Andy" <andy.rudoff@intel.com>,
	Jeff Smith <JSMITH@nvidia.com>,
	Mark Hairgrove <mhairgrove@nvidia.com>,
	"jglisse@redhat.com" <jglisse@redhat.com>,
	Linux MM <linux-mm@kvack.org>,
	Linux ACPI <linux-acpi@vger.kernel.org>,
	Anshuman Khandual <anshuman.khandual@arm.com>
Subject: Re: Onlining CXL Type2 device coherent memory
Date: Sat, 31 Oct 2020 11:21:26 +0100
Message-ID: <451b2571-c3e8-97d8-bfd0-f8054a1b75c5@redhat.com>
In-Reply-To: <CAPcyv4jWFf0=VoA2EiXPaQphA-5z9JFO8h0Agy0dO0w6nDyorw@mail.gmail.com>

On 30.10.20 21:37, Dan Williams wrote:
> On Wed, Oct 28, 2020 at 4:06 PM Vikram Sethi <vsethi@nvidia.com> wrote:
>>
>> Hello,
>>
>> I wanted to kick off a discussion on how Linux onlining of CXL [1] type 2 device
>> Coherent memory aka Host managed device memory (HDM) will work for type 2 CXL
>> devices which are available/plugged in at boot. A type 2 CXL device can be simply
>> thought of as an accelerator with coherent device memory, that also has a
>> CXL.cache to cache system memory.
>>
>> One could envision that BIOS/UEFI could expose the HDM in EFI memory map
>> as conventional memory as well as in ACPI SRAT/SLIT/HMAT. However, at least
>> on some architectures (arm64) EFI conventional memory available at kernel boot
>> cannot be offlined, so this may not be suitable on all architectures.
> 
> That seems an odd restriction. Add David, linux-mm, and linux-acpi as
> they might be interested / have comments on this restriction as well.
> 

I am missing some important details.

a) What happens after offlining? Will the memory be remove_memory()'ed? 
Will the device get physically unplugged?

b) What's the general purpose of the memory and its intended usage when 
*not* exposed as system RAM? What's the main point of treating it like 
ordinary system RAM by default?

Also, can you be sure that you can offline that memory? If it's 
ZONE_NORMAL (as all system RAM in the initial map usually is), there are 
no such guarantees, especially once the system has been running for long 
enough, but also in other cases (e.g., shuffling), or if allocation 
policies change in the future.
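
One way to check which zone an online memory block ended up in is the 
valid_zones attribute under /sys/devices/system/memory/memoryN/, which 
names the zone of an online block (e.g. "Normal" or "Movable"). A minimal 
sketch, assuming that sysfs layout; zone_is_movable() and 
read_valid_zones() are hypothetical helpers. A "Movable" result makes a 
later offline more likely, but still does not guarantee it.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * For an online memory block, sysfs valid_zones names the zone the block
 * belongs to, e.g. "Normal" or "Movable". Returns 1 if it is ZONE_MOVABLE.
 */
int zone_is_movable(const char *valid_zones)
{
	size_t n;

	assert(valid_zones != NULL);
	/* Length of the first token, stopping at a space or newline. */
	n = strcspn(valid_zones, " \n");
	return n == strlen("Movable") && strncmp(valid_zones, "Movable", n) == 0;
}

/*
 * Read valid_zones of memory block 'block' into buf (at most len bytes).
 * Returns 0 on success, -1 if the file is missing or unreadable.
 */
int read_valid_zones(unsigned long block, char *buf, size_t len)
{
	char path[96];
	FILE *f;
	int ret = -1;

	snprintf(path, sizeof(path),
		 "/sys/devices/system/memory/memory%lu/valid_zones", block);
	f = fopen(path, "r");
	if (!f)
		return -1;
	if (fgets(buf, (int)len, f))
		ret = 0;
	fclose(f);
	return ret;
}
```

For example, a caller could read block 42 (an arbitrary, hypothetical 
block number) with read_valid_zones(42, buf, sizeof(buf)) and warn if 
zone_is_movable(buf) returns 0.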

So I *guess* you would already have to use kernel cmdline hacks like 
"movablecore" to make it work. In that case, you can directly specify 
what you *actually* want (which I am not yet sure I completely 
understand), e.g., something like "memmap=16G!16G" or something 
similar.
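
The "memmap=nn[KMG]!ss[KMG]" form is documented (x86 only) as marking the 
region from ss to ss+nn as protected memory that is not part of system 
RAM, so "memmap=16G!16G" would carve out [16G, 32G). A minimal C sketch of 
that arithmetic, assuming only the K/M/G suffixes; parse_size() and 
parse_memmap_protected() are hypothetical helpers loosely modeled on the 
kernel's memparse(), not kernel code.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Parse "nn[KMG]" at s; on success store the byte count in *out, point *end
 * past the parsed text and return 0. Returns -1 if no digits are found. */
int parse_size(const char *s, char **end, uint64_t *out)
{
	char *p;
	uint64_t v = strtoull(s, &p, 0);

	if (p == s)
		return -1;
	switch (*p) {
	case 'G': case 'g':
		v <<= 10;
		/* fall through */
	case 'M': case 'm':
		v <<= 10;
		/* fall through */
	case 'K': case 'k':
		v <<= 10;
		p++;	/* consume the suffix */
		break;
	default:
		break;
	}
	*end = p;
	*out = v;
	return 0;
}

/* Split "nn[KMG]!ss[KMG]" (e.g. "16G!16G") into size nn and start ss; the
 * protected range is then [start, start + size). Returns 0 or -1. */
int parse_memmap_protected(const char *arg, uint64_t *start, uint64_t *size)
{
	char *p;

	assert(arg && start && size);
	if (parse_size(arg, &p, size) || *p != '!')
		return -1;
	if (parse_size(p + 1, &p, start) || *p != '\0')
		return -1;
	return 0;
}
```

"movablecore=" is a different option: it sizes ZONE_MOVABLE rather than 
reserving a fixed range, and this sketch does not model it.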

I consider offlining and removing *boot* memory without physically 
unplugging it (e.g., a DIMM getting unplugged) an abuse of the memory 
hotunplug infrastructure. It's a different thing when manually adding 
memory, as dax_kmem does via add_memory_driver_managed().
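
When memory is added manually via add_memory_driver_managed(), the driver 
can decide how much of the device range to keep private. The hotplug core 
only accepts ranges aligned to the memory block size (see 
memory_block_size_bytes(); commonly 128 MiB on x86-64, but platform 
dependent), so any split has to respect that alignment. A userspace sketch 
of the arithmetic only, assuming a single contiguous device range; struct 
hdm_split and hdm_carve() are hypothetical names, and no kernel call is 
made here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result of carving up a device's HDM range. */
struct hdm_split {
	uint64_t priv_start, priv_size;	/* kept for the driver's private use */
	uint64_t ram_start, ram_size;	/* candidate for add_memory_driver_managed() */
};

/*
 * Keep at least 'reserve' bytes at the start of [start, start + size) for the
 * driver and offer the rest to the memory hotplug core. The offered part must
 * start and end on a memory block boundary ('block' must be a power of two),
 * so the private part is rounded up and any unaligned tail is left unused.
 * Returns 0 on success, -1 if no full block remains to be added.
 */
int hdm_carve(uint64_t start, uint64_t size, uint64_t reserve,
	      uint64_t block, struct hdm_split *out)
{
	uint64_t ram_start, ram_end;

	assert(block && (block & (block - 1)) == 0);
	ram_start = (start + reserve + block - 1) & ~(block - 1);
	ram_end = (start + size) & ~(block - 1);
	if (ram_start >= ram_end)
		return -1;

	out->priv_start = start;
	out->priv_size = ram_start - start;
	out->ram_start = ram_start;
	out->ram_size = ram_end - ram_start;
	return 0;
}
```

For example, with a 16 GiB range at 16 GiB, a 1 GiB reservation and 
128 MiB blocks, hdm_carve() keeps [16G, 17G) private and offers 15 GiB 
starting at 17G.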

Now, back to your original question: arm64 does not support physically 
unplugging DIMMs that were part of the initial map. If you rebooted 
after unplugging such a DIMM, your system would crash. We enforce that 
by disallowing offlining of boot memory; we could also try to handle it 
in ACPI code. But again, most uses of offlining+removing boot memory 
abuse the memory hotunplug infrastructure and should rather be solved 
cleanly via a different mechanism (firmware, kernel cmdline, ...).

Just recently discussed in

https://lkml.kernel.org/r/de8388df2fbc5a6a33aab95831ba7db4@codeaurora.org

>> Further, the device driver associated with the type 2 device/accelerator may
>> want to save off a chunk of HDM for driver private use.
>> So it seems the more appropriate model may be something like dev dax model
>> where the device driver probe/open calls add_memory_driver_managed, and
>> the driver could choose how much of the HDM it wants to reserve and how
>> much to make generally available for application mmap/malloc.
> 
> Sure, it can always be driver managed. The trick will be getting the
> platform firmware to agree to not map it by default, but I suspect
> you'll have a hard time convincing platform-firmware to take that
> stance. The BIOS does not know, and should not care what OS is booting
> when it produces the memory map. So I think CXL memory unplug after
> the fact is more realistic than trying to get the BIOS not to map it.
> So, to me it looks like arm64 needs to reconsider its unplug stance.

My personal opinion is that if memory isn't just "ordinary system RAM", 
the system should be told early that the memory is special (as we do 
with soft-reserved).
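
With soft-reserved memory, firmware marks the range early (the EFI 
"specific purpose" attribute, EFI_MEMORY_SP), and on x86 the kernel then 
lists it in /proc/iomem as "Soft Reserved" rather than "System RAM". A 
small sketch that recognizes such lines, assuming the usual 
"start-end : name" layout with hexadecimal addresses; 
iomem_soft_reserved() is a hypothetical helper, and real addresses in 
/proc/iomem are only visible to root (other users see zeros).

```c
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>
#include <string.h>

/*
 * Parse one /proc/iomem line of the form "start-end : name" (leading
 * indentation allowed). Returns 1 and fills [*start, *end] if the resource
 * is named "Soft Reserved", otherwise 0.
 */
int iomem_soft_reserved(const char *line, uint64_t *start, uint64_t *end)
{
	char name[64];

	assert(line && start && end);
	if (sscanf(line, " %" SCNx64 "-%" SCNx64 " : %63[^\n]",
		   start, end, name) != 3)
		return 0;
	return strcmp(name, "Soft Reserved") == 0;
}
```

A caller would feed each line of /proc/iomem (e.g. read with fgets()) to 
this helper and collect the ranges for which it returns 1.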

Ideally, you could configure in the firmware (e.g., via BIOS setup) what 
to do. That's the cleanest solution, but I understand it's rather hard 
to achieve.

-- 
Thanks,

David / dhildenb
