From: Dan Williams
Date: Fri, 30 Oct 2020 13:37:18 -0700
Subject: Re: Onlining CXL Type2 device coherent memory
To: Vikram Sethi
Cc: "linux-cxl@vger.kernel.org", "Natu, Mahesh", "Rudoff, Andy", Jeff Smith, Mark Hairgrove, "jglisse@redhat.com", David Hildenbrand, Linux MM, Linux ACPI

On Wed, Oct 28, 2020 at 4:06 PM Vikram Sethi wrote:
>
> Hello,
>
> I wanted to kick off a discussion on how Linux onlining of CXL [1] type 2 device
> Coherent memory aka Host managed device memory (HDM) will work for type 2 CXL
> devices which are available/plugged in at boot. A type 2 CXL device can be simply
> thought of as an accelerator with coherent device memory, that also has a
> CXL.cache to cache system memory.
>
> One could envision that BIOS/UEFI could expose the HDM in EFI memory map
> as conventional memory as well as in ACPI SRAT/SLIT/HMAT. However, at least
> on some architectures (arm64) EFI conventional memory available at kernel boot
> memory cannot be offlined, so this may not be suitable on all architectures.

That seems an odd restriction. Add David, linux-mm, and linux-acpi as
they might be interested / have comments on this restriction as well.

> Further, the device driver associated with the type 2 device/accelerator may
> want to save off a chunk of HDM for driver private use.
> So it seems the more appropriate model may be something like dev dax model
> where the device driver probe/open calls add_memory_driver_managed, and
> the driver could choose how much of the HDM it wants to reserve and how
> much to make generally available for application mmap/malloc.

Sure, it can always be driver managed.
The trick will be getting the platform firmware to agree to not map it
by default, but I suspect you'll have a hard time convincing platform
firmware to take that stance. The BIOS does not know, and should not
care, what OS is booting when it produces the memory map. So I think
CXL memory unplug after the fact is more realistic than trying to get
the BIOS not to map it. So, to me, it looks like arm64 needs to
reconsider its unplug stance.

> Another thing to think about is whether the kernel relies on UEFI having fully
> described NUMA proximity domains and end-end NUMA distances for HDM,
> or whether the kernel will provide some infrastructure to make use of the
> device-local affinity information provided by the device in the Coherent Device
> Attribute Table (CDAT) via a mailbox, and use that to add a new NUMA node ID
> for the HDM, and with the NUMA distances calculated by adding to the NUMA
> distance of the host bridge/Root port with the device local distance. At least
> that's how I think CDAT is supposed to work when kernel doesn't want to rely
> on BIOS tables.

The kernel can supplement the NUMA configuration from CDAT, but not if
the memory is already enumerated in the EFI Memory Map and ACPI
SRAT/HMAT. At that point CDAT is a nop because the BIOS has precluded
the OS from consuming it.

> A similar question on NUMA node ID and distances for HDM arises for CXL hotplug.
> Will the kernel rely on CDAT, and create its own NUMA node ID and patch up
> distances, or will it rely on BIOS providing PXM domain reserved at boot in
> SRAT to be used later on hotplug?

I don't expect the kernel to merge any CDAT data into the ACPI tables.
Instead, the kernel will optionally use CDAT as an alternative method to
generate Linux NUMA topology independent of ACPI SRAT. Think of it like
Linux supporting both ACPI and Open Firmware NUMA descriptions at the
same time. CDAT is its own NUMA description domain unless BIOS has
blurred the lines and pre-incorporated it into SRAT/HMAT.
That said, I think the CXL attached memory not described by EFI / ACPI
is currently the NULL set.
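
To make the distance arithmetic from the quoted proposal concrete, here is a minimal sketch in Python. It is not kernel code: the `HdmRegion` type, the `hdm_node_distances` helper, and all numbers are hypothetical, and it treats distances as plain additive integers, which is a simplification of whatever the device-local information actually encodes. It also models the point made above that CDAT is a nop once firmware has already described the memory.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class HdmRegion:
    """Hypothetical description of one HDM range (illustration only)."""
    name: str
    firmware_described: bool      # already in EFI memory map / SRAT / HMAT?
    device_local_distance: int    # assumed to be derived from the device's CDAT


def hdm_node_distances(bridge_distances: Dict[int, int],
                       region: HdmRegion) -> Optional[Dict[int, int]]:
    """Distances from each existing node to a new node for this HDM.

    bridge_distances maps an existing NUMA node ID to its distance to the
    host bridge / root port the device sits behind. Returns None when
    firmware already described the memory, since CDAT is then not consumed.
    """
    if region.firmware_described:
        return None
    # Distance to HDM = distance to host bridge + device-local distance.
    return {node: dist + region.device_local_distance
            for node, dist in bridge_distances.items()}


# Made-up topology: two existing nodes and their distance to the host bridge.
bridge = {0: 10, 1: 21}
print(hdm_node_distances(bridge, HdmRegion("accel0", False, 15)))
print(hdm_node_distances(bridge, HdmRegion("accel1", True, 15)))
```

With these made-up inputs, the first call yields a distance table for a new node and the second yields nothing, which mirrors the two cases discussed in this thread: memory left for the OS to describe versus memory the BIOS has already folded into SRAT/HMAT.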