From: Dan Williams
Date: Thu, 27 Jan 2022 16:47:12 -0800
In-Reply-To: <0100017e9c54d4e0-c1f1a7db-e2ab-4552-a7eb-8e4b56cd9528-000000@email.amazonses.com>
Subject: Re: Should bios always mark CXL DRAM as EFI_MEMORY_SP?
To: John Groves
Cc: "linux-cxl@vger.kernel.org", Jonathan Cameron, Ben Widawsky,
    John Groves, Linux MM

[ add linux-mm since my opinion is not the only one that matters here ]

Responses inline below with only my Linux kernel developer hat on, i.e.
not necessarily the view of $current_employer:

On Thu, Jan 27, 2022 at 8:18 AM John Groves wrote:
>
> I’d like to seek some feedback and see whether a consensus exists or
> can be developed regarding how system firmware (bios/efi/etc) should
> present CXL DRAM to a system in a pre-fabric world (CXL 1.1/2.0).
>
> The CXL spec, along with the Intel documentation are pretty specific
> and useful, but one open issue seems not to be outright specified:
> should the system mark CXL-attached DRAM as “specific purpose”
> (EFI_MEMORY_SP)? Consistency across platforms is certainly desirable.
> If this behavior is not prescribed, we could end up with inconsistent
> behavior across server and bios vendors.
>
> If this is already specified, no need to read on (but please point me
> to where it’s specified).

There is no specification for how an OS handles EFI_MEMORY_SP.
Everything below is only a Linux perspective and likely any other OS you
ask will give a different perspective.

> Objective: I think everyone will likely agree that it should be
> possible to use CXL DRAM as either general-purpose memory, or via DAX,
> or a mix.

[..]

> · Currently cannot be online-converted to DAX-managed (can this work?
>   Is it intended to be working?)
>

There is no guaranteed way to un-online memory, especially ZONE_NORMAL
memory. There are heuristics to make it fail less often, but in general
it's not reliable so online-conversion to DAX-managed is not being
attempted for the general case.

> If online conversion from general-purpose to DAX is not going to work,
> it seems that the default should preserve the ability to use it either
> way: mark the memory as EFI_MEMORY_SP.

Yes, unfortunately that requires a paradigm shift for end users to make
a policy decision about memory where they did not need to make one
before. My hope is that distributions would set a default daxctl policy
to just online soft-reserved (Linux term for EFI_MEMORY_SP) memory. That
way savvy users have a control point to change the policy to varying
degrees of exclusive access through a DAX-device instance / instances,
and other users, that don't even know what EFI_MEMORY_SP is, will see
just another NUMA node by default.

> Is there a right and wrong answer re:EFI_MEMORY_SP? How important is it
> to have consistency across platforms?

The Principle of Least Surprise applies, and the vast bulk of users
simply don't know that they need to care about memory types and memory
performance classes. The ones that do know and care are also likely the
ones to be surprised if they can not guarantee 100% exclusive access,
i.e. machines purpose built to run a workload where the application gets
100% of the high performance memory.
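
To make the "control point" idea above a bit more concrete, here is a
minimal Python sketch of how a distribution policy could map onto daxctl
invocations. It only builds the command lines and does not run them. The
Policy names and the dax0.0 device name are hypothetical, invented for
illustration; the only daxctl pieces assumed are the reconfigure-device
subcommand with its --mode=system-ram, --mode=devdax and --no-online
options.

```python
from enum import Enum
from typing import List


class Policy(Enum):
    """Hypothetical policy names, invented for this sketch."""
    ONLINE_AS_RAM = "online"        # default: appears as just another NUMA node
    RAM_BUT_OFFLINE = "no-online"   # hot-added as system-ram, onlined later by the admin
    EXCLUSIVE_DAX = "devdax"        # left as a device-dax instance for one application


def daxctl_argv(device: str, policy: Policy) -> List[str]:
    """Return the daxctl command line for one device; nothing is executed."""
    argv = ["daxctl", "reconfigure-device"]
    if policy is Policy.EXCLUSIVE_DAX:
        argv.append("--mode=devdax")
    else:
        argv.append("--mode=system-ram")
        if policy is Policy.RAM_BUT_OFFLINE:
            # Assumed option: skip the automatic onlining step.
            argv.append("--no-online")
    argv.append(device)
    return argv


if __name__ == "__main__":
    # "dax0.0" is a placeholder device name.
    for p in Policy:
        print(" ".join(daxctl_argv("dax0.0", p)))
```

A default policy would pick ONLINE_AS_RAM for every soft-reserved device,
and a user who wants exclusive access would override it per device.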
The distro gets to decide the CONFIG_EFI_SOFT_RESERVE policy, and if it
chooses CONFIG_EFI_SOFT_RESERVE=y I think it should go further to ship
daxctl and a policy that onlines it by default.

https://github.com/pmem/ndctl/blob/main/Documentation/daxctl/daxctl-reconfigure-device.txt#L244

> If there is a consensus, the next question is who should express it.
> Perhaps the CXL consortium. I’m a part of that, but it seemed like the
> Linux dev community was the right place to start.

EFI_MEMORY_SP is defined as a hint, so to me that effectively kicks all
the policy questions over to OS specific / Distro specific solution
space.
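
As a rough illustration of what "defined as a hint" means in practice,
the Python sketch below models the choice an OS faces for each
conventional-memory range in the firmware memory map. It is not kernel
code. MemRange, classify and the soft_reserve_enabled flag are
hypothetical names for this sketch; the flag stands in for whatever the
OS or distro has decided (on Linux, roughly the effect of
CONFIG_EFI_SOFT_RESERVE). The EFI_MEMORY_SP and EFI_MEMORY_WB bit values
are the UEFI attribute values as I understand them.

```python
from dataclasses import dataclass

EFI_MEMORY_WB = 0x8       # write-back cacheable attribute bit
EFI_MEMORY_SP = 0x40000   # "specific purpose" attribute bit


@dataclass
class MemRange:
    """Hypothetical stand-in for one conventional-memory descriptor."""
    start: int
    size: int
    attribute: int  # attribute bits reported by firmware


def classify(r: MemRange, soft_reserve_enabled: bool) -> str:
    """Decide how the OS treats a range, honoring the hint only if enabled."""
    if soft_reserve_enabled and (r.attribute & EFI_MEMORY_SP):
        # Kept out of the general allocator; a DAX device can claim it.
        return "soft-reserved"
    # Hint absent or ignored: ordinary general-purpose memory.
    return "system-ram"


if __name__ == "__main__":
    # Placeholder address and size for a CXL-attached range.
    cxl = MemRange(0x1_0000_0000, 1 << 30, EFI_MEMORY_WB | EFI_MEMORY_SP)
    print(classify(cxl, soft_reserve_enabled=True))
    print(classify(cxl, soft_reserve_enabled=False))
```

The same firmware marking yields two different outcomes depending on the
OS-side switch, which is why the policy question ends up with the OS and
the distro rather than with the BIOS.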