Date: Fri, 14 Mar 2025 11:14:50 +0000
From: Jonathan Cameron
To: Gregory Price
Subject: Re: [LSF/MM] CXL Boot to Bash - Section 0: ACPI and Linux Resources
Message-ID: <20250314111450.000011f2@huawei.com>
References: <20250313165539.000001f4@huawei.com>

On Thu, 13 Mar 2025 13:30:58 -0400
Gregory Price wrote:

> On Thu, Mar 13, 2025 at 04:55:39PM +0000, Jonathan Cameron wrote:
> > 
> > Maybe ignore Generic Initiators for this doc. They are relevant for
> > CXL but in the fabric they only matter for type 1 / 2 devices not
> > memory and only if the BIOS wants to do HMAT for end to end. Gets
> > more fun when they are in the host side of the root bridge.
> > 
> 
> Fair, I wanted to reference the proposals but I personally don't have a
> strong understanding of this yet. Dave Jiang mentioned wanting to write
> some info on CDAT with some reference to the Generic Port work as well.
> 
> Some help understanding this a little better would be very much
> appreciated, but I like your summary below. Noted for updated version.
> 
> > # Generic Port
> > 
> > In the scenario where CXL memory devices are not present at boot, or
> > not configured by the BIOS, or the BIOS has not provided full HMAT
> > descriptions for the configured memory, we may still want to
> > generate proximity domain configurations for those devices.
> > The Generic Port structures are intended to fill this gap, so
> > that performance information can still be utilized when the
> > devices are available at runtime by combining host information
> > with that discovered from devices.
> > 
> > Or just
> > # Generic Ports
> > 
> > These are fun ;)
> > 
> > > 
> > > ====
> > > HMAT
> > > ====
> > > The Heterogeneous Memory Attributes Table contains information such as
> > > cache attributes and bandwidth and latency details for memory proximity
> > > domains. For the purpose of this document, we will only discuss the
> > > SSLIB entry.
> > 
> > No fun. You miss Intel's extensions to memory-side caches ;)
> > (which is wise!)
> > 
> 
> Yes yes, but I'm trying to be nice. I'm debating on writing the Section
> 4 interleave addendum on Zen5 too :P

What do they get up to? I've not seen that one yet! May be a case of
'Hold my beer' for these crazies.

> 
> > > 
> > > ==================
> > > NUMA node creation
> > > ==================
> > > NUMA nodes are *NOT* hot-pluggable. All *POSSIBLE* NUMA nodes are
> > > identified at `__init` time, more specifically during `mm_init`.
> > > 
> > > What this means is that the CEDT and SRAT must contain sufficient
> > > `proximity domain` information for Linux to identify how many NUMA
> > > nodes are required (and what memory regions to associate with them).
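A minimal sketch of how the Generic Port and HMAT SSLIB material quoted above fits together, assuming a simplified, already-decoded view of an SSLIB structure rather than the raw packed table. The names Sslib, value and end_to_end are hypothetical, the sample numbers are made up, and the combining rule (latencies add along the path, bandwidth is limited by the slowest leg) is a simplification, not the kernel's CXL code. A raw entry of 0 is treated as "not provided" and 0xFFFF as "unreachable"; other entries are scaled by the Entry Base Unit. Check the ACPI HMAT definition for the units that apply to a given table revision.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

NOT_PROVIDED = 0x0000  # raw entry meaning "no data for this pair"
UNREACHABLE = 0xFFFF   # raw entry meaning "target unreachable from initiator"


@dataclass
class Sslib:
    """Hypothetical decoded view of one HMAT SSLIB structure (not a parser)."""
    data_type: int              # e.g. 0 = access latency, 3 = access bandwidth
    entry_base_unit: int        # multiplier applied to every raw entry
    initiator_pds: List[int]    # initiator proximity domains (matrix rows)
    target_pds: List[int]       # target proximity domains (matrix columns)
    entries: List[List[int]]    # raw 16-bit entries indexed [initiator][target]

    def value(self, initiator: int, target: int) -> Optional[int]:
        """Scaled value for (initiator, target), or None if unknown."""
        if initiator not in self.initiator_pds or target not in self.target_pds:
            return None
        row = self.initiator_pds.index(initiator)
        col = self.target_pds.index(target)
        raw = self.entries[row][col]
        if raw in (NOT_PROVIDED, UNREACHABLE):
            return None
        return raw * self.entry_base_unit


def end_to_end(host_latency: int, host_bw: int,
               dev_latency: int, dev_bw: int) -> Tuple[int, int]:
    """Combine the host leg (CPU -> Generic Port, from HMAT) with the device
    leg (e.g. from CDAT and the link): latencies add, bandwidth is the minimum."""
    return host_latency + dev_latency, min(host_bw, dev_bw)


# Hypothetical tables: PDs 0 and 1 are CPU initiators, PD 2 is a Generic Port.
lat = Sslib(0, 1000, [0, 1], [0, 2], [[10, 20], [20, 10]])
bw = Sslib(3, 1, [0, 1], [0, 2], [[100, 50], [50, 100]])

host_lat = lat.value(0, 2)  # 20 * 1000
host_bw = bw.value(0, 2)    # 50 * 1
print(end_to_end(host_lat, host_bw, 15000, 32))  # (35000, 32)
```

Taking the minimum for bandwidth reflects that the slowest hop bounds throughput; a fuller treatment would also include each switch hop and link on the path to the device.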
> > 
> > Is it worth talking about what is effectively a constraint of the spec
> > and what is a current Linux constraint?
> > 
> > SRAT is the only ACPI-defined way of getting Proximity Domains. Linux
> > chooses to at most map those 1:1 with NUMA nodes.
> > CEDT adds a description of SPA ranges where there might be memory that
> > Linux might want to map to one or more NUMA nodes.
> > 
> 
> Rather than asking if it's worth talking about, I'll spin that around
> and ask what value the distinction adds. The source of the constraint
> seems less relevant than "All nodes must be defined during mm_init by
> something - be it ACPI or CXL source data".
> 
> Maybe if this turns into a book, it's worth breaking it out for
> referential purposes (pointing to each point in each spec).

Fair point. It doesn't add much.

> 
> > > 
> > > Basically, the heuristic is as follows:
> > > 1) Add one NUMA node per Proximity Domain described in SRAT
> > 
> > if it contains memory, CPUs or generic initiators.
> > 
> 
> noted
> 
> > > 2) If the SRAT describes all memory described by all CFMWS
> > >    - do not create nodes for CFMWS
> > > 3) If SRAT does not describe all memory described by CFMWS
> > >    - create a node for that CFMWS
> > > 
> > > Generally speaking, you will see one NUMA node per host bridge, unless
> > > inter-host-bridge interleave is in use (see Section 4 - Interleave).
> > 
> > I just love corners: QoS concerns might mean multiple CFMWS and hence
> > multiple nodes per host bridge (feel free to ignore this one - has
> > anyone seen this in the wild yet?) Similar mess for properties such
> > as persistence, sharing etc.
> 
> This actually came up as a result of me writing this - this does exist
> in the wild and is causing all kinds of fun on the weighted_interleave
> functionality.
> 
> I plan to come back and add this as an addendum, but probably not until
> after LSF.
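The three-step node creation heuristic quoted above can be sketched as follows. This is a minimal model under stated assumptions, not a transcription of the kernel's SRAT/CEDT parsing: SRAT proximity domains and CFMWS windows are plain Python values, node IDs are simply handed out in order, and a CFMWS window gets its own node only when SRAT memory ranges do not fully cover it (rules 2 and 3 as written). SratPxm, fully_covered and assign_nodes are hypothetical names, and the address ranges are made up.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Range = Tuple[int, int]  # [start, end) physical address range


@dataclass
class SratPxm:
    """Hypothetical summary of one SRAT proximity domain."""
    pxm: int
    mem_ranges: List[Range]
    has_cpu: bool = False
    has_gi: bool = False  # generic initiator


def fully_covered(start: int, end: int, ranges: List[Range]) -> bool:
    """True if [start, end) lies entirely inside the union of ranges."""
    pos = start
    for s, e in sorted(ranges):
        if pos >= end:
            break
        if e <= pos:
            continue      # range ends before the first uncovered address
        if s > pos:
            return False  # gap before the next range starts
        pos = e
    return pos >= end


def assign_nodes(pxms: List[SratPxm],
                 cfmws: List[Range]) -> Dict[Tuple[str, int], int]:
    """Map ("pxm", n) / ("cfmws", index) labels to sequential node IDs."""
    nodes: Dict[Tuple[str, int], int] = {}
    srat_mem: List[Range] = []
    # Rule 1: one node per PXM that has memory, CPUs or generic initiators.
    for p in pxms:
        srat_mem.extend(p.mem_ranges)
        if p.mem_ranges or p.has_cpu or p.has_gi:
            nodes[("pxm", p.pxm)] = len(nodes)
    # Rules 2 and 3: a CFMWS window not fully described by SRAT gets a node.
    for i, (start, end) in enumerate(cfmws):
        if not fully_covered(start, end, srat_mem):
            nodes[("cfmws", i)] = len(nodes)
    return nodes


pxms = [SratPxm(0, [(0x0, 0x80000000)], has_cpu=True),
        SratPxm(1, [(0x100000000, 0x200000000)])]  # BIOS-configured CXL memory
windows = [(0x100000000, 0x200000000),   # described by SRAT PXM 1
           (0x200000000, 0x400000000)]   # not in SRAT: gets its own node
print(assign_nodes(pxms, windows))
# {('pxm', 0): 0, ('pxm', 1): 1, ('cfmws', 1): 2}
```

Because each uncovered window is considered separately, two CFMWS windows behind the same host bridge (for example, split for QoS reasons as discussed above) end up as two nodes in this model.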
> 
> We'll probably want to expand this into a library of case studies that
> cover these different choices - in hopes of getting some set of
> *suggested* configurations for platform vendors to help play nice with
> Linux (especially for things that actually consume these blasted nodes).

Agreed. We'll be looking back on this in a year or so and thinking, wasn't
life nice and simple back then!

Jonathan

> 
> ~Gregory