Date: Mon, 3 Mar 2025 19:32:43 -0500
From: Gregory Price
To: lsf-pc@lists.linux-foundation.org
Cc: linux-mm@kvack.org, linux-cxl@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [LSF/MM] CXL Boot to Bash - Section 1: BIOS, EFI, and Early Boot

On Tue, Feb 04, 2025 at 09:17:09PM -0500, Gregory Price wrote:
> ------------------------------------------------------------------
> Step 2: BIOS / EFI generates the CEDT (CXL Early Discovery Table).
> ------------------------------------------------------------------
>
> This table is responsible for reporting each "CXL Host Bridge" and
> "CXL Fixed Memory Window" present at boot - which enables early boot
> software to manage those devices and the memory capacity presented
> by those devices.
>
> Example CEDT Entries (truncated)
>          Subtable Type : 00 [CXL Host Bridge Structure]
>               Reserved : 00
>                 Length : 0020
> Associated host bridge : 00000005
>
>          Subtable Type : 01 [CXL Fixed Memory Window Structure]
>               Reserved : 00
>                 Length : 002C
>               Reserved : 00000000
>    Window base address : 000000C050000000
>            Window size : 0000003CA0000000
>
> If this memory is NOT marked "Special Purpose" by BIOS (next section),
> you should find a matching entry in the EFI Memory Map and /proc/iomem
>
> BIOS-e820:  [mem 0x000000c050000000-0x000000fcefffffff] usable
> /proc/iomem: c050000000-fcefffffff : System RAM
>
>
> Observation: This memory is treated as 100% normal System RAM
>
>    1) This memory may be placed in any zone (ZONE_NORMAL, typically)
>    2) The kernel may use this memory for arbitrary allocations
>    3) The driver still enumerates CXL devices and memory regions, but
>    4) The CXL driver CANNOT manage this memory (as of today)
>       (Caveat: *some* RAS features may still work, possibly)
>
> This creates a nuanced management state.
>
> The memory is online by default and completely usable, AND the driver
> appears to be managing the devices - BUT the memory resources and the
> management structure are fundamentally separate.
>    1) CXL Driver manages CXL features
>    2) Non-CXL SystemRAM mechanisms surface the memory to allocators.
>

Adding some additional context here

-------------------------------------
Nuance X: NUMA Nodes and ACPI Tables.
-------------------------------------

ACPI Table parsing is partially architecture/platform dependent, but
there is common code that affects boot-time creation of NUMA nodes.

NUMA-nodes are not a dynamic resource.  They are (presently, Feb 2025)
statically configured during kernel init, and the number of possible
NUMA nodes (N_POSSIBLE) cannot change during runtime.

CEDT/CFMWS and SRAT/Memory Affinity entries describe memory regions
associated with CXL devices.  These tables are used to allocate NUMA
node IDs during __init.

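Both tables describe memory by physical address range, so it helps to
be able to turn a base/size pair into the range that shows up in e820
and /proc/iomem.  Here is a minimal sketch in Python using the CFMWS
values from the example quoted above.  The helper names (window_range,
iomem_line) are made up for illustration and are not kernel or ACPI
interfaces.

```python
# Values copied from the example CFMWS entry quoted earlier in this mail.
WINDOW_BASE = 0x000000C050000000
WINDOW_SIZE = 0x0000003CA0000000


def window_range(base, size):
    """Return the inclusive (start, end) physical range of a window.

    Hypothetical helper: the end address is base + size - 1 because
    e820 and /proc/iomem both print inclusive end addresses.
    """
    return base, base + size - 1


def iomem_line(start, end, name="System RAM"):
    """Format a range the way /proc/iomem prints it (lowercase hex)."""
    return f"{start:x}-{end:x} : {name}"


start, end = window_range(WINDOW_BASE, WINDOW_SIZE)
print(iomem_line(start, end))
print(f"window size: {WINDOW_SIZE / 2**30} GiB")
```

The computed range should line up with the BIOS-e820 and /proc/iomem
lines in the quoted text, and the same arithmetic applies to the
Base Address / Address Length pair in an SRAT Memory Affinity entry.
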
The "System Resource Affinity Table" has "Memory Affinity" entries
which associate memory regions with a "Proximity Domain"

        Subtable Type : 01 [Memory Affinity]
               Length : 28
     Proximity Domain : 00000001
            Reserved1 : 0000
         Base Address : 000000C050000000
       Address Length : 0000003CA0000000

The "Proximity Domain" is utilized by the kernel ACPI driver to match
this region with a NUMA node (in most cases, the proximity domains here
will directly translate to a NUMA node ID - but not always).

A CEDT/CFMWS entry does not have a proximity domain - so the kernel will
assign it a NUMA node association IFF no SRAT Memory Affinity entry is
present for that window.

SRAT entries are optional; CFMWS are required for each host bridge.

If SRAT entries are present, one NUMA node is created for each detected
proximity domain in the SRAT.  Additional NUMA nodes are created for each
CFMWS without a matching SRAT entry.

CFMWS describes host-bridge information, and so if SRAT is missing - all
devices behind the host bridge will become naturally associated with the
same NUMA node.

big long TL;DR:

This creates the subtle assumption that each host-bridge will have
devices with similar performance characteristics if they're intended
for use as general purpose memory and/or interleave.

This means you should expect to have to reboot your machine if a
different NUMA topology is needed (for example, if you are physically
hot-unplugging a volatile device to plug in a non-volatile device).

Stay tuned for more Fun and Profit with ACPI tables :]
~Gregory