Date: Tue, 4 Feb 2025 21:17:09 -0500
From: Gregory Price
To: lsf-pc@lists.linux-foundation.org
Cc: linux-mm@kvack.org, linux-cxl@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: [LSF/MM] CXL Boot to Bash - Section 1: BIOS, EFI, and Early Boot

Tossing this out as larger documentation of these steps for comment,
not as a representation of what will show up in the talk. This is
trying to cover the minimum information needed to start reasoning
about the growing complexity of configurations.

Platform / BIOS / EFI Configuration
===================================

---------------------------------------
Step 1: BIOS-time hardware programming.
---------------------------------------

I don't want to focus on platform specifics, so really all you need
to know about this phase for the purpose of MM is that platforms may
program the CXL device hierarchy and lock the configuration.

In practice it means you probably can't reconfigure things after boot
without doing major teardowns of the devices and resetting them -
assuming the platform doesn't have major quirks that prevent this.

This has implications for Hotplug, Interleave, and RAS, but we'll
cover those explicitly elsewhere. Otherwise, if something gets mucked
up at this stage - complain to your platform / hardware vendor.

------------------------------------------------------------------
Step 2: BIOS / EFI generates the CEDT (CXL Early Discovery Table).
------------------------------------------------------------------

This table is responsible for reporting each "CXL Host Bridge" and
"CXL Fixed Memory Window" present at boot - which enables early boot
software to manage those devices and the memory capacity presented
by those devices.

Example CEDT Entries (truncated)

         Subtable Type : 00 [CXL Host Bridge Structure]
              Reserved : 00
                Length : 0020
Associated host bridge : 00000005

         Subtable Type : 01 [CXL Fixed Memory Window Structure]
              Reserved : 00
                Length : 002C
              Reserved : 00000000
   Window base address : 000000C050000000
           Window size : 0000003CA0000000

If this memory is NOT marked "Special Purpose" by BIOS (next section),
you should find matching entries in the EFI Memory Map and /proc/iomem:

BIOS-e820:   [mem 0x000000c050000000-0x000000fcefffffff] usable
/proc/iomem: c050000000-fcefffffff : System RAM

Observation: This memory is treated as 100% normal System RAM

  1) This memory may be placed in any zone (ZONE_NORMAL, typically)
  2) The kernel may use this memory for arbitrary allocations
  3) The driver still enumerates CXL devices and memory regions, but
  4) The CXL driver CANNOT manage this memory (as of today)
     (Caveat: *some* RAS features may still work, possibly)

This creates a nuanced management state.

The memory is online by default and completely usable, AND the driver
appears to be managing the devices - BUT the memory resources and the
management structure are fundamentally separate.

  1) CXL Driver manages CXL features
  2) Non-CXL SystemRAM mechanisms surface the memory to allocators.

---------------------------------------------------------------
Step 3: EFI_MEMORY_SP - Deferring Management to the CXL Driver.
---------------------------------------------------------------

Assuming you DON'T want CXL memory to default to SystemRAM and prefer
NOT to have your kernel allocate arbitrary resources on CXL, you
probably want to defer managing these memory regions to the CXL driver.

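Before deferring anything, it can help to confirm how a window
currently shows up. A minimal sketch, assuming the CEDT window fields
and the /proc/iomem line format from the Step 2 example. The helpers
window_range() and parse_iomem_line() are hypothetical names made up
for illustration, not kernel or tooling APIs. It computes the inclusive
end of the Fixed Memory Window (base + size - 1) and checks that it
lines up with the iomem entry:

```python
def window_range(base, size):
    """Return the inclusive (start, end) of a CXL Fixed Memory Window."""
    return base, base + size - 1


def parse_iomem_line(line):
    """Parse one /proc/iomem-style line into (start, end, name).

    Assumes the 'start-end : name' layout shown in the example above.
    """
    span, _, name = line.strip().partition(" : ")
    start_s, _, end_s = span.partition("-")
    return int(start_s, 16), int(end_s, 16), name.strip()


# Values copied from the example CEDT entry and iomem line in Step 2.
base = 0x000000C050000000
size = 0x0000003CA0000000
start, end = window_range(base, size)
entry = parse_iomem_line("c050000000-fcefffffff : System RAM")

print(f"{start:x}-{end:x}")                  # c050000000-fcefffffff
print(entry == (start, end, "System RAM"))   # True
```

The same check works for a "Soft Reserved" entry by comparing against
that name instead.
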
The mechanism for deferring management is setting the EFI_MEMORY_SP
bit on CXL memory in BIOS. This will mark the memory "Special Purpose".

Doing this will result in your memory being marked "Soft Reserved" on
x86 and ARM (presently unknown on other architectures).

You will see Memory Map and iomem entries like so:

BIOS-e820:   [mem 0x000000c050000000-0x000000fcefffffff] soft reserved
/proc/iomem: c050000000-fcefffffff : Soft Reserved

Unless of course:
  1) CONFIG_EFI_SOFT_RESERVE=n in your build config, or
  2) You set the nosoftreserve boot parameter, or
  3) You kexec'd from a kernel where condition #1 or #2 is met

In which case you'll get SystemRAM as if EFI_MEMORY_SP was never set.

(#3 was fun to debug, for some definition of fun. Ask me over coffee)

------------------------------------------------------------
First bit of nuanced complexity: Early-Boot Resource Re-use.
------------------------------------------------------------

How are MemoryMap resources managed by a driver after being reserved
during early boot?

Example: Hot-(un)plugging a device.

What if we replace said hot-unplugged device with a device with a
new capacity?

What if the arch/platform code combines two adjacent regions with
similar attributes before creating resources?

Recent work by Nathan Fontenot [1] has been looking to address some of
the issues with these Soft Reserved resources, either re-using them
or handing them off entirely to the relevant driver for management.

[1] https://lore.kernel.org/linux-cxl/cover.1737046620.git.nathan.fontenot@amd.com/

--------------------------------------------------------------------
The Complexity story up til now (what's likely to show up in slides)
--------------------------------------------------------------------

Platform and BIOS:
  May configure all the devices prior to kernel hand-off.
  May or may not support reconfiguring / hotplug.

BIOS and EFI:
  EFI_MEMORY_SP - used to defer management to drivers

Kernel Build and Boot:
  CONFIG_EFI_SOFT_RESERVE=n - Will always result in CXL as SystemRAM
  nosoftreserve             - Will always result in CXL as SystemRAM
  kexec                     - SystemRAM configs carry over to target

--------------------------------------------------------------------

Next Up:
  Driver Management - Decoders, HPA/SPA, DAX, and RAS.
  Memory (Block) Hotplug - Zones, Auto-Online, and User Policy.
  RAS - Poison, MCE, and why you probably want CXL=ZONE_MOVABLE.
  Interleave - RAS and Region Management (Hotplug-ability)

~Gregory
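
Addendum: a minimal sketch of the summary above as a decision function.
The function name and its parameters are hypothetical, made up for
illustration. The logic only encodes the conditions listed in Step 3;
it is not how the kernel implements any of this, and it ignores
platform quirks entirely.

```python
def cxl_memory_presentation(efi_memory_sp,
                            config_efi_soft_reserve=True,
                            nosoftreserve=False,
                            kexec_from_systemram_kernel=False):
    """Return how a CXL window is expected to appear in /proc/iomem.

    efi_memory_sp:               BIOS set EFI_MEMORY_SP on the range
    config_efi_soft_reserve:     kernel built with CONFIG_EFI_SOFT_RESERVE=y
    nosoftreserve:               'nosoftreserve' passed on the command line
    kexec_from_systemram_kernel: kexec'd from a kernel where either of the
                                 two conditions above forced SystemRAM
    """
    if not efi_memory_sp:
        # BIOS never asked for deferral: plain System RAM.
        return "System RAM"
    if (not config_efi_soft_reserve
            or nosoftreserve
            or kexec_from_systemram_kernel):
        # Any one of these makes the kernel act as if SP was never set.
        return "System RAM"
    return "Soft Reserved"


print(cxl_memory_presentation(True))                      # Soft Reserved
print(cxl_memory_presentation(True, nosoftreserve=True))  # System RAM
```
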