* [PATCH v3 0/1] Identify the accurate NUMA ID of CFMW
@ 2026-02-11 10:33 Cui Chao
From: Cui Chao @ 2026-02-11 10:33 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Jonathan Cameron, Mike Rapoport, Wang Yinfeng, dan.j.williams,
	Pratyush Brahma, Gregory Price, David Hildenbrand, linux-cxl,
	linux-kernel, linux-mm, qemu-devel

Changes in v3:
- Clearly state that this issue was discovered in QEMU emulation. While
  it may theoretically occur on hardware, no such case has been observed
  so far.
- Describe the runtime effects in the changelog.
- Add the Fixes: tag to the changelog.
- Update the comments to clarify the selection logic between
  numa_meminfo and numa_reserved_meminfo.
- Add Jonathan's Reviewed-by tag.

Cui Chao (1):
  mm: numa_memblks: Identify the accurate NUMA ID of CFMW

 mm/numa_memblks.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

-- 
2.33.0
* [PATCH v3 1/1] mm: numa_memblks: Identify the accurate NUMA ID of CFMW
  2026-02-11 10:33 [PATCH v3 0/1] Identify the accurate NUMA ID of CFMW Cui Chao
@ 2026-02-11 10:33 ` Cui Chao
From: Cui Chao @ 2026-02-11 10:33 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Jonathan Cameron, Mike Rapoport, Wang Yinfeng, dan.j.williams,
	Pratyush Brahma, Gregory Price, David Hildenbrand, linux-cxl,
	linux-kernel, linux-mm, qemu-devel, Jonathan Cameron

In some physical memory layouts, the address space of a CFMW (CXL Fixed
Memory Window) sits between multiple segments of system memory that
belong to the same NUMA node. numa_cleanup_meminfo() merges those
segments into one larger numa_memblk, so a later lookup of which NUMA
node the CFMW belongs to may incorrectly return the node of the merged
system memory.

When a CXL RAM region is then created from userspace, the capacity of
the new region is not added to the CFMW-dedicated NUMA node. Instead it
is accumulated into an existing node (e.g. node0, which contains regular
RAM). This makes it impossible to clearly distinguish the two types of
memory, which can break memory-tiering applications.

Example memory layout:

Physical address space:
  0x00000000 - 0x1FFFFFFF  System RAM (node0)
  0x20000000 - 0x2FFFFFFF  CXL CFMW   (node2)
  0x40000000 - 0x5FFFFFFF  System RAM (node0)
  0x60000000 - 0x7FFFFFFF  System RAM (node1)

After numa_cleanup_meminfo(), the two node0 segments are merged into one:
  0x00000000 - 0x5FFFFFFF  System RAM (node0)  // CFMW is inside the range
  0x60000000 - 0x7FFFFFFF  System RAM (node1)

So the CFMW (0x20000000-0x2FFFFFFF) is incorrectly assigned to node0.

Fix this by checking whether the address is described by both
numa_meminfo and numa_reserved_meminfo, and preferring the reserved
entry in that case, which identifies the correct NUMA node.
1. Issue impact and backport recommendation:

This patch fixes an issue observed in QEMU emulation where, during the
dynamic creation of a CXL RAM region, the memory capacity is not
assigned to the correct CFMW-dedicated NUMA node. Hardware platforms
could in principle have such memory configurations, but we are not
currently aware of any.

This issue leads to:

- Failure of the memory-tiering mechanism: the system treats System RAM
  as fast memory and CXL memory as slow memory, and uses NUMA IDs as an
  index to identify the different tiers. For performance, hot pages may
  be migrated to fast memory and cold pages to slow memory. If the NUMA
  ID for CXL memory is computed incorrectly and its capacity is
  aggregated into the NUMA node containing System RAM (i.e. the
  fast-memory node), the CXL memory cannot be identified correctly and
  may be misjudged as fast memory, defeating the tiering policy.

- Inability to distinguish System RAM from CXL memory even for simple
  manual binding: numactl and other NUMA policy utilities cannot tell
  the two apart, making reasonable memory binding impossible.

- Inaccurate system reporting: numactl -H displays memory capacities
  that do not match the actual physical hardware layout, affecting
  operations and monitoring.

This affects all users of the CXL RAM functionality who rely on memory
tiering or NUMA-aware scheduling, so I recommend backporting this patch
to all stable kernel series that support dynamic CXL region creation.

2. Why a kernel fix is recommended over a firmware update:

When a CXL region is created dynamically, the association between the
memory's HPA range and its corresponding NUMA node is established when
the kernel driver performs the commit operation.
This is a runtime, OS-managed operation in which platform firmware
cannot intervene to provide a fix. Given constraints such as hardware
platform architecture and memory resources, such a physical address
layout can genuinely occur. The patch does not introduce risk; it simply
handles NUMA node assignment correctly for CXL RAM regions in such a
layout. A kernel fix is therefore necessary.

Fixes: 779dd20cfb56 ("cxl/region: Add region creation support")
Signed-off-by: Cui Chao <cuichao1753@phytium.com.cn>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
---
 mm/numa_memblks.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/mm/numa_memblks.c b/mm/numa_memblks.c
index 5b009a9cd8b4..0892d532908c 100644
--- a/mm/numa_memblks.c
+++ b/mm/numa_memblks.c
@@ -568,15 +568,16 @@ static int meminfo_to_nid(struct numa_meminfo *mi, u64 start)
 int phys_to_target_node(u64 start)
 {
 	int nid = meminfo_to_nid(&numa_meminfo, start);
+	int reserved_nid = meminfo_to_nid(&numa_reserved_meminfo, start);
 
 	/*
-	 * Prefer online nodes, but if reserved memory might be
-	 * hot-added continue the search with reserved ranges.
+	 * Prefer online nodes unless the address is also described
+	 * by reserved ranges, in which case use the reserved nid.
 	 */
-	if (nid != NUMA_NO_NODE)
+	if (nid != NUMA_NO_NODE && reserved_nid == NUMA_NO_NODE)
 		return nid;
 
-	return meminfo_to_nid(&numa_reserved_meminfo, start);
+	return reserved_nid;
 }
 EXPORT_SYMBOL_GPL(phys_to_target_node);
-- 
2.33.0
* Re: [PATCH v3 1/1] mm: numa_memblks: Identify the accurate NUMA ID of CFMW
@ 2026-02-11 14:21 ` Gregory Price
From: Gregory Price @ 2026-02-11 14:21 UTC (permalink / raw)
  To: Cui Chao
  Cc: Andrew Morton, Jonathan Cameron, Mike Rapoport, Wang Yinfeng,
	dan.j.williams, Pratyush Brahma, David Hildenbrand, linux-cxl,
	linux-kernel, linux-mm, qemu-devel

On Wed, Feb 11, 2026 at 06:33:20PM +0800, Cui Chao wrote:
> In some physical memory layout designs, the address space of CFMW (CXL
> Fixed Memory Window) resides between multiple segments of system memory
> belonging to the same NUMA node. In numa_cleanup_meminfo, these multiple
> segments of system memory are merged into a larger numa_memblk. When
> identifying which NUMA node the CFMW belongs to, it may be incorrectly
> assigned to the NUMA node of the merged system memory.
>
> When a CXL RAM region is created in userspace, the memory capacity of
> the newly created region is not added to the CFMW-dedicated NUMA node.
> Instead, it is accumulated into an existing NUMA node (e.g., NUMA0
> containing RAM). This makes it impossible to clearly distinguish
> between the two types of memory, which may affect memory-tiering
> applications.
>
> Example memory layout:
>
> Physical address space:
>   0x00000000 - 0x1FFFFFFF  System RAM (node0)
>   0x20000000 - 0x2FFFFFFF  CXL CFMW   (node2)
>   0x40000000 - 0x5FFFFFFF  System RAM (node0)
>   0x60000000 - 0x7FFFFFFF  System RAM (node1)
>
> After numa_cleanup_meminfo, the two node0 segments are merged into one:
>   0x00000000 - 0x5FFFFFFF  System RAM (node0)  // CFMW is inside the range
>   0x60000000 - 0x7FFFFFFF  System RAM (node1)
>
> So the CFMW (0x20000000-0x2FFFFFFF) will be incorrectly assigned to node0.
>
> To address this scenario, accurately identifying the correct NUMA node
> can be achieved by checking whether the region belongs to both
> numa_meminfo and numa_reserved_meminfo.
>

The changelog comments after this are a bit much, but other than that:

> Fixes: 779dd20cfb56 ("cxl/region: Add region creation support")
> Signed-off-by: Cui Chao <cuichao1753@phytium.com.cn>
> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>

Reviewed-by: Gregory Price <gourry@gourry.net>
* Re: [PATCH v3 1/1] mm: numa_memblks: Identify the accurate NUMA ID of CFMW
@ 2026-02-12  0:39 ` dan.j.williams
From: dan.j.williams @ 2026-02-12  0:39 UTC (permalink / raw)
  To: Cui Chao, Andrew Morton
  Cc: Jonathan Cameron, Mike Rapoport, Wang Yinfeng, dan.j.williams,
	Pratyush Brahma, Gregory Price, David Hildenbrand, linux-cxl,
	linux-kernel, linux-mm, qemu-devel, Jonathan Cameron

Cui Chao wrote:
> In some physical memory layout designs, the address space of CFMW (CXL
> Fixed Memory Window) resides between multiple segments of system memory
> belonging to the same NUMA node. In numa_cleanup_meminfo, these multiple
> segments of system memory are merged into a larger numa_memblk. When
> identifying which NUMA node the CFMW belongs to, it may be incorrectly
> assigned to the NUMA node of the merged system memory.
>
> When a CXL RAM region is created in userspace, the memory capacity of
> the newly created region is not added to the CFMW-dedicated NUMA node.
> Instead, it is accumulated into an existing NUMA node (e.g., NUMA0
> containing RAM). This makes it impossible to clearly distinguish
> between the two types of memory, which may affect memory-tiering
> applications.
>
> Example memory layout:
>
> Physical address space:
>   0x00000000 - 0x1FFFFFFF  System RAM (node0)
>   0x20000000 - 0x2FFFFFFF  CXL CFMW   (node2)
>   0x40000000 - 0x5FFFFFFF  System RAM (node0)
>   0x60000000 - 0x7FFFFFFF  System RAM (node1)
>
> After numa_cleanup_meminfo, the two node0 segments are merged into one:
>   0x00000000 - 0x5FFFFFFF  System RAM (node0)  // CFMW is inside the range
>   0x60000000 - 0x7FFFFFFF  System RAM (node1)
>
> So the CFMW (0x20000000-0x2FFFFFFF) will be incorrectly assigned to node0.
>
> To address this scenario, accurately identifying the correct NUMA node
> can be achieved by checking whether the region belongs to both
> numa_meminfo and numa_reserved_meminfo.

Looks good, thanks for the clear statement of why this matters. Going
forward, conciseness is valued, so here is a potential condensed
statement of impact:

---
While this issue has only been observed in a QEMU configuration, and no
known end users are impacted by this problem, it is likely that some
firmware implementation is leaving memory map holes in a CXL Fixed
Memory Window. CXL hotplug depends on mapping free window capacity, and
it seems to be only a coincidence that this problem has not been hit
yet.
---

With that, and adding:

Cc: <stable@vger.kernel.org>

you can add:

Reviewed-by: Dan Williams <dan.j.williams@intel.com>