Date: Thu, 6 Feb 2025 10:59:37 -0500
From: Gregory Price
To: Dan Williams
Cc: lsf-pc@lists.linux-foundation.org, linux-mm@kvack.org, linux-cxl@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: CXL Boot to Bash - Section 2: The Drivers

On Wed, Feb 05, 2025 at 04:47:17PM -0800, Dan Williams wrote:
> Gregory Price wrote:
> > [/sys/bus/cxl/devices]# ls
> > dax_region0 decoder0.0 decoder1.0 decoder2.0 .....
> > dax_region1 decoder0.1 decoder1.1 decoder3.0 .....
> >
> > ^^^ These dax regions require `CONFIG_DEV_DAX_CXL` enabled to fully
> > surface as dax devices, which can then be converted to system ram.
>
> At least for this problem the plan is to fall back to
> CONFIG_DEV_DAX_HMEM [1] which skips all of the RAS and device
> enumeration benefits and just shunts EFI_MEMORY_SP over to device_dax.
>

Hm, would this actually happen in the scenario where CONFIG_DEV_DAX_CXL
is not enabled but everything else is? The region0 still gets created
and associated with the resource, but dax_region0 never gets created.

On one system I have, I see the following:

c050000000-fcefffffff : Soft Reserved
  c050000000-fcefffffff : CXL Window 0
    c050000000-fcefffffff : region0
      c050000000-fcefffffff : dax0.0
        c050000000-fcefffffff : System RAM (kmem)
fcf0000000-ffffffffff : Reserved
10000000000-1035fffffff : Soft Reserved
  10000000000-1035fffffff : CXL Window 1
    10000000000-1035fffffff : region1
      10000000000-1035fffffff : dax1.0
        10000000000-1035fffffff : System RAM (kmem)

I would expect the above HMEM/shunt to only work if everything down
through CXL Window 0 is torn down.
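The nesting above can also be checked mechanically. Here is a minimal sketch, not part of any existing tool, that parses /proc/iomem-formatted text and reports, for each regionN, whether a daxN.M child and a "System RAM" entry exist beneath it. It assumes two spaces of indentation per nesting level and the regionN / daxN.M naming seen in the listing. The helper names and the MISSING_DAX sample are hypothetical.

```python
import re

# Matches one /proc/iomem-style line: "<indent><start>-<end> : <name>".
# Assumption: each nesting level is indented by two spaces, as in the listing above.
LINE_RE = re.compile(
    r"^(?P<indent> *)(?P<start>[0-9a-fA-F]+)-(?P<end>[0-9a-fA-F]+) : (?P<name>.+)$"
)


def parse_iomem(text):
    """Return a list of (depth, start, end, name) tuples, in file order."""
    entries = []
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if m is None:
            continue  # skip blank or unrecognised lines
        depth = len(m.group("indent")) // 2
        entries.append(
            (depth, int(m.group("start"), 16), int(m.group("end"), 16),
             m.group("name").strip())
        )
    return entries


def region_status(text):
    """Map each regionN to (has_dax_child, has_system_ram) within its subtree."""
    entries = parse_iomem(text)
    status = {}
    for i, (depth, _start, _end, name) in enumerate(entries):
        if not re.fullmatch(r"region\d+", name):
            continue
        has_dax = has_ram = False
        # Descendants are the following lines nested deeper than the region.
        for child_depth, _s, _e, child_name in entries[i + 1:]:
            if child_depth <= depth:
                break
            if re.fullmatch(r"dax\d+\.\d+", child_name):
                has_dax = True
            elif child_name.startswith("System RAM"):
                has_ram = True
        status[name] = (has_dax, has_ram)
    return status


# Region0 subtree copied from the listing above.
WORKING = """\
c050000000-fcefffffff : Soft Reserved
  c050000000-fcefffffff : CXL Window 0
    c050000000-fcefffffff : region0
      c050000000-fcefffffff : dax0.0
        c050000000-fcefffffff : System RAM (kmem)
"""

# Hypothetical: the same range if the dax and kmem steps never happen.
MISSING_DAX = """\
c050000000-fcefffffff : Soft Reserved
  c050000000-fcefffffff : CXL Window 0
    c050000000-fcefffffff : region0
"""

if __name__ == "__main__":
    for label, sample in (("working", WORKING), ("missing", MISSING_DAX)):
        for region, (dax, ram) in sorted(region_status(sample).items()):
            print(f"{label}: {region} dax={dax} system_ram={ram}")
```

A region reported with dax=False is the quiet "everything succeeds" case: the region exists, but nothing was ever surfaced to the allocators.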

But if CONFIG_DEV_DAX_CXL is not enabled, everything "succeeds"; it just
doesn't "Do what you want"(TM): the dax0.0 and System RAM (kmem) entries
are simply absent. It makes me wonder whether the driver build is
over-componentized.

> I am otherwise open to suggestions about a better model for how to
> handle a type of memory capacity that elicits diverging opinions on
> whether it should be treated as System RAM, dedicated application
> memory, or some kind of cold-memory swap target.
>

My gut tells me there's no "elegant solution" here, given that user
intent is fairly unknowable; i.e. the best we can do is make the build
and boot options easier to understand.

> > ---------------------------------------------------------------
> > Step 6: DAX surfacing Memory Blocks - First bit of User Policy.
> > ---------------------------------------------------------------
> >
> > The last step in surfacing memory to allocators is to convert a dax
> > device into memory blocks. On most default kernel builds, dax devices
> > are not automatically converted to SystemRAM.
>
> I thought most distributions are shipping with
> CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE, or the default online udev rule?
> For example Fedora is CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE=y and RHEL is
> CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE=n, but with the udev hotplug rule.
>

Good point; my biased take is showing up in the notes here. I didn't
know RHEL had gotten as far as a udev rule already. I'll adjust my
notes.

But this also hides some nuance: with DEFAULT_ONLINE, the default
behavior onlines memory into ZONE_NORMAL (next section).

> > Alternatively, this can be done at Build or Boot time using
> >   CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE  (v6.13 or below)
> >   CONFIG_MHP_DEFAULT_ONLINE_TYPE_*      (v6.14 or above)
> >   memhp_default_state=*                 (boot param predating cxl)
>
> Oh, TIL the new CONFIG_MHP_DEFAULT_ONLINE_TYPE_* option.
>

It was only just added:
https://lore.kernel.org/linux-mm/20241226182918.648799-1-gourry@gourry.net/

It basically creates parity between the memhp_default_state boot
parameter and the build options.

> > The base is 256MB aligned (the minimum for the CXL Spec), and the
> > window size is 512MB. This results in a loss of almost a full memory
> > block worth of memory (~1280MB on the front, and ~512MB on the back).
> >
> > This is a loss of ~0.7% of capacity (1.5GB) for that region (121.25GB).
>
> This feels like an example, of "hey platform vendors, I understand
> that spec grants you the freedom to misalign, please refrain from taking
> advantage of that freedom".
>

Only x86 appears to actually do this (presently), so is this a real
constraint or just a quirk of how the x86 arch code has chosen to
"optimize memory block size"?

Granted, I'm a platform consumer, not a vendor, but I wouldn't even know
where to look to see where this constraint is defined (if it is). All
I'd know is "CXL says I can align to 256MB, and the minimum memory block
size on linux is 256MB, so allons-y!"

On the linux side, these platforms are now out there, in the wild. So
the surface impression now appears to be that linux just throws away
~0.5% of your CXL capacity for no reason on these platforms.

That said, I also understand that more memory blocks might affect
allocation performance when the system is pressured, but losing
gigabytes of memory can also reduce performance. (Preview of one of my
next nuance additions in section 3.)

If this (advisement) change is unwelcome, then we should be spewing a
really loud warning somewhere so that vendors get the signal from
consumers.

~Gregory
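Here is a minimal sketch of the alignment arithmetic behind the capacity loss. Only whole memory blocks can be onlined, so a region's start is rounded up and its end rounded down to the memory block size. As I understand it, this mirrors the ALIGN/ALIGN_DOWN that the dax kmem driver applies before hotplug, but treat that as an assumption. The base, size, and 256MB/2GB block sizes are hypothetical values chosen to illustrate a base that is 256MB-aligned but not 2GB-aligned. They are not measurements from the platform discussed above.

```python
MiB = 1 << 20
GiB = 1 << 30


def align_up(x, a):
    """Round x up to a multiple of a (a must be a power of two)."""
    return (x + a - 1) & ~(a - 1)


def align_down(x, a):
    """Round x down to a multiple of a (a must be a power of two)."""
    return x & ~(a - 1)


def hotplug_loss(base, size, block_size):
    """Return (front_loss, back_loss, usable) in bytes for [base, base + size)
    when only whole, block-aligned memory blocks can be onlined."""
    if block_size <= 0 or block_size & (block_size - 1):
        raise ValueError("block_size must be a power of two")
    end = base + size                     # exclusive end of the region
    first = align_up(base, block_size)    # first usable block boundary
    last = align_down(end, block_size)    # end of the last whole block
    if last <= first:
        return size, 0, 0                 # not even one whole block fits: all lost
    return first - base, end - last, last - first


if __name__ == "__main__":
    # Hypothetical region: base at 65.25GiB (256MB-aligned, not 2GB-aligned).
    base, size = 0x1050000000, 16 * GiB
    for block in (256 * MiB, 2 * GiB):    # hypothetical memory block sizes
        front, back, usable = hotplug_loss(base, size, block)
        lost_pct = 100 * (front + back) / size
        print(f"block={block // MiB}MB front={front // MiB}MB "
              f"back={back // MiB}MB lost={lost_pct:.2f}%")
```

With 256MB blocks this hypothetical region loses nothing. With 2GB blocks it loses 768MB at the front and 1280MB at the back, which is why the block size chosen by the arch code matters as much as the platform's alignment.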