From: "Huang, Ying"
To: "Ho-Ren (Jack) Chuang"
Cc: Gregory Price, aneesh.kumar@linux.ibm.com, mhocko@suse.com,
 tj@kernel.org, john@jagalactic.com, Eishan Mirakhur,
 Vinicius Tavares Petrucci, Ravis OpenSrc, Alistair Popple,
 "Rafael J. Wysocki", Len Brown, Andrew Morton, Dave Jiang,
 Dan Williams, Jonathan Cameron, linux-acpi@vger.kernel.org,
 linux-kernel@vger.kernel.org, linux-mm@kvack.org,
 "Ho-Ren (Jack) Chuang", linux-cxl@vger.kernel.org,
 qemu-devel@nongnu.org
Subject: Re: [External] Re: [PATCH v1 0/1] Improved Memory Tier Creation for CPUless NUMA Nodes
In-Reply-To: (Ho-Ren Chuang's message of "Mon, 4 Mar 2024 22:22:59 -0800")
References: <20240301082248.3456086-1-horenchuang@bytedance.com>
 <87frx6btqp.fsf@yhuang6-desk2.ccr.corp.intel.com>
Date: Tue, 05 Mar 2024 14:34:36 +0800
Message-ID: <87h6hl9og3.fsf@yhuang6-desk2.ccr.corp.intel.com>

"Ho-Ren (Jack) Chuang" writes:

> On Sun, Mar 3, 2024 at 6:47 PM Huang, Ying wrote:
>>
>> "Ho-Ren (Jack) Chuang" writes:
>>
>> > The memory tiering component in the kernel is functionally useless for
>> > CPUless memory/non-DRAM devices like CXL1.1 type3 memory because the nodes
>> > are lumped together in the DRAM tier.
>> > https://lore.kernel.org/linux-mm/PH0PR08MB7955E9F08CCB64F23963B5C3A860A@PH0PR08MB7955.namprd08.prod.outlook.com/T/
>>
>> I think that it's unfair to call it "useless".  Yes, it doesn't work if
>> the CXL memory device are not enumerate via drivers/dax/kmem.c.  So,
>> please be specific about in which cases it doesn't work instead of too
>> general "useless".
>>
>
> Thank you and I didn't mean anything specific. I simply reused phrases
> we discussed
> earlier in the previous patchset.
> I will change them to the following in v2:
> "At boot time, current memory tiering assigns all detected memory nodes
> to the same DRAM tier. This results in CPUless memory/non-DRAM devices,
> such as CXL1.1 type3 memory, being unable to be assigned to the
> correct memory tier,
> leading to the inability to migrate pages between different types of memory."
>
> Please see if this looks more specific.

I don't think that the description above is accurate.  In fact, there
are two ways to enumerate a memory device:

1. Mark it as reserved memory (E820_TYPE_SOFT_RESERVED, etc.) in the
   E820 table or something similar.

2. Mark it as normal memory (E820_TYPE_RAM) in the E820 table or
   something similar.

For 1, the memory device (including CXL memory) is onlined via
drivers/dax/kmem.c, so it will be put in the proper memory tier.  For 2,
the memory device is indistinguishable from normal DRAM with the current
implementation.  And this is what this patch is working on.

Right?

--
Best Regards,
Huang, Ying

>> > This patchset automatically resolves the issues. It delays the initialization
>> > of memory tiers for CPUless NUMA nodes until they obtain HMAT information
>> > at boot time, eliminating the need for user intervention.
>> > If no HMAT specified, it falls back to using `default_dram_type`.
>> >
>> > Example usecase:
>> > We have CXL memory on the host, and we create VMs with a new system memory
>> > device backed by host CXL memory. We inject CXL memory performance attributes
>> > through QEMU, and the guest now sees memory nodes with performance attributes
>> > in HMAT. With this change, we enable the guest kernel to construct
>> > the correct memory tiering for the memory nodes.
>> >
>> > Ho-Ren (Jack) Chuang (1):
>> >   memory tier: acpi/hmat: create CPUless memory tiers after obtaining
>> >     HMAT info
>> >
>> >  drivers/acpi/numa/hmat.c     |  3 ++
>> >  include/linux/memory-tiers.h |  6 +++
>> >  mm/memory-tiers.c            | 76 ++++++++++++++++++++++++++++++++----
>> >  3 files changed, 77 insertions(+), 8 deletions(-)
>>
>> --
>> Best Regards,
>> Huang, Ying
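
For readers following the thread, the two enumeration cases and the
fallback described in the cover letter can be summarized as a small
decision function.  This is only a toy model in Python, not kernel code:
the `Node` class, its fields, the `patched` flag and the returned labels
are hypothetical names made up for illustration.  Only the E820 type
names, drivers/dax/kmem.c, HMAT and `default_dram_type` come from the
discussion above, and the handling of nodes that do have CPUs is an
assumption.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class E820Type(Enum):
    """How firmware reported the memory range (case 1 vs. case 2 above)."""
    SOFT_RESERVED = "E820_TYPE_SOFT_RESERVED"
    RAM = "E820_TYPE_RAM"


@dataclass
class Node:
    """Hypothetical description of a NUMA node; not a kernel structure."""
    name: str
    e820_type: E820Type
    has_cpu: bool
    hmat_perf: Optional[int] = None  # placeholder for HMAT performance data


def tier_source(node: Node, patched: bool) -> str:
    """Return a label for where the node's memory tier comes from."""
    # Case 1: soft-reserved memory is onlined via drivers/dax/kmem.c,
    # which already places it in a proper tier.
    if node.e820_type is E820Type.SOFT_RESERVED:
        return "dax/kmem"
    # Case 2 without the patch: indistinguishable from normal DRAM.
    if not patched:
        return "default_dram_type"
    # Case 2 with the patch (as described in the cover letter): CPUless
    # nodes wait for HMAT data and fall back to default_dram_type if
    # none is provided.  Nodes with CPUs are assumed to stay in the
    # DRAM tier.
    if not node.has_cpu and node.hmat_perf is not None:
        return "hmat"
    return "default_dram_type"


if __name__ == "__main__":
    cxl = Node("cxl-as-ram", E820Type.RAM, has_cpu=False, hmat_perf=100)
    print(tier_source(cxl, patched=False))
    print(tier_source(cxl, patched=True))
```

In this model, only a CPUless E820_TYPE_RAM node that has HMAT data
changes behaviour between `patched=False` and `patched=True`, which is
the case the reply identifies as the real target of the patch.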