Date: Mon, 1 Aug 2022 20:40:10 -0700
From: Dan Williams
To: "Huang, Ying", Dan Williams
CC: Aneesh Kumar K.V, Wei Xu, Yang Shi, Davidlohr Bueso, Tim C Chen,
 Michal Hocko, "Linux Kernel Mailing List", Hesham Almatary, Dave Hansen,
 "Jonathan Cameron", Alistair Popple, Johannes Weiner, "Jagdish Gediya"
Subject: Re: [PATCH v11 1/8] mm/demotion: Add support for explicit memory tiers
Message-ID: <62e89c9addcc_62c2a29443@dwillia2-xfh.jf.intel.com.notmuch>
References: <20220728190436.858458-1-aneesh.kumar@linux.ibm.com>
 <20220728190436.858458-2-aneesh.kumar@linux.ibm.com>
 <62e890da7f784_577a029473@dwillia2-xfh.jf.intel.com.notmuch>
 <874jyvjpw9.fsf@yhuang6-desk2.ccr.corp.intel.com>

Huang, Ying wrote:
> Dan Williams writes:
>
> > Aneesh Kumar K.V wrote:
> >> In the current kernel, memory tiers are defined implicitly via a demotion path
> >> relationship between NUMA nodes, which is created during the kernel
> >> initialization and updated when a NUMA node is hot-added or hot-removed. The
> >> current implementation puts all nodes with CPU into the highest tier, and builds
> >> the tier hierarchy tier-by-tier by establishing the per-node demotion targets
> >> based on the distances between nodes.
> >>
> >> This current memory tier kernel implementation needs to be improved for several
> >> important use cases,
> >>
> >> The current tier initialization code always initializes each memory-only NUMA
> >> node into a lower tier. But a memory-only NUMA node may have a high performance
> >> memory device (e.g. a DRAM-backed memory-only node on a virtual machine) that
> >> should be put into a higher tier.
> >>
> >> The current tier hierarchy always puts CPU nodes into the top tier.
> >> But on a system with HBM or GPU devices, the memory-only NUMA nodes mapping
> >> these devices should be in the top tier, and DRAM nodes with CPUs are better
> >> placed into the next lower tier.
> >>
> >> With the current kernel, a higher tier node can only be demoted to the nodes
> >> with the shortest distance on the next lower tier as defined by the demotion
> >> path, not to any other node from any lower tier. This strict demotion order
> >> does not work in all use cases (e.g. some use cases may want to allow
> >> cross-socket demotion to another node in the same demotion tier as a fallback
> >> when the preferred demotion node is out of space). This demotion order is also
> >> inconsistent with the page allocation fallback order when all the nodes in a
> >> higher tier are out of space: the page allocation can fall back to any node
> >> from any lower tier, whereas the demotion order doesn't allow that.
> >>
> >> This patch series addresses the above by defining memory tiers explicitly.
> >>
> >> Linux kernel presents memory devices as NUMA nodes and each memory device is of
> >> a specific type. The memory type of a device is represented by its abstract
> >> distance. A memory tier corresponds to a range of abstract distance. This allows
> >> for classifying memory devices with a specific performance range into a memory
> >> tier.
> >>
> >> This patch configures the range/chunk size to be 128. The default DRAM
> >> abstract distance is 512. We can have 4 memory tiers below the default DRAM
> >> abstract distance which cover the range 0 - 127, 128 - 255, 256 - 383, 384 - 511.
> >> Slower memory devices like persistent memory will have abstract distance below
> >> the default DRAM level and hence will be placed in these 4 lower tiers.
> >>
> >> A kernel parameter is provided to override the default memory tier.
> >>
> >> Link: https://lore.kernel.org/linux-mm/CAAPL-u9Wv+nH1VOZTj=9p9S70Y3Qz3+63EkqncRDdHfubsrjfw@mail.gmail.com
> >> Link: https://lore.kernel.org/linux-mm/7b72ccf4-f4ae-cb4e-f411-74d055482026@linux.ibm.com
> >>
> >> Signed-off-by: Jagdish Gediya
> >> Signed-off-by: Aneesh Kumar K.V
> >> ---
> >>  include/linux/memory-tiers.h |  17 ++++++
> >>  mm/Makefile                  |   1 +
> >>  mm/memory-tiers.c            | 102 +++++++++++++++++++++++++++++++++++
> >>  3 files changed, 120 insertions(+)
> >>  create mode 100644 include/linux/memory-tiers.h
> >>  create mode 100644 mm/memory-tiers.c
> >>
> >> diff --git a/include/linux/memory-tiers.h b/include/linux/memory-tiers.h
> >> new file mode 100644
> >> index 000000000000..8d7884b7a3f0
> >> --- /dev/null
> >> +++ b/include/linux/memory-tiers.h
> >> @@ -0,0 +1,17 @@
> >> +/* SPDX-License-Identifier: GPL-2.0 */
> >> +#ifndef _LINUX_MEMORY_TIERS_H
> >> +#define _LINUX_MEMORY_TIERS_H
> >> +
> >> +/*
> >> + * Each tier covers an abstract distance chunk size of 128
> >> + */
> >> +#define MEMTIER_CHUNK_BITS 7
> >> +#define MEMTIER_CHUNK_SIZE (1 << MEMTIER_CHUNK_BITS)
> >> +/*
> >> + * For now let's have 4 memory tiers below the default DRAM tier.
> >> + */
> >> +#define MEMTIER_ADISTANCE_DRAM (1 << (MEMTIER_CHUNK_BITS + 2))
> >> +/* leave one tier below this slow pmem */
> >> +#define MEMTIER_ADISTANCE_PMEM (1 << MEMTIER_CHUNK_BITS)
> >
> > Why is memory type encoded in these values? There is no reason to
> > believe that PMEM is of a lower performance tier than DRAM. Consider
> > high performance energy backed DRAM that makes it "PMEM", consider CXL
> > attached DRAM over a switch topology and constrained links that makes it
> > a lower performance tier than locally attached DRAM. The names should be
> > associated with tiers that indicate their usage. Something like HOT,
> > GENERAL, and COLD.
> > Where, for example, HOT is low capacity high
> > performance compared to the general purpose pool, and COLD is high
> > capacity low performance intended to offload the general purpose tier.
> >
> > It does not need to be exactly that ontology, but please try to not
> > encode policy meaning behind memory types. There has been explicit
> > effort to avoid that to date because types are fraught for declaring
> > relative performance characteristics, and the relative performance
> > changes based on what memory types are assembled in a given system.
>
> Yes. MEMTIER_ADISTANCE_PMEM is something over simplified. That is only
> used in this very first version to make it as simple as possible.

I am failing to see the simplicity of using names that convey a
performance contract that is invalid depending on the system.

> I think we can come up with something better in the later version.
> For example, identify the abstract distance of a PMEM device based on
> HMAT, etc.

Memory tiering has nothing to do with persistence, so why is PMEM in the
name at all?

> And even in this first version, we should put MEMTIER_ADISTANCE_PMEM
> in dax/kmem.c. Because it's just for that specific type of memory
> used now, not for all PMEM.

dax/kmem.c also handles HBM and "soft reserved" memory in general. There
is also nothing PMEM specific about the device-dax subsystem.

> In the current design, memory type is used to report the performance of
> the hardware, in terms of abstract distance, per Johannes' suggestion.

That sounds fine, just pick an abstract name, not an explicit memory
type.

> Which is an abstraction of memory latency and bandwidth. Policy is
> described via memory tiers. Several memory types may be put in one
> memory tier. The abstract distance chunk size of the memory tier may
> be adjusted according to policy.

That part all sounds good. That said, I do not see the benefit of
waiting to run away from these inadequate names.
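
For illustration only, here is a minimal standalone C sketch of the chunk
arithmetic from the quoted header, using usage-based names in place of
memory types. MEMTIER_CHUNK_BITS, MEMTIER_CHUNK_SIZE and the values 512 and
128 come from the quoted patch. The GENERAL and COLD names and the three
helper functions are hypothetical: they are not part of the patch, and the
shift-based mapping is an assumption derived from the 128-wide ranges in the
patch description. No HOT value is defined, since the patch gives none.

```c
/* Values taken from the quoted patch. */
#define MEMTIER_CHUNK_BITS 7
#define MEMTIER_CHUNK_SIZE (1 << MEMTIER_CHUNK_BITS)            /* 128 */

/*
 * Hypothetical usage-based names (assumption, not in the patch).
 * They reuse the numeric values the patch assigns to
 * MEMTIER_ADISTANCE_DRAM (512) and MEMTIER_ADISTANCE_PMEM (128).
 */
#define MEMTIER_ADISTANCE_GENERAL (1 << (MEMTIER_CHUNK_BITS + 2))
#define MEMTIER_ADISTANCE_COLD    (1 << MEMTIER_CHUNK_BITS)

/*
 * Hypothetical helper: index of the 128-wide chunk that contains
 * the given abstract distance (0 - 127 is chunk 0, 128 - 255 is
 * chunk 1, and so on).
 */
static inline int adistance_to_chunk(int adistance)
{
	return adistance >> MEMTIER_CHUNK_BITS;
}

/* Hypothetical helper: first abstract distance covered by a chunk. */
static inline int chunk_start(int chunk)
{
	return chunk << MEMTIER_CHUNK_BITS;
}

/* Hypothetical helper: last abstract distance covered by a chunk. */
static inline int chunk_end(int chunk)
{
	return chunk_start(chunk) + MEMTIER_CHUNK_SIZE - 1;
}
```

With these definitions the four chunks below the general-purpose value are
0 - 127, 128 - 255, 256 - 383 and 384 - 511, and a device reporting the COLD
value lands in the second of them without its name saying anything about the
memory technology behind it.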