From: "Huang, Ying"
To: David Rientjes
Cc: lsf-pc@lists.linux-foundation.org, linux-mm@kvack.org,
    Michal Hocko, Dan Williams, John Hubbard, Zi Yan, Bharata B Rao,
    Dave Jiang, "Aneesh Kumar K.V", Alistair Popple, Christoph Lameter,
    Andrew Morton, Linus Torvalds, Dave Hansen, Mel Gorman, Jon Grimm,
    Gregory Price, Wei Xu, Johannes Weiner, SeongJae Park,
    David Hildenbrand, Davidlohr Bueso
Subject: Re: [LSF/MM/BPF TOPIC] Locally attached memory tiering
In-Reply-To: (David Rientjes's message of "Mon, 6 May 2024 20:37:19 -0700 (PDT)")
Date: Wed, 08 May 2024 12:14:09 +0800
Message-ID: <87msp1kkj2.fsf@yhuang6-desk2.ccr.corp.intel.com>

Hi, David,

Thanks!  This is a great summary!

David Rientjes writes:

> Hi all,
>
> I think it would be very worthwhile to have a block set aside for
> discussion on locally attached memory tiering extensions at LSF/MM/BPF
> 2024.
>
> Primarily interested in discussing Linux enlightenment for CXL 1.1 and
> later type-3 memory expansion devices (CXL.mem).  I think we could touch
> on CXL 2.0 and later memory pooling architectures if we have time and
> there is interest, but the primary focus here would be locally attached.
>
> Based on the premise for a Memory Tiering Working Group[1], there is
> widespread interest in the foundational topics for generally useful Linux
> enlightenment:
>
>  - Decoupling CPU balancing from memory balancing (or obsoleting CPU
>    balancing entirely)
>
>    + John Hubbard notes this would be useful for GPUs:
>
>       a) GPUs have their own processors that are invisible to the kernel's
>          NUMA "which tasks are active on which NUMA nodes" calculations,
>          and
>
>       b) Similar to where CXL is generally going, we have already built
>          fully memory-coherent hardware, which includes memory-only NUMA
>          nodes.
>
>  - In-kernel hot memory abstraction, informed by hardware hinting drivers
>    (incl some architectures like Power10), usable as a NUMA Balancing
>    backend for promotion and other areas of the kernel like transparent
>    hugepage utilization
>
>  - NUMA and memory tiering enlightenment for accelerators, such as for
>    optimal use of GPU memory, extremely important for a cloud provider
>    (hint hint :)
>
>  - Asynchronous memory promotion independent of task_numa_fault() while
>    considering the cost of page migration (due to identifying cold memory)
>
>  - What role userspace plays in this decision-making and how we can
>    extend the default policy and mechanisms in the kernel to allow for it
>    if necessary
>
> Additional topics that you find interesting are also very helpful!

In addition to hot memory identification and promotion, I think we
should also consider cold memory identification and demotion as part of
a full solution.  The existing method based on the page table accessed
bit may be good enough, but we still need to consider the full solution
in the context of general NUMA balancing.

> I'm biased toward a generally useful solution that would leverage the
> kernel as the ultimate source of truth for page hotness that can be
> extended for multiple use cases, one of which is memory tiering support.
> But certainly if there are other approaches, we can discuss that as well.
>
> A few main goals from this discussion:
>
>  - Ensure that proposals address, or can be extended to address, the
>    emerging needs of the various use cases that users may have
>
>  - Surface any constraints that stakeholders may find to be prohibitive
>    for support in the core MM subsystem
>
>  - Alignment and division of work for developers who are actively looking
>    to contribute to this area
>
> As I'm just one of many stakeholders for this discussion, I'd nominate
> Michal Hocko to moderate it if he's willing to do so.
> If he's so willing, we'd be in good hands :)
>
> [1] https://lore.kernel.org/linux-mm/45d850ec-623b-7c07-c266-e948cdbf1f62@linux.com/T/

--
Best Regards,
Huang, Ying