Date: Wed, 8 May 2024 14:39:18 -0700
From: Davidlohr Bueso
To: David Rientjes
Cc: lsf-pc@lists.linux-foundation.org, linux-mm@kvack.org,
	Michal Hocko, Dan Williams, John Hubbard, Zi Yan, Bharata B Rao,
	Dave Jiang, "Aneesh Kumar K.V", "Huang, Ying", Alistair Popple,
	Christoph Lameter, Andrew Morton, Linus Torvalds, Dave Hansen,
	Mel Gorman, Jon Grimm, Gregory Price, Wei Xu, Johannes Weiner,
	SeongJae Park, David Hildenbrand, peterz@infradead.org,
	a.manzanares@samsung.com
Subject: Re: [LSF/MM/BPF TOPIC] Locally attached memory tiering
Message-ID: <20240508213918.7ndnrjs6pxnklbpi@offworld>

On Mon, 06 May 2024, David Rientjes wrote:

>Hi all,
>
>I think it would be very worthwhile to have a block set aside for
>discussion on locally attached memory tiering extensions at LSF/MM/BPF
>2024.

+1

fyi Adam's proposal which touches on both cxl and tiering:
https://lore.kernel.org/all/9bf86b97-319f-4f58-b658-1fe3ed0b1993@nmtadam.samsung/

>Primarily interested in discussing Linux enlightenment for CXL 1.1 and
>later type-3 memory expansion devices (CXL.mem).  I think we could touch
>on CXL 2.0 and later memory pooling architectures if we have time and
>there is interest, but the primary focus here would be local attached.
>
>Based on the premise for a Memory Tiering Working Group[1], there is
>widespread interest in the foundational topics for generally useful Linux
>enlightenment:
>
> - Decoupling CPU balancing from memory balancing (or obsoleting CPU
>   balancing entirely)
>
>   + John Hubbard notes this would be useful for GPUs:
>
>      a) GPUs have their own processors that are invisible to the kernel's
>         NUMA "which tasks are active on which NUMA nodes" calculations,
>         and
>
>      b) Similar to where CXL is generally going, we have already built
>         fully memory-coherent hardware, which include memory-only NUMA
>         nodes.

+Cc peterz

> - In-kernel hot memory abstraction, informed by hardware hinting drivers
>   (incl some architectures like Power10), usable as a NUMA Balancing
>   backend for promotion and other areas of the kernel like transparent
>   hugepage utilization
>
> - NUMA and memory tiering enlightenment for accelerators, such as for
>   optimal use of GPU memory, extremely important for a cloud provider
>   (hint hint :)
>
> - Asynchronous memory promotion independent of task_numa_fault() while
>   considering the cost of page migration (due to identifying cold memory)

This would be nice for users who like to disable NUMA balancing. But overall
when compared to anything hardware can give us (ala ppc, without the required
kernel overhead of x86-based counters), I fear that software solutions will
always be found wanting. And, afaik, numa balancing based promotion is still
one of the top pain points in memory tiering.
So, of course, improving the software approach is still a good thing.

Fyi along these lines, improving/optimizing the current numa balancing
approach has proven irrelevant in the larger scheme of benchmark results,
afaik. For example, (active) LRU based promotion instead of blindly promoting
the faulting page, which could be rarely used. Benchmarks show a significant
reduction in a lot of the promote/demote traffic dealing with ping pong cases,
but unfortunately show little to no tangible performance wins in actual
benchmark numbers. Similarly, the proposed migrc[1] shows great TLB flushing
benefits but minimal benchmark (XSBench) improvement.

... which brings me to the topic of benchmarking. What are the workloads
people care about, beyond pmbench? I tend to use oltp based database
workloads with wss/buffers larger than the total capacity of the fast memory
nodes.

> - What role userspace plays in this decision-making and how we can
>   extend the default policy and mechanisms in the kernel to allow for it
>   if necessary
>
>Additional topics that you find interesting are also very helpful!
>
>I'm biased toward a generally useful solution that would leverage the
>kernel as the ultimate source of truth for page hotness that can be
>extended for multiple use cases, one of which is memory tiering support.
>But certainly if there are other approaches, we can discuss that as well.
>
>A few main goals from this discussion:
>
> - Ensure that proposals address, or can be extended to address, the
>   emerging needs of the various use cases that users may have
>
> - Surface any constraints that stakeholders may find to be prohibitive
>   for support in the core MM subsystem
>
> - Alignment and division of work for developers who are actively looking
>   to contribute to this area
>
>As I'm just one of many stakeholders for this discussion, I'd nominate
>Michal Hocko to moderate it if he's willing to do so.
>If he's so willing, we'd be in good hands :)
>
> [1] https://lore.kernel.org/linux-mm/45d850ec-623b-7c07-c266-e948cdbf1f62@linux.com/T/
>

Thanks,
Davidlohr

[1] https://lore.kernel.org/linux-mm/20240226030613.22366-1-byungchul@sk.com/
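
For anyone wanting a feel for the ping pong point above, here is a toy
userspace model. It is not kernel code and not how NUMA balancing or any
posted series works. It compares promoting on every slow-tier access with
promoting only pages already touched once in the slow tier, a crude
stand-in for "on the active LRU". The names simulate(), make_trace() and
require_active, the two-tier LRU model and the access trace are all made
up for illustration. It only counts migrations and says nothing about
runtime, which is exactly the gap noted above.

```python
from collections import OrderedDict


def make_trace(rounds):
    """Hypothetical trace: 4 hot pages per round plus one one-off cold page."""
    trace = []
    for i in range(rounds):
        trace += [0, 1, 2, 3, 100 + i]
    return trace


def simulate(trace, fast_capacity, require_active):
    """Return (promotions, demotions) for a toy two-tier model.

    require_active=False: promote any slow-tier page when it is accessed.
    require_active=True:  promote a slow-tier page only on its second
                          access (assumption: approximates an "active" page).
    """
    fast = OrderedDict()  # pages in the fast tier, least recently used first
    referenced = set()    # slow-tier pages that have been touched once
    promotions = demotions = 0

    for page in trace:
        if page in fast:
            fast.move_to_end(page)  # hit in fast tier, refresh recency
            continue
        if require_active and page not in referenced:
            referenced.add(page)    # first touch: remember it, do not migrate
            continue
        referenced.discard(page)
        if len(fast) >= fast_capacity:
            fast.popitem(last=False)  # demote the least recently used page
            demotions += 1
        fast[page] = None
        promotions += 1

    return promotions, demotions


if __name__ == "__main__":
    trace = make_trace(50)
    print("promote on access:", simulate(trace, 4, require_active=False))
    print("promote if active:", simulate(trace, 4, require_active=True))
```

With a fast tier that holds exactly the hot set, the one-off pages keep
evicting hot pages under the first policy, so nearly every access
migrates something. The second policy never promotes the one-off pages.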