Date: Thu, 25 Jan 2024 13:37:02 -0800 (PST)
From: David Rientjes <rientjes@google.com>
To: Matthew Wilcox
Cc: John Hubbard, Zi Yan, Bharata B Rao, Dave Jiang, "Aneesh Kumar K.V",
    "Huang, Ying", Alistair Popple, Christoph Lameter, Andrew Morton,
    Linus Torvalds, Dave Hansen, Mel Gorman, Jon Grimm, Gregory Price,
    Brian Morris, Wei Xu, Johannes Weiner, SeongJae Park,
    linux-mm@kvack.org
Subject: Re: [RFC] Memory tiering kernel alignment
References: <75f21150-1e12-4f4b-e578-e170e4fea18b@google.com>
            <2b29dd3d-bb2c-6a8c-94d2-d5c2e035516a@google.com>

On Thu, 25 Jan 2024, Matthew Wilcox wrote:
> On Thu, Jan 25, 2024 at 12:04:37PM -0800, David Rientjes wrote:
> > On Thu, 25 Jan 2024, Matthew Wilcox wrote:
> > > On Thu, Jan 25, 2024 at 10:26:19AM -0800, David Rientjes wrote:
> > > > There is a lot of excitement around upcoming CXL type 3 memory
> > > > expansion devices and their cost savings potential.  As the
> > > > industry starts to adopt this technology, one of the key components
> > > > in strategic planning is how the upstream Linux kernel will support
> > > > various tiered configurations to meet various user needs.  I think
> > > > it goes without saying that this is quite interesting to cloud
> > > > providers as well as other hyperscalers :)
> > >
> > > I'm not excited.  I'm disappointed that people are falling for this
> > > scam.  CXL is the ATM of this decade.  The protocol is not fit for
> > > the purpose of accessing remote memory, adding 10ns just for an
> > > encode/decode cycle.  Hands up everybody who's excited about memory
> > > latency increasing by 17%.
> >
> > Right, I don't think that anybody is claiming that we can leverage
> > locally attached CXL memory as though it was DRAM on the same or a
> > remote socket, or that there won't be a noticeable impact to
> > application performance while the memory is still across the device.
> >
> > It does offer several cost-savings benefits for offloading of cold
> > memory, though, if locally attached, and I think support for that use
> > case is inevitable -- in fact, Linux already has some sophisticated
> > support for the locally attached use case.
> >
> > > Then there are the lies from the vendors who want you to buy
> > > switches.  Not one of them is willing to guarantee you the worst
> > > case latency through their switches.
> >
> > I should have prefaced this thread by saying "locally attached CXL
> > memory expansion", because that's the primary focus of many of the
> > folks on this email thread :)
>
> That's a huge relief.  I was not looking forward to the patches to add
> support for pooling (etc).
>
> Using CXL as cold-data storage makes a certain amount of sense, although
> I'm not really sure why it offers an advantage over NAND.  It's faster
> than NAND, but you still want to bring it back locally before operating
> on it.  NAND is denser, and consumes less power while idle.  NAND comes
> with a DMA controller to move the data instead of relying on the CPU to
> move the data around.  And of course moving the data first to CXL and
> then to swap means that it's got to go over the memory bus multiple
> times, unless you're building a swap device which attaches to the
> other end of the CXL bus ...
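(For concreteness on the cold memory offload point above: the locally
attached case already works with stock interfaces today, since the
expander just shows up as a CPU-less NUMA node.  Below is a minimal
userspace sketch that demotes a buffer to such a node with move_pages(2).
The node id 1 is only an assumption about the topology, and a real policy
would pick pages based on coldness rather than demoting a whole buffer.
Build with -lnuma.)

	#include <numaif.h>
	#include <stdio.h>
	#include <stdlib.h>
	#include <unistd.h>

	#define NR_PAGES 64
	#define CXL_NODE 1	/* assumed node id of the CPU-less expander */

	int main(void)
	{
		long page_size = sysconf(_SC_PAGESIZE);
		char *buf = aligned_alloc(page_size, NR_PAGES * page_size);
		void *pages[NR_PAGES];
		int nodes[NR_PAGES], status[NR_PAGES];

		if (!buf)
			return 1;

		/* Touch each page so it is actually allocated somewhere first. */
		for (int i = 0; i < NR_PAGES; i++) {
			buf[i * page_size] = 1;
			pages[i] = buf + i * page_size;
			nodes[i] = CXL_NODE;
		}

		/* pid 0 == this process; MPOL_MF_MOVE moves only our own pages. */
		if (move_pages(0, NR_PAGES, pages, nodes, status, MPOL_MF_MOVE) < 0)
			perror("move_pages");
		else
			printf("first page now on node %d\n", status[0]);

		free(buf);
		return 0;
	}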

This is **exactly** the type of discussion we're looking to have :)

There are some things that I've chatted informally with folks about that
I'd like to bring to the forum:

 - Decoupling CPU migration from memory migration for NUMA Balancing (or
   perhaps deprecating CPU migration entirely)

 - Allowing NUMA Balancing to do migration as part of a kthread
   asynchronous to the NUMA hint fault, in kernel context

 - An abstraction for future hardware devices that can provide an
   expanded view into page hotness, which can be leveraged in different
   areas of the kernel, including as a backend for NUMA Balancing to
   replace NUMA hint faults

 - Per-container support for configuring balancing and memory migration

 - Opting certain types of memory into NUMA Balancing (like tmpfs) while
   leaving other types alone

 - Utilizing hardware accelerated memory migration as a replacement for
   the traditional migrate_pages() path when available (a rough sketch of
   the dispatch point appears at the end of this mail)

I could go code all of this up and spend an enormous amount of time doing
so, only to get NAKed by somebody because I'm ripping out their critical
use case that I just didn't know about :)

There's also the question of whether DAMON should be the source of truth
for this or whether it should be decoupled.

My dream world would be one where we could discuss the various use cases
for locally attached CXL memory and determine, as a group, what the
shared, comprehensive "Linux vision" for it is, and do so before
LSF/MM/BPF.

In a perfect world, we could block out an expanded MM session in Salt
Lake City to bring all these concepts together, decide which approaches
sound reasonable vs unreasonable, and leave that conference with a clear
understanding of what needs to happen.
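To make that last bullet a little more concrete, here is a purely
hypothetical sketch of the kind of dispatch point I have in mind; none of
these names exist upstream, and this is illustration rather than a
proposal.  The idea is simply that the data copy done during migration
could be handed to a hardware engine (e.g. a DMA offload) when a driver
has registered one, with the existing CPU copy as the fallback:

	#include <stddef.h>
	#include <string.h>

	/* One folio's worth of data to copy from source to destination. */
	struct folio_copy_req {
		void *dst;
		const void *src;
		size_t len;
	};

	/* Optional hardware backend; a driver would register this at probe time. */
	struct migration_offload_ops {
		int (*copy_batch)(struct folio_copy_req *reqs, size_t nr);
	};

	static struct migration_offload_ops *offload_ops;	/* NULL: no engine */

	static int copy_batch_cpu(struct folio_copy_req *reqs, size_t nr)
	{
		for (size_t i = 0; i < nr; i++)
			memcpy(reqs[i].dst, reqs[i].src, reqs[i].len);
		return 0;
	}

	/* What the migration copy step would call instead of copying inline. */
	static int migrate_copy_batch(struct folio_copy_req *reqs, size_t nr)
	{
		if (offload_ops && offload_ops->copy_batch)
			return offload_ops->copy_batch(reqs, nr);
		return copy_batch_cpu(reqs, nr);
	}

How batching is sized and how errors fall back to the CPU path are
exactly the kinds of details that would need to be hashed out.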