From mboxrd@z Thu Jan 1 00:00:00 1970
From: Wei Xu
Date: Thu, 27 Apr 2023 20:55:03 -0700
Subject: Re: [LSF/MM/BPF TOPIC] The future of memory tiering
To: Frank van der Linden
Cc: David Rientjes, Michal Hocko, Dan Williams, lsf-pc@lists.linux-foundation.org, linux-mm@kvack.org, Johannes Weiner, Dave Hansen, Huang Ying, "Aneesh Kumar K.V", Yang Shi, Davidlohr Bueso, Jon Grimm, Greg Thelen
References: <7443f0e6-6be2-3320-60d9-03da0cca2987@google.com>

On Thu, Apr 27, 2023 at 10:10 AM Frank van der Linden wrote:
>
> On Wed, Apr 26, 2023 at 9:30 PM David Rientjes wrote:
> >
> > Hi everybody,
> >
> > As requested, sending along a last minute topic suggestion for
> > consideration for LSF/MM/BPF 2023 :)
> >
> > For a sizable set of emerging technologies, memory tiering presents one of
> > the most formidable challenges and exciting opportunities for the MM
> > subsystem today.
> >
> > "Memory tiering" can mean many different things based on the user: from
> > traditional everyday NUMA, to swap (to zswap), to NVDIMMs, to HBM, to
> > locally attached CXL memory, to memory borrowing over PCIe, to memory
> > pooling with disaggregation, and beyond.
> >
> > Just as NUMA started out only being useful for the supercomputers, memory
> > tiering will likely evolve over the next five years to take on an
> > expanding set of use cases, and likely with rapidly increasing adoption
> > even beyond hyperscalers.
> >
> > I think a discussion about memory tiering would be highly valuable. A few
> > key questions that I think can drive this discussion:
> >
> >  - What are the various form factors that must be supported as short-term
> >    goals as well as need to be supported 5+ years into the future?
> >
> >  - What incremental changes need to be made on top of NUMA support to
> >    fully support the wide range of use cases that will be coming? (Is
> >    memory tiering support built entirely upon NUMA?)
> >
> >  - What is the minimum viable *default* support that the MM subsystem
> >    should provide for tiered configs? What are the set of optimizations
> >    that should be left to userspace or BPF to control?
> >
> >  - What are the various page promotion techniques that we must plan for
> >    beyond traditional NUMA balancing that will allow us to exploit
> >    hardware innovation?
> >
> > (And I'm sure there are more topics of discussion that others would
> > readily add. It would be great to have additional ideas in replies.)
> >
> > A key challenge in all of this is to make memory tiering support in the
> > upstream kernel compatible with the roadmaps of various CPU vendors. A
> > key goal is to ensure the end user benefits from all of this rapid
> > innovation with generalized support that is well abstracted and allows for
> > extensibility.
>
> Thank you for bringing this one up.
> Memory tiering is a very important
> topic that should definitely be discussed. I'm especially interested
> in the userspace control part (which I proposed as a separate topic,
> but happy to see it addressed as part of this discussion too, as that
> is where the motivation originally came from). With the increased
> complexity introduced by memory tiers, is it still possible to provide
> a one-size-fits-all default? If there is such a default, is it
> accurately represented by the current model of NUMA nodes, where pages
> will be demoted to a slower tier as a 'reclaim' operation (e.g. you
> basically map a global LRU model on to tiers of increased latency)?
> Are there reasons to break that model, and should applications be able
> to do that? Is the current mempolicy/madvise model sufficient?
>
> - Frank

I am definitely interested in the discussions on memory tiering as
well. In particular:

- What should be the interface to configure and initialize various
  memory devices, especially CXL.mem devices, as tiered memory
  nodes/zones?

- What kind of framework do we need to leverage existing and future
  hardware support (e.g. accessed bits/counters, PMU/IBS, etc.) for
  page promotions?

- How can userspace influence the memory tiering policies?

- What kind of memory tiering controls do we want to provide for
  cgroups?

Wei