From: Frank van der Linden
Date: Thu, 27 Apr 2023 10:10:14 -0700
Subject: Re: [LSF/MM/BPF TOPIC] The future of memory tiering
To: David Rientjes
Cc: Michal Hocko, Dan Williams, lsf-pc@lists.linux-foundation.org,
    linux-mm@kvack.org, Wei Xu, Johannes Weiner, Dave Hansen, Huang Ying,
    "Aneesh Kumar K.V", Yang Shi, Davidlohr Bueso, Jon Grimm
In-Reply-To: <7443f0e6-6be2-3320-60d9-03da0cca2987@google.com>

On Wed, Apr 26, 2023 at 9:30 PM David Rientjes wrote:
>
> Hi everybody,
>
> As requested, sending along a last minute topic suggestion for
> consideration for LSF/MM/BPF 2023 :)
>
> For a sizable set of emerging technologies, memory tiering presents one of
> the most formidable challenges and exciting opportunities for the MM
> subsystem today.
>
> "Memory tiering" can mean many different things based on the user: from
> traditional everyday NUMA, to swap (to zswap), to NVDIMMs, to HBM, to
> locally attached CXL memory, to memory borrowing over PCIe, to memory
> pooling with disaggregation, and beyond.
>
> Just as NUMA started out only being useful for the supercomputers, memory
> tiering will likely evolve over the next five years to take on an
> expanding set of use cases, and likely with rapidly increasing adoption
> even beyond hyperscalers.
>
> I think a discussion about memory tiering would be highly valuable. A few
> key questions that I think can drive this discussion:
>
> - What are the various form factors that must be supported as short-term
>   goals as well as need to be supported 5+ years into the future?
>
> - What incremental changes need to be made on top of NUMA support to
>   fully support the wide range of use cases that will be coming? (Is
>   memory tiering support built entirely upon NUMA?)
>
> - What is the minimum viable *default* support that the MM subsystem
>   should provide for tiered configs? What are the set of optimizations
>   that should be left to userspace or BPF to control?
>
> - What are the various page promotion techniques that we must plan for
>   beyond traditional NUMA balancing that will allow us to exploit
>   hardware innovation?
>
> (And I'm sure there are more topics of discussion that others would
> readily add. It would be great to have additional ideas in replies.)
>
> A key challenge in all of this is to make memory tiering support in the
> upstream kernel compatible with the roadmaps of various CPU vendors. A
> key goal is to ensure the end user benefits from all of this rapid
> innovation with generalized support that is well abstracted and allows for
> extensibility.

Thank you for bringing this one up. Memory tiering is a very important
topic that should definitely be discussed.
I'm especially interested in the userspace control part (which I
proposed as a separate topic, but I'm happy to see it addressed as part
of this discussion too, as that is where the motivation originally came
from).

With the increased complexity introduced by memory tiers, is it still
possible to provide a one-size-fits-all default? If there is such a
default, is it accurately represented by the current model of NUMA
nodes, where pages will be demoted to a slower tier as a 'reclaim'
operation (e.g. you basically map a global LRU model onto tiers of
increasing latency)? Are there reasons to break that model, and should
applications be able to do that? Is the current mempolicy/madvise
model sufficient?

- Frank