From: Tim Chen
To: lsf-pc@lists.linux-foundation.org
Cc: linux-mm@kvack.org, Michal Hocko, Dan Williams, Dave Hansen
Subject: [LSF/MM TOPIC] Tiered memory accounting and management
Date: Mon, 14 Jun 2021 14:51:04 -0700
Message-ID: <475cbc62-a430-2c60-34cc-72ea8baebf2c@linux.intel.com>

Tiered memory accounting and management
------------------------------------------------------------

Traditionally, all RAM is DRAM.
Some DRAM might be closer/faster than others, but a byte of media has about the same cost whether it is close or far. But with new memory tiers such as High-Bandwidth Memory or Persistent Memory, there is a choice between fast/expensive and slow/cheap.

But the current memory cgroups still live in the old model. There is only one set of limits, and it implies that all memory has the same cost. We would like to extend memory cgroups to comprehend different memory tiers, giving users a way to choose a mix between fast/expensive and slow/cheap. To manage such memory, we will need to account memory usage and impose limits for each kind of memory.

A couple of approaches to partitioning memory between cgroups have been discussed previously; they are listed below. We would like to use the LSF/MM session to come to a consensus on which approach to take.

1. Per-NUMA-node limit and accounting for each cgroup.
   We can assign higher limits on better-performing memory nodes for higher-priority cgroups.

   There are some loose ends here that warrant further discussion:
   (1) A user-friendly interface for such limits. Would a proportional weight for each cgroup, translated into an actual absolute limit, be more suitable?
   (2) Memory misconfiguration can occur more easily, as the admin has a much larger number of limits spread among the cgroups to manage. Over-restrictive limits can leave memory underutilized and wasted, and hurt performance.
   (3) OOM behavior when a cgroup hits its limit.

2. Per-memory-tier limit and accounting for each cgroup.
   We can assign higher limits on memory in better-performing tiers for higher-priority cgroups. I previously prototyped a soft-limit-based implementation to demonstrate the tiered limit idea.

   There are also a number of issues here:
   (1) The advantage is that we have fewer limits to deal with, which simplifies configuration.
       However, a number of people have raised doubts about whether we can really classify NUMA nodes properly into memory tiers. There could still be significant performance differences between NUMA nodes even for the same kind of memory. We would also lose the fine-grained control and flexibility that come with per-NUMA-node limits.
   (2) Would a memory hierarchy defined by the promotion/demotion relationships between memory nodes be a viable way to define memory tiers?

These issues related to managing systems with multiple kinds of memory can be ironed out in this session.