From mboxrd@z Thu Jan  1 00:00:00 1970
To: lsf-pc@lists.linux-foundation.org
Cc: linux-mm@kvack.org, Michal Hocko <mhocko@suse.com>,
 Dan Williams <dan.j.williams@intel.com>, Dave Hansen <dave.hansen@intel.com>
From: Tim Chen <tim.c.chen@linux.intel.com>
Subject: [LSF/MM TOPIC] Tiered memory accounting and management
Message-ID: <475cbc62-a430-2c60-34cc-72ea8baebf2c@linux.intel.com>
Date: Mon, 14 Jun 2021 14:51:04 -0700
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101
 Thunderbird/68.6.0
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Language: en-US
Content-Transfer-Encoding: 7bit
Sender: owner-linux-mm@kvack.org
Precedence: bulk
X-Loop: owner-majordomo@kvack.org
List-ID: <linux-mm.kvack.org>


From: Tim Chen <tim.c.chen@linux.intel.com>

Tiered memory accounting and management
------------------------------------------------------------
Traditionally, all RAM is DRAM.  Some DRAM might be closer/faster
than other DRAM, but a byte of media has about the same cost whether
it is close or far.  With new memory tiers such as High-Bandwidth
Memory or Persistent Memory, however, there is a choice between
fast/expensive and slow/cheap.  The current memory cgroups still live
in the old model: there is only one set of limits, which implies that
all memory has the same cost.  We would like to extend memory cgroups
to comprehend different memory tiers to give users a way to choose a
mix between fast/expensive and slow/cheap.

To manage such memory, we will need to account memory usage and
impose limits for each kind of memory.

A couple of approaches to partitioning the memory between cgroups
have been discussed previously; they are listed below.  We would like
to use the LSF/MM session to come to a consensus on the approach to
take.

1.	Per NUMA node limit and accounting for each cgroup.
We can assign higher limits on better-performing memory nodes to
higher-priority cgroups.

There are some loose ends here that warrant further discussion:
(1) A user-friendly interface for such limits.  Would a proportional
weight for the cgroup that translates to an actual absolute limit be
more suitable?
(2) Memory misconfigurations can occur more easily, as the admin
has a much larger number of limits spread among the cgroups to
manage.  Over-restrictive limits can lead to under-utilized and
wasted memory and hurt performance.
(3) OOM behavior when a cgroup hits its limit.
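
To make question (1) concrete, here is a rough sketch of how a
proportional weight could translate into absolute per-node limits.
This is only an illustration, not a proposed interface: the function
name, the node names, the weights and the capacities are all made up
for the example, and a real implementation would live in the memory
controller rather than in user space.

```python
GiB = 1 << 30

def weights_to_node_limits(node_capacity, cgroup_weights):
    """Split each node's capacity among cgroups in proportion to weight.

    node_capacity:  {node: bytes}            (hypothetical inputs)
    cgroup_weights: {cgroup: {node: weight}}
    Returns {cgroup: {node: absolute limit in bytes}}.
    """
    limits = {}
    for node, capacity in node_capacity.items():
        # A cgroup with no weight on a node gets no share of that node.
        weights = {cg: w.get(node, 0) for cg, w in cgroup_weights.items()}
        total = sum(weights.values())
        for cg, weight in weights.items():
            share = capacity * weight // total if total else 0
            limits.setdefault(cg, {})[node] = share
    return limits

# Hypothetical system: one DRAM node and one persistent memory node.
node_capacity = {"node0_dram": 64 * GiB, "node2_pmem": 512 * GiB}
cgroup_weights = {
    "high_prio": {"node0_dram": 300, "node2_pmem": 100},
    "low_prio": {"node0_dram": 100, "node2_pmem": 300},
}
limits = weights_to_node_limits(node_capacity, cgroup_weights)
```

With these made-up numbers the high-priority cgroup ends up with three
quarters of the DRAM node and the low-priority cgroup with three
quarters of the persistent memory node.  The admin only sets relative
weights, but the number of knobs still grows with the number of nodes,
which is the concern raised in (2).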

2.	Per memory tier limit and accounting for each cgroup.
We can assign higher limits on memory in better-performing memory
tiers to higher-priority cgroups.  I previously prototyped a
soft-limit-based implementation to demonstrate the tiered limit idea.

There are also a number of issues here:
(1)	The advantage is that we have fewer limits to deal with, which
simplifies configuration.  However, a number of people have raised
doubts about whether we can really classify the NUMA nodes into
memory tiers properly.  There could still be significant performance
differences between NUMA nodes even for the same kind of memory.
We would also not have the fine-grained control and flexibility that
come with a per NUMA node limit.
(2)	Would a memory hierarchy defined by the promotion/demotion
relationships between memory nodes be a viable approach for defining
memory tiers?
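
As one possible reading of question (2), the sketch below derives a
tier number for each node from a demotion map.  It assumes, purely
for illustration, that each node has at most one demotion target and
that the demotion relationships contain no cycles; the function name
and the node numbering are hypothetical.

```python
def tiers_from_demotion(demotion_target):
    """Assign a tier to each node from its demotion relationships.

    demotion_target: {node: node it demotes to, or None}
    Tier 0 holds nodes that no other node demotes into; a demotion
    target sits one tier below the deepest node that demotes into it.
    Assumes the demotion graph is acyclic.
    """
    targets = {t for t in demotion_target.values() if t is not None}
    tier = {node: 0 for node in demotion_target if node not in targets}
    frontier = list(tier)
    while frontier:
        node = frontier.pop()
        target = demotion_target.get(node)
        if target is None:
            continue
        if tier.get(target, -1) < tier[node] + 1:
            tier[target] = tier[node] + 1
            frontier.append(target)
    return tier

# Hypothetical 2-socket box: DRAM nodes 0 and 1 demote to PMEM nodes 2 and 3.
tiers = tiers_from_demotion({0: 2, 1: 3, 2: None, 3: None})
```

Here nodes 0 and 1 land in tier 0 and nodes 2 and 3 in tier 1, so a
cgroup would need two limits instead of four.  Note that this grouping
says nothing about performance differences between nodes within a
tier, which is the doubt raised in (1).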

These issues related to the management of systems with multiple kinds
of memory can be ironed out in this session.