From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <owner-linux-mm@kvack.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17])
	by smtp.lore.kernel.org (Postfix) with ESMTP id 35200FA373D
	for <linux-mm@archiver.kernel.org>; Mon, 31 Oct 2022 01:34:46 +0000 (UTC)
Received: by kanga.kvack.org (Postfix)
	id 349FB6B0071; Sun, 30 Oct 2022 21:34:45 -0400 (EDT)
Received: by kanga.kvack.org (Postfix, from userid 40)
	id 2D39F6B0073; Sun, 30 Oct 2022 21:34:45 -0400 (EDT)
X-Delivered-To: int-list-linux-mm@kvack.org
Received: by kanga.kvack.org (Postfix, from userid 63042)
	id 174686B0074; Sun, 30 Oct 2022 21:34:45 -0400 (EDT)
X-Delivered-To: linux-mm@kvack.org
Received: from relay.hostedemail.com (smtprelay0012.hostedemail.com [216.40.44.12])
	by kanga.kvack.org (Postfix) with ESMTP id E090E6B0071
	for <linux-mm@kvack.org>; Sun, 30 Oct 2022 21:34:44 -0400 (EDT)
Received: from smtpin11.hostedemail.com (a10.router.float.18 [10.200.18.1])
	by unirelay02.hostedemail.com (Postfix) with ESMTP id 6C027120375
	for <linux-mm@kvack.org>; Mon, 31 Oct 2022 01:34:44 +0000 (UTC)
X-FDA: 80079525288.11.94784AF
Received: from mga03.intel.com (mga03.intel.com [134.134.136.65])
	by imf28.hostedemail.com (Postfix) with ESMTP id CD51DC0017
	for <linux-mm@kvack.org>; Mon, 31 Oct 2022 01:34:42 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple;
  d=intel.com; i=@intel.com; q=dns/txt; s=Intel;
  t=1667180083; x=1698716083;
  h=from:to:cc:subject:references:date:in-reply-to:
   message-id:mime-version;
  bh=Pr/aS2rdynHUpdhyMcCOSJRgj/kU8LUfG/+C6yMaxzY=;
  b=mrYlLBd3gva/qKdiX9m7e1NM9lR4wA5vf7OXVP8XXXGfxbYqbdeyVGWk
   qQ0m26qzeY8Yd1kSCofyMk0a/PG3Fiv3XXM9PAbZZL3Fzo+KI4ad/J/yt
   evJ+T0CsSU5wgUN6b3AJr5aBPoD/wj3MQFBQtPVKMoLPd7CfUBjdwCmMa
   Tvcktt/GKIXFIbkLsg3x70Rf3y5lqTlfyTDi/i/kwEDyJTs9wgfBd+JiN
   rkuVL2+EcFpMW+qR7hYSh4nl/d38dWfZ2mz8oiZQQB5TIgnm9oxSw/dIs
   K0+DgVj7H9gA0hNQxXT5y/+5vzBqxm4bAdd0rbSkmeEW6z0DTMqbgqsFN
   A==;
X-IronPort-AV: E=McAfee;i="6500,9779,10516"; a="310490881"
X-IronPort-AV: E=Sophos;i="5.95,227,1661842800"; 
   d="scan'208";a="310490881"
Received: from fmsmga008.fm.intel.com ([10.253.24.58])
  by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 30 Oct 2022 18:34:40 -0700
X-IronPort-AV: E=McAfee;i="6500,9779,10516"; a="696832417"
X-IronPort-AV: E=Sophos;i="5.95,227,1661842800"; 
   d="scan'208";a="696832417"
Received: from yhuang6-desk2.sh.intel.com (HELO yhuang6-desk2.ccr.corp.intel.com) ([10.238.208.55])
  by fmsmga008-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 30 Oct 2022 18:34:37 -0700
From: "Huang, Ying" <ying.huang@intel.com>
To: Bharata B Rao <bharata@amd.com>
Cc: Aneesh Kumar K V <aneesh.kumar@linux.ibm.com>,  <linux-mm@kvack.org>,
  <linux-kernel@vger.kernel.org>,  Andrew Morton
 <akpm@linux-foundation.org>,  Alistair Popple <apopple@nvidia.com>,  Dan
 Williams <dan.j.williams@intel.com>,  Dave Hansen <dave.hansen@intel.com>,
  "Davidlohr Bueso" <dave@stgolabs.net>,  Hesham Almatary
 <hesham.almatary@huawei.com>,  Jagdish Gediya <jvgediya.oss@gmail.com>,
  Johannes Weiner <hannes@cmpxchg.org>,  Jonathan Cameron
 <Jonathan.Cameron@huawei.com>,  "Michal Hocko" <mhocko@kernel.org>,  Tim
 Chen <tim.c.chen@intel.com>,  Wei Xu <weixugc@google.com>,  Yang Shi
 <shy828301@gmail.com>
Subject: Re: [RFC] memory tiering: use small chunk size and more tiers
References: <20221027065925.476955-1-ying.huang@intel.com>
	<578c9b89-10eb-1e23-8868-cdd6685d8d4e@linux.ibm.com>
	<877d0kk5uf.fsf@yhuang6-desk2.ccr.corp.intel.com>
	<59291b98-6907-0acf-df11-6d87681027cc@linux.ibm.com>
	<8735b8jy9k.fsf@yhuang6-desk2.ccr.corp.intel.com>
	<0d938c9f-c810-b10a-e489-c2b312475c52@amd.com>
	<87tu3oibyr.fsf@yhuang6-desk2.ccr.corp.intel.com>
	<07912a0d-eb91-a6ef-2b9d-74593805f29e@amd.com>
Date: Mon, 31 Oct 2022 09:33:49 +0800
In-Reply-To: <07912a0d-eb91-a6ef-2b9d-74593805f29e@amd.com> (Bharata B. Rao's
	message of "Fri, 28 Oct 2022 19:23:33 +0530")
Message-ID: <87leowepz6.fsf@yhuang6-desk2.ccr.corp.intel.com>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/27.1 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain; charset=ascii
ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com;
	s=arc-20220608; t=1667180083;
	h=from:from:sender:reply-to:subject:subject:date:date:
	 message-id:message-id:to:to:cc:cc:mime-version:mime-version:
	 content-type:content-type:content-transfer-encoding:
	 in-reply-to:in-reply-to:references:references:dkim-signature;
	bh=Q2NK+4BMt1cFzv1xISNIpUWytgwVP4Qu7XAYjTOKFrc=;
	b=2FtwuNilKv8GWzfOav02GT6bwsLZZrgOQohF11Js2unFOYI6Ln7vAiw1nuio4eVJrQNjv/
	ZveWHPM15wqBJh1fojsQIWg0UN5dud/83DR3+vAGJ9kw1S9Nsyn44VGDV+kVOE4mi9L6U/
	XrzchNhKJBoi5s4JXUldKDB5RMj26Mw=
ARC-Authentication-Results: i=1;
	imf28.hostedemail.com;
	dkim=none ("invalid DKIM record") header.d=intel.com header.s=Intel header.b=mrYlLBd3;
	spf=pass (imf28.hostedemail.com: domain of ying.huang@intel.com designates 134.134.136.65 as permitted sender) smtp.mailfrom=ying.huang@intel.com;
	dmarc=pass (policy=none) header.from=intel.com
ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1667180083; a=rsa-sha256;
	cv=none;
	b=F+Mi7r5LBlnJhXyWnx8e3bedBIJqJZ7gzUpGvDyIdnT1eU2PTFK/AwOoExc8g9ZrT+dOLY
	h+4MSSOt4fSSGimJcdGe0RRR119F+MZIqVThMDypvrlqigyKsxPVGo/3aGP9uOdZ8clDbl
	07WBHHGnl5/w+bM4hySyFZlD5t5C5zY=
Authentication-Results: imf28.hostedemail.com;
	dkim=none ("invalid DKIM record") header.d=intel.com header.s=Intel header.b=mrYlLBd3;
	spf=pass (imf28.hostedemail.com: domain of ying.huang@intel.com designates 134.134.136.65 as permitted sender) smtp.mailfrom=ying.huang@intel.com;
	dmarc=pass (policy=none) header.from=intel.com
X-Rspam-User: 
X-Rspamd-Queue-Id: CD51DC0017
X-Rspamd-Server: rspam03
X-Stat-Signature: dmescdbhrrnzcx1cxkwcpnrf7ndzerxt
X-HE-Tag: 1667180082-418520
X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4
Sender: owner-linux-mm@kvack.org
Precedence: bulk
X-Loop: owner-majordomo@kvack.org
List-ID: <linux-mm.kvack.org>

Bharata B Rao <bharata@amd.com> writes:

> On 10/28/2022 2:03 PM, Huang, Ying wrote:
>> Bharata B Rao <bharata@amd.com> writes:
>> 
>>> On 10/28/2022 11:16 AM, Huang, Ying wrote:
>>>> If my understanding were correct, you think the latency / bandwidth of
>>>> these NUMA nodes will near each other, but may be different.
>>>>
>>>> Even if the latency / bandwidth of these NUMA nodes isn't exactly same,
>>>> we should deal with that in memory types instead of memory tiers.
>>>> There's only one abstract distance for each memory type.
>>>>
>>>> So, I still believe we will not have many memory tiers with my proposal.
>>>>
>>>> I don't care too much about the exact number, but want to discuss some
>>>> general design choice,
>>>>
>>>> a) Avoid to group multiple memory types into one memory tier by default
>>>>    at most times.
>>>
>>> Do you expect the abstract distances of two different types to be
>>> close enough in real life (like you showed in your example with
>>> CXL - 5000 and PMEM - 5100) that they will get assigned into same tier
>>> most times?
>>>
>>> Are you foreseeing that abstract distance that get mapped by sources
>>> like HMAT would run into this issue?
>> 
>> Only if we set abstract distance chunk size large.  So, I think that
>> it's better to set chunk size as small as possible to avoid potential
>> issue.  What is the downside to set the chunk size small?
>
> I don't see anything in particular. However
>
> - With just two memory types (default_dram_type and dax_slowmem_type
> with adistance values of 576 and 576*5 respectively) defined currently,
> - With no interface yet to set/change adistance value of a memory type,
> - With no defined way to convert the performance characteristics info
> (bw and latency) from sources like HMAT into a adistance value,
>
> I find it a bit difficult to see how a chunk size of 10 against the
> existing 128 could be more useful.

OK.  Maybe we are paying too much attention to the specific numbers.  My
goal isn't to push this specific RFC into the kernel.  I just want to
discuss the design choices with the community.

My basic idea is NOT to group memory types into memory tiers by
customizing the abstract distance chunk size, because that's hard to
use and to implement.  So far, it appears that nobody objects to this.

Then, it's even better to avoid adjusting the abstract distance chunk
size in the kernel as much as possible.  This will make life easier for
user space tools/scripts.  One solution is to define more than enough
possible tiers below DRAM (we already have an unlimited number of tiers
above DRAM).

In the upstream implementation, 4 tiers are possible below DRAM.  That's
enough for now.  But in the long run, it may be better to define more.
100 possible tiers below DRAM may be too extreme.  How about defining
the abstract distance of DRAM to be 1050 and the chunk size to be 100?
Then we will have 10 possible tiers below DRAM.  That may be more than
enough, even in the long run.
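
To make the arithmetic concrete, here is a rough sketch (not kernel
code).  It assumes a tier simply covers one chunk of abstract distance,
so the tier index is the abstract distance divided by the chunk size,
rounded down.  The helper names are made up for illustration; the
5000/5100 values are the CXL/PMEM example from earlier in the thread.

```python
def tier_index(adistance, chunk_size):
    # Assumption: each tier spans exactly one chunk of abstract distance.
    return adistance // chunk_size


def tiers_below(dram_adistance, chunk_size):
    # Tiers with a smaller abstract distance than the tier DRAM lands in.
    return tier_index(dram_adistance, chunk_size)


# Upstream values discussed in this thread: DRAM at 576, chunk size 128.
print("upstream tiers below DRAM:", tiers_below(576, 128))

# Suggested values: DRAM at 1050, chunk size 100.
print("suggested tiers below DRAM:", tiers_below(1050, 100))

# CXL (5000) vs. PMEM (5100) example from earlier in the thread.
for chunk in (128, 100):
    same = tier_index(5000, chunk) == tier_index(5100, chunk)
    print("chunk size", chunk, "-> same tier:", same)
```

With a chunk size of 128, the two example types fall into the same
tier; with a chunk size of 100, they land in adjacent tiers.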

Again, the specific number isn't so important to me.  So please suggest
your own number if necessary.

Best Regards,
Huang, Ying