From: "Huang, Ying" <ying.huang@intel.com>
To: Michal Hocko
Cc: Bharata B Rao, Aneesh Kumar K V, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org, Andrew Morton, Alistair Popple,
	Dan Williams, Dave Hansen, Davidlohr Bueso, Hesham Almatary,
	Jagdish Gediya, Johannes Weiner, Jonathan Cameron, Tim Chen,
	Wei Xu, Yang Shi
Subject: Re: [RFC] memory tiering: use small chunk size and more tiers
References: <0d938c9f-c810-b10a-e489-c2b312475c52@amd.com>
	<87tu3oibyr.fsf@yhuang6-desk2.ccr.corp.intel.com>
	<07912a0d-eb91-a6ef-2b9d-74593805f29e@amd.com>
	<87leowepz6.fsf@yhuang6-desk2.ccr.corp.intel.com>
	<878rkuchpm.fsf@yhuang6-desk2.ccr.corp.intel.com>
	<87bkppbx75.fsf@yhuang6-desk2.ccr.corp.intel.com>
	<877d0dbw13.fsf@yhuang6-desk2.ccr.corp.intel.com>
Date: Wed, 02 Nov 2022 16:45:38 +0800
In-Reply-To: (Michal Hocko's message of "Wed, 2 Nov 2022 09:39:25 +0100")
Message-ID: <8735b1bv7x.fsf@yhuang6-desk2.ccr.corp.intel.com>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/27.1 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain; charset=ascii
Michal Hocko writes:

> On Wed 02-11-22 16:28:08, Huang, Ying wrote:
>> Michal Hocko writes:
>>
>> > On Wed 02-11-22 16:02:54, Huang, Ying wrote:
>> >> Michal Hocko writes:
>> >>
>> >> > On Wed 02-11-22 08:39:49, Huang, Ying wrote:
>> >> >> Michal Hocko writes:
>> >> >>
>> >> >> > On Mon 31-10-22 09:33:49, Huang, Ying wrote:
>> >> >> > [...]
>> >> >> >> In the upstream implementation, 4 tiers are possible below DRAM.
>> >> >> >> That's enough for now. But in the long run, it may be better to
>> >> >> >> define more. 100 possible tiers below DRAM may be too extreme.
>> >> >> >
>> >> >> > I am just curious. Is any configuration with more than a couple of
>> >> >> > tiers even manageable? I mean, applications have been struggling
>> >> >> > even with regular NUMA systems for years, and the vast majority of
>> >> >> > them are largely NUMA unaware. How are they going to configure
>> >> >> > themselves for a more complex system when a) there is no resource
>> >> >> > access control, so whatever you aim for might not be available,
>> >> >> > and b) in which situations will there be demand for only a subset
>> >> >> > of tiers (GPU memory?)?
>> >> >>
>> >> >> Sorry for the confusion. I think that there are only several (fewer
>> >> >> than 10) tiers in a system in practice. Yes, here I suggested
>> >> >> defining 100 (10 in the later text) POSSIBLE tiers below DRAM. My
>> >> >> intention isn't to manage a system with tens of memory tiers.
>> >> >> Instead, my intention is to avoid putting 2 memory types into one
>> >> >> memory tier by accident, by making the abstract distance range of
>> >> >> each memory tier as small as possible. The more possible memory
>> >> >> tiers, the smaller the abstract distance range of each memory tier.
>> >> >
>> >> > TBH I do not really understand how tweaking ranges helps anything.
>> >> > IIUC drivers are free to assign any abstract distance, so they will
>> >> > clash without any higher level coordination.
>> >>
>> >> Yes, that's possible. Each memory tier corresponds to one abstract
>> >> distance range. The larger the range is, the higher the possibility
>> >> of clashing. So I suggest making the abstract distance range smaller
>> >> to reduce the possibility of clashing.
>> >
>> > I am sorry, but I really do not understand how the size of the range
>> > addresses the fundamental issue that each driver simply picks what it
>> > wants. Is there any enumeration defining the basic characteristics of
>> > each tier? How does a driver developer know which tier to assign their
>> > driver to?
>>
>> The smaller range size will not guarantee anything. It just tries to
>> help the default behavior.
>>
>> The drivers are expected to assign the abstract distance based on the
>> memory latency/bandwidth, etc.
>
> Would it be possible/feasible to have a canonical way to calculate the
> abstract distance from these characteristics in the core kernel, so
> that drivers do not even have to fall into that trap?

Yes, that sounds like a good idea. We can provide a function that maps
from the memory latency/bandwidth to the abstract distance, for the
drivers to use.

Best Regards,
Huang, Ying