From mboxrd@z Thu Jan  1 00:00:00 1970
From: "Huang, Ying" <ying.huang@intel.com>
To: David Rientjes <rientjes@google.com>
Cc: lsf-pc@lists.linux-foundation.org,  linux-mm@kvack.org,  Michal Hocko
 <mhocko@suse.com>,  Dan Williams <dan.j.williams@intel.com>,  John Hubbard
 <jhubbard@nvidia.com>,  Zi Yan <ziy@nvidia.com>,  Bharata B Rao
 <bharata@amd.com>,  Dave Jiang <dave.jiang@intel.com>,  "Aneesh Kumar K.V"
 <aneesh.kumar@linux.ibm.com>,  Alistair Popple <apopple@nvidia.com>,
  Christoph Lameter <cl@gentwo.org>,  Andrew Morton
 <akpm@linux-foundation.org>,  Linus Torvalds
 <torvalds@linux-foundation.org>,  Dave Hansen
 <dave.hansen@linux.intel.com>,  Mel Gorman <mgorman@suse.de>,  Jon Grimm
 <jon.grimm@amd.com>,  Gregory Price <gourry.memverge@gmail.com>,  Wei Xu
 <weixugc@google.com>,  Johannes Weiner <hannes@cmpxchg.org>,  SeongJae
 Park <sj@kernel.org>,  David Hildenbrand <david@redhat.com>,
  peterz@infradead.org,  a.manzanares@samsung.com
Subject: Re: [LSF/MM/BPF TOPIC] Locally attached memory tiering
In-Reply-To: <sekh6rzzxdf4rjtk7z4rlxek2lp6x5ua35333lfjaax7mkh7pk@gxxskmzkjkzw>
	(Davidlohr Bueso's message of "Sun, 12 May 2024 18:49:25 -0700")
References: <e90dc785-c4e6-47e4-8eda-d35325c82ff9@google.com>
	<20240508213918.7ndnrjs6pxnklbpi@offworld>
	<87pltviwv5.fsf@yhuang6-desk2.ccr.corp.intel.com>
	<sekh6rzzxdf4rjtk7z4rlxek2lp6x5ua35333lfjaax7mkh7pk@gxxskmzkjkzw>
Date: Mon, 13 May 2024 15:48:14 +0800
Message-ID: <87frumcfup.fsf@yhuang6-desk2.ccr.corp.intel.com>
User-Agent: Gnus/5.13 (Gnus v5.13)
MIME-Version: 1.0
Content-Type: text/plain; charset=ascii
Sender: owner-linux-mm@kvack.org
Precedence: bulk
X-Loop: owner-majordomo@kvack.org
List-ID: <linux-mm.kvack.org>
List-Subscribe: <mailto:majordomo@kvack.org>
List-Unsubscribe: <mailto:majordomo@kvack.org>

Davidlohr Bueso <dave@stgolabs.net> writes:

> On Thu, 09 May 2024, Huang, Ying wrote:
>
>>With the default configuration, the current NUMA-balancing-based
>>promotion solution will try to promote almost any faulting page.  To
>>select hot pages to promote and control thrashing between NUMA nodes,
>>the promote rate limit needs to be configured.  For example, via
>>
>>echo 200 > /proc/sys/kernel/numa_balancing_promote_rate_limit_MBps
>>
>>200MB of hot pages will be selected and promoted every second.  Can you try it?
>
> Yes, I've played with this tunable and, just like the LRU approach, it
> shows nice micro wins (fewer promotions/demotions) but little in the way
> of actual benchmark improvements at a higher level, merely noise level or
> very subtle wins. In fact, the actual data from that series for this
> parameter was a ~2% pmbench win with the rate limiting, but a 69% promotion
> rate decrease.

Thanks a lot for the update!

IIUC, page promotion/demotion only helps performance if there are hot
pages in the slow memory and cold pages in the fast memory.  This may
not be true for quite a few workload configurations.

For example, the default allocation mechanism is local first.  In the
context of memory tiering, that means fast memory first.  In various
workloads, it's quite normal for the hot pages to be allocated first.
This makes it unnecessary to optimize the page placement until there
are some configuration changes in the system.
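
One rough way to see whether a system is in this "nothing to optimize
yet" state is to look at how full the fast node is.  The sketch below is
only an illustration, not an existing tool: it parses the per-node
meminfo file from sysfs and reports the free fraction.  Which node number
is the fast tier is an assumption you have to supply for your machine,
and the helper names are made up for this example.

```python
import re
from pathlib import Path


def parse_node_meminfo(text):
    """Parse 'Node N Key:   value kB' lines into a {key: value} dict.

    Keys containing parentheses, such as 'Active(anon)', are skipped
    because they are not needed for this check.
    """
    info = {}
    for line in text.splitlines():
        m = re.match(r"Node\s+\d+\s+(\w+):\s+(\d+)", line)
        if m:
            info[m.group(1)] = int(m.group(2))
    return info


def free_fraction(info):
    """Fraction of the node's memory that is currently free."""
    return info["MemFree"] / info["MemTotal"]


def read_node_meminfo(node):
    """Read and parse meminfo for one NUMA node (node id is an assumption)."""
    path = Path(f"/sys/devices/system/node/node{node}/meminfo")
    return parse_node_meminfo(path.read_text())
```

If free_fraction(read_node_meminfo(0)) is still large while the workload
is running (assuming node 0 is the fast tier), the working set probably
fits in fast memory and there is little for promotion to do.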

So, to evaluate the optimization, we need to

1) check the overhead of the optimization when page placement is almost
optimal already.

2) find configurations where the page placement isn't good enough, and
check whether memory tiering optimization works.
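
For both cases, one simple measurement is to snapshot the relevant
/proc/vmstat counters before and after a benchmark run and look at the
deltas.  The following is a minimal sketch under some assumptions: the
counter list is my selection and some kernels may not expose every name,
the helper names are hypothetical, and the page size is assumed to be
4KB with MB taken as 2^20 bytes.

```python
from pathlib import Path

# Counters related to NUMA balancing and tiering.  Names missing from a
# given kernel's /proc/vmstat are simply skipped.
COUNTERS = (
    "numa_hint_faults",
    "numa_pages_migrated",
    "pgpromote_success",
    "pgdemote_kswapd",
    "pgdemote_direct",
)


def parse_vmstat(text):
    """Parse /proc/vmstat-style 'name value' lines into a dict."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            stats[parts[0]] = int(parts[1])
    return stats


def read_vmstat(path="/proc/vmstat"):
    """Take a snapshot of the current counters."""
    return parse_vmstat(Path(path).read_text())


def counter_deltas(before, after, names=COUNTERS):
    """Difference of the selected counters between two snapshots."""
    return {n: after[n] - before[n]
            for n in names if n in before and n in after}


def promoted_mb_per_sec(deltas, seconds, page_size=4096):
    """Average promotion rate in MB/s (page_size is an assumption)."""
    pages = deltas.get("pgpromote_success", 0)
    return pages * page_size / seconds / (1 << 20)
```

Call read_vmstat() before and after the run, pass both snapshots to
counter_deltas(), and compare promoted_mb_per_sec() against the
configured numa_balancing_promote_rate_limit_MBps.  For case 1) the
interesting number is the benchmark score with and without tiering
enabled while the deltas stay small; for case 2) it is whether the score
improves when the deltas are large.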

> And this is really my point, how much effort do we want to put in optimizing
> software mechanisms for hot page detection? Are there other benchmarks we
> should be using? And perhaps doing the async promotion and not incurring
> the NUMA balancing overhead and comparing the cost of migration before
> promoting would yield some better numbers, but that also might be easy to
> get wrong when compared to the relative hotness of the page.

I believe that there is still quite a lot of room to optimize the
software mechanisms.  The current implementation is, in fact, as simple
as possible.

--
Best Regards,
Huang, Ying