From mboxrd@z Thu Jan  1 00:00:00 1970
From: "Huang, Ying" <ying.huang@intel.com>
To: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,  <linux-mm@kvack.org>,
  <linux-kernel@vger.kernel.org>,  Johannes Weiner <hannes@cmpxchg.org>,
  Michal Hocko <mhocko@suse.com>,  Rik van Riel <riel@surriel.com>,  Mel
 Gorman <mgorman@techsingularity.net>,  Peter Zijlstra
 <peterz@infradead.org>,  Dave Hansen <dave.hansen@linux.intel.com>,  Yang
 Shi <shy828301@gmail.com>,  Zi Yan <ziy@nvidia.com>,  Wei Xu
 <weixugc@google.com>,  osalvador <osalvador@suse.de>,  Shakeel Butt
 <shakeelb@google.com>,  "Zhong Jiang" <zhongjiang-ali@linux.alibaba.com>
Subject: Re: [PATCH -V3 0/3] memory tiering: hot page selection
References: <20220614081635.194014-1-ying.huang@intel.com>
	<872bdaee-21a0-005b-b66c-893eb331e39a@linux.alibaba.com>
Date: Mon, 20 Jun 2022 11:24:17 +0800
In-Reply-To: <872bdaee-21a0-005b-b66c-893eb331e39a@linux.alibaba.com> (Baolin
	Wang's message of "Mon, 20 Jun 2022 11:19:23 +0800")
Message-ID: <87czf4rp9a.fsf@yhuang6-desk2.ccr.corp.intel.com>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/27.1 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain; charset=ascii
Sender: owner-linux-mm@kvack.org
Precedence: bulk
X-Loop: owner-majordomo@kvack.org
List-ID: <linux-mm.kvack.org>

Baolin Wang <baolin.wang@linux.alibaba.com> writes:

> On 6/14/2022 4:16 PM, Huang Ying wrote:
>> To optimize page placement in a memory tiering system with NUMA
>> balancing, the hot pages in the slow memory nodes need to be
>> identified.  Essentially, the original NUMA balancing implementation
>> selects the most recently accessed (MRU) pages to promote.  But this
>> isn't a perfect algorithm to identify the hot pages, because pages
>> with quite low access frequency may eventually be accessed, given
>> that the NUMA balancing page table scanning period can be quite long
>> (e.g. 60 seconds).  So in this patchset, we implement a new hot page
>> identification algorithm based on the latency between NUMA balancing
>> page table scanning and the hint page fault, which is a kind of most
>> frequently accessed (MFU) algorithm.
>>
>> In NUMA balancing memory tiering mode, if there are hot pages in a
>> slow memory node and cold pages in a fast memory node, we need to
>> promote/demote the hot/cold pages between the fast and slow memory
>> nodes.  One choice is to promote/demote as fast as possible.  But the
>> CPU cycles and memory bandwidth consumed by high promoting/demoting
>> throughput will hurt the latency of some workloads because of access
>> latency inflation and slow memory bandwidth contention.
>>
>> One way to resolve this issue is to restrict the maximum
>> promoting/demoting throughput.  It will take longer to finish the
>> promoting/demoting, but the workload latency will be better.  This is
>> implemented in this patchset as the page promotion rate limit
>> mechanism.
>>
>> The promotion hot threshold is workload and system configuration
>> dependent.  So in this patchset, a method to adjust the hot threshold
>> automatically is implemented.  The basic idea is to control the number
>> of candidate promotion pages to match the promotion rate limit.
>>
>> We used the pmbench memory accessing benchmark to test the patchset
>> on a 2-socket server system with DRAM and PMEM installed.  The test
>> results are as follows:
>>
>>                     pmbench score      promote rate
>>                      (accesses/s)            (MB/s)
>>                     -------------      ------------
>> base                  146887704.1             725.6
>> hot selection         165695601.2             544.0
>> rate limit            162814569.8             165.2
>> auto adjustment       170495294.0             136.9
>>
>> From the results above,
>>
>> With the hot page selection patch [1/3], the pmbench score increases
>> about 12.8%, and the promote rate (overhead) decreases about 25.0%,
>> compared with the base kernel.
>>
>> With the rate limit patch [2/3], the pmbench score decreases about
>> 1.7%, and the promote rate decreases about 69.6%, compared with the hot
>> page selection patch.
>>
>> With the threshold auto adjustment patch [3/3], the pmbench score
>> increases about 4.7%, and the promote rate decreases about 17.1%,
>> compared with the rate limit patch.
>
> I did some simple testing with MySQL on my machine, which contains 1 DRAM
> node (30G) and 1 PMEM node (126G).
>
> sysbench /usr/share/sysbench/oltp_read_write.lua \
> ......
> --tables=200 \
> --table-size=1000000 \
> --report-interval=10 \
> --threads=16 \
> --time=120
>
> According to the data below, the tps improves by about 4.5%, and I think
> this is a good start for optimizing the promotion. So for this series,
> please feel free to add:
>
> Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
> Tested-by: Baolin Wang <baolin.wang@linux.alibaba.com>
>
> Without this patchset:
>  transactions:                        2080188 (3466.48 per sec.)
>
> With this patch set:
>  transactions:                        2174296 (3623.40 per sec.)

Thanks a lot!

Best Regards,
Huang, Ying