From mboxrd@z Thu Jan  1 00:00:00 1970
Message-ID: <784aee91-6a01-6e67-389e-1e1883796894@linux.alibaba.com>
Date: Wed, 30 Mar 2022 14:37:35 +0800
MIME-Version: 1.0
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101
 Thunderbird/91.7.0
Subject: Re: [PATCH] mm: migrate: set demotion targets differently
To: Jagdish Gediya <jvgediya@linux.ibm.com>
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
 akpm@linux-foundation.org, aneesh.kumar@linux.ibm.com,
 dave.hansen@linux.intel.com, ying.huang@intel.com
References: <20220329115222.8923-1-jvgediya@linux.ibm.com>
 <b7d1ab3b-e92c-d3aa-72cb-b80cc1a61e85@linux.alibaba.com>
 <YkMR8OY779Bcri3I@li-6e1fa1cc-351b-11b2-a85c-b897023bb5f3.ibm.com>
From: Baolin Wang <baolin.wang@linux.alibaba.com>
In-Reply-To: <YkMR8OY779Bcri3I@li-6e1fa1cc-351b-11b2-a85c-b897023bb5f3.ibm.com>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
List-ID: <linux-mm.kvack.org>


On 3/29/2022 10:04 PM, Jagdish Gediya wrote:
> On Tue, Mar 29, 2022 at 08:26:05PM +0800, Baolin Wang wrote:
> Hi Baolin,
>> Hi Jagdish,
>>
>> On 3/29/2022 7:52 PM, Jagdish Gediya wrote:
>>> The current implementation to identify the demotion
>>> targets limits some of the opportunities to share
>>> the demotion targets between multiple source nodes.
>>>
>>> Implement logic to identify loops in the demotion
>>> targets so that all the possibilities of demotion can
>>> be utilized. Don't share the used targets between all
>>> the nodes; instead, build the used targets from scratch
>>> for each individual node, based on the set of nodes for
>>> which this node is a demotion target. This helps to share
>>> the demotion targets without missing any possible way of
>>> demotion.
>>>
>>> e.g. with below NUMA topology, where node 0 & 1 are
>>> cpu + dram nodes, node 2 & 3 are equally slower memory
>>> only nodes, and node 4 is slowest memory only node,
>>>
>>> available: 5 nodes (0-4)
>>> node 0 cpus: 0 1
>>> node 0 size: n MB
>>> node 0 free: n MB
>>> node 1 cpus: 2 3
>>> node 1 size: n MB
>>> node 1 free: n MB
>>> node 2 cpus:
>>> node 2 size: n MB
>>> node 2 free: n MB
>>> node 3 cpus:
>>> node 3 size: n MB
>>> node 3 free: n MB
>>> node 4 cpus:
>>> node 4 size: n MB
>>> node 4 free: n MB
>>> node distances:
>>> node   0   1   2   3   4
>>>     0:  10  20  40  40  80
>>>     1:  20  10  40  40  80
>>>     2:  40  40  10  40  80
>>>     3:  40  40  40  10  80
>>>     4:  80  80  80  80  10
>>>
>>> The existing implementation gives below demotion targets,
>>>
>>> node    demotion_target
>>>    0              3, 2
>>>    1              4
>>>    2              X
>>>    3              X
>>>    4              X
>>>
>>> With this patch applied, below are the demotion targets,
>>>
>>> node    demotion_target
>>>    0              3, 2
>>>    1              3, 2
>>>    2              3
>>>    3              4
>>>    4              X
>>
>> Node 2 and node 3 are both slow memory and have the same distance, so why
>> should node 2 demote cold memory to node 3? They should have the same
>> demotion target, node 4, which is the slowest memory node, right?
>>
> The current demotion target finding algorithm works based on best distance: as the distance between node 2 & 3 is 40 and the distance between node 2 & 4 is 80, node 2 demotes to node 3.
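
To make the best-distance rule quoted above concrete, here is a minimal
sketch in Python. It is only an illustration built from the distance table
in the first example, not the kernel code; the helper name
nearest_targets() is made up for this mail.

```python
# Distance matrix copied from the first example topology (nodes 0-4).
DIST = [
    [10, 20, 40, 40, 80],
    [20, 10, 40, 40, 80],
    [40, 40, 10, 40, 80],
    [40, 40, 40, 10, 80],
    [80, 80, 80, 80, 10],
]


def nearest_targets(src, candidates, dist=DIST):
    """Return the candidates at minimum distance from src.

    Hypothetical helper: it only mimics the "pick the closest node" idea,
    it does not track used targets the way the kernel does.
    """
    candidates = [n for n in candidates if n != src]
    if not candidates:
        return []
    best = min(dist[src][n] for n in candidates)
    return [n for n in candidates if dist[src][n] == best]


# Node 2 choosing among the other memory-only nodes 3 and 4:
print(nearest_targets(2, [3, 4]))  # [3]
```

With only distance as input, node 3 wins for node 2 simply because 40 < 80,
whether or not node 3 is actually a slower kind of memory.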

If node 2 can demote to node 3, that means node 3's memory is a colder
tier than node 2's, right? The access time of node 3 should be larger than
that of node 2; only then can we demote colder memory from node 2 to node 3.

But node 2 and node 3 are the same memory type and have the same distance,
so their access times should be the same too. Why add so many page
migrations between node 2 and node 3? I still don't see the benefit.
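
A rough way to express this concern, as a sketch only: treat the smallest
distance from any CPU node as a stand-in for access time (that proxy is my
assumption, and the helper names below are made up), and call a move a real
demotion only if the destination is strictly farther from the CPUs.

```python
# Distance matrix copied from the first example topology (nodes 0-4).
DIST = [
    [10, 20, 40, 40, 80],
    [20, 10, 40, 40, 80],
    [40, 40, 10, 40, 80],
    [40, 40, 40, 10, 80],
    [80, 80, 80, 80, 10],
]
CPU_NODES = [0, 1]  # nodes that have CPUs in that topology


def cpu_distance(node):
    """Smallest distance from any CPU node to `node` (assumed latency proxy)."""
    return min(DIST[cpu][node] for cpu in CPU_NODES)


def is_real_demotion(src, dst):
    """True only if dst is strictly farther from the CPUs than src."""
    return cpu_distance(dst) > cpu_distance(src)


print(is_real_demotion(2, 3))  # False
print(is_real_demotion(2, 4))  # True
```

Under that assumption, 2 -> 3 moves pages between two nodes of equal cost,
while 2 -> 4 moves them to a genuinely slower node.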

Huang Ying and Dave, what do you think about these demotion targets?

>>>
>>> e.g. with below NUMA topology, where node 0, 1 & 2 are
>>> cpu + dram nodes and node 3 is slow memory node,
>>>
>>> available: 4 nodes (0-3)
>>> node 0 cpus: 0 1
>>> node 0 size: n MB
>>> node 0 free: n MB
>>> node 1 cpus: 2 3
>>> node 1 size: n MB
>>> node 1 free: n MB
>>> node 2 cpus: 4 5
>>> node 2 size: n MB
>>> node 2 free: n MB
>>> node 3 cpus:
>>> node 3 size: n MB
>>> node 3 free: n MB
>>> node distances:
>>> node   0   1   2   3
>>>     0:  10  20  20  40
>>>     1:  20  10  20  40
>>>     2:  20  20  10  40
>>>     3:  40  40  40  10
>>>
>>> The existing implementation gives below demotion targets,
>>>
>>> node    demotion_target
>>>    0              3
>>>    1              X
>>>    2              X
>>>    3              X
>>>
>>> With this patch applied, below are the demotion targets,
>>>
>>> node    demotion_target
>>>    0              3
>>>    1              3
>>>    2              3
>>>    3              X
>>
>> Sounds reasonable.
>>
>>>
>>> with below NUMA topology, where node 0 & 2 are cpu + dram
>>> nodes and node 1 & 3 are slow memory nodes,
>>>
>>> available: 4 nodes (0-3)
>>> node 0 cpus: 0 1
>>> node 0 size: n MB
>>> node 0 free: n MB
>>> node 1 cpus:
>>> node 1 size: n MB
>>> node 1 free: n MB
>>> node 2 cpus: 2 3
>>> node 2 size: n MB
>>> node 2 free: n MB
>>> node 3 cpus:
>>> node 3 size: n MB
>>> node 3 free: n MB
>>> node distances:
>>> node   0   1   2   3
>>>     0:  10  40  20  80
>>>     1:  40  10  80  80
>>>     2:  20  80  10  40
>>>     3:  80  80  40  10
>>>
>>> The existing implementation gives below demotion targets,
>>>
>>> node    demotion_target
>>>    0              3
>>>    1              X
>>>    2              3
>>>    3              X
>>
>> If I understand correctly, this is not true. The demotion route should be as
>> below with existing implementation:
>> node 0 ---> node 1
>> node 1 ---> X
>> node 2 ---> node 3
>> node 3 ---> X
>>
> It's a typo; it should be 0 -> 1. Will correct it in v2.
>>>
>>> With this patch applied, below are the demotion targets,
>>>
>>> node    demotion_target
>>>    0              1
>>>    1              3
>>>    2              3
>>>    3              X
>>>
>>> As can be seen above, node 3 can be a demotion target for node
>>> 1, but the existing implementation doesn't configure it that way.
>>> It is better to move pages from node 1 to node 3 instead of moving
>>> them from node 1 to swap.
>>
>> Which means node 3 is the slowest memory node?
>>
> Node 1 and 3 are equally slow, but 1 is near to 0 and 3 is near to 2. Basically, you can think of node 1 as a slow memory logical node near to node 0, and node 3 as a slow memory logical node near to node 2.

OK.
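
That reading can be checked against the distance table of the last example
with a small sketch (illustration only; nearest_cpu_node() is a made-up
helper, and the CPU/slow node lists are taken from the topology description):

```python
# Distance matrix copied from the last example topology (nodes 0-3).
DIST = [
    [10, 40, 20, 80],
    [40, 10, 80, 80],
    [20, 80, 10, 40],
    [80, 80, 40, 10],
]
CPU_NODES = [0, 2]   # cpu + dram nodes
SLOW_NODES = [1, 3]  # slow memory nodes


def nearest_cpu_node(slow):
    """Return the CPU node with the smallest distance to the given slow node."""
    return min(CPU_NODES, key=lambda cpu: DIST[cpu][slow])


pairing = {slow: nearest_cpu_node(slow) for slow in SLOW_NODES}
print(pairing)  # {1: 0, 3: 2}
```

So node 1 pairs with node 0 and node 3 pairs with node 2, as you describe.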