Message-ID: <784aee91-6a01-6e67-389e-1e1883796894@linux.alibaba.com>
Date: Wed, 30 Mar 2022 14:37:35 +0800
Subject: Re: [PATCH] mm: migrate: set demotion targets differently
To: Jagdish Gediya
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
 akpm@linux-foundation.org, aneesh.kumar@linux.ibm.com,
 dave.hansen@linux.intel.com, ying.huang@intel.com
References: <20220329115222.8923-1-jvgediya@linux.ibm.com>
From: Baolin Wang

On 3/29/2022 10:04 PM, Jagdish Gediya wrote:
> On Tue, Mar 29, 2022 at 08:26:05PM +0800, Baolin Wang wrote:
> Hi Baolin,
>> Hi Jagdish,
>>
>> On 3/29/2022 7:52 PM, Jagdish Gediya wrote:
>>> The current implementation to identify the demotion
>>> targets limits some of the opportunities to share
>>> the demotion targets between multiple source nodes.
>>>
>>> Implement a logic to identify the loop in the demotion
>>> targets such that all the possibilities of demotion can
>>> be utilized. Don't share the used targets between all
>>> the nodes, instead create the used targets from scratch
>>> for each individual node based on for what all node this
>>> node is a demotion target. This helps to share the demotion
>>> targets without missing any possible way of demotion.
>>>
>>> e.g.
>>> with below NUMA topology, where node 0 & 1 are
>>> cpu + dram nodes, node 2 & 3 are equally slower memory
>>> only nodes, and node 4 is slowest memory only node,
>>>
>>> available: 5 nodes (0-4)
>>> node 0 cpus: 0 1
>>> node 0 size: n MB
>>> node 0 free: n MB
>>> node 1 cpus: 2 3
>>> node 1 size: n MB
>>> node 1 free: n MB
>>> node 2 cpus:
>>> node 2 size: n MB
>>> node 2 free: n MB
>>> node 3 cpus:
>>> node 3 size: n MB
>>> node 3 free: n MB
>>> node 4 cpus:
>>> node 4 size: n MB
>>> node 4 free: n MB
>>> node distances:
>>> node   0   1   2   3   4
>>>   0:  10  20  40  40  80
>>>   1:  20  10  40  40  80
>>>   2:  40  40  10  40  80
>>>   3:  40  40  40  10  80
>>>   4:  80  80  80  80  10
>>>
>>> The existing implementation gives below demotion targets,
>>>
>>> node    demotion_target
>>>  0              3, 2
>>>  1              4
>>>  2              X
>>>  3              X
>>>  4              X
>>>
>>> With this patch applied, below are the demotion targets,
>>>
>>> node    demotion_target
>>>  0              3, 2
>>>  1              3, 2
>>>  2              3
>>>  3              4
>>>  4              X
>>
>> Node 2 and node 3 both are slow memory and have same distance, why node 2
>> should demote cold memory to node 3? They should have the same target
>> demotion node 4, which is the slowest memory node, right?
>>
> Current demotion target finding algorithm works based on best distance,
> as distance between node 2 & 3 is 40 and distance between node 2 & 4
> is 80, node 2 demotes to node 3.

If node 2 can demote to node 3, that means node 3's memory is colder
than node 2's, right? The access time of node 3 should then be larger
than that of node 2, so that we can demote colder memory from node 2 to
node 3. But node 2 and node 3 are the same memory type and have the same
distance, so their access times should be the same too. Why add so many
page migrations between node 2 and node 3? I'm still not sure of the
benefits.

Huang Ying and Dave, what do you think about these demotion targets?

>>>
>>> e.g.
>>> with below NUMA topology, where node 0, 1 & 2 are
>>> cpu + dram nodes and node 3 is slow memory node,
>>>
>>> available: 4 nodes (0-3)
>>> node 0 cpus: 0 1
>>> node 0 size: n MB
>>> node 0 free: n MB
>>> node 1 cpus: 2 3
>>> node 1 size: n MB
>>> node 1 free: n MB
>>> node 2 cpus: 4 5
>>> node 2 size: n MB
>>> node 2 free: n MB
>>> node 3 cpus:
>>> node 3 size: n MB
>>> node 3 free: n MB
>>> node distances:
>>> node   0   1   2   3
>>>   0:  10  20  20  40
>>>   1:  20  10  20  40
>>>   2:  20  20  10  40
>>>   3:  40  40  40  10
>>>
>>> The existing implementation gives below demotion targets,
>>>
>>> node    demotion_target
>>>  0              3
>>>  1              X
>>>  2              X
>>>  3              X
>>>
>>> With this patch applied, below are the demotion targets,
>>>
>>> node    demotion_target
>>>  0              3
>>>  1              3
>>>  2              3
>>>  3              X
>>
>> Sounds reasonable.
>>
>>>
>>> with below NUMA topology, where node 0 & 2 are cpu + dram
>>> nodes and node 1 & 3 are slow memory nodes,
>>>
>>> available: 4 nodes (0-3)
>>> node 0 cpus: 0 1
>>> node 0 size: n MB
>>> node 0 free: n MB
>>> node 1 cpus:
>>> node 1 size: n MB
>>> node 1 free: n MB
>>> node 2 cpus: 2 3
>>> node 2 size: n MB
>>> node 2 free: n MB
>>> node 3 cpus:
>>> node 3 size: n MB
>>> node 3 free: n MB
>>> node distances:
>>> node   0   1   2   3
>>>   0:  10  40  20  80
>>>   1:  40  10  80  80
>>>   2:  20  80  10  40
>>>   3:  80  80  40  10
>>>
>>> The existing implementation gives below demotion targets,
>>>
>>> node    demotion_target
>>>  0              3
>>>  1              X
>>>  2              3
>>>  3              X
>>
>> If I understand correctly, this is not true. The demotion route should be as
>> below with existing implementation:
>> node 0 ---> node 1
>> node 1 ---> X
>> node 2 ---> node 3
>> node 3 ---> X
>>
> Its typo, It should be 0 -> 1, Will correct it in v2.
>>>
>>> With this patch applied, below are the demotion targets,
>>>
>>> node    demotion_target
>>>  0              1
>>>  1              3
>>>  2              3
>>>  3              X
>>>
>>> As it can be seen above, node 3 can be demotion target for node
>>> 1 but existing implementation doesn't configure it that way.
>>> It is better to move pages from node 1 to node 3 instead of moving
>>> it from node 1 to swap.
>>
>> Which means node 3 is the slowest memory node?
>>
> Node 1 and 3 are equally slower but 1 is near to 0 and 3 is near to 2.
> Basically you can think of it like node 1 is slow memory logical node
> near to node 0 and node 3 is slow memory logical node near to node 2.

OK.
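
To make the "best distance" argument from the first example concrete, here is a minimal sketch. It is not the kernel's algorithm and not the patch under review: `nearest_nodes` is a hypothetical helper, and the only real data is the distance matrix quoted for the 5-node topology. It only shows why a pure nearest-distance rule sends node 2 to node 3 (distance 40) rather than to node 4 (distance 80), which is the behaviour questioned above.

```python
# Distance matrix copied from the first (5-node) example topology.
DISTANCE = [
    [10, 20, 40, 40, 80],
    [20, 10, 40, 40, 80],
    [40, 40, 10, 40, 80],
    [40, 40, 40, 10, 80],
    [80, 80, 80, 80, 10],
]


def nearest_nodes(source, candidates, distance=DISTANCE):
    """Return the candidate nodes at minimum distance from `source`.

    Hypothetical helper for illustration only; the source node itself
    is never considered a target.
    """
    others = [n for n in candidates if n != source]
    if not others:
        return []
    best = min(distance[source][n] for n in others)
    return [n for n in others if distance[source][n] == best]


# Node 2 choosing between the remaining memory-only nodes 3 and 4.
print(nearest_nodes(2, [3, 4]))
# Node 0 choosing among all memory-only nodes.
print(nearest_nodes(0, [2, 3, 4]))
```

A distance-only rule like this cannot tell that nodes 2 and 3 are the same class of memory; it sees only that 40 is smaller than 80.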