From: "Huang, Ying"
To: Jagdish Gediya
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
    akpm@linux-foundation.org, aneesh.kumar@linux.ibm.com,
    baolin.wang@linux.alibaba.com, dave.hansen@linux.intel.com
Subject: Re: [PATCH] mm: migrate: set demotion targets differently
Date: Wed, 30 Mar 2022 14:46:51 +0800
In-Reply-To: <20220329115222.8923-1-jvgediya@linux.ibm.com> (Jagdish Gediya's message of "Tue, 29 Mar 2022 17:22:22 +0530")
Message-ID: <87pmm4c4ys.fsf@yhuang6-desk2.ccr.corp.intel.com>

Hi, Jagdish,

Jagdish Gediya writes:

> The current implementation to identify the demotion
> targets limits some of the opportunities to share
> the demotion targets between multiple source nodes.

Yes.
It sounds reasonable to share demotion targets among multiple source
nodes.

One question: are the example machines below real hardware, available
now or in the near future? Or do you just think they are possible?

And, before going into the implementation details, I think that we
should first discuss the perfect demotion order.

> Implement a logic to identify the loop in the demotion
> targets such that all the possibilities of demotion can
> be utilized. Don't share the used targets between all
> the nodes, instead create the used targets from scratch
> for each individual node based on for what all node this
> node is a demotion target. This helps to share the demotion
> targets without missing any possible way of demotion.
>
> e.g. with below NUMA topology, where node 0 & 1 are
> cpu + dram nodes, node 2 & 3 are equally slower memory
> only nodes, and node 4 is slowest memory only node,
>
> available: 5 nodes (0-4)
> node 0 cpus: 0 1
> node 0 size: n MB
> node 0 free: n MB
> node 1 cpus: 2 3
> node 1 size: n MB
> node 1 free: n MB
> node 2 cpus:
> node 2 size: n MB
> node 2 free: n MB
> node 3 cpus:
> node 3 size: n MB
> node 3 free: n MB
> node 4 cpus:
> node 4 size: n MB
> node 4 free: n MB
> node distances:
> node   0   1   2   3   4
>   0:  10  20  40  40  80
>   1:  20  10  40  40  80
>   2:  40  40  10  40  80
>   3:  40  40  40  10  80
>   4:  80  80  80  80  10
>
> The existing implementation gives below demotion targets,
>
> node    demotion_target
>  0              3, 2
>  1              4
>  2              X
>  3              X
>  4              X
>
> With this patch applied, below are the demotion targets,
>
> node    demotion_target
>  0              3, 2
>  1              3, 2
>  2              3
>  3              4
>  4              X

For such a machine, I think the perfect demotion order is,

node    demotion_target
 0              2, 3
 1              2, 3
 2              4
 3              4
 4              X

> e.g.
> with below NUMA topology, where node 0, 1 & 2 are
> cpu + dram nodes and node 3 is slow memory node,
>
> available: 4 nodes (0-3)
> node 0 cpus: 0 1
> node 0 size: n MB
> node 0 free: n MB
> node 1 cpus: 2 3
> node 1 size: n MB
> node 1 free: n MB
> node 2 cpus: 4 5
> node 2 size: n MB
> node 2 free: n MB
> node 3 cpus:
> node 3 size: n MB
> node 3 free: n MB
> node distances:
> node   0   1   2   3
>   0:  10  20  20  40
>   1:  20  10  20  40
>   2:  20  20  10  40
>   3:  40  40  40  10
>
> The existing implementation gives below demotion targets,
>
> node    demotion_target
>  0              3
>  1              X
>  2              X
>  3              X
>
> With this patch applied, below are the demotion targets,
>
> node    demotion_target
>  0              3
>  1              3
>  2              3
>  3              X

I think this is perfect already.

> with below NUMA topology, where node 0 & 2 are cpu + dram
> nodes and node 1 & 3 are slow memory nodes,
>
> available: 4 nodes (0-3)
> node 0 cpus: 0 1
> node 0 size: n MB
> node 0 free: n MB
> node 1 cpus:
> node 1 size: n MB
> node 1 free: n MB
> node 2 cpus: 2 3
> node 2 size: n MB
> node 2 free: n MB
> node 3 cpus:
> node 3 size: n MB
> node 3 free: n MB
> node distances:
> node   0   1   2   3
>   0:  10  40  20  80
>   1:  40  10  80  80
>   2:  20  80  10  40
>   3:  80  80  40  10
>
> The existing implementation gives below demotion targets,
>
> node    demotion_target
>  0              3
>  1              X
>  2              3
>  3              X

It should be as below, as you said in another email in this thread.

node    demotion_target
 0              1
 1              X
 2              3
 3              X

> With this patch applied, below are the demotion targets,
>
> node    demotion_target
>  0              1
>  1              3
>  2              3
>  3              X

The original demotion order looks better to me. Nodes 1 and 3 are at
the same level from the perspective of the whole system.
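One way to make "the same level" concrete, as a rough sketch rather than
anything the kernel does: rank each memory-only node by its smallest
distance to any CPU node. The helper name `tier_key` and the ranking rule
are assumptions made for illustration; the distance matrix is the one
quoted above.

```python
# Hypothetical helper: not kernel code, only an illustration of "same level".

# Distance matrix from the quoted topology (nodes 0 and 2 have CPUs).
DISTANCE = [
    [10, 40, 20, 80],
    [40, 10, 80, 80],
    [20, 80, 10, 40],
    [80, 80, 40, 10],
]
CPU_NODES = [0, 2]


def tier_key(node, distance, cpu_nodes):
    """Smallest distance from `node` to any CPU node (assumed ranking rule)."""
    return min(distance[cpu][node] for cpu in cpu_nodes)


# Rank only the memory-only nodes; CPU nodes are taken as the top level.
memory_only = [n for n in range(len(DISTANCE)) if n not in CPU_NODES]
keys = {n: tier_key(n, DISTANCE, CPU_NODES) for n in memory_only}
print(keys)
```

Under this reading both slow nodes get the same key, so neither is a
natural demotion target for the other.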

Another example: node 0 & 2 are cpu + dram nodes and node 1 is a slow
memory node near node 0,

available: 3 nodes (0-2)
node 0 cpus: 0 1
node 0 size: n MB
node 0 free: n MB
node 1 cpus:
node 1 size: n MB
node 1 free: n MB
node 2 cpus: 2 3
node 2 size: n MB
node 2 free: n MB
node distances:
node   0   1   2
  0:  10  40  20
  1:  40  10  80
  2:  20  80  10

Demotion order 1:

node    demotion_target
 0              1
 1              X
 2              X

Demotion order 2:

node    demotion_target
 0              1
 1              X
 2              1

Demotion order 2 looks better. But I think that demotion order 1 makes
some sense too (like node reclaim mode).

It seems that, if a demotion target has the same distance to several
current demotion sources, the demotion target should be shared among
the demotion sources.

And as Dave pointed out, we may eventually need a mechanism to override
the default demotion order generated automatically. So we can just use
some simple mechanism in the kernel that automatically makes sense in
most cases, and leave the best demotion order to some customization
mechanism for users.

> As it can be seen above, node 3 can be demotion target for node
> 1 but existing implementation doesn't configure it that way. It
> is better to move pages from node 1 to node 3 instead of moving
> it from node 1 to swap.
>
> Signed-off-by: Jagdish Gediya
> Signed-off-by: Aneesh Kumar K.V

Best Regards,
Huang, Ying

[snip]
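
The sharing idea discussed in this mail can be sketched as a pass-based
selection. This is only an illustration under stated assumptions, not the
kernel implementation and not the patch: the function name
`demotion_order` is hypothetical, the first pass is assumed to start from
the nodes with CPUs, and every candidate at a source's smallest distance
is taken as a target for that source.

```python
# Illustrative sketch only; names and structure are assumptions, not kernel code.

def demotion_order(distance, cpu_nodes):
    """Return {node: list of demotion targets} using pass-based sharing."""
    n = len(distance)
    targets = {node: [] for node in range(n)}
    sources = sorted(cpu_nodes)      # first pass: nodes with CPUs (assumed)
    used = set(sources)              # nodes that have already been a source
    while sources:
        candidates = [c for c in range(n) if c not in used]
        if not candidates:
            break
        chosen = set()
        for src in sources:
            best = min(distance[src][c] for c in candidates)
            # Every candidate at the best distance becomes a target, and the
            # same candidate may be picked by several sources in one pass.
            targets[src] = [c for c in candidates if distance[src][c] == best]
            chosen.update(targets[src])
        used |= chosen
        sources = sorted(chosen)     # the next pass demotes from these nodes
    return targets


# Three-node example from this mail: nodes 0 and 2 have CPUs.
three_node = [
    [10, 40, 20],
    [40, 10, 80],
    [20, 80, 10],
]
order = demotion_order(three_node, [0, 2])
print(order)
```

For the three-node machine this sketch yields demotion order 2, and for
the five-node machine quoted earlier it yields 0 and 1 demoting to 2 and
3, with 2 and 3 demoting to 4. A real default would still need the
override mechanism mentioned above for cases where this rule is not what
the user wants.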