From: "Huang, Ying"
To: "Aneesh Kumar K.V", Jagdish Gediya
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
    akpm@linux-foundation.org, baolin.wang@linux.alibaba.com,
    dave.hansen@linux.intel.com, Fan Du
Subject: Re: [PATCH] mm: migrate: set demotion targets differently
Date: Thu, 31 Mar 2022 15:23:16 +0800
Message-ID: <87h77ebn6j.fsf@yhuang6-desk2.ccr.corp.intel.com>
In-Reply-To: <87ilruy5zt.fsf@linux.ibm.com> (Aneesh Kumar K. V.'s message of
    "Thu, 31 Mar 2022 12:15:58 +0530")
References: <20220329115222.8923-1-jvgediya@linux.ibm.com>
    <87pmm4c4ys.fsf@yhuang6-desk2.ccr.corp.intel.com>
    <87lewrxsv1.fsf@linux.ibm.com>
    <878rsrc672.fsf@yhuang6-desk2.ccr.corp.intel.com>
    <87ilruy5zt.fsf@linux.ibm.com>

"Aneesh Kumar K.V" writes:

> "Huang, Ying" writes:
>
>> "Aneesh Kumar K.V" writes:
>>
>>> "Huang, Ying" writes:
>>>
>>>> Hi, Jagdish,
>>>>
>>>> Jagdish Gediya writes:
>>>>
>>>
>>> ...
>>>
>>>>> e.g.
>>>>> with below NUMA topology, where node 0 & 1 are
>>>>> cpu + dram nodes, node 2 & 3 are equally slower memory
>>>>> only nodes, and node 4 is slowest memory only node,
>>>>>
>>>>> available: 5 nodes (0-4)
>>>>> node 0 cpus: 0 1
>>>>> node 0 size: n MB
>>>>> node 0 free: n MB
>>>>> node 1 cpus: 2 3
>>>>> node 1 size: n MB
>>>>> node 1 free: n MB
>>>>> node 2 cpus:
>>>>> node 2 size: n MB
>>>>> node 2 free: n MB
>>>>> node 3 cpus:
>>>>> node 3 size: n MB
>>>>> node 3 free: n MB
>>>>> node 4 cpus:
>>>>> node 4 size: n MB
>>>>> node 4 free: n MB
>>>>> node distances:
>>>>> node   0   1   2   3   4
>>>>>   0:  10  20  40  40  80
>>>>>   1:  20  10  40  40  80
>>>>>   2:  40  40  10  40  80
>>>>>   3:  40  40  40  10  80
>>>>>   4:  80  80  80  80  10
>>>>>
>>>>> The existing implementation gives below demotion targets,
>>>>>
>>>>> node    demotion_target
>>>>>  0              3, 2
>>>>>  1              4
>>>>>  2              X
>>>>>  3              X
>>>>>  4              X
>>>>>
>>>>> With this patch applied, below are the demotion targets,
>>>>>
>>>>> node    demotion_target
>>>>>  0              3, 2
>>>>>  1              3, 2
>>>>>  2              3
>>>>>  3              4
>>>>>  4              X
>>>>
>>>> For such machine, I think the perfect demotion order is,
>>>>
>>>> node    demotion_target
>>>>  0              2, 3
>>>>  1              2, 3
>>>>  2              4
>>>>  3              4
>>>>  4              X
>>>
>>> I guess the "equally slow nodes" is a confusing definition here. Now if the
>>> system consists of 2 1GB equally slow memory and the firmware doesn't want to
>>> differentiate between them, firmware can present a single NUMA node
>>> with 2GB capacity? The fact that we are finding two NUMA nodes is a hint
>>> that there is some difference between these two memory devices. This is
>>> also captured by the fact that the distance between 2 and 3 is 40 and not 10.
>>
>> Do you have more information about this?
>
> Not sure I follow the question there. I was checking shouldn't firmware
> do a single NUMA node if two memory devices are of the same type? How will
> optane present such a config? Both the DIMMs will have the same
> proximity domain value and hence dax kmem will add them to the same NUMA
> node?

Sorry for the confusion.  I just wanted to check whether you have more
information about the machine configuration above.  The machines I have
on hand have no NUMA topology as complex as the one in the patch
description.

> If you are suggesting that firmware doesn't do that, then I agree with you
> that a demotion target like the below is good.
>
> node    demotion_target
>  0              2, 3
>  1              2, 3
>  2              4
>  3              4
>  4              X
>
> We can also achieve that with a simple change as below.

Glad to see the demotion order can be implemented in a simple way.

My concern is whether it is necessary to do this.  If there are real
machines with this NUMA topology, then I think it's good to add the
support.  But if not, why make the code complex unnecessarily?

I don't have this kind of machine.  Do you have one, or expect to have
one?

> @@ -3120,7 +3120,7 @@ static void __set_migration_target_nodes(void)
>  {
>  	nodemask_t next_pass	= NODE_MASK_NONE;
>  	nodemask_t this_pass	= NODE_MASK_NONE;
> -	nodemask_t used_targets = NODE_MASK_NONE;
> +	nodemask_t this_pass_used_targets = NODE_MASK_NONE;
>  	int node, best_distance;
>  
>  	/*
> @@ -3141,17 +3141,20 @@ static void __set_migration_target_nodes(void)
>  	/*
>  	 * To avoid cycles in the migration "graph", ensure
>  	 * that migration sources are not future targets by
> -	 * setting them in 'used_targets'.  Do this only
> +	 * setting them in 'this_pass_used_targets'.  Do this only
>  	 * once per pass so that multiple source nodes can
>  	 * share a target node.
>  	 *
> -	 * 'used_targets' will become unavailable in future
> +	 * 'this_pass_used_targets' will become unavailable in future
>  	 * passes.  This limits some opportunities for
>  	 * multiple source nodes to share a destination.
>  	 */
> -	nodes_or(used_targets, used_targets, this_pass);
> +	nodes_or(this_pass_used_targets, this_pass_used_targets, this_pass);
>  
>  	for_each_node_mask(node, this_pass) {
> +
> +		nodemask_t used_targets = this_pass_used_targets;
> +
>  		best_distance = -1;
>  
>  		/*

Best Regards,
Huang, Ying
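
For anyone without such a machine who wants to check the demotion orders
discussed above, the pass-based target selection can be modelled in user
space.  The following is only a rough sketch under stated assumptions and
not the kernel code: model_demotion_targets(), print_targets() and the
bitmask representation are hypothetical names introduced for illustration.
It also ignores the CPU penalty and load weighting applied by
find_next_best_node(), so the order of targets within a set ("3, 2" versus
"2, 3") is not modelled.  With per_source_copy false, one 'used' mask is
shared by all sources in a pass, as in the existing code; with it true,
each source starts from its own copy, as in the quoted change.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define NR_NODES 5

/* Distance matrix taken from the patch description quoted in this thread. */
static const int node_dist[NR_NODES][NR_NODES] = {
	{ 10, 20, 40, 40, 80 },
	{ 20, 10, 40, 40, 80 },
	{ 40, 40, 10, 40, 80 },
	{ 40, 40, 40, 10, 80 },
	{ 80, 80, 80, 80, 10 },
};

/* Bit n set means node n has CPUs (nodes 0 and 1 in the example). */
static const unsigned int cpu_nodes = 0x3;

/*
 * Hypothetical user-space model, NOT kernel code.  On return, targets[n]
 * holds a bitmask of the demotion targets chosen for node n.
 */
void model_demotion_targets(bool per_source_copy,
			    unsigned int targets[NR_NODES])
{
	unsigned int pass_used = 0;
	unsigned int next_pass = cpu_nodes;	/* start from the CPU nodes */

	assert(targets != NULL);
	memset(targets, 0, NR_NODES * sizeof(targets[0]));

	while (next_pass) {
		unsigned int this_pass = next_pass;
		unsigned int shared_used;

		next_pass = 0;
		/* Sources of this pass must never become targets later. */
		pass_used |= this_pass;
		shared_used = pass_used;

		for (int node = 0; node < NR_NODES; node++) {
			unsigned int used;
			int best = INT_MAX;

			if (!(this_pass & (1u << node)))
				continue;

			used = per_source_copy ? pass_used : shared_used;

			/* Smallest distance to any node still unused. */
			for (int t = 0; t < NR_NODES; t++)
				if (!(used & (1u << t)) &&
				    node_dist[node][t] < best)
					best = node_dist[node][t];

			/* Every unused node at that distance is a target. */
			for (int t = 0; t < NR_NODES; t++) {
				if ((used & (1u << t)) ||
				    node_dist[node][t] != best)
					continue;
				targets[node] |= 1u << t;
				shared_used |= 1u << t;
				/* Targets become sources in the next pass. */
				next_pass |= 1u << t;
			}
		}
	}
}

/* Print the result in the same layout as the tables in this thread. */
void print_targets(const unsigned int targets[NR_NODES])
{
	printf("node    demotion_target\n");
	for (int node = 0; node < NR_NODES; node++) {
		bool first = true;

		printf("%2d              ", node);
		if (!targets[node])
			printf("X");
		for (int t = 0; t < NR_NODES; t++) {
			if (!(targets[node] & (1u << t)))
				continue;
			printf("%s%d", first ? "" : ", ", t);
			first = false;
		}
		printf("\n");
	}
}
```

Under this model, and for the distance matrix in the patch description,
the shared mask gives node 0 the targets 2 and 3, gives node 1 the target
4, and leaves nodes 2 to 4 without targets.  The per-source copy gives
nodes 0 and 1 the targets 2 and 3, and gives nodes 2 and 3 the target 4.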