From: "Huang, Ying"
To: Aneesh Kumar K V
Cc: linux-mm@kvack.org, akpm@linux-foundation.org, Wei Xu, Yang Shi, Davidlohr Bueso, Tim C Chen, Michal Hocko, Linux Kernel Mailing List, Hesham Almatary, Dave Hansen, Jonathan Cameron, Alistair Popple, Dan Williams, Johannes Weiner, jvgediya.oss@gmail.com
Subject: Re: [PATCH v11 8/8] mm/demotion: Update node_is_toptier to work with memory tiers
References: <20220728190436.858458-1-aneesh.kumar@linux.ibm.com> <20220728190436.858458-9-aneesh.kumar@linux.ibm.com> <87sfmkl8x0.fsf@yhuang6-desk2.ccr.corp.intel.com> <9fa09da8-eff7-e39a-96b0-2bc51711f08f@linux.ibm.com>
Date: Mon, 01 Aug 2022 09:04:08 +0800
In-Reply-To: (Aneesh Kumar K. V.'s message of "Fri, 29 Jul 2022 12:17:45 +0530")
Message-ID: <87o7x4lqpj.fsf@yhuang6-desk2.ccr.corp.intel.com>

Aneesh Kumar K V writes:

> On 7/29/22 12:11 PM, Aneesh Kumar K V wrote:
>> On 7/29/22 12:09 PM, Huang, Ying wrote:
>>> "Aneesh Kumar K.V" writes:
>>>
>>>> With memory tiers support we can have memory only NUMA nodes
>>>> in the top tier from which we want to avoid promotion tracking NUMA
>>>> faults. Update node_is_toptier to work with memory tiers.
>>>> All NUMA nodes are by default top tier nodes. With lower memory
>>>> tiers added we consider all memory tiers above a memory tier having
>>>> CPU NUMA nodes as a top memory tier
>>>>
>>>> Signed-off-by: Aneesh Kumar K.V
>>>> ---
>>>>  include/linux/memory-tiers.h | 11 ++++++++++
>>>>  include/linux/node.h         |  5 -----
>>>>  mm/huge_memory.c             |  1 +
>>>>  mm/memory-tiers.c            | 42 ++++++++++++++++++++++++++++++++++++
>>>>  mm/migrate.c                 |  1 +
>>>>  mm/mprotect.c                |  1 +
>>>>  6 files changed, 56 insertions(+), 5 deletions(-)
>>>>
>>>> diff --git a/include/linux/memory-tiers.h b/include/linux/memory-tiers.h
>>>> index f8dbeda617a7..bc9fb9d39b2c 100644
>>>> --- a/include/linux/memory-tiers.h
>>>> +++ b/include/linux/memory-tiers.h
>>>> @@ -35,6 +35,7 @@ struct memory_dev_type *init_node_memory_type(int node, struct memory_dev_type *
>>>>  #ifdef CONFIG_MIGRATION
>>>>  int next_demotion_node(int node);
>>>>  void node_get_allowed_targets(pg_data_t *pgdat, nodemask_t *targets);
>>>> +bool node_is_toptier(int node);
>>>>  #else
>>>>  static inline int next_demotion_node(int node)
>>>>  {
>>>> @@ -45,6 +46,11 @@ static inline void node_get_allowed_targets(pg_data_t *pgdat, nodemask_t *target
>>>>  {
>>>>  	*targets = NODE_MASK_NONE;
>>>>  }
>>>> +
>>>> +static inline bool node_is_toptier(int node)
>>>> +{
>>>> +	return true;
>>>> +}
>>>>  #endif
>>>>
>>>>  #else
>>>> @@ -64,5 +70,10 @@ static inline void node_get_allowed_targets(pg_data_t *pgdat, nodemask_t *target
>>>>  {
>>>>  	*targets = NODE_MASK_NONE;
>>>>  }
>>>> +
>>>> +static inline bool node_is_toptier(int node)
>>>> +{
>>>> +	return true;
>>>> +}
>>>>  #endif /* CONFIG_NUMA */
>>>>  #endif /* _LINUX_MEMORY_TIERS_H */
>>>> diff --git a/include/linux/node.h b/include/linux/node.h
>>>> index 40d641a8bfb0..9ec680dd607f 100644
>>>> --- a/include/linux/node.h
>>>> +++ b/include/linux/node.h
>>>> @@ -185,9 +185,4 @@ static inline void register_hugetlbfs_with_node(node_registration_func_t reg,
>>>>
>>>>  #define to_node(device) container_of(device, struct node, dev)
>>>>
>>>> -static inline bool node_is_toptier(int node)
>>>> -{
>>>> -	return node_state(node, N_CPU);
>>>> -}
>>>> -
>>>>  #endif /* _LINUX_NODE_H_ */
>>>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>>>> index 834f288b3769..8405662646e9 100644
>>>> --- a/mm/huge_memory.c
>>>> +++ b/mm/huge_memory.c
>>>> @@ -35,6 +35,7 @@
>>>>  #include
>>>>  #include
>>>>  #include
>>>> +#include
>>>>
>>>>  #include
>>>>  #include
>>>> diff --git a/mm/memory-tiers.c b/mm/memory-tiers.c
>>>> index 84e2be31a853..36d87dc422ab 100644
>>>> --- a/mm/memory-tiers.c
>>>> +++ b/mm/memory-tiers.c
>>>> @@ -30,6 +30,7 @@ static DEFINE_MUTEX(memory_tier_lock);
>>>>  static LIST_HEAD(memory_tiers);
>>>>  struct memory_dev_type *node_memory_types[MAX_NUMNODES];
>>>>  #ifdef CONFIG_MIGRATION
>>>> +static int top_tier_adistance;
>>>>  /*
>>>>   * node_demotion[] examples:
>>>>   *
>>>> @@ -159,6 +160,31 @@ static struct memory_tier *__node_get_memory_tier(int node)
>>>>  }
>>>>
>>>>  #ifdef CONFIG_MIGRATION
>>>> +bool node_is_toptier(int node)
>>>> +{
>>>> +	bool toptier;
>>>> +	pg_data_t *pgdat;
>>>> +	struct memory_tier *memtier;
>>>> +
>>>> +	pgdat = NODE_DATA(node);
>>>> +	if (!pgdat)
>>>> +		return false;
>>>> +
>>>> +	rcu_read_lock();
>>>> +	memtier = rcu_dereference(pgdat->memtier);
>>>> +	if (!memtier) {
>>>> +		toptier = true;
>>>> +		goto out;
>>>> +	}
>>>> +	if (memtier->adistance_start >= top_tier_adistance)
>>>> +		toptier = true;
>>>> +	else
>>>> +		toptier = false;
>>>> +out:
>>>> +	rcu_read_unlock();
>>>> +	return toptier;
>>>> +}
>>>> +
>>>>  void node_get_allowed_targets(pg_data_t *pgdat, nodemask_t *targets)
>>>>  {
>>>>  	struct memory_tier *memtier;
>>>> @@ -315,6 +341,22 @@ static void establish_demotion_targets(void)
>>>>  			}
>>>>  		} while (1);
>>>>  	}
>>>> +	/*
>>>> +	 * Promotion is allowed from a memory tier to higher
>>>> +	 * memory tier only if the memory tier doesn't include
>>>> +	 * compute. We want to skip promotion from a memory tier,
>>>> +	 * if any node that is part of the memory tier have CPUs.
>>>> +	 * Once we detect such a memory tier, we consider that tier
>>>> +	 * as top tiper from which promotion on is not allowed.
>>>> +	 */
>>>> +	list_for_each_entry(memtier, &memory_tiers, list) {
>>>> +		tier_nodes = get_memtier_nodemask(memtier);
>>>> +		nodes_and(tier_nodes, node_states[N_CPU], tier_nodes);
>>>> +		if (!nodes_empty(tier_nodes)) {
>>>> +			top_tier_adistance = memtier->adistance_start;
>>>
>>> IMHO, this should be,
>>>
>>> 	top_tier_adistance = memtier->adistance_start + MEMTIER_CHUNK_SIZE;
>>>
>>
>> Good catch. Will update. BTW i did send v12 version of the patchset already to the list.
>>
>>
>
> Checking this again, we consider a node top tier if the node's memtier abstract distance
> satisfy the below.
>
> 	if (memtier->adistance_start <= top_tier_adistance)
> 		toptier = true;

I admit that this works correctly.  And I think that the following code
is even more correct conceptually.  If so, why not help the code reader
to understand it more easily?

	if (memtier->adistance_start + MEMTIER_CHUNK_SIZE <= top_tier_adistance)
		toptier = true;

Best Regards,
Huang, Ying

> With that we should be good with the current code. But I agree with you that top_tier_distance
> should cover the full range of the top memory tier.
>
> -aneesh
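
For illustration only, the two checks discussed above can be compared in a small userspace sketch. It is not kernel code: the helper names toptier_by_start(), toptier_by_range() and count_toptier() are hypothetical, the value of MEMTIER_CHUNK_SIZE is a placeholder, and pgdat lookup and RCU are left out. It assumes each tier covers the range [adistance_start, adistance_start + MEMTIER_CHUNK_SIZE).

```c
#include <assert.h>
#include <stdbool.h>

/* Placeholder value for illustration; the real constant comes from the series. */
#define MEMTIER_CHUNK_SIZE 128

/* "Current code" variant: the threshold is the start of the CPU tier. */
static bool toptier_by_start(int adistance_start, int cpu_tier_start)
{
	int top_tier_adistance = cpu_tier_start;

	return adistance_start <= top_tier_adistance;
}

/* Suggested variant: the threshold is the end of the CPU tier's range. */
static bool toptier_by_range(int adistance_start, int cpu_tier_start)
{
	int top_tier_adistance = cpu_tier_start + MEMTIER_CHUNK_SIZE;

	return adistance_start + MEMTIER_CHUNK_SIZE <= top_tier_adistance;
}

/*
 * Walk ntiers hypothetical tiers with chunk-aligned starts, assert that
 * both variants agree on each one, and return how many count as top tier.
 */
static int count_toptier(int ntiers, int cpu_tier_start)
{
	int count = 0;

	for (int i = 0; i < ntiers; i++) {
		int start = i * MEMTIER_CHUNK_SIZE;
		bool by_start = toptier_by_start(start, cpu_tier_start);
		bool by_range = toptier_by_range(start, cpu_tier_start);

		assert(by_start == by_range);
		if (by_start)
			count++;
	}
	return count;
}
```

Both variants give the same answer, since adding MEMTIER_CHUNK_SIZE to both sides of the comparison does not change it. The difference is only in what top_tier_adistance denotes: the start of the CPU tier in the first variant, the end of its range in the second. For example, calling count_toptier(8, 2 * MEMTIER_CHUNK_SIZE) from a test program counts the tiers starting at 0, 128 and 256.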