From: "Huang, Ying"
To: "Aneesh Kumar K.V"
Cc: linux-mm@kvack.org, akpm@linux-foundation.org, Wei Xu, Yang Shi, Davidlohr Bueso, Tim C Chen, Michal Hocko, Linux Kernel Mailing List, Hesham Almatary, Dave Hansen, Jonathan Cameron, Alistair Popple, Dan Williams, Johannes Weiner, jvgediya.oss@gmail.com
Subject: Re: [PATCH v11 4/8] mm/demotion/dax/kmem: Set node's abstract distance to MEMTIER_ADISTANCE_PMEM
References: <20220728190436.858458-1-aneesh.kumar@linux.ibm.com> <20220728190436.858458-5-aneesh.kumar@linux.ibm.com> <875yjgmocg.fsf@yhuang6-desk2.ccr.corp.intel.com> <87bkt8s7w9.fsf@linux.ibm.com>
Date: Mon, 01 Aug 2022 10:06:44 +0800
In-Reply-To: <87bkt8s7w9.fsf@linux.ibm.com> (Aneesh Kumar K. V.'s message of "Fri, 29 Jul 2022 12:49:34 +0530")
Message-ID: <87k07slnt7.fsf@yhuang6-desk2.ccr.corp.intel.com>

"Aneesh Kumar K.V" writes:

> "Huang, Ying" writes:
>
>> "Aneesh Kumar K.V" writes:
>>
>>> By default, all nodes are assigned to the default memory tier which
>>> is the memory tier designated for nodes with DRAM
>>>
>>> Set dax kmem device node's tier to slower memory tier by assigning
>>> abstract distance to MEMTIER_ADISTANCE_PMEM. PMEM tier
>>> appears below the default memory tier in demotion order.
>>>
>>> Signed-off-by: Aneesh Kumar K.V
>>> ---
>>>  drivers/dax/kmem.c           |  9 +++++++++
>>>  include/linux/memory-tiers.h | 19 ++++++++++++++++++-
>>>  mm/memory-tiers.c            | 28 ++++++++++++++++------------
>>>  3 files changed, 43 insertions(+), 13 deletions(-)
>>>
>>> diff --git a/drivers/dax/kmem.c b/drivers/dax/kmem.c
>>> index a37622060fff..6b0d5de9a3e9 100644
>>> --- a/drivers/dax/kmem.c
>>> +++ b/drivers/dax/kmem.c
>>> @@ -11,6 +11,7 @@
>>>  #include
>>>  #include
>>>  #include
>>> +#include
>>>  #include "dax-private.h"
>>>  #include "bus.h"
>>>
>>> @@ -41,6 +42,12 @@ struct dax_kmem_data {
>>>  	struct resource *res[];
>>>  };
>>>
>>> +static struct memory_dev_type default_pmem_type = {
>>
>> Why is this named as default_pmem_type?  We will not change the memory
>> type of a node usually.
>>
>
> Any other suggestion? pmem_dev_type?

Or dax_pmem_type?  DAX is used to enumerate the memory device.

>
>>> +	.adistance = MEMTIER_ADISTANCE_PMEM,
>>> +	.tier_sibiling = LIST_HEAD_INIT(default_pmem_type.tier_sibiling),
>>> +	.nodes = NODE_MASK_NONE,
>>> +};
>>> +
>>>  static int dev_dax_kmem_probe(struct dev_dax *dev_dax)
>>>  {
>>>  	struct device *dev = &dev_dax->dev;
>>> @@ -62,6 +69,8 @@ static int dev_dax_kmem_probe(struct dev_dax *dev_dax)
>>>  		return -EINVAL;
>>>  	}
>>>
>>> +	init_node_memory_type(numa_node, &default_pmem_type);
>>> +
>>
>> The memory hot-add below may fail.  So the error handling needs to be
>> added.
>>
>> And, it appears that the memory type and memory tier of a node may be
>> fully initialized here before NUMA hot-adding started.  So I suggest to
So I suggest to >> set node_memory_types[] here only. And set memory_dev_type->nodes in >> node hot-add callback. I think there is the proper place to complete >> the initialization. >> >> And, in theory dax/kmem.c can be unloaded. So we need to clear >> node_memory_types[] for nodes somewhere. >> > > I guess by module exit we can be sure that all the memory managed > by dax/kmem is hotplugged out. How about something like below? Because we set node_memorty_types[] in dev_dax_kmem_probe(), it's natural to clear it in dev_dax_kmem_remove(). Best Regards, Huang, Ying > diff --git a/drivers/dax/kmem.c b/drivers/dax/kmem.c > index 6b0d5de9a3e9..eb4e158012a9 100644 > --- a/drivers/dax/kmem.c > +++ b/drivers/dax/kmem.c > @@ -248,6 +248,7 @@ static void __exit dax_kmem_exit(void) > dax_driver_unregister(&device_dax_kmem_driver); > if (!any_hotremove_failed) > kfree_const(kmem_name); > + unregister_memory_type(&default_pmem_type); > } > > MODULE_AUTHOR("Intel Corporation"); > diff --git a/include/linux/memory-tiers.h b/include/linux/memory-tiers.h > index fc6b7a14da51..8355baf5b8b4 100644 > --- a/include/linux/memory-tiers.h > +++ b/include/linux/memory-tiers.h > @@ -31,6 +31,7 @@ struct memory_dev_type { > #ifdef CONFIG_NUMA > extern bool numa_demotion_enabled; > void init_node_memory_type(int node, struct memory_dev_type *default_type); > +void unregister_memory_type(struct memory_dev_type *memtype); > #ifdef CONFIG_MIGRATION > int next_demotion_node(int node); > void node_get_allowed_targets(pg_data_t *pgdat, nodemask_t *targets); > @@ -57,6 +58,10 @@ static inline bool node_is_toptier(int node) > #define numa_demotion_enabled false > static inline void init_node_memory_type(int node, struct memory_dev_type *default_type) > { > +} > + > +static inline void unregister_memory_type(struct memory_dev_type *memtype) > +{ > > } > > diff --git a/mm/memory-tiers.c b/mm/memory-tiers.c > index 064e0f932795..4d29ebd4c4f3 100644 > --- a/mm/memory-tiers.c > +++ b/mm/memory-tiers.c > 
@@ -500,6 +500,28 @@ void init_node_memory_type(int node, struct memory_dev_type *default_type) > mutex_unlock(&memory_tier_lock); > } > > +void unregister_memory_type(struct memory_dev_type *memtype) > +{ > + int node; > + struct memory_tier *memtier = memtype->memtier; > + > + mutex_lock(&memory_tier_lock); > + for(node = 0; node < MAX_NUMNODES; node++) { > + if (node_memory_types[node] == memtype) { > + if (!nodes_empty(memtype->nodes)) > + WARN_ON(1); > + node_memory_types[node] = NULL; > + } > + } > + > + list_del(&memtype->tier_sibiling); > + memtype->memtier = NULL; > + if (list_empty(&memtier->memory_types)) > + destroy_memory_tier(memtier); > + > + mutex_unlock(&memory_tier_lock); > +} > + > void update_node_adistance(int node, struct memory_dev_type *memtype) > { > pg_data_t *pgdat;
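
To make the probe/remove pairing suggested above concrete, here is a
minimal userspace sketch, not kernel code.  It only models the
node_memory_types[] table: probe records the type, clears it again if
the (simulated) hot-add fails, and remove clears what probe set.  Every
sketch_* name, the table size and the adistance value are assumptions
for illustration and are not taken from the patch.

```c
/*
 * Userspace sketch only, not kernel code.  node_memory_types[] mirrors the
 * table discussed in this thread; every sketch_* name, the table size and
 * the adistance value are assumptions made for illustration.
 */
#include <assert.h>
#include <stddef.h>

#define SKETCH_MAX_NODES 8	/* hypothetical stand-in for MAX_NUMNODES */

struct memory_dev_type {
	int adistance;		/* other fields omitted in this sketch */
};

/* Placeholder value; the real MEMTIER_ADISTANCE_PMEM is not reproduced here. */
static struct memory_dev_type sketch_pmem_type = { .adistance = 1 };

/* Per-node memory type, NULL while no driver has claimed the node. */
static struct memory_dev_type *node_memory_types[SKETCH_MAX_NODES];

/* Record the type for a node; only the table entry is touched. */
void sketch_set_node_memory_type(int node, struct memory_dev_type *type)
{
	assert(node >= 0 && node < SKETCH_MAX_NODES);
	if (!node_memory_types[node])
		node_memory_types[node] = type;
}

/* Undo the above, but only if the entry still belongs to this type. */
void sketch_clear_node_memory_type(int node, struct memory_dev_type *type)
{
	assert(node >= 0 && node < SKETCH_MAX_NODES);
	if (node_memory_types[node] == type)
		node_memory_types[node] = NULL;
}

/* Stand-in for probe: hot_add_fails simulates a failing memory hot-add. */
int sketch_probe(int node, int hot_add_fails)
{
	sketch_set_node_memory_type(node, &sketch_pmem_type);
	if (hot_add_fails) {
		/* Error path: do not leave a stale entry behind. */
		sketch_clear_node_memory_type(node, &sketch_pmem_type);
		return -1;
	}
	return 0;
}

/* Stand-in for remove: mirrors what probe set up. */
void sketch_remove(int node)
{
	sketch_clear_node_memory_type(node, &sketch_pmem_type);
}
```

The sketch ignores locking (memory_tier_lock in the patch) and the case
where several dax devices share one node, both of which real code would
have to handle.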