Message-ID: <193ad45f2ec47ac157a812975f3e4235fcbc061a.camel@intel.com>
Subject: Re: [PATCH v6 04/13] mm/demotion/dax/kmem: Set node's memory tier to MEMORY_TIER_PMEM
From: Ying Huang
To: "Aneesh Kumar K.V", linux-mm@kvack.org, akpm@linux-foundation.org
Cc: Wei Xu, Greg Thelen, Yang Shi, Davidlohr Bueso, Tim C Chen, Brice Goglin, Michal Hocko, Linux Kernel Mailing List, Hesham Almatary, Dave Hansen, Jonathan Cameron, Alistair Popple, Dan Williams, Feng Tang, Jagdish Gediya, Baolin Wang, David Rientjes
Date: Mon, 13 Jun 2022 14:59:12 +0800
In-Reply-To: <20220610135229.182859-5-aneesh.kumar@linux.ibm.com>
References: <20220610135229.182859-1-aneesh.kumar@linux.ibm.com> <20220610135229.182859-5-aneesh.kumar@linux.ibm.com>

On Fri, 2022-06-10 at 19:22 +0530, Aneesh Kumar K.V wrote:
> By default, all nodes are assigned to DEFAULT_MEMORY_TIER which
> is the memory tier designated for nodes with DRAM
>
> Set dax kmem device node's tier to MEMORY_TIER_PMEM. MEMORY_TIER_PMEM
> is assigned a default rank value of 100 and appears below DEFAULT_MEMORY_TIER
> in demotion order.
>
> Signed-off-by: Jagdish Gediya
> Signed-off-by: Aneesh Kumar K.V
> ---
>  drivers/dax/kmem.c           |  4 ++
>  include/linux/memory-tiers.h |  1 +
>  mm/memory-tiers.c            | 78 ++++++++++++++++++++++++++++++++++++
>  3 files changed, 83 insertions(+)
>
> diff --git a/drivers/dax/kmem.c b/drivers/dax/kmem.c
> index a37622060fff..0cb3de3d138f 100644
> --- a/drivers/dax/kmem.c
> +++ b/drivers/dax/kmem.c
> @@ -11,6 +11,7 @@
>  #include
>  #include
>  #include
> +#include
>  #include "dax-private.h"
>  #include "bus.h"
>
> @@ -147,6 +148,9 @@ static int dev_dax_kmem_probe(struct dev_dax *dev_dax)
>
>  	dev_set_drvdata(dev, data);
>
> +#ifdef CONFIG_TIERED_MEMORY
> +	node_create_and_set_memory_tier(numa_node, MEMORY_TIER_PMEM);
> +#endif
>  	return 0;
>
>  err_request_mem:
> diff --git a/include/linux/memory-tiers.h b/include/linux/memory-tiers.h
> index 44c3c3b16a36..e102ec73ab80 100644
> --- a/include/linux/memory-tiers.h
> +++ b/include/linux/memory-tiers.h
> @@ -18,6 +18,7 @@
>  #define MAX_MEMORY_TIERS 3
>
>  extern bool numa_demotion_enabled;
> +int node_create_and_set_memory_tier(int node, int tier);
>  #else
>  #define numa_demotion_enabled false
>
> diff --git a/mm/memory-tiers.c b/mm/memory-tiers.c
> index c3123a457d90..00d393a5a628 100644
> --- a/mm/memory-tiers.c
> +++ b/mm/memory-tiers.c
> @@ -67,6 +67,84 @@ static struct memory_tier *register_memory_tier(unsigned int tier,
>  	return memtier;
>  }
>
> +static struct memory_tier *__node_get_memory_tier(int node)
> +{
> +	struct memory_tier *memtier;
> +
> +	list_for_each_entry(memtier, &memory_tiers, list) {
> +		if (node_isset(node, memtier->nodelist))
> +			return memtier;
> +	}
> +	return NULL;
> +}
> +

I suggest adding NODE_DATA(nid)->mem_tier before this patch, that is, as
part of [9/13]. That will make the code much simpler and easier to review.

In addition to dax_kmem, whenever a normal node is onlined, we need to
add it to the default memory tier. I found this is done in [5/13].
IMHO, we should move that part before this patch.

Best Regards,
Huang, Ying

> +static struct memory_tier *__get_memory_tier_from_id(int id)
> +{
> +	struct memory_tier *memtier;
> +
> +	list_for_each_entry(memtier, &memory_tiers, list) {
> +		if (memtier->id == id)
> +			return memtier;
> +	}
> +	return NULL;
> +}
> +
> +static int __node_create_and_set_memory_tier(int node, int tier)
> +{
> +	int ret = 0;
> +	struct memory_tier *memtier;
> +
> +	memtier = __get_memory_tier_from_id(tier);
> +	if (!memtier) {
> +		int rank;
> +
> +		rank = get_rank_from_tier(tier);
> +		if (rank == -1) {
> +			ret = -EINVAL;
> +			goto out;
> +		}
> +		memtier = register_memory_tier(tier, rank);
> +		if (!memtier) {
> +			ret = -EINVAL;
> +			goto out;
> +		}
> +	}
> +	node_set(node, memtier->nodelist);
> +out:
> +	return ret;
> +}
> +
> +int node_create_and_set_memory_tier(int node, int tier)
> +{
> +	struct memory_tier *current_tier;
> +	int ret = 0;
> +
> +	mutex_lock(&memory_tier_lock);
> +
> +	current_tier = __node_get_memory_tier(node);
> +	if (!current_tier) {
> +		ret = __node_create_and_set_memory_tier(node, tier);
> +		goto out;
> +	}
> +
> +	if (current_tier->id == tier)
> +		goto out;
> +
> +	node_clear(node, current_tier->nodelist);
> +
> +	ret = __node_create_and_set_memory_tier(node, tier);
> +	if (ret) {
> +		/* reset it back to older tier */
> +		node_set(node, current_tier->nodelist);
> +		goto out;
> +	}
> +out:
> +	mutex_unlock(&memory_tier_lock);
> +
> +	return ret;
> +}
> +EXPORT_SYMBOL_GPL(node_create_and_set_memory_tier);
> +
>  static int __init memory_tier_init(void)
>  {
>  	struct memory_tier *memtier;
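
To illustrate the NODE_DATA(nid)->mem_tier suggestion, here is a rough user-space sketch, not kernel code. It assumes a per-node back-pointer to the tier, so the lookup done by __node_get_memory_tier() no longer has to walk the memory_tiers list. The names node_data_model, node_data, node_get_memory_tier() and node_set_memory_tier(), and the MAX_NUMNODES value of 8, are hypothetical and chosen only for this example. The sketch leaves out memtier->nodelist and memory_tier_lock, and how [9/13] actually implements the field may differ.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_NUMNODES 8	/* illustrative only; the kernel value is config-dependent */

/* Simplified stand-in for the patch's struct memory_tier (no list, no nodelist). */
struct memory_tier {
	int id;
	int rank;
};

/* Hypothetical stand-in for pg_data_t carrying the suggested back-pointer. */
struct node_data_model {
	struct memory_tier *mem_tier;	/* NULL until the node is assigned a tier */
};

static struct node_data_model node_data[MAX_NUMNODES];

/* O(1) lookup: read the per-node pointer instead of scanning every tier. */
static struct memory_tier *node_get_memory_tier(int nid)
{
	assert(nid >= 0 && nid < MAX_NUMNODES);
	return node_data[nid].mem_tier;
}

/* Moving a node between tiers becomes one pointer store; returns the old tier. */
static struct memory_tier *node_set_memory_tier(int nid, struct memory_tier *tier)
{
	struct memory_tier *old = node_get_memory_tier(nid);

	node_data[nid].mem_tier = tier;
	return old;
}
```

With a field like this, the "reset it back to older tier" path in node_create_and_set_memory_tier() would only need to restore the saved pointer; a real implementation would still have to keep the tier's nodelist in sync and hold memory_tier_lock.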