From: "Huang, Ying"
To: Aneesh Kumar K V
Cc: linux-mm@kvack.org, akpm@linux-foundation.org, Wei Xu, Yang Shi, Davidlohr Bueso, Tim C Chen, Michal Hocko, Linux Kernel Mailing List, Hesham Almatary, Dave Hansen, Jonathan Cameron, Alistair Popple, Dan Williams, Johannes Weiner, jvgediya.oss@gmail.com
Subject: Re: [PATCH v11 4/8] mm/demotion/dax/kmem: Set node's abstract distance to MEMTIER_ADISTANCE_PMEM
References: <20220728190436.858458-1-aneesh.kumar@linux.ibm.com> <20220728190436.858458-5-aneesh.kumar@linux.ibm.com> <875yjgmocg.fsf@yhuang6-desk2.ccr.corp.intel.com> <87bkt8s7w9.fsf@linux.ibm.com> <87k07slnt7.fsf@yhuang6-desk2.ccr.corp.intel.com>
Date: Mon, 01 Aug 2022 13:10:42 +0800
In-Reply-To: (Aneesh Kumar K. V.'s message of "Mon, 1 Aug 2022 10:10:39 +0530")
Message-ID: <87tu6wk0q5.fsf@yhuang6-desk2.ccr.corp.intel.com>

Aneesh Kumar K V writes:

> On 8/1/22 7:36 AM, Huang, Ying wrote:
>> "Aneesh Kumar K.V" writes:
>>
>>> "Huang, Ying" writes:
>>>
>>>> "Aneesh Kumar K.V" writes:
>>>>
>>>>> By default, all nodes are assigned to the default memory tier, which
>>>>> is the memory tier designated for nodes with DRAM.
>>>>>
>>>>> Set the dax kmem device node's tier to a slower memory tier by assigning
>>>>> its abstract distance to MEMTIER_ADISTANCE_PMEM. The PMEM tier
>>>>> appears below the default memory tier in demotion order.
>>>>>
>>>>> Signed-off-by: Aneesh Kumar K.V
>>>>> ---
>>>>>  drivers/dax/kmem.c           |  9 +++++++++
>>>>>  include/linux/memory-tiers.h | 19 ++++++++++++++++++-
>>>>>  mm/memory-tiers.c            | 28 ++++++++++++++++------------
>>>>>  3 files changed, 43 insertions(+), 13 deletions(-)
>>>>>
>>>>> diff --git a/drivers/dax/kmem.c b/drivers/dax/kmem.c
>>>>> index a37622060fff..6b0d5de9a3e9 100644
>>>>> --- a/drivers/dax/kmem.c
>>>>> +++ b/drivers/dax/kmem.c
>>>>> @@ -11,6 +11,7 @@
>>>>>  #include
>>>>>  #include
>>>>>  #include
>>>>> +#include
>>>>>  #include "dax-private.h"
>>>>>  #include "bus.h"
>>>>>
>>>>> @@ -41,6 +42,12 @@ struct dax_kmem_data {
>>>>>  	struct resource *res[];
>>>>>  };
>>>>>
>>>>> +static struct memory_dev_type default_pmem_type = {
>>>>
>>>> Why is this named default_pmem_type? We will not usually change the
>>>> memory type of a node.
>>>>
>>>
>>> Any other suggestion? pmem_dev_type?
>>
>> Or dax_pmem_type?
>>
>> DAX is used to enumerate the memory device.
>>
>>>
>>>>> +	.adistance = MEMTIER_ADISTANCE_PMEM,
>>>>> +	.tier_sibiling = LIST_HEAD_INIT(default_pmem_type.tier_sibiling),
>>>>> +	.nodes = NODE_MASK_NONE,
>>>>> +};
>>>>> +
>>>>>  static int dev_dax_kmem_probe(struct dev_dax *dev_dax)
>>>>>  {
>>>>>  	struct device *dev = &dev_dax->dev;
>>>>> @@ -62,6 +69,8 @@ static int dev_dax_kmem_probe(struct dev_dax *dev_dax)
>>>>>  		return -EINVAL;
>>>>>  	}
>>>>>
>>>>> +	init_node_memory_type(numa_node, &default_pmem_type);
>>>>> +
>>>>
>>>> The memory hot-add below may fail, so error handling needs to be
>>>> added.
>>>>
>>>> And it appears that the memory type and memory tier of a node may be
>>>> fully initialized here, before NUMA hot-add starts. So I suggest
>>>> setting only node_memory_types[] here, and setting
>>>> memory_dev_type->nodes in the node hot-add callback. I think that is
>>>> the proper place to complete the initialization.
>>>>
>>>> Also, in theory dax/kmem.c can be unloaded, so we need to clear
>>>> node_memory_types[] for these nodes somewhere.
>>>>
>>>
>>> I guess that by module exit we can be sure that all the memory managed
>>> by dax/kmem has been hot-unplugged. How about something like the below?
>>
>> Because we set node_memory_types[] in dev_dax_kmem_probe(), it's
>> natural to clear it in dev_dax_kmem_remove().
>>
>
> Most of the required reset/clear is done as part of memory hot-unplug. So
> if we did manage to successfully unplug the memory, everything except
> node_memory_types[node] should be reset. That makes clear_node_memory_type()
> look like the below.
>
> void clear_node_memory_type(int node, struct memory_dev_type *memtype)
> {
> 	mutex_lock(&memory_tier_lock);
> 	/*
> 	 * memory unplug did clear the node from the memtype and
> 	 * dax/kmem did initialize this node's memory type.
> 	 */
> 	if (!node_isset(node, memtype->nodes) && node_memory_types[node] == memtype) {
> 		node_memory_types[node] = NULL;
> 	}
> 	mutex_unlock(&memory_tier_lock);
> }
>
> Module unload effectively force-removes all use of the specific memtype.
> Considering that module unload will remove the usage of this memtype from other
> parts of the kernel, and that we already do all the required reset in memory
> hot-unplug, do we need to do the clear_node_memory_type() above at all?

As I understand it, we need to call clear_node_memory_type() in
dev_dax_kmem_remove(). After that, there is nothing left to do in
dax_kmem_exit().

Best Regards,
Huang, Ying