From: "Huang, Ying"
To: "Aneesh Kumar K.V"
Cc: linux-mm@kvack.org, akpm@linux-foundation.org, Wei Xu, Yang Shi, Davidlohr Bueso, Tim C Chen, Michal Hocko, Linux Kernel Mailing List, Hesham Almatary, Dave Hansen, Jonathan Cameron, Alistair Popple, Dan Williams, Johannes Weiner, jvgediya.oss@gmail.com, Bharata B Rao
Subject: Re: [PATCH mm-unstable] mm/demotion: Assign correct memory type for multiple dax devices with the same node affinity
References: <20220826100224.542312-1-aneesh.kumar@linux.ibm.com>
Date: Thu, 01 Sep 2022 14:15:31 +0800
In-Reply-To: <20220826100224.542312-1-aneesh.kumar@linux.ibm.com> (Aneesh Kumar K. V.'s message of "Fri, 26 Aug 2022 15:32:24 +0530")
Message-ID: <87a67j1uyk.fsf@yhuang6-desk2.ccr.corp.intel.com>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/27.1 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain; charset=ascii

"Aneesh Kumar K.V" writes:

> With multiple dax devices having the same node affinity, the kernel wrongly assigned
> default_dram memory type to some devices after the memory hotplug operation. Fix this by
> not clearing node_memory_types on the dax device remove.

Sorry for the late reply.

Just to confirm: are there multiple dax devices in one NUMA node? If you
can show the steps to reproduce the bug, that will make it even easier to
understand.

Best Regards,
Huang, Ying

> The current kernel cleared node_memory_type on successful removal of a dax device.
> But then we can have multiple dax devices with the same node affinity. Clearing the
> node_memory_type results in assigning other dax devices to the default dram type when
> we bring them online.
>
> Signed-off-by: Aneesh Kumar K.V
> ---
>  mm/memory-tiers.c | 37 +++++++++++++++++++++++++++++--------
>  1 file changed, 29 insertions(+), 8 deletions(-)
>
> diff --git a/mm/memory-tiers.c b/mm/memory-tiers.c
> index ba844fe9cc8c..c4bd6d052a33 100644
> --- a/mm/memory-tiers.c
> +++ b/mm/memory-tiers.c
> @@ -27,9 +27,14 @@ struct demotion_nodes {
>  	nodemask_t preferred;
>  };
>
> +struct node_memory_type_map {
> +	struct memory_dev_type *memtype;
> +	int map_count;
> +};
> +
>  static DEFINE_MUTEX(memory_tier_lock);
>  static LIST_HEAD(memory_tiers);
> -static struct memory_dev_type *node_memory_types[MAX_NUMNODES];
> +static struct node_memory_type_map node_memory_types[MAX_NUMNODES];
>  static struct memory_dev_type *default_dram_type;
>  #ifdef CONFIG_MIGRATION
>  static int top_tier_adistance;
> @@ -386,9 +391,19 @@ static inline void establish_demotion_targets(void) {}
>
>  static inline void __init_node_memory_type(int node, struct memory_dev_type *memtype)
>  {
> -	if (!node_memory_types[node]) {
> -		node_memory_types[node] = memtype;
> -		kref_get(&memtype->kref);
> +	if (!node_memory_types[node].memtype)
> +		node_memory_types[node].memtype = memtype;
> +	/*
> +	 * for each device getting added in the same NUMA node
> +	 * with this specific memtype, bump the map count. We
> +	 * only take memtype device reference once, so that
> +	 * changing a node memtype can be done by dropping the
> +	 * only reference count taken here.
> +	 */
> +
> +	if (node_memory_types[node].memtype == memtype) {
> +		if (!node_memory_types[node].map_count++)
> +			kref_get(&memtype->kref);
>  	}
>  }
>
> @@ -406,7 +421,7 @@ static struct memory_tier *set_node_memory_tier(int node)
>
>  	__init_node_memory_type(node, default_dram_type);
>
> -	memtype = node_memory_types[node];
> +	memtype = node_memory_types[node].memtype;
>  	node_set(node, memtype->nodes);
>  	memtier = find_create_memory_tier(memtype);
>  	if (!IS_ERR(memtier))
> @@ -448,7 +463,7 @@ static bool clear_node_memory_tier(int node)
>
>  	rcu_assign_pointer(pgdat->memtier, NULL);
>  	synchronize_rcu();
> -	memtype = node_memory_types[node];
> +	memtype = node_memory_types[node].memtype;
>  	node_clear(node, memtype->nodes);
>  	if (nodes_empty(memtype->nodes)) {
>  		list_del_init(&memtype->tier_sibiling);
> @@ -502,8 +517,14 @@ EXPORT_SYMBOL_GPL(init_node_memory_type);
>  void clear_node_memory_type(int node, struct memory_dev_type *memtype)
>  {
>  	mutex_lock(&memory_tier_lock);
> -	if (node_memory_types[node] == memtype) {
> -		node_memory_types[node] = NULL;
> +	if (node_memory_types[node].memtype == memtype)
> +		node_memory_types[node].map_count--;
> +	/*
> +	 * If we unmapped all the attached devices to this node,
> +	 * clear the node memory type.
> +	 */
> +	if (!node_memory_types[node].map_count) {
> +		node_memory_types[node].memtype = NULL;
>  		kref_put(&memtype->kref, release_memtype);
>  	}
>  	mutex_unlock(&memory_tier_lock);
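
For readers following the thread, here is a rough userspace sketch of the per-node counting scheme the quoted patch describes. It is not the kernel code. All model_* names are hypothetical, a plain integer stands in for the kref, and there is no locking. It also deviates from the patch in one way, as an assumption: a clear call whose memtype does not match the node's recorded type returns early, instead of falling through to the map_count check. The scenario replays two devices with affinity to the same node and checks that the node keeps its memory type after the first removal and only loses it after the second.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_MAX_NODES 4	/* hypothetical stand-in for MAX_NUMNODES */

/* Hypothetical stand-in for struct memory_dev_type: a plain counter replaces the kref. */
struct model_memtype {
	const char *name;
	int refs;
};

/* Same shape as struct node_memory_type_map in the patch. */
struct model_node_map {
	struct model_memtype *memtype;
	int map_count;
};

static struct model_node_map model_nodes[MODEL_MAX_NODES];

/* One device with affinity to @node is added with type @mt. */
static void model_init_node_type(int node, struct model_memtype *mt)
{
	struct model_node_map *map = &model_nodes[node];

	if (!map->memtype)
		map->memtype = mt;
	/* Only the first device mapped to the node takes the type reference. */
	if (map->memtype == mt && !map->map_count++)
		mt->refs++;
}

/* One device with affinity to @node is removed. */
static void model_clear_node_type(int node, struct model_memtype *mt)
{
	struct model_node_map *map = &model_nodes[node];

	/* Assumption, not in the patch: ignore callers with a mismatched type. */
	if (!mt || map->memtype != mt)
		return;
	/* Drop the type only when the last device on the node goes away. */
	if (--map->map_count == 0) {
		map->memtype = NULL;
		mt->refs--;
	}
}

/*
 * Replay two devices on node 1. Returns 1 if the node still has its
 * memory type after the first device is removed, 0 otherwise.
 */
static int model_replay_two_devices(void)
{
	struct model_memtype pmem = { "pmem", 0 };
	int kept;

	model_init_node_type(1, &pmem);		/* first device */
	model_init_node_type(1, &pmem);		/* second device, same node */
	assert(pmem.refs == 1 && model_nodes[1].map_count == 2);

	model_clear_node_type(1, &pmem);	/* remove the first device */
	kept = (model_nodes[1].memtype == &pmem);

	model_clear_node_type(1, &pmem);	/* remove the second device */
	assert(model_nodes[1].memtype == NULL && pmem.refs == 0);
	return kept;
}
```

Calling model_replay_two_devices() from a small main() is enough to run the scenario. With the pre-patch behaviour (clearing the type on every removal), the first removal would already have reset the node's type, which is the situation the commit message describes.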