From: "Huang, Ying" <ying.huang@intel.com>
To: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
Cc: linux-mm@kvack.org, akpm@linux-foundation.org,
Wei Xu <weixugc@google.com>, Yang Shi <shy828301@gmail.com>,
Davidlohr Bueso <dave@stgolabs.net>,
Tim C Chen <tim.c.chen@intel.com>,
Michal Hocko <mhocko@kernel.org>,
Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
Hesham Almatary <hesham.almatary@huawei.com>,
Dave Hansen <dave.hansen@intel.com>,
Jonathan Cameron <Jonathan.Cameron@huawei.com>,
Alistair Popple <apopple@nvidia.com>,
Dan Williams <dan.j.williams@intel.com>,
Johannes Weiner <hannes@cmpxchg.org>,
jvgediya.oss@gmail.com, Bharata B Rao <bharata@amd.com>
Subject: Re: [PATCH mm-unstable] mm/demotion: Assign correct memory type for multiple dax devices with the same node affinity
Date: Thu, 01 Sep 2022 14:15:31 +0800 [thread overview]
Message-ID: <87a67j1uyk.fsf@yhuang6-desk2.ccr.corp.intel.com> (raw)
In-Reply-To: <20220826100224.542312-1-aneesh.kumar@linux.ibm.com> (Aneesh Kumar K. V.'s message of "Fri, 26 Aug 2022 15:32:24 +0530")
"Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> writes:
> With multiple dax devices having the same node affinity, the kernel wrongly assigns
> the default_dram memory type to some devices after a memory hotplug operation. Fix this
> by not clearing node_memory_types on dax device removal.
Sorry for the late reply.
Just to confirm: are there multiple dax devices in one NUMA node?
If you can show the steps to reproduce the bug, that will make it even
easier to understand.
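For example, something like the following, assuming two dax devices whose
target_node resolves to the same NUMA node (device names and the exact
daxctl invocations here are my assumptions, not from the patch):

```shell
# Two dax devices (hypothetical names) with the same node affinity.
daxctl list

# Bring the first device online as system RAM via the kmem driver;
# its node is assigned the dax memory type.
daxctl reconfigure-device --mode=system-ram dax0.0

# Remove it again: before this fix, node_memory_types[] for the node
# is cleared even though dax1.0 still has the same node affinity.
daxctl reconfigure-device --mode=devdax dax0.0

# Bringing the second device online now wrongly lands the node in
# default_dram_type instead of the dax memory type.
daxctl reconfigure-device --mode=system-ram dax1.0
```

The reconfigure steps may additionally need --force or a manual memory
offline depending on the setup; this is only a sketch of the sequence.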
Best Regards,
Huang, Ying
> The current kernel clears node_memory_type on successful removal of a dax device.
> But we can have multiple dax devices with the same node affinity, and clearing the
> node_memory_type results in assigning the other dax devices to the default dram type
> when we bring them online.
>
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
> ---
> mm/memory-tiers.c | 37 +++++++++++++++++++++++++++++--------
> 1 file changed, 29 insertions(+), 8 deletions(-)
>
> diff --git a/mm/memory-tiers.c b/mm/memory-tiers.c
> index ba844fe9cc8c..c4bd6d052a33 100644
> --- a/mm/memory-tiers.c
> +++ b/mm/memory-tiers.c
> @@ -27,9 +27,14 @@ struct demotion_nodes {
> nodemask_t preferred;
> };
>
> +struct node_memory_type_map {
> + struct memory_dev_type *memtype;
> + int map_count;
> +};
> +
> static DEFINE_MUTEX(memory_tier_lock);
> static LIST_HEAD(memory_tiers);
> -static struct memory_dev_type *node_memory_types[MAX_NUMNODES];
> +static struct node_memory_type_map node_memory_types[MAX_NUMNODES];
> static struct memory_dev_type *default_dram_type;
> #ifdef CONFIG_MIGRATION
> static int top_tier_adistance;
> @@ -386,9 +391,19 @@ static inline void establish_demotion_targets(void) {}
>
> static inline void __init_node_memory_type(int node, struct memory_dev_type *memtype)
> {
> - if (!node_memory_types[node]) {
> - node_memory_types[node] = memtype;
> - kref_get(&memtype->kref);
> + if (!node_memory_types[node].memtype)
> + node_memory_types[node].memtype = memtype;
> + /*
> + * For each device getting added in the same NUMA node
> + * with this specific memtype, bump the map count. We
> + * only take the memtype device reference once, so that
> + * changing a node memtype can be done by dropping the
> + * only reference count taken here.
> + */
> +
> + if (node_memory_types[node].memtype == memtype) {
> + if (!node_memory_types[node].map_count++)
> + kref_get(&memtype->kref);
> }
> }
>
> @@ -406,7 +421,7 @@ static struct memory_tier *set_node_memory_tier(int node)
>
> __init_node_memory_type(node, default_dram_type);
>
> - memtype = node_memory_types[node];
> + memtype = node_memory_types[node].memtype;
> node_set(node, memtype->nodes);
> memtier = find_create_memory_tier(memtype);
> if (!IS_ERR(memtier))
> @@ -448,7 +463,7 @@ static bool clear_node_memory_tier(int node)
>
> rcu_assign_pointer(pgdat->memtier, NULL);
> synchronize_rcu();
> - memtype = node_memory_types[node];
> + memtype = node_memory_types[node].memtype;
> node_clear(node, memtype->nodes);
> if (nodes_empty(memtype->nodes)) {
> list_del_init(&memtype->tier_sibiling);
> @@ -502,8 +517,14 @@ EXPORT_SYMBOL_GPL(init_node_memory_type);
> void clear_node_memory_type(int node, struct memory_dev_type *memtype)
> {
> mutex_lock(&memory_tier_lock);
> - if (node_memory_types[node] == memtype) {
> - node_memory_types[node] = NULL;
> + if (node_memory_types[node].memtype == memtype)
> + node_memory_types[node].map_count--;
> + /*
> + * If we have unmapped all the devices attached to this node,
> + * clear the node memory type.
> + */
> + if (!node_memory_types[node].map_count) {
> + node_memory_types[node].memtype = NULL;
> kref_put(&memtype->kref, release_memtype);
> }
> mutex_unlock(&memory_tier_lock);
Thread overview:
2022-08-26 10:02 Aneesh Kumar K.V
2022-08-27 3:00 ` Andrew Morton
2022-09-01 6:15 ` Huang, Ying [this message]
2022-09-01 6:24 ` Aneesh Kumar K V
2022-09-01 6:45 ` Huang, Ying