Re: [PATCH v3 3/3] mm/mempolicy: Support memory hotplug in weighted interleave

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

From: Rakie Kim <rakie.kim@sk.com>
To: Rakie Kim <rakie.kim@sk.com>
Cc: akpm@linux-foundation.org, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org, linux-cxl@vger.kernel.org,
	joshua.hahnjy@gmail.com, dan.j.williams@intel.com,
	ying.huang@linux.alibaba.com, david@redhat.com,
	Jonathan.Cameron@huawei.com, kernel_team@skhynix.com,
	honggyu.kim@sk.com, yunjeong.mun@sk.com,
	Gregory Price <gourry@gourry.net>
Subject: Re: [PATCH v3 3/3] mm/mempolicy: Support memory hotplug in weighted interleave
Date: Mon, 24 Mar 2025 17:54:27 +0900	[thread overview]
Message-ID: <20250324085433.998-1-rakie.kim@sk.com> (raw)
In-Reply-To: <20250324084920.987-1-rakie.kim@sk.com>

On Mon, 24 Mar 2025 17:48:39 +0900 Rakie Kim <rakie.kim@sk.com> wrote:
> On Fri, 21 Mar 2025 10:24:46 -0400 Gregory Price <gourry@gourry.net> wrote:
> > On Thu, Mar 20, 2025 at 01:17:48PM +0900, Rakie Kim wrote:
> > ... snip ...
> > > +	mutex_lock(&sgrp->kobj_lock);
> > > +	if (sgrp->nattrs[nid]) {
> > > +		mutex_unlock(&sgrp->kobj_lock);
> > > +		pr_info("Node [%d] already exists\n", nid);
> > > +		kfree(new_attr);
> > > +		kfree(name);
> > > +		return 0;
> > > +	}
> > >  
> > > -	if (sysfs_create_file(&sgrp->wi_kobj, &node_attr->kobj_attr.attr)) {
> > > -		kfree(node_attr->kobj_attr.attr.name);
> > > -		kfree(node_attr);
> > > -		pr_err("failed to add attribute to weighted_interleave\n");
> > > -		return -ENOMEM;
> > > +	sgrp->nattrs[nid] = new_attr;
> > > +	mutex_unlock(&sgrp->kobj_lock);
> > > +
> > > +	sysfs_attr_init(&sgrp->nattrs[nid]->kobj_attr.attr);
> > > +	sgrp->nattrs[nid]->kobj_attr.attr.name = name;
> > > +	sgrp->nattrs[nid]->kobj_attr.attr.mode = 0644;
> > > +	sgrp->nattrs[nid]->kobj_attr.show = node_show;
> > > +	sgrp->nattrs[nid]->kobj_attr.store = node_store;
> > > +	sgrp->nattrs[nid]->nid = nid;
> > 
> > These accesses need to be inside the lock as well.  Probably we can't
> > get here concurrently, but I can't so so definitively that I'm
> > comfortable blind-accessing it outside the lock.
> 
> You're right, and I appreciate your point. It's not difficult to apply your
> suggestion, so I plan to update the code as follows:
> 
>     sgrp->nattrs[nid] = new_attr;
> 
>     sysfs_attr_init(&sgrp->nattrs[nid]->kobj_attr.attr);
>     sgrp->nattrs[nid]->kobj_attr.attr.name = name;
>     sgrp->nattrs[nid]->kobj_attr.attr.mode = 0644;
>     sgrp->nattrs[nid]->kobj_attr.show = node_show;
>     sgrp->nattrs[nid]->kobj_attr.store = node_store;
>     sgrp->nattrs[nid]->nid = nid;
> 
>     ret = sysfs_create_file(&sgrp->wi_kobj,
>            &sgrp->nattrs[nid]->kobj_attr.attr);
>     if (ret) {
>         mutex_unlock(&sgrp->kobj_lock);
>         ...
>     }
>     mutex_unlock(&sgrp->kobj_lock);
> 
> > 
> > > +static int wi_node_notifier(struct notifier_block *nb,
> > > +			       unsigned long action, void *data)
> > > +{
> > ... snip ...
> > > +	case MEM_OFFLINE:
> > > +		sysfs_wi_node_release(nid);
> > 
> > I'm still not convinced this is correct.  `offline_pages()` says this:
> > 
> > /*
> >  * {on,off}lining is constrained to full memory sections (or more
> >  * precisely to memory blocks from the user space POV).
> >  */
> > 
> > And that is the function calling:
> > 	memory_notify(MEM_OFFLINE, &arg);
> > 
> > David pointed out that this should be called when offlining each memory
> > block.  This is not the same as simply doing `echo 0 > online`, you need
> > to remove the dax device associated with the memory.
> > 
> > For example:
> > 
> >       node1
> >     /       \
> >  dax0.0    dax1.0
> >    |          |
> >   mb1        mb2
> > 
> > 
> > With this code, if I `daxctl reconfigure-device devmem dax0.0` it will
> > remove the first memory block, causing MEM_OFFLINE event to fire and
> > removing the node - despite the fact that dax1.0 is still present.
> > 
> > This matters for systems with memory holes in CXL hotplug memory and
> > also for systems with Dynamic Capacity Devices surfacing capacity as
> > separate dax devices.
> > 
> > ~Gregory
> 
> If all memory blocks belonging to a node are offlined, the node will lose its
> `N_MEMORY` state before the notifier callback is invoked. This should help avoid
> the issue you mentioned.
> Please let me know your thoughts on this approach.
> 
> Rakie
> 

I'm sorry, the code is missing.
I may not fully understand the scenario you described, but I think your concern
can be addressed by adding a simple check like the following:

    case MEM_OFFLINE:
        if (!node_state(nid, N_MEMORY)) --> this point
            sysfs_wi_node_release(nid);

If all memory blocks belonging to a node are offlined, the node will lose its
`N_MEMORY` state before the notifier callback is invoked. This should help avoid
the issue you mentioned.
Please let me know your thoughts on this approach.

Rakie.

next prev parent reply	other threads:[~2025-03-24  8:54 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-03-20  4:17 [PATCH v3 0/3] Enhance sysfs handling for " Rakie Kim
2025-03-20  4:17 ` [PATCH v3 1/3] mm/mempolicy: Fix memory leaks in weighted interleave sysfs Rakie Kim
2025-03-20  5:40   ` Rakie Kim
2025-03-20 16:59     ` Gregory Price
2025-03-21  4:36       ` Rakie Kim
2025-03-21  4:53         ` Gregory Price
2025-03-21  5:06           ` Rakie Kim
2025-03-20 16:45   ` Joshua Hahn
2025-03-21  4:37     ` Rakie Kim
2025-03-21 14:03       ` Gregory Price
2025-03-24  8:47         ` Rakie Kim
2025-03-21 13:59   ` Gregory Price
2025-03-24 16:40   ` Markus Elfring
2025-03-25 10:27     ` Rakie Kim
2025-03-20  4:17 ` [PATCH v3 2/3] mm/mempolicy: Support dynamic sysfs updates for weighted interleave Rakie Kim
2025-03-21 14:09   ` Gregory Price
2025-03-24  8:48     ` Rakie Kim
2025-04-02 16:33   ` Dan Williams
2025-04-03  4:25     ` Rakie Kim
2025-03-20  4:17 ` [PATCH v3 3/3] mm/mempolicy: Support memory hotplug in " Rakie Kim
2025-03-21 14:24   ` Gregory Price
2025-03-24  8:48     ` Rakie Kim
2025-03-24  8:54       ` Rakie Kim [this message]
2025-03-24 13:32         ` Gregory Price
2025-03-25 10:27           ` Rakie Kim

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20250324085433.998-1-rakie.kim@sk.com \
    --to=rakie.kim@sk.com \
    --cc=Jonathan.Cameron@huawei.com \
    --cc=akpm@linux-foundation.org \
    --cc=dan.j.williams@intel.com \
    --cc=david@redhat.com \
    --cc=gourry@gourry.net \
    --cc=honggyu.kim@sk.com \
    --cc=joshua.hahnjy@gmail.com \
    --cc=kernel_team@skhynix.com \
    --cc=linux-cxl@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=ying.huang@linux.alibaba.com \
    --cc=yunjeong.mun@sk.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox