linux-mm.kvack.org archive mirror
From: Rakie Kim <rakie.kim@sk.com>
To: Rakie Kim <rakie.kim@sk.com>
Cc: akpm@linux-foundation.org, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org, linux-cxl@vger.kernel.org,
	joshua.hahnjy@gmail.com, dan.j.williams@intel.com,
	ying.huang@linux.alibaba.com, david@redhat.com,
	Jonathan.Cameron@huawei.com, kernel_team@skhynix.com,
	honggyu.kim@sk.com, yunjeong.mun@sk.com,
	Gregory Price <gourry@gourry.net>
Subject: Re: [PATCH v3 3/3] mm/mempolicy: Support memory hotplug in weighted interleave
Date: Mon, 24 Mar 2025 17:54:27 +0900
Message-ID: <20250324085433.998-1-rakie.kim@sk.com>
In-Reply-To: <20250324084920.987-1-rakie.kim@sk.com>

On Mon, 24 Mar 2025 17:48:39 +0900 Rakie Kim <rakie.kim@sk.com> wrote:
> On Fri, 21 Mar 2025 10:24:46 -0400 Gregory Price <gourry@gourry.net> wrote:
> > On Thu, Mar 20, 2025 at 01:17:48PM +0900, Rakie Kim wrote:
> > ... snip ...
> > > +	mutex_lock(&sgrp->kobj_lock);
> > > +	if (sgrp->nattrs[nid]) {
> > > +		mutex_unlock(&sgrp->kobj_lock);
> > > +		pr_info("Node [%d] already exists\n", nid);
> > > +		kfree(new_attr);
> > > +		kfree(name);
> > > +		return 0;
> > > +	}
> > >  
> > > -	if (sysfs_create_file(&sgrp->wi_kobj, &node_attr->kobj_attr.attr)) {
> > > -		kfree(node_attr->kobj_attr.attr.name);
> > > -		kfree(node_attr);
> > > -		pr_err("failed to add attribute to weighted_interleave\n");
> > > -		return -ENOMEM;
> > > +	sgrp->nattrs[nid] = new_attr;
> > > +	mutex_unlock(&sgrp->kobj_lock);
> > > +
> > > +	sysfs_attr_init(&sgrp->nattrs[nid]->kobj_attr.attr);
> > > +	sgrp->nattrs[nid]->kobj_attr.attr.name = name;
> > > +	sgrp->nattrs[nid]->kobj_attr.attr.mode = 0644;
> > > +	sgrp->nattrs[nid]->kobj_attr.show = node_show;
> > > +	sgrp->nattrs[nid]->kobj_attr.store = node_store;
> > > +	sgrp->nattrs[nid]->nid = nid;
> > 
> > These accesses need to be inside the lock as well.  Probably we can't
> > get here concurrently, but I can't say so definitively enough that I'm
> > comfortable blind-accessing it outside the lock.
> 
> You're right, and I appreciate your point. It's not difficult to apply your
> suggestion, so I plan to update the code as follows:
> 
>     sgrp->nattrs[nid] = new_attr;
> 
>     sysfs_attr_init(&sgrp->nattrs[nid]->kobj_attr.attr);
>     sgrp->nattrs[nid]->kobj_attr.attr.name = name;
>     sgrp->nattrs[nid]->kobj_attr.attr.mode = 0644;
>     sgrp->nattrs[nid]->kobj_attr.show = node_show;
>     sgrp->nattrs[nid]->kobj_attr.store = node_store;
>     sgrp->nattrs[nid]->nid = nid;
> 
>     ret = sysfs_create_file(&sgrp->wi_kobj,
>            &sgrp->nattrs[nid]->kobj_attr.attr);
>     if (ret) {
>         mutex_unlock(&sgrp->kobj_lock);
>         ...
>     }
>     mutex_unlock(&sgrp->kobj_lock);
> 
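
A related way to address the same concern, sketched in userspace C with
pthreads, is to fill in the new object while it is still private and only
then publish the pointer under the lock. This is a variation on the plan
quoted above, not what the patch does, and every name here (sim_group,
sim_attr, sim_add_node, SIM_MAX_NODES) is hypothetical. It also leaves out
the sysfs_create_file() step entirely.

```c
#include <pthread.h>
#include <stdlib.h>

#define SIM_MAX_NODES 4

/* Hypothetical stand-in for the per-node attribute; not a kernel type. */
struct sim_attr {
	int nid;
	int mode;
};

/* Hypothetical stand-in for the sysfs group that owns the attributes. */
struct sim_group {
	pthread_mutex_t lock;
	struct sim_attr *nattrs[SIM_MAX_NODES];
};

/*
 * Returns 0 if the node was added or already existed, -1 on a bad nid or
 * an allocation failure.
 */
int sim_add_node(struct sim_group *g, int nid)
{
	struct sim_attr *attr;

	if (nid < 0 || nid >= SIM_MAX_NODES)
		return -1;

	attr = calloc(1, sizeof(*attr));
	if (!attr)
		return -1;

	/* Initialize while the object is still private to this thread. */
	attr->nid = nid;
	attr->mode = 0644;

	pthread_mutex_lock(&g->lock);
	if (g->nattrs[nid]) {
		/* Someone else already registered this node. */
		pthread_mutex_unlock(&g->lock);
		free(attr);
		return 0;
	}
	g->nattrs[nid] = attr;	/* publish only a fully initialized object */
	pthread_mutex_unlock(&g->lock);

	return 0;
}
```

With this ordering, no reader that takes the lock can observe a
half-initialized entry, and no field is written through the shared array
outside the lock.
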
> > 
> > > +static int wi_node_notifier(struct notifier_block *nb,
> > > +			       unsigned long action, void *data)
> > > +{
> > ... snip ...
> > > +	case MEM_OFFLINE:
> > > +		sysfs_wi_node_release(nid);
> > 
> > I'm still not convinced this is correct.  `offline_pages()` says this:
> > 
> > /*
> >  * {on,off}lining is constrained to full memory sections (or more
> >  * precisely to memory blocks from the user space POV).
> >  */
> > 
> > And that is the function calling:
> > 	memory_notify(MEM_OFFLINE, &arg);
> > 
> > > David pointed out that this should be called when offlining each memory
> > > block.  This is not the same as simply doing `echo 0 > online`; you need
> > > to remove the dax device associated with the memory.
> > 
> > For example:
> > 
> >       node1
> >     /       \
> >  dax0.0    dax1.0
> >    |          |
> >   mb1        mb2
> > 
> > 
> > With this code, if I `daxctl reconfigure-device devmem dax0.0` it will
> > remove the first memory block, causing MEM_OFFLINE event to fire and
> > removing the node - despite the fact that dax1.0 is still present.
> > 
> > This matters for systems with memory holes in CXL hotplug memory and
> > also for systems with Dynamic Capacity Devices surfacing capacity as
> > separate dax devices.
> > 
> > ~Gregory
> 
> If all memory blocks belonging to a node are offlined, the node will lose its
> `N_MEMORY` state before the notifier callback is invoked. This should help avoid
> the issue you mentioned.
> Please let me know your thoughts on this approach.
> 
> Rakie
> 

Sorry, the code snippet was missing from my previous mail.
I may not fully understand the scenario you described, but I think your concern
can be addressed by adding a simple check like the following:

    case MEM_OFFLINE:
        if (!node_state(nid, N_MEMORY)) /* <-- added check */
            sysfs_wi_node_release(nid);

If all memory blocks belonging to a node are offlined, the node will lose its
`N_MEMORY` state before the notifier callback is invoked. This should help avoid
the issue you mentioned.
Please let me know your thoughts on this approach.
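
To illustrate the intended behavior for the two-block layout from the
diagram above, here is a small userspace model. It is only a sketch: the
sim_* names and the struct are hypothetical stand-ins, and it assumes that
the node's memory state is updated before the notifier runs, as described
above.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_BLOCKS 8

/* Hypothetical userspace stand-in for one NUMA node; not a kernel type. */
struct sim_node {
	bool block_online[MAX_BLOCKS];	/* one flag per memory block */
	int nr_blocks;
	bool has_memory;		/* models the N_MEMORY node state */
	bool sysfs_present;		/* models the nodeN sysfs attribute */
};

/* Models node_state(nid, N_MEMORY). */
bool sim_node_has_memory(const struct sim_node *n)
{
	return n->has_memory;
}

/* Models the MEM_OFFLINE case of the notifier with the added check. */
void sim_notify_offline(struct sim_node *n)
{
	if (!sim_node_has_memory(n))
		n->sysfs_present = false;	/* sysfs_wi_node_release() */
}

/* Offline one block: update the node state first, then notify. */
void sim_offline_block(struct sim_node *n, int blk)
{
	bool any_online = false;

	assert(blk >= 0 && blk < n->nr_blocks);
	n->block_online[blk] = false;

	for (int i = 0; i < n->nr_blocks; i++) {
		if (n->block_online[i])
			any_online = true;
	}
	n->has_memory = any_online;

	sim_notify_offline(n);
}
```

In this model, offlining the first of two blocks leaves sysfs_present set,
and only offlining the last remaining block clears it.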

Rakie.


