From: Rakie Kim <rakie.kim@sk.com>
To: Gregory Price <gourry@gourry.net>
Cc: akpm@linux-foundation.org, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org, linux-cxl@vger.kernel.org,
	joshua.hahnjy@gmail.com, dan.j.williams@intel.com,
	ying.huang@linux.alibaba.com, kernel_team@skhynix.com,
	honggyu.kim@sk.com, yunjeong.mun@sk.com,
	Rakie Kim <rakie.kim@sk.com>
Subject: Re: [PATCH v2 2/4] mm/mempolicy: Support memory hotplug in weighted interleave
Date: Thu, 13 Mar 2025 15:33:37 +0900
Message-ID: <20250313063351.692-1-rakie.kim@sk.com>
In-Reply-To: <Z9GwNWNC9VfR3Y6A@gourry-fedora-PF4VCD3F>

On Wed, 12 Mar 2025 12:03:01 -0400 Gregory Price <gourry@gourry.net> wrote:
> On Wed, Mar 12, 2025 at 04:56:25PM +0900, Rakie Kim wrote:
> > The weighted interleave policy distributes page allocations across multiple
> > NUMA nodes based on their performance weight, thereby optimizing memory
> > bandwidth utilization. The weight values for each node are configured
> > through sysfs.
> > 
> > Previously, the sysfs entries for configuring weighted interleave were only
> > created during initialization. This approach had several limitations:
> > - Sysfs entries were generated for all possible nodes at boot time,
> >   including nodes without memory, leading to unnecessary sysfs creation.
> 
> It's not that it's unnecessary, it's that it allowed for configuration
> of nodes which may not have memory now but may have memory in the
> future.  This was not well documented.

I will update the commit message to reflect your feedback.

> 
> > - Some memory devices transition to an online state after initialization,
> >   but the existing implementation failed to create sysfs entries for
> >   these dynamically added nodes. As a result, memory hotplugged nodes
> >   were not properly recognized by the weighted interleave mechanism.
> > 
> 
> The current system creates 1 node per N_POSSIBLE nodes, and since nodes
> can't transition between possible and not-possible your claims here are
> contradictory.
> 
> I think you mean that simply switching from N_POSSIBLE to N_MEMORY is
> insufficient since nodes may transition in and out of the N_MEMORY
> state.  Therefore this patch utilizes a hotplug callback to add and
> remove sysfs entries based on whether a node is in the N_MEMORY set.

I will update the commit message to reflect your feedback.

> 
> > To resolve these issues, this patch introduces two key improvements:
> > 1) At initialization, only nodes that are online and have memory are
> >    recognized, preventing the creation of unnecessary sysfs entries.
> > 2) Nodes that become available after initialization are dynamically
> >    detected and integrated through the memory hotplug mechanism.
> > 
> > With this enhancement, the weighted interleave policy now properly supports
> > memory hotplug, ensuring that newly added nodes are recognized and sysfs
> > entries are created accordingly.
> >
> 
> It doesn't "support memory hotplug" so much as it "Minimizes weighted
> interleave to exclude memoryless nodes".

I will update the commit message to reflect your feedback.

> 
> > Signed-off-by: Rakie Kim <rakie.kim@sk.com>
> > ---
> >  mm/mempolicy.c | 47 ++++++++++++++++++++++++++++++++++++++++++-----
> >  1 file changed, 42 insertions(+), 5 deletions(-)
> > 
> > diff --git a/mm/mempolicy.c b/mm/mempolicy.c
> > index 1691748badb2..94efff89e0be 100644
> > --- a/mm/mempolicy.c
> > +++ b/mm/mempolicy.c
> > @@ -113,6 +113,7 @@
> >  #include <asm/tlbflush.h>
> >  #include <asm/tlb.h>
> >  #include <linux/uaccess.h>
> > +#include <linux/memory.h>
> >  
> >  #include "internal.h"
> >  
> > @@ -3489,9 +3490,38 @@ static int add_weight_node(int nid, struct kobject *wi_kobj)
> >  	return 0;
> >  }
> >  
> > +struct kobject *wi_kobj;
> > +
> > +static int wi_node_notifier(struct notifier_block *nb,
> > +			       unsigned long action, void *data)
> > +{
> > +	int err;
> > +	struct memory_notify *arg = data;
> > +	int nid = arg->status_change_nid;
> > +
> > +	if (nid < 0)
> > +		goto notifier_end;
> > +
> > +	switch(action) {
> > +	case MEM_ONLINE:
> > +		err = add_weight_node(nid, wi_kobj);
> > +		if (err) {
> > +			pr_err("failed to add sysfs [node%d]\n", nid);
> > +			kobject_put(wi_kobj);
> > +			return NOTIFY_BAD;
> > +		}
> > +		break;
> > +	case MEM_OFFLINE:
> > +		sysfs_wi_node_release(node_attrs[nid], wi_kobj);
> > +		break;
> > +	}
> 
> I'm fairly certain this logic is wrong.  If I add two memory blocks and
> then remove one, would this logic not remove the sysfs entries despite
> there being a block remaining?

Regarding the assumption about node configuration:
Are you assuming that a node has two memory blocks and that
MEM_OFFLINE is triggered when one of them is offlined? If so, then
you are correct that this logic would need modification.

I performed a simple test by offlining a single memory block:
# echo 0 > /sys/devices/system/node/node2/memory100/online

In this case, MEM_OFFLINE was not triggered. However, I need to
conduct further analysis to confirm this behavior under different
conditions. I will review this in more detail and share my
findings, including the test methodology and results.

> 
> > +
> > +notifier_end:
> > +	return NOTIFY_OK;
> > +}
> > +
> >  static int add_weighted_interleave_group(struct kobject *root_kobj)
> >  {
> > -	struct kobject *wi_kobj;
> >  	int nid, err;
> >  
> >  	wi_kobj = kzalloc(sizeof(struct kobject), GFP_KERNEL);
> > @@ -3505,16 +3535,23 @@ static int add_weighted_interleave_group(struct kobject *root_kobj)
> >  		return err;
> >  	}
> >  
> > -	for_each_node_state(nid, N_POSSIBLE) {
> > +	for_each_online_node(nid) {
> > +		if (!node_state(nid, N_MEMORY))
> 
> Rather than online node, why not just add for each N_MEMORY node -
> regardless of if its memory is online or not?  If the memory is offline,
> then it will be excluded from the weighted interleave mechanism by
> nature of the node being invalid for allocations anyway.

Regarding the decision to check both N_MEMORY and N_ONLINE:
This was done to keep initialization consistent with the conditions
under which `wi_node_notifier` is invoked. Specifically, the
`MEM_ONLINE` notification is delivered only when a node is in both
the N_MEMORY and N_ONLINE states.

I will review this logic further. If my understanding is correct,
keeping the current implementation is the appropriate approach.
However, I will conduct additional testing to validate this and
provide further updates accordingly.

> 
> > +			continue;
> > +
> >  		err = add_weight_node(nid, wi_kobj);
> >  		if (err) {
> >  			pr_err("failed to add sysfs [node%d]\n", nid);
> > -			break;
> > +			goto err_out;
> >  		}
> >  	}
> > -	if (err)
> > -		kobject_put(wi_kobj);
> > +
> > +	hotplug_memory_notifier(wi_node_notifier, DEFAULT_CALLBACK_PRI);
> >  	return 0;
> > +
> > +err_out:
> > +	kobject_put(wi_kobj);
> > +	return err;
> >  }
> >  
> >  static void mempolicy_kobj_release(struct kobject *kobj)
> > -- 
> > 2.34.1
> > 



Thread overview: 27+ messages
2025-03-12  7:56 [PATCH v2 1/4] mm/mempolicy: Fix memory leaks in mempolicy_sysfs_init() Rakie Kim
2025-03-12  7:56 ` [PATCH v2 2/4] mm/mempolicy: Support memory hotplug in weighted interleave Rakie Kim
2025-03-12 16:03   ` Gregory Price
2025-03-13  6:33     ` Rakie Kim [this message]
2025-03-13 16:23       ` Gregory Price
2025-03-13 22:36         ` David Hildenbrand
2025-03-14  6:00           ` Rakie Kim
2025-03-14  9:17             ` David Hildenbrand
2025-03-17  8:23               ` Rakie Kim
2025-03-12  7:56 ` [PATCH v2 3/4] mm/mempolicy: Enable sysfs support for " Rakie Kim
2025-03-12 16:14   ` Gregory Price
2025-03-13  6:34     ` Rakie Kim
2025-03-13 16:40       ` Gregory Price
2025-03-14  6:35         ` Rakie Kim
2025-03-12  7:56 ` [PATCH v2 4/4] mm/mempolicy: Fix duplicate node addition in sysfs for " Rakie Kim
2025-03-12 15:04   ` Joshua Hahn
2025-03-13  6:34     ` Rakie Kim
2025-03-13 16:42   ` Gregory Price
2025-03-14  6:35     ` Rakie Kim
2025-03-12 15:49 ` [PATCH v2 1/4] mm/mempolicy: Fix memory leaks in mempolicy_sysfs_init() Gregory Price
2025-03-13  6:31   ` Rakie Kim
2025-03-13 15:52     ` Gregory Price
2025-03-14  7:44       ` Rakie Kim
2025-03-14 10:55       ` Jonathan Cameron
2025-03-14 13:42         ` Gregory Price
2025-03-17  8:24           ` Rakie Kim
2025-03-17  8:24         ` Rakie Kim
