From: Rakie Kim <rakie.kim@sk.com>
To: Gregory Price <gourry@gourry.net>
Cc: akpm@linux-foundation.org, linux-mm@kvack.org,
linux-kernel@vger.kernel.org, linux-cxl@vger.kernel.org,
joshua.hahnjy@gmail.com, dan.j.williams@intel.com,
ying.huang@linux.alibaba.com, kernel_team@skhynix.com,
honggyu.kim@sk.com, yunjeong.mun@sk.com,
Rakie Kim <rakie.kim@sk.com>
Subject: Re: [PATCH v2 2/4] mm/mempolicy: Support memory hotplug in weighted interleave
Date: Thu, 13 Mar 2025 15:33:37 +0900 [thread overview]
Message-ID: <20250313063351.692-1-rakie.kim@sk.com> (raw)
In-Reply-To: <Z9GwNWNC9VfR3Y6A@gourry-fedora-PF4VCD3F>
On Wed, 12 Mar 2025 12:03:01 -0400 Gregory Price <gourry@gourry.net> wrote:
> On Wed, Mar 12, 2025 at 04:56:25PM +0900, Rakie Kim wrote:
> > The weighted interleave policy distributes page allocations across multiple
> > NUMA nodes based on their performance weight, thereby optimizing memory
> > bandwidth utilization. The weight values for each node are configured
> > through sysfs.
> >
> > Previously, the sysfs entries for configuring weighted interleave were only
> > created during initialization. This approach had several limitations:
> > - Sysfs entries were generated for all possible nodes at boot time,
> > including nodes without memory, leading to unnecessary sysfs creation.
>
> It's not that it's unnecessary, it's that it allowed for configuration
> of nodes which may not have memory now but may have memory in the
> future. This was not well documented.
I will update the commit message to reflect your feedback.
>
> > - Some memory devices transition to an online state after initialization,
> > but the existing implementation failed to create sysfs entries for
> > these dynamically added nodes. As a result, memory hotplugged nodes
> > were not properly recognized by the weighed interleave mechanism.
> >
>
> The current system creates 1 node per N_POSSIBLE nodes, and since nodes
> can't transition between possible and not-possible your claims here are
> contradictory.
>
> I think you mean that simply switching from N_POSSIBLE to N_MEMORY is
> insufficient since nodes may transition in and out of the N_MEMORY
> state. Therefore this patch utilizes a hotplug callback to add and
> remove sysfs entries based on whether a node is in the N_MEMORY set.
I will update the commit message to reflect your feedback.
>
> > To resolve these issues, this patch introduces two key improvements:
> > 1) At initialization, only nodes that are online and have memory are
> > recognized, preventing the creation of unnecessary sysfs entries.
> > 2) Nodes that become available after initialization are dynamically
> > detected and integrated through the memory hotplug mechanism.
> >
> > With this enhancement, the weighted interleave policy now properly supports
> > memory hotplug, ensuring that newly added nodes are recognized and sysfs
> > entries are created accordingly.
> >
>
> It doesn't "support memory hotplug" so much as it "Minimizes weighted
> interleave to exclude memoryless nodes".
I will update the commit message to reflect your feedback.
>
> > Signed-off-by: Rakie Kim <rakie.kim@sk.com>
> > ---
> > mm/mempolicy.c | 47 ++++++++++++++++++++++++++++++++++++++++++-----
> > 1 file changed, 42 insertions(+), 5 deletions(-)
> >
> > diff --git a/mm/mempolicy.c b/mm/mempolicy.c
> > index 1691748badb2..94efff89e0be 100644
> > --- a/mm/mempolicy.c
> > +++ b/mm/mempolicy.c
> > @@ -113,6 +113,7 @@
> > #include <asm/tlbflush.h>
> > #include <asm/tlb.h>
> > #include <linux/uaccess.h>
> > +#include <linux/memory.h>
> >
> > #include "internal.h"
> >
> > @@ -3489,9 +3490,38 @@ static int add_weight_node(int nid, struct kobject *wi_kobj)
> > return 0;
> > }
> >
> > +struct kobject *wi_kobj;
> > +
> > +static int wi_node_notifier(struct notifier_block *nb,
> > + unsigned long action, void *data)
> > +{
> > + int err;
> > + struct memory_notify *arg = data;
> > + int nid = arg->status_change_nid;
> > +
> > + if (nid < 0)
> > + goto notifier_end;
> > +
> > + switch(action) {
> > + case MEM_ONLINE:
> > + err = add_weight_node(nid, wi_kobj);
> > + if (err) {
> > + pr_err("failed to add sysfs [node%d]\n", nid);
> > + kobject_put(wi_kobj);
> > + return NOTIFY_BAD;
> > + }
> > + break;
> > + case MEM_OFFLINE:
> > + sysfs_wi_node_release(node_attrs[nid], wi_kobj);
> > + break;
> > + }
>
> I'm fairly certain this logic is wrong. If I add two memory blocks and
> then remove one, would this logic not remove the sysfs entries despite
> there being a block remaining?
Regarding the assumption about node configuration:
Are you assuming that a node has two memory blocks and that
MEM_OFFLINE is triggered when one of them is offlined? If so, then
you are correct that this logic would need modification.
I performed a simple test by offlining a single memory block:
# echo 0 > /sys/devices/system/node/node2/memory100/online
In this case, MEM_OFFLINE was not triggered. However, I need to
conduct further analysis to confirm this behavior under different
conditions. I will review this in more detail and share my
findings, including the test methodology and results.
>
> > +
> > +notifier_end:
> > + return NOTIFY_OK;
> > +}
> > +
> > static int add_weighted_interleave_group(struct kobject *root_kobj)
> > {
> > - struct kobject *wi_kobj;
> > int nid, err;
> >
> > wi_kobj = kzalloc(sizeof(struct kobject), GFP_KERNEL);
> > @@ -3505,16 +3535,23 @@ static int add_weighted_interleave_group(struct kobject *root_kobj)
> > return err;
> > }
> >
> > - for_each_node_state(nid, N_POSSIBLE) {
> > + for_each_online_node(nid) {
> > + if (!node_state(nid, N_MEMORY))
>
> Rather than online node, why not just add for each N_MEMORY node -
> regardless of if its memory is online or not? If the memory is offline,
> then it will be excluded from the weighted interleave mechanism by
> nature of the node being invalid for allocations anyway.
Regarding the decision to check both N_MEMORY and N_ONLINE:
This was done to ensure consistency with the conditions under which
`wi_node_notifier` is triggered. Specifically, `MEM_ONLINE` is called
only when a node is in both the N_MEMORY and N_ONLINE states.
I will review this logic further. If my understanding is correct,
keeping the current implementation is the appropriate approach.
However, I will conduct additional testing to validate this and
provide further updates accordingly.
>
> > + continue;
> > +
> > err = add_weight_node(nid, wi_kobj);
> > if (err) {
> > pr_err("failed to add sysfs [node%d]\n", nid);
> > - break;
> > + goto err_out;
> > }
> > }
> > - if (err)
> > - kobject_put(wi_kobj);
> > +
> > + hotplug_memory_notifier(wi_node_notifier, DEFAULT_CALLBACK_PRI);
> > return 0;
> > +
> > +err_out:
> > + kobject_put(wi_kobj);
> > + return err;
> > }
> >
> > static void mempolicy_kobj_release(struct kobject *kobj)
> > --
> > 2.34.1
> >
next prev parent reply other threads:[~2025-03-13 6:34 UTC|newest]
Thread overview: 27+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-03-12 7:56 [PATCH v2 1/4] mm/mempolicy: Fix memory leaks in mempolicy_sysfs_init() Rakie Kim
2025-03-12 7:56 ` [PATCH v2 2/4] mm/mempolicy: Support memory hotplug in weighted interleave Rakie Kim
2025-03-12 16:03 ` Gregory Price
2025-03-13 6:33 ` Rakie Kim [this message]
2025-03-13 16:23 ` Gregory Price
2025-03-13 22:36 ` David Hildenbrand
2025-03-14 6:00 ` Rakie Kim
2025-03-14 9:17 ` David Hildenbrand
2025-03-17 8:23 ` Rakie Kim
2025-03-12 7:56 ` [PATCH v2 3/4] mm/mempolicy: Enable sysfs support for " Rakie Kim
2025-03-12 16:14 ` Gregory Price
2025-03-13 6:34 ` Rakie Kim
2025-03-13 16:40 ` Gregory Price
2025-03-14 6:35 ` Rakie Kim
2025-03-12 7:56 ` [PATCH v2 4/4] mm/mempolicy: Fix duplicate node addition in sysfs for " Rakie Kim
2025-03-12 15:04 ` Joshua Hahn
2025-03-13 6:34 ` Rakie Kim
2025-03-13 16:42 ` Gregory Price
2025-03-14 6:35 ` Rakie Kim
2025-03-12 15:49 ` [PATCH v2 1/4] mm/mempolicy: Fix memory leaks in mempolicy_sysfs_init() Gregory Price
2025-03-13 6:31 ` Rakie Kim
2025-03-13 15:52 ` Gregory Price
2025-03-14 7:44 ` Rakie Kim
2025-03-14 10:55 ` Jonathan Cameron
2025-03-14 13:42 ` Gregory Price
2025-03-17 8:24 ` Rakie Kim
2025-03-17 8:24 ` Rakie Kim
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20250313063351.692-1-rakie.kim@sk.com \
--to=rakie.kim@sk.com \
--cc=akpm@linux-foundation.org \
--cc=dan.j.williams@intel.com \
--cc=gourry@gourry.net \
--cc=honggyu.kim@sk.com \
--cc=joshua.hahnjy@gmail.com \
--cc=kernel_team@skhynix.com \
--cc=linux-cxl@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=ying.huang@linux.alibaba.com \
--cc=yunjeong.mun@sk.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox