[PATCH 0/2] Implement numa node notifier

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

* [PATCH 0/2] Implement numa node notifier
@ 2025-04-01  9:27 Oscar Salvador
  2025-04-01  9:27 ` [PATCH 1/2] mm,memory_hotplug: " Oscar Salvador
                   ` (3 more replies)
  0 siblings, 4 replies; 22+ messages in thread
From: Oscar Salvador @ 2025-04-01  9:27 UTC (permalink / raw)
  To: Andrew Morton
  Cc: David Hildenbrand, linux-mm, linux-kernel, Vlastimil Babka,
	Hyeonggon Yoo, mkoutny, Dan Williams, Jonathan Cameron,
	Oscar Salvador

Memory notifier is a tool that allow consumers to get notified whenever
memory gets onlined or offlined in the system.
Currently, there are 10 consumers of that, but 5 out of those 10 consumers
are only interested in getting notifications when a numa node has changed its
state.
That means going from memoryless to memory-aware of vice versa.

Which means that for every {online,offline}_pages operation they get
notified even though the numa node might not have changed its state.

The first patch implements a numa node notifier that does just that, and have
those consumers register in there, so they get notified only when they are
interested.

The second patch replaces 'status_change_normal{_normal}' fields within
memory_notify with a 'nid', as that is only what we need for memory
notifer and update the only user of it (page_ext).

Consumers that are only interested in numa node states change are:

 - memory-tier
 - slub
 - cpuset
 - hmat
 - cxl

Oscar Salvador (2):
  mm,memory_hotplug: Implement numa node notifier
  mm,memory_hotplug: Replace status_change_nid parameter in
    memory_notify

 drivers/acpi/numa/hmat.c  |  6 +--
 drivers/base/node.c       | 19 +++++++++
 drivers/cxl/core/region.c | 14 +++----
 drivers/cxl/cxl.h         |  4 +-
 include/linux/memory.h    | 37 ++++++++++++++++++
 kernel/cgroup/cpuset.c    |  2 +-
 mm/memory-tiers.c         |  8 ++--
 mm/memory_hotplug.c       | 82 +++++++++++++++++++++++++++++----------
 mm/page_ext.c             | 12 +-----
 mm/slub.c                 | 22 +++++------
 10 files changed, 146 insertions(+), 60 deletions(-)

-- 
2.49.0

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH 1/2] mm,memory_hotplug: Implement numa node notifier
  2025-04-01  9:27 [PATCH 0/2] Implement numa node notifier Oscar Salvador
@ 2025-04-01  9:27 ` Oscar Salvador
  2025-04-01 14:19   ` Harry Yoo
                     ` (3 more replies)
  2025-04-01  9:27 ` [PATCH 2/2] mm,memory_hotplug: Replace status_change_nid parameter in memory_notify Oscar Salvador
                   ` (2 subsequent siblings)
  3 siblings, 4 replies; 22+ messages in thread
From: Oscar Salvador @ 2025-04-01  9:27 UTC (permalink / raw)
  To: Andrew Morton
  Cc: David Hildenbrand, linux-mm, linux-kernel, Vlastimil Babka,
	Hyeonggon Yoo, mkoutny, Dan Williams, Jonathan Cameron,
	Oscar Salvador

There are at least four consumers of hotplug_memory_notifier that what they
really are interested in is whether any numa node changed its state, e.g: going
from being memory aware to becoming memoryless.

Implement a specific notifier for numa nodes when their state gets changed,
and have those consumers that only care about numa node state changes use it.

Signed-off-by: Oscar Salvador <osalvador@suse.de>
---
 drivers/acpi/numa/hmat.c  |  6 +--
 drivers/base/node.c       | 19 +++++++++
 drivers/cxl/core/region.c | 14 +++----
 drivers/cxl/cxl.h         |  4 +-
 include/linux/memory.h    | 38 ++++++++++++++++++
 kernel/cgroup/cpuset.c    |  2 +-
 mm/memory-tiers.c         |  8 ++--
 mm/memory_hotplug.c       | 84 +++++++++++++++++++++++++++++----------
 mm/slub.c                 | 22 +++++-----
 9 files changed, 148 insertions(+), 49 deletions(-)

diff --git a/drivers/acpi/numa/hmat.c b/drivers/acpi/numa/hmat.c
index bfbb08b1e6af..d18f3efa2149 100644
--- a/drivers/acpi/numa/hmat.c
+++ b/drivers/acpi/numa/hmat.c
@@ -918,10 +918,10 @@ static int hmat_callback(struct notifier_block *self,
 			 unsigned long action, void *arg)
 {
 	struct memory_target *target;
-	struct memory_notify *mnb = arg;
+	struct node_notify *mnb = arg;
 	int pxm, nid = mnb->status_change_nid;
 
-	if (nid == NUMA_NO_NODE || action != MEM_ONLINE)
+	if (nid == NUMA_NO_NODE || action != NODE_BECAME_MEM_AWARE)
 		return NOTIFY_OK;
 
 	pxm = node_to_pxm(nid);
@@ -1074,7 +1074,7 @@ static __init int hmat_init(void)
 	hmat_register_targets();
 
 	/* Keep the table and structures if the notifier may use them */
-	if (hotplug_memory_notifier(hmat_callback, HMAT_CALLBACK_PRI))
+	if (hotplug_node_notifier(hmat_callback, HMAT_CALLBACK_PRI))
 		goto out_put;
 
 	if (!hmat_set_default_dram_perf())
diff --git a/drivers/base/node.c b/drivers/base/node.c
index 0ea653fa3433..182c71dfb5b8 100644
--- a/drivers/base/node.c
+++ b/drivers/base/node.c
@@ -110,6 +110,25 @@ static const struct attribute_group *node_access_node_groups[] = {
 	NULL,
 };
 
+static BLOCKING_NOTIFIER_HEAD(node_chain);
+
+int register_node_notifier(struct notifier_block *nb)
+{
+	return blocking_notifier_chain_register(&node_chain, nb);
+}
+EXPORT_SYMBOL(register_node_notifier);
+
+void unregister_node_notifier(struct notifier_block *nb)
+{
+	blocking_notifier_chain_unregister(&node_chain, nb);
+}
+EXPORT_SYMBOL(unregister_node_notifier);
+
+int node_notify(unsigned long val, void *v)
+{
+	return blocking_notifier_call_chain(&node_chain, val, v);
+}
+
 static void node_remove_accesses(struct node *node)
 {
 	struct node_access_nodes *c, *cnext;
diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index e8d11a988fd9..7d187088f557 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -2409,12 +2409,12 @@ static int cxl_region_perf_attrs_callback(struct notifier_block *nb,
 					  unsigned long action, void *arg)
 {
 	struct cxl_region *cxlr = container_of(nb, struct cxl_region,
-					       memory_notifier);
-	struct memory_notify *mnb = arg;
+					       node_notifier);
+	struct node_notify *mnb = arg;
 	int nid = mnb->status_change_nid;
 	int region_nid;
 
-	if (nid == NUMA_NO_NODE || action != MEM_ONLINE)
+	if (nid == NUMA_NO_NODE || action != NODE_BECAME_MEM_AWARE)
 		return NOTIFY_DONE;
 
 	/*
@@ -3388,7 +3388,7 @@ static void shutdown_notifiers(void *_cxlr)
 {
 	struct cxl_region *cxlr = _cxlr;
 
-	unregister_memory_notifier(&cxlr->memory_notifier);
+	unregister_node_notifier(&cxlr->node_notifier);
 	unregister_mt_adistance_algorithm(&cxlr->adist_notifier);
 }
 
@@ -3427,9 +3427,9 @@ static int cxl_region_probe(struct device *dev)
 	if (rc)
 		return rc;
 
-	cxlr->memory_notifier.notifier_call = cxl_region_perf_attrs_callback;
-	cxlr->memory_notifier.priority = CXL_CALLBACK_PRI;
-	register_memory_notifier(&cxlr->memory_notifier);
+	cxlr->node_notifier.notifier_call = cxl_region_perf_attrs_callback;
+	cxlr->node_notifier.priority = CXL_CALLBACK_PRI;
+	register_node_notifier(&cxlr->node_notifier);
 
 	cxlr->adist_notifier.notifier_call = cxl_region_calculate_adistance;
 	cxlr->adist_notifier.priority = 100;
diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
index bbbaa0d0a670..d4c9a499de7a 100644
--- a/drivers/cxl/cxl.h
+++ b/drivers/cxl/cxl.h
@@ -532,7 +532,7 @@ struct cxl_region_params {
  * @flags: Region state flags
  * @params: active + config params for the region
  * @coord: QoS access coordinates for the region
- * @memory_notifier: notifier for setting the access coordinates to node
+ * @node_notifier: notifier for setting the access coordinates to node
  * @adist_notifier: notifier for calculating the abstract distance of node
  */
 struct cxl_region {
@@ -545,7 +545,7 @@ struct cxl_region {
 	unsigned long flags;
 	struct cxl_region_params params;
 	struct access_coordinate coord[ACCESS_COORDINATE_MAX];
-	struct notifier_block memory_notifier;
+	struct notifier_block node_notifier;
 	struct notifier_block adist_notifier;
 };
 
diff --git a/include/linux/memory.h b/include/linux/memory.h
index 12daa6ec7d09..1d814dfbb8a8 100644
--- a/include/linux/memory.h
+++ b/include/linux/memory.h
@@ -99,6 +99,14 @@ int set_memory_block_size_order(unsigned int order);
 #define	MEM_PREPARE_ONLINE	(1<<6)
 #define	MEM_FINISH_OFFLINE	(1<<7)
 
+/* These states are used for numa node notifiers */
+#define NODE_BECOMING_MEM_AWARE		(1<<0)
+#define NODE_BECAME_MEM_AWARE		(1<<1)
+#define NODE_BECOMING_MEMORYLESS	(1<<2)
+#define NODE_BECAME_MEMORYLESS		(1<<3)
+#define NODE_CANCEL_MEM_AWARE		(1<<4)
+#define NODE_CANCEL_MEMORYLESS		(1<<5)
+
 struct memory_notify {
 	/*
 	 * The altmap_start_pfn and altmap_nr_pages fields are designated for
@@ -113,6 +121,11 @@ struct memory_notify {
 	int status_change_nid;
 };
 
+struct node_notify {
+	int status_change_nid_normal;
+	int status_change_nid;
+};
+
 struct notifier_block;
 struct mem_section;
 
@@ -149,15 +162,34 @@ static inline int hotplug_memory_notifier(notifier_fn_t fn, int pri)
 {
 	return 0;
 }
+
+static inline int register_node_notifier(struct notifier_block *nb)
+{
+	return 0;
+}
+static inline void unregister_node_notifier(struct notifier_block *nb)
+{
+}
+static inline int node_notify(unsigned long val, void *v)
+{
+	return 0;
+}
+static inline int hotplug_node_notifier(notifier_fn_t fn, int pri)
+{
+	return 0;
+}
 #else /* CONFIG_MEMORY_HOTPLUG */
 extern int register_memory_notifier(struct notifier_block *nb);
+extern int register_node_notifier(struct notifier_block *nb);
 extern void unregister_memory_notifier(struct notifier_block *nb);
+extern void unregister_node_notifier(struct notifier_block *nb);
 int create_memory_block_devices(unsigned long start, unsigned long size,
 				struct vmem_altmap *altmap,
 				struct memory_group *group);
 void remove_memory_block_devices(unsigned long start, unsigned long size);
 extern void memory_dev_init(void);
 extern int memory_notify(unsigned long val, void *v);
+extern int node_notify(unsigned long val, void *v);
 extern struct memory_block *find_memory_block(unsigned long section_nr);
 typedef int (*walk_memory_blocks_func_t)(struct memory_block *, void *);
 extern int walk_memory_blocks(unsigned long start, unsigned long size,
@@ -177,6 +209,12 @@ int walk_dynamic_memory_groups(int nid, walk_memory_groups_func_t func,
 	register_memory_notifier(&fn##_mem_nb);			\
 })
 
+#define hotplug_node_notifier(fn, pri) ({		\
+	static __meminitdata struct notifier_block fn##_node_nb =\
+		{ .notifier_call = fn, .priority = pri };\
+	register_node_notifier(&fn##_node_nb);			\
+})
+
 #ifdef CONFIG_NUMA
 void memory_block_add_nid(struct memory_block *mem, int nid,
 			  enum meminit_context context);
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index 0f910c828973..62a5d34c4331 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -3939,7 +3939,7 @@ void __init cpuset_init_smp(void)
 	cpumask_copy(top_cpuset.effective_cpus, cpu_active_mask);
 	top_cpuset.effective_mems = node_states[N_MEMORY];
 
-	hotplug_memory_notifier(cpuset_track_online_nodes, CPUSET_CALLBACK_PRI);
+	hotplug_node_notifier(cpuset_track_online_nodes, CPUSET_CALLBACK_PRI);
 
 	cpuset_migrate_mm_wq = alloc_ordered_workqueue("cpuset_migrate_mm", 0);
 	BUG_ON(!cpuset_migrate_mm_wq);
diff --git a/mm/memory-tiers.c b/mm/memory-tiers.c
index fc14fe53e9b7..dfe6c28c8352 100644
--- a/mm/memory-tiers.c
+++ b/mm/memory-tiers.c
@@ -872,7 +872,7 @@ static int __meminit memtier_hotplug_callback(struct notifier_block *self,
 					      unsigned long action, void *_arg)
 {
 	struct memory_tier *memtier;
-	struct memory_notify *arg = _arg;
+	struct node_notify *arg = _arg;
 
 	/*
 	 * Only update the node migration order when a node is
@@ -882,13 +882,13 @@ static int __meminit memtier_hotplug_callback(struct notifier_block *self,
 		return notifier_from_errno(0);
 
 	switch (action) {
-	case MEM_OFFLINE:
+	case NODE_BECAME_MEMORYLESS:
 		mutex_lock(&memory_tier_lock);
 		if (clear_node_memory_tier(arg->status_change_nid))
 			establish_demotion_targets();
 		mutex_unlock(&memory_tier_lock);
 		break;
-	case MEM_ONLINE:
+	case NODE_BECAME_MEM_AWARE:
 		mutex_lock(&memory_tier_lock);
 		memtier = set_node_memory_tier(arg->status_change_nid);
 		if (!IS_ERR(memtier))
@@ -929,7 +929,7 @@ static int __init memory_tier_init(void)
 	nodes_and(default_dram_nodes, node_states[N_MEMORY],
 		  node_states[N_CPU]);
 
-	hotplug_memory_notifier(memtier_hotplug_callback, MEMTIER_HOTPLUG_PRI);
+	hotplug_node_notifier(memtier_hotplug_callback, MEMTIER_HOTPLUG_PRI);
 	return 0;
 }
 subsys_initcall(memory_tier_init);
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 75401866fb76..4bb9ff282ec9 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -701,7 +701,7 @@ static void online_pages_range(unsigned long start_pfn, unsigned long nr_pages)
 
 /* check which state of node_states will be changed when online memory */
 static void node_states_check_changes_online(unsigned long nr_pages,
-	struct zone *zone, struct memory_notify *arg)
+	struct zone *zone, struct node_notify *arg)
 {
 	int nid = zone_to_nid(zone);
 
@@ -714,7 +714,7 @@ static void node_states_check_changes_online(unsigned long nr_pages,
 		arg->status_change_nid_normal = nid;
 }
 
-static void node_states_set_node(int node, struct memory_notify *arg)
+static void node_states_set_node(int node, struct node_notify *arg)
 {
 	if (arg->status_change_nid_normal >= 0)
 		node_set_state(node, N_NORMAL_MEMORY);
@@ -1177,7 +1177,9 @@ int online_pages(unsigned long pfn, unsigned long nr_pages,
 	int need_zonelists_rebuild = 0;
 	const int nid = zone_to_nid(zone);
 	int ret;
-	struct memory_notify arg;
+	struct memory_notify mem_arg;
+	struct node_notify node_arg;
+	bool cancel_mem_notifier_on_err = false, cancel_node_notifier_on_err = false;
 
 	/*
 	 * {on,off}lining is constrained to full memory sections (or more
@@ -1194,11 +1196,23 @@ int online_pages(unsigned long pfn, unsigned long nr_pages,
 	/* associate pfn range with the zone */
 	move_pfn_range_to_zone(zone, pfn, nr_pages, NULL, MIGRATE_ISOLATE);
 
-	arg.start_pfn = pfn;
-	arg.nr_pages = nr_pages;
-	node_states_check_changes_online(nr_pages, zone, &arg);
+	mem_arg.start_pfn = pfn;
+	mem_arg.nr_pages = nr_pages;
+	node_states_check_changes_online(nr_pages, zone, &node_arg);
 
-	ret = memory_notify(MEM_GOING_ONLINE, &arg);
+	if (node_arg.status_change_nid >= 0) {
+		/* Node is becoming memory aware. Notify consumers */
+		cancel_node_notifier_on_err = true;
+		ret = node_notify(NODE_BECOMING_MEM_AWARE, &node_arg);
+		ret = notifier_to_errno(ret);
+		if (ret)
+			goto failed_addition;
+	}
+
+	cancel_mem_notifier_on_err = true;
+	mem_arg.status_change_nid = node_arg.status_change_nid;
+	mem_arg.status_change_nid_normal = node_arg.status_change_nid_normal;
+	ret = memory_notify(MEM_GOING_ONLINE, &mem_arg);
 	ret = notifier_to_errno(ret);
 	if (ret)
 		goto failed_addition;
@@ -1224,7 +1238,7 @@ int online_pages(unsigned long pfn, unsigned long nr_pages,
 	online_pages_range(pfn, nr_pages);
 	adjust_present_page_count(pfn_to_page(pfn), group, nr_pages);
 
-	node_states_set_node(nid, &arg);
+	node_states_set_node(nid, &node_arg);
 	if (need_zonelists_rebuild)
 		build_all_zonelists(NULL);
 
@@ -1245,16 +1259,26 @@ int online_pages(unsigned long pfn, unsigned long nr_pages,
 	kswapd_run(nid);
 	kcompactd_run(nid);
 
+	if (node_arg.status_change_nid >= 0)
+		/*
+		 * Node went from memoryless to have memory. Notifiy interested
+		 * consumers
+		 */
+		node_notify(NODE_BECAME_MEM_AWARE, &node_arg);
+
 	writeback_set_ratelimit();
 
-	memory_notify(MEM_ONLINE, &arg);
+	memory_notify(MEM_ONLINE, &mem_arg);
 	return 0;
 
 failed_addition:
 	pr_debug("online_pages [mem %#010llx-%#010llx] failed\n",
 		 (unsigned long long) pfn << PAGE_SHIFT,
 		 (((unsigned long long) pfn + nr_pages) << PAGE_SHIFT) - 1);
-	memory_notify(MEM_CANCEL_ONLINE, &arg);
+	if (cancel_node_notifier_on_err)
+		node_notify(NODE_CANCEL_MEM_AWARE, &node_arg);
+	if (cancel_mem_notifier_on_err)
+		memory_notify(MEM_CANCEL_ONLINE, &mem_arg);
 	remove_pfn_range_from_zone(zone, pfn, nr_pages);
 	return ret;
 }
@@ -1898,7 +1922,7 @@ early_param("movable_node", cmdline_parse_movable_node);
 
 /* check which state of node_states will be changed when offline memory */
 static void node_states_check_changes_offline(unsigned long nr_pages,
-		struct zone *zone, struct memory_notify *arg)
+		struct zone *zone, struct node_notify *arg)
 {
 	struct pglist_data *pgdat = zone->zone_pgdat;
 	unsigned long present_pages = 0;
@@ -1935,7 +1959,7 @@ static void node_states_check_changes_offline(unsigned long nr_pages,
 		arg->status_change_nid = zone_to_nid(zone);
 }
 
-static void node_states_clear_node(int node, struct memory_notify *arg)
+static void node_states_clear_node(int node, struct node_notify *arg)
 {
 	if (arg->status_change_nid_normal >= 0)
 		node_clear_state(node, N_NORMAL_MEMORY);
@@ -1963,7 +1987,9 @@ int offline_pages(unsigned long start_pfn, unsigned long nr_pages,
 	unsigned long pfn, managed_pages, system_ram_pages = 0;
 	const int node = zone_to_nid(zone);
 	unsigned long flags;
-	struct memory_notify arg;
+	struct memory_notify mem_arg;
+	struct node_notify node_arg;
+	bool cancel_mem_notifier_on_err = false, cancel_node_notifier_on_err = false;
 	char *reason;
 	int ret;
 
@@ -2022,11 +2048,22 @@ int offline_pages(unsigned long start_pfn, unsigned long nr_pages,
 		goto failed_removal_pcplists_disabled;
 	}
 
-	arg.start_pfn = start_pfn;
-	arg.nr_pages = nr_pages;
-	node_states_check_changes_offline(nr_pages, zone, &arg);
+	mem_arg.start_pfn = start_pfn;
+	mem_arg.nr_pages = nr_pages;
+	node_states_check_changes_offline(nr_pages, zone, &node_arg);
+
+	if (node_arg.status_change_nid >= 0) {
+		cancel_node_notifier_on_err = true;
+		ret = node_notify(NODE_BECOMING_MEMORYLESS, &node_arg);
+		ret = notifier_to_errno(ret);
+		if (ret)
+			goto failed_removal_isolated;
+	}
 
-	ret = memory_notify(MEM_GOING_OFFLINE, &arg);
+	cancel_mem_notifier_on_err = true;
+	mem_arg.status_change_nid = node_arg.status_change_nid;
+	mem_arg.status_change_nid_normal = node_arg.status_change_nid_normal;
+	ret = memory_notify(MEM_GOING_OFFLINE, &mem_arg);
 	ret = notifier_to_errno(ret);
 	if (ret) {
 		reason = "notifier failure";
@@ -2106,27 +2143,32 @@ int offline_pages(unsigned long start_pfn, unsigned long nr_pages,
 	 * Make sure to mark the node as memory-less before rebuilding the zone
 	 * list. Otherwise this node would still appear in the fallback lists.
 	 */
-	node_states_clear_node(node, &arg);
+	node_states_clear_node(node, &node_arg);
 	if (!populated_zone(zone)) {
 		zone_pcp_reset(zone);
 		build_all_zonelists(NULL);
 	}
 
-	if (arg.status_change_nid >= 0) {
+	if (node_arg.status_change_nid >= 0) {
 		kcompactd_stop(node);
 		kswapd_stop(node);
+		/*Node went memoryless. Notifiy interested consumers */
+		node_notify(NODE_BECAME_MEMORYLESS, &node_arg);
 	}
 
 	writeback_set_ratelimit();
 
-	memory_notify(MEM_OFFLINE, &arg);
+	memory_notify(MEM_OFFLINE, &mem_arg);
 	remove_pfn_range_from_zone(zone, start_pfn, nr_pages);
 	return 0;
 
 failed_removal_isolated:
 	/* pushback to free area */
 	undo_isolate_page_range(start_pfn, end_pfn, MIGRATE_MOVABLE);
-	memory_notify(MEM_CANCEL_OFFLINE, &arg);
+	if (cancel_node_notifier_on_err)
+		node_notify(NODE_CANCEL_MEMORYLESS, &node_arg);
+	if (cancel_mem_notifier_on_err)
+		memory_notify(MEM_CANCEL_OFFLINE, &mem_arg);
 failed_removal_pcplists_disabled:
 	lru_cache_enable();
 	zone_pcp_enable(zone);
diff --git a/mm/slub.c b/mm/slub.c
index 184fd2b14758..74350f6c8ddd 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -5928,10 +5928,10 @@ static int slab_mem_going_offline_callback(void *arg)
 
 static void slab_mem_offline_callback(void *arg)
 {
-	struct memory_notify *marg = arg;
+	struct node_notify *narg = arg;
 	int offline_node;
 
-	offline_node = marg->status_change_nid_normal;
+	offline_node = narg->status_change_nid_normal;
 
 	/*
 	 * If the node still has available memory. we need kmem_cache_node
@@ -5954,8 +5954,8 @@ static int slab_mem_going_online_callback(void *arg)
 {
 	struct kmem_cache_node *n;
 	struct kmem_cache *s;
-	struct memory_notify *marg = arg;
-	int nid = marg->status_change_nid_normal;
+	struct node_notify *narg = arg;
+	int nid = narg->status_change_nid_normal;
 	int ret = 0;
 
 	/*
@@ -6007,18 +6007,18 @@ static int slab_memory_callback(struct notifier_block *self,
 	int ret = 0;
 
 	switch (action) {
-	case MEM_GOING_ONLINE:
+	case NODE_BECOMING_MEM_AWARE:
 		ret = slab_mem_going_online_callback(arg);
 		break;
-	case MEM_GOING_OFFLINE:
+	case NODE_BECOMING_MEMORYLESS:
 		ret = slab_mem_going_offline_callback(arg);
 		break;
-	case MEM_OFFLINE:
-	case MEM_CANCEL_ONLINE:
+	case NODE_BECAME_MEMORYLESS:
+	case NODE_CANCEL_MEM_AWARE:
 		slab_mem_offline_callback(arg);
 		break;
-	case MEM_ONLINE:
-	case MEM_CANCEL_OFFLINE:
+	case NODE_BECAME_MEM_AWARE:
+	case NODE_CANCEL_MEMORYLESS:
 		break;
 	}
 	if (ret)
@@ -6094,7 +6094,7 @@ void __init kmem_cache_init(void)
 			sizeof(struct kmem_cache_node),
 			SLAB_HWCACHE_ALIGN | SLAB_NO_OBJ_EXT, 0, 0);
 
-	hotplug_memory_notifier(slab_memory_callback, SLAB_CALLBACK_PRI);
+	hotplug_node_notifier(slab_memory_callback, SLAB_CALLBACK_PRI);
 
 	/* Able to allocate the per node structures */
 	slab_state = PARTIAL;
-- 
2.49.0



^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH 2/2] mm,memory_hotplug: Replace status_change_nid parameter in memory_notify
  2025-04-01  9:27 [PATCH 0/2] Implement numa node notifier Oscar Salvador
  2025-04-01  9:27 ` [PATCH 1/2] mm,memory_hotplug: " Oscar Salvador
@ 2025-04-01  9:27 ` Oscar Salvador
  2025-04-02  2:53   ` Harry Yoo
  2025-04-02 16:09   ` Vlastimil Babka
  2025-04-02 16:06 ` [PATCH 0/2] Implement numa node notifier Vlastimil Babka
  2025-04-03 12:29 ` Jonathan Cameron
  3 siblings, 2 replies; 22+ messages in thread
From: Oscar Salvador @ 2025-04-01  9:27 UTC (permalink / raw)
  To: Andrew Morton
  Cc: David Hildenbrand, linux-mm, linux-kernel, Vlastimil Babka,
	Hyeonggon Yoo, mkoutny, Dan Williams, Jonathan Cameron,
	Oscar Salvador

memory notify consumers are only interested in which node the memory we
are adding belongs to, so replace current status_change_nid{_normal} fields
with only one that specifies the node.

Signed-off-by: Oscar Salvador <osalvador@suse.de>
---
 include/linux/memory.h |  3 +--
 mm/memory_hotplug.c    |  6 ++----
 mm/page_ext.c          | 12 +-----------
 3 files changed, 4 insertions(+), 17 deletions(-)

diff --git a/include/linux/memory.h b/include/linux/memory.h
index 1d814dfbb8a8..4d8884578a1a 100644
--- a/include/linux/memory.h
+++ b/include/linux/memory.h
@@ -117,8 +117,7 @@ struct memory_notify {
 	unsigned long altmap_nr_pages;
 	unsigned long start_pfn;
 	unsigned long nr_pages;
-	int status_change_nid_normal;
-	int status_change_nid;
+	int nid;
 };
 
 struct node_notify {
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 4bb9ff282ec9..185d799c79e2 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1198,6 +1198,7 @@ int online_pages(unsigned long pfn, unsigned long nr_pages,
 
 	mem_arg.start_pfn = pfn;
 	mem_arg.nr_pages = nr_pages;
+	mem_arg.nid = nid;
 	node_states_check_changes_online(nr_pages, zone, &node_arg);
 
 	if (node_arg.status_change_nid >= 0) {
@@ -1210,8 +1211,6 @@ int online_pages(unsigned long pfn, unsigned long nr_pages,
 	}
 
 	cancel_mem_notifier_on_err = true;
-	mem_arg.status_change_nid = node_arg.status_change_nid;
-	mem_arg.status_change_nid_normal = node_arg.status_change_nid_normal;
 	ret = memory_notify(MEM_GOING_ONLINE, &mem_arg);
 	ret = notifier_to_errno(ret);
 	if (ret)
@@ -2050,6 +2049,7 @@ int offline_pages(unsigned long start_pfn, unsigned long nr_pages,
 
 	mem_arg.start_pfn = start_pfn;
 	mem_arg.nr_pages = nr_pages;
+	mem_arg.nid = node;
 	node_states_check_changes_offline(nr_pages, zone, &node_arg);
 
 	if (node_arg.status_change_nid >= 0) {
@@ -2061,8 +2061,6 @@ int offline_pages(unsigned long start_pfn, unsigned long nr_pages,
 	}
 
 	cancel_mem_notifier_on_err = true;
-	mem_arg.status_change_nid = node_arg.status_change_nid;
-	mem_arg.status_change_nid_normal = node_arg.status_change_nid_normal;
 	ret = memory_notify(MEM_GOING_OFFLINE, &mem_arg);
 	ret = notifier_to_errno(ret);
 	if (ret) {
diff --git a/mm/page_ext.c b/mm/page_ext.c
index c351fdfe9e9a..477e6f24b7ab 100644
--- a/mm/page_ext.c
+++ b/mm/page_ext.c
@@ -378,16 +378,6 @@ static int __meminit online_page_ext(unsigned long start_pfn,
 	start = SECTION_ALIGN_DOWN(start_pfn);
 	end = SECTION_ALIGN_UP(start_pfn + nr_pages);
 
-	if (nid == NUMA_NO_NODE) {
-		/*
-		 * In this case, "nid" already exists and contains valid memory.
-		 * "start_pfn" passed to us is a pfn which is an arg for
-		 * online__pages(), and start_pfn should exist.
-		 */
-		nid = pfn_to_nid(start_pfn);
-		VM_BUG_ON(!node_online(nid));
-	}
-
 	for (pfn = start; !fail && pfn < end; pfn += PAGES_PER_SECTION)
 		fail = init_section_page_ext(pfn, nid);
 	if (!fail)
@@ -436,7 +426,7 @@ static int __meminit page_ext_callback(struct notifier_block *self,
 	switch (action) {
 	case MEM_GOING_ONLINE:
 		ret = online_page_ext(mn->start_pfn,
-				   mn->nr_pages, mn->status_change_nid);
+				   mn->nr_pages, mn->nid);
 		break;
 	case MEM_OFFLINE:
 		offline_page_ext(mn->start_pfn,
-- 
2.49.0



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 1/2] mm,memory_hotplug: Implement numa node notifier
  2025-04-01  9:27 ` [PATCH 1/2] mm,memory_hotplug: " Oscar Salvador
@ 2025-04-01 14:19   ` Harry Yoo
  2025-04-02 16:03   ` Vlastimil Babka
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 22+ messages in thread
From: Harry Yoo @ 2025-04-01 14:19 UTC (permalink / raw)
  To: Oscar Salvador
  Cc: Andrew Morton, David Hildenbrand, linux-mm, linux-kernel,
	Vlastimil Babka, Hyeonggon Yoo, mkoutny, Dan Williams,
	Jonathan Cameron

On Tue, Apr 01, 2025 at 11:27:15AM +0200, Oscar Salvador wrote:
> There are at least four consumers of hotplug_memory_notifier that what they
> really are interested in is whether any numa node changed its state, e.g: going
> from being memory aware to becoming memoryless.
> 
> Implement a specific notifier for numa nodes when their state gets changed,
> and have those consumers that only care about numa node state changes use it.
> 
> Signed-off-by: Oscar Salvador <osalvador@suse.de>
> ---

Hi Oscar,

Please Cc harry.yoo@oracle.com instead of 42.hyeyoo@gmail.com
in the future—I recently changed my email.

FWIW, I reviewed this patch and it looks good to me:

Reviewed-by: Harry Yoo <harry.yoo@oracle.com>

>  drivers/acpi/numa/hmat.c  |  6 +--
>  drivers/base/node.c       | 19 +++++++++
>  drivers/cxl/core/region.c | 14 +++----
>  drivers/cxl/cxl.h         |  4 +-
>  include/linux/memory.h    | 38 ++++++++++++++++++
>  kernel/cgroup/cpuset.c    |  2 +-
>  mm/memory-tiers.c         |  8 ++--
>  mm/memory_hotplug.c       | 84 +++++++++++++++++++++++++++++----------
>  mm/slub.c                 | 22 +++++-----
>  9 files changed, 148 insertions(+), 49 deletions(-)
>  
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index 75401866fb76..4bb9ff282ec9 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -1963,7 +1987,9 @@ int offline_pages(unsigned long start_pfn, unsigned long nr_pages,
>  	unsigned long pfn, managed_pages, system_ram_pages = 0;
>  	const int node = zone_to_nid(zone);
>  	unsigned long flags;
> -	struct memory_notify arg;
> +	struct memory_notify mem_arg;
> +	struct node_notify node_arg;
> +	bool cancel_mem_notifier_on_err = false, cancel_node_notifier_on_err = false;
>  	char *reason;
>  	int ret;
>  

[...]

> +	if (node_arg.status_change_nid >= 0) {
>  		kcompactd_stop(node);
>  		kswapd_stop(node);
> +		/*Node went memoryless. Notifiy interested consumers */
> +		node_notify(NODE_BECAME_MEMORYLESS, &node_arg);
>  	}

nit: /* Node went memoryless. ... */
       ^ a whitespace omitted.

>  	writeback_set_ratelimit();
>  
> -	memory_notify(MEM_OFFLINE, &arg);
> +	memory_notify(MEM_OFFLINE, &mem_arg);
>  	remove_pfn_range_from_zone(zone, start_pfn, nr_pages);
>  	return 0;

> diff --git a/mm/slub.c b/mm/slub.c
> index 184fd2b14758..74350f6c8ddd 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -5928,10 +5928,10 @@ static int slab_mem_going_offline_callback(void *arg)
>  
>  static void slab_mem_offline_callback(void *arg)
>  {
> -	struct memory_notify *marg = arg;
> +	struct node_notify *narg = arg;
>  	int offline_node;
>  
> -	offline_node = marg->status_change_nid_normal;
> +	offline_node = narg->status_change_nid_normal;
>  
>  	/*
>  	 * If the node still has available memory. we need kmem_cache_node

I was wondering if this offline_node check is still necessary after
this patch, but as SLUB is still notified for N_HIGH_MEMORY-only nodes,
it is necessary.

-- 
Cheers,
Harry (formerly known as Hyeonggon)


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 2/2] mm,memory_hotplug: Replace status_change_nid parameter in memory_notify
  2025-04-01  9:27 ` [PATCH 2/2] mm,memory_hotplug: Replace status_change_nid parameter in memory_notify Oscar Salvador
@ 2025-04-02  2:53   ` Harry Yoo
  2025-04-02 16:09   ` Vlastimil Babka
  1 sibling, 0 replies; 22+ messages in thread
From: Harry Yoo @ 2025-04-02  2:53 UTC (permalink / raw)
  To: Oscar Salvador
  Cc: Andrew Morton, David Hildenbrand, linux-mm, linux-kernel,
	Vlastimil Babka, Hyeonggon Yoo, mkoutny, Dan Williams,
	Jonathan Cameron

On Tue, Apr 01, 2025 at 11:27:16AM +0200, Oscar Salvador wrote:
> memory notify consumers are only interested in which node the memory we
> are adding belongs to, so replace current status_change_nid{_normal} fields
> with only one that specifies the node.
> 
> Signed-off-by: Oscar Salvador <osalvador@suse.de>
> ---

Looks good to me,
Reviewed-by: Harry Yoo <harry.yoo@oracle.com>

>  include/linux/memory.h |  3 +--
>  mm/memory_hotplug.c    |  6 ++----
>  mm/page_ext.c          | 12 +-----------
>  3 files changed, 4 insertions(+), 17 deletions(-)
> 
> diff --git a/mm/page_ext.c b/mm/page_ext.c
> index c351fdfe9e9a..477e6f24b7ab 100644
> --- a/mm/page_ext.c
> +++ b/mm/page_ext.c
> @@ -378,16 +378,6 @@ static int __meminit online_page_ext(unsigned long start_pfn,
>  	start = SECTION_ALIGN_DOWN(start_pfn);
>  	end = SECTION_ALIGN_UP(start_pfn + nr_pages);
>  
> -	if (nid == NUMA_NO_NODE) {
> -		/*
> -		 * In this case, "nid" already exists and contains valid memory.
> -		 * "start_pfn" passed to us is a pfn which is an arg for
> -		 * online__pages(), and start_pfn should exist.
> -		 */
> -		nid = pfn_to_nid(start_pfn);
> -		VM_BUG_ON(!node_online(nid));
> -	}

Ok, now that users who care 'status change' are using node notifier,
mem_arg.nid is the NUMA node the pfn range belongs to,
and always not NUMA_NO_NODE.

>  	for (pfn = start; !fail && pfn < end; pfn += PAGES_PER_SECTION)
>  		fail = init_section_page_ext(pfn, nid);
>  	if (!fail)

-- 
Cheers,
Harry (formerly known as Hyeonggon)


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 1/2] mm,memory_hotplug: Implement numa node notifier
  2025-04-01  9:27 ` [PATCH 1/2] mm,memory_hotplug: " Oscar Salvador
  2025-04-01 14:19   ` Harry Yoo
@ 2025-04-02 16:03   ` Vlastimil Babka
  2025-04-02 16:57     ` Oscar Salvador
  2025-04-03 12:44   ` Jonathan Cameron
  2025-04-04 10:09   ` David Hildenbrand
  3 siblings, 1 reply; 22+ messages in thread
From: Vlastimil Babka @ 2025-04-02 16:03 UTC (permalink / raw)
  To: Oscar Salvador, Andrew Morton
  Cc: David Hildenbrand, linux-mm, linux-kernel, Hyeonggon Yoo,
	mkoutny, Dan Williams, Jonathan Cameron

On 4/1/25 11:27, Oscar Salvador wrote:
> There are at least four consumers of hotplug_memory_notifier that what they
> really are interested in is whether any numa node changed its state, e.g: going
> from being memory aware to becoming memoryless.
> 
> Implement a specific notifier for numa nodes when their state gets changed,
> and have those consumers that only care about numa node state changes use it.
> 
> Signed-off-by: Oscar Salvador <osalvador@suse.de>

<snip>

> -static void node_states_set_node(int node, struct memory_notify *arg)
> +static void node_states_set_node(int node, struct node_notify *arg)
>  {
>  	if (arg->status_change_nid_normal >= 0)
>  		node_set_state(node, N_NORMAL_MEMORY);
> @@ -1177,7 +1177,9 @@ int online_pages(unsigned long pfn, unsigned long nr_pages,
>  	int need_zonelists_rebuild = 0;
>  	const int nid = zone_to_nid(zone);
>  	int ret;
> -	struct memory_notify arg;
> +	struct memory_notify mem_arg;
> +	struct node_notify node_arg;
> +	bool cancel_mem_notifier_on_err = false, cancel_node_notifier_on_err = false;
>  
>  	/*
>  	 * {on,off}lining is constrained to full memory sections (or more
> @@ -1194,11 +1196,23 @@ int online_pages(unsigned long pfn, unsigned long nr_pages,
>  	/* associate pfn range with the zone */
>  	move_pfn_range_to_zone(zone, pfn, nr_pages, NULL, MIGRATE_ISOLATE);
>  
> -	arg.start_pfn = pfn;
> -	arg.nr_pages = nr_pages;
> -	node_states_check_changes_online(nr_pages, zone, &arg);
> +	mem_arg.start_pfn = pfn;
> +	mem_arg.nr_pages = nr_pages;
> +	node_states_check_changes_online(nr_pages, zone, &node_arg);
>  
> -	ret = memory_notify(MEM_GOING_ONLINE, &arg);
> +	if (node_arg.status_change_nid >= 0) {

Hmm, don't we need to add "|| node_arg.status_change_nid_normal >= 0"? Or we
fail to notify addition of normal memory to a node that already has !normal
memory?

> +		/* Node is becoming memory aware. Notify consumers */
> +		cancel_node_notifier_on_err = true;
> +		ret = node_notify(NODE_BECOMING_MEM_AWARE, &node_arg);
> +		ret = notifier_to_errno(ret);
> +		if (ret)
> +			goto failed_addition;
> +	}
> +
> +	cancel_mem_notifier_on_err = true;
> +	mem_arg.status_change_nid = node_arg.status_change_nid;
> +	mem_arg.status_change_nid_normal = node_arg.status_change_nid_normal;
> +	ret = memory_notify(MEM_GOING_ONLINE, &mem_arg);
>  	ret = notifier_to_errno(ret);
>  	if (ret)
>  		goto failed_addition;
> @@ -1224,7 +1238,7 @@ int online_pages(unsigned long pfn, unsigned long nr_pages,
>  	online_pages_range(pfn, nr_pages);
>  	adjust_present_page_count(pfn_to_page(pfn), group, nr_pages);
>  
> -	node_states_set_node(nid, &arg);
> +	node_states_set_node(nid, &node_arg);
>  	if (need_zonelists_rebuild)
>  		build_all_zonelists(NULL);
>  
> @@ -1245,16 +1259,26 @@ int online_pages(unsigned long pfn, unsigned long nr_pages,
>  	kswapd_run(nid);
>  	kcompactd_run(nid);
>  
> +	if (node_arg.status_change_nid >= 0)
> +		/*
> +		 * Node went from memoryless to have memory. Notifiy interested
> +		 * consumers
> +		 */
> +		node_notify(NODE_BECAME_MEM_AWARE, &node_arg);
> +
>  	writeback_set_ratelimit();
>  
> -	memory_notify(MEM_ONLINE, &arg);
> +	memory_notify(MEM_ONLINE, &mem_arg);
>  	return 0;
>  
>  failed_addition:
>  	pr_debug("online_pages [mem %#010llx-%#010llx] failed\n",
>  		 (unsigned long long) pfn << PAGE_SHIFT,
>  		 (((unsigned long long) pfn + nr_pages) << PAGE_SHIFT) - 1);
> -	memory_notify(MEM_CANCEL_ONLINE, &arg);
> +	if (cancel_node_notifier_on_err)
> +		node_notify(NODE_CANCEL_MEM_AWARE, &node_arg);
> +	if (cancel_mem_notifier_on_err)
> +		memory_notify(MEM_CANCEL_ONLINE, &mem_arg);

Switch the order of those just for symmetry? :)

>  	remove_pfn_range_from_zone(zone, pfn, nr_pages);
>  	return ret;
>  }
> @@ -1898,7 +1922,7 @@ early_param("movable_node", cmdline_parse_movable_node);
>  
>  /* check which state of node_states will be changed when offline memory */
>  static void node_states_check_changes_offline(unsigned long nr_pages,
> -		struct zone *zone, struct memory_notify *arg)
> +		struct zone *zone, struct node_notify *arg)
>  {
>  	struct pglist_data *pgdat = zone->zone_pgdat;
>  	unsigned long present_pages = 0;
> @@ -1935,7 +1959,7 @@ static void node_states_check_changes_offline(unsigned long nr_pages,
>  		arg->status_change_nid = zone_to_nid(zone);
>  }
>  
> -static void node_states_clear_node(int node, struct memory_notify *arg)
> +static void node_states_clear_node(int node, struct node_notify *arg)
>  {
>  	if (arg->status_change_nid_normal >= 0)
>  		node_clear_state(node, N_NORMAL_MEMORY);
> @@ -1963,7 +1987,9 @@ int offline_pages(unsigned long start_pfn, unsigned long nr_pages,
>  	unsigned long pfn, managed_pages, system_ram_pages = 0;
>  	const int node = zone_to_nid(zone);
>  	unsigned long flags;
> -	struct memory_notify arg;
> +	struct memory_notify mem_arg;
> +	struct node_notify node_arg;
> +	bool cancel_mem_notifier_on_err = false, cancel_node_notifier_on_err = false;
>  	char *reason;
>  	int ret;
>  
> @@ -2022,11 +2048,22 @@ int offline_pages(unsigned long start_pfn, unsigned long nr_pages,
>  		goto failed_removal_pcplists_disabled;
>  	}
>  
> -	arg.start_pfn = start_pfn;
> -	arg.nr_pages = nr_pages;
> -	node_states_check_changes_offline(nr_pages, zone, &arg);
> +	mem_arg.start_pfn = start_pfn;
> +	mem_arg.nr_pages = nr_pages;
> +	node_states_check_changes_offline(nr_pages, zone, &node_arg);
> +
> +	if (node_arg.status_change_nid >= 0) {

Ditto.

> +		cancel_node_notifier_on_err = true;
> +		ret = node_notify(NODE_BECOMING_MEMORYLESS, &node_arg);
> +		ret = notifier_to_errno(ret);
> +		if (ret)
> +			goto failed_removal_isolated;
> +	}
>  
> -	ret = memory_notify(MEM_GOING_OFFLINE, &arg);
> +	cancel_mem_notifier_on_err = true;
> +	mem_arg.status_change_nid = node_arg.status_change_nid;
> +	mem_arg.status_change_nid_normal = node_arg.status_change_nid_normal;
> +	ret = memory_notify(MEM_GOING_OFFLINE, &mem_arg);
>  	ret = notifier_to_errno(ret);
>  	if (ret) {
>  		reason = "notifier failure";
> @@ -2106,27 +2143,32 @@ int offline_pages(unsigned long start_pfn, unsigned long nr_pages,
>  	 * Make sure to mark the node as memory-less before rebuilding the zone
>  	 * list. Otherwise this node would still appear in the fallback lists.
>  	 */
> -	node_states_clear_node(node, &arg);
> +	node_states_clear_node(node, &node_arg);
>  	if (!populated_zone(zone)) {
>  		zone_pcp_reset(zone);
>  		build_all_zonelists(NULL);
>  	}
>  
> -	if (arg.status_change_nid >= 0) {
> +	if (node_arg.status_change_nid >= 0) {
>  		kcompactd_stop(node);
>  		kswapd_stop(node);
> +		/*Node went memoryless. Notifiy interested consumers */
> +		node_notify(NODE_BECAME_MEMORYLESS, &node_arg);
>  	}
>  
>  	writeback_set_ratelimit();
>  
> -	memory_notify(MEM_OFFLINE, &arg);
> +	memory_notify(MEM_OFFLINE, &mem_arg);
>  	remove_pfn_range_from_zone(zone, start_pfn, nr_pages);
>  	return 0;
>  
>  failed_removal_isolated:
>  	/* pushback to free area */
>  	undo_isolate_page_range(start_pfn, end_pfn, MIGRATE_MOVABLE);
> -	memory_notify(MEM_CANCEL_OFFLINE, &arg);
> +	if (cancel_node_notifier_on_err)
> +		node_notify(NODE_CANCEL_MEMORYLESS, &node_arg);
> +	if (cancel_mem_notifier_on_err)
> +		memory_notify(MEM_CANCEL_OFFLINE, &mem_arg);

Ditto.

>  failed_removal_pcplists_disabled:
>  	lru_cache_enable();
>  	zone_pcp_enable(zone);
> diff --git a/mm/slub.c b/mm/slub.c
> index 184fd2b14758..74350f6c8ddd 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -5928,10 +5928,10 @@ static int slab_mem_going_offline_callback(void *arg)
>  
>  static void slab_mem_offline_callback(void *arg)
>  {
> -	struct memory_notify *marg = arg;
> +	struct node_notify *narg = arg;
>  	int offline_node;
>  
> -	offline_node = marg->status_change_nid_normal;
> +	offline_node = narg->status_change_nid_normal;
>  
>  	/*
>  	 * If the node still has available memory. we need kmem_cache_node
> @@ -5954,8 +5954,8 @@ static int slab_mem_going_online_callback(void *arg)
>  {
>  	struct kmem_cache_node *n;
>  	struct kmem_cache *s;
> -	struct memory_notify *marg = arg;
> -	int nid = marg->status_change_nid_normal;
> +	struct node_notify *narg = arg;
> +	int nid = narg->status_change_nid_normal;
>  	int ret = 0;
>  
>  	/*
> @@ -6007,18 +6007,18 @@ static int slab_memory_callback(struct notifier_block *self,
>  	int ret = 0;
>  
>  	switch (action) {
> -	case MEM_GOING_ONLINE:
> +	case NODE_BECOMING_MEM_AWARE:
>  		ret = slab_mem_going_online_callback(arg);
>  		break;
> -	case MEM_GOING_OFFLINE:
> +	case NODE_BECOMING_MEMORYLESS:
>  		ret = slab_mem_going_offline_callback(arg);
>  		break;
> -	case MEM_OFFLINE:
> -	case MEM_CANCEL_ONLINE:
> +	case NODE_BECAME_MEMORYLESS:
> +	case NODE_CANCEL_MEM_AWARE:
>  		slab_mem_offline_callback(arg);
>  		break;
> -	case MEM_ONLINE:
> -	case MEM_CANCEL_OFFLINE:
> +	case NODE_BECAME_MEM_AWARE:
> +	case NODE_CANCEL_MEMORYLESS:
>  		break;
>  	}
>  	if (ret)
> @@ -6094,7 +6094,7 @@ void __init kmem_cache_init(void)
>  			sizeof(struct kmem_cache_node),
>  			SLAB_HWCACHE_ALIGN | SLAB_NO_OBJ_EXT, 0, 0);
>  
> -	hotplug_memory_notifier(slab_memory_callback, SLAB_CALLBACK_PRI);
> +	hotplug_node_notifier(slab_memory_callback, SLAB_CALLBACK_PRI);
>  
>  	/* Able to allocate the per node structures */
>  	slab_state = PARTIAL;



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 0/2] Implement numa node notifier
  2025-04-01  9:27 [PATCH 0/2] Implement numa node notifier Oscar Salvador
  2025-04-01  9:27 ` [PATCH 1/2] mm,memory_hotplug: " Oscar Salvador
  2025-04-01  9:27 ` [PATCH 2/2] mm,memory_hotplug: Replace status_change_nid parameter in memory_notify Oscar Salvador
@ 2025-04-02 16:06 ` Vlastimil Babka
  2025-04-02 17:03   ` Oscar Salvador
  2025-04-03 12:29 ` Jonathan Cameron
  3 siblings, 1 reply; 22+ messages in thread
From: Vlastimil Babka @ 2025-04-02 16:06 UTC (permalink / raw)
  To: Oscar Salvador, Andrew Morton
  Cc: David Hildenbrand, linux-mm, linux-kernel, Hyeonggon Yoo,
	mkoutny, Dan Williams, Jonathan Cameron

On 4/1/25 11:27, Oscar Salvador wrote:
> Memory notifier is a tool that allow consumers to get notified whenever
> memory gets onlined or offlined in the system.
> Currently, there are 10 consumers of that, but 5 out of those 10 consumers
> are only interested in getting notifications when a numa node has changed its
> state.
> That means going from memoryless to memory-aware of vice versa.
> 
> Which means that for every {online,offline}_pages operation they get
> notified even though the numa node might not have changed its state.
> 
> The first patch implements a numa node notifier that does just that, and have
> those consumers register in there, so they get notified only when they are
> interested.

What if we had two chains:

register_node_notifier()
register_node_normal_notifier()

I think they could have shared the state #defines and struct node_notify
would have just one nid and be always >= 0.

Or would it add too much extra boilerplate and only slab cares?

> The second patch replaces 'status_change_normal{_normal}' fields within
> memory_notify with a 'nid', as that is only what we need for memory
> notifer and update the only user of it (page_ext).
> 
> Consumers that are only interested in numa node states change are:
> 
>  - memory-tier
>  - slub
>  - cpuset
>  - hmat
>  - cxl
> 
> 
> Oscar Salvador (2):
>   mm,memory_hotplug: Implement numa node notifier
>   mm,memory_hotplug: Replace status_change_nid parameter in
>     memory_notify
> 
>  drivers/acpi/numa/hmat.c  |  6 +--
>  drivers/base/node.c       | 19 +++++++++
>  drivers/cxl/core/region.c | 14 +++----
>  drivers/cxl/cxl.h         |  4 +-
>  include/linux/memory.h    | 37 ++++++++++++++++++
>  kernel/cgroup/cpuset.c    |  2 +-
>  mm/memory-tiers.c         |  8 ++--
>  mm/memory_hotplug.c       | 82 +++++++++++++++++++++++++++++----------
>  mm/page_ext.c             | 12 +-----
>  mm/slub.c                 | 22 +++++------
>  10 files changed, 146 insertions(+), 60 deletions(-)
> 



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 2/2] mm,memory_hotplug: Replace status_change_nid parameter in memory_notify
  2025-04-01  9:27 ` [PATCH 2/2] mm,memory_hotplug: Replace status_change_nid parameter in memory_notify Oscar Salvador
  2025-04-02  2:53   ` Harry Yoo
@ 2025-04-02 16:09   ` Vlastimil Babka
  1 sibling, 0 replies; 22+ messages in thread
From: Vlastimil Babka @ 2025-04-02 16:09 UTC (permalink / raw)
  To: Oscar Salvador, Andrew Morton
  Cc: David Hildenbrand, linux-mm, linux-kernel, Hyeonggon Yoo,
	mkoutny, Dan Williams, Jonathan Cameron

On 4/1/25 11:27, Oscar Salvador wrote:
> memory notify consumers are only interested in which node the memory we
> are adding belongs to, so replace current status_change_nid{_normal} fields
> with only one that specifies the node.
> 
> Signed-off-by: Oscar Salvador <osalvador@suse.de>

Reviewed-by: Vlastimil Babka <vbabka@suse.cz>




^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 1/2] mm,memory_hotplug: Implement numa node notifier
  2025-04-02 16:03   ` Vlastimil Babka
@ 2025-04-02 16:57     ` Oscar Salvador
  0 siblings, 0 replies; 22+ messages in thread
From: Oscar Salvador @ 2025-04-02 16:57 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Andrew Morton, David Hildenbrand, linux-mm, linux-kernel,
	Hyeonggon Yoo, mkoutny, Dan Williams, Jonathan Cameron

On Wed, Apr 02, 2025 at 06:03:04PM +0200, Vlastimil Babka wrote:
> On 4/1/25 11:27, Oscar Salvador wrote:
> > -	arg.start_pfn = pfn;
> > -	arg.nr_pages = nr_pages;
> > -	node_states_check_changes_online(nr_pages, zone, &arg);
> > +	mem_arg.start_pfn = pfn;
> > +	mem_arg.nr_pages = nr_pages;
> > +	node_states_check_changes_online(nr_pages, zone, &node_arg);
> >  
> > -	ret = memory_notify(MEM_GOING_ONLINE, &arg);
> > +	if (node_arg.status_change_nid >= 0) {
> 
> Hmm, don't we need to add "|| node_arg.status_change_nid_normal >= 0"? Or we
> fail to notify addition of normal memory to a node that already has !normal
> memory?

Ah, I think you are right, nice catch.
Yes, if we add normal (<= ZONE_NORMAL) memory to a node that only has ZONE_MOVABLE,
status_change_nid_normal will change, but status_change_nid will not.

So yes, we need a "|| node_arg.status_change_nid_normal >= 0".

> > -	memory_notify(MEM_CANCEL_ONLINE, &arg);
> > +	if (cancel_node_notifier_on_err)
> > +		node_notify(NODE_CANCEL_MEM_AWARE, &node_arg);
> > +	if (cancel_mem_notifier_on_err)
> > +		memory_notify(MEM_CANCEL_ONLINE, &mem_arg);
> 
> Switch the order of those just for symmetry? :)

Will do.

> > -	arg.start_pfn = start_pfn;
> > -	arg.nr_pages = nr_pages;
> > -	node_states_check_changes_offline(nr_pages, zone, &arg);
> > +	mem_arg.start_pfn = start_pfn;
> > +	mem_arg.nr_pages = nr_pages;
> > +	node_states_check_changes_offline(nr_pages, zone, &node_arg);
> > +
> > +	if (node_arg.status_change_nid >= 0) {
> 
> Ditto.

Yap.

> >  failed_removal_isolated:
> >  	/* pushback to free area */
> >  	undo_isolate_page_range(start_pfn, end_pfn, MIGRATE_MOVABLE);
> > -	memory_notify(MEM_CANCEL_OFFLINE, &arg);
> > +	if (cancel_node_notifier_on_err)
> > +		node_notify(NODE_CANCEL_MEMORYLESS, &node_arg);
> > +	if (cancel_mem_notifier_on_err)
> > +		memory_notify(MEM_CANCEL_OFFLINE, &mem_arg);
> 
> Ditto.

Copy that.

Thanks Vlastimil!

-- 
Oscar Salvador
SUSE Labs


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 0/2] Implement numa node notifier
  2025-04-02 16:06 ` [PATCH 0/2] Implement numa node notifier Vlastimil Babka
@ 2025-04-02 17:03   ` Oscar Salvador
  2025-04-03 13:02     ` David Hildenbrand
  0 siblings, 1 reply; 22+ messages in thread
From: Oscar Salvador @ 2025-04-02 17:03 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Andrew Morton, David Hildenbrand, linux-mm, linux-kernel,
	Hyeonggon Yoo, mkoutny, Dan Williams, Jonathan Cameron

On Wed, Apr 02, 2025 at 06:06:51PM +0200, Vlastimil Babka wrote:
> What if we had two chains:
> 
> register_node_notifier()
> register_node_normal_notifier()
> 
> I think they could have shared the state #defines and struct node_notify
> would have just one nid and be always >= 0.
> 
> Or would it add too much extra boilerplate and only slab cares?

We could indeed go on that direction to try to decouple
status_change_nid from status_change_nid_normal.

Although as you said, slub is the only user of status_change_nid_normal
for the time beign, so I am not sure of adding a second chain for only
one user.

Might look cleaner though, and the advantatge is that slub would not get
notified for nodes adquiring only ZONE_MOVABLE.

Let us see what David thinks about it.

thanks for the suggestion ;-)

-- 
Oscar Salvador
SUSE Labs

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 0/2] Implement numa node notifier
  2025-04-01  9:27 [PATCH 0/2] Implement numa node notifier Oscar Salvador
                   ` (2 preceding siblings ...)
  2025-04-02 16:06 ` [PATCH 0/2] Implement numa node notifier Vlastimil Babka
@ 2025-04-03 12:29 ` Jonathan Cameron
  3 siblings, 0 replies; 22+ messages in thread
From: Jonathan Cameron @ 2025-04-03 12:29 UTC (permalink / raw)
  To: Oscar Salvador
  Cc: Andrew Morton, David Hildenbrand, linux-mm, linux-kernel,
	Vlastimil Babka, Hyeonggon Yoo, mkoutny, Dan Williams, linux-cxl

On Tue,  1 Apr 2025 11:27:14 +0200
Oscar Salvador <osalvador@suse.de> wrote:

> Memory notifier is a tool that allow consumers to get notified whenever
> memory gets onlined or offlined in the system.
> Currently, there are 10 consumers of that, but 5 out of those 10 consumers
> are only interested in getting notifications when a numa node has changed its
> state.
> That means going from memoryless to memory-aware of vice versa.
> 
> Which means that for every {online,offline}_pages operation they get
> notified even though the numa node might not have changed its state.
> 
> The first patch implements a numa node notifier that does just that, and have
> those consumers register in there, so they get notified only when they are
> interested.
> 
> The second patch replaces 'status_change_normal{_normal}' fields within
> memory_notify with a 'nid', as that is only what we need for memory
> notifer and update the only user of it (page_ext).
> 
> Consumers that are only interested in numa node states change are:
> 
>  - memory-tier
>  - slub
>  - cpuset
>  - hmat
>  - cxl
> 
Hi Oscar,

Idea seems good to me.

+CC linux-cxl for information of others not on the thread.

> 
> Oscar Salvador (2):
>   mm,memory_hotplug: Implement numa node notifier
>   mm,memory_hotplug: Replace status_change_nid parameter in
>     memory_notify
> 
>  drivers/acpi/numa/hmat.c  |  6 +--
>  drivers/base/node.c       | 19 +++++++++
>  drivers/cxl/core/region.c | 14 +++----
>  drivers/cxl/cxl.h         |  4 +-
>  include/linux/memory.h    | 37 ++++++++++++++++++
>  kernel/cgroup/cpuset.c    |  2 +-
>  mm/memory-tiers.c         |  8 ++--
>  mm/memory_hotplug.c       | 82 +++++++++++++++++++++++++++++----------
>  mm/page_ext.c             | 12 +-----
>  mm/slub.c                 | 22 +++++------
>  10 files changed, 146 insertions(+), 60 deletions(-)
> 



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 1/2] mm,memory_hotplug: Implement numa node notifier
  2025-04-01  9:27 ` [PATCH 1/2] mm,memory_hotplug: " Oscar Salvador
  2025-04-01 14:19   ` Harry Yoo
  2025-04-02 16:03   ` Vlastimil Babka
@ 2025-04-03 12:44   ` Jonathan Cameron
  2025-04-04 10:09   ` David Hildenbrand
  3 siblings, 0 replies; 22+ messages in thread
From: Jonathan Cameron @ 2025-04-03 12:44 UTC (permalink / raw)
  To: Oscar Salvador
  Cc: Andrew Morton, David Hildenbrand, linux-mm, linux-kernel,
	Vlastimil Babka, Hyeonggon Yoo, mkoutny, Dan Williams, linux-cxl

On Tue,  1 Apr 2025 11:27:15 +0200
Oscar Salvador <osalvador@suse.de> wrote:

> There are at least four consumers of hotplug_memory_notifier that what they
> really are interested in is whether any numa node changed its state, e.g: going
> from being memory aware to becoming memoryless.

Cover letter says 5.  Whilst that's at least 4, maybe update this if you do
a v2 to say at least 5 :)

> 
> Implement a specific notifier for numa nodes when their state gets changed,
> and have those consumers that only care about numa node state changes use it.
> 
> Signed-off-by: Oscar Salvador <osalvador@suse.de>

A couple of trivial things below.  To me this looks fine otherwise.
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>

> +#define hotplug_node_notifier(fn, pri) ({		\
> +	static __meminitdata struct notifier_block fn##_node_nb =\
> +		{ .notifier_call = fn, .priority = pri };\
> +	register_node_notifier(&fn##_node_nb);			\
> +})

Trivial but spacing before \ seems rather random. Maybe I'm missing
how it is consistent.

> +
>  #ifdef CONFIG_NUMA
>  void memory_block_add_nid(struct memory_block *mem, int nid,
>  			  enum meminit_context context);

> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index 75401866fb76..4bb9ff282ec9 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c


> @@ -2106,27 +2143,32 @@ int offline_pages(unsigned long start_pfn, unsigned long nr_pages,
>  	 * Make sure to mark the node as memory-less before rebuilding the zone
>  	 * list. Otherwise this node would still appear in the fallback lists.
>  	 */
> -	node_states_clear_node(node, &arg);
> +	node_states_clear_node(node, &node_arg);
>  	if (!populated_zone(zone)) {
>  		zone_pcp_reset(zone);
>  		build_all_zonelists(NULL);
>  	}
>  
> -	if (arg.status_change_nid >= 0) {
> +	if (node_arg.status_change_nid >= 0) {
>  		kcompactd_stop(node);
>  		kswapd_stop(node);
> +		/*Node went memoryless. Notifiy interested consumers */

Trivial: missing space after *

> +		node_notify(NODE_BECAME_MEMORYLESS, &node_arg);
>  	}
>  
>  	writeback_set_ratelimit();
>  
> -	memory_notify(MEM_OFFLINE, &arg);
> +	memory_notify(MEM_OFFLINE, &mem_arg);
>  	remove_pfn_range_from_zone(zone, start_pfn, nr_pages);
>  	return 0;
>  
>  failed_removal_isolated:
>  	/* pushback to free area */
>  	undo_isolate_page_range(start_pfn, end_pfn, MIGRATE_MOVABLE);
> -	memory_notify(MEM_CANCEL_OFFLINE, &arg);
> +	if (cancel_node_notifier_on_err)
> +		node_notify(NODE_CANCEL_MEMORYLESS, &node_arg);
> +	if (cancel_mem_notifier_on_err)
> +		memory_notify(MEM_CANCEL_OFFLINE, &mem_arg);
>  failed_removal_pcplists_disabled:
>  	lru_cache_enable();
>  	zone_pcp_enable(zone);




^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 0/2] Implement numa node notifier
  2025-04-02 17:03   ` Oscar Salvador
@ 2025-04-03 13:02     ` David Hildenbrand
  2025-04-03 13:08       ` David Hildenbrand
  2025-04-03 22:06       ` Harry Yoo
  0 siblings, 2 replies; 22+ messages in thread
From: David Hildenbrand @ 2025-04-03 13:02 UTC (permalink / raw)
  To: Oscar Salvador, Vlastimil Babka
  Cc: Andrew Morton, linux-mm, linux-kernel, Hyeonggon Yoo, mkoutny,
	Dan Williams, Jonathan Cameron

On 02.04.25 19:03, Oscar Salvador wrote:
> On Wed, Apr 02, 2025 at 06:06:51PM +0200, Vlastimil Babka wrote:
>> What if we had two chains:
>>
>> register_node_notifier()
>> register_node_normal_notifier()
>>
>> I think they could have shared the state #defines and struct node_notify
>> would have just one nid and be always >= 0.
>>
>> Or would it add too much extra boilerplate and only slab cares?
> 
> We could indeed go on that direction to try to decouple
> status_change_nid from status_change_nid_normal.
> 
> Although as you said, slub is the only user of status_change_nid_normal
> for the time beign, so I am not sure of adding a second chain for only
> one user.
> 
> Might look cleaner though, and the advantatge is that slub would not get
> notified for nodes adquiring only ZONE_MOVABLE.
> 
> Let us see what David thinks about it.

I'd hope we'd be able to get rid of the _normal stuff completely, it's seems
way to specialized.

We added that in

commit b9d5ab2562eceeada5e4837a621b6260574dd11d
Author: Lai Jiangshan <laijs@cn.fujitsu.com>
Date:   Tue Dec 11 16:01:05 2012 -0800

     slub, hotplug: ignore unrelated node's hot-adding and hot-removing
     
     SLUB only focuses on the nodes which have normal memory and it ignores the
     other node's hot-adding and hot-removing.
     
     Aka: if some memory of a node which has no onlined memory is online, but
     this new memory onlined is not normal memory (for example, highmem), we
     should not allocate kmem_cache_node for SLUB.
     
     And if the last normal memory is offlined, but the node still has memory,
     we should remove kmem_cache_node for that node.  (The current code delays
     it when all of the memory is offlined)
     
     So we only do something when marg->status_change_nid_normal > 0.
     marg->status_change_nid is not suitable here.
     
     The same problem doesn't exist in SLAB, because SLAB allocates kmem_list3
     for every node even the node don't have normal memory, SLAB tolerates
     kmem_list3 on alien nodes.  SLUB only focuses on the nodes which have
     normal memory, it don't tolerate alien kmem_cache_node.  The patch makes
     SLUB become self-compatible and avoids WARNs and BUGs in rare conditions.


How "bad" would it be if we do the slab_mem_going_online_callback() etc even
for completely-movable nodes? I assume one kmem_cache_alloc() per slab_caches.

slab_mem_going_offline_callback() only does shrinking, #dontcare

Looking at slab_mem_offline_callback(), we never even free the caches either
way when offlining. So the implication would be that we would have movable-only nodes
set in slab_nodes.


We don't expect many such nodes, so ... do we care?

-- 
Cheers,

David / dhildenb



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 0/2] Implement numa node notifier
  2025-04-03 13:02     ` David Hildenbrand
@ 2025-04-03 13:08       ` David Hildenbrand
  2025-04-03 13:57         ` Harry Yoo
  2025-04-04  8:47         ` Vlastimil Babka
  2025-04-03 22:06       ` Harry Yoo
  1 sibling, 2 replies; 22+ messages in thread
From: David Hildenbrand @ 2025-04-03 13:08 UTC (permalink / raw)
  To: Oscar Salvador, Vlastimil Babka
  Cc: Andrew Morton, linux-mm, linux-kernel, Hyeonggon Yoo, mkoutny,
	Dan Williams, Jonathan Cameron

On 03.04.25 15:02, David Hildenbrand wrote:
> On 02.04.25 19:03, Oscar Salvador wrote:
>> On Wed, Apr 02, 2025 at 06:06:51PM +0200, Vlastimil Babka wrote:
>>> What if we had two chains:
>>>
>>> register_node_notifier()
>>> register_node_normal_notifier()
>>>
>>> I think they could have shared the state #defines and struct node_notify
>>> would have just one nid and be always >= 0.
>>>
>>> Or would it add too much extra boilerplate and only slab cares?
>>
>> We could indeed go on that direction to try to decouple
>> status_change_nid from status_change_nid_normal.
>>
>> Although as you said, slub is the only user of status_change_nid_normal
>> for the time beign, so I am not sure of adding a second chain for only
>> one user.
>>
>> Might look cleaner though, and the advantatge is that slub would not get
>> notified for nodes adquiring only ZONE_MOVABLE.
>>
>> Let us see what David thinks about it.
> 
> I'd hope we'd be able to get rid of the _normal stuff completely, it's seems
> way to specialized.
> 
> We added that in
> 
> commit b9d5ab2562eceeada5e4837a621b6260574dd11d
> Author: Lai Jiangshan <laijs@cn.fujitsu.com>
> Date:   Tue Dec 11 16:01:05 2012 -0800
> 
>       slub, hotplug: ignore unrelated node's hot-adding and hot-removing
>       
>       SLUB only focuses on the nodes which have normal memory and it ignores the
>       other node's hot-adding and hot-removing.
>       
>       Aka: if some memory of a node which has no onlined memory is online, but
>       this new memory onlined is not normal memory (for example, highmem), we
>       should not allocate kmem_cache_node for SLUB.
>       
>       And if the last normal memory is offlined, but the node still has memory,
>       we should remove kmem_cache_node for that node.  (The current code delays
>       it when all of the memory is offlined)
>       
>       So we only do something when marg->status_change_nid_normal > 0.
>       marg->status_change_nid is not suitable here.
>       
>       The same problem doesn't exist in SLAB, because SLAB allocates kmem_list3
>       for every node even the node don't have normal memory, SLAB tolerates
>       kmem_list3 on alien nodes.  SLUB only focuses on the nodes which have
>       normal memory, it don't tolerate alien kmem_cache_node.  The patch makes
>       SLUB become self-compatible and avoids WARNs and BUGs in rare conditions.
> 
> 
> How "bad" would it be if we do the slab_mem_going_online_callback() etc even
> for completely-movable nodes? I assume one kmem_cache_alloc() per slab_caches.
> 
> slab_mem_going_offline_callback() only does shrinking, #dontcare
> 
> Looking at slab_mem_offline_callback(), we never even free the caches either
> way when offlining. So the implication would be that we would have movable-only nodes
> set in slab_nodes.
> 
> 
> We don't expect many such nodes, so ... do we care?

BTW, isn't description of slab_nodes wrong?

"Tracks for which NUMA nodes we have kmem_cache_nodes allocated." -- but 
as there is no freeing done in slab_mem_offline_callback(), isn't it 
always kept allocated?

(probably I am missing something)

-- 
Cheers,

David / dhildenb



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 0/2] Implement numa node notifier
  2025-04-03 13:08       ` David Hildenbrand
@ 2025-04-03 13:57         ` Harry Yoo
  2025-04-04  8:47         ` Vlastimil Babka
  1 sibling, 0 replies; 22+ messages in thread
From: Harry Yoo @ 2025-04-03 13:57 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Oscar Salvador, Vlastimil Babka, Andrew Morton, linux-mm,
	linux-kernel, Hyeonggon Yoo, mkoutny, Dan Williams,
	Jonathan Cameron

On Thu, Apr 03, 2025 at 03:08:18PM +0200, David Hildenbrand wrote:
> On 03.04.25 15:02, David Hildenbrand wrote:
> > On 02.04.25 19:03, Oscar Salvador wrote:
> > > On Wed, Apr 02, 2025 at 06:06:51PM +0200, Vlastimil Babka wrote:
> > > > What if we had two chains:
> > > > 
> > > > register_node_notifier()
> > > > register_node_normal_notifier()
> > > > 
> > > > I think they could have shared the state #defines and struct node_notify
> > > > would have just one nid and be always >= 0.
> > > > 
> > > > Or would it add too much extra boilerplate and only slab cares?
> > > 
> > > We could indeed go on that direction to try to decouple
> > > status_change_nid from status_change_nid_normal.
> > > 
> > > Although as you said, slub is the only user of status_change_nid_normal
> > > for the time beign, so I am not sure of adding a second chain for only
> > > one user.
> > > 
> > > Might look cleaner though, and the advantatge is that slub would not get
> > > notified for nodes adquiring only ZONE_MOVABLE.
> > > 
> > > Let us see what David thinks about it.
> > 
> > I'd hope we'd be able to get rid of the _normal stuff completely, it's seems
> > way to specialized.
> > 
> > We added that in
> > 
> > commit b9d5ab2562eceeada5e4837a621b6260574dd11d
> > Author: Lai Jiangshan <laijs@cn.fujitsu.com>
> > Date:   Tue Dec 11 16:01:05 2012 -0800
> > 
> >       slub, hotplug: ignore unrelated node's hot-adding and hot-removing
> >       SLUB only focuses on the nodes which have normal memory and it ignores the
> >       other node's hot-adding and hot-removing.
> >       Aka: if some memory of a node which has no onlined memory is online, but
> >       this new memory onlined is not normal memory (for example, highmem), we
> >       should not allocate kmem_cache_node for SLUB.
> >       And if the last normal memory is offlined, but the node still has memory,
> >       we should remove kmem_cache_node for that node.  (The current code delays
> >       it when all of the memory is offlined)
> >       So we only do something when marg->status_change_nid_normal > 0.
> >       marg->status_change_nid is not suitable here.
> >       The same problem doesn't exist in SLAB, because SLAB allocates kmem_list3
> >       for every node even the node don't have normal memory, SLAB tolerates
> >       kmem_list3 on alien nodes.  SLUB only focuses on the nodes which have
> >       normal memory, it don't tolerate alien kmem_cache_node.  The patch makes
> >       SLUB become self-compatible and avoids WARNs and BUGs in rare conditions.
> > 
> > 
> > How "bad" would it be if we do the slab_mem_going_online_callback() etc even
> > for completely-movable nodes? I assume one kmem_cache_alloc() per slab_caches.
> > 
> > slab_mem_going_offline_callback() only does shrinking, #dontcare
> > 
> > Looking at slab_mem_offline_callback(), we never even free the caches either
> > way when offlining. So the implication would be that we would have movable-only nodes
> > set in slab_nodes.
> > 
> > 
> > We don't expect many such nodes, so ... do we care?
> 
> BTW, isn't description of slab_nodes wrong?
> 
> "Tracks for which NUMA nodes we have kmem_cache_nodes allocated." -- but as
> there is no freeing done in slab_mem_offline_callback(), isn't it always
> kept allocated?

It was, but not anymore :)

I think this patch series [1] forgot the fact that it changed the meaning
from 'NUMA nodes that have kmem_cache_node', to 'NUMA nodes that have normal
memory (that can be allocated as slab memory)'?

[1] https://lore.kernel.org/all/20210113131634.3671-1-vbabka@suse.cz

> 
> (probably I am missing something)
> 
> -- 
> Cheers,
> 
> David / dhildenb
> 
> 

-- 
Cheers,
Harry (formerly known as Hyeonggon)


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 0/2] Implement numa node notifier
  2025-04-03 13:02     ` David Hildenbrand
  2025-04-03 13:08       ` David Hildenbrand
@ 2025-04-03 22:06       ` Harry Yoo
  2025-04-04  8:50         ` Vlastimil Babka
  1 sibling, 1 reply; 22+ messages in thread
From: Harry Yoo @ 2025-04-03 22:06 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Oscar Salvador, Vlastimil Babka, Andrew Morton, linux-mm,
	linux-kernel, Hyeonggon Yoo, mkoutny, Dan Williams,
	Jonathan Cameron

On Thu, Apr 03, 2025 at 03:02:25PM +0200, David Hildenbrand wrote:
> On 02.04.25 19:03, Oscar Salvador wrote:
> > On Wed, Apr 02, 2025 at 06:06:51PM +0200, Vlastimil Babka wrote:
> > > What if we had two chains:
> > > 
> > > register_node_notifier()
> > > register_node_normal_notifier()
> > > 
> > > I think they could have shared the state #defines and struct node_notify
> > > would have just one nid and be always >= 0.
> > > 
> > > Or would it add too much extra boilerplate and only slab cares?
> > 
> > We could indeed go on that direction to try to decouple
> > status_change_nid from status_change_nid_normal.
> > 
> > Although as you said, slub is the only user of status_change_nid_normal
> > for the time beign, so I am not sure of adding a second chain for only
> > one user.
> > 
> > Might look cleaner though, and the advantatge is that slub would not get
> > notified for nodes adquiring only ZONE_MOVABLE.
> > 
> > Let us see what David thinks about it.
> 
> I'd hope we'd be able to get rid of the _normal stuff completely, it's seems
> way to specialized.

Hmm, perhaps we can remove it with as part of this patch series?

status_change_nid_normal has been used to indicate both 'There is a
status change' AND 'The node id when the NUMA node has normal memory'.

But since NUMA node notifier triggers only when there is a state change,
it can simply pass nid, like patch 2 does. SLUB can then check whether the
node has normal memory.

Or am I missing something?

> We added that in
> 
> commit b9d5ab2562eceeada5e4837a621b6260574dd11d
> Author: Lai Jiangshan <laijs@cn.fujitsu.com>
> Date:   Tue Dec 11 16:01:05 2012 -0800
> 
>     slub, hotplug: ignore unrelated node's hot-adding and hot-removing
>     SLUB only focuses on the nodes which have normal memory and it ignores the
>     other node's hot-adding and hot-removing.
>     Aka: if some memory of a node which has no onlined memory is online, but
>     this new memory onlined is not normal memory (for example, highmem), we
>     should not allocate kmem_cache_node for SLUB.
>     And if the last normal memory is offlined, but the node still has memory,
>     we should remove kmem_cache_node for that node.  (The current code delays
>     it when all of the memory is offlined)
>     So we only do something when marg->status_change_nid_normal > 0.
>     marg->status_change_nid is not suitable here.
>     The same problem doesn't exist in SLAB, because SLAB allocates kmem_list3
>     for every node even the node don't have normal memory, SLAB tolerates
>     kmem_list3 on alien nodes.  SLUB only focuses on the nodes which have
>     normal memory, it don't tolerate alien kmem_cache_node.  The patch makes
>     SLUB become self-compatible and avoids WARNs and BUGs in rare conditions.
> 
> 
> How "bad" would it be if we do the slab_mem_going_online_callback() etc even
> for completely-movable nodes? I assume one kmem_cache_alloc() per slab_caches.
>
> slab_mem_going_offline_callback() only does shrinking, #dontcare
> 
> Looking at slab_mem_offline_callback(), we never even free the caches either
> way when offlining. So the implication would be that we would have movable-only nodes
> set in slab_nodes.
> 
> 
> We don't expect many such nodes, so ... do we care?
> 
> -- 
> Cheers,
> 
> David / dhildenb
> 
> 

-- 
Cheers,
Harry (formerly known as Hyeonggon)


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 0/2] Implement numa node notifier
  2025-04-03 13:08       ` David Hildenbrand
  2025-04-03 13:57         ` Harry Yoo
@ 2025-04-04  8:47         ` Vlastimil Babka
  1 sibling, 0 replies; 22+ messages in thread
From: Vlastimil Babka @ 2025-04-04  8:47 UTC (permalink / raw)
  To: David Hildenbrand, Oscar Salvador
  Cc: Andrew Morton, linux-mm, linux-kernel, Hyeonggon Yoo, mkoutny,
	Dan Williams, Jonathan Cameron

On 4/3/25 15:08, David Hildenbrand wrote:
> On 03.04.25 15:02, David Hildenbrand wrote:
>> On 02.04.25 19:03, Oscar Salvador wrote:
>>> On Wed, Apr 02, 2025 at 06:06:51PM +0200, Vlastimil Babka wrote:
>>>> What if we had two chains:
>>>>
>>>> register_node_notifier()
>>>> register_node_normal_notifier()
>>>>
>>>> I think they could have shared the state #defines and struct node_notify
>>>> would have just one nid and be always >= 0.
>>>>
>>>> Or would it add too much extra boilerplate and only slab cares?
>>>
>>> We could indeed go on that direction to try to decouple
>>> status_change_nid from status_change_nid_normal.
>>>
>>> Although as you said, slub is the only user of status_change_nid_normal
>>> for the time beign, so I am not sure of adding a second chain for only
>>> one user.
>>>
>>> Might look cleaner though, and the advantatge is that slub would not get
>>> notified for nodes adquiring only ZONE_MOVABLE.
>>>
>>> Let us see what David thinks about it.
>> 
>> I'd hope we'd be able to get rid of the _normal stuff completely, it's seems
>> way to specialized.
>> 
>> We added that in
>> 
>> commit b9d5ab2562eceeada5e4837a621b6260574dd11d
>> Author: Lai Jiangshan <laijs@cn.fujitsu.com>
>> Date:   Tue Dec 11 16:01:05 2012 -0800
>> 
>>       slub, hotplug: ignore unrelated node's hot-adding and hot-removing
>>       
>>       SLUB only focuses on the nodes which have normal memory and it ignores the
>>       other node's hot-adding and hot-removing.
>>       
>>       Aka: if some memory of a node which has no onlined memory is online, but
>>       this new memory onlined is not normal memory (for example, highmem), we
>>       should not allocate kmem_cache_node for SLUB.
>>       
>>       And if the last normal memory is offlined, but the node still has memory,
>>       we should remove kmem_cache_node for that node.  (The current code delays
>>       it when all of the memory is offlined)
>>       
>>       So we only do something when marg->status_change_nid_normal > 0.
>>       marg->status_change_nid is not suitable here.
>>       
>>       The same problem doesn't exist in SLAB, because SLAB allocates kmem_list3
>>       for every node even the node don't have normal memory, SLAB tolerates
>>       kmem_list3 on alien nodes.  SLUB only focuses on the nodes which have
>>       normal memory, it don't tolerate alien kmem_cache_node.  The patch makes
>>       SLUB become self-compatible and avoids WARNs and BUGs in rare conditions.
>> 
>> 
>> How "bad" would it be if we do the slab_mem_going_online_callback() etc even
>> for completely-movable nodes? I assume one kmem_cache_alloc() per slab_caches.

Yes.

>> slab_mem_going_offline_callback() only does shrinking, #dontcare
>> 
>> Looking at slab_mem_offline_callback(), we never even free the caches either
>> way when offlining. So the implication would be that we would have movable-only nodes
>> set in slab_nodes.

Yes.

>> We don't expect many such nodes, so ... do we care?

Right, the memory waste should be negligible in the big picture. So Oscar
can you change this as part of your series?

> BTW, isn't description of slab_nodes wrong?
> 
> "Tracks for which NUMA nodes we have kmem_cache_nodes allocated." -- but 
> as there is no freeing done in slab_mem_offline_callback(), isn't it 
> always kept allocated?

Hm yes right now it's "is it in slab_nodes => it's allocated" and not
equivalent.

I guess we could remove slab_mem_offline_callback() completely and thus stop
clearing from slab_nodes and the comment will be true again?

That would mean new caches created after node hotremove will allocate for
that node but that's even more negligible than the waste we're doing already.

> (probably I am missing something)
> 



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 0/2] Implement numa node notifier
  2025-04-03 22:06       ` Harry Yoo
@ 2025-04-04  8:50         ` Vlastimil Babka
  2025-04-04 10:02           ` Harry Yoo
  0 siblings, 1 reply; 22+ messages in thread
From: Vlastimil Babka @ 2025-04-04  8:50 UTC (permalink / raw)
  To: Harry Yoo, David Hildenbrand
  Cc: Oscar Salvador, Andrew Morton, linux-mm, linux-kernel,
	Hyeonggon Yoo, mkoutny, Dan Williams, Jonathan Cameron

On 4/4/25 00:06, Harry Yoo wrote:
> On Thu, Apr 03, 2025 at 03:02:25PM +0200, David Hildenbrand wrote:
>> On 02.04.25 19:03, Oscar Salvador wrote:
>> > On Wed, Apr 02, 2025 at 06:06:51PM +0200, Vlastimil Babka wrote:
>> > > What if we had two chains:
>> > > 
>> > > register_node_notifier()
>> > > register_node_normal_notifier()
>> > > 
>> > > I think they could have shared the state #defines and struct node_notify
>> > > would have just one nid and be always >= 0.
>> > > 
>> > > Or would it add too much extra boilerplate and only slab cares?
>> > 
>> > We could indeed go on that direction to try to decouple
>> > status_change_nid from status_change_nid_normal.
>> > 
>> > Although as you said, slub is the only user of status_change_nid_normal
>> > for the time beign, so I am not sure of adding a second chain for only
>> > one user.
>> > 
>> > Might look cleaner though, and the advantatge is that slub would not get
>> > notified for nodes adquiring only ZONE_MOVABLE.
>> > 
>> > Let us see what David thinks about it.
>> 
>> I'd hope we'd be able to get rid of the _normal stuff completely, it's seems
>> way to specialized.
> 
> Hmm, perhaps we can remove it with as part of this patch series?
> 
> status_change_nid_normal has been used to indicate both 'There is a
> status change' AND 'The node id when the NUMA node has normal memory'.
> 
> But since NUMA node notifier triggers only when there is a state change,
> it can simply pass nid, like patch 2 does. SLUB can then check whether the
> node has normal memory.

Well the state change could be adding movable memory, SLUB checks that
there's no normal memory and thus does nothing. Then normal memory is added
to the node, but there's no new notification and SLUB is left without
supporting the node forever.

But with David's suggestion we avoid that problem.

> Or am I missing something?
> 
>> We added that in
>> 
>> commit b9d5ab2562eceeada5e4837a621b6260574dd11d
>> Author: Lai Jiangshan <laijs@cn.fujitsu.com>
>> Date:   Tue Dec 11 16:01:05 2012 -0800
>> 
>>     slub, hotplug: ignore unrelated node's hot-adding and hot-removing
>>     SLUB only focuses on the nodes which have normal memory and it ignores the
>>     other node's hot-adding and hot-removing.
>>     Aka: if some memory of a node which has no onlined memory is online, but
>>     this new memory onlined is not normal memory (for example, highmem), we
>>     should not allocate kmem_cache_node for SLUB.
>>     And if the last normal memory is offlined, but the node still has memory,
>>     we should remove kmem_cache_node for that node.  (The current code delays
>>     it when all of the memory is offlined)
>>     So we only do something when marg->status_change_nid_normal > 0.
>>     marg->status_change_nid is not suitable here.
>>     The same problem doesn't exist in SLAB, because SLAB allocates kmem_list3
>>     for every node even the node don't have normal memory, SLAB tolerates
>>     kmem_list3 on alien nodes.  SLUB only focuses on the nodes which have
>>     normal memory, it don't tolerate alien kmem_cache_node.  The patch makes
>>     SLUB become self-compatible and avoids WARNs and BUGs in rare conditions.
>> 
>> 
>> How "bad" would it be if we do the slab_mem_going_online_callback() etc even
>> for completely-movable nodes? I assume one kmem_cache_alloc() per slab_caches.
>>
>> slab_mem_going_offline_callback() only does shrinking, #dontcare
>> 
>> Looking at slab_mem_offline_callback(), we never even free the caches either
>> way when offlining. So the implication would be that we would have movable-only nodes
>> set in slab_nodes.
>> 
>> 
>> We don't expect many such nodes, so ... do we care?
>> 
>> -- 
>> Cheers,
>> 
>> David / dhildenb
>> 
>> 
> 



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 0/2] Implement numa node notifier
  2025-04-04  8:50         ` Vlastimil Babka
@ 2025-04-04 10:02           ` Harry Yoo
  0 siblings, 0 replies; 22+ messages in thread
From: Harry Yoo @ 2025-04-04 10:02 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: David Hildenbrand, Oscar Salvador, Andrew Morton, linux-mm,
	linux-kernel, Hyeonggon Yoo, mkoutny, Dan Williams,
	Jonathan Cameron

On Fri, Apr 04, 2025 at 10:50:49AM +0200, Vlastimil Babka wrote:
> On 4/4/25 00:06, Harry Yoo wrote:
> > On Thu, Apr 03, 2025 at 03:02:25PM +0200, David Hildenbrand wrote:
> >> On 02.04.25 19:03, Oscar Salvador wrote:
> >> > On Wed, Apr 02, 2025 at 06:06:51PM +0200, Vlastimil Babka wrote:
> >> > > What if we had two chains:
> >> > > 
> >> > > register_node_notifier()
> >> > > register_node_normal_notifier()
> >> > > 
> >> > > I think they could have shared the state #defines and struct node_notify
> >> > > would have just one nid and be always >= 0.
> >> > > 
> >> > > Or would it add too much extra boilerplate and only slab cares?
> >> > 
> >> > We could indeed go on that direction to try to decouple
> >> > status_change_nid from status_change_nid_normal.
> >> > 
> >> > Although as you said, slub is the only user of status_change_nid_normal
> >> > for the time beign, so I am not sure of adding a second chain for only
> >> > one user.
> >> > 
> >> > Might look cleaner though, and the advantatge is that slub would not get
> >> > notified for nodes adquiring only ZONE_MOVABLE.
> >> > 
> >> > Let us see what David thinks about it.
> >> 
> >> I'd hope we'd be able to get rid of the _normal stuff completely, it's seems
> >> way to specialized.
> > 
> > Hmm, perhaps we can remove it with as part of this patch series?
> > 
> > status_change_nid_normal has been used to indicate both 'There is a
> > status change' AND 'The node id when the NUMA node has normal memory'.
> > 
> > But since NUMA node notifier triggers only when there is a state change,
> > it can simply pass nid, like patch 2 does. SLUB can then check whether the
> > node has normal memory.
> 
> Well the state change could be adding movable memory, SLUB checks that
> there's no normal memory and thus does nothing. Then normal memory is added
> to the node, but there's no new notification and SLUB is left without
> supporting the node forever.

You're right. I thought it still notifies state changes from
N_NORMAL_MEMORY || N_MEMORY, but if you don't even want to notify state
change from N_NORMAL_MEMORY... yes. It won't work.

> But with David's suggestion we avoid that problem.

David's suggestion looks good to me too!

> > Or am I missing something?
> > 
> >> We added that in
> >> 
> >> commit b9d5ab2562eceeada5e4837a621b6260574dd11d
> >> Author: Lai Jiangshan <laijs@cn.fujitsu.com>
> >> Date:   Tue Dec 11 16:01:05 2012 -0800
> >> 
> >>     slub, hotplug: ignore unrelated node's hot-adding and hot-removing
> >>     SLUB only focuses on the nodes which have normal memory and it ignores the
> >>     other node's hot-adding and hot-removing.
> >>     Aka: if some memory of a node which has no onlined memory is online, but
> >>     this new memory onlined is not normal memory (for example, highmem), we
> >>     should not allocate kmem_cache_node for SLUB.
> >>     And if the last normal memory is offlined, but the node still has memory,
> >>     we should remove kmem_cache_node for that node.  (The current code delays
> >>     it when all of the memory is offlined)
> >>     So we only do something when marg->status_change_nid_normal > 0.
> >>     marg->status_change_nid is not suitable here.
> >>     The same problem doesn't exist in SLAB, because SLAB allocates kmem_list3
> >>     for every node even the node don't have normal memory, SLAB tolerates
> >>     kmem_list3 on alien nodes.  SLUB only focuses on the nodes which have
> >>     normal memory, it don't tolerate alien kmem_cache_node.  The patch makes
> >>     SLUB become self-compatible and avoids WARNs and BUGs in rare conditions.
> >> 
> >> 
> >> How "bad" would it be if we do the slab_mem_going_online_callback() etc even
> >> for completely-movable nodes? I assume one kmem_cache_alloc() per slab_caches.
> >>
> >> slab_mem_going_offline_callback() only does shrinking, #dontcare
> >> 
> >> Looking at slab_mem_offline_callback(), we never even free the caches either
> >> way when offlining. So the implication would be that we would have movable-only nodes
> >> set in slab_nodes.
> >> 
> >> 
> >> We don't expect many such nodes, so ... do we care?
> >> 
> >> -- 
> >> Cheers,
> >> 
> >> David / dhildenb
> >> 
> >> 
> > 
> 

-- 
Cheers,
Harry (formerly known as Hyeonggon)


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 1/2] mm,memory_hotplug: Implement numa node notifier
  2025-04-01  9:27 ` [PATCH 1/2] mm,memory_hotplug: " Oscar Salvador
                     ` (2 preceding siblings ...)
  2025-04-03 12:44   ` Jonathan Cameron
@ 2025-04-04 10:09   ` David Hildenbrand
  2025-04-04 12:56     ` Oscar Salvador
  3 siblings, 1 reply; 22+ messages in thread
From: David Hildenbrand @ 2025-04-04 10:09 UTC (permalink / raw)
  To: Oscar Salvador, Andrew Morton
  Cc: linux-mm, linux-kernel, Vlastimil Babka, Hyeonggon Yoo, mkoutny,
	Dan Williams, Jonathan Cameron

On 01.04.25 11:27, Oscar Salvador wrote:
> There are at least four consumers of hotplug_memory_notifier that what they
> really are interested in is whether any numa node changed its state, e.g: going
> from being memory aware to becoming memoryless.
> 
> Implement a specific notifier for numa nodes when their state gets changed,
> and have those consumers that only care about numa node state changes use it.
> 
> Signed-off-by: Oscar Salvador <osalvador@suse.de>
> ---
>   drivers/acpi/numa/hmat.c  |  6 +--
>   drivers/base/node.c       | 19 +++++++++
>   drivers/cxl/core/region.c | 14 +++----
>   drivers/cxl/cxl.h         |  4 +-
>   include/linux/memory.h    | 38 ++++++++++++++++++
>   kernel/cgroup/cpuset.c    |  2 +-
>   mm/memory-tiers.c         |  8 ++--
>   mm/memory_hotplug.c       | 84 +++++++++++++++++++++++++++++----------
>   mm/slub.c                 | 22 +++++-----
>   9 files changed, 148 insertions(+), 49 deletions(-)
> 
> diff --git a/drivers/acpi/numa/hmat.c b/drivers/acpi/numa/hmat.c
> index bfbb08b1e6af..d18f3efa2149 100644
> --- a/drivers/acpi/numa/hmat.c
> +++ b/drivers/acpi/numa/hmat.c
> @@ -918,10 +918,10 @@ static int hmat_callback(struct notifier_block *self,
>   			 unsigned long action, void *arg)
>   {
>   	struct memory_target *target;
> -	struct memory_notify *mnb = arg;
> +	struct node_notify *mnb = arg;
>   	int pxm, nid = mnb->status_change_nid;
>   
> -	if (nid == NUMA_NO_NODE || action != MEM_ONLINE)
> +	if (nid == NUMA_NO_NODE || action != NODE_BECAME_MEM_AWARE)
>   		return NOTIFY_OK;
>   
>   	pxm = node_to_pxm(nid);
> @@ -1074,7 +1074,7 @@ static __init int hmat_init(void)
>   	hmat_register_targets();
>   
>   	/* Keep the table and structures if the notifier may use them */
> -	if (hotplug_memory_notifier(hmat_callback, HMAT_CALLBACK_PRI))
> +	if (hotplug_node_notifier(hmat_callback, HMAT_CALLBACK_PRI))
>   		goto out_put;
>   
>   	if (!hmat_set_default_dram_perf())
> diff --git a/drivers/base/node.c b/drivers/base/node.c
> index 0ea653fa3433..182c71dfb5b8 100644
> --- a/drivers/base/node.c
> +++ b/drivers/base/node.c
> @@ -110,6 +110,25 @@ static const struct attribute_group *node_access_node_groups[] = {
>   	NULL,
>   };
>   
> +static BLOCKING_NOTIFIER_HEAD(node_chain);
> +
> +int register_node_notifier(struct notifier_block *nb)
> +{
> +	return blocking_notifier_chain_register(&node_chain, nb);
> +}
> +EXPORT_SYMBOL(register_node_notifier);
> +
> +void unregister_node_notifier(struct notifier_block *nb)
> +{
> +	blocking_notifier_chain_unregister(&node_chain, nb);
> +}
> +EXPORT_SYMBOL(unregister_node_notifier);
> +
> +int node_notify(unsigned long val, void *v)
> +{
> +	return blocking_notifier_call_chain(&node_chain, val, v);
> +}
> +
>   static void node_remove_accesses(struct node *node)
>   {
>   	struct node_access_nodes *c, *cnext;
> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> index e8d11a988fd9..7d187088f557 100644
> --- a/drivers/cxl/core/region.c
> +++ b/drivers/cxl/core/region.c
> @@ -2409,12 +2409,12 @@ static int cxl_region_perf_attrs_callback(struct notifier_block *nb,
>   					  unsigned long action, void *arg)
>   {
>   	struct cxl_region *cxlr = container_of(nb, struct cxl_region,
> -					       memory_notifier);
> -	struct memory_notify *mnb = arg;
> +					       node_notifier);
> +	struct node_notify *mnb = arg;
>   	int nid = mnb->status_change_nid;
>   	int region_nid;
>   
> -	if (nid == NUMA_NO_NODE || action != MEM_ONLINE)
> +	if (nid == NUMA_NO_NODE || action != NODE_BECAME_MEM_AWARE)
>   		return NOTIFY_DONE;
>   
>   	/*
> @@ -3388,7 +3388,7 @@ static void shutdown_notifiers(void *_cxlr)
>   {
>   	struct cxl_region *cxlr = _cxlr;
>   
> -	unregister_memory_notifier(&cxlr->memory_notifier);
> +	unregister_node_notifier(&cxlr->node_notifier);
>   	unregister_mt_adistance_algorithm(&cxlr->adist_notifier);
>   }
>   
> @@ -3427,9 +3427,9 @@ static int cxl_region_probe(struct device *dev)
>   	if (rc)
>   		return rc;
>   
> -	cxlr->memory_notifier.notifier_call = cxl_region_perf_attrs_callback;
> -	cxlr->memory_notifier.priority = CXL_CALLBACK_PRI;
> -	register_memory_notifier(&cxlr->memory_notifier);
> +	cxlr->node_notifier.notifier_call = cxl_region_perf_attrs_callback;
> +	cxlr->node_notifier.priority = CXL_CALLBACK_PRI;
> +	register_node_notifier(&cxlr->node_notifier);
>   
>   	cxlr->adist_notifier.notifier_call = cxl_region_calculate_adistance;
>   	cxlr->adist_notifier.priority = 100;
> diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
> index bbbaa0d0a670..d4c9a499de7a 100644
> --- a/drivers/cxl/cxl.h
> +++ b/drivers/cxl/cxl.h
> @@ -532,7 +532,7 @@ struct cxl_region_params {
>    * @flags: Region state flags
>    * @params: active + config params for the region
>    * @coord: QoS access coordinates for the region
> - * @memory_notifier: notifier for setting the access coordinates to node
> + * @node_notifier: notifier for setting the access coordinates to node
>    * @adist_notifier: notifier for calculating the abstract distance of node
>    */
>   struct cxl_region {
> @@ -545,7 +545,7 @@ struct cxl_region {
>   	unsigned long flags;
>   	struct cxl_region_params params;
>   	struct access_coordinate coord[ACCESS_COORDINATE_MAX];
> -	struct notifier_block memory_notifier;
> +	struct notifier_block node_notifier;
>   	struct notifier_block adist_notifier;
>   };
>   
> diff --git a/include/linux/memory.h b/include/linux/memory.h
> index 12daa6ec7d09..1d814dfbb8a8 100644
> --- a/include/linux/memory.h
> +++ b/include/linux/memory.h
> @@ -99,6 +99,14 @@ int set_memory_block_size_order(unsigned int order);
>   #define	MEM_PREPARE_ONLINE	(1<<6)
>   #define	MEM_FINISH_OFFLINE	(1<<7)
>   
> +/* These states are used for numa node notifiers */
> +#define NODE_BECOMING_MEM_AWARE		(1<<0)
> +#define NODE_BECAME_MEM_AWARE		(1<<1)
> +#define NODE_BECOMING_MEMORYLESS	(1<<2)
> +#define NODE_BECAME_MEMORYLESS		(1<<3)
> +#define NODE_CANCEL_MEM_AWARE		(1<<4)
> +#define NODE_CANCEL_MEMORYLESS		(1<<5)

Assuming we can remove the _normal stuff and we can do what we do in 
patch #2 here already ... meaning we unconditionally store the nid in 
the MEM notifier ...

What about extending the existing memory notifier instead?

That is, we add

MEM_NODE_BECOMING_MEM_AWARE ... and trigger it using the same notifier 
chain. We only have to make sure these new events will be properly 
filtered out (IIRC, for most we do that already).

Of course, the range will not apply to these events, but the nid would 
apply to all.

Just a thought.

-- 
Cheers,

David / dhildenb



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 1/2] mm,memory_hotplug: Implement numa node notifier
  2025-04-04 10:09   ` David Hildenbrand
@ 2025-04-04 12:56     ` Oscar Salvador
  2025-04-04 13:14       ` David Hildenbrand
  0 siblings, 1 reply; 22+ messages in thread
From: Oscar Salvador @ 2025-04-04 12:56 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Andrew Morton, linux-mm, linux-kernel, Vlastimil Babka,
	Hyeonggon Yoo, mkoutny, Dan Williams, Jonathan Cameron

On Fri, Apr 04, 2025 at 12:09:39PM +0200, David Hildenbrand wrote:
> Assuming we can remove the _normal stuff and we can do what we do in patch
> #2 here already ... meaning we unconditionally store the nid in the MEM
> notifier ...
> 
> What about extending the existing memory notifier instead?
> 
> That is, we add
> 
> MEM_NODE_BECOMING_MEM_AWARE ... and trigger it using the same notifier
> chain. We only have to make sure these new events will be properly filtered
> out (IIRC, for most we do that already).
> 
> Of course, the range will not apply to these events, but the nid would apply
> to all.

But that would defeat the purpose of having a diferent notifier for
those only interested in node changes, which is not having to bother at
all with unrelated notifications.

Yes, the handling would be simpler than it is now, but honestly I still see value
in having them both decoupled from eacher other, I think it is cleaner
and expresses in a more clear way the change of what the consumer is interested in
get notified for.

Unless there is a strong objection, I would pursue that path, getting
rid of the _normal stuff for slub along the way.

-- 
Oscar Salvador
SUSE Labs

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 1/2] mm,memory_hotplug: Implement numa node notifier
  2025-04-04 12:56     ` Oscar Salvador
@ 2025-04-04 13:14       ` David Hildenbrand
  0 siblings, 0 replies; 22+ messages in thread
From: David Hildenbrand @ 2025-04-04 13:14 UTC (permalink / raw)
  To: Oscar Salvador
  Cc: Andrew Morton, linux-mm, linux-kernel, Vlastimil Babka,
	Hyeonggon Yoo, mkoutny, Dan Williams, Jonathan Cameron

On 04.04.25 14:56, Oscar Salvador wrote:
> On Fri, Apr 04, 2025 at 12:09:39PM +0200, David Hildenbrand wrote:
>> Assuming we can remove the _normal stuff and we can do what we do in patch
>> #2 here already ... meaning we unconditionally store the nid in the MEM
>> notifier ...
>>
>> What about extending the existing memory notifier instead?
>>
>> That is, we add
>>
>> MEM_NODE_BECOMING_MEM_AWARE ... and trigger it using the same notifier
>> chain. We only have to make sure these new events will be properly filtered
>> out (IIRC, for most we do that already).
>>
>> Of course, the range will not apply to these events, but the nid would apply
>> to all.
> 
> But that would defeat the purpose of having a diferent notifier for
> those only interested in node changes, which is not having to bother at
> all with unrelated notifications.
 > > Yes, the handling would be simpler than it is now, but honestly I 
still see value
> in having them both decoupled from eacher other, I think it is cleaner
> and expresses in a more clear way the change of what the consumer is interested in
> get notified for.

No strong opinion, just a thought.

> 
> Unless there is a strong objection, I would pursue that path, getting
> rid of the _normal stuff for slub along the way.

Yeah, getting rid of _normal and just have a "nid" will make things a 
lot cleaner.

-- 
Cheers,

David / dhildenb



^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2025-04-04 13:15 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-04-01  9:27 [PATCH 0/2] Implement numa node notifier Oscar Salvador
2025-04-01  9:27 ` [PATCH 1/2] mm,memory_hotplug: " Oscar Salvador
2025-04-01 14:19   ` Harry Yoo
2025-04-02 16:03   ` Vlastimil Babka
2025-04-02 16:57     ` Oscar Salvador
2025-04-03 12:44   ` Jonathan Cameron
2025-04-04 10:09   ` David Hildenbrand
2025-04-04 12:56     ` Oscar Salvador
2025-04-04 13:14       ` David Hildenbrand
2025-04-01  9:27 ` [PATCH 2/2] mm,memory_hotplug: Replace status_change_nid parameter in memory_notify Oscar Salvador
2025-04-02  2:53   ` Harry Yoo
2025-04-02 16:09   ` Vlastimil Babka
2025-04-02 16:06 ` [PATCH 0/2] Implement numa node notifier Vlastimil Babka
2025-04-02 17:03   ` Oscar Salvador
2025-04-03 13:02     ` David Hildenbrand
2025-04-03 13:08       ` David Hildenbrand
2025-04-03 13:57         ` Harry Yoo
2025-04-04  8:47         ` Vlastimil Babka
2025-04-03 22:06       ` Harry Yoo
2025-04-04  8:50         ` Vlastimil Babka
2025-04-04 10:02           ` Harry Yoo
2025-04-03 12:29 ` Jonathan Cameron

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox