linux-mm.kvack.org archive mirror
From: Harry Yoo <harry.yoo@oracle.com>
To: David Hildenbrand <david@redhat.com>
Cc: Oscar Salvador <osalvador@suse.de>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	Vlastimil Babka <vbabka@suse.cz>,
	Jonathan Cameron <Jonathan.Cameron@huawei.com>,
	linux-cxl@vger.kernel.org
Subject: Re: [PATCH v2 1/3] mm,slub: Do not special case N_NORMAL nodes for slab_nodes
Date: Tue, 8 Apr 2025 23:18:38 +0900	[thread overview]
Message-ID: <Z_UwPmyxyu8YNLG_@harry> (raw)
In-Reply-To: <92ff4f7f-90d2-48ab-8f7d-7fc3485276b5@redhat.com>

On Tue, Apr 08, 2025 at 12:17:52PM +0200, David Hildenbrand wrote:
> On 08.04.25 10:41, Oscar Salvador wrote:
> > Currently, slab_mem_going_online_callback() checks whether the node has
> > N_NORMAL memory in order to be set in slab_nodes.
> > While it is true that getting rid of that enforcement would mean
> > ending up with movable nodes in slab_nodes, the memory waste that comes
> > with that is negligible.
> > 
> > So stop checking for status_change_nid_normal and just use status_change_nid
> > instead, which works for both types of memory.
> > 
> > Also, once we allocate the kmem_cache_node cache for the node in
> > slab_mem_going_online_callback(), we never deallocate it in
> > slab_mem_offline_callback() when the node goes memoryless, so we can just
> > get rid of it.
> > 
> > The only side effect is that we will stop clearing the node from slab_nodes.
> > 
> 
> Feel free to add a Suggested-by: if you think it applies.
> 
> 
> Do we have to take care of the N_NORMAL_MEMORY check in kmem_cache_init() ? Likely it
> would have to be a N_MEMORY check.
> 
> 
> But, I was wondering if we could get rid of the "slab_nodes" thingy as a first step?

The following commit says that SLUB has the slab_nodes thingy for a reason...
kmem_cache_node might not be ready yet even when the N_NORMAL_MEMORY check
says the node now has normal memory.

@Vlastimil maybe a dumb question but why not check s->node[nid]
instead of having slab_nodes bitmask?

commit 7e1fa93deff44677a94dfc323ff629bbf5cf9360
Author: Vlastimil Babka <vbabka@suse.cz>
Date:   Wed Feb 24 12:01:12 2021 -0800

    mm, slab, slub: stop taking memory hotplug lock
    
    Since commit 03afc0e25f7f ("slab: get_online_mems for
    kmem_cache_{create,destroy,shrink}") we are taking memory hotplug lock for
    SLAB and SLUB when creating, destroying or shrinking a cache.  It is quite
    a heavy lock and it's best to avoid it if possible, as we had several
    issues with lockdep complaining about ordering in the past, see e.g.
    e4f8e513c3d3 ("mm/slub: fix a deadlock in show_slab_objects()").
    
    The problem scenario in 03afc0e25f7f (solved by the memory hotplug lock)
    can be summarized as follows: while there's slab_mutex synchronizing new
    kmem cache creation and SLUB's MEM_GOING_ONLINE callback
    slab_mem_going_online_callback(), we may miss creation of kmem_cache_node
    for the hotplugged node in the new kmem cache, because the hotplug
    callback doesn't yet see the new cache, and cache creation in
    init_kmem_cache_nodes() only inits kmem_cache_node for nodes in the
    N_NORMAL_MEMORY nodemask, which however may not yet include the new node,
    as that happens only later after the MEM_GOING_ONLINE callback.
    
    Instead of using get/put_online_mems(), the problem can be solved by SLUB
    maintaining its own nodemask of nodes for which it has allocated the
    per-node kmem_cache_node structures.  This nodemask would generally mirror
    the N_NORMAL_MEMORY nodemask, but would be updated only under SLUB's
    control in its memory hotplug callbacks under the slab_mutex.  This patch
    adds such nodemask and its handling.
    
    Commit 03afc0e25f7f mentions "issues like [the one above]", but there
    don't appear to be further issues.  All the paths (shared for SLAB and
    SLUB) taking the memory hotplug locks are also taking the slab_mutex,
    except kmem_cache_shrink() where 03afc0e25f7f replaced slab_mutex with
    get/put_online_mems().
    
    We however cannot simply restore slab_mutex in kmem_cache_shrink(), as
    SLUB can enter the function from a write to sysfs 'shrink' file, thus
    holding kernfs lock, and in kmem_cache_create() the kernfs lock is nested
    within slab_mutex.  But on closer inspection we don't actually need to
    protect kmem_cache_shrink() from hotplug callbacks: While SLUB's
    __kmem_cache_shrink() does for_each_kmem_cache_node(), missing a new node
    added in parallel hotplug is not fatal, and parallel hotremove does not
    free kmem_cache_node's anymore after the previous patch, so use-after free
    cannot happen.  The per-node shrinking itself is protected by
    n->list_lock.  The same is true for SLAB, and SLOB is a no-op.
    
    SLAB also doesn't need the memory hotplug locking, which it only gained by
    03afc0e25f7f through the shared paths in slab_common.c.  Its memory
    hotplug callbacks are also protected by slab_mutex against races with
    these paths.  The problem of SLUB relying on N_NORMAL_MEMORY doesn't apply
    to SLAB, as its setup_kmem_cache_nodes relies on N_ONLINE, and the new
    node is already set there during the MEM_GOING_ONLINE callback, so no
    special care is needed for SLAB.
    
    As such, this patch removes all get/put_online_mems() usage by the slab
    subsystem.
    
    Link: https://lkml.kernel.org/r/20210113131634.3671-3-vbabka@suse.cz
    Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
    Cc: Christoph Lameter <cl@linux.com>
    Cc: David Hildenbrand <david@redhat.com>
    Cc: David Rientjes <rientjes@google.com>
    Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
    Cc: Michal Hocko <mhocko@kernel.org>
    Cc: Pekka Enberg <penberg@kernel.org>
    Cc: Qian Cai <cai@redhat.com>
    Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

> 
> From 518a2b83a9c5bd85d74ddabbc36ce5d181a88ed6 Mon Sep 17 00:00:00 2001
> From: David Hildenbrand <david@redhat.com>
> Date: Tue, 8 Apr 2025 12:16:13 +0200
> Subject: [PATCH] tmp
> 
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  mm/slub.c | 56 ++++---------------------------------------------------
>  1 file changed, 4 insertions(+), 52 deletions(-)
> 
> diff --git a/mm/slub.c b/mm/slub.c
> index b46f87662e71d..afe31149e7f4e 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -445,14 +445,6 @@ static inline struct kmem_cache_node *get_node(struct kmem_cache *s, int node)
>  	for (__node = 0; __node < nr_node_ids; __node++) \
>  		 if ((__n = get_node(__s, __node)))
> -/*
> - * Tracks for which NUMA nodes we have kmem_cache_nodes allocated.
> - * Corresponds to node_state[N_NORMAL_MEMORY], but can temporarily
> - * differ during memory hotplug/hotremove operations.
> - * Protected by slab_mutex.
> - */
> -static nodemask_t slab_nodes;
> -
>  #ifndef CONFIG_SLUB_TINY
>  /*
>   * Workqueue used for flush_cpu_slab().
> @@ -3706,10 +3698,9 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
>  	if (!slab) {
>  		/*
>  		 * if the node is not online or has no normal memory, just
> -		 * ignore the node constraint
> +		 * ignore the node constraint.
>  		 */
> -		if (unlikely(node != NUMA_NO_NODE &&
> -			     !node_isset(node, slab_nodes)))
> +		if (unlikely(node != NUMA_NO_NODE && !node_state(node, N_NORMAL_MEMORY)))
>  			node = NUMA_NO_NODE;
>  		goto new_slab;
>  	}
> @@ -3719,7 +3710,7 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
>  		 * same as above but node_match() being false already
>  		 * implies node != NUMA_NO_NODE
>  		 */
> -		if (!node_isset(node, slab_nodes)) {
> +		if (!node_state(node, N_NORMAL_MEMORY)) {
>  			node = NUMA_NO_NODE;
>  		} else {
>  			stat(s, ALLOC_NODE_MISMATCH);
> @@ -5623,7 +5614,7 @@ static int init_kmem_cache_nodes(struct kmem_cache *s)
>  {
>  	int node;
> -	for_each_node_mask(node, slab_nodes) {
> +	for_each_node_state(node, N_NORMAL_MEMORY) {
>  		struct kmem_cache_node *n;
>  		if (slab_state == DOWN) {
> @@ -6164,30 +6155,6 @@ static int slab_mem_going_offline_callback(void *arg)
>  	return 0;
>  }
> -static void slab_mem_offline_callback(void *arg)
> -{
> -	struct memory_notify *marg = arg;
> -	int offline_node;
> -
> -	offline_node = marg->status_change_nid_normal;
> -
> -	/*
> -	 * If the node still has available memory. we need kmem_cache_node
> -	 * for it yet.
> -	 */
> -	if (offline_node < 0)
> -		return;
> -
> -	mutex_lock(&slab_mutex);
> -	node_clear(offline_node, slab_nodes);
> -	/*
> -	 * We no longer free kmem_cache_node structures here, as it would be
> -	 * racy with all get_node() users, and infeasible to protect them with
> -	 * slab_mutex.
> -	 */
> -	mutex_unlock(&slab_mutex);
> -}
> -
>  static int slab_mem_going_online_callback(void *arg)
>  {
>  	struct kmem_cache_node *n;
> @@ -6229,11 +6196,6 @@ static int slab_mem_going_online_callback(void *arg)
>  		init_kmem_cache_node(n);
>  		s->node[nid] = n;
>  	}
> -	/*
> -	 * Any cache created after this point will also have kmem_cache_node
> -	 * initialized for the new node.
> -	 */
> -	node_set(nid, slab_nodes);
>  out:
>  	mutex_unlock(&slab_mutex);
>  	return ret;
> @@ -6253,8 +6215,6 @@ static int slab_memory_callback(struct notifier_block *self,
>  		break;
>  	case MEM_OFFLINE:
>  	case MEM_CANCEL_ONLINE:
> -		slab_mem_offline_callback(arg);
> -		break;
>  	case MEM_ONLINE:
>  	case MEM_CANCEL_OFFLINE:
>  		break;
> @@ -6309,7 +6269,6 @@ void __init kmem_cache_init(void)
>  {
>  	static __initdata struct kmem_cache boot_kmem_cache,
>  		boot_kmem_cache_node;
> -	int node;
>  	if (debug_guardpage_minorder())
>  		slub_max_order = 0;
> @@ -6321,13 +6280,6 @@ void __init kmem_cache_init(void)
>  	kmem_cache_node = &boot_kmem_cache_node;
>  	kmem_cache = &boot_kmem_cache;
> -	/*
> -	 * Initialize the nodemask for which we will allocate per node
> -	 * structures. Here we don't need taking slab_mutex yet.
> -	 */
> -	for_each_node_state(node, N_NORMAL_MEMORY)
> -		node_set(node, slab_nodes);
> -
>  	create_boot_cache(kmem_cache_node, "kmem_cache_node",
>  			sizeof(struct kmem_cache_node),
>  			SLAB_HWCACHE_ALIGN | SLAB_NO_OBJ_EXT, 0, 0);
> -- 
> 2.48.1
> 
> 
> Not sure if there are any races to consider ... just an idea.
> 
> -- 
> Cheers,
> 
> David / dhildenb
> 

-- 
Cheers,
Harry (formerly known as Hyeonggon)


