* [RFC/Patch](memory hotplug) fix null pointer access of kmem_cache_node after memory hotplug
From: Yasunori Goto @ 2007-09-18 12:33 UTC
To: Christoph Lameter; +Cc: linux-mm
Hi Christoph-san.
I found a panic that still occurs after memory hot-add on 2.6.23-rc6-mm1.
It is caused by a NULL pointer access to SLUB's kmem_cache_node in
discard_slab().
In my understanding, a kmem_cache_node should be created for every slab cache
after a memoryless node (or a new node) gets new memory, but the current -mm
doesn't do that.
This patch fixes it.
In this patch, the kmem_cache_node is created when a new slab is first
allocated from the newly onlined memory.
If kmem_cache_node were created in online_pages() during memory hot-add,
it would have to be done before build_zonelist to avoid a race condition.
But that means kmem_cache_node would have to be allocated on other, older nodes,
because the new node's initialization is not yet complete.
I think this "delayed creation" fix is a better way than that.
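The "delayed creation" idea can be illustrated with a small userspace model. This is a sketch under stated assumptions, not SLUB code: struct cache, struct node_info, node_has_memory[] and account_new_slab() are hypothetical stand-ins for the slab cache, kmem_cache_node, the N_HIGH_MEMORY node state and the accounting done in new_slab(). The per-node structure is created the first time a slab from that node is accounted, and an allocation failure is reported instead of being ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical userspace model; these are not the kernel's SLUB types. */
#define MAX_NODES 4

struct node_info {
	long nr_slabs;                     /* slabs accounted to this node */
};

struct cache {
	struct node_info *node[MAX_NODES]; /* NULL until the node gets memory */
};

/* Stand-in for node_state(nid, N_HIGH_MEMORY). */
static bool node_has_memory[MAX_NODES];

/*
 * Account a slab whose page came from node 'nid', creating the per-node
 * structure on first use ("delayed creation").
 * Returns 0 on success, -1 if the structure is missing and cannot be
 * created (node without memory, or allocation failure).
 */
static int account_new_slab(struct cache *s, int nid)
{
	struct node_info *n;

	assert(s != NULL && nid >= 0 && nid < MAX_NODES);
	n = s->node[nid];
	if (n == NULL) {
		if (!node_has_memory[nid])
			return -1;
		n = calloc(1, sizeof(*n)); /* zeroed, playing the role of init */
		if (n == NULL)
			return -1;         /* the failure case the RFC leaves open */
		s->node[nid] = n;
	}
	n->nr_slabs++;                     /* also counts the slab that triggered creation */
	return 0;
}
```

One detail differs from the RFC as posted: the sketch increments the counter for the slab that triggered creation, whereas the patch's else-branch only creates and installs the structure.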
I know that the failure case of kmem_cache_alloc_node() still has to be handled,
and that the prototype of init_kmem_cache_node() here is not good.
I just would like to confirm that I am not overlooking something about SLUB.
Bye.
Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com>
---
mm/slub.c | 15 ++++++++++++++-
1 file changed, 14 insertions(+), 1 deletion(-)
Index: current/mm/slub.c
===================================================================
--- current.orig/mm/slub.c 2007-09-18 19:46:33.000000000 +0900
+++ current/mm/slub.c 2007-09-18 19:46:59.000000000 +0900
@@ -1081,6 +1081,7 @@ static void setup_object(struct kmem_cac
 		s->ctor(s, object);
 }
 
+static void init_kmem_cache_node(struct kmem_cache_node *n);
 static struct page *new_slab(struct kmem_cache *s, gfp_t flags, int node)
 {
 	struct page *page;
@@ -1089,6 +1090,7 @@ static struct page *new_slab(struct kmem
 	void *end;
 	void *last;
 	void *p;
+	int page_nid;
 
 	BUG_ON(flags & GFP_SLAB_BUG_MASK);
 
@@ -1097,9 +1099,20 @@ static struct page *new_slab(struct kmem
 	if (!page)
 		goto out;
 
-	n = get_node(s, page_to_nid(page));
+	page_nid = page_to_nid(page);
+	n = get_node(s, page_nid);
 	if (n)
 		atomic_long_inc(&n->nr_slabs);
+	else if (node_state(page_nid, N_HIGH_MEMORY) && s != kmalloc_caches) {
+		/*
+		 * If new memory is onlined on new(or memory less) node,
+		 * this will happen. (Second comparison is to avoid eternal
+		 * recursion.)
+		 */
+		n = kmem_cache_alloc_node(kmalloc_caches, GFP_KERNEL, page_nid);
+		init_kmem_cache_node(n);
+		s->node[page_nid] = n;
+	}
 	page->slab = s;
 	page->flags |= 1 << PG_slab;
 	if (s->flags & (SLAB_DEBUG_FREE | SLAB_RED_ZONE | SLAB_POISON |
--
Yasunori Goto
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
* Re: [RFC/Patch](memory hotplug) fix null pointer access of kmem_cache_node after memory hotplug
From: Christoph Lameter @ 2007-09-18 19:05 UTC
To: Yasunori Goto; +Cc: linux-mm
On Tue, 18 Sep 2007, Yasunori Goto wrote:
> It is caused by a NULL pointer access to SLUB's kmem_cache_node in
> discard_slab().
> In my understanding, a kmem_cache_node should be created for every slab cache
> after a memoryless node (or a new node) gets new memory, but the current -mm
> doesn't do that.
> This patch fixes it.
Right. Isn't there a notifier chain that can be used to create the missing
node structure?
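As a rough illustration of the notifier-chain idea, here is a userspace sketch of a callback list that is run when memory on a node is about to come online. Everything in it is hypothetical (register_mem_callback(), notify_mem_event(), the MEM_EVENT_* names, slab_mem_callback()); it is not the kernel's notifier API, only the shape of the technique: the slab side registers once, and the hot-add path notifies it before the node's memory becomes usable, so the per-node structure can be prepared in time.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace sketch; not the kernel's notifier API. */
enum mem_event { MEM_EVENT_GOING_ONLINE, MEM_EVENT_ONLINE };

typedef int (*mem_callback)(enum mem_event ev, int nid);

#define MAX_CALLBACKS 8
static mem_callback chain[MAX_CALLBACKS];
static size_t chain_len;

/* Add a callback to the chain; fails when the fixed-size table is full. */
static int register_mem_callback(mem_callback cb)
{
	assert(cb != NULL);
	if (chain_len == MAX_CALLBACKS)
		return -1;
	chain[chain_len++] = cb;
	return 0;
}

/* Call every registered callback in order; stop at the first failure. */
static int notify_mem_event(enum mem_event ev, int nid)
{
	for (size_t i = 0; i < chain_len; i++) {
		int ret = chain[i](ev, nid);
		if (ret)
			return ret;
	}
	return 0;
}

/* Slab-side callback: prepare the per-node structure before onlining. */
#define MAX_NODES 4
static int node_struct_ready[MAX_NODES];

static int slab_mem_callback(enum mem_event ev, int nid)
{
	assert(nid >= 0 && nid < MAX_NODES);
	if (ev == MEM_EVENT_GOING_ONLINE)
		node_struct_ready[nid] = 1; /* stand-in for allocate + init */
	return 0;
}
```

Returning an error from a callback lets the hot-add path abort before the node becomes visible, which is one way to keep an allocation failure from turning into a later NULL dereference.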
> If kmem_cache_node were created in online_pages() during memory hot-add,
> it would have to be done before build_zonelist to avoid a race condition.
> But that means kmem_cache_node would have to be allocated on other, older nodes,
> because the new node's initialization is not yet complete.
Why before build_zonelist? The regular slab bootstrap occurs after
zonelist creation.
> I think this "delayed creation" fix is a better way than that.
Looks like this is a way to do on-demand node structure creation?
> I know that the failure case of kmem_cache_alloc_node() still has to be handled,
> and that the prototype of init_kmem_cache_node() here is not good.
> I just would like to confirm that I am not overlooking something about SLUB.
Could be okay. I would feel better if we always had a per-node structure
for each available node, allocated on the node that it covers.
> + else if (node_state(page_nid, N_HIGH_MEMORY) && s != kmalloc_caches) {
> + /*
> + * If new memory is onlined on new(or memory less) node,
> + * this will happen. (Second comparison is to avoid eternal
> + * recursion.)
> + */
For memoryless nodes this function will return NULL which will cause
fallback. It looks like we are not going into this branch because in that
case N_HIGH_MEMORY will not be set for the node.
* Re: [RFC/Patch](memory hotplug) fix null pointer access of kmem_cache_node after memory hotplug
From: Yasunori Goto @ 2007-09-19 2:12 UTC
To: Christoph Lameter; +Cc: linux-mm
> On Tue, 18 Sep 2007, Yasunori Goto wrote:
>
> > It is caused by a NULL pointer access to SLUB's kmem_cache_node in
> > discard_slab().
> > In my understanding, a kmem_cache_node should be created for every slab cache
> > after a memoryless node (or a new node) gets new memory, but the current -mm
> > doesn't do that.
> > This patch fixes it.
>
> Right. Isn't there a notifier chain that can be used to create the missing
> node structure?
Yes, there is, though nothing uses it so far...
> > If kmem_cache_node were created in online_pages() during memory hot-add,
> > it would have to be done before build_zonelist to avoid a race condition.
> > But that means kmem_cache_node would have to be allocated on other, older nodes,
> > because the new node's initialization is not yet complete.
>
> Why before build_zonelist? The regular slab bootstrap occurs after
> zonelist creation.
build_zonelist() is called at a very early stage of bootstrap, but it is
called at the final stage of hot-add.
Once build_zonelist() is called during hot-add, all kernel modules can
use the new memory of the node. So I'm afraid of the following worst case:
build_zonelist()
: new_nodes_page = new_slab();
: :
: :
: discard_slab(new_nodes_page)
: (access kmem_cache_node)
:
kmem_cache_node setting,
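The worst case above can be replayed deterministically. This is a hypothetical sequential model, not kernel code: struct hotadd_state, other_cpu_alloc_then_discard() and run_hotadd() are illustrative names, the other CPU's new_slab()/discard_slab() pair is simply called right after the step standing in for build_zonelist(), and a false result marks where discard_slab() would dereference a NULL kmem_cache_node. Real concurrent interleavings are not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical sequential model of the hot-add window; not kernel code. */
struct node_info {
	long nr_slabs;
};

struct hotadd_state {
	bool zonelist_built;           /* new node's memory usable by everyone */
	struct node_info *node_struct; /* per-node slab structure, NULL until set */
};

/*
 * What another CPU may do once the zonelist includes the new node:
 * new_slab() takes a page from it, then discard_slab() frees that page.
 * Returns false where the kernel would dereference a NULL node structure.
 */
static bool other_cpu_alloc_then_discard(const struct hotadd_state *st)
{
	assert(st != NULL);
	if (!st->zonelist_built)
		return true;           /* new node not visible yet: no page from it */
	return st->node_struct != NULL;
}

/*
 * Run the hot-add steps in one of two orders and call the other CPU's
 * work at the worst possible moment, right after build_zonelist().
 */
static bool run_hotadd(bool node_struct_first)
{
	static struct node_info new_node_struct;
	struct hotadd_state st = { false, NULL };
	bool ok;

	if (node_struct_first)
		st.node_struct = &new_node_struct;
	st.zonelist_built = true;                /* build_zonelist() */
	ok = other_cpu_alloc_then_discard(&st);  /* worst-case interleaving */
	if (!node_struct_first)
		st.node_struct = &new_node_struct;   /* too late for the other CPU */
	return ok;
}
```

run_hotadd(false) returns false, because the node became visible before its structure was set; run_hotadd(true) returns true.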
> > I think this "delayed creation" fix is a better way than that.
>
> Looks like this is a way to do on-demand node structure creation?
Yes.
> > I know that the failure case of kmem_cache_alloc_node() still has to be handled,
> > and that the prototype of init_kmem_cache_node() here is not good.
> > I just would like to confirm that I am not overlooking something about SLUB.
>
> Could be okay. I would feel better if we always had a per-node structure
> for each available node, allocated on the node that it covers.
>
> > + else if (node_state(page_nid, N_HIGH_MEMORY) && s != kmalloc_caches) {
> > + /*
> > + * If new memory is onlined on new(or memory less) node,
> > + * this will happen. (Second comparison is to avoid eternal
> > + * recursion.)
> > + */
>
> For memoryless nodes this function will return NULL which will cause
> fallback. It looks like we are not going into this branch because in that
> case N_HIGH_MEMORY will not be set for the node.
The comment was probably wrong.
When a memoryless node gets new memory by hot-add,
N_HIGH_MEMORY is set in online_pages() (this is included in
2.6.23-rc6-mm1). The first comparison is there to detect that case.
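The two checks in the patch's else-branch can be modeled as a predicate. In this hypothetical sketch, high_memory[] stands in for node_state(nid, N_HIGH_MEMORY), model_online_pages() for the point in online_pages() where that state is set, and node_struct_cache for kmalloc_caches; none of these are the kernel's own definitions. Before the node is onlined the branch is not taken; afterwards it is, except for the cache that the per-node structure is itself allocated from.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the two checks in the RFC's else-branch. */
enum { MAX_NODES = 4 };

struct cache {
	const char *name;
};

/* Stand-in for the kmalloc cache that node structures come from. */
static struct cache node_struct_cache = { "kmalloc (node structs)" };

/* Stand-in for node_state(nid, N_HIGH_MEMORY). */
static bool high_memory[MAX_NODES];

/* Models online_pages(): the node gains memory, so the state is set. */
static void model_online_pages(int nid)
{
	assert(nid >= 0 && nid < MAX_NODES);
	high_memory[nid] = true;
}

/*
 * Should new_slab() create the missing per-node structure?
 *  - first check: the node has memory now (set when it was onlined);
 *  - second check: not for the cache the structure is allocated from,
 *    since that allocation would re-enter this path without end.
 */
static bool should_create_node_struct(const struct cache *s, int nid)
{
	assert(nid >= 0 && nid < MAX_NODES);
	return high_memory[nid] && s != &node_struct_cache;
}
```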
Thanks.
--
Yasunori Goto
* Re: [RFC/Patch](memory hotplug) fix null pointer access of kmem_cache_node after memory hotplug
From: Christoph Lameter @ 2007-09-19 17:23 UTC
To: Yasunori Goto; +Cc: linux-mm
On Wed, 19 Sep 2007, Yasunori Goto wrote:
> build_zonelist() is called at a very early stage of bootstrap, but it is
> called at the final stage of hot-add.
> Once build_zonelist() is called during hot-add, all kernel modules can
> use the new memory of the node. So I'm afraid of the following worst case:
>
> build_zonelist()
> : new_nodes_page = new_slab();
> : :
> : :
> : discard_slab(new_nodes_page)
> : (access kmem_cache_node)
> :
> kmem_cache_node setting,
So we cannot do this without holding off other kernel accesses since it is
not serialized like bootstrap. Sigh.
> > > I think this "delayed creation" fix is a better way than that.
> >
> > Looks like this is a way to do on-demand node structure creation?
>
> Yes.
Could be useful in general if you can make that work reliably. We can just
start out with a single per node structure for the boot node and then add
others on demand?
* Re: [RFC/Patch](memory hotplug) fix null pointer access of kmem_cache_node after memory hotplug
From: Yasunori Goto @ 2007-09-20 2:06 UTC
To: Christoph Lameter; +Cc: linux-mm
> On Wed, 19 Sep 2007, Yasunori Goto wrote:
>
> > build_zonelist() is called at a very early stage of bootstrap, but it is
> > called at the final stage of hot-add.
> > Once build_zonelist() is called during hot-add, all kernel modules can
> > use the new memory of the node. So I'm afraid of the following worst case:
> >
> > build_zonelist()
> > : new_nodes_page = new_slab();
> > : :
> > : :
> > : discard_slab(new_nodes_page)
> > : (access kmem_cache_node)
> > :
> > kmem_cache_node setting,
>
> So we cannot do this without holding off other kernel accesses since it is
> not serialized like bootstrap. Sigh.
>
> > > > I think this "delayed creation" fix is a better way than that.
> > >
> > > Looks like this is a way to do on-demand node structure creation?
> >
> > Yes.
>
> Could be useful in general if you can make that work reliably. We can just
> start out with a single per node structure for the boot node and then add
> others on demand?
Hmmmmm. I don't think on-demand node creation can be made generic.
I just would like to fix the panic.
OK, I'll make a patch which sets up kmem_cache_node before
build_zonelist() to fix the panic for now,
and I will reconsider the allocation placement issue later.
Thanks for your comments.
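The plan of setting kmem_cache_node before build_zonelist() might look roughly like this in a userspace model. The names (hotadd_node(), caches[], zonelist_includes[], home_nid) are hypothetical. Following the earlier point that the new node is not fully initialized yet, the sketch places each new per-node structure on an existing fallback node, and only then marks the new node as visible. It also undoes partial work on allocation failure, one possible answer to the failure case raised at the start of the thread.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical model of "set up per-node structures before build_zonelist()". */
#define MAX_NODES 4
#define NR_CACHES 3

struct node_info {
	int home_nid;   /* node whose memory holds this structure */
	long nr_slabs;
};

struct cache {
	struct node_info *node[MAX_NODES];
};

static struct cache caches[NR_CACHES];
static int zonelist_includes[MAX_NODES]; /* 1 once build_zonelist() covers the node */

/*
 * Hot-add of 'new_nid': allocate the per-node structure for every cache
 * first, placed on 'fallback_nid' because the new node's memory is not
 * usable yet; only then make the node visible to allocators.
 * Returns 0 on success, -1 (with everything undone) on allocation failure.
 */
static int hotadd_node(int new_nid, int fallback_nid)
{
	int i;

	assert(new_nid >= 0 && new_nid < MAX_NODES);
	assert(fallback_nid >= 0 && fallback_nid < MAX_NODES);
	assert(new_nid != fallback_nid);

	for (i = 0; i < NR_CACHES; i++) {
		struct node_info *n = calloc(1, sizeof(*n));
		if (n == NULL)
			goto undo;
		n->home_nid = fallback_nid;
		caches[i].node[new_nid] = n;
	}
	zonelist_includes[new_nid] = 1;  /* stand-in for build_zonelist() */
	return 0;

undo:
	while (i-- > 0) {                /* free only what was allocated */
		free(caches[i].node[new_nid]);
		caches[i].node[new_nid] = NULL;
	}
	return -1;
}
```

Because the node only becomes visible after the loop completes, no allocator can obtain a page from it while some cache still lacks its per-node structure.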
--
Yasunori Goto