linux-mm.kvack.org archive mirror
* 2.6.18-rc3-mm2: rcu radix tree patches break page migration
@ 2006-08-07 23:10 Christoph Lameter
  2006-08-08  1:24 ` Nick Piggin
  0 siblings, 1 reply; 9+ messages in thread
From: Christoph Lameter @ 2006-08-07 23:10 UTC (permalink / raw)
  To: npiggin; +Cc: linux-mm

If I take the following patches out then page migration works reliably 
again. Otherwise page migration may result in weird values in the
page struct. Reproduce by trying to migrate the executable pages
of a running process. This usually creates enough races to break things.
AFAIK the current radix tree rcu patches do not change the behavior
of the tree_lock at all.

radix-tree-rcu-lockless-readside.patch
redo-radix-tree-fixes.patch
radix-tree-rcu-lockless-readside-update.patch
radix-tree-rcu-lockless-readside-semicolon.patch
radix-tree-rcu-lockless-readside-update-tidy.patch
radix-tree-rcu-lockless-readside-fix-2.patch

Output in one failure scenario (after migrating the memory of cron back 
and forth between nodes):

margin:~ # ps ax|grep cron
 3995 ?        Ss     0:00 /usr/sbin/cron
 4256 ttySG0   S+     0:00 grep cron
margin:~ # cat /proc/3995/numa_maps
00000000 default
2000000000000000 default anon=2 dirty=2 N1=2
2000000000200000 default file=/var/run/nscd/passwd dirty=1 mapmax=9 N3=1
2000000800000000 default file=/usr/sbin/cron mapped=5 active=4 N0=2 N1=2 N2=1
2000000800020000 default file=/usr/sbin/cron anon=1 dirty=1 N0=1
2000000800024000 default file=/lib/ld-2.4.so mapped=7 mapmax=46 N0=7
2000000800068000 default file=/lib/ld-2.4.so anon=2 dirty=2 N0=1 N1=1
2000000800088000 default file=/lib/libpam.so.0.81.2 mapped=1 mapmax=5 N0=1
20000008000a0000 default file=/lib/libpam.so.0.81.2
20000008000ac000 default file=/lib/libpam.so.0.81.2 anon=1 dirty=1 N0=1
20000008000b0000 default anon=1 dirty=1 N0=1
20000008000b4000 default file=/lib/libpam_misc.so.0.81.2 mapped=1 mapmax=2 N2=1
20000008000b8000 default file=/lib/libpam_misc.so.0.81.2
20000008000c4000 default file=/lib/libpam_misc.so.0.81.2 anon=1 dirty=1 N0=1
20000008000c8000 default file=/lib/libc-2.4.so mapped=58 mapmax=46 N0=58
2000000800300000 default file=/lib/libc-2.4.so
200000080030c000 default file=/lib/libc-2.4.so anon=2 dirty=2 N0=2
2000000800314000 default anon=1 dirty=1 N0=1
2000000800318000 default file=/lib/libdl-2.4.so mapped=1 mapmax=17 N0=1
2000000800320000 default file=/lib/libdl-2.4.so
200000080032c000 default file=/lib/libdl-2.4.so anon=1 dirty=1 N0=1
2000000800330000 default anon=5 dirty=5 N0=5
607fffff7fffc000 default anon=1 dirty=1 N0=1
607ffffffe58c000 default stack anon=1 dirty=1 N1=1

margin:~ # migratepages 3995 0-2 3
Bad page state in process 'migratepages'
page:a0007ffeafd24fe0 flags:0x000000000009020c mapping:0000000000000000 mapcount:0 count:0
Trying to fix it up, but a reboot is needed
Backtrace:
margin:~ # kernel BUG at mm/page_alloc.c:308!
events/5[31]: bugcheck! 0 [1]
Modules linked in: autofs4 ipv6 sg

Pid: 31, CPU 5, comm:             events/5
psr : 0000101008522030 ifs : 8000000000000792 ip  : [<a0000001001052e0>]    
Tainted: G    B
ip is at free_pages_bulk+0x480/0x600
unat: 0000000000000000 pfs : 0000000000000792 rsc : 0000000000000003
rnat: 0000000000000000 bsps: 0000000000000000 pr  : 0000000000009981
ldrs: 0000000000000000 ccv : 0000000000000000 fpsr: 0009804c8a70433f
csd : 0000000000000000 ssd : 0000000000000000
b0  : a0000001001052e0 b6  : e0000130025cbb10 b7  : a0000001000d0580
f6  : 1003e20c49ba5e353f7cf f7  : 1003e20c49ba5e353f7cf
f8  : 1003e00000000000000a0 f9  : 1003e00000000000004e2
f10 : 1003e000000000fa00000 f11 : 1003e000000003b9aca00
r1  : a000000100d62a50 r2  : 0000000000004000 r3  : e00000b003d59070
r8  : 0000000000000026 r9  : 0000000000000001 r10 : 0000000000000002
r11 : 0000000000000003 r12 : e00000b003d5fcd0 r13 : e00000b003d58000
r14 : e00000b003d59070 r15 : 0000000000000000 r16 : 0000000000004000
r17 : e00000b079377de8 r18 : 3f00000000000000 r19 : 3f00000000000000
r20 : ffffffffffff4230 r21 : e000013003040000 r22 : ffffffffffff0028
r23 : a000000100b63290 r24 : a000000100b62e60 r25 : 0000000000000001
r26 : a000000100970e84 r27 : e000013003034230 r28 : e000013003040000
r29 : a000000100b63290 r30 : a000000100b62e60 r31 : 80000001fdc00000

Call Trace:
 [<a000000100012f60>] show_stack+0x40/0xa0
                                sp=e00000b003d5f840 bsp=e00000b003d593c0
 [<a000000100013790>] show_regs+0x7d0/0x800
                                sp=e00000b003d5fa10 bsp=e00000b003d59378
 [<a000000100033990>] die+0x230/0x300
                                sp=e00000b003d5fa10 bsp=e00000b003d59330
 [<a000000100033aa0>] die_if_kernel+0x40/0x60
                                sp=e00000b003d5fa30 bsp=e00000b003d59300
 [<a000000100034ec0>] ia64_bad_break+0x220/0x460
                                sp=e00000b003d5fa30 bsp=e00000b003d592d8
 [<a00000010000bb20>] ia64_leave_kernel+0x0/0x290
                                sp=e00000b003d5fb00 bsp=e00000b003d592d8
 [<a0000001001052e0>] free_pages_bulk+0x480/0x600
                                sp=e00000b003d5fcd0 bsp=e00000b003d59248
 [<a000000100105a40>] drain_node_pages+0xe0/0x180
                                sp=e00000b003d5fcd0 bsp=e00000b003d59200
 [<a000000100145970>] cache_reap+0x4d0/0x600
                                sp=e00000b003d5fcd0 bsp=e00000b003d591b0
 [<a0000001000c72d0>] run_workqueue+0x1b0/0x280
                                sp=e00000b003d5fce0 bsp=e00000b003d59168
 [<a0000001000c7560>] worker_thread+0x1c0/0x240
                                sp=e00000b003d5fce0 bsp=e00000b003d59128
 [<a0000001000d0260>] kthread+0x240/0x2c0
                                sp=e00000b003d5fd50 bsp=e00000b003d590e8
 [<a0000001000114a0>] kernel_thread_helper+0xe0/0x100
                                sp=e00000b003d5fe30 bsp=e00000b003d590c0
 [<a000000100009140>] start_kernel_thread+0x20/0x40
                                sp=e00000b003d5fe30 bsp=e00000b003d590c0
 <6>note: events/5[31] exited with preempt_count 1


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>


* Re: 2.6.18-rc3-mm2: rcu radix tree patches break page migration
  2006-08-07 23:10 2.6.18-rc3-mm2: rcu radix tree patches break page migration Christoph Lameter
@ 2006-08-08  1:24 ` Nick Piggin
  2006-08-08  3:42   ` Christoph Lameter
  0 siblings, 1 reply; 9+ messages in thread
From: Nick Piggin @ 2006-08-08  1:24 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: npiggin, linux-mm, Lee Schermerhorn

Christoph Lameter wrote:
> If I take the following patches out then page migration works reliably 
> again. Otherwise page migration may result in weird values in the
> page struct. Reproduce by trying to migrate the executable pages
> of a running process. This usually creates enough races to break things.
> AFAIK the current radix tree rcu patches do not change the behavior
> of the tree_lock at all.
> 
> radix-tree-rcu-lockless-readside.patch
> redo-radix-tree-fixes.patch
> radix-tree-rcu-lockless-readside-update.patch
> radix-tree-rcu-lockless-readside-semicolon.patch
> radix-tree-rcu-lockless-readside-update-tidy.patch
> radix-tree-rcu-lockless-readside-fix-2.patch
> 
> Output in one failure scenario (after migrating the memory of cron back 
> and forth between nodes):

Yeah Lee noticed this as well...

Question: can you replace the lookup_slot with a regular lookup, then
replace the pointer switch with a radix_tree_delete + radix_tree_insert
and see if that works?

Thanks,
Nick

-- 
SUSE Labs, Novell Inc.

* Re: 2.6.18-rc3-mm2: rcu radix tree patches break page migration
  2006-08-08  1:24 ` Nick Piggin
@ 2006-08-08  3:42   ` Christoph Lameter
  2006-08-08  5:45     ` Nick Piggin
  0 siblings, 1 reply; 9+ messages in thread
From: Christoph Lameter @ 2006-08-08  3:42 UTC (permalink / raw)
  To: Nick Piggin; +Cc: npiggin, linux-mm, Lee Schermerhorn

On Tue, 8 Aug 2006, Nick Piggin wrote:

> Question: can you replace the lookup_slot with a regular lookup, then
> replace the pointer switch with a radix_tree_delete + radix_tree_insert
> and see if that works?

Ahh... Okay that makes things work the right way.

Does that mean we need to get rid of radix tree replaces in 
general?

Patch:

Index: linux-2.6.18-rc3-mm2/mm/migrate.c
===================================================================
--- linux-2.6.18-rc3-mm2.orig/mm/migrate.c	2006-08-07 20:21:12.985022791 -0700
+++ linux-2.6.18-rc3-mm2/mm/migrate.c	2006-08-07 20:25:28.676221751 -0700
@@ -294,7 +294,8 @@ out:
 static int migrate_page_move_mapping(struct address_space *mapping,
 		struct page *newpage, struct page *page)
 {
-	struct page **radix_pointer;
+	struct page *radix_pointer;
+	long index;
 
 	if (!mapping) {
 		/* Anonymous page */
@@ -305,12 +306,14 @@ static int migrate_page_move_mapping(str
 
 	write_lock_irq(&mapping->tree_lock);
 
-	radix_pointer = (struct page **)radix_tree_lookup_slot(
+	index = page_index(page);
+
+	radix_pointer = (struct page *)radix_tree_lookup(
 						&mapping->page_tree,
-						page_index(page));
+						index);
 
 	if (page_count(page) != 2 + !!PagePrivate(page) ||
-			radix_tree_deref_slot(radix_pointer) != page) {
+			radix_pointer != page) {
 		write_unlock_irq(&mapping->tree_lock);
 		return -EAGAIN;
 	}
@@ -326,7 +329,8 @@ static int migrate_page_move_mapping(str
 	}
 #endif
 
-	radix_tree_replace_slot(radix_pointer, newpage);
+	radix_tree_delete(&mapping->page_tree, index);
+	radix_tree_insert(&mapping->page_tree, index, newpage);
 	__put_page(page);
 	write_unlock_irq(&mapping->tree_lock);
 


* Re: 2.6.18-rc3-mm2: rcu radix tree patches break page migration
  2006-08-08  3:42   ` Christoph Lameter
@ 2006-08-08  5:45     ` Nick Piggin
  2006-08-08 14:53       ` Lee Schermerhorn
  2006-08-08 16:19       ` Christoph Lameter
  0 siblings, 2 replies; 9+ messages in thread
From: Nick Piggin @ 2006-08-08  5:45 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: npiggin, linux-mm, Lee Schermerhorn

Christoph Lameter wrote:

>On Tue, 8 Aug 2006, Nick Piggin wrote:
>
>
>>Question: can you replace the lookup_slot with a regular lookup, then
>>replace the pointer switch with a radix_tree_delete + radix_tree_insert
>>and see if that works?
>>
>
>Ahh... Okay that makes things work the right way.
>
>Does that mean we need to get rid of radix tree replaces in 
>general?
>

I think it just means that my lookup_slot has a bug somewhere. Also: good
to know that I'm not corrupting anyone's pagecache (except yours, and Lee's).

Let me work out what I'm doing wrong. In the meantime if you could send
that patch to akpm as a fixup, that would keep you running. Thanks guys.


* Re: 2.6.18-rc3-mm2: rcu radix tree patches break page migration
  2006-08-08  5:45     ` Nick Piggin
@ 2006-08-08 14:53       ` Lee Schermerhorn
  2006-08-08 16:19       ` Christoph Lameter
  1 sibling, 0 replies; 9+ messages in thread
From: Lee Schermerhorn @ 2006-08-08 14:53 UTC (permalink / raw)
  To: Nick Piggin; +Cc: Christoph Lameter, npiggin, linux-mm

On Tue, 2006-08-08 at 15:45 +1000, Nick Piggin wrote:
> Christoph Lameter wrote:
> 
> >On Tue, 8 Aug 2006, Nick Piggin wrote:
> >
> >
> >>Question: can you replace the lookup_slot with a regular lookup, then
> >>replace the pointer switch with a radix_tree_delete + radix_tree_insert
> >>and see if that works?
> >>
> >
> >Ahh... Okay that makes things work the right way.
> >
> >Does that mean we need to get rid of radix tree replaces in 
> >general?
> >
> 
> I think it just means that my lookup_slot has a bug somewhere. Also: good
> to know that I'm not corrupting anyone's pagecache (except yours, and Lee's).

I saw this under heavy "auto-migration" of just anon pages during
parallel kernel builds:  "make -j32" on a 16cpu/4node numa system.  

The symptom I saw was when two tasks raced in do_swap_page to refault a
page that I had forcibly pushed to the swap cache [but still resident in
memory] to allow possible migrate on fault.  The loser of the race would
wait behind the page lock, holding a reference handed out by
find_get_page().  When it awakened, after the winner had migrated the
page, it would see that the page had been migrated out from under it,
release the ref and retry the lookup.  When it released the reference,
the count would go to zero, and the page would be freed.  It would
ultimately hit a BUG_ON in list_del() called from free_pages_bulk():
entry->prev->next != entry.

Now that the mbind() failure seems fixed, I'm rebasing my patches.  I
think I'll be able to reproduce this fairly regularly.

> 
> Let me work out what I'm doing wrong. In the meantime if you could send
> that patch to akpm as a fixup, that would keep you running. Thanks guys.

I'm willing to test anything you come up with.  I'll take a look at the
lookup code as well, once I can reproduce the failure.

Lee


* Re: 2.6.18-rc3-mm2: rcu radix tree patches break page migration
  2006-08-08  5:45     ` Nick Piggin
  2006-08-08 14:53       ` Lee Schermerhorn
@ 2006-08-08 16:19       ` Christoph Lameter
  2006-08-08 17:25         ` Andrew Morton
  1 sibling, 1 reply; 9+ messages in thread
From: Christoph Lameter @ 2006-08-08 16:19 UTC (permalink / raw)
  To: akpm; +Cc: Nick Piggin, linux-mm, Lee Schermerhorn

On Tue, 8 Aug 2006, Nick Piggin wrote:

> Let me work out what I'm doing wrong. In the meantime if you could send
> that patch to akpm as a fixup, that would keep you running. Thanks guys.


page migration: replace radix_tree_lookup_slot with radix_tree_lookup

The radix tree rcu code runs into trouble when we use
radix_tree_lookup_slot and use the slot to update the page reference.
Use radix_tree_lookup instead and update the page reference via
radix_tree_delete and radix_tree_insert.

Signed-off-by: Christoph Lameter <clameter@sgi.com>

Index: linux-2.6.18-rc3-mm2/mm/migrate.c
===================================================================
--- linux-2.6.18-rc3-mm2.orig/mm/migrate.c	2006-08-07 20:21:12.985022791 -0700
+++ linux-2.6.18-rc3-mm2/mm/migrate.c	2006-08-08 09:15:29.352637207 -0700
@@ -294,7 +294,8 @@ out:
 static int migrate_page_move_mapping(struct address_space *mapping,
 		struct page *newpage, struct page *page)
 {
-	struct page **radix_pointer;
+	struct page *current_page;
+	long index;
 
 	if (!mapping) {
 		/* Anonymous page */
@@ -305,12 +306,14 @@ static int migrate_page_move_mapping(str
 
 	write_lock_irq(&mapping->tree_lock);
 
-	radix_pointer = (struct page **)radix_tree_lookup_slot(
+	index = page_index(page);
+
+	current_page = (struct page *)radix_tree_lookup(
 						&mapping->page_tree,
-						page_index(page));
+						index);
 
 	if (page_count(page) != 2 + !!PagePrivate(page) ||
-			radix_tree_deref_slot(radix_pointer) != page) {
+			current_page != page) {
 		write_unlock_irq(&mapping->tree_lock);
 		return -EAGAIN;
 	}
@@ -326,7 +329,8 @@ static int migrate_page_move_mapping(str
 	}
 #endif
 
-	radix_tree_replace_slot(radix_pointer, newpage);
+	radix_tree_delete(&mapping->page_tree, index);
+	radix_tree_insert(&mapping->page_tree, index, newpage);
 	__put_page(page);
 	write_unlock_irq(&mapping->tree_lock);
 


* Re: 2.6.18-rc3-mm2: rcu radix tree patches break page migration
  2006-08-08 16:19       ` Christoph Lameter
@ 2006-08-08 17:25         ` Andrew Morton
  2006-08-08 17:52           ` Christoph Lameter
  2006-08-09  1:39           ` Nick Piggin
  0 siblings, 2 replies; 9+ messages in thread
From: Andrew Morton @ 2006-08-08 17:25 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: Nick Piggin, linux-mm, Lee Schermerhorn

On Tue, 8 Aug 2006 09:19:08 -0700 (PDT)
Christoph Lameter <clameter@sgi.com> wrote:

> The radix tree rcu code runs into trouble when we use
> radix_tree_lookup_slot and use the slot to update the page reference.

"trouble"?  Do we know what it is?  What are the implications of this for
the rcu radix-tree patches?

Thanks.


* Re: 2.6.18-rc3-mm2: rcu radix tree patches break page migration
  2006-08-08 17:25         ` Andrew Morton
@ 2006-08-08 17:52           ` Christoph Lameter
  2006-08-09  1:39           ` Nick Piggin
  1 sibling, 0 replies; 9+ messages in thread
From: Christoph Lameter @ 2006-08-08 17:52 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Nick Piggin, linux-mm, Lee Schermerhorn

On Tue, 8 Aug 2006, Andrew Morton wrote:

> On Tue, 8 Aug 2006 09:19:08 -0700 (PDT)
> Christoph Lameter <clameter@sgi.com> wrote:
> 
> > The radix tree rcu code runs into trouble when we use
> > radix_tree_lookup_slot and use the slot to update the page reference.
> 
> "trouble"?  Do we know what it is?  What are the implications of this for
> the rcu radix-tree patches?

Nick is pondering on the reason for this and requested a patch until he 
figures it out.


* Re: 2.6.18-rc3-mm2: rcu radix tree patches break page migration
  2006-08-08 17:25         ` Andrew Morton
  2006-08-08 17:52           ` Christoph Lameter
@ 2006-08-09  1:39           ` Nick Piggin
  1 sibling, 0 replies; 9+ messages in thread
From: Nick Piggin @ 2006-08-09  1:39 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Christoph Lameter, linux-mm, Lee Schermerhorn

Andrew Morton wrote:

>On Tue, 8 Aug 2006 09:19:08 -0700 (PDT)
>Christoph Lameter <clameter@sgi.com> wrote:
>
>
>>The radix tree rcu code runs into trouble when we use
>>radix_tree_lookup_slot and use the slot to update the page reference.
>>
>
>"trouble"?  Do we know what it is?  What are the implications of this for
>the rcu radix-tree patches?
>

I must have broken lookup_slot (luckily regular pagecache ops are not
affected). I'm looking at it...

