* Re: fix iounmap and a pageattr memleak (x86 and x86-64)
@ 2004-11-02 21:21 Dave Hansen
2004-11-02 22:07 ` Andrea Arcangeli
2004-11-02 22:34 ` Jason Baron
0 siblings, 2 replies; 26+ messages in thread
From: Dave Hansen @ 2004-11-02 21:21 UTC (permalink / raw)
To: andrea; +Cc: linux-mm, linux-kernel, Andi Kleen, Andrew Morton
This patch:
> From: Andrea Arcangeli <andrea@novell.com>
>
> - fix silent memleak in the pageattr code that I found while searching
> for the bug Andi fixed in the second patch below (basically reference
> counting in split page was done on the pmd instead of the pte).
>
> - Part of this patch is also needed to make the above work on x86 (otherwise
> one of my new above BUGS() will trigger signalling the fact a bug was
> there). The below patch creates a subtle dependency that (_PAGE_PCD << 24)
> must not be zero. It's not the cleanest thing ever, but since it's an
> hardware bitflag I doubt it's going to break.
>
> Signed-off-by: Andi Kleen <ak@suse.de>
> Signed-off-by: Andrea Arcangeli <andrea@novell.com>
> Signed-off-by: Andrew Morton <akpm@osdl.org>
> ---
>
> 25-akpm/arch/i386/mm/ioremap.c | 4 ++--
> 25-akpm/arch/i386/mm/pageattr.c | 13 +++++++------
> 25-akpm/arch/x86_64/mm/ioremap.c | 14 +++++++-------
> 25-akpm/arch/x86_64/mm/pageattr.c | 23 ++++++++++++++---------
> 4 files changed, 30 insertions(+), 24 deletions(-)
is hitting this BUG() during bootup:
/* memleak and potential failed 2M page regeneration */
BUG_ON(!page_count(kpte_page));
in 2.6.10-rc1-mm2.
Dentry cache hash table entries: 131072 (order: 7, 524288 bytes)
Inode-cache hash table entries: 65536 (order: 6, 262144 bytes)
Memory: 511144k/524288k available (1856k kernel code, 12608k reserved,
1186k data, 164k init, 0k highmem)
Checking if this processor honours the WP bit even in supervisor mode... Ok.
Mount-cache hash table entries: 512 (order: 0, 4096 bytes)
------------[ cut here ]------------
kernel BUG at arch/i386/mm/pageattr.c:136!
invalid operand: 0000 [#1]
SMP DEBUG_PAGEALLOC
Modules linked in:
CPU: 0
EIP: 0060:[<c0113f48>] Not tainted VLI
EFLAGS: 00010046 (2.6.10-rc1-mm2)
EIP is at __change_page_attr+0x28c/0x358
eax: ffffffff ebx: 017ff163 ecx: 00000000 edx: c10001e0
esi: 00000000 edi: c000fff8 ebp: c10001e0 esp: c03f9d98
ds: 007b es: 007b ss: 0068
Process swapper (pid: 0, threadinfo=c03f9000 task=c0345b40)
Stack: c102ffe0 00000000 00000000 00000046 c0113ceb c17f9000 c102ff20
c102ff20 017ff163 00000000 017ff163 00000000 c0113ceb c17fe000
c102ffc0 c102ffc0 c1000000 c17ff000 c000fff8 017ff163 00000000
017ff163 00000000 00000000
Call Trace:
[<c0113ceb>] __change_page_attr+0x2f/0x358
[<c0113ceb>] __change_page_attr+0x2f/0x358
[<c011404a>] change_page_attr+0x36/0x54
[<c0114148>] kernel_map_pages+0x30/0x5f
[<c0137d80>] __alloc_pages+0x340/0x350
[<c0137dad>] __get_free_pages+0x1d/0x30
[<c013adfa>] kmem_getpages+0x26/0xd4
[<c013c221>] cache_grow+0xb1/0x150
[<c013c84a>] cache_alloc_refill+0x232/0x280
[<c013ccbe>] kmem_cache_alloc+0x5a/0x78
[<c01d4970>] idr_pre_get+0x1c/0x44
[<c0181b60>] sysfs_fill_super+0x0/0xa4
[<c01572cc>] set_anon_super+0x10/0xb8
[<c0156cff>] sget+0xb3/0x148
[<c0181b60>] sysfs_fill_super+0x0/0xa4
[<c0157636>] get_sb_single+0x26/0x8c
[<c0157608>] compare_single+0x0/0x8
[<c01572bc>] set_anon_super+0x0/0xb8
[<c0181c1d>] sysfs_get_sb+0x19/0x1d
[<c0181b60>] sysfs_fill_super+0x0/0xa4
[<c01576ea>] do_kern_mount+0x4e/0xd0
[<c015777d>] kern_mount+0x11/0x15
[<c0409962>] sysfs_init+0x1e/0x50
[<c0409430>] mnt_init+0xb4/0xc0
[<c040917a>] vfs_caches_init+0x7e/0x94
[<c03fa831>] start_kernel+0x12d/0x150
Code: c7 0f 75 f5 f0 ff 4d 04 eb 08 0f 0b 85 00 88 c1 2d c0 8b 45 00 89
ea f6 c4 80 74 07 8b 55 0c 8d 74 26 00 8b 42 04 83 f8 ff 75 08 <0f> 0b
88 00 88 c1 2d c0 a1 ac 6c 34 c0 a8 08 0f 84 aa 00 00 00
I'm tracking down now to see exactly what's going on. This just a
regular, plain 4-way x86 box with 4GB of RAM. Removing that BUG_ON()
lets me boot just fine.
kpte: c000fff8
address: c17ff000
kpte_page: c10001e0
pgprot_val(prot): 00000163
pgprot_val(PAGE_KERNEL): 00000163
(pte_val(*kpte) & _PAGE_PSE): 00000000
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"aart@kvack.org"> aart@kvack.org </a>
^ permalink raw reply [flat|nested] 26+ messages in thread* Re: fix iounmap and a pageattr memleak (x86 and x86-64) 2004-11-02 21:21 fix iounmap and a pageattr memleak (x86 and x86-64) Dave Hansen @ 2004-11-02 22:07 ` Andrea Arcangeli 2004-11-02 22:21 ` Dave Hansen 2004-11-02 22:45 ` Dave Hansen 2004-11-02 22:34 ` Jason Baron 1 sibling, 2 replies; 26+ messages in thread From: Andrea Arcangeli @ 2004-11-02 22:07 UTC (permalink / raw) To: Dave Hansen; +Cc: linux-mm, linux-kernel, Andi Kleen, Andrew Morton On Tue, Nov 02, 2004 at 01:21:49PM -0800, Dave Hansen wrote: > This patch: > > >From: Andrea Arcangeli <andrea@novell.com> > > > >- fix silent memleak in the pageattr code that I found while searching > > for the bug Andi fixed in the second patch below (basically reference > > counting in split page was done on the pmd instead of the pte). > > > >- Part of this patch is also needed to make the above work on x86 > >(otherwise > > one of my new above BUGS() will trigger signalling the fact a bug was > > there). The below patch creates a subtle dependency that (_PAGE_PCD << > > 24) > > must not be zero. It's not the cleanest thing ever, but since it's an > > hardware bitflag I doubt it's going to break. > > > >Signed-off-by: Andi Kleen <ak@suse.de> > >Signed-off-by: Andrea Arcangeli <andrea@novell.com> > >Signed-off-by: Andrew Morton <akpm@osdl.org> > >--- > > > > 25-akpm/arch/i386/mm/ioremap.c | 4 ++-- > > 25-akpm/arch/i386/mm/pageattr.c | 13 +++++++------ > > 25-akpm/arch/x86_64/mm/ioremap.c | 14 +++++++------- > > 25-akpm/arch/x86_64/mm/pageattr.c | 23 ++++++++++++++--------- > > 4 files changed, 30 insertions(+), 24 deletions(-) > > is hitting this BUG() during bootup: > > /* memleak and potential failed 2M page regeneration */ > BUG_ON(!page_count(kpte_page)); > > in 2.6.10-rc1-mm2. > > Dentry cache hash table entries: 131072 (order: 7, 524288 bytes) > Inode-cache hash table entries: 65536 (order: 6, 262144 bytes) > Memory: 511144k/524288k available (1856k kernel code, 12608k reserved, > 1186k data, 164k init, 0k highmem) > Checking if this processor honours the WP bit even in supervisor mode... Ok. > Mount-cache hash table entries: 512 (order: 0, 4096 bytes) > ------------[ cut here ]------------ > kernel BUG at arch/i386/mm/pageattr.c:136! > invalid operand: 0000 [#1] > SMP DEBUG_PAGEALLOC > Modules linked in: > CPU: 0 > EIP: 0060:[<c0113f48>] Not tainted VLI > EFLAGS: 00010046 (2.6.10-rc1-mm2) > EIP is at __change_page_attr+0x28c/0x358 > eax: ffffffff ebx: 017ff163 ecx: 00000000 edx: c10001e0 > esi: 00000000 edi: c000fff8 ebp: c10001e0 esp: c03f9d98 > ds: 007b es: 007b ss: 0068 > Process swapper (pid: 0, threadinfo=c03f9000 task=c0345b40) > Stack: c102ffe0 00000000 00000000 00000046 c0113ceb c17f9000 c102ff20 > c102ff20 017ff163 00000000 017ff163 00000000 c0113ceb c17fe000 > c102ffc0 c102ffc0 c1000000 c17ff000 c000fff8 017ff163 00000000 > 017ff163 00000000 00000000 > Call Trace: > [<c0113ceb>] __change_page_attr+0x2f/0x358 > [<c0113ceb>] __change_page_attr+0x2f/0x358 > [<c011404a>] change_page_attr+0x36/0x54 > [<c0114148>] kernel_map_pages+0x30/0x5f > [<c0137d80>] __alloc_pages+0x340/0x350 > [<c0137dad>] __get_free_pages+0x1d/0x30 > [<c013adfa>] kmem_getpages+0x26/0xd4 > [<c013c221>] cache_grow+0xb1/0x150 > [<c013c84a>] cache_alloc_refill+0x232/0x280 > [<c013ccbe>] kmem_cache_alloc+0x5a/0x78 > [<c01d4970>] idr_pre_get+0x1c/0x44 > [<c0181b60>] sysfs_fill_super+0x0/0xa4 > [<c01572cc>] set_anon_super+0x10/0xb8 > [<c0156cff>] sget+0xb3/0x148 > [<c0181b60>] sysfs_fill_super+0x0/0xa4 > [<c0157636>] get_sb_single+0x26/0x8c > [<c0157608>] compare_single+0x0/0x8 > [<c01572bc>] set_anon_super+0x0/0xb8 > [<c0181c1d>] sysfs_get_sb+0x19/0x1d > [<c0181b60>] sysfs_fill_super+0x0/0xa4 > [<c01576ea>] do_kern_mount+0x4e/0xd0 > [<c015777d>] kern_mount+0x11/0x15 > [<c0409962>] sysfs_init+0x1e/0x50 > [<c0409430>] mnt_init+0xb4/0xc0 > [<c040917a>] vfs_caches_init+0x7e/0x94 > [<c03fa831>] start_kernel+0x12d/0x150 > Code: c7 0f 75 f5 f0 ff 4d 04 eb 08 0f 0b 85 00 88 c1 2d c0 8b 45 00 89 > ea f6 c4 80 74 07 8b 55 0c 8d 74 26 00 8b 42 04 83 f8 ff 75 08 <0f> 0b > 88 00 88 c1 2d c0 a1 ac 6c 34 c0 a8 08 0f 84 aa 00 00 00 > > I'm tracking down now to see exactly what's going on. This just a > regular, plain 4-way x86 box with 4GB of RAM. Removing that BUG_ON() > lets me boot just fine. you've a debugging option enabled. I'm afraid somebody wrote common code with the hope that change_page_attr had a natural universal API. change_page_attr API has alwasy required you a certain level of symmetry to work. Now I made it completley symmetric just to make it simpler to use and I added BUG where the previous code would screwup. If you do something like this you'll trigger a BUG_ON too: change_page_attr(page, PAGE_KERNEL_NOCACHE) change_page_attr(page, PAGE_KERNEL) change_page_attr(page, PAGE_KERNEL) the last one will BUG(). At least the new API with these changes will not leak memory anymore as far as you retain simmetry (even if you run the change_page_attr on the same page from different context). I suspect that such debugging option has never worked right. pageattr will need fixing so that to provide a natural universal API that keeps track of each page. Previously this would have failed: A change_page_attr(page1, UNCACHED) -> pte page_count = 2 B change_page_attr(page1, WHATEVER) -> pte page_count still 2 C change_page_attr(page2, UNCACHED) -> pte page count = 3 A change_page_attr(page1, PAGE_KERNEL) -> pte page count = 2 B change_page_attr(page1, PAGE_KERNEL) -> pte page count = 1 ->screwup Now the above can work fine. But still examples like the above one will trigger bugs since they're simply violating the current restricted change_page_attr API. I believe if you disable the DEBUG_PAGEALLOC it should work fine again. Still I recommend investigating _why_ debug_pagealloc is violating the API. It might not be necessary to wait for the pageattr universal feature to make DEBUG_PAGEALLOC work safe. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"aart@kvack.org"> aart@kvack.org </a> ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: fix iounmap and a pageattr memleak (x86 and x86-64) 2004-11-02 22:07 ` Andrea Arcangeli @ 2004-11-02 22:21 ` Dave Hansen 2004-11-02 22:29 ` Andrew Morton 2004-11-03 0:54 ` Andrea Arcangeli 2004-11-02 22:45 ` Dave Hansen 1 sibling, 2 replies; 26+ messages in thread From: Dave Hansen @ 2004-11-02 22:21 UTC (permalink / raw) To: Andrea Arcangeli; +Cc: linux-mm, linux-kernel, Andi Kleen, Andrew Morton Andrea Arcangeli wrote: > Still I recommend investigating _why_ debug_pagealloc is violating the > API. It might not be necessary to wait for the pageattr universal > feature to make DEBUG_PAGEALLOC work safe. OK, good to know. But, for now, can we pull this out of -mm? Or, at least that BUG_ON()? DEBUG_PAGEALLOC is an awfully powerful debugging tool to just be removed like this. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"aart@kvack.org"> aart@kvack.org </a> ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: fix iounmap and a pageattr memleak (x86 and x86-64) 2004-11-02 22:21 ` Dave Hansen @ 2004-11-02 22:29 ` Andrew Morton 2004-11-02 22:34 ` Dave Hansen 2004-11-03 0:54 ` Andrea Arcangeli 1 sibling, 1 reply; 26+ messages in thread From: Andrew Morton @ 2004-11-02 22:29 UTC (permalink / raw) To: Dave Hansen; +Cc: andrea, linux-mm, linux-kernel, ak Dave Hansen <haveblue@us.ibm.com> wrote: > > Andrea Arcangeli wrote: > > Still I recommend investigating _why_ debug_pagealloc is violating the > > API. It might not be necessary to wait for the pageattr universal > > feature to make DEBUG_PAGEALLOC work safe. > > OK, good to know. But, for now, can we pull this out of -mm? Or, at > least that BUG_ON()? DEBUG_PAGEALLOC is an awfully powerful debugging > tool to just be removed like this. If we make it a WARN_ON, will that cause a complete storm of output? -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"aart@kvack.org"> aart@kvack.org </a> ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: fix iounmap and a pageattr memleak (x86 and x86-64) 2004-11-02 22:29 ` Andrew Morton @ 2004-11-02 22:34 ` Dave Hansen 0 siblings, 0 replies; 26+ messages in thread From: Dave Hansen @ 2004-11-02 22:34 UTC (permalink / raw) To: Andrew Morton; +Cc: andrea, linux-mm, linux-kernel, ak Andrew Morton wrote: > Dave Hansen <haveblue@us.ibm.com> wrote: > >>Andrea Arcangeli wrote: >> >>>Still I recommend investigating _why_ debug_pagealloc is violating the >>>API. It might not be necessary to wait for the pageattr universal >>>feature to make DEBUG_PAGEALLOC work safe. >> >>OK, good to know. But, for now, can we pull this out of -mm? Or, at >>least that BUG_ON()? DEBUG_PAGEALLOC is an awfully powerful debugging >>tool to just be removed like this. > > If we make it a WARN_ON, will that cause a complete storm of output? Yeah, just tried it. I hit a couple hundred of them before I got to init. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"aart@kvack.org"> aart@kvack.org </a> ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: fix iounmap and a pageattr memleak (x86 and x86-64) 2004-11-02 22:21 ` Dave Hansen 2004-11-02 22:29 ` Andrew Morton @ 2004-11-03 0:54 ` Andrea Arcangeli 1 sibling, 0 replies; 26+ messages in thread From: Andrea Arcangeli @ 2004-11-03 0:54 UTC (permalink / raw) To: Dave Hansen; +Cc: linux-mm, linux-kernel, Andi Kleen, Andrew Morton On Tue, Nov 02, 2004 at 02:21:35PM -0800, Dave Hansen wrote: > OK, good to know. But, for now, can we pull this out of -mm? Or, at I doubt it's a good idea unless 1) you want the silent memleak back or 2) you found something wrong in the fix itself. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"aart@kvack.org"> aart@kvack.org </a> ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: fix iounmap and a pageattr memleak (x86 and x86-64) 2004-11-02 22:07 ` Andrea Arcangeli 2004-11-02 22:21 ` Dave Hansen @ 2004-11-02 22:45 ` Dave Hansen 2004-11-02 23:00 ` Dave Hansen 2004-11-02 23:04 ` Andrew Morton 1 sibling, 2 replies; 26+ messages in thread From: Dave Hansen @ 2004-11-02 22:45 UTC (permalink / raw) To: Andrea Arcangeli; +Cc: linux-mm, linux-kernel, Andi Kleen, Andrew Morton [-- Attachment #1: Type: text/plain, Size: 398 bytes --] Andrea Arcangeli wrote: > Still I recommend investigating _why_ debug_pagealloc is violating the > API. It might not be necessary to wait for the pageattr universal > feature to make DEBUG_PAGEALLOC work safe. This makes the DEBUG_PAGEALLOC stuff symmetric enough to boot for me, and it's pretty damn simple. Any ideas for doing this without bloating 'struct page', even in the debugging case? [-- Attachment #2: Z3-page_debugging.patch --] [-- Type: text/plain, Size: 2413 bytes --] --- memhotplug1-dave/arch/i386/mm/pageattr.c | 7 +++++-- memhotplug1-dave/include/linux/mm.h | 3 +++ memhotplug1-dave/mm/page_alloc.c | 5 ++++- 3 files changed, 12 insertions(+), 3 deletions(-) diff -puN include/linux/mm.h~Z3-page_debugging include/linux/mm.h --- memhotplug1/include/linux/mm.h~Z3-page_debugging 2004-11-02 14:29:51.000000000 -0800 +++ memhotplug1-dave/include/linux/mm.h 2004-11-02 14:37:08.000000000 -0800 @@ -245,6 +245,9 @@ struct page { void *virtual; /* Kernel virtual address (NULL if not kmapped, ie. highmem) */ #endif /* WANT_PAGE_VIRTUAL */ +#ifdef CONFIG_DEBUG_PAGEALLOC + int mapped; +#endif }; #ifdef CONFIG_MEMORY_HOTPLUG diff -puN arch/i386/mm/pageattr.c~Z3-page_debugging arch/i386/mm/pageattr.c --- memhotplug1/arch/i386/mm/pageattr.c~Z3-page_debugging 2004-11-02 14:31:07.000000000 -0800 +++ memhotplug1-dave/arch/i386/mm/pageattr.c 2004-11-02 14:41:00.000000000 -0800 @@ -153,7 +153,7 @@ __change_page_attr(struct page *page, pg printk("pgprot_val(PAGE_KERNEL): %08lx\n", pgprot_val(PAGE_KERNEL)); printk("(pte_val(*kpte) & _PAGE_PSE): %08lx\n", (pte_val(*kpte) & _PAGE_PSE)); printk("path: %d\n", path); - BUG(); + WARN_ON(1); } if (cpu_has_pse && (page_count(kpte_page) == 1)) { @@ -224,7 +224,10 @@ void kernel_map_pages(struct page *page, /* the return value is ignored - the calls cannot fail, * large pages are disabled at boot time. */ - change_page_attr(page, numpages, enable ? PAGE_KERNEL : __pgprot(0)); + if (enable && !page->mapped) + change_page_attr(page, numpages, PAGE_KERNEL); + else if (!enable && page->mapped) + change_page_attr(page, numpages, __pgprot(0)); /* we should perform an IPI and flush all tlbs, * but that can deadlock->flush only current cpu. */ diff -puN mm/page_alloc.c~Z3-page_debugging mm/page_alloc.c --- memhotplug1/mm/page_alloc.c~Z3-page_debugging 2004-11-02 14:37:53.000000000 -0800 +++ memhotplug1-dave/mm/page_alloc.c 2004-11-02 14:42:56.000000000 -0800 @@ -1840,8 +1840,11 @@ void __devinit memmap_init_zone(unsigned INIT_LIST_HEAD(&page->lru); #ifdef WANT_PAGE_VIRTUAL /* The shift won't overflow because ZONE_NORMAL is below 4G. */ - if (!is_highmem_idx(zone)) + if (!is_highmem_idx(zone)) { set_page_address(page, __va(start_pfn << PAGE_SHIFT)); + page->mapped = 1; + } else + page->mapped = 0; #endif start_pfn++; } _ ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: fix iounmap and a pageattr memleak (x86 and x86-64) 2004-11-02 22:45 ` Dave Hansen @ 2004-11-02 23:00 ` Dave Hansen 2004-11-03 1:35 ` Andrea Arcangeli 2004-11-02 23:04 ` Andrew Morton 1 sibling, 1 reply; 26+ messages in thread From: Dave Hansen @ 2004-11-02 23:00 UTC (permalink / raw) To: Dave Hansen Cc: Andrea Arcangeli, linux-mm, linux-kernel, Andi Kleen, Andrew Morton BTW, please don't anyone going even trying to apply that piece of crap I just sent out, I just wanted to demonstrate what solves my immediate problem. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"aart@kvack.org"> aart@kvack.org </a> ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: fix iounmap and a pageattr memleak (x86 and x86-64) 2004-11-02 23:00 ` Dave Hansen @ 2004-11-03 1:35 ` Andrea Arcangeli 2004-11-03 1:43 ` Dave Hansen 0 siblings, 1 reply; 26+ messages in thread From: Andrea Arcangeli @ 2004-11-03 1:35 UTC (permalink / raw) To: Dave Hansen; +Cc: linux-mm, linux-kernel, Andi Kleen, Andrew Morton On Tue, Nov 02, 2004 at 03:00:26PM -0800, Dave Hansen wrote: > just sent out, I just wanted to demonstrate what solves my immediate > problem. sure ;) that's like disabling the config option, the only point of change_page_attr is to split the direct mapping, it does nothing on highmem, it actually BUGS() (and it wasn't one of my new bugs ;): #ifdef CONFIG_HIGHMEM if (page >= highmem_start_page) BUG(); #endif -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"aart@kvack.org"> aart@kvack.org </a> ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: fix iounmap and a pageattr memleak (x86 and x86-64) 2004-11-03 1:35 ` Andrea Arcangeli @ 2004-11-03 1:43 ` Dave Hansen 2004-11-03 2:26 ` Andrea Arcangeli 0 siblings, 1 reply; 26+ messages in thread From: Dave Hansen @ 2004-11-03 1:43 UTC (permalink / raw) To: Andrea Arcangeli; +Cc: linux-mm, linux-kernel, Andi Kleen, Andrew Morton Andrea Arcangeli wrote: > On Tue, Nov 02, 2004 at 03:00:26PM -0800, Dave Hansen wrote: > >>just sent out, I just wanted to demonstrate what solves my immediate >>problem. > > sure ;) > > that's like disabling the config option, the only point of > change_page_attr is to split the direct mapping, it does nothing on > highmem, it actually BUGS() (and it wasn't one of my new bugs ;): > > #ifdef CONFIG_HIGHMEM > if (page >= highmem_start_page) > BUG(); > #endif Oh, crap. I meant to clear ->mapped when change_attr(__pgprot(0)) was done on it, and set it when it was changed back. Doing that correctly preserves the symmetry, right? -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"aart@kvack.org"> aart@kvack.org </a> ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: fix iounmap and a pageattr memleak (x86 and x86-64) 2004-11-03 1:43 ` Dave Hansen @ 2004-11-03 2:26 ` Andrea Arcangeli 2004-11-03 2:48 ` Dave Hansen 0 siblings, 1 reply; 26+ messages in thread From: Andrea Arcangeli @ 2004-11-03 2:26 UTC (permalink / raw) To: Dave Hansen; +Cc: linux-mm, linux-kernel, Andi Kleen, Andrew Morton On Tue, Nov 02, 2004 at 05:43:45PM -0800, Dave Hansen wrote: > Oh, crap. I meant to clear ->mapped when change_attr(__pgprot(0)) was > done on it, and set it when it was changed back. Doing that correctly > preserves the symmetry, right? yes it should. I agree with Andrew a bitflag would be enough. I'd call it PG_prot_none. I realized if the page is in the freelist it's like if it's reserved, since you're guaranteed there's no other pageattr working on it (assuming every other pageattr is symmetric too, which we always depend on). So I wonder why it's not symmetric already, despite the lack of page->mapped. Would be nice to dump_stack when a __pg_prot(0) && page->mapped triggers. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"aart@kvack.org"> aart@kvack.org </a> ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: fix iounmap and a pageattr memleak (x86 and x86-64) 2004-11-03 2:26 ` Andrea Arcangeli @ 2004-11-03 2:48 ` Dave Hansen 2004-11-03 3:05 ` Andrea Arcangeli 0 siblings, 1 reply; 26+ messages in thread From: Dave Hansen @ 2004-11-03 2:48 UTC (permalink / raw) To: Andrea Arcangeli; +Cc: linux-mm, linux-kernel, Andi Kleen, Andrew Morton Andrea Arcangeli wrote: > On Tue, Nov 02, 2004 at 05:43:45PM -0800, Dave Hansen wrote: > >>Oh, crap. I meant to clear ->mapped when change_attr(__pgprot(0)) was >>done on it, and set it when it was changed back. Doing that correctly >>preserves the symmetry, right? > > yes it should. I agree with Andrew a bitflag would be enough. I'd call > it PG_prot_none. It should be enough, but I don't think we want to waste a bitflag for something that's only needed for debugging anyway. They're getting precious these days. Might as well just bloat the kernel some more when the alloc debugging is on. I'll see what I can do to get some backtraces of the __pg_prot(0) && page->mapped cases. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"aart@kvack.org"> aart@kvack.org </a> ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: fix iounmap and a pageattr memleak (x86 and x86-64) 2004-11-03 2:48 ` Dave Hansen @ 2004-11-03 3:05 ` Andrea Arcangeli 2004-11-03 19:37 ` Dave Hansen 2004-11-05 0:02 ` Dave Hansen 0 siblings, 2 replies; 26+ messages in thread From: Andrea Arcangeli @ 2004-11-03 3:05 UTC (permalink / raw) To: Dave Hansen; +Cc: linux-mm, linux-kernel, Andi Kleen, Andrew Morton On Tue, Nov 02, 2004 at 06:48:09PM -0800, Dave Hansen wrote: > It should be enough, but I don't think we want to waste a bitflag for > something that's only needed for debugging anyway. They're getting > precious these days. Might as well just bloat the kernel some more when > the alloc debugging is on. You can leave the bitflag the end (number 31) under the #ifdef. Using the bitflag is less likely to create an heisenbug (due different layout of the ram ;). > I'll see what I can do to get some backtraces of the __pg_prot(0) && > page->mapped cases. thanks! -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"aart@kvack.org"> aart@kvack.org </a> ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: fix iounmap and a pageattr memleak (x86 and x86-64) 2004-11-03 3:05 ` Andrea Arcangeli @ 2004-11-03 19:37 ` Dave Hansen 2004-11-05 0:02 ` Dave Hansen 1 sibling, 0 replies; 26+ messages in thread From: Dave Hansen @ 2004-11-03 19:37 UTC (permalink / raw) To: Andrea Arcangeli; +Cc: linux-mm, linux-kernel, Andi Kleen, Andrew Morton Andrea Arcangeli wrote: >>I'll see what I can do to get some backtraces of the __pg_prot(0) && >>page->mapped cases. These appear to be the 2 most common paths: cache_init_objs() and kfree(). But, the path that I'm hitting now appears to be that something got page_count(kpte_page) to -1 (*NOT* page->_count), and then the BUG_ON() trips after a get_page() is done on it. I'm tracking this down now. kpte: c0011000 address: c1a00000 kpte_page: c1000264 split: ffffffff pgprot_val(prot): 00000000 pgprot_val(PAGE_KERNEL): 00000163 (pte_val(*kpte) & _PAGE_PSE): 00000000 path: 1 Badness in __change_page_attr at arch/i386/mm/pageattr.c:156 [<c0114e43>] __change_page_attr+0x443/0x5a4 [<c01150d9>] kernel_map_pages+0x31/0x80 [<c0114fda>] change_page_attr+0x36/0x54 [<c01150f3>] kernel_map_pages+0x4b/0x80 [<c013de18>] cache_init_objs+0x194/0x1d8 [<c013e014>] cache_grow+0xe8/0x154 [<c013e74a>] cache_alloc_refill+0x226/0x274 [<c013edb3>] __kmalloc+0x8b/0xbc [<c0181302>] proc_create+0x76/0xcc [<c01814c3>] create_proc_entry+0x6b/0xb4 [<c0121c3b>] register_proc_table+0xcb/0x110 [<c0121c67>] register_proc_table+0xf7/0x110 [<c0420994>] sysctl_init+0x10/0x1c [<c0413934>] do_basic_setup+0x14/0x20 [<c01004ec>] init+0xa8/0x15c [<c0100444>] init+0x0/0x15c [<c0102305>] kernel_thread_helper+0x5/0xc kpte: c0006978 address: c052f000 kpte_page: c10000d8 split: ffffffff pgprot_val(prot): 00000000 pgprot_val(PAGE_KERNEL): 00000163 (pte_val(*kpte) & _PAGE_PSE): 00000000 path: 1 Badness in __change_page_attr at arch/i386/mm/pageattr.c:156 [<c0114e43>] __change_page_attr+0x443/0x5a4 [<c02367db>] scsi_run_queue+0xaf/0xb8 [<c0114fda>] change_page_attr+0x36/0x54 [<c01150f3>] kernel_map_pages+0x4b/0x80 [<c013e422>] cache_free_debugcheck+0x2a6/0x2c0 [<c02385a5>] scsi_probe_and_add_lun+0x135/0x188 [<c013f00b>] kfree+0x9f/0xdc [<c02385a5>] scsi_probe_and_add_lun+0x135/0x188 [<c02385a5>] scsi_probe_and_add_lun+0x135/0x188 [<c0238ac7>] scsi_scan_target+0x63/0xbc [<c0238b60>] scsi_scan_channel+0x40/0x68 [<c0238bfe>] scsi_scan_host_selected+0x76/0xb0 [<c0238c4a>] scsi_scan_host+0x12/0x18 [<c024b7ff>] ahc_linux_register_host+0x283/0x290 [<c01857e9>] sysfs_create_file+0x41/0x48 [<c01e4392>] pci_create_newid_file+0x1a/0x24 [<c01e470a>] pci_populate_driver_dir+0xa/0x10 [<c01e4786>] pci_register_driver+0x76/0x88 [<c024a867>] ahc_linux_detect+0x3b/0x60 [<c042750a>] ahc_linux_init+0xa/0x30 [<c04138ca>] do_initcalls+0x66/0xbc [<c041393e>] do_basic_setup+0x1e/0x20 [<c01004ec>] init+0xa8/0x15c [<c0100444>] init+0x0/0x15c [<c0102305>] kernel_thread_helper+0x5/0xc -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"aart@kvack.org"> aart@kvack.org </a> ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: fix iounmap and a pageattr memleak (x86 and x86-64) 2004-11-03 3:05 ` Andrea Arcangeli 2004-11-03 19:37 ` Dave Hansen @ 2004-11-05 0:02 ` Dave Hansen 2004-11-05 0:40 ` Dave Hansen 1 sibling, 1 reply; 26+ messages in thread From: Dave Hansen @ 2004-11-05 0:02 UTC (permalink / raw) To: Andrea Arcangeli Cc: linux-mm, Linux Kernel Mailing List, Andi Kleen, Andrew Morton [-- Attachment #1: Type: text/plain, Size: 393 bytes --] Hi Andrea, Are you sure that BUG_ON() should be for !page_count(kpte_page)? I noticed that I was getting some BUGs when the count went back to 0, but the pte page was completely full with valid ptes. I *think* it's correct to make it: BUG_ON(page_count(kpte_page) < 0); Or, I guess we could keep the old BUG_ON(), and put it inside the else block with the __put_page(). -- Dave [-- Attachment #2: Z3-page_debugging.patch --] [-- Type: text/x-patch, Size: 2413 bytes --] --- memhotplug1-dave/arch/i386/mm/pageattr.c | 7 +++++-- memhotplug1-dave/include/linux/mm.h | 3 +++ memhotplug1-dave/mm/page_alloc.c | 5 ++++- 3 files changed, 12 insertions(+), 3 deletions(-) diff -puN include/linux/mm.h~Z3-page_debugging include/linux/mm.h --- memhotplug1/include/linux/mm.h~Z3-page_debugging 2004-11-02 14:29:51.000000000 -0800 +++ memhotplug1-dave/include/linux/mm.h 2004-11-02 14:37:08.000000000 -0800 @@ -245,6 +245,9 @@ struct page { void *virtual; /* Kernel virtual address (NULL if not kmapped, ie. highmem) */ #endif /* WANT_PAGE_VIRTUAL */ +#ifdef CONFIG_DEBUG_PAGEALLOC + int mapped; +#endif }; #ifdef CONFIG_MEMORY_HOTPLUG diff -puN arch/i386/mm/pageattr.c~Z3-page_debugging arch/i386/mm/pageattr.c --- memhotplug1/arch/i386/mm/pageattr.c~Z3-page_debugging 2004-11-02 14:31:07.000000000 -0800 +++ memhotplug1-dave/arch/i386/mm/pageattr.c 2004-11-02 14:41:00.000000000 -0800 @@ -153,7 +153,7 @@ __change_page_attr(struct page *page, pg printk("pgprot_val(PAGE_KERNEL): %08lx\n", pgprot_val(PAGE_KERNEL)); printk("(pte_val(*kpte) & _PAGE_PSE): %08lx\n", (pte_val(*kpte) & _PAGE_PSE)); printk("path: %d\n", path); - BUG(); + WARN_ON(1); } if (cpu_has_pse && (page_count(kpte_page) == 1)) { @@ -224,7 +224,10 @@ void kernel_map_pages(struct page *page, /* the return value is ignored - the calls cannot fail, * large pages are disabled at boot time. */ - change_page_attr(page, numpages, enable ? PAGE_KERNEL : __pgprot(0)); + if (enable && !page->mapped) + change_page_attr(page, numpages, PAGE_KERNEL); + else if (!enable && page->mapped) + change_page_attr(page, numpages, __pgprot(0)); /* we should perform an IPI and flush all tlbs, * but that can deadlock->flush only current cpu. */ diff -puN mm/page_alloc.c~Z3-page_debugging mm/page_alloc.c --- memhotplug1/mm/page_alloc.c~Z3-page_debugging 2004-11-02 14:37:53.000000000 -0800 +++ memhotplug1-dave/mm/page_alloc.c 2004-11-02 14:42:56.000000000 -0800 @@ -1840,8 +1840,11 @@ void __devinit memmap_init_zone(unsigned INIT_LIST_HEAD(&page->lru); #ifdef WANT_PAGE_VIRTUAL /* The shift won't overflow because ZONE_NORMAL is below 4G. */ - if (!is_highmem_idx(zone)) + if (!is_highmem_idx(zone)) { set_page_address(page, __va(start_pfn << PAGE_SHIFT)); + page->mapped = 1; + } else + page->mapped = 0; #endif start_pfn++; } _ ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: fix iounmap and a pageattr memleak (x86 and x86-64) 2004-11-05 0:02 ` Dave Hansen @ 2004-11-05 0:40 ` Dave Hansen 2004-11-05 0:53 ` Andrea Arcangeli 0 siblings, 1 reply; 26+ messages in thread From: Dave Hansen @ 2004-11-05 0:40 UTC (permalink / raw) To: Andrea Arcangeli Cc: linux-mm, Linux Kernel Mailing List, Andi Kleen, Andrew Morton [-- Attachment #1: Type: text/plain, Size: 67 bytes --] I attached the wrong patch. Here's what I meant to send. -- Dave [-- Attachment #2: Z0-leaks_only_on_negative.patch --] [-- Type: text/x-patch, Size: 674 bytes --] --- memhotplug1-dave/arch/i386/mm/pageattr.c | 2 +- 1 files changed, 1 insertion(+), 1 deletion(-) diff -puN arch/i386/mm/pageattr.c~Z0-leaks_only_on_negative arch/i386/mm/pageattr.c --- memhotplug1/arch/i386/mm/pageattr.c~Z0-leaks_only_on_negative 2004-11-04 15:57:28.000000000 -0800 +++ memhotplug1-dave/arch/i386/mm/pageattr.c 2004-11-04 15:58:50.000000000 -0800 @@ -135,7 +135,7 @@ __change_page_attr(struct page *page, pg BUG(); /* memleak and potential failed 2M page regeneration */ - BUG_ON(!page_count(kpte_page)); + BUG_ON(page_count(kpte_page) < 0); if (cpu_has_pse && (page_count(kpte_page) == 1)) { list_add(&kpte_page->lru, &df_list); _ ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: fix iounmap and a pageattr memleak (x86 and x86-64) 2004-11-05 0:40 ` Dave Hansen @ 2004-11-05 0:53 ` Andrea Arcangeli 2004-11-05 1:55 ` Dave Hansen 0 siblings, 1 reply; 26+ messages in thread From: Andrea Arcangeli @ 2004-11-05 0:53 UTC (permalink / raw) To: Dave Hansen Cc: linux-mm, Linux Kernel Mailing List, Andi Kleen, Andrew Morton On Thu, Nov 04, 2004 at 04:40:48PM -0800, Dave Hansen wrote: > I attached the wrong patch. > > Here's what I meant to send. > > -- Dave > > > --- > > memhotplug1-dave/arch/i386/mm/pageattr.c | 2 +- > 1 files changed, 1 insertion(+), 1 deletion(-) > > diff -puN arch/i386/mm/pageattr.c~Z0-leaks_only_on_negative arch/i386/mm/pageattr.c > --- memhotplug1/arch/i386/mm/pageattr.c~Z0-leaks_only_on_negative 2004-11-04 15:57:28.000000000 -0800 > +++ memhotplug1-dave/arch/i386/mm/pageattr.c 2004-11-04 15:58:50.000000000 -0800 > @@ -135,7 +135,7 @@ __change_page_attr(struct page *page, pg > BUG(); > > /* memleak and potential failed 2M page regeneration */ > - BUG_ON(!page_count(kpte_page)); > + BUG_ON(page_count(kpte_page) < 0); > > if (cpu_has_pse && (page_count(kpte_page) == 1)) { > list_add(&kpte_page->lru, &df_list); > _ that will hide the memleak again. Furthermore page_count cannot be < 0 unless we get a _double_ memleak. The only chance for kpte_page to be freed, is to be == 1 in that place. If kpte_page is == 1, it will be freed via list_add and the master page will be regenerated giving it a chance to get performance back. If it's 0, it means we leaked memory as far as I can tell. It's impossible a pte had a 0 page_count() and not to be in the freelist already. There is no put_page at all in that whole path, there's only a __put_page, so it's a memleak to get == 0 in there on any pte or pmd or whatever else we cannot have put in the freelist already. something else is still wrong, but knowing the above fixes it at least tells us it's not a double leak. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"aart@kvack.org"> aart@kvack.org </a> ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: fix iounmap and a pageattr memleak (x86 and x86-64) 2004-11-05 0:53 ` Andrea Arcangeli @ 2004-11-05 1:55 ` Dave Hansen 2004-11-05 2:08 ` Andrea Arcangeli 0 siblings, 1 reply; 26+ messages in thread From: Dave Hansen @ 2004-11-05 1:55 UTC (permalink / raw) To: Andrea Arcangeli Cc: linux-mm, Linux Kernel Mailing List, Andi Kleen, Andrew Morton On Thu, 2004-11-04 at 16:53, Andrea Arcangeli wrote: > The only chance for kpte_page to be freed, is to be == 1 in that place. > If kpte_page is == 1, it will be freed via list_add and the master page > will be regenerated giving it a chance to get performance back. If it's > 0, it means we leaked memory as far as I can tell. > > It's impossible a pte had a 0 page_count() and not to be in the freelist > already. There is no put_page at all in that whole path, there's only a > __put_page, so it's a memleak to get == 0 in there on any pte or pmd or > whatever else we cannot have put in the freelist already. Ahhh. I forgot about the allocator's reference on the page. However, there still seems to be something fishy here. The page that's causing trouble's pfn is 0x0000000f. It's also set as PageReserved(). We may have some imbalance with page counts when the page is PageReserved() that this is catching. I can't find any asymmetric use of kernel_map_pages(). Both the slab and the allocator appear to be behaving themselves. I don't even see a case where that particular page has a get_page() done on it before the first __change_page_attr() call on it. So, it probably still has its page_count()==0 from the original set in memmap_init_zone(). What happens when a pte page is bootmem-allocated? I *think* that's the situation that I'm hitting. In that case, we can either try to hunt down the real 'struct pages' after everything is brought up, or we can just skip the BUG_ON() if the page is reserved. Any thoughts? -- Dave -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"aart@kvack.org"> aart@kvack.org </a> ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: fix iounmap and a pageattr memleak (x86 and x86-64) 2004-11-05 1:55 ` Dave Hansen @ 2004-11-05 2:08 ` Andrea Arcangeli 2004-11-05 2:23 ` Dave Hansen 0 siblings, 1 reply; 26+ messages in thread From: Andrea Arcangeli @ 2004-11-05 2:08 UTC (permalink / raw) To: Dave Hansen Cc: linux-mm, Linux Kernel Mailing List, Andi Kleen, Andrew Morton On Thu, Nov 04, 2004 at 05:55:40PM -0800, Dave Hansen wrote: > What happens when a pte page is bootmem-allocated? I *think* that's the > situation that I'm hitting. In that case, we can either try to hunt > down the real 'struct pages' after everything is brought up, or we can > just skip the BUG_ON() if the page is reserved. Any thoughts? Skipping BUG_ON if the page is reserved is something you can certainly try. However if all usages are symmetric, the only pte that should ever get freed, is the pte that change_page_attr itself has allocated via split_large_page. I tried the debug option right now, without the fixes I get a crash in X (but not in pageattr.c, it's an invalid page fault in some direct mapping), that might be a real bug or another false positive. with the fixes applied I get this, so I can reproduce at least ;) ------------[ cut here ]------------ kernel BUG at arch/i386/mm/pageattr.c:133! invalid operand: 0000 [#1] SMP DEBUG_PAGEALLOC CPU: 0 EIP: 0060:[<c011979f>] Not tainted EFLAGS: 00010046 (2.6.5-0-andrea ) EIP is at change_page_attr+0x26f/0x2d0 eax: ffffffff ebx: c1037de0 ecx: c1000100 edx: c1000100 esi: 00000001 edi: 00000000 ebp: 00000163 esp: c1bf3ee0 ds: 007b es: 007b ss: 0068 Process swapper (pid: 1, threadinfo=c1bf2000 task=c1bf1780) Stack: 0000001b c0008fbc c1bef000 00000246 00000000 00000001 c1037de0 c1037de0 00000001 00000000 00000001 c0119823 c0419780 c1037de0 c013fba5 c1bb049c 00000000 00000078 00000000 00000001 00000000 c1bf1780 00000010 c041a380 Call Trace: [<c0119823>] kernel_map_pages+0x23/0x4f [<c013fba5>] __alloc_pages+0x2f8/0x33b [<c013fc44>] __get_free_pages+0x18/0x25 [<c0142600>] cache_alloc_refill+0x28c/0x530 [<c013d74b>] mempool_alloc_slab+0x0/0xb [<c0142907>] __kmalloc+0x63/0x65 [<c013d995>] mempool_create+0x3f/0xbf [<c013d740>] mempool_free_slab+0x0/0xb [<c04c87fc>] init_bio+0xec/0x1a8 [<c0103199>] init+0x131/0x2ca [<c0103068>] init+0x0/0x2ca [<c0106005>] kernel_thread_helper+0x5/0xb Code: 0f 0b 85 00 23 e7 3a c0 e9 0f fe ff ff 8d 41 10 8b 15 38 6e <0>Kernel panic: Attempted to kill init! -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"aart@kvack.org"> aart@kvack.org </a> ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: fix iounmap and a pageattr memleak (x86 and x86-64) 2004-11-05 2:08 ` Andrea Arcangeli @ 2004-11-05 2:23 ` Dave Hansen 2004-11-05 4:03 ` Andrea Arcangeli 0 siblings, 1 reply; 26+ messages in thread From: Dave Hansen @ 2004-11-05 2:23 UTC (permalink / raw) To: Andrea Arcangeli Cc: linux-mm, Linux Kernel Mailing List, Andi Kleen, Andrew Morton [-- Attachment #1: Type: text/plain, Size: 1935 bytes --] On Thu, 2004-11-04 at 18:08, Andrea Arcangeli wrote: > On Thu, Nov 04, 2004 at 05:55:40PM -0800, Dave Hansen wrote: > > What happens when a pte page is bootmem-allocated? I *think* that's the > > situation that I'm hitting. In that case, we can either try to hunt > > down the real 'struct pages' after everything is brought up, or we can > > just skip the BUG_ON() if the page is reserved. Any thoughts? > > Skipping BUG_ON if the page is reserved is something you can certainly > try. > > However if all usages are symmetric, the only pte that should ever get > freed, is the pte that change_page_attr itself has allocated via > split_large_page. > > I tried the debug option right now, without the fixes I get a crash in > X (but not in pageattr.c, it's an invalid page fault in some direct > mapping), that might be a real bug or another false positive. > > with the fixes applied I get this, so I can reproduce at least ;) Here we go again :) I think we're being naughty about page_count()s for pages that never hit the page allocator (ones that never hit free_all_bootmem()). They keep an initial page_count() of 0, which is a no no if they're used as pte pages and noticed by __change_page_attr(). This discrepancy isn't noticed until the page is get'd 512 times, then completely __put'd as things get allocated into space mapped by the page. The final __put hits the BUG_ON(). To find this earlier, we could also have a BUG_ON(!page_count(kpte_page)) in __change_page_attr() right after we find the kpte_page, in addition to the check after the count is modified. This patch defaults the page counts to 1 instead of 0 for all pages in the zone initialization. Any pages that go though free_all_bootmem() are set back to a state where they cleanly go in to the allocator. I'm not quite sure if this has any other weird effects, so I'll hold on to it for a week or so and see if anything turns up. -- Dave [-- Attachment #2: Z0-bootmem_page_counts.patch --] [-- Type: text/x-patch, Size: 1171 bytes --] --- memhotplug1-dave/mm/bootmem.c | 1 + memhotplug1-dave/mm/page_alloc.c | 2 +- 2 files changed, 2 insertions(+), 1 deletion(-) diff -puN mm/bootmem.c~Z0-bootmem_page_counts mm/bootmem.c --- memhotplug1/mm/bootmem.c~Z0-bootmem_page_counts 2004-11-04 18:16:20.000000000 -0800 +++ memhotplug1-dave/mm/bootmem.c 2004-11-04 18:16:42.000000000 -0800 @@ -289,6 +289,7 @@ static unsigned long __init free_all_boo if (j + 16 < BITS_PER_LONG) prefetchw(page + j + 16); __ClearPageReserved(page + j); + set_page_count(page + j, 0); } __free_pages(page, ffs(BITS_PER_LONG)-1); i += BITS_PER_LONG; diff -puN mm/page_alloc.c~Z0-bootmem_page_counts mm/page_alloc.c --- memhotplug1/mm/page_alloc.c~Z0-bootmem_page_counts 2004-11-04 18:16:20.000000000 -0800 +++ memhotplug1-dave/mm/page_alloc.c 2004-11-04 18:16:47.000000000 -0800 @@ -1824,7 +1824,7 @@ void __devinit memmap_init_zone(unsigned for (page = start; page < (start + size); page++) { set_page_zone(page, NODEZONE(nid, zone)); - set_page_count(page, 0); + set_page_count(page, 1); reset_page_mapcount(page); SetPageReserved(page); INIT_LIST_HEAD(&page->lru); _ ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: fix iounmap and a pageattr memleak (x86 and x86-64) 2004-11-05 2:23 ` Dave Hansen @ 2004-11-05 4:03 ` Andrea Arcangeli 2004-11-05 4:20 ` Andrea Arcangeli 0 siblings, 1 reply; 26+ messages in thread From: Andrea Arcangeli @ 2004-11-05 4:03 UTC (permalink / raw) To: Dave Hansen Cc: linux-mm, Linux Kernel Mailing List, Andi Kleen, Andrew Morton On Thu, Nov 04, 2004 at 06:23:11PM -0800, Dave Hansen wrote: > I'm not quite sure if this has any other weird effects, so I'll hold on > to it for a week or so and see if anything turns up. this fixed the problem for me too. However I'm not convinced this is correct, nothing in the kernel should ever free a bootmem piece of memory after the machine has booted. If this helps, it also means we found an existing pte (not pmd) with page_count 0 during the first unmap event (bootmem allocated). The transition from mapped to unmapped works fine, but the transition from unmapped to mapped will thorw the pte away and we'll regenerate a 2M pmd where there was a pte instead. I wonder why there are 4k pages there in the first place. Anyways I understand what's going on now thanks to your debugging, and I believe the only real fix is to use PageReserved to catch if we're working on a newly allocated page or not, I don't like to depend on the page_count being 0 for the bootmem pages like the previous code was doing. I believe my code would now fall apart even if you were using it with PSE disabled (nopentium or something). So I've to fix that bit at least and I will use PageReserved for that. The page_count of bootmem pages really doesn't matter since they must never be freed. It really should remain 0 so we catch if anybody executes a put_page on it. I'll fix it up... -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"aart@kvack.org"> aart@kvack.org </a> ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: fix iounmap and a pageattr memleak (x86 and x86-64) 2004-11-05 4:03 ` Andrea Arcangeli @ 2004-11-05 4:20 ` Andrea Arcangeli 0 siblings, 0 replies; 26+ messages in thread From: Andrea Arcangeli @ 2004-11-05 4:20 UTC (permalink / raw) To: Dave Hansen Cc: linux-mm, Linux Kernel Mailing List, Andi Kleen, Andrew Morton On Fri, Nov 05, 2004 at 05:03:08AM +0100, Andrea Arcangeli wrote: > doing. I believe my code would now fall apart even if you were using it > with PSE disabled (nopentium or something). So I've to fix that bit at ah no pse disabled was working fine thanks to the cpu_has_pse (I even already considered that case while doing the patch), only 4k pages setup at boot time would screw it up and frankly there's no good reason at all that such 4k pages should exist at all. However adding the reserved check won't make the code more robust. This way I don't need to drop the memleak detector from the code (there was a true gigantic memleak in that code that was triggering in the common case, and it got hidden just to let those 4k pages pass through). The only reason to set the page_count 0 on bootmem is to crash on put_page, nobody should ever care about the page_count of bootmem memory. bootmem memory will be PageReserved, that is the only thing I'd like to relay on. So I believe this is the right fix to make the change_page_attr aware about all kind of pagetables that might be mapping the direct mapping. This is really what the code should be doing, since we've to reconstruct a largepage and drop the pte only if the pte itself was not-reserved. We shouldn't mess 4k pages instantiated at boot time. --- sles/arch/i386/mm/pageattr.c.~1~ 2004-11-05 02:36:42.000000000 +0100 +++ sles/arch/i386/mm/pageattr.c 2004-11-05 05:18:27.216553680 +0100 @@ -129,13 +129,15 @@ __change_page_attr(struct page *page, pg } else BUG(); - /* memleak and potential failed 2M page regeneration */ - BUG_ON(!page_count(kpte_page)); - - if (cpu_has_pse && (page_count(kpte_page) == 1)) { - list_add(&kpte_page->lru, &df_list); - revert_page(kpte_page, address); - } + if (!PageReserved(kpte_page)) { + /* memleak and potential failed 2M page regeneration */ + BUG_ON(!page_count(kpte_page)); + + if (cpu_has_pse && (page_count(kpte_page) == 1)) { + list_add(&kpte_page->lru, &df_list); + revert_page(kpte_page, address); + } + } return 0; } I'll prepare a new complete patch against mainline for Andrew with some more comment (only the pageattr.c part, the ioremap.c was already fine). next things to do for the longer term: 1) fix this remaining oops with the debug knob enabled ;) (this is unrelated to change_page_attr) Unable to handle kernel paging request at virtual address c04c1a6e printing eip: c04c1a6e *pde = 0053b027 *pte = 004c1000 Oops: 0000 [#1] SMP DEBUG_PAGEALLOC CPU: 2 EIP: 0060:[<c04c1a6e>] Not tainted EFLAGS: 00013287 (2.6.5-0-andrea ) EIP is at mp_find_ioapic+0x0/0x53 eax: 00000016 ebx: 00000016 ecx: 00000001 edx: 00000001 esi: 00000001 edi: 00000016 ebp: 00000016 esp: f7fc1e60 ds: 007b es: 007b ss: 0068 Process X (pid: 2153, threadinfo=f7fc0000 task=f7fc21f0) Stack: c0115a6f c03da597 00000000 f7e29000 c025cfc6 00000001 00000001 00000016 00000001 00000001 00000016 c01130d0 00000016 00000016 f7e29000 f7fc1eb8 00000016 c0265ae3 00000002 00000001 00000001 00e29000 00400000 c03da600 Call Trace: [<c0115a6f>] mp_register_gsi+0x23/0x114 [<c025cfc6>] acpi_ut_value_exit+0x30/0x3b [<c01130d0>] acpi_register_gsi+0x71/0x85 [<c0265ae3>] acpi_pci_irq_enable+0x2fd/0x36e [<c032a29d>] pcibios_enable_device+0x14/0x18 [<c0235ee1>] pci_enable_device_bars+0x16/0x22 [<c028e477>] radeon_irq_busid+0x74/0x16c [<c028e403>] radeon_irq_busid+0x0/0x16c [<c0106403>] sys_get_thread_area+0xfa/0x139 [<c0287e8a>] radeon_ioctl+0xfd/0x125 [<c015a645>] vfs_write+0xa5/0xfc [<c0106403>] sys_get_thread_area+0xfa/0x139 [<c016be6c>] sys_ioctl+0x2cc/0x492 [<c015a868>] sys_write+0x85/0xc6 [<c0106403>] sys_get_thread_area+0xfa/0x139 [<c0107f9f>] syscall_call+0x7/0xb [<c0106403>] sys_get_thread_area+0xfa/0x139 Code: Bad EIP value. 2) drop those 4k pages from the first few megs of ram that just hurt performance and waste ram (I guess the main problem is that this is in strict talk with head.S, and the only communication between head.S and the C code so far was to check the contents of each pgd and not to touch what was not-null IIRC) -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"aart@kvack.org"> aart@kvack.org </a> ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: fix iounmap and a pageattr memleak (x86 and x86-64) 2004-11-02 22:45 ` Dave Hansen 2004-11-02 23:00 ` Dave Hansen @ 2004-11-02 23:04 ` Andrew Morton 2004-11-03 1:40 ` Andrea Arcangeli 1 sibling, 1 reply; 26+ messages in thread From: Andrew Morton @ 2004-11-02 23:04 UTC (permalink / raw) To: Dave Hansen; +Cc: andrea, linux-mm, linux-kernel, ak Dave Hansen <haveblue@us.ibm.com> wrote: > > Andrea Arcangeli wrote: > > Still I recommend investigating _why_ debug_pagealloc is violating the > > API. It might not be necessary to wait for the pageattr universal > > feature to make DEBUG_PAGEALLOC work safe. > > This makes the DEBUG_PAGEALLOC stuff symmetric enough to boot for me, > and it's pretty damn simple. Any ideas for doing this without bloating > 'struct page', even in the debugging case? You could use a bit from page->flags. But CONFIG_DEBUG_PAGEALLOC uses so much additional memory nobody would notice anyway. hm. Or maybe page->mapcount is available on these pages. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"aart@kvack.org"> aart@kvack.org </a> ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: fix iounmap and a pageattr memleak (x86 and x86-64) 2004-11-02 23:04 ` Andrew Morton @ 2004-11-03 1:40 ` Andrea Arcangeli 0 siblings, 0 replies; 26+ messages in thread From: Andrea Arcangeli @ 2004-11-03 1:40 UTC (permalink / raw) To: Andrew Morton; +Cc: Dave Hansen, linux-mm, linux-kernel, ak On Tue, Nov 02, 2004 at 03:04:26PM -0800, Andrew Morton wrote: > Dave Hansen <haveblue@us.ibm.com> wrote: > > > > Andrea Arcangeli wrote: > > > Still I recommend investigating _why_ debug_pagealloc is violating the > > > API. It might not be necessary to wait for the pageattr universal > > > feature to make DEBUG_PAGEALLOC work safe. > > > > This makes the DEBUG_PAGEALLOC stuff symmetric enough to boot for me, > > and it's pretty damn simple. Any ideas for doing this without bloating > > 'struct page', even in the debugging case? > > You could use a bit from page->flags. But CONFIG_DEBUG_PAGEALLOC uses so > much additional memory nobody would notice anyway. > > hm. Or maybe page->mapcount is available on these pages. "page < highmem_start_page" would be equivalent to page->mapped == 1. I'll try to reproduce the problem, frankly I wondered how DEBUG_PAGEALLOC could ever work with the current pageattr code that is far from be usable outside a single device driver that owns the physical region (then I thought DEBUG_PAGEALLOC perhaps was in an unusable state like dead code). The current change_page_attr is basically only for symmetric use via ioremap_nocache or AGP explicit code, that _owns_ the region by registering in resource.c. Attempting to use the current restrictive API in random page resources not registared and owned by the caller, may not even be fixable without creating an universal API. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"aart@kvack.org"> aart@kvack.org </a> ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: fix iounmap and a pageattr memleak (x86 and x86-64) 2004-11-02 21:21 fix iounmap and a pageattr memleak (x86 and x86-64) Dave Hansen 2004-11-02 22:07 ` Andrea Arcangeli @ 2004-11-02 22:34 ` Jason Baron 2004-11-02 23:12 ` Andrea Arcangeli 1 sibling, 1 reply; 26+ messages in thread From: Jason Baron @ 2004-11-02 22:34 UTC (permalink / raw) To: Dave Hansen; +Cc: andrea, linux-mm, linux-kernel, Andi Kleen, Andrew Morton On Tue, 2 Nov 2004, Dave Hansen wrote: > This patch: > > > From: Andrea Arcangeli <andrea@novell.com> > > > > - fix silent memleak in the pageattr code that I found while searching > > for the bug Andi fixed in the second patch below (basically reference > > counting in split page was done on the pmd instead of the pte). > > > > - Part of this patch is also needed to make the above work on x86 (otherwise > > one of my new above BUGS() will trigger signalling the fact a bug was > > there). The below patch creates a subtle dependency that (_PAGE_PCD << 24) > > must not be zero. It's not the cleanest thing ever, but since it's an > > hardware bitflag I doubt it's going to break. > > > > Signed-off-by: Andi Kleen <ak@suse.de> > > Signed-off-by: Andrea Arcangeli <andrea@novell.com> > > Signed-off-by: Andrew Morton <akpm@osdl.org> > > --- > > > > 25-akpm/arch/i386/mm/ioremap.c | 4 ++-- > > 25-akpm/arch/i386/mm/pageattr.c | 13 +++++++------ > > 25-akpm/arch/x86_64/mm/ioremap.c | 14 +++++++------- > > 25-akpm/arch/x86_64/mm/pageattr.c | 23 ++++++++++++++--------- > > 4 files changed, 30 insertions(+), 24 deletions(-) > > is hitting this BUG() during bootup: > > /* memleak and potential failed 2M page regeneration */ > BUG_ON(!page_count(kpte_page)); > > in 2.6.10-rc1-mm2. > I've seen the page_count being -1 (not sure why), for a number of pages in the identity mapped region...So the BUG() on 0 doesn't seem valid to me. Also, in order to tell if the pages should be merged back to create a huge page, i don't see how the patch differentiates b/w pages that were split and those that weren't simply based on the page_count.... -Jason -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"aart@kvack.org"> aart@kvack.org </a> ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: fix iounmap and a pageattr memleak (x86 and x86-64) 2004-11-02 22:34 ` Jason Baron @ 2004-11-02 23:12 ` Andrea Arcangeli 0 siblings, 0 replies; 26+ messages in thread From: Andrea Arcangeli @ 2004-11-02 23:12 UTC (permalink / raw) To: Jason Baron Cc: Dave Hansen, linux-mm, linux-kernel, Andi Kleen, Andrew Morton On Tue, Nov 02, 2004 at 05:34:08PM -0500, Jason Baron wrote: > I've seen the page_count being -1 (not sure why), for a number of pages in > the identity mapped region...So the BUG() on 0 doesn't seem valid to me. bugcheck on 0 is valid there, it signals memleak. page_count == -1 signals another bug that might or might not be fixed by this as far as I can tell (this furthermore is a pte, so it sure can't have page_count == 0 or -1). So please try to track down which pages had page_count == -1. Note, we must not confuse page->count with page_count, for the former value -1 means "in the freelist". For the latter it signals a bug. > Also, in order to tell if the pages should be merged back to create a huge > page, i don't see how the patch differentiates b/w pages that were split > and those that weren't simply based on the page_count.... that's a page_count of the _pte_, not of the pages. so we're guaranteed it's always 1 unless we deal with this very pageattr code that is the only one that evers boost a pte page_count to > 1. If you mean how can we provide an universal API with only keeping track of things with the page_count of the pte, we can't, and that's why it's not an universal API and that's why some symmetry is required by the API. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"aart@kvack.org"> aart@kvack.org </a> ^ permalink raw reply [flat|nested] 26+ messages in thread
end of thread, other threads:[~2004-11-05 4:20 UTC | newest] Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2004-11-02 21:21 fix iounmap and a pageattr memleak (x86 and x86-64) Dave Hansen 2004-11-02 22:07 ` Andrea Arcangeli 2004-11-02 22:21 ` Dave Hansen 2004-11-02 22:29 ` Andrew Morton 2004-11-02 22:34 ` Dave Hansen 2004-11-03 0:54 ` Andrea Arcangeli 2004-11-02 22:45 ` Dave Hansen 2004-11-02 23:00 ` Dave Hansen 2004-11-03 1:35 ` Andrea Arcangeli 2004-11-03 1:43 ` Dave Hansen 2004-11-03 2:26 ` Andrea Arcangeli 2004-11-03 2:48 ` Dave Hansen 2004-11-03 3:05 ` Andrea Arcangeli 2004-11-03 19:37 ` Dave Hansen 2004-11-05 0:02 ` Dave Hansen 2004-11-05 0:40 ` Dave Hansen 2004-11-05 0:53 ` Andrea Arcangeli 2004-11-05 1:55 ` Dave Hansen 2004-11-05 2:08 ` Andrea Arcangeli 2004-11-05 2:23 ` Dave Hansen 2004-11-05 4:03 ` Andrea Arcangeli 2004-11-05 4:20 ` Andrea Arcangeli 2004-11-02 23:04 ` Andrew Morton 2004-11-03 1:40 ` Andrea Arcangeli 2004-11-02 22:34 ` Jason Baron 2004-11-02 23:12 ` Andrea Arcangeli
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox