* [patch] fix memmap accounting
@ 2007-01-05 14:55 Heiko Carstens
2007-01-05 15:50 ` Dave Hansen
2007-01-05 16:48 ` Andy Whitcroft
0 siblings, 2 replies; 3+ messages in thread
From: Heiko Carstens @ 2007-01-05 14:55 UTC (permalink / raw)
To: linux-mm
Cc: Dave Hansen, Andy Whitcroft, Mel Gorman, Martin Schwidefsky,
Andrew Morton
Using a memory layout with some rather large holes gives me an error.
The present memory areas are 0-1GB and 1023GB-1023.5GB (1.5GB in total).
Kernel output on s390 with vmemmap is this:
Entering add_active_range(0, 0, 262143) 0 entries of 256 used
Entering add_active_range(0, 268173312, 268304383) 1 entries of 256 used
Detected 4 CPU's
Boot cpu address 0
Zone PFN ranges:
DMA 0 -> 524288
Normal 524288 -> 268304384
early_node_map[2] active PFN ranges
0: 0 -> 262143
0: 268173312 -> 268304383
On node 0 totalpages: 393214
DMA zone: 9216 pages used for memmap
DMA zone: 0 pages reserved
DMA zone: 252927 pages, LIFO batch:31
Normal zone: 4707071 pages exceeds realsize 131071 <------
Normal zone: 131071 pages, LIFO batch:31
Built 1 zonelists. Total pages: 383998
So the calculation of the number of pages needed for the memmap is wrong.
It just doesn't work with virtual memmaps, since it assumes that all pages
of a memmap are actually backed by physical pages, which is not the case
here.
This patch fixes it, but I guess something similar is also needed for
SPARSEMEM and ia64 (with vmemmap).
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Andy Whitcroft <apw@shadowen.org>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
---
arch/s390/Kconfig | 3 +++
mm/page_alloc.c | 4 ++++
2 files changed, 7 insertions(+)
Index: linux-2.6/arch/s390/Kconfig
===================================================================
--- linux-2.6.orig/arch/s390/Kconfig
+++ linux-2.6/arch/s390/Kconfig
@@ -30,6 +30,9 @@ config ARCH_HAS_ILOG2_U64
bool
default n
+config ARCH_HAS_VMEMMAP
+ def_bool y
+
config GENERIC_HWEIGHT
bool
default y
Index: linux-2.6/mm/page_alloc.c
===================================================================
--- linux-2.6.orig/mm/page_alloc.c
+++ linux-2.6/mm/page_alloc.c
@@ -2629,7 +2629,11 @@ static void __meminit free_area_init_cor
* is used by this zone for memmap. This affects the watermark
* and per-cpu initialisations
*/
+#ifdef CONFIG_ARCH_HAS_VMEMMAP
+ memmap_pages = (realsize * sizeof(struct page)) >> PAGE_SHIFT;
+#else
memmap_pages = (size * sizeof(struct page)) >> PAGE_SHIFT;
+#endif
if (realsize >= memmap_pages) {
realsize -= memmap_pages;
printk(KERN_DEBUG
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
* Re: [patch] fix memmap accounting
2007-01-05 14:55 [patch] fix memmap accounting Heiko Carstens
@ 2007-01-05 15:50 ` Dave Hansen
2007-01-05 16:48 ` Andy Whitcroft
1 sibling, 0 replies; 3+ messages in thread
From: Dave Hansen @ 2007-01-05 15:50 UTC (permalink / raw)
To: Heiko Carstens
Cc: linux-mm, Andy Whitcroft, Mel Gorman, Martin Schwidefsky, Andrew Morton
On Fri, 2007-01-05 at 15:55 +0100, Heiko Carstens wrote:
> So the calculation of the number of pages needed for the memmap is wrong.
> It just doesn't work with virtual memmaps since it expects that all pages
> of a memmap are actually backed with physical pages which is not the case
> here.
>
> This patch fixes it, but I guess something similar is also needed for
> SPARSEMEM and ia64 (with vmemmap).
...
> --- linux-2.6.orig/arch/s390/Kconfig
> +++ linux-2.6/arch/s390/Kconfig
> @@ -30,6 +30,9 @@ config ARCH_HAS_ILOG2_U64
> bool
> default n
>
> +config ARCH_HAS_VMEMMAP
> + def_bool y
> +
> config GENERIC_HWEIGHT
> bool
> default y
> Index: linux-2.6/mm/page_alloc.c
> ===================================================================
> --- linux-2.6.orig/mm/page_alloc.c
> +++ linux-2.6/mm/page_alloc.c
> @@ -2629,7 +2629,11 @@ static void __meminit free_area_init_cor
> * is used by this zone for memmap. This affects the watermark
> * and per-cpu initialisations
> */
> +#ifdef CONFIG_ARCH_HAS_VMEMMAP
> + memmap_pages = (realsize * sizeof(struct page)) >> PAGE_SHIFT;
> +#else
> memmap_pages = (size * sizeof(struct page)) >> PAGE_SHIFT;
> +#endif
> if (realsize >= memmap_pages) {
> realsize -= memmap_pages;
> printk(KERN_DEBUG
I'm not sure this is the right fix. The same issues should, in theory,
be present for SPARSEMEM systems. So, doing it by architecture alone is
probably a bad idea. This also just kinda hacks around the problem. In
any case, at least ia64 also has vmem_map[]s and needs it too.
I think the correct solution here is to either actually record how many
pages we allocate for mem_map[]s or keep the hole information so that it
can also be referenced in this area of the code. I think the direct
accounting of how many pages went to mem_map[]s is probably best because
it tackles the problem more directly. Otherwise, we potentially need to
expose the information about how mem_map[]s cover holes on _each_ of the
methods, and effectively recalculate it here.
-- Dave
* Re: [patch] fix memmap accounting
2007-01-05 14:55 [patch] fix memmap accounting Heiko Carstens
2007-01-05 15:50 ` Dave Hansen
@ 2007-01-05 16:48 ` Andy Whitcroft
1 sibling, 0 replies; 3+ messages in thread
From: Andy Whitcroft @ 2007-01-05 16:48 UTC (permalink / raw)
To: Heiko Carstens
Cc: linux-mm, Dave Hansen, Mel Gorman, Martin Schwidefsky, Andrew Morton
Heiko Carstens wrote:
> From: Heiko Carstens <heiko.carstens@de.ibm.com>
>
> Using some rather large holes in memory gives me an error.
> Present memory areas are 0-1GB and 1023GB-1023.5GB (1.5GB in total)
>
> Kernel output on s390 with vmemmap is this:
>
> Entering add_active_range(0, 0, 262143) 0 entries of 256 used
> Entering add_active_range(0, 268173312, 268304383) 1 entries of 256 used
> Detected 4 CPU's
> Boot cpu address 0
> Zone PFN ranges:
> DMA 0 -> 524288
> Normal 524288 -> 268304384
> early_node_map[2] active PFN ranges
> 0: 0 -> 262143
> 0: 268173312 -> 268304383
> On node 0 totalpages: 393214
> DMA zone: 9216 pages used for memmap
> DMA zone: 0 pages reserved
> DMA zone: 252927 pages, LIFO batch:31
>
> Normal zone: 4707071 pages exceeds realsize 131071 <------
>
> Normal zone: 131071 pages, LIFO batch:31
> Built 1 zonelists. Total pages: 383998
>
> So the calculation of the number of pages needed for the memmap is wrong.
> It just doesn't work with virtual memmaps since it expects that all pages
> of a memmap are actually backed with physical pages which is not the case
> here.
>
> This patch fixes it, but I guess something similar is also needed for
> SPARSEMEM and ia64 (with vmemmap).
>
> Cc: Dave Hansen <haveblue@us.ibm.com>
> Cc: Andy Whitcroft <apw@shadowen.org>
> Cc: Mel Gorman <mel@csn.ul.ie>
> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
> ---
> arch/s390/Kconfig | 3 +++
> mm/page_alloc.c | 4 ++++
> 2 files changed, 7 insertions(+)
>
> Index: linux-2.6/arch/s390/Kconfig
> ===================================================================
> --- linux-2.6.orig/arch/s390/Kconfig
> +++ linux-2.6/arch/s390/Kconfig
> @@ -30,6 +30,9 @@ config ARCH_HAS_ILOG2_U64
> bool
> default n
>
> +config ARCH_HAS_VMEMMAP
> + def_bool y
> +
> config GENERIC_HWEIGHT
> bool
> default y
> Index: linux-2.6/mm/page_alloc.c
> ===================================================================
> --- linux-2.6.orig/mm/page_alloc.c
> +++ linux-2.6/mm/page_alloc.c
> @@ -2629,7 +2629,11 @@ static void __meminit free_area_init_cor
> * is used by this zone for memmap. This affects the watermark
> * and per-cpu initialisations
> */
> +#ifdef CONFIG_ARCH_HAS_VMEMMAP
> + memmap_pages = (realsize * sizeof(struct page)) >> PAGE_SHIFT;
This is a pretty crude estimate. We could be using half a page in every 100
pages and get the number way out. That said, it's also really only a hint
to try and get the watermarks right. All of that is based on the assumption
that the pages which back the zone come from the zone. Is that even a
valid assumption? On NUMA-Q they are 'outside' the node; on x86 they
all come out of node 0. Hmmm.
> +#else
> memmap_pages = (size * sizeof(struct page)) >> PAGE_SHIFT;
> +#endif
> if (realsize >= memmap_pages) {
> realsize -= memmap_pages;
> printk(KERN_DEBUG
I think Dave has the right of it in that we should be pushing the memmap
'consumption' issue back to the memory model, not exposing it here.
However, it is key to note that these are estimates, used to set the
watermarks and the like. As mentioned earlier, they are already
somewhat inaccurate, as we may not be allocating the memmap from the zone
itself or even from accounted memory.
The correct fix would seem to be to have a memmap_size(zone) style
interface provided by the memory models. Of course at the time we need
the information the actual zone is not yet initialised at all.
Perhaps we could take a stab at improving the situation, improving
the estimates without completely fixing things. Something like this,
which would just make a judgement about the 'sparseness' of the memmap:
int memmap_size(struct zone *zone)
{
#if defined(CONFIG_SPARSEMEM) || defined(CONFIG_ARCH_HAS_VMEMMAP)
	return (zone->present_pages * sizeof(struct page)) >> PAGE_SHIFT;
#else
	return (zone->spanned_pages * sizeof(struct page)) >> PAGE_SHIFT;
#endif
}
Of course this would mean changing the order a little in
free_area_init_core() so that we have these filled in at least.
Perhaps we are looking at this the wrong way round. We only care about
the realsize in the context of working out sensible watermarks. If we
simply initialised all of the zones ignoring the size of the memmap, it
would get allocated from wherever. Once _all_ zones are up and running
we could do a second pass, look at the _real_ number of pages free in
the zone, and make the watermarks the requisite percentage of _that_.
Obviously highmem is released separately, so that would need calculating
later.
-apw