linux-mm.kvack.org archive mirror
* [PATCH 0/2] Fixes for node alignment and flatmem assumptions
@ 2006-05-19 13:42 Mel Gorman
  2006-05-19 13:43 ` [PATCH 1/2] Align the node_mem_map endpoints to a MAX_ORDER boundary Mel Gorman
  2006-05-19 13:43 ` [PATCH 2/2] FLATMEM relax requirement for memory to start at pfn 0 Mel Gorman
  0 siblings, 2 replies; 7+ messages in thread
From: Mel Gorman @ 2006-05-19 13:42 UTC (permalink / raw)
  To: akpm
  Cc: Mel Gorman, nickpiggin, linux-kernel, haveblue, ak, bob.picco,
	mbligh, linux-mm, apw, mingo

After almost 3 days of banging the head on the keyboard, it was discovered
why arch-independent zone-sizing failed on IA64 for the configuration
posted on http://www.zip.com.au/~akpm/linux/patches/stuff/config-ia64 .

The two patches in this set address the following:

1. The buddy allocator requires that the node_mem_map be aligned on
   a MAX_ORDER boundary. Patch 1, from Bob Picco, aligns the
   node_mem_map correctly.

2. This is the one that was giving me keyboard face. The FLATMEM memory
   model assumes that

   mem_map[0] == NODE_DATA(0)->node_mem_map == PFN 0

   This is not the case on IA64 with arch-independent zone sizing because
   NODE_DATA(0)->node_mem_map starts where the first valid page frame is. On
   my test machine, that is PFN 1025 but it probably varies.  Patch 2 from Andy
   Whitcroft relaxes the assumption that NODE_DATA(0)->node_mem_map == PFN 0 .

These patches apply to 2.6.17-rc4-mm1 and are independent of
architecture-independent zone sizing. Patch 1 in particular fixes a
real problem that is just difficult to trigger. However, once applied,
have-ia64-use-add_active_range-and-free_area_init_nodes.patch will work again.

2.6.17-rc4-mm1 with this patchset has been boot-tested by me, and
/proc/zoneinfo was verified to be ok on x86, ppc64, x86_64 and
ia64 in a variety of configurations. Bob Picco also says that both
patches passed a test with mem=750M and 4GB on an rx2600 (ia64) with
large memory holes. They have also been successfully tested with
have-ia64-use-add_active_range-and-free_area_init_nodes.patch added back in.
-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH 1/2] Align the node_mem_map endpoints to a MAX_ORDER boundary
  2006-05-19 13:42 [PATCH 0/2] Fixes for node alignment and flatmem assumptions Mel Gorman
@ 2006-05-19 13:43 ` Mel Gorman
  2006-05-19 20:49   ` Andrew Morton
  2006-05-19 13:43 ` [PATCH 2/2] FLATMEM relax requirement for memory to start at pfn 0 Mel Gorman
  1 sibling, 1 reply; 7+ messages in thread
From: Mel Gorman @ 2006-05-19 13:43 UTC (permalink / raw)
  To: akpm
  Cc: Mel Gorman, nickpiggin, haveblue, ak, bob.picco, linux-kernel,
	linux-mm, apw, mingo, mbligh

From: Bob Picco <bob.picco@hp.com>

Andy added code to the buddy allocator which does not require the zone's
endpoints to be aligned to MAX_ORDER. An issue is that the buddy
allocator requires the node_mem_map's endpoints to be MAX_ORDER aligned.
Otherwise __page_find_buddy could compute a buddy not in node_mem_map for
partial MAX_ORDER regions at the zone's endpoints. page_is_buddy will detect
that these pages at the endpoints are not PG_buddy (they were zeroed out by
the bootmem allocator and are not part of the zone). Of course the negative
here is that we could waste a little memory, but the positive is eliminating
all the old checks for zone boundary conditions.

SPARSEMEM won't encounter this issue because of the MAX_ORDER size constraint
when SPARSEMEM is configured. ia64 VIRTUAL_MEM_MAP doesn't need the
logic either because the holes and endpoints are handled differently.
This leaves checking alloc_remap and other arches which privately allocate
for node_mem_map.


 include/linux/mmzone.h |    1 +
 mm/page_alloc.c        |   14 +++++++++++---
 2 files changed, 12 insertions(+), 3 deletions(-)

Signed-off-by: Bob Picco <bob.picco@hp.com>
Acked-by: Mel Gorman <mel@csn.ul.ie>

diff -rup -X /usr/src/patchset-0.5/bin//dontdiff linux-2.6.17-rc4-mm1-clean/include/linux/mmzone.h linux-2.6.17-rc4-mm1-101-bob-node-alignment/include/linux/mmzone.h
--- linux-2.6.17-rc4-mm1-clean/include/linux/mmzone.h	2006-05-18 17:23:55.000000000 +0100
+++ linux-2.6.17-rc4-mm1-101-bob-node-alignment/include/linux/mmzone.h	2006-05-18 17:52:13.000000000 +0100
@@ -21,6 +21,7 @@
 #else
 #define MAX_ORDER CONFIG_FORCE_MAX_ZONEORDER
 #endif
+#define MAX_ORDER_NR_PAGES (1 << (MAX_ORDER - 1))
 
 struct free_area {
 	struct list_head	free_list;
diff -rup -X /usr/src/patchset-0.5/bin//dontdiff linux-2.6.17-rc4-mm1-clean/mm/page_alloc.c linux-2.6.17-rc4-mm1-101-bob-node-alignment/mm/page_alloc.c
--- linux-2.6.17-rc4-mm1-clean/mm/page_alloc.c	2006-05-18 17:23:55.000000000 +0100
+++ linux-2.6.17-rc4-mm1-101-bob-node-alignment/mm/page_alloc.c	2006-05-18 17:58:10.000000000 +0100
@@ -2484,14 +2484,22 @@ static void __init alloc_node_mem_map(st
 #ifdef CONFIG_FLAT_NODE_MEM_MAP
 	/* ia64 gets its own node_mem_map, before this, without bootmem */
 	if (!pgdat->node_mem_map) {
-		unsigned long size;
+		unsigned long size, start, end;
 		struct page *map;
 
-		size = (pgdat->node_spanned_pages + 1) * sizeof(struct page);
+		/*
+		 * The zone's endpoints aren't required to be MAX_ORDER
+		 * aligned but the node_mem_map endpoints must be in order
+		 * for the buddy allocator to function correctly.
+		 */
+		start = pgdat->node_start_pfn & ~(MAX_ORDER_NR_PAGES - 1);
+		end = pgdat->node_start_pfn + pgdat->node_spanned_pages;
+		end = ALIGN(end, MAX_ORDER_NR_PAGES);
+		size =  (end - start) * sizeof(struct page);
 		map = alloc_remap(pgdat->node_id, size);
 		if (!map)
 			map = alloc_bootmem_node(pgdat, size);
-		pgdat->node_mem_map = map;
+		pgdat->node_mem_map = map + (pgdat->node_start_pfn - start);
 	}
 #ifdef CONFIG_FLATMEM
 	/*

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH 2/2] FLATMEM relax requirement for memory to start at pfn 0
  2006-05-19 13:42 [PATCH 0/2] Fixes for node alignment and flatmem assumptions Mel Gorman
  2006-05-19 13:43 ` [PATCH 1/2] Align the node_mem_map endpoints to a MAX_ORDER boundary Mel Gorman
@ 2006-05-19 13:43 ` Mel Gorman
  1 sibling, 0 replies; 7+ messages in thread
From: Mel Gorman @ 2006-05-19 13:43 UTC (permalink / raw)
  To: akpm
  Cc: Mel Gorman, nickpiggin, haveblue, linux-kernel, bob.picco, ak,
	linux-mm, apw, mingo, mbligh

From: Andy Whitcroft <apw@shadowen.org>

The FLATMEM memory model assumes that memory is in one contiguous area
based at pfn 0.  If we initialise node 0 to start at any other offset we
will map pfns to the wrong struct page *.  The key to the memory model
is the contiguous nature of the memory, not its location.
Relax the requirement for the area to start at 0.


 page_alloc.c |   17 +++++++++++++----
 1 files changed, 13 insertions(+), 4 deletions(-)

Signed-off-by: Andy Whitcroft <apw@shadowen.org>
Acked-by: Mel Gorman <mel@csn.ul.ie>

diff -rup -X /usr/src/patchset-0.5/bin//dontdiff linux-2.6.17-rc4-mm1-101-bob-node-alignment/mm/page_alloc.c linux-2.6.17-rc4-mm1-102-FLATMEM-relax-requirement-for-memory-to-start-at-pfn-0/mm/page_alloc.c
--- linux-2.6.17-rc4-mm1-101-bob-node-alignment/mm/page_alloc.c	2006-05-18 17:58:10.000000000 +0100
+++ linux-2.6.17-rc4-mm1-102-FLATMEM-relax-requirement-for-memory-to-start-at-pfn-0/mm/page_alloc.c	2006-05-18 19:14:44.000000000 +0100
@@ -2477,15 +2477,16 @@ static void __meminit free_area_init_cor
 
 static void __init alloc_node_mem_map(struct pglist_data *pgdat)
 {
+#ifdef CONFIG_FLAT_NODE_MEM_MAP
+	struct page *map = pgdat->node_mem_map;
+
 	/* Skip empty nodes */
 	if (!pgdat->node_spanned_pages)
 		return;
 
-#ifdef CONFIG_FLAT_NODE_MEM_MAP
 	/* ia64 gets its own node_mem_map, before this, without bootmem */
-	if (!pgdat->node_mem_map) {
+	if (!map) {
 		unsigned long size, start, end;
-		struct page *map;
 
 		/*
 		 * The zone's endpoints aren't required to be MAX_ORDER
@@ -2500,13 +2501,21 @@ static void __init alloc_node_mem_map(st
 		if (!map)
 			map = alloc_bootmem_node(pgdat, size);
 		pgdat->node_mem_map = map + (pgdat->node_start_pfn - start);
+
+		/*
+		 * With FLATMEM the global mem_map is used.  This is assumed
+		 * to be based at pfn 0 such that 'pfn = page* - mem_map'
+		 * is true. Adjust map relative to node_mem_map to
+		 * maintain this relationship.
+		 */
+		map -= pgdat->node_start_pfn;
 	}
 #ifdef CONFIG_FLATMEM
 	/*
 	 * With no DISCONTIG, the global mem_map is just set as node 0's
 	 */
 	if (pgdat == NODE_DATA(0))
-		mem_map = NODE_DATA(0)->node_mem_map;
+		mem_map = map;
 #endif
 #endif /* CONFIG_FLAT_NODE_MEM_MAP */
 }

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 1/2] Align the node_mem_map endpoints to a MAX_ORDER boundary
  2006-05-19 13:43 ` [PATCH 1/2] Align the node_mem_map endpoints to a MAX_ORDER boundary Mel Gorman
@ 2006-05-19 20:49   ` Andrew Morton
  2006-05-19 23:25     ` Mel Gorman
  2006-05-22  8:25     ` Andy Whitcroft
  0 siblings, 2 replies; 7+ messages in thread
From: Andrew Morton @ 2006-05-19 20:49 UTC (permalink / raw)
  To: Mel Gorman
  Cc: nickpiggin, haveblue, ak, bob.picco, linux-kernel, linux-mm, apw,
	mingo, mbligh

Mel Gorman <mel@csn.ul.ie> wrote:
>
> Andy added code to buddy allocator which does not require the zone's
> endpoints to be aligned to MAX_ORDER. An issue is that the buddy
> allocator requires the node_mem_map's endpoints to be MAX_ORDER aligned.
> Otherwise __page_find_buddy could compute a buddy not in node_mem_map for
> partial MAX_ORDER regions at zone's endpoints. page_is_buddy will detect
> that these pages at endpoints are not PG_buddy (they were zeroed out by
> bootmem allocator and not part of zone). Of course the negative here is
> we could waste a little memory but the positive is eliminating all the
> old checks for zone boundary conditions.
> 
> SPARSEMEM won't encounter this issue because of MAX_ORDER size constraint
> when SPARSEMEM is configured. ia64 VIRTUAL_MEM_MAP doesn't need the
> logic either because the holes and endpoints are handled differently.
> This leaves checking alloc_remap and other arches which privately allocate
> for node_mem_map.

Do we think we need this in 2.6.17?

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 1/2] Align the node_mem_map endpoints to a MAX_ORDER boundary
  2006-05-19 20:49   ` Andrew Morton
@ 2006-05-19 23:25     ` Mel Gorman
  2006-05-22  8:25     ` Andy Whitcroft
  1 sibling, 0 replies; 7+ messages in thread
From: Mel Gorman @ 2006-05-19 23:25 UTC (permalink / raw)
  To: Andrew Morton
  Cc: nickpiggin, haveblue, ak, bob.picco, linux-kernel, linux-mm, apw,
	mingo, mbligh

On Fri, 19 May 2006, Andrew Morton wrote:

> Mel Gorman <mel@csn.ul.ie> wrote:
>>
>> Andy added code to buddy allocator which does not require the zone's
>> endpoints to be aligned to MAX_ORDER. An issue is that the buddy
>> allocator requires the node_mem_map's endpoints to be MAX_ORDER aligned.
>> Otherwise __page_find_buddy could compute a buddy not in node_mem_map for
>> partial MAX_ORDER regions at zone's endpoints. page_is_buddy will detect
>> that these pages at endpoints are not PG_buddy (they were zeroed out by
>> bootmem allocator and not part of zone). Of course the negative here is
>> we could waste a little memory but the positive is eliminating all the
>> old checks for zone boundary conditions.
>>
>> SPARSEMEM won't encounter this issue because of MAX_ORDER size constraint
>> when SPARSEMEM is configured. ia64 VIRTUAL_MEM_MAP doesn't need the
>> logic either because the holes and endpoints are handled differently.
>> This leaves checking alloc_remap and other arches which privately allocate
>> for node_mem_map.
>
> Do we think we need this in 2.6.17?
>

I think so. Not all architectures are making sure their node_start_pfn is 
aligned to a boundary. For example, x86_64 does, ia64 may align depending 
on the value of MAX_ORDER and i386 does not appear to make any effort. No 
one seems to be making any special effort to align the end of the zone at 
all. This potentially means that the buddy allocator is checking portions 
of memory as if they were struct page * when they are something totally 
different. I suspect it's just luck that the memory outside of the mem_map 
never looked like buddies.

Anyone else care to comment?

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 1/2] Align the node_mem_map endpoints to a MAX_ORDER boundary
  2006-05-19 20:49   ` Andrew Morton
  2006-05-19 23:25     ` Mel Gorman
@ 2006-05-22  8:25     ` Andy Whitcroft
  2006-05-22  8:44       ` Andrew Morton
  1 sibling, 1 reply; 7+ messages in thread
From: Andy Whitcroft @ 2006-05-22  8:25 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Mel Gorman, nickpiggin, haveblue, ak, bob.picco, linux-kernel,
	linux-mm, mingo, mbligh

Andrew Morton wrote:
> Mel Gorman <mel@csn.ul.ie> wrote:
> 
>>Andy added code to buddy allocator which does not require the zone's
>>endpoints to be aligned to MAX_ORDER. An issue is that the buddy
>>allocator requires the node_mem_map's endpoints to be MAX_ORDER aligned.
>>Otherwise __page_find_buddy could compute a buddy not in node_mem_map for
>>partial MAX_ORDER regions at zone's endpoints. page_is_buddy will detect
>>that these pages at endpoints are not PG_buddy (they were zeroed out by
>>bootmem allocator and not part of zone). Of course the negative here is
>>we could waste a little memory but the positive is eliminating all the
>>old checks for zone boundary conditions.
>>
>>SPARSEMEM won't encounter this issue because of MAX_ORDER size constraint
>>when SPARSEMEM is configured. ia64 VIRTUAL_MEM_MAP doesn't need the
>>logic either because the holes and endpoints are handled differently.
>>This leaves checking alloc_remap and other arches which privately allocate
>>for node_mem_map.
> 
> 
> Do we think we need this in 2.6.17?

I would say yes, it is a very low risk patch in my view and provides a
very large part of the protections we require.  i386 as our largest
userbase should be safe from zone/node alignment issues with just this
change.  Others need slightly more (the page_zone_idx check) which is
being discussed in another thread.

-apw

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 1/2] Align the node_mem_map endpoints to a MAX_ORDER boundary
  2006-05-22  8:25     ` Andy Whitcroft
@ 2006-05-22  8:44       ` Andrew Morton
  0 siblings, 0 replies; 7+ messages in thread
From: Andrew Morton @ 2006-05-22  8:44 UTC (permalink / raw)
  To: Andy Whitcroft
  Cc: mel, nickpiggin, haveblue, ak, bob.picco, linux-kernel, linux-mm,
	mingo, mbligh

Andy Whitcroft <apw@shadowen.org> wrote:
>
> Andrew Morton wrote:
> > Mel Gorman <mel@csn.ul.ie> wrote:
> > 
> >>Andy added code to buddy allocator which does not require the zone's
> >>endpoints to be aligned to MAX_ORDER. An issue is that the buddy
> >>allocator requires the node_mem_map's endpoints to be MAX_ORDER aligned.
> >>Otherwise __page_find_buddy could compute a buddy not in node_mem_map for
> >>partial MAX_ORDER regions at zone's endpoints. page_is_buddy will detect
> >>that these pages at endpoints are not PG_buddy (they were zeroed out by
> >>bootmem allocator and not part of zone). Of course the negative here is
> >>we could waste a little memory but the positive is eliminating all the
> >>old checks for zone boundary conditions.
> >>
> >>SPARSEMEM won't encounter this issue because of MAX_ORDER size constraint
> >>when SPARSEMEM is configured. ia64 VIRTUAL_MEM_MAP doesn't need the
> >>logic either because the holes and endpoints are handled differently.
> >>This leaves checking alloc_remap and other arches which privately allocate
> >>for node_mem_map.
> > 
> > 
> > Do we think we need this in 2.6.17?
> 
> I would say yes, it is a very low risk patch in my view and provides a
> very large part of the protections we require.  i386 as our largest
> userbase should be safe from zone/node alignment issues with just this
> change.  Others need slightly more (the page_zone_idx check) which is
> being discussed in another thread.
> 

Well I've largely lost the plot here (which happens often), and it appears
that Nick has concerns with this approach (which also is not uncommon).

So could you guys please come to some sort of (rapid) consensus and tell me
which patches from -mm3 (hopefully but an hour away) need to go into
2.6.17?

Thanks.

^ permalink raw reply	[flat|nested] 7+ messages in thread
