* [Patch] memory unplug v3 [1/4] page isolation
2007-05-22 6:58 [Patch] memory unplug v3 [0/4] KAMEZAWA Hiroyuki
@ 2007-05-22 7:01 ` KAMEZAWA Hiroyuki
2007-05-22 10:19 ` Mel Gorman
2007-05-22 18:38 ` Christoph Lameter
2007-05-22 7:04 ` [Patch] memory unplug v3 [2/4] migration by kernel KAMEZAWA Hiroyuki
` (3 subsequent siblings)
4 siblings, 2 replies; 20+ messages in thread
From: KAMEZAWA Hiroyuki @ 2007-05-22 7:01 UTC (permalink / raw)
To: KAMEZAWA Hiroyuki; +Cc: linux-mm, mel, y-goto, clameter
Patch for isolating pages.
'Isolate' means making pages free and never allocated.
This feature helps make a range of pages unused.
This patch is based on Mel's page grouping method.
This patch adds MIGRATE_ISOLATE to the migrate types. As a result,
- MIGRATE_TYPES increases.
- the bitmap for the migratetype is enlarged.
If isolate_pages(start, end) is called,
- the migratetype of the range is changed to MIGRATE_ISOLATE if
its current type is MIGRATE_MOVABLE or MIGRATE_RESERVE.
- MIGRATE_ISOLATE is not on the migratetype fallback list.
Then, pages of this migratetype will not be allocated even if they are free.
Now, isolate_pages() can only handle ranges aligned to MAX_ORDER.
This can be adjusted if necessary...maybe.
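
For illustration, a rough sketch of how a caller (e.g. a hot-remove path)
might use this interface; the function name below is hypothetical and the
migration of in-use pages is left out:

#include <linux/mm.h>
#include <linux/page-isolation.h>

/* hypothetical caller: try to make [start_pfn, end_pfn) unused */
static int try_make_range_unused(unsigned long start_pfn, unsigned long end_pfn)
{
	int ret;

	/* mark the blocks MIGRATE_ISOLATE so freed pages are never reused */
	ret = isolate_pages(start_pfn, end_pfn);
	if (ret)
		return ret;	/* -EBUSY: a block was not MOVABLE/RESERVE */

	/* ... migrate the still-used pages out of the range here ... */

	/* note: this returns non-zero if some page is NOT isolated yet */
	if (test_pages_isolated(start_pfn, end_pfn)) {
		free_isolated_pages(start_pfn, end_pfn); /* give range back */
		return -EBUSY;
	}
	return 0;	/* range is free and will never be allocated from */
}
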
Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com>
Signed-Off-By: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Index: devel-2.6.22-rc1-mm1/include/linux/mmzone.h
===================================================================
--- devel-2.6.22-rc1-mm1.orig/include/linux/mmzone.h 2007-05-22 14:30:43.000000000 +0900
+++ devel-2.6.22-rc1-mm1/include/linux/mmzone.h 2007-05-22 15:12:28.000000000 +0900
@@ -35,11 +35,12 @@
*/
#define PAGE_ALLOC_COSTLY_ORDER 3
-#define MIGRATE_UNMOVABLE 0
-#define MIGRATE_RECLAIMABLE 1
-#define MIGRATE_MOVABLE 2
-#define MIGRATE_RESERVE 3
-#define MIGRATE_TYPES 4
+#define MIGRATE_UNMOVABLE 0 /* not reclaimable pages */
+#define MIGRATE_RECLAIMABLE 1 /* shrink_xxx routine can reap this */
+#define MIGRATE_MOVABLE 2 /* migrate_page can migrate this */
+#define MIGRATE_RESERVE 3 /* no type yet */
+#define MIGRATE_ISOLATE 4 /* never allocated from */
+#define MIGRATE_TYPES 5
#define for_each_migratetype_order(order, type) \
for (order = 0; order < MAX_ORDER; order++) \
Index: devel-2.6.22-rc1-mm1/include/linux/pageblock-flags.h
===================================================================
--- devel-2.6.22-rc1-mm1.orig/include/linux/pageblock-flags.h 2007-05-22 14:30:43.000000000 +0900
+++ devel-2.6.22-rc1-mm1/include/linux/pageblock-flags.h 2007-05-22 15:12:28.000000000 +0900
@@ -31,7 +31,7 @@
/* Bit indices that affect a whole block of pages */
enum pageblock_bits {
- PB_range(PB_migrate, 2), /* 2 bits required for migrate types */
+ PB_range(PB_migrate, 3), /* 3 bits required for migrate types */
NR_PAGEBLOCK_BITS
};
Index: devel-2.6.22-rc1-mm1/mm/page_alloc.c
===================================================================
--- devel-2.6.22-rc1-mm1.orig/mm/page_alloc.c 2007-05-22 14:30:43.000000000 +0900
+++ devel-2.6.22-rc1-mm1/mm/page_alloc.c 2007-05-22 15:12:28.000000000 +0900
@@ -41,6 +41,7 @@
#include <linux/pfn.h>
#include <linux/backing-dev.h>
#include <linux/fault-inject.h>
+#include <linux/page-isolation.h>
#include <asm/tlbflush.h>
#include <asm/div64.h>
@@ -1056,6 +1057,7 @@
struct zone *zone = page_zone(page);
struct per_cpu_pages *pcp;
unsigned long flags;
+ unsigned long migrate_type;
if (PageAnon(page))
page->mapping = NULL;
@@ -1064,6 +1066,12 @@
if (!PageHighMem(page))
debug_check_no_locks_freed(page_address(page), PAGE_SIZE);
+
+ migrate_type = get_pageblock_migratetype(page);
+ if (migrate_type == MIGRATE_ISOLATE) {
+ __free_pages_ok(page, 0);
+ return;
+ }
arch_free_page(page, 0);
kernel_map_pages(page, 1, 0);
@@ -1071,7 +1079,7 @@
local_irq_save(flags);
__count_vm_event(PGFREE);
list_add(&page->lru, &pcp->list);
- set_page_private(page, get_pageblock_migratetype(page));
+ set_page_private(page, migrate_type);
pcp->count++;
if (pcp->count >= pcp->high) {
free_pages_bulk(zone, pcp->batch, &pcp->list, 0);
@@ -4389,3 +4397,53 @@
else
__clear_bit(bitidx + start_bitidx, bitmap);
}
+
+/*
+ * set/clear page block's type to be ISOLATE.
+ * page allocator never allocates memory from an ISOLATE block.
+ */
+
+int is_page_isolated(struct page *page)
+{
+ if ((page_count(page) == 0) &&
+ (get_pageblock_migratetype(page) == MIGRATE_ISOLATE))
+ return 1;
+ return 0;
+}
+
+int set_migratetype_isolate(struct page *page)
+{
+ struct zone *zone;
+ unsigned long flags;
+ int migrate_type;
+ int ret = -EBUSY;
+
+ zone = page_zone(page);
+ spin_lock_irqsave(&zone->lock, flags);
+ migrate_type = get_pageblock_migratetype(page);
+ if ((migrate_type != MIGRATE_MOVABLE) &&
+ (migrate_type != MIGRATE_RESERVE))
+ goto out;
+ set_pageblock_migratetype(page, MIGRATE_ISOLATE);
+ move_freepages_block(zone, page, MIGRATE_ISOLATE);
+ ret = 0;
+out:
+ spin_unlock_irqrestore(&zone->lock, flags);
+ if (!ret)
+ drain_all_local_pages();
+ return ret;
+}
+
+void clear_migratetype_isolate(struct page *page)
+{
+ struct zone *zone;
+ unsigned long flags;
+ zone = page_zone(page);
+ spin_lock_irqsave(&zone->lock, flags);
+ if (get_pageblock_migratetype(page) != MIGRATE_ISOLATE)
+ goto out;
+ set_pageblock_migratetype(page, MIGRATE_RESERVE);
+ move_freepages_block(zone, page, MIGRATE_RESERVE);
+out:
+ spin_unlock_irqrestore(&zone->lock, flags);
+}
Index: devel-2.6.22-rc1-mm1/mm/page_isolation.c
===================================================================
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ devel-2.6.22-rc1-mm1/mm/page_isolation.c 2007-05-22 15:12:28.000000000 +0900
@@ -0,0 +1,67 @@
+/*
+ * linux/mm/page_isolation.c
+ */
+
+#include <stddef.h>
+#include <linux/mm.h>
+#include <linux/page-isolation.h>
+
+#define ROUND_DOWN(x,y) ((x) & ~((y) - 1))
+#define ROUND_UP(x,y) (((x) + (y) -1) & ~((y) - 1))
+int
+isolate_pages(unsigned long start_pfn, unsigned long end_pfn)
+{
+ unsigned long pfn, start_pfn_aligned, end_pfn_aligned;
+ unsigned long undo_pfn;
+
+ start_pfn_aligned = ROUND_DOWN(start_pfn, NR_PAGES_ISOLATION_BLOCK);
+ end_pfn_aligned = ROUND_UP(end_pfn, NR_PAGES_ISOLATION_BLOCK);
+
+ for (pfn = start_pfn_aligned;
+ pfn < end_pfn_aligned;
+ pfn += NR_PAGES_ISOLATION_BLOCK)
+ if (set_migratetype_isolate(pfn_to_page(pfn))) {
+ undo_pfn = pfn;
+ goto undo;
+ }
+ return 0;
+undo:
+ for (pfn = start_pfn_aligned;
+ pfn <= undo_pfn;
+ pfn += NR_PAGES_ISOLATION_BLOCK)
+ clear_migratetype_isolate(pfn_to_page(pfn));
+
+ return -EBUSY;
+}
+
+
+int
+free_isolated_pages(unsigned long start_pfn, unsigned long end_pfn)
+{
+ unsigned long pfn, start_pfn_aligned, end_pfn_aligned;
+ start_pfn_aligned = ROUND_DOWN(start_pfn, NR_PAGES_ISOLATION_BLOCK);
+ end_pfn_aligned = ROUND_UP(end_pfn, NR_PAGES_ISOLATION_BLOCK);
+
+ for (pfn = start_pfn_aligned;
+ pfn < end_pfn_aligned;
+ pfn += MAX_ORDER_NR_PAGES)
+ clear_migratetype_isolate(pfn_to_page(pfn));
+ return 0;
+}
+
+int
+test_pages_isolated(unsigned long start_pfn, unsigned long end_pfn)
+{
+ unsigned long pfn;
+ int ret = 0;
+
+ for (pfn = start_pfn; pfn < end_pfn; pfn++) {
+ if (!pfn_valid(pfn))
+ continue;
+ if (!is_page_isolated(pfn_to_page(pfn))) {
+ ret = 1;
+ break;
+ }
+ }
+ return ret;
+}
Index: devel-2.6.22-rc1-mm1/include/linux/page-isolation.h
===================================================================
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ devel-2.6.22-rc1-mm1/include/linux/page-isolation.h 2007-05-22 15:12:28.000000000 +0900
@@ -0,0 +1,47 @@
+#ifndef __LINUX_PAGEISOLATION_H
+#define __LINUX_PAGEISOLATION_H
+/*
+ * Define an interface for capturing and isolating some amount of
+ * contiguous pages.
+ * isolated pages are freed but will never be allocated until they are
+ * pushed back.
+ *
+ * This isolation function requires some alignment.
+ */
+
+#define PAGE_ISOLATION_ORDER (MAX_ORDER - 1)
+#define NR_PAGES_ISOLATION_BLOCK (1 << PAGE_ISOLATION_ORDER)
+
+/*
+ * set page isolation range.
+ * If specified range includes migrate types other than MOVABLE,
+ * this will fail with -EBUSY.
+ */
+extern int
+isolate_pages(unsigned long start_pfn, unsigned long end_pfn);
+
+/*
+ * Free all isolated memory and push back them as MIGRATE_RESERVE type.
+ */
+extern int
+free_isolated_pages(unsigned long start_pfn, unsigned long end_pfn);
+
+/*
+ * test all pages are isolated or not.
+ */
+extern int
+test_pages_isolated(unsigned long start_pfn, unsigned long end_pfn);
+
+/* test routine for check page is isolated or not */
+extern int is_page_isolated(struct page *page);
+
+/*
+ * Internal funcs.
+ * Changes pageblock's migrate type
+ */
+extern int set_migratetype_isolate(struct page *page);
+extern void clear_migratetype_isolate(struct page *page);
+extern int __is_page_isolated(struct page *page);
+
+
+#endif
Index: devel-2.6.22-rc1-mm1/mm/Makefile
===================================================================
--- devel-2.6.22-rc1-mm1.orig/mm/Makefile 2007-05-22 14:30:43.000000000 +0900
+++ devel-2.6.22-rc1-mm1/mm/Makefile 2007-05-22 15:12:28.000000000 +0900
@@ -11,7 +11,7 @@
page_alloc.o page-writeback.o pdflush.o \
readahead.o swap.o truncate.o vmscan.o \
prio_tree.o util.o mmzone.o vmstat.o backing-dev.o \
- $(mmu-y)
+ page_isolation.o $(mmu-y)
ifeq ($(CONFIG_MMU)$(CONFIG_BLOCK),yy)
obj-y += bounce.o
* Re: [Patch] memory unplug v3 [1/4] page isolation
2007-05-22 7:01 ` [Patch] memory unplug v3 [1/4] page isolation KAMEZAWA Hiroyuki
@ 2007-05-22 10:19 ` Mel Gorman
2007-05-22 11:01 ` KAMEZAWA Hiroyuki
2007-05-22 18:38 ` Christoph Lameter
1 sibling, 1 reply; 20+ messages in thread
From: Mel Gorman @ 2007-05-22 10:19 UTC (permalink / raw)
To: KAMEZAWA Hiroyuki; +Cc: linux-mm, y-goto, clameter
On Tue, 22 May 2007, KAMEZAWA Hiroyuki wrote:
> Patch for isolating pages.
> 'Isolate' means making pages free and never allocated.
> This feature helps make a range of pages unused.
>
> This patch is based on Mel's page grouping method.
>
> This patch add MIGRATE_ISOLATE to MIGRATE_TYPES. By this
> - MIGRATE_TYPES increases.
> - bitmap for migratetype is enlarged.
>
Both correct.
> If isolate_pages(start,end) is called,
> - migratetype of the range turns to be MIGRATE_ISOLATE if
> its current type is MIGRATE_MOVABLE or MIGRATE_RESERVE.
Why not MIGRATE_RECLAIMABLE as well?
> - MIGRATE_ISOLATE is not on migratetype fallback list.
>
> Then, pages of this migratetype will not be allocated even if it is free.
>
> Now, isolate_pages() only can treat the range aligned to MAX_ORDER.
> This can be adjusted if necessary...maybe.
>
I have a patch ready that groups pages by an arbitrary order. Right now it
is related to the size of the huge page on the system but it's a single
variable pageblock_order that determines the range. You may find you want
to adjust this value.
> Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com>
> Signed-Off-By: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
>
> Index: devel-2.6.22-rc1-mm1/include/linux/mmzone.h
> ===================================================================
> --- devel-2.6.22-rc1-mm1.orig/include/linux/mmzone.h 2007-05-22 14:30:43.000000000 +0900
> +++ devel-2.6.22-rc1-mm1/include/linux/mmzone.h 2007-05-22 15:12:28.000000000 +0900
> @@ -35,11 +35,12 @@
> */
> #define PAGE_ALLOC_COSTLY_ORDER 3
>
> -#define MIGRATE_UNMOVABLE 0
> -#define MIGRATE_RECLAIMABLE 1
> -#define MIGRATE_MOVABLE 2
> -#define MIGRATE_RESERVE 3
> -#define MIGRATE_TYPES 4
> +#define MIGRATE_UNMOVABLE 0 /* not reclaimable pages */
> +#define MIGRATE_RECLAIMABLE 1 /* shrink_xxx routine can reap this */
> +#define MIGRATE_MOVABLE 2 /* migrate_page can migrate this */
> +#define MIGRATE_RESERVE 3 /* no type yet */
MIGRATE_RESERVE is where the min_free_kbytes pages are kept if possible
and the number of RESERVE blocks depends on the value of it. It is only
allocated from if the alternative is to fail the allocation so this
comment should read
/* min_free_kbytes free pages here */
Later we may find a way of using MIGRATE_RESERVE to isolate ranges but
it's not necessary now because it would obscure how the patch works.
> +#define MIGRATE_ISOLATE 4 /* never allocated from */
> +#define MIGRATE_TYPES 5
>
The documentation changes probably belong in a separate patch but thanks,
it nudges me again into getting around to it.
> #define for_each_migratetype_order(order, type) \
> for (order = 0; order < MAX_ORDER; order++) \
> Index: devel-2.6.22-rc1-mm1/include/linux/pageblock-flags.h
> ===================================================================
> --- devel-2.6.22-rc1-mm1.orig/include/linux/pageblock-flags.h 2007-05-22 14:30:43.000000000 +0900
> +++ devel-2.6.22-rc1-mm1/include/linux/pageblock-flags.h 2007-05-22 15:12:28.000000000 +0900
> @@ -31,7 +31,7 @@
>
> /* Bit indices that affect a whole block of pages */
> enum pageblock_bits {
> - PB_range(PB_migrate, 2), /* 2 bits required for migrate types */
> + PB_range(PB_migrate, 3), /* 3 bits required for migrate types */
Right.
> NR_PAGEBLOCK_BITS
> };
>
> Index: devel-2.6.22-rc1-mm1/mm/page_alloc.c
> ===================================================================
> --- devel-2.6.22-rc1-mm1.orig/mm/page_alloc.c 2007-05-22 14:30:43.000000000 +0900
> +++ devel-2.6.22-rc1-mm1/mm/page_alloc.c 2007-05-22 15:12:28.000000000 +0900
> @@ -41,6 +41,7 @@
> #include <linux/pfn.h>
> #include <linux/backing-dev.h>
> #include <linux/fault-inject.h>
> +#include <linux/page-isolation.h>
>
> #include <asm/tlbflush.h>
> #include <asm/div64.h>
> @@ -1056,6 +1057,7 @@
> struct zone *zone = page_zone(page);
> struct per_cpu_pages *pcp;
> unsigned long flags;
> + unsigned long migrate_type;
>
> if (PageAnon(page))
> page->mapping = NULL;
> @@ -1064,6 +1066,12 @@
>
> if (!PageHighMem(page))
> debug_check_no_locks_freed(page_address(page), PAGE_SIZE);
> +
> + migrate_type = get_pageblock_migratetype(page);
> + if (migrate_type == MIGRATE_ISOLATE) {
> + __free_pages_ok(page, 0);
> + return;
> + }
This change to the PCP allocator may be unnecessary. If you let pages
free to the pcp lists, they will never be allocated from there because
allocflags_to_migratetype() will never return MIGRATE_ISOLATE. What you
could do is drain the PCP lists just before you try to hot-remove or call
test_pages_isolated() so that the pcp pages will free back to the
MIGRATE_ISOLATE lists.
The extra drain is undesirable but probably better than checking for
isolate every time a free occurs to the pcp lists.
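
On the caller side that would be roughly (just a sketch, assuming the
free_hot_cold_page() hook above is dropped; the helper name is made up):

/* flush pcp lists, then check whether the whole range is isolated */
static int range_is_fully_isolated(unsigned long start_pfn, unsigned long end_pfn)
{
	/*
	 * Freed pages sitting on pcp lists go back to the buddy lists here;
	 * in an isolated block they land on the MIGRATE_ISOLATE freelist.
	 */
	drain_all_local_pages();
	/* test_pages_isolated() currently returns non-zero if NOT isolated */
	return !test_pages_isolated(start_pfn, end_pfn);
}
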
> arch_free_page(page, 0);
> kernel_map_pages(page, 1, 0);
>
> @@ -1071,7 +1079,7 @@
> local_irq_save(flags);
> __count_vm_event(PGFREE);
> list_add(&page->lru, &pcp->list);
> - set_page_private(page, get_pageblock_migratetype(page));
> + set_page_private(page, migrate_type);
> pcp->count++;
> if (pcp->count >= pcp->high) {
> free_pages_bulk(zone, pcp->batch, &pcp->list, 0);
> @@ -4389,3 +4397,53 @@
> else
> __clear_bit(bitidx + start_bitidx, bitmap);
> }
> +
> +/*
> + * set/clear page block's type to be ISOLATE.
> + * page allocator never allocates memory from an ISOLATE block.
> + */
> +
> +int is_page_isolated(struct page *page)
> +{
> + if ((page_count(page) == 0) &&
> + (get_pageblock_migratetype(page) == MIGRATE_ISOLATE))
(PageBuddy(page) || (page_count(page) == 0 && PagePrivate(page))) &&
(get_pageblock_migratetype(page) == MIGRATE_ISOLATE)
PageBuddy(page) for free pages and page_count(page) with PagePrivate
should indicate pages that are on the pcp lists.
As you currently prevent ISOLATE pages going to the pcp lists, only the
PageBuddy() check is necessary right now. If you drop that and drain before
you check for isolated pages, the PageBuddy() check is still enough; if you
choose to leave pages on the pcp lists until a drain occurs, then you need
the second check as well.
This page_count() check instead of PageBuddy() appears to be related to
how test_pages_isolated() is implemented - more on that later.
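
As a rough, untested sketch, the check being suggested would look like:

int is_page_isolated(struct page *page)
{
	if (get_pageblock_migratetype(page) != MIGRATE_ISOLATE)
		return 0;
	if (PageBuddy(page))		/* free page in the buddy lists */
		return 1;
	if (page_count(page) == 0 && PagePrivate(page))
		return 1;		/* free page still on a pcp list */
	return 0;
}
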
> + return 1;
> + return 0;
> +}
> +
> +int set_migratetype_isolate(struct page *page)
> +{
set_pageblock_isolate() maybe to match set_pageblock_migratetype() naming?
> + struct zone *zone;
> + unsigned long flags;
> + int migrate_type;
> + int ret = -EBUSY;
> +
> + zone = page_zone(page);
> + spin_lock_irqsave(&zone->lock, flags);
It may be more appropriate to have the caller take this lock. More later
in isolate_pages().
> + migrate_type = get_pageblock_migratetype(page);
> + if ((migrate_type != MIGRATE_MOVABLE) &&
> + (migrate_type != MIGRATE_RESERVE))
> + goto out;
and maybe MIGRATE_RECLAIMABLE here particularly in view of Christoph's
work with kmem_cache_vacate().
> + set_pageblock_migratetype(page, MIGRATE_ISOLATE);
> + move_freepages_block(zone, page, MIGRATE_ISOLATE);
> + ret = 0;
> +out:
> + spin_unlock_irqrestore(&zone->lock, flags);
> + if (!ret)
> + drain_all_local_pages();
It's not clear why you drain the pcp lists when you encounter a block of
the wrong migrate_type. Draining the pcp lists is unlikely to help you.
> + return ret;
> +}
> +
> +void clear_migratetype_isolate(struct page *page)
> +{
> + struct zone *zone;
> + unsigned long flags;
> + zone = page_zone(page);
> + spin_lock_irqsave(&zone->lock, flags);
> + if (get_pageblock_migratetype(page) != MIGRATE_ISOLATE)
> + goto out;
> + set_pageblock_migratetype(page, MIGRATE_RESERVE);
> + move_freepages_block(zone, page, MIGRATE_RESERVE);
MIGRATE_RESERVE is likely not what you want to do here. The number of
MIGRATE_RESERVE blocks in a zone is determined by
setup_zone_migrate_reserve(). If you are setting blocks like this, then
you need to call setup_zone_migrate_reserve() with the zone->lru_lock held
after you have called clear_migratetype_isolate() for all the necessary
blocks.
It may be easier to just set the blocks MIGRATE_MOVABLE.
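
i.e. roughly (an untested sketch of the function above with that change):

void clear_migratetype_isolate(struct page *page)
{
	struct zone *zone = page_zone(page);
	unsigned long flags;

	spin_lock_irqsave(&zone->lock, flags);
	if (get_pageblock_migratetype(page) == MIGRATE_ISOLATE) {
		/* hand the block back to the movable pool */
		set_pageblock_migratetype(page, MIGRATE_MOVABLE);
		move_freepages_block(zone, page, MIGRATE_MOVABLE);
	}
	spin_unlock_irqrestore(&zone->lock, flags);
}
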
> +out:
> + spin_unlock_irqrestore(&zone->lock, flags);
> +}
> Index: devel-2.6.22-rc1-mm1/mm/page_isolation.c
> ===================================================================
> --- /dev/null 1970-01-01 00:00:00.000000000 +0000
> +++ devel-2.6.22-rc1-mm1/mm/page_isolation.c 2007-05-22 15:12:28.000000000 +0900
> @@ -0,0 +1,67 @@
> +/*
> + * linux/mm/page_isolation.c
> + */
> +
> +#include <stddef.h>
> +#include <linux/mm.h>
> +#include <linux/page-isolation.h>
> +
> +#define ROUND_DOWN(x,y) ((x) & ~((y) - 1))
> +#define ROUND_UP(x,y) (((x) + (y) -1) & ~((y) - 1))
A roundup() macro already exists in kernel.h. You may want to use that and
define a new rounddown() macro there instead.
> +int
> +isolate_pages(unsigned long start_pfn, unsigned long end_pfn)
> +{
> + unsigned long pfn, start_pfn_aligned, end_pfn_aligned;
> + unsigned long undo_pfn;
> +
> + start_pfn_aligned = ROUND_DOWN(start_pfn, NR_PAGES_ISOLATION_BLOCK);
> + end_pfn_aligned = ROUND_UP(end_pfn, NR_PAGES_ISOLATION_BLOCK);
> +
> + for (pfn = start_pfn_aligned;
> + pfn < end_pfn_aligned;
> + pfn += NR_PAGES_ISOLATION_BLOCK)
> + if (set_migratetype_isolate(pfn_to_page(pfn))) {
You will need to call pfn_valid() in the non-SPARSEMEM case before calling
pfn_to_page() or this will crash in some circumstances.
You also need to check zone boundaries. Let's say start_pfn is the start of
a non-MAX_ORDER aligned zone. Aligning it could make you start isolating
in the wrong zone - perhaps this is intentional, I don't know.
> + undo_pfn = pfn;
> + goto undo;
> + }
> + return 0;
> +undo:
> + for (pfn = start_pfn_aligned;
> + pfn <= undo_pfn;
> + pfn += NR_PAGES_ISOLATION_BLOCK)
> + clear_migratetype_isolate(pfn_to_page(pfn));
> +
We fail if we encounter any non-MIGRATE_MOVABLE block in the start_pfn to
end_pfn range but at that point we've done a lot of work. We also take and
release an interrupt safe lock for each NR_PAGES_ISOLATION_BLOCK block
because set_migratetype_isolate() is responsible for lock taking.
It might be better if you took the lock here, scanned first to make sure
all the blocks were suitable for isolation and only then, call
set_migratetype_isolate() for each of them before releasing the lock.
That would take the lock once and avoid the need for back-out code that
changes all the MIGRATE types in the range. Even for large ranges of
memory, it should not be too long to be holding a lock particularly in
this path.
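
A rough sketch of that structure (untested; it assumes an already aligned,
pfn_valid range that lies inside a single zone):

int isolate_pages(unsigned long start_pfn, unsigned long end_pfn)
{
	struct zone *zone = page_zone(pfn_to_page(start_pfn));
	unsigned long pfn, flags;
	int type, ret = -EBUSY;

	spin_lock_irqsave(&zone->lock, flags);
	/* pass 1: make sure every block can be isolated before touching any */
	for (pfn = start_pfn; pfn < end_pfn; pfn += NR_PAGES_ISOLATION_BLOCK) {
		type = get_pageblock_migratetype(pfn_to_page(pfn));
		if (type != MIGRATE_MOVABLE && type != MIGRATE_RESERVE)
			goto out;
	}
	/* pass 2: all blocks are suitable, so no back-out path is needed */
	for (pfn = start_pfn; pfn < end_pfn; pfn += NR_PAGES_ISOLATION_BLOCK) {
		set_pageblock_migratetype(pfn_to_page(pfn), MIGRATE_ISOLATE);
		move_freepages_block(zone, pfn_to_page(pfn), MIGRATE_ISOLATE);
	}
	ret = 0;
out:
	spin_unlock_irqrestore(&zone->lock, flags);
	return ret;
}
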
> + return -EBUSY;
> +}
> +
> +
> +int
> +free_isolated_pages(unsigned long start_pfn, unsigned long end_pfn)
> +{
> + unsigned long pfn, start_pfn_aligned, end_pfn_aligned;
> + start_pfn_aligned = ROUND_DOWN(start_pfn, NR_PAGES_ISOLATION_BLOCK);
> + end_pfn_aligned = ROUND_UP(end_pfn, NR_PAGES_ISOLATION_BLOCK);
spaces instead of tabs there before end_pfn_aligned.
> +
> + for (pfn = start_pfn_aligned;
> + pfn < end_pfn_aligned;
> + pfn += MAX_ORDER_NR_PAGES)
pfn += NR_PAGES_ISOLATION_BLOCK ?
pfn_valid() ?
> + clear_migratetype_isolate(pfn_to_page(pfn));
> + return 0;
> +}
> +
> +int
> +test_pages_isolated(unsigned long start_pfn, unsigned long end_pfn)
> +{
> + unsigned long pfn;
> + int ret = 0;
> +
You didn't align here, intentional?
> + for (pfn = start_pfn; pfn < end_pfn; pfn++) {
> + if (!pfn_valid(pfn))
> + continue;
> + if (!is_page_isolated(pfn_to_page(pfn))) {
> + ret = 1;
> + break;
> + }
If the page is isolated, it's free and assuming you've drained the pcp
lists, it will have PageBuddy() set. In that case, you should be checking
what order the page is free at and skipping forward that number of pages.
I am guessing this pfn++ walk here is why you are checking
page_count(page) == 0 in is_page_isolated() instead of PageBuddy()
> + }
> + return ret;
The return value is a little counter-intuitive. It returns 1 if they are
not isolated. I would expect it to return 1 if isolated like test_bit()
returns 1 if it's set.
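
Putting the last two points together, the walk might look roughly like this
(untested sketch; a free buddy page keeps its order in page_private(), which
is what page_alloc.c reads internally):

int test_pages_isolated(unsigned long start_pfn, unsigned long end_pfn)
{
	unsigned long pfn = start_pfn;
	struct page *page;

	while (pfn < end_pfn) {
		if (!pfn_valid(pfn)) {
			pfn++;
			continue;
		}
		page = pfn_to_page(pfn);
		if (!PageBuddy(page) ||
		    get_pageblock_migratetype(page) != MIGRATE_ISOLATE)
			return 0;	/* something in the range is not isolated */
		/* skip the whole free buddy chunk at once */
		pfn += 1UL << page_private(page);
	}
	return 1;			/* every page in the range is isolated */
}
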
> +}
> Index: devel-2.6.22-rc1-mm1/include/linux/page-isolation.h
> ===================================================================
> --- /dev/null 1970-01-01 00:00:00.000000000 +0000
> +++ devel-2.6.22-rc1-mm1/include/linux/page-isolation.h 2007-05-22 15:12:28.000000000 +0900
> @@ -0,0 +1,47 @@
> +#ifndef __LINUX_PAGEISOLATION_H
> +#define __LINUX_PAGEISOLATION_H
> +/*
> + * Define an interface for capturing and isolating some amount of
> + * contiguous pages.
> + * isolated pages are freed but will never be allocated until they are
> + * pushed back.
> + *
> + * This isolation function requires some alignment.
> + */
> +
> +#define PAGE_ISOLATION_ORDER (MAX_ORDER - 1)
> +#define NR_PAGES_ISOLATION_BLOCK (1 << PAGE_ISOLATION_ORDER)
> +
When grouping-pages-by-arbitrary-order goes in, there will be values
available called pageblock_order and nr_pages_pageblock which will be
identical to these two values.
> +/*
> + * set page isolation range.
> + * If specified range includes migrate types other than MOVABLE,
> + * this will fail with -EBUSY.
> + */
> +extern int
> +isolate_pages(unsigned long start_pfn, unsigned long end_pfn);
> +
> +/*
> + * Free all isolated memory and push back them as MIGRATE_RESERVE type.
> + */
> +extern int
> +free_isolated_pages(unsigned long start_pfn, unsigned long end_pfn);
> +
> +/*
> + * test all pages are isolated or not.
> + */
> +extern int
> +test_pages_isolated(unsigned long start_pfn, unsigned long end_pfn);
> +
> +/* test routine for check page is isolated or not */
> +extern int is_page_isolated(struct page *page);
> +
> +/*
> + * Internal funcs.
> + * Changes pageblock's migrate type
> + */
> +extern int set_migratetype_isolate(struct page *page);
> +extern void clear_migratetype_isolate(struct page *page);
> +extern int __is_page_isolated(struct page *page);
> +
> +
> +#endif
> Index: devel-2.6.22-rc1-mm1/mm/Makefile
> ===================================================================
> --- devel-2.6.22-rc1-mm1.orig/mm/Makefile 2007-05-22 14:30:43.000000000 +0900
> +++ devel-2.6.22-rc1-mm1/mm/Makefile 2007-05-22 15:12:28.000000000 +0900
> @@ -11,7 +11,7 @@
> page_alloc.o page-writeback.o pdflush.o \
> readahead.o swap.o truncate.o vmscan.o \
> prio_tree.o util.o mmzone.o vmstat.o backing-dev.o \
> - $(mmu-y)
> + page_isolation.o $(mmu-y)
>
> ifeq ($(CONFIG_MMU)$(CONFIG_BLOCK),yy)
> obj-y += bounce.o
>
All in all, I like this implementation. I found it nice and relatively
straight-forward to read. Thanks
--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab
* Re: [Patch] memory unplug v3 [1/4] page isolation
2007-05-22 10:19 ` Mel Gorman
@ 2007-05-22 11:01 ` KAMEZAWA Hiroyuki
0 siblings, 0 replies; 20+ messages in thread
From: KAMEZAWA Hiroyuki @ 2007-05-22 11:01 UTC (permalink / raw)
To: Mel Gorman; +Cc: linux-mm, y-goto, clameter
On Tue, 22 May 2007 11:19:27 +0100 (IST)
Mel Gorman <mel@csn.ul.ie> wrote:
> > If isolate_pages(start,end) is called,
> > - migratetype of the range turns to be MIGRATE_ISOLATE if
> > its current type is MIGRATE_MOVABLE or MIGRATE_RESERVE.
>
> Why not MIGRATE_RECLAIMABLE as well?
>
To allow that, I would have to implement page_reclaim_range(start_pfn, end_pfn).
For now, I just use migration.
I'll consider it as my future work.
Maybe Christoph's work will help me.
> > - MIGRATE_ISOLATE is not on migratetype fallback list.
> >
> > Then, pages of this migratetype will not be allocated even if it is free.
> >
> > Now, isolate_pages() only can treat the range aligned to MAX_ORDER.
> > This can be adjusted if necessary...maybe.
> >
>
> I have a patch ready that groups pages by an arbitrary order. Right now it
> is related to the size of the huge page on the system but it's a single
> variable pageblock_order that determines the range. You may find you want
> to adjust this value.
>
I see. I'll support it in patches for next -mm.
> > +#define MIGRATE_UNMOVABLE 0 /* not reclaimable pages */
> > +#define MIGRATE_RECLAIMABLE 1 /* shrink_xxx routine can reap this */
> > +#define MIGRATE_MOVABLE 2 /* migrate_page can migrate this */
> > +#define MIGRATE_RESERVE 3 /* no type yet */
>
> MIGRATE_RESERVE is where the min_free_kbytes pages are kept if possible
> and the number of RESERVE blocks depends on the value of it. It is only
> allocated from if the alternative is to fail the allocation so this
> comment should read
>
> /* min_free_kbytes free pages here */
>
ok.
> Later we may find a way of using MIGRATE_RESERVE to isolate ranges but
> it's not necessary now because it would obscure how the patch works.
>
> > +#define MIGRATE_ISOLATE 4 /* never allocated from */
> > +#define MIGRATE_TYPES 5
> >
>
> The documentation changes probably belong in a separate patch but thanks,
> it nudges me again into getting around to it.
>
Ok, I'll just keep the comment for MIGRATE_ISOLATE.
>
> > +
> > + migrate_type = get_pageblock_migratetype(page);
> > + if (migrate_type == MIGRATE_ISOLATE) {
> > + __free_pages_ok(page, 0);
> > + return;
> > + }
>
> This change to the PCP allocator may be unnecessary. If you let the page
> free to the pcp lists, they will never be allocated from there because
> allocflags_to_migratetype() will never return MIGRATE_ISOLATE. What you
> could do is drain the PCP lists just before you try to hot-remove or call
> test_pages_isolated() so that the pcp pages will free back to the
> MIGRATE_ISOLATE lists.
>
Ah.. thanks. I'll remove this.
> The extra drain is undesirable but probably better than checking for
> isolate every time a free occurs to the pcp lists.
>
yes.
>
> > +/*
> > + * set/clear page block's type to be ISOLATE.
> > + * page allocator never allocates memory from an ISOLATE block.
> > + */
> > +
> > +int is_page_isolated(struct page *page)
> > +{
> > + if ((page_count(page) == 0) &&
> > + (get_pageblock_migratetype(page) == MIGRATE_ISOLATE))
>
> (PageBuddy(page) || (page_count(page) == 0 && PagePrivate(page))) &&
> (get_pageblock_migratetype(page) == MIGRATE_ISOLATE)
>
> PageBuddy(page) for free pages and page_count(page) with PagePrivate
> should indicate pages that are on the pcp lists.
>
> As you currently prevent ISOLATE pages going to the pcp lists, only the
> PageBuddy check is necessary right now but If you drain before you check
> for isolated pages, you only need the PageBuddy() check. If you choose to
> let pages on the pcp lists until a drain occurs, then you need the second
> check.
>
> This page_count() check instead of PageBuddy() appears to be related to
> how test_pages_isolated() is implemented - more on that later.
>
PG_buddy is set only if the page is linked to a freelist. IOW, if the page
is not the head of its buddy chunk, PG_buddy is not set.
So, I didn't use PageBuddy().
(*) If I used PG_buddy to check whether a page is free or not, I would have
to search for the head of its buddy chunk and its order.
> > + return 1;
> > + return 0;
> > +}
> > +
> > +int set_migratetype_isolate(struct page *page)
> > +{
>
> set_pageblock_isolate() maybe to match set_pageblock_migratetype() naming?
>
> > + struct zone *zone;
> > + unsigned long flags;
> > + int migrate_type;
> > + int ret = -EBUSY;
> > +
> > + zone = page_zone(page);
> > + spin_lock_irqsave(&zone->lock, flags);
>
> It may be more appropriate to have the caller take this lock. More later
> in isolate_pages().
>
ok.
> > + migrate_type = get_pageblock_migratetype(page);
> > + if ((migrate_type != MIGRATE_MOVABLE) &&
> > + (migrate_type != MIGRATE_RESERVE))
> > + goto out;
>
> and maybe MIGRATE_RECLAIMABLE here particularly in view of Christoph's
> work with kmem_cache_vacate().
>
ok. I'll look into.
> > + set_pageblock_migratetype(page, MIGRATE_ISOLATE);
> > + move_freepages_block(zone, page, MIGRATE_ISOLATE);
> > + ret = 0;
> > +out:
> > + spin_unlock_irqrestore(&zone->lock, flags);
> > + if (!ret)
> > + drain_all_local_pages();
>
> It's not clear why you drain the pcp lists when you encounter a block of
> the wrong migrate_type. Draining the pcp lists is unlikely to help you.
>
Ah, drain_all_local_pages() is called when MIGRATE_ISOLATE is successfully set.
But I'll change this because I'll remove the hook in free_hot_cold_page() and call
drain_all_local_pages() somewhere else.
> > + return ret;
> > +}
> > +
> > +void clear_migratetype_isolate(struct page *page)
> > +{
> > + struct zone *zone;
> > + unsigned long flags;
> > + zone = page_zone(page);
> > + spin_lock_irqsave(&zone->lock, flags);
> > + if (get_pageblock_migratetype(page) != MIGRATE_ISOLATE)
> > + goto out;
> > + set_pageblock_migratetype(page, MIGRATE_RESERVE);
> > + move_freepages_block(zone, page, MIGRATE_RESERVE);
>
> MIGRATE_RESERVE is likely not what you want to do here. The number of
> MIGRATE_RESERVE blocks in a zone is determined by
> setup_zone_migrate_reserve(). If you are setting blocks like this, then
> you need to call setup_zone_migrate_reserve() with the zone->lru_lock held
> after you have called clear_migratetype_isolate() for all the necessary
> blocks.
>
> It may be easier to just set the blocks MIGRATE_MOVABLE.
>
Ok.
> > +out:
> > + spin_unlock_irqrestore(&zone->lock, flags);
> > +}
> > Index: devel-2.6.22-rc1-mm1/mm/page_isolation.c
> > ===================================================================
> > --- /dev/null 1970-01-01 00:00:00.000000000 +0000
> > +++ devel-2.6.22-rc1-mm1/mm/page_isolation.c 2007-05-22 15:12:28.000000000 +0900
> > @@ -0,0 +1,67 @@
> > +/*
> > + * linux/mm/page_isolation.c
> > + */
> > +
> > +#include <stddef.h>
> > +#include <linux/mm.h>
> > +#include <linux/page-isolation.h>
> > +
> > +#define ROUND_DOWN(x,y) ((x) & ~((y) - 1))
> > +#define ROUND_UP(x,y) (((x) + (y) -1) & ~((y) - 1))
>
> A roundup() macro already exists in kernel.h. You may want to use that and
> define a new rounddown() macro there instead.
Oh...I couldn't find it. thank you.
>
> > +int
> > +isolate_pages(unsigned long start_pfn, unsigned long end_pfn)
> > +{
> > + unsigned long pfn, start_pfn_aligned, end_pfn_aligned;
> > + unsigned long undo_pfn;
> > +
> > + start_pfn_aligned = ROUND_DOWN(start_pfn, NR_PAGES_ISOLATION_BLOCK);
> > + end_pfn_aligned = ROUND_UP(end_pfn, NR_PAGES_ISOLATION_BLOCK);
> > +
> > + for (pfn = start_pfn_aligned;
> > + pfn < end_pfn_aligned;
> > + pfn += NR_PAGES_ISOLATION_BLOCK)
> > + if (set_migratetype_isolate(pfn_to_page(pfn))) {
>
> You will need to call pfn_valid() in the non-SPARSEMEM case before calling
> pfn_to_page() or this will crash in some circumstances.
ok.
>
> You also need to check zone boundaries. Let's say start_pfn is the start of
> a non-MAX_ORDER aligned zone. Aligning it could make you start isolating
> in the wrong zone - perhaps this is intentional, I don't know.
Ah, ok. at least pfn_valid() is necessary.
>
> > + undo_pfn = pfn;
> > + goto undo;
> > + }
> > + return 0;
> > +undo:
> > + for (pfn = start_pfn_aligned;
> > + pfn <= undo_pfn;
> > + pfn += NR_PAGES_ISOLATION_BLOCK)
> > + clear_migratetype_isolate(pfn_to_page(pfn));
> > +
>
> We fail if we encounter any non-MIGRATE_MOVABLE block in the start_pfn to
> end_pfn range but at that point we've done a lot of work. We also take and
> release an interrupt safe lock for each NR_PAGES_ISOLATION_BLOCK block
> because set_migratetype_isolate() is responsible for lock taking.
>
> It might be better if you took the lock here, scanned first to make sure
> all the blocks were suitable for isolation and only then, call
> set_migratetype_isolate() for each of them before releasing the lock.
Hm. ok.
>
> That would take the lock once and avoid the need for back-out code that
> changes all the MIGRATE types in the range. Even for large ranges of
> memory, it should not be too long to be holding a lock particularly in
> this path.
>
> > + return -EBUSY;
> > +}
> > +
> > +
> > +int
> > +free_isolated_pages(unsigned long start_pfn, unsigned long end_pfn)
> > +{
> > + unsigned long pfn, start_pfn_aligned, end_pfn_aligned;
> > + start_pfn_aligned = ROUND_DOWN(start_pfn, NR_PAGES_ISOLATION_BLOCK);
> > + end_pfn_aligned = ROUND_UP(end_pfn, NR_PAGES_ISOLATION_BLOCK);
>
> spaces instead of tabs there before end_pfn_aligned.
>
> > +
> > + for (pfn = start_pfn_aligned;
> > + pfn < end_pfn_aligned;
> > + pfn += MAX_ORDER_NR_PAGES)
>
> pfn += NR_PAGES_ISOLATION_BLOCK ?
>
yes. it should be.
> pfn_valid() ?
>
ok.
> > + clear_migratetype_isolate(pfn_to_page(pfn));
> > + return 0;
> > +}
> > +
> > +int
> > +test_pages_isolated(unsigned long start_pfn, unsigned long end_pfn)
> > +{
> > + unsigned long pfn;
> > + int ret = 0;
> > +
>
> You didn't align here, intentional?
>
Ah...no. I'll check alignment in the next version.
> > + for (pfn = start_pfn; pfn < end_pfn; pfn++) {
> > + if (!pfn_valid(pfn))
> > + continue;
> > + if (!is_page_isolated(pfn_to_page(pfn))) {
> > + ret = 1;
> > + break;
> > + }
>
> If the page is isolated, it's free and assuming you've drained the pcp
> lists, it will have PageBuddy() set. In that case, you should be checking
> what order the page is free at and skipping forward that number of pages.
> I am guessing this pfn++ walk here is why you are checking
> page_count(page) == 0 in is_page_isolated() instead of PageBuddy()
>
Yes. In the next version, I'd like to try to handle the PageBuddy() and page_order() approach.
> > + }
> > + return ret;
>
> The return value is a little counter-intuitive. It returns 1 if they are
> not isolated. I would expect it to return 1 if isolated like test_bit()
> returns 1 if it's set.
>
ok.
> > +#define PAGE_ISOLATION_ORDER (MAX_ORDER - 1)
> > +#define NR_PAGES_ISOLATION_BLOCK (1 << PAGE_ISOLATION_ORDER)
> > +
>
> When grouping-pages-by-arbitrary-order goes in, there will be a value
> available called pageblock_order and nr_pages_pageblock which will be
> identical to these two values.
>
ok.
> All in all, I like this implementation. I found it nice and relatively
> straight-forward to read. Thanks
>
Thank you for the review. I'll reflect your comments in the next version.
-Kame
* Re: [Patch] memory unplug v3 [1/4] page isolation
2007-05-22 7:01 ` [Patch] memory unplug v3 [1/4] page isolation KAMEZAWA Hiroyuki
2007-05-22 10:19 ` Mel Gorman
@ 2007-05-22 18:38 ` Christoph Lameter
2007-05-23 1:41 ` KAMEZAWA Hiroyuki
1 sibling, 1 reply; 20+ messages in thread
From: Christoph Lameter @ 2007-05-22 18:38 UTC (permalink / raw)
To: KAMEZAWA Hiroyuki; +Cc: linux-mm, mel, y-goto
On Tue, 22 May 2007, KAMEZAWA Hiroyuki wrote:
> Index: devel-2.6.22-rc1-mm1/mm/page_isolation.c
> ===================================================================
> --- /dev/null 1970-01-01 00:00:00.000000000 +0000
> +++ devel-2.6.22-rc1-mm1/mm/page_isolation.c 2007-05-22 15:12:28.000000000 +0900
> @@ -0,0 +1,67 @@
> +/*
> + * linux/mm/page_isolation.c
> + */
> +
> +#include <stddef.h>
> +#include <linux/mm.h>
> +#include <linux/page-isolation.h>
> +
> +#define ROUND_DOWN(x,y) ((x) & ~((y) - 1))
> +#define ROUND_UP(x,y) (((x) + (y) -1) & ~((y) - 1))
Use the common definitions like ALIGN in kernel.h and the rounding
functions in log2.h?
* Re: [Patch] memory unplug v3 [1/4] page isolation
2007-05-22 18:38 ` Christoph Lameter
@ 2007-05-23 1:41 ` KAMEZAWA Hiroyuki
0 siblings, 0 replies; 20+ messages in thread
From: KAMEZAWA Hiroyuki @ 2007-05-23 1:41 UTC (permalink / raw)
To: Christoph Lameter; +Cc: linux-mm, mel, y-goto
On Tue, 22 May 2007 11:38:56 -0700 (PDT)
Christoph Lameter <clameter@sgi.com> wrote:
> On Tue, 22 May 2007, KAMEZAWA Hiroyuki wrote:
>
> > Index: devel-2.6.22-rc1-mm1/mm/page_isolation.c
> > ===================================================================
> > --- /dev/null 1970-01-01 00:00:00.000000000 +0000
> > +++ devel-2.6.22-rc1-mm1/mm/page_isolation.c 2007-05-22 15:12:28.000000000 +0900
> > @@ -0,0 +1,67 @@
> > +/*
> > + * linux/mm/page_isolation.c
> > + */
> > +
> > +#include <stddef.h>
> > +#include <linux/mm.h>
> > +#include <linux/page-isolation.h>
> > +
> > +#define ROUND_DOWN(x,y) ((x) & ~((y) - 1))
> > +#define ROUND_UP(x,y) (((x) + (y) -1) & ~((y) - 1))
>
> Use the common definitions like ALIGN in kernel.h and the rounding
> functions in log2.h?
>
Yes. I should do so.
Thanks,
-Kame
* [Patch] memory unplug v3 [2/4] migration by kernel
2007-05-22 6:58 [Patch] memory unplug v3 [0/4] KAMEZAWA Hiroyuki
2007-05-22 7:01 ` [Patch] memory unplug v3 [1/4] page isolation KAMEZAWA Hiroyuki
@ 2007-05-22 7:04 ` KAMEZAWA Hiroyuki
2007-05-22 18:49 ` Christoph Lameter
2007-05-22 7:07 ` [Patch] memory unplug v3 [3/4] page removal KAMEZAWA Hiroyuki
` (2 subsequent siblings)
4 siblings, 1 reply; 20+ messages in thread
From: KAMEZAWA Hiroyuki @ 2007-05-22 7:04 UTC (permalink / raw)
To: KAMEZAWA Hiroyuki; +Cc: linux-mm, mel, y-goto, clameter
This patch adds a feature that lets the kernel migrate user pages from its own
context.
Now, sys_migrate(), a system call to migrate pages, works well.
When we want to migrate pages from kernel code, we have 2 approaches:
(a) acquire the mm->sem of a mapper of the target page.
(b) avoid race conditions with additional checks.
This patch implements (b) and adds the following 2 changes:
1. delay freeing an anon_vma while a page which belongs to it is being migrated.
2. check page_mapped() before calling try_to_unmap().
Maybe more checks will be needed. At least, this patch's migrate_pages_nocontext()
works well under heavy memory pressure in my environment.
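
A rough sketch of a kernel-context caller (the allocation callback and all
names below are only illustrative, not part of this patch):

#include <linux/mm.h>
#include <linux/migrate.h>

/* new_page_t callback: where replacement pages come from is up to the caller */
static struct page *unplug_alloc_target(struct page *page, unsigned long private,
					int **result)
{
	return alloc_page(GFP_HIGHUSER);
}

static int migrate_range_away(struct list_head *pagelist)
{
	/* no mapper's mm->sem is held here, so use the nocontext variant */
	return migrate_pages_nocontext(pagelist, unplug_alloc_target, 0);
}
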
Signed-Off-By: Yasunori Goto <y-goto@jp.fujitsu.com>
Signed-Off-By: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Index: devel-2.6.22-rc1-mm1/mm/Kconfig
===================================================================
--- devel-2.6.22-rc1-mm1.orig/mm/Kconfig 2007-05-22 14:30:39.000000000 +0900
+++ devel-2.6.22-rc1-mm1/mm/Kconfig 2007-05-22 15:12:29.000000000 +0900
@@ -152,6 +152,15 @@
example on NUMA systems to put pages nearer to the processors accessing
the page.
+config MIGRATION_BY_KERNEL
+ bool "Page migration by kernel's page scan"
+ def_bool y
+ depends on MIGRATION
+ help
+ Allows page migration from kernel context. This means page migration
+ can be done by codes other than sys_migrate() system call. Will add
+ some additional check code in page migration.
+
config RESOURCES_64BIT
bool "64 bit Memory and IO resources (EXPERIMENTAL)" if (!64BIT && EXPERIMENTAL)
default 64BIT
Index: devel-2.6.22-rc1-mm1/mm/migrate.c
===================================================================
--- devel-2.6.22-rc1-mm1.orig/mm/migrate.c 2007-05-22 14:30:39.000000000 +0900
+++ devel-2.6.22-rc1-mm1/mm/migrate.c 2007-05-22 15:12:29.000000000 +0900
@@ -607,11 +607,12 @@
* to the newly allocated page in newpage.
*/
static int unmap_and_move(new_page_t get_new_page, unsigned long private,
- struct page *page, int force)
+ struct page *page, int force, int context)
{
int rc = 0;
int *result = NULL;
struct page *newpage = get_new_page(page, private, &result);
+ struct anon_vma *anon_vma = NULL;
if (!newpage)
return -ENOMEM;
@@ -632,16 +633,29 @@
goto unlock;
wait_on_page_writeback(page);
}
-
+#ifdef CONFIG_MIGRATION_BY_KERNEL
+ if (PageAnon(page) && context)
+ /* hold this anon_vma until page migration ends */
+ anon_vma = anon_vma_hold(page);
+
+ if (page_mapped(page))
+ try_to_unmap(page, 1);
+#else
/*
* Establish migration ptes or remove ptes
*/
try_to_unmap(page, 1);
+#endif
if (!page_mapped(page))
rc = move_to_new_page(newpage, page);
- if (rc)
+ if (rc) {
remove_migration_ptes(page, page);
+ }
+#ifdef CONFIG_MIGRATION_BY_KERNEL
+ if (anon_vma)
+ anon_vma_release(anon_vma);
+#endif
unlock:
unlock_page(page);
@@ -686,8 +700,8 @@
*
* Return: Number of pages not migrated or error code.
*/
-int migrate_pages(struct list_head *from,
- new_page_t get_new_page, unsigned long private)
+int __migrate_pages(struct list_head *from,
+ new_page_t get_new_page, unsigned long private, int context)
{
int retry = 1;
int nr_failed = 0;
@@ -707,7 +721,7 @@
cond_resched();
rc = unmap_and_move(get_new_page, private,
- page, pass > 2);
+ page, pass > 2, context);
switch(rc) {
case -ENOMEM:
@@ -737,6 +751,25 @@
return nr_failed + retry;
}
+int migrate_pages(struct list_head *from,
+ new_page_t get_new_page, unsigned long private)
+{
+ return __migrate_pages(from, get_new_page, private, 0);
+}
+
+#ifdef CONFIG_MIGRATION_BY_KERNEL
+/*
+ * When page migration is issued by the kernel itself without page mapper's
+ * mm->sem, we have to be more careful to do page migration.
+ */
+int migrate_pages_nocontext(struct list_head *from,
+ new_page_t get_new_page, unsigned long private)
+{
+ return __migrate_pages(from, get_new_page, private, 1);
+}
+
+#endif /* CONFIG_MIGRATION_BY_KERNEL */
+
#ifdef CONFIG_NUMA
/*
* Move a list of individual pages
Index: devel-2.6.22-rc1-mm1/include/linux/rmap.h
===================================================================
--- devel-2.6.22-rc1-mm1.orig/include/linux/rmap.h 2007-05-22 14:30:39.000000000 +0900
+++ devel-2.6.22-rc1-mm1/include/linux/rmap.h 2007-05-22 15:12:29.000000000 +0900
@@ -26,12 +26,16 @@
struct anon_vma {
spinlock_t lock; /* Serialize access to vma list */
struct list_head head; /* List of private "related" vmas */
+#ifdef CONFIG_MIGRATION_BY_KERNEL
+ atomic_t ref; /* special refcnt for migration */
+#endif
};
#ifdef CONFIG_MMU
extern struct kmem_cache *anon_vma_cachep;
+#ifndef CONFIG_MIGRATION_BY_KERNEL
static inline struct anon_vma *anon_vma_alloc(void)
{
return kmem_cache_alloc(anon_vma_cachep, GFP_KERNEL);
@@ -41,6 +45,26 @@
{
kmem_cache_free(anon_vma_cachep, anon_vma);
}
+#define anon_vma_hold(page) do{}while(0)
+#define anon_vma_release(anon) do{}while(0)
+
+#else /* CONFIG_MIGRATION_BY_KERNEL */
+static inline struct anon_vma *anon_vma_alloc(void)
+{
+ struct anon_vma *ret = kmem_cache_alloc(anon_vma_cachep, GFP_KERNEL);
+ if (ret)
+ atomic_set(&ret->ref, 0);
+ return ret;
+}
+static inline void anon_vma_free(struct anon_vma *anon_vma)
+{
+ if (atomic_read(&anon_vma->ref) == 0)
+ kmem_cache_free(anon_vma_cachep, anon_vma);
+}
+extern struct anon_vma *anon_vma_hold(struct page *page);
+extern void anon_vma_release(struct anon_vma *anon_vma);
+
+#endif /* CONFIG_MIGRATION_BY_KERNEL */
static inline void anon_vma_lock(struct vm_area_struct *vma)
{
Index: devel-2.6.22-rc1-mm1/mm/rmap.c
===================================================================
--- devel-2.6.22-rc1-mm1.orig/mm/rmap.c 2007-05-22 14:30:39.000000000 +0900
+++ devel-2.6.22-rc1-mm1/mm/rmap.c 2007-05-22 15:12:29.000000000 +0900
@@ -203,6 +203,28 @@
spin_unlock(&anon_vma->lock);
rcu_read_unlock();
}
+#ifdef CONFIG_MIGRATION_BY_KERNEL
+struct anon_vma *anon_vma_hold(struct page *page) {
+ struct anon_vma *anon_vma;
+ anon_vma = page_lock_anon_vma(page);
+ if (!anon_vma)
+ return NULL;
+ atomic_set(&anon_vma->ref, 1);
+ spin_unlock(&anon_vma->lock);
+ return anon_vma;
+}
+
+void anon_vma_release(struct anon_vma *anon_vma)
+{
+ int empty;
+ spin_lock(&anon_vma->lock);
+ atomic_set(&anon_vma->ref, 0);
+ empty = list_empty(&anon_vma->head);
+ spin_unlock(&anon_vma->lock);
+ if (empty)
+ anon_vma_free(anon_vma);
+}
+#endif
/*
* At what user virtual address is page expected in vma?
Index: devel-2.6.22-rc1-mm1/include/linux/migrate.h
===================================================================
--- devel-2.6.22-rc1-mm1.orig/include/linux/migrate.h 2007-05-22 14:30:39.000000000 +0900
+++ devel-2.6.22-rc1-mm1/include/linux/migrate.h 2007-05-22 15:12:29.000000000 +0900
@@ -30,7 +30,10 @@
extern int migrate_page(struct address_space *,
struct page *, struct page *);
extern int migrate_pages(struct list_head *l, new_page_t x, unsigned long);
-
+#ifdef CONFIG_MIGRATION_BY_KERNEL
+extern int migrate_pages_nocontext(struct list_head *l, new_page_t x,
+ unsigned long);
+#endif
extern int fail_migrate_page(struct address_space *,
struct page *, struct page *);
* Re: [Patch] memory unplug v3 [2/4] migration by kernel
2007-05-22 7:04 ` [Patch] memory unplug v3 [2/4] migration by kernel KAMEZAWA Hiroyuki
@ 2007-05-22 18:49 ` Christoph Lameter
2007-05-23 1:45 ` KAMEZAWA Hiroyuki
2007-05-23 19:14 ` Mel Gorman
0 siblings, 2 replies; 20+ messages in thread
From: Christoph Lameter @ 2007-05-22 18:49 UTC (permalink / raw)
To: KAMEZAWA Hiroyuki; +Cc: linux-mm, mel, y-goto
On Tue, 22 May 2007, KAMEZAWA Hiroyuki wrote:
> +config MIGRATION_BY_KERNEL
> + bool "Page migration by kernel's page scan"
> + def_bool y
> + depends on MIGRATION
> + help
> + Allows page migration from kernel context. This means page migration
> + can be done by codes other than sys_migrate() system call. Will add
> + some additional check code in page migration.
I think the scope of this is much bigger than you imagine. This is also
going to be useful when Mel is going to implement defragmentation. So I
think this should not be a separate option but be on by default.
> Index: devel-2.6.22-rc1-mm1/mm/migrate.c
> ===================================================================
> --- devel-2.6.22-rc1-mm1.orig/mm/migrate.c 2007-05-22 14:30:39.000000000 +0900
> +++ devel-2.6.22-rc1-mm1/mm/migrate.c 2007-05-22 15:12:29.000000000 +0900
> @@ -607,11 +607,12 @@
> * to the newly allocated page in newpage.
> */
> static int unmap_and_move(new_page_t get_new_page, unsigned long private,
> - struct page *page, int force)
> + struct page *page, int force, int context)
context is set if there is no context? Call this nocontext instead?
>
> - if (rc)
> + if (rc) {
> remove_migration_ptes(page, page);
> + }
Why are you adding { } here?
> +#ifdef CONFIG_MIGRATION_BY_KERNEL
> + if (anon_vma)
> + anon_vma_release(anon_vma);
> +#endif
The check for anon_vma != NULL could be put into anon_vma_release to avoid
the ifdef.
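
i.e. something like (sketch of the helper below with the check folded in):

void anon_vma_release(struct anon_vma *anon_vma)
{
	int empty;

	/* accept NULL so the caller can call this unconditionally */
	if (!anon_vma)
		return;
	spin_lock(&anon_vma->lock);
	atomic_set(&anon_vma->ref, 0);
	empty = list_empty(&anon_vma->head);
	spin_unlock(&anon_vma->lock);
	if (empty)
		anon_vma_free(anon_vma);
}
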
> Index: devel-2.6.22-rc1-mm1/mm/rmap.c
> ===================================================================
> --- devel-2.6.22-rc1-mm1.orig/mm/rmap.c 2007-05-22 14:30:39.000000000 +0900
> +++ devel-2.6.22-rc1-mm1/mm/rmap.c 2007-05-22 15:12:29.000000000 +0900
> @@ -203,6 +203,28 @@
> spin_unlock(&anon_vma->lock);
> rcu_read_unlock();
> }
> +#ifdef CONFIG_MIGRATION_BY_KERNEL
> +struct anon_vma *anon_vma_hold(struct page *page) {
> + struct anon_vma *anon_vma;
> + anon_vma = page_lock_anon_vma(page);
> + if (!anon_vma)
> + return NULL;
> + atomic_set(&anon_vma->ref, 1);
Why use an atomic value if it is set and cleared within a spinlock?
* Re: [Patch] memory unplug v3 [2/4] migration by kernel
2007-05-22 18:49 ` Christoph Lameter
@ 2007-05-23 1:45 ` KAMEZAWA Hiroyuki
2007-05-23 1:56 ` Christoph Lameter
2007-05-23 19:14 ` Mel Gorman
1 sibling, 1 reply; 20+ messages in thread
From: KAMEZAWA Hiroyuki @ 2007-05-23 1:45 UTC (permalink / raw)
To: Christoph Lameter; +Cc: linux-mm, mel, y-goto
On Tue, 22 May 2007 11:49:04 -0700 (PDT)
Christoph Lameter <clameter@sgi.com> wrote:
> On Tue, 22 May 2007, KAMEZAWA Hiroyuki wrote:
>
> > +config MIGRATION_BY_KERNEL
> > + bool "Page migration by kernel's page scan"
> > + def_bool y
> > + depends on MIGRATION
> > + help
> > + Allows page migration from kernel context. This means page migration
> > + can be done by codes other than sys_migrate() system call. Will add
> > + some additional check code in page migration.
>
> I think the scope of this is much bigger than you imagine. This is also
> going to be useful when Mel is going to implement defragmentation. So I
> think this should not be a separate option but be on by default.
ok. (Then I can remove this config.)
> > static int unmap_and_move(new_page_t get_new_page, unsigned long private,
> > - struct page *page, int force)
> > + struct page *page, int force, int context)
>
> context is set if there is no context? Call this nocontext instead?
>
ok, this should be.
> >
> > - if (rc)
> > + if (rc) {
> > remove_migration_ptes(page, page);
> > + }
>
> Why are you adding { } here?
>
Maybe it's leftover garbage from an older version.
> > +#ifdef CONFIG_MIGRATION_BY_KERNEL
> > +struct anon_vma *anon_vma_hold(struct page *page) {
> > + struct anon_vma *anon_vma;
> > + anon_vma = page_lock_anon_vma(page);
> > + if (!anon_vma)
> > + return NULL;
> > + atomic_set(&anon_vma->ref, 1);
>
> Why use an atomic value if it is set and cleared within a spinlock?
anon_vma_free(), which reads this value, doesn't take any lock, so I made the
field an atomic_t and use atomic ops to handle it.
-Kame
* Re: [Patch] memory unplug v3 [2/4] migration by kernel
2007-05-23 1:45 ` KAMEZAWA Hiroyuki
@ 2007-05-23 1:56 ` Christoph Lameter
2007-05-23 2:09 ` KAMEZAWA Hiroyuki
0 siblings, 1 reply; 20+ messages in thread
From: Christoph Lameter @ 2007-05-23 1:56 UTC (permalink / raw)
To: KAMEZAWA Hiroyuki; +Cc: linux-mm, mel, y-goto
On Wed, 23 May 2007, KAMEZAWA Hiroyuki wrote:
> > > +#ifdef CONFIG_MIGRATION_BY_KERNEL
> > > +struct anon_vma *anon_vma_hold(struct page *page) {
> > > + struct anon_vma *anon_vma;
> > > + anon_vma = page_lock_anon_vma(page);
> > > + if (!anon_vma)
> > > + return NULL;
> > > + atomic_set(&anon_vma->ref, 1);
> >
> > Why use an atomic value if it is set and cleared within a spinlock?
>
> anon_vma_free(), which see this value, doesn't take any lock and use atomic ops.
> I used atomic ops to handle atomic_t.
anon_vma_free() only reads the value. Thus no race. You do not need an
atomic_t. atomic_t is only necessary if a variable needs to be changed
atomically. Reading a word from memory is atomic regardless.
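
As a sketch of the simpler variant (the field name here is made up):

struct anon_vma {
	spinlock_t lock;	/* Serialize access to vma list */
	struct list_head head;	/* List of private "related" vmas */
	int migrate_ref;	/* written under lock, read as a plain word */
};

static inline void anon_vma_free(struct anon_vma *anon_vma)
{
	if (anon_vma->migrate_ref == 0)
		kmem_cache_free(anon_vma_cachep, anon_vma);
}
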
* Re: [Patch] memory unplug v3 [2/4] migration by kernel
2007-05-23 1:56 ` Christoph Lameter
@ 2007-05-23 2:09 ` KAMEZAWA Hiroyuki
0 siblings, 0 replies; 20+ messages in thread
From: KAMEZAWA Hiroyuki @ 2007-05-23 2:09 UTC (permalink / raw)
To: Christoph Lameter; +Cc: linux-mm, mel, y-goto
On Tue, 22 May 2007 18:56:56 -0700 (PDT)
Christoph Lameter <clameter@sgi.com> wrote:
> On Wed, 23 May 2007, KAMEZAWA Hiroyuki wrote:
>
> > > > +#ifdef CONFIG_MIGRATION_BY_KERNEL
> > > > +struct anon_vma *anon_vma_hold(struct page *page) {
> > > > + struct anon_vma *anon_vma;
> > > > + anon_vma = page_lock_anon_vma(page);
> > > > + if (!anon_vma)
> > > > + return NULL;
> > > > + atomic_set(&anon_vma->ref, 1);
> > >
> > > Why use an atomic value if it is set and cleared within a spinlock?
> >
> > anon_vma_free(), which see this value, doesn't take any lock and use atomic ops.
> > I used atomic ops to handle atomic_t.
>
> anon_vma_free() only reads the value. Thus no race. You do not need an
> atomic_t. atomic_t is only necessary if a variable needs to be changed
> atomically. Reading a word from memory is atomic regardless.
>
Thank you for pointing that out. I understand.
-Kame
* Re: [Patch] memory unplug v3 [2/4] migration by kernel
2007-05-22 18:49 ` Christoph Lameter
2007-05-23 1:45 ` KAMEZAWA Hiroyuki
@ 2007-05-23 19:14 ` Mel Gorman
2007-05-25 7:43 ` KAMEZAWA Hiroyuki
1 sibling, 1 reply; 20+ messages in thread
From: Mel Gorman @ 2007-05-23 19:14 UTC (permalink / raw)
To: Christoph Lameter; +Cc: KAMEZAWA Hiroyuki, linux-mm, y-goto
On Tue, 22 May 2007, Christoph Lameter wrote:
> On Tue, 22 May 2007, KAMEZAWA Hiroyuki wrote:
>
>> +config MIGRATION_BY_KERNEL
>> + bool "Page migration by kernel's page scan"
>> + def_bool y
>> + depends on MIGRATION
>> + help
>> + Allows page migration from kernel context. This means page migration
>> + can be done by codes other than sys_migrate() system call. Will add
>> + some additional check code in page migration.
>
> I think the scope of this is much bigger than you imagine. This is also
> going to be useful when Mel is going to implement defragmentation. So I
> think this should not be a separate option but be on by default.
>
I'm not 100% sure but chances are I need this.
I put together a memory compaction prototype today[*] to check because
it's been put off long enough. However, memory compaction works whether I
called migrate_pages() or migrate_pages_nocontext() even when regularly
compacting under load. That said, calling migrate_pages() is probably
racing like mad and I am not getting nailed for it as the test machine is
small with one CPU and the stress load is kernel compiles instead of
processes with mapped data. I'm basing compaction on top of a slightly
modified version of this patch and will revisit it later.
Incidentally, the results of the compaction at rest are;
Freelists before compaction
Node 0, zone Normal, type Unmovable 302 55 26 20 12 6 2 0 0 0 0
Node 0, zone Normal, type Reclaimable 3165 734 218 28 3 0 0 0 0 0 0
Node 0, zone Normal, type Movable 4986 2222 1980 1553 752 238 26 2 0 0 0
Node 0, zone Normal, type Reserve 5 3 0 0 1 1 0 0 1 1 0
Freelists after compaction
Node 0, zone Normal, type Unmovable 278 32 14 12 10 5 4 2 0 0 0
Node 0, zone Normal, type Reclaimable 3184 743 226 32 3 0 0 0 0 0 0
Node 0, zone Normal, type Movable 862 676 599 421 238 94 17 6 4 3 31
Node 0, zone Normal, type Reserve 1 1 1 1 1 1 1 1 1 1 0
So it's doing something and the machine hasn't killed itself in the face.
Aside, the page migration framework is ridiculously easy to work with -
kudos to all who worked on it.
[*] Considering a working prototype only took a day to put
together, I'm irritated it took me this long to get around to it.
--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab
* Re: [Patch] memory unplug v3 [2/4] migration by kernel
2007-05-23 19:14 ` Mel Gorman
@ 2007-05-25 7:43 ` KAMEZAWA Hiroyuki
0 siblings, 0 replies; 20+ messages in thread
From: KAMEZAWA Hiroyuki @ 2007-05-25 7:43 UTC (permalink / raw)
To: Mel Gorman; +Cc: clameter, linux-mm, y-goto
On Wed, 23 May 2007 20:14:39 +0100 (IST)
Mel Gorman <mel@csn.ul.ie> wrote:
> I put together a memory compaction prototype today[*] to check because
> it's been put off long enough. However, memory compaction works whether I
> called migrate_pages() or migrate_pages_nocontext() even when regularly
> compacting under load. That said, calling migrate_pages() is probably
> racing like mad and I am not getting nailed for it as the test machine is
> small with one CPU and the stress load is kernel compiles instead of
> processes with mapped data. I'm basing compaction on top of a slightly
> modified version of this patch and will revisit it later.
>
thank you for testing :)
We (Goto-san and I) saw the !page_mapped(page) case in try_to_unmap() under heavy
memory pressure, i.e. while swapping.
So, at least,
==
+ if (page_mapped(page))
+ try_to_unmap(page, 1);
==
This change is necessary.
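For context, a rough sketch of where this guard sits in the migration path,
based on my reading of the 2.6.22-era mm/migrate.c (the function name below
is made up; try_to_unmap() and move_to_new_page() are the real helpers):
==
static int move_one_page_sketch(struct page *newpage, struct page *page)
{
	int rc = -EAGAIN;

	/* the page is assumed to be locked by the caller in this sketch */
	if (page_mapped(page))
		try_to_unmap(page, 1);	/* may already be unmapped under swap */

	if (!page_mapped(page))
		rc = move_to_new_page(newpage, page);

	return rc;
}
==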
About anon_vma, see comments in page_remove_rmap().
> Incidentally, the results of the compaction at rest are;
>
> Freelists before compaction
> Node 0, zone Normal, type Unmovable 302 55 26 20 12 6 2 0 0 0 0
> Node 0, zone Normal, type Reclaimable 3165 734 218 28 3 0 0 0 0 0 0
> Node 0, zone Normal, type Movable 4986 2222 1980 1553 752 238 26 2 0 0 0
> Node 0, zone Normal, type Reserve 5 3 0 0 1 1 0 0 1 1 0
>
> Freelists after compaction
> Node 0, zone Normal, type Unmovable 278 32 14 12 10 5 4 2 0 0 0
> Node 0, zone Normal, type Reclaimable 3184 743 226 32 3 0 0 0 0 0 0
> Node 0, zone Normal, type Movable 862 676 599 421 238 94 17 6 4 3 31
> Node 0, zone Normal, type Reserve 1 1 1 1 1 1 1 1 1 1 0
>
> So it's doing something and the machine hasn't killed itself in the face.
> Aside, the page migration framework is ridiculously easy to work with -
> kudos to all who worked on it.
>
I'll make this patch independent from memory unplug, as much as possible.
Thanks,
-Kame
* [Patch] memory unplug v3 [3/4] page removal
2007-05-22 6:58 [Patch] memory unplug v3 [0/4] KAMEZAWA Hiroyuki
2007-05-22 7:01 ` [Patch] memory unplug v3 [1/4] page isolation KAMEZAWA Hiroyuki
2007-05-22 7:04 ` [Patch] memory unplug v3 [2/4] migration by kernel KAMEZAWA Hiroyuki
@ 2007-05-22 7:07 ` KAMEZAWA Hiroyuki
2007-05-22 18:52 ` Christoph Lameter
2007-05-22 7:08 ` [Patch] memory unplug v3 [4/4] ia64 interface KAMEZAWA Hiroyuki
2007-05-22 18:34 ` [Patch] memory unplug v3 [0/4] Christoph Lameter
4 siblings, 1 reply; 20+ messages in thread
From: KAMEZAWA Hiroyuki @ 2007-05-22 7:07 UTC (permalink / raw)
To: KAMEZAWA Hiroyuki; +Cc: linux-mm, mel, y-goto, clameter
This is the core page hot-removal patch.
How this works:
* isolate all pages in the specified range.
* for_all_pfn_in_range
- skip if !pfn_valid()
- skip if page_count(page) == 0 || PageReserved(page)
- skip if a page is already isolated (freed)
- migrate a page if it is in use (uses migrate_pages_nocontext())
- if a page cannot be migrated, return -EBUSY.
* if the timeout expires, return -EAGAIN.
* if a signal is received, return -EINTR.
* Mark all pages in the range as Reserved once they are all freed.
* This patch doesn't implement a user interface. An arch which wants to
support memory unplug should add an offline_pages() call to its remove_memory().
(see the ia64 patch)
* This patch doesn't free the memmap; that will be done by another patch.
If your arch supports it,
echo offline > /sys/devices/system/memory/memoryXXX/state
will offline the memory if it can.
Offlined memory can be onlined again by
echo online > /sys/devices/system/memory/memoryXXX/state
A kind of defrag by hand :).
I wonder whether the logic can be made simpler and more sophisticated...
Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com>
Signed-Off-By: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Index: devel-2.6.22-rc1-mm1/mm/Kconfig
===================================================================
--- devel-2.6.22-rc1-mm1.orig/mm/Kconfig 2007-05-22 15:12:29.000000000 +0900
+++ devel-2.6.22-rc1-mm1/mm/Kconfig 2007-05-22 15:12:30.000000000 +0900
@@ -126,6 +126,12 @@
def_bool y
depends on SPARSEMEM && MEMORY_HOTPLUG
+config MEMORY_HOTREMOVE
+ bool "Allow for memory hot remove"
+ depends on MEMORY_HOTPLUG
+ select MIGRATION
+ select MIGRATION_BY_KERNEL
+
# Heavily threaded applications may benefit from splitting the mm-wide
# page_table_lock, so that faults on different parts of the user address
# space can be handled with less contention: split it at this NR_CPUS.
Index: devel-2.6.22-rc1-mm1/mm/memory_hotplug.c
===================================================================
--- devel-2.6.22-rc1-mm1.orig/mm/memory_hotplug.c 2007-05-22 14:30:39.000000000 +0900
+++ devel-2.6.22-rc1-mm1/mm/memory_hotplug.c 2007-05-22 15:12:30.000000000 +0900
@@ -23,6 +23,9 @@
#include <linux/vmalloc.h>
#include <linux/ioport.h>
#include <linux/cpuset.h>
+#include <linux/delay.h>
+#include <linux/migrate.h>
+#include <linux/page-isolation.h>
#include <asm/tlbflush.h>
@@ -308,3 +311,196 @@
return ret;
}
EXPORT_SYMBOL_GPL(add_memory);
+
+#ifdef CONFIG_MEMORY_HOTREMOVE
+
+static struct page *
+hotremove_migrate_alloc(struct page *page,
+ unsigned long private,
+ int **x)
+{
+ return alloc_page(GFP_HIGH_MOVABLE);
+}
+
+
+#define NR_OFFLINE_AT_ONCE_PAGES (256)
+static int
+do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
+{
+ unsigned long pfn;
+ struct page *page;
+ int move_pages = NR_OFFLINE_AT_ONCE_PAGES;
+ int not_managed = 0;
+ int ret = 0;
+ LIST_HEAD(source);
+
+ for (pfn = start_pfn; pfn < end_pfn && move_pages > 0; pfn++) {
+ if (!pfn_valid(pfn))
+ continue;
+ page = pfn_to_page(pfn);
+ /* page is isolated or being freed ? */
+ if ((page_count(page) == 0) || PageReserved(page))
+ continue;
+ ret = isolate_lru_page(page, &source);
+
+ if (ret == 0) {
+ move_pages--;
+ } else {
+ not_managed++;
+ }
+ }
+ ret = -EBUSY;
+ if (not_managed) {
+ if (!list_empty(&source))
+ putback_lru_pages(&source);
+ goto out;
+ }
+ ret = 0;
+ if (list_empty(&source))
+ goto out;
+ /* this function returns # of failed pages */
+ ret = migrate_pages_nocontext(&source, hotremove_migrate_alloc, 0);
+
+out:
+ return ret;
+}
+
+/*
+ * remove from free_area[] and mark all as Reserved.
+ */
+static void
+offline_isolated_pages(unsigned long start_pfn, unsigned long end_pfn)
+{
+ struct resource res;
+ unsigned long tmp_start, tmp_end;
+
+ res.start = start_pfn << PAGE_SHIFT;
+ res.end = (end_pfn - 1) << PAGE_SHIFT;
+ res.flags = IORESOURCE_MEM;
+ while ((res.start < res.end) && (find_next_system_ram(&res) >= 0)) {
+ tmp_start = res.start >> PAGE_SHIFT;
+ tmp_end = (res.end >> PAGE_SHIFT) + 1;
+ /* this function touches free_area[]...so please see
+ page_alloc.c */
+ __offline_isolated_pages(tmp_start, tmp_end);
+ res.start = res.end + 1;
+ res.end = end_pfn;
+ }
+}
+
+/*
+ * Check that all pages in the range, recorded as memory resources, are isolated.
+ */
+static long
+check_pages_isolated(unsigned long start_pfn, unsigned long end_pfn)
+{
+ struct resource res;
+ unsigned long tmp_start, tmp_end;
+ int ret, offlined = 0;
+
+ res.start = start_pfn << PAGE_SHIFT;
+ res.end = (end_pfn - 1) << PAGE_SHIFT;
+ res.flags = IORESOURCE_MEM;
+ while ((res.start < res.end) && (find_next_system_ram(&res) >= 0)) {
+ tmp_start = res.start >> PAGE_SHIFT;
+ tmp_end = (res.end >> PAGE_SHIFT) + 1;
+ ret = test_pages_isolated(tmp_start, tmp_end);
+ if (ret)
+ return -EBUSY;
+ offlined += tmp_end - tmp_start;
+ res.start = res.end + 1;
+ res.end = end_pfn;
+ }
+ return offlined;
+}
+
+
+int offline_pages(unsigned long start_pfn,
+ unsigned long end_pfn, unsigned long timeout)
+{
+ unsigned long pfn, nr_pages, expire;
+ long offlined_pages;
+ int ret;
+ struct page *page;
+ struct zone *zone;
+
+ BUG_ON(start_pfn >= end_pfn);
+ /* at least, alignment against pageblock is necessary */
+ if (start_pfn & (NR_PAGES_ISOLATION_BLOCK - 1))
+ return -EINVAL;
+ if (end_pfn & (NR_PAGES_ISOLATION_BLOCK - 1))
+ return -EINVAL;
+ /* This makes hotplug much easier... and readable.
+ We assume this for now. */
+ if (page_zone(pfn_to_page(start_pfn)) !=
+ page_zone(pfn_to_page(end_pfn - 1)))
+ return -EINVAL;
+ /* set above range as isolated */
+ ret = isolate_pages(start_pfn, end_pfn);
+ if (ret)
+ return ret;
+ nr_pages = end_pfn - start_pfn;
+ pfn = start_pfn;
+ expire = jiffies + timeout;
+repeat:
+ /* start memory hot removal */
+ ret = -EAGAIN;
+ if (time_after(jiffies, expire))
+ goto failed_removal;
+ ret = -EINTR;
+ if (signal_pending(current))
+ goto failed_removal;
+ ret = 0;
+ /* drain all zone's lru pagevec */
+ lru_add_drain_all();
+
+ /* skip isolated pages */
+ for(; pfn < end_pfn; pfn++) {
+ if (!pfn_valid(pfn))
+ continue;
+ page = pfn_to_page(pfn);
+ if (PageReserved(page))
+ continue;
+ if (!is_page_isolated(page))
+ break;
+ }
+ /* start point is here */
+ if (pfn != end_pfn) {
+ ret = do_migrate_range(pfn, end_pfn);
+ if (!ret) {
+ cond_resched();
+ goto repeat;
+ } else if (ret < 0) {
+ goto failed_removal;
+ } else if (ret > 0) {
+ /* some congestion found. sleep a bit */
+ msleep(10);
+ goto repeat;
+ }
+ }
+ /* check again */
+ ret = check_pages_isolated(start_pfn, end_pfn);
+ if (ret < 0) {
+ goto failed_removal;
+ }
+ offlined_pages = ret;
+ /* OK, all of our target range is isolated.
+ We cannot roll back at this point. */
+ offline_isolated_pages(start_pfn, end_pfn);
+ /* removal success */
+ zone = page_zone(pfn_to_page(start_pfn));
+ zone->present_pages -= offlined_pages;
+ zone->zone_pgdat->node_present_pages -= offlined_pages;
+ totalram_pages -= offlined_pages;
+ num_physpages -= offlined_pages;
+ vm_total_pages = nr_free_pagecache_pages();
+ writeback_set_ratelimit();
+ return 0;
+
+failed_removal:
+ printk("memory offlining %lx to %lx failed\n",start_pfn, end_pfn);
+ /* pushback to free area */
+ free_isolated_pages(start_pfn, end_pfn);
+ return ret;
+}
+#endif /* CONFIG_MEMORY_HOTREMOVE */
Index: devel-2.6.22-rc1-mm1/include/linux/memory_hotplug.h
===================================================================
--- devel-2.6.22-rc1-mm1.orig/include/linux/memory_hotplug.h 2007-05-22 14:30:39.000000000 +0900
+++ devel-2.6.22-rc1-mm1/include/linux/memory_hotplug.h 2007-05-22 15:12:30.000000000 +0900
@@ -59,7 +59,10 @@
extern void online_page(struct page *page);
/* VM interface that may be used by firmware interface */
extern int online_pages(unsigned long, unsigned long);
-
+#ifdef CONFIG_MEMORY_HOTREMOVE
+extern int offline_pages(unsigned long, unsigned long, unsigned long);
+extern void __offline_isolated_pages(unsigned long, unsigned long);
+#endif
/* reasonably generic interface to expand the physical pages in a zone */
extern int __add_pages(struct zone *zone, unsigned long start_pfn,
unsigned long nr_pages);
Index: devel-2.6.22-rc1-mm1/mm/page_alloc.c
===================================================================
--- devel-2.6.22-rc1-mm1.orig/mm/page_alloc.c 2007-05-22 15:12:28.000000000 +0900
+++ devel-2.6.22-rc1-mm1/mm/page_alloc.c 2007-05-22 15:12:30.000000000 +0900
@@ -4447,3 +4447,52 @@
out:
spin_unlock_irqrestore(&zone->lock, flags);
}
+
+#ifdef CONFIG_MEMORY_HOTREMOVE
+/*
+ * All pages in the range must be isolated before calling this.
+ */
+void
+__offline_isolated_pages(unsigned long start_pfn, unsigned long end_pfn)
+{
+ struct page *page, *tmp;
+ struct zone *zone;
+ struct free_area *area;
+ int order, i;
+ unsigned long pfn;
+ /* find the first valid pfn */
+ for (pfn = start_pfn; pfn < end_pfn; pfn++)
+ if (pfn_valid(pfn))
+ break;
+ if (pfn == end_pfn)
+ return;
+ zone = page_zone(pfn_to_page(pfn));
+ spin_lock(&zone->lock);
+ printk("do isoalte \n");
+ for (order = 0; order < MAX_ORDER; order++) {
+ area = &zone->free_area[order];
+ list_for_each_entry_safe(page, tmp,
+ &area->free_list[MIGRATE_ISOLATE],
+ lru) {
+ pfn = page_to_pfn(page);
+ if (pfn < start_pfn || end_pfn <= pfn)
+ continue;
+ printk("found %lx %lx %lx\n",
+ start_pfn, pfn, end_pfn);
+ list_del(&page->lru);
+ rmv_page_order(page);
+ area->nr_free--;
+ __mod_zone_page_state(zone, NR_FREE_PAGES,
+ - (1UL << order));
+ }
+ }
+ spin_unlock(&zone->lock);
+ for (pfn = start_pfn; pfn < end_pfn; pfn++) {
+ if (!pfn_valid(pfn))
+ continue;
+ page = pfn_to_page(pfn);
+ BUG_ON(page_count(page));
+ SetPageReserved(page);
+ }
+}
+#endif
* Re: [Patch] memory unplug v3 [3/4] page removal
2007-05-22 7:07 ` [Patch] memory unplug v3 [3/4] page removal KAMEZAWA Hiroyuki
@ 2007-05-22 18:52 ` Christoph Lameter
2007-05-23 1:50 ` KAMEZAWA Hiroyuki
0 siblings, 1 reply; 20+ messages in thread
From: Christoph Lameter @ 2007-05-22 18:52 UTC (permalink / raw)
To: KAMEZAWA Hiroyuki; +Cc: linux-mm, mel, y-goto
On Tue, 22 May 2007, KAMEZAWA Hiroyuki wrote:
> +static int
> +do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
> +{
> + unsigned long pfn;
> + struct page *page;
> + int move_pages = NR_OFFLINE_AT_ONCE_PAGES;
> + int not_managed = 0;
> + int ret = 0;
> + LIST_HEAD(source);
> +
> + for (pfn = start_pfn; pfn < end_pfn && move_pages > 0; pfn++) {
> + if (!pfn_valid(pfn))
> + continue;
> + page = pfn_to_page(pfn);
> + /* page is isolated or being freed ? */
> + if ((page_count(page) == 0) || PageReserved(page))
> + continue;
The check above is not necessary. A page with count == 0 is not on the LRU,
and neither is a Reserved page.
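A sketch of why, simplified from isolate_lru_page() as I remember it (the
lru_lock and some refcount details are omitted, so treat this as
illustrative only):
==
int isolate_lru_page_sketch(struct page *page, struct list_head *pagelist)
{
	int ret = -EBUSY;

	/* anything not on the LRU is rejected here anyway, which covers
	 * both count == 0 pages and Reserved pages */
	if (PageLRU(page) && get_page_unless_zero(page)) {
		ClearPageLRU(page);
		list_add_tail(&page->lru, pagelist);
		ret = 0;
	}
	return ret;
}
==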
> + /* this function returns # of failed pages */
> + ret = migrate_pages_nocontext(&source, hotremove_migrate_alloc, 0);
You have no context so the last parameter should be 1?
* Re: [Patch] memory unplug v3 [3/4] page removal
2007-05-22 18:52 ` Christoph Lameter
@ 2007-05-23 1:50 ` KAMEZAWA Hiroyuki
0 siblings, 0 replies; 20+ messages in thread
From: KAMEZAWA Hiroyuki @ 2007-05-23 1:50 UTC (permalink / raw)
To: Christoph Lameter; +Cc: linux-mm, mel, y-goto
On Tue, 22 May 2007 11:52:11 -0700 (PDT)
Christoph Lameter <clameter@sgi.com> wrote:
> On Tue, 22 May 2007, KAMEZAWA Hiroyuki wrote:
>
> > +static int
> > +do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
> > +{
> > + unsigned long pfn;
> > + struct page *page;
> > + int move_pages = NR_OFFLINE_AT_ONCE_PAGES;
> > + int not_managed = 0;
> > + int ret = 0;
> > + LIST_HEAD(source);
> > +
> > + for (pfn = start_pfn; pfn < end_pfn && move_pages > 0; pfn++) {
> > + if (!pfn_valid(pfn))
> > + continue;
> > + page = pfn_to_page(pfn);
> > + /* page is isolated or being freed ? */
> > + if ((page_count(page) == 0) || PageReserved(page))
> > + continue;
>
> The check above is not necessary. A Page count = 0 page is not on the LRU
> neither is a Reserved page.
Ah, ok, but I'm currently treating an error from isolate_lru_page() as fatal.
This check avoids isolate_lru_page() returning an error because of !PageLRU().
I'll reconsider this part.
> > + /* this function returns # of failed pages */
> > + ret = migrate_pages_nocontext(&source, hotremove_migrate_alloc, 0);
>
> You have no context so the last parameter should be 1?
>
migrate_pages_nocontext()'s 3rd param is the same as migrate_pages()'s 3rd param, 'private'.
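In other words, the third argument is only passed through to the allocation
callback; a minimal sketch in the shape of this patch's
hotremove_migrate_alloc() (the callback name below is illustrative):
==
static struct page *alloc_target_sketch(struct page *page,
					unsigned long private, int **x)
{
	/* 'private' is whatever the caller passed as the 3rd argument (0 below) */
	return alloc_page(GFP_HIGH_MOVABLE);
}

	/* caller side */
	ret = migrate_pages_nocontext(&source, alloc_target_sketch, 0);
==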
-Kame
* [Patch] memory unplug v3 [4/4] ia64 interface
2007-05-22 6:58 [Patch] memory unplug v3 [0/4] KAMEZAWA Hiroyuki
` (2 preceding siblings ...)
2007-05-22 7:07 ` [Patch] memory unplug v3 [3/4] page removal KAMEZAWA Hiroyuki
@ 2007-05-22 7:08 ` KAMEZAWA Hiroyuki
2007-05-22 18:34 ` [Patch] memory unplug v3 [0/4] Christoph Lameter
4 siblings, 0 replies; 20+ messages in thread
From: KAMEZAWA Hiroyuki @ 2007-05-22 7:08 UTC (permalink / raw)
To: KAMEZAWA Hiroyuki; +Cc: linux-mm, mel, y-goto, clameter
Add a call to offline_pages() for ia64.
Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com>
Signed-Off-By: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Index: devel-2.6.22-rc1-mm1/arch/ia64/mm/init.c
===================================================================
--- devel-2.6.22-rc1-mm1.orig/arch/ia64/mm/init.c 2007-05-22 14:30:38.000000000 +0900
+++ devel-2.6.22-rc1-mm1/arch/ia64/mm/init.c 2007-05-22 15:12:31.000000000 +0900
@@ -724,7 +724,17 @@
int remove_memory(u64 start, u64 size)
{
- return -EINVAL;
+ unsigned long start_pfn, end_pfn;
+ unsigned long timeout = 120 * HZ;
+ int ret;
+ start_pfn = start >> PAGE_SHIFT;
+ end_pfn = start_pfn + (size >> PAGE_SHIFT);
+ ret = offline_pages(start_pfn, end_pfn, timeout);
+ if (ret)
+ goto out;
+ /* we can free mem_map at this point */
+out:
+ return ret;
}
EXPORT_SYMBOL_GPL(remove_memory);
#endif
* Re: [Patch] memory unplug v3 [0/4]
2007-05-22 6:58 [Patch] memory unplug v3 [0/4] KAMEZAWA Hiroyuki
` (3 preceding siblings ...)
2007-05-22 7:08 ` [Patch] memory unplug v3 [4/4] ia64 interface KAMEZAWA Hiroyuki
@ 2007-05-22 18:34 ` Christoph Lameter
2007-05-23 1:59 ` KAMEZAWA Hiroyuki
4 siblings, 1 reply; 20+ messages in thread
From: Christoph Lameter @ 2007-05-22 18:34 UTC (permalink / raw)
To: KAMEZAWA Hiroyuki; +Cc: Linux-MM, mel, y-goto
On Tue, 22 May 2007, KAMEZAWA Hiroyuki wrote:
> > - use the kernelcore=XXX boot option to create ZONE_MOVABLE.
> Memory unplug itself can work without ZONE_MOVABLE but it will be
> better to use kernelcore= if your section size is big.
Hmmm.... Sure wish the ZONE_MOVABLE would go away. Isn't there some way to
have a dynamic boundary within ZONE_NORMAL?
* Re: [Patch] memory unplug v3 [0/4]
2007-05-22 18:34 ` [Patch] memory unplug v3 [0/4] Christoph Lameter
@ 2007-05-23 1:59 ` KAMEZAWA Hiroyuki
2007-05-23 2:09 ` Christoph Lameter
0 siblings, 1 reply; 20+ messages in thread
From: KAMEZAWA Hiroyuki @ 2007-05-23 1:59 UTC (permalink / raw)
To: Christoph Lameter; +Cc: linux-mm, mel, y-goto
On Tue, 22 May 2007 11:34:04 -0700 (PDT)
Christoph Lameter <clameter@sgi.com> wrote:
> On Tue, 22 May 2007, KAMEZAWA Hiroyuki wrote:
>
> > - use the kernelcore=XXX boot option to create ZONE_MOVABLE.
> > Memory unplug itself can work without ZONE_MOVABLE but it will be
> > better to use kernelcore= if your section size is big.
>
> Hmmm.... Sure wish the ZONE_MOVABLE would go away. Isn't there some way to
> have a dynamic boundary within ZONE_NORMAL?
>
Hmm.
1. Assume there is only ZONE_NORMAL.
2. Group pages into MIGRATE_UNMOVABLE, MIGRATE_RECLAIMABLE, MIGRATE_MOVABLE.
Some ranges of pages can be used "only" for MIGRATE_MOVABLE (+ RECLAIMABLE).
3. The page reclaiming algorithm should know what type of pages it should reclaim.
Current page reclaiming is zone-based, so I think adding a zone is not a bad option
as long as we use zone-based reclaiming.
If I think of a simple way to avoid adding a new zone, I'll post it, but not yet.
-Kame