memory unplug v4 intro [0/6]

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

* memory unplug v4 intro [0/6]
@ 2007-06-08  5:35 KAMEZAWA Hiroyuki
  2007-06-08  5:38 ` memory unplug v4 intro [1/6] migration without mm->sem KAMEZAWA Hiroyuki
                   ` (5 more replies)
  0 siblings, 6 replies; 20+ messages in thread
From: KAMEZAWA Hiroyuki @ 2007-06-08  5:35 UTC (permalink / raw)
  To: linux-mm; +Cc: mel, y-goto, clameter, hugh, kamezawa.hiroyu

Hi,

This is memory unplug base patcheset v4 against 2.6.22-rc4-mm2.
for review and for testers.

Changelog V3 -> V4
- rebased to 2.6.22-rc4-mm2
- cleaned up. it seems simpler than previous ones.
- instread of adding refcnt to anon_vma, using dummy_vma.
- page scan logic is a bit changed.
- order of patches is changed.

We tested this patch on ia64/NUMA.

=
How to use
 - user kernelcore=XXX boot option to create ZONE_MOVABLE.
   Memory unplug itself can work without ZONE_MOVABLE but it will be
   better to use kernelcore= if your section size is big.
  
 - After bootup, execute following.
     # echo "offline" > /sys/devices/system/memory/memoryX/state
 - you can push back offlined memory by following
     # echo "online" > /sys/devices/system/memory/memoryX/state
    
TODO
 - remove memmap after memory unplug. (for sparsemem)
 - more tests and find - page which cannot be freed -
 - Now, there is no check around ZONE_MOVABLE and bootmem.
   I hope bootmem can treat kernelcore=....
 - add better logic to allocate memory for migration.
 - speed up under heavy workload.
 - node hotplug support
 - Should make i386/x86-64/powerpc interface code. But not yet 

Thanks,
-Kame



--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* memory unplug v4 intro [1/6] migration without mm->sem
  2007-06-08  5:35 memory unplug v4 intro [0/6] KAMEZAWA Hiroyuki
@ 2007-06-08  5:38 ` KAMEZAWA Hiroyuki
  2007-06-08  5:47   ` Christoph Lameter
  2007-06-08  5:39 ` memory unplug v4 [2/6] lru isolation race fix KAMEZAWA Hiroyuki
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 20+ messages in thread
From: KAMEZAWA Hiroyuki @ 2007-06-08  5:38 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki; +Cc: linux-mm, mel, y-goto, clameter, hugh

page migratio by kernel v4.

Changelog V3 -> V4
 *use dummy_vma instead of 'int' refcnt. 
 *add dummy_vma handling helper functions.
 *remove funcs for refcnt.
 *removed extra argment 'nocontext' to migrate_pages().
  This means extra check is always inserted into migrate_page() path.
 *removes migrate_pages_nocontext().


In usual, migrate_pages(page,,) is called with holoding mm->sem by systemcall.
(mm here is a mm_struct which maps the migration target page.)
This semaphore helps avoiding some race conditions.

But, if we want to migrate a page by some kernel codes, we have to avoid
some races. This patch adds check code for following race condition.

1. A page which is not mapped can be target of migration. Then, we have
   to check page_mapped() before calling try_to_unmap().

2. We can't trust page->mapping if page_mapcount() can goes down to 0.
   But when we map newpage back to original ptes, we have to access
   anon_vma from a page, which page_mapcount() is 0.
   This patch adds a special dummy_vma to anon_vma for avoiding
   anon_vma is freed while page is unmapped.

Signed-Off-By: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>


---
 include/linux/rmap.h |   30 ++++++++++++++++++++++++++++++
 mm/migrate.c         |   16 +++++++++++++---
 mm/rmap.c            |   33 +++++++++++++++++++++++++++++++++
 3 files changed, 76 insertions(+), 3 deletions(-)

Index: devel-2.6.22-rc4-mm2/mm/migrate.c
===================================================================
--- devel-2.6.22-rc4-mm2.orig/mm/migrate.c
+++ devel-2.6.22-rc4-mm2/mm/migrate.c
@@ -231,7 +231,8 @@ static void remove_anon_migration_ptes(s
 	spin_lock(&anon_vma->lock);
 
 	list_for_each_entry(vma, &anon_vma->head, anon_vma_node)
-		remove_migration_pte(vma, old, new);
+		if (!is_dummy_vma(vma))
+			remove_migration_pte(vma, old, new);
 
 	spin_unlock(&anon_vma->lock);
 }
@@ -612,6 +613,8 @@ static int unmap_and_move(new_page_t get
 	int rc = 0;
 	int *result = NULL;
 	struct page *newpage = get_new_page(page, private, &result);
+	struct anon_vma *anon_vma = NULL;
+	struct vm_area_struct holder;
 
 	if (!newpage)
 		return -ENOMEM;
@@ -632,17 +635,23 @@ static int unmap_and_move(new_page_t get
 			goto unlock;
 		wait_on_page_writeback(page);
 	}
-
+	/* hold this anon_vma until page migration ends */
+	if (PageAnon(page) && page_mapped(page))
+		anon_vma = anon_vma_hold(page, &holder);
 	/*
 	 * Establish migration ptes or remove ptes
 	 */
-	try_to_unmap(page, 1);
+	if (page_mapped(page))
+		try_to_unmap(page, 1);
+
 	if (!page_mapped(page))
 		rc = move_to_new_page(newpage, page);
 
 	if (rc)
 		remove_migration_ptes(page, page);
 
+	anon_vma_release(anon_vma, &holder);
+
 unlock:
 	unlock_page(page);
 
@@ -685,6 +694,7 @@ move_newpage:
  * retruned to the LRU or freed.
  *
  * Return: Number of pages not migrated or error code.
+ *
  */
 int migrate_pages(struct list_head *from,
 		new_page_t get_new_page, unsigned long private)
Index: devel-2.6.22-rc4-mm2/include/linux/rmap.h
===================================================================
--- devel-2.6.22-rc4-mm2.orig/include/linux/rmap.h
+++ devel-2.6.22-rc4-mm2/include/linux/rmap.h
@@ -42,6 +42,36 @@ static inline void anon_vma_free(struct 
 	kmem_cache_free(anon_vma_cachep, anon_vma);
 }
 
+#ifdef  CONFIG_MIGRATION
+/*
+ * anon_vma->head works as refcnt for anon_vma struct.
+ * Migration needs one reference to anon_vma while unmapping -> remapping.
+ * dummy vm_area_struct is used for adding one ref to anon_vma.
+ *
+ * This means a list-walker of anon_vma->head have to check vma is dummy
+ * or not. please use is_dummy_vma() for check.
+ */
+
+extern struct anon_vma *anon_vma_hold(struct page *, struct vm_area_struct *);
+extern void anon_vma_release(struct anon_vma *, struct vm_area_struct *);
+
+static inline void init_dummy_vma(struct vm_area_struct *vma)
+{
+	vma->vm_mm = NULL;
+}
+
+static inline int is_dummy_vma(struct vm_area_struct *vma)
+{
+	if (unlikely(vma->vm_mm == NULL))
+		return 1;
+	return 0;
+}
+#else
+static inline int is_dummy_vma(struct vm_area_struct *vma) {
+	return 0;
+}
+#endif
+
 static inline void anon_vma_lock(struct vm_area_struct *vma)
 {
 	struct anon_vma *anon_vma = vma->anon_vma;
Index: devel-2.6.22-rc4-mm2/mm/rmap.c
===================================================================
--- devel-2.6.22-rc4-mm2.orig/mm/rmap.c
+++ devel-2.6.22-rc4-mm2/mm/rmap.c
@@ -203,6 +203,35 @@ static void page_unlock_anon_vma(struct 
 	spin_unlock(&anon_vma->lock);
 	rcu_read_unlock();
 }
+#ifdef CONFIG_MIGRATION
+/*
+ * Record anon_vma in holder->anon_vma.
+ * Returns 1 if vma is linked to anon_vma. otherwise 0.
+ */
+struct anon_vma *
+anon_vma_hold(struct page *page, struct vm_area_struct *holder)
+{
+	struct anon_vma *anon_vma = NULL;
+	holder->anon_vma = NULL;
+	anon_vma = page_lock_anon_vma(page);
+	if (anon_vma && !list_empty(&anon_vma->head)) {
+		init_dummy_vma(holder);
+		holder->anon_vma = anon_vma;
+		__anon_vma_link(holder);
+	}
+	if (anon_vma)
+		page_unlock_anon_vma(anon_vma);
+	return holder->anon_vma;
+}
+
+void anon_vma_release(struct anon_vma *anon_vma, struct vm_area_struct *holder)
+{
+	if (!anon_vma)
+		return;
+	BUG_ON(anon_vma != holder->anon_vma);
+	anon_vma_unlink(holder);
+}
+#endif
 
 /*
  * At what user virtual address is page expected in vma?
@@ -333,6 +362,8 @@ static int page_referenced_anon(struct p
 
 	mapcount = page_mapcount(page);
 	list_for_each_entry(vma, &anon_vma->head, anon_vma_node) {
+		if (is_dummy_vma(vma))
+			continue;
 		referenced += page_referenced_one(page, vma, &mapcount);
 		if (!mapcount)
 			break;
@@ -864,6 +895,8 @@ static int try_to_unmap_anon(struct page
 		return ret;
 
 	list_for_each_entry(vma, &anon_vma->head, anon_vma_node) {
+		if (is_dummy_vma(vma))
+			continue;
 		ret = try_to_unmap_one(page, vma, migration);
 		if (ret == SWAP_FAIL || !page_mapped(page))
 			break;

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: memory unplug v4 intro [1/6] migration without mm->sem
  2007-06-08  5:38 ` memory unplug v4 intro [1/6] migration without mm->sem KAMEZAWA Hiroyuki
@ 2007-06-08  5:47   ` Christoph Lameter
  2007-06-08  5:54     ` KAMEZAWA Hiroyuki
  0 siblings, 1 reply; 20+ messages in thread
From: Christoph Lameter @ 2007-06-08  5:47 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki; +Cc: linux-mm, mel, y-goto, hugh

On Fri, 8 Jun 2007, KAMEZAWA Hiroyuki wrote:

> Index: devel-2.6.22-rc4-mm2/include/linux/rmap.h
> ===================================================================
> --- devel-2.6.22-rc4-mm2.orig/include/linux/rmap.h
> +++ devel-2.6.22-rc4-mm2/include/linux/rmap.h
> @@ -42,6 +42,36 @@ static inline void anon_vma_free(struct 
>  	kmem_cache_free(anon_vma_cachep, anon_vma);
>  }
>  
> +#ifdef  CONFIG_MIGRATION
> +/*
> + * anon_vma->head works as refcnt for anon_vma struct.
> + * Migration needs one reference to anon_vma while unmapping -> remapping.
> + * dummy vm_area_struct is used for adding one ref to anon_vma.
> + *
> + * This means a list-walker of anon_vma->head have to check vma is dummy
> + * or not. please use is_dummy_vma() for check.
> + */
> +
> +extern struct anon_vma *anon_vma_hold(struct page *, struct vm_area_struct *);
> +extern void anon_vma_release(struct anon_vma *, struct vm_area_struct *);
> +
> +static inline void init_dummy_vma(struct vm_area_struct *vma)
> +{
> +	vma->vm_mm = NULL;
> +}
> +
> +static inline int is_dummy_vma(struct vm_area_struct *vma)
> +{
> +	if (unlikely(vma->vm_mm == NULL))
> +		return 1;
> +	return 0;
> +}
> +#else
> +static inline int is_dummy_vma(struct vm_area_struct *vma) {
> +	return 0;
> +}
> +#endif
> +
>  static inline void anon_vma_lock(struct vm_area_struct *vma)
>  {
>  	struct anon_vma *anon_vma = vma->anon_vma;

Could you fold as much as possible into mm/migrate.c?

> Index: devel-2.6.22-rc4-mm2/mm/rmap.c
> ===================================================================
> --- devel-2.6.22-rc4-mm2.orig/mm/rmap.c
> +++ devel-2.6.22-rc4-mm2/mm/rmap.c
> @@ -203,6 +203,35 @@ static void page_unlock_anon_vma(struct 
>  	spin_unlock(&anon_vma->lock);
>  	rcu_read_unlock();
>  }
> +#ifdef CONFIG_MIGRATION
> +/*
> + * Record anon_vma in holder->anon_vma.
> + * Returns 1 if vma is linked to anon_vma. otherwise 0.
> + */
> +struct anon_vma *
> +anon_vma_hold(struct page *page, struct vm_area_struct *holder)
> +{
> +	struct anon_vma *anon_vma = NULL;
> +	holder->anon_vma = NULL;
> +	anon_vma = page_lock_anon_vma(page);
> +	if (anon_vma && !list_empty(&anon_vma->head)) {
> +		init_dummy_vma(holder);
> +		holder->anon_vma = anon_vma;
> +		__anon_vma_link(holder);
> +	}
> +	if (anon_vma)
> +		page_unlock_anon_vma(anon_vma);
> +	return holder->anon_vma;
> +}
> +
> +void anon_vma_release(struct anon_vma *anon_vma, struct vm_area_struct *holder)
> +{
> +	if (!anon_vma)
> +		return;
> +	BUG_ON(anon_vma != holder->anon_vma);
> +	anon_vma_unlink(holder);
> +}
> +#endif

This is mostly also specific to page migration?

> @@ -333,6 +362,8 @@ static int page_referenced_anon(struct p
>  
>  	mapcount = page_mapcount(page);
>  	list_for_each_entry(vma, &anon_vma->head, anon_vma_node) {
> +		if (is_dummy_vma(vma))
> +			continue;
>  		referenced += page_referenced_one(page, vma, &mapcount);
>  		if (!mapcount)
>  			break;
> @@ -864,6 +895,8 @@ static int try_to_unmap_anon(struct page
>  		return ret;
>  
>  	list_for_each_entry(vma, &anon_vma->head, anon_vma_node) {
> +		if (is_dummy_vma(vma))
> +			continue;
>  		ret = try_to_unmap_one(page, vma, migration);
>  		if (ret == SWAP_FAIL || !page_mapped(page))
>  			break;

Could you avoid these checks by having page_referend_one fail
appropriately on the dummy vma?

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: memory unplug v4 intro [1/6] migration without mm->sem
  2007-06-08  5:47   ` Christoph Lameter
@ 2007-06-08  5:54     ` KAMEZAWA Hiroyuki
  2007-06-08  5:57       ` Christoph Lameter
  0 siblings, 1 reply; 20+ messages in thread
From: KAMEZAWA Hiroyuki @ 2007-06-08  5:54 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: linux-mm, mel, y-goto, hugh

On Thu, 7 Jun 2007 22:47:08 -0700 (PDT)
Christoph Lameter <clameter@sgi.com> wrote:

> >  static inline void anon_vma_lock(struct vm_area_struct *vma)
> >  {
> >  	struct anon_vma *anon_vma = vma->anon_vma;
> 
> Could you fold as much as possible into mm/migrate.c?
> 
Ah, maybe ok. But scattering codes around rmap in several files is ok ?


> > +void anon_vma_release(struct anon_vma *anon_vma, struct vm_area_struct *holder)
> > +{
> > +	if (!anon_vma)
> > +		return;
> > +	BUG_ON(anon_vma != holder->anon_vma);
> > +	anon_vma_unlink(holder);
> > +}
> > +#endif
> 
> This is mostly also specific to page migration?
> 
yes. 

> > @@ -333,6 +362,8 @@ static int page_referenced_anon(struct p
> >  
> >  	mapcount = page_mapcount(page);
> >  	list_for_each_entry(vma, &anon_vma->head, anon_vma_node) {
> > +		if (is_dummy_vma(vma))
> > +			continue;
> >  		referenced += page_referenced_one(page, vma, &mapcount);
> >  		if (!mapcount)
> >  			break;
> > @@ -864,6 +895,8 @@ static int try_to_unmap_anon(struct page
> >  		return ret;
> >  
> >  	list_for_each_entry(vma, &anon_vma->head, anon_vma_node) {
> > +		if (is_dummy_vma(vma))
> > +			continue;
> >  		ret = try_to_unmap_one(page, vma, migration);
> >  		if (ret == SWAP_FAIL || !page_mapped(page))
> >  			break;
> 
> Could you avoid these checks by having page_referend_one fail
> appropriately on the dummy vma?
> 
Hmm, Is this better ?
==
static int page_referenced_one(struct page *page,
        struct vm_area_struct *vma, unsigned int *mapcount)
{
        struct mm_struct *mm = vma->vm_mm;
        unsigned long address;
        pte_t *pte;
        spinlock_t *ptl;
        int referenced = 0;

+	if(is_dummy_vma(vma))
+		return 0;
==

-Kame







--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: memory unplug v4 intro [1/6] migration without mm->sem
  2007-06-08  5:54     ` KAMEZAWA Hiroyuki
@ 2007-06-08  5:57       ` Christoph Lameter
  2007-06-08  6:06         ` KAMEZAWA Hiroyuki
  0 siblings, 1 reply; 20+ messages in thread
From: Christoph Lameter @ 2007-06-08  5:57 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki; +Cc: linux-mm, mel, y-goto, hugh

On Fri, 8 Jun 2007, KAMEZAWA Hiroyuki wrote:

> On Thu, 7 Jun 2007 22:47:08 -0700 (PDT)
> Christoph Lameter <clameter@sgi.com> wrote:
> 
> > >  static inline void anon_vma_lock(struct vm_area_struct *vma)
> > >  {
> > >  	struct anon_vma *anon_vma = vma->anon_vma;
> > 
> > Could you fold as much as possible into mm/migrate.c?
> > 
> Ah, maybe ok. But scattering codes around rmap in several files is ok ?

No. Lets try to keep the changes to rmap minimal.

> > Could you avoid these checks by having page_referend_one fail
> > appropriately on the dummy vma?
> > 
> Hmm, Is this better ?
> ==
> static int page_referenced_one(struct page *page,
>         struct vm_area_struct *vma, unsigned int *mapcount)
> {
>         struct mm_struct *mm = vma->vm_mm;
>         unsigned long address;
>         pte_t *pte;
>         spinlock_t *ptl;
>         int referenced = 0;
> 
> +	if(is_dummy_vma(vma))
> +		return 0;

The best solution would be if you could fill the dummy vma with such 
values that will give you the intended result without having to modify 
page_referenced_one. If you can make vma_address() fail then you have 
what you want. F.e. setting vma->vm_end to zero should do it. (is it not 
already zero?)

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: memory unplug v4 intro [1/6] migration without mm->sem
  2007-06-08  5:57       ` Christoph Lameter
@ 2007-06-08  6:06         ` KAMEZAWA Hiroyuki
  2007-06-08  6:44           ` Christoph Lameter
  0 siblings, 1 reply; 20+ messages in thread
From: KAMEZAWA Hiroyuki @ 2007-06-08  6:06 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: linux-mm, mel, y-goto, hugh

On Thu, 7 Jun 2007 22:57:19 -0700 (PDT)
Christoph Lameter <clameter@sgi.com> wrote:

> > Ah, maybe ok. But scattering codes around rmap in several files is ok ?
> 
> No. Lets try to keep the changes to rmap minimal.
> 
Okay, will do my best.

> > > Could you avoid these checks by having page_referend_one fail
> > > appropriately on the dummy vma?
> > > 
> > Hmm, Is this better ?
> > ==
> > static int page_referenced_one(struct page *page,
> >         struct vm_area_struct *vma, unsigned int *mapcount)
> > {
> >         struct mm_struct *mm = vma->vm_mm;
> >         unsigned long address;
> >         pte_t *pte;
> >         spinlock_t *ptl;
> >         int referenced = 0;
> > 
> > +	if(is_dummy_vma(vma))
> > +		return 0;
> 
> The best solution would be if you could fill the dummy vma with such 
> values that will give you the intended result without having to modify 
> page_referenced_one. If you can make vma_address() fail then you have 
> what you want. F.e. setting vma->vm_end to zero should do it. (is it not 
> already zero?)
> 
> 
Hmm, maybe your option will work. I'll try it in the next set.
My concern is that almost all people will never imagine anon_vma can includes
dummy_vma in some special case..

-Kame


-Kame

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: memory unplug v4 intro [1/6] migration without mm->sem
  2007-06-08  6:06         ` KAMEZAWA Hiroyuki
@ 2007-06-08  6:44           ` Christoph Lameter
  2007-06-08  7:01             ` KAMEZAWA Hiroyuki
  0 siblings, 1 reply; 20+ messages in thread
From: Christoph Lameter @ 2007-06-08  6:44 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki; +Cc: linux-mm, mel, y-goto, hugh

I think what Hugh meant is someething like this:

---
 mm/migrate.c |   14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)

Index: linux-2.6/mm/migrate.c
===================================================================
--- linux-2.6.orig/mm/migrate.c	2007-06-07 23:27:07.000000000 -0700
+++ linux-2.6/mm/migrate.c	2007-06-07 23:40:25.000000000 -0700
@@ -209,10 +209,6 @@ static void remove_file_migration_ptes(s
 	spin_unlock(&mapping->i_mmap_lock);
 }
 
-/*
- * Must hold mmap_sem lock on at least one of the vmas containing
- * the page so that the anon_vma cannot vanish.
- */
 static void remove_anon_migration_ptes(struct page *old, struct page *new)
 {
 	struct anon_vma *anon_vma;
@@ -612,6 +608,7 @@ static int unmap_and_move(new_page_t get
 	int rc = 0;
 	int *result = NULL;
 	struct page *newpage = get_new_page(page, private, &result);
+	struct vm_area_struct dummy_vma = { 0 };
 
 	if (!newpage)
 		return -ENOMEM;
@@ -634,6 +631,12 @@ static int unmap_and_move(new_page_t get
 	}
 
 	/*
+	 * Add dummy vma so that the vma cannot vanish under us
+	 */
+	if (PageAnon(page))
+		anon_vma_link(&dummy_vma);
+
+	/*
 	 * Establish migration ptes or remove ptes
 	 */
 	try_to_unmap(page, 1);
@@ -643,6 +646,9 @@ static int unmap_and_move(new_page_t get
 	if (rc)
 		remove_migration_ptes(page, page);
 
+	/* Remove dummy vma */
+	if (PageAnon(page))
+		anon_vma_unlink(&dummy_vma);
 unlock:
 	unlock_page(page);
 

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: memory unplug v4 intro [1/6] migration without mm->sem
  2007-06-08  6:44           ` Christoph Lameter
@ 2007-06-08  7:01             ` KAMEZAWA Hiroyuki
  2007-06-08  7:21               ` Christoph Lameter
  0 siblings, 1 reply; 20+ messages in thread
From: KAMEZAWA Hiroyuki @ 2007-06-08  7:01 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: linux-mm, mel, y-goto, hugh

On Thu, 7 Jun 2007 23:44:38 -0700 (PDT)
Christoph Lameter <clameter@sgi.com> wrote:

> I think what Hugh meant is someething like this:
> 
Hmm, I see. 


>  	/*
> +	 * Add dummy vma so that the vma cannot vanish under us
> +	 */
> +	if (PageAnon(page))
> +		anon_vma_link(&dummy_vma);
> +
Before calling anon_vma_link(), I have to set "dummy_vma->anon_vma = anon_vma".
anon_vma_hold() does what it has to do.
But it's not necessary to add anon_vma_hold() in rmap.c, as you pointed out.
I'll rewrite them as static func in migrate.c

-Kame

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: memory unplug v4 intro [1/6] migration without mm->sem
  2007-06-08  7:01             ` KAMEZAWA Hiroyuki
@ 2007-06-08  7:21               ` Christoph Lameter
  2007-06-08  7:25                 ` KAMEZAWA Hiroyuki
  0 siblings, 1 reply; 20+ messages in thread
From: Christoph Lameter @ 2007-06-08  7:21 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki; +Cc: linux-mm, mel, y-goto, hugh

On Fri, 8 Jun 2007, KAMEZAWA Hiroyuki wrote:

> On Thu, 7 Jun 2007 23:44:38 -0700 (PDT)
> Christoph Lameter <clameter@sgi.com> wrote:
> 
> > I think what Hugh meant is someething like this:
> > 
> Hmm, I see. 
> 
> 
> >  	/*
> > +	 * Add dummy vma so that the vma cannot vanish under us
> > +	 */
> > +	if (PageAnon(page))
> > +		anon_vma_link(&dummy_vma);
> > +
> Before calling anon_vma_link(), I have to set "dummy_vma->anon_vma = anon_vma".
> anon_vma_hold() does what it has to do.

Yup. Forgot that one.

> But it's not necessary to add anon_vma_hold() in rmap.c, as you pointed out.
> I'll rewrite them as static func in migrate.c

I do not think you need anon_vma_hold at all. Neither do you need to add 
any other function. The presence of the dummy vma while the page is 
removed and added guarantees that it does not vanish.


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: memory unplug v4 intro [1/6] migration without mm->sem
  2007-06-08  7:21               ` Christoph Lameter
@ 2007-06-08  7:25                 ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 20+ messages in thread
From: KAMEZAWA Hiroyuki @ 2007-06-08  7:25 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: linux-mm, mel, y-goto, hugh

On Fri, 8 Jun 2007 00:21:39 -0700 (PDT)
Christoph Lameter <clameter@sgi.com> wrote:

> > But it's not necessary to add anon_vma_hold() in rmap.c, as you pointed out.
> > I'll rewrite them as static func in migrate.c
> 
> I do not think you need anon_vma_hold at all. Neither do you need to add 
> any other function. The presence of the dummy vma while the page is 
> removed and added guarantees that it does not vanish.
> 
Hmm, ok. add extra codes instead of new function.

-Kame

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* memory unplug v4  [2/6] lru isolation race fix
  2007-06-08  5:35 memory unplug v4 intro [0/6] KAMEZAWA Hiroyuki
  2007-06-08  5:38 ` memory unplug v4 intro [1/6] migration without mm->sem KAMEZAWA Hiroyuki
@ 2007-06-08  5:39 ` KAMEZAWA Hiroyuki
  2007-06-08  5:52   ` Christoph Lameter
  2007-06-08  5:40 ` memory unplug v4 intro [3/6] walk memory resources KAMEZAWA Hiroyuki
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 20+ messages in thread
From: KAMEZAWA Hiroyuki @ 2007-06-08  5:39 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki; +Cc: linux-mm, mel, y-goto, clameter, hugh

release_pages() in mm/swap.c changes page_count() to be 0
without clearing PageLRU flag...

This means isolate_lru_page() can see a page, PageLRU() && page_count(page)==0..
This is BUG. (get_page() will be called against count=0 page.)

Signed-Off-By: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

---
 mm/migrate.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Index: devel-2.6.22-rc4-mm2/mm/migrate.c
===================================================================
--- devel-2.6.22-rc4-mm2.orig/mm/migrate.c
+++ devel-2.6.22-rc4-mm2/mm/migrate.c
@@ -49,7 +49,7 @@ int isolate_lru_page(struct page *page, 
 		struct zone *zone = page_zone(page);
 
 		spin_lock_irq(&zone->lru_lock);
-		if (PageLRU(page)) {
+		if (page_count(page) && PageLRU(page)) {
 			ret = 0;
 			get_page(page);
 			ClearPageLRU(page);

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: memory unplug v4  [2/6] lru isolation race fix
  2007-06-08  5:39 ` memory unplug v4 [2/6] lru isolation race fix KAMEZAWA Hiroyuki
@ 2007-06-08  5:52   ` Christoph Lameter
  2007-06-08  5:58     ` KAMEZAWA Hiroyuki
  0 siblings, 1 reply; 20+ messages in thread
From: Christoph Lameter @ 2007-06-08  5:52 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki; +Cc: linux-mm, mel, y-goto, hugh

On Fri, 8 Jun 2007, KAMEZAWA Hiroyuki wrote:

> release_pages() in mm/swap.c changes page_count() to be 0
> without clearing PageLRU flag...
> This means isolate_lru_page() can see a page, PageLRU() && page_count(page)==0..
> This is BUG. (get_page() will be called against count=0 page.)

Use get_page_unless_zero?

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: memory unplug v4  [2/6] lru isolation race fix
  2007-06-08  5:52   ` Christoph Lameter
@ 2007-06-08  5:58     ` KAMEZAWA Hiroyuki
  2007-06-08  5:58       ` Christoph Lameter
  0 siblings, 1 reply; 20+ messages in thread
From: KAMEZAWA Hiroyuki @ 2007-06-08  5:58 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: linux-mm, mel, y-goto, hugh

On Thu, 7 Jun 2007 22:52:15 -0700 (PDT)
Christoph Lameter <clameter@sgi.com> wrote:

> On Fri, 8 Jun 2007, KAMEZAWA Hiroyuki wrote:
> 
> > release_pages() in mm/swap.c changes page_count() to be 0
> > without clearing PageLRU flag...
> > This means isolate_lru_page() can see a page, PageLRU() && page_count(page)==0..
> > This is BUG. (get_page() will be called against count=0 page.)
> 
> Use get_page_unless_zero?
> 
Oh, its better macro. thank you.

Then, the whole code will be....
==
 		if (PageLRU(page)) {
                        if (get_page_unless_zero(page)) {
				ret = 0;
	                        ClearPageLRU(page);
        	                if (PageActive(page))
                	                del_page_from_active_list(zone, page);
                        	else
                                	del_page_from_inactive_list(zone, page);
                        	list_add_tail(&page->lru, pagelist);
                	}
		}
==
Is this ok ?

-Kame

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: memory unplug v4  [2/6] lru isolation race fix
  2007-06-08  5:58     ` KAMEZAWA Hiroyuki
@ 2007-06-08  5:58       ` Christoph Lameter
  0 siblings, 0 replies; 20+ messages in thread
From: Christoph Lameter @ 2007-06-08  5:58 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki; +Cc: linux-mm, mel, y-goto, hugh

On Fri, 8 Jun 2007, KAMEZAWA Hiroyuki wrote:

> > Use get_page_unless_zero?
> > 
> Oh, its better macro. thank you.
> 
> Then, the whole code will be....
> ==
>  		if (PageLRU(page)) {
>                         if (get_page_unless_zero(page)) {

		if (PageLRU(page) && get_page_unless_zero(page))

but I am nit picking...

> 				ret = 0;
> 	                        ClearPageLRU(page);
>         	                if (PageActive(page))
>                 	                del_page_from_active_list(zone, page);
>                         	else
>                                 	del_page_from_inactive_list(zone, page);
>                         	list_add_tail(&page->lru, pagelist);
>                 	}
> 		}
> ==
> Is this ok ?

Looks better. But it will have to pass by Hugh too I guess...

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* memory unplug v4 intro [3/6] walk memory resources.
  2007-06-08  5:35 memory unplug v4 intro [0/6] KAMEZAWA Hiroyuki
  2007-06-08  5:38 ` memory unplug v4 intro [1/6] migration without mm->sem KAMEZAWA Hiroyuki
  2007-06-08  5:39 ` memory unplug v4 [2/6] lru isolation race fix KAMEZAWA Hiroyuki
@ 2007-06-08  5:40 ` KAMEZAWA Hiroyuki
  2007-06-08  5:41 ` memory unplug v4 intro [4/6] page isolation KAMEZAWA Hiroyuki
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 20+ messages in thread
From: KAMEZAWA Hiroyuki @ 2007-06-08  5:40 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki; +Cc: linux-mm, mel, y-goto, clameter, hugh

A clean up patch for "scanning memory resource [start, end)" operation.

Now, find_next_system_ram() function is used in memory hotplug, but this 
interface is not easy to use and codes are complicated.

This patch adds walk_memory_resouce(start,len,arg,func) function.
The function 'func' is called per valid memory resouce range in [start,pfn).

Signed-Off-By: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

---
 include/linux/ioport.h         |    3 --
 include/linux/memory_hotplug.h |    9 ++++++++
 kernel/resource.c              |   26 ++++++++++++++++++++++-
 mm/memory_hotplug.c            |   45 +++++++++++++++++------------------------
 4 files changed, 53 insertions(+), 30 deletions(-)

Index: devel-2.6.22-rc4-mm2/kernel/resource.c
===================================================================
--- devel-2.6.22-rc4-mm2.orig/kernel/resource.c
+++ devel-2.6.22-rc4-mm2/kernel/resource.c
@@ -244,7 +244,7 @@ EXPORT_SYMBOL(release_resource);
  * the caller must specify res->start, res->end, res->flags.
  * If found, returns 0, res is overwritten, if not found, returns -1.
  */
-int find_next_system_ram(struct resource *res)
+static int find_next_system_ram(struct resource *res)
 {
 	resource_size_t start, end;
 	struct resource *p;
@@ -277,6 +277,30 @@ int find_next_system_ram(struct resource
 		res->end = p->end;
 	return 0;
 }
+
+int walk_memory_resource(unsigned long start_pfn, unsigned long nr_pages,
+			 void *arg, walk_memory_callback_t func)
+{
+	struct resource res;
+	unsigned long pfn, len;
+	u64 orig_end;
+	int ret;
+	res.start = (u64) start_pfn << PAGE_SHIFT;
+	res.end = ((u64)(start_pfn + nr_pages) << PAGE_SHIFT) - 1;
+	res.flags = IORESOURCE_MEM;
+	orig_end = res.end;
+	while ((res.start < res.end) && (find_next_system_ram(&res) >= 0)) {
+		pfn = (unsigned long)(res.start >> PAGE_SHIFT);
+		len = (unsigned long)(res.end + 1 - res.start) >> PAGE_SHIFT;
+		ret = (*func)(pfn, len, arg);
+		if (ret)
+			break;
+		res.start = res.end + 1;
+		res.end = orig_end;
+	}
+	return ret;
+}
+
 #endif
 
 /*
Index: devel-2.6.22-rc4-mm2/include/linux/ioport.h
===================================================================
--- devel-2.6.22-rc4-mm2.orig/include/linux/ioport.h
+++ devel-2.6.22-rc4-mm2/include/linux/ioport.h
@@ -110,9 +110,6 @@ extern int allocate_resource(struct reso
 int adjust_resource(struct resource *res, resource_size_t start,
 		    resource_size_t size);
 
-/* get registered SYSTEM_RAM resources in specified area */
-extern int find_next_system_ram(struct resource *res);
-
 /* Convenience shorthand with allocation */
 #define request_region(start,n,name)	__request_region(&ioport_resource, (start), (n), (name))
 #define request_mem_region(start,n,name) __request_region(&iomem_resource, (start), (n), (name))
Index: devel-2.6.22-rc4-mm2/include/linux/memory_hotplug.h
===================================================================
--- devel-2.6.22-rc4-mm2.orig/include/linux/memory_hotplug.h
+++ devel-2.6.22-rc4-mm2/include/linux/memory_hotplug.h
@@ -64,6 +64,15 @@ extern int online_pages(unsigned long, u
 extern int __add_pages(struct zone *zone, unsigned long start_pfn,
 	unsigned long nr_pages);
 
+/*
+ * Walk thorugh all memory which is registered as resource.
+ * arg is (start_pfn, nr_pages, private_arg_pointer)
+ */
+typedef int (*walk_memory_callback_t)(unsigned long, unsigned long, void *);
+extern int walk_memory_resource(unsigned long start_pfn,
+			unsigned long nr_pages,
+		    	void *arg, walk_memory_callback_t func);
+
 #ifdef CONFIG_NUMA
 extern int memory_add_physaddr_to_nid(u64 start);
 #else
Index: devel-2.6.22-rc4-mm2/mm/memory_hotplug.c
===================================================================
--- devel-2.6.22-rc4-mm2.orig/mm/memory_hotplug.c
+++ devel-2.6.22-rc4-mm2/mm/memory_hotplug.c
@@ -161,14 +161,27 @@ static void grow_pgdat_span(struct pglis
 					pgdat->node_start_pfn;
 }
 
-int online_pages(unsigned long pfn, unsigned long nr_pages)
+static int online_pages_range(unsigned long start_pfn, unsigned long nr_pages,
+			void *arg)
 {
 	unsigned long i;
+	unsigned long onlined_pages = *(unsigned long *)arg;
+	struct page *page;
+	if (PageReserved(pfn_to_page(start_pfn)))
+		for (i = 0; i < nr_pages; i++) {
+			page = pfn_to_page(start_pfn + i);
+			online_page(page);
+			onlined_pages++;
+		}
+	*(unsigned long *)arg = onlined_pages;
+	return 0;
+}
+
+
+int online_pages(unsigned long pfn, unsigned long nr_pages)
+{
 	unsigned long flags;
 	unsigned long onlined_pages = 0;
-	struct resource res;
-	u64 section_end;
-	unsigned long start_pfn;
 	struct zone *zone;
 	int need_zonelists_rebuild = 0;
 
@@ -191,28 +204,8 @@ int online_pages(unsigned long pfn, unsi
 	if (!populated_zone(zone))
 		need_zonelists_rebuild = 1;
 
-	res.start = (u64)pfn << PAGE_SHIFT;
-	res.end = res.start + ((u64)nr_pages << PAGE_SHIFT) - 1;
-	res.flags = IORESOURCE_MEM; /* we just need system ram */
-	section_end = res.end;
-
-	while ((res.start < res.end) && (find_next_system_ram(&res) >= 0)) {
-		start_pfn = (unsigned long)(res.start >> PAGE_SHIFT);
-		nr_pages = (unsigned long)
-                           ((res.end + 1 - res.start) >> PAGE_SHIFT);
-
-		if (PageReserved(pfn_to_page(start_pfn))) {
-			/* this region's page is not onlined now */
-			for (i = 0; i < nr_pages; i++) {
-				struct page *page = pfn_to_page(start_pfn + i);
-				online_page(page);
-				onlined_pages++;
-			}
-		}
-
-		res.start = res.end + 1;
-		res.end = section_end;
-	}
+	walk_memory_resource(pfn, nr_pages, &onlined_pages,
+		online_pages_range);
 	zone->present_pages += onlined_pages;
 	zone->zone_pgdat->node_present_pages += onlined_pages;
 

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* memory unplug v4 intro [4/6] page isolation
  2007-06-08  5:35 memory unplug v4 intro [0/6] KAMEZAWA Hiroyuki
                   ` (2 preceding siblings ...)
  2007-06-08  5:40 ` memory unplug v4 intro [3/6] walk memory resources KAMEZAWA Hiroyuki
@ 2007-06-08  5:41 ` KAMEZAWA Hiroyuki
  2007-06-08 13:24   ` Mel Gorman
  2007-06-08  5:43 ` memory unplug v4 intro [5/6] page offlining KAMEZAWA Hiroyuki
  2007-06-08  5:43 ` memory unplug v4 intro [6/6] ia64 interface KAMEZAWA Hiroyuki
  5 siblings, 1 reply; 20+ messages in thread
From: KAMEZAWA Hiroyuki @ 2007-06-08  5:41 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki; +Cc: linux-mm, mel, y-goto, clameter, hugh

Implement generic chunk-of-pages isolation method by using page grouping ops.

This patch add MIGRATE_ISOLATE to MIGRATE_TYPES. By this
 - MIGRATE_TYPES increases.
 - bitmap for migratetype is enlarged.

If make_pagetype_isolated(start,end) is called,
 - migratetype of the range turns to be MIGRATE_ISOLATE  if 
   its current type is MIGRATE_MOVABLE or MIGRATE_RESERVE.
 - MIGRATE_ISOLATE is not on migratetype fallback list.

Then, pages of this migratetype will not be allocated even if it is free.

Now, this patch only can treat the range aligned to MAX_ORDER.
This will be fixed if Mel's new work is merged.

Changes V3 -> V4
 - removed MIGRATE_ISOLATE check in free_hot_cold_page().
 - test_and_next_pages_isolated() is added, which sees Buddy information.
 - rounddown() macro is added to kernel.h, my own macro is removed.
 - is_page_isolated() function is removed.
 - change function names to be clearer.
   make_pagetype_isolated()/make_pagetype_movable().

Signed-Off-By: Yasunori Goto <y-goto@jp.fujitsu.com>
Signed-Off-By: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

---
 include/linux/kernel.h          |    1 
 include/linux/mmzone.h          |    3 +
 include/linux/page-isolation.h  |   47 ++++++++++++++++++++++++++++
 include/linux/pageblock-flags.h |    2 -
 mm/Makefile                     |    2 -
 mm/page_alloc.c                 |   63 +++++++++++++++++++++++++++++++++++++
 mm/page_isolation.c             |   67 ++++++++++++++++++++++++++++++++++++++++
 7 files changed, 182 insertions(+), 3 deletions(-)

Index: devel-2.6.22-rc4-mm2/include/linux/mmzone.h
===================================================================
--- devel-2.6.22-rc4-mm2.orig/include/linux/mmzone.h
+++ devel-2.6.22-rc4-mm2/include/linux/mmzone.h
@@ -39,7 +39,8 @@ extern int page_group_by_mobility_disabl
 #define MIGRATE_RECLAIMABLE   1
 #define MIGRATE_MOVABLE       2
 #define MIGRATE_RESERVE       3
-#define MIGRATE_TYPES         4
+#define MIGRATE_ISOLATE       4 /* can't allocate from here */
+#define MIGRATE_TYPES         5
 
 #define for_each_migratetype_order(order, type) \
 	for (order = 0; order < MAX_ORDER; order++) \
Index: devel-2.6.22-rc4-mm2/include/linux/pageblock-flags.h
===================================================================
--- devel-2.6.22-rc4-mm2.orig/include/linux/pageblock-flags.h
+++ devel-2.6.22-rc4-mm2/include/linux/pageblock-flags.h
@@ -31,7 +31,7 @@
 
 /* Bit indices that affect a whole block of pages */
 enum pageblock_bits {
-	PB_range(PB_migrate, 2), /* 2 bits required for migrate types */
+	PB_range(PB_migrate, 3), /* 3 bits required for migrate types */
 	NR_PAGEBLOCK_BITS
 };
 
Index: devel-2.6.22-rc4-mm2/mm/page_alloc.c
===================================================================
--- devel-2.6.22-rc4-mm2.orig/mm/page_alloc.c
+++ devel-2.6.22-rc4-mm2/mm/page_alloc.c
@@ -41,6 +41,7 @@
 #include <linux/pfn.h>
 #include <linux/backing-dev.h>
 #include <linux/fault-inject.h>
+#include <linux/page-isolation.h>
 
 #include <asm/tlbflush.h>
 #include <asm/div64.h>
@@ -4409,3 +4410,65 @@ void set_pageblock_flags_group(struct pa
 		else
 			__clear_bit(bitidx + start_bitidx, bitmap);
 }
+
+/*
+ * Chack a range of pages are isolated or not.
+ * returns next pfn to be tested.
+ * If pfn is not isoalted, returns 0.
+ */
+
+unsigned long test_and_next_isolated_page(unsigned long pfn)
+{
+	struct page *page;
+	if (!pfn_valid(pfn))
+		return 0;
+	page = pfn_to_page(pfn);
+	if (get_pageblock_migratetype(page) != MIGRATE_ISOLATE)
+		return 0;
+	if (PageBuddy(page))
+		return pfn + (1 << page_order(page));
+	/* Means pages in pcp list */
+	if (page_count(page) == 0 && page_private(page) == MIGRATE_ISOLATE)
+		return pfn + 1;
+	return 0;
+}
+
+/*
+ * set/clear page block's type to be ISOLATE.
+ * page allocater never alloc memory from ISOLATE block.
+ */
+
+
+int set_migratetype_isolate(struct page *page)
+{
+	struct zone *zone;
+	unsigned long flags;
+	int ret = -EBUSY;
+
+	zone = page_zone(page);
+	spin_lock_irqsave(&zone->lock, flags);
+	if (get_pageblock_migratetype(page) != MIGRATE_MOVABLE)
+		goto out;
+	set_pageblock_migratetype(page, MIGRATE_ISOLATE);
+	move_freepages_block(zone, page, MIGRATE_ISOLATE);
+	ret = 0;
+out:
+	spin_unlock_irqrestore(&zone->lock, flags);
+	if (!ret)
+		drain_all_local_pages();
+	return ret;
+}
+
+void clear_migratetype_isolate(struct page *page)
+{
+	struct zone *zone;
+	unsigned long flags;
+	zone = page_zone(page);
+	spin_lock_irqsave(&zone->lock, flags);
+	if (get_pageblock_migratetype(page) != MIGRATE_ISOLATE)
+		goto out;
+	set_pageblock_migratetype(page, MIGRATE_MOVABLE);
+	move_freepages_block(zone, page, MIGRATE_MOVABLE);
+out:
+	spin_unlock_irqrestore(&zone->lock, flags);
+}
Index: devel-2.6.22-rc4-mm2/mm/page_isolation.c
===================================================================
--- /dev/null
+++ devel-2.6.22-rc4-mm2/mm/page_isolation.c
@@ -0,0 +1,67 @@
+/*
+ * linux/mm/page_isolation.c
+ */
+
+#include <stddef.h>
+#include <linux/kernel.h>
+#include <linux/mm.h>
+#include <linux/page-isolation.h>
+
+int
+make_pagetype_isolated(unsigned long start_pfn, unsigned long end_pfn)
+{
+	unsigned long pfn, start_pfn_aligned, end_pfn_aligned;
+	unsigned long undo_pfn;
+
+	start_pfn_aligned = rounddown(start_pfn, NR_PAGES_ISOLATION_BLOCK);
+	end_pfn_aligned = roundup(end_pfn, NR_PAGES_ISOLATION_BLOCK);
+
+	for (pfn = start_pfn_aligned;
+	     pfn < end_pfn_aligned;
+	     pfn += NR_PAGES_ISOLATION_BLOCK)
+		if (set_migratetype_isolate(pfn_to_page(pfn))) {
+			undo_pfn = pfn;
+			goto undo;
+		}
+	return 0;
+undo:
+	for (pfn = start_pfn_aligned;
+	     pfn <= undo_pfn;
+	     pfn += NR_PAGES_ISOLATION_BLOCK)
+		clear_migratetype_isolate(pfn_to_page(pfn));
+
+	return -EBUSY;
+}
+
+
+int
+make_pagetype_movable(unsigned long start_pfn, unsigned long end_pfn)
+{
+	unsigned long pfn, start_pfn_aligned, end_pfn_aligned;
+	start_pfn_aligned = rounddown(start_pfn, NR_PAGES_ISOLATION_BLOCK);
+        end_pfn_aligned = roundup(end_pfn, NR_PAGES_ISOLATION_BLOCK);
+
+	for (pfn = start_pfn_aligned;
+	     pfn < end_pfn_aligned;
+	     pfn += NR_PAGES_ISOLATION_BLOCK)
+		clear_migratetype_isolate(pfn_to_page(pfn));
+	return 0;
+}
+
+int
+test_pages_isolated(unsigned long start_pfn, unsigned long end_pfn)
+{
+	unsigned long pfn;
+
+	pfn = start_pfn;
+	while (pfn < end_pfn) {
+		if (!pfn_valid(pfn)) {
+			pfn++;
+			continue;
+		}
+		pfn = test_and_next_isolated_page(pfn);
+		if (!pfn)
+			break;
+	}
+	return (pfn < end_pfn)? -EBUSY : 0;
+}
Index: devel-2.6.22-rc4-mm2/include/linux/page-isolation.h
===================================================================
--- /dev/null
+++ devel-2.6.22-rc4-mm2/include/linux/page-isolation.h
@@ -0,0 +1,47 @@
+#ifndef __LINUX_PAGEISOLATION_H
+#define __LINUX_PAGEISOLATION_H
+/*
+ * Define an interface for capturing and isolating some amount of
+ * contiguous pages.
+ * isolated pages are freed but wll never be allocated until they are
+ * pushed back.
+ *
+ * This isolation function requires some alignment.
+ */
+
+#define PAGE_ISOLATION_ORDER	(MAX_ORDER - 1)
+#define NR_PAGES_ISOLATION_BLOCK	(1 << PAGE_ISOLATION_ORDER)
+
+/*
+ * set page isolation range.
+ * If specified range includes migrate types other than MOVABLE,
+ * this will fail with -EBUSY.
+ */
+extern int
+make_pagetype_isolated(unsigned long start_pfn, unsigned long end_pfn);
+
+/*
+ *  Changes MIGRATE_ISOLATE to MIGRATE_MOVABLE.
+ */
+extern int
+make_pagetype_movable(unsigned long start_pfn, unsigned long end_pfn);
+
+/*
+ * test all pages are isolated or not.
+ */
+extern int
+test_pages_isolated(unsigned long start_pfn, unsigned long end_pfn);
+
+/* helper test routine for check page is isolated or not */
+extern unsigned long
+test_and_next_isolated_page(unsigned long pfn);
+
+/*
+ * Internal funcs.Changes pageblock's migrate type.
+ * Please use make_pagetype_isolated()/make_pagetype_movable().
+ */
+extern int set_migratetype_isolate(struct page *page);
+extern void clear_migratetype_isolate(struct page *page);
+
+
+#endif
Index: devel-2.6.22-rc4-mm2/mm/Makefile
===================================================================
--- devel-2.6.22-rc4-mm2.orig/mm/Makefile
+++ devel-2.6.22-rc4-mm2/mm/Makefile
@@ -11,7 +11,7 @@ obj-y			:= bootmem.o filemap.o mempool.o
 			   page_alloc.o page-writeback.o pdflush.o \
 			   readahead.o swap.o truncate.o vmscan.o \
 			   prio_tree.o util.o mmzone.o vmstat.o backing-dev.o \
-			   $(mmu-y)
+			   page_isolation.o $(mmu-y)
 
 obj-$(CONFIG_BOUNCE)	+= bounce.o
 obj-$(CONFIG_SWAP)	+= page_io.o swap_state.o swapfile.o thrash.o
Index: devel-2.6.22-rc4-mm2/include/linux/kernel.h
===================================================================
--- devel-2.6.22-rc4-mm2.orig/include/linux/kernel.h
+++ devel-2.6.22-rc4-mm2/include/linux/kernel.h
@@ -40,6 +40,7 @@ extern const char linux_proc_banner[];
 #define FIELD_SIZEOF(t, f) (sizeof(((t*)0)->f))
 #define DIV_ROUND_UP(n,d) (((n) + (d) - 1) / (d))
 #define roundup(x, y) ((((x) + ((y) - 1)) / (y)) * (y))
+#define rounddown(x, y) ((x)/(y)) * (y)
 
 /**
  * upper_32_bits - return bits 32-63 of a number

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: memory unplug v4 intro [4/6] page isolation
  2007-06-08  5:41 ` memory unplug v4 intro [4/6] page isolation KAMEZAWA Hiroyuki
@ 2007-06-08 13:24   ` Mel Gorman
  2007-06-08 13:59     ` KAMEZAWA Hiroyuki
  0 siblings, 1 reply; 20+ messages in thread
From: Mel Gorman @ 2007-06-08 13:24 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki; +Cc: linux-mm, y-goto, clameter, hugh

On (08/06/07 14:41), KAMEZAWA Hiroyuki didst pronounce:
> Implement generic chunk-of-pages isolation method by using page grouping ops.
> 
> This patch add MIGRATE_ISOLATE to MIGRATE_TYPES. By this
>  - MIGRATE_TYPES increases.
>  - bitmap for migratetype is enlarged.
> 
> If make_pagetype_isolated(start,end) is called,
>  - migratetype of the range turns to be MIGRATE_ISOLATE  if 
>    its current type is MIGRATE_MOVABLE or MIGRATE_RESERVE.
>  - MIGRATE_ISOLATE is not on migratetype fallback list.
> 
> Then, pages of this migratetype will not be allocated even if it is free.
> 
> Now, this patch only can treat the range aligned to MAX_ORDER.
> This will be fixed if Mel's new work is merged.
> 

Grouping by arbitrary order is now in -mm. The size of a pageblock area is
determined by pageblock_order which will either by the same as the huge page
size if avaialble or MAX_ORDER-1 if not.

> Changes V3 -> V4
>  - removed MIGRATE_ISOLATE check in free_hot_cold_page().
>  - test_and_next_pages_isolated() is added, which sees Buddy information.
>  - rounddown() macro is added to kernel.h, my own macro is removed.
>  - is_page_isolated() function is removed.
>  - change function names to be clearer.
>    make_pagetype_isolated()/make_pagetype_movable().
> 
> Signed-Off-By: Yasunori Goto <y-goto@jp.fujitsu.com>
> Signed-Off-By: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> 
> ---
>  include/linux/kernel.h          |    1 
>  include/linux/mmzone.h          |    3 +
>  include/linux/page-isolation.h  |   47 ++++++++++++++++++++++++++++
>  include/linux/pageblock-flags.h |    2 -
>  mm/Makefile                     |    2 -
>  mm/page_alloc.c                 |   63 +++++++++++++++++++++++++++++++++++++
>  mm/page_isolation.c             |   67 ++++++++++++++++++++++++++++++++++++++++
>  7 files changed, 182 insertions(+), 3 deletions(-)
> 
> Index: devel-2.6.22-rc4-mm2/include/linux/mmzone.h
> ===================================================================
> --- devel-2.6.22-rc4-mm2.orig/include/linux/mmzone.h
> +++ devel-2.6.22-rc4-mm2/include/linux/mmzone.h
> @@ -39,7 +39,8 @@ extern int page_group_by_mobility_disabl
>  #define MIGRATE_RECLAIMABLE   1
>  #define MIGRATE_MOVABLE       2
>  #define MIGRATE_RESERVE       3
> -#define MIGRATE_TYPES         4
> +#define MIGRATE_ISOLATE       4 /* can't allocate from here */
> +#define MIGRATE_TYPES         5
>  
>  #define for_each_migratetype_order(order, type) \
>  	for (order = 0; order < MAX_ORDER; order++) \
> Index: devel-2.6.22-rc4-mm2/include/linux/pageblock-flags.h
> ===================================================================
> --- devel-2.6.22-rc4-mm2.orig/include/linux/pageblock-flags.h
> +++ devel-2.6.22-rc4-mm2/include/linux/pageblock-flags.h
> @@ -31,7 +31,7 @@
>  
>  /* Bit indices that affect a whole block of pages */
>  enum pageblock_bits {
> -	PB_range(PB_migrate, 2), /* 2 bits required for migrate types */
> +	PB_range(PB_migrate, 3), /* 3 bits required for migrate types */
>  	NR_PAGEBLOCK_BITS
>  };
>  
> Index: devel-2.6.22-rc4-mm2/mm/page_alloc.c
> ===================================================================
> --- devel-2.6.22-rc4-mm2.orig/mm/page_alloc.c
> +++ devel-2.6.22-rc4-mm2/mm/page_alloc.c
> @@ -41,6 +41,7 @@
>  #include <linux/pfn.h>
>  #include <linux/backing-dev.h>
>  #include <linux/fault-inject.h>
> +#include <linux/page-isolation.h>
>  
>  #include <asm/tlbflush.h>
>  #include <asm/div64.h>
> @@ -4409,3 +4410,65 @@ void set_pageblock_flags_group(struct pa
>  		else
>  			__clear_bit(bitidx + start_bitidx, bitmap);
>  }
> +
> +/*
> + * Chack a range of pages are isolated or not.
> + * returns next pfn to be tested.
> + * If pfn is not isoalted, returns 0.
> + */
> +

Spurious whitespace here. isolated is misspelt.

> +unsigned long test_and_next_isolated_page(unsigned long pfn)
> +{

Can this be defined with test_isolated_pages() as page_order() is now
defined in internal.h?

> +	struct page *page;
> +	if (!pfn_valid(pfn))
> +		return 0;

The caller is already calling pfn_valid() so this should be unnecessary.

Also, you may be calling pfn_valid() more than required. If you know a PFN
is within a MAX_ORDER block that contains at least one valid page, you only
have to call pfn_valid_within() which is a no-op on almost every architecture
but IA64.

> +	page = pfn_to_page(pfn);
> +	if (get_pageblock_migratetype(page) != MIGRATE_ISOLATE)
> +		return 0;

You shouldn't need to check this for every single page.

> +	if (PageBuddy(page))
> +		return pfn + (1 << page_order(page));
> +	/* Means pages in pcp list */
> +	if (page_count(page) == 0 && page_private(page) == MIGRATE_ISOLATE)
> +		return pfn + 1;
> +	return 0;
> +}
> +
> +/*
> + * set/clear page block's type to be ISOLATE.
> + * page allocater never alloc memory from ISOLATE block.
> + */
> +
> +

More spurious whitespace

> +int set_migratetype_isolate(struct page *page)
> +{
> +	struct zone *zone;
> +	unsigned long flags;
> +	int ret = -EBUSY;
> +
> +	zone = page_zone(page);
> +	spin_lock_irqsave(&zone->lock, flags);
> +	if (get_pageblock_migratetype(page) != MIGRATE_MOVABLE)
> +		goto out;

hmmm, review this decision on a regular basis. If the block was reclaimable
and Christoph's SLUB defragmentation patches work out, there will be more
block types that can be isolated.

> +	set_pageblock_migratetype(page, MIGRATE_ISOLATE);
> +	move_freepages_block(zone, page, MIGRATE_ISOLATE);
> +	ret = 0;
> +out:
> +	spin_unlock_irqrestore(&zone->lock, flags);
> +	if (!ret)
> +		drain_all_local_pages();
> +	return ret;
> +}
> +
> +void clear_migratetype_isolate(struct page *page)
> +{
> +	struct zone *zone;
> +	unsigned long flags;
> +	zone = page_zone(page);
> +	spin_lock_irqsave(&zone->lock, flags);
> +	if (get_pageblock_migratetype(page) != MIGRATE_ISOLATE)
> +		goto out;
> +	set_pageblock_migratetype(page, MIGRATE_MOVABLE);
> +	move_freepages_block(zone, page, MIGRATE_MOVABLE);
> +out:
> +	spin_unlock_irqrestore(&zone->lock, flags);
> +}
> Index: devel-2.6.22-rc4-mm2/mm/page_isolation.c
> ===================================================================
> --- /dev/null
> +++ devel-2.6.22-rc4-mm2/mm/page_isolation.c
> @@ -0,0 +1,67 @@
> +/*
> + * linux/mm/page_isolation.c
> + */
> +
> +#include <stddef.h>
> +#include <linux/kernel.h>
> +#include <linux/mm.h>
> +#include <linux/page-isolation.h>
> +
> +int
> +make_pagetype_isolated(unsigned long start_pfn, unsigned long end_pfn)
> +{

As these are externally available, they could do with kerneldoc comments
explaining their purpose.

/**
 * make_pagetype_isolated - Mark a range of pages to be isolated from the buddy allocator
 * @start_pfn: The lower PFN of the range to be isolated
 * @end_pfn: The upper PFN of the range to be isolated
 *
 * Mark a range of pages to be isolated from the buddy allocator. Any
 * currently free page will no longer be available when this returns
 * successfully. Any page freed in the future will similarly be isolated
 * 
 * Returns 0 on success and -EBUSY if any part of the range cannot be
 * isolated
 */

or something

The names are not great either.

isolate_page_range() and putback_isolated_range() prehaps? I am not the
best at naming things so prehaps others will have better suggestions.

> +	unsigned long pfn, start_pfn_aligned, end_pfn_aligned;
> +	unsigned long undo_pfn;
> +
> +	start_pfn_aligned = rounddown(start_pfn, NR_PAGES_ISOLATION_BLOCK);
> +	end_pfn_aligned = roundup(end_pfn, NR_PAGES_ISOLATION_BLOCK);
> +

Check that the aligned PFNs do not go outside the zone range. This sort of
check has come up a lot, it may be a candidate for it's own helper.

> +	for (pfn = start_pfn_aligned;
> +	     pfn < end_pfn_aligned;
> +	     pfn += NR_PAGES_ISOLATION_BLOCK)
> +		if (set_migratetype_isolate(pfn_to_page(pfn))) {
> +			undo_pfn = pfn;
> +			goto undo;
> +		}
> +	return 0;
> +undo:
> +	for (pfn = start_pfn_aligned;
> +	     pfn <= undo_pfn;
> +	     pfn += NR_PAGES_ISOLATION_BLOCK)
> +		clear_migratetype_isolate(pfn_to_page(pfn));
> +
> +	return -EBUSY;
> +}
> +
> +
> +int
> +make_pagetype_movable(unsigned long start_pfn, unsigned long end_pfn)
> +{
> +	unsigned long pfn, start_pfn_aligned, end_pfn_aligned;
> +	start_pfn_aligned = rounddown(start_pfn, NR_PAGES_ISOLATION_BLOCK);
> +        end_pfn_aligned = roundup(end_pfn, NR_PAGES_ISOLATION_BLOCK);

Tabs vs Spaces there.

> +
> +	for (pfn = start_pfn_aligned;
> +	     pfn < end_pfn_aligned;
> +	     pfn += NR_PAGES_ISOLATION_BLOCK)
> +		clear_migratetype_isolate(pfn_to_page(pfn));
> +	return 0;
> +}
> +
> +int
> +test_pages_isolated(unsigned long start_pfn, unsigned long end_pfn)
> +{
> +	unsigned long pfn;
> +
> +	pfn = start_pfn;
> +	while (pfn < end_pfn) {
> +		if (!pfn_valid(pfn)) {
> +			pfn++;
> +			continue;
> +		}
> +		pfn = test_and_next_isolated_page(pfn);
> +		if (!pfn)
> +			break;
> +	}
> +	return (pfn < end_pfn)? -EBUSY : 0;
> +}
> Index: devel-2.6.22-rc4-mm2/include/linux/page-isolation.h
> ===================================================================
> --- /dev/null
> +++ devel-2.6.22-rc4-mm2/include/linux/page-isolation.h
> @@ -0,0 +1,47 @@
> +#ifndef __LINUX_PAGEISOLATION_H
> +#define __LINUX_PAGEISOLATION_H
> +/*
> + * Define an interface for capturing and isolating some amount of
> + * contiguous pages.
> + * isolated pages are freed but wll never be allocated until they are
> + * pushed back.
> + *
> + * This isolation function requires some alignment.
> + */
> +
> +#define PAGE_ISOLATION_ORDER	(MAX_ORDER - 1)
> +#define NR_PAGES_ISOLATION_BLOCK	(1 << PAGE_ISOLATION_ORDER)
> +

Consider using pageblock_order and pageblock_nr_pages from
pageblock-flags.h

> +/*
> + * set page isolation range.
> + * If specified range includes migrate types other than MOVABLE,
> + * this will fail with -EBUSY.
> + */
> +extern int
> +make_pagetype_isolated(unsigned long start_pfn, unsigned long end_pfn);
> +
> +/*
> + *  Changes MIGRATE_ISOLATE to MIGRATE_MOVABLE.
> + */
> +extern int
> +make_pagetype_movable(unsigned long start_pfn, unsigned long end_pfn);
> +
> +/*
> + * test all pages are isolated or not.
> + */
> +extern int
> +test_pages_isolated(unsigned long start_pfn, unsigned long end_pfn);
> +
> +/* helper test routine for check page is isolated or not */
> +extern unsigned long
> +test_and_next_isolated_page(unsigned long pfn);
> +
> +/*
> + * Internal funcs.Changes pageblock's migrate type.
> + * Please use make_pagetype_isolated()/make_pagetype_movable().
> + */
> +extern int set_migratetype_isolate(struct page *page);
> +extern void clear_migratetype_isolate(struct page *page);
> +
> +
> +#endif
> Index: devel-2.6.22-rc4-mm2/mm/Makefile
> ===================================================================
> --- devel-2.6.22-rc4-mm2.orig/mm/Makefile
> +++ devel-2.6.22-rc4-mm2/mm/Makefile
> @@ -11,7 +11,7 @@ obj-y			:= bootmem.o filemap.o mempool.o
>  			   page_alloc.o page-writeback.o pdflush.o \
>  			   readahead.o swap.o truncate.o vmscan.o \
>  			   prio_tree.o util.o mmzone.o vmstat.o backing-dev.o \
> -			   $(mmu-y)
> +			   page_isolation.o $(mmu-y)
>  
>  obj-$(CONFIG_BOUNCE)	+= bounce.o
>  obj-$(CONFIG_SWAP)	+= page_io.o swap_state.o swapfile.o thrash.o
> Index: devel-2.6.22-rc4-mm2/include/linux/kernel.h
> ===================================================================
> --- devel-2.6.22-rc4-mm2.orig/include/linux/kernel.h
> +++ devel-2.6.22-rc4-mm2/include/linux/kernel.h
> @@ -40,6 +40,7 @@ extern const char linux_proc_banner[];
>  #define FIELD_SIZEOF(t, f) (sizeof(((t*)0)->f))
>  #define DIV_ROUND_UP(n,d) (((n) + (d) - 1) / (d))
>  #define roundup(x, y) ((((x) + ((y) - 1)) / (y)) * (y))
> +#define rounddown(x, y) ((x)/(y)) * (y)
>  
>  /**
>   * upper_32_bits - return bits 32-63 of a number

-- 
-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: memory unplug v4 intro [4/6] page isolation
  2007-06-08 13:24   ` Mel Gorman
@ 2007-06-08 13:59     ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 20+ messages in thread
From: KAMEZAWA Hiroyuki @ 2007-06-08 13:59 UTC (permalink / raw)
  To: Mel Gorman; +Cc: linux-mm, y-goto, clameter, hugh

On Fri, 8 Jun 2007 14:24:11 +0100
mel@skynet.ie (Mel Gorman) wrote:

> > +/*
> > + * Chack a range of pages are isolated or not.
> > + * returns next pfn to be tested.
> > + * If pfn is not isoalted, returns 0.
> > + */
> > +
> 
> Spurious whitespace here. isolated is misspelt.
> 
ok.

> > +unsigned long test_and_next_isolated_page(unsigned long pfn)
> > +{
> 
> Can this be defined with test_isolated_pages() as page_order() is now
> defined in internal.h?
> 
I dropped per-page test in this version and added faster one.
Will we need per-page test ?

> > +	struct page *page;
> > +	if (!pfn_valid(pfn))
> > +		return 0;
> 
> The caller is already calling pfn_valid() so this should be unnecessary.
> 
hmm, ok.

> Also, you may be calling pfn_valid() more than required. 
maybe yes.

> If you know a PFN
> is within a MAX_ORDER block that contains at least one valid page, you only
> have to call pfn_valid_within() which is a no-op on almost every architecture
> but IA64.
ok, I'll look it.

> 
> > +	page = pfn_to_page(pfn);
> > +	if (get_pageblock_migratetype(page) != MIGRATE_ISOLATE)
> > +		return 0;
> 
> You shouldn't need to check this for every single page.
> 
Hmm, I don't have to ?
BTW, shall I move this func to page_isolation.c ?

> > +	if (PageBuddy(page))
> > +		return pfn + (1 << page_order(page));
> > +	/* Means pages in pcp list */
> > +	if (page_count(page) == 0 && page_private(page) == MIGRATE_ISOLATE)
> > +		return pfn + 1;
> > +	return 0;
> > +}
> > +
> > +/*
> > + * set/clear page block's type to be ISOLATE.
> > + * page allocater never alloc memory from ISOLATE block.
> > + */
> > +
> > +
> 
> More spurious whitespace
> 
Ugh..sorry.

> > +int set_migratetype_isolate(struct page *page)
> > +{
> > +	struct zone *zone;
> > +	unsigned long flags;
> > +	int ret = -EBUSY;
> > +
> > +	zone = page_zone(page);
> > +	spin_lock_irqsave(&zone->lock, flags);
> > +	if (get_pageblock_migratetype(page) != MIGRATE_MOVABLE)
> > +		goto out;
> 
> hmmm, review this decision on a regular basis. If the block was reclaimable
> and Christoph's SLUB defragmentation patches work out, there will be more
> block types that can be isolated.
> 
*maybe* yes. we can change this check later.


> As these are externally available, they could do with kerneldoc comments
> explaining their purpose.
> 
> /**
>  * make_pagetype_isolated - Mark a range of pages to be isolated from the buddy allocator
>  * @start_pfn: The lower PFN of the range to be isolated
>  * @end_pfn: The upper PFN of the range to be isolated
>  *
>  * Mark a range of pages to be isolated from the buddy allocator. Any
>  * currently free page will no longer be available when this returns
>  * successfully. Any page freed in the future will similarly be isolated
>  * 
>  * Returns 0 on success and -EBUSY if any part of the range cannot be
>  * isolated
>  */
> 
> or something
ok, I'll do.

> 
> The names are not great either.
> 
> isolate_page_range() and putback_isolated_range() prehaps? I am not the
> best at naming things so prehaps others will have better suggestions.
> 
Hmm, I'll look for better name.

> > +	unsigned long pfn, start_pfn_aligned, end_pfn_aligned;
> > +	unsigned long undo_pfn;
> > +
> > +	start_pfn_aligned = rounddown(start_pfn, NR_PAGES_ISOLATION_BLOCK);
> > +	end_pfn_aligned = roundup(end_pfn, NR_PAGES_ISOLATION_BLOCK);
> > +
> 
> Check that the aligned PFNs do not go outside the zone range. This sort of
> check has come up a lot, it may be a candidate for it's own helper.
> 
Hmm, now, the caller checks it. but ok. I'll add check here.


> 
> > +#define PAGE_ISOLATION_ORDER	(MAX_ORDER - 1)
> > +#define NR_PAGES_ISOLATION_BLOCK	(1 << PAGE_ISOLATION_ORDER)
> > +
> 
> Consider using pageblock_order and pageblock_nr_pages from
> pageblock-flags.h
> 
yes, of course.

Thank you for review.

It seems the total constructure of patch is not so good. I'll rebuild it.

-Kame

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* memory unplug v4 intro [5/6] page offlining
  2007-06-08  5:35 memory unplug v4 intro [0/6] KAMEZAWA Hiroyuki
                   ` (3 preceding siblings ...)
  2007-06-08  5:41 ` memory unplug v4 intro [4/6] page isolation KAMEZAWA Hiroyuki
@ 2007-06-08  5:43 ` KAMEZAWA Hiroyuki
  2007-06-08  5:43 ` memory unplug v4 intro [6/6] ia64 interface KAMEZAWA Hiroyuki
  5 siblings, 0 replies; 20+ messages in thread
From: KAMEZAWA Hiroyuki @ 2007-06-08  5:43 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki; +Cc: linux-mm, mel, y-goto, clameter, hugh

Changes V3->V4
 - Kconfig is changed. "select MIGRATION" is removed and "depends on MIGRATION"
   is added.
 - page scan logic is changed. scan range of pfn and find LRU page.
 - make use of walk_memory_resource() patch.
 - be simpler.
 

Logic.
 - set all pages in  [start,end)  as isolated migration-type.
   by this, all free pages in the range will be not-for-use.
 - Migrate all LRU pages in the range.
 - Test all pages in the range's refcnt is zero or not.

Todo:
 - allocate migration destination page from better area.
 - confirm page_count(page)== 0 && PageReserved(page) page is safe to be freed..
 (I don't like this kind of page but..
 - Find out pages which cannot be migrated.
 - more running tests.

Signed-Off-By: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-Off-By: Yasunori Goto <y-goto@jp.fujitsu.com>

---
 include/linux/memory_hotplug.h |    5 
 mm/Kconfig                     |    5 
 mm/memory_hotplug.c            |  226 +++++++++++++++++++++++++++++++++++++++++
 mm/page_alloc.c                |   48 ++++++++
 4 files changed, 283 insertions(+), 1 deletion(-)

Index: devel-2.6.22-rc4-mm2/mm/Kconfig
===================================================================
--- devel-2.6.22-rc4-mm2.orig/mm/Kconfig
+++ devel-2.6.22-rc4-mm2/mm/Kconfig
@@ -126,6 +126,11 @@ config MEMORY_HOTPLUG_SPARSE
 	def_bool y
 	depends on SPARSEMEM && MEMORY_HOTPLUG
 
+config MEMORY_HOTREMOVE
+	bool "Allow for memory hot remove"
+	depends on MEMORY_HOTPLUG
+	depends on MIGRATION
+
 # Heavily threaded applications may benefit from splitting the mm-wide
 # page_table_lock, so that faults on different parts of the user address
 # space can be handled with less contention: split it at this NR_CPUS.
Index: devel-2.6.22-rc4-mm2/mm/memory_hotplug.c
===================================================================
--- devel-2.6.22-rc4-mm2.orig/mm/memory_hotplug.c
+++ devel-2.6.22-rc4-mm2/mm/memory_hotplug.c
@@ -23,6 +23,9 @@
 #include <linux/vmalloc.h>
 #include <linux/ioport.h>
 #include <linux/cpuset.h>
+#include <linux/delay.h>
+#include <linux/migrate.h>
+#include <linux/page-isolation.h>
 
 #include <asm/tlbflush.h>
 
@@ -301,3 +304,227 @@ error:
 	return ret;
 }
 EXPORT_SYMBOL_GPL(add_memory);
+
+#ifdef CONFIG_MEMORY_HOTREMOVE
+/*
+ * Scanning pfn is much easier than scanning lru list.
+ * Scan pfn from start to end and Find LRU page.
+ */
+int scan_lru_pages(unsigned long start, unsigned long end)
+{
+	unsigned long pfn;
+	struct page *page;
+	for (pfn = start; pfn < end; pfn++) {
+		if (pfn_valid(pfn)) {
+			page = pfn_to_page(pfn);
+			if (PageLRU(page))
+				return pfn;
+		}
+	}
+	return 0;
+}
+
+static struct page *
+hotremove_migrate_alloc(struct page *page,
+			unsigned long private,
+			int **x)
+{
+	/* This should be improoooooved!! */
+	return alloc_page(GFP_HIGHUSER_PAGECACHE);
+}
+
+
+#define NR_OFFLINE_AT_ONCE_PAGES	(256)
+static int
+do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
+{
+	unsigned long pfn;
+	struct page *page;
+	int move_pages = NR_OFFLINE_AT_ONCE_PAGES;
+	int not_managed = 0;
+	int ret = 0;
+	LIST_HEAD(source);
+
+	for (pfn = start_pfn; pfn < end_pfn && move_pages > 0; pfn++) {
+		if (!pfn_valid(pfn))
+			continue;
+		page = pfn_to_page(pfn);
+		if (!page_count(page))
+			continue;
+		/*
+		 * We can skip free pages. And we can only deal with pages on
+		 * LRU.
+		 */
+		ret = isolate_lru_page(page, &source);
+		if (!ret) { /* Success */
+			move_pages--;
+		} else {
+			/* Becasue we don't have big zone->lock. we should
+			   check this again here. */
+			if (page_count(page))
+				not_managed++;
+#ifdef CONFIG_DEBUG_VM
+			printk("Not Migratable page found %lx/%d/%lx\n",
+				pfn, page_count(page), page->flags);
+#endif
+		}
+	}
+	ret = -EBUSY;
+	if (not_managed) {
+		if (!list_empty(&source))
+			putback_lru_pages(&source);
+		goto out;
+	}
+	ret = 0;
+	if (list_empty(&source))
+		goto out;
+	/* this function returns # of failed pages */
+	ret = migrate_pages(&source, hotremove_migrate_alloc, 0);
+
+out:
+	return ret;
+}
+
+/*
+ * remove from free_area[] and mark all as Reserved.
+ */
+static int
+offline_isolated_pages_cb(unsigned long start, unsigned long nr_pages,
+			void *data)
+{
+	__offline_isolated_pages(start, start + nr_pages);
+	return 0;
+}
+
+static void
+offline_isolated_pages(unsigned long start_pfn, unsigned long end_pfn)
+{
+	walk_memory_resource(start_pfn, end_pfn - start_pfn, NULL,
+				offline_isolated_pages_cb);
+}
+
+/*
+ * Check all pages in range, recoreded as memory resource, are isolated.
+ */
+static int
+check_pages_isolated_cb(unsigned long start_pfn, unsigned long nr_pages,
+			void *data)
+{
+	int ret;
+	long offlined = *(long*)data;
+	ret = test_pages_isolated(start_pfn, start_pfn + nr_pages);
+	offlined = nr_pages;
+	if (!ret)
+		*(long*)data += offlined;
+	return ret;
+}
+
+static long
+check_pages_isolated(unsigned long start_pfn, unsigned long end_pfn)
+{
+	long offlined = 0;
+	int ret;
+
+	ret = walk_memory_resource(start_pfn, end_pfn - start_pfn, &offlined,
+			check_pages_isolated_cb);
+	if (ret < 0)
+		offlined = (long)ret;
+	return offlined;
+}
+
+extern void drain_all_local_pages(void);
+
+int offline_pages(unsigned long start_pfn,
+		  unsigned long end_pfn, unsigned long timeout)
+{
+	unsigned long pfn, nr_pages, expire;
+	long offlined_pages;
+	int ret, drain, retry_max;
+	struct zone *zone;
+
+	BUG_ON(start_pfn >= end_pfn);
+	/* at least, alignment against pageblock is necessary */
+	if (start_pfn & (NR_PAGES_ISOLATION_BLOCK - 1))
+		return -EINVAL;
+	if (end_pfn & (NR_PAGES_ISOLATION_BLOCK - 1))
+		return -EINVAL;
+	/* This makes hotplug much easier...and readable.
+	   we assume this for now. .*/
+	if (page_zone(pfn_to_page(start_pfn)) !=
+		page_zone(pfn_to_page(end_pfn - 1)))
+		return -EINVAL;
+	/* set above range as isolated */
+	ret = make_pagetype_isolated(start_pfn, end_pfn);
+	if (ret)
+		return ret;
+	nr_pages = end_pfn - start_pfn;
+	pfn = start_pfn;
+	expire = jiffies + timeout;
+	drain = 0;
+	retry_max = 5;
+repeat:
+	/* start memory hot removal */
+	ret = -EAGAIN;
+	if (time_after(jiffies, expire))
+		goto failed_removal;
+	ret = -EINTR;
+	if (signal_pending(current))
+		goto failed_removal;
+	ret = 0;
+	if (drain) {
+		lru_add_drain_all();
+		flush_scheduled_work();
+		cond_resched();
+		drain_all_local_pages();
+	}
+
+	pfn = scan_lru_pages(start_pfn, end_pfn);
+	if (pfn) { /* We have page on LRU */
+		ret = do_migrate_range(pfn, end_pfn);
+		if (!ret) {
+			drain = 1;
+			goto repeat;
+		} else {
+			if (ret < 0)
+				if (--retry_max == 0)
+					goto failed_removal;
+			yield();
+			drain = 1;
+			goto repeat;
+		}
+	}
+	/* drain all zone's lru pagevec, this is asyncronous... */
+	lru_add_drain_all();
+	flush_scheduled_work();
+	yield();
+	/* drain pcp pages , this is synchrouns. */
+	drain_all_local_pages();
+	/* check again */
+	offlined_pages = check_pages_isolated(start_pfn, end_pfn);
+	if (offlined_pages < 0) {
+		ret = -EBUSY;
+		goto failed_removal;
+	}
+	printk("Offlined Pages %ld\n",offlined_pages);
+	/* Ok, all of our target is islaoted.
+	   We cannot do rollback at this point. */
+	offline_isolated_pages(start_pfn, end_pfn);
+	/* reset pagetype flags */
+	make_pagetype_movable(start_pfn, end_pfn);
+	/* removal success */
+	zone = page_zone(pfn_to_page(start_pfn));
+	zone->present_pages -= offlined_pages;
+	zone->zone_pgdat->node_present_pages -= offlined_pages;
+	totalram_pages -= offlined_pages;
+	num_physpages -= offlined_pages;
+	vm_total_pages = nr_free_pagecache_pages();
+	writeback_set_ratelimit();
+	return 0;
+
+failed_removal:
+	printk("memory offlining %lx to %lx failed\n",start_pfn, end_pfn);
+	/* pushback to free area */
+	make_pagetype_movable(start_pfn, end_pfn);
+	return ret;
+}
+#endif /* CONFIG_MEMORY_HOTREMOVE */
Index: devel-2.6.22-rc4-mm2/include/linux/memory_hotplug.h
===================================================================
--- devel-2.6.22-rc4-mm2.orig/include/linux/memory_hotplug.h
+++ devel-2.6.22-rc4-mm2/include/linux/memory_hotplug.h
@@ -59,7 +59,10 @@ extern int add_one_highpage(struct page 
 extern void online_page(struct page *page);
 /* VM interface that may be used by firmware interface */
 extern int online_pages(unsigned long, unsigned long);
-
+#ifdef CONFIG_MEMORY_HOTREMOVE
+extern int offline_pages(unsigned long, unsigned long, unsigned long);
+extern void __offline_isolated_pages(unsigned long, unsigned long);
+#endif
 /* reasonably generic interface to expand the physical pages in a zone  */
 extern int __add_pages(struct zone *zone, unsigned long start_pfn,
 	unsigned long nr_pages);
Index: devel-2.6.22-rc4-mm2/mm/page_alloc.c
===================================================================
--- devel-2.6.22-rc4-mm2.orig/mm/page_alloc.c
+++ devel-2.6.22-rc4-mm2/mm/page_alloc.c
@@ -4472,3 +4472,51 @@ void clear_migratetype_isolate(struct pa
 out:
 	spin_unlock_irqrestore(&zone->lock, flags);
 }
+
+#ifdef CONFIG_MEMORY_HOTREMOVE
+/*
+ * All pages in the range must be isolated before calling this.
+ */
+void
+__offline_isolated_pages(unsigned long start_pfn, unsigned long end_pfn)
+{
+	struct page *page;
+	struct zone *zone;
+	int order, i;
+	unsigned long pfn;
+	unsigned long flags;
+	/* find the first valid pfn */
+	for (pfn = start_pfn; pfn < end_pfn; pfn++)
+		if (pfn_valid(pfn))
+			break;
+	if (pfn == end_pfn)
+		return;
+	zone = page_zone(pfn_to_page(pfn));
+	spin_lock_irqsave(&zone->lock, flags);
+	printk("do offline \n");
+	pfn = start_pfn;
+	while (pfn < end_pfn) {
+		if (!pfn_valid(pfn)) {
+			pfn++;
+			continue;
+		}
+		page = pfn_to_page(pfn);
+		BUG_ON(page_count(page));
+		BUG_ON(!PageBuddy(page));
+		order = page_order(page);
+#ifdef CONFIG_DEBUG_VM
+		printk("remove from free list %lx %d %lx\n",
+		       pfn, 1 << order, end_pfn);
+#endif
+		list_del(&page->lru);
+		rmv_page_order(page);
+		zone->free_area[order].nr_free--;
+		__mod_zone_page_state(zone, NR_FREE_PAGES,
+				      - (1UL << order));
+		for (i = 0; i < (1 << order); i++)
+			SetPageReserved((page+i));
+		pfn += (1 << order);
+	}
+	spin_unlock_irqrestore(&zone->lock,flags);
+}
+#endif

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* memory unplug v4 intro [6/6] ia64 interface
  2007-06-08  5:35 memory unplug v4 intro [0/6] KAMEZAWA Hiroyuki
                   ` (4 preceding siblings ...)
  2007-06-08  5:43 ` memory unplug v4 intro [5/6] page offlining KAMEZAWA Hiroyuki
@ 2007-06-08  5:43 ` KAMEZAWA Hiroyuki
  5 siblings, 0 replies; 20+ messages in thread
From: KAMEZAWA Hiroyuki @ 2007-06-08  5:43 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki; +Cc: linux-mm, mel, y-goto, clameter, hugh

IA64 memory unplug interface.

Signed-Off-By: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

---
 arch/ia64/mm/init.c |   12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

Index: devel-2.6.22-rc4-mm2/arch/ia64/mm/init.c
===================================================================
--- devel-2.6.22-rc4-mm2.orig/arch/ia64/mm/init.c
+++ devel-2.6.22-rc4-mm2/arch/ia64/mm/init.c
@@ -724,7 +724,17 @@ int arch_add_memory(int nid, u64 start, 
 
 int remove_memory(u64 start, u64 size)
 {
-	return -EINVAL;
+	unsigned long start_pfn, end_pfn;
+	unsigned long timeout = 120 * HZ;
+	int ret;
+	start_pfn = start >> PAGE_SHIFT;
+	end_pfn = start_pfn + (size >> PAGE_SHIFT);
+	ret = offline_pages(start_pfn, end_pfn, timeout);
+	if (ret)
+		goto out;
+	/* we can free mem_map at this point */
+out:
+	return ret;
 }
 EXPORT_SYMBOL_GPL(remove_memory);
 #endif

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2007-06-08 13:59 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2007-06-08  5:35 memory unplug v4 intro [0/6] KAMEZAWA Hiroyuki
2007-06-08  5:38 ` memory unplug v4 intro [1/6] migration without mm->sem KAMEZAWA Hiroyuki
2007-06-08  5:47   ` Christoph Lameter
2007-06-08  5:54     ` KAMEZAWA Hiroyuki
2007-06-08  5:57       ` Christoph Lameter
2007-06-08  6:06         ` KAMEZAWA Hiroyuki
2007-06-08  6:44           ` Christoph Lameter
2007-06-08  7:01             ` KAMEZAWA Hiroyuki
2007-06-08  7:21               ` Christoph Lameter
2007-06-08  7:25                 ` KAMEZAWA Hiroyuki
2007-06-08  5:39 ` memory unplug v4 [2/6] lru isolation race fix KAMEZAWA Hiroyuki
2007-06-08  5:52   ` Christoph Lameter
2007-06-08  5:58     ` KAMEZAWA Hiroyuki
2007-06-08  5:58       ` Christoph Lameter
2007-06-08  5:40 ` memory unplug v4 intro [3/6] walk memory resources KAMEZAWA Hiroyuki
2007-06-08  5:41 ` memory unplug v4 intro [4/6] page isolation KAMEZAWA Hiroyuki
2007-06-08 13:24   ` Mel Gorman
2007-06-08 13:59     ` KAMEZAWA Hiroyuki
2007-06-08  5:43 ` memory unplug v4 intro [5/6] page offlining KAMEZAWA Hiroyuki
2007-06-08  5:43 ` memory unplug v4 intro [6/6] ia64 interface KAMEZAWA Hiroyuki

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox