linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Hirokazu Takahashi <taka@valinux.co.jp>
To: marcelo.tosatti@cyclades.com
Cc: linux-mm@kvack.org, iwamoto@valinux.co.jp, haveblue@us.ibm.com,
	hugh@veritas.com
Subject: Re: migration cache, updated
Date: Wed, 24 Nov 2004 19:21:56 +0900 (JST)	[thread overview]
Message-ID: <20041124.192156.73388074.taka@valinux.co.jp> (raw)
In-Reply-To: <20041123121447.GE4524@logos.cnet>

Hi Marcelo,

> > Hi Marcelo,
> > 
> > I've been testing the memory migration code with your patch.
> > I found problems and I think the attached patch would
> > fix some of them.
> > 
> > One of the problems is a race condition between add_to_migration_cache()
> > and try_to_unmap(). Some pages in the migration cache cannot
> > be removed with the current implementation. Please suppose
> > a process space might be removed between them. In this case
> > no one can remove pages the process had from the migration cache,
> > because they can be removed only when the pagetables pointed
> > the pages.
> 
> I guess I dont fully understand you Hirokazu.
> 
> unmap_vmas function (called by exit_mmap) calls zap_pte_range, 
> and that does:
> 
>                         if (pte_is_migration(pte)) {
>                                 migration_remove_entry(swp_entry);
>                         } else
>                                 free_swap_and_cache(swp_entry);
> 
> migration_remove_entry should decrease the IDR counter, and 
> remove the migration cache page on zero reference.
> 
> Am I missing something?

That's true only if the pte points a migration entry.
However, the pte may not point it when zap_pte_range() is called
in some case.

Please suppose the following flow.
Any process may exit or munmap during memory migration
before calling set_pte(migration entry). This will
keep some unreferenced pages in the migration cache.
No one can remove these pages.

  <start page migration>                  <Process A>
        |                                      |
        |                                      |
        |                                      |
 add_to_migration_cache()                      |
    insert a page of Process A  ----------->   |
    in the migration cache.                    |
        |                                      |
        |                               zap_pte_range()
        |                   X <------------ migration_remove_entry()
        |                      the pte associated with the page doesn't
        |                      point any migration entries.
        |
        |
 try_to_unmap() -----------------------> X
     migration_duplicate()       no pte mapping the page can be found.
     set_pte(migration entry)
        |
        |
 migrate_fn()
        |
        |
    <finish>
         the page still remains in the migration cache.
	 the page may be referred by no process.


> I assume you are seeing this problems in practice?

Yes, it often happens without the patch.

> Sorry for the delay, been busy with other things.

No problem. Everyone knows you're doing hard work!

> > Therefore, I made pages removed from the migration cache
> > at the end of generic_migrate_page() if they remain in the cache.
> > 
> > The another is a fork() related problem. If fork() has occurred
> > during page migration, the previous work may not go well.
> > pages may not be removed from the migration cache.
> > 
> > So I made the swapcode ignore pages in the migration cache.
> > However, as you know this is just a workaround and not a correct
> > way to fix it.
> > 
> > > Hi Hirokazu!
> > > 
> > > The problem is that another thread can fault in the pte 
> > > (removing the radix tree entry) while the current thread dropped the 
> > > page_table_lock - which explains the NULL lookup_migration_cache. 
> > > The swap code handles this situation, but I've completly missed it. 
> > > 
> > > Updated patch attached.
> > > 
> > > Extreme thanks for your testing, its being crucial! 
> > > 
> > > We're getting there.
> > > 
> > > do_swap_page now does:
> > > 
> > >  again:
> > > +       if (pte_is_migration(orig_pte)) {
> > > +               page = lookup_migration_cache(entry.val);
> > > +               if (!page) {
> > > +                       spin_lock(&mm->page_table_lock);
> > > +                       page_table = pte_offset_map(pmd, address);
> > > +                       if (likely(pte_same(*page_table, orig_pte)))
> > > +                               ret = VM_FAULT_OOM;
> > > +                       else
> > > +                               ret = VM_FAULT_MINOR;
> > > +                       pte_unmap(page_table);
> > > +                       spin_unlock(&mm->page_table_lock);
> > > +                       goto out;
> > > +               }
> > > +       } else {
> > > 
> > 
> > 
> > 
> > Signed-off-by: Hirokazu Takahashi <taka@valinux.co.jp>
> > ---
> > 
> >  linux-2.6.9-rc4-taka/mm/memory.c   |    2 +-
> >  linux-2.6.9-rc4-taka/mm/mmigrate.c |   28 +++++++++++++++++++++-------
> >  linux-2.6.9-rc4-taka/mm/vmscan.c   |    4 ++++
> >  3 files changed, 26 insertions, 8 deletions
> > 
> > diff -puN mm/mmigrate.c~marcelo-FIX1 mm/mmigrate.c
> > --- linux-2.6.9-rc4/mm/mmigrate.c~marcelo-FIX1	Tue Nov 16 10:43:56 2004
> > +++ linux-2.6.9-rc4-taka/mm/mmigrate.c	Tue Nov 16 11:07:10 2004
> > @@ -114,14 +114,14 @@ int migration_remove_entry(swp_entry_t e
> >  
> >  	lock_page(page);	
> >  
> > -	migration_remove_reference(page);
> > +	migration_remove_reference(page, 1);
> >  
> >  	unlock_page(page);
> >  
> >  	page_cache_release(page);
> >  }
> >  
> > -int migration_remove_reference(struct page *page)
> > +int migration_remove_reference(struct page *page, int dec)
> >  {
> >  	struct counter *c;
> >  	swp_entry_t entry;
> > @@ -134,10 +134,9 @@ int migration_remove_reference(struct pa
> >  
> >  	read_unlock_irq(&migration_space.tree_lock);
> >  
> > -	if (!c->i)
> > -		BUG();
> > +	BUG_ON(c->i < dec);
> >  
> > -	c->i--;
> > +	c->i -= dec;
> >  
> >  	if (!c->i) {
> >  		remove_from_migration_cache(page, page->private);
> > @@ -146,6 +145,15 @@ int migration_remove_reference(struct pa
> >  	}
> >  }
> >  
> > +int detach_from_migration_cache(struct page *page)
> > +{
> > +	lock_page(page);	
> > +	migration_remove_reference(page, 0);
> > +	unlock_page(page);
> > +
> > +	return 0;
> > +}
> > +
> >  int add_to_migration_cache(struct page *page, int gfp_mask) 
> >  {
> >  	int error, offset;
> > @@ -522,7 +530,9 @@ generic_migrate_page(struct page *page, 
> >  
> >  	/* map the newpage where the old page have been mapped. */
> >  	touch_unmapped_address(&vlist);
> > -	if (PageSwapCache(newpage)) {
> > +	if (PageMigration(newpage))
> > +		detach_from_migration_cache(newpage);
> > +	else if (PageSwapCache(newpage)) {
> >  		lock_page(newpage);
> >  		__remove_exclusive_swap_page(newpage, 1);
> >  		unlock_page(newpage);
> 
> I dont see this code on 2.6.9-rc2-mm4-mhp, I should upgrade.
> 
> > @@ -538,7 +548,9 @@ out_busy:
> >  	/* Roll back all operations. */
> >  	unwind_page(page, newpage);
> >  	touch_unmapped_address(&vlist);
> > -	if (PageSwapCache(page)) {
> > +	if (PageMigration(page))
> > +		detach_from_migration_cache(page);
> > +	else if (PageSwapCache(page)) {
> >  		lock_page(page);
> >  		__remove_exclusive_swap_page(page, 1);
> >  		unlock_page(page);
> > @@ -550,6 +562,8 @@ out_removing:
> >  		BUG();
> >  	unlock_page(page);
> >  	unlock_page(newpage);
> > +	if (PageMigration(page))
> > +		detach_from_migration_cache(page);
> >  	return ret;
> >  }
> >  
> > diff -puN mm/vmscan.c~marcelo-FIX1 mm/vmscan.c
> > --- linux-2.6.9-rc4/mm/vmscan.c~marcelo-FIX1	Mon Nov 15 12:20:35 2004
> > +++ linux-2.6.9-rc4-taka/mm/vmscan.c	Tue Nov 16 11:06:06 2004
> > @@ -459,6 +459,10 @@ int shrink_list(struct list_head *page_l
> >  			goto keep_locked;
> >  		}
> >  
> > +		if (PageMigration(page)) {
> > +			write_unlock_irq(&mapping->tree_lock);
> > +			goto keep_locked;
> > +		}
> >  #ifdef CONFIG_SWAP
> >  		if (PageSwapCache(page)) {
> >  			swp_entry_t swap = { .val = page->private };
> > diff -puN mm/memory.c~marcelo-FIX1 mm/memory.c
> > --- linux-2.6.9-rc4/mm/memory.c~marcelo-FIX1	Tue Nov 16 11:06:31 2004
> > +++ linux-2.6.9-rc4-taka/mm/memory.c	Tue Nov 16 11:06:57 2004
> > @@ -1621,7 +1621,7 @@ again:
> >  		if (vm_swap_full())
> >  			remove_exclusive_swap_page(page);
> >  	} else {
> > -		migration_remove_reference(page);
> > +		migration_remove_reference(page, 1);
> >  	}
> >  
> >  	mm->rss++;
> > _
> 
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"aart@kvack.org"> aart@kvack.org </a>

  reply	other threads:[~2004-11-24 10:21 UTC|newest]

Thread overview: 48+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2004-10-25 21:39 Marcelo Tosatti
2004-10-26  1:17 ` Hiroyuki KAMEZAWA
2004-10-26 12:01   ` Marcelo Tosatti
2004-10-26 23:47     ` Hiroyuki KAMEZAWA
2004-10-26  6:37 ` Hirokazu Takahashi
2004-10-26  9:20   ` Marcelo Tosatti
2004-10-26 13:45     ` Hirokazu Takahashi
2004-10-26 11:41       ` Marcelo Tosatti
2004-10-27 13:40       ` Hirokazu Takahashi
2004-10-26  9:15 ` Hirokazu Takahashi
2004-10-26  9:25   ` Marcelo Tosatti
2004-10-26 14:01     ` Hirokazu Takahashi
2004-10-26 12:24       ` Marcelo Tosatti
2004-10-27  7:25         ` IWAMOTO Toshihiro
2004-10-27 16:27           ` Marcelo Tosatti
2004-10-27 13:48         ` Hirokazu Takahashi
2004-10-28 15:19           ` Marcelo Tosatti
2004-10-28 16:05             ` Marcelo Tosatti
2004-10-28 18:51               ` Dave Hansen
2004-10-28 16:26                 ` Marcelo Tosatti
2004-10-28 20:24                   ` Dave Hansen
2004-11-03 15:21                   ` Marcelo Tosatti
2004-11-04  8:01                     ` Hirokazu Takahashi
2004-11-05 13:49               ` Hirokazu Takahashi
2004-11-05 15:16                 ` Marcelo Tosatti
2004-11-16  4:07                   ` Hirokazu Takahashi
2004-11-23 12:14                     ` Marcelo Tosatti
2004-11-24 10:21                       ` Hirokazu Takahashi [this message]
2004-12-01 20:21                         ` Marcelo Tosatti
2004-12-08 13:23                           ` Hirokazu Takahashi
2005-01-17  9:59                             ` Marcelo Tosatti
2005-01-31 18:33                               ` Ray Bryant
2005-01-31 18:44                                 ` Marcelo Tosatti
2005-02-02 21:28                                   ` Ray Bryant
2005-02-03  2:59                                     ` Hirokazu Takahashi
2005-02-03 15:19                                       ` Ray Bryant
2005-02-04  7:32                                         ` Hirokazu Takahashi
2005-02-04 16:08                                           ` Ray Bryant
2005-02-07 12:46                                             ` Hirokazu Takahashi
2005-02-07 20:54                                               ` Ray Bryant
2005-02-08  2:17                                                 ` Hirokazu Takahashi
     [not found]                                                   ` <42083913.9050306@sgi.com>
     [not found]                                                     ` <20050209.151938.63052333.taka@valinux.co.jp>
2005-02-09 20:48                                                       ` Ray Bryant
2005-02-07 13:16                                     ` Hirokazu Takahashi
2005-02-03  2:49                               ` Hirokazu Takahashi
2005-01-03 19:04 page migration Ray Bryant
2005-01-03 19:37 ` Dave Hansen
2005-01-03 20:15   ` Ray Bryant
2005-01-04 14:42     ` Hirokazu Takahashi
2005-01-04 17:30       ` Ray Bryant
2005-01-04 17:40         ` process " Dave Hansen
2005-01-04 18:26           ` Ray Bryant
2005-01-07 16:57             ` migration cache, updated Ray Bryant
2005-01-10 10:07               ` Marcelo Tosatti
2005-01-03 19:25 Ray Bryant
2005-02-06  2:02 Marcelo Tosatti

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20041124.192156.73388074.taka@valinux.co.jp \
    --to=taka@valinux.co.jp \
    --cc=haveblue@us.ibm.com \
    --cc=hugh@veritas.com \
    --cc=iwamoto@valinux.co.jp \
    --cc=linux-mm@kvack.org \
    --cc=marcelo.tosatti@cyclades.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox