* [-mm][PATCH 1/10] fix UNEVICTABLE_LRU and !PROC_PAGE_MONITOR build
2008-06-25 9:59 [-mm][PATCH 0/10] memory related bugfix set for 2.6.26-rc5-mm3 v2 KOSAKI Motohiro
@ 2008-06-25 10:01 ` KOSAKI Motohiro
2008-07-03 5:36 ` Andrew Morton
2008-06-25 10:02 ` [-mm][PATCH 2/10] fix printk in show_free_areas() KOSAKI Motohiro
` (9 subsequent siblings)
10 siblings, 1 reply; 28+ messages in thread
From: KOSAKI Motohiro @ 2008-06-25 10:01 UTC (permalink / raw)
To: LKML, linux-mm, Andrew Morton, Lee Schermerhorn, Rik van Riel,
Benjamin Kidwell
Cc: kosaki.motohiro
From: Rik van Riel <riel@redhat.com>
Both CONFIG_PROC_PAGE_MONITOR and CONFIG_UNEVICTABLE_LRU depend on
mm/pagewalk.c being built. Create a CONFIG_PAGE_WALKER Kconfig
variable that is automatically selected if needed.
Debugged-by: Benjamin Kidwell <benjkidwell@yahoo.com>
Signed-off-by: Rik van Riel <riel@redhat.com>
Signed-off-by: KOSAKI Motohiro <kosaki@jp.fujitsu.com>
---
init/Kconfig | 1 +
mm/Kconfig | 5 +++++
mm/Makefile | 2 +-
3 files changed, 7 insertions(+), 1 deletion(-)
Index: b/init/Kconfig
===================================================================
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -803,6 +803,7 @@ source "arch/Kconfig"
config PROC_PAGE_MONITOR
default y
depends on PROC_FS && MMU
+ select PAGE_WALKER
bool "Enable /proc page monitoring" if EMBEDDED
help
Various /proc files exist to monitor process memory utilization:
Index: b/mm/Kconfig
===================================================================
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -209,9 +209,14 @@ config VIRT_TO_BUS
def_bool y
depends on !ARCH_NO_VIRT_TO_BUS
+# automatically selected by UNEVICTABLE_LRU or PROC_PAGE_MONITOR
+config PAGE_WALKER
+ def_bool n
+
config UNEVICTABLE_LRU
bool "Add LRU list to track non-evictable pages"
default y
+ select PAGE_WALKER
help
Keeps unevictable pages off of the active and inactive pageout
lists, so kswapd will not waste CPU time or have its balancing
Index: b/mm/Makefile
===================================================================
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -13,7 +13,7 @@ obj-y := bootmem.o filemap.o mempool.o
prio_tree.o util.o mmzone.o vmstat.o backing-dev.o \
page_isolation.o $(mmu-y)
-obj-$(CONFIG_PROC_PAGE_MONITOR) += pagewalk.o
+obj-$(CONFIG_PAGE_WALKER) += pagewalk.o
obj-$(CONFIG_BOUNCE) += bounce.o
obj-$(CONFIG_SWAP) += page_io.o swap_state.o swapfile.o thrash.o
obj-$(CONFIG_HAS_DMA) += dmapool.o
* Re: [-mm][PATCH 1/10] fix UNEVICTABLE_LRU and !PROC_PAGE_MONITOR build
2008-06-25 10:01 ` [-mm][PATCH 1/10] fix UNEVICTABLE_LRU and !PROC_PAGE_MONITOR build KOSAKI Motohiro
@ 2008-07-03 5:36 ` Andrew Morton
2008-07-03 6:02 ` KOSAKI Motohiro
0 siblings, 1 reply; 28+ messages in thread
From: Andrew Morton @ 2008-07-03 5:36 UTC (permalink / raw)
To: KOSAKI Motohiro
Cc: LKML, linux-mm, Lee Schermerhorn, Rik van Riel, Benjamin Kidwell
On Wed, 25 Jun 2008 19:01:40 +0900 KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> wrote:
>
> =
> From: Rik van Riel <riel@redhat.com>
>
> Both CONFIG_PROC_PAGE_MONITOR and CONFIG_UNEVICTABLE_LRU depend on
> mm/pagewalk.c being built. Create a CONFIG_PAGE_WALKER Kconfig
> variable that is automatically selected if needed.
>
> Debugged-by: Benjamin Kidwell <benjkidwell@yahoo.com>
> Signed-off-by: Rik van Riel <riel@redhat.com>
> Signed-off-by: KOSAKI Motohiro <kosaki@jp.fujitsu.com>
>
> ---
> init/Kconfig | 1 +
> mm/Kconfig | 5 +++++
> mm/Makefile | 2 +-
> 3 files changed, 7 insertions(+), 1 deletion(-)
>
> Index: b/init/Kconfig
> ===================================================================
> --- a/init/Kconfig
> +++ b/init/Kconfig
> @@ -803,6 +803,7 @@ source "arch/Kconfig"
> config PROC_PAGE_MONITOR
> default y
> depends on PROC_FS && MMU
> + select PAGE_WALKER
> bool "Enable /proc page monitoring" if EMBEDDED
> help
> Various /proc files exist to monitor process memory utilization:
You used select! With the usual consequences.
mm/pagewalk.c: In function `walk_pud_range':
mm/pagewalk.c:64: error: implicit declaration of function `pud_none_or_clear_bad'
mm/pagewalk.c: In function `walk_page_range':
mm/pagewalk.c:119: error: implicit declaration of function `pgd_addr_end'
mm/pagewalk.c:120: error: implicit declaration of function `pgd_none_or_clear_bad'
That's SuperH allmodconfig. I expect all nommu builds are busted.
> Index: b/mm/Kconfig
> ===================================================================
> --- a/mm/Kconfig
> +++ b/mm/Kconfig
> @@ -209,9 +209,14 @@ config VIRT_TO_BUS
> def_bool y
> depends on !ARCH_NO_VIRT_TO_BUS
>
> +# automatically selected by UNEVICTABLE_LRU or PROC_PAGE_MONITOR
> +config PAGE_WALKER
> + def_bool n
> +
> config UNEVICTABLE_LRU
> bool "Add LRU list to track non-evictable pages"
> default y
> + select PAGE_WALKER
So what do we do? Make UNEVICTABLE_LRU depend on CONFIG_MMU? That
would be even worse than what we have now.
* Re: [-mm][PATCH 1/10] fix UNEVICTABLE_LRU and !PROC_PAGE_MONITOR build
2008-07-03 5:36 ` Andrew Morton
@ 2008-07-03 6:02 ` KOSAKI Motohiro
2008-07-03 13:16 ` Rik van Riel
0 siblings, 1 reply; 28+ messages in thread
From: KOSAKI Motohiro @ 2008-07-03 6:02 UTC (permalink / raw)
To: Andrew Morton
Cc: kosaki.motohiro, LKML, linux-mm, Lee Schermerhorn, Rik van Riel,
Benjamin Kidwell
> > config UNEVICTABLE_LRU
> > bool "Add LRU list to track non-evictable pages"
> > default y
> > + select PAGE_WALKER
>
> So what do we do? Make UNEVICTABLE_LRU depend on CONFIG_MMU? That
> would be even worse than what we have now.
I'm not sure what we should do, but I'd prefer "depends on MMU",
because the current munlock implementation needs the page walker.
Rewriting munlock carries more risk than changing the dependency.
Rik, what do you think?
* Re: [-mm][PATCH 1/10] fix UNEVICTABLE_LRU and !PROC_PAGE_MONITOR build
2008-07-03 6:02 ` KOSAKI Motohiro
@ 2008-07-03 13:16 ` Rik van Riel
0 siblings, 0 replies; 28+ messages in thread
From: Rik van Riel @ 2008-07-03 13:16 UTC (permalink / raw)
To: KOSAKI Motohiro
Cc: Andrew Morton, LKML, linux-mm, Lee Schermerhorn, Benjamin Kidwell
On Thu, 03 Jul 2008 15:02:23 +0900
KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> wrote:
> > > config UNEVICTABLE_LRU
> > > bool "Add LRU list to track non-evictable pages"
> > > default y
> > > + select PAGE_WALKER
> >
> > So what do we do? Make UNEVICTABLE_LRU depend on CONFIG_MMU? That
> > would be even worse than what we have now.
>
> I'm not sure about what do we do. but I'd prefer "depends on MMU".
> because current munlock implementation need pagewalker.
> So, munlock rewriting have high risk rather than change depend on.
>
> Rik, What do you think?
I suspect that systems without an MMU will not run into
page replacement scalability issues, so making the
UNEVICTABLE_LRU config option depend on MMU should be
ok.
--
All rights reversed.
* [-mm][PATCH 2/10] fix printk in show_free_areas()
2008-06-25 9:59 [-mm][PATCH 0/10] memory related bugfix set for 2.6.26-rc5-mm3 v2 KOSAKI Motohiro
2008-06-25 10:01 ` [-mm][PATCH 1/10] fix UNEVICTABLE_LRU and !PROC_PAGE_MONITOR build KOSAKI Motohiro
@ 2008-06-25 10:02 ` KOSAKI Motohiro
2008-06-25 10:03 ` [-mm][PATCH 3/10] fix munlock page table walk - now requires 'mm' KOSAKI Motohiro
` (8 subsequent siblings)
10 siblings, 0 replies; 28+ messages in thread
From: KOSAKI Motohiro @ 2008-06-25 10:02 UTC (permalink / raw)
To: LKML, linux-mm, Andrew Morton, Lee Schermerhorn, Rik van Riel,
Hiroshi Shimamoto
Cc: kosaki.motohiro
From: Rik van Riel <riel@redhat.com>
Fix typo in show_free_areas.
Debugged-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
Signed-off-by: Rik van Riel <riel@redhat.com>
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
---
mm/page_alloc.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
Index: b/mm/page_alloc.c
===================================================================
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1931,7 +1931,7 @@ void show_free_areas(void)
}
}
- printk("Active_anon:%lu active_file:%lu inactive_anon%lu\n"
+ printk("Active_anon:%lu active_file:%lu inactive_anon:%lu\n"
" inactive_file:%lu"
//TODO: check/adjust line lengths
#ifdef CONFIG_UNEVICTABLE_LRU
* [-mm][PATCH 3/10] fix munlock page table walk - now requires 'mm'
2008-06-25 9:59 [-mm][PATCH 0/10] memory related bugfix set for 2.6.26-rc5-mm3 v2 KOSAKI Motohiro
2008-06-25 10:01 ` [-mm][PATCH 1/10] fix UNEVICTABLE_LRU and !PROC_PAGE_MONITOR build KOSAKI Motohiro
2008-06-25 10:02 ` [-mm][PATCH 2/10] fix printk in show_free_areas() KOSAKI Motohiro
@ 2008-06-25 10:03 ` KOSAKI Motohiro
2008-06-25 10:04 ` [-mm][PATCH 4/10] fix migration_entry_wait() for speculative page cache KOSAKI Motohiro
` (7 subsequent siblings)
10 siblings, 0 replies; 28+ messages in thread
From: KOSAKI Motohiro @ 2008-06-25 10:03 UTC (permalink / raw)
To: LKML, linux-mm, Andrew Morton, Lee Schermerhorn, Rik van Riel
Cc: kosaki.motohiro
From: Lee Schermerhorn <lee.schermerhorn@hp.com>
Against 2.6.26-rc5-mm3.
Initialize the 'mm' member of the mm_walk structure, else the
page table walk doesn't occur, and mlocked pages will not be
munlocked. This is visible in the vmstats:
noreclaim_pgs_munlocked - should equal noreclaim_pgs_mlocked
less (nr_mlock + noreclaim_pgs_cleared), but is always zero
[munlock_vma_page() never called]
noreclaim_pgs_mlockfreed - should be zero [for debug only],
but == noreclaim_pgs_mlocked - (nr_mlock + noreclaim_pgs_cleared)
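For reference, a minimal sketch of a pagewalk caller with this fix applied
(this assumes the 2.6.26-rc5-mm3 pagewalk API, where walk_page_range() takes
start/end and reads the mm from struct mm_walk; the function names below are
made up for illustration only):

	static int sketch_pte_handler(pte_t *ptep, unsigned long addr,
				      unsigned long end, struct mm_walk *walk)
	{
		/* per-pte callback; walk->mm and walk->private are available here */
		return 0;
	}

	static void sketch_walk_vma(struct vm_area_struct *vma,
				    unsigned long start, unsigned long end)
	{
		struct mm_walk walk = {
			.pte_entry = sketch_pte_handler,
			.private = NULL,
			.mm = vma->vm_mm,	/* without this, the walk has no page tables to visit */
		};

		walk_page_range(start, end, &walk);
	}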
Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
mm/mlock.c | 1 +
1 file changed, 1 insertion(+)
Index: b/mm/mlock.c
===================================================================
--- a/mm/mlock.c
+++ b/mm/mlock.c
@@ -301,6 +301,7 @@ static void __munlock_vma_pages_range(st
.pmd_entry = __munlock_pmd_handler,
.pte_entry = __munlock_pte_handler,
.private = &mpw,
+ .mm = mm,
};
VM_BUG_ON(start & ~PAGE_MASK || end & ~PAGE_MASK);
* [-mm][PATCH 4/10] fix migration_entry_wait() for speculative page cache
2008-06-25 9:59 [-mm][PATCH 0/10] memory related bugfix set for 2.6.26-rc5-mm3 v2 KOSAKI Motohiro
` (2 preceding siblings ...)
2008-06-25 10:03 ` [-mm][PATCH 3/10] fix munlock page table walk - now requires 'mm' KOSAKI Motohiro
@ 2008-06-25 10:04 ` KOSAKI Motohiro
2008-06-25 10:05 ` [-mm][PATCH 5/10] collect lru meminfo statistics from correct offset KOSAKI Motohiro
` (6 subsequent siblings)
10 siblings, 0 replies; 28+ messages in thread
From: KOSAKI Motohiro @ 2008-06-25 10:04 UTC (permalink / raw)
To: LKML, linux-mm, Andrew Morton, Lee Schermerhorn, Rik van Riel,
Daisuke Nishimura
Cc: kosaki.motohiro
From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
In the speculative page cache lookup protocol, page_count(page) is set to 0
while radix-tree modification is going on (truncation, migration, etc.).
During page migration, a page fault on a page under migration should wait for
unlock_page(), and migration_entry_wait() waits for the page found via its
pte entry. It currently does get_page() -> wait_on_page_locked() -> put_page().
Page migration calls page_freeze_refs() -> page_unfreeze_refs(), and
page_unfreeze_refs() expects page_count(page) == 0 and panics
if page_count(page) != 0. To avoid this, we must not touch page_count()
while it is frozen at zero. This patch uses get_page_unless_zero() to avoid
the panic.
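As an illustration of the reference pattern the fix relies on (a hedged
sketch, not part of the patch; the helper name is invented):

	static void wait_for_migrating_page(struct page *page)
	{
		/*
		 * The refcount may be frozen to zero by page migration, so
		 * take the reference conditionally and back off on failure;
		 * the page fault will simply be retried and, by then, find
		 * the migration entry gone.
		 */
		if (!get_page_unless_zero(page))
			return;
		wait_on_page_locked(page);
		put_page(page);
	}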
Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
---
mm/migrate.c | 10 +++++++++-
1 file changed, 9 insertions(+), 1 deletion(-)
Index: b/mm/migrate.c
===================================================================
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -243,7 +243,15 @@ void migration_entry_wait(struct mm_stru
page = migration_entry_to_page(entry);
- get_page(page);
+ /*
+ * Once radix-tree replacement of page migration started, page_count
+ * *must* be zero. And, we don't want to call wait_on_page_locked()
+ * against a page without get_page().
+ * So, we use get_page_unless_zero(), here. Even failed, page fault
+ * will occur again.
+ */
+ if (!get_page_unless_zero(page))
+ goto out;
pte_unmap_unlock(ptep, ptl);
wait_on_page_locked(page);
put_page(page);
* [-mm][PATCH 5/10] collect lru meminfo statistics from correct offset
2008-06-25 9:59 [-mm][PATCH 0/10] memory related bugfix set for 2.6.26-rc5-mm3 v2 KOSAKI Motohiro
` (3 preceding siblings ...)
2008-06-25 10:04 ` [-mm][PATCH 4/10] fix migration_entry_wait() for speculative page cache KOSAKI Motohiro
@ 2008-06-25 10:05 ` KOSAKI Motohiro
2008-06-25 10:06 ` [-mm][PATCH 6/10] fix incorrect Mlocked field of /proc/meminfo KOSAKI Motohiro
` (5 subsequent siblings)
10 siblings, 0 replies; 28+ messages in thread
From: KOSAKI Motohiro @ 2008-06-25 10:05 UTC (permalink / raw)
To: LKML, linux-mm, Andrew Morton, Lee Schermerhorn, Rik van Riel
Cc: kosaki.motohiro
From: Lee Schermerhorn <lee.schermerhorn@hp.com>
Against: 2.6.26-rc5-mm3
Offset 'lru' by 'NR_LRU_BASE' to obtain global page state for
lru list 'lru'.
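(Why the offset is needed, as far as I can tell from the split-LRU counters
in this series: enum lru_list counts from 0, while the matching vmstat items
start at NR_LRU_BASE, i.e. NR_LRU_BASE + LRU_INACTIVE_ANON == NR_INACTIVE_ANON,
so the global counter for list 'lru' is global_page_state(NR_LRU_BASE + lru),
not global_page_state(lru).)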
Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
fs/proc/proc_misc.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
Index: b/fs/proc/proc_misc.c
===================================================================
--- a/fs/proc/proc_misc.c
+++ b/fs/proc/proc_misc.c
@@ -158,7 +158,7 @@ static int meminfo_read_proc(char *page,
get_vmalloc_info(&vmi);
for (lru = LRU_BASE; lru < NR_LRU_LISTS; lru++)
- pages[lru] = global_page_state(lru);
+ pages[lru] = global_page_state(NR_LRU_BASE + lru);
/*
* Tagged format, for easy grepping and expansion.
* [-mm][PATCH 6/10] fix incorrect Mlocked field of /proc/meminfo
2008-06-25 9:59 [-mm][PATCH 0/10] memory related bugfix set for 2.6.26-rc5-mm3 v2 KOSAKI Motohiro
` (4 preceding siblings ...)
2008-06-25 10:05 ` [-mm][PATCH 5/10] collect lru meminfo statistics from correct offset KOSAKI Motohiro
@ 2008-06-25 10:06 ` KOSAKI Motohiro
2008-06-25 10:07 ` [-mm][PATCH 7/10] prevent incorrect oom under split_lru KOSAKI Motohiro
` (4 subsequent siblings)
10 siblings, 0 replies; 28+ messages in thread
From: KOSAKI Motohiro @ 2008-06-25 10:06 UTC (permalink / raw)
To: LKML, linux-mm, Andrew Morton, Lee Schermerhorn, Rik van Riel,
Hugh Dickins
Cc: kosaki.motohiro
Against: 2.6.26-rc5-mm3
The Mlocked field of /proc/meminfo displays a bogus number because of a
trivial mistake in meminfo_read_proc(): NR_MLOCK is a zone statistic, not
an LRU list index, so it must be read with global_page_state() rather than
from the pages[] array.
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
---
fs/proc/proc_misc.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
Index: b/fs/proc/proc_misc.c
===================================================================
--- a/fs/proc/proc_misc.c
+++ b/fs/proc/proc_misc.c
@@ -216,7 +216,7 @@ static int meminfo_read_proc(char *page,
K(pages[LRU_INACTIVE_FILE]),
#ifdef CONFIG_UNEVICTABLE_LRU
K(pages[LRU_UNEVICTABLE]),
- K(pages[NR_MLOCK]),
+ K(global_page_state(NR_MLOCK)),
#endif
#ifdef CONFIG_HIGHMEM
K(i.totalhigh),
* [-mm][PATCH 7/10] prevent incorrect oom under split_lru
2008-06-25 9:59 [-mm][PATCH 0/10] memory related bugfix set for 2.6.26-rc5-mm3 v2 KOSAKI Motohiro
` (5 preceding siblings ...)
2008-06-25 10:06 ` [-mm][PATCH 6/10] fix incorrect Mlocked field of /proc/meminfo KOSAKI Motohiro
@ 2008-06-25 10:07 ` KOSAKI Motohiro
2008-06-25 10:09 ` [-mm][PATCH 8/10] fix shmem page migration incorrectness on memcgroup KOSAKI Motohiro
` (3 subsequent siblings)
10 siblings, 0 replies; 28+ messages in thread
From: KOSAKI Motohiro @ 2008-06-25 10:07 UTC (permalink / raw)
To: LKML, linux-mm, Andrew Morton, Lee Schermerhorn, Rik van Riel
Cc: kosaki.motohiro
If zone->recent_scanned becomes unbalanced between anon and file,
the OOM killer can fire even though swappable pages still exist.
So, at priority == 0, we should try to scan all pages to prevent a spurious OOM.
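For illustration (hypothetical numbers, not a measurement): if the
recent_scanned statistics have drifted so far that percent[anon] works out
to 0, the old code computes, even at priority 0,

	scan = zone_page_state(zone, NR_LRU_BASE + l);	/* say, 20000 anon pages */
	scan >>= priority;				/* still 20000 at priority 0 */
	scan = (scan * percent[file]) / 100;		/* percent[0] == 0 -> scan = 0 */

so the anon lists are never scanned and reclaim can declare OOM while
swappable pages still exist. With this patch, priority == 0 bypasses the
scaling and the whole list is scanned.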
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Acked-by: Rik van Riel <riel@redhat.com>
---
mm/vmscan.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
Index: b/mm/vmscan.c
===================================================================
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1460,8 +1460,10 @@ static unsigned long shrink_zone(int pri
* kernel will slowly sift through each list.
*/
scan = zone_page_state(zone, NR_LRU_BASE + l);
- scan >>= priority;
- scan = (scan * percent[file]) / 100;
+ if (priority) {
+ scan >>= priority;
+ scan = (scan * percent[file]) / 100;
+ }
zone->lru[l].nr_scan += scan + 1;
nr[l] = zone->lru[l].nr_scan;
if (nr[l] >= sc->swap_cluster_max)
* [-mm][PATCH 8/10] fix shmem page migration incorrectness on memcgroup
2008-06-25 9:59 [-mm][PATCH 0/10] memory related bugfix set for 2.6.26-rc5-mm3 v2 KOSAKI Motohiro
` (6 preceding siblings ...)
2008-06-25 10:07 ` [-mm][PATCH 7/10] prevent incorrect oom under split_lru KOSAKI Motohiro
@ 2008-06-25 10:09 ` KOSAKI Motohiro
2008-06-27 5:08 ` MinChan Kim
2008-06-25 10:10 ` [-mm][PATCH 9/10] memcg: fix mem_cgroup_end_migration() race KOSAKI Motohiro
` (2 subsequent siblings)
10 siblings, 1 reply; 28+ messages in thread
From: KOSAKI Motohiro @ 2008-06-25 10:09 UTC (permalink / raw)
To: LKML, linux-mm, Andrew Morton, Lee Schermerhorn, Rik van Riel,
KAMEZAWA Hiroyuki, Daisuke Nishimura
Cc: kosaki.motohiro
From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
mem_cgroup_uncharge() against the old page is done after the radix-tree
replacement, and there was special handling to ignore swap-cache pages.
But a shmem page can be swap-cache and file-cache at the same time, so
checking PageSwapCache() is not correct here. Check PageAnon() instead.
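To spell out the cases at this point in migrate_page_move_mapping() (a
summary of the situation as discussed later in this thread, not code):

	regular file cache:	!PageAnon, !PageSwapCache -> uncharge the cache charge here
	shmem gone to swap:	!PageAnon,  PageSwapCache -> also uncharge here; it was charged as cache
	anon swap cache:	 PageAnon,  PageSwapCache -> skip; mem_cgroup_uncharge_page() handles it later

hence PageAnon(), not PageSwapCache(), is the right test.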
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
---
mm/migrate.c | 11 +++++++++--
1 file changed, 9 insertions(+), 2 deletions(-)
Index: b/mm/migrate.c
===================================================================
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -332,7 +332,13 @@ static int migrate_page_move_mapping(str
__inc_zone_page_state(newpage, NR_FILE_PAGES);
spin_unlock_irq(&mapping->tree_lock);
- if (!PageSwapCache(newpage))
+
+ /*
+ * The page is removed from radix-tree implicitly.
+ * We uncharge it here but swap cache of anonymous page should be
+ * uncharged by mem_cgroup_ucharge_page().
+ */
+ if (!PageAnon(newpage))
mem_cgroup_uncharge_cache_page(page);
return 0;
@@ -381,7 +387,8 @@ static void migrate_page_copy(struct pag
/*
* SwapCache is removed implicitly. Uncharge against swapcache
* should be called after ClearPageSwapCache() because
- * mem_cgroup_uncharge_page checks the flag.
+ * mem_cgroup_uncharge_page checks the flag. shmem's swap cache
+ * is uncharged before here.
*/
mem_cgroup_uncharge_page(page);
}
* Re: [-mm][PATCH 8/10] fix shmem page migration incorrectness on memcgroup
2008-06-25 10:09 ` [-mm][PATCH 8/10] fix shmem page migration incorrectness on memcgroup KOSAKI Motohiro
@ 2008-06-27 5:08 ` MinChan Kim
2008-06-27 5:41 ` KOSAKI Motohiro
0 siblings, 1 reply; 28+ messages in thread
From: MinChan Kim @ 2008-06-27 5:08 UTC (permalink / raw)
To: KAMEZAWA Hiroyuki
Cc: LKML, linux-mm, Andrew Morton, Lee Schermerhorn, Rik van Riel,
KOSAKI Motohiro, Daisuke Nishimura
Hi, KAMEZAWA-san.
I have one question, just out of curiosity.
On Wed, Jun 25, 2008 at 7:09 PM, KOSAKI Motohiro
<kosaki.motohiro@jp.fujitsu.com> wrote:
>
> =
> From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
>
> mem_cgroup_uncharge() against old page is done after radix-tree-replacement.
> And there were special handling to ingore swap-cache page. But, shmem can
> be swap-cache and file-cache at the same time. Chekcing PageSwapCache() is
> not correct here. Check PageAnon() instead.
When/how can shmem be both swap-cache and file-cache?
I can't understand that situation.
Thanks. :)
> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> Acked-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
> Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
>
> ---
> mm/migrate.c | 11 +++++++++--
> 1 file changed, 9 insertions(+), 2 deletions(-)
>
> Index: b/mm/migrate.c
> ===================================================================
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -332,7 +332,13 @@ static int migrate_page_move_mapping(str
> __inc_zone_page_state(newpage, NR_FILE_PAGES);
>
> spin_unlock_irq(&mapping->tree_lock);
> - if (!PageSwapCache(newpage))
> +
> + /*
> + * The page is removed from radix-tree implicitly.
> + * We uncharge it here but swap cache of anonymous page should be
> + * uncharged by mem_cgroup_ucharge_page().
> + */
> + if (!PageAnon(newpage))
> mem_cgroup_uncharge_cache_page(page);
>
> return 0;
> @@ -381,7 +387,8 @@ static void migrate_page_copy(struct pag
> /*
> * SwapCache is removed implicitly. Uncharge against swapcache
> * should be called after ClearPageSwapCache() because
> - * mem_cgroup_uncharge_page checks the flag.
> + * mem_cgroup_uncharge_page checks the flag. shmem's swap cache
> + * is uncharged before here.
> */
> mem_cgroup_uncharge_page(page);
> }
>
>
--
Kinds regards,
MinChan Kim
* Re: [-mm][PATCH 8/10] fix shmem page migration incorrectness on memcgroup
2008-06-27 5:08 ` MinChan Kim
@ 2008-06-27 5:41 ` KOSAKI Motohiro
2008-06-27 7:57 ` MinChan Kim
0 siblings, 1 reply; 28+ messages in thread
From: KOSAKI Motohiro @ 2008-06-27 5:41 UTC (permalink / raw)
To: MinChan Kim
Cc: kosaki.motohiro, KAMEZAWA Hiroyuki, LKML, linux-mm,
Andrew Morton, Lee Schermerhorn, Rik van Riel, Daisuke Nishimura
> > mem_cgroup_uncharge() against old page is done after radix-tree-replacement.
> > And there were special handling to ingore swap-cache page. But, shmem can
> > be swap-cache and file-cache at the same time. Chekcing PageSwapCache() is
> > not correct here. Check PageAnon() instead.
>
> When/How shmem can be both swap-cache and file-cache ?
> I can't understand that situation.
Hi
see,
shmem_writepage()
-> add_to_swap_cache()
-> SetPageSwapCache()
BTW: his "file-cache" means !Anon, not !SwapBacked.
* Re: [-mm][PATCH 8/10] fix shmem page migration incorrectness on memcgroup
2008-06-27 5:41 ` KOSAKI Motohiro
@ 2008-06-27 7:57 ` MinChan Kim
2008-06-27 8:52 ` KAMEZAWA Hiroyuki
0 siblings, 1 reply; 28+ messages in thread
From: MinChan Kim @ 2008-06-27 7:57 UTC (permalink / raw)
To: KOSAKI Motohiro
Cc: KAMEZAWA Hiroyuki, LKML, linux-mm, Andrew Morton,
Lee Schermerhorn, Rik van Riel, Daisuke Nishimura
On Fri, Jun 27, 2008 at 2:41 PM, KOSAKI Motohiro
<kosaki.motohiro@jp.fujitsu.com> wrote:
>> > mem_cgroup_uncharge() against old page is done after radix-tree-replacement.
>> > And there were special handling to ingore swap-cache page. But, shmem can
>> > be swap-cache and file-cache at the same time. Chekcing PageSwapCache() is
>> > not correct here. Check PageAnon() instead.
>>
>> When/How shmem can be both swap-cache and file-cache ?
>> I can't understand that situation.
>
> Hi
>
> see,
>
> shmem_writepage()
> -> add_to_swap_cache()
> -> SetPageSwapCache()
>
>
> BTW: his file-cache mean !Anon, not mean !SwapBacked.
Hi KOSAKI-san.
Thanks for explaining.
In the migrate_page_move_mapping, the page was already locked in unmap_and_move.
Also, we have a lock for that page for calling shmem_writepage.
So I think race problem between shmem_writepage and
migrate_page_move_mapping don't occur.
But I am not sure I am right.
If I am wrong, could you tell me when race problem happen ? :)
>
>
--
Kinds regards,
MinChan Kim
* Re: [-mm][PATCH 8/10] fix shmem page migration incorrectness on memcgroup
* Re: [-mm][PATCH 8/10] fix shmem page migration incorrectness on memcgroup
2008-06-27 7:57 ` MinChan Kim
@ 2008-06-27 8:52 ` KAMEZAWA Hiroyuki
2008-06-27 9:29 ` Daisuke Nishimura
` (2 more replies)
0 siblings, 3 replies; 28+ messages in thread
From: KAMEZAWA Hiroyuki @ 2008-06-27 8:52 UTC (permalink / raw)
To: MinChan Kim
Cc: KOSAKI Motohiro, LKML, linux-mm, Andrew Morton, Lee Schermerhorn,
Rik van Riel, Daisuke Nishimura
On Fri, 27 Jun 2008 16:57:56 +0900
"MinChan Kim" <minchan.kim@gmail.com> wrote:
> On Fri, Jun 27, 2008 at 2:41 PM, KOSAKI Motohiro
> <kosaki.motohiro@jp.fujitsu.com> wrote:
> >> > mem_cgroup_uncharge() against old page is done after radix-tree-replacement.
> >> > And there were special handling to ingore swap-cache page. But, shmem can
> >> > be swap-cache and file-cache at the same time. Chekcing PageSwapCache() is
> >> > not correct here. Check PageAnon() instead.
> >>
> >> When/How shmem can be both swap-cache and file-cache ?
> >> I can't understand that situation.
> >
> > Hi
> >
> > see,
> >
> > shmem_writepage()
> > -> add_to_swap_cache()
> > -> SetPageSwapCache()
> >
> >
> > BTW: his file-cache mean !Anon, not mean !SwapBacked.
>
> Hi KOSAKI-san.
> Thanks for explaining.
>
> In the migrate_page_move_mapping, the page was already locked in unmap_and_move.
> Also, we have a lock for that page for calling shmem_writepage.
>
> So I think race problem between shmem_writepage and
> migrate_page_move_mapping don't occur.
> But I am not sure I am right.
>
> If I am wrong, could you tell me when race problem happen ? :)
>
You are right. I misunderstood the swap/shmem code; there is no race.
Hmm...
But situation is a bit complicated.
- shmem's page is charged as file-cache.
- shmem's swap cache is still charged by mem_cgroup_cache_charge() because
it's implicitly (to memcg) converted to swap cache.
- anon's swap cache is charged by mem_cgroup_uncharge_cache_page()
So, uncharging swap-cache of shmem by mem_cgroup_uncharge_cache_page() is valid.
Checking PageSwapCache() was bad and checking PageAnon() is good
(from a maintenance point of view).
I think the patch is valid but my patch description contains wrong information.
Andrew, could you drop this? I'll rewrite the patch description.
Sorry,
-Kame
* Re: [-mm][PATCH 8/10] fix shmem page migration incorrectness on memcgroup
2008-06-27 8:52 ` KAMEZAWA Hiroyuki
@ 2008-06-27 9:29 ` Daisuke Nishimura
2008-06-27 10:13 ` MinChan Kim
2008-06-27 12:24 ` kamezawa.hiroyu
2 siblings, 0 replies; 28+ messages in thread
From: Daisuke Nishimura @ 2008-06-27 9:29 UTC (permalink / raw)
To: KAMEZAWA Hiroyuki
Cc: MinChan Kim, KOSAKI Motohiro, LKML, linux-mm, Andrew Morton,
Lee Schermerhorn, Rik van Riel
On Fri, 27 Jun 2008 17:52:01 +0900, KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:
> On Fri, 27 Jun 2008 16:57:56 +0900
> "MinChan Kim" <minchan.kim@gmail.com> wrote:
>
> > On Fri, Jun 27, 2008 at 2:41 PM, KOSAKI Motohiro
> > <kosaki.motohiro@jp.fujitsu.com> wrote:
> > >> > mem_cgroup_uncharge() against old page is done after radix-tree-replacement.
> > >> > And there were special handling to ingore swap-cache page. But, shmem can
> > >> > be swap-cache and file-cache at the same time. Chekcing PageSwapCache() is
> > >> > not correct here. Check PageAnon() instead.
> > >>
> > >> When/How shmem can be both swap-cache and file-cache ?
> > >> I can't understand that situation.
> > >
> > > Hi
> > >
> > > see,
> > >
> > > shmem_writepage()
> > > -> add_to_swap_cache()
> > > -> SetPageSwapCache()
> > >
> > >
> > > BTW: his file-cache mean !Anon, not mean !SwapBacked.
> >
> > Hi KOSAKI-san.
> > Thanks for explaining.
> >
> > In the migrate_page_move_mapping, the page was already locked in unmap_and_move.
> > Also, we have a lock for that page for calling shmem_writepage.
> >
> > So I think race problem between shmem_writepage and
> > migrate_page_move_mapping don't occur.
> > But I am not sure I am right.
> >
> > If I am wrong, could you tell me when race problem happen ? :)
> >
> You are right. I misundestood the swap/shmem code. there is no race.
> Hmm...
>
> But situation is a bit complicated.
> - shmem's page is charged as file-cache.
> - shmem's swap cache is still charged by mem_cgroup_cache_charge() because
> it's implicitly (to memcg) converted to swap cache.
> - anon's swap cache is charged by mem_cgroup_uncharge_cache_page()
>
I'm sorry if I misunderstand something.
I think anon's swap cache is:
- charged nowhere as "cache".
If anon pages are also on swap cache, charges for them remain charged
even when mem_cgroup_uncharge_page() is called, because it checks PG_swapcache.
So, as a result, anon's swap cache is charged.
- uncharged by mem_cgroup_uncharge_page() in __delete_from_swap_cache()
after clearing PG_swapcache.
right?
> So, uncharging swap-cache of shmem by mem_cgroup_uncharge_cache_page() is valid.
> Checking PageSwapCache() was bad and Cheking PageAnon() is good.
> (From maintainance view)
>
agree.
Thanks,
Daisuke Nishimura.
* Re: [-mm][PATCH 8/10] fix shmem page migration incorrectness on memcgroup
2008-06-27 8:52 ` KAMEZAWA Hiroyuki
2008-06-27 9:29 ` Daisuke Nishimura
@ 2008-06-27 10:13 ` MinChan Kim
2008-06-27 12:24 ` kamezawa.hiroyu
2 siblings, 0 replies; 28+ messages in thread
From: MinChan Kim @ 2008-06-27 10:13 UTC (permalink / raw)
To: KAMEZAWA Hiroyuki
Cc: KOSAKI Motohiro, LKML, linux-mm, Andrew Morton, Lee Schermerhorn,
Rik van Riel, Daisuke Nishimura
On Fri, Jun 27, 2008 at 5:52 PM, KAMEZAWA Hiroyuki
<kamezawa.hiroyu@jp.fujitsu.com> wrote:
> On Fri, 27 Jun 2008 16:57:56 +0900
> "MinChan Kim" <minchan.kim@gmail.com> wrote:
>
>> On Fri, Jun 27, 2008 at 2:41 PM, KOSAKI Motohiro
>> <kosaki.motohiro@jp.fujitsu.com> wrote:
>> >> > mem_cgroup_uncharge() against old page is done after radix-tree-replacement.
>> >> > And there were special handling to ingore swap-cache page. But, shmem can
>> >> > be swap-cache and file-cache at the same time. Chekcing PageSwapCache() is
>> >> > not correct here. Check PageAnon() instead.
>> >>
>> >> When/How shmem can be both swap-cache and file-cache ?
>> >> I can't understand that situation.
>> >
>> > Hi
>> >
>> > see,
>> >
>> > shmem_writepage()
>> > -> add_to_swap_cache()
>> > -> SetPageSwapCache()
>> >
>> >
>> > BTW: his file-cache mean !Anon, not mean !SwapBacked.
>>
>> Hi KOSAKI-san.
>> Thanks for explaining.
>>
>> In the migrate_page_move_mapping, the page was already locked in unmap_and_move.
>> Also, we have a lock for that page for calling shmem_writepage.
>>
>> So I think race problem between shmem_writepage and
>> migrate_page_move_mapping don't occur.
>> But I am not sure I am right.
>>
>> If I am wrong, could you tell me when race problem happen ? :)
>>
> You are right. I misundestood the swap/shmem code. there is no race.
> Hmm...
>
> But situation is a bit complicated.
> - shmem's page is charged as file-cache.
> - shmem's swap cache is still charged by mem_cgroup_cache_charge() because
> it's implicitly (to memcg) converted to swap cache.
> - anon's swap cache is charged by mem_cgroup_uncharge_cache_page()
>
> So, uncharging swap-cache of shmem by mem_cgroup_uncharge_cache_page() is valid.
> Checking PageSwapCache() was bad and Cheking PageAnon() is good.
> (From maintainance view)
I agree.
I also think the patch itself is fine;
it is just a description problem.
> I think the patch is valid but my patch description contains wrong information.
> Andrew, could you drop this ? I'll rewrite the patch description.
>
> Sorry,
> -Kame
>
>
--
Kinds regards,
MinChan Kim
* Re: Re: [-mm][PATCH 8/10] fix shmem page migration incorrectness on memcgroup
2008-06-27 8:52 ` KAMEZAWA Hiroyuki
2008-06-27 9:29 ` Daisuke Nishimura
2008-06-27 10:13 ` MinChan Kim
@ 2008-06-27 12:24 ` kamezawa.hiroyu
2 siblings, 0 replies; 28+ messages in thread
From: kamezawa.hiroyu @ 2008-06-27 12:24 UTC (permalink / raw)
To: Daisuke Nishimura
Cc: KAMEZAWA Hiroyuki, MinChan Kim, KOSAKI Motohiro, LKML, linux-mm,
Andrew Morton, Lee Schermerhorn, Rik van Riel
----- Original Message -----
>> But situation is a bit complicated.
>> - shmem's page is charged as file-cache.
>> - shmem's swap cache is still charged by mem_cgroup_cache_charge() because
>> it's implicitly (to memcg) converted to swap cache.
>> - anon's swap cache is charged by mem_cgroup_uncharge_cache_page()
>>
>I'm sorry if I misunderstand something.
>
>I think anon's swap cache is:
>
>- charged by nowhere as "cache".
yes.
> If anon pages are also on swap cache, charges for them remain charged
> even when mem_cgroup_uncharge_page() is called, because it checks PG_swapcache.
> So, as a result, anon's swap cache is charged.
yes.
>- uncharged by memcgroup_uncharge_page() in __delete_from_swap_cache()
> after clearing PG_swapcache.
>
>right?
>
You're right. Sorry for the confusion.
Thanks,
-Kame
* [-mm][PATCH 9/10] memcg: fix mem_cgroup_end_migration() race
2008-06-25 9:59 [-mm][PATCH 0/10] memory related bugfix set for 2.6.26-rc5-mm3 v2 KOSAKI Motohiro
` (7 preceding siblings ...)
2008-06-25 10:09 ` [-mm][PATCH 8/10] fix shmem page migration incorrectness on memcgroup KOSAKI Motohiro
@ 2008-06-25 10:10 ` KOSAKI Motohiro
2008-06-25 10:53 ` Daisuke Nishimura
2008-06-25 10:11 ` [-mm][PATCH 10/10] putback_lru_page()/unevictable page handling rework v4 KOSAKI Motohiro
2008-06-25 15:09 ` [-mm][PATCH 0/10] memory related bugfix set for 2.6.26-rc5-mm3 v2 Lee Schermerhorn
10 siblings, 1 reply; 28+ messages in thread
From: KOSAKI Motohiro @ 2008-06-25 10:10 UTC (permalink / raw)
To: LKML, linux-mm, Andrew Morton, Lee Schermerhorn, Rik van Riel,
KAMEZAWA Hiroyuki, Balbir Singh
Cc: kosaki.motohiro
From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
In general, mem_cgroup's charge on an ANON page is removed when page_remove_rmap()
is called.
At migration, the newpage is remapped again by remove_migration_ptes(), but the
pte may already have changed (e.g. because the task exited).
The newpage is charged at allocation but then has no chance to be uncharged in
that case, because it is never added to the rmap.
Handle that corner case in mem_cgroup_end_migration().
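An illustrative sequence of the leak being fixed (a sketch of the scenario
described above, not literal code):

	1. unmap_and_move() sets up newpage; the memcg charge for it is taken
	   when the migration is prepared.
	2. The owning task exits, so the old pte goes away.
	3. remove_migration_ptes() finds nothing to remap, and newpage is
	   never added to anyone's rmap.
	4. page_remove_rmap() therefore never runs for newpage, so without
	   the PageAnon(newpage) uncharge in mem_cgroup_end_migration() the
	   charge would leak.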
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com>
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
---
mm/memcontrol.c | 14 +++++++++++++-
1 file changed, 13 insertions(+), 1 deletion(-)
Index: b/mm/memcontrol.c
===================================================================
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -747,10 +747,22 @@ int mem_cgroup_prepare_migration(struct
/* remove redundant charge if migration failed*/
void mem_cgroup_end_migration(struct page *newpage)
{
- /* At success, page->mapping is not NULL and nothing to do. */
+ /*
+ * At success, page->mapping is not NULL.
+ * special rollback care is necessary when
+ * 1. at migration failure. (newpage->mapping is cleared in this case)
+ * 2. the newpage was moved but not remapped again because the task
+ * exits and the newpage is obsolete. In this case, the new page
+ * may be a swapcache. So, we just call mem_cgroup_uncharge_page()
+ * always for avoiding mess. The page_cgroup will be removed if
+ * unnecessary. File cache pages is still on radix-tree. Don't
+ * care it.
+ */
if (!newpage->mapping)
__mem_cgroup_uncharge_common(newpage,
MEM_CGROUP_CHARGE_TYPE_FORCE);
+ else if (PageAnon(newpage))
+ mem_cgroup_uncharge_page(newpage);
}
/*
* Re: [-mm][PATCH 9/10] memcg: fix mem_cgroup_end_migration() race
2008-06-25 10:10 ` [-mm][PATCH 9/10] memcg: fix mem_cgroup_end_migration() race KOSAKI Motohiro
@ 2008-06-25 10:53 ` Daisuke Nishimura
0 siblings, 0 replies; 28+ messages in thread
From: Daisuke Nishimura @ 2008-06-25 10:53 UTC (permalink / raw)
To: KOSAKI Motohiro
Cc: LKML, linux-mm, Andrew Morton, Lee Schermerhorn, Rik van Riel,
KAMEZAWA Hiroyuki, Balbir Singh
On Wed, 25 Jun 2008 19:10:11 +0900, KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> wrote:
>
> =
> From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
>
> In general, mem_cgroup's charge on ANON page is removed when page_remove_rmap()
> is called.
>
> At migration, the newpage is remapped again by remove_migration_ptes(). But
> pte may be already changed (by task exits).
> It is charged at page allocation but have no chance to be uncharged in that
> case because it is never added to rmap.
>
> Handle that corner case in mem_cgroup_end_migration().
>
>
Sorry for the late reply.
I've confirmed that this patch fixes the bad page problem
I had been seeing in my test (it survived more than 28h without errors).
> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com>
> Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
>
Tested-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Thanks,
Daisuke Nishimura.
> ---
> mm/memcontrol.c | 14 +++++++++++++-
> 1 file changed, 13 insertions(+), 1 deletion(-)
>
> Index: b/mm/memcontrol.c
> ===================================================================
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -747,10 +747,22 @@ int mem_cgroup_prepare_migration(struct
> /* remove redundant charge if migration failed*/
> void mem_cgroup_end_migration(struct page *newpage)
> {
> - /* At success, page->mapping is not NULL and nothing to do. */
> + /*
> + * At success, page->mapping is not NULL.
> + * special rollback care is necessary when
> + * 1. at migration failure. (newpage->mapping is cleared in this case)
> + * 2. the newpage was moved but not remapped again because the task
> + * exits and the newpage is obsolete. In this case, the new page
> + * may be a swapcache. So, we just call mem_cgroup_uncharge_page()
> + * always for avoiding mess. The page_cgroup will be removed if
> + * unnecessary. File cache pages is still on radix-tree. Don't
> + * care it.
> + */
> if (!newpage->mapping)
> __mem_cgroup_uncharge_common(newpage,
> MEM_CGROUP_CHARGE_TYPE_FORCE);
> + else if (PageAnon(newpage))
> + mem_cgroup_uncharge_page(newpage);
> }
>
> /*
>
>
* [-mm][PATCH 10/10] putback_lru_page()/unevictable page handling rework v4
2008-06-25 9:59 [-mm][PATCH 0/10] memory related bugfix set for 2.6.26-rc5-mm3 v2 KOSAKI Motohiro
` (8 preceding siblings ...)
2008-06-25 10:10 ` [-mm][PATCH 9/10] memcg: fix mem_cgroup_end_migration() race KOSAKI Motohiro
@ 2008-06-25 10:11 ` KOSAKI Motohiro
2008-06-25 10:14 ` KOSAKI Motohiro
2008-06-25 16:29 ` Lee Schermerhorn
2008-06-25 15:09 ` [-mm][PATCH 0/10] memory related bugfix set for 2.6.26-rc5-mm3 v2 Lee Schermerhorn
10 siblings, 2 replies; 28+ messages in thread
From: KOSAKI Motohiro @ 2008-06-25 10:11 UTC (permalink / raw)
To: LKML, linux-mm, Andrew Morton, Lee Schermerhorn, Rik van Riel,
KAMEZAWA Hiroyuki
Cc: kosaki.motohiro
Changelog
================
V3 -> V4
o fix broken recheck logic in putback_lru_page().
o fix shmem_lock() prototype.
V2 -> V3
o remove lock_page() from scan_mapping_unevictable_pages() and
scan_zone_unevictable_pages().
o revert ipc/shm.c mm/shmem.c change of SHMEM unevictable patch.
it become unnecessary by this patch.
V1 -> V2
o undo unintended comment killing.
o move putback_lru_page() from move_to_new_page() to unmap_and_move().
o folded dependent patches:
http://marc.info/?l=linux-mm&m=121337119621958&w=2
http://marc.info/?l=linux-kernel&m=121362782406478&w=2
http://marc.info/?l=linux-mm&m=121377572909776&w=2
Currently, putback_lru_page() requires that the page is locked,
and in some special cases it implicitly unlocks it.
This patch tries to make putback_lru_page() lock_page() free.
(Of course, some callers must still take the lock.)
The main reason putback_lru_page() assumed the page was locked is to avoid
the page's status changing between Mlocked and Not-Mlocked:
once a page is added to the unevictable list, it is removed from that
list only when it is munlocked (there are other special cases, but we
ignore them here).
So a status change during putback_lru_page() would be fatal, and the page
had to be locked.
putback_lru_page() in this patch uses a new concept:
after it adds the page to the unevictable list, it checks again whether
the status has changed, and if so, it retries the putback.
This patch also changes the caller side and cleans up the lock/unlock_page() pairs.
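A sketch of the window that the recheck closes (interleaving as described
above, not literal code):

	putback_lru_page()			another task
	------------------------------		------------------------------
	page_evictable() -> false
						munlock/truncate makes the
						page evictable again
	add_page_to_unevictable_list()
	page_evictable() -> true
	-> isolate_lru_page() and retry
	   ("goto redo"), instead of leaving
	   an evictable page stranded on the
	   unevictable list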
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
---
include/linux/mm.h | 9 +---
ipc/shm.c | 16 -------
mm/internal.h | 2
mm/migrate.c | 60 +++++++++------------------
mm/mlock.c | 51 +++++++++++++----------
mm/shmem.c | 9 +---
mm/vmscan.c | 114 +++++++++++++++++++++++------------------------------
7 files changed, 110 insertions(+), 151 deletions(-)
Index: b/mm/vmscan.c
===================================================================
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -486,31 +486,21 @@ int remove_mapping(struct address_space
* Page may still be unevictable for other reasons.
*
* lru_lock must not be held, interrupts must be enabled.
- * Must be called with page locked.
- *
- * return 1 if page still locked [not truncated], else 0
*/
-int putback_lru_page(struct page *page)
+#ifdef CONFIG_UNEVICTABLE_LRU
+void putback_lru_page(struct page *page)
{
int lru;
- int ret = 1;
int was_unevictable;
- VM_BUG_ON(!PageLocked(page));
VM_BUG_ON(PageLRU(page));
+ was_unevictable = TestClearPageUnevictable(page);
+
+redo:
lru = !!TestClearPageActive(page);
- was_unevictable = TestClearPageUnevictable(page); /* for page_evictable() */
- if (unlikely(!page->mapping)) {
- /*
- * page truncated. drop lock as put_page() will
- * free the page.
- */
- VM_BUG_ON(page_count(page) != 1);
- unlock_page(page);
- ret = 0;
- } else if (page_evictable(page, NULL)) {
+ if (page_evictable(page, NULL)) {
/*
* For evictable pages, we can use the cache.
* In event of a race, worst case is we end up with an
@@ -519,40 +509,55 @@ int putback_lru_page(struct page *page)
*/
lru += page_is_file_cache(page);
lru_cache_add_lru(page, lru);
- mem_cgroup_move_lists(page, lru);
-#ifdef CONFIG_UNEVICTABLE_LRU
- if (was_unevictable)
- count_vm_event(NORECL_PGRESCUED);
-#endif
} else {
/*
* Put unevictable pages directly on zone's unevictable
* list.
*/
+ lru = LRU_UNEVICTABLE;
add_page_to_unevictable_list(page);
- mem_cgroup_move_lists(page, LRU_UNEVICTABLE);
-#ifdef CONFIG_UNEVICTABLE_LRU
- if (!was_unevictable)
- count_vm_event(NORECL_PGCULLED);
-#endif
}
+ mem_cgroup_move_lists(page, lru);
+
+ /*
+ * page's status can change while we move it among lru. If an evictable
+ * page is on unevictable list, it never be freed. To avoid that,
+ * check after we added it to the list, again.
+ */
+ if (lru == LRU_UNEVICTABLE && page_evictable(page, NULL)) {
+ if (!isolate_lru_page(page)) {
+ ClearPageUnevictable(page);
+ put_page(page);
+ goto redo;
+ }
+ /* This means someone else dropped this page from LRU
+ * So, it will be freed or putback to LRU again. There is
+ * nothing to do here.
+ */
+ }
+
+ if (was_unevictable && lru != LRU_UNEVICTABLE)
+ count_vm_event(NORECL_PGRESCUED);
+ else if (!was_unevictable && lru == LRU_UNEVICTABLE)
+ count_vm_event(NORECL_PGCULLED);
+
put_page(page); /* drop ref from isolate */
- return ret; /* ret => "page still locked" */
}
-/*
- * Cull page that shrink_*_list() has detected to be unevictable
- * under page lock to close races with other tasks that might be making
- * the page evictable. Avoid stranding an evictable page on the
- * unevictable list.
- */
-static void cull_unevictable_page(struct page *page)
+#else
+
+void putback_lru_page(struct page *page)
{
- lock_page(page);
- if (putback_lru_page(page))
- unlock_page(page);
+ int lru;
+ VM_BUG_ON(PageLRU(page));
+
+ lru = !!TestClearPageActive(page) + page_is_file_cache(page);
+ lru_cache_add_lru(page, lru);
+ mem_cgroup_move_lists(page, lru);
+ put_page(page);
}
+#endif
/*
* shrink_page_list() returns the number of reclaimed pages
@@ -746,8 +751,8 @@ free_it:
continue;
cull_mlocked:
- if (putback_lru_page(page))
- unlock_page(page);
+ unlock_page(page);
+ putback_lru_page(page);
continue;
activate_locked:
@@ -1127,7 +1132,7 @@ static unsigned long shrink_inactive_lis
list_del(&page->lru);
if (unlikely(!page_evictable(page, NULL))) {
spin_unlock_irq(&zone->lru_lock);
- cull_unevictable_page(page);
+ putback_lru_page(page);
spin_lock_irq(&zone->lru_lock);
continue;
}
@@ -1231,7 +1236,7 @@ static void shrink_active_list(unsigned
list_del(&page->lru);
if (unlikely(!page_evictable(page, NULL))) {
- cull_unevictable_page(page);
+ putback_lru_page(page);
continue;
}
@@ -2394,9 +2399,6 @@ int zone_reclaim(struct zone *zone, gfp_
*/
int page_evictable(struct page *page, struct vm_area_struct *vma)
{
-
- VM_BUG_ON(PageUnevictable(page));
-
if (mapping_unevictable(page_mapping(page)))
return 0;
@@ -2452,8 +2454,8 @@ static void show_page_path(struct page *
*/
static void check_move_unevictable_page(struct page *page, struct zone *zone)
{
-
- ClearPageUnevictable(page); /* for page_evictable() */
+retry:
+ ClearPageUnevictable(page);
if (page_evictable(page, NULL)) {
enum lru_list l = LRU_INACTIVE_ANON + page_is_file_cache(page);
@@ -2469,6 +2471,8 @@ static void check_move_unevictable_page(
*/
SetPageUnevictable(page);
list_move(&page->lru, &zone->lru[LRU_UNEVICTABLE].list);
+ if (page_evictable(page, NULL))
+ goto retry;
}
}
@@ -2508,16 +2512,6 @@ void scan_mapping_unevictable_pages(stru
next = page_index;
next++;
- if (TestSetPageLocked(page)) {
- /*
- * OK, let's do it the hard way...
- */
- if (zone)
- spin_unlock_irq(&zone->lru_lock);
- zone = NULL;
- lock_page(page);
- }
-
if (pagezone != zone) {
if (zone)
spin_unlock_irq(&zone->lru_lock);
@@ -2527,9 +2521,6 @@ void scan_mapping_unevictable_pages(stru
if (PageLRU(page) && PageUnevictable(page))
check_move_unevictable_page(page, zone);
-
- unlock_page(page);
-
}
if (zone)
spin_unlock_irq(&zone->lru_lock);
@@ -2565,15 +2556,10 @@ void scan_zone_unevictable_pages(struct
for (scan = 0; scan < batch_size; scan++) {
struct page *page = lru_to_page(l_unevictable);
- if (TestSetPageLocked(page))
- continue;
-
prefetchw_prev_lru_page(page, l_unevictable, flags);
if (likely(PageLRU(page) && PageUnevictable(page)))
check_move_unevictable_page(page, zone);
-
- unlock_page(page);
}
spin_unlock_irq(&zone->lru_lock);
Index: b/mm/mlock.c
===================================================================
--- a/mm/mlock.c
+++ b/mm/mlock.c
@@ -55,21 +55,22 @@ EXPORT_SYMBOL(can_do_mlock);
*/
void __clear_page_mlock(struct page *page)
{
- VM_BUG_ON(!PageLocked(page)); /* for LRU isolate/putback */
dec_zone_page_state(page, NR_MLOCK);
count_vm_event(NORECL_PGCLEARED);
- if (!isolate_lru_page(page)) {
- putback_lru_page(page);
- } else {
- /*
- * Page not on the LRU yet. Flush all pagevecs and retry.
- */
- lru_add_drain_all();
- if (!isolate_lru_page(page))
+ if (page->mapping) { /* truncated ? */
+ if (!isolate_lru_page(page)) {
putback_lru_page(page);
- else if (PageUnevictable(page))
- count_vm_event(NORECL_PGSTRANDED);
+ } else {
+ /*
+ *Page not on the LRU yet. Flush all pagevecs and retry.
+ */
+ lru_add_drain_all();
+ if (!isolate_lru_page(page))
+ putback_lru_page(page);
+ else if (PageUnevictable(page))
+ count_vm_event(NORECL_PGSTRANDED);
+ }
}
}
@@ -79,7 +80,7 @@ void __clear_page_mlock(struct page *pag
*/
void mlock_vma_page(struct page *page)
{
- BUG_ON(!PageLocked(page));
+ VM_BUG_ON(!page->mapping);
if (!TestSetPageMlocked(page)) {
inc_zone_page_state(page, NR_MLOCK);
@@ -109,7 +110,7 @@ void mlock_vma_page(struct page *page)
*/
static void munlock_vma_page(struct page *page)
{
- BUG_ON(!PageLocked(page));
+ VM_BUG_ON(!page->mapping);
if (TestClearPageMlocked(page)) {
dec_zone_page_state(page, NR_MLOCK);
@@ -169,7 +170,8 @@ static int __mlock_vma_pages_range(struc
/*
* get_user_pages makes pages present if we are
- * setting mlock.
+ * setting mlock. and this extra reference count will
+ * disable migration of this page.
*/
ret = get_user_pages(current, mm, addr,
min_t(int, nr_pages, ARRAY_SIZE(pages)),
@@ -197,14 +199,8 @@ static int __mlock_vma_pages_range(struc
for (i = 0; i < ret; i++) {
struct page *page = pages[i];
- /*
- * page might be truncated or migrated out from under
- * us. Check after acquiring page lock.
- */
- lock_page(page);
- if (page->mapping)
+ if (page_mapcount(page))
mlock_vma_page(page);
- unlock_page(page);
put_page(page); /* ref from get_user_pages() */
/*
@@ -240,6 +236,9 @@ static int __munlock_pte_handler(pte_t *
struct page *page;
pte_t pte;
+ /*
+ * page is never be unmapped by page-reclaim. we lock this page now.
+ */
retry:
pte = *ptep;
/*
@@ -261,7 +260,15 @@ retry:
goto out;
lock_page(page);
- if (!page->mapping) {
+ /*
+ * Because we lock page here, we have to check 2 cases.
+ * - the page is migrated.
+ * - the page is truncated (file-cache only)
+ * Note: Anonymous page doesn't clear page->mapping even if it
+ * is removed from rmap.
+ */
+ if (!page->mapping ||
+ (PageAnon(page) && !page_mapcount(page))) {
unlock_page(page);
goto retry;
}
Index: b/mm/migrate.c
===================================================================
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -67,9 +67,7 @@ int putback_lru_pages(struct list_head *
list_for_each_entry_safe(page, page2, l, lru) {
list_del(&page->lru);
- lock_page(page);
- if (putback_lru_page(page))
- unlock_page(page);
+ putback_lru_page(page);
count++;
}
return count;
@@ -579,7 +577,6 @@ static int fallback_migrate_page(struct
static int move_to_new_page(struct page *newpage, struct page *page)
{
struct address_space *mapping;
- int unlock = 1;
int rc;
/*
@@ -614,16 +611,10 @@ static int move_to_new_page(struct page
if (!rc) {
remove_migration_ptes(page, newpage);
- /*
- * Put back on LRU while holding page locked to
- * handle potential race with, e.g., munlock()
- */
- unlock = putback_lru_page(newpage);
} else
newpage->mapping = NULL;
- if (unlock)
- unlock_page(newpage);
+ unlock_page(newpage);
return rc;
}
@@ -640,19 +631,18 @@ static int unmap_and_move(new_page_t get
struct page *newpage = get_new_page(page, private, &result);
int rcu_locked = 0;
int charge = 0;
- int unlock = 1;
if (!newpage)
return -ENOMEM;
if (page_count(page) == 1)
/* page was freed from under us. So we are done. */
- goto end_migration;
+ goto move_newpage;
charge = mem_cgroup_prepare_migration(page, newpage);
if (charge == -ENOMEM) {
rc = -ENOMEM;
- goto end_migration;
+ goto move_newpage;
}
/* prepare cgroup just returns 0 or -ENOMEM */
BUG_ON(charge);
@@ -660,7 +650,7 @@ static int unmap_and_move(new_page_t get
rc = -EAGAIN;
if (TestSetPageLocked(page)) {
if (!force)
- goto end_migration;
+ goto move_newpage;
lock_page(page);
}
@@ -721,39 +711,29 @@ rcu_unlock:
rcu_read_unlock();
unlock:
+ unlock_page(page);
if (rc != -EAGAIN) {
/*
- * A page that has been migrated has all references
- * removed and will be freed. A page that has not been
- * migrated will have kepts its references and be
- * restored.
- */
- list_del(&page->lru);
- if (!page->mapping) {
- VM_BUG_ON(page_count(page) != 1);
- unlock_page(page);
- put_page(page); /* just free the old page */
- goto end_migration;
- } else
- unlock = putback_lru_page(page);
+ * A page that has been migrated has all references
+ * removed and will be freed. A page that has not been
+ * migrated will have kepts its references and be
+ * restored.
+ */
+ list_del(&page->lru);
+ putback_lru_page(page);
}
- if (unlock)
- unlock_page(page);
-
-end_migration:
+move_newpage:
if (!charge)
mem_cgroup_end_migration(newpage);
- if (!newpage->mapping) {
- /*
- * Migration failed or was never attempted.
- * Free the newpage.
- */
- VM_BUG_ON(page_count(newpage) != 1);
- put_page(newpage);
- }
+ /*
+ * Move the new page to the LRU. If migration was not successful
+ * then this will free the page.
+ */
+ putback_lru_page(newpage);
+
if (result) {
if (rc)
*result = rc;
Index: b/mm/internal.h
===================================================================
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -43,7 +43,7 @@ static inline void __put_page(struct pag
* in mm/vmscan.c:
*/
extern int isolate_lru_page(struct page *page);
-extern int putback_lru_page(struct page *page);
+extern void putback_lru_page(struct page *page);
/*
* in mm/page_alloc.c
Index: b/ipc/shm.c
===================================================================
--- a/ipc/shm.c
+++ b/ipc/shm.c
@@ -737,7 +737,6 @@ asmlinkage long sys_shmctl(int shmid, in
case SHM_LOCK:
case SHM_UNLOCK:
{
- struct address_space *mapping = NULL;
struct file *uninitialized_var(shm_file);
lru_add_drain_all(); /* drain pagevecs to lru lists */
@@ -769,29 +768,18 @@ asmlinkage long sys_shmctl(int shmid, in
if(cmd==SHM_LOCK) {
struct user_struct * user = current->user;
if (!is_file_hugepages(shp->shm_file)) {
- mapping = shmem_lock(shp->shm_file, 1, user);
- if (IS_ERR(mapping))
- err = PTR_ERR(mapping);
- mapping = NULL;
+ err = shmem_lock(shp->shm_file, 1, user);
if (!err && !(shp->shm_perm.mode & SHM_LOCKED)){
shp->shm_perm.mode |= SHM_LOCKED;
shp->mlock_user = user;
}
}
} else if (!is_file_hugepages(shp->shm_file)) {
- mapping = shmem_lock(shp->shm_file, 0, shp->mlock_user);
+ shmem_lock(shp->shm_file, 0, shp->mlock_user);
shp->shm_perm.mode &= ~SHM_LOCKED;
shp->mlock_user = NULL;
- if (mapping) {
- shm_file = shp->shm_file;
- get_file(shm_file); /* hold across unlock */
- }
}
shm_unlock(shp);
- if (mapping) {
- scan_mapping_unevictable_pages(mapping);
- fput(shm_file);
- }
goto out;
}
case IPC_RMID:
Index: b/mm/shmem.c
===================================================================
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1474,12 +1474,11 @@ static struct mempolicy *shmem_get_polic
}
#endif
-struct address_space *shmem_lock(struct file *file, int lock,
- struct user_struct *user)
+int shmem_lock(struct file *file, int lock, struct user_struct *user)
{
struct inode *inode = file->f_path.dentry->d_inode;
struct shmem_inode_info *info = SHMEM_I(inode);
- struct address_space *retval = ERR_PTR(-ENOMEM);
+ int retval = -ENOMEM;
spin_lock(&info->lock);
if (lock && !(info->flags & VM_LOCKED)) {
@@ -1487,14 +1486,14 @@ struct address_space *shmem_lock(struct
goto out_nomem;
info->flags |= VM_LOCKED;
mapping_set_unevictable(file->f_mapping);
- retval = NULL;
}
if (!lock && (info->flags & VM_LOCKED) && user) {
user_shm_unlock(inode->i_size, user);
info->flags &= ~VM_LOCKED;
mapping_clear_unevictable(file->f_mapping);
- retval = file->f_mapping;
+ scan_mapping_unevictable_pages(file->f_mapping);
}
+ retval = 0;
out_nomem:
spin_unlock(&info->lock);
return retval;
Index: b/include/linux/mm.h
===================================================================
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -706,13 +706,12 @@ static inline int page_mapped(struct pag
extern void show_free_areas(void);
#ifdef CONFIG_SHMEM
-extern struct address_space *shmem_lock(struct file *file, int lock,
- struct user_struct *user);
+extern int shmem_lock(struct file *file, int lock, struct user_struct *user);
#else
-static inline struct address_space *shmem_lock(struct file *file, int lock,
- struct user_struct *user)
+static inline int shmem_lock(struct file *file, int lock,
+ struct user_struct *user)
{
- return NULL;
+ return 0;
}
#endif
struct file *shmem_file_setup(char *name, loff_t size, unsigned long flags);
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [-mm][PATCH 10/10] putback_lru_page()/unevictable page handling rework v4
2008-06-25 10:11 ` [-mm][PATCH 10/10] putback_lru_page()/unevictable page handling rework v4 KOSAKI Motohiro
@ 2008-06-25 10:14 ` KOSAKI Motohiro
2008-06-25 22:13 ` Andrew Morton
2008-06-25 16:29 ` Lee Schermerhorn
1 sibling, 1 reply; 28+ messages in thread
From: KOSAKI Motohiro @ 2008-06-25 10:14 UTC (permalink / raw)
To: LKML, linux-mm, Andrew Morton, Lee Schermerhorn, Rik van Riel,
KAMEZAWA Hiroyuki
Cc: kosaki.motohiro
> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroy@jp.fujitsu.com>
Agghh, typo in Kamezawa-san's e-mail address.
Resending this patch.
===========================================================
putback_lru_page()/unevictable page handling rework.
Changelog
================
V3 -> V4
o fix broken recheck logic in putback_lru_page().
o fix shmem_lock() prototype.
V2 -> V3
o remove lock_page() from scan_mapping_unevictable_pages() and
scan_zone_unevictable_pages().
o revert ipc/shm.c mm/shmem.c change of SHMEM unevictable patch.
it becomes unnecessary with this patch.
V1 -> V2
o undo unintended comment killing.
o move putback_lru_page() from move_to_new_page() to unmap_and_move().
o folded depend patch
http://marc.info/?l=linux-mm&m=121337119621958&w=2
http://marc.info/?l=linux-kernel&m=121362782406478&w=2
http://marc.info/?l=linux-mm&m=121377572909776&w=2
Currently, putback_lru_page() requires that the page is locked, and in
some special cases it implicitly unlocks the page.
This patch makes putback_lru_page() lock_page()-free.
(Of course, some callers must still take the page lock for other
reasons.)
The main reason putback_lru_page() assumed a locked page was to avoid
a change in the page's Mlocked/Not-Mlocked status while it is being
put back. Once a page is added to the unevictable list, it is removed
from that list only when the page is munlocked (there are other
special cases, but we ignore them here). So a status change during
putback_lru_page() would be fatal, and the page had to stay locked.
putback_lru_page() in this patch uses a new approach: after adding the
page to a list, it rechecks whether the page's status has changed and,
if so, retries the putback; a condensed sketch follows.
This patch also changes the callers and cleans up the
lock_page()/unlock_page() handling.
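A condensed sketch of the reworked putback_lru_page() (simplified from
the mm/vmscan.c hunk below; vm event counters, memcg accounting and the
!CONFIG_UNEVICTABLE_LRU variant are omitted):

	/*
	 * Simplified sketch only; see the real hunk in this patch
	 * for the complete version.
	 */
	void putback_lru_page(struct page *page)
	{
		int lru;

		VM_BUG_ON(PageLRU(page));
	redo:
		lru = !!TestClearPageActive(page);
		if (page_evictable(page, NULL)) {
			/* evictable: use the normal LRU cache */
			lru += page_is_file_cache(page);
			lru_cache_add_lru(page, lru);
		} else {
			/* unevictable: put it on the unevictable list */
			lru = LRU_UNEVICTABLE;
			add_page_to_unevictable_list(page);
		}

		/*
		 * The page's status can change while we move it among
		 * the LRU lists.  If an evictable page ended up on the
		 * unevictable list, isolate it again and retry so it is
		 * not stranded where reclaim will never find it.
		 */
		if (lru == LRU_UNEVICTABLE && page_evictable(page, NULL)) {
			if (!isolate_lru_page(page)) {
				ClearPageUnevictable(page);
				put_page(page);
				goto redo;
			}
		}
		put_page(page);		/* drop ref from isolate */
	}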
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
---
include/linux/mm.h | 9 +---
ipc/shm.c | 16 -------
mm/internal.h | 2
mm/migrate.c | 60 +++++++++------------------
mm/mlock.c | 51 +++++++++++++----------
mm/shmem.c | 9 +---
mm/vmscan.c | 114 +++++++++++++++++++++++------------------------------
7 files changed, 110 insertions(+), 151 deletions(-)
Index: b/mm/vmscan.c
===================================================================
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -486,31 +486,21 @@ int remove_mapping(struct address_space
* Page may still be unevictable for other reasons.
*
* lru_lock must not be held, interrupts must be enabled.
- * Must be called with page locked.
- *
- * return 1 if page still locked [not truncated], else 0
*/
-int putback_lru_page(struct page *page)
+#ifdef CONFIG_UNEVICTABLE_LRU
+void putback_lru_page(struct page *page)
{
int lru;
- int ret = 1;
int was_unevictable;
- VM_BUG_ON(!PageLocked(page));
VM_BUG_ON(PageLRU(page));
+ was_unevictable = TestClearPageUnevictable(page);
+
+redo:
lru = !!TestClearPageActive(page);
- was_unevictable = TestClearPageUnevictable(page); /* for page_evictable() */
- if (unlikely(!page->mapping)) {
- /*
- * page truncated. drop lock as put_page() will
- * free the page.
- */
- VM_BUG_ON(page_count(page) != 1);
- unlock_page(page);
- ret = 0;
- } else if (page_evictable(page, NULL)) {
+ if (page_evictable(page, NULL)) {
/*
* For evictable pages, we can use the cache.
* In event of a race, worst case is we end up with an
@@ -519,40 +509,55 @@ int putback_lru_page(struct page *page)
*/
lru += page_is_file_cache(page);
lru_cache_add_lru(page, lru);
- mem_cgroup_move_lists(page, lru);
-#ifdef CONFIG_UNEVICTABLE_LRU
- if (was_unevictable)
- count_vm_event(NORECL_PGRESCUED);
-#endif
} else {
/*
* Put unevictable pages directly on zone's unevictable
* list.
*/
+ lru = LRU_UNEVICTABLE;
add_page_to_unevictable_list(page);
- mem_cgroup_move_lists(page, LRU_UNEVICTABLE);
-#ifdef CONFIG_UNEVICTABLE_LRU
- if (!was_unevictable)
- count_vm_event(NORECL_PGCULLED);
-#endif
}
+ mem_cgroup_move_lists(page, lru);
+
+ /*
+ * page's status can change while we move it among lru. If an evictable
+ * page is on unevictable list, it never be freed. To avoid that,
+ * check after we added it to the list, again.
+ */
+ if (lru == LRU_UNEVICTABLE && page_evictable(page, NULL)) {
+ if (!isolate_lru_page(page)) {
+ ClearPageUnevictable(page);
+ put_page(page);
+ goto redo;
+ }
+ /* This means someone else dropped this page from LRU
+ * So, it will be freed or putback to LRU again. There is
+ * nothing to do here.
+ */
+ }
+
+ if (was_unevictable && lru != LRU_UNEVICTABLE)
+ count_vm_event(NORECL_PGRESCUED);
+ else if (!was_unevictable && lru == LRU_UNEVICTABLE)
+ count_vm_event(NORECL_PGCULLED);
+
put_page(page); /* drop ref from isolate */
- return ret; /* ret => "page still locked" */
}
-/*
- * Cull page that shrink_*_list() has detected to be unevictable
- * under page lock to close races with other tasks that might be making
- * the page evictable. Avoid stranding an evictable page on the
- * unevictable list.
- */
-static void cull_unevictable_page(struct page *page)
+#else
+
+void putback_lru_page(struct page *page)
{
- lock_page(page);
- if (putback_lru_page(page))
- unlock_page(page);
+ int lru;
+ VM_BUG_ON(PageLRU(page));
+
+ lru = !!TestClearPageActive(page) + page_is_file_cache(page);
+ lru_cache_add_lru(page, lru);
+ mem_cgroup_move_lists(page, lru);
+ put_page(page);
}
+#endif
/*
* shrink_page_list() returns the number of reclaimed pages
@@ -746,8 +751,8 @@ free_it:
continue;
cull_mlocked:
- if (putback_lru_page(page))
- unlock_page(page);
+ unlock_page(page);
+ putback_lru_page(page);
continue;
activate_locked:
@@ -1127,7 +1132,7 @@ static unsigned long shrink_inactive_lis
list_del(&page->lru);
if (unlikely(!page_evictable(page, NULL))) {
spin_unlock_irq(&zone->lru_lock);
- cull_unevictable_page(page);
+ putback_lru_page(page);
spin_lock_irq(&zone->lru_lock);
continue;
}
@@ -1231,7 +1236,7 @@ static void shrink_active_list(unsigned
list_del(&page->lru);
if (unlikely(!page_evictable(page, NULL))) {
- cull_unevictable_page(page);
+ putback_lru_page(page);
continue;
}
@@ -2394,9 +2399,6 @@ int zone_reclaim(struct zone *zone, gfp_
*/
int page_evictable(struct page *page, struct vm_area_struct *vma)
{
-
- VM_BUG_ON(PageUnevictable(page));
-
if (mapping_unevictable(page_mapping(page)))
return 0;
@@ -2452,8 +2454,8 @@ static void show_page_path(struct page *
*/
static void check_move_unevictable_page(struct page *page, struct zone *zone)
{
-
- ClearPageUnevictable(page); /* for page_evictable() */
+retry:
+ ClearPageUnevictable(page);
if (page_evictable(page, NULL)) {
enum lru_list l = LRU_INACTIVE_ANON + page_is_file_cache(page);
@@ -2469,6 +2471,8 @@ static void check_move_unevictable_page(
*/
SetPageUnevictable(page);
list_move(&page->lru, &zone->lru[LRU_UNEVICTABLE].list);
+ if (page_evictable(page, NULL))
+ goto retry;
}
}
@@ -2508,16 +2512,6 @@ void scan_mapping_unevictable_pages(stru
next = page_index;
next++;
- if (TestSetPageLocked(page)) {
- /*
- * OK, let's do it the hard way...
- */
- if (zone)
- spin_unlock_irq(&zone->lru_lock);
- zone = NULL;
- lock_page(page);
- }
-
if (pagezone != zone) {
if (zone)
spin_unlock_irq(&zone->lru_lock);
@@ -2527,9 +2521,6 @@ void scan_mapping_unevictable_pages(stru
if (PageLRU(page) && PageUnevictable(page))
check_move_unevictable_page(page, zone);
-
- unlock_page(page);
-
}
if (zone)
spin_unlock_irq(&zone->lru_lock);
@@ -2565,15 +2556,10 @@ void scan_zone_unevictable_pages(struct
for (scan = 0; scan < batch_size; scan++) {
struct page *page = lru_to_page(l_unevictable);
- if (TestSetPageLocked(page))
- continue;
-
prefetchw_prev_lru_page(page, l_unevictable, flags);
if (likely(PageLRU(page) && PageUnevictable(page)))
check_move_unevictable_page(page, zone);
-
- unlock_page(page);
}
spin_unlock_irq(&zone->lru_lock);
Index: b/mm/mlock.c
===================================================================
--- a/mm/mlock.c
+++ b/mm/mlock.c
@@ -55,21 +55,22 @@ EXPORT_SYMBOL(can_do_mlock);
*/
void __clear_page_mlock(struct page *page)
{
- VM_BUG_ON(!PageLocked(page)); /* for LRU isolate/putback */
dec_zone_page_state(page, NR_MLOCK);
count_vm_event(NORECL_PGCLEARED);
- if (!isolate_lru_page(page)) {
- putback_lru_page(page);
- } else {
- /*
- * Page not on the LRU yet. Flush all pagevecs and retry.
- */
- lru_add_drain_all();
- if (!isolate_lru_page(page))
+ if (page->mapping) { /* truncated ? */
+ if (!isolate_lru_page(page)) {
putback_lru_page(page);
- else if (PageUnevictable(page))
- count_vm_event(NORECL_PGSTRANDED);
+ } else {
+ /*
+ *Page not on the LRU yet. Flush all pagevecs and retry.
+ */
+ lru_add_drain_all();
+ if (!isolate_lru_page(page))
+ putback_lru_page(page);
+ else if (PageUnevictable(page))
+ count_vm_event(NORECL_PGSTRANDED);
+ }
}
}
@@ -79,7 +80,7 @@ void __clear_page_mlock(struct page *pag
*/
void mlock_vma_page(struct page *page)
{
- BUG_ON(!PageLocked(page));
+ VM_BUG_ON(!page->mapping);
if (!TestSetPageMlocked(page)) {
inc_zone_page_state(page, NR_MLOCK);
@@ -109,7 +110,7 @@ void mlock_vma_page(struct page *page)
*/
static void munlock_vma_page(struct page *page)
{
- BUG_ON(!PageLocked(page));
+ VM_BUG_ON(!page->mapping);
if (TestClearPageMlocked(page)) {
dec_zone_page_state(page, NR_MLOCK);
@@ -169,7 +170,8 @@ static int __mlock_vma_pages_range(struc
/*
* get_user_pages makes pages present if we are
- * setting mlock.
+ * setting mlock. and this extra reference count will
+ * disable migration of this page.
*/
ret = get_user_pages(current, mm, addr,
min_t(int, nr_pages, ARRAY_SIZE(pages)),
@@ -197,14 +199,8 @@ static int __mlock_vma_pages_range(struc
for (i = 0; i < ret; i++) {
struct page *page = pages[i];
- /*
- * page might be truncated or migrated out from under
- * us. Check after acquiring page lock.
- */
- lock_page(page);
- if (page->mapping)
+ if (page_mapcount(page))
mlock_vma_page(page);
- unlock_page(page);
put_page(page); /* ref from get_user_pages() */
/*
@@ -240,6 +236,9 @@ static int __munlock_pte_handler(pte_t *
struct page *page;
pte_t pte;
+ /*
+ * page is never be unmapped by page-reclaim. we lock this page now.
+ */
retry:
pte = *ptep;
/*
@@ -261,7 +260,15 @@ retry:
goto out;
lock_page(page);
- if (!page->mapping) {
+ /*
+ * Because we lock page here, we have to check 2 cases.
+ * - the page is migrated.
+ * - the page is truncated (file-cache only)
+ * Note: Anonymous page doesn't clear page->mapping even if it
+ * is removed from rmap.
+ */
+ if (!page->mapping ||
+ (PageAnon(page) && !page_mapcount(page))) {
unlock_page(page);
goto retry;
}
Index: b/mm/migrate.c
===================================================================
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -67,9 +67,7 @@ int putback_lru_pages(struct list_head *
list_for_each_entry_safe(page, page2, l, lru) {
list_del(&page->lru);
- lock_page(page);
- if (putback_lru_page(page))
- unlock_page(page);
+ putback_lru_page(page);
count++;
}
return count;
@@ -579,7 +577,6 @@ static int fallback_migrate_page(struct
static int move_to_new_page(struct page *newpage, struct page *page)
{
struct address_space *mapping;
- int unlock = 1;
int rc;
/*
@@ -614,16 +611,10 @@ static int move_to_new_page(struct page
if (!rc) {
remove_migration_ptes(page, newpage);
- /*
- * Put back on LRU while holding page locked to
- * handle potential race with, e.g., munlock()
- */
- unlock = putback_lru_page(newpage);
} else
newpage->mapping = NULL;
- if (unlock)
- unlock_page(newpage);
+ unlock_page(newpage);
return rc;
}
@@ -640,19 +631,18 @@ static int unmap_and_move(new_page_t get
struct page *newpage = get_new_page(page, private, &result);
int rcu_locked = 0;
int charge = 0;
- int unlock = 1;
if (!newpage)
return -ENOMEM;
if (page_count(page) == 1)
/* page was freed from under us. So we are done. */
- goto end_migration;
+ goto move_newpage;
charge = mem_cgroup_prepare_migration(page, newpage);
if (charge == -ENOMEM) {
rc = -ENOMEM;
- goto end_migration;
+ goto move_newpage;
}
/* prepare cgroup just returns 0 or -ENOMEM */
BUG_ON(charge);
@@ -660,7 +650,7 @@ static int unmap_and_move(new_page_t get
rc = -EAGAIN;
if (TestSetPageLocked(page)) {
if (!force)
- goto end_migration;
+ goto move_newpage;
lock_page(page);
}
@@ -721,39 +711,29 @@ rcu_unlock:
rcu_read_unlock();
unlock:
+ unlock_page(page);
if (rc != -EAGAIN) {
/*
- * A page that has been migrated has all references
- * removed and will be freed. A page that has not been
- * migrated will have kepts its references and be
- * restored.
- */
- list_del(&page->lru);
- if (!page->mapping) {
- VM_BUG_ON(page_count(page) != 1);
- unlock_page(page);
- put_page(page); /* just free the old page */
- goto end_migration;
- } else
- unlock = putback_lru_page(page);
+ * A page that has been migrated has all references
+ * removed and will be freed. A page that has not been
+ * migrated will have kepts its references and be
+ * restored.
+ */
+ list_del(&page->lru);
+ putback_lru_page(page);
}
- if (unlock)
- unlock_page(page);
-
-end_migration:
+move_newpage:
if (!charge)
mem_cgroup_end_migration(newpage);
- if (!newpage->mapping) {
- /*
- * Migration failed or was never attempted.
- * Free the newpage.
- */
- VM_BUG_ON(page_count(newpage) != 1);
- put_page(newpage);
- }
+ /*
+ * Move the new page to the LRU. If migration was not successful
+ * then this will free the page.
+ */
+ putback_lru_page(newpage);
+
if (result) {
if (rc)
*result = rc;
Index: b/mm/internal.h
===================================================================
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -43,7 +43,7 @@ static inline void __put_page(struct pag
* in mm/vmscan.c:
*/
extern int isolate_lru_page(struct page *page);
-extern int putback_lru_page(struct page *page);
+extern void putback_lru_page(struct page *page);
/*
* in mm/page_alloc.c
Index: b/ipc/shm.c
===================================================================
--- a/ipc/shm.c
+++ b/ipc/shm.c
@@ -737,7 +737,6 @@ asmlinkage long sys_shmctl(int shmid, in
case SHM_LOCK:
case SHM_UNLOCK:
{
- struct address_space *mapping = NULL;
struct file *uninitialized_var(shm_file);
lru_add_drain_all(); /* drain pagevecs to lru lists */
@@ -769,29 +768,18 @@ asmlinkage long sys_shmctl(int shmid, in
if(cmd==SHM_LOCK) {
struct user_struct * user = current->user;
if (!is_file_hugepages(shp->shm_file)) {
- mapping = shmem_lock(shp->shm_file, 1, user);
- if (IS_ERR(mapping))
- err = PTR_ERR(mapping);
- mapping = NULL;
+ err = shmem_lock(shp->shm_file, 1, user);
if (!err && !(shp->shm_perm.mode & SHM_LOCKED)){
shp->shm_perm.mode |= SHM_LOCKED;
shp->mlock_user = user;
}
}
} else if (!is_file_hugepages(shp->shm_file)) {
- mapping = shmem_lock(shp->shm_file, 0, shp->mlock_user);
+ shmem_lock(shp->shm_file, 0, shp->mlock_user);
shp->shm_perm.mode &= ~SHM_LOCKED;
shp->mlock_user = NULL;
- if (mapping) {
- shm_file = shp->shm_file;
- get_file(shm_file); /* hold across unlock */
- }
}
shm_unlock(shp);
- if (mapping) {
- scan_mapping_unevictable_pages(mapping);
- fput(shm_file);
- }
goto out;
}
case IPC_RMID:
Index: b/mm/shmem.c
===================================================================
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1474,12 +1474,11 @@ static struct mempolicy *shmem_get_polic
}
#endif
-struct address_space *shmem_lock(struct file *file, int lock,
- struct user_struct *user)
+int shmem_lock(struct file *file, int lock, struct user_struct *user)
{
struct inode *inode = file->f_path.dentry->d_inode;
struct shmem_inode_info *info = SHMEM_I(inode);
- struct address_space *retval = ERR_PTR(-ENOMEM);
+ int retval = -ENOMEM;
spin_lock(&info->lock);
if (lock && !(info->flags & VM_LOCKED)) {
@@ -1487,14 +1486,14 @@ struct address_space *shmem_lock(struct
goto out_nomem;
info->flags |= VM_LOCKED;
mapping_set_unevictable(file->f_mapping);
- retval = NULL;
}
if (!lock && (info->flags & VM_LOCKED) && user) {
user_shm_unlock(inode->i_size, user);
info->flags &= ~VM_LOCKED;
mapping_clear_unevictable(file->f_mapping);
- retval = file->f_mapping;
+ scan_mapping_unevictable_pages(file->f_mapping);
}
+ retval = 0;
out_nomem:
spin_unlock(&info->lock);
return retval;
Index: b/include/linux/mm.h
===================================================================
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -706,13 +706,12 @@ static inline int page_mapped(struct pag
extern void show_free_areas(void);
#ifdef CONFIG_SHMEM
-extern struct address_space *shmem_lock(struct file *file, int lock,
- struct user_struct *user);
+extern int shmem_lock(struct file *file, int lock, struct user_struct *user);
#else
-static inline struct address_space *shmem_lock(struct file *file, int lock,
- struct user_struct *user)
+static inline int shmem_lock(struct file *file, int lock,
+ struct user_struct *user)
{
- return NULL;
+ return 0;
}
#endif
struct file *shmem_file_setup(char *name, loff_t size, unsigned long flags);
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [-mm][PATCH 10/10] putback_lru_page()/unevictable page handling rework v4
2008-06-25 10:14 ` KOSAKI Motohiro
@ 2008-06-25 22:13 ` Andrew Morton
2008-06-26 1:31 ` KOSAKI Motohiro
0 siblings, 1 reply; 28+ messages in thread
From: Andrew Morton @ 2008-06-25 22:13 UTC (permalink / raw)
To: KOSAKI Motohiro
Cc: linux-kernel, linux-mm, Lee.Schermerhorn, riel, kamezawa.hiroyu
On Wed, 25 Jun 2008 19:14:54 +0900
KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> wrote:
> putback_lru_page()/unevictable page handling rework.
The other nine patches slotted into the patch series quite nicely.
This means that those nine patches can later be folded into the patches
which they fixed and everything is nice and logical.
But this patch is not like that - it changes code which was added by
lots of different patches. This means that if I merge it, this patch
becomes a sort of impermeable barrier which other patches cannot be
reordered across.
And that's kind-of OK. It's messy, but we could live with it. However
as I expect there will be more fixes to these patches before all this
work goes into mainline, this particular patch will become more of a
problem as it will make the whole body of work more messy and harder to
review and understand.
So. Can this patch be simplified in any way? Or split up into
finer-grained patches or something like that?
Thanks.
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [-mm][PATCH 10/10] putback_lru_page()/unevictable page handling rework v4
2008-06-25 22:13 ` Andrew Morton
@ 2008-06-26 1:31 ` KOSAKI Motohiro
0 siblings, 0 replies; 28+ messages in thread
From: KOSAKI Motohiro @ 2008-06-26 1:31 UTC (permalink / raw)
To: Andrew Morton
Cc: kosaki.motohiro, linux-kernel, linux-mm, Lee.Schermerhorn, riel,
kamezawa.hiroyu
> And that's kind-of OK. It's messy, but we could live with it. However
> as I expect there will be more fixes to these patches before all this
> work goes into mainline, this particular patch will become more of a
> problem as it will make the whole body of work more messy and harder to
> review and understand.
>
> So. Can this patch be simplified in any way? Or split up into
> finer-grained patches or something like that?
Yes, sir!
I'll do it.
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [-mm][PATCH 10/10] putback_lru_page()/unevictable page handling rework v4
2008-06-25 10:11 ` [-mm][PATCH 10/10] putback_lru_page()/unevictable page handling rework v4 KOSAKI Motohiro
2008-06-25 10:14 ` KOSAKI Motohiro
@ 2008-06-25 16:29 ` Lee Schermerhorn
2008-06-26 8:08 ` KOSAKI Motohiro
1 sibling, 1 reply; 28+ messages in thread
From: Lee Schermerhorn @ 2008-06-25 16:29 UTC (permalink / raw)
To: KOSAKI Motohiro
Cc: LKML, linux-mm, Andrew Morton, Rik van Riel, KAMEZAWA Hiroyuki
I'm updating the unevictable-lru doc in Documentation/vm.
I have a question, below, on the removal of lock_page() from
__mlock_vma_pages_range(). The document discusses how we hold the page
lock when calling mlock_vma_page() to prevent races with migration
[addressed by putback_lru_page() rework] and truncation. I'm wondering
if we're properly protected from truncation now...
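For context, this is the lock-and-recheck pattern that the pre-patch
__mlock_vma_pages_range() relied on (a minimal sketch of the code being
removed below, not anything new): truncation clears page->mapping while
holding the page lock, so rechecking the mapping under the lock is what
kept us from mlocking a page that had just been truncated.

	for (i = 0; i < ret; i++) {
		struct page *page = pages[i];

		/*
		 * The page might be truncated or migrated out from
		 * under us, so recheck page->mapping after taking the
		 * page lock before marking it mlocked.
		 */
		lock_page(page);
		if (page->mapping)
			mlock_vma_page(page);
		unlock_page(page);
		put_page(page);		/* ref from get_user_pages() */
	}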
On Wed, 2008-06-25 at 19:11 +0900, KOSAKI Motohiro wrote:
>
> Changelog
> ================
> V3 -> V4
> o fix broken recheck logic in putback_lru_page().
> o fix shmem_lock() prototype.
>
> V2 -> V3
> o remove lock_page() from scan_mapping_unevictable_pages() and
> scan_zone_unevictable_pages().
> o revert ipc/shm.c mm/shmem.c change of SHMEM unevictable patch.
> it become unnecessary by this patch.
>
> V1 -> V2
> o undo unintented comment killing.
> o move putback_lru_page() from move_to_new_page() to unmap_and_move().
> o folded depend patch
> http://marc.info/?l=linux-mm&m=121337119621958&w=2
> http://marc.info/?l=linux-kernel&m=121362782406478&w=2
> http://marc.info/?l=linux-mm&m=121377572909776&w=2
>
>
> Now, putback_lru_page() requires that the page is locked.
> And in some special case, implicitly unlock it.
>
> This patch tries to make putback_lru_pages() to be lock_page() free.
> (Of course, some callers must take the lock.)
>
> The main reason that putback_lru_page() assumes that page is locked
> is to avoid the change in page's status among Mlocked/Not-Mlocked.
>
> Once it is added to unevictable list, the page is removed from
> unevictable list only when page is munlocked. (there are other special
> case. but we ignore the special case.)
> So, status change during putback_lru_page() is fatal and page should
> be locked.
>
> putback_lru_page() in this patch has a new concepts.
> When it adds page to unevictable list, it checks the status is
> changed or not again. if changed, retry to putback.
>
> This patche changes also caller side and cleaning up lock/unlock_page().
>
> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroy@jp.fujitsu.com>
> Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
>
> ---
> include/linux/mm.h | 9 +---
> ipc/shm.c | 16 -------
> mm/internal.h | 2
> mm/migrate.c | 60 +++++++++------------------
> mm/mlock.c | 51 +++++++++++++----------
> mm/shmem.c | 9 +---
> mm/vmscan.c | 114 +++++++++++++++++++++++------------------------------
> 7 files changed, 110 insertions(+), 151 deletions(-)
>
> Index: b/mm/vmscan.c
> ===================================================================
<snip>
> Index: b/mm/mlock.c
> ===================================================================
> --- a/mm/mlock.c
> +++ b/mm/mlock.c
> @@ -55,21 +55,22 @@ EXPORT_SYMBOL(can_do_mlock);
> */
> void __clear_page_mlock(struct page *page)
> {
> - VM_BUG_ON(!PageLocked(page)); /* for LRU isolate/putback */
>
> dec_zone_page_state(page, NR_MLOCK);
> count_vm_event(NORECL_PGCLEARED);
> - if (!isolate_lru_page(page)) {
> - putback_lru_page(page);
> - } else {
> - /*
> - * Page not on the LRU yet. Flush all pagevecs and retry.
> - */
> - lru_add_drain_all();
> - if (!isolate_lru_page(page))
> + if (page->mapping) { /* truncated ? */
> + if (!isolate_lru_page(page)) {
> putback_lru_page(page);
> - else if (PageUnevictable(page))
> - count_vm_event(NORECL_PGSTRANDED);
> + } else {
> + /*
> + *Page not on the LRU yet. Flush all pagevecs and retry.
> + */
> + lru_add_drain_all();
> + if (!isolate_lru_page(page))
> + putback_lru_page(page);
> + else if (PageUnevictable(page))
> + count_vm_event(NORECL_PGSTRANDED);
> + }
> }
> }
>
> @@ -79,7 +80,7 @@ void __clear_page_mlock(struct page *pag
> */
> void mlock_vma_page(struct page *page)
> {
> - BUG_ON(!PageLocked(page));
> + VM_BUG_ON(!page->mapping);
If we're not holding the page locked here, can the page be truncated out
from under us? If so, I think we could hit this BUG or, if we just miss
it, we could end up setting PageMlocked on a truncated page, and end up
freeing an mlocked page.
>
> if (!TestSetPageMlocked(page)) {
> inc_zone_page_state(page, NR_MLOCK);
> @@ -109,7 +110,7 @@ void mlock_vma_page(struct page *page)
> */
> static void munlock_vma_page(struct page *page)
> {
> - BUG_ON(!PageLocked(page));
> + VM_BUG_ON(!page->mapping);
>
> if (TestClearPageMlocked(page)) {
> dec_zone_page_state(page, NR_MLOCK);
> @@ -169,7 +170,8 @@ static int __mlock_vma_pages_range(struc
>
> /*
> * get_user_pages makes pages present if we are
> - * setting mlock.
> + * setting mlock. and this extra reference count will
> + * disable migration of this page.
> */
> ret = get_user_pages(current, mm, addr,
> min_t(int, nr_pages, ARRAY_SIZE(pages)),
> @@ -197,14 +199,8 @@ static int __mlock_vma_pages_range(struc
> for (i = 0; i < ret; i++) {
> struct page *page = pages[i];
>
> - /*
> - * page might be truncated or migrated out from under
> - * us. Check after acquiring page lock.
> - */
> - lock_page(page);
Safe to remove the locking? I.e., page can't be truncated here?
> - if (page->mapping)
> + if (page_mapcount(page))
> mlock_vma_page(page);
> - unlock_page(page);
> put_page(page); /* ref from get_user_pages() */
>
> /*
> @@ -240,6 +236,9 @@ static int __munlock_pte_handler(pte_t *
> struct page *page;
> pte_t pte;
>
> + /*
> + * page is never be unmapped by page-reclaim. we lock this page now.
> + */
> retry:
> pte = *ptep;
> /*
> @@ -261,7 +260,15 @@ retry:
> goto out;
>
> lock_page(page);
> - if (!page->mapping) {
> + /*
> + * Because we lock page here, we have to check 2 cases.
> + * - the page is migrated.
> + * - the page is truncated (file-cache only)
> + * Note: Anonymous page doesn't clear page->mapping even if it
> + * is removed from rmap.
> + */
> + if (!page->mapping ||
> + (PageAnon(page) && !page_mapcount(page))) {
> unlock_page(page);
> goto retry;
> }
> Index: b/mm/migrate.c
> ===================================================================
<snip>
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [-mm][PATCH 10/10] putback_lru_page()/unevictable page handling rework v4
2008-06-25 16:29 ` Lee Schermerhorn
@ 2008-06-26 8:08 ` KOSAKI Motohiro
0 siblings, 0 replies; 28+ messages in thread
From: KOSAKI Motohiro @ 2008-06-26 8:08 UTC (permalink / raw)
To: Lee Schermerhorn
Cc: kosaki.motohiro, LKML, linux-mm, Andrew Morton, Rik van Riel,
KAMEZAWA Hiroyuki
> I'm updating the unevictable-lru doc in Documentation/vm.
> I have a question, below, on the removal of page_lock() from
> __mlock_vma_pages_range(). The document discusses how we hold the page
> lock when calling mlock_vma_page() to prevent races with migration
> [addressed by putback_lru_page() rework] and truncation. I'm wondering
> if we're properly protected from truncation now...
Thanks for the careful review.
I'll fix it and split it into several patches for easier review.
> > @@ -79,7 +80,7 @@ void __clear_page_mlock(struct page *pag
> > */
> > void mlock_vma_page(struct page *page)
> > {
> > - BUG_ON(!PageLocked(page));
> > + VM_BUG_ON(!page->mapping);
>
> If we're not holding the page locked here, can the page be truncated out
> from under us? If so, I think we could hit this BUG or, if we just miss
> it, we could end up setting PageMlocked on a truncated page, and end up
> freeing an mlocked page.
this is obviously a folding mistake by me ;)
this VM_BUG_ON() should be removed.
> > @@ -169,7 +170,8 @@ static int __mlock_vma_pages_range(struc
> >
> > /*
> > * get_user_pages makes pages present if we are
> > - * setting mlock.
> > + * setting mlock. and this extra reference count will
> > + * disable migration of this page.
> > */
> > ret = get_user_pages(current, mm, addr,
> > min_t(int, nr_pages, ARRAY_SIZE(pages)),
> > @@ -197,14 +199,8 @@ static int __mlock_vma_pages_range(struc
> > for (i = 0; i < ret; i++) {
> > struct page *page = pages[i];
> >
> > - /*
> > - * page might be truncated or migrated out from under
> > - * us. Check after acquiring page lock.
> > - */
> > - lock_page(page);
> Safe to remove the locking? I.e., page can't be truncated here?
you are right.
this lock_page() is necessary.
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [-mm][PATCH 0/10] memory related bugfix set for 2.6.26-rc5-mm3 v2
2008-06-25 9:59 [-mm][PATCH 0/10] memory related bugfix set for 2.6.26-rc5-mm3 v2 KOSAKI Motohiro
` (9 preceding siblings ...)
2008-06-25 10:11 ` [-mm][PATCH 10/10] putback_lru_page()/unevictable page handling rework v4 KOSAKI Motohiro
@ 2008-06-25 15:09 ` Lee Schermerhorn
10 siblings, 0 replies; 28+ messages in thread
From: Lee Schermerhorn @ 2008-06-25 15:09 UTC (permalink / raw)
To: KOSAKI Motohiro; +Cc: LKML, linux-mm, Andrew Morton, Rik van Riel
On Wed, 2008-06-25 at 18:59 +0900, KOSAKI Motohiro wrote:
> Hi, Andrew and mm guys!
>
> this is mm related fixes patchset for 2.6.26-rc5-mm3 v2.
>
> Unfortunately, this version has several bugs and
> some bugs depend on each other.
> So, I collect, sort, and fold these patchs.
>
>
> btw: I wrote "this patch still crashed" last midnight.
> but it works well today.
> umm.. I was dreaming?
Yes. I ran my stress load with Nishimura-san's cpuset migration test on
x86_64 and ia64 platforms overnight. I didn't have all of the memcgroup
patches applied--just the unevictable lru related patches. Tests ran
for ~19 hours--including 70k-80k passes through the cpuset migration
test--until I shut them down w/o error.
OK, I did see two oom kills on the ia64. My stress load was already
pretty close to the edge, but they look suspect because I still had a couple
of MB free on each node according to the console logs. The system did
seem to choose a reasonable task to kill, tho'--a memtoy test that locks
down 10s of GB of memory.
>
> Anyway, I believe this patchset improve robustness and
> provide better testing baseline.
>
> enjoy!
I'll restart the tests with this series.
>
>
> Andrew, this patchset is my silver-spoon.
> if you like it, I'm glad too.
>
>
>
^ permalink raw reply [flat|nested] 28+ messages in thread