linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] mm: munlock use mapcount to avoid terrible overhead
@ 2011-10-19  0:02 Hugh Dickins
  2011-10-19  0:14 ` Andrew Morton
  2011-10-19  0:25 ` Andi Kleen
  0 siblings, 2 replies; 5+ messages in thread
From: Hugh Dickins @ 2011-10-19  0:02 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Michel Lespinasse, linux-kernel, linux-mm

A process spent 30 minutes exiting, just munlocking the pages of a large
anonymous area that had been alternately mprotected into page-sized vmas:
for every single page there's an anon_vma walk through all the other
little vmas to find the right one.

A general fix to that would be a lot more complicated (use prio_tree on
anon_vma?), but there's one very simple thing we can do to speed up the
common case: if a page to be munlocked is mapped only once, then it is
our vma that it is mapped into, and there's no need whatever to walk
through all the others.

Okay, there is a very remote race in munlock_vma_pages_range(), if
between its follow_page() and lock_page(), another process were to
munlock the same page, then page reclaim remove it from our vma, then
another process mlock it again.  We would find it with page_mapcount
1, yet it's still mlocked in another process.  But never mind, that's
much less likely than the down_read_trylock() failure which munlocking
already tolerates (in try_to_unmap_one()): in due course page reclaim
will discover and move the page to unevictable instead.
    
Signed-off-by: Hugh Dickins <hughd@google.com>
---
 mm/mlock.c |    5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

--- 3.1-rc10/mm/mlock.c	2011-07-21 19:17:23.000000000 -0700
+++ linux/mm/mlock.c	2011-10-06 12:47:54.670436979 -0700
@@ -110,7 +110,10 @@ void munlock_vma_page(struct page *page)
 	if (TestClearPageMlocked(page)) {
 		dec_zone_page_state(page, NR_MLOCK);
 		if (!isolate_lru_page(page)) {
-			int ret = try_to_munlock(page);
+			int ret = SWAP_AGAIN;
+
+			if (page_mapcount(page) > 1)
+				ret = try_to_munlock(page);
 			/*
 			 * did try_to_unlock() succeed or punt?
 			 */

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] mm: munlock use mapcount to avoid terrible overhead
  2011-10-19  0:02 [PATCH] mm: munlock use mapcount to avoid terrible overhead Hugh Dickins
@ 2011-10-19  0:14 ` Andrew Morton
  2011-10-19  0:56   ` Hugh Dickins
  2011-10-19  0:25 ` Andi Kleen
  1 sibling, 1 reply; 5+ messages in thread
From: Andrew Morton @ 2011-10-19  0:14 UTC (permalink / raw)
  To: Hugh Dickins; +Cc: Michel Lespinasse, linux-kernel, linux-mm

On Tue, 18 Oct 2011 17:02:56 -0700 (PDT)
Hugh Dickins <hughd@google.com> wrote:

> A process spent 30 minutes exiting, just munlocking the pages of a large
> anonymous area that had been alternately mprotected into page-sized vmas:
> for every single page there's an anon_vma walk through all the other
> little vmas to find the right one.
> 
> A general fix to that would be a lot more complicated (use prio_tree on
> anon_vma?), but there's one very simple thing we can do to speed up the
> common case: if a page to be munlocked is mapped only once, then it is
> our vma that it is mapped into, and there's no need whatever to walk
> through all the others.
> 
> Okay, there is a very remote race in munlock_vma_pages_range(), if
> between its follow_page() and lock_page(), another process were to
> munlock the same page, then page reclaim remove it from our vma, then
> another process mlock it again.  We would find it with page_mapcount
> 1, yet it's still mlocked in another process.  But never mind, that's
> much less likely than the down_read_trylock() failure which munlocking
> already tolerates (in try_to_unmap_one()): in due course page reclaim
> will discover and move the page to unevictable instead.
>     

And how long did the test case take with the patch applied?

> ---
>  mm/mlock.c |    5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> --- 3.1-rc10/mm/mlock.c	2011-07-21 19:17:23.000000000 -0700
> +++ linux/mm/mlock.c	2011-10-06 12:47:54.670436979 -0700
> @@ -110,7 +110,10 @@ void munlock_vma_page(struct page *page)
>  	if (TestClearPageMlocked(page)) {
>  		dec_zone_page_state(page, NR_MLOCK);
>  		if (!isolate_lru_page(page)) {
> -			int ret = try_to_munlock(page);
> +			int ret = SWAP_AGAIN;
> +
> +			if (page_mapcount(page) > 1)
> +				ret = try_to_munlock(page);
>  			/*
>  			 * did try_to_unlock() succeed or punt?
>  			 */

tsk.

--- a/mm/mlock.c~mm-munlock-use-mapcount-to-avoid-terrible-overhead-fix
+++ a/mm/mlock.c
@@ -112,6 +112,11 @@ void munlock_vma_page(struct page *page)
 		if (!isolate_lru_page(page)) {
 			int ret = SWAP_AGAIN;
 
+			/*
+			 * Optimization: if the page was mapped just once,
+			 * that's our mapping and we don't need to check all the
+			 * other vmas.
+			 */
 			if (page_mapcount(page) > 1)
 				ret = try_to_munlock(page);
 			/*
_

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] mm: munlock use mapcount to avoid terrible overhead
  2011-10-19  0:02 [PATCH] mm: munlock use mapcount to avoid terrible overhead Hugh Dickins
  2011-10-19  0:14 ` Andrew Morton
@ 2011-10-19  0:25 ` Andi Kleen
  2011-10-19  0:59   ` Hugh Dickins
  1 sibling, 1 reply; 5+ messages in thread
From: Andi Kleen @ 2011-10-19  0:25 UTC (permalink / raw)
  To: Hugh Dickins; +Cc: Andrew Morton, Michel Lespinasse, linux-kernel, linux-mm

Hugh Dickins <hughd@google.com> writes:

> A process spent 30 minutes exiting, just munlocking the pages of a large
> anonymous area that had been alternately mprotected into page-sized vmas:
> for every single page there's an anon_vma walk through all the other
> little vmas to find the right one.

We had the same problem recently after a mmap+touch workload: in this
case it was hugepaged walking all these anon_vmas and the list was over
100k long. 

Had some data on this at plumbers:
http://halobates.de/plumbers-fork-locks_v2.pdf

> A general fix to that would be a lot more complicated (use prio_tree on
> anon_vma?), but there's one very simple thing we can do to speed up the
> common case: if a page to be munlocked is mapped only once, then it is
> our vma that it is mapped into, and there's no need whatever to walk
> through all the others.

I think we need a generic fix, this problem does not only happen
in munmap. 


-Andi
-- 
ak@linux.intel.com -- Speaking for myself only

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] mm: munlock use mapcount to avoid terrible overhead
  2011-10-19  0:14 ` Andrew Morton
@ 2011-10-19  0:56   ` Hugh Dickins
  0 siblings, 0 replies; 5+ messages in thread
From: Hugh Dickins @ 2011-10-19  0:56 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Michel Lespinasse, linux-kernel, linux-mm

On Tue, 18 Oct 2011, Andrew Morton wrote:
> On Tue, 18 Oct 2011 17:02:56 -0700 (PDT)
> Hugh Dickins <hughd@google.com> wrote:
> 
> > A process spent 30 minutes exiting, just munlocking the pages of a large
> > anonymous area that had been alternately mprotected into page-sized vmas:
> > for every single page there's an anon_vma walk through all the other
> > little vmas to find the right one.
> 
> And how long did the test case take with the patch applied?

5 seconds in the case tried; but the issue is quadratic,
so you can make the improvement look arbitrarily good.

Hugh

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] mm: munlock use mapcount to avoid terrible overhead
  2011-10-19  0:25 ` Andi Kleen
@ 2011-10-19  0:59   ` Hugh Dickins
  0 siblings, 0 replies; 5+ messages in thread
From: Hugh Dickins @ 2011-10-19  0:59 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Andrew Morton, Michel Lespinasse, linux-kernel, linux-mm

On Tue, 18 Oct 2011, Andi Kleen wrote:
> Hugh Dickins <hughd@google.com> writes:
> 
> > A process spent 30 minutes exiting, just munlocking the pages of a large
> > anonymous area that had been alternately mprotected into page-sized vmas:
> > for every single page there's an anon_vma walk through all the other
> > little vmas to find the right one.
> 
> We had the same problem recently after a mmap+touch workload: in this
> case it was hugepaged walking all these anon_vmas and the list was over
> 100k long. 
> 
> Had some data on this at plumbers:
> http://halobates.de/plumbers-fork-locks_v2.pdf
> 
> > A general fix to that would be a lot more complicated (use prio_tree on
> > anon_vma?), but there's one very simple thing we can do to speed up the
> > common case: if a page to be munlocked is mapped only once, then it is
> > our vma that it is mapped into, and there's no need whatever to walk
> > through all the others.
> 
> I think we need a generic fix, this problem does not only happen
> in munmap. 

Thanks for the pointer, Andi, I'll have to look into it when I've a
moment; but I don't look forward to making this area more complicated.

Hugh

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2011-10-19  0:59 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-10-19  0:02 [PATCH] mm: munlock use mapcount to avoid terrible overhead Hugh Dickins
2011-10-19  0:14 ` Andrew Morton
2011-10-19  0:56   ` Hugh Dickins
2011-10-19  0:25 ` Andi Kleen
2011-10-19  0:59   ` Hugh Dickins

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox