linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: mel@skynet.ie (Mel Gorman)
To: Lee Schermerhorn <lee.schermerhorn@hp.com>
Cc: linux-mm@kvack.org, akpm@linux-foundation.org, clameter@sgi.com,
	riel@redhat.com, balbir@linux.vnet.ibm.com, andrea@suse.de,
	a.p.zijlstra@chello.nl, eric.whitney@hp.com, npiggin@suse.de
Subject: Re: [PATCH/RFC 1/14] Reclaim Scalability:  Convert anon_vma lock to read/write lock
Date: Mon, 17 Sep 2007 12:02:35 +0100	[thread overview]
Message-ID: <20070917110234.GF25706@skynet.ie> (raw)
In-Reply-To: <20070914205405.6536.37532.sendpatchset@localhost>

On (14/09/07 16:54), Lee Schermerhorn didst pronounce:
> [PATCH/RFC] 01/14 Reclaim Scalability:  Convert anon_vma list lock a read/write lock
> 
> Against 2.6.23-rc4-mm1
> 
> Make the anon_vma list lock a read/write lock.  Heaviest use of this
> lock is in the page_referenced()/try_to_unmap() calls from vmscan
> [shrink_page_list()].  These functions can use a read lock to allow
> some parallelism for different cpus trying to reclaim pages mapped
> via the same set of vmas.
> 
> This change should not change the footprint of the anon_vma in the
> non-debug case.
> 
> Note:  I have seen systems livelock with all cpus in reclaim, down
> in page_referenced_anon() or try_to_unmap_anon() spinning on the
> anon_vma lock.  I have only seen this with the AIM7 benchmark with
> workloads of 10s of thousands of tasks.  All of these tasks are
> children of a single ancestor, so they all share the same anon_vma
> for each vm area in their respective mm's.  I'm told that Apache
> can fork thousands of children to handle incoming connections, and
> I've seen similar livelocks--albeit on the i_mmap_lock [next patch]
> running 1000s of Oracle users on a large ia64 platform.
> 
> With this patch [along with Rik van Riel's split LRU patch] we were
> able to see the AIM7 workload start swapping, instead of hanging,
> for the first time.  Same workload DID hang with just Rik's patch,
> so this patch is apparently useful.
> 

In light of what Peter and Linus said about rw-locks being more expensive
than spinlocks, we'll need to measure this with some benchmark. The plus
side is that this patch can be handled in isolation because it's either a
scalability fix or it isn't. It's worth investigating because you say it
fixed a real problem where under load the job was able to complete with
this patch and live-locked without it.

kernbench is unlikely to show up anything useful here although it might be
worth running anyway just in case. brk_test from aim9 might be useful as it's
a micro-benchmark that uses brk() which is a path affected by this patch. As
aim7 is exercising this path, it would be interesting to see does it show
performance differences in the normal non-stressed case. Other suggestions?

For this one, it'll be important to test on single CPU systems with an
SMP kernel as well as this feels like one of those fixes that helps large
machines but punishes small ones.

When you decide on a test-case, I can test just this patch and see what
results I find.

Otherwise the patch looks ok.

> Signed-off-by:  Lee Schermerhorn <lee.schermerhorn@hp.com>
> 
>  include/linux/rmap.h |    9 ++++++---
>  mm/migrate.c         |    4 ++--
>  mm/mmap.c            |    4 ++--
>  mm/rmap.c            |   20 ++++++++++----------
>  4 files changed, 20 insertions(+), 17 deletions(-)
> 
> Index: Linux/include/linux/rmap.h
> ===================================================================
> --- Linux.orig/include/linux/rmap.h	2007-09-10 10:09:47.000000000 -0400
> +++ Linux/include/linux/rmap.h	2007-09-10 11:43:11.000000000 -0400
> @@ -25,7 +25,7 @@
>   * pointing to this anon_vma once its vma list is empty.
>   */
>  struct anon_vma {
> -	spinlock_t lock;	/* Serialize access to vma list */
> +	rwlock_t rwlock;	/* Serialize access to vma list */
>  	struct list_head head;	/* List of private "related" vmas */
>  };
>  
> @@ -43,18 +43,21 @@ static inline void anon_vma_free(struct 
>  	kmem_cache_free(anon_vma_cachep, anon_vma);
>  }
>  
> +/*
> + * This needs to be a write lock for __vma_link()
> + */
>  static inline void anon_vma_lock(struct vm_area_struct *vma)
>  {
>  	struct anon_vma *anon_vma = vma->anon_vma;
>  	if (anon_vma)
> -		spin_lock(&anon_vma->lock);
> +		write_lock(&anon_vma->rwlock);
>  }
>  
>  static inline void anon_vma_unlock(struct vm_area_struct *vma)
>  {
>  	struct anon_vma *anon_vma = vma->anon_vma;
>  	if (anon_vma)
> -		spin_unlock(&anon_vma->lock);
> +		write_unlock(&anon_vma->rwlock);
>  }
>  
>  /*
> Index: Linux/mm/rmap.c
> ===================================================================
> --- Linux.orig/mm/rmap.c	2007-09-10 10:09:47.000000000 -0400
> +++ Linux/mm/rmap.c	2007-09-10 11:43:11.000000000 -0400
> @@ -25,7 +25,7 @@
>   *   mm->mmap_sem
>   *     page->flags PG_locked (lock_page)
>   *       mapping->i_mmap_lock
> - *         anon_vma->lock
> + *         anon_vma->rwlock
>   *           mm->page_table_lock or pte_lock
>   *             zone->lru_lock (in mark_page_accessed, isolate_lru_page)
>   *             swap_lock (in swap_duplicate, swap_info_get)
> @@ -68,7 +68,7 @@ int anon_vma_prepare(struct vm_area_stru
>  		if (anon_vma) {
>  			allocated = NULL;
>  			locked = anon_vma;
> -			spin_lock(&locked->lock);
> +			write_lock(&locked->rwlock);
>  		} else {
>  			anon_vma = anon_vma_alloc();
>  			if (unlikely(!anon_vma))
> @@ -87,7 +87,7 @@ int anon_vma_prepare(struct vm_area_stru
>  		spin_unlock(&mm->page_table_lock);
>  
>  		if (locked)
> -			spin_unlock(&locked->lock);
> +			write_unlock(&locked->rwlock);
>  		if (unlikely(allocated))
>  			anon_vma_free(allocated);
>  	}
> @@ -113,9 +113,9 @@ void anon_vma_link(struct vm_area_struct
>  	struct anon_vma *anon_vma = vma->anon_vma;
>  
>  	if (anon_vma) {
> -		spin_lock(&anon_vma->lock);
> +		write_lock(&anon_vma->rwlock);
>  		list_add_tail(&vma->anon_vma_node, &anon_vma->head);
> -		spin_unlock(&anon_vma->lock);
> +		write_unlock(&anon_vma->rwlock);
>  	}
>  }
>  
> @@ -127,12 +127,12 @@ void anon_vma_unlink(struct vm_area_stru
>  	if (!anon_vma)
>  		return;
>  
> -	spin_lock(&anon_vma->lock);
> +	write_lock(&anon_vma->rwlock);
>  	list_del(&vma->anon_vma_node);
>  
>  	/* We must garbage collect the anon_vma if it's empty */
>  	empty = list_empty(&anon_vma->head);
> -	spin_unlock(&anon_vma->lock);
> +	write_unlock(&anon_vma->rwlock);
>  
>  	if (empty)
>  		anon_vma_free(anon_vma);
> @@ -143,7 +143,7 @@ static void anon_vma_ctor(void *data, st
>  {
>  	struct anon_vma *anon_vma = data;
>  
> -	spin_lock_init(&anon_vma->lock);
> + 	rwlock_init(&anon_vma->rwlock);
>  	INIT_LIST_HEAD(&anon_vma->head);
>  }
>  
> @@ -170,7 +170,7 @@ static struct anon_vma *page_lock_anon_v
>  		goto out;
>  
>  	anon_vma = (struct anon_vma *) (anon_mapping - PAGE_MAPPING_ANON);
> -	spin_lock(&anon_vma->lock);
> +	read_lock(&anon_vma->rwlock);
>  	return anon_vma;
>  out:
>  	rcu_read_unlock();
> @@ -179,7 +179,7 @@ out:
>  
>  static void page_unlock_anon_vma(struct anon_vma *anon_vma)
>  {
> -	spin_unlock(&anon_vma->lock);
> +	read_unlock(&anon_vma->rwlock);
>  	rcu_read_unlock();
>  }
>  
> Index: Linux/mm/mmap.c
> ===================================================================
> --- Linux.orig/mm/mmap.c	2007-09-10 10:09:47.000000000 -0400
> +++ Linux/mm/mmap.c	2007-09-10 11:43:11.000000000 -0400
> @@ -570,7 +570,7 @@ again:			remove_next = 1 + (end > next->
>  	if (vma->anon_vma)
>  		anon_vma = vma->anon_vma;
>  	if (anon_vma) {
> -		spin_lock(&anon_vma->lock);
> +		write_lock(&anon_vma->rwlock);
>  		/*
>  		 * Easily overlooked: when mprotect shifts the boundary,
>  		 * make sure the expanding vma has anon_vma set if the
> @@ -624,7 +624,7 @@ again:			remove_next = 1 + (end > next->
>  	}
>  
>  	if (anon_vma)
> -		spin_unlock(&anon_vma->lock);
> +		write_unlock(&anon_vma->rwlock);
>  	if (mapping)
>  		spin_unlock(&mapping->i_mmap_lock);
>  
> Index: Linux/mm/migrate.c
> ===================================================================
> --- Linux.orig/mm/migrate.c	2007-09-10 10:09:47.000000000 -0400
> +++ Linux/mm/migrate.c	2007-09-10 11:43:11.000000000 -0400
> @@ -234,12 +234,12 @@ static void remove_anon_migration_ptes(s
>  	 * We hold the mmap_sem lock. So no need to call page_lock_anon_vma.
>  	 */
>  	anon_vma = (struct anon_vma *) (mapping - PAGE_MAPPING_ANON);
> -	spin_lock(&anon_vma->lock);
> +	read_lock(&anon_vma->rwlock);
>  
>  	list_for_each_entry(vma, &anon_vma->head, anon_vma_node)
>  		remove_migration_pte(vma, old, new);
>  
> -	spin_unlock(&anon_vma->lock);
> +	read_unlock(&anon_vma->rwlock);
>  }
>  
>  /*
> 

-- 
-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  reply	other threads:[~2007-09-17 11:02 UTC|newest]

Thread overview: 77+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2007-09-14 20:53 [PATCH/RFC 0/14] Page Reclaim Scalability Lee Schermerhorn
2007-09-14 20:54 ` [PATCH/RFC 1/14] Reclaim Scalability: Convert anon_vma lock to read/write lock Lee Schermerhorn
2007-09-17 11:02   ` Mel Gorman [this message]
2007-09-18  2:41     ` KAMEZAWA Hiroyuki
2007-09-18 11:01       ` Mel Gorman
2007-09-18 14:57         ` Rik van Riel
2007-09-18 15:37       ` Lee Schermerhorn
2007-09-18 20:17     ` Lee Schermerhorn
2007-09-20 10:19       ` Mel Gorman
2007-09-14 20:54 ` [PATCH/RFC 2/14] Reclaim Scalability: convert inode i_mmap_lock to reader/writer lock Lee Schermerhorn
2007-09-17 12:53   ` Mel Gorman
2007-09-20  1:24   ` Andrea Arcangeli
2007-09-20 14:10     ` Lee Schermerhorn
2007-09-20 14:16       ` Andrea Arcangeli
2007-09-14 20:54 ` [PATCH/RFC 3/14] Reclaim Scalability: move isolate_lru_page() to vmscan.c Lee Schermerhorn
2007-09-14 21:34   ` Peter Zijlstra
2007-09-15  1:55     ` Rik van Riel
2007-09-17 14:11     ` Lee Schermerhorn
2007-09-17  9:20   ` Balbir Singh
2007-09-17 19:19     ` Lee Schermerhorn
2007-09-14 20:54 ` [PATCH/RFC 4/14] Reclaim Scalability: Define page_anon() function Lee Schermerhorn
2007-09-15  2:00   ` Rik van Riel
2007-09-17 13:19   ` Mel Gorman
2007-09-18  1:58   ` KAMEZAWA Hiroyuki
2007-09-18  2:27     ` Rik van Riel
2007-09-18  2:40       ` KAMEZAWA Hiroyuki
2007-09-18 15:04     ` Lee Schermerhorn
2007-09-18 19:41       ` Christoph Lameter
2007-09-19  0:30       ` KAMEZAWA Hiroyuki
2007-09-19 16:58         ` Lee Schermerhorn
2007-09-20  0:56           ` KAMEZAWA Hiroyuki
2007-09-14 20:54 ` [PATCH/RFC 5/14] Reclaim Scalability: Use an indexed array for LRU variables Lee Schermerhorn
2007-09-17 13:40   ` Mel Gorman
2007-09-17 14:17     ` Lee Schermerhorn
2007-09-17 14:39       ` Lee Schermerhorn
2007-09-17 18:58   ` Balbir Singh
2007-09-17 19:12     ` Lee Schermerhorn
2007-09-17 19:36       ` Balbir Singh
2007-09-17 19:36     ` Rik van Riel
2007-09-17 20:21       ` Balbir Singh
2007-09-17 21:01         ` Rik van Riel
2007-09-14 20:54 ` [PATCH/RFC 6/14] Reclaim Scalability: "No Reclaim LRU Infrastructure" Lee Schermerhorn
2007-09-14 22:47   ` Christoph Lameter
2007-09-17 15:17     ` Lee Schermerhorn
2007-09-17 18:41       ` Christoph Lameter
2007-09-18  9:54         ` Mel Gorman
2007-09-18 19:45           ` Christoph Lameter
2007-09-19 11:11             ` Mel Gorman
2007-09-19 18:03               ` Christoph Lameter
2007-09-19  6:00   ` Balbir Singh
2007-09-19 14:47     ` Lee Schermerhorn
2007-09-14 20:54 ` [PATCH/RFC 7/14] Reclaim Scalability: Non-reclaimable page statistics Lee Schermerhorn
2007-09-17  1:56   ` Rik van Riel
2007-09-14 20:54 ` [PATCH/RFC 8/14] Reclaim Scalability: Ram Disk Pages are non-reclaimable Lee Schermerhorn
2007-09-17  1:57   ` Rik van Riel
2007-09-17 14:40     ` Lee Schermerhorn
2007-09-17 18:42       ` Christoph Lameter
2007-09-14 20:54 ` [PATCH/RFC 9/14] Reclaim Scalability: SHM_LOCKED pages are nonreclaimable Lee Schermerhorn
2007-09-17  2:18   ` Rik van Riel
2007-09-14 20:55 ` [PATCH/RFC 10/14] Reclaim Scalability: track anon_vma "related vmas" Lee Schermerhorn
2007-09-17  2:52   ` Rik van Riel
2007-09-17 15:52     ` Lee Schermerhorn
2007-09-14 20:55 ` [PATCH/RFC 11/14] Reclaim Scalability: swap backed pages are nonreclaimable when no swap space available Lee Schermerhorn
2007-09-17  2:53   ` Rik van Riel
2007-09-18 17:46     ` Lee Schermerhorn
2007-09-18 20:01       ` Rik van Riel
2007-09-19 14:55         ` Lee Schermerhorn
2007-09-18  2:59   ` KAMEZAWA Hiroyuki
2007-09-18 15:47     ` Lee Schermerhorn
2007-09-14 20:55 ` [PATCH/RFC 12/14] Reclaim Scalability: Non-reclaimable Mlock'ed pages Lee Schermerhorn
2007-09-14 20:55 ` [PATCH/RFC 13/14] Reclaim Scalability: Handle Mlock'ed pages during map/unmap and truncate Lee Schermerhorn
2007-09-14 20:55 ` [PATCH/RFC 14/14] Reclaim Scalability: cull non-reclaimable anon pages in fault path Lee Schermerhorn
2007-09-14 21:11 ` [PATCH/RFC 0/14] Page Reclaim Scalability Peter Zijlstra
2007-09-14 21:42   ` Linus Torvalds
2007-09-14 22:02     ` Peter Zijlstra
2007-09-15  0:07       ` Linus Torvalds
2007-09-17  6:44 ` Balbir Singh

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20070917110234.GF25706@skynet.ie \
    --to=mel@skynet.ie \
    --cc=a.p.zijlstra@chello.nl \
    --cc=akpm@linux-foundation.org \
    --cc=andrea@suse.de \
    --cc=balbir@linux.vnet.ibm.com \
    --cc=clameter@sgi.com \
    --cc=eric.whitney@hp.com \
    --cc=lee.schermerhorn@hp.com \
    --cc=linux-mm@kvack.org \
    --cc=npiggin@suse.de \
    --cc=riel@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox