From: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
To: Rik van Riel <riel@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
Andrea Arcangeli <andrea@suse.de>,
linux-mm@kvack.org, Nick Dokos <nicholas.dokos@hp.com>
Subject: Re: [PATCH 01 of 16] remove nr_scan_inactive/active
Date: Fri, 29 Jun 2007 09:38:29 -0400
Message-ID: <1183124309.5037.31.camel@localhost>
In-Reply-To: <468439E8.4040606@redhat.com>
On Thu, 2007-06-28 at 18:44 -0400, Rik van Riel wrote:
> Andrew Morton wrote:
>
> > Where's the system time being spent?
>
> OK, it turns out that there is quite a bit of variability
> in where the system spends its time. I did a number of
> reaim runs and averaged the time the system spent in the
> top functions.
>
> This is with the Fedora rawhide kernel config, which has
> quite a few debugging options enabled.
>
> _raw_spin_lock 32.0%
> page_check_address 12.7%
> __delay 10.8%
> mwait_idle 10.4%
> anon_vma_unlink 5.7%
> __anon_vma_link 5.3%
> lockdep_reset_lock 3.5%
> __kmalloc_node_track_caller 2.8%
> security_port_sid 1.8%
> kfree 1.6%
> anon_vma_link 1.2%
> page_referenced_one 1.1%
>
> In short, the system is waiting on the anon_vma lock.
>
> I wonder if Lee Schermerhorn's patch to turn that
> spinlock into an rwlock would help this workload,
> or if we simply should scan fewer pages in the
> pageout code.
>

Rik:

Here's a fairly recent version of the patch if you want to try it on
your workload.  We've seen mixed results on somewhat larger systems,
with and without your split LRU patch.  I've started writing up those
results.  I'll try to get back to finishing up the writeup after OLS and
vacation.

Regards,
Lee

-----------
Patch against 2.6.22-rc4-mm2

Make the anon_vma list lock a read/write lock.  The heaviest use of this
lock is in the page_referenced()/try_to_unmap() calls from vmscan
[shrink_page_list()].  These functions only walk the vma list, so they
can take a read lock, which allows some parallelism when different CPUs
try to reclaim pages mapped via the same set of vmas.  Paths that modify
the list still take the lock for writing.

This change should not alter the footprint of struct anon_vma in the
non-debug case.

Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>

 include/linux/rmap.h |    9 ++++++---
 mm/migrate.c         |    4 ++--
 mm/mmap.c            |    4 ++--
 mm/rmap.c            |   20 ++++++++++----------
 4 files changed, 20 insertions(+), 17 deletions(-)

Index: Linux/include/linux/rmap.h
===================================================================
--- Linux.orig/include/linux/rmap.h 2007-06-11 14:39:56.000000000 -0400
+++ Linux/include/linux/rmap.h 2007-06-20 09:49:24.000000000 -0400
@@ -24,7 +24,7 @@
* pointing to this anon_vma once its vma list is empty.
*/
struct anon_vma {
- spinlock_t lock; /* Serialize access to vma list */
+ rwlock_t rwlock; /* Serialize access to vma list */
struct list_head head; /* List of private "related" vmas */
};
@@ -42,18 +42,21 @@ static inline void anon_vma_free(struct
kmem_cache_free(anon_vma_cachep, anon_vma);
}
+/*
+ * This needs to be a write lock for __vma_link()
+ */
static inline void anon_vma_lock(struct vm_area_struct *vma)
{
struct anon_vma *anon_vma = vma->anon_vma;
if (anon_vma)
- spin_lock(&anon_vma->lock);
+ write_lock(&anon_vma->rwlock);
}
static inline void anon_vma_unlock(struct vm_area_struct *vma)
{
struct anon_vma *anon_vma = vma->anon_vma;
if (anon_vma)
- spin_unlock(&anon_vma->lock);
+ write_unlock(&anon_vma->rwlock);
}
/*
Index: Linux/mm/rmap.c
===================================================================
--- Linux.orig/mm/rmap.c 2007-06-11 14:40:06.000000000 -0400
+++ Linux/mm/rmap.c 2007-06-20 09:50:27.000000000 -0400
@@ -25,7 +25,7 @@
* mm->mmap_sem
* page->flags PG_locked (lock_page)
* mapping->i_mmap_lock
- * anon_vma->lock
+ * anon_vma->rwlock
* mm->page_table_lock or pte_lock
* zone->lru_lock (in mark_page_accessed, isolate_lru_page)
* swap_lock (in swap_duplicate, swap_info_get)
@@ -85,7 +85,7 @@ int anon_vma_prepare(struct vm_area_stru
if (anon_vma) {
allocated = NULL;
locked = anon_vma;
- spin_lock(&locked->lock);
+ write_lock(&locked->rwlock);
} else {
anon_vma = anon_vma_alloc();
if (unlikely(!anon_vma))
@@ -104,7 +104,7 @@ int anon_vma_prepare(struct vm_area_stru
spin_unlock(&mm->page_table_lock);
if (locked)
- spin_unlock(&locked->lock);
+ write_unlock(&locked->rwlock);
if (unlikely(allocated))
anon_vma_free(allocated);
}
@@ -132,10 +132,10 @@ void anon_vma_link(struct vm_area_struct
struct anon_vma *anon_vma = vma->anon_vma;
if (anon_vma) {
- spin_lock(&anon_vma->lock);
+ write_lock(&anon_vma->rwlock);
list_add_tail(&vma->anon_vma_node, &anon_vma->head);
validate_anon_vma(vma);
- spin_unlock(&anon_vma->lock);
+ write_unlock(&anon_vma->rwlock);
}
}
@@ -147,13 +147,13 @@ void anon_vma_unlink(struct vm_area_stru
if (!anon_vma)
return;
- spin_lock(&anon_vma->lock);
+ write_lock(&anon_vma->rwlock);
validate_anon_vma(vma);
list_del(&vma->anon_vma_node);
/* We must garbage collect the anon_vma if it's empty */
empty = list_empty(&anon_vma->head);
- spin_unlock(&anon_vma->lock);
+ write_unlock(&anon_vma->rwlock);
if (empty)
anon_vma_free(anon_vma);
@@ -164,7 +164,7 @@ static void anon_vma_ctor(void *data, st
{
struct anon_vma *anon_vma = data;
- spin_lock_init(&anon_vma->lock);
+ rwlock_init(&anon_vma->rwlock);
INIT_LIST_HEAD(&anon_vma->head);
}
@@ -191,7 +191,7 @@ static struct anon_vma *page_lock_anon_v
goto out;
anon_vma = (struct anon_vma *) (anon_mapping - PAGE_MAPPING_ANON);
- spin_lock(&anon_vma->lock);
+ read_lock(&anon_vma->rwlock);
return anon_vma;
out:
rcu_read_unlock();
@@ -200,7 +200,7 @@ out:
static void page_unlock_anon_vma(struct anon_vma *anon_vma)
{
- spin_unlock(&anon_vma->lock);
+ read_unlock(&anon_vma->rwlock);
rcu_read_unlock();
}
Index: Linux/mm/mmap.c
===================================================================
--- Linux.orig/mm/mmap.c 2007-06-20 09:39:03.000000000 -0400
+++ Linux/mm/mmap.c 2007-06-20 09:49:24.000000000 -0400
@@ -571,7 +571,7 @@ again: remove_next = 1 + (end > next->
if (vma->anon_vma)
anon_vma = vma->anon_vma;
if (anon_vma) {
- spin_lock(&anon_vma->lock);
+ write_lock(&anon_vma->rwlock);
/*
* Easily overlooked: when mprotect shifts the boundary,
* make sure the expanding vma has anon_vma set if the
@@ -625,7 +625,7 @@ again: remove_next = 1 + (end > next->
}
if (anon_vma)
- spin_unlock(&anon_vma->lock);
+ write_unlock(&anon_vma->rwlock);
if (mapping)
spin_unlock(&mapping->i_mmap_lock);
Index: Linux/mm/migrate.c
===================================================================
--- Linux.orig/mm/migrate.c 2007-06-20 09:39:04.000000000 -0400
+++ Linux/mm/migrate.c 2007-06-20 09:49:24.000000000 -0400
@@ -228,12 +228,12 @@ static void remove_anon_migration_ptes(s
* We hold the mmap_sem lock. So no need to call page_lock_anon_vma.
*/
anon_vma = (struct anon_vma *) (mapping - PAGE_MAPPING_ANON);
- spin_lock(&anon_vma->lock);
+ read_lock(&anon_vma->rwlock);
list_for_each_entry(vma, &anon_vma->head, anon_vma_node)
remove_migration_pte(vma, old, new);
- spin_unlock(&anon_vma->lock);
+ read_unlock(&anon_vma->rwlock);
}
/*