* rmap spin_trylock success rates
From: Andrew Morton @ 2004-05-01 7:54 UTC
To: Hugh Dickins; +Cc: linux-mm
I applied the appended patch to determine the spin_trylock success rate in
rmap.c. The machine is a single P4-HT pseudo-2-way with 256MB of memory, and
the workload is a straightforward `usemem -m 400': allocate and touch 400MB
of memory.
page_referenced_one_miss = 4027
page_referenced_one_hit = 212605
try_to_unmap_one_miss = 3257
try_to_unmap_one_hit = 61153
That's a 5% failure rate in try_to_unmap_one()'s spin_trylock().
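(As a quick check of that arithmetic, taking miss rate = miss / (miss + hit) -
a trivial userspace program, not part of the patch:)

#include <stdio.h>

int main(void)
{
	/* counters reported above */
	double pr_miss = 4027, pr_hit = 212605;
	double unmap_miss = 3257, unmap_hit = 61153;

	printf("page_referenced_one miss rate: %.1f%%\n",
	       100.0 * pr_miss / (pr_miss + pr_hit));		/* ~1.9% */
	printf("try_to_unmap_one miss rate:    %.1f%%\n",
	       100.0 * unmap_miss / (unmap_miss + unmap_hit));	/* ~5.1% */
	return 0;
}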
I suspect this is the reason for the problem which Martin Schwidefsky
reported a while back: this particular workload only achieves half the disk
bandwidth on SMP when compared with UP. I poked around with that a bit at
the time and determined that it was due to poor I/O submission patterns.
Increasing the disk queue from 128 slots to 1024 fixed it completely
because the request queue fixed up the bad I/O submission patterns.
With `./qsbench -p 4 -m 96':
page_referenced_one_miss = 401
page_referenced_one_hit = 1224748
try_to_unmap_one_miss = 103
try_to_unmap_one_hit = 339944
That's negligible.
I don't think we really need to do anything about this - the
everything-in-one-mm case isn't the most interesting situation.
In a way it's an argument for serialising the whole page reclaim path.
It'd be nice if we could reduce the page_table_lock hold times in there.
hm, now I'm confused. We're running try_to_unmap_one() under anonhd->lock,
so why is there any contention for page_table_lock at all with this
workload? It must be contending with page_referenced_one(). Taking
anonhd->lock in page_referenced_one() also might fix this up.
25-akpm/mm/rmap.c | 21 +++++++++++++++++++--
1 files changed, 19 insertions(+), 2 deletions(-)
diff -puN mm/rmap.c~rmap-trylock-instrumentation mm/rmap.c
--- 25/mm/rmap.c~rmap-trylock-instrumentation 2004-05-01 00:19:05.485768648 -0700
+++ 25-akpm/mm/rmap.c 2004-05-01 00:22:32.029369248 -0700
@@ -27,6 +27,15 @@
#include <asm/tlbflush.h>
+static struct stats {
+ int page_referenced_one_miss;
+ int page_referenced_one_hit;
+ int try_to_unmap_one_miss;
+ int try_to_unmap_one_hit;
+ int try_to_unmap_cluster_miss;
+ int try_to_unmap_cluster_hit;
+} stats;
+
/*
* struct anonmm: to track a bundle of anonymous memory mappings.
*
@@ -178,6 +187,7 @@ static int page_referenced_one(struct pa
int referenced = 0;
if (!spin_trylock(&mm->page_table_lock)) {
+ stats.page_referenced_one_miss++;
/*
* For debug we're currently warning if not all found,
* but in this case that's expected: suppress warning.
@@ -185,6 +195,7 @@ static int page_referenced_one(struct pa
(*failed)++;
return 0;
}
+ stats.page_referenced_one_hit++;
pgd = pgd_offset(mm, address);
if (!pgd_present(*pgd))
@@ -495,8 +506,11 @@ static int try_to_unmap_one(struct page
* We need the page_table_lock to protect us from page faults,
* munmap, fork, etc...
*/
- if (!spin_trylock(&mm->page_table_lock))
+ if (!spin_trylock(&mm->page_table_lock)) {
+ stats.try_to_unmap_one_miss++;
goto out;
+ }
+ stats.try_to_unmap_one_hit++;
pgd = pgd_offset(mm, address);
if (!pgd_present(*pgd))
@@ -596,8 +610,11 @@ static int try_to_unmap_cluster(struct m
* We need the page_table_lock to protect us from page faults,
* munmap, fork, etc...
*/
- if (!spin_trylock(&mm->page_table_lock))
+ if (!spin_trylock(&mm->page_table_lock)) {
+ stats.try_to_unmap_cluster_miss++;
return SWAP_FAIL;
+ }
+ stats.try_to_unmap_cluster_hit++;
address = (vma->vm_start + cursor) & CLUSTER_MASK;
end = address + CLUSTER_SIZE;
_
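(The patch as posted doesn't show how the counters are read back out;
presumably that was done with a debugger or an ad-hoc printk. A hypothetical
dump helper, not part of the patch and with the hook point left to taste,
might look like:)

static void rmap_stats_dump(void)
{
	/* Hypothetical helper: dump the instrumentation counters to the
	 * kernel log.  Could be called from a sysrq handler or
	 * periodically from the reclaim path. */
	printk("page_referenced_one:  miss=%d hit=%d\n",
	       stats.page_referenced_one_miss, stats.page_referenced_one_hit);
	printk("try_to_unmap_one:     miss=%d hit=%d\n",
	       stats.try_to_unmap_one_miss, stats.try_to_unmap_one_hit);
	printk("try_to_unmap_cluster: miss=%d hit=%d\n",
	       stats.try_to_unmap_cluster_miss, stats.try_to_unmap_cluster_hit);
}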
* Re: rmap spin_trylock success rates
From: Hugh Dickins @ 2004-05-02 8:42 UTC
To: Andrew Morton; +Cc: linux-mm
On Sat, 1 May 2004, Andrew Morton wrote:
>
> I applied the appended patch to determine the spin_trylock success rate in
> rmap.c.
Good, thank you, this needed doing.
I think you want to extend it to the spin_trylocks on i_shared_lock too.
> The machine is a single P4-HT pseudo-2-way with 256MB of memory, and
> the workload is a straightforward `usemem -m 400': allocate and touch 400MB
> of memory.
>
> page_referenced_one_miss = 4027
> page_referenced_one_hit = 212605
> try_to_unmap_one_miss = 3257
> try_to_unmap_one_hit = 61153
>
> That's a 5% failure rate in try_to_unmap_one()'s spin_trylock().
Considerably better than I feared.
> I suspect this is the reason for the problem which Martin Schwidefsky
> reported a while back: this particular workload only achieves half the disk
> bandwidth on SMP when compared with UP. I poked around with that a bit at
> the time and determined that it was due to poor I/O submission patterns.
> Increasing the disk queue from 128 slots to 1024 fixed it completely
> because the request queue fixed up the bad I/O submission patterns.
Well, I'm glad you can make such connections! My understanding went so
far as to wonder whether you'd cut'n'pasted that paragraph in by mistake!
> With `./qsbench -p 4 -m 96':
>
> page_referenced_one_miss = 401
> page_referenced_one_hit = 1224748
> try_to_unmap_one_miss = 103
> try_to_unmap_one_hit = 339944
>
> That's negligible.
>
> I don't think we really need to do anything about this - the
> everything-in-one-mm case isn't the most interesting situation.
Yes, nothing to worry about in these particular figures.
But probably somewhere out there is an app with a feedback
effect which really makes them bad. Later on I would like
to go back and see what can be done about missed trylocks.
> In a way it's an argument for serialising the whole page reclaim path.
>
> It'd be nice if we could reduce the page_table_lock hold times in there.
>
> hm, now I'm confused. We're running try_to_unmap_one() under anonhd->lock,
> so why is there any contention for page_table_lock at all with this
> workload? It must be contending with page_referenced_one(). Taking
> anonhd->lock in page_referenced_one() also might fix this up.
page_referenced_anon does already take the anonhd->lock. If try_to_unmap
is actually unmapping pages, won't that workload want to fault them back
in again later, hence page_table_lock contention with faulting?
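(For context: in 2.6 kernels of this era the fault path takes
mm->page_table_lock around the page-table walk, which is exactly what the
reclaim-side trylocks collide with. A much simplified sketch of
handle_mm_fault(), for illustration only and not verbatim kernel code:)

int handle_mm_fault(struct mm_struct *mm, struct vm_area_struct *vma,
		    unsigned long address, int write_access)
{
	/* Simplified: every fault holds mm->page_table_lock while it
	 * walks and populates the page tables, so a task busily
	 * faulting pages in will make the reclaim-side spin_trylock()
	 * calls miss. */
	pgd_t *pgd = pgd_offset(mm, address);
	pmd_t *pmd;
	pte_t *pte;

	spin_lock(&mm->page_table_lock);
	pmd = pmd_alloc(mm, pgd, address);
	if (pmd) {
		pte = pte_alloc_map(mm, pmd, address);
		if (pte)
			return handle_pte_fault(mm, vma, address,
						write_access, pte, pmd);
	}
	spin_unlock(&mm->page_table_lock);
	return VM_FAULT_OOM;
}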
Hugh
* Re: rmap spin_trylock success rates
From: Andrew Morton @ 2004-05-02 9:36 UTC
To: Hugh Dickins; +Cc: linux-mm
Hugh Dickins <hugh@veritas.com> wrote:
>
> On Sat, 1 May 2004, Andrew Morton wrote:
> >
> > I applied the appended patch to determine the spin_trylock success rate in
> > rmap.c.
>
> Good, thank you, this needed doing.
> I think you want to extend it to the spin_trylocks on i_shared_lock too.
hm.
> > The machine is a single P4-HT pseudo-2-way with 256MB of memory, and
> > the workload is a straightforward `usemem -m 400': allocate and touch 400MB
> > of memory.
> >
> > page_referenced_one_miss = 4027
> > page_referenced_one_hit = 212605
> > try_to_unmap_one_miss = 3257
> > try_to_unmap_one_hit = 61153
> >
> > That's a 5% failure rate in try_to_unmap_one()'s spin_trylock().
>
> Considerably better than I feared.
It's only a pretend 2-way. Bigger machines may be worse.
But the workload isn't very interesting.
> > I suspect this is the reason for the problem which Martin Schwidefsky
> > reported a while back: this particular workload only achieves half the disk
> > bandwidth on SMP when compared with UP. I poked around with that a bit at
> > the time and determined that it was due to poor I/O submission patterns.
> > Increasing the disk queue from 128 slots to 1024 fixed it completely
> > because the request queue fixed up the bad I/O submission patterns.
>
> Well, I'm glad you can make such connections! My understanding went so
> far as to wonder whether you'd cut'n'pasted that paragraph in by mistake!
I saw the debug output. It was blocks of 1000-odd successive writepages
followed by blobs of some tens of missed trylocks. Those missed trylocks
result in the pages being refiled on the LRU, so we writepage them later
on, when the disk head is elsewhere. (hm. If we unmap the page from
swapcache on missed trylock that won't happen).
If the CPU which holds page_table_lock wanders off to service an interrupt
we'll miss the lock a lot of times in succession.
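(Simplified, for illustration, and not verbatim vmscan.c: the miss propagates
out of try_to_unmap_one() as SWAP_AGAIN, and the shrink_list() side then keeps
the page on the LRU rather than writing it now, so its writepage happens on a
later scan - roughly this fragment:)

	switch (try_to_unmap(page)) {
	case SWAP_FAIL:
		goto activate_locked;	/* back onto the active list */
	case SWAP_AGAIN:
		goto keep_locked;	/* refiled; retried on a later pass */
	case SWAP_SUCCESS:
		;			/* fall through: write it out now */
	}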
> If try_to_unmap
> is actually unmapping pages, won't that workload want to fault them back
> in again later, hence page_table_lock contention with faulting?
The workload is just a linear wipe, so it'll be faults against new pages,
yup.