linux-mm.kvack.org archive mirror
From: Andrew Morton <akpm@linux-foundation.org>
To: David Howells <dhowells@redhat.com>
Cc: Jakub Jelinek <jakub@redhat.com>,
	Ulrich Drepper <drepper@redhat.com>,
	Andi Kleen <andi@firstfloor.org>, Rik van Riel <riel@redhat.com>,
	Linux Kernel <linux-kernel@vger.kernel.org>,
	linux-mm@kvack.org, Hugh Dickins <hugh@veritas.com>,
	Ingo Molnar <mingo@elte.hu>
Subject: Re: preemption and rwsems (was: Re: missing madvise functionality)
Date: Thu, 5 Apr 2007 12:27:24 -0700
Message-ID: <20070405122724.b1712aa6.akpm@linux-foundation.org>
In-Reply-To: <19526.1175777338@redhat.com>

On Thu, 05 Apr 2007 13:48:58 +0100
David Howells <dhowells@redhat.com> wrote:

> Andrew Morton <akpm@linux-foundation.org> wrote:
> 
> > 
> > What we effectively have is 32 threads on a single CPU all doing
> > 
> > 	for (ever) {
> > 		down_write()
> > 		up_write()
> > 		down_read()
> > 		up_read();
> > 	}
> 
> That's not quite so.  In that test program, most loops do two d/u writes and
> then a slew of d/u reads with virtually no delay between them.  One of the
> write-locked periods possibly lasts a relatively long time (it frees a bunch
> of pages), and the read-locked periods last a potentially long time (have to
> allocate a page).

Whatever.  I think it is still the case that the queueing behaviour of
rwsems causes us to get into this abababababab scenario.  And a single,
sole, solitary cond_resched() is sufficient to trigger the whole process,
and once it has started, it is sustained.

> If they weren't, you'd have to expect writer starvation in this situation.  As
> it is, you're guaranteed progress on all threads.
> 
> > CONFIG_PREEMPT_VOLUNTARY=y
> 
> Which means the periods of lock-holding can be extended by preemption of the
> lock holder(s), making the whole situation that much worse.  You have to
> remember, you *can* be preempted whilst you hold a semaphore, rwsem or mutex.

Of course - the same thing happens with CONFIG_PREEMPT=y.

> > I run it all on a single CPU under `taskset -c 0' on the 8-way and it still
> > causes 160,000 context switches per second and takes 9.5 seconds (after
> > s/100000/1000/).
> 
> How about if you have a UP kernel?  (ie: spinlocks -> nops)

dunno.

> > the context switch rate falls to zilch and total runtime falls to 6.4
> > seconds.
> 
> I presume you don't mean literally zero.

I do.  At least, I was unable to discern any increase in the context-switch
column in the `vmstat 1' output.
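
For anyone wanting to check this from inside a test program rather than by
watching `vmstat 1', here is a minimal sketch using getrusage(2).  Note the
difference in scope: vmstat's column is system-wide, while these counters
cover only the calling process (all of its threads).  The struct and function
names (ctxsw, ctxsw_read, ctxsw_report) are made up for illustration and are
not from the test program under discussion.

```c
#include <stdio.h>
#include <sys/resource.h>

/* Snapshot of this process's context-switch counters (hypothetical helper type). */
struct ctxsw {
	long voluntary;		/* ru_nvcsw: the task blocked or yielded */
	long involuntary;	/* ru_nivcsw: the task was preempted */
};

/* Fill *out from getrusage(RUSAGE_SELF); returns 0 on success, -1 on error. */
int ctxsw_read(struct ctxsw *out)
{
	struct rusage ru;

	if (getrusage(RUSAGE_SELF, &ru) != 0)
		return -1;
	out->voluntary = ru.ru_nvcsw;
	out->involuntary = ru.ru_nivcsw;
	return 0;
}

/* Print how much each counter grew between two snapshots. */
void ctxsw_report(const struct ctxsw *before, const struct ctxsw *after)
{
	printf("voluntary: %ld, involuntary: %ld\n",
	       after->voluntary - before->voluntary,
	       after->involuntary - before->involuntary);
}
```

Take one snapshot before the locking loop and one after, then pass both to
ctxsw_report().  Splitting voluntary from involuntary switches helps tell
sleeping on a contended lock apart from plain timeslice expiry.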

> > If that cond_resched() was not there, none of this would ever happen - each
> > thread merrily chugs away doing its ups and downs until it expires its
> > timeslice.  Interesting, in a sad sort of way.
> 
> The trouble is, I think, that you spend so much more time holding (or
> attempting to hold) locks than not, and preemption just exacerbates things.

No.  Preemption *triggers* things.  We're talking about an increase in
context switch rate by a factor of at least 10,000.  Something changed in a
fundamental way.

> I suspect that the reason the problem doesn't seem so obvious when you've got
> 8 CPUs crunching their way through at once is probably because you can make
> progress on several read loops simultaneously fast enough that the preemption
> is lost in the things having to stop to give everyone writelocks.

The context switch rate is enormous on SMP on all kernel configs.  Perhaps
a better way of looking at it is to observe that the special case of a
single processor running a non-preemptible kernel simply got lucky.

> But short of recording the lock sequence, I don't think there's any way to find
> out for sure.  printk probably won't cut it as a recording mechanism because
> its overheads are too great.

I think any code sequence which does

	for ( ; ; ) {
		down_write();
		up_write();
		down_read();
		up_read();
	}

is vulnerable to the artifact which I described.


I don't think we can (or should) do anything about it at the lock
implementation level.  It's more a matter of being aware of the possible
failure modes of rwsems, and being more careful to avoid that situation in
the code which uses rwsems.  And, of course, being careful about when and
where we use rwsems as opposed to other types of locks.


