linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* Contention on ->i_shared_lock in dup_mmap()
@ 2000-06-08  3:45 Alexander Viro
  2000-06-08 14:19 ` Stephen C. Tweedie
  2000-06-08 16:21 ` Manfred Spraul
  0 siblings, 2 replies; 4+ messages in thread
From: Alexander Viro @ 2000-06-08  3:45 UTC (permalink / raw)
  To: linux-mm

tests with -ac10 + dcache-ac10-MM1 and results are interesting: most of
contention comes from the dup_mmap().
109 dup_mmap:->i_shared_lock
48 do_syslog(case 5):console_lock
25 d_lookup:dcache_lock
21 enable_irq:desc->lock
20 bdget:bdev_lock
13 bdput:bdev_lock
9 __set_personality:->exit_sem
8 get_empty_filp:files_lock
7 insert_into_queues:hash_table_lock
...

OK, do_syslog() is just plain silly - it's resetting the buffer and code
in question looks so:
                spin_lock_irq(&console_lock);
                logged_chars = 0;
                spin_unlock_irq(&console_lock);
... which is for all purposes equivalent to
		if (logged_chars) {
			...
		}
so this one is easy (looks like a klogd silliness).

dcache_lock may need splitting. Or maybe not - I want to see more testing
results before going there.

bdget() and bdput() are my fault (bad hash-function and too small hash
table). Fixable.

__set_personality() one is actually a bug (it shouldn't be called at all
in the tests I've run) and that's also on todo list.

However, all that stuff pales compared to dup_mmap() one. What happens
there is that we copy all VMAs and insert them into ->i_shared lists of
their inodes. Which requires ->i_shared_lock and that amounts to visible
contention. Notice that most of the calls are followed by exec() and thus
by exit_mmap(), which merrily removes all these VMAs from their lists and
frees them.

Proposal: let's take the head of ->mmap out of the mm_struct, add
reference counter and allow the thing to be shared between different
mm_struct. Rules:
	a) whenever we take ->mmap_sem, take a semaphore on that new
structure (in principle that may make some uses of ->mmap_sem unneeded,
but that's another story).
	b) if we are going to modify the ->mmap (which requires ->mmap_sem
taken) && ->mmap is shared - create a private copy and use it (decrement
the counter on old one, indeed).
	c) fork() should just share the ->mmap with parent.
	d) exec() should drop the reference to ->mmap, killing it if we
were the sole owners.

	In effect it's COW for ->mmap. Comments?

PS: yes, the big lock was _way_ down the list - nowhere near the top ;-)

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux.eu.org/Linux-MM/

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: Contention on ->i_shared_lock in dup_mmap()
  2000-06-08  3:45 Contention on ->i_shared_lock in dup_mmap() Alexander Viro
@ 2000-06-08 14:19 ` Stephen C. Tweedie
  2000-06-08 16:21 ` Manfred Spraul
  1 sibling, 0 replies; 4+ messages in thread
From: Stephen C. Tweedie @ 2000-06-08 14:19 UTC (permalink / raw)
  To: Alexander Viro; +Cc: linux-mm, Stephen Tweedie

Hi,

On Wed, Jun 07, 2000 at 11:45:58PM -0400, Alexander Viro wrote:
> 
> 	In effect it's COW for ->mmap. Comments?

Sounds plausible, definitely.  However, for the lock to be shared by
mm's which share the mmap list, we need the mmap semaphore to be in
the shared structure.  What then locks the pointer from the mm to the
mmap struct?  In other words, if two tasks share the mmap list and
one blocks on the lock in the mmap list, there's no guarantee that it
still owns the lock when it wakes up.

Think about mm A (with threads X, Y, and Z):

	X forks
	Y starts a mmap COW
	Y goes for the mmap lock (still shared between X, Y and Z)
	Z goes for the mmap lock (still shared between X, Y and Z) 
	X does COW, goes for the mmap lock
	Y finishes, unlocks the two mmaps

Y and Z now have a different mmap, but Z is still waiting
for the old mmap lock on X's mmap list!

If the next thing X does is to exit, that exit may get the lock
before Z gets scheduled, so the mmap lock is deallocated while Z
is still waiting for it.

In other words, to make it work I think you also need to bump up the
refcount on the mmap struct when you wait on the mmap semaphore.  Is
there a danger, then, that we'll be doing unnecessary COW due to the
temporarily-raised refcount?

--Stephen
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux.eu.org/Linux-MM/

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: Contention on ->i_shared_lock in dup_mmap()
  2000-06-08  3:45 Contention on ->i_shared_lock in dup_mmap() Alexander Viro
  2000-06-08 14:19 ` Stephen C. Tweedie
@ 2000-06-08 16:21 ` Manfred Spraul
  2000-06-08 23:07   ` Alexander Viro
  1 sibling, 1 reply; 4+ messages in thread
From: Manfred Spraul @ 2000-06-08 16:21 UTC (permalink / raw)
  To: Alexander Viro; +Cc: linux-mm

Alexander Viro wrote:
> 
> OK, do_syslog() is just plain silly - it's resetting the buffer and code
> in question looks so:
>                 spin_lock_irq(&console_lock);
>                 logged_chars = 0;
>                 spin_unlock_irq(&console_lock);
> ... which is for all purposes equivalent to
>                 if (logged_chars) {
>                         ...
>                 }
> so this one is easy (looks like a klogd silliness).
> 
cpu0: do_syslog
	logged_chars = 0;

cpu1: printk()
	logged_chars++;

We really need the spinlock :-(

How many cpu's? How did you measure the contention?
I cannot imagine that do_syslog is really the source for the contention,
perhaps cpu0 was executing a slow console switch, and cpu1 called
do_syslog?

--
	Manfred
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux.eu.org/Linux-MM/

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: Contention on ->i_shared_lock in dup_mmap()
  2000-06-08 16:21 ` Manfred Spraul
@ 2000-06-08 23:07   ` Alexander Viro
  0 siblings, 0 replies; 4+ messages in thread
From: Alexander Viro @ 2000-06-08 23:07 UTC (permalink / raw)
  To: Manfred Spraul; +Cc: linux-mm


On Thu, 8 Jun 2000, Manfred Spraul wrote:

> Alexander Viro wrote:
> > 
> > OK, do_syslog() is just plain silly - it's resetting the buffer and code
> > in question looks so:
> >                 spin_lock_irq(&console_lock);
> >                 logged_chars = 0;
> >                 spin_unlock_irq(&console_lock);
> > ... which is for all purposes equivalent to
> >                 if (logged_chars) {
> >                         ...
> >                 }
> > so this one is easy (looks like a klogd silliness).
> > 
> cpu0: do_syslog
> 	logged_chars = 0;
> 
> cpu1: printk()
> 	logged_chars++;
> 
> We really need the spinlock :-(

And? With the current code:

cpu0:					cpu1:
	spin_lock_irq
	logged_chars = 0
	spin_unlock_irq
					spin_lock_irq
					...
					logged_chars++;
	break;

IOW, you have no warranty that upon the exit from do_syslog() you will
have logged_chars == 0 and that's precisely the same as you will get if
you replace the code with
	if (logged_chars) {
		spin_lock_irq(&console_lock);
		logged_chars = 0;
		spin_unlock_irq(&console_lock);
	}
- if logged_chars was non-zero we are getting the same result anyway and
if it was and something was in the middle of a changing it to non-zero -
fine, we just act as if do_syslog() happened before that.

As for the source of contention... beats me. _Probably_ weird mutated
klogd, but I didn't look at the patches RH slapped on it. syslog(5,...) is
damn silly anyway - it's an unavoidable race.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux.eu.org/Linux-MM/

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2000-06-08 23:07 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2000-06-08  3:45 Contention on ->i_shared_lock in dup_mmap() Alexander Viro
2000-06-08 14:19 ` Stephen C. Tweedie
2000-06-08 16:21 ` Manfred Spraul
2000-06-08 23:07   ` Alexander Viro

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox