* Re: locking question: do_mmap(), do_munmap()
       [not found] <Pine.LNX.4.10.9910101713010.364-100000@alpha.random>
@ 1999-10-10 15:52 ` Manfred Spraul
  1999-10-10 16:07   ` Alexander Viro
  0 siblings, 1 reply; 51+ messages in thread

From: Manfred Spraul @ 1999-10-10 15:52 UTC (permalink / raw)
To: Andrea Arcangeli; +Cc: linux-kernel, linux-mm

Andrea Arcangeli wrote:
>
> On Sun, 10 Oct 1999, Manfred Spraul wrote:
>
> > But this means that both locks are required if you modify the vma list.
> > Single reader, multiple writer synchronization. Unusual, but interesting
> > :-)
>
> Yes, that's always been this way also in 2.2.x.

And which lock protects "mm->rss"? It's not an atomic variable, but it is

* increased by do_swap_page() outside lock_kernel;
* decreased by the swapper.

I've started adding "assert_down()" and "assert_kernellocked()" macros,
and now I don't see the login prompt any more...

E.g. sys_mprotect calls merge_segments without lock_kernel().

--
	Manfred

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to
majordomo@kvack.org.  For more info on Linux MM, see:
http://humbolt.geo.uu.nl/Linux-MM/

^ permalink raw reply	[flat|nested] 51+ messages in thread
* Re: locking question: do_mmap(), do_munmap()
  1999-10-10 15:52 ` locking question: do_mmap(), do_munmap() Manfred Spraul
@ 1999-10-10 16:07   ` Alexander Viro
  1999-10-10 16:25     ` Alexander Viro
  ` (2 more replies)
  0 siblings, 3 replies; 51+ messages in thread

From: Alexander Viro @ 1999-10-10 16:07 UTC (permalink / raw)
To: Manfred Spraul; +Cc: Andrea Arcangeli, linux-kernel, Ingo Molnar, linux-mm

[Cc'd to mingo]

On Sun, 10 Oct 1999, Manfred Spraul wrote:

> I've started adding "assert_down()" and "assert_kernellocked()" macros,
> and now I don't see the login prompt any more...
>
> eg. sys_mprotect calls merge_segments without lock_kernel().

Manfred, Andrea - please stop it. Yes, it does and yes, it should.
Plonking the big lock around every access to VM is _not_ a solution. If
the swapper doesn't use mmap_sem - _the swapper_ should be fixed. How the
hell does lock_kernel() have smaller deadlock potential than
down(&mm->mmap_sem)? If you want to return to 2.2 you know where to
find it.
* Re: locking question: do_mmap(), do_munmap()
  1999-10-10 16:07 ` Alexander Viro
@ 1999-10-10 16:25   ` Alexander Viro
  1999-10-10 16:45     ` Manfred Spraul
  1999-10-10 17:12     ` Andrea Arcangeli
  1999-10-10 16:56   ` Andrea Arcangeli
  1999-10-11 15:41   ` Stephen C. Tweedie
  2 siblings, 2 replies; 51+ messages in thread

From: Alexander Viro @ 1999-10-10 16:25 UTC (permalink / raw)
To: Manfred Spraul; +Cc: Andrea Arcangeli, linux-kernel, Ingo Molnar, linux-mm

On Sun, 10 Oct 1999, Alexander Viro wrote:

>
> [Cc'd to mingo]
>
> On Sun, 10 Oct 1999, Manfred Spraul wrote:
>
> > I've started adding "assert_down()" and "assert_kernellocked()" macros,
> > and now I don't see the login prompt any more...
> >
> > eg. sys_mprotect calls merge_segments without lock_kernel().
>
> Manfred, Andrea - please stop it. Yes, it does and yes, it should.
> Plonking the big lock around every access to VM is _not_ a solution. If
> swapper doesn't use mmap_sem - _swapper_ should be fixed. How the hell
> does lock_kernel() have smaller deadlock potential than
> down(&mm->mmap_sem)?

OK, folks. The code in the swapper (unuse_process(), right?) is called
only from sys_swapoff(). It's a syscall. Andrea, could you show a
scenario for a deadlock here? OK, some process (but not the process
doing swapoff()) may have the map locked. So? It is not going to
release the thing - we are seriously screwed anyway (read: we already
are in a deadlock). We don't hold the semaphore ourselves. Andrea, post
a deadlock scenario, please.
* Re: locking question: do_mmap(), do_munmap()
  1999-10-10 16:25 ` Alexander Viro
@ 1999-10-10 16:45   ` Manfred Spraul
  1999-10-10 17:25     ` Alexander Viro
  1999-10-10 17:12   ` Andrea Arcangeli
  1 sibling, 1 reply; 51+ messages in thread

From: Manfred Spraul @ 1999-10-10 16:45 UTC (permalink / raw)
To: Alexander Viro; +Cc: Andrea Arcangeli, linux-kernel, Ingo Molnar, linux-mm

Alexander Viro wrote:
>
> On Sun, 10 Oct 1999, Alexander Viro wrote:
>
> > On Sun, 10 Oct 1999, Manfred Spraul wrote:
> >
> > > I've started adding "assert_down()" and "assert_kernellocked()" macros,
> > > and now I don't see the login prompt any more...
> > >
> > > eg. sys_mprotect calls merge_segments without lock_kernel().
> >
> > Manfred, Andrea - please stop it. Yes, it does and yes, it should.

Yes, it should cause an oops?

> > Plonking the big lock around every access to VM is _not_ a solution.

I never did that, and I'll never do that; I only notice that the current
code is filled with races.

> > If swapper doesn't use mmap_sem - _swapper_ should be fixed. How the
> > hell does lock_kernel() have smaller deadlock potential than
> > down(&mm->mmap_sem)?

lock_kernel() is dropped on a thread switch; the semaphore is not
dropped.

> > OK, folks. Code in swapper (unuse_process(), right?) is called only from
> > sys_swapoff(). It's a syscall. Andrea, could you show a scenario for
> > deadlock here? OK, some process (but not the process doing swapoff()) may
> > have the map locked. So? it is not going to release the thing - we are
> > seriously screwed anyway (read: we already are in deadlock). We don't hold
> > the semaphore ourselves.

AFAIK the problem is OOM:

* A process accesses a not-present page, i.e. a page fault:
	...
	handle_mm_fault():	this process owns mm->mmap_sem.
	-> handle_pte_fault()
	-> (e.g.) do_wp_page()
	-> get_free_page()
  Now get_free_page() notices that there is no free memory
  --> wake up kswapd.

* The swapper runs, and it tries to swap out data from that process.
  mm->mmap_sem is already acquired --> lock-up.

--
	Manfred
* Re: locking question: do_mmap(), do_munmap()
  1999-10-10 16:45 ` Manfred Spraul
@ 1999-10-10 17:25   ` Alexander Viro
  0 siblings, 0 replies; 51+ messages in thread

From: Alexander Viro @ 1999-10-10 17:25 UTC (permalink / raw)
To: Manfred Spraul; +Cc: Andrea Arcangeli, linux-kernel, Ingo Molnar, linux-mm

[Apologies to Andrea - idiot me looked into the wrong place ;-<]

On Sun, 10 Oct 1999, Manfred Spraul wrote:

> AFAIK the problem is OOM:
> * A process accesses a not-present page, i.e. a page fault:
>	...
>	handle_mm_fault():	this process owns mm->mmap_sem.
>	-> handle_pte_fault()
>	-> (e.g.) do_wp_page()
>	-> get_free_page()
>   Now get_free_page() notices that there is no free memory
>   --> wake up kswapd.
>
> * The swapper runs, and it tries to swap out data from that process.
>   mm->mmap_sem is already acquired --> lock-up.

Nasty... But adding a big lock around all traversals of mm->mmap will
hurt like hell. Let's see... The problem is in swap_out_mm(), right? And
it looks for the first suitable page. OK, it looks like we can get a
deadlock only if we call try_to_free_pages() with ->mmap_sem grabbed and
__GFP_WAIT in the flags. Umhm... Called only from __get_free_pages(),
which sets PF_MEMALLOC... OK, I'll try to look at it.
* Re: locking question: do_mmap(), do_munmap()
  1999-10-10 16:25 ` Alexander Viro
  1999-10-10 16:45   ` Manfred Spraul
@ 1999-10-10 17:12   ` Andrea Arcangeli
  1999-10-10 17:48     ` Alexander Viro
  1999-10-11 15:43     ` Stephen C. Tweedie
  1 sibling, 2 replies; 51+ messages in thread

From: Andrea Arcangeli @ 1999-10-10 17:12 UTC (permalink / raw)
To: Alexander Viro; +Cc: Manfred Spraul, linux-kernel, Ingo Molnar, linux-mm

On Sun, 10 Oct 1999, Alexander Viro wrote:

> sys_swapoff(). It's a syscall. Andrea, could you show a scenario for

do_page_fault -> down() -> GFP -> swap_out() -> down() -> deadlock

To grab the mm semaphore in swap_out() we could run swap_out() only from
kswapd, doing a kind of wakeup_and_wait_kswapd() a la wakeup_bdflush(1),
but it would be slow, and I don't want to run worse than 2.2.x on UP
just to get some more SMP scalability (that won't pay the cost).

The other option is to make the mmap semaphore recursive, checking that
GFP is not called in the middle of a vma change. I don't like this one;
it doesn't sound as robust as the spinlock way to me (see below).

What I like is to go as in 2.2.x with a proper spinlock for doing vma
reads (I am _not_ talking about the big kernel lock!).

Andrea
* Re: locking question: do_mmap(), do_munmap()
  1999-10-10 17:12 ` Andrea Arcangeli
@ 1999-10-10 17:48   ` Alexander Viro
  1999-10-10 18:42     ` Manfred Spraul
  1999-10-11 15:43   ` Stephen C. Tweedie
  1 sibling, 1 reply; 51+ messages in thread

From: Alexander Viro @ 1999-10-10 17:48 UTC (permalink / raw)
To: Andrea Arcangeli; +Cc: Manfred Spraul, linux-kernel, Ingo Molnar, linux-mm

On Sun, 10 Oct 1999, Andrea Arcangeli wrote:

> On Sun, 10 Oct 1999, Alexander Viro wrote:
>
> > sys_swapoff(). It's a syscall. Andrea, could you show a scenario for
>
> do_page_fault -> down() -> GFP -> swap_out() -> down() -> deadlock

Yes, I had realized that I was looking into the wrong place -
unfortunately only after I had sent the posting. My apologies.

> To grab the mm semaphore in swap_out() we could run swap_out() only from
> kswapd, doing a kind of wakeup_and_wait_kswapd() a la wakeup_bdflush(1),
> but it would be slow, and I don't want to run worse than 2.2.x on UP
> just to get some more SMP scalability (that won't pay the cost).
>
> The other option is to make the mmap semaphore recursive, checking that
> GFP is not called in the middle of a vma change. I don't like this one;
> it doesn't sound as robust as the spinlock way to me (see below).
>
> What I like is to go as in 2.2.x with a proper spinlock for doing vma
> reads (I am _not_ talking about the big kernel lock!).

I'm not sure that it will work (we scan the thing in many places and
quite a few may be blocking ;-/), unless you propose to protect
individual steps of the scan, which will give you lots of overhead. I
suspect that swap_out_mm() needs fixing, not everything else... And it
looks like we can't drop the sucker earlier in handle_mm_fault. Or can
we?

As crazy as it may sound, what about keeping a small cache of pages,
taking from that cache and doing refills when we are crossing the
boundary of the dangerous area (refusing to enter it until the number of
pages in the cache grows bigger than the number of processes in the
dangerous part)?
* Re: locking question: do_mmap(), do_munmap()
  1999-10-10 17:48 ` Alexander Viro
@ 1999-10-10 18:42   ` Manfred Spraul
  1999-10-10 19:03     ` Alexander Viro
  1999-10-11 15:47     ` Stephen C. Tweedie
  0 siblings, 2 replies; 51+ messages in thread

From: Manfred Spraul @ 1999-10-10 18:42 UTC (permalink / raw)
To: Alexander Viro; +Cc: Andrea Arcangeli, linux-kernel, Ingo Molnar, linux-mm

Alexander Viro wrote:
> I'm not sure that it will work (we scan the thing in many places and
> quite a few may be blocking ;-/), unless you propose to protect
> individual steps of the scan, which will give you lots of overhead.

The overhead should be low; we could keep the "double synchronization",
i.e.

* either down(&mm->mmap_sem) or spin_lock(&mm->vma_list_lock) for read;
* both locks for write.

I think that 3 to 5 spin_lock() calls are required.

> I suspect that
> swap_out_mm() needs fixing, not everything else... And it looks like we
> can't drop the sucker earlier in handle_mm_fault. Or can we?

That would be a good idea: for multi-threaded applications, swap-in is
currently single-threaded, i.e. we do not overlap the I/O operations if
2 threads of the same process cause page faults. Everything is fully
serialized.

But I think this would be a huge change, e.g. do_munmap() in one thread
while another thread waits for page-in....

--
	Manfred
* Re: locking question: do_mmap(), do_munmap()
  1999-10-10 18:42 ` Manfred Spraul
@ 1999-10-10 19:03   ` Alexander Viro
  1999-10-10 21:31     ` Manfred Spraul
  ` (2 more replies)
  1 sibling, 3 replies; 51+ messages in thread

From: Alexander Viro @ 1999-10-10 19:03 UTC (permalink / raw)
To: Manfred Spraul; +Cc: Andrea Arcangeli, linux-kernel, Ingo Molnar, linux-mm

On Sun, 10 Oct 1999, Manfred Spraul wrote:

> Alexander Viro wrote:
> > I'm not sure that it will work (we scan the thing in many places and
> > quite a few may be blocking ;-/), unless you propose to protect
> > individual steps of the scan, which will give you lots of overhead.
>
> The overhead should be low; we could keep the "double synchronization",
> i.e.
> * either down(&mm->mmap_sem) or spin_lock(&mm->vma_list_lock) for read;
> * both locks for write.
>
> I think that 3 to 5 spin_lock() calls are required.

Hold on. In swap_out_mm() you have to protect find_vma() (OK, it doesn't
block, but we'll have to take care of mm->mmap_cache) _and_ you'll have
to protect the vma from destruction all the way down to
try_to_swap_out(). And to vma->swapout(). Which can sleep, so spinlocks
are out of the question here.

I still think that just keeping a cyclic list of pages, grabbing from
that list before taking mmap_sem _if_ we have a chance of a blocking
__get_free_page(), refilling if the list is empty (prior to down()) and
returning the page into the list if we didn't use it may be the
simplest way.
* Re: locking question: do_mmap(), do_munmap()
  1999-10-10 19:03 ` Alexander Viro
@ 1999-10-10 21:31   ` Manfred Spraul
  1999-10-10 21:53   ` Andrea Arcangeli
  1999-10-11 15:50   ` Stephen C. Tweedie
  2 siblings, 0 replies; 51+ messages in thread

From: Manfred Spraul @ 1999-10-10 21:31 UTC (permalink / raw)
To: Alexander Viro; +Cc: Andrea Arcangeli, linux-kernel, Ingo Molnar, linux-mm

Alexander Viro wrote:
> Hold on. In swap_out_mm() you have to protect find_vma() (OK, it doesn't
> block, but we'll have to take care of mm->mmap_cache) _and_ you'll have to
> protect vma from destruction all way down to try_to_swap_out(). And to
> vma->swapout(). Which can sleep, so spinlocks are out of question here.

I found vma->swapout() when I tried to implement it. Sh... We could make
vma_list_lock a semaphore, but I haven't checked for any hidden problems
yet.

> I still think that just keeping a cyclic list of pages, grabbing from that
> list before taking mmap_sem _if_ we have a chance for blocking
> __get_free_page(), refilling if the list is empty (prior to down()) and
> returning the page into the list if we didn't use it may be the simplest
> way.

I don't like the idea, but it sounds possible. A problem could be that
the page-in functions can themselves allocate memory:
do_nopage() -> filemap_nopage(): it calls i_op->readpage(), which would
call get_block(); e.g. ext2 loads the indirect page, and this needs
memory --> OOM.

--
	Manfred
* Re: locking question: do_mmap(), do_munmap()
  1999-10-10 19:03 ` Alexander Viro
  1999-10-10 21:31   ` Manfred Spraul
@ 1999-10-10 21:53   ` Andrea Arcangeli
  1999-10-10 22:34     ` Alexander Viro
  1 sibling, 1 reply; 51+ messages in thread

From: Andrea Arcangeli @ 1999-10-10 21:53 UTC (permalink / raw)
To: Alexander Viro; +Cc: Manfred Spraul, linux-kernel, Ingo Molnar, linux-mm

On Sun, 10 Oct 1999, Alexander Viro wrote:

> I still think that just keeping a cyclic list of pages, grabbing from that
> list before taking mmap_sem _if_ we have a chance for blocking
> __get_free_page(), refilling if the list is empty (prior to down()) and
> returning the page into the list if we didn't use it may be the simplest
> way.

I can't understand your plan very well.

We already have a security pool, and we only block when the pool becomes
low. But to refill the existing pool we have to walk the vmas. That's
the problem in the first place.

Andrea
* Re: locking question: do_mmap(), do_munmap()
  1999-10-10 21:53 ` Andrea Arcangeli
@ 1999-10-10 22:34   ` Alexander Viro
  1999-10-10 23:28     ` Andrea Arcangeli
  0 siblings, 1 reply; 51+ messages in thread

From: Alexander Viro @ 1999-10-10 22:34 UTC (permalink / raw)
To: Andrea Arcangeli; +Cc: Manfred Spraul, linux-kernel, Ingo Molnar, linux-mm

On Sun, 10 Oct 1999, Andrea Arcangeli wrote:

> On Sun, 10 Oct 1999, Alexander Viro wrote:
>
> > I still think that just keeping a cyclic list of pages, grabbing from that
> > list before taking mmap_sem _if_ we have a chance for blocking
> > __get_free_page(), refilling if the list is empty (prior to down()) and
> > returning the page into the list if we didn't use it may be the simplest
> > way.
>
> I can't understand your plan very well.
>
> We already have a security pool, and we only block when the pool becomes
> low. But to refill the existing pool we have to walk the vmas. That's
> the problem in the first place.

I missed the fact that page-in can suck in additional pages. Sorry. The
original idea was to do that _before_ getting the mmap_sem - that would
allow us to grab it in swap_out_mm.

We can't do _anything_ with vmas while swap_out_mm is running over their
mm - no list modifications, no vma removal, etc. We could introduce a
new semaphore (spinlocks are not going to work - ->swapout gets the vma
as an argument and it can sleep). The question being: where can we
trigger __get_free_pages() with __GFP_WAIT while mmap_sem is held? And
another one - where do we modify ->mmap? If they can be easily
separated - fine. Then we need to protect the thing with a semaphore -
no contention in the normal case (we already have mmap_sem), enough
protection in the swapper. Probably it makes sense to choose the
protected area as wide as possible - there will be no contention in the
normal case and we can cut the overhead down.

I'll try to look through the thing tonight (while the truncate stuff
will eat the fs on the testbox ;-), but I'd be really grateful if some
of the VM people would check the results.
* Re: locking question: do_mmap(), do_munmap()
  1999-10-10 22:34 ` Alexander Viro
@ 1999-10-10 23:28   ` Andrea Arcangeli
  0 siblings, 0 replies; 51+ messages in thread

From: Andrea Arcangeli @ 1999-10-10 23:28 UTC (permalink / raw)
To: Alexander Viro; +Cc: Manfred Spraul, linux-kernel, Ingo Molnar, linux-mm

On Sun, 10 Oct 1999, Alexander Viro wrote:

> mm - no list modifications, no vma removal, etc. We could introduce a new
> semaphore (spinlocks are not going to work - ->swapout gets vma as

We could add a reference count to each vma, protected by a per-mm
spinlock. Then we could drop the spinlock in swap_out_mm as soon as we
have incremented the refcount of the vma. I am talking from memory; I am
not sure if it can really be done and if it's the right thing to do this
time. (I'll check the code ASAP.)

> argument and it can sleep. The question being: where can we trigger
> __get_free_pages() with __GFP_WAIT if the mmap_sem is held? And another

All userspace allocations do exactly that.

> one - where do we modify ->mmap? If they can be easily separated -

We modify ->mmap in mmap.c, and in fact we hold the semaphore there too.
The reason they are not separated is that during the whole page fault
path you can't have _your_ vmas change under you while another thread is
running munmap in parallel.

> eat the fs on the testbox ;-), but I'ld be really grateful if some of VM
> people would check the results.

I can check them of course ;).

Andrea
* Re: locking question: do_mmap(), do_munmap()
  1999-10-10 19:03 ` Alexander Viro
  1999-10-10 21:31   ` Manfred Spraul
  1999-10-10 21:53   ` Andrea Arcangeli
@ 1999-10-11 15:50   ` Stephen C. Tweedie
  1999-10-11 16:05     ` Alexander Viro
  2 siblings, 1 reply; 51+ messages in thread

From: Stephen C. Tweedie @ 1999-10-11 15:50 UTC (permalink / raw)
To: Alexander Viro
Cc: Manfred Spraul, Andrea Arcangeli, linux-kernel, Ingo Molnar,
    linux-mm, Stephen Tweedie

Hi,

On Sun, 10 Oct 1999 15:03:45 -0400 (EDT), Alexander Viro
<viro@math.psu.edu> said:

> Hold on. In swap_out_mm() you have to protect find_vma() (OK, it doesn't
> block, but we'll have to take care of mm->mmap_cache) _and_ you'll have to
> protect vma from destruction all way down to try_to_swap_out(). And to
> vma->swapout(). Which can sleep, so spinlocks are out of question
> here.

No, spinlocks would be ideal. The vma swapout code _has_ to be prepared
for the vma to be destroyed as soon as we sleep. In fact, the entire mm
may disappear if the process happens to exit. Once we know which page to
write where, the swapout operation becomes a per-page operation, not
per-vma.

We had some rather interesting bugs in the 1.0 and 1.2 timeframes
surrounding exactly this. Nowadays we assume that all mm context
disappears as soon as the swapper blocks. We bump the refcount on the
page itself to make the page write safe, but we have to be very careful
indeed to do all the mm manipulations before we get to that stage.

--Stephen
* Re: locking question: do_mmap(), do_munmap()
  1999-10-11 15:50 ` Stephen C. Tweedie
@ 1999-10-11 16:05   ` Alexander Viro
  1999-10-11 18:02     ` Manfred Spraul
  1999-10-11 20:13     ` Stephen C. Tweedie
  0 siblings, 2 replies; 51+ messages in thread

From: Alexander Viro @ 1999-10-11 16:05 UTC (permalink / raw)
To: Stephen C. Tweedie
Cc: Manfred Spraul, Andrea Arcangeli, linux-kernel, Ingo Molnar, linux-mm

On Mon, 11 Oct 1999, Stephen C. Tweedie wrote:

> No, spinlocks would be ideal. The vma swapout codes _have_ to be
> prepared for the vma to be destroyed as soon as we sleep. In fact, the
> entire mm may disappear if the process happens to exit. Once we know
> which page to write where, the swapout operation becomes a per-page
> operation, not per-vma.

Aha, so you propose to drop it in ->swapout(), right? (After get_file()
in filemap_write_page()... Ouch. Probably we'd better lambda-expand the
call in filemap_swapout() - the thing is called from other places too.)
* Re: locking question: do_mmap(), do_munmap()
  1999-10-11 16:05 ` Alexander Viro
@ 1999-10-11 18:02   ` Manfred Spraul
  1999-10-11 19:07     ` Kanoj Sarcar
  ` (3 more replies)
  1 sibling, 4 replies; 51+ messages in thread

From: Manfred Spraul @ 1999-10-11 18:02 UTC (permalink / raw)
To: Alexander Viro
Cc: Stephen C. Tweedie, Andrea Arcangeli, linux-kernel, Ingo Molnar,
    linux-mm

Alexander Viro wrote:
>
> On Mon, 11 Oct 1999, Stephen C. Tweedie wrote:
>
> > No, spinlocks would be ideal. The vma swapout codes _have_ to be
> > prepared for the vma to be destroyed as soon as we sleep. In fact, the
> > entire mm may disappear if the process happens to exit. Once we know
> > which page to write where, the swapout operation becomes a per-page
> > operation, not per-vma.
>
> Aha, so you propose to drop it in ->swapout(), right? (after get_file() in
> filemap_write_page()... Ouch. Probably we'ld better lambda-expand the call
> in filemap_swapout() - the thing is called from other places too)...

What about something like a rw-semaphore which protects the vma list:
vma-list modifiers [i.e. merge_segments(), insert_vm_struct() and
do_munmap()] grab it exclusive, the swapper grabs it "shared, starve
exclusive". All other vma-list readers are protected by mm->mmap_sem.

This should not deadlock, and no changes are required in
vm_ops->swapout().

--
	Manfred
* Re: locking question: do_mmap(), do_munmap()
  1999-10-11 18:02 ` Manfred Spraul
@ 1999-10-11 19:07   ` Kanoj Sarcar
  1999-10-11 22:23     ` Stephen C. Tweedie
  1999-10-11 20:15   ` Stephen C. Tweedie
  ` (2 subsequent siblings)
  3 siblings, 1 reply; 51+ messages in thread

From: Kanoj Sarcar @ 1999-10-11 19:07 UTC (permalink / raw)
To: Manfred Spraul; +Cc: viro, sct, andrea, linux-kernel, mingo, linux-mm

>
> What about something like a rw-semaphore which protects the vma list:
> vma-list modifiers [ie merge_segments(), insert_vm_struct() and
> do_munmap()] grab it exclusive, swapper grabs it "shared, starve
> exclusive".
> All other vma-list readers are protected by mm->mmap_sem.
>
> This should not dead-lock, and no changes are required in
> vm_ops->swapout().
>

I have tried to follow most of the logic and solutions proposed on this
thread. This is the best solution, imo. In fact, I had already coded
something along these lines against a 2.2.10 kernel, which I still have
around. I will try to port this against a 2.3.19 kernel over the next
couple of days and post it for everyone to review.

Thanks.

Kanoj
* Re: locking question: do_mmap(), do_munmap()
  1999-10-11 19:07 ` Kanoj Sarcar
@ 1999-10-11 22:23   ` Stephen C. Tweedie
  1999-10-13  1:25     ` Kanoj Sarcar
  0 siblings, 1 reply; 51+ messages in thread

From: Stephen C. Tweedie @ 1999-10-11 22:23 UTC (permalink / raw)
To: Kanoj Sarcar
Cc: Manfred Spraul, viro, sct, andrea, linux-kernel, mingo, linux-mm

Hi,

On Mon, 11 Oct 1999 12:07:08 -0700 (PDT), kanoj@google.engr.sgi.com
(Kanoj Sarcar) said:

>> What about something like a rw-semaphore which protects the vma list:
>> vma-list modifiers [ie merge_segments(), insert_vm_struct() and
>> do_munmap()] grab it exclusive, swapper grabs it "shared, starve
>> exclusive".
>> All other vma-list readers are protected by mm->mmap_sem.

> I have tried to follow most of the logic and solutions proposed
> on this thread.

It will deadlock.

--Stephen
* Re: locking question: do_mmap(), do_munmap()
  1999-10-11 22:23 ` Stephen C. Tweedie
@ 1999-10-13  1:25   ` Kanoj Sarcar
  1999-10-13  7:32     ` Manfred Spraul
  1999-10-13 10:45     ` Stephen C. Tweedie
  0 siblings, 2 replies; 51+ messages in thread

From: Kanoj Sarcar @ 1999-10-13 1:25 UTC (permalink / raw)
To: Stephen C. Tweedie; +Cc: manfreds, viro, andrea, linux-kernel, mingo, linux-mm

> On Mon, 11 Oct 1999 12:07:08 -0700 (PDT), kanoj@google.engr.sgi.com
> (Kanoj Sarcar) said:
>
> >> What about something like a rw-semaphore which protects the vma list:
> >> vma-list modifiers [ie merge_segments(), insert_vm_struct() and
> >> do_munmap()] grab it exclusive, swapper grabs it "shared, starve
> >> exclusive".
> >> All other vma-list readers are protected by mm->mmap_sem.
>
> It will deadlock.

Not sure why you say that. Yes, I did see the nested semaphore grabbing
scenario that you posted, but I think you can prevent that from
happening if you follow some rules. Here's a primitive patch showing the
direction I am thinking of. I do not have any problem with a spinning
lock, but I coded this against 2.2.10, where insert_vm_struct could go
to sleep, hence I had to use sleeping locks to protect the vma chain. A
slight change to vmscan.c is needed to use spinning locks.

Kanoj

This is a skeleton of the solution that prevents kswapd from walking
down a vma chain without protection. I am trying to get comments on this
approach before I try a full blown implementation.

The rules:

1. To modify the vmlist (add/delete), you must hold mmap_sem to guard
   against clones doing mmap/munmap/faults (i.e. all vm system calls and
   faults), and against ptrace, swapin due to swap deletion, etc.
2. To modify the vmlist (add/delete), you must also hold
   vmlist_modify_lock, to guard against page stealers scanning the list.
3. To scan the vmlist, you must either
   a. grab mmap_sem, which should be all cases except the page stealer, or
   b. grab vmlist_access_lock, only done by the page stealer.
4. While holding the vmlist_modify_lock, you must be able to guarantee
   that no code path will lead to page stealing.
5. You must be able to guarantee that while holding vmlist_modify_lock
   or vmlist_access_lock of mm A, you will not try to get either lock
   for mm B.

The assumptions:

1. No code path reachable through insert_vm_struct and merge_segments
   sleeps for memory.

--- /usr/tmp/p_rdiff_a00368/vmscan.c	Tue Oct 12 17:58:50 1999
+++ mm/vmscan.c	Tue Oct 12 16:55:13 1999
@@ -295,6 +295,7 @@
 	/*
 	 * Find the proper vm-area
 	 */
+	vmlist_access_lock(mm);
 	vma = find_vma(mm, address);
 	if (vma) {
 		if (address < vma->vm_start)
@@ -302,8 +303,10 @@
 		for (;;) {
 			int result = swap_out_vma(vma, address, gfp_mask);
-			if (result)
+			if (result) {
+				vmlist_access_unlock(mm);
 				return result;
+			}
 			vma = vma->vm_next;
 			if (!vma)
 				break;
@@ -310,6 +313,7 @@
 			address = vma->vm_start;
 		}
 	}
+	vmlist_access_unlock(mm);
 
 	/* We didn't find anything for the process */
 	mm->swap_cnt = 0;
--- /usr/tmp/p_rdiff_a0036H/mlock.c	Tue Oct 12 17:59:06 1999
+++ mm/mlock.c	Tue Oct 12 16:35:25 1999
@@ -13,7 +13,9 @@
 static inline int mlock_fixup_all(struct vm_area_struct * vma, int newflags)
 {
+	vmlist_modify_lock(vma->vm_mm);
 	vma->vm_flags = newflags;
+	vmlist_modify_unlock(vma->vm_mm);
 	return 0;
 }
 
@@ -26,15 +28,17 @@
 	if (!n)
 		return -EAGAIN;
 	*n = *vma;
-	vma->vm_start = end;
 	n->vm_end = end;
-	vma->vm_offset += vma->vm_start - n->vm_start;
 	n->vm_flags = newflags;
 	if (n->vm_file)
 		get_file(n->vm_file);
 	if (n->vm_ops && n->vm_ops->open)
 		n->vm_ops->open(n);
+	vmlist_modify_lock(vma->vm_mm);
+	vma->vm_offset += end - vma->vm_start;
+	vma->vm_start = end;
 	insert_vm_struct(current->mm, n);
+	vmlist_modify_unlock(vma->vm_mm);
 	return 0;
 }
 
@@ -47,7 +51,6 @@
 	if (!n)
 		return -EAGAIN;
 	*n = *vma;
-	vma->vm_end = start;
 	n->vm_start = start;
 	n->vm_offset += n->vm_start - vma->vm_start;
 	n->vm_flags = newflags;
@@ -55,7 +58,10 @@
 		get_file(n->vm_file);
 	if (n->vm_ops && n->vm_ops->open)
 		n->vm_ops->open(n);
+	vmlist_modify_lock(vma->vm_mm);
+	vma->vm_end = start;
 	insert_vm_struct(current->mm, n);
+	vmlist_modify_unlock(vma->vm_mm);
 	return 0;
 }
 
@@ -75,10 +81,7 @@
 	*left = *vma;
 	*right = *vma;
 	left->vm_end = start;
-	vma->vm_start = start;
-	vma->vm_end = end;
 	right->vm_start = end;
-	vma->vm_offset += vma->vm_start - left->vm_start;
 	right->vm_offset += right->vm_start - left->vm_start;
 	vma->vm_flags = newflags;
 	if (vma->vm_file)
@@ -88,8 +91,14 @@
 		vma->vm_ops->open(left);
 		vma->vm_ops->open(right);
 	}
+	vmlist_modify_lock(vma->vm_mm);
+	vma->vm_offset += start - vma->vm_start;
+	vma->vm_start = start;
+	vma->vm_end = end;
+	vma->vm_flags = newflags;
 	insert_vm_struct(current->mm, left);
 	insert_vm_struct(current->mm, right);
+	vmlist_modify_unlock(vma->vm_mm);
 	return 0;
 }
 
@@ -168,7 +177,9 @@
 			break;
 		}
 	}
+	vmlist_modify_lock(current->mm);
 	merge_segments(current->mm, start, end);
+	vmlist_modify_unlock(current->mm);
 	return error;
 }
 
@@ -240,7 +251,9 @@
 		if (error)
 			break;
 	}
+	vmlist_modify_lock(current->mm);
 	merge_segments(current->mm, 0, TASK_SIZE);
+	vmlist_modify_unlock(current->mm);
 	return error;
 }
* Re: locking question: do_mmap(), do_munmap() 1999-10-13 1:25 ` Kanoj Sarcar @ 1999-10-13 7:32 ` Manfred Spraul 1999-10-15 9:58 ` Ralf Baechle 1999-10-13 10:45 ` Stephen C. Tweedie 1 sibling, 1 reply; 51+ messages in thread From: Manfred Spraul @ 1999-10-13 7:32 UTC (permalink / raw) To: Kanoj Sarcar Cc: Stephen C. Tweedie, viro, andrea, linux-kernel, mingo, linux-mm

Kanoj Sarcar wrote:
> Here's a primitive patch showing the direction I am thinking of. I do not
> have any problem with a spinning lock, but I coded this against 2.2.10,
> where insert_vm_struct could go to sleep, hence I had to use sleeping
> locks to protect the vma chain.

I found a few places where I don't know how to change them.

1) arch/mips/mm/r4xx0.c: their flush_cache_range() function internally calls find_vma(). flush_cache_range() is called by proc/mem.c, and it seems that this function cannot get the mmap semaphore. Currently, every caller of flush_cache_range() either owns the kernel lock or the mmap_sem. OTOH, this function contains a race anyway [src_vma can go away if handle_mm_fault() sleeps; src_vma is used at the end of the function].

2) arch/sparc/mm/fault.c:
> 	/* This conditional is 'interesting'. */
> 	if (pgd_val(*pgdp) && !(write && !(pte_val(*ptep) & _SUN4C_PAGE_WRITE))
> 	    && (pte_val(*ptep) & _SUN4C_PAGE_VALID))
> 		/* Note: It is safe to not grab the MMAP semaphore here because
> 		 * we know that update_mmu_cache() will not sleep for
> 		 * any reason (at least not in the current implementation)
> 		 * and therefore there is no danger of another thread getting
> 		 * on the CPU and doing a shrink_mmap() on this vma.
> 		 */
> 		sun4c_update_mmu_cache (find_vma(current->mm, address), address,
> 					*ptep);
> 	else
> 		do_sparc_fault(regs, text_fault, write, address);
> }

Could this be safe because sun4c is UP-only? 
3) include/asm-ppc/pgtable.h:
> extern __inline__ pte_t *find_pte(struct mm_struct *mm, unsigned long va)
> {
> 	pgd_t *dir;
> 	pmd_t *pmd;
> 	pte_t *pte;
>
> 	va &= PAGE_MASK;
>
> 	dir = pgd_offset( mm, va );
> 	if (dir)
> 	{
> 		pmd = pmd_offset(dir, va & PAGE_MASK);
> 		if (pmd && pmd_present(*pmd))
> 		{
> 			pte = pte_offset(pmd, va);
> 			if (pte && pte_present(*pte))
> 			{
> 				pte_uncache(*pte);
> 				flush_tlb_page(find_vma(mm,va),va);
> 			}
> 		}
> 	}
> 	return pte;
> }

Could this be safe because it is only called for "init_mm"?

I've not yet looked at swap_out [mm/swapfile.c and arch/m68k/atari/stram.c] and proc/array.c.

--
Manfred
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://humbolt.geo.uu.nl/Linux-MM/
^ permalink raw reply	[flat|nested] 51+ messages in thread
* Re: locking question: do_mmap(), do_munmap() 1999-10-13 7:32 ` Manfred Spraul @ 1999-10-15 9:58 ` Ralf Baechle 1999-10-15 17:50 ` Kanoj Sarcar 0 siblings, 1 reply; 51+ messages in thread From: Ralf Baechle @ 1999-10-15 9:58 UTC (permalink / raw) To: Manfred Spraul Cc: Kanoj Sarcar, Stephen C. Tweedie, viro, andrea, linux-kernel, mingo, linux-mm, linux, linux-mips, linux-mips

On Wed, Oct 13, 1999 at 09:32:54AM +0200, Manfred Spraul wrote:

> Kanoj Sarcar wrote:
> > Here's a primitive patch showing the direction I am thinking of. I do not
> > have any problem with a spinning lock, but I coded this against 2.2.10,
> > where insert_vm_struct could go to sleep, hence I had to use sleeping
> > locks to protect the vma chain.
>
> I found a few places where I don't know how to change them.
>
> 1) arch/mips/mm/r4xx0.c:
> their flush_cache_range() function internally calls find_vma().
> flush_cache_range() is called by proc/mem.c, and it seems that this
> function cannot get the mmap semaphore.
> Currently, every caller of flush_cache_range() either owns the kernel
> lock or the mmap_sem.
> OTOH, this function contains a race anyway [src_vma can go away if
> handle_mm_fault() sleeps; src_vma is used at the end of the function.]

The sole reason for fiddling with the VMA is that we try to optimize icache flushing for non-VM_EXEC vmas. This optimization is broken as the MIPS hardware doesn't distinguish between read and execute in page permissions, so the icache might be dirty even though the vma has no exec permission. So I'll have to re-implement this whole thing anyway.

The other problem is an efficiency problem. A call like flush_cache_range(some_mm_ptr, 0, TASK_SIZE) would take a minor eternity, and for MIPS64 a full eternity ...

Ralf
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://humbolt.geo.uu.nl/Linux-MM/
^ permalink raw reply	[flat|nested] 51+ messages in thread
* Re: locking question: do_mmap(), do_munmap() 1999-10-15 9:58 ` Ralf Baechle @ 1999-10-15 17:50 ` Kanoj Sarcar 0 siblings, 0 replies; 51+ messages in thread From: Kanoj Sarcar @ 1999-10-15 17:50 UTC (permalink / raw) To: Ralf Baechle Cc: manfreds, sct, viro, andrea, linux-kernel, mingo, linux-mm, linux, linux-mips, linux-mips

>
> On Wed, Oct 13, 1999 at 09:32:54AM +0200, Manfred Spraul wrote:
>
> > Kanoj Sarcar wrote:
> > > Here's a primitive patch showing the direction I am thinking of. I do not
> > > have any problem with a spinning lock, but I coded this against 2.2.10,
> > > where insert_vm_struct could go to sleep, hence I had to use sleeping
> > > locks to protect the vma chain.
> >
> > I found a few places where I don't know how to change them.
> >
> > 1) arch/mips/mm/r4xx0.c:
> > their flush_cache_range() function internally calls find_vma().
> > flush_cache_range() is called by proc/mem.c, and it seems that this
> > function cannot get the mmap semaphore.
> > Currently, every caller of flush_cache_range() either owns the kernel
> > lock or the mmap_sem.
> > OTOH, this function contains a race anyway [src_vma can go away if
> > handle_mm_fault() sleeps; src_vma is used at the end of the function.]
>
> The sole reason for fiddling with the VMA is that we try to optimize
> icache flushing for non-VM_EXEC vmas. This optimization is broken
> as the MIPS hardware doesn't distinguish between read and execute
> in page permissions, so the icache might be dirty even though the vma
> has no exec permission. So I'll have to re-implement this whole thing
> anyway. The other problem is an efficiency problem. A call like
> flush_cache_range(some_mm_ptr, 0, TASK_SIZE) would take a minor eternity
> and for MIPS64 a full eternity ...
>
> Ralf

Ralf,

Looking in 2.3.21, all the find_vma's in arch/mips/mm/r4xx0.c are used to set a flag called "text" which is not used at all. Also, if the find_vma returns null, the code basically does nothing. 
So the optimized icache flushing is probably not implemented yet? Then, the only reason to do the find_vma currently is to check whether the lower-level flush routine should be called. Without holding some locks, this is always tricky to do on a third-party mm.

Btw, this probably belongs on linux-mips, but what do you mean by saying the icache might be dirty? It's been a while since I worked on the older MIPS chips, but as far as I remember, the icache cannot hold dirty lines.

Kanoj
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://humbolt.geo.uu.nl/Linux-MM/
^ permalink raw reply	[flat|nested] 51+ messages in thread
* Re: locking question: do_mmap(), do_munmap() 1999-10-13 1:25 ` Kanoj Sarcar 1999-10-13 7:32 ` Manfred Spraul @ 1999-10-13 10:45 ` Stephen C. Tweedie 1 sibling, 0 replies; 51+ messages in thread From: Stephen C. Tweedie @ 1999-10-13 10:45 UTC (permalink / raw) To: Kanoj Sarcar Cc: Stephen C. Tweedie, manfreds, viro, andrea, linux-kernel, mingo, linux-mm Hi, On Tue, 12 Oct 1999 18:25:42 -0700 (PDT), kanoj@google.engr.sgi.com (Kanoj Sarcar) said: > This is a skeleton of the solution that prevents kswapd from walking > down a vma chain without protections. I am trying to get comments on > this approach before I try a full blown implementation. > The rules: > 1. To modify the vmlist (add/delete), you must hold mmap_sem to > guard against clones doing mmap/munmap/faults, (ie all vm system > calls and faults), and from ptrace, swapin due to swap deletion > etc. > 2. To modify the vmlist (add/delete), you must also hold > vmlist_modify_lock, to guard against page stealers scanning the > list. > 3. To scan the vmlist, you must either > a. grab mmap_sem, which should be all cases except page stealer. > or > b. grab vmlist_access_lock, only done by page stealer. > 4. While holding the vmlist_modify_lock, you must be able to guarantee > that no code path will lead to page stealing. > 5. You must be able to guarantee that while holding vmlist_modify_lock > or vmlist_access_lock of mm A, you will not try to get either lock > for mm B. This looks like the same mechanism and set of rules that Al Viro proposed, and it seems watertight. I'd like the locking written down in the source somewhere if anyone implements this, btw, as otherwise we'll just be fixing it ourselves every time in the future when somebody who doesn't understand them touches anything in the mmap paths... --Stephen -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. 
For more info on Linux MM, see: http://humbolt.geo.uu.nl/Linux-MM/ ^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: locking question: do_mmap(), do_munmap() 1999-10-11 18:02 ` Manfred Spraul 1999-10-11 19:07 ` Kanoj Sarcar @ 1999-10-11 20:15 ` Stephen C. Tweedie 1999-10-11 21:14 ` Manfred Spraul 1999-10-11 21:37 ` Alexander Viro 1999-10-11 22:22 ` Stephen C. Tweedie 3 siblings, 1 reply; 51+ messages in thread From: Stephen C. Tweedie @ 1999-10-11 20:15 UTC (permalink / raw) To: Manfred Spraul Cc: Alexander Viro, Stephen C. Tweedie, Andrea Arcangeli, linux-kernel, Ingo Molnar, linux-mm Hi, On Mon, 11 Oct 1999 20:02:40 +0200, Manfred Spraul <manfreds@colorfullife.com> said: > What about something like a rw-semaphore which protects the vma list: > vma-list modifiers [ie merge_segments(), insert_vm_struct() and > do_munmap()] grab it exclusive, swapper grabs it "shared, starve > exclusive". > All other vma-list readers are protected by mm->mmap_sem. > This should not dead-lock, and no changes are required in > vm_ops-> swapout(). The swapout method will need to drop the spinlock. We need to preserve the vma over the call into the swapout method, and the method will need to be able to block. --Stephen -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://humbolt.geo.uu.nl/Linux-MM/ ^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: locking question: do_mmap(), do_munmap() 1999-10-11 20:15 ` Stephen C. Tweedie @ 1999-10-11 21:14 ` Manfred Spraul 0 siblings, 0 replies; 51+ messages in thread From: Manfred Spraul @ 1999-10-11 21:14 UTC (permalink / raw) To: Stephen C. Tweedie Cc: Alexander Viro, Andrea Arcangeli, linux-kernel, Ingo Molnar, linux-mm

"Stephen C. Tweedie" wrote:
> The swapout method will need to drop the spinlock. We need to preserve
> the vma over the call into the swapout method, and the method will need
> to be able to block.

No spinlock: a rw-semaphore, ie a multiple-reader, single-writer sync object which calls schedule() when the resource is busy.

IIRC, the vma-list is only modified by
* insert_vm_struct(): never sleeps, doesn't allocate memory. No problems with swap-out.
* merge_segments(): ditto.
* do_munmap(): the area which modifies the vma-list makes no memory allocations, should make no problems under low memory.
--> everyone who needs exclusive access is OOM-safe.

Additionally, the swap-out should use a "starve writer" policy, ie there will be no deadlocks with multiple concurrent swap-outs in the same "struct mm" [concurrent means overlapped io, still serialized by lock_kernel()]. I think the result should be OOM-safe without touching vm_ops->swapout().

--
Manfred
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://humbolt.geo.uu.nl/Linux-MM/
^ permalink raw reply	[flat|nested] 51+ messages in thread
* Re: locking question: do_mmap(), do_munmap() 1999-10-11 18:02 ` Manfred Spraul 1999-10-11 19:07 ` Kanoj Sarcar 1999-10-11 20:15 ` Stephen C. Tweedie @ 1999-10-11 21:37 ` Alexander Viro 1999-10-11 22:13 ` Manfred Spraul 1999-10-11 22:22 ` Stephen C. Tweedie 3 siblings, 1 reply; 51+ messages in thread From: Alexander Viro @ 1999-10-11 21:37 UTC (permalink / raw) To: Manfred Spraul Cc: Stephen C. Tweedie, Andrea Arcangeli, linux-kernel, Ingo Molnar, linux-mm On Mon, 11 Oct 1999, Manfred Spraul wrote: > Alexander Viro wrote: > > > > On Mon, 11 Oct 1999, Stephen C. Tweedie wrote: > > > > > Hi, > > > > > > On Sun, 10 Oct 1999 15:03:45 -0400 (EDT), Alexander Viro > > > <viro@math.psu.edu> said: > > > > > > > Hold on. In swap_out_mm() you have to protect find_vma() (OK, it doesn't > > > > block, but we'll have to take care of mm->mmap_cache) _and_ you'll have to > > > > protect vma from destruction all way down to try_to_swap_out(). And to > > > > vma->swapout(). Which can sleep, so spinlocks are out of question > > > > here. > > > > > > No, spinlocks would be ideal. The vma swapout codes _have_ to be > > > prepared for the vma to be destroyed as soon as we sleep. In fact, the > > > entire mm may disappear if the process happens to exit. Once we know > > > which page to write where, the swapout operation becomes a per-page > > > operation, not per-vma. > > > > Aha, so you propose to drop it in ->swapout(), right? (after get_file() in > > filemap_write_page()... Ouch. Probably we'ld better lambda-expand the call > > in filemap_swapout() - the thing is called from other places too)... > > What about something like a rw-semaphore which protects the vma list: > vma-list modifiers [ie merge_segments(), insert_vm_struct() and > do_munmap()] grab it exclusive, swapper grabs it "shared, starve > exclusive". > All other vma-list readers are protected by mm->mmap_sem. > > This should not dead-lock, and no changes are required in > vm_ops->swapout(). 
What does it buy you over the simple semaphore here? Do you really see a contention scenario? -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://humbolt.geo.uu.nl/Linux-MM/ ^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: locking question: do_mmap(), do_munmap() 1999-10-11 21:37 ` Alexander Viro @ 1999-10-11 22:13 ` Manfred Spraul 0 siblings, 0 replies; 51+ messages in thread From: Manfred Spraul @ 1999-10-11 22:13 UTC (permalink / raw) To: Alexander Viro Cc: Stephen C. Tweedie, Andrea Arcangeli, linux-kernel, Ingo Molnar, linux-mm

Alexander Viro wrote:
> What does it buy you over the simple semaphore here? Do you really see a
> contention scenario?

I think you are right; I see no case where a normal semaphore would lock up and the rw-semaphore would not.

We win something _if_ vm_ops->swapout() is extremely slow:
* with lock_kernel() [ie currently], multiple threads can sleep within vm_ops->swapout() of the same "struct mm"
* an rw-semaphore would mimic that behaviour.
* a normal semaphore would prevent that.

I'm not sure it is worth implementing a rw-semaphore, especially since we win something only in a very obscure case, but we lose CPU cycles for every down_rw()/up_rw() [there is no 2-asm-instruction rw-semaphore].

--
Manfred
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://humbolt.geo.uu.nl/Linux-MM/
^ permalink raw reply	[flat|nested] 51+ messages in thread
* Re: locking question: do_mmap(), do_munmap() 1999-10-11 18:02 ` Manfred Spraul ` (2 preceding siblings ...) 1999-10-11 21:37 ` Alexander Viro @ 1999-10-11 22:22 ` Stephen C. Tweedie 1999-10-11 23:01 ` Alexander Viro 3 siblings, 1 reply; 51+ messages in thread From: Stephen C. Tweedie @ 1999-10-11 22:22 UTC (permalink / raw) To: Manfred Spraul Cc: Alexander Viro, Stephen C. Tweedie, Andrea Arcangeli, linux-kernel, Ingo Molnar, linux-mm Hi, On Mon, 11 Oct 1999 20:02:40 +0200, Manfred Spraul <manfreds@colorfullife.com> said: > What about something like a rw-semaphore which protects the vma list: > vma-list modifiers [ie merge_segments(), insert_vm_struct() and > do_munmap()] grab it exclusive, swapper grabs it "shared, starve > exclusive". Deadlock. Process A tries to do an mmap on mm A, gets the exclusive lock, tries to swap out from process B, and grabs mm B's shared lock. Process B in the mean time is doing the same thing and has an exclusive lock on mm B, and is trying to share-lock A. Whoops. > This should not dead-lock, Are you sure?! --Stephen -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://humbolt.geo.uu.nl/Linux-MM/ ^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: locking question: do_mmap(), do_munmap() 1999-10-11 22:22 ` Stephen C. Tweedie @ 1999-10-11 23:01 ` Alexander Viro 1999-10-12 14:06 ` [more fun] " Alexander Viro 1999-10-13 10:16 ` Stephen C. Tweedie 0 siblings, 2 replies; 51+ messages in thread From: Alexander Viro @ 1999-10-11 23:01 UTC (permalink / raw) To: Stephen C. Tweedie Cc: Manfred Spraul, Andrea Arcangeli, linux-kernel, Ingo Molnar, linux-mm On Mon, 11 Oct 1999, Stephen C. Tweedie wrote: > Hi, > > On Mon, 11 Oct 1999 20:02:40 +0200, Manfred Spraul > <manfreds@colorfullife.com> said: > > > What about something like a rw-semaphore which protects the vma list: > > vma-list modifiers [ie merge_segments(), insert_vm_struct() and > > do_munmap()] grab it exclusive, swapper grabs it "shared, starve > > exclusive". > > Deadlock. Process A tries to do an mmap on mm A, gets the exclusive > lock, tries to swap out from process B, and grabs mm B's shared lock. > Process B in the mean time is doing the same thing and has an exclusive > lock on mm B, and is trying to share-lock A. Whoops. <looking at the places in question> insert_vm_struct doesn't allocate anything. Ditto for merge_segments In do_munmap() the area that should be protected (ripping the vmas from the list) doesn't allocate anything too. In the swapper we are protected from recursion, aren't we? -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://humbolt.geo.uu.nl/Linux-MM/ ^ permalink raw reply [flat|nested] 51+ messages in thread
* [more fun] Re: locking question: do_mmap(), do_munmap() 1999-10-11 23:01 ` Alexander Viro @ 1999-10-12 14:06 ` Alexander Viro 1999-10-13 7:35 ` Manfred Spraul 1999-10-13 10:16 ` Stephen C. Tweedie 1 sibling, 1 reply; 51+ messages in thread From: Alexander Viro @ 1999-10-12 14:06 UTC (permalink / raw) To: Stephen C. Tweedie Cc: Manfred Spraul, Andrea Arcangeli, linux-kernel, Ingo Molnar, Linus Torvalds, linux-mm

Funny path #1: on 386 (sucky WP) copy_to_user() -> access_ok() -> __verify_write() -> handle_mm_fault(), and no mmap_sem in sight. Ditto for __verify_write() on sh.

Another one: ptrace_readdata() -> access_process_vm() -> find_extend_vma() -> make_pages_present(). Again, no mmap_sem in sight.

irix_brk(): calls do_brk() without mmap_sem.

sys_cacheflush() (on m68k): plays with vma without mmap_sem.

Patch follows (2.3.20, but these files didn't change in .21).

diff -urN linux-2.3.20/arch/i386/mm/fault.c linux-bird.mm/arch/i386/mm/fault.c
--- linux-2.3.20/arch/i386/mm/fault.c	Sun Sep 12 11:01:01 1999
+++ linux-bird.mm/arch/i386/mm/fault.c	Tue Oct 12 07:44:55 1999
@@ -35,6 +35,7 @@
 	if (!size)
 		return 1;
 
+	down(&current->mm->mmap_sem);
 	vma = find_vma(current->mm, start);
 	if (!vma)
 		goto bad_area;
@@ -64,6 +65,7 @@
 		if (!(vma->vm_flags & VM_WRITE))
 			goto bad_area;;
 	}
+	up(&current->mm->mmap_sem);
 	return 1;
 
 check_stack:
@@ -73,6 +75,7 @@
 		goto good_area;
 
 bad_area:
+	up(&current->mm->mmap_sem);
 	return 0;
 }
diff -urN linux-2.3.20/arch/m68k/kernel/sys_m68k.c linux-bird.mm/arch/m68k/kernel/sys_m68k.c
--- linux-2.3.20/arch/m68k/kernel/sys_m68k.c	Mon Jun 21 12:35:55 1999
+++ linux-bird.mm/arch/m68k/kernel/sys_m68k.c	Tue Oct 12 09:56:24 1999
@@ -535,6 +535,7 @@
 	int ret = -EINVAL;
 
 	lock_kernel();
+	down(&current->mm->mmap_sem);
 	if (scope < FLUSH_SCOPE_LINE || scope > FLUSH_SCOPE_ALL ||
 	    cache & ~FLUSH_CACHE_BOTH)
 		goto out;
@@ -591,6 +592,7 @@
 			ret = cache_flush_060 (addr, scope, cache, len);
 	}
 out:
+	up(&current->mm->mmap_sem);
 	unlock_kernel();
 	return ret;
 }
diff -urN linux-2.3.20/arch/mips/kernel/sysirix.c linux-bird.mm/arch/mips/kernel/sysirix.c
--- linux-2.3.20/arch/mips/kernel/sysirix.c	Sun Sep 12 05:54:08 1999
+++ linux-bird.mm/arch/mips/kernel/sysirix.c	Tue Oct 12 09:46:09 1999
@@ -534,6 +534,7 @@
 	int ret;
 
 	lock_kernel();
+	down(&current->mm->mmap_sem);
 	if (brk < current->mm->end_code) {
 		ret = -ENOMEM;
 		goto out;
@@ -591,6 +592,7 @@
 	ret = 0;
 out:
+	up(&current->mm->mmap_sem);
 	unlock_kernel();
 	return ret;
 }
diff -urN linux-2.3.20/arch/sh/mm/fault.c linux-bird.mm/arch/sh/mm/fault.c
--- linux-2.3.20/arch/sh/mm/fault.c	Sun Sep 12 13:29:49 1999
+++ linux-bird.mm/arch/sh/mm/fault.c	Tue Oct 12 09:57:03 1999
@@ -38,6 +38,7 @@
 	if (!size)
 		return 1;
 
+	down(&current->mm->mmap_sem);
 	vma = find_vma(current->mm, start);
 	if (!vma)
 		goto bad_area;
@@ -67,6 +68,7 @@
 		if (!(vma->vm_flags & VM_WRITE))
 			goto bad_area;;
 	}
+	up(&current->mm->mmap_sem);
 	return 1;
 
 check_stack:
@@ -76,6 +78,7 @@
 		goto good_area;
 
 bad_area:
+	up(&current->mm->mmap_sem);
 	return 0;
 }
diff -urN linux-2.3.20/kernel/ptrace.c linux-bird.mm/kernel/ptrace.c
--- linux-2.3.20/kernel/ptrace.c	Sun Sep 12 13:03:24 1999
+++ linux-bird.mm/kernel/ptrace.c	Tue Oct 12 09:15:27 1999
@@ -79,14 +79,15 @@
 int access_process_vm(struct task_struct *tsk, unsigned long addr, void *buf, int len, int write)
 {
-	int copied;
-	struct vm_area_struct * vma = find_extend_vma(tsk, addr);
+	int copied = 0;
+	struct vm_area_struct * vma;
+
+	down(&tsk->mm->mmap_sem);
+	vma = find_extend_vma(tsk, addr);
 	if (!vma)
-		return 0;
+		goto out;
 
-	down(&tsk->mm->mmap_sem);
-	copied = 0;
 	for (;;) {
 		unsigned long offset = addr & ~PAGE_MASK;
 		int this_len = PAGE_SIZE - offset;
@@ -115,6 +116,7 @@
 		vma = vma->vm_next;
 	}
+out:
 	up(&tsk->mm->mmap_sem);
 	return copied;
 }
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://humbolt.geo.uu.nl/Linux-MM/
^ permalink raw reply	[flat|nested] 51+ messages in thread
* Re: [more fun] Re: locking question: do_mmap(), do_munmap() 1999-10-12 14:06 ` [more fun] " Alexander Viro @ 1999-10-13 7:35 ` Manfred Spraul 1999-10-13 18:34 ` Kanoj Sarcar 0 siblings, 1 reply; 51+ messages in thread From: Manfred Spraul @ 1999-10-13 7:35 UTC (permalink / raw) To: Alexander Viro Cc: Stephen C. Tweedie, Andrea Arcangeli, linux-kernel, Ingo Molnar, Linus Torvalds, linux-mm

Alexander Viro wrote:
> Another one: ptrace_readdata() -> access_process_vm() -> find_extend_vma()
> -> make_pages_present(). Again, no mmap_sem in sight.

Can 2 processes ptrace() each other? If yes, then this will lock up if you acquire the mmap_sem.

--
Manfred
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://humbolt.geo.uu.nl/Linux-MM/
^ permalink raw reply	[flat|nested] 51+ messages in thread
* Re: [more fun] Re: locking question: do_mmap(), do_munmap() 1999-10-13 7:35 ` Manfred Spraul @ 1999-10-13 18:34 ` Kanoj Sarcar 0 siblings, 0 replies; 51+ messages in thread From: Kanoj Sarcar @ 1999-10-13 18:34 UTC (permalink / raw) To: Manfred Spraul; +Cc: viro, sct, andrea, linux-kernel, mingo, torvalds, linux-mm > > Alexander Viro wrote: > > Another one: ptrace_readdata() -> access_process_vm() -> find_extend_vma() > > -> make_pages_present(). Again, no mmap_sem in sight. Actually, I think access_process_vm() does get mmap_sem, although after doing the find_extend_vma, which should be fixed. I will take care of it in my patch. > > Can 2 processes ptrace() each other? If yes, then this will lock-up if > you acquire the mmap_sem. > Theoretically, there is no problem with 2 processes ptrace()ing each other. IMO, no deadlock is possible in the approach I posted. Kanoj > -- > Manfred > -- > To unsubscribe, send a message with 'unsubscribe linux-mm' in > the body to majordomo@kvack.org. For more info on Linux MM, > see: http://humbolt.geo.uu.nl/Linux-MM/ > -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://humbolt.geo.uu.nl/Linux-MM/ ^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: locking question: do_mmap(), do_munmap() 1999-10-11 23:01 ` Alexander Viro 1999-10-12 14:06 ` [more fun] " Alexander Viro @ 1999-10-13 10:16 ` Stephen C. Tweedie 1 sibling, 0 replies; 51+ messages in thread From: Stephen C. Tweedie @ 1999-10-13 10:16 UTC (permalink / raw) To: Alexander Viro Cc: Stephen C. Tweedie, Manfred Spraul, Andrea Arcangeli, linux-kernel, Ingo Molnar, linux-mm

Hi,

On Mon, 11 Oct 1999 19:01:12 -0400 (EDT), Alexander Viro <viro@math.psu.edu> said:

>> Deadlock. Process A tries to do an mmap on mm A, gets the exclusive
>> lock, tries to swap out from process B, and grabs mm B's shared lock.
>> Process B in the mean time is doing the same thing and has an exclusive
>> lock on mm B, and is trying to share-lock A. Whoops.

> <looking at the places in question>
> insert_vm_struct doesn't allocate anything.
> Ditto for merge_segments
> In do_munmap() the area that should be protected (ripping the vmas from
> the list) doesn't allocate anything too.

No such luck. We don't take the mm semaphore in any of those places: we already hold it. The mmap() operation itself takes the semaphore, and it _certainly_ allocates memory.

You could take it shared and promote it later once the memory has been allocated, but that will just open up a new class of promotion deadlocks. Or you could drop and retake after the allocations, but then you lose the protection of the semaphore and you have to go back and revalidate your vma once you have got the exclusive lock (which _is_ possible, especially with a generation number in the vma, but it's hideously ugly). That's exactly why taking the semaphore in the swapper is so dangerous.

> In the swapper we are protected from recursion, aren't we?

Yes: the PF_MEMALLOC flag sees to that.

--Stephen
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://humbolt.geo.uu.nl/Linux-MM/
^ permalink raw reply	[flat|nested] 51+ messages in thread
* Re: locking question: do_mmap(), do_munmap() 1999-10-11 16:05 ` Alexander Viro 1999-10-11 18:02 ` Manfred Spraul @ 1999-10-11 20:13 ` Stephen C. Tweedie 1999-10-11 21:40 ` Alexander Viro 1 sibling, 1 reply; 51+ messages in thread From: Stephen C. Tweedie @ 1999-10-11 20:13 UTC (permalink / raw) To: Alexander Viro Cc: Stephen C. Tweedie, Manfred Spraul, Andrea Arcangeli, linux-kernel, Ingo Molnar, linux-mm Hi, On Mon, 11 Oct 1999 12:05:23 -0400 (EDT), Alexander Viro <viro@math.psu.edu> said: > On Mon, 11 Oct 1999, Stephen C. Tweedie wrote: >> No, spinlocks would be ideal. The vma swapout codes _have_ to be >> prepared for the vma to be destroyed as soon as we sleep. In fact, the >> entire mm may disappear if the process happens to exit. Once we know >> which page to write where, the swapout operation becomes a per-page >> operation, not per-vma. > Aha, so you propose to drop it in ->swapout(), right? (after get_file() in > filemap_write_page()... Ouch. Probably we'ld better lambda-expand the call > in filemap_swapout() - the thing is called from other places too)... Right now it is the big kernel lock which is used for this, and the scheduler drops it anyway for us. If anyone wants to replace that lock with another spinlock, then yes, the swapout method would have to drop it before doing anything which could block. And that is ugly: having spinlocks unbalanced over function calls is a maintenance nightmare. --Stephen -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://humbolt.geo.uu.nl/Linux-MM/ ^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: locking question: do_mmap(), do_munmap() 1999-10-11 20:13 ` Stephen C. Tweedie @ 1999-10-11 21:40 ` Alexander Viro 1999-10-11 22:20 ` Stephen C. Tweedie 0 siblings, 1 reply; 51+ messages in thread From: Alexander Viro @ 1999-10-11 21:40 UTC (permalink / raw) To: Stephen C. Tweedie Cc: Manfred Spraul, Andrea Arcangeli, linux-kernel, Ingo Molnar, linux-mm On Mon, 11 Oct 1999, Stephen C. Tweedie wrote: > Hi, > > On Mon, 11 Oct 1999 12:05:23 -0400 (EDT), Alexander Viro > <viro@math.psu.edu> said: > > > On Mon, 11 Oct 1999, Stephen C. Tweedie wrote: > >> No, spinlocks would be ideal. The vma swapout codes _have_ to be > >> prepared for the vma to be destroyed as soon as we sleep. In fact, the > >> entire mm may disappear if the process happens to exit. Once we know > >> which page to write where, the swapout operation becomes a per-page > >> operation, not per-vma. > > > Aha, so you propose to drop it in ->swapout(), right? (after get_file() in > > filemap_write_page()... Ouch. Probably we'ld better lambda-expand the call > > in filemap_swapout() - the thing is called from other places too)... > > Right now it is the big kernel lock which is used for this, and the > scheduler drops it anyway for us. If anyone wants to replace that lock > with another spinlock, then yes, the swapout method would have to drop > it before doing anything which could block. And that is ugly: having > spinlocks unbalanced over function calls is a maintenance nightmare. Agreed, but the big lock does not (and IMHO should not) cover the vma list modifications. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://humbolt.geo.uu.nl/Linux-MM/ ^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: locking question: do_mmap(), do_munmap() 1999-10-11 21:40 ` Alexander Viro @ 1999-10-11 22:20 ` Stephen C. Tweedie 1999-10-11 22:31 ` Alexander Viro 0 siblings, 1 reply; 51+ messages in thread From: Stephen C. Tweedie @ 1999-10-11 22:20 UTC (permalink / raw) To: Alexander Viro Cc: Stephen C. Tweedie, Manfred Spraul, Andrea Arcangeli, linux-kernel, Ingo Molnar, linux-mm Hi, On Mon, 11 Oct 1999 17:40:52 -0400 (EDT), Alexander Viro <viro@math.psu.edu> said: > Agreed, but the big lock does not (and IMHO should not) cover the vma list > modifications. Fine, but as I've said you need _something_. It doesn't matter what, but the fact that the kernel lock is no longer being held for vma updates has introduced swapper races. We can't fix those without either restoring or replacing the big lock. --Stephen -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://humbolt.geo.uu.nl/Linux-MM/ ^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: locking question: do_mmap(), do_munmap()
  1999-10-11 22:20               ` Stephen C. Tweedie
@ 1999-10-11 22:31                 ` Alexander Viro
  1999-10-13 10:25                   ` Stephen C. Tweedie
  0 siblings, 1 reply; 51+ messages in thread
From: Alexander Viro @ 1999-10-11 22:31 UTC (permalink / raw)
  To: Stephen C. Tweedie
  Cc: Manfred Spraul, Andrea Arcangeli, linux-kernel, Ingo Molnar, linux-mm

On Mon, 11 Oct 1999, Stephen C. Tweedie wrote:

> Hi,
>
> On Mon, 11 Oct 1999 17:40:52 -0400 (EDT), Alexander Viro
> <viro@math.psu.edu> said:
>
> > Agreed, but the big lock does not (and IMHO should not) cover the vma list
> > modifications.
>
> Fine, but as I've said you need _something_. It doesn't matter what,
> but the fact that the kernel lock is no longer being held for vma
> updates has introduced swapper races. We can't fix those without either
> restoring or replacing the big lock.

And a spinlock being released in ->swapout() is outright ugly. OK, so
we are adding a new semaphore (vma_sem) to mm_struct and taking it
around the places where the list is modified + in the swapper (for
scanning). In the normal situation it will never give us contention -
everyone except the swapper uses it with mmap_sem already held. Are
there any objections against it? If it's OK I'll go ahead and do it.
Comments?
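The vma_sem scheme proposed here can be modeled in user space with two mutexes: vma-list writers already hold mmap_sem and additionally take vma_sem, while the swapper's scan takes only vma_sem. A minimal sketch, assuming ordinary POSIX threads as a stand-in for the kernel primitives (the names here are illustrative, not the real 2.3 kernel API):

```c
#include <pthread.h>

/* User-space model of the vma_sem proposal: writers take mmap_sem
 * first and then vma_sem; the swapper takes only vma_sem.  A writer
 * and the swapper therefore always exclude each other on vma_sem,
 * without the swapper ever touching mmap_sem. */

static pthread_mutex_t mmap_sem = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t vma_sem  = PTHREAD_MUTEX_INITIALIZER;
static int vma_count;                    /* stand-in for the vma list */

static void modify_vma_list(void)
{
    pthread_mutex_lock(&mmap_sem);       /* held by every mmap/munmap path */
    pthread_mutex_lock(&vma_sem);        /* extra lock: excludes the swapper */
    vma_count++;                         /* list modification goes here */
    pthread_mutex_unlock(&vma_sem);
    pthread_mutex_unlock(&mmap_sem);
}

static int swapper_scan(void)
{
    int n;
    pthread_mutex_lock(&vma_sem);        /* the swapper needs only vma_sem */
    n = vma_count;                       /* list is stable while it is held */
    pthread_mutex_unlock(&vma_sem);
    return n;
}
```

Since every writer already owns mmap_sem, two writers never contend on vma_sem, which matches the "no contention in the normal situation" claim.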
* Re: locking question: do_mmap(), do_munmap()
  1999-10-11 22:31                 ` Alexander Viro
@ 1999-10-13 10:25                   ` Stephen C. Tweedie
  0 siblings, 0 replies; 51+ messages in thread
From: Stephen C. Tweedie @ 1999-10-13 10:25 UTC (permalink / raw)
  To: Alexander Viro
  Cc: Stephen C. Tweedie, Manfred Spraul, Andrea Arcangeli, linux-kernel, Ingo Molnar, linux-mm

Hi,

On Mon, 11 Oct 1999 18:31:37 -0400 (EDT), Alexander Viro
<viro@math.psu.edu> said:

> And spinlock being released in the ->swapout() is outright ugly. OK, so
> we are adding to mm_struct a new semaphore (vma_sem) and getting it around
> the places where the list is modified + in the swapper (for scanning). In
> normal situation it will never give us contention - everyone except
> swapper uses it with mmap_sem already held. Are there any objections
> against it? If it's OK I'll go ahead and do it. Comments?

Looks OK as long as the swapper remains non-recursive and we never,
ever allocate memory outside the swapper with vma_sem held.

--Stephen
* Re: locking question: do_mmap(), do_munmap()
  1999-10-10 18:42         ` Manfred Spraul
  1999-10-10 19:03           ` Alexander Viro
@ 1999-10-11 15:47           ` Stephen C. Tweedie
  1 sibling, 0 replies; 51+ messages in thread
From: Stephen C. Tweedie @ 1999-10-11 15:47 UTC (permalink / raw)
  To: Manfred Spraul
  Cc: Alexander Viro, Andrea Arcangeli, linux-kernel, Ingo Molnar, linux-mm, Stephen Tweedie

Hi,

On Sun, 10 Oct 1999 20:42:31 +0200, Manfred Spraul
<manfreds@colorfullife.com> said:

>> I'm not sure that it will work (we scan the thing in many places and
>> quite a few may be blocking ;-/), unless you propose to protect individual
>> steps of the scan, which will give you lots of overhead.

> The overhead should be low, we could keep the "double synchronization",
> ie
> * either down(&mm->mmap_sem) or spin_lock(&mm->vma_list_lock) for read
> * both locks for write.

Looks good.

> That would be a good idea:
> For multi-threaded applications, swap-in is currently single-threaded,
> ie we do not overlap the io operations if 2 threads of the same process
> cause page faults. Everything is fully serialized.

That can be fixed another way --- it is relatively simple to drop the
semaphore while we read the page from disk and revalidate the page
fault context when we unblock again. A sequence number against vma
list changes plus a double-check against the contents of the pte would
be sufficient here. If the context has changed, we can just return:
we'll get another page fault from scratch, and the page will already
be in cache.

We already have to do part of this work anyway because every time we
block while processing a write fault, we have to double-check to see
if the swapper removed the pte while we were asleep.

--Stephen
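The revalidation Tweedie describes - sample a sequence number before dropping the semaphore for I/O, re-check it afterwards, and retry the fault if it moved - can be sketched as a small user-space model. This is not kernel code; the names and the I/O callback are illustrative assumptions:

```c
#include <assert.h>
#include <stdbool.h>

/* Sequence number bumped on every vma-list change (sketch). */
static unsigned long vma_list_seq;

static void modify_vma_list(void) { vma_list_seq++; }

/* Disk read during which no racing unmap happens. */
static void quiet_io(void) { }

/* Returns true if the swapped-in page may be installed, false if the
 * context changed while we slept and the fault must be retried from
 * scratch (the page will then already be in cache). */
static bool swap_in(void (*do_io)(void))
{
    unsigned long seq = vma_list_seq;   /* sampled while the lock is held */
    do_io();                            /* lock dropped across the disk read */
    if (vma_list_seq != seq)
        return false;                   /* vma list changed: just return */
    /* a real implementation would also double-check the pte itself here */
    return true;
}
```

Passing `modify_vma_list` as the I/O callback simulates an unmap racing with the disk read, which is exactly the case the sequence check catches.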
* Re: locking question: do_mmap(), do_munmap()
  1999-10-10 17:12     ` Andrea Arcangeli
  1999-10-10 17:48       ` Alexander Viro
@ 1999-10-11 15:43       ` Stephen C. Tweedie
  1 sibling, 0 replies; 51+ messages in thread
From: Stephen C. Tweedie @ 1999-10-11 15:43 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: Alexander Viro, Manfred Spraul, linux-kernel, Ingo Molnar, linux-mm, Stephen Tweedie

Hi,

On Sun, 10 Oct 1999 19:12:58 +0200 (CEST), Andrea Arcangeli
<andrea@suse.de> said:

> The other option is to make the mmap semaphore recursive, checking that
> GFP is not called in the middle of a vma change. I don't like this one;
> it sounds less robust than the spinlock way to me (see below).

Doesn't work, because you can still have a process which takes one
mmap semaphore and then attempts to take a different one inside the
swapper as the result of a memory allocation. As soon as you have two
processes doing that to each other's semaphores, you still have a
deadlock.

--Stephen
* Re: locking question: do_mmap(), do_munmap()
  1999-10-10 16:07   ` Alexander Viro
  1999-10-10 16:25     ` Alexander Viro
@ 1999-10-10 16:56     ` Andrea Arcangeli
  1999-10-11 15:41     ` Stephen C. Tweedie
  2 siblings, 0 replies; 51+ messages in thread
From: Andrea Arcangeli @ 1999-10-10 16:56 UTC (permalink / raw)
  To: Alexander Viro
  Cc: Manfred Spraul, linux-kernel, Ingo Molnar, linux-mm

On Sun, 10 Oct 1999, Alexander Viro wrote:

> Manfred, Andrea - please stop it. Yes, it does and yes, it should.

I don't want to use the big kernel lock, of course.

Andrea
* Re: locking question: do_mmap(), do_munmap()
  1999-10-10 16:07   ` Alexander Viro
  1999-10-10 16:25     ` Alexander Viro
  1999-10-10 16:56     ` Andrea Arcangeli
@ 1999-10-11 15:41     ` Stephen C. Tweedie
  1999-10-11 15:52       ` Alexander Viro
  2 siblings, 1 reply; 51+ messages in thread
From: Stephen C. Tweedie @ 1999-10-11 15:41 UTC (permalink / raw)
  To: Alexander Viro
  Cc: Manfred Spraul, Andrea Arcangeli, linux-kernel, Ingo Molnar, linux-mm, Stephen Tweedie

Hi,

On Sun, 10 Oct 1999 12:07:38 -0400 (EDT), Alexander Viro
<viro@math.psu.edu> said:

>> eg. sys_mprotect calls merge_segments without lock_kernel().

> Manfred, Andrea - please stop it. Yes, it does and yes, it should.
> Plonking the big lock around every access to VM is _not_ a solution. If
> swapper doesn't use mmap_sem - _swapper_ should be fixed. How the hell
> does lock_kernel() have smaller deadlock potential than
> down(&mm->mmap_sem)?

The swapout code cannot claim the mmap semaphore. There are just too
many deadlock possibilities. For example, the whole VM assumes that it
is safe to try to allocate memory while holding the mmap semaphore.
How are you going to make that work if we are short of immediately
free pages and the allocation request recurses into the swapper?

The swapper has very strict requirements: to avoid blocking it
requires the big lock and the page table spinlocks, so that it can
survive without the mm semaphore. Adding the mm semaphore to the
swapout loop is not really an option. That means that you need the
kernel lock when modifying vma lists.

We can, however, improve things by using a per-mm spinlock instead of
using the kernel lock to provide that guarantee.

--Stephen
* Re: locking question: do_mmap(), do_munmap()
  1999-10-11 15:41     ` Stephen C. Tweedie
@ 1999-10-11 15:52       ` Alexander Viro
  0 siblings, 0 replies; 51+ messages in thread
From: Alexander Viro @ 1999-10-11 15:52 UTC (permalink / raw)
  To: Stephen C. Tweedie
  Cc: Manfred Spraul, Andrea Arcangeli, linux-kernel, Ingo Molnar, linux-mm

On Mon, 11 Oct 1999, Stephen C. Tweedie wrote:

> The swapper has very strict requirements: to avoid blocking it requires
> the big lock and the page table spinlocks, so that it can survive
> without the mm semaphore. Adding the mm semaphore to the swapout loop
> is not really an option. That means that you need the kernel lock when
> modifying vma lists.

Ouch...

> We can, however, improve things by using a per-mm spinlock instead of
> using the kernel lock to provide that guarantee.

->swapout() may block. We have three areas here:
	1. vma accesses in swapper.
	2. vma list reads outside of swapper.
	3. vma modifications/destruction.
Looks like we need exclusion between 1 and 3 (on a per-mm basis, that
is).
* locking question: do_mmap(), do_munmap()
@ 1999-10-09 12:48 Manfred Spraul
  1999-10-09 13:12 ` Alexander Viro
  0 siblings, 1 reply; 51+ messages in thread
From: Manfred Spraul @ 1999-10-09 12:48 UTC (permalink / raw)
  To: linux-kernel, linux-mm

Which semaphores/spinlocks protect do_mmap() and do_munmap()?

do_mmap():
I wrote a test patch, and I found out that some (all?) callers call
lock_kernel(), but that mm->mmap_sem is _NOT ACQUIRED_. Eg
sys_uselib() -> do_load_elf_??() -> do_mmap().

do_munmap(): ????
I think here is a race: sys_munmap() doesn't call lock_kernel(), it
only acquires mm->mmap_sem. do_mmap() internally calls do_munmap(),
ie with the kernel lock, but without mm->mmap_sem.

What about adding debugging runtime checks to these functions? Ie
#defines which call down_trylock() and spin_trylock() and oops on
missing locks? We could define them to "(void)0" before 2.4.

--
	Manfred
* Re: locking question: do_mmap(), do_munmap()
  1999-10-09 12:48 Manfred Spraul
@ 1999-10-09 13:12 ` Alexander Viro
  1999-10-09 13:17   ` Manfred Spraul
  1999-10-09 16:01   ` Andrea Arcangeli
  0 siblings, 2 replies; 51+ messages in thread
From: Alexander Viro @ 1999-10-09 13:12 UTC (permalink / raw)
  To: Manfred Spraul
  Cc: linux-kernel, linux-mm

On Sat, 9 Oct 1999, Manfred Spraul wrote:

> which semaphores/spinlocks protect do_mmap() and do_munmap()?
>
> do_mmap():
> I wrote a test patch, and I found out that some (all?) callers call
> lock_kernel(), but that mm->mmap_sem is _NOT ACQUIRED_. Eg
> sys_uselib()-> do_load_elf_??() -> do_mmap().
>
> do_munmap():
> ???? I think here is a race:
> sys_munmap() doesn't call lock_kernel(), it only acquires mm->mmap_sem.
> do_mmap() internally calls do_munmap(), ie with the kernel lock, but
> without mm->mmap_sem.

do_munmap() doesn't need the big lock. do_mmap() callers should grab
the semaphore, unless they are sure that nobody else owns the same
context. The big lock on do_mmap() is still needed. do_mmap() in
do_load_elf_binary() is OK (we are the sole owners of the context),
but in do_load_elf_library() it is _not_. Moreover, sys_uselib() may
do interesting things to cloned processes. IMO the right thing would
be to check the number of mm users.
* Re: locking question: do_mmap(), do_munmap()
  1999-10-09 13:12 ` Alexander Viro
@ 1999-10-09 13:17   ` Manfred Spraul
  1999-10-09 13:38     ` Alexander Viro
  1 sibling, 1 reply; 51+ messages in thread
From: Manfred Spraul @ 1999-10-09 13:17 UTC (permalink / raw)
  To: Alexander Viro
  Cc: linux-kernel, linux-mm

Alexander Viro wrote:
> Moreover, sys_uselib() may do
> interesting things to cloned processes. IMO the right thing would be to
> check for the number of mm users.

I don't know the details of the mm implementation, but if there is
only one user, then down(&mm->mmap_sem) will never sleep, and you lose
nothing by getting the semaphore.

I would prefer a clean implementation, ie always down() before
do_mmap(), and ASSERT_DOWN() macros to enforce this.

--
	Manfred
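The ASSERT_DOWN() idea can be illustrated with a tiny user-space model. In the 2.3 kernel a held semaphore has a count of zero or below, so a debug build could verify that the caller really holds mmap_sem before touching the vma list; the toy `struct semaphore` and all function names below are assumptions for illustration, not the real kernel API:

```c
#include <assert.h>

/* Toy stand-in for the kernel semaphore: count starts at 1 (free),
 * down() takes it, up() releases it.  A held semaphore therefore has
 * count <= 0, which is what the debugging macro checks. */
struct semaphore { int count; };

static void down(struct semaphore *s) { s->count--; }
static void up(struct semaphore *s)   { s->count++; }

/* Debug check: oops (here, assert) if the caller forgot to down()
 * the semaphore.  Before 2.4 this could be defined to "(void)0". */
#define ASSERT_DOWN(sem) assert((sem)->count <= 0)

static int vma_list_len;

/* Hypothetical vma-list modifier that enforces the locking rule. */
static void insert_vma(struct semaphore *mmap_sem)
{
    ASSERT_DOWN(mmap_sem);   /* fires if the caller skipped down() */
    vma_list_len++;
}
```

The point of the macro is exactly Manfred's: make "always down() before do_mmap()" a rule that the code itself checks, instead of one enforced only by grep.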
* Re: locking question: do_mmap(), do_munmap()
  1999-10-09 13:17   ` Manfred Spraul
@ 1999-10-09 13:38     ` Alexander Viro
  0 siblings, 0 replies; 51+ messages in thread
From: Alexander Viro @ 1999-10-09 13:38 UTC (permalink / raw)
  To: Manfred Spraul
  Cc: linux-kernel, linux-mm

On Sat, 9 Oct 1999, Manfred Spraul wrote:

> Alexander Viro wrote:
> > Moreover, sys_uselib() may do
> > interesting things to cloned processes. IMO the right thing would be to
> > check for the number of mm users.
>
> I don't know the details of the mm implementation, but if there is only
> one user, then down(&mm->mmap_sem) will never sleep, and you loose
> nothing by getting the semaphore.

Yes (especially since sys_uselib() is not _too_ time-critical), but
consider what will happen if thread A does sys_uselib() while thread B
runs in the affected area. It's a different problem. As for the
details - check where the new users may come from.

I don't think that it calls for ASSERT-style macros, though. IMO it's
a matter of one big grep. As of 2.3.20-pre2 the affected places are
	arch/mips/kernel/sysirix.c
	arch/sparc64/kernel/binfmt_aout32.c
	fs/binfmt_aout.c
	fs/binfmt_elf.c
	drivers/char/drm/bufs.c (WTF is that?)
	drivers/sgi/char/{graphics.c,shmiq.c}
So there... I'm not sure that we need to protect calls in
->load_binary(), though - it's really unnecessary.
* Re: locking question: do_mmap(), do_munmap()
  1999-10-09 13:12 ` Alexander Viro
  1999-10-09 13:17   ` Manfred Spraul
@ 1999-10-09 16:01   ` Andrea Arcangeli
  1999-10-10 13:05     ` Manfred Spraul
  1999-10-11 15:05     ` Stephen C. Tweedie
  1 sibling, 2 replies; 51+ messages in thread
From: Andrea Arcangeli @ 1999-10-09 16:01 UTC (permalink / raw)
  To: Alexander Viro
  Cc: Manfred Spraul, linux-kernel, linux-mm

On Sat, 9 Oct 1999, Alexander Viro wrote:

> do_munmap() doesn't need the big lock. do_mmap() callers should grab

Look at the swapout path. Without the big kernel lock you'll free vmas
under swap_out().

ftp://ftp.suse.com/pub/people/andrea/kernel-patches/pending-2.3.x/munmap-lock-1

Andrea
* Re: locking question: do_mmap(), do_munmap()
  1999-10-09 16:01   ` Andrea Arcangeli
@ 1999-10-10 13:05     ` Manfred Spraul
  1999-10-11 15:09       ` Stephen C. Tweedie
  1 sibling, 1 reply; 51+ messages in thread
From: Manfred Spraul @ 1999-10-10 13:05 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: linux-kernel, linux-mm

Andrea Arcangeli wrote:
> Look the swapout path. Without the big kernel lock you'll free vmas under
> swap_out().

I checked the code in mm/*.c, and it seems that reading the vma list
is protected by either lock_kernel() [eg: swapper] or
down(&mm->mmap_sem) [eg: do_mlock]. But this means that both locks are
required if you modify the vma list. Single reader, multiple writer
synchronization. Unusual, but interesting :-)

Unfortunately, it seems that this is often ignored, eg.
	sys_mlock()->do_mlock()->merge_segments().
	sys_brk()
	sys_munmap()	<<<<<< fixed by your patch.

Is that correct? Should I write a patch, or is someone working on
these problems?

How should we fix it?
a) the swapper calls down(&mm->mmap_sem), but I guess that would
   lock up.
b) everyone who changes the vma list calls lock_kernel(). I think it
   would be a bad thing to call lock_kernel() immediately in the
   sys_??() function; I think we should hide the lock_kernel() call
   somewhere inside the vma-list code [add functions which modify the
   vma list, and they call lock_kernel()].

--
	Manfred
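The invariant behind this "either lock for read, both locks for write" scheme can be checked exhaustively: because a writer holds both locks, any reader holding at least one of them necessarily shares a lock with the writer, so they exclude each other. A minimal sketch of that argument in standalone C (a logical model, not kernel code):

```c
#include <assert.h>
#include <stdbool.h>

/* Which of the two locks a context holds: mmap_sem or the big
 * kernel lock (modeled as plain booleans for the argument). */
struct held { bool mmap_sem; bool big_lock; };

/* Reading the vma list requires at least one of the two locks. */
static bool may_read(struct held h)  { return h.mmap_sem || h.big_lock; }

/* Modifying the vma list requires both locks. */
static bool may_write(struct held h) { return h.mmap_sem && h.big_lock; }

/* Two contexts exclude each other iff they contend on some lock. */
static bool excludes(struct held a, struct held b)
{
    return (a.mmap_sem && b.mmap_sem) || (a.big_lock && b.big_lock);
}
```

Enumerating all four lock-holding states for a writer and a reader shows that every legal writer/reader pair contends on at least one lock, which is what makes the scheme safe despite being "single reader, multiple writer" in shape.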
* Re: locking question: do_mmap(), do_munmap()
  1999-10-10 13:05     ` Manfred Spraul
@ 1999-10-11 15:09       ` Stephen C. Tweedie
  0 siblings, 0 replies; 51+ messages in thread
From: Stephen C. Tweedie @ 1999-10-11 15:09 UTC (permalink / raw)
  To: Manfred Spraul
  Cc: Andrea Arcangeli, linux-kernel, linux-mm, Stephen Tweedie

Hi,

On Sun, 10 Oct 1999 15:05:44 +0200, Manfred Spraul
<manfreds@colorfullife.com> said:

> Andrea Arcangeli wrote:
>> Look the swapout path. Without the big kernel lock you'll free vmas under
>> swap_out().

> I checked to code in mm/*.c, and it seems that reading the vma-list is
> protected by either lock_kernel() [eg: swapper] or down(&mm->mmap_sem)
> [eg: do_mlock].

The swapper relies on it being protected by the big lock. The mm
semaphore is required when you need additional protection:
specifically, if you need to sleep while manipulating the vma lists
(eg. in page faults).

> But this means that both locks are required if you modify the vma list.
> Single reader, multiple writer synchronization. Unusual, but interesting
> :-)

Correct, but you only need the one lock --- the big lock --- to read
the vma list, which is what the swapper does. The swapper only needs
write access to the page tables, not to the vma list.

> How should we fix it?
> a) the swapper calls down(&mm->mmap_sem), but I guess that would
> lock-up.

Massive deadlock, indeed. We've looked at this but it is soooo painful.

> b) everyone who changes the vma list calls lock_kernel().

... or an equivalent lock. The big lock itself isn't needed if we have
a per-mm spinlock, but we do need something lighter weight than the
mmap semaphore to let the swapper read-protect the vma lists.

--Stephen
* Re: locking question: do_mmap(), do_munmap()
  1999-10-09 16:01   ` Andrea Arcangeli
  1999-10-10 13:05     ` Manfred Spraul
@ 1999-10-11 15:05     ` Stephen C. Tweedie
  1 sibling, 0 replies; 51+ messages in thread
From: Stephen C. Tweedie @ 1999-10-11 15:05 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: Alexander Viro, Manfred Spraul, linux-kernel, linux-mm, Stephen Tweedie

Hi,

On Sat, 9 Oct 1999 18:01:27 +0200 (CEST), Andrea Arcangeli
<andrea@suse.de> said:

> On Sat, 9 Oct 1999, Alexander Viro wrote:
>> do_munmap() doesn't need the big lock. do_mmap() callers should grab

> Look the swapout path. Without the big kernel lock you'll free vmas under
> swap_out().

Yes. The swapout code relies on the big lock to freeze the vma, and on
the page_table_lock to protect the ptes, so that it can avoid worrying
about the mm_sem at all. If munmap ever drops vmas without the big
lock, the swapper _will_ break.

Making this into a per-mm lock would not be hard, btw.

--Stephen
end of thread (newest message: ~1999-10-15 17:50 UTC)
Thread overview: 51+ messages
[not found] <Pine.LNX.4.10.9910101713010.364-100000@alpha.random>
1999-10-10 15:52 ` locking question: do_mmap(), do_munmap() Manfred Spraul
1999-10-10 16:07 ` Alexander Viro
1999-10-10 16:25 ` Alexander Viro
1999-10-10 16:45 ` Manfred Spraul
1999-10-10 17:25 ` Alexander Viro
1999-10-10 17:12 ` Andrea Arcangeli
1999-10-10 17:48 ` Alexander Viro
1999-10-10 18:42 ` Manfred Spraul
1999-10-10 19:03 ` Alexander Viro
1999-10-10 21:31 ` Manfred Spraul
1999-10-10 21:53 ` Andrea Arcangeli
1999-10-10 22:34 ` Alexander Viro
1999-10-10 23:28 ` Andrea Arcangeli
1999-10-11 15:50 ` Stephen C. Tweedie
1999-10-11 16:05 ` Alexander Viro
1999-10-11 18:02 ` Manfred Spraul
1999-10-11 19:07 ` Kanoj Sarcar
1999-10-11 22:23 ` Stephen C. Tweedie
1999-10-13 1:25 ` Kanoj Sarcar
1999-10-13 7:32 ` Manfred Spraul
1999-10-15 9:58 ` Ralf Baechle
1999-10-15 17:50 ` Kanoj Sarcar
1999-10-13 10:45 ` Stephen C. Tweedie
1999-10-11 20:15 ` Stephen C. Tweedie
1999-10-11 21:14 ` Manfred Spraul
1999-10-11 21:37 ` Alexander Viro
1999-10-11 22:13 ` Manfred Spraul
1999-10-11 22:22 ` Stephen C. Tweedie
1999-10-11 23:01 ` Alexander Viro
1999-10-12 14:06 ` [more fun] " Alexander Viro
1999-10-13 7:35 ` Manfred Spraul
1999-10-13 18:34 ` Kanoj Sarcar
1999-10-13 10:16 ` Stephen C. Tweedie
1999-10-11 20:13 ` Stephen C. Tweedie
1999-10-11 21:40 ` Alexander Viro
1999-10-11 22:20 ` Stephen C. Tweedie
1999-10-11 22:31 ` Alexander Viro
1999-10-13 10:25 ` Stephen C. Tweedie
1999-10-11 15:47 ` Stephen C. Tweedie
1999-10-11 15:43 ` Stephen C. Tweedie
1999-10-10 16:56 ` Andrea Arcangeli
1999-10-11 15:41 ` Stephen C. Tweedie
1999-10-11 15:52 ` Alexander Viro
1999-10-09 12:48 Manfred Spraul
1999-10-09 13:12 ` Alexander Viro
1999-10-09 13:17 ` Manfred Spraul
1999-10-09 13:38 ` Alexander Viro
1999-10-09 16:01 ` Andrea Arcangeli
1999-10-10 13:05 ` Manfred Spraul
1999-10-11 15:09 ` Stephen C. Tweedie
1999-10-11 15:05 ` Stephen C. Tweedie