Re: [RFC] another crazy idea to get rid of mmap_sem in faults

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

From: Peter Zijlstra <peterz@infradead.org>
To: Christoph Lameter <cl@linux-foundation.org>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	hugh <hugh@veritas.com>,
	Paul E McKenney <paulmck@linux.vnet.ibm.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Ingo Molnar <mingo@elte.hu>, linux-mm <linux-mm@kvack.org>
Subject: Re: [RFC] another crazy idea to get rid of mmap_sem in faults
Date: Mon, 01 Dec 2008 16:28:36 +0100	[thread overview]
Message-ID: <1228145317.7070.17.camel@twins> (raw)
In-Reply-To: <Pine.LNX.4.64.0812010857040.15331@quilx.com>

On Mon, 2008-12-01 at 09:06 -0600, Christoph Lameter wrote:
> On Mon, 1 Dec 2008, Peter Zijlstra wrote:
> 
> > Well, yeah, mainly the vmas, right? We need to ensure the vma we fault
> > into doesn't go away or change under us.
> >
> > Does mmap_sem protect more than the vma's and their RB tree?
> 
> It may protect portions of the mm_struct?

Maybe, I'll see what of that faults use.

> > I'll ponder the COW stuff, but since that too holds on to the PTL I
> > think we're good.
> 
> Copy pte range is called under mmap_sem when forking.

Obviously, but if we do faults without mmap_sem there's a race, we need
to do the right thing.

> > > Then you will need to use RCU for the vmas in general. We already use
> > > RCU for the anonymous vma. Extend that to all vmas?
> >
> > RCU cannot be used, since we need to be able to sleep in a fault in
> > order to do IO. So we need to extend SRCU - but since we have all the
> > experience of preemptible RCU to draw from I think that should not be an
> > issue.
> 
> Then do not hold it across sleeping portions. If you have to sleep then
> start another rcu section and recheck the situation before proceeding.

I'd prefer to start out with SRCU and maybe try that later.

> > > How would this sync with other operations that need to take mmap_sem?
> >
> > What other ops? mmap/munmap/mremap/madvise etc.?
> 
> Look for down_write operations on mmap_sem. Things like set_brk etc.

Sure, I thought you had something specific in mind. 

The only relevant thing for faults is vma's (afaict) and any operation
that changes those either fudges PTEs and thus needs the PTE lock, or
leaves the fault valid in which case the extended validation should find
it so.

> > The idea is that any change to a vma (which would require exclusive
> > mmap_sem) will replace the vma - marking the old one dead and SRCU free
> > it.
> 
> That is a classic RCU approach. The problem then is that operations may
> continue on the stale object. You need to make sure that no modifications
> can occur to the original VMA while it exists in two version.
> 
> > struct vm_area_struct *find_vma(mm, addr)
> > {
> > again:
> > 	rcu_read_lock(); /* solely for the lookup structure */
> > 	vma = tree_lookup(&mm->vma_tree, addr); /* vma is srcu guarded */
> > 	rcu_read_unlock();
> > 	if (vma && vma_is_dead(vma))
> > 		goto again;
> >
> > 	return vma;
> 
> Dont you need to check and increase the refcount in the rcu section (in
> the absence of any other locking) to make sure that the object still
> exists when you refer to it next?

/* vma is srcu guarded */

taking a ref on the vma is just as bad as using mmap_sem - we might as
well not bother with the whole ordeal in that case.

> > /*
> >  * validates if a previously obtained vma is still valid,
> >  * synchronizes with vma against PTL.
> >  */
> > int validate_vma(mm, vma, addr)
> > {
> > 	ASSERT_PTL_LOCKED(mm, addr);
> >
> > 	if (!vma_is_dead(vma))
> > 		return 1;
> >
> > 	vma2 = find_vma(mm, addr);
> >
> > 	if (/*old vma fault is still valid in vma2*/)
> > 		return 1
> >
> > 	return 0;
> > }
> 
> This can only be done while the refcount is elevated.

Like above, we're still in the one SRCU section. The basic idea is to
replace down_read/up_read with srcu_read_lock/srcu_read_unlock for
faults, since the latter scales (or can scale) but a single lock can
not.

> > > srcu may have too much of an overhead for this.
> >
> > Then we need to fix that ;-) But surely SRCU is cheaper than mmap_sem.
> 
> Holding off frees for a long time (sleeping???) is usually bad for cache
> hot behavior. It introduces cacheline refetches. Avoid if possible.

Well, what do you rather have, mmap_sem cacheline bouncing and
contention?

Do you think the small vma slab will suffer significantly on the
relatively expensive exclusive mmap_sem operations?

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

next prev parent reply	other threads:[~2008-12-01 15:28 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-11-28 15:42 Peter Zijlstra
2008-11-28 17:04 ` Paul E. McKenney
2008-11-30 19:27 ` Linus Torvalds
2008-11-30 19:42   ` Peter Zijlstra
2008-12-01 22:55     ` Hugh Dickins
2008-12-01 14:00 ` Christoph Lameter
2008-12-01 14:48   ` Peter Zijlstra
2008-12-01 15:06     ` Christoph Lameter
2008-12-01 15:28       ` Peter Zijlstra [this message]
2008-12-01 16:22       ` Paul E. McKenney
2008-12-01 15:27     ` Lee Schermerhorn
2008-12-01 15:33       ` Peter Zijlstra

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1228145317.7070.17.camel@twins \
    --to=peterz@infradead.org \
    --cc=akpm@linux-foundation.org \
    --cc=cl@linux-foundation.org \
    --cc=hugh@veritas.com \
    --cc=linux-mm@kvack.org \
    --cc=mingo@elte.hu \
    --cc=nickpiggin@yahoo.com.au \
    --cc=paulmck@linux.vnet.ibm.com \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox