On Wednesday 10 October 2007 17:50, Ken Chen wrote:
> On 10/9/07, Ken Chen wrote:
> > That's what I figured. In that case, why don't we get rid of all the
> > spin locks in the fast path of follow_hugetlb_page?
> >
> > follow_hugetlb_page is called from get_user_pages, which should
> > already hold mm->mmap_sem in read mode. That means page table tear
> > down cannot happen. We do a racy read on the page table chain. If a
> > race happens with another thread, no big deal; it will just fall into
> > hugetlb_fault(), which will then serialize with
> > hugetlb_instantiation_mutex or mm->page_table_lock. And that's the
> > slow path anyway.
>
> Never mind. ftruncate can come through another path and remove the
> mapping without holding mm->mmap_sem. So much for the crazy idea.

Yeah, that's a killer...

Here is another crazy idea I've been mulling around. I was on the brink of
forgetting the whole thing until Suresh just now showed how much performance
there is to be had.

I don't suppose the mmap_sem avoidance from this patch matters so much if
your database isn't using threads. But at least it should be faster (unless
my crazy idea has some huge hole, and provided hugepages are implemented).

The basic idea is that architectures can override get_user_pages, or at
least provide a fast if not complete version of it, and fall back to the
regular get_user_pages when it encounters something difficult (eg. a
swapped out page). I *think* we can do this for x86-64 without taking
mmap_sem, or _any_ page table locks at all. Obviously the CPUs themselves
do a very similar lockless lookup for TLB fill.

[ We actually might even be able to go one better if we could have
  virt->phys instructions in the CPU that would look up and even fill the
  TLB for us. I don't know what the chances of that happening are,
  Suresh ;) ]

Attached is a really basic sketch of how it will work. Any party poopers
care to tell me why I'm an idiot? :)
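
To make the shape of the fast path concrete, a minimal sketch might look
something like the following. The function name fast_gup, the
local_irq_disable() trick for keeping page table pages from being freed
under us, and the fallback details are assumptions for illustration, not
the actual attachment:

/*
 * fast_gup(): hypothetical lockless get_user_pages() fast path for x86-64.
 * Walks the page tables with no mmap_sem and no page table locks, and
 * punts to the regular get_user_pages() on anything difficult (not
 * present, swapped out, write to a read-only pte, odd pmds, ...).
 */
#include <linux/mm.h>
#include <linux/sched.h>
#include <asm/pgtable.h>

int fast_gup(unsigned long start, int nr_pages, int write,
	     struct page **pages)
{
	struct mm_struct *mm = current->mm;
	unsigned long addr = start;
	int i;

	/*
	 * On x86-64, page table pages (and the pages they map) are only
	 * freed after the TLB flush IPI has been acknowledged by all CPUs.
	 * With interrupts disabled here, that IPI cannot complete, so
	 * nothing we walk below can be freed under us.
	 */
	local_irq_disable();
	for (i = 0; i < nr_pages; i++, addr += PAGE_SIZE) {
		pgd_t *pgd = pgd_offset(mm, addr);
		pud_t *pud;
		pmd_t *pmd;
		pte_t *ptep, pte;

		if (pgd_none(*pgd))
			goto slow;
		pud = pud_offset(pgd, addr);
		if (pud_none(*pud))
			goto slow;
		pmd = pmd_offset(pud, addr);
		if (pmd_none(*pmd) || pmd_bad(*pmd))	/* anything odd */
			goto slow;

		ptep = pte_offset_map(pmd, addr);
		pte = *ptep;
		pte_unmap(ptep);

		/* Not present, swapped out, or needs COW: too hard here. */
		if (!pte_present(pte) || (write && !pte_write(pte)))
			goto slow;

		pages[i] = pte_page(pte);
		get_page(pages[i]);
	}
	local_irq_enable();
	return i;

slow:
	local_irq_enable();

	/* Drop anything we pinned so far, then do it the slow way. */
	while (i--)
		put_page(pages[i]);

	down_read(&mm->mmap_sem);
	i = get_user_pages(current, mm, start, nr_pages, write, 0,
			   pages, NULL);
	up_read(&mm->mmap_sem);
	return i;
}

The point of the structure is that anything at all tricky just bails out to
the existing, fully locked path, so the only thing the lockless walk has to
get right is being safe against concurrent page table teardown.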