linux-mm.kvack.org archive mirror
From: Andrew Morton <akpm@osdl.org>
To: 'David Gibson' <david@gibson.dropbear.id.au>
Cc: "Chen, Kenneth W" <kenneth.w.chen@intel.com>,
	'Christoph Lameter' <christoph@schroedinger.engr.sgi.com>,
	Hugh Dickins <hugh@veritas.com>,
	bill.irwin@oracle.com, Adam Litke <agl@us.ibm.com>,
	linux-mm@kvack.org
Subject: Re: [RFC] reduce hugetlb_instantiation_mutex usage
Date: Thu, 26 Oct 2006 17:04:15 -0700
Message-ID: <20061026170415.ec0bb0b9.akpm@osdl.org>
In-Reply-To: <20061026233137.GA11733@localhost.localdomain>

On Fri, 27 Oct 2006 09:31:37 +1000
"'David Gibson'" <david@gibson.dropbear.id.au> wrote:

> On Thu, Oct 26, 2006 at 03:44:51PM -0700, Andrew Morton wrote:
> > On Thu, 26 Oct 2006 15:17:20 -0700
> > "Chen, Kenneth W" <kenneth.w.chen@intel.com> wrote:
> > 
> > > First rev of patch to allow hugetlb page fault to scale.
> > > 
> > > hugetlb_instantiation_mutex was introduced to prevent spurious allocation
> > > failure in a corner case: two threads race to instantiate same page with
> > > only one free page left in the global pool.  However, this global
> > > serialization hurts fault performance badly as noted by Christoph Lameter.
> > > This patch attempts to cut back the use of the mutex to the case where free
> > > pages are limited, thus allowing faults to scale in the most common cases.
> > >
> > 
> > ug.
> > 
> > How about we kill that instantiation_mutex thing altogether and fix
> > the original bug in a better fashion?  Like...
> > 
> > In hugetlb_no_page():
> > 
> > retry:
> > 	page = find_lock_page(...)
> > 	if (!page) {
> > 		write_lock_irq(&mapping->tree_lock);
> > 		if (radix_tree_lookup(...)) {
> > 			write_unlock_irq(tree_lock);
> > 			goto retry;
> > 		}
> > 		page = alloc_huge_page(...);
> > 		if (!page)
> > 			bail;
> > 		radix_tree_insert(...);
> > 		SetPageLocked(page);
> > 		write_unlock_irq(tree_lock);
> > 		clear_huge_page(...);
> > 	}
> > 
> > 	<stick it in page tables>
> > 
> > 	unlock_page(page);
> > 
> > The key points:
> > 
> > - Use tree_lock to prevent the race
> > 
> > - allocate the hugepage inside tree_lock so we never get into this
> >   two-threads-tried-to-allocate-the-final-page problem.
> > 
> > - The hugepage is zeroed without locks held, under lock_page()
> > 
> > - lock_page() is used to make the other thread(s) sleep while the winner
> >   thread is zeroing out the page.
> > 
> > It means that rather a lot of add_to_page_cache() will need to be copied
> > into hugetlb_no_page().
> 
> This handles the case of processes racing on a shared mapping, but not
> the case of threads racing on a private mapping.  In the latter case
> the race ends at the set_pte() rather than the add_to_page_cache()
> (well, strictly with the whole page_table_lock atomic lump).  And we
> can't move the clear after the set_pte() :(.
> 

I expect we can do a similar thing, using page_table_lock to prevent the
race.

The key is to make the racing threads still block on the page lock.
Perhaps install a temporary pte which is !pte_present() and also
!pte_none(), so that a racing thread can use that pte to locate and wait
upon the presently-locked page while it is being COWed by another CPU.
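
To make the idea concrete, here is a rough userspace analogy, not kernel
code and not a proposed patch.  Everything in it is hypothetical and made
up for illustration: struct slot stands in for a pte, SLOT_BUSY for the
"!pte_present() && !pte_none()" marker, the ptl mutex for page_table_lock,
the per-page mutex for lock_page(), and fault()/run_race() are invented
names.  The pool holds exactly one free page, which is the corner case
under discussion.  In the kernel the winner could not sleep in lock_page()
under page_table_lock; it would have to use something like a trylock on
the freshly allocated page, which is assumed to always succeed here.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

#define PAGE_BYTES 4096                 /* stand-in for a huge page */

enum slot_state { SLOT_NONE, SLOT_BUSY, SLOT_PRESENT };

struct page {
	pthread_mutex_t lock;           /* stands in for lock_page() */
	unsigned char data[PAGE_BYTES];
};

struct slot {                           /* stands in for one pte */
	pthread_mutex_t ptl;            /* stands in for page_table_lock */
	enum slot_state state;
	struct page *page;
};

static struct page pool_page = { .lock = PTHREAD_MUTEX_INITIALIZER };
static int pool_free;                   /* protected by slot->ptl here */

/* Returns 0 once the slot holds a zeroed page, -1 on allocation failure. */
static int fault(struct slot *s)
{
	struct page *page;

	pthread_mutex_lock(&s->ptl);
	if (s->state == SLOT_NONE) {
		/* Winner: allocate and publish the marker under ptl. */
		if (pool_free == 0) {
			pthread_mutex_unlock(&s->ptl);
			return -1;
		}
		pool_free--;
		page = &pool_page;
		pthread_mutex_lock(&page->lock); /* uncontended: not yet visible */
		s->page = page;
		s->state = SLOT_BUSY;
		pthread_mutex_unlock(&s->ptl);

		memset(page->data, 0, PAGE_BYTES); /* slow part, ptl not held */

		pthread_mutex_lock(&s->ptl);
		s->state = SLOT_PRESENT;
		pthread_mutex_unlock(&s->ptl);
		pthread_mutex_unlock(&page->lock);
		return 0;
	}
	/* Loser: the marker (or the final entry) tells us which page it is. */
	page = s->page;
	pthread_mutex_unlock(&s->ptl);

	/* Sleep on the page lock until the winner has finished zeroing. */
	pthread_mutex_lock(&page->lock);
	pthread_mutex_unlock(&page->lock);
	return 0;
}

struct racer_arg { struct slot *slot; int ok; };

static void *racer(void *p)
{
	struct racer_arg *a = p;

	a->ok = fault(a->slot) == 0 &&
		a->slot->page->data[0] == 0 &&
		a->slot->page->data[PAGE_BYTES - 1] == 0;
	return NULL;
}

/* Race nthreads faults on one slot; returns how many saw a zeroed page. */
int run_race(int nthreads)
{
	struct slot s = { PTHREAD_MUTEX_INITIALIZER, SLOT_NONE, NULL };
	pthread_t *tids;
	struct racer_arg *args;
	int i, created = 0, ok = 0;

	assert(nthreads > 0);
	tids = calloc(nthreads, sizeof(*tids));
	args = calloc(nthreads, sizeof(*args));
	if (!tids || !args) {
		free(tids);
		free(args);
		return -1;
	}
	memset(pool_page.data, 0xff, PAGE_BYTES); /* make the page "dirty" */
	pool_free = 1;                            /* only one page left */

	for (i = 0; i < nthreads; i++) {
		args[i].slot = &s;
		if (pthread_create(&tids[i], NULL, racer, &args[i]) != 0)
			break;
		created++;
	}
	for (i = 0; i < created; i++) {
		pthread_join(tids[i], NULL);
		ok += args[i].ok;
	}
	free(tids);
	free(args);
	return ok;
}
```

In this sketch no thread ever fails the allocation, because only the
thread which finds the slot empty allocates, and it does so under ptl.
Losers never hold ptl while waiting on the page lock, so the two locks
cannot deadlock against each other.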

