linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Davidlohr Bueso <davidlohr@hp.com>
To: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Mel Gorman <mgorman@suse.de>,
	Andrew Morton <akpm@linux-foundation.org>,
	Rik van Riel <riel@redhat.com>, Michal Hocko <mhocko@suse.cz>,
	"Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
	Hugh Dickins <hughd@google.com>,
	Davidlohr Bueso <davidlohr.bueso@hp.com>,
	David Gibson <david@gibson.dropbear.id.au>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	Wanpeng Li <liwanp@linux.vnet.ibm.com>,
	Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>,
	Hillf Danton <dhillf@gmail.com>,
	aswin@hp.com
Subject: Re: [PATCH v3 13/14] mm, hugetlb: retry if failed to allocate and there is concurrent user
Date: Fri, 03 Jan 2014 11:55:45 -0800	[thread overview]
Message-ID: <1388778945.2956.20.camel@buesod1.americas.hpqcorp.net> (raw)
In-Reply-To: <20131223021118.GA2487@lge.com>

Hi Joonsoo,

Sorry about the delay...

On Mon, 2013-12-23 at 11:11 +0900, Joonsoo Kim wrote:
> On Mon, Dec 23, 2013 at 09:44:38AM +0900, Joonsoo Kim wrote:
> > On Fri, Dec 20, 2013 at 10:48:17PM -0800, Davidlohr Bueso wrote:
> > > On Fri, 2013-12-20 at 14:01 +0000, Mel Gorman wrote:
> > > > On Thu, Dec 19, 2013 at 05:02:02PM -0800, Andrew Morton wrote:
> > > > > On Wed, 18 Dec 2013 15:53:59 +0900 Joonsoo Kim <iamjoonsoo.kim@lge.com> wrote:
> > > > > 
> > > > > > If parallel fault occur, we can fail to allocate a hugepage,
> > > > > > because many threads dequeue a hugepage to handle a fault of same address.
> > > > > > This makes reserved pool shortage just for a little while and this cause
> > > > > > faulting thread who can get hugepages to get a SIGBUS signal.
> > > > > > 
> > > > > > To solve this problem, we already have a nice solution, that is,
> > > > > > a hugetlb_instantiation_mutex. This blocks other threads to dive into
> > > > > > a fault handler. This solve the problem clearly, but it introduce
> > > > > > performance degradation, because it serialize all fault handling.
> > > > > > 
> > > > > > Now, I try to remove a hugetlb_instantiation_mutex to get rid of
> > > > > > performance degradation.
> > > > > 
> > > > > So the whole point of the patch is to improve performance, but the
> > > > > changelog doesn't include any performance measurements!
> > > > > 
> > > > 
> > > > I don't really deal with hugetlbfs any more and I have not examined this
> > > > series but I remember why I never really cared about this mutex. It wrecks
> > > > fault scalability but AFAIK fault scalability almost never mattered for
> > > > workloads using hugetlbfs.  The most common user of hugetlbfs by far is
> > > > sysv shared memory. The memory is faulted early in the lifetime of the
> > > > workload and after that it does not matter. At worst, it hurts application
> > > > startup time but that is still poor motivation for putting a lot of work
> > > > into removing the mutex.
> > > 
> > > Yep, important hugepage workloads initially pound heavily on this lock,
> > > then it naturally decreases.
> > > 
> > > > Microbenchmarks will be able to trigger problems in this area but it'd
> > > > be important to check if any workload that matters is actually hitting
> > > > that problem.
> > > 
> > > I was thinking of writing one to actually get some numbers for this
> > > patchset -- I don't know of any benchmark that might stress this lock. 
> > > 
> > > However I first measured the amount of cycles it costs to start an
> > > Oracle DB and things went south with these changes. A simple 'startup
> > > immediate' calls hugetlb_fault() ~5000 times. For a vanilla kernel, this
> > > costs ~7.5 billion cycles and with this patchset it goes up to ~27.1
> > > billion. While there is naturally a fair amount of variation, these
> > > changes do seem to do more harm than good, at least in real world
> > > scenarios.
> > 
> > Hello,
> > 
> > I think that number of cycles is not proper to measure this patchset,
> > because cycles would be wasted by fault handling failure. Instead, it
> > targeted improved elapsed time. 

Fair enough, however the fact of the matter is this approach does en up
hurting performance. Regarding total startup time, I didn't see hardly
any differences, with both vanilla and this patchset it takes close to
33.5 seconds.

> Could you tell me how long it
> > takes to fault all of it's hugepages?
> > 
> > Anyway, this order of magnitude still seems a problem. :/
> > 
> > I guess that cycles are wasted by zeroing hugepage in fault-path like as
> > Andrew pointed out.
> > 
> > I will send another patches to fix this problem.
> 
> Hello, Davidlohr.
> 
> Here goes the fix on top of this series.

... and with this patch we go from 27 down to 11 billion cycles, so this
approach still costs more than what we currently have. A perf stat shows
that an entire 1Gb huge page aware DB startup costs around ~30 billion
cycles on a vanilla kernel, so the impact of hugetlb_fault() is
definitely non trivial and IMO worth considering.

Now, I took my old patchset (https://lkml.org/lkml/2013/7/26/299) for a
ride and things do look quite better, which is basically what Andrew was
suggesting previously anyway. With the hash table approach the startup
time did go down to ~25.1 seconds, which is a nice -24.7% time
reduction, with hugetlb_fault() consuming roughly 5.3 billion cycles.
This hash table was on a 80 core system, so since we do the power of two
round up we end up with 256 entries -- I think we can do better if we
enlarger further, maybe something like statically 1024, or probably
better, 8-ish * nr cpus.

Thoughts? Is there any reason why we cannot go with this instead? Yes,
we still keep the mutex, but the approach is (1) proven better for
performance on real world workloads and (2) far less invasive. 

Thanks,
Davidlohr

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  reply	other threads:[~2014-01-03 19:55 UTC|newest]

Thread overview: 47+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-12-18  6:53 [PATCH v3 00/14] mm, hugetlb: remove a hugetlb_instantiation_mutex Joonsoo Kim
2013-12-18  6:53 ` [PATCH v3 01/14] mm, hugetlb: unify region structure handling Joonsoo Kim
2013-12-21  9:04   ` David Gibson
2014-01-07  2:37   ` Davidlohr Bueso
2013-12-18  6:53 ` [PATCH v3 02/14] mm, hugetlb: region manipulation functions take resv_map rather list_head Joonsoo Kim
2013-12-21 13:43   ` David Gibson
2014-01-07  2:39   ` Davidlohr Bueso
2013-12-18  6:53 ` [PATCH v3 03/14] mm, hugetlb: protect region tracking via newly introduced resv_map lock Joonsoo Kim
2013-12-21 13:58   ` David Gibson
2013-12-23  1:05     ` Joonsoo Kim
2013-12-24 12:00       ` David Gibson
2014-01-06  0:12         ` Joonsoo Kim
2014-01-07  2:39   ` Davidlohr Bueso
2013-12-18  6:53 ` [PATCH v3 04/14] mm, hugetlb: remove resv_map_put() Joonsoo Kim
2013-12-18  6:53 ` [PATCH v3 05/14] mm, hugetlb: make vma_resv_map() works for all mapping type Joonsoo Kim
2013-12-18  6:53 ` [PATCH v3 06/14] mm, hugetlb: remove vma_has_reserves() Joonsoo Kim
2013-12-18  6:53 ` [PATCH v3 07/14] mm, hugetlb: mm, hugetlb: unify chg and avoid_reserve to use_reserve Joonsoo Kim
2013-12-18  6:53 ` [PATCH v3 08/14] mm, hugetlb: call vma_needs_reservation before entering alloc_huge_page() Joonsoo Kim
2013-12-18  6:53 ` [PATCH v3 09/14] mm, hugetlb: remove a check for return value of alloc_huge_page() Joonsoo Kim
2013-12-18  6:53 ` [PATCH v3 10/14] mm, hugetlb: move down outside_reserve check Joonsoo Kim
2013-12-18  6:53 ` [PATCH v3 11/14] mm, hugetlb: move up anon_vma_prepare() Joonsoo Kim
2013-12-18  6:53 ` [PATCH v3 12/14] mm, hugetlb: clean-up error handling in hugetlb_cow() Joonsoo Kim
2013-12-18  6:53 ` [PATCH v3 13/14] mm, hugetlb: retry if failed to allocate and there is concurrent user Joonsoo Kim
2013-12-20  1:02   ` Andrew Morton
2013-12-20  1:58     ` Joonsoo Kim
2013-12-20  2:15       ` Andrew Morton
2013-12-20  5:00         ` Joonsoo Kim
2013-12-20  2:31     ` Davidlohr Bueso
2013-12-20  4:47       ` Joonsoo Kim
2013-12-20 14:01     ` Mel Gorman
2013-12-21  6:48       ` Davidlohr Bueso
2013-12-23  0:44         ` Joonsoo Kim
2013-12-23  2:11           ` Joonsoo Kim
2014-01-03 19:55             ` Davidlohr Bueso [this message]
2014-01-06  0:19               ` Joonsoo Kim
2014-01-06 12:19                 ` Davidlohr Bueso
2014-01-07  1:57                   ` Joonsoo Kim
2014-01-07  2:36                     ` Davidlohr Bueso
2014-01-15  3:08                       ` David Rientjes
2014-01-15  4:37                         ` Davidlohr Bueso
2014-01-15  4:56                           ` Andrew Morton
2014-01-15 20:47                             ` Davidlohr Bueso
2014-01-15 20:50                               ` Andrew Morton
2013-12-18  6:54 ` [PATCH v3 14/14] mm, hugetlb: remove a hugetlb_instantiation_mutex Joonsoo Kim
2014-03-31 16:27 ` [PATCH v3 00/14] " Dave Hansen
2014-03-31 17:26   ` Davidlohr Bueso
2014-03-31 18:41     ` Dave Hansen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1388778945.2956.20.camel@buesod1.americas.hpqcorp.net \
    --to=davidlohr@hp.com \
    --cc=akpm@linux-foundation.org \
    --cc=aneesh.kumar@linux.vnet.ibm.com \
    --cc=aswin@hp.com \
    --cc=david@gibson.dropbear.id.au \
    --cc=davidlohr.bueso@hp.com \
    --cc=dhillf@gmail.com \
    --cc=hughd@google.com \
    --cc=iamjoonsoo.kim@lge.com \
    --cc=kamezawa.hiroyu@jp.fujitsu.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=liwanp@linux.vnet.ibm.com \
    --cc=mgorman@suse.de \
    --cc=mhocko@suse.cz \
    --cc=n-horiguchi@ah.jp.nec.com \
    --cc=riel@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox